My Views on Programming Languages

Posted on 2022-08-24 In Programming

There are many debates about the strengths and weaknesses of programming languages. In this post I present my own views, explain what I consider the key to learning a programming language, and describe how I understand design patterns.

Code as an Expression of Intent

When we write code, we use it to express our intentions. When we read code, we understand the author's intentions through it. Similarly, a compiler or interpreter executes our intentions based on the code. Therefore, the primary function of code is to express intent.

Characteristics of a Good Programming Language

Given that code is an expression of intent, a good programming language should meet the following criteria:

Directness: We should be able to "translate" our intentions directly into code. The intent and the code should differ only in words and symbols, with no rewriting and no loss of information. This makes it easier both to write code from our intentions and to understand the author's intentions when reading the code.
Correctness: The language should make it harder to write bugs. Generally, correctness and directness are positively correlated. The simpler the transformation from intent to code, the fewer chances for errors that prevent the code from correctly representing the intent. Additionally, the less information the code loses (such as types), the more a compiler or static analysis tool can understand the intent and check its correctness. For example, if we want to access an object's attribute, the tool can check if the attribute exists or if the object could be null. An exception to directness is when operations are prone to misuse; in such cases, making them more cumbersome can help prevent errors, trading some directness for correctness.
Convenience: For frequently used intentions, the language should offer concise ways to express them, similar to how "+" and "-" symbols in mathematics make addition and subtraction easier to write.

Let's take an example. We want to express "user information." It includes a username, nickname, and an optional birthday. The username and nickname are text, while the birthday is a date. Here are some examples of how different programming languages might express this intention.

Definition (Kotlin):

data class UserInfo(
    val userName: String,
    val nickName: String,
    val birthday: Date? = null,
)

To construct user information with only a user name and nickname (no birthday):

UserInfo(
    userName = "Jason5Lee",
    nickName = "Jason Lee",
)

To create a new user information record, changing only the birthday and keeping all other information the same:

userInfo.copy(birthday = Date(1997, 12, 17))

Definition (Rust):

struct UserInfo {
    user_name: String,
    nick_name: String,
    birthday: Option<NaiveDate>,
}

To construct user information with only a user name and nickname (no birthday):

UserInfo {
    user_name: "Jason5Lee".to_string(),
    nick_name: "Jason Lee".to_string(),
    birthday: None,
}

To create a new user information record, changing only the birthday and keeping all other information the same:

UserInfo {
    birthday: Some(NaiveDate::from_ymd_opt(1997, 12, 17).unwrap()),
    ..user_info
}

Definition (C#):

record UserInfo(
    string UserName,
    string NickName,
    DateTime? Birthday = null
);

To construct user information with only a user name and nickname (no birthday):

new UserInfo(
    UserName: "Jason5Lee",
    NickName: "Jason Lee"
);

To create a new user information record, changing only the birthday and keeping all other information the same:

userInfo with { Birthday = new DateTime(1997, 12, 17, 0, 0, 0) }

Definition (TypeScript):

type UserInfo = {
    readonly userName: string,
    readonly nickName: string,
    readonly birthday?: Date,
}

To construct user information with only a user name and nickname (no birthday):

{
    userName: "Jason5Lee",
    nickName: "Jason Lee",
} satisfies UserInfo

To create a new user information record, changing only the birthday and keeping all other information the same:

{
    ...userInfo,
    birthday: new Date("1997-12-17"),
} satisfies UserInfo

Definition (Python):

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class UserInfo:
    user_name: str
    nick_name: str
    birthday: Optional[date] = None

To construct user information with only a user name and nickname (no birthday):

UserInfo(
    user_name="Jason5Lee",
    nick_name="Jason Lee",
)

To create a new user information record, changing only the birthday and keeping all other information the same:

dataclasses.replace(user_info, birthday=date(1997, 12, 17))

In these languages, I can define and construct "user information" in a way that accurately reflects my intent, without any rewriting or loss of information. Type checking (for Python, with a static type checker such as Pyright) ensures that attribute values have the correct type and that only attributes declared nullable can be null.

In other programming languages, I either have to sacrifice this directness or use design patterns. For instance, in many languages, I would need to write new UserInfo("Jason5Lee", "Jason Lee", null) to construct an object. Here, I must supply the values in a fixed order instead of mapping each one to its property, which is how I actually think of them. When reading the code, it is also not immediately clear which value corresponds to which property, demonstrating a lack of directness.

Alternatively, I could use the Builder pattern in these languages, allowing me to specify property values through the builder object. However, this introduces a Builder concept that doesn't exist in my original intent. Moreover, the Builder pattern makes it harder to ensure that all necessary properties have values, reducing correctness.
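
To make the correctness cost concrete, here is a minimal Python sketch of a hand-written builder. The UserInfoBuilder class and its method names are hypothetical, invented only for this illustration. The point is that a forgotten required property can only be detected inside build(), at run time, rather than by a type checker.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class UserInfo:
    user_name: str
    nick_name: str
    birthday: Optional[date] = None


class UserInfoBuilder:
    """Hypothetical builder: every field starts out unset."""

    def __init__(self) -> None:
        self._user_name: Optional[str] = None
        self._nick_name: Optional[str] = None
        self._birthday: Optional[date] = None

    def user_name(self, value: str) -> "UserInfoBuilder":
        self._user_name = value
        return self

    def nick_name(self, value: str) -> "UserInfoBuilder":
        self._nick_name = value
        return self

    def birthday(self, value: date) -> "UserInfoBuilder":
        self._birthday = value
        return self

    def build(self) -> UserInfo:
        # The required-field check can only happen here, at run time.
        if self._user_name is None or self._nick_name is None:
            raise ValueError("user_name and nick_name are required")
        return UserInfo(self._user_name, self._nick_name, self._birthday)


def build_without_nick_name() -> UserInfo:
    # Nothing in the types says nick_name is missing; this raises when called.
    return UserInfoBuilder().user_name("Jason5Lee").build()
```

Compare this with the direct constructions above, where omitting a required property is rejected before the program runs.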

Consider another example: suppose we have a sequence of user information, and we want to find the usernames of all users born in or after the year 2000. Some programmers might instinctively start writing code to create a new sequence of strings, loop through each user, add the usernames of those who meet the birthday condition to the new sequence, and finally return this new sequence as the result. This process involves "rewriting" the original intent into a more detailed procedure, which then needs to be expressed in code. Here, the code does not directly represent the original intent. This not only requires extra rewriting when writing the code but also "deductive reasoning" when reading it to recover the underlying intent, a task more suited to Sherlock Holmes than to a programmer.
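
As a sketch of that step-by-step style in Python (the function name is my own choice for illustration, and UserInfo repeats the dataclass defined earlier so the snippet is self-contained):

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class UserInfo:
    user_name: str
    nick_name: str
    birthday: Optional[date] = None


def user_names_born_since_2000(users: List[UserInfo]) -> List[str]:
    result: List[str] = []  # step 1: create a new sequence
    for user in users:  # step 2: loop through each user
        # step 3: check the condition (a missing birthday never matches)
        if user.birthday is not None and user.birthday.year >= 2000:
            result.append(user.user_name)  # step 4: collect the username
    return result  # step 5: return the new sequence
```

A reader has to work through all five steps to recover the one-sentence intent.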

In some programming languages, this intent can be expressed more directly.

Kotlin:

// users: List<UserInfo>
users
    .filter {
        val birthday = it.birthday
        birthday != null && birthday.year >= 2000
    }
    .map { it.userName }

Rust:

// users: impl Iterator<Item=UserInfo>
users
    .filter(|user| match user.birthday {
        None => false,
        Some(birthday) => birthday.year() >= 2000,
    })
    .map(|user| user.user_name)

C#, LINQ query syntax:

// users: IEnumerable<UserInfo>
from user in users
where user.Birthday is {} birthday && birthday.Year >= 2000
select user.UserName

C#, lambda syntax:

users
    .Where(user => user.Birthday is {} birthday && birthday.Year >= 2000)
    .Select(user => user.UserName)

TypeScript:

// users: UserInfo[]
users
  .filter(user => user.birthday !== undefined && user.birthday.getUTCFullYear() >= 2000)
  .map(user => user.userName)

Python:

# users: list[UserInfo]
[
    user.user_name
    for user in users
    if user.birthday is not None and user.birthday.year >= 2000
]

The difference between this code and our original intent lies mainly in terms and symbols, with very little restructuring. We do not have to alter our intent in order to express it. Once you become familiar with this style, its readability improves because it directly represents the original intent.

This style of sequence processing is not appropriate in every situation. If your intention is to create a sequence first and then add elements to it, you should write it that way. This is why I'm not fond of some "functional programming languages": they force you to write code in a so-called "functional" manner, which can feel indirect. In my opinion, the ability to express an intent directly is what matters most.

You may have noticed that, in addition to checking the birth year, the code also checks whether the birthday exists. This is because our original intent had a flaw: it didn't consider that the birthday might be optional. If we write code without considering the possibility of a missing birthday, a type-checking tool can detect this issue. This is an aspect of correctness. Since the code retains information about whether a field is optional or possibly null, the tool can use this information for validation.
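
A small Python sketch of this situation follows. The two function names are hypothetical, and the comment about Pyright describes what I would expect such a checker to report rather than quoting its exact output.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class UserInfo:
    user_name: str
    nick_name: str
    birthday: Optional[date] = None


def born_since_2000_unchecked(user: UserInfo) -> bool:
    # Forgets that birthday may be None. A checker such as Pyright is
    # expected to flag the `.year` access on an Optional[date] value.
    return user.birthday.year >= 2000


def born_since_2000(user: UserInfo) -> bool:
    # The explicit None check narrows Optional[date] to date.
    return user.birthday is not None and user.birthday.year >= 2000
```

Without the check, the first function fails only at run time, with an AttributeError, and only for users that have no birthday.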

The Simplicity of the Language and the Simplicity of its Usage

The simplicity of a programming language can be viewed in two ways: the inherent simplicity of the language itself and the simplicity of using it. Some people use "simplicity" as a reason for a language not supporting certain features, referring to the inherent simplicity of the language.

However, the more crucial simplicity, in my opinion, is often overlooked: the simplicity of usage. A language that is simple to use makes it easier to translate original intentions into code, which simplifies both writing and reading code. The inherent simplicity of a language only reduces the effort by a constant amount: learning a language with many features does require more effort, but this effort is only needed once. Simplicity of usage, by contrast, reduces the effort required to write and read every piece of code. If the effort required to write n units of code in a language is a*n + b, the inherent simplicity mainly affects b, while the simplicity of usage reduces a. When n is large, I prefer a smaller a.
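
As a worked illustration with made-up numbers (assumptions for the sake of the arithmetic, not measurements): suppose language A is quick to learn but indirect to use, with a_A = 3 and b_A = 10, while language B takes longer to learn but is more direct, with a_B = 2 and b_B = 50. B is the cheaper choice when:

```latex
\begin{align*}
a_B n + b_B &< a_A n + b_A \\
2n + 50 &< 3n + 10 \\
n &> 40
\end{align*}
```

Under these assumed values, B costs less in total once more than 40 units of code are written, and the gap keeps growing with n.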

Of course, having too many features, especially those prone to misuse, can reduce directness and simplicity of usage. However, many people focus excessively on the inherent simplicity of a language and ignore the simplicity of expression, which I believe needs to change.

Furthermore, some complexities are objectively present. Ignoring them only makes handling them more troublesome, as you still need to deal with them, but without clear expression in the code. A typical example is types. You always need to consider the type of a value to decide how to handle it, but if this information is not expressed in the code, handling it becomes much more difficult. Not only do you need to deduce the type like a detective when reading the code, but the code also cannot be checked by static checking tools. Ignoring objective complexities and mistaking this for simplicity can sometimes cause more trouble.

This is also why I prefer Rust for low-level development. When managing memory manually, you have to reason about lifetimes anyway to avoid errors such as using freed memory. Rust's lifetimes and borrow checking are not easy to learn, but they keep the lifetime information from your intent in the code, so readers can see it and the borrow checker can prevent values from being used outside their lifetimes. The gain in simplicity of usage outweighs the learning cost.

The Importance of Programming Languages

In mathematics, we use a specific language to describe formulas. We do not use natural language because mathematical formulas are more precise and clear than natural language. This is why pseudocode is often used to describe algorithms. Even if we do not need the computer to execute it, code has the advantage of precision and clarity over natural language for human reading. Code also has higher correctness, allowing tools to check for potential errors in our intentions. Therefore, a programming language is not just a "tool"; it serves as an irreplaceable means of expressing intentions concisely and unambiguously, which is why code cannot be replaced by comments or other natural language texts.

Code, like mathematical formulas, is also convenient. Common intentions are represented using short symbols. If we used natural language for mathematical formulas, it would initially seem more familiar but would be much less convenient to read and write compared to current mathematical formulas. This is another a*n + b scenario. Convenience reduces a.

Comments

The role of comments and documentation should be to describe higher-level purposes rather than detail intentions. For example, if you write a sorting algorithm, documentation can state that the function sorts and whether the sorting is stable. For a section of code, comments can explain the purpose of that code. However, code remains the primary choice for expressing each specific step.
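
A minimal Python sketch of this division of labor, using a hypothetical sort_by_age function: the docstring states what the function does and that it is stable, the one comment explains the purpose of a section, and the individual steps are left to the code.

```python
from typing import List, Tuple


def sort_by_age(people: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the (name, age) pairs ordered by ascending age.

    The sort is stable: people of the same age keep their input order.
    """
    ordered: List[Tuple[str, int]] = []
    for person in people:
        # Walk left past entries with a strictly greater age; stopping at
        # equal ages is what keeps them in input order.
        position = len(ordered)
        while position > 0 and ordered[position - 1][1] > person[1]:
            position -= 1
        ordered.insert(position, person)
    return ordered
```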

Testing

Testing should focus on verifying whether your intended solution achieves its goal, such as whether a sorting function is correct. However, in a less direct programming language, we also need to test whether the rewritten intentions align with the original intentions. For languages with lower correctness, we need to test the correctness of each step. Conversely, if a language has high directness and correctness, it only requires "coarse-grained", more "black-box" testing. This is one reason why testing cannot replace type checking: testing is good at verifying overall correctness, not the correctness of each detailed step.
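
As a sketch of what such a coarse-grained, black-box test could look like in Python: the hypothetical check_sort helper below inspects only inputs and outputs of a sorting function and knows nothing about its internal steps. The seed, value range, and trial count are arbitrary choices.

```python
import random
from collections import Counter
from typing import Callable, List


def check_sort(sort: Callable[[List[int]], List[int]], trials: int = 100) -> None:
    """Black-box check: only inputs and outputs are inspected."""
    rng = random.Random(0)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        result = sort(list(data))
        # The output is in non-decreasing order...
        assert all(a <= b for a, b in zip(result, result[1:]))
        # ...and contains exactly the same elements as the input.
        assert Counter(result) == Counter(data)


# Usage: the built-in sorted satisfies both properties.
check_sort(sorted)
```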

Test-Driven Development (TDD) is a common development approach. It should be used to build intentions. That is, when you have a rough goal but are unsure how to achieve it, you can propose a solution for a specific input, then refine this solution based on additional inputs, eventually arriving at a solution that handles all inputs correctly. In TDD, tests represent specific inputs and expected outputs, while implementation code represents the intended solution which is continuously refined. A good programming language is still valuable for expressing this intention through code.

How to Learn a Programming Language

In my view, the key to learning a programming language is learning how to express intentions with it. For example, an if statement represents executing different code under different conditions, corresponding to the thought "if... then... else...". In OOP (object-oriented programming), inheritance represents an "is-a" relationship. If a class is a type of another class, this relationship can be represented by inheritance. The combination of inheritance and method overriding also enables polymorphism (representing that the same method can have different implementations in different specific classes). In my article on object-oriented and functional programming, I detailed how these paradigms express specific intentions.
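
A short Python sketch of these correspondences, using hypothetical Shape classes chosen only for illustration:

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Each concrete shape supplies its own implementation."""


class Circle(Shape):  # inheritance: "a circle is a shape"
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:  # overriding: a circle's own area rule
        return math.pi * self.radius ** 2


class Rectangle(Shape):  # inheritance: "a rectangle is a shape"
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def size_label(shape: Shape) -> str:
    # "if the area is at least 10, then it is large, else it is small"
    if shape.area() >= 10:
        return "large"
    else:
        return "small"
```

size_label never asks which shape it received: the call to area() is the polymorphic part, and the if statement mirrors the "if... then... else..." thought.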

On Design Patterns

In a programming language, if an intention is hard to express directly, we use a pattern, hence design patterns. In my opinion, design patterns exist because languages lack corresponding features. Many people believe that the need for design patterns is a flaw of OOP because the concept was first introduced in a book titled "Design Patterns: Elements of Reusable Object-Oriented Software". This can create the illusion that non-OOP languages do not need design patterns. In fact, design patterns arise because some intentions are hard to express directly in languages with only the object-oriented paradigm. Even in a non-OOP language, if an intention cannot be expressed directly, design patterns are needed. The Builder pattern mentioned earlier is an example. We need the Builder pattern because some languages cannot create objects by specifying property values, and this is not directly related to whether the language is object-oriented.

Another example is the Visitor pattern, used to express data containing multiple cases, where each case might contain different information. We want to express different processing for different cases of data. The cases and the information for each case are stable, but the processing is unknown and extensible. In summary, the Visitor pattern defines a Visitor interface containing one method to process each case. The data itself includes a polymorphic visit method that accepts a Visitor interface. For each case, the implementation of the visit method calls the corresponding method on the Visitor interface. This way, each kind of processing is represented by an implementation of the Visitor interface and performed via the visit method. This not only expresses the data, cases, and processing but also prevents missing case processing, because the corresponding interface methods must be implemented.
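
A minimal Python sketch of the pattern as described, with a hypothetical Contact type that has two cases, Email and Phone. All names here are assumptions made for illustration. Note that Python reports a missing visitor method when the visitor is instantiated, whereas a statically compiled language would typically report it at compile time.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

R = TypeVar("R")


class ContactVisitor(ABC, Generic[R]):
    """One method per case; each receives that case's information."""

    @abstractmethod
    def email(self, address: str) -> R: ...

    @abstractmethod
    def phone(self, country_code: str, number: str) -> R: ...


class Contact(ABC):
    @abstractmethod
    def visit(self, visitor: ContactVisitor[R]) -> R: ...


@dataclass(frozen=True)
class Email(Contact):
    address: str

    def visit(self, visitor: ContactVisitor[R]) -> R:
        return visitor.email(self.address)


@dataclass(frozen=True)
class Phone(Contact):
    country_code: str
    number: str

    def visit(self, visitor: ContactVisitor[R]) -> R:
        return visitor.phone(self.country_code, self.number)


class Describe(ContactVisitor[str]):
    """One kind of processing, added without touching the data classes."""

    def email(self, address: str) -> str:
        return f"email {address}"

    def phone(self, country_code: str, number: str) -> str:
        return f"phone +{country_code} {number}"
```

Processing a value is then a single call such as Email("a@example.com").visit(Describe()).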

If a language does not need the Visitor pattern, it should have corresponding features to express such data. In Kotlin and Scala, the sealed keyword can express this. Rust uses enum. TypeScript and Python's type annotations use union types. We can say these languages do not need the Visitor pattern because they have a direct way to express the same intent.
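
For instance, with Python's type annotations a hypothetical two-case Contact type can be sketched as a union. The assert_never helper defined here is a common idiom, written out locally as an assumption about how one might ask a type checker for exhaustiveness; the names are illustrative only.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class Email:
    address: str


@dataclass(frozen=True)
class Phone:
    country_code: str
    number: str


# The data "is one of these cases", stated directly.
Contact = Union[Email, Phone]


def assert_never(value: NoReturn) -> NoReturn:
    # Reaching this call with a value that still has a possible type is
    # something a static checker is expected to reject.
    raise AssertionError(f"unhandled case: {value!r}")


def describe(contact: Contact) -> str:
    if isinstance(contact, Email):
        return f"email {contact.address}"
    elif isinstance(contact, Phone):
        return f"phone +{contact.country_code} {contact.number}"
    else:
        assert_never(contact)
```

If a third case were added to the union, the final branch would no longer be unreachable from the checker's point of view, which is how a missing case can be detected without any visitor interface.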

Some might argue that they can define an interface or class and cast down during processing. This approach avoids both the Visitor pattern and language features. However, no one stops them from doing this in an object-oriented language like Java. They cannot claim design patterns are unique to object-oriented languages or Java. Furthermore, this is a typical example of ignoring objective complexity. They still need to care about whether the data should be processed by case and what cases exist. But if this expression is unclear, judging it becomes more troublesome, making it harder to ensure all cases are processed, and more prone to errors.

When learning design patterns, I suggest focusing on understanding the intentions they aim to express.

Special Thanks

My understanding of programming has been greatly influenced by Scott Wlaschin. He has an F# blog and has written a book, "Domain Modeling Made Functional: Tackle Software Complexity with Domain-Driven Design and F#", on combining domain modeling and functional programming. I generalized his domain modeling theory into "expressing intentions". Although I disagree with some of his views on functional programming and OOP, this does not diminish his significant contribution to shaping my views, for which I am very grateful.
