Not Pattern - an Algebraic Data Types alternative invented by me

Chinese Version

TL;DR: Summary

Algebraic Data Type (abbreviated ADT, sometimes called Discriminated Union, Tagged Union, enum in Rust) is a language feature which can be used to express that a data has multiple cases and each case has different content. Compared with inheritance or interface implementation, sometimes the cases and structure of a data are stable, but the operation of this data (especially the operation of different processing according to the case) is uncertain and extensible. Algebraic data types are suitable for this situation. Currently, many languages do not support this language feature. This article will introduce a pattern of my own design to express this kind of data, finding a balance between ease of use and less error-prone.

The examples in this article are written in C#, but the pattern itself applies to almost any language.

Introducing Not Pattern

Suppose we have an identity data Identity, it has three cases: Guest, User and Admin. Guest is a singleton with no data, User has a UserId, Admin has an AdminId.

You've probably been using a pattern of representing data with an interface/open class, with a subclass in each case.

1
2
3
4
5
6
7
8
9
public record Identity
{
public record Guest : Identity { public static Guest Instance { get; } = new Guest(); }
public record User(UserId Id) : Identity;
public record Admin(AdminId Id) : Identity;

// Use private constructor to prevent other classes from inheriting Identity
private Identity() { }
}

Then, you can handle it case by case by determining which specific class an Identity type object is.

1
2
3
4
5
6
7
8
9
10
11
12
switch (identity) {
case Identity.Guest:
return Redirect("/login");
case Identity.Admin admin:
Log.Info($"request from admin: {admin.Id.Value}");
return BadRequest("calling user operation as an admin");
case Identity.User user:
Log.Info($"request from user: {user.Id.Value}");
return PerformAction(user);
default:
throw new NotSupportedException("unsupported identity");
}

It seems to achieve the purpose, but it has a drawback: when writing data processing code, we are more likely to miss some cases; it is also more difficult to do code review, unless you look at the class definition every time, or remember which cases each class has accurately. A feature of algebraic data types is exhaustiveness checking. It checks the data processing code, and generates compile errors or warnings if there are unhandled cases. The sealed keyword in languages ​​like Scala and Kotlin enables this check.

The Visitor pattern can also ensure that each case is handled. This pattern requires you to pass the processing of each case as a method implementation of an interface or a delegate (i.e. higher-order function) parameter and into a visit function. The disadvantage of this pattern is that it cannot use the convenience features such as C#'s pattern matching, it has overhead, and it is not readable enough for people who do not understand this pattern.

So I designed another pattern to achieve exhaustiveness checking, which sacrifices certain correctness guarantees for convenience and readability, which is the purpose of the Not pattern.

This pattern is very simple, you just need to add the following method to the Identity class.

1
public Exception Not_Guest_User_Admin() => new Exception($"Not method called on {this}");

The method's name starts with Not, followed by the name of each case, separated by an underscore, and returns an exception.

With this method, when you need to process the data case by case, you can write the default branch first, throwing the exception returned by the Not method. This method can be written quickly through the completion function of the IDE.

1
2
3
4
5
switch (identity) {
// TODO
default:
throw identity.Not_Guest_User_Admin();
}

Then, you can find out what cases are there from the method name. Then complete the handling of each case.

This mode is as simple as that. The reason I say it gives up some correctness is because it still doesn't prevent you from missing a case, and the Not method itself can be written wrong. But it has a low probability of making mistakes, it is easy to code review, and you can use the pattern matching feature of C#. It is also very readable, here is the complete code for this example.

1
2
3
4
5
6
7
8
9
10
11
12
switch (identity) {
case Identity.Guest:
return Redirect("/login");
case Identity.Admin admin:
Log.Info($"request from admin: {admin.Id.Value}");
return BadRequest("calling user operation as an admin");
case Identity.User user:
Log.Info($"request from user: {user.Id.Value}");
return PerformAction(user);
default:
throw identity.Not_Guest_User_Admin();
}

Even people who don't know the pattern can understand: if it is Guest, do this; if it is Admin, do this; if it is User, do this; otherwise, throw a "not guest, admin or user" exception.

Adding a new case

One of the effects of Exhaustiveness checking is that if you add a case of data, there will be compilation errors or warnings where the data is processed case by case, to remind you to add the processing of the newly added case. And if you use the Not pattern, you have two ways to add a case.

Modify the Not method

One way is to modify the Not method to add a new case. Note: Do not use the IDE's rename method function, just change the method name directly. At this point, there will be "method not found" compile errors for code that doesn't handle the new case. When you see errors like this, you need to modify the Not method and add the handling of the new case. This compile error prevents you from missing out on the processing.

Obsolete Not method

Another way is to add a complete Not method without modifying the original Not method, and then add the Obsolete attribute to the original method, and indicate in the information which case need to be added or which Not method should be used. This way, there will be a compile warning for code that doesn't handle the new case. You can follow these warnings to add the unhandled case. The advantage of this approach is that it will not make the program fail to compile, and you can improve it step by step.

Use multiple fields to represent each case and apply the Not pattern

Algebraic data types can also be implemented in C# by treating each case as a field. For valid data, exactly one field is non-null. This implementation has two advantages, one is that a case of data can be an existing type, it does not require this type to inherit from a class; and the other is that both the data type and the type of the case can be value types.

Suppose UserId has two cases, Int64 and String. An example using multiple fields to represent each case and applying the Not pattern is as follows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
public record struct UserId
{
// Marking init as private prevents users from constructing values where both fields are non-null.
public Int64? Int64 { get; private init; }
public String? String { get; private init; }

public static explicit operator UserId(Int64 value) => new UserId { Int64 = value };
public static explicit operator UserId(String value) => new UserId { String = value };

public Exception Not_Int64_String() => new Exception($"Not method called on {this}");
}

// ...

// `is {}` can be used to check if it is null, and get a value of type with the nullable removed
if (userId.Int64 is {} int64Id)
{
// For example, `int64Id` is of type `Int64` instead of `Int64?`
return $"Int64 UserId: {int64Id}";
}
else if (userId.String is {} stringId)
{
return $"Long UserId: {stringId}";
}
else throw userId.Not_Int64_String();

Wrap each case as an object and apply the Not pattern

Algebraic data types can also be implemented by wrapping the object type. This wrapper is usually a value type. Each case is stored as an object. In addition to being able to use existing types, this approach can also reduce the number of fields. But for the case that is a value type, converting it to object will box it.

Using UserId as an example, an example of implementing an algebraic data type in this way and applying the Not pattern is as follows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public record struct UserId
{
// Marking init as private prevents users from constructing `Value` other than `Int64` and `String`.
public object Value { get; private init; }

public static explicit operator UserId(Int64 value) => new UserId { Value = value };
public static explicit operator UserId(String value) => new UserId { Value = value };

public Exception Not_Int64_String() => new Exception($"Not method called on {this}");
}

// ...

switch (uid.Value)
{
case Int64 int64Id:
return $"Int64 ID: {int64Id}";
case String stringId:
return $"String ID: {stringId}";
default:
throw uid.Not_Int64_String();
}

Use the Not pattern on enum types

C#'s enum types can be considered of as a special case of algebraic data types, that is, each case of data has no content. Unfortunately, C#'s enum types neither support exhaustiveness checking. We can use the Not pattern on enums through extension methods to reduce errors that a case is missed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
public enum Identity
{
Guest,
User,
Admin
}

public static class IdentityNotMethod
{
public static Exception Not_Guest_User_Admin(this Identity identity) => new Exception($"Not method called on {identity}");
}

// ...

switch (id)
{
case Identity.Guest:
return Redirect("/login");
case Identity.User:
return PerformAction();
case Identity.Admin:
return BadRequest("calling user operation as an admin");;
default:
throw id.Not_Guest_User_Admin();
}

If the Not method's exception is thrown

If your program throws the exception returned by the Not method, there are several possible situations.

  • The Not method itself is written mistakenly and doesn't cover all cases.
  • Not all cases mentioned in the method name are not handled.
  • If the Not method is marked Obsolete, there may be newly added cases unhandled.
  • If the data is a value type, a default value may be constructed from default or an uninitialized array, which usually indicates a bug in the program, such as not properly initializing the array.
  • If the data is an enum, it can actually store all integers. If the integer outside the enum branch is reasonable, the Not pattern should not be used; otherwise, the source of the illegal value should be looked for.

Shortcoming

A disadvantage of this pattern is that it is not suitable for very large amounts of cases. At this time, the Not method will be very long and inconvenient to maintain. In this case, the Visitor Pattern can be considered.

Summary

  • Core idea: a method starts with Not, each case is in its name, and returns an exception.
  • Use the name of this method to check whether each case is handled.
  • Three implementations of algebraic data types: inheritance, each case as a field, and wrapping cases as object.
  • All three implementations, as well as enums, can use the Not pattern.
  • Suitable for the data where the number of cases is not large and the name is not long.