Handling Concurrency: Understanding Coroutines, Async/Await, Scheduling, and More

As the number of concurrency that the service needs to support is getting higher and higher, technologies such as asynchronous I/O, coroutines, and async/await are being used increasingly. While these techniques can increase the amount of concurrency supported by a service, they aren't particularly easy to understand. In this article, I'll share my own understanding of these technologies, hoping to aid your understanding.

The Fundamental Idea of Coroutines

The operating system represents parallel tasks with threads. A process can include multiple threads. Thus, for tasks that require concurrent handling, such as multiple requests received by a server simultaneously, we can assign a thread to handle each. However, creating and scheduling multiple threads can lead to high system call costs. Even with techniques like thread pooling, if a large amount of concurrency is needed at the same time, the number of threads in the thread pool must also be large, resulting in substantial scheduling costs. To address this issue, an idea was proposed: to implement scheduling in user space rather than relying on the operating system's thread mechanism. If the scheduling process does not involve system calls, scheduling costs will be significantly reduced.

To implement user-level scheduling, we need our code to be able to "suspend" at specific points. At this time, the thread can execute other code. Later, this code can be resumed by this or another thread. A coroutine refers to a code execution that can suspend and resume.

The async/await mechanism in some programming languages is also closely related to coroutines. I'll introduce this concept from the perspective of coroutines later.

In concurrent programming, we can represent a concurrent task using a coroutine. It is usually executed by a worker thread or a worker thread pool. Once the coroutine being executed by the worker thread suspends, it switches to another ready but yet-to-start coroutine, allowing the coroutines to be executed concurrently. The worker thread can be built into the programming language or runtime, such as the thread pool built into Go language and C#/.NET, and the main thread of V8. It can also be provided by third-party libraries, such as tokio in Rust language and kotlinx.coroutines in Kotlin.

Stackful and Stackless Coroutines

The key to implementing coroutine mechanism lies in how to suspend and resume. Currently, there are two common types of implementation.

The first type is known as stackful coroutines. In this type, each coroutine has its own stack space. The common implementation of this type uses a strategy similar to operating system thread scheduling: when a coroutine suspends, it saves the current stack register and instruction register values. To resume a suspended coroutine, the stack register is set to the previously saved value and the execution jumps to the point after the suspension. Unlike threads, which are passively suspended by the operating system, coroutines proactively switch themselves out, and the entire process does not require system calls. The Go language and Loom, supported by the recent JRE, use this approach.

The second type is known as stackless coroutines. In this type, the coroutine does not have its own stack space, and there is no need to manipulate the stack register other than performing the normal stack push and pop operations. The common implementation of this type involves creating a state machine for the coroutine. Each suspension position corresponds to a state, and the process of state transition involves executing code from one suspension point to another. The state machine's position mirrors the suspension of the coroutine. The Kotlin language utilizes this approach.

A limitation of stackless coroutines is the necessity to determine suspension points at the time of coroutine creation, since the states and transitions of the state machine depend on them. Therefore, in Kotlin, functions that can be suspended must be marked. Conversely, stackful coroutines can be suspended at any point during runtime. The disadvantage with stackful coroutines, however, is the need to perform register operations, necessitating special support from the compiler or runtime environment. For example, Java's Loom can only be used in newer versions of the JRE, while Kotlin's coroutine is compatible with environments such as JRE 1.8 and Android runtime.

Async/Await

Several languages, including C#, Rust, JavaScript/TypeScript, and Python, have incorporated the async/await mechanism to facilitate writing code that includes asynchronous operations. These languages use a unique type to represent an ongoing asynchronous operation (represented below by TypeScript's Promise<T>). A function that returns a promise can be marked with async, and within this function, await can be used to "wait" for a Promise<T> object to fulfilled and obtain a result of type T.

The fundamental behind async/await is actually stackless coroutine. Functions marked with async effectively create a coroutine. When an async function is called, the coroutine runs until the await statement and suspends. Then, a promise is returned. The coroutine resumes when the awaited promise is fulfilled. When the coroutine finishes or throws an uncaught error, the promise returned by the async function resolves or rejects. Under the hood, a state machine is created for async function execution. The awaited promise being fulfilled triggers the state transition.

Yield

Several languages, including C#, JavaScript/TypeScript, and Python, have incorporated the yield mechanism to generate a sequence. You can use yield in a function to indicate adding an element to the sequence. Compared to directly returning an array, the biggest feature of the yield mechanism is its "laziness". That is, the code to obtain an element will only be executed when it is needed. When a function containing yield is executed, the function body doesn't immediately execute but suspends at the beginning, and a sequence is returned. Only when the next element of the sequence is acquired, does the function resume until the yield statement. At this point, the function suspends, and the value of yield is returned as the next element of the sequence.

With the previous examples, you might be able to figure out by yourself that the mechanism of suspension and resumption behind this is also a coroutine. The fundamental behind this feature is also a stackless coroutine. Under the hood, a state machine is created to yield the element. Acquiring the next element triggers the state transition.

Coroutines, as a mechanism that enables functions to suspend and resume, are not limited to concurrency. This is one scenario of coroutine application. It has other uses, such as calculating deep recursive functions without stack overflow. We won't elaborate on this here.

Both yield and async/await are implemented using stackless coroutines. Stackless coroutines need to determine potential suspension positions. For these two features, the suspension positions are the locations of yield or await, which are clear. Therefore, these two features usually use stackless coroutine implementation.

Since Kotlin natively supports stackless coroutines, both yield and await in Kotlin are functions rather than special syntax.

Coroutine Scheduling

As discussed previously, the primary difference between coroutines and threads is that coroutines need to suspend actively. In concurrent scenarios, if a coroutine never suspends, the worker thread would not switch to another coroutine, potentially causing other coroutines to remain in a suspended state for a long duration. This issue often emerges in stackless coroutines as the suspension points of stackless coroutines are predetermined and new temporary suspension points cannot be added based on runtime situations. This issue tends to be even more prevalent in single-threaded worker threads, such as in the V8 engine.

To prevent this, we can introduce casual suspensions within the coroutine. This suspension is not intended to wait for the completion of an asynchronous operation, but simply to enable the worker thread to switch to another coroutine. In Kotlin, this can be achieved by invoking the yield function.

In languages that use async/await, we've mentioned that await would suspend execution. Therefore, we can insert an await for a promise that does nothing to prevent a coroutine from occupying the worker thread for too long. Here's an example using JavaScript.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Simulate a 1-second IO operation, and output information after the operation ends.
async function ioTask() {
await new Promise((resolve) => setTimeout(resolve, 1000));
console.log("IO finished");
}

// Simulate a 10-second CPU-intensive task.
function cpuTask() {
const end = Date.now() + 10000;
while (Date.now() < end) {
let x = 0;
for (let i = 0; i < 1e7; i++) {
x = Math.sqrt(i) + Math.pow(i, 2);
}
}
}

ioTask();
cpuTask();

If you run the above code, "IO finished" will be output only after 10 seconds. This happens because the CPU task takes up the main thread, preventing the IO task from continuing the execution of the output statement after await. To prevent the CPU task from occupying the main thread, we can add an await within the CPU task's loop.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// Simulate a 1-second IO operation, and output information after the operation ends.
async function ioTask() {
await new Promise((resolve) => setTimeout(resolve, 1000));
console.log("IO finished");
}

// Simulate a 10-second CPU-intensive task.
// This function has been rewritten as an async function
// because we want it to suspend during execution,
// enabling the main thread to execute other promises.
async function cpuTask() {
const end = Date.now() + 10000;
while (Date.now() < end) {
let x = 0;
for (let i = 0; i < 1e7; i++) {
x = Math.sqrt(i) + Math.pow(i, 2);
}
// The line below allows the CPU task to suspend,
// so the main thread can execute other promises.
//
// Note: You can't simply use `await Promise.resolve();`
// because awaiting a resolved promise will only
// add the task to be resumed in the microtask queue,
// which will execute in the current phase of the event loop.
// Only by using `setImmediate` can the other phases of the event loop
// be performed after suspending the promise.
// This allows the event loop to handle IO events.
await new Promise((resolve) => setImmediate(resolve));
}
}

ioTask();
cpuTask();

After the above modifications, "IO finished" can be output approximately one second later.

Asynchronous I/O

Coroutines often use asynchronous I/O. If traditional synchronous I/O is used, the worker thread will block on the I/O and be unable to execute other coroutines. But if asynchronous I/O is used, the coroutine can suspend when the I/O starts, and the worker thread can switch to other coroutines. After the I/O ends, the suspended coroutine will enter a ready state and can be executed by the worker thread.

If you need to perform IO-related operations for which there is no suitable asynchronous implementation, a common workaround is to use a dedicated blocking IO thread pool to execute functions that involve blocking IO. This allows the coroutine itself to pause, and the coroutine's worker thread can switch to other coroutines. However, the concurrency of blocking IO will still be limited by the number of threads in the blocking IO thread pool, so using asynchronous IO should be the preferred approach. Rust's tokio with spawn_blocking and Kotlin coroutine's Dispatchers.IO are designed to implement this strategy.

Summary

This blog explains the concepts related to concurrent programming, including coroutines and async/await. It shows the benefits of coroutines in managing concurrent tasks and details stackful and stackless coroutines. It explores further how async/await and yield, derived from coroutines, work. The article then stresses the necessity of suspending coroutines in concurrent situations. Through JavaScript examples, it shows potential issues and advises incorporating an 'await' in CPU-heavy tasks to facilitate other tasks such as I/O operations.

Here are the key points:

  1. Coroutines provide a cost-effective method for managing concurrent tasks by allowing code execution to be suspended and resumed.
  2. Asynchronous I/O and coroutines combine to prevent worker thread blockage.
  3. Stackful coroutines can suspend at any time, while stackless coroutines are implemented as state machines.
  4. Async/await and yield mechanisms, implemented in many languages, are based on stackless coroutines.
  5. In concurrent scenarios, coroutines need to actively suspend to allow worker thread switching.
  6. The insertion of await within a coroutine can enable the worker thread to switch to another coroutine.