Handling Concurrency: Understanding Coroutines, Async/Await, Scheduling, and More

Posted on 2023-06-24 In Concurrent Programming

We often need to execute multiple tasks simultaneously in a program, such as a server handling multiple requests at the same time or a browser rendering the UI while performing I/O operations. As the level of concurrency in our programs increases, technologies like asynchronous I/O, coroutines, and async/await are used to support it. In this article, I'll share my understanding of these technologies, hoping to help you understand and apply them.

The Fundamental Idea of Coroutines

The OS provides threads to run tasks concurrently, but creating and scheduling a large number of threads is expensive. Even with a thread pool, handling a large number of concurrent tasks at the same time still requires a correspondingly large number of threads in the pool.

To address this issue, one idea is to implement scheduling without involving the OS. This can be achieved by enabling our code to suspend and later resume. By doing so, a worker thread or a pool of worker threads can execute more concurrent tasks than there are threads. Once the task being executed by a worker thread suspends, the thread can switch to another task that is ready to run, whether it has not started yet or was suspended earlier and can now continue. This is known as a coroutine: a unit of code execution that can suspend and resume.
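As a rough illustration of this idea, the sketch below runs three tasks on a single thread. It assumes JavaScript generator functions as the suspendable units, with each `yield` acting as a suspension point. The names `makeTask` and `runAll` are hypothetical and exist only for this example; real schedulers are considerably more involved.

```javascript
// A task is a generator: each `yield` is a point where it suspends itself.
function* makeTask(name, steps, log) {
    for (let i = 1; i <= steps; i++) {
        log.push(`${name} step ${i}`);
        yield; // suspend and hand the thread back to the scheduler
    }
}

// A single "worker" that keeps resuming ready tasks in round-robin order.
function runAll(tasks) {
    const ready = [...tasks];
    while (ready.length > 0) {
        const task = ready.shift();
        const { done } = task.next(); // resume until the next suspension point
        if (!done) {
            ready.push(task); // still has work left: queue it again
        }
    }
}

const log = [];
runAll([makeTask("A", 2, log), makeTask("B", 3, log), makeTask("C", 1, log)]);
console.log(log.join("\n"));
```

The steps of the three tasks come out interleaved even though only one thread is involved, because each task gives up the thread at every `yield`.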

The async/await mechanism in some programming languages is also closely related to coroutines. I'll introduce this concept from the perspective of coroutines later.

The worker thread can be built into the programming language or runtime, such as the thread pools built into Go and C#/.NET, or the main thread in V8-based JavaScript runtimes. It can also be provided by third-party libraries, such as tokio in Rust and kotlinx.coroutines in Kotlin.

Stackful and Stackless Coroutines

There are two common types of coroutine implementations that allow our code to suspend and resume.

The first type is known as stackful coroutines. In this type, each coroutine has its own stack space. A strategy similar to operating system thread scheduling is used: when a coroutine suspends, it saves the register values. To resume a coroutine, the registers are restored to their previously saved values. This includes switching the stack register to the coroutine's stack space and jumping back to the point where the coroutine was suspended. Unlike threads, which are passively suspended by the OS, coroutines proactively switch themselves out without involving the OS. Go and Java's Project Loom (virtual threads), which is available in recent JREs, use this approach.

The second type is known as stackless coroutines. In this type, the compiler transforms the coroutine code into a state machine, where each suspension point corresponds to a state. The process of transitioning between states corresponds to executing code from one suspension point to the next. You can think of it as the compiler "splitting" the task into smaller parts, allowing the task to be suspended between these parts. The Kotlin language uses this approach.
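To make the transformation more concrete, here is a hand-written approximation of the kind of state machine a compiler might produce for a function with two suspension points. This is only a sketch: `createCoroutine` is a hypothetical name, and the code that real compilers generate differs in many details.

```javascript
// Conceptual source (two suspension points):
//   log("step 1"); <suspend>; log("step 2"); <suspend>; log("step 3");

// Hand-written state machine: `state` records which suspension point the
// coroutine stopped at, so no separate stack is needed to resume it.
function createCoroutine(log) {
    let state = 0;
    return {
        // Runs the code between two suspension points, then returns.
        // Returns true while the coroutine can still be resumed.
        resume() {
            switch (state) {
                case 0:
                    log.push("step 1");
                    state = 1;
                    return true;
                case 1:
                    log.push("step 2");
                    state = 2;
                    return true;
                case 2:
                    log.push("step 3");
                    state = 3;
                    return false;
                default:
                    throw new Error("coroutine already finished");
            }
        },
    };
}

const trace = [];
const co = createCoroutine(trace);
co.resume();                 // runs "step 1", then suspends
trace.push("other work");    // the thread is free to do something else
co.resume();                 // runs "step 2", then suspends
const more = co.resume();    // runs "step 3" and finishes
console.log(trace, more);
```

Each call to `resume` performs one state transition, which is why the suspension points have to be known when the state machine is generated.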

A limitation of stackless coroutines is the need to identify suspension points in the code, as the compiler requires this information to determine how to generate the state machine. Therefore, in Kotlin, functions that can be suspended must be marked. Stackful coroutines, in contrast, can be suspended at any point during runtime. The disadvantage of stackful coroutines, however, is the need to perform register operations, requiring special support from the compiler or runtime environment. For example, Java's Loom can only be used in newer versions of the JRE, while Kotlin's coroutines are compatible with environments such as JRE 1.8 and the Android runtime.

Async/Await

Several languages, including C#, Rust, JavaScript/TypeScript, and Python, have the async/await mechanism for writing code that includes asynchronous operations. These languages use a dedicated type to represent an ongoing asynchronous operation (represented below by TypeScript's Promise<T>). A function that returns a promise can be marked with async, and within this function, await can be used to "wait" for a Promise<T> object to finish and obtain a result of type T.

The fundamental idea behind async/await is the stackless coroutine. Functions marked with async essentially become coroutines. When an async function is invoked, the coroutine runs until it encounters an await expression, at which point it suspends and returns a promise to the caller. The coroutine resumes once the awaited promise is finished. Behind the scenes, a state machine manages the execution of async functions, with the completion of the awaited promise triggering a state transition.
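This ordering can be observed directly. The following is a small sketch in JavaScript; `fetchValue` and `events` are hypothetical names used only for illustration, and `Promise.resolve(42)` stands in for a real asynchronous operation.

```javascript
const events = [];

async function fetchValue() {
    events.push("coroutine: started");           // runs synchronously on call
    const value = await Promise.resolve(42);     // suspension point
    events.push(`coroutine: resumed with ${value}`);
    return value;
}

const promise = fetchValue();                    // returns at the first await
events.push("caller: got a promise back");

promise.then((value) => {
    events.push(`caller: promise fulfilled with ${value}`);
    console.log(events.join("\n"));
});
```

The caller receives the promise before the code after `await` has run. The rest of the function body executes only when the coroutine is resumed later.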

Generator

Several programming languages, including C#, JavaScript/TypeScript, and Python, support the generator feature. In a function that returns a sequence, you can use yield to add an element to the sequence. Unlike directly returning a list, the generator mechanism is "lazy," meaning the code to obtain an element is executed only when needed. When a function containing yield is called, the function body doesn't execute immediately but instead suspends at the start, returning a sequence. The function resumes execution only when the next element of the sequence is requested, continuing until it reaches the next yield statement. At this point, the function suspends, and the yielded value is returned as the next element of the sequence.

With the previous examples, you might be able to figure out by yourself that the mechanism of suspension and resumption behind this is also a coroutine. Under the hood, a state machine is created to yield the element. Acquiring the next element triggers the state transition.
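The laziness described above can be seen by recording when each part of the generator body actually runs. This is a minimal sketch in JavaScript; `numbers` and `steps` are names made up for the example.

```javascript
const steps = [];

function* numbers() {
    steps.push("body: computing 1");
    yield 1;
    steps.push("body: computing 2");
    yield 2;
    steps.push("body: finished");
}

const seq = numbers();               // nothing in the body has run yet
steps.push("caller: created sequence");
const first = seq.next();            // runs up to the first yield
steps.push(`caller: got ${first.value}`);
const second = seq.next();           // resumes, runs up to the second yield
steps.push(`caller: got ${second.value}`);
const last = seq.next();             // resumes, runs to the end
steps.push(`caller: done = ${last.done}`);
console.log(steps.join("\n"));
```

The "body" entries appear only after the caller asks for the next element, and each request runs the body just far enough to produce one value.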

Coroutines, which allow functions to suspend and resume, are not limited to concurrency. There are many other applications besides those mentioned, such as calculating deep recursive functions without causing a stack overflow.
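As one possible illustration of the recursion use case, the sketch below writes a recursive sum as a generator and drives it with an explicit stack. The names `sumTo` and `run` are hypothetical. This is one way to apply the idea, not a standard library facility. A directly recursive version at this depth would likely exceed the default call stack limit in most JavaScript engines.

```javascript
// Recursive sum 1 + 2 + ... + n written as a generator. Instead of calling
// itself directly, it yields the sub-call and suspends until the driver
// sends the sub-result back in.
function* sumTo(n) {
    if (n === 0) {
        return 0;
    }
    const rest = yield sumTo(n - 1); // "call" the sub-problem, then suspend
    return n + rest;
}

// Driver: keeps suspended generators in a heap-allocated array instead of
// on the native call stack, so the depth is limited by memory, not stack size.
function run(root) {
    const stack = [root];
    let result; // value returned by the most recently finished call
    while (stack.length > 0) {
        const current = stack[stack.length - 1];
        const { value, done } = current.next(result);
        if (done) {
            stack.pop();
            result = value;      // pass the return value to the caller
        } else {
            stack.push(value);   // a sub-call was requested: start it
            result = undefined;
        }
    }
    return result;
}

console.log(run(sumTo(100000))); // 5000050000
```

Because every pending call is a suspended generator rather than a native stack frame, the native call depth stays constant no matter how deep the logical recursion goes.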

Both yield and async/await are usually implemented using stackless coroutines. Stackless coroutines need their suspension points to be known in advance, and for these two features the suspension points are simply the locations of yield or await, which are explicit in the code. This makes stackless coroutines a natural fit.

Since Kotlin natively supports stackless coroutines, both yield and await in Kotlin are functions rather than special syntax.

Coroutine Scheduling

As discussed previously, the primary difference between coroutines and threads is that coroutines need to suspend actively. In concurrent scenarios, if a coroutine never suspends, the worker thread will not switch to another coroutine, potentially causing other coroutines to remain suspended for a long time. This issue is more common with stackless coroutines because their suspension points are predetermined, and new temporary suspension points cannot be added based on runtime conditions. It tends to be even more severe in environments with a single worker thread, such as the V8 engine.

To prevent this, we can introduce voluntary suspensions within the coroutine. Such a suspension is not intended to wait for the completion of an asynchronous operation, but simply to let the worker thread switch to another coroutine. In Kotlin, this can be achieved by invoking the yield function.

In languages that use async/await, we've mentioned that await would suspend execution. Therefore, we can insert an await for a promise that does nothing to prevent a coroutine from occupying the worker thread for too long. Here's an example using JavaScript.

// Simulate a 1-second I/O operation, and output information after the operation ends.
async function ioTask() {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    console.log("I/O finished");
}

// Simulate a 10-second CPU-intensive task.
function cpuTask() {
    const end = Date.now() + 10000;
    while (Date.now() < end) {
        let x = 0;
        for (let i = 0; i < 1e7; i++) {
            x = Math.sqrt(i) + Math.pow(i, 2);
        }
    }
}

ioTask();
cpuTask();

If you run the above code, "I/O finished" will be output only after 10 seconds. This happens because the CPU task takes up the main thread, preventing the I/O task from continuing the execution of the output statement after await. To prevent the CPU task from occupying the main thread, we can add an await within the CPU task's loop.

// Simulate a 1-second I/O operation, and output information after the operation ends.
async function ioTask() {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    console.log("I/O finished");
}

// Simulate a 10-second CPU-intensive task.
// This function has been rewritten as an async function 
// because we want it to suspend during execution, 
// enabling the main thread to execute other promises.
async function cpuTask() {
    const end = Date.now() + 10000;
    while (Date.now() < end) {
        let x = 0;
        for (let i = 0; i < 1e7; i++) {
            x = Math.sqrt(i) + Math.pow(i, 2);
        }
        // The line below suspends the CPU task so that the main thread
        // can run other pending tasks.
        //
        // Note: `await Promise.resolve();` is not enough here. Awaiting an
        // already-resolved promise only queues the resumption as a microtask,
        // which runs in the current phase of the event loop.
        // Scheduling the resumption with `setImmediate` (a Node.js API) lets
        // the event loop go through its other phases before resuming,
        // so timers and I/O events can be handled.
        await new Promise((resolve) => setImmediate(resolve));
    }
}

ioTask();
cpuTask();

After the above modifications, "I/O finished" can be output approximately one second later.

Asynchronous I/O

Coroutines often use asynchronous I/O. If synchronous I/O is used, the worker thread will block on the I/O and be unable to execute other coroutines. But if asynchronous I/O is used, the coroutine can suspend when the I/O starts, and the worker thread can switch to other coroutines. After the I/O ends, the suspended coroutine will enter a ready state and can be executed by the worker thread.
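The effect can be sketched with two coroutines that share one thread. In the example below, `fakeRead` is a hypothetical stand-in for an asynchronous I/O call: it is an assumption of this sketch that a timer simulates the I/O latency, whereas a real program would await a socket or file operation instead.

```javascript
const order = [];

// Stand-in for an asynchronous I/O call (assumption: a timer simulates
// the latency of a real device or network).
function fakeRead(name, ms) {
    return new Promise((resolve) => setTimeout(() => resolve(`${name} data`), ms));
}

async function handle(name, ms) {
    order.push(`${name}: I/O started`);
    const data = await fakeRead(name, ms); // suspend; the thread is released
    order.push(`${name}: got ${data}`);
}

// Both requests are started before either finishes, on a single thread.
const all = Promise.all([handle("slow", 50), handle("fast", 10)]).then(() => {
    console.log(order.join("\n"));
    return order;
});
```

Because the first coroutine suspends as soon as its I/O starts, the second one can begin immediately, and the faster operation completes first even though it was started later.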

If you need to perform I/O-related operations for which there is no suitable asynchronous implementation, a common workaround is to run the functions that involve blocking I/O on a dedicated blocking I/O thread pool. This allows the coroutine itself to suspend, so its worker thread can switch to other coroutines. However, the concurrency of blocking I/O is still limited by the number of threads in the blocking I/O thread pool, so asynchronous I/O should be the preferred approach. Rust's tokio with spawn_blocking and Kotlin coroutines' Dispatchers.IO are designed to implement this strategy.

Summary

Here are the key points:

Coroutines provide a cost-effective method for managing concurrent tasks by allowing code execution to be suspended and resumed.
Asynchronous I/O and coroutines combine to keep worker threads from blocking.
Stackful coroutines have their own stacks and can suspend at any point, while stackless coroutines are compiled into state machines and can suspend only at predetermined suspension points.
Async/await and yield mechanisms, implemented in many languages, are based on stackless coroutines.
In concurrent scenarios, coroutines need to actively suspend to allow worker thread switching.
The insertion of await within a coroutine can enable the worker thread to switch to another coroutine.