C++20 Coroutines: co_await, co_yield, co_return, Generators & Async I/O Deep Dive

TopicTrick Team



What Makes a Function a Coroutine

A function becomes a coroutine the moment it contains any of these three keywords in its body:

| Keyword | Meaning | Use Case |
|---|---|---|
| co_await expr | Suspend until expr completes | Waiting for I/O, timers, other tasks |
| co_yield value | Suspend and produce a value | Generators, lazy sequences |
| co_return value | Complete the coroutine with a result | Return final value from async task |

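The coroutine's return type must satisfy the coroutine protocol by providing a nested promise_type (or by having std::coroutine_traits specialized for it). The compiler uses that promise type to create the return object, to decide whether the coroutine suspends at its start and end, and to store the result or exception.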


The Coroutine Frame and promise_type

Every coroutine has a coroutine frame, which is heap-allocated by default and stores:

  • Copies of the function parameters and all local variables that must survive suspension
  • The promise_type instance
  • The resume and destroy function pointers (in typical implementations)
  • The suspend point (where it last suspended)
cpp
#include <coroutine>
#include <exception> // std::exception_ptr, std::current_exception, std::rethrow_exception

// Minimal coroutine return type — Task that produces a single int
struct SimpleTask {
    struct promise_type {
        int result;
        std::exception_ptr exception;
        
        // Called to create the return object (SimpleTask instance):
        SimpleTask get_return_object() {
            return SimpleTask{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }
        
        // Called BEFORE the coroutine body runs:
        std::suspend_always initial_suspend() { return {}; } // Lazy start
        // std::suspend_never initial_suspend() { return {}; } // Eager start
        
        // Called AFTER co_return (coroutine done):
        std::suspend_always final_suspend() noexcept { return {}; } // Keep frame alive
        
        // co_return value; → sets promise.result and suspends at final_suspend
        void return_value(int val) { result = val; }
        
        // Handle unhandled exceptions:
        void unhandled_exception() { exception = std::current_exception(); }
    };
    
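    std::coroutine_handle<promise_type> handle;
    
    // Explicit constructor: in C++20 a class with user-declared constructors is not an aggregate
    explicit SimpleTask(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~SimpleTask() { if (handle) handle.destroy(); } // Free the coroutine frame
    // The move must null out the source, or both objects would destroy the same frame:
    SimpleTask(SimpleTask&& other) noexcept : handle(other.handle) { other.handle = nullptr; }
    SimpleTask(const SimpleTask&) = delete;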
    
    // Resume and get result:
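    int get() {
        // Resuming a coroutine that already finished is undefined behavior, so check first:
        if (!handle.done())
            handle.resume();  // Run to completion (or to next suspension)
        if (handle.promise().exception)
            std::rethrow_exception(handle.promise().exception);
        return handle.promise().result;
    }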
};

// Usage:
SimpleTask my_computation() {
    int a = 1 + 1;
    co_return a * 21; // 42
}

SimpleTask t = my_computation(); // Coroutine suspended at initial_suspend
int result = t.get();            // Resume → runs body → returns 42

Building Generator<T> with co_yield

A generator is a coroutine that lazily produces a sequence of values using co_yield:

cpp
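#include <coroutine>
#include <cstddef>
#include <exception> // std::terminate
#include <fstream>
#include <iostream>
#include <iterator>  // std::default_sentinel_t
#include <ranges>    // std::views::take
#include <string>
#include <utility>   // std::move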

template<typename T>
class Generator {
public:
    struct promise_type {
        T value;
        
        Generator get_return_object() {
            return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend()   noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
        
        // co_yield value; → stores value, suspends
        std::suspend_always yield_value(T val) {
            value = std::move(val);
            return {};
        }
    };
    
    // Iterator support (makes Generator range-compatible):
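    struct iterator {
        // Member types required by the std::input_iterator concept (needed for std::views::take):
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        
        std::coroutine_handle<promise_type> handle;
        bool done;
        
        iterator& operator++() {
            handle.resume();
            done = handle.done();
            return *this;
        }
        void operator++(int) { ++*this; } // Post-increment, also required by the concept
        T& operator*() const { return handle.promise().value; }
        bool operator==(std::default_sentinel_t) const { return done; }
    };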
    
    iterator begin() {
        handle_.resume(); // Start the coroutine
        return {handle_, handle_.done()};
    }
    std::default_sentinel_t end() { return {}; }
    
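    // Move-only: an implicit copy would let two Generators destroy the same frame
    Generator(Generator&& other) noexcept : handle_(other.handle_) { other.handle_ = nullptr; }
    Generator& operator=(Generator&& other) noexcept {
        if (this != &other) {
            if (handle_) handle_.destroy();
            handle_ = other.handle_;
            other.handle_ = nullptr;
        }
        return *this;
    }
    Generator(const Generator&) = delete;
    Generator& operator=(const Generator&) = delete;
    ~Generator() { if (handle_) handle_.destroy(); }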
    
private:
    explicit Generator(std::coroutine_handle<promise_type> h) : handle_(h) {}
    std::coroutine_handle<promise_type> handle_;
};

// Using the generator:
Generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield a;
        auto next = a + b;
        a = b;
        b = next;
    }
}

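// Take the filename by value: the body runs lazily, after a temporary argument
// bound to a reference parameter may already have been destroyed.
Generator<std::string> read_lines(std::string filename) {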
    std::ifstream file(filename);
    std::string line;
    while (std::getline(file, line)) {
        co_yield line; // Yield one line at a time — file stays open between yields
    }
}

// Range-compatible — works with range-based for:
for (int fib : fibonacci() | std::views::take(10)) {
    std::cout << fib << ' '; // 0 1 1 2 3 5 8 13 21 34
}

for (const auto& line : read_lines("huge_file.txt") | std::views::take(100)) {
    process(line); // Stops after the first 100 lines; the file is never loaded into memory as a whole
}

std::generator<T> (C++23): Out of the Box

C++23 standardizes generator coroutines — no more boilerplate:

cpp
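#include <generator>  // C++23
#include <print>      // std::print (C++23)
#include <ranges>
#include <tuple>      // std::tie
#include <utility>    // std::pair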

std::generator<int> iota(int start = 0) {
    while (true) co_yield start++;
}

std::generator<int> fibonacci() {
    auto [a, b] = std::pair{0, 1};
    while (true) {
        co_yield a;
        std::tie(a, b) = std::pair{b, a + b};
    }
}

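// Nested generators: std::ranges::elements_of yields every element of a sub-generator
// (C++ has no co_yield* syntax):
std::generator<int> flatten(std::generator<std::generator<int>> nested) {
    for (auto&& gen : nested) { // Elements arrive as rvalue references, so bind with auto&&
        co_yield std::ranges::elements_of(std::move(gen)); // C++23: yield from sub-generator
    }
}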

// Works directly with ranges:
auto first_10_fibs = fibonacci() | std::views::take(10);
for (int n : first_10_fibs) std::print("{} ", n);

Building Task<T> for Async I/O

A Task<T> coroutine represents a future value that becomes available when async I/O completes:

cpp
#include <coroutine>
#include <exception> // std::terminate
#include <string>
#include <utility>   // std::move
#include <vector>

// Conceptual Task<T> — in production use asio or libcoro instead of this:
template<typename T>
class Task {
public:
    struct promise_type {
        T result;
        std::coroutine_handle<> continuation; // Who to resume when done
        
        Task get_return_object() {
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        
        std::suspend_always initial_suspend() { return {}; }
        
        // When the task finishes, transfer control to the awaiting coroutine
        struct FinalAwaiter {
            bool await_ready() noexcept { return false; }
            // Symmetric transfer: returning the handle resumes it without growing the call stack
            std::coroutine_handle<> await_suspend(std::coroutine_handle<promise_type> h) noexcept {
                if (auto next = h.promise().continuation)
                    return next;
                return std::noop_coroutine(); // Nobody is waiting: return to whoever resumed us
            }
            void await_resume() noexcept {}
        };
        FinalAwaiter final_suspend() noexcept { return {}; }
        
        void return_value(T val) { result = std::move(val); }
        void unhandled_exception() { std::terminate(); }
    };
    
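    // Move-only owner of the coroutine frame (without a destructor the frame would leak):
    Task(Task&& other) noexcept : handle_(other.handle_) { other.handle_ = nullptr; }
    Task(const Task&) = delete;
    ~Task() { if (handle_) handle_.destroy(); }
    
    // Awaitable interface: allows co_await task
    bool await_ready() const noexcept { return false; }
    // Returning the handle starts the task via symmetric transfer instead of a nested resume()
    std::coroutine_handle<> await_suspend(std::coroutine_handle<> awaiter) noexcept {
        handle_.promise().continuation = awaiter;
        return handle_;
    }
    T await_resume() { return std::move(handle_.promise().result); }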
    
private:
    std::coroutine_handle<promise_type> handle_;
    explicit Task(std::coroutine_handle<promise_type> h) : handle_(h) {}
};

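// Usage with Asio-style async operations (async_open, async_read, async_close, Socket,
// and process are placeholders that are not defined in this article):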
Task<std::vector<char>> read_file_async(const std::string& path) {
    auto fd = co_await async_open(path);
    auto data = co_await async_read(fd, 1024 * 1024);
    co_await async_close(fd);
    co_return data;
}

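// Note: Task<void> needs a separate specialization whose promise defines return_void()
// instead of return_value(); it is omitted from the conceptual Task<T> above.
Task<void> handle_request(Socket socket) {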
    auto request  = co_await socket.read_async();    // Non-blocking — thread free!
    auto response = process(request);
    co_await socket.write_async(response);
    co_await socket.close_async();
    // No thread blocked during any of the I/O operations above
}

Coroutines vs Threads: When to Use Which

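| Aspect | Threads | Coroutines |
|---|---|---|
| Memory per unit | 1-8 MB (stack reserved by default) | 50-500 bytes (frame, typical) |
| Creation cost | ~10 µs (OS syscall) | ~30 ns (heap alloc) |
| Max concurrent | ~1,000-10,000 (practical) | Millions |
| Context switch | ~1-5 µs (OS scheduler) | ~10 ns (function call) |
| CPU parallelism | Yes (real parallel) | Only if the scheduler resumes them on several threads; cooperative on each thread |
| Best for | CPU-bound parallel work | I/O-bound concurrent work |

The figures are rough orders of magnitude and vary by platform and workload.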
cpp
#include <algorithm> // std::max
#include <thread>
#include <vector>

// process_chunk, schedule, handle_request, and accept_connection are placeholders.

// Use threads for: parallel CPU computation (parallel matrix multiply, image processing)
std::vector<std::jthread> workers;
const unsigned worker_count = std::max(1u, std::thread::hardware_concurrency()); // may report 0
for (unsigned i = 0; i < worker_count; i++)
    workers.emplace_back(process_chunk, i);

// Use coroutines for: I/O-bound concurrency (100k simultaneous connections)
// A single thread can run all of them cooperatively:
for (int i = 0; i < 100'000; i++)
    schedule(handle_request(accept_connection()));
// All 100k "connections" are handled by coroutines on one thread

Frequently Asked Questions

Why is coroutine promise_type so complex? Can't the library hide this? The C++20 coroutine mechanism is deliberately low level: it provides the machinery (frame, handle, suspend/resume) but not the policy (when to resume, how to schedule). This lets library authors build any async model: generators, tasks, actors, fibers. Production code should generally use a library type (asio::awaitable, coro::task from libcoro, cppcoro::task) rather than a hand-written promise_type.

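Do coroutines always allocate on the heap? By default, the coroutine frame is heap-allocated. However, the standard allows the compiler to elide the allocation, an optimization known as HALO (Heap Allocation eLision Optimization): if the compiler can prove that the coroutine's lifetime is fully contained in the caller and the frame size is known at the call site, it can place the frame in the caller's stack frame instead. This is an optimization rather than a guarantee; it is most likely to apply to short-lived generators whose calls are inlined.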

Can coroutines be used in embedded systems? Yes — with custom allocators. The coroutine frame is allocated via operator new by default, but you can override this by providing operator new/delete in the promise_type to use a custom pool allocator. This makes coroutines viable on platforms with limited heap.


Key Takeaway

C++20 coroutines fundamentally change how you write async code. Instead of callback chains, futures/promises, or thread-per-connection models, you write linear code that reads synchronously but executes asynchronously. The key insight: co_await doesn't block a thread. It suspends the coroutine and returns control to its caller or resumer (typically a scheduler or event loop), which can then run other coroutines on the same thread. For I/O-bound systems, this can scale far better than thread-per-request models.


*Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engine