C++20 Coroutines: co_await, co_yield, co_return, Generators & Async I/O Deep Dive

Table of Contents
- What Makes a Function a Coroutine
- The Coroutine Frame and promise_type
- Awaitables: suspend_always, suspend_never, co_await
- Building Generator<T> with co_yield
- std::generator<T> (C++23): Out of the Box
- Building Task<T> for Async I/O
- Coroutine Lifetime and Heap Allocation
- Coroutines in Production: asio and libcoro
- Coroutines vs Threads: When to Use Which
- Frequently Asked Questions
- Key Takeaway
What Makes a Function a Coroutine
A function becomes a coroutine the moment it contains any of these three keywords in its body:
| Keyword | Meaning | Use Case |
|---|---|---|
| co_await expr | Suspend until expr completes | Waiting for I/O, timers, other tasks |
| co_yield value | Suspend and produce a value | Generators, lazy sequences |
| co_return value | Complete the coroutine with a result | Return final value from async task |
The coroutine's return type must satisfy the coroutine protocol. By default that means the return type exposes a nested promise_type; std::coroutine_traits can be specialized for types you do not own.
The Coroutine Frame and promise_type
Every coroutine has a coroutine frame, storage that is heap-allocated by default and holds:
- Copies of the parameters and all local variables that must survive suspension
- The promise_type instance
- The resume and destroy function pointers
- The suspend point (where it last suspended)
Building Generator<T> with co_yield
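A generator is a coroutine that lazily produces a sequence of values using co_yield. Each co_yield hands one value to the consumer and suspends the coroutine until the next value is requested.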
std::generator<T> (C++23): Out of the Box
C++23 standardizes generator coroutines as std::generator<T> in the <generator> header, so the hand-written boilerplate is no longer needed.
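A short sketch of the standard type in use follows. It assumes a toolchain whose standard library ships <generator>, which is why the code is guarded by the __cpp_lib_generator feature-test macro; fibonacci_std and first_fibonacci are hypothetical names.

```cpp
#include <cassert>
#include <version>

#if defined(__cpp_lib_generator)
#include <generator>
#include <ranges>
#include <vector>

// Same sequence as a hand-written generator, with no promise_type in sight.
std::generator<int> fibonacci_std() {
    int a = 0, b = 1;
    for (;;) {
        co_yield a;
        int next = a + b;
        a = b;
        b = next;
    }
}

// std::generator is a range, so it composes with range adaptors such as views::take.
std::vector<int> first_fibonacci(int n) {
    assert(n >= 0);
    std::vector<int> out;
    for (int value : fibonacci_std() | std::views::take(n)) out.push_back(value);
    return out;
}
#endif
```

On a library without std::generator the guarded code is simply skipped, so the snippet still compiles as C++20.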
Building Task<T> for Async I/O
A Task<T> coroutine represents a future value that becomes available when async I/O completes. Awaiting the task suspends the awaiting coroutine until that result is ready.
Coroutines vs Threads: When to Use Which
| Aspect | Threads | Coroutines |
|---|---|---|
| Memory per unit | 1-8 MB (stack) | 50-500 bytes (frame) |
| Creation cost | ~10µs (OS syscall) | ~30ns (heap alloc) |
| Max concurrent | ~1000-10000 (practical) | Millions |
| Context switch | ~1-5µs (OS scheduler) | ~10ns (function call) |
| CPU parallelism | Yes (real parallel) | Not on their own (cooperative; parallel only when a scheduler resumes them on several threads) |
| Best for | CPU-bound parallel work | I/O-bound concurrent work |
Frequently Asked Questions
Why is coroutine promise_type so complex? Can't the library hide this?
The C++20 coroutine mechanism is deliberately low level: it provides the machinery (frame, handle, suspend/resume) but not the policy (when to resume, how to schedule). This lets library authors build any async model: generators, tasks, actors, fibers. In production, prefer a library type (asio::awaitable, coro::task from libcoro, cppcoro::task) over a hand-written promise_type.
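Do coroutines always allocate on the heap?
By default, the coroutine frame is heap-allocated. However, the standard allows the compiler to elide the allocation (the Heap Allocation eLision Optimization, HALO): if the coroutine's lifetime is fully contained in the caller and the frame size is known at the call site, the frame can be placed in the caller's stack frame. This is most likely for short-lived generator patterns, but it is an optimization that depends on the compiler and on inlining, not a guarantee.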
Can coroutines be used in embedded systems?
Yes — with custom allocators. The coroutine frame is allocated via operator new by default, but you can override this by providing operator new/delete in the promise_type to use a custom pool allocator. This makes coroutines viable on platforms with limited heap.
Key Takeaway
C++20 coroutines fundamentally change how you write async code. Instead of callback chains, futures/promises, or thread-per-connection models, you write linear code that reads synchronously but executes asynchronously. The key insight: co_await doesn't block a thread. It suspends the coroutine and returns control to the caller or resumer, typically a scheduler or event loop, which can then run other coroutines on the same thread. For I/O-bound systems, this lets one thread serve far more concurrent operations than a thread-per-request model can.
Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engineering.
