C++ Multithreading: jthread, mutex, atomic, Memory Model & Lock-Free Programming (C++20)

Table of Contents
- The C++ Concurrency Landscape
- std::jthread: RAII Thread with Stop Support (C++20)
- std::stop_token: Cooperative Cancellation
- Mutex Types and When to Use Each
- scoped_lock: Multi-Mutex Deadlock Prevention
- condition_variable: Producer-Consumer Pattern
- The C++ Memory Model: happens-before and memory_order
- std::atomic: Lock-Free Operations
- Atomic Fences: standalone memory barriers
- std::latch, std::barrier & std::semaphore (C++20)
- Thread Pool Design with jthread and queue
- Frequently Asked Questions
- Key Takeaway
The C++ Concurrency Landscape
Modern C++ layers its concurrency tools: threads with managed lifetimes (std::jthread), mutual exclusion (std::mutex and friends), coordination primitives (std::condition_variable, std::latch, std::barrier, std::semaphore), and atomics governed by an explicit memory model for lock-free code. The sections below work through each layer in turn.
std::jthread: RAII Thread with Stop Support (C++20)
std::thread (C++11) calls std::terminate() if it is destroyed while still joinable — that is, if it goes out of scope without being joined or detached. std::jthread (C++20) fixes this by auto-joining in its destructor:
#include <thread>
#include <stop_token>
#include <print>
// Basic jthread:
{
std::jthread worker([]() {
std::println("Working...");
std::this_thread::sleep_for(std::chrono::milliseconds(100));
});
} // jthread destructor calls join() automatically — no crash!
// jthread with stop_token (cooperative cancellation):
std::jthread long_task([](std::stop_token st) {
while (!st.stop_requested()) {
process_next_item();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
std::println("Task stopped gracefully");
});
// Signal the thread to stop (non-blocking — just sets the flag):
long_task.request_stop();
// Destructor automatically joins — waits for thread to exit the loop
// stop_callback: register cleanup when stop is requested
std::stop_callback cleanup(long_task.get_stop_token(), []() {
cleanup_resources(); // Called when stop is requested
});
Mutex Types and When to Use Each
| Mutex Type | Description | Use When |
|---|---|---|
| std::mutex | Basic mutual exclusion | Default choice |
| std::recursive_mutex | Same thread can lock multiple times | Recursive functions that need the lock |
| std::timed_mutex | try_lock_for/try_lock_until with timeout | Avoid blocking forever with a deadline |
| std::shared_mutex | Multiple readers / one writer | Read-heavy data structures |
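The timed_mutex row can be sketched concretely; update_with_deadline() and the 50 ms deadline are illustrative assumptions, not a fixed API:

```cpp
#include <mutex>
#include <chrono>

std::timed_mutex resource_mtx;

bool update_with_deadline() {
    using namespace std::chrono_literals;
    // Give up after 50 ms instead of blocking forever:
    if (!resource_mtx.try_lock_for(50ms)) {
        return false; // Deadline hit: caller can retry, log, or degrade
    }
    // adopt_lock: the guard takes over the mutex we already hold
    std::lock_guard lock(resource_mtx, std::adopt_lock);
    // ... modify the shared resource ...
    return true;
}
```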
#include <shared_mutex>
#include <map>
#include <string>
class Config {
mutable std::shared_mutex mtx_;
std::map<std::string, std::string> data_;
public:
// Multiple readers can run concurrently (shared_lock):
std::string get(const std::string& key) const {
std::shared_lock lock(mtx_); // Multiple threads can hold this at once
auto it = data_.find(key);
return it != data_.end() ? it->second : "";
}
// Single writer blocks all readers (unique_lock):
void set(const std::string& key, const std::string& value) {
std::unique_lock lock(mtx_); // Exclusive write access
data_[key] = value;
}
};
scoped_lock: Multi-Mutex Deadlock Prevention
Acquiring multiple mutexes in different orders across threads is a classic deadlock scenario. std::scoped_lock (C++17) acquires all mutexes atomically using a deadlock-avoidance algorithm:
#include <mutex>
std::mutex mtx_a, mtx_b;
// DEADLOCK — Thread 1 gets mtx_a, thread 2 gets mtx_b, both wait forever:
// Thread 1: lock(mtx_a) then lock(mtx_b)
// Thread 2: lock(mtx_b) then lock(mtx_a)
// SAFE — scoped_lock acquires both atomically (uses std::lock internally):
void transfer(Account& src, Account& dst, int amount) {
std::scoped_lock lock(src.mtx, dst.mtx); // No deadlock possible!
src.balance -= amount;
dst.balance += amount;
} // Both mutexes released by the destructor
// Also works with single mutex (replaces lock_guard):
std::scoped_lock single_lock(mtx_a);
condition_variable: Producer-Consumer Pattern
#include <mutex>
#include <condition_variable>
#include <queue>
#include <optional>
template<typename T>
class BoundedQueue {
std::queue<T> queue_;
std::mutex mtx_;
std::condition_variable non_empty_;
std::condition_variable non_full_;
const size_t max_size_;
bool closed_ = false;
public:
explicit BoundedQueue(size_t max) : max_size_(max) {}
// Producer: enqueue (blocks if full)
void push(T item) {
std::unique_lock lock(mtx_);
non_full_.wait(lock, [&]{ return queue_.size() < max_size_ || closed_; });
if (closed_) return; // Queue closed — item is dropped
queue_.push(std::move(item));
non_empty_.notify_one(); // Wake a waiting consumer
}
// Consumer: dequeue (blocks if empty, returns nullopt if closed+empty)
std::optional<T> pop() {
std::unique_lock lock(mtx_);
non_empty_.wait(lock, [&]{ return !queue_.empty() || closed_; });
if (queue_.empty()) return std::nullopt;
T item = std::move(queue_.front());
queue_.pop();
non_full_.notify_one(); // Wake a waiting producer
return item;
}
void close() {
std::scoped_lock lock(mtx_);
closed_ = true;
non_empty_.notify_all(); // Wake all blocked consumers
non_full_.notify_all(); // Wake all blocked producers
}
};
The C++ Memory Model: happens-before and memory_order
The C++ memory model defines when writes in one thread become visible to reads in another. Without it, the compiler and CPU are free to reorder operations:
#include <atomic>
// BROKEN: No synchronization. Compiler/CPU may reorder:
bool ready = false;
int data = 0;
// Thread 1:
data = 42; // May be reordered AFTER ready = true!
ready = true;
// Thread 2:
while (!ready) {} // Busy-wait
int x = data; // May read 0 — data write not yet visible!
// FIXED with atomics and release/acquire ordering:
std::atomic<bool> ready_sc = false;
std::atomic<int> data_sc = 0;
// Thread 1:
data_sc.store(42, std::memory_order_relaxed); // Ordered before the release below
ready_sc.store(true, std::memory_order_release); // Publishes all earlier writes
// Thread 2:
while (!ready_sc.load(std::memory_order_acquire)) {} // Synchronizes-with the release
int x = data_sc.load(std::memory_order_relaxed); // Guaranteed to see 42
Memory orders (ordered from strongest to weakest):
| memory_order | Guarantee | Cost |
|---|---|---|
| seq_cst | Total global ordering of all atomic ops | Highest (full barrier) |
| acq_rel | acquire + release on one read-modify-write | Medium |
| release | All prior writes visible to a matching acquire | Low |
| acquire | Sees all writes before the matching release | Low |
| relaxed | Only atomicity — no ordering guarantees | Lowest |
std::atomic: Lock-Free Operations
#include <atomic>
// Integral atomics: all arithmetic operations are atomic
std::atomic<int> counter{0};
std::atomic<size_t> bytes_processed{0};
counter.fetch_add(1); // Atomic increment
counter.fetch_sub(1); // Atomic decrement
int old = counter.exchange(100); // Swap, return old value
// Compare-and-swap (CAS): the foundation of lock-free algorithms
int expected = 5;
bool swapped = counter.compare_exchange_strong(expected, 10);
// If counter == expected (5) → set to 10, return true
// If counter != expected → set expected = counter, return false
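The CAS loop pattern this describes, sketched as a hypothetical atomic_max helper built on compare_exchange_weak:

```cpp
#include <atomic>

// Atomically set m = max(m, value) without a lock:
void atomic_max(std::atomic<int>& m, int value) {
    int current = m.load(std::memory_order_relaxed);
    while (current < value &&
           !m.compare_exchange_weak(current, value,
                                    std::memory_order_relaxed)) {
        // CAS failed: 'current' was updated to the latest value, so
        // the loop re-tests the condition and retries.
        // (weak may also fail spuriously, which the loop absorbs)
    }
}
```

compare_exchange_weak is preferred inside loops: it may fail spuriously on some architectures but compiles to cheaper code than the strong variant.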
// Atomic flag — simplest lock-free type (guaranteed lock-free):
std::atomic_flag lock_flag; // Value-initialized clear since C++20 (ATOMIC_FLAG_INIT is deprecated)
// Spinlock using atomic_flag:
class SpinLock {
std::atomic_flag flag_; // Clear by default since C++20
public:
void lock() {
while (flag_.test_and_set(std::memory_order_acquire));
// Busy-wait until flag was clear (we set it)
}
void unlock() {
flag_.clear(std::memory_order_release);
}
};
// RAII spinlock usage:
SpinLock sl;
{
std::lock_guard<SpinLock> guard(sl);
// Critical section — lock-free spin, no syscall
}
std::latch, std::barrier & std::semaphore (C++20)
#include <latch>
#include <barrier>
#include <semaphore>
// std::latch: one-shot countdown (cannot be reset)
std::latch startup_latch(4); // Wait for 4 threads to be ready
// Each thread calls:
startup_latch.count_down(); // Signal: this thread is ready
startup_latch.wait(); // Block until all 4 have counted down
// std::barrier: reusable rendezvous point (can be reset)
std::barrier sync_point(4, []() noexcept {
// Completion function: runs once when all 4 threads arrive
merge_partial_results();
});
// In each worker:
for (auto& chunk : my_chunks) {
process(chunk);
sync_point.arrive_and_wait(); // Barrier: all 4 must reach before any continues
}
// std::counting_semaphore: limit concurrent access
std::counting_semaphore<8> db_connections(8); // Max 8 concurrent DB connections
void access_database() {
db_connections.acquire(); // Block if 8 connections already active
// ... use database ...
db_connections.release(); // Return the connection slot
}
// std::binary_semaphore: signal-based synchronization (value 0 or 1)
std::binary_semaphore signal(0);
// Producer signals:
signal.release(); // Signal consumer
// Consumer waits:
signal.acquire(); // Block until signaled
Thread Pool Design with jthread and queue
#include <thread>
#include <queue>
#include <functional>
#include <vector>
class ThreadPool {
std::vector<std::jthread> workers_;
BoundedQueue<std::function<void()>> tasks_{256};
public:
explicit ThreadPool(size_t n = std::thread::hardware_concurrency()) {
for (size_t i = 0; i < n; i++) {
workers_.emplace_back([this](std::stop_token st) {
while (!st.stop_requested()) {
auto task = tasks_.pop();
if (!task) break; // Queue closed
(*task)();
}
});
}
}
template<typename F>
void submit(F&& task) {
tasks_.push(std::function<void()>(std::forward<F>(task)));
}
~ThreadPool() {
tasks_.close(); // Signal queue closed — workers exit their loops
// jthread destructors join all workers automatically
}
};
// Usage:
ThreadPool pool(8);
pool.submit([]{ process_frame(1); });
pool.submit([]{ process_frame(2); });
// Pool destructor waits for all queued work to complete
Frequently Asked Questions
When should I use std::async vs std::jthread?
Use std::async for simple parallel computations that return a value — it returns a std::future<T> you can get() later. Use std::jthread for long-running services, event loops, or any thread that runs until explicitly stopped. For structured concurrency (C++26 planned), prefer executors and coroutines over raw threads.
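A minimal sketch of the std::async side (heavy_compute is a hypothetical stand-in for real work):

```cpp
#include <future>

int heavy_compute(int n) { return n * n; } // Stand-in for real work

int demo_async() {
    // launch::async forces a real thread; the future carries the result:
    std::future<int> result = std::async(std::launch::async, heavy_compute, 21);
    // ... this thread keeps doing other work ...
    return result.get(); // Blocks until heavy_compute finishes, then returns 441
}
```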
What is a data race and how is it different from a race condition?
A data race is when two threads concurrently access the same memory location, at least one of them writing, with no synchronization — undefined behavior in C++. A race condition is a logical bug where the program outcome depends on execution order; even with correct primitives, wrong locking granularity can produce wrong results. ThreadSanitizer (TSan) detects data races at runtime; AddressSanitizer (ASan) does not.
Is std::atomic always lock-free?
No. Check with is_lock_free() at runtime, or is_always_lock_free at compile time. Integral and pointer types are lock-free on all mainstream platforms. For types wider than the hardware's native atomic width (typically 8 or 16 bytes), the implementation falls back to an internal mutex, defeating the purpose of atomic.
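A compile-time check makes this concrete; this is a sketch, and the static_asserts assume a mainstream 64-bit target:

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Lock-free on mainstream 64-bit targets (verified at compile time):
static_assert(std::atomic<int>::is_always_lock_free);
static_assert(std::atomic<std::intptr_t>::is_always_lock_free);

// A 24-byte struct is wider than any single atomic instruction:
struct Snapshot { std::int64_t a, b, c; };
static_assert(std::is_trivially_copyable_v<Snapshot>); // Required by std::atomic

// std::atomic<Snapshot> still compiles, but is_always_lock_free is false
// on mainstream targets: every load and store goes through a hidden lock.
std::atomic<Snapshot> snap{};
```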
Key Takeaway
C++ concurrency in 2026 means std::jthread + std::stop_token for thread lifecycle, scoped_lock for multi-mutex deadlock prevention, shared_mutex for read-heavy workloads, and std::atomic with correct memory orders for lock-free state. The C++ memory model is not optional — incorrect ordering produces subtle, hardware-specific bugs that are nearly impossible to reproduce in debug builds.
Read next: C++20 Coroutines: Asynchronous Flow Control →
Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engineering.
