C++ Multithreading: jthread, mutex, atomic, Memory Model & Lock-Free Programming (C++20)

Table of Contents
- The C++ Concurrency Landscape
- std::jthread: RAII Thread with Stop Support (C++20)
- std::stop_token: Cooperative Cancellation
- Mutex Types and When to Use Each
- scoped_lock: Multi-Mutex Deadlock Prevention
- condition_variable: Producer-Consumer Pattern
- The C++ Memory Model: happens-before and memory_order
- std::atomic: Lock-Free Operations
- Atomic Fences: standalone memory barriers
- std::latch, std::barrier & std::semaphore (C++20)
- Thread Pool Design with jthread and queue
- Frequently Asked Questions
- Key Takeaway
The C++ Concurrency Landscape
Modern C++ layers its concurrency tools: threads with managed lifetimes (std::jthread), mutual exclusion (std::mutex and friends), coordination primitives (std::condition_variable, std::latch, std::barrier, std::semaphore), and atomics governed by an explicit memory model for lock-free code. The sections below work through each layer in turn.
std::jthread: RAII Thread with Stop Support (C++20)
std::thread (C++11) calls std::terminate() if it is destroyed while still joinable — that is, if it goes out of scope without being joined or detached. std::jthread (C++20) fixes this by auto-joining in its destructor:
#include <thread>
#include <stop_token>
#include <print>
// Basic jthread:
{
std::jthread worker([]() {
std::println("Working...");
std::this_thread::sleep_for(std::chrono::milliseconds(100));
});
} // jthread destructor calls join() automatically — no crash!
// jthread with stop_token (cooperative cancellation):
std::jthread long_task([](std::stop_token st) {
while (!st.stop_requested()) {
process_next_item();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
std::println("Task stopped gracefully");
});
// Signal the thread to stop (non-blocking — just sets the flag):
long_task.request_stop();
// Destructor automatically joins — waits for thread to exit the loop
// stop_callback: register cleanup when stop is requested
std::stop_callback cleanup(long_task.get_stop_token(), []() {
cleanup_resources(); // Called when stop is requested
});
Mutex Types and When to Use Each
| Mutex Type | Description | Use When |
|---|---|---|
| std::mutex | Basic mutual exclusion | Default choice |
| std::recursive_mutex | Same thread can lock multiple times | Recursive functions that need the lock |
| std::timed_mutex | try_lock_for/try_lock_until with timeout | Avoid blocking forever with a deadline |
| std::shared_mutex | Multiple readers / one writer | Read-heavy data structures |
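The timed_mutex row can be sketched concretely; update_with_deadline() and the 50 ms deadline are illustrative assumptions, not a fixed API:

```cpp
#include <mutex>
#include <chrono>

std::timed_mutex resource_mtx;

bool update_with_deadline() {
    using namespace std::chrono_literals;
    // Give up after 50 ms instead of blocking forever:
    if (!resource_mtx.try_lock_for(50ms)) {
        return false; // Deadline hit: caller can retry, log, or degrade
    }
    // adopt_lock: the guard takes over the mutex we already hold
    std::lock_guard lock(resource_mtx, std::adopt_lock);
    // ... modify the shared resource ...
    return true;
}
```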
#include <shared_mutex>
#include <map>
#include <string>
class Config {
mutable std::shared_mutex mtx_;
std::map<std::string, std::string> data_;
public:
// Multiple readers can run concurrently (shared_lock):
std::string get(const std::string& key) const {
std::shared_lock lock(mtx_); // Multiple threads can hold this at once
auto it = data_.find(key);
return it != data_.end() ? it->second : "";
}
// Single writer blocks all readers (unique_lock):
void set(const std::string& key, const std::string& value) {
std::unique_lock lock(mtx_); // Exclusive write access
data_[key] = value;
}
};
scoped_lock: Multi-Mutex Deadlock Prevention
Acquiring multiple mutexes in different orders across threads is a classic deadlock scenario. std::scoped_lock (C++17) acquires all mutexes atomically using a deadlock-avoidance algorithm:
#include <mutex>
std::mutex mtx_a, mtx_b;
// DEADLOCK — Thread 1 gets mtx_a, thread 2 gets mtx_b, both wait forever:
// Thread 1: lock(mtx_a) then lock(mtx_b)
// Thread 2: lock(mtx_b) then lock(mtx_a)
// SAFE — scoped_lock acquires both atomically (uses std::lock internally):
void transfer(Account& src, Account& dst, int amount) {
std::scoped_lock lock(src.mtx, dst.mtx); // No deadlock possible!
src.balance -= amount;
dst.balance += amount;
} // Both mutexes released by the destructor
// Also works with single mutex (replaces lock_guard):
std::scoped_lock single_lock(mtx_a);
condition_variable: Producer-Consumer Pattern
#include <mutex>
#include <condition_variable>
#include <queue>
#include <optional>
template<typename T>
class BoundedQueue {
std::queue<T> queue_;
std::mutex mtx_;
std::condition_variable non_empty_;
std::condition_variable non_full_;
const size_t max_size_;
bool closed_ = false;
public:
explicit BoundedQueue(size_t max) : max_size_(max) {}
// Producer: enqueue (blocks if full)
void push(T item) {
std::unique_lock lock(mtx_);
non_full_.wait(lock, [&]{ return queue_.size() < max_size_ || closed_; });
if (closed_) return; // Queue closed — item is dropped
queue_.push(std::move(item));
non_empty_.notify_one(); // Wake a waiting consumer
}
// Consumer: dequeue (blocks if empty, returns nullopt if closed+empty)
std::optional<T> pop() {
std::unique_lock lock(mtx_);
non_empty_.wait(lock, [&]{ return !queue_.empty() || closed_; });
if (queue_.empty()) return std::nullopt;
T item = std::move(queue_.front());
queue_.pop();
non_full_.notify_one(); // Wake a waiting producer
return item;
}
void close() {
std::scoped_lock lock(mtx_);
closed_ = true;
non_empty_.notify_all(); // Wake all blocked consumers
non_full_.notify_all(); // Wake all blocked producers
}
};
The C++ Memory Model: happens-before and memory_order
The C++ memory model defines when writes in one thread become visible to reads in another. Without it, the compiler and CPU are free to reorder operations:
#include <atomic>
// BROKEN: No synchronization. Compiler/CPU may reorder:
bool ready = false;
int data = 0;
// Thread 1:
data = 42; // May be reordered AFTER ready = true!
ready = true;
// Thread 2:
while (!ready) {} // Busy-wait
int x = data; // May read 0 — data write not yet visible!
// FIXED with atomics and release/acquire ordering:
std::atomic<bool> ready_sc = false;
std::atomic<int> data_sc = 0;
// Thread 1:
data_sc.store(42, std::memory_order_relaxed); // Ordered before the release below
ready_sc.store(true, std::memory_order_release); // Publishes all earlier writes
// Thread 2:
while (!ready_sc.load(std::memory_order_acquire)) {} // Synchronizes-with the release
int x = data_sc.load(std::memory_order_relaxed); // Guaranteed to see 42
Memory orders (ordered from strongest to weakest):
| memory_order | Guarantee | Cost |
|---|---|---|
| seq_cst | Total global ordering of all atomic ops | Highest (full barrier) |
| acq_rel | acquire + release on one read-modify-write | Medium |
| release | All prior writes visible to a matching acquire | Low |
| acquire | Sees all writes before the matching release | Low |
| relaxed | Only atomicity — no ordering guarantees | Lowest |
std::atomic: Lock-Free Operations
#include <atomic>
// Integral atomics: all arithmetic operations are atomic
std::atomic<int> counter{0};
std::atomic<size_t> bytes_processed{0};
counter.fetch_add(1); // Atomic increment
counter.fetch_sub(1); // Atomic decrement
int old = counter.exchange(100); // Swap, return old value
// Compare-and-swap (CAS): the foundation of lock-free algorithms
int expected = 5;
bool swapped = counter.compare_exchange_strong(expected, 10);
// If counter == expected (5) → set to 10, return true
// If counter != expected → set expected = counter, return false
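The CAS loop pattern this describes, sketched as a hypothetical atomic_max helper built on compare_exchange_weak:

```cpp
#include <atomic>

// Atomically set m = max(m, value) without a lock:
void atomic_max(std::atomic<int>& m, int value) {
    int current = m.load(std::memory_order_relaxed);
    while (current < value &&
           !m.compare_exchange_weak(current, value,
                                    std::memory_order_relaxed)) {
        // CAS failed: 'current' was updated to the latest value, so
        // the loop re-tests the condition and retries.
        // (weak may also fail spuriously, which the loop absorbs)
    }
}
```

compare_exchange_weak is preferred inside loops: it may fail spuriously on some architectures but compiles to cheaper code than the strong variant.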
// Atomic flag — simplest lock-free type (guaranteed lock-free):
std::atomic_flag lock_flag; // Value-initialized clear since C++20 (ATOMIC_FLAG_INIT is deprecated)
// Spinlock using atomic_flag:
class SpinLock {
std::atomic_flag flag_; // Clear by default since C++20
public:
void lock() {
while (flag_.test_and_set(std::memory_order_acquire));
// Busy-wait until flag was clear (we set it)
}
void unlock() {
flag_.clear(std::memory_order_release);
}
};
// RAII spinlock usage:
SpinLock sl;
{
std::lock_guard<SpinLock> guard(sl);
// Critical section — lock-free spin, no syscall
}
std::latch, std::barrier & std::semaphore (C++20)
#include <latch>
#include <barrier>
#include <semaphore>
// std::latch: one-shot countdown (cannot be reset)
std::latch startup_latch(4); // Wait for 4 threads to be ready
// Each thread calls:
startup_latch.count_down(); // Signal: this thread is ready
startup_latch.wait(); // Block until all 4 have counted down
// std::barrier: reusable rendezvous point (can be reset)
std::barrier sync_point(4, []() noexcept {
// Completion function: runs once when all 4 threads arrive
merge_partial_results();
});
// In each worker:
for (auto& chunk : my_chunks) {
process(chunk);
sync_point.arrive_and_wait(); // Barrier: all 4 must reach before any continues
}
// std::counting_semaphore: limit concurrent access
std::counting_semaphore<8> db_connections(8); // Max 8 concurrent DB connections
void access_database() {
db_connections.acquire(); // Block if 8 connections already active
// ... use database ...
db_connections.release(); // Return the connection slot
}
// std::binary_semaphore: signal-based synchronization (value 0 or 1)
std::binary_semaphore signal(0);
// Producer signals:
signal.release(); // Signal consumer
// Consumer waits:
signal.acquire(); // Block until signaled
Thread Pool Design with jthread and queue
#include <thread>
#include <queue>
#include <functional>
#include <vector>
class ThreadPool {
std::vector<std::jthread> workers_;
BoundedQueue<std::function<void()>> tasks_{256};
public:
explicit ThreadPool(size_t n = std::thread::hardware_concurrency()) {
for (size_t i = 0; i < n; i++) {
workers_.emplace_back([this](std::stop_token st) {
while (!st.stop_requested()) {
auto task = tasks_.pop();
if (!task) break; // Queue closed
(*task)();
}
});
}
}
template<typename F>
void submit(F&& task) {
tasks_.push(std::function<void()>(std::forward<F>(task)));
}
~ThreadPool() {
tasks_.close(); // Signal queue closed — workers exit their loops
// jthread destructors join all workers automatically
}
};
// Usage:
ThreadPool pool(8);
pool.submit([]{ process_frame(1); });
pool.submit([]{ process_frame(2); });
// Pool destructor waits for all queued work to complete
Frequently Asked Questions
When should I use std::async vs std::jthread?
Use std::async for simple parallel computations that return a value — it returns a std::future<T> you can get() later. Use std::jthread for long-running services, event loops, or any thread that runs until explicitly stopped. For structured concurrency (C++26 planned), prefer executors and coroutines over raw threads.
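A minimal sketch of the std::async side (heavy_compute is a hypothetical stand-in for real work):

```cpp
#include <future>

int heavy_compute(int n) { return n * n; } // Stand-in for real work

int demo_async() {
    // launch::async forces a real thread; the future carries the result:
    std::future<int> result = std::async(std::launch::async, heavy_compute, 21);
    // ... this thread keeps doing other work ...
    return result.get(); // Blocks until heavy_compute finishes, then returns 441
}
```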
What is a data race and how is it different from a race condition?
A data race is when two threads concurrently access the same memory location, at least one of them writing, with no synchronization — undefined behavior in C++. A race condition is a logical bug where the program outcome depends on execution order; even with correct primitives, wrong locking granularity can produce wrong results. ThreadSanitizer (TSan) detects data races at runtime; AddressSanitizer (ASan) does not.
Is std::atomic always lock-free?
No. Check with is_lock_free() at runtime, or is_always_lock_free at compile time. Integral and pointer types are lock-free on all mainstream platforms. For types wider than the hardware's native atomic width (typically 8 or 16 bytes), the implementation falls back to an internal mutex, defeating the purpose of atomic.
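A compile-time check makes this concrete; this is a sketch, and the static_asserts assume a mainstream 64-bit target:

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Lock-free on mainstream 64-bit targets (verified at compile time):
static_assert(std::atomic<int>::is_always_lock_free);
static_assert(std::atomic<std::intptr_t>::is_always_lock_free);

// A 24-byte struct is wider than any single atomic instruction:
struct Snapshot { std::int64_t a, b, c; };
static_assert(std::is_trivially_copyable_v<Snapshot>); // Required by std::atomic

// std::atomic<Snapshot> still compiles, but is_always_lock_free is false
// on mainstream targets: every load and store goes through a hidden lock.
std::atomic<Snapshot> snap{};
```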
Key Takeaway
C++ concurrency in 2026 means std::jthread + std::stop_token for thread lifecycle, scoped_lock for multi-mutex deadlock prevention, shared_mutex for read-heavy workloads, and std::atomic with correct memory orders for lock-free state. The C++ memory model is not optional — incorrect ordering produces subtle, hardware-specific bugs that are nearly impossible to reproduce in debug builds.
Read next: C++20 Coroutines: Asynchronous Flow Control →
Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engineering.
