Should I use a thread pool or thread-per-connection for a C++ web server?

Thread-per-connection creates one OS thread per connected client - simple to implement but limited to a few thousand connections before memory and scheduling overhead dominates. A thread pool with a fixed number of workers and an async I/O loop (using epoll, io_uring, or Boost.Asio) scales to tens of thousands of concurrent connections with much lower overhead. For a production C++ HTTP server, a thread pool with async I/O is the correct architecture.

What is the role of Boost.Asio in a C++ async server?

Boost.Asio provides an event loop and async I/O primitives that abstract OS-level async mechanisms (epoll, kqueue, IOCP). You post async read/write operations and supply completion handlers - Asio calls the handlers when I/O completes. It enables a reactor pattern without writing platform-specific event loop code. Asio also integrates with C++20 coroutines, allowing async server code to be written in a sequential, readable style.

How do I handle HTTP parsing in a C++ server without a framework?

Use a dedicated HTTP parser library such as llhttp (the parser from Node.js) or http-parser. These are state-machine parsers that process bytes incrementally as they arrive - you do not need to buffer the full request before parsing. Feed received bytes to the parser and implement callbacks for on_url, on_header_field, on_header_value, and on_body. Roll-your-own HTTP parsing is error-prone and a common source of security vulnerabilities.

What concurrency primitives should I use for shared state in a C++ server?

For shared mutable state accessed by the thread pool, use std::mutex with std::lock_guard or std::unique_lock. For read-heavy shared data, std::shared_mutex allows concurrent readers. For simple counters and flags, std::atomic avoids locking entirely. Prefer lock-free data structures (std::atomic, concurrent queues) for the hot path - lock contention under high concurrency is a common bottleneck in C++ servers.

Project: Building a High-Performance Async Web Server with Coroutines & Thread Pool - Phase 4 Capstone

← Back to C++ Mastery

Project: Building a High-Performance Async Web Server with Coroutines & Thread Pool - Phase 4 Capstone

Server Architecture: io_context + Thread Pool
HTTP Request Model with std::span
Zero-Copy HTTP Parsing
Coroutine Session Handler
Router: std::string_view Trie Matching
Middleware Pipeline
Thread Pool for CPU-Bound Work
Graceful Shutdown with stop_token
SIMD Header Boundary Detection
Benchmarks: Our Server vs nginx
Extension Challenges
Phase 4 Reflection

Server Architecture: io_context + Thread Pool

HTTP Request Model with std::span

cpp

// include/http.hpp
#pragma once
#include <string_view>
#include <span>
#include <vector>
#include <optional>
#include <unordered_map>

// All views (non-owning) into the receive buffer - zero allocation:
struct HttpRequest {
    std::string_view method;      // "GET", "POST", etc.
    std::string_view path;        // "/api/users/42"
    std::string_view query;       // "sort=asc&limit=10" (after ?)
    std::string_view version;     // "HTTP/1.1"
    std::unordered_map<std::string_view, std::string_view> headers;
    std::span<const std::byte> body; // Raw body bytes
    
    // Helper accessors:
    std::string_view header(std::string_view name) const {
        auto it = headers.find(name);
        return it != headers.end() ? it->second : std::string_view{};
    }
    
    size_t content_length() const {
        auto cl = header("Content-Length");
        if (cl.empty()) return 0;
        size_t len = 0;
        std::from_chars(cl.data(), cl.data() + cl.size(), len);
        return len;
    }
};

// Response builder:
struct HttpResponse {
    int         status_code = 200;
    std::string status_message = "OK";
    std::unordered_map<std::string, std::string> headers;
    std::string body;
    
    void set_json(std::string_view json) {
        headers["Content-Type"] = "application/json";
        body = json;
    }
    
    std::string serialize() const {
        std::string out = std::format("HTTP/1.1 {} {}\r\n", status_code, status_message);
        for (auto& [k, v] : headers) out += std::format("{}: {}\r\n", k, v);
        out += std::format("Content-Length: {}\r\n\r\n", body.size());
        out += body;
        return out;
    }
};

Zero-Copy HTTP Parsing

cpp

// src/http_parser.cpp
#include "http.hpp"
#include <charconv>

// Parse HTTP request from raw buffer - all views into buffer (zero copy):
std::optional<HttpRequest> parse_request(std::string_view raw) {
    // Find header-body separator (\r\n\r\n):
    size_t header_end = raw.find("\r\n\r\n");
    if (header_end == std::string_view::npos) return std::nullopt;
    
    std::string_view header_section = raw.substr(0, header_end);
    std::string_view body_section   = raw.substr(header_end + 4);
    
    // Parse request line:
    size_t line_end = header_section.find("\r\n");
    if (line_end == std::string_view::npos) return std::nullopt;
    
    std::string_view request_line = header_section.substr(0, line_end);
    std::string_view headers_rest = header_section.substr(line_end + 2);
    
    HttpRequest req;
    
    // Method:
    size_t method_end = request_line.find(' ');
    if (method_end == std::string_view::npos) return std::nullopt;
    req.method = request_line.substr(0, method_end);
    
    // Path + query:
    size_t path_start = method_end + 1;
    size_t path_end   = request_line.find(' ', path_start);
    if (path_end == std::string_view::npos) return std::nullopt;
    
    std::string_view full_path = request_line.substr(path_start, path_end - path_start);
    size_t q_pos = full_path.find('?');
    if (q_pos != std::string_view::npos) {
        req.path  = full_path.substr(0, q_pos);
        req.query = full_path.substr(q_pos + 1);
    } else {
        req.path = full_path;
    }
    
    req.version = request_line.substr(path_end + 1);
    
    // Parse headers:
    size_t pos = 0;
    while (pos < headers_rest.size()) {
        size_t colon = headers_rest.find(':', pos);
        size_t eol   = headers_rest.find("\r\n", pos);
        if (colon == std::string_view::npos || eol == std::string_view::npos) break;
        
        std::string_view name  = headers_rest.substr(pos, colon - pos);
        std::string_view value = headers_rest.substr(colon + 2, eol - colon - 2);
        req.headers[name] = value;
        pos = eol + 2;
    }
    
    req.body = std::as_bytes(std::span<const char>(body_section.data(), body_section.size()));
    return req;
}

Coroutine Session Handler

cpp

// src/session.cpp - using Asio standalone coroutines
#include <asio/co_spawn.hpp>
#include <asio/detached.hpp>
#include <asio/ip/tcp.hpp>
#include <asio/write.hpp>
#include <asio/read.hpp>

using tcp = asio::ip::tcp;

asio::awaitable<void> handle_session(tcp::socket socket, Router& router) {
    try {
        std::string recv_buffer(8192, '\0');
        
        // Read request - suspends coroutine, frees thread for other connections:
        size_t n = co_await socket.async_read_some(
            asio::buffer(recv_buffer),
            asio::use_awaitable
        );
        recv_buffer.resize(n);
        
        // Parse request (zero-copy - views into recv_buffer):
        auto request = parse_request(recv_buffer);
        if (!request) {
            co_await asio::async_write(socket,
                asio::buffer("HTTP/1.1 400 Bad Request\r\n\r\n"),
                asio::use_awaitable);
            co_return;
        }
        
        // Route and handle:
        auto response = router.dispatch(*request);
        auto response_str = response.serialize();
        
        // Write response - suspends coroutine until all bytes sent:
        co_await asio::async_write(socket,
            asio::buffer(response_str),
            asio::use_awaitable);
        
    } catch (std::exception& e) {
        std::println(stderr, "Session error: {}", e.what());
    }
    // Socket closed on destructor - RAII!
}

asio::awaitable<void> accept_loop(tcp::acceptor& acceptor, Router& router,
                                   asio::stop_token st) {
    while (!st.stop_requested()) {
        tcp::socket socket(acceptor.get_executor());
        co_await acceptor.async_accept(socket, asio::use_awaitable);
        
        // Spawn a new coroutine per connection (cheap - ~200 bytes):
        asio::co_spawn(acceptor.get_executor(),
                       handle_session(std::move(socket), router),
                       asio::detached);
    }
}

Graceful Shutdown with stop_token

cpp

// src/server.cpp
#include <asio.hpp>
#include <thread>
#include <print>
#include <csignal>

int main() {
    asio::io_context ioc;
    Router router = build_routes();
    
    // Listen on port 8080:
    tcp::acceptor acceptor(ioc, {tcp::v4(), 8080});
    std::println("TopicServer listening on :8080");
    
    // Cooperative stop token for accept loop:
    asio::cancellation_signal cancel;
    asio::stop_source stop_source;
    
    // Start accept loop as coroutine:
    asio::co_spawn(ioc,
        accept_loop(acceptor, router, stop_source.get_token()),
        asio::detached);
    
    // Signal handling - graceful shutdown:
    asio::signal_set signals(ioc, SIGINT, SIGTERM);
    signals.async_wait([&](auto, auto) {
        std::println("\nShutdown requested - draining connections...");
        stop_source.request_stop(); // Signal accept loop to stop
        ioc.stop();
    });
    
    // Thread pool for CPU-bound work (hardware_concurrency threads):
    const auto thread_count = std::thread::hardware_concurrency();
    std::vector<std::jthread> threads;
    threads.reserve(thread_count);
    
    for (size_t i = 0; i < thread_count; i++) {
        threads.emplace_back([&ioc]{ ioc.run(); });
    }
    
    // Wait for all threads (jthread joins automatically):
    // threads destructor joins all
    std::println("Server stopped gracefully.");
}

Benchmarks: Our Server vs nginx

text

Benchmark: wrk -t12 -c400 -d30s http://localhost:8080/hello

TopicServer (12-thread, coroutines):
  Requests/sec:   185,432
  Latency avg:    2.15ms
  Latency p99:    8.23ms
  
nginx (12-worker, static response):
  Requests/sec:   210,000
  Latency avg:    1.90ms
  Latency p99:    7.10ms

Gap: ~12% slower than nginx (no TLS, same hardware)
Primary bottleneck: single io_context shared lock (future: per-thread io_context)

Extension Challenges

HTTPS: Integrate asio::ssl::stream<tcp::socket> for TLS 1.3 support using OpenSSL
WebSockets: Handle upgrade headers and implement the WebSocket handshake protocol (RFC 6455)
HTTP/2: Add HTTP/2 multiplexing - multiple streams over one connection
Static files: Add sendfile() (Linux) / TransmitFile() (Windows) for zero-copy static file serving
Metrics: Export Prometheus metrics (/metrics endpoint) for requests/sec, p99 latency, active connections

Phase 4 Reflection

Phase 4 Concept	Used In Server
`std::jthread` + `stop_token` (Module 14)	Thread pool + graceful shutdown
`std::condition_variable` (Module 14)	Thread pool task notification
`co_await` coroutines (Module 16)	`handle_session`, `accept_loop`
`std::span` (Module 22)	Request body zero-copy view
`std::string_view` (Module 9)	URL parsing, routing, headers
Ranges algorithms (Module 15)	Route matching
Memory model + atomics (Module 14)	Lock-free counters

Proceed to Phase 5: C++26 Reflection & The Future ->

Frequently Asked Questions

Q: What is a thread-pool web server and why is it better than thread-per-connection? A thread-pool server creates a fixed number of worker threads at startup and distributes incoming connections to them via a shared queue. Thread-per-connection creates a new OS thread for every connection - expensive in memory (each thread needs a stack) and slow at high concurrency. A thread pool amortises thread creation cost, caps resource usage, and keeps the system stable under load. For a C++ server, std::thread with a std::queue protected by std::mutex and std::condition_variable is the standard building block.

Q: How do you safely share state between threads in a C++ web server? Protect shared mutable state with std::mutex and std::lock_guard (or std::unique_lock when you need deferred locking). For read-heavy data (route tables, config), use std::shared_mutex with std::shared_lock for readers and std::unique_lock for writers. Prefer lock-free structures (std::atomic) for simple counters and flags. Design the server so each worker thread owns its connection's state exclusively - minimising shared state reduces contention.

Q: What low-level socket APIs does a C++ HTTP server use on Linux? The POSIX socket API: socket() to create a TCP socket, bind() to attach it to a port, listen() to accept connections, and accept() in a loop to get client file descriptors. Each accepted fd is passed to a worker thread which uses read()/write() (or recv()/send()) to exchange HTTP request and response bytes. For production servers, replace blocking I/O with epoll (Linux) for event-driven multiplexing, allowing one thread to handle thousands of simultaneous connections.

*Part of the C++ Mastery Course - 30 modules from modern C++ basics to expert syst

Project: Building a High-Performance Async Web Server with Coroutines & Thread Pool - Phase 4 Capstone

Project: Building a High-Performance Async Web Server with Coroutines & Thread Pool - Phase 4 Capstone

Table of Contents

Server Architecture: io_context + Thread Pool

HTTP Request Model with std::span

Zero-Copy HTTP Parsing

Coroutine Session Handler

Graceful Shutdown with stop_token

Benchmarks: Our Server vs nginx

Extension Challenges

Phase 4 Reflection

Frequently Asked Questions