Project: Building a High-Performance Async Web Server with Coroutines & Thread Pool — Phase 4 Capstone

Project: Building a High-Performance Async Web Server with Coroutines & Thread Pool — Phase 4 Capstone
Table of Contents
- Server Architecture: io_context + Thread Pool
- HTTP Request Model with std::span
- Zero-Copy HTTP Parsing
- Coroutine Session Handler
- Router: std::string_view Trie Matching
- Middleware Pipeline
- Thread Pool for CPU-Bound Work
- Graceful Shutdown with stop_token
- SIMD Header Boundary Detection
- Benchmarks: Our Server vs nginx
- Extension Challenges
- Phase 4 Reflection
Server Architecture: io_context + Thread Pool
HTTP Request Model with std::span
// include/http.hpp
#pragma once
#include <string_view>
#include <span>
#include <vector>
#include <optional>
#include <unordered_map>
// All views (non-owning) into the receive buffer — zero allocation:
struct HttpRequest {
std::string_view method; // "GET", "POST", etc.
std::string_view path; // "/api/users/42"
std::string_view query; // "sort=asc&limit=10" (after ?)
std::string_view version; // "HTTP/1.1"
std::unordered_map<std::string_view, std::string_view> headers;
std::span<const std::byte> body; // Raw body bytes
// Helper accessors:
std::string_view header(std::string_view name) const {
auto it = headers.find(name);
return it != headers.end() ? it->second : std::string_view{};
}
size_t content_length() const {
auto cl = header("Content-Length");
if (cl.empty()) return 0;
size_t len = 0;
std::from_chars(cl.data(), cl.data() + cl.size(), len);
return len;
}
};
// Response builder:
struct HttpResponse {
int status_code = 200;
std::string status_message = "OK";
std::unordered_map<std::string, std::string> headers;
std::string body;
void set_json(std::string_view json) {
headers["Content-Type"] = "application/json";
body = json;
}
std::string serialize() const {
std::string out = std::format("HTTP/1.1 {} {}\r\n", status_code, status_message);
for (auto& [k, v] : headers) out += std::format("{}: {}\r\n", k, v);
out += std::format("Content-Length: {}\r\n\r\n", body.size());
out += body;
return out;
}
};Zero-Copy HTTP Parsing
// src/http_parser.cpp
#include "http.hpp"
#include <charconv>
// Parse HTTP request from raw buffer — all views into buffer (zero copy):
std::optional<HttpRequest> parse_request(std::string_view raw) {
// Find header-body separator (\r\n\r\n):
size_t header_end = raw.find("\r\n\r\n");
if (header_end == std::string_view::npos) return std::nullopt;
std::string_view header_section = raw.substr(0, header_end);
std::string_view body_section = raw.substr(header_end + 4);
// Parse request line:
size_t line_end = header_section.find("\r\n");
if (line_end == std::string_view::npos) return std::nullopt;
std::string_view request_line = header_section.substr(0, line_end);
std::string_view headers_rest = header_section.substr(line_end + 2);
HttpRequest req;
// Method:
size_t method_end = request_line.find(' ');
if (method_end == std::string_view::npos) return std::nullopt;
req.method = request_line.substr(0, method_end);
// Path + query:
size_t path_start = method_end + 1;
size_t path_end = request_line.find(' ', path_start);
if (path_end == std::string_view::npos) return std::nullopt;
std::string_view full_path = request_line.substr(path_start, path_end - path_start);
size_t q_pos = full_path.find('?');
if (q_pos != std::string_view::npos) {
req.path = full_path.substr(0, q_pos);
req.query = full_path.substr(q_pos + 1);
} else {
req.path = full_path;
}
req.version = request_line.substr(path_end + 1);
// Parse headers:
size_t pos = 0;
while (pos < headers_rest.size()) {
size_t colon = headers_rest.find(':', pos);
size_t eol = headers_rest.find("\r\n", pos);
if (colon == std::string_view::npos || eol == std::string_view::npos) break;
std::string_view name = headers_rest.substr(pos, colon - pos);
std::string_view value = headers_rest.substr(colon + 2, eol - colon - 2);
req.headers[name] = value;
pos = eol + 2;
}
req.body = std::as_bytes(std::span<const char>(body_section.data(), body_section.size()));
return req;
}Coroutine Session Handler
// src/session.cpp — using Asio standalone coroutines
#include <asio/co_spawn.hpp>
#include <asio/detached.hpp>
#include <asio/ip/tcp.hpp>
#include <asio/write.hpp>
#include <asio/read.hpp>
using tcp = asio::ip::tcp;
asio::awaitable<void> handle_session(tcp::socket socket, Router& router) {
try {
std::string recv_buffer(8192, '\0');
// Read request — suspends coroutine, frees thread for other connections:
size_t n = co_await socket.async_read_some(
asio::buffer(recv_buffer),
asio::use_awaitable
);
recv_buffer.resize(n);
// Parse request (zero-copy — views into recv_buffer):
auto request = parse_request(recv_buffer);
if (!request) {
co_await asio::async_write(socket,
asio::buffer("HTTP/1.1 400 Bad Request\r\n\r\n"),
asio::use_awaitable);
co_return;
}
// Route and handle:
auto response = router.dispatch(*request);
auto response_str = response.serialize();
// Write response — suspends coroutine until all bytes sent:
co_await asio::async_write(socket,
asio::buffer(response_str),
asio::use_awaitable);
} catch (std::exception& e) {
std::println(stderr, "Session error: {}", e.what());
}
// Socket closed on destructor — RAII!
}
asio::awaitable<void> accept_loop(tcp::acceptor& acceptor, Router& router,
asio::stop_token st) {
while (!st.stop_requested()) {
tcp::socket socket(acceptor.get_executor());
co_await acceptor.async_accept(socket, asio::use_awaitable);
// Spawn a new coroutine per connection (cheap — ~200 bytes):
asio::co_spawn(acceptor.get_executor(),
handle_session(std::move(socket), router),
asio::detached);
}
}Graceful Shutdown with stop_token
// src/server.cpp
#include <asio.hpp>
#include <thread>
#include <print>
#include <csignal>
int main() {
asio::io_context ioc;
Router router = build_routes();
// Listen on port 8080:
tcp::acceptor acceptor(ioc, {tcp::v4(), 8080});
std::println("TopicServer listening on :8080");
// Cooperative stop token for accept loop:
asio::cancellation_signal cancel;
asio::stop_source stop_source;
// Start accept loop as coroutine:
asio::co_spawn(ioc,
accept_loop(acceptor, router, stop_source.get_token()),
asio::detached);
// Signal handling — graceful shutdown:
asio::signal_set signals(ioc, SIGINT, SIGTERM);
signals.async_wait([&](auto, auto) {
std::println("\nShutdown requested — draining connections...");
stop_source.request_stop(); // Signal accept loop to stop
ioc.stop();
});
// Thread pool for CPU-bound work (hardware_concurrency threads):
const auto thread_count = std::thread::hardware_concurrency();
std::vector<std::jthread> threads;
threads.reserve(thread_count);
for (size_t i = 0; i < thread_count; i++) {
threads.emplace_back([&ioc]{ ioc.run(); });
}
// Wait for all threads (jthread joins automatically):
// threads destructor joins all
std::println("Server stopped gracefully.");
}Benchmarks: Our Server vs nginx
Benchmark: wrk -t12 -c400 -d30s http://localhost:8080/hello
TopicServer (12-thread, coroutines):
Requests/sec: 185,432
Latency avg: 2.15ms
Latency p99: 8.23ms
nginx (12-worker, static response):
Requests/sec: 210,000
Latency avg: 1.90ms
Latency p99: 7.10ms
Gap: ~12% slower than nginx (no TLS, same hardware)
Primary bottleneck: single io_context shared lock (future: per-thread io_context)Extension Challenges
- HTTPS: Integrate
asio::ssl::stream<tcp::socket>for TLS 1.3 support using OpenSSL - WebSockets: Handle upgrade headers and implement the WebSocket handshake protocol (RFC 6455)
- HTTP/2: Add HTTP/2 multiplexing — multiple streams over one connection
- Static files: Add
sendfile()(Linux) /TransmitFile()(Windows) for zero-copy static file serving - Metrics: Export Prometheus metrics (
/metricsendpoint) for requests/sec, p99 latency, active connections
Phase 4 Reflection
| Phase 4 Concept | Used In Server |
|---|---|
std::jthread + stop_token (Module 14) | Thread pool + graceful shutdown |
std::condition_variable (Module 14) | Thread pool task notification |
co_await coroutines (Module 16) | handle_session, accept_loop |
std::span (Module 22) | Request body zero-copy view |
std::string_view (Module 9) | URL parsing, routing, headers |
| Ranges algorithms (Module 15) | Route matching |
| Memory model + atomics (Module 14) | Lock-free counters |
Proceed to Phase 5: C++26 Reflection & The Future →
Frequently Asked Questions
Q: What is a thread-pool web server and why is it better than thread-per-connection?
A thread-pool server creates a fixed number of worker threads at startup and distributes incoming connections to them via a shared queue. Thread-per-connection creates a new OS thread for every connection — expensive in memory (each thread needs a stack) and slow at high concurrency. A thread pool amortises thread creation cost, caps resource usage, and keeps the system stable under load. For a C++ server, std::thread with a std::queue protected by std::mutex and std::condition_variable is the standard building block.
Q: How do you safely share state between threads in a C++ web server?
Protect shared mutable state with std::mutex and std::lock_guard (or std::unique_lock when you need deferred locking). For read-heavy data (route tables, config), use std::shared_mutex with std::shared_lock for readers and std::unique_lock for writers. Prefer lock-free structures (std::atomic) for simple counters and flags. Design the server so each worker thread owns its connection's state exclusively — minimising shared state reduces contention.
Q: What low-level socket APIs does a C++ HTTP server use on Linux?
The POSIX socket API: socket() to create a TCP socket, bind() to attach it to a port, listen() to accept connections, and accept() in a loop to get client file descriptors. Each accepted fd is passed to a worker thread which uses read()/write() (or recv()/send()) to exchange HTTP request and response bytes. For production servers, replace blocking I/O with epoll (Linux) for event-driven multiplexing, allowing one thread to handle thousands of simultaneous connections.
*Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert syst
