C++Projects

Project: Building a High-Performance In-Memory Key-Value Store (TopicKV) — Phase 2 Capstone

TT
TopicTrick Team
Project: Building a High-Performance In-Memory Key-Value Store (TopicKV) — Phase 2 Capstone

Project: Building a High-Performance In-Memory Key-Value Store (TopicKV) — Phase 2 Capstone


Table of Contents


Architecture: Sharded Store Design

To avoid a single global lock bottlenecking all operations, TopicKV uses sharding — 16 independent sub-maps each with their own mutex:


Core Data Model: Entry with TTL

cpp
// include/topickv.hpp
#pragma once
#include <string>
#include <string_view>
#include <optional>
#include <memory>
#include <chrono>
#include <unordered_map>
#include <shared_mutex>
#include <array>
#include <atomic>

using Clock     = std::chrono::steady_clock;
using TimePoint = std::chrono::time_point<Clock>;
using Duration  = std::chrono::milliseconds;

// Internal entry — owns the value and tracks TTL:
struct Entry {
    std::shared_ptr<std::string> value;     // Shared ownership for snapshot reads
    std::optional<TimePoint>     expires_at; // nullopt = no expiry
    TimePoint                    last_access; // For LRU tracking
    
    explicit Entry(std::string val, std::optional<Duration> ttl = std::nullopt)
        : value(std::make_shared<std::string>(std::move(val)))
        , last_access(Clock::now())
    {
        if (ttl) expires_at = Clock::now() + *ttl;
    }
    
    bool is_expired() const {
        if (!expires_at) return false;
        return Clock::now() >= *expires_at;
    }
    
    void touch() { last_access = Clock::now(); }
};

Sharded Storage for Scalability

cpp
class TopicKV {
    static constexpr size_t SHARD_COUNT = 16;
    static constexpr size_t MAX_ENTRIES = 10'000; // Per shard
    
    struct Shard {
        mutable std::shared_mutex          mtx;
        std::unordered_map<std::string, Entry> data;
    };
    
    std::array<Shard, SHARD_COUNT> shards_;
    std::atomic<size_t> total_gets_{0};
    std::atomic<size_t> total_sets_{0};
    std::atomic<size_t> cache_hits_{0};
    std::atomic<size_t> cache_misses_{0};
    
    // Determine which shard owns a key:
    Shard& shard_for(std::string_view key) {
        size_t h = std::hash<std::string_view>{}(key);
        return shards_[h % SHARD_COUNT];
    }
    const Shard& shard_for(std::string_view key) const {
        size_t h = std::hash<std::string_view>{}(key);
        return shards_[h % SHARD_COUNT];
    }

Thread-Safe GET with std::optional

cpp
public:
    // GET: returns shared_ptr for snapshot safety (value stays valid even if key deleted)
    std::optional<std::shared_ptr<std::string>> get(std::string_view key) {
        auto& shard = shard_for(key);
        std::shared_lock lock(shard.mtx); // Multiple concurrent readers OK
        
        total_gets_.fetch_add(1, std::memory_order_relaxed);
        
        auto it = shard.data.find(std::string(key));
        if (it == shard.data.end()) {
            cache_misses_.fetch_add(1, std::memory_order_relaxed);
            return std::nullopt; // Key not found
        }
        
        // Lazy TTL check:
        if (it->second.is_expired()) {
            // Can't erase under shared_lock — caller should call del() or GC handles it
            cache_misses_.fetch_add(1, std::memory_order_relaxed);
            return std::nullopt; // Pretend it's gone
        }
        
        it->second.touch(); // Mark LRU access time
        cache_hits_.fetch_add(1, std::memory_order_relaxed);
        return it->second.value; // Return shared_ptr — safe even after shard unlock
    }
    
    // GET with fallback value:
    std::string get_or(std::string_view key, std::string_view fallback) {
        auto result = get(key);
        return result ? **result : std::string(fallback);
    }

Thread-Safe SET with RAII

cpp
    // SET: insert or update, with optional TTL
    void set(std::string_view key, std::string value,
             std::optional<Duration> ttl = std::nullopt) {
        auto& shard = shard_for(key);
        std::unique_lock lock(shard.mtx); // Exclusive write access
        
        total_sets_.fetch_add(1, std::memory_order_relaxed);
        
        // Evict if at capacity:
        if (shard.data.size() >= MAX_ENTRIES) {
            evict_lru(shard);
        }
        
        // Insert or update:
        auto [it, inserted] = shard.data.try_emplace(
            std::string(key),
            std::move(value),
            ttl
        );
        if (!inserted) {
            // Key exists — update in-place:
            it->second = Entry(std::move(value), ttl);
        }
    }
    
    // DEL: remove a key:
    bool del(std::string_view key) {
        auto& shard = shard_for(key);
        std::unique_lock lock(shard.mtx);
        return shard.data.erase(std::string(key)) > 0;
    }
    
    // MSET: set multiple at once (atomically within each shard):
    void mset(std::initializer_list<std::pair<std::string_view, std::string>> entries) {
        for (auto& [key, value] : entries) set(key, value);
    }

TTL Expiry and Lazy Deletion

cpp
    // Garbage collect all expired entries (call periodically):
    size_t gc() {
        size_t removed = 0;
        for (auto& shard : shards_) {
            std::unique_lock lock(shard.mtx);
            auto it = shard.data.begin();
            while (it != shard.data.end()) {
                if (it->second.is_expired()) {
                    it = shard.data.erase(it);
                    removed++;
                } else {
                    ++it;
                }
            }
        }
        return removed;
    }

LRU Eviction Policy

cpp
private:
    // Evict the least-recently-used entry from a shard (must hold unique_lock):
    static void evict_lru(Shard& shard) {
        if (shard.data.empty()) return;
        
        // Find the entry with the oldest last_access:
        auto oldest = std::ranges::min_element(shard.data, {},
            [](const auto& entry) { return entry.second.last_access; });
        
        if (oldest != shard.data.end()) {
            shard.data.erase(oldest);
        }
    }

Command-Line REPL with std::format

cpp
// src/repl.cpp
#include <print>
#include <string>
#include <sstream>
#include "topickv.hpp"

void run_repl(TopicKV& store) {
    std::println("TopicKV v1.0 — type HELP for commands");
    std::string line;
    
    while (std::print("> "), std::getline(std::cin, line)) {
        std::istringstream iss(line);
        std::string cmd;
        iss >> cmd;
        
        if (cmd == "SET") {
            std::string key, val;
            iss >> key >> val;
            store.set(key, val);
            std::println("OK");
            
        } else if (cmd == "SETEX") {  // SET with TTL
            std::string key, val;
            int ms;
            iss >> key >> ms >> val;
            store.set(key, val, std::chrono::milliseconds(ms));
            std::println("OK (expires in {}ms)", ms);
            
        } else if (cmd == "GET") {
            std::string key;
            iss >> key;
            auto result = store.get(key);
            result ? std::println("{}", **result) : std::println("(nil)");
            
        } else if (cmd == "DEL") {
            std::string key;
            iss >> key;
            std::println("{}", store.del(key) ? "1" : "0");
            
        } else if (cmd == "GC") {
            std::println("Removed {} expired entries", store.gc());
            
        } else if (cmd == "STATS") {
            auto [gets, sets, hits, misses] = store.stats();
            std::println("GETs: {} | SETs: {} | Hits: {} | Misses: {} | Rate: {:.1f}%",
                         gets, sets, hits, misses,
                         gets > 0 ? 100.0 * hits / gets : 0.0);
        } else if (cmd == "QUIT" || cmd == "EXIT") {
            break;
        } else {
            std::println("Unknown command. Commands: SET SETEX GET DEL GC STATS QUIT");
        }
    }
}

Extension Challenges

  1. Persistence: Serialize the entire store to a binary file on SIGTERM (use std::ofstream + std::bit_cast for POD types)
  2. Pub/Sub: Add a SUBSCRIBE key / PUBLISH key msg command using std::condition_variable and a std::deque<std::string> channel
  3. Transactions: Implement MULTI / EXEC by buffering commands in a std::vector<Command> and applying atomically
  4. Network interface: Wrap the store in a TCP server using std::jthread (from Module 14) — one thread per connection using the REPL dispatch logic

Phase 2 Reflection

Phase 2 ConceptUsed In TopicKV
std::unordered_map (Module 10)Core key-value storage per shard
std::shared_ptr (Module 3)Entry value — snapshot safety
std::shared_mutex (Module 14)Concurrent reader/writer safety
std::optional (Module 6)GET return type — explicit "not found"
std::chronoTTL timestamps, expiry comparison
RAII locks (Module 8)shared_lock / unique_lock auto-release
std::atomic (Module 14)Lock-free statistics counters
std::ranges::min_element (Module 15)LRU eviction algorithm

Proceed to Phase 3: Inheritance & Composition →

Frequently Asked Questions

Q: What C++ data structures are best suited for building an in-memory key-value store? std::unordered_map<Key, Value> provides O(1) average-case lookup and is the natural choice for a hash-based key-value store. For ordered range queries, std::map (red-black tree, O(log n)) is better. For a concurrent store, wrap with std::shared_mutex for read-heavy workloads or use a sharded approach (multiple maps each with their own lock) to reduce contention. For extremely high throughput, consider lock-free hash maps from libraries like libcuckoo or Junction.

Q: How do you implement expiry (TTL) for keys in a C++ in-memory store? Store the expiry timestamp alongside each value — std::chrono::steady_clock::time_point. On every read, check whether the current time exceeds the stored expiry and return "not found" if so (lazy expiration). For active cleanup, maintain a priority queue (std::priority_queue) sorted by expiry time and run a background thread that periodically pops and deletes expired keys. This two-pronged approach (lazy + active) balances memory usage and read latency.

Q: How does an in-memory datastore project demonstrate important C++ concepts? It exercises RAII for resource management (automatic cleanup of allocated entries), templates and type erasure for storing heterogeneous value types, concurrency primitives (std::mutex, std::shared_mutex, std::condition_variable) for thread safety, move semantics for zero-copy value insertion, and std::chrono for TTL management. It also naturally motivates benchmarking with std::chrono::high_resolution_clock and profiling to understand cache behaviour — making it an excellent capstone project for systems-level C++ skills.


Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engineering.