
Database Sharding: Horizontal Partitioning Strategies for Petabyte-Scale Systems

TopicTrick Team


Vertical vs Horizontal Scaling: When Each Fails

Before sharding, exhaust these options:

  1. Query optimisation (indexes, query plans, avoid N+1)
  2. Read replicas (offload 80%+ of traffic)
  3. Vertical scaling (larger instance type)
  4. Caching layer (Redis — serve hot data without DB)
  5. Database partitioning (one database, tables split into partitions — simpler than sharding)

Sharding should be the last resort — it fundamentally changes how your application queries data.


What Sharding Actually Does

Sharding horizontally partitions rows across multiple database instances. Each shard is an independent database that owns a subset of the data:

text
Unsharded (1 server):     Sharded (3 servers):
┌──────────────────┐      ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ orders table     │      │ Shard 1     │  │ Shard 2     │  │ Shard 3     │
│ id=1  user=alice │  →   │ id=1 alice  │  │ id=6 carol  │  │ id=4 eve    │
│ id=2  user=bob   │      │ id=2 bob    │  │ id=7 dave   │  │ id=5 frank  │
│ id=3  user=carol │      │ id=3 carol  │  └─────────────┘  └─────────────┘
│ ...10M rows...   │      └─────────────┘
└──────────────────┘      Each shard handles 3.3M rows
                          Each accepts writes independently

Your application layer (or a sharding proxy) must determine which shard to route each query to based on a sharding key.


The Sharding Key Decision: Most Critical Choice

The sharding key is the column used to determine which shard owns a row. A poor sharding key causes hot shards — one shard receiving disproportionate traffic while others sit idle.

Sharding Key | Distribution              | Query Pattern                                | Risk
-------------|---------------------------|----------------------------------------------|--------------------------------
user_id      | Even (high cardinality)   | All user data on one shard                   | ✅ Good default
tenant_id    | Uneven (large tenants)    | Complete tenant isolation                    | Hot shard for large tenants
created_at   | All new data to one shard | Time-range queries land on one shard         | ❌ Always hot
random UUID  | Perfect distribution      | No locality — every query may hit any shard  | Cross-shard joins nightmare
region       | Depends on geography      | Geographic locality                          | Hot if users cluster by region

The hot shard problem in detail:

text
created_at-based sharding (2026 data):
Shard 1: 2024 data  → 0% of current writes
Shard 2: 2025 data  → 5% of current writes
Shard 3: 2026 data  → 95% of all current writes (BURNING HOT)

user_id-based sharding (hash to 3 shards):
Shard 1: users where hash(user_id) % 3 == 0 → 33% of writes
Shard 2: users where hash(user_id) % 3 == 1 → 33% of writes
Shard 3: users where hash(user_id) % 3 == 2 → 33% of writes (BALANCED)
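
A quick way to sanity-check the balanced case is to hash a batch of synthetic user IDs and count how many land in each bucket. A minimal sketch, assuming CRC32 as the stable hash (any reasonably uniform hash behaves the same way):

python
import zlib
from collections import Counter

# Hash 90,000 synthetic user IDs into 3 buckets and count per-shard load:
counts = Counter(zlib.crc32(f"user-{i}".encode()) % 3 for i in range(90_000))
print(counts)  # roughly 30,000 keys per shard: near-uniform, no hot shard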

Strategy 1: Hash Sharding

Hash sharding applies a hash function to the sharding key to determine the shard:

python
import zlib

def get_shard(user_id: str, num_shards: int) -> int:
    # Use a stable hash (CRC32): Python's built-in hash() is randomised
    # per process for strings and would route the same key inconsistently.
    return zlib.crc32(user_id.encode()) % num_shards
    # get_shard("user-12345", 3) → 0, 1, or 2, always the same for a given key

# Routing in application code:
def get_order(user_id: str, order_id: str):
    shard_index = get_shard(user_id, NUM_SHARDS)  # hash once, inside get_shard
    db = shard_connections[shard_index]
    return db.query("SELECT * FROM orders WHERE id = %s", [order_id])

Advantage: Near-uniform distribution for any high-cardinality key, so no hot shards.

Disadvantage: Adding a shard requires re-hashing all data. If you go from 3 shards to 4 shards, hash(user_id) % 4 produces different shard assignments — most data must move.
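
You can verify the scale of the problem directly. This short sketch counts how many keys keep their shard assignment when the modulus changes from 3 to 4 (CRC32 again stands in for whatever stable hash you use):

python
import zlib

# Count how many keys keep their shard when going from 3 shards to 4:
keys = [f"user-{i}" for i in range(100_000)]
stayed = sum(zlib.crc32(k.encode()) % 3 == zlib.crc32(k.encode()) % 4 for k in keys)
print(f"{stayed / len(keys):.0%} stay put")  # ≈ 25%, so ~75% of the data must move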


Strategy 2: Consistent Hashing — Minimising Resharding

Consistent hashing places both shards and keys on a virtual "ring". When a shard is added or removed, only keys near that shard's position on the ring need to move:

text
Consistent Hash Ring (simplified):

         Key A → Shard 1
    ┌────────────────────────┐
    │   Shard 3    Shard 1   │  Adding Shard 4:
    │         ↑              │  Only 25% of keys move
    │   Key D   Key B        │  (compared to 75% with simple modulo)
    │         ↓              │
    │        Shard 2         │
    └────────────────────────┘

Simple modulo (3→4 shards): 75% of keys move to different shard
Consistent hashing (3→4 shards): ~25% of keys move

Used by: DynamoDB, Cassandra, Riak. (Redis Cluster uses fixed hash slots, a related but distinct scheme.)

Implementation: Use a sorted array of virtual nodes. Libraries: ketama (Python), HashRing (Java), or use a database with built-in consistent hashing (Cassandra, DynamoDB).
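
For intuition, here is a minimal ring sketch in Python. It is illustrative only: the class name, the virtual-node count, and the choice of CRC32 are assumptions, not a production implementation.

python
import bisect
import zlib

class HashRing:
    def __init__(self, shards, vnodes=100):
        # Place `vnodes` virtual nodes per shard on the ring to smooth out gaps.
        self.ring = sorted(
            (zlib.crc32(f"{shard}:{i}".encode()), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    def get_shard(self, key: str) -> str:
        # Walk clockwise: the first virtual node at or after the key owns it.
        pos = zlib.crc32(key.encode())
        idx = bisect.bisect(self.positions, pos) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["shard-1", "shard-2", "shard-3"])
print(ring.get_shard("user-12345"))  # adding "shard-4" only remaps nearby keys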


Strategy 3: Directory Sharding (Lookup Table)

A central lookup table maps each entity ID to its shard:

sql
-- Shard directory table (in a dedicated metadata DB):
CREATE TABLE shard_directory (
    user_id    BIGINT PRIMARY KEY,
    shard_id   SMALLINT NOT NULL,
    created_at TIMESTAMP NOT NULL
);

-- Routing in application:
SELECT shard_id FROM shard_directory WHERE user_id = $1;
-- Then query the appropriate shard

-- Rebalancing is simple: UPDATE shard_directory SET shard_id = 3 WHERE user_id = 12345;
-- Then migrate that user's data to shard 3
-- No formula change needed!

Advantage: Maximum flexibility — entries can be moved to different shards at will. Easy to handle "whale" tenants (move large tenants to dedicated shards).

Disadvantage: The directory table is a critical single point of failure and a performance bottleneck for every query. Must be heavily cached with Redis and designed for HA.
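
As a sketch of what that caching layer might look like (the `metadata_db` handle, the `shard:` key prefix, and the TTL are illustrative assumptions, not from a specific library):

python
import redis

cache = redis.Redis()
DIRECTORY_TTL = 300  # seconds; short enough that rebalanced users converge quickly

def lookup_shard(user_id: int) -> int:
    # Serve the hot path from Redis; fall back to the metadata DB on a miss.
    cached = cache.get(f"shard:{user_id}")
    if cached is not None:
        return int(cached)
    row = metadata_db.query(  # metadata_db: hypothetical connection to the directory DB
        "SELECT shard_id FROM shard_directory WHERE user_id = %s", [user_id]
    )
    cache.set(f"shard:{user_id}", row.shard_id, ex=DIRECTORY_TTL)
    return row.shard_id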


Cross-Shard Queries: The Pain of Sharding

After sharding, queries that previously used JOIN become complex:

python
# ✅ BEFORE sharding — one simple SQL join:
#   SELECT u.name, COUNT(o.id) AS order_count
#   FROM users u
#   JOIN orders o ON o.user_id = u.id
#   WHERE u.region = 'EU'
#   GROUP BY u.id, u.name;

# ❌ AFTER sharding — impossible as a single query:
# users are on Shards 1, 2, 3... orders are on Shards 1, 2, 3...
# You can't JOIN across shards in a single SQL statement

# ✅ Application-level join (scatter-gather):
eu_users = []
for shard in all_shards:
    eu_users.extend(shard.query("SELECT id, name FROM users WHERE region = 'EU'"))

user_ids = [u.id for u in eu_users]

# For each shard, get order counts for the users whose orders live there:
order_counts = {}
placeholders = ",".join(["%s"] * len(user_ids))  # parameterised IN list, not string splicing
for shard in all_shards:
    shard_counts = shard.query(
        f"SELECT user_id, COUNT(*) AS cnt FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids,
    )
    order_counts.update({row.user_id: row.cnt for row in shard_counts})

# Merge in application memory — expensive and slow!
result = [(u.name, order_counts.get(u.id, 0)) for u in eu_users]

Mitigation strategies:

  • Design your sharding key so that related entities (users + their orders) live on the same shard
  • Use a data warehouse (BigQuery, Redshift) for cross-shard analytics queries
  • Denormalise data — store copies on both shards for frequently-joined tables

Managed Sharding: Vitess and Citus

Building sharding middleware yourself is an enormous engineering investment. Use managed solutions:

Vitess (YouTube/MySQL): A proxy layer + orchestration system for MySQL sharding. Your app connects to Vitess as if it's a single MySQL server; Vitess routes queries to the right shard, handles resharding, manages replication, and provides a web UI for operations.

Citus (PostgreSQL): A PostgreSQL extension that transforms a single Postgres instance into a sharded cluster. Native PostgreSQL protocol — apps connect as if to a single database. Supports hash and range distribution, parallel query execution across shards, and transparent resharding.

sql
-- Citus: Mark a table for sharding
SELECT create_distributed_table('orders', 'user_id');
-- Citus handles all routing transparently from now on
-- All queries via psql/JDBC work as normal — Citus routes internally

Frequently Asked Questions

How do I handle transactions across shards? You generally can't use database-level transactions across shards — each shard has an independent transaction manager. Use the Saga Pattern for cross-shard operations: break the transaction into compensatable local transactions on each shard. For financial systems requiring strict atomicity, consider keeping all financial data for one entity on the same shard.
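
A rough sketch of the pattern (the `.execute` method and the accounts schema are hypothetical; `shard_connections` and `get_shard` follow the earlier examples):

python
# Saga sketch: each step is a local transaction on one shard, paired with a
# compensating action that undoes it if a later step fails.
def transfer(amount: int, from_user: str, to_user: str) -> None:
    debit_shard = shard_connections[get_shard(from_user, NUM_SHARDS)]
    credit_shard = shard_connections[get_shard(to_user, NUM_SHARDS)]

    debit_shard.execute(
        "UPDATE accounts SET balance = balance - %s WHERE user_id = %s",
        [amount, from_user],
    )
    try:
        credit_shard.execute(
            "UPDATE accounts SET balance = balance + %s WHERE user_id = %s",
            [amount, to_user],
        )
    except Exception:
        # Compensating transaction: reverse the already-committed debit.
        debit_shard.execute(
            "UPDATE accounts SET balance = balance + %s WHERE user_id = %s",
            [amount, from_user],
        )
        raise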

Can I shard without changing application code? With Vitess or Citus, you can shard transparently to the application for most query types. However, cross-shard JOINs and some aggregate queries require application changes. The rule: if you design your sharding key so queries always specify the sharding key, most queries route to a single shard with zero application changes.


Key Takeaway

Database sharding is a last-resort scaling strategy that comes with a high permanent complexity cost — loss of cross-shard joins, loss of multi-row transactions, need for scatter-gather queries for analytics. The sharding key decision is irreversible in practice — choosing user_id vs tenant_id vs region determines your system's scaling characteristics for years. Use read replicas, caching, and vertical scaling first. When you genuinely need sharding, prefer managed solutions (Vitess, Citus) over hand-rolled middleware, and invest heavily in choosing a sharding key that colocates related data on the same shard.

Read next: Leader-Follower Replication: High-Availability Database Patterns →


Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.