Database Sharding: Horizontal Partitioning Strategies for Petabyte-Scale Systems

Table of Contents
- Vertical vs Horizontal Scaling: When Each Fails
- What Sharding Actually Does
- The Sharding Key Decision: Most Critical Choice
- Strategy 1: Range Sharding
- Strategy 2: Hash Sharding
- Strategy 3: Consistent Hashing — Minimising Resharding
- Strategy 4: Directory Sharding (Lookup Table)
- Cross-Shard Queries: The Pain of Sharding
- Managed Sharding: Vitess and Citus
- When NOT to Shard
- Resharding: The Hardest Operation
- Frequently Asked Questions
- Key Takeaway
Vertical vs Horizontal Scaling: When Each Fails
Before sharding, exhaust these options:
- Query optimisation (indexes, query plans, avoid N+1)
- Read replicas (offload 80%+ of traffic)
- Vertical scaling (larger instance type)
- Caching layer (Redis — serve hot data without DB)
- Database partitioning (split large tables into partitions inside a single database server — simpler than sharding)
Sharding should be the last resort — it fundamentally changes how your application queries data.
What Sharding Actually Does
Sharding horizontally partitions rows across multiple database instances. Each shard is an independent database that owns a subset of the data:
Unsharded (1 server):              Sharded (3 servers):
┌──────────────────┐       ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ orders table     │       │ Shard 1     │ │ Shard 2     │ │ Shard 3     │
│ id=1 user=alice  │   →   │ id=1 alice  │ │ id=6 carol  │ │ id=4 eve    │
│ id=2 user=bob    │       │ id=2 bob    │ │ id=7 dave   │ │ id=5 frank  │
│ id=3 user=carol  │       │ id=3 carol  │ └─────────────┘ └─────────────┘
│ ...10M rows...   │       └─────────────┘
└──────────────────┘       Each shard handles ~3.3M rows and accepts writes independently.
Your application layer (or a sharding proxy) must determine which shard to route each query to based on a sharding key.
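For example, a hand-rolled routing layer might hold one connection (or pool) per shard and pick from that map. A minimal sketch, where connect() and the hostnames are placeholders rather than any specific driver's API:
# Hypothetical shard map: one connection (or pool) per shard.
# connect() and the hostnames are placeholders for your driver and config.
shard_connections = {
    0: connect("postgresql://orders-shard-0.internal:5432/orders"),
    1: connect("postgresql://orders-shard-1.internal:5432/orders"),
    2: connect("postgresql://orders-shard-2.internal:5432/orders"),
}
NUM_SHARDS = len(shard_connections)
# Every query is first resolved to a shard index via the sharding key;
# the strategies below cover how that index is computed.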
The Sharding Key Decision: Most Critical Choice
The sharding key is the column used to determine which shard owns a row. A poor sharding key causes hot shards — one shard receiving disproportionate traffic while others sit idle.
| Sharding Key | Distribution | Query Pattern | Risk |
|---|---|---|---|
| user_id | Even (high cardinality) | All user data on one shard | ✅ Good default |
| tenant_id | Uneven (large tenants) | Complete tenant isolation | Hot shard for large tenants |
| created_at | All new data to one shard | Time-range queries land on one shard | ❌ Always hot |
| random UUID | Perfect distribution | No locality — every query may hit any shard | Cross-shard joins nightmare |
| region | Depends on geography | Geographic locality | Hot if users cluster by region |
The hot shard problem in detail:
created_at-based sharding (2026 data):
Shard 1: 2024 data → 0% of current writes
Shard 2: 2025 data → 5% of current writes
Shard 3: 2026 data → 95% of all current writes (BURNING HOT)
user_id-based sharding (hash to 3 shards):
Shard 1: users where hash(user_id) % 3 == 0 → 33% of writes
Shard 2: users where hash(user_id) % 3 == 1 → 33% of writes
Shard 3: users where hash(user_id) % 3 == 2 → 33% of writes (BALANCED)
Strategy 2: Hash Sharding
Hash sharding applies a hash function to the sharding key to determine the shard:
import hashlib

def get_shard(user_id: str, num_shards: int) -> int:
    # Use a stable hash: Python's built-in hash() varies between processes,
    # which would route the same user to different shards after a restart.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# get_shard("user-12345", 3) → e.g. shard 1
# get_shard("user-99999", 3) → e.g. shard 0

# Routing in application code:
def get_order(user_id: str, order_id: str):
    shard_index = get_shard(user_id, NUM_SHARDS)
    db = shard_connections[shard_index]
    return db.query("SELECT * FROM orders WHERE id = %s", [order_id])
Advantage: Perfectly even distribution — no hot shards.
Disadvantage: Adding a shard requires re-hashing all data. If you go from 3 shards to 4 shards, hash(user_id) % 4 produces different shard assignments — most data must move.
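To see the scale of the problem, here is a quick back-of-the-envelope check. It is a sketch that reuses the get_shard helper above; the 100,000 synthetic user IDs are an assumption purely for illustration:
# Fraction of users whose shard assignment changes when growing from
# 3 shards to 4 with modulo hashing (reuses get_shard from above):
users = [f"user-{i}" for i in range(100_000)]
moved = sum(1 for u in users if get_shard(u, 3) != get_shard(u, 4))
print(f"{moved / len(users):.0%} of keys move")  # roughly 75%
This is exactly the problem consistent hashing (the next strategy) is designed to soften.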
Strategy 3: Consistent Hashing — Minimising Resharding
Consistent hashing places both shards and keys on a virtual "ring". When a shard is added or removed, only keys near that shard's position on the ring need to move:
Consistent Hash Ring (simplified):
              Key A → Shard 1
        ┌────────────────────────┐
        │  Shard 3      Shard 1  │    Adding Shard 4:
        │    ↑                   │    only ~25% of keys move
        │  Key D        Key B    │    (compared to 75% with simple modulo)
        │    ↓                   │
        │        Shard 2         │
        └────────────────────────┘
Simple modulo (3→4 shards): 75% of keys move to different shard
Consistent hashing (3→4 shards): ~25% of keys move
Used by: DynamoDB, Cassandra, Redis Cluster, Riak.
Implementation: Use a sorted array of virtual nodes. Libraries: ketama (Python), HashRing (Java), or use a database with built-in consistent hashing (Redis Cluster, Cassandra).
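To make the idea concrete, here is a minimal, illustrative ring in Python built on a sorted list of virtual nodes. The ConsistentHashRing class, the 100-vnode default, and the shard names are assumptions for this sketch, not any particular library's API:
import bisect
import hashlib

def _ring_hash(key: str) -> int:
    # Stable 128-bit position on the ring (built-in hash() is not stable).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Each physical shard appears `vnodes` times on the ring so that
        # keys spread evenly even with a small number of shards.
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, shard_name)
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_ring_hash(f"{shard}#{i}"), shard))

    def get_shard(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key.
        idx = bisect.bisect_left(self._ring, (_ring_hash(key),))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]

# Adding a fourth shard only moves keys whose nearest virtual node changed:
ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
before = {f"user-{i}": ring.get_shard(f"user-{i}") for i in range(10_000)}
ring.add_shard("shard-4")
moved = sum(1 for key, shard in before.items() if ring.get_shard(key) != shard)
print(f"{moved / len(before):.0%} of keys moved")  # roughly 25%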
Strategy 4: Directory Sharding (Lookup Table)
A central lookup table maps each entity ID to its shard:
-- Shard directory table (in a dedicated metadata DB):
CREATE TABLE shard_directory (
    user_id    BIGINT    PRIMARY KEY,
    shard_id   SMALLINT  NOT NULL,
    created_at TIMESTAMP NOT NULL
);
-- Routing in application:
SELECT shard_id FROM shard_directory WHERE user_id = $1;
-- Then query the appropriate shard
-- Rebalancing is simple: UPDATE shard_directory SET shard_id = 3 WHERE user_id = 12345;
-- Then migrate that user's data to shard 3
-- No formula change needed!
Advantage: Maximum flexibility — entries can be moved to different shards at will. Easy to handle "whale" tenants (move large tenants to dedicated shards).
Disadvantage: The directory table is a critical single point of failure and, because every query needs a lookup, a performance bottleneck. It must be heavily cached (e.g. in Redis) and designed for high availability.
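A sketch of that caching pattern, assuming a redis_client, a metadata_db handle for the directory database, and the shard_connections map from earlier (all hypothetical names):
def get_shard_for_user(user_id: int):
    # Check the cache first so the directory DB is not hit on every query.
    cached = redis_client.get(f"shard:{user_id}")
    if cached is not None:
        return shard_connections[int(cached)]
    rows = metadata_db.query(
        "SELECT shard_id FROM shard_directory WHERE user_id = %s", [user_id]
    )
    shard_id = rows[0].shard_id
    redis_client.set(f"shard:{user_id}", shard_id, ex=3600)  # cache for 1 hour
    return shard_connections[shard_id]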
Cross-Shard Queries: The Pain of Sharding
After sharding, queries that previously used JOIN become complex:
# ❌ BEFORE sharding — this simple join no longer works:
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.region = 'EU'
GROUP BY u.id;
# ❌ AFTER sharding — impossible as a single query:
# users are on Shards 1, 2, 3... orders are on Shards 1, 2, 3...
# You can't JOIN across shards in a single SQL statement
# ✅ Application-level join (scatter-gather):
eu_users = []
for shard in all_shards:
    eu_users.extend(shard.query("SELECT id, name FROM users WHERE region = 'EU'"))
user_ids = [u.id for u in eu_users]

# Ask every shard for order counts for those users (each user's orders
# live on exactly one shard, so the per-user counts don't collide):
order_counts = {}
placeholders = ",".join(["%s"] * len(user_ids))
for shard in all_shards:
    shard_counts = shard.query(
        f"SELECT user_id, COUNT(*) AS cnt FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids,
    )
    order_counts.update({row.user_id: row.cnt for row in shard_counts})

# Merge in application memory — expensive and slow!
result = [(u.name, order_counts.get(u.id, 0)) for u in eu_users]
Mitigation strategies:
- Design your sharding key so that related entities (users + their orders) live on the same shard (see the sketch after this list)
- Use a data warehouse (BigQuery, Redshift) for cross-shard analytics queries
- Denormalise data — store copies of frequently-joined reference tables on every shard
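To illustrate the first strategy: when orders are sharded by the same user_id as users, any query that supplies the sharding key resolves to a single shard and needs no scatter-gather. A sketch reusing the get_shard and shard_connections names from earlier sections:
def order_count_for_user(user_id: str) -> int:
    # The sharding key is in the query, so exactly one shard is touched.
    db = shard_connections[get_shard(user_id, NUM_SHARDS)]
    rows = db.query(
        "SELECT COUNT(*) AS cnt FROM orders WHERE user_id = %s", [user_id]
    )
    return rows[0].cnt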
Managed Sharding: Vitess and Citus
Building sharding middleware yourself is an enormous engineering investment. Use managed solutions:
Vitess (YouTube/MySQL): A proxy layer + orchestration system for MySQL sharding. Your app connects to Vitess as if it's a single MySQL server; Vitess routes queries to the right shard, handles resharding, manages replication, and provides a web UI for operations.
Citus (PostgreSQL): A PostgreSQL extension that transforms a single Postgres instance into a sharded cluster. Native PostgreSQL protocol — apps connect as if to a single database. Supports hash and range distribution, parallel query execution across shards, and transparent resharding.
-- Citus: Mark a table for sharding
SELECT create_distributed_table('orders', 'user_id');
-- Citus handles all routing transparently from now on
-- All queries via psql/JDBC work as normal — Citus routes internally
Frequently Asked Questions
How do I handle transactions across shards? You generally can't use database-level transactions across shards — each shard has an independent transaction manager. Use the Saga Pattern for cross-shard operations: break the transaction into compensatable local transactions on each shard. For financial systems requiring strict atomicity, consider keeping all financial data for one entity on the same shard.
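A compensating-transaction sketch of that idea follows. It is purely illustrative: the accounts table, the execute() method, and the single-statement "steps" are assumptions, and a real saga also needs idempotency and durable state so failed steps can be retried or compensated after a crash:
def transfer(from_user: str, to_user: str, amount: int) -> None:
    src = shard_connections[get_shard(from_user, NUM_SHARDS)]
    dst = shard_connections[get_shard(to_user, NUM_SHARDS)]
    # Step 1: local transaction on the source shard.
    src.execute("UPDATE accounts SET balance = balance - %s WHERE user_id = %s",
                [amount, from_user])
    try:
        # Step 2: local transaction on the destination shard.
        dst.execute("UPDATE accounts SET balance = balance + %s WHERE user_id = %s",
                    [amount, to_user])
    except Exception:
        # Compensating action: undo step 1, since there is no cross-shard rollback.
        src.execute("UPDATE accounts SET balance = balance + %s WHERE user_id = %s",
                    [amount, from_user])
        raise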
Can I shard without changing application code? With Vitess or Citus, you can shard transparently to the application for most query types. However, cross-shard JOINs and some aggregate queries require application changes. The rule: if you design your sharding key so queries always specify the sharding key, most queries route to a single shard with zero application changes.
Key Takeaway
Database sharding is a last-resort scaling strategy that comes with a high permanent complexity cost — loss of cross-shard joins, loss of multi-row transactions, need for scatter-gather queries for analytics. The sharding key decision is irreversible in practice — choosing user_id vs tenant_id vs region determines your system's scaling characteristics for years. Use read replicas, caching, and vertical scaling first. When you genuinely need sharding, prefer managed solutions (Vitess, Citus) over hand-rolled middleware, and invest heavily in choosing a sharding key that colocates related data on the same shard.
