
Database Sharding and Replication: Scale

TopicTrick Team

1. Read Replication: The "Speed" Trick

Most apps are read-heavy: roughly 90% Reads and 10% Writes is a common rule of thumb.

  • You have one Primary (The Leader). All "Writes" go here.
  • You have 5 Replicas (The Followers). All "Reads" go here.
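  • The Benefit: You can handle roughly 5x more read traffic without buying a bigger database. Write capacity stays the same, because every write still goes through the Leader.
  • The Trade-off: "Eventual Consistency." If I change my name on the Leader, it might take 100 ms to copy that to the Followers. For a split second, a read from a Follower still shows my old name!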
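To make the Leader/Follower split and the stale-read window concrete, here is a minimal in-memory sketch. The `ReplicatedStore` class and its `replicate()` method are hypothetical names invented for illustration; a real database copies changes asynchronously in the background rather than on an explicit call.

```python
import itertools


class ReplicatedStore:
    """Toy model of one primary (Leader) and several read replicas (Followers)."""

    def __init__(self, replica_count=5):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet copied to the replicas
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary; replicas are updated later.
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key):
        # Reads are spread round-robin across replicas and may be stale.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = ReplicatedStore()
store.write("user:1:name", "Old Name")
store.replicate()
store.write("user:1:name", "New Name")
stale = store.read("user:1:name")  # replicas have not caught up yet
store.replicate()
fresh = store.read("user:1:name")
print(stale, "->", fresh)
```

The read issued between the second write and the next `replicate()` call returns the old name, which is the "split second" described above.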

2. Sharding: The "Infinite" Trick

Replication gives you more read speed, but it doesn't give you more Space: every replica holds a full copy of the data.

  • In Sharding, you split the table.
  • "Users A-M live on Server 1. Users N-Z live on Server 2."
  • Now you have TWO tables, each roughly half the size.
  • The Difficulty: If you want to do a "Join" or any other query across all users, you have to talk to multiple servers and merge the results yourself. It is the "Final Boss" of system design.
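The A-M / N-Z split, and the extra work a cross-shard query needs, can be sketched with two dictionaries standing in for the two servers. All names here (`shards`, `shard_for`, `users_older_than`) are hypothetical, and this toy version sends any name that does not start with A-M, including non-letters, to the second server.

```python
# Two hypothetical shards, each holding part of the users table.
shards = {
    "server1": {},  # names starting with A-M
    "server2": {},  # names starting with N-Z (and anything else)
}


def shard_for(name):
    first = name[0].upper()
    return "server1" if "A" <= first <= "M" else "server2"


def insert_user(name, age):
    shards[shard_for(name)][name] = {"name": name, "age": age}


def get_user(name):
    # Single-shard lookup: only one server is contacted.
    return shards[shard_for(name)].get(name)


def users_older_than(age):
    # Cross-shard query: every server is asked, then results are merged.
    matches = []
    for rows in shards.values():
        matches.extend(r["name"] for r in rows.values() if r["age"] > age)
    return sorted(matches)


for name, age in [("Alice", 34), ("Mallory", 29), ("Nina", 41), ("Zed", 52)]:
    insert_user(name, age)

print(get_user("Nina"))
print(users_older_than(30))
```

A lookup by name touches one shard, while `users_older_than` has to visit every shard and merge the results, which is where the cost of cross-shard queries comes from.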

3. Sharding Keys: The Most Important Choice

How do you decide which user goes where?

  • Hash Sharding: Use a mathematical hash of the UserID. (Good for "Balance": every server gets a roughly equal amount of data).
  • Range Sharding: Use the Date. (Good for "Time-series": everything from 2024 is on one server).

The Warning: Choose your shard key carefully! If you choose poorly and one server gets 90% of the data, that "hot" server becomes the bottleneck (Wait times!) and can drag the whole system down. Sharding by date, for example, sends every new write to the server that holds the current period.
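The difference between the two strategies shows up when you count how many rows land on each shard. The sketch below assumes 4 shards, a SHA-256 based hash, and one range shard per year starting from a base year of 2023; all of these are illustrative choices, not something a particular database prescribes.

```python
import hashlib
from collections import Counter
from datetime import date

SHARD_COUNT = 4  # assumed number of shards


def hash_shard(user_id):
    # Use a stable hash; Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


def range_shard(day):
    # One shard per year, counted from an assumed base year of 2023.
    return day.year - 2023


# Hash sharding spreads 10,000 user IDs across all shards.
hash_counts = Counter(hash_shard(uid) for uid in range(10_000))

# Range sharding puts every 2024 record on the same shard.
range_counts = Counter(range_shard(date(2024, month, 1)) for month in range(1, 13))

print(sorted(hash_counts.items()))
print(sorted(range_counts.items()))
```

The hash counts come out close to even, while every 2024 date maps to a single shard. That is convenient for time-range queries, but it is also the skew the warning above describes.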

4. Multi-Region Replication: Global Speed

In 2026, we don't just replicate for speed; we replicate for Geography.

  • You have a database replica in London, one in Tokyo, and one in New York.
  • When a user in Tokyo clicks "Profile," the read only has to travel a short hop to a nearby data center, instead of thousands of miles to the US. It's how you build a website that feels "Frictionless" across the entire planet.
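A rough sketch of the routing decision: reads go to the closest replica, while writes still travel to the Leader. The latency numbers below are illustrative placeholders rather than measurements, and placing the Leader in New York is an assumption made only for this example.

```python
PRIMARY_REGION = "new_york"  # assumed home of the Leader

# Illustrative placeholder round-trip times in milliseconds, not measurements.
LATENCY_MS = {
    "tokyo": {"tokyo": 5, "london": 230, "new_york": 170},
    "london": {"tokyo": 230, "london": 5, "new_york": 75},
    "new_york": {"tokyo": 170, "london": 75, "new_york": 5},
}


def route(user_region, operation):
    # Writes always go to the primary; reads go to the lowest-latency replica.
    if operation == "write":
        return PRIMARY_REGION
    latencies = LATENCY_MS[user_region]
    return min(latencies, key=latencies.get)


print(route("tokyo", "read"))
print(route("tokyo", "write"))
```

In this model only reads get the local-speed benefit; a Tokyo user's write still crosses the ocean, and the eventual-consistency trade-off from section 1 applies to every region.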

Frequently Asked Questions

Can I shard automatically? In 2026, tools like Vitess (originally built at YouTube) can handle the sharding architecture for you. You write mostly standard MySQL-compatible SQL, and Vitess manages the "Routing" to the different servers. It allows you to use a "Distributed database" without losing your mind.

Is NoSQL easier for sharding? Often, yes. Databases like MongoDB and Cassandra were designed with horizontal scaling in mind. If you know your data will reach 50 TB, choosing NoSQL (Module 123) is often a safer bet than trying to "Force" sharding onto a legacy SQL engine.


Key Takeaway

Scaling data is about "Division of Labor." By mastering the distinction between replication for read throughput and sharding for storage capacity, you gain the ability to keep performance steady as your data grows. You graduate from "Managing a database" to "Architecting a Global Data Platform."


Part of the Software Architecture Hub — engineering the data.