Leader-Follower Database Replication: High Availability, Read Scaling & Failover Engineering

TopicTrick Team


Why a Single Database Is a Liability

A single database is a Single Point of Failure (SPOF). In production:

| Event | Impact without replication |
|---|---|
| Hardware failure (disk, NIC) | Total data loss risk, hours of downtime |
| OS/software crash | Minutes-to-hours RTO (Recovery Time Objective) |
| Maintenance window | Full application downtime |
| Read traffic spike | Writes blocked waiting for disk I/O |
| Backup operation | Dramatically slows production queries |

SLA implications: A single PostgreSQL instance delivers ~99.9% availability (8.7 hours downtime/year). A Leader-Follower cluster with automated failover delivers ~99.99% (52 minutes downtime/year).


The Leader-Follower Architecture Explained

(Diagram: the application sends all writes to the Leader; the Leader streams every change to read-only Followers, which serve reads.)

Key invariants:

  1. Only the Leader accepts writes
  2. Every write to the Leader is replicated to all Followers
  3. Followers are read-only (applications cannot write to them)
  4. If the Leader fails, exactly one Follower is promoted to Leader

How Replication Works: Binary Log and WAL

PostgreSQL: Write-Ahead Log (WAL)

PostgreSQL uses streaming replication via WAL:

  1. Every write operation is first written to the WAL (a sequential log of all changes)
  2. The transaction is committed locally on the Leader
  3. A WAL sender process on the Leader streams WAL records to connected Followers
  4. A WAL receiver process on each Follower applies WAL records to its local data files
  5. Followers maintain a standby state — they replay incoming WAL but accept no writes
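The streaming pipeline above can be inspected from the command line. A minimal sketch, assuming `psql` access and placeholder hostnames:

```bash
# On the Leader: one row per connected Follower, with its streaming state
psql -h leader.example.com -U postgres -c \
  "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"

# On a Follower: returns true while the node is in standby (read-only) mode
psql -h follower.example.com -U postgres -c "SELECT pg_is_in_recovery();"
```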

MySQL: Binary Log (binlog)

MySQL uses an I/O thread + SQL thread model:

  1. Leader writes changes to the binary log (binlog)
  2. Follower's I/O thread connects to Leader and copies binlog events to a local relay log
  3. Follower's SQL thread reads the relay log and applies events to local tables
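A sketch of wiring a Follower to the Leader's binlog using MySQL 8.0 replication syntax (hostname and credentials are placeholders):

```sql
-- On the Follower: point the I/O thread at the Leader's binlog
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'leader.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '<secret>',
  SOURCE_AUTO_POSITION = 1;  -- GTID auto-positioning instead of file/offset
START REPLICA;

-- Shows I/O thread state, SQL thread state, and Seconds_Behind_Source
SHOW REPLICA STATUS\G
```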

Three Replication Modes: Async, Semi-Sync, Sync

| Mode | Leader waits for | Data loss on crash | Write latency | When to use |
|---|---|---|---|---|
| Asynchronous | Nothing | Up to seconds of data | Lowest | High write throughput, some data loss tolerable |
| Semi-synchronous | At least one Follower to receive (not apply) | At most one transaction | Low (+1–2 ms) | Default for production — best balance |
| Synchronous | All Followers to apply | Zero data loss | Higher (+10–50 ms) | Financial systems, zero-RPO requirements |

PostgreSQL synchronous_standby_names:

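A minimal `postgresql.conf` fragment on the Leader, assuming two standbys registered as `follower1` and `follower2` (the names are placeholders):

```ini
# Acknowledge commits once one standby has *received* the WAL (remote_write),
# without waiting for it to be applied — the semi-synchronous mode above
synchronous_commit = remote_write
synchronous_standby_names = 'FIRST 1 (follower1, follower2)'
```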

Replication Lag: Measurement and Mitigation

Replication lag is the time between a write being committed on the Leader and being visible on a Follower.

Causes:

  1. Network bandwidth saturation (high write volume)
  2. Follower CPU overloaded applying complex statements
  3. Long-running queries on Follower blocking WAL replay

Measuring lag:

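The standard PostgreSQL lag queries, as a sketch (run as a superuser or monitoring role):

```sql
-- On the Leader: byte lag of each connected Follower
SELECT client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- On a Follower: wall-clock time since the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;
```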

Mitigation in application code:

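One simple mitigation is to pin each user's reads to the Leader for a short window after their last write. A minimal sketch, assuming connection objects that expose an `execute()` method (all names here are illustrative):

```python
import time


class ReplicatedRouter:
    """Route reads to the Leader for a short window after each write —
    a simple read-your-writes mitigation for replication lag."""

    def __init__(self, leader, replicas, pin_seconds=2.0):
        self.leader = leader
        self.replicas = replicas
        self.pin_seconds = pin_seconds
        self._last_write = {}  # user_id -> monotonic timestamp of last write

    def execute_write(self, user_id, query):
        self._last_write[user_id] = time.monotonic()
        return self.leader.execute(query)

    def execute_read(self, user_id, query):
        pinned_until = self._last_write.get(user_id, 0) + self.pin_seconds
        if time.monotonic() < pinned_until:
            return self.leader.execute(query)  # still pinned: read the Leader
        # Outside the pin window: spread reads across replicas
        replica = self.replicas[hash(user_id) % len(self.replicas)]
        return replica.execute(query)
```

The pin window should exceed your observed p99 replication lag; 1–2 seconds is a common starting point.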

Automated Failover: Patroni and Orchestrator

PostgreSQL: Patroni

Patroni is the industry standard for PostgreSQL HA:

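An illustrative Patroni configuration fragment (cluster name, node names, and etcd hosts are placeholders):

```yaml
scope: pg-cluster
name: node1

etcd3:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30                   # leader lock TTL in the DCS
    loop_wait: 10             # seconds between HA loop runs
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # max WAL lag (bytes) for a promotion candidate

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
```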

Failover sequence with Patroni:

  1. Leader misses heartbeat in etcd → Patroni agents on the Followers detect loss of the leader lock
  2. Followers compare their WAL positions — most up-to-date wins the election
  3. Winner acquires etcd lock, promotes itself to Leader
  4. Other Followers receive new Leader information, switch to replicate from new Leader
  5. Former Leader restarts as a new Follower (after fencing prevents split-brain)

Typical failover time: 10–30 seconds
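Day-to-day, the same machinery can be driven manually with `patronictl` (the config path is illustrative):

```bash
# Inspect cluster state: which node holds the leader lock, each node's lag
patronictl -c /etc/patroni.yml list

# Planned leader change with no data loss (e.g. before maintenance)
patronictl -c /etc/patroni.yml switchover
```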


Multi-Region Replication Topologies

(Diagram: a Leader in US-East asynchronously replicates to read replicas in remote regions such as Singapore.)

Cross-region replication is almost always asynchronous — synchronous replication across AWS regions (100 ms+ RTT) would make every write take 100 ms+ longer.

Use case: Users in Singapore read from the Singapore replica (10ms latency), while all writes still go to US-East (100ms for the write itself, but reads are fast).


Frequently Asked Questions

What is "read-your-writes" consistency and how do I implement it? After a user writes data (e.g., updates their profile), they expect to see the updated data if they immediately read it back. But if the read goes to a replica with replication lag, they see stale data. Solutions: (1) Route reads to the leader for 1–2 seconds after a write. (2) Track the write's LSN/binlog position and route reads to replicas only after they've caught up to that position. (3) Use application-level sessions to pin a user to the leader for a short window post-write.
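Option (2) can be sketched as follows — a hypothetical router that remembers the Leader's LSN after each user's write and serves reads from a replica only once it has replayed past that position (all class and method names here are illustrative, not a real driver API):

```python
class LsnRouter:
    """Read-your-writes via LSN tracking (sketch; connection objects are
    assumed to expose execute(), current_lsn(), and replay_lsn())."""

    def __init__(self, leader, replicas):
        self.leader = leader
        self.replicas = replicas
        self._last_lsn = {}  # user_id -> Leader LSN after that user's last write

    def write(self, user_id, query):
        result = self.leader.execute(query)
        self._last_lsn[user_id] = self.leader.current_lsn()
        return result

    def read(self, user_id, query):
        needed = self._last_lsn.get(user_id, 0)
        for replica in self.replicas:
            if replica.replay_lsn() >= needed:  # replica has caught up
                return replica.execute(query)
        return self.leader.execute(query)  # no replica caught up: fall back
```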

Can I use Followers for other workloads like backups and analytics? Yes — this is a primary benefit of follower replicas. Designate one follower exclusively for pg_dump backups (backup replica) so the backup I/O doesn't affect production queries. Use another follower for long-running analytics queries that would lock production tables on the leader.


Key Takeaway

Leader-Follower replication is the non-negotiable foundation of production database reliability. Asynchronous replication maximizes write throughput but risks seconds of data loss on Leader failure. Synchronous replication guarantees zero data loss but adds latency to every write. Semi-synchronous (wait for receipt, not apply) provides the best balance for most production systems. The operational sophistication lies in automated failover (Patroni, Orchestrator), split-brain prevention (fencing), and application-level lag handling — skills that differentiate junior from senior infrastructure architects.

Read next: The Saga Pattern: Distributed Transactions Made Easy →


Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.