Leader-Follower Database Replication: High Availability, Read Scaling & Failover Engineering

Table of Contents
- Why a Single Database Is a Liability
- The Leader-Follower Architecture Explained
- How Replication Works: Binary Log and WAL
- Three Replication Modes: Async, Semi-Sync, Sync
- Replication Lag: Measurement and Mitigation
- Read Replica Routing Strategies
- Automated Failover: Patroni and Orchestrator
- Multi-Region Replication Topologies
- Handling Split-Brain: The Fencing Problem
- Cascading Followers and Chain Replication
- Frequently Asked Questions
- Key Takeaway
Why a Single Database Is a Liability
A single database is a Single Point of Failure (SPOF). In production:
| Event | Impact without Replication |
|---|---|
| Hardware failure (disk, NIC) | Total data loss risk, hours of downtime |
| OS/software crash | Minutes-to-hours RTO (Recovery Time Objective) |
| Maintenance window | Full application downtime |
| Read traffic spike | Writes blocked waiting for disk I/O |
| Backup operation | Dramatically slows production queries |
SLA implications: A single PostgreSQL instance delivers ~99.9% availability (8.7 hours downtime/year). A Leader-Follower cluster with automated failover delivers ~99.99% (52 minutes downtime/year).
The Leader-Follower Architecture Explained
Key invariants:
- Only the Leader accepts writes
- Every write to the Leader is replicated to all Followers
- Followers are read-only (applications cannot write to them)
- If the Leader fails, exactly one Follower is promoted to Leader
How Replication Works: Binary Log and WAL
PostgreSQL: Write-Ahead Log (WAL)
PostgreSQL uses streaming replication via WAL:
- Every write operation is first written to the WAL (a sequential log of all changes)
- The transaction is committed locally on the Leader
- A WAL sender process on the Leader streams WAL records to connected Followers
- A WAL receiver process on each Follower applies WAL records to its local data files
- Followers maintain a standby state — they replay incoming WAL but accept no writes
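The streaming setup above boils down to a handful of settings. A minimal configuration sketch (PostgreSQL 12+; hostnames and the replication user are placeholder assumptions):

```ini
# postgresql.conf on the Leader
wal_level = replica            # emit enough WAL for streaming replication
max_wal_senders = 10           # allow up to 10 concurrent WAL sender processes

# postgresql.auto.conf on each Follower
primary_conninfo = 'host=leader.example.internal port=5432 user=replicator'
# plus an empty standby.signal file in the data directory,
# which tells the instance to start in read-only standby mode
```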
MySQL: Binary Log (binlog)
MySQL uses an I/O thread + SQL thread model:
- Leader writes changes to the binary log (binlog)
- Follower's I/O thread connects to Leader and copies binlog events to a local relay log
- Follower's SQL thread reads the relay log and applies events to local tables
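From the follower side, the two-thread model above is wired up with a few statements. A minimal sketch (MySQL 8.0.23+ syntax; the host and user are placeholder assumptions, and GTID mode is assumed to be enabled):

```sql
-- On the Follower: point the I/O thread at the Leader
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'leader.example.internal',
  SOURCE_USER = 'replicator',
  SOURCE_AUTO_POSITION = 1;   -- GTID-based positioning instead of file/offset

START REPLICA;

-- Inspect I/O thread, SQL thread, and lag (Seconds_Behind_Source)
SHOW REPLICA STATUS\G
```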
Three Replication Modes: Async, Semi-Sync, Sync
| Mode | Leader waits for | Data loss on crash | Write latency | When to use |
|---|---|---|---|---|
| Asynchronous | Nothing | Up to seconds of data | Lowest | High-write throughput, some data loss tolerable |
| Semi-Synchronous | At least one follower to receive (not apply) | At most one transaction | Low (+1-2ms) | Default for production — best balance |
| Synchronous | All followers to apply | Zero data loss | Higher (+10-50ms) | Financial systems, zero-RPO requirements |
PostgreSQL synchronous_standby_names:
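A minimal postgresql.conf sketch mapping these modes onto PostgreSQL's knobs (standby names are placeholders):

```ini
# postgresql.conf on the Leader
# Semi-synchronous-like: wait for ANY one standby to confirm *receipt*
synchronous_commit = remote_write
synchronous_standby_names = 'ANY 1 (replica_a, replica_b)'

# Fully synchronous instead: wait for both standbys to *apply* the commit
# synchronous_commit = remote_apply
# synchronous_standby_names = 'FIRST 2 (replica_a, replica_b)'
```

With an empty `synchronous_standby_names`, PostgreSQL falls back to fully asynchronous replication.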
Replication Lag: Measurement and Mitigation
Replication lag is the time between a write being committed on the Leader and being visible on a Follower.
Causes:
- Network bandwidth saturation (high write volume)
- Follower CPU overloaded applying complex statements
- Long-running queries on Follower blocking WAL replay
Measuring lag:
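Two common PostgreSQL queries for this (they require a live cluster, so they are shown as fragments, not runnable examples):

```sql
-- On the Leader: per-follower replay lag in bytes (PostgreSQL 10+)
SELECT client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On a Follower: wall-clock time since the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
```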
Mitigation in application code:
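One minimal sketch of the "route reads to the leader briefly after a write" strategy, assuming an illustrative 2-second sticky window and placeholder endpoint names:

```python
import time

class LagAwareRouter:
    """Route a user's reads to the leader for a short window after their
    last write, so they read their own writes despite replica lag.
    (A sketch: the window length and endpoint names are assumptions.)"""

    def __init__(self, leader, replicas, sticky_seconds=2.0, clock=time.monotonic):
        self.leader = leader
        self.replicas = replicas
        self.sticky_seconds = sticky_seconds
        self.clock = clock                 # injectable for testing
        self._last_write = {}              # user_id -> timestamp of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def endpoint_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.sticky_seconds:
            return self.leader             # still inside the sticky window
        # outside the window: spread reads across replicas
        return self.replicas[hash(user_id) % len(self.replicas)]
```

An injectable clock keeps the sticky-window logic testable without real sleeps.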
Automated Failover: Patroni and Orchestrator
PostgreSQL: Patroni
Patroni is the industry standard for PostgreSQL HA:
Failover sequence with Patroni:
- Leader misses its heartbeat in etcd → Patroni agents on the Followers detect that the leader lock has expired
- Followers compare their WAL positions — most up-to-date wins the election
- Winner acquires etcd lock, promotes itself to Leader
- Other Followers receive new Leader information, switch to replicate from new Leader
- Former Leader restarts as a new Follower (after fencing prevents split-brain)
Typical failover time: 10–30 seconds
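A minimal patroni.yml sketch showing the knobs that drive this sequence (hostnames and the etcd endpoint are placeholders, and the etcd v3 API is assumed):

```yaml
# patroni.yml (excerpt)
scope: pg-cluster
name: node1

etcd3:
  hosts: etcd.example.internal:2379

bootstrap:
  dcs:
    ttl: 30                # leader-lock TTL; expiry triggers the election
    loop_wait: 10          # seconds between Patroni heartbeat loops
    retry_timeout: 10
    maximum_lag_on_failover: 1048576   # bytes; followers lagging more can't win

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1.example.internal:5432
```

The `ttl` and `loop_wait` values dominate detection time, which is why failover typically lands in the 10–30 second range rather than being instant.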
Multi-Region Replication Topologies
Cross-region replication is always asynchronous — synchronous replication across AWS regions (100ms+ RTT) would make every write take 100ms+ longer.
Use case: Users in Singapore read from the Singapore replica (10ms latency), while all writes still go to US-East (100ms for the write itself, but reads are fast).
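The routing rule this implies can be sketched in a few lines (region names and endpoints are illustrative assumptions):

```python
def route_query(is_write, user_region, replicas_by_region, leader_region="us-east-1"):
    """Send every write to the leader region; send reads to the replica in
    the user's region, falling back to the leader when no local replica
    exists. (A sketch with illustrative region names.)"""
    if is_write:
        return leader_region
    return replicas_by_region.get(user_region, leader_region)
```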
Frequently Asked Questions
What is "read-your-writes" consistency and how do I implement it? After a user writes data (e.g., updates their profile), they expect to see the updated data if they immediately read it back. But if the read goes to a replica with replication lag, they see stale data. Solutions: (1) Route reads to the leader for 1-2 seconds after a write. (2) Track the write's LSN/binlog position and route reads to replicas only after they've caught up to that position. (3) Use application-level sessions to pin a user to the leader for a short window post-write.
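Solution (2) can be sketched as follows. In PostgreSQL the leader-side position would come from `pg_current_wal_lsn()` and the replica side from `pg_last_wal_replay_lsn()`; here LSNs are modeled as plain integers and the function names are hypothetical, to keep the sketch self-contained:

```python
def replica_is_fresh(required_lsn, replica_replayed_lsn):
    """True when the replica has replayed at least up to the position
    produced by the session's last write."""
    return replica_replayed_lsn >= required_lsn

def choose_endpoint(session_lsn, replica_positions, leader):
    """Pick any replica that has caught up to the session's last-write
    position; otherwise fall back to the leader. (Illustrative names.)"""
    for name, replayed in replica_positions.items():
        if replica_is_fresh(session_lsn, replayed):
            return name
    return leader
```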
Can I use Followers for other workloads like backups and analytics?
Yes — this is a primary benefit of follower replicas. Designate one follower exclusively for pg_dump backups (a backup replica) so the backup I/O doesn't affect production queries. Use another follower for long-running analytics queries that would otherwise compete with production traffic for CPU, memory, and I/O on the leader.
Key Takeaway
Leader-Follower replication is the non-negotiable foundation of production database reliability. Asynchronous replication maximizes write throughput but risks seconds of data loss on Leader failure. Synchronous replication guarantees zero data loss but adds latency to every write. Semi-synchronous (wait for receipt, not apply) provides the best balance for most production systems. The operational sophistication lies in automated failover (Patroni, Orchestrator), split-brain prevention (fencing), and application-level lag handling — skills that differentiate junior from senior infrastructure architects.
Read next: The Saga Pattern: Distributed Transactions Made Easy →
Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.
