Leader-Follower Database Replication: High Availability, Read Scaling & Failover Engineering

Table of Contents
- Why a Single Database Is a Liability
- The Leader-Follower Architecture Explained
- How Replication Works: Binary Log and WAL
- Three Replication Modes: Async, Semi-Sync, Sync
- Replication Lag: Measurement and Mitigation
- Read Replica Routing Strategies
- Automated Failover: Patroni and Orchestrator
- Multi-Region Replication Topologies
- Handling Split-Brain: The Fencing Problem
- Cascading Followers and Chain Replication
- Frequently Asked Questions
- Key Takeaway
Why a Single Database Is a Liability
A single database is a Single Point of Failure (SPOF). In production:
| Event | Impact without Replication |
|---|---|
| Hardware failure (disk, NIC) | Total data loss risk, hours of downtime |
| OS/software crash | Minutes-to-hours RTO (Recovery Time Objective) |
| Maintenance window | Full application downtime |
| Read traffic spike | Writes blocked waiting for disk I/O |
| Backup operation | Dramatically slows production queries |
SLA implications: A single PostgreSQL instance delivers ~99.9% availability (8.7 hours downtime/year). A Leader-Follower cluster with automated failover delivers ~99.99% (52 minutes downtime/year).
The Leader-Follower Architecture Explained
Key invariants:
- Only the Leader accepts writes
- Every write to the Leader is replicated to all Followers
- Followers are read-only (applications cannot write to them)
- If the Leader fails, exactly one Follower is promoted to Leader
How Replication Works: Binary Log and WAL
PostgreSQL: Write-Ahead Log (WAL)
PostgreSQL uses streaming replication via WAL:
- Every write operation is first written to the WAL (a sequential log of all changes)
- The transaction is committed locally on the Leader
- A WAL sender process on the Leader streams WAL records to connected Followers
- A WAL receiver process on each Follower applies WAL records to its local data files
- Followers maintain a standby state — they replay incoming WAL but accept no writes
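The streaming setup above boils down to a handful of settings. A minimal configuration sketch (PostgreSQL 12+; hostnames and the replication user are placeholder assumptions):

```ini
# postgresql.conf on the Leader
wal_level = replica            # emit enough WAL for streaming replication
max_wal_senders = 10           # allow up to 10 concurrent WAL sender processes

# postgresql.auto.conf on each Follower
primary_conninfo = 'host=leader.example.internal port=5432 user=replicator'
# plus an empty standby.signal file in the data directory,
# which tells the instance to start in read-only standby mode
```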
MySQL: Binary Log (binlog)
MySQL uses an I/O thread + SQL thread model:
- Leader writes changes to the binary log (binlog)
- Follower's I/O thread connects to Leader and copies binlog events to a local relay log
- Follower's SQL thread reads the relay log and applies events to local tables
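From the follower side, the two-thread model above is wired up with a few statements. A minimal sketch (MySQL 8.0.23+ syntax; the host and user are placeholder assumptions, and GTID mode is assumed to be enabled):

```sql
-- On the Follower: point the I/O thread at the Leader
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'leader.example.internal',
  SOURCE_USER = 'replicator',
  SOURCE_AUTO_POSITION = 1;   -- GTID-based positioning instead of file/offset

START REPLICA;

-- Inspect I/O thread, SQL thread, and lag (Seconds_Behind_Source)
SHOW REPLICA STATUS\G
```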
Three Replication Modes: Async, Semi-Sync, Sync
| Mode | Leader waits for | Data loss on crash | Write latency | When to use |
|---|---|---|---|---|
| Asynchronous | Nothing | Up to seconds of data | Lowest | High-write throughput, some data loss tolerable |
| Semi-Synchronous | At least one follower to receive (not apply) | At most one transaction | Low (+1-2ms) | Default for production — best balance |
| Synchronous | All followers to apply | Zero data loss | Higher (+10-50ms) | Financial systems, zero-RPO requirements |
PostgreSQL synchronous_standby_names:
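A minimal postgresql.conf sketch mapping these modes onto PostgreSQL's knobs (standby names are placeholders):

```ini
# postgresql.conf on the Leader
# Semi-synchronous-like: wait for ANY one standby to confirm *receipt*
synchronous_commit = remote_write
synchronous_standby_names = 'ANY 1 (replica_a, replica_b)'

# Fully synchronous instead: wait for both standbys to *apply* the commit
# synchronous_commit = remote_apply
# synchronous_standby_names = 'FIRST 2 (replica_a, replica_b)'
```

With an empty `synchronous_standby_names`, PostgreSQL falls back to fully asynchronous replication.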
Replication Lag: Measurement and Mitigation
Replication lag is the time between a write being committed on the Leader and being visible on a Follower.
Causes:
- Network bandwidth saturation (high write volume)
- Follower CPU overloaded applying complex statements
- Long-running queries on Follower blocking WAL replay
Measuring lag:
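Two common PostgreSQL queries for this (they require a live cluster, so they are shown as fragments, not runnable examples):

```sql
-- On the Leader: per-follower replay lag in bytes (PostgreSQL 10+)
SELECT client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On a Follower: wall-clock time since the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
```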
Mitigation in application code:
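One minimal sketch of the "route reads to the leader briefly after a write" strategy, assuming an illustrative 2-second sticky window and placeholder endpoint names:

```python
import time

class LagAwareRouter:
    """Route a user's reads to the leader for a short window after their
    last write, so they read their own writes despite replica lag.
    (A sketch: the window length and endpoint names are assumptions.)"""

    def __init__(self, leader, replicas, sticky_seconds=2.0, clock=time.monotonic):
        self.leader = leader
        self.replicas = replicas
        self.sticky_seconds = sticky_seconds
        self.clock = clock                 # injectable for testing
        self._last_write = {}              # user_id -> timestamp of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def endpoint_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.sticky_seconds:
            return self.leader             # still inside the sticky window
        # outside the window: spread reads across replicas
        return self.replicas[hash(user_id) % len(self.replicas)]
```

An injectable clock keeps the sticky-window logic testable without real sleeps.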
Automated Failover: Patroni and Orchestrator
PostgreSQL: Patroni
Patroni is the industry standard for PostgreSQL HA:
Failover sequence with Patroni:
- Leader misses its heartbeat in etcd → Patroni agents on the Followers detect that the leader lock has expired
- Followers compare their WAL positions — most up-to-date wins the election
- Winner acquires etcd lock, promotes itself to Leader
- Other Followers receive new Leader information, switch to replicate from new Leader
- Former Leader restarts as a new Follower (after fencing prevents split-brain)
Typical failover time: 10–30 seconds
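A minimal patroni.yml sketch showing the knobs that drive this sequence (hostnames and the etcd endpoint are placeholders, and the etcd v3 API is assumed):

```yaml
# patroni.yml (excerpt)
scope: pg-cluster
name: node1

etcd3:
  hosts: etcd.example.internal:2379

bootstrap:
  dcs:
    ttl: 30                # leader-lock TTL; expiry triggers the election
    loop_wait: 10          # seconds between Patroni heartbeat loops
    retry_timeout: 10
    maximum_lag_on_failover: 1048576   # bytes; followers lagging more can't win

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1.example.internal:5432
```

The `ttl` and `loop_wait` values dominate detection time, which is why failover typically lands in the 10–30 second range rather than being instant.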
Multi-Region Replication Topologies
Cross-region replication is always asynchronous — synchronous replication across AWS regions (100ms+ RTT) would make every write take 100ms+ longer.
Use case: Users in Singapore read from the Singapore replica (10ms latency), while all writes still go to US-East (100ms for the write itself, but reads are fast).
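The routing rule this implies can be sketched in a few lines (region names and endpoints are illustrative assumptions):

```python
def route_query(is_write, user_region, replicas_by_region, leader_region="us-east-1"):
    """Send every write to the leader region; send reads to the replica in
    the user's region, falling back to the leader when no local replica
    exists. (A sketch with illustrative region names.)"""
    if is_write:
        return leader_region
    return replicas_by_region.get(user_region, leader_region)
```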
Frequently Asked Questions
What is "read-your-writes" consistency and how do I implement it? After a user writes data (e.g., updates their profile), they expect to see the updated data if they immediately read it back. But if the read goes to a replica with replication lag, they see stale data. Solutions: (1) Route reads to the leader for 1-2 seconds after a write. (2) Track the write's LSN/binlog position and route reads to replicas only after they've caught up to that position. (3) Use application-level sessions to pin a user to the leader for a short window post-write.
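Solution (2) can be sketched as follows. In PostgreSQL the leader-side position would come from `pg_current_wal_lsn()` and the replica side from `pg_last_wal_replay_lsn()`; here LSNs are modeled as plain integers and the function names are hypothetical, to keep the sketch self-contained:

```python
def replica_is_fresh(required_lsn, replica_replayed_lsn):
    """True when the replica has replayed at least up to the position
    produced by the session's last write."""
    return replica_replayed_lsn >= required_lsn

def choose_endpoint(session_lsn, replica_positions, leader):
    """Pick any replica that has caught up to the session's last-write
    position; otherwise fall back to the leader. (Illustrative names.)"""
    for name, replayed in replica_positions.items():
        if replica_is_fresh(session_lsn, replayed):
            return name
    return leader
```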
Can I use Followers for other workloads like backups and analytics?
Yes — this is a primary benefit of follower replicas. Designate one follower exclusively for pg_dump backups (a backup replica) so the backup I/O doesn't affect production queries. Use another follower for long-running analytics queries that would otherwise compete with production traffic for CPU, memory, and I/O on the leader.
Key Takeaway
Leader-Follower replication is the non-negotiable foundation of production database reliability. Asynchronous replication maximizes write throughput but risks seconds of data loss on Leader failure. Synchronous replication guarantees zero data loss but adds latency to every write. Semi-synchronous (wait for receipt, not apply) provides the best balance for most production systems. The operational sophistication lies in automated failover (Patroni, Orchestrator), split-brain prevention (fencing), and application-level lag handling — skills that differentiate junior from senior infrastructure architects.
Read next: The Saga Pattern: Distributed Transactions Made Easy →
Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.
