Load Balancing Strategies: Managing Traffic at Scale
A load balancer sits in front of your servers and distributes incoming requests so no single server bears the entire load. At small scale this is a convenience. At large scale it is the foundation of availability — a properly configured load balancer lets you take servers offline for maintenance, absorb traffic spikes by adding instances, and automatically route traffic away from unhealthy nodes.
This guide covers the major load balancing algorithms, the difference between Layer 4 and Layer 7 balancing, consistent hashing for stateful workloads, NGINX configuration examples, and health check patterns.
How Load Balancing Works
Without a load balancer, your DNS record points to a single server IP. Every user connects to that one server. When it crashes or gets overwhelmed, your site goes down.
With a load balancer:
```
User → DNS → Load Balancer (single public IP)
                    │
         ┌──────────┼──────────┐
         ▼          ▼          ▼
     Server A   Server B   Server C
     (private)  (private)  (private)
```
The load balancer is the only public-facing component. The backend servers live on a private network, invisible to the internet. This architecture provides:
- Horizontal scalability: Add more servers without changing DNS
- High availability: Remove any server without downtime
- Security: Backend servers never directly exposed to the internet
- SSL termination: Load balancer handles TLS, backends use plain HTTP
Load Balancing Algorithms
Round Robin
The simplest algorithm. Distribute requests sequentially across all servers:
```
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
```
Best for: Stateless services where all servers have identical capacity and handle similar request complexity.
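For intuition, here is the whole algorithm in a few lines of JavaScript (a minimal sketch; the server list is illustrative):
```js
// Minimal round-robin selection: cycle through servers with a counter.
const servers = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000'];
let next = 0;

function pickServer() {
  const server = servers[next];
  next = (next + 1) % servers.length; // wrap around after the last server
  return server;
}

// Requests 1..4 hit A, B, C, then A again.
for (let i = 1; i <= 4; i++) {
  console.log(`Request ${i} → ${pickServer()}`);
}
```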
Problem: If Server A is a 32-core machine and Server C is a 4-core VM, Round Robin sends them equal traffic. Server C gets crushed while Server A sits mostly idle.
Solution — Weighted Round Robin:
```nginx
upstream backend {
    server server_a weight=8;  # Gets 8 out of every 11 requests
    server server_b weight=2;  # Gets 2 out of every 11 requests
    server server_c weight=1;  # Gets 1 out of every 11 requests
}
```
Least Connections
The load balancer tracks how many active connections each server has and routes the next request to the server with the fewest active connections.
```
Server A: 100 active connections
Server B:   3 active connections  ← next request goes here
Server C:  47 active connections
```
Best for: Long-lived connections where processing time varies significantly. Examples:
- WebSocket connections (each may last hours)
- Video streaming (each stream holds a connection open)
- Database proxy connections
- File upload handlers (large uploads take longer)
Round Robin would send equal requests to all servers, but if Server A is slow at processing, it accumulates a backlog. Least Connections automatically steers traffic toward the faster, less-loaded servers.
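A sketch of the selection logic in JavaScript, assuming the balancer tracks an in-flight request count per backend (all names here are illustrative):
```js
// Pick the backend with the fewest in-flight requests.
const backends = [
  { addr: '10.0.0.1:3000', active: 100 },
  { addr: '10.0.0.2:3000', active: 3 },
  { addr: '10.0.0.3:3000', active: 47 },
];

function pickLeastConnections() {
  return backends.reduce((best, b) => (b.active < best.active ? b : best));
}

// The balancer increments `active` when a request starts
// and decrements it when the response completes.
const chosen = pickLeastConnections();
chosen.active++;
console.log(`Routing to ${chosen.addr}`); // → 10.0.0.2:3000
```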
Weighted Least Connections: Combines connection count with server weights:
```nginx
upstream backend {
    least_conn;
    server server_a weight=4;
    server server_b weight=2;
    server server_c weight=1;
}
```
IP Hash (Session Affinity / Sticky Sessions)
Routes the same client IP to the same server on every request:
```nginx
upstream backend {
    ip_hash;
    server server_a;
    server server_b;
    server server_c;
}
```
Best for: Applications with server-side session state that cannot be moved to a shared store yet. A user who logs in on Server A continues to hit Server A, so their session data is always available.
Problem: Uneven distribution. If a large corporate network appears as a single IP (via NAT), all of that company's users go to one server.
Better solution: Instead of IP hash, store sessions in Redis and use Round Robin or Least Connections. This eliminates the uneven distribution problem.
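A minimal sketch of that pattern with the node-redis client (the Redis URL, header name, and key format are assumptions for illustration):
```js
// Store session state in Redis so any backend can serve any user.
const { createClient } = require('redis');
const express = require('express');

const app = express();
const redis = createClient({ url: 'redis://10.0.0.50:6379' });

app.get('/me', async (req, res) => {
  // Any server behind the load balancer can look up the session.
  const sessionId = req.headers['x-session-id'];
  const session = await redis.get(`session:${sessionId}`);
  if (!session) return res.status(401).json({ error: 'not logged in' });
  res.json(JSON.parse(session));
});

redis.connect().then(() => app.listen(3000));
```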
Random with Two Choices (Power of Two Choices)
Pick two servers at random, then route to whichever has fewer active connections. This sounds counterintuitive, but comparing just two random choices instead of one reduces the worst-case server load dramatically (the classic "power of two choices" result), with almost no coordination overhead.
Used internally by Netflix's Zuul, HAProxy's random balancer, and many high-performance cloud load balancers.
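A sketch of the algorithm, reusing the active-connection bookkeeping from the least-connections example (the backend list is illustrative):
```js
// Power of two choices: sample two distinct backends at random,
// then send the request to whichever has fewer active connections.
const backends = [
  { addr: '10.0.0.1:3000', active: 12 },
  { addr: '10.0.0.2:3000', active: 5 },
  { addr: '10.0.0.3:3000', active: 30 },
  { addr: '10.0.0.4:3000', active: 9 },
];

function pickPowerOfTwo() {
  const i = Math.floor(Math.random() * backends.length);
  let j = Math.floor(Math.random() * (backends.length - 1));
  if (j >= i) j++; // ensure the second pick differs from the first
  const a = backends[i], b = backends[j];
  return a.active <= b.active ? a : b;
}

console.log(pickPowerOfTwo().addr);
```
The appeal over full least-connections is that the balancer only needs to inspect two counters per request, which matters when the pool has hundreds of backends or the counters live on other machines.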
Layer 4 vs. Layer 7 Load Balancing
This is one of the most important distinctions in load balancing:
Layer 4 (Transport Layer)
The load balancer operates at the TCP/UDP level. It routes packets based on source/destination IP and port without reading the packet content.
How it works:
- Client connects to load balancer on port 443
- Load balancer picks a backend server (by algorithm)
- Load balancer forwards the TCP stream to that server
- The backend handles TLS, HTTP parsing, everything else
Advantages:
- Extremely fast — no packet inspection required
- Works with any protocol (HTTP, HTTPS, gRPC, MQTT, custom TCP)
- Lower CPU usage
Limitations:
- Cannot route based on URL path, HTTP headers, or cookies
- Cannot do SSL termination (passes encrypted traffic through)
- No HTTP-level features (host-based routing, redirect rules)
Use Layer 4 when: You need maximum throughput and the routing logic is simple (just distribute by IP).
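To make the distinction concrete, here is a toy Layer 4 balancer built on Node's `net` module. It forwards raw bytes and never parses HTTP; the addresses and ports are made up:
```js
// Toy Layer 4 load balancer: round-robin over raw TCP streams.
const net = require('net');

const backends = [
  { host: '10.0.0.1', port: 3000 },
  { host: '10.0.0.2', port: 3000 },
];
let next = 0;

net.createServer((client) => {
  const target = backends[next];
  next = (next + 1) % backends.length;

  const upstream = net.connect(target.port, target.host);
  // Pipe bytes in both directions; we never look inside the stream,
  // so TLS, gRPC, or any custom protocol passes through untouched.
  client.pipe(upstream);
  upstream.pipe(client);

  client.on('error', () => upstream.destroy());
  upstream.on('error', () => client.destroy());
}).listen(443); // binding to 443 requires elevated privileges
```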
Layer 7 (Application Layer)
The load balancer reads and understands the HTTP request before routing it.
What Layer 7 can do that Layer 4 cannot:
| Feature | Example |
|---|---|
| URL-based routing | /api/* → API servers, /static/* → CDN |
| Host-based routing | app.example.com → app servers, api.example.com → API servers |
| Header-based routing | X-API-Version: 2 → v2 servers |
| Cookie-based affinity | Route based on session cookie (better than IP hash) |
| SSL termination | Load balancer decrypts HTTPS, backend gets plain HTTP |
| HTTP redirects | 301 redirect HTTP → HTTPS at the load balancer |
| Rate limiting | Block clients sending more than 100 req/s |
| Authentication | Validate JWT tokens before forwarding to backend |
| Request modification | Add headers, rewrite URLs, inject request IDs |
AWS ALB (Application Load Balancer) routing example:
```json
{
  "Rules": [
    {
      "Conditions": [{"Field": "path-pattern", "Values": ["/api/*"]}],
      "Actions": [{"Type": "forward", "TargetGroupArn": "arn:aws:...api-servers"}]
    },
    {
      "Conditions": [{"Field": "path-pattern", "Values": ["/static/*"]}],
      "Actions": [{"Type": "redirect", "RedirectConfig": {"Host": "cdn.example.com"}}]
    }
  ]
}
```
Use Layer 7 when: You need URL routing, SSL termination, A/B testing, canary deployments, or any HTTP-aware logic.
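And a toy Layer 7 counterpart using Node's `http` module. Unlike the Layer 4 sketch, it must parse each request before it can route on the URL path (the pools and addresses are illustrative):
```js
// Toy Layer 7 load balancer: route by URL path.
const http = require('http');

const pools = {
  api: [{ host: '10.0.1.1', port: 3000 }],
  web: [{ host: '10.0.2.1', port: 3000 }],
};

http.createServer((req, res) => {
  // The routing decision requires reading the HTTP request line.
  const pool = req.url.startsWith('/api/') ? pools.api : pools.web;
  const target = pool[Math.floor(Math.random() * pool.length)];

  const upstream = http.request(
    { host: target.host, port: target.port, path: req.url,
      method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );
  req.pipe(upstream);
  upstream.on('error', () => { res.statusCode = 502; res.end('Bad gateway'); });
}).listen(80);
```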
Consistent Hashing
Consistent hashing solves a critical problem with cache-backed systems.
The Problem with Modulo Hashing
Simple IP hash routes a client to `server = hash(ip) % num_servers`. With 3 servers:
- Client 192.168.1.10 → hash = 123 → 123 % 3 = 0 → Server A
- Client 192.168.1.20 → hash = 456 → 456 % 3 = 1 → Server B
If you add a 4th server, all the modulo math changes. 123 % 4 = 3 (Server D, not Server A). Nearly every client gets remapped to a different server. If those servers cache session data or user state, every user effectively loses their cache simultaneously — your database gets hammered.
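A quick simulation of the damage, using a toy hash (real balancers use stronger ones):
```js
// Count how many of 10,000 simulated clients change servers
// when the pool grows from 3 to 4 backends.
const crypto = require('crypto');

function hash(key) {
  // First 8 hex chars of md5, read as an integer — a toy stand-in.
  return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
}

let moved = 0;
const clients = 10000;
for (let i = 0; i < clients; i++) {
  const h = hash(`192.168.${i >> 8}.${i & 255}`);
  if (h % 3 !== h % 4) moved++; // server index changed after the resize
}
console.log(`${((moved / clients) * 100).toFixed(1)}% of clients remapped`);
// Typically prints ~75%. With consistent hashing, only ~25% would move.
```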
How Consistent Hashing Works
Visualize a ring of numbers from 0 to 2^32. Each server is placed at a position on the ring (by hashing its IP/name). Each request is placed on the ring (by hashing the client IP or session ID). The request goes to the first server clockwise from its position on the ring.
```
     Server A (100)        Server B (400)        Server C (800)
          │                     │                     │
0 ────────┼─────────────────────┼─────────────────────┼──────── 2^32
Requests 0-100    → Server A
Requests 101-400  → Server B
Requests 401-800  → Server C
Requests 801+     → wrap around to Server A
```
Adding a new server: The new server only takes over the requests that fall between it and its predecessor on the ring. On average only 1/N of clients move — not all of them.
Virtual nodes: To prevent hot spots (one server getting more load due to ring position), each physical server is represented by multiple positions on the ring (virtual nodes). With 150 virtual nodes per server, the distribution is statistically uniform.
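A compact hash ring with virtual nodes, as a sketch rather than a production library (the hash function and virtual node count are illustrative):
```js
// Consistent hash ring with virtual nodes.
const crypto = require('crypto');

function hash(key) {
  return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
}

class HashRing {
  constructor(servers, vnodes = 150) {
    // Each server appears `vnodes` times on the ring to smooth distribution.
    this.ring = [];
    for (const server of servers) {
      for (let v = 0; v < vnodes; v++) {
        this.ring.push({ pos: hash(`${server}#${v}`), server });
      }
    }
    this.ring.sort((a, b) => a.pos - b.pos);
  }

  lookup(key) {
    const h = hash(key);
    // First virtual node clockwise from the key's position
    // (a binary search would be used in practice; linear scan for clarity).
    const node = this.ring.find((n) => n.pos >= h);
    return (node || this.ring[0]).server; // wrap around past 2^32
  }
}

const ring = new HashRing(['server-a', 'server-b', 'server-c']);
console.log(ring.lookup('user:42')); // same key → same server, every time
```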
Where Consistent Hashing Is Used
- Redis Cluster: Distributes keys across nodes using 16,384 fixed hash slots, a close relative of consistent hashing (keys map to slots, slots map to nodes)
- Cassandra: Uses a consistent hash ring to distribute partitions across nodes
- CDNs: Consistent hashing determines which edge node caches which content
- Distributed caches (Memcached): Client-side consistent hashing routes gets/sets
NGINX as a Load Balancer
NGINX is the most widely deployed software load balancer. A single 16-core NGINX instance can handle 100,000+ connections per second.
Basic Round Robin Setup
```nginx
# /etc/nginx/nginx.conf
http {
    upstream my_app {
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://my_app;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```
SSL Termination + Least Connections
```nginx
http {
    upstream my_app {
        least_conn;
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
        keepalive 32;  # Cache up to 32 idle keepalive connections per worker
    }

    server {
        listen 443 ssl http2;
        server_name example.com;

        ssl_certificate     /etc/ssl/example.com.crt;
        ssl_certificate_key /etc/ssl/example.com.key;
        ssl_protocols       TLSv1.2 TLSv1.3;

        location / {
            proxy_pass http://my_app;
            proxy_http_version 1.1;
            proxy_set_header Connection "";  # Required for keepalive
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
        }
    }

    # Redirect HTTP to HTTPS
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }
}
```
Health Checks
NGINX Plus (commercial) has active health checks built in. The open-source version uses passive health checks — it marks a server as failed when it receives enough errors:
```nginx
upstream my_app {
    server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:3000 max_fails=3 fail_timeout=30s;
}
```
`max_fails=3 fail_timeout=30s` means: if 3 failures occur within 30 seconds, mark the server as unavailable for 30 seconds, then retry.
For active health checks in open-source NGINX, use a third-party module such as ngx_http_healthcheck_module, or use HAProxy instead.
Health Checks: How Load Balancers Self-Heal
Health checks are probes the load balancer sends to each backend server to verify it can handle traffic.
Types of Health Checks
TCP health check: Try to establish a TCP connection to the server port. If it succeeds, the server is alive.
```json
{
  "HealthCheck": {
    "Protocol": "TCP",
    "Port": 3000,
    "Interval": 10,
    "Timeout": 5,
    "HealthyThreshold": 2,
    "UnhealthyThreshold": 3
  }
}
```
HTTP health check: Send an HTTP GET request and check the response status code.
```json
{
  "HealthCheck": {
    "Protocol": "HTTP",
    "Path": "/health",
    "Port": 3000,
    "Interval": 10,
    "Timeout": 5,
    "Matcher": {"HttpCode": "200"}
  }
}
```
Your application must expose a /health endpoint:
```js
// Express.js health check endpoint
app.get('/health', async (req, res) => {
  res.json({ status: 'ok', timestamp: Date.now() });
});

// More thorough readiness check
app.get('/ready', async (req, res) => {
  try {
    await db.query('SELECT 1');
    res.json({ status: 'ready', database: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'unavailable', database: err.message });
  }
});
```
Separate liveness from readiness:
- `/health` (liveness): Is the process running? Should it be restarted if it returns unhealthy?
- `/ready` (readiness): Is the server ready to accept traffic? Should the load balancer send it requests?
A server can be alive but not ready (still warming up, waiting for database). Use /ready as the load balancer health check endpoint so traffic is only routed to fully initialized instances.
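For illustration, here is the kind of active health-check loop a load balancer runs internally, as a sketch (the interval, failure threshold, and /ready path mirror the examples above but are otherwise assumptions):
```js
// Poll each backend's /ready endpoint and track which are in rotation.
const http = require('http');

const backends = [
  { host: '10.0.0.1', port: 3000, healthy: true, failures: 0 },
  { host: '10.0.0.2', port: 3000, healthy: true, failures: 0 },
];

function probe(backend) {
  const req = http.get(
    { host: backend.host, port: backend.port, path: '/ready', timeout: 5000 },
    (res) => {
      if (res.statusCode === 200) {
        backend.failures = 0;
        backend.healthy = true; // back in rotation
      } else {
        recordFailure(backend);
      }
      res.resume(); // drain the response body
    }
  );
  req.on('error', () => recordFailure(backend));
  req.on('timeout', () => req.destroy()); // destroy() triggers 'error'
}

function recordFailure(backend) {
  backend.failures++;
  if (backend.failures >= 3) backend.healthy = false; // out of rotation
}

setInterval(() => backends.forEach(probe), 10000); // probe every 10s
```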
Global Load Balancing (GeoDNS + Anycast)
For applications serving users worldwide, traffic should be routed to the nearest data center.
GeoDNS: DNS returns different IP addresses based on the user's geographic location. A user in London gets the London server IP; a user in Tokyo gets the Tokyo server IP.
Anycast: Multiple servers in different regions share the same public IP address. BGP routing directs each user to the nearest server automatically. Used by CDNs (Cloudflare, Fastly) and DNS resolvers (1.1.1.1, 8.8.8.8).
AWS Global Accelerator: Combines Anycast with intelligent routing — traffic enters AWS's backbone network at the edge and is routed to the optimal AWS region, bypassing congested public internet paths.
Frequently Asked Questions
Q: What is the difference between a load balancer and a reverse proxy?
A reverse proxy (like NGINX) sits in front of your servers and forwards requests on their behalf. A load balancer distributes those requests across multiple servers. In practice, NGINX does both simultaneously — it is a reverse proxy that can also load balance. AWS ALB and Cloudflare Load Balancing are dedicated load balancers.
Q: When should I use Least Connections instead of Round Robin?
Use Least Connections when your request handling time varies significantly. For example, a REST API where most endpoints return in 50ms but one endpoint does a heavy report that takes 10 seconds. Round Robin would pile up 10-second requests on the current server. Least Connections would route new fast requests away from the server still processing the slow one.
Q: How do you handle WebSocket connections in a load balancer?
A Layer 4 load balancer passes WebSocket traffic through untouched, but a Layer 7 load balancer must be explicitly configured to forward the HTTP Upgrade handshake. In NGINX:
```nginx
location /socket.io/ {
    proxy_pass http://my_app;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;  # Keep WebSocket connections open
}
```
For sticky sessions with WebSockets, use IP hash or cookie-based session affinity to ensure reconnections go back to the same server.
Q: Can I use NGINX as a load balancer for free?
Yes. NGINX open-source is free and handles most load balancing needs. Active health checks, advanced session draining, and JWT authentication are NGINX Plus (paid) features. HAProxy is a free, open-source alternative with active health checks included.
Key Takeaway
Load balancing is the infrastructure of reliability. The algorithm you choose — Round Robin, Least Connections, or Consistent Hashing — should match your traffic pattern and whether your workload is stateless or stateful. Layer 7 balancing is the modern default for HTTP services because it enables routing intelligence, SSL termination, and health checks. Combine your load balancer with proper health check endpoints in your application and you have the foundation of a self-healing system that routes around failures automatically, invisible to your users.
Read next: Caching Strategies: Redis, CDN, and Beyond →
