Load Balancing Strategies: Managing Traffic at Scale
A load balancer sits in front of your servers and distributes incoming requests so no single server bears the entire load. At small scale this is a convenience. At large scale it is the foundation of availability — a properly configured load balancer lets you take servers offline for maintenance, absorb traffic spikes by adding instances, and automatically route traffic away from unhealthy nodes.
This guide covers the major load balancing algorithms, the difference between Layer 4 and Layer 7 balancing, consistent hashing for stateful workloads, NGINX configuration examples, and health check patterns.
How Load Balancing Works
Without a load balancer, your DNS record points to a single server IP. Every user connects to that one server. When it crashes or gets overwhelmed, your site goes down.
With a load balancer:
```
User → DNS → Load Balancer (single public IP)
                    │
         ┌──────────┼──────────┐
         ▼          ▼          ▼
     Server A   Server B   Server C
     (private)  (private)  (private)
```
The load balancer is the only public-facing component. The backend servers live on a private network, invisible to the internet. This architecture provides:
- Horizontal scalability: Add more servers without changing DNS
- High availability: Remove any server without downtime
- Security: Backend servers never directly exposed to the internet
- SSL termination: Load balancer handles TLS, backends use plain HTTP
Load Balancing Algorithms
Round Robin
The simplest algorithm. Distribute requests sequentially across all servers:
```
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
```
Best for: Stateless services where all servers have identical capacity and handle similar request complexity.
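For intuition, here is the whole algorithm in a few lines of JavaScript (a minimal sketch; the server list is illustrative):
```js
// Minimal round-robin selection: cycle through servers with a counter.
const servers = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000'];
let next = 0;

function pickServer() {
  const server = servers[next];
  next = (next + 1) % servers.length; // wrap around after the last server
  return server;
}

// Requests 1..4 hit A, B, C, then A again.
for (let i = 1; i <= 4; i++) {
  console.log(`Request ${i} → ${pickServer()}`);
}
```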
Problem: If Server A is a 32-core machine and Server C is a 4-core VM, Round Robin sends them equal traffic. Server C gets crushed while Server A sits mostly idle.
Solution — Weighted Round Robin:
```nginx
upstream backend {
    server server_a weight=8;  # Gets 8 out of every 11 requests
    server server_b weight=2;  # Gets 2 out of every 11 requests
    server server_c weight=1;  # Gets 1 out of every 11 requests
}
```
Least Connections
The load balancer tracks how many active connections each server has and routes the next request to the server with the fewest active connections.
```
Server A: 100 active connections
Server B:   3 active connections  ← next request goes here
Server C:  47 active connections
```
Best for: Long-lived connections where processing time varies significantly. Examples:
- WebSocket connections (each may last hours)
- Video streaming (each stream holds a connection open)
- Database proxy connections
- File upload handlers (large uploads take longer)
Round Robin would send equal requests to all servers, but if Server A is slow at processing, it accumulates a backlog. Least Connections automatically steers traffic toward the faster, less-loaded servers.
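A sketch of the selection logic in JavaScript, assuming the balancer tracks an in-flight request count per backend (all names here are illustrative):
```js
// Pick the backend with the fewest in-flight requests.
const backends = [
  { addr: '10.0.0.1:3000', active: 100 },
  { addr: '10.0.0.2:3000', active: 3 },
  { addr: '10.0.0.3:3000', active: 47 },
];

function pickLeastConnections() {
  return backends.reduce((best, b) => (b.active < best.active ? b : best));
}

// The balancer increments `active` when a request starts
// and decrements it when the response completes.
const chosen = pickLeastConnections();
chosen.active++;
console.log(`Routing to ${chosen.addr}`); // → 10.0.0.2:3000
```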
Weighted Least Connections: Combines connection count with server weights:
```nginx
upstream backend {
    least_conn;
    server server_a weight=4;
    server server_b weight=2;
    server server_c weight=1;
}
```
IP Hash (Session Affinity / Sticky Sessions)
Routes the same client IP to the same server on every request:
```nginx
upstream backend {
    ip_hash;
    server server_a;
    server server_b;
    server server_c;
}
```
Best for: Applications with server-side session state that cannot be moved to a shared store yet. A user who logs in on Server A continues to hit Server A, so their session data is always available.
Problem: Uneven distribution. If a large corporate network appears as a single IP (via NAT), all of that company's users go to one server.
Better solution: Instead of IP hash, store sessions in Redis and use Round Robin or Least Connections. This eliminates the uneven distribution problem.
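A minimal sketch of that pattern with the node-redis client (the Redis URL, header name, and key format are assumptions for illustration):
```js
// Store session state in Redis so any backend can serve any user.
const { createClient } = require('redis');
const express = require('express');

const app = express();
const redis = createClient({ url: 'redis://10.0.0.50:6379' });

app.get('/me', async (req, res) => {
  // Any server behind the load balancer can look up the session.
  const sessionId = req.headers['x-session-id'];
  const session = await redis.get(`session:${sessionId}`);
  if (!session) return res.status(401).json({ error: 'not logged in' });
  res.json(JSON.parse(session));
});

redis.connect().then(() => app.listen(3000));
```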
Random with Two Choices (Power of Two Choices)
Pick two servers at random, then route to whichever has fewer active connections. This sounds counterintuitive, but comparing just two random choices instead of one reduces the worst-case server load dramatically (the classic "power of two choices" result), with almost no coordination overhead.
Used internally by Netflix's Zuul, HAProxy's random balancer, and many high-performance cloud load balancers.
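A sketch of the algorithm, reusing the active-connection bookkeeping from the least-connections example (the backend list is illustrative):
```js
// Power of two choices: sample two distinct backends at random,
// then send the request to whichever has fewer active connections.
const backends = [
  { addr: '10.0.0.1:3000', active: 12 },
  { addr: '10.0.0.2:3000', active: 5 },
  { addr: '10.0.0.3:3000', active: 30 },
  { addr: '10.0.0.4:3000', active: 9 },
];

function pickPowerOfTwo() {
  const i = Math.floor(Math.random() * backends.length);
  let j = Math.floor(Math.random() * (backends.length - 1));
  if (j >= i) j++; // ensure the second pick differs from the first
  const a = backends[i], b = backends[j];
  return a.active <= b.active ? a : b;
}

console.log(pickPowerOfTwo().addr);
```
The appeal over full least-connections is that the balancer only needs to inspect two counters per request, which matters when the pool has hundreds of backends or the counters live on other machines.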
Layer 4 vs. Layer 7 Load Balancing
This is one of the most important distinctions in load balancing:
Layer 4 (Transport Layer)
The load balancer operates at the TCP/UDP level. It routes packets based on source/destination IP and port without reading the packet content.
How it works:
- Client connects to load balancer on port 443
- Load balancer picks a backend server (by algorithm)
- Load balancer forwards the TCP stream to that server
- The backend handles TLS, HTTP parsing, everything else
Advantages:
- Extremely fast — no packet inspection required
- Works with any protocol (HTTP, HTTPS, gRPC, MQTT, custom TCP)
- Lower CPU usage
Limitations:
- Cannot route based on URL path, HTTP headers, or cookies
- Cannot do SSL termination (passes encrypted traffic through)
- No HTTP-level features (host-based routing, redirect rules)
Use Layer 4 when: You need maximum throughput and the routing logic is simple (just distribute by IP).
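To make the distinction concrete, here is a toy Layer 4 balancer built on Node's `net` module. It forwards raw bytes and never parses HTTP; the addresses and ports are made up:
```js
// Toy Layer 4 load balancer: round-robin over raw TCP streams.
const net = require('net');

const backends = [
  { host: '10.0.0.1', port: 3000 },
  { host: '10.0.0.2', port: 3000 },
];
let next = 0;

net.createServer((client) => {
  const target = backends[next];
  next = (next + 1) % backends.length;

  const upstream = net.connect(target.port, target.host);
  // Pipe bytes in both directions; we never look inside the stream,
  // so TLS, gRPC, or any custom protocol passes through untouched.
  client.pipe(upstream);
  upstream.pipe(client);

  client.on('error', () => upstream.destroy());
  upstream.on('error', () => client.destroy());
}).listen(443); // binding to 443 requires elevated privileges
```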
Layer 7 (Application Layer)
The load balancer reads and understands the HTTP request before routing it.
What Layer 7 can do that Layer 4 cannot:
| Feature | Example |
|---|---|
| URL-based routing | /api/* → API servers, /static/* → CDN |
| Host-based routing | app.example.com → app servers, api.example.com → API servers |
| Header-based routing | X-API-Version: 2 → v2 servers |
| Cookie-based affinity | Route based on session cookie (better than IP hash) |
| SSL termination | Load balancer decrypts HTTPS, backend gets plain HTTP |
| HTTP redirects | 301 redirect HTTP → HTTPS at the load balancer |
| Rate limiting | Block clients sending more than 100 req/s |
| Authentication | Validate JWT tokens before forwarding to backend |
| Request modification | Add headers, rewrite URLs, inject request IDs |
AWS ALB (Application Load Balancer) routing example:
```json
{
  "Rules": [
    {
      "Conditions": [{"Field": "path-pattern", "Values": ["/api/*"]}],
      "Actions": [{"Type": "forward", "TargetGroupArn": "arn:aws:...api-servers"}]
    },
    {
      "Conditions": [{"Field": "path-pattern", "Values": ["/static/*"]}],
      "Actions": [{"Type": "redirect", "RedirectConfig": {"Host": "cdn.example.com"}}]
    }
  ]
}
```
Use Layer 7 when: You need URL routing, SSL termination, A/B testing, canary deployments, or any HTTP-aware logic.
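And a toy Layer 7 counterpart using Node's `http` module. Unlike the Layer 4 sketch, it must parse each request before it can route on the URL path (the pools and addresses are illustrative):
```js
// Toy Layer 7 load balancer: route by URL path.
const http = require('http');

const pools = {
  api: [{ host: '10.0.1.1', port: 3000 }],
  web: [{ host: '10.0.2.1', port: 3000 }],
};

http.createServer((req, res) => {
  // The routing decision requires reading the HTTP request line.
  const pool = req.url.startsWith('/api/') ? pools.api : pools.web;
  const target = pool[Math.floor(Math.random() * pool.length)];

  const upstream = http.request(
    { host: target.host, port: target.port, path: req.url,
      method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );
  req.pipe(upstream);
  upstream.on('error', () => { res.statusCode = 502; res.end('Bad gateway'); });
}).listen(80);
```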
Consistent Hashing
Consistent hashing solves a critical problem with cache-backed systems.
The Problem with Modulo Hashing
Simple IP hash routes a client to `server = hash(ip) % num_servers`. With 3 servers:
- Client 192.168.1.10 → hash = 123 → 123 % 3 = 0 → Server A
- Client 192.168.1.20 → hash = 456 → 456 % 3 = 1 → Server B
If you add a 4th server, all the modulo math changes. 123 % 4 = 3 (Server D, not Server A). Nearly every client gets remapped to a different server. If those servers cache session data or user state, every user effectively loses their cache simultaneously — your database gets hammered.
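A quick simulation of the damage, using a toy hash (real balancers use stronger ones):
```js
// Count how many of 10,000 simulated clients change servers
// when the pool grows from 3 to 4 backends.
const crypto = require('crypto');

function hash(key) {
  // First 8 hex chars of md5, read as an integer — a toy stand-in.
  return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
}

let moved = 0;
const clients = 10000;
for (let i = 0; i < clients; i++) {
  const h = hash(`192.168.${i >> 8}.${i & 255}`);
  if (h % 3 !== h % 4) moved++; // server index changed after the resize
}
console.log(`${((moved / clients) * 100).toFixed(1)}% of clients remapped`);
// Typically prints ~75%. With consistent hashing, only ~25% would move.
```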
How Consistent Hashing Works
Visualize a ring of numbers from 0 to 2^32. Each server is placed at a position on the ring (by hashing its IP/name). Each request is placed on the ring (by hashing the client IP or session ID). The request goes to the first server clockwise from its position on the ring.
```
     Server A (100)        Server B (400)        Server C (800)
          │                     │                     │
0 ────────┼─────────────────────┼─────────────────────┼──────── 2^32
Requests 0-100    → Server A
Requests 101-400  → Server B
Requests 401-800  → Server C
Requests 801+     → wrap around to Server A
```
Adding a new server: The new server only takes over the requests that fall between it and its predecessor on the ring. On average only 1/N of clients move — not all of them.
Virtual nodes: To prevent hot spots (one server getting more load due to ring position), each physical server is represented by multiple positions on the ring (virtual nodes). With 150 virtual nodes per server, the distribution is statistically uniform.
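A compact hash ring with virtual nodes, as a sketch rather than a production library (the hash function and virtual node count are illustrative):
```js
// Consistent hash ring with virtual nodes.
const crypto = require('crypto');

function hash(key) {
  return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
}

class HashRing {
  constructor(servers, vnodes = 150) {
    // Each server appears `vnodes` times on the ring to smooth distribution.
    this.ring = [];
    for (const server of servers) {
      for (let v = 0; v < vnodes; v++) {
        this.ring.push({ pos: hash(`${server}#${v}`), server });
      }
    }
    this.ring.sort((a, b) => a.pos - b.pos);
  }

  lookup(key) {
    const h = hash(key);
    // First virtual node clockwise from the key's position
    // (a binary search would be used in practice; linear scan for clarity).
    const node = this.ring.find((n) => n.pos >= h);
    return (node || this.ring[0]).server; // wrap around past 2^32
  }
}

const ring = new HashRing(['server-a', 'server-b', 'server-c']);
console.log(ring.lookup('user:42')); // same key → same server, every time
```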
Where Consistent Hashing Is Used
- Redis Cluster: Distributes keys across nodes using 16,384 fixed hash slots, a close relative of consistent hashing (keys map to slots, slots map to nodes)
- Cassandra: Uses a consistent hash ring to distribute partitions across nodes
- CDNs: Consistent hashing determines which edge node caches which content
- Distributed caches (Memcached): Client-side consistent hashing routes gets/sets
NGINX as a Load Balancer
NGINX is the most widely deployed software load balancer. A single 16-core NGINX instance can handle 100,000+ connections per second.
Basic Round Robin Setup
```nginx
# /etc/nginx/nginx.conf
http {
    upstream my_app {
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://my_app;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```
SSL Termination + Least Connections
```nginx
http {
    upstream my_app {
        least_conn;
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
        keepalive 32;  # Cache up to 32 idle keepalive connections per worker
    }

    server {
        listen 443 ssl http2;
        server_name example.com;

        ssl_certificate     /etc/ssl/example.com.crt;
        ssl_certificate_key /etc/ssl/example.com.key;
        ssl_protocols       TLSv1.2 TLSv1.3;

        location / {
            proxy_pass http://my_app;
            proxy_http_version 1.1;
            proxy_set_header Connection "";  # Required for keepalive
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
        }
    }

    # Redirect HTTP to HTTPS
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }
}
```
Health Checks
NGINX Plus (commercial) has active health checks built in. The open-source version uses passive health checks — it marks a server as failed when it receives enough errors:
```nginx
upstream my_app {
    server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:3000 max_fails=3 fail_timeout=30s;
}
```
`max_fails=3 fail_timeout=30s` means: if 3 failures occur within 30 seconds, mark the server as unavailable for 30 seconds, then retry.
For active health checks in open-source NGINX, use a third-party module such as ngx_http_healthcheck_module, or use HAProxy instead.
Health Checks: How Load Balancers Self-Heal
Health checks are probes the load balancer sends to each backend server to verify it can handle traffic.
Types of Health Checks
TCP health check: Try to establish a TCP connection to the server port. If it succeeds, the server is alive.
```json
{
  "HealthCheck": {
    "Protocol": "TCP",
    "Port": 3000,
    "Interval": 10,
    "Timeout": 5,
    "HealthyThreshold": 2,
    "UnhealthyThreshold": 3
  }
}
```
HTTP health check: Send an HTTP GET request and check the response status code.
```json
{
  "HealthCheck": {
    "Protocol": "HTTP",
    "Path": "/health",
    "Port": 3000,
    "Interval": 10,
    "Timeout": 5,
    "Matcher": {"HttpCode": "200"}
  }
}
```
Your application must expose a /health endpoint:
```js
// Express.js health check endpoint
app.get('/health', async (req, res) => {
  res.json({ status: 'ok', timestamp: Date.now() });
});

// More thorough readiness check
app.get('/ready', async (req, res) => {
  try {
    await db.query('SELECT 1');
    res.json({ status: 'ready', database: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'unavailable', database: err.message });
  }
});
```
Separate liveness from readiness:
- `/health` (liveness): Is the process running? Should it be restarted if it returns unhealthy?
- `/ready` (readiness): Is the server ready to accept traffic? Should the load balancer send it requests?
A server can be alive but not ready (still warming up, waiting for database). Use /ready as the load balancer health check endpoint so traffic is only routed to fully initialized instances.
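For illustration, here is the kind of active health-check loop a load balancer runs internally, as a sketch (the interval, failure threshold, and /ready path mirror the examples above but are otherwise assumptions):
```js
// Poll each backend's /ready endpoint and track which are in rotation.
const http = require('http');

const backends = [
  { host: '10.0.0.1', port: 3000, healthy: true, failures: 0 },
  { host: '10.0.0.2', port: 3000, healthy: true, failures: 0 },
];

function probe(backend) {
  const req = http.get(
    { host: backend.host, port: backend.port, path: '/ready', timeout: 5000 },
    (res) => {
      if (res.statusCode === 200) {
        backend.failures = 0;
        backend.healthy = true; // back in rotation
      } else {
        recordFailure(backend);
      }
      res.resume(); // drain the response body
    }
  );
  req.on('error', () => recordFailure(backend));
  req.on('timeout', () => req.destroy()); // destroy() triggers 'error'
}

function recordFailure(backend) {
  backend.failures++;
  if (backend.failures >= 3) backend.healthy = false; // out of rotation
}

setInterval(() => backends.forEach(probe), 10000); // probe every 10s
```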
Global Load Balancing (GeoDNS + Anycast)
For applications serving users worldwide, traffic should be routed to the nearest data center.
GeoDNS: DNS returns different IP addresses based on the user's geographic location. A user in London gets the London server IP; a user in Tokyo gets the Tokyo server IP.
Anycast: Multiple servers in different regions share the same public IP address. BGP routing directs each user to the nearest server automatically. Used by CDNs (Cloudflare, Fastly) and DNS resolvers (1.1.1.1, 8.8.8.8).
AWS Global Accelerator: Combines Anycast with intelligent routing — traffic enters AWS's backbone network at the edge and is routed to the optimal AWS region, bypassing congested public internet paths.
Frequently Asked Questions
Q: What is the difference between a load balancer and a reverse proxy?
A reverse proxy (like NGINX) sits in front of your servers and forwards requests on their behalf. A load balancer distributes those requests across multiple servers. In practice, NGINX does both simultaneously — it is a reverse proxy that can also load balance. AWS ALB and Cloudflare Load Balancing are dedicated load balancers.
Q: When should I use Least Connections instead of Round Robin?
Use Least Connections when your request handling time varies significantly. For example, a REST API where most endpoints return in 50ms but one endpoint does a heavy report that takes 10 seconds. Round Robin would pile up 10-second requests on the current server. Least Connections would route new fast requests away from the server still processing the slow one.
Q: How do you handle WebSocket connections in a load balancer?
A Layer 4 load balancer passes WebSocket traffic through untouched, but a Layer 7 load balancer must be explicitly configured to forward the HTTP Upgrade handshake. In NGINX:
```nginx
location /socket.io/ {
    proxy_pass http://my_app;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;  # Keep WebSocket connections open
}
```
For sticky sessions with WebSockets, use IP hash or cookie-based session affinity to ensure reconnections go back to the same server.
Q: Can I use NGINX as a load balancer for free?
Yes. NGINX open-source is free and handles most load balancing needs. Active health checks, advanced session draining, and JWT authentication are NGINX Plus (paid) features. HAProxy is a free, open-source alternative with active health checks included.
Key Takeaway
Load balancing is the infrastructure of reliability. The algorithm you choose — Round Robin, Least Connections, or Consistent Hashing — should match your traffic pattern and whether your workload is stateless or stateful. Layer 7 balancing is the modern default for HTTP services because it enables routing intelligence, SSL termination, and health checks. Combine your load balancer with proper health check endpoints in your application and you have the foundation of a self-healing system that routes around failures automatically, invisible to your users.
Read next: Caching Strategies: Redis, CDN, and Beyond →
