API Gateway: The Security and Traffic Guard

An API gateway is the single entry point to a microservices architecture. Every external request passes through it before reaching any internal service. The gateway handles cross-cutting concerns — authentication, rate limiting, routing, protocol translation, request/response transformation — so that individual services do not need to implement these themselves.
This guide covers the core capabilities of an API gateway, implementation patterns for authentication at the edge, rate limiting strategies, the Backend-for-Frontend (BFF) pattern, gateway tooling selection, and the trade-offs between a centralised gateway and distributed service meshes.
What an API Gateway Does
External Client
        │
        ▼
┌──────────────────────────────────────────────┐
│                 API Gateway                  │
│                                              │
│ 1. TLS termination                           │
│ 2. Authentication (validate JWT/API key)     │
│ 3. Authorisation (check permissions)         │
│ 4. Rate limiting (per user, per IP)          │
│ 5. Request routing (path → service)          │
│ 6. Protocol translation (REST ↔ gRPC)        │
│ 7. Request/response transformation           │
│ 8. Response aggregation                      │
│ 9. Logging and distributed tracing           │
└──────────────────────────────────────────────┘
      │              │              │
      ▼              ▼              ▼
User Service   Order Service   Payment Service

Without a gateway, every microservice must implement authentication, rate limiting, and logging independently — duplicating work and creating inconsistency. The gateway centralises these concerns.
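The numbered stages can be thought of as a short-circuiting pipeline: each stage either produces a final response or passes the request onward. The sketch below is an illustrative model only, not any real gateway's API; every type and stage name here is invented for the example.

```typescript
// Illustrative model of the gateway pipeline: each stage either produces a
// final response (short-circuit) or returns null to pass the request onward.
type GatewayRequest = { path: string; headers: Record<string, string> };
type GatewayResponse = { status: number; body: string };
type Stage = (req: GatewayRequest) => GatewayResponse | null;

// Stage 2: reject requests that carry no credentials at all
const authenticate: Stage = (req) =>
  req.headers['authorization'] ? null : { status: 401, body: 'Unauthorized' };

// Stage 4: a naive fixed counter standing in for a real rate limiter
function makeRateLimiter(limit: number): Stage {
  let used = 0;
  return () => (++used > limit ? { status: 429, body: 'Too Many Requests' } : null);
}

// Stage 5: path-prefix routing; a real gateway would proxy to the matched service
const route: Stage = (req) =>
  req.path.startsWith('/api/')
    ? { status: 200, body: `forwarded: ${req.path}` }
    : { status: 404, body: 'Not Found' };

function handle(req: GatewayRequest, stages: Stage[]): GatewayResponse {
  for (const stage of stages) {
    const res = stage(req);
    if (res !== null) return res; // auth failure, limit hit, or final route
  }
  return { status: 502, body: 'no stage produced a response' };
}

const pipeline: Stage[] = [authenticate, makeRateLimiter(100), route];
```

A request with no Authorization header never reaches routing: it is rejected by the first stage, which is the whole point of putting these concerns at the edge.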
Authentication at the Edge
The gateway validates the JWT or API key before forwarding requests to backend services. Internal services trust the gateway's verdict and receive user identity via trusted headers.
// Kong Gateway — authentication plugin configuration
// Apply globally: all routes require authentication
{
  "name": "jwt",
  "config": {
    "key_claim_name": "kid",
    "claims_to_verify": ["exp", "nbf"],
    "secret_is_base64": false
  }
}

# NGINX API Gateway — validate JWT and forward user identity
location /api/ {
    # Validate JWT using the auth_jwt module
    auth_jwt "API" token=$http_authorization;
    auth_jwt_key_file /etc/nginx/jwt-public.pem;

    # Extract user ID from token claims and pass as header
    auth_jwt_claim_set $user_id sub;
    proxy_set_header X-User-ID $user_id;
    proxy_set_header Authorization ""; # Remove raw token from upstream request

    proxy_pass http://backend_services;
}

// Backend service — trusts the gateway's X-User-ID header
// No JWT verification needed; the gateway has already done it
app.get('/orders', async (req, res) => {
  const userId = req.headers['x-user-id'];
  // The gateway guarantees this header is set and valid
  const orders = await ordersService.getOrdersForUser(userId);
  res.json(orders);
});

This pattern works because the gateway and backend services are in a trusted network. External clients cannot inject X-User-ID headers directly — the gateway strips or overrides them.
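The strip-and-override step is worth making explicit. Below is a hedged sketch of it; the function name and the exact header set are assumptions for illustration, and `verifiedSub` stands in for the subject claim of the JWT the gateway has already validated.

```typescript
// Build the header set forwarded upstream: drop anything a client could use
// to spoof identity, then set X-User-ID from the gateway's own JWT validation.
function buildUpstreamHeaders(
  clientHeaders: Record<string, string>,
  verifiedSub: string
): Record<string, string> {
  const upstream: Record<string, string> = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    const lower = name.toLowerCase();
    // Identity headers are never trusted from the client; the raw token is
    // also removed so backend services never see or log it.
    if (lower === 'x-user-id' || lower === 'authorization') continue;
    upstream[lower] = value;
  }
  upstream['x-user-id'] = verifiedSub; // only the gateway writes this header
  return upstream;
}
```

A client that sends its own `X-User-ID: attacker` header simply has it replaced by the identity the gateway derived from the verified token.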
Rate Limiting
Rate limiting protects backend services from overload and prevents abuse. The gateway enforces limits before requests reach any service.
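Before looking at the algorithm described in the next subsection, here is what a token bucket looks like as a minimal in-memory sketch. The class name and parameters are invented for illustration, and this single-process form suits one gateway instance only; a gateway running as several replicas needs shared state, such as the Redis approach shown further down.

```typescript
// Minimal in-memory token bucket: capacity N, refilling at R tokens/second.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,        // N: maximum burst size
    private refillPerSecond: number, // R: sustained request rate
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed (a token was consumed)
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The `now` parameter exists so the refill logic can be exercised deterministically; production code would just use the wall clock.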
Token bucket algorithm
The most common production rate limiting algorithm:
User has a "bucket" with capacity N tokens.
Each request costs 1 token.
Tokens refill at rate R per second.
If the bucket is empty, the request is rejected (429).

A gateway that runs as multiple replicas needs its counters in shared storage, so the sketch below uses a closely related approach: a Redis-backed sliding-window log.

// Rate limiter implementation (Redis-backed, for distributed gateways)
import { createClient } from 'redis';

const redis = createClient();
await redis.connect(); // node-redis v4+ requires an explicit connect

async function checkRateLimit(
  userId: string,
  limit: number, // requests per window
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const key = `ratelimit:${userId}`;
  const now = Date.now();
  const windowMs = windowSeconds * 1000;
  const windowStart = now - windowMs;

  // Sliding window using sorted sets: each request is a member scored by timestamp
  const pipeline = redis.multi();
  pipeline.zRemRangeByScore(key, 0, windowStart); // Remove old requests
  pipeline.zCard(key); // Count requests in window
  pipeline.zAdd(key, { score: now, value: `${now}` }); // Add current request
  pipeline.expire(key, windowSeconds);
  const results = await pipeline.exec();
  const requestCount = results[1] as number;

  if (requestCount >= limit) {
    const oldestRequest = await redis.zRange(key, 0, 0);
    const resetAt = parseInt(oldestRequest[0]) + windowMs;
    return { allowed: false, remaining: 0, resetAt };
  }
  return { allowed: true, remaining: limit - requestCount - 1, resetAt: now + windowMs };
}

// Express middleware using the rate limiter
app.use(async (req, res, next) => {
  const userId = req.headers['x-user-id'] as string;
  const apiKey = req.headers['x-api-key'] as string;
  const identifier = userId ?? apiKey ?? req.ip;

  // Different limits for different tiers
  const limit = apiKey ? 1000 : 100; // API key users get 1000/min, others 100/min

  const { allowed, remaining, resetAt } = await checkRateLimit(
    identifier,
    limit,
    60 // 60-second window
  );

  res.setHeader('X-RateLimit-Limit', limit);
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', Math.ceil(resetAt / 1000));

  if (!allowed) {
    return res.status(429).json({
      error: 'Too Many Requests',
      retryAfter: Math.ceil((resetAt - Date.now()) / 1000)
    });
  }
  next();
});

Request Routing
The gateway maps incoming paths and methods to backend services:
# Kong Gateway — declarative routing configuration
services:
  - name: user-service
    url: http://user-service.internal:3000
    routes:
      - name: users-api
        paths: ["/api/v1/users"]
        methods: ["GET", "POST", "PUT", "DELETE"]

  - name: order-service
    url: http://order-service.internal:3001
    routes:
      - name: orders-api
        paths: ["/api/v1/orders"]
        methods: ["GET", "POST"]

  - name: payment-service
    url: http://payment-service.internal:3002
    routes:
      - name: payments-api
        paths: ["/api/v1/payments"]
        methods: ["POST"]
    plugins:
      - name: request-validator
        config:
          body_schema: '{"type": "object", "required": ["amount", "userId"]}'

Path rewriting
The external API path does not have to match the internal service path:
External: GET /api/v1/users/123/profile
Gateway rewrites to: GET /users/123/profile (internal service path)
or
Gateway rewrites to: GET /profile?userId=123 (if the service uses a different convention)

# NGINX path rewriting
location /api/v1/users/ {
    rewrite ^/api/v1/users/(.*)$ /users/$1 break;
    proxy_pass http://user-service;
}

Protocol Translation: REST to gRPC
Backend services may use gRPC for efficiency while external clients use REST/JSON. The gateway translates:
Mobile client                      API Gateway                  Backend
GET /users/123 (HTTP/JSON)   →     Translates to gRPC call  →   UserService.GetUser({id: "123"})
                                   (Protobuf binary)
{"id":"123","name":"Alice"}  ←     Translates response      ←   Returns User protobuf message

Tools like Envoy and gRPC-gateway handle this translation automatically from protobuf definitions:
// user.proto
service UserService {
  rpc GetUser(GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/api/v1/users/{id}" // Maps HTTP GET to gRPC call
    };
  }
}

Response Aggregation: Backend for Frontend (BFF)
A mobile app that needs user profile, orders, and recommendations would make 3 API calls — each adding network latency. The BFF pattern aggregates them:
// API Gateway BFF endpoint — aggregates 3 service calls into 1 response
app.get('/api/mobile/dashboard/:userId', async (req, res) => {
  const { userId } = req.params;

  // Call three services in parallel
  const [profile, recentOrders, recommendations] = await Promise.all([
    userService.getProfile(userId),
    orderService.getRecentOrders(userId, { limit: 5 }),
    recommendationService.getRecommendations(userId, { limit: 10 })
  ]);

  // Return a single aggregated response optimised for mobile
  res.json({
    user: {
      id: profile.id,
      name: profile.name,
      avatarUrl: profile.avatarUrl
      // Mobile only needs a subset of user fields — exclude large profile fields
    },
    recentOrders: recentOrders.map(o => ({
      id: o.id,
      status: o.status,
      total: o.totalAmount
      // Mobile only needs status and total — exclude full item details
    })),
    recommendations: recommendations.map(r => ({
      productId: r.productId,
      name: r.name,
      imageUrl: r.thumbnailUrl // Mobile uses thumbnail, not full image
    }))
  });
});

The BFF can have separate endpoints for web, mobile, and third-party consumers — each returning data shaped for its specific client.
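One caveat with the aggregation above: `Promise.all` rejects the whole dashboard if any single service call fails. A more forgiving variant can use `Promise.allSettled`; the sketch below is illustrative (the service-client shape is hypothetical), treating the profile as essential and letting the other panels degrade to empty lists.

```typescript
// BFF aggregation with graceful degradation: profile is required, the other
// two panels fall back to empty lists if their service is down.
async function dashboardData(services: {
  profile: () => Promise<unknown>;
  orders: () => Promise<unknown[]>;
  recommendations: () => Promise<unknown[]>;
}) {
  const [profile, orders, recs] = await Promise.allSettled([
    services.profile(),
    services.orders(),
    services.recommendations(),
  ]);

  if (profile.status === 'rejected') {
    // Without a profile there is nothing useful to render; surface the error
    throw new Error('profile service unavailable');
  }

  return {
    user: profile.value,
    recentOrders: orders.status === 'fulfilled' ? orders.value : [],
    recommendations: recs.status === 'fulfilled' ? recs.value : [],
    degraded: orders.status !== 'fulfilled' || recs.status !== 'fulfilled',
  };
}
```

A `degraded` flag lets the mobile client render a partial dashboard with a retry affordance instead of a blank error screen.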
Gateway Tool Selection
| Tool | Best for | Managed | Protocol support |
|---|---|---|---|
| Kong Gateway | High-traffic enterprise, extensive plugin ecosystem | Self-hosted or Kong Konnect | HTTP, gRPC, WebSocket |
| AWS API Gateway | AWS Lambda and AWS-native microservices | Fully managed | HTTP, WebSocket |
| Azure API Management | Azure ecosystems, enterprise governance | Fully managed | HTTP, WebSocket |
| Traefik | Kubernetes-native dynamic routing | Self-hosted | HTTP, gRPC, TCP |
| Envoy | Service mesh integration (Istio), high performance | Self-hosted | HTTP, gRPC, TCP |
| Apollo Gateway | GraphQL federation, multiple GraphQL services | Self-hosted or Apollo Cloud | GraphQL |
| NGINX | Simple reverse proxy, high performance | Self-hosted | HTTP, TCP |
For most Kubernetes deployments in 2026: Traefik or NGINX Ingress for basic routing, Kong or Envoy for full API gateway capabilities.
High Availability
The API gateway is a critical single point of failure. Deploy it in a high-availability configuration:
# Kubernetes Deployment — multiple gateway replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 3 # Minimum 3 for HA
  selector:
    matchLabels:
      app: api-gateway
  strategy:
    rollingUpdate:
      maxUnavailable: 0 # No downtime during gateway updates
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api-gateway
              topologyKey: kubernetes.io/hostname # One replica per node
      containers:
        - name: gateway
          image: kong:3.6
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /status
              port: 8001
            initialDelaySeconds: 10
            periodSeconds: 5

Frequently Asked Questions
Q: What is the difference between an API gateway and a reverse proxy?
A reverse proxy forwards requests from clients to backend servers — its primary function is routing and load balancing. An API gateway does everything a reverse proxy does, plus: authentication, authorisation, rate limiting, protocol translation, request/response transformation, API key management, and developer portal features. NGINX is a reverse proxy; Kong is an API gateway. Many teams start with NGINX as a reverse proxy and graduate to a dedicated API gateway as requirements grow.
Q: Should every microservice route through the API gateway?
External requests always route through the gateway. Internal service-to-service calls typically bypass the gateway and communicate directly (or through a service mesh for mTLS). The gateway handles edge concerns (external authentication, rate limiting, public routing); the service mesh handles internal concerns (service-to-service authentication, traffic management, observability).
Q: How do I handle gateway downtime without losing all traffic?
A multi-region active-active deployment ensures the gateway is never a single point of failure. In Kubernetes, multiple gateway replicas across availability zones, combined with PodDisruptionBudgets, provide resilience against node failures. Use health checks at the load balancer level to route away from unhealthy gateway instances within seconds.
Q: What is GraphQL federation and when do I need an Apollo Gateway?
GraphQL federation allows multiple GraphQL services to compose a unified schema. The Apollo Gateway federates requests across subgraphs: a query that spans user data and order data is split by the gateway into sub-queries to the User Service and Order Service, then the results are merged before returning to the client. Use Apollo Gateway when you have multiple teams each owning a GraphQL service and you want clients to see a single unified GraphQL API.
Key Takeaway
The API gateway centralises the cross-cutting concerns that every microservice would otherwise implement independently: authentication, rate limiting, routing, protocol translation, and observability. By handling these at the gateway, each backend service becomes simpler — it trusts the authenticated user identity passed via headers and focuses on its domain logic. The BFF pattern extends the gateway to aggregate and shape responses for specific clients, reducing the number of round trips a mobile or web client must make. Deploy the gateway in a high-availability configuration — three or more replicas across availability zones — because every service in the architecture depends on it.
Read next: Message Queues: RabbitMQ and the Art of Async →
Part of the Software Architecture Hub — engineering the entry.
