
API Gateway: The Security and Traffic Guard

TopicTrick Team

An API gateway is the single entry point to a microservices architecture. Every external request passes through it before reaching any internal service. The gateway handles cross-cutting concerns — authentication, rate limiting, routing, protocol translation, request/response transformation — so that individual services do not need to implement these themselves.

This guide covers the core capabilities of an API gateway, implementation patterns for authentication at the edge, rate limiting strategies, the Backend-for-Frontend (BFF) pattern, gateway tooling selection, and the trade-offs between a centralised gateway and distributed service meshes.


What an API Gateway Does

text
External Client
      │
      ▼
┌──────────────────────────────────────────────┐
│                API Gateway                   │
│                                              │
│  1. TLS termination                          │
│  2. Authentication (validate JWT/API key)    │
│  3. Authorisation (check permissions)        │
│  4. Rate limiting (per user, per IP)         │
│  5. Request routing (path → service)         │
│  6. Protocol translation (REST ↔ gRPC)       │
│  7. Request/response transformation          │
│  8. Response aggregation                     │
│  9. Logging and distributed tracing          │
└──────────────────────────────────────────────┘
      │              │               │
      ▼              ▼               ▼
User Service    Order Service   Payment Service

Without a gateway, every microservice must implement authentication, rate limiting, and logging independently — duplicating work and creating inconsistency. The gateway centralises these concerns.
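As a sketch of concern 5 in the diagram (request routing), the heart of a gateway is a routing table that maps path prefixes to upstream services. The hosts, ports, and function names below are illustrative, not from any particular gateway:

```typescript
// Hypothetical routing table: longest-prefix matching from external path
// to internal upstream. Service hosts and ports are illustrative.
const routes: Array<{ prefix: string; upstream: string }> = [
  { prefix: '/api/v1/users', upstream: 'http://user-service.internal:3000' },
  { prefix: '/api/v1/orders', upstream: 'http://order-service.internal:3001' },
  { prefix: '/api/v1/payments', upstream: 'http://payment-service.internal:3002' },
];

function resolveUpstream(path: string): string | undefined {
  // Pick the most specific (longest) matching prefix
  return routes
    .filter(r => path.startsWith(r.prefix))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0]?.upstream;
}
```

Everything else the gateway does — auth, rate limiting, transformation — wraps around this routing step.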


Authentication at the Edge

The gateway validates the JWT or API key before forwarding requests to backend services. Internal services trust the gateway's verdict and receive user identity via trusted headers.

json
// Kong Gateway — authentication plugin configuration
// Apply globally: all routes require authentication
{
  "name": "jwt",
  "config": {
    "key_claim_name": "kid",
    "claims_to_verify": ["exp", "nbf"],
    "secret_is_base64": false
  }
}
nginx
# NGINX API Gateway — validate JWT and forward user identity
location /api/ {
    # Validate JWT using auth_jwt module
    auth_jwt "API" token=$http_authorization;
    auth_jwt_key_file /etc/nginx/jwt-public.pem;

    # Extract user ID from token claims and pass as header
    auth_jwt_claim_set $user_id sub;
    proxy_set_header X-User-ID $user_id;
    proxy_set_header Authorization "";  # Remove raw token from upstream request

    proxy_pass http://backend_services;
}
typescript
// Backend service — trusts the gateway's X-User-ID header
// No JWT verification needed; the gateway has already done it
app.get('/orders', async (req, res) => {
  const userId = req.headers['x-user-id'];
  // The gateway guarantees this header is set and valid
  const orders = await ordersService.getOrdersForUser(userId);
  res.json(orders);
});

This pattern works because the gateway and backend services are in a trusted network. External clients cannot inject X-User-ID headers directly — the gateway strips or overrides them.
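That stripping step can be sketched directly: before proxying, the gateway discards any identity headers the client supplied and sets them only from the verified token. The function and header handling below are an illustrative sketch, not a specific gateway's API:

```typescript
// Sketch of the trust boundary: drop client-injected identity headers,
// then set X-User-ID from the gateway's own JWT verification.
type HeaderMap = Record<string, string | undefined>;

function buildUpstreamHeaders(clientHeaders: HeaderMap, verifiedUserId: string): HeaderMap {
  const upstream: HeaderMap = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    const lower = name.toLowerCase();
    if (lower === 'x-user-id') continue; // client-injected identity: dropped
    if (lower === 'authorization') continue; // raw token stays at the edge
    upstream[name] = value;
  }
  upstream['x-user-id'] = verifiedUserId; // only the gateway sets this
  return upstream;
}
```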


Rate Limiting

Rate limiting protects backend services from overload and prevents abuse. The gateway enforces limits before requests reach any service.

Token bucket algorithm

The most common production rate limiting algorithm:

text
User has a "bucket" with capacity N tokens.
Each request costs 1 token.
Tokens refill at rate R per second.
If the bucket is empty, the request is rejected (429).
In a distributed gateway the counter state must live in shared storage. The implementation below uses the closely related sliding-window log on Redis sorted sets rather than a literal token bucket: each request is recorded with its timestamp, and requests older than the window are discarded.

typescript
// Rate limiter implementation (Redis-backed, for distributed gateways)
import { createClient } from 'redis';

const redis = createClient();

async function checkRateLimit(
  userId: string,
  limit: number,       // tokens per window
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const key = `ratelimit:${userId}`;
  const now = Date.now();
  const windowMs = windowSeconds * 1000;
  const windowStart = now - windowMs;

  // Sliding window using sorted sets: each request is a member scored by timestamp
  const pipeline = redis.multi();
  pipeline.zRemRangeByScore(key, 0, windowStart);          // Remove old requests
  pipeline.zCard(key);                                      // Count requests in window
  pipeline.zAdd(key, { score: now, value: `${now}` });     // Add current request
  pipeline.expire(key, windowSeconds);

  const results = await pipeline.exec();
  const requestCount = results[1] as number;

  if (requestCount >= limit) {
    const oldestRequest = await redis.zRange(key, 0, 0, { REV: false });
    const resetAt = parseInt(oldestRequest[0]) + windowMs;

    return { allowed: false, remaining: 0, resetAt };
  }

  return { allowed: true, remaining: limit - requestCount - 1, resetAt: now + windowMs };
}
typescript
// Express middleware using the rate limiter
app.use(async (req, res, next) => {
  const userId = req.headers['x-user-id'] as string;
  const apiKey = req.headers['x-api-key'] as string;
  const identifier = userId ?? apiKey ?? req.ip;

  // Different limits for different tiers
  const limit = apiKey ? 1000 : 100;  // API key users get 1000/min, others 100/min

  const { allowed, remaining, resetAt } = await checkRateLimit(
    identifier,
    limit,
    60  // 60-second window
  );

  res.setHeader('X-RateLimit-Limit', limit);
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', Math.ceil(resetAt / 1000));

  if (!allowed) {
    return res.status(429).json({
      error: 'Too Many Requests',
      retryAfter: Math.ceil((resetAt - Date.now()) / 1000)
    });
  }

  next();
});

Request Routing

The gateway maps incoming paths and methods to backend services:

yaml
# Kong Gateway — declarative routing configuration
services:
  - name: user-service
    url: http://user-service.internal:3000
    routes:
      - name: users-api
        paths: ["/api/v1/users"]
        methods: ["GET", "POST", "PUT", "DELETE"]

  - name: order-service
    url: http://order-service.internal:3001
    routes:
      - name: orders-api
        paths: ["/api/v1/orders"]
        methods: ["GET", "POST"]

  - name: payment-service
    url: http://payment-service.internal:3002
    routes:
      - name: payments-api
        paths: ["/api/v1/payments"]
        methods: ["POST"]
        plugins:
          - name: request-validator
            config:
              body_schema: '{"type": "object", "required": ["amount", "userId"]}'

Path rewriting

The external API path does not have to match the internal service path:

text
External: GET /api/v1/users/123/profile
Gateway rewrites to: GET /users/123/profile (internal service path)
or
Gateway rewrites to: GET /profile?userId=123 (if service uses different convention)
nginx
# NGINX path rewriting
location /api/v1/users/ {
    rewrite ^/api/v1/users/(.*)$ /users/$1 break;
    proxy_pass http://user-service;
}

Protocol Translation: REST to gRPC

Backend services may use gRPC for efficiency while external clients use REST/JSON. The gateway translates:

text
Mobile client                    API Gateway              Backend
GET /users/123 (HTTP/JSON)  →   Translates to:      →   UserService.GetUser({id: "123"})
                                gRPC call                (Protobuf binary)
                            ←   Translates response  ←   Returns User protobuf message
{"id":"123","name":"Alice"}

Tools like Envoy and gRPC-gateway handle this translation automatically from protobuf definitions:

protobuf
// user.proto
service UserService {
  rpc GetUser(GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/api/v1/users/{id}"    // Maps HTTP GET to gRPC call
    };
  }
}
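The transcoding step itself can be sketched in TypeScript: parse the annotated HTTP route, extract the path variable, and build the equivalent RPC request. This is a hypothetical illustration of the mapping; real gateways such as Envoy and gRPC-gateway generate it from the .proto annotations:

```typescript
// Hypothetical REST→gRPC transcoding for the route annotated above:
// GET /api/v1/users/{id} → UserService.GetUser({ id })
interface RpcCall {
  service: string;
  method: string;
  request: Record<string, string>;
}

function transcodeGetUser(httpMethod: string, path: string): RpcCall | null {
  if (httpMethod !== 'GET') return null;
  const match = path.match(/^\/api\/v1\/users\/([^/]+)$/);
  if (!match) return null;
  return { service: 'UserService', method: 'GetUser', request: { id: match[1] } };
}
```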

Response Aggregation: Backend for Frontend (BFF)

A mobile app that needs user profile, orders, and recommendations would make 3 API calls — each adding network latency. The BFF pattern aggregates them:

typescript
// API Gateway BFF endpoint — aggregates 3 service calls into 1 response
app.get('/api/mobile/dashboard/:userId', async (req, res) => {
  const { userId } = req.params;

  // Call three services in parallel
  const [profile, recentOrders, recommendations] = await Promise.all([
    userService.getProfile(userId),
    orderService.getRecentOrders(userId, { limit: 5 }),
    recommendationService.getRecommendations(userId, { limit: 10 })
  ]);

  // Return a single aggregated response optimised for mobile
  res.json({
    user: {
      id: profile.id,
      name: profile.name,
      avatarUrl: profile.avatarUrl
      // Mobile only needs subset of user fields — exclude large profile fields
    },
    recentOrders: recentOrders.map(o => ({
      id: o.id,
      status: o.status,
      total: o.totalAmount
      // Mobile only needs status and total — exclude full item details
    })),
    recommendations: recommendations.map(r => ({
      productId: r.productId,
      name: r.name,
      imageUrl: r.thumbnailUrl   // Mobile uses thumbnail, not full image
    }))
  });
});

The BFF can have separate endpoints for web, mobile, and third-party consumers — each returning data shaped for its specific client.


Gateway Tool Selection

| Tool | Best for | Managed | Protocol support |
|---|---|---|---|
| Kong Gateway | High-traffic enterprise, extensive plugin ecosystem | Self-hosted or Kong Konnect | HTTP, gRPC, WebSocket |
| AWS API Gateway | AWS Lambda and AWS-native microservices | Fully managed | HTTP, WebSocket |
| Azure API Management | Azure ecosystems, enterprise governance | Fully managed | HTTP, WebSocket |
| Traefik | Kubernetes-native dynamic routing | Self-hosted | HTTP, gRPC, TCP |
| Envoy | Service mesh integration (Istio), high performance | Self-hosted | HTTP, gRPC, TCP |
| Apollo Gateway | GraphQL federation, multiple GraphQL services | Self-hosted or Apollo Cloud | GraphQL |
| NGINX | Simple reverse proxy, high performance | Self-hosted | HTTP, TCP |

For most Kubernetes deployments in 2026: Traefik or NGINX Ingress for basic routing, Kong or Envoy for full API gateway capabilities.


High Availability

The API gateway is a critical single point of failure. Deploy it in a high-availability configuration:

yaml
# Kubernetes Deployment — multiple gateway replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 3    # Minimum 3 for HA
  selector:
    matchLabels:
      app: api-gateway
  strategy:
    rollingUpdate:
      maxUnavailable: 0   # No downtime during gateway updates
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname  # One replica per node
              labelSelector:
                matchLabels:
                  app: api-gateway
      containers:
        - name: gateway
          image: kong:3.6
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /status
              port: 8001
            initialDelaySeconds: 10
            periodSeconds: 5

Frequently Asked Questions

Q: What is the difference between an API gateway and a reverse proxy?

A reverse proxy forwards requests from clients to backend servers — its primary function is routing and load balancing. An API gateway does everything a reverse proxy does, plus: authentication, authorisation, rate limiting, protocol translation, request/response transformation, API key management, and developer portal features. NGINX is a reverse proxy; Kong is an API gateway. Many teams start with NGINX as a reverse proxy and graduate to a dedicated API gateway as requirements grow.

Q: Should every microservice route through the API gateway?

External requests always route through the gateway. Internal service-to-service calls typically bypass the gateway and communicate directly (or through a service mesh for mTLS). The gateway handles edge concerns (external authentication, rate limiting, public routing); the service mesh handles internal concerns (service-to-service authentication, traffic management, observability).

Q: How do I handle gateway downtime without losing all traffic?

A multi-region active-active deployment ensures the gateway is never a single point of failure. In Kubernetes, multiple gateway replicas across availability zones, combined with PodDisruptionBudgets, provide resilience against node failures. Use health checks at the load balancer level to route away from unhealthy gateway instances within seconds.

Q: What is GraphQL federation and when do I need an Apollo Gateway?

GraphQL federation allows multiple GraphQL services to compose a unified schema. The Apollo Gateway federates requests across subgraphs: a query that spans user data and order data is split by the gateway into sub-queries to the User Service and Order Service, then the results are merged before returning to the client. Use Apollo Gateway when you have multiple teams each owning a GraphQL service and you want clients to see a single unified GraphQL API.


Key Takeaway

The API gateway centralises the cross-cutting concerns that every microservice would otherwise implement independently: authentication, rate limiting, routing, protocol translation, and observability. By handling these at the gateway, each backend service becomes simpler — it trusts the authenticated user identity passed via headers and focuses on its domain logic. The BFF pattern extends the gateway to aggregate and shape responses for specific clients, reducing the number of round trips a mobile or web client must make. Deploy the gateway in a high-availability configuration — three or more replicas across availability zones — because every service in the architecture depends on it.

Read next: Message Queues: RabbitMQ and the Art of Async →


Part of the Software Architecture Hub — engineering the entry.