
Node.js Performance: Clustering, Worker Threads & PM2

TopicTrick Team

A single-threaded Node.js process is fast for I/O-bound work — database queries, HTTP calls, file reads — because those operations happen asynchronously. But it has two performance ceilings: it can only use one CPU core, and any CPU-intensive operation blocks the event loop for every other request.

This module covers how to remove both ceilings: clustering for multi-core utilisation, worker threads for CPU-bound work, PM2 for production process management, and profiling to find where time is actually being spent.

This is Module 32 of the Node.js Full‑Stack Developer course.


The Problem: Single-Threaded Bottlenecks

js
// This blocks the event loop — no other requests can be handled while it runs
app.get('/compute', (req, res) => {
  let result = 0;
  for (let i = 0; i < 10_000_000_000; i++) result += i; // 5-10 seconds
  res.json({ result });
});

While the loop runs, every other request to your server waits. This is the event loop blocking problem.
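The effect can be reproduced without a server. The sketch below is only an illustration (the helper names `busyWait` and `measureTimerDelay` are made up for this example): it schedules a 0 ms timer, runs synchronous work, and reports how late the timer fired. A pending HTTP request is delayed in the same way.

```javascript
// Spin for roughly `ms` milliseconds, standing in for CPU-bound work.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // intentionally empty: this is the blocking part
  }
}

// Resolve with how long a 0 ms timer actually took to fire
// when `work` runs synchronously right after it was scheduled.
function measureTimerDelay(work) {
  return new Promise((resolve) => {
    const scheduled = performance.now();
    setTimeout(() => resolve(performance.now() - scheduled), 0);
    work(); // the timer callback cannot run until this returns
  });
}

measureTimerDelay(() => busyWait(200)).then((delay) => {
  // Expect a value of at least 200 ms instead of close to 0 ms.
  console.log(`0 ms timer fired after ${delay.toFixed(0)} ms`);
});
```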

There are two solutions:

  1. Clustering — run multiple Node.js processes so other processes handle requests while one is busy
  2. Worker threads — move the CPU-intensive work off the main thread

Clustering with the cluster Module

js
// src/cluster.js
import cluster from 'cluster';
import os from 'os';

// availableParallelism() needs Node 18.14+ / 19.4+; fall back to cpus().length on older versions
const numCPUs = os.availableParallelism?.() ?? os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Primary process ${process.pid} running`);
  console.log(`Spawning ${numCPUs} workers`);

  // Fork one worker per CPU core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

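  // Restart crashed workers, but not workers that were disconnected or killed on purpose
  cluster.on('exit', (worker, code, signal) => {
    if (worker.exitedAfterDisconnect) return;
    console.warn(`Worker ${worker.process.pid} died (${signal || code}). Restarting...`);
    cluster.fork();
  });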

  cluster.on('online', (worker) => {
    console.log(`Worker ${worker.process.pid} is online`);
  });
} else {
  // Workers run the Express app
  const { default: app } = await import('./app.js');
  const PORT = process.env.PORT ?? 3000;

  app.listen(PORT, () => {
    console.log(`Worker ${process.pid} listening on port ${PORT}`);
  });
}

Run with: node src/cluster.js

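All workers share port 3000. By default the primary process accepts incoming connections and hands them to the workers in round-robin order. This is the default on every platform except Windows, where distribution is left to the operating system.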
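To check which distribution strategy a given machine uses, the cluster module exposes `cluster.schedulingPolicy`. The snippet below is a small sketch (the function name `describeSchedulingPolicy` is made up here) that turns the numeric constant into a readable label.

```javascript
import cluster from 'node:cluster';

// Map the numeric policy constant to a readable description.
function describeSchedulingPolicy(policy = cluster.schedulingPolicy) {
  if (policy === cluster.SCHED_RR) {
    return 'round-robin (the primary distributes connections)';
  }
  if (policy === cluster.SCHED_NONE) {
    return 'none (the operating system distributes connections)';
  }
  return 'unknown'; // e.g. when called inside a worker, where the value is not set
}

console.log(`Scheduling policy: ${describeSchedulingPolicy()}`);
```

The policy can be changed by assigning `cluster.schedulingPolicy` before the first `cluster.fork()` call, or by setting the `NODE_CLUSTER_SCHED_POLICY` environment variable to `rr` or `none`.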


PM2 Cluster Mode (Recommended for Production)

PM2 handles clustering without changing your application code:

js
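// ecosystem.config.js
// This file uses CommonJS. If package.json sets "type": "module",
// naming it ecosystem.config.cjs avoids a module-format error.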
module.exports = {
  apps: [{
    name:            'myapp',
    script:          'src/server.js',
    instances:       'max',        // one per CPU core; or a number: 4
    exec_mode:       'cluster',
    env_file:        '.env.production',
    max_memory_restart: '500M',
    listen_timeout:  10000,        // ms to wait for app to start listening
    kill_timeout:    5000,         // ms to wait for graceful shutdown
    error_file:      '/var/log/myapp/error.log',
    out_file:        '/var/log/myapp/out.log',
    merge_logs:      true,
  }],
};
bash
pm2 start ecosystem.config.js

# Zero-downtime reload — sends SIGINT to old workers one at a time
# and waits for new workers to be ready before removing old ones
pm2 reload myapp

pm2 status          # shows each worker's CPU and memory
pm2 monit           # live dashboard

PM2 reload is the key production command — it restarts workers one by one, ensuring zero downtime during deployments.
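During a reload, PM2 needs to know when a new worker can take traffic. Its documentation describes a `wait_ready: true` app option, under which PM2 waits for the process to send a `'ready'` message (up to `listen_timeout`) instead of only watching for the listen event. The sketch below assumes that option has been added to the config above; `signalReady` is a hypothetical helper name.

```javascript
// Tell a process manager that this worker is ready to receive traffic.
// Returns true when the message was sent, false when there is no IPC
// channel (for example when the app is started with plain `node`).
function signalReady(proc = process) {
  if (typeof proc.send !== 'function') return false;
  proc.send('ready');
  return true;
}

// Typical call site, once the server is listening and dependencies are connected:
//   server.listen(PORT, () => signalReady());
console.log(`ready signal sent: ${signalReady()}`);
```

Sending the signal only after database and cache connections are established keeps PM2 from routing requests to a worker that is still warming up.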


Graceful Shutdown

PM2 sends SIGINT to workers during reload. Handle it to drain in-flight requests:

js
// src/server.js
import http from 'http';
import app from './app.js';
import { disconnectDB } from './config/db.js';
import redis from './lib/redis.js';

const server = http.createServer(app);
const PORT   = process.env.PORT ?? 3000;

server.listen(PORT, () => {
  console.log(`Process ${process.pid} listening on port ${PORT}`);
});

let shuttingDown = false;

async function gracefulShutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, shutting down gracefully`);

  // Stop accepting new connections; the callback runs once open connections have finished
  server.close(async () => {
    console.log('HTTP server closed');

    try {
      await disconnectDB();
      await redis.quit();
      console.log('Database and Redis connections closed');
      process.exit(0);
    } catch (err) {
      console.error('Error during shutdown:', err);
      process.exit(1);
    }
  });

  // Force exit if draining takes too long. Keep this below PM2's kill_timeout
  // (5000 ms in ecosystem.config.js), otherwise PM2 sends SIGKILL first.
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 4_000).unref();
}

process.on('SIGINT',  () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
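One detail worth knowing: `server.close()` only completes once every connection has ended, and on Node versions before 19 idle keep-alive sockets can hold it open. The sketch below shows one way to handle that, using `closeIdleConnections()` and `closeAllConnections()` (available on `http.Server` since Node 18.2). The helper name `closeWithTimeout` is an assumption for this example, not part of any library.

```javascript
// Close an HTTP server, resolving with 'drained' if it closed on its own
// or 'forced' if remaining connections had to be dropped after `timeoutMs`.
function closeWithTimeout(server, timeoutMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      // Drop whatever is still open, including requests in progress.
      server.closeAllConnections?.();
      resolve('forced');
    }, timeoutMs);

    server.close(() => {
      clearTimeout(timer);
      resolve('drained');
    });

    // Idle keep-alive sockets would otherwise keep close() pending on Node < 19.
    server.closeIdleConnections?.();
  });
}

// Possible call site inside a shutdown handler:
//   const outcome = await closeWithTimeout(server, 4_000);
```

The optional-call syntax (`?.()`) keeps the helper usable on older Node versions where these methods do not exist.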

Worker Threads for CPU-Intensive Tasks

Worker threads run in separate V8 isolates within the same process. They communicate via postMessage, which copies data using the structured clone algorithm, and can optionally share memory through a SharedArrayBuffer:

js
// workers/hash.worker.js
import { parentPort, workerData } from 'worker_threads';
import bcrypt from 'bcrypt';

// Receive work from the main thread
const { password, rounds } = workerData;

const hash = await bcrypt.hash(password, rounds);

// Send result back
parentPort.postMessage({ hash });
js
// lib/worker-pool.js
import { Worker } from 'worker_threads';
import path from 'path';
import { fileURLToPath } from 'url';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

/**
 * Run a task in a worker thread.
 * @param {string} workerScript - path to worker file
 * @param {object} data - data to pass to the worker
 */
export function runInWorker(workerScript, data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.resolve(__dirname, '..', workerScript), {
      workerData: data,
    });

    worker.on('message', resolve);
    worker.on('error',   reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
js
// Usage in a route handler
import { runInWorker } from '../lib/worker-pool.js';

app.post('/hash', async (req, res, next) => {
  try {
    const { hash } = await runInWorker('workers/hash.worker.js', {
      password: req.body.password,
      rounds:   12,
    });
    res.json({ hash });
  } catch (err) {
    next(err);
  }
});

The event loop is not blocked — the hash runs in a worker thread and the main thread continues serving other requests.
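The hash example passes data by copying. For the shared-memory case, the following sketch starts a few workers that all increment one counter stored in a `SharedArrayBuffer`. It is illustrative only: the worker source is inlined as a string and run with `eval: true` so the example fits in one file, and `incrementInWorkers` is a made-up name.

```javascript
import { Worker } from 'node:worker_threads';

// Worker source, run as CommonJS because of `eval: true`.
const counterWorkerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic, so concurrent increments are not lost
  }
  parentPort.postMessage('done');
`;

// Start `workerCount` workers that each add `increments` to a shared counter.
function incrementInWorkers(workerCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const jobs = Array.from({ length: workerCount }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(counterWorkerSource, {
        eval: true,
        workerData: { shared, increments }, // the buffer is shared, not copied
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));

  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

incrementInWorkers(2, 1000).then((total) => {
  console.log(`shared counter: ${total}`); // → shared counter: 2000
});
```

Without `Atomics`, two threads could read the same value and overwrite each other's update, so plain `counter[0]++` would not be safe here.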


Worker Thread Pool

For frequent CPU tasks, creating a new worker per request is expensive. Use a pool:

bash
npm install piscina    # battle-tested worker thread pool
js
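// lib/hash-pool.js
import Piscina from 'piscina';
import { fileURLToPath } from 'url';
import path from 'path';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Piscina calls the worker file's exported function with the task data, so
// hash.worker.js from the previous section (which reads workerData and posts
// a message) cannot be reused as-is. A Piscina-style worker, saved here under
// the hypothetical name hash.task.js, would look like:
//   export default async ({ password, rounds }) =>
//     ({ hash: await bcrypt.hash(password, rounds) });
export const hashPool = new Piscina({
  filename: path.resolve(__dirname, '../workers/hash.task.js'),
  maxThreads: 4,
  idleTimeout: 30_000,
});
js
// Usage
const { hash } = await hashPool.run({ password: 'secret', rounds: 12 });

Piscina manages a pool of workers, queuing tasks and dispatching them to idle threads. Unlike the one-off worker in the previous section, a Piscina worker file exports a function that receives the task data and returns (or resolves to) the result.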
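To make the queue-and-dispatch idea concrete, here is a deliberately tiny pool built only on `worker_threads`. It is a teaching sketch under simplifying assumptions (inline worker source, no replacement of crashed workers, no idle timeout) and is not how Piscina is implemented. `TinyPool` is a made-up name.

```javascript
import { Worker } from 'node:worker_threads';

// Worker source (CommonJS via eval): squares every number it receives.
const squareWorkerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

class TinyPool {
  constructor(size) {
    this.workers = Array.from({ length: size },
      () => new Worker(squareWorkerSource, { eval: true }));
    this.idle = [...this.workers]; // workers with nothing to do
    this.queue = [];               // tasks waiting for a free worker
  }

  // Queue a task and return a promise for its result.
  run(value) {
    return new Promise((resolve, reject) => {
      this.queue.push({ value, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers until one of the two runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker); // worker is free again
        task.resolve(result);
        this.dispatch();        // pick up the next queued task, if any
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        task.reject(err);       // a real pool would also replace the worker
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(task.value);
    }
  }

  destroy() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

const demoPool = new TinyPool(2);
Promise.all([1, 2, 3, 4, 5].map((n) => demoPool.run(n))).then((squares) => {
  console.log(squares); // → [ 1, 4, 9, 16, 25 ]
  return demoPool.destroy();
});
```

Five tasks run on two threads: the first two are dispatched immediately and the rest wait in the queue until a worker reports back.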


Memory Usage Monitoring

js
// middleware/memoryLogger.js
export function memoryLogger(req, res, next) {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  const toMB = bytes => (bytes / 1024 / 1024).toFixed(1);

  // Log if heap usage exceeds 80%
  if (heapUsed / heapTotal > 0.8) {
    console.warn(`High memory: heap ${toMB(heapUsed)}MB / ${toMB(heapTotal)}MB | RSS ${toMB(rss)}MB`);
  }

  next();
}
js
// Health check with memory info
app.get('/health', (req, res) => {
  const mem   = process.memoryUsage();
  const toMB  = b => Math.round(b / 1024 / 1024);

  res.json({
    status:  'ok',
    pid:     process.pid,
    uptime:  Math.round(process.uptime()),
    memory: {
      heapUsed:  `${toMB(mem.heapUsed)}MB`,
      heapTotal: `${toMB(mem.heapTotal)}MB`,
      rss:       `${toMB(mem.rss)}MB`,
    },
  });
});
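A caveat on the 80% check in the middleware: `heapTotal` is the heap V8 has currently allocated, and V8 grows it on demand, so `heapUsed / heapTotal` can sit near 1 on a perfectly healthy process. One alternative, sketched below, compares usage against the configured heap limit from `v8.getHeapStatistics()`. The function names are illustrative.

```javascript
import v8 from 'node:v8';

// Fraction of the maximum heap size (set by V8 defaults or
// --max-old-space-size) that is currently in use.
function heapPressure() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return used_heap_size / heap_size_limit;
}

// True when heap usage exceeds `threshold` (0..1) of the heap limit.
function isHeapUnderPressure(threshold = 0.8) {
  return heapPressure() > threshold;
}

console.log(`heap in use: ${(heapPressure() * 100).toFixed(1)}% of limit`);
```

A ratio that climbs steadily across requests and never drops after garbage collection is a stronger leak signal than any single reading.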

Profiling with --inspect

bash
# Start with inspector
node --inspect src/server.js

# Open Chrome → chrome://inspect → "Open dedicated DevTools for Node"
# Go to the "Performance" tab → Record → run load test → Stop

Load Testing with autocannon

bash
npm install -g autocannon

# Benchmark your endpoint
autocannon -c 100 -d 10 http://localhost:3000/api/posts
# -c 100: 100 concurrent connections
# -d 10: 10 second duration
text
┌──────────┬──────┬──────┬───────┬───────┬────────┬─────────┐
│ Stat     │ 2.5% │ 50%  │ 97.5% │ 99%   │ Avg    │ Req/sec │
├──────────┼──────┼──────┼───────┼───────┼────────┼─────────┤
│ Latency  │ 5ms  │ 8ms  │ 22ms  │ 45ms  │ 9.2ms  │         │
│ Requests │      │      │       │       │        │ 10,840  │
└──────────┴──────┴──────┴───────┴───────┴────────┴─────────┘

Key Performance Optimisations

Response Compression

bash
npm install compression
js
import compression from 'compression';
app.use(compression()); // compress all responses > 1kb
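To get a feel for what compression saves, and why tiny responses are skipped, the sketch below measures gzip on a sample JSON payload. It uses `gzipSync` purely as an offline measurement tool; the payload shape and the `gzipSavings` helper are invented for this example.

```javascript
import { gzipSync } from 'node:zlib';

// Compare the byte size of a string before and after gzip.
function gzipSavings(body) {
  const original = Buffer.byteLength(body);
  const compressed = gzipSync(body).length;
  return { original, compressed, ratio: compressed / original };
}

// Repetitive JSON, similar in shape to a typical list endpoint response.
const samplePayload = JSON.stringify(
  Array.from({ length: 200 }, (_, i) => ({ id: i, title: `Post ${i}`, published: true })),
);

console.log('list response:', gzipSavings(samplePayload));
console.log('tiny response:', gzipSavings('ok')); // gzip overhead makes this larger
```

Very small bodies grow when gzipped because of the fixed header and trailer, which is the reason for a minimum-size threshold.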

Connection Keep-Alive

js
// Increase keep-alive timeout (default is 5 seconds — too short for AWS ALB)
server.keepAliveTimeout    = 65_000; // must be > ALB idle timeout (60s)
server.headersTimeout      = 66_000; // must be > keepAliveTimeout
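The two constraints in the comments can be captured in a small helper so the values stay consistent when the load balancer setting changes. This is a sketch: `applyKeepAliveTimeouts` and its default margin are assumptions, not an existing API.

```javascript
// Derive Node's keep-alive settings from the load balancer's idle timeout.
// keepAliveTimeout must exceed the balancer's idle timeout, and
// headersTimeout must exceed keepAliveTimeout.
function applyKeepAliveTimeouts(server, lbIdleTimeoutMs, marginMs = 5_000) {
  server.keepAliveTimeout = lbIdleTimeoutMs + marginMs;
  server.headersTimeout = server.keepAliveTimeout + 1_000;
  return server;
}

// With a 60 s idle timeout this yields 65 000 and 66 000 ms.
const settings = applyKeepAliveTimeouts({}, 60_000);
console.log(settings); // → { keepAliveTimeout: 65000, headersTimeout: 66000 }
```

In the server file the first argument would be the `http.Server` instance rather than the plain object used here for demonstration.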

Mongoose Connection Pool

js
mongoose.connect(uri, {
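  // Upper bound on connections per process. The MongoDB driver behind Mongoose 6+
  // defaults to 100 (Mongoose 5's older poolSize option defaulted to 5).
  // In cluster mode every worker opens its own pool.
  maxPoolSize: 10,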
  minPoolSize: 2,
  socketTimeoutMS: 30_000,
  serverSelectionTimeoutMS: 5_000,
});

Avoid Synchronous Operations in Requests

js
// ❌ Blocks event loop
const data = fs.readFileSync('file.json');

// ✅ Non-blocking
const data = await fs.promises.readFile('file.json', 'utf8');

Common Memory Leak Patterns

js
// ❌ Event listener added inside request handler — never removed
app.get('/subscribe', (req, res) => {
  emitter.on('data', (data) => res.write(data)); // adds listener on every request!
});

// ✅ Remove listener when connection closes
app.get('/subscribe', (req, res) => {
  const handler = (data) => res.write(data);
  emitter.on('data', handler);
  req.on('close', () => emitter.off('data', handler));
});

// ❌ Unbounded cache — grows forever
const cache = {};
app.get('/data/:id', (req, res) => {
  cache[req.params.id] = fetchData(req.params.id); // never evicted
});

// ✅ Use a bounded LRU cache
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 500, ttl: 1000 * 60 }); // max 500 items, 1min TTL
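To show what "bounded" means in practice, here is a minimal least-recently-used cache built on a `Map`, which remembers insertion order. It is a sketch for illustration only (no TTL, no size accounting) and not a replacement for `lru-cache`. `BoundedCache` is a made-up name.

```javascript
// Minimal LRU cache: a Map iterates in insertion order, so the first key
// is always the least recently used one.
class BoundedCache {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Evict the least recently used entry.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}

const demoCache = new BoundedCache(2);
demoCache.set('a', 1);
demoCache.set('b', 2);
demoCache.get('a');    // 'a' becomes the most recently used entry
demoCache.set('c', 3); // evicts 'b'
console.log([...demoCache.map.keys()]); // → [ 'a', 'c' ]
```

However many distinct ids are requested, the cache never holds more than `max` entries, so its memory use stays flat.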

Node.js Full‑Stack Course — Module 32 of 32

You have mastered Node.js performance. Continue to the final module to test everything you have learned.


    Summary

    Scaling a Node.js application requires addressing both throughput and CPU bottlenecks:

    • Cluster mode (PM2 instances: 'max') spawns one worker per CPU core — the easiest path to full CPU utilisation with no code changes
    • Graceful shutdown listens for SIGINT/SIGTERM, drains in-flight requests, and closes DB connections cleanly — required for zero-downtime PM2 reloads
    • Worker threads move CPU-intensive operations off the main event loop — use Piscina for a managed pool
    • Profiling with --inspect and load testing with autocannon reveals real bottlenecks before they hit production
    • Enable gzip compression, tune MongoDB connection pool, and set keep-alive timeouts correctly for cloud load balancers
    • Watch for event listener leaks and unbounded caches — both cause steadily growing heap usage that eventually crashes the process
    • Monitor memory via process.memoryUsage() in the health check endpoint and pm2 monit in production
