Node.js Performance: Clustering, Worker Threads & PM2

A single-threaded Node.js process is fast for I/O-bound work — database queries, HTTP calls, file reads — because those operations happen asynchronously. But it has two performance ceilings: it can only use one CPU core, and any CPU-intensive operation blocks the event loop for every other request.
This module covers how to remove both ceilings: clustering for multi-core utilisation, worker threads for CPU-bound work, PM2 for production process management, and profiling to find where time is actually being spent.
This is Module 32 of the Node.js Full‑Stack Developer course.
The Problem: Single-Threaded Bottlenecks
// This blocks the event loop — no other requests can be handled while it runs
app.get('/compute', (req, res) => {
let result = 0;
for (let i = 0; i < 10_000_000_000; i++) result += i; // several seconds or more of pure CPU work
res.json({ result });
});
While the loop runs, every other request to your server waits. This is the event loop blocking problem.
There are two solutions:
- Clustering — run multiple Node.js processes so other processes handle requests while one is busy
- Worker threads — move the CPU-intensive work off the main thread
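You can measure this stall with Node's built-in perf_hooks.monitorEventLoopDelay(), which samples how late the event loop runs a repeating timer. The sketch below is illustrative only: blockFor() and measureBlocking() are hypothetical helpers, blockFor() stands in for the /compute loop (shortened to 200 ms), and the 10 ms sampling resolution is an arbitrary choice.

```javascript
import { monitorEventLoopDelay } from 'node:perf_hooks';
import { setTimeout as sleep } from 'node:timers/promises';

// Sample event loop delay every 10 ms; the histogram stores values in nanoseconds
const histogram = monitorEventLoopDelay({ resolution: 10 });

// Hypothetical stand-in for the /compute loop: spin the CPU for `ms` milliseconds
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else on this thread can run meanwhile
  }
}

// Hypothetical helper: report the worst delay observed around one blocking call
export async function measureBlocking(ms) {
  histogram.reset();
  histogram.enable();
  await sleep(50); // let the sampler take a few readings first
  blockFor(ms);    // synchronous work delays every timer, including the sampler's
  await sleep(50); // give the sampler a chance to record the stall
  histogram.disable();
  const toMs = (ns) => Math.round(ns / 1e6);
  return { maxMs: toMs(histogram.max), p99Ms: toMs(histogram.percentile(99)) };
}

const stats = await measureBlocking(200);
console.log(stats);
```

In a running server you would typically leave the histogram enabled and report max or percentile(99) periodically instead of wrapping a single call.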
Clustering with the cluster Module
// src/cluster.js
import cluster from 'cluster';
import os from 'os';
const numCPUs = os.availableParallelism(); // Node 18.14+ / 19.4+; on older versions use os.cpus().length
if (cluster.isPrimary) {
console.log(`Primary process ${process.pid} running`);
console.log(`Spawning ${numCPUs} workers`);
// Fork one worker per CPU core
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
// Restart workers that crash (skip workers that were disconnected on purpose)
cluster.on('exit', (worker, code, signal) => {
if (worker.exitedAfterDisconnect) return;
console.warn(`Worker ${worker.process.pid} died (${signal || code}). Restarting...`);
cluster.fork();
});
cluster.on('online', (worker) => {
console.log(`Worker ${worker.process.pid} is online`);
});
} else {
// Workers run the Express app
const { default: app } = await import('./app.js');
const PORT = process.env.PORT ?? 3000;
app.listen(PORT, () => {
console.log(`Worker ${process.pid} listening on port ${PORT}`);
});
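}
Run with: node src/cluster.js
All workers share port 3000. By default the primary process accepts incoming connections and hands them to the workers round-robin. The exception is Windows, where the workers accept connections directly and the operating system decides which worker gets each one.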
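One risk with restarting on every exit: if a worker crashes during startup (a bad config value, say), the primary forks replacements in a tight loop. A minimal guard, assuming a sliding-window policy, is sketched below; createRestartLimiter is a hypothetical helper rather than part of the cluster module, and the limits are arbitrary. PM2 offers comparable protection through its max_restarts and min_uptime options.

```javascript
// Hypothetical helper: allow at most `maxRestarts` forks within a sliding `windowMs` window
export function createRestartLimiter({ maxRestarts = 5, windowMs = 60_000 } = {}) {
  const recent = []; // timestamps (ms) of restarts still inside the window

  return function shouldRestart(now = Date.now()) {
    // Forget restarts that happened `windowMs` or more ago
    while (recent.length > 0 && now - recent[0] >= windowMs) recent.shift();
    if (recent.length >= maxRestarts) return false; // looks like a crash loop: stop forking
    recent.push(now);
    return true;
  };
}

// Wiring it into src/cluster.js (inside the cluster.isPrimary branch):
// const shouldRestart = createRestartLimiter({ maxRestarts: 5, windowMs: 60_000 });
// cluster.on('exit', (worker) => {
//   if (worker.exitedAfterDisconnect) return; // intentional shutdown, not a crash
//   if (shouldRestart()) cluster.fork();
//   else console.error('Workers keep crashing; not restarting');
// });
```

The timestamp parameter defaults to Date.now() but can be passed explicitly, which keeps the policy easy to unit-test.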
PM2 Cluster Mode (Recommended for Production)
PM2 handles clustering without changing your application code:
// ecosystem.config.js
module.exports = {
apps: [{
name: 'myapp',
script: 'src/server.js',
instances: 'max', // one per CPU core; or a number: 4
exec_mode: 'cluster',
env_file: '.env.production',
max_memory_restart: '500M',
listen_timeout: 10000, // ms to wait for app to start listening
kill_timeout: 5000, // ms PM2 waits after SIGINT before sending SIGKILL
error_file: '/var/log/myapp/error.log',
out_file: '/var/log/myapp/out.log',
merge_logs: true,
}],
};
pm2 start ecosystem.config.js
# Zero-downtime reload: for each instance in turn, PM2 starts a replacement,
# waits for it to come online (start listening), then sends SIGINT to the old one
pm2 reload myapp
pm2 status # shows each worker's CPU and memory
pm2 monit # live dashboard
pm2 reload is the key production command: it replaces workers one at a time, so with two or more instances and a graceful shutdown handler, deployments cause no downtime.
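In cluster mode PM2 considers a new instance ready as soon as it starts listening. If the app needs more warm-up first (opening database connections, for example), PM2 can wait for an explicit signal instead: add wait_ready: true to the app entry in ecosystem.config.js and send 'ready' from the process. PM2 still gives up after listen_timeout. A minimal sketch, where startServer is a hypothetical helper:

```javascript
import http from 'node:http';

// Hypothetical helper: start listening, then tell PM2 this instance can take traffic.
// Pair with `wait_ready: true` in ecosystem.config.js.
export function startServer(handler, port = Number(process.env.PORT ?? 3000)) {
  const server = http.createServer(handler);
  return new Promise((resolve, reject) => {
    server.once('error', reject); // e.g. EADDRINUSE
    server.listen(port, () => {
      // process.send only exists when the process has an IPC channel, as it does under PM2
      process.send?.('ready');
      resolve(server);
    });
  });
}

// Usage in src/server.js: finish startup work first, then signal readiness
// await connectDB();                    // hypothetical: your own startup work
// const server = await startServer(app);
```

Because 'ready' is sent only after startup completes, a reload never routes traffic to an instance that is still connecting to its dependencies.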
Graceful Shutdown
PM2 sends SIGINT to workers during reload. Handle it to drain in-flight requests:
// src/server.js
import http from 'http';
import app from './app.js';
import { disconnectDB } from './config/db.js';
import redis from './lib/redis.js';
const server = http.createServer(app);
const PORT = process.env.PORT ?? 3000;
server.listen(PORT, () => {
console.log(`Process ${process.pid} listening on port ${PORT}`);
});
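let shuttingDown = false;
async function gracefulShutdown(signal) {
// Ignore repeated signals (e.g. SIGINT followed by SIGTERM)
if (shuttingDown) return;
shuttingDown = true;
console.log(`${signal} received, shutting down gracefully`);
// Stop accepting new connections; the callback runs once in-flight requests finish
server.close(async () => {
console.log('HTTP server closed');
try {
await disconnectDB();
await redis.quit();
console.log('Database and Redis connections closed');
process.exit(0);
} catch (err) {
console.error('Error during shutdown:', err);
process.exit(1);
}
});
// Close idle keep-alive sockets so they don't hold close() open (Node 18.2+; Node 19+ does this automatically)
server.closeIdleConnections();
// Force exit if shutdown stalls. Keep this below PM2's kill_timeout (5000 ms in the
// ecosystem file above), otherwise PM2 sends SIGKILL before it fires.
setTimeout(() => {
console.error('Forced shutdown after timeout');
process.exit(1);
}, 4_000).unref();
}
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
Worker Threads for CPU-Intensive Tasks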
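Worker threads run in separate V8 isolates within the same process. Each thread has its own heap and event loop; threads exchange data with postMessage, which copies values (structured clone) unless they are explicitly transferred, and can share memory explicitly through SharedArrayBuffer: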
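To make the shared-memory part concrete: a SharedArrayBuffer passed through workerData is shared rather than copied, so writes made by a worker are visible to the main thread, and Atomics keeps concurrent updates exact. This is a single-file sketch; the worker body is inlined with eval: true only to keep it self-contained, and incrementInWorker is a hypothetical name.

```javascript
import { Worker } from 'node:worker_threads';

// One 32-bit counter in memory that every thread can see
const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
const counter = new Int32Array(shared);

// Worker body, inlined with eval: true (a real app would use a separate file)
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const view = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.times; i++) Atomics.add(view, 0, 1);
`;

// Hypothetical helper: run one worker that bumps the shared counter `times` times
export function incrementInWorker(times) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { shared, times } });
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) return reject(new Error(`Worker exited with code ${code}`));
      resolve(Atomics.load(counter, 0)); // read the value the worker wrote
    });
  });
}

// Two workers update the same memory concurrently; Atomics.add keeps the total exact
await Promise.all([incrementInWorker(1000), incrementInWorker(1000)]);
console.log(Atomics.load(counter, 0)); // 2000
```

With plain view[0]++ instead of Atomics.add, the two workers could overwrite each other's increments and the final count could come out lower.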
// workers/hash.worker.js
import { parentPort, workerData } from 'worker_threads';
import bcrypt from 'bcrypt';
// Receive work from the main thread
const { password, rounds } = workerData;
const hash = await bcrypt.hash(password, rounds);
// Send result back
parentPort.postMessage({ hash });
// lib/worker-pool.js
import { Worker } from 'worker_threads';
import path from 'path';
import { fileURLToPath } from 'url';
const __dirname = path.dirname(fileURLToPath(import.meta.url));
/**
* Run a task in a worker thread.
* @param {string} workerScript - path to worker file
* @param {object} data - data to pass to the worker
*/
export function runInWorker(workerScript, data) {
return new Promise((resolve, reject) => {
const worker = new Worker(path.resolve(__dirname, '..', workerScript), {
workerData: data,
});
worker.on('message', resolve);
worker.on('error', reject);
worker.on('exit', (code) => {
if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
});
});
}
// Usage in a route handler
import { runInWorker } from '../lib/worker-pool.js';
app.post('/hash', async (req, res, next) => {
try {
const { hash } = await runInWorker('workers/hash.worker.js', {
password: req.body.password,
rounds: 12,
});
res.json({ hash });
} catch (err) {
next(err);
}
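});
The event loop is not blocked: the hash runs in a worker thread while the main thread keeps serving other requests. One caveat: the native bcrypt package's asynchronous bcrypt.hash() already runs on libuv's thread pool, so here the worker mainly illustrates the pattern. Worker threads pay off most for CPU-heavy work written in JavaScript itself, such as the loop at the start of this module.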
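The same pattern fixes the /compute route from the start of this module. A single-file sketch, assuming the loop moves into the worker unchanged (with a smaller bound for the demo); the inline eval: true worker and the sumInWorker name are illustrative choices, and a real app would keep the worker in its own file like workers/hash.worker.js.

```javascript
import { Worker } from 'node:worker_threads';

// The /compute loop, now running inside the worker (inlined with eval: true)
const sumSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let result = 0;
  for (let i = 0; i < workerData.n; i++) result += i;
  parentPort.postMessage(result);
`;

// Hypothetical helper: resolves with 0 + 1 + ... + (n - 1), computed off the main thread
export function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(sumSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// The main thread stays free: this interval keeps firing while the worker loops
let ticks = 0;
const timer = setInterval(() => ticks++, 10);
const result = await sumInWorker(50_000_000);
clearInterval(timer);
console.log(result, ticks); // ticks is typically above zero, showing the event loop kept running
```

In the route handler, await sumInWorker(n) replaces the inline loop, and res.json({ result }) stays the same.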
Worker Thread Pool
For frequent CPU tasks, creating a new worker per request is expensive. Use a pool:
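npm install piscina # battle-tested worker thread pool
// workers/hash.piscina.js (hypothetical file name)
// Piscina calls the worker file's default export with each task and resolves run()
// with its return value, so the workerData/parentPort worker above won't work here.
import bcrypt from 'bcrypt';
export default async function ({ password, rounds }) {
return { hash: await bcrypt.hash(password, rounds) };
}
// lib/hash-pool.js
import Piscina from 'piscina';
import { fileURLToPath } from 'url';
import path from 'path';
const __dirname = path.dirname(fileURLToPath(import.meta.url));
export const hashPool = new Piscina({
filename: path.resolve(__dirname, '../workers/hash.piscina.js'),
maxThreads: 4,
idleTimeout: 30_000,
});
// Usage
const { hash } = await hashPool.run({ password: 'secret', rounds: 12 });
Piscina manages a pool of workers, queuing tasks and dispatching them to idle threads.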
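To see what queuing tasks and dispatching them to idle threads means in practice, here is a deliberately tiny pool. It is a teaching sketch under simplifying assumptions (fixed size, one task per worker at a time, crashed workers are not replaced, worker body inlined with eval: true to square numbers); TinyPool is a hypothetical name, and Piscina remains the better choice for production.

```javascript
import { Worker } from 'node:worker_threads';

// Worker body (inlined with eval: true): squares each number it is sent
const squareSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

export class TinyPool {
  #idle = [];    // workers waiting for a task
  #queue = [];   // tasks waiting for a worker: { data, resolve, reject }
  #workers = [];

  constructor(size) {
    for (let i = 0; i < size; i++) {
      const worker = new Worker(squareSource, { eval: true });
      this.#workers.push(worker);
      this.#idle.push(worker);
    }
  }

  run(data) {
    return new Promise((resolve, reject) => {
      this.#queue.push({ data, resolve, reject });
      this.#dispatch();
    });
  }

  // Hand queued tasks to idle workers until one side runs out
  #dispatch() {
    while (this.#idle.length > 0 && this.#queue.length > 0) {
      const worker = this.#idle.pop();
      const { data, resolve, reject } = this.#queue.shift();
      const cleanup = () => {
        worker.off('message', onMessage);
        worker.off('error', onError);
      };
      const onMessage = (result) => {
        cleanup();
        this.#idle.push(worker); // worker is free again
        resolve(result);
        this.#dispatch();        // pick up the next queued task, if any
      };
      const onError = (err) => {
        cleanup();               // the worker is dead; this sketch does not replace it
        reject(err);
      };
      worker.on('message', onMessage);
      worker.on('error', onError);
      worker.postMessage(data);
    }
  }

  destroy() {
    return Promise.all(this.#workers.map((w) => w.terminate()));
  }
}

const pool = new TinyPool(2);
const squares = await Promise.all([1, 2, 3, 4, 5].map((n) => pool.run(n)));
console.log(squares); // [ 1, 4, 9, 16, 25 ]
await pool.destroy();
```

Five tasks share two threads here: three wait in the queue until a worker reports back, which is the same trade-off maxThreads controls in Piscina.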
Memory Usage Monitoring
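// middleware/memoryLogger.js
import v8 from 'v8';
// Maximum size V8 will let the heap grow to (set by --max-old-space-size, or V8's default)
const heapLimit = v8.getHeapStatistics().heap_size_limit;
export function memoryLogger(req, res, next) {
const { heapUsed, rss } = process.memoryUsage();
const toMB = bytes => (bytes / 1024 / 1024).toFixed(1);
// Warn when the heap passes 80% of its limit. heapTotal is only what V8 has allocated
// so far and grows on demand, so heapUsed / heapTotal can be high even with plenty of headroom.
if (heapUsed / heapLimit > 0.8) {
console.warn(`High memory: heap ${toMB(heapUsed)}MB / ${toMB(heapLimit)}MB limit | RSS ${toMB(rss)}MB`);
}
next();
}
// Health check with memory info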
app.get('/health', (req, res) => {
const mem = process.memoryUsage();
const toMB = b => Math.round(b / 1024 / 1024);
res.json({
status: 'ok',
pid: process.pid,
uptime: Math.round(process.uptime()),
memory: {
heapUsed: `${toMB(mem.heapUsed)}MB`,
heapTotal: `${toMB(mem.heapTotal)}MB`,
rss: `${toMB(mem.rss)}MB`,
},
});
});
Profiling with --inspect
# Start with inspector
node --inspect src/server.js
# Open Chrome → chrome://inspect → "Open dedicated DevTools for Node"
# Go to the "Performance" tab → Record → run load test → Stop
Load Testing with autocannon
npm install -g autocannon
# Benchmark your endpoint
autocannon -c 100 -d 10 http://localhost:3000/api/posts
# -c 100: 100 concurrent connections
# -d 10: 10 second duration
Example output (simplified):
┌──────────┬──────┬─────┬───────┬──────┬───────┬─────────┐
│ Stat     │ 2.5% │ 50% │ 97.5% │ 99%  │ Avg   │ Req/sec │
├──────────┼──────┼─────┼───────┼──────┼───────┼─────────┤
│ Latency  │ 5ms  │ 8ms │ 22ms  │ 45ms │ 9.2ms │         │
│ Requests │      │     │       │      │       │ 10,840  │
└──────────┴──────┴─────┴───────┴──────┴───────┴─────────┘
Key Performance Optimisations
Response Compression
npm install compression
import compression from 'compression';
app.use(compression()); // compresses bodies of 1kb or more (the default threshold) when the client's Accept-Encoding allows it
Connection Keep-Alive
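// In src/server.js. Node closes idle keep-alive connections after 5 seconds by default,
// but an AWS ALB keeps them open for up to 60 seconds (its default idle timeout) and can
// reuse a connection Node has just closed, which shows up as intermittent 502 errors.
server.keepAliveTimeout = 65_000; // must be > ALB idle timeout (60s)
server.headersTimeout = 66_000; // must be > keepAliveTimeout
Mongoose Connection Pool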
mongoose.connect(uri, {
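// Driver default is 100. Each cluster instance opens its own pool, so the
// database sees up to instances × maxPoolSize connections in total.
maxPoolSize: 10,
minPoolSize: 2,
socketTimeoutMS: 30_000,
serverSelectionTimeoutMS: 5_000,
});
Avoid Synchronous Operations in Requests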
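import fs from 'fs';
// ❌ Blocks the event loop until the whole file has been read
const dataSync = fs.readFileSync('file.json', 'utf8');
// ✅ Non-blocking (inside an async route handler)
const data = await fs.promises.readFile('file.json', 'utf8');
// Synchronous calls are acceptable at startup, before the server accepts traffic.
Common Memory Leak Patterns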
// ❌ Event listener added inside request handler — never removed
app.get('/subscribe', (req, res) => {
emitter.on('data', (data) => res.write(data)); // adds listener on every request!
});
// ✅ Remove listener when connection closes
app.get('/subscribe', (req, res) => {
const handler = (data) => res.write(data);
emitter.on('data', handler);
req.on('close', () => emitter.off('data', handler));
});
// ❌ Unbounded cache — grows forever
const cache = {};
app.get('/data/:id', (req, res) => {
cache[req.params.id] = fetchData(req.params.id); // never evicted
});
// ✅ Use a bounded LRU cache
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 500, ttl: 1000 * 60 }); // max 500 items, 1min TTL
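A quick way to confirm the listener fix is emitter.listenerCount(). The sketch below simulates 100 subscribe-then-disconnect cycles, using plain EventEmitter objects as stand-ins for the shared emitter and for each request (an assumption that keeps it runnable without a server); subscribe is a hypothetical helper. Without the close handler the count would climb to 100, and Node would print a MaxListenersExceededWarning as soon as more than 10 listeners were attached to one event (the default limit), which is often the first visible sign of this leak.

```javascript
import { EventEmitter } from 'node:events';

const emitter = new EventEmitter(); // stand-in for the app-wide data source

// Hypothetical helper using the fixed pattern: the listener lives exactly as long as the request
function subscribe(req, onData) {
  emitter.on('data', onData);
  req.once('close', () => emitter.off('data', onData));
}

for (let i = 0; i < 100; i++) {
  const req = new EventEmitter(); // stand-in for an HTTP request
  subscribe(req, () => {});
  req.emit('close');              // the client disconnects
}

console.log(emitter.listenerCount('data')); // 0: every listener was removed
```

The same check works in a test suite or a debug endpoint: a listener count that keeps growing with traffic points straight at a handler that never unsubscribes.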
Summary
Scaling a Node.js application requires addressing both throughput and CPU bottlenecks:
- Cluster mode (PM2 instances: 'max') spawns one worker per CPU core: the easiest path to full CPU utilisation, with no code changes
- Graceful shutdown listens for SIGINT/SIGTERM, drains in-flight requests, and closes DB connections cleanly; zero-downtime PM2 reloads depend on it
- Worker threads move CPU-intensive operations off the main event loop; use Piscina for a managed pool
- Profiling with --inspect and load testing with autocannon reveal real bottlenecks before they hit production
- Enable gzip compression, tune the MongoDB connection pool, and set keep-alive timeouts correctly for cloud load balancers
- Watch for event listener leaks and unbounded caches: both cause steadily growing heap usage that eventually crashes the process
- Monitor memory via process.memoryUsage() in the health check endpoint and pm2 monit in production
