Why does Node.js only use one CPU core by default?

Node.js runs JavaScript on a single thread, and a single thread can only execute on one CPU core at a time. On a server with 8 cores, a single Node.js process therefore uses at most 1/8 of the available CPU capacity for JavaScript execution. The cluster module and PM2 cluster mode solve this by spawning one worker process per core, each with its own event loop, sharing the same port. By default the primary process accepts incoming connections and hands them to the workers round-robin; on Windows, distribution is left to the operating system.

What is the difference between the cluster module and worker threads?

The cluster module spawns multiple Node.js processes, each with its own event loop, memory space, and V8 instance. Workers communicate via IPC messages. It is designed for scaling HTTP servers across CPU cores. Worker threads run within the same process. Each thread still has its own V8 isolate and heap, but threads can share memory explicitly via SharedArrayBuffer. They are designed for CPU-intensive calculations that would block the event loop, such as image processing, encryption, and data transformation. Use cluster for HTTP scaling; use worker threads for CPU-bound work.
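The difference between passing messages and sharing memory can be seen without spawning a thread. The sketch below is only an illustration: it uses the global structuredClone (Node 17+) as a stand-in for the copy that postMessage makes, and two typed-array views over one SharedArrayBuffer as a stand-in for two threads looking at the same memory.

```javascript
// Message passing: postMessage copies data with the structured clone algorithm.
// structuredClone (global since Node 17) applies the same algorithm on one thread.
const original = { pixels: [1, 2, 3] };
const received = structuredClone(original);
received.pixels[0] = 99;
console.log(original.pixels[0]); // 1: the receiver changed its own copy

// Shared memory: every view over a SharedArrayBuffer sees the same bytes.
const shared = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const mainView = new Int32Array(shared);   // what the main thread would hold
const workerView = new Int32Array(shared); // what a worker would hold

Atomics.add(workerView, 0, 5);             // atomic write through one view
console.log(Atomics.load(mainView, 0));    // 5: visible through the other view
```

Copying is the safe default; shared memory avoids the copy but makes the program responsible for synchronisation, which is what the Atomics calls are for.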

What is PM2 cluster mode and how does it differ from the cluster module?

PM2 cluster mode manages a cluster of Node.js processes for you, without any changes to your application code. You specify instances: 'max' in the ecosystem config and PM2 handles spawning, monitoring, restarting, and load balancing across all CPU cores. The Node.js cluster module requires you to write the master/worker logic yourself. PM2 is preferred in production because it adds process monitoring, log management, graceful reloads, and startup scripts on top of clustering.

How do I detect a memory leak in Node.js?

Memory leaks in Node.js are usually caused by: references held in closures or global variables that prevent garbage collection, event listeners that are not removed, growing arrays or maps that are never cleared, or caching without eviction. Detect leaks with: (1) monitoring process.memoryUsage().heapUsed over time — a steadily growing heap indicates a leak; (2) using clinic.js or --inspect with Chrome DevTools to take heap snapshots and compare them; (3) pm2 monit to watch memory per process in production.
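Option (1) can be automated with a few lines. The following is a minimal sketch, not a production monitor: the function names, the 80% "mostly rising" rule, and the 10 MB growth threshold are all assumptions chosen for illustration and would need tuning for a real workload.

```javascript
// Report whether a series of heapUsed readings (bytes, oldest first) kept growing.
function isSteadilyGrowing(samples, minTotalGrowthBytes = 10 * 1024 * 1024) {
  if (samples.length < 3) return false; // too little data to call it a trend
  let rises = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] > samples[i - 1]) rises++;
  }
  const mostlyRising = rises / (samples.length - 1) >= 0.8;
  const totalGrowth = samples[samples.length - 1] - samples[0];
  return mostlyRising && totalGrowth >= minTotalGrowthBytes;
}

// Hypothetical wiring: sample once a minute and keep the last 30 readings.
function startHeapWatch(intervalMs = 60_000, windowSize = 30) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push(process.memoryUsage().heapUsed);
    if (samples.length > windowSize) samples.shift();
    if (isSteadilyGrowing(samples)) {
      console.warn(`Heap has grown steadily over the last ${samples.length} samples`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer);
}
```

A warning from a check like this is a prompt to take heap snapshots, not proof of a leak: a cache that is still warming up produces the same shape.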

What is graceful shutdown and why is it important?

A graceful shutdown stops accepting new requests, allows in-flight requests to complete, closes database connections, and then exits. Without it, a deployment or restart kills the process mid-request, leaving clients with broken responses and potentially interrupting database operations partway through. Implement it by listening for SIGTERM (and SIGINT, which PM2 sends on reload), calling server.close() to stop accepting new connections, then closing DB connections and calling process.exit(0) once the server is idle.

Node.js Performance: Clustering, Worker Threads & PM2

A single-threaded Node.js process is fast for I/O-bound work — database queries, HTTP calls, file reads — because those operations happen asynchronously. But it has two performance ceilings: it can only use one CPU core, and any CPU-intensive operation blocks the event loop for every other request.

This module covers how to remove both ceilings: clustering for multi-core utilisation, worker threads for CPU-bound work, PM2 for production process management, and profiling to find where time is actually being spent.

This is Module 32 of the Node.js Full‑Stack Developer course.

The Problem: Single-Threaded Bottlenecks

// This blocks the event loop — no other requests can be handled while it runs
app.get('/compute', (req, res) => {
  let result = 0;
  for (let i = 0; i < 10_000_000_000; i++) result += i; // 5-10 seconds
  res.json({ result });
});

While the loop runs, every other request to your server waits. This is the event loop blocking problem.
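The effect is easy to reproduce without a web server. In this small sketch, blockFor is a made-up helper that spins the CPU in place of the loop above, and a zero-delay timer plays the part of "another request" waiting for its turn.

```javascript
// Busy-wait for `ms` milliseconds, standing in for a CPU-heavy loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

const scheduledAt = Date.now();

// Ask for a callback "immediately"...
setTimeout(() => {
  const delay = Date.now() - scheduledAt;
  console.log(`Timer asked for 0 ms, ran after ${delay} ms`);
}, 0);

// ...then occupy the thread. The timer cannot run until this returns.
blockFor(200);
```

The logged delay is at least 200 ms, because the timer callback can only run once the synchronous work has returned control to the event loop.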

There are two solutions:

Clustering — run multiple Node.js processes so other processes handle requests while one is busy
Worker threads — move the CPU-intensive work off the main thread

Clustering with the cluster Module

// src/cluster.js
import cluster from 'cluster';
import os from 'os';

const numCPUs = os.availableParallelism(); // Node 18.14+/19.4+; on older versions use os.cpus().length

if (cluster.isPrimary) {
  console.log(`Primary process ${process.pid} running`);
  console.log(`Spawning ${numCPUs} workers`);

  // Fork one worker per CPU core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // Restart crashed workers
  cluster.on('exit', (worker, code, signal) => {
    console.warn(`Worker ${worker.process.pid} died (${signal || code}). Restarting...`);
    cluster.fork();
  });

  cluster.on('online', (worker) => {
    console.log(`Worker ${worker.process.pid} is online`);
  });
} else {
  // Workers run the Express app
  const { default: app } = await import('./app.js');
  const PORT = process.env.PORT ?? 3000;

  app.listen(PORT, () => {
    console.log(`Worker ${process.pid} listening on port ${PORT}`);
  });
}

Run with: node src/cluster.js
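One caveat with restarting every crashed worker unconditionally: if workers crash during startup (a bad config, a missing environment variable), the primary forks replacements in a tight loop. A minimal sketch of a restart limit follows; shouldRestart and its default limits are hypothetical, not part of the cluster API.

```javascript
// Decide whether a crashed worker should be replaced.
// `recentExits` holds timestamps (ms) of earlier worker exits.
function shouldRestart(recentExits, now, { maxRestarts = 5, windowMs = 60_000 } = {}) {
  const exitsInWindow = recentExits.filter((t) => now - t < windowMs).length;
  return exitsInWindow < maxRestarts;
}

// Hypothetical use inside the primary's 'exit' handler:
//
//   const exits = [];
//   cluster.on('exit', (worker) => {
//     const now = Date.now();
//     if (shouldRestart(exits, now)) cluster.fork();
//     else console.error('Too many worker crashes; not restarting');
//     exits.push(now);
//   });
```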

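All workers share port 3000. By default the primary process accepts incoming connections and distributes them across the workers round-robin; this is the default on every platform except Windows, where distribution is left to the operating system.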
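Round-robin simply means each new connection goes to the next worker in turn. The toy model below shows the policy with made-up worker names; it is not Node's scheduler, which also has safeguards against overloading a busy worker.

```javascript
// Hand each new connection to the next worker in turn, wrapping around at the end.
function createRoundRobin(workerIds) {
  let next = 0;
  return function pick() {
    const id = workerIds[next];
    next = (next + 1) % workerIds.length;
    return id;
  };
}

const pick = createRoundRobin(['worker-1', 'worker-2', 'worker-3']);
const assigned = Array.from({ length: 7 }, () => pick());
console.log(assigned.join(' '));
// worker-1 worker-2 worker-3 worker-1 worker-2 worker-3 worker-1
```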

PM2 Cluster Mode (Recommended for Production)

PM2 handles clustering without changing your application code:

// ecosystem.config.js
module.exports = {
  apps: [{
    name:            'myapp',
    script:          'src/server.js',
    instances:       'max',        // one per CPU core; or a number: 4
    exec_mode:       'cluster',
    env_file:        '.env.production',
    max_memory_restart: '500M',
    listen_timeout:  10000,        // ms to wait for app to start listening
    kill_timeout:    15000,        // ms before PM2 sends SIGKILL; keep it above the app's own shutdown timeout
    error_file:      '/var/log/myapp/error.log',
    out_file:        '/var/log/myapp/out.log',
    merge_logs:      true,
  }],
};

pm2 start ecosystem.config.js

# Zero-downtime reload — sends SIGINT to old workers one at a time
# and waits for new workers to be ready before removing old ones
pm2 reload myapp

pm2 status          # shows each worker's CPU and memory
pm2 monit           # live dashboard

PM2 reload is the key production command — it restarts workers one by one, ensuring zero downtime during deployments.
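During a reload PM2 has to decide when a new worker counts as ready. PM2 also supports an explicit signal: with wait_ready: true in the ecosystem config (an option the config above does not set), the app calls process.send('ready') and listen_timeout bounds how long PM2 waits for it. A minimal sketch of the app side, where signalReady is a hypothetical helper name:

```javascript
// Tell a process manager that this worker is ready for traffic.
// process.send only exists when the process was started with an IPC channel
// (as PM2 does), so this is a no-op under plain `node src/server.js`.
function signalReady(proc = process) {
  if (typeof proc.send !== 'function') return false;
  proc.send('ready');
  return true;
}

// Hypothetical placement: call it once everything the app needs is up.
//
//   server.listen(PORT, async () => {
//     await connectDB();   // assumed helper, not defined in this module
//     signalReady();
//   });
```

This matters when listening on the port is not the same as being ready, for example when the database connection is established after the server starts.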

Graceful Shutdown

PM2 sends SIGINT to workers during reload. Handle it to drain in-flight requests:

// src/server.js
import http from 'http';
import app from './app.js';
import { disconnectDB } from './config/db.js';
import redis from './lib/redis.js';

const server = http.createServer(app);
const PORT   = process.env.PORT ?? 3000;

server.listen(PORT, () => {
  console.log(`Process ${process.pid} listening on port ${PORT}`);
});

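async function gracefulShutdown(signal) {
  console.log(`${signal} received, shutting down gracefully`);

  // Stop accepting new connections; the callback runs once in-flight requests finish
  server.close(async () => {
    console.log('HTTP server closed');

    try {
      await disconnectDB();
      await redis.quit();
      console.log('Database and Redis connections closed');
      process.exit(0);
    } catch (err) {
      console.error('Error during shutdown:', err);
      process.exit(1);
    }
  });

  // Idle keep-alive sockets would otherwise hold the server open
  // (method added in Node 18.2; Node 19+ does this inside close())
  server.closeIdleConnections?.();

  // Force exit after 10 seconds if the server hasn't closed.
  // unref() stops the timer itself from keeping the process alive.
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 10_000).unref();
}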

process.on('SIGINT',  () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
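Because both signals are wired to the same function, a SIGINT followed by a SIGTERM would start the shutdown sequence twice. One way to guard against that is sketched below; runOnce is a hypothetical helper, not something PM2 or Node provides.

```javascript
// Wrap a shutdown routine so that only the first signal triggers it.
function runOnce(shutdown) {
  let started = false;
  return (signal) => {
    if (started) {
      console.log(`${signal} ignored: shutdown already in progress`);
      return false;
    }
    started = true;
    shutdown(signal);
    return true;
  };
}

// Hypothetical wiring with the gracefulShutdown function above:
//
//   const shutdownOnce = runOnce(gracefulShutdown);
//   process.on('SIGINT',  () => shutdownOnce('SIGINT'));
//   process.on('SIGTERM', () => shutdownOnce('SIGTERM'));
```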

Worker Threads for CPU-Intensive Tasks

Worker threads run in separate V8 isolates within the same process. Each thread has its own heap; threads communicate via postMessage and can share memory explicitly through SharedArrayBuffer:

// workers/hash.worker.js
import { parentPort, workerData } from 'worker_threads';
import bcrypt from 'bcrypt';

// Receive work from the main thread
const { password, rounds } = workerData;

const hash = await bcrypt.hash(password, rounds);

// Send result back
parentPort.postMessage({ hash });

// lib/worker-pool.js
import { Worker } from 'worker_threads';
import path from 'path';
import { fileURLToPath } from 'url';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

/**
 * Run a task in a worker thread.
 * @param {string} workerScript - path to worker file
 * @param {object} data - data to pass to the worker
 */
export function runInWorker(workerScript, data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.resolve(__dirname, '..', workerScript), {
      workerData: data,
    });

    worker.on('message', resolve);
    worker.on('error',   reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// Usage in a route handler
import { runInWorker } from '../lib/worker-pool.js';

app.post('/hash', async (req, res, next) => {
  try {
    const { hash } = await runInWorker('workers/hash.worker.js', {
      password: req.body.password,
      rounds:   12,
    });
    res.json({ hash });
  } catch (err) {
    next(err);
  }
});

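The event loop is not blocked: the hash runs in a worker thread and the main thread continues serving other requests. Note that the native bcrypt package's asynchronous hash() already runs on libuv's thread pool, so this pattern pays off most for CPU work written in pure JavaScript; the structure is the same either way.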

Worker Thread Pool

For frequent CPU tasks, creating a new worker per request is expensive. Use a pool:

npm install piscina    # battle-tested worker thread pool

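// workers/hash.task.js (hypothetical file name)
// Piscina workers export a function and receive the task data as its argument;
// a worker that reads workerData and calls parentPort.postMessage will not work in a pool
import bcrypt from 'bcrypt';

export default async function hashTask({ password, rounds }) {
  return { hash: await bcrypt.hash(password, rounds) };
}

// lib/hash-pool.js
import { Piscina } from 'piscina';
import { fileURLToPath } from 'url';
import path from 'path';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

export const hashPool = new Piscina({
  filename: path.resolve(__dirname, '../workers/hash.task.js'),
  maxThreads: 4,
  idleTimeout: 30_000,
});

// Usage
const { hash } = await hashPool.run({ password: 'secret', rounds: 12 });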

Piscina manages a pool of workers, queuing tasks and dispatching to idle threads.
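What "queuing tasks and dispatching to idle threads" means can be shown with a toy model. The sketch below is not Piscina's implementation: async functions stand in for threads, and createToyPool is a made-up name. It only demonstrates the scheduling rule that at most a fixed number of tasks run at once while the rest wait their turn.

```javascript
// A toy pool: at most `size` tasks run at once, the rest wait in a FIFO queue.
function createToyPool(size) {
  const queue = [];
  let active = 0;

  function dispatch() {
    if (active >= size || queue.length === 0) return;
    const { task, resolve, reject } = queue.shift();
    active++;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;   // this "thread" is idle again
        dispatch(); // so pull the next queued task, if any
      });
  }

  return {
    run(task) {
      return new Promise((resolve, reject) => {
        queue.push({ task, resolve, reject });
        dispatch();
      });
    },
    stats: () => ({ active, queued: queue.length }),
  };
}

const pool = createToyPool(2);
const sleep = (ms, value) => new Promise((r) => setTimeout(r, ms, value));

const jobs = [1, 2, 3].map((n) => pool.run(() => sleep(20, n * 10)));
console.log(pool.stats()); // { active: 2, queued: 1 }
Promise.all(jobs).then((results) => console.log(results)); // [ 10, 20, 30 ]
```

The third job starts only when one of the first two finishes, which is the behaviour that keeps a burst of requests from spawning an unbounded number of threads.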

Memory Usage Monitoring

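// middleware/memoryLogger.js
import v8 from 'v8';

// heapTotal is only what V8 has allocated so far, so compare against the actual heap limit
const { heap_size_limit: heapLimit } = v8.getHeapStatistics();

export function memoryLogger(req, res, next) {
  const { heapUsed, rss } = process.memoryUsage();
  const toMB = bytes => (bytes / 1024 / 1024).toFixed(1);

  // Log if heap usage exceeds 80% of the V8 heap limit
  if (heapUsed / heapLimit > 0.8) {
    console.warn(`High memory: heap ${toMB(heapUsed)}MB / ${toMB(heapLimit)}MB limit | RSS ${toMB(rss)}MB`);
  }

  next();
}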

// Health check with memory info
app.get('/health', (req, res) => {
  const mem   = process.memoryUsage();
  const toMB  = b => Math.round(b / 1024 / 1024);

  res.json({
    status:  'ok',
    pid:     process.pid,
    uptime:  Math.round(process.uptime()),
    memory: {
      heapUsed:  `${toMB(mem.heapUsed)}MB`,
      heapTotal: `${toMB(mem.heapTotal)}MB`,
      rss:       `${toMB(mem.rss)}MB`,
    },
  });
});

Profiling with --inspect

# Start with inspector
node --inspect src/server.js

# Open Chrome → chrome://inspect → "Open dedicated DevTools for Node"
# Go to the "Performance" tab → Record → run load test → Stop

Load Testing with autocannon

npm install -g autocannon

# Benchmark your endpoint
autocannon -c 100 -d 10 http://localhost:3000/api/posts
# -c 100: 100 concurrent connections
# -d 10: 10 second duration

┌──────────┬──────┬──────┬───────┬──────┬───────┬─────────┐
│ Stat     │ 2.5% │ 50%  │ 97.5% │ 99%  │ Avg   │ Req/sec │
├──────────┼──────┼──────┼───────┼──────┼───────┼─────────┤
│ Latency  │ 5ms  │ 8ms  │ 22ms  │ 45ms │ 9.2ms │         │
│ Requests │      │      │       │      │       │ 10,840  │
└──────────┴──────┴──────┴───────┴──────┴───────┴─────────┘
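The percentage columns are latency percentiles: the 99% column is the latency that 99% of requests stayed at or below. A short sketch of the nearest-rank definition shows why they matter more than the average. The sample values are made up, and autocannon computes its own figures internally rather than with this function.

```javascript
// Nearest-rank percentile: the smallest sample that at least p% of samples
// are less than or equal to.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds: mostly fast, with one slow outlier.
const latencies = [5, 6, 6, 7, 8, 8, 9, 9, 10, 120];
const mean = latencies.reduce((sum, ms) => sum + ms, 0) / latencies.length;

console.log(percentile(latencies, 50)); // 8
console.log(percentile(latencies, 99)); // 120
console.log(mean);                      // 18.8
```

The mean of 18.8 ms describes no actual request here: the typical request took 8 ms and the unlucky one took 120 ms, which is why the tail columns are the ones to watch.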

Key Performance Optimisations

Response Compression

npm install compression

import compression from 'compression';
app.use(compression()); // compresses compressible responses above the default 1kb threshold

Connection Keep-Alive

// Increase keep-alive timeout (default is 5 seconds — too short for AWS ALB)
server.keepAliveTimeout    = 65_000; // must be > ALB idle timeout (60s)
server.headersTimeout      = 66_000; // must be > keepAliveTimeout

Mongoose Connection Pool

mongoose.connect(uri, {
  maxPoolSize: 10,        // per process; the driver default is 100 (Mongoose 6+), so cap it when many workers share one database
  minPoolSize: 2,
  socketTimeoutMS: 30_000,
  serverSelectionTimeoutMS: 5_000,
});
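Pool size interacts with clustering, because every worker process opens its own pool. The arithmetic is trivial but easy to forget when sizing a database; maxDbConnections is just an illustrative helper.

```javascript
// Each cluster worker is a separate process with its own connection pool,
// so the database sees at most workers × maxPoolSize connections from one server.
function maxDbConnections(workerCount, maxPoolSize) {
  return workerCount * maxPoolSize;
}

console.log(maxDbConnections(8, 10));  // 80
console.log(maxDbConnections(8, 100)); // 800
```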

Avoid Synchronous Operations in Requests

// ❌ Blocks event loop
const data = fs.readFileSync('file.json');

// ✅ Non-blocking
const data = await fs.promises.readFile('file.json', 'utf8');

Common Memory Leak Patterns

// ❌ Event listener added inside request handler — never removed
app.get('/subscribe', (req, res) => {
  emitter.on('data', (data) => res.write(data)); // adds listener on every request!
});

// ✅ Remove listener when connection closes
app.get('/subscribe', (req, res) => {
  const handler = (data) => res.write(data);
  emitter.on('data', handler);
  req.on('close', () => emitter.off('data', handler));
});

// ❌ Unbounded cache — grows forever
const cache = {};
app.get('/data/:id', (req, res) => {
  cache[req.params.id] = fetchData(req.params.id); // never evicted
});

// ✅ Use a bounded LRU cache
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 500, ttl: 1000 * 60 }); // max 500 items, 1min TTL
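To make "bounded LRU" concrete, here is a minimal sketch of the eviction rule using a plain Map, which remembers insertion order. TinyLRU is a teaching example only; it is not how lru-cache is implemented, and it has none of that library's features such as TTLs.

```javascript
// A minimal LRU cache built on Map, which iterates keys in insertion order.
class TinyLRU {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key); // refresh position if the key already exists
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}

const tiny = new TinyLRU(2);
tiny.set('a', 1);
tiny.set('b', 2);
tiny.get('a');    // touch 'a', so 'b' is now the oldest entry
tiny.set('c', 3); // over capacity: evicts 'b'
console.log(tiny.get('b'), tiny.size); // undefined 2
```

Whatever the traffic pattern, the cache never holds more than max entries, which is the property the unbounded object above lacks.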

You have mastered Node.js performance. Continue to the final module to test everything you have learned.

Summary

Scaling a Node.js application requires addressing both throughput and CPU bottlenecks:

Cluster mode (PM2 instances: 'max') spawns one worker per CPU core — the easiest path to full CPU utilisation with no code changes
Graceful shutdown listens for SIGINT/SIGTERM, drains in-flight requests, and closes DB connections cleanly — required for zero-downtime PM2 reloads
Worker threads move CPU-intensive operations off the main event loop — use Piscina for a managed pool
Profiling with --inspect and load testing with autocannon reveals real bottlenecks before they hit production
Enable gzip compression, tune MongoDB connection pool, and set keep-alive timeouts correctly for cloud load balancers
Watch for event listener leaks and unbounded caches — both cause steadily growing heap usage that eventually crashes the process
Monitor memory via process.memoryUsage() in the health check endpoint and pm2 monit in production
