SQL Project: Performance Tuning & Optimization Lab

TopicTrick Team

Welcome to the Phase 4 Capstone of the SQL Masterclass. You have secured the fortress, balanced the ledger, and mastered the analytical stream. Now, you will fix the engine.

This 1,500+ word project guide is designed as a "War Room" simulation. You are the Senior Performance Architect at "HyperScale," a global logistics giant. Their central database is hitting 100% CPU usage, the connection pool is exhausted, and the site is slowing down for millions of customers. Your mission is to find the "Poison Queries," neutralize them, and tune the physical hardware mirror for maximum throughput.


1. Project Goal: The "Millisecond Mirror" Standard

Most developers think "fast enough" is when the result appears on their screen. In production, "fast enough" is determined by Concurrency. If a query takes 200 ms on your laptop but runs 1,000 times a second in production, you have a CPU apocalypse.

Your goal is to reach the "Sub-10ms Standard" for all core transactional queries and the "Sub-500ms Standard" for complex analytical reports.


2. Task 1: The "Sequential Scan" Massacre (EXPLAIN Lab)

A core search query is taking 1.8 seconds, locking rows and causing a massive backlog in the connection pool.

The Mission

Use EXPLAIN (ANALYZE, BUFFERS) to diagnose the problem in the shipping_logs table (500M rows).

The Hardware-Mirror: Random I/O vs. Sequential

You must identify the Sequential Scan. This means the database is physically reading every single block of data from the SSD.

  • The Metric: Look for `Buffers: shared read=…` in the plan output. If that number is huge, your query is disk-bound.
  • The Fix: Design a Composite B-Tree Index on (status, priority, created_at).
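
A minimal sketch of the diagnose-then-fix cycle, assuming the shipping_logs columns named above (the filter values are illustrative):

```sql
-- Step 1: Diagnose. Look for "Seq Scan" and a large "shared read" count.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM shipping_logs
WHERE status = 'IN_TRANSIT'
  AND priority = 'HIGH'
  AND created_at > now() - interval '1 day';

-- Step 2: Fix. A composite B-Tree index matching the filter columns.
-- CONCURRENTLY avoids holding a long write lock on a 500M-row table.
CREATE INDEX CONCURRENTLY idx_logs_status_priority_created
    ON shipping_logs (status, priority, created_at);

-- Step 3: Re-run the EXPLAIN. The plan should now show an Index Scan
-- (or Bitmap Index Scan) instead of a Sequential Scan.
```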

2.1 The Physics of Write-Ahead Logs (WAL) Bottlenecks

In high-concurrency environments, your bottleneck isn't always the "Read." It is the WAL (Write-Ahead Log).

  • The Physics: Every change in your database is written sequentially to the WAL before the transaction is confirmed.
  • The Bottleneck: If your WAL disk (the physical mirror of your intent) is slow, your entire database will wait on "I/O Wait" during every INSERT or UPDATE.
  • The Architect's Solution: Separate your WAL logs onto a dedicated high-speed NVMe drive. This ensures that massive "Data Reads" don't interfere with the mission-critical "Intent Writes."
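
One way to realize this on Postgres, sketched under the assumption that the fast NVMe volume is mounted at /mnt/nvme_wal (a hypothetical path):

```
# At cluster creation time, place the WAL on its own device:
#   initdb --waldir=/mnt/nvme_wal/pg_wal ...
# For an existing cluster (server stopped), move pg_wal and symlink it:
#   mv $PGDATA/pg_wal /mnt/nvme_wal/pg_wal
#   ln -s /mnt/nvme_wal/pg_wal $PGDATA/pg_wal

# postgresql.conf knobs that shape WAL write pressure:
wal_compression = on          # fewer bytes hit the WAL device
wal_buffers = 64MB            # batch WAL writes in memory
checkpoint_timeout = 15min    # spread checkpoint I/O over time
max_wal_size = 8GB            # avoid frequent forced checkpoints
```

The exact values are starting points, not gospel; tune them against your own insert rate.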

3. Task 2: Refactoring the Correlated "Killer"

You find a "Reporting Query" that calculates "Average order value per user" using a correlated subquery in the SELECT list.

The Problem

This triggers an O(N^2) Nested Loop pattern. For every user in the table, the database runs a completely new search for that user's orders. On a 100,000-user table, that is 100,000 extra subquery executions.

The Mission

Refactor this "Scalar Subquery" into a JOIN with an Aggregated CTE (Common Table Expression).
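
A sketch of the refactor, assuming hypothetical users and orders tables with user_id and total columns:

```sql
-- BEFORE: correlated scalar subquery, re-executed once per user row.
SELECT u.user_id,
       (SELECT AVG(o.total)
          FROM orders o
         WHERE o.user_id = u.user_id) AS avg_order_value
FROM users u;

-- AFTER: aggregate once, then join. One pass over orders, one hash join.
WITH order_stats AS (
    SELECT user_id, AVG(total) AS avg_order_value
    FROM orders
    GROUP BY user_id
)
SELECT u.user_id, s.avg_order_value
FROM users u
LEFT JOIN order_stats s ON s.user_id = u.user_id;
```

The LEFT JOIN preserves users with no orders, which the scalar subquery also handled (it returned NULL for them).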


4. Task 3: Index Pruning (The Partial Mirror)

Your orders table has 15 indexes. While this makes searching fast, it makes every "Insert" take 400 ms, because the database has to update 15 different physical files on the disk for every row.

4.1 The Physics of Index Write Amplification: The Silent Performance Killer

Why does having too many indexes destroy your database?

  • The Physics: Every index is a separate B-Tree Structure on your SSD. When you insert a single row into a table with 15 indexes, the database engine must perform 15 separate "Write" operations across 15 different physical locations on the disk.
  • The Latency Tax: This causes massive Non-Sequential I/O. The drive is servicing scattered writes across many physical locations, "I/O Wait" spikes, and the table's write locks are held much longer than necessary.
  • The Tip: Use pg_stat_user_indexes to find "Dead Indexes" (where idx_scan = 0). If an index hasn't been used in a month, delete it to save CPU/Disk cycles.
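
The catalog query behind that tip might look like this (pg_stat_user_indexes and pg_relation_size are standard Postgres facilities; the DROP target is a placeholder name):

```sql
-- Indexes never scanned since stats were last reset,
-- sorted by how much disk they are wasting.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop a confirmed dead index without blocking writers:
-- DROP INDEX CONCURRENTLY idx_orders_legacy_status;
```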

The Optimization: Partial Indexing

Convert your massive idx_status into a Partial Index.

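
A minimal sketch of the conversion, assuming orders has a status column where the "hot" rows are the non-terminal ones (the status values are illustrative):

```sql
-- BEFORE: a full index over every status value, including the vast
-- majority of rows that are 'delivered' and almost never queried.
-- CREATE INDEX idx_status ON orders (status);

-- AFTER: a partial index covering only the rows the application hunts
-- for. It is a fraction of the size, stays hot in cache, and is cheaper
-- to maintain on every INSERT/UPDATE.
CREATE INDEX CONCURRENTLY idx_status_active
    ON orders (status)
    WHERE status NOT IN ('delivered', 'cancelled');
```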

5. Task 4: The Parallel Query Mirror

Modern servers have 64+ CPU cores, but by default many databases let a single query use only one core, or at most a small handful of parallel workers.
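
On Postgres, per-query parallelism is governed by a handful of settings; a hedged sketch of checking and raising them (the values are starting points, not tuned numbers):

```sql
-- How many workers may one query use? (the default is often just 2)
SHOW max_parallel_workers_per_gather;

-- Let a big analytical query fan out across more cores
-- (session-level here; set in postgresql.conf for a permanent change).
SET max_parallel_workers_per_gather = 8;
SET max_parallel_workers = 16;

-- Verify: the plan should now contain Gather / Parallel Seq Scan nodes.
EXPLAIN (ANALYZE)
SELECT status, COUNT(*) FROM shipping_logs GROUP BY status;
```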

5.1 The Autovacuum Mirror: High-Concurrency Maintenance

A "Fast" database becomes "Slow" if maintenance is ignored.

  • The Physics of Bloat: As rows are updated, "Dead Tuples" accumulate. If Autovacuum can't keep up, your table size doubles and your buffer-cache hit ratio collapses.
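
A postgresql.conf fragment for aggressive autovacuum on a write-heavy table, offered as a sketch rather than universal advice (the right numbers depend on your write rate):

```
# Global: wake the autovacuum daemon more often and let it work harder.
autovacuum_naptime = 10s
autovacuum_vacuum_cost_limit = 2000

# Per-table: vacuum the hot table when ~2% of its rows are dead,
# instead of the 20% default.
# ALTER TABLE orders SET (
#     autovacuum_vacuum_scale_factor = 0.02,
#     autovacuum_analyze_scale_factor = 0.01
# );
```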

6. Task 5: Go Integration (The Benchmarking Engine)

To prove your optimizations work in the "Real World," you must build a benchmarking script.

The Task

Write a small Go utility that launches 50 concurrent goroutines, each hammering the database with your "Slow" query vs. your "Optimized" query.

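
A hedged sketch of such a harness: it uses a stub workload (time.Sleep) in place of real database/sql calls so it runs standalone; in the lab, swap fn for db.Query against your slow and optimized statements.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// benchmark runs fn iters times from each of workers goroutines and
// returns the number of successful calls plus the wall-clock duration.
func benchmark(workers, iters int, fn func() error) (int64, time.Duration) {
	var ok int64
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				if err := fn(); err == nil {
					atomic.AddInt64(&ok, 1)
				}
			}
		}()
	}
	wg.Wait()
	return ok, time.Since(start)
}

func main() {
	// Stand-ins for the real round-trips; replace with db.Query(...) calls.
	slow := func() error { time.Sleep(2 * time.Millisecond); return nil }
	fast := func() error { time.Sleep(200 * time.Microsecond); return nil }

	for _, c := range []struct {
		name string
		fn   func() error
	}{{"slow query", slow}, {"optimized query", fast}} {
		n, d := benchmark(50, 20, c.fn)
		fmt.Printf("%-16s %d calls in %v (%.0f qps)\n",
			c.name, n, d.Round(time.Millisecond), float64(n)/d.Seconds())
	}
}
```

Because each goroutine reuses one workload function, adding a shared *sql.DB (which pools connections internally) is a one-line change.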

6.1 Architecting for High-Throughput: The Sharding Performance Mirror

When a single server hits the "Vertical Limit" (CPU is at 100% despite all tuning), you must move to Sharding.

  • The Data Partition Physics: By splitting your 1 Billion row table into 10 shards of 100M rows each across 10 different machines, you parallelize the Hardware Bottleneck.
  • The Performance Win: Instead of one SSD reading 1TB, 10 SSDs read 100GB simultaneously. This is the only way to scale to "Internet-Scale" where you process millions of requests per second.
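
A common shard-routing rule is a hash of the customer_id; a sketch of the arithmetic, assuming a hypothetical customers table (hashtext() is a Postgres built-in hash function, and the modulus 10 matches the ten shards above):

```sql
-- Deterministically map each customer to one of 10 shards.
-- abs() guards against hashtext() returning a negative integer.
SELECT customer_id,
       abs(hashtext(customer_id::text)) % 10 AS shard_id
FROM customers
LIMIT 5;
```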

7. Fixing "Index Bloat" (Zero-Downtime Maintenance)

Indexes are not static. After millions of updates, they become "Fragmented" or "Bloated," leaving huge gaps of empty space on the disk.
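
Postgres 12+ can rebuild a bloated index without blocking writers; a sketch (idx_orders_customer is a placeholder name):

```sql
-- Find the biggest rebuild candidates first.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

-- Rebuild in place with no long write lock (Postgres 12+).
REINDEX INDEX CONCURRENTLY idx_orders_customer;
```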

7.1 Memory-Mapped I/O and the OS Page Cache

Why does a "Warm" database outperform a "Cold" database?

  • The Physics: Databases like Postgres use Buffered I/O, while others use Direct I/O.
  • The Page Cache Mirror: When you read a page from the SSD, the OS stores it in the Kernel Page Cache. Subsequent reads are "Memory-to-Memory" copies, which are roughly 1,000x faster than disk reads.
  • The Task: You must tune shared_buffers to ensure it is large enough to hold your "Working Set" of active indexes, but small enough to leave room for the OS Page Cache.
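
You can watch the effect of that tuning directly; this query against the standard pg_statio view reports the shared-buffer hit ratio per table:

```sql
-- Shared-buffer hit ratio per table: hits / (hits + disk reads).
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;
```

Note this only sees Postgres's own shared_buffers; a "read" here may still be served from the OS page cache rather than the disk.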

8. Summary: The Performance Success Criteria

  1. Index-Only Success: Demonstrate at least one query where the Execution Plan shows an Index-Only Scan, meaning the table heap was never touched.
  2. Nested Loop Elimination: Prove that all O(N^2) subqueries have been refactored into Hash or Merge joins.
  3. Cache Hit Superiority: Show a "Shared Hit" ratio of >98% for your primary transaction table.
  4. Concurrency Validation: Use your Go script to prove that your optimizations reduced "Connection Wait Time" under load.
  5. WAL Latency Check: Verify that the "I/O Wait" during high-concurrency inserts is under 5ms.

8.1 Memory Locality and CPU Affinitization

The final frontier of performance is NUMA (Non-Uniform Memory Access).

  • The Physics: On a server with 2 CPU sockets, each CPU has its "Own" RAM. If CPU 1 tries to read data from RAM attached to CPU 2, there is a latency penalty (The NUMA Hop).
  • Modern Mastery: High-performance database architects "Affinitize" processes so they only run on a specific socket and only use local RAM, eliminating the NUMA hop and reaching the absolute physical limit of the silicon.
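
On Linux this is typically done with numactl; a configuration sketch, assuming socket 0 is the target node:

```
# Inspect the NUMA topology first:
#   numactl --hardware

# Pin the Postgres postmaster (and all its children) to CPU socket 0
# and its locally attached RAM:
#   numactl --cpunodebind=0 --membind=0 pg_ctl start -D "$PGDATA"
```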

8.2 Hardware Interrupt Coalescing: The System Mirror

When your database is processing 100,000 packets per second, the CPU can be overwhelmed by "Interrupts."

  • The Physics: For every packet, the NIC (Network Card) "Interrupts" the CPU.
  • The Tuning: We use Interrupt Coalescing to tell the NIC: "Wait until you have 50 packets, then interrupt me once."
  • The Result: This reduces CPU context switching and leaves more cycles for the Postgres Query Executor, often cutting total CPU usage by 10% on high-traffic nodes.
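
On Linux, NIC coalescing is configured with ethtool; a sketch, assuming the database interface is eth0 (a placeholder name):

```
# Show the current coalescing settings:
#   ethtool -c eth0

# Interrupt after 50 received frames or 50 microseconds, whichever
# comes first:
#   ethtool -C eth0 rx-frames 50 rx-usecs 50
```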

This project transitions you from "Solving bugs" to "Optimizing Reality." By mastering the physics of execution plans and the geometry of index storage, you gain the power to save your company millions of dollars in infrastructure costs and provide a sub-second experience to millions of users.


Phase 4: Deliverables

  • A comparison report of EXPLAIN ANALYZE plans (Before/After Task 1).
  • Refactored SQL code using CTEs and Hash-Joins for Task 2.
  • A "Partial Indexing" strategy document for the orders table.
  • Advanced Goal: A tuned postgresql.conf fragment for high-concurrency autovacuum and WAL performance.
  • Benchmarking results from your Go concurrency script.
  • Extra Credit: Propose a Sharding Key strategy based on the customer_id distribution.
  • Network Audit: Verify that Interrupt Coalescing is enabled on the primary database interface.

Read next: SQL Views and Materialization: The Cached Mirror →


Part of the SQL Mastery Course — engineering the speed.