DB2 EXPLAIN and Query Optimization: Complete Performance Guide

DB2 EXPLAIN and Query Optimization
Query performance is one of the most critical skills for any DB2 developer or DBA. A slow query in a CICS transaction can bring an entire system to its knees; a missing index on a high-volume batch program can turn a 2-minute job into a 2-hour one. DB2's EXPLAIN facility exposes exactly how the optimizer plans to execute your SQL — and understanding it is the foundation of all performance tuning.
How the DB2 Optimizer Works
The DB2 optimizer is a cost-based optimizer. When you submit a SQL statement, the optimizer:
- Parses and validates the SQL syntax and semantics
- Enumerates possible access plans (which indexes to use, join orders, join methods)
- Estimates the cost of each plan using catalog statistics (row counts, column cardinalities, index clustering)
- Selects the plan with the lowest estimated cost
- Stores the plan in a package or plan (at BIND time) or compiles it dynamically
The quality of the optimizer's decisions depends entirely on the quality of the statistics. Outdated statistics lead to poor estimates, which lead to poor access plans and slow queries.
Setting Up EXPLAIN on DB2 LUW
On DB2 LUW, create the EXPLAIN tables in your schema:
-- Create all explain tables in your current schema
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', NULL, CURRENT SCHEMA);Now run EXPLAIN:
EXPLAIN PLAN FOR
SELECT E.EMPNO, E.LASTNAME, D.DEPTNAME
FROM EMPLOYEE E
JOIN DEPARTMENT D ON E.WORKDEPT = D.DEPTNO
WHERE E.SALARY > 70000
ORDER BY E.SALARY DESC;
-- View the plan
SELECT OPERATOR_TYPE, OBJECT_SCHEMA, OBJECT_NAME, ESTIMATED_CARDINALITY, IO_COST
FROM EXPLAIN_OPERATOR
ORDER BY OPERATOR_ID;Setting Up EXPLAIN on DB2 for z/OS
On z/OS, PLAN_TABLE must exist in your schema before running EXPLAIN:
CREATE TABLE PLAN_TABLE (
QUERYNO INTEGER NOT NULL,
QBLOCKNO SMALLINT NOT NULL,
APPLNAME CHAR(8),
PROGNAME CHAR(8),
PLANNO SMALLINT NOT NULL,
METHOD SMALLINT NOT NULL,
CREATOR CHAR(8),
TNAME CHAR(18),
TABNO SMALLINT NOT NULL,
ACCESSTYPE CHAR(2),
MATCHCOLS SMALLINT NOT NULL,
ACCESSCREATOR CHAR(8),
ACCESSNAME CHAR(18),
PREFETCH CHAR(1),
COLUMN_FN_EVAL CHAR(1),
MIXOPSEQ SMALLINT NOT NULL,
-- ... additional columns
PRIMARY KEY (QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ)
);Then run:
EXPLAIN PLAN SET QUERYNO = 1 FOR
SELECT EMPNO, LASTNAME, SALARY
FROM EMPLOYEE
WHERE WORKDEPT = 'A00'
ORDER BY SALARY DESC;
SELECT TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 1
ORDER BY PLANNO;Reading the PLAN_TABLE
ACCESSTYPE — How DB2 Reaches the Data
| ACCESSTYPE | Meaning | Performance |
|---|---|---|
| I | Index scan | Good — selective queries |
| I1 | One-fetch index scan (unique key) | Excellent |
| N | Index scan, non-matching rows filtered | OK |
| R | Tablespace scan (full scan) | Poor for large tables |
| M | Multiple index access (RID list) | Situational |
MATCHCOLS — Index Match Depth
MATCHCOLS tells you how many leading columns of the index are being used to filter data:
- MATCHCOLS = 0: The index is used but only for ordering, not filtering
- MATCHCOLS = 1: The first index column matches a predicate
- MATCHCOLS = n: The first n columns match predicates
Higher MATCHCOLS generally means fewer rows examined.
PREFETCH — I/O Strategy
| PREFETCH | Meaning |
|---|---|
| S | Sequential prefetch (reads ahead in large blocks) |
| L | List prefetch (collects RIDs first, then fetches in order) |
| D | Dynamic prefetch |
| (blank) | No prefetch |
Common Performance Problems and Fixes
Problem 1: Tablespace Scan on a Large Table
-- Slow: full tablespace scan
SELECT * FROM EMPLOYEE WHERE SALARY > 70000;EXPLAIN shows ACCESSTYPE = 'R'. Fix: create an index on SALARY:
CREATE INDEX EMP_SALARY_IDX ON EMPLOYEE (SALARY);After creating the index and running RUNSTATS, re-EXPLAIN to verify ACCESSTYPE changes to 'I'.
Problem 2: Index Exists but Not Used
If EXPLAIN still shows a tablespace scan after adding an index, possible reasons:
- Query is not selective enough: If 40% of rows qualify, a tablespace scan is faster than an index + data page fetch
- Statistics are outdated: Run RUNSTATS to refresh
- Implicit type conversion:
WHERE EMPNO = 10when EMPNO is CHAR(6) — fix withWHERE EMPNO = '000010' - Function on indexed column:
WHERE YEAR(HIREDATE) = 2020prevents index use — rewrite asWHERE HIREDATE BETWEEN '2020-01-01' AND '2020-12-31'
Problem 3: Sort Required for ORDER BY
When SORTN_JOIN or SORTU_UNIQ is 'Y' in PLAN_TABLE, DB2 is performing an explicit sort. This is expensive for large result sets. Fix: create an index that matches both the filter and the ORDER BY:
-- Query with both filter and sort
SELECT EMPNO, LASTNAME FROM EMPLOYEE
WHERE WORKDEPT = 'A00'
ORDER BY SALARY DESC;
-- Index that avoids the sort (leading column = filter, trailing = sort order)
CREATE INDEX EMP_DEPT_SAL_IDX ON EMPLOYEE (WORKDEPT, SALARY DESC);Problem 4: Cartesian Product (Missing JOIN Condition)
-- Missing JOIN condition produces a Cartesian product
SELECT E.LASTNAME, D.DEPTNAME
FROM EMPLOYEE E, DEPARTMENT D; -- missing WHERE E.WORKDEPT = D.DEPTNOEXPLAIN shows METHOD = 4 (nested loop join with no matching columns). This returns every combination of rows — exponentially more data than intended.
OPTIMIZE FOR n ROWS
Tell DB2 to optimise for fetching the first n rows quickly rather than minimising overall cost:
-- Optimise for interactive display of top 25 results
SELECT EMPNO, LASTNAME, SALARY
FROM EMPLOYEE
ORDER BY SALARY DESC
OPTIMIZE FOR 25 ROWS;This is especially useful in CICS programs where you display the first page of results and users rarely scroll further.
Visual Explain (DB2 LUW)
IBM Data Studio and Db2 Data Management Console provide a Visual Explain feature that renders the access plan as a graphical tree. Each node shows the operation type, estimated cardinality, and cost. This is often faster to interpret than reading the flat PLAN_TABLE output.
To use Visual Explain in IBM Data Studio:
- Right-click on a SQL statement in the SQL Editor
- Select Run Explain
- Switch to the Explain Plan tab to see the graphical view
Explain for Dynamic SQL (RUNSTATS-Aware)
After running RUNSTATS, dynamic SQL queries will use updated statistics on the next execution. For static SQL (embedded in COBOL packages), you must REBIND the package:
//REBIND EXEC PGM=IKJEFT01
//SYSTSIN DD *
DSN SYSTEM(DSN1)
REBIND PACKAGE(EMPPKG.EMPRPT(*)) ACTION(REPLACE)
END
/*DB2 Accounting and Statistics Traces
For production performance problems, EXPLAIN only shows the planned path. The actual runtime behaviour is captured in DB2 traces:
Accounting Trace (Class 1): Per-transaction summary — DB2 CPU, elapsed time, lock waits, I/O counts.
Accounting Trace (Class 3): Includes wait time broken down by lock waits, I/O waits, and DB2 internal waits.
Statistics Trace: System-level — buffer pool hit rates, total lock waits, log I/O.
On z/OS, these traces write to SMF records processed by IBM Omegamon, OMPE (DB2 Performance Expert), or Query Monitor. On DB2 LUW, use the db2pd command or the activity monitor.
Quick Performance Checklist
Before escalating a slow query, work through this checklist:
- Run EXPLAIN and check for tablespace scans on large tables
- Run RUNSTATS if statistics are more than a week old for volatile tables
- Check that predicates on indexed columns use the correct data type (no implicit conversion)
- Verify that functions on indexed columns are not preventing index use
- Check MATCHCOLS — is the leading index column being used?
- For ORDER BY, check if a composite index can eliminate the sort
- For joins, check that JOIN conditions match indexed columns
- For z/OS packages, run REBIND after RUNSTATS to apply new statistics
Next Steps
Query optimisation depends heavily on indexes. Read the DB2 Indexes Guide for a complete treatment of index types, creation strategies, and when indexes hurt rather than help. For the full learning path, visit the DB2 Mastery course hub.
