DB2 Subqueries and CTEs: Complete Guide with Practical Examples

DB2 Subqueries and CTEs: Complete Guide with Practical Examples
If you work with IBM DB2 — whether on z/OS or LUW — writing a DB2 subquery is something you will do every day. Subqueries let you embed one SELECT inside another, enabling you to filter rows based on derived values, cross-reference tables without explicit joins, and express complex business logic in a single SQL statement. Common Table Expressions (CTEs), introduced with the WITH clause, extend that power further: they let you name a result set, reuse it, and even make it recursive for hierarchical queries.
This guide walks through every major subquery type in DB2, the full syntax of the WITH clause for CTEs, recursive CTE patterns for hierarchical data, performance considerations, and a set of practical examples you can adapt directly to your own work.
Understanding DB2 Subqueries
A subquery is a SELECT statement enclosed in parentheses that appears inside another SQL statement. DB2 supports subqueries in the WHERE clause, the FROM clause (as a derived table), the SELECT list (scalar subquery), and inside INSERT, UPDATE, and DELETE statements.
The outer statement that contains the subquery is called the outer query or parent query. The subquery is also called an inner query.
Scalar Subqueries
A scalar subquery returns exactly one row and one column. You can use it anywhere a single value is expected — inside a SELECT list, a WHERE condition, or even in a SET clause of an UPDATE.
-- Find each employee's salary compared to the company-wide average
SELECT EMPNO,
LASTNAME,
SALARY,
(SELECT AVG(SALARY) FROM DSN8C10.EMP) AS AVG_SALARY,
SALARY - (SELECT AVG(SALARY) FROM DSN8C10.EMP) AS DIFF_FROM_AVG
FROM DSN8C10.EMP
ORDER BY SALARY DESC;If the scalar subquery returns more than one row at runtime, DB2 raises an error (SQLCODE -811). If it returns no rows, it evaluates to NULL. Always ensure the inner query is guaranteed to produce exactly one row, typically by using aggregate functions (AVG, MAX, COUNT, etc.) without a GROUP BY, or by filtering on a unique key.
Subqueries in the WHERE Clause
The most common placement for a DB2 subquery is inside the WHERE clause. DB2 supports four main predicates that work with subqueries here: IN, NOT IN, EXISTS/NOT EXISTS, and the quantified predicates ANY/ALL.
Subquery with IN
IN tests whether the outer column value matches any value in the subquery result set. The subquery must return a single column.
-- Find all employees who work in departments located in New York
SELECT EMPNO, LASTNAME, WORKDEPT
FROM DSN8C10.EMP
WHERE WORKDEPT IN (
SELECT DEPTNO
FROM DSN8C10.DEPT
WHERE LOCATION = 'NEW YORK'
);NOT IN is the inverse — it returns rows where the outer value does not appear in the subquery result. Be careful with NOT IN when the subquery might return NULL values; if any row in the subquery result is NULL, NOT IN returns no rows at all due to three-valued logic. Use NOT EXISTS when nulls are a concern.
Subquery with EXISTS
EXISTS tests whether the subquery returns at least one row. It is typically faster than IN on large result sets because DB2 stops scanning as soon as the first matching row is found.
-- Find departments that have at least one manager assigned
SELECT DEPTNO, DEPTNAME
FROM DSN8C10.DEPT D
WHERE EXISTS (
SELECT 1
FROM DSN8C10.EMP E
WHERE E.WORKDEPT = D.DEPTNO
AND E.JOB = 'MANAGER'
);Notice the SELECT 1 — the actual select list inside an EXISTS subquery is irrelevant because DB2 only checks for row existence, not values. Using SELECT 1 or SELECT * makes no performance difference.
NOT EXISTS
NOT EXISTS is the correct way to find rows in one table that have no matching rows in another — the "anti-join" pattern.
-- Find departments that have NO employees assigned
SELECT DEPTNO, DEPTNAME
FROM DSN8C10.DEPT D
WHERE NOT EXISTS (
SELECT 1
FROM DSN8C10.EMP E
WHERE E.WORKDEPT = D.DEPTNO
);Quantified Predicates: ANY and ALL
ANY (synonym: SOME) returns true if the comparison is true for at least one value in the subquery result. ALL returns true only if the comparison is true for every value in the subquery result.
-- Employees earning more than ANY employee in department A00
SELECT EMPNO, LASTNAME, SALARY
FROM DSN8C10.EMP
WHERE SALARY > ANY (
SELECT SALARY
FROM DSN8C10.EMP
WHERE WORKDEPT = 'A00'
);
-- Employees earning more than ALL employees in department A00
-- (i.e., more than the maximum salary in A00)
SELECT EMPNO, LASTNAME, SALARY
FROM DSN8C10.EMP
WHERE SALARY > ALL (
SELECT SALARY
FROM DSN8C10.EMP
WHERE WORKDEPT = 'A00'
);> ALL is equivalent to > MAX(...) and > ANY is equivalent to > MIN(...). The scalar subquery versions with MAX/MIN are often cleaner and equally efficient.
Correlated Subqueries
A correlated subquery references one or more columns from the outer query. This creates a dependency — the subquery cannot run independently; it is re-evaluated for every row the outer query processes.
-- Find employees who earn more than the average salary in their own department
SELECT EMPNO, LASTNAME, WORKDEPT, SALARY
FROM DSN8C10.EMP E1
WHERE SALARY > (
SELECT AVG(SALARY)
FROM DSN8C10.EMP E2
WHERE E2.WORKDEPT = E1.WORKDEPT -- correlation: references outer alias E1
)
ORDER BY WORKDEPT, SALARY DESC;In this example, the inner subquery computes the average salary for the department of the current outer row (E1.WORKDEPT). DB2's optimizer frequently rewrites correlated subqueries as joins, so the physical execution plan may not actually repeat the inner query for every outer row — but it is good practice to verify this with EXPLAIN.
Correlated Subquery in UPDATE
Correlated subqueries are extremely useful in UPDATE and DELETE statements.
-- Give a 10% raise to the top earner in each department
UPDATE DSN8C10.EMP E1
SET SALARY = SALARY * 1.10
WHERE SALARY = (
SELECT MAX(SALARY)
FROM DSN8C10.EMP E2
WHERE E2.WORKDEPT = E1.WORKDEPT
);Subqueries in the FROM Clause (Derived Tables)
When a subquery appears in the FROM clause, it is called a derived table or inline view. DB2 materialises the subquery result into a temporary result set, then treats it like a regular table for the outer query. You must give it an alias.
-- Derived table: average salary per department, then filter departments
-- where the average is above the company-wide average
SELECT DEPT_AVG.WORKDEPT,
DEPT_AVG.AVG_SAL
FROM (
SELECT WORKDEPT,
AVG(SALARY) AS AVG_SAL
FROM DSN8C10.EMP
GROUP BY WORKDEPT
) AS DEPT_AVG
WHERE DEPT_AVG.AVG_SAL > (
SELECT AVG(SALARY)
FROM DSN8C10.EMP
)
ORDER BY DEPT_AVG.AVG_SAL DESC;Derived tables are particularly useful when you need to apply a WHERE or HAVING filter to an aggregation that cannot be expressed directly in a single query level. They also allow you to reference column aliases defined in the subquery, which is not possible with a simple WHERE clause at the same query level.
Common Table Expressions (CTEs) — The WITH Clause
A Common Table Expression is defined using the WITH keyword before the main SELECT statement. It creates a named result set that lives for the duration of the query. Think of it as giving a name to a derived table so you can reference it more than once, and so you can read the query logic from top to bottom rather than inside-out.
Basic CTE Syntax
WITH cte_name (col1, col2, ...) AS (
-- subquery that defines the CTE
SELECT ...
FROM ...
WHERE ...
)
SELECT col1, col2
FROM cte_name
WHERE ...;The column list after cte_name is optional when the subquery's output columns are already named.
Single CTE Example: Top Earners per Department
WITH DEPT_STATS AS (
SELECT WORKDEPT,
MAX(SALARY) AS MAX_SAL,
AVG(SALARY) AS AVG_SAL,
COUNT(*) AS HEADCOUNT
FROM DSN8C10.EMP
GROUP BY WORKDEPT
)
SELECT E.EMPNO,
E.LASTNAME,
E.WORKDEPT,
E.SALARY,
D.MAX_SAL,
D.AVG_SAL,
D.HEADCOUNT
FROM DSN8C10.EMP E
JOIN DEPT_STATS D ON E.WORKDEPT = D.WORKDEPT
WHERE E.SALARY = D.MAX_SAL
ORDER BY E.WORKDEPT;This is far more readable than nesting the derived table inline, especially as query complexity grows.
Multiple CTEs in a Single WITH Clause
You can define several CTEs in the same WITH clause, separating them with commas. Later CTEs can reference earlier ones.
WITH
-- CTE 1: department-level salary statistics
DEPT_STATS AS (
SELECT WORKDEPT,
AVG(SALARY) AS AVG_SAL,
MAX(SALARY) AS MAX_SAL
FROM DSN8C10.EMP
GROUP BY WORKDEPT
),
-- CTE 2: employees who earn above their department average
HIGH_EARNERS AS (
SELECT E.EMPNO,
E.LASTNAME,
E.WORKDEPT,
E.SALARY,
D.AVG_SAL
FROM DSN8C10.EMP E
JOIN DEPT_STATS D ON E.WORKDEPT = D.WORKDEPT
WHERE E.SALARY > D.AVG_SAL
),
-- CTE 3: department names joined in
DEPT_NAMES AS (
SELECT DEPTNO, DEPTNAME
FROM DSN8C10.DEPT
)
SELECT H.EMPNO,
H.LASTNAME,
N.DEPTNAME,
H.SALARY,
H.AVG_SAL,
DECIMAL(H.SALARY / H.AVG_SAL * 100, 6, 2) AS PCT_OF_AVG
FROM HIGH_EARNERS H
JOIN DEPT_NAMES N ON H.WORKDEPT = N.DEPTNO
ORDER BY PCT_OF_AVG DESC;Recursive CTEs in DB2
Recursive CTEs allow a CTE to reference itself, making it possible to traverse hierarchical data such as organisational charts, bill-of-materials structures, or any parent-child relationship stored in a table. DB2 requires that the recursion uses UNION ALL (not UNION), and the recursive member may reference the CTE name only once.
Recursive CTE Structure
WITH RECURSIVE cte_name (col1, col2, ...) AS (
-- Anchor member: the base case (non-recursive)
SELECT ...
FROM ...
WHERE <starting condition>
UNION ALL
-- Recursive member: references cte_name
SELECT ...
FROM table
JOIN cte_name ON <join condition>
)
SELECT * FROM cte_name;Note: On DB2 for z/OS, the RECURSIVE keyword is optional — DB2 automatically detects recursion. On DB2 LUW, it is also optional but recommended for clarity.
Org Hierarchy Example
Suppose the DEPT table has a MGRNO column that points to the employee number of the department manager, and a ADMRDEPT column that contains the parent department code. We want to list the full management chain from a given starting department downward.
WITH DEPT_HIERARCHY (DEPTNO, DEPTNAME, ADMRDEPT, LEVEL) AS (
-- Anchor: start from the top-level department
SELECT DEPTNO,
DEPTNAME,
ADMRDEPT,
0 AS LEVEL
FROM DSN8C10.DEPT
WHERE ADMRDEPT = DEPTNO -- top-level departments report to themselves
UNION ALL
-- Recursive: find child departments
SELECT D.DEPTNO,
D.DEPTNAME,
D.ADMRDEPT,
H.LEVEL + 1
FROM DSN8C10.DEPT D
JOIN DEPT_HIERARCHY H ON D.ADMRDEPT = H.DEPTNO
AND D.DEPTNO <> H.DEPTNO -- prevent self-join loop
)
SELECT LEVEL,
REPEAT(' ', LEVEL) || DEPTNO AS INDENTED_DEPT,
DEPTNAME
FROM DEPT_HIERARCHY
ORDER BY LEVEL, DEPTNO;The LEVEL column tracks how deep in the hierarchy each row sits. REPEAT(' ', LEVEL) creates visual indentation in the output. DB2 will terminate the recursion automatically when no more child rows are found.
Running Totals with a Recursive CTE
Recursive CTEs can also compute running totals, though window functions (covered in a related post) are usually more efficient for this:
WITH MONTHLY_SALES (MONTH_NUM, MONTHLY_AMT, RUNNING_TOTAL) AS (
-- Anchor: first month
SELECT MONTH_NUM,
MONTHLY_AMT,
MONTHLY_AMT AS RUNNING_TOTAL
FROM SALES_SUMMARY
WHERE MONTH_NUM = 1
UNION ALL
-- Recursive: add each subsequent month
SELECT S.MONTH_NUM,
S.MONTHLY_AMT,
R.RUNNING_TOTAL + S.MONTHLY_AMT
FROM SALES_SUMMARY S
JOIN MONTHLY_SALES R ON S.MONTH_NUM = R.MONTH_NUM + 1
)
SELECT MONTH_NUM,
MONTHLY_AMT,
RUNNING_TOTAL
FROM MONTHLY_SALES
ORDER BY MONTH_NUM;CTE vs Subquery: When to Use Which
Both CTEs and subqueries can express the same logic in most cases. The choice comes down to readability, reusability, and in some situations, performance.
| Factor | Subquery | CTE |
|---|---|---|
| Readability | Harder to read when deeply nested | Top-down, easy to follow |
| Reuse in same query | Must be duplicated | Define once, reference many times |
| Recursion | Not supported | Supported with UNION ALL |
| Query optimiser | Usually same plan | Usually same plan |
| Debugging | Hard to isolate | Each CTE can be tested separately |
| DB2 z/OS compatibility | All versions | DB2 V8 and later |
Use a subquery when:
- The logic is simple and used only once.
- You need an uncorrelated filter that is easy to read inline.
- You are on a very old DB2 version that predates CTE support.
Use a CTE when:
- You reference the same derived result set multiple times.
- The query has multiple layers of aggregation that would be heavily nested.
- You need recursion for hierarchical data.
- Code maintainability and readability are priorities (production code, audited COBOL programs, etc.).
Performance Considerations
EXPLAIN Is Your Friend
Always run EXPLAIN after writing complex subquery or CTE logic. DB2's optimizer is sophisticated and will often rewrite subqueries as joins internally, but this is not guaranteed. Check the PLANNO, TNAME, and ACCESSTYPE columns in DSN_STATEMNT_TABLE and DSN_FILTER_TABLE to understand how DB2 plans to execute the query.
-- On DB2 for z/OS, populate the PLAN_TABLE:
EXPLAIN PLAN SET QUERYNO = 100 FOR
WITH TOP_EARNERS AS (
SELECT WORKDEPT, MAX(SALARY) AS MAX_SAL
FROM DSN8C10.EMP
GROUP BY WORKDEPT
)
SELECT E.EMPNO, E.LASTNAME, E.SALARY
FROM DSN8C10.EMP E
JOIN TOP_EARNERS T ON E.WORKDEPT = T.WORKDEPT
AND E.SALARY = T.MAX_SAL;
-- Review the plan
SELECT PLANNO, TNAME, ACCESSTYPE, MATCHCOLS, INDEXONLY
FROM PLAN_TABLE
WHERE QUERYNO = 100
ORDER BY PLANNO;Correlated Subquery Performance
Correlated subqueries that are not rewritten by the optimizer as joins can execute the inner query once per outer row — potentially millions of executions on large tables. If EXPLAIN shows a nested loop where the inner side is a table scan, consider rewriting the correlated subquery as a join or a CTE.
CTE Materialisation
In DB2, a CTE is not automatically materialised (persisted to a work file) unless DB2 decides materialisation would be beneficial. In most cases, the optimizer treats a CTE as an inline view and pushes predicates into it. If you find that a CTE is being scanned multiple times inefficiently, check the EXPLAIN output and consider whether an intermediate temporary table would help.
Statistics and RUNSTATS
Subquery and CTE performance depends heavily on the accuracy of catalog statistics. Run RUNSTATS on the underlying tables after significant data changes to keep the optimizer's cost estimates accurate.
-- DB2 LUW: gather statistics on the EMP table
RUNSTATS ON TABLE DSN8C10.EMP WITH DISTRIBUTION AND INDEXES ALL;Practical Example: Department Report with Multiple CTEs
Here is a complete, production-style query that combines multiple CTEs to produce a department salary report:
WITH
-- Step 1: Department-level aggregation
DEPT_SUMMARY AS (
SELECT WORKDEPT,
COUNT(*) AS HEADCOUNT,
SUM(SALARY) AS TOTAL_SAL,
AVG(SALARY) AS AVG_SAL,
MIN(SALARY) AS MIN_SAL,
MAX(SALARY) AS MAX_SAL
FROM DSN8C10.EMP
GROUP BY WORKDEPT
),
-- Step 2: Rank employees within their department by salary
RANKED_EMPS AS (
SELECT E.EMPNO,
E.LASTNAME,
E.WORKDEPT,
E.SALARY,
E.JOB,
ROW_NUMBER() OVER (PARTITION BY E.WORKDEPT
ORDER BY E.SALARY DESC) AS SALARY_RANK
FROM DSN8C10.EMP E
),
-- Step 3: Pull only the top 3 earners per department
TOP3_PER_DEPT AS (
SELECT EMPNO, LASTNAME, WORKDEPT, SALARY, JOB, SALARY_RANK
FROM RANKED_EMPS
WHERE SALARY_RANK <= 3
)
-- Final output: join with department details and summary stats
SELECT D.DEPTNAME,
T.SALARY_RANK,
T.LASTNAME,
T.JOB,
T.SALARY,
S.AVG_SAL,
S.HEADCOUNT,
DECIMAL(T.SALARY / S.TOTAL_SAL * 100, 5, 2) AS PCT_OF_DEPT_PAYROLL
FROM TOP3_PER_DEPT T
JOIN DEPT_SUMMARY S ON T.WORKDEPT = S.WORKDEPT
JOIN DSN8C10.DEPT D ON T.WORKDEPT = D.DEPTNO
ORDER BY D.DEPTNAME, T.SALARY_RANK;This query is far easier to read, debug, and maintain than an equivalent version with three levels of nested subqueries.
Subqueries in INSERT, UPDATE, and DELETE
Subqueries are not limited to SELECT statements. They appear frequently in data modification statements.
-- INSERT: populate a summary table from a subquery
INSERT INTO DEPT_SALARY_SUMMARY (WORKDEPT, TOTAL_SALARY, UPDATED_DATE)
SELECT WORKDEPT,
SUM(SALARY),
CURRENT DATE
FROM DSN8C10.EMP
GROUP BY WORKDEPT;
-- DELETE: remove employees from departments that have been dissolved
DELETE FROM DSN8C10.EMP
WHERE WORKDEPT IN (
SELECT DEPTNO
FROM DSN8C10.DEPT
WHERE STATUS = 'DISSOLVED'
);Summary
DB2 subqueries and CTEs are essential tools for every DB2 developer:
- Scalar subqueries return a single value and can appear in the
SELECTlist orWHEREclause. - Subqueries with IN/NOT IN test set membership; be cautious of
NULLhandling withNOT IN. - EXISTS/NOT EXISTS are efficient for existence tests and anti-joins.
- ANY/ALL offer quantified comparisons against a set of values.
- Correlated subqueries reference the outer query and re-evaluate per row — check EXPLAIN when performance matters.
- Derived tables (subquery in
FROM) let you layer aggregations without needing temporary tables. - CTEs (WITH clause) name derived result sets, improve readability, support reuse, and enable recursion.
- Recursive CTEs traverse hierarchical data using an anchor member and a
UNION ALLrecursive member. - Always validate complex queries with EXPLAIN and keep RUNSTATS current on underlying tables.
