What is a DB2 subquery and when should I use one?

A DB2 subquery is a SELECT statement nested inside another SQL statement (SELECT, INSERT, UPDATE, or DELETE). Use a subquery when you need to filter rows based on a value that must itself be computed from the database — for example, finding all employees who earn more than the average salary. Subqueries are appropriate for single-use logic; if you need to reference the same derived result set more than once in a query, a CTE is usually cleaner.

What is a CTE in DB2 and how does it differ from a subquery?

A CTE (Common Table Expression) in DB2 is defined with the WITH clause before the main SELECT statement. It acts like a named temporary result set that you can reference one or more times in the main query. The key differences from an inline subquery are readability (CTEs are defined once at the top, not buried inside the query), reusability (a CTE can be referenced multiple times in the same statement), and support for recursion. DB2 supports CTEs from version 8 onwards.

Does DB2 support recursive CTEs for hierarchical data?

Yes. DB2 for z/OS and DB2 LUW both support recursive CTEs using the WITH clause. A recursive CTE consists of an anchor member (the base case) combined with a recursive member using UNION ALL. DB2 requires that the recursive reference appears only once in the recursive member and only on one side of the UNION ALL. The recursion automatically terminates when the recursive member returns no more rows. Recursive CTEs are ideal for org charts, bill-of-materials, and tree-structured data.

Are DB2 subqueries in the WHERE clause correlated or uncorrelated by default?

By default a subquery is uncorrelated — it is executed once and its result is used for all rows in the outer query. A correlated subquery references a column from the outer query, which causes it to be re-evaluated for every row processed by the outer query. Correlated subqueries are powerful but can be expensive on large tables. DB2's optimizer often rewrites correlated subqueries as joins internally, but it is worth checking the EXPLAIN output when performance matters.

DB2 Subqueries and CTEs: Complete Guide with Practical Examples

If you work with IBM DB2 — whether on z/OS or LUW — writing a DB2 subquery is something you will do every day. Subqueries let you embed one SELECT inside another, enabling you to filter rows based on derived values, cross-reference tables without explicit joins, and express complex business logic in a single SQL statement. Common Table Expressions (CTEs), introduced with the WITH clause, extend that power further: they let you name a result set, reuse it, and even make it recursive for hierarchical queries.

This guide walks through every major subquery type in DB2, the full syntax of the WITH clause for CTEs, recursive CTE patterns for hierarchical data, performance considerations, and a set of practical examples you can adapt directly to your own work.

Understanding DB2 Subqueries

A subquery is a SELECT statement enclosed in parentheses that appears inside another SQL statement. DB2 supports subqueries in the WHERE clause, the FROM clause (as a derived table), the SELECT list (scalar subquery), and inside INSERT, UPDATE, and DELETE statements.

The outer statement that contains the subquery is called the outer query or parent query. The subquery is also called an inner query.

Scalar Subqueries

A scalar subquery returns exactly one row and one column. You can use it anywhere a single value is expected — inside a SELECT list, a WHERE condition, or even in a SET clause of an UPDATE.

sql

-- Find each employee's salary compared to the company-wide average
SELECT EMPNO,
       LASTNAME,
       SALARY,
       (SELECT AVG(SALARY) FROM DSN8C10.EMP) AS AVG_SALARY,
       SALARY - (SELECT AVG(SALARY) FROM DSN8C10.EMP) AS DIFF_FROM_AVG
FROM   DSN8C10.EMP
ORDER BY SALARY DESC;

If the scalar subquery returns more than one row at runtime, DB2 raises an error (SQLCODE -811). If it returns no rows, it evaluates to NULL. Always ensure the inner query is guaranteed to produce exactly one row, typically by using aggregate functions (AVG, MAX, COUNT, etc.) without a GROUP BY, or by filtering on a unique key.

Subqueries in the WHERE Clause

The most common placement for a DB2 subquery is inside the WHERE clause. DB2 supports four main predicates that work with subqueries here: IN, NOT IN, EXISTS/NOT EXISTS, and the quantified predicates ANY/ALL.

Subquery with IN

IN tests whether the outer column value matches any value in the subquery result set. The subquery must return a single column.

sql

-- Find all employees who work in departments located in New York
SELECT EMPNO, LASTNAME, WORKDEPT
FROM   DSN8C10.EMP
WHERE  WORKDEPT IN (
           SELECT DEPTNO
           FROM   DSN8C10.DEPT
           WHERE  LOCATION = 'NEW YORK'
       );

NOT IN is the inverse — it returns rows where the outer value does not appear in the subquery result. Be careful with NOT IN when the subquery might return NULL values; if any row in the subquery result is NULL, NOT IN returns no rows at all due to three-valued logic. Use NOT EXISTS when nulls are a concern.

Subquery with EXISTS

EXISTS tests whether the subquery returns at least one row. It is typically faster than IN on large result sets because DB2 stops scanning as soon as the first matching row is found.

sql

-- Find departments that have at least one manager assigned
SELECT DEPTNO, DEPTNAME
FROM   DSN8C10.DEPT D
WHERE  EXISTS (
           SELECT 1
           FROM   DSN8C10.EMP E
           WHERE  E.WORKDEPT = D.DEPTNO
           AND    E.JOB       = 'MANAGER'
       );

Notice the SELECT 1 — the actual select list inside an EXISTS subquery is irrelevant because DB2 only checks for row existence, not values. Using SELECT 1 or SELECT * makes no performance difference.

NOT EXISTS

NOT EXISTS is the correct way to find rows in one table that have no matching rows in another — the "anti-join" pattern.

sql

-- Find departments that have NO employees assigned
SELECT DEPTNO, DEPTNAME
FROM   DSN8C10.DEPT D
WHERE  NOT EXISTS (
           SELECT 1
           FROM   DSN8C10.EMP E
           WHERE  E.WORKDEPT = D.DEPTNO
       );

Quantified Predicates: ANY and ALL

ANY (synonym: SOME) returns true if the comparison is true for at least one value in the subquery result. ALL returns true only if the comparison is true for every value in the subquery result.

sql

-- Employees earning more than ANY employee in department A00
SELECT EMPNO, LASTNAME, SALARY
FROM   DSN8C10.EMP
WHERE  SALARY > ANY (
           SELECT SALARY
           FROM   DSN8C10.EMP
           WHERE  WORKDEPT = 'A00'
       );

-- Employees earning more than ALL employees in department A00
-- (i.e., more than the maximum salary in A00)
SELECT EMPNO, LASTNAME, SALARY
FROM   DSN8C10.EMP
WHERE  SALARY > ALL (
           SELECT SALARY
           FROM   DSN8C10.EMP
           WHERE  WORKDEPT = 'A00'
       );

> ALL is equivalent to > MAX(...) and > ANY is equivalent to > MIN(...). The scalar subquery versions with MAX/MIN are often cleaner and equally efficient.

Correlated Subqueries

A correlated subquery references one or more columns from the outer query. This creates a dependency — the subquery cannot run independently; it is re-evaluated for every row the outer query processes.

sql

-- Find employees who earn more than the average salary in their own department
SELECT EMPNO, LASTNAME, WORKDEPT, SALARY
FROM   DSN8C10.EMP E1
WHERE  SALARY > (
           SELECT AVG(SALARY)
           FROM   DSN8C10.EMP E2
           WHERE  E2.WORKDEPT = E1.WORKDEPT   -- correlation: references outer alias E1
       )
ORDER BY WORKDEPT, SALARY DESC;

In this example, the inner subquery computes the average salary for the department of the current outer row (E1.WORKDEPT). DB2's optimizer frequently rewrites correlated subqueries as joins, so the physical execution plan may not actually repeat the inner query for every outer row — but it is good practice to verify this with EXPLAIN.

Correlated Subquery in UPDATE

Correlated subqueries are extremely useful in UPDATE and DELETE statements.

sql

-- Give a 10% raise to the top earner in each department
UPDATE DSN8C10.EMP E1
SET    SALARY = SALARY * 1.10
WHERE  SALARY = (
           SELECT MAX(SALARY)
           FROM   DSN8C10.EMP E2
           WHERE  E2.WORKDEPT = E1.WORKDEPT
       );

Subqueries in the FROM Clause (Derived Tables)

When a subquery appears in the FROM clause, it is called a derived table or inline view. DB2 materialises the subquery result into a temporary result set, then treats it like a regular table for the outer query. You must give it an alias.

sql

-- Derived table: average salary per department, then filter departments
-- where the average is above the company-wide average
SELECT DEPT_AVG.WORKDEPT,
       DEPT_AVG.AVG_SAL
FROM  (
          SELECT WORKDEPT,
                 AVG(SALARY) AS AVG_SAL
          FROM   DSN8C10.EMP
          GROUP BY WORKDEPT
      ) AS DEPT_AVG
WHERE DEPT_AVG.AVG_SAL > (
          SELECT AVG(SALARY)
          FROM   DSN8C10.EMP
      )
ORDER BY DEPT_AVG.AVG_SAL DESC;

Derived tables are particularly useful when you need to apply a WHERE or HAVING filter to an aggregation that cannot be expressed directly in a single query level. They also allow you to reference column aliases defined in the subquery, which is not possible with a simple WHERE clause at the same query level.

Common Table Expressions (CTEs) — The WITH Clause

A Common Table Expression is defined using the WITH keyword before the main SELECT statement. It creates a named result set that lives for the duration of the query. Think of it as giving a name to a derived table so you can reference it more than once, and so you can read the query logic from top to bottom rather than inside-out.

Basic CTE Syntax

sql

WITH cte_name (col1, col2, ...) AS (
    -- subquery that defines the CTE
    SELECT ...
    FROM   ...
    WHERE  ...
)
SELECT col1, col2
FROM   cte_name
WHERE  ...;

The column list after cte_name is optional when the subquery's output columns are already named.

Single CTE Example: Top Earners per Department

sql

WITH DEPT_STATS AS (
    SELECT WORKDEPT,
           MAX(SALARY)   AS MAX_SAL,
           AVG(SALARY)   AS AVG_SAL,
           COUNT(*)      AS HEADCOUNT
    FROM   DSN8C10.EMP
    GROUP BY WORKDEPT
)
SELECT E.EMPNO,
       E.LASTNAME,
       E.WORKDEPT,
       E.SALARY,
       D.MAX_SAL,
       D.AVG_SAL,
       D.HEADCOUNT
FROM   DSN8C10.EMP  E
JOIN   DEPT_STATS   D ON E.WORKDEPT = D.WORKDEPT
WHERE  E.SALARY = D.MAX_SAL
ORDER BY E.WORKDEPT;

This is far more readable than nesting the derived table inline, especially as query complexity grows.

Multiple CTEs in a Single WITH Clause

You can define several CTEs in the same WITH clause, separating them with commas. Later CTEs can reference earlier ones.

sql

WITH
-- CTE 1: department-level salary statistics
DEPT_STATS AS (
    SELECT WORKDEPT,
           AVG(SALARY)  AS AVG_SAL,
           MAX(SALARY)  AS MAX_SAL
    FROM   DSN8C10.EMP
    GROUP BY WORKDEPT
),
-- CTE 2: employees who earn above their department average
HIGH_EARNERS AS (
    SELECT E.EMPNO,
           E.LASTNAME,
           E.WORKDEPT,
           E.SALARY,
           D.AVG_SAL
    FROM   DSN8C10.EMP   E
    JOIN   DEPT_STATS    D ON E.WORKDEPT = D.WORKDEPT
    WHERE  E.SALARY > D.AVG_SAL
),
-- CTE 3: department names joined in
DEPT_NAMES AS (
    SELECT DEPTNO, DEPTNAME
    FROM   DSN8C10.DEPT
)
SELECT H.EMPNO,
       H.LASTNAME,
       N.DEPTNAME,
       H.SALARY,
       H.AVG_SAL,
       DECIMAL(H.SALARY / H.AVG_SAL * 100, 6, 2) AS PCT_OF_AVG
FROM   HIGH_EARNERS  H
JOIN   DEPT_NAMES    N ON H.WORKDEPT = N.DEPTNO
ORDER BY PCT_OF_AVG DESC;

Recursive CTEs in DB2

Recursive CTEs allow a CTE to reference itself, making it possible to traverse hierarchical data such as organisational charts, bill-of-materials structures, or any parent-child relationship stored in a table. DB2 requires that the recursion uses UNION ALL (not UNION), and the recursive member may reference the CTE name only once.

Recursive CTE Structure

sql

WITH RECURSIVE cte_name (col1, col2, ...) AS (
    -- Anchor member: the base case (non-recursive)
    SELECT ...
    FROM   ...
    WHERE  <starting condition>

    UNION ALL

    -- Recursive member: references cte_name
    SELECT ...
    FROM   table
    JOIN   cte_name ON <join condition>
)
SELECT * FROM cte_name;

Note: On DB2 for z/OS, the RECURSIVE keyword is optional — DB2 automatically detects recursion. On DB2 LUW, it is also optional but recommended for clarity.

Org Hierarchy Example

Suppose the DEPT table has a MGRNO column that points to the employee number of the department manager, and a ADMRDEPT column that contains the parent department code. We want to list the full management chain from a given starting department downward.

sql

WITH DEPT_HIERARCHY (DEPTNO, DEPTNAME, ADMRDEPT, LEVEL) AS (
    -- Anchor: start from the top-level department
    SELECT DEPTNO,
           DEPTNAME,
           ADMRDEPT,
           0 AS LEVEL
    FROM   DSN8C10.DEPT
    WHERE  ADMRDEPT = DEPTNO      -- top-level departments report to themselves

    UNION ALL

    -- Recursive: find child departments
    SELECT D.DEPTNO,
           D.DEPTNAME,
           D.ADMRDEPT,
           H.LEVEL + 1
    FROM   DSN8C10.DEPT      D
    JOIN   DEPT_HIERARCHY    H ON D.ADMRDEPT = H.DEPTNO
                                AND D.DEPTNO  <> H.DEPTNO   -- prevent self-join loop
)
SELECT LEVEL,
       REPEAT('  ', LEVEL) || DEPTNO AS INDENTED_DEPT,
       DEPTNAME
FROM   DEPT_HIERARCHY
ORDER BY LEVEL, DEPTNO;

The LEVEL column tracks how deep in the hierarchy each row sits. REPEAT(' ', LEVEL) creates visual indentation in the output. DB2 will terminate the recursion automatically when no more child rows are found.

Running Totals with a Recursive CTE

Recursive CTEs can also compute running totals, though window functions (covered in a related post) are usually more efficient for this:

sql

WITH MONTHLY_SALES (MONTH_NUM, MONTHLY_AMT, RUNNING_TOTAL) AS (
    -- Anchor: first month
    SELECT MONTH_NUM,
           MONTHLY_AMT,
           MONTHLY_AMT AS RUNNING_TOTAL
    FROM   SALES_SUMMARY
    WHERE  MONTH_NUM = 1

    UNION ALL

    -- Recursive: add each subsequent month
    SELECT S.MONTH_NUM,
           S.MONTHLY_AMT,
           R.RUNNING_TOTAL + S.MONTHLY_AMT
    FROM   SALES_SUMMARY  S
    JOIN   MONTHLY_SALES  R ON S.MONTH_NUM = R.MONTH_NUM + 1
)
SELECT MONTH_NUM,
       MONTHLY_AMT,
       RUNNING_TOTAL
FROM   MONTHLY_SALES
ORDER BY MONTH_NUM;

CTE vs Subquery: When to Use Which

Both CTEs and subqueries can express the same logic in most cases. The choice comes down to readability, reusability, and in some situations, performance.

Factor	Subquery	CTE
Readability	Harder to read when deeply nested	Top-down, easy to follow
Reuse in same query	Must be duplicated	Define once, reference many times
Recursion	Not supported	Supported with UNION ALL
Query optimiser	Usually same plan	Usually same plan
Debugging	Hard to isolate	Each CTE can be tested separately
DB2 z/OS compatibility	All versions	DB2 V8 and later

Use a subquery when:

The logic is simple and used only once.
You need an uncorrelated filter that is easy to read inline.
You are on a very old DB2 version that predates CTE support.

Use a CTE when:

You reference the same derived result set multiple times.
The query has multiple layers of aggregation that would be heavily nested.
You need recursion for hierarchical data.
Code maintainability and readability are priorities (production code, audited COBOL programs, etc.).

Performance Considerations

EXPLAIN Is Your Friend

Always run EXPLAIN after writing complex subquery or CTE logic. DB2's optimizer is sophisticated and will often rewrite subqueries as joins internally, but this is not guaranteed. Check the PLANNO, TNAME, and ACCESSTYPE columns in DSN_STATEMNT_TABLE and DSN_FILTER_TABLE to understand how DB2 plans to execute the query.

sql

-- On DB2 for z/OS, populate the PLAN_TABLE:
EXPLAIN PLAN SET QUERYNO = 100 FOR
    WITH TOP_EARNERS AS (
        SELECT WORKDEPT, MAX(SALARY) AS MAX_SAL
        FROM   DSN8C10.EMP
        GROUP BY WORKDEPT
    )
    SELECT E.EMPNO, E.LASTNAME, E.SALARY
    FROM   DSN8C10.EMP E
    JOIN   TOP_EARNERS T ON E.WORKDEPT = T.WORKDEPT
                         AND E.SALARY  = T.MAX_SAL;

-- Review the plan
SELECT PLANNO, TNAME, ACCESSTYPE, MATCHCOLS, INDEXONLY
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY PLANNO;

Correlated Subquery Performance

Correlated subqueries that are not rewritten by the optimizer as joins can execute the inner query once per outer row — potentially millions of executions on large tables. If EXPLAIN shows a nested loop where the inner side is a table scan, consider rewriting the correlated subquery as a join or a CTE.

CTE Materialisation

In DB2, a CTE is not automatically materialised (persisted to a work file) unless DB2 decides materialisation would be beneficial. In most cases, the optimizer treats a CTE as an inline view and pushes predicates into it. If you find that a CTE is being scanned multiple times inefficiently, check the EXPLAIN output and consider whether an intermediate temporary table would help.

Statistics and RUNSTATS

Subquery and CTE performance depends heavily on the accuracy of catalog statistics. Run RUNSTATS on the underlying tables after significant data changes to keep the optimizer's cost estimates accurate.

sql

-- DB2 LUW: gather statistics on the EMP table
RUNSTATS ON TABLE DSN8C10.EMP WITH DISTRIBUTION AND INDEXES ALL;

Practical Example: Department Report with Multiple CTEs

Here is a complete, production-style query that combines multiple CTEs to produce a department salary report:

sql

WITH
-- Step 1: Department-level aggregation
DEPT_SUMMARY AS (
    SELECT WORKDEPT,
           COUNT(*)              AS HEADCOUNT,
           SUM(SALARY)           AS TOTAL_SAL,
           AVG(SALARY)           AS AVG_SAL,
           MIN(SALARY)           AS MIN_SAL,
           MAX(SALARY)           AS MAX_SAL
    FROM   DSN8C10.EMP
    GROUP BY WORKDEPT
),
-- Step 2: Rank employees within their department by salary
RANKED_EMPS AS (
    SELECT E.EMPNO,
           E.LASTNAME,
           E.WORKDEPT,
           E.SALARY,
           E.JOB,
           ROW_NUMBER() OVER (PARTITION BY E.WORKDEPT
                              ORDER BY E.SALARY DESC) AS SALARY_RANK
    FROM   DSN8C10.EMP E
),
-- Step 3: Pull only the top 3 earners per department
TOP3_PER_DEPT AS (
    SELECT EMPNO, LASTNAME, WORKDEPT, SALARY, JOB, SALARY_RANK
    FROM   RANKED_EMPS
    WHERE  SALARY_RANK <= 3
)
-- Final output: join with department details and summary stats
SELECT D.DEPTNAME,
       T.SALARY_RANK,
       T.LASTNAME,
       T.JOB,
       T.SALARY,
       S.AVG_SAL,
       S.HEADCOUNT,
       DECIMAL(T.SALARY / S.TOTAL_SAL * 100, 5, 2) AS PCT_OF_DEPT_PAYROLL
FROM   TOP3_PER_DEPT  T
JOIN   DEPT_SUMMARY   S ON T.WORKDEPT = S.WORKDEPT
JOIN   DSN8C10.DEPT   D ON T.WORKDEPT = D.DEPTNO
ORDER BY D.DEPTNAME, T.SALARY_RANK;

This query is far easier to read, debug, and maintain than an equivalent version with three levels of nested subqueries.

Subqueries in INSERT, UPDATE, and DELETE

Subqueries are not limited to SELECT statements. They appear frequently in data modification statements.

sql

-- INSERT: populate a summary table from a subquery
INSERT INTO DEPT_SALARY_SUMMARY (WORKDEPT, TOTAL_SALARY, UPDATED_DATE)
SELECT WORKDEPT,
       SUM(SALARY),
       CURRENT DATE
FROM   DSN8C10.EMP
GROUP BY WORKDEPT;

-- DELETE: remove employees from departments that have been dissolved
DELETE FROM DSN8C10.EMP
WHERE WORKDEPT IN (
    SELECT DEPTNO
    FROM   DSN8C10.DEPT
    WHERE  STATUS = 'DISSOLVED'
);

Summary

DB2 subqueries and CTEs are essential tools for every DB2 developer:

Scalar subqueries return a single value and can appear in the SELECT list or WHERE clause.
Subqueries with IN/NOT IN test set membership; be cautious of NULL handling with NOT IN.
EXISTS/NOT EXISTS are efficient for existence tests and anti-joins.
ANY/ALL offer quantified comparisons against a set of values.
Correlated subqueries reference the outer query and re-evaluate per row — check EXPLAIN when performance matters.
Derived tables (subquery in FROM) let you layer aggregations without needing temporary tables.
CTEs (WITH clause) name derived result sets, improve readability, support reuse, and enable recursion.
Recursive CTEs traverse hierarchical data using an anchor member and a UNION ALL recursive member.
Always validate complex queries with EXPLAIN and keep RUNSTATS current on underlying tables.

Frequently Asked Questions

Q: What is the difference between a correlated and non-correlated subquery in DB2? A non-correlated subquery executes once and its result is reused for every outer row — for example, WHERE salary > (SELECT AVG(salary) FROM employee). A correlated subquery references a column from the outer query and re-executes for each outer row — for example, WHERE salary > (SELECT AVG(salary) FROM employee e2 WHERE e2.dept = e1.dept). Correlated subqueries are more flexible but can be slower; consider rewriting as a join or CTE when performance is critical.

Q: When should you use a CTE instead of a subquery in DB2? Use a CTE (Common Table Expression, written with WITH name AS (...)) when you need to reference the same derived result more than once, when the logic is complex enough that naming it improves readability, or when you need a recursive query. A subquery is fine for simple one-off derivations. CTEs do not guarantee materialisation in DB2 — the optimiser may inline them — so they are primarily a readability and reuse tool, not a performance hint.

Q: How do you write a recursive CTE in DB2 to traverse a hierarchy? Use WITH RECURSIVE: define an anchor member (base case) and a recursive member joined back to the CTE name. For example, to walk an employee hierarchy: WITH RECURSIVE mgr(emp_id, name, level) AS (SELECT emp_id, name, 1 FROM employee WHERE manager_id IS NULL UNION ALL SELECT e.emp_id, e.name, m.level+1 FROM employee e JOIN mgr m ON e.manager_id = m.emp_id) SELECT * FROM mgr. DB2 repeats the recursive member until no new rows are produced.