MainframeDatabaseDB2 Complete Reference

DB2 Subqueries and CTEs: Complete Guide with Practical Examples

TT
TopicTrick Team
DB2 Subqueries and CTEs: Complete Guide with Practical Examples

DB2 Subqueries and CTEs: Complete Guide with Practical Examples

If you work with IBM DB2 — whether on z/OS or LUW — writing a DB2 subquery is something you will do every day. Subqueries let you embed one SELECT inside another, enabling you to filter rows based on derived values, cross-reference tables without explicit joins, and express complex business logic in a single SQL statement. Common Table Expressions (CTEs), introduced with the WITH clause, extend that power further: they let you name a result set, reuse it, and even make it recursive for hierarchical queries.

This guide walks through every major subquery type in DB2, the full syntax of the WITH clause for CTEs, recursive CTE patterns for hierarchical data, performance considerations, and a set of practical examples you can adapt directly to your own work.


Understanding DB2 Subqueries

A subquery is a SELECT statement enclosed in parentheses that appears inside another SQL statement. DB2 supports subqueries in the WHERE clause, the FROM clause (as a derived table), the SELECT list (scalar subquery), and inside INSERT, UPDATE, and DELETE statements.

The outer statement that contains the subquery is called the outer query or parent query. The subquery is also called an inner query.

Scalar Subqueries

A scalar subquery returns exactly one row and one column. You can use it anywhere a single value is expected — inside a SELECT list, a WHERE condition, or even in a SET clause of an UPDATE.

sql
-- Find each employee's salary compared to the company-wide average
SELECT EMPNO,
       LASTNAME,
       SALARY,
       (SELECT AVG(SALARY) FROM DSN8C10.EMP) AS AVG_SALARY,
       SALARY - (SELECT AVG(SALARY) FROM DSN8C10.EMP) AS DIFF_FROM_AVG
FROM   DSN8C10.EMP
ORDER BY SALARY DESC;

If the scalar subquery returns more than one row at runtime, DB2 raises an error (SQLCODE -811). If it returns no rows, it evaluates to NULL. Always ensure the inner query is guaranteed to produce exactly one row, typically by using aggregate functions (AVG, MAX, COUNT, etc.) without a GROUP BY, or by filtering on a unique key.


Subqueries in the WHERE Clause

The most common placement for a DB2 subquery is inside the WHERE clause. DB2 supports four main predicates that work with subqueries here: IN, NOT IN, EXISTS/NOT EXISTS, and the quantified predicates ANY/ALL.

Subquery with IN

IN tests whether the outer column value matches any value in the subquery result set. The subquery must return a single column.

sql
-- Find all employees who work in departments located in New York
SELECT EMPNO, LASTNAME, WORKDEPT
FROM   DSN8C10.EMP
WHERE  WORKDEPT IN (
           SELECT DEPTNO
           FROM   DSN8C10.DEPT
           WHERE  LOCATION = 'NEW YORK'
       );

NOT IN is the inverse — it returns rows where the outer value does not appear in the subquery result. Be careful with NOT IN when the subquery might return NULL values; if any row in the subquery result is NULL, NOT IN returns no rows at all due to three-valued logic. Use NOT EXISTS when nulls are a concern.

Subquery with EXISTS

EXISTS tests whether the subquery returns at least one row. It is typically faster than IN on large result sets because DB2 stops scanning as soon as the first matching row is found.

sql
-- Find departments that have at least one manager assigned
SELECT DEPTNO, DEPTNAME
FROM   DSN8C10.DEPT D
WHERE  EXISTS (
           SELECT 1
           FROM   DSN8C10.EMP E
           WHERE  E.WORKDEPT = D.DEPTNO
           AND    E.JOB       = 'MANAGER'
       );

Notice the SELECT 1 — the actual select list inside an EXISTS subquery is irrelevant because DB2 only checks for row existence, not values. Using SELECT 1 or SELECT * makes no performance difference.

NOT EXISTS

NOT EXISTS is the correct way to find rows in one table that have no matching rows in another — the "anti-join" pattern.

sql
-- Find departments that have NO employees assigned
SELECT DEPTNO, DEPTNAME
FROM   DSN8C10.DEPT D
WHERE  NOT EXISTS (
           SELECT 1
           FROM   DSN8C10.EMP E
           WHERE  E.WORKDEPT = D.DEPTNO
       );

Quantified Predicates: ANY and ALL

ANY (synonym: SOME) returns true if the comparison is true for at least one value in the subquery result. ALL returns true only if the comparison is true for every value in the subquery result.

sql
-- Employees earning more than ANY employee in department A00
SELECT EMPNO, LASTNAME, SALARY
FROM   DSN8C10.EMP
WHERE  SALARY > ANY (
           SELECT SALARY
           FROM   DSN8C10.EMP
           WHERE  WORKDEPT = 'A00'
       );

-- Employees earning more than ALL employees in department A00
-- (i.e., more than the maximum salary in A00)
SELECT EMPNO, LASTNAME, SALARY
FROM   DSN8C10.EMP
WHERE  SALARY > ALL (
           SELECT SALARY
           FROM   DSN8C10.EMP
           WHERE  WORKDEPT = 'A00'
       );

> ALL is equivalent to > MAX(...) and > ANY is equivalent to > MIN(...). The scalar subquery versions with MAX/MIN are often cleaner and equally efficient.


Correlated Subqueries

A correlated subquery references one or more columns from the outer query. This creates a dependency — the subquery cannot run independently; it is re-evaluated for every row the outer query processes.

sql
-- Find employees who earn more than the average salary in their own department
SELECT EMPNO, LASTNAME, WORKDEPT, SALARY
FROM   DSN8C10.EMP E1
WHERE  SALARY > (
           SELECT AVG(SALARY)
           FROM   DSN8C10.EMP E2
           WHERE  E2.WORKDEPT = E1.WORKDEPT   -- correlation: references outer alias E1
       )
ORDER BY WORKDEPT, SALARY DESC;

In this example, the inner subquery computes the average salary for the department of the current outer row (E1.WORKDEPT). DB2's optimizer frequently rewrites correlated subqueries as joins, so the physical execution plan may not actually repeat the inner query for every outer row — but it is good practice to verify this with EXPLAIN.

Correlated Subquery in UPDATE

Correlated subqueries are extremely useful in UPDATE and DELETE statements.

sql
-- Give a 10% raise to the top earner in each department
UPDATE DSN8C10.EMP E1
SET    SALARY = SALARY * 1.10
WHERE  SALARY = (
           SELECT MAX(SALARY)
           FROM   DSN8C10.EMP E2
           WHERE  E2.WORKDEPT = E1.WORKDEPT
       );

Subqueries in the FROM Clause (Derived Tables)

When a subquery appears in the FROM clause, it is called a derived table or inline view. DB2 materialises the subquery result into a temporary result set, then treats it like a regular table for the outer query. You must give it an alias.

sql
-- Derived table: average salary per department, then filter departments
-- where the average is above the company-wide average
SELECT DEPT_AVG.WORKDEPT,
       DEPT_AVG.AVG_SAL
FROM  (
          SELECT WORKDEPT,
                 AVG(SALARY) AS AVG_SAL
          FROM   DSN8C10.EMP
          GROUP BY WORKDEPT
      ) AS DEPT_AVG
WHERE DEPT_AVG.AVG_SAL > (
          SELECT AVG(SALARY)
          FROM   DSN8C10.EMP
      )
ORDER BY DEPT_AVG.AVG_SAL DESC;

Derived tables are particularly useful when you need to apply a WHERE or HAVING filter to an aggregation that cannot be expressed directly in a single query level. They also allow you to reference column aliases defined in the subquery, which is not possible with a simple WHERE clause at the same query level.


Common Table Expressions (CTEs) — The WITH Clause

A Common Table Expression is defined using the WITH keyword before the main SELECT statement. It creates a named result set that lives for the duration of the query. Think of it as giving a name to a derived table so you can reference it more than once, and so you can read the query logic from top to bottom rather than inside-out.

Basic CTE Syntax

sql
WITH cte_name (col1, col2, ...) AS (
    -- subquery that defines the CTE
    SELECT ...
    FROM   ...
    WHERE  ...
)
SELECT col1, col2
FROM   cte_name
WHERE  ...;

The column list after cte_name is optional when the subquery's output columns are already named.

Single CTE Example: Top Earners per Department

sql
WITH DEPT_STATS AS (
    SELECT WORKDEPT,
           MAX(SALARY)   AS MAX_SAL,
           AVG(SALARY)   AS AVG_SAL,
           COUNT(*)      AS HEADCOUNT
    FROM   DSN8C10.EMP
    GROUP BY WORKDEPT
)
SELECT E.EMPNO,
       E.LASTNAME,
       E.WORKDEPT,
       E.SALARY,
       D.MAX_SAL,
       D.AVG_SAL,
       D.HEADCOUNT
FROM   DSN8C10.EMP  E
JOIN   DEPT_STATS   D ON E.WORKDEPT = D.WORKDEPT
WHERE  E.SALARY = D.MAX_SAL
ORDER BY E.WORKDEPT;

This is far more readable than nesting the derived table inline, especially as query complexity grows.

Multiple CTEs in a Single WITH Clause

You can define several CTEs in the same WITH clause, separating them with commas. Later CTEs can reference earlier ones.

sql
WITH
-- CTE 1: department-level salary statistics
DEPT_STATS AS (
    SELECT WORKDEPT,
           AVG(SALARY)  AS AVG_SAL,
           MAX(SALARY)  AS MAX_SAL
    FROM   DSN8C10.EMP
    GROUP BY WORKDEPT
),
-- CTE 2: employees who earn above their department average
HIGH_EARNERS AS (
    SELECT E.EMPNO,
           E.LASTNAME,
           E.WORKDEPT,
           E.SALARY,
           D.AVG_SAL
    FROM   DSN8C10.EMP   E
    JOIN   DEPT_STATS    D ON E.WORKDEPT = D.WORKDEPT
    WHERE  E.SALARY > D.AVG_SAL
),
-- CTE 3: department names joined in
DEPT_NAMES AS (
    SELECT DEPTNO, DEPTNAME
    FROM   DSN8C10.DEPT
)
SELECT H.EMPNO,
       H.LASTNAME,
       N.DEPTNAME,
       H.SALARY,
       H.AVG_SAL,
       DECIMAL(H.SALARY / H.AVG_SAL * 100, 6, 2) AS PCT_OF_AVG
FROM   HIGH_EARNERS  H
JOIN   DEPT_NAMES    N ON H.WORKDEPT = N.DEPTNO
ORDER BY PCT_OF_AVG DESC;

Recursive CTEs in DB2

Recursive CTEs allow a CTE to reference itself, making it possible to traverse hierarchical data such as organisational charts, bill-of-materials structures, or any parent-child relationship stored in a table. DB2 requires that the recursion uses UNION ALL (not UNION), and the recursive member may reference the CTE name only once.

Recursive CTE Structure

sql
WITH RECURSIVE cte_name (col1, col2, ...) AS (
    -- Anchor member: the base case (non-recursive)
    SELECT ...
    FROM   ...
    WHERE  <starting condition>

    UNION ALL

    -- Recursive member: references cte_name
    SELECT ...
    FROM   table
    JOIN   cte_name ON <join condition>
)
SELECT * FROM cte_name;

Note: On DB2 for z/OS, the RECURSIVE keyword is optional — DB2 automatically detects recursion. On DB2 LUW, it is also optional but recommended for clarity.

Org Hierarchy Example

Suppose the DEPT table has a MGRNO column that points to the employee number of the department manager, and a ADMRDEPT column that contains the parent department code. We want to list the full management chain from a given starting department downward.

sql
WITH DEPT_HIERARCHY (DEPTNO, DEPTNAME, ADMRDEPT, LEVEL) AS (
    -- Anchor: start from the top-level department
    SELECT DEPTNO,
           DEPTNAME,
           ADMRDEPT,
           0 AS LEVEL
    FROM   DSN8C10.DEPT
    WHERE  ADMRDEPT = DEPTNO      -- top-level departments report to themselves

    UNION ALL

    -- Recursive: find child departments
    SELECT D.DEPTNO,
           D.DEPTNAME,
           D.ADMRDEPT,
           H.LEVEL + 1
    FROM   DSN8C10.DEPT      D
    JOIN   DEPT_HIERARCHY    H ON D.ADMRDEPT = H.DEPTNO
                                AND D.DEPTNO  <> H.DEPTNO   -- prevent self-join loop
)
SELECT LEVEL,
       REPEAT('  ', LEVEL) || DEPTNO AS INDENTED_DEPT,
       DEPTNAME
FROM   DEPT_HIERARCHY
ORDER BY LEVEL, DEPTNO;

The LEVEL column tracks how deep in the hierarchy each row sits. REPEAT(' ', LEVEL) creates visual indentation in the output. DB2 will terminate the recursion automatically when no more child rows are found.

Running Totals with a Recursive CTE

Recursive CTEs can also compute running totals, though window functions (covered in a related post) are usually more efficient for this:

sql
WITH MONTHLY_SALES (MONTH_NUM, MONTHLY_AMT, RUNNING_TOTAL) AS (
    -- Anchor: first month
    SELECT MONTH_NUM,
           MONTHLY_AMT,
           MONTHLY_AMT AS RUNNING_TOTAL
    FROM   SALES_SUMMARY
    WHERE  MONTH_NUM = 1

    UNION ALL

    -- Recursive: add each subsequent month
    SELECT S.MONTH_NUM,
           S.MONTHLY_AMT,
           R.RUNNING_TOTAL + S.MONTHLY_AMT
    FROM   SALES_SUMMARY  S
    JOIN   MONTHLY_SALES  R ON S.MONTH_NUM = R.MONTH_NUM + 1
)
SELECT MONTH_NUM,
       MONTHLY_AMT,
       RUNNING_TOTAL
FROM   MONTHLY_SALES
ORDER BY MONTH_NUM;

CTE vs Subquery: When to Use Which

Both CTEs and subqueries can express the same logic in most cases. The choice comes down to readability, reusability, and in some situations, performance.

FactorSubqueryCTE
ReadabilityHarder to read when deeply nestedTop-down, easy to follow
Reuse in same queryMust be duplicatedDefine once, reference many times
RecursionNot supportedSupported with UNION ALL
Query optimiserUsually same planUsually same plan
DebuggingHard to isolateEach CTE can be tested separately
DB2 z/OS compatibilityAll versionsDB2 V8 and later

Use a subquery when:

  • The logic is simple and used only once.
  • You need an uncorrelated filter that is easy to read inline.
  • You are on a very old DB2 version that predates CTE support.

Use a CTE when:

  • You reference the same derived result set multiple times.
  • The query has multiple layers of aggregation that would be heavily nested.
  • You need recursion for hierarchical data.
  • Code maintainability and readability are priorities (production code, audited COBOL programs, etc.).

Performance Considerations

EXPLAIN Is Your Friend

Always run EXPLAIN after writing complex subquery or CTE logic. DB2's optimizer is sophisticated and will often rewrite subqueries as joins internally, but this is not guaranteed. Check the PLANNO, TNAME, and ACCESSTYPE columns in DSN_STATEMNT_TABLE and DSN_FILTER_TABLE to understand how DB2 plans to execute the query.

sql
-- On DB2 for z/OS, populate the PLAN_TABLE:
EXPLAIN PLAN SET QUERYNO = 100 FOR
    WITH TOP_EARNERS AS (
        SELECT WORKDEPT, MAX(SALARY) AS MAX_SAL
        FROM   DSN8C10.EMP
        GROUP BY WORKDEPT
    )
    SELECT E.EMPNO, E.LASTNAME, E.SALARY
    FROM   DSN8C10.EMP E
    JOIN   TOP_EARNERS T ON E.WORKDEPT = T.WORKDEPT
                         AND E.SALARY  = T.MAX_SAL;

-- Review the plan
SELECT PLANNO, TNAME, ACCESSTYPE, MATCHCOLS, INDEXONLY
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY PLANNO;

Correlated Subquery Performance

Correlated subqueries that are not rewritten by the optimizer as joins can execute the inner query once per outer row — potentially millions of executions on large tables. If EXPLAIN shows a nested loop where the inner side is a table scan, consider rewriting the correlated subquery as a join or a CTE.

CTE Materialisation

In DB2, a CTE is not automatically materialised (persisted to a work file) unless DB2 decides materialisation would be beneficial. In most cases, the optimizer treats a CTE as an inline view and pushes predicates into it. If you find that a CTE is being scanned multiple times inefficiently, check the EXPLAIN output and consider whether an intermediate temporary table would help.

Statistics and RUNSTATS

Subquery and CTE performance depends heavily on the accuracy of catalog statistics. Run RUNSTATS on the underlying tables after significant data changes to keep the optimizer's cost estimates accurate.

sql
-- DB2 LUW: gather statistics on the EMP table
RUNSTATS ON TABLE DSN8C10.EMP WITH DISTRIBUTION AND INDEXES ALL;

Practical Example: Department Report with Multiple CTEs

Here is a complete, production-style query that combines multiple CTEs to produce a department salary report:

sql
WITH
-- Step 1: Department-level aggregation
DEPT_SUMMARY AS (
    SELECT WORKDEPT,
           COUNT(*)              AS HEADCOUNT,
           SUM(SALARY)           AS TOTAL_SAL,
           AVG(SALARY)           AS AVG_SAL,
           MIN(SALARY)           AS MIN_SAL,
           MAX(SALARY)           AS MAX_SAL
    FROM   DSN8C10.EMP
    GROUP BY WORKDEPT
),
-- Step 2: Rank employees within their department by salary
RANKED_EMPS AS (
    SELECT E.EMPNO,
           E.LASTNAME,
           E.WORKDEPT,
           E.SALARY,
           E.JOB,
           ROW_NUMBER() OVER (PARTITION BY E.WORKDEPT
                              ORDER BY E.SALARY DESC) AS SALARY_RANK
    FROM   DSN8C10.EMP E
),
-- Step 3: Pull only the top 3 earners per department
TOP3_PER_DEPT AS (
    SELECT EMPNO, LASTNAME, WORKDEPT, SALARY, JOB, SALARY_RANK
    FROM   RANKED_EMPS
    WHERE  SALARY_RANK <= 3
)
-- Final output: join with department details and summary stats
SELECT D.DEPTNAME,
       T.SALARY_RANK,
       T.LASTNAME,
       T.JOB,
       T.SALARY,
       S.AVG_SAL,
       S.HEADCOUNT,
       DECIMAL(T.SALARY / S.TOTAL_SAL * 100, 5, 2) AS PCT_OF_DEPT_PAYROLL
FROM   TOP3_PER_DEPT  T
JOIN   DEPT_SUMMARY   S ON T.WORKDEPT = S.WORKDEPT
JOIN   DSN8C10.DEPT   D ON T.WORKDEPT = D.DEPTNO
ORDER BY D.DEPTNAME, T.SALARY_RANK;

This query is far easier to read, debug, and maintain than an equivalent version with three levels of nested subqueries.


Subqueries in INSERT, UPDATE, and DELETE

Subqueries are not limited to SELECT statements. They appear frequently in data modification statements.

sql
-- INSERT: populate a summary table from a subquery
INSERT INTO DEPT_SALARY_SUMMARY (WORKDEPT, TOTAL_SALARY, UPDATED_DATE)
SELECT WORKDEPT,
       SUM(SALARY),
       CURRENT DATE
FROM   DSN8C10.EMP
GROUP BY WORKDEPT;

-- DELETE: remove employees from departments that have been dissolved
DELETE FROM DSN8C10.EMP
WHERE WORKDEPT IN (
    SELECT DEPTNO
    FROM   DSN8C10.DEPT
    WHERE  STATUS = 'DISSOLVED'
);

Summary

DB2 subqueries and CTEs are essential tools for every DB2 developer:

  • Scalar subqueries return a single value and can appear in the SELECT list or WHERE clause.
  • Subqueries with IN/NOT IN test set membership; be cautious of NULL handling with NOT IN.
  • EXISTS/NOT EXISTS are efficient for existence tests and anti-joins.
  • ANY/ALL offer quantified comparisons against a set of values.
  • Correlated subqueries reference the outer query and re-evaluate per row — check EXPLAIN when performance matters.
  • Derived tables (subquery in FROM) let you layer aggregations without needing temporary tables.
  • CTEs (WITH clause) name derived result sets, improve readability, support reuse, and enable recursion.
  • Recursive CTEs traverse hierarchical data using an anchor member and a UNION ALL recursive member.
  • Always validate complex queries with EXPLAIN and keep RUNSTATS current on underlying tables.

Further Reading