Can you use a column alias in the DB2 WHERE clause?

No. Column aliases defined in the SELECT list are not visible to the WHERE clause because WHERE is evaluated before SELECT in DB2's logical query processing order. To filter on a computed expression, either repeat the expression in WHERE, or wrap the query in a subquery or CTE and filter in the outer WHERE. This is one of the most common mistakes developers make when writing DB2 queries.

Why does GROUP BY require all SELECT columns to be listed?

SQL (and DB2) requires that every non-aggregated column in the SELECT list also appear in the GROUP BY clause. This is because GROUP BY collapses multiple rows into one group row, so the database must know which single value to display for each column. If a column is not in GROUP BY and not inside an aggregate function, DB2 has no deterministic way to choose which row's value to display, so it raises an error.

What is the performance difference between ROLLUP, CUBE, and GROUPING SETS in DB2?

ROLLUP generates N+1 grouping levels for N columns, producing the fewest extra rows. CUBE generates 2^N combinations, which can be very expensive for more than 3-4 grouping columns. GROUPING SETS is the most efficient when you need only specific combinations — it processes exactly the grouping combinations you specify without computing the rest. For large tables, prefer GROUPING SETS over CUBE unless you genuinely need all cross-dimensional combinations.

DB2 WHERE, GROUP BY, and HAVING: Filtering and Grouping Data

Q: What is the difference between WHERE and HAVING in DB2?

WHERE filters individual rows before any grouping or aggregation takes place. HAVING filters groups after GROUP BY has been applied and aggregate functions have been evaluated. You cannot use an aggregate function (such as SUM or COUNT) in a WHERE clause — aggregate conditions must go in HAVING. Conversely, HAVING can technically filter on non-aggregate columns, but WHERE is more efficient for that purpose because it reduces the rows before aggregation.

Understanding how to correctly filter and group data is a core skill for any DB2 developer or SQL author working on IBM mainframe or DB2 LUW systems. The DB2 GROUP BY and HAVING clauses are among the most powerful — and most frequently misunderstood — features in SQL. Used correctly, they transform raw transactional data into actionable summaries. Used incorrectly, they silently return wrong results or cause hard-to-diagnose errors. This guide walks through every aspect of the WHERE, GROUP BY, and HAVING clauses in DB2, from basic filtering to advanced multidimensional grouping with ROLLUP, CUBE, and GROUPING SETS.

DB2 Logical Query Processing Order

Before diving into syntax, it is essential to understand the order in which DB2 logically processes the clauses of a SELECT statement. This order determines what each clause can and cannot reference:

FROM — identify the source tables and joins
WHERE — filter individual rows
GROUP BY — partition filtered rows into groups
HAVING — filter groups based on aggregate conditions
SELECT — compute output columns and aliases
ORDER BY — sort the final result set

This explains why a WHERE clause cannot reference a SELECT alias (aliases don't exist yet at step 2), and why HAVING can reference aggregate functions (aggregation happens at step 3-4).

The WHERE Clause: Filtering Individual Rows

The WHERE clause evaluates a condition for each row. Only rows for which the condition is TRUE pass through to the next stage. In DB2's three-valued logic, conditions can evaluate to TRUE, FALSE, or UNKNOWN (when NULLs are involved) — only TRUE rows survive a WHERE filter.

Comparison Operators

sql

-- Basic comparisons
SELECT EMPNO, LASTNAME, SALARY, DEPT
FROM   EMPLOYEE
WHERE  SALARY > 60000       -- greater than
  AND  DEPT <> 'D01'        -- not equal (also valid: != )
  AND  HIREDATE >= '2020-01-01';

DB2 supports: =, <> (or !=), >, >=, <, ≤.

BETWEEN, IN, and LIKE

sql

-- BETWEEN (inclusive on both ends)
SELECT EMPNO, SALARY
FROM   EMPLOYEE
WHERE  SALARY BETWEEN 50000 AND 80000;

-- IN: match any value in a list
SELECT EMPNO, LASTNAME
FROM   EMPLOYEE
WHERE  JOB IN ('ANALYST', 'DESIGNER', 'MANAGER');

-- NOT IN: exclude a list (beware of NULLs — see HAVING section)
SELECT EMPNO, LASTNAME
FROM   EMPLOYEE
WHERE  DEPT NOT IN ('D01', 'D02');

-- LIKE: pattern matching with wildcards
-- % matches any sequence of characters
-- _ matches exactly one character
SELECT EMPNO, LASTNAME
FROM   EMPLOYEE
WHERE  LASTNAME LIKE 'SM%'      -- starts with SM
  AND  FIRSTNME LIKE '_A%';     -- second character is A

IS NULL and IS NOT NULL

sql

-- Filter on NULL presence
SELECT EMPNO, LASTNAME, BONUS
FROM   EMPLOYEE
WHERE  BONUS IS NULL;      -- no bonus recorded

SELECT EMPNO, LASTNAME, BONUS
FROM   EMPLOYEE
WHERE  BONUS IS NOT NULL;  -- has a bonus

Never use = NULL in DB2 (or any SQL database). The expression BONUS = NULL always evaluates to UNKNOWN, so the WHERE clause never selects any rows. Always use IS NULL.

EXISTS: Testing Subquery Row Existence

sql

-- Employees who have placed at least one order
SELECT E.EMPNO, E.LASTNAME
FROM   EMPLOYEE E
WHERE  EXISTS (
    SELECT 1
    FROM   ORDERS O
    WHERE  O.SALES_REP = E.EMPNO
);

-- Employees who have placed NO orders
SELECT E.EMPNO, E.LASTNAME
FROM   EMPLOYEE E
WHERE  NOT EXISTS (
    SELECT 1
    FROM   ORDERS O
    WHERE  O.SALES_REP = E.EMPNO
);

EXISTS stops scanning as soon as it finds one matching row, making it efficient for semi-join patterns. NOT EXISTS is generally safer than NOT IN when the subquery might return NULLs (see the common mistakes section).

Logical Operators: AND, OR, NOT

sql

-- AND, OR, NOT with explicit parentheses for clarity
SELECT EMPNO, LASTNAME, SALARY, JOB, DEPT
FROM   EMPLOYEE
WHERE  (JOB = 'ANALYST' OR JOB = 'DESIGNER')
  AND  SALARY > 70000
  AND  NOT DEPT = 'D99';

Always use parentheses when mixing AND and OR. Without them, AND has higher precedence than OR, which often produces logically incorrect results that can be hard to spot.

WHERE with Subqueries

sql

-- Employees earning above the company-wide average salary
SELECT EMPNO, LASTNAME, SALARY
FROM   EMPLOYEE
WHERE  SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE)
ORDER BY SALARY DESC;

-- Employees in departments located in New York
SELECT E.EMPNO, E.LASTNAME
FROM   EMPLOYEE E
WHERE  E.DEPT IN (
    SELECT D.DEPTNO
    FROM   DEPARTMENT D
    WHERE  D.LOCATION = 'NEW YORK'
);

GROUP BY: Grouping Rows for Aggregation

GROUP BY divides the rows that survive the WHERE clause into groups, one per unique combination of the grouping columns. An aggregate function applied after GROUP BY returns one value per group rather than one value for the whole table.

GROUP BY with a Single Column

sql

-- Employee count and total salary per department
SELECT   DEPT,
         COUNT(*)     AS HEADCOUNT,
         SUM(SALARY)  AS TOTAL_SALARY
FROM     EMPLOYEE
GROUP BY DEPT
ORDER BY TOTAL_SALARY DESC;

GROUP BY with Multiple Columns

sql

-- Breakdown by department AND job title
SELECT   DEPT,
         JOB,
         COUNT(*)            AS HEADCOUNT,
         AVG(SALARY)         AS AVG_SALARY,
         MIN(SALARY)         AS MIN_SALARY,
         MAX(SALARY)         AS MAX_SALARY
FROM     EMPLOYEE
GROUP BY DEPT, JOB
ORDER BY DEPT, AVG_SALARY DESC;

Every column in SELECT that is not inside an aggregate function must appear in GROUP BY. If you add LASTNAME to the SELECT above without adding it to GROUP BY, DB2 raises a SQL error.

GROUP BY with Expressions

You can group by an expression — not just a raw column. DB2 evaluates the expression for each row and groups by the result.

sql

-- Group sales by year and quarter derived from a date column
SELECT   YEAR(SALE_DATE)    AS SALE_YEAR,
         QUARTER(SALE_DATE) AS SALE_QTR,
         COUNT(*)           AS ORDER_COUNT,
         SUM(AMOUNT)        AS TOTAL_SALES
FROM     ORDERS
GROUP BY YEAR(SALE_DATE), QUARTER(SALE_DATE)
ORDER BY SALE_YEAR, SALE_QTR;

Note: DB2 does not allow referencing the SELECT alias (SALE_YEAR) in GROUP BY — you must repeat the full expression YEAR(SALE_DATE).

HAVING: Filtering Groups After Aggregation

HAVING is the post-aggregation equivalent of WHERE. It evaluates a condition against each group and discards groups for which the condition is not TRUE. HAVING is where you place conditions involving aggregate functions.

Basic HAVING Usage

sql

-- Only departments with more than 5 employees
-- AND an average salary above 65000
SELECT   DEPT,
         COUNT(*)    AS HEADCOUNT,
         AVG(SALARY) AS AVG_SALARY
FROM     EMPLOYEE
GROUP BY DEPT
HAVING   COUNT(*)    > 5
     AND AVG(SALARY) > 65000
ORDER BY AVG_SALARY DESC;

HAVING Without GROUP BY

HAVING can be used without GROUP BY, treating the entire table as a single group. This is niche but valid:

sql

-- Only return a result if the average salary exceeds 60000
SELECT COUNT(*) AS TOTAL_EMP, AVG(SALARY) AS AVG_SALARY
FROM   EMPLOYEE
HAVING AVG(SALARY) > 60000;

If the condition is false, the query returns no rows at all.

HAVING with Non-Aggregate Conditions

Technically you can put non-aggregate filters in HAVING, but it is always better practice to put them in WHERE for performance — WHERE reduces the rows before aggregation, while HAVING removes groups after the more expensive aggregation step.

sql

-- CORRECT but inefficient: non-aggregate filter in HAVING
SELECT   DEPT, COUNT(*)
FROM     EMPLOYEE
HAVING   DEPT <> 'D99'   -- DB2 will aggregate D99 and then discard it
GROUP BY DEPT;

-- BETTER: non-aggregate filter in WHERE
SELECT   DEPT, COUNT(*)
FROM     EMPLOYEE
WHERE    DEPT <> 'D99'   -- D99 rows excluded before aggregation
GROUP BY DEPT;

WHERE vs HAVING: The Definitive Comparison

Criterion	WHERE	HAVING
Evaluates at stage	Before GROUP BY	After GROUP BY
Can filter on aggregate functions (COUNT, SUM…)	No	Yes
Can filter on individual row columns	Yes	Yes (but use WHERE)
Performance impact	Reduces rows before aggregation (cheaper)	Removes groups after aggregation (costlier)
Can use SELECT column aliases	No	No

Rule of thumb: put every condition that does not involve an aggregate into WHERE. Put every condition that does involve an aggregate into HAVING.

ORDER BY with GROUP BY

ORDER BY sorts the final result and is always the last clause evaluated. You can sort by any column in the SELECT list, including aggregate expressions. On DB2 for z/OS, ORDER BY position numbers (e.g., ORDER BY 3 DESC) are supported but discouraged — they make queries fragile when columns are reordered.

sql

-- Quarterly sales ranked by total revenue, highest first
SELECT   YEAR(SALE_DATE)    AS YR,
         QUARTER(SALE_DATE) AS QTR,
         SUM(AMOUNT)        AS TOTAL,
         COUNT(*)           AS ORDERS
FROM     ORDERS
GROUP BY YEAR(SALE_DATE), QUARTER(SALE_DATE)
ORDER BY TOTAL DESC,    -- alias is valid in ORDER BY (unlike WHERE/HAVING)
         YR   ASC,
         QTR  ASC;

GROUPING SETS: Precise Multi-Level Aggregation

GROUPING SETS allows you to specify multiple grouping specifications in a single query. DB2 computes each grouping set independently and unions the results. This replaces UNION ALL of multiple GROUP BY queries with a single, more efficient pass.

sql

-- Three separate subtotals: by DEPT, by JOB, and the grand total
SELECT   DEPT,
         JOB,
         SUM(SALARY) AS TOTAL_SALARY,
         COUNT(*)    AS HEADCOUNT
FROM     EMPLOYEE
GROUP BY GROUPING SETS (
    (DEPT),      -- subtotal per department
    (JOB),       -- subtotal per job
    ()           -- grand total
)
ORDER BY GROUPING(DEPT), GROUPING(JOB),
         DEPT NULLS LAST, JOB NULLS LAST;

DB2 uses NULL in the result rows to indicate the column is part of the roll-up rather than a genuine NULL in the data. Use the GROUPING() function to distinguish them.

ROLLUP: Hierarchical Subtotals

ROLLUP is a shorthand for generating subtotals at every level of a left-to-right column hierarchy, plus a grand total. For N columns in the ROLLUP, DB2 generates N+1 grouping combinations.

sql

-- Sales hierarchy: Year > Month > Region
SELECT   YEAR(SALE_DATE)   AS SALE_YEAR,
         MONTH(SALE_DATE)  AS SALE_MONTH,
         REGION,
         SUM(AMOUNT)       AS REVENUE,
         COUNT(*)          AS ORDERS
FROM     ORDERS
GROUP BY ROLLUP(
    YEAR(SALE_DATE),
    MONTH(SALE_DATE),
    REGION
)
ORDER BY GROUPING(YEAR(SALE_DATE)),
         GROUPING(MONTH(SALE_DATE)),
         GROUPING(REGION),
         SALE_YEAR NULLS LAST,
         SALE_MONTH NULLS LAST,
         REGION     NULLS LAST;

The result includes:

Year + Month + Region (detail rows)
Year + Month subtotals
Year subtotals
Grand total row (all grouping columns are NULL)

CUBE: All-Combinations Multidimensional Analysis

CUBE generates subtotals for every possible combination of the grouping columns. For three columns, that is 2³ = 8 combinations. It is the SQL equivalent of a full OLAP cross-tabulation.

sql

-- All-dimension analysis: Category × Region × Channel
SELECT   CATEGORY,
         REGION,
         CHANNEL,
         SUM(AMOUNT)  AS TOTAL,
         GROUPING(CATEGORY) AS GRP_CAT,
         GROUPING(REGION)   AS GRP_REG,
         GROUPING(CHANNEL)  AS GRP_CHAN
FROM     SALES
GROUP BY CUBE(CATEGORY, REGION, CHANNEL)
ORDER BY GRP_CAT, GRP_REG, GRP_CHAN,
         CATEGORY NULLS LAST,
         REGION   NULLS LAST,
         CHANNEL  NULLS LAST;

The GROUPING() columns return 1 for subtotal rows and 0 for detail rows, making it easy to apply conditional formatting in a reporting tool.

Common Mistakes and How to Avoid Them

Mistake 1: Using a SELECT Alias in WHERE

sql

-- WRONG: alias ANNUAL_SALARY does not exist during WHERE evaluation
SELECT EMPNO, SALARY * 12 AS ANNUAL_SALARY
FROM   EMPLOYEE
WHERE  ANNUAL_SALARY > 720000;   -- ERROR: ANNUAL_SALARY is unknown

-- CORRECT: repeat the expression, or use a derived table
SELECT EMPNO, ANNUAL_SALARY
FROM (
    SELECT EMPNO, SALARY * 12 AS ANNUAL_SALARY
    FROM   EMPLOYEE
) AS E
WHERE  ANNUAL_SALARY > 720000;

Mistake 2: Putting an Aggregate in WHERE

sql

-- WRONG: aggregate function in WHERE
SELECT DEPT, COUNT(*)
FROM   EMPLOYEE
WHERE  COUNT(*) > 5     -- ERROR: aggregates not allowed in WHERE
GROUP BY DEPT;

-- CORRECT: aggregate condition in HAVING
SELECT DEPT, COUNT(*)
FROM   EMPLOYEE
GROUP BY DEPT
HAVING COUNT(*) > 5;

Mistake 3: NOT IN with a Subquery That Returns NULLs

sql

-- DANGEROUS: if ANY row in the subquery returns NULL,
-- NOT IN evaluates to UNKNOWN for every row, returning zero results
SELECT EMPNO, LASTNAME
FROM   EMPLOYEE
WHERE  DEPT NOT IN (
    SELECT MGR_DEPT FROM MANAGERS   -- MGR_DEPT might contain NULLs
);

-- SAFE: use NOT EXISTS instead
SELECT E.EMPNO, E.LASTNAME
FROM   EMPLOYEE E
WHERE  NOT EXISTS (
    SELECT 1
    FROM   MANAGERS M
    WHERE  M.MGR_DEPT = E.DEPT
);

Mistake 4: Omitting a GROUP BY Column from SELECT

sql

-- If you GROUP BY DEPT and JOB but only select DEPT,
-- the JOB breakdown is computed but silently discarded —
-- you may be aggregating at a finer grain than intended.

-- Always verify your GROUP BY matches your reporting grain.
SELECT   DEPT,
         SUM(SALARY) AS TOTAL   -- JOB missing from SELECT, GROUP BY is DEPT+JOB
FROM     EMPLOYEE
GROUP BY DEPT, JOB;             -- This is legal but unusual; produces one row per DEPT+JOB
                                -- while only showing DEPT in the output

Performance Considerations

Index Usage with WHERE

sql

-- Sargable: index on HIREDATE can be used (range scan)
WHERE HIREDATE BETWEEN '2020-01-01' AND '2022-12-31'

-- Non-sargable: function wrapping the column prevents index use
WHERE YEAR(HIREDATE) BETWEEN 2020 AND 2022

Sargable predicates (Search ARGument ABLE) allow DB2 to use an index to limit the rows scanned. Wrapping a column in a function makes it non-sargable. Rewrite date range predicates as BETWEEN literals for optimal index use.

Stage 1 vs Stage 2 Predicates (DB2 for z/OS)

In DB2 for z/OS, predicates are classified as Stage 1 (applied during the data manager scan, most efficient) or Stage 2 (applied after rows are returned to RDS, less efficient). Simple column comparisons (=, >, BETWEEN, IN) are typically Stage 1. Functions and complex expressions are Stage 2. Keeping your WHERE predicates Stage 1 is critical for mainframe performance.

Avoid SELECT * with GROUP BY

SELECT * is invalid with GROUP BY in most cases (you cannot include all columns in both GROUP BY and aggregates simultaneously). More importantly, selecting only the columns you need reduces the data volume DB2 must sort and return.

RUNSTATS and GROUP BY Plans

DB2's optimizer decides between sort-based and hash-based aggregation based on catalog statistics. After large data loads, run RUNSTATS on the tables and indexes involved in frequent GROUP BY queries to ensure the optimizer has accurate row counts and column cardinality estimates.

Full Example: Sales Performance Report

sql

-- Complete sales performance report combining all concepts
SELECT   R.REGION_NAME,
         YEAR(O.ORDER_DATE)              AS ORDER_YEAR,
         COUNT(*)                        AS TOTAL_ORDERS,
         COUNT(DISTINCT O.CUSTOMER_ID)   AS UNIQUE_CUSTOMERS,
         SUM(O.AMOUNT)                   AS GROSS_REVENUE,
         SUM(O.AMOUNT - O.DISCOUNT)      AS NET_REVENUE,
         AVG(O.AMOUNT)                   AS AVG_ORDER_VALUE,
         MAX(O.AMOUNT)                   AS LARGEST_ORDER
FROM     ORDERS     O
JOIN     REGIONS    R  ON O.REGION_ID = R.REGION_ID
WHERE    O.STATUS   IN ('COMPLETED', 'SHIPPED')
  AND    O.ORDER_DATE BETWEEN '2025-01-01' AND '2026-12-31'
GROUP BY R.REGION_NAME, YEAR(O.ORDER_DATE)
HAVING   SUM(O.AMOUNT) > 100000
     AND COUNT(*)       > 50
ORDER BY ORDER_YEAR, NET_REVENUE DESC;

This query demonstrates every concept in this article: WHERE with IN and BETWEEN, a JOIN, GROUP BY with an expression, multiple aggregates, HAVING with aggregate conditions, and ORDER BY with a mix of column and aggregate expressions.

Related Resources

Frequently Asked Questions

Q: What is the difference between WHERE and HAVING in DB2? WHERE filters individual rows before they are grouped — it cannot reference aggregate functions because aggregation has not happened yet. HAVING filters groups after GROUP BY aggregation and can reference aggregate results: HAVING COUNT(*) > 5. As a performance rule, push as much filtering as possible into WHERE to reduce the number of rows that enter the aggregation phase.

Q: Can you GROUP BY a column alias defined in the SELECT list in DB2? No — DB2 (like most SQL databases) evaluates GROUP BY before SELECT, so column aliases defined in SELECT are not available to GROUP BY. Repeat the full expression: GROUP BY YEAR(order_date) if the SELECT has YEAR(order_date) AS order_year. Alternatively, wrap the query in a subquery and group by the alias in the outer query.

Q: How do you filter on multiple conditions after GROUP BY in DB2? Chain conditions in the HAVING clause with AND/OR: HAVING SUM(amount) > 1000 AND COUNT(*) >= 5. You can also mix aggregate and non-aggregate conditions in HAVING, though non-aggregate conditions are better placed in WHERE for efficiency. For complex post-aggregation filtering, wrap the grouped query in a subquery (or CTE) and filter in the outer WHERE clause where aliases are accessible.