DB2 Aggregate Functions: COUNT, SUM, AVG, MIN, MAX Complete Guide

DB2 Aggregate Functions: COUNT, SUM, AVG, MIN, MAX Complete Guide
DB2 aggregate functions are the foundation of every analytical SQL query written against IBM DB2 — whether you are building management reports on a mainframe, running batch analytics in JCL, or querying a DB2 LUW data warehouse. Aggregate functions collapse a set of rows into a single summary value, enabling you to answer questions such as "How many orders were placed last month?", "What is the average salary by department?", or "Which product line generated the highest revenue?" This guide covers every major DB2 aggregate function in depth — from the everyday COUNT and SUM to advanced capabilities like LISTAGG, ROLLUP, and CUBE — with practical SQL examples you can run immediately.
What Are Aggregate Functions in DB2?
An aggregate function (also called a column function in DB2 documentation) takes a set of values from a column across multiple rows and returns a single scalar result. Aggregate functions are used in the SELECT list and, with a filter condition, in the HAVING clause. They are always evaluated after the WHERE clause filters rows and after the GROUP BY clause forms groups.
DB2 follows the SQL standard: all aggregate functions except COUNT(*) ignore NULL values in the input column. This is not a DB2 quirk — it is standard behaviour, but it is one of the most common sources of subtle bugs in reporting queries.
COUNT: Counting Rows and Non-NULL Values
COUNT(*) vs COUNT(column)
This is the most important distinction for any DB2 developer to understand.
-- COUNT(*): counts ALL rows, including rows with NULLs
SELECT COUNT(*)
FROM EMPLOYEE;
-- COUNT(column): counts only rows where BONUS is NOT NULL
SELECT COUNT(BONUS)
FROM EMPLOYEE;
-- COUNT(DISTINCT column): counts unique non-null values
SELECT COUNT(DISTINCT DEPT)
FROM EMPLOYEE;If the EMPLOYEE table has 500 rows but only 320 employees have a BONUS recorded (the other 180 have NULL), COUNT(*) returns 500 while COUNT(BONUS) returns 320. Getting this wrong in a payroll report produces materially incorrect figures.
COUNT in a GROUP BY
-- Number of employees per department
SELECT DEPT,
COUNT(*) AS TOTAL_EMP,
COUNT(BONUS) AS EMP_WITH_BONUS
FROM EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;SUM: Totalling Numeric Columns
SUM adds all non-NULL values in a numeric column. If every value in the group is NULL, SUM returns NULL, not zero.
-- Total salary bill by department
SELECT DEPT,
SUM(SALARY) AS TOTAL_SALARY,
SUM(SALARY + COALESCE(BONUS, 0)) AS TOTAL_COMPENSATION
FROM EMPLOYEE
GROUP BY DEPT
ORDER BY TOTAL_SALARY DESC;Notice the use of COALESCE(BONUS, 0) inside the SUM. Without it, any employee with a NULL bonus causes the entire addition SALARY + BONUS to be NULL, silently dropping that employee's salary from the total. Always handle NULLs explicitly inside expressions fed to aggregate functions.
SUM with DISTINCT
-- Sum of distinct order values (deduplicate duplicate order amounts)
SELECT SUM(DISTINCT ORDER_AMOUNT)
FROM ORDERS
WHERE ORDER_DATE >= '2026-01-01';DISTINCT inside an aggregate is rarely needed in practice but is valid DB2 syntax.
AVG: Calculating Averages
AVG returns the arithmetic mean of non-NULL values. DB2 performs integer division when the column is an integer type — a well-known gotcha.
-- Average salary per job title
-- CAST ensures decimal precision, not integer truncation
SELECT JOB,
AVG(SALARY) AS AVG_SALARY_INT,
AVG(CAST(SALARY AS DECIMAL(11,2))) AS AVG_SALARY_DEC
FROM EMPLOYEE
GROUP BY JOB
ORDER BY AVG_SALARY_DEC DESC;If SALARY is INTEGER, AVG(SALARY) in DB2 returns an integer result (truncated). Cast to DECIMAL or FLOAT to get the fractional portion in your reports.
MIN and MAX: Finding Extremes
MIN and MAX work on numeric, character, and date/time columns. They return the lowest and highest non-NULL value in the group.
-- Earliest and latest hire date per department
SELECT DEPT,
MIN(HIREDATE) AS EARLIEST_HIRE,
MAX(HIREDATE) AS LATEST_HIRE,
MAX(HIREDATE) - MIN(HIREDATE) DAYS AS HIRE_SPAN_DAYS
FROM EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;MIN and MAX on character columns use the column's collation sequence, which on z/OS is typically EBCDIC. Be aware that EBCDIC sort order differs from ASCII — lowercase letters sort after uppercase in EBCDIC, the reverse of ASCII.
STDDEV and VARIANCE: Statistical Aggregates
DB2 supports population and sample standard deviation and variance. These are used in quality control, financial risk, and data science queries.
-- Salary dispersion analysis per department
SELECT DEPT,
AVG(SALARY) AS MEAN_SALARY,
STDDEV(SALARY) AS STD_DEV, -- sample standard deviation
VARIANCE(SALARY) AS VARIANCE_VAL -- sample variance
FROM EMPLOYEE
GROUP BY DEPT
HAVING COUNT(*) > 5 -- only departments with more than 5 employees
ORDER BY STD_DEV DESC;DB2 also provides STDDEV_POP and VAR_POP for population (not sample) statistics if your group represents the entire population rather than a sample.
LISTAGG: String Aggregation
LISTAGG concatenates values from multiple rows into a single delimited string per group. It is invaluable for reports that need comma-separated lists without writing application-layer string logic.
-- List all employees per department as a comma-separated string
SELECT DEPT,
LISTAGG(LASTNAME, ', ')
WITHIN GROUP (ORDER BY LASTNAME) AS EMPLOYEE_LIST
FROM EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;The WITHIN GROUP (ORDER BY ...) clause controls the order of values within the concatenated string. Without it, DB2 does not guarantee order.
LISTAGG with ON OVERFLOW TRUNCATE
On very large groups, the concatenated result can exceed the maximum VARCHAR length. DB2 provides overflow handling:
SELECT DEPT,
LISTAGG(LASTNAME, ', ' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT)
WITHIN GROUP (ORDER BY LASTNAME) AS EMPLOYEE_LIST
FROM EMPLOYEE
GROUP BY DEPT;ON OVERFLOW TRUNCATE silently truncates the string and appends the specified suffix rather than raising an error.
GROUP BY: Grouping Rows for Aggregation
GROUP BY partitions the result set into groups, and aggregate functions are applied per group. Every non-aggregate column in the SELECT list must appear in the GROUP BY clause — this is a strict SQL rule that DB2 enforces.
GROUP BY with Multiple Columns
-- Sales summary by year, quarter, and region
SELECT YEAR(SALE_DATE) AS SALE_YEAR,
QUARTER(SALE_DATE) AS SALE_QTR,
REGION,
COUNT(*) AS ORDER_COUNT,
SUM(AMOUNT) AS TOTAL_AMOUNT,
AVG(AMOUNT) AS AVG_ORDER_VALUE
FROM SALES
GROUP BY YEAR(SALE_DATE),
QUARTER(SALE_DATE),
REGION
ORDER BY SALE_YEAR, SALE_QTR, REGION;HAVING: Filtering on Aggregated Results
HAVING filters groups after aggregation, exactly as WHERE filters rows before aggregation. You cannot use a WHERE clause to filter on an aggregate expression — that is a very common mistake.
-- Departments where average salary exceeds 75000
-- and total headcount is at least 10
SELECT DEPT,
COUNT(*) AS HEADCOUNT,
AVG(SALARY) AS AVG_SALARY,
SUM(SALARY) AS PAYROLL
FROM EMPLOYEE
GROUP BY DEPT
HAVING AVG(SALARY) > 75000
AND COUNT(*) >= 10
ORDER BY AVG_SALARY DESC;HAVING vs WHERE: Key Rule
| Clause | When evaluated | Can reference aggregates? | Can reference row-level columns? |
|---|---|---|---|
| WHERE | Before GROUP BY | No | Yes |
| HAVING | After GROUP BY | Yes | Yes (but not aliases) |
Aggregates with DISTINCT
Any aggregate function in DB2 can use DISTINCT to operate only on unique values before aggregating.
-- Count distinct customers who placed orders each month
SELECT YEAR(ORDER_DATE) AS ORDER_YEAR,
MONTH(ORDER_DATE) AS ORDER_MONTH,
COUNT(DISTINCT CUSTOMER_ID) AS UNIQUE_CUSTOMERS,
SUM(AMOUNT) AS TOTAL_REVENUE
FROM ORDERS
GROUP BY YEAR(ORDER_DATE), MONTH(ORDER_DATE)
ORDER BY ORDER_YEAR, ORDER_MONTH;Aggregates with CASE: Conditional Aggregation
Combining CASE expressions with aggregate functions is one of the most powerful DB2 reporting techniques. It allows you to pivot data, calculate conditional sums, and produce cross-tabulation without needing a separate PIVOT keyword.
-- Pivot: count employees by job type per department
SELECT DEPT,
COUNT(CASE WHEN JOB = 'ANALYST' THEN 1 END) AS ANALYSTS,
COUNT(CASE WHEN JOB = 'MANAGER' THEN 1 END) AS MANAGERS,
COUNT(CASE WHEN JOB = 'CLERK' THEN 1 END) AS CLERKS,
SUM(CASE WHEN JOB = 'ANALYST' THEN SALARY ELSE 0 END) AS ANALYST_PAYROLL,
SUM(CASE WHEN JOB = 'MANAGER' THEN SALARY ELSE 0 END) AS MANAGER_PAYROLL
FROM EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;When CASE returns NULL for non-matching rows (no ELSE clause), COUNT naturally ignores those NULLs, which is exactly the desired behaviour for counting matching rows.
Aggregates with NULL Values
Understanding how NULLs interact with aggregates prevents silent data quality issues.
-- Demonstrate NULL behaviour in aggregates
SELECT COUNT(*) AS TOTAL_ROWS, -- includes NULL rows
COUNT(COMMISSION) AS WITH_COMMISSION, -- excludes NULLs
SUM(COMMISSION) AS TOTAL_COMMISSION, -- NULLs excluded from sum
AVG(COMMISSION) AS AVG_COMMISSION, -- avg of non-NULLs only
-- Treat NULL commission as 0 for a different business interpretation
AVG(COALESCE(COMMISSION, 0)) AS AVG_COMMISSION_INCL_ZERO
FROM EMPLOYEE;If 200 of 500 employees earn a commission, AVG(COMMISSION) is the average across those 200 only. AVG(COALESCE(COMMISSION, 0)) spreads the sum across all 500, giving a lower per-employee average. Which one is correct depends entirely on the business question being answered.
Aggregates in Subqueries
Aggregate functions are frequently used in subqueries to filter rows relative to a computed threshold.
-- Employees earning above their department's average salary
SELECT E.EMPNO,
E.LASTNAME,
E.DEPT,
E.SALARY,
D.AVG_DEPT_SALARY
FROM EMPLOYEE E
JOIN (SELECT DEPT,
AVG(SALARY) AS AVG_DEPT_SALARY
FROM EMPLOYEE
GROUP BY DEPT) D
ON E.DEPT = D.DEPT
WHERE E.SALARY > D.AVG_DEPT_SALARY
ORDER BY E.DEPT, E.SALARY DESC;This pattern — joining to a derived table that contains aggregates — is generally more efficient in DB2 than a correlated subquery that recalculates the average for every row.
Combining Multiple Aggregates
Real business reports combine many aggregates in one pass, which is far more efficient than running separate queries.
-- Comprehensive order analysis per customer segment
SELECT SEGMENT,
COUNT(*) AS TOTAL_ORDERS,
COUNT(DISTINCT CUSTOMER_ID) AS UNIQUE_CUSTOMERS,
SUM(ORDER_AMOUNT) AS TOTAL_REVENUE,
AVG(ORDER_AMOUNT) AS AVG_ORDER_VALUE,
MIN(ORDER_AMOUNT) AS MIN_ORDER,
MAX(ORDER_AMOUNT) AS MAX_ORDER,
MAX(ORDER_AMOUNT) - MIN(ORDER_AMOUNT) AS ORDER_RANGE,
STDDEV(ORDER_AMOUNT) AS STDDEV_ORDER,
SUM(CASE WHEN STATUS = 'RETURNED' THEN ORDER_AMOUNT ELSE 0 END) AS RETURNED_VALUE,
DECIMAL(
100.0 * SUM(CASE WHEN STATUS = 'RETURNED' THEN 1 ELSE 0 END)
/ COUNT(*), 5, 2) AS RETURN_RATE_PCT
FROM ORDERS
WHERE ORDER_DATE BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY SEGMENT
ORDER BY TOTAL_REVENUE DESC;ROLLUP: Hierarchical Subtotals
ROLLUP is a powerful GROUP BY extension that automatically generates subtotal rows at each level of a column hierarchy, plus a grand total row. It eliminates the need to write multiple queries and UNION ALL them together.
-- Sales hierarchy: Country > Region > City
SELECT COUNTRY,
REGION,
CITY,
SUM(SALES_AMOUNT) AS TOTAL_SALES,
COUNT(*) AS ORDER_COUNT
FROM SALES
GROUP BY ROLLUP(COUNTRY, REGION, CITY)
ORDER BY COUNTRY NULLS LAST,
REGION NULLS LAST,
CITY NULLS LAST;DB2 generates rows for:
- Each
COUNTRY + REGION + CITYcombination (most detailed) - Each
COUNTRY + REGIONsubtotal - Each
COUNTRYsubtotal - The grand total (all columns NULL in GROUP BY position)
Use GROUPING(column) to distinguish subtotal rows from rows where the column actually contains NULL:
SELECT CASE GROUPING(COUNTRY) WHEN 1 THEN 'ALL COUNTRIES' ELSE COUNTRY END AS COUNTRY,
CASE GROUPING(REGION) WHEN 1 THEN 'ALL REGIONS' ELSE REGION END AS REGION,
SUM(SALES_AMOUNT) AS TOTAL_SALES
FROM SALES
GROUP BY ROLLUP(COUNTRY, REGION)
ORDER BY GROUPING(COUNTRY), GROUPING(REGION), COUNTRY, REGION;CUBE: Multidimensional Analysis
CUBE extends ROLLUP by generating subtotals for every possible combination of the grouping columns. For three columns A, B, C, CUBE(A, B, C) produces 2³ = 8 grouping combinations.
-- Cross-tabulation: sales by product category, region, and channel
SELECT CATEGORY,
REGION,
CHANNEL,
SUM(AMOUNT) AS TOTAL_SALES
FROM SALES
GROUP BY CUBE(CATEGORY, REGION, CHANNEL)
ORDER BY GROUPING(CATEGORY),
GROUPING(REGION),
GROUPING(CHANNEL),
CATEGORY NULLS LAST,
REGION NULLS LAST,
CHANNEL NULLS LAST;CUBE is ideal for OLAP-style reports where analysts need to slice data by any combination of dimensions. However, it generates significantly more rows than ROLLUP — plan for the extra I/O in large tables.
GROUPING SETS: Precise Control Over Aggregation Levels
GROUPING SETS lets you specify exactly which grouping combinations you want, without generating all the combinations that CUBE would produce. It is the most flexible of the three superaggregate extensions.
-- Produce only: total per department, total per job, and grand total
-- (not the department+job combination)
SELECT DEPT,
JOB,
SUM(SALARY) AS TOTAL_SALARY,
COUNT(*) AS HEADCOUNT
FROM EMPLOYEE
GROUP BY GROUPING SETS (
(DEPT), -- subtotal per department
(JOB), -- subtotal per job
() -- grand total
)
ORDER BY GROUPING(DEPT), GROUPING(JOB), DEPT NULLS LAST, JOB NULLS LAST;GROUPING SETS eliminates redundant computation. If you only need two of the eight combinations that CUBE would generate, use GROUPING SETS and save both CPU and elapsed time.
Practical Business Reporting Examples
Monthly Revenue Trend with Running Total
-- Monthly revenue with month-over-month growth
SELECT YEAR(SALE_DATE) AS SALE_YEAR,
MONTH(SALE_DATE) AS SALE_MONTH,
SUM(AMOUNT) AS MONTHLY_REVENUE,
SUM(SUM(AMOUNT)) OVER (
PARTITION BY YEAR(SALE_DATE)
ORDER BY MONTH(SALE_DATE)
ROWS UNBOUNDED PRECEDING
) AS YTD_REVENUE
FROM SALES
WHERE SALE_DATE >= '2026-01-01'
GROUP BY YEAR(SALE_DATE), MONTH(SALE_DATE)
ORDER BY SALE_YEAR, SALE_MONTH;This example nests a SUM aggregate inside an OVER clause — a valid DB2 pattern for computing year-to-date running totals alongside monthly figures.
Department Budget vs Actuals
-- Compare actual payroll to budget by department
SELECT E.DEPT,
SUM(E.SALARY + COALESCE(E.BONUS, 0)) AS ACTUAL_COST,
MAX(B.BUDGET_AMOUNT) AS BUDGET,
SUM(E.SALARY + COALESCE(E.BONUS, 0))
- MAX(B.BUDGET_AMOUNT) AS VARIANCE,
DECIMAL(
100.0 * SUM(E.SALARY + COALESCE(E.BONUS, 0))
/ MAX(B.BUDGET_AMOUNT), 6, 2) AS PCT_OF_BUDGET
FROM EMPLOYEE E
JOIN DEPT_BUDGET B ON E.DEPT = B.DEPT
GROUP BY E.DEPT
HAVING SUM(E.SALARY + COALESCE(E.BONUS, 0)) > MAX(B.BUDGET_AMOUNT)
ORDER BY VARIANCE DESC;Performance Tips for DB2 Aggregate Queries
- Use indexes on GROUP BY columns. DB2 can perform a grouped index scan (GIS) when a matching index exists on the grouping columns, avoiding a sort.
- Filter early with WHERE. Push row-level filters into the
WHEREclause rather thanHAVINGso DB2 aggregates fewer rows. - Avoid functions on indexed columns in WHERE.
WHERE YEAR(SALE_DATE) = 2026prevents index use; preferWHERE SALE_DATE BETWEEN '2026-01-01' AND '2026-12-31'. - Run RUNSTATS after large loads. The DB2 optimizer relies on table statistics to choose between sort-based aggregation and hash aggregation. Stale statistics lead to poor plan choices.
- Consider materialized query tables (MQTs). For frequently run aggregate reports on large tables, an MQT (DB2's equivalent of a materialized view) pre-computes aggregates and dramatically reduces query elapsed time.
