MainframeDatabaseDB2 Complete Reference

DB2 Aggregate Functions: COUNT, SUM, AVG, MIN, MAX Complete Guide

TT
TopicTrick Team
DB2 Aggregate Functions: COUNT, SUM, AVG, MIN, MAX Complete Guide

DB2 Aggregate Functions: COUNT, SUM, AVG, MIN, MAX Complete Guide

DB2 aggregate functions are the foundation of every analytical SQL query written against IBM DB2 — whether you are building management reports on a mainframe, running batch analytics in JCL, or querying a DB2 LUW data warehouse. Aggregate functions collapse a set of rows into a single summary value, enabling you to answer questions such as "How many orders were placed last month?", "What is the average salary by department?", or "Which product line generated the highest revenue?" This guide covers every major DB2 aggregate function in depth — from the everyday COUNT and SUM to advanced capabilities like LISTAGG, ROLLUP, and CUBE — with practical SQL examples you can run immediately.


What Are Aggregate Functions in DB2?

An aggregate function (also called a column function in DB2 documentation) takes a set of values from a column across multiple rows and returns a single scalar result. Aggregate functions are used in the SELECT list and, with a filter condition, in the HAVING clause. They are always evaluated after the WHERE clause filters rows and after the GROUP BY clause forms groups.

DB2 follows the SQL standard: all aggregate functions except COUNT(*) ignore NULL values in the input column. This is not a DB2 quirk — it is standard behaviour, but it is one of the most common sources of subtle bugs in reporting queries.


COUNT: Counting Rows and Non-NULL Values

COUNT(*) vs COUNT(column)

This is the most important distinction for any DB2 developer to understand.

sql
-- COUNT(*): counts ALL rows, including rows with NULLs
SELECT COUNT(*)
FROM   EMPLOYEE;

-- COUNT(column): counts only rows where BONUS is NOT NULL
SELECT COUNT(BONUS)
FROM   EMPLOYEE;

-- COUNT(DISTINCT column): counts unique non-null values
SELECT COUNT(DISTINCT DEPT)
FROM   EMPLOYEE;

If the EMPLOYEE table has 500 rows but only 320 employees have a BONUS recorded (the other 180 have NULL), COUNT(*) returns 500 while COUNT(BONUS) returns 320. Getting this wrong in a payroll report produces materially incorrect figures.

COUNT in a GROUP BY

sql
-- Number of employees per department
SELECT   DEPT,
         COUNT(*)        AS TOTAL_EMP,
         COUNT(BONUS)    AS EMP_WITH_BONUS
FROM     EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;

SUM: Totalling Numeric Columns

SUM adds all non-NULL values in a numeric column. If every value in the group is NULL, SUM returns NULL, not zero.

sql
-- Total salary bill by department
SELECT   DEPT,
         SUM(SALARY)                         AS TOTAL_SALARY,
         SUM(SALARY + COALESCE(BONUS, 0))    AS TOTAL_COMPENSATION
FROM     EMPLOYEE
GROUP BY DEPT
ORDER BY TOTAL_SALARY DESC;

Notice the use of COALESCE(BONUS, 0) inside the SUM. Without it, any employee with a NULL bonus causes the entire addition SALARY + BONUS to be NULL, silently dropping that employee's salary from the total. Always handle NULLs explicitly inside expressions fed to aggregate functions.

SUM with DISTINCT

sql
-- Sum of distinct order values (deduplicate duplicate order amounts)
SELECT SUM(DISTINCT ORDER_AMOUNT)
FROM   ORDERS
WHERE  ORDER_DATE >= '2026-01-01';

DISTINCT inside an aggregate is rarely needed in practice but is valid DB2 syntax.


AVG: Calculating Averages

AVG returns the arithmetic mean of non-NULL values. DB2 performs integer division when the column is an integer type — a well-known gotcha.

sql
-- Average salary per job title
-- CAST ensures decimal precision, not integer truncation
SELECT   JOB,
         AVG(SALARY)                             AS AVG_SALARY_INT,
         AVG(CAST(SALARY AS DECIMAL(11,2)))      AS AVG_SALARY_DEC
FROM     EMPLOYEE
GROUP BY JOB
ORDER BY AVG_SALARY_DEC DESC;

If SALARY is INTEGER, AVG(SALARY) in DB2 returns an integer result (truncated). Cast to DECIMAL or FLOAT to get the fractional portion in your reports.


MIN and MAX: Finding Extremes

MIN and MAX work on numeric, character, and date/time columns. They return the lowest and highest non-NULL value in the group.

sql
-- Earliest and latest hire date per department
SELECT   DEPT,
         MIN(HIREDATE)   AS EARLIEST_HIRE,
         MAX(HIREDATE)   AS LATEST_HIRE,
         MAX(HIREDATE) - MIN(HIREDATE) DAYS AS HIRE_SPAN_DAYS
FROM     EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;

MIN and MAX on character columns use the column's collation sequence, which on z/OS is typically EBCDIC. Be aware that EBCDIC sort order differs from ASCII — lowercase letters sort after uppercase in EBCDIC, the reverse of ASCII.


STDDEV and VARIANCE: Statistical Aggregates

DB2 supports population and sample standard deviation and variance. These are used in quality control, financial risk, and data science queries.

sql
-- Salary dispersion analysis per department
SELECT   DEPT,
         AVG(SALARY)        AS MEAN_SALARY,
         STDDEV(SALARY)     AS STD_DEV,      -- sample standard deviation
         VARIANCE(SALARY)   AS VARIANCE_VAL  -- sample variance
FROM     EMPLOYEE
GROUP BY DEPT
HAVING   COUNT(*) > 5       -- only departments with more than 5 employees
ORDER BY STD_DEV DESC;

DB2 also provides STDDEV_POP and VAR_POP for population (not sample) statistics if your group represents the entire population rather than a sample.


LISTAGG: String Aggregation

LISTAGG concatenates values from multiple rows into a single delimited string per group. It is invaluable for reports that need comma-separated lists without writing application-layer string logic.

sql
-- List all employees per department as a comma-separated string
SELECT   DEPT,
         LISTAGG(LASTNAME, ', ')
             WITHIN GROUP (ORDER BY LASTNAME) AS EMPLOYEE_LIST
FROM     EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;

The WITHIN GROUP (ORDER BY ...) clause controls the order of values within the concatenated string. Without it, DB2 does not guarantee order.

LISTAGG with ON OVERFLOW TRUNCATE

On very large groups, the concatenated result can exceed the maximum VARCHAR length. DB2 provides overflow handling:

sql
SELECT   DEPT,
         LISTAGG(LASTNAME, ', ' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT)
             WITHIN GROUP (ORDER BY LASTNAME) AS EMPLOYEE_LIST
FROM     EMPLOYEE
GROUP BY DEPT;

ON OVERFLOW TRUNCATE silently truncates the string and appends the specified suffix rather than raising an error.


GROUP BY: Grouping Rows for Aggregation

GROUP BY partitions the result set into groups, and aggregate functions are applied per group. Every non-aggregate column in the SELECT list must appear in the GROUP BY clause — this is a strict SQL rule that DB2 enforces.

GROUP BY with Multiple Columns

sql
-- Sales summary by year, quarter, and region
SELECT   YEAR(SALE_DATE)     AS SALE_YEAR,
         QUARTER(SALE_DATE)  AS SALE_QTR,
         REGION,
         COUNT(*)            AS ORDER_COUNT,
         SUM(AMOUNT)         AS TOTAL_AMOUNT,
         AVG(AMOUNT)         AS AVG_ORDER_VALUE
FROM     SALES
GROUP BY YEAR(SALE_DATE),
         QUARTER(SALE_DATE),
         REGION
ORDER BY SALE_YEAR, SALE_QTR, REGION;

HAVING: Filtering on Aggregated Results

HAVING filters groups after aggregation, exactly as WHERE filters rows before aggregation. You cannot use a WHERE clause to filter on an aggregate expression — that is a very common mistake.

sql
-- Departments where average salary exceeds 75000
-- and total headcount is at least 10
SELECT   DEPT,
         COUNT(*)        AS HEADCOUNT,
         AVG(SALARY)     AS AVG_SALARY,
         SUM(SALARY)     AS PAYROLL
FROM     EMPLOYEE
GROUP BY DEPT
HAVING   AVG(SALARY) > 75000
     AND COUNT(*)     >= 10
ORDER BY AVG_SALARY DESC;

HAVING vs WHERE: Key Rule

ClauseWhen evaluatedCan reference aggregates?Can reference row-level columns?
WHEREBefore GROUP BYNoYes
HAVINGAfter GROUP BYYesYes (but not aliases)

Aggregates with DISTINCT

Any aggregate function in DB2 can use DISTINCT to operate only on unique values before aggregating.

sql
-- Count distinct customers who placed orders each month
SELECT   YEAR(ORDER_DATE)   AS ORDER_YEAR,
         MONTH(ORDER_DATE)  AS ORDER_MONTH,
         COUNT(DISTINCT CUSTOMER_ID)  AS UNIQUE_CUSTOMERS,
         SUM(AMOUNT)                  AS TOTAL_REVENUE
FROM     ORDERS
GROUP BY YEAR(ORDER_DATE), MONTH(ORDER_DATE)
ORDER BY ORDER_YEAR, ORDER_MONTH;

Aggregates with CASE: Conditional Aggregation

Combining CASE expressions with aggregate functions is one of the most powerful DB2 reporting techniques. It allows you to pivot data, calculate conditional sums, and produce cross-tabulation without needing a separate PIVOT keyword.

sql
-- Pivot: count employees by job type per department
SELECT   DEPT,
         COUNT(CASE WHEN JOB = 'ANALYST'  THEN 1 END)  AS ANALYSTS,
         COUNT(CASE WHEN JOB = 'MANAGER'  THEN 1 END)  AS MANAGERS,
         COUNT(CASE WHEN JOB = 'CLERK'    THEN 1 END)  AS CLERKS,
         SUM(CASE WHEN JOB = 'ANALYST'  THEN SALARY ELSE 0 END) AS ANALYST_PAYROLL,
         SUM(CASE WHEN JOB = 'MANAGER'  THEN SALARY ELSE 0 END) AS MANAGER_PAYROLL
FROM     EMPLOYEE
GROUP BY DEPT
ORDER BY DEPT;

When CASE returns NULL for non-matching rows (no ELSE clause), COUNT naturally ignores those NULLs, which is exactly the desired behaviour for counting matching rows.


Aggregates with NULL Values

Understanding how NULLs interact with aggregates prevents silent data quality issues.

sql
-- Demonstrate NULL behaviour in aggregates
SELECT COUNT(*)           AS TOTAL_ROWS,       -- includes NULL rows
       COUNT(COMMISSION)  AS WITH_COMMISSION,   -- excludes NULLs
       SUM(COMMISSION)    AS TOTAL_COMMISSION,  -- NULLs excluded from sum
       AVG(COMMISSION)    AS AVG_COMMISSION,    -- avg of non-NULLs only
       -- Treat NULL commission as 0 for a different business interpretation
       AVG(COALESCE(COMMISSION, 0)) AS AVG_COMMISSION_INCL_ZERO
FROM   EMPLOYEE;

If 200 of 500 employees earn a commission, AVG(COMMISSION) is the average across those 200 only. AVG(COALESCE(COMMISSION, 0)) spreads the sum across all 500, giving a lower per-employee average. Which one is correct depends entirely on the business question being answered.


Aggregates in Subqueries

Aggregate functions are frequently used in subqueries to filter rows relative to a computed threshold.

sql
-- Employees earning above their department's average salary
SELECT E.EMPNO,
       E.LASTNAME,
       E.DEPT,
       E.SALARY,
       D.AVG_DEPT_SALARY
FROM   EMPLOYEE E
JOIN   (SELECT DEPT,
               AVG(SALARY) AS AVG_DEPT_SALARY
        FROM   EMPLOYEE
        GROUP BY DEPT) D
  ON   E.DEPT = D.DEPT
WHERE  E.SALARY > D.AVG_DEPT_SALARY
ORDER BY E.DEPT, E.SALARY DESC;

This pattern — joining to a derived table that contains aggregates — is generally more efficient in DB2 than a correlated subquery that recalculates the average for every row.


Combining Multiple Aggregates

Real business reports combine many aggregates in one pass, which is far more efficient than running separate queries.

sql
-- Comprehensive order analysis per customer segment
SELECT   SEGMENT,
         COUNT(*)                             AS TOTAL_ORDERS,
         COUNT(DISTINCT CUSTOMER_ID)          AS UNIQUE_CUSTOMERS,
         SUM(ORDER_AMOUNT)                    AS TOTAL_REVENUE,
         AVG(ORDER_AMOUNT)                    AS AVG_ORDER_VALUE,
         MIN(ORDER_AMOUNT)                    AS MIN_ORDER,
         MAX(ORDER_AMOUNT)                    AS MAX_ORDER,
         MAX(ORDER_AMOUNT) - MIN(ORDER_AMOUNT) AS ORDER_RANGE,
         STDDEV(ORDER_AMOUNT)                 AS STDDEV_ORDER,
         SUM(CASE WHEN STATUS = 'RETURNED' THEN ORDER_AMOUNT ELSE 0 END) AS RETURNED_VALUE,
         DECIMAL(
           100.0 * SUM(CASE WHEN STATUS = 'RETURNED' THEN 1 ELSE 0 END)
           / COUNT(*), 5, 2)                  AS RETURN_RATE_PCT
FROM     ORDERS
WHERE    ORDER_DATE BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY SEGMENT
ORDER BY TOTAL_REVENUE DESC;

ROLLUP: Hierarchical Subtotals

ROLLUP is a powerful GROUP BY extension that automatically generates subtotal rows at each level of a column hierarchy, plus a grand total row. It eliminates the need to write multiple queries and UNION ALL them together.

sql
-- Sales hierarchy: Country > Region > City
SELECT   COUNTRY,
         REGION,
         CITY,
         SUM(SALES_AMOUNT)  AS TOTAL_SALES,
         COUNT(*)           AS ORDER_COUNT
FROM     SALES
GROUP BY ROLLUP(COUNTRY, REGION, CITY)
ORDER BY COUNTRY NULLS LAST,
         REGION  NULLS LAST,
         CITY    NULLS LAST;

DB2 generates rows for:

  • Each COUNTRY + REGION + CITY combination (most detailed)
  • Each COUNTRY + REGION subtotal
  • Each COUNTRY subtotal
  • The grand total (all columns NULL in GROUP BY position)

Use GROUPING(column) to distinguish subtotal rows from rows where the column actually contains NULL:

sql
SELECT   CASE GROUPING(COUNTRY) WHEN 1 THEN 'ALL COUNTRIES' ELSE COUNTRY END AS COUNTRY,
         CASE GROUPING(REGION)  WHEN 1 THEN 'ALL REGIONS'   ELSE REGION  END AS REGION,
         SUM(SALES_AMOUNT) AS TOTAL_SALES
FROM     SALES
GROUP BY ROLLUP(COUNTRY, REGION)
ORDER BY GROUPING(COUNTRY), GROUPING(REGION), COUNTRY, REGION;

CUBE: Multidimensional Analysis

CUBE extends ROLLUP by generating subtotals for every possible combination of the grouping columns. For three columns A, B, C, CUBE(A, B, C) produces 2³ = 8 grouping combinations.

sql
-- Cross-tabulation: sales by product category, region, and channel
SELECT   CATEGORY,
         REGION,
         CHANNEL,
         SUM(AMOUNT)  AS TOTAL_SALES
FROM     SALES
GROUP BY CUBE(CATEGORY, REGION, CHANNEL)
ORDER BY GROUPING(CATEGORY),
         GROUPING(REGION),
         GROUPING(CHANNEL),
         CATEGORY NULLS LAST,
         REGION   NULLS LAST,
         CHANNEL  NULLS LAST;

CUBE is ideal for OLAP-style reports where analysts need to slice data by any combination of dimensions. However, it generates significantly more rows than ROLLUP — plan for the extra I/O in large tables.


GROUPING SETS: Precise Control Over Aggregation Levels

GROUPING SETS lets you specify exactly which grouping combinations you want, without generating all the combinations that CUBE would produce. It is the most flexible of the three superaggregate extensions.

sql
-- Produce only: total per department, total per job, and grand total
-- (not the department+job combination)
SELECT   DEPT,
         JOB,
         SUM(SALARY)  AS TOTAL_SALARY,
         COUNT(*)     AS HEADCOUNT
FROM     EMPLOYEE
GROUP BY GROUPING SETS (
    (DEPT),          -- subtotal per department
    (JOB),           -- subtotal per job
    ()               -- grand total
)
ORDER BY GROUPING(DEPT), GROUPING(JOB), DEPT NULLS LAST, JOB NULLS LAST;

GROUPING SETS eliminates redundant computation. If you only need two of the eight combinations that CUBE would generate, use GROUPING SETS and save both CPU and elapsed time.


Practical Business Reporting Examples

Monthly Revenue Trend with Running Total

sql
-- Monthly revenue with month-over-month growth
SELECT   YEAR(SALE_DATE)   AS SALE_YEAR,
         MONTH(SALE_DATE)  AS SALE_MONTH,
         SUM(AMOUNT)       AS MONTHLY_REVENUE,
         SUM(SUM(AMOUNT)) OVER (
             PARTITION BY YEAR(SALE_DATE)
             ORDER BY MONTH(SALE_DATE)
             ROWS UNBOUNDED PRECEDING
         )                 AS YTD_REVENUE
FROM     SALES
WHERE    SALE_DATE >= '2026-01-01'
GROUP BY YEAR(SALE_DATE), MONTH(SALE_DATE)
ORDER BY SALE_YEAR, SALE_MONTH;

This example nests a SUM aggregate inside an OVER clause — a valid DB2 pattern for computing year-to-date running totals alongside monthly figures.

Department Budget vs Actuals

sql
-- Compare actual payroll to budget by department
SELECT   E.DEPT,
         SUM(E.SALARY + COALESCE(E.BONUS, 0))  AS ACTUAL_COST,
         MAX(B.BUDGET_AMOUNT)                   AS BUDGET,
         SUM(E.SALARY + COALESCE(E.BONUS, 0))
             - MAX(B.BUDGET_AMOUNT)             AS VARIANCE,
         DECIMAL(
             100.0 * SUM(E.SALARY + COALESCE(E.BONUS, 0))
             / MAX(B.BUDGET_AMOUNT), 6, 2)      AS PCT_OF_BUDGET
FROM     EMPLOYEE E
JOIN     DEPT_BUDGET B ON E.DEPT = B.DEPT
GROUP BY E.DEPT
HAVING   SUM(E.SALARY + COALESCE(E.BONUS, 0)) > MAX(B.BUDGET_AMOUNT)
ORDER BY VARIANCE DESC;

Performance Tips for DB2 Aggregate Queries

  • Use indexes on GROUP BY columns. DB2 can perform a grouped index scan (GIS) when a matching index exists on the grouping columns, avoiding a sort.
  • Filter early with WHERE. Push row-level filters into the WHERE clause rather than HAVING so DB2 aggregates fewer rows.
  • Avoid functions on indexed columns in WHERE. WHERE YEAR(SALE_DATE) = 2026 prevents index use; prefer WHERE SALE_DATE BETWEEN '2026-01-01' AND '2026-12-31'.
  • Run RUNSTATS after large loads. The DB2 optimizer relies on table statistics to choose between sort-based aggregation and hash aggregation. Stale statistics lead to poor plan choices.
  • Consider materialized query tables (MQTs). For frequently run aggregate reports on large tables, an MQT (DB2's equivalent of a materialized view) pre-computes aggregates and dramatically reduces query elapsed time.

Related Resources