DB2 Window Functions: ROW_NUMBER, RANK, LAG, LEAD Complete Guide

DB2 Window Functions: ROW_NUMBER, RANK, LAG, LEAD Complete Guide
DB2 window functions (also called OLAP functions or analytic functions) are one of the most powerful features of modern SQL. Available in both DB2 for z/OS and DB2 LUW, they allow you to compute rankings, running totals, moving averages, and row-to-row comparisons without resorting to self-joins, correlated subqueries, or procedural cursor loops. Once you understand the OVER clause, you will rewrite dozens of complex queries in far fewer lines.
This guide covers every major DB2 window function with working SQL examples, explains how PARTITION BY, ORDER BY, and frame specifications interact, and shows practical use cases including sales rankings, year-over-year analysis, and gap detection.
What Are Window Functions?
A window function performs a calculation over a set of table rows that are related to the current row. The set of related rows is called the window, and it is defined by the OVER clause that follows every window function call.
The critical distinction from GROUP BY aggregation is that window functions do not collapse rows. Every input row remains in the output, and the window function result appears as an additional column alongside all the original columns.
-- GROUP BY: collapses to one row per department
SELECT WORKDEPT, AVG(SALARY) AS AVG_SAL
FROM DSN8C10.EMP
GROUP BY WORKDEPT;
-- Window function: keeps every row, adds the department average alongside
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
AVG(SALARY) OVER (PARTITION BY WORKDEPT) AS DEPT_AVG_SAL
FROM DSN8C10.EMP;The second query returns one row per employee, with DEPT_AVG_SAL showing the average for that employee's department — something impossible to express cleanly with GROUP BY alone.
The OVER Clause: PARTITION BY and ORDER BY
The OVER clause defines the window (the set of rows) and the ordering within that window. It has three optional components:
function_name(...) OVER (
[PARTITION BY partition_expression, ...]
[ORDER BY sort_expression [ASC | DESC], ...]
[frame_clause]
)PARTITION BY
PARTITION BY divides the result set into independent partitions. The window function resets and is computed independently for each partition. It is analogous to GROUP BY, but without collapsing rows.
If PARTITION BY is omitted, the entire result set is treated as one partition.
ORDER BY Inside OVER
ORDER BY inside the OVER clause determines the logical ordering within each partition for functions that are order-sensitive (ranking functions, LAG/LEAD, running totals). This ORDER BY is completely separate from the ORDER BY at the end of the full SELECT statement.
-- Rank employees by salary within each department
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
RANK() OVER (PARTITION BY WORKDEPT
ORDER BY SALARY DESC) AS SAL_RANK
FROM DSN8C10.EMP
ORDER BY WORKDEPT, SAL_RANK;Frame Specification: ROWS BETWEEN
For aggregate window functions (SUM, AVG, MAX, MIN, COUNT) used with ORDER BY in the OVER clause, DB2 applies a frame that determines which rows within the partition are included in each calculation. The frame clause uses the syntax:
ROWS BETWEEN <start> AND <end>Common frame boundaries:
| Boundary | Meaning |
|---|---|
UNBOUNDED PRECEDING | First row of the partition |
N PRECEDING | N rows before the current row |
CURRENT ROW | The current row |
N FOLLOWING | N rows after the current row |
UNBOUNDED FOLLOWING | Last row of the partition |
-- Running total: accumulate salary from the first row through the current row
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
SUM(SALARY) OVER (PARTITION BY WORKDEPT
ORDER BY HIREDATE
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RUNNING_TOTAL
FROM DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;
-- 3-row moving average (current row + 1 before + 1 after)
SELECT EMPNO,
SALARY,
AVG(SALARY) OVER (PARTITION BY WORKDEPT
ORDER BY HIREDATE
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
AS MOVING_AVG_3
FROM DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;When ORDER BY is present in the OVER clause but no frame clause is specified, DB2 defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for most aggregate functions — which computes a running total or running aggregate. Always specify the frame clause explicitly in production code to avoid ambiguity.
ROW_NUMBER()
ROW_NUMBER() assigns a unique sequential integer to each row within a partition, starting from 1. No ties are possible — even if two rows have identical ORDER BY values, they receive different row numbers (the ordering between them is arbitrary in that case).
-- Assign a sequence number to each employee within their department,
-- ordered by descending salary
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
ROW_NUMBER() OVER (PARTITION BY WORKDEPT
ORDER BY SALARY DESC) AS SEQ_NUM
FROM DSN8C10.EMP
ORDER BY WORKDEPT, SEQ_NUM;Selecting the Top N Rows per Group
ROW_NUMBER() is the standard way to select the top N rows per group in DB2 (because DB2 does not support LIMIT in z/OS SQL or the FETCH FIRST clause inside subqueries in older versions).
-- Select the top 2 earners per department
SELECT EMPNO, LASTNAME, WORKDEPT, SALARY
FROM (
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
ROW_NUMBER() OVER (PARTITION BY WORKDEPT
ORDER BY SALARY DESC) AS RN
FROM DSN8C10.EMP
) AS RANKED
WHERE RN <= 2
ORDER BY WORKDEPT, SALARY DESC;RANK() and DENSE_RANK()
RANK()
RANK() assigns a rank to each row, with ties receiving the same rank. After a tie of N rows at rank K, the next rank assigned is K+N (ranks are skipped).
DENSE_RANK()
DENSE_RANK() also assigns the same rank to ties, but does not skip rank numbers — the next rank after a tie is always the previous rank plus 1.
-- Compare RANK vs DENSE_RANK for salary ranking
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
RANK() OVER (PARTITION BY WORKDEPT ORDER BY SALARY DESC) AS RNKK,
DENSE_RANK() OVER (PARTITION BY WORKDEPT ORDER BY SALARY DESC) AS DENSE_RNK
FROM DSN8C10.EMP
ORDER BY WORKDEPT, SALARY DESC;Example output for one department:
| LASTNAME | SALARY | RNKK | DENSE_RNK |
|---|---|---|---|
| HAAS | 52750 | 1 | 1 |
| THOMPSON | 46500 | 2 | 2 |
| JONES | 46500 | 2 | 2 |
| STERN | 32250 | 4 | 3 |
With RANK, position 3 is skipped because two people share rank 2. With DENSE_RANK, the next rank after the tie is 3.
NTILE()
NTILE(n) divides the rows in each partition into n approximately equal buckets and assigns a bucket number (1 through n) to each row. It is useful for percentile calculations and quartile analysis.
-- Divide employees within each department into salary quartiles
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
NTILE(4) OVER (PARTITION BY WORKDEPT
ORDER BY SALARY ASC) AS SALARY_QUARTILE
FROM DSN8C10.EMP
ORDER BY WORKDEPT, SALARY_QUARTILE, SALARY;Rows are distributed as evenly as possible; if the partition size is not evenly divisible by n, the first partitions receive one extra row.
LAG() and LEAD()
LAG() and LEAD() are among the most practically useful window functions. They allow you to access data from a previous row (LAG) or a following row (LEAD) within the same partition, without requiring a self-join.
Syntax
LAG(expression [, offset [, default]]) OVER (partition_and_order)
LEAD(expression [, offset [, default]]) OVER (partition_and_order)offset: how many rows back (LAG) or forward (LEAD) to look. Defaults to 1.default: value to return when there is no such row (e.g., the first row has no previous row). Defaults toNULL.
Year-over-Year Sales Comparison
-- Compare this year's monthly sales to the same month last year
SELECT SALE_YEAR,
SALE_MONTH,
MONTHLY_AMOUNT,
LAG(MONTHLY_AMOUNT, 12, 0)
OVER (ORDER BY SALE_YEAR, SALE_MONTH) AS SAME_MONTH_LAST_YEAR,
MONTHLY_AMOUNT -
LAG(MONTHLY_AMOUNT, 12, 0)
OVER (ORDER BY SALE_YEAR, SALE_MONTH) AS YOY_CHANGE
FROM MONTHLY_SALES
ORDER BY SALE_YEAR, SALE_MONTH;Row-to-Row Difference: Salary Changes
-- Show salary change between consecutive hires in the same department
SELECT EMPNO,
LASTNAME,
WORKDEPT,
HIREDATE,
SALARY,
LAG(SALARY) OVER (PARTITION BY WORKDEPT
ORDER BY HIREDATE) AS PREV_HIRE_SALARY,
SALARY - LAG(SALARY, 1, SALARY)
OVER (PARTITION BY WORKDEPT
ORDER BY HIREDATE) AS SAL_DIFF
FROM DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;Gap Analysis with LEAD
-- Find gaps in order numbers (missing order IDs)
SELECT ORDER_ID,
LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) AS NEXT_ORDER_ID,
LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) - ORDER_ID - 1 AS GAP_SIZE
FROM ORDERS
WHERE LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) - ORDER_ID > 1;Note: In DB2, you cannot use a window function directly in a WHERE clause. Wrap it in a derived table or CTE:
-- Correct approach: wrap in a derived table
SELECT ORDER_ID, NEXT_ORDER_ID, GAP_SIZE
FROM (
SELECT ORDER_ID,
LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) AS NEXT_ORDER_ID,
LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) - ORDER_ID - 1
AS GAP_SIZE
FROM ORDERS
) AS GAPS
WHERE GAP_SIZE > 0
ORDER BY ORDER_ID;FIRST_VALUE() and LAST_VALUE()
FIRST_VALUE() returns the value of an expression evaluated at the first row of the window frame. LAST_VALUE() returns the value from the last row of the frame.
-- For each employee, show the highest and lowest salary in their department
SELECT EMPNO,
LASTNAME,
WORKDEPT,
SALARY,
FIRST_VALUE(SALARY) OVER (PARTITION BY WORKDEPT
ORDER BY SALARY DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS DEPT_MAX_SAL,
LAST_VALUE(SALARY) OVER (PARTITION BY WORKDEPT
ORDER BY SALARY DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS DEPT_MIN_SAL
FROM DSN8C10.EMP
ORDER BY WORKDEPT, SALARY DESC;The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame is important for LAST_VALUE. Without it, the default frame would only include rows up to the current row, making LAST_VALUE return the current row's own value — rarely what you want.
Running Totals and Cumulative Aggregates
Running totals are a classic use case for window functions. Using SUM() OVER with the appropriate frame clause is far more efficient than a correlated subquery for every row.
-- Cumulative salary spend by hire date within each department
SELECT EMPNO,
LASTNAME,
WORKDEPT,
HIREDATE,
SALARY,
SUM(SALARY) OVER (PARTITION BY WORKDEPT
ORDER BY HIREDATE
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS CUMULATIVE_SALARY,
COUNT(*) OVER (PARTITION BY WORKDEPT
ORDER BY HIREDATE
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS CUMULATIVE_HEADCOUNT
FROM DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;Moving Average
-- 5-period moving average of monthly sales
SELECT SALE_MONTH,
MONTHLY_AMOUNT,
AVG(MONTHLY_AMOUNT)
OVER (ORDER BY SALE_MONTH
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
AS MOVING_AVG_5
FROM MONTHLY_SALES
ORDER BY SALE_MONTH;Window Functions vs GROUP BY
| Scenario | Use GROUP BY | Use Window Function |
|---|---|---|
| Total per group, no row detail needed | Yes | No |
| Total per group AND individual row detail | No | Yes |
| Ranking within a group | No | Yes |
| Access previous/next row value | No | Yes |
| Running or cumulative aggregate | No | Yes |
| Percentage of group total | No | Yes |
| Multiple different aggregations in one pass | Possible | Yes, cleaner |
A common pattern is combining both in the same query — GROUP BY to produce a summary table (or CTE), then window functions on the expanded detail.
Practical Use Case: Sales Rankings and Performance Tiers
WITH SALES_RANKED AS (
SELECT SALESPERSON_ID,
REGION,
SALE_YEAR,
TOTAL_SALES,
RANK() OVER (PARTITION BY REGION, SALE_YEAR
ORDER BY TOTAL_SALES DESC) AS REGION_RANK,
DENSE_RANK() OVER (ORDER BY TOTAL_SALES DESC) AS OVERALL_RANK,
NTILE(4) OVER (PARTITION BY SALE_YEAR
ORDER BY TOTAL_SALES DESC) AS PERFORMANCE_TIER,
SUM(TOTAL_SALES) OVER (PARTITION BY REGION, SALE_YEAR)
AS REGION_TOTAL,
DECIMAL(TOTAL_SALES /
SUM(TOTAL_SALES) OVER (PARTITION BY REGION, SALE_YEAR)
* 100, 5, 2) AS PCT_OF_REGION
FROM SALES_SUMMARY
WHERE SALE_YEAR IN (2025, 2026)
)
SELECT SALESPERSON_ID,
REGION,
SALE_YEAR,
TOTAL_SALES,
REGION_RANK,
OVERALL_RANK,
CASE PERFORMANCE_TIER
WHEN 1 THEN 'TOP 25%'
WHEN 2 THEN 'UPPER MIDDLE'
WHEN 3 THEN 'LOWER MIDDLE'
WHEN 4 THEN 'BOTTOM 25%'
END AS TIER_LABEL,
PCT_OF_REGION
FROM SALES_RANKED
WHERE REGION_RANK <= 5
ORDER BY REGION, SALE_YEAR, REGION_RANK;Performance Considerations
Window Functions Are a Single Pass
A key performance advantage of window functions over equivalent correlated subqueries or self-joins is that DB2 typically processes the data in a single pass. Correlated subqueries for running totals re-scan the table for every row; window functions scan once, sort if necessary, and compute the window aggregate incrementally.
Sorting Cost
Window functions that use ORDER BY in the OVER clause may require sorting. If the underlying data is already sorted (due to an index or a prior sort in the query plan), DB2 can avoid an additional sort step. Include suitable indexes on PARTITION BY and ORDER BY columns to reduce sort overhead.
Multiple Window Functions with Different Partitions
If a query contains several window functions with different PARTITION BY or ORDER BY expressions, DB2 may need to sort the data multiple times. In this case, it can be worth separating the computations into CTEs to give the optimizer more flexibility.
-- Potentially expensive: two different PARTITION BY in one SELECT
SELECT EMPNO,
SALARY,
AVG(SALARY) OVER (PARTITION BY WORKDEPT) AS DEPT_AVG,
AVG(SALARY) OVER (PARTITION BY JOB) AS JOB_AVG
FROM DSN8C10.EMP;Check the EXPLAIN output to see if DB2 performs two sort operations. If performance is critical, separate into CTEs or intermediate result sets.
Use EXPLAIN
Always explain queries with window functions on large tables:
EXPLAIN PLAN SET QUERYNO = 200 FOR
SELECT EMPNO, SALARY,
ROW_NUMBER() OVER (PARTITION BY WORKDEPT ORDER BY SALARY DESC) AS RN
FROM DSN8C10.EMP;
SELECT PLANNO, TNAME, ACCESSTYPE, SORTC_PREF
FROM PLAN_TABLE
WHERE QUERYNO = 200
ORDER BY PLANNO;Summary
DB2 window functions extend SQL with powerful row-by-row analytical capabilities:
- The OVER clause defines the window — use
PARTITION BYto create groups andORDER BYto set the processing order within each group. - Frame clauses (
ROWS BETWEEN) control which rows are included in aggregate calculations. - ROW_NUMBER() assigns unique row numbers — ideal for top-N-per-group queries.
- RANK() and DENSE_RANK() handle ties differently;
DENSE_RANKhas no gaps in numbering. - NTILE(n) divides rows into n equal buckets for percentile analysis.
- LAG() and LEAD() access previous and next rows without self-joins — perfect for YOY comparisons and gap analysis.
- FIRST_VALUE() and LAST_VALUE() return boundary values within the frame.
- SUM() OVER and AVG() OVER with frame clauses produce running totals and moving averages in a single table pass.
- Window functions cannot be used in
WHEREorHAVINGclauses — wrap them in a derived table or CTE.
