MainframeDatabaseDB2 Complete Reference

DB2 Window Functions: ROW_NUMBER, RANK, LAG, LEAD Complete Guide

TT
TopicTrick Team
DB2 Window Functions: ROW_NUMBER, RANK, LAG, LEAD Complete Guide

DB2 Window Functions: ROW_NUMBER, RANK, LAG, LEAD Complete Guide

DB2 window functions (also called OLAP functions or analytic functions) are one of the most powerful features of modern SQL. Available in both DB2 for z/OS and DB2 LUW, they allow you to compute rankings, running totals, moving averages, and row-to-row comparisons without resorting to self-joins, correlated subqueries, or procedural cursor loops. Once you understand the OVER clause, you will rewrite dozens of complex queries in far fewer lines.

This guide covers every major DB2 window function with working SQL examples, explains how PARTITION BY, ORDER BY, and frame specifications interact, and shows practical use cases including sales rankings, year-over-year analysis, and gap detection.


What Are Window Functions?

A window function performs a calculation over a set of table rows that are related to the current row. The set of related rows is called the window, and it is defined by the OVER clause that follows every window function call.

The critical distinction from GROUP BY aggregation is that window functions do not collapse rows. Every input row remains in the output, and the window function result appears as an additional column alongside all the original columns.

sql
-- GROUP BY: collapses to one row per department
SELECT WORKDEPT, AVG(SALARY) AS AVG_SAL
FROM   DSN8C10.EMP
GROUP BY WORKDEPT;

-- Window function: keeps every row, adds the department average alongside
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       AVG(SALARY) OVER (PARTITION BY WORKDEPT) AS DEPT_AVG_SAL
FROM   DSN8C10.EMP;

The second query returns one row per employee, with DEPT_AVG_SAL showing the average for that employee's department — something impossible to express cleanly with GROUP BY alone.


The OVER Clause: PARTITION BY and ORDER BY

The OVER clause defines the window (the set of rows) and the ordering within that window. It has three optional components:

text
function_name(...) OVER (
    [PARTITION BY partition_expression, ...]
    [ORDER BY sort_expression [ASC | DESC], ...]
    [frame_clause]
)

PARTITION BY

PARTITION BY divides the result set into independent partitions. The window function resets and is computed independently for each partition. It is analogous to GROUP BY, but without collapsing rows.

If PARTITION BY is omitted, the entire result set is treated as one partition.

ORDER BY Inside OVER

ORDER BY inside the OVER clause determines the logical ordering within each partition for functions that are order-sensitive (ranking functions, LAG/LEAD, running totals). This ORDER BY is completely separate from the ORDER BY at the end of the full SELECT statement.

sql
-- Rank employees by salary within each department
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       RANK() OVER (PARTITION BY WORKDEPT
                    ORDER BY SALARY DESC) AS SAL_RANK
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, SAL_RANK;

Frame Specification: ROWS BETWEEN

For aggregate window functions (SUM, AVG, MAX, MIN, COUNT) used with ORDER BY in the OVER clause, DB2 applies a frame that determines which rows within the partition are included in each calculation. The frame clause uses the syntax:

text
ROWS BETWEEN <start> AND <end>

Common frame boundaries:

BoundaryMeaning
UNBOUNDED PRECEDINGFirst row of the partition
N PRECEDINGN rows before the current row
CURRENT ROWThe current row
N FOLLOWINGN rows after the current row
UNBOUNDED FOLLOWINGLast row of the partition
sql
-- Running total: accumulate salary from the first row through the current row
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       SUM(SALARY) OVER (PARTITION BY WORKDEPT
                         ORDER BY HIREDATE
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS RUNNING_TOTAL
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;

-- 3-row moving average (current row + 1 before + 1 after)
SELECT EMPNO,
       SALARY,
       AVG(SALARY) OVER (PARTITION BY WORKDEPT
                         ORDER BY HIREDATE
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
           AS MOVING_AVG_3
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;

When ORDER BY is present in the OVER clause but no frame clause is specified, DB2 defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for most aggregate functions — which computes a running total or running aggregate. Always specify the frame clause explicitly in production code to avoid ambiguity.


ROW_NUMBER()

ROW_NUMBER() assigns a unique sequential integer to each row within a partition, starting from 1. No ties are possible — even if two rows have identical ORDER BY values, they receive different row numbers (the ordering between them is arbitrary in that case).

sql
-- Assign a sequence number to each employee within their department,
-- ordered by descending salary
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       ROW_NUMBER() OVER (PARTITION BY WORKDEPT
                          ORDER BY SALARY DESC) AS SEQ_NUM
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, SEQ_NUM;

Selecting the Top N Rows per Group

ROW_NUMBER() is the standard way to select the top N rows per group in DB2 (because DB2 does not support LIMIT in z/OS SQL or the FETCH FIRST clause inside subqueries in older versions).

sql
-- Select the top 2 earners per department
SELECT EMPNO, LASTNAME, WORKDEPT, SALARY
FROM  (
          SELECT EMPNO,
                 LASTNAME,
                 WORKDEPT,
                 SALARY,
                 ROW_NUMBER() OVER (PARTITION BY WORKDEPT
                                    ORDER BY SALARY DESC) AS RN
          FROM   DSN8C10.EMP
      ) AS RANKED
WHERE RN <= 2
ORDER BY WORKDEPT, SALARY DESC;

RANK() and DENSE_RANK()

RANK()

RANK() assigns a rank to each row, with ties receiving the same rank. After a tie of N rows at rank K, the next rank assigned is K+N (ranks are skipped).

DENSE_RANK()

DENSE_RANK() also assigns the same rank to ties, but does not skip rank numbers — the next rank after a tie is always the previous rank plus 1.

sql
-- Compare RANK vs DENSE_RANK for salary ranking
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       RANK()       OVER (PARTITION BY WORKDEPT ORDER BY SALARY DESC) AS RNKK,
       DENSE_RANK() OVER (PARTITION BY WORKDEPT ORDER BY SALARY DESC) AS DENSE_RNK
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, SALARY DESC;

Example output for one department:

LASTNAMESALARYRNKKDENSE_RNK
HAAS5275011
THOMPSON4650022
JONES4650022
STERN3225043

With RANK, position 3 is skipped because two people share rank 2. With DENSE_RANK, the next rank after the tie is 3.


NTILE()

NTILE(n) divides the rows in each partition into n approximately equal buckets and assigns a bucket number (1 through n) to each row. It is useful for percentile calculations and quartile analysis.

sql
-- Divide employees within each department into salary quartiles
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       NTILE(4) OVER (PARTITION BY WORKDEPT
                      ORDER BY SALARY ASC) AS SALARY_QUARTILE
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, SALARY_QUARTILE, SALARY;

Rows are distributed as evenly as possible; if the partition size is not evenly divisible by n, the first partitions receive one extra row.


LAG() and LEAD()

LAG() and LEAD() are among the most practically useful window functions. They allow you to access data from a previous row (LAG) or a following row (LEAD) within the same partition, without requiring a self-join.

Syntax

sql
LAG(expression  [, offset [, default]]) OVER (partition_and_order)
LEAD(expression [, offset [, default]]) OVER (partition_and_order)
  • offset: how many rows back (LAG) or forward (LEAD) to look. Defaults to 1.
  • default: value to return when there is no such row (e.g., the first row has no previous row). Defaults to NULL.

Year-over-Year Sales Comparison

sql
-- Compare this year's monthly sales to the same month last year
SELECT SALE_YEAR,
       SALE_MONTH,
       MONTHLY_AMOUNT,
       LAG(MONTHLY_AMOUNT, 12, 0)
           OVER (ORDER BY SALE_YEAR, SALE_MONTH) AS SAME_MONTH_LAST_YEAR,
       MONTHLY_AMOUNT -
           LAG(MONTHLY_AMOUNT, 12, 0)
               OVER (ORDER BY SALE_YEAR, SALE_MONTH) AS YOY_CHANGE
FROM   MONTHLY_SALES
ORDER BY SALE_YEAR, SALE_MONTH;

Row-to-Row Difference: Salary Changes

sql
-- Show salary change between consecutive hires in the same department
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       HIREDATE,
       SALARY,
       LAG(SALARY) OVER (PARTITION BY WORKDEPT
                         ORDER BY HIREDATE)       AS PREV_HIRE_SALARY,
       SALARY - LAG(SALARY, 1, SALARY)
                    OVER (PARTITION BY WORKDEPT
                          ORDER BY HIREDATE)      AS SAL_DIFF
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;

Gap Analysis with LEAD

sql
-- Find gaps in order numbers (missing order IDs)
SELECT ORDER_ID,
       LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) AS NEXT_ORDER_ID,
       LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) - ORDER_ID - 1 AS GAP_SIZE
FROM   ORDERS
WHERE  LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) - ORDER_ID > 1;

Note: In DB2, you cannot use a window function directly in a WHERE clause. Wrap it in a derived table or CTE:

sql
-- Correct approach: wrap in a derived table
SELECT ORDER_ID, NEXT_ORDER_ID, GAP_SIZE
FROM  (
          SELECT ORDER_ID,
                 LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) AS NEXT_ORDER_ID,
                 LEAD(ORDER_ID) OVER (ORDER BY ORDER_ID) - ORDER_ID - 1
                     AS GAP_SIZE
          FROM   ORDERS
      ) AS GAPS
WHERE GAP_SIZE > 0
ORDER BY ORDER_ID;

FIRST_VALUE() and LAST_VALUE()

FIRST_VALUE() returns the value of an expression evaluated at the first row of the window frame. LAST_VALUE() returns the value from the last row of the frame.

sql
-- For each employee, show the highest and lowest salary in their department
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       SALARY,
       FIRST_VALUE(SALARY) OVER (PARTITION BY WORKDEPT
                                  ORDER BY SALARY DESC
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS DEPT_MAX_SAL,
       LAST_VALUE(SALARY)  OVER (PARTITION BY WORKDEPT
                                  ORDER BY SALARY DESC
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS DEPT_MIN_SAL
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, SALARY DESC;

The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame is important for LAST_VALUE. Without it, the default frame would only include rows up to the current row, making LAST_VALUE return the current row's own value — rarely what you want.


Running Totals and Cumulative Aggregates

Running totals are a classic use case for window functions. Using SUM() OVER with the appropriate frame clause is far more efficient than a correlated subquery for every row.

sql
-- Cumulative salary spend by hire date within each department
SELECT EMPNO,
       LASTNAME,
       WORKDEPT,
       HIREDATE,
       SALARY,
       SUM(SALARY) OVER (PARTITION BY WORKDEPT
                         ORDER BY HIREDATE
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS CUMULATIVE_SALARY,
       COUNT(*) OVER (PARTITION BY WORKDEPT
                      ORDER BY HIREDATE
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS CUMULATIVE_HEADCOUNT
FROM   DSN8C10.EMP
ORDER BY WORKDEPT, HIREDATE;

Moving Average

sql
-- 5-period moving average of monthly sales
SELECT SALE_MONTH,
       MONTHLY_AMOUNT,
       AVG(MONTHLY_AMOUNT)
           OVER (ORDER BY SALE_MONTH
                 ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
           AS MOVING_AVG_5
FROM   MONTHLY_SALES
ORDER BY SALE_MONTH;

Window Functions vs GROUP BY

ScenarioUse GROUP BYUse Window Function
Total per group, no row detail neededYesNo
Total per group AND individual row detailNoYes
Ranking within a groupNoYes
Access previous/next row valueNoYes
Running or cumulative aggregateNoYes
Percentage of group totalNoYes
Multiple different aggregations in one passPossibleYes, cleaner

A common pattern is combining both in the same query — GROUP BY to produce a summary table (or CTE), then window functions on the expanded detail.


Practical Use Case: Sales Rankings and Performance Tiers

sql
WITH SALES_RANKED AS (
    SELECT SALESPERSON_ID,
           REGION,
           SALE_YEAR,
           TOTAL_SALES,
           RANK()       OVER (PARTITION BY REGION, SALE_YEAR
                              ORDER BY TOTAL_SALES DESC)  AS REGION_RANK,
           DENSE_RANK() OVER (ORDER BY TOTAL_SALES DESC)  AS OVERALL_RANK,
           NTILE(4)     OVER (PARTITION BY SALE_YEAR
                              ORDER BY TOTAL_SALES DESC)  AS PERFORMANCE_TIER,
           SUM(TOTAL_SALES) OVER (PARTITION BY REGION, SALE_YEAR)
                                                           AS REGION_TOTAL,
           DECIMAL(TOTAL_SALES /
               SUM(TOTAL_SALES) OVER (PARTITION BY REGION, SALE_YEAR)
               * 100, 5, 2)                                AS PCT_OF_REGION
    FROM   SALES_SUMMARY
    WHERE  SALE_YEAR IN (2025, 2026)
)
SELECT SALESPERSON_ID,
       REGION,
       SALE_YEAR,
       TOTAL_SALES,
       REGION_RANK,
       OVERALL_RANK,
       CASE PERFORMANCE_TIER
           WHEN 1 THEN 'TOP 25%'
           WHEN 2 THEN 'UPPER MIDDLE'
           WHEN 3 THEN 'LOWER MIDDLE'
           WHEN 4 THEN 'BOTTOM 25%'
       END AS TIER_LABEL,
       PCT_OF_REGION
FROM   SALES_RANKED
WHERE  REGION_RANK <= 5
ORDER BY REGION, SALE_YEAR, REGION_RANK;

Performance Considerations

Window Functions Are a Single Pass

A key performance advantage of window functions over equivalent correlated subqueries or self-joins is that DB2 typically processes the data in a single pass. Correlated subqueries for running totals re-scan the table for every row; window functions scan once, sort if necessary, and compute the window aggregate incrementally.

Sorting Cost

Window functions that use ORDER BY in the OVER clause may require sorting. If the underlying data is already sorted (due to an index or a prior sort in the query plan), DB2 can avoid an additional sort step. Include suitable indexes on PARTITION BY and ORDER BY columns to reduce sort overhead.

Multiple Window Functions with Different Partitions

If a query contains several window functions with different PARTITION BY or ORDER BY expressions, DB2 may need to sort the data multiple times. In this case, it can be worth separating the computations into CTEs to give the optimizer more flexibility.

sql
-- Potentially expensive: two different PARTITION BY in one SELECT
SELECT EMPNO,
       SALARY,
       AVG(SALARY) OVER (PARTITION BY WORKDEPT) AS DEPT_AVG,
       AVG(SALARY) OVER (PARTITION BY JOB)      AS JOB_AVG
FROM   DSN8C10.EMP;

Check the EXPLAIN output to see if DB2 performs two sort operations. If performance is critical, separate into CTEs or intermediate result sets.

Use EXPLAIN

Always explain queries with window functions on large tables:

sql
EXPLAIN PLAN SET QUERYNO = 200 FOR
    SELECT EMPNO, SALARY,
           ROW_NUMBER() OVER (PARTITION BY WORKDEPT ORDER BY SALARY DESC) AS RN
    FROM   DSN8C10.EMP;

SELECT PLANNO, TNAME, ACCESSTYPE, SORTC_PREF
FROM   PLAN_TABLE
WHERE  QUERYNO = 200
ORDER BY PLANNO;

Summary

DB2 window functions extend SQL with powerful row-by-row analytical capabilities:

  • The OVER clause defines the window — use PARTITION BY to create groups and ORDER BY to set the processing order within each group.
  • Frame clauses (ROWS BETWEEN) control which rows are included in aggregate calculations.
  • ROW_NUMBER() assigns unique row numbers — ideal for top-N-per-group queries.
  • RANK() and DENSE_RANK() handle ties differently; DENSE_RANK has no gaps in numbering.
  • NTILE(n) divides rows into n equal buckets for percentile analysis.
  • LAG() and LEAD() access previous and next rows without self-joins — perfect for YOY comparisons and gap analysis.
  • FIRST_VALUE() and LAST_VALUE() return boundary values within the frame.
  • SUM() OVER and AVG() OVER with frame clauses produce running totals and moving averages in a single table pass.
  • Window functions cannot be used in WHERE or HAVING clauses — wrap them in a derived table or CTE.

Further Reading