SQL Recursive Queries: Mastering Trees and Hierarchies

Recursive Queries: Navigating Trees & Graphs
Standard SQL is "horizontal"—it processes rows in batches. But what happens when a row in a table references another row in the same table?
- A manager who has employees, who are also managers.
- A folder that contains sub-folders, which contain files.
- A "Shortest Path" problem in a delivery network.
To solve these, you need Recursion. This flagship guide explores how the database engine handles "Self-Referential Logic" and how to architect high-fidelity graph traversals without collapsing into a stack overflow.
1. The Physics of the "Working Table"
Recursive CTEs look like magic, but they are powered by a very specific hardware process: the Working Table Iteration.
The Anatomy of the Loop
A recursive query consists of two parts linked by a UNION or UNION ALL:
- The Anchor Member: This is the "Base Case." It's the starting point of your recursion (e.g., the CEO at the top of the tree).
- The Recursive Member: This is the "Step Case." It's the query that joins the previous result back to the table to find the next level.
The Mirror of Execution
When you run WITH RECURSIVE, the engine creates two temporary tables in RAM:
- The Result Table: This stores the final output.
- The Working Table: This stores only the rows found in the last iteration.
The Loop Physics:
- The engine runs the Anchor and puts the results into both tables.
- It then runs the Recursive Member against the Working Table.
- It takes those new rows, puts them in the Result Table, and replaces the Working Table with them.
- It repeats this until the Working Table is empty.
The Architect's Lesson: The database never looks at the "Entire" result set during recursion; it only looks at the "Current Level." This is why recursion in SQL is surprisingly memory-efficient.
2. Real-World Hierarchy: The Organization Tree
Let's build a classic "Management Chain" report.
The Result: By limiting the degree to 3, the database only has to scan a small branch of the network. Even with 10 million users, the "Third Degree" check returned in 12 milliseconds.
5. Alternative: Path-Based Hierarchies (LTREE)
5.1 Architecting Closure Tables: The High-Speed Alternative
If your tree is read-heavy and you need instant "Sub-tree" searches, Recursive CTEs might be too slow. Enter the Closure Table.
- The Architecture: You create a separate table
category_relationshipsthat stores every ancestor-descendant pair. - The Physics: If Root has Child, and Child has Grandchild, the Closure Table stores:
- (Root, Child, 1)
- (Root, Grandchild, 2)
- (Child, Grandchild, 1)
- The Performance Mirror: To find all descendants of Root, you perform a single, indexed
SELECTon the closure table (WHERE ancestor_id = Root). You have traded Disk Space (N^2 storage) for O(1) Search Latency. This is how Reddit handles million-comment threads with sub-millisecond load times.
6. Project War Room: The 10-Level Org Chart
Imagine a global enterprise with 500,000 employees. The CEO wants a report of the "Total Salary Weight" for every branch.
- The Recursive Approach: You use a CTE to sum salaries as you descend.
- The Performance Mirror: 1.5 seconds.
- The Optimization: You add Path Materialization. You store a
/CEO/VP/Director/Manager/string in each row. - The Hardware Win: By using a GIST Index on the path, the 1.5-second recursive query becomes a 15ms index scan.
- The Lesson: Recursion is for logic; Materialization is for scale.
7. Summary: The Recursive Checklist
- Anchor Fidelity: Ensure your anchor returns exactly the starting rows you need.
- Breadth-First Default: Understand that you are processing levels, not branches, by default.
- Depth Sentry: Always filter by
depthto prevent runaways. - Closure Tables for Scale: If reading is 1,000x more frequent than writing, move your hierarchy to a Closure Table.
- Index the Foreign Keys: Use B-Tree indexes on
parent_idfor recursive joins.
Recursive queries are the "High-Level Logic" of SQL. By mastering the movement of the working set and the safety of cycle detection, you gain the power to traverse complex human and machine networks with ease. You graduate from "Querying rows" to "Architecting Graphs."
Masterclass Recursion Checklist
- Implement Sovereign Cycle Detection: Use the
CYCLEclause to prevent infinite loops. - Audit Anchor Specificity: Ensure the starting query is as narrow as possible.
- Compare Recursive CTE performance vs. a Closure Table for a 5-level hierarchy.
- Advanced Goal: Build a "Shortest Path" finder for a simple delivery node network.
Read next: SQL Transactions: The ACID Concurrency Mirror →
Part of the SQL Mastery Course — engineering the paths.
