SQL Common Table Expressions (CTE): Top-Down Logic

Q: What is a CTE and how does it differ from a subquery?

A CTE is a named temporary result defined with WITH name AS (SELECT ...) that can be referenced by name in the main query. CTEs improve readability and can be referenced multiple times in the query without re-executing the logic.

Q: When should I use a CTE versus a temporary table?

Use CTEs for single-query readability when you do not need to reuse intermediate results across multiple statements. Use temporary tables when the result is large and you want to materialise it once for efficient reuse.

Q: What is a recursive CTE and what problems does it solve?

A recursive CTE references itself repeatedly until no new rows are produced, solving hierarchical traversal problems like finding all descendants in an org chart or computing paths in a graph.

"Code is read 10x more than it is written. CTEs are the bridge between a query that works and a query that can be understood."

In your journey through advanced SQL, you will eventually encounter a query that is 50 lines of nested subqueries, 10 JOINs, and 4 layers of "Derived Tables." These "Spaghetti Queries" are a disaster for teams. They are hard to debug, impossible to refactor, and hide bugs in their complexity.

The Common Table Expression (CTE)-implemented via the WITH clause-is the professional solution to this problem. It allows you to name your sub-calculations and define them before the main query starts. This guide explores the architecture of the "Top-Down" query and how to use CTEs to build production-grade, maintainable software.

1. What is a CTE? (The Named Variable)

A CTE is a temporary result set that you define at the beginning of your SQL statement. You give it a name, and then you can refer to that name just like a real table in your main SELECT, INSERT, UPDATE, or DELETE.

The Basic Syntax

sql

WITH top_customers AS (
    -- This looks like a variable definition
    SELECT customer_id, SUM(amount) as total_spent
    FROM orders
    WHERE order_date > '2026-01-01'
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
)
-- Now we use the 'variable' in our main query
SELECT c.name, t.total_spent
FROM customers c
JOIN top_customers t ON c.id = t.customer_id;

Organization: You can separate your "Data Cleaning" logic from your "Reporting" logic.

The Multi-Reference Mirror

What happens if you use the same CTE three times in one query? SELECT * FROM my_cte JOIN some_table JOIN my_cte JOIN other_table...

The Physics: In older engines, the database would run the CTE code three separate times.
The Buffer Mirror: In modern SQL, the database detects the multiple references and calculates the CTE exactly once, storing the result in a local memory buffer for the rest of the query life-cycle.
The Win: This saves massive amounts of I/O and CPU cycles compared to repeating subqueries.

2. Chaining CTEs: The Assembly Line

One of the most powerful features of the WITH clause is the ability to chain multiple CTEs together. Each CTE can reference the ones defined before it.

Use Case: The Multi-Stage Report

Imagine you need to:

Find all orders from the last month.
Calculate the standard tax for those orders.
Calculate the shipping cost for those orders.
Join everything together for a final report.

sql

WITH monthly_orders AS (
    SELECT * FROM orders WHERE order_date >= '2026-03-01'
),
orders_with_tax AS (
    SELECT *, total_amount * 0.15 AS tax_amount
    FROM monthly_orders  -- References the previous CTE
),
orders_with_shipping AS (
    SELECT *, 
        CASE WHEN total_amount > 500 THEN 0 ELSE 25 END AS shipping_cost
    FROM orders_with_tax -- References the previous CTE
)
SELECT 
    order_id, 
    total_amount + tax_amount + shipping_cost AS final_grand_total
FROM orders_with_shipping;

Architect's Note: This is an "Assembly Line" for data. If the tax logic changes, you only change one specific block of code. The rest of the query remains untouched.

3. CTEs for Write Operations: Fast Updates

CTEs aren't just for SELECT. You can use them to calculate values for UPDATE or DELETE statements, which is much safer and faster than using complex subqueries in the WHERE clause.

Use Case: Deleting Stale Users

sql

WITH stale_users AS (
    SELECT id 
    FROM users 
    WHERE last_login < '2025-01-01' 
    AND account_type = 'guest'
)
DELETE FROM users 
WHERE id IN (SELECT id FROM stale_users);

Why do this? It allows you to "Preview" the data. You can run the WITH block + a SELECT first to verify you are deleting the right people, and then just swap the SELECT for a DELETE.

4. Performance: CTE vs. Subquery (The "Materialization" Question)

A common myth is that CTEs are always slower than subqueries because the database "Materializes" (saves to RAM) the result set.

When to force Materialization: The Optimization Fence

Sometimes, you want the CTE to be a barrier. This is called an Optimization Fence.

sql

WITH heavy_data AS MATERIALIZED (
    -- This calculation stays isolated
    SELECT ... 
)
SELECT ... FROM heavy_data WHERE status = 'ACTIVE';

The Physics: When you use AS MATERIALIZED, the optimizer is forced to calculate the entire WITH block first, then apply the WHERE filter.
The Win: This prevents the optimizer from trying to "Push" filters into the WITH block in ways that might accidentally trigger a slow execution plan. It gives you, the Architect, the final control over the execution mirror.

5. Recursive CTEs: The Graph Solver

This is the "Superpower" of the WITH clause. A Recursive CTE allows you to traverse Hierarchical Data (like a Reddit comment tree, an organization chart, or a file system).

Note: We have a dedicated module for Recursive logic (Module 118), but here is a quick preview:

sql

WITH RECURSIVE org_chart AS (
    -- The Anchor (Starting point: The CEO)
    SELECT id, name, manager_id, 1 as level
    FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- The Step (Find everyone who reports to the people found so far)
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;

Summary: The Professional's Preference

Hardware-Mirror: The Working Set

During recursion, the engine maintains a "Working Set" in memory.

It processes Level 1, puts the results in the working set.
It uses the working set to find Level 2, and then replaces the working set with the new results.
This continues until the working set is empty.
Architect's Warning: If you create an "Infinite Loop" (e.g., John reports to Sarah, Sarah reports to John), your database will hit 100% CPU and potentially run out of RAM. Always use a level limit!

4. CTEs for Complex Data Modification (Writability)

Most developers don't realize that you can use CTEs with INSERT, UPDATE, and DELETE. This is called a Data Modifying CTE.

The "Move and Log" Pattern

Imagine you are deleting a user from the active_users table and you want to simultaneously record their ID in a deletion_audit table.

sql

WITH deleted_user AS (
    DELETE FROM active_users 
    WHERE id = 500 
    RETURNING id, name, deleted_at
)
INSERT INTO deletion_audit (user_id, log_time)
SELECT id, NOW() FROM deleted_user;

Why this is professional: It ensures that both operations happen in the same transaction at the same time. You don't have to write two separate queries in your Python/Java code. The database handles the "Pipeline" of data.

5. Case Study: The "Social Degrees" Performance Wall

A dating app wanted to show: "You have 5 mutual friends with Sarah." The Failure: They used a standard JOIN across the friendships table. At 1,000 friends, the query was fast. At 1 million users, the join became a "Many-to-Many" explosion that crashed the server.

The Fix: A Recursive CTE that searched exactly 2 levels deep, stopping as soon as it found a match. By materializing the "Friend list" in the first stage of the CTE, they prevented the database from re-scanning 10 million rows for every friend check. The query speed dropped from 3 seconds to 15 milliseconds.

Mastering CTEs is the structural engineering of SQL. By mastering logic segmentation and recursion, you gain the power to write code that is not just fast, but Maintainable.

Masterclass Logic Checklist

Audit CTE Materialization: Use EXPLAIN to see if your engine is "Folding" the CTE or "Materializing" it.
Implement Sovereign Writing: Use WITH ... RETURNING for atomic data transfers between tables.
Monitor Recursion depth: Always include a hard-coded safety limit (WHERE level < 100) in Recursive CTEs.
Optimize Chain Integrity: Ensure every stage in a CTE chain significantly reduces the working set before the next stage begins.

Frequently Asked Questions

Q: What is a CTE and how does it differ from a subquery?

A Common Table Expression (CTE) is a named temporary result defined with WITH name AS (SELECT ...) that can be referenced by name in the main query. A subquery is an unnamed SELECT embedded inline. CTEs improve readability - complex logic broken into named steps reads like a recipe rather than a nested maze. CTEs can also be referenced multiple times in the same query without re-executing, whereas an inline subquery is logically re-evaluated each reference. In most databases, the query optimiser may or may not materialise a CTE - PostgreSQL 12+ chose not to materialise by default for better optimisation.

Q: When should I use a CTE versus a temporary table?

A CTE exists only for the duration of the single statement that defines it - it is not a stored object. A temporary table persists for the session and can be indexed, analysed, and referenced by multiple subsequent statements. Use CTEs for single-query readability and decomposition when you do not need to reuse the intermediate result across multiple queries. Use temporary tables when the intermediate result is large and you want to materialise it once for efficient reuse, when you need to add an index to the intermediate result for performance, or when you are building a multi-step procedure.

Q: What is a recursive CTE and what problems does it solve?

A recursive CTE (`WITH RECURSIVE`) references itself, executing repeatedly until no new rows are produced. It solves hierarchical and graph traversal problems: finding all descendants of a node in an org chart, traversing a bill-of-materials tree, computing shortest paths, or generating date series. The structure is always: an anchor member (base case, non-recursive), UNION ALL, and a recursive member that references the CTE name. Most databases limit recursion depth to prevent infinite loops - PostgreSQL defaults to 100 levels, adjustable with `max_recursion_depth`.

Part of the SQL Mastery Course - engineering the logic.