SQLDatabase

SQL Introduction: Relational Databases and ACID

TT
TopicTrick Team
SQL Introduction: Relational Databases and ACID

SQL Introduction: The Relational Mirror

In the 1960s, data storage was a chaotic frontier. Companies relied on "Hierarchical" or "Network" databases where data was stored in fixed, physical paths. If you wanted to find an employee's salary, your program had to "Navigate" a specific physical linked-list on the disk. If a developer moved a single pointer, every application in the company would crash. This was known as Data Dependence, and it was the single biggest bottleneck in early computer science.

Everything changed in 1970. Edgar F. Codd, an IBM researcher with a PhD in computer science, published a paper that would change the world: "A Relational Model of Data for Large Shared Data Banks." Codd proposed a radical idea: Data Independence. He argued that users should never care how data is stored on a spinning disk or a flash drive; they should only care about the Logical Relationship between pieces of information.


1. The Relational Model: Logic Over Physicality

The Relational Model is built on Set Theory and Predicate Logic. Unlike a spreadsheet, which is a visual layout, a Relational Database is a mathematical set of "Relations" (Tables).

The Mathematical Foundation

To understand SQL at a senior level, you must understand the terminology:

  • Relation (Table): A set of tuples. Mathematically, a relation is a subset of the Cartesian product of a list of domains.
  • Tuple (Row): A single record. In a set, the order of elements doesn't matter. This is why SQL doesn't guarantee the order of your results unless you use ORDER BY.
  • Attribute (Column): A named domain. Every attribute has a physical constraint (e.g., INT32, UTF-8 String).
  • Degree: The number of columns in a table.
  • Cardinality: The number of rows in a table.

Codd's Rules: The 12 Commandments

Codd defined $12$ (actually $13$, starting from $0$) rules that a database must follow to be considered truly "Relational." The most critical for modern architects are:

  1. Rule 1: The Information Rule: All information is represented in one and only one way—as values in tables. There are no "Hidden pointers."
  2. Rule 3: Systematic Treatment of NULL: The database must handle "Missing information" in a consistent way, distinct from zero or an empty string.
  3. Rule 12: Nonsubversion Rule: If the system has a low-level interface, it cannot be used to subvert the security or integrity constraints of the high-level SQL language.

2. Hardware-Mirror: The Anatomy of a Transaction

When you run an UPDATE or INSERT, you aren't just changing a file—you are interacting with the physical limitations of silicon and magnetism.

The Storage Engine vs. The Relational Engine

A modern RDBMS like PostgreSQL is split into two halves:

  • The Relational Engine (The Brain): Parses your SQL, optimizes the math, and decides which rows to touch.
  • The Storage Engine (The Hands): Writes bits to the disk and manages the Buffer Pool (RAM).

The Physiology of a Write: The WAL Pattern

If every transaction required a "Random Write" to the middle of a 1TB data file, databases would be impossibly slow. Instead, SQL uses the Write-Ahead Log (WAL):

  1. The Sequential Write: The change is first written to a simple log file. Writing to the end of a log is a "Sequential I/O" operation. On modern NVMe drives, this is nearly as fast as RAM.
  2. The Acknowledgment: Only once the WAL entry is "Sync'd" to the physical disk platter does the database tell you "Success."
  3. The Checkpoint: Later, a background process (The Checkpointer) moves the data from the WAL to the actual table files. If the power goes out, the database simply "Replays" the WAL entries to rebuild the state.

The 8KB Data Page Mirror

To understand how SQL physically "sees" your data, you must understand the Page.

  • The Physical Unit: Databases like PostgreSQL do not read rows; they read 8KB Pages.
  • The Slotted Page Physics: Every page has a header, a line-pointer array, and the actual tuple data at the bottom. When you ask for a single row, the database must pull the entire 8KB page into the Buffer Pool (RAM).
  • The IO Cost Mirror: This is why "Selecting *" is so expensive. If your row size is small, you fit more rows per page, reducing the number of physical disk reads required to mirror your results into RAM.


3. The ACID Deep Dive: The Pillars of Global Finance

"ACID" is not just a marketing term; it is the contract that prevents banks from losing your money.

Atomicity: The Unit of Work

A transaction is "Atomic"—it cannot be split. If you move $$100$ from User A to User B, and the server crashes after subtracing $$100$ but before adding it to B, Atomicity forces a Rollback.

  • Internal Reality: Every data page has a "LSN" (Log Sequence Number). The engine uses these numbers to undo partial changes.

Consistency: Data Integrity

Consistency ensures the database never enters an "Impossible State."

  • Constraints: Foreign keys, unique indexes, and CHECK constraints are enforced before the transaction commits.
  • Business Logic Integration: By putting a CHECK (balance >= 0) in SQL, you protect your company from bugs in your frontend or backend code. Even if a junior developer writes bad code, the database refuses to be "Inconsistent."

Isolation: Concurrent Realities

Isolation allows thousands of people to use the same table at the same time without seeing each other's "Drafts."

  • MVCC (Multi-Version Concurrency Control): PostgreSQL doesn't overwrite data. It creates a Version of a row. While Transaction A is updating the row, Transaction B is reading the "Old Version."
  • The Performance Win: "Readers never block Writers." This is why SQL scales to the massive demands of 2026.

Durability: The Hard Drive Contract

Durability means that once a transaction is committed, it will survive a power outage, a system crash, or even an OS failure.

  • The fsync Call: The engine uses the fsync() primitive to bypass the OS's software cache and force the disk controller to physically write the data to the non-volatile surface.

4. Case Study: The "Phantom Read" Disaster

Imagine a simple E-commerce audit.

  1. Transaction 1 (Audit): Counts all orders for the day. Result: $100$.
  2. Transaction 2 (New Sale): A customer buys a product. Transaction 2 commits.
  3. Transaction 1 (Audit): Checks the same table again to calculate total revenue. Result: $101$.

The Result: The audit is now logically broken because the count (100) and the revenue (101 orders worth) don't match. This is a Phantom Read.

The Architectural Solution

Professional architects solve this by choosing the correct Isolation Level:

  • Read Committed: The default. Fast but allows Phantoms.
  • Repeatable Read: Ensures that if you read a row once, it won't change while your transaction is open.
  • Serializable: The highest level. It makes the database behave as if every transaction happened one-by-one in a perfect line.

5. SQL vs NoSQL: The 2026 Convergence

For a decade, the "NoSQL" movement (MongoDB, Cassandra) claimed that SQL couldn't scale. They sacrificed ACID for "Speed."

The Current Reality:

  • PostgreSQL now supports JSONB, giving you NoSQL flexibility inside a SQL engine.
  • Distributed SQL (CockroachDB, Yugabyte) has solved the "Scaling Problem" by using the Raft Consensus Algorithm to keep ACID properties across thousands of servers.
  • The Verdict: Unless you are building something specialized like a real-time game server or a massive log-aggregator, Postgre/SQL is the 2026 standard for 99% of mission-critical applications.

7. The Mathematical Mirror: Relational Algebra

Beyond the hardware, SQL is an implementation of Relational Algebra. When you write a query, you are performing set operations.

The Algebra Physics

  • Selection ($\sigma$): Filtering rows based on a predicate (The WHERE clause).
  • Projection ($\pi$): Choosing specific columns (The SELECT clause).
  • Join ($\bowtie$): Composing two relations into a new set.
  • Set Operations: UNION, INTERSECT, and EXCEPT.

Understanding that SQL is "Declarative Math" changes how you optimize. You don't tell the database how to loop over data; you describe the desired result set, and the Query Optimizer (the database's "Silicon Brain") builds the most efficient physical execution mirror.


6. Summary: The SQL Master's Vocabulary

  1. Relation: A mathematical set of data (Table).
  2. WAL: The sequential log that handles the "Durability" of your ACID promise.
  3. MVCC: The versioning system that allows high concurrency without locking.
  4. Optimizer: The engine that converts your "Request" (SQL) into "Execution" (Physical IO).
  5. Schema: The contract that defines the types and constraints of your digital universe.

SQL is the "Language of Logic." By mastering the relational model and the physics of how data hits the disk, you gain the ability to build systems that are not just "Fast," but Indestructible. You move from being a "User of data" to an "Architect of Truth."


Phase 1: Action Items

  • Install Docker and start a PostgreSQL 16 container.
  • Create a table and insert $1,000$ rows.
  • Observe the pg_wal directory in your container to see the physical Write-Ahead Log being generated.

Read next: SQL Installation and Environment Setup: Postgre vs MySQL →


Part of the SQL Mastery Course — engineering the truth.