
SQL CLI Project: Building a Database Explorer

TopicTrick Team

SQL Project: Building a High-Performance Database Explorer

Welcome to the Phase 1 Capstone of the SQL Masterclass. You have normalized schemas, mastered complex joins, and explored aggregations. Now, you will build the "Flashlight."

This 1,500+ word project guide is structured as a technical assignment. You are a Tooling Engineer at "InfraGen," a company that manages 10,000 database clusters. Your mission is to build a CLI utility that helps developers understand their schemas, detect "bloat," and identify missing indexes without reaching for a slow, heavy GUI.


1. Project Goal: The Observability Architectural Standard

Most developers treat the database as a "Black Box" where they send a query and get a row. As an architect, you must see the Machine Behind the Mirror.

Your goal is to build a tool that answers three fundamental questions for any database:

  1. Structure: What tables exist and how are they linked?
  2. Health: Which tables have too many "Dead Rows" and require maintenance?
  3. Performance: Which queries are consuming the most CPU time across the entire cluster?

The Technology Choice: Go or Zig?

For this project, we recommend building the companion code in Go (for its standard database/sql library and ease of cross-compilation) or Zig (for its bare-metal efficiency and C-level control over memory). Since these tools are used in production, they must be fast, lightweight, and have zero external dependencies.


2. Task 1: Interrogating the Catalog (Metadata Mirror)

Before your tool can display data, it must know what data exists.

The Information Schema Logic

The information_schema is a standard, ISO-compliant set of views that exists in almost every relational database (Postgres, SQL Server, MySQL). It is portable but can be slow.

Your first task is to write a routine that extracts all user-defined tables and their column counts.

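One possible shape for this routine in Postgres-flavored SQL — a sketch, not the only valid answer — pre-aggregates column counts in a subquery and LEFT JOINs it so each table appears exactly once:

```sql
-- Sketch: every user-defined table with its column count.
-- Pre-aggregating in a subquery + LEFT JOIN yields one row per table.
SELECT
    t.table_schema,
    t.table_name,
    COALESCE(c.column_count, 0) AS column_count
FROM information_schema.tables AS t
LEFT JOIN (
    SELECT table_schema, table_name, COUNT(*) AS column_count
    FROM information_schema.columns
    GROUP BY table_schema, table_name
) AS c
  ON  c.table_schema = t.table_schema
 AND  c.table_name   = t.table_name
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY t.table_schema, t.table_name;
```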

Architect's Note: We use a LEFT JOIN on a subquery here to ensure we get a single row per table. On huge databases with 1,000+ tables, this is significantly faster than running a correlated subquery in the SELECT list.

2.1 The Physics of Metadata Latency: Schema vs. Catalog

Why is querying information_schema slower than querying the internal catalog?

  • The Abstraction Mirror: information_schema is a set of Complex Views. To show you a simple table list, the database engine must join sometimes dozens of internal tables (pg_class, pg_namespace, pg_authid) just to satisfy the ISO standard.
  • The Hardware Reality: Each join in the information_schema views adds internal memory and CPU overhead. Whenever performance is critical, favor direct catalog access (pg_catalog). Your tool should support a --fast mode that hits the catalogs directly, bypassing the ISO abstraction layer.
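As a sketch of what the --fast path might look like: pg_class already stores a column count in relnatts, so one join against pg_namespace replaces the entire ISO view stack (caveat: relnatts includes dropped columns until the table is rewritten):

```sql
-- Sketch of the --fast path: one pass over pg_class, no ISO views.
-- Caveat: relnatts counts dropped columns until a table rewrite.
SELECT n.nspname  AS table_schema,
       c.relname  AS table_name,
       c.relnatts AS column_count
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n.nspname, c.relname;
```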

3. Task 2: Surgical Type Extraction (System Catalog Physics)

Standard tools use SELECT * FROM table LIMIT 0 to guess types. Your tool must be smarter. It must read the System Catalog (pg_attribute).

The Task

When a user runs explorer describe <table_name>, your CLI should fetch:

  • Column Name
  • Internal Data Type (including length/precision)
  • Primary Key Status
  • Default Values

The Hardware-Mirror Query

Querying pg_catalog.pg_attribute directly bypasses the "View" layer of the Information Schema, hitting the database's internal C-structs directly.

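One way to write this query (a sketch; $1 is the table name passed in by the CLI):

```sql
-- Sketch: describe a table straight from the system catalog.
-- $1 is the table name; ::regclass resolves it to an OID.
SELECT
    a.attname                                       AS column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type,
    COALESCE(i.indisprimary, false)                 AS is_primary_key,
    pg_catalog.pg_get_expr(d.adbin, d.adrelid)      AS default_value
FROM pg_catalog.pg_attribute a
LEFT JOIN pg_catalog.pg_attrdef d
       ON d.adrelid = a.attrelid AND d.adnum = a.attnum
LEFT JOIN pg_catalog.pg_index i
       ON i.indrelid = a.attrelid
      AND a.attnum = ANY(i.indkey)
      AND i.indisprimary
WHERE a.attrelid = $1::regclass
  AND a.attnum > 0            -- skip system columns
  AND NOT a.attisdropped      -- skip dropped columns
ORDER BY a.attnum;
```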

Why this is professional: By using ::regclass, you allow Postgres to perform an O(1) OID lookup rather than a string search. This allows your tool to "Describe" a table in microseconds.


4. Task 3: The "Bloat Hunter" (Storage Efficiency Lab)

In MVCC (Multi-Version Concurrency Control) databases like Postgres, UPDATE and DELETE commands don't immediately remove data from disk; they mark the old row versions as "dead." A table with too many dead rows is said to be "bloated."

The Task

Identify tables where the "Dead Row Ratio" exceeds 20%. These tables are "vacuum candidates" that are wasting disk space and slowing down sequential scans.

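A sketch built on the statistics collector's pg_stat_user_tables view (the 20% threshold is the assignment's, not a universal constant):

```sql
-- Sketch: tables whose dead-row ratio exceeds 20%.
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2)
        AS dead_row_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
  AND 100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0) > 20
ORDER BY dead_row_pct DESC;
```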

Mission Requirement: Your CLI should output this in a list and highlight any row with >20% bloat in RED to warn the administrator.


5. Task 4: The "Ghost Index" Detector

Indexes are the "Speed of Sound" for data. But unused indexes waste memory, and missing indexes kill production site speed.

The Task

Build a "Missing Index Advisory." You must find tables where the count of Sequential Scans (scanning the whole table) is much higher than the count of Index Scans.

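A sketch of the advisory query; the 10x ratio and the 10,000-row floor are illustrative thresholds you should tune, not fixed rules:

```sql
-- Sketch: tables scanned sequentially far more often than via indexes.
SELECT
    schemaname,
    relname,
    seq_scan,
    idx_scan,
    n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 10 * COALESCE(idx_scan, 0)
  AND n_live_tup > 10000   -- tiny tables are fine to seq-scan
ORDER BY seq_scan DESC;
```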

5.1 The Index Bloat (B-Tree Entropy) Mirror

It isn't just tables that get bloated; indexes suffer from entropy too.

  • The Physics: When you delete a row, the B-Tree index entry is marked as dead but remains in the page. If you have "Heavy Deletes," your index can double or triple in size while holding the same number of active entries.
  • The Observability Challenge: A bloated index is slower because the engine must traverse more nearly-empty pages to find a valid record, polluting the CPU caches along the way.
  • The Task Upgrade: Add a command explorer index-bloat that identifies indexes whose physical size is significantly larger than the theoretical size implied by their item count. (Hint: use the pgstattuple extension if available.)
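If pgstattuple is installed, its pgstatindex function reports B-Tree density directly; a sketch (the index name here is hypothetical):

```sql
-- Sketch: B-Tree health via pgstattuple (needs appropriate privileges).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT avg_leaf_density,    -- healthy B-Trees sit around 90%
       leaf_fragmentation
FROM pgstatindex('users_email_idx');  -- hypothetical index name
```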

6. Task 5: The "Top 10" Performance Monitor

To finish your tool, you must integrate with the pg_stat_statements extension. This lets you see the "heaviest" queries on the server, aggregated across every execution since the statistics were last reset.

The Task

List the top 10 queries sorted by total execution time.

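A sketch of the monitor query. Note a versioning trap: on Postgres 12 and older the columns are named total_time / mean_time; from Postgres 13 onward they are total_exec_time / mean_exec_time, as used here:

```sql
-- Sketch: top 10 statements by cumulative execution time.
-- Requires: CREATE EXTENSION pg_stat_statements;
SELECT query,
       calls,
       ROUND(total_exec_time::numeric, 2) AS total_ms,
       ROUND(mean_exec_time::numeric, 2)  AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```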

Formatting Challenge: Raw SQL strings can be thousands of characters long. In your CLI (Go/Zig), you must implement a "Truncation Filter" that extracts the first 80 characters of the query and appends ... for display in the terminal.
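A minimal sketch of that filter in Go; slicing runes rather than bytes avoids cutting a multi-byte UTF-8 character in half at the boundary:

```go
package main

import "fmt"

// truncateQuery returns at most limit runes of q, appending "..." when
// the input was cut. Converting to []rune keeps multi-byte UTF-8
// characters intact, which raw byte slicing would not.
func truncateQuery(q string, limit int) string {
	runes := []rune(q)
	if len(runes) <= limit {
		return q
	}
	return string(runes[:limit]) + "..."
}

func main() {
	long := "SELECT u.id, u.email, o.total FROM users u JOIN orders o ON o.user_id = u.id WHERE o.created_at > now() - interval '7 days'"
	fmt.Println(truncateQuery(long, 80))
	fmt.Println(truncateQuery("SELECT 1", 80))
}
```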


7. Go Implementation Snippet: The Connector

To help you get started, here is how you might architect the "Describe Table" functionality in Go.

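A sketch under stated assumptions: the EXPLORER_DSN variable and the commented-out driver import are illustrative choices, and any database/sql-compatible Postgres driver will do:

```go
package main

import (
	"database/sql"
	"fmt"
	"os"
	// Driver registration is an assumption — any database/sql Postgres
	// driver works, e.g. pgx's stdlib adapter:
	// _ "github.com/jackc/pgx/v5/stdlib"
)

// describeSQL is the Task 2 catalog query; $1 is the table name.
const describeSQL = `
SELECT a.attname,
       pg_catalog.format_type(a.atttypid, a.atttypmod)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = $1::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum`

// DescribeTable prints the name and type of every live column.
func DescribeTable(db *sql.DB, table string) error {
	rows, err := db.Query(describeSQL, table)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var name, typ string
		if err := rows.Scan(&name, &typ); err != nil {
			return err
		}
		fmt.Printf("%-32s %s\n", name, typ)
	}
	return rows.Err()
}

func main() {
	dsn := os.Getenv("EXPLORER_DSN") // e.g. postgres://user@host:5432/db
	if dsn == "" || len(os.Args) < 2 {
		fmt.Println("usage: EXPLORER_DSN=<conn-string> explorer <table>")
		return
	}
	db, err := sql.Open("pgx", dsn) // name matches the registered driver
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer db.Close()
	if err := DescribeTable(db, os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```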

7.1 Portability Paradox: Supporting MySQL and Postgres

A truly professional CLI explorer should not be a "One-Trick Pony."

  • The Architecture: Use an Interface-based Pattern (in Go) or a Trait-based Pattern (in Zig).
  • The Implementation: Create a DatabaseDriver interface that defines methods like GetTables() and DescribeTable().
  • The Dialect Mirror: Your tool should detect the connection string (e.g., postgres:// vs mysql://) and automatically swap the SQL logic. MySQL users need to hit SHOW FULL COLUMNS, while Postgres users hit pg_catalog. This separation of "Logic" from "Dialect" is the mark of a Senior Infrastructure Architect.
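The dialect seam can be sketched as a small interface with one implementation per engine; the method names here (TablesQuery, DescribeQuery) are illustrative, not a fixed API:

```go
package main

import (
	"fmt"
	"strings"
)

// DatabaseDriver is the seam between tool logic and SQL dialect.
type DatabaseDriver interface {
	TablesQuery() string
	DescribeQuery(table string) string
}

type postgresDriver struct{}

func (postgresDriver) TablesQuery() string {
	return "SELECT relname FROM pg_catalog.pg_class WHERE relkind = 'r'"
}
func (postgresDriver) DescribeQuery(table string) string {
	return fmt.Sprintf(
		"SELECT attname FROM pg_catalog.pg_attribute WHERE attrelid = '%s'::regclass",
		table)
}

type mysqlDriver struct{}

func (mysqlDriver) TablesQuery() string { return "SHOW TABLES" }
func (mysqlDriver) DescribeQuery(table string) string {
	return "SHOW FULL COLUMNS FROM " + table
}

// DriverFor picks a dialect from the connection-string scheme.
func DriverFor(connString string) (DatabaseDriver, error) {
	switch {
	case strings.HasPrefix(connString, "postgres://"):
		return postgresDriver{}, nil
	case strings.HasPrefix(connString, "mysql://"):
		return mysqlDriver{}, nil
	}
	return nil, fmt.Errorf("unsupported scheme in %q", connString)
}

func main() {
	d, err := DriverFor("mysql://root@localhost/app")
	if err != nil {
		panic(err)
	}
	fmt.Println(d.DescribeQuery("users"))
}
```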

8. Project Success Criteria: The Architect's Standard

To pass the "Database Explorer" lab, your tool must meet these engineering requirements:

  1. Extreme Performance: The tool must launch, connect, and query in under 100ms. Tooling that feels slow will not be used by developers.
  2. Zero-Write Protocol: Implement a --read-only flag (the default) that prevents the execution of any DROP, DELETE, or TRUNCATE commands.
  3. Connectivity Diagnostics: If the connection fails, don't just "Crash." Provide actionable feedback: "Error: Host unreachable. Check if the database instance is running on port 5432."
  4. UI Fidelity: Use ASCII tables for column outputs and ANSI color codes (Green for Healthy, Red for Bloated) to make the data glanceable.
  5. Streaming Exports: Implement an export <table_name> command that streams data out of the database as CSV. Requirement: It must use a Buffered Writer to avoid loading the entire dataset into RAM, allowing 10GB exports on a machine with 500MB of RAM.

This project transitions you from a "Consumer of Data" to an "Infrastructure Architect." By mastering the movement of metadata and the physics of system observability, you gain the power to build tools that define how the rest of your engineering team interacts with the database.


Phase 6: Project Deliverables

  • A functional CLI binary (Go, Zig, or similar) that can "Describe" any table in a cluster.
  • An "Audit" command that outputs the Bloat Ratio for all user tables.
  • An "Advisory" engine that identifies at least one table with high sequential scan volume.
  • Advanced Goal: Implement a stream-export command using buffered I/O for large-scale data migration.
  • Full documentation on how to connect your tool to a production environment.

Read next: SQL Project: Financial Reporting Dashboard (Phase 3) →


Part of the SQL Mastery Course — engineering the infrastructure.