SQL CLI Project: Building a Database Explorer

Welcome to the Phase 1 Capstone of the SQL Masterclass. You have normalized schemas, mastered complex joins, and explored aggregations. Now, you will build the "Flashlight."
This 1,500+ word project guide is written as a technical assignment. You are a Tooling Engineer at "InfraGen," a company that manages over 10,000 database clusters. Your mission is to build a CLI utility that helps developers understand their schemas, detect "Bloat," and identify missing indexes without reaching for a slow, heavy GUI.
1. Project Goal: The Observability Architectural Standard
Most developers treat the database as a "Black Box": they send a query in and get rows back. As an architect, you must see the Machine Behind the Mirror.
Your goal is to build a tool that answers three fundamental questions for any database:
- Structure: What tables exist and how are they linked?
- Health: Which tables have too many "Dead Rows" and require maintenance?
- Performance: Which queries are consuming the most CPU time across the entire cluster?
The Technology Choice: Go or Zig?
For this project, we recommend building the companion code in Go (for its standard database/sql library and ease of cross-compilation) or Zig (for its bare-metal efficiency and C-level control over memory). Since these tools are used in production, they must be fast, lightweight, and have zero external dependencies.
2. Task 1: Interrogating the Catalog (Metadata Mirror)
Before your tool can display data, it must know what data exists.
The Information Schema Logic
The information_schema is a standard, ISO-compliant set of views that exists in almost every relational database (Postgres, SQL Server, MySQL). It is portable but can be slow.
Your first task is to write a routine that extracts all user-defined tables and their column counts.
```sql
SELECT
    t.table_name,
    COALESCE(tc.column_count, 0) AS column_count,
    t.table_type
FROM information_schema.tables t
LEFT JOIN (
    -- Group by schema as well as name, so a same-named table in
    -- another schema cannot inflate the count.
    SELECT table_schema, table_name, count(*) AS column_count
    FROM information_schema.columns
    GROUP BY table_schema, table_name
) tc ON t.table_schema = tc.table_schema
    AND t.table_name = tc.table_name
WHERE t.table_schema = 'public'
  AND t.table_type = 'BASE TABLE'
ORDER BY t.table_name;
```
Architect's Note: We use a LEFT JOIN on a subquery here to guarantee a single row per table. On huge databases with 1,000+ tables, this is significantly faster than running a correlated subquery in the SELECT list.
2.1 The Physics of Metadata Latency: Schema vs. Catalog
Why is querying information_schema slower than querying the internal catalog?
- The Abstraction Mirror: information_schema is a set of complex views. To show you a simple table list, the database engine must sometimes join dozens of internal tables (pg_class, pg_namespace, pg_authid) just to satisfy the ISO standard.
- The Hardware Reality: Each join in the information_schema views adds internal memory overhead. If you are building a tool for 2026 systems, you should favor Direct Catalog Access (pg_catalog) whenever performance is critical. Your tool should support a --fast mode that hits the catalogs directly, bypassing the ISO abstraction layer, as sketched below.
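Here is a minimal Go sketch of what that --fast switch might look like. The names (ListTables, isoQuery, fastQuery) are illustrative assumptions, not part of any library, and the direct-catalog query targets Postgres, where relkind = 'r' restricts pg_class to ordinary tables.
```go
// Sketch: swap between the ISO view and the direct catalog path.
// Requires imports: database/sql plus a registered Postgres driver.
const isoQuery = `
    SELECT table_name FROM information_schema.tables
    WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
    ORDER BY table_name`

// Direct catalog access: pg_class joined to pg_namespace, skipping
// the ISO view layer entirely.
const fastQuery = `
    SELECT c.relname FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public' AND c.relkind = 'r'
    ORDER BY c.relname`

func ListTables(db *sql.DB, fast bool) ([]string, error) {
	q := isoQuery
	if fast {
		q = fastQuery // --fast: hit pg_catalog directly
	}
	rows, err := db.Query(q)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var names []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err
		}
		names = append(names, name)
	}
	return names, rows.Err()
}
```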
3. Task 2: Surgical Type Extraction (System Catalog Physics)
Standard tools use SELECT * FROM table LIMIT 0 to guess types. Your tool must be smarter. It must read the System Catalog (pg_attribute).
The Task
When a user runs explorer describe <table_name>, your CLI should fetch:
- Column Name
- Internal Data Type (including length/precision)
- Primary Key Status
- Default Values
The Hardware-Mirror Query
Querying pg_catalog.pg_attribute directly bypasses the "View" layer of the Information Schema, hitting the database's internal C-structs directly.
```sql
SELECT
    a.attname AS column_name,
    format_type(a.atttypid, a.atttypmod) AS data_type,
    a.attnotnull AS is_required,
    (SELECT pg_get_expr(adbin, adrelid)
     FROM pg_attrdef
     WHERE adrelid = a.attrelid AND adnum = a.attnum) AS default_value
FROM pg_attribute a
WHERE a.attrelid = 'your_table_name'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped;
```
Why this is professional: the ::regclass cast resolves the table name to its OID through the catalog cache, so the filter becomes a direct OID comparison rather than a string search. This allows your tool to "Describe" a table in microseconds.
4. Task 3: The "Bloat Hunter" (Storage Efficiency Lab)
In MVCC (Multi-Version Concurrency Control) databases like Postgres, UPDATE and DELETE commands don't actually remove data on the disk. They mark it as "Dead." If a table has too many dead rows, it is called "Bloated."
The Task
Identify tables where the "Dead Row Ratio" exceeds 20%. These tables are "Vacuum candidates" that are currently wasting disk space and slowing down sequential scans.
```sql
SELECT
    relname AS table_name,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(n_dead_tup::numeric / GREATEST(n_live_tup + n_dead_tup, 1), 4) * 100 AS bloat_percentage
FROM pg_stat_user_tables
WHERE (n_live_tup + n_dead_tup) > 1000 -- Ignore tiny tables
ORDER BY bloat_percentage DESC;
```
Mission Requirement: Your CLI should output this as a list and highlight any row with more than 20% bloat in RED to warn the administrator, as in the sketch below.
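One way to implement that highlight in Go, assuming a terminal that honours standard ANSI SGR escape codes (renderBloatRow is a hypothetical helper name):
```go
// Standard ANSI SGR codes: red foreground and reset.
const (
	ansiRed   = "\033[31m"
	ansiReset = "\033[0m"
)

// renderBloatRow formats one row of the bloat report and wraps it
// in red when the dead-row ratio crosses the 20% threshold.
func renderBloatRow(table string, live, dead int64, bloatPct float64) string {
	line := fmt.Sprintf("%-30s | %12d | %12d | %6.2f%%", table, live, dead, bloatPct)
	if bloatPct > 20.0 {
		return ansiRed + line + ansiReset // vacuum candidate: warn loudly
	}
	return line
}
```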
5. Task 4: The "Ghost Index" Detector
Indexes are the "Speed of Sound" for data. But unused indexes waste memory, and missing indexes kill production site speed.
The Task
Build a "Missing Index Advisory." You must find tables where the count of Sequential Scans (scanning the whole table) is much higher than the count of Index Scans.
```sql
SELECT
    relname AS table_name,
    seq_scan AS full_scans,
    idx_scan AS indexed_scans,
    n_live_tup AS total_rows
FROM pg_stat_user_tables
WHERE seq_scan > 10000
  -- idx_scan is NULL for tables with no indexes at all: treat it as 0
  AND COALESCE(idx_scan, 0) < seq_scan
  AND n_live_tup > 50000;
```
5.1 The Index Bloat (B-Tree Entropy) Mirror
It isn't just tables that get bloated; Indexes suffer from entropy too.
- The Physics: When you delete a row, the B-Tree index entry is marked as dead but remains in the page. If you have "Heavy Deletes," your index can double or triple in size while holding the same number of active entries.
- The Observability Challenge: A bloated index is slower because the CPU must traverse more "Empty" pages in the L2 cache to find a valid record.
- The Task Upgrade: Add an explorer index-bloat command that identifies indexes where the physical size is significantly larger than the theoretical size based on the item count. (Hint: Use the pgstattuple extension if available; a sketch follows below.)
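One possible query behind that command, assuming the pgstattuple extension is installed: its pgstatindex() function reports avg_leaf_density (how full the leaf pages are) for B-tree indexes, so a low density points at bloat. It is embedded here as a Go constant to match the connector style.
```go
// Hypothetical query for `explorer index-bloat`. Requires
// CREATE EXTENSION pgstattuple; pgstatindex() supports only
// B-tree indexes, hence the join on pg_am.
const indexBloatQuery = `
    SELECT c.relname            AS index_name,
           s.index_size         AS size_bytes,
           s.avg_leaf_density   AS leaf_density_pct,  -- low = wasted space
           s.leaf_fragmentation AS fragmentation_pct
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    JOIN pg_am am       ON am.oid = c.relam AND am.amname = 'btree'
    CROSS JOIN LATERAL pgstatindex(c.oid) s
    WHERE c.relkind = 'i'
      AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY s.avg_leaf_density ASC`
```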
6. Task 5: The "Top 10" Performance Monitor
To finish your tool, you must integrate with the pg_stat_statements extension. This lets you see the "Heaviest" queries the server has executed, aggregated across every call.
The Task
List the top 10 queries sorted by total execution time.
```sql
SELECT
    query,
    calls,
    total_exec_time / 1000 / 60 AS total_minutes,
    mean_exec_time AS average_ms,
    rows / calls AS avg_rows_per_call
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
Formatting Challenge: Raw SQL strings can be thousands of characters long. In your CLI (Go/Zig), you must implement a "Truncation Filter" that extracts the first 80 characters of the query and appends ... for display in the terminal. A minimal version is sketched below.
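Here is a minimal version of that filter in Go, counting runes rather than bytes so multi-byte characters are never split (truncateQuery is an illustrative name):
```go
// truncateQuery collapses a query onto one line and clips it to max
// display characters, appending "..." when anything was cut.
// Requires import: strings.
func truncateQuery(q string, max int) string {
	q = strings.Join(strings.Fields(q), " ") // flatten newlines and tabs
	r := []rune(q)                           // count runes, not bytes
	if len(r) <= max {
		return q
	}
	return string(r[:max]) + "..."
}
```
Usage in the Top-10 report is a one-liner: fmt.Println(truncateQuery(query, 80)).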
7. Go Implementation Snippet: The Connector
To help you get started, here is how you might architect the "Describe Table" functionality in Go.
```go
// Requires imports: database/sql, fmt, log.
func DescribeTable(db *sql.DB, tableName string) {
	// The Task 2 SQL, with the literal table name replaced by a
	// $1::regclass placeholder so it can be bound as a parameter.
	query := `...[SQL from Task 2, using WHERE a.attrelid = $1::regclass]...`
	rows, err := db.Query(query, tableName)
	if err != nil {
		log.Fatalf("Error describing table: %v", err)
	}
	defer rows.Close()
	fmt.Printf("Columns for %s:\n", tableName)
	fmt.Println("-------------------------------------------------")
	for rows.Next() {
		var name, dtype, defaultVal sql.NullString
		var required sql.NullBool // attnotnull is a boolean column
		if err := rows.Scan(&name, &dtype, &required, &defaultVal); err != nil {
			log.Fatalf("Error scanning row: %v", err)
		}
		fmt.Printf("%-20s | %-15s | required=%-5t | %s\n",
			name.String, dtype.String, required.Bool, defaultVal.String)
	}
	if err := rows.Err(); err != nil {
		log.Fatalf("Error reading rows: %v", err)
	}
}
```
7.1 Portability Paradox: Supporting MySQL and Postgres
A truly professional CLI explorer should not be a "One-Trick Pony."
- The Architecture: Use an interface-based pattern (in Go) or a comptime/vtable-based pattern (in Zig).
- The Implementation: Create a DatabaseDriver interface that defines methods like GetTables() and DescribeTable().
- The Dialect Mirror: Your tool should detect the connection string scheme (e.g., postgres:// vs mysql://) and automatically swap the SQL logic. MySQL users need to hit SHOW FULL COLUMNS, while Postgres users hit pg_catalog. This separation of "Logic" from "Dialect" is the mark of a Senior Infrastructure Architect. A sketch follows below.
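A minimal sketch of that pattern in Go; PostgresDriver, MySQLDriver, and the Column struct are hypothetical names standing in for your own implementations.
```go
// Column is a dialect-neutral description of one table column.
type Column struct {
	Name, DataType string
	Required       bool
	Default        sql.NullString
}

// DatabaseDriver is the seam between the CLI logic and the SQL dialect.
type DatabaseDriver interface {
	GetTables(db *sql.DB) ([]string, error)
	DescribeTable(db *sql.DB, table string) ([]Column, error)
}

// pickDriver inspects the connection string scheme and selects the
// matching dialect implementation.
func pickDriver(dsn string) (DatabaseDriver, error) {
	switch {
	case strings.HasPrefix(dsn, "postgres://"):
		return PostgresDriver{}, nil // queries pg_catalog
	case strings.HasPrefix(dsn, "mysql://"):
		return MySQLDriver{}, nil // queries SHOW FULL COLUMNS
	default:
		return nil, fmt.Errorf("unsupported connection scheme in %q", dsn)
	}
}
```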
8. Project Success Criteria: The Architect's Standard
To pass the "Database Explorer" lab, your tool must meet these engineering requirements:
- Extreme Performance: The tool must launch, connect, and query in under 100ms. Tooling that feels slow will not be used by developers.
- Zero-Write Protocol: Implement a --read-only flag (the default) that prevents the execution of any DROP, DELETE, or TRUNCATE commands.
- Connectivity Diagnostics: If the connection fails, don't just "Crash." Provide actionable feedback: "Error: Host unreachable. Check if the database instance is running on port 5432."
- UI Fidelity: Use ASCII tables for column outputs and ANSI color codes (Green for Healthy, Red for Bloated) to make the data glanceable.
- Streaming Exports: Implement an export <table_name> command that streams data out of the database as CSV. Requirement: It must use a Buffered Writer to avoid loading the entire dataset into RAM, allowing 10GB exports on a 500MB RAM machine. A sketch follows below.
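To illustrate the streaming requirement, here is a sketch using only the Go standard library (bufio, encoding/csv, database/sql). ExportCSV is a hypothetical name, and the table identifier is assumed to be validated upstream, since identifiers cannot be bound as query parameters.
```go
import (
	"bufio"
	"database/sql"
	"encoding/csv"
	"fmt"
	"io"
)

// ExportCSV streams a whole table as CSV. Rows are scanned one at a
// time and flushed through a buffered writer, so memory use stays
// flat no matter how large the table is.
func ExportCSV(db *sql.DB, table string, out io.Writer) error {
	rows, err := db.Query(fmt.Sprintf("SELECT * FROM %s", table))
	if err != nil {
		return err
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		return err
	}

	bw := bufio.NewWriterSize(out, 64*1024) // 64KB buffer, never the whole dataset
	w := csv.NewWriter(bw)
	if err := w.Write(cols); err != nil { // header row
		return err
	}

	vals := make([]sql.RawBytes, len(cols))
	ptrs := make([]any, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}
	record := make([]string, len(cols))
	for rows.Next() {
		if err := rows.Scan(ptrs...); err != nil {
			return err
		}
		for i, v := range vals {
			record[i] = string(v) // copy: RawBytes is reused on the next Scan
		}
		if err := w.Write(record); err != nil {
			return err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return err
	}
	if err := bw.Flush(); err != nil {
		return err
	}
	return rows.Err()
}
```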
This project transitions you from a "Consumer of Data" to an "Infrastructure Architect." By mastering the movement of metadata and the physics of system observability, you gain the power to build tools that define how the rest of your engineering team interacts with the database.
9. Project Deliverables
- A functional CLI binary (Go, Zig, or similar) that can "Describe" any table in a cluster.
- An "Audit" command that outputs the Bloat Ratio for all user tables.
- An "Advisory" engine that identifies at least one table with high sequential scan volume.
- Advanced Goal: Implement a stream-export command using buffered I/O for large-scale data migration.
- Full documentation on how to connect your tool to a production environment.
Read next: SQL Project: Financial Reporting Dashboard (Phase 3) →
Frequently Asked Questions
Q: What are the most useful psql meta-commands for exploring a PostgreSQL database?
\l lists all databases. \c dbname connects to a database. \dt lists tables in the current schema. \d tablename describes a table's columns, types, and constraints. \di lists indexes. \df lists functions. \dv lists views. \dn lists schemas. \timing toggles query execution time display. \e opens the last query in your default editor. \i filename.sql executes a SQL file. These meta-commands are only available in psql — they are shortcuts that psql translates into system catalog queries, equivalent to querying information_schema or pg_catalog tables directly.
Q: How do I export query results to a CSV file from the command line?
In psql: \copy (SELECT * FROM table) TO '/path/to/file.csv' WITH CSV HEADER — \copy runs client-side and works without superuser privileges. The server-side COPY command (without backslash) writes to a server path and requires superuser. From the shell: psql -c "SELECT * FROM table" -A -F ',' -o output.csv using psql's own formatting. For large exports, piping through COPY is faster than fetching through the client. For scheduled exports or ETL, pg_dump handles full tables and psql -f query.sql handles scripted exports.
Q: What is information_schema and how do I use it to explore a database's structure?
information_schema is a standardised, read-only set of views defined by the SQL standard that exposes metadata about the database — tables, columns, constraints, views, and privileges. SELECT * FROM information_schema.tables WHERE table_schema = 'public' lists all user tables. SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'orders' describes a table's columns. SELECT * FROM information_schema.table_constraints lists constraints. It is more portable than database-specific catalogs (pg_catalog in PostgreSQL, INFORMATION_SCHEMA in SQL Server) but may expose less detail. For PostgreSQL-specific metadata, pg_catalog is more comprehensive.
Part of the SQL Mastery Course — engineering the infrastructure.
