Enterprise ArchitectureZachman Framework

Zachman What Column: Data and Inventory Architecture Explained

TT
TopicTrick Team
Zachman What Column: Data and Inventory Architecture Explained

Zachman What Column: Data and Inventory Architecture Explained

The "What" interrogative forms the first column of the Zachman Framework matrix. It answers the fundamental question: "What does the enterprise consist of?" In other words, what data entities, objects, and inventory items are central to the enterprise's operation?

The What column represents data architecture across all six perspectives - from the executive planner's simple entity list down to the actual working database containing millions of records. Understanding this column is essential for enterprise data governance, data quality, and ensuring data consistency across the enterprise.


What Does the What Column Address?

The What column specifically focuses on:

  • Entities and objects: All the "nouns" the enterprise cares about (customers, orders, products, accounts, employees, assets)
  • Entity relationships: How entities relate to each other (a customer has multiple accounts, an account has multiple transactions)
  • Data attributes: Properties of each entity (a customer has a name, email, phone)
  • Inventory management: Classification and counting of business objects
  • Information classification: Categorising data by type and sensitivity
  • Master data: Reference data that is reused across the enterprise (product codes, customer segments)

The What column is intentionally NOT about processes, locations, timing, or business rules. That is the domain of the other interrogatives. The What column focuses purely on "what things exist and how they relate."


The What Column Across Six Perspectives

Row 1 (Planner): Entity List / Data Scope

Question: What types of entities does the business deal with?

Artefacts:

  • Simple list of entities (nouns): Customer, Order, Product, Account, Invoice, Employee, etc.
  • Brief description of each entity (one or two sentences)
  • Scope boundaries (e.g., "We track customers globally but products only in North America")

Characteristics:

  • Executive summary level - one or two pages
  • Business terminology only, no technical jargon
  • Stable and rarely changing

Example:

text
Entities:
- Customer: A person or organisation who purchases products
- Order: A request to purchase one or more products
- Product: An item available for sale
- Account: A billing account associated with a customer
- Invoice: A bill sent to a customer for an order

Why it matters: Establishes shared vocabulary. Everyone agrees on what "customer," "order," and "product" mean.


Row 2 (Owner): Entity-Relationship Model / Business Data Model

Question: How do entities relate to each other from a business perspective?

Artefacts:

  • Entity-Relationship diagram (ER diagram) with business entities and relationships
  • Entity definitions including attributes (non-technical)
  • Cardinalities and constraints (one-to-many, many-to-many)
  • Business rules governing relationships (e.g., "An order must have at least one line item")

Characteristics:

  • 5-30 pages depending on complexity
  • Business owner's view - understandable without technical training
  • Reflects current business model - changes when business processes change

Example ER relationships:

text
Customer ---> (has many) ---> Account
Account ---> (has many) ---> Transaction
Customer ---> (places) ---> Order
Order ---> (contains many) ---> LineItem
LineItem ---> (references) ---> Product

Business rules:

  • A customer can have 0 to unlimited accounts
  • An order must contain at least 1 line item and at most 100 line items
  • An order must reference at least one customer

Why it matters: Ensures business stakeholders agree on data relationships. Prevents misalignment between business model and technical implementation.


Row 3 (Designer): Logical Data Model

Question: What is the system's logical data structure, independent of technology choice?

Artefacts:

  • Normalised Entity-Relationship diagram (third-normal form or higher)
  • Logical entity definitions with attributes and data types (logical types: text, number, date, boolean)
  • Logical keys and constraints
  • Relationship specifications without database-specific syntax

Characteristics:

  • 10-50 pages depending on complexity
  • System designer's view - technical but not technology-specific
  • Detailed and comprehensive - includes all entities that the system must track
  • Changes when system requirements change, not when technology changes

Example logical entity:

text
Customer entity:
  - CustomerID (unique identifier)
  - FirstName (text, required)
  - LastName (text, required)
  - Email (text, required, unique)
  - Phone (text)
  - Address (text)
  - DateOfBirth (date)
  - CustomerSegment (text) [references Segment]
  - CreatedDate (date)
  - ModifiedDate (date)

Logical relationships:

text
Customer --< 1:M >-- Account
  A customer has 0 to many accounts
  An account belongs to exactly 1 customer

Account --< 1:M >-- Transaction
  An account has 1 to many transactions
  A transaction belongs to exactly 1 account

Why it matters: Ensures system design is sound before committing to specific technologies. Can be implemented using any database technology.


Row 4 (Builder): Physical Data Model / Database Schema

Question: What is the actual database design using specific technology?

Artefacts:

  • Physical database schema (Oracle schema, PostgreSQL schema, MongoDB schema, etc.)
  • Table/collection definitions with column types specific to the chosen database
  • Indexes and partitioning strategy
  • Performance optimisation decisions (e.g., denormalisation for speed)
  • Storage parameters (tablespace, file locations)

Characteristics:

  • 20-100 pages depending to database complexity
  • Technical and technology-specific
  • Designed for performance, not pure normalisation
  • Changes when technology or performance requirements change

Example for Oracle:

sql
CREATE TABLE CUSTOMER (
  CUSTOMER_ID NUMBER PRIMARY KEY,
  FIRST_NAME VARCHAR2(100) NOT NULL,
  LAST_NAME VARCHAR2(100) NOT NULL,
  EMAIL VARCHAR2(255) NOT NULL UNIQUE,
  PHONE VARCHAR2(20),
  ADDRESS VARCHAR2(500),
  DATE_OF_BIRTH DATE,
  CUSTOMER_SEGMENT VARCHAR2(50),
  CREATED_DATE DATE DEFAULT SYSDATE,
  MODIFIED_DATE DATE DEFAULT SYSDATE
);

CREATE INDEX IDX_CUSTOMER_EMAIL ON CUSTOMER(EMAIL);
CREATE INDEX IDX_CUSTOMER_SEGMENT ON CUSTOMER(CUSTOMER_SEGMENT);

Why it matters: Ensures database is designed for performance and configured correctly for the chosen technology.


Row 5 (Sub-Contractor): DDL Scripts and Data Loading

Question: What are the specific database creation and data loading scripts?

Artefacts:

  • Complete DDL (Data Definition Language) scripts (CREATE TABLE, CREATE INDEX, etc.)
  • Data loading scripts (INSERT statements or ETL scripts)
  • Migration scripts (ALTER TABLE for schema changes)
  • Backup and recovery procedures
  • Data validation scripts

Characteristics:

  • Hundreds to thousands of lines of code
  • Fully executable and version-controlled
  • Highly volatile (changes constantly as requirements evolve)
  • "Out of context" without explanation - difficult to understand without deep technical knowledge

Example:

sql
-- Create main tables
CREATE TABLE CUSTOMER (...);
CREATE TABLE ACCOUNT (...);
CREATE TABLE TRANSACTION (...);

-- Load initial data
INSERT INTO CUSTOMER VALUES (1, 'John', 'Doe', 'john@example.com', ...);
INSERT INTO CUSTOMER VALUES (2, 'Jane', 'Smith', 'jane@example.com', ...);

-- Create indexes
CREATE INDEX IDX_CUSTOMER_EMAIL ON CUSTOMER(EMAIL);

-- Create views
CREATE VIEW ACTIVE_CUSTOMERS AS SELECT * FROM CUSTOMER WHERE status = 'ACTIVE';

-- Grant permissions
GRANT SELECT ON CUSTOMER TO app_user;
GRANT INSERT, UPDATE ON CUSTOMER TO app_user;

Why it matters: Makes the database operational. Scripts are version-controlled and repeatable.


Row 6 (Enterprise): Live Database

Question: What data actually exists in the running system?

Artefacts:

  • Actual database contents (live records)
  • Data volumes and growth metrics
  • Data quality metrics
  • Sample queries and data extraction
  • Backup and recovery status
  • Data age and refresh rates

Characteristics:

  • Constantly changing (new data arrives every second)
  • Measured and observable
  • The source of truth for "what actually exists"
  • Monitored for quality and compliance

Example metrics:

text
CUSTOMER table: 145.2 million records
  - Active customers: 142.1 M
  - Inactive: 3.1 M
  - Average growth: 2.1% per year
  - Data freshness: Real-time (updated via API)

ACCOUNT table: 287.4 million records
  - Average per customer: 1.97 accounts
  - Growth rate: 2.3% per year

Data quality:
  - Duplicate email addresses: 0 (checked daily)
  - Null phone numbers: 12.3% (acceptable)
  - Data freshness: Last update 34 seconds ago

Why it matters: Validates that the system is operating correctly and data quality is maintained.


Data Governance: Using the What Column

The What column is the foundation for enterprise data governance:

  1. Row 1 alignment: Business and IT agree on what entities matter.
  2. Row 2 specification: Business defines entity definitions and relationships.
  3. Row 3 design: Architects design system-independent logical model.
  4. Row 4 implementation: DBAs implement in specific technology.
  5. Row 5 deployment: Operations deploy the database.
  6. Row 6 monitoring: Data stewards monitor for quality and compliance.

This traceability ensures data governance policies flow from business definition through technical implementation to operational monitoring.


Common Mistakes in the What Column

  1. Skipping Row 2 (Business ER Model): Going directly from scope (Row 1) to logical design (Row 3). Result: Requirements misaligned with business.

  2. Over-normalisation: Normalising to 5NF or higher in Row 3, then de-normalising for performance in Row 4. Result: Confusion about the "true" model.

  3. Denormalisation gone wrong: De-normalising in Row 4 to the point where the model is unrecognisable. Result: Data anomalies and maintainability problems.

  4. Ignoring Row 6 validation: Never comparing Rows 1-5 (designs) with Row 6 (actual data). Result: Designs that don't match reality.

  5. Mixing interrogatives: Including process information (How column) in the What column. Result: Confusion about what the What column is supposed to do.


What Column vs Other Columns

  • What vs How: What is "nouns" (entities), How is "verbs" (processes). A customer (What) undergoes a billing process (How).
  • What vs Where: What is "what exists," Where is "where it's located." Customer data (What) is stored in us-east-1 and replicated to eu-west-1 (Where).
  • What vs Who: What is "objects," Who is "actors." Customer records (What) are owned by the sales department (Who).
  • What vs When: What is "things," When is "timing." Customer records (What) are updated nightly (When).
  • What vs Why: What is "inventory," Why is "motivation." We track customers (What) because we need billing information (Why).

Industry Examples: The What Column

Banking

  • What entities: Customer, Account, Transaction, Card, Loan, Collateral, Payment
  • Key relationships: Customer has many accounts; Account has many transactions; Loan has collateral

Healthcare

  • What entities: Patient, Provider, Appointment, Diagnosis, Medication, Lab Result, Insurance
  • Key relationships: Patient has many appointments; Patient has many diagnoses; Prescription contains many medications

Retail

  • What entities: Customer, Order, Product, Inventory, Supplier, Category, Promotion
  • Key relationships: Order contains many products; Product belongs to many categories; Inventory tracks products across locations

Data Architecture Best Practices

  1. Maintain traceability: Ensure each Row 4-5 table maps back to a Row 3 logical entity.

  2. Version your models: Use version control for Rows 3-5. Track changes over time.

  3. Separate logical from physical: Keep Row 3 (logical) independent of Row 4 (physical). This enables technology changes without breaking the model.

  4. Validate against reality: Periodically compare Rows 1-5 with Row 6. Do they align?

  5. Document relationships: Clearly document why entities relate the way they do (the "Why" of the What column).


Key Takeaways

  1. The What column classifies enterprise data and inventory: It answers "what entities and data must the enterprise track?"

  2. Six perspectives provide a complete data view: From business scope to live database.

  3. Traceability from Row 1 to Row 6 is critical: Ensure business definitions flow through design and implementation to actual data.

  4. The What column is foundational for data governance: It defines the universe of data the enterprise manages.

  5. Do not skip any rows: Each row adds value and catches different types of data architecture errors.


Next Steps

  • Explore How Column (processes) to see how data (What) flows through processes.
  • Read Practical Application: Data Governance for real-world examples.
  • Jump to Artifacts and Deliverables for detailed specifications of ER diagrams, schemas, and DDL.
  • Explore Other Interrogatives (Where, Who, When, Why) to see the complete data architecture picture.

The What column is the foundation of data architecture. Master it, and you master a critical dimension of enterprise architecture.


Meta Keywords: Zachman What column, data architecture, entity modeling, data inventory, ER diagrams, logical data model, physical data model, DDL, database schema, enterprise data governance.