TOGAF Phase C: Data Architecture and Information Management

What is TOGAF Phase C Data Architecture?

TOGAF Phase C Data Architecture defines the logical data model, data governance framework, data flow maps, and data storage patterns for the enterprise. It is developed alongside Application Architecture in Phase C. TOGAF permits the two to be developed in either order or in parallel; a data-first sequence is a common choice because application design decisions then conform to an agreed data model. Good Data Architecture helps organizations avoid the fragmented, inconsistent data that undermines both operations and compliance.

Data is the lifeblood of a modern enterprise. Every application, every business process, and every strategic decision depends on data being available, accurate, and properly governed. TOGAF's Phase C Data Architecture is where the enterprise architecture practice addresses this reality head-on.

Working alongside Application Architecture, Data Architecture answers the question of what information the enterprise needs, where it lives, how it is structured, and how it flows across the organization. Getting this right is what separates organizations that can trust their data from those that struggle with fragmented, inconsistent information that leads to bad decisions.

What Is Data Architecture?

Data Architecture describes the structure of an organization's logical and physical data assets and the associated data management resources. In practice, this covers:

What data the organization holds and what it represents
How data is logically structured and related
Where data physically resides and how it is stored
How data flows between systems, people, and external parties
Who is responsible for data quality, access, and governance
What the rules are around data security, privacy, and retention

Data Architecture in TOGAF is developed in Phase C alongside Application Architecture, and the sequence matters. TOGAF itself allows either order (or parallel development), but a data-first approach completes the Data Architecture before the Application Architecture is finalized, so that application design decisions conform to the agreed data model - not the other way around.

Why Data Architecture Often Gets Neglected

In many organizations, data architecture is treated as an afterthought. Applications are built first, and data structures emerge from those application decisions. This approach produces:

Multiple definitions of the same concept (for example, three systems each have a different definition of "customer")
Duplicated data held in siloed stores with no synchronization
Integration projects that fail because systems cannot agree on a shared data model
Regulatory failures because data cannot be traced to its source or accurately reported

Phase C Data Architecture prevents these problems by establishing the agreed logical data model before applications are designed to implement it.

Logical Data Models

The cornerstone of Phase C Data Architecture is the Logical Data Model. This is a structured representation of the key data entities the enterprise needs, how they relate to each other, and the rules that govern those relationships.

A logical data model is independent of any specific technology or database system. It describes the business reality of data, not the technical implementation.

Key Elements of a Logical Data Model

Entities: The core things the organization needs to store information about, such as Customer, Product, Order, Employee, or Policy
Attributes: The properties each entity has, such as a Customer having a Name, Date of Birth, and Contact Preferences
Relationships: How entities connect to each other, such as a Customer placing multiple Orders, or an Order containing multiple Products
Cardinality: The numeric nature of relationships, such as one-to-many (one Customer, many Orders) or many-to-many (many Orders, many Products)
Business rules: Constraints on data, such as an Order must have at least one line item, or a Policy must be associated with exactly one Policyholder
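
The elements above can be sketched in a few lines of Python. This is only an illustration of the idea, not a TOGAF-prescribed notation: the class names, the "1:N" and "M:N" cardinality strings, and the sample order data are all assumptions made for the example. It shows entities with attributes, relationships with cardinality, and one business rule ("an Order must have at least one line item") expressed as a check.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A business entity and its attributes (no storage details)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A relationship between two entities, with its cardinality."""
    source: str
    target: str
    verb: str
    cardinality: str  # illustrative notation: "1:N" or "M:N"


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    customer: str
    lines: list[OrderLine] = field(default_factory=list)


def order_rule_violations(order: Order) -> list[str]:
    """Check the business rule 'an Order must have at least one line item'."""
    violations = []
    if not order.lines:
        violations.append("Order must have at least one line item")
    return violations


# Entity, attributes, and relationships taken from the examples in the text.
customer = Entity("Customer", ["Name", "Date of Birth", "Contact Preferences"])
relationships = [
    Relationship("Customer", "Order", "places", "1:N"),
    Relationship("Order", "Product", "contains", "M:N"),
]

# Hypothetical sample data used only to exercise the rule.
empty_order = Order(customer="Customer A")
valid_order = Order(customer="Customer A", lines=[OrderLine("Product X", 2)])

print(order_rule_violations(empty_order))
print(order_rule_violations(valid_order))
```

Nothing in the sketch mentions tables, column types, or indexes, which is the point of keeping the model logical rather than physical.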

Conceptual vs Logical vs Physical Data Models

TOGAF uses three levels of data model abstraction. The Conceptual Data Model identifies the key entities at a business level. The Logical Data Model adds attributes and relationships. The Physical Data Model adds database-specific design (tables, columns, indexes, partitioning). Phase C focuses on logical models; physical models are implementation detail.

Data Governance

Data Architecture cannot be separated from Data Governance. Governance defines who has responsibility for each data asset and what the rules are for its management.

TOGAF Phase C establishes the governance framework that will apply to data across the enterprise. This includes:

Data ownership: For each entity or data domain, which business role is the authoritative owner, accountable for its accuracy and completeness?
Data stewardship: Who is responsible for day-to-day management of data quality within a specific domain?
Data standards: Agreed formats, naming conventions, and code values (for example, ISO country codes, standardized product category codes)
Data quality rules: Specific, measurable quality criteria for each important data element
Access and security policies: Who can see, modify, and delete which data, and under what conditions
Retention and disposal policies: How long data is kept and how it is safely destroyed after its retention period

Data governance is increasingly critical in a regulatory environment that includes GDPR, CCPA, and sector-specific regulations in financial services, healthcare, and government.
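
To make "specific, measurable quality criteria" concrete, the following minimal sketch records ownership, stewardship, retention, and quality rules for one entity and then measures how many sample records pass each rule. The role names, the seven-year retention value, the allowed country codes, and the sample records are all hypothetical; a real governance framework would hold these in a catalog or metadata tool rather than in code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualityRule:
    """A named, measurable check applied to one record."""
    name: str
    check: Callable[[dict], bool]


@dataclass
class GovernanceEntry:
    entity: str
    owner: str            # accountable business role
    steward: str          # day-to-day data quality role
    retention_years: int
    rules: list[QualityRule]


def pass_rate(rule: QualityRule, records: list[dict]) -> float:
    """Fraction of records that satisfy the rule (1.0 for an empty list)."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule.check(r)) / len(records)


# Illustrative subset of two-letter codes; a real standard list is far longer.
ALLOWED_COUNTRY_CODES = {"GB", "US", "DE"}

customer_governance = GovernanceEntry(
    entity="Customer",
    owner="Head of Customer Operations",   # hypothetical role
    steward="Customer Data Steward",       # hypothetical role
    retention_years=7,                     # hypothetical policy value
    rules=[
        QualityRule("name_present", lambda r: bool(r.get("name"))),
        QualityRule(
            "country_code_valid",
            lambda r: r.get("country") in ALLOWED_COUNTRY_CODES,
        ),
    ],
)

# Hypothetical sample records.
sample = [
    {"name": "A. Example", "country": "GB"},
    {"name": "", "country": "US"},             # fails name_present
    {"name": "B. Example", "country": "UK"},   # not in the allowed set
    {"name": "C. Example", "country": "DE"},
]

for rule in customer_governance.rules:
    print(rule.name, pass_rate(rule, sample))
```

Expressing each rule as a pass rate gives the data steward a number to track over time, rather than a general statement that quality is "good" or "poor".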

Data Flows and Information Flows

Phase C maps the flows of data across the enterprise. A data flow shows how information moves from its point of origination through the systems and processes that use, transform, and store it.

Documenting data flows reveals:

Where data enters the enterprise (from customers, partners, regulators, or internal processes)
Which systems create, read, update, or delete specific data entities
Where data is duplicated or replicated, and whether that duplication is controlled or accidental
Where sensitive data flows to, which is critical for privacy and security compliance
How quickly data needs to flow from source to consumer (latency requirements)

Data flow diagrams in Phase C use a context level (showing the enterprise as a whole interacting with external entities) and process levels (showing how data moves between internal systems and processes).
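
Once flows are documented, questions such as "where does sensitive data end up?" become simple graph queries. The sketch below assumes flows are recorded as (source, target, entity) triples; the system names and flows are hypothetical and the representation is one possible choice, not a TOGAF artifact format.

```python
from collections import deque

# Hypothetical documented flows: (source system, target system, entity carried).
flows = [
    ("Customer Portal", "CRM", "Customer"),
    ("CRM", "Billing", "Customer"),
    ("CRM", "Data Warehouse", "Customer"),
    ("Billing", "Data Warehouse", "Invoice"),
    ("Data Warehouse", "Marketing Analytics", "Customer"),
]


def reachable_systems(flows, entity, origin):
    """Systems that receive `entity`, directly or indirectly, from `origin`."""
    # Keep only the edges that carry the entity of interest.
    edges = {}
    for src, dst, carried in flows:
        if carried == entity:
            edges.setdefault(src, []).append(dst)

    # Breadth-first traversal from the point of origination.
    seen = set()
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen and nxt != origin:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable_systems(flows, "Customer", "Customer Portal")))
print(sorted(reachable_systems(flows, "Invoice", "Billing")))
```

In this made-up example, Customer data entering through the portal reaches Marketing Analytics by way of the Data Warehouse, which is the kind of indirect flow that matters for privacy reviews.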

The Data Entity/Business Function Matrix

One of the most powerful tools in Phase C is the Data Entity/Business Function Matrix, commonly populated as a CRUD matrix (Create, Read, Update, Delete).

This matrix maps each data entity against the business functions that interact with it, showing which function creates the data, which functions read it, which update it, and which delete it. This analysis reveals:

Which functions own which data (the creator is typically the authoritative owner)
Dependencies between functions: if Function A reads data that Function B creates, then A is dependent on B
Data quality risks: if multiple functions update the same entity, there is a risk of conflicting changes
Migration sequencing constraints: when migrating systems, the system that creates data generally needs to be migrated before the systems that read it
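
The four kinds of insight listed above can be derived mechanically from the matrix. The following sketch assumes the matrix is held as a nested dictionary of CRUD letters; the entities, functions, and letters are hypothetical. For the sequencing step it treats each business function as a stand-in for the system that supports it and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to put creators ahead of their readers.

```python
from graphlib import TopologicalSorter

# Hypothetical CRUD matrix: entity -> {business function -> operations}.
crud = {
    "Customer": {"Sales": "CRU", "Billing": "R", "Support": "RU"},
    "Order": {"Sales": "CRU", "Billing": "R", "Fulfilment": "RU"},
    "Invoice": {"Billing": "CRUD", "Support": "R"},
}


def creators(matrix):
    """Functions that create each entity (candidate authoritative owners)."""
    return {
        entity: sorted(f for f, ops in funcs.items() if "C" in ops)
        for entity, funcs in matrix.items()
    }


def dependencies(matrix):
    """Pairs (reader, creator): the reader depends on the creator's data."""
    deps = set()
    for funcs in matrix.values():
        makers = [f for f, ops in funcs.items() if "C" in ops]
        readers = [f for f, ops in funcs.items() if "R" in ops]
        deps.update((r, m) for r in readers for m in makers if r != m)
    return deps


def multi_updaters(matrix):
    """Entities updated by more than one function (conflicting-change risk)."""
    result = {}
    for entity, funcs in matrix.items():
        updaters = sorted(f for f, ops in funcs.items() if "U" in ops)
        if len(updaters) > 1:
            result[entity] = updaters
    return result


def migration_order(matrix):
    """One valid ordering in which creators come before their readers."""
    graph = {}
    for reader, maker in dependencies(matrix):
        graph.setdefault(reader, set()).add(maker)
    return list(TopologicalSorter(graph).static_order())


print(creators(crud))
print(sorted(dependencies(crud)))
print(multi_updaters(crud))
print(migration_order(crud))
```

A circular dependency (two functions each reading data the other creates) would make `static_order()` raise `graphlib.CycleError`, which is itself a useful finding: it signals that the two systems cannot be sequenced independently.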

Data Architecture Patterns

Like application architecture, data architecture uses established patterns for structuring information assets.

Single Source of Truth

Each data entity has exactly one authoritative source system. All other systems that need the data receive it from that source, rather than maintaining their own copy. This eliminates inconsistency but requires a reliable, well-governed master data platform.

Master Data Management (MDM)

A Master Data Management platform maintains the authoritative, golden record for key business entities (typically Customer, Product, Supplier, and Location). All other systems synchronize from the MDM system, ensuring every application is working with the same core data.

Data Warehouse and Data Lake

A Data Warehouse consolidates data from multiple operational systems into an integrated store optimized for reporting and analytics. A Data Lake stores raw data, both structured and unstructured, at scale, typically before transformation. The choice between these approaches (or a combination) is a significant Data Architecture decision.

Data Mesh

Data Mesh is a more recent pattern in which data is treated as a product, owned and managed by the business domain that creates it, and exposed to consumers via standardized interfaces. This distributes data ownership rather than centralizing it.

Choose Patterns Based on Business Needs

Data architecture patterns each come with tradeoffs between consistency, scalability, governance complexity, and implementation cost. The right choice depends on the organization's data maturity, regulatory context, and the nature of the business services that data must support.

Baseline and Target Data Architecture

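Phase C Data Architecture follows the same baseline-target-gap structure as the other architecture development phases (Phase B and Phase D): describe the baseline, define the target, then analyze the gap between them.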

The Baseline Data Architecture documents current data assets, existing data models, known quality issues, governance gaps, and redundant or conflicting data stores.

The Target Data Architecture defines:

The enterprise logical data model for key entities
The agreed data governance framework and ownership model
The data storage and management platforms to be used
The master data management approach
The data flow model for key information streams
Data security and privacy controls
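
At its simplest, the gap between baseline and target is a set comparison: what is carried over, what is retired, and what has to be built. The sketch below assumes the baseline and target are each reduced to a set of named data stores; the store names are hypothetical, and a real gap analysis would also cover models, governance, and flows.

```python
# Hypothetical data stores identified in the baseline and target architectures.
baseline_stores = {
    "CRM customer table",
    "Billing customer table",
    "Support contact list",
    "Order database",
}
target_stores = {
    "Customer master (MDM)",
    "Order database",
    "Data warehouse",
}


def gap_analysis(baseline, target):
    """Classify stores as retained, to be retired, or to be built."""
    return {
        "retained": sorted(baseline & target),
        "to_retire": sorted(baseline - target),
        "to_build": sorted(target - baseline),
    }


for category, stores in gap_analysis(baseline_stores, target_stores).items():
    print(category, stores)
```

In this example the three separate customer stores fall into "to_retire" and the customer master into "to_build", which points to the data migration work the gap implies.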

Phase C Data Architecture Deliverables

Catalogs

Data Entity/Data Component Catalog: All data entities with their attributes, owners, and source systems

Matrices

Data Entity/Business Function Matrix (CRUD matrix)
Application/Data Matrix: Which applications create, read, update, or delete which data entities

Diagrams

Conceptual Data Diagram: High-level entity relationships
Logical Data Diagram: Detailed attributes and relationships
Data Dissemination Diagram: How data flows to consuming applications
Data Security Diagram: Access control and classification
Data Migration Diagram: How data will be moved from baseline to target systems

Real-World Example: Healthcare Provider

A regional healthcare provider's Phase C Data Architecture work reveals:

Baseline finding: Patient demographic data exists in four separate systems - the Patient Administration System, the GP referral portal, the pharmacy system, and the radiology platform - with different formats and no synchronization. A patient's name might be spelled differently across all four
Target: A single Patient Master Index becomes the authoritative source for all patient demographic data. All four systems synchronize from it via the integration platform
Data governance decision: The Patient Administration team owns the Patient entity. Pharmacy and Radiology are consumers, not owners, and cannot update demographic data directly
Data flow implication: All patient registrations and amendments must flow through the Patient Administration System, which then publishes updates to consuming systems via an event-driven integration
Application Architecture impact: The radiology system requires an API integration update to receive patient data rather than maintaining its own patient registry
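
The governance and data flow decisions in this example can be illustrated with a toy in-memory sketch. It is a simplification under stated assumptions: the class names, the callback-based "publish" mechanism, and the patient data are invented for illustration, and the Patient Administration System and the Patient Master Index are collapsed into a single write path. A real implementation would use the integration platform rather than direct callbacks.

```python
class PatientMasterIndex:
    """Toy authoritative store for patient demographic data."""

    OWNER = "Patient Administration"  # the only system allowed to write

    def __init__(self):
        self._records = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a consuming system's handler for change events."""
        self._subscribers.append(callback)

    def upsert(self, source_system, patient_id, demographics):
        """Accept a registration or amendment, then publish it to consumers."""
        if source_system != self.OWNER:
            raise PermissionError(
                f"{source_system} cannot update demographic data"
            )
        self._records[patient_id] = dict(demographics)
        for callback in self._subscribers:
            callback(patient_id, dict(demographics))


class ConsumerSystem:
    """A consuming system that keeps a synchronized, read-only view."""

    def __init__(self, name):
        self.name = name
        self.patients = {}

    def on_patient_changed(self, patient_id, demographics):
        self.patients[patient_id] = demographics


pmi = PatientMasterIndex()
radiology = ConsumerSystem("Radiology")
pharmacy = ConsumerSystem("Pharmacy")
pmi.subscribe(radiology.on_patient_changed)
pmi.subscribe(pharmacy.on_patient_changed)

# A registration from the owning system is accepted and published.
pmi.upsert("Patient Administration", "P001", {"name": "Alex Example"})

# A direct update from a consumer is rejected.
try:
    pmi.upsert("Radiology", "P001", {"name": "Alex Exampel"})
    rejected = False
except PermissionError:
    rejected = True

print(radiology.patients, pharmacy.patients, rejected)
```

Because consumers only ever receive what the owner publishes, the misspelled name in the rejected update never reaches any system.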

Summary

Phase C Data Architecture defines the logical data model, governance framework, data flows, and storage patterns that the whole enterprise will use. Its core activities are:

Building the enterprise logical data model aligned to business capabilities
Establishing data ownership and governance policies
Mapping data flows across systems and boundaries
Choosing appropriate data architecture patterns
Performing a gap analysis between the current and target data states

With both Application and Data Architecture defined in Phase C, the enterprise architecture team moves to Phase D to define the technology infrastructure that will host and connect everything. Continue with Phase D: Technology Architecture for the final core domain.

If you want to understand how data needs are derived from business needs, review Phase B Business Architecture for the business service definitions that drive data requirements.

Data governance frameworks referenced in Phase C often draw from industry standards such as DAMA-DMBOK (Data Management Body of Knowledge), which provides detailed guidance on data quality, metadata management, and master data management that complements the TOGAF ADM. For certification exam preparation on Phase C, see our TOGAF Foundation Study Guide and TOGAF Foundation Mock Exam.

Frequently Asked Questions

Q: What does Data Architecture cover in TOGAF Phase C?

Data Architecture in Phase C describes the structure of an organization's logical and physical data assets and the resources used to manage them. This includes: a Data Entity/Data Component Catalog (what data exists); a Data Entity/Business Function Matrix (which functions create, read, update, or delete which data); Logical Data Models showing data structure and relationships; a Data Migration Strategy if existing data must be moved; and Data Governance definitions covering ownership, quality standards, and lifecycle. Data Architecture answers "what data do we need, where does it live, and who is responsible for it?"

Q: What is the difference between a Logical Data Model and a Physical Data Model in TOGAF?

A Logical Data Model describes data entities, their attributes, and their relationships independent of any specific database technology - it represents the business view of data structure. A Physical Data Model describes how the logical model is implemented in a specific database system - including table names, column types, indexes, and partitioning strategies. TOGAF Phase C primarily works with Logical Data Models; Physical Data Models belong to the solution design that follows in later phases. Keeping the logical model technology-independent allows it to remain stable even if the underlying database platform changes.

Q: How does Data Architecture relate to the concept of a "Single Source of Truth"?

Many architecture principles include a "Single Source of Truth" rule - each data entity should be mastered in exactly one system, which is authoritative for that entity. Data Architecture makes this principle concrete: the Data Entity/Data Component Catalog identifies each entity, and the architecture specifies which application or system is the "System of Record" for each one. This directly drives application integration design - consuming systems read from the master source rather than maintaining their own independently edited copies, which avoids the data quality and consistency problems that arise from uncontrolled duplication.

Part of the TOGAF 9.2 Masterclass.