TOGAF Phase C: Data Architecture and Information Management

Data is the lifeblood of a modern enterprise. Every application, every business process, and every strategic decision depends on data being available, accurate, and properly governed. TOGAF's Phase C Data Architecture is where the enterprise architecture practice addresses this reality head-on.
Working alongside Application Architecture, Data Architecture answers the question of what information the enterprise needs, where it lives, how it is structured, and how it flows across the organization. Getting this right is what separates organizations that can trust their data from those that struggle with fragmented, inconsistent information that leads to bad decisions.
What Is Data Architecture?
Data Architecture describes the structure of an organization's logical and physical data assets and the associated data management resources. In practice, this covers:
- What data the organization holds and what it represents
- How data is logically structured and related
- Where data physically resides and how it is stored
- How data flows between systems, people, and external parties
- Who is responsible for data quality, access, and governance
- What the rules are around data security, privacy, and retention
Data Architecture in TOGAF is developed in Phase C alongside Application Architecture, and the sequence matters. The TOGAF Standard allows the two to be developed in either order or in parallel. This guide recommends completing Data Architecture before Application Architecture is finalized, so that application design decisions conform to the agreed data model rather than the other way around.
Why Data Architecture Often Gets Neglected
In many organizations, data architecture is treated as an afterthought. Applications are built first, and data structures emerge from those application decisions. This approach produces:
- Multiple definitions of the same concept (for example, three systems each have a different definition of "customer")
- Duplicated data held in siloed stores with no synchronization
- Integration projects that fail because systems cannot agree on a shared data model
- Regulatory failures because data cannot be traced to its source or accurately reported
Phase C Data Architecture prevents these problems by establishing the agreed logical data model before applications are designed to implement it.
Logical Data Models
The cornerstone of Phase C Data Architecture is the Logical Data Model. This is a structured representation of the key data entities the enterprise needs, how they relate to each other, and the rules that govern those relationships.
A logical data model is independent of any specific technology or database system. It describes the business reality of data, not the technical implementation.
Key Elements of a Logical Data Model
- Entities: The core things the organization needs to store information about, such as Customer, Product, Order, Employee, or Policy
- Attributes: The properties each entity has, such as a Customer having a Name, Date of Birth, and Contact Preferences
- Relationships: How entities connect to each other, such as a Customer placing multiple Orders, or an Order containing multiple Products
- Cardinality: The numeric nature of relationships, such as one-to-many (one Customer, many Orders) or many-to-many (many Orders, many Products)
- Business rules: Constraints on data, such as an Order must have at least one line item, or a Policy must be associated with exactly one Policyholder
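As a rough illustration of how these elements fit together, the sketch below expresses a few entities, attributes, relationships, and one business rule as Python dataclasses. The class and field names are hypothetical, and a real logical model would normally live in a modeling tool rather than in code.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Customer:
    """Entity with a few illustrative attributes."""
    customer_id: str
    name: str
    date_of_birth: date


@dataclass
class Product:
    product_id: str
    name: str


@dataclass
class OrderLine:
    """Resolves the many-to-many relationship between Order and Product."""
    product: Product
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: Customer  # one Customer, many Orders
    lines: list[OrderLine] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return business-rule violations; an empty list means the order is valid."""
        violations = []
        if not self.lines:
            violations.append("Order must have at least one line item")
        return violations
```

Calling `validate()` on an order with no lines returns the violation message, while an order with at least one line returns an empty list.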
Conceptual vs Logical vs Physical Data Models
TOGAF uses three levels of data model abstraction. The Conceptual Data Model identifies the key entities at a business level. The Logical Data Model adds attributes and relationships. The Physical Data Model adds database-specific design (tables, columns, indexes, partitioning). Phase C concentrates on the conceptual and logical levels; the detailed physical design is largely settled during implementation.
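To make the step from logical to physical concrete, here is a minimal sketch that renders a logical entity as a generic SQL `CREATE TABLE` statement. The type mapping and the `to_create_table` helper are assumptions made for illustration; a real physical design would also cover indexes, partitioning, and vendor-specific types.

```python
# Logical attribute types mapped to one possible physical choice (assumed, generic SQL).
TYPE_MAP = {"text": "VARCHAR(255)", "date": "DATE", "integer": "INTEGER"}


def to_create_table(entity, attributes, key):
    """Render a logical entity (name -> logical type) as a CREATE TABLE statement."""
    columns = [f"    {name} {TYPE_MAP[kind]}" for name, kind in attributes.items()]
    columns.append(f"    PRIMARY KEY ({key})")
    return f"CREATE TABLE {entity.lower()} (\n" + ",\n".join(columns) + "\n);"


ddl = to_create_table(
    "Customer",
    {"customer_id": "integer", "name": "text", "date_of_birth": "date"},
    key="customer_id",
)
print(ddl)
```

The same logical entity could be rendered differently for another database, which is why the logical model is kept technology-neutral.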
Data Governance
Data Architecture cannot be separated from Data Governance. Governance defines who has responsibility for each data asset and what the rules are for its management.
TOGAF Phase C establishes the governance framework that will apply to data across the enterprise. This includes:
- Data ownership: For each entity or data domain, which business role is the authoritative owner, accountable for its accuracy and completeness?
- Data stewardship: Who is responsible for day-to-day management of data quality within a specific domain?
- Data standards: Agreed formats, naming conventions, and code values (for example, ISO country codes, standardized product category codes)
- Data quality rules: Specific, measurable quality criteria for each important data element
- Access and security policies: Who can see, modify, and delete which data, and under what conditions
- Retention and disposal policies: How long data is kept and how it is safely destroyed after its retention period
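A data standard and a measurable quality rule can be sketched in a few lines. The example below assumes a hypothetical `QualityRule` structure, a four-code subset of ISO 3166-1 alpha-2, and an arbitrary 98% threshold; none of these come from TOGAF itself.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative subset of ISO 3166-1 alpha-2 codes; a real standard would reference the full list.
ALLOWED_COUNTRY_CODES = {"DE", "FR", "GB", "US"}


@dataclass(frozen=True)
class QualityRule:
    """A measurable rule for one data element, with a named steward (hypothetical structure)."""
    element: str
    description: str
    steward: str
    check: Callable[[object], bool]
    threshold: float  # minimum share of records that must pass


def pass_rate(rule, records):
    """Share of records whose value for the rule's element passes the check."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if rule.check(record.get(rule.element)))
    return passed / len(records)


country_rule = QualityRule(
    element="country",
    description="Country must be a valid ISO 3166-1 alpha-2 code",
    steward="Customer Data Steward",
    check=lambda value: value in ALLOWED_COUNTRY_CODES,
    threshold=0.98,
)

records = [
    {"customer_id": "C1", "country": "DE"},
    {"customer_id": "C2", "country": "Germany"},  # violates the standard
    {"customer_id": "C3", "country": "US"},
    {"customer_id": "C4", "country": None},  # missing value
]

rate = pass_rate(country_rule, records)
meets_threshold = rate >= country_rule.threshold
```

Here two of the four records pass, so the rule falls short of its threshold and the named steward would be the person expected to follow up.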
Data governance is increasingly critical in a regulatory environment that includes GDPR, CCPA, and sector-specific regulations in financial services, healthcare, and government.
Data Flows and Information Flows
Phase C maps the flows of data across the enterprise. A data flow shows how information moves from its point of origination through the systems and processes that use, transform, and store it.
Documenting data flows reveals:
- Where data enters the enterprise (from customers, partners, regulators, or internal processes)
- Which systems create, read, update, or delete specific data entities
- Where data is duplicated or replicated, and whether that duplication is controlled or accidental
- Where sensitive data flows to, which is critical for privacy and security compliance
- Latency characteristics: how quickly does data need to flow from source to consumer?
Data flow diagrams in Phase C use a context level (showing the enterprise as a whole interacting with external entities) and process levels (showing how data moves between internal systems and processes).
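Once flows are documented, questions such as "where does this sensitive entity end up?" become simple graph traversals. The sketch below models flows as (source, target, entity) tuples and walks them breadth-first; the system names are hypothetical.

```python
from collections import deque

# Each flow is (source, target, entity). All names are hypothetical.
FLOWS = [
    ("Web Portal", "CRM", "Customer"),
    ("CRM", "Billing", "Customer"),
    ("CRM", "Data Warehouse", "Customer"),
    ("Billing", "Data Warehouse", "Invoice"),
    ("Data Warehouse", "Marketing Tool", "Customer"),
]


def downstream_systems(flows, entity, origin):
    """Return every system that `entity` can reach from `origin`, directly or indirectly."""
    adjacency = {}
    for source, target, flow_entity in flows:
        if flow_entity == entity:
            adjacency.setdefault(source, []).append(target)

    reached, queue = set(), deque([origin])
    while queue:
        current = queue.popleft()
        for target in adjacency.get(current, []):
            if target not in reached:
                reached.add(target)
                queue.append(target)
    return reached


customer_reach = downstream_systems(FLOWS, "Customer", "Web Portal")
```

In this example, Customer data entered through the portal reaches the CRM, Billing, the Data Warehouse, and the Marketing Tool, which is the kind of finding a privacy review would want to see.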
The Data Entity/Business Function Matrix
One of the most powerful tools in Phase C is the Data Entity/Business Function Matrix (also called a CRUD matrix, for Create, Read, Update, Delete).
This matrix maps each data entity against the business functions that interact with it, showing which function creates the data, which functions read it, which update it, and which delete it. This analysis reveals:
- Which functions own which data (the creator is typically the authoritative owner)
- Dependencies between functions: if Function A reads data that Function B creates, then A is dependent on B
- Data quality risks: if multiple functions update the same entity, there is a risk of conflicting changes
- Migration sequencing constraints: when migrating systems, the system that creates data generally needs to be migrated before the systems that read it
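The analysis described above can be sketched directly from a CRUD matrix held as a nested dictionary. The entities and functions are hypothetical, and the sketch assumes each business function is served by a single system so that function order can stand in for system migration order. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+), which raises `CycleError` if functions depend on each other circularly.

```python
from graphlib import TopologicalSorter

# CRUD matrix: entity -> {business function -> operations}. Names are hypothetical.
CRUD = {
    "Customer": {"Sales": "CRU", "Billing": "R", "Support": "RU"},
    "Order": {"Sales": "CRUD", "Fulfilment": "RU", "Billing": "R"},
    "Invoice": {"Billing": "CRU", "Support": "R"},
}


def creators(matrix):
    """Map each entity to the functions that create it (candidate owners)."""
    return {
        entity: sorted(f for f, ops in functions.items() if "C" in ops)
        for entity, functions in matrix.items()
    }


def multi_updater_entities(matrix):
    """Entities updated by more than one function (risk of conflicting changes)."""
    return sorted(
        entity
        for entity, functions in matrix.items()
        if sum("U" in ops for ops in functions.values()) > 1
    )


def dependencies(matrix):
    """Map each function to the functions whose created data it reads."""
    deps = {}
    for functions in matrix.values():
        makers = {f for f, ops in functions.items() if "C" in ops}
        for function, ops in functions.items():
            deps.setdefault(function, set())
            if "R" in ops:
                deps[function] |= makers - {function}
    return deps


def migration_order(matrix):
    """One valid order in which creators come before their readers."""
    return list(TopologicalSorter(dependencies(matrix)).static_order())


order = migration_order(CRUD)
```

For this matrix, Sales comes before Billing and Fulfilment, and Billing comes before Support. Customer and Order are flagged because two functions update each of them.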
Data Architecture Patterns
Like application architecture, data architecture uses established patterns for structuring information assets.
Single Source of Truth
Each data entity has exactly one authoritative source system. All other systems that need the data receive it from that source, rather than maintaining their own copy. This eliminates inconsistency but requires a reliable, well-governed master data platform.
Master Data Management (MDM)
A Master Data Management platform maintains the authoritative, golden record for key business entities (typically Customer, Product, Supplier, and Location). All other systems synchronize from the MDM system, ensuring every application is working with the same core data.
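How a golden record is assembled depends on the survivorship rules the organization agrees. As a minimal sketch, the function below assumes one simple rule (the non-empty value from the most recently updated source wins); commercial MDM platforms support far richer matching and survivorship logic, and the record layout here is hypothetical.

```python
from datetime import date


def golden_record(source_records):
    """Merge per-system records for one entity into a single golden record.

    Survivorship rule (an assumption for this sketch): for each attribute,
    the non-empty value from the most recently updated record wins.
    """
    merged = {}
    # Oldest first, so newer records overwrite older ones.
    for record in sorted(source_records, key=lambda r: r["updated"]):
        for attribute, value in record["attributes"].items():
            if value not in (None, ""):
                merged[attribute] = value
    return merged


crm = {
    "system": "CRM",
    "updated": date(2024, 3, 1),
    "attributes": {"name": "Jon Smith", "email": "jon@example.com", "phone": ""},
}
billing = {
    "system": "Billing",
    "updated": date(2024, 6, 15),
    "attributes": {"name": "John Smith", "email": None, "phone": "555-0100"},
}

customer = golden_record([billing, crm])
```

The merged record takes the name and phone number from the newer Billing record and keeps the email from the CRM, because Billing holds no email value.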
Data Warehouse and Data Lake
A Data Warehouse consolidates data from multiple operational systems into an integrated store optimized for reporting and analytics. A Data Lake stores raw data, both structured and unstructured, at scale, typically before any transformation is applied. The choice between these approaches (or a combination) is a significant Data Architecture decision.
Data Mesh
A more modern pattern where data is treated as a product, owned and managed by the business domain that creates it, and exposed to consumers via standardized interfaces. This distributes data ownership rather than centralizing it.
Choose Patterns Based on Business Needs
Data architecture patterns each come with tradeoffs between consistency, scalability, governance complexity, and implementation cost. The right choice depends on the organization's data maturity, regulatory context, and the nature of the business services that data must support.
Baseline and Target Data Architecture
Phase C Data Architecture follows the same baseline, target, and gap analysis structure as the other architecture development phases of the ADM (Phase B and Phase D).
The Baseline Data Architecture documents current data assets, existing data models, known quality issues, governance gaps, and redundant or conflicting data stores.
The Target Data Architecture defines:
- The enterprise logical data model for key entities
- The agreed data governance framework and ownership model
- The data storage and management platforms to be used
- The master data management approach
- The data flow model for key information streams
- Data security and privacy controls
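Comparing the baseline with the target is, at its simplest, a set comparison. The sketch below assumes building blocks can be matched by name, which is a simplification, and the data store names are hypothetical.

```python
def gap_analysis(baseline, target):
    """Compare two sets of architecture building blocks by name."""
    return {
        "new": sorted(target - baseline),  # must be built or bought
        "retired": sorted(baseline - target),  # to be decommissioned
        "carried_over": sorted(baseline & target),  # kept, possibly changed
    }


baseline_stores = {"CRM Customer DB", "Billing Customer DB", "Order DB"}
target_stores = {"Customer Master", "Order DB", "Data Warehouse"}

gaps = gap_analysis(baseline_stores, target_stores)
```

In this example the two customer databases are retired in favor of a single Customer Master, the Order DB is carried over, and the Data Warehouse is new.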
Phase C Data Architecture Deliverables
Catalogs
- Data Entity/Data Component Catalog: All data entities with their attributes, owners, and source systems
Matrices
- Data Entity/Business Function Matrix (CRUD matrix)
- Application/Data Matrix: Which applications create, read, update, or delete which data entities
Diagrams
- Conceptual Data Diagram: High-level entity relationships
- Logical Data Diagram: Detailed attributes and relationships
- Data Dissemination Diagram: How data flows to consuming applications
- Data Security Diagram: Access control and classification
- Data Migration Diagram: How data will be moved from baseline to target systems
Real-World Example: Healthcare Provider
A regional healthcare provider's Phase C Data Architecture work reveals:
- Baseline finding: Patient demographic data exists in four separate systems (the Patient Administration System, the GP referral portal, the pharmacy system, and the radiology platform) with different formats and no synchronization. A patient's name might be spelled differently in each of the four
- Target: A single Patient Master Index becomes the authoritative source for all patient demographic data. All four systems synchronize from it via the integration platform
- Data governance decision: The Patient Administration team owns the Patient entity. Pharmacy and Radiology are consumers, not owners, and cannot update demographic data directly
- Data flow implication: All patient registrations and amendments must flow through the Patient Administration System, which then publishes updates to consuming systems via an event-driven integration
- Application Architecture impact: The radiology system requires an API integration update to receive patient data rather than maintaining its own patient registry
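The ownership and data flow decisions in this example can be sketched as a simple publish and subscribe arrangement. This is an in-process illustration with hypothetical class and event names; a real deployment would use the integration platform's messaging, typically asynchronously.

```python
class PatientAdministrationSystem:
    """Owning system: the only place demographic data may be changed."""

    def __init__(self):
        self._patients = {}
        self._subscribers = []

    def subscribe(self, handler):
        """Register a callable that receives every patient event."""
        self._subscribers.append(handler)

    def register_or_amend(self, patient_id, demographics):
        """Store the change, then publish it to all consuming systems."""
        self._patients[patient_id] = dict(demographics)
        event = {
            "type": "PatientUpdated",
            "patient_id": patient_id,
            "demographics": dict(demographics),
        }
        for handler in self._subscribers:
            handler(event)


class ConsumingSystem:
    """Consumer (for example pharmacy or radiology) holding a read-only replica."""

    def __init__(self, name):
        self.name = name
        self._replica = {}

    def on_patient_event(self, event):
        self._replica[event["patient_id"]] = dict(event["demographics"])

    def get_patient(self, patient_id):
        return dict(self._replica[patient_id])

    def update_patient(self, patient_id, demographics):
        raise PermissionError(f"{self.name} is a consumer and cannot update Patient data")


pas = PatientAdministrationSystem()
radiology = ConsumingSystem("Radiology")
pharmacy = ConsumingSystem("Pharmacy")
pas.subscribe(radiology.on_patient_event)
pas.subscribe(pharmacy.on_patient_event)

pas.register_or_amend("P1", {"name": "Ana Diaz", "date_of_birth": "1980-02-03"})
```

After the amendment, both consumers hold the same demographic record, and any attempt to call `update_patient` on a consumer is rejected, mirroring the governance decision that only the Patient Administration team owns the Patient entity.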
Summary
Phase C Data Architecture defines the logical data model, governance framework, data flows, and storage patterns that the whole enterprise will use. Its core activities are:
- Building the enterprise logical data model aligned to business capabilities
- Establishing data ownership and governance policies
- Mapping data flows across systems and boundaries
- Choosing appropriate data architecture patterns
- Performing a gap analysis between the current and target data states
With both Application and Data Architecture defined in Phase C, the enterprise architecture team moves to Phase D to define the technology infrastructure that will host and connect everything. Continue with Phase D: Technology Architecture for the final core domain.
If you want to understand how data needs are derived from business needs, review Phase B Business Architecture for the business service definitions that drive data requirements.
