Architecture Decision Records (ADR): The Team's Long-Term Technical Memory

Table of Contents
- The "Why Did We Do This?" Problem
- What an ADR Is (and Isn't)
- The Michael Nygard ADR Template: Full Breakdown
- Writing Effective Context Sections
- Writing Effective Consequences Sections
- The MADR Format: A Modern Alternative
- Git-Based ADR Governance Workflow
- Integrating ADRs with Backstage
- Which Decisions Warrant an ADR?
- ADR Status Lifecycle: Proposed → Accepted → Superseded
- Building an ADR Culture
- Frequently Asked Questions
- Key Takeaway
The "Why Did We Do This?" Problem
Every long-lived codebase accumulates decisions that become mysteries:
The Archaeology Code Problem:
Git blame line 847 of gateway/MessageBroker.java:
// Process messages in batches of exactly 47
private static final int BATCH_SIZE = 47;
Questions:
- Why 47? Not 50, not 100, not a power of 2?
- Was this tuned from performance testing?
- Is this a constraint from a third-party system?
- Is it an arbitrary value a developer happened to type?
- What breaks if we change it to 100?
The original developer left 3 years ago.
The code that made 47 significant was deleted 2 years ago.
Nobody knows. Nobody changes it. The magic number persists forever.
The cost of missing ADRs:
- New developers spend weeks reverse-engineering decisions before touching sensitive code
- "Safe" refactoring is impossible when rationale is unknown — you might break an implicit constraint
- The same architectural debate is re-litigated in every sprint retro ("why don't we just use GraphQL?")
- Onboarding time for senior engineers is measured in months, not days
An ADR written at the time the decision was made would have answered all these questions in 2 minutes.
What an ADR Is (and Isn't)
An ADR is:
- A short, specific record of one architectural decision (~1 page in Markdown)
- Immutable once accepted: you never edit the content of an accepted ADR; you write a new one that supersedes it, and only the old record's Status line is updated to point to its successor
- Focused on the "Why" — the "What" is visible in the code; the "Why" is only in the ADR
An ADR is NOT:
- A design document (those describe a system, not a decision)
- A retrospective (ADRs are written before or during implementation, not after)
- A technical specification (RFCs and specs describe how to build something)
- A ticket or story (those are transient; ADRs are permanent)
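The immutability rule can be made concrete with a small data model. The following is a minimal Python sketch, not a standard ADR tool: the `Adr` class, the `Status` enum, and the `supersede` function are hypothetical names chosen for illustration. It assumes that superseding a record changes only its status and a pointer to the successor, and leaves the recorded context, decision, and consequences untouched.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Adr:
    number: int
    title: str
    status: Status
    context: str
    decision: str
    consequences: str
    superseded_by: Optional[int] = None
    supersedes: Optional[int] = None


def supersede(old: Adr, new: Adr) -> Tuple[Adr, Adr]:
    """Return updated copies of (old, new); neither input object is modified."""
    if old.status is not Status.ACCEPTED:
        raise ValueError(
            f"ADR {old.number} is {old.status.value}; only accepted ADRs can be superseded"
        )
    # Only the status and the cross-references change; the "Why" stays as written.
    retired = replace(old, status=Status.SUPERSEDED, superseded_by=new.number)
    accepted = replace(new, status=Status.ACCEPTED, supersedes=old.number)
    return retired, accepted
```

Calling `supersede(adr_24, adr_31)` returns two new records and leaves the originals as they were, which mirrors the rule that history is preserved rather than rewritten.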
The Michael Nygard ADR Template: Full Breakdown
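Michael Nygard's 2011 template is the most widely adopted ADR format. It has five sections: Title, Status, Context, Decision, and Consequences. The example below adds Date, Architect, and Deciders fields, a common extension: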
# ADR 023: Use PostgreSQL as the Primary Datastore
**Date:** 2026-04-17
**Status:** Accepted
**Architect:** Sarah Chen (Lead Architect)
**Deciders:** Engineering leadership meeting 2026-04-15
## Context
We are building the new Order Management System (OMS) that will replace the
legacy Oracle Forms application. We need to select a primary relational database.
**Constraints:**
- Must support ACID transactions (order state changes must be atomic)
- Team has no DBA staff — operations must be manageable by engineers
- Budget: $0/month for database licensing (startup phase)
- Scale requirement: ~50K orders/day, 500 concurrent users initially
- Must run on AWS (company standard cloud provider)
**Alternatives evaluated:**
- MySQL 8.0 (AWS RDS)
- PostgreSQL 16 (AWS RDS)
- Amazon Aurora PostgreSQL
- MongoDB (considered briefly, rejected — document model unsuitable for relational order data)
## Decision
We will use **PostgreSQL 16 on AWS RDS** as the primary datastore for the OMS.
Aurora PostgreSQL was considered but its minimum cost (~$180/month) is not
justified at our current scale. We will re-evaluate Aurora when order volume
exceeds 500K orders/day (10x the initial requirement).
## Consequences
**Positive:**
- Zero licensing cost (open source)
- Strong ACID guarantees for order state machine
- Rich ecosystem: pgvector, PostGIS, full-text search available without extra services
- Team has existing PostgreSQL expertise (3/5 engineers have production experience)
- AWS RDS provides automated backups, failover, and patching
**Negative:**
- No native horizontal write scaling (sharding must be added manually when needed)
- Read scaling requires manual read-replica configuration
- RDS managed service costs ~$50/month (t3.medium, Multi-AZ)
**Neutral:**
- Limited to the features PostgreSQL provides; MySQL-specific features are unavailable
- Engineers without PostgreSQL experience must learn PostgreSQL-specific tooling (pgAdmin, pg_dump vs mysqldump)
**Migration path:**
- If we need horizontal write scaling: evaluate Citus or add sharding layer
- If we need Aurora features: migration path exists (Aurora is PostgreSQL-compatible)
Writing Effective Context Sections
The Context is the most important section — it captures information that will be lost when the decision-makers leave. An effective Context includes:
1. The force driving the decision (why now?)
## Context
The authentication service is hitting 10K token validations/second during peak
traffic. At this rate, RS256 (RSA-2048) signature verification adds 8ms average latency
to every API call, contributing 25% of our p99 latency budget.
2. Hard constraints (business, technical, legal)
Constraints:
- Our GDPR compliance policy requires tokens to expire within 24 hours
- Must be compatible with our existing JWT RS256 tokens (100M issued)
- Cannot require user re-authentication (UX constraint from Product)
3. Options evaluated (not just the one chosen)
Alternatives evaluated:
1. Redis token cache (rejected: adds Redis as critical dependency, ops overhead)
2. Paseto tokens (rejected: requires migrating all existing JWT clients)
3. Ed25519 signature algorithm (selected: 3× faster EdDSA verification vs RSA-2048)
The MADR Format: A Modern Alternative
MADR (Markdown Any Decision Record) provides a more structured format ideal for teams that want explicit decision rationale and alternatives:
# Use OpenTelemetry Collector for Observability Pipeline
## Status
Accepted
## Context and Problem Statement
How do we collect telemetry (traces, metrics, logs) from 40+ microservices
without locking ourselves into a single observability vendor?
## Decision Drivers
* Ability to switch observability backends without code changes
* Support for all three signal types (traces/metrics/logs)
* Production-grade reliability (not experimental)
## Considered Options
* Option A: Vendor-specific agents (Datadog Agent, New Relic Agent)
* Option B: OpenTelemetry Collector with OTLP exporters
* Option C: Direct SDK integration to each backend
## Decision Outcome
Chosen option: **Option B: OpenTelemetry Collector**
Because:
- Changing backends (Grafana → Datadog) requires only a Collector config change
- CNCF project — vendor-neutral, broad community support
- Supports all three signal types in one pipeline
### Consequences
* Good: Can switch observability vendors in hours, not weeks
* Good: Centralized sampling and filtering in one place
* Bad: Adds an OTel Collector fleet to operate (Kubernetes DaemonSet)
* Neutral: Team must learn OTel Collector configuration and pipeline design
Git-Based ADR Governance Workflow
The most effective ADR workflow keeps records in the repository, alongside the code they describe:
Repository structure:
├── docs/
│   └── architecture/
│       └── decisions/
│           ├── README.md                        (Index of all ADRs)
│           ├── ADR-001-postgresql.md            (Accepted)
│           ├── ADR-002-kafka-messaging.md       (Accepted)
│           ├── ADR-015-graphql-rejected.md      (Rejected: keep rejections too!)
│           ├── ADR-022-redis-cache.md           (Accepted)
│           └── ADR-023-otel-collector.md        (Proposed, under review)
The PR-based review workflow:
1. Engineer drafts ADR-024-use-typescript.md with Status: Proposed
2. Opens Pull Request with description: "ADR: Adopt TypeScript for all new services"
3. PR comments become the official decision discussion record
4. Deciders approve → Status changed to Accepted → PR merged
5. Future engineers run git log or git blame on the ADR and trace the commit back to its PR → see who approved it and when
6. If decision reversed: new ADR-031-revert-to-javascript.md with Status: Supersedes ADR-024
   ADR-024 updated: Status: Superseded by ADR-031 (history preserved, never deleted)
Which Decisions Warrant an ADR?
Write an ADR when:
- The decision is difficult to reverse (database choice, runtime language, major framework)
- The decision has non-obvious trade-offs (why async instead of sync, why eventual consistency)
- The decision will cause questions ("why don't we just use X?")
- The decision involves rejected alternatives that seem superior at first glance
- The decision has external constraints that future engineers won't know about
Don't write an ADR for:
- Trivial implementation choices (variable naming, folder structure)
- Decisions that are obvious from the code context
- Temporary decisions that will be revisited imminently
- Choices with no meaningful alternatives
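The two lists above can be read as a simple decision rule. The sketch below encodes them in Python under one assumption that the lists themselves do not state: any "don't write" condition overrides the "write" conditions. `DecisionTraits` and `warrants_adr` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecisionTraits:
    # Reasons to write an ADR
    hard_to_reverse: bool = False
    non_obvious_tradeoffs: bool = False
    likely_to_be_questioned: bool = False
    rejected_attractive_alternative: bool = False
    hidden_external_constraint: bool = False
    # Reasons not to write one
    trivial: bool = False
    obvious_from_code: bool = False
    temporary: bool = False
    no_real_alternatives: bool = False


def warrants_adr(t: DecisionTraits) -> bool:
    """Assumption: exclusions win; otherwise one 'write' criterion is enough."""
    if t.trivial or t.obvious_from_code or t.temporary or t.no_real_alternatives:
        return False
    return any([
        t.hard_to_reverse,
        t.non_obvious_tradeoffs,
        t.likely_to_be_questioned,
        t.rejected_attractive_alternative,
        t.hidden_external_constraint,
    ])
```

For example, `warrants_adr(DecisionTraits(hard_to_reverse=True))` is true for a database choice, while the same decision flagged as `temporary=True` is not. A team may reasonably prefer a different precedence; the point is to make the rule explicit enough to argue about.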
Frequently Asked Questions
How do I get my team to actually write ADRs?
Start with requirements rather than requests: require an ADR for any decision that changes a dependency in package.json, go.mod, pom.xml, or requirements.txt. Add an ADR checklist item to your PR template for architectural changes. Have the tech lead write the first 5 ADRs as examples — teams model behaviour they see in leadership. Make ADR review part of your architecture review process, not an optional add-on.
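One way to turn that requirement into an automated check is sketched below. It is an illustration under stated assumptions, not a ready-made CI plugin: the function name `missing_adr` is hypothetical, ADR files are assumed to live in `docs/architecture/decisions/` and to be named `ADR-*.md` as in the repository layout shown earlier, and the input is assumed to be the list of paths changed in a pull request (for example, the output of `git diff --name-only`).

```python
from pathlib import PurePosixPath

# Dependency manifests named in the answer above.
MANIFESTS = {"package.json", "go.mod", "pom.xml", "requirements.txt"}
# Assumed ADR location, taken from the example repository structure.
ADR_DIR = "docs/architecture/decisions"


def missing_adr(changed_paths):
    """Return True if a dependency manifest changed but no ADR file did."""
    paths = [PurePosixPath(p) for p in changed_paths]
    touches_manifest = any(p.name in MANIFESTS for p in paths)
    touches_adr = any(
        str(p.parent) == ADR_DIR and p.name.startswith("ADR-") and p.suffix == ".md"
        for p in paths
    )
    return touches_manifest and not touches_adr
```

A CI job could fail the build when `missing_adr` returns true. Because the check matches manifests by file name in any directory, it also covers monorepos with per-service manifests; it deliberately does not inspect what changed inside the manifest, so routine version bumps would need an exemption mechanism of the team's choosing.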
How do ADRs differ from RFCs?
RFCs (Request for Comments) are broader proposals for how to build something — they describe a design in detail and seek implementation feedback. ADRs capture a specific decision — they are more concise and focused on the "Why" rather than the "How". Many teams use both: an RFC to design a system, then an ADR to record the key decisions made during RFC review. GitHub uses both extensively.
Key Takeaway
Architecture Decision Records are the cheapest documentation investment with the highest return. One page of Markdown written in the 30 minutes after a decision is made prevents hours of archaeology, months of "why did we do this" uncertainty, and years of "nobody changes it because nobody understands it" stagnation. The practice is simple: every time a consequential architectural decision is made, write a short ADR with Context, Decision, and Consequences, and merge it to the repository via PR. The first ADR takes 45 minutes. The hundredth takes 10 minutes. The ROI compounds with every engineer who joins and every decision that is reconsidered.
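To lower the cost of that first step, a team might scaffold new records with a small helper. The following is a minimal Python sketch that assumes the `ADR-NNN-slug.md` naming convention used in the examples in this article; `next_number`, `slugify`, and `scaffold` are hypothetical helper names, and the template contains only the Context, Decision, and Consequences headings plus a Date and a Proposed status.

```python
import re
from datetime import date

TEMPLATE = """# ADR {number:03d}: {title}

**Date:** {today}
**Status:** Proposed

## Context

## Decision

## Consequences
"""


def next_number(existing_filenames):
    """Return one more than the highest ADR-NNN number found, or 1 if none exist."""
    numbers = []
    for name in existing_filenames:
        match = re.match(r"ADR-(\d+)-", name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def slugify(title):
    """Lowercase the title and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def scaffold(title, existing_filenames, today=None):
    """Return (filename, body) for a new Proposed ADR; writes nothing to disk."""
    number = next_number(existing_filenames)
    filename = f"ADR-{number:03d}-{slugify(title)}.md"
    body = TEMPLATE.format(
        number=number,
        title=title,
        today=(today or date.today()).isoformat(),
    )
    return filename, body
```

The function returns the file name and contents instead of writing them, so the caller decides where the file goes and the helper stays easy to test.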
