110 Software Architect Interview Questions and Answers (2026)

Software architect interviews test three things simultaneously: your technical depth, your ability to reason about trade-offs under uncertainty, and your communication skill with non-technical stakeholders. This guide covers all three with 110 real questions drawn from senior architect hiring panels at tech companies, banks, and consultancies.
Questions are grouped into eight categories. Work through the ones relevant to the role you are targeting, then revisit the trade-off and leadership sections regardless — every panel asks them.
1. Foundations and Role Definition
Q1. What is the difference between a software architect and a senior engineer?
A senior engineer maximises the quality of a component they own. An architect is accountable for the structural decisions that span components: how services communicate, where data lives, which trade-offs the team accepts to meet non-functional requirements. Architects write less production code and spend more time on ADRs, diagrams, and stakeholder conversations.
Q2. What are the three main types of software architect?
Solution architect: designs the technical approach for a specific project or product. Enterprise architect: governs technology standards and strategy across the entire organisation. Domain architect (or technical lead): owns the architecture of one bounded domain within a product. Many organisations blur these; know which you are interviewing for.
Q3. How do you define a successful architecture?
An architecture is successful when the system can be changed safely and quickly by developers who were not involved in its original design. Fitness for purpose today is necessary but not sufficient — the architecture must also support the rate of change the business needs over the next 2–3 years without becoming a rewrite candidate.
Q4. What is an Architecture Decision Record (ADR)?
An ADR is a short document (typically one page) that records a significant architectural decision: the context, the options considered, the decision made, and the trade-offs accepted. ADRs are stored in version control alongside code so future engineers can understand why the architecture looks the way it does.
Q5. How do you handle a situation where stakeholders want to change a core architectural decision mid-project?
Acknowledge the business driver first. Then present the cost of the change: what work is already done, what will be thrown away, and what new risks are introduced. If the change is justified, update the ADR and communicate the revised timeline. If it is not justified, explain the consequences clearly and recommend deferring the change to a post-launch iteration.
Q6. What is the difference between architecture and design?
Architecture defines structural decisions that are expensive to change: service boundaries, communication protocols, data ownership, deployment topology. Design covers decisions within a service or module that a developer can change without affecting other services. The boundary is not fixed — what is architecture in a small codebase may be design in a monorepo.
Q7. How do you document architecture for a team that does not read documents?
Use architecture diagrams in the C4 model format (context, container, component, code) embedded in the wiki. Keep ADRs short (one page max). Run walking-skeleton demos that show the architecture in action rather than describing it abstractly. Add architecture decision context to PR descriptions so decisions are encountered at code-review time.
2. System Design
Q8. Walk me through how you would design a URL shortener.
Clarify: expected redirects per second, URL expiry requirements, analytics needed. Design: a write service that generates a 6-character base-62 key (bijective encoding of an auto-increment ID is simpler than random with collision checks), stores key→URL in a key-value store (DynamoDB or Redis), and returns the short URL. Read service does a cache-first lookup (Redis TTL matched to URL expiry) then falls back to the database. CDN in front for popular URLs. Separate analytics pipeline if click tracking is required.
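The bijective encoding mentioned above can be sketched in a few lines. This is an illustrative implementation, not tied to any particular storage backend; the alphabet order is an arbitrary choice.

```python
# Bijective base-62 encoding of an auto-increment ID: every ID maps to
# exactly one short key and back, so no collision checks are needed.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Convert a non-negative integer ID to a base-62 key."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode(s: str) -> int:
    """Convert a base-62 key back to the original integer ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

A 6-character key covers 62^6 (about 56 billion) IDs, which is why 6 characters is the usual starting point.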
Q9. How would you design a notification system that sends 10 million emails per day?
Ingest: API accepts notification requests, validates them, and publishes to a message queue (SQS or Kafka). Workers: a pool of workers consumes the queue, resolves user preferences (opt-out, frequency caps), renders the template, and hands off to an email sending provider (SES, SendGrid). Rate control: the worker pool respects provider sending limits via a token-bucket rate limiter. Retry: failed sends go to a dead-letter queue for investigation. Monitoring: track delivery rate, bounce rate, and spam complaints.
Q10. How would you design Twitter's home timeline feed?
Two approaches. Fan-out on write: when a user posts, push the tweet to the inbox of every follower. Fast reads, expensive writes, and problematic for users with millions of followers (celebrities). Fan-out on read: compute the timeline at read time by merging the recent tweets of everyone the user follows. Cheaper writes, expensive reads at scale. Hybrid: fan-out on write for normal users, fan-out on read for celebrities with >1 million followers. Twitter used this hybrid approach.
Q11. How would you design a distributed rate limiter?
Options: token bucket or sliding window. For distributed state, store the counter in Redis with a TTL equal to the window size. Use Redis Lua scripts to atomically check and decrement the counter (avoids race conditions). For very high-throughput scenarios, use Redis cluster with consistent hashing. Accept that a small percentage of requests may slip through during Redis failover — this is usually acceptable for rate limiting.
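The token-bucket logic can be shown in a single-process sketch. In production the bucket state lives in Redis and the refill-and-take step runs atomically inside a Lua script; the class below only illustrates the algorithm itself.

```python
import time

class TokenBucket:
    """Single-process token bucket. In a distributed deployment this
    state would live in Redis, with refill-and-take done atomically
    in a Lua script to avoid check-then-act races."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket of capacity 100 refilled at 10 tokens/second allows bursts of 100 requests while enforcing a sustained rate of 10 requests/second.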
Q12. How would you design a payment processing system?
Key constraints: exactly-once processing, audit trail, regulatory compliance. Design: idempotent API (client supplies a unique transaction ID; duplicate submissions return the original result). State machine per transaction (initiated → validated → authorised → settled → reconciled). All state transitions persisted to an append-only event log before external calls. Saga pattern for multi-step flows (authorise, capture, settle) with compensating transactions for rollback. PCI DSS scope: card data handled only in a separate hardened service; no card numbers logged.
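The per-transaction state machine can be made explicit with a transition table, which is often simpler and safer than scattered status checks. This is a minimal sketch using the states named above; a real system persists each transition to the event log before making any external call.

```python
# Legal transitions for the transaction lifecycle described above.
# Any other transition is a bug and is rejected immediately.
TRANSITIONS = {
    "initiated":  {"validated"},
    "validated":  {"authorised"},
    "authorised": {"settled"},
    "settled":    {"reconciled"},
    "reconciled": set(),
}

class Transaction:
    def __init__(self):
        self.state = "initiated"
        self.history = ["initiated"]  # append-only record of transitions

    def advance(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        # In production: persist this transition to the append-only
        # event log BEFORE calling any external payment provider.
        self.state = new_state
        self.history.append(new_state)
```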
Q13. How would you design a search autocomplete system?
Use a trie data structure for prefix matching. For scale, pre-compute top-N completions for each prefix and store them in a fast key-value cache. Refresh the cache periodically from query logs (most popular searches float to the top). For very large datasets, partition the trie by first character across multiple servers. Add typo tolerance with a fuzzy match fallback for prefixes that return no results.
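The pre-computed top-N approach can be sketched as a trie where every node stores its own completion list, built offline from (query, count) pairs in the logs. Names and the ranking-by-count rule here are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # pre-computed top-N completions for this prefix

class Autocomplete:
    """Trie with top-N completions cached at every node, rebuilt
    periodically from query-log counts. Lookups are O(len(prefix))."""

    def __init__(self, counts: dict, n: int = 3):
        self.root = TrieNode()
        # Insert most-popular queries first so each node keeps its top-N.
        for query, _ in sorted(counts.items(), key=lambda kv: -kv[1]):
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                if len(node.top) < n:
                    node.top.append(query)

    def complete(self, prefix: str) -> list:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return node.top
```

Because the lists are pre-computed, serving a request is a pointer walk plus a list read, which is what makes single-digit-millisecond latency achievable.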
Q14. How would you design a distributed job scheduler?
Leader election (ZooKeeper or etcd) to ensure only one node schedules at a time. Jobs stored in a database with next-run-time, status, and retry count. Scheduler polls for due jobs, locks the row (SELECT FOR UPDATE or optimistic locking), and dispatches to a work queue. Workers execute jobs and update status. Handle missed runs (system downtime) by running all jobs whose next-run-time is in the past on startup. Provide an idempotency key so jobs that run twice do not cause double effects.
Q15. How would you design an API gateway?
Core concerns: routing, authentication, rate limiting, SSL termination, request/response transformation, and observability. Use an off-the-shelf gateway (Kong, AWS API Gateway, NGINX) rather than building from scratch unless you have requirements they cannot meet. Add circuit breakers per backend service. Log all requests with correlation IDs so traces can be assembled across services. Avoid putting business logic in the gateway — it becomes a coupling point.
3. Architecture Patterns
Q16. What is the difference between microservices and a monolith?
A monolith deploys as a single unit; all components share a process and database. Microservices deploy independently; each service owns its data and communicates over the network. Monoliths are simpler to develop and debug at small scale. Microservices allow independent scaling and deployment at the cost of network latency, distributed tracing complexity, and operational overhead. Start with a well-structured monolith and extract services when a specific scaling or team autonomy problem justifies it.
Q17. What is event-driven architecture and when should you use it?
Event-driven architecture has producers publishing events to a broker (Kafka, SQS, EventBridge) and consumers reacting to those events. Use it when you need loose coupling between services, when consumers need to process the same event independently, or when you need a replay-able audit log. Avoid it for simple request-response operations, when strong consistency is required, or when debugging distributed async flows would slow your team unacceptably.
Q18. What is CQRS and what problem does it solve?
Command Query Responsibility Segregation separates the model used for writes (commands) from the model used for reads (queries). It solves the problem of a single model that is optimal for neither: write models need transaction integrity; read models need denormalised, query-optimised projections. CQRS is most valuable when read and write workloads have very different scaling requirements. It adds complexity (eventual consistency between write and read models) so only use it where the benefit is clear.
Q19. What is the saga pattern and why is it used in microservices?
The saga pattern manages distributed transactions across multiple services without a global two-phase commit (which would create tight coupling and a locking bottleneck). A saga is a sequence of local transactions, each publishing an event that triggers the next step. If a step fails, compensating transactions undo the previous steps. Choreography-based sagas use events with no central coordinator; orchestration-based sagas use an orchestrator service that calls each step. Orchestration is easier to reason about and test.
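The orchestration variant can be sketched as a loop over (action, compensation) pairs: run each step, remember its undo, and on failure replay the undos in reverse. This ignores real-world concerns like persistence of saga state and idempotent retries, but it shows the control flow.

```python
def run_saga(steps):
    """Run a list of (action, compensation) callables in order.
    On failure, execute compensations for completed steps in reverse.
    Returns True if all steps succeeded, False if rolled back."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for undo in reversed(completed):
                undo()  # compensating transaction
            return False
    return True
```

In a real orchestrator each step would be a service call and the saga's progress would be persisted, so a crashed orchestrator can resume or compensate after restart.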
Q20. What is the strangler fig pattern?
The strangler fig pattern is a migration technique for incrementally replacing a legacy system. New functionality is built in the new system. A routing layer (typically an API gateway or reverse proxy) directs traffic to either the legacy or new system based on the route. Old functionality is migrated piece by piece until the legacy system is fully replaced and can be decommissioned. The legacy system is never shut down all at once, which reduces migration risk.
Q21. What is the outbox pattern?
The outbox pattern solves the dual-write problem: you need to update a database and publish an event atomically, but the database and message broker are separate systems. The solution: write the event to an outbox table in the same database transaction as the business data update. A separate relay process reads the outbox table and publishes the events to the broker, then marks them as sent. This guarantees at-least-once delivery without distributed transactions.
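The pattern is easiest to see in code. The sketch below uses an in-memory SQLite database and a callable standing in for the message broker; table and column names are illustrative.

```python
import sqlite3

# Outbox sketch: the business row and the event row are written in ONE
# local transaction; a separate relay publishes unsent events later.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT,
                         sent INTEGER DEFAULT 0);
""")

def place_order(order_id: int) -> None:
    with db:  # one atomic transaction covering both writes
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')",
                   (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (f"order_placed:{order_id}",))

def relay(publish) -> None:
    """Read unsent events, publish, then mark sent (at-least-once:
    a crash between publish and update causes a redelivery)."""
    rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for event_id, payload in rows:
        publish(payload)
        db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (event_id,))
    db.commit()
```

Because the relay can crash after publishing but before marking an event sent, consumers must be idempotent, which is the standard price of at-least-once delivery.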
Q22. What is the circuit breaker pattern?
A circuit breaker wraps calls to an external dependency and tracks the failure rate. When failures exceed a threshold, the circuit opens and subsequent calls fail immediately without attempting the external call (fast-fail). After a timeout, the circuit moves to half-open and allows a probe request through. If the probe succeeds, the circuit closes. Circuit breakers prevent cascading failures when a dependency is slow or unavailable.
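The three states (closed, open, half-open) fit in a small class. Libraries such as resilience4j or Polly provide production versions; this sketch, with illustrative threshold and timeout defaults, only shows the state transitions.

```python
import time

class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures.
    After `reset_after` seconds, one probe call is let through
    (half-open); success closes the circuit, failure reopens it."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, allow this probe through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # open (or reopen)
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```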
Q23. What is the sidecar pattern in Kubernetes?
A sidecar is a secondary container in the same pod as the main application container. It handles cross-cutting concerns: service mesh proxy (Envoy in Istio), log shipping, secrets injection, or certificate rotation. The main application container does not need to know about these concerns. The sidecar shares the pod's network namespace and local storage. This pattern is the foundation of the service mesh.
Q24. What is the backend for frontend (BFF) pattern?
A BFF is an API layer specific to one client type (mobile app, web SPA, third-party partner). Instead of one generic API that tries to serve all clients, each client gets an API tailored to its data shape and interaction patterns. A mobile BFF returns smaller payloads; a web BFF may aggregate multiple service calls into a single response. BFFs avoid over-fetching and under-fetching at the cost of duplication across BFFs.
Q25. When would you choose a monorepo over a multi-repo setup?
Monorepo: one repository containing all services and libraries. Advantages: atomic cross-service changes, shared tooling, easy code reuse. Works well for organisations where teams collaborate frequently across service boundaries. Multi-repo: each service in its own repository. Advantages: independent release cycles, smaller clone size, clear ownership. Works well when services are genuinely independent and team boundaries are stable. The build tooling required for monorepos at scale (Nx, Turborepo, Bazel) is significant.
4. Scalability and Performance
Q26. What is horizontal scaling and when is it preferable to vertical scaling?
Horizontal scaling adds more machines. Vertical scaling adds more resources to one machine. Prefer horizontal scaling when: your workload can be parallelised across instances, you need fault tolerance (one node failing does not take the service down), and your cloud provider's largest instance size is a constraint. Vertical scaling is simpler and is a valid first step: it requires no architectural changes and no load balancing.
Q27. How do you design a system for 99.99% availability?
99.99% availability allows approximately 52 minutes of downtime per year. Achieve it with: redundant instances across multiple availability zones, a load balancer with health checks, automated failover, circuit breakers to isolate failing dependencies, zero-downtime deployment (blue-green or rolling), regular chaos engineering to find hidden single points of failure, and an incident response runbook that brings MTTR under 5 minutes.
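The downtime budget is worth being able to derive on the spot, since panels often ask where the 52-minute figure comes from:

```python
def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year at a given availability target."""
    return (1 - availability) * 365 * 24 * 60

# 99.99% ("four nines")  -> about 52.6 minutes per year
# 99.9%  ("three nines") -> about 526 minutes (8.8 hours) per year
```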
Q28. What is the difference between latency and throughput?
Latency is the time to complete one operation (milliseconds per request). Throughput is the number of operations completed per unit of time (requests per second). They are related but not the same. A system can have high throughput with high latency (batch processing). Reducing latency often reduces throughput for the same hardware because each request holds resources longer. Design targets should specify both P99 latency and target RPS, not just one.
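The two metrics are linked by Little's law: mean concurrency equals throughput multiplied by mean latency. A quick sketch (the 2,000 RPS / 50 ms numbers are illustrative) shows why latency targets bound worker and connection-pool sizes:

```python
def required_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's law: mean requests in flight = throughput x mean latency."""
    return throughput_rps * latency_s

# Sustaining 2,000 req/s at 50 ms mean latency needs ~100 requests
# in flight at once -- which bounds thread/worker/connection counts.
```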
Q29. How does caching improve system performance and what are its risks?
Caching reduces latency and database load by serving frequent reads from fast in-memory storage. Risks: stale data (cache not invalidated after writes), cache stampede (many requests hit the database simultaneously when a cache entry expires — mitigate with probabilistic early expiry or a mutex), and cache poisoning (incorrect data cached and served to all users). Cache invalidation is one of the two hard problems in computer science.
Q30. What is database sharding and what problems does it introduce?
Sharding partitions data across multiple database nodes, each holding a subset of rows. It removes the single-database bottleneck for write-heavy workloads. Problems it introduces: cross-shard queries become expensive joins across the network, transactions spanning shards require distributed coordination, and resharding (rebalancing as data grows) is operationally complex. Exhaust vertical scaling and read replicas before choosing sharding.
Q31. What is the difference between a read replica and a cache?
A read replica is a database copy that receives changes from the primary asynchronously, so it lags slightly behind. It stores all data and supports arbitrary SQL queries. A cache (Redis, Memcached) stores a subset of data in memory in a pre-computed format. Cache reads are orders of magnitude faster but only serve data that has been explicitly stored. Use read replicas for complex queries; use caches for hot, frequently-read objects.
Q32. How do you handle the N+1 query problem?
The N+1 problem occurs when fetching a list of N items and then making one query per item to fetch related data — N+1 queries total. Solutions: use JOIN queries to fetch related data in one query, use DataLoader (batching and caching for GraphQL), or use eager loading in your ORM (include in Sequelize, joinedload in SQLAlchemy). Always check query logs when working with ORM-generated SQL.
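The problem and its JOIN fix can be demonstrated side by side. The sketch below uses an in-memory SQLite database with illustrative tables; the two functions return the same data, but the first issues one query per post.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO posts VALUES (1, 1, 'Intro'), (2, 1, 'Deep dive'), (3, 2, 'Notes');
""")

def posts_with_authors_n_plus_1():
    # N+1: one query for the list, then one extra query per row.
    result = []
    for _id, author_id, title in db.execute(
            "SELECT id, author_id, title FROM posts"):
        (name,) = db.execute("SELECT name FROM authors WHERE id = ?",
                             (author_id,)).fetchone()
        result.append((title, name))
    return result

def posts_with_authors_join():
    # One query: the JOIN replaces the N per-row lookups.
    return list(db.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id"))
```

With 3 posts the difference is invisible; with 10,000 posts the first function issues 10,001 queries, which is why it only shows up under production data volumes.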
Q33. What is eventual consistency and when is it acceptable?
Eventual consistency means that if no new updates are made, all replicas will eventually converge to the same value. It is acceptable when: brief inconsistency does not cause a bad user experience (social media like counts), when the business does not require read-your-writes guarantees, or when the latency cost of strong consistency is unacceptable. It is not acceptable for financial ledgers, inventory counts where overselling matters, or any domain where acting on stale data causes real-world harm.
Q34. What is the CAP theorem and what does it mean in practice?
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (all nodes see the same data at the same time), Availability (every request receives a response), and Partition tolerance (the system continues operating when network partitions occur). Since network partitions are unavoidable, the real choice is between CP (consistent but may be unavailable during a partition) and AP (available but may return stale data). Many NoSQL databases favour availability (AP) in their default configuration; relational databases with synchronous replication are CP.
Q35. How do you approach capacity planning for a new system?
Start with target throughput and growth rate. Estimate resource consumption per request (CPU cycles, memory, database connections, I/O). Calculate the fleet size needed at peak traffic with a 40% headroom. Add a load testing phase before launch to validate assumptions. Set up auto-scaling with target CPU utilisation around 60% to leave headroom for spikes. Revisit the capacity model quarterly as traffic patterns evolve.
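The fleet-size arithmetic from the steps above can be written down directly. The per-instance throughput figure below is an illustrative assumption you would get from load testing:

```python
import math

def fleet_size(peak_rps: float, rps_per_instance: float,
               headroom: float = 0.4) -> int:
    """Instances needed at peak traffic, keeping `headroom` spare
    capacity on each instance (40% per the guidance above)."""
    return math.ceil(peak_rps / (rps_per_instance * (1 - headroom)))

# e.g. 12,000 RPS peak, 500 RPS per instance from load tests:
# 12,000 / (500 * 0.6) = 40 instances
```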
5. Data Architecture
Q36. When would you choose a NoSQL database over a relational database?
Choose NoSQL when: your data has a variable or nested structure that maps poorly to fixed table schemas, you need horizontal write scaling beyond what a single relational node can provide, or you need a specific access pattern that a specialised store handles better (document store for JSON, graph database for relationship traversal, time-series DB for metrics). Use relational databases by default — their consistency guarantees and query flexibility are valuable and most applications do not outgrow them.
Q37. What is a time-series database and what is it used for?
A time-series database (InfluxDB, TimescaleDB, Prometheus) is optimised for storing and querying data points indexed by timestamp. It provides efficient compression for sequential writes, fast range queries over time windows, and built-in downsampling (aggregating old data to reduce storage). Use it for application metrics, IoT sensor data, financial tick data, and any workload where the primary access pattern is "all values between time A and time B."
Q38. What is data normalisation and when should you denormalise?
Normalisation organises data into tables to eliminate redundancy (1NF through 3NF). Denormalisation adds redundant data to avoid expensive joins at query time. Normalise by default to maintain consistency and reduce update anomalies. Denormalise when a specific query is too slow and profiling shows the join is the bottleneck, or when designing a read model in a CQRS system where query performance is the primary concern.
Q39. What is a data lake versus a data warehouse?
A data warehouse stores structured, processed data optimised for analytical queries (Redshift, BigQuery, Snowflake). A data lake stores raw data in its original format at scale, including unstructured and semi-structured data (S3, ADLS). The warehouse gives business users a consistent query interface; the data lake preserves the raw source for data scientists who need flexibility. A lakehouse (Delta Lake, Apache Iceberg) attempts to combine both: raw storage with ACID transactions and SQL query capability.
Q40. How do you handle database migrations in a zero-downtime deployment?
Use the expand-contract pattern. Phase 1 (expand): add the new column or table with a default value while the old schema is still in use. Deploy the application to write to both old and new columns. Phase 2 (migrate): backfill old rows with data in the new column. Phase 3 (contract): deploy a new application version that reads only from the new column, then drop the old column. Never rename or drop a column in the same deployment as the application change that depends on it.
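The three phases can be walked through against an in-memory SQLite database (the `name` to `full_name` rename is illustrative; `DROP COLUMN` requires SQLite 3.35+):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# Phase 1 (expand): add the new column; old app versions keep working,
# and the new app version starts writing both columns.
db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# Phase 2 (migrate): backfill existing rows (in batches, in production).
db.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

# Phase 3 (contract): once every reader uses full_name, drop the old
# column in a later, separate deployment.
db.execute("ALTER TABLE users DROP COLUMN name")
```

Each phase is a separate deployment, so at every moment the running application version is compatible with the live schema.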
Q41. What is event sourcing and how does it differ from traditional persistence?
Traditional persistence stores current state: the latest value of each record. Event sourcing stores the sequence of events that led to the current state. The current state is derived by replaying the event log. Benefits: complete audit trail, ability to replay to any point in time, natural fit for CQRS. Costs: more complex reads (must replay or maintain a projection), schema evolution is harder (old events must be interpretable by new code), and storage grows without bound (mitigate with snapshotting).
Q42. What is database connection pooling and why is it important?
Creating a new database connection is expensive (TCP handshake, authentication, session setup). A connection pool maintains a fixed set of open connections and lends them to application threads. When the thread finishes, the connection returns to the pool rather than being closed. Without pooling, each request creates and destroys a connection, which limits throughput and increases database CPU load. Configure pool size to roughly match the database's max_connections divided by the number of application instances.
6. Cloud and Infrastructure
Q43. What is infrastructure as code and why does it matter for architecture?
Infrastructure as code (IaC) defines cloud resources (servers, networks, databases) in version-controlled configuration files (Terraform, CloudFormation, Pulumi). It matters because: environments become reproducible and consistent, changes go through code review, drift between environments is detectable, and disaster recovery becomes a plan execution rather than a manual reconstruction. An architecture that cannot be reproduced from code is fragile.
Q44. What is the difference between blue-green deployment and canary deployment?
Blue-green: maintain two identical production environments (blue and green). Deploy to the inactive one, run smoke tests, then shift all traffic instantly by updating the load balancer. Canary: shift a small percentage of traffic (1–10%) to the new version, monitor error rates and latency, then gradually roll forward or roll back. Blue-green gives faster rollback but requires double the infrastructure. Canary limits blast radius for problems that smoke tests do not catch.
Q45. What is a service mesh and when do you need one?
A service mesh (Istio, Linkerd) adds a sidecar proxy to every service pod. The proxies handle mutual TLS, traffic management (retries, timeouts, circuit breakers), and telemetry (distributed tracing, metrics) without changes to application code. You need a service mesh when you have more than 10–20 services and managing these cross-cutting concerns at the application level is becoming unmanageable. Below that scale, a service mesh adds more operational complexity than it solves.
Q46. What is the difference between a virtual machine and a container?
A virtual machine includes a full operating system kernel; the hypervisor abstracts the physical hardware. A container shares the host OS kernel; the container runtime (Docker) isolates processes using Linux namespaces and cgroups. Containers start in milliseconds vs minutes for VMs, consume less memory, and pack more densely on a host. VMs provide stronger isolation (separate kernel) and are required for running different operating systems on the same host.
Q47. What is a Kubernetes pod and how does it differ from a container?
A pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share a network namespace (same IP address) and can share volumes. Containers in a pod are always scheduled together on the same node. Most pods contain one main container plus optional sidecar containers. The pod abstraction exists to support the sidecar pattern and init containers without exposing container runtime details to the scheduler.
Q48. How do you manage secrets in a cloud-native architecture?
Never store secrets in environment variables baked into container images or in unencrypted configuration files. Use a secrets manager: AWS Secrets Manager, HashiCorp Vault, or Kubernetes Secrets encrypted at rest with KMS. Inject secrets into pods at runtime via volume mounts or environment variables from the secrets manager. Rotate secrets automatically. Audit who accessed which secret and when. Treat a leaked secret as compromised immediately regardless of whether you see evidence of misuse.
Q49. What is a multi-region architecture and when do you need it?
A multi-region architecture runs the application in multiple geographic regions. Use it for: compliance (data residency requirements), latency (serve users from the nearest region), and disaster recovery (survive a full regional outage). It is complex: data replication across regions introduces consistency challenges, and traffic routing (latency-based DNS, Anycast) requires global load balancing. Most organisations do not need active-active multi-region; active-passive (failover) is sufficient and far simpler.
Q50. What is a CDN and how does it improve system performance?
A content delivery network caches static assets (images, JS, CSS) at edge nodes globally. When a user requests an asset, they are served from the nearest edge node rather than the origin server, reducing latency from hundreds of milliseconds to single digits. CDNs also absorb large traffic spikes and DDoS attacks at the edge. Dynamic content can also be accelerated by routing requests over the CDN's optimised network backbone, even when caching is not applicable.
7. Security Architecture
Q51. What is the principle of least privilege and how do you apply it?
Every process, service, and user account should have only the permissions required to perform its function and no more. Apply it by: giving each microservice its own IAM role with only the policies it needs, using read-only database users for services that only query, scoping API keys to specific operations, and regularly auditing and removing unused permissions. The blast radius of a compromise is proportional to the permissions held by the compromised identity.
Q52. What is defence in depth?
Defence in depth places multiple independent security controls at different layers of the stack so that the failure of one control does not expose the system to compromise. Example layers: network (VPC, security groups, WAF), application (input validation, parameterised queries, output encoding), identity (MFA, short-lived credentials), data (encryption at rest and in transit, field-level encryption for PII), and monitoring (anomaly detection, alerting on unusual access patterns).
Q53. How do you design authentication and authorisation in a microservices architecture?
Use an identity provider (Auth0, Cognito, Keycloak) to issue signed JWTs at login. Each service validates the JWT signature against the IdP's public key — no network call required. Extract claims (user ID, roles, scopes) from the token and apply authorisation logic locally. Use short token expiry (15 minutes) with refresh tokens. For service-to-service calls, use mutual TLS or service accounts with short-lived credentials rather than sharing user tokens.
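Local signature validation is the key property: no call to the IdP per request. The sketch below verifies an HS256 token using only the standard library to make the mechanics visible; production services use a JWT library and typically RS256 against the IdP's published public key, plus expiry, audience, and issuer checks.

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a minimal HS256 JWT (illustrative; use a library in production)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(secret, header + b"." + payload,
                          hashlib.sha256).digest())
    return b".".join([header, payload, sig]).decode()

def verify_hs256(token: str, secret: bytes) -> dict:
    """Verify the signature locally and return the claims.
    No network call: this is why JWT validation scales in microservices."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = b64url(hmac.new(secret, signing_input,
                               hashlib.sha256).digest())
    if not hmac.compare_digest(expected, sig_b64.encode()):
        raise ValueError("invalid signature")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

With RS256 the IdP signs with its private key and every service verifies with the public key, so no shared secret needs distributing to services.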
Q54. What is a zero-trust network architecture?
Zero trust assumes no implicit trust based on network location. Every request must be authenticated and authorised regardless of whether it comes from inside or outside the corporate network. Implement with: mutual TLS for all service-to-service communication, strong identity verification for every API call, micro-segmentation to limit lateral movement, and continuous monitoring of all traffic. VPN access to a trusted network zone is replaced by per-application access policies.
Q55. How do you protect against SQL injection in application architecture?
Use parameterised queries or prepared statements in all database access code — never concatenate user input into SQL strings. Use an ORM that parameterises by default. Validate and sanitise all user inputs at the system boundary. Run the application with a database user that has the minimum required privileges (no DROP TABLE). Automated security scanning (SAST) in CI should flag any dynamic query construction.
Q56. What is OAuth 2.0 and how does it work?
OAuth 2.0 is an authorisation framework that allows a user to grant a third-party application limited access to their account on another service without sharing credentials. The flow: the application redirects the user to the authorisation server, where the user authenticates and consents to the requested scopes; the application receives an authorisation code, exchanges it for an access token, and uses the token to call the resource server. Tokens are scoped and short-lived.
Q57. What is threat modelling and when should you do it?
Threat modelling identifies potential security threats in a design before it is implemented. Do it during design, not after. The STRIDE framework categorises threats: Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege. For each component in your architecture, ask which STRIDE threats apply and what mitigations exist. Repeat when the architecture changes significantly. Threat modelling shifts security left and is far cheaper than finding vulnerabilities in production.
8. Leadership, Communication, and Trade-offs
Q58. How do you communicate an architecture decision to non-technical stakeholders?
Anchor on business outcomes: reliability (fewer outages), time to market (faster feature delivery), cost (infrastructure spend). Use analogies that map to familiar concepts. Avoid acronyms unless the stakeholder uses them first. Present the trade-off as a business decision: "We can have X benefit if we accept Y cost or delay." Let the stakeholder choose with full information rather than advocating for a technical preference.
Q59. How do you handle disagreement with a senior engineer about an architectural direction?
Acknowledge their experience and concern before presenting your counterargument. Ask clarifying questions — they may have constraints you are not aware of. Present your position with data: benchmarks, production incident data, or reference architectures from companies at similar scale. If agreement cannot be reached, propose a time-boxed proof of concept that produces concrete evidence for the decision. Document the final decision and the dissenting view in an ADR.
Q60. How do you decide when to pay down technical debt?
Technical debt should be paid when: it is slowing down feature delivery measurably, it has caused or nearly caused a production incident, or refactoring is cheaper now than later because the system is about to be significantly extended. Quantify the carrying cost (hours of developer time lost per sprint due to the debt) and present it to the business as a velocity investment. Never allow technical debt to accumulate silently — it must be tracked, prioritised, and periodically reviewed.
Q61. What is your process for evaluating a new technology before adopting it?
Define the problem it solves and check whether an existing tool in your stack solves it adequately. Run a time-boxed proof of concept against a production-representative workload. Evaluate: operational complexity (who will run it at 2am?), community and vendor support, licence compatibility, and security posture. Check reference deployments at organisations at your scale. Present findings in an ADR with a recommendation and the alternatives considered.
Q62. How do you ensure consistency of architectural standards across multiple teams?
Establish a lightweight Architecture Review Board that reviews high-impact decisions before implementation, not after. Publish an Architecture Decision Log that teams can browse. Create golden-path templates for common patterns (new service scaffolding, CI/CD pipeline) so the standard path is also the easy path. Run regular architecture guild meetings where teams share what they have learned. Enforce non-negotiable standards (security scanning, observability) in the CI pipeline where possible.
Q63. How do you approach the build vs buy decision?
Build when: the capability is a core differentiator, no commercial product meets your requirements, or the long-term total cost of ownership of a commercial product exceeds building. Buy when: the problem is commodity (authentication, payments, email), a commercial product meets 90%+ of requirements, and integration cost is lower than build cost. Be honest about hidden build costs: initial engineering, ongoing maintenance, documentation, and opportunity cost of not building product features instead.
Q64. How do you manage scope creep during an architecture engagement?
Define scope and success criteria in writing before starting. When a new requirement emerges, evaluate it explicitly: impact on timeline, cost, and risk. Log it in a decisions register with its disposition (accepted, deferred, rejected). Communicate scope changes to stakeholders with updated estimates before absorbing the work. Prevent scope creep from being invisible — every change must have a named decision-maker who accepts responsibility for the impact.
Q65. Describe a situation where you made an architectural decision that turned out to be wrong.
Structure the answer: describe the context and the decision made, explain what evidence looked like at the time (make clear it was reasonable), describe the failure mode that appeared, and explain what you did to correct it and what you would do differently. Panels ask this to assess self-awareness and whether you learn from failure. A candidate who cannot produce an example is either inexperienced or not self-aware — neither is attractive.
9. Reliability and Operations
Q66. What is the difference between RTO and RPO?
RTO (Recovery Time Objective) is how long the business can tolerate the system being unavailable after a failure. RPO (Recovery Point Objective) is how much data loss the business can tolerate (measured in time from the last backup to the failure). A payment system might have an RPO of 0 (zero data loss) and an RTO of 5 minutes. A reporting system might have an RPO of 1 hour and an RTO of 4 hours. Architecture decisions (synchronous vs asynchronous replication, backup frequency) must align with these objectives.
Q67. What is chaos engineering?
Chaos engineering deliberately introduces failures into a production or staging system to discover weaknesses before customers do. The practice was pioneered by Netflix's Chaos Monkey, which randomly terminated EC2 instances. Modern chaos engineering tools (Chaos Toolkit, Gremlin, AWS Fault Injection Simulator) test network latency injection, CPU saturation, dependency failures, and region outages. Run experiments with a hypothesis, measure the impact, and fix the weaknesses discovered. Never run chaos experiments in production without executive awareness and an incident response team on standby.
Q68. What is an SLA, SLO, and SLI?
SLI (Service Level Indicator): a metric that measures service behaviour, e.g., request success rate. SLO (Service Level Objective): the target value for the SLI, e.g., 99.9% success rate over 30 days. SLA (Service Level Agreement): a contractual commitment to an external party, typically less aggressive than the SLO to leave a buffer. Set SLOs first, then negotiate SLAs from them. Use error budgets (the acceptable amount of unreliability) to balance feature velocity against reliability investment.
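The error-budget arithmetic is simple enough to sketch. A minimal illustration (the function name and defaults are my own, not from any SRE library):

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unreliability a given SLO permits over the window."""
    return window_days * 24 * 60 * (1 - slo)

# A 99.9% SLO allows roughly 43 minutes of downtime per 30 days;
# 99.99% allows roughly 4.3 minutes.
print(round(error_budget_minutes(0.999), 1))   # 43.2
print(round(error_budget_minutes(0.9999), 1))  # 4.3
```

If the budget is spent, reliability work takes priority over features; if it is largely unspent, the team can ship faster or run riskier experiments.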
Q69. What is observability and how does it differ from monitoring?
Monitoring asks predefined questions about known failure modes ("is CPU over 90%?"). Observability is the property of a system that allows you to understand its internal state from external outputs alone — even for failure modes you did not predict. The three pillars of observability are metrics (aggregated numeric measurements), logs (discrete events with context), and traces (request paths across service boundaries). A system is observable if you can debug a novel failure from its outputs without deploying new instrumentation.
Q70. How do you implement distributed tracing?
Add a trace context (trace ID, span ID) to every inbound request at the system entry point. Propagate the trace context in all outbound calls (HTTP headers, message metadata). Each service creates a child span that records the operation start time, end time, and relevant attributes. Export spans to a tracing backend (Jaeger, Zipkin, AWS X-Ray, Datadog). This allows you to reconstruct the full request path across services for any individual request and identify which service introduced latency.
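The propagation mechanics above can be sketched in a few functions, using a simplified form of the W3C `traceparent` header (real IDs have fixed lengths and validation; the helper names here are illustrative):

```python
import uuid

def new_context():
    """Root context, created once at the system entry point."""
    return {"trace_id": uuid.uuid4().hex, "span_id": uuid.uuid4().hex[:16]}

def start_child_span(parent):
    """Child spans share the trace ID; parent_span_id records causality."""
    return {"trace_id": parent["trace_id"],
            "parent_span_id": parent["span_id"],
            "span_id": uuid.uuid4().hex[:16]}

def outbound_headers(span):
    # Simplified W3C traceparent: version "00", trace-flags "01" (sampled)
    return {"traceparent": f"00-{span['trace_id']}-{span['span_id']}-01"}

def context_from_headers(headers):
    """What the next service does with an inbound request."""
    _, trace_id, span_id, _ = headers["traceparent"].split("-")
    return {"trace_id": trace_id, "span_id": span_id}
```

Because every hop carries the same trace ID, the tracing backend can join the spans from all services into one request tree.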
Q71. What is a runbook and why should architects care about it?
A runbook is a document describing how to respond to a specific operational event: an alert fires, an incident is declared, a deployment fails. Architects should care because systems they design will be operated by on-call engineers at 2am who do not have the architect's context. Designing operability into a system means: providing meaningful alerts with runbook links, ensuring log messages include enough context to diagnose the alert, and writing runbooks that a junior engineer can follow without calling the architect.
10. API Design
Q72. What is the difference between REST, GraphQL, and gRPC?
REST uses HTTP verbs and URLs; resources are identified by URIs; responses are typically JSON. GraphQL uses a single endpoint and a query language; clients specify exactly what data they need, eliminating over-fetching and under-fetching. gRPC uses HTTP/2 and Protocol Buffers; it is strongly typed, supports streaming, and is faster than JSON-based REST — ideal for internal service communication. Use REST for public APIs, GraphQL for complex client-driven data needs, and gRPC for high-performance internal service calls.
Q73. What makes an API RESTful?
A REST API follows six constraints: stateless (no session state on the server between requests), uniform interface (resources identified by URIs, standard HTTP verbs, hypermedia as the engine of application state), client-server separation, cacheable responses, layered system (client cannot tell if it is connected to the origin server or an intermediary), and optionally code on demand (server can send executable code). In practice, most "REST" APIs are Level 2 on the Richardson Maturity Model (resources + HTTP verbs) without hypermedia.
Q74. How do you version an API?
Three common strategies. URI versioning (/api/v1/users) is simple and visible but couples version to the URL. Header versioning (Accept: application/vnd.api+json;version=1) keeps URLs clean but is less visible. Query parameter versioning (/api/users?version=1) is convenient but pollutes URLs. Choose URI versioning for public APIs (discoverability and simplicity outweigh purity). Maintain at least one previous major version concurrently and give consumers a migration timeline before deprecating.
Q75. What is idempotency and why is it important in API design?
An idempotent operation has the same effect whether it is called once or many times with the same input. GET, PUT, and DELETE are idempotent; POST is not by default. Idempotency matters because networks fail: clients retry requests when they do not receive a response. If the server processed the first request but the response was lost, the retry must not cause a double effect. Implement idempotency for non-idempotent operations using a client-supplied idempotency key that the server checks before processing.
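A minimal sketch of the idempotency-key check; an in-memory dict stands in for the database or cache a real server would use, and the names are illustrative:

```python
_store = {}  # idempotency_key -> stored response

def handle_charge(idempotency_key, amount, charge_fn):
    """Return the stored response on a retry instead of charging twice."""
    if idempotency_key in _store:
        return _store[idempotency_key]
    response = charge_fn(amount)
    _store[idempotency_key] = response
    return response

charges = []
def charge_fn(amount):
    charges.append(amount)  # the side effect we must not repeat
    return {"status": "charged", "amount": amount}

first = handle_charge("key-123", 100, charge_fn)
retry = handle_charge("key-123", 100, charge_fn)  # lost-response retry
assert first == retry and len(charges) == 1       # charged exactly once
```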
11. Domain-Driven Design
Q76. What is a bounded context?
A bounded context defines the boundary within which a specific domain model is consistent and valid. The same term can mean different things in different bounded contexts: "Customer" in the billing context has different attributes and behaviour than "Customer" in the CRM context. Bounded contexts map naturally to microservices boundaries — each service owns one bounded context. The interfaces between bounded contexts are defined by context maps.
Q77. What is a domain event?
A domain event is a record that something significant happened in the domain: OrderPlaced, PaymentFailed, InventoryReserved. Domain events are immutable facts in the past tense. They are the primary mechanism for communication between bounded contexts in an event-driven architecture. They should carry all the information a consumer needs to react, without requiring the consumer to query back to the originating service.
Q78. What is the difference between an entity and a value object?
An entity has a unique identity that persists across state changes (a Customer with a CustomerID). A value object has no identity; it is defined entirely by its attributes and is interchangeable with any other value object that has the same attribute values (a Money object with amount and currency). Value objects are immutable. Prefer value objects where identity is not meaningful — they are simpler, thread-safe, and easier to test.
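The Money example translates directly into code; a minimal sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutability: value objects never change after creation
class Money:
    amount: int    # minor units (pence/cents) to avoid float rounding errors
    currency: str

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)

# Equality by attributes, not identity: two Money(500, "GBP") are interchangeable.
assert Money(500, "GBP") == Money(500, "GBP")
```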
Q79. What is an aggregate in DDD?
An aggregate is a cluster of domain objects (entities and value objects) treated as a single unit for data changes. Each aggregate has one root entity (the aggregate root) which is the only entry point for external operations. Transactions should not span aggregate boundaries. Aggregates are persisted atomically. The boundary of an aggregate is defined by the consistency rule that must hold within it — if two objects must always be consistent with each other, they belong in the same aggregate.
12. Advanced and Situational Questions
Q80. How would you migrate a monolith to microservices?
Use the strangler fig pattern. Map the monolith's bounded contexts first — these become your eventual service boundaries. Start with the domain that has the highest independent deployment value or the most scaling pressure. Extract it into a separate service behind an API that the monolith calls. Move data ownership to the new service over time using the dual-write pattern. Repeat until the monolith is fully replaced or has shrunk to a maintainable core.
Q81. How do you design for failure in a distributed system?
Assume every network call will fail eventually. Design each service to be stateless so instances are replaceable. Add retries with exponential backoff and jitter on all outbound calls. Set timeouts on every external call — an infinite wait is always worse than a fast failure. Use circuit breakers to stop calling a failing dependency. Design the user experience for degraded mode: what can the user still do if one service is down?
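The retry advice above can be sketched as exponential backoff with full jitter; the `sleep` parameter is injected so the behaviour can be exercised without actually waiting:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Exponential backoff with full jitter; re-raise after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the capped backoff
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The jitter matters: without it, every client retries at the same instant and the recovering dependency is hit by a synchronised thundering herd.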
Q82. How do you ensure data consistency across microservices without a distributed transaction?
Use eventual consistency with the saga pattern for multi-step operations. Use the outbox pattern to make event publication atomic with local state changes. Design compensating transactions for saga rollback scenarios. Accept that there will be windows of inconsistency between services — design the user experience so brief inconsistency is invisible or clearly communicated. If strong consistency is truly required across service boundaries, question whether those services should be separate.
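The outbox pattern mentioned above can be sketched with SQLite to show the key property: the state change and the event row commit in one local transaction, and a separate relay publishes unpublished rows. Table and function names are illustrative:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT,"
           " published INTEGER DEFAULT 0)")

def place_order(order_id):
    # One local transaction: either both rows commit or neither does.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"type": "OrderPlaced", "order_id": order_id}),))

def relay(publish):
    # A separate poller pushes unpublished rows to the broker, then marks them.
    for row_id, payload in db.execute(
            "SELECT id, payload FROM outbox WHERE published = 0").fetchall():
        publish(json.loads(payload))
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()
```

If the broker is down, the event simply waits in the outbox; the relay delivers it later, giving at-least-once publication without a distributed transaction.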
Q83. What is the two-phase commit protocol and why is it rarely used in microservices?
Two-phase commit (2PC) coordinates a distributed transaction across multiple nodes: the coordinator sends a prepare message to all participants, waits for acknowledgement, then sends commit or rollback. If any participant fails to prepare, all roll back. Rarely used in microservices because: it blocks resources across services during the prepare phase, the coordinator is a single point of failure, and it is synchronous, reducing availability. The saga pattern achieves similar outcomes without these drawbacks.
Q84. How would you design the architecture for a real-time collaborative document editor?
Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs) handle concurrent edits from multiple clients. With OT, each client sends operations to a central server that transforms concurrent operations to maintain consistency and broadcasts the transformed operation to all other clients; CRDTs instead merge concurrent edits deterministically, so no central transformation step is required. The server maintains the authoritative document state in memory for low latency, with writes persisted to a database for durability. WebSocket connections provide the real-time channel between clients and server.
Q85. What is a write-ahead log and how is it used?
A write-ahead log (WAL) is an append-only log where changes are written before they are applied to the main data store. It provides durability (if the process crashes after writing the WAL but before applying the change, the change can be replayed on restart) and atomicity (partial writes are detectable and recoverable). PostgreSQL, Kafka, and ZooKeeper all use WAL-based designs. The WAL is also used for replication: replicas replay the primary's WAL to stay in sync.
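A toy illustration of the write-ahead idea: append the change to the log before applying it, and rebuild state by replaying the log. A real WAL is an fsync'd append-only file, not a Python list, and the class here is purely illustrative:

```python
import json

class TinyKV:
    """Minimal WAL sketch: log first, apply second, replay to recover."""
    def __init__(self):
        self.wal = []   # stands in for an fsync'd append-only file
        self.data = {}

    def put(self, key, value):
        # Write-ahead: the log entry is durable before the state changes.
        self.wal.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.data[key] = value

    @classmethod
    def recover(cls, wal):
        """Rebuild state after a crash by replaying the log in order."""
        store = cls()
        store.wal = list(wal)
        for entry in wal:
            record = json.loads(entry)
            store.data[record["key"]] = record["value"]
        return store
```

Replication falls out of the same mechanism: a replica is just another process replaying the primary's log.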
Q86. How do you design a multi-tenant SaaS architecture?
Three isolation models. Silo: each tenant has dedicated infrastructure (database, compute). Highest isolation, highest cost. Bridge: shared compute, tenant-specific database. Good balance of isolation and efficiency. Pool: all tenants share the same database with tenant_id columns on every table. Lowest cost, lowest isolation. Choose based on tenant requirements (compliance, performance SLA, price sensitivity). Add row-level security in the database to prevent data leakage in the pool model.
Q87. How do you handle schema evolution in an event-driven system?
Consumers must be able to process events produced by old and new versions of the producer. Use schema evolution strategies: backward compatible (new fields are optional, old fields are never removed), forward compatible (consumers ignore unknown fields). Use a schema registry (Confluent Schema Registry, AWS Glue) to enforce compatibility rules before new schemas are deployed. Treat breaking schema changes as a new event type, not a modification to the existing type.
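Both compatibility rules can be sketched in consumer code: new fields carry defaults (backward compatible) and unknown fields are dropped (forward compatible). The event and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class OrderPlaced:
    order_id: str
    currency: str = "GBP"  # added later; optional default, so old events still parse

    @classmethod
    def from_payload(cls, payload: dict) -> "OrderPlaced":
        known = cls.__dataclass_fields__.keys()
        # Forward compatibility: silently drop fields this consumer doesn't know
        return cls(**{k: v for k, v in payload.items() if k in known})

v1 = OrderPlaced.from_payload({"order_id": "o-1"})                 # old producer
v3 = OrderPlaced.from_payload({"order_id": "o-2", "currency": "EUR",
                               "channel": "web"})                  # newer producer
assert v1.currency == "GBP" and v3.currency == "EUR"
```

A schema registry enforces the same rules centrally, rejecting an incompatible schema before any producer can deploy it.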
Q88. What is a dead-letter queue and when should you use it?
A dead-letter queue (DLQ) receives messages that a consumer fails to process after a configured number of retries. It separates poison messages (malformed data, unexpected format) from the main queue so they do not block processing of valid messages. Monitor the DLQ and alert on new messages. Inspect DLQ messages to determine whether the issue is a data quality problem (fix the producer), a consumer bug (fix and replay), or a transient infrastructure issue (replay directly).
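The retry-then-park behaviour can be sketched in a few lines; plain lists stand in for real queues, and the names are illustrative:

```python
def consume(queue, dlq, handler, max_retries=3):
    """Retry each message; park persistent failures in the dead-letter queue."""
    while queue:
        msg = queue.pop(0)
        for attempt in range(max_retries):
            try:
                handler(msg)
                break
            except Exception:
                if attempt == max_retries - 1:
                    dlq.append(msg)  # poison message: park it, keep the queue moving
```

The valid messages around the poison one are still processed, which is the whole point: one malformed payload must not halt the stream.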
Q89. How do you optimise a slow database query?
Profile first: use EXPLAIN ANALYZE to see the query plan. Common fixes: add an index on the columns used in WHERE, JOIN, and ORDER BY clauses, rewrite the query to avoid full table scans, partition the table by date range if the query always filters by date, use a covering index if the query only needs columns in the index, or move expensive aggregations to a pre-computed summary table updated by a batch job.
Q90. What is a content delivery network's impact on architecture?
A CDN offloads origin traffic for static assets and reduces global latency by serving from edge nodes. Architecturally, it introduces cache invalidation complexity: when you deploy a new version of a static asset, you must purge the CDN cache or use cache-busting file names (hashing the file content into the filename). For APIs, CDNs can cache responses for read-heavy endpoints (product listings, public prices) reducing origin load by orders of magnitude.
Q91. How do you approach observability for a new microservice?
From day one: structured logs with a correlation ID in every log line, request/response metrics (rate, error rate, latency histogram) exposed as a Prometheus endpoint, and health check endpoints (/health/live and /health/ready). Add distributed tracing with automatic span creation for inbound requests and explicit spans for external calls. Define an SLO and create an alert on SLO burn rate rather than raw error count. Document the runbook link in the alert message.
Q92. What is a feature flag and how does it affect architecture?
A feature flag is a configuration switch that enables or disables a feature at runtime without a deployment. It allows: trunk-based development (incomplete features merged but disabled), canary releases (enabled for a percentage of users), A/B testing, and fast rollback (disable without redeployment). Architecturally, feature flags add conditional branches in code. Use a feature flag service (LaunchDarkly, Flagsmith) rather than environment variables to enable dynamic updates and per-user targeting. Remove flags when features are fully launched to avoid flag debt.
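Percentage rollouts are usually implemented with deterministic hashing so a given user gets a stable answer as the rollout grows. A minimal sketch of the idea, not how any particular flag service implements it:

```python
import hashlib

def flag_enabled(flag, user_id, rollout_percent):
    """Deterministic bucketing: a user stays in or out between requests."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_percent
```

Hashing the flag name together with the user ID also decorrelates experiments: the users in one flag's 10% bucket are not the same users in another's.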
Q93. How do you decide on service boundaries in a microservices architecture?
Align service boundaries with business capabilities, not technical layers. Each service should own its data, have a single team responsible for it, and change independently of other services. Start with coarse-grained boundaries (fewer, larger services) and split when a specific scaling, deployment, or team ownership problem justifies it. Use DDD bounded contexts as the primary tool for identifying boundaries. Services that always deploy together, share the same database, or frequently call each other are candidates for consolidation.
Q94. What is the SOLID principle most relevant to system architecture?
The Dependency Inversion Principle: high-level modules should not depend on low-level modules; both should depend on abstractions. At the system level this means: services should communicate through stable contracts (APIs, events), not by sharing implementation details (shared databases, shared libraries with business logic). This allows services to evolve independently as long as they honour their contracts. The Open/Closed Principle also applies at scale: extend the system by adding new services rather than modifying existing ones.
Q95. How do you handle cross-cutting concerns in microservices?
Cross-cutting concerns (logging, tracing, authentication, rate limiting, health checks) should be implemented consistently across all services. Options: bake them into a shared library (risk: tight coupling, forces synchronised upgrades), use a service mesh to handle them in the infrastructure layer (preferred for networking concerns), or build a platform team that provides opinionated service templates with cross-cutting concerns pre-wired. The service mesh approach keeps application code free of operational boilerplate.
Q96. What is the difference between saga orchestration and choreography?
In saga choreography, each service listens for events and publishes its own events — no central coordinator. In saga orchestration, an orchestrator service explicitly calls each step and handles failures. Choreography: fewer moving parts, higher coupling through shared event schemas, harder to trace the full flow. Orchestration: easier to understand and test the full flow, the orchestrator is a potential single point of failure, but the failure modes are more explicit. Prefer orchestration for complex sagas with many steps or rollback conditions.
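A minimal orchestrator sketch: each step pairs an action with its compensation, and a failure triggers the compensations for completed steps in reverse order. The structure is illustrative, not a production saga engine:

```python
def run_saga(steps):
    """Each step is an (action, compensation) pair. On failure, undo the
    already-completed steps in reverse order and report the rollback."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for compensate in reversed(completed):
                compensate()
            return "rolled_back"
        completed.append(compensation)
    return "committed"
```

The explicitness is the appeal of orchestration: the whole flow, including every rollback path, is readable in one place rather than scattered across event handlers.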
Q97. How do you design a system that must process 1 billion events per day?
1 billion events per day is ~11,500 events per second average, with peaks potentially 10× higher. Use Apache Kafka as the ingestion backbone (it scales horizontally, retains messages for replay, and handles millions of events per second). Partition topics by a key that distributes load evenly. Consumer groups read in parallel across partitions. For aggregation and analytics, stream processing with Apache Flink or Spark Streaming processes events in near-real-time. Store raw events in cold storage (S3) for replay and long-term analytics.
Q98. What is a hybrid cloud architecture?
A hybrid cloud architecture spans private on-premises infrastructure and one or more public clouds. Common use cases: regulatory requirements that mandate certain data stays on-premises, latency-sensitive workloads that must run near hardware, and burst capacity (run steady-state workloads on-premises, scale to the cloud during peaks). The complexity lies in networking (private connectivity via Direct Connect or ExpressRoute), identity federation (single sign-on across environments), and operational consistency (same tooling and standards on both sides).
Q99. How do you architect for GDPR compliance?
Data minimisation: collect only the data you need. Consent management: record consent with timestamp and version; withdrawal must be honoured. Right to erasure: implement a deletion service that can remove a user's personal data from all stores — including backups and event logs (pseudonymisation before archiving simplifies this). Data portability: export a user's data in machine-readable format on request. Data processing register: document every category of personal data, its purpose, legal basis, and retention period. Privacy by design: make compliance decisions during architecture, not after implementation.
Q100. What is your approach to architecture documentation?
Documentation that is separate from code rots quickly. Embed architecture in the repository: C4 diagrams as code (Structurizr DSL, Mermaid), ADRs in /docs/adr/, and API contracts as OpenAPI specs generated from code annotations. Keep prose documentation minimal: one README per service explaining its purpose, dependencies, and how to run it. Write documentation for the next person who will operate the system at 3am — operational runbooks, not design philosophies.
13. Five Bonus Questions for Senior Architect Panels
Q101. How do you evaluate whether a rewrite is justified over incremental refactoring?
A rewrite is justified only when incremental improvement cannot fix the fundamental problem: the data model is wrong, the technology is end-of-life with no migration path, or the cost of building new features is higher than starting fresh. Even then, prefer a partial rewrite (strangler fig) over a full rewrite. Full rewrites routinely double in time and cost while replicating the bugs of the original system. The rewrite that starts as a 6-month project and ships 2 years later is a common failure mode.
Q102. How do you assess the maturity of a team's architecture practices during a technical due diligence?
Look for: ADRs or documented architectural decisions (even informal), automated test coverage including integration and contract tests, a deployment pipeline that enforces quality gates, monitoring with defined SLOs and active alerting, a post-incident review process, and evidence that the team can deploy and roll back safely without manual steps. The absence of any of these is a risk signal. The absence of all of them is a project risk, not just a technical risk.
Q103. Describe how you would reduce coupling between two services that currently share a database.
Step 1: identify which tables each service reads and writes. Step 2: agree on data ownership — each table belongs to exactly one service. Step 3: the non-owning service calls the owning service's API (or subscribes to its events) instead of querying the table directly. Step 4: migrate the reads one by one, verifying behaviour at each step. Step 5: remove the non-owner's database credentials once all reads are migrated. This is months of careful work, not a sprint task.
Q104. What metrics would you use to measure the success of an architecture migration?
Deployment frequency (are teams deploying more independently?), lead time for changes (time from commit to production), change failure rate (percentage of deployments causing incidents), MTTR (time to restore service after an incident), and team cognitive load (qualitative — do engineers understand the system they are working in?). These are the DORA metrics plus team health. Technical metrics (coupling, cyclomatic complexity) are useful leading indicators but business outcomes are the ultimate measure.
Q105. If you joined a company and found the architecture was significantly different from what was described in interviews, what would you do?
Spend the first 30 days observing before proposing changes. Map what actually exists (not what the diagrams show). Identify the pain points the current team lives with daily. Build trust with the engineering team — they have context you lack. After 30 days, present a pragmatic assessment: what is working well, what is the highest-risk debt, and a phased improvement roadmap with business justification. Avoid arriving with a rewrite plan — it signals you are not listening.
Glossary of Key Architecture Terms
| Term | Definition |
|---|---|
| ADR | Architecture Decision Record — a document capturing a significant design decision and its rationale |
| CQRS | Command Query Responsibility Segregation — separate models for reads and writes |
| CAP Theorem | Consistency, Availability, Partition tolerance — a distributed system can guarantee at most two |
| DDD | Domain-Driven Design — aligning software model with business domain |
| Event Sourcing | Storing state as a sequence of events rather than current values |
| Idempotency | Operation that produces the same result when called multiple times with the same input |
| Outbox Pattern | Atomically publishing events by writing them to a local table before relaying to a broker |
| Saga | Distributed transaction pattern using compensating transactions for rollback |
| SLO | Service Level Objective — target reliability metric for a service |
| Strangler Fig | Migration pattern that incrementally replaces a legacy system |
Practice System Design Prompts
Use these for mock interviews — set a 45-minute timer and design before reading any answer resources:
- Design a ride-sharing platform (Uber-scale: 10 million rides per day)
- Design a distributed file storage system (Dropbox-scale)
- Design a real-time leaderboard for a multiplayer game (10 million concurrent users)
- Design a webhook delivery system (guaranteed at-least-once delivery)
- Design an e-commerce inventory system (prevent overselling during flash sales)
- Design a distributed logging system (1 TB of logs per day, 30-day retention)
- Design a CI/CD pipeline orchestration system
- Design a fraud detection system for payments (decisions in under 100ms)
Ready to Master Software Architecture?
The Architecture Masterclass takes you from senior engineer to confident architect with hands-on system design walkthroughs, ADR templates, and pattern deep-dives.
Start the Architecture Masterclass →