Event-Driven Architecture: Scaling Beyond Request-Response with Kafka, Idempotency & Choreography

Table of Contents
- Events vs Commands vs Queries
- Pub/Sub vs Competing Consumers: When to Use Each
- Kafka vs RabbitMQ: The Honest Comparison
- Designing Events: Schema and Versioning
- Idempotency: Handling At-Least-Once Delivery
- Dead-Letter Queues: Graceful Failure Handling
- Choreography vs Orchestration
- The Transactional Outbox Pattern
- Message Ordering: Partitions and Sequence Numbers
- Observability: Distributed Tracing in EDA
- Frequently Asked Questions
- Key Takeaway
Events vs Commands vs Queries
Understanding the three types of messages prevents architectural confusion:
| Type | Direction | Semantic | Example | Response? |
|---|---|---|---|---|
| Command | One producer → one consumer | "Do this" — an instruction | CreateInvoiceCommand | Yes (success/failure) |
| Event | One producer → many consumers | "This happened" — a fact | UserRegisteredEvent | No — fire and forget |
| Query | One requester → one responder | "Tell me this" | GetUserByIdQuery | Yes (data) |
Why events are different:
- A Command expects something specific to happen — it fails if the consumer rejects it
- An Event records that something already happened — producers don't control or wait for reactions
- Events are past tense (OrderCreated, PaymentReceived) — they are immutable historical facts
Pub/Sub vs Competing Consumers: When to Use Each
Pub/Sub: One event → all consumer groups receive it. Each consumer group processes all events independently. New consumer groups can be added without affecting the producer.
Work Queue: One task → exactly one worker processes it. Adding workers increases throughput — workers compete for tasks.
In Kafka: Both models use the same topic. The difference is consumer group structure:
- Multiple consumer groups = Pub/Sub (each group gets all messages)
- Multiple consumers in one group = Competing Consumers (partitions distributed)
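The two consumer-group structures can be sketched in plain Python. This is an in-memory model of the assignment semantics, not a real Kafka client — the partition and group names are illustrative:

```python
from collections import defaultdict

def assign_partitions(partitions, consumers):
    """Round-robin partition assignment within ONE consumer group."""
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)

partitions = ["p0", "p1", "p2", "p3"]

# Competing consumers: one group, two members -> partitions are split,
# so each message is processed by exactly one worker.
work_queue = assign_partitions(partitions, ["worker-a", "worker-b"])

# Pub/Sub: two independent groups -> each group is assigned ALL
# partitions, so each group sees every message.
pub_sub = {
    group: assign_partitions(partitions, [f"{group}-consumer"])
    for group in ["billing", "analytics"]
}
```

Note how the producer side is identical in both cases; only the consumer-group layout changes which model you get.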
Kafka vs RabbitMQ: The Honest Comparison
| Feature | Apache Kafka | RabbitMQ |
|---|---|---|
| Model | Append-only log (events stored for days/weeks) | Queue (messages deleted after consumption) |
| Throughput | Millions of messages/second | Tens of thousands of messages/second (hardware- and workload-dependent) |
| Message replay | Yes — any consumer group can replay from any offset | No — once consumed, gone |
| Message routing | Topic + partition key | Exchange + routing key (very flexible) |
| Message ordering | Guaranteed within a partition | Guaranteed within a queue |
| Retention | Configurable (days, weeks, forever) | Until acknowledged |
| Perfect for | Event streaming, audit logs, big data pipelines | Task queues, RPC, complex routing |
| Operational complexity | High (ZooKeeper/KRaft, tuning required) | Lower (management UI, simple ops) |
Rule of thumb: Use Kafka when you need replay, multiple independent consumers, or high volume. Use RabbitMQ when you need flexible routing, request/reply, or simple task queues.
Designing Events: Schema and Versioning
Events are contracts between services. Breaking changes to event schemas break all consumers.
Good event design principles:
- Name events in the past tense (OrderCreated, not CreateOrder) — they record facts
- Give every event a unique ID and timestamp so consumers can deduplicate and order
- Carry an explicit schema version so consumers can branch on format changes
- Make the payload self-contained — consumers should not have to call back to the producer
- Evolve schemas additively: add optional fields with defaults, never rename or remove existing ones
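As a sketch, an event envelope following these principles might look like this — the field names are a common convention, not a standard:

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type, payload, version=1):
    """Build an event envelope (illustrative field names)."""
    return {
        "eventId": str(uuid.uuid4()),      # unique ID lets consumers deduplicate
        "eventType": event_type,           # past tense: a fact, not a command
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "schemaVersion": version,          # consumers can branch on this
        "payload": payload,                # self-contained: no callbacks needed
    }

event = make_event("OrderCreated", {"orderId": "o-123", "totalCents": 4999})
print(json.dumps(event, indent=2))
```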
Evolving schemas safely (Avro + Schema Registry):
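Since the section names Avro with a Schema Registry, here is a minimal sketch of a backward-compatible change: the referralCode field (an illustrative name) is added as nullable with a default, so consumers still on the previous schema version keep working:

```json
{
  "type": "record",
  "name": "UserRegistered",
  "namespace": "com.example.events",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "referralCode", "type": ["null", "string"], "default": null}
  ]
}
```

With a compatibility mode enabled in the registry, adding an optional field with a default is accepted at registration time, while an incompatible change such as renaming a field without an alias is rejected before any producer can publish it.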
Idempotency: Handling At-Least-Once Delivery
Kafka guarantees at-least-once delivery — a message may be delivered multiple times if a consumer crashes before acknowledging. Your handlers must be idempotent (applying the same message twice has the same effect as once):
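A minimal sketch of an idempotent handler, tracking processed event IDs in memory — in production this check would live in a `processed_events` table committed in the same database transaction as the side effect (all names here are illustrative):

```python
# Track processed event IDs so a redelivered message is a no-op.
processed_ids = set()          # stand-in for a processed_events table
balances = {"acct-1": 0}       # stand-in for application state

def handle_payment_received(event):
    if event["eventId"] in processed_ids:
        return "skipped"                     # duplicate delivery: do nothing
    balances[event["payload"]["accountId"]] += event["payload"]["amountCents"]
    processed_ids.add(event["eventId"])      # mark done with the write, atomically
    return "applied"

event = {"eventId": "evt-42",
         "payload": {"accountId": "acct-1", "amountCents": 500}}

handle_payment_received(event)   # first delivery: balance becomes 500
handle_payment_received(event)   # redelivery: skipped, balance unchanged
```

Applying the message twice leaves the balance at 500 — the definition of idempotency above.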
Dead-Letter Queues: Graceful Failure Handling
When a consumer fails to process a message (validation error, downstream service down, unexpected data), it must not be lost or block the main queue:
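The retry-then-dead-letter flow can be sketched without a real broker — a message that keeps failing is parked on a DLQ with error context instead of being dropped or blocking the main queue (the handler and limits are illustrative):

```python
MAX_RETRIES = 3
dead_letter_queue = []   # stand-in for a real DLQ topic

def process_with_dlq(message, handler):
    """Retry the handler; after MAX_RETRIES failures, dead-letter the message."""
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            return handler(message)
        except Exception as exc:            # real code should catch narrowly
            last_error = exc
    dead_letter_queue.append({
        "message": message,                 # preserved for inspection and replay
        "error": str(last_error),
        "attempts": MAX_RETRIES,
    })
    return None

def flaky_handler(message):
    raise ValueError("downstream service unavailable")

process_with_dlq({"orderId": "o-1"}, flaky_handler)
```

Keeping the original message plus the failure reason on the DLQ is what makes later diagnosis and replay possible.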
Choreography vs Orchestration
Two patterns for coordinating multi-step workflows in EDA:
Choreography (Decentralized): Each service reacts to events and publishes its own events — no central coordinator. For example, OrderService publishes OrderCreated; PaymentService reacts and publishes PaymentReceived; ShippingService reacts to that and publishes OrderShipped.
✅ Highly decoupled. ❌ Hard to visualize the full workflow. Hard to handle failures.
Orchestration (Centralized): A dedicated Saga Coordinator sends commands to each service in turn and tracks workflow state — for example, ReserveStock, then ChargeCard, then ShipOrder — issuing compensating commands when a step fails.
✅ Visible workflow, easy failure compensation. ❌ The coordinator becomes a new dependency.
When to use each:
- Choreography: < 4 steps, well-understood failure modes
- Orchestration (Saga): Multi-step transactions requiring compensating actions
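The orchestrated saga with compensation can be sketched as follows — the coordinator runs steps in order and, on failure, executes compensating actions in reverse. The step names and handlers are illustrative, not a specific framework's API:

```python
def run_saga(steps, fail_at=None):
    """Each step is (name, action, compensation). Returns the event log."""
    log, completed = [], []
    for name, action, compensate in steps:
        if name == fail_at:                  # simulate a failing step
            log.append(f"{name}:failed")
            for done_name, done_compensate in reversed(completed):
                done_compensate()            # undo earlier steps, newest first
                log.append(f"{done_name}:compensated")
            return log
        action()
        completed.append((name, compensate))
        log.append(f"{name}:done")
    return log

state = {"reserved": 0, "charged": 0}
steps = [
    ("reserve_stock", lambda: state.update(reserved=1),
                      lambda: state.update(reserved=0)),
    ("charge_card",   lambda: state.update(charged=1),
                      lambda: state.update(charged=0)),
    ("ship_order",    lambda: None, lambda: None),
]

run_saga(steps, fail_at="ship_order")   # earlier steps get rolled back
```

Because all compensation logic lives in the coordinator, the failure path is visible in one place — the trade-off the table above describes.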
Frequently Asked Questions
How do I guarantee exactly-once delivery in Kafka?
Kafka's built-in Exactly-Once Semantics (EOS) uses idempotent producers (producer ID + sequence numbers) and transactional APIs (beginTransaction, commitTransaction). Combined with read_committed isolation on consumers, this ensures a message is processed exactly once as long as you use Kafka transactions correctly. However, EOS only applies within the Kafka ecosystem — once you write to an external system (database, REST API), you need application-level idempotency.
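The idempotent-producer half of EOS can be illustrated with a toy broker that deduplicates by producer ID and sequence number — a sketch of the mechanism described above, not the real broker code:

```python
class LogBroker:
    """Toy append-only log that drops duplicate (producer, sequence) writes."""
    def __init__(self):
        self.log = []
        self.last_seq = {}          # producer_id -> highest sequence appended

    def append(self, producer_id, seq, message):
        if self.last_seq.get(producer_id, -1) >= seq:
            return "duplicate"      # a retry of an already-appended write
        self.last_seq[producer_id] = seq
        self.log.append(message)
        return "appended"

broker = LogBroker()
broker.append("producer-1", 0, "OrderCreated")
broker.append("producer-1", 1, "PaymentReceived")
broker.append("producer-1", 1, "PaymentReceived")   # network retry: dropped
```

The producer can therefore retry blindly after a timeout; the broker, not the application, filters the duplicates — but only inside Kafka, which is why external writes still need the application-level idempotency covered earlier.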
What is the difference between a topic partition and a consumer group?
A partition is Kafka's unit of parallelism on the producer side — messages with the same key (e.g., userId) go to the same partition, guaranteeing per-key ordering. A consumer group is Kafka's unit of parallelism on the consumer side — each partition is assigned to exactly one consumer in the group, enabling parallel consumption without duplicate processing.
Key Takeaway
Event-Driven Architecture enables the loose coupling that makes large distributed systems resilient and scalable. The technical depth lies not in setting up a Kafka cluster, but in the operational patterns: schema versioning without breaking consumers, idempotent handlers for at-least-once delivery, dead-letter queues for unprocessable messages, and transactional outbox for atomic database-plus-event writes. Mastering these patterns is what separates an event-driven system that works in demos from one that runs reliably in production under real-world failure conditions.
Read next: CQRS & Event Sourcing: A Practical Implementation Guide →
Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.
