
Event-Driven Architecture: Scaling Beyond Request-Response with Kafka, Idempotency & Choreography

TopicTrick Team
Events vs Commands vs Queries

Understanding the three types of messages prevents architectural confusion:

| Type | Direction | Semantic | Example | Response? |
|------|-----------|----------|---------|-----------|
| Command | One producer → one consumer | "Do this" — an instruction | CreateInvoiceCommand | Yes (success/failure) |
| Event | One producer → many consumers | "This happened" — a fact | UserRegisteredEvent | No — fire and forget |
| Query | One requester → one responder | "Tell me this" | GetUserByIdQuery | Yes (data) |

Why events are different:

  • A Command expects something specific to happen — it fails if the consumer rejects it
  • An Event records that something already happened — producers don't control or wait for reactions
  • Events are past tense (OrderCreated, PaymentReceived) — they are immutable historical facts
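
To make the distinction concrete, here is a minimal sketch (the class names are illustrative, not from any framework) contrasting a command, which names an action the handler may reject, with an event, which records an immutable past fact:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CreateInvoiceCommand:
    # Imperative: "do this" — the handler may reject it and reply with failure.
    customer_id: str
    amount_pence: int

@dataclass(frozen=True)  # frozen: events are immutable historical facts
class UserRegisteredEvent:
    # Past tense: "this happened" — no response expected.
    user_id: str
    email: str
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

cmd = CreateInvoiceCommand(customer_id="cust-1", amount_pence=4999)
evt = UserRegisteredEvent(user_id="user-1", email="alice@example.com")
```

Attempting to mutate the event raises an error, which enforces the "immutable fact" semantic at the type level.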

Pub/Sub vs Competing Consumers: When to Use Each

Pub/Sub: One event → all consumer groups receive it. Each consumer group processes all events independently. New consumer groups can be added without affecting the producer.

Work Queue: One task → exactly one worker processes it. Adding workers increases throughput — workers compete for tasks.

In Kafka: Both models use the same topic. The difference is consumer group structure:

  • Multiple consumer groups = Pub/Sub (each group gets all messages)
  • Multiple consumers in one group = Competing Consumers (partitions distributed)
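
The two models can be simulated in a few lines of plain Python (no broker needed; the names are illustrative): every group independently sees every partition, while consumers inside one group split the partitions between them:

```python
def assign_partitions(partitions, consumers):
    """Round-robin partition assignment within ONE consumer group:
    each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]

# Pub/Sub: two separate groups — each is assigned ALL partitions independently.
analytics = assign_partitions(partitions, ["analytics-1"])
emailer = assign_partitions(partitions, ["email-1"])

# Competing consumers: two consumers in ONE group share the partitions.
workers = assign_partitions(partitions, ["worker-1", "worker-2"])

print(analytics)  # {'analytics-1': [0, 1, 2, 3]}
print(workers)    # {'worker-1': [0, 2], 'worker-2': [1, 3]}
```

Real Kafka assignors (range, round-robin, sticky) are more sophisticated, but the invariant is the same: within a group, a partition belongs to exactly one consumer at a time.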

Kafka vs RabbitMQ: The Honest Comparison

| Feature | Apache Kafka | RabbitMQ |
|---------|--------------|----------|
| Model | Append-only log (events stored for days/weeks) | Queue (messages deleted after consumption) |
| Throughput | Millions of messages/second | ~50K messages/second |
| Message replay | Yes — any consumer group can replay from any offset | No — once consumed, gone |
| Message routing | Topic + partition key | Exchange + routing key (very flexible) |
| Message ordering | Guaranteed within a partition | Guaranteed within a queue |
| Retention | Configurable (days, weeks, forever) | Until acknowledged |
| Perfect for | Event streaming, audit logs, big data pipelines | Task queues, RPC, complex routing |
| Operational complexity | High (ZooKeeper/KRaft, tuning required) | Lower (management UI, simple ops) |

Rule of thumb: Use Kafka when you need replay, multiple independent consumers, or high volume. Use RabbitMQ when you need flexible routing, request/reply, or simple task queues.


Designing Events: Schema and Versioning

Events are contracts between services. Breaking changes to event schemas break all consumers.

Good event design principles:

json
{
  "eventId": "7a9f0d2e-4b1c-4a8e-9c3d-123456789abc",
  "eventType": "user.registered.v2",
  "timestamp": "2026-04-17T12:00:00Z",
  "version": 2,
  "correlationId": "request-abc-123",
  "source": "identity-service",
  "payload": {
    "userId": "user-550e8400",
    "email": "alice@example.com",
    "registrationMethod": "GOOGLE_OAUTH",
    "country": "GB"
  }
}
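
A small helper (hypothetical, standard library only) can build this envelope so every producer emits the same fields with the same types:

```python
import uuid
from datetime import datetime, timezone

def make_event(event_type: str, version: int, source: str,
               payload: dict, correlation_id: str) -> dict:
    """Wrap a payload in the standard event envelope shown above."""
    return {
        "eventId": str(uuid.uuid4()),      # unique per event; enables dedup
        "eventType": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": version,
        "correlationId": correlation_id,   # ties the event back to a request
        "source": source,
        "payload": payload,
    }

event = make_event("user.registered.v2", 2, "identity-service",
                   {"userId": "user-550e8400", "country": "GB"},
                   correlation_id="request-abc-123")
```

Centralising envelope construction in one place means a schema change (say, adding a `traceId`) touches one function, not every producer.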

Evolving schemas safely (Avro + Schema Registry):

avro
// v1 schema — deployed first
{
  "type": "record",
  "name": "UserRegistered",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "email",  "type": "string"}
  ]
}

// v2 schema — added field with default (backward compatible!)
{
  "type": "record",
  "name": "UserRegistered",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "email",  "type": "string"},
    {"name": "country", "type": "string", "default": "UNKNOWN"}
    // No default = BREAKING CHANGE — old consumers can't deserialize
  ]
}
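
The mechanics behind backward compatibility can be sketched without Avro itself: the reader resolves an old record against its own (newer) schema, filling missing fields from defaults. This is a simplified stand-in for what a Schema Registry-aware deserializer does, not Avro's actual resolution algorithm:

```python
V2_FIELDS = [
    {"name": "userId"},
    {"name": "email"},
    {"name": "country", "default": "UNKNOWN"},
]

def read_with_schema(record: dict, fields: list) -> dict:
    """Resolve an old record against a newer reader schema:
    a missing field takes the schema default; no default is an error."""
    out = {}
    for f in fields:
        if f["name"] in record:
            out[f["name"]] = record[f["name"]]
        elif "default" in f:
            out[f["name"]] = f["default"]
        else:
            raise ValueError(f"missing field without default: {f['name']}")
    return out

v1_record = {"userId": "user-1", "email": "alice@example.com"}
resolved = read_with_schema(v1_record, V2_FIELDS)
print(resolved["country"])  # UNKNOWN — the default fills the gap
```

This is exactly why adding a field *without* a default is a breaking change: old records hit the `ValueError` branch.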

Idempotency: Handling At-Least-Once Delivery

Kafka's default delivery guarantee is at-least-once — a message may be redelivered if a consumer crashes after processing but before committing its offset. Your handlers must therefore be idempotent (applying the same message twice has the same effect as applying it once):

python
def handle_payment_received(event: dict):
    payment_id = event['paymentId']

    # Idempotency check: have we already processed this payment?
    if processed_events.exists(payment_id):
        logger.info(f"Duplicate payment event ignored: {payment_id}")
        return  # Safe to skip — we already processed this

    # Process the payment (first time only):
    with transaction():
        payment = Payment.from_event(event)
        payment_repository.save(payment)
        billing_service.reconcile(payment)
        # Mark as processed in the SAME transaction, so a crash cannot
        # leave the payment saved but unmarked (or vice versa):
        processed_events.insert(payment_id)
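
Stubbing the stores with in-memory structures (illustrative names, not a real repository API) makes the property testable: replaying the same event leaves the system in exactly the same state:

```python
processed_events: set[str] = set()
balances: dict[str, int] = {}

def handle_payment_received(event: dict) -> None:
    payment_id = event["paymentId"]
    if payment_id in processed_events:   # duplicate delivery, skip
        return
    balances[event["accountId"]] = (
        balances.get(event["accountId"], 0) + event["amount"])
    processed_events.add(payment_id)     # marked in the same "transaction"

event = {"paymentId": "pay-1", "accountId": "acct-1", "amount": 100}
handle_payment_received(event)
handle_payment_received(event)           # redelivered: no double charge
print(balances["acct-1"])  # 100
```

A useful unit test for any handler is precisely this: call it twice with the same event and assert the state is unchanged after the second call.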

Dead-Letter Queues: Graceful Failure Handling

When a consumer fails to process a message (validation error, downstream service down, unexpected data), it must not be lost or block the main queue:

python
# Kafka consumer with DLQ routing:
@kafka_consumer('orders.created')
def handle_order_created(message: Message):
    try:
        order = parse_order(message.value)
        process_order(order)
        message.commit()
    except ValidationError as e:
        # Permanent failure — route to the DLQ immediately:
        dlq_producer.send('orders.created.dlq',
                          value=message.value,
                          headers={'failure_reason': str(e),
                                   'original_offset': str(message.offset)})
        message.commit()  # Acknowledge to avoid infinite redelivery
    except TemporaryError as e:
        # Transient failure — back off, then retry (do NOT commit):
        time.sleep(exponential_backoff(message.retry_count))
        raise  # Let the framework redeliver up to max_retries
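
The `exponential_backoff` helper above is not part of any Kafka client; a typical sketch doubles the delay per attempt, caps it, and adds full jitter so retrying consumers don't stampede a recovering downstream service:

```python
import random

def exponential_backoff(retry_count: int, base: float = 0.5,
                        cap: float = 30.0) -> float:
    """Delay in seconds: base * 2^retries, capped, with full jitter
    (a uniformly random delay between 0 and the capped maximum)."""
    delay = min(cap, base * (2 ** retry_count))
    return random.uniform(0.0, delay)
```

The jitter matters: if every failed consumer slept exactly the same deterministic delay, they would all retry in lockstep and hit the struggling service simultaneously.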

Choreography vs Orchestration

Two patterns for coordinating multi-step workflows in EDA:

Choreography (Decentralized): Each service reacts to events and publishes its own events — no central coordinator:

text
OrderService:     Emits OrderCreated
InventoryService: Listens OrderCreated → Reserves stock → Emits InventoryReserved
PaymentService:   Listens InventoryReserved → Charges card → Emits PaymentProcessed
ShippingService:  Listens PaymentProcessed → Creates shipment

✅ Highly decoupled. ❌ Hard to visualize the full workflow. Hard to handle failures.
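
The chain above can be run with a toy in-process event bus (all names illustrative). Note that no service knows the full workflow, only which events it reacts to, which is both the strength and the debugging weakness of choreography:

```python
from collections import defaultdict

handlers = defaultdict(list)
log = []

def subscribe(event_type):
    """Register a handler for an event type (decorator)."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def emit(event_type, data):
    """Record the event, then deliver it to every subscriber."""
    log.append(event_type)
    for fn in handlers[event_type]:
        fn(data)

@subscribe("OrderCreated")
def reserve_stock(order):          # InventoryService
    emit("InventoryReserved", order)

@subscribe("InventoryReserved")
def charge_card(order):            # PaymentService
    emit("PaymentProcessed", order)

@subscribe("PaymentProcessed")
def create_shipment(order):        # ShippingService
    pass

emit("OrderCreated", {"orderId": "ord-1"})
print(log)  # ['OrderCreated', 'InventoryReserved', 'PaymentProcessed']
```

The `log` list is the only place the end-to-end flow is visible, which mirrors the real-world problem: in production, reconstructing a choreographed workflow requires correlation IDs and distributed tracing.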

Orchestration (Centralized): A dedicated Saga Coordinator sends commands and tracks state:

text
SagaCoordinator:  → Commands InventoryService.reserve(order)
InventoryService: → Replies success/failure
SagaCoordinator:  → Commands PaymentService.charge(order)
PaymentService:   → Replies success/failure
SagaCoordinator:  → Commands ShippingService.createShipment(order)

✅ Visible workflow, easy failure compensation. ❌ The coordinator becomes a new dependency.
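
An orchestrated saga can be sketched as a coordinator that walks a list of (action, compensation) pairs, undoing completed steps in reverse order when one fails. The service calls here are hypothetical stubs; a real coordinator would also persist its position so it survives restarts:

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables.
    On a failed action, run compensations for every completed
    step in reverse order, then report failure."""
    done = []
    for action, compensate in steps:
        if action():
            done.append(compensate)
        else:
            for undo in reversed(done):   # compensating transactions
                undo()
            return False
    return True

trace = []

def step(name, succeed=True):
    """Stub service call that records its name in the trace."""
    def call():
        trace.append(name)
        return succeed
    return call

ok = run_saga([
    (step("reserve"), step("release")),
    (step("charge", succeed=False), step("refund")),  # payment fails here
    (step("ship"), step("cancel")),
])
print(ok)     # False
print(trace)  # ['reserve', 'charge', 'release']
```

The trace shows the essential saga behaviour: the failed `charge` step is never compensated (it did not complete), `ship` is never attempted, and the already-completed `reserve` is undone by `release`.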

When to use each:

  • Choreography: < 4 steps, well-understood failure modes
  • Orchestration (Saga): Multi-step transactions requiring compensating actions

Frequently Asked Questions

How do I guarantee exactly-once delivery in Kafka? Kafka's built-in Exactly-Once Semantics (EOS) uses idempotent producers (producer ID + sequence numbers) and transactional APIs (beginTransaction, commitTransaction). Combined with read_committed isolation on consumers, this ensures a message is processed exactly once as long as you use Kafka transactions correctly. However, EOS only applies within the Kafka ecosystem — once you write to an external system (database, REST API), you need application-level idempotency.
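
The standard Kafka client configuration keys involved look like this (a sketch of the relevant settings, not a complete client config):

```properties
# Producer — idempotent writes plus transactions:
enable.idempotence=true
transactional.id=order-processor-1
acks=all

# Consumer — only read events from committed transactions:
isolation.level=read_committed
```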

What is the difference between a topic partition and a consumer group? A partition is Kafka's unit of parallelism on the producer side — messages with the same key (e.g., userId) go to the same partition, guaranteeing per-key ordering. A consumer group is Kafka's unit of parallelism on the consumer side — each partition is assigned to exactly one consumer in the group, enabling parallel consumption without duplicate processing.
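
Key-based partitioning has this general shape (Kafka's default partitioner actually uses murmur2 hashing; CRC32 stands in here only to keep the sketch deterministic):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Same key → same partition → per-key ordering preserved.
    (Kafka's default partitioner uses murmur2, not CRC32.)"""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All of user-42's events land on one partition, so they stay ordered:
assert partition_for("user-42", 6) == partition_for("user-42", 6)
```

This also explains a common gotcha: changing the number of partitions changes the key-to-partition mapping, so per-key ordering guarantees only hold while the partition count is stable.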


Key Takeaway

Event-Driven Architecture enables the loose coupling that makes large distributed systems resilient and scalable. The technical depth lies not in setting up a Kafka cluster, but in the operational patterns: schema versioning without breaking consumers, idempotent handlers for at-least-once delivery, dead-letter queues for unprocessable messages, and transactional outbox for atomic database-plus-event writes. Mastering these patterns is what separates an event-driven system that works in demos from one that runs reliably in production under real-world failure conditions.

Read next: CQRS & Event Sourcing: A Practical Implementation Guide →


Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.