Kafka & RabbitMQ

The log vs the queue: partitions, consumer groups, offsets and replay — versus exchanges, acks and routing — and how to pick between them.

Tags: backend, kafka, rabbitmq, messaging, event-streaming

Two tools, one word, different ideas

Queues in general decouple producers from consumers. This page goes inside the two systems you'll actually be asked about — and the first thing to understand is that despite both being called "message brokers," they embody different data structures:

  • RabbitMQ is a queue: messages go in, a consumer takes one out, acknowledges it, and it's gone. The broker tracks every message's fate.
  • Kafka is a log: an append-only file (the file-handling instinct, industrialized). Messages are never removed by consumption — consumers just remember how far they've read. The same data can be read by ten consumers, or re-read from the beginning tomorrow.

That single difference — broker tracks messages vs consumers track positions — explains every practical divergence below.
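The contrast can be sketched with two toy in-memory structures. This is only an illustration of the two data models, not either product's API; `ToyQueue`, `ToyLog` and the reader names are hypothetical.

```python
from collections import deque


class ToyQueue:
    """Broker-tracked queue: a consumed and acknowledged message is deleted."""

    def __init__(self):
        self._messages = deque()

    def publish(self, msg):
        self._messages.append(msg)

    def consume_and_ack(self):
        # Taking the message removes it; no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class ToyLog:
    """Append-only log: readers keep their own position."""

    def __init__(self):
        self._entries = []

    def append(self, msg):
        self._entries.append(msg)
        return len(self._entries) - 1  # offset of the new entry

    def read(self, offset, max_entries=10):
        # Reading never mutates the log.
        return self._entries[offset:offset + max_entries]


queue = ToyQueue()
log = ToyLog()
for event in ["e1", "e2", "e3"]:
    queue.publish(event)
    log.append(event)

queue_first = queue.consume_and_ack()    # "e1", now gone from the queue
queue_second = queue.consume_and_ack()   # "e2"

offsets = {"billing": 0, "analytics": 0}          # each reader owns its bookmark
billing_batch = log.read(offsets["billing"])
offsets["billing"] += len(billing_batch)
analytics_batch = log.read(offsets["analytics"])  # same data, independent reader

offsets["billing"] = 0                            # replay: rewind the bookmark
replayed = log.read(offsets["billing"])
```

In the queue, state lives with the broker and shrinks as work completes; in the log, the only per-reader state is one integer, which is why replay is a rewind rather than a restore.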

Kafka: the distributed log

topic "orders" (a category of events)
├── partition 0:  [e1][e4][e7][e10] ──→ appended, never modified
├── partition 1:  [e2][e5][e8] ...
└── partition 2:  [e3][e6][e9] ...
                       ↑
              consumer group "billing": partition 1, offset 2
              consumer group "analytics": partition 1, offset 0   ← same data!

The four concepts that are Kafka:

  • Topic — a named stream of events (orders, tweets, the event log in the Twitter design).
  • Partition — a topic is split into N independent append-only logs. Partitions are the unit of parallelism and of ordering: order is guaranteed within a partition only. The producer picks the partition by key — hash(user_id) % N — so all of one user's events stay ordered (the hash-table trick used for routing).
  • Offset: a consumer's per-partition bookmark ("I've processed through entry 4711"). Committed offsets are stored per consumer group, in Kafka's internal __consumer_offsets topic, so the bookkeeping is one small record per partition rather than state per message. Crash and resume = continue from your committed offset. Want to reprocess last week? Rewind the offset; replay is a built-in superpower, not a recovery hack.
  • Consumer group — a team of consumer instances sharing a group id; Kafka assigns each partition to exactly one member, so the group processes the topic in parallel without overlap. Two different groups (billing, analytics) each get the whole stream — pub/sub and work-queue semantics from one mechanism.
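Key-based partitioning and group assignment can be sketched in a few lines. This is a minimal model under stated assumptions: `zlib.crc32` stands in for the hash (Kafka's default partitioner uses murmur2 for keyed records), the round-robin assignment is a simplification of Kafka's assignors, and all names (`partition_for`, `assign_partitions`, `billing-1`) are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the key, modulo the partition count.
    # crc32 is a stand-in here, not Kafka's actual partitioner hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, members):
    # Round-robin: every partition goes to exactly one member of the group.
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


# Produce interleaved events for two users.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [("alice", "created"), ("bob", "created"),
          ("alice", "paid"), ("bob", "cancelled"), ("alice", "shipped")]
for user_id, action in events:
    partitions[partition_for(user_id)].append((user_id, action))

# All of alice's events sit in one partition, in the order produced.
alice_partition = partition_for("alice")
alice_events = [a for (u, a) in partitions[alice_partition] if u == "alice"]

# One group splits the partitions; a second group would get its own full split.
billing = assign_partitions(partitions.keys(), ["billing-1", "billing-2"])
```

Because the partition is a pure function of the key, ordering for one user survives any number of consumers. Changing the partition count changes the mapping, which is one reason partition counts are usually chosen generously up front.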

Retention is time/size-based ("keep 7 days"), not consumption-based — which is why Kafka doubles as the source-of-truth event log in event-driven architectures and feeds N independent views: fan-out workers, search indexers and trend pipelines all reading the same topic at their own pace, none aware of the others.
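A rough sketch of what time-based retention means for readers, assuming a simplified in-memory log. Real Kafka deletes whole log segments rather than single entries, is configured per topic with settings such as `retention.ms` and `retention.bytes`, and lets the consumer's `auto.offset.reset` setting decide what happens when an offset no longer exists. The `RetainedLog` class and its clamp-to-oldest behavior are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    retention_seconds: float
    start_offset: int = 0                         # offset of the oldest retained entry
    entries: list = field(default_factory=list)   # (timestamp, payload) pairs

    def append(self, timestamp, payload):
        self.entries.append((timestamp, payload))
        return self.start_offset + len(self.entries) - 1

    def enforce_retention(self, now):
        # Drop entries older than the cutoff, whether or not anyone read them.
        cutoff = now - self.retention_seconds
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.pop(0)
            self.start_offset += 1   # offsets of surviving entries do not change

    def read(self, offset):
        # Assumption: a reader that fell behind resumes at the oldest retained entry.
        offset = max(offset, self.start_offset)
        return [p for _, p in self.entries[offset - self.start_offset:]]


DAY = 24 * 60 * 60
log = RetainedLog(retention_seconds=7 * DAY)
log.append(0 * DAY, "day0")
log.append(3 * DAY, "day3")
log.append(8 * DAY, "day8")

log.enforce_retention(now=9 * DAY)   # cutoff is day 2, so "day0" is dropped
fast_reader = log.read(2)            # already at offset 2: sees only "day8"
slow_reader = log.read(0)            # offset 0 no longer exists: starts at "day3"
```

The slow reader silently misses "day0", which is the practical reason consumer lag has to stay well inside the retention window.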

Throughput notes that explain its dominance: sequential appends + batching + zero-copy reads make a modest cluster do millions of messages/sec; consumers poll in batches; everything is replicated per partition (leader/follower — the replication story again).

RabbitMQ: the smart broker

RabbitMQ implements classic messaging (AMQP): producers publish to an exchange, which routes copies into queues by binding rules, and consumers receive pushed messages, ack each one, and the broker deletes it:

producer → [exchange: "notifications"]
              ├─ binding: type=email   → [queue: email-q]  → consumers
              ├─ binding: type=sms     → [queue: sms-q]    → consumers
              └─ binding: type=#       → [queue: audit-q]  (gets everything)
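The routing step can be sketched as pattern matching on routing keys. In an AMQP topic exchange, keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more. The matcher below is a simplified stand-in for the broker's logic, and the `notify.*` keys and queue names are hypothetical renderings of the diagram above.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' absorbs zero words (skip it) or one more word (keep it).
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


BINDINGS = [                      # (binding pattern, queue name), names are illustrative
    ("notify.email", "email-q"),
    ("notify.sms", "sms-q"),
    ("#", "audit-q"),
]


def route(routing_key: str) -> list:
    # The exchange copies the message into every queue whose binding matches.
    return [queue for pattern, queue in BINDINGS if topic_matches(pattern, routing_key)]
```

For example, `route("notify.email")` yields both the email queue and the audit queue, while an unrecognized key such as `notify.push` reaches only the audit queue. Adding a channel is a new binding, not a producer change.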

What its per-message intelligence buys:

  • Routing logic in the broker — topic/direct/fanout exchanges, header matching; the notification system's per-channel and per-priority queues are two binding declarations.
  • Per-message acks, redelivery and dead-lettering: a consumer that crashes mid-message triggers redelivery; a message that is rejected without requeue, expires, or exceeds a quorum queue's delivery limit is routed to a dead-letter exchange and on to its queue (the DLQ pattern, natively).
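  • Priorities, per-message TTLs, delayed delivery: work-queue niceties Kafka doesn't natively do (a delayed retry in Kafka means extra topics and discipline). Priorities and TTLs are built in; delayed delivery comes from the delayed-message exchange plugin or a TTL plus dead-letter arrangement.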
  • Backpressure that pushes back — prefetch limits and queue bounds; the broker actively manages slow consumers rather than letting them lag silently.
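The ack, redelivery and dead-letter lifecycle can be sketched with a toy broker. This is an illustrative model, not RabbitMQ's client API: `ToyWorkQueue`, `max_deliveries` and the `poison` message are hypothetical, and the delivery counter is an assumption that loosely mirrors a delivery limit.

```python
from collections import deque


class ToyWorkQueue:
    def __init__(self, max_deliveries=3):
        self.ready = deque()          # messages waiting for a consumer
        self.unacked = {}             # delivery tag -> (body, delivery count)
        self.dead_letters = []        # where repeatedly failing messages end up
        self.max_deliveries = max_deliveries
        self._next_tag = 0

    def publish(self, body):
        self.ready.append((body, 0))

    def deliver(self):
        # Hand one message to a consumer; the broker keeps it until acked.
        if not self.ready:
            return None
        body, count = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = (body, count + 1)
        return self._next_tag, body

    def ack(self, tag):
        # Success: the broker forgets the message.
        del self.unacked[tag]

    def nack(self, tag):
        # Failure or consumer crash: requeue, or dead-letter after too many tries.
        body, count = self.unacked.pop(tag)
        if count >= self.max_deliveries:
            self.dead_letters.append(body)
        else:
            self.ready.append((body, count))


def process(body):
    if body == "poison":
        raise ValueError("cannot process")


broker = ToyWorkQueue(max_deliveries=3)
broker.publish("resize-1")
broker.publish("poison")

handled = []
while (delivery := broker.deliver()) is not None:
    tag, body = delivery
    try:
        process(body)
        broker.ack(tag)
        handled.append(body)
    except ValueError:
        broker.nack(tag)
```

Note how much state the broker carries per message (ready, unacked, delivery count). That bookkeeping is exactly the work a log-based system avoids, and the source of the throughput gap discussed next.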

The cost: tracking every message's lifecycle is work — RabbitMQ tops out orders of magnitude below Kafka's throughput, and history beyond consumption is simply gone (no replay).

Choosing (the interview table)

| Question | Kafka | RabbitMQ |
|---|---|---|
| "What happened?" (events as facts, multiple readers, replay) | ✅ its purpose | ✗ consumed = gone |
| "Do this work" (task distribution, routing, retries, priorities) | works, with ceremony | ✅ its purpose |
| Throughput / firehose (view beacons, clickstreams) | millions/sec | tens of thousands/sec |
| Strict per-entity ordering | per partition key, by design | per single queue only |
| Operational weight | cluster + (historically) ZooKeeper; heavier | one node to start; lighter |

The one-sentence versions: Kafka when the message is a fact others may also care about, now or later; RabbitMQ when the message is a job for exactly one worker. Plenty of systems run both — orders as Kafka events, image-resize jobs in RabbitMQ — and saying that out loud beats picking a winner.

Common mistakes

  • Kafka as a job queue — per-message acks, delays and priorities fight the log model; you'll rebuild RabbitMQ badly on top of it.
  • Ignoring partition keys — random keys destroy per-user ordering; one celebrity key creates a hot partition that caps the whole topic's throughput (salt it).
  • Assuming exactly-once — both deliver at-least-once by default; consumers must be idempotent, full stop. (Kafka's transactional "exactly-once" applies within Kafka-to-Kafka pipelines, not to your database side effects — the outbox pattern handles that boundary.)
  • Unbounded consumer lag with no alarm — Kafka happily retains while your consumer falls a day behind; lag is the metric (observability).
  • Rebalancing surprises — adding/removing group members pauses partitions mid-flight; design consumers to checkpoint and resume gracefully.
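Since both systems can redeliver, the consumer has to tolerate duplicates. A minimal sketch of an idempotent consumer that dedupes by event id follows. The event shape, the `IdempotentConsumer` class and the in-memory `processed_ids` set are assumptions; a real consumer would persist the processed ids in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Applies each event at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.balances = {}          # side effect we must not apply twice
        self.processed_ids = set()  # in production: persisted with the side effect

    def handle(self, event):
        if event["id"] in self.processed_ids:
            return False            # duplicate delivery: skip
        account = event["account"]
        self.balances[account] = self.balances.get(account, 0) + event["amount"]
        self.processed_ids.add(event["id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "evt-1", "account": "alice", "amount": 50},
    {"id": "evt-2", "account": "alice", "amount": 25},
    {"id": "evt-1", "account": "alice", "amount": 50},  # redelivered after a crash
]
applied = [consumer.handle(e) for e in deliveries]
```

The third delivery is recognized and skipped, so the balance reflects each event once. The same guard makes deliberate replays and rebalance-induced re-reads safe.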

Practice

  1. Feel the difference: run both locally (Docker). Kafka: one topic, two consumer groups — watch both receive everything; kill a consumer mid-batch and watch redelivery from the offset. RabbitMQ: one exchange, two bound queues with different routing keys; watch routing and acks.
  2. Replay: produce 1,000 events, consume them, then reset the group's offset to zero and reprocess — the operation that makes Kafka Kafka.
  3. Break ordering: produce a user's events with random keys vs keyed by user id; consume with 4 instances and log the order. See it, never forget it.
  4. Design rep: for the notification system, decide Kafka or RabbitMQ for (a) the incoming event firehose from product teams, (b) the per-channel send queues with retries and priorities. (It's both — argue why.)

Next: API Gateway & Service Discovery — how requests find their way into and around the microservices you've split.