Two tools, one word, different ideas
Queues in general decouple producers from consumers. This page goes inside the two systems you'll actually be asked about — and the first thing to understand is that despite both being called "message brokers," they embody different data structures:
- RabbitMQ is a queue: messages go in, a consumer takes one out, acknowledges it, and it's gone. The broker tracks every message's fate.
- Kafka is a log: an append-only file (the file-handling instinct, industrialized). Messages are never removed by consumption — consumers just remember how far they've read. The same data can be read by ten consumers, or re-read from the beginning tomorrow.
That single difference — broker tracks messages vs consumers track positions — explains every practical divergence below.
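The contrast can be sketched in a few lines of plain Python. This is an in-memory illustration only, not either system's client API; the names `queue`, `log`, `offsets`, and `read_next` are made up for the example.

```python
from collections import deque

# Queue model: the broker owns message state; taking a message removes it.
queue = deque(["m1", "m2", "m3"])
taken = queue.popleft()  # one consumer takes "m1"; once acked, it is gone

# Log model: entries are only appended; each reader keeps its own position.
log = ["m1", "m2", "m3"]
offsets = {"billing": 0, "analytics": 0}


def read_next(reader):
    """Return the next entry for `reader` and advance only that reader's offset."""
    entry = log[offsets[reader]]
    offsets[reader] += 1
    return entry


billing_first = read_next("billing")
analytics_first = read_next("analytics")  # same entry, read independently

offsets["analytics"] = 0  # rewinding the offset is all "replay" means
replayed = read_next("analytics")
```

Reading never shortens `log`, while `queue` shrinks with every delivery. That is the whole difference in miniature.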
Kafka: the distributed log
topic "orders" (a category of events)
├── partition 0: [e1][e4][e7][e10] ──→ appended, never modified
├── partition 1: [e2][e5][e8] ...
└── partition 2: [e3][e6][e9] ...
↑
consumer group "billing": partition 1, offset 2
consumer group "analytics": partition 1, offset 0 ← same data!
The four concepts that are Kafka:
- Topic — a named stream of events (orders, tweets, the event log in the Twitter design).
- Partition — a topic is split into N independent append-only logs. Partitions are the unit of parallelism and of ordering: order is guaranteed within a partition only. The producer picks the partition by key — hash(user_id) % N — so all of one user's events stay ordered (the hash-table trick used for routing).
- Offset — a consumer's bookmark per partition: "I've processed through entry 4711." Stored compactly, by the consumer group. Crash and resume = continue from your committed offset. Want to reprocess last week? Rewind the offset — replay is a built-in superpower, not a recovery hack.
- Consumer group — a team of consumer instances sharing a group id; Kafka assigns each partition to exactly one member, so the group processes the topic in parallel without overlap. Two different groups (billing, analytics) each get the whole stream — pub/sub and work-queue semantics from one mechanism.
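The four concepts fit together as in the following toy model. It is a sketch under loud assumptions: `Topic`, `partition_for`, and `assign` are hypothetical names, `zlib.crc32` stands in for whatever hash a real producer uses, and the round-robin assignment is only one possible strategy. Nothing here is the Kafka client API.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # illustrative partition count


def partition_for(key, n=NUM_PARTITIONS):
    """Stable hash of the key modulo N (crc32 is a stand-in hash function)."""
    return zlib.crc32(key.encode("utf-8")) % n


class Topic:
    def __init__(self, n=NUM_PARTITIONS):
        self.partitions = [[] for _ in range(n)]       # N append-only logs
        self.committed = defaultdict(lambda: [0] * n)  # group -> next offset per partition

    def produce(self, key, value):
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append(value)               # append only, never modify
        return p

    def poll(self, group, partition):
        """Return everything after the group's committed offset."""
        start = self.committed[group][partition]
        return self.partitions[partition][start:]

    def commit(self, group, partition, next_offset):
        self.committed[group][partition] = next_offset


def assign(num_partitions, members):
    """Give each partition to exactly one group member (simple round-robin)."""
    assignment = {m: [] for m in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


topic = Topic()
for i in range(5):
    topic.produce("user-42", f"event-{i}")  # same key, so same partition, in order

p = partition_for("user-42")
batch = topic.poll("billing", p)
topic.commit("billing", p, len(batch))      # billing has now caught up
```

After this runs, `topic.poll("billing", p)` is empty while `topic.poll("analytics", p)` still returns all five events, because each group keeps its own offsets.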
Retention is time/size-based ("keep 7 days"), not consumption-based — which is why Kafka doubles as the source-of-truth event log in event-driven architectures and feeds N independent views: fan-out workers, search indexers and trend pipelines all reading the same topic at their own pace, none aware of the others.
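A minimal sketch of time-based retention, assuming a hypothetical `RetainedLog` class and per-entry deletion (a real broker works on larger chunks of the log, so treat the granularity as a simplification). The point it illustrates is that pruning looks only at age, never at who has read what.

```python
SEVEN_DAYS = 7 * 24 * 3600  # retention window in seconds


class RetainedLog:
    def __init__(self, retention_seconds=SEVEN_DAYS):
        self.retention_seconds = retention_seconds
        self.base_offset = 0   # logical offset of the oldest entry still kept
        self.entries = []      # (timestamp, value), oldest first

    def append(self, value, now):
        self.entries.append((now, value))

    def enforce_retention(self, now):
        """Drop entries older than the window; consumer progress plays no part."""
        cutoff = now - self.retention_seconds
        dropped = 0
        while dropped < len(self.entries) and self.entries[dropped][0] < cutoff:
            dropped += 1
        del self.entries[:dropped]
        self.base_offset += dropped  # offsets stay logical after pruning
        return dropped

    def read(self, offset):
        """Read from a logical offset, clamped to the oldest retained entry."""
        start = max(offset, self.base_offset) - self.base_offset
        return [value for _, value in self.entries[start:]]


DAY = 24 * 3600
log = RetainedLog()
log.append("old", now=0)
log.append("new", now=6 * DAY)
dropped = log.enforce_retention(now=8 * DAY)  # "old" is now 8 days old
```

A reader that asks for offset 0 after pruning simply starts at the oldest entry that still exists.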
Throughput notes that explain its dominance: sequential appends + batching + zero-copy reads make a modest cluster do millions of messages/sec; consumers poll in batches; everything is replicated per partition (leader/follower — the replication story again).
RabbitMQ: the smart broker
RabbitMQ implements classic messaging (AMQP): producers publish to an exchange, which routes copies into queues according to binding rules; the broker pushes messages to consumers, each message is acked, and only then does the broker delete it:
producer → [exchange: "notifications"]
├─ binding: type=email → [queue: email-q] → consumers
├─ binding: type=sms → [queue: sms-q] → consumers
└─ binding: type=# → [queue: audit-q] (gets everything)
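The routing in the diagram can be imitated in plain Python. This is a sketch, not the RabbitMQ client API: `TopicExchange` and `topic_matches` are hypothetical names. The matcher follows the topic-exchange convention that `*` matches exactly one dot-separated word and `#` matches zero or more.

```python
from collections import deque


def topic_matches(pattern, routing_key):
    """Match a binding pattern against a routing key, word by word."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":  # zero or more words
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:  # exactly one word
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


class TopicExchange:
    def __init__(self):
        self.queues = {}    # queue name -> deque of message bodies
        self.bindings = []  # (pattern, queue name)

    def bind(self, pattern, queue_name):
        self.queues.setdefault(queue_name, deque())
        self.bindings.append((pattern, queue_name))

    def publish(self, routing_key, body):
        """Copy the message into every queue with at least one matching binding."""
        targets = {q for pattern, q in self.bindings if topic_matches(pattern, routing_key)}
        for q in targets:
            self.queues[q].append(body)
        return sorted(targets)


notifications = TopicExchange()
notifications.bind("email", "email-q")
notifications.bind("sms", "sms-q")
notifications.bind("#", "audit-q")  # catch-all binding

notifications.publish("email", "welcome")
notifications.publish("sms", "code 1234")
```

Each publish lands in its channel queue and in `audit-q`. The producer never names a queue; only the bindings decide where copies go.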
What its per-message intelligence buys:
- Routing logic in the broker — topic/direct/fanout exchanges, header matching; the notification system's per-channel and per-priority queues are two binding declarations.
- Per-message acks, redelivery and dead-lettering — a consumer that crashes mid-message triggers redelivery; a message that fails N times routes automatically to a dead-letter queue (the DLQ pattern, natively).
- Priorities, per-message TTLs, delayed delivery — work-queue niceties Kafka doesn't natively do (a delayed retry in Kafka means extra topics and discipline).
- Backpressure that pushes back — prefetch limits and queue bounds; the broker actively manages slow consumers rather than letting them lag silently.
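The ack, redelivery, and dead-letter cycle from the list above can be sketched as a small loop. Everything here is illustrative: `drain`, `MAX_ATTEMPTS`, and the attempt counter carried alongside each message are assumptions of the sketch, not how the broker stores delivery state.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical failure limit before dead-lettering


def drain(queue, dead_letters, handler, max_attempts=MAX_ATTEMPTS):
    """Deliver each message; requeue on failure, dead-letter after max_attempts."""
    acked = []
    while queue:
        body, attempts = queue.popleft()
        try:
            handler(body)
            acked.append(body)                  # ack: the message is forgotten
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(body)       # give up: park it for inspection
            else:
                queue.append((body, attempts))  # nack and requeue for redelivery
    return acked


def handler(body):
    if body == "poison":
        raise ValueError("cannot process")


work = deque([("ok-1", 0), ("poison", 0), ("ok-2", 0)])
dlq = deque()
acked = drain(work, dlq, handler)
```

The healthy messages are acked on the first try. The failing one is retried until it reaches the limit and then moves to `dlq`, so it no longer blocks the queue.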
The cost: tracking every message's lifecycle is work — RabbitMQ tops out orders of magnitude below Kafka's throughput, and history beyond consumption is simply gone (no replay).
Choosing (the interview table)
| Question | Kafka | RabbitMQ |
|---|---|---|
| "What happened?" — events as facts, multiple readers, replay | ✅ its purpose | ✗ consumed = gone |
| "Do this work" — task distribution, routing, retries, priorities | works, with ceremony | ✅ its purpose |
| Throughput / firehose (view beacons, clickstreams) | millions/sec | tens of thousands/sec |
| Strict per-entity ordering | per partition key — by design | per single queue only |
| Operational weight | cluster + (historically) ZooKeeper; heavier | one node to start; lighter |
The one-sentence versions: Kafka when the message is a fact others may also care about, now or later; RabbitMQ when the message is a job for exactly one worker. Plenty of systems run both — orders as Kafka events, image-resize jobs in RabbitMQ — and saying that out loud beats picking a winner.
Common mistakes
- Kafka as a job queue — per-message acks, delays and priorities fight the log model; you'll rebuild RabbitMQ badly on top of it.
- Ignoring partition keys — random keys destroy per-user ordering; one celebrity key creates a hot partition that caps the whole topic's throughput (salt it).
- Assuming exactly-once — with the usual settings (manual acks in RabbitMQ, offsets committed after processing in Kafka) both deliver at-least-once, so consumers must be idempotent, full stop. (Kafka's transactional "exactly-once" applies within Kafka-to-Kafka pipelines, not to your database side effects; the outbox pattern handles that boundary.)
- Unbounded consumer lag with no alarm — Kafka happily retains while your consumer falls a day behind; lag is the metric (observability).
- Rebalancing surprises — adding/removing group members pauses partitions mid-flight; design consumers to checkpoint and resume gracefully.
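For the idempotency point in the list above, a minimal sketch of a consumer that tolerates duplicate deliveries. The `IdempotentConsumer` class, the event shape, and the in-memory `seen` set are assumptions for illustration; in a real service the deduplication record and the side effect would need to be stored together durably.

```python
class IdempotentConsumer:
    def __init__(self):
        self.seen = set()   # ids of events already applied
        self.balance = 0    # the side effect being protected

    def handle(self, event):
        """Apply the event once; a redelivered duplicate is a no-op."""
        if event["id"] in self.seen:
            return False
        self.balance += event["amount"]
        self.seen.add(event["id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "evt-1", "amount": 10},
    {"id": "evt-2", "amount": 5},
    {"id": "evt-1", "amount": 10},  # redelivered after a crash or rebalance
]
results = [consumer.handle(e) for e in deliveries]
```

The duplicate of `evt-1` is recognized and skipped, so the balance reflects each event exactly once even though delivery was at-least-once.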
Interview perspective
Practice
- Feel the difference: run both locally (Docker). Kafka: one topic, two consumer groups; watch both receive everything, then kill a consumer mid-batch and watch redelivery from the last committed offset. RabbitMQ: one exchange, two bound queues with different routing keys; watch routing and acks.
- Replay: produce 1,000 events, consume them, then reset the group's offset to zero and reprocess — the operation that makes Kafka Kafka.
- Break ordering: produce a user's events with random keys vs keyed by user id; consume with 4 instances and log the order. See it, never forget it.
- Design rep: for the notification system, decide Kafka or RabbitMQ for (a) the incoming event firehose from product teams, (b) the per-channel send queues with retries and priorities. (It's both — argue why.)
Next: API Gateway & Service Discovery — how requests find their way into and around the microservices you've split.