Why async at all
If a request triggers slow work (send email, run an ML scoring job, evaluate alerts), doing it inline makes the user wait and couples your uptime to the downstream's. Put a queue between them: the request enqueues a job and returns fast; workers consume at their own pace.
Benefits: decoupling (producers and consumers scale and fail independently), buffering (the queue absorbs spikes, so consumers see a smoother load), and resilience (failed jobs can be retried). StockVision's APScheduler alert worker and LandAI's ingestion are exactly this: slow or bursty work moved off the hot path.
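A minimal sketch of the enqueue-and-return pattern, assuming an in-process `queue.Queue` and one thread stand in for a real broker and worker fleet. `send_email`, `handle_request`, and `worker` are hypothetical names for illustration, not part of StockVision or LandAI.

```python
import queue
import threading
import time

jobs: queue.Queue = queue.Queue()  # stand-in for a real broker
sent: list = []  # records what the worker finished, for the demo


def send_email(address: str) -> None:
    """Hypothetical slow downstream call."""
    time.sleep(0.05)
    sent.append(address)


def handle_request(address: str) -> str:
    """Request path: enqueue the job and return without waiting for it."""
    jobs.put(address)
    return "202 Accepted"


def worker() -> None:
    """Consumer: drains jobs at its own pace; None is a shutdown sentinel."""
    while True:
        address = jobs.get()
        if address is None:
            return
        send_email(address)


consumer = threading.Thread(target=worker)
consumer.start()

start = time.perf_counter()
responses = [handle_request(f"user{i}@example.com") for i in range(5)]
request_time = time.perf_counter() - start  # no email was sent inline

jobs.put(None)  # stop after the backlog is drained
consumer.join()
print(responses, f"requests took {request_time:.4f}s", f"worker sent {len(sent)}")
```

The five requests return almost instantly even though sending all five emails inline would take about 0.25s; the worker absorbs that cost on its own schedule.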
Queue vs pub/sub
- Queue (point-to-point): each message is processed by one consumer — work distribution (e.g. task queues like RabbitMQ/SQS).
- Pub/sub (fan-out): each message goes to every subscriber — event broadcast (e.g. Kafka topics, Redis pub/sub for live price ticks).
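The two delivery shapes can be contrasted in a toy in-memory model. This is a sketch only: round-robin stands in for "whichever worker is free next", and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def distribute(messages, consumers):
    """Point-to-point: competing consumers; each message goes to exactly one.
    Round-robin stands in for 'whichever worker is free next'."""
    received = defaultdict(list)
    turn = cycle(consumers)
    for msg in messages:
        received[next(turn)].append(msg)
    return dict(received)


def broadcast(messages, subscribers):
    """Pub/sub fan-out: every subscriber gets its own copy of every message."""
    return {sub: list(messages) for sub in subscribers}


events = ["job-1", "job-2", "job-3"]
print(distribute(events, ["w1", "w2"]))  # {'w1': ['job-1', 'job-3'], 'w2': ['job-2']}
print(broadcast(events, ["ui", "alerts"]))
```

Adding a worker to `distribute` spreads the same work thinner; adding a subscriber to `broadcast` multiplies deliveries.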
Kafka blurs the line: it is a partitioned, durable log. Within one consumer group, the partitions are split among the group's consumers (queue-like); across groups, each group independently reads every event (pub/sub-like).
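A simplified model of consumer groups over a keyed, partitioned log. Two assumptions to flag: CRC32 is only a deterministic stand-in for Kafka's key hash (the default partitioner uses murmur2), and the round-robin `assign` is far simpler than Kafka's real assignors (range, round-robin, sticky). All function names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 4


def partition_for(key: str) -> int:
    # Deterministic stand-in; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


def assign(members: list) -> dict:
    # Simplified round-robin: each partition is owned by exactly one member.
    return {p: members[p % len(members)] for p in range(NUM_PARTITIONS)}


def deliver(events, groups):
    """events: (key, value) pairs in produce order.
    groups: group name -> member names.
    Returns group -> member -> events that member reads, in log order."""
    result = {}
    for group, members in groups.items():
        owner = assign(members)
        reads = {m: [] for m in members}
        for key, value in events:
            reads[owner[partition_for(key)]].append((key, value))
        result[group] = reads
    return result


events = [("AAPL", 1), ("MSFT", 1), ("AAPL", 2), ("MSFT", 2)]
out = deliver(events, {"alerts": ["a1", "a2"], "analytics": ["x1"]})
print(out["alerts"])     # each key's events land on one member, in order
print(out["analytics"])  # the lone member of another group sees every event
```

Because a key always hashes to the same partition, and a partition has one owner per group, per-key order survives even as members share the load.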
Delivery guarantees
| Guarantee | Meaning | Trade-off |
|---|---|---|
| At-most-once | may drop, never duplicates | simplest, lossy |
| At-least-once | never drops, may duplicate | the common default |
| Exactly-once | no loss, no dupes | expensive; often "effectively-once" via idempotency |
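The difference between the first two rows often comes down to whether the consumer acks before or after doing the work. A toy simulation, assuming a hypothetical in-memory `ToyBroker` that redelivers anything not yet acked; the loop continuing after the simulated crash models a restarted worker.

```python
class ToyBroker:
    """Hypothetical in-memory broker: a message is redelivered until acked."""

    def __init__(self, messages):
        self.pending = list(messages)

    def receive(self):
        return self.pending[0] if self.pending else None

    def ack(self, msg):
        self.pending.remove(msg)


def consume(broker, process, ack_first, crash_on=None):
    """Drain the broker. The first time crash_on is seen, simulate a crash
    between the two steps; the loop continuing models the restarted worker."""
    crashed = False
    while (msg := broker.receive()) is not None:
        crash_now = msg == crash_on and not crashed
        crashed = crashed or crash_now
        if ack_first:  # at-most-once: ack, then work
            broker.ack(msg)
            if not crash_now:  # crashing here means the message is lost
                process(msg)
        else:  # at-least-once: work, then ack
            process(msg)
            if not crash_now:  # crashing here means it is redelivered
                broker.ack(msg)


lost, dup = [], []
consume(ToyBroker(["m1", "m2", "m3"]), lost.append, ack_first=True, crash_on="m2")
consume(ToyBroker(["m1", "m2", "m3"]), dup.append, ack_first=False, crash_on="m2")
print(lost)  # ['m1', 'm3']: m2 was acked but never processed
print(dup)   # ['m1', 'm2', 'm2', 'm3']: m2 processed twice
```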
Under at-least-once delivery, if a worker crashes after doing the work but before acknowledging the message, the broker redelivers it and it is processed again. So consumers must be idempotent: processing the same message twice has the same effect as processing it once. Use an idempotency key checked against a dedupe table, or make the operation naturally idempotent (an upsert, or setting a value rather than incrementing it).
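A minimal sketch of a dedupe table, assuming the message carries a unique ID and the side effect lives in the same database as the dedupe record. SQLite and the table names `processed` and `balances` are illustrative choices, not a prescribed schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE processed (message_id TEXT PRIMARY KEY);
    CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL);
    INSERT INTO balances VALUES ('acct-1', 0);
""")


def handle(message_id: str, account: str, cents: int) -> bool:
    """Apply a credit at most once per message_id. Returns False on a duplicate."""
    try:
        with db:  # one transaction: dedupe row and side effect commit together
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            db.execute(
                "UPDATE balances SET cents = cents + ? WHERE account = ?",
                (cents, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # ID already recorded: the redelivery is a no-op


first = handle("msg-42", "acct-1", 500)
redelivered = handle("msg-42", "acct-1", 500)  # same message delivered again
balance = db.execute(
    "SELECT cents FROM balances WHERE account = ?", ("acct-1",)
).fetchone()[0]
print(first, redelivered, balance)  # True False 500
```

The increment (`cents + ?`) is not naturally idempotent, which is why it needs the key; writing an absolute value instead (`SET cents = ?`) would be safe to repeat on its own. The trick relies on atomicity: if the side effect is an external call such as sending an email, the dedupe record and the effect can no longer commit together.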
Ordering, backpressure, DLQs
- Ordering is guaranteed only within a partition (or key), so design order-sensitive events to share a key.
- Backpressure: if producers outrun consumers, the queue grows without limit. Bound it, then either shed load (reject new work) or slow producers down (block them until there is room).
- Dead-letter queue: messages that fail past a maximum number of attempts go to a DLQ for inspection instead of blocking the pipeline forever.
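Backpressure and dead-lettering can be sketched together with a bounded `queue.Queue`. Everything here is a hypothetical, single-threaded illustration: `MAX_ATTEMPTS`, the 0.1s wait, and the `"poison"` job are arbitrary, and a real broker tracks attempt counts for you.

```python
import queue

MAX_ATTEMPTS = 3  # hypothetical retry budget before dead-lettering

work: queue.Queue = queue.Queue(maxsize=3)  # bounded: the backpressure point
dlq: list = []  # dead-letter queue: (job, last error) pairs for inspection
processed: list = []


def produce(job) -> bool:
    """Block briefly while the queue is full (slowing the producer), then shed."""
    try:
        work.put((job, 0), timeout=0.1)
        return True
    except queue.Full:
        return False  # shed: e.g. return 503 to the caller


def drain(process) -> None:
    """Single-threaded consumer: retry failures, dead-letter after MAX_ATTEMPTS."""
    while True:
        try:
            job, attempts = work.get_nowait()
        except queue.Empty:
            return
        try:
            process(job)
        except Exception as exc:
            if attempts + 1 >= MAX_ATTEMPTS:
                dlq.append((job, repr(exc)))  # park it; stop retrying
            else:
                # Requeue behind other work; cannot block, a slot was just freed.
                work.put((job, attempts + 1))


def process(job) -> None:
    if job == "poison":
        raise ValueError("cannot parse")  # hypothetical permanent failure
    processed.append(job)


accepted = [produce(j) for j in ["ok-1", "poison", "ok-2", "ok-3"]]
drain(process)
print(accepted)   # [True, True, True, False]: ok-3 shed, queue was full
print(processed)  # ['ok-1', 'ok-2']
print(dlq)        # [('poison', "ValueError('cannot parse')")]
```

Healthy jobs keep flowing while the poison message burns through its retries and gets parked. Requeueing at the back also means a retried job loses its place in line, which is exactly the ordering trade-off from the first bullet.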
Design drills
Queues are where reliability lives or dies. Drill the failure modes.
Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers; the gap between your sketch and the checklist is your study list.
When do you put a queue between two services — and when is it the wrong call?
Exactly-once delivery is impossible. Design for at-least-once instead.
A consumer keeps failing on one poisoned message. What happens to the queue, and how do you contain it?
Task queue vs log (Kafka): when each, and why does ordering differ?