Scope it first
"A service other teams call to notify users — email, SMS, push. Templates with variables, user channel preferences and opt-outs, retries on provider failure, no duplicates, rate limits so we don't spam. OK?"
Why this one matters disproportionately: you will almost certainly build or extend one at work — every product notifies — and it's the LLD where the pattern vocabulary stops being academic: Strategy, Factory, Decorator, Observer and a queue all earn their keep in one design. It's also the cleanest LLD→HLD bridge in the catalog.
Entities & relationships
The flow: a caller says notify(user, ORDER_SHIPPED, {orderId: 42}) —
note callers speak in events, never in channels or copy; that
decision is the whole API. The service resolves preferences ("Asha:
push for shipping, email for billing, never SMS"), renders the
template per channel, and dispatches.
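A minimal sketch of that flow, assuming in-memory stand-ins for the preference store and template catalog. Every name here (`PREFERENCES`, `EVENT_CATEGORY`, `TEMPLATES`, `Outbound`, `resolve_channels`) is hypothetical and chosen for illustration; only the `notify(user, event, variables)` shape comes from the text above.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical in-memory data; a real service would back these with storage.
PREFERENCES = {
    "asha": {"shipping": ["push"], "billing": ["email"]},  # never SMS
}
EVENT_CATEGORY = {"ORDER_SHIPPED": "shipping", "INVOICE_READY": "billing"}
TEMPLATES = {
    ("ORDER_SHIPPED", "push"): "Order {orderId} has shipped",
    ("ORDER_SHIPPED", "email"): "Good news: order {orderId} is on its way.",
    ("INVOICE_READY", "email"): "Your invoice for order {orderId} is ready.",
}


@dataclass(frozen=True)
class Outbound:
    """One rendered message, ready to hand to a channel."""
    user: str
    channel: str
    body: str


def resolve_channels(user: str, event: str) -> list[str]:
    """Map the event to a category, then look up the user's channels for it."""
    category = EVENT_CATEGORY[event]
    return PREFERENCES.get(user, {}).get(category, [])


def notify(user: str, event: str, variables: dict) -> list[Outbound]:
    """Callers pass an event and variables, never a channel or message copy."""
    messages = []
    for channel in resolve_channels(user, event):
        body = TEMPLATES[(event, channel)].format(**variables)
        messages.append(Outbound(user, channel, body))
    return messages
```

Because the caller never names a channel, changing Asha's preferences or adding a template changes what is sent without touching any calling code.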
Where the patterns live (the showcase)
- Strategy: Channel. One interface, three (then ten) implementations. Adding WhatsApp = one class (Open/Closed, again).
- Factory + registry: ChannelFactory.for(channelType), so the dispatch loop never names a concrete class (patterns-in-depth).
- Adapter: each channel wraps a vendor SDK (SES, Twilio, FCM) behind the Channel port; swapping SMS providers touches one file.
- Decorator: retries, rate limiting and metrics wrap any channel, as in Metered(RateLimited(Retrying(SmsChannel))), assembled at startup and testable in isolation.
- Observer: the caller side. Product code publishes OrderShipped; the notification service is one subscriber. Shipping code knows nothing about emails, which is why marketing can add a "review your purchase" notification without touching the orders team (event-driven decoupling in miniature).
- Template Method-ish rendering: one Template per (event, locale), rendering per channel: email gets HTML + subject, SMS gets 160 chars, push gets title + body. Content shaping is a template concern, never an if-chain in the channel.
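The Strategy, registry and Decorator roles above can be sketched in a few classes. This is an illustrative sketch, not a reference implementation: `SmsChannel` only records calls instead of wrapping a vendor SDK, `FlakyChannel` is a test double, and `register` / `channel_for` are hypothetical stand-ins for the ChannelFactory lookup (`for` is a reserved word in Python).

```python
from __future__ import annotations
from typing import Protocol


class Channel(Protocol):
    """Strategy interface: every channel and every decorator satisfies it."""
    def send(self, to: str, body: str) -> None: ...


class SmsChannel:
    """Adapter placeholder: a real one would call a vendor SDK here."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


class FlakyChannel:
    """Test double that fails a fixed number of times, then succeeds."""
    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def send(self, to: str, body: str) -> None:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient provider error")


class Retrying:
    """Decorator: retry transient errors up to `attempts` times (no backoff here)."""
    def __init__(self, inner: Channel, attempts: int = 3) -> None:
        self.inner = inner
        self.attempts = attempts

    def send(self, to: str, body: str) -> None:
        for attempt in range(1, self.attempts + 1):
            try:
                self.inner.send(to, body)
                return
            except ConnectionError:
                if attempt == self.attempts:
                    raise  # out of attempts: let the caller decide


class Metered:
    """Decorator: count successful sends."""
    def __init__(self, inner: Channel) -> None:
        self.inner = inner
        self.count = 0

    def send(self, to: str, body: str) -> None:
        self.inner.send(to, body)
        self.count += 1


REGISTRY: dict[str, Channel] = {}


def register(channel_type: str, channel: Channel) -> None:
    REGISTRY[channel_type] = channel


def channel_for(channel_type: str) -> Channel:
    """The dispatch loop asks by type and never names a concrete class."""
    return REGISTRY[channel_type]
```

Wiring happens once at startup, for example `register("sms", Metered(Retrying(SmsChannel())))`; each wrapper can be unit-tested against a double such as `FlakyChannel`.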
That Observer seam is the decoupling that makes the whole thing extensible —
the orders service publishes one event and never learns who listens. Step through
a publish fanning out to subscribers, then unsubscribe one (a user opts out) and
publish again:
[Interactive diagram: a subject and 3 possible observers. Solid = subscribed, dashed = not listening; the subject calls the same method on every subscriber. The subject holds a list of subscribers and knows nothing else about them; that decoupling is the whole pattern.]
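The publish, unsubscribe, publish-again sequence can be sketched with a tiny in-process bus. `EventBus` and its method names are assumptions for illustration; a production system would more likely use a message broker than an in-memory list.

```python
from __future__ import annotations
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Subject: holds subscriber callables and knows nothing else about them."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def unsubscribe(self, event_type: str, handler: Handler) -> None:
        handlers = self._subscribers.get(event_type, [])
        if handler in handlers:
            handlers.remove(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Call every current subscriber the same way; return how many ran."""
        handlers = list(self._subscribers.get(event_type, []))  # copy: safe if a handler unsubscribes
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

The publisher only ever calls `publish("OrderShipped", {...})`; adding or removing a subscriber changes who is notified without any change on the publishing side.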
The reliability core (where seniors are graded)
The naive version sends synchronously in the request path. The real version is asynchronous by construction:
- notify() validates, assigns a request id, persists the request, enqueues, returns. (Caller latency: milliseconds. Provider outages: invisible to callers; that is the queue's job.)
- Workers consume, resolve preferences, render, dispatch per channel.
- Retries with exponential backoff + jitter on transient failures (Decorator); after N attempts the request goes to a dead-letter queue for inspection, never a silent drop.
- Idempotency end-to-end: the queue is at-least-once, so workers may see a request twice; a (request id, channel) sent-record makes redelivery a no-op. "No duplicate notifications" is a dedup-key promise, not a hope (the same move as WhatsApp messages).
- Status tracking: per (request, channel), PENDING → SENT → DELIVERED/FAILED; provider delivery callbacks (webhooks) advance it. A small state machine, one more time.
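The worker-side mechanics (sent-record dedup, backoff with jitter, dead-lettering, status) can be sketched as below. It is a single-process illustration under loud assumptions: `Worker`, `backoff_delay` and the in-memory `sent_records` set are hypothetical, and a real deployment would need a durable store with an atomic check-and-record, since a crash between sending and recording can still produce a duplicate.

```python
from __future__ import annotations
import random
from typing import Callable


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter: rng() * min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


class Worker:
    def __init__(self, send: Callable[[str, str], None], max_attempts: int = 3,
                 sleep: Callable[[float], None] = lambda seconds: None) -> None:
        self.send = send                  # hypothetical channel call: send(channel, body)
        self.max_attempts = max_attempts
        self.sleep = sleep                # injected so tests do not actually wait
        self.sent_records: set[tuple[str, str]] = set()   # (request id, channel)
        self.dead_letters: list[tuple[str, str, str]] = []
        self.status: dict[tuple[str, str], str] = {}

    def handle(self, request_id: str, channel: str, body: str) -> str:
        key = (request_id, channel)
        if key in self.sent_records:
            return "DUPLICATE"            # redelivery is a no-op
        self.status.setdefault(key, "PENDING")
        for attempt in range(self.max_attempts):
            try:
                self.send(channel, body)
            except ConnectionError:
                if attempt < self.max_attempts - 1:
                    self.sleep(backoff_delay(attempt))
                continue
            self.sent_records.add(key)
            self.status[key] = "SENT"
            return "SENT"
        # Out of attempts: keep the request for inspection, never drop it silently.
        self.dead_letters.append((request_id, channel, body))
        self.status[key] = "FAILED"
        return "FAILED"
```

The DELIVERED transition is left out because it would be driven by provider webhooks rather than by the worker.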
Plus the two product-integrity rules that distinguish people who've run one: opt-outs are checked at send time, not enqueue time (a user who unsubscribes between enqueue and send must not be messaged — compliance, not courtesy), and rate limiting is per-user as well as per-provider (a bug upstream must not let one user get 400 pushes — the token bucket, applied with empathy). For ordering ("OTP must beat marketing"): separate queues per priority class, never one queue with sorting.
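These three rules can be sketched together: a per-user token bucket, an opt-out check performed when a message is dequeued, and one queue per priority class. `TokenBucket`, `Sender` and the `"high"` / `"low"` class names are assumptions for illustration, and dropping a throttled message is a simplification; a real system might delay it or fold it into a digest.

```python
from __future__ import annotations
import time
from collections import deque
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.now = now
        self.tokens = float(capacity)
        self.updated = now()

    def try_take(self) -> bool:
        t = self.now()
        elapsed = t - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Sender:
    def __init__(self, opted_out: set, capacity: int = 2, refill_per_sec: float = 1.0,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.opted_out = opted_out        # live set of (user, channel) pairs
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.now = now
        self.buckets: dict[str, TokenBucket] = {}
        self.queues = {"high": deque(), "low": deque()}   # one queue per priority class
        self.delivered: list[tuple[str, str, str]] = []

    def enqueue(self, priority: str, user: str, channel: str, body: str) -> None:
        self.queues[priority].append((user, channel, body))

    def drain(self) -> None:
        for priority in ("high", "low"):  # the high queue is emptied first
            queue = self.queues[priority]
            while queue:
                user, channel, body = queue.popleft()
                if (user, channel) in self.opted_out:
                    continue              # opt-out checked at send time, not enqueue time
                if user not in self.buckets:
                    self.buckets[user] = TokenBucket(self.capacity, self.refill_per_sec, self.now)
                if not self.buckets[user].try_take():
                    continue              # sketch only: throttled messages are dropped
                self.delivered.append((user, channel, body))
```

With this shape, a user who opts out after a message was enqueued is still protected, and an OTP placed on the high queue is sent before any marketing message already waiting on the low queue.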
Think it through like the interview
PROBLEM: A service other teams call to notify users via email, SMS and push: templates, user preferences, retries, no duplicates, rate limits.
1. Design the API before the classes
   “What should callers pass: a message, or something else? This decision decides everything downstream.”
2. Let the patterns earn their keep
   “Channels vary, vendors vary, cross-cutting concerns stack. Which pattern goes where?”
3. Go async by construction
   “Twilio is down for 10 minutes. What may callers of notify() experience?”
4. Make 'no duplicates' a mechanism, not a hope
   “The queue is at-least-once and providers time out ambiguously. Where exactly do duplicates die?”
5. The two rules only operators know
   “What checks happen at SEND time rather than enqueue time, and why per-user rate limits?”
Walk a scenario
Order ships → orders service publishes OrderShipped{user, orderId} →
notification service (subscriber) creates request n-7741, persists,
enqueues, acks. Worker picks it up: preferences say push + email;
template order_shipped renders both shapes; PushChannel (wrapped
in retry/ratelimit decorators) sends — FCM times out, retry #2
succeeds → status SENT; email passes through SES adapter → SENT;
webhook later marks DELIVERED. Same afternoon, a redelivered queue
message replays n-7741 — the sent-record short-circuits both
channels. Nobody got two pushes; the orders team never knew any of
this happened. That mutual invisibility is the design working.
Practice — level up
A notification system is pub/sub fan-out with throttling: one event reaches many subscribers across channels, without spamming or duplicating. These drills isolate fan-out and rate control.
Climb in order: every rung assumes the one before it. Solve each on LeetCode.
Warm-up: drop the duplicate
- Allow a message at most once per window: dedupe before you ever notify.
Core: one event, many targets
Fan out, then count to throttle.
- Design Twitter (Medium): push one event to every subscriber, the Observer fan-out at the system's heart.
- Count events in a sliding window: the per-user send throttle.
Stretch: batch a window
- Design Hit Counter (Medium): aggregate events over the last N minutes, collapsing many notifications into one digest.