Prompt
Design a microblogging platform: short posts, follows, home timeline, retweets/likes, trending topics, search. Hundreds of millions of users; some have 100M followers.
1. Requirements
Functional: post (small text + media refs); follow; home timeline; retweet and like; trending topics; search.
Non-functional: timeline under ~200 ms; posting feels instant; extremely read-heavy; the follower distribution is a power law, and unlike Instagram, celebrity content is the product: a head of state's post must reach 100M timelines fast.
Why this design exists separately from Instagram: tweets are tiny and text-first (storage is easy, distribution is everything), reshares (retweets) amplify fan-out recursively, and two new subsystems — trending and search — are first-class. The feed machinery transfers; this page builds what doesn't.
2. Estimation (shapes the design)
~500 M tweets/day ≈ 6 K writes/sec (tweets are ~300 bytes — a day's text fits on a laptop; storage is not the problem). Timeline reads: ~100 K/sec. Average followers ~700 → naive full fan-out ≈ 4 M timeline-writes/sec average, with single posts spiking to 100 M. The power law isn't an edge case here — the head of the distribution carries the product, so the hybrid strategy isn't an optimization, it's the core design.
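The arithmetic behind those figures can be checked in a few lines. This is only a back-of-envelope sketch: every input is one of the rough estimates above, and the variable names are illustrative.

```python
# Back-of-envelope check; all inputs are the section's own rough estimates.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 500_000_000
avg_tweet_bytes = 300
avg_followers = 700

writes_per_sec = tweets_per_day / SECONDS_PER_DAY          # about 5.8K
daily_text_gb = tweets_per_day * avg_tweet_bytes / 1e9     # 150 GB
fanout_writes_per_sec = writes_per_sec * avg_followers     # about 4.05M

print(f"tweet writes/sec:   {writes_per_sec:,.0f}")
print(f"text per day (GB):  {daily_text_gb:,.0f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:,.0f}")
```

The roughly 700× gap between tweet writes and timeline writes is the number that drives the rest of the design.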
3. High-level design
The structural idea to announce: every tweet enters one durable event log, and four independent consumers build four derived views — timelines, trends, search, analytics. Adding a fifth view never touches the write path (event-driven decoupling, drawn large).
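A minimal in-memory sketch of "one log, N views" might look like the following. `EventLog` and `Consumer` are hypothetical stand-ins for a durable log and its consumer groups, not any particular product's API; the point is only that each consumer keeps its own offset.

```python
from collections import defaultdict


class EventLog:
    """Append-only list standing in for a durable event log (hypothetical)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset):
        return self._events[offset:]


class Consumer:
    """Keeps its own offset, so it can lag or fail without affecting others."""

    def __init__(self, log, handler):
        self.log = log
        self.handler = handler
        self.offset = 0

    def poll(self):
        for event in self.log.read_from(self.offset):
            self.handler(event)
            self.offset += 1


log = EventLog()
term_counts = defaultdict(int)    # derived view 1: input to trends
search_index = defaultdict(list)  # derived view 2: term -> tweet ids


def count_terms(tweet):
    for term in tweet["text"].lower().split():
        term_counts[term] += 1


def index_tweet(tweet):
    for term in set(tweet["text"].lower().split()):
        search_index[term].append(tweet["id"])


trends = Consumer(log, count_terms)
search = Consumer(log, index_tweet)

# The write path only appends; it never calls a view directly.
log.append({"id": 1, "text": "hello world"})
log.append({"id": 2, "text": "hello again"})

trends.poll()  # the trends view catches up first
search.poll()  # the search view catches up later, independently
```

Adding a fifth view here means adding one more `Consumer`; `log.append` does not change.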
4. Deep dives
Timelines under a power law (the canonical answer, compressed)
Same hybrid as Instagram (push for normal users, pull-and-merge for celebrities, skip dormant accounts), with the Twitter-specific twists:
- The celebrity threshold matters more (the head is the traffic); celebrity tweets are served from a hot celebrity-recent cache that millions of timeline reads merge in: one cache entry doing 100M timelines' work.
- Retweets amplify recursively: a retweet is a new timeline entry referencing the original (never a copy, because the original's counters must stay singular), and a mid-tier user retweeting a celebrity triggers fan-out to the retweeter's own followers. Fan-out workers therefore consume from the event log with per-tweet dedup: a user following five retweeters sees the tweet once (timeline insertion checks the referenced id against recent entries).
- Posting writes the tweet + appends to the log, then returns — fan-out is fully async; "instant post" means "durably logged," not "delivered to 100 M lists."
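The push/pull split and the retweet dedup can be sketched together. Everything below is an assumption made for illustration: the explicit `celebrities` set (a real system would derive it from a follower-count threshold), the `RECENT_WINDOW` length, and the in-memory dicts standing in for caches. Skipping dormant accounts is left out to keep it short.

```python
from collections import defaultdict, deque

RECENT_WINDOW = 800  # hypothetical per-timeline length

celebrities = {"head_of_state"}   # hypothetical; normally set by a threshold
followers = defaultdict(set)      # author -> follower ids
following = defaultdict(set)      # user -> authors followed
timelines = defaultdict(lambda: deque(maxlen=RECENT_WINDOW))  # (ts, ref_id)
celebrity_recent = defaultdict(list)  # celebrity -> [(ts, ref_id)]


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, ts, ref_id):
    """ref_id is the original tweet's id; a retweet reuses that id."""
    if author in celebrities:
        celebrity_recent[author].append((ts, ref_id))  # pull path: one write
        return
    for user in followers[author]:                     # push path
        # Dedup: skip if the referenced tweet is already in this timeline.
        if all(ref_id != seen for _, seen in timelines[user]):
            timelines[user].append((ts, ref_id))


def home_timeline(user, limit=20):
    merged = list(timelines[user])
    for author in following[user] & celebrities:       # merge at read time
        merged.extend(celebrity_recent[author])
    merged.sort(reverse=True)                          # newest first
    out, seen = [], set()
    for _, ref_id in merged:
        if ref_id not in seen:                         # dedup across paths
            seen.add(ref_id)
            out.append(ref_id)
    return out[:limit]


for author in ("head_of_state", "bob", "carol"):
    follow("alice", author)

publish("head_of_state", 1, "t100")  # celebrity original
publish("bob", 2, "t100")            # bob retweets it
publish("carol", 3, "t100")          # carol retweets it too
publish("bob", 4, "t200")            # an ordinary tweet

print(home_timeline("alice"))  # ['t200', 't100']
```

Alice follows the celebrity and two retweeters, yet `t100` appears once: carol's retweet is dropped at insertion, and the celebrity's copy is dropped at merge time.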
The famous trace, an 80M-follower account tweeting, was shown here as a single diagram (figure not reproduced).
Trending topics (the streaming subsystem)
"What's spiking right now" is a windowed heavy-hitters problem over the firehose:
- Tokenize tweets into terms/hashtags as they flow through the log.
- Count per term over sliding windows (last 10–60 min) — at firehose volume, exact counts per term are wasteful; use count-min sketch (a probabilistic counting structure: small fixed memory, slight overestimates — name it, one sentence, move on) with exact counting for the current top-K candidates.
- Trending ≠ popular: "Monday" is always frequent. Score by acceleration — current window count vs the term's historical baseline; a 50× spike on a small base beats a 1.02× wiggle on a huge one.
- Compute per region; smooth over adjacent windows; apply the editorial/abuse filters every real system has.
Output: a tiny top-K list per region, recomputed every few seconds and cached — read by 100 M users, computed once (compute-once-broadcast-many, again).
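The two ideas in the list above, fixed-memory counting and scoring by acceleration, can be sketched as follows. The sketch dimensions, the choice of `blake2b` as the hash, and the additive `smoothing` term in the score are all assumptions made for illustration; a real pipeline would also keep one sketch per time window.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory counter; estimates are never below the true count."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, term):
        # One independent hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                term.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, term, count=1):
        for row, col in self._slots(term):
            self.rows[row][col] += count

    def estimate(self, term):
        # Collisions only inflate cells, so the minimum is the best estimate.
        return min(self.rows[row][col] for row, col in self._slots(term))


def trend_score(current, baseline, smoothing=10.0):
    """Current-window count relative to the term's historical baseline.

    The smoothing term is an assumption: it keeps near-zero baselines
    from producing unbounded ratios.
    """
    return (current + smoothing) / (baseline + smoothing)


window = CountMinSketch()
window.add("#goal", 500)          # small term, sudden spike
window.add("monday", 1_020_000)   # huge term, barely moving

goal_score = trend_score(window.estimate("#goal"), baseline=10)
monday_score = trend_score(window.estimate("monday"), baseline=1_000_000)
print(goal_score > monday_score)  # True
```

"monday" has about two thousand times the raw count of "#goal" but scores close to 1, which is why it never trends.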
Search (the second index)
Every tweet ever, searchable in milliseconds: an inverted index (term → list of tweet ids — the hash-map idea applied to text, the structure inside Elasticsearch/Lucene). Twitter-specific shape:
- Shard by time, not just term — queries overwhelmingly want recent matches, so the freshest index shards are small, hot, in-memory, and searched first; older shards are bigger, colder, searched only when needed (pagination past recent results). The long-tail tiering of YouTube's caches, applied to an index.
- Indexing lag of seconds is acceptable and is what the async indexer buys; ranking blends recency, engagement and relevance — acknowledge and move on.
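A toy version of the time-sharded inverted index shows why recent queries stay cheap. `TimeShardedIndex`, the one-hour shard size, and whitespace tokenization are illustrative assumptions; the sketch also assumes tweets are indexed in timestamp order within a shard, and it ranks by recency only.

```python
from collections import defaultdict


class TimeShardedIndex:
    """Inverted index split into time buckets, searched newest bucket first."""

    def __init__(self, shard_seconds=3600):
        self.shard_seconds = shard_seconds
        # bucket number -> term -> list of tweet ids (oldest first)
        self.shards = defaultdict(lambda: defaultdict(list))

    def add(self, tweet_id, ts, text):
        bucket = ts // self.shard_seconds
        for term in set(text.lower().split()):
            self.shards[bucket][term].append(tweet_id)

    def search(self, term, limit=10):
        results = []
        for bucket in sorted(self.shards, reverse=True):  # freshest first
            postings = self.shards[bucket].get(term.lower(), [])
            results.extend(reversed(postings))            # newest first
            if len(results) >= limit:
                break  # older, colder shards are never touched
        return results[:limit]


index = TimeShardedIndex()
index.add(1, 10, "world cup final")     # old shard
index.add(2, 4000, "cup of tea")        # fresh shard
index.add(3, 4100, "world cup goal")    # fresh shard

print(index.search("cup", limit=2))  # [3, 2]
print(index.search("cup", limit=3))  # [3, 2, 1]
```

The first query is answered entirely from the fresh shard; only the deeper page reaches back into the old one.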
Counters, briefly
Likes/retweet counts on viral tweets are the hot-counter problem: sharded counters / windowed aggregation; display lags seconds; nobody pays per like, so no settlement pipeline needed. One sentence in the room; the cross-reference is the point.
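For reference, the sharded-counter idea in its smallest form: writes are spread over several sub-counters so no single key takes every increment, and reads sum the shards. The shard count and the in-memory dict are assumptions; in practice each `(tweet_id, shard)` key would sit on a different partition.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Spreads increments for one hot key across several sub-counters."""

    def __init__(self, num_shards=16):  # shard count is a hypothetical choice
        self.num_shards = num_shards
        self.shards = defaultdict(int)  # (tweet_id, shard) -> count

    def increment(self, tweet_id):
        shard = random.randrange(self.num_shards)  # any shard will do
        self.shards[(tweet_id, shard)] += 1

    def total(self, tweet_id):
        # Reads are rarer than writes and can be cached for a few seconds.
        return sum(
            self.shards.get((tweet_id, s), 0) for s in range(self.num_shards)
        )


likes = ShardedCounter()
for _ in range(1000):
    likes.increment("t100")

print(likes.total("t100"))  # 1000
```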
Think it through like the interview
Problem: microblogging with short posts, follows, home timeline, retweets, trending topics, search. Hundreds of millions of users; some have 100M followers.
- Stage 1. What's different from Instagram? “You've done the feed problem. What does THIS prompt add that transfers nothing?”
- Stage 2. One log, N views. “Timelines, trending, search, analytics all need every tweet. How many write paths?”
- Stage 3. Timelines under a power law. “Average 700 followers, but the head of the distribution hits 100M. What does the hybrid look like HERE?”
- Stage 4. Trending = acceleration, not frequency. “Why does 'Monday' never trend, and what structure counts a firehose in fixed memory?”
- Stage 5. Search, and the closing trace. “Index every tweet ever: what's the Twitter-specific sharding? Then trace the 80M-follower tweet.”
5. Bottlenecks & failure modes
- Fan-out backlog during global events (World Cup final: tweet volume × 5, every tweet hot) → prioritized fan-out (active users first), the celebrity cache absorbs the head, and timeline staleness degrades gracefully by minutes; posting never blocks.
- Trending manipulation (bot brigades) → rate limits per account, spam-scoring upstream of the trend pipeline, editorial gates — name the adversarial reality; it's a product surface, not a footnote.
- Timeline cache loss → rebuild lazily via pull-model (the rebuildable-derived-data story once more).
- Search indexer lag → recent tweets invisible in search for minutes; trending (separate pipeline) keeps working — independent consumers fail independently, which is why the event-log architecture wins.
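The prioritized fan-out from the first bullet can be sketched as an ordering over followers: most recently active first, dormant accounts skipped. The 30-day dormancy cutoff and the `followers_last_seen` mapping are assumptions made for illustration.

```python
import heapq

DORMANT_AFTER = 30 * 24 * 3600  # hypothetical cutoff: 30 days, in seconds


def fan_out_order(followers_last_seen, now):
    """Yield follower ids most-recently-active first, skipping dormant ones.

    A heap lets a backlogged worker pop only as many followers as it has
    capacity for, without sorting the entire follower list up front.
    """
    heap = [
        (-last_seen, user)
        for user, last_seen in followers_last_seen.items()
        if now - last_seen < DORMANT_AFTER
    ]
    heapq.heapify(heap)
    while heap:
        _, user = heapq.heappop(heap)
        yield user


now = 10_000_000
last_seen = {
    "a": now - 60,             # active a minute ago
    "b": now - 86_400,         # active yesterday
    "c": now - 40 * 86_400,    # dormant for 40 days
}

print(list(fan_out_order(last_seen, now)))  # ['a', 'b']
```

If the dormant user returns, their timeline can be rebuilt through the pull model, the same path used after a cache loss.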
Design drills
The timeline is the fan-out problem in its purest form. Drill the extremes.
Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers; the gap between your sketch and the checklist is your study list.
- Build a user's home timeline. Push (fan-out on write) or pull (fan-out on read)?
- A celebrity with 40M followers tweets. Why is it special, and what do you do?
- Likes and retweets storm in on a viral tweet. Keep the counters from melting a partition.
- Trending and search must reflect a tweet within seconds. How, without slowing the write path?