Design Instagram

The feed problem: fan-out on write vs read, the celebrity exception, photo storage done right, and why the home feed is precomputed.


Prompt

Design a photo-sharing app: users post photos, follow each other, and scroll a home feed of the people they follow. Likes and comments. Hundreds of millions of users.

1. Requirements

Functional: upload photos with captions; follow/unfollow; home feed (recent posts from people you follow, roughly reverse-chronological); like/comment; user profile (own posts grid). Non-functional: the feed loads fast (perceived as instant, under ~200 ms); the workload is read-heavy in the extreme; eventual consistency is fine almost everywhere (a follower seeing your post 30 s late is invisible; a deleted post reappearing is not).

Scope cuts to state up front: "Stories, DMs (that's WhatsApp), Reels (that's YouTube/Netflix) and the ML-ranked ordering are out; the deep dive is the feed."

2. Estimation (shapes the design)

~500 M daily users; maybe ~100 M photos posted/day ≈ 1,200 uploads/sec, which is modest. But each user opens the feed several times a day: ~2 B feed loads/day ≈ 23 K feed reads/sec on average and ~50 K at peak. Feed loads alone outnumber uploads 20:1, and once profile views, photo fetches and comment reads are added, the overall read:write ratio is on the order of 100:1. Each feed read also logically touches hundreds of followees. Photos: 100 M/day × ~2 MB ≈ 200 TB/day into storage. That is a blob + CDN problem, solved the standard way.

The number that designs the system: a naive feed query is a join across hundreds of accounts, executed 50 K times per second. Make that fast and everything else is routine.
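A quick way to re-derive the estimate is to run the arithmetic. This is a minimal sketch. Every input is one of the rough figures stated above, and the 2× peak-to-average factor is an added assumption rather than a number from the prompt.

```python
# Back-of-envelope check of the estimate. Inputs are the rough figures from
# the text; PEAK_TO_AVERAGE is an added assumption, not from the prompt.
SECONDS_PER_DAY = 86_400
PEAK_TO_AVERAGE = 2

photos_per_day = 100e6        # ~100 M uploads/day
feed_loads_per_day = 2e9      # ~2 B feed loads/day
photo_size_bytes = 2e6        # ~2 MB per photo

uploads_per_sec = photos_per_day / SECONDS_PER_DAY
feed_reads_avg = feed_loads_per_day / SECONDS_PER_DAY
feed_reads_peak = feed_reads_avg * PEAK_TO_AVERAGE
storage_tb_per_day = photos_per_day * photo_size_bytes / 1e12

print(f"uploads/sec:      {uploads_per_sec:,.0f}")
print(f"feed reads/sec:   {feed_reads_avg:,.0f} avg, {feed_reads_peak:,.0f} peak")
print(f"new storage/day:  {storage_tb_per_day:,.0f} TB")
```

It prints about 1,157 uploads/sec, about 23 K average feed reads/sec (about 46 K at the assumed 2× peak), and 200 TB/day of new photo bytes. These match the ranges quoted above.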

3. High-level design

The split to announce immediately: photos are bytes (object storage + CDN; the API hands out URLs, exactly like WhatsApp media) and the feed is ids (small lists of post ids, hydrated on read). All the interesting design is in the ids.

4. Deep dive — the feed problem

Two pure strategies, then the real answer:

Fan-out on read (compute when they ask)

Store posts once, by author. To build a feed: fetch the user's followee list, query recent posts per followee, merge by time, return.

  • ✅ Writes are trivial (one insert). Nothing stale, nothing wasted on inactive users.
  • ❌ Every feed load does hundreds of queries + a merge, 50 K times/sec. Latency and load are paid on the frequent operation (reads) to subsidize the rare one (writes) — backwards for a 100:1 read-heavy system.
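Fan-out on read boils down to a k-way merge of per-author lists that are already sorted by time. The sketch below is a hypothetical in-memory version. `Post`, `follows` and `posts_by_author` stand in for the real follow graph and per-author post index, and each list lookup represents one of the per-followee queries a real feed load would issue.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: float  # epoch seconds


def build_feed_on_read(user_id, follows, posts_by_author, limit=20):
    """Fan-out on read: merge each followee's recent posts at request time.

    follows: user -> list of followee ids (hypothetical follow-graph lookup)
    posts_by_author: author -> that author's posts, newest first
    """
    # One lookup per followee; in production each is a query to the post store.
    per_author = [posts_by_author.get(a, []) for a in follows.get(user_id, [])]
    # k-way merge of lists that are already sorted newest-first,
    # hence reverse=True on the timestamp key.
    merged = heapq.merge(*per_author, key=lambda p: p.created_at, reverse=True)
    return [p.post_id for p in islice(merged, limit)]


# Usage: user 1 follows authors 2 and 3.
follows = {1: [2, 3]}
posts_by_author = {
    2: [Post(20, 2, 300.0), Post(21, 2, 100.0)],
    3: [Post(30, 3, 200.0)],
}
print(build_feed_on_read(1, follows, posts_by_author))  # [20, 30, 21]
```

The merge itself is cheap, since it keeps a heap of one entry per followee. What hurts at 50 K reads/sec is issuing one fetch per followee on every load.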

Fan-out on write (compute when they post)

When a user posts, push the post id into every follower's precomputed feed list (Redis list per user, capped at ~800 entries). Reading the feed = one list fetch + hydration. This is the compute-once-broadcast-many instinct applied to feeds.

  • ✅ Reads are O(1) — the 200 ms budget is easy.
  • ❌ A post by someone with 50 M followers triggers 50 M list pushes. The write cost is unbounded — this is the celebrity problem, and it's the single most famous follow-up in system design interviews.
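The write path can be sketched with an in-memory stand-in for the per-user Redis list. Here a bounded `collections.deque` plays the role of `LPUSH` followed by `LTRIM feed:{userId} 0 799`: the new id goes to the front and the oldest id falls off. `FeedStore`, `fan_out_on_write` and `followers_of` are hypothetical names. A real version would run inside async workers rather than in the request path.

```python
from collections import defaultdict, deque

FEED_CAP = 800  # "capped at ~800 entries"


class FeedStore:
    """In-memory stand-in for one Redis list per user (feed:{userId}).

    appendleft on a bounded deque mimics LPUSH + LTRIM 0 FEED_CAP-1:
    the newest id goes to the front, the oldest drops off the end.
    """

    def __init__(self, cap=FEED_CAP):
        self._feeds = defaultdict(lambda: deque(maxlen=cap))

    def push(self, user_id, post_id):
        self._feeds[user_id].appendleft(post_id)

    def read(self, user_id, limit=20):
        feed = self._feeds.get(user_id)
        return list(feed)[:limit] if feed else []


def fan_out_on_write(author_id, post_id, followers_of, feeds):
    """Run by a worker after the post itself is durably stored."""
    # One push per follower: this loop is the cost that explodes for celebrities.
    for follower_id in followers_of.get(author_id, []):
        feeds.push(follower_id, post_id)


# Usage with a tiny cap so the trimming is visible.
feeds = FeedStore(cap=3)
followers_of = {7: [1, 2]}
for post_id in (100, 101, 102, 103):
    fan_out_on_write(7, post_id, followers_of, feeds)
print(feeds.read(1))  # [103, 102, 101]  (100 was trimmed)
```

The loop is the whole cost model: one push per follower. That is exactly what becomes 50 M pushes when a celebrity posts.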

The hybrid (the answer)

Fan-out on write for normal users; fan-out on read for celebrities. Define a threshold (say 100 K followers). Normal posts push to follower feeds via queue workers (async — a few seconds of delivery lag is invisible). Celebrity posts are not fanned out; instead, each feed read merges the precomputed list with a live query of the (few) celebrities the user follows — bounded, because nobody follows many thousands of accounts, and celebrity posts are cache-friendly (millions of users request the same recent posts — one cache entry serves them all).

Also skip fan-out to dormant users (no login in 30 days) and rebuild their feed lazily on return — why maintain 200 M feeds nobody is reading?
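Putting the hybrid together, one possible shape is a single service with three paths: push for normal authors, pull for celebrities, and the dormant-user skip with a lazy rebuild. This is a minimal in-memory sketch under stated assumptions. `FeedService` and its fields are hypothetical. The 100 K threshold and 30-day dormancy window are the values suggested above. Entries are `(timestamp, post_id)` pairs so pushed and pulled posts can be merged by time. Reads are de-duplicated in case an author crosses the threshold after some of their posts were already pushed.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice

CELEBRITY_THRESHOLD = 100_000   # "say 100 K followers"
DORMANT_AFTER_S = 30 * 86_400   # "no login in 30 days"
FEED_CAP = 800


class FeedService:
    """Hypothetical in-memory sketch of the hybrid feed.

    Entries are (timestamp, post_id) so pushed and pulled posts merge by time.
    """

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # followee -> followers (reverse index)
        self.following = defaultdict(set)         # follower -> followees
        self.last_login = {}                      # user -> epoch seconds
        self.posts_by_author = defaultdict(list)  # author -> entries, newest first
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_CAP))  # precomputed lists

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user):
        return len(self.followers[user]) >= self.threshold

    def _is_active(self, user, now):
        return now - self.last_login.get(user, 0) <= DORMANT_AFTER_S

    def post(self, author, post_id, now):
        self.posts_by_author[author].insert(0, (now, post_id))  # source of truth
        if self.is_celebrity(author):
            return  # no fan-out; followers pull it at read time
        for f in self.followers[author]:
            if self._is_active(f, now):  # dormant followers are skipped
                self.feeds[f].appendleft((now, post_id))

    def login(self, user, now):
        was_dormant = not self._is_active(user, now)
        self.last_login[user] = now
        if was_dormant:
            # Lazy rebuild: one fan-out-on-read over the non-celebrity followees.
            lists = [self.posts_by_author[a] for a in self.following[user]
                     if not self.is_celebrity(a)]
            merged = heapq.merge(*lists, key=lambda e: e[0], reverse=True)
            self.feeds[user] = deque(islice(merged, FEED_CAP), maxlen=FEED_CAP)

    def read_feed(self, user, limit=20):
        pushed = list(self.feeds[user])  # newest first
        pulled = [self.posts_by_author[c] for c in self.following[user]
                  if self.is_celebrity(c)]
        merged = heapq.merge(pushed, *pulled, key=lambda e: e[0], reverse=True)
        out, seen = [], set()
        for _, post_id in merged:
            if len(out) >= limit:
                break
            if post_id not in seen:  # an author may cross the threshold mid-history
                seen.add(post_id)
                out.append(post_id)
        return out


# Usage with a tiny threshold: author 9 is a "celebrity" at 3 followers.
DAY = 86_400
svc = FeedService(threshold=3)
for follower, followee in [(1, 2), (5, 2), (1, 9), (3, 9), (4, 9)]:
    svc.follow(follower, followee)
svc.login(5, now=10 * DAY)          # user 5 then goes quiet
t = 100 * DAY
svc.login(1, now=t)                 # user 1 is active
svc.post(2, 201, now=t + 10)        # pushed to user 1 only (5 is dormant)
svc.post(9, 901, now=t + 20)        # celebrity: not fanned out
svc.post(2, 202, now=t + 30)
print(svc.read_feed(1))             # [202, 901, 201]
before = svc.read_feed(5)           # nothing was pushed while dormant
svc.login(5, now=t + 40)            # lazy rebuild on return
print(before, svc.read_feed(5))     # [] [202, 201]
```

The read path stays bounded: one list fetch plus one small pull per followed celebrity. The write path never loops over a celebrity's follower set.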

[Figure: both halves of the hybrid, on one timeline]

Likes & comments at scale

Counters are their own mini-problem: a viral post takes thousands of likes/sec, and UPDATE posts SET likes = likes + 1 serializes on one row. Standard answer: sharded counters or Redis increments, flushed/reconciled to the DB periodically; display counts may lag by seconds, and nobody notices (the same trade as view counts — YouTube's version goes deeper). Comments are just posts attached to a post id — paginated, newest first, no fan-out.
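A sharded counter spreads one hot row across N slots so concurrent increments rarely contend. A periodic flush then turns thousands of increments into one write. The sketch below is an in-process illustration only. `ShardedCounter` is a hypothetical class, threads and locks stand in for independent Redis keys or DB rows, and `db_likes` stands in for the likes column.

```python
import random
import threading
from collections import defaultdict


class ShardedCounter:
    """Hypothetical sketch: one hot counter split across N independently locked slots.

    In production each slot would be its own Redis key or DB row; here
    threads and locks stand in for concurrent writers.
    """

    def __init__(self, num_shards=16):
        self._shards = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def incr(self, n=1):
        i = random.randrange(len(self._shards))  # spread writers across slots
        with self._locks[i]:
            self._shards[i] += n

    def pending(self):
        """Unflushed total; approximate while increments are in flight."""
        return sum(self._shards)

    def drain(self):
        """Take and reset each slot under its own lock, for a periodic DB flush."""
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._shards[i]
                self._shards[i] = 0
        return total


# Usage: 8 writers each add 1,000 likes to post 42, then one flush.
db_likes = defaultdict(int)  # stand-in for posts.likes
counter = ShardedCounter()


def like_burst():
    for _ in range(1_000):
        counter.incr()


threads = [threading.Thread(target=like_burst) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
db_likes[42] += counter.drain()  # one write per flush instead of 8,000 row updates
print(db_likes[42], counter.pending())  # 8000 0
```

A displayed count would be the flushed value plus `pending()`. Between flushes the stored number lags, which is the trade described above.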

Data model sketch

  • posts — sharded by author id (profile pages read one shard; feed hydration does scatter-gather by post id, mitigated by caching hot posts).
  • follows — (follower → followee) and the reverse index (followee → followers); the reverse index is what fan-out workers read, and for celebrities it's millions of rows — another reason not to fan out their posts.
  • feed:{userId} — Redis list of post ids, capped; the product decision hiding inside: scroll past the cap → fall back to fan-out-on-read for older content.
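The "scroll past the cap" decision can be sketched as a pager. It slices the precomputed list while it can and switches to fan-out on read for anything older than the last cached entry. `feed_page`, its arguments, and the `(created_at, post_id)` entry shape are assumptions for illustration. Celebrity posts from the hybrid read path are left out to keep it short.

```python
import heapq
from itertools import islice


def feed_page(offset, limit, cached_feed, following, posts_by_author):
    """Hypothetical pager over feed:{userId}.

    cached_feed: newest-first (created_at, post_id) entries, at most the cap long.
    Inside the cap, slice the precomputed list; past it, fall back to
    fan-out on read for posts older than the oldest cached entry.
    """
    if offset + limit <= len(cached_feed):
        return [pid for _, pid in cached_feed[offset:offset + limit]]

    page = [pid for _, pid in cached_feed[offset:]]
    oldest = cached_feed[-1][0] if cached_feed else float("inf")
    # Strict "<" skips posts tied with the oldest cached timestamp;
    # a real cursor would likely compare (created_at, post_id) instead.
    older = [[e for e in posts_by_author.get(a, []) if e[0] < oldest]
             for a in following]
    merged = heapq.merge(*older, key=lambda e: e[0], reverse=True)
    skip = max(0, offset - len(cached_feed))  # older posts already shown
    page += [pid for _, pid in islice(merged, skip, skip + limit - len(page))]
    return page


# Usage: the "cap" here is 2 entries, and the user follows authors 2 and 3.
posts_by_author = {2: [(50, 5), (30, 3), (10, 1)], 3: [(40, 4), (20, 2)]}
cached = [(50, 5), (40, 4)]
for offset in (0, 2, 3):
    print(offset, feed_page(offset, 2, cached, [2, 3], posts_by_author))
# 0 [5, 4]
# 2 [3, 2]
# 3 [2, 1]
```

The first pages cost one list read. Only users who scroll past the cap pay the fan-out-on-read price, and they are a small fraction of sessions.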

Think it through like the interview


Problem: Photo-sharing app. Post photos, follow people, scroll a home feed. Hundreds of millions of users. You have 40 minutes.

  1. Scope + find the one hard problem
     Photos, follows, feed, likes: which one is THE problem, and how do I find it from the numbers?
  2. Separate bytes from ids
     Where do the photos live, and what does the feed actually store?
  3. Fan-out: pay on write or on read?
     Precompute each feed at post time, or compute at read time? Argue from the 100:1 ratio.
  4. The celebrity problem
     Someone with 50M followers posts. What just happened to your design?
  5. Failure modes prove the design
     Redis dies. A post is deleted. The fan-out queue backs up. Which of these is a disaster?

5. Bottlenecks & failure modes

  • Fan-out queue backlog (everyone posts during the World Cup final) → feeds go stale by minutes. Mitigation: scale workers; prioritize fan-out to active-now users first — staleness for offline users is free.
  • Redis feed cache loss → feeds rebuild via fan-out-on-read on demand (degraded latency, not data loss — posts are the source of truth, feeds are derived). This "cache is rebuildable" property is worth saying out loud; it's why the design sleeps at night.
  • Hot post (celebrity engagement storm) → counter sharding above, plus the post itself cached at the API layer.
  • Delete must propagate — a deleted post's id lingers in millions of feed lists; hydration must treat missing posts as tombstones and skip them. (Cheaper than chasing every list.)
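Two of these failure modes come down to a few lines on the read path. A cache miss (Redis lost or evicted) triggers a rebuild from the source of truth, and hydration skips ids whose posts are gone. The sketch below uses hypothetical `feed_cache`, `post_store` and `rebuild` stand-ins. In production, `rebuild` would be the fan-out-on-read merge over the user's followees.

```python
def hydrate(post_ids, post_store, limit):
    """Turn feed ids into posts, treating missing or deleted posts as tombstones."""
    out = []
    for pid in post_ids:
        if len(out) >= limit:
            break
        post = post_store.get(pid)
        if post is None or post.get("deleted"):
            continue  # stale id in someone's feed list: skip it, don't chase the lists
        out.append(post)
    return out


def get_feed(user_id, feed_cache, rebuild, post_store, limit=20):
    """Read path that survives losing the feed cache."""
    ids = feed_cache.get(user_id)
    if ids is None:                # Redis lost or evicted this feed
        ids = rebuild(user_id)     # fan-out on read from the source of truth
        feed_cache[user_id] = ids  # repopulate so the next read is cheap
    return hydrate(ids, post_store, limit)


# Usage with hypothetical stand-ins.
post_store = {1: {"id": 1}, 2: {"id": 2, "deleted": True}, 3: {"id": 3}}
feed_cache = {8: [4, 3]}  # id 4 was hard-deleted from the post store


def rebuild(user_id):
    return [3, 2, 1]  # pretend fan-out-on-read result


print([p["id"] for p in get_feed(7, feed_cache, rebuild, post_store)])  # [3, 1]
print([p["id"] for p in get_feed(8, feed_cache, rebuild, post_store)])  # [3]
```

Hydration keeps reading past tombstones until it fills `limit`, so a page never comes back short just because some of its ids were deleted.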

Design drills

The feed is a fan-out decision plus a media path. Defend the calls.


Whiteboard each one out loud for 5–10 minutes before checking what a strong answer covers. The gap between your sketch and the checklist is your study list.

Warm-up

Fan-out on write vs fan-out on read for the home feed — when do you use each?

Core

Design the photo upload → serve path.

Core

The precomputed feed cache is enormous. Is it durable — and what if it's lost?

Stretch

A post is deleted (or taken down). Eventual disappearance vs synchronous — which, and when does the cheap answer break?