Design YouTube

Netflix's problem plus user-generated chaos: the upload pipeline, view counting at scale, long-tail caching, and comments under fire.


Prompt

Design a video platform where anyone uploads: process and serve videos globally, count views, rank search and recommendations, handle comments. 500 hours of video uploaded per minute.

1. Requirements

Functional: upload videos of arbitrary size/format; watch with adaptive quality; view counts; search; comments; channel pages. Non-functional: smooth playback worldwide (Netflix's bar); uploads must never be lost; processing delay of minutes is acceptable; view counts may lag but must end up right (creators are paid on them).

The framing that earns immediate credit: "This is Netflix with three hard additions: the catalog is unbounded and user-generated, the popularity curve has a massive long tail, and counters are money. I'll focus there and inherit the CDN/ABR machinery."

2. Estimation (shapes the design)

500 hours/min uploaded ≈ 30 K hours/hour ≈ raw ingest of ~5–10 GB/s, before transcoding multiplies it into roughly a dozen renditions. Watch side: ~5 B views/day ≈ 60 K video starts/sec, with 1 M+ concurrent streams spread across different videos (vs Netflix's concentrated catalog). Storage grows by petabytes per month, forever: deletion is rare, so tiering is mandatory. The asymmetry to call out: uploads arrive at ~1,200/min while watches run at 60 K/sec, but each upload costs minutes of compute, while each watch is a CDN byte-push.
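The arithmetic behind these figures can be checked in a few lines. This is a back-of-envelope sketch: the ~2 Mbps average raw upload bitrate is an assumption chosen for illustration (real phone uploads vary widely), while the other inputs come from the text. It also makes explicit that 1,200 uploads/min at 500 hours/min implies an average video of about 25 minutes.

```python
# Back-of-envelope check of the estimates above.
# AVG_RAW_BITRATE_BPS is an illustrative assumption, not a figure from the text.

HOURS_UPLOADED_PER_MIN = 500
VIEWS_PER_DAY = 5_000_000_000
UPLOADS_PER_MIN = 1_200
AVG_RAW_BITRATE_BPS = 2_000_000  # assumed ~2 Mbps average for raw uploads

# Seconds of video arriving per wall-clock second.
video_s_per_s = HOURS_UPLOADED_PER_MIN * 3600 / 60
# Raw ingest: video seconds per second times bits per video second, in bytes.
ingest_bytes_per_s = video_s_per_s * AVG_RAW_BITRATE_BPS / 8
# Watch side: daily views spread evenly over 86,400 seconds.
starts_per_s = VIEWS_PER_DAY / 86_400
# Average video length implied by 500 hours/min across 1,200 uploads/min.
implied_avg_video_min = HOURS_UPLOADED_PER_MIN * 60 / UPLOADS_PER_MIN

print(f"video seconds/s:    {video_s_per_s:,.0f}")             # 30,000
print(f"raw ingest:         {ingest_bytes_per_s / 1e9:.1f} GB/s")  # 7.5 GB/s
print(f"video starts/s:     {starts_per_s:,.0f}")              # 57,870
print(f"implied avg length: {implied_avg_video_min:.0f} min")  # 25 min
```

At the assumed 2 Mbps the raw ingest lands at 7.5 GB/s, inside the ~5–10 GB/s band, before any rendition multiplier.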

3. High-level design

4. Deep dives — the three YouTube-specific problems

The upload pipeline (UGC ≠ studio masters)

Netflix ingests a few pristine files per week from studios; YouTube ingests anything — 4 GB phone videos over hotel Wi-Fi, weird codecs, 12-hour streams. The pipeline:

  1. Resumable, chunked upload — the client splits the file and uploads parts; a dropped connection resumes at the last chunk, not byte zero (the reconnect-and-resume instinct, applied to uploads). The raw file lands in object storage before anything else — durability first; everything downstream can retry.
  2. Queue + transcode workers, following the ingestion-pipeline playbook: split the video into segments, transcode segments in parallel across workers into the rendition matrix, reassemble, and publish (Netflix's trick, at conveyor-belt scale). Idempotent workers, per-segment retries, and a poison-pill lane for corrupt files.
  3. Progressive availability — publish a watchable 360p fast, backfill higher qualities; creators see "processing HD…" — a product expression of the async architecture.
  4. Along the way: thumbnail extraction, content moderation / copyright matching (fingerprint against a known-content index — acknowledge, don't deep-dive), metadata indexing for search.
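Step 1's resumable upload can be sketched as a server-side session that tracks which chunks have arrived. This is a minimal sketch under stated assumptions: `UploadSession`, the 8 MiB default chunk size, and the dict standing in for object storage are all hypothetical, not any real upload service's API or wire format.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Hypothetical server-side state for one resumable, chunked upload."""
    upload_id: str
    total_size: int
    chunk_size: int = 8 * 1024 * 1024  # assumed 8 MiB parts (illustrative)
    received: dict = field(default_factory=dict)  # chunk index -> bytes

    @property
    def num_chunks(self) -> int:
        return -(-self.total_size // self.chunk_size)  # ceiling division

    def put_chunk(self, index: int, data: bytes) -> None:
        # Idempotent: a chunk re-sent after a dropped connection overwrites
        # the same slot instead of being appended twice.
        if not 0 <= index < self.num_chunks:
            raise ValueError(f"chunk {index} out of range")
        self.received[index] = data

    def resume_from(self) -> int:
        # First missing chunk: where the client restarts, not byte zero.
        for i in range(self.num_chunks):
            if i not in self.received:
                return i
        return self.num_chunks

    def finalize(self, object_store: dict) -> str:
        # Durability first: the assembled raw file is written to (simulated)
        # object storage; the returned key is what would go on the transcode queue.
        if self.resume_from() != self.num_chunks:
            raise RuntimeError("upload incomplete")
        blob = b"".join(self.received[i] for i in range(self.num_chunks))
        if len(blob) != self.total_size:
            raise RuntimeError("size mismatch")
        key = f"raw/{self.upload_id}/{hashlib.sha256(blob).hexdigest()[:16]}"
        object_store[key] = blob
        return key


# Usage with tiny chunks so the example stays readable.
store = {}
session = UploadSession("vid123", total_size=10, chunk_size=4)  # 3 chunks
session.put_chunk(0, b"abcd")
session.put_chunk(2, b"ij")       # chunks may arrive out of order
print(session.resume_from())      # 1: chunk 1 never arrived
session.put_chunk(1, b"efgh")
key = session.finalize(store)
```

Keeping per-chunk state on the server is what lets a flaky client resume cheaply; nothing downstream sees the file until `finalize` has made it durable.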

[Figure: the upload pipeline as a timeline, marking where durability lands and where the creator's experience comes from]
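Step 2 of the pipeline can be sketched as a single-process simulation of the task queue. Assumptions, all hypothetical: `SegmentTask`, `run_pipeline`, the three-rendition subset, a retry budget of 3, and a caller-supplied `transcode` function standing in for a real encoder. It shows idempotent tasks (keyed outputs make re-runs no-ops), per-segment retries, a dead-letter list as the poison-pill lane, and progressive availability: 360p tasks are queued first, and a rendition counts as published once all its segments exist (a stand-in for reassembly).

```python
from collections import deque
from dataclasses import dataclass

MAX_ATTEMPTS = 3                        # assumed retry budget per segment task
RENDITIONS = ["360p", "720p", "1080p"]  # illustrative subset of the rendition matrix


@dataclass(frozen=True)
class SegmentTask:
    video_id: str
    segment: int
    rendition: str


def run_pipeline(video_id, num_segments, transcode, outputs, dead_letter):
    """Fan out (segment x rendition) tasks, lowest rendition first.

    `transcode` (hypothetical) raises on failure. `outputs` is keyed by task,
    so re-running a task that already succeeded is a no-op.
    """
    queue = deque(
        (SegmentTask(video_id, s, r), 1)
        for r in RENDITIONS              # all 360p tasks queue before HD ones
        for s in range(num_segments)
    )
    published = []
    while queue:
        task, attempt = queue.popleft()
        if task in outputs:              # idempotency: already done
            continue
        try:
            outputs[task] = transcode(task)
        except Exception:
            if attempt >= MAX_ATTEMPTS:
                dead_letter.append(task)            # poison-pill lane
            else:
                queue.append((task, attempt + 1))   # per-segment retry
            continue
        # Progressive availability: publish a rendition once every segment exists.
        r = task.rendition
        if r not in published and all(
            SegmentTask(video_id, s, r) in outputs for s in range(num_segments)
        ):
            published.append(r)
    return published


# Usage: one transient failure at 720p and one corrupt 1080p segment.
attempts = {}

def flaky_transcode(task):
    attempts[task] = attempts.get(task, 0) + 1
    if task.rendition == "720p" and task.segment == 1 and attempts[task] == 1:
        raise OSError("transient worker failure")
    if task.rendition == "1080p" and task.segment == 2:
        raise ValueError("corrupt segment")
    return f"{task.video_id}/{task.rendition}/{task.segment}.ts"

outputs, dlq = {}, []
published = run_pipeline("v1", 3, flaky_transcode, outputs, dlq)
print(published)  # ['360p', '720p']
```

Here 360p and 720p publish (720p after one retry), while the corrupt 1080p segment lands in the poison-pill lane for inspection instead of blocking the queue.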

View counting (counters are money)

A viral video takes tens of thousands of views/sec. A row-level UPDATE views = views + 1 per view funnels every write through one hot row and melts under lock contention, and because creators are paid per view, "roughly right" must converge to exactly right:

  • Clients send view beacons into a stream/queue (Kafka-class), not the database.
  • Stream processors aggregate in windows (per video, per ~10 s), writing one increment per window instead of 30 K — the accumulator pattern running on a firehose.
  • Displayed counts read from the aggregate store (seconds stale — fine); billing/analytics replay the raw log with dedup and fraud filtering (a view ≠ a beacon: minimum watch time, bot detection).
  • The split to name: display counters = eventually consistent; monetized counters = recomputed from the durable log. Same trick as Instagram's likes, plus an audit trail because money.
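The display/settlement split can be sketched with two functions over the same beacon log. A minimal sketch, assuming beacons are dicts with hypothetical fields `view_id`, `video_id`, `ts`, and `watch_s`, a 30-second minimum watch time, and a caller-supplied `is_bot` check; real fraud filtering is far richer. In production the first function would run as a windowed stream job and the second as a batch replay of the durable log.

```python
from collections import defaultdict

WINDOW_S = 10     # per-video window, matching the ~10 s in the text
MIN_WATCH_S = 30  # assumed threshold for a billable view (illustrative)


def window_increments(beacons, window_s=WINDOW_S):
    """Display path: collapse raw beacons into one increment per (video, window).

    Each entry becomes a single `views += count` write instead of `count`
    separate row updates. Fast and approximate: beacons are not deduped.
    """
    agg = defaultdict(int)
    for b in beacons:
        window_start = int(b["ts"] // window_s) * window_s
        agg[(b["video_id"], window_start)] += 1
    return dict(agg)


def billable_views(log, is_bot, min_watch_s=MIN_WATCH_S):
    """Settlement path: recompute exactly from the durable log."""
    # A view is not a beacon: a player may send several heartbeats (or retries)
    # per view, so group by view_id and keep the furthest watch position.
    best = {}
    for b in log:
        prev = best.get(b["view_id"])
        if prev is None or b["watch_s"] > prev["watch_s"]:
            best[b["view_id"]] = b
    counts = defaultdict(int)
    for b in best.values():
        if b["watch_s"] >= min_watch_s and not is_bot(b):
            counts[b["video_id"]] += 1
    return dict(counts)


log = [
    {"view_id": "a", "video_id": "v1", "ts": 1, "watch_s": 5},
    {"view_id": "a", "video_id": "v1", "ts": 11, "watch_s": 35},  # same view, later heartbeat
    {"view_id": "b", "video_id": "v1", "ts": 12, "watch_s": 40},
    {"view_id": "c", "video_id": "v1", "ts": 13, "watch_s": 10},  # too short to bill
    {"view_id": "d", "video_id": "v1", "ts": 14, "watch_s": 90, "ua": "bot"},
]
print(window_increments(log))  # {('v1', 0): 1, ('v1', 10): 4}
print(billable_views(log, is_bot=lambda b: b.get("ua") == "bot"))  # {'v1': 2}
```

Five beacons reach the display windows as two cheap writes, while the settlement replay finds only two billable views: heartbeats from one view collapse, the 10-second view misses the threshold, and the bot is dropped.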

The long tail (why YouTube's CDN works harder than Netflix's)

Netflix: ~20 K titles, viewing concentrated on a few hundred — pre-push to edges, ~everything served from cache. YouTube: billions of videos, and while the head is hot, a huge fraction of daily watches hit videos requested rarely — the long tail. Edge caches can't hold it all, so:

  • Tiered storage/caching: edges hold the hot head; regional origins hold the warm middle; cold tail lives in cheap object storage (and even cheaper archival tiers) and is pulled through on demand — first viewer in a region pays a slower start.
  • Popularity prediction still pre-positions: a rising video gets pushed outward as it trends, not on a nightly schedule.
  • Cache-hit ratio becomes the metric the infra team lives by; every point of miss-rate is origin bandwidth — real money (caching economics, at maximum stakes).
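The tiering and pull-through behavior can be sketched with two LRU levels in front of an object store. Assumptions for illustration: `LRU` and `TieredCache` are hypothetical names, capacities are counted in videos rather than bytes, LRU stands in for whatever eviction policy a real CDN uses, and `prewarm` models the trending push as a direct call.

```python
from collections import OrderedDict


class LRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class TieredCache:
    """Edge (hot head) -> regional origin (warm middle) -> object store (cold tail)."""

    def __init__(self, edge_cap, regional_cap, object_store):
        self.edge = LRU(edge_cap)
        self.regional = LRU(regional_cap)
        self.store = object_store
        self.hits = {"edge": 0, "regional": 0, "store": 0}

    def fetch(self, video_id):
        data = self.edge.get(video_id)
        if data is not None:
            self.hits["edge"] += 1
            return data
        data = self.regional.get(video_id)
        if data is not None:
            self.hits["regional"] += 1
        else:
            # Cold tail: the first viewer in the region pays the slow path.
            self.hits["store"] += 1
            data = self.store[video_id]
            self.regional.put(video_id, data)
        self.edge.put(video_id, data)  # pull-through: fill the edge on the way back
        return data

    def prewarm(self, video_id):
        # Trending push: place a rising video at the edge before viewers ask.
        data = self.store[video_id]
        self.regional.put(video_id, data)
        self.edge.put(video_id, data)

    def edge_hit_ratio(self):
        total = sum(self.hits.values())
        return self.hits["edge"] / total if total else 0.0


store = {f"v{i}": f"bytes-{i}" for i in range(100)}
cdn = TieredCache(edge_cap=2, regional_cap=5, object_store=store)
cdn.prewarm("v1")
for vid in ["v1", "v1", "v7", "v1", "v7", "v42", "v1"]:
    cdn.fetch(vid)
print(cdn.hits)  # {'edge': 4, 'regional': 1, 'store': 2}
```

The prewarmed video is served from the edge from its first request, each cold video costs one trip to the object store, and the final request for `v1` (evicted from the tiny edge) is caught by the regional tier. Every `store` hit is the origin bandwidth the text calls real money.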

Comments (briefly, but say it)

Comments on a viral video are a write-heavy, celebrity-shaped problem: attach them to the video (no fan-out), paginate by cursor (never offset at this depth), cache the first page hard (that's 95% of reads), rank asynchronously, and rate-limit per user, because comment spam is an industry.
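Cursor (keyset) pagination can be sketched as follows. Instead of skipping N rows as OFFSET does, the cursor encodes the sort key of the last comment served, so deep pages cost the same as page 1 and newly posted comments don't shift later pages. Assumptions: comments are ordered newest-first by `(created_at, comment_id)` with the id as tie-breaker; `Comment`, `page`, and the base64-JSON cursor format are hypothetical; and the list filter stands in for an index seek. The first page (no cursor) is the one worth caching hard.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    comment_id: int  # assumed unique; breaks ties on created_at
    created_at: int  # epoch seconds
    text: str


def encode_cursor(c: Comment) -> str:
    # Opaque cursor: the sort key of the last comment on the page.
    raw = json.dumps([c.created_at, c.comment_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple:
    return tuple(json.loads(base64.urlsafe_b64decode(cursor.encode())))


def page(comments, limit, cursor=None):
    """Newest-first keyset pagination over (created_at, comment_id).

    `comments` stands in for an index sorted by that key, descending; a real
    store would seek straight to the cursor position instead of scanning.
    """
    if cursor is not None:
        last = decode_cursor(cursor)
        comments = [c for c in comments if (c.created_at, c.comment_id) < last]
    items = comments[:limit]
    next_cursor = encode_cursor(items[-1]) if len(items) == limit else None
    return items, next_cursor


# Usage: seven comments, two per second, so created_at ties are common.
comments = sorted(
    (Comment(i, 1000 + i // 2, f"c{i}") for i in range(7)),
    key=lambda c: (c.created_at, c.comment_id),
    reverse=True,
)
first, cur = page(comments, 3)
second, cur2 = page(comments, 3, cur)
third, cur3 = page(comments, 3, cur2)
print([c.comment_id for c in first], [c.comment_id for c in second],
      [c.comment_id for c in third])  # [6, 5, 4] [3, 2, 1] [0]
```

Without the `comment_id` tie-breaker, comments sharing a timestamp could be skipped or repeated at a page boundary.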

Think it through like the interview


Problem: a video platform where anyone uploads. Process and serve globally, count views (creators are paid on them), search, comments. 500 hours uploaded per minute.

  1. Inherit, then isolate the new

    You've (mentally) designed Netflix. What transfers unchanged, and what are the three NEW problems?

  2. Durability first, then everything retries

    A 4 GB phone video over hotel Wi-Fi. What's the FIRST guarantee the pipeline makes?

  3. Counters that are money

    50 K views/sec on one video, and creators are paid per view. Why does UPDATE views = views + 1 fail on two counts?

  4. Cache for YOUR popularity curve

    Why can't you just do what Netflix does at the CDN — and what's the tiered answer?

  5. Failure modes close the loop

    Transcode backlog. Viral video in minutes. Count pipeline lags. Which one pages someone?


5. Bottlenecks & failure modes

  • Transcode backlog (a global event spikes uploads) → queue absorbs; autoscale workers; progressive availability keeps creators calm. Watch path unaffected — the planes are decoupled.
  • A video goes viral in minutes → CDN pull-through warms edges organically; the count pipeline's windowing absorbs the beacon storm; the metadata/first-comment-page caches are the parts that actually feel it.
  • Region loses its origin tier → edges pull from a sibling region; tail latency rises, nothing breaks (replication doing its job).
  • Count pipeline lag → display counts freeze (users shrug); monetization unaffected because it reads the durable log. Decoupling display from settlement is the design insight of counters.

Design drills

UGC video adds an upload pipeline and a brutal popularity curve. Drill both.


Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers; the gap between your sketch and the checklist is your study list.

Warm-up

Upload → watchable: walk the pipeline for user-generated video.

Core

View counts at billions-of-events scale — exact or approximate?

Core

Caching must match the popularity curve (a few viral videos, a giant cold tail). Design it.

Stretch

Live streaming vs VOD — what changes in the design?