Design Netflix

Video streaming at planet scale: the transcode pipeline, why the CDN is 95% of the design, adaptive bitrate, and the metadata/play split.

system-designvideocdncase-study

Prompt

Design a video streaming service: catalog browsing, smooth playback on any device/network, resumable progress. Hundreds of millions of subscribers worldwide.

1. Requirements

Functional: browse/search catalog; play with adaptive quality; resume across devices; (content ingestion for the studio side). Non-functional: instant start (under 2 s), no rebuffering — smoothness beats sharpness; works from fiber to 3G; global; cost-sane (video bytes are the bill).

Scope cut: "recommendations and DRM acknowledged but not deep-dived; focus on the video path — ingest → encode → distribute → play — OK?"

2. Estimation (shapes the design)

~300 M subscribers, ~100 M daily watchers × ~1 h/day. A 1080p stream ≈ 5 Mbps → ~10 M concurrent streams at peak ≈ 50 Tbps egress. Compare: the catalog is maybe ~20 K titles × ~50 GB across formats ≈ 1 PB — static. So the workload is: a near-static petabyte read by everyone, everywhere, at once — read-heavy in the most extreme form, ~zero writes on the hot path. The conclusion writes itself: this is a distribution problem. Almost nothing else in the design matters if the bytes don't sit near the viewers (latency numbers).

3. High-level design

Two planes that share almost nothing:

The split to announce: the control plane (browse, search, auth, progress — small JSON, normal scalable web service) and the data plane (video bytes — CDN only). API servers never touch video; the CDN never makes decisions.

4. Deep dives

Transcode pipeline (write path — offline, embarrassingly parallel)

One studio master becomes an encoding matrix: resolutions (4K → 240p) × codecs (H.264 for compatibility, HEVC/AV1 for efficiency) × audio/subs — dozens of renditions. The trick that makes it fast: chunk the source (~shot-aligned segments), encode chunks in parallel across thousands of workers, reassemble — a multi-hour film transcodes in minutes. Orchestrate with queues; a failed chunk retries without restarting the title (idempotent workers — the ingestion-pipeline playbook). This plane has no latency requirement, which is what lets it be cheap.

Segments + adaptive bitrate (the playback magic)

Every rendition is cut into ~4-second segments; a manifest (HLS/DASH) lists every segment at every quality. The player:

  1. Fetches the manifest, starts on a conservative quality → fast start.
  2. Measures actual download throughput continuously.
  3. Picks each next segment's quality from the measured bandwidth — shifting up on good network, down before the buffer empties.

ABR (adaptive bitrate) is why playback survives a train ride: quality degrades gracefully instead of pausing. The intelligence is in the client; servers just serve files. Segments also make seeking trivial (jump = fetch segment at offset) and caching perfect (immutable files, cache forever).

The CDN (95% of the answer)

Serving 50 Tbps from origin data centers is impossible-shaped: the solution is edge caches — and Netflix's famous move, Open Connect: cache servers placed physically inside ISP networks, so video bytes often never cross the public internet at all. Because the catalog is finite and popularity is predictable (a Pareto few titles = most watching), edges pre-load popular content during off-peak hours — push-based caching, not just pull-on-miss (caching). "Play" then means: control plane authenticates, returns manifest with URLs pointing at the nearest healthy edge; player streams segments from metres-away storage. Cache miss → edge pulls from a regional tier → origin (rare).

Progress & resume (the one hot write)

The single hot-path write: a tiny (user, title, position) heartbeat every few seconds × 10 M streams ≈ ~2 M writes/sec — small values, perfect for a write-optimized KV store (Cassandra-class); losing 10 s of progress is harmless, so it can be async/batched (eventual consistency where it's free). Resume-on-another-device = read your last heartbeat. This contrast — 50 Tbps of reads vs 2 M trivial writes — is the system in one sentence.

Think it through like the interview

Think it through: Design NetflixHLD Classic — distribution0/5 stages

PROBLEMVideo streaming: catalog browsing, smooth playback on any device and network, resumable progress. Hundreds of millions of subscribers worldwide.

  1. 1

    Let the estimate write the conclusion

    ~10M concurrent 5 Mbps streams vs a ~1 PB near-static catalog. What kind of problem is this?

  2. 2

    Two planes that share nothing

    Where's the boundary that organizes the whole architecture?

    unlocks after the stage above
  3. 3

    The write path is offline — exploit it

    One studio master → dozens of renditions. What property of this pipeline makes it cheap?

    unlocks after the stage above
  4. 4

    ABR: smoothness beats sharpness

    Why does playback survive a train ride — and why MUST the logic live in the client?

    unlocks after the stage above
  5. 5

    Invert the scary scenario

    New-release night: 100M people press play on ONE title at 8 PM. Disaster?

    unlocks after the stage above

5. Bottlenecks & failure modes

  • New-release night — everyone plays one title at 8 PM: pre-pushed to every edge beforehand; this is the easiest load, not the hardest (one hot, fully-cached file).
  • An edge cluster dies → manifests/redirects steer players to the next-nearest edge; quality may dip (longer path), playback continues — ABR absorbs infrastructure failure too.
  • Control plane down ≠ video down — active sessions keep streaming from edges (manifest in hand); you can't start new sessions. Decoupled planes shrink every blast radius.
  • The recommendation row is precomputed offline per user (batch pipeline) and cached — homepage loads read a prebuilt list, no live ML on the hot path. One sentence of this acknowledges the topic and moves on.

Design drills

Streaming is a CDN + an encoding pipeline + a control/data split. Drill each.

Design drills: Video streaming (Netflix)0/4 done

Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers — the gap between your sketch and the checklist is your study list. Progress is saved on this device.

Warm-up

Deliver video to millions of viewers without melting your origin servers. What's the spine?

Core

Design the encoding/transcoding pipeline from a master file to something streamable.

Core

Why split the control plane from the data (video) plane, and where's the seam?

Stretch

The progress/resume heartbeat is ~1–2M tiny writes/sec. How do you store it?