Why this page exists
Every other HLD doc shows a finished architecture. Real systems never start there — they evolve, each component added when (and only when) something measurably broke. This page runs one app — say a BookMyShow-style booking product (whose LLD you've built) — from 100 users to a billion, breaking one thing at a time. Internalize the sequence and two skills appear: you can design by evolution in interviews ("I'd start simple and scale when X breaks" — the strongest possible framing), and you can diagnose real systems by recognizing which stage they're stuck at.
The iron law of the journey: every addition buys headroom and costs complexity — never pay before the bill arrives.
Stage 1 — 100 users: one box
[ Users ] → [ One server: app + PostgreSQL + files ]
Everything on one machine. Deploys are git pull && restart.
This is not a confession: it's correct. A modern server handles hundreds of concurrent users without noticing; the product is the risk, not the architecture (the microservices doc's warning starts here).
What breaks next: a deploy or crash takes everything down, and the database competes with the app for RAM.
Stage 2 — 10K users: separate the database, add a spare
[ Users ] → [ LB ] → [ App server × 2 ] → [ DB server ]
Split app and DB onto separate machines (independent resources, independent failure). Put two stateless app servers behind a load balancer — now deploys roll one at a time (zero downtime), and one crash halves capacity instead of zeroing it.
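The rolling deploy described above can be sketched in a few lines. This is a toy model, not a real load balancer: the `LoadBalancer` class, the server names, and the `deploy_one` callback are all hypothetical, and real balancers drain in-flight connections and run health checks rather than flipping a flag.

```python
import itertools


class LoadBalancer:
    """Round-robin over app servers, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy app servers")


def rolling_deploy(lb, deploy_one):
    """Deploy to one server at a time so the others keep serving traffic."""
    for server in lb.servers:
        lb.mark_down(server)   # take it out of rotation
        deploy_one(server)     # hypothetical deploy step (e.g. pull + restart)
        lb.mark_up(server)     # put it back before touching the next one


lb = LoadBalancer(["app-1", "app-2"])
served_during_deploy = []
# While each server is being deployed, ask the balancer who would serve a request.
rolling_deploy(lb, lambda server: served_during_deploy.append((server, lb.pick())))
print(served_during_deploy)  # [('app-1', 'app-2'), ('app-2', 'app-1')]
```

While `app-1` is out of rotation, requests land on `app-2`, and vice versa. The same `mark_down` path is what turns a crash into halved capacity instead of an outage.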
The forcing function nobody warns you about: statelessness. Two app servers mean user sessions can't live in app memory anymore: sessions move to the DB or Redis, and uploads move to object storage. This hurts once, and every later stage depends on it (scalability fundamentals).
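A minimal sketch of what externalized sessions look like, assuming a shared store that every app server talks to. `SessionStore` here is an in-memory stand-in for Redis or a sessions table, and `AppServer`, `login`, and `whoami` are hypothetical names for illustration.

```python
import secrets


class SessionStore:
    """Stand-in for an external session store (Redis, or a DB table)."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, payload):
        self._data[session_id] = dict(payload)

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """A stateless app server: no per-user state survives between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared and external to this process

    def login(self, user_id):
        session_id = secrets.token_hex(16)
        self.store.save(session_id, {"user_id": user_id})
        return session_id

    def whoami(self, session_id):
        session = self.store.load(session_id)
        return session["user_id"] if session else None


store = SessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

session_id = server_a.login("user-42")
# The load balancer is free to send the next request to the other server.
print(server_b.whoami(session_id))  # user-42
```

Had the session lived in a dict on `server_a`, the second request would have looked like a logged-out user. That is the bug this stage forces you to fix.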
What breaks next: the database, always. Read traffic grows with users; one box's CPU and disk hit their ceiling.
Stage 3 — 100K users: cache + read replicas
[ LB ] → [ App × N ] → [ Redis cache ] → [ Primary DB ]
↓ replication
[ Read replicas × 2 ]
Two additions, in the order they're usually needed:
- Cache (caching): the same shows/seats/profiles are read thousands of times, so Redis can absorb 80–95% of reads for very little initial effort. Cost paid: invalidation bugs and the new question "how stale is acceptable, per data type?"
- Read replicas (replication): the primary handles writes; replicas serve reads. Cost paid: replication lag — a user writes, reads a replica, sees old data. You now choose read-your-own-writes strategies per feature. Welcome to consistency trade-offs; you live here now.
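The cache bullet can be made concrete with a cache-aside sketch. It assumes a dict standing in for the primary DB, an in-process dict standing in for Redis, and made-up per-type TTLs (300 s for show details, 2 s for seat counts); the class and its method names are hypothetical.

```python
import time


class CacheAside:
    """Read through a cache; on a miss, load from the DB and remember it."""

    def __init__(self, db, ttl_seconds, clock=time.monotonic):
        self.db = db                    # dict standing in for the primary DB
        self.ttl_seconds = ttl_seconds  # staleness budget per data type
        self.clock = clock
        self._cache = {}                # (kind, key) -> (value, expires_at)
        self.db_reads = 0               # counts how many reads reached the DB

    def get(self, kind, key):
        entry = self._cache.get((kind, key))
        if entry is not None and entry[1] > self.clock():
            return entry[0]             # cache hit
        self.db_reads += 1              # miss or expired: go to the DB
        value = self.db[(kind, key)]
        self._cache[(kind, key)] = (value, self.clock() + self.ttl_seconds[kind])
        return value

    def put(self, kind, key, value):
        self.db[(kind, key)] = value
        self._cache.pop((kind, key), None)  # invalidate so readers see the write


now = [0.0]  # fake clock so the example is deterministic
db = {("show", 1): "Dune, 7pm", ("seats", 1): 120}
cache = CacheAside(db, {"show": 300, "seats": 2}, clock=lambda: now[0])

cache.get("show", 1)
cache.get("show", 1)                  # second read never touches the DB
cache.put("seats", 1, 119)            # a booking updates the seat count
print(cache.get("seats", 1), cache.db_reads)  # 119 2
```

The TTL table is where "how stale is acceptable, per data type?" gets answered in code. Forgetting the `pop` in `put` is the classic invalidation bug.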
What breaks next: anything slow inside the request path (emails, PDFs, image resizing) and traffic spikes.
Stage 4 — 1M users: async work + auto-scaling + CDN
[ CDN ] → [ LB ] → [ App × auto ] → [ cache / DB as above ]
↓ enqueue
[[ Queue ]] → [ Workers × auto ]
- Queues + workers (messaging): every non-essential step leaves the request path — booking confirmation emails, invoices, analytics. Requests get faster and spikes get absorbed. Cost: at-least-once delivery → every consumer must be idempotent; you now debug flows, not call stacks.
- Auto-scaling app/worker fleets (the cloud's killer feature) — capacity follows the daily curve.
- CDN for static assets and images (the Netflix lesson at small scale).
- And the unglamorous one that saves you: observability — metrics, structured logs, alerts (microservices' pillars). Beyond this scale, what you can't see, you can't fix.
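Because the queue may deliver a message more than once, a worker has to make the second delivery a no-op. A minimal sketch, assuming each message carries a unique `id`; the worker class and message fields are hypothetical, and a real worker would keep the processed-id set in a durable store rather than in memory.

```python
class ConfirmationEmailWorker:
    """Idempotent consumer: handling the same message twice has one effect."""

    def __init__(self):
        self.processed_ids = set()  # assumption: durable storage in production
        self.sent_emails = []       # stands in for actually sending mail

    def handle(self, message):
        if message["id"] in self.processed_ids:
            return False            # duplicate delivery: skip the side effect
        self.sent_emails.append(message["booking_id"])
        self.processed_ids.add(message["id"])
        return True


worker = ConfirmationEmailWorker()
deliveries = [
    {"id": "msg-1", "booking_id": "B100"},
    {"id": "msg-2", "booking_id": "B101"},
    {"id": "msg-1", "booking_id": "B100"},  # redelivered by the queue
]
results = [worker.handle(message) for message in deliveries]
print(results, worker.sent_emails)  # [True, True, False] ['B100', 'B101']
```

A crash between sending and recording the id can still produce a duplicate, which is why the dedupe record and the side effect are usually made atomic or the side effect itself is made safe to repeat.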
What breaks next: the write path. One primary database takes every booking on earth — and at some point, no bigger box exists.
Stage 5 — 10M users: shard the writes (the painful stage)
[ App ] → [ Router ] → [ DB shard: users A–F ]
[ DB shard: users G–M ]
[ DB shard: ... ]
Sharding (databases at scale): partition data across independent databases by a key (user id, or for booking systems, event/venue — recall BookMyShow's natural per-show isolation). Each shard handles a slice of writes; capacity finally scales horizontally on the write path.
This is the journey's most expensive toll — pay it as late as possible: cross-shard queries ("revenue across all venues") stop being SQL and become scatter-gather or analytics pipelines; cross-shard transactions stop existing and become sagas; resharding later is a migration project. Often teams split services first (payments, search, notifications as separate systems with their own stores) precisely to delay sharding the core.
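A rough sketch of both halves of that trade, assuming bookings are sharded by show id with a hash-modulo router. Each "shard" is a plain list standing in for an independent database, and all names are hypothetical.

```python
import hashlib


class ShardedBookings:
    """Route each booking to one shard by key; aggregate by asking every shard."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]

    def shard_for(self, show_id):
        # Stable hash, so the same show always maps to the same shard.
        digest = hashlib.sha256(str(show_id).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def record_booking(self, show_id, amount):
        self.shards[self.shard_for(show_id)].append((show_id, amount))

    def revenue_for_show(self, show_id):
        # Single-shard query: cheap, touches one database.
        shard = self.shards[self.shard_for(show_id)]
        return sum(amount for sid, amount in shard if sid == show_id)

    def total_revenue(self):
        # Scatter-gather: query every shard, then combine in the application.
        partials = [sum(amount for _, amount in shard) for shard in self.shards]
        return sum(partials)


bookings = ShardedBookings(shard_count=3)
for show_id, amount in [("show-1", 500), ("show-2", 300), ("show-1", 250)]:
    bookings.record_booking(show_id, amount)

print(bookings.revenue_for_show("show-1"), bookings.total_revenue())  # 750 1050
```

Per-show queries stay on one shard, while "revenue across all venues" now costs one round trip per shard. With modulo routing, changing `shard_count` remaps most keys, which is one reason resharding becomes a migration project.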
What breaks next: physics and organizations — one region's latency for global users; one codebase for 50 teams.
Stage 6 — 100M+ users: multi-region, cells, and the org chart
[ Mumbai region: full stack ] [ Virginia region: full stack ]
↕ async replication / global routing ↕
- Multi-region: full stacks per geography, users routed to the nearest (latency is physics); data replicated across regions for disaster recovery. The hard part is write coordination — most systems route each user's writes to a home region and replicate asynchronously, accepting cross-region staleness (CAP, now unavoidable).
- Cellular architecture: within a region, independent "cells" (full slices serving a user subset — Uber's cities are natural cells) bound every blast radius.
- The org scales with the system: platform teams, on-call rotations, deployment infrastructure (Level 10). At this stage Conway's law is an architecture input, and staff/principal-level work on the engineering ladder consists largely of the decisions on this page.
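The routing rule in the first two bullets can be sketched as a pure function. The region names come from the diagram above. Everything else is an assumption made for illustration: assigning a home region by country code, four cells per region, and hashing the user id to pick a cell.

```python
import hashlib

# Assumption: home region chosen by country code; real systems may pin it at signup.
HOME_REGION_BY_COUNTRY = {"IN": "mumbai", "US": "virginia"}
DEFAULT_REGION = "virginia"
CELLS_PER_REGION = 4  # hypothetical cell count


def home_region(country):
    return HOME_REGION_BY_COUNTRY.get(country, DEFAULT_REGION)


def cell_for(user_id, cells=CELLS_PER_REGION):
    # Stable hash so a user always lands in the same cell.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % cells


def route(user_id, country, nearest_region, is_write):
    """Writes go to the user's home region; reads go to the nearest region."""
    region = home_region(country) if is_write else nearest_region
    return region, cell_for(user_id)


# An Indian user travelling in the US: reads are local, writes go home.
print(route("user-42", "IN", "virginia", is_write=False)[0])  # virginia
print(route("user-42", "IN", "virginia", is_write=True)[0])   # mumbai
```

The read served from `virginia` may lag the write accepted in `mumbai`. That gap is the cross-region staleness the first bullet accepts.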
The whole machine, assembled
Every box below is labeled with the stage that earned it. Cover the diagram and rebuild it stage by stage — if you can say what broke to justify each box, you own this page:
The journey on one card
| Stage | Users | You add | You pay |
|---|---|---|---|
| 1 | 100 | one box | nothing — correctly |
| 2 | 10K | LB + stateless apps, split DB | sessions must externalize |
| 3 | 100K | cache, read replicas | invalidation, replication lag |
| 4 | 1M | queues/workers, auto-scale, CDN, observability | idempotency, async debugging |
| 5 | 10M | sharding (and/or service splits) | no cross-shard SQL/transactions |
| 6 | 100M+ | multi-region, cells | eventual consistency, org design |
Read the table backwards for diagnosis: "we have replication-lag bugs" → you're a Stage 3 system; "our deploys block each other" → a Stage 5–6 organization on Stage 4 architecture.
Common mistakes
- Building Stage 5 on day one — sharded microservices for zero users: all the toll, none of the traffic. The startup graveyard's favorite headstone.
- Skipping statelessness at Stage 2 — every later stage assumes it; retrofitting it at Stage 4 under load is misery.
- Caching before measuring — cache what's measured hot; speculative caches are invalidation bugs with no payoff.
- Sharding before splitting services — often the write-hot table is one domain (bookings); extracting that service buys years before the core needs sharding.
- Treating the table as a schedule — user count is shorthand; the real triggers are measured: p99 latency, replication lag, queue depth, primary CPU. Scale when the metric says so, not the milestone.
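The last point, scaling on metrics rather than milestones, can be sketched as a simple check. The metric names mirror the triggers listed above; the threshold values are arbitrary placeholders for illustration, not recommendations, and the function name is hypothetical.

```python
def breached_triggers(metrics, thresholds):
    """Return the names of metrics that exceed their configured limit."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0) > limit
    )


# Placeholder limits: every team has to derive its own from measurement.
thresholds = {
    "p99_latency_ms": 500,
    "replication_lag_s": 5,
    "queue_depth": 10_000,
    "primary_cpu_pct": 80,
}
current = {
    "p99_latency_ms": 220,
    "replication_lag_s": 12,
    "queue_depth": 300,
    "primary_cpu_pct": 91,
}
print(breached_triggers(current, thresholds))
# ['primary_cpu_pct', 'replication_lag_s']
```

An empty result means the next stage has not been earned yet, whatever the user count says.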
Design drills
Each rung of the journey is a decision with a cost. Practise naming both.
Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers; the gap between your sketch and the checklist is your study list.
At 1k users everything is one box. What's the first thing that breaks as you grow, and the first lever you pull?
Reads dominate at scale. Lay out the read-scaling ladder and the catch on each rung.
Writes now exceed one primary. You must shard. What dies the moment you do?
Which decisions should you defer as long as possible on this journey, and why?