Prompt
Design cloud file storage with sync: upload/download any file, edit on one device and see it everywhere, share with permissions, restore old versions. Billions of files, exabytes.
1. Requirements
Functional: upload/download files of any type/size; automatic sync across a user's devices; folder hierarchy; sharing with view/edit permissions; version history & trash. Non-functional: never lose a byte (durability is the product — eleven nines, not five); sync feels near-instant for small edits; bandwidth-efficient (users edit 5 MB of a 2 GB file; don't re-upload 2 GB); offline edits must reconcile.
The framing: "Unlike feeds or video, the hard parts here are state reconciliation — many devices, one truth — and storage economics. The deep dives are chunking, delta sync, and conflicts."
2. Estimation (shapes the design)
1 B users × ~15 GB average = exabyte scale, write-heavy in bursts (a photo library import) but read-light per file (most files are written once, read rarely — the cold opposite of YouTube's hot head). Metadata: ~1 B users × thousands of files = trillions of metadata rows, hit on every sync check — the metadata store is the hot path; the blob store is the big path. Dedup matters at this scale: millions of users store the same PDF/installer/meme — content-addressing turns those into one stored copy.
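The arithmetic behind those orders of magnitude can be checked with a few lines. This is a rough sketch: the 2,000 files per user and the 4 MiB chunk size are assumed round numbers chosen only to make "thousands of files" and "~4–8 MB chunks" concrete, not figures from any real system.

```python
# Back-of-envelope sizing. Every input is an assumed round number.
USERS = 1_000_000_000              # 1 B users, as in the estimate above
AVG_BYTES_PER_USER = 15 * 10**9    # ~15 GB average per user
FILES_PER_USER = 2_000             # assumption standing in for "thousands of files"
CHUNK_BYTES = 4 * 2**20            # assumption: 4 MiB chunks

total_bytes = USERS * AVG_BYTES_PER_USER
total_exabytes = total_bytes / 10**18
metadata_rows = USERS * FILES_PER_USER
chunk_count = total_bytes // CHUNK_BYTES   # upper bound, before any dedup

print(f"raw storage:         {total_exabytes:.0f} EB")
print(f"file metadata rows:  {metadata_rows:.1e}")
print(f"chunks before dedup: {chunk_count:.1e}")
```

Under these assumptions the blob tier lands in the tens of exabytes while the metadata tier holds trillions of small rows, which is the asymmetry the rest of the design leans on.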
3. High-level design
The structural split to announce: metadata (small, hot, relational, consistent) and blocks (huge, cold, immutable, dumb) never share a system. Every operation talks to metadata first; bytes move only when strictly necessary.
4. Deep dives
Chunking + content addressing (the storage engine)
Files are split into chunks (~4–8 MB). Each chunk is named by the hash of its content (SHA-256) — content addressing:
- Upload protocol: the client hashes its chunks and sends the hash list to the metadata service, which answers "I already have chunks 3, 7, 19; send the rest." A file the system has already seen, from any user, uploads almost instantly because only the hash list moves (cross-user dedup: one stored copy of every popular installer on earth, reference-counted).
- A "file" is metadata: name, folder, version, and an ordered chunk-hash list. Versions are nearly free: editing one chunk of a 2 GB file creates a new chunk + a new chunk-list that shares every other entry with the old version — version history is structural sharing, not copies (the same trick as git).
- Chunks are immutable — never edited, only created and garbage-collected when no version references them (reference counting, at exabyte scale). Immutability is what makes the blob tier "dumb": cacheable, replicable, no coordination.
The upload protocol in motion — note how little actually moves:
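A minimal in-memory sketch of that exchange follows. It assumes hypothetical `BlockStore` and `MetadataService` classes and a 4-byte chunk size so the demo stays readable. None of these names come from a real product, and a real client would also handle retries, authentication, and reference counting.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny so the demo is readable (real chunks are megabytes)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size split; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk: bytes) -> str:
    """Content address: the SHA-256 of the chunk's bytes."""
    return hashlib.sha256(chunk).hexdigest()


class BlockStore:
    """Hypothetical blob tier: immutable chunks keyed by content hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> None:
        self.blocks.setdefault(chunk_id(chunk), chunk)  # never overwrite


class MetadataService:
    """Hypothetical metadata tier: a file is a list of chunk-hash lists."""

    def __init__(self, store: BlockStore) -> None:
        self.store = store
        self.versions: dict[str, list[list[str]]] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        return {h for h in hashes if h not in self.store.blocks}

    def commit(self, path: str, hashes: list[str]) -> int:
        if self.missing(hashes):
            raise ValueError("cannot commit: some chunks were never uploaded")
        self.versions.setdefault(path, []).append(list(hashes))
        return len(self.versions[path])  # 1-based version number


def upload(meta: MetadataService, store: BlockStore, path: str, data: bytes) -> int:
    """Send only the chunks the service lacks; return how many were sent."""
    chunks = split_chunks(data)
    hashes = [chunk_id(c) for c in chunks]
    needed = meta.missing(hashes)
    sent = 0
    for chunk, h in zip(chunks, hashes):
        if h in needed:
            store.put(chunk)
            needed.discard(h)  # a chunk repeated inside the file is sent once
            sent += 1
    meta.commit(path, hashes)
    return sent


store = BlockStore()
meta = MetadataService(store)
sent_first = upload(meta, store, "alice/report.txt", b"AAAABBBBCCCCDDDD")
sent_dup = upload(meta, store, "bob/copy.txt", b"AAAABBBBCCCCDDDD")
sent_edit = upload(meta, store, "alice/report.txt", b"AAAAXXXXCCCCDDDD")
print(sent_first, sent_dup, sent_edit)  # 4 0 1
```

The second upload moves no bytes (cross-user dedup), and the edited version stores one new chunk while its chunk list shares the other three entries with version 1.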
Delta sync (the bandwidth engine)
Edit 5 MB in the middle of a 2 GB file → re-hash chunks → only the changed chunks have new hashes → upload only those. One subtlety earns senior points: with fixed-size chunks, inserting one byte at the front shifts every boundary, so every chunk hash changes and delta sync collapses. Fix: content-defined chunking, where boundaries are chosen by a rolling hash over the content (split where the hash matches a pattern), so an insertion disturbs only the chunk it lands in (occasionally a neighbor too) and boundaries downstream re-align. The rolling-hash idea you met in Rabin-Karp is the same idea behind rsync's rolling checksum and content-defined chunkers.
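The boundary problem is easy to reproduce. The sketch below compares fixed-size chunking with a simple content-defined chunker built on a polynomial rolling hash. The window size, base, modulus, and mask are arbitrary illustrative choices, and production chunkers also enforce minimum and maximum chunk sizes, which this sketch omits.

```python
import hashlib
import random

WINDOW = 16            # bytes covered by the rolling hash (assumed)
BASE = 257
MOD = (1 << 61) - 1
MASK = (1 << 6) - 1    # cut when the low 6 bits are zero: ~64-byte average chunks


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut after byte i whenever the hash of the last WINDOW bytes matches."""
    top = pow(BASE, WINDOW - 1, MOD)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the byte leaving the window
        h = (h * BASE + byte) % MOD                 # add the byte entering the window
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks


def ids(chunks: list[bytes]) -> list[str]:
    return [hashlib.sha256(c).hexdigest() for c in chunks]


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"!" + original                            # one byte inserted at the front

fixed_old, fixed_new = ids(fixed_chunks(original)), ids(fixed_chunks(edited))
cdc_old, cdc_new = ids(cdc_chunks(original)), ids(cdc_chunks(edited))

print("fixed-size chunks reused:     ", len(set(fixed_old) & set(fixed_new)))
print("content-defined chunks reused:", len(set(cdc_old) & set(cdc_new)), "of", len(cdc_new))
```

Because each cut depends only on the bytes inside the window, every boundary after the insertion point moves with the content, so all chunks past the first keep their hashes and need no upload.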
Sync & notification (the freshness engine)
Each device tracks a cursor (last seen change-sequence per user namespace). The metadata service maintains an ordered change log; devices hold a lightweight connection (long poll or WebSocket) and get poked when the log advances, then pull changes after their cursor and fetch any missing chunks. Offline devices just have old cursors — reconnection is the same code path as a missed notification (design the offline path as primary).
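The cursor mechanics can be sketched in a few lines. `ChangeLog` and `Device` are hypothetical names, the log is a plain in-memory list, and the notification channel is left out: a poke would simply trigger `pull`, which is also what a reconnecting device calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int                      # position in the ordered change log
    path: str
    chunk_hashes: tuple[str, ...]


class ChangeLog:
    """Hypothetical per-namespace ordered log kept by the metadata service."""

    def __init__(self) -> None:
        self._entries: list[Change] = []

    def append(self, path: str, chunk_hashes: list[str]) -> int:
        seq = len(self._entries) + 1
        self._entries.append(Change(seq, path, tuple(chunk_hashes)))
        return seq

    def since(self, cursor: int, limit: int = 2) -> list[Change]:
        # Sequence numbers start at 1, so entries after `cursor` begin at index `cursor`.
        return self._entries[cursor:cursor + limit]


class Device:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cursor = 0           # last change sequence this device has applied
        self.files: dict[str, tuple[str, ...]] = {}

    def pull(self, log: ChangeLog) -> int:
        """Apply everything after the cursor, in batches; return the count."""
        applied = 0
        while batch := log.since(self.cursor):
            for change in batch:
                self.files[change.path] = change.chunk_hashes
                self.cursor = change.seq
                applied += 1
        return applied


log = ChangeLog()
laptop, phone = Device("laptop"), Device("phone")

for path, hashes in [("notes.txt", ["h1"]), ("notes.txt", ["h1", "h2"]), ("photo.jpg", ["h9"])]:
    log.append(path, hashes)
    laptop.pull(log)              # online device: poked after every change

caught_up = phone.pull(log)       # offline device: same code path, older cursor
print(caught_up, phone.cursor, phone.files == laptop.files)  # 3 3 True
```

The online and offline devices run identical code. The only difference is how far behind the cursor is when `pull` starts.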
Conflicts (the part interviewers actually probe)
Two devices edit the same file while one is offline. On reconnect, device B uploads a new version whose parent version is stale — the metadata service detects the fork (B's parent ≠ current head):
- Never silently merge, never silently drop. Drive-class systems keep B's upload as a conflicted copy ("report (Asha's MacBook, conflicted).docx") alongside the winner — both versions durable, a human resolves. This is optimistic concurrency with "make a copy" as the conflict handler, the only safe default for opaque binary files.
- Folder-level races (rename vs delete, move into deleted folder) are resolved by ordering through the change log — last-writer-wins on metadata with the loser preserved in trash/history, because the history makes every resolution reversible.
- Real-time co-editing (Google Docs' character-level merging — OT/CRDT) is a different product on a different data model; name it as out-of-scope — knowing the boundary is the point.
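The fork check and the "make a copy" handler fit in one small function. This is a sketch under simplifying assumptions: `files`, `commit`, and `conflicted_name` are hypothetical names, versions are plain integers, and a second conflict from the same device would overwrite the first copy.

```python
import os

# path -> {"version": int, "chunks": tuple of chunk hashes}; hypothetical metadata table
files: dict[str, dict] = {}


def conflicted_name(path: str, device: str) -> str:
    """report.docx -> report (<device>, conflicted).docx"""
    stem, ext = os.path.splitext(path)
    return f"{stem} ({device}, conflicted){ext}"


def commit(path: str, parent_version: int, chunks: list[str], device: str) -> str:
    """Optimistic concurrency: accept only if the parent is the current head."""
    head = files.get(path)
    current = head["version"] if head else 0
    if parent_version == current:
        files[path] = {"version": current + 1, "chunks": tuple(chunks)}
        return path
    # Stale parent means a fork. Keep the upload as a sibling file, never merge or drop.
    copy_path = conflicted_name(path, device)
    files[copy_path] = {"version": 1, "chunks": tuple(chunks)}
    return copy_path


commit("report.docx", 0, ["c1", "c2"], "Desktop")           # version 1, synced everywhere
winner = commit("report.docx", 1, ["c1", "c3"], "Desktop")  # online edit becomes version 2
loser = commit("report.docx", 1, ["c1", "c4"], "Asha's MacBook")  # offline edit, stale parent

print(winner)  # report.docx
print(loser)   # report (Asha's MacBook, conflicted).docx
```

Both sets of chunks remain referenced after the fork, so nothing the offline device wrote is lost while a human decides which copy to keep.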
Sharing & permissions
ACLs live in metadata: (subject, file/folder, role). The design wrinkle is inheritance — permission checks walk up the folder tree, so deep hierarchies cache effective permissions per (user, subtree) with invalidation on ACL change. Every block fetch is authorized via short-lived signed URLs minted by the metadata layer — the blob store itself stays dumb and trusts nobody (the API-gateway instinct).
Think it through like the interview
PROBLEM: Cloud file storage with sync: any file, edit on one device → appears everywhere, sharing with permissions, version history. Billions of files, exabytes. Never lose a byte.
- 1. Split metadata from blocks, first and loudly
  “Exabytes of file bytes, trillions of tiny rows about them. One system or two?”
- 2. Content addressing does three jobs
  “Chunks named by SHA-256 of their content. What falls out of that one decision?”
- 3. Delta sync + the boundary subtlety
  “Edit 5 MB inside a 2 GB file on hotel Wi-Fi. What uploads, and what breaks with fixed-size chunks?”
- 4. Conflicts: never merge, never drop
  “Two devices edited the same file, one offline. The fork is detected (stale parent version). Resolve it.”
- 5. Sync freshness + the cowardly GC
  “How do other devices find out, and what's the scariest background job in the system?”
5. Bottlenecks & failure modes
- Metadata DB is the hot path — shard by user/namespace; shared folders cross shards, so sharing metadata gets its own service. The blob tier scales by being boring.
- Durability story (the product promise): chunks replicated across zones + erasure-coded across regions; metadata backed up independently — losing the map to the bytes is losing the bytes.
- Mass-change storms (user renames a folder of 100 K files; a team's bot rewrites everything) → change-log batching, per-account rate limits, and clients that coalesce filesystem events before syncing.
- The GC must be cowardly: reference-counting bugs that delete a live chunk are catastrophic and silent; real systems garbage-collect with delays, tombstones and audits — slow deletion is a feature (the BookMyShow TTL philosophy, inverted).
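One way to picture a "cowardly" collector is sketched below. It assumes a hypothetical `ChunkIndex` with reference counts, tombstones, a grace period, and an independent audit set recomputed from version chunk lists. The names and the seven-day delay are illustrative, not taken from any real system.

```python
class ChunkIndex:
    """Hypothetical refcount table with delayed, audited deletion."""

    def __init__(self, grace_seconds: int) -> None:
        self.grace = grace_seconds
        self.refcount: dict[str, int] = {}
        self.tombstones: dict[str, int] = {}   # hash -> time it became unreferenced
        self.suspect: list[str] = []           # counter and audit disagreed

    def add_ref(self, h: str) -> None:
        self.refcount[h] = self.refcount.get(h, 0) + 1
        self.tombstones.pop(h, None)           # a new reference rescues the chunk

    def drop_ref(self, h: str, now: int) -> None:
        self.refcount[h] -= 1
        if self.refcount[h] == 0:
            self.tombstones[h] = now           # mark it, do not delete yet

    def sweep(self, now: int, live_hashes: set[str]) -> list[str]:
        """Delete only chunks that are old enough AND absent from the audit."""
        deleted = []
        for h, since in list(self.tombstones.items()):
            if now - since < self.grace:
                continue                       # too recent: wait
            del self.tombstones[h]
            if h in live_hashes:
                self.suspect.append(h)         # refcount bug suspected: keep the bytes
                continue
            del self.refcount[h]
            deleted.append(h)
        return deleted


DAY = 24 * 3600
index = ChunkIndex(grace_seconds=7 * DAY)
for h in ("a", "b", "c"):
    index.add_ref(h)
    index.drop_ref(h, now=0)
index.add_ref("c")                                        # re-uploaded before the sweep

early = index.sweep(now=1 * DAY, live_hashes=set())       # inside the grace period
late = index.sweep(now=8 * DAY, live_hashes={"b"})        # audit still sees "b" in a version
print(early, late, index.suspect)  # [] ['a'] ['b']
```

The design choice is that a disagreement between the counter and the audit always resolves toward keeping data: wasted storage is recoverable, a deleted live chunk is not.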
Design drills
File storage is metadata/data separation + reliable upload + sync. Drill each.
Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers: the gap between your sketch and the checklist is your study list.
Why is separating metadata from file bytes the core design decision here?
Upload a 5 GB file reliably over a flaky connection.
The same file is edited on two devices offline, then both sync. Resolve the conflict.
Deduplicate storage across users who upload the same file, without leaking who has what.