SQL vs NoSQL — pick with a reason
- Relational (Postgres/MySQL): strong schema, ACID transactions, joins, rich queries. Default when you have relationships and invariants (users, orders, balances). All three of my backends use Postgres.
- NoSQL: document (Mongo), key-value (Redis/DynamoDB), wide-column (Cassandra), graph. Choose for a specific access pattern at scale — massive write throughput, flexible schema, or simple key lookups — accepting weaker guarantees.
"SQL unless I have a reason." Most apps are well served by Postgres until a clear pressure (write volume, scale-out, schema-flexibility, or a key-value access pattern) justifies a specialized store. Saying that signals maturity.
Replication — scale reads, survive failures
Leader-follower: writes go to the leader, which streams its log to followers that serve reads.
Wins: read scale + availability (promote a follower if the leader dies). Catch: replication lag → a follower read can be stale (a read-your-writes problem). Fixes: route a user's own reads to the leader briefly, or read from the leader for must-be-fresh queries.
Sharding — when one box can't hold the writes/data
Partition data across nodes by a shard key:
| Strategy | How | Watch out for |
|---|---|---|
| Range | by key ranges (A–M, N–Z) | hotspots on sequential keys (timestamps!) |
| Hash | hash(key) % N | resharding moves almost everything when N changes |
| Consistent hashing | keys + nodes on a ring | the standard; minimal movement on resize |
| Directory | a lookup service maps key→shard | flexible, but the directory is a SPOF |
Consistent hashing is the one to know: adding/removing a node only remaps the keys between adjacent points on the ring (≈ 1/N of keys), not the whole dataset — and virtual nodes smooth out skew.
Here's why it beats hash(key) % N. Each key walks clockwise to the next node;
when you add a node, only the keys in its new arc move — watch the "keys moved"
counter when N4 appears, versus the near-total remap that % N would force:
1/83 nodes on the ring. Each key belongs to the next node clockwise from its hash position — that's the whole rule.
A bad key creates hotspots (one shard gets all the traffic) and cross-shard queries/joins (slow, hard to make transactional). Pick a key with high cardinality and even access — and design so most queries hit a single shard.
Indexing & denormalization (one line each)
An index turns an O(n) scan into an O(log n) B-tree lookup at the cost of write speed and space. Denormalization duplicates data to avoid expensive joins on the read path — you trade write complexity (keep copies in sync) for read speed.
Design drills
You can name the strategies — now make the calls a scaling system forces on you.
Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers — the gap between your sketch and the checklist is your study list. Progress is saved on this device.
Reads are crushing your single Postgres box, but writes are fine. What's the first move — and what new problem does it create?
Writes now exceed one leader. Pick a shard key for an orders table and defend it against hotspots and cross-shard queries.
You sharded with hash(key) % N. Now you must add a node. What breaks, and what should you have used?
The leader dies. Walk through failover and the consistency risks.
Pick SQL or NoSQL for (a) accounts + balances, (b) a 10M-row/day event firehose, (c) a product catalog with flexible attributes — and justify each.