Databases at Scale

SQL vs NoSQL with a real reason, leader-follower replication, the four sharding strategies, and the hotspot/rebalancing traps that consistent hashing solves.

databasesshardingreplicationconsistent-hashing

SQL vs NoSQL — pick with a reason

  • Relational (Postgres/MySQL): strong schema, ACID transactions, joins, rich queries. Default when you have relationships and invariants (users, orders, balances). All three of my backends use Postgres.
  • NoSQL: document (Mongo), key-value (Redis/DynamoDB), wide-column (Cassandra), graph. Choose for a specific access pattern at scale — massive write throughput, flexible schema, or simple key lookups — accepting weaker guarantees.
The honest answer

"SQL unless I have a reason." Most apps are well served by Postgres until a clear pressure (write volume, scale-out, schema-flexibility, or a key-value access pattern) justifies a specialized store. Saying that signals maturity.

Replication — scale reads, survive failures

Leader-follower: writes go to the leader, which streams its log to followers that serve reads.

Wins: read scale + availability (promote a follower if the leader dies). Catch: replication lag → a follower read can be stale (a read-your-writes problem). Fixes: route a user's own reads to the leader briefly, or read from the leader for must-be-fresh queries.

Sharding — when one box can't hold the writes/data

Partition data across nodes by a shard key:

StrategyHowWatch out for
Rangeby key ranges (A–M, N–Z)hotspots on sequential keys (timestamps!)
Hashhash(key) % Nresharding moves almost everything when N changes
Consistent hashingkeys + nodes on a ringthe standard; minimal movement on resize
Directorya lookup service maps key→shardflexible, but the directory is a SPOF

Consistent hashing is the one to know: adding/removing a node only remaps the keys between adjacent points on the ring (≈ 1/N of keys), not the whole dataset — and virtual nodes smooth out skew.

Here's why it beats hash(key) % N. Each key walks clockwise to the next node; when you add a node, only the keys in its new arc move — watch the "keys moved" counter when N4 appears, versus the near-total remap that % N would force:

Consistent hashing — add a node, move ~1/N keystime O(log N) lookupspace O(N + keys)
user1cart9ord42img7sess3u88N1N2N3

1/83 nodes on the ring. Each key belongs to the next node clockwise from its hash position — that's the whole rule.

nodes = 3
Choosing the shard key is the whole game

A bad key creates hotspots (one shard gets all the traffic) and cross-shard queries/joins (slow, hard to make transactional). Pick a key with high cardinality and even access — and design so most queries hit a single shard.

Indexing & denormalization (one line each)

An index turns an O(n) scan into an O(log n) B-tree lookup at the cost of write speed and space. Denormalization duplicates data to avoid expensive joins on the read path — you trade write complexity (keep copies in sync) for read speed.

Design drills

You can name the strategies — now make the calls a scaling system forces on you.

Design drills: Databases at scale0/5 done

Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers — the gap between your sketch and the checklist is your study list. Progress is saved on this device.

Warm-up

Reads are crushing your single Postgres box, but writes are fine. What's the first move — and what new problem does it create?

Core

Writes now exceed one leader. Pick a shard key for an orders table and defend it against hotspots and cross-shard queries.

Core

You sharded with hash(key) % N. Now you must add a node. What breaks, and what should you have used?

Stretch

The leader dies. Walk through failover and the consistency risks.

Stretch

Pick SQL or NoSQL for (a) accounts + balances, (b) a 10M-row/day event firehose, (c) a product catalog with flexible attributes — and justify each.