DBMS — Transactions, Isolation & Indexes

ACID, the isolation levels and the anomalies they prevent, how B-tree indexes really work (and when they hurt), and normalization vs denormalization.

ACID

Atomicity — all-or-nothing; a partial transaction rolls back.
Consistency — a transaction moves the DB from one valid state to another (constraints hold).
Isolation — concurrent transactions don't see each other's partial work.
Durability — once committed, it survives a crash (write-ahead log).

StockStump's balance decrement needs all four — which is why the fix for the double-spend was a single atomic, isolated UPDATE.

Isolation levels & the anomalies they stop

Level	Dirty read	Non-repeatable read	Phantom
Read Uncommitted	✅ possible	✅	✅
Read Committed	❌	✅	✅
Repeatable Read	❌	❌	✅ (varies)
Serializable	❌	❌	❌

Dirty read: you read another transaction's uncommitted change.
Non-repeatable read: you read a row twice and get different values (someone committed an update between).
Phantom: you re-run a range query and new rows appear.

Higher isolation = more correctness, less concurrency. Read Committed is the common default; bump to Serializable only for the operations that need it.

An index is usually a B-tree: ordered, balanced, O(log n) lookups, and great for range scans and ORDER BY. It speeds reads but slows writes (every insert/update maintains the tree) and costs space.

Composite index follows the leftmost-prefix rule: an index on (a, b) helps queries filtering on a or a,b — not on b alone.
Covering index: if the index contains every column a query needs, the DB answers from the index without touching the row ("index-only scan").
A hash index is O(1) for equality but useless for ranges.

Indexes aren't free

Over-indexing slows writes and wastes space; an index on a low-cardinality column (e.g. a boolean) barely helps because it can't narrow the scan. Index the columns you actually filter/sort/join on, and measure with EXPLAIN.

Normalization vs denormalization

Normalize to remove redundancy and update anomalies (3NF: every non-key column depends on the key, the whole key, and nothing but the key). Denormalize (duplicate data) to avoid expensive joins on read-heavy paths — trading write complexity for read speed. OLTP leans normalized; analytics/read models lean denormalized.