Why Transactions Exist
Without transactions, concurrent database operations produce corrupted data:
The bank transfer problem — without transactions:
Thread A: Read Alice's balance: $500
Thread B: Read Alice's balance: $500
Thread A: Alice - $100 → Write $400
Thread B: Alice - $100 → Write $400 ← Overwrites A's write! Alice loses $100.
A transaction groups multiple operations into an atomic unit: all succeed or all fail, no half-states.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- debit Alice
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- credit Bob
-- If anything fails here, ROLLBACK undoes both updates
COMMIT;
The ACID Properties (Deep Dive)
Atomicity
All operations in a transaction complete, or none do. PostgreSQL uses a Write-Ahead Log (WAL):
- Before changing data on disk, write the change to the WAL
- On crash: replay WAL to redo committed transactions, undo uncommitted ones
- Application never sees partial state
Consistency
The database moves from one valid state to another. All constraints are enforced atomically:
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 150.00);
-- FK check: does user_id=1 exist? If not, ROLLBACK entire transaction
-- CHECK constraint: total >= 0? If not, ROLLBACK
-- UNIQUE constraint: duplicate order_number? ROLLBACK
COMMIT;
Isolation
Concurrent transactions behave as if they ran sequentially. The level of isolation is configurable — more isolation = more safety but less throughput.
Durability
Once committed, data survives crashes. PostgreSQL achieves this via:
- WAL written to disk before
COMMITreturns fsync()ensures WAL is truly flushed (not just OS buffer)- Checkpoint process periodically writes WAL changes to actual data files
Concurrency Anomalies
Understanding what can go wrong at each isolation level:
Dirty Read
Transaction A starts, updates row X (not committed)
Transaction B reads row X → gets A's uncommitted value
Transaction A rolls back
Transaction B now has data that never existed
Prevented at: Read Committed and above.
Non-Repeatable Read
Transaction A reads row X → gets value 100
Transaction B commits: UPDATE row X to 200
Transaction A reads row X again → gets 200 (different!)
Prevented at: Repeatable Read and above.
Phantom Read
Transaction A queries: SELECT * FROM orders WHERE status = 'pending'
Transaction B commits: INSERT INTO orders (status) VALUES ('pending')
Transaction A runs the same query → gets additional row (a "phantom")
Prevented at: Serializable only (PostgreSQL's Repeatable Read also prevents phantoms via MVCC).
Lost Update
Transaction A reads balance: 100
Transaction B reads balance: 100
Transaction A writes: balance + 50 = 150
Transaction B writes: balance + 30 = 130 ← A's update is lost!
Prevented by: proper locking or optimistic concurrency control.
Isolation Levels in Practice
-- Set isolation level for the current transaction
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED; -- PostgreSQL default
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Or set session default
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Production guidance:
- Read Committed (PostgreSQL default): sufficient for most CRUD operations. Prevents dirty reads.
- Repeatable Read: use when a transaction reads the same data multiple times and must see consistent values. Financial reporting, generating documents.
- Serializable: use when correctness requires full isolation. Complex financial operations, inventory management. Has overhead — serialization errors possible, requires retry logic.
Locking — Two Strategies
Pessimistic Locking — "Lock before you work"
Acquire a lock before reading/writing. Other transactions must wait.
-- SELECT FOR UPDATE — locks selected rows until transaction ends
BEGIN;
SELECT * FROM orders WHERE id = 42 FOR UPDATE;
-- Other transactions trying to touch order 42 will BLOCK here
UPDATE orders SET status = 'processing' WHERE id = 42;
COMMIT;
-- Lock released at COMMIT
-- SELECT FOR SHARE — allows concurrent reads, blocks writes
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;
-- Other transactions can read balance concurrently
-- But can't UPDATE balance until we COMMIT
COMMIT;
-- SKIP LOCKED — skip rows already locked (great for job queues)
BEGIN;
SELECT * FROM job_queue WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 1
FOR UPDATE SKIP LOCKED; -- multiple workers take different jobs, no waiting
UPDATE job_queue SET status = 'processing', worker_id = $1 WHERE id = ...;
COMMIT;
Optimistic Locking — "Work freely, check for conflicts at commit"
No locks during work. At commit, verify nobody else changed the data.
-- Version-based optimistic locking
-- Table has a `version` column (integer, starts at 1)
-- 1. Read with version
SELECT id, balance, version FROM accounts WHERE id = 1;
-- Returns: id=1, balance=500, version=3
-- 2. Do work (in application layer, no DB lock held)
-- 3. Update only if version hasn't changed
UPDATE accounts
SET balance = 450, version = version + 1
WHERE id = 1 AND version = 3;
-- Returns rowcount=1: success → proceed
-- Returns rowcount=0: someone else updated first → retry the whole operation
-- In ORM (TypeORM example):
@Column()
@VersionColumn()
version: number;
// TypeORM automatically adds WHERE id=? AND version=? to every UPDATE
Choose pessimistic when: contention is high, retries are expensive, operations are long. Choose optimistic when: contention is low, retries are cheap, reads are far more common than writes.
Table-Level vs Row-Level Locks
-- Row-level locks (default for DML — only locks specific rows)
UPDATE users SET name = 'Alice' WHERE id = 1; -- only locks row id=1
-- Table-level locks (explicit, blocks all access to the table)
LOCK TABLE users IN EXCLUSIVE MODE;
-- Rarely needed — DDL statements (ALTER TABLE) do this automatically
-- Advisory locks — application-level named locks (not tied to data)
SELECT pg_advisory_lock(12345); -- acquire lock named "12345"
-- ... critical section ...
SELECT pg_advisory_unlock(12345); -- release
-- Use case: prevent two processes from running the same job simultaneously
Deadlocks
A deadlock occurs when two transactions each hold a lock the other needs:
Transaction A: Lock order 1, then try to lock order 2 → WAITING for B
Transaction B: Lock order 2, then try to lock order 1 → WAITING for A
Both wait forever → DEADLOCK
PostgreSQL automatically detects deadlocks and aborts one transaction (the "victim") with error ERROR: deadlock detected. Your application must catch this and retry.
import psycopg2
from time import sleep
import random
def transfer_with_retry(from_id, to_id, amount, max_retries=3):
for attempt in range(max_retries):
try:
with connection.cursor() as cur:
cur.execute("BEGIN")
# Lock in consistent order (always lower ID first) — prevents deadlocks
low, high = min(from_id, to_id), max(from_id, to_id)
cur.execute("SELECT balance FROM accounts WHERE id IN (%s, %s) FOR UPDATE",
(low, high))
cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
(amount, from_id))
cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
(amount, to_id))
cur.execute("COMMIT")
return # success
except psycopg2.errors.DeadlockDetected:
cur.execute("ROLLBACK")
if attempt == max_retries - 1:
raise
sleep(random.uniform(0.1, 0.5)) # jitter before retry
Deadlock prevention strategies:
- Consistent lock ordering — always acquire locks in the same order (e.g., by ID ascending)
- Short transactions — hold locks for the minimum time
- Lock all resources upfront — acquire all needed locks at transaction start
- Retry logic — deadlocks are expected; handle them gracefully
Two-Phase Locking (2PL)
The theoretical framework behind pessimistic locking:
Growing phase: Transaction acquires locks, never releases
Shrinking phase: Transaction releases locks, never acquires new ones
Strict 2PL: All locks released only at COMMIT/ROLLBACK
(this is what PostgreSQL actually uses)
Strict 2PL guarantees serializability — transactions appear to have run one at a time.
MVCC vs Locking
PostgreSQL combines MVCC with locking:
READS → MVCC (see a snapshot, never blocked by writers)
WRITES → Lock the specific rows being modified
This means:
SELECT never waits for UPDATE on the same row
UPDATE waits if another UPDATE already locks the row
-- This query NEVER blocks (MVCC snapshot):
SELECT * FROM orders; -- sees snapshot from transaction start
-- This query BLOCKS if another transaction has a lock on id=42:
UPDATE orders SET status = 'done' WHERE id = 42;
Savepoints
BEGIN;
INSERT INTO users (email) VALUES ('alice@example.com');
SAVEPOINT sp_user;
INSERT INTO profiles (user_id, bio) VALUES (LASTVAL(), 'About me');
-- Something goes wrong with profile creation:
ROLLBACK TO SAVEPOINT sp_user; -- undo profile insert, keep user insert
-- Try again with different data:
INSERT INTO profiles (user_id, bio) VALUES (LASTVAL(), 'Default bio');
COMMIT; -- user + new profile both committed
Common Interview Questions
Practice
- Lost Update: Simulate the lost update problem — two concurrent Python threads both read a counter, increment, and write back. Show the bug, then fix it with
FOR UPDATElocking. - Optimistic Locking: Implement a version-based optimistic lock in Node.js — show a conflict, handle it with a retry, and test with two concurrent requests.
- Deadlock: Deliberately create a deadlock scenario with two transactions that lock resources in opposite orders. Capture the error and implement retry logic.
- SKIP LOCKED Queue: Implement a job queue with PostgreSQL where multiple worker processes can pull jobs concurrently without conflicts, using
SELECT FOR UPDATE SKIP LOCKED.
This covers Level 9 — Databases. Next: Cloud & DevOps for Level 10.