PostgreSQL & MySQL

Production SQL databases — ACID guarantees, storage engines, connection pooling, EXPLAIN plans, and the differences that matter in interviews.

Starting from Zero — A Physical Intuition

Before comparing storage engines, let's visualize PostgreSQL and MySQL through physical analogies:

PostgreSQL as the Photocopy Vault (MVCC): Imagine an office where documents are stored in a cabinet. When an employee wants to update a client's address, they don't erase the old text. Instead, they write the new address on a photocopy of the file and put it in the drawer.
- The Reader: A coworker who started reading the old file five minutes ago continues to read their copy undisturbed.
- The Writer: The writer puts their new copy in the drawer. Once approved (committed), any new coworker looking at the drawer will get the new photocopy. This is Multi-Version Concurrency Control (MVCC): readers and writers never block each other.
MySQL as the Bound Catalog (Clustered Indexing): Think of a thick dictionary. The pages are physically printed and bound alphabetically by word (the Primary Key). If you want to search by page number or definition length, you check a separate index list in the back (Secondary Index). This list doesn't contain the full page data; it just points you back to the word (Primary Key), requiring you to turn to that page to read the definition.

PostgreSQL vs MySQL — The Big Picture

Both are open-source relational databases that implement SQL. The choice shapes your architecture.

	PostgreSQL	MySQL (InnoDB)
License	PostgreSQL (fully open)	GPL (Oracle owns it; MariaDB is the open fork)
ACID	Full ACID, all isolation levels	Full ACID (InnoDB engine only)
JSON	`JSONB` — binary, fully indexable	JSON type, limited indexing
Full-text search	Built-in, good	Built-in, basic
Advanced types	Arrays, hstore, custom types, enums	Limited
Window functions	Full support	Full support (5.7+)
Replication	Streaming, logical, synchronous	Binary log, GTID-based
Concurrency	MVCC (never locks for reads)	MVCC (InnoDB)
Use it for	Complex queries, analytics, PostGIS	High write throughput, simpler queries
Used by	Instagram, GitHub, Heroku, Notion	Twitter (historically), Airbnb, Uber (now PG)

FAANG preference: Most modern FAANG companies default to PostgreSQL for new systems. MySQL remains common in older stacks.

ACID — The Four Guarantees

Every production SQL database guarantees ACID:

A — Atomicity:   A transaction is all-or-nothing.
                 If any step fails, the entire transaction is rolled back.

C — Consistency: A transaction moves the database from one valid state to another.
                 All constraints (NOT NULL, UNIQUE, FK) are enforced.

I — Isolation:   Concurrent transactions see consistent data.
                 No dirty reads, no partial results from other transactions.

D — Durability:  Once committed, data survives crashes.
                 PostgreSQL writes to WAL (Write-Ahead Log) before confirming commit.

-- ACID in practice: transfer $100 from Alice to Bob
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit Alice
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit Bob

-- If anything goes wrong between these two:
COMMIT;    -- both changes saved
ROLLBACK;  -- both changes undone

Without atomicity, a crash between the two UPDATEs would debit Alice but never credit Bob — money vanishes.

Transactions Deep Dive

-- Transaction syntax (PostgreSQL / MySQL identical)
BEGIN;                            -- or START TRANSACTION
  INSERT INTO orders (user_id, total) VALUES (1, 99.99);
  UPDATE inventory SET qty = qty - 1 WHERE product_id = 42;
COMMIT;

-- Savepoints — partial rollback within a transaction
BEGIN;
  INSERT INTO orders ...;
  SAVEPOINT sp1;
  INSERT INTO order_items ...;
  -- Something fails:
  ROLLBACK TO SAVEPOINT sp1;      -- undo order_items, keep orders
  -- Try alternative:
  INSERT INTO order_items ...;    -- different attempt
COMMIT;

Isolation Levels

Isolation levels trade consistency for concurrency. Higher isolation = safer but slower.

READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
      (fastest)                                           (slowest)

Phenomenon	Read Uncommitted	Read Committed	Repeatable Read	Serializable
Dirty Read	❌ Can happen	✅ Prevented	✅ Prevented	✅ Prevented
Non-repeatable Read	❌ Can happen	❌ Can happen	✅ Prevented	✅ Prevented
Phantom Read	❌ Can happen	❌ Can happen	❌ Can happen (PG: ✅)	✅ Prevented

Dirty Read: reading data another transaction hasn't committed yet (could roll back)
Non-repeatable Read: reading the same row twice gets different values (another transaction updated it between reads)
Phantom Read: re-running a query returns different rows (another transaction inserted/deleted)

-- PostgreSQL defaults to READ COMMITTED
-- MySQL InnoDB defaults to REPEATABLE READ

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
-- ... everything here runs as if no other transactions exist
COMMIT;

Production default: Read Committed (PostgreSQL default) is sufficient for most applications. Use Serializable only when correctness requires it — it's slow and can fail with serialization errors that your app must retry.

MVCC — Multi-Version Concurrency Control

Both PostgreSQL and MySQL (InnoDB) use MVCC — the key to non-blocking reads:

Instead of locking a row when writing, MVCC keeps multiple versions:
- Writers create a NEW version of the row
- Readers see the version that was committed when their transaction STARTED
- Reads and writes never block each other

Result: read queries NEVER wait for write locks
        write queries only conflict with other write queries on the same row

This is why SELECT is always fast even under heavy writes — it reads an older snapshot.

Vacuum in PostgreSQL: Dead row versions must be cleaned up. PostgreSQL runs autovacuum to reclaim space. Neglecting vacuum on high-churn tables causes table bloat and performance degradation — a real production concern.

Indexing (Introduction)

Indexes dramatically speed up queries at the cost of slower writes and storage:

-- Default B-tree index (binary search tree)
CREATE INDEX idx_users_email ON users(email);
-- SELECT * FROM users WHERE email = 'a@b.com';  → O(log n) instead of O(n)

-- Composite index — column order matters
CREATE INDEX idx_posts_author_date ON posts(author_id, created_at DESC);
-- Useful for: WHERE author_id = X ORDER BY created_at DESC

-- Partial index — index a subset of rows
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;
-- Smaller index, faster for queries that always filter on is_active = true

-- Unique index (enforced as constraint)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan;
-- idx_scan = 0: index never used → consider dropping it

Full indexing deep-dive: Indexing & Optimization.

EXPLAIN — Understanding Query Plans

EXPLAIN is the most important tool for database performance debugging:

-- Basic explain (shows plan, not actual execution)
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

-- EXPLAIN ANALYZE — actually runs the query and measures
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(p.id) AS posts
FROM users u
LEFT JOIN posts p ON u.id = p.author_id
WHERE u.is_active = true
GROUP BY u.id;

Reading EXPLAIN output (PostgreSQL):

Seq Scan on users  (cost=0.00..431.00 rows=20000 width=48)

Seq Scan = reading every row (bad for large tables — missing index)
Index Scan = using an index (good)
Index Only Scan = all data from index (best — no heap access needed)
cost=start..total — estimated cost units (IO + CPU)
rows=N — estimated row count (if very wrong, run ANALYZE to update statistics)

Common performance killers:

-- ❌ Function on indexed column prevents index use
WHERE LOWER(email) = 'alice@example.com'
-- ✅ Fix: create a functional index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- ❌ Leading wildcard kills index
WHERE name LIKE '%alice%'
-- ✅ Fix: full-text search with GIN index

-- ❌ Implicit type cast prevents index use
WHERE user_id = '123'  -- user_id is INTEGER, '123' is TEXT → full scan
-- ✅ Fix: match types
WHERE user_id = 123

Connection Pooling

Every database connection uses ~5-10MB of RAM and has connection overhead. At scale, connection pooling is essential:

Without pooling: 1000 concurrent requests = 1000 DB connections = 10GB RAM
With pooling:    1000 concurrent requests = 20 pooled connections = 200MB RAM

PgBouncer (PostgreSQL) sits between your app and database:

App servers → PgBouncer → PostgreSQL
            (pool of 20)  (max 100 connections)

PgBouncer modes:

Transaction pooling (recommended): connection returned to pool after each transaction
Session pooling: connection held for the entire client session (less efficient)

// Node.js connection pool (pg library)
import { Pool } from "pg";

const pool = new Pool({
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  port: 5432,
  max: 20,                // max pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Use pool (connections automatically returned after query)
const { rows } = await pool.query(
  "SELECT id, name FROM users WHERE is_active = $1",
  [true]  // parameterized query — prevents SQL injection
);

Always use parameterized queries — never concatenate user input into SQL strings.

PostgreSQL-Specific Power Features

-- JSONB — store and query JSON documents
CREATE TABLE events (
  id      SERIAL PRIMARY KEY,
  type    TEXT,
  payload JSONB
);

INSERT INTO events (type, payload) VALUES
  ('purchase', '{"user_id": 1, "amount": 99.99, "items": ["book", "pen"]}');

-- Query JSONB
SELECT * FROM events WHERE payload->>'user_id' = '1';    -- text comparison
SELECT * FROM events WHERE (payload->>'amount')::FLOAT > 50;
SELECT * FROM events WHERE payload @> '{"items": ["book"]}';  -- contains
CREATE INDEX idx_events_payload ON events USING GIN(payload);  -- GIN for JSONB

-- Arrays
CREATE TABLE tags (post_id INTEGER, tags TEXT[]);
INSERT INTO tags VALUES (1, ARRAY['javascript', 'react', 'nextjs']);
SELECT * FROM tags WHERE 'react' = ANY(tags);

-- Full-text search
SELECT title, ts_rank(to_tsvector('english', content), query) AS rank
FROM posts, to_tsquery('english', 'javascript & react') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;

MySQL-Specific Notes

-- MySQL uses backticks for identifiers (PostgreSQL uses double quotes)
SELECT `name` FROM `users`;

-- AUTO_INCREMENT instead of SERIAL
CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100)
);

-- MySQL string comparison is case-insensitive by default (depends on collation)
SELECT * FROM users WHERE name = 'alice';  -- may match 'Alice', 'ALICE'

-- Show slow queries
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;  -- log queries slower than 1 second

-- MySQL EXPLAIN
EXPLAIN FORMAT=JSON SELECT ...;

Common Interview Questions

Practice ladder

The advanced SQL these engines are known for — window functions, ranking and multi-join analytics. Climb in order.

Practice ladder: Advanced SQL — windows & CTEs0/5 solved

Climb in order — every rung assumes the one above it. Solve on LeetCode, then tick it here; progress is saved on this device.

Warm-up — your first window

Rank ScoresMedium
DENSE_RANK() OVER (ORDER BY …) — ranking without gaps.

Core — partitioned windows

Department Top Three SalariesHard
DENSE_RANK() OVER (PARTITION BY dept) — top-N per group.
Consecutive NumbersMedium
Compare a row to its neighbours (LAG / window or self-join).

Stretch — multi-join analytics

Trips and UsersHard
Joins + conditional aggregation + filtering banned rows.
Human Traffic of StadiumHard
Runs of consecutive rows — windows over an ordered sequence.

Think it through like the interview

Don't just discuss MVCC abstractly — derive the database's internal row headers and visibility checks under high concurrency.

Think it through: MVCC in Action: Transaction IsolationDatabase Internals0/3 stages

PROBLEMTwo transactions run concurrently in PostgreSQL. Transaction A (id = 101) updates a user's name from 'Alice' to 'Alex' but has not committed yet. Transaction B (id = 102) queries the user's name. Tracing Postgres internal row attributes, explain why B does not see 'Alex'.

1
Understand internal transaction metadata
“What hidden metadata fields does PostgreSQL attach to every row to track transaction visibility?”
2
Trace the UPDATE under MVCC
“When Transaction A (id = 101) updates 'Alice' to 'Alex', what physical operations does Postgres perform in the table heap?”
unlocks after the stage above
3
Compute visibility for concurrent readers
“Transaction B (id = 102) runs under the Read Committed isolation level. How does it determine which of the two versions ('Alice' or 'Alex') to return?”
unlocks after the stage above

Interactive Quiz

Check yourself0/3 answered

Practice

Transactions: Write a money transfer stored procedure that atomically debits one account and credits another, with proper error handling and rollback.
EXPLAIN: Take a slow query from a practice database, run EXPLAIN ANALYZE, identify the bottleneck, add an index, and verify the improvement.
Isolation: Write a demonstration of a phantom read — show it occurring at READ COMMITTED and prevented at SERIALIZABLE.
Production: Design a connection pool configuration for an app with 50 app server instances, each with 10 concurrent workers.

Next: MongoDB & Redis — NoSQL databases.