SQL Fundamentals · PrepDeck

Structured Query Language from the ground up — DDL, DML, joins, aggregates, window functions, subqueries, and interview-critical patterns.

Starting from Zero — A Physical Intuition

Before writing queries, let's understand SQL through physical analogies:

The Filing Clerk (The Query Engine): Imagine a giant archive library. You cannot retrieve folders yourself. Instead, you write a request note specifying what you need (e.g., "Find all client files from Chicago whose status is active"). A fast filing clerk (the database engine) reads your query, figures out the fastest path to the right drawer, extracts only those files, and returns them to you.
The Folder Joiner (INNER vs LEFT JOIN): Imagine you have two boxes of cards: Box A contains User profiles (each card has a User ID), and Box B contains Receipts (each card has a User ID and a purchase amount).
- INNER JOIN: You match cards from Box A and Box B by User ID. If a user has no receipts, or if a receipt lists a deleted User ID, you throw those cards aside. You only keep matches.
- LEFT JOIN: You keep every single card from Box A. If a user has matching receipts in Box B, you staple them together. If a user has no receipts, you still keep their card but leave the receipt fields blank (NULL).

What is SQL?

SQL (Structured Query Language, pronounced "sequel" or "S-Q-L") is the standard language for interacting with relational databases. It lets you:

Create and modify table structures (DDL)
Insert, update, delete, and query data (DML)
Control access and permissions (DCL)
Manage transactions (TCL)

SQL was created in 1974 at IBM, standardized as ANSI SQL, and is used in every major database: PostgreSQL, MySQL, SQLite, Oracle, SQL Server. The core syntax is nearly identical across all of them.

Data Types

-- Common SQL types (PostgreSQL syntax)
INTEGER       -- whole numbers (-2B to 2B)
BIGINT        -- larger integers
DECIMAL(p,s)  -- exact decimal (p = total digits, s = decimal places) — use for money
NUMERIC(p,s)  -- same as DECIMAL
FLOAT / DOUBLE-- approximate decimal (avoid for money — floating point errors)

VARCHAR(n)    -- variable-length text up to n chars
TEXT          -- unlimited text
CHAR(n)       -- fixed-length text, padded (rare in modern use)

BOOLEAN       -- true / false
DATE          -- 2024-01-15
TIME          -- 14:30:00
TIMESTAMP     -- 2024-01-15 14:30:00
TIMESTAMPTZ   -- timestamp with timezone (use this for production)

UUID          -- universally unique ID (e.g. 550e8400-e29b-41d4-a716-446655440000)
JSON / JSONB  -- JSON data (JSONB is binary, faster to query)
SERIAL        -- auto-incrementing integer (shorthand for INTEGER + sequence)

DDL — Creating and Modifying Tables

-- Create table
CREATE TABLE users (
  id         SERIAL PRIMARY KEY,               -- auto-increment, unique, not null
  email      VARCHAR(255) NOT NULL UNIQUE,     -- required, no duplicates
  name       VARCHAR(100) NOT NULL,
  role       VARCHAR(20) NOT NULL DEFAULT 'user',
  is_active  BOOLEAN NOT NULL DEFAULT true,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE posts (
  id         SERIAL PRIMARY KEY,
  title      VARCHAR(500) NOT NULL,
  content    TEXT,
  author_id  INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  -- ON DELETE CASCADE: delete posts when the user is deleted
  -- ON DELETE SET NULL: set author_id to NULL
  -- ON DELETE RESTRICT: prevent deleting user if they have posts
  published  BOOLEAN NOT NULL DEFAULT false,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Add index (covered in depth in the Indexing article)
CREATE INDEX idx_posts_author_id ON posts(author_id);
CREATE INDEX idx_users_email ON users(email);

-- Alter table
ALTER TABLE users ADD COLUMN bio TEXT;
ALTER TABLE users ALTER COLUMN name TYPE VARCHAR(200);
ALTER TABLE users DROP COLUMN bio;

-- Drop table
DROP TABLE posts;               -- fails if referenced by foreign keys
DROP TABLE posts CASCADE;       -- also drops dependent objects

DML — Querying and Modifying Data

-- INSERT
INSERT INTO users (email, name, role)
VALUES ('alice@example.com', 'Alice', 'admin');

-- Insert multiple rows
INSERT INTO users (email, name) VALUES
  ('bob@example.com', 'Bob'),
  ('carol@example.com', 'Carol');

-- INSERT with ON CONFLICT (upsert)
INSERT INTO users (email, name)
VALUES ('alice@example.com', 'Alice Updated')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name, updated_at = NOW();

-- SELECT
SELECT id, name, email FROM users;          -- specific columns
SELECT * FROM users;                         -- all columns (avoid in production)
SELECT DISTINCT role FROM users;             -- unique values

-- WHERE — filtering
SELECT * FROM users
WHERE is_active = true
  AND role = 'admin'
  AND created_at > '2024-01-01';

-- LIKE — pattern matching (% = any characters, _ = single character)
SELECT * FROM users WHERE email LIKE '%@gmail.com';
SELECT * FROM users WHERE name ILIKE 'alice%';  -- case-insensitive (PostgreSQL)

-- IN and NOT IN
SELECT * FROM users WHERE role IN ('admin', 'moderator');
SELECT * FROM users WHERE id NOT IN (1, 2, 3);

-- IS NULL / IS NOT NULL
SELECT * FROM users WHERE bio IS NULL;

-- ORDER BY
SELECT * FROM users ORDER BY created_at DESC, name ASC;

-- LIMIT and OFFSET (pagination)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 40;  -- page 3
-- Note: OFFSET pagination is slow for large offsets — use cursor-based for scale

-- UPDATE
UPDATE users SET name = 'Alice Smith', updated_at = NOW() WHERE id = 1;
UPDATE posts SET published = true WHERE author_id = 1 AND published = false;

-- DELETE
DELETE FROM users WHERE id = 1;
DELETE FROM posts WHERE created_at < NOW() - INTERVAL '1 year';

Joins — Combining Tables

Joins combine rows from two or more tables based on a related column.

┌──────────────────────────────────────────────────────────┐
│  users              posts                                  │
│  ┌────┬──────┐      ┌────┬──────────┬───────────┐        │
│  │ id │ name │      │ id │  title   │ author_id │        │
│  ├────┼──────┤      ├────┼──────────┼───────────┤        │
│  │  1 │ Alice│      │  1 │ "Post A" │     1     │        │
│  │  2 │ Bob  │      │  2 │ "Post B" │     1     │        │
│  │  3 │ Carol│      │  3 │ "Post C" │     2     │        │
│  └────┴──────┘      └────┴──────────┴───────────┘        │
└──────────────────────────────────────────────────────────┘

-- INNER JOIN: only rows that match in BOTH tables
SELECT u.name, p.title
FROM users u
INNER JOIN posts p ON u.id = p.author_id;
-- Returns: Alice/Post A, Alice/Post B, Bob/Post C (Carol excluded — no posts)

-- LEFT JOIN: all rows from left table, matched rows from right (or NULL)
SELECT u.name, p.title
FROM users u
LEFT JOIN posts p ON u.id = p.author_id;
-- Returns: Alice/Post A, Alice/Post B, Bob/Post C, Carol/NULL

-- RIGHT JOIN: all from right, matched from left (rarely used — just swap tables)

-- FULL OUTER JOIN: all rows from both, NULL where no match

-- SELF JOIN: table joins itself (org hierarchy, friends, etc.)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

-- Multiple joins
SELECT u.name, p.title, c.content AS comment
FROM users u
JOIN posts p ON u.id = p.author_id
JOIN comments c ON p.id = c.post_id
WHERE u.is_active = true
ORDER BY p.created_at DESC;

Aggregates & GROUP BY

-- Aggregate functions
SELECT
  COUNT(*)                    AS total_users,
  COUNT(DISTINCT role)        AS unique_roles,
  AVG(EXTRACT(YEAR FROM NOW()) - EXTRACT(YEAR FROM created_at)) AS avg_age_years,
  MAX(created_at)             AS latest_signup,
  MIN(created_at)             AS first_signup,
  SUM(CASE WHEN is_active THEN 1 ELSE 0 END) AS active_users
FROM users;

-- GROUP BY — aggregate per group
SELECT role, COUNT(*) AS count
FROM users
GROUP BY role
ORDER BY count DESC;

-- HAVING — filter groups (like WHERE but after GROUP BY)
SELECT author_id, COUNT(*) AS post_count
FROM posts
WHERE published = true
GROUP BY author_id
HAVING COUNT(*) > 5            -- only authors with more than 5 published posts
ORDER BY post_count DESC;

-- Combining with JOIN
SELECT u.name, COUNT(p.id) AS published_posts
FROM users u
LEFT JOIN posts p ON u.id = p.author_id AND p.published = true
GROUP BY u.id, u.name
ORDER BY published_posts DESC;

Subqueries

A subquery is a SELECT inside another SELECT:

-- Subquery in WHERE
SELECT name FROM users
WHERE id IN (
  SELECT DISTINCT author_id FROM posts WHERE published = true
);

-- Subquery in FROM (derived table)
SELECT dept, avg_salary
FROM (
  SELECT department AS dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 80000;

-- Correlated subquery — references outer query (runs once per row — slow for large tables)
SELECT name, (
  SELECT COUNT(*) FROM posts WHERE author_id = u.id
) AS post_count
FROM users u;
-- Better: use a LEFT JOIN + COUNT + GROUP BY

-- EXISTS — often faster than IN for large subqueries
SELECT name FROM users u
WHERE EXISTS (
  SELECT 1 FROM posts WHERE author_id = u.id AND published = true
);

Window Functions

Window functions perform calculations across rows related to the current row, without collapsing rows (unlike GROUP BY).

-- ROW_NUMBER: rank rows within a partition
SELECT
  name,
  department,
  salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM employees;

-- RANK vs DENSE_RANK
-- RANK: 1,1,3 (skips 2 when there's a tie)
-- DENSE_RANK: 1,1,2 (no gaps)

-- LAG and LEAD: access previous/next row
SELECT
  date,
  revenue,
  LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS day_over_day_change
FROM daily_revenue;

-- Running total
SELECT
  date,
  revenue,
  SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS running_total
FROM daily_revenue;

-- Percentile rank
SELECT
  name,
  salary,
  PERCENT_RANK() OVER (ORDER BY salary) AS percentile
FROM employees;

-- Top N per group (classic interview problem)
SELECT name, department, salary FROM (
  SELECT
    name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
) ranked
WHERE rn <= 3;  -- Top 3 earners per department

CTEs — Common Table Expressions

CTEs make complex queries readable by naming intermediate results:

-- Basic CTE
WITH active_users AS (
  SELECT * FROM users WHERE is_active = true
),
user_post_counts AS (
  SELECT author_id, COUNT(*) AS post_count
  FROM posts
  WHERE published = true
  GROUP BY author_id
)
SELECT u.name, COALESCE(upc.post_count, 0) AS posts
FROM active_users u
LEFT JOIN user_post_counts upc ON u.id = upc.author_id
ORDER BY posts DESC;

-- Recursive CTE — for hierarchical data (org charts, categories, file trees)
WITH RECURSIVE org_tree AS (
  -- Base case: top-level employees (no manager)
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case: find direct reports
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT name, depth FROM org_tree ORDER BY depth, name;

Practice ladder

SQL is a muscle you build on real queries. These are LeetCode's classic SQL problems — climb in order; each rung assumes the one above.

Practice ladder: SQL queries0/7 solved

Climb in order — every rung assumes the one above it. Solve on LeetCode, then tick it here; progress is saved on this device.

Warm-up — single tables & joins

Combine Two TablesEasy
LEFT JOIN basics — keep rows with no match.
Customers Who Never OrderEasy
Anti-join — rows in A with no match in B (NOT IN / LEFT JOIN … IS NULL).

Core — grouping & self-joins

Duplicate EmailsEasy
GROUP BY + HAVING COUNT(*) > 1.
Employees Earning More Than Their ManagersEasy
Self-join a table to itself on a foreign key.
Rising TemperatureEasy
Self-join on consecutive dates — compare a row to its neighbour.

Stretch — subqueries & per-group tops

Second Highest SalaryMedium
Subquery / LIMIT-OFFSET and the NULL-when-absent edge case.
Department Highest SalaryMedium
Max-per-group via join or window — the gateway to ranking.

Think it through like the interview

Don't just write queries from top to bottom — derive the logical execution order to debug syntax anomalies.

Think it through: The Logical Query Execution OrderCore SQL0/3 stages

PROBLEMYou write a query: `SELECT price * quantity AS total_cost FROM orders WHERE total_cost > 100`. The database compiler throws an error: 'Column total_cost does not exist'. Explain why this happens and trace the actual execution order.

1
Map query writing vs execution order
“While we write SELECT first in SQL, in what order does the database engine actually execute the FROM, WHERE, and SELECT clauses?”
2
Explain the compiler error
“Based on this execution order, why does `WHERE total_cost > 100` fail to find the alias `total_cost` defined in the SELECT clause?”
unlocks after the stage above
3
Determine alias validity in sorting
“If `WHERE` cannot see SELECT aliases, can the `ORDER BY` clause reference `total_cost`? Why or why not?”
unlocks after the stage above

Interactive Quiz

Check yourself0/3 answered

Practice

Joins: Find all users who have never written a post.
Aggregates: For each user, show their name, total posts, published posts, and draft posts.
Window: Given a sales table with date, product_id, and amount, find the top 3 best-selling products per month.
Recursive: Given an employees table with id, name, manager_id, write a query to find all direct and indirect reports of a given manager.

Next: PostgreSQL & MySQL — production database systems.