Sorting & Searching

Merge sort, quicksort, heap sort and when to use each — plus the searching ladder from linear scan to binary search, and what interviewers actually test.

sortingsearchingmerge-sortquicksort

Why sorting is the gateway algorithm

Sorting matters for three reasons: it's the classic vehicle for learning divide and conquer; sorted data unlocks faster everything (binary search, two pointers, dedupe, ranges); and "should I sort first?" is the highest-value first question on dozens of interview problems. You will almost never implement a sort at work — you must still know how they work, because the follow-ups ("why is quicksort O(n²) worst case?") are standard.

The complexity landscape

AlgorithmAverageWorstSpaceStable?The one-liner
Bubble/InsertionO(n²)O(n²)O(1)teaching tools; insertion wins on nearly-sorted data
Merge sortO(n log n)O(n log n)O(n)guaranteed, stable, the linked-list choice
QuicksortO(n log n)O(n²)O(log n)fastest in practice, in place
Heap sortO(n log n)O(n log n)O(1)guaranteed + in place, slower constants
Counting/RadixO(n + k)O(n + k)O(k)non-comparison; small integer ranges only

O(n log n) is provably the floor for comparison sorting — counting sort beats it only by not comparing (it indexes by value, an array trick).

Watch them run

The complexity table above becomes obvious once you watch the work happen. Switch algorithms on the same array — count how many steps each one takes, then shuffle and try again.

Bubble sorttime O(n²)space O(1)
38
0
12
1
71
2
5
3
56
4
24
5
90
6
17
7

1/48Bubble sort: repeatedly swap adjacent out-of-order pairs; the largest value bubbles to the end each pass.

algorithm

Merge sort — divide and conquer, canonically

Split in half, sort each half (recursively), merge the two sorted halves. The merge is two pointers: repeatedly take the smaller front.

Python
def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    return merge(left, right)

def merge(l, r):
    out, i, j = [], 0, 0
    while i < len(l) and j < len(r):
        if l[i] <= r[j]:                # <= keeps it STABLE
            out.append(l[i]); i += 1
        else:
            out.append(r[j]); j += 1
    return out + l[i:] + r[j:]          # one side has leftovers

Why n log n: log n levels of splitting, O(n) merge work per level. Why it matters beyond sorting: the merge step is "merge two sorted lists," "merge k sorted lists" (add a heap), and external sorting — how databases sort data bigger than RAM (sort chunks, merge streams — files, Level 1 meets scale).

Stable means equal elements keep their original order — sort orders by date, then by customer, and each customer's orders stay date-sorted. Interviewers love this word; merge sort has it, quicksort doesn't.

Quicksort — partition and recurse

Pick a pivot, partition the array into < pivot | pivot | > pivot (one O(n) pass, in place), then recurse on the two sides.

Python
def quicksort(a, lo=0, hi=None):
    if hi is None: hi = len(a) - 1
    if lo >= hi: return
    p = partition(a, lo, hi)
    quicksort(a, lo, p - 1)
    quicksort(a, p + 1, hi)

def partition(a, lo, hi):          # Lomuto: pivot = last element
    pivot, write = a[hi], lo       # 'write' trails: everything left of it < pivot
    for read in range(lo, hi):
        if a[read] < pivot:
            a[write], a[read] = a[read], a[write]
            write += 1
    a[write], a[hi] = a[hi], a[write]
    return write

The worst case is the famous follow-up: a bad pivot (e.g., always the last element on already-sorted input) splits n into 1 and n−1 → O(n²). Defenses: random pivot or median-of-three. With randomization, expected O(n log n) and the best cache behavior of the bunch — which is why real libraries build on it (with introsort fallbacks to heap sort when recursion gets suspicious).

Partition alone is a tool: quickselect finds the k-th smallest element in expected O(n) — partition, then recurse into one side only. "K-th largest element" is a top-10 interview question, and "quickselect or a heap?" is the expected discussion.

What you actually do: call sort, design the key

Python
words.sort()                                  # Timsort: O(n log n), stable
users.sort(key=lambda u: (-u.score, u.name))  # rank by score desc, tie → name
Java
Arrays.sort(primitives);                       // dual-pivot quicksort
Collections.sort(users, Comparator.comparing(User::getScore).reversed());
C++
std::sort(v.begin(), v.end());                 // introsort
std::stable_sort(v.begin(), v.end(), byScore); // when stability matters

The real-world skill is the comparator/key: multi-field, ascending/ descending mixes, and custom orders ("largest number formed by concatenation" — sort with comparator a+b > b+a). Python's Timsort is merge-sort-based, stable, and exploits existing runs — nearly-sorted data sorts in nearly O(n).

The searching ladder

  1. Linear scan — O(n), unsorted data, no setup. Fine once; a loop of them is the hash-table cue.
  2. Binary search — O(log n), needs sorted data. Halve the space each comparison; full treatment (and its killer generalization, binary search on the answer) in its own doc.
  3. Hash lookup — O(1), needs preprocessing, exact-match only.
  4. Structure-specificBST range queries, trie prefixes.

The strategic table — given n items and q queries:

SituationChoice
One queryscan: O(n) beats sort-then-search O(n log n)
Many exact-match querieshash set/map: O(n) build, O(1) each
Many order queries (range, nearest, rank)sort once + binary search: O(n log n) + O(log n) each
Data changes between queriesBST/heap, not re-sorting

Common mistakes

  • Sorting when a hash map answers it — two-sum sorted costs O(n log n) and loses original indices; the dict is O(n).
  • Destroying information by sorting — need the original index later? Sort pairs (value, i) instead.
  • Quadratic "sorts" hiding in codelist.remove(min(list)) in a loop is O(n²) selection sort wearing builtins.
  • Claiming quicksort is O(n log n) flat — it's expected; the worst case and its trigger is the exact follow-up question.
  • Stability blindness — multi-pass sorting (by B, then by A) only works with stable sorts; say the word.

Interview perspective

Practice

  1. Implement from memory: merge sort, then quicksort with random pivot. Time both on 10⁵ random ints, then on 10⁵ already-sorted ints with a fixed last-element pivot — watch the O(n²) happen.
  2. Comparator reps: sort a list of (name, dept, salary) by dept asc, salary desc, name asc — one sort call. Then: why does doing it as three separate stable sorts (in reverse priority order) also work?
  3. The canon: k-th largest (all three ways); merge intervals (sort by start — see heaps & intervals); "sort colors" (Dutch national flag — partition with three pointers, no counting allowed).
  4. Judgment: 10⁹ 32-bit integers on disk, 4 GB RAM — sketch the external merge sort. (Chunks that fit, sort each, k-way merge with a heap. You just designed a database primitive.)

Next pattern: Greedy — when taking the locally best choice is provably globally optimal.

Practice — climb the ladder

Sorting is rarely the ANSWER — it is the move that unlocks two pointers, greedy sweeps, and binary search. Practice choosing it deliberately.

Practice ladder: Sorting & Searching0/8 solved

Climb in order — every rung assumes the one above it. Solve on LeetCode, then tick it here; progress is saved on this device.

Warm-up

sort mechanics you must own
  1. The merge step of merge sort, isolated — fill from the back.

  2. Dutch national flag — three-way partition, the heart of quicksort.

Core

sorting as a tool, not a button
  1. Quickselect — partition without fully sorting; average O(n) and say why.

  2. Sort ListMedium

    Merge sort on a linked list — the structure picks the algorithm.

  3. Custom comparator (a+b vs b+a) — sorting by a rule you invent.

  4. Bucket sort by count — when O(n log n) is already beatable.

  5. H-IndexMedium

    Sort then scan for a crossing point — sorted order creating meaning.

Stretch

divide-and-conquer counting
  1. Count during merge — the merge-sort skeleton doing analytics on the side.