Sorting & Searching

Merge sort, quicksort, heap sort and when to use each — plus the searching ladder from linear scan to binary search, and what interviewers actually test.

Why sorting is the gateway algorithm

Sorting matters for three reasons: it's the classic vehicle for learning divide and conquer; sorted data unlocks faster everything (binary search, two pointers, dedupe, ranges); and "should I sort first?" is the highest-value first question on dozens of interview problems. You will almost never implement a sort at work — you must still know how they work, because the follow-ups ("why is quicksort O(n²) worst case?") are standard.

The complexity landscape

Algorithm	Average	Worst	Space	Stable?	The one-liner
Bubble/Insertion	O(n²)	O(n²)	O(1)	✓	teaching tools; insertion wins on nearly-sorted data
Merge sort	O(n log n)	O(n log n)	O(n)	✓	guaranteed, stable, the linked-list choice
Quicksort	O(n log n)	O(n²)	O(log n)	✗	fastest in practice, in place
Heap sort	O(n log n)	O(n log n)	O(1)	✗	guaranteed + in place, slower constants
Counting/Radix	O(n + k)	O(n + k)	O(k)	✓	non-comparison; small integer ranges only

O(n log n) is provably the floor for comparison sorting — counting sort beats it only by not comparing (it indexes by value, an array trick).

Watch them run

The complexity table above becomes obvious once you watch the work happen. Switch algorithms on the same array — count how many steps each one takes, then shuffle and try again.

Bubble sorttime O(n²)space O(1)

1/48Bubble sort: repeatedly swap adjacent out-of-order pairs; the largest value bubbles to the end each pass.

algorithm

array

Merge sort — divide and conquer, canonically

Split in half, sort each half (recursively), merge the two sorted halves. The merge is two pointers: repeatedly take the smaller front.

Python

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    return merge(left, right)

def merge(l, r):
    out, i, j = [], 0, 0
    while i < len(l) and j < len(r):
        if l[i] <= r[j]:                # <= keeps it STABLE
            out.append(l[i]); i += 1
        else:
            out.append(r[j]); j += 1
    return out + l[i:] + r[j:]          # one side has leftovers

Why n log n: log n levels of splitting, O(n) merge work per level. Why it matters beyond sorting: the merge step is "merge two sorted lists," "merge k sorted lists" (add a heap), and external sorting — how databases sort data bigger than RAM (sort chunks, merge streams — files, Level 1 meets scale).

Stable means equal elements keep their original order — sort orders by date, then by customer, and each customer's orders stay date-sorted. Interviewers love this word; merge sort has it, quicksort doesn't.

Quicksort — partition and recurse

Pick a pivot, partition the array into < pivot | pivot | > pivot (one O(n) pass, in place), then recurse on the two sides.

Python

def quicksort(a, lo=0, hi=None):
    if hi is None: hi = len(a) - 1
    if lo >= hi: return
    p = partition(a, lo, hi)
    quicksort(a, lo, p - 1)
    quicksort(a, p + 1, hi)

def partition(a, lo, hi):          # Lomuto: pivot = last element
    pivot, write = a[hi], lo       # 'write' trails: everything left of it < pivot
    for read in range(lo, hi):
        if a[read] < pivot:
            a[write], a[read] = a[read], a[write]
            write += 1
    a[write], a[hi] = a[hi], a[write]
    return write

The worst case is the famous follow-up: a bad pivot (e.g., always the last element on already-sorted input) splits n into 1 and n−1 → O(n²). Defenses: random pivot or median-of-three. With randomization, expected O(n log n) and the best cache behavior of the bunch — which is why real libraries build on it (with introsort fallbacks to heap sort when recursion gets suspicious).

Partition alone is a tool: quickselect finds the k-th smallest element in expected O(n) — partition, then recurse into one side only. "K-th largest element" is a top-10 interview question, and "quickselect or a heap?" is the expected discussion.

What you actually do: call sort, design the key

Python

words.sort()                                  # Timsort: O(n log n), stable
users.sort(key=lambda u: (-u.score, u.name))  # rank by score desc, tie → name

Java

Arrays.sort(primitives);                       // dual-pivot quicksort
Collections.sort(users, Comparator.comparing(User::getScore).reversed());

C++

std::sort(v.begin(), v.end());                 // introsort
std::stable_sort(v.begin(), v.end(), byScore); // when stability matters

The real-world skill is the comparator/key: multi-field, ascending/ descending mixes, and custom orders ("largest number formed by concatenation" — sort with comparator a+b > b+a). Python's Timsort is merge-sort-based, stable, and exploits existing runs — nearly-sorted data sorts in nearly O(n).

The searching ladder

Linear scan — O(n), unsorted data, no setup. Fine once; a loop of them is the hash-table cue.
Binary search — O(log n), needs sorted data. Halve the space each comparison; full treatment (and its killer generalization, binary search on the answer) in its own doc.
Hash lookup — O(1), needs preprocessing, exact-match only.
Structure-specific — BST range queries, trie prefixes.

The strategic table — given n items and q queries:

Situation	Choice
One query	scan: O(n) beats sort-then-search O(n log n)
Many exact-match queries	hash set/map: O(n) build, O(1) each
Many order queries (range, nearest, rank)	sort once + binary search: O(n log n) + O(log n) each
Data changes between queries	BST/heap, not re-sorting

Common mistakes

Sorting when a hash map answers it — two-sum sorted costs O(n log n) and loses original indices; the dict is O(n).
Destroying information by sorting — need the original index later? Sort pairs (value, i) instead.
Quadratic "sorts" hiding in code — list.remove(min(list)) in a loop is O(n²) selection sort wearing builtins.
Claiming quicksort is O(n log n) flat — it's expected; the worst case and its trigger is the exact follow-up question.
Stability blindness — multi-pass sorting (by B, then by A) only works with stable sorts; say the word.

Think it through

The page's thesis: at work you call sort and design the key. The hardest version of that skill is a problem where the "key" is a custom pairwise comparator. Reason through it before revealing.

Think it through: Largest NumberMedium — LeetCode 1790/5 stages

PROBLEMArrange non-negative integers to form the LARGEST possible number, returned as a string. nums = [3, 30, 34, 5, 9] → '9534330'.

1
Restate & edges
“Why isn't this just 'sort descending'?”
2
Brute force first
“Dumbest correct approach and its cost?”
unlocks after the stage above
3
Find the comparator
“For two numbers a and b, which should come first?”
unlocks after the stage above
4
Code the template
“How do I sort by a custom pairwise rule in Python, and what's the zero trap?”
unlocks after the stage above
5
Cost & edge check
“Cost, and why is the comparator provably consistent?”
unlocks after the stage above

Check yourself

Check yourself0/4 answered

1. Why is comparison-based sorting Ω(n log n) in the worst case?

2. Quicksort's worst case is O(n²). When does it strike?

3. A 'stable' sort means:

4. You have n items and will run MANY exact-match lookups. Best approach?

Interview perspective

Practice — climb the ladder

Sorting is rarely the ANSWER — it is the move that unlocks two pointers, greedy sweeps, and binary search. Practice choosing it deliberately.

Practice ladder: Sorting & Searching0/8 solved

Climb in order — every rung assumes the one above it. Solve on LeetCode, then tick it here; progress is saved on this device.

Warm-up

sort mechanics you must own

Merge Sorted ArrayEasy
The merge step of merge sort, isolated — fill from the back.
Sort ColorsMedium
Dutch national flag — three-way partition, the heart of quicksort.

Core

sorting as a tool, not a button

Kth Largest Element in an ArrayMedium
Quickselect — partition without fully sorting; average O(n) and say why.
Sort ListMedium
Merge sort on a linked list — the structure picks the algorithm.
Largest NumberMedium
Custom comparator (a+b vs b+a) — sorting by a rule you invent.
Sort Characters by FrequencyMedium
Bucket sort by count — when O(n log n) is already beatable.
H-IndexMedium
Sort then scan for a crossing point — sorted order creating meaning.

Stretch

divide-and-conquer counting

Reverse PairsHard
Count during merge — the merge-sort skeleton doing analytics on the side.