Collections · PrepDeck

Lists, dictionaries, sets and tuples — the four containers every program is built from, and how to choose between them.

One variable, many values

A variable holds one value. Real programs deal in many: a cart of items, a class of students, a feed of posts. Collections are the containers languages provide for groups of values — and choosing the right one is the first genuinely architectural decision you'll make daily for the rest of your career.

Four shapes cover the vast majority of everyday code:

Shape	Python	Java	C++	One-liner
Ordered sequence	`list`	`ArrayList`	`std::vector`	Items in a row, by position
Key → value lookup	`dict`	`HashMap`	`std::unordered_map`	A label on every value
Unique items	`set`	`HashSet`	`std::unordered_set`	Membership, no duplicates
Fixed group	`tuple`	`record`	`std::pair`/`struct`	A few values that travel together

Lists — the workhorse

An ordered, growable sequence, accessed by index (position, counting from 0):

Python

# Python
cart = ["bread", "milk"]
cart.append("eggs")          # add at the end
print(cart[0])               # "bread" — first item
print(cart[-1])              # "eggs" — negative = from the end
cart[1] = "oat milk"         # replace by index
print(len(cart))             # 3
for item in cart:            # iterate (Control Flow page)
    print(item)

Java

// Java
List<String> cart = new ArrayList<>();
cart.add("bread");
cart.add("milk");
System.out.println(cart.get(0));
cart.set(1, "oat milk");

C++

// C++
std::vector<std::string> cart = {"bread", "milk"};
cart.push_back("eggs");
std::cout << cart[0];

Slicing (Python superpower): cart[0:2] is a new list of the first two items; cart[::-1] is the reverse. Java/C++ do the same with explicit loops or library calls.
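A few more slice forms, as a minimal sketch. The four-item cart and the variable names are illustrative, not from the examples above. The point to notice is that a slice is always a new list, so changing it leaves the original alone.

```python
cart = ["bread", "milk", "eggs", "butter"]

first_two = cart[0:2]        # indexes 0 and 1; the stop index is excluded
last_two = cart[-2:]         # from the second-to-last item to the end
reversed_cart = cart[::-1]   # a step of -1 walks backwards
shallow_copy = cart[:]       # a new list holding the same items

first_two.append("jam")      # modifies the slice only, not cart
print(first_two)
print(cart)
```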

Use a list when order matters or you'll iterate everything: a feed, a queue of steps, last-10 search results.

List Comprehensions & Loop Helpers (Python)

1. List Comprehensions

A list comprehension is a concise way to build a new list from an existing collection.

Python

# Traditional loop
squares = []
for x in range(1, 6):
    squares.append(x * x)

# Comprehension equivalent
squares = [x * x for x in range(1, 6)] # [1, 4, 9, 16, 25]

# With a condition filter (e.g., only keep even squares)
even_squares = [x * x for x in range(1, 6) if x % 2 == 0] # [4, 16]

# If-Else inside comprehension (conditional mapping)
labels = ["even" if x % 2 == 0 else "odd" for x in range(1, 6)]

2. Loop Helpers: `enumerate()` and `zip()`

enumerate(collection): In DSA, we often need both the index and the value of elements in a loop. enumerate() provides both without having to manually manage a counter.

Python

names = ["Asha", "Rahul", "Meera"]
for index, name in enumerate(names):
    print(f"Index {index} belongs to {name}")

zip(col1, col2): Loops over multiple collections in parallel, yielding tuples of matched elements.

Python

names = ["Asha", "Rahul"]
scores = [91, 84]
for name, score in zip(names, scores):
    print(f"{name} scored {score}")
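Two details of these helpers are easy to miss, sketched here with deliberately mismatched sample data: enumerate() accepts a start value, and zip() stops at the shortest input without raising an error.

```python
names = ["Asha", "Rahul", "Meera"]
scores = [91, 84]  # one score short on purpose

# enumerate accepts a start value, handy for human-friendly numbering
ranked = [f"{position}. {name}" for position, name in enumerate(names, start=1)]

# zip stops at the shortest input, so "Meera" is silently dropped
pairs = list(zip(names, scores))

# zipped pairs feed straight into dict() to build a lookup table
score_by_name = dict(zip(names, scores))

print(ranked)
print(pairs)
print(score_by_name)
```

If the collections are supposed to have equal lengths, it is worth checking that before zipping, since a silent truncation can hide a data bug.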

Dictionaries — label every value

A dictionary (also called map or hash map — same idea) stores key → value pairs and answers "what's the value for this key?" essentially instantly:

Python

# Python
prices = {"bread": 40, "milk": 60, "eggs": 90}
print(prices["milk"])            # 60 — lookup by key
prices["butter"] = 120           # add or update
if "tofu" in prices:             # membership check
    print(prices["tofu"])
print(prices.get("tofu", 0))     # 0 — default instead of an exception
for item, price in prices.items():
    print(item, price)

Java

// Java
Map<String, Integer> prices = new HashMap<>();
prices.put("bread", 40);
prices.getOrDefault("tofu", 0);

C++

// C++
std::unordered_map<std::string, int> prices = {{"bread", 40}, {"milk", 60}};
prices["butter"] = 120;

The analogy: a list is lockers in a row (find by position); a dict is a phone book (find by name). Asking "is bread in this list of 1 million items?" requires checking up to all million; asking a dict takes roughly one step on average, regardless of size. Why that's true (hashing) is the subject of hash tables in Level 2; for now, trust the speed.

Use a dict when you look things up by identity: user by id, price by product, count by word. The moment you write a loop to find something in a list repeatedly, you probably wanted a dict.
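The "loop to find something" habit and its dict replacement might look like the sketch below. The users list and both function names are hypothetical, chosen only for illustration.

```python
users = [
    {"id": 101, "name": "Asha"},
    {"id": 102, "name": "Rahul"},
    {"id": 103, "name": "Meera"},
]

def find_user_slow(user_id):
    # Scans the list on every call: O(n) per lookup
    for user in users:
        if user["id"] == user_id:
            return user
    return None

# Build the index once (one pass over the list)...
users_by_id = {user["id"]: user for user in users}

def find_user_fast(user_id):
    # ...then each lookup is a single dict access; None if the id is unknown
    return users_by_id.get(user_id)

print(find_user_fast(102))
print(find_user_fast(999))
```

Both functions return the same answers; the difference only shows up when the lookup runs many times over a large list.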

Dict & Set Comprehensions (Python)

Just like lists, you can build dictionaries and sets inline:

Python

# Dict comprehension: mapping each word to its length
words = ["apple", "banana"]
word_lens = {w: len(w) for w in words} # {"apple": 5, "banana": 6}

# Set comprehension: extract unique squares
nums = [1, -2, 2, 3]
unique_squares = {x * x for x in nums} # {1, 4, 9}

Sets — membership and uniqueness

A set holds unique values, unordered, with instant membership checks:

Python

# Python
visitors = {"asha", "rahul", "asha"}     # duplicates collapse
print(len(visitors))                     # 2
visitors.add("meera")
print("rahul" in visitors)               # True — instant, like dict keys

admins = {"asha", "meera"}
print(visitors & admins)                 # intersection: both
print(visitors - admins)                 # difference: visitors who aren't admins

Use a set for "have I seen this before?" (dedupe, visited pages, already-processed ids) and for set algebra (common followers, missing permissions). A set is essentially a dict that stores only keys.
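The "have I seen this before?" use can be sketched as a dedupe that keeps the first occurrence of each item in its original order. The function name and sample data are made up for this example.

```python
def dedupe_keep_order(items):
    seen = set()      # fast membership checks
    result = []       # preserves first-seen order, which a set alone would not
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

page_visits = ["home", "cart", "home", "checkout", "cart"]
unique_pages = dedupe_keep_order(page_visits)
print(unique_pages)
```

Calling set(page_visits) alone would also remove duplicates, but the result would come back in no guaranteed order.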

Tuples — small fixed groups

A tuple is a small, immutable (unchangeable) bundle of values that belong together:

Python

point = (3, 7)                      # x, y — forever
x, y = point                        # unpacking
locations = {(28.6, 77.2): "Delhi"} # immutable ⇒ usable as dict keys
def min_max(nums):
    return (min(nums), max(nums))   # return two values at once

Use tuples for coordinates, RGB colors, (key, value) pairs, multiple return values. The immutability is a feature: a function can hand one out without fearing the caller will mutate it (remember aliasing). When a tuple grows fields or needs names, graduate to a class (OOP Basics) — point.x beats point[0].
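One possible stepping stone between a bare tuple and a full class is typing.NamedTuple from the standard library. This is a sketch of one option, not the only way to "graduate"; the Point class here is illustrative.

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

point = Point(3, 7)
print(point.x, point.y)            # fields by name
print(point[0])                    # still indexable like a tuple

x, y = point                       # still unpacks like a tuple
locations = {point: "marker A"}    # still immutable, so still usable as a dict key
```

Because a named tuple is still a tuple, code that unpacks or indexes it keeps working while new code reads the fields by name.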

Choosing: the decision in four questions

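Ask these in order; the first "yes" picks the container:

1. Will you look values up by a key or identity (user by id, price by product)? Use a dict.
2. Do you only need uniqueness or "have I seen this before?" checks? Use a set.
3. Is it a small, fixed group of values that travel together? Use a tuple.
4. Otherwise, does order matter, or will you iterate over everything? Use a list.

The performance intuition to carry into Level 2 (full version: complexity cheat sheet):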

Operation	list	dict / set
Get by index/key	instant	instant
`x in collection`	scans everything — O(n)	instant — O(1)
Add at end / insert pair	instant	instant
Insert at front/middle	shifts everything — O(n)	—

That one row — membership: O(n) vs O(1) — explains half of all real-world slow code, and half of all DSA interview optimizations.
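The usual fix for the membership row is a one-line conversion before the loop. A small sketch with made-up ids follows; with inputs this tiny the speed difference is invisible, but the shape of the fix is the same at a million items.

```python
banned_ids = list(range(0, 1000, 7))   # hypothetical data: every multiple of 7 below 1000
events = [3, 14, 15, 92, 700, 994, 14]

# Slow shape: each `in` scans the list, so the loop costs O(len(events) * len(banned_ids))
flagged_slow = [e for e in events if e in banned_ids]

# Fast shape: pay O(n) once to build the set, then each `in` is O(1) on average
banned_set = set(banned_ids)
flagged_fast = [e for e in events if e in banned_set]

print(flagged_slow == flagged_fast)
```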

Java & C++ Collection Mechanics

In statically typed languages, using collections requires some specific syntax:

1. Java Generics

Java collections store objects of a specific type using Generics (indicated inside < > angle brackets). You cannot use primitive types (int, char, double) directly as type arguments; instead, use their object wrappers (Integer, Character, Double):

Java

// List of Integers
List<Integer> nums = new ArrayList<>(); // <> is the diamond operator (type inferred)

// Map from String to Integer
Map<String, Integer> counts = new HashMap<>();

2. C++ Structured Bindings

When looping over maps in C++, you can unpack the key and value directly using structured bindings:

C++

std::unordered_map<std::string, int> prices = {{"bread", 40}, {"milk", 60}};

// Access key and value directly
for (const auto& [item, price] : prices) {
    std::cout << item << " costs " << price << "\n";
}

Advanced DSA Collections

When solving DSA problems, the basic List/Dict/Set are not always enough. You will constantly use these specialized collections for stack, queue, and heap operations:

1. Double-Ended Queue (Deque)

A deque allows O(1) insertion and deletion at both ends (a normal list/ArrayList takes O(N) to insert at the front because elements must shift).

Python: from collections import deque
- q.append(x) (push right), q.popleft() (pop left — essential for BFS queue).
- q.appendleft(x) (push left), q.pop() (pop right).
Java: ArrayDeque (implements both Queue and Deque interfaces)
- q.addLast(x) / q.pollFirst() (standard queue FIFO).
- q.addFirst(x) / q.pollFirst() (standard stack LIFO).
C++: std::deque (or std::queue wrapper)
- q.push_back(x), q.pop_front() (FIFO).
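To see why popleft() matters for BFS, here is a minimal sketch of a breadth-first traversal over a small hypothetical graph stored as a dict of adjacency lists. The function name and graph are illustrative.

```python
from collections import deque

def bfs_order(graph, start):
    visited = {start}              # set: O(1) "already seen?" checks
    queue = deque([start])         # deque: O(1) pops from the left
    order = []
    while queue:
        node = queue.popleft()     # FIFO: the oldest discovered node goes first
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))
```

Using a plain list with pop(0) would give the same order, but each pop would shift every remaining element.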

2. Default Dictionary

In Python, looking up a missing key in a dict raises a KeyError. defaultdict automatically creates a default value (like an empty list or integer 0) for missing keys.

Python: from collections import defaultdict

Python

# Automatically initializes missing keys to an empty list
adj_list = defaultdict(list)
adj_list["nodeA"].append("nodeB") # Works instantly without checking if key exists!

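Java/C++ equivalents: In Java, use .computeIfAbsent(), which returns the value for the key and creates it first if it is missing. (.putIfAbsent() also inserts a default, but it returns the previous value, which is null on first insert, so chaining a call on its result throws a NullPointerException.) In C++, the map bracket operator map[key] automatically default-constructs the value if it doesn't exist.

Java

// Java
adjList.computeIfAbsent("nodeA", k -> new ArrayList<>()).add("nodeB");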

C++

// C++
adj_list["nodeA"].push_back("nodeB"); // automatically creates empty vector if nodeA missing

3. Heaps / Priority Queues

A Priority Queue (or Heap) lets you peek at the minimum (or maximum) element in O(1) time; inserting an element or removing that minimum/maximum takes O(log N) time.

Python: import heapq
- heapq.heappush(heap, item)
- heapq.heappop(heap) (returns the smallest item — min-heap).
- Tip: To make a max-heap in Python, push negative values (-value).
Java: PriorityQueue<Integer> minHeap = new PriorityQueue<>();
- For max-heap: PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
C++: std::priority_queue<int> maxHeap; (default is max-heap in C++!).
- For min-heap: std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
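The Python calls above, including the negation trick, can be put together as in this sketch. The scores are sample data; heapq.heapify is a standard-library function that turns an existing list into a heap in place.

```python
import heapq

scores = [40, 90, 60, 120, 10]

# Min-heap: push items one by one; the smallest always sits at index 0
min_heap = []
for s in scores:
    heapq.heappush(min_heap, s)
smallest = min_heap[0]               # peek without removing
popped = heapq.heappop(min_heap)     # remove and return the smallest

# Max-heap via negation: store -value, and negate again on the way out
max_heap = [-s for s in scores]
heapq.heapify(max_heap)              # build a heap from an existing list
largest = -heapq.heappop(max_heap)

print(smallest, popped, largest)
```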

Nesting: collections of collections

Real data is nested — and you already know this shape from JSON:

Python

orders = [                                  # a list of dicts
    {"id": 101, "customer": "Asha", "items": ["bread", "eggs"]},
    {"id": 102, "customer": "Rahul", "items": ["milk"]},
]
print(orders[0]["items"][1])                # "eggs"

text = "to be or not to be"                 # sample input; any string works
word_count = {}                             # THE classic pattern: counting
for word in text.split():
    word_count[word] = word_count.get(word, 0) + 1

That counting loop is worth memorizing — it's the accumulator pattern from Control Flow wearing a dict, it appears in countless interview problems, and it's your first real hash table algorithm.

Common beginner mistakes

List when you meant dict. Looping through users to find an id, inside another loop — accidental O(n²) (nested-loop multiplication). Build users_by_id once; look up forever.
Mutating a list while iterating it — skipped items, subtle chaos. Build a new list with the items you keep.
The aliasing classic: b = a copies the reference (Functions) — b.append(x) changes a too. Copy explicitly: b = a.copy() (and copy.deepcopy for nested data).
dict[key] on a maybe-missing key → KeyError (Exceptions). Use .get(key, default) or check in first.
Relying on order in a set. Sets (and Java's HashMap keys) have no useful order; if order matters, you wanted a list (or sort at the end).
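Two of these mistakes are easiest to believe once you have seen them happen. A small sketch with throwaway lists:

```python
nums = [1, 2, 2, 3]

# Mistake: removing while iterating shifts items under the loop, so one 2 is skipped
buggy = nums.copy()
for n in buggy:
    if n == 2:
        buggy.remove(n)

# Fix: build a new list with the items you keep
kept = [n for n in nums if n != 2]

# Aliasing: b is a second name for the same list, not a copy
a = [1, 2]
b = a
b.append(3)        # a changes too

c = a.copy()       # an explicit (shallow) copy is independent
c.append(4)        # a is untouched

print(buggy, kept, a, c)
```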

Think it through

The counting-dict is the most reusable collections pattern there is. Derive a classic that needs it — plus one extra idea (a second pass to keep order). Think before each reveal.

Think it through: First non-repeating character (Beginner, the counting dict)

Problem: Return the first character that appears exactly once in a string. s = 'leetcode' → 'l'. s = 'aabb' → None.

1. Restate & edges: “What two things must the answer satisfy?”
2. Brute force first: “Dumbest correct solution and its cost?”
3. Find the pattern: “A counting dict gives me frequencies. But why do I still need a SECOND pass?”
4. Code the template: “Tally in pass 1, then walk the string (not the dict) in pass 2. Why the string?”
5. Cost & edge check: “Cost, and trace 'leetcode'?”
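Once you have worked through the stages yourself, one way the two-pass template could look is sketched below. The function name is an assumption; the approach is the counting dict followed by a second walk over the string.

```python
def first_unique_char(s):
    # Pass 1: tally how often each character appears
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1

    # Pass 2: walk the string, not the dict, so "first" means first by position
    for ch in s:
        if counts[ch] == 1:
            return ch
    return None  # every character repeats, or the string is empty

print(first_unique_char("leetcode"))
print(first_unique_char("aabb"))
```

Each pass touches every character once, so the cost is O(n) time, plus one dict entry per distinct character.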

Check yourself

1. Checking `x in my_list` for a million-item list, inside a loop over a million events, is slow because:

2. Why can a tuple (lat, lon) be a dict key but a list [lat, lon] cannot?

3. `b = a` where a is a list, then `b.append(5)`. What happens to a?

4. You repeatedly look up users by their id. The right collection is:

Interview perspective

Practice

Beginner

Keep a shopping list: loop reading commands add x, remove x, show, quit — implement each (list + while loop).
Given [3, 7, 3, 1, 7, 3], produce: the unique values, the count of each value, and the list reversed — pick the right collection for each.

Intermediate

Two lists: students enrolled in Math, students in Physics. Produce: in both, in either, only in Math. (Sets — then re-do it without sets and appreciate them.)
Build a gradebook dict of student → list of marks. Write functions add_mark, average(student), topper(). Handle the missing-student case both ways: .get default and try/except.

Advanced

Group a list of words into anagram families: ["eat","tea","tan","ate"] → {"aet": ["eat","tea","ate"], "ant": ["tan"]}. Hint: what key makes anagrams collide? Why must that key be a string or tuple, not a list? (You just invented the canonical-form trick — see it again in the problem bank.)

Next: File Handling — making data outlive the program.