Collections

Lists, dictionaries, sets and tuples — the four containers every program is built from, and how to choose between them.

programminglistsdictionariessetscollections

One variable, many values

A variable holds one value. Real programs deal in many: a cart of items, a class of students, a feed of posts. Collections are the containers languages provide for groups of values — and choosing the right one is the first genuinely architectural decision you'll make daily for the rest of your career.

Four shapes cover ~95% of all code:

ShapePythonJavaC++One-liner
Ordered sequencelistArrayListstd::vectorItems in a row, by position
Key → value lookupdictHashMapstd::unordered_mapA label on every value
Unique itemssetHashSetstd::unordered_setMembership, no duplicates
Fixed grouptuplerecordstd::pair/structA few values that travel together

Lists — the workhorse

An ordered, growable sequence, accessed by index (position, counting from 0):

Python
# Python
cart = ["bread", "milk"]
cart.append("eggs")          # add at the end
print(cart[0])               # "bread" — first item
print(cart[-1])              # "eggs" — negative = from the end
cart[1] = "oat milk"         # replace by index
print(len(cart))             # 3
for item in cart:            # iterate (Control Flow page)
    print(item)
Java
// Java
List<String> cart = new ArrayList<>();
cart.add("bread");
cart.add("milk");
System.out.println(cart.get(0));
cart.set(1, "oat milk");
C++
// C++
std::vector<std::string> cart = {"bread", "milk"};
cart.push_back("eggs");
std::cout << cart[0];

Slicing (Python superpower): cart[0:2] is a new list of the first two items; cart[::-1] is the reverse. Java/C++ do the same with explicit loops or library calls.

Use a list when order matters or you'll iterate everything: a feed, a queue of steps, last-10 search results.

Dictionaries — label every value

A dictionary (also called map or hash map — same idea) stores key → value pairs and answers "what's the value for this key?" essentially instantly:

Python
# Python
prices = {"bread": 40, "milk": 60, "eggs": 90}
print(prices["milk"])            # 60 — lookup by key
prices["butter"] = 120           # add or update
if "tofu" in prices:             # membership check
    print(prices["tofu"])
print(prices.get("tofu", 0))     # 0 — default instead of an exception
for item, price in prices.items():
    print(item, price)
Java
// Java
Map<String, Integer> prices = new HashMap<>();
prices.put("bread", 40);
prices.getOrDefault("tofu", 0);
C++
// C++
std::unordered_map<std::string, int> prices = {{"bread", 40}, {"milk", 60}};
prices["butter"] = 120;

The analogy: a list is lockers in a row (find by position); a dict is a phone book (find by name). Asking "is bread in this list of 1 million items?" requires checking up to all million; asking a dict takes one step regardless of size. Why that's true — hashing — is exactly hash tables in Level 2; for now, trust the speed.

Use a dict when you look things up by identity: user by id, price by product, count by word. The moment you write a loop to find something in a list repeatedly, you probably wanted a dict.

Sets — membership and uniqueness

A set holds unique values, unordered, with instant membership checks:

Python
# Python
visitors = {"asha", "rahul", "asha"}     # duplicates collapse
print(len(visitors))                     # 2
visitors.add("meera")
print("rahul" in visitors)               # True — instant, like dict keys

admins = {"asha", "meera"}
print(visitors & admins)                 # intersection: both
print(visitors - admins)                 # difference: visitors who aren't admins

Use a set for "have I seen this before?" (dedupe, visited pages, already-processed ids) and for set algebra (common followers, missing permissions). A set is essentially a dict that stores only keys.

Tuples — small fixed groups

A tuple is a small, immutable (unchangeable) bundle of values that belong together:

Python
point = (3, 7)                      # x, y — forever
x, y = point                        # unpacking
locations = {(28.6, 77.2): "Delhi"} # immutable ⇒ usable as dict keys
def min_max(nums):
    return (min(nums), max(nums))   # return two values at once

Use tuples for coordinates, RGB colors, (key, value) pairs, multiple return values. The immutability is a feature: a function can hand one out without fearing the caller will mutate it (remember aliasing). When a tuple grows fields or needs names, graduate to a class (OOP Basics) — point.x beats point[0].

Choosing: the decision in four questions

And the performance intuition to carry into Level 2 (full version: complexity cheat sheet):

Operationlistdict / set
Get by index/keyinstantinstant
x in collectionscans everything — O(n)instant — O(1)
Add at end / insert pairinstantinstant
Insert at front/middleshifts everything — O(n)

That one row — membership: O(n) vs O(1) — explains half of all real-world slow code, and half of all DSA interview optimizations.

Nesting: collections of collections

Real data is nested — and you already know this shape from JSON:

Python
orders = [                                  # a list of dicts
    {"id": 101, "customer": "Asha", "items": ["bread", "eggs"]},
    {"id": 102, "customer": "Rahul", "items": ["milk"]},
]
print(orders[0]["items"][1])                # "eggs"

word_count = {}                             # THE classic pattern: counting
for word in text.split():
    word_count[word] = word_count.get(word, 0) + 1

That counting loop is worth memorizing — it's the accumulator pattern from Control Flow wearing a dict, it appears in countless interview problems, and it's your first real hash table algorithm.

Common beginner mistakes

  • List when you meant dict. Looping through users to find an id, inside another loop — accidental O(n²) (nested-loop multiplication). Build users_by_id once; look up forever.
  • Mutating a list while iterating it — skipped items, subtle chaos. Build a new list with the items you keep.
  • The aliasing classic: b = a copies the reference (Functions) — b.append(x) changes a too. Copy explicitly: b = a.copy() (and copy.deepcopy for nested data).
  • dict[key] on a maybe-missing keyKeyError (Exceptions). Use .get(key, default) or check in first.
  • Relying on order in a set. Sets (and Java's HashMap keys) have no useful order; if order matters, you wanted a list (or sort at the end).

Interview perspective

Practice

Beginner

  1. Keep a shopping list: loop reading commands add x, remove x, show, quit — implement each (list + while loop).
  2. Given [3, 7, 3, 1, 7, 3], produce: the unique values, the count of each value, and the list reversed — pick the right collection for each.

Intermediate

  1. Two lists: students enrolled in Math, students in Physics. Produce: in both, in either, only in Math. (Sets — then re-do it without sets and appreciate them.)
  2. Build a gradebook dict of student → list of marks. Write functions add_mark, average(student), topper(). Handle the missing-student case both ways: .get default and try/except.

Advanced

  1. Group a list of words into anagram families: ["eat","tea","tan","ate"]{"aet": ["eat","tea","ate"], "ant": ["tan"]}. Hint: what key makes anagrams collide? Why must that key be a string or tuple, not a list? (You just invented the canonical-form trick — see it again in the problem bank.)

Next: File Handling — making data outlive the program.