The four standard collection types and when each one fits. The lab builds a class-roster tool: read a CSV of students, group by grade, emit a JSON summary.
Theme
A single number, a single string, a single boolean: these are values. Real programs work on collections of values. A list of guesses. A dictionary of scores. A set of unique words from a file. A tuple representing one row of data.
Python ships with four built-in collection types. Each has a shape it fits cleanly; using the wrong one is a tax that compounds for the life of the program.
- list: ordered, mutable, indexable by integer. The default "ordered sequence."
- dict: ordered (insertion order, since 3.7), mutable, indexable by hashable key. The default "lookup table."
- tuple: ordered, IMMUTABLE, indexable by integer. The default "record" or "small fixed structure."
- set: unordered, mutable, no duplicates. The default "do I have one of these?" check.
This week the lab forces you to make these choices. A class roster could be stored as a list of dicts (each dict is one student), or as a dict keyed by name (each value is a dict of attributes), or as a dict keyed by grade (each value is a list of student names). All three are correct. The choice depends on what queries you will run.
By the end of week 4 you can: pick the right collection type for a problem, use list and dict comprehensions where they clarify code, distinguish "mutating a list" from "rebinding a variable" (the famous a = b = [1,2]; b.append(3) surprise), serialize collections to and from JSON and CSV (a forward-pointer to week 5), and recognize when a list-of-lists should have been a list-of-tuples.
Reading list (~1 hour)
- Matthes, Python Crash Course 2nd ed., Ch 3 ("Introducing Lists") + Ch 4 ("Working with Lists") + Ch 6 ("Dictionaries"). Matthes spends three full chapters on lists and dicts because they are that fundamental. Read all three; the slicing examples in Ch 4 §4.4 (slicing notation) are the canonical reference.
- Sweigart, Automate the Boring Stuff with Python 2nd ed., Ch 4 ("Lists") and Ch 5 ("Dictionaries and Structuring Data") at
https://automatetheboringstuff.com/2e/chapter4/andhttps://automatetheboringstuff.com/2e/chapter5/. Free online. Sweigart's "references" section in Ch 4 is the clearest explanation of the mutable-aliasing surprise. - Allen B. Downey, Think Python 2nd ed., Ch 10 ("Lists"), Ch 11 ("Dictionaries"), Ch 12 ("Tuples") at
https://greenteapress.com/thinkpython2/html/thinkpython2011.html(and following). Downey's framing of "lists are mutable, strings are not" (and the implications) is the clearest in print. - Real Python: "Python's range() Function" at
https://realpython.com/python-range/. ~15 min read. Useful background for list comprehensions.
Lecture outline (~1.5 hours, 2 sessions of ~50 min)
Session 1: Lists and tuples
Section 1.1: Lists
- A list is an ordered sequence of values. The values can be of any type, including mixed:
numbers = [1, 2, 3, 4, 5] mixed = [1, 'hello', 3.14, True, None]
- Index from 0:
numbers[0]is1;numbers[4]is5. - Negative indexing from the end:
numbers[-1]is5;numbers[-2]is4. - Length:
len(numbers)is5. - Membership:
3 in numbersisTrue. - Mutate:
numbers[0] = 100. Append:numbers.append(6). Insert:numbers.insert(0, 0). Remove by value:numbers.remove(100). Remove by index:del numbers[0]orpopped = numbers.pop(0).
Section 1.2: Slicing
- The slice notation is
list[start:stop:step]. All three are optional. numbers[1:3]is[2, 3](start inclusive, stop exclusive; same convention asrange).numbers[:3]is[1, 2, 3].numbers[3:]is[4, 5].numbers[:]is a shallow COPY of the whole list.numbers[::-1]is the list reversed:[5, 4, 3, 2, 1].- Slicing returns a new list; it does not mutate the original. Slice assignment (
numbers[1:3] = [20, 30]) does mutate. - The same slice notation works on strings:
'hello'[:3]is'hel'.
Section 1.3: Mutability and aliasing
- A list is mutable. A variable that "holds" a list actually holds a reference to it.
- The classic surprise:
a = [1, 2, 3] b = a b.append(4) print(a) # [1, 2, 3, 4] -- a and b reference the SAME list print(a is b) # True
- To get an independent copy:
b = a.copy()orb = a[:]orb = list(a). Three idioms; all do the same thing for a flat list. - For nested lists (a list of lists),
copy()is a SHALLOW copy: the outer list is independent but the inner lists are still shared. Usecopy.deepcopy(a)for a fully independent copy. - This is the source of the
def f(x=[]):gotcha from week 3: the default[]is created once and shared across calls.
Section 1.4: Tuples
- A tuple is an immutable list. Syntax uses parentheses:
point = (3, 4). - Index, slice, iterate the same as lists:
point[0]is3;for v in point: ...works. - You cannot assign:
point[0] = 5raisesTypeError. - Why use a tuple? Two reasons: (1) immutability makes it safer to pass around (the receiver cannot mutate it under you); (2) a tuple represents a record; a small fixed-shape group of values.
(latitude, longitude).(name, grade). Use a tuple when the position of each value has a meaning. - Single-element tuples need a trailing comma:
(5,)is a tuple;(5)is just the number 5 in parens. - Tuple unpacking is a Python idiom:
x, y = pointassigns 3 to x and 4 to y. Works infor:for name, grade in pairs:.
Section 1.5: List comprehensions
- A compact syntax for "build a list by transforming another iterable":
squares = [n * n for n in range(10)] # equivalent to: squares = [] for n in range(10): squares.append(n * n)
- With a filter:
even_squares = [n * n for n in range(10) if n % 2 == 0]
- Read it left-to-right: "for each n in range(10), if n is even, compute n*n, collect into a list."
- Nested comprehensions exist but are usually a sign the loop should be unrolled. FND-102 stops at one level of comprehension. Beyond that, use explicit
for.
Session 2: Dicts and sets
Section 2.1: Dicts
- A dict (dictionary) maps keys to values. Syntax uses braces with
:between key and value:ages = {'alice': 30, 'bob': 25, 'charlie': 40}
- Look up:
ages['alice']is30. KeyError if the key is missing. - Safer look-up:
ages.get('dave')returnsNone;ages.get('dave', 0)returns0(the default). - Insert / update:
ages['dave'] = 22. Delete:del ages['bob']orages.pop('bob'). - Membership:
'alice' in agesisTrue. (Always checks KEYS, not values.) - Length:
len(ages)is the number of keys.
Section 2.2: Iterating a dict
for key in ages:iterates over keys.for value in ages.values():iterates over values.for key, value in ages.items():iterates over (key, value) pairs (tuples; unpacked in the for header).- The order is insertion order, since Python 3.7. (Earlier versions did not guarantee order.)
Section 2.3: Keys must be hashable
- Dict keys can be strings, numbers, tuples; anything hashable. Lists CANNOT be keys (they are mutable, therefore not hashable). Tuples can.
ages = {(40, 'N'): 'Madison'}is fine.ages = {[40, 'N']: 'Madison'}raisesTypeError.- This is why tuples exist: as compound keys.
Section 2.4: Dict comprehensions
- Same shape as list comprehensions, with
:for key:value:squares = {n: n * n for n in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
- Useful for "build a lookup table from a list of pairs."
Section 2.5: Sets
- A set is an unordered collection of unique hashable values. Syntax uses braces (like dicts but without
:):fruits = {'apple', 'banana', 'cherry'}
- Adding a duplicate is a no-op:
fruits.add('apple')leaves the set unchanged. - Set operations: union (
a | b), intersection (a & b), difference (a - b), symmetric difference (a ^ b). - Use a set when you want "is x in this collection?" with O(1) lookup, or to deduplicate a list:
unique = list(set(my_list))(loses order). - Empty set:
set(), NOT{}(which is an empty dict).
Section 2.6: When to use what
- "An ordered sequence I will index and mutate" -> list
- "A lookup table by key" -> dict
- "A fixed-shape record" -> tuple (or a
dataclass; you will meetdataclassin week 5 or 13) - "A bag of unique things I will membership-test" -> set
- "I want immutability so the receiver cannot mutate" -> tuple
- "I am dedup'ing a list" -> set (then convert back if you need order)
A small rule of thumb: if you find yourself with if item not in seen: seen.append(item), use a set for seen. The in check is O(n) for a list, O(1) for a set.
Labs (~90 minutes)
Lab 4: Class-Roster Grouper (labs/lab-4-class-roster.md)
- Goal: read a CSV of students; group them by grade level; emit a JSON summary
- Time: ~90 minutes
- Artifact:
lab-4-roster.py+ sample input CSV + sample output JSON in~/fnd-102/lab-4/, committed to Git
Independent practice (~4 hours)
-
Pick-the-right-collection drill (30 min). For each scenario, pick list / dict / tuple / set and write one sentence explaining why:
- You need to count word frequencies in a document
- You need to track all the (x, y) positions a chess piece has occupied
- You need to look up an employee's salary by employee ID
- You need a "playlist" of songs that the user can reorder
- You need to know if a particular IP address has been seen before
- You need to record one student's name, grade, and birthday
-
Mutability surprises (45 min). Predict the output of each block before running:
# block 1 a = [1, 2, 3] b = a a.append(4) print(b) # block 2 a = [1, 2, 3] b = a.copy() a.append(4) print(b) # block 3 a = [[1, 2], [3, 4]] b = a.copy() a[0].append(99) print(b)
Block 3 is the famous "shallow copy" surprise. Re-do it with
copy.deepcopy(a)and observe. -
List comprehension drills (45 min). Convert each
forloop to a list comprehension; check that the output is the same.# 1 result = [] for n in range(20): if n % 3 == 0: result.append(n) # 2 result = [] for word in ['hello', 'world', 'python']: result.append(word.upper()) # 3 result = [] for x in range(5): for y in range(5): if x != y: result.append((x, y))
The third is a double-loop comprehension; do it, then revert to
for. Which is more readable? -
Dict-of-list pattern (45 min). A common shape: "group items by a key." Given a list of (name, grade) tuples, produce a dict
{grade: [name, name, ...]}. Two ways:pairs = [('alice', 9), ('bob', 10), ('charlie', 9), ('dave', 11)] # Way 1: dict.setdefault grouped = {} for name, grade in pairs: grouped.setdefault(grade, []).append(name) # Way 2: collections.defaultdict from collections import defaultdict grouped = defaultdict(list) for name, grade in pairs: grouped[grade].append(name)
Implement both. Which is more readable? Which would you write if you do this once vs many times?
-
Set arithmetic (30 min). Two sets:
weekday_meals = {'monday', 'tuesday', 'wednesday', 'thursday', 'friday'} prepared = {'monday', 'wednesday', 'friday'}
Use set operations to compute: "which weekdays still need a meal planned?" and "did I plan any meals on a non-weekday?" (the latter requires a third set, the days of the week).
-
Lab 4 polish (60 min). After completing the base Lab 4: add a
--by-nameflag (you parse it manually for now) that, instead of grouping by grade, sorts the roster alphabetically by name and emits a flat list.
Reflection prompts (~30 minutes)
- Pick one collection type you used in Lab 4 and explain why you chose it over the other three. What would have changed if you used a different one?
- The mutability-aliasing surprise (block 1 in the practice above) catches every Python beginner. Did you predict the answer? If not, what was your mental model, and what is it now?
- List comprehensions are concise; explicit
forloops are clear. When does the comprehension win? When does the explicit loop? - The Lab 4 CSV format is one row per student. What would change if it were nested JSON instead? Why might the school's data team have picked CSV?
- One thing from this week you want to know more about?
Tool journal (week 4)
list: ordered, mutable sequencedict: ordered key-to-value mappingtuple: ordered, immutable recordset: unordered collection of unique values- List comprehensions:
[expr for x in iterable if cond] - Dict comprehensions:
{k: v for x in iterable} collections.defaultdict: dict that creates default values on misscopy.copy,copy.deepcopy: explicit copying
What comes next
Week 5 turns to file I/O. Your Lab 4 roster CSV was provided; week 5 teaches you to read and write CSV and JSON yourself, plus plain-text logs. The week-5 lab is a log-file scanner that filters ERROR lines from a multi-megabyte log file; the first time you work with input that does not fit comfortably in your head.