Classroom Glossary Public page

Week 4: Lists and Dictionaries (and Tuples and Sets)

1,655 words

The four standard collection types and when each one fits. The lab builds a class-roster tool: read a CSV of students, group by grade, emit a JSON summary.


Theme

A single number, a single string, a single boolean: these are values. Real programs work on collections of values. A list of guesses. A dictionary of scores. A set of unique words from a file. A tuple representing one row of data.

Python ships with four built-in collection types. Each has a shape it fits cleanly; using the wrong one is a tax that compounds for the life of the program.

  • list: ordered, mutable, indexable by integer. The default "ordered sequence."
  • dict: ordered (insertion order, since 3.7), mutable, indexable by hashable key. The default "lookup table."
  • tuple: ordered, IMMUTABLE, indexable by integer. The default "record" or "small fixed structure."
  • set: unordered, mutable, no duplicates. The default "do I have one of these?" check.

This week the lab forces you to make these choices. A class roster could be stored as a list of dicts (each dict is one student), or as a dict keyed by name (each value is a dict of attributes), or as a dict keyed by grade (each value is a list of student names). All three are correct. The choice depends on what queries you will run.

By the end of week 4 you can: pick the right collection type for a problem, use list and dict comprehensions where they clarify code, distinguish "mutating a list" from "rebinding a variable" (the famous a = b = [1,2]; b.append(3) surprise), serialize collections to and from JSON and CSV (a forward-pointer to week 5), and recognize when a list-of-lists should have been a list-of-tuples.

Reading list (~1 hour)

  1. Matthes, Python Crash Course 2nd ed., Ch 3 ("Introducing Lists") + Ch 4 ("Working with Lists") + Ch 6 ("Dictionaries"). Matthes spends three full chapters on lists and dicts because they are that fundamental. Read all three; the slicing examples in Ch 4 §4.4 (slicing notation) are the canonical reference.
  2. Sweigart, Automate the Boring Stuff with Python 2nd ed., Ch 4 ("Lists") and Ch 5 ("Dictionaries and Structuring Data") at https://automatetheboringstuff.com/2e/chapter4/ and https://automatetheboringstuff.com/2e/chapter5/. Free online. Sweigart's "references" section in Ch 4 is the clearest explanation of the mutable-aliasing surprise.
  3. Allen B. Downey, Think Python 2nd ed., Ch 10 ("Lists"), Ch 11 ("Dictionaries"), Ch 12 ("Tuples") at https://greenteapress.com/thinkpython2/html/thinkpython2011.html (and following). Downey's framing of "lists are mutable, strings are not" (and the implications) is the clearest in print.
  4. Real Python: "Python's range() Function" at https://realpython.com/python-range/. ~15 min read. Useful background for list comprehensions.

Lecture outline (~1.5 hours, 2 sessions of ~50 min)

Session 1: Lists and tuples

Section 1.1: Lists

  • A list is an ordered sequence of values. The values can be of any type, including mixed:
    numbers = [1, 2, 3, 4, 5]
    mixed = [1, 'hello', 3.14, True, None]
    
  • Index from 0: numbers[0] is 1; numbers[4] is 5.
  • Negative indexing from the end: numbers[-1] is 5; numbers[-2] is 4.
  • Length: len(numbers) is 5.
  • Membership: 3 in numbers is True.
  • Mutate: numbers[0] = 100. Append: numbers.append(6). Insert: numbers.insert(0, 0). Remove by value: numbers.remove(100). Remove by index: del numbers[0] or popped = numbers.pop(0).

Section 1.2: Slicing

  • The slice notation is list[start:stop:step]. All three are optional.
  • numbers[1:3] is [2, 3] (start inclusive, stop exclusive; same convention as range).
  • numbers[:3] is [1, 2, 3]. numbers[3:] is [4, 5]. numbers[:] is a shallow COPY of the whole list.
  • numbers[::-1] is the list reversed: [5, 4, 3, 2, 1].
  • Slicing returns a new list; it does not mutate the original. Slice assignment (numbers[1:3] = [20, 30]) does mutate.
  • The same slice notation works on strings: 'hello'[:3] is 'hel'.

Section 1.3: Mutability and aliasing

  • A list is mutable. A variable that "holds" a list actually holds a reference to it.
  • The classic surprise:
    a = [1, 2, 3]
    b = a
    b.append(4)
    print(a)   # [1, 2, 3, 4]  -- a and b reference the SAME list
    print(a is b)  # True
    
  • To get an independent copy: b = a.copy() or b = a[:] or b = list(a). Three idioms; all do the same thing for a flat list.
  • For nested lists (a list of lists), copy() is a SHALLOW copy: the outer list is independent but the inner lists are still shared. Use copy.deepcopy(a) for a fully independent copy.
  • This is the source of the def f(x=[]): gotcha from week 3: the default [] is created once and shared across calls.

Section 1.4: Tuples

  • A tuple is an immutable list. Syntax uses parentheses: point = (3, 4).
  • Index, slice, iterate the same as lists: point[0] is 3; for v in point: ... works.
  • You cannot assign: point[0] = 5 raises TypeError.
  • Why use a tuple? Two reasons: (1) immutability makes it safer to pass around (the receiver cannot mutate it under you); (2) a tuple represents a record; a small fixed-shape group of values. (latitude, longitude). (name, grade). Use a tuple when the position of each value has a meaning.
  • Single-element tuples need a trailing comma: (5,) is a tuple; (5) is just the number 5 in parens.
  • Tuple unpacking is a Python idiom: x, y = point assigns 3 to x and 4 to y. Works in for: for name, grade in pairs:.

Section 1.5: List comprehensions

  • A compact syntax for "build a list by transforming another iterable":
    squares = [n * n for n in range(10)]
    # equivalent to:
    squares = []
    for n in range(10):
        squares.append(n * n)
    
  • With a filter:
    even_squares = [n * n for n in range(10) if n % 2 == 0]
    
  • Read it left-to-right: "for each n in range(10), if n is even, compute n*n, collect into a list."
  • Nested comprehensions exist but are usually a sign the loop should be unrolled. FND-102 stops at one level of comprehension. Beyond that, use explicit for.

Session 2: Dicts and sets

Section 2.1: Dicts

  • A dict (dictionary) maps keys to values. Syntax uses braces with : between key and value:
    ages = {'alice': 30, 'bob': 25, 'charlie': 40}
    
  • Look up: ages['alice'] is 30. KeyError if the key is missing.
  • Safer look-up: ages.get('dave') returns None; ages.get('dave', 0) returns 0 (the default).
  • Insert / update: ages['dave'] = 22. Delete: del ages['bob'] or ages.pop('bob').
  • Membership: 'alice' in ages is True. (Always checks KEYS, not values.)
  • Length: len(ages) is the number of keys.

Section 2.2: Iterating a dict

  • for key in ages: iterates over keys.
  • for value in ages.values(): iterates over values.
  • for key, value in ages.items(): iterates over (key, value) pairs (tuples; unpacked in the for header).
  • The order is insertion order, since Python 3.7. (Earlier versions did not guarantee order.)

Section 2.3: Keys must be hashable

  • Dict keys can be strings, numbers, tuples; anything hashable. Lists CANNOT be keys (they are mutable, therefore not hashable). Tuples can.
  • ages = {(40, 'N'): 'Madison'} is fine. ages = {[40, 'N']: 'Madison'} raises TypeError.
  • This is why tuples exist: as compound keys.

Section 2.4: Dict comprehensions

  • Same shape as list comprehensions, with : for key:value:
    squares = {n: n * n for n in range(5)}
    # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
    
  • Useful for "build a lookup table from a list of pairs."

Section 2.5: Sets

  • A set is an unordered collection of unique hashable values. Syntax uses braces (like dicts but without :):
    fruits = {'apple', 'banana', 'cherry'}
    
  • Adding a duplicate is a no-op: fruits.add('apple') leaves the set unchanged.
  • Set operations: union (a | b), intersection (a & b), difference (a - b), symmetric difference (a ^ b).
  • Use a set when you want "is x in this collection?" with O(1) lookup, or to deduplicate a list: unique = list(set(my_list)) (loses order).
  • Empty set: set(), NOT {} (which is an empty dict).

Section 2.6: When to use what

  • "An ordered sequence I will index and mutate" -> list
  • "A lookup table by key" -> dict
  • "A fixed-shape record" -> tuple (or a dataclass; you will meet dataclass in week 5 or 13)
  • "A bag of unique things I will membership-test" -> set
  • "I want immutability so the receiver cannot mutate" -> tuple
  • "I am dedup'ing a list" -> set (then convert back if you need order)

A small rule of thumb: if you find yourself with if item not in seen: seen.append(item), use a set for seen. The in check is O(n) for a list, O(1) for a set.

Labs (~90 minutes)

Lab 4: Class-Roster Grouper (labs/lab-4-class-roster.md)

  • Goal: read a CSV of students; group them by grade level; emit a JSON summary
  • Time: ~90 minutes
  • Artifact: lab-4-roster.py + sample input CSV + sample output JSON in ~/fnd-102/lab-4/, committed to Git

Independent practice (~4 hours)

  1. Pick-the-right-collection drill (30 min). For each scenario, pick list / dict / tuple / set and write one sentence explaining why:

    • You need to count word frequencies in a document
    • You need to track all the (x, y) positions a chess piece has occupied
    • You need to look up an employee's salary by employee ID
    • You need a "playlist" of songs that the user can reorder
    • You need to know if a particular IP address has been seen before
    • You need to record one student's name, grade, and birthday
  2. Mutability surprises (45 min). Predict the output of each block before running:

    # block 1
    a = [1, 2, 3]
    b = a
    a.append(4)
    print(b)
    
    # block 2
    a = [1, 2, 3]
    b = a.copy()
    a.append(4)
    print(b)
    
    # block 3
    a = [[1, 2], [3, 4]]
    b = a.copy()
    a[0].append(99)
    print(b)
    

    Block 3 is the famous "shallow copy" surprise. Re-do it with copy.deepcopy(a) and observe.

  3. List comprehension drills (45 min). Convert each for loop to a list comprehension; check that the output is the same.

    # 1
    result = []
    for n in range(20):
        if n % 3 == 0:
            result.append(n)
    
    # 2
    result = []
    for word in ['hello', 'world', 'python']:
        result.append(word.upper())
    
    # 3
    result = []
    for x in range(5):
        for y in range(5):
            if x != y:
                result.append((x, y))
    

    The third is a double-loop comprehension; do it, then revert to for. Which is more readable?

  4. Dict-of-list pattern (45 min). A common shape: "group items by a key." Given a list of (name, grade) tuples, produce a dict {grade: [name, name, ...]}. Two ways:

    pairs = [('alice', 9), ('bob', 10), ('charlie', 9), ('dave', 11)]
    
    # Way 1: dict.setdefault
    grouped = {}
    for name, grade in pairs:
        grouped.setdefault(grade, []).append(name)
    
    # Way 2: collections.defaultdict
    from collections import defaultdict
    grouped = defaultdict(list)
    for name, grade in pairs:
        grouped[grade].append(name)
    

    Implement both. Which is more readable? Which would you write if you do this once vs many times?

  5. Set arithmetic (30 min). Two sets:

    weekday_meals = {'monday', 'tuesday', 'wednesday', 'thursday', 'friday'}
    prepared = {'monday', 'wednesday', 'friday'}
    

    Use set operations to compute: "which weekdays still need a meal planned?" and "did I plan any meals on a non-weekday?" (the latter requires a third set, the days of the week).

  6. Lab 4 polish (60 min). After completing the base Lab 4: add a --by-name flag (you parse it manually for now) that, instead of grouping by grade, sorts the roster alphabetically by name and emits a flat list.

Reflection prompts (~30 minutes)

  1. Pick one collection type you used in Lab 4 and explain why you chose it over the other three. What would have changed if you used a different one?
  2. The mutability-aliasing surprise (block 1 in the practice above) catches every Python beginner. Did you predict the answer? If not, what was your mental model, and what is it now?
  3. List comprehensions are concise; explicit for loops are clear. When does the comprehension win? When does the explicit loop?
  4. The Lab 4 CSV format is one row per student. What would change if it were nested JSON instead? Why might the school's data team have picked CSV?
  5. One thing from this week you want to know more about?

Tool journal (week 4)

  • list: ordered, mutable sequence
  • dict: ordered key-to-value mapping
  • tuple: ordered, immutable record
  • set: unordered collection of unique values
  • List comprehensions: [expr for x in iterable if cond]
  • Dict comprehensions: {k: v for x in iterable}
  • collections.defaultdict: dict that creates default values on miss
  • copy.copy, copy.deepcopy: explicit copying

What comes next

Week 5 turns to file I/O. Your Lab 4 roster CSV was provided; week 5 teaches you to read and write CSV and JSON yourself, plus plain-text logs. The week-5 lab is a log-file scanner that filters ERROR lines from a multi-megabyte log file; the first time you work with input that does not fit comfortably in your head.