Lab 4: Class-Roster Grouper

~90 minutes. Read a CSV of students; group them by grade; emit a JSON summary. Practice lists, dicts, tuples, the CSV module, and the JSON module.


Goal: ship a Python CLI tool that reads a CSV like students.csv and writes a JSON summary that groups students by grade level.

Estimated time: 90 minutes

Prerequisites: Week 4 lecture (collections, comprehensions). Labs 1-3 complete.


Setup

mkdir -p ~/fnd-102/lab-4
cd ~/fnd-102/lab-4

Create a sample students.csv in a text editor. Comma-separated is the lab default; a tab-separated file can also be read, but only if you pass delimiter='\t' to the reader.

name,grade,homeroom
Alice Anderson,9,A101
Bob Brown,10,B202
Charlie Chen,9,A101
Dana Davis,11,C303
Eve Edwards,10,B202
Fred Foster,9,A102
Grace Green,11,C304
Henry Hall,12,D401

Aim for 20 rows for the lab (you can generate them or write them by hand). The CSV must have a header row and at least three columns: name, grade, and one other field of your choice.

Then create lab-4-roster.py.


Part A: Read the CSV (20 min)

Python's standard library has a csv module that parses CSV correctly (handling quoted fields, commas inside quotes, etc.). The minimum read pattern:

import csv

def read_roster(path):
    """Read a roster CSV and return a list of student dicts."""
    students = []
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            students.append(row)
    return students

csv.DictReader reads the first row as field names and yields each subsequent row as a dict keyed by those field names. So row['name'] is the student's name; row['grade'] is the grade level (as a string; you will convert to int later).
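To see what DictReader yields without touching the filesystem, the sketch below feeds it an in-memory string through io.StringIO. The sample text and variable names are illustrative only, not part of the lab files.

```python
import csv
import io

# Two-row sample in the same shape as students.csv (illustrative data).
sample = "name,grade,homeroom\nAlice Anderson,9,A101\nBob Brown,10,B202\n"

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

print(reader.fieldnames)        # column names taken from the header row
print(rows[0]['name'])          # Alice Anderson
print(type(rows[0]['grade']))   # grade is still a str at this point
```

The header row is consumed as field names, so rows holds only the two data rows.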

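The newline='' argument to open() is important: it tells open() not to translate line endings, so the csv module can handle them itself. Without it, line breaks embedded inside quoted fields are not interpreted correctly, which shows up most often with CSVs authored on Windows (\r\n line endings).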
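One observable effect of newline='' can be checked with a small experiment. The sketch below writes a Windows-style file containing a quoted field with an embedded line break, then reads it with and without the argument. The file name demo.csv and its contents are made up for the demonstration.

```python
import csv
import os
import tempfile

# A Windows-style file whose first data field contains an embedded line break.
raw = b'name,grade\r\n"Ann\r\nLee",9\r\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    with open(path, 'wb') as f:
        f.write(raw)

    # With newline='': the csv module sees the original line endings.
    with open(path, newline='', encoding='utf-8') as f:
        with_arg = list(csv.reader(f))

    # Without it: open() translates \r\n to \n before csv sees the text.
    with open(path, encoding='utf-8') as f:
        without_arg = list(csv.reader(f))

print(repr(with_arg[1][0]))      # embedded \r\n preserved
print(repr(without_arg[1][0]))   # embedded break rewritten to \n
```

Both reads find the same number of rows here; the difference is that the field's content is silently altered when the argument is missing.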

Test by adding at the bottom of your script:

if __name__ == '__main__':
    students = read_roster('students.csv')
    print(f'Read {len(students)} students.')
    print('First student:', students[0])

Run python3 lab-4-roster.py. With the 8-row sample above you should see two lines: Read 8 students. and First student: {'name': 'Alice Anderson', 'grade': '9', 'homeroom': 'A101'}. With your own 20-row file the count will be 20.


Part B: Group by grade (25 min)

Write a function that takes the student list and returns a dict keyed by grade, where each value is the list of student names in that grade:

def group_by_grade(students):
    """Group students by grade level. Return {grade_int: [name, ...]}."""
    grouped = {}
    for student in students:
        grade = int(student['grade'])
        grouped.setdefault(grade, []).append(student['name'])
    return grouped

The setdefault idiom means "if the key is not in the dict, create it with this default value, then return the value at the key." It is equivalent to:

if grade not in grouped:
    grouped[grade] = []
grouped[grade].append(student['name'])

A defaultdict(list) from the collections module is a third equivalent. Use whichever reads most naturally to you.
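For comparison, here is a minimal sketch of the defaultdict variant. The function name group_by_grade_dd and the three-student sample are hypothetical; the logic mirrors group_by_grade.

```python
from collections import defaultdict

def group_by_grade_dd(students):
    """Same grouping as group_by_grade, using defaultdict(list)."""
    grouped = defaultdict(list)
    for student in students:
        # A missing grade key is created as an empty list on first access.
        grouped[int(student['grade'])].append(student['name'])
    return dict(grouped)  # plain dict, so later lookups cannot create keys

sample = [
    {'name': 'Alice Anderson', 'grade': '9'},
    {'name': 'Bob Brown', 'grade': '10'},
    {'name': 'Charlie Chen', 'grade': '9'},
]
print(group_by_grade_dd(sample))
# {9: ['Alice Anderson', 'Charlie Chen'], 10: ['Bob Brown']}
```

Converting back to a plain dict before returning is a design choice, not a requirement: it keeps a later typo such as grouped[13] from quietly adding an empty bucket.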

Notice the int conversion: grade = int(student['grade']). CSV fields are always strings; converting at the boundary is good practice (you want to compare grade 9 to grade 10 numerically, not lexicographically).
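The difference between numeric and lexicographic ordering is easy to check in a few lines. The list below is illustrative:

```python
grades_as_text = ['9', '10', '11', '12']

# Strings compare character by character, so '1...' sorts before '9'.
print(sorted(grades_as_text))            # ['10', '11', '12', '9']

# Converting with int gives the order a person would expect.
print(sorted(grades_as_text, key=int))   # ['9', '10', '11', '12']

print('9' < '10', 9 < 10)                # False True
```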

Test:

if __name__ == '__main__':
    students = read_roster('students.csv')
    grouped = group_by_grade(students)
    print(grouped)
    # Expected: {9: ['Alice Anderson', 'Charlie Chen', 'Fred Foster'], 10: ['Bob Brown', 'Eve Edwards'], 11: ['Dana Davis', 'Grace Green'], 12: ['Henry Hall']}

Part C: Emit JSON (15 min)

The json module serializes Python dicts and lists to JSON:

import json

def write_summary(grouped, path):
    """Write the grouped roster as a JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(grouped, f, indent=2, sort_keys=True)

Two arguments worth knowing:

  • indent=2 produces pretty-printed JSON with 2-space indentation. Without it, you get a single line of JSON (compact but hard to read). For human-facing files, always indent.
  • sort_keys=True sorts the dict's keys before writing. The sort runs on the keys as they are in the dict, so the int grades from Part B come out in numeric order (9, 10, 11, 12); only then does json.dump turn them into strings, because JSON requires string keys. Had you left the grades as strings, the order would be '10', '11', '12', '9'. Restoring numeric order in that case (convert the keys to int before dumping, or build the output from a sorted list of keys) is a stretch exercise.
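A quick way to check how sort_keys orders grade keys is to dump the same data once with int keys and once with string keys. This is a sketch against CPython's standard json module, where the sort is applied to the dict's own keys before they are converted to strings. The sample dicts are illustrative.

```python
import json

int_keys = {10: ['Bob Brown'], 9: ['Alice Anderson'], 12: ['Henry Hall']}
str_keys = {str(k): v for k, v in int_keys.items()}

# int keys are sorted numerically, then written as strings.
print(json.dumps(int_keys, sort_keys=True))
# {"9": ["Alice Anderson"], "10": ["Bob Brown"], "12": ["Henry Hall"]}

# str keys are sorted lexicographically.
print(json.dumps(str_keys, sort_keys=True))
# {"10": ["Bob Brown"], "12": ["Henry Hall"], "9": ["Alice Anderson"]}
```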

Wire it up:

if __name__ == '__main__':
    students = read_roster('students.csv')
    grouped = group_by_grade(students)
    write_summary(grouped, 'roster-summary.json')
    print(f'Wrote roster-summary.json with {len(grouped)} grade buckets.')

Run and inspect:

python3 lab-4-roster.py
cat roster-summary.json

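You should see indented JSON like this (json.dump with indent=2 puts each name on its own line; the lists are condensed here to save space):

{
  "9": ["Alice Anderson", "Charlie Chen", "Fred Foster"],
  "10": ["Bob Brown", "Eve Edwards"],
  "11": ["Dana Davis", "Grace Green"],
  "12": ["Henry Hall"]
}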

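Notice: even though your dict keys were ints, JSON converts them to strings (JSON requires string keys). This is fine for the lab. The default= argument to json.dump, covered in week 5, customizes how otherwise unsupported values are serialized; it does not change key conversion, so code that reads the file back must convert the keys with int() if it needs numbers.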
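The key conversion is easiest to see in a round trip. The sketch below uses json.dumps and json.loads on a small illustrative dict and shows one way to restore int keys after loading:

```python
import json

grouped = {9: ['Alice Anderson'], 10: ['Bob Brown']}

text = json.dumps(grouped)      # int keys are written as "9" and "10"
loaded = json.loads(text)       # keys come back as str
restored = {int(k): v for k, v in loaded.items()}

print(list(loaded))     # ['9', '10']
print(list(restored))   # [9, 10]
```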


Part D: Add some statistics (15 min)

Extend the summary so each grade bucket includes a count, not just a list:

def group_by_grade_with_counts(students):
    """Group students by grade level. Return {grade: {'count': N, 'students': [name, ...]}}."""
    grouped = {}
    for student in students:
        grade = int(student['grade'])
        if grade not in grouped:
            grouped[grade] = {'count': 0, 'students': []}
        grouped[grade]['count'] += 1
        grouped[grade]['students'].append(student['name'])
    return grouped

The output JSON becomes:

{
  "9": {
    "count": 3,
    "students": ["Alice Anderson", "Charlie Chen", "Fred Foster"]
  },
  "10": { ... }
}

You could instead compute each count as len(grouped[grade]['students']) after the loop, but accumulating during the loop is the more general pattern (useful when you compute multiple statistics).
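For comparison, the after-the-loop alternative can be sketched as a small helper that takes the Part B result and derives each count from the length of its name list. The helper name add_counts is hypothetical.

```python
def add_counts(grouped_names):
    """Turn {grade: [name, ...]} into {grade: {'count': N, 'students': [...]}}."""
    return {
        grade: {'count': len(names), 'students': names}
        for grade, names in grouped_names.items()
    }

names_by_grade = {
    9: ['Alice Anderson', 'Charlie Chen', 'Fred Foster'],
    12: ['Henry Hall'],
}
summary = add_counts(names_by_grade)
print(summary[9]['count'], summary[12]['count'])   # 3 1
```

This works because the count is fully determined by the list. A statistic that is not, such as a per-homeroom tally, still needs the accumulate-in-the-loop pattern.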


Part E: Commit your work (15 min)

cd ~/fnd-102/lab-4
git add lab-4-roster.py students.csv roster-summary.json
git commit -m "lab-4: class-roster grouper reads CSV, groups by grade, writes JSON summary"

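If you made the first commit before starting Part D, commit the statistics extension separately; small, focused commits are a good habit. Include the regenerated summary, since its contents change:

git add lab-4-roster.py roster-summary.json
git commit -m "lab-4: extend summary with per-grade count"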

Expected output / artifact

lab-4-roster.py should:

  • Read students.csv (in the lab directory)
  • Group students by grade level
  • Write roster-summary.json with per-grade counts and name lists
  • Use csv.DictReader and json.dump (not custom CSV parsing or string concatenation for JSON)

Files committed: lab-4-roster.py, students.csv, roster-summary.json.


What's the failure mode?

This tool's likely failure modes:

  1. CSV with embedded commas in a name. "Anderson, Alice" as a single field requires the name to be quoted: "Anderson, Alice",9,A101. csv.DictReader handles this correctly if the quoting is correct. Test by adding a row with a quoted comma-containing name; verify your output.
  2. Missing grade column. If the CSV header has grade_level instead of grade, student['grade'] raises KeyError. Defensive: print the header row's field names at startup to verify.
  3. Non-numeric grade. A row such as Alice Anderson,9th,A101 (typo: '9th' instead of '9') raises ValueError at int(student['grade']). The week 9 fix is try/except around the conversion plus skip-with-warning; for this week, a crash is acceptable.
  4. Empty CSV. A file with only a header row produces an empty students list; the output JSON is {}. Not a bug; the behavior is correct, but worth verifying.
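The four cases above can be reproduced in memory before you try them on real files. This is a minimal sketch: the helper load_rows and the REQUIRED set are assumptions made for the demonstration, not part of the lab's required code, and it raises an exception where a finished tool might print an error and exit.

```python
import csv
import io

REQUIRED = {'name', 'grade'}  # assumed required columns for this sketch

def load_rows(text):
    """Parse CSV text; raise ValueError if a required column is missing."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is None for completely empty input, hence the "or []".
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    return list(reader)

# 1. A quoted comma inside a name stays in one field.
rows = load_rows('name,grade,homeroom\n"Anderson, Alice",9,A101\n')
print(rows[0]['name'])                 # Anderson, Alice

# 2. A header with grade_level instead of grade is reported up front.
try:
    load_rows('name,grade_level,homeroom\nBob Brown,10,B202\n')
except ValueError as exc:
    print(exc)                         # missing columns: ['grade']

# 3. A non-numeric grade makes int() raise ValueError.
try:
    int('9th')
except ValueError as exc:
    print('bad grade:', exc)

# 4. Header-only input yields an empty list.
empty = load_rows('name,grade,homeroom\n')
print(empty)                           # []
```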

Common pitfalls

  • Forgetting newline='' in open(). Simple files usually still read correctly, but line breaks inside quoted fields get rewritten, and CSVs written on Windows without it can show a blank row after every record. Always include it for CSV files.
  • Iterating the DictReader twice. csv.DictReader is a one-shot iterator. After the first for row in reader: exhausts it, a second iteration sees nothing. If you need to walk the data twice, collect into a list first: rows = list(reader).
  • Comparing string grades. '9' < '10' is False in Python (lexicographic order: '1' < '9'). Always convert to int at the boundary.
  • json.dumps vs json.dump. dumps returns a string (no file); dump writes to a file object. Easy to swap accidentally. The 's' is for "string."
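Two of these pitfalls can be watched in action with in-memory data. In the sketch below, io.StringIO stands in for an open file, and the sample rows are illustrative:

```python
import csv
import io
import json

# One-shot iterator: the second pass over the same reader finds nothing.
reader = csv.DictReader(io.StringIO('name,grade\nAlice Anderson,9\nBob Brown,10\n'))
first_pass = [row['name'] for row in reader]
second_pass = [row['name'] for row in reader]
print(first_pass)    # ['Alice Anderson', 'Bob Brown']
print(second_pass)   # []

# dumps returns a string; dump writes to a file object and returns None.
data = {9: ['Alice Anderson']}
as_text = json.dumps(data)
buffer = io.StringIO()
result = json.dump(data, buffer)
print(as_text == buffer.getvalue(), result)   # True None
```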

Stretch (optional)

  1. Sort the names within each grade bucket alphabetically by last name. Hint: sorted(names, key=lambda n: n.split()[-1]).
  2. Read from stdin instead of a file. Allow python3 lab-4-roster.py < students.csv > summary.json. Use sys.stdin and sys.stdout instead of open(). Forward-pointer to Unix pipeline composition (week 9 subprocess).
  3. Add a --top-N argument (manual sys.argv parsing for now) that emits, instead of the full grouping, a list of the top N students per grade (alphabetically). Forward-pointer to week 6 argparse.
  4. Validate the CSV. Before processing, check that the required columns (name, grade) are present; print a clear error and exit nonzero if not. Practitioner habit; catches typos in the input.

Lab 4 v0.1.