Week 9: Subprocess and pdb

Drive other programs from Python. Debug your own programs with pdb. The lab wraps the Unix du utility via subprocess and includes a debugging exercise that plants a subtle bug for you to find.


Theme

Two skills this week. They look unrelated but share a theme: this is the week Python becomes a connector to the rest of your system.

The first skill is subprocess: running another program from Python and reading its output. Most Python automation is glue between existing tools. Want disk usage? Wrap du. Want git status? Wrap git status. Want to convert a video? Wrap ffmpeg. Reaching for subprocess is the right move whenever a Unix utility (or any command-line program) already does the work better than you could write it. The discipline is in handling the result correctly: exit codes, stdout, stderr, errors.

The second skill is pdb, the standard Python debugger. Until this week you have used print-debugging exclusively, and it has been fine for the problems you have faced. This week's lab plants a bug subtle enough that print-debugging is genuinely worse than pdb: the bug involves a wrong value computed somewhere deep in a function chain, and finding it with prints requires sprinkling the whole call tree with print(name, val) lines. With pdb you set a breakpoint, run, inspect, step.

You also pick up try/except properly this week, completing what week 5 introduced as a forward-pointer. By the end of week 9 you can: launch a subprocess and read its output safely; recognize the shell-injection risk and avoid shell=True; set a pdb breakpoint and use the four core commands (n, s, c, p); read a Python traceback root-up; catch specific exceptions and let unexpected ones propagate.

Reading list (~1 hour)

  1. Matthes, Python Crash Course 2nd ed., Ch 10.4 ("Exceptions"). Matthes covers try/except/else/finally and the common exception classes. FND-102 week 9 is where you finally get this in depth.
  2. Sweigart, Automate the Boring Stuff with Python 2nd ed., Ch 11 ("Debugging") at https://automatetheboringstuff.com/2e/chapter11/. Free online. Covers tracebacks, assertions, and logging-as-debugging. Sweigart skips pdb in favor of IDE debuggers; FND-102 teaches pdb because every Linux server you SSH into has it.
  3. Python subprocess module docs at https://docs.python.org/3/library/subprocess.html. Read at least the "Using the subprocess Module" section and the subprocess.run reference. ~25 min.
  4. Python pdb module docs at https://docs.python.org/3/library/pdb.html. Skim the command reference. ~15 min. The full four-command vocabulary you need is in the lecture below.
  5. Real Python: "Python Debugging With Pdb" at https://realpython.com/python-debugging-pdb/. ~25 min read. Worked examples; the only Real Python article on pdb worth a careful read.

Lecture outline (~1.5 hours, 2 sessions of ~50 min)

Session 1: subprocess and try/except

Section 1.1: subprocess.run, the safe default

  • The minimal pattern:
    import subprocess
    result = subprocess.run(['ls', '-la'], capture_output=True, text=True)
    print(result.stdout)
    print('exit code:', result.returncode)
    
  • Arguments:
    • First argument is a LIST of strings (the command and its arguments). NOT a single string.
    • capture_output=True captures stdout and stderr instead of letting them inherit the parent's terminal.
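    • text=True decodes stdout/stderr as text instead of bytes, using the locale's preferred encoding (UTF-8 on most modern Linux and macOS systems); pass encoding='utf-8' to be explicit. Without it, result.stdout is bytes and you must .decode() manually.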
  • Return value is a CompletedProcess object with attributes:
    • result.returncode: the exit status (0 = success)
    • result.stdout: captured stdout as a string
    • result.stderr: captured stderr as a string
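
To see what text=True changes, the sketch below runs the same child command twice and compares the type of the captured output. It uses sys.executable (the running Python interpreter) as the child only so that the example behaves the same on Linux, macOS, and Windows; that choice and the name cmd are illustrative assumptions, not part of the lab.

```python
import subprocess
import sys

# Child command: the current Python interpreter printing one line.
# sys.executable is used only so the example runs on any OS.
cmd = [sys.executable, '-c', 'print("hello from the child")']

# Without text=True, captured output is bytes.
raw = subprocess.run(cmd, capture_output=True)
print(type(raw.stdout))

# With text=True, captured output is str.
result = subprocess.run(cmd, capture_output=True, text=True)
print(type(result.stdout))
print(result.stdout.strip())
print('exit code:', result.returncode)
```

Both calls return a CompletedProcess; only the type of stdout and stderr differs.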

Section 1.2: shell=True is dangerous

  • The convenient form:
    result = subprocess.run('ls -la ' + user_input, shell=True, capture_output=True, text=True)
    
  • The danger: if user_input is '; rm -rf ~', the shell happily runs ls -la ; rm -rf ~. This is a shell injection vulnerability. It is the textbook example, but real software is still compromised this way regularly.
  • The safe form: pass arguments as a list. The OS does NOT pass them through a shell; the shell metacharacters in user_input are just data.
    result = subprocess.run(['ls', '-la', user_input], capture_output=True, text=True)
    
  • Rule: do not use shell=True with any string that includes user input. Better rule: do not use shell=True. Almost every use case has a list-form equivalent.
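
A harmless way to check the "just data" claim is to pass a string full of shell metacharacters to a child that prints its first argument back. This is a minimal sketch: the child is the current Python interpreter (an assumption made for portability), and user_input stands in for whatever a real user might type.

```python
import subprocess
import sys

# Hypothetical hostile input; harmless here because no shell ever sees it.
user_input = 'hello; echo INJECTED'

# The child just prints its first argument back.
child = 'import sys; print(sys.argv[1])'

# List form: no shell parses the string, so ';' is ordinary data.
result = subprocess.run(
    [sys.executable, '-c', child, user_input],
    capture_output=True,
    text=True,
)
echoed = result.stdout.rstrip('\r\n')
print(repr(echoed))
```

The child receives the whole string, semicolon included, as one argument. The list form removes the shell from the picture, but the program still parses its own arguments: input that begins with - can be read as an option. Many Unix tools accept -- to mark the end of options.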

Section 1.3: Exit codes

  • Convention: 0 means success, nonzero means failure.
  • Check explicitly:
    result = subprocess.run(['ls', '/nope'], capture_output=True, text=True)
    if result.returncode != 0:
        print('ls failed:', result.stderr)
    
  • Or use check=True to raise CalledProcessError on nonzero exit:
    try:
        result = subprocess.run(['ls', '/nope'], capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as e:
        print('ls failed:', e.stderr)
    
  • The check=True pattern matches the Pythonic "raise on error" style; the explicit-check pattern matches Unix script style. Pick the one that fits your tool.
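
One case neither pattern above covers is a command that does not exist at all: no process starts, so there is no exit code, and subprocess.run raises FileNotFoundError instead. The helper below is a sketch that handles both failures; run_checked is a hypothetical name, and the three sample commands are placeholders chosen to run on any OS.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd (a list of strings); return its stdout, or None on failure.

    run_checked is a hypothetical helper name used for illustration.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # The executable could not be found; no process ever started.
        print(f'command not found: {cmd[0]}', file=sys.stderr)
        return None
    except subprocess.CalledProcessError as e:
        # The process ran and exited nonzero.
        print(f'{cmd[0]} exited with {e.returncode}: {e.stderr.strip()}',
              file=sys.stderr)
        return None
    return result.stdout


ok = run_checked([sys.executable, '-c', 'print("fine")'])
failed = run_checked([sys.executable, '-c', 'import sys; sys.exit(3)'])
missing = run_checked(['no-such-program-fnd102'])
print(ok, failed, missing)
```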

Section 1.4: try / except in depth

  • The basic pattern:
    try:
        value = int(user_input)
    except ValueError as e:
        print(f'not a number: {e}')
        value = 0
    
  • Catch the SPECIFIC exception class. except: (bare) catches everything including KeyboardInterrupt and SystemExit, and hides real bugs. Always name the class.
  • Multiple exception types:
    try:
        ...
    except (FileNotFoundError, PermissionError) as e:
        print(f'cannot open file: {e}')
    
  • try / except / else / finally:
    • else runs only if try succeeded (no exception)
    • finally runs always (success or exception); useful for cleanup
  • The "EAFP" idiom: Easier to Ask Forgiveness than Permission. Pythonic style prefers try: x[k]; except KeyError: default over if k in x: x[k]; else: default. Both work; EAFP wins when the missing-key case is genuinely exceptional.
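
The order in which the four clauses run is easier to remember after watching it once. The sketch below records which clauses execute for a good and a bad input; parse_number and the events list are illustrative names only.

```python
def parse_number(text):
    """Return (value, events), where events lists the clauses that ran.

    parse_number and events are illustrative names.
    """
    events = []
    try:
        value = int(text)
    except ValueError:
        # Runs only when int() raised ValueError.
        events.append('except')
        value = None
    else:
        # Runs only when the try body finished without an exception.
        events.append('else')
    finally:
        # Runs in both cases, after except or else.
        events.append('finally')
    return value, events


print(parse_number('8080'))   # (8080, ['else', 'finally'])
print(parse_number('http'))   # (None, ['except', 'finally'])
```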

Session 2: pdb

Section 2.1: The four commands you need

  • Set a breakpoint by inserting one line in your code:
    import pdb; pdb.set_trace()
    # OR, in Python 3.7+:
    breakpoint()
    
  • When execution hits this line, you get an interactive prompt:
    (Pdb)
    
  • Four commands handle 95% of debugging:
    • n (next): execute the current line; if it is a function call, do NOT step into it
    • s (step): execute the current line; if it is a function call, DO step into it
    • c (continue): resume normal execution until the next breakpoint or program end
    • p variable (print): print the current value of variable

Section 2.2: Inspecting state

  • p variable prints the value. pp variable pretty-prints (for nested structures).
  • l (list) shows the source around the current line.
  • w (where) prints the call stack so you can see how you got here.
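  • Type any Python expression at the (Pdb) prompt to evaluate it. p len(my_list) prints the length. p sum(my_dict.values()) prints the sum. If a variable name collides with a pdb command (n, s, c, l, and so on), pdb runs the command instead; use p n or prefix the line with ! to reach the variable.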
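
In class you type these commands at the prompt. To make a session reproducible on the page, the sketch below feeds the same commands to pdb from a string and captures what pdb prints. pdb.Pdb accepts stdin and stdout arguments and runcall is part of its interface; the average function and the particular command sequence are assumptions chosen for illustration.

```python
import io
import pdb


def average(nums):
    total = sum(nums)
    count = len(nums)
    return total / count + 1  # planted bug: spurious +1


# Commands you would normally type at the (Pdb) prompt, one per line:
# step over two lines, print two variables, then continue.
commands = io.StringIO('n\nn\np total\np count\nc\n')
transcript = io.StringIO()

# Feed pdb from the string instead of the keyboard and capture its output.
# runcall() stops at the first line of the called function.
debugger = pdb.Pdb(stdin=commands, stdout=transcript,
                   nosigint=True, readrc=False)
result = debugger.runcall(average, [1, 2, 3, 4, 5])

print(transcript.getvalue())
print('returned:', result)
```

The transcript shows pdb stopping at each line, then the values of total (15) and count (5). With those two numbers in hand, a return value of 4.0 instead of 3.0 points straight at the return line.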

Section 2.3: Setting breakpoints without editing the source

  • Run your script under pdb:
    python3 -m pdb my_script.py
    
  • This drops into pdb at the first line. Use b filename.py:42 to set a breakpoint at line 42, then c to continue to it.
  • Useful when you cannot modify the source (a vendored module, for example).

Section 2.4: Reading a traceback

  • A Python traceback prints top-to-bottom in CALL order: the outermost frame first, the innermost (where the exception was raised) last.
  • Read root-up: start at the LAST line (the exception message), then work upward to see HOW you got there.
  • Example:
    Traceback (most recent call last):
      File "scan.py", line 23, in <module>
        main()
      File "scan.py", line 18, in main
        result = process(data)
      File "scan.py", line 10, in process
        return data[0] / data[1]
    ZeroDivisionError: division by zero
    
  • The exception is ZeroDivisionError at line 10 of scan.py in function process. process was called from main at line 18. main was called from the top of the file at line 23. To fix: either prevent data[1] from being zero, or wrap the division in try / except ZeroDivisionError.
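
The same call chain can be reproduced and inspected with the standard traceback module. This is a sketch, not scan.py itself: the function bodies and the input [10, 0] are assumptions chosen to trigger the same error.

```python
import traceback


def process(data):
    return data[0] / data[1]


def main():
    data = [10, 0]  # hypothetical input that triggers the error
    return process(data)


try:
    main()
except ZeroDivisionError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    # Frames come back in call order: outermost first, innermost last.
    call_order = [frame.name for frame in frames]
    print('call order:', call_order)
    # Root-up reading: start at the exception, then walk the frames backward.
    print('raised:', type(exc).__name__, '-', exc)
    for frame in reversed(frames):
        print(f'  in {frame.name}, line {frame.lineno}')
```

The last two entries of call_order are main and process, matching the bottom two frames of the printed traceback.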

Section 2.5: When pdb beats print, when print beats pdb

  • pdb wins when the bug is "wrong value somewhere in a deep call chain" or "rare condition I can't reliably trigger." Set a breakpoint at the suspicious line; inspect interactively.
  • print wins when the bug is "this loop is doing something weird" (sprinkle prints in the loop body) or when you cannot use stdin (the program is running unattended). Print scales to "every iteration"; pdb requires you to step.
  • A middle option: logging.debug behind a --debug flag (Lab 6 pattern). Production code uses logging, not pdb or print.
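
As a reminder of what the logging option looks like, here is a minimal sketch. Lab 6 is not reproduced here, so the flag handling, the function names, and the log format are all assumptions and may differ from the pattern you wrote in that lab.

```python
import argparse
import logging


def build_parser():
    # Hypothetical CLI with a single --debug switch.
    parser = argparse.ArgumentParser(description='illustrative tool')
    parser.add_argument('--debug', action='store_true',
                        help='show debug-level log messages')
    return parser


def average(nums):
    # Emitted only when the log level is DEBUG.
    logging.debug('average() called with %r', nums)
    return sum(nums) / len(nums)


def main(argv=None):
    args = build_parser().parse_args(argv)
    level = logging.DEBUG if args.debug else logging.WARNING
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, format='%(levelname)s %(message)s',
                        force=True)
    return average([1, 2, 3, 4, 5])


print(main(['--debug']))  # logs the call to stderr, then prints 3.0
print(main([]))           # prints 3.0 with no debug output
```

Unlike print calls, the debug lines stay in the code and cost nothing to leave there; the flag decides whether anyone sees them.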

Labs (~90 minutes)

Lab 9: Disk-Usage Reporter + Debugging Exercise (labs/lab-9-disk-usage.md)

  • Goal: build a CLI tool that wraps du via subprocess and emits a human-readable directory-size summary; then debug a planted bug using pdb
  • Time: ~90 minutes (60 min for the tool, 30 min for the debug exercise)
  • Artifact: lab-9-du.py + lab-9-bug.py (with your fix committed) in ~/fnd-102/lab-9/

Independent practice (~4 hours)

  1. subprocess drills (45 min). Wrap five common shell commands with subprocess.run. For each, print the output and the exit code:

    • date (no args)
    • ls -la /tmp
    • df -h (Unix) or wmic logicaldisk get size,freespace,caption (Windows; or use shutil.disk_usage instead)
    • python3 --version
    • git status from the FND-102 directory

    For each, decide: list form (safe) or shell form (dangerous). Always pick list form for these.

  2. Shell-injection demo (30 min). Write a small Python script that calls subprocess.run('echo ' + name, shell=True). Call it with name = 'hello'; it prints "hello". Call it with name = 'hello; touch INJECTED'; observe the file INJECTED appears. Now rewrite with the list form (subprocess.run(['echo', name])); confirm that the list form treats the metacharacters as data.

  3. try/except practice (45 min). Take your Lab 5 scanner and add proper exception handling for:

    • Missing input file (FileNotFoundError)
    • Permission denied (PermissionError)
    • Decoding error on a binary file (UnicodeDecodeError)

    For each, the program should print a clear error message and exit nonzero. Test by deliberately triggering each.

  4. pdb exploration (45 min). Take a known-buggy function:

    def average(nums):
        return sum(nums) / len(nums) + 1  # bug: spurious +1
    
    def main():
        result = average([1, 2, 3, 4, 5])
        print(result)
    
    # Call main() so the script actually runs when executed.
    if __name__ == '__main__':
        main()
    

    Set a breakpoint() inside average; run it; step through with n and p; identify the +1. Then fix it. The point: even on a trivial bug, the muscle memory matters.

  5. Traceback reading drill (30 min). Write three programs that crash in three different ways:

    • IndexError: [1, 2, 3][5]
    • KeyError: {'a': 1}['b']
    • TypeError: 'hello' + 5

    Read each traceback root-up. Describe in one sentence what went wrong and how to fix it.

  6. EAFP vs LBYL (30 min). Take this LBYL ("Look Before You Leap") code:

    if 'name' in user:
        greet(user['name'])
    else:
        print('no name')
    

    Rewrite as EAFP ("Easier to Ask Forgiveness than Permission"):

    try:
        greet(user['name'])
    except KeyError:
        print('no name')
    

    When is each more readable? When does each have a performance edge? (Hint: think about the success case being common vs rare.)

Reflection prompts (~30 minutes)

  1. The shell-injection demo (practice 2) made the danger of shell=True concrete. Did the demo change how you'd write similar code in the future?
  2. The pdb exploration (practice 4) showed pdb on a trivial bug. How would print-debugging have compared? On what kind of bug would pdb be clearly worth the setup cost?
  3. Tracebacks read top-to-bottom in call order but are most informative root-up. Did you read the three crash tracebacks (practice 5) top-down first? Which way is faster for you now?
  4. try/except lets a program continue past errors. Your week-5 scanner crashed on a missing file; your week-9 version handles it. What new failure modes did you NOT handle, intentionally? (Hint: every program has unhandled failure modes; the discipline is to be intentional about which.)
  5. One thing from this week you want to know more about?

Tool journal (week 9)

  • subprocess.run: run another program from Python
  • capture_output=True, text=True: the safe defaults
  • List form vs shell=True: shell injection avoidance
  • subprocess.CalledProcessError: nonzero exit handling
  • try / except / else / finally: exception handling shapes
  • pdb and breakpoint(): interactive debugger
  • pdb commands: n, s, c, p, l, w
  • Traceback reading discipline: root-up

What comes next

Week 10 picks up intermediate git skills: branches, remotes, pull requests. Lab 10 submits your Lab 9 disk-usage reporter as a git PR for instructor review. This is the workflow working programmers use every day, and Lab 10 is your first experience with code review.