Week 6: Modules and the Standard Library

The shape of a real program: code split across files, plus the argparse and logging modules. The lab is a refactor: take Lab 5's scanner and give it a real argparse CLI and a logging-based debug log.


Theme

Single-file Python programs do not scale. Past ~500 lines, a script becomes hard to navigate. The fix is the same as the fix for functions in week 3: split the code into named pieces. The unit one level up from a function is a module: a .py file that other code can import.

This week also introduces Python's standard library, the collection of modules that ship with the interpreter. The stdlib is large and growing; you do not need to memorize it. The discipline is to know which problems have stdlib solutions, then look up the module when you need it. Two modules are this week's focus: argparse (CLI argument parsing, used in every CLI tool you write from now on) and logging (structured debug and audit output, replacing print-debugging for non-trivial programs).

The lab is a refactor. Lab 5 was a working scanner; this week you rewrite it with a real CLI (--input PATH, --output PATH, --top N, --verbose) and a proper logging setup (DEBUG to a file, INFO to stderr). The scanner's behavior does not change; the interface becomes professional. This is the difference between a script you wrote and a tool you would hand to a coworker.

By the end of week 6 you can: organize a multi-file Python project; use import correctly (module imports, from-imports, aliases); write an argparse.ArgumentParser with positional and optional arguments; configure the logging module to send output to multiple destinations; recognize when to reach for a stdlib module instead of writing your own.

Reading list (~1 hour)

  1. Matthes, Python Crash Course 2nd ed., Ch 8.5 ("Storing Your Functions in Modules"). Matthes covers import, from ... import, aliases, and module organization.
  2. Sweigart, Automate the Boring Stuff with Python 2nd ed., Appendix B ("Running Programs") and chapter excerpts on argparse (Ch 14: "Working with PDF and Word Documents" uses argparse incidentally) at https://automatetheboringstuff.com/2e/chapter14/. Sweigart's book does not have a dedicated argparse chapter; the official Python tutorial fills that gap (next reading).
  3. Python official argparse tutorial at https://docs.python.org/3/howto/argparse.html. ~20 min read. The canonical reference; the patterns it shows are the ones you should use.
  4. Python official logging tutorial at https://docs.python.org/3/howto/logging.html. ~20 min read. The tutorial is split into a basic part and an advanced part; for FND-102 you need the "basic logging tutorial" section only.
  5. Real Python: "Logging in Python" at https://realpython.com/python-logging/. ~20 min read. Worked examples beyond the official tutorial.

Lecture outline (~1.5 hours, 2 sessions of ~50 min)

Session 1: Modules and imports

Section 1.1: What a module is

  • A module is a .py file. Period.
  • mymath.py:
    def square(n):
        return n * n
    
    def cube(n):
        return n ** 3
    
  • From another file in the same directory:
    import mymath
    print(mymath.square(5))   # 25
    print(mymath.cube(3))     # 27
    
  • The first time mymath is imported, Python runs the file top-to-bottom (executing function definitions and any top-level code). The functions become attributes of the module object.
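
A minimal sketch of the "runs once" behavior described above. It assumes a throwaway mymath.py written to a temporary directory (a setup step added only so the example is self-contained); the second import is served from the sys.modules cache, so the top-level print appears only once.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical setup: write a small mymath.py into a temporary directory
# and make that directory importable.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "mymath.py").write_text(
    "print('running mymath top-level code')\n"
    "def square(n):\n"
    "    return n * n\n"
)
sys.path.insert(0, tmp_dir)

import mymath   # first import: the file runs, so the message prints
import mymath   # second import: found in sys.modules, nothing runs again

print(mymath.square(5))                   # 25
print(sys.modules["mymath"] is mymath)    # True: the cached module object
```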

Section 1.2: Import variations

  • import mymath: binds the module to the name mymath. Use mymath.square(5).
  • import mymath as mm: alias. Use mm.square(5). Common for long module names (import numpy as np).
  • from mymath import square: binds square directly. Use square(5), with no mymath. prefix.
  • from mymath import square, cube: multiple names at once.
  • from mymath import *: imports everything. Almost always wrong because it pollutes your namespace. Avoid.
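
The same forms, sketched against the stdlib math module so the example runs without any extra files (math stands in for mymath here; the alias m is an arbitrary choice).

```python
import math                     # bind the module; call math.sqrt(...)
import math as m                # alias: m and math name the same module object
from math import sqrt           # bind one name directly
from math import floor, ceil    # several names at once

print(math.sqrt(16))            # 4.0
print(m.sqrt(16))               # 4.0
print(sqrt(16))                 # 4.0
print(floor(2.5), ceil(2.5))    # 2 3
print(m is math)                # True
```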

Section 1.3: The standard library

  • Python ships with ~200 modules in the standard library. Some you will use frequently in FND-102:
    • os and pathlib: filesystem
    • sys: interpreter and command-line interface
    • json, csv: structured data
    • argparse: CLI argument parsing
    • logging: structured output
    • re: regular expressions
    • subprocess: running other programs
    • hashlib: cryptographic hashing
    • datetime: dates and times
    • collections: Counter, defaultdict, OrderedDict, etc.
    • random: pseudorandom numbers
    • urllib, http: basic HTTP (week 12 uses requests, a third-party package, but urllib is the stdlib alternative)
  • A complete stdlib reference is at https://docs.python.org/3/library/. Bookmark it.
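
As one illustration of a problem that already has a stdlib solution, "count items and rank them" is covered by collections.Counter. The log lines below are made-up sample data, not output from the course scanner.

```python
from collections import Counter

# Made-up sample lines, for illustration only.
lines = [
    "ERROR disk full",
    "INFO started",
    "ERROR disk full",
    "ERROR timeout",
    "INFO stopped",
]

# Count each distinct ERROR line, then take the two most frequent.
error_counts = Counter(line for line in lines if "ERROR" in line)
top_two = error_counts.most_common(2)
print(top_two)   # [('ERROR disk full', 2), ('ERROR timeout', 1)]
```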

Section 1.4: Third-party packages

  • The stdlib has limits. requests (HTTP), numpy (arrays), pandas (tables), pytest (testing) are third-party.
  • Install with pip:
    python3 -m pip install requests
    
  • Use the same way as stdlib:
    import requests
    resp = requests.get('https://example.com')
    print(resp.status_code)
    
  • For FND-102: you install requests in week 12 and pytest in week 13. Everything else is stdlib.

Section 1.5: Project structure

  • A small project organized as multiple files:
    my-scanner/
    ├── README.md
    ├── scan.py            # main entry point
    ├── log_reader.py      # reads + iterates log files
    ├── parser.py          # parses log lines
    └── tests/
        └── test_parser.py
    
  • scan.py is the entry point: it holds the if __name__ == '__main__': guard and imports from the helper modules.
  • For very small tools (your Lab 6 included), a single file is fine. The multi-file pattern matters once a single module grows past ~200 lines.
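
A minimal sketch of what an entry-point file with the guard might look like. The helper count_errors and the sample lines are hypothetical; in the layout above, a helper like this would live in one of the imported modules rather than in scan.py itself.

```python
def count_errors(lines):
    """Hypothetical helper: count the lines that contain 'ERROR'."""
    return sum(1 for line in lines if "ERROR" in line)


def main():
    sample = ["ERROR a", "INFO b", "ERROR c"]   # stand-in for reading a real file
    print(count_errors(sample))


if __name__ == "__main__":
    # True only when this file is run directly (python3 scan.py),
    # not when another module imports it.
    main()
```

Because main() only runs under the guard, another file (a test, for instance) can import count_errors without triggering the scan.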

Session 2: argparse and logging

Section 2.1: argparse basics

  • The standard pattern:
    import argparse
    
    def build_parser():
        parser = argparse.ArgumentParser(
            description='Scan a log file for ERROR lines.'
        )
        parser.add_argument('input', help='path to the log file')
        parser.add_argument('--top', type=int, default=10, help='show top N matches (default: 10)')
        parser.add_argument('--verbose', '-v', action='store_true', help='enable debug logging')
        return parser
    
    def main():
        args = build_parser().parse_args()
        print(f'input: {args.input}')
        print(f'top: {args.top}')
        print(f'verbose: {args.verbose}')
    
    if __name__ == '__main__':
        main()
    
  • Positional arguments ('input') are required; the value is in args.input.
  • Optional arguments ('--top') start with --; the value is in args.top (Python attribute-name conversion: --my-arg becomes args.my_arg).
  • type=int parses the string into an int (or errors out).
  • default=10 is the fallback if --top is not passed.
  • action='store_true' is the "flag" pattern: passing --verbose sets args.verbose to True; not passing it leaves it False.
  • argparse auto-generates --help from your descriptions. Run python3 myscript.py --help to see it. A --help that reads like documentation is the goal.
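
The points above can be checked without a terminal by passing an explicit argument list to parse_args. This is a sketch that reuses the parser from this section; the file name app.log is a placeholder.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Scan a log file for ERROR lines.")
    parser.add_argument("input", help="path to the log file")
    parser.add_argument("--top", type=int, default=10, help="show top N matches (default: 10)")
    parser.add_argument("--verbose", "-v", action="store_true", help="enable debug logging")
    return parser


parser = build_parser()

# Only the positional is given: the optionals fall back to their defaults.
args = parser.parse_args(["app.log"])
print(args.input, args.top, args.verbose)    # app.log 10 False

# All three given; "5" arrives as a string and type=int converts it.
args = parser.parse_args(["app.log", "--top", "5", "-v"])
print(args.input, args.top, args.verbose)    # app.log 5 True
```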

Section 2.2: argparse for real

  • Common patterns beyond the basics:
    • Multiple required positionals: parser.add_argument('input'); parser.add_argument('output')
    • Optional with a value: parser.add_argument('--threshold', type=float, default=0.5)
    • Choices: parser.add_argument('--format', choices=['json', 'csv', 'text'], default='text')
    • Lists: parser.add_argument('--paths', nargs='+') accepts one or more paths
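  • The --help output is your interface documentation; rewrite the help strings until they read like prose. Example: not --threshold THRESHOLD: a number but --threshold N: skip records with score below N (default: 0.5). The placeholder N comes from passing metavar='N' to add_argument; without it, argparse shows the uppercased argument name.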
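
A short sketch combining the patterns above in one parser. The description text and the file names are illustrative assumptions; the add_argument parameters (type, default, choices, nargs, metavar) are standard argparse.

```python
import argparse

parser = argparse.ArgumentParser(description="Filter scored records (illustrative only).")
parser.add_argument("--threshold", type=float, default=0.5, metavar="N",
                    help="skip records with score below N (default: 0.5)")
parser.add_argument("--format", choices=["json", "csv", "text"], default="text",
                    help="output format (default: text)")
parser.add_argument("--paths", nargs="+", metavar="PATH",
                    help="one or more files to read")

args = parser.parse_args(["--format", "csv", "--paths", "a.log", "b.log"])
print(args.threshold)   # 0.5
print(args.format)      # csv
print(args.paths)       # ['a.log', 'b.log']

# metavar="N" is what makes the help text read "--threshold N".
print("--threshold N" in parser.format_help())   # True
```

Passing a value outside choices (for example --format xml) makes argparse print an error and exit, which is usually the behavior you want from a CLI.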

Section 2.3: The logging module

  • print-debugging works for week-3 scripts. Past ~100 lines, you want logging:
    import logging
    
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s %(levelname)s %(message)s'
    )
    
    log = logging.getLogger(__name__)
    
    # placeholder values so the snippet runs on its own
    path, n, i = 'app.log', 3, 17
    
    log.debug('reading file %s', path)        # not shown at INFO level
    log.info('scan complete: %d matches', n)  # shown
    log.warning('parse failed on line %d', i)
    log.error('file not found: %s', path)
    log.critical('database connection lost; shutting down')
    
  • Five levels: DEBUG < INFO < WARNING < ERROR < CRITICAL. basicConfig(level=...) sets the minimum level to display; anything lower is silently dropped.
  • The %s, %d formatting uses the logger's lazy evaluation: the formatting only happens if the log level is enabled. Do NOT pre-format with f-strings (log.info(f'count: {n}')); that defeats the lazy evaluation and is slower in the dropped case.
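
The lazy-evaluation point can be observed directly. The sketch below uses a hypothetical Expensive class that counts how often it is converted to a string; the logger name lazy_demo is arbitrary.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is turned into a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive value"


log = logging.getLogger("lazy_demo")
log.setLevel(logging.INFO)               # DEBUG is disabled for this logger
log.addHandler(logging.NullHandler())    # discard output; only the counter matters
log.propagate = False

obj = Expensive()

log.debug("value: %s", obj)     # dropped before formatting: __str__ never runs
print(Expensive.calls)          # 0

log.debug(f"value: {obj}")      # the f-string is built first, then the message is dropped
print(Expensive.calls)          # 1
```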

Section 2.4: Logging configuration

  • Send DEBUG to a file, INFO+ to stderr:
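    import logging
    
    file_handler = logging.FileHandler('scan.debug.log')
    file_handler.setLevel(logging.DEBUG)        # the file gets everything
    
    stream_handler = logging.StreamHandler()    # stderr by default
    stream_handler.setLevel(logging.INFO)       # stderr gets INFO and above
    
    logging.basicConfig(
        level=logging.DEBUG,                    # root logger passes everything on to the handlers
        format='%(asctime)s %(name)s %(levelname)s %(message)s',
        handlers=[file_handler, stream_handler],
    )
    # then in main(): adjust stream_handler's level based on args.verbose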
    
  • A common pattern in CLI tools: --verbose flips the stderr handler from WARNING (default) to INFO or DEBUG. The file handler always logs DEBUG.
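
One way the --verbose pattern might be wired up, as a sketch under stated assumptions: the helper configure_logging, the logger name scanner, and the choice of DEBUG (rather than INFO) for the verbose case are all illustrative, and the log file goes to a temporary directory so the example leaves nothing behind in your project.

```python
import logging
import os
import tempfile


def configure_logging(verbose, log_path):
    """Hypothetical helper: the file always gets DEBUG; stderr depends on verbose."""
    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)

    stream_handler = logging.StreamHandler()    # stderr by default
    stream_handler.setLevel(logging.DEBUG if verbose else logging.WARNING)

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    logger = logging.getLogger("scanner")
    logger.setLevel(logging.DEBUG)              # let everything through; handlers filter
    logger.propagate = False
    for handler in (file_handler, stream_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, stream_handler


log_path = os.path.join(tempfile.mkdtemp(), "scan.debug.log")
log, stream_handler = configure_logging(verbose=False, log_path=log_path)

log.debug("reading file %s", "app.log")      # goes to the file only
log.warning("parse failed on line %d", 7)    # goes to the file and to stderr

with open(log_path) as f:
    contents = f.read()
print("reading file app.log" in contents)    # True
```

In a real tool the verbose argument would come from args.verbose after parse_args.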

Section 2.5: print vs logging: when to use which

  • print is for "this output is what the user asked for." The scanner's result list.
  • logging is for "this output is operational." Progress messages, debug traces, warnings about input quality.
  • Conventionally: print writes to stdout, which is the result; logging writes to stderr, which is the commentary. A user who runs python3 myscript.py input.log > result.csv 2> debug.log separates them cleanly.
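
The stdout/stderr split can be seen in-process by capturing both streams. This is a demonstration sketch only: the logger name stream_demo and the messages are made up, and a real tool would not redirect its own streams like this.

```python
import contextlib
import io
import logging

out, err = io.StringIO(), io.StringIO()

with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    # StreamHandler() picks up sys.stderr when it is created, so for this
    # demonstration it is created inside the redirect.
    handler = logging.StreamHandler()
    log = logging.getLogger("stream_demo")
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    log.propagate = False

    log.info("scanning 3 lines")    # commentary: goes to stderr
    print("ERROR disk full")        # result: goes to stdout

print(repr(out.getvalue()))   # 'ERROR disk full\n'
print(repr(err.getvalue()))   # 'scanning 3 lines\n'
```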

Labs (~90 minutes)

Lab 6: Argparse + Logging Refactor (labs/lab-6-argparse-logging.md)

  • Goal: take Lab 5's scanner and rewrite it with a real argparse CLI and a logging-based debug output
  • Time: ~90 minutes
  • Artifact: lab-6-scanner.py in ~/fnd-102/lab-6/, committed to Git

Independent practice (~4 hours)

  1. argparse drills (45 min). Build five small CLI tools that each demonstrate one argparse pattern:

    • greet.py NAME (one positional)
    • add.py N M (two positionals, both ints)
    • pick.py --choice {red,blue,green} (choices; the user passes exactly one of the three)
    • flag.py --verbose (flag)
    • paths.py FILE [FILE ...] (nargs='+', one or more paths)

     Run each with --help; verify the output is readable.

  2. logging exploration (30 min). Add logging to your Lab 5 scanner at four levels: DEBUG ("reading line 1234"), INFO ("found 500 errors"), WARNING ("parse failed on line N"), ERROR ("file not found"). Run with and without --verbose; verify the level filter works.

  3. Multi-module project (60 min). Refactor Lab 5 into a package of three files:

    • scanner/__init__.py (empty; this makes the directory a package)
    • scanner/reader.py (contains the scan generator)
    • scanner/main.py (the argparse + main loop)

     Run with python3 -m scanner.main. This is the conventional shape for a Python package.

  4. Read a stdlib module (30 min). Pick one stdlib module you have not used and read its docs (https://docs.python.org/3/library/). Suggestions: datetime, collections, itertools, functools. Write 3 things the module does and 1 thing you might use it for.

  5. --help readability audit (30 min). Take your Lab 6 --help output. Read it as if you had never seen the tool. Is it clear what --top N does? What --verbose controls? Rewrite any help string that is field-name shaped ("verbose") into sentence-shaped ("--verbose: print extra debug output to stderr").

  6. Optional stretch (45 min). Write a --config FILE argument that reads default values from a JSON config file. CLI arguments override the config; config overrides hard-coded defaults. This is the layered-config pattern every real CLI tool uses.
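
A possible starting point for the stretch exercise, offered as a sketch and not as the expected solution. The option names (top, format), the HARD_CODED_DEFAULTS dict, and the helper parse_with_config are assumptions made for illustration. The key idea is giving each option default=None so the code can tell "not passed" apart from an explicit value.

```python
import argparse
import json
import os
import tempfile

HARD_CODED_DEFAULTS = {"top": 10, "format": "text"}   # hypothetical option names


def parse_with_config(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", metavar="FILE", help="JSON file with default values")
    # default=None lets us tell "not passed" apart from an explicit value.
    parser.add_argument("--top", type=int, default=None)
    parser.add_argument("--format", choices=["json", "csv", "text"], default=None)
    args = parser.parse_args(argv)

    settings = dict(HARD_CODED_DEFAULTS)         # layer 1: hard-coded defaults
    if args.config:
        with open(args.config) as f:
            settings.update(json.load(f))        # layer 2: config file
    for key in HARD_CODED_DEFAULTS:
        value = getattr(args, key)
        if value is not None:
            settings[key] = value                # layer 3: CLI arguments win
    return settings


# Demonstration with a throwaway config file.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump({"top": 25, "format": "csv"}, f)

print(parse_with_config([]))                                        # {'top': 10, 'format': 'text'}
print(parse_with_config(["--config", config_path]))                 # {'top': 25, 'format': 'csv'}
print(parse_with_config(["--config", config_path, "--top", "3"]))   # {'top': 3, 'format': 'csv'}
```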

Reflection prompts (~30 minutes)

  1. Your week-5 scanner is a script; your week-6 scanner is a tool. Articulate the difference in two sentences. What did the CLI interface and the logging add?
  2. The stdlib has ~200 modules. You used 1 (csv) in week 5, 4 in week 6 (csv, json, argparse, logging). At what point would you start writing your own module instead of looking for a stdlib one?
  3. logging.info('count: %d', n) uses lazy formatting; logging.info(f'count: {n}') does not. The first is preferred in tight loops. Did you write any logging this week in the f-string form? Refactor.
  4. print vs logging: in your Lab 6 scanner, what is printed to stdout and what is logged to stderr? Why?
  5. One thing from this week you want to know more about?

Tool journal (week 6)

  • import module, from module import name, import module as alias: import variations
  • argparse.ArgumentParser: build a CLI
  • add_argument with positional, optional, flag, choices, nargs patterns
  • logging.getLogger, logging.basicConfig: structured output
  • Log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  • FileHandler, StreamHandler: multi-destination logging
  • if __name__ == '__main__':: import-safe entry point

What comes next

Week 7 introduces regular expressions. Your Lab 5 + Lab 6 scanner uses 'ERROR' in line for a simple substring match; week 7's lab extracts specific patterns (IPv4 and IPv6 addresses) from log lines using re.findall. Regex is the standard tool for "find structured data inside unstructured text."