~90 minutes. Take Lab 5's scanner and rewrite it with a real argparse CLI and a logging setup. The behavior is the same; the interface becomes professional.
Goal: ship lab-6-scanner.py with a proper argparse-based CLI (--top, --output, --verbose, --debug) and logging-based diagnostics (DEBUG always goes to a file; the stderr level is WARNING by default, INFO with --verbose, DEBUG with --debug).
Estimated time: 90 minutes
Prerequisites: Week 6 lecture (argparse, logging, modules). Lab 5 complete and working.
Setup
mkdir -p ~/fnd-102/lab-6
cd ~/fnd-102/lab-6
cp ../lab-5/lab-5-scanner.py lab-6-scanner.py
cp ../lab-5/sample.log . # so the lab is self-contained
Open lab-6-scanner.py.
Part A: Build the argparse CLI (25 min)
Replace the manual sys.argv parsing from Lab 5 with argparse:
import argparse
from pathlib import Path

def build_parser():
    """Return an ArgumentParser for the log-scanner CLI."""
    parser = argparse.ArgumentParser(
        description='Scan a log file for ERROR lines and report counts.',
        epilog='Example: %(prog)s sample.log --top 20 --verbose'
    )
    parser.add_argument(
        'input',
        type=Path,
        help='path to the log file to scan'
    )
    parser.add_argument(
        '--top',
        type=int,
        default=10,
        help='show the top N matches (default: %(default)s)'
    )
    parser.add_argument(
        '--output',
        type=Path,
        default=None,
        help='write matching lines to this file (default: print to stdout)'
    )
    parser.add_argument(
        '--verbose', '-v',
        action='store_true',
        help='enable INFO-level logging to stderr'
    )
    parser.add_argument(
        '--debug',
        action='store_true',
        help='enable DEBUG-level logging to stderr (very chatty)'
    )
    return parser
Notice:
- type=Path parses the string into a pathlib.Path automatically.
- %(default)s in the help text is an argparse template that substitutes the actual default value when --help is rendered.
- %(prog)s in the epilog is replaced with the program's name.
- Both --verbose and --debug are flags; both could be passed, but --debug is more verbose. We will give --debug precedence in the logging setup.
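A quick way to check these points without going through the shell is to hand parse_args an explicit list. The following is a minimal sketch, assuming a cut-down copy of the parser; demo_parser is a hypothetical name for illustration, not part of the lab file.

```python
import argparse
from pathlib import Path


def demo_parser():
    """Cut-down, hypothetical copy of the lab parser for experimenting."""
    parser = argparse.ArgumentParser(prog='lab-6-scanner.py')
    parser.add_argument('input', type=Path)
    parser.add_argument('--top', type=int, default=10,
                        help='show the top N matches (default: %(default)s)')
    parser.add_argument('--verbose', '-v', action='store_true')
    parser.add_argument('--debug', action='store_true')
    return parser


# Passing a list to parse_args bypasses sys.argv, which is handy for quick checks.
args = demo_parser().parse_args(['sample.log', '--top', '20', '-v'])

print(isinstance(args.input, Path))   # the string was converted to a Path
print(args.top + 1)                   # arithmetic works because --top is an int
print(args.verbose, args.debug)       # flags that were not passed stay False
print('(default: 10)' in demo_parser().format_help())   # template was filled in
```

The same trick is useful later for tests: a parser that is built by a function and fed a list never needs a real command line.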
Run python3 lab-6-scanner.py --help and read the output. If any help string is field-name shaped instead of sentence shaped, rewrite it.
Part B: Set up logging (20 min)
Add the logging configuration:
import logging
import sys

log = logging.getLogger('scanner')

def configure_logging(verbose: bool, debug: bool):
    """Configure logging. DEBUG always goes to scanner.debug.log; stderr level depends on flags."""
    log.setLevel(logging.DEBUG)

    # File handler: always DEBUG-level
    file_handler = logging.FileHandler('scanner.debug.log', mode='w', encoding='utf-8')
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    log.addHandler(file_handler)

    # Stderr handler: WARNING by default; INFO with --verbose; DEBUG with --debug
    stream_handler = logging.StreamHandler(sys.stderr)
    if debug:
        stream_handler.setLevel(logging.DEBUG)
    elif verbose:
        stream_handler.setLevel(logging.INFO)
    else:
        stream_handler.setLevel(logging.WARNING)
    stream_handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
    log.addHandler(stream_handler)
Three things to notice:
- The logger name is 'scanner', not __name__. For a single-file script either works; the __name__ convention shines when the script is split into modules (each module gets its own named logger and you can tune levels per module).
- File logging is unconditional. scanner.debug.log is always written with the full DEBUG trace; the stderr level decides what the USER sees. Operations engineers running this in production rely on the debug log being there when something goes wrong.
- mode='w' truncates the debug log on each run. mode='a' would append; for a CLI tool a fresh log per invocation is usually right.
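The split between "the logger passes everything" and "each handler filters for itself" can be seen in isolation. This is a minimal sketch, assuming in-memory streams stand in for scanner.debug.log and stderr; the logger name scanner.demo and the variable names are hypothetical.

```python
import io
import logging

demo_log = logging.getLogger('scanner.demo')   # hypothetical name for this sketch
demo_log.setLevel(logging.DEBUG)               # the logger lets everything through
demo_log.propagate = False                     # keep the demo away from other handlers

full_trace = io.StringIO()                     # stands in for scanner.debug.log
user_view = io.StringIO()                      # stands in for stderr

file_like = logging.StreamHandler(full_trace)
file_like.setLevel(logging.DEBUG)              # always records everything
user_facing = logging.StreamHandler(user_view)
user_facing.setLevel(logging.WARNING)          # the default, no-flags level

for handler in (file_like, user_facing):
    handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
    demo_log.addHandler(handler)

demo_log.debug('opening sample.log')
demo_log.info('scanning sample.log')
demo_log.warning('something looks off')

print(full_trace.getvalue())   # three lines: DEBUG, INFO, WARNING
print(user_view.getvalue())    # one line: WARNING only
```

If the logger itself were left at its default level, the DEBUG and INFO records would be dropped before either handler saw them, which is why configure_logging sets the logger to DEBUG first.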
Part C: Wire the scanner into the new shell (25 min)
The scan generator from Lab 5 stays unchanged. Replace the old main with one that uses argparse + logging:
from collections import Counter

def scan(path: Path):
    """Yield ERROR lines from a log file."""
    log.debug(f'opening {path}')
    with open(path, encoding='utf-8') as f:
        for i, line in enumerate(f, start=1):
            if i % 10000 == 0:
                log.debug(f'scanned {i} lines so far')
            if 'ERROR' in line:
                yield line.rstrip()

def main():
    args = build_parser().parse_args()
    configure_logging(args.verbose, args.debug)

    if not args.input.exists():
        log.error(f'input file does not exist: {args.input}')
        sys.exit(1)

    log.info(f'scanning {args.input}')
    matches = list(scan(args.input))
    log.info(f'found {len(matches)} ERROR lines')

    # breakdown by kind
    kinds = []
    for line in matches:
        parts = line.split()
        if len(parts) > 3:
            kinds.append(parts[3].rstrip(':'))
    counts = Counter(kinds)

    # write or print matches
    if args.output:
        with open(args.output, 'w', encoding='utf-8') as f:
            for line in matches[:args.top]:
                f.write(line + '\n')
        log.info(f'wrote {min(args.top, len(matches))} matches to {args.output}')
    else:
        print(f'Found {len(matches)} ERROR lines.')
        print('Breakdown by kind:')
        for kind, n in counts.most_common():
            print(f'  {kind:25s} {n:5d}')
        print(f'First {args.top}:')
        for line in matches[:args.top]:
            print(f'  {line}')

    sys.exit(0)

if __name__ == '__main__':
    main()
Test the four call paths:
# Default (no flags): no INFO output, just the human result
python3 lab-6-scanner.py sample.log
# Verbose: INFO output to stderr; same result to stdout
python3 lab-6-scanner.py sample.log --verbose
# Debug: chatty DEBUG output to stderr
python3 lab-6-scanner.py sample.log --debug
# Output to file: stdout silent; matches in the file
python3 lab-6-scanner.py sample.log --output matches.txt
cat matches.txt
Check scanner.debug.log: it should always have the full DEBUG trace regardless of the stderr level.
Part D: Check --help is good (10 min)
Run:
python3 lab-6-scanner.py --help
The output should be readable. A good --help looks like:
usage: lab-6-scanner.py [-h] [--top TOP] [--output OUTPUT] [--verbose] [--debug] input

Scan a log file for ERROR lines and report counts.

positional arguments:
  input            path to the log file to scan

options:
  -h, --help       show this help message and exit
  --top TOP        show the top N matches (default: 10)
  --output OUTPUT  write matching lines to this file (default: print to stdout)
  --verbose, -v    enable INFO-level logging to stderr
  --debug          enable DEBUG-level logging to stderr (very chatty)

Example: lab-6-scanner.py sample.log --top 20 --verbose
If any line reads like a database field name rather than a sentence, rewrite it. The --help is your tool's documentation; treat it with the same care as a README.
Part E: Commit your work (10 min)
cd ~/fnd-102/lab-6
git add lab-6-scanner.py sample.log
# scanner.debug.log is an output artifact; do not commit
echo 'scanner.debug.log' >> ~/fnd-102/.gitignore
git add ~/fnd-102/.gitignore
git commit -m "lab-6: refactor scanner with argparse CLI and logging (file DEBUG + stderr level by flag)"
A .gitignore keeps generated artifacts out of the repo. The format is one pattern per line; a pattern can be a literal path or a glob (*.log).
Expected output / artifact
lab-6-scanner.py should:
- Use argparse.ArgumentParser with five arguments: positional input, plus optional --top, --output, --verbose, --debug
- Use logging.getLogger with two handlers (file + stderr)
- File handler always DEBUG; stderr handler WARNING / INFO / DEBUG based on flags
- Produce identical results to Lab 5 on the same input
- --help reads like documentation, not field names
Files committed: lab-6-scanner.py, sample.log, .gitignore.
What's the failure mode?
This tool's likely failure modes:
- scanner.debug.log not writable. If you run the scanner in a read-only directory, the FileHandler raises PermissionError at config time. The user sees a traceback before any work happens. Defensive fix: try the file handler; on failure, log a warning and continue with stderr only.
- --output path's directory does not exist. open('subdir/out.txt', 'w') fails if subdir/ does not exist. Defensive fix: args.output.parent.mkdir(parents=True, exist_ok=True) before opening.
- The scanner crashes mid-stream. If a single log line has unexpected encoding, the for line in f: loop raises UnicodeDecodeError and you lose all matches found so far. Fix (week 9): try/except around the read.
- --top of 0. Argparse accepts --top 0; your matches[:0] is empty; the user sees "First 0:" followed by nothing. Not a crash, but confusing. Forward-stretch: add a choices=range(1, 1001) to argparse, or a manual validation.
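Three of the defensive fixes above are small enough to sketch now. This is one possible shape, not the required solution: positive_int, add_debug_file, and open_output are hypothetical helper names, and positive_int is the "manual validation" alternative to choices=.

```python
import argparse
import logging
from pathlib import Path


def positive_int(text):
    """argparse type: accept only integers >= 1 (hypothetical helper)."""
    value = int(text)   # a ValueError here is reported by argparse as a usage error
    if value < 1:
        raise argparse.ArgumentTypeError(f'must be at least 1, got {value}')
    return value


def add_debug_file(logger, path='scanner.debug.log'):
    """Attach the DEBUG file handler; return False if the file cannot be opened."""
    try:
        handler = logging.FileHandler(path, mode='w', encoding='utf-8')
    except OSError as exc:   # PermissionError and FileNotFoundError are OSErrors
        logger.warning('cannot write %s (%s); continuing with stderr only', path, exc)
        return False
    handler.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return True


def open_output(path: Path):
    """Create missing parent directories, then open the output file for writing."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return open(path, 'w', encoding='utf-8')
```

To use the validator, pass it as the type: parser.add_argument('--top', type=positive_int, default=10, ...). With that in place, --top 0 should stop with a usage error instead of printing "First 0:". For the warning in add_debug_file to reach the user, attach the stderr handler before calling it.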
Common pitfalls
- log.info(f'count: {n}') vs log.info('count: %d', n). The first formats eagerly (always); the second formats lazily (only if INFO is enabled). Tight loops should always use the lazy form. Lecture mentioned this; reinforce by running a million-iteration loop with each style and comparing wall time.
- Forgetting if __name__ == '__main__':. Without it, importing lab-6-scanner.py runs the scanner. Tests in week 13 will fail loudly. Module names with hyphens cannot be imported with a plain import statement anyway; rename to lab_6_scanner.py if you plan to test.
- --verbose AND --debug passed together. Your config gives --debug precedence (good). Document this in the help string if it matters.
- Argparse default of False for flags. action='store_true' defaults to False if the flag is absent. action='store_false' defaults to True. Pick the one that matches the natural off-state.
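The eager-versus-lazy difference in the first pitfall can be observed directly, without timing anything. A minimal sketch, assuming a hypothetical Expensive class that counts how often it is converted to a string:

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is turned into a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return 'expensive value'


lazy_log = logging.getLogger('scanner.lazy_demo')   # hypothetical name
lazy_log.setLevel(logging.WARNING)                  # INFO is disabled
lazy_log.addHandler(logging.NullHandler())

item = Expensive()

lazy_log.info(f'value: {item}')    # eager: the f-string is built before info() runs
eager_calls = Expensive.calls

lazy_log.info('value: %s', item)   # lazy: dropped before any formatting happens
lazy_calls = Expensive.calls - eager_calls

print(eager_calls, lazy_calls)     # → 1 0
```

The f-string pays the formatting cost even though the message is thrown away; the %-style call does not.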
Stretch (optional)
- Add --threshold N that only reports kinds with ≥ N occurrences. Practice in conditional filtering with argparse-controlled threshold.
- Add --format with choices text, json, csv for the output format. Each choice changes how matches and counts are serialized.
- Split the file into a package. scanner/__init__.py, scanner/cli.py (argparse), scanner/core.py (scan generator), scanner/__main__.py (entry). Run with python3 -m scanner. This is the conventional shape for a real Python CLI tool.
- Add a config-file argument (--config config.json) that pre-loads defaults for --top and --threshold. CLI overrides config; config overrides argparse defaults.
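For the config-file stretch, one way to get the "CLI overrides config; config overrides argparse defaults" ordering is a two-pass parse with set_defaults. This is a sketch under assumptions: parse_with_config is a hypothetical function name, the config file is assumed to be a flat JSON object such as {"top": 3, "threshold": 5}, and --threshold is the stretch flag from the first bullet.

```python
import argparse
import json
from pathlib import Path


def parse_with_config(argv):
    """Parse argv so CLI flags beat config values, which beat built-in defaults."""
    # Pass 1: look only for --config and ignore everything else.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument('--config', type=Path, default=None)
    known, _ = pre.parse_known_args(argv)

    # Pass 2: the real parser, with its built-in defaults.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument('input', type=Path)
    parser.add_argument('--top', type=int, default=10)
    parser.add_argument('--threshold', type=int, default=1)

    if known.config is not None:
        with open(known.config, encoding='utf-8') as f:
            config = json.load(f)
        # Accept only known keys; set_defaults replaces the built-in defaults.
        allowed = {k: v for k, v in config.items() if k in ('top', 'threshold')}
        parser.set_defaults(**allowed)

    # Anything given explicitly on the command line still wins over the defaults.
    return parser.parse_args(argv)
```

Calling parse_with_config(['sample.log', '--config', 'config.json', '--top', '7']) with the config above should give top=7 (from the CLI) and threshold=5 (from the config). Note that values loaded from JSON skip the type= conversion, so a real version would want to validate them.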
Lab 6 v0.1.