Classroom Public page

Week 11: Compiler II (Code Generation)

561 words

Walk the parse tree; emit VM bytecode. By end of week your compiler can take a Jack-like source file all the way through to a binary that runs on your silicon. Eight layers of toolchain, every one of them written by you.


Reading

  • Chapter prose (primary). draft-chapters/ch10-compiler-code-generation-prose.md
  • Petzold weave anchors. Ch 22 The Operating System pp. 326 + 329 (filename-to-disk-block resolution as namespace cousin of compiler's symbol-table resolution); Ch 24 Languages High and Low pp. 353 + 354 (final Ch 24 citation; closes the recursion-on-grammar thread); Ch 25 The Graphical Revolution p. 382 (the second-to-last page of the entire book; "Java byte codes... an imaginary computer called the Java virtual machine (JVM)")

The Ch 25 p. 382 weave is the moment Petzold's book describes the architecture you have just spent two chapters implementing. The recognition is meant to land.

Lecture

lectures/ch10-compiler-code-generation-lecture.md. 3 hours. Key arc:

  • The symbol table. Each variable has a scope (class-level, subroutine-level, argument); each name maps to a segment-plus-offset
  • Tree-walking code generation. Each parse-tree node type has an emit function; emit functions call each other recursively, mirroring the parse tree's shape
  • Expression codegen. Post-order traversal pushes operands; emits an operator op; the stack does the rest
  • Statement codegen. Each statement type (let, if, while, return) has a fixed VM-bytecode pattern
  • Subroutine codegen. The full function-call protocol from Ch 8 wraps every subroutine's body

Lab exercises

Five labs in worksheets/ch10/.

Plan for ~5 hours of lab.

Independent practice

  • Read Petzold Ch 25 p. 382 carefully. Pause on the paragraph. The architecture Petzold describes (Java bytecode + JVM) is structurally what you have built. Petzold wrote that paragraph in 1999; your version is 2026; the idea is the same
  • Update your Toolchain Diary. Week 11 introduces: Python visitor pattern for tree traversal, the academy's compiler module CLI, Ghidra's "import binary" workflow for your own compiled output

Architecture comparison sidebar

Tree-walking codegen is the simplest approach and the one CSA-101 teaches. Production compilers add several intermediate layers: SSA-form IR (LLVM IR is the canonical example); register allocation (the academy compiler's "register" is the stack; production compilers use real machine registers via graph coloring or linear-scan allocation); optimization passes (constant folding, dead code elimination, loop-invariant code motion). All of these are deferred to CSA-201 and beyond.

Reflection prompts

  1. The toolchain closed this week. Source code becomes silicon-runnable. Trace one source-code line through every stage of the toolchain you wrote and name what each stage did to that line
  2. Ghidra reads your compiler's output and tries to reconstruct higher-level structure. What can Ghidra recover? What can it not?
  3. Your compiler is one tree-walker. Production compilers have many passes. What pressure causes the production complexity?

What's next

Week 12 makes the compiler aware of the operating system. Your code can now call standard-library services (Math.multiply, Memory.peek, Screen.drawLine). The compiler emits the calls; the OS provides the implementations (you write the OS in Ch 12).