Week 6: Assembler

You wrote sum-to-N by hand-encoding bytes in week 4. This week you write the tool that does the encoding for you: a two-pass assembler that consumes RV32I-Lite assembly, builds a symbol table, encodes each instruction, and emits a VOF object file. By the end of the week, your assembler produces a binary that runs on the silicon you brought up in week 5.
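
To make "encodes each instruction" concrete, here is a minimal sketch of encoding a single addi. It assumes RV32I-Lite keeps the standard RV32I I-type field layout and little-endian byte order; check both against the course's encoding reference before relying on it. The function name encode_addi is hypothetical.

```python
import struct

OPCODE_OP_IMM = 0b0010011  # standard RV32I opcode for register-immediate ops (assumed for RV32I-Lite)
FUNCT3_ADDI = 0b000        # standard RV32I funct3 for addi (assumed for RV32I-Lite)


def encode_addi(rd: int, rs1: int, imm: int) -> bytes:
    """Encode `addi rd, rs1, imm` as four little-endian bytes (I-type layout)."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register numbers must be in 0..31")
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    word = (
        ((imm & 0xFFF) << 20)  # imm[11:0], two's complement
        | (rs1 << 15)
        | (FUNCT3_ADDI << 12)
        | (rd << 7)
        | OPCODE_OP_IMM
    )
    return struct.pack("<I", word)


# addi x10, x11, 0 (what `mv x10, x11` expands to)
print(encode_addi(10, 11, 0).hex())  # 13850500
```

These are the same four bytes you would have keyed in by hand in week 4; the assembler's job is to run this arithmetic for every instruction.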


Reading

  • Chapter prose (primary). draft-chapters/ch6-assembler-prose.md
  • Petzold weave anchors. Ch 17 Automation, p. 224 ("actually keying these numbers": the manual cost the assembler eliminates); Ch 24 Languages High and Low (the dominant Petzold reading of the course), p. 349 "eating with a toothpick", p. 356 "first person to write the first assembler", pp. 354 and 359 on the ladder of languages
  • Cross-chapter handouts. VOF v1 layout reference, the object-file format you produce

Lecture

lectures/ch6-assembler-lecture.md. 3 hours. Key arc:

  • Two-pass assembly. Pass 1 builds the symbol table (every label gets an address); pass 2 emits the encoded bytes (now that every label is resolved)
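  • Why two passes. A one-pass assembler meets forward references (a branch to a label defined later in the file) before it knows their addresses, so it has to backpatch; two passes separate address assignment from encoding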
  • Pseudo-instructions. mv rd, rs1 is not a real RV32I-Lite instruction; the assembler expands it to addi rd, rs1, 0. The user-facing language is richer than the encoded language
  • VOF (Virtus Object Format) v1. The format your assembler emits: header + text section + symbol table + relocation table
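
The two-pass arc and the mv expansion above can be sketched as follows. This is a minimal illustration, not the lab solution: it assumes every instruction occupies 4 bytes, that # starts a comment, that labels end with a colon, and that label operands resolve to PC-relative byte offsets. The helper names (parse, pass1, pass2, expand_pseudo) are hypothetical, and pass 2 stops at a resolved listing rather than encoded bytes.

```python
INSTR_SIZE = 4  # assumption: fixed 4-byte instructions, as in standard RV32I


def expand_pseudo(mnemonic, operands):
    """Rewrite a pseudo-instruction into a real one (only mv is handled)."""
    if mnemonic == "mv":
        rd, rs1 = operands
        return "addi", [rd, rs1, "0"]
    return mnemonic, operands


def parse(source):
    """Yield (label, mnemonic, operands) for each non-empty source line."""
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed syntax)
        if not line:
            continue
        label = None
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
        if not line:
            yield label, None, []
            continue
        parts = line.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
        yield label, parts[0], [op.strip() for op in rest.split(",") if op.strip()]


def pass1(source):
    """Pass 1: assign an address to every label."""
    symbols, pc = {}, 0
    for label, mnemonic, _ in parse(source):
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = pc
        if mnemonic is not None:
            pc += INSTR_SIZE
    return symbols


def pass2(source, symbols):
    """Pass 2: expand pseudo-instructions and resolve label operands."""
    listing, pc = [], 0
    for _, mnemonic, operands in parse(source):
        if mnemonic is None:
            continue
        mnemonic, operands = expand_pseudo(mnemonic, operands)
        resolved = [str(symbols[op] - pc) if op in symbols else op for op in operands]
        listing.append((pc, mnemonic, resolved))
        pc += INSTR_SIZE
    return listing


SOURCE = """
start:
    mv a0, a1
    beq a0, zero, done   # forward reference
    addi a0, a0, -1
done:
    jal zero, start
"""
symbols = pass1(SOURCE)
listing = pass2(SOURCE, symbols)
```

The beq to done is the forward reference: pass 1 has already recorded done at address 12 by the time pass 2 reaches the branch at address 4, so the offset 8 is available without backpatching.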

Lab exercises

Five labs in worksheets/ch6/. The toolchain build begins.

Plan for ~5 hours of lab.

Independent practice

  • Read Petzold Ch 24 carefully. This is the dominant Petzold reading of the course; you visit it in Ch 6, Ch 6a, Ch 7, Ch 9, Ch 10, and Ch 11
  • Update your Toolchain Diary. Week 6 introduces: Python argparse for CLIs, file I/O patterns, nm for symbol-table inspection, strings for printable-byte extraction
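
For the Toolchain Diary entry on argparse, a minimal CLI skeleton might look like the sketch below. The program name vasm, the default output name a.vof, and the --listing flag are all hypothetical; the labs may specify a different interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser for a hypothetical assembler CLI."""
    parser = argparse.ArgumentParser(
        prog="vasm",  # hypothetical program name
        description="Assemble an RV32I-Lite source file into a VOF object file.",
    )
    parser.add_argument("source", help="path to the assembly source file")
    parser.add_argument(
        "-o", "--output", default="a.vof",  # hypothetical default output name
        help="path of the object file to write",
    )
    parser.add_argument(
        "--listing", action="store_true",
        help="also print an address/instruction listing",
    )
    return parser


# Passing an explicit argument list makes the parser easy to exercise in tests.
args = build_parser().parse_args(["sum.s", "-o", "sum.vof"])
print(args.source, args.output, args.listing)
```

Keeping the parser in its own function, separate from the code that reads and writes files, lets you test argument handling without touching the filesystem.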

Architecture comparison sidebar

VOF v1 is the academy's teaching object format. Industry uses ELF (Linux, BSD, embedded), Mach-O (macOS, iOS), and PE (Windows). All three encode the same information (sections, symbols, relocations) with different field layouts and feature surfaces. ELF is the closest cousin to VOF; you encounter ELF when you run readelf on real binaries (lab 6.5).
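
As a small taste of what readelf inspects first, the sketch below decodes the opening bytes of an ELF file (the e_ident array): the magic number, the 32/64-bit class, and the byte order. These byte positions come from the ELF specification; the function name describe_elf_ident is hypothetical, and the sample bytes are hand-built rather than taken from a real binary.

```python
ELF_MAGIC = b"\x7fELF"
EI_NIDENT = 16  # size of the e_ident array at the start of every ELF file


def describe_elf_ident(data: bytes) -> tuple[str, str]:
    """Return (class, byte order) decoded from the first 16 bytes of an ELF file."""
    if len(data) < EI_NIDENT or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = {1: "ELF32", 2: "ELF64"}.get(data[4], "unknown")                # EI_CLASS
    byte_order = {1: "little-endian", 2: "big-endian"}.get(data[5], "unknown")  # EI_DATA
    return elf_class, byte_order


# Hand-built ident bytes for a 32-bit little-endian file, padded to 16 bytes.
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
print(describe_elf_ident(sample))  # ('ELF32', 'little-endian')
```

To try it on a real binary, read the first 16 bytes with open(path, "rb").read(16) and pass them in.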

Reflection prompts

  1. The assembler is the first piece of software you have written that produces software. What changed in your mental model of "what is code" vs "what is data"?
  2. Pseudo-instructions make assembly easier to read but they hide what the silicon actually does. When does the hiding help you and when does it hurt you?
  3. Why didn't we just use an existing assembler (GNU as)? What would you have learned and not learned?

What's next

Week 7 adds the static linker, the next layer of the toolchain. Your week 6 assembler produces an object file with unresolved symbols and relocations; the linker resolves those symbols and produces a flat binary that the CPU can execute directly.