You wrote sum-to-N by hand-encoding bytes in week 4. This week you write the tool that does the encoding for you: a two-pass assembler that consumes RV32I-Lite assembly, builds a symbol table, encodes each instruction, and emits a VOF object file. By the end of the week, your assembler produces a binary that runs on the silicon you brought up in week 5.
Reading
- Chapter prose (primary). draft-chapters/ch6-assembler-prose.md
- Petzold weave anchors. Ch 17 Automation p. 224 ("actually keying these numbers"; the manual cost the assembler eliminates); Ch 24 Languages High and Low (the dominant Petzold reading of the course): p. 349 "eating with a toothpick", p. 356 "first person to write the first assembler", pp. 354 + 359 ladder of languages
- Cross-chapter handouts. VOF v1 layout reference, the object-file format you produce
Lecture
lectures/ch6-assembler-lecture.md. 3 hours. Key arc:
- Two-pass assembly. Pass 1 builds the symbol table (every label gets an address); pass 2 emits the encoded bytes (now that every label is resolved)
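- Why two passes. A one-pass assembler meets forward references (a branch to a label defined later in the file) before it knows their addresses, so it has to backpatch; two passes cleanly separate address assignment from encoding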
- Pseudo-instructions. `mv rd, rs1` is not a real RV32I-Lite instruction; the assembler expands it to `addi rd, rs1, 0`. The user-facing language is richer than the encoded language
- VOF (Virtus Object Format) v1. The format your assembler emits: header + text section + symbol table + relocation table
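The two-pass arc and the `mv` expansion can be sketched in a few lines of Python. This is a minimal illustration, not the lab solution. It assumes every instruction is 4 bytes, that `#` starts a comment, that branch targets become PC-relative offsets, and that `beq` exists in RV32I-Lite; the function names (`parse`, `pass1`, `pass2`, `expand_pseudo`) are hypothetical. Pass 2 stops at resolved operands rather than encoded bytes.

```python
INSTR_SIZE = 4  # assumption: every RV32I-Lite instruction occupies 4 bytes

SAMPLE = """\
start:  mv   x5, x6
        beq  x5, x0, done    # forward reference: `done` is defined below
        addi x5, x5, -1
done:   addi x0, x0, 0
"""


def expand_pseudo(mnemonic, operands):
    """Rewrite a pseudo-instruction as a real one; only `mv` is handled here."""
    if mnemonic == "mv":
        rd, rs1 = operands
        return "addi", [rd, rs1, "0"]
    return mnemonic, operands


def parse(source):
    """Yield (label, mnemonic, operands) for each non-empty source line."""
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comment (assumed '#' syntax)
        if not line:
            continue
        label = None
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
        if not line:
            yield label, None, []  # a label alone on its line
            continue
        parts = line.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
        yield label, parts[0], [op.strip() for op in rest.split(",") if op.strip()]


def pass1(source):
    """Pass 1: give every label an address. Nothing is emitted yet."""
    symbols, address = {}, 0
    for label, mnemonic, _ in parse(source):
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address
        if mnemonic is not None:
            # `mv` expands one-to-one, so counting source lines is enough here.
            address += INSTR_SIZE
    return symbols


def pass2(source, symbols):
    """Pass 2: expand pseudos and replace label operands with PC-relative offsets."""
    program, address = [], 0
    for _, mnemonic, operands in parse(source):
        if mnemonic is None:
            continue
        mnemonic, operands = expand_pseudo(mnemonic, operands)
        resolved = [str(symbols[op] - address) if op in symbols else op
                    for op in operands]
        program.append((address, mnemonic, resolved))
        address += INSTR_SIZE
    return program


symbols = pass1(SAMPLE)
program = pass2(SAMPLE, symbols)
```

Because `done` already has an address by the time pass 2 reaches the `beq`, the forward reference needs no backpatching. A pseudo-instruction that expanded to more than one real instruction would have to be sized in pass 1 as well, which this sketch does not handle.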
Lab exercises
Five labs in worksheets/ch6/. The toolchain build begins.
- lab-6.1-tokenizer-and-pass1.md
- lab-6.2-pass2-encoding-and-pseudos.md
- lab-6.3-vof-emit-sum-to-n.md
- lab-6.4-end-to-end-on-silicon-and-toolchain-reconciliation.md, your assembler's output runs on your CPU
- lab-6.5-nm-and-strings.md, industry-tool reconciliation
Plan for ~5 hours of lab.
Independent practice
- Read Petzold Ch 24 carefully. This is the dominant Petzold reading of the course; you visit it in Ch 6, Ch 6a, Ch 7, Ch 9, Ch 10, and Ch 11
- Update your Toolchain Diary. Week 6 introduces: Python `argparse` for CLIs, file I/O patterns, `nm` for symbol-table inspection, `strings` for printable-byte extraction
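For the diary entry, one possible shape for the `argparse` and file I/O pattern is sketched below. The program name `vasm`, the `-o/--output` flag, the `.vof` default suffix, and the `assemble` placeholder are all assumptions made for illustration; the lab worksheets define the real interface.

```python
import argparse
from pathlib import Path


def assemble(text):
    """Placeholder for the real two-pass assembler (hypothetical name)."""
    return b""  # stand-in: the real function would return the VOF image


def build_parser():
    """Describe the command line: one source file and an optional output path."""
    parser = argparse.ArgumentParser(
        prog="vasm",  # hypothetical tool name
        description="Assemble RV32I-Lite source into an object file.",
    )
    parser.add_argument("source", type=Path, help="assembly source file")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="output path (default: source name with .vof)")
    return parser


def resolve_output(args):
    """Use the explicit -o path, else swap the source suffix for .vof (assumed)."""
    if args.output is not None:
        return args.output
    return args.source.with_suffix(".vof")


def main(argv=None):
    """Read text in, write bytes out; argv=None makes argparse use sys.argv."""
    args = build_parser().parse_args(argv)
    text = args.source.read_text(encoding="utf-8")  # source is text
    resolve_output(args).write_bytes(assemble(text))  # object file is binary
    return 0
```

Passing `argv` explicitly keeps `main` testable without touching `sys.argv`; in a script you would call `main()` from an `if __name__ == "__main__":` guard.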
Architecture comparison sidebar
VOF v1 is the academy's teaching object format. Industry uses ELF (Linux, BSD, embedded), Mach-O (macOS, iOS), and PE (Windows). All three encode the same information (sections, symbols, relocations) with different field layouts and feature surfaces. ELF is the closest cousin to VOF; you encounter ELF when you run readelf on real binaries (lab 6.5).
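To make the comparison concrete, the sketch below reads the first identification bytes of an ELF file: the 4-byte magic, then the class byte (32-bit or 64-bit) and the data-encoding byte (byte order). The function name `describe_elf_ident` is hypothetical, and the sketch covers only these six bytes, not the rest of the header.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}  # byte 4: EI_CLASS
DATA_NAMES = {1: "little-endian", 2: "big-endian"}  # byte 5: EI_DATA


def describe_elf_ident(blob):
    """Return class and byte order for an ELF image, or None if it is not ELF."""
    if len(blob) < 6 or blob[:4] != ELF_MAGIC:
        return None
    return {
        "class": CLASS_NAMES.get(blob[4], "unknown"),
        "data": DATA_NAMES.get(blob[5], "unknown"),
    }


# Hand-built identification bytes for a 32-bit little-endian image.
fake_elf = ELF_MAGIC + bytes([1, 1]) + bytes(10)
info = describe_elf_ident(fake_elf)
```

On a real binary you could pass the result of `Path(name).read_bytes()[:16]` to the same function and compare it with the first lines `readelf -h` prints.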
Reflection prompts
- The assembler is the first piece of software you wrote that produces software. What changed in your mental model of "what is code" vs "what is data"?
- Pseudo-instructions make assembly easier to read but they hide what the silicon actually does. When does the hiding help you and when does it hurt you?
- Why didn't we just use an existing assembler (GNU `as`)? What would you have learned and not learned?
What's next
Week 7 adds the static linker, the next layer of the toolchain. Your assembler in week 6 produced an object file with unresolved symbols and relocations. The linker resolves those symbols and produces a flat binary that the CPU can execute directly.