You wrote a 6502 assembler in CSA-102. That one could usually get by with a single pass because the programs were small and forward references were rare. The RV32I-Lite assembler uses two passes because branch and jump instructions encode a PC-relative offset, and you cannot compute the offset until you know the target label's address, which requires a first pass to collect all the labels. The concept is the same; the second pass is new.
Reading
- Petzold weave anchors. Ch 17 (Automation, p. 239) third visit: now read for Petzold's discussion of the assembler as the first layer of automation that frees the programmer from managing bit patterns; Ch 24 (Languages High and Low, p. 333) which frames the progression from assembly to high-level language. ~28 pages.
- Cross-chapter handout. handouts/cross-chapter-rv32i-lite-encoding-card.md: your assembler must implement this encoding. Keep it alongside your Python source.
- Architecture reference. The VOF v1 object file format specification (handout pending publication; your instructor will distribute the current draft).
Lecture
3 hours. Key arc:
Why two passes. Consider the assembly program:
    loop:
        BEQ  x1, x2, done   # forward reference: 'done' is not yet defined
        ADDI x1, x1, -1
        JAL  x0, loop       # backward reference: 'loop' is already known
    done:
        SW   x1, 0(x0)
When the assembler encounters BEQ x1, x2, done at address 0, it does not yet know the address of done, so it cannot encode the B-type immediate. The backward reference in JAL x0, loop is no problem, because loop has already been seen by then. The solution: pass 1 records the address of every label as it encounters it (building the symbol table); pass 2 uses the symbol table to resolve forward references and emit the final encoding.
The 6502 assembler you wrote in CSA-102 could handle most programs in one pass: 6502 relative branches carry only an 8-bit offset, so forward references were few and short, and the ones that did arise could be handled with a placeholder. The RV32I-Lite assembler formalizes two-pass as the standard approach.
Pass 1: Tokenization and label collection.
    class RV32ILiteAssembler:
        def __init__(self):
            self.symbols = {}        # label -> address
            self.instructions = []   # list of (address, mnemonic, operands)
            self.lc = 0              # location counter (current address)
        def pass1(self, source: str):
            for line in source.splitlines():
                line = line.split('#')[0].strip()   # strip comments
                if not line:
                    continue
                if line.endswith(':'):               # label definition
                    label = line[:-1].strip()
                    self.symbols[label] = self.lc
                else:
                    # Treat commas as separators so "x1, x2, done"
                    # yields ['x1', 'x2', 'done'] rather than ['x1,', 'x2,', 'done']
                    tokens = line.replace(',', ' ').split()
                    mnemonic = tokens[0].upper()
                    operands = tokens[1:]
                    self.instructions.append((self.lc, mnemonic, operands))
                    self.lc += 4                     # every instruction is 4 bytes
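
To see what pass 1 produces, here is a minimal standalone sketch that mirrors the method above as a plain function and runs it on the sample program, written with its loop label at the top. The function name pass1, the SOURCE string, and the offsets dictionary are illustrative choices for this sketch, not part of the lab skeleton.

```python
def pass1(source):
    """Collect labels and instructions; every instruction is assumed to be 4 bytes."""
    symbols = {}        # label -> address
    instructions = []   # (address, mnemonic, operands)
    lc = 0              # location counter
    for raw in source.splitlines():
        line = raw.split('#')[0].strip()   # strip comments and whitespace
        if not line:
            continue
        if line.endswith(':'):             # label definition: record the current address
            symbols[line[:-1].strip()] = lc
        else:
            tokens = line.replace(',', ' ').split()
            instructions.append((lc, tokens[0].upper(), tokens[1:]))
            lc += 4
    return symbols, instructions


SOURCE = """\
loop:
    BEQ  x1, x2, done   # forward reference
    ADDI x1, x1, -1
    JAL  x0, loop       # backward reference
done:
    SW   x1, 0(x0)
"""

symbols, instructions = pass1(SOURCE)
print(symbols)   # {'loop': 0, 'done': 12}

# With the table complete, every PC-relative offset is a simple subtraction.
offsets = {}
for address, mnemonic, operands in instructions:
    if mnemonic in ("BEQ", "JAL"):
        offsets[address] = symbols[operands[-1]] - address
print(offsets)   # {0: 12, 8: -8}
```

The forward branch at address 0 gets offset +12 and the backward jump at address 8 gets offset -8; neither number is available until the whole file has been scanned once.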
Pass 2: Encoding. For each instruction, call the appropriate encoder. Immediate encoders retrieve target addresses from self.symbols.
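    class AssemblyError(Exception):
        """Raised when an instruction cannot be encoded."""
    # Method of RV32ILiteAssembler (shown unindented for readability):
    def encode_beq(self, rs1: int, rs2: int, label: str, pc: int) -> int:
        if label not in self.symbols:
            raise AssemblyError(f"undefined label: {label}")
        target = self.symbols[label]
        offset = target - pc
        # B-type immediate: 13-bit signed byte offset; bit 0 is always 0 and is not stored
        if offset % 2 != 0:
            raise AssemblyError(f"BEQ offset {offset} is not a multiple of 2")
        if not (-4096 <= offset < 4096):
            raise AssemblyError(f"BEQ offset {offset} out of range")
        imm = offset & 0x1FFF   # two's-complement bit pattern of the offset
        # Pack into B-type encoding: imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1] imm[11] opcode
        instr = (((imm >> 12) & 1) << 31) | (((imm >> 5) & 0x3F) << 25) | \
                (rs2 << 20) | (rs1 << 15) | (0b000 << 12) | \
                (((imm >> 1) & 0xF) << 8) | (((imm >> 11) & 1) << 7) | 0b1100011
        return instr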
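
One way to check the bit shuffling is to pair the encoder with a decoder and confirm that every legal offset survives the round trip. The sketch below is standalone: it takes the offset directly instead of a label, and the names encode_beq_offset and decode_b_offset are hypothetical helpers for this example. It assumes RV32I-Lite uses the standard RV32I B-type layout, as the encoding comment above describes.

```python
def encode_beq_offset(rs1, rs2, offset):
    """Encode BEQ rs1, rs2 with an already-computed PC-relative byte offset."""
    if offset % 2 != 0 or not (-4096 <= offset < 4096):
        raise ValueError(f"BEQ offset {offset} cannot be encoded")
    imm = offset & 0x1FFF                       # 13-bit two's-complement pattern
    return ((((imm >> 12) & 1) << 31) |         # imm[12]   -> bit 31
            (((imm >> 5) & 0x3F) << 25) |       # imm[10:5] -> bits 30:25
            (rs2 << 20) | (rs1 << 15) |
            (0b000 << 12) |                     # funct3 for BEQ
            (((imm >> 1) & 0xF) << 8) |         # imm[4:1]  -> bits 11:8
            (((imm >> 11) & 1) << 7) |          # imm[11]   -> bit 7
            0b1100011)                          # branch opcode


def decode_b_offset(instr):
    """Recover the signed byte offset from a B-type instruction word."""
    imm = ((((instr >> 31) & 1) << 12) |
           (((instr >> 7) & 1) << 11) |
           (((instr >> 25) & 0x3F) << 5) |
           (((instr >> 8) & 0xF) << 1))
    return imm - 0x2000 if imm & 0x1000 else imm   # sign-extend from bit 12


word = encode_beq_offset(1, 2, 12)
print(f"{word:08x}")             # 00208663
print(decode_b_offset(word))     # 12

# Round-trip every encodable offset.
for off in range(-4096, 4096, 2):
    assert decode_b_offset(encode_beq_offset(1, 2, off)) == off
```

A round-trip check like this catches swapped or shifted fields, but it cannot catch an error made symmetrically in both directions, which is why the labs also compare against hand-encoded hex.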
The VOF v1 object file format. The assembler does not emit a flat binary; it emits a VOF (Virtus Object File) that the linker consumes. The VOF v1 format has four sections:
- .text: the machine code
- .data: initialized data
- .symtab: the symbol table (label name, section, offset, size)
- .reloc: relocation entries (locations in .text that need patching when linked with other object files)
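
Before the byte-level layout is fixed, it can help to model the four sections in memory. The sketch below is only an in-memory illustration: the class names, the Relocation fields, the little-endian byte order, and the external symbol vos_print are all assumptions made for this example. The actual on-disk layout is whatever the VOF v1 specification defines.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str       # label name
    section: str    # section the label lives in, e.g. ".text"
    offset: int     # byte offset within that section
    size: int       # size in bytes


@dataclass
class Relocation:
    offset: int     # byte offset in .text of the word to patch (assumed field)
    symbol: str     # name of the symbol the linker must resolve (assumed field)


@dataclass
class ObjectFile:
    text: bytearray = field(default_factory=bytearray)   # .text
    data: bytearray = field(default_factory=bytearray)   # .data
    symtab: list = field(default_factory=list)           # .symtab
    reloc: list = field(default_factory=list)            # .reloc


obj = ObjectFile()
obj.text += (0x00208663).to_bytes(4, "little")   # BEQ x1, x2, +12 (little-endian assumed)
obj.text += (0x000000EF).to_bytes(4, "little")   # JAL x1, 0: placeholder offset for an external call
obj.symtab.append(Symbol(name="loop", section=".text", offset=0, size=8))
obj.reloc.append(Relocation(offset=4, symbol="vos_print"))   # hypothetical stdlib routine

print(len(obj.text), obj.symtab[0].name, obj.reloc[0].symbol)   # 8 loop vos_print
```

Keeping the sections as separate fields until the final write makes it easy to unit-test the assembler's output before the file format itself is in play.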
The 6502 comparison. Your CSA-102 assembler emitted a flat binary for direct loading into the 6502's address space. The flat binary worked because the 6502's address space is small (64 KB), the program loaded at a fixed address, and there were no external references to resolve. RV32I-Lite programs link together multiple object files (the main program + the Virtus OS stdlib) so the VOF format is needed.
Lab exercises
Four labs in labs/lab-6.md. Plan for ~5 hours.
- Lab 6.1. Write pass1.py: tokenizer + label collector. Test against five assembly programs including at least two with forward references.
- Lab 6.2. Write encode.py: encoders for all 11 RV32I-Lite instruction types plus the 8 pseudo-instructions. Use the encoding card. Unit-test each encoder independently with known inputs and expected 32-bit outputs.
- Lab 6.3. Write pass2.py and emit_vof.py. Assemble the sum-to-N program from Lab 4.3. Compare the VOF .text section against the hand-encoded hex from Lab 4.4. They must be identical.
- Lab 6.4 (round-trip). Assemble sum-to-N.s → sum-to-N.vof. Extract the .text section. Feed it to riscv64-linux-gnu-objdump and verify the disassembly matches the original source. This is the assembler's correctness certificate.
Independent practice
- Read Petzold Ch 17 (third visit). Petzold describes the assembler as the first step in automating the programmer's job. Note the specific moment where the assembler transitions from "I have to look up the encoding" to "the assembler looks up the encoding." What does this tell you about the relationship between a programmer and their tools?
- Compare the complexity of your RV32I-Lite assembler (in Python lines of code) against your CSA-102 6502 assembler. What accounts for the difference? Is the RV32I-Lite assembler harder to write, or just longer?
- Update your Toolchain Diary. Week 6 introduces: riscv64-linux-gnu-as (for round-trip verification), VOF v1 format, the two-pass assembly pattern.
Architecture comparison sidebar
VOF v1 vs ELF vs Mach-O vs PE vs 6502 flat binary.
Your Py6502v compiler in CSA-102 emitted a flat binary: a sequence of bytes starting at the address held in the 6502's reset vector. No header, no sections, no relocation table. The loader simply wrote the bytes to the expected address range. This worked because there was only one program and one library, and both were assembled together at known addresses.
VOF v1 is a teaching object file format modeled on ELF (Executable and Linkable Format). ELF is the object file format used by Linux, FreeBSD, and most Unix-like systems. An ELF file has an ELF header, a program header table (for the loader's use), a section header table (for the linker's use, describing .text, .data, .rodata, .bss, .symtab, .strtab, .rela.text), and the section data. The static linker reads these sections, resolves symbols, applies relocations, and writes the final executable image that the OS loader actually puts in memory.
Mach-O (macOS / iOS) and PE (Windows) follow the same conceptual structure with different on-disk layouts. The fundamental operations -- section collection, symbol resolution, relocation patching -- are identical.
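
The three operations can be sketched in a few lines. This is a deliberately simplified model, not the Week 7 linker: each object is a plain dictionary, and the only relocation kind is a hypothetical one that overwrites a 32-bit little-endian word with the symbol's absolute address. Real formats define several relocation types, including ones that patch the split immediates of branch and jump instructions.

```python
import struct


def link(objects, base=0):
    """Link objects of the form
    {"text": bytes, "symbols": {name: offset}, "relocs": [(offset, name)]}
    into one flat image loaded at address `base`."""
    image = bytearray()
    global_symbols = {}
    starts = []

    # 1. Section collection: concatenate each object's .text and note where it lands.
    # 2. Symbol resolution: turn section-relative offsets into absolute addresses.
    for obj in objects:
        start = base + len(image)
        starts.append(start)
        for name, offset in obj["symbols"].items():
            if name in global_symbols:
                raise ValueError(f"duplicate symbol: {name}")
            global_symbols[name] = start + offset
        image += obj["text"]

    # 3. Relocation patching: write each resolved address into its placeholder word.
    for obj, start in zip(objects, starts):
        for offset, name in obj["relocs"]:
            if name not in global_symbols:
                raise KeyError(f"undefined symbol: {name}")
            struct.pack_into("<I", image, start - base + offset, global_symbols[name])

    return bytes(image), global_symbols


main_obj = {"text": bytes(8), "symbols": {"main": 0}, "relocs": [(4, "helper")]}
lib_obj = {"text": bytes(4), "symbols": {"helper": 0}, "relocs": []}

image, table = link([main_obj, lib_obj], base=0x1000)
print(table)              # {'main': 4096, 'helper': 4104}
print(image[4:8].hex())   # 08100000
```

The placeholder word at offset 4 of main_obj stays zero until step 3 runs, which is exactly the state a program is left in if it is loaded without its relocations applied.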
CSA-110's VOF v1 is intentionally simpler than ELF: no string tables for section names, no dynamic linking, no DWARF debug info. It has enough structure to teach symbol resolution and relocation without the full ELF complexity. CSA-201's linker lab expands to real ELF.
Reflection prompts
- Your assembler's pass 1 discovers all labels before pass 2 encodes any instructions. This means the entire source file must fit in memory during assembly. What would change about the design if you needed to assemble a 100 MB source file?
- The B-type immediate encoding splits the bits across the instruction word. Your encode_beq function had to manually reassemble those split bits. What is the hardware reason for this split? (Hint: see Week 4's architecture comparison sidebar.)
- Your CSA-102 6502 assembler did not need a relocation table because all addresses were absolute. Your RV32I-Lite assembler adds relocation entries for addresses that depend on link-time placement. What kinds of bugs become possible if you link without applying relocations?
What's next
Week 7 takes the VOF files your assembler emits and links them together into a flat binary. The static linker resolves symbol references that cross object-file boundaries -- specifically, the program's calls to Virtus OS standard-library functions, which live in separately-assembled object files.