Week: 6
Points: 20
Time: ~5 hours
Deliverable: toolchain/assembler/ directory + round-trip verification output + diary/week-06.md
What you ship
- `toolchain/assembler/pass1.py`: tokenizer and label collector
- `toolchain/assembler/encode.py`: instruction encoders for all 11 instructions + 8 pseudo-instructions
- `toolchain/assembler/pass2.py`: encoding pass
- `toolchain/assembler/emit_vof.py`: VOF v1 emitter
- `toolchain/assembler/asm.py`: command-line entry point
- `asm/week-06/sum-to-n-vof.vof`: assembled output
- `lab6_roundtrip.txt`: diff showing that your assembler's output matches `riscv64-linux-gnu-as` output
- `diary/week-06.md`
Lab 6.1: Pass 1 — tokenizer and label collector
Write pass1.py. Requirements:
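- Strip comments (everything after `#`).
- Handle label definitions: for each line ending with `:`, record the label and its address in `self.symbols`.
- Handle directives: `.text`, `.data`, `.globl` (ignore them for now; they are reserved for Week 7).
- For each non-label, non-directive line, append `(address, mnemonic, operands_list)` to `self.instructions`.
- Increment `self.lc` (the location counter) by 4 for each machine instruction. A pseudo-instruction advances it by the size of its expansion, so CALL takes 8 bytes if you always expand it to AUIPC + JALR.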
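A minimal sketch of these rules, for orientation only. `Pass1Sketch` is a hypothetical stand-in for your `RV32ILiteAssembler`; it assumes mnemonics are case-insensitive (it lowercases them), lets a label share a line with an instruction, and assumes CALL always occupies 8 bytes.

```python
class Pass1Sketch:
    """Hypothetical minimal pass 1: strip comments, record labels, skip directives."""

    # Directives recognized but ignored this week (reserved for Week 7).
    DIRECTIVES = {".text", ".data", ".globl"}
    # Assumption: CALL is always expanded to AUIPC + JALR, so it occupies 8 bytes.
    SIZES = {"call": 8}

    def __init__(self):
        self.symbols = {}       # label -> byte address
        self.instructions = []  # (address, mnemonic, operands_list)
        self.lc = 0             # location counter, in bytes

    def pass1(self, source):
        for raw in source.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop the comment, if any
            # Peel off leading labels: "a: b: addi x1, x0, 1" defines both a and b.
            while ":" in line:
                label, line = line.split(":", 1)
                label = label.strip()
                if label in self.symbols:
                    raise ValueError(f"duplicate label: {label}")
                self.symbols[label] = self.lc
                line = line.strip()
            if not line:
                continue
            parts = line.split(None, 1)  # mnemonic, then the operand text
            mnemonic = parts[0].lower()
            if mnemonic in self.DIRECTIVES:
                continue
            operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
            self.instructions.append((self.lc, mnemonic, operands))
            self.lc += self.SIZES.get(mnemonic, 4)
```

Labels on consecutive lines (for example `done:` followed by `end:`) both receive the current location counter, which covers the adjacent-label case; forward references need no special handling here because pass 1 only records addresses.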
Test with five programs including:
- A program with forward references (labels defined after use)
- A program with two labels pointing to the same address (adjacent labels)
- A program with no labels
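Run from the repository root, with the assembler directory on the import path:
```bash
PYTHONPATH=toolchain/assembler python3 -c "
from pass1 import RV32ILiteAssembler
asm = RV32ILiteAssembler()
asm.pass1(open('asm/week-06/test-forward-ref.s').read())
print('Symbols:', asm.symbols)
print('Instructions:', len(asm.instructions))
"
```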
Lab 6.2: Encoder
Write encode.py. Implement encoders for all 11 RV32I-Lite instructions and the 8 pseudo-instructions:
| Pseudo | Expands to |
|---|---|
| LI rd, imm | ADDI rd, x0, imm |
| MV rd, rs | ADDI rd, rs, 0 |
| NOP | ADDI x0, x0, 0 |
| NOT rd, rs | XORI rd, rs, -1 |
| NEG rd, rs | SUB rd, x0, rs |
| J label | JAL x0, label |
| RET | JALR x0, x1, 0 |
| CALL label | Two instructions: AUIPC x1, upper; JALR x1, x1, lower (or JAL x1, label if the target is within ±1 MiB) |
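The fixed-form rows of the table map operands straight across, so a lookup table is enough. CALL is the one case that needs arithmetic: JALR sign-extends its 12-bit immediate, so the upper part must be rounded up whenever bit 11 of the offset is set. A minimal sketch, assuming operands are still strings at this stage; `expand_pseudo` and `split_call_offset` are hypothetical names.

```python
def expand_pseudo(mnemonic, ops):
    """Expand one fixed-form pseudo-instruction into base (mnemonic, operands) pairs."""
    table = {
        "li":  lambda rd, imm: [("addi", [rd, "x0", imm])],  # imm must fit in 12 bits
        "mv":  lambda rd, rs: [("addi", [rd, rs, "0"])],
        "nop": lambda: [("addi", ["x0", "x0", "0"])],
        "not": lambda rd, rs: [("xori", [rd, rs, "-1"])],
        "neg": lambda rd, rs: [("sub", [rd, "x0", rs])],
        "j":   lambda label: [("jal", ["x0", label])],
        "ret": lambda: [("jalr", ["x0", "x1", "0"])],
    }
    if mnemonic not in table:
        raise KeyError(f"not a fixed-form pseudo-instruction: {mnemonic}")
    return table[mnemonic](*ops)


def split_call_offset(offset):
    """Split a PC-relative CALL offset into (AUIPC imm20 field, JALR imm12 value).

    Adding 0x800 before the shift rounds the upper part up when bit 11 is set,
    which compensates for JALR sign-extending the lower 12 bits.
    """
    upper = (offset + 0x800) >> 12   # Python's >> is an arithmetic (floor) shift
    lower = offset - (upper << 12)   # always in [-2048, 2047]
    return upper & 0xFFFFF, lower
```

In pass 2 the offset is `symbols[label]` minus the address of the AUIPC. For example, an offset of 0x800 splits into upper 1 and lower -2048, not upper 0 and lower 2048 (which does not fit in 12 signed bits).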
Unit-test each encoder:
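```python
# tests/test_encode.py (encode.py must be importable, e.g. PYTHONPATH=toolchain/assembler)
import unittest

from encode import encode_add, encode_addi, encode_beq  # ... plus your other encoders

class TestEncoders(unittest.TestCase):
    def test_add(self):
        # ADD x3, x1, x2: funct7=0, rs2=2, rs1=1, funct3=0, rd=3, opcode=0110011
        self.assertEqual(encode_add(3, 1, 2), 0x002081B3)

    def test_beq_forward(self):
        # BEQ x1, x2 with a +8 offset
        instr = encode_beq(1, 2, 8)
        self.assertEqual(instr & 0x7F, 0b1100011)  # B-type opcode
        # Reassemble imm[12|10:5|4:1|11] from its scattered fields, then sign-extend
        imm = ((((instr >> 31) & 0x1) << 12) | (((instr >> 7) & 0x1) << 11)
               | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1))
        if imm & 0x1000:
            imm -= 0x2000
        self.assertEqual(imm, 8)

if __name__ == "__main__":
    unittest.main()
```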
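A minimal sketch of the bit packing behind a few of the encoders, assuming the argument orders used in the tests (`encode_add(rd, rs1, rs2)`, `encode_beq(rs1, rs2, offset)`) and an assumed `encode_addi(rd, rs1, imm)`; `r_type`, `i_type` and `b_type` are hypothetical helper names. Field positions and opcodes follow the RV32I base ISA.

```python
# Major opcodes from the RV32I base ISA
OP, OP_IMM, BRANCH = 0b0110011, 0b0010011, 0b1100011


def _check_regs(*regs):
    for r in regs:
        if not 0 <= r < 32:
            raise ValueError(f"register x{r} out of range")


def r_type(funct7, rs2, rs1, funct3, rd, opcode):
    _check_regs(rs2, rs1, rd)
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def i_type(imm, rs1, funct3, rd, opcode):
    _check_regs(rs1, rd)
    if not -2048 <= imm <= 2047:
        raise ValueError(f"I-type immediate {imm} does not fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def b_type(offset, rs2, rs1, funct3):
    _check_regs(rs2, rs1)
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError(f"branch offset {offset} is odd or out of range")
    imm = offset & 0x1FFF  # 13-bit two's complement; bit 0 is never stored
    return ((((imm >> 12) & 0x1) << 31) | (((imm >> 5) & 0x3F) << 25)
            | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
            | (((imm >> 1) & 0xF) << 8) | (((imm >> 11) & 0x1) << 7) | BRANCH)


def encode_add(rd, rs1, rs2):
    return r_type(0b0000000, rs2, rs1, 0b000, rd, OP)


def encode_sub(rd, rs1, rs2):
    return r_type(0b0100000, rs2, rs1, 0b000, rd, OP)


def encode_addi(rd, rs1, imm):
    return i_type(imm, rs1, 0b000, rd, OP_IMM)


def encode_beq(rs1, rs2, offset):
    return b_type(offset, rs2, rs1, 0b000)
```

Keeping one packer per format means each of the 11 instructions reduces to a one-line call with its funct3/funct7/opcode, and range errors surface at assembly time rather than as silently truncated fields.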
Lab 6.3: Pass 2 and VOF emitter
Write pass2.py and emit_vof.py.
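`pass2.py` iterates over `self.instructions` and calls the appropriate encoder for each mnemonic. For branch and jump instructions, it looks up the target label in `self.symbols` and computes the PC-relative offset: the target address minus the address of the branch or jump instruction itself (not the address of the next instruction).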
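A minimal sketch of that lookup, assuming `symbols` maps label names to byte addresses as pass 1 records them; `pc_relative_offset` is a hypothetical helper. The field widths are fixed by the ISA: 13 bits for B-type branches (±4 KiB) and 21 bits for JAL (±1 MiB), and both store only even offsets.

```python
def pc_relative_offset(symbols, label, pc, field_bits):
    """Offset from the instruction at `pc` to `label`, checked against the field width.

    field_bits is 13 for B-type branches and 21 for JAL; bit 0 is implicitly zero,
    so the offset must be even.
    """
    if label not in symbols:
        raise KeyError(f"undefined label: {label!r}")
    offset = symbols[label] - pc  # relative to the branch itself, not pc + 4
    lo, hi = -(1 << (field_bits - 1)), (1 << (field_bits - 1)) - 2
    if offset % 2 or not lo <= offset <= hi:
        raise ValueError(f"offset {offset} to {label!r} does not fit a {field_bits}-bit field")
    return offset
```

A backward branch to a loop head at address 4 from an instruction at address 12 yields -8, which the B-type encoder then packs.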
emit_vof.py writes the VOF v1 binary format: header (magic bytes VOF1), .text section, .symtab section, .reloc section. Use the VOF v1 spec at handouts/vof-v1-spec.md.
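Whatever header layout the VOF spec prescribes, the round-trip check against GNU `as` only passes if the `.text` payload holds the instruction words in RISC-V's little-endian byte order. A minimal sketch of that serialization and its inverse; `pack_text` and `words_from_text` are hypothetical helpers, not part of the VOF spec.

```python
import struct


def pack_text(words):
    """Serialize 32-bit instruction words as little-endian bytes (RISC-V byte order)."""
    return b"".join(struct.pack("<I", w) for w in words)


def words_from_text(text):
    """Inverse of pack_text: regroup .text bytes into 32-bit instruction words."""
    if len(text) % 4:
        raise ValueError(f".text is {len(text)} bytes, not a multiple of 4")
    return [w for (w,) in struct.iter_unpack("<I", text)]
```

In this order, ADDI x1, x0, 1 (0x00100093) appears in a byte listing as `93 00 10 00`. If Lab 4.4's verified hex is written as instruction words, printing `f"{w:08x}"` for each word of `words_from_text(vof.text)` gives a directly comparable listing.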
Assemble sum-to-n.s:
```bash
python3 toolchain/assembler/asm.py asm/week-06/sum-to-n.s -o asm/week-06/sum-to-n-vof.vof
```
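The entry point mostly parses arguments and hands off to the three stages. A minimal sketch of the argument handling with `argparse`; `parse_args` is a hypothetical name, and the default output name (the source path with a `.vof` suffix) is an assumption the handout does not specify. Wiring in `pass1`, `pass2` and `emit_vof` is left to you.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse `asm.py SOURCE [-o OUTPUT]`; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        prog="asm.py", description="Assemble RV32I-Lite source into a VOF v1 object file.")
    parser.add_argument("source", help="assembly source file, e.g. asm/week-06/sum-to-n.s")
    parser.add_argument("-o", "--output",
                        help="output path (assumed default: SOURCE with a .vof suffix)")
    args = parser.parse_args(argv)
    if args.output is None:
        # Assumption: derive the output name from the source name.
        args.output = str(Path(args.source).with_suffix(".vof"))
    return args
```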
Extract the .text section from the VOF and compare against Lab 4.4's verified hex:
```bash
PYTHONPATH=toolchain/assembler python3 -c "
from emit_vof import read_vof
vof = read_vof('asm/week-06/sum-to-n-vof.vof')
print(' '.join(f'{b:02x}' for b in vof.text))
"
```
Lab 6.4: Round-trip verification
This is the assembler's correctness certificate. Assemble sum-to-n.s with both your assembler and riscv64-linux-gnu-as, then compare the two .text sections byte for byte:
```bash
# Your assembler (run from the repository root)
python3 toolchain/assembler/asm.py asm/week-06/sum-to-n.s -o /tmp/yours.vof
PYTHONPATH=toolchain/assembler python3 -c "from emit_vof import read_vof; v = read_vof('/tmp/yours.vof'); open('/tmp/yours.bin', 'wb').write(v.text)"

# GNU assembler: copy the raw .text bytes straight out of the object file.
# (Scraping objdump -s output with awk breaks on a final partial row,
# where the ASCII column lands in the hex fields.)
riscv64-linux-gnu-as -march=rv32i -mabi=ilp32 asm/week-06/sum-to-n.s -o /tmp/gnu.o
riscv64-linux-gnu-objcopy -O binary --only-section=.text /tmp/gnu.o /tmp/gnu.bin

diff <(xxd /tmp/yours.bin) <(xxd /tmp/gnu.bin) > lab6_roundtrip.txt
cat lab6_roundtrip.txt
# Expected: empty (no differences)
```
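If there are differences, identify which instruction each one comes from (every RV32I instruction is 4 bytes, so a byte offset divided by 4 gives the instruction index, and each xxd row covers four instructions), explain the discrepancy, and fix your encoder.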
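A byte-level diff can be tedious to map back to instructions. A minimal sketch that compares the two dumps word by word instead; `diff_text_words` is a hypothetical helper, and it assumes both inputs are raw little-endian .text bytes like the dumps produced above.

```python
import struct
from itertools import zip_longest


def diff_text_words(ours, theirs):
    """List (index, byte_offset, ours_word, theirs_word) for each differing 32-bit word.

    A word present in only one input is reported with None on the other side.
    """
    def words(blob):
        if len(blob) % 4:
            raise ValueError("section length is not a multiple of 4")
        return [w for (w,) in struct.iter_unpack("<I", blob)]

    return [(i, 4 * i, a, b)
            for i, (a, b) in enumerate(zip_longest(words(ours), words(theirs)))
            if a != b]
```

For each mismatch, `word & 0x7F` is the opcode, which tells you which instruction format (and therefore which encoder) to inspect.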
Toolchain Diary
Record in diary/week-06.md:
- Whether you started with a one-pass or two-pass strategy and, if one-pass, what forced you to switch to two passes
- Comparison to your CSA-102 6502 assembler: line count, complexity, what's different
- The round-trip result: any discrepancies found and fixed
Grading
| Component | Points |
|---|---|
| pass1.py: tokenizer handles forward references, adjacent labels, comments | 5 |
| encode.py: all 11 instructions + 8 pseudo-instructions with unit tests | 8 |
| Round-trip verification: lab6_roundtrip.txt shows zero differences | 5 |
| Toolchain Diary: CSA-102 comparison | 2 |