Week: 9
Points: 20
Time: ~5 hours
Deliverable: toolchain/vm-translator/translator.py update + recursive factorial on silicon + diary/week-09.md
What you ship
toolchain/vm-translator/translator.py— extended with program flow + function-call opsvm/week-09/FibFact.vm— factorial program (recursive)vm/week-09/NestedCall.vm— three-level nesting test- Simulation output showing
factorial(5) = 120 diary/week-09.md
Lab 9.1: Program flow commands
Extend your VMTranslator class with a translate_flow method:
def translate_flow(self, command: str, arg: str):
"""label, goto, if-goto — labels are scoped to current function."""
scoped = f"{self._current_function}${arg}"
if command == 'label':
self.output.append(f"{scoped}:")
elif command == 'goto':
self._emit(f"JAL x0, {scoped}")
elif command == 'if-goto':
self._emit("LW x10, 0(sp)",
"ADDI sp, sp, 4",
f"BNE x10, x0, {scoped}")
else:
raise ValueError(f"Unknown flow command: {command}")
Track the current function name in self._current_function. Update translate_file to set self._current_function when it encounters a function command.
Test with a simple VM loop:
// Loop.vm
function Loop.main 0
push constant 0
label LOOP
push constant 1
add
dup // hypothetical — use push/load workaround
push constant 5
lt
if-goto LOOP
return
Translate and verify the loop executes exactly 5 iterations (count $display fires in the testbench).
Lab 9.2: Implement call, function, and return
Add the calling convention to your translator. The frame layout is:
HIGH ADDRESS
arg[N-1] ← pushed by caller
...
arg[0]
saved ra ← return address
saved x18 ← caller's LCL
saved x19 ← caller's ARG
saved x20 ← caller's THIS
saved x21 ← caller's THAT
local[0] ← callee's LCL base (x18 points here)
...
local[K-1]
← sp
LOW ADDRESS
def translate_function(self, function_name: str, n_locals: int):
self._current_function = function_name
self._call_counter = 0
self.output.append(f"{function_name}:")
for _ in range(n_locals):
self._emit("ADDI sp, sp, -4", "SW x0, 0(sp)")
def translate_call(self, function_name: str, n_args: int):
ret_label = f"RETURN_{self._current_function}_{self._call_counter}"
self._call_counter += 1
self._emit(
# push return address
f"LA x10, {ret_label}",
"ADDI sp, sp, -4", "SW x10, 0(sp)",
# save caller's base pointers
"ADDI sp, sp, -4", "SW x18, 0(sp)", # LCL
"ADDI sp, sp, -4", "SW x19, 0(sp)", # ARG
"ADDI sp, sp, -4", "SW x20, 0(sp)", # THIS
"ADDI sp, sp, -4", "SW x21, 0(sp)", # THAT
# reposition ARG: arg[0] is (5 + n_args) words above current sp
f"ADDI x10, sp, {(5 + n_args) * 4}",
"MV x19, x10",
# set LCL = sp (callee's local base)
"MV x18, sp",
# jump to callee
f"JAL x0, {function_name}",
)
self.output.append(f"{ret_label}:")
def translate_return(self):
self._emit(
# save return address before touching frame
"ADDI x11, x18, 20", # LCL + 5 slots = saved RA location
"LW x11, 0(x11)", # x11 = return address
# reposition return value
"LW x10, 0(sp)", # pop return value
"SW x10, 0(x19)", # *ARG = return value
# restore SP
"ADDI sp, x19, 4",
# restore saved registers (order: THAT, THIS, ARG, LCL)
"LW x21, -4(x18)",
"LW x20, -8(x18)",
"LW x19, -12(x18)",
"LW x18, -16(x18)", # must be last: overwrites x18
# jump to return address
"JALR x0, x11, 0",
)
Test with SingleFunction.vm — a single, non-recursive function call:
function SingleFunction.main 0
push constant 3
call SingleFunction.triple 1
return
function SingleFunction.triple 1
push argument 0
push argument 0
add
push argument 0
add
return
Verify that main returns 9.
Lab 9.3: NestedCall.vm — three levels of nesting
Create vm/week-09/NestedCall.vm:
// NestedCall.vm -- verifies that three levels of call/return restore all frame pointers
function Sys.init 0
push constant 4000
pop temp 1
push constant 5000
pop temp 2
call Foo.main 0
pop temp 3
return
function Foo.main 0
push constant 6000
pop temp 0
push constant 100
push constant 200
call Foo.add 2
pop temp 3
return
function Foo.add 0
push argument 0
push argument 1
add
return
Translate, simulate, and record the value left on the stack after Sys.init returns. Expected: 300 (100 + 200 returned from Foo.add, passed back through Foo.main, returned by Sys.init).
Verify that temp[1], temp[2], and temp[3] have the expected values after each call returns. Corruption of the temp segment indicates a frame-pointer restoration bug.
Lab 9.4: Recursive factorial on silicon
Create vm/week-09/FibFact.vm:
// FibFact.vm -- recursive factorial
// Input: argument 0
// Output: return value = argument 0!
function Math.factorial 0
push argument 0
push constant 1
eq
if-goto BASE_CASE
push argument 0
push argument 0
push constant 1
sub
call Math.factorial 1
call Math.mul 2
return
label BASE_CASE
push constant 1
return
The Math.mul implementation must be in the same VM file (you will add it in Lab 9.5). For now, stub it as:
function Math.mul 0
push argument 0
push argument 1
add // deliberately wrong stub — fix in Lab 9.5
return
Translate and simulate with argument 0 = 5. The stub produces wrong output; record what you observe.
Then complete Lab 9.5 and re-run. Expected output: 120.
Lab 9.5: Software multiply (forward promise to CSA-201)
Replace the Math.mul stub with a correct shift-and-add implementation:
// Math.mul: returns x * y using shift-and-add
// argument 0 = x, argument 1 = y
function Math.mul 3
push constant 0
pop local 0 // result = 0
push argument 0
pop local 1 // shiftedX = x
push constant 0
pop local 2 // j = 0
label MUL_LOOP
push local 2
push constant 16
lt
not
if-goto MUL_DONE
// if (y & (1 << j)) != 0: result += shiftedX
push argument 1
push constant 1
push local 2
call Math.shiftLeft 2 // 1 << j (see Lab 9.5a)
and
push constant 0
eq
not
if-goto MUL_ADD
goto MUL_SKIP
label MUL_ADD
push local 0
push local 1
add
pop local 0
label MUL_SKIP
push local 1
push local 1
add
pop local 1 // shiftedX *= 2
push local 2
push constant 1
add
pop local 2 // j++
goto MUL_LOOP
label MUL_DONE
push local 0
return
Note: you also need Math.shiftLeft. Implement it as repeated doubling (add the number to itself j times), or implement bit-level shift using the and/or primitives.
Alternatively, translate the Math.multiply implementation from week-13-virtus-os.md directly.
Record in diary/week-09.md: How many RV32I-Lite instructions does Math.mul(5, 7) require with your shift-and-add implementation? Count from the simulation trace. This is the baseline for the CSA-201 hardware-multiplier module.
Toolchain Diary — Week 9
Record in diary/week-09.md:
- The frame layout diagram: draw the stack for a call to
Math.factorial(5)when it recursively callsMath.factorial(4). Label every slot — saved ra, saved LCL, saved ARG, saved THIS, saved THAT, locals, and argument. - The ARG repositioning formula: why is ARG repositioned to
sp + (5 + n_args) * 4and not some other value? Where does the 5 come from? - The
translate_returnrestore order: why mustLW x18, -16(x18)be the last restore instruction? What would happen if you restored LCL first? - CSA-102 comparison: Py6502v's VM translator targeted 6502 registers (A, X, Y). This translator targets the RV32I-Lite stack protocol. What property of the 6502's hardware stack (256-byte page 1) would prevent it from supporting deep recursion?
- Lab 9.5 baseline: cycle count for
Math.mul(5, 7).
Grading
| Component | Points |
|---|---|
translator.py: label/goto/if-goto with function-scoped names |
4 |
translator.py: call, function, return — correct frame layout |
8 |
NestedCall.vm: three-level nesting produces correct result |
3 |
Math.factorial(5) = 120 on Tang Primer 25K simulation |
3 |
| Toolchain Diary: frame layout diagram + ARG formula + restore order + cycle count | 2 |