Lab 9: VM II — Program Flow and Function Calls · CSA-110

Week: 9
Points: 20
Time: ~5 hours
Deliverable: toolchain/vm-translator/translator.py update + recursive factorial on silicon + diary/week-09.md

What you ship

toolchain/vm-translator/translator.py — extended with program flow + function-call ops
vm/week-09/FibFact.vm — factorial program (recursive)
vm/week-09/NestedCall.vm — three-level nesting test
Simulation output showing factorial(5) = 120
diary/week-09.md

Lab 9.1: Program flow commands

Extend your VMTranslator class with a translate_flow method:

def translate_flow(self, command: str, arg: str):
    """label, goto, if-goto — labels are scoped to current function."""
    scoped = f"{self._current_function}${arg}"
    if command == 'label':
        self.output.append(f"{scoped}:")
    elif command == 'goto':
        self._emit(f"JAL x0, {scoped}")
    elif command == 'if-goto':
        self._emit("LW x10, 0(sp)",
                   "ADDI sp, sp, 4",
                   f"BNE x10, x0, {scoped}")
    else:
        raise ValueError(f"Unknown flow command: {command}")

Track the current function name in self._current_function. Update translate_file to set self._current_function when it encounters a function command.

Test with a simple VM loop:

// Loop.vm
function Loop.main 0
    push constant 0
label LOOP
    push constant 1
    add
    dup                 // hypothetical — use push/load workaround
    push constant 5
    lt
    if-goto LOOP
    return

Translate and verify the loop executes exactly 5 iterations (count $display fires in the testbench).

Lab 9.2: Implement call, function, and return

Add the calling convention to your translator. The frame layout is:

HIGH ADDRESS
  arg[N-1]    ← pushed by caller
  ...
  arg[0]
  saved ra    ← return address
  saved x18   ← caller's LCL
  saved x19   ← caller's ARG
  saved x20   ← caller's THIS
  saved x21   ← caller's THAT
  local[0]    ← callee's LCL base (x18 points here)
  ...
  local[K-1]
  ← sp
LOW ADDRESS

def translate_function(self, function_name: str, n_locals: int):
    self._current_function = function_name
    self._call_counter = 0
    self.output.append(f"{function_name}:")
    for _ in range(n_locals):
        self._emit("ADDI sp, sp, -4", "SW x0, 0(sp)")

def translate_call(self, function_name: str, n_args: int):
    ret_label = f"RETURN_{self._current_function}_{self._call_counter}"
    self._call_counter += 1
    self._emit(
        # push return address
        f"LA x10, {ret_label}",
        "ADDI sp, sp, -4", "SW x10, 0(sp)",
        # save caller's base pointers
        "ADDI sp, sp, -4", "SW x18, 0(sp)",   # LCL
        "ADDI sp, sp, -4", "SW x19, 0(sp)",   # ARG
        "ADDI sp, sp, -4", "SW x20, 0(sp)",   # THIS
        "ADDI sp, sp, -4", "SW x21, 0(sp)",   # THAT
        # reposition ARG: arg[0] is (5 + n_args) words above current sp
        f"ADDI x10, sp, {(5 + n_args) * 4}",
        "MV x19, x10",
        # set LCL = sp (callee's local base)
        "MV x18, sp",
        # jump to callee
        f"JAL x0, {function_name}",
    )
    self.output.append(f"{ret_label}:")

def translate_return(self):
    self._emit(
        # save return address before touching frame
        "ADDI x11, x18, 20",       # LCL + 5 slots = saved RA location
        "LW x11, 0(x11)",          # x11 = return address
        # reposition return value
        "LW x10, 0(sp)",           # pop return value
        "SW x10, 0(x19)",          # *ARG = return value
        # restore SP
        "ADDI sp, x19, 4",
        # restore saved registers (order: THAT, THIS, ARG, LCL)
        "LW x21, -4(x18)",
        "LW x20, -8(x18)",
        "LW x19, -12(x18)",
        "LW x18, -16(x18)",        # must be last: overwrites x18
        # jump to return address
        "JALR x0, x11, 0",
    )

Test with SingleFunction.vm — a single, non-recursive function call:

function SingleFunction.main 0
    push constant 3
    call SingleFunction.triple 1
    return

function SingleFunction.triple 1
    push argument 0
    push argument 0
    add
    push argument 0
    add
    return

Verify that main returns 9.

Lab 9.3: NestedCall.vm — three levels of nesting

Create vm/week-09/NestedCall.vm:

// NestedCall.vm -- verifies that three levels of call/return restore all frame pointers
function Sys.init 0
    push constant 4000
    pop temp 1
    push constant 5000
    pop temp 2
    call Foo.main 0
    pop temp 3
    return

function Foo.main 0
    push constant 6000
    pop temp 0
    push constant 100
    push constant 200
    call Foo.add 2
    pop temp 3
    return

function Foo.add 0
    push argument 0
    push argument 1
    add
    return

Translate, simulate, and record the value left on the stack after Sys.init returns. Expected: 300 (100 + 200 returned from Foo.add, passed back through Foo.main, returned by Sys.init).

Verify that temp[1], temp[2], and temp[3] have the expected values after each call returns. Corruption of the temp segment indicates a frame-pointer restoration bug.

Lab 9.4: Recursive factorial on silicon

Create vm/week-09/FibFact.vm:

// FibFact.vm -- recursive factorial
// Input: argument 0
// Output: return value = argument 0!

function Math.factorial 0
    push argument 0
    push constant 1
    eq
    if-goto BASE_CASE
    push argument 0
    push argument 0
    push constant 1
    sub
    call Math.factorial 1
    call Math.mul 2
    return
label BASE_CASE
    push constant 1
    return

The Math.mul implementation must be in the same VM file (you will add it in Lab 9.5). For now, stub it as:

function Math.mul 0
    push argument 0
    push argument 1
    add        // deliberately wrong stub — fix in Lab 9.5
    return

Translate and simulate with argument 0 = 5. The stub produces wrong output; record what you observe.

Then complete Lab 9.5 and re-run. Expected output: 120.

Lab 9.5: Software multiply (forward promise to CSA-201)

Replace the Math.mul stub with a correct shift-and-add implementation:

// Math.mul: returns x * y using shift-and-add
// argument 0 = x, argument 1 = y
function Math.mul 3
    push constant 0
    pop local 0        // result = 0
    push argument 0
    pop local 1        // shiftedX = x
    push constant 0
    pop local 2        // j = 0
label MUL_LOOP
    push local 2
    push constant 16
    lt
    not
    if-goto MUL_DONE
    // if (y & (1 << j)) != 0: result += shiftedX
    push argument 1
    push constant 1
    push local 2
    call Math.shiftLeft 2   // 1 << j  (see Lab 9.5a)
    and
    push constant 0
    eq
    not
    if-goto MUL_ADD
    goto MUL_SKIP
label MUL_ADD
    push local 0
    push local 1
    add
    pop local 0
label MUL_SKIP
    push local 1
    push local 1
    add
    pop local 1        // shiftedX *= 2
    push local 2
    push constant 1
    add
    pop local 2        // j++
    goto MUL_LOOP
label MUL_DONE
    push local 0
    return

Note: you also need Math.shiftLeft. Implement it as repeated doubling (add the number to itself j times), or implement bit-level shift using the and/or primitives.

Alternatively, translate the Math.multiply implementation from week-13-virtus-os.md directly.

Record in diary/week-09.md: How many RV32I-Lite instructions does Math.mul(5, 7) require with your shift-and-add implementation? Count from the simulation trace. This is the baseline for the CSA-201 hardware-multiplier module.

Toolchain Diary — Week 9

Record in diary/week-09.md:

The frame layout diagram: draw the stack for a call to Math.factorial(5) when it recursively calls Math.factorial(4). Label every slot — saved ra, saved LCL, saved ARG, saved THIS, saved THAT, locals, and argument.
The ARG repositioning formula: why is ARG repositioned to sp + (5 + n_args) * 4 and not some other value? Where does the 5 come from?
The translate_return restore order: why must LW x18, -16(x18) be the last restore instruction? What would happen if you restored LCL first?
CSA-102 comparison: Py6502v's VM translator targeted 6502 registers (A, X, Y). This translator targets the RV32I-Lite stack protocol. What property of the 6502's hardware stack (256-byte page 1) would prevent it from supporting deep recursion?
Lab 9.5 baseline: cycle count for Math.mul(5, 7).

Grading

Component	Points
`translator.py`: label/goto/if-goto with function-scoped names	4
`translator.py`: call, function, return — correct frame layout	8
`NestedCall.vm`: three-level nesting produces correct result	3
`Math.factorial(5) = 120` on Tang Primer 25K simulation	3
Toolchain Diary: frame layout diagram + ARG formula + restore order + cycle count	2