Classroom Glossary Public page

Week 9: VM II — Program Flow and Function Calls

1,013 words

Stack arithmetic is the easy half of the VM. Program flow — labels, unconditional jumps, conditional jumps — is mechanical. The hard half is function calls: a caller must push arguments, transfer control, allow the callee to allocate locals and run, recover its return value, and resume execution at the correct point. Getting the call protocol wrong produces the subtlest bugs in the course: corrupted stacks that manifest only in recursive programs, wrong return values that appear only when two frames coexist.


Reading

  • Petzold Ch 25 ("The World Brain," pp. 361-378): Petzold describes the complete abstraction ladder that lets a programmer write in a high-level language and produce machine instructions. The VM-to-assembly translation is the bottom rung of that ladder that the compiler above will take for granted. (~18 pages)
  • Petzold Ch 22 ("Operating Systems," pp. 311-320): The second visit — Petzold's description of multitasking and process state. Each process has a stack; switching between processes saves and restores the stack pointer. The calling convention you implement this week is the single-process version of that state management. (~10 pages)

Lecture

3 hours. Key arc:

Program flow commands. The VM language's three control-flow instructions map to assembly branch/jump instructions:

label LOOP_START      LOOP_START:      (label definition)
goto LOOP_START       JAL x0, LOOP_START
if-goto LOOP_START    LW x10, 0(sp)   // pop condition
                        ADDI sp, sp, 4
                        BNE x10, x0, LOOP_START  // branch if non-zero
def translate_flow(command: str, arg: str, current_function: str):
    """Emit program-flow instructions. Labels are scoped to current function."""
    # VM labels are function-scoped; emit as FUNCTION$LABEL in assembly
    scoped = f"{current_function}${arg}"
    
    if command == 'label':
        return [f"{scoped}:"]
    
    elif command == 'goto':
        return [f"    JAL x0, {scoped}"]
    
    elif command == 'if-goto':
        return [
            "    LW x10, 0(sp)",
            "    ADDI sp, sp, 4",
            f"   BNE x10, x0, {scoped}",
        ]
    
    raise ValueError(f"Unknown flow command: {command}")

The function-call protocol. The calling convention between VM functions:

  1. Caller pushes N arguments using push commands.
  2. Caller emits call FunctionName N.
  3. Callee is entered; the translator emits a prologue that allocates K locals.
  4. Callee runs; results are pushed onto the stack.
  5. Callee emits return.
  6. Caller resumes; the stack has been restored; one result is on top.

The frame layout (from high address to low):

HIGH ADDRESS
  arg[N-1]         pushed by caller; argument[N-1]
  arg[N-2]
  ...
  arg[0]           pushed by caller; argument[0]
  saved ra         return address (caller's PC+4 at call site)
  saved x18 (LCL)  caller's local segment base pointer
  saved x19 (ARG)  caller's argument segment base pointer
  saved x20 (THIS)
  saved x21 (THAT)
  local[0]         callee's local segment base (x18 points here)
  local[1]
  ...
  local[K-1]
   sp (current stack top)
LOW ADDRESS

Translating call. The caller saves state and transfers control:

def translate_call(function_name: str, n_args: int, call_id: int) -> list[str]:
    """Emit the 'call FunctionName N' instruction."""
    return_label = f"RETURN_{function_name}_{call_id}"
    
    return [
        # Push return address
        f"    LA x10, {return_label}",    # load return address
        f"    ADDI sp, sp, -4",
        f"    SW x10, 0(sp)",
        
        # Save caller's base pointer registers
        f"    ADDI sp, sp, -4",
        f"    SW x18, 0(sp)",             # save LCL
        f"    ADDI sp, sp, -4",
        f"    SW x19, 0(sp)",             # save ARG
        f"    ADDI sp, sp, -4",
        f"    SW x20, 0(sp)",             # save THIS
        f"    ADDI sp, sp, -4",
        f"    SW x21, 0(sp)",             # save THAT
        
        # Reposition ARG: points to arg[0] = sp + 5*4 + n_args*4 above
        f"    ADDI x10, sp, {(5 + n_args) * 4}",
        f"    MV x19, x10",              # ARG = repointed
        
        # LCL = sp (callee's local segment starts here)
        f"    MV x18, sp",
        
        # Jump to callee
        f"    JAL x0, {function_name}",
        
        # Return label
        f"{return_label}:",
    ]

Translating function. The callee allocates K local variables (initialized to 0):

def translate_function(function_name: str, n_locals: int) -> list[str]:
    """Emit the function entry point and local variable allocation."""
    lines = [f"{function_name}:"]
    
    # Allocate and zero-initialize N locals
    for _ in range(n_locals):
        lines += [
            "    ADDI sp, sp, -4",
            "    SW x0, 0(sp)",       # local[i] = 0
        ]
    
    return lines

Translating return.

def translate_return() -> list[str]:
    """Emit the function return sequence."""
    return [
        # Save return address (it's 5 words above current LCL pointer)
        "    LW x10, 4(x18)",          # FRAME = LCL; RET_ADDR = *(LCL+4) ... 
        # Actually: return addr is at LCL+5*4 below the saved registers
        # The exact offset depends on the frame layout above; 
        # compute from x18 (LCL = base of saved area):
        "    ADDI x11, x18, 20",       # x11 = LCL + 5 slots up = saved RA location
        "    LW x11, 0(x11)",          # x11 = return address
        
        # Reposition the return value for caller
        "    LW x10, 0(sp)",           # pop return value
        "    SW x10, 0(x19)",          # *ARG = return value (arg[0] slot)
        
        # Restore SP to just above ARG (one word: the return value)
        "    ADDI sp, x19, 4",
        
        # Restore saved registers (from the saved area above LCL)
        "    LW x21, -4(x18)",         # restore THAT
        "    LW x20, -8(x18)",         # restore THIS
        "    LW x19, -12(x18)",        # restore ARG
        "    LW x18, -16(x18)",        # restore LCL (must be last: overwrites x18)
        
        # Jump to return address
        "    JALR x0, x11, 0",
    ]

Recursive factorial demonstration. A program that requires the full call protocol:

// FibFact.vm
function factorial 0
    push argument 0
    push constant 1
    eq
    if-goto BASE_CASE
    push argument 0
    push argument 0
    push constant 1
    sub
    call factorial 1
    mul
    return
label BASE_CASE
    push constant 1
    return

This program requires:

  1. Recursive call (a function calls itself)
  2. Correct ARG repositioning on each call
  3. Correct return sequence restoring all base pointers
  4. The mul VM command (not in the arithmetic set — Lab 9.4 forward promise to CSA-201)

Lab exercises

Five labs in labs/lab-9.md. Plan for ~5 hours.

  • Lab 9.1. Extend translator.py with label, goto, if-goto support. Translate a simple loop and verify it produces the correct output in simulation.
  • Lab 9.2. Implement call, function, and return. Verify on SingleFunction.vm — a single function call with no recursion.
  • Lab 9.3. Translate NestedCall.vm (three levels of call nesting) and verify the return values are correct for all three.
  • Lab 9.4. Translate FibFact.vm (factorial, recursive) and run on the Tang Primer 25K CPU simulation. The expected output for input 5 is 120. If it fails, the most likely cause is an off-by-one in the ARG repositioning or a wrong restore offset in return.
  • Lab 9.5 (forward promise: mul opcode). The factorial program needs mul, but the RV32I-Lite ISA in CSA-110 does not include a hardware multiplier. Add a software multiply to the VM translator that expands mul into a shift-and-add loop (or repeated addition). Record the cycle count for factorial(5) in your Toolchain Diary as the baseline for the CSA-201 hardware-multiplier optimization.

Independent practice

  • Read Petzold Ch 25, pp. 361-378. Note the passage where Petzold describes how a high-level language expression maps to machine instructions through a series of translation layers. Draw the translation stack for a VirtusLang method call: source text → tokens → parse tree → VM bytecode → RV32I-Lite assembly → binary.
  • Compare the calling convention above against the RISC-V ABI (ABI specification, §18.2). Identify two differences between the CSA-110 VM calling convention and the standard RISC-V ABI calling convention. Which difference would matter most if you wanted to call a C function from VirtusLang?

Architecture comparison sidebar

Calling conventions: 6502 JSR vs RV32I-Lite JALR vs the VM call protocol.

The 6502 JSR instruction pushes a 16-bit return address to the hardware stack (page 1, 0x0100-0x01FF) and takes 6 clock cycles. The RTS instruction pops it and returns. The 6502 hardware stack is 256 bytes deep — enough for about 32 levels of subroutine nesting (8 bytes of context per frame) before it overwrites itself silently. No argument passing convention is built in; Py6502v used fixed memory addresses.

RV32I-Lite's JALR x1, x0, addr stores the return address in x1 (ra) and jumps in 1 cycle. For non-leaf functions, ra must be saved to the stack before calling a nested function — which the VM calling convention does explicitly as part of the call sequence. The argument passing convention (arguments above the frame, not in fixed addresses) lets recursive functions have independent argument copies.

The full VM call protocol costs ~14 instructions on entry and ~9 on return. This is more overhead than a 6502 JSR, but the 6502 JSR cannot support separate local variable frames for recursive calls. The tradeoff is generality: the VM calling convention scales from factorial(1) to a full OS with nested interrupt handlers.

For comparison, the x86-64 System V ABI passes the first six integer arguments in registers (rdi, rsi, rdx, rcx, r8, r9) with no stack overhead for the arguments themselves. RV32I-Lite's ABI (a0-a7 in the standard RISC-V ABI) does the same. The CSA-110 VM convention passes arguments on the stack — slightly slower, but identical in structure to what nand2tetris defines.


Reflection prompts

  1. The return sequence restores the caller's base pointers from saved locations on the frame. What would happen if the callee overwrote one of those saved locations before returning? Design a test case that exposes this bug.
  2. The VM calling convention specifies that the callee's first push goes to the slot immediately below the saved THAT pointer. If a callee pushes two return values instead of one (a hypothetical multi-return extension), what change to the return sequence would be required? Would the caller need to change too?
  3. Lab 9.5 asks you to implement software multiplication as a shift-and-add. How many RV32I-Lite instructions does mul(5, 7) require with shift-and-add? How many would a hardware MUL instruction require? This is the question CSA-201 Module 2 answers with hardware.

What's next

Week 10 begins the compiler: the component that reads VirtusLang source code and emits VM bytecode. Week 10 covers the front end — tokenization and recursive-descent parsing. The VM translator you have built this week is what the compiler's back end will target.