Lab 3.1: Compiler Register Allocator · CSA-201 · Virtus Cyber Academy Classroom

Total points: 25
Estimated time: 4 hours
Prerequisites: CSA-101 compiler (VM translator) working; Python or your compiler's implementation language; Lab 7.4 baseline measurements from CSA-101

Overview

You will add a linear-scan register allocator pass to your CSA-101 VM translator. The pass replaces the naive always-spill policy with a proper live-interval analysis and register assignment. Lab 7.4 documented 3-5x code bloat from the naive spill policy; this lab closes that forward-promise.

Part A: Liveness analysis (8 pts)

A1: Compute live intervals (5 pts)

Write a liveness analysis pass that processes a single function's assembly output (the output of your VM translator before final emission). The pass produces a LiveInterval for each variable or temporary:

class LiveInterval:
    name: str       # variable or temp name
    start: int      # index of first instruction that defines this value
    end: int        # index of last instruction that uses this value

For the purposes of this pass, treat each "variable" as one of:

A named local variable (from the symbol table)
A VM stack position that has been lifted to a named temporary by your translator

Instruction indices are sequential integers; the pass assigns them in document order.

Handling loops: when a variable is used in a loop body that might execute multiple times, extend its live interval to span the entire loop body. Use a conservative approximation: if an instruction is inside a loop (identified by a back-edge in the control flow), extend any live interval that crosses the loop header to the loop footer.

Unit test. Provide the assembly output for a function with 5 local variables; verify that the computed live intervals are correct by hand-checking against the assembly.

A2: Build the register pool (3 pts)

Define the register pool for the allocator: temporaries t0-t6 (7 registers, caller-saved). Build a data structure that tracks which registers are currently assigned (active set) and which are free.

For callee-saved registers (s1-s11): add a flag --use-saved-regs that extends the pool to include s1-s11 at the cost of generating save/restore code in the function prologue/epilogue. Report in Part C whether the flag improves or harms the overall code quality on the benchmark.

Part B: Linear scan allocation (10 pts)

B1: Implement linear scan (7 pts)

Implement the linear-scan algorithm:

sort intervals by start point
active = []                     # (register, interval) pairs sorted by end point
free_regs = register_pool[:]    # copy of the pool

for interval in sorted_intervals:
    expire_old(active, interval.start, free_regs)
    
    if len(free_regs) == 0:
        # spill: evict interval with furthest end
        to_spill = max(active, key=lambda x: x[1].end)
        if to_spill[1].end > interval.end:
            interval.register = to_spill[0]
            to_spill[1].stack_slot = allocate_stack_slot()
            active.remove(to_spill)
            active.append((interval.register, interval))
        else:
            interval.stack_slot = allocate_stack_slot()
    else:
        reg = free_regs.pop()
        interval.register = reg
        active.append((reg, interval))

def expire_old(active, current_start, free_regs):
    for (reg, interval) in list(active):
        if interval.end < current_start:
            free_regs.append(reg)
            active.remove((reg, interval))

B2: Code generation with allocation (3 pts)

Modify your VM translator's code emission stage to use the allocator's output:

When emitting an instruction that reads or writes a variable assigned to a register, use that register directly (no stack access).
When emitting an instruction for a spilled variable, insert lw reg_temp, N(sp) before use and sw reg_temp, N(sp) after definition, where N is the allocated stack slot offset.

Part C: Measurement (7 pts)

C1: Compile the benchmark (4 pts)

Compile benchmark-allocator.jack (provided in courses/csa-201/labs/fixtures/benchmark-allocator.jack) with:

CSA-101 translator (no allocator; always spill)
CSA-201 translator with linear-scan allocator (t0-t6 pool only)
CSA-201 translator with allocator (t0-t6 + s1-s11 pool, with prologue/epilogue)

For each version, count:

Total instruction lines in the output .asm file
Number of lw instructions (loads from stack)
Number of sw instructions (stores to stack)

Record all three sets of counts and compute the reduction ratios.

C2: Analysis (3 pts)

Write a 300-500 word analysis:

What is the actual reduction ratio your allocator achieves vs the CSA-101 baseline? Compare against the Lab 7.4 prediction of 3-5x.
For which functions in the benchmark does the allocator help most? For which does it help least? What property of the function explains this?
Does adding the callee-saved registers (s1-s11) improve the total code quality, accounting for the prologue/epilogue overhead? At what function size does the break-even point occur?

Add a Toolchain Diary entry summarizing the results. Include the exact command line you used to compile the benchmark.

Grading

Part	Criteria	Points
A1	Liveness analysis correct on the unit test; loop handling present	5
A2	Register pool defined; caller-saved and callee-saved pools both available	3
B1	Linear scan algorithm implemented; spill + expire logic correct	7
B2	Code emission uses allocated registers; spills insert lw/sw correctly	3
C1	All three benchmark measurements present and plausible	4
C2	Analysis explains the results; addresses all three questions	3
Total		25