Classroom Glossary Public page

Lab 3.1: Compiler Register Allocator

659 words

Total points: 25
Estimated time: 4 hours
Prerequisites: CSA-101 compiler (VM translator) working; Python or your compiler's implementation language; Lab 7.4 baseline measurements from CSA-101


Overview

You will add a linear-scan register allocator pass to your CSA-101 VM translator. The pass replaces the naive always-spill policy with a proper live-interval analysis and register assignment. Lab 7.4 documented 3-5x code bloat from the naive spill policy; this lab closes that forward-promise.


Part A: Liveness analysis (8 pts)

A1: Compute live intervals (5 pts)

Write a liveness analysis pass that processes a single function's assembly output (the output of your VM translator before final emission). The pass produces a LiveInterval for each variable or temporary:

class LiveInterval:
    name: str       # variable or temp name
    start: int      # index of first instruction that defines this value
    end: int        # index of last instruction that uses this value

For the purposes of this pass, treat each "variable" as one of:

  • A named local variable (from the symbol table)
  • A VM stack position that has been lifted to a named temporary by your translator

Instruction indices are sequential integers; the pass assigns them in document order.

Handling loops: when a variable is used in a loop body that might execute multiple times, extend its live interval to span the entire loop body. Use a conservative approximation: if an instruction is inside a loop (identified by a back-edge in the control flow), extend any live interval that crosses the loop header to the loop footer.

Unit test. Provide the assembly output for a function with 5 local variables; verify that the computed live intervals are correct by hand-checking against the assembly.

A2: Build the register pool (3 pts)

Define the register pool for the allocator: temporaries t0-t6 (7 registers, caller-saved). Build a data structure that tracks which registers are currently assigned (active set) and which are free.

For callee-saved registers (s1-s11): add a flag --use-saved-regs that extends the pool to include s1-s11 at the cost of generating save/restore code in the function prologue/epilogue. Report in Part C whether the flag improves or harms the overall code quality on the benchmark.


Part B: Linear scan allocation (10 pts)

B1: Implement linear scan (7 pts)

Implement the linear-scan algorithm:

sort intervals by start point
active = []                     # (register, interval) pairs sorted by end point
free_regs = register_pool[:]    # copy of the pool

for interval in sorted_intervals:
    expire_old(active, interval.start, free_regs)
    
    if len(free_regs) == 0:
        # spill: evict interval with furthest end
        to_spill = max(active, key=lambda x: x[1].end)
        if to_spill[1].end > interval.end:
            interval.register = to_spill[0]
            to_spill[1].stack_slot = allocate_stack_slot()
            active.remove(to_spill)
            active.append((interval.register, interval))
        else:
            interval.stack_slot = allocate_stack_slot()
    else:
        reg = free_regs.pop()
        interval.register = reg
        active.append((reg, interval))

def expire_old(active, current_start, free_regs):
    for (reg, interval) in list(active):
        if interval.end < current_start:
            free_regs.append(reg)
            active.remove((reg, interval))

B2: Code generation with allocation (3 pts)

Modify your VM translator's code emission stage to use the allocator's output:

  • When emitting an instruction that reads or writes a variable assigned to a register, use that register directly (no stack access).
  • When emitting an instruction for a spilled variable, insert lw reg_temp, N(sp) before use and sw reg_temp, N(sp) after definition, where N is the allocated stack slot offset.

Part C: Measurement (7 pts)

C1: Compile the benchmark (4 pts)

Compile benchmark-allocator.jack (provided in courses/csa-201/labs/fixtures/benchmark-allocator.jack) with:

  1. CSA-101 translator (no allocator; always spill)
  2. CSA-201 translator with linear-scan allocator (t0-t6 pool only)
  3. CSA-201 translator with allocator (t0-t6 + s1-s11 pool, with prologue/epilogue)

For each version, count:

  • Total instruction lines in the output .asm file
  • Number of lw instructions (loads from stack)
  • Number of sw instructions (stores to stack)

Record all three sets of counts and compute the reduction ratios.

C2: Analysis (3 pts)

Write a 300-500 word analysis:

  1. What is the actual reduction ratio your allocator achieves vs the CSA-101 baseline? Compare against the Lab 7.4 prediction of 3-5x.

  2. For which functions in the benchmark does the allocator help most? For which does it help least? What property of the function explains this?

  3. Does adding the callee-saved registers (s1-s11) improve the total code quality, accounting for the prologue/epilogue overhead? At what function size does the break-even point occur?

Add a Toolchain Diary entry summarizing the results. Include the exact command line you used to compile the benchmark.


Grading

Part Criteria Points
A1 Liveness analysis correct on the unit test; loop handling present 5
A2 Register pool defined; caller-saved and callee-saved pools both available 3
B1 Linear scan algorithm implemented; spill + expire logic correct 7
B2 Code emission uses allocated registers; spills insert lw/sw correctly 3
C1 All three benchmark measurements present and plausible 4
C2 Analysis explains the results; addresses all three questions 3
Total 25