Total points: 25
Estimated time: 4 hours
Prerequisites: CSA-101 compiler (VM translator) working; Python or your compiler's implementation language; Lab 7.4 baseline measurements from CSA-101
Overview
You will add a linear-scan register allocator pass to your CSA-101 VM translator. The pass replaces the naive always-spill policy with a proper live-interval analysis and register assignment. Lab 7.4 documented 3-5x code bloat from the naive spill policy; this lab closes that forward-promise.
Part A: Liveness analysis (8 pts)
A1: Compute live intervals (5 pts)
Write a liveness analysis pass that processes a single function's assembly output (the output of your VM translator before final emission). The pass produces a LiveInterval for each variable or temporary:
class LiveInterval:
name: str # variable or temp name
start: int # index of first instruction that defines this value
end: int # index of last instruction that uses this value
For the purposes of this pass, treat each "variable" as one of:
- A named local variable (from the symbol table)
- A VM stack position that has been lifted to a named temporary by your translator
Instruction indices are sequential integers; the pass assigns them in document order.
Handling loops: when a variable is used in a loop body that might execute multiple times, extend its live interval to span the entire loop body. Use a conservative approximation: if an instruction is inside a loop (identified by a back-edge in the control flow), extend any live interval that crosses the loop header to the loop footer.
Unit test. Provide the assembly output for a function with 5 local variables; verify that the computed live intervals are correct by hand-checking against the assembly.
A2: Build the register pool (3 pts)
Define the register pool for the allocator: temporaries t0-t6 (7 registers, caller-saved). Build a data structure that tracks which registers are currently assigned (active set) and which are free.
For callee-saved registers (s1-s11): add a flag --use-saved-regs that extends the pool to include s1-s11 at the cost of generating save/restore code in the function prologue/epilogue. Report in Part C whether the flag improves or harms the overall code quality on the benchmark.
Part B: Linear scan allocation (10 pts)
B1: Implement linear scan (7 pts)
Implement the linear-scan algorithm:
sort intervals by start point
active = [] # (register, interval) pairs sorted by end point
free_regs = register_pool[:] # copy of the pool
for interval in sorted_intervals:
expire_old(active, interval.start, free_regs)
if len(free_regs) == 0:
# spill: evict interval with furthest end
to_spill = max(active, key=lambda x: x[1].end)
if to_spill[1].end > interval.end:
interval.register = to_spill[0]
to_spill[1].stack_slot = allocate_stack_slot()
active.remove(to_spill)
active.append((interval.register, interval))
else:
interval.stack_slot = allocate_stack_slot()
else:
reg = free_regs.pop()
interval.register = reg
active.append((reg, interval))
def expire_old(active, current_start, free_regs):
for (reg, interval) in list(active):
if interval.end < current_start:
free_regs.append(reg)
active.remove((reg, interval))
B2: Code generation with allocation (3 pts)
Modify your VM translator's code emission stage to use the allocator's output:
- When emitting an instruction that reads or writes a variable assigned to a register, use that register directly (no stack access).
- When emitting an instruction for a spilled variable, insert
lw reg_temp, N(sp)before use andsw reg_temp, N(sp)after definition, where N is the allocated stack slot offset.
Part C: Measurement (7 pts)
C1: Compile the benchmark (4 pts)
Compile benchmark-allocator.jack (provided in courses/csa-201/labs/fixtures/benchmark-allocator.jack) with:
- CSA-101 translator (no allocator; always spill)
- CSA-201 translator with linear-scan allocator (t0-t6 pool only)
- CSA-201 translator with allocator (t0-t6 + s1-s11 pool, with prologue/epilogue)
For each version, count:
- Total instruction lines in the output
.asmfile - Number of
lwinstructions (loads from stack) - Number of
swinstructions (stores to stack)
Record all three sets of counts and compute the reduction ratios.
C2: Analysis (3 pts)
Write a 300-500 word analysis:
-
What is the actual reduction ratio your allocator achieves vs the CSA-101 baseline? Compare against the Lab 7.4 prediction of 3-5x.
-
For which functions in the benchmark does the allocator help most? For which does it help least? What property of the function explains this?
-
Does adding the callee-saved registers (s1-s11) improve the total code quality, accounting for the prologue/epilogue overhead? At what function size does the break-even point occur?
Add a Toolchain Diary entry summarizing the results. Include the exact command line you used to compile the benchmark.
Grading
| Part | Criteria | Points |
|---|---|---|
| A1 | Liveness analysis correct on the unit test; loop handling present | 5 |
| A2 | Register pool defined; caller-saved and callee-saved pools both available | 3 |
| B1 | Linear scan algorithm implemented; spill + expire logic correct | 7 |
| B2 | Code emission uses allocated registers; spills insert lw/sw correctly | 3 |
| C1 | All three benchmark measurements present and plausible | 4 |
| C2 | Analysis explains the results; addresses all three questions | 3 |
| Total | 25 |