Total points: 25
Estimated time: 3.5 hours
Prerequisites: Labs 3.1 and 4.1 complete; allocator + peephole pipeline working
Overview
You will add an inlining pass and a constant-folding pass to your compiler pipeline. The inliner substitutes small function bodies at their call sites; the constant folder evaluates expressions at compile time when all operands are known constants. Combined with the allocator and peephole from Labs 3-4, these passes close the Lab 11.4 five-categories baseline.
Part A: Function inliner (12 pts)
A1: Build the inline candidate set (3 pts)
Before the main compilation pass, scan all function definitions and identify inline candidates. A function is an inline candidate if:
- Its body is fewer than 10 instructions (post-allocator, pre-peephole; count the instructions between the function prologue and return)
- It is not recursive (no call to itself within the body)
Build a dictionary inline_candidates: {function_name: [instruction_list]} mapping each candidate's name to its body (without the prologue and epilogue). For Virtus OS, this should include at minimum: Math.abs, Math.max, Math.min, Math.signum, Math.sqrt (if under threshold).
A2: Inline at call sites (6 pts)
At each call site call <function_name>:
- Check if
function_nameis ininline_candidates. - If yes, substitute the function body inline:
a. Rename the callee's parameter registers to match the call-site's argument registers (the caller puts arguments in a0-a7; the callee reads from a0-a7 by ABI; for Virtus OS functions that follow the ABI, no renaming is needed for the first 8 arguments).
b. Rename the callee's return address (the callee jumps to
raon return; replacejalr x0, ra, 0with a fall-through label). c. Rename any temporaries in the callee body that conflict with temporaries the caller is currently using (alpha-rename with fresh names). - If no, emit the call instruction unchanged.
Handle the call-site register move sequence: for Virtus OS, calls follow the pattern mv a0, <arg0>; mv a1, <arg1>; call Math.max. After inlining, the mv instructions may become redundant if the callee body reads a0/a1 directly; run the peephole pass again on the inlined sequence to eliminate them.
A3: Guard against call-count blowup (3 pts)
Add a second inlining policy: if a function is called more than 20 times and has more than 5 instructions, do not inline it (code-size concern). Add a --force-inline flag that bypasses this guard for testing.
Write a test that verifies: (a) a function with 4 instructions called 50 times IS inlined at all 50 sites; (b) a function with 8 instructions called 25 times is NOT inlined; (c) with --force-inline, case (b) is overridden.
Part B: Constant folding (8 pts)
B1: Abstract interpretation over the instruction stream (5 pts)
Implement a constant-propagation pass as an abstract interpretation over the instruction stream. Maintain a map const_vals: {register: int | TOP} where TOP means "unknown." Initialize all registers to TOP.
For each instruction:
li Rd, imm:const_vals[Rd] = immadd Rd, Rs1, Rs2: if both Rs1 and Rs2 are inconst_vals, setconst_vals[Rd] = const_vals[Rs1] + const_vals[Rs2]; else setconst_vals[Rd] = TOPmul,sub,sll,srl,sra,and,or,xor: same pattern asaddmv Rd, Rs: if Rs is inconst_vals, propagate; else TOPlw Rd, N(Rb):const_vals[Rd] = TOP(loaded values are unknown)- Any instruction with a side effect or jump: for safety, set all registers to TOP (conservative; improve later)
bge,blt,beq,bne,blez,bgez: if both operands are known constants, evaluate the branch: if not-taken, the subsequent instructions are dead; if taken, jump the abstract state to the target label
For constant expressions where you can fold: replace the instruction with li Rd, <computed_value>.
B2: Dead-branch elimination (3 pts)
After constant folding, add a dead-branch pass: if a branch instruction has a known-constant condition, replace it with either:
- A
j label(unconditional jump) if the branch is always taken - A no-op (delete the branch) if the branch is never taken
Then remove all instructions between the never-taken branch and the next label (they are dead code). This is the "super-optimization" effect: inlining exposes constant arguments, folding evaluates them, branch elimination removes the dead path, and the peephole cleans up the result.
Part C: Measurement and analysis (5 pts)
C1: Compile and measure (3 pts)
Compile benchmark-inlining.jack (provided in labs/fixtures/) with five configurations:
- CSA-101 baseline
- Allocator only
- Allocator + peephole
- Allocator + peephole + inliner
- Allocator + peephole + inliner + folding
Fill in:
| Config | Instructions | vs baseline | math-call sites inlined | branches folded |
|---|---|---|---|---|
| 1. Baseline | 1.0x | 0 | 0 | |
| 2. Alloc | N/A | N/A | ||
| 3. Alloc+PH | N/A | N/A | ||
| 4. Alloc+PH+Inline | N/A | |||
| 5. Full pipeline |
C2: Analysis (2 pts)
Write 200 words:
- What percentage of the total reduction does the inliner contribute?
- For constant arguments, does inliner + folding produce shorter code than the original function call? Show one concrete example.
- What is the largest function in
benchmark-inlining.jackthat the inliner chose not to inline? Was this the right decision?
Grading
| Part | Criteria | Points |
|---|---|---|
| A1 | Inline candidate scan correct; Virtus OS utility functions identified | 3 |
| A2 | Inliner substitutes body correctly; registers renamed correctly; peephole re-run | 6 |
| A3 | Call-count guard works; --force-inline bypasses it; unit test passes | 3 |
| B1 | Constant propagation correct on benchmark; at least 5 constants folded | 5 |
| B2 | Dead-branch elimination removes unreachable instructions on at least 1 example | 3 |
| C1 | Measurement table complete with all 5 configurations | 3 |
| C2 | Analysis addresses all three questions | 2 |
| Total | 25 |