Classroom Glossary Public page

Lab 5.1: Compiler Inlining and Constant Folding

832 words

Total points: 25
Estimated time: 3.5 hours
Prerequisites: Labs 3.1 and 4.1 complete; allocator + peephole pipeline working


Overview

You will add an inlining pass and a constant-folding pass to your compiler pipeline. The inliner substitutes small function bodies at their call sites; the constant folder evaluates expressions at compile time when all operands are known constants. Combined with the allocator and peephole from Labs 3-4, these passes close the Lab 11.4 five-categories baseline.


Part A: Function inliner (12 pts)

A1: Build the inline candidate set (3 pts)

Before the main compilation pass, scan all function definitions and identify inline candidates. A function is an inline candidate if:

  1. Its body is fewer than 10 instructions (post-allocator, pre-peephole; count the instructions between the function prologue and return)
  2. It is not recursive (no call to itself within the body)

Build a dictionary inline_candidates: {function_name: [instruction_list]} mapping each candidate's name to its body (without the prologue and epilogue). For Virtus OS, this should include at minimum: Math.abs, Math.max, Math.min, Math.signum, Math.sqrt (if under threshold).

A2: Inline at call sites (6 pts)

At each call site call <function_name>:

  1. Check if function_name is in inline_candidates.
  2. If yes, substitute the function body inline: a. Rename the callee's parameter registers to match the call-site's argument registers (the caller puts arguments in a0-a7; the callee reads from a0-a7 by ABI; for Virtus OS functions that follow the ABI, no renaming is needed for the first 8 arguments). b. Rename the callee's return address (the callee jumps to ra on return; replace jalr x0, ra, 0 with a fall-through label). c. Rename any temporaries in the callee body that conflict with temporaries the caller is currently using (alpha-rename with fresh names).
  3. If no, emit the call instruction unchanged.

Handle the call-site register move sequence: for Virtus OS, calls follow the pattern mv a0, <arg0>; mv a1, <arg1>; call Math.max. After inlining, the mv instructions may become redundant if the callee body reads a0/a1 directly; run the peephole pass again on the inlined sequence to eliminate them.

A3: Guard against call-count blowup (3 pts)

Add a second inlining policy: if a function is called more than 20 times and has more than 5 instructions, do not inline it (code-size concern). Add a --force-inline flag that bypasses this guard for testing.

Write a test that verifies: (a) a function with 4 instructions called 50 times IS inlined at all 50 sites; (b) a function with 8 instructions called 25 times is NOT inlined; (c) with --force-inline, case (b) is overridden.


Part B: Constant folding (8 pts)

B1: Abstract interpretation over the instruction stream (5 pts)

Implement a constant-propagation pass as an abstract interpretation over the instruction stream. Maintain a map const_vals: {register: int | TOP} where TOP means "unknown." Initialize all registers to TOP.

For each instruction:

  • li Rd, imm: const_vals[Rd] = imm
  • add Rd, Rs1, Rs2: if both Rs1 and Rs2 are in const_vals, set const_vals[Rd] = const_vals[Rs1] + const_vals[Rs2]; else set const_vals[Rd] = TOP
  • mul, sub, sll, srl, sra, and, or, xor: same pattern as add
  • mv Rd, Rs: if Rs is in const_vals, propagate; else TOP
  • lw Rd, N(Rb): const_vals[Rd] = TOP (loaded values are unknown)
  • Any instruction with a side effect or jump: for safety, set all registers to TOP (conservative; improve later)
  • bge, blt, beq, bne, blez, bgez: if both operands are known constants, evaluate the branch: if not-taken, the subsequent instructions are dead; if taken, jump the abstract state to the target label

For constant expressions where you can fold: replace the instruction with li Rd, <computed_value>.

B2: Dead-branch elimination (3 pts)

After constant folding, add a dead-branch pass: if a branch instruction has a known-constant condition, replace it with either:

  • A j label (unconditional jump) if the branch is always taken
  • A no-op (delete the branch) if the branch is never taken

Then remove all instructions between the never-taken branch and the next label (they are dead code). This is the "super-optimization" effect: inlining exposes constant arguments, folding evaluates them, branch elimination removes the dead path, and the peephole cleans up the result.


Part C: Measurement and analysis (5 pts)

C1: Compile and measure (3 pts)

Compile benchmark-inlining.jack (provided in labs/fixtures/) with five configurations:

  1. CSA-101 baseline
  2. Allocator only
  3. Allocator + peephole
  4. Allocator + peephole + inliner
  5. Allocator + peephole + inliner + folding

Fill in:

Config Instructions vs baseline math-call sites inlined branches folded
1. Baseline 1.0x 0 0
2. Alloc N/A N/A
3. Alloc+PH N/A N/A
4. Alloc+PH+Inline N/A
5. Full pipeline

C2: Analysis (2 pts)

Write 200 words:

  • What percentage of the total reduction does the inliner contribute?
  • For constant arguments, does inliner + folding produce shorter code than the original function call? Show one concrete example.
  • What is the largest function in benchmark-inlining.jack that the inliner chose not to inline? Was this the right decision?

Grading

Part Criteria Points
A1 Inline candidate scan correct; Virtus OS utility functions identified 3
A2 Inliner substitutes body correctly; registers renamed correctly; peephole re-run 6
A3 Call-count guard works; --force-inline bypasses it; unit test passes 3
B1 Constant propagation correct on benchmark; at least 5 constants folded 5
B2 Dead-branch elimination removes unreachable instructions on at least 1 example 3
C1 Measurement table complete with all 5 configurations 3
C2 Analysis addresses all three questions 2
Total 25