Week 8: VM I (Stack Arithmetic + Memory Segments)

Add a layer to the toolchain. A stack-based virtual machine. You write the VM translator that consumes VM bytecode and emits RV32I-Lite assembly. By end of week your translator handles stack arithmetic plus the four memory segments (local, argument, this, that).


Reading

  • Chapter prose (primary). draft-chapters/ch7-vm-i-prose.md
  • Petzold weave anchors. Ch 17 Automation (returning visit, pp. 209 + 212); Ch 22 The Operating System p. 328 (returning visit, bootstrap loader); Ch 24 Languages High and Low p. 354 (returning visit, "ALGOL... seminal language, the direct ancestor..."). All three are returning visits; the threads start to weave together
  • Cross-chapter handouts. VM segment cheat sheet

Lecture

lectures/ch7-vm-i-lecture.md. 3 hours. Key arc:

  • Why a VM. The Jack-equivalent language (Ch 9-11) is easier to compile to a stack machine than to RV32I-Lite directly. The VM is the intermediate layer
  • Stack arithmetic. push 3; push 4; add leaves 7 on top of the stack. The VM operations are simple; the translator turns them into RV32I-Lite instructions that manipulate the stack pointer
  • Memory segments. The VM exposes named regions (local, argument, this, that, static, constant, pointer, temp). The translator maps each named region to a base-address-plus-offset pattern in RV32I-Lite
  • The translator is a one-pass pattern matcher. Each VM op has a fixed RV32I-Lite expansion
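The last bullet is the whole design: each VM op expands to a fixed instruction template. A minimal sketch of that pattern matcher (the register names sp/t0/t1, the exact mnemonics, and the `translate` helper are assumptions for illustration, not the course's vm-translator module):

```python
# Hypothetical sketch of the one-pass pattern matcher: each VM op maps
# to a fixed RV32I-Lite template. Register names/mnemonics are assumed.

PUSH_CONSTANT = [
    "addi t0, x0, {n}",  # t0 <- the constant n
    "sw t0, 0(sp)",      # write it into the next-free slot
    "addi sp, sp, 4",    # sp points one past the top again
]

ADD = [
    "lw t0, -8(sp)",     # second-from-top operand
    "lw t1, -4(sp)",     # top operand
    "add t0, t0, t1",
    "sw t0, -8(sp)",     # result replaces the second-from-top
    "addi sp, sp, -4",   # net effect: popped two, pushed one
]

def translate(vm_line: str) -> list[str]:
    """Expand one VM command into its fixed RV32I-Lite template."""
    parts = vm_line.split()
    if parts[:2] == ["push", "constant"]:
        return [t.format(n=parts[2]) for t in PUSH_CONSTANT]
    if parts == ["add"]:
        return list(ADD)
    raise ValueError(f"no template yet for: {vm_line}")
```

Because every expansion is fixed, the translator never needs a second pass or any analysis: it is a lookup plus string substitution.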
Stack push/pop small-multiples for the VM sequence push 5; push 7; add. Four panels left-to-right show the stack state Before, after push 5, after push 7, and after add. Each panel renders four stack cells stacked vertically with absolute addresses on the right (stack base 0x00010030), an amber sp arrow pinned to the next-free slot, and the cell just touched highlighted amber. Below each panel is the emitted RV32I-Lite assembly that produced the transition.

Figure 8.1. The same push 5; push 7; add walked across four panels of stack state. The amber cell in each panel is the one your translator's emitted instructions just wrote to. The sp marker ascends from 0x00010030 to 0x00010038 and back down to 0x00010034: each push writes a word, then sp += 4; the add pops twice and pushes once (net sp -= 4). Per cross-chapter-vm-segment-cheat-sheet.md, sp always points at the next-free slot, one past the topmost occupied word. Reuse this picture when Lab 7.1 first asks you to predict the SP-value column.
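One way to sanity-check an SP-value column like Lab 7.1's is to trace only the pointer and ignore the values written. A sketch under the cheat sheet's rule that sp points at the next-free slot (the loop treats every non-push op as net pop-two-push-one, which holds for add):

```python
# sp trace for the Figure 8.1 sequence; addresses are the figure's own.
sp = 0x00010030                       # stack base: sp before any push
trace = [sp]
for op in ["push 5", "push 7", "add"]:
    if op.startswith("push"):
        sp += 4                       # write one word, then bump sp
    else:                             # add: pop two operands, push one result
        sp -= 4
    trace.append(sp)
# trace ends back at 0x00010034, matching the fourth panel
```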

Lab exercises

Five labs in worksheets/ch7/.

Plan for ~6 hours of lab (the simulator companion adds ~75 minutes on top of the original budget).

Independent practice

  • Re-read Petzold Ch 17 + Ch 22 + Ch 24 for the returning-visit theses. Notice that the three chapters are starting to interlock in your mental model the way they do in Petzold's book
  • Update your Toolchain Diary. Week 8 introduces: VM bytecode notation, stack-pointer arithmetic, the segment-base-plus-offset addressing pattern, the academy's vm-translator Python module

Where the segments live in the memory map

Ch 3's byte-addressable RAM had no regions. Ch 7's VM hands you eight named segments: four of them (local, argument, this, that) are reached through base pointers (LCL, ARG, THIS, THAT) stored at fixed absolute addresses, one (temp) is eight fixed slots, and the remaining three (static, constant, pointer) live in .data or are inlined by the translator.

Virtus Console 32-bit memory map. Same diagram as Ch 3 figure: ten regions left-to-right by address with the VM segment-base region at 0x00010000 amber-highlighted. The inset zoom expands the amber block into LCL_addr, ARG_addr, THIS_addr, THAT_addr, and the eight temp slots.

Figure 8.2. Same memory-map strip you saw in Ch 3 §Where this RAM lives. The amber block is now the focus: by end of Ch 7 your translator emits code that reads LCL_addr from 0x00010000 and uses it as the base for local 0, local 1, etc. The eight temp slots at 0x00010010..0x0001002C are direct (no indirection); the four pointer slots below them are the indirection step. Pin this picture during Lab 7.2.
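The indirection step in Figure 8.2 can be sketched as emitted code for push local i. The lui/lw/sw mnemonics are standard RV32I; whether RV32I-Lite keeps lui, plus the register names and this `push_local` helper, are assumptions for illustration:

```python
# Sketch of the Figure 8.2 indirection for "push local i".
# LCL_ADDR is the absolute slot that holds the LCL base pointer.

LCL_ADDR = 0x00010000

def push_local(i: int) -> list[str]:
    return [
        f"lui t0, 0x{LCL_ADDR >> 12:x}",  # t0 = 0x00010000
        "lw t0, 0(t0)",                   # t0 = *LCL_addr -- the indirection step
        f"lw t1, {4 * i}(t0)",            # t1 = local i  (base + 4*i)
        "sw t1, 0(sp)",                   # push it
        "addi sp, sp, 4",
    ]
```

By contrast, temp i needs no first lw: it is simply the fixed slot 0x00010010 + 4*i, which is exactly the direct-versus-indirect contrast the caption describes.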

Architecture comparison sidebar

Stack-based VMs (JVM, Python bytecode, WebAssembly) are easier to compile to but generally slower to interpret than register-based VMs (Dalvik on Android, the Lua 5 VM). The trade-off: stack VMs have shorter bytecode (no register operand fields); register VMs have fewer instructions per high-level operation. CSA-101 uses stack-based because it pairs cleanly with the recursive-descent compiler in Ch 9-10.
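The instruction-count side of that trade-off shows up even in a toy encoding of c = a + b (the opcode spellings here are invented for illustration only):

```python
# The same statement, c = a + b, in the two bytecode styles.
stack_form = ["push a", "push b", "add", "pop c"]  # four short ops, no register fields
register_form = ["add c, a, b"]                    # one op carrying three operand fields
```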

Reflection prompts

  1. The VM adds a layer to the toolchain. Why is that a good thing? When would adding more layers make the toolchain worse?
  2. Stack operations require no register names in the bytecode. Why doesn't every VM use a stack architecture?
  3. The four memory segments (local, argument, this, that) are named after their pedagogical role. What would these be called in a production language runtime?

What's next

Week 9 finishes the VM. Program flow (labels and conditional jumps) plus function calls (with the full calling-convention protocol). After Week 9 the VM can express any computable program; that is the first point at which the toolchain is Turing-complete from the source-language side.