Week 12: Compiler III — OS-Aware Compilation

A single-class compiler is an exercise. A multi-class compiler that links against an operating system is a tool. Week 12 wires the compiler into the full toolchain: multiple VirtusLang source files compiled in order, linked against Virtus OS stdlib objects, and producing a binary that boots on the Tang Primer 25K. This is the week where the abstraction ladder you have been climbing becomes visible as a complete structure.


Reading

  • Petzold Ch 25 ("The World Brain," pp. 361-378) second visit: Re-read the chapter with fresh eyes now that you have built a compiler. Petzold describes the complete ladder; you have now built most of the rungs. (~18 pages)

Lecture

3 hours. Key arc:

What single-class compilation missed. The Week 11 compiler processed one .vl file and emitted one .vm file. A real program has multiple classes, each in its own file. The compiler must handle this without knowing the implementation of classes that are defined elsewhere — it only needs to know their method signatures to emit correct call instructions.

The compiler.py driver. The entry point that orchestrates multi-file compilation:

#!/usr/bin/env python3
"""compiler.py -- VirtusLang multi-file compiler.

Usage: python3 compiler.py <source-dir-or-file> -o <output-dir>

If given a directory, compiles all .vl files in that directory.
Each .vl file produces one .vm file in the output directory.
"""
import sys
from pathlib import Path
from tokenizer import Tokenizer
from parser import Parser
from codegen import CodeGen

def compile_file(src_path: Path, out_dir: Path):
    source = src_path.read_text()
    tok = Tokenizer(source)
    parser = Parser(tok)
    tree = parser.parse_class()
    gen = CodeGen()
    gen.compile_class(tree)
    out_path = out_dir / src_path.with_suffix('.vm').name
    out_path.write_text('\n'.join(gen.output) + '\n')
    print(f"  {src_path.name} -> {out_path.name} ({len(gen.output)} VM lines)")

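def main():
    args = sys.argv[1:]
    if not args or args[0] == '-o':
        print("Usage: compiler.py <source.vl | source-dir> [-o output-dir]")
        sys.exit(1)

    src = Path(args[0])
    if '-o' in args:
        i = args.index('-o')
        if i + 1 >= len(args):
            print("error: -o requires an output directory")
            sys.exit(1)
        out_dir = Path(args[i + 1])
    else:
        # Default: next to the sources (the directory itself, or the file's parent)
        out_dir = src if src.is_dir() else src.parent
    out_dir.mkdir(parents=True, exist_ok=True)

    sources = sorted(src.glob('*.vl')) if src.is_dir() else [src]
    print(f"Compiling {len(sources)} file(s) -> {out_dir}/")
    for s in sources:
        compile_file(s, out_dir)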

if __name__ == '__main__':
    main()

OS library call lowering. When a VirtusLang program calls Output.printInt(x), the compiler emits:

push argument 0    // x value
call Output.printInt 1
pop temp 0         // discard (if do statement)

The Output.printInt symbol is unresolved in the compiled VM code. The VM translator emits the call as a reference to a label it does not define, the assembler records that reference in the .reloc table of the assembled VOF, and the linker resolves it when linking against the Virtus OS stdlib VOF files. This three-stage resolution (compiler knows the signature, assembler records the reference, linker resolves the address) is the same pipeline that a C compiler, the GNU assembler, and the GNU linker use together.
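The linker's part of that resolution can be sketched in a few lines. This is a minimal illustration, assuming a simplified object representation: the dictionary fields (size, symbols, relocs) and the link function are hypothetical stand-ins, not the actual VOF layout or the course linker.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for a VOF object (NOT the real layout):
#   size    - number of words the object occupies
#   symbols - label -> word offset inside this object
#   relocs  - (word offset of the referencing instruction, symbol it needs)

def link(objects: List[dict]) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Lay objects out back to back, then resolve every relocation.

    Returns (global symbol table, list of (patch address, target address)).
    """
    symtab: Dict[str, int] = {}
    bases: List[int] = []
    base = 0
    # Pass 1: give each object a base address and collect its definitions.
    for obj in objects:
        bases.append(base)
        for name, offset in obj["symbols"].items():
            if name in symtab:
                raise ValueError(f"duplicate symbol: {name}")
            symtab[name] = base + offset
        base += obj["size"]
    # Pass 2: every recorded reference must now have a definition.
    patches: List[Tuple[int, int]] = []
    for obj, obj_base in zip(objects, bases):
        for offset, name in obj["relocs"]:
            if name not in symtab:
                raise ValueError(f"unresolved symbol: {name}")
            patches.append((obj_base + offset, symtab[name]))
    return symtab, patches

# A user object that calls Output.printInt, and an OS object that defines it.
main_obj = {"size": 10, "symbols": {"Main.main": 0},
            "relocs": [(3, "Output.printInt")]}
os_obj = {"size": 20, "symbols": {"Output.printInt": 4}, "relocs": []}

symtab, patches = link([main_obj, os_obj])
print(symtab["Output.printInt"])  # 14
print(patches)                    # [(3, 14)]
```

Linking main_obj on its own raises the "unresolved symbol" error, which is the situation the stdlib objects exist to prevent.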

Virtus OS service signatures (required by the compiler). The compiler must know argument counts for OS calls; it does not need to know implementations:

# OS call signatures: class -> {method -> n_args (not counting 'this')}
OS_SIGNATURES = {
    "Math":     {"abs": 1, "multiply": 2, "divide": 2, "max": 2, "min": 2, "sqrt": 1},
    "String":   {"new": 1, "dispose": 0, "length": 0, "charAt": 1, "appendChar": 1,
                  "intValue": 0, "setInt": 1, "backSpace": 0, "doubleQuote": 0, "newLine": 0},
    "Array":    {"new": 1, "dispose": 0},
    "Output":   {"moveCursor": 2, "printChar": 1, "printString": 1, "printInt": 1,
                  "println": 0, "backSpace": 0},
    "Screen":   {"clearScreen": 0, "setColor": 1, "drawPixel": 2, "drawLine": 4,
                  "drawRectangle": 4, "drawCircle": 3},
    "Keyboard": {"keyPressed": 0, "readChar": 0, "readLine": 1, "readInt": 1},
    "GamePad":  {"buttonPressed": 1, "readAxes": 0},
    "Memory":   {"peek": 1, "poke": 2, "alloc": 1, "deAlloc": 1},
    "Sys":      {"halt": 0, "error": 1, "wait": 1},
}
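One way the code generator might consult this table at a call site is sketched below. The function name lower_os_call and its parameters are hypothetical, and the table is a two-class excerpt so the example stays self-contained. The sketch covers only static-style calls such as Math and Output; for instance methods the emitted count would presumably be one higher, since the table does not count 'this'.

```python
from typing import List

# Excerpt of the table above; a real compiler would use the full OS_SIGNATURES.
OS_SIGNATURES = {
    "Math":   {"abs": 1, "multiply": 2},
    "Output": {"printInt": 1, "println": 0},
}

def lower_os_call(cls: str, method: str, arg_code: List[List[str]],
                  discard_result: bool = False) -> List[str]:
    """Return VM lines for cls.method(args...), checking the argument count.

    arg_code holds, per argument, the VM lines that push its value.
    discard_result=True models a `do` statement, which ignores the return value.
    """
    try:
        expected = OS_SIGNATURES[cls][method]
    except KeyError:
        raise NameError(f"unknown OS call {cls}.{method}") from None
    if len(arg_code) != expected:
        raise TypeError(
            f"{cls}.{method} expects {expected} argument(s), got {len(arg_code)}")
    lines = [line for arg in arg_code for line in arg]   # push args in order
    lines.append(f"call {cls}.{method} {expected}")
    if discard_result:
        lines.append("pop temp 0")                       # drop the return value
    return lines

vm = lower_os_call("Output", "printInt", [["push argument 0"]],
                   discard_result=True)
print("\n".join(vm))
```

The printed lines match the three-instruction lowering shown earlier for Output.printInt(x). A call with the wrong number of arguments fails at compile time instead of corrupting the stack at run time.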

End-to-end pipeline for a multi-file program.

# Create the per-stage output directories
mkdir -p build/vm build/asm build/obj

# Compile a 3-class program
python3 toolchain/compiler/compiler.py program/ -o build/vm/

# Translate all VM files
for vm_file in build/vm/*.vm; do
    python3 toolchain/vm-translator/translator.py "$vm_file" \
        -o "build/asm/$(basename "$vm_file" .vm).s"
done

# Assemble all generated assembly files
for asm_file in build/asm/*.s; do
    python3 toolchain/assembler/asm.py "$asm_file" \
        -o "build/obj/$(basename "$asm_file" .s).vof"
done

# Link: user objects + Virtus OS stdlib
python3 toolchain/linker/linker.py \
    build/obj/*.vof \
    virtus-os/*.vof \
    -o build/program.bin

# Verify
sha256sum build/program.bin
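The naming convention the loops above rely on is that each source file keeps its stem through every stage. A small sketch of that mapping follows; artifact_paths is a hypothetical helper, and Ball.vl and Bat.vl are made-up file names used only for illustration.

```python
from pathlib import Path

def artifact_paths(src: Path, build: Path = Path("build")) -> dict:
    """Map one .vl source to the per-stage files the pipeline produces."""
    stem = src.stem
    return {
        "vm":  build / "vm" / f"{stem}.vm",    # compiler output
        "asm": build / "asm" / f"{stem}.s",    # VM translator output
        "obj": build / "obj" / f"{stem}.vof",  # assembler output
    }

# Hypothetical source names for a three-class program.
for name in ["Ball.vl", "Bat.vl", "PongGame.vl"]:
    src = Path("program") / name
    stages = [src, *artifact_paths(src).values()]
    print(" -> ".join(p.as_posix() for p in stages))
```

Because the stems line up, a missing .vof in build/obj/ points directly at the source file whose translation or assembly failed.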

HDMI output via Virtus Console. The Virtus Console peripheral exposes a framebuffer that Screen.drawPixel and Screen.drawRectangle write to. On the hardware path:

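// verilog/console/framebuffer.v
// Dual-port BRAM: CPU writes pixel data; HDMI controller reads it
// Pixel format: 1 bit per pixel (monochrome); 640x480 = 38,400 bytes
module framebuffer (
    input  wire clk_cpu,
    input  wire [16:0] cpu_addr,    // 17-bit byte address: covers 128KB
    input  wire [31:0] cpu_data,
    input  wire cpu_we,
    input  wire clk_hdmi,
    input  wire [16:0] hdmi_addr,
    output reg  [31:0] hdmi_data
);
    // Body below is a sketch, assuming byte addresses and word-aligned access:
    // 38,400 bytes / 4 = 9,600 32-bit words, indexed by address bits [15:2].
    reg [31:0] mem [0:9599];

    always @(posedge clk_cpu)       // CPU write port
        if (cpu_we) mem[cpu_addr[15:2]] <= cpu_data;

    always @(posedge clk_hdmi)      // HDMI read port
        hdmi_data <= mem[hdmi_addr[15:2]];
endmodule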
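The pixel-format arithmetic in the header comment can be checked with a short calculation. The sketch below assumes row-major packing into 32-bit words with bit 0 as the leftmost pixel of each word; that bit order and the pixel_location helper are assumptions, and the actual Virtus Console layout may differ.

```python
WIDTH, HEIGHT = 640, 480
BITS_PER_WORD = 32
WORDS_PER_ROW = WIDTH // BITS_PER_WORD   # 20 words per scanline

def pixel_location(x: int, y: int) -> tuple:
    """Return (word index, bit mask) for pixel (x, y).

    Assumes row-major packing with bit 0 as the leftmost pixel of each word.
    """
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError(f"pixel ({x}, {y}) is off screen")
    word = y * WORDS_PER_ROW + x // BITS_PER_WORD
    mask = 1 << (x % BITS_PER_WORD)
    return word, mask

total_bytes = WIDTH * HEIGHT // 8
print(total_bytes)               # 38400
print(pixel_location(0, 0))      # (0, 1)
print(pixel_location(33, 2))     # (41, 2)
print(pixel_location(639, 479))  # (9599, 2147483648)
```

The last pixel lands in word 9,599, so the framebuffer needs 9,600 words, which fits in 14 address bits and leaves headroom in the 17-bit address ports.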

For students on the UART-only path (no Tang Primer 25K with HDMI extension), Screen.drawPixel can be redirected to a UART escape sequence that prints character-art representations — acceptable for the Lab 12 submission.
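One possible shape for that redirection is sketched here. It assumes the receiving terminal understands the ANSI cursor-position sequence (ESC [ row ; col H) and that each character cell stands for an 8x16 block of pixels, giving an 80x30 grid; the cell size and the pixel_to_uart name are illustrative choices, not part of the Lab 12 specification.

```python
CELL_W, CELL_H = 8, 16   # assumed pixels per character cell -> 80 x 30 grid

def pixel_to_uart(x: int, y: int, on: bool = True) -> bytes:
    """Return the bytes to send over UART to show pixel (x, y) as a character.

    Emits the ANSI cursor-position sequence ESC [ row ; col H (1-based),
    followed by '#' for a set pixel or ' ' for a cleared one.
    """
    row = y // CELL_H + 1
    col = x // CELL_W + 1
    glyph = "#" if on else " "
    return f"\x1b[{row};{col}H{glyph}".encode("ascii")

print(pixel_to_uart(0, 0))       # b'\x1b[1;1H#'
print(pixel_to_uart(639, 479))   # b'\x1b[30;80H#'
```

Many pixels map to the same cell, so the output is a coarse rendering; it is enough to confirm that the ball and paddles move where the game logic says they should.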


Lab exercises

Four labs in labs/lab-12.md. Plan for ~5 hours.

  • Lab 12.1. Run compiler.py on the three-class Pong/ directory. Verify all three .vm files are produced. Spot-check PongGame.vm for correct method-dispatch and object-construction patterns.
  • Lab 12.2. Run the full pipeline: Pong/ → VM → assembly → object files → link → binary → simulation. The Pong game requires Screen, Output, Keyboard, and GamePad OS services. Stub any missing services with minimal implementations for simulation.
  • Lab 12.3. Boot the Pong binary on the Tang Primer 25K (hardware path) or verify UART output matches expected screen state (UART path). Record the binary size and instruction count in your Toolchain Diary.
  • Lab 12.4. Write a new VirtusLang program of your own design — at minimum one class with one method, using at least one while loop and one OS call. Compile it, run it, and verify the output. Include the source, the generated VM, and the simulation output in your submission.

Independent practice

  • Read Knuth, The Art of Computer Programming, Vol. 1 §2.3 (Trees) for the conceptual background behind parse trees, or review the first two chapters of the Dragon Book (Aho, Lam, Sethi, Ullman, Compilers: Principles, Techniques, and Tools). The recursive-descent parser you built is described in Dragon Book §4.4.
  • Compare the Virtus OS service table above against the nand2tetris Jack OS API. Note one service that Virtus OS adds (GamePad) and explain what hardware change makes it possible.

Architecture comparison sidebar

Multi-file compilation: CSA-102 Py6502v vs CSA-110 VirtusLang vs GCC.

CSA-102's Py6502v compiler assembled everything into a single flat binary. "Multi-file" compilation in CSA-102 meant concatenating source files and running the compiler once. No separate object files; no linker.

CSA-110's compiler produces one .vm file per source file. The VM translator turns each .vm file into one .s assembly file, the assembler turns each .s file into one .vof object file, and the linker combines the objects. Each stage can run independently. The stdlib (virtus-os/*.vof) is pre-compiled and linked in without recompilation, the same model that system libraries (libc, libm) use on Linux.

GCC with gcc -c produces object files from C source files (ELF objects on Linux); ld links them. The structure is the same as CSA-110's pipeline. The key difference is scale: GCC handles many source languages (C, C++, Fortran, Ada, Go, and others), dozens of target architectures, and millions of symbols. CSA-110 handles VirtusLang, RV32I-Lite, and a few hundred symbols. The structure that makes GCC extensible (front end → IR → back end → linker) is the structure you built this week.


Reflection prompts

  1. The compiler.py driver compiles .vl files in sorted order. Why does the order of compilation matter (or not matter) for correctness? When could it matter?
  2. The OS service signature table in the lecture is hardcoded in the compiler. What would break if a future version of Virtus OS changed the argument count of Output.printInt? Propose a mechanism that would make the compiler robust against OS signature changes without recompiling the OS.
  3. Lab 12.4 asks you to write a new VirtusLang program. What was the first thing that didn't work when you compiled and ran it? Describe the debugging steps and the root cause.

What's next

Week 13 builds the Virtus OS itself — the nine services that the compiled programs call. You have been generating calls to Math.multiply, String.new, and Screen.drawPixel throughout Weeks 10-12; Week 13 is where you write those implementations.