CSA-201: Computer Systems Architecture II, Course Outline

CSA-101 closed at the system line: a Tang Primer 25K running an OS the student wrote, on a CPU the student synthesized. Every layer CSA-101 omitted was named along the way; CSA-201 pays those debts.


Course mission and audience

CSA-201 is the academy's Part-II anchor course. Students arrive with a sim-certified and silicon-certified RV32I-Lite CPU, a working compiler toolchain, and Virtus OS v1 running on Tang Primer 25K. Every layer CSA-201 touches is one the student already owns in a simplified form. The course is a systematic expansion: full RV32I plus the M extension, the privileged ISA, compiler improvements, virtual memory, memory protection, and the OS services that depend on all of the above.

The audience is CSA-101 graduates. Entry requirements: RV32I-Lite CPU synthesized and running; assembler, VM translator, and compiler working end-to-end; Virtus OS v1 booting and running at least three standard-library services on silicon.

Position in the pipeline. Belt 5/5; Part-II anchor. CSA-201 is the prerequisite for all six named Part-II electives: VCA-ARM-201, VCA-NET-201, VCA-EMB-201, VCA-NET-301, VCA-X86-201, and VCA-MIPS-201. Cross-track: the XD-strand capstones (XD1 stack/shellcode, XD2 mitigations/ROP, XD3 heap/format strings) all attack the Virtus OS v2 built here.


What you will know at the end

  1. Remember. Recite the full RV32I instruction set (47 instructions in the older count that still included the six CSR instructions and FENCE.I, 40 in the current base; four core formats, R/I/S/U, plus the B and J variants). Name the six CSR instructions. State the RISC-V privilege levels (M, S, U) and the trap mechanism (ECALL, mtvec, mepc, mcause, MRET). Name the Sv32 two-level page-table structure. Name the PMP register layout (pmpcfg, pmpaddr; address-matching modes OFF/TOR/NA4/NAPOT).

  2. Understand (hardware). Explain how Zicsr augments the datapath with a separate CSR address space; how trap delivery works cycle by cycle; how an MMU translates virtual to physical addresses with a two-level page walk; how PMP blocks accesses to protected regions before they reach the memory bus.

  3. Understand (compiler). Explain what a register allocator does and why naively spilling to the stack costs code size; what a peephole pass finds in a local window; why inlining trades code size for call overhead; what SSA form enables that straight assignment cannot.

  4. Apply (hardware). Extend your CSA-101 CPU to full RV32I: widen the register file to 32 entries, add the M-extension multiplier/divider, add Zicsr, add privilege modes with trap delivery, add Sv32 MMU with a TLB, add PMP. Synthesize each step; verify with riscv-tests.

  5. Apply (compiler + OS). Add a register-allocator pass to your CSA-101 compiler and measure the reduction in emitted code. Add peephole and inlining passes. Compile the same source with gcc -O2 on godbolt.org and compare its output with yours. Build Virtus OS v2: U/S split, page tables, PMP, round-robin scheduler, SSD1306 + SD-card + ENC28J60 drivers.

  6. Analyze (cross-layer). Trace a page fault from a user-mode memory access: the MMU raises the fault, the supervisor trap handler services it, and the process resumes. Then trace a stack-smash attempt: the PMP W^X policy intercepts the write to the code page before it reaches memory.

  7. Create (capstone). Deliver Virtus OS v2 running on DE10-Nano: U/S privilege transition demonstrated; page-fault handler running; PMP W^X enforced; round-robin scheduler context-switching two tasks; SSD1306 OLED showing output; SD-card filesystem reading a file. ~4,000 lines total across kernel + drivers, vs CSA-101's ~1,500.
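
The PMP details in outcomes 1, 2 and 6 are easier to internalize with a software model next to the HDL. The Python sketch below is an illustration, not the lab reference: it models single-byte accesses only, ignores the L (lock) bit and therefore never checks M-mode accesses, and uses a hypothetical two-entry layout with 64 KiB NAPOT regions at 0x80000000 (code, R+X) and 0x80010000 (data and stack, R+W). The function names are illustrative.

```python
# pmpcfg byte layout from the RISC-V privileged spec: R, W, X in bits 0-2,
# address-matching mode A in bits 3-4 (the L lock bit, bit 7, is not modeled).
R, W, X = 0x1, 0x2, 0x4
A_SHIFT, A_MASK = 3, 0x3
OFF, TOR, NA4, NAPOT = 0, 1, 2, 3


def napot_encode(base: int, size: int) -> int:
    """Encode a naturally aligned power-of-two region (size >= 8) as pmpaddr."""
    if size < 8 or size & (size - 1) or base % size:
        raise ValueError("NAPOT regions must be power-of-two sized and aligned")
    return (base >> 2) | ((size >> 3) - 1)


def napot_range(pmpaddr: int) -> tuple[int, int]:
    """Decode a NAPOT pmpaddr into (base, size) in bytes.

    pmpaddr holds physical address bits [33:2]; k trailing ones mean a
    region of 2**(k + 3) bytes.
    """
    k = 0
    while (pmpaddr >> k) & 1:
        k += 1
    base = (pmpaddr & ~((1 << k) - 1)) << 2
    return base, 1 << (k + 3)


def pmp_check(cfgs: list[int], addrs: list[int], addr: int, access: int, mode: str) -> bool:
    """Return True if a single-byte access of type R, W or X to addr is allowed.

    The lowest-numbered matching entry decides. An S/U-mode access that
    matches no entry fails. Without lock bits, M mode is never checked.
    """
    if mode == "M":
        return True
    prev = 0  # TOR lower bound: previous entry's pmpaddr (0 for entry 0)
    for cfg, pa in zip(cfgs, addrs):
        a = (cfg >> A_SHIFT) & A_MASK
        if a == TOR:
            lo, hi = prev << 2, pa << 2
        elif a == NA4:
            lo, hi = pa << 2, (pa << 2) + 4
        elif a == NAPOT:
            lo, size = napot_range(pa)
            hi = lo + size
        else:
            lo = hi = 0  # OFF: matches nothing
        prev = pa
        if lo <= addr < hi:
            return bool(cfg & access)
    return False


# Hypothetical W^X layout: 64 KiB of code (R+X) followed by 64 KiB of data/stack (R+W).
CFGS = [R | X | (NAPOT << A_SHIFT), R | W | (NAPOT << A_SHIFT)]
ADDRS = [napot_encode(0x80000000, 0x10000), napot_encode(0x80010000, 0x10000)]
```

With this layout, a U-mode store into the code region and an instruction fetch from the data region both return False, which is the W^X property the capstone asks PMP to enforce. In hardware the False case becomes a store or instruction access-fault trap rather than a return value.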


Course shape (14 weeks)

Week | Theme | Lab | Petzold weave anchor | Architecture comparison sidebar
1 | Full RV32I + M extension | Lab 1.1: mul vs Math.multiply speedup measured | Petzold Ch 12 + Ch 13 (binary multiplication and the cost of iteration) | M-extension: ATmega software-mul (2 registers, 16 cycles) vs RV32IM mul (single instruction) vs MIPS mult (HI/LO registers)
2 | Privileged ISA + ecall trap | Lab 2.1: first user-to-supervisor transition; cycle cost measured | Petzold Ch 22 (the OS chapter: supervisor mode emerges from timesharing mainframes) | Trap delivery: x86_64 SYSCALL/SYSRET vs RISC-V ECALL/MRET vs ARM SVC/ERET
3 | Compiler register allocator | Lab 3.1: allocator pass added; emit reduction observed | Petzold Ch 24 + Ch 17 (from machine code to language and back; what a register is worth) | Register files: RV32I 32-GPR vs x86_64 16-GPR + 16 XMM vs AArch64 31-GPR; the RISC philosophy
4 | Compiler peephole optimisation | Lab 4.1: peephole pass; ~30% smaller assembly per §11.9 5-categories | Petzold Ch 24 (high-level language compilation; local windows and the assembler's view) | Peephole scope: LLVM MachineInstr window vs GCC RTL peephole vs hand-written RISC-V idioms
5 | Compiler inlining + constant folding | Lab 5.1: inliner pass; library-call overhead measured before/after | Petzold Ch 22 + Ch 24 (procedure calls and their costs) | Inlining policy: GCC/Clang heuristics vs JVM JIT threshold vs your compiler's naïve model
6 | SSA-IR + Compiler Explorer (godbolt.org) | Lab 6.1: compare your compiler output vs gcc -O0/-O2/-O3 on identical source | Petzold Ch 24 (the long arc from machine code to optimizing compilers) | SSA-IR: LLVM IR vs GCC GIMPLE vs WebAssembly; why static analysis needs single-assignment form
7 | Sv32 paged virtual memory + MMU | Lab 7.1: Sv32 paged VM running; page-fault handler demonstrated | Petzold Ch 16 + Ch 14 (memory hierarchy; segment registers and why hardware abstracts addresses) | Virtual memory: x86_64 CR3 + 4-level paging vs Sv32 2-level vs AArch64 TTBRn; TLB shootdown on SMP
8 | PMP + W^X enforcement | Lab 8.1: PMP-defended stack-smash; same exploit from Ch 12 §12.11 now traps cleanly | Petzold Ch 16 (memory protection; why hardware rings exist above software) | Privilege rings: Linux S/U/M three-layer vs Windows ring 3/0/HV vs bare-metal M-only
9 | Stack canaries + CFI | Lab 9.1: stack canary detects return-address overwrite; CFI shadow stack catches ROP | Petzold Ch 22 (what the OS protects and what it cannot) | CFI mechanisms: x86_64 CET (SHSTK + IBT) vs AArch64 PAC+BTI vs RISC-V Zicfilp/Zicfiss
10 | Tracing garbage collection | Lab 10.1: tracing GC running on Memory.lib; cycle cost measured | Petzold Ch 22 (the OS manages memory so programs don't have to; GC as the logical extreme) | GC strategies: mark-and-sweep vs copying vs generational; JVM G1 vs Python reference-counting vs Go tricolor
11 | Preemption + scheduler | Lab 11.1: round-robin scheduler; two demo tasks; context-switch cost measured | Petzold Ch 22 (timesharing: the original reason for supervisor mode) | Schedulers: Linux CFS vs Windows dispatcher vs RTOS fixed-priority vs your round-robin
12 | Driver-writing track | Lab 12.1: SSD1306 OLED driver from datasheet; output verified | Petzold Ch 16 + Ch 18 (buses; peripherals; I2C and SPI as descendants of serial ideas Petzold traces) | I2C vs SPI vs UART: protocol overhead; SSD1306 command-byte protocol vs ENC28J60 SPI frame
13 | External DRAM + filesystem | Lab 13.1: SD-card filesystem walker reads FAT16 partition | Petzold Ch 14 + Ch 16 (memory hierarchy; DRAM timing; why flash storage is not RAM) | Storage stacks: BRAM on Tang (< 1 MiB ceiling) vs SD-card FAT16 vs NVMe + ext4; endianness in FAT
14 | Capstone, Virtus OS v2 on DE10-Nano | Full capstone (see CAPSTONE.md) | Closing reflection on the full ladder from CSA-101 through CSA-201 | The bridge talk: where to take Virtus OS v2 next (ARM-201, EMB-201, NET-201, CON-201)

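
Week 7's page-table walker is easier to debug against a software model. The Python sketch below follows the Sv32 translation steps of the privileged spec under simplifying assumptions: PTEs live in a dict keyed by physical address, MXR and SUM are ignored (U mode needs U=1 and S mode needs U=0), and A/D bits are software-managed, so a clear A bit, or a clear D bit on a store, raises a fault for the handler to fix. `sv32_translate` and `PageFault` are illustrative names, not part of any course library.

```python
PAGESIZE, PTESIZE = 4096, 4
# Sv32 PTE flag bits; the PPN occupies bits 31:10.
V, R, W, X, U, A, D = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4, 1 << 6, 1 << 7


class PageFault(Exception):
    """Stands in for the load, store or instruction page-fault trap."""


def sv32_translate(mem: dict[int, int], satp_ppn: int, va: int, access: int, user: bool) -> int:
    """Walk a two-level Sv32 page table and return the 34-bit physical address.

    mem maps physical addresses of PTEs to 32-bit PTE values; access is
    R (load), W (store) or X (fetch); user selects U mode, otherwise S mode.
    """
    vpn = [(va >> 12) & 0x3FF, (va >> 22) & 0x3FF]  # VPN[0], VPN[1]
    a = satp_ppn * PAGESIZE  # root table, from satp.PPN
    i = 1
    while True:
        pte = mem.get(a + vpn[i] * PTESIZE, 0)
        if not pte & V or (not pte & R and pte & W):
            raise PageFault(f"invalid PTE at level {i}")
        if pte & (R | X):
            break  # leaf: a 4 KiB page at level 0, a 4 MiB superpage at level 1
        i -= 1
        if i < 0:
            raise PageFault("pointer PTE at level 0")
        a = (pte >> 10) * PAGESIZE  # next-level table
    if not pte & access:
        raise PageFault("R/W/X permission denied")
    if bool(pte & U) != user:
        raise PageFault("U bit does not match privilege mode")
    ppn = pte >> 10
    if i == 1 and ppn & 0x3FF:
        raise PageFault("misaligned superpage")
    if not pte & A or (access == W and not pte & D):
        raise PageFault("A/D bit clear; handler sets it and retries")
    if i == 1:
        ppn |= vpn[0]  # superpage: VPN[0] passes straight through
    return (ppn << 12) | (va & 0xFFF)
```

In hardware, the TLB caches the result of this walk, so only TLB misses pay for the PTE reads: two for a 4 KiB page, one for a 4 MiB superpage.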

Anchor readings

Primary (continued from CSA-101 at advanced depth).

Patterson and Hennessy, Computer Organization and Design: RISC-V Edition (Morgan Kaufmann). The chapter coverage for CSA-201: Appendix B (RISC-V ISA reference, full RV32I + M); Chapter 4 (pipelining; optional but rewarding for Module 1 context); Chapter 5 (memory hierarchy; pairs with Modules 7 and 13); Chapter 2 (instructions; pairs with Modules 1 and 2). Use as a reference, not a reading schedule; the lab exercises are self-contained.

Petzold, CODE: The Hidden Language of Computer Hardware and Software, 1st edition (1999). CSA-201 adds ~25 new weaves across its chapters. Priority chapters: Ch 12 and Ch 13 (binary multiplication; the M extension pays back Math.multiply's iteration cost in a single instruction); Ch 14 and Ch 16 (memory addressing; the MMU chapter traces directly to Petzold's treatment of how hardware abstracts addresses); Ch 22 (the operating system; supervisor mode emerges from timesharing mainframes; the privilege chapter opens with this). The 2nd edition (2022) covers the same material; locate passages by section title.

Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd edition (Pearson). Use chapters: 7 (linking; pairs with the register allocator discussion), 9 (virtual memory; the definitive reference for the Sv32 module), 3 (machine-level representation; pairs with Modules 3-6 compiler work), 8 (exceptional control flow; the syscall and trap mechanism pairs with Module 2).

Secondary. Waterman and Asanovic, The RISC-V Instruction Set Manual, Volume I: Unprivileged Architecture (open access). Volume II: Privileged Architecture (open access). These are the authoritative ISA specs; use them to verify instruction encodings and CSR definitions during labs.


Per-week time budget

Activity | Hours per week | Hours over 14 weeks
Lecture | ~2.5 hr | ~35 hr
Lab (hands-on) | ~4 hr | ~56 hr
Independent practice (reading + repo work + Toolchain Diary) | ~6 hr | ~84 hr
Capstone integration weeks | -- | +5 hr (weeks 13-14)
Total | ~12.5 hr/week | ~180 hr

The heaviest weeks are 1 (M-extension hardware integration), 7 (Sv32 MMU page-table walker), and 14 (capstone integration). Budget an extra 2-3 hours in each. The compiler weeks (3-6) run lighter on hardware and heavier on repo editing.


Lab arc summary

Each lab measures a specific cost, in most cases one that CSA-101 paid deliberately, and then recovers it.

Lab | What it measures | What it recovers
Lab 1.1 | Math.multiply ~1,000-cycle vs mul single-cycle speedup | CSA-101's software-multiply dependency
Lab 2.1 | ecall trap round-trip cycle cost | CSA-101's M-mode-only baseline
Lab 3.1 | Compiler emit size before/after register allocator | Lab 7.4's translator-bloat forward-promise
Lab 4.1 | Assembly line count before/after peephole pass | Lab 11.4's 5-categories forward-promise
Lab 5.1 | Call overhead before/after inliner | Ch 11 library-call overhead
Lab 6.1 | godbolt.org -O0/-O2/-O3 comparison on identical source | Production-grade compiler reference
Lab 7.1 | Page-fault handler under Sv32 | Flat physical address space
Lab 8.1 | PMP-defended stack smash that traps cleanly | Ch 12 §12.11 W^X absence
Lab 9.1 | Stack canary + ROP detection | CSA-201's newly-enabled W^X and CFI primitives
Lab 10.1 | Tracing-GC cycle cost on Memory.lib | Ch 12 §12.5.4 manual-only allocator
Lab 11.1 | Context-switch cost, two-task round-robin | Ch 12 §12.1 single-task baseline
Lab 12.1 | SSD1306 driver written from datasheet | IP-Pack black box for OLED
Lab 13.1 | FAT16 partition walk on SD card | Tang BRAM ceiling; no persistent storage
Lab 14 | Full Virtus OS v2 capstone on DE10-Nano | Everything above, integrated
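
Lab 1.1's baseline is easiest to reason about with the loop written out. The sketch below assumes CSA-101's Math.multiply is the classic shift-and-add routine (the actual course routine may differ in detail) and counts loop iterations rather than cycles, since cycles per iteration depend on your CSA-101 core. The name `shift_add_mul` is illustrative.

```python
MASK32 = 0xFFFFFFFF


def shift_add_mul(a: int, b: int) -> tuple[int, int]:
    """Multiply with shifts and adds only, as a core without `mul` must.

    Returns (product mod 2**32, loop iterations). The low 32 bits equal what
    RV32IM `mul` writes to rd, whether the operands are read as signed or unsigned.
    """
    a &= MASK32
    b &= MASK32
    product, iterations = 0, 0
    while b:
        if b & 1:  # this multiplier bit is set: add the shifted multiplicand
            product = (product + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
        iterations += 1
    return product, iterations
```

Each iteration costs a bit test, a conditional add, two shifts, and a loop branch, and any negative multiplier (bit 31 set) takes the full 32 iterations, so the software cost grows with the multiplier's width. `mul` replaces the whole loop with one instruction; that gap is what Lab 1.1 measures.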

Hardware requirements

Required. Tang Primer 25K (carried over from CSA-101; your existing bitstream and board). DE10-Nano Cyclone V FPGA development board (~$130, student-purchased; Terasic). Raspberry Pi 4 station kit (~$80; USB-C power, HDMI cable, microSD 16 GB minimum, USB keyboard). Quartus Prime Lite 23.1 or later (free, Intel/Altera; installs on Linux x86_64 or Windows). riscv32-unknown-elf toolchain (prebuilt binaries at riscv.org or build from source).

Optional but recommended. Logic analyzer (Saleae Logic 8 or open-source equivalent) for driver debugging in Modules 12-13. External DRAM module compatible with DE10-Nano I/O board for Module 13 extended exercises.

YoWASP browser path. The academy workbench Tab 3 supports in-browser synthesis and bitstream generation for Tang Primer 25K and Tang Nano 20K (Yosys + nextpnr-himbaechel + Apicula). Use it for pre-Quartus sanity checks on RISC-V HDL changes before the full DE10-Nano Quartus build.

See SETUP.md for installation instructions.


Assessment overview

Tier 1 (pass/fail gate). Virtus OS v2 boots on DE10-Nano. Demonstrates U/S privilege transition. Page-fault handler handles at least one synthetic fault. PMP W^X policy intercepts a write to a code page. Round-robin scheduler switches between two running tasks. SSD1306 OLED shows live output. All six gates must pass for a Tier 2 score to count.

Tier 2 (40/30/30). 40% mitigation depth (does your OS actually enforce security properties, or just simulate them?); 30% measurement quality of speedups and cost recovery vs CSA-101 baseline; 30% demo and 6-8 page write-up. B- minimum on Tier 2 for the VCA-CSA-201 Certificate of Completion.

See CAPSTONE.md for the full rubric.


Continuation note

This build round covers Modules 1-6 (weeks 1-6) in full, plus the CAPSTONE rubric and INSTRUCTOR-GUIDE skeleton. Modules 7-14 (weeks 7-14) are outlined above and in CONTINUATION.md; their content files are the primary deliverable for the next build round.

Weeks remaining for next round: week-7-sv32-mmu.md, week-8-pmp-wx.md, week-9-stack-canaries-cfi.md, week-10-tracing-gc.md, week-11-preemption-scheduler.md, week-12-driver-writing.md, week-13-external-dram-filesystem.md, week-14-capstone-delivery.md, plus labs/lab-7 through labs/lab-14. The INSTRUCTOR-GUIDE should also be expanded per-week by the continuation round.