Week 4: Machine Language — RV32I-Lite ISA · CSA-110

In CSA-101 Week 6 you hand-assembled 6502 instructions by looking up opcodes in a table. This week you do the same thing for RV32I-Lite. The RV32I-Lite encoding is more regular: every instruction is 32 bits wide, the register fields always sit in the same bit positions, and there are only 5 instruction formats (R, I, S, B, J) to memorize instead of 13 addressing modes. Harder in bulk; easier per instruction.

Reading

Petzold weave anchors. Ch 17 (Automation, p. 239) for the idea that "bytes carry meaning" -- the assembly mnemonic is a human-readable spelling of the same bits; Ch 19 (Two Classic Microprocessors, p. 271), specifically pp. 271-272, which show the 8080 MOV encoding 01dddsss. The structure-through-encoding insight is universal. ~32 pages.
Cross-chapter handout. handouts/cross-chapter-rv32i-lite-encoding-card.md. This is the single most important reference document in the course. Print it and pin it up.

Lecture

3 hours. Key arc:

The five RV32I-Lite instruction formats. Every instruction is 32 bits wide. The opcode lives at bits 6:0. The register fields -- rd, rs1, rs2 -- live at the same bit positions in every format that includes them:

R-type: [funct7 7b][rs2 5b][rs1 5b][funct3 3b][rd 5b][opcode 7b]
I-type: [imm[11:0] 12b]   [rs1 5b][funct3 3b][rd 5b][opcode 7b]
S-type: [imm[11:5] 7b][rs2 5b][rs1 5b][funct3 3b][imm[4:0] 5b][opcode 7b]
B-type: [imm[12|10:5] 7b][rs2 5b][rs1 5b][funct3 3b][imm[4:1|11] 5b][opcode 7b]
J-type: [imm[20|10:1|11|19:12] 20b][rd 5b][opcode 7b]

The regularity is deliberate: a hardware decoder can extract rs1 and rs2 without knowing the opcode first. The 6502 decoder had to determine the addressing mode before it knew where the operands were. The price RV32I pays is that the S-type and B-type immediates are split into two fields (and the J-type immediate is stored out of order) so that the register fields can stay in fixed positions; you will notice this when hand-encoding branch instructions.
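The fixed-position idea can be sketched in a few lines of Python. This is an illustration only, not the course's decoder: the helper names (bits, sign_extend, fixed_fields, s_imm, b_imm) are made up here, and the two test words assume RV32I-Lite uses the standard RV32I opcode and funct3 values for SW and BEQ (check the encoding card).

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)

def fixed_fields(word):
    # These slices do not depend on the format, so they can be taken
    # before the opcode has been examined. A format that lacks a field
    # (e.g. rs2 in I-type) simply ignores that slice.
    return {
        "opcode": bits(word, 6, 0),
        "rd":     bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1":    bits(word, 19, 15),
        "rs2":    bits(word, 24, 20),
    }

def s_imm(word):
    # S-type: imm[11:5] sits in bits 31:25, imm[4:0] in bits 11:7.
    return sign_extend((bits(word, 31, 25) << 5) | bits(word, 11, 7), 12)

def b_imm(word):
    # B-type: imm[12] = bit 31, imm[10:5] = bits 30:25,
    #         imm[4:1] = bits 11:8, imm[11] = bit 7. imm[0] is always 0.
    imm = ((bits(word, 31, 31) << 12) | (bits(word, 7, 7) << 11)
           | (bits(word, 30, 25) << 5) | (bits(word, 11, 8) << 1))
    return sign_extend(imm, 13)

# Assumed standard encodings: SW x5, 8(x2) and BEQ x1, x2, -4.
print(fixed_fields(0x00512423), s_imm(0x00512423))
print(fixed_fields(0xFE208EE3), b_imm(0xFE208EE3))
```

Note that fixed_fields never branches on the format, while the immediate helpers have to know which format they are looking at. That asymmetry is the tradeoff described above.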

The 11 RV32I-Lite instructions plus 8 pseudo-instructions.

Mnemonic	Format	What it does
`ADD rd, rs1, rs2`	R	rd = rs1 + rs2
`SUB rd, rs1, rs2`	R	rd = rs1 - rs2
`AND rd, rs1, rs2`	R	rd = rs1 & rs2
`OR rd, rs1, rs2`	R	rd = rs1 | rs2
`ADDI rd, rs1, imm`	I	rd = rs1 + sign_ext(imm[11:0])
`LW rd, imm(rs1)`	I	rd = mem[rs1 + imm] (word)
`SW rs2, imm(rs1)`	S	mem[rs1 + imm] = rs2 (word)
`BEQ rs1, rs2, label`	B	if rs1==rs2: PC = PC + offset
`BNE rs1, rs2, label`	B	if rs1!=rs2: PC = PC + offset
`JAL rd, label`	J	rd = PC+4; PC = PC + offset
`JALR rd, rs1, imm`	I	rd = PC+4; PC = rs1 + imm

Pseudo-instructions (assembled by the assembler into real instructions): LI (load immediate), MV (move), J (unconditional jump), NOP, RET, CALL, NOT, NEG.
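As a rough illustration of what "assembled into real instructions" means, the sketch below maps six of the pseudo-instructions to the expansions used by the standard RISC-V assembler. Whether the RV32I-Lite assembler uses exactly these expansions is an assumption; the handout is the authority. NOT and CALL are left out because their usual RISC-V expansions rely on instructions that are not in the 11-instruction table above. The function name expand is hypothetical.

```python
def expand(mnemonic, *ops):
    """Return the real instruction a pseudo-instruction stands for.

    Mapping follows the standard RISC-V assembler conventions; treat it
    as an assumption for RV32I-Lite.
    """
    if mnemonic == "LI":
        rd, imm = ops
        # A single ADDI only reaches 12-bit signed immediates.
        if not -2048 <= int(imm) <= 2047:
            raise ValueError("immediate does not fit in one ADDI")
        return f"ADDI {rd}, x0, {imm}"
    table = {
        "NOP": lambda: "ADDI x0, x0, 0",
        "MV":  lambda rd, rs: f"ADDI {rd}, {rs}, 0",
        "NEG": lambda rd, rs: f"SUB {rd}, x0, {rs}",
        "J":   lambda label: f"JAL x0, {label}",
        "RET": lambda: "JALR x0, x1, 0",
    }
    return table[mnemonic](*ops)

for pseudo in [("LI", "x1", 5), ("MV", "x5", "x6"), ("NOP",), ("RET",)]:
    print(pseudo, "->", expand(*pseudo))
```

Every expansion here works because x0 always reads as zero: adding zero copies a register, and writing to x0 discards a result.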

Hand-encoding a worked example. Encode ADDI x1, x0, 5 (load the value 5 into register x1):

opcode for I-type ADDI = 0010011 (bits 6:0)
rd = x1 = 00001 (bits 11:7)
funct3 for ADDI = 000 (bits 14:12)
rs1 = x0 = 00000 (bits 19:15)
imm[11:0] = 000000000101 (bits 31:20)
Assembled: 00000000010100000000000010010011 = 0x00500093
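The same bit-field assembly can be checked mechanically. The function below is a minimal sketch (the name encode_i is made up for this example) that packs the five I-type fields at the positions listed above and reproduces the worked result.

```python
def encode_i(opcode, rd, funct3, rs1, imm):
    """Pack an I-type instruction word from its fields."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    return (((imm & 0xFFF) << 20)   # imm[11:0] -> bits 31:20
            | (rs1 << 15)           # rs1       -> bits 19:15
            | (funct3 << 12)        # funct3    -> bits 14:12
            | (rd << 7)             # rd        -> bits 11:7
            | opcode)               # opcode    -> bits 6:0

# ADDI x1, x0, 5
word = encode_i(0b0010011, rd=1, funct3=0b000, rs1=0, imm=5)
print(f"{word:032b} = 0x{word:08X}")
# prints 00000000010100000000000010010011 = 0x00500093
```

Masking with 0xFFF is what makes negative immediates work: imm=-1 becomes twelve 1 bits, so ADDI x1, x0, -1 encodes as 0xFFF00093.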

Comparison to 6502 encoding. The 6502 has variable-length instructions: 1, 2, or 3 bytes depending on the addressing mode. LDA #$42 is 2 bytes (A9 42); LDA $1234,X is 3 bytes (BD 34 12). The opcode byte alone determines the instruction length. With RV32I-Lite's fixed 32-bit encoding, every instruction is the same length, the program counter always advances by 4 (except on a taken branch or jump), and the fetch logic never has to decode one instruction to find where the next one starts.
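The difference shows up as soon as you try to split a byte stream into instructions. The sketch below is illustrative only: the 6502 length table is deliberately partial (just the two opcodes from the text plus NOP, $EA), and the function names are invented for this example.

```python
# Partial table: opcode byte -> instruction length in bytes.
LENGTH_6502 = {
    0xA9: 2,  # LDA #imm
    0xBD: 3,  # LDA abs,X
    0xEA: 1,  # NOP
}

def split_6502(code):
    """Walk the stream; each opcode must be looked up to find the next one."""
    pc, out = 0, []
    while pc < len(code):
        length = LENGTH_6502[code[pc]]
        out.append(code[pc:pc + length])
        pc += length
    return out

def split_rv32(code):
    """Fixed width: instruction boundaries are known without any lookup."""
    return [code[pc:pc + 4] for pc in range(0, len(code), 4)]

prog_6502 = bytes([0xA9, 0x42, 0xBD, 0x34, 0x12, 0xEA])
print([chunk.hex() for chunk in split_6502(prog_6502)])

prog_rv32 = bytes.fromhex("93005000" "93005000")  # two little-endian words
print([chunk.hex() for chunk in split_rv32(prog_rv32)])
```

split_6502 is inherently sequential, because the start of instruction n+1 depends on decoding instruction n. split_rv32 could slice every instruction in parallel.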

Lab exercises

Five labs in labs/lab-4.md. Plan for ~6 hours. This is the densest lab week; budget extra time.

Lab 4.1. Hand-encode 10 RV32I-Lite instructions: two ADD, two ADDI, one LW, one SW, one BEQ, one JAL, one JALR, one ADDI with negative immediate. Show bit-field work; verify with riscv64-linux-gnu-as.
Lab 4.2. Hand-decode 10 32-bit words (given as hex): identify instruction type, operands, and immediate. Verify with riscv64-linux-gnu-objdump.
Lab 4.3. Write sum-to-N in RV32I-Lite assembly: load N=10, initialize sum=0, loop adding i from 1 to N, store result. Hand-assemble to a hex array. (You check it against the assembler in Lab 4.4.)
Lab 4.4. Use riscv64-linux-gnu-as to assemble your Lab 4.3 program. Compare your hand-encoded bytes against the assembler's output. If they differ, find the discrepancy and explain it.
Lab 4.5 (Ghidra). Load the assembled binary into Ghidra (language: RISCV:LE:32:RV32I). Confirm Ghidra's disassembly matches the mnemonics you intended. Record your first Ghidra-for-RISC-V session in your Toolchain Diary. Compare to your CSA-101 experience with Ghidra on 6502 binaries: what is the same, what changed?
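For the Lab 4.4 comparison, one trap is byte order: RISC-V instruction words are stored little-endian, so the word 0x00500093 appears in the binary as the bytes 93 00 50 00. The helper below is a sketch, not part of the lab kit. It assumes you already have the assembler's raw code bytes in a Python bytes object (for example, extracted with objcopy -O binary), and the function names are hypothetical.

```python
import struct

def words_to_bytes(words):
    """Serialize 32-bit instruction words in little-endian byte order."""
    return b"".join(struct.pack("<I", w) for w in words)

def first_mismatch(hand_words, assembled):
    """Return the index of the first differing instruction, or None."""
    hand = words_to_bytes(hand_words)
    for i in range(0, max(len(hand), len(assembled)), 4):
        if hand[i:i + 4] != assembled[i:i + 4]:
            return i // 4
    return None

hand = [0x00500093]                       # ADDI x1, x0, 5 from the lecture
assembled = bytes.fromhex("93005000")     # same word as it sits in memory
print(words_to_bytes(hand).hex(), first_mismatch(hand, assembled))
```

A return value of None means every 4-byte slot matched. An index tells you which instruction to re-derive by hand first.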

Independent practice

Read Petzold Ch 17 carefully. This is the first of five visits to Ch 17 across the course.
Read Petzold Ch 19 pp. 271-272: the 8080 MOV encoding 01dddsss. Every register field at a fixed bit position. The same design principle as RV32I-Lite's fixed-field layout, implemented 40 years earlier on an 8-bit processor.
Work through the encoding card handout for every instruction type. Time yourself. The hand-encoding labs are the best way to build the mental model that makes debugging fast later.

Architecture comparison sidebar

6502 variable-length encoding vs RV32I-Lite fixed-32-bit encoding vs x86_64 variable-length.

The 6502's instruction encoding is clever and dense. LDA #$42 (load immediate) is 2 bytes; the first byte says "load A with an immediate" and the second byte is the value. LDA $1234,X (indexed absolute addressing) is 3 bytes. The decoder can determine the instruction length and operand location from the first byte alone. Code density is high; a 6502 program in 4 KB does a lot.

RV32I-Lite encodes every instruction in exactly 32 bits. The decoder has a simpler job: read 4 bytes, extract fields at fixed positions. The cost is code density: a program that uses mostly simple instructions pays 32 bits for each even when 8 bits would have been enough. The RISC-V compressed extension (C) adds 16-bit encodings for common instructions; CSA-110 does not implement C (CSA-201 adds it as a lab exercise).

x86_64 takes variable-length encoding much further than the 6502: instructions range from 1 to 15 bytes, with optional prefixes, REX bytes, and multiple operand-size modes. The decoding logic is one of the most complex units in a modern x86 CPU, and Intel's optimization manuals devote whole sections to the front-end decode unit. The tradeoff was made when code density mattered more than decoder simplicity (late 1970s); the x86 lineage is now locked into backward compatibility with that choice.

The lesson the CSA-110 architecture sidebars return to repeatedly: encoding width is a design parameter, not a universal truth. You have built systems on both sides of that parameter.

Reflection prompts

The B-type and S-type immediate fields are split across the 32-bit word (e.g., imm[4:0] at one position and imm[11:5] at another). Why did the RV32I designers do this? What hardware property does it enable that simpler immediate placement would not?
You have now hand-assembled both 6502 instructions (CSA-101 Week 6) and RV32I-Lite instructions (this week). Which encoding would you rather read in hex? Which would you rather write by hand? Do those answers point at the same architectural property?
Ghidra disassembled your binary automatically. In CSA-101 you used Ghidra on Arlet's 6502 ROM. What does it mean that the same tool handles both ISAs? What does this tell you about the relationship between the instruction set and the disassembler?

What's next

Week 5 wires the ALU (Week 2), the register file (Week 3), and the instruction decoder you write this week into a single cpu.v module. You synthesize it to a Tang Primer 25K bitstream, flash it, and run the sum-to-N program from Lab 4.3 on silicon you built.