The 4-bit adder from Week 1 becomes a 32-bit ripple-carry chain. Then you add subtraction, AND, OR, XOR, and a select line. The result is the ALU that sits at the center of your CPU for the next twelve weeks. The 6502's ALU was 8 bits wide and handled BCD; RV32I-Lite's ALU is 32 bits wide and skips BCD entirely.
Reading
- Petzold weave anchors. Ch 12 (A Binary Adding Machine, p. 168), Ch 13 (But What About Subtraction?, p. 181). Two chapters you read in CSA-101, now read for the 32-bit angle: Ch 13's two's complement treatment explains exactly why the RV32I-Lite SUB instruction is just ADD with the second operand negated. ~25 pages.
Lecture
3 hours. Key arc:
From half adder to full adder. Two half adders plus an OR gate for the carry-out. The full adder accepts a carry-in from the previous stage. Review the Lab 1.4 half adder truth table and extend it to three inputs.
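As a quick behavioral check before writing any Verilog, the construction above can be modeled in a few lines of Python. This is only a sketch of the logic, not the lab's full_adder.v; the function names are illustrative.

```python
def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Full adder built from two half adders plus an OR gate for carry-out."""
    s1, c1 = half_adder(a, b)      # first half adder: a + b
    s, c2 = half_adder(s1, cin)    # second half adder: partial sum + carry-in
    return s, c1 | c2              # OR the two carries


# Exhaustive check of all 8 input combinations.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
            print(a, b, cin, "->", s, cout)
```

The assertion states the defining property of a full adder: carry-out and sum together form the 2-bit binary count of how many inputs are 1.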
The 32-bit ripple-carry chain. Wire 32 full adders in sequence: bit 0's carry-out feeds bit 1's carry-in, and so on to bit 31. This is the same ripple principle as an 8-bit adder such as the 6502's, doubled twice. The chain is slow because the worst-case carry must propagate through all 32 full-adder stages (roughly two gate delays per stage); the CSA-201 fast-carry or carry-lookahead adder replaces it, but the ripple design is correct and readable.
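The chain can be sketched behaviorally in Python to show how the carry moves from bit 0 up to bit 31. This is a software model under the assumption that inputs are unsigned 32-bit values; `ripple_add` is a hypothetical name, not part of the lab files.

```python
WIDTH = 32


def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a, b, cin=0, width=WIDTH):
    """Add two width-bit values one bit at a time, LSB first.

    Each stage's carry-out becomes the next stage's carry-in.
    Returns (sum, carry_out).
    """
    carry = cin
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


total, cout = ripple_add(0xFFFFFFFF, 1)
print(f"{total:08x} carry-out={cout}")  # 00000000 carry-out=1
```

The example input is the worst case for a ripple design: the carry generated at bit 0 has to travel through every stage before the carry-out settles.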
Subtraction from addition. Two's complement negation: flip all bits and add one. RV32I-Lite SUB rd, rs1, rs2 therefore computes rs1 + ~rs2 + 1, which is what ADD would produce with the second operand negated. Your ALU module accepts a subtract control signal that inverts the second operand and sets carry-in to 1. No separate subtractor is needed; this is the design principle Petzold traces in Ch 13.
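A minimal Python model of that subtract control, assuming 32-bit unsigned operands (the name `add_sub` is illustrative, not the module's actual port list):

```python
MASK = 0xFFFFFFFF  # 32-bit mask


def add_sub(a, b, subtract):
    """Shared adder: subtract=1 inverts b and forces carry-in to 1.

    Returns (32-bit result, carry_out).
    """
    a &= MASK
    b &= MASK
    b_eff = (b ^ MASK) if subtract else b   # bitwise invert when subtracting
    carry_in = 1 if subtract else 0
    total = a + b_eff + carry_in
    return total & MASK, (total >> 32) & 1


print(hex(add_sub(7, 5, 1)[0]))  # 0x2
print(hex(add_sub(5, 7, 1)[0]))  # 0xfffffffe, which is -2 in two's complement
```

The same adder serves both operations; only the operand inversion and the carry-in differ.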
ALU operation select. The ALU performs ADD/SUB, AND, OR, XOR, and SLT (set-less-than, which is a subtraction whose result sign bit is combined with the signed-overflow condition). A 3-bit op select signal routes the result through a multiplexer. Write alu.v with inputs a[31:0], b[31:0], op[2:0] and outputs result[31:0], zero.
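A behavioral sketch of the operation select, useful for generating expected values for a testbench. The op encoding below is an assumption made for illustration; the lab handout may assign the codes differently. SLT is modeled as a signed comparison, as in RV32I.

```python
MASK = 0xFFFFFFFF

# Hypothetical 3-bit op encoding; the course's alu.v may differ.
OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, OP_SLT = range(6)


def alu(a, b, op):
    """Return (result, zero) for 32-bit operands a and b."""
    a &= MASK
    b &= MASK
    subtract = op in (OP_SUB, OP_SLT)
    b_eff = (b ^ MASK) if subtract else b
    total = (a + b_eff + int(subtract)) & MASK   # shared adder

    if op in (OP_ADD, OP_SUB):
        result = total
    elif op == OP_AND:
        result = a & b
    elif op == OP_OR:
        result = a | b
    elif op == OP_XOR:
        result = a ^ b
    elif op == OP_SLT:
        sign = total >> 31
        # Signed overflow on a - b: operands differ in sign and the
        # result's sign differs from a's.
        overflow = ((a ^ b) & (a ^ total)) >> 31
        result = sign ^ overflow
    else:
        raise ValueError(f"unknown op {op}")

    return result, int(result == 0)


print(alu(0xFFFFFFFF, 1, OP_SLT))   # (1, 0): -1 < 1
print(alu(0x80000000, 1, OP_SLT))   # (1, 0): most negative value < 1
print(alu(5, 5, OP_SUB))            # (0, 1): zero flag set
```

The second SLT case is the one that a sign-bit-only comparison gets wrong, because the subtraction overflows.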
The 6502 comparison. The 6502's ALU also performs ADD, SUB, AND, OR, XOR, and shift. But the 6502 adds a BCD (binary-coded decimal) mode, controlled by the Decimal flag in the processor status register. RV32I-Lite has no BCD mode because decimal formatting is now handled in software. Your RV32I-Lite ALU is approximately 20% simpler for skipping BCD. Arlet's alu.v is ~200 lines partly because of BCD and decimal-correction logic; your alu.v will be closer to 100.
Lab exercises
Four labs in labs/lab-2.md. Plan for ~5 hours.
- Lab 2.1. Extend half_adder.v from Lab 1.4 to full_adder.v. Add a carry-in port. Verify all 8 input combinations.
- Lab 2.2. Wire 32 full_adder instances into adder32.v. Verify against 20 test cases with known sums. Check overflow behavior: 32'hFFFFFFFF + 1 should produce 32'h00000000 with carry-out = 1.
- Lab 2.3. Write alu.v supporting ADD, SUB, AND, OR, XOR, SLT. Verify against an automated testbench (lab2_alu_tb.v) that checks all six operations with at least 10 cases each.
- Lab 2.4 (IEEE-754 hand encoding). Hand-encode three floating-point numbers: 1.5, -0.75, and 0.1 in IEEE-754 single precision. Verify with python3 -c "import struct; print(struct.pack('>f', 1.5).hex())". This exercise is not used by your RV32I-Lite CPU (which has no FPU) but prepares you for the CSA-201 floating-point module.
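For Lab 2.4, the one-line check can be extended into a small script that also splits each encoding into its sign, exponent, and fraction fields, so you can compare field by field against your hand work. This is a convenience sketch; `f32_fields` is an illustrative name.

```python
import struct


def f32_fields(x):
    """Return (bits, sign, biased_exponent, fraction) for x as IEEE-754 single."""
    bits = int.from_bytes(struct.pack(">f", x), "big")
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF          # 23-bit fraction field
    return bits, sign, exponent, fraction


for x in (1.5, -0.75, 0.1):
    bits, s, e, f = f32_fields(x)
    print(f"{x:>6}: 0x{bits:08x} sign={s} exponent={e} fraction=0x{f:06x}")
```

Note that 0.1 has no exact binary representation, so its fraction field is a rounded repeating pattern; this is the case most likely to differ from a hand encoding in the last bit.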
Independent practice
- Read Petzold Ch 12 and Ch 13 together. Note specifically the two's complement explanation in Ch 13: "flip and add one" is one of the oldest tricks in hardware engineering, and it appears in every architecture from the 6502 to x86_64.
- Measure the gate depth of your adder32.v by counting the longest signal propagation chain from any input to any output. The ripple-carry adder has gate depth proportional to the bit width. Record this in your Toolchain Diary.
- Optional: look up carry-lookahead adders. In CSA-201 Module 1, the M-extension multiplier uses a tree adder internally to achieve O(log n) delay instead of O(n). This week's ripple adder is the baseline you will compare against.
Architecture comparison sidebar
The 6502's BCD mode vs RV32I-Lite's absence of BCD.
The 6502 has a Decimal flag in its processor status register. When set, ADC and SBC operate in BCD: each 4-bit nibble represents a decimal digit (0-9), and the ALU adjusts the result after binary addition to correct the decimal digits. This requires a decimal-correction circuit that fires conditionally after every add or subtract.
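The decimal-correction idea can be illustrated with a short Python model of adding two packed-BCD bytes. This is a sketch of the general technique for valid BCD inputs only; it does not reproduce the 6502's exact flag behavior or its handling of invalid digits.

```python
def bcd_add(a, b, carry_in=0):
    """Add two packed-BCD bytes (two decimal digits each).

    Returns (packed-BCD result, decimal carry_out).
    """
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = 0
    if lo > 9:
        lo += 6            # skip the six unused nibble codes A-F
        half_carry = 1
    hi = (a >> 4) + (b >> 4) + half_carry
    carry_out = 0
    if hi > 9:
        hi += 6
        carry_out = 1
    return ((hi & 0x0F) << 4) | (lo & 0x0F), carry_out


result, carry = bcd_add(0x19, 0x28)
print(f"{result:02x} carry={carry}")   # 47 carry=0, i.e. 19 + 28 = 47
```

The conditional "add 6" on each nibble is the extra circuitry that a binary-only ALU never needs.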
BCD made sense in 1975 for financial calculations and cash registers that needed exact decimal arithmetic without floating-point hardware. By the time the 6502's successors appeared, BCD in hardware was increasingly a maintenance burden: it complicated the ALU, added a flag bit, and was rarely used by the programs that mattered most.
RV32I-Lite has no BCD mode. Decimal formatting is handled by software. The tradeoff is that software division-by-10 loops (for printing decimal numbers) cost a few dozen instructions; the gain is an ALU that is simpler, smaller, and easier to synthesize correctly.
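The software side of that tradeoff looks roughly like the following: a repeated divide-by-10 loop that peels off one decimal digit per iteration. It is written in Python for readability; on the CPU the same loop would be a short sequence of instructions, and `to_decimal` is an illustrative name.

```python
def to_decimal(n):
    """Convert a non-negative integer to its decimal string by repeated division by 10."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 10)               # quotient and next digit
        digits.append(chr(ord("0") + remainder))   # least significant digit first
    return "".join(reversed(digits))


print(to_decimal(4294967295))  # 4294967295, the largest unsigned 32-bit value
```

A 32-bit value needs at most ten iterations, which is where the cost per printed number comes from.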
Most mainstream ISAs have dropped BCD: the ARM NEON instruction set does not include BCD, and the x86 BCD opcodes (AAA, AAD, AAM, AAS, DAA, DAS) are invalid in x86_64's 64-bit mode. The arc is from hardware BCD in 1975 to software BCD in 2025. Your RV32I-Lite lands at the destination.
Reflection prompts
- Your alu.v performs subtraction by inverting the second operand and asserting carry-in = 1. The 6502 does the same thing in its ALU but then adds optional BCD correction. What would you have to add to your Verilog to support BCD mode? Is it worth it?
- In the 32-bit ripple-carry adder, the worst-case carry must propagate through all 32 full-adder stages. Your CPU's clock cycle must accommodate this propagation. What does this imply about the maximum clock frequency of a CPU with a ripple-carry adder vs one with a carry-lookahead adder?
- Petzold traces binary arithmetic from mechanical odometers through relay circuits to transistors, and arrives at the same ALU you just wrote. What invariant persisted across all those physical implementations?
What's next
Week 3 adds sequential logic to the combinational circuits from Weeks 1-2. You build a D flip-flop, wire 8 of them into the first of your CPU's registers, and add a byte-addressable RAM module. The data path begins to take shape: ALU feeds the register file, register file feeds the ALU.