Lab 5: Computer Architecture — CPU Integration and First Boot

Week: 5
Points: 25
Time: ~6 hours
Deliverable: verilog/cpu/ directory + synthesis report + UART output photo + diary/week-05.md


What you ship

  • verilog/cpu/decoder.v — instruction decoder
  • verilog/cpu/immgen.v — immediate generator
  • verilog/cpu/cpu.v — top-level CPU module
  • verilog/cpu/top.v — Tang Primer 25K top module (UART + reset + clock)
  • lab5_simulation_output.txt — sum-to-N result = 55 in simulation
  • lab5_synthesis_report.txt — Gowin/Apicula synthesis report (LUT count, Fmax, BRAM)
  • lab5_uart_output.jpg — photo or screenshot of UART terminal showing result on silicon
  • lab5_seeded_bug_analysis.md — description of the seeded bug and how you found it
  • diary/week-05.md

Lab 5.1: Instruction decoder

Write decoder.v. Input: 32-bit instruction word. Outputs: control signals for every component in the data path.

Minimum required outputs:

| Signal      | Width | Description                                          |
|-------------|-------|------------------------------------------------------|
| reg_we      | 1     | Register file write enable                           |
| mem_we      | 1     | Data memory write enable                             |
| mem_re      | 1     | Data memory read enable                              |
| alu_op      | 3     | ALU operation (matches lab-2 op encoding)            |
| alu_src     | 1     | 0 = rs2, 1 = immediate                               |
| branch      | 1     | This is a branch instruction                         |
| jump        | 1     | This is JAL or JALR                                  |
| mem_to_reg  | 2     | 0 = ALU, 1 = memory read, 2 = PC+4 (for JAL and JALR) |
| branch_type | 1     | 0 = BEQ, 1 = BNE                                     |

Run lab5_decoder_tb.v, which feeds one representative instruction of each type and checks every control signal. All 11 instruction variants must produce the correct outputs.
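
As a starting point, here is a minimal sketch of one way to structure the decoder: assign safe defaults to every output first, then override them per opcode. It covers only LW, SW, and BEQ/BNE, and it leaves out alu_op because that encoding comes from your own lab-2 ALU. The module name decoder_sketch and the localparam names are illustrative assumptions, not part of the provided files.

```verilog
// Sketch only: covers LW, SW, and BEQ/BNE; all other opcodes keep the defaults.
// alu_op is intentionally omitted because its encoding is defined by your lab-2 ALU.
module decoder_sketch (
    input  wire [31:0] instr,
    output reg         reg_we,
    output reg         mem_we,
    output reg         mem_re,
    output reg         alu_src,
    output reg         branch,
    output reg         jump,
    output reg  [1:0]  mem_to_reg,
    output reg         branch_type
);
    // RV32I major opcodes (names are illustrative)
    localparam OP_LOAD   = 7'b0000011;
    localparam OP_STORE  = 7'b0100011;
    localparam OP_BRANCH = 7'b1100011;

    always @(*) begin
        // Defaults first, so every output is assigned on every path (no latches)
        reg_we      = 1'b0;
        mem_we      = 1'b0;
        mem_re      = 1'b0;
        alu_src     = 1'b0;
        branch      = 1'b0;
        jump        = 1'b0;
        mem_to_reg  = 2'd0;
        branch_type = 1'b0;

        case (instr[6:0])
            OP_LOAD: begin            // LW: rd <- mem[rs1 + imm]
                reg_we     = 1'b1;
                mem_re     = 1'b1;
                alu_src    = 1'b1;
                mem_to_reg = 2'd1;
            end
            OP_STORE: begin           // SW: mem[rs1 + imm] <- rs2
                mem_we  = 1'b1;
                alu_src = 1'b1;
            end
            OP_BRANCH: begin          // BEQ (funct3 = 000) or BNE (funct3 = 001)
                branch      = 1'b1;
                branch_type = instr[12]; // funct3[0]: 0 = BEQ, 1 = BNE
            end
            default: ;                // keep the defaults in this sketch
        endcase
    end
endmodule
```

The defaults-then-override pattern keeps each case arm short and makes it hard to forget a signal when you add the remaining instruction types.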


Lab 5.2: Immediate generator

Write immgen.v. Input: 32-bit instruction word. Output: 32-bit sign-extended immediate.

Handle all four immediate-producing formats: I-type (ADDI, LW, JALR), S-type (SW), B-type (BEQ, BNE), J-type (JAL).

Remember: B-type and J-type immediates have their bits reordered across the instruction word. The output must be the reconstructed, sign-extended offset.

module immgen (
    input  wire [31:0] instr,
    output reg  [31:0] imm
);
    wire [6:0] opcode = instr[6:0];

    always @(*) begin
        case (opcode)
            7'b0010011, // I-type (ADDI, etc.)
            7'b0000011, // I-type (LW)
            7'b1100111: // I-type (JALR)
                imm = {{20{instr[31]}}, instr[31:20]};
            // ... (S-, B-, and J-type cases go here)
            default:
                imm = 32'b0; // assign on every path so no latch is inferred
        endcase
    end
endmodule

Run the immgen testbench for all four format types with positive and negative immediates.
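
To make the bit reordering for B-type and J-type concrete, the following sketch reconstructs both immediates from the RV32I instruction layout and checks them against four hand-encoded instructions. The module, function, and task names (imm_reorder_sketch, b_imm, j_imm, check) are illustrative assumptions; the provided immgen testbench may be organized differently.

```verilog
// Sketch: reconstructing B-type and J-type immediates from RV32I instruction bits.
module imm_reorder_sketch;
    // B-type: imm[12|10:5] in instr[31:25], imm[4:1|11] in instr[11:7], imm[0] = 0
    function [31:0] b_imm(input [31:0] instr);
        b_imm = {{19{instr[31]}}, instr[31], instr[7],
                 instr[30:25], instr[11:8], 1'b0};
    endfunction

    // J-type: imm[20|10:1|11|19:12] in instr[31:12], imm[0] = 0
    function [31:0] j_imm(input [31:0] instr);
        j_imm = {{11{instr[31]}}, instr[31], instr[19:12],
                 instr[20], instr[30:21], 1'b0};
    endfunction

    // Compare a reconstructed immediate against the expected value
    task check(input [31:0] got, input [31:0] want);
        if (got === want) $display("[PASS] %h", got);
        else              $display("[FAIL] got %h, want %h", got, want);
    endtask

    initial begin
        check(b_imm(32'hFE000EE3), 32'hFFFFFFFC); // beq x0, x0, -4
        check(b_imm(32'h00000463), 32'h00000008); // beq x0, x0, +8
        check(j_imm(32'hFFDFF06F), 32'hFFFFFFFC); // jal x0, -4
        check(j_imm(32'h0000006F), 32'h00000000); // jal x0, 0
        $finish;
    end
endmodule
```

Both offsets always have bit 0 equal to zero, which is why the lowest bit is hard-wired rather than taken from the instruction word.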


Lab 5.3: CPU integration and simulation

Write cpu.v. Instantiate and connect:

  • pc_reg (a 32-bit DFF holding the program counter)
  • imem (instruction memory, initialized from a hex file)
  • decoder
  • immgen
  • regfile
  • alu
  • dmem (data memory)
  • Writeback mux (selects between ALU result, memory read, PC+4)
  • Branch logic (computes next PC for branches and jumps)
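
The last two items in the list above can be sketched as a single combinational block. This is one possible arrangement, not the required one. It assumes the ALU exposes a zero flag for rs1 == rs2 (named alu_zero here) and that some signal distinguishes JALR from JAL (named is_jalr here); neither appears in the required decoder outputs, so treat both names as hypothetical.

```verilog
// Sketch: next-PC selection and writeback mux as one combinational block.
module nextpc_wb_sketch (
    input  wire [31:0] pc,
    input  wire [31:0] imm,
    input  wire [31:0] rs1_data,
    input  wire [31:0] alu_result,
    input  wire [31:0] mem_rdata,
    input  wire        alu_zero,     // assumed: ALU asserts this when rs1 == rs2
    input  wire        branch,
    input  wire        branch_type,  // 0 = BEQ, 1 = BNE
    input  wire        jump,
    input  wire        is_jalr,      // hypothetical: not a required decoder output
    input  wire [1:0]  mem_to_reg,
    output wire [31:0] pc_next,
    output reg  [31:0] wb_data
);
    wire [31:0] pc_plus4    = pc + 32'd4;
    // BEQ is taken when the operands are equal, BNE when they differ
    wire        taken       = branch & (branch_type ? ~alu_zero : alu_zero);
    // JALR jumps to rs1 + imm with the least significant bit cleared
    wire [31:0] jalr_target = (rs1_data + imm) & ~32'd1;

    assign pc_next = (jump && is_jalr) ? jalr_target :
                     (jump || taken)   ? pc + imm    :
                                         pc_plus4;

    always @(*) begin
        case (mem_to_reg)
            2'd0:    wb_data = alu_result;
            2'd1:    wb_data = mem_rdata;
            2'd2:    wb_data = pc_plus4;   // link address for jumps
            default: wb_data = alu_result; // unused encoding
        endcase
    end
endmodule
```

In your cpu.v, pc_next would feed the D input of pc_reg and wb_data would feed the register file's write port.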

Load sum-to-n.hex (your Lab 4.4 output) into imem. Simulate for 200 clock cycles. Verify that the data memory at address 0 contains 55 (0x00000037) after the program completes.

iverilog -o cpu_sim verilog/cpu/cpu.v verilog/cpu/decoder.v verilog/cpu/immgen.v \
    verilog/alu/alu.v verilog/mem/regfile.v verilog/mem/mem.v \
    worksheets/csa-110/lab5_cpu_tb.v
vvp cpu_sim | tee lab5_simulation_output.txt
# Expected: "[PASS] mem[0] = 0x00000037 (55)"

Lab 5.4: Synthesize and boot

Write top.v for the Tang Primer 25K. Wrap your cpu.v with:

  • PLL or clock-divider to generate your target clock (start at 4 MHz; increase if synthesis Fmax allows)
  • UART transmitter (the academy provides uart_tx.v in worksheets/csa-110/)
  • Reset logic (active-low reset from the Tang Primer's button)
  • UART output: at program end (when PC reaches the infinite loop), transmit the value in data memory address 0 as an ASCII hex string
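
For the ASCII hex string in the last bullet, here is a small sketch of the nibble-to-character conversion, run in simulation on the expected result. It does not touch uart_tx.v, whose interface is not described here; in hardware you would feed each character to the transmitter one at a time. The names hex_ascii_sketch and hex_char are illustrative assumptions.

```verilog
// Sketch: convert a 32-bit value to eight ASCII hex characters, most significant first.
module hex_ascii_sketch;
    // 0..9 map to '0'..'9' (0x30..0x39); 10..15 map to 'A'..'F' (0x41..0x46)
    function [7:0] hex_char(input [3:0] nibble);
        hex_char = (nibble < 4'd10) ? (8'h30 + nibble)
                                    : (8'h41 + nibble - 8'd10);
    endfunction

    reg [31:0] value;
    integer    i;

    initial begin
        value = 32'h00000037;              // 55, the expected sum-to-N result
        for (i = 7; i >= 0; i = i - 1)
            $write("%c", hex_char(value[4*i +: 4]));
        $write("\n");                      // prints 00000037
        $finish;
    end
endmodule
```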

# Synthesize with Apicula
yosys -p "synth_gowin -top top -json top.json" verilog/cpu/top.v ...
nextpnr-himbaechel --device GW5A-LV25MG121 --json top.json --write top_pnr.json
gowin_pack --device GW5A-LV25MG121 top_pnr.json -o top.fs
# Program the board
openFPGALoader -b tangprimer25k top.fs

Connect a UART terminal at 115200 baud. Observe the sum-to-N result. Take a photo or screenshot.

Record from the synthesis report: LUT count, Fmax estimate, BRAM blocks used.


Lab 5.5: Seeded failure drill

A second testbench, lab5_seeded_tb.v, is a variant of lab5_cpu_tb.v with one deliberately broken instruction: a BEQ with an off-by-one branch offset. The broken version produces the wrong answer (not 55).

Your task: find the bug. Steps:

  1. Run the broken testbench: iverilog ... worksheets/csa-110/lab5_seeded_tb.v
  2. Observe the wrong answer
  3. Add $display statements or use GTKWave to trace execution
  4. Find the instruction with the wrong offset
  5. Fix it and verify the correct answer reappears
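
Step 3 can be sketched as follows: dump a waveform for GTKWave and print the PC on every clock edge. The pc register here is a stand-in that simply counts by 4. In the real testbench you would point $dumpvars at the testbench top and print your CPU's actual PC and instruction signals, whose hierarchical names depend on your design.

```verilog
// Sketch: $display tracing plus a VCD dump for GTKWave.
module trace_sketch;
    reg        clk = 1'b0;
    reg [31:0] pc  = 32'd0;   // stand-in for your CPU's PC register

    always #5 clk = ~clk;     // free-running clock, period 10 time units

    always @(posedge clk) begin
        $display("t=%0t pc=%h", $time, pc);  // one trace line per cycle
        pc <= pc + 32'd4;
    end

    initial begin
        $dumpfile("trace.vcd");        // waveform file to open in GTKWave
        $dumpvars(0, trace_sketch);    // dump every signal under this module
        #50 $finish;
    end
endmodule
```

After running the simulation, open the waveform with `gtkwave trace.vcd` and compare the PC sequence against the addresses you expect from your Lab 4.4 listing; the first PC value that differs points at the branch with the wrong offset.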

Record the debugging session in lab5_seeded_bug_analysis.md: what symptom you observed, which tool or technique exposed the bug, and what the fix was.


Toolchain Diary

Record in diary/week-05.md:

  • Your CPU's line count vs Arlet's cpu.v from CSA-101 (with explanation)
  • Synthesis report: LUT count, Fmax, BRAM blocks
  • The critical path (which module limits Fmax)
  • The seeded bug: describe it and how you found it

Grading

| Component                                                                    | Points |
|------------------------------------------------------------------------------|--------|
| decoder.v testbench: all 11 instruction variants pass                        | 5      |
| immgen.v testbench: all four format types with positive and negative immediates | 3   |
| CPU simulation: sum-to-N produces 55 in simulation                           | 7      |
| Synthesis report: LUT count, Fmax, BRAM noted                                | 3      |
| UART output on silicon (photo or screenshot)                                 | 4      |
| Seeded bug analysis: found, fixed, and explained                             | 3      |