Lab 13.1: DDR3 Heap Extension and FAT16 Filesystem Walker

Total points: 25
Estimated time: 4 hours
Prerequisites: Lab 12.1 complete; DE10-Nano with formatted SD card (FAT16, at least one text file); SSD1306 driver working


Overview

This lab extends Virtus OS v2's memory to use the DE10-Nano's 1 GB DDR3 SDRAM (via the HPS F2H_SDRAM bridge) and adds a read-only FAT16 filesystem walker that reads files from the SD card. By the end, the kernel can allocate from a 256 MiB DDR3 heap, roughly 1000x larger than the 256 KiB BRAM, and it can read a configuration file from the SD card at boot.


Part A: DDR3 address space configuration (8 pts)

A1: Memory address map update (3 pts)

The DE10-Nano DDR3 SDRAM is accessible to the FPGA fabric via the Cyclone V FPGA-to-HPS SDRAM (F2H_SDRAM) bridge. The HPS-to-FPGA lightweight bridge runs in the opposite direction (HPS master, FPGA slave) and cannot be used for this. The DDR3 base address in the FPGA address space is 0x10000000 (256 MiB from the start of the soft-core address space). The BRAM (your existing instruction and data memory) occupies 0x00000000-0x0003FFFF (256 KiB, expandable to your synthesized size).

Update your memory decoder in the top-level module to route addresses >= 0x10000000 to the AXI4-Lite to Avalon MM adapter that connects to the HPS DDR3 controller:

// Memory select logic
always @(*) begin
    if (paddr < 32'h00040000)
        sel = SEL_BRAM;    // on-chip BRAM, 0x00000000-0x0003FFFF
    else if (paddr < 32'h02000000)
        sel = SEL_MMIO;    // memory-mapped peripherals
    else if (paddr < 32'h0200C000)
        sel = SEL_CLINT;   // CLINT, 0x02000000-0x0200BFFF
    else if (paddr >= 32'h10000000)
        sel = SEL_DDR3;    // route to HPS AXI bridge
    else
        sel = SEL_FAULT;   // unmapped: generate access fault
end

Verify: write a known value to address 0x10000000 and read it back. The round-trip should succeed in Verilator (use a DDR3 model) and on DE10-Nano.
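
The round-trip check can be written as a small C routine. The sketch below is only an illustration: mem_roundtrip and its pattern set are hypothetical names and values, not part of the lab's existing code, and it tests a single word rather than the whole region.

```c
#include <stdint.h>

/* Write several patterns to one word and read each back.
 * Returns 0 if all patterns survive, otherwise the 1-based index of the
 * first pattern that failed. The pointer is volatile so the compiler
 * performs every store and load on the bus. */
int mem_roundtrip(volatile uint32_t *addr) {
    static const uint32_t patterns[4] = {
        0x00000000u, 0xFFFFFFFFu, 0xA5A5A5A5u, 0x5A5A5A5Au
    };
    for (int i = 0; i < 4; i++) {
        *addr = patterns[i];
        if (*addr != patterns[i]) return i + 1;
    }
    return 0;
}
```

On the target, the call would be mem_roundtrip((volatile uint32_t *)0x10000000); a non-zero result points at the decoder or the bridge rather than at the heap code.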

A2: Heap allocator DDR3 extension (3 pts)

Extend Memory.lib's heap to use DDR3. The existing heap is BRAM-backed (starts at physical address 0x00020000). The DDR3 extension starts at 0x10000000.

Configure the kernel heap descriptor at boot:

#define BRAM_HEAP_BASE    0x00020000
#define BRAM_HEAP_SIZE    0x00020000   // 128 KiB
#define DDR3_HEAP_BASE    0x10000000
#define DDR3_HEAP_SIZE    0x10000000   // 256 MiB (use first 256 MiB of 1 GB)

// heap_descriptor controls which region alloc_page() draws from
typedef struct {
    uint32_t base;
    uint32_t size;
    uint32_t next_free;
} heap_desc_t;

heap_desc_t primary_heap   = { DDR3_HEAP_BASE, DDR3_HEAP_SIZE, DDR3_HEAP_BASE };
heap_desc_t secondary_heap = { BRAM_HEAP_BASE, BRAM_HEAP_SIZE, BRAM_HEAP_BASE };

Modify alloc_page() to draw from primary_heap; fall back to secondary_heap if primary is exhausted. This ensures new allocations use DDR3 by default.
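
One possible shape for the fallback logic is sketched below. It assumes a simple bump allocator with 4 KiB pages and no free(); PAGE_SIZE and heap_take_page are hypothetical names, and your existing Memory.lib allocator may track free pages differently.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size (4 KiB, as used in Part C2) */

typedef struct {
    uint32_t base;
    uint32_t size;
    uint32_t next_free;
} heap_desc_t;

static heap_desc_t primary_heap   = { 0x10000000u, 0x10000000u, 0x10000000u };
static heap_desc_t secondary_heap = { 0x00020000u, 0x00020000u, 0x00020000u };

/* Hypothetical helper: bump-allocate one page from a single region.
 * Returns the page's physical address, or 0 if the region is full. */
static uint32_t heap_take_page(heap_desc_t *h) {
    uint32_t used = h->next_free - h->base;
    if (h->size - used < PAGE_SIZE) return 0;
    uint32_t page = h->next_free;
    h->next_free += PAGE_SIZE;
    return page;
}

/* Try DDR3 first; fall back to BRAM only when DDR3 is exhausted. */
uint32_t alloc_page(void) {
    uint32_t page = heap_take_page(&primary_heap);
    if (page == 0) page = heap_take_page(&secondary_heap);
    return page;
}
```

Returning 0 for "out of memory" matches the failure check used in Part C2, and it is safe here because neither heap region starts at address 0.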

A3: DDR3 latency measurement (2 pts)

DDR3 is slower than BRAM on first access, because each read crosses the FPGA-to-HPS bridge and may have to open a DRAM row, but its throughput on sequential access can be comparable. Measure the latency difference:

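// volatile pointers keep the compiler from optimizing away the timed loads
volatile uint32_t* bram_ptr = (volatile uint32_t*)0x00020000;
volatile uint32_t* ddr3_ptr = (volatile uint32_t*)0x10000000;

uint32_t t0 = read_mcycle();
uint32_t x = *bram_ptr;
uint32_t t1 = read_mcycle();
uint32_t bram_latency = t1 - t0;

t0 = read_mcycle();
uint32_t y = *ddr3_ptr;
t1 = read_mcycle();
uint32_t ddr3_latency = t1 - t0;
(void)x; (void)y;   // values are irrelevant; only the timing matters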

Fill in:

Memory | First-access latency (cycles) | Sequential-read throughput (MB/s)
BRAM (on-chip) | |
DDR3 (first access, cold) | |
DDR3 (sequential, warm) | |

Note: DDR3 first-access latency includes AXI bridge overhead + DRAM row-open. To measure sequential throughput, read a 4 KiB block, then divide the bytes read by the elapsed time (cycle count divided by the core clock frequency).
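
The conversion from a cycle count to MB/s is plain arithmetic: bytes times clock frequency, divided by cycles, divided by 10^6. A sketch follows, with throughput_mb_per_s as a hypothetical helper name; the core clock frequency is an input you must supply for your own build (the 50 MHz mentioned after the code is only an example value).

```c
#include <stdint.h>

/* Convert "bytes moved in `cycles` clock cycles" to MB/s (1 MB = 10^6 bytes).
 * clk_hz is the soft-core clock frequency in Hz, supplied by the caller.
 * The product is formed in 64 bits so 4096 * clk_hz cannot overflow. */
uint32_t throughput_mb_per_s(uint32_t bytes, uint32_t cycles, uint32_t clk_hz) {
    if (cycles == 0) return 0;   /* avoid dividing by zero on a bad reading */
    return (uint32_t)(((uint64_t)bytes * clk_hz) / cycles / 1000000u);
}
```

For instance, a 4096-byte block read in 4096 cycles at an assumed 50 MHz clock works out to 50 MB/s. On an RV32 target the 64-bit division relies on the compiler's runtime support library.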


Part B: FAT16 filesystem walker (12 pts)

B1: SD card SPI initialization (3 pts)

The DE10-Nano's SD card slot is accessible in SPI mode via the Cyclone V's SPI controller at 0xFFF01000, or via a bit-bang SPI on the GPIO header if the HPS SPI is not wired to the FPGA fabric. This lab uses the HPS SPI controller.

The SD card SPI initialization sequence (from the SD card physical layer spec):

void sd_init(void) {
    // Step 1: deassert CS (drive it high) and send at least 74 clock cycles
    // with MOSI high (0xFF bytes)
    sd_cs_high();
    for (int i = 0; i < 10; i++) spi_write(0xFF);  // 10 * 8 = 80 clocks

    // Step 2: CMD0 (GO_IDLE_STATE) -- put card in SPI mode
    sd_cs_low();
    sd_send_cmd(0, 0, 0x95);   // CMD0, arg=0, CRC=0x95 (hard-coded for CMD0)
    uint8_t r1 = sd_wait_r1();
    // expect R1 = 0x01 (idle state)

    // Step 3: CMD8 (SEND_IF_COND) -- verify voltage range (SD v2 only)
    sd_send_cmd(8, 0x000001AA, 0x87);  // check voltage pattern 0xAA
    uint8_t r7[5];
    r7[0] = sd_wait_r1();              // R7 = R1 byte followed by 4 payload bytes
    sd_read_bytes(r7 + 1, 4);
    // expect r7[4] == 0xAA if SD v2+

    // Step 4: ACMD41 (APP_SEND_OP_COND) loop until card leaves idle.
    // Initialization can take a long time; a robust driver bounds this loop.
    do {
        sd_send_cmd(55, 0, 0xFF);   // CMD55: application-specific command follows
        sd_wait_r1();
        sd_send_cmd(41, 0x40000000, 0xFF);  // ACMD41 with HCS=1
        r1 = sd_wait_r1();
    } while (r1 & 0x01);   // loop while still idle

    // Step 5: CMD16 -- set block size to 512 bytes (required for FAT access)
    sd_send_cmd(16, 512, 0xFF);
    sd_wait_r1();
    sd_cs_high();
}

After sd_init() completes, the card is ready to accept CMD17 (single-block read) and CMD18 (multi-block read).
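
sd_send_cmd() is not shown in this handout. As a hedged sketch of what it presumably puts on the wire, an SD command frame in SPI mode is 6 bytes: a start/transmission byte (0x40 | command index), the 32-bit argument in big-endian order, and a CRC7 shifted left by one with the end bit set. The helpers sd_crc7 and sd_build_cmd below are hypothetical names; they reproduce the hard-coded 0x95 and 0x87 values used for CMD0 and CMD8.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 with polynomial x^7 + x^3 + 1, as used by SD command frames. */
uint8_t sd_crc7(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int b = 0; b < 8; b++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80) crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Hypothetical helper: fill a 6-byte command frame for `cmd` and `arg`. */
void sd_build_cmd(uint8_t cmd, uint32_t arg, uint8_t frame[6]) {
    frame[0] = (uint8_t)(0x40 | (cmd & 0x3F));   /* start bit 0, transmission bit 1 */
    frame[1] = (uint8_t)(arg >> 24);             /* argument, most significant byte first */
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 0x01);   /* CRC7 + end bit */
}
```

In SPI mode the card ignores the CRC by default for commands other than CMD0 and CMD8, which is why sd_init() can pass 0xFF as the CRC byte everywhere else.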

B2: Sector read and FAT16 structures (5 pts)

Implement single-sector read (512 bytes):

int sd_read_sector(uint32_t lba, uint8_t* buf) {
    sd_cs_low();
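    // CMD17 READ_SINGLE_BLOCK. SDHC/SDXC cards take a block address (LBA);
    // standard-capacity (SDSC) cards, 2 GB or smaller, take a byte address
    // (lba * 512), so adjust the argument if your card is SDSC.
    sd_send_cmd(17, lba, 0xFF);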
    if (sd_wait_r1() != 0) { sd_cs_high(); return -1; }
    
    // wait for data token 0xFE
    uint8_t tok;
    int timeout = 1000;
    do { tok = spi_read_byte(); } while (tok == 0xFF && --timeout > 0);
    if (tok != 0xFE) { sd_cs_high(); return -1; }
    
    for (int i = 0; i < 512; i++) buf[i] = spi_read_byte();
    spi_read_byte(); spi_read_byte();  // discard 2-byte CRC
    sd_cs_high();
    return 0;
}

FAT16 key structures (from the Microsoft FAT specification):

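Boot sector (LBA 0 of the volume; if the card carries an MBR partition table, the volume starts at the partition's first LBA rather than at card LBA 0):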

Offset | Size | Field
0x0B | 2 | Bytes per sector (always 512)
0x0D | 1 | Sectors per cluster
0x0E | 2 | Reserved sectors (FAT starts here)
0x10 | 1 | Number of FATs (usually 2)
0x11 | 2 | Root entry count (max root-dir entries)
0x13 | 2 | Total sectors (16-bit; 0 if > 65535)
0x16 | 2 | Sectors per FAT
0x20 | 4 | Total sectors (32-bit)

Parse the boot sector:

typedef struct {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  num_fats;
    uint16_t root_entry_count;
    uint16_t sectors_per_fat;
} fat16_bpb_t;

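// Read a little-endian 16-bit field one byte at a time. Several BPB fields
// (0x0B, 0x11) sit at odd offsets, so casting to uint16_t* would be a
// misaligned load, which many RISC-V cores trap on.
static uint16_t rd16(const uint8_t* p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

void fat16_parse_bpb(const uint8_t* sector, fat16_bpb_t* bpb) {
    bpb->bytes_per_sector    = rd16(sector + 0x0B);
    bpb->sectors_per_cluster = sector[0x0D];
    bpb->reserved_sectors    = rd16(sector + 0x0E);
    bpb->num_fats            = sector[0x10];
    bpb->root_entry_count    = rd16(sector + 0x11);
    bpb->sectors_per_fat     = rd16(sector + 0x16);
}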

Derive the key LBA offsets:

uint32_t fat_start   = bpb.reserved_sectors;
uint32_t root_start  = fat_start + bpb.num_fats * bpb.sectors_per_fat;
uint32_t data_start  = root_start + (bpb.root_entry_count * 32 + 511) / 512;
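
As a worked example of the layout arithmetic, the sketch below wraps the three formulas in a function and adds the cluster-to-sector mapping used later by the file reader. The names fat16_layout_t, fat16_layout, and fat16_cluster_to_lba are hypothetical, and volume_lba is an assumed extra parameter: pass 0 when the boot sector is at card LBA 0, or the partition's first LBA if the card has a partition table.

```c
#include <stdint.h>

typedef struct {
    uint32_t fat_start;    /* first sector of the first FAT */
    uint32_t root_start;   /* first sector of the root directory */
    uint32_t data_start;   /* first sector of cluster 2 */
} fat16_layout_t;

fat16_layout_t fat16_layout(uint32_t volume_lba, uint16_t reserved_sectors,
                            uint8_t num_fats, uint16_t sectors_per_fat,
                            uint16_t root_entry_count) {
    fat16_layout_t l;
    l.fat_start  = volume_lba + reserved_sectors;
    l.root_start = l.fat_start + (uint32_t)num_fats * sectors_per_fat;
    /* root directory: 32 bytes per entry, rounded up to whole 512-byte sectors */
    l.data_start = l.root_start + ((uint32_t)root_entry_count * 32u + 511u) / 512u;
    return l;
}

/* Cluster numbering starts at 2, so cluster 2 maps to data_start. */
uint32_t fat16_cluster_to_lba(const fat16_layout_t *l, uint16_t cluster,
                              uint8_t sectors_per_cluster) {
    return l->data_start + (uint32_t)(cluster - 2) * sectors_per_cluster;
}
```

With illustrative values of 1 reserved sector, 2 FATs of 32 sectors each, and 512 root entries, the FAT starts at sector 1, the root directory at sector 65, and the data area at sector 97.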

B3: Directory listing and file read (4 pts)

Each root directory entry is 32 bytes:

Offset | Size | Field
0x00 | 8 | Filename (8.3, space-padded)
0x08 | 3 | Extension
0x0B | 1 | Attributes (0x20 = archive, 0x10 = directory)
0x1A | 2 | First cluster (low word)
0x1C | 4 | File size in bytes

Implement directory listing:

void fat16_list_root(fat16_bpb_t* bpb, uint32_t root_start) {
    uint8_t sector[512];
    int entries_per_sector = 512 / 32;
    int remaining = bpb->root_entry_count;
    int row = 0;   // next OLED row; counts only entries actually displayed

    for (uint32_t lba = root_start; remaining > 0; lba++) {
        if (sd_read_sector(lba, sector) != 0) return;   // stop on read error
        for (int i = 0; i < entries_per_sector && remaining > 0; i++, remaining--) {
            uint8_t* ent = sector + i * 32;
            if (ent[0] == 0x00) return;   // no more entries
            if (ent[0] == 0xE5) continue; // deleted entry
            if (ent[0x0B] & 0x08) continue; // volume label or long-filename entry (attr 0x0F)

            char name[13];
            // format: "FILENAME.EXT"
            int j = 0;
            for (int k = 0; k < 8 && ent[k] != ' '; k++) name[j++] = ent[k];
            if (ent[8] != ' ') { name[j++] = '.'; }
            for (int k = 8; k < 11 && ent[k] != ' '; k++) name[j++] = ent[k];
            name[j] = '\0';

            // little-endian file size, assembled byte by byte (no alignment assumption)
            uint32_t size = (uint32_t)ent[0x1C] | ((uint32_t)ent[0x1D] << 8) |
                            ((uint32_t)ent[0x1E] << 16) | ((uint32_t)ent[0x1F] << 24);
            (void)size;   // not displayed here; show it next to the name if you like
            oled_draw_string(row++ % 8, 0, name);
        }
    }
}
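
Looking up one specific file needs the inverse of the formatting above: turn a name such as "BOOT.CFG" into the 11-byte on-disk form and compare it against each entry. The sketch below is one way to do that. fat16_name_to_83 and fat16_entry_matches are hypothetical helper names, and the code assumes memset, memcmp, strrchr, strlen, and toupper are available in your kernel's C library.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Convert "NAME.EXT" to the 11-byte directory form: name padded with spaces
 * to 8 bytes, then extension padded to 3 bytes, upper-cased, no dot.
 * Returns 0 on success, -1 if the name does not fit the 8.3 limits. */
int fat16_name_to_83(const char *name, uint8_t out[11]) {
    memset(out, ' ', 11);
    const char *dot = strrchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t ext_len  = dot ? strlen(dot + 1) : 0;
    if (base_len == 0 || base_len > 8 || ext_len > 3) return -1;
    for (size_t i = 0; i < base_len; i++)
        out[i] = (uint8_t)toupper((unsigned char)name[i]);
    for (size_t i = 0; i < ext_len; i++)
        out[8 + i] = (uint8_t)toupper((unsigned char)dot[1 + i]);
    return 0;
}

/* True if the 32-byte directory entry `ent` carries exactly this 8.3 name. */
int fat16_entry_matches(const uint8_t *ent, const uint8_t name83[11]) {
    return memcmp(ent, name83, 11) == 0;
}
```

Inside a loop like the one in fat16_list_root, a matching entry's first cluster (offset 0x1A) and file size (offset 0x1C) are then the inputs for the file read.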

Implement file read (follow FAT cluster chain):

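// Pass max_bytes = min(buffer size, file size from the directory entry):
// the last cluster of a chain is usually only partly used.
int fat16_read_file(fat16_bpb_t* bpb, uint32_t fat_start, uint32_t data_start,
                    uint16_t first_cluster, uint8_t* out_buf, uint32_t max_bytes) {
    uint8_t fat_sector[512];
    uint8_t data_sector[512];
    uint32_t written = 0;
    uint16_t cluster = first_cluster;

    // Valid data clusters are 0x0002-0xFFF6; 0xFFF7 marks a bad cluster and
    // 0xFFF8-0xFFFF mark end of chain. Clusters 0 and 1 are never file data.
    while (cluster >= 2 && cluster < 0xFFF7 && written < max_bytes) {
        // data sector for this cluster
        uint32_t data_lba = data_start + (uint32_t)(cluster - 2) * bpb->sectors_per_cluster;
        for (int s = 0; s < bpb->sectors_per_cluster && written < max_bytes; s++) {
            if (sd_read_sector(data_lba + s, data_sector) != 0) return -1;
            uint32_t to_copy = (max_bytes - written < 512) ? max_bytes - written : 512;
            for (uint32_t i = 0; i < to_copy; i++) out_buf[written++] = data_sector[i];
        }
        // follow FAT chain: the 2-byte entry for this cluster is at byte
        // offset cluster * 2 from the start of the FAT
        uint32_t fat_offset = (uint32_t)cluster * 2;
        if (sd_read_sector(fat_start + fat_offset / 512, fat_sector) != 0) return -1;
        const uint8_t* e = fat_sector + fat_offset % 512;
        cluster = (uint16_t)(e[0] | (e[1] << 8));   // little-endian next-cluster value
    }
    return (int)written;
}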

Part C: Configuration file integration test (5 pts)

C1: Boot-time config file read (3 pts)

Create a file named BOOT.CFG on the SD card (FAT16 formatted) with this content:

HOSTNAME=virtus-os-v2
MAX_PROCS=4
TIMER_INTERVAL=500000

At Virtus OS v2 boot (after oled_init() and sd_init()), call the FAT16 walker to:

  1. Mount the FAT16 volume (parse boot sector).
  2. Search the root directory for the 11-byte name "BOOT    CFG" (8.3 format: name space-padded to 8 bytes, then the 3-byte extension, with no dot stored).
  3. Read the file contents into a 512-byte kernel buffer.
  4. Parse key=value pairs.
  5. Display the hostname on the OLED boot screen.
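
Step 4 can be handled with a small lookup routine rather than a full parser. The sketch below is one possible approach; cfg_get is a hypothetical name, and it assumes one KEY=VALUE pair per line with either LF or CRLF line endings and no spaces around the equals sign.

```c
#include <stddef.h>
#include <string.h>

/* Find `key` in a buffer of KEY=VALUE lines and copy its value into `out`
 * as a NUL-terminated string (truncated to fit out_size).
 * Returns 0 if the key was found, -1 otherwise. */
int cfg_get(const char *buf, size_t len, const char *key,
            char *out, size_t out_size) {
    size_t key_len = strlen(key);
    size_t pos = 0;
    if (out_size == 0) return -1;
    while (pos < len) {
        size_t end = pos;
        while (end < len && buf[end] != '\n') end++;     /* find end of line */
        size_t line_len = end - pos;
        if (line_len > 0 && buf[pos + line_len - 1] == '\r') line_len--;  /* CRLF */
        if (line_len > key_len && buf[pos + key_len] == '=' &&
            memcmp(buf + pos, key, key_len) == 0) {
            size_t val_len = line_len - key_len - 1;
            if (val_len >= out_size) val_len = out_size - 1;
            memcpy(out, buf + pos + key_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }
        pos = end + 1;   /* skip past the newline */
    }
    return -1;
}
```

Here len would be the number of bytes returned by the file read, and the value fetched for HOSTNAME is the string to pass to oled_draw_string() in step 5.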

C2: Memory allocation after heap extension (2 pts)

Verify that the DDR3 heap extension is functional by performing large allocations:

// Allocate 100 pages * 4 KiB = 400 KiB total
// This exceeds BRAM heap capacity; must use DDR3
uint32_t pages[100];
for (int i = 0; i < 100; i++) {
    pages[i] = alloc_page();
    if (pages[i] == 0) {
        oled_draw_string(0, 0, "ALLOC FAIL");
        return;
    }
    // write a pattern to verify the page is accessible
    // (volatile forces a real store, and later a real load, on the bus)
    *(volatile uint32_t*)pages[i] = 0xDEAD0000 | i;
}

// verify all patterns
for (int i = 0; i < 100; i++) {
    if (*(volatile uint32_t*)pages[i] != (0xDEAD0000 | i)) {
        oled_draw_string(1, 0, "VERIFY FAIL");
        return;
    }
}
oled_draw_string(2, 0, "DDR3 OK");

The OLED must display DDR3 OK to pass this item.


Grading

Part | Criteria | Points
A1 | Memory decoder routes DDR3 addresses correctly; round-trip write/read passes | 3
A2 | Heap extension uses DDR3 as primary; falls back to BRAM; alloc_page returns DDR3 addresses | 3
A3 | Latency table complete; BRAM vs DDR3 first-access measured | 2
B1 | sd_init() completes; CMD0/CMD8/ACMD41/CMD16 sequence correct; R1 response handled | 3
B2 | Boot sector parsed; fat_start / root_start / data_start computed correctly | 5
B3 | Directory listing displays filenames; file read follows FAT cluster chain | 4
C1 | BOOT.CFG parsed at boot; hostname displayed on OLED | 3
C2 | 100-page DDR3 allocation succeeds; patterns verified; OLED shows DDR3 OK | 2
Total | | 25