Classroom Glossary Public page

Lab 10.1: Tracing GC Running on Memory.lib

550 words

Total points: 25
Estimated time: 3.5 hours
Prerequisites: Lab 9.1 complete; Memory.alloc working; DE10-Nano DDR3 heap accessible


Overview

This lab adds a mark-and-sweep tracing GC to Virtus OS v2's Memory service. You will instrument the allocator with type headers, implement the mark and sweep phases, and measure the cycle cost of one GC cycle.


Part A: Type-header instrumentation (7 pts)

A1: Update Memory.alloc to prepend a type header (4 pts)

Modify Memory.alloc to prepend a 32-bit type header before each object:

uint32_t Memory_alloc(uint16_t size_words, uint16_t ptr_bitmask) {
    // find a free block large enough for (size_words + 1) words
    // (the +1 is for the header)
    uint32_t block = find_free_block(size_words + 1);
    if (block == 0) {
        Memory_gc();              // trigger GC if no space
        block = find_free_block(size_words + 1);
        if (block == 0) return 0; // out of memory after GC
    }
    uint32_t header = ((uint32_t)ptr_bitmask << 16) | (uint32_t)size_words;
    heap[block] = header;         // write header at block[0]
    return block + 1;             // return pointer to block[1] (first field)
}

The returned pointer points past the header, so existing code that dereferences obj[0], obj[1], etc. does not need to change.

Update Memory.dealloc (the manual free path, if present) to also free the header word: free_list_add(ref - 1, size_words + 1).

A2: Stack map at safe points (3 pts)

The GC needs to know which stack slots are live references. Add a simple stack map to your compiler: at each ECALL point (the only safe points in CSA-201), the compiler emits a 32-bit bitmask stored in the read-only data segment. The bitmask bit i=1 means "stack slot at offset i*4 from fp contains a live reference."

For functions with up to 30 local variables (the max given a 30-bit bitmask; reserve bits 30-31 for flags), this is sufficient. For the GC, the OS trap handler reads the bitmask from the function's stack map when it encounters that function's return address in the call stack.

Simplification for the lab: if compiler stack map emission is too complex, use a conservative approximation: treat every stack word in the range [fp - (max_locals * 4), fp] as a potential reference. Conservative scanning may produce false positives (pointers to integers that happen to look like heap addresses) but will not miss live objects.


Part B: Mark and sweep (12 pts)

B1: Mark phase (7 pts)

Implement gc_mark(void):

void gc_mark(void) {
    // Step 1: collect roots from all stack frames
    uint32_t* worklist[MAX_ROOTS];
    int worklist_len = 0;
    
    // scan current stack from fp down to stack_base
    uint32_t* fp = get_current_fp();
    while (fp != NULL && fp > stack_base) {
        uint32_t ret_addr = fp[1];           // saved ra
        uint32_t stackmap = get_stackmap(ret_addr); // look up bitmask
        for (int i = 0; i < 30; i++) {
            if (stackmap & (1 << i)) {
                uint32_t ref = fp[-i-2];     // local variable
                if (is_heap_ref(ref)) {
                    worklist[worklist_len++] = (uint32_t*)ref;
                }
            }
        }
        fp = (uint32_t*)(fp[0]);             // follow saved fp chain
    }
    
    // Step 2: mark phase -- process worklist
    while (worklist_len > 0) {
        uint32_t* obj = worklist[--worklist_len];
        uint32_t* header_ptr = obj - 1;      // header is one word before obj
        uint32_t hdr = *header_ptr;
        
        if (hdr & (1 << 15)) continue;       // already marked
        *header_ptr = hdr | (1 << 15);       // set mark bit
        
        uint16_t size = hdr & 0x7FFF;
        uint16_t ptr_mask = hdr >> 16;
        
        for (int i = 0; i < 16; i++) {
            if (ptr_mask & (1 << i)) {
                uint32_t child_ref = obj[i];
                if (is_heap_ref(child_ref) && worklist_len < MAX_ROOTS) {
                    worklist[worklist_len++] = (uint32_t*)child_ref;
                }
            }
        }
    }
}

B2: Sweep phase (5 pts)

Implement gc_sweep(void):

void gc_sweep(void) {
    uint32_t ptr = HEAP_BASE;
    free_list_reset();  // clear the existing free list
    
    while (ptr < heap_top) {
        uint32_t hdr = heap[ptr];
        uint32_t size = hdr & 0x7FFF;
        int marked = (hdr >> 15) & 1;
        
        if (size == 0) break;  // sentinel
        
        if (marked) {
            heap[ptr] = hdr & ~(1 << 15);  // clear mark bit
        } else {
            free_list_add(ptr, size + 1);   // reclaim header + object
        }
        ptr += size + 1;
    }
}

Implement Memory_gc() to call: stop scheduler (disable timer interrupt), gc_mark(), gc_sweep(), resume scheduler (re-enable timer interrupt).


Part C: Measurement and correctness (6 pts)

C1: Correctness test (3 pts)

Create a benchmark that:

  1. Allocates a linked list of 50 nodes (each node: {value: int, next: ref}, ptr_bitmask = 0b10).
  2. Disconnects the tail 25 nodes from the list (they become garbage).
  3. Runs Memory.gc().
  4. Traverses the remaining 25 nodes and verifies all values are intact.
  5. Allocates 25 new nodes (which should reuse the reclaimed memory).
  6. Verifies allocation succeeded without out-of-memory.

C2: Cycle cost measurement (3 pts)

Extend the benchmark with csrr mcycle instrumentation:

uint64_t before = read_mcycle();
Memory_gc();
uint64_t after = read_mcycle();
uint64_t gc_cycles = after - before;

Run with 3 heap sizes:

  • 50 live objects + 50 garbage = 100 objects total
  • 100 live + 100 garbage = 200 total
  • 200 live + 200 garbage = 400 total

Fill in:

Heap (live + garbage) GC cycles Cycles per object
100 objects
200 objects
400 objects

Verify the relationship: GC time should scale linearly with total heap size (sweep is O(heap_size)) and with live set size (mark is O(live_set)).


Grading

Part Criteria Points
A1 Memory.alloc prepends type header; pointer arithmetic correct 4
A2 Stack map at ECALL points (or conservative scan documented) 3
B1 Mark phase traverses the reference graph; all reachable objects marked 7
B2 Sweep phase reclaims all unmarked objects; free list rebuilt correctly 5
C1 Linked-list correctness test passes; live nodes intact after GC 3
C2 Cycle cost table complete; scaling relationship noted 3
Total 25