RE-011 Lab 5: Assembly-to-C Reconstruction

Given a short stripped function (25 instructions) with no source, reconstruct plausible C source. Label every control-flow pattern. Compare your reconstruction to Ghidra's decompiler output.


Overview

You receive a disassembly listing of a single stripped function (25 instructions in the listing below). Using only the control-flow patterns from Weeks 4-5 and the System V AMD64 calling convention, write plausible C source for the function. Once your reconstruction is complete, use Ghidra's decompiler as a cross-check, not as your primary answer.

Tools: objdump (or the provided listing), Ghidra (cross-check only)

Time: ~90 minutes.


The target function

The following disassembly is from the .text section of a stripped x86-64 ELF binary. The function starts at address 0x00401160:

0000000000401160 <FUN_00401160>:
  401160: 53                      push   rbx
  401161: 48 89 fb                mov    rbx, rdi
  401164: 48 85 ff                test   rdi, rdi
  401167: 74 3b                   je     4011a4
  401169: 48 8d 35 98 0e 00 00    lea    rsi, [rip+0xe98]
  401170: bf 01 00 00 00          mov    edi, 0x1
  401175: 31 c0                   xor    eax, eax
  401177: e8 d4 fe ff ff          call   401050 <__printf_chk@plt>
  40117c: 4c 8b 23                mov    r12, QWORD PTR [rbx]
  40117f: 4d 85 e4                test   r12, r12
  401182: 74 1f                   je     4011a3
  401184: 4c 89 e3                mov    rbx, r12
  401187: 4c 8b 63 08             mov    r12, QWORD PTR [rbx+0x8]
  40118b: 48 8b 13                mov    rdx, QWORD PTR [rbx]
  40118e: 48 8b 7b 10             mov    rdi, QWORD PTR [rbx+0x10]
  401192: 48 8d 35 87 0e 00 00    lea    rsi, [rip+0xe87]
  401199: e8 c2 fe ff ff          call   401060 <fprintf@plt>
  40119e: 4d 85 e4                test   r12, r12
  4011a1: 75 e1                   jne    401184
  4011a3: c3                      ret
  4011a4: 48 8d 35 85 0e 00 00    lea    rsi, [rip+0xe85]
  4011ab: bf 02 00 00 00          mov    edi, 0x2
  4011b0: 31 c0                   xor    eax, eax
  4011b2: e8 99 fe ff ff          call   401050 <__printf_chk@plt>
  4011b7: c3                      ret

Available information:

  • The three lea rsi, [rip+N] instructions at 0x401169, 0x401192, and 0x4011a4 each load the address of a string constant in .rodata. In order, the strings are "Processing list:\n", " [%lu]\n", and "Error: null list\n". Check that each one makes sense in the context of its call site.
  • Do not use a hardcoded address for any of these strings. Resolve each virtual address yourself as part of Part A step 4. Use the RIP-relative formula: next-instruction address + signed displacement = target address. Show your arithmetic.
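
As a minimal sketch of the formula above (not part of the lab deliverable): the same next-instruction-plus-displacement rule also resolves relative call and jump targets, so it can be checked against a target the listing already prints. The helper names rel_target, disp32_le, and check_call_target are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

/*
 * Target of a RIP-relative operand or a relative branch/call:
 * address of the NEXT instruction plus the sign-extended displacement.
 * insn_addr and insn_len are read from the listing.
 */
static uint64_t rel_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    uint64_t next = insn_addr + insn_len;    /* RIP points past the instruction */
    return next + (uint64_t)(int64_t)disp;   /* sign-extend, then wrapping add */
}

/* Assemble a 32-bit displacement from the four little-endian bytes
 * shown in the listing's byte column. */
static int32_t disp32_le(const uint8_t b[4])
{
    uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    /* Assumption: the compiler converts to int32_t by two's-complement
     * wrap-around, as GCC and Clang do. */
    return (int32_t)u;
}

/* Worked check against the listing: call at 0x401177, bytes e8 d4 fe ff ff.
 * next = 0x40117c, disp = -0x12c, target = 0x401050 (__printf_chk@plt). */
int check_call_target(void)
{
    const uint8_t disp_bytes[4] = { 0xd4, 0xfe, 0xff, 0xff };
    return rel_target(0x401177, 5, disp32_le(disp_bytes)) == 0x401050;
}
```

For a lea, insn_len is the number of bytes shown in the listing's byte column for that instruction, and the displacement is the last four of those bytes.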

Part A: Trace the control flow

Before writing any C, draw or describe the control flow graph of this function:

  1. Identify all basic blocks. A basic block begins at the function entry, at a branch target, or at the instruction following a branch or return; it ends at a branch or return instruction, or immediately before the next block's first instruction.
  2. List each basic block by its start address.
  3. Draw the edges: which basic block flows to which, and under what condition? Label each edge with its condition: je (equal/zero), jne (not equal/nonzero), or fall-through (no jump taken).
  4. For each of the three lea rsi, [rip+N] instructions (0x401169, 0x401192, 0x4011a4), compute the virtual address of the string it loads. Show your arithmetic: state the next-instruction address, add the displacement, and give the result. This is the same calculation the CPU performs when it executes the instruction, and the one a disassembler performs to annotate the operand.
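
The basic-block rule used in Part A can be sketched as a small leader-marking routine. This is an illustration under stated assumptions, not a required tool: the Insn record and mark_leaders are hypothetical names, a target of 0 is used as a "no target" sentinel, and call instructions are treated as not ending a block (a common convention, but not the only one).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction record for illustration only. */
struct Insn {
    uint64_t addr;        /* address of the instruction */
    int      ends_block;  /* nonzero for jmp, jcc, and ret */
    uint64_t target;      /* branch target, or 0 if none */
};

/*
 * Mark leaders (the first instruction of each basic block):
 *   rule 1: the function entry,
 *   rule 2: every branch target,
 *   rule 3: every instruction that follows a branch or return.
 * leader[i] is set to 1 when insns[i] starts a block.
 * Returns the number of basic blocks.
 */
size_t mark_leaders(const struct Insn *insns, size_t n, int *leader)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                               /* rule 1 */

    for (size_t i = 0; i < n; i++) {
        if (!insns[i].ends_block)
            continue;
        if (i + 1 < n)
            leader[i + 1] = 1;                       /* rule 3 */
        if (insns[i].target == 0)
            continue;
        for (size_t j = 0; j < n; j++)               /* rule 2 */
            if (insns[j].addr == insns[i].target)
                leader[j] = 1;
    }

    for (size_t i = 0; i < n; i++)
        count += (size_t)leader[i];
    return count;
}
```

Each edge in the graph then runs from the last instruction of one block to a leader: the branch target when the jump is taken, the next leader in address order on fall-through.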

Part B: Identify patterns

For each pattern from Weeks 4-5 that you recognize, label it:

  1. Null pointer check: Where in the function is a null check performed? What register is checked? What is the behavior if null?

  2. Loop: Is there a backward jump? What address does it jump to? What is the loop condition?

  3. Struct access pattern: The function reads from [rbx], [rbx+0x8], and [rbx+0x10]. What does this suggest about the parameter's type and the layout of the data it points to?

  4. Calling convention: What is the first argument to this function (in rdi at entry)? What is it used for?
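
When looking for the loop pattern, it helps to remember that optimizing compilers frequently (though not always) emit a top-tested loop in rotated form: one forward conditional jump that guards entry, and a bottom test that produces the backward jump. The sketch below shows the two equivalent source shapes on a deliberately unrelated example; count_while and count_rotated are hypothetical names.

```c
#include <stddef.h>

/* Source-level form: the condition is tested at the top of the loop. */
size_t count_while(const int *v, size_t n)
{
    size_t hits = 0, i = 0;
    while (i < n) {
        if (v[i] != 0)
            hits++;
        i++;
    }
    return hits;
}

/* Rotated form, closer to what the disassembly often looks like. */
size_t count_rotated(const int *v, size_t n)
{
    size_t hits = 0, i = 0;
    if (i < n) {                 /* guard: forward jump that skips the loop */
        do {
            if (v[i] != 0)
                hits++;
            i++;
        } while (i < n);         /* bottom test: the backward conditional jump */
    }
    return hits;
}
```

Both functions return the same result for every input, so either shape is a defensible reconstruction of the same machine code; state which one you chose and why.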


Part C: Plausible C reconstruction

Write a plausible C source for this function. You are reconstructing, not recovering -- there may be multiple valid C sources that compile to equivalent code.

Requirements for your reconstruction:

  • Use a named struct that accounts for the three fields accessed ([rbx], [rbx+0x8], [rbx+0x10])
  • Include all three string constants as arguments to the output calls that use them
  • Match the control flow you identified in Part A
  • Include a comment for each control-flow decision: why you interpreted it as you did

Example skeleton to help you get started:

struct ListNode {
    /* your fields here, based on the access offsets */
};

void FUN_00401160(/* what type does rdi hold? */) {
    /* your reconstruction here */
}
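
One way to check a candidate struct against the offsets seen in a listing is to let the compiler verify the layout with offsetof. The sketch below uses a hypothetical struct Sample that is unrelated to the lab target, and assumes an LP64 target such as x86-64 Linux with a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical record, unrelated to the lab function. */
struct Sample {
    uint8_t  tag;     /* offset 0x00; 7 bytes of padding follow */
    double   weight;  /* offset 0x08 */
    char    *label;   /* offset 0x10 */
};

/* The build fails if the declared layout disagrees with the offsets
 * observed in the disassembly (assumption: LP64, 8-byte pointers). */
static_assert(offsetof(struct Sample, tag)    == 0x00, "tag at +0x00");
static_assert(offsetof(struct Sample, weight) == 0x08, "weight at +0x08");
static_assert(offsetof(struct Sample, label)  == 0x10, "label at +0x10");
static_assert(sizeof(struct Sample) == 0x18, "total size 0x18");
```

The same check works for your ListNode once you have chosen field types: if an assertion fails, either a field type or the field order does not match the access offsets.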

Part D: Ghidra cross-check

Import the lab binary into Ghidra (the instructor provides the full binary, not just the listing).

Self-paced fallback: see labs/_artifacts/README.md ("Self-paced fallback: Lab 5") for a C source and compile command that produce a structurally similar binary, and use the resulting lab5_target in place of the instructor binary. The control-flow patterns are equivalent, though function addresses will differ.

Navigate to the linked-list traversal function (search for a FUN_ label that calls fprintf twice), which is not necessarily FUN_00401160. Look at the decompiler output and answer:

  1. Does Ghidra's decompiler identify the same control-flow structure you did?
  2. Does Ghidra correctly identify the linked-list traversal (or whatever you determined the loop does)?
  3. Does Ghidra's pseudo-C look similar to your reconstruction? Where does it differ?
  4. Did the decompiler add or remove any structure that surprised you?

Part E: Reflection

Write a paragraph (100-150 words) answering: what was the hardest part of the reconstruction? Where did your reconstruction match the decompiler exactly? Where did it differ, and whose version is closer to what you think the original source was?


Lab Report

Submit one document with Parts A through E:

  • Part A: control flow graph (diagram or structured description)
  • Part B: four pattern identifications with evidence from the listing
  • Part C: your C reconstruction with comments
  • Part D: a paragraph comparing your reconstruction to Ghidra's output
  • Part E: reflection paragraph

Grading

Criterion                                                                 Points
Part A: Control flow graph complete and accurate                              20
Part B: Four patterns identified with specific evidence                       20
Part C: C reconstruction is structurally correct and internally consistent    35
Part D: Comparison with decompiler is accurate and analytical                 15
Part E: Reflection is genuine (not just "Ghidra was right")                   10
Total                                                                        100

A reconstruction that disagrees with Ghidra but is well-reasoned earns high marks in Part C. A reconstruction that copies Ghidra's output without independent analysis earns zero in Part C.


Lab 5 of 9. Due: end of Week 5. The assembly-to-C translation skill is the core RE skill; every subsequent lab uses it.