Given a stripped 50-instruction function with no source, reconstruct a plausible C source. Label every control-flow pattern. Compare your reconstruction to Ghidra's decompiler output.
Overview
You receive a disassembly listing of a single stripped function (approximately 50 instructions). Using only the control-flow patterns from Weeks 4-5 and the System V AMD64 calling convention, you produce a plausible C source for the function. After your reconstruction, you use Ghidra's decompiler as a cross-check -- not as your primary answer.
Tools: objdump (or the provided listing), Ghidra (cross-check only)
Time: ~90 minutes.
The target function
The following disassembly is from the .text section of a stripped x86-64 ELF binary. The function starts at address 0x00401160:
0000000000401160 <FUN_00401160>:
401160: 53 push rbx
401161: 48 89 fb mov rbx, rdi
401164: 48 85 ff test rdi, rdi
401167: 74 3b je 4011a4
401169: 48 8d 35 98 0e 00 00 lea rsi, [rip+0xe98]
401170: bf 01 00 00 00 mov edi, 0x1
401175: 31 c0 xor eax, eax
401177: e8 d4 fe ff ff call 401050 <__printf_chk@plt>
40117c: 4c 8b 23 mov r12, QWORD PTR [rbx]
40117f: 4d 85 e4 test r12, r12
401182: 74 1f je 4011a3
401184: 4c 89 e3 mov rbx, r12
401187: 4c 8b 63 08 mov r12, QWORD PTR [rbx+0x8]
40118b: 48 8b 13 mov rdx, QWORD PTR [rbx]
40118e: 48 8b 7b 10 mov rdi, QWORD PTR [rbx+0x10]
401192: 48 8d 35 87 0e 00 00 lea rsi, [rip+0xe87]
401199: e8 c2 fe ff ff call 401060 <fprintf@plt>
40119e: 4d 85 e4 test r12, r12
4011a1: 75 e1 jne 401184
4011a3: c3 ret
4011a4: 48 8d 35 85 0e 00 00 lea rsi, [rip+0xe85]
4011ab: bf 02 00 00 00 mov edi, 0x2
4011b0: 31 c0 xor eax, eax
4011b2: e8 99 fe ff ff call 401050 <__printf_chk@plt>
4011b7: c3 ret
Available information:
- The three lea rsi, [rip+N] instructions at 0x401169, 0x401192, and 0x4011a4 each load the address of a string constant in .rodata. The string contents (in order) are: "Processing list:\n", " [%lu]\n", and "Error: null list\n". The context of each call site should confirm which string belongs where.
- Do not use a hardcoded address for any of these strings. Resolve each virtual address yourself as part of Part A step 4. Use the RIP-relative formula: next-instruction address + signed displacement = target address. Show your arithmetic.
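The RIP-relative formula above can be sketched as a small C helper. This is only an illustration: the function name rip_relative_target and the example addresses in the usage note are hypothetical and deliberately not taken from the lab listing, so the Part A arithmetic remains yours to do.

```c
#include <stdint.h>

/* Resolve a RIP-relative operand. The displacement is a signed 32-bit
 * value that is added to the address of the NEXT instruction, which is
 * the instruction's own address plus its encoded length in bytes.
 * (Hypothetical helper name, for illustration only.) */
static uint64_t rip_relative_target(uint64_t insn_addr,
                                    unsigned insn_len,
                                    int32_t disp)
{
    uint64_t next_insn = insn_addr + insn_len;
    /* Sign-extend the displacement to 64 bits before adding. */
    return next_insn + (uint64_t)(int64_t)disp;
}
```

For a hypothetical 7-byte lea at 0x401000 with displacement 0x100, the next instruction is at 0x401007 and the helper returns 0x401107. A negative displacement of -0x10 at the same address gives 0x400ff7.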
Part A: Trace the control flow
Before writing any C, draw or describe the control flow graph of this function:
- Identify all basic blocks (a basic block ends at a branch or return instruction and begins at a branch target).
- List each basic block by its start address.
- Draw the edges: which basic block flows to which, and under what condition?
Label the edges with the condition: je (equal/zero), jne (not equal/nonzero), fall-through (no jump taken).
- For each of the three lea rsi, [rip+N] instructions (0x401169, 0x401192, 0x4011a4): compute the virtual address of the string it loads. Show your arithmetic -- state the next-instruction address, add the displacement, and give the result. This is the same RIP-relative resolution the processor performs at run time, and that a disassembler or debugger performs when it annotates the operand.
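The basic-block rule in the first step can be made mechanical. The following is a minimal sketch, assuming a simplified instruction record (struct Insn and mark_leaders are hypothetical names, not part of any tool used in this lab). It marks "leaders", the instructions that start a basic block: the first instruction, every branch target, and every instruction that follows a branch or return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction record; a real disassembler
 * exposes far more detail. */
struct Insn {
    uint64_t addr;      /* address of the instruction            */
    bool     is_branch; /* conditional or unconditional jump     */
    bool     is_ret;    /* return instruction                    */
    uint64_t target;    /* jump target, used only if is_branch   */
};

/* Mark basic-block leaders in leader[0..n-1] and return how many
 * were found. Calls are not treated as block terminators here,
 * matching the definition used in this lab. */
static size_t mark_leaders(const struct Insn *code, size_t n, bool *leader)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                 /* function entry */

    for (size_t i = 0; i < n; i++) {
        if (code[i].is_branch) {
            /* The branch target begins a block. */
            for (size_t j = 0; j < n; j++)
                if (code[j].addr == code[i].target)
                    leader[j] = true;
        }
        /* Whatever follows a branch or return begins a block. */
        if ((code[i].is_branch || code[i].is_ret) && i + 1 < n)
            leader[i + 1] = true;
    }

    for (size_t i = 0; i < n; i++)
        if (leader[i])
            count++;
    return count;
}
```

Working through the rule by hand on the lab listing is still the assignment; the sketch only shows that the rule is unambiguous enough to automate.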
Part B: Identify patterns
For each pattern from Weeks 4-5 that you recognize, label it:
- Null pointer check: Where in the function is a null check performed? What register is checked? What is the behavior if null?
- Loop: Is there a backward jump? What address does it jump to? What is the loop condition?
- Struct access pattern: The function reads from [rbx], [rbx+0x8], and [rbx+0x10]. What does this suggest about the parameter's type?
- Calling convention: What is the first argument to this function (in rdi at entry)? What is it used for?
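When you look for the loop pattern, keep in mind that optimizing compilers commonly "rotate" a while loop: they emit one guard test before the loop and re-test the condition at the bottom with a backward conditional jump. The sketch below shows the two equivalent source forms on a hypothetical struct Item (the type and both function names are invented for illustration and do not match the lab binary's layout).

```c
#include <stddef.h>

/* Hypothetical node type, unrelated to the lab binary's layout. */
struct Item {
    struct Item *next;
    int          weight;
};

/* Source-level form: the condition is tested at the top. */
static int total_weight_while(const struct Item *it)
{
    int sum = 0;
    while (it != NULL) {
        sum += it->weight;
        it = it->next;
    }
    return sum;
}

/* Rotated form, closer to what the disassembly usually shows. */
static int total_weight_rotated(const struct Item *it)
{
    int sum = 0;
    if (it != NULL) {            /* guard: test + je past the loop   */
        do {
            sum += it->weight;
            it = it->next;
        } while (it != NULL);    /* bottom: test + jne back to body  */
    }
    return sum;
}
```

Both functions compute the same result, so either form is a defensible reading of a test/je guard followed by a test/jne back edge. Which one you write in Part C is a judgment call that your comments should justify.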
Part C: Plausible C reconstruction
Write a plausible C source for this function. You are reconstructing, not recovering -- there may be multiple valid C sources that compile to equivalent code.
Requirements for your reconstruction:
- Use a named struct that accounts for the three fields accessed ([rbx], [rbx+0x8], [rbx+0x10])
- Include the three string constants in visible calls
- Match the control flow you identified in Part A
- Include a comment for each control-flow decision: why you interpreted it as you did
Example skeleton to help you get started:
struct ListNode {
    /* your fields here, based on the access offsets */
};

void FUN_00401160(/* what type does rdi hold? */) {
    /* your reconstruction here */
}
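One way to check that a candidate struct really accounts for the offsets you observed is to assert the layout at compile time. The sketch below uses a hypothetical struct Sample with made-up offsets (0x0, 0x8, 0xc), not the lab's struct, and assumes an LP64 target such as x86-64 System V, where pointers are 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, NOT the lab's struct: suppose a listing read
 * fields at [reg], [reg+0x8], and [reg+0xc]. */
struct Sample {
    const char *name;   /* read via [reg]     */
    uint32_t    flags;  /* read via [reg+0x8] */
    uint32_t    count;  /* read via [reg+0xc] */
};

/* Compilation fails if the declared layout disagrees with the
 * observed offsets (assumes 8-byte pointers, C11 or later). */
_Static_assert(offsetof(struct Sample, name)  == 0x0, "name at +0x0");
_Static_assert(offsetof(struct Sample, flags) == 0x8, "flags at +0x8");
_Static_assert(offsetof(struct Sample, count) == 0xc, "count at +0xc");
```

Applying the same idea to your own ListNode definition gives you a quick consistency check between Part B's offsets and Part C's struct before you compare against Ghidra.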
Part D: Ghidra cross-check
Import the lab binary into Ghidra (the instructor provides the full binary, not just the listing). Self-paced fallback: see labs/_artifacts/README.md ("Self-paced fallback: Lab 5") for a C source + compile command that produces a structurally similar binary; use the resulting lab5_target in place of the instructor binary. The control-flow patterns are equivalent, though function addresses will differ. Navigate to the linked-list traversal function (search for a FUN_ label that calls fprintf twice), not necessarily FUN_00401160. Look at the decompiler output.
- Does Ghidra's decompiler identify the same control-flow structure you did?
- Does Ghidra correctly identify the linked-list traversal (or whatever you determined the loop does)?
- Does Ghidra's pseudo-C look similar to your reconstruction? Where does it differ?
- Did the decompiler add or remove any structure that surprised you?
Part E: Reflection
Write a paragraph (100-150 words) answering: what was the hardest part of the reconstruction? Where did your reconstruction match the decompiler exactly? Where did it differ, and whose version is closer to what you think the original source was?
Lab Report
Submit one document with Parts A through E:
- Part A: control flow graph (diagram or structured description)
- Part B: four pattern identifications with evidence from the listing
- Part C: your C reconstruction with comments
- Part D: a paragraph comparing your reconstruction to Ghidra's output
- Part E: reflection paragraph
Grading
| Criterion | Points |
|---|---|
| Part A: Control flow graph complete and accurate | 20 |
| Part B: Four patterns identified with specific evidence | 20 |
| Part C: C reconstruction is structurally correct and internally consistent | 35 |
| Part D: Comparison with decompiler is accurate and analytical | 15 |
| Part E: Reflection is genuine (not just "Ghidra was right") | 10 |
| Total | 100 |
A reconstruction that disagrees with Ghidra but is well-reasoned earns high marks in Part C. A reconstruction that copies Ghidra's output without independent analysis earns zero in Part C.
Lab 5 of 9. Due: end of Week 5. The assembly-to-C translation skill is the core RE skill; every subsequent lab uses it.