RE-011 Week 5: x86-64 Assembly II

Control flow, conditional jumps, loops as backward jumps, switch statements as jump tables. The patterns that let you reconstruct C from disassembly.


Reading (~45 min)

From Yurichev RE4B: read the chapters on "If-then-else," "Loops," and "Switch statement." Yurichev walks through each pattern -- the C source, the compiler output at various optimisation levels, and the conceptual mapping. Read at least the x86-64 sections; the 32-bit sections are useful but optional for RE-011.

From OST2 Architecture 1001: complete the control flow and flags modules.


Lecture outline (~1.5 hr)

Part 1: The flags register and conditional jumps (25 min)

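x86-64 has an RFLAGS register (the 64-bit extension of the older FLAGS/EFLAGS register) whose individual bits record properties of the result of the most recent flag-setting instruction, typically an arithmetic, logical, or comparison instruction. The bits relevant to control flow: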

  • ZF (zero flag): set if the result was zero
  • SF (sign flag): set if the result was negative (high bit set)
  • CF (carry flag): set if the operation produced a carry out of, or a borrow into, the high bit (unsigned overflow)
  • OF (overflow flag): set if the operation produced signed overflow

The cmp a, b instruction subtracts b from a and sets the flags without storing the result. The test a, b instruction performs a bitwise AND of a and b and sets ZF and SF from the result, again without storing it; test always clears CF and OF.
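The flag definitions above can be modelled in a few lines of C. This is a minimal sketch for 32-bit operands, not a description of how the hardware is implemented; the struct and function names (flags, cmp32, test32) are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four flags discussed above. */
struct flags {
    bool zf, sf, cf, of;
};

/* Model of `cmp a, b` on 32-bit operands: compute a - b, keep only the flags. */
struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                      /* wraps modulo 2^32, like the CPU */
    struct flags f;
    f.zf = (r == 0);                         /* result was zero */
    f.sf = (r >> 31) & 1;                    /* high bit of the result */
    f.cf = (a < b);                          /* borrow: a is below b as unsigned */
    /* Signed overflow: a and b differ in sign, and the result's sign differs from a's. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

/* Model of `test a, b`: AND the operands, keep only the flags. */
struct flags test32(uint32_t a, uint32_t b)
{
    uint32_t r = a & b;
    struct flags f = { r == 0, (r >> 31) & 1, false, false }; /* CF and OF cleared */
    return f;
}
```

For example, cmp32(1, 2) sets CF and SF but not OF, while cmp32(0x80000000u, 1) sets OF because the most negative signed value minus one does not fit in 32 bits.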

After cmp or test, conditional jump instructions read the flags:

Instruction   Condition            Typical C
je  / jz      ZF=1                 if (a == b)
jne / jnz     ZF=0                 if (a != b)
jl  / jnge    SF != OF             if (a < b)  (signed)
jle / jng     ZF=1 or SF != OF     if (a <= b) (signed)
jg  / jnle    ZF=0 and SF == OF    if (a > b)  (signed)
jge / jnl     SF == OF             if (a >= b) (signed)
jb  / jnae    CF=1                 if (a < b)  (unsigned)
ja  / jnbe    CF=0 and ZF=0        if (a > b)  (unsigned)
js            SF=1                 result is negative
jns           SF=0                 result is non-negative

The signed vs. unsigned distinction (e.g., jl vs. jb) matters when reconstructing C: it tells you whether the compiler treated the operands as signed or unsigned values, which usually reflects their types in the original source.
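A short C sketch of why the distinction matters: the same 32-bit pattern compares differently depending on the declared type. The function names are hypothetical, and the instruction named in each comment is what a compiler would typically emit, not a guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed comparison: typically cmp followed by jl or setl. */
bool less_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Unsigned comparison: typically cmp followed by jb or setb. */
bool less_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}
```

With the bit pattern 0xFFFFFFFF as the first operand, less_signed(-1, 1) is true, while less_unsigned(0xFFFFFFFFu, 1) is false because the same bits now mean 4294967295.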

Part 2: Control flow patterns (30 min)

if/else:

cmp  rdi, 0          ; if (arg1 != 0) { then } else { else }
je   .else_branch    ; arg1 == 0: skip to the else branch
; ... then branch code ...
jmp  .end
.else_branch:
; ... else branch code ...
.end:

Reading this in disassembly: find the cmp or test, find the conditional jump, follow both paths. The fall-through path (no jump) is the "then" branch; the jump target is the "else" branch (or vice versa if the condition is inverted). A jmp at the end of the "then" branch skips over the "else" branch.
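To make the mapping concrete, here is a sketch of the same branch written twice in C: once in structured form, and once with goto so that it follows the layout of the assembly line by line. The function names and the values 10 and 20 are placeholders chosen for illustration.

```c
/* Structured form, as a programmer would write it. */
long pick(long arg1)
{
    long result;
    if (arg1 != 0) {
        result = 10;                    /* "then" branch: the fall-through path */
    } else {
        result = 20;                    /* "else" branch: the je target */
    }
    return result;
}

/* The same logic, laid out the way the assembly is. */
long pick_lowered(long arg1)
{
    long result;
    if (arg1 == 0) goto else_branch;    /* cmp rdi, 0 ; je .else_branch */
    result = 10;                        /* then branch */
    goto end;                           /* jmp .end */
else_branch:
    result = 20;                        /* else branch */
end:
    return result;
}
```

Note that the jump condition (arg1 == 0) is the inverse of the source condition (arg1 != 0): the compiler jumps away when the "then" branch should not run.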

Loops -- all loops are backward jumps:

A loop in assembly is just a jump, conditional or unconditional, that goes backward (to a lower address than the jump instruction itself). Any time you see a jump pointing backward in the disassembly, assume a loop until proven otherwise.

; while (counter < limit) { body; counter++; }
mov  ecx, 0          ; counter = 0
.loop_top:
cmp  ecx, edi        ; counter < limit?
jge  .loop_exit      ; if not, exit
; ... loop body ...
inc  ecx             ; counter++
jmp  .loop_top       ; back to condition check
.loop_exit:

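For-loop pattern: same structure; init before the loop, increment at the bottom, condition at the top. Do-while: the body comes before the condition check, so it always executes at least once; the backward jump is a conditional jump at the bottom of the body. Optimising compilers often rewrite while and for loops into this bottom-tested form, guarded by a single check before the loop.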
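The difference between top-tested and bottom-tested loops can be sketched in C with goto, mirroring the jump structure directly. Both functions are illustrative and simply count how many times the body runs; the names are hypothetical.

```c
/* while loop: condition at the top, forward exit, backward jump at the bottom. */
int iterations_while(int limit)
{
    int iterations = 0;
    int counter = 0;                         /* mov ecx, 0 */
loop_top:
    if (counter >= limit) goto loop_exit;    /* cmp ecx, edi ; jge .loop_exit */
    iterations++;                            /* loop body */
    counter++;                               /* inc ecx */
    goto loop_top;                           /* jmp .loop_top */
loop_exit:
    return iterations;
}

/* do-while loop: body first, one conditional backward jump at the bottom. */
int iterations_do_while(int limit)
{
    int iterations = 0;
    int counter = 0;
loop_top:
    iterations++;                            /* loop body runs before any check */
    counter++;
    if (counter < limit) goto loop_top;      /* cmp ; jl .loop_top */
    return iterations;
}
```

With a limit of 0 the while version runs the body zero times and the do-while version runs it once, which is the behavioural clue that separates the two patterns.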

test rax, rax / test rdi, rdi: This is how compilers check for zero or a null pointer. test rax, rax ANDs rax with itself (result = rax); if the result is zero, ZF=1. You will see this constantly in place of cmp rax, 0. Recognize it immediately as a null check or zero check.

sete, setne, setl, etc.: These set a byte register to 0 or 1 based on a flag condition. sete al is equivalent to al = (ZF == 1). Common when the comparison result is stored in a variable rather than immediately branched on.
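As a sketch, C functions like the following tend to produce the test and setcc idioms, because the comparison result is returned as a value instead of being branched on. The function names are hypothetical, and the instruction sequences in the comments are typical optimised output, not guaranteed for any particular compiler or flag set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Typically: test rdi, rdi ; sete al */
bool is_null(const void *p)
{
    return p == NULL;
}

/* Typically: cmp rdi, rsi ; setl al ; movzx eax, al */
int is_less(long a, long b)
{
    return a < b;
}
```

Compiling such a file with optimisation enabled and disassembling it with objdump -d is a quick way to check what your own toolchain emits.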

Part 3: Switch statements as jump tables (20 min)

A C switch statement with many cases is often compiled to a jump table: an array of addresses where each entry corresponds to one case value. The compiled pattern:

; switch (n) { case 0: ...; case 1: ...; case 2: ...; }
cmp  rdi, 2          ; range check: is n > 2 (unsigned)?
ja   .default        ; if so, jump to default
lea  rdx, [rip + .table]             ; rdx = address of the table
movsxd rax, DWORD PTR [rdx + rdi*4]  ; load the 32-bit relative entry for case n
add  rax, rdx        ; entry + table base = handler address
jmp  rax             ; jump to the case handler
.table:
  .long case_0 - .table
  .long case_1 - .table
  .long case_2 - .table

In Ghidra, jump tables are usually recognized automatically: the listing view marks the jmp as a computed jump with a reference to each recovered target, and the decompiler shows a switch statement. In raw objdump output, the jmp rax is just an indirect jump with no visible destination; you need to find the table reference to understand all possible targets.

The presence of a jump table in a binary tells you: this function has a multiway branch with several cases (the exact threshold depends on the compiler and its settings), and the case values were dense enough that the compiler chose table lookup over a chain of comparisons.
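The dispatch logic can be imitated in C. In this sketch, classify is a dense switch of the kind a compiler may turn into a jump table, and classify_table performs the same range check and indexed lookup by hand. All names and return values are hypothetical, and the table here holds function pointers for readability, whereas the compiled form above stores 32-bit offsets relative to the table.

```c
#include <stdint.h>

/* A dense switch that a compiler may lower to a jump table. */
int classify(uint64_t n)
{
    switch (n) {
    case 0:  return 100;
    case 1:  return 200;
    case 2:  return 300;
    default: return -1;
    }
}

/* One handler per case, standing in for the case labels. */
static int case_0(void) { return 100; }
static int case_1(void) { return 200; }
static int case_2(void) { return 300; }

/* Hand-written equivalent of the table dispatch. */
int classify_table(uint64_t n)
{
    static int (*const table[])(void) = { case_0, case_1, case_2 };

    if (n > 2)              /* cmp rdi, 2 ; ja .default (unsigned compare) */
        return -1;          /* default case */
    return table[n]();      /* load the entry, then jump through it */
}
```

Because the range check is unsigned, a single comparison rejects both values above 2 and values that would be negative if read as signed, which is why ja appears here even when the switch variable is a signed int in the source.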


Lab exercises (~1.5 hr)

Lab 5: Assembly-to-C reconstruction

See labs/lab-5-assembly-to-c.md for the full specification.

You are given a stripped binary containing a 50-instruction function with no source code. Using objdump -d and the control-flow patterns from Weeks 4-5, you reconstruct a plausible C source for the function. You label each pattern you identify (if/else, loop, comparison type) and explain your reasoning. Ghidra's decompiler is available as a cross-check; you produce your own reconstruction first, then compare.

CrackMe ladder

Solve at least one more CrackMe beyond your Week 4 attempt. Document in your Tool Journal: what the check function does (in control-flow terms), where the key comparison happens, and what the correct input is. You are now reading disassembly to find the check; that is the core RE-011 skill.


Independent practice (~3 hr)

  • Yurichev RE4B: Read the "Arrays," "Structures," and "Working with strings" chapters. These come up in the Ghidra weeks.
  • Tool Journal: Add a control-flow pattern reference. Four entries: if/else pattern, while loop pattern, do-while pattern, jump table indicator. For each: what the assembly looks like, what the C looks like, how you tell the difference.
  • CrackMe ladder: Attempt a second CrackMe or continue with the Week 4 challenge. Document your progress regardless of whether you crack it.

Reflection prompts

  1. The jl instruction (jump-if-less) uses signed comparison, while jb uses unsigned comparison. If you see jb in a disassembly, what does that tell you about how the original C source treated the operands? Give an example where getting the signed/unsigned distinction wrong would cause you to misread the function's behavior.

  2. Loops in assembly are backward jumps. A disassembler does not know the difference between a loop and a goto. In C, goto is considered bad practice; in assembly, all backward jumps look the same. What does this mean for the reliability of the C reconstruction you produce in Lab 5?

  3. Compiler optimisation at -O2 often eliminates the frame pointer (rbp), using rsp-relative addressing instead. What is the consequence for a reverse engineer who is trying to identify local variables? What information did they have with the frame pointer that they no longer have without it?


Week 5 of 14. Next: Ghidra I -- project setup, the auto-analyser, navigation, and the decompiler view.