Registers, the stack, and the System V AMD64 calling convention. Reading disassembly without running the binary. The CrackMe ladder begins.
Reading (~45 min)
From OST2 Architecture 1001 (ost2.fyi): complete the x86-64 registers module and the stack module. These two modules cover the background this week's lecture assumes. The lecture builds on them rather than repeating them.
From Yurichev RE4B (beginners.re): read the "Couple more words about registers" and "Hello, world!" chapter sections. Yurichev works through a simple compiled C function step by step from compiler output to human-readable interpretation. This is the pattern you will use every week.
Lecture outline (~1.5 hr)
Part 1: x86-64 registers (20 min)
The CPU's registers are its fastest storage: a small set of named locations that hold values the processor is actively working with. x86-64 has 16 general-purpose 64-bit registers:
The "Conventional use" column reflects the System V AMD64 ABI (Linux x86-64). Windows x64 passes the first four integer arguments in rcx/rdx/r8/r9 instead of rdi/rsi/rdx/rcx. See Yurichev RE4B Appendix A for a full cross-reference.
| 64-bit | 32-bit | 16-bit | 8-bit (low) | Conventional use |
|---|---|---|---|---|
| rax | eax | ax | al | Return value; accumulator |
| rbx | ebx | bx | bl | Base register; callee-saved |
| rcx | ecx | cx | cl | 4th argument; loop counter |
| rdx | edx | dx | dl | 3rd argument; upper half of 128-bit return values |
| rsi | esi | si | sil | 2nd argument; source index |
| rdi | edi | di | dil | 1st argument; destination index |
| rsp | esp | sp | spl | Stack pointer (top of stack) |
| rbp | ebp | bp | bpl | Base pointer (frame; callee-saved) |
| r8 | r8d | r8w | r8b | 5th argument |
| r9 | r9d | r9w | r9b | 6th argument |
| r10 | r10d | r10w | r10b | Temporary |
| r11 | r11d | r11w | r11b | Temporary |
| r12-r15 | ... | ... | ... | Callee-saved general purpose |
rip (instruction pointer) always holds the address of the next instruction to execute. You cannot mov a value directly into rip -- you change it with jump, call, and ret instructions.
Writing a 32-bit register form (eax, ebx, etc.) zero-extends the result into the full 64-bit register, so writing eax clears the upper 32 bits of rax. Writes to the 16-bit and 8-bit forms (ax, al, etc.) do not do this. They leave the remaining upper bits unchanged. This x86-64 design choice catches beginners. Ghidra and objdump both show the shorter form when the compiler uses it.
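Here is a minimal C sketch of these write rules. It models a 64-bit register as a uint64_t. The helpers write32, write16 and write8 are hypothetical names used only for illustration. They mimic what a write through eax, ax or al does to rax, and do not show how the CPU implements it.

```c
#include <stdint.h>

/* Sketch only: a 64-bit register such as rax modelled as a uint64_t.
 * Each hypothetical helper returns the register's new contents after a
 * write through the named sub-register. */

/* Write through the 32-bit form (eax): the value is zero-extended,
 * so bits 63..32 become 0 and the old contents are discarded. */
static uint64_t write32(uint64_t reg, uint32_t value) {
    (void)reg;
    return (uint64_t)value;
}

/* Write through the 16-bit form (ax): only bits 15..0 change. */
static uint64_t write16(uint64_t reg, uint16_t value) {
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* Write through the low 8-bit form (al): only bits 7..0 change. */
static uint64_t write8(uint64_t reg, uint8_t value) {
    return (reg & ~UINT64_C(0xFF)) | value;
}
```

For example, suppose rax starts as 0xFFFFFFFFFFFFFFFF. Writing 1 through eax leaves rax = 0x1. Writing 1 through al leaves rax = 0xFFFFFFFFFFFFFF01.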
Part 2: The stack (20 min)
The stack is a region of memory used for function call management. On x86-64:
- The stack grows downward (from higher addresses to lower addresses).
- rsp always points to the most recently pushed value (the "top" of the stack, which is at the lowest address).
- push rax decrements rsp by 8, then writes rax to [rsp].
- pop rax reads [rsp] into rax, then increments rsp by 8.
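The push/pop rules above can be sketched as a toy C model. A byte array stands in for stack memory, and an integer index stands in for rsp. The struct toy_cpu and its fields are hypothetical. The model only shows two things: the order of operations (rsp moves first on push, memory is read first on pop) and the downward growth.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: "addresses" are indices into mem, not real
 * virtual addresses, and there are no overflow/underflow checks. */
#define STACK_SIZE 64

struct toy_cpu {
    uint8_t mem[STACK_SIZE];
    uint64_t rsp;   /* starts at STACK_SIZE: an empty stack */
};

/* push: decrement rsp by 8 first, then store the 8-byte value at [rsp].
 * memcpy stores in host byte order (little-endian on x86-64). */
static void push(struct toy_cpu *cpu, uint64_t value) {
    cpu->rsp -= 8;
    memcpy(&cpu->mem[cpu->rsp], &value, sizeof value);
}

/* pop: load the 8-byte value at [rsp] first, then increment rsp by 8. */
static uint64_t pop(struct toy_cpu *cpu) {
    uint64_t value;
    memcpy(&value, &cpu->mem[cpu->rsp], sizeof value);
    cpu->rsp += 8;
    return value;
}
```

Start with an empty 64-byte stack and push 0x1111, then 0x2222. rsp ends at 48, and the two pops return 0x2222 first, then 0x1111.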
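A function's stack frame is the region between rbp (frame base) and rsp (stack top) that the function owns. This assumes the function keeps a frame pointer in rbp. Optimised code often omits it and addresses everything relative to rsp. The frame contains: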
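- Saved copies of callee-saved registers (the caller's values, which this function must restore before returning)
- Local variables
- Space for outgoing arguments when this function calls something that takes more than six integer arguments (the extra arguments go on the stack, not in registers)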
The canonical function prologue:
```asm
push rbp          ; save caller's frame base
mov  rbp, rsp     ; set our frame base to current stack top
sub  rsp, 0x30    ; allocate 48 bytes for local variables
```
The canonical function epilogue:
```asm
leave             ; equivalent to: mov rsp, rbp / pop rbp
ret               ; pop return address into rip
```
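Recognizing the prologue and epilogue is a quick way to find function boundaries in stripped binaries, but it is not foolproof. Optimised code that omits the frame pointer has no push rbp / mov rbp, rsp pair. Binaries built with -fcf-protection often start functions with endbr64 before the prologue. Ghidra finds function boundaries automatically. In a hex dump or raw objdump output, you find them by eye.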
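Finding prologues by eye amounts to searching for a byte pattern. The sketch below scans a buffer for 55 48 89 e5, which is push rbp followed by mov rbp, rsp, encoded as in the Part 4 listing. It is a naive heuristic and not how Ghidra works. It misses functions compiled without a frame pointer. It can also report false hits where the same bytes happen to occur inside data or other instructions. The function name find_prologues is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Canonical prologue bytes:
 *   55         push rbp
 *   48 89 e5   mov  rbp, rsp                                        */
static const unsigned char PROLOGUE[] = { 0x55, 0x48, 0x89, 0xe5 };

/* Hypothetical helper: scan buf[0..len) for the prologue pattern.
 * Writes up to max_hits matching offsets into hits and returns the total
 * number of matches. Each match is only a candidate function start. */
static size_t find_prologues(const unsigned char *buf, size_t len,
                             size_t *hits, size_t max_hits) {
    size_t count = 0;
    for (size_t i = 0; i + sizeof PROLOGUE <= len; i++) {
        if (memcmp(buf + i, PROLOGUE, sizeof PROLOGUE) == 0) {
            if (count < max_hits)
                hits[count] = i;
            count++;
        }
    }
    return count;
}
```

Consider a buffer containing ret, a full prologue, sub rsp, 0x10, leave, ret, and a second prologue. The scan reports candidates at offsets 1 and 11.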
Part 3: The System V AMD64 calling convention (20 min)
A calling convention is an agreement between the caller and the callee about how arguments are passed and how the stack is managed. Linux x86-64 uses the System V AMD64 ABI (Application Binary Interface).
Argument passing: The first six integer/pointer arguments go in registers, in order:
rdi, rsi, rdx, rcx, r8, r9
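Integer/pointer arguments beyond the sixth go on the stack, pushed right to left. Floating-point arguments use their own sequence of registers, xmm0 through xmm7. This sequence is counted independently of the integer registers.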
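The following C sketch models how a caller assigns scalar arguments to locations under System V AMD64. It handles only integer/pointer and floating-point scalars. Structs, long double and variadic calls follow extra rules that the sketch ignores. The names assign_args and arg_class are hypothetical. The key detail is that the integer and xmm sequences advance independently.

```c
#include <stddef.h>

/* Simplified classes: scalar integer/pointer and scalar float/double only. */
enum arg_class { ARG_INT, ARG_FLOAT };

static const char *const INT_REGS[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const SSE_REGS[8] = { "xmm0", "xmm1", "xmm2", "xmm3",
                                         "xmm4", "xmm5", "xmm6", "xmm7" };

/* Hypothetical helper: set where[i] to the register used for argument i,
 * or to "stack" once that argument's register sequence is used up.
 * Integer and floating-point arguments each consume their own sequence. */
static void assign_args(const enum arg_class *args, size_t n, const char **where) {
    size_t next_int = 0, next_sse = 0;
    for (size_t i = 0; i < n; i++) {
        if (args[i] == ARG_INT)
            where[i] = next_int < 6 ? INT_REGS[next_int++] : "stack";
        else
            where[i] = next_sse < 8 ? SSE_REGS[next_sse++] : "stack";
    }
}
```

Take a call like f(int a, double b, long c, double d). The model places a in rdi, b in xmm0, c in rsi and d in xmm1. With seven integer arguments, the seventh is reported as "stack".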
Return value: Integer/pointer return values go in rax. __int128 values, and some small structs of up to 16 bytes, come back in the rdx:rax pair, with rdx holding the upper 64 bits. Floating-point return values go in xmm0.
Callee-saved registers (the callee must preserve these): rbx, rbp, r12, r13, r14, r15. If a function uses any of these, it must save them to the stack in its prologue and restore them in its epilogue.
Caller-saved registers (the callee may trash these): rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11. If the caller needs these values across a function call, it saves them before the call.
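The two lists above can be captured as a small lookup table, the kind worth keeping in your Tool Journal. This sketch covers only the 64-bit general-purpose register names. The names classify and save_class are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* How System V AMD64 treats a general-purpose register across a call. */
enum save_class { CALLEE_SAVED, CALLER_SAVED, SPECIAL };

static const char *const CALLEE_SAVED_REGS[] = { "rbx", "rbp", "r12", "r13", "r14", "r15" };
static const char *const CALLER_SAVED_REGS[] = { "rax", "rcx", "rdx", "rsi", "rdi",
                                                 "r8", "r9", "r10", "r11" };

/* Return 1 if reg appears in list[0..n), else 0. */
static int in_list(const char *reg, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(reg, list[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: classify a 64-bit register name. Anything not in
 * either list (rsp, or a name this sketch does not know) is SPECIAL. */
static enum save_class classify(const char *reg) {
    if (in_list(reg, CALLEE_SAVED_REGS, sizeof CALLEE_SAVED_REGS / sizeof *CALLEE_SAVED_REGS))
        return CALLEE_SAVED;
    if (in_list(reg, CALLER_SAVED_REGS, sizeof CALLER_SAVED_REGS / sizeof *CALLER_SAVED_REGS))
        return CALLER_SAVED;
    return SPECIAL;
}
```

In practice, a value the caller keeps in rbx is still there after a call. A value in rcx is gone unless the caller saved it first.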
Why this matters for RE: when you see a call instruction, you immediately know:
- What rdi contained just before the call = the first argument
- What rax contains just after the call = the return value
This lets you trace data flow through a disassembly without running the binary.
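Here is a toy version of this reasoning in C. It walks the instructions in order, remembers constant values loaded into registers, and records what rdi held at each call. The three-instruction set (MOV_IMM, MOV_REG, CALL) and the register enum are hypothetical simplifications, not a real disassembler API. Real code also needs memory operands, branches and much more.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model: three registers (all
 * caller-saved under System V AMD64) and three instruction kinds. */
enum reg { RAX, RDI, RSI, NUM_REGS };
enum op { MOV_IMM, MOV_REG, CALL };

struct insn {
    enum op op;
    enum reg dst;    /* destination for MOV_IMM and MOV_REG */
    enum reg src;    /* source for MOV_REG */
    uint64_t imm;    /* immediate for MOV_IMM; call target for CALL */
};

/* What the tracer knows about a register: a constant, or nothing. */
struct known {
    bool valid;
    uint64_t value;
};

/* Walk the instructions in order. For each CALL, record what rdi held
 * just before it (the first argument). Returns the number of calls seen;
 * at most max_calls entries of first_args are written. */
static size_t trace_first_args(const struct insn *code, size_t n,
                               struct known *first_args, size_t max_calls) {
    struct known regs[NUM_REGS] = {{ false, 0 }};  /* unknown at entry */
    size_t calls = 0;

    for (size_t i = 0; i < n; i++) {
        const struct insn *in = &code[i];
        switch (in->op) {
        case MOV_IMM:
            regs[in->dst].valid = true;
            regs[in->dst].value = in->imm;
            break;
        case MOV_REG:
            regs[in->dst] = regs[in->src];
            break;
        case CALL:
            if (calls < max_calls)
                first_args[calls] = regs[RDI];
            calls++;
            /* The callee may overwrite every caller-saved register, and
             * rax now holds a return value we cannot know statically. */
            for (int r = 0; r < NUM_REGS; r++)
                regs[r].valid = false;
            break;
        }
    }
    return calls;
}
```

Feed it the equivalent of mov edi, 0x402008 followed by call puts, and the tracer reports 0x402008 as the first argument. Now add a second call with no write to rdi in between. The tracer reports that call's argument as unknown, because rdi is caller-saved and the first call may have overwritten it.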
Part 4: Reading objdump output -- a worked example (10 min)
The listing below is objdump -d -M intel output. Plain objdump -d prints AT&T syntax instead, which reverses the operand order.

```asm
0000000000401136 <main>:
  401136:  55                push   rbp
  401137:  48 89 e5          mov    rbp,rsp
  40113a:  48 83 ec 10       sub    rsp,0x10
  40113e:  89 7d fc          mov    DWORD PTR [rbp-0x4],edi
  401141:  48 89 75 f0       mov    QWORD PTR [rbp-0x10],rsi
  401145:  bf 08 20 40 00    mov    edi,0x402008
  40114a:  e8 e1 fe ff ff    call   401030 <puts@plt>
  40114f:  b8 00 00 00 00    mov    eax,0x0
  401154:  c9                leave
  401155:  c3                ret
```
Reading this from top to bottom:
- push rbp / mov rbp, rsp / sub rsp, 0x10 -- standard prologue, 16 bytes of local space
- mov DWORD PTR [rbp-0x4], edi -- saves argc (first argument to main, in edi) to a local variable at rbp-4. It is a DWORD store because argc is a 32-bit int.
- mov QWORD PTR [rbp-0x10], rsi -- saves argv (second argument, in rsi) to a local at rbp-16
- mov edi, 0x402008 -- loads a pointer (0x402008, probably a string address) as the first argument. Writing edi is enough because the address fits in 32 bits and the write zero-extends into rdi.
- call 401030 <puts@plt> -- calls puts. First argument = edi = that string pointer, so this is a puts(some_string) call.
- mov eax, 0x0 -- return value = 0 (success)
- leave / ret -- standard epilogue
Without running the binary: this is a main function that calls puts with a string at virtual address 0x402008 and returns 0. To inspect the string, you have two options. Navigate to 0x402008 in Ghidra, which works in virtual addresses. Or run objdump -s -j .rodata binary, which dumps the .rodata section with each line labelled by its virtual address, so you can read the bytes at 0x402008 directly. Note that 0x402008 is a virtual address, not a file offset. Reading offset 0x402008 in xxd output will give the wrong bytes, and in a small binary that offset is likely past the end of the file. Week 11 covers the virtual-address-to-file-offset translation explicitly.
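As a preview of the Week 11 material, the translation uses the ELF program headers. Find the loadable segment whose virtual range contains the address, then compute offset = p_offset + (vaddr - p_vaddr). This C sketch assumes you have already read the PT_LOAD entries (for example from readelf -l). The segment values in the usage example are made up for illustration, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One loadable (PT_LOAD) segment, with fields named after the ELF
 * program header members they mirror. */
struct segment {
    uint64_t vaddr;    /* p_vaddr: where the segment is mapped in memory */
    uint64_t offset;   /* p_offset: where its bytes start in the file */
    uint64_t filesz;   /* p_filesz: how many bytes come from the file */
};

/* Hypothetical helper: translate vaddr to a file offset using the segment
 * whose file-backed range contains it. Returns false if none does. */
static bool vaddr_to_offset(const struct segment *segs, int n,
                            uint64_t vaddr, uint64_t *offset) {
    for (int i = 0; i < n; i++) {
        if (vaddr >= segs[i].vaddr && vaddr - segs[i].vaddr < segs[i].filesz) {
            *offset = segs[i].offset + (vaddr - segs[i].vaddr);
            return true;
        }
    }
    return false;
}
```

Suppose a made-up segment has p_vaddr 0x402000 and p_offset 0x2000. Then 0x402008 translates to file offset 0x2008, which is where you would look in xxd.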
Lab exercises (~1.5 hr)
Lab 3: Compiler optimisation
See labs/lab-3-compiler-optimisation.md for the full specification.
You compile the same C source at -O0, -O2, and -O3, then use objdump -d to compare the three disassembly listings. You document at least three concrete differences (e.g., loop unrolling, inlining, constant folding) and explain the RE consequence of each: how does optimisation change your job as someone reading the binary?
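Below is a hypothetical sample input for this kind of comparison; it is not the lab's own source. Each function is written to invite a different optimisation. Whether the compiler actually performs it depends on the compiler and version, which is exactly what the lab asks you to check.

```c
/* Hypothetical sample for comparing -O0, -O2 and -O3 disassembly. */

/* A tiny static helper: optimising builds will usually inline it,
 * so no call to square appears in their listings. */
static int square(int x) {
    return x * x;
}

/* A loop over an array: a candidate for unrolling or vectorisation at -O3. */
int sum_of_squares(const int *values, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += square(values[i]);
    return total;
}

/* Only constants involved: an optimising build can fold this to a single
 * instruction that loads 169 into eax, instead of two calls at -O0. */
int folded_constant(void) {
    return square(12) + square(5);
}
```

For example, sum_of_squares over {1, 2, 3} returns 14, and folded_constant returns 169 at every optimisation level. Only the instructions that compute those values change.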
CrackMe ladder begins
Start your CrackMe ladder. Pick the easiest available challenge on crackmes.one that targets Linux/x86-64. Attempt to understand it using only the static tools from Weeks 1-4: file, xxd, strings, readelf, nm, objdump -d. Document your approach in your Tool Journal. You do not need to crack it this week -- understanding the structure is the goal.
Independent practice (~3 hr)
- OST2 Architecture 1001: Continue through the instructions and addressing modes modules.
- Tool Journal: Write a one-page reference to the System V AMD64 calling convention that you can consult every time you analyze a function. Include: argument registers in order, return value register, callee-saved registers, stack growth direction.
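- objdump practice: Disassemble /bin/ls with objdump -d -M intel /bin/ls | less. The -M intel option selects the Intel syntax used in this week's listings. Find main. Identify the prologue and epilogue. Find at least three function calls and identify what register holds the first argument for each. On most distributions /bin/ls is stripped, so objdump will not label main. One way to find it is to start at the entry point reported by readelf -h: the startup code there passes main's address as the first argument (rdi) to __libc_start_main.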
Reflection prompts
- The System V AMD64 ABI specifies that the first six integer arguments go in rdi, rsi, rdx, rcx, r8, r9. If a function has a seventh argument, where does it go, and how does the callee access it? (Answer in terms of rbp offsets or rsp offsets.)
- Writing eax clears the upper 32 bits of rax. Why would the designers of the x86-64 architecture choose this behavior? What advantage does it offer over an architecture where a 32-bit write only modifies the lower 32 bits?
- A reverse engineer sees a function that begins with:

  ```asm
  push r15
  push r14
  push r13
  push r12
  push rbp
  push rbx
  sub  rsp, 0x18
  ```

  What can you infer about this function without reading any further? What convention is the compiler following? How many bytes of local stack space are allocated?
Week 4 of 14. Next: x86-64 assembly II -- control flow, conditional jumps, loops, and jump tables.