RE-011 Week 4: x86-64 Assembly I · RE-011 · Virtus Cyber Academy Classroom

Registers, the stack, and the System V AMD64 calling convention. Reading disassembly without running the binary. The CrackMe ladder begins.

Reading (~45 min)

From OST2 Architecture 1001 (ost2.fyi): complete the x86-64 registers module and the stack module. These are the specific modules that cover what this week's lecture expects; the lecture builds on them rather than duplicating them.

From Yurichev RE4B (beginners.re): read the "Couple more words about registers" and "Hello, world!" chapter sections. Yurichev works through a simple compiled C function step by step from compiler output to human-readable interpretation. This is the pattern you will use every week.

Lecture outline (~1.5 hr)

Part 1: x86-64 registers (20 min)

The CPU's registers are its fastest storage: a small set of named locations that hold values the processor is actively working with. x86-64 has 16 general-purpose 64-bit registers:

Conventional use column reflects System V AMD64 (Linux x86-64). Windows x64 uses rcx/rdx/r8/r9 as the first four integer arguments instead of rdi/rsi/rdx/rcx; see Yurichev RE4B Appendix A for a full cross-reference.

64-bit	32-bit	16-bit	8-bit (low)	Conventional use
rax	eax	ax	al	Return value; accumulator
rbx	ebx	bx	bl	Base register; callee-saved
rcx	ecx	cx	cl	4th argument; loop counter
rdx	edx	dx	dl	3rd argument
rsi	esi	si	sil	2nd argument; source index
rdi	edi	di	dil	1st argument; destination index
rsp	esp	sp	spl	Stack pointer (top of stack)
rbp	ebp	bp	bpl	Base pointer (frame; callee-saved)
r8	r8d	r8w	r8b	5th argument
r9	r9d	r9w	r9b	6th argument
r10	r10d	r10w	r10b	Temporary
r11	r11d	r11w	r11b	Temporary
r12-r15	...	...	...	Callee-saved general purpose

rip (instruction pointer) always contains the address of the next instruction to execute. You cannot move a value directly into rip -- you change it with jump, call, and ret instructions.

Writing a 32-bit register form (eax, ebx, etc.) zero-extends the result into the full 64-bit register: writing eax clears the upper 32 bits of rax. The 16-bit and 8-bit forms do not behave this way; writing ax or al leaves the remaining bits of rax unchanged. This is an x86-64 design choice that catches beginners; Ghidra and objdump both show the shorter form when the compiler uses it.
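
The write rules can be modelled in a few lines of Python. This is a minimal sketch, not an emulator: write_reg is a hypothetical helper name, it models only rax and its low sub-registers (eax, ax, al), and it ignores the high-byte form ah.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def write_reg(old_rax: int, value: int, width: int) -> int:
    """Return the new 64-bit rax after writing `value` to its low `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes zero-extend: the upper 32 bits are cleared.
        return value & 0xFFFF_FFFF
    if width in (16, 8):
        # 16- and 8-bit writes leave the remaining upper bits untouched.
        low_mask = (1 << width) - 1
        return (old_rax & ~low_mask & MASK64) | (value & low_mask)
    raise ValueError("width must be 64, 32, 16, or 8")

rax = 0x1122_3344_5566_7788
print(hex(write_reg(rax, 0xAABBCCDD, 32)))  # mov eax, 0xAABBCCDD -> 0xaabbccdd
print(hex(write_reg(rax, 0xAABB, 16)))      # mov ax, 0xAABB      -> 0x112233445566aabb
print(hex(write_reg(rax, 0xAA, 8)))         # mov al, 0xAA        -> 0x11223344556677aa
```

Only the 32-bit case discards the old upper half, which is why a mov eax, 0x0 is enough to zero all of rax.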

Part 2: The stack (20 min)

The stack is a region of memory used for function call management. On x86-64:

The stack grows downward (from higher addresses to lower addresses).
rsp always points to the most recently pushed value, that is, to its lowest-addressed byte (the "top" of the stack, which is the lowest address in use).
push rax decrements rsp by 8, then writes rax to [rsp].
pop rax reads [rsp] into rax, then increments rsp by 8.
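
The order of operations in push and pop can be sketched as a toy model. The Stack class and the starting address below are assumptions made for illustration; memory is a plain dictionary keyed by address, and every value is treated as one 8-byte slot.

```python
class Stack:
    """Toy model of the x86-64 stack: memory maps an address to a 64-bit value."""

    def __init__(self, rsp: int):
        self.rsp = rsp
        self.mem = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                 # decrement rsp first...
        self.mem[self.rsp] = value    # ...then store at the new [rsp]

    def pop(self) -> int:
        value = self.mem[self.rsp]    # load from [rsp] first...
        self.rsp += 8                 # ...then increment rsp
        return value

s = Stack(rsp=0x7FFF_FFFF_E000)       # hypothetical starting address
s.push(0x1111)
s.push(0x2222)
print(hex(s.rsp))     # 0x7fffffffdff0 (two pushes moved rsp down by 16)
print(hex(s.pop()))   # 0x2222 (last in, first out)
print(hex(s.pop()))   # 0x1111
print(hex(s.rsp))     # 0x7fffffffe000 (back where it started)
```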

A function's stack frame is the region between rbp (frame base) and rsp (stack top) that the function owns. The frame contains:

Saved copies of callee-saved registers (the caller's values, which the callee must preserve)
Local variables
Space for call arguments if there are more than 6 (the extra arguments go on the stack, not in registers)

The canonical function prologue:

push rbp          ; save caller's frame base
mov  rbp, rsp     ; set our frame base to current stack top
sub  rsp, 0x30    ; allocate 48 bytes for local variables

The canonical function epilogue:

leave             ; equivalent to: mov rsp, rbp / pop rbp
ret               ; pop return address into rip
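
To see that the epilogue undoes the prologue exactly, the sketch below tracks rsp and rbp through call, prologue, leave, and ret. simulate_call and the addresses passed to it are hypothetical; the model assumes the frame-pointer prologue shown above and ignores everything the function body does.

```python
def simulate_call(rsp: int, rbp: int, ret_addr: int, locals_size: int = 0x30):
    """Track rsp/rbp through call, prologue, leave, ret. Memory is a dict."""
    mem = {}
    rsp -= 8
    mem[rsp] = ret_addr          # call: push the return address
    rsp -= 8
    mem[rsp] = rbp               # push rbp
    rbp = rsp                    # mov rbp, rsp
    rsp -= locals_size           # sub rsp, locals_size
    inside = {"rsp": rsp, "rbp": rbp}
    rsp = rbp                    # leave, step 1: mov rsp, rbp
    rbp = mem[rsp]               # leave, step 2: pop rbp
    rsp += 8
    rip = mem[rsp]               # ret: pop the return address into rip
    rsp += 8
    return inside, {"rsp": rsp, "rbp": rbp, "rip": rip}

inside, after = simulate_call(rsp=0x7FFE0000, rbp=0x7FFE0040, ret_addr=0x401200)
print({k: hex(v) for k, v in inside.items()})
# {'rsp': '0x7ffdffc0', 'rbp': '0x7ffdfff0'}
print({k: hex(v) for k, v in after.items()})
# {'rsp': '0x7ffe0000', 'rbp': '0x7ffe0040', 'rip': '0x401200'}
```

After ret, rsp and rbp hold the caller's original values and rip holds the return address.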

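Recognizing the prologue and epilogue is a useful way to find function boundaries in stripped binaries, though not an infallible one: optimised code often omits the frame pointer, so the push rbp / mov rbp, rsp pair may be absent. Ghidra finds them automatically; in a hex dump or raw objdump output you find them by eye.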
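
Finding prologues "by eye" amounts to spotting a byte pattern. As a rough sketch, the search below looks for 55 48 89 e5, the encoding objdump shows for push rbp / mov rbp,rsp in the Part 4 listing. find_prologues, the sample buffer, and the base address are made up for illustration. The same four bytes can occur inside other instructions or data, so every hit is a candidate, not a confirmed function start.

```python
PROLOGUE = bytes.fromhex("554889e5")   # push rbp ; mov rbp, rsp

def find_prologues(code: bytes, base: int = 0) -> list:
    """Return the address of every occurrence of the frame-pointer prologue."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits

# Hypothetical buffer: a stray ret, one framed function, then another.
code = bytes.fromhex("c3" "554889e5" "4883ec10" "c9c3" "554889e5" "5dc3")
print([hex(a) for a in find_prologues(code, base=0x401000)])
# ['0x401001', '0x40100b']
```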

Part 3: The System V AMD64 calling convention (20 min)

A calling convention is an agreement between the caller and the callee about how arguments are passed and how the stack is managed. Linux x86-64 uses the System V AMD64 ABI (Application Binary Interface).

Argument passing: The first six integer/pointer arguments go in registers, in order:

rdi
rsi
rdx
rcx
r8
r9

Arguments beyond six go on the stack (right to left). Floating-point arguments use the xmm registers (xmm0 through xmm7).
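
The register order can be written down as a small lookup. This sketch assumes every argument is an integer or pointer (no floating-point values and no structs passed by value), and place_int_args is a hypothetical helper name. Stack slots are numbered from the lowest address upward.

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def place_int_args(names: list) -> dict:
    """Map each integer/pointer argument name to a register or a stack slot."""
    placement = {}
    for i, name in enumerate(names):
        if i < len(INT_ARG_REGS):
            placement[name] = INT_ARG_REGS[i]
        else:
            # Arguments are pushed right to left, so the 7th argument ends up
            # in the lowest-addressed slot (slot 0), the 8th in slot 1, etc.
            placement[name] = f"stack slot {i - len(INT_ARG_REGS)}"
    return placement

for name, where in place_int_args(["a", "b", "c", "d", "e", "f", "g", "h"]).items():
    print(name, "->", where)
# a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9,
# g -> stack slot 0, h -> stack slot 1 (printed one pair per line)
```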

Return value: Integer/pointer return values go in rax. Values of up to 64 bits use rax alone; some 128-bit returns use rdx:rax. Floating-point return values go in xmm0.

Callee-saved registers (the callee must preserve these): rbx, rbp, r12, r13, r14, r15. If a function uses any of these, it must save them to the stack in its prologue and restore them in its epilogue.

Caller-saved registers (the callee may trash these): rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11. If the caller needs these values across a function call, it saves them before the call.

Why this matters for RE: when you see a call instruction, you immediately know:

What rdi contained just before the call = the first argument
What rax contains just after the call = the return value

This lets you trace data flow through a disassembly without running the binary.

Part 4: Reading objdump output -- a worked example (10 min)

0000000000401136 <main>:
  401136: 55                    push   rbp
  401137: 48 89 e5              mov    rbp,rsp
  40113a: 48 83 ec 10           sub    rsp,0x10
  40113e: 89 7d fc              mov    DWORD PTR [rbp-0x4],edi
  401141: 48 89 75 f0           mov    QWORD PTR [rbp-0x10],rsi
  401145: bf 08 20 40 00        mov    edi,0x402008
  40114a: e8 e1 fe ff ff        call   401030 <puts@plt>
  40114f: b8 00 00 00 00        mov    eax,0x0
  401154: c9                    leave
  401155: c3                    ret

Reading this from top to bottom:

push rbp / mov rbp, rsp / sub rsp, 0x10 -- standard prologue, 16 bytes of local space
mov DWORD PTR [rbp-0x4], edi -- saves argc (first arg to main, in edi) to a local variable at rbp-4
mov QWORD PTR [rbp-0x10], rsi -- saves argv (second arg, in rsi) to local at rbp-16
mov edi, 0x402008 -- loads a pointer (0x402008, probably a string address) as first argument
call 401030 <puts@plt> -- calls puts. First arg = edi = that string pointer. This is a puts(some_string) call.
mov eax, 0x0 -- return value = 0 (success)
leave / ret -- standard epilogue

Without running the binary: this is a main function that calls puts with a string at virtual address 0x402008 and returns 0. To inspect the string, navigate to 0x402008 in Ghidra (which works in virtual addresses) -- or use objdump -s -j .rodata binary to dump the .rodata section and locate the bytes at the right offset. Note that 0x402008 is a virtual address, not a file offset; looking at raw byte 0x402008 in xxd will give the wrong bytes. Week 11 covers the virtual-address-to-file-offset translation explicitly.
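
The operands objdump prints can be checked against the raw bytes in the listing. The sketch below handles only the two encodings that appear there: the 5-byte call rel32 (opcode e8) and the 5-byte mov r32, imm32 (opcodes b8 through bf). The helper names are hypothetical, and this is not a general decoder.

```python
def call_target(addr: int, insn: bytes) -> int:
    """Target of a 5-byte `call rel32` (opcode E8) located at `addr`."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call")
    rel32 = int.from_bytes(insn[1:], "little", signed=True)
    # The displacement is relative to the address of the NEXT instruction.
    return addr + len(insn) + rel32

def mov_imm32(insn: bytes) -> int:
    """Immediate of a 5-byte `mov r32, imm32` (opcodes B8 through BF)."""
    if len(insn) != 5 or not 0xB8 <= insn[0] <= 0xBF:
        raise ValueError("expected a 5-byte B8+r mov")
    return int.from_bytes(insn[1:], "little")

print(hex(call_target(0x40114A, bytes.fromhex("e8e1feffff"))))  # 0x401030
print(hex(mov_imm32(bytes.fromhex("bf08204000"))))             # 0x402008
```

Both immediates are stored little-endian, which is why the string address 0x402008 appears in the byte column as 08 20 40 00.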

Lab exercises (~1.5 hr)

Lab 3: Compiler optimisation

See labs/lab-3-compiler-optimisation.md for the full specification.

You compile the same C source at -O0, -O2, and -O3, then use objdump -d to compare the three disassembly listings. You document at least three concrete differences (e.g., loop unrolling, inlining, constant folding) and explain the RE consequence of each: how does optimisation change your job as someone reading the binary?

CrackMe ladder begins

Start your CrackMe ladder. Pick the easiest available challenge on crackmes.one that targets Linux/x86-64. Attempt to understand it using only the static tools from Weeks 1-4: file, xxd, strings, readelf, nm, objdump -d. Document your approach in your Tool Journal. You do not need to crack it this week -- understanding the structure is the goal.

Independent practice (~3 hr)

OST2 Architecture 1001: Continue through the instructions and addressing modes modules.
Tool Journal: Document the System V AMD64 calling convention. A one-page reference you will look at every time you analyze a function. Include: argument registers in order, return value register, callee-saved registers, stack growth direction.
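objdump practice: Disassemble /bin/ls with objdump -d /bin/ls | less. Find main; if the binary is stripped there is no <main> label, so start at the entry point reported by readelf -h and look for the address loaded into rdi before the call to __libc_start_main. Identify the prologue and epilogue. Find at least three function calls and, for each, identify the value loaded into the first-argument register before the call.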

Reflection prompts

The System V AMD64 ABI specifies that the first six integer arguments go in rdi, rsi, rdx, rcx, r8, r9. If a function has a seventh argument, where does it go, and how does the callee access it? (Answer in terms of rbp offsets or rsp offsets.)
Writing eax clears the upper 32 bits of rax. Why would the architecture's designers choose this behavior? What advantage does it offer over an architecture where a 32-bit write only modifies the lower 32 bits?
A reverse engineer sees a function that begins with:
```
push r15
push r14
push r13
push r12
push rbp
push rbx
sub rsp, 0x18
```
What can you infer about this function without reading any further? What convention is the compiler following? How many bytes of stack space are allocated for local variables?

Week 4 of 14. Next: x86-64 assembly II -- control flow, conditional jumps, loops, and jump tables.