Compile the same C source at -O0, -O2, and -O3. Diff the disassembly listings. Understand what optimisation does to your job as a reverse engineer.
Overview
Compiler optimisation transforms correct code into faster or smaller code. For the reverse engineer, optimisation changes the disassembly: it may inline functions, unroll loops, eliminate redundant computations, rearrange basic blocks, and remove the frame pointer. This lab makes those changes visible by comparing the same source compiled three ways.
Tools: gcc, objdump, diff, wc
Time: ~90 minutes.
Setup
Save the following C source as lab3.c. It is intentionally simple enough to compile cleanly but contains patterns that optimisation transforms significantly:
#include <stdio.h>
static int square(int x) {
return x * x;
}
static int sum_of_squares(int n) {
int total = 0;
for (int i = 1; i <= n; i++) {
total += square(i);
}
return total;
}
static int classify(int x) {
if (x < 0) {
return -1;
} else if (x == 0) {
return 0;
} else {
return 1;
}
}
int main(void) {
int result = sum_of_squares(10);
int class_a = classify(result);
int class_b = classify(-5);
int class_c = classify(0);
printf("%d %d %d %d\n", result, class_a, class_b, class_c);
return 0;
}
Compile three versions:
gcc -O0 -o lab3_O0 lab3.c
gcc -O2 -o lab3_O2 lab3.c
gcc -O3 -o lab3_O3 lab3.c
Part A: Size comparison
Before reading the disassembly, check the binary and section sizes:
ls -l lab3_O0 lab3_O2 lab3_O3
size lab3_O0 lab3_O2 lab3_O3
objdump -d lab3_O0 | wc -l
objdump -d lab3_O2 | wc -l
objdump -d lab3_O3 | wc -l
Record:
- Binary sizes in bytes for all three.
.textsection size for all three (fromsizeoutput).- Disassembly line counts for all three.
- Does
-O3always produce more or fewer instructions than-O2? Write your initial hypothesis.
Part B: Disassembly extraction
Extract and save the disassembly of each version:
objdump -d lab3_O0 > O0.asm
objdump -d lab3_O2 > O2.asm
objdump -d lab3_O3 > O3.asm
Then compare O0 to O2:
diff O0.asm O2.asm | head -60
And compare O2 to O3:
diff O2.asm O3.asm | head -40
Part C: Function inlining
In the -O0 binary, square and classify should appear as separate named functions in the symbol table:
nm lab3_O0 | grep -E '(square|classify|sum_of_squares)'
nm lab3_O2 | grep -E '(square|classify|sum_of_squares)'
- Are
square,classify, andsum_of_squarespresent as separate functions in-O2or-O3? What happened? - Open
O0.asmand find thesum_of_squaresfunction. Identify thecallinstruction that callssquare. - Open
O2.asm(orO3.asm). Is there still a separatesquarefunction? Is there still acall squareinsum_of_squares? If not, what does the code look like where the call used to be?
This is function inlining: the compiler replaces the call to square with the body of square directly, eliminating the call overhead.
Part D: Loop transformation
Find the loop in sum_of_squares in each version:
In -O0, the loop should have:
- An initialization before the loop top
- A comparison instruction at the loop top
- A backward jump at the loop bottom
- In
O0.asm, find the loop insum_of_squares. Identify the backward jump instruction and its target. - In
O2.asmorO3.asm, does the same structure exist? Describe what you find. (Look for: unrolled iterations, SIMD instructions likexmm, or a completely different structure if the compiler constant-folded the entire computation.)
At -O3, the compiler may recognize that sum_of_squares(10) can be computed at compile time (constant folding) and emit a single mov eax, <result> instead of any loop at all. If this happens, document it.
Part E: Frame pointer and calling convention
The frame pointer (rbp) is used at -O0 to track the stack frame. At higher optimisation levels, the compiler may omit it (-fomit-frame-pointer is implicit at -O2).
- In
O0.asm, find the prologue ofmain. Ispush rbp/mov rbp, rsppresent? - In
O2.asm, find the prologue ofmain(or whatever main is reduced to). Isrbpused? - If
rbpis absent in-O2, how are local variables addressed? (They will usersp-relative addressing instead, e.g.,[rsp+0x4].)
Part F: RE consequences (written analysis)
Write a paragraph (150-200 words) on each of the following three consequences:
-
Function inlining. A stripped, optimised binary has
squareinlined intosum_of_squares. You are analyzing the binary and do not see asquarefunction at all. How does this change your analysis process? What risk does this create for completeness? -
Loop transformation / constant folding. If the compiler constant-folded
sum_of_squares(10)to a literal, what does the decompiler show? What challenge does this create for understanding the original algorithm? -
Frame pointer elimination. A debugger (gdb) uses the frame pointer chain to produce stack traces. If
rbpis not used as a frame pointer, what happens tobacktracein gdb? How does this affect dynamic analysis?
Lab Report
Submit one document with sections A through F. For each quantitative question, show the command you ran and quote the relevant output. For the written analysis in Part F, aim for analytical depth over length.
Grading
| Criterion | Points |
|---|---|
| Part A: Size comparison accurate and hypothesis stated | 10 |
| Part B: Diff commands run; output quoted | 5 |
| Part C: Function inlining identified and explained | 20 |
| Part D: Loop transformation characterized accurately | 25 |
| Part E: Frame pointer elimination identified | 15 |
| Part F: Three RE consequences analyzed with specificity | 25 |
| Total | 100 |
Lab 3 of 9. Due: end of Week 4. Compiler optimisation is one of the most common reasons assembly looks different from what you expect from the C source.