Classroom Public page

RE-011 Lab 3: Compiler Optimisation

689 words

Compile the same C source at -O0, -O2, and -O3. Diff the disassembly listings. Understand what optimisation does to your job as a reverse engineer.


Overview

Compiler optimisation transforms correct code into faster or smaller code. For the reverse engineer, optimisation changes the disassembly: it may inline functions, unroll loops, eliminate redundant computations, rearrange basic blocks, and remove the frame pointer. This lab makes those changes visible by comparing the same source compiled three ways.

Tools: gcc, objdump, diff, wc

Time: ~90 minutes.


Setup

Save the following C source as lab3.c. It is intentionally simple enough to compile cleanly but contains patterns that optimisation transforms significantly:

#include <stdio.h>

static int square(int x) {
    return x * x;
}

static int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += square(i);
    }
    return total;
}

static int classify(int x) {
    if (x < 0) {
        return -1;
    } else if (x == 0) {
        return 0;
    } else {
        return 1;
    }
}

int main(void) {
    int result = sum_of_squares(10);
    int class_a = classify(result);
    int class_b = classify(-5);
    int class_c = classify(0);
    printf("%d %d %d %d\n", result, class_a, class_b, class_c);
    return 0;
}

Compile three versions:

gcc -O0 -o lab3_O0 lab3.c
gcc -O2 -o lab3_O2 lab3.c
gcc -O3 -o lab3_O3 lab3.c

Part A: Size comparison

Before reading the disassembly, check the binary and section sizes:

ls -l lab3_O0 lab3_O2 lab3_O3
size lab3_O0 lab3_O2 lab3_O3
objdump -d lab3_O0 | wc -l
objdump -d lab3_O2 | wc -l
objdump -d lab3_O3 | wc -l

Record:

  1. Binary sizes in bytes for all three.
  2. .text section size for all three (from size output).
  3. Disassembly line counts for all three.
  4. Does -O3 always produce more or fewer instructions than -O2? Write your initial hypothesis.

Part B: Disassembly extraction

Extract and save the disassembly of each version:

objdump -d lab3_O0 > O0.asm
objdump -d lab3_O2 > O2.asm
objdump -d lab3_O3 > O3.asm

Then compare O0 to O2:

diff O0.asm O2.asm | head -60

And compare O2 to O3:

diff O2.asm O3.asm | head -40

Part C: Function inlining

In the -O0 binary, square and classify should appear as separate named functions in the symbol table:

nm lab3_O0 | grep -E '(square|classify|sum_of_squares)'
nm lab3_O2 | grep -E '(square|classify|sum_of_squares)'
  1. Are square, classify, and sum_of_squares present as separate functions in -O2 or -O3? What happened?
  2. Open O0.asm and find the sum_of_squares function. Identify the call instruction that calls square.
  3. Open O2.asm (or O3.asm). Is there still a separate square function? Is there still a call square in sum_of_squares? If not, what does the code look like where the call used to be?

This is function inlining: the compiler replaces the call to square with the body of square directly, eliminating the call overhead.


Part D: Loop transformation

Find the loop in sum_of_squares in each version:

In -O0, the loop should have:

  • An initialization before the loop top
  • A comparison instruction at the loop top
  • A backward jump at the loop bottom
  1. In O0.asm, find the loop in sum_of_squares. Identify the backward jump instruction and its target.
  2. In O2.asm or O3.asm, does the same structure exist? Describe what you find. (Look for: unrolled iterations, SIMD instructions like xmm, or a completely different structure if the compiler constant-folded the entire computation.)

At -O3, the compiler may recognize that sum_of_squares(10) can be computed at compile time (constant folding) and emit a single mov eax, <result> instead of any loop at all. If this happens, document it.


Part E: Frame pointer and calling convention

The frame pointer (rbp) is used at -O0 to track the stack frame. At higher optimisation levels, the compiler may omit it (-fomit-frame-pointer is implicit at -O2).

  1. In O0.asm, find the prologue of main. Is push rbp / mov rbp, rsp present?
  2. In O2.asm, find the prologue of main (or whatever main is reduced to). Is rbp used?
  3. If rbp is absent in -O2, how are local variables addressed? (They will use rsp-relative addressing instead, e.g., [rsp+0x4].)

Part F: RE consequences (written analysis)

Write a paragraph (150-200 words) on each of the following three consequences:

  1. Function inlining. A stripped, optimised binary has square inlined into sum_of_squares. You are analyzing the binary and do not see a square function at all. How does this change your analysis process? What risk does this create for completeness?

  2. Loop transformation / constant folding. If the compiler constant-folded sum_of_squares(10) to a literal, what does the decompiler show? What challenge does this create for understanding the original algorithm?

  3. Frame pointer elimination. A debugger (gdb) uses the frame pointer chain to produce stack traces. If rbp is not used as a frame pointer, what happens to backtrace in gdb? How does this affect dynamic analysis?


Lab Report

Submit one document with sections A through F. For each quantitative question, show the command you ran and quote the relevant output. For the written analysis in Part F, aim for analytical depth over length.


Grading

Criterion Points
Part A: Size comparison accurate and hypothesis stated 10
Part B: Diff commands run; output quoted 5
Part C: Function inlining identified and explained 20
Part D: Loop transformation characterized accurately 25
Part E: Frame pointer elimination identified 15
Part F: Three RE consequences analyzed with specificity 25
Total 100

Lab 3 of 9. Due: end of Week 4. Compiler optimisation is one of the most common reasons assembly looks different from what you expect from the C source.