RE-011 Week 3: ELF Format In Depth · RE-011 · Virtus Cyber Academy Classroom

The Executable and Linkable Format is the container every Linux binary lives in. Understanding the container is a prerequisite for understanding the code inside it.

Reading (~45 min)

Read the Wikipedia article on the ELF format in full -- including the section header table, program header table, and the list of special sections (.text, .data, .bss, .rodata, .symtab, .strtab, .dynamic, .plt, .got). You do not need to memorize the byte offsets of every field; you need the conceptual model.

Then read the man page for readelf (man readelf). Note the flags: -h (ELF header), -l (program headers / segments), -S (section headers), -s (symbol table), -d (dynamic section), -r (relocations). These are your primary ELF dissection commands.

Lecture outline (~1.5 hr)

Part 1: ELF overview (15 min)

ELF (Executable and Linkable Format) is the binary format used on Linux, most BSDs, Android, and many embedded systems. An ELF file can be:

An executable (ET_EXEC, or ET_DYN for a position-independent executable) -- ready to run
A shared object (ET_DYN) -- a library, linked at load time
A relocatable object (ET_REL) -- the output of a compiler before linking (a .o file)
A core dump (ET_CORE) -- a snapshot of a process's memory at crash time
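To make the four file types concrete, here is a minimal Python sketch that reads e_type from the first bytes of a file. It relies only on what elf.h defines: the magic in bytes 0-3, EI_DATA (endianness) at byte 5, and e_type as a 16-bit field at offset 0x10. Note that ET_DYN alone does not distinguish a PIE from a shared library; checking for a PT_INTERP segment is a common, imperfect heuristic.

```python
import struct

# e_type values as defined in /usr/include/elf.h
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def elf_type(data: bytes) -> str:
    """Return the e_type name of an ELF file, given at least its first 18 bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if data[5] == 1 else ">"
    # e_type sits at offset 0x10 in both ELF32 and ELF64 headers
    (e_type,) = struct.unpack_from(endian + "H", data, 0x10)
    return ET_NAMES.get(e_type, f"unknown ({e_type:#x})")

# Example: print(elf_type(open("/bin/ls", "rb").read(64)))
```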

For RE-011, the focus is on executables and shared objects.

The ELF header is 64 bytes long in 64-bit (ELFCLASS64) files and 52 bytes in 32-bit (ELFCLASS32) files. It starts with the magic number 7F 45 4C 46 (0x7F followed by ASCII "ELF") and contains:

Architecture (e_machine: EM_386, EM_X86_64, EM_ARM, EM_RISCV, etc.)
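Entry point address (e_entry: the virtual address where the program's own code starts executing; in a dynamically linked binary the kernel first runs the dynamic linker, which jumps here once loading is done)
Offset of the program header table (e_phoff)
Offset of the section header table (e_shoff)
Number of program headers and section headers (e_phnum, e_shnum)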

[Figure: ELF64 header byte layout. The 64-byte Elf64_Ehdr drawn as two rows of 32 bytes, labeled field by field: the e_ident block (EI_MAG0-EI_MAG3, the 7F 45 4C 46 magic in amber at offsets 0x00-0x03, then EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION, EI_PAD), followed by e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx.]

Figure 3.1. The first 64 bytes of every ELF64 executable on your Lab 2 disk. The four-byte magic (amber in the diagram) is the first thing in any xxd dump of the file and confirms it is an ELF; the bytes after it give the class (32- vs. 64-bit), endianness, and OS ABI, and further in you find the entry-point virtual address. Cross-reference the diagram against readelf -h output and struct Elf64_Ehdr in /usr/include/elf.h while you work through Lab 2.
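The same field list can be read programmatically. The sketch below unpacks the 64-byte header with Python's struct module, assuming a 64-bit little-endian file (EI_CLASS = 2, EI_DATA = 1), which covers the x86-64 binaries in Lab 2. The field order mirrors struct Elf64_Ehdr in /usr/include/elf.h, so its output can be compared line by line with readelf -h.

```python
import struct
from collections import namedtuple

# Field order follows struct Elf64_Ehdr in /usr/include/elf.h
Elf64Ehdr = namedtuple(
    "Elf64Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff "
    "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)
# 16-byte e_ident, then 2+2+4+8+8+8+4+2+2+2+2+2+2 bytes: 64 bytes total
EHDR_FMT = "<16sHHIQQQIHHHHHH"
assert struct.calcsize(EHDR_FMT) == 64

def parse_ehdr(data: bytes) -> Elf64Ehdr:
    """Parse the header of a little-endian ELF64 file (first 64 bytes)."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad magic")
    if ident[4] != 2 or ident[5] != 1:  # ELFCLASS64, ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return Elf64Ehdr._make(struct.unpack_from(EHDR_FMT, data, 0))

# Example: h = parse_ehdr(open("/bin/ls", "rb").read(64)); print(hex(h.e_entry))
```

On an x86-64 binary, e_machine should come out as 62 (EM_X86_64).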

Part 2: Sections vs. segments (25 min)

ELF uses two parallel views of the same file:

Sections are the linker's view. They group related content by type. The section header table describes them. Key sections:

Section	Content
`.text`	Compiled machine code (the actual instructions)
`.data`	Initialized global and static variables (read-write)
`.rodata`	Read-only data (string literals, constants)
`.bss`	Uninitialized and zero-initialized global and static variables (takes no space in the file; the loader provides zero-filled memory for it at startup)
`.symtab`	Symbol table (function names, variable names, sizes -- only in unstripped binaries)
`.strtab`	String table for symbol names
`.dynsym`	Dynamic symbol table (symbols this file imports from, or exports to, other shared objects)
`.dynstr`	String table for dynamic symbol names
`.plt`	Procedure Linkage Table (stubs for calls to external functions)
`.got.plt`	Global Offset Table (resolved addresses filled in by the dynamic linker)
`.eh_frame`	Stack unwinding data (used by debuggers and exception handlers)
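Section names are not stored in the section headers themselves: each header's sh_name is an offset into a string section (.shstrtab) whose index is e_shstrndx. A minimal sketch of the walk readelf -S performs, assuming a 64-bit little-endian file already read into memory and ignoring the extended-numbering case where e_shnum is 0:

```python
import struct

SHDR_FMT = "<IIQQQQIIQQ"  # struct Elf64_Shdr, 64 bytes
SHT_NOBITS = 8            # e.g. .bss: has a size but no bytes in the file

def section_headers(data: bytes):
    """Yield (name, sh_type, sh_addr, sh_offset, sh_size) for each section."""
    (shoff,) = struct.unpack_from("<Q", data, 0x28)                      # e_shoff
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x3A)  # e_shentsize, e_shnum, e_shstrndx
    headers = [struct.unpack_from(SHDR_FMT, data, shoff + i * shentsize)
               for i in range(shnum)]
    # sh_offset (field 4) of the .shstrtab section locates the name strings
    strtab_off = headers[shstrndx][4]
    for sh_name, sh_type, _flags, sh_addr, sh_offset, sh_size, *_ in headers:
        start = strtab_off + sh_name
        name = data[start:data.index(b"\x00", start)].decode()
        yield name, sh_type, sh_addr, sh_offset, sh_size
```

Run on a real binary, the .bss entry should show sh_type 8 (SHT_NOBITS) with a nonzero size, matching the table above.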

Segments are the loader's view. They describe how the file should be mapped into memory. Key segment types:

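PT_LOAD: a region of the file to be mapped into memory. Older toolchains typically emit two: one containing the headers, .text, and .rodata (RX permissions) and one containing .data and .bss (RW permissions). Recent GNU ld on x86-64 Linux keeps code and read-only data in separate segments by default, so a freshly compiled binary usually has four: read-only headers, .text (RX), .rodata and .eh_frame (read-only), and .data plus .bss (RW).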
PT_DYNAMIC: the dynamic linking metadata.
PT_INTERP: the path to the dynamic linker (/lib64/ld-linux-x86-64.so.2 on most x86-64 systems).
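The program header table is an array of e_phnum fixed-size Elf64_Phdr entries starting at file offset e_phoff. This sketch decodes each entry roughly the way readelf -l lists them, under the same 64-bit little-endian assumption; the PT_* and PF_* values are copied from elf.h, and only a few GNU extension types are named.

```python
import struct

PHDR_FMT = "<IIQQQQQQ"  # struct Elf64_Phdr, 56 bytes
PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR",
            7: "TLS", 0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK",
            0x6474E552: "GNU_RELRO"}

def flags_str(p_flags: int) -> str:
    """Render p_flags as three columns R/W/E (PF_R=4, PF_W=2, PF_X=1)."""
    return "".join(c if p_flags & bit else " " for c, bit in (("R", 4), ("W", 2), ("E", 1)))

def program_headers(data: bytes):
    """Yield one dict per program header of a little-endian ELF64 file."""
    (phoff,) = struct.unpack_from("<Q", data, 0x20)             # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", data, 0x36)    # e_phentsize, e_phnum
    for i in range(phnum):
        p_type, p_flags, off, vaddr, _paddr, filesz, memsz, _align = \
            struct.unpack_from(PHDR_FMT, data, phoff + i * phentsize)
        yield {"type": PT_NAMES.get(p_type, hex(p_type)), "flags": flags_str(p_flags),
               "offset": off, "vaddr": vaddr, "filesz": filesz, "memsz": memsz}
```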

The practical consequence: when you look at a running process's memory map (/proc/PID/maps), you see segments, not sections. When you look at Ghidra's program tree, you see sections. They are the same bytes, described differently.
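Because segments and sections describe the same bytes, you often need to turn a virtual address (from a disassembler or a crash log) into a file offset you can inspect with xxd. A sketch of that translation over PT_LOAD entries, assuming link-time addresses; for a PIE running under ASLR, subtract the load base shown in /proc/PID/maps first. The tuple layout for `segments` is an illustrative choice, not a standard API.

```python
def vaddr_to_offset(vaddr: int, segments) -> int:
    """Translate a virtual address to a file offset.

    `segments` holds (p_offset, p_vaddr, p_filesz, p_memsz) tuples taken
    from the PT_LOAD entries of the program header table.
    """
    for p_offset, p_vaddr, p_filesz, p_memsz in segments:
        if p_vaddr <= vaddr < p_vaddr + p_memsz:
            if vaddr >= p_vaddr + p_filesz:
                # Inside the segment in memory, but past the bytes stored in the file
                raise ValueError(f"{vaddr:#x} is in zero-filled memory; no file bytes")
            return p_offset + (vaddr - p_vaddr)
    raise ValueError(f"{vaddr:#x} is not inside any PT_LOAD segment")
```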

Part 3: Symbol tables -- stripped vs. unstripped (20 min)

The .symtab section contains the symbol table: a list of every function and global variable by name, address, size, and type. When you load an unstripped binary into Ghidra, Ghidra can show you main, authenticate_user, compute_checksum, and so on by name.

Stripped binaries have had .symtab removed (using strip or the -s flag to the linker). You see addresses but not names. Ghidra labels functions FUN_00401230, FUN_00402180, etc. Your job is to rename them as you understand what they do.

The .dynsym section is NOT removed by stripping -- the dynamic linker reads it at load time to resolve the binary's imported and exported symbols, so the program cannot run without it. Even in a stripped binary, therefore, you can see which library functions it calls (malloc, strcmp, open, read...) and infer behavior from those call sites.
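Whether a binary is stripped can be checked by section type rather than by name: SHT_SYMTAB (2) marks .symtab and SHT_DYNSYM (11) marks .dynsym. A minimal sketch, again assuming a 64-bit little-endian file held in memory; `symbol_tables` and `is_stripped` are hypothetical helper names.

```python
import struct

SHT_SYMTAB = 2   # .symtab: removed by strip
SHT_DYNSYM = 11  # .dynsym: kept, because the dynamic linker needs it

def symbol_tables(data: bytes) -> set:
    """Return which symbol-table sections a little-endian ELF64 file contains."""
    (shoff,) = struct.unpack_from("<Q", data, 0x28)           # e_shoff
    shentsize, shnum = struct.unpack_from("<HH", data, 0x3A)  # e_shentsize, e_shnum
    found = set()
    for i in range(shnum):
        # sh_type is the second 32-bit word of each Elf64_Shdr
        (sh_type,) = struct.unpack_from("<I", data, shoff + i * shentsize + 4)
        if sh_type == SHT_SYMTAB:
            found.add(".symtab")
        elif sh_type == SHT_DYNSYM:
            found.add(".dynsym")
    return found

def is_stripped(data: bytes) -> bool:
    return ".symtab" not in symbol_tables(data)
```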

The nm and readelf -s commands read the symbol table:

nm binary -- list symbols (fails with "no symbols" on fully stripped binary)
nm -D binary -- list only dynamic symbols (works on stripped binary)
readelf -s binary -- shows both .dynsym and .symtab (when present), with ELF-specific columns such as type, binding, visibility, and section index
readelf -sW binary -- wide output, prevents column truncation
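Under the hood, these tools decode fixed-size Elf64_Sym records (24 bytes each). The sketch below does the same for the raw bytes of one symbol section plus its string table, which you can locate through the section headers (a symbol section's sh_link field gives the index of its string table). A symbol whose section index is SHN_UNDEF (0) is defined elsewhere, which is what nm marks with U. Only the common type and binding values are named here.

```python
import struct

# struct Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size (24 bytes)
SYM_FMT = "<IBBHQQ"
SHN_UNDEF = 0  # section index 0: defined in another object (imported)
STT_NAMES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
STB_NAMES = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}

def read_symbols(symtab: bytes, strtab: bytes):
    """Decode raw .symtab/.dynsym bytes, resolving names in the paired string table."""
    size = struct.calcsize(SYM_FMT)
    for off in range(0, len(symtab) - size + 1, size):
        st_name, st_info, _other, st_shndx, st_value, st_size = \
            struct.unpack_from(SYM_FMT, symtab, off)
        name = strtab[st_name:strtab.index(b"\x00", st_name)].decode()
        yield {
            "name": name,
            "value": st_value,
            "size": st_size,
            "type": STT_NAMES.get(st_info & 0xF, st_info & 0xF),  # ELF64_ST_TYPE
            "bind": STB_NAMES.get(st_info >> 4, st_info >> 4),    # ELF64_ST_BIND
            "undefined": st_shndx == SHN_UNDEF,
        }
```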

The strings command finds ASCII-printable sequences of 4+ characters. It does not consult the symbol table; it just scans raw bytes. Even in a stripped binary, strings may reveal: error messages, format strings, hard-coded paths, library names, version strings, and occasionally credentials. Use strings -n 8 to filter out short strings.
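The core idea behind strings is simple enough to sketch: scan the raw bytes for runs of printable characters. This Python approximation treats printable ASCII plus tab as printable, which is roughly GNU strings' default behavior, but it does not replicate every option (for example, -e for other encodings).

```python
import re

def find_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long,
    approximating `strings -n min_len` over the whole file."""
    pattern = rb"[\t\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

# Example: find_strings(open("/bin/ls", "rb").read(), 8)
```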

Part 4: readelf, objdump, nm in practice (10 min)

Quick reference for the tools you use this week and every week:

```bash
# ELF header summary
readelf -h binary

# All section headers
readelf -S binary

# All segment headers (program headers)
readelf -l binary

# Symbol tables (.symtab if unstripped, plus .dynsym)
readelf -s binary

# Dynamic section (NEEDED entries show which libraries it requires)
readelf -d binary

# Disassemble all executable sections (.init, .plt, .text, .fini, ...)
objdump -d binary

# Disassemble with source interleaved (if debug info present)
objdump -dS binary

# List symbols (nm)
nm binary
nm -D binary           # dynamic symbols only (works on stripped)
nm -u binary           # undefined symbols (imported); add -D on a stripped binary
```
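readelf -d prints the .dynamic array; its NEEDED entries are the libraries the loader will pull in. A sketch of extracting them, assuming you have already sliced out the raw .dynamic and .dynstr section bytes (for example by walking the section headers) from a 64-bit little-endian file. Each Elf64_Dyn entry is 16 bytes, and a DT_NEEDED value is an offset into .dynstr.

```python
import struct

DT_NULL, DT_NEEDED = 0, 1  # d_tag values from elf.h

def needed_libraries(dynamic: bytes, dynstr: bytes) -> list:
    """List DT_NEEDED library names from raw .dynamic and .dynstr bytes."""
    libs = []
    for off in range(0, len(dynamic) - 15, 16):
        # Elf64_Dyn: signed 64-bit d_tag, then 64-bit d_val/d_ptr
        d_tag, d_val = struct.unpack_from("<qQ", dynamic, off)
        if d_tag == DT_NULL:  # marks the end of the dynamic array
            break
        if d_tag == DT_NEEDED:
            libs.append(dynstr[d_val:dynstr.index(b"\x00", d_val)].decode())
    return libs
```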

Lab exercises (~1.5 hr)

Lab 2: ELF section walk

See labs/lab-2-elf-section-walk.md for the full specification.

You compile a provided short C program and then dissect it with readelf, objdump, nm, and strings. You locate each major section, explain what it contains, find the symbol table, compare the output on a stripped vs. unstripped copy, and write a one-paragraph explanation of the role .plt and .got.plt play in dynamic linking.

Independent practice (~3 hr)

OST2 Architecture 1001: Begin working through the free OST2 Architecture 1001 course (ost2.fyi). Complete modules through the memory and x86-64 register overview. You will use these modules in parallel with Weeks 3-6.
Tool Journal: Document readelf and objdump. For each: what information it shows, two flags you will use regularly, and one concrete example from Lab 2 where the tool answered a question you had.
CrackMe preview (no ladder credit): Download one "Easy" or "Beginner" CrackMe from crackmes.one. Run readelf -S, nm -D, and strings -n 8 on it. What sections does it have? Is it stripped? What library calls does it make? Write your findings in your Tool Journal. Do not run the binary yet.

Reflection prompts

What is the practical consequence of stripping a binary for a reverse engineer? What information is lost, and what information is NOT lost (and why)?
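The .bss section takes no space in the file but is allocated in memory at runtime. How does the operating system know how much space to allocate? (Hint: the section header records the size of .bss even though there is no content, but the loader works from program headers, not section headers. Compare p_filesz and p_memsz of the RW PT_LOAD segment in readelf -l output.) Why would storing uninitialized data this way be more efficient?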
The .plt (Procedure Linkage Table) and .got.plt (Global Offset Table) exist to allow lazy binding: external function addresses are resolved on first call, not at program startup. (Binaries linked with -z now, as in full RELRO, resolve everything at startup instead.) What does this mean for a reverse engineer who sees a call to malloc@plt at 0x401030? What does that call actually do at runtime, and what would you need to look at to find the real malloc address?

Week 3 of 14. Next: x86-64 assembly I -- registers, the stack, and the System V calling convention.