RE-011 Week 7: Ghidra II · RE-011 · Virtus Cyber Academy Classroom

Cross-references, data-type inference, struct recovery, and using the decompiler as a conversation rather than a lookup. Working effectively on larger binaries.

Reading (~30 min)

From the Ghidra Student Manual: read the "Searching," "Cross References," and "Data Types" sections. If the PDF is unavailable, the equivalent material is in the Ghidra online help (Help > Contents from within Ghidra).

From Yurichev RE4B: read the "Structures" and "Pointers to functions" chapters. These two patterns appear constantly in C binaries and the decompiler handles them better once you know what to look for.

Lecture outline (~1.5 hr)

Part 1: Cross-references (30 min)

A cross-reference (xref) is a record that one location in the binary references a given address; taken together, the xrefs to an address list every place that references it. Ghidra tracks xrefs automatically during analysis.

Why xrefs matter: Finding what calls a function is often more useful than reading the function itself. If you find a strcmp call but cannot figure out what it is comparing, look at the xref to see what calls it and what precedes the call.

Viewing xrefs in Ghidra:

In the listing view, right-click a function name or address > References > "Show References to Address" (or Ctrl-Shift-F). A window opens listing every reference to it; for a function, that includes every caller.
In the decompiler, double-clicking a function name navigates to its definition. To see who calls the current function, place the cursor on the function name and press Ctrl-Shift-F, or use the References entry in the right-click menu.
The listing view also shows inline xref comments, along the lines of: ; XREF[3]: called from FUN_00401200, FUN_00403100, FUN_004050a0 (simplified here; the real comment lists each referencing address with a reference-type code).

Finding where a string is used: Search for a string in the listing (Search > Memory or S), navigate to the string's address, then view xrefs to that address. Every reference to that string is a code location that uses it. This is how you find the check function in a CrackMe: find the "Wrong password" string, follow its xref to the comparison code.

Finding where a global variable is written and read: Same technique. Navigate to the variable's address, view xrefs. Ghidra distinguishes READ references from WRITE references.
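
One way to picture what the xref window is doing: an xref table is an inverted index over "this address references that address" records. The sketch below is a toy model in Python, not Ghidra's API; every address and the READ/WRITE/CALL labels are invented for illustration.

```python
from collections import defaultdict

# Hypothetical reference records: (from_address, to_address, kind).
# All addresses and kinds here are made up for illustration.
REFERENCES = [
    (0x401234, 0x404010, "READ"),   # code loads the "Wrong password" string
    (0x401300, 0x404080, "WRITE"),  # code stores to a global variable
    (0x401320, 0x404080, "READ"),   # code loads the same global variable
    (0x401250, 0x401200, "CALL"),   # call to the function at 0x401200
    (0x403110, 0x401200, "CALL"),   # a second caller of the same function
]


def build_xrefs_to(references):
    """Invert the reference list: target address -> list of (source, kind)."""
    xrefs = defaultdict(list)
    for src, dst, kind in references:
        xrefs[dst].append((src, kind))
    return xrefs


def xrefs_to(xrefs, address, kind=None):
    """Return the source addresses referencing `address`, optionally filtered by kind."""
    return [src for src, k in xrefs.get(address, []) if kind is None or k == kind]


xrefs = build_xrefs_to(REFERENCES)
callers = xrefs_to(xrefs, 0x401200, "CALL")    # who calls the function?
writers = xrefs_to(xrefs, 0x404080, "WRITE")   # who writes the global?
string_users = xrefs_to(xrefs, 0x404010)       # who uses the string?
print([hex(a) for a in callers])
```

The same lookup answers all three questions from this part: who calls a function, where a string is used, and which code writes a global as opposed to only reading it.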

Part 2: Data-type inference and struct recovery (30 min)

The decompiler guesses at data types. Its defaults are often wrong:

A pointer to char (a string) might be shown as long * if Ghidra did not recognize the type.
A struct might be shown as accesses to *(param + 0), *(param + 8), *(param + 16) if Ghidra did not reconstruct the struct definition.

Refining types in the decompiler:

Right-click any variable in the decompiler > "Retype Variable" to change its type. If a parameter is a pointer to a known struct type, giving it the correct type changes all derived accesses to named field references.

Struct recovery: When you see a function that accesses a parameter at multiple fixed offsets (*(param + 0x0), *(param + 0x8), *(param + 0x10), ...), this is almost certainly a struct. The offsets are field offsets. To reconstruct:

List all offsets accessed and their sizes (on a 64-bit target, a 4-byte access usually means int or float; an 8-byte access usually means long, double, or a pointer).
Create a new struct in Ghidra's Data Type Manager (Window > Data Type Manager, right-click the archive named after your program > New > Structure).
Add fields at the appropriate offsets with appropriate types.
Apply the struct type to the parameter in the decompiler.

After applying, accesses like *(param + 0x8) become param->field_1 (or whatever you name the field). The decompiler becomes far more readable.
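
The bookkeeping in the first steps can be sketched outside Ghidra. The Python below is a minimal sketch under stated assumptions: the observed (offset, size) pairs, the helper layout_from_accesses, and the Record field names are all hypothetical, and pointers are stored as 8-byte integers to mimic a 64-bit target. It sorts the observed accesses into a candidate layout, flags gaps as padding, and then uses ctypes to check that a hand-written definition puts each field at the offset seen in the decompiler.

```python
import ctypes

# Hypothetical accesses noted while reading the decompiler: (offset, size in bytes).
observed = [(0x0, 4), (0x8, 8), (0x10, 8), (0x4, 4)]


def layout_from_accesses(accesses):
    """Sort accesses by offset; return (name, offset, size) entries, with gaps as padding."""
    fields = []
    cursor = 0
    for offset, size in sorted(set(accesses)):
        if offset < cursor:
            raise ValueError(f"access at {offset:#x} overlaps the previous field")
        if offset > cursor:
            # Bytes nobody touched: record them so the struct keeps its size.
            fields.append((f"pad_{cursor:#x}", cursor, offset - cursor))
        fields.append((f"field_{offset:#x}", offset, size))
        cursor = offset + size
    return fields


layout = layout_from_accesses(observed)


class Record(ctypes.Structure):
    """Candidate definition; the field names are guesses, refined as understanding grows."""
    _fields_ = [
        ("id", ctypes.c_int32),      # 4-byte access at offset 0x0
        ("flags", ctypes.c_int32),   # 4-byte access at offset 0x4
        ("name", ctypes.c_uint64),   # 8-byte access at 0x8 (pointer-sized)
        ("next", ctypes.c_uint64),   # 8-byte access at 0x10 (pointer-sized)
    ]


for name, offset, size in layout:
    print(f"{offset:#06x}  {size} bytes  {name}")
print("sizeof(Record) =", ctypes.sizeof(Record))
```

If an offset in the candidate definition does not match what the decompiler accesses, the definition is wrong, and the same mismatch will show up in Ghidra as fields that still render as raw pointer arithmetic.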

Function pointers: A struct field of type code * is a function pointer. In C this is common for callbacks, dispatch tables, and polymorphism. In the decompiler it shows as (**(param->vtable + 0x10))(args) or similar. Recognize the pattern: a pointer dereference used as a call target.
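
To make the "pointer dereference used as a call target" pattern concrete, here is a small Python model of it. This is only a sketch: the fake object layout, the addresses, and the handler names on_open and on_close are invented, and a dictionary stands in for the code section. The function call_through does what the decompiler expression describes: load a pointer-sized value from a fixed offset, then call whatever lives at that address.

```python
import struct


# Python callables standing in for functions in the binary (hypothetical names).
def on_open(arg):
    return f"open({arg})"


def on_close(arg):
    return f"close({arg})"


# Hypothetical code addresses mapped to the functions that "live" there.
CODE = {0x401200: on_open, 0x401280: on_close}

# A fake object in memory: an 8-byte id at offset 0x0, followed by two
# little-endian 8-byte function pointers at offsets 0x8 and 0x10.
obj = struct.pack("<QQQ", 7, 0x401200, 0x401280)


def call_through(memory, offset, arg):
    """Model an indirect call: load the pointer at `offset`, then call its target."""
    (target,) = struct.unpack_from("<Q", memory, offset)
    return CODE[target](arg)


result = call_through(obj, 0x10, "log.txt")
print(result)
```

The call site never names on_close; only the data at offset 0x10 decides the target. That is why, in a real binary, you find the possible targets by following xrefs to where that field is written.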

Part 3: The decompiler as a conversation (20 min)

The Ghidra decompiler is not an oracle; it is a hypothesis generator. Treat every decompiler output as a starting point, not a conclusion.

Decompiler output changes as you improve the model. Every rename, every retype, every struct definition you apply changes the decompiler's output for the functions that use those names and types. A session that starts with unreadable pointer arithmetic can end with clear, named struct field access -- not because the binary changed, but because your annotations improved Ghidra's model.

When the decompiler lies:

Dead code: The decompiler may show code paths the binary never takes (because of constant conditions the analyser did not resolve). Verify against the listing.
Wrong types: A value shown as undefined4 or int may actually be a pointer (in a 32-bit binary, where pointers are 4 bytes); a single 4-byte value may really be two separate shorts packed together.
Inlined functions: A function the decompiler shows as a single block may include the bodies of other functions that the compiler inlined from the original source. The listing view (which shows raw instructions) is the ground truth when the decompiler is confused.
Optimised tail calls: A JMP at the end of a function to another function is a tail call optimisation. The decompiler may miss it.

Always cross-check an important decompiler conclusion against the listing. Two views, two interpretations; the listing is authoritative.
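
The "wrong types" case is easy to reproduce on paper. The sketch below, with made-up byte values, reads the same four bytes two ways using Python's struct module: once as a single little-endian 32-bit integer, and once as two 16-bit shorts. Neither reading is wrong as far as the bytes are concerned; only the surrounding code tells you which one the original source meant.

```python
import struct

# Four bytes as they sit in little-endian memory (values chosen for illustration).
raw = bytes.fromhex("34127856")

# Interpretation 1: one unsigned 32-bit integer.
(as_u32,) = struct.unpack("<I", raw)

# Interpretation 2: two unsigned 16-bit shorts, low half first.
lo, hi = struct.unpack("<HH", raw)

print(hex(as_u32), hex(lo), hex(hi))

# The two views describe the same bits: the shorts are the halves of the integer.
recombined = (hi << 16) | lo
```

When the decompiler shows one 4-byte variable that is only ever used through masks and 16-bit shifts, that is a hint that the second interpretation is the right one and the variable should be retyped or split.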

Lab walk: Ghidra CrackMe session (~1 hr, ungraded)

An instructor-led session working through a mid-difficulty CrackMe (rating 2-3 on crackmes.one) using Ghidra. The session demonstrates:

Import and auto-analyse.
Find the check function via xref from the "Wrong" string.
Reconstruct the check logic using cross-references and the decompiler.
Rename all functions and variables as understanding develops.
Identify the correct key without running the binary.

Students follow along and document in their Tool Journal which Ghidra views were most useful.

Independent practice (~4 hr)

CrackMe ladder: Solve two CrackMes this week using Ghidra as your primary tool. At least one should be at difficulty 2 or higher. Document both in your Tool Journal: what made each one harder than the last, and which Ghidra feature was most useful.
Struct recovery exercise: Find a function in any binary you have imported that accesses a parameter at multiple fixed offsets. Reconstruct the struct. Verify by checking that the decompiler output becomes more readable after the struct type is applied.
Tool Journal: Document the three Ghidra workflows from this week: finding xrefs to a string, retyping a variable, and creating and applying a struct. Write them as step-by-step notes you can follow in Week 8 and later, when you have forgotten the menu paths.

Reflection prompts

The decompiler shows *(param + 0x10) and *(param + 0x18). How do you determine whether these are fields in the same struct or two separate variables that happen to be adjacent in memory? What evidence would you look for?
Cross-references are bi-directional: Ghidra tracks references TO an address and references FROM an address. Describe a reverse engineering scenario where the references FROM a function (i.e., what it calls) are more useful than the references TO it (i.e., what calls it). Describe a scenario where the reverse is true.
The decompiler may show a variable as char * (pointer to a string) when the original C source used a different type. What is the observable evidence in the decompiler that a pointer is pointing to a string versus pointing to a raw buffer of bytes? When does the distinction matter for understanding what a function does?

Week 7 of 14. Next: radare2 / rizin / cutter -- the alternative tradition in binary analysis.