Cross-references, data-type inference, struct recovery, and using the decompiler as a conversation rather than a lookup. Working effectively on larger binaries.
Reading (~30 min)
From the Ghidra Student Manual: read the "Searching," "Cross References," and "Data Types" sections. If the PDF is unavailable, the equivalent material is in the Ghidra online help (Help > Contents from within Ghidra).
From Yurichev RE4B: read the "Structures" and "Pointers to functions" chapters. These two patterns appear constantly in C binaries and the decompiler handles them better once you know what to look for.
Lecture outline (~1.5 hr)
Part 1: Cross-references (30 min)
A cross-reference (xref) records that one location in the binary refers to a given address; taken together, the xrefs to an address list every place that uses it. Ghidra tracks xrefs automatically during analysis.
Why xrefs matter: Finding what calls a function is often more useful than reading the function itself. If you find a strcmp call but cannot figure out what it is comparing, look at the xref to see what calls it and what precedes the call.
Viewing xrefs in Ghidra:
- In the listing view, right-click a function name or address > "Show References To" (or Ctrl-Shift-F). A window opens listing every caller.
- In the decompiler, function names are clickable; clicking shows the definition. To see who calls the current function, use "References > Show References To Address" from the menu or Ctrl-Shift-F while the cursor is on the function name.
- The listing view shows inline xref comments:
; XREF[3]: called from FUN_00401200, FUN_00403100, FUN_004050a0
Finding where a string is used: Search for a string in the listing (Search > Memory or S), navigate to the string's address, then view xrefs to that address. Every reference to that string is a code location that uses it. This is how you find the check function in a CrackMe: find the "Wrong password" string, follow its xref to the comparison code.
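To make the string-xref technique concrete, here is a minimal sketch of the kind of source that produces this pattern. The names check_password and report and the key value are hypothetical, chosen only for illustration. In a binary compiled from code like this, the "Wrong password" literal is stored once in a data section, and the xref to it lands inside report, immediately next to the call that performs the comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key; a real CrackMe usually hides or derives it. */
static const char *const EXPECTED_KEY = "s3cr3t";

/* Returns 1 if the input matches the key, 0 otherwise.
   This strcmp call is what the string xref eventually leads you to. */
int check_password(const char *input)
{
    assert(input != NULL);
    return strcmp(input, EXPECTED_KEY) == 0;
}

/* Prints the verdict and returns the message it printed.
   The instruction that loads the address of "Wrong password"
   is the location Ghidra records as the xref to that string. */
const char *report(const char *input)
{
    const char *msg = check_password(input) ? "Access granted"
                                            : "Wrong password";
    puts(msg);
    return msg;
}
```

Starting from the string and walking backwards (string, then report, then check_password) mirrors the order in which you would follow xrefs in Ghidra.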
Finding where a global variable is written and read: Same technique. Navigate to the variable's address, view xrefs. Ghidra distinguishes READ references from WRITE references.
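As a small illustration of the READ/WRITE distinction, the sketch below uses a hypothetical global g_attempts and two hypothetical functions. One function stores to the global and the other only loads it, so the xref list for the variable's address would be expected to show both kinds of reference.

```c
#include <assert.h>

/* Hypothetical global; it occupies one fixed address in the data section. */
int g_attempts = 0;

/* Stores to g_attempts, so this site is a WRITE reference
   (the increment also reads the old value first). */
void record_attempt(void)
{
    g_attempts = g_attempts + 1;
}

/* Only loads g_attempts, so this site is a READ reference. */
int locked_out(int max_attempts)
{
    assert(max_attempts > 0);
    return g_attempts >= max_attempts;
}
```

Filtering the xref list to WRITE references is usually the quickest way to find where a flag or counter is set.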
Part 2: Data-type inference and struct recovery (30 min)
The decompiler guesses at data types. Its defaults are often wrong:
- A pointer to char (a string) might be shown as long * if Ghidra did not recognize the type.
- A struct might be shown as accesses to *(param1 + 0), *(param1 + 8), *(param1 + 16) if Ghidra did not reconstruct the struct definition.
Refining types in the decompiler:
Right-click any variable in the decompiler > "Retype Variable" to change its type. If a parameter is a pointer to a known struct type, giving it the correct type changes all derived accesses to named field references.
Struct recovery: When you see a function that accesses a parameter at multiple fixed offsets (*(param + 0x0), *(param + 0x8), *(param + 0x10), ...), this is almost certainly a struct. The offsets are field offsets. To reconstruct:
- List all offsets accessed and their sizes (on a 64-bit target, a 4-byte access suggests int/float; an 8-byte access suggests long/pointer).
- Create a new struct in Ghidra's Data Type Manager (Window > Data Type Manager, right-click the archive named after your program > New > Structure).
- Add fields at the appropriate offsets with appropriate types.
- Apply the struct type to the parameter in the decompiler.
After applying, accesses like *(param + 0x8) become param->field_1 (or whatever you name the field). The decompiler becomes far more readable.
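The before-and-after effect of struct recovery can be sketched in C. The struct session and its field names are hypothetical, standing in for whatever you infer from the access pattern; fixed-width integer types are used so that the offsets in the comments hold on common ABIs. flags_raw mimics the untyped view (a base pointer plus a constant offset) and flags_typed is the same access once the struct is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical recovered struct; names are guesses made during analysis. */
struct session {
    uint64_t id;        /* offset 0x00, 8-byte access */
    uint64_t expires;   /* offset 0x08, 8-byte access */
    uint32_t flags;     /* offset 0x10, 4-byte access */
    uint32_t attempts;  /* offset 0x14, 4-byte access */
};

/* Check the layout against the offsets observed in the decompiler. */
static_assert(offsetof(struct session, expires) == 0x08, "expires at 0x08");
static_assert(offsetof(struct session, flags) == 0x10, "flags at 0x10");

/* Untyped view: base pointer plus constant offset, as in *(param + 0x10). */
uint32_t flags_raw(const void *param)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)param + 0x10, sizeof value);
    return value;
}

/* Typed view: the same load, written as a named field access. */
uint32_t flags_typed(const struct session *param)
{
    return param->flags;
}
```

Both functions read the same four bytes; only the second tells a reader what those bytes mean, which is the improvement you should see in the decompiler after applying the type.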
Function pointers: A struct field of type code * is a function pointer. In C this is common for callbacks, dispatch tables, and polymorphism. In the decompiler it shows as (**(param->vtable + 0x10))(args) or similar. Recognize the pattern: a pointer dereference used as a call target.
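For reference, this is roughly what the source side of that pattern can look like. The types shape and shape_ops and the rectangle functions are hypothetical; the point is only that the call target in shape_perimeter is loaded from a table in memory rather than named directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch table: a struct whose fields are function pointers. */
struct shape_ops {
    int (*area)(int w, int h);       /* slot 0 */
    int (*perimeter)(int w, int h);  /* slot 1 */
};

struct shape {
    const struct shape_ops *vtable;  /* pointer to the dispatch table */
    int w;
    int h;
};

static int rect_area(int w, int h)      { return w * h; }
static int rect_perimeter(int w, int h) { return 2 * (w + h); }

const struct shape_ops RECT_OPS = { rect_area, rect_perimeter };

/* Indirect call: the target address comes from s->vtable, not from the
   instruction itself. Before types are applied, a decompiler renders this
   as a dereferenced pointer used as the call target. */
int shape_perimeter(const struct shape *s)
{
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->perimeter(s->w, s->h);
}
```

Because the target is only known through the table, the xrefs to rect_perimeter point at RECT_OPS (a data reference) rather than at a call instruction, which is a useful clue when a function appears to have no callers.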
Part 3: The decompiler as a conversation (20 min)
The Ghidra decompiler is not an oracle; it is a hypothesis generator. Treat every decompiler output as a starting point, not a conclusion.
Decompiler output changes as you improve the model. Every rename, every retype, every struct definition you apply changes the decompiler's output for the functions that use those names and types. A session that starts with unreadable pointer arithmetic can end with clear, named struct field access -- not because the binary changed, but because your annotations improved Ghidra's model.
When the decompiler lies:
- Dead code: The decompiler may show code paths the binary never takes (because of constant conditions the analyser did not resolve). Verify against the listing.
- Wrong types: A value shown as undefined4 may be an int or, on a 32-bit target, a pointer; a single wide value may really be two smaller fields packed together (for example, a 4-byte value holding two shorts).
- Inlined functions: A function the decompiler shows as a single block may actually be two inlined functions from the original source. The listing view (which shows raw instructions) is the ground truth when the decompiler is confused.
- Optimised tail calls: A JMP at the end of a function to another function is a tail call optimisation. The decompiler may miss it.
Always cross-check an important decompiler conclusion against the listing. Two views, two interpretations; the listing is authoritative.
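One of the "wrong types" cases is easy to reproduce on paper. The sketch below, with hypothetical helper names, shows how two 16-bit fields can travel as one 32-bit value. If the compiler moves both fields with a single 4-byte load, the decompiler has no reason to show anything other than one 4-byte variable, and masks and shifts like these in the output are the hint that it is really two fields.

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit fields fit exactly in one 4-byte slot. */
static_assert(sizeof(uint32_t) == 2 * sizeof(uint16_t),
              "two shorts fill one 4-byte value");

/* Combine two 16-bit fields into the single value a decompiler might show. */
uint32_t pack(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}

/* Recover the field stored in the low 16 bits. */
uint16_t low_half(uint32_t packed)
{
    return (uint16_t)(packed & 0xFFFFu);
}

/* Recover the field stored in the high 16 bits. */
uint16_t high_half(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}
```

When you see this pattern, splitting the variable into two 2-byte fields in a struct definition usually makes the surrounding logic much easier to read.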
Lab walk: Ghidra CrackMe session (~1 hr, ungraded)
An instructor-led session working through a mid-difficulty CrackMe (rating 2-3 on crackmes.one) using Ghidra. The session demonstrates:
- Import and auto-analyse.
- Find the check function via xref from the "Wrong" string.
- Reconstruct the check logic using cross-references and the decompiler.
- Rename all functions and variables as understanding develops.
- Identify the correct key without running the binary.
Students follow along and document what Ghidra views were most useful in their Tool Journal.
Independent practice (~4 hr)
- CrackMe ladder: Solve two CrackMes this week using Ghidra as primary tool. At least one should be at difficulty 2 or higher. Document both in your Tool Journal: what made each one harder than the last, which Ghidra feature was most useful.
- Struct recovery exercise: Find a function in any binary you have imported that accesses a parameter at multiple fixed offsets. Reconstruct the struct. Verify by checking that the decompiler output becomes more readable after the struct type is applied.
- Tool Journal: Document the three Ghidra workflows from this week: finding xrefs to a string, retyping a variable, creating and applying a struct. Step-by-step notes you can follow in Week 8+ when you have forgotten the menu path.
Reflection prompts
- The decompiler shows *(param + 0x10) and *(param + 0x18). How do you determine whether these are fields in the same struct or two separate variables that happen to be adjacent in memory? What evidence would you look for?
- Cross-references are bi-directional: Ghidra tracks references TO an address and references FROM an address. Describe a reverse engineering scenario where the references FROM a function (i.e., what it calls) are more useful than the references TO it (i.e., what calls it). Describe a scenario where the reverse is true.
- The decompiler may show a variable as char * (pointer to a string) when the original C source used a different type. What is the observable evidence in the decompiler that a pointer is pointing to a string versus pointing to a raw buffer of bytes? When does the distinction matter for understanding what a function does?
Week 7 of 14. Next: radare2 / rizin / cutter -- the alternative tradition in binary analysis.