Guided rehearsal of the capstone analysis process: extraction from a firmware image, component identification, architecture identification, and salient findings. You apply it to an instructor-provided training image before doing it yourself on the assigned target.
Reading (~45 min)
Read the bunnie Huang blog post "Reverse Engineering the Linksys WRT54G" (search by title -- it is a widely-cited example of firmware teardown methodology and is freely available). Note the structure: what he looked for first, how he extracted the filesystem, and how he oriented himself in an unfamiliar binary.
Then read the binwalk wiki "Signatures" page (github.com/ReFirmLabs/binwalk/wiki/Signatures). This is the reference for what file formats binwalk recognizes inside a firmware blob. You are not memorizing it; you are getting a sense of the breadth of what can be embedded.
Lecture outline (~1.5 hr)
Part 1: What firmware is (15 min)
Firmware is software stored in non-volatile memory (flash storage, ROM, EEPROM) on an embedded device. Unlike a desktop application, firmware is the entire software stack for the device: bootloader, OS kernel, userland, configuration data, and sometimes static web assets.
A firmware image (the file you receive or download) is a binary blob that contains all of these components, concatenated and sometimes compressed. It may also contain cryptographic signatures, checksums, and device-specific metadata.
The firmware analysis workflow in RE-011:
1. Identify the image: What is the file format? Is there a header? What does binwalk find inside it?
2. Extract the filesystem: What filesystem is embedded (squashfs, jffs2, cramfs, ext2)? Extract it with binwalk or the appropriate tool.
3. Identify the architecture: What CPU architecture does the firmware target (x86, ARM, MIPS, PowerPC, RISC-V)? This determines your disassembler configuration.
4. Locate the interesting binaries: Which binary in the extracted filesystem is the primary application? Where is the web interface? Where is the authentication code?
5. Analyze a specific binary: Apply the Ghidra workflow from Weeks 6-7 to the identified binary.
RE-011 covers steps 1-4 and a partial step 5. Full static analysis of a firmware binary's application layer is the RE-101 / RE-201 domain (where the SB6141 is the named lab target).
Part 2: binwalk (25 min)
binwalk is the primary tool for scanning and extracting firmware images. RE-011 uses binwalk v2.x (Python-based, from ReFirmLabs). A Rust rewrite (binwalk v3) exists with different flags; it is not the course reference. See SETUP.md section 8 if you are unsure which version you have installed.
```bash
# Install (Python v2.x via apt -- verify with: binwalk --version showing "Binwalk v2.x")
sudo apt install binwalk

# Scan a firmware image (no extraction)
binwalk firmware.bin

# Extract all recognized components
binwalk -e firmware.bin    # creates _firmware.bin.extracted/

# Entropy visualization
binwalk -E firmware.bin    # plots entropy per block; high entropy = compressed/encrypted

# Explicit signature scan (-B); the plain scan above already runs a signature scan by default
binwalk -B firmware.bin
```
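The idea behind `binwalk -E` can be sketched in a few lines of Python. This is a minimal illustration, not binwalk's implementation. It assumes a fixed 1024-byte block size and a 7.5 bits/byte "high" threshold, and both are arbitrary choices made for this example. Values near 8.0 mean the block looks random. Compressed and encrypted data both score high, so entropy alone cannot tell them apart.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in Counter(block).values())


def entropy_profile(data: bytes, block_size: int = 1024) -> list[tuple[int, float]]:
    """Return (offset, entropy) for each fixed-size block of the image."""
    return [
        (off, shannon_entropy(data[off:off + block_size]))
        for off in range(0, len(data), block_size)
    ]


def print_entropy_report(path: str, block_size: int = 1024, threshold: float = 7.5) -> None:
    """Print one line per block, flagging blocks above an assumed threshold."""
    with open(path, "rb") as f:
        data = f.read()
    for off, h in entropy_profile(data, block_size):
        flag = "  <- high: compressed or encrypted?" if h >= threshold else ""
        print(f"0x{off:08X}  {h:5.3f}{flag}")
```

Usage: `print_entropy_report("firmware.bin")`. A run of low-entropy blocks followed by a long plateau near 8.0 usually marks the boundary where a compressed kernel or filesystem begins.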
Reading binwalk output:

```
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------
0             0x0             TRX firmware header, little endian, image size: 4325376, CRC32: 0x...
28            0x1C            LZMA compressed data, ...
1245184       0x130000        Squashfs filesystem, little endian, version 4.0, ...
```
This tells you:
- The image starts with a TRX header (a common router firmware container format)
- At offset 28: LZMA-compressed data (the kernel)
- At offset 0x130000: a squashfs filesystem (the userland)
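The TRX container is simple enough to parse by hand. That explains why the first component starts at offset 28: the header is exactly 28 bytes long. The sketch below assumes the TRX v1 layout as commonly documented for Broadcom-based routers (for example, in OpenWrt's trx tooling). That layout is a little-endian magic `HDR0`, total length, CRC32, a 16-bit flags field, a 16-bit version field, and three partition offsets. The function names are illustrative, and the CRC is not verified.

```python
import struct
from typing import NamedTuple

# Assumed TRX v1 header layout (little-endian, 28 bytes):
# magic "HDR0", total length, CRC32, flags (16 bits), version (16 bits),
# then three partition offsets measured from the start of the header.
TRX_HEADER = struct.Struct("<4sIIHHIII")


class TrxHeader(NamedTuple):
    length: int
    crc32: int
    flags: int
    version: int
    offsets: tuple[int, int, int]


def parse_trx(data: bytes) -> TrxHeader:
    """Decode the TRX header at the start of an image."""
    if len(data) < TRX_HEADER.size:
        raise ValueError("image is shorter than a TRX header")
    magic, length, crc, flags, version, o0, o1, o2 = TRX_HEADER.unpack_from(data, 0)
    if magic != b"HDR0":
        raise ValueError(f"not a TRX image (magic {magic!r})")
    return TrxHeader(length, crc, flags, version, (o0, o1, o2))


def carve_partitions(data: bytes, hdr: TrxHeader) -> list[bytes]:
    """Slice out each non-zero partition; each one ends where the next begins.

    Assumes the offsets are ascending. The CRC32 field is not checked here.
    """
    starts = [off for off in hdr.offsets if off]
    ends = starts[1:] + [hdr.length]
    return [data[s:e] for s, e in zip(starts, ends)]
```

Carving by header offsets like this is the manual equivalent of what binwalk does when it splits the image into the LZMA blob and the squashfs image.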
After binwalk -e, you find the extracted filesystem in the output directory. Navigate it as a normal Linux filesystem: ls, find, file, strings.
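When `strings` is not available, or you want offsets and file paths in one pass over the whole extracted tree, a small Python sketch can approximate it. It treats any run of at least four printable ASCII bytes as a string, which is close to, but not identical to, the `strings` defaults. The helper names are illustrative.

```python
import os
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for every run of printable ASCII of length >= min_len."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def strings_in_tree(root: str, min_len: int = 4):
    """Yield (path, offset, text) for every regular file under an extracted root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks (e.g. BusyBox applet links) and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                for off, text in ascii_strings(f.read(), min_len):
                    yield path, off, text
```

For example, `for path, off, s in strings_in_tree("_firmware.bin.extracted"): print(path, hex(off), s)` lists every string along with where it lives. That is the evidence format a finding needs.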
Common embedded filesystems:
| Filesystem | Description | Common in |
|---|---|---|
| squashfs | Compressed read-only FS; mounted at boot | Routers, consumer IoT |
| jffs2 | Journaling Flash FS; supports writes | Routers with persistent config |
| cramfs | Simple compressed FS; older | Legacy embedded |
| ext2/ext4 | Standard Linux FS | SBCs, more complex embedded |
| yaffs2 | NAND flash FS | Android (older), some IoT |
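Several of these filesystems can be spotted by their magic numbers, which is how a signature scanner finds them inside a blob. The sketch below covers only squashfs (`hsqs` / `sqsh`), cramfs (0x28CD3D45, stored little-endian), and ext2/3/4 (0xEF53 at byte 1080, inside the superblock that starts at 1024). It leaves out jffs2 and yaffs2. A 2-byte magic such as ext's matches by chance fairly often, so treat every hit as a candidate to validate.

```python
# (offset of the magic inside the filesystem, magic bytes, label)
FS_SIGNATURES = [
    (0, b"hsqs", "squashfs (little-endian)"),
    (0, b"sqsh", "squashfs (big-endian)"),
    (0, b"\x45\x3d\xcd\x28", "cramfs (little-endian)"),
    (1080, b"\x53\xef", "ext2/ext3/ext4"),
]


def find_filesystems(data: bytes) -> list[tuple[int, str]]:
    """Return (start_offset, label) for each candidate filesystem, sorted by offset."""
    hits = []
    for magic_off, magic, label in FS_SIGNATURES:
        start = 0
        while (i := data.find(magic, start)) != -1:
            fs_start = i - magic_off  # back up from the magic to the FS start
            if fs_start >= 0:
                hits.append((fs_start, label))
            start = i + 1
    return sorted(hits)
```

Real tools go further than this and check more header fields (version, block size, total size) before they report a filesystem.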
Part 3: Architecture identification (20 min)
Once you extract a binary from the firmware filesystem, you need to know what architecture it targets before you can disassemble it.
file /path/to/extracted/bin/busybox
file reads the ELF header (if it is an ELF binary) and reports the architecture. The EI_CLASS and EI_DATA bytes give the word size and endianness, and the e_machine field gives the CPU. Common results:
- ELF 32-bit LSB executable, ARM, EABI5 -- 32-bit ARM, little-endian
- ELF 32-bit MSB executable, MIPS, MIPS32 -- 32-bit MIPS, big-endian (classic router architecture)
- ELF 32-bit LSB executable, Intel 80386 -- x86 32-bit
- ELF 64-bit LSB executable, x86-64 -- x86-64
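The ELF fields that `file` reads sit at fixed offsets, so you can check them directly. The sketch below covers only a subset of the e_machine values from the ELF specification, and the function name is illustrative. e_machine is always at byte 18, after the 16-byte e_ident and the 2-byte e_type, and it is stored in the byte order that EI_DATA declares.

```python
import struct

# Subset of e_machine values from the ELF specification.
EM_NAMES = {
    3: "Intel 80386 (x86)",
    8: "MIPS",
    20: "PowerPC",
    40: "ARM",
    62: "x86-64",
    183: "AArch64",
    243: "RISC-V",
}


def elf_arch(header: bytes) -> dict:
    """Decode word size, byte order, and CPU from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])              # EI_CLASS
    endian = {1: "little", 2: "big"}.get(header[5])   # EI_DATA
    if bits is None or endian is None:
        raise ValueError("unrecognized EI_CLASS or EI_DATA")
    fmt = "<H" if endian == "little" else ">H"
    (e_machine,) = struct.unpack_from(fmt, header, 18)
    return {
        "bits": bits,
        "endian": endian,
        "e_machine": e_machine,
        "arch": EM_NAMES.get(e_machine, f"unknown ({e_machine})"),
    }
```

Usage: `with open("squashfs-root/bin/busybox", "rb") as f: print(elf_arch(f.read(20)))`. The path here is only an example. A big-endian result with `"arch": "MIPS"` corresponds to the `MSB ... MIPS` line in `file` output.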
For non-ELF binaries (custom bootloaders, raw binary blobs), architecture identification is harder:
- Look for known instruction sequences (classic ARM Thumb uses 2-byte instructions, while Thumb-2 mixes 2- and 4-byte ones; MIPS instructions are a fixed 4 bytes, and the MIPS NOP encodes as an all-zero word)
- Look for string references that name the architecture
- Look for known signatures in the blob that identify the bootloader (U-Boot is common: its ASCII banner "U-Boot" is 55 2D 42 6F 6F 74, and legacy U-Boot image (uImage) headers begin with the magic 27 05 19 56)
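A rough way to act on the "known instruction sequences" idea for a raw blob is to count aligned occurrences of each candidate architecture's function-return instruction. The candidates here are `jr $ra` (0x03E00008) for MIPS, `bx lr` (0xE12FFF1E) for ARM, and `bx lr` (0x4770) for Thumb. The candidate with by far the most hits is a reasonable first guess. This vote-counting heuristic, the alignment assumption (that the blob starts on an instruction boundary), and the helper names are choices made for this sketch. The 2-byte Thumb pattern matches by chance more often than the 4-byte ones, so compare counts with that in mind.

```python
# Assumed heuristic: count common return-instruction encodings at aligned offsets.
RETURN_PATTERNS = {
    "MIPS big-endian (jr $ra)": (b"\x03\xe0\x00\x08", 4),
    "MIPS little-endian (jr $ra)": (b"\x08\x00\xe0\x03", 4),
    "ARM little-endian (bx lr)": (b"\x1e\xff\x2f\xe1", 4),
    "Thumb little-endian (bx lr)": (b"\x70\x47", 2),
}

BOOTLOADER_MARKERS = {
    b"U-Boot": "U-Boot banner string",
    b"\x27\x05\x19\x56": "legacy U-Boot image (uImage) header magic",
}


def count_aligned(data: bytes, pattern: bytes, align: int) -> int:
    """Count occurrences of pattern that start on an align-byte boundary."""
    count, start = 0, 0
    while (i := data.find(pattern, start)) != -1:
        if i % align == 0:
            count += 1
        start = i + 1
    return count


def arch_votes(data: bytes) -> dict[str, int]:
    """Return an aligned hit count for each candidate architecture."""
    return {label: count_aligned(data, pat, align)
            for label, (pat, align) in RETURN_PATTERNS.items()}


def bootloader_markers(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, label) for each U-Boot marker found, sorted by offset."""
    hits = []
    for marker, label in BOOTLOADER_MARKERS.items():
        start = 0
        while (i := data.find(marker, start)) != -1:
            hits.append((i, label))
            start = i + 1
    return sorted(hits)
```

A lopsided vote (for example, hundreds of aligned big-endian `jr $ra` hits and almost nothing else) is strong evidence. Near-equal low counts suggest the region is compressed or not code at all.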
Configuring Ghidra for cross-architecture analysis:
When you import a non-x86 ELF into Ghidra, Ghidra detects the architecture from the ELF header and offers the correct processor module. For big-endian MIPS, select MIPS:BE:32:default (little-endian MIPS is MIPS:LE:32:default). For 32-bit ARM: ARM:LE:32:v7 or similar. The decompiler and disassembler work for all supported architectures; only the instruction set display changes.
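Ghidra normally fills in the processor on its own. For raw blobs, though, you pick it yourself, so it helps to write down how the ELF fields map to a language ID. The table below is a hypothetical helper, not part of Ghidra. The IDs follow Ghidra's `PROCESSOR:ENDIAN:SIZE:VARIANT` naming. Choosing `v7` for ARM is an assumption, and older cores may need another ARM variant from Ghidra's list.

```python
from typing import Optional

# Hypothetical lookup: (e_machine, bits, endian) -> Ghidra language ID to select at import.
GHIDRA_LANGUAGE = {
    (3, 32, "little"): "x86:LE:32:default",
    (62, 64, "little"): "x86:LE:64:default",
    (8, 32, "big"): "MIPS:BE:32:default",
    (8, 32, "little"): "MIPS:LE:32:default",
    (40, 32, "little"): "ARM:LE:32:v7",
    (40, 32, "big"): "ARM:BE:32:v7",
    (183, 64, "little"): "AARCH64:LE:64:v8A",
    (20, 32, "big"): "PowerPC:BE:32:default",
}


def suggest_ghidra_language(e_machine: int, bits: int, endian: str) -> Optional[str]:
    """Return a suggested language ID, or None if the combination is not in the table."""
    return GHIDRA_LANGUAGE.get((e_machine, bits, endian))
```

A `None` result means you should fall back to browsing Ghidra's language list by hand rather than guessing.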
Part 4: Salient findings (10 min)
A firmware analysis report is not an exhaustive catalog of everything in the firmware. It is a structured summary of the security-relevant findings. In RE-011 terms, a "salient finding" is any of:
- Hardcoded credentials (passwords, API keys, private keys embedded as string literals)
- Debug interfaces left enabled (telnetd, UART shell, SSH with default credentials)
- Outdated library versions with known CVEs (check strings output for version strings; cross-reference with NVD)
- Dangerous function usage (strcpy, gets, sprintf in a binary that handles external input)
- Exposed administrative interfaces (web server paths, undocumented endpoints visible in strings)
- Interesting binary names or paths that suggest attack surface (e.g., cgi-bin/, backdoor, factory_reset, debug_port_service)
You report findings with evidence: where in the filesystem you found it, what the string or call site shows, and what the security implication is.
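A first-pass triage over `(offset, string)` pairs from a strings survey can sort leads into the categories above. Each lead is recorded with the evidence fields a finding needs. The keyword rules are hypothetical starting points to tune per target, and a hit is only a lead to verify. For example, `strcpy` appearing as a string usually means the function is imported. It does not show a dangerous call site, so you still need Ghidra to confirm one.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    path: str       # where in the extracted filesystem
    offset: int     # byte offset of the string in that file
    evidence: str   # the string itself
    category: str   # which salient-finding category it suggests


# Hypothetical keyword rules; expect false positives and tune per target.
RULES = [
    ("possible hardcoded credential",
     re.compile(r"pass(word|wd)|api[_-]?key|BEGIN (RSA |EC )?PRIVATE KEY", re.IGNORECASE)),
    ("debug interface", re.compile(r"\b(telnetd|dropbear|sshd)\b")),
    ("dangerous function", re.compile(r"^(strcpy|strcat|sprintf|gets)$")),
    ("version string", re.compile(r"\b\d+\.\d+\.\d+[a-z]?\b")),
    ("attack-surface path",
     re.compile(r"cgi-bin/|backdoor|factory_reset|debug", re.IGNORECASE)),
]


def triage(path: str, strings: list[tuple[int, str]]) -> list[Finding]:
    """Return one Finding per (string, matching rule) pair."""
    findings = []
    for off, text in strings:
        for category, rx in RULES:
            if rx.search(text):
                findings.append(Finding(path, off, text, category))
    return findings
```

Each `Finding` already carries the "where" and the "what". The security implication is the part you write yourself after confirming the lead.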
Lab walk: Guided firmware teardown (~1.5 hr, ungraded)
Instructor-led teardown of the course-provided training firmware image (a legally-distributable training target; not a production device image). Steps:
- Run binwalk on the image; interpret the output together.
- Run binwalk -e to extract.
- Navigate the extracted filesystem; identify key binaries.
- Run file and readelf -h on the main application binary.
- Load the main application binary into Ghidra; configure the correct processor.
- Find two salient findings: one string-based (hardcoded string, version number) and one code-based (dangerous function call site).
Students document their findings in their Tool Journal in the format they will use for the capstone report.
Independent practice (~3 hr)
- Tool Journal: Document the firmware analysis workflow as a checklist you can follow for the capstone: (1) binwalk scan, (2) binwalk extract, (3) navigate filesystem, (4) file + readelf on target binary, (5) Ghidra import with correct processor, (6) strings and imports survey, (7) identify salient findings.
- Capstone preview: Review the Lab 9 specification (labs/lab-9-capstone.md). Draft the "Analysis scope" section of your report: what binary in the assigned target you plan to focus on, and what category of finding you expect to be most interesting. This is your starting hypothesis.
- CrackMe ladder: Complete the Lab 6 checkpoint: 4+ CrackMes documented with technique narrative (due Week 13). If you are below 4, this week is the time to catch up.
Reflection prompts
- binwalk scans for known magic numbers. A firmware image that uses a proprietary or nonstandard compression format will not be recognized by binwalk. How would you detect that a firmware image is compressed or encrypted without binwalk recognizing the format? What properties of the binary would you examine?
- The firmware teardown workflow starts with the full image and works inward (image -> filesystem -> binary -> function). Why does the order matter? What information does each layer provide that makes the next layer easier to analyze?
- Hardcoded credentials in firmware are a common finding. Why do embedded firmware developers hardcode credentials when they know it is a security risk? What constraints in the firmware development context lead to this choice? (Think about: no user interface for initial configuration, factory reset requirements, manufacturing test requirements.)
Week 12 of 14. Next: Capstone scoping -- instructor-assigned target, analysis plan, outline sign-off.