RE-011 Week 12: Firmware Teardown

A guided rehearsal of the capstone analysis process: extracting a firmware image, identifying its components and target architecture, and reporting salient findings. You apply each step to an instructor-provided training image before repeating it yourself on the assigned target.


Reading (~45 min)

Read the bunnie Huang blog post "Reverse Engineering the Linksys WRT54G" (search by title -- it is a widely-cited example of firmware teardown methodology and is freely available). Note the structure: what he looked for first, how he extracted the filesystem, and how he oriented himself in an unfamiliar binary.

Then read the binwalk wiki "Signatures" page (github.com/ReFirmLabs/binwalk/wiki/Signatures). This is the reference for what file formats binwalk recognizes inside a firmware blob. You are not memorizing it; you are getting a sense of the breadth of what can be embedded.


Lecture outline (~1.5 hr)

Part 1: What firmware is (15 min)

Firmware is software stored in non-volatile memory (flash storage, ROM, EEPROM) on an embedded device. Unlike a desktop application, firmware is the entire software stack for the device: bootloader, OS kernel, userland, configuration data, and sometimes static web assets.

A firmware image (the file you receive or download) is a binary blob that contains all of these components, concatenated and sometimes compressed. It may also contain cryptographic signatures, checksums, and device-specific metadata.

The firmware analysis workflow in RE-011:

  1. Identify the image: What is the file format? Is there a header? What does binwalk find inside it?
  2. Extract the filesystem: What filesystem is embedded (squashfs, jffs2, cramfs, ext2)? Extract it with binwalk or the appropriate tool.
  3. Identify the architecture: What CPU architecture does the firmware target? (x86, ARM, MIPS, PowerPC, RISC-V) This determines your disassembler configuration.
  4. Locate the interesting binaries: Which binary in the extracted filesystem is the primary application? Where is the web interface? Where is the authentication code?
  5. Analyze a specific binary: Apply the Ghidra workflow from Weeks 6-7 to the identified binary.

RE-011 covers steps 1-4 and a partial step 5. Full static analysis of a firmware binary's application layer is the RE-101 / RE-201 domain (where the SB6141 is the named lab target).

Part 2: binwalk (25 min)

binwalk is the primary tool for scanning and extracting firmware images. RE-011 uses binwalk v2.x (Python-based, from ReFirmLabs). A Rust rewrite (binwalk v3) exists with different flags; it is not the course reference. See SETUP.md section 8 if you are unsure which version you have installed.

# Install (Python v2.x via apt -- verify with: binwalk -h, whose banner should read "Binwalk v2.x")
sudo apt install binwalk

# Scan a firmware image (no extraction)
binwalk firmware.bin

# Extract all recognized components
binwalk -e firmware.bin     # creates _firmware.bin.extracted/

# Entropy visualization
binwalk -E firmware.bin     # plots entropy per block; high entropy = compressed/encrypted

# Signature scan, stated explicitly (-B is the default mode, so this matches plain "binwalk firmware.bin")
binwalk -B firmware.bin
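
The entropy idea behind binwalk -E can be reproduced by hand, which helps when you want a single number for a carved region. The following is a minimal sketch, not binwalk's implementation: byte_entropy is a hypothetical helper name, and it measures the whole file rather than plotting per-block values.

```shell
# byte_entropy FILE -> Shannon entropy in bits per byte (0.00 to 8.00).
# Hypothetical helper; uses only od, tr, and awk.
byte_entropy() {
  # od prints every byte as a decimal number; tr puts one number per line.
  od -An -v -tu1 "$1" | tr -s ' ' '\n' | awk '
    NF { count[$1]++; total++ }
    END {
      if (total == 0) { print "0.00"; exit }
      h = 0
      for (b in count) {
        p = count[b] / total
        h -= p * log(p) / log(2)
      }
      printf "%.2f\n", h
    }'
}

# Example: byte_entropy firmware.bin
```

A result close to 8 suggests compressed or encrypted data; code, text, and padding usually score noticeably lower. Treat the number as a hint rather than a classification.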

Reading binwalk output:

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------
0             0x0             TRX firmware header, little endian, image size: 4325376, CRC32: 0x...
28            0x1C            LZMA compressed data, ...
1245184       0x130000        Squashfs filesystem, little endian, version 4.0, ...

This tells you:

  • The image starts with a TRX header (a common router firmware container format)
  • At offset 28: LZMA-compressed data (the kernel)
  • At offset 0x130000: a squashfs filesystem (the userland)
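
When automatic extraction fails, the offsets in the scan output are enough to carve a component by hand. The sketch below assumes the example offsets above; carve_region is a hypothetical helper name, and the output filenames are placeholders.

```shell
# carve_region FILE OFFSET [LENGTH] -> writes the selected bytes to stdout.
# OFFSET and LENGTH may be decimal or 0x-prefixed hex, as binwalk prints them.
carve_region() {
  start=$(( $2 + 1 ))              # tail -c +N counts from 1, offsets from 0
  if [ -n "${3:-}" ]; then
    tail -c "+$start" "$1" | head -c "$(( $3 ))"
  else
    tail -c "+$start" "$1"         # no length given: carve to end of file
  fi
}

# Example with the offsets from the scan above (0x130000 is 1245184 decimal):
#   carve_region firmware.bin 0x130000 > rootfs.squashfs
#   carve_region firmware.bin 28 $(( 0x130000 - 28 )) > kernel.lzma
# The second region may include padding after the end of the LZMA stream.
```

Carving by offset does not decompress anything; it only isolates the bytes so that a format-specific tool can be pointed at them.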

After binwalk -e, you find the extracted filesystem in the output directory. Navigate it as a normal Linux filesystem: ls, find, file, strings.
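
A quick way to orient yourself in the extracted tree is to list only the files that are ELF binaries. This is a rough sketch: list_elfs is a hypothetical helper, it checks just the four-byte ELF magic, and it skips symlinks (so busybox applet links are not listed).

```shell
# list_elfs DIR -> prints every regular file under DIR that starts with \x7fELF.
list_elfs() {
  find "$1" -type f | sort | while IFS= read -r path; do
    # Read the first four bytes as hex, e.g. "7f454c46" for an ELF file.
    magic=$(od -An -tx1 -N4 "$path" | tr -d ' \n')
    if [ "$magic" = "7f454c46" ]; then
      printf '%s\n' "$path"
    fi
  done
}

# Example: list_elfs _firmware.bin.extracted/
```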

Common embedded filesystems:

Filesystem    Description                                Common in
--------------------------------------------------------------------------------------
squashfs      Compressed read-only FS; mounted at boot   Routers, consumer IoT
jffs2         Journaling Flash FS; supports writes       Routers with persistent config
cramfs        Simple compressed FS; older                Legacy embedded
ext2/ext4     Standard Linux FS                          SBCs, more complex embedded
yaffs2        NAND flash FS                              Android (older), some IoT

Part 3: Architecture identification (20 min)

Once you extract a binary from the firmware filesystem, you need to know what architecture it targets before you can disassemble it.

file /path/to/extracted/bin/busybox

For an ELF binary, file reads the header's e_machine field, together with the class and data-encoding bytes in e_ident, and reports the architecture, word size, and endianness. Common results:

  • ELF 32-bit LSB executable, ARM, EABI5 -- 32-bit ARM, little-endian
  • ELF 32-bit MSB executable, MIPS, MIPS32 -- 32-bit MIPS, big-endian (classic router architecture)
  • ELF 32-bit LSB executable, Intel 80386 -- x86 32-bit
  • ELF 64-bit LSB executable, x86-64 -- x86-64
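
To see where those strings come from, the same header fields can be read directly. The sketch below is an illustration rather than a replacement for file or readelf -h: elf_arch is a hypothetical helper, and its e_machine table covers only the architectures named in this course.

```shell
# elf_arch FILE -> e.g. "MIPS big-endian 32-bit". Hypothetical helper name.
elf_arch() {
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  [ "$magic" = "7f454c46" ] || { echo "not ELF"; return 1; }
  class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
  data=$(od -An -tu1 -j5 -N1 "$1" | tr -d ' \n')    # EI_DATA:  1 = LSB,    2 = MSB
  b0=$(od -An -tu1 -j18 -N1 "$1" | tr -d ' \n')     # e_machine, first byte
  b1=$(od -An -tu1 -j19 -N1 "$1" | tr -d ' \n')     # e_machine, second byte
  # e_machine is stored in the file's own byte order.
  if [ "$data" = 2 ]; then
    machine=$(( ${b0:-0} * 256 + ${b1:-0} ))
  else
    machine=$(( ${b1:-0} * 256 + ${b0:-0} ))
  fi
  case $machine in
    3)   name="x86" ;;
    8)   name="MIPS" ;;
    20)  name="PowerPC" ;;
    40)  name="ARM" ;;
    62)  name="x86-64" ;;
    183) name="AArch64" ;;
    243) name="RISC-V" ;;
    *)   name="e_machine=$machine" ;;
  esac
  if [ "$data" = 2 ]; then endian="big-endian"; else endian="little-endian"; fi
  if [ "$class" = 2 ]; then bits=64; else bits=32; fi
  echo "$name $endian $bits-bit"
}

# Example: elf_arch /path/to/extracted/bin/busybox
```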

For non-ELF binaries (custom bootloaders, raw binary blobs), architecture identification is harder:

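  • Look for known instruction sequences (ARM Thumb uses 16-bit instructions, with 32-bit encodings mixed in under Thumb-2; MIPS encodes NOP as the all-zero word 00 00 00 00, which often fills branch delay slots)
  • Look for string references that name the architecture
  • Look for known magic numbers in the blob that identify the bootloader (U-Boot is common: its legacy image header starts with 27 05 19 56, and its version banner contains the ASCII string "U-Boot", 55 2D 42 6F 6F 74)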
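
Searching a raw blob for a magic number can be done with standard tools. The following is a minimal sketch: find_magic is a hypothetical helper, the pattern is given as lowercase hex, and the example patterns are the U-Boot legacy image magic (27 05 19 56) and the ASCII bytes of "U-Boot".

```shell
# find_magic FILE HEX -> decimal offset of every occurrence of the byte pattern.
# HEX is lowercase with no spaces, e.g. 27051956.
find_magic() {
  # One hex byte per line, then slide a window the size of the pattern.
  od -An -v -tx1 "$1" | tr -s ' ' '\n' | awk -v pat="$2" '
    NF {
      win = win $1
      n++
      if (length(win) > length(pat)) win = substr(win, 3)
      if (win == pat) print n - length(pat) / 2
    }'
}

# Examples:
#   find_magic bootloader.bin 27051956        # U-Boot legacy image header
#   find_magic bootloader.bin 552d426f6f74    # ASCII "U-Boot"
```

The sliding window reads the file once, which is slow on very large images but needs nothing beyond od and awk. A hit is only a lead: confirm it by inspecting the bytes around the reported offset.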

Configuring Ghidra for cross-architecture analysis:

When you import a non-x86 ELF, Ghidra reads the architecture from the ELF header and preselects a matching language (processor, endianness, word size, variant) in the import dialog. Confirm the selection against the file output. For big-endian 32-bit MIPS: MIPS:BE:32:default. For little-endian 32-bit ARM: ARM:LE:32:v7 or similar. The disassembler and decompiler work the same way on every supported architecture; only the instruction set and register names shown in the listing change.

Part 4: Salient findings (10 min)

A firmware analysis report is not an exhaustive catalog of everything in the firmware. It is a structured summary of the security-relevant findings. In RE-011 terms, a "salient finding" is any of:

  • Hardcoded credentials (passwords, API keys, private keys embedded as string literals)
  • Debug interfaces left enabled (telnetd, UART shell, SSH with default credentials)
  • Outdated library versions with known CVEs (check strings output for version strings; cross-reference with NVD)
  • Dangerous function usage (strcpy, gets, sprintf in a binary that handles external input)
  • Exposed administrative interfaces (web server paths, undocumented endpoints visible in strings)
  • Interesting binary names or paths that suggest attack surface (e.g., cgi-bin/, backdoor, factory_reset, debug_port_service)

You report findings with evidence: where in the filesystem you found it, what the string or call site shows, and what the security implication is.
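
A first pass over the extracted filesystem for two of these categories can be sketched with grep. Both helper names are hypothetical, and the patterns are starting points rather than a complete list; every hit still needs to be confirmed by hand (a function name in a binary shows it is linked, not that it is reachable from external input).

```shell
# risky_imports DIR -> files whose bytes contain unbounded-copy function names.
# -a treats binaries as text; -w avoids matching fgets or strcpy_s.
risky_imports() {
  LC_ALL=C grep -rlawE 'strcpy|gets|sprintf' "$1" | sort
}

# credential_hints DIR -> file:line:match for strings that look like secrets.
credential_hints() {
  LC_ALL=C grep -rnaoiE 'passw(or)?d[=:][^[:space:]]+|BEGIN [A-Z ]*PRIVATE KEY' "$1" | sort
}

# Example:
#   risky_imports _firmware.bin.extracted/
#   credential_hints _firmware.bin.extracted/
```

The file:line:match output doubles as the "where in the filesystem" part of the evidence for a string-based finding.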


Lab walk: Guided firmware teardown (~1.5 hr, ungraded)

Instructor-led teardown of the course-provided training firmware image (a legally-distributable training target; not a production device image). Steps:

  1. Run binwalk on the image; interpret the output together.
  2. Run binwalk -e to extract.
  3. Navigate the extracted filesystem; identify key binaries.
  4. Run file and readelf -h on the main application binary.
  5. Load the main application binary into Ghidra; configure the correct processor.
  6. Find two salient findings: one string-based (hardcoded string, version number) and one code-based (dangerous function call site).

Students document their findings in their Tool Journal in the format they will use for the capstone report.


Independent practice (~3 hr)

  • Tool Journal: Document the firmware analysis workflow as a checklist you can follow for the capstone: (1) binwalk scan, (2) binwalk extract, (3) navigate filesystem, (4) file + readelf on target binary, (5) Ghidra import with correct processor, (6) strings and imports survey, (7) identify salient findings.
  • Capstone preview: Review the Lab 9 specification (labs/lab-9-capstone.md). Draft the "Analysis scope" section of your report: what binary in the assigned target you plan to focus on, and what category of finding you expect to be most interesting. This is your starting hypothesis.
  • CrackMe ladder: Complete the Lab 6 checkpoint: 4+ CrackMes documented with technique narrative (due Week 13). If you are below 4, this week is the time to catch up.

Reflection prompts

  1. binwalk scans for known magic numbers. A firmware image that uses a proprietary or nonstandard compression format will not be recognized by binwalk. How would you detect that a firmware image is compressed or encrypted without binwalk recognizing the format? What properties of the binary would you examine?

  2. The firmware teardown workflow starts with the full image and works inward (image -> filesystem -> binary -> function). Why does the order matter? What information does each layer provide that makes the next layer easier to analyze?

  3. Hardcoded credentials in firmware are a common finding. Why do embedded firmware developers hardcode credentials when they know it is a security risk? What constraints in the firmware development context lead to this choice? (Think about: no user interface for initial configuration, factory reset requirements, manufacturing test requirements.)


Week 12 of 14. Next: Capstone scoping -- instructor-assigned target, analysis plan, outline sign-off.