Hex editors, magic numbers, endianness, and the habit of reading raw bytes. Everything a compiled binary contains is bytes. Learn to see them.
Reading (~30 min)
Read the Wikipedia articles on magic numbers (file signatures) and on endianness. Focus on the concept, not the full taxonomy. You want to be able to answer: what is a magic number, why do they exist, what is the difference between big-endian and little-endian, and why does it matter for binary analysis?
Then read the xxd man page (man xxd or the online version). Focus on: the default output format (offset / hex / ASCII), the -e flag (little-endian output), the -l flag (limit bytes), and the -s flag (skip to offset). The default format and these three flags are what you will use every week.
Lecture outline (~1.5 hr)
Part 0: Hex, decimal, and binary mental switching (15 min)
Before reading a hex dump, you need to move fluently between three number representations. This is a skill, not knowledge -- it takes repetition, not explanation.
The rules:
- One byte holds a number from 0 to 255 (decimal) / 0x00 to 0xFF (hex) / 00000000 to 11111111 (binary).
- Hex uses digits 0-9 and letters A-F. A = 10, B = 11, C = 12, D = 13, E = 14, F = 15.
- One hex digit = four binary bits (a "nibble"). Two hex digits = one byte. For example, 0x4C = 0100 1100 in binary.
Worked examples to internalize:
| Decimal | Hex | Binary |
|---|---|---|
| 0 | 0x00 | 0000 0000 |
| 1 | 0x01 | 0000 0001 |
| 16 | 0x10 | 0001 0000 |
| 127 | 0x7F | 0111 1111 |
| 128 | 0x80 | 1000 0000 |
| 255 | 0xFF | 1111 1111 |
The three conversions you will do constantly:
- Hex digit to decimal: remember A-F = 10-15. 0xB = 11. 0x1B = 1 x 16 + 11 = 27.
- Hex to binary: each hex digit maps to a 4-bit pattern. 0xAB = 1010 1011 (A=1010, B=1011).
- Byte to meaning: 0x7F 0x45 0x4C 0x46 -- the ELF magic number. The 0x7F is 127 decimal (a non-printable ASCII control code); the next three bytes are ASCII for 'E', 'L', 'F'.
If this is new to you, spend 10 minutes at the keyboard now: run printf '%d\n' 0x10 0xFF 0x7f 0x3e 0x80 and verify the decimal values. Then run printf '%02x\n' 127 255 62 16 and verify the hex values. Use FND-101's number-systems module if you want a fuller treatment.
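If you prefer a REPL to printf, the same checks can be sketched in Python. This is only an illustration; the helper name to_nibbles is made up for this example.

```python
# Hex literal -> decimal: Python parses 0x.. literals directly.
values = [0x10, 0xFF, 0x7F, 0x3E, 0x80]
print(values)

# Decimal -> two-digit hex, matching printf '%02x'.
hex_strings = [format(n, "02x") for n in (127, 255, 62, 16)]
print(hex_strings)

# Hex -> binary, one nibble (4 bits) per hex digit.
def to_nibbles(byte: int) -> str:
    bits = format(byte, "08b")
    return bits[:4] + " " + bits[4:]

print(to_nibbles(0x4C))   # 0100 1100
print(to_nibbles(0xAB))   # 1010 1011

# Byte to meaning: the three bytes after 0x7F in the ELF magic are ASCII.
magic = bytes([0x7F, 0x45, 0x4C, 0x46])
print(magic[1:].decode("ascii"))   # ELF
```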
Part 1: Everything is bytes (20 min)
A compiled binary is a sequence of bytes. There is no magic. There are no types at the binary level -- only bytes interpreted according to convention. A 4-byte sequence 0x00 0x00 0x00 0x01 might be the integer 1 (big-endian), or it might be part of a float, or part of an instruction encoding, or a pointer, or arbitrary data. Context and convention determine interpretation.
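To make the point concrete, here is a small sketch that reads the same four bytes under several conventions using Python's struct module. The variable names are illustrative; the bytes do not change, only the interpretation does.

```python
import struct

raw = bytes([0x00, 0x00, 0x00, 0x01])

as_be_uint = struct.unpack(">I", raw)[0]    # big-endian unsigned 32-bit
as_le_uint = struct.unpack("<I", raw)[0]    # little-endian unsigned 32-bit
as_be_float = struct.unpack(">f", raw)[0]   # big-endian IEEE 754 single precision
as_u16_pair = struct.unpack(">HH", raw)     # two big-endian 16-bit values

print(as_be_uint)    # 1
print(as_le_uint)    # 16777216, i.e. 0x01000000
print(as_be_float)   # a tiny positive denormal, nowhere near 1.0
print(as_u16_pair)   # (0, 1)
```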
This is the most important conceptual shift for students coming from high-level programming: when you read C or Python, the language tells you what a value is. When you read a binary, nothing tells you. You infer.
The tools in this course are inference aids. file makes probabilistic guesses based on magic bytes and structure. strings extracts ASCII-printable sequences. readelf parses ELF headers according to the ELF spec. Ghidra decompiles -- it guesses at the C that could produce the observed code. None of these tools are authoritative; they are structured ways to look at bytes.
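As an illustration of how simple the underlying convention can be, the following is a rough sketch of the idea behind strings: report runs of printable ASCII of at least a minimum length. It is not the real tool's implementation; the function name, the sample blob, and the choice to count tab as printable are assumptions made for this example.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7E is the printable ASCII range; tab is also accepted here.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical bytes: an ELF-like prefix, an interpreter path, a short tag.
blob = b"\x7fELF\x02\x01\x01\x00/lib64/ld-linux-x86-64.so.2\x00\x01\x02GNU\x00"
print(printable_runs(blob))
print(printable_runs(blob, min_len=3))
```

With the default minimum of 4, "ELF" and "GNU" are not reported because each run is only three characters long. The bytes are present, but the convention filters them out, which is one small example of a tool's output being an interpretation rather than the ground truth.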
Part 2: Hex editors and xxd (25 min)
A hex editor is a viewer (and optionally an editor) that shows you the raw bytes of a file: offset in the file, hexadecimal representation, and ASCII representation of the same bytes side by side.
xxd is the command-line hex viewer you will use most often in RE-011. The default output looks like:
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0300 3e00 0100 0000 1011 0000 0000 0000 ..>.............
Left column: offset in bytes from the start of the file, in hexadecimal. Middle column: 16 bytes per row in hex, grouped in pairs. Right column: the same 16 bytes as ASCII; non-printable bytes are shown as dots.
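The three-column layout is easy to reproduce, which helps demystify it. The sketch below is a minimal formatter modeled on xxd's default output, not xxd's actual code; the function name hexdump and its parameters are assumptions, and the input is the 32-byte sample header shown above.

```python
def hexdump(data: bytes, start: int = 0, width: int = 16) -> str:
    lines = []
    for row in range(0, len(data), width):
        chunk = data[row:row + width]
        # Hex column: two bytes (four hex digits) per group.
        hexed = chunk.hex()
        groups = " ".join(hexed[i:i + 4] for i in range(0, len(hexed), 4))
        # ASCII column: printable bytes as-is, everything else as '.'.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        pad = width * 2 + width // 2 - 1
        lines.append(f"{start + row:08x}: {groups:<{pad}}  {text}")
    return "\n".join(lines)

header = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e00010000001011000000000000"
)
print(hexdump(header))
```

Passing a slice and a starting offset, for example hexdump(data[0x400:0x440], start=0x400), roughly mirrors what the -s and -l flags do.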
Common xxd patterns in RE work:
- xxd binary | head -4 -- see the first 64 bytes: magic number, format version, architecture
- xxd binary | grep -a 'ELF' -- find rows whose ASCII column contains a given string (a match that is split across two rows is missed)
- xxd -s 0x400 -l 64 binary -- view 64 bytes starting at offset 0x400 (e.g., to read an ELF section header)
Part 3: Magic numbers (20 min)
A magic number is a sequence of bytes at a known offset in a file (usually the start) that identifies the file format. The name comes from Unix tradition; the concept predates Unix.
Magic numbers you will see frequently in RE-011:
| Format | Magic bytes | ASCII representation |
|---|---|---|
| ELF | 7F 45 4C 46 | .ELF |
| PE (Windows) | 4D 5A | MZ |
| Mach-O (macOS, 64-bit) | CF FA ED FE | (not printable) |
| ZIP / JAR / APK | 50 4B 03 04 | PK.. |
| PDF | 25 50 44 46 | %PDF |
| PNG | 89 50 4E 47 0D 0A 1A 0A | .PNG.... |
| gzip | 1F 8B | (not printable) |
| squashfs | 73 71 73 68 | sqsh (or variant) |
Firmware images frequently contain embedded filesystems. A squashfs or jffs2 or ext2 filesystem will have its own magic number starting at whatever byte offset it was appended to the firmware blob. The binwalk tool (which you will see in Week 12) scans a binary file for embedded magic numbers and reports their offsets.
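The core idea of scanning for embedded signatures can be sketched in a few lines. This is not how binwalk is implemented; the signature dictionary is a subset of the table above, and the firmware blob is a hypothetical one built for the example.

```python
SIGNATURES = {
    "ELF": bytes.fromhex("7f454c46"),
    "ZIP": bytes.fromhex("504b0304"),
    "PNG": bytes.fromhex("89504e470d0a1a0a"),
    "gzip": bytes.fromhex("1f8b"),
    "squashfs (sqsh)": bytes.fromhex("73717368"),
}

def scan(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, format name) for every signature match, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)

# Hypothetical firmware: 32 bytes of padding, a gzip header, filler, squashfs magic.
firmware = b"\x00" * 32 + b"\x1f\x8b\x08\x00" + b"\xaa" * 60 + b"sqsh" + b"\x00" * 16
for offset, name in scan(firmware):
    print(f"0x{offset:08x}  {name}")
```

A naive scan like this produces false positives on real data, because a two-byte signature such as 1F 8B will occur by chance in any large file. Treat each hit as a hypothesis to check against the format's structure, not as a confirmed finding.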
Part 4: Endianness (15 min)
Endianness describes the byte order used to represent multi-byte values.
- Little-endian: the least-significant byte is stored at the lowest address. Intel x86 and x86-64 are little-endian. ARM can be either; most ARM operating systems configure it as little-endian.
- Big-endian: the most-significant byte is stored at the lowest address. Classic network byte order (TCP/IP) is big-endian. MIPS and PowerPC (older) are big-endian by default.
Why this matters in RE: you are often reading a 4-byte or 8-byte value from a hex dump and need to interpret it as an integer. On a little-endian system, 01 00 00 00 is the integer 1, not 0x01000000. The ELF header includes an endianness byte (EI_DATA, at offset 0x05 in the e_ident field) that tells you how to interpret the rest of the file: 1 means little-endian, 2 means big-endian.
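As a sketch of that workflow, the snippet below reads the endianness byte from the sample header used earlier in this module and then decodes a two-byte field with the byte order it announces. The field chosen is e_machine at offset 0x12; the variable names are illustrative.

```python
header = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e00010000001011000000000000"
)

assert header[:4] == b"\x7fELF", "not an ELF file"

ei_data = header[0x05]                        # 1 = little-endian, 2 = big-endian
byteorder = {1: "little", 2: "big"}[ei_data]

# e_machine is a 2-byte field at offset 0x12.
e_machine = int.from_bytes(header[0x12:0x14], byteorder)
misread = int.from_bytes(header[0x12:0x14], "big")   # deliberately wrong order

print(byteorder)        # little
print(hex(e_machine))   # 0x3e, the x86-64 machine code
print(hex(misread))     # 0x3e00, what you get if you ignore the endianness byte
```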
xxd by default prints bytes in file order. The -e flag outputs in little-endian 32-bit word order, which is useful when you want each group to read as the integer a little-endian CPU would load from those four bytes.
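The regrouping that -e performs can be approximated as follows. This is a sketch of the idea, not xxd's code; the function name le_words is an assumption, and any trailing bytes that do not fill a whole 4-byte group are simply ignored here.

```python
def le_words(data: bytes) -> list[str]:
    """Render each 4-byte group as the 32-bit value a little-endian read yields."""
    words = []
    for i in range(0, len(data) - len(data) % 4, 4):
        value = int.from_bytes(data[i:i + 4], "little")
        words.append(f"{value:08x}")
    return words

first_row = bytes.fromhex("7f454c46020101000000000000000000")

file_order = " ".join(first_row.hex()[i:i + 4] for i in range(0, 32, 4))
word_order = " ".join(le_words(first_row))

print(file_order)   # 7f45 4c46 0201 0100 0000 0000 0000 0000
print(word_order)   # 464c457f 00010102 00000000 00000000
```

Note that the magic number looks reversed in word order. That is expected: a magic number is a byte sequence, not an integer, so file order is the right view for it.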
Lab walk: hex dump orientation (~1 hr, ungraded)
This is an instructor-led lab walk, not a graded exercise. The instructor opens four files in xxd:
- An ELF binary (the week's provided binary or a system binary like /bin/ls)
- A PNG image
- A ZIP file
- A file with a deliberately-corrupt or non-standard magic number
For each: identify the magic number, interpret the first 16 bytes as much as possible, and note where the ASCII sidebar in xxd gives useful hints versus where it does not.
Students follow along and write xxd observations in their Tool Journal.
Independent practice (~3 hr)
- Tool Journal: Document xxd. Record the default output format and the three flags from the reading (-e, -l, -s). Add one concrete observation from the lab walk: a specific offset in a specific file where the hex dump told you something the file command did not.
- Magic number table: Build your own magic-number reference in your lab notebook or Tool Journal. Start with the eight formats from the lecture. Add at least three more from Gary Kessler's file signatures table (a long-running reference; search for "Gary Kessler file signatures"). You will refer to this table when you see unfamiliar firmware images in Weeks 12-14.
- Endianness exercise: In xxd, look at the two bytes at offset 0x10 of /bin/ls -- that is the e_type field in the ELF header. (xxd -l 18 /bin/ls shows the first 18 bytes; the two bytes at offset 0x10 are the last two.) What value does it have in little-endian byte order? What does that value mean according to the ELF spec? (Values to know: ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4. Most Linux binaries built in the last decade are ET_DYN even when they are executables.)
Reflection prompts
- Why does the file command sometimes get the format wrong? What would cause it to misidentify a file? (Think about what the command actually does -- it looks at magic bytes. What could go wrong?)
- Endianness is a design choice. The network protocol designers who defined TCP/IP chose big-endian. Intel chose little-endian for x86. What are the costs and benefits of standardizing on one byte order globally? Why do you think we still have both?
- A student says "I can just open the binary in a hex editor and search for the password -- it will be in there as a string." Under what conditions is this correct? Under what conditions does it fail? What would a developer do to make this approach fail?
Week 2 of 14. Next: ELF format in depth -- sections, segments, symbol tables, stripped binaries.