"The specification exists. It is embedded in the transmitted bits. Your job is to read it out -- one frame at a time, one field at a time, with evidence for every claim." -- RF-301 course doctrine, SIGINT Stage 5
Lecture (90 min)
9.1 Beyond Stage 5: What Happens After the Hypothesis
Week 7's SIGINT pipeline ends at Stage 5 with a protocol hypothesis: "this signal resembles a 433 MHz OOK sensor; the payload is 24 bits; the byte at offset 8 looks like a counter." That hypothesis is evidence-grounded but not verified. Verification requires doing what a protocol engineer would do: characterize the protocol completely enough to predict the content of messages you have not yet seen.
The 2010 TPMS reverse engineering work by Rouf et al. is the canonical example. The researchers had no documentation. They captured TPMS transmissions from commercial vehicles, extracted the demodulated bit stream, and identified the tire ID field (fixed per wheel), the pressure field (correlated with tire pressure changes they induced), and the checksum field (residual byte that validated a CRC-8 assumption). The protocol specification they produced was complete enough to build a reader and, subsequently, to demonstrate tracking attacks against vehicles using their tire IDs.
This is the methodology: capture evidence, form hypotheses, test hypotheses by inducing known changes and observing the bit-level effect, and iterate until every field is accounted for.
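The "induce a known change and observe the bit-level effect" step can be sketched as a bitwise comparison of one frame captured before the change and one captured after it. This is a minimal illustration, assuming the frames are already demodulated, aligned, and of equal length; the function name and the two sample frames are hypothetical.

```python
def changed_bit_positions(before: list[int], after: list[int]) -> list[int]:
    """Return the indices where two aligned, equal-length bit frames differ."""
    if len(before) != len(after):
        raise ValueError("frames must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Hypothetical 16-bit frames captured before and after one button press.
frame_before = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
frame_after = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

print(changed_bit_positions(frame_before, frame_after))  # [13, 14, 15]
```

In this made-up pair only the last three bits change, and the low nibble goes from 0011 to 0100, which is what a counter in that position would do. Real frames usually show a second changed region as well, because the checksum follows the payload.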
9.2 Protocol State Machine Extraction
A protocol state machine is the set of message types, their sequencing, and the conditions that trigger transitions. For a simple sensor protocol (TPMS, weather station, key fob), the state machine may be trivial: one message type, one direction, no response. For a bidirectional protocol (rolling code entry system, RFID reader), the state machine has at least two states (challenge and response) and the field values change per transaction.
Approach:
- Record many captures: multiple transmissions from the same device, from multiple devices of the same type, and across time.
- Align captures at the preamble. All captures from the same device should have identical bits in fixed fields.
- Identify fixed fields vs variable fields: bits that do not change across captures from the same device are candidates for fixed identifiers (device ID, model code). Bits that change are candidates for variable fields (counter, measurement, checksum).
- Induce known changes: change battery voltage, temperature, pressure, or use a known sequence (press button 3 times). Observe which bits change and in what direction.
- Document the state machine as a diagram: states are transmitter conditions; transitions are labeled with the triggering event and the message type emitted.
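import numpy as np

def align_captures(capture_list: list[list[int]], preamble_len: int) -> np.ndarray:
    """
    Align multiple bit captures at their preamble.
    Returns a 2D array where each row is one aligned capture.
    Captures in which no preamble is found are dropped.
    """
    # Example preamble: alternating 1/0 pattern, preamble_len bits long
    # (preamble_len=8 gives 1 0 1 0 1 0 1 0, i.e. one byte)
    preamble = ([1, 0] * preamble_len)[:preamble_len]
    aligned = []
    for bits in capture_list:
        bits = list(bits)
        # Find preamble start (first occurrence of the known pattern)
        for i in range(len(bits) - len(preamble) + 1):
            if bits[i:i + len(preamble)] == preamble:
                aligned.append(bits[i:])
                break
    if not aligned:
        raise ValueError("preamble not found in any capture")
    # Trim to minimum length so the rows form a rectangular array
    min_len = min(len(r) for r in aligned)
    return np.array([r[:min_len] for r in aligned], dtype=np.uint8)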
def field_variability_map(aligned: np.ndarray) -> np.ndarray:
    """
    For each bit position, compute the fraction of captures that differ from capture 0.
    High variability → variable field candidate.
    Low variability → fixed field candidate.
    """
    return np.mean(aligned != aligned[0], axis=0)
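A per-bit variability map becomes easier to read once adjacent positions of the same kind are grouped into candidate fields. The sketch below is one way to do that in plain Python; the function name and the threshold parameter are assumptions for illustration, and the input is any sequence of per-bit variability values.

```python
def segment_fields(variability: list[float], threshold: float = 0.0) -> list[tuple[int, int, str]]:
    """
    Group adjacent bit positions into (start, width, kind) runs.
    kind is "fixed" if variability <= threshold, otherwise "variable".
    """
    fields = []
    start = 0
    for i in range(1, len(variability) + 1):
        at_end = i == len(variability)
        # Close the current run at the end of input or when the kind flips
        if at_end or (variability[i] > threshold) != (variability[start] > threshold):
            kind = "variable" if variability[start] > threshold else "fixed"
            fields.append((start, i - start, kind))
            start = i
    return fields


# Hypothetical variability map for a 10-bit frame
example_map = [0.0, 0.0, 0.0, 0.0, 0.5, 0.4, 0.6, 0.5, 0.0, 0.0]
print(segment_fields(example_map))
# [(0, 4, 'fixed'), (4, 4, 'variable'), (8, 2, 'fixed')]
```

The boundaries are candidates only: two neighbouring variable fields merge into one run, and the high bits of a slow counter look fixed until enough captures are collected.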
9.3 Checksum and CRC Recovery
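Most RF protocols that need reliable delivery include some form of error detection. The most common are:
- Additive or XOR checksum: the sum of all payload bytes modulo 256, or the bytewise XOR of all payload bytes (8-bit). Easy to identify: change one byte and an additive checksum changes by the same amount modulo 256, while an XOR checksum flips exactly the bits that flipped in the payload.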
- CRC-8: polynomial division; 256 possible polynomials of the standard form x^8 + ...; crcmod can enumerate them.
- CRC-16/CRC-32: same principle, wider.
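Before enumerating CRCs, it is cheap to rule the two simplest schemes in or out. The sketch below checks a byte sum modulo 256 and a bytewise XOR against several captured frames; the function names and the frame values are made up for illustration.

```python
from functools import reduce


def additive_checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) & 0xFF


def xor_checksum(payload: bytes) -> int:
    """Bytewise XOR of all payload bytes."""
    return reduce(lambda acc, b: acc ^ b, payload, 0)


def classify_simple_checksum(frames: list[tuple[bytes, int]]) -> list[str]:
    """Return the names of the simple schemes that match every (payload, checksum) pair."""
    candidates = {
        "additive (sum mod 256)": additive_checksum,
        "XOR of bytes": xor_checksum,
    }
    return [name for name, fn in candidates.items()
            if all(fn(payload) == check for payload, check in frames)]


# Hypothetical frames: the last payload byte was incremented by one between captures
example_frames = [
    (bytes([0x3A, 0x01, 0xB2]), 0xED),
    (bytes([0x3A, 0x01, 0xB3]), 0xEE),
]
print(classify_simple_checksum(example_frames))  # ['additive (sum mod 256)']
```

An empty result means neither scheme fits as-is; the checksum may cover a different byte range, be inverted, or be a CRC.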
CRC enumeration with crcmod:
import crcmod
def find_crc_polynomial(payload: bytes, checksum_byte: int) -> list[dict]:
    """
    Try a list of common CRC-8 polynomials to find which ones match the observed checksum.
    Returns a list of matching parameter sets. A single frame can match several
    variants by chance; confirm candidates against additional captures.
    """
    # Common CRC-8 polynomials (expressed as integers including the leading x^8 bit)
    polynomials = [0x107, 0x131, 0x11d, 0x12f, 0x197, 0x1a6, 0x1b8, 0x1d3, 0x1e7]
    matches = []
    for poly in polynomials:
        for initCrc in [0x00, 0xFF]:
            for rev in [True, False]:
                try:
                    crc_fn = crcmod.mkCrcFun(poly, initCrc=initCrc, rev=rev, xorOut=0x00)
                except ValueError:
                    # Skip polynomials crcmod rejects (unsupported degree)
                    continue
                if crc_fn(payload) == checksum_byte:
                    matches.append({
                        "polynomial": hex(poly),
                        "initCrc": hex(initCrc),
                        "rev": rev,
                    })
    return matches
# Usage example
payload = bytes([0x3A, 0x01, 0xB2, 0x04, 0x00])
checksum = 0xC7
matches = find_crc_polynomial(payload, checksum)
print(f"Found {len(matches)} matching CRC-8 variant(s):", matches)
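With an 8-bit checksum, a single frame will often match several parameter sets by coincidence, so a search is only convincing when the same parameters reproduce the checksum of every captured frame. The following is a dependency-free sketch of that idea: a bitwise CRC-8 and a brute-force search over all 255 non-zero polynomials. Note that here the polynomial is written without the leading x^8 bit (0x31 rather than 0x131), the search covers only two initial values and no final XOR, and the function names and sample payloads are hypothetical.

```python
def reflect8(value: int) -> int:
    """Reverse the bit order of an 8-bit value."""
    return int(f"{value:08b}"[::-1], 2)


def crc8(data: bytes, poly: int, init: int = 0x00, reflect: bool = False) -> int:
    """Bitwise CRC-8. `poly` omits the leading x^8 term (e.g. 0x07, 0x31)."""
    crc = init
    for byte in data:
        crc ^= reflect8(byte) if reflect else byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return reflect8(crc) if reflect else crc


def search_crc8(frames: list[tuple[bytes, int]]) -> list[tuple[int, int, bool]]:
    """Return every (poly, init, reflect) that reproduces the checksum of ALL frames."""
    hits = []
    for poly in range(1, 256):
        for init in (0x00, 0xFF):  # bit-symmetric values, unaffected by reflection
            for reflect in (False, True):
                if all(crc8(p, poly, init, reflect) == c for p, c in frames):
                    hits.append((poly, init, reflect))
    return hits


# Synthetic frames: checksums generated with poly 0x31, init 0x00, reflected
payloads = [bytes([0x3A, 0x01, 0xB2]), bytes([0x3A, 0x02, 0xB5]), bytes([0x3B, 0x07, 0x10])]
frames = [(p, crc8(p, 0x31, reflect=True)) for p in payloads]

print(len(search_crc8(frames[:1])), "candidate(s) from one frame")
print(len(search_crc8(frames)), "candidate(s) from three frames")
```

The candidate count can only shrink as frames are added. Whatever survives is a hypothesis to record in the spec with its confidence level, not a proof.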
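LFSR and scrambler identification. Some protocols use a Linear Feedback Shift Register (LFSR) to whiten the bit stream before transmission. A whitened stream has a near-uniform bit distribution (approximately 50% zeros, 50% ones). If your bit stream after OOK demodulation is close to 50/50 with no long runs, even in fields you expect to be mostly zero (padding, the high bits of a small counter), suspect whitening; a highly skewed distribution (80% zeros or similar) suggests the stream is not whitened. Try LFSR descrambling with common whitening polynomials first, such as the Bluetooth LE whitener (x^7 + x^4 + 1) or the PN9 sequence (x^9 + x^5 + 1) used by IEEE 802.15.4g and many sub-GHz transceivers.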
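A whitening hypothesis can be tested by measuring the fraction of ones and then XORing the stream with a candidate whitening sequence to see whether long runs or recognizable values reappear. The sketch below generates the PN9 sequence (x^9 + x^5 + 1) as one candidate. The seed of 0x1FF and the LSB-first bit packing are assumptions that match a common sub-GHz transceiver implementation; other devices may use a different seed, bit order, or polynomial. All function names are hypothetical.

```python
def ones_fraction(bits: list[int]) -> float:
    """Fraction of bits equal to 1; a whitened stream sits near 0.5."""
    return sum(bits) / len(bits)


def pn9_bytes(n: int) -> bytes:
    """First n bytes of a PN9 sequence (x^9 + x^5 + 1, seed 0x1FF, LSB first)."""
    state = 0x1FF
    out = bytearray()
    for _ in range(n):
        byte = 0
        for bit_index in range(8):
            byte |= (state & 1) << bit_index          # output bit is the register LSB
            feedback = (state ^ (state >> 5)) & 1     # taps at bit 0 and bit 5
            state = (state >> 1) | (feedback << 8)    # shift right, feed into bit 8
        out.append(byte)
    return bytes(out)


def pn9_whiten(data: bytes) -> bytes:
    """XOR data with the PN9 sequence; the same call also de-whitens."""
    return bytes(d ^ k for d, k in zip(data, pn9_bytes(len(data))))


print(pn9_bytes(3).hex())                 # ffe11d
print(pn9_whiten(pn9_whiten(b"\x3a\x01\xb2")).hex())  # 3a01b2
```

Because whitening is a plain XOR with a fixed sequence, applying it twice restores the input, so the same function serves for both directions.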
9.4 Bit Encoding Schemes
The RF physical layer may not transmit raw NRZ bits. Common encoding schemes and how to detect them:
| Encoding | How it looks | Detection |
|---|---|---|
| NRZ (non-return to zero) | Bit value = signal level; 1=high, 0=low | Autocorrelation peak at symbol period |
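| Manchester | Each bit has a mid-bit transition; 1=high→low (`10`), 0=low→high (`01`) in the convention used here (IEEE 802.3 uses the inverse) | Double the apparent symbol rate; spectral null at DC |
| Differential Manchester | Transition in the middle of every bit period; presence or absence of a transition at the bit boundary encodes the bit | Same symbol-rate doubling as Manchester; decoding is insensitive to polarity |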
| 4B/6B | 4 data bits encoded as one of 16 valid 6-bit patterns | 6 symbols per 4 bits; look-up table approach |
| PWM / pulse-width | 0=short pulse, 1=long pulse (or vice versa) | Variable-width pulses in OOK envelope |
import numpy as np

def try_encodings(bit_stream: np.ndarray) -> dict:
    """
    Try common bit encoding schemes and return decoded bits for each.
    """
    results = {}
    # NRZ: bits as-is
    results["NRZ"] = bit_stream.copy()
    # Manchester: decode pairs of symbols; 10 → 1, 01 → 0
    if len(bit_stream) % 2 == 0:
        pairs = bit_stream.reshape(-1, 2)
        # Valid only if every pair contains a transition (no 00 or 11)
        valid_manchester = np.all(pairs[:, 0] != pairs[:, 1])
        if valid_manchester:
            results["Manchester"] = (pairs[:, 0] == 1).astype(np.uint8)
    # Differential: 1 where the level changed from the previous symbol
    # (absolute difference of adjacent bits, equivalent to XOR)
    diff_decoded = np.abs(np.diff(bit_stream.astype(np.int8))).astype(np.uint8)
    results["Differential"] = diff_decoded
    return results
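The PWM row of the table needs a different approach, because the information is in pulse duration rather than in symbol pairs. The following is a minimal sketch, assuming a thresholded OOK envelope sampled as 0/1 values and exactly two pulse widths; the function names, the midpoint threshold, and the sample data are illustrative assumptions.

```python
def run_lengths(samples: list[int]) -> list[tuple[int, int]]:
    """Collapse a 0/1 sample stream into (level, length) runs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


def decode_pwm(samples: list[int], long_is_one: bool = True) -> list[int]:
    """Classify each high pulse as short or long and map it to a bit."""
    highs = [length for level, length in run_lengths(samples) if level == 1]
    if not highs:
        return []
    threshold = (min(highs) + max(highs)) / 2  # midpoint between short and long
    return [int((width > threshold) == long_is_one) for width in highs]


# Hypothetical envelope: short pulse, long pulse, short pulse
envelope = [1, 1, 0, 0, 0, 0,
            1, 1, 1, 1, 0, 0,
            1, 1, 0, 0, 0, 0]
print(decode_pwm(envelope))                     # [0, 1, 0]
print(decode_pwm(envelope, long_is_one=False))  # [1, 0, 1]
```

The midpoint threshold fails if a capture contains only one pulse width, so it is worth plotting a histogram of pulse widths on a real capture before trusting the decoded bits.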
Effect on find_preamble() from Lab 7. The Lab 7 preamble search assumed NRZ. A Manchester-encoded preamble of 1010 1010 becomes 10 01 10 01 10 01 10 01 at the physical layer -- 16 symbols instead of 8 bits. The Stage 4 symbol structure analysis must decode the encoding before searching for the preamble pattern.
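An alternative to decoding first is to encode the expected preamble and search for it directly in the symbol stream. The sketch below shows this with the 1 → 10, 0 → 01 convention from the example above; `find_pattern` is a hypothetical stand-in for illustration and is not the Lab 7 `find_preamble()` implementation.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Encode bits as symbol pairs: 1 -> 10, 0 -> 01."""
    out = []
    for b in bits:
        out.extend([1, 0] if b else [0, 1])
    return out


def find_pattern(stream: list[int], pattern: list[int]) -> int:
    """Return the index of the first occurrence of pattern in stream, or -1."""
    for i in range(len(stream) - len(pattern) + 1):
        if stream[i:i + len(pattern)] == pattern:
            return i
    return -1


preamble = [1, 0, 1, 0, 1, 0, 1, 0]
encoded = manchester_encode(preamble)      # 16 symbols: 10 01 10 01 10 01 10 01

# Hypothetical symbol stream: three idle symbols, the encoded preamble, then data
stream = [0, 0, 0] + encoded + [1, 0]

print(find_pattern(stream, preamble))  # -1: the NRZ pattern is not present
print(find_pattern(stream, encoded))   # 3: the encoded pattern is found
```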
9.5 Protocol Specification Writing
A complete protocol specification includes:
- Message format table. For each field: name, bit offset, bit width, encoding, known values.
- State machine diagram. States = transmitter conditions; transitions = events (timer, sensor change, button press); arc labels = message type emitted.
- Field semantics. For each variable field: what physical quantity it represents; the encoding (raw binary, offset binary, 2's complement, BCD); calibration/scaling.
- Confidence annotation. For each field: CONFIRMED (tested with induced change), HYPOTHESIZED (consistent with data but not verified by induced change), UNKNOWN (residual bits not yet explained).
Template for a minimum-viable protocol spec:
## Protocol: [Name from your hypothesis, e.g., "ISM433-WS01 Weather Sensor"]
**Physical layer:** OOK, 433.92 MHz, 2.4 kbaud
**Preamble:** 8 × `10` pulses (Manchester sync marker)
**Sync word:** `0xAA 0xD4`
**Payload length:** 32 bits (4 bytes) + 8-bit CRC
| Field | Bit offset | Width | Encoding | Semantics | Confidence |
|---|---|---|---|---|---|
| Device ID | 0 | 12 | Raw binary | Fixed per sensor | CONFIRMED |
| Battery low | 12 | 1 | Boolean | 1=low | HYPOTHESIZED |
| Channel | 13 | 2 | Raw binary | 0-3 channel selector | CONFIRMED |
| Temperature | 15 | 12 | Offset binary (offset 400) | Degrees C × 10; subtract 400, then divide by 10 | CONFIRMED |
| Unknown | 27 | 5 | Unknown | Residual bits, not yet explained | UNKNOWN |
| Checksum | 32 | 8 | CRC-8/MAXIM | Covers bytes 0-3 | CONFIRMED |
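A message format table translates almost directly into a parser, which is a useful check that the offsets and widths are self-consistent. The sketch below encodes the first four rows of the template. It assumes fields are read most significant bit first and that the temperature in degrees C is (raw - 400) / 10; both are interpretations of the template rather than confirmed properties of any real device, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    offset: int      # bit offset from the start of the payload
    width: int       # width in bits
    confidence: str  # CONFIRMED / HYPOTHESIZED / UNKNOWN


FIELDS = [
    Field("device_id", 0, 12, "CONFIRMED"),
    Field("battery_low", 12, 1, "HYPOTHESIZED"),
    Field("channel", 13, 2, "CONFIRMED"),
    Field("temperature_raw", 15, 12, "CONFIRMED"),
]


def extract(bits: list[int], field: Field) -> int:
    """Read one field as an unsigned integer, MSB first (assumed bit order)."""
    value = 0
    for b in bits[field.offset:field.offset + field.width]:
        value = (value << 1) | b
    return value


def parse_frame(bits: list[int]) -> dict:
    """Extract every field in FIELDS and add the scaled temperature."""
    parsed = {f.name: extract(bits, f) for f in FIELDS}
    parsed["temperature_c"] = (parsed["temperature_raw"] - 400) / 10
    return parsed


def to_bits(value: int, width: int) -> list[int]:
    """Helper for building test frames: integer to MSB-first bit list."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


# Synthetic frame: ID 0xABC, battery OK, channel 2, 21.5 degrees C (raw 615)
frame = to_bits(0xABC, 12) + to_bits(0, 1) + to_bits(2, 2) + to_bits(615, 12)
print(parse_frame(frame))
```

Keeping the confidence label next to each field definition means the parser and the written spec cannot silently drift apart.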
9.6 GNU Radio Reimplementation
Once you have a protocol specification, you can implement a receiver in GNU Radio. The key blocks:
- digital.correlate_access_code_bb: searches for the sync word in the bit stream; marks the start of frames
- digital.packet_headerparser_b: parses header fields per a format specification
- pdu.pdu_filter: filters PDUs by field value
- PDU stream → blocks.pdu_to_tagged_stream: converts parsed PDUs to a tagged stream for further processing
A working GNU Radio demodulator built from a reverse-engineered protocol specification is the capstone's Tier 1 gate: make demod must run and produce PDUs. Lab 9's protocol spec is the prerequisite for this demodulator.
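Before wiring up a flowgraph, it can help to prototype the frame-sync step offline on a demodulated bit list. The sketch below is plain Python, not GNU Radio code: it illustrates the idea behind an access-code correlator by sliding the sync word over the stream and accepting positions within a Hamming-distance tolerance. The function names, the `max_errors` parameter, and the sample stream are hypothetical; the sync word is the one from the template above.

```python
def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into a bit list, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def find_sync(bits: list[int], sync: list[int], max_errors: int = 0) -> list[int]:
    """Return payload start indices: the position just after each sync-word match."""
    starts = []
    n = len(sync)
    for i in range(len(bits) - n + 1):
        errors = sum(a != b for a, b in zip(bits[i:i + n], sync))
        if errors <= max_errors:
            starts.append(i + n)
    return starts


sync_word = bytes_to_bits(bytes([0xAA, 0xD4]))

# Hypothetical stream: five idle bits, the sync word, then four payload bits
stream = [0, 0, 0, 0, 0] + sync_word + [1, 1, 0, 0]
print(find_sync(stream, sync_word))  # [21]

# The same stream with one corrupted sync bit is still found with a tolerance of 1
noisy = stream.copy()
noisy[6] ^= 1
print(21 in find_sync(noisy, sync_word, max_errors=1))  # True
```

Raising the tolerance trades missed frames for false starts, which the checksum stage then has to reject.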
9.7 Architecture Comparison Sidebar
| Protocol | Documentation | Auth | Replay protection | RE difficulty |
|---|---|---|---|---|
| Bluetooth LE advertisement | Public (Bluetooth SIG) | None for ADV_IND; pairing for GATT | None for broadcasts | Low (full spec exists) |
| Zigbee | Public (IEEE 802.15.4) | Optional (network layer key) | Sequence number (weak) | Low (full spec + Wireshark) |
| Z-Wave | Licensed (Silicon Labs) | S2 security (ECDH) | Nonce-based (strong) | Medium (spec available with NDA) |
| 433 MHz OOK sensor | Proprietary (no public spec) | None | None | Medium-High (Lab 9 methodology) |
| APRS over AX.25 | Public (ham radio standard) | None | None | Very low (full spec + aprs.fi) |
| KeeLoq rolling code | Proprietary (Microchip) | KeeLoq algorithm | Rolling code (weak -- breaks exist) | High (algorithm known; seed recovery required) |
Lab Preview
Lab 9 applies this methodology to ism433-mystery.cf32: an instructor-provided 30-second capture of a proprietary OOK 433 MHz device. See labs/lab-9.md.
Toolchain Diary Prompt
New this week: crcmod for CRC polynomial enumeration; numpy.diff() for differential decoding; digital.correlate_access_code_bb in GNU Radio. Compare the protocol spec you produce in Lab 9 to the Bluetooth LE advertisement format (public spec) -- what fields are present in both? What fields are present only in the open protocol and not in the proprietary one, and why?