Lab 9: ISM Protocol Reverse Engineering

Week: 11a -- Protocol RE Cross-Cut
Points: 20
Time estimate: 90 min lab + 2 hr independent
Deliverable: lab-9-report.md + protocol specification document


Objectives

  1. Demodulate a proprietary OOK 433 MHz signal from a provided IQ capture.
  2. Extract the bit stream and test NRZ vs Manchester encoding.
  3. Identify frame structure: preamble, sync word, payload boundaries, checksum field.
  4. Enumerate CRC-8 polynomial variants to recover the checksum algorithm.
  5. Produce a complete protocol specification document.

Target Signal

The instructor provides ism433-mystery.cf32: a 20-second capture at 2.4 MSPS, center frequency 433.92 MHz. This is a real ISM-band device with a proprietary OOK protocol. You have no documentation. Your task is to produce the documentation.

# Verify the file is present
ls -lh ism433-mystery.cf32
# Expected: ~384 MB (20 s × 2.4 MSPS × 2 float32 components (I and Q) × 4 bytes each)

python3 -c "
import numpy as np
iq = np.fromfile('ism433-mystery.cf32', dtype=np.complex64)
print(f'Samples: {len(iq):,}')
print(f'Duration: {len(iq)/2.4e6:.1f} sec')
print(f'Sample rate: 2.4 MSPS')
"

Part A: Bit Extraction (6 points)

A.1 Signal Survey

Run a basic spectrum survey to confirm the signal parameters:

#!/usr/bin/env python3
"""Lab 9 Part A: OOK bit extraction."""
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import welch

iq = np.fromfile('ism433-mystery.cf32', dtype=np.complex64)
fs = 2.4e6

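# Spectrum survey: a single FFT over the first NFFT samples. If that window
# contains no burst, move the slice to a region where the device is transmitting.
NFFT = 8192
psd = np.abs(np.fft.fftshift(np.fft.fft(iq[:NFFT], NFFT)))**2 / NFFT
f_kHz = np.fft.fftshift(np.fft.fftfreq(NFFT, 1/fs)) / 1e3

# Plot the survey for the Part A deliverable (the output filename is an arbitrary choice)
plt.figure(figsize=(10, 4))
plt.plot(f_kHz, 10 * np.log10(psd + 1e-20))
plt.xlabel("Frequency offset from 433.92 MHz (kHz)")
plt.ylabel("Power (dB, uncalibrated)")
plt.title("Spectrum Survey")
plt.tight_layout()
plt.savefig('spectrum_survey.png', dpi=150)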

# Symbol rate estimation via envelope PSD
power = np.abs(iq)**2
f_env, Pxx_env = welch(power, fs=fs, nperseg=65536, noverlap=32768)
noise_floor = np.percentile(Pxx_env, 50)
sym_candidates = f_env[(Pxx_env > 5 * noise_floor) & (f_env > 1e3)]

print(f"Symbol rate candidates (Hz): {sym_candidates[:5]}")
sym_rate_est = sym_candidates[0] if len(sym_candidates) > 0 else 4800
print(f"Using symbol rate estimate: {sym_rate_est/1e3:.2f} kBaud")
sps = fs / sym_rate_est
print(f"Samples per symbol: {sps:.1f}")
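The candidate list above is only a hint: the first bin over the threshold is not necessarily the symbol rate. As a cross-check, the sketch below estimates samples per symbol from run lengths in a thresholded (0/1) envelope, assuming the shortest clean runs are single symbols. The helper name `estimate_sps_from_runs` and the 1.5× grouping factor are illustrative choices, not part of the lab code, and noise glitches shorter than a symbol would have to be filtered out first.

```python
import numpy as np

def estimate_sps_from_runs(binary):
    """Estimate samples per symbol from run lengths of a 0/1 sample stream.

    Hypothetical helper: assumes the shortest runs between level changes
    are single symbols and that glitches have already been removed.
    Returns None if fewer than two edges are present.
    """
    binary = np.asarray(binary, dtype=np.int8)
    edges = np.flatnonzero(np.diff(binary)) + 1   # sample indices where the level changes
    if len(edges) < 2:
        return None
    runs = np.diff(edges)                          # lengths of runs bounded by two edges
    shortest = runs.min()
    single = runs[runs < 1.5 * shortest]           # runs that look like one symbol
    return float(np.median(single))

# Synthetic check: symbols 1,0,1,1,0,0,1,0 at 10 samples per symbol
demo = np.repeat([1, 0, 1, 1, 0, 0, 1, 0], 10)
sps_demo = estimate_sps_from_runs(demo)
print(f"Estimated samples per symbol: {sps_demo}")
```

On the real capture, pass the thresholded envelope of a single burst; dividing fs by the result gives a symbol rate to compare against the PSD candidates.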

A.2 OOK Demodulation

OOK (On-Off Keying) is AM with one of the two amplitude levels being zero. Demodulation: take the envelope, apply a threshold, downsample to one sample per symbol.

def ook_demodulate(iq, fs, sym_rate, threshold_factor=0.5):
    """
    Demodulate OOK signal to bits.
    
    Args:
        iq: complex IQ samples
        fs: sample rate (Hz)
        sym_rate: estimated symbol rate (Hz)
        threshold_factor: threshold as a fraction of the normalized
            envelope range (0.5 = midway between the off and on levels)
    
    Returns:
        bits: numpy array of 0/1 values, one per symbol
    """
    sps = int(round(fs / sym_rate))
    envelope = np.abs(iq)
    
    # Normalize the envelope to roughly 0..1 using robust percentiles
    env_max = np.percentile(envelope, 99)
    env_min = np.percentile(envelope, 1)
    env_norm = (envelope - env_min) / (env_max - env_min + 1e-10)
    
    # Threshold: the envelope is already normalized, so the factor is the threshold itself
    threshold = threshold_factor
    binary = (env_norm > threshold).astype(np.uint8)
    
    # Downsample: take one sample per symbol at symbol center
    n_symbols = len(binary) // sps
    bits = np.array([binary[(i * sps) + sps // 2] for i in range(n_symbols)])
    
    return bits

bits_nrz = ook_demodulate(iq, fs, sym_rate_est)
print(f"Extracted {len(bits_nrz):,} bits")
print(f"First 64 bits (NRZ assumption): {''.join(map(str, bits_nrz[:64]))}")
print(f"Bit distribution: {np.mean(bits_nrz):.3f} (should be near 0.5 for good signal)")

A.3 Test Manchester Decoding

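A skewed bit distribution from A.2 (< 0.4 or > 0.6) is only a weak hint, because the silence between packets pulls the mean toward 0 and a Manchester chip stream is balanced by construction. Test for Manchester encoding directly by checking how many chip pairs are 10 or 01: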

def try_manchester_decode(bits):
    """
    Try Manchester decoding: 10 → 1, 01 → 0.
    Returns (decoded_bits, valid_fraction).
    """
    if len(bits) % 2 != 0:
        bits = bits[:-1]
    
    pairs = bits.reshape(-1, 2)
    valid = np.all(pairs[:, 0] != pairs[:, 1], axis=1)
    valid_fraction = np.mean(valid)
    
    # Only decode valid pairs (skip invalid)
    decoded = pairs[valid, 0]  # first bit of pair
    return decoded, valid_fraction

bits_manchester, valid_frac = try_manchester_decode(bits_nrz)
print(f"\nManchester decode test:")
print(f"  Valid pair fraction: {valid_frac:.3f} (>0.8 suggests Manchester encoding)")
print(f"  Decoded bits: {len(bits_manchester):,}")
if valid_frac > 0.8:
    print(f"  → Manchester encoding likely. Using decoded bits.")
    bits = bits_manchester
else:
    print(f"  → NRZ encoding likely. Keeping raw bits.")
    bits = bits_nrz

print(f"\nFinal bit distribution: {np.mean(bits):.3f}")
print(f"First 64 bits (final): {''.join(map(str, bits[:64]))}")
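`try_manchester_decode` pairs chips starting at index 0. If the stream begins one chip off, each pair straddles two data bits and the valid fraction can drop even for a genuinely Manchester-encoded signal. The sketch below scores both alignments and keeps the better one. It assumes the input holds two chips per data bit; `manchester_valid_fraction` and `best_manchester_offset` are hypothetical helpers, not part of the lab code.

```python
import numpy as np

def manchester_valid_fraction(chips, offset=0):
    """Fraction of chip pairs that are 01 or 10 when pairing starts at `offset`."""
    chips = np.asarray(chips, dtype=np.uint8)[offset:]
    chips = chips[: (len(chips) // 2) * 2]       # drop a trailing unpaired chip
    if len(chips) == 0:
        return 0.0
    pairs = chips.reshape(-1, 2)
    return float(np.mean(pairs[:, 0] != pairs[:, 1]))

def best_manchester_offset(chips):
    """Return (offset, valid_fraction) for the better of the two pair alignments."""
    fractions = [manchester_valid_fraction(chips, k) for k in (0, 1)]
    best = int(np.argmax(fractions))
    return best, fractions[best]

# Data bits 1,0,0,1,1,0 encoded as 10/01 pairs, preceded by one stray chip
chips_demo = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1])
offset_demo, frac_demo = best_manchester_offset(chips_demo)
print(offset_demo, frac_demo)
```

Report the valid fraction for both offsets in your evidence; a large gap between the two is itself a sign of Manchester encoding.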

Part A deliverable: Spectrum survey (1 plot), symbol rate estimate with evidence, choice of NRZ vs Manchester with the valid_fraction evidence. Include first 64 bits.


Part B: Frame Structure Analysis (7 points)

B.1 Align Captures

Find the preamble and align multiple packet starts:

def find_packet_boundaries(bits, min_silence=8):
    """
    Find packet start positions: the first 1 bit after a run of at least
    min_silence 0 bits (OOK carrier off) is taken as the start of a preamble.
    """
    boundaries = []
    in_silence = True
    silence_count = 0
    
    for i, b in enumerate(bits):
        if b == 0:  # silence (OOK off)
            silence_count += 1
            in_silence = True
        else:
            if in_silence and silence_count >= min_silence:
                boundaries.append(i)  # start of new packet
            silence_count = 0
            in_silence = False
    
    return boundaries

boundaries = find_packet_boundaries(bits)
print(f"Found {len(boundaries)} packet start candidates")
for b in boundaries[:5]:
    pkt = bits[b:b+100]
    print(f"  Offset {b:6d}: {''.join(map(str, pkt[:64]))}...")

B.2 Field Variability Analysis

Compare multiple packets from the capture to find fixed vs variable fields:

def align_captures(bits, boundaries, n_packets=20, frame_len=100):
    """
    Align packets at the preamble and return a 2D array (n_packets x frame_len).
    """
    packets = []
    for b in boundaries[:n_packets]:
        pkt = bits[b:b+frame_len]
        if len(pkt) == frame_len:
            packets.append(pkt)
    
    if not packets:
        return None
    
    return np.array(packets, dtype=np.uint8)

def field_variability_map(aligned):
    """
    For each bit position, compute the fraction of packets that differ from packet 0.
    Near 0: fixed field. Near 0.5: variable field. Near 1.0: anti-correlated.
    """
    return np.mean(aligned != aligned[0], axis=0)

n_packets = min(20, len(boundaries))
aligned = align_captures(bits, boundaries, n_packets=n_packets)

if aligned is not None:
    variability = field_variability_map(aligned)
    
    plt.figure(figsize=(14, 4))
    plt.bar(range(len(variability)), variability, width=1.0)
    plt.xlabel("Bit position in frame")
    plt.ylabel("Variability fraction (0=fixed, 0.5=variable)")
    plt.title("Field Variability Map")
    plt.axhline(0.1, color='g', linestyle='--', label='Fixed field threshold (0.1)')
    plt.axhline(0.4, color='r', linestyle='--', label='Variable field threshold (0.4)')
    plt.legend()
    plt.tight_layout()
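    # savefig does not create missing directories, so make sure the folder exists
    import os
    os.makedirs('lab9/plots', exist_ok=True)
    plt.savefig('lab9/plots/field_variability.png', dpi=150)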
    
    # Identify field regions
    fixed_bits = np.where(variability < 0.1)[0]
    variable_bits = np.where(variability > 0.4)[0]
    print(f"\nFixed bit positions (first 20): {fixed_bits[:20]}")
    print(f"Variable bit positions (first 20): {variable_bits[:20]}")
    
    # Show a reference packet
    ref_packet = aligned[0]
    print(f"\nReference packet (packet 0): {''.join(map(str, ref_packet))}")
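Lists of individual bit positions are hard to turn into a field table. One possible next step is to merge consecutive positions with the same classification into ranges, as sketched below. The function name `segment_fields` and the "mixed" label for values between the two thresholds are assumptions of this sketch; the thresholds mirror the 0.1 and 0.4 used above.

```python
def segment_fields(variability, fixed_max=0.1, variable_min=0.4):
    """Group consecutive bit positions with the same label into (start, end, label).

    `end` is inclusive. Labels: 'fixed' (< fixed_max), 'variable' (> variable_min),
    'mixed' for anything in between.
    """
    def label(v):
        if v < fixed_max:
            return "fixed"
        if v > variable_min:
            return "variable"
        return "mixed"

    segments = []
    for i, v in enumerate(variability):
        lab = label(v)
        if segments and segments[-1][2] == lab:
            segments[-1][1] = i           # extend the current run
        else:
            segments.append([i, i, lab])  # start a new run
    return [tuple(s) for s in segments]

demo_segments = segment_fields([0.0, 0.0, 0.0, 0.5, 0.45, 0.2, 0.0])
for start, end, lab in demo_segments:
    print(f"bits {start:3d}-{end:3d}: {lab}")
```

Treat the ranges as candidates only: the high-order bits of a slowly changing field can look fixed over a short capture.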

B.3 Byte-Level View

Group bits into bytes for easier analysis:

def bits_to_bytes(bits):
    """Convert bit array to bytes. Truncates to nearest byte boundary."""
    n_bytes = len(bits) // 8
    byte_array = []
    for i in range(n_bytes):
        byte_val = int(''.join(map(str, bits[i*8:(i+1)*8])), 2)
        byte_array.append(byte_val)
    return bytes(byte_array)

if aligned is not None:
    print("\nFirst 5 packets (bytes):")
    for i, pkt in enumerate(aligned[:5]):
        pkt_bytes = bits_to_bytes(pkt)
        hex_str = ' '.join(f'{b:02X}' for b in pkt_bytes)
        print(f"  Packet {i}: {hex_str}")
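The byte view is only meaningful if the byte boundaries line up with the real field boundaries. One way to locate the end of an alternating preamble is to count how many leading bits strictly alternate, as in this minimal sketch (`alternating_run_length` is a hypothetical helper). Note that the count can overshoot by one when the first bit of the next field happens to continue the alternation, as the demo shows.

```python
def alternating_run_length(bits):
    """Count leading bits that strictly alternate (1010... or 0101...)."""
    bits = list(bits)
    if not bits:
        return 0
    n = 1
    while n < len(bits) and bits[n] != bits[n - 1]:
        n += 1
    return n

# 8 alternating preamble bits followed by a field that starts with 1, 1
demo_bits = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]
preamble_demo = alternating_run_length(demo_bits)
print(preamble_demo)
```

Compare the result across several aligned packets before committing to a preamble width in your field table.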

Part B deliverable: lab9/plots/field_variability.png, the list of fixed vs variable bit positions, and a table identifying at least 3 candidate fields with your hypothesis for each (device ID, counter, payload, checksum).


Part C: Checksum Recovery (4 points)

C.1 Identify the Checksum Byte

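The last fixed-length field in a packet is typically the checksum. From your Part B analysis, identify which byte position you believe is the checksum, then enumerate CRC-8 parameter sets against it. The enumeration uses the third-party crcmod package (pip install crcmod):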

import crcmod

def find_crc_polynomial(payload: bytes, checksum_byte: int) -> list:
    """
    Enumerate CRC-8 variants to find which matches the observed checksum.
    Returns list of matching (polynomial, initCrc, rev, xorOut) combinations.
    """
    polynomials = [0x107, 0x131, 0x11d, 0x12f, 0x197, 0x1a6, 0x1b8, 0x1d3, 0x1e7]
    matches = []
    
    for poly in polynomials:
        for initCrc in [0x00, 0xFF, 0xA3, 0x5A]:
            for rev in [True, False]:
                for xorOut in [0x00, 0xFF]:
                    try:
                        crc_fn = crcmod.mkCrcFun(poly, initCrc=initCrc, rev=rev, xorOut=xorOut)
                        computed = crc_fn(payload)
                        if computed == checksum_byte:
                            matches.append({
                                "polynomial": hex(poly),
                                "initCrc": hex(initCrc),
                                "rev": rev,
                                "xorOut": hex(xorOut),
                            })
                    except Exception:
                        pass
    
    return matches

# TODO: fill in these values from your Part B analysis
# Example (adjust to your recovered field boundaries):
if aligned is not None:
    ref_pkt = aligned[0]
    pkt_bytes = bits_to_bytes(ref_pkt)
    
    # Adjust these indices based on your field analysis
    # payload = first N bytes (before checksum)
    # checksum = last byte
    for checksum_pos in range(len(pkt_bytes) - 1, 0, -1):
        payload = pkt_bytes[:checksum_pos]
        checksum = pkt_bytes[checksum_pos]
        matches = find_crc_polynomial(payload, checksum)
        if matches:
            print(f"Checksum at byte {checksum_pos}: 0x{checksum:02X}")
            print(f"Matching CRC variants: {matches[:3]}")
            break
    else:
        print("No CRC-8 match found. Consider:")
        print("  - Checksum position may be different")
        print("  - May be XOR checksum (not CRC)")
        print("  - May cover a different payload span")
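A match on a single packet is weak evidence. The enumeration tries 9 × 4 × 2 × 2 = 144 parameter sets against one 8-bit value, so roughly 144/256 ≈ 0.56 chance matches are expected per checksum position tested. A candidate is only convincing if it holds for every packet whose payload differs. The sketch below shows that check with a dependency-free bitwise CRC-8; `crc8`, `reflect8`, and `matches_all_packets` are illustrative stand-ins rather than the lab's crcmod-based function, and `poly` here omits the leading x^8 term (0x31 rather than 0x131).

```python
def reflect8(value):
    """Reverse the bit order of an 8-bit value."""
    return int(f"{value & 0xFF:08b}"[::-1], 2)

def crc8(data, poly, init=0x00, reflect=False, xor_out=0x00):
    """Bitwise CRC-8. `poly` omits the x^8 term (0x31 here, 0x131 in crcmod)."""
    if reflect:
        rpoly = reflect8(poly)
        crc = reflect8(init)
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ rpoly if crc & 1 else crc >> 1
    else:
        crc = init
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ xor_out

def matches_all_packets(packets, start, checksum_pos, **params):
    """True if crc8(pkt[start:checksum_pos]) == pkt[checksum_pos] for every packet."""
    return all(crc8(p[start:checksum_pos], **params) == p[checksum_pos] for p in packets)

# Synthetic packets: 3 payload bytes followed by a CRC computed with known parameters
true_params = dict(poly=0x07, init=0x00, reflect=False, xor_out=0x00)
payloads = [bytes([0x12, 0x34, i]) for i in range(8)]
demo_packets = [p + bytes([crc8(p, **true_params)]) for p in payloads]

print(matches_all_packets(demo_packets, 0, 3, **true_params))  # correct parameters
print(matches_all_packets(demo_packets, 0, 3, poly=0x1D))      # wrong polynomial
```

The `start` argument lets you test spans that exclude the preamble and sync word, which is one of the alternatives listed in the failure message above.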

C.2 XOR Checksum Test

If no CRC-8 matches, try a simple XOR checksum:

def test_xor_checksum(payload: bytes, checksum_byte: int) -> bool:
    """Test if checksum is XOR of all payload bytes."""
    xor_result = 0
    for b in payload:
        xor_result ^= b
    
    match = (xor_result == checksum_byte)
    print(f"XOR checksum test: computed 0x{xor_result:02X}, expected 0x{checksum_byte:02X} → {'MATCH' if match else 'no match'}")
    return match
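As with the CRC search, a simple checksum is only credible if it matches every packet. The sketch below runs the XOR test across a list of packets and, as an addition not covered in the lab text, also tries an 8-bit additive sum as one possible next step. All names here (`xor8`, `sum8`, `simple_checksum_candidates`) are hypothetical.

```python
def xor8(data):
    """XOR of all bytes."""
    result = 0
    for b in data:
        result ^= b
    return result

def sum8(data):
    """Sum of all bytes, modulo 256."""
    return sum(data) & 0xFF

SIMPLE_CHECKSUMS = {"xor8": xor8, "sum8": sum8}

def simple_checksum_candidates(packets, start, checksum_pos):
    """Names of simple checksums that match the byte at checksum_pos in every packet."""
    return [name for name, fn in SIMPLE_CHECKSUMS.items()
            if all(fn(p[start:checksum_pos]) == p[checksum_pos] for p in packets)]

# Two synthetic packets whose last byte is the XOR of the first three
demo_xor_packets = [bytes([0x12, 0x34, 0x56, 0x70]), bytes([0x01, 0x02, 0x04, 0x07])]
candidates_demo = simple_checksum_candidates(demo_xor_packets, 0, 3)
print(candidates_demo)
```

The second demo packet matches both checksums on its own; only the first one rules out the additive sum, which is why several packets are needed.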

Part C deliverable: The polynomial (or "XOR checksum" or "unknown") with evidence from your enumeration. If no match is found, describe what you would try next.


Part D: Protocol Specification (3 points)

Produce a 1-page protocol specification using this template. Fill in every field with your best evidence. Mark confidence levels.

## Protocol Specification: ISM433-MYSTERY

**Physical layer:** OOK, 433.92 MHz, [your symbol rate] kBaud  
**Encoding:** [NRZ / Manchester -- from Part A evidence]  
**Capture source:** `ism433-mystery.cf32`, 20 sec, 2.4 MSPS  
**Analyst:** [your name]  
**Date:** [date]

---

### Frame Structure

| Field | Byte offset | Width (bits) | Encoding | Semantics | Confidence |
|---|---|---|---|---|---|
| Preamble | 0 | [width] | [pattern] | Sync marker | CONFIRMED |
| Sync word | [offset] | [width] | [hex value] | Frame start | CONFIRMED / HYPOTHESIZED |
| Device ID | [offset] | [width] | Raw binary | Fixed per device | CONFIRMED / HYPOTHESIZED |
| [Field 3] | [offset] | [width] | [encoding] | [semantics] | CONFIRMED / HYPOTHESIZED |
| Checksum | [offset] | 8 | [CRC-8/XOR/unknown] | Error detection | CONFIRMED / HYPOTHESIZED |

---

### State Machine

[Describe in 1-2 sentences: how often does the device transmit? Is it periodic or event-triggered? Is there a response expected?]

---

### Field Notes

**Fields I am confident about (CONFIRMED):**
- [List with evidence]

**Fields I am uncertain about (HYPOTHESIZED):**
- [List with reasoning]

**Residual unknowns:**
- [What you could not determine and why]

Part D deliverable: The completed protocol specification document. Save it as lab9/PROTOCOL-SPEC.md.
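For the State Machine entry, the spacing between packet starts gives a first estimate of the transmission interval. A minimal sketch, assuming the offsets come from `find_packet_boundaries` run on the raw demodulated symbols, so that `bit_rate` equals the symbol rate. Offsets taken from Manchester-decoded bits are unreliable for timing because `try_manchester_decode` drops invalid pairs. `packet_intervals_s` is a hypothetical helper and the demo numbers are arbitrary.

```python
def packet_intervals_s(boundaries, bit_rate):
    """Seconds between consecutive packet starts, given start offsets in bits."""
    return [(b - a) / bit_rate for a, b in zip(boundaries, boundaries[1:])]

# Three packet starts at bit offsets 0, 1000, and 3000 with 1000 bits per second
demo_intervals = packet_intervals_s([0, 1000, 3000], bit_rate=1000)
print(demo_intervals)
```

Constant intervals suggest a periodic transmitter; widely varying ones suggest event-triggered or repeated-burst behavior.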


Lab Report

Create lab-9-report.md with:

  1. Part A: spectrum survey plot + symbol rate estimate + encoding choice with evidence
  2. Part B: field variability plot + field identification table
  3. Part C: checksum algorithm finding with evidence
  4. Part D: paste the completed PROTOCOL-SPEC.md

Grading

| Component | Points |
|---|---|
| Part A: bit extraction with encoding decision and evidence | 6 |
| Part B: field variability plot + at least 3 fields identified with confidence levels | 7 |
| Part C: checksum algorithm identified or failure diagnosed with next steps | 4 |
| Part D: protocol spec complete; all fields have confidence level; residual unknowns listed | 3 |
| Total | 20 |

Instructor Note

ism433-mystery.cf32 contains a simulated OOK transmission from an ISM-band temperature/humidity sensor. Key facts (for graders only, not for students):

  • Symbol rate: 4800 Baud (NRZ)
  • Preamble: 8 × 10101010 (alternating)
  • Sync word: 0xD3 0x91
  • Frame: 3-byte payload (Device ID 12 bits, channel 2 bits, temperature 10 bits) + 1-byte CRC-8/MAXIM (poly=0x131, init=0x00, reflected; in crcmod terms rev=True, xorOut=0x00)
  • Transmission interval: approximately 56 seconds (sporadic)
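For grading, the payload layout above can be checked against a student's field table with a small unpacking helper. This is a sketch under assumptions: the field order follows the list above, but MSB-first packing is assumed rather than stated, and the raw temperature value is left unscaled because no scaling is given. `unpack_payload` is a hypothetical name.

```python
def unpack_payload(payload):
    """Split a 3-byte payload into (device_id, channel, temperature_raw).

    Assumes MSB-first packing in the order 12-bit ID, 2-bit channel,
    10-bit temperature; the bit order is an assumption of this sketch.
    """
    if len(payload) != 3:
        raise ValueError("expected exactly 3 payload bytes")
    word = int.from_bytes(payload, "big")
    device_id = (word >> 12) & 0xFFF
    channel = (word >> 10) & 0x3
    temperature_raw = word & 0x3FF
    return device_id, channel, temperature_raw

# Arbitrary example bytes, not taken from the capture
fields_demo = unpack_payload(bytes([0xAB, 0xC9, 0x55]))
print(fields_demo)
```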