RF-301 Week 9 - SIGINT Techniques: Capture, Classify, Decode Unknown Signals

"Signal intelligence begins where protocol documentation ends. When you have no specification, you build one from observation — and the discipline of building it from evidence rather than assumption is what separates SIGINT from guessing." — practitioner framing, RF-301 course doctrine

Lecture (90 min)

7.1 SIGINT Discipline: The Classification Pipeline

In RF-201, the RE workflow started with a known protocol family (LoRa, BLE, ZigBee) and used URH to confirm the hypothesis. The target was identifiable; the question was confirmation. In RF-301 SIGINT work, the target may be completely unknown. No documentation. No vendor. No community RE writeup. The workflow must start from first observation.

The classification pipeline (five stages):

Stage 1: Spectrum survey
  → What frequency? What bandwidth? What power? When does it transmit?

Stage 2: Modulation classification
  → AM/FM/PM/ASK/FSK/PSK/QAM/spread-spectrum?
  → Constellation diagram, instantaneous phase, frequency deviation analysis

Stage 3: Multiple-access identification
  → TDMA/FDMA/CDMA/OFDMA/FHSS?
  → Time-frequency map; burst timing analysis; frequency-hopping pattern

Stage 4: Symbol structure
  → Symbol rate? Bit order? Framing (preamble, sync word, payload, CRC)?
  → Eye diagram; autocorrelation; bit error patterns

Stage 5: Protocol hypothesis
  → What protocol family does this resemble?
  → Named states? Message types? Request-response pattern?

Each stage produces a hypothesis that the next stage either confirms or refutes. The discipline is to document the evidence at each stage and the confidence level of the resulting hypothesis.

7.2 Stage 1: Spectrum Survey

Tools: gr-fosphor (GPU-accelerated waterfall), GQRX, SDRAngel, SDR# (Windows)

gr-fosphor provides one of the most information-dense spectrum displays in the SDR ecosystem. It renders a live spectrum and a color-coded waterfall (time × frequency × power), plus a "persistence" display that shows the statistical envelope of the spectrum over time. A signal that is on the air for only 1 ms in every 100 ms remains visible in the persistence display even when it is easy to miss in the instantaneous spectrum.

# Launch gr-fosphor from GNU Radio Companion:
# in a GRC flowgraph, add a fosphor sink block (from gr-fosphor)
# and connect your signal source → fosphor sink

# Alternatively, from the command line with an RTL-SDR
# (assumes gr-osmosdr is installed and was built with fosphor support):
osmocom_fft -F -f 433.92e6 -s 2.4e6

Survey parameters to record:

Parameter	How to measure	Notes
Center frequency	Tune SDR; observe strongest signal	±drift for frequency accuracy
Bandwidth	-3 dB points of spectral envelope	Filter bandwidth, not channel spacing
EIRP estimate	Calibrated power meter or reference source	Link budget reverse-calculation
Duty cycle	Persistence display; time % signal present	Burst vs. continuous
Transmit timing	Timestamp via GPS-synchronized capture	Repeat interval, inter-burst gap
Polarization	Rotate receiving antenna	Determines antenna orientation
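The center-frequency and bandwidth rows of the table can be measured directly from a capture. The following is a minimal sketch, assuming a complex baseband recording that contains a single dominant signal; the function name `measure_3db_bandwidth` and the Welch segment length are illustrative choices, not part of the course tooling.

```python
import numpy as np
from scipy.signal import welch

def measure_3db_bandwidth(iq, fs, nperseg=1024):
    """Return (center_offset_hz, bandwidth_hz) from the -3 dB points of the PSD.

    center_offset_hz is relative to the SDR tuning frequency.
    Assumes one dominant signal in the capture (illustrative sketch).
    """
    # Averaged two-sided PSD; averaging reduces the variance of each bin
    f, pxx = welch(iq, fs=fs, nperseg=nperseg, return_onesided=False)
    f = np.fft.fftshift(f)
    pxx_db = 10 * np.log10(np.fft.fftshift(pxx) + 1e-20)

    # Outermost bins that are within 3 dB of the strongest bin
    above = np.flatnonzero(pxx_db >= pxx_db.max() - 3.0)
    f_lo, f_hi = f[above[0]], f[above[-1]]
    return (f_lo + f_hi) / 2, f_hi - f_lo
```

The result is quantized to the PSD bin width (`fs / nperseg`), so record that resolution alongside the measurement. Add the returned offset to the SDR tuning frequency to obtain the absolute center frequency.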

7.3 Stage 2: Modulation Classification

Visual indicators in waterfall and constellation:

Observation	Likely modulation
Constant amplitude, varying phase	PSK (BPSK, QPSK, 8PSK)
Varying amplitude AND phase, grid pattern	QAM (16-QAM, 64-QAM)
Discrete frequency jumps, constant amplitude	FSK (2-FSK, 4-FSK, GFSK)
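Linear frequency sweep within each symbol (chirp), wrapping at the band edge	Chirp spread spectrum (LoRa CSS)
Wideband, noise-like appearance, low spectral density	Direct-sequence spread spectrum (DSSS, CDMA)
Varying amplitude, constant carrier frequency and phase	AM or ASK (including OOK)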
Multiple closely-spaced subcarriers	OFDM

Instantaneous parameter extraction:

import numpy as np
import matplotlib.pyplot as plt

def analyze_signal(iq_samples, fs):
    """Extract instantaneous amplitude, frequency, and phase from IQ samples."""
    # Instantaneous amplitude
    amplitude = np.abs(iq_samples)
    
    # Instantaneous phase (unwrapped)
    phase = np.unwrap(np.angle(iq_samples))
    
    # Instantaneous frequency = derivative of phase
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    
    # Statistical features
    amp_variance = np.var(amplitude) / np.mean(amplitude)**2  # normalized
    freq_variance = np.var(inst_freq)
    # Use the unwrapped phase so that ±2π wrap jumps do not inflate the variance
    phase_variance = np.var(np.diff(phase))
    
    print(f"Amplitude variance (normalized): {amp_variance:.4f}")
    print(f"Inst. frequency variance: {freq_variance:.1f} Hz²")
    print(f"Phase step variance: {phase_variance:.4f} rad²")
    
    # Modulation classification heuristics.
    # The thresholds are illustrative: they depend on sample rate, SNR, and
    # filtering, so tune them against captures of known signals.
    if amp_variance < 0.01 and freq_variance > 1e6:
        print("→ Likely FSK (constant amplitude, frequency variation)")
    elif amp_variance < 0.01 and freq_variance < 1e3:
        print("→ Likely PSK (constant amplitude, low frequency variation)")
    elif amp_variance > 0.1:
        print("→ Likely AM/ASK or QAM (amplitude variation)")
    else:
        print("→ Inconclusive (features fall between the heuristic thresholds)")
    
    return amplitude, phase, inst_freq

# Load a captured IQ recording (Lab 7 provides the target capture)
# iq = np.fromfile('unknown_signal.cf32', dtype=np.complex64)
# analyze_signal(iq, fs=2.4e6)
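Once the heuristic points toward FSK, the frequency deviation analysis from Stage 2 can separate 2-FSK from 4-FSK by counting clusters in the instantaneous frequency. The sketch below is one possible approach, assuming a clean, high-SNR capture; `count_fsk_levels` and `make_cpfsk` are hypothetical helper names, and the bin count and `min_fraction` threshold are arbitrary starting points.

```python
import numpy as np

def count_fsk_levels(iq, fs, bins=64, min_fraction=0.05):
    """Return the approximate tone frequencies (Hz) found in an FSK capture.

    Works by locating peaks in a histogram of instantaneous frequency.
    Illustrative only: noisy captures need filtering before this step.
    """
    phase = np.unwrap(np.angle(iq))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)

    hist, edges = np.histogram(inst_freq, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2

    # Light smoothing so a tone split across two bins yields one peak
    smooth = np.convolve(hist, [0.25, 0.5, 0.25], mode="same")
    padded = np.concatenate(([0.0], smooth, [0.0]))
    mid = padded[1:-1]

    # A tone is a local maximum that holds a meaningful share of the samples
    is_peak = (mid > padded[:-2]) & (mid >= padded[2:]) & (mid >= min_fraction * hist.sum())
    return centers[is_peak]

def make_cpfsk(symbols, deviation_hz, fs, sps):
    """Synthetic continuous-phase FSK test signal (hypothetical test helper)."""
    freqs = np.repeat(np.asarray(symbols, dtype=float) * deviation_hz, sps)
    return np.exp(1j * 2 * np.pi * np.cumsum(freqs) / fs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    test_iq = make_cpfsk(rng.choice([-3, -1, 1, 3], size=400), 10e3, 1e6, 10)
    print("Tone estimates (Hz):", count_fsk_levels(test_iq, 1e6))
```

The number of returned tones is the evidence for the FSK order; the spacing between adjacent tones gives the deviation. Each estimate is only as precise as the histogram bin width, so treat the result as INFERRED until it survives demodulation.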

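Automatic Modulation Classification (AMC): Machine learning approaches (CNNs on I/Q samples, or on constellation images) are commonly benchmarked on the RadioML 2016.10A (11 modulation types) and 2018.01 (24 modulation types) datasets. Reported accuracy at SNR above 10 dB depends heavily on architecture and dataset: roughly 80% for a simple CNN baseline on 2016.10A, with the best published results exceeding 95%. The AMC literature is the reference for the ML signal classifier mentioned in the capstone option.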

7.4 Stage 3: Multiple-Access Identification

Time-frequency analysis (short-time Fourier transform):

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

def plot_spectrogram(iq, fs, title='Signal Spectrogram'):
    """Compute and save a two-sided spectrogram of complex IQ samples."""
    f, t, Sxx = spectrogram(
        iq, fs=fs,
        window='hann',
        nperseg=256,
        noverlap=128,
        return_onesided=False
    )
    
    # Reorder bins so frequency runs from -fs/2 to +fs/2
    f_shifted = np.fft.fftshift(f)
    Sxx_shifted = np.fft.fftshift(Sxx, axes=0)
    
    # Normalize to the strongest bin: the capture is uncalibrated,
    # so only relative power is meaningful
    Sxx_db = 10 * np.log10(Sxx_shifted + 1e-20)
    Sxx_db -= Sxx_db.max()
    
    plt.figure(figsize=(12, 6))
    plt.pcolormesh(t * 1e3, f_shifted / 1e3, Sxx_db,
                   shading='auto', cmap='viridis', vmin=-60, vmax=0)
    plt.colorbar(label='Power (dB relative to peak)')
    plt.xlabel('Time (ms)')
    plt.ylabel('Frequency (kHz)')
    plt.title(title)
    plt.savefig('spectrogram.png', dpi=150)
    plt.close()

FHSS detection: Frequency-hopping spread spectrum appears in a spectrogram as short bursts at pseudo-random frequencies. The burst duration (dwell time) and frequency hop rate are visible. Bluetooth Classic uses FHSS at 1600 hops/second (625 μs dwell); military FHSS systems use hop rates of 10-1000+ hops/second.

TDMA detection: Time-division multiple access appears as periodic bursts at a fixed frequency with inter-burst gaps. The burst duration, guard time, and frame period are measurable from the spectrogram.
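The burst measurements can also be taken numerically from the signal envelope rather than read off the plot. This is a minimal sketch, assuming a single-channel capture with bursts well above the noise floor; `measure_bursts`, the smoothing length, and the half-power threshold are illustrative assumptions.

```python
import numpy as np

def measure_bursts(iq, fs, threshold_db=-3.0, smooth_len=32):
    """Return (start_times_s, durations_s) of bursts via envelope thresholding.

    threshold_db is relative to the peak of the smoothed power envelope.
    A half-power threshold places the detected edges near the true burst
    edges for roughly rectangular bursts (illustrative sketch).
    """
    power = np.abs(iq) ** 2
    kernel = np.ones(smooth_len) / smooth_len
    envelope = np.convolve(power, kernel, mode="same")

    threshold = envelope.max() * 10 ** (threshold_db / 10)
    active = (envelope > threshold).astype(int)

    # Rising edges mark burst starts, falling edges mark burst ends
    edges = np.diff(np.concatenate(([0], active, [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return starts / fs, (stops - starts) / fs
```

The frame period is then the median of `np.diff(start_times_s)`, and the gap between bursts is the frame period minus the burst duration. A stable frame period across many bursts is the evidence for TDMA; irregular spacing suggests random access instead.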

OFDM detection: OFDM produces a distinctive rectangular spectral mask (flat across the bandwidth, with steep roll-off at band edges) and cyclostationary features at 1/T_sym.
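One common way to expose the OFDM cyclostationary signature is to exploit the cyclic prefix: each symbol's prefix is a copy of its tail, so the autocorrelation shows a peak at a lag equal to the useful symbol duration. The sketch below assumes the signal uses a cyclic prefix and that the capture covers many symbols; `cp_autocorrelation` is a hypothetical helper name.

```python
import numpy as np

def cp_autocorrelation(iq, max_lag):
    """Return (lags, rho): normalized autocorrelation magnitude for lags 1..max_lag.

    For OFDM with a cyclic prefix, rho peaks at the lag equal to the useful
    symbol length in samples, with height near CP length / total symbol length.
    """
    iq = np.asarray(iq)
    lags = np.arange(1, max_lag + 1)
    energy = np.vdot(iq, iq).real  # sum of |x|^2
    rho = np.array([abs(np.vdot(iq[:-lag], iq[lag:])) for lag in lags]) / energy
    return lags, rho
```

Dividing the peak lag by the sample rate gives the useful symbol duration, and its reciprocal is the subcarrier spacing. Comparing that spacing with known standards is a quick input to the Stage 5 protocol-family hypothesis.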

7.5 Stage 4: Symbol Structure

Symbol rate estimation:

import numpy as np
from scipy.signal import welch

def estimate_symbol_rate(iq, fs, min_rate_hz=1000):
    """Estimate symbol rate from the power spectral density of the signal envelope."""
    # For band-limited (pulse-shaped) linear modulations such as PSK and QAM,
    # the symbol rate appears as a spectral line in |x|².
    # A constant-envelope signal (e.g. FSK) has no such line; apply the same
    # analysis to its instantaneous frequency instead.
    power = np.abs(iq)**2
    
    # PSD of the envelope signal (welch removes the mean of each segment)
    f_welch, Pxx = welch(power, fs=fs, nperseg=4096)
    
    # Find spectral peaks above a threshold
    threshold = np.mean(Pxx) + 3 * np.std(Pxx)
    peaks = f_welch[Pxx > threshold]
    
    if len(peaks) > 0:
        # The lowest non-DC peak is typically the symbol rate
        dc_mask = peaks > min_rate_hz  # ignore the region near DC
        if np.any(dc_mask):
            sym_rate_est = peaks[dc_mask][0]
            print(f"Estimated symbol rate: {sym_rate_est/1e3:.2f} kbaud")
            return sym_rate_est
    
    print("Symbol rate not clearly identifiable from PSD")
    return None

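# After symbol rate estimation, resample to ~4-8 samples per symbol and
# slice to bits, then search for preamble patterns by cross-correlating
# the bit stream against candidate patterns
def bytes_to_bits(byte_values):
    """Expand byte values into a flat list of bits, MSB first."""
    return [(b >> i) & 1 for b in byte_values for i in range(7, -1, -1)]

def find_preamble(bits, pattern_candidates=None):
    """Search for a preamble pattern in a recovered bit sequence (values 0/1)."""
    if pattern_candidates is None:
        # Common preambles: alternating 1010..., run-length patterns, known sync words
        pattern_candidates = [
            [1,0,1,0,1,0,1,0],     # alternating (OOK common)
            [1,1,1,1,0,0,0,0],     # 4+4 run-length
            # 0xAA preamble + 0xD391 sync word (TI CC1101 default), as bits
            bytes_to_bits([0xAA, 0xAA, 0xD3, 0x91]),
        ]
    
    # Map 0/1 to -1/+1 so that matching zeros count as much as matching ones
    bipolar = 2 * np.asarray(bits, dtype=float) - 1
    
    for pattern in pattern_candidates:
        p = 2 * np.array(pattern, dtype=float) - 1
        if len(p) <= len(bipolar):
            # A perfect match scores len(p); an inverted match scores -len(p).
            # Taking the magnitude accepts either polarity.
            corr = np.correlate(bipolar, p, mode='valid')
            peak = np.max(np.abs(corr))
            if peak > 0.9 * len(p):
                peak_loc = np.argmax(np.abs(corr))
                print(f"Preamble candidate found at bit {peak_loc}: {pattern}")
                return peak_loc
    
    print("No common preamble detected")
    return None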
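The resampling step mentioned in the comment above (bringing the capture to roughly 4-8 samples per symbol) can be sketched with a rational resampler. This is one possible approach, assuming the symbol rate estimate is already trustworthy; `resample_to_sps` and the `max_denominator` limit are illustrative choices.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

def resample_to_sps(iq, fs, symbol_rate, target_sps=4, max_denominator=1000):
    """Resample so that each symbol spans about target_sps samples.

    Returns (resampled_iq, new_fs). The ratio is approximated by a fraction
    with a bounded denominator, so new_fs may differ slightly from
    target_sps * symbol_rate.
    """
    ratio = Fraction(target_sps * symbol_rate / fs).limit_denominator(max_denominator)
    # Polyphase resampling applies its own anti-aliasing filter
    resampled = resample_poly(iq, ratio.numerator, ratio.denominator)
    new_fs = fs * ratio.numerator / ratio.denominator
    return resampled, new_fs
```

For example, a 2.4 MS/s capture of a 38.4 kbaud signal resampled to 4 samples per symbol uses the ratio 8/125. Any residual error in the symbol rate estimate shows up as slow timing drift, which is why a symbol timing recovery stage normally follows.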

7.6 Stage 5: Protocol Hypothesis and Documentation

The final stage is synthesizing the observations into a protocol hypothesis:

Hypothesis document structure:

Signal identification: center frequency, bandwidth, modulation (with evidence), multiple-access scheme
Symbol parameters: symbol rate (with evidence), samples-per-symbol, bit order
Frame structure: preamble (if identified), sync word, payload format, CRC/FEC (if detected)
State machine hypothesis: what states does the transmitter cycle through? Are there request-response pairs? Acknowledgements?
Protocol family hypothesis: what known protocol family does this most resemble? What are the differences?
Confidence assessment: for each claim, one of {CONFIRMED (bit-for-bit verified), INFERRED (consistent with evidence but not confirmed), HYPOTHESIZED (plausible from inspection but not tested)}

The confidence assessment discipline is the central professional skill. RE work produces hypotheses, not facts. A professional SIGINT analyst or protocol RE engineer who states a hypothesis as a fact is unreliable. The confidence level must accompany every claim.
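The rule that a confidence level accompanies every claim can be enforced by the structure of the lab notebook itself. The following is a small sketch of one way to record claims; the class names `Confidence`, `Claim`, and `ProtocolHypothesis` are hypothetical and not part of the Lab 7 template.

```python
from dataclasses import dataclass, field
from enum import Enum

class Confidence(Enum):
    CONFIRMED = "bit-for-bit verified"
    INFERRED = "consistent with evidence but not confirmed"
    HYPOTHESIZED = "plausible from inspection but not tested"

@dataclass
class Claim:
    stage: int            # pipeline stage (1-5) that produced the claim
    statement: str
    evidence: str
    confidence: Confidence

@dataclass
class ProtocolHypothesis:
    claims: list = field(default_factory=list)

    def add(self, stage, statement, evidence, confidence):
        """Record a claim; a claim without evidence is rejected."""
        if not evidence.strip():
            raise ValueError("every claim needs supporting evidence")
        self.claims.append(Claim(stage, statement, evidence, confidence))

    def report(self):
        """Return one line per claim, ordered by pipeline stage."""
        ordered = sorted(self.claims, key=lambda c: c.stage)
        return "\n".join(
            f"[{c.confidence.name}] Stage {c.stage}: {c.statement} (evidence: {c.evidence})"
            for c in ordered
        )

if __name__ == "__main__":
    h = ProtocolHypothesis()
    h.add(2, "Modulation is 2-FSK", "two clusters in instantaneous frequency",
          Confidence.INFERRED)
    print(h.report())
```

Because `add` refuses a claim with empty evidence and requires a `Confidence` value, it is impossible to state a hypothesis as a bare fact in the record.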

7.7 ML Signal Classifier Capstone Option

The capstone offers an ML signal classifier track: train a convolutional neural network on I/Q samples to classify modulation types, and use it to assist the SIGINT classification pipeline.

Reference datasets:

RadioML 2016.10A (DeepSig): 220K examples, 11 modulation types, 20 SNR levels. Download from https://www.deepsig.ai/datasets
RadioML 2018.01 (DeepSig): 2.55M examples, 24 modulation types

PyTorch CNN baseline:

import torch
import torch.nn as nn

class ModulationCNN(nn.Module):
    """Simple CNN for modulation classification from IQ samples."""
    
    def __init__(self, num_classes=11, input_length=128):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=8, padding='same'),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=8, padding='same'),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Dropout(0.3),
        )
        fc_input_size = 128 * (input_length // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(fc_input_size, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes)
        )
    
    def forward(self, x):
        # x: [batch, 2, seq_len] (I and Q as channels)
        x = self.conv_block(x)
        return self.classifier(x)

# Training sketch
model = ModulationCNN(num_classes=11, input_length=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

This CNN architecture (based on O'Shea & West 2016) achieves ~80% accuracy at 10 dB SNR on RadioML 2016.10A. For the capstone ML-classifier option, students extend this baseline with better architectures (ResNet, Transformer) and integrate it into the Lab 7 classification pipeline.
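To run the baseline on a Lab 7 capture, the complex samples have to be reshaped into the `[batch, 2, seq_len]` layout that `forward` expects. The sketch below shows one way to do this in NumPy; `iq_to_cnn_input` is a hypothetical helper, and the per-segment unit-power normalization is an assumption that should be matched to however the training data was scaled.

```python
import numpy as np

def iq_to_cnn_input(iq, seg_len=128):
    """Split a complex capture into segments shaped [n_segments, 2, seg_len].

    Channel 0 holds I, channel 1 holds Q. Trailing samples that do not fill
    a whole segment are dropped. Each segment is scaled to unit average
    power (an assumption; match this to the training data's scaling).
    """
    n_seg = len(iq) // seg_len
    segs = np.asarray(iq[: n_seg * seg_len], dtype=np.complex64).reshape(n_seg, seg_len)

    rms = np.sqrt(np.mean(np.abs(segs) ** 2, axis=1, keepdims=True))
    segs = segs / np.maximum(rms, 1e-12)  # guard against all-zero segments

    return np.stack([segs.real, segs.imag], axis=1).astype(np.float32)
```

The resulting array can be wrapped with `torch.from_numpy(...)` and passed to the model. Classifying every segment and tallying the predictions gives a per-capture vote, which is more robust than trusting any single 128-sample window.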

7.8 Anchor Weave: Sklar + Wyglinski on SIGINT fundamentals

Bernard Sklar's Digital Communications (3rd ed.) contains the modulation-theory foundation for the classification pipeline. The key Sklar argument for SIGINT: the statistical properties of a modulated signal (autocorrelation, cyclostationary features, constellation statistics) are deterministic consequences of the modulation scheme. If you measure those properties correctly, the modulation scheme is uniquely identifiable in principle -- the only question is whether your SNR budget is sufficient.

Wyglinski et al. Ch 4-5 provide the receiver-chain framing: the SNR at which classification operates is determined by the receiver chain, not by the signal alone. If your noise figure is too high or your ADC is saturated, you will fail to classify correctly regardless of algorithm sophistication. The engineering and the algorithm are coupled.
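The effect of the receiver chain on the SNR budget can be made concrete with the standard Friis cascade formula for noise figure, F = F1 + (F2 - 1)/G1 + (F3 - 1)/(G1·G2) + ..., with all terms as linear ratios. The sketch below is illustrative; the function name and the stage values in the example are hypothetical, not measurements of any course hardware.

```python
import math

def cascade_noise_figure_db(stages):
    """Total noise figure (dB) of a receiver chain using the Friis formula.

    stages: list of (gain_db, noise_figure_db) tuples in signal order,
    starting at the antenna.
    """
    total_f = 0.0
    gain_so_far = 1.0  # linear gain of all preceding stages
    for i, (gain_db, nf_db) in enumerate(stages):
        f = 10 ** (nf_db / 10)
        # The first stage contributes fully; later stages are divided
        # by the gain that precedes them
        total_f += f if i == 0 else (f - 1) / gain_so_far
        gain_so_far *= 10 ** (gain_db / 10)
    return 10 * math.log10(total_f)

if __name__ == "__main__":
    # Hypothetical chain: SDR front end alone vs. with an LNA ahead of it
    print(cascade_noise_figure_db([(0.0, 10.0)]))
    print(cascade_noise_figure_db([(20.0, 1.0), (0.0, 10.0)]))
```

In this hypothetical case, placing a 20 dB, 1 dB-noise-figure LNA ahead of a 10 dB-noise-figure front end lowers the chain's noise figure to about 1.3 dB. That improvement goes directly into the SNR available to the classifier, provided the added gain does not push the ADC into saturation.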

Lab Introduction

Lab 7 (25 pts): SIGINT discipline lab. The instructor provides an unknown low-SNR capture (IQ file); students execute the full five-stage classification pipeline, document their hypothesis trail, and produce a confidence-assessed protocol hypothesis. See labs/lab-7.md.

Independent Practice

Download the RadioML 2016.10A dataset and compute the classification accuracy of the analyze_signal() heuristic function above on 10 dB SNR examples. What is the most common misclassification?
Implement a spectrogram-based FHSS detector: given a 5-second capture of Bluetooth Classic traffic, detect the hopping pattern and estimate the hop rate (nominally 1600 hops/second).
Read the section on PN sequence structure in Sklar Ch 14 (CDMA). Explain in 100 words how knowing the PN sequence chip rate (but not the code) narrows your DSSS hypothesis.