RF-301 Week 9 — SIGINT Techniques: Capture, Classify, Decode Unknown Signals

"Signal intelligence begins where protocol documentation ends. When you have no specification, you build one from observation — and the discipline of building it from evidence rather than assumption is what separates SIGINT from guessing." — practitioner framing, RF-301 course doctrine


Lecture (90 min)

7.1 SIGINT Discipline: The Classification Pipeline

In RF-201, the RE workflow started with a known protocol family (LoRa, BLE, ZigBee) and used URH to confirm the hypothesis. The target was identifiable; the question was confirmation. In RF-301 SIGINT work, the target may be completely unknown. No documentation. No vendor. No community RE writeup. The workflow must start from first observation.

The classification pipeline (five stages):

Stage 1: Spectrum survey
   What frequency? What bandwidth? What power? When does it transmit?

Stage 2: Modulation classification
   AM/FM/PM/ASK/FSK/PSK/QAM/spread-spectrum?
   Constellation diagram, instantaneous phase, frequency deviation analysis

Stage 3: Multiple-access identification
   TDMA/FDMA/CDMA/OFDMA/FHSS?
   Time-frequency map; burst timing analysis; frequency-hopping pattern

Stage 4: Symbol structure
   Symbol rate? Bit order? Framing (preamble, sync word, payload, CRC)?
   Eye diagram; autocorrelation; bit error patterns

Stage 5: Protocol hypothesis
   What protocol family does this resemble?
   Named states? Message types? Request-response pattern?

Each stage produces a hypothesis that the next stage either confirms or refutes. The discipline is to document the evidence at each stage and the confidence level of the resulting hypothesis.


7.2 Stage 1: Spectrum Survey

Tools: gr-fosphor (GPU-accelerated waterfall), GQRX, SDRAngel, SDR# (Windows)

gr-fosphor provides one of the most information-dense spectrum displays in the SDR ecosystem. It renders a live spectrum and a color-coded waterfall (time × frequency × power), plus a separate "persistence" display that shows the statistical envelope of the spectrum over time. A signal that is on the air for only 1 ms in every 100 ms remains visible in the persistence display even when it is invisible in the instantaneous spectrum.

# Launch gr-fosphor in GNU Radio
# In a GRC flowgraph, add a fosphor sink block (Qt or GLFW variant)
# Connect your signal source → fosphor sink

# Alternatively, from the command line with an RTL-SDR
# (osmocom_fft ships with gr-osmosdr; -F selects the fosphor display):
osmocom_fft -F -f 433.92e6 -s 2.4e6

Survey parameters to record:

Parameter        | How to measure                              | Notes
---------------- | ------------------------------------------- | ---------------------------------------
Center frequency | Tune SDR; observe strongest signal          | Record ± drift for frequency accuracy
Bandwidth        | -3 dB points of spectral envelope           | Filter bandwidth, not channel spacing
EIRP estimate    | Calibrated power meter or reference source  | Link budget reverse-calculation
Duty cycle       | Persistence display; % of time signal present | Burst vs. continuous
Transmit timing  | Timestamp via GPS-synchronized capture      | Repeat interval, inter-burst gap
Polarization     | Rotate receiving antenna                    | Determines antenna orientation
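
The duty-cycle and transmit-timing rows can also be measured offline from a recorded capture. The following is a minimal sketch, not part of the course tooling: the function name `burst_stats`, the 10 dB threshold margin, and the 32-sample smoothing window are all assumptions chosen for illustration. It treats the median smoothed power as the noise floor, which only holds when the signal is absent more than half the time.

```python
import numpy as np

def burst_stats(iq, fs, threshold_db=10.0, smooth=32):
    """Estimate duty cycle and burst timing from a thresholded power envelope.

    Assumptions (illustrative): the median power approximates the noise floor
    (duty cycle below 50%), and bursts are much longer than `smooth` samples.
    """
    power = np.abs(np.asarray(iq)) ** 2
    # Moving average suppresses single-sample noise spikes
    power = np.convolve(power, np.ones(smooth) / smooth, mode="same")
    noise_floor = np.median(power)
    active = power > noise_floor * 10 ** (threshold_db / 10)

    # Rising (+1) and falling (-1) edges of the activity mask
    edges = np.diff(active.astype(np.int8))
    starts = np.flatnonzero(edges == 1) + 1
    stops = np.flatnonzero(edges == -1) + 1
    if active[0]:
        starts = np.insert(starts, 0, 0)
    if active[-1]:
        stops = np.append(stops, active.size)

    durations = (stops - starts) / fs
    intervals = np.diff(starts) / fs
    return {
        "duty_cycle": float(active.mean()),
        "burst_count": int(starts.size),
        "mean_burst_s": float(durations.mean()) if durations.size else 0.0,
        "mean_interval_s": float(intervals.mean()) if intervals.size else 0.0,
    }

# Synthetic check: ten 1 ms bursts, one every 10 ms, in weak noise
fs = 1e6
rng = np.random.default_rng(0)
n = 100_000
iq = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
for k in range(10):
    start = 2_000 + 10_000 * k
    iq[start:start + 1_000] += 1.0

stats = burst_stats(iq, fs)
print(stats)
```

The smoothing window widens each detected burst by roughly its own length, so reported burst durations are slightly long; the repeat interval is unaffected because every rising edge shifts by the same amount.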

7.3 Stage 2: Modulation Classification

Visual indicators in waterfall and constellation:

Observation                                                | Likely modulation
---------------------------------------------------------- | ------------------------------------------
Constant amplitude, varying phase                          | PSK (BPSK, QPSK, 8PSK)
Varying amplitude AND phase, grid pattern in constellation | QAM (16-QAM, 64-QAM)
Discrete frequency jumps, constant amplitude               | FSK (2-FSK, 4-FSK, GFSK)
Linear frequency sweep (chirp) within each symbol          | LoRa CSS
Wideband, noise-like appearance, low spectral density      | Spread spectrum (DSSS, CDMA; fast FHSS can look similar when time-averaged)
Amplitude varies while carrier frequency stays fixed       | AM or ASK (including OOK)
Multiple closely-spaced subcarriers                        | OFDM

Instantaneous parameter extraction:

import numpy as np
import matplotlib.pyplot as plt

def analyze_signal(iq_samples, fs):
    """Extract instantaneous amplitude, frequency, and phase from IQ samples."""
    # Instantaneous amplitude
    amplitude = np.abs(iq_samples)
    
    # Instantaneous phase (unwrapped)
    phase = np.unwrap(np.angle(iq_samples))
    
    # Instantaneous frequency = derivative of phase
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    
    # Statistical features
    amp_variance = np.var(amplitude) / np.mean(amplitude)**2  # normalized
    freq_variance = np.var(inst_freq)
    # Use the unwrapped phase so ±π wraps do not inflate the step variance
    phase_variance = np.var(np.diff(phase))
    
    print(f"Amplitude variance (normalized): {amp_variance:.4f}")
    print(f"Inst. frequency variance: {freq_variance:.1f} Hz²")
    print(f"Phase step variance: {phase_variance:.4f} rad²")
    
    # Modulation classification heuristics (coarse: the thresholds are
    # illustrative and depend on fs, SNR, and samples per symbol)
    if amp_variance < 0.01 and freq_variance > 1e6:
        print("→ Likely FSK (constant amplitude, frequency variation)")
    elif amp_variance < 0.01 and freq_variance < 1e3:
        print("→ Likely PSK (constant amplitude, low frequency variation)")
    elif amp_variance > 0.1:
        print("→ Likely AM/ASK or QAM (amplitude variation)")
    else:
        print("→ Inconclusive (features fall between the heuristic thresholds)")
    
    return amplitude, phase, inst_freq

# Load a captured IQ recording (Lab 7 provides the target capture)
# iq = np.fromfile('unknown_signal.cf32', dtype=np.complex64)
# analyze_signal(iq, fs=2.4e6)
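
The "frequency deviation analysis" listed under Stage 2 can be sketched as follows for the 2-FSK case. This is an illustrative example under stated assumptions, not the course's reference method: the function name `estimate_fsk_tones` is hypothetical, the test signal is synthetic (25 kHz deviation, 100 samples per symbol), and the split at the mean instantaneous frequency assumes both tones are present in the capture.

```python
import numpy as np

def estimate_fsk_tones(iq, fs):
    """Return (center offset, deviation) in Hz for a 2-FSK capture.

    Splits the instantaneous-frequency samples at their mean and takes the
    median of each half as one tone; medians resist noise spikes.
    """
    phase = np.unwrap(np.angle(iq))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    split = np.mean(inst_freq)
    upper = np.median(inst_freq[inst_freq > split])
    lower = np.median(inst_freq[inst_freq <= split])
    return (upper + lower) / 2, (upper - lower) / 2

# Synthetic continuous-phase 2-FSK (all parameters are assumptions)
fs = 1e6
sps = 100
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, 200)
freq = np.repeat(np.where(bits == 1, 25e3, -25e3), sps)
phase = 2 * np.pi * np.cumsum(freq) / fs
iq = np.exp(1j * phase)
iq = iq + 0.01 * (rng.standard_normal(iq.size) + 1j * rng.standard_normal(iq.size))

center, deviation = estimate_fsk_tones(iq, fs)
print(f"Center offset: {center:.0f} Hz, deviation: {deviation:.0f} Hz")
```

A nonzero center offset usually indicates tuning error between the SDR and the transmitter, which belongs in the Stage 1 survey record.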

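Automatic Modulation Classification (AMC): Machine learning approaches (CNNs on raw I/Q samples, or on constellation images) are benchmarked mainly on the RadioML 2016.10A dataset (11 modulation types) and the RadioML 2018.01 dataset (24 modulation types). Reported accuracy at SNR > 10 dB ranges from roughly 80% for simple CNN baselines on 2016.10A to above 90% for deeper architectures, and it falls off sharply at low SNR. The AMC literature is the reference for the ML signal classifier mentioned in the capstone option.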


7.4 Stage 3: Multiple-Access Identification

Time-frequency analysis (short-time Fourier transform):

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

def plot_spectrogram(iq, fs, title='Signal Spectrogram'):
    """Compute and save a two-sided spectrogram of complex IQ samples."""
    f, t, Sxx = spectrogram(
        iq, fs=fs,
        window='hann',
        nperseg=256,
        noverlap=128,
        return_onesided=False,
        scaling='spectrum'
    )
    
    # Center the frequency axis (FFT shift)
    f_shifted = np.fft.fftshift(f)
    Sxx_shifted = np.fft.fftshift(Sxx, axes=0)
    
    # Normalize to the strongest bin so the 0 to -60 dB color range is meaningful
    Sxx_db = 10 * np.log10(Sxx_shifted / np.max(Sxx_shifted) + 1e-10)
    
    plt.figure(figsize=(12, 6))
    plt.pcolormesh(t * 1e3, f_shifted / 1e3, Sxx_db,
                   cmap='viridis', vmin=-60, vmax=0, shading='auto')
    plt.colorbar(label='Power (dB relative to peak)')
    plt.xlabel('Time (ms)')
    plt.ylabel('Frequency (kHz)')
    plt.title(title)
    plt.savefig('spectrogram.png', dpi=150)
    plt.close()

FHSS detection: Frequency-hopping spread spectrum appears in a spectrogram as short bursts at pseudo-random frequencies. The burst duration (dwell time) and frequency hop rate are visible. Bluetooth Classic uses FHSS at 1600 hops/second (625 μs dwell); military FHSS systems use hop rates of 10-1000+ hops/second.
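
Whether a hop pattern is visible at all depends on the spectrogram's time resolution relative to the dwell time. The arithmetic below is a small sketch using the STFT settings from plot_spectrogram (nperseg=256, noverlap=128) and an assumed sample rate of 2.4 MS/s; the helper name `stft_resolution` is hypothetical.

```python
def stft_resolution(fs, nperseg, noverlap):
    """Time step between spectrogram columns and width of one frequency bin."""
    hop = nperseg - noverlap
    return {
        "time_step_s": hop / fs,      # spacing between adjacent columns
        "window_s": nperseg / fs,     # duration covered by one column
        "freq_bin_hz": fs / nperseg,  # frequency resolution
    }

res = stft_resolution(fs=2.4e6, nperseg=256, noverlap=128)
dwell_s = 1 / 1600                    # Bluetooth Classic: 1600 hops/second
columns_per_dwell = dwell_s / res["time_step_s"]

print(f"Column spacing: {res['time_step_s'] * 1e6:.1f} us")
print(f"Frequency bin:  {res['freq_bin_hz'] / 1e3:.3f} kHz")
print(f"Columns per 625 us dwell: {columns_per_dwell:.1f}")
```

With these settings each dwell spans about a dozen columns, so individual hops are resolvable in time. A 2.4 MS/s capture covers only a small part of the 79 MHz Bluetooth Classic hop set, so most hops fall outside the capture and appear as gaps.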

TDMA detection: Time-division multiple access appears as periodic bursts at a fixed frequency, separated by inter-burst gaps. The burst duration, guard time, and frame period are measurable from the spectrogram.

OFDM detection: OFDM produces a distinctive rectangular spectral mask (flat across the bandwidth, with steep roll-off at band edges) and cyclostationary features at multiples of 1/T_sym, where T_sym is the OFDM symbol duration including the cyclic prefix.
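
One common way to expose the cyclic structure of OFDM is to look for the cyclic prefix: because the prefix repeats the tail of each symbol, the signal correlates with itself at a lag equal to the useful symbol length. The sketch below assumes the target actually uses a cyclic prefix; `make_ofdm` and `lag_correlation` are hypothetical helper names, and the FFT size of 64 and prefix length of 16 are arbitrary test values.

```python
import numpy as np

def make_ofdm(n_symbols=200, n_fft=64, n_cp=16, seed=0):
    """Synthetic CP-OFDM test signal with QPSK on every subcarrier."""
    rng = np.random.default_rng(seed)
    qpsk = (rng.choice([-1, 1], (n_symbols, n_fft))
            + 1j * rng.choice([-1, 1], (n_symbols, n_fft))) / np.sqrt(2)
    body = np.fft.ifft(qpsk, axis=1)
    # Cyclic prefix: copy the last n_cp samples of each symbol to its front
    with_cp = np.concatenate([body[:, -n_cp:], body], axis=1)
    return with_cp.ravel()

def lag_correlation(iq, max_lag):
    """Normalized autocorrelation magnitude at lags 1..max_lag."""
    iq = np.asarray(iq)
    energy = np.mean(np.abs(iq) ** 2)
    lags = np.arange(1, max_lag + 1)
    rho = np.array([
        np.abs(np.mean(iq[lag:] * np.conj(iq[:-lag]))) / energy
        for lag in lags
    ])
    return lags, rho

iq = make_ofdm()
lags, rho = lag_correlation(iq, max_lag=128)
best_lag = int(lags[np.argmax(rho)])
print(f"Strongest correlation at lag {best_lag} samples (rho = {rho.max():.2f})")
```

For this synthetic signal the peak should sit at the useful symbol length (64 samples) with a height near n_cp / (n_fft + n_cp). On a real capture the lag of the peak gives the useful symbol duration, and dividing the sample rate by that lag gives a first estimate of the subcarrier spacing.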


7.5 Stage 4: Symbol Structure

Symbol rate estimation:

from scipy.signal import welch

def estimate_symbol_rate(iq, fs):
    """Estimate symbol rate from the power spectral density of the signal envelope.

    Suited to pulse-shaped linear modulations (ASK, PSK, QAM). An ideal
    constant-envelope signal such as FSK has no symbol-rate line in |x|².
    """
    # The symbol rate appears as a spectral line in |x|²
    power = np.abs(iq)**2
    
    # PSD of the envelope signal
    f_welch, Pxx = welch(power, fs=fs, nperseg=4096)
    
    # Find spectral peaks above a threshold
    threshold = np.mean(Pxx) + 3 * np.std(Pxx)
    peaks = f_welch[Pxx > threshold]
    
    if len(peaks) > 0:
        # The lowest non-DC peak is typically the symbol rate
        dc_mask = peaks > 1000  # ignore DC region (below 1 kHz)
        if np.any(dc_mask):
            sym_rate_est = peaks[dc_mask][0]
            print(f"Estimated symbol rate: {sym_rate_est/1e3:.2f} kbaud")
            return sym_rate_est
    
    print("Symbol rate not clearly identifiable from PSD")
    return None

# After symbol rate estimation, resample to ~4-8 samples per symbol, recover bits,
# then look for preamble patterns by cross-correlating against known candidates
def find_preamble(bits, pattern_candidates=None):
    """Search a recovered 0/1 bit sequence for a known preamble or sync pattern."""
    if pattern_candidates is None:
        # 0xAA 0xAA 0xD3 0x91: alternating preamble followed by the 0xD391 sync
        # word (the TI CC1101 default), expanded from bytes to bits MSB first
        sync_bits = np.unpackbits(np.array([0xAA, 0xAA, 0xD3, 0x91], dtype=np.uint8))
        pattern_candidates = [
            [1,0,1,0,1,0,1,0],     # alternating (OOK common)
            [1,1,1,1,0,0,0,0],     # 4+4 run-length
            sync_bits.tolist(),    # 32-bit preamble + sync word
        ]
    
    # Map 0/1 to -1/+1 so that a perfect match scores exactly len(pattern)
    bipolar = 2.0 * np.asarray(bits, dtype=float) - 1.0
    
    for pattern in pattern_candidates:
        p = 2.0 * np.array(pattern, dtype=float) - 1.0
        if len(p) <= len(bipolar):
            corr = np.correlate(bipolar, p, mode='valid')
            peak = np.max(np.abs(corr))   # abs() also catches inverted polarity
            if peak > 0.9 * len(p):
                peak_loc = int(np.argmax(np.abs(corr)))
                print(f"Preamble candidate found at bit {peak_loc}: {pattern}")
                return peak_loc
    
    print("No common preamble detected")
    return None
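
Once a sync word is located and frames can be sliced out, the CRC field in the Stage 4 framing question can be tested by computing candidate CRCs over the presumed payload and comparing against the trailing bytes. The sketch below is illustrative only: it tries just two well-known CRC-16 parameter sets and assumes a 2-byte big-endian CRC at the end of the frame with no reflection or final XOR. The names `crc16`, `CANDIDATES`, and `match_crc` are hypothetical.

```python
def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

# Candidate parameter sets to try (a deliberately short list)
CANDIDATES = {
    "CRC-16/CCITT-FALSE": {"poly": 0x1021, "init": 0xFFFF},
    "CRC-16/XMODEM": {"poly": 0x1021, "init": 0x0000},
}

def match_crc(frame: bytes):
    """Names of candidates whose CRC over frame[:-2] equals the last 2 bytes."""
    payload = frame[:-2]
    trailer = int.from_bytes(frame[-2:], "big")
    return [name for name, params in CANDIDATES.items()
            if crc16(payload, **params) == trailer]

# Standard check string for CRC catalogues, with a CCITT-FALSE CRC appended
frame = b"123456789" + bytes([0x29, 0xB1])
print(match_crc(frame))
```

A match on a single frame supports only a HYPOTHESIZED CRC claim; the same parameter set matching many frames with different payloads is what moves it toward CONFIRMED.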

7.6 Stage 5: Protocol Hypothesis and Documentation

The final stage is synthesizing the observations into a protocol hypothesis:

Hypothesis document structure:

  1. Signal identification: center frequency, bandwidth, modulation (with evidence), multiple-access scheme
  2. Symbol parameters: symbol rate (with evidence), samples-per-symbol, bit order
  3. Frame structure: preamble (if identified), sync word, payload format, CRC/FEC (if detected)
  4. State machine hypothesis: what states does the transmitter cycle through? Are there request-response pairs? Acknowledgements?
  5. Protocol family hypothesis: what known protocol family does this most resemble? What are the differences?
  6. Confidence assessment: for each claim, one of {CONFIRMED (bit-for-bit verified), INFERRED (consistent with evidence but not confirmed), HYPOTHESIZED (plausible from inspection but not tested)}

The confidence assessment discipline is the central professional skill. RE work produces hypotheses, not facts. A professional SIGINT analyst or protocol RE engineer who states a hypothesis as a fact is unreliable. The confidence level must accompany every claim.
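
One way to make the rule "every claim carries a confidence level" hard to skip is to record claims in a structure that refuses entries without evidence. The following is a minimal sketch, not a required lab format: the class names and the example claims are hypothetical, and only the three confidence levels come from the hypothesis document structure above.

```python
from dataclasses import dataclass, field
from enum import Enum

class Confidence(Enum):
    CONFIRMED = "bit-for-bit verified"
    INFERRED = "consistent with evidence but not confirmed"
    HYPOTHESIZED = "plausible from inspection but not tested"

@dataclass
class Claim:
    stage: int
    statement: str
    evidence: str
    confidence: Confidence

@dataclass
class HypothesisDocument:
    claims: list = field(default_factory=list)

    def add(self, stage, statement, evidence, confidence):
        # Reject claims that arrive without recorded evidence
        if not evidence.strip():
            raise ValueError("every claim needs recorded evidence")
        self.claims.append(Claim(stage, statement, evidence, confidence))

    def render(self):
        ordered = sorted(self.claims, key=lambda c: c.stage)
        return "\n".join(
            f"[Stage {c.stage}] {c.statement} ({c.confidence.name}): {c.evidence}"
            for c in ordered
        )

# Example entries (invented values, for illustration only)
doc = HypothesisDocument()
doc.add(2, "Modulation is 2-FSK", "two inst. frequency clusters, constant envelope",
        Confidence.INFERRED)
doc.add(1, "Center frequency 433.92 MHz", "spectrum survey, three captures",
        Confidence.CONFIRMED)
print(doc.render())
```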


7.7 ML Signal Classifier Capstone Option

The capstone offers an ML signal classifier track: train a convolutional neural network on I/Q samples to classify modulation types, and use it to assist the SIGINT classification pipeline.

Reference datasets:

  • RadioML 2016.10A (DeepSig): 220K examples, 11 modulation types, 20 SNR levels. Download from https://www.deepsig.ai/datasets
  • RadioML 2018.01 (DeepSig): 2.55M examples, 24 modulation types

PyTorch CNN baseline:

import torch
import torch.nn as nn

class ModulationCNN(nn.Module):
    """Simple CNN for modulation classification from IQ samples."""
    
    def __init__(self, num_classes=11, input_length=128):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=8, padding='same'),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=8, padding='same'),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Dropout(0.3),
        )
        fc_input_size = 128 * (input_length // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(fc_input_size, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes)
        )
    
    def forward(self, x):
        # x: [batch, 2, seq_len] (I and Q as channels)
        x = self.conv_block(x)
        return self.classifier(x)

# Training sketch
model = ModulationCNN(num_classes=11, input_length=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

This CNN architecture (based on O'Shea & West 2016) achieves ~80% accuracy at 10 dB SNR on RadioML 2016.10A. For the capstone ML-classifier option, students extend this baseline with better architectures (ResNet, Transformer) and integrate it into the Lab 7 classification pipeline.
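
To feed a Lab 7 style capture into a model that expects input shaped [batch, 2, seq_len], the complex samples have to be cut into windows and split into I and Q channels. The sketch below uses NumPy only; the function name `iq_to_windows` is hypothetical, and the per-window unit-power normalization is an assumption that may not match how a given training set was scaled.

```python
import numpy as np

def iq_to_windows(iq, window=128):
    """Slice complex samples into non-overlapping windows shaped [n, 2, window]."""
    iq = np.asarray(iq, dtype=np.complex64)
    n = iq.size // window                      # drop the incomplete tail
    frames = iq[: n * window].reshape(n, window)

    # Normalize each window to unit average power (an assumed convention)
    rms = np.sqrt(np.mean(np.abs(frames) ** 2, axis=1, keepdims=True))
    frames = frames / np.maximum(rms, 1e-12)

    # Channel 0 = I (real part), channel 1 = Q (imaginary part)
    return np.stack([frames.real, frames.imag], axis=1).astype(np.float32)

# Synthetic stand-in for a capture: 1000 noise samples
rng = np.random.default_rng(0)
capture = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) * 3.0
x = iq_to_windows(capture, window=128)
print(x.shape, x.dtype)
```

The resulting array can be wrapped with torch.from_numpy(x) and passed to a model whose input_length equals the window size.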


7.8 Anchor Weave: Sklar + Wyglinski on SIGINT fundamentals

Bernard Sklar's Digital Communications (3rd ed.) contains the modulation-theory foundation for the classification pipeline. The key Sklar argument for SIGINT: the statistical properties of a modulated signal (autocorrelation, cyclostationary features, constellation statistics) are deterministic consequences of the modulation scheme. If you measure those properties correctly, the modulation scheme is uniquely identifiable in principle -- the only question is whether your SNR budget is sufficient.

Wyglinski et al. Ch 4-5 provides the receiver-chain framing: the SNR at which classification operates is determined by the receiver chain, not the signal. If your noise figure is too high or your ADC range is saturated, you will fail to classify correctly regardless of algorithm sophistication. The engineering and the algorithm are coupled.
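
A quick check for the saturation problem described above is to measure how many samples sit at or near full scale before running any classifier. This is a rough sketch under stated assumptions: samples are taken to be normalized so that full scale is ±1.0 on each of I and Q (which depends on how the capture was converted to cf32), and the name `clipping_fraction` and the 0.99 margin are illustrative choices.

```python
import numpy as np

def clipping_fraction(iq, full_scale=1.0, margin=0.99):
    """Fraction of samples whose I or Q component reaches margin * full_scale."""
    iq = np.asarray(iq)
    limit = full_scale * margin
    clipped = (np.abs(iq.real) >= limit) | (np.abs(iq.imag) >= limit)
    return float(np.mean(clipped))

# Four hand-made samples: the second and third touch full scale
samples = np.array([0.1 + 0.1j, 1.0 + 0.0j, 0.5 - 0.995j, 0.2 + 0.2j])
frac = clipping_fraction(samples)
print(f"Clipped fraction: {frac:.2f}")
```

More than a small fraction of clipped samples suggests reducing receiver gain and re-capturing, since clipping distorts the amplitude statistics that Stage 2 relies on.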


Lab Introduction

Lab 7 (25 pts): SIGINT discipline lab. The instructor provides an unknown low-SNR capture (IQ file); students execute the full five-stage classification pipeline, document their hypothesis trail, and produce a confidence-assessed protocol hypothesis. See labs/lab-7.md.

Independent Practice

  1. Download the RadioML 2016.10A dataset and compute the classification accuracy of the analyze_signal() heuristic function above on 10 dB SNR examples. What is the most common misclassification?
  2. Implement a spectrogram-based FHSS detector: given a 5-second capture of Bluetooth Classic traffic, detect the 1600 hop/second pattern and estimate the hop rate.
  3. Read the Sklar Ch 14 CDMA chapter section on the PN sequence structure. Explain in 100 words how knowing the PN sequence chip rate (but not the code) narrows your DSSS hypothesis.