"Signal intelligence begins where protocol documentation ends. When you have no specification, you build one from observation — and the discipline of building it from evidence rather than assumption is what separates SIGINT from guessing." — practitioner framing, RF-301 course doctrine
Lecture (90 min)
7.1 SIGINT Discipline: The Classification Pipeline
In RF-201, the RE workflow started with a known protocol family (LoRa, BLE, ZigBee) and used URH to confirm the hypothesis. The target was identifiable; the question was confirmation. In RF-301 SIGINT work, the target may be completely unknown. No documentation. No vendor. No community RE writeup. The workflow must start from first observation.
The classification pipeline (five stages):
Stage 1: Spectrum survey
→ What frequency? What bandwidth? What power? When does it transmit?
Stage 2: Modulation classification
→ AM/FM/PM/ASK/FSK/PSK/QAM/spread-spectrum?
→ Constellation diagram, instantaneous phase, frequency deviation analysis
Stage 3: Multiple-access identification
→ TDMA/FDMA/CDMA/OFDMA/FHSS?
→ Time-frequency map; burst timing analysis; frequency-hopping pattern
Stage 4: Symbol structure
→ Symbol rate? Bit order? Framing (preamble, sync word, payload, CRC)?
→ Eye diagram; autocorrelation; bit error patterns
Stage 5: Protocol hypothesis
→ What protocol family does this resemble?
→ Named states? Message types? Request-response pattern?
Each stage produces a hypothesis that the next stage either confirms or refutes. The discipline is to document the evidence at each stage and the confidence level of the resulting hypothesis.
7.2 Stage 1: Spectrum Survey
Tools: gr-fosphor (GPU-accelerated waterfall), GQRX, SDRAngel, SDR# (Windows)
gr-fosphor provides the most information-dense spectrum display in the SDR ecosystem. It renders both a live spectrum and a color-coded waterfall (time × frequency × power), with a separate "persistence" display that shows the statistical envelope of the spectrum over time. Signals that appear for only 1 ms per 100 ms frame are visible in the persistence display even when they're invisible in the instantaneous spectrum.
# Launch gr-fosphor in GNU Radio
# In a GRC flowgraph, add a fosphor sink block (Qt or GLFW variant)
# Connect your signal source → fosphor sink
# Alternatively, from the command line with an RTL-SDR, use the
# osmocom_fft tool from gr-osmosdr (-F selects the fosphor display):
osmocom_fft -F -f 433.92e6 -s 2.4e6
Survey parameters to record:
| Parameter | How to measure | Notes |
|---|---|---|
| Center frequency | Tune SDR; observe strongest signal | ±drift for frequency accuracy |
| Bandwidth | -3 dB points of spectral envelope | Filter bandwidth, not channel spacing |
| EIRP estimate | Calibrated power meter or reference source | Link budget reverse-calculation |
| Duty cycle | Persistence display; time % signal present | Burst vs. continuous |
| Transmit timing | Timestamp via GPS-synchronized capture | Repeat interval, inter-burst gap |
| Polarization | Rotate receiving antenna | Determines antenna orientation |
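The "link budget reverse-calculation" in the EIRP row can be sketched as below. This is a minimal illustration that assumes free-space propagation, a known distance to the emitter, and a known receive antenna gain; the function names and the example numbers are hypothetical, not taken from the course material.

```python
import math

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB (far-field, line-of-sight assumption)."""
    c = 299_792_458.0  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

def estimate_eirp_dbm(rx_power_dbm: float, distance_m: float,
                      freq_hz: float, rx_gain_dbi: float = 0.0) -> float:
    """Work the link budget backwards: EIRP = Prx + path loss - receive gain."""
    return rx_power_dbm + fspl_db(distance_m, freq_hz) - rx_gain_dbi

# Hypothetical measurement: -60 dBm received at 100 m on 433.92 MHz, 2 dBi antenna
eirp = estimate_eirp_dbm(-60.0, 100.0, 433.92e6, rx_gain_dbi=2.0)
print(f"Path loss: {fspl_db(100.0, 433.92e6):.1f} dB, estimated EIRP: {eirp:.1f} dBm")
```

Multipath, obstructions, and an uncalibrated receiver all add error, so an estimate of this kind belongs in the INFERRED category at best.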
7.3 Stage 2: Modulation Classification
Visual indicators in waterfall and constellation:
| Observation | Likely modulation |
|---|---|
| Constant amplitude, varying phase | PSK (BPSK, QPSK, 8PSK) |
| Varying amplitude AND phase, grid pattern | QAM (16-QAM, 64-QAM) |
| Discrete frequency jumps, constant amplitude | FSK (2-FSK, 4-FSK, GFSK) |
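| Linear frequency sweeps (chirps) that wrap at the band edge within each symbol | Chirp spread spectrum (e.g., LoRa CSS) |
| Wideband, noise-like appearance, low spectral density | Direct-sequence spread spectrum (DSSS, DS-CDMA) |
| Amplitude varies while carrier frequency and phase stay fixed | AM or ASK/OOK |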
| Multiple closely-spaced subcarriers | OFDM |
Instantaneous parameter extraction:
import numpy as np
import matplotlib.pyplot as plt

def analyze_signal(iq_samples, fs):
    """Extract instantaneous amplitude, frequency, and phase from IQ samples."""
    # Instantaneous amplitude
    amplitude = np.abs(iq_samples)
    # Instantaneous phase (unwrapped)
    phase = np.unwrap(np.angle(iq_samples))
    # Instantaneous frequency = derivative of phase
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    # Statistical features
    amp_variance = np.var(amplitude) / np.mean(amplitude)**2  # normalized
    freq_variance = np.var(inst_freq)
    # Use the unwrapped phase so 2π wrap-arounds do not inflate the variance
    phase_variance = np.var(np.diff(phase))
    print(f"Amplitude variance (normalized): {amp_variance:.4f}")
    print(f"Inst. frequency variance: {freq_variance:.1f} Hz²")
    print(f"Phase step variance: {phase_variance:.4f} rad²")
    # Modulation classification heuristics. The thresholds are illustrative
    # starting points: they depend on fs, SNR, and residual carrier offset.
    if amp_variance < 0.01 and freq_variance > 1e6:
        print("→ Likely FSK (constant amplitude, frequency variation)")
    elif amp_variance < 0.01 and freq_variance < 1e3:
        print("→ Likely PSK (constant amplitude, low frequency variation)")
    elif amp_variance > 0.1:
        print("→ Likely AM/ASK or QAM (amplitude variation)")
    else:
        print("→ Inconclusive: inspect the constellation and spectrogram")
    return amplitude, phase, inst_freq

# Load a captured IQ recording (Lab 7 provides the target capture)
# iq = np.fromfile('unknown_signal.cf32', dtype=np.complex64)
# analyze_signal(iq, fs=2.4e6)
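When the heuristics point to PSK, one common way to narrow down the order is the M-th power test: raising an M-PSK signal to the M-th power strips the modulation and leaves a spectral line at M times the carrier offset. The sketch below is an illustration on synthetic data, not part of the lab code. The helper names, the 0.2 power-fraction threshold, and the synthetic signal parameters (rectangular pulses, mild noise) are all assumptions chosen for the example.

```python
import numpy as np

def mth_power_line(iq, m):
    """Return (fraction of total power in the strongest FFT bin, its normalized
    frequency in cycles/sample) for the signal raised to the m-th power."""
    spec = np.abs(np.fft.fft(iq ** m)) ** 2
    k = int(np.argmax(spec))
    freqs = np.fft.fftfreq(len(iq))
    return spec[k] / np.sum(spec), freqs[k]

def guess_psk_order(iq, orders=(2, 4, 8), min_fraction=0.2):
    """Smallest M for which iq**M shows a dominant spectral line, else None.
    min_fraction is an illustrative threshold, not a calibrated value."""
    for m in orders:
        fraction, _ = mth_power_line(iq, m)
        if fraction > min_fraction:
            return m
    return None

def synth_psk(m, n_sym=1024, sps=8, f_off=0.0, noise_std=0.05, seed=0):
    """Synthetic M-PSK with rectangular pulses, a carrier offset, and noise."""
    rng = np.random.default_rng(seed)
    symbols = np.exp(2j * np.pi * rng.integers(0, m, n_sym) / m)
    x = np.repeat(symbols, sps)
    x = x * np.exp(2j * np.pi * f_off * np.arange(len(x)))
    noise = noise_std * (rng.standard_normal(len(x))
                         + 1j * rng.standard_normal(len(x)))
    return x + noise

bpsk = synth_psk(2, f_off=100 / 8192)
qpsk = synth_psk(4, f_off=100 / 8192)
print("BPSK guess:", guess_psk_order(bpsk))
print("QPSK guess:", guess_psk_order(qpsk))
```

On a real capture the line is weaker than in this idealized case (pulse shaping, low SNR, and an offset that falls between FFT bins all reduce it), so the threshold needs tuning against known signals before the result is reported with any confidence.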
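Automatic Modulation Classification (AMC): Machine learning approaches (CNNs on I/Q samples, or on constellation images) are the current reference point, but reported accuracy depends heavily on architecture and dataset. At SNR > 10 dB, published results on RadioML 2016.10A (11 modulation types) typically fall in the 80-90% range, while deeper residual networks report roughly 95% on RadioML 2018.01 (24 modulation types). The AMC literature is the reference for the ML signal classifier mentioned in the capstone option.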
7.4 Stage 3: Multiple-Access Identification
Time-frequency analysis (short-time Fourier transform):
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

def plot_spectrogram(iq, fs, title='Signal Spectrogram'):
    """Compute and display a two-sided spectrogram of complex IQ samples."""
    f, t, Sxx = spectrogram(
        iq, fs=fs,
        window='hann',
        nperseg=256,
        noverlap=128,
        return_onesided=False
    )
    # Center frequencies (FFT shift)
    f_shifted = np.fft.fftshift(f)
    Sxx_shifted = np.fft.fftshift(Sxx, axes=0)
    # Normalize to the strongest bin so the color scale is dB relative to peak
    Sxx_db = 10 * np.log10(Sxx_shifted / np.max(Sxx_shifted) + 1e-10)
    plt.figure(figsize=(12, 6))
    plt.pcolormesh(t * 1e3, f_shifted / 1e3, Sxx_db,
                   cmap='viridis', vmin=-60, vmax=0, shading='auto')
    plt.colorbar(label='Power (dB relative to peak)')
    plt.xlabel('Time (ms)')
    plt.ylabel('Frequency (kHz)')
    plt.title(title)
    plt.savefig('spectrogram.png', dpi=150)
    plt.close()
FHSS detection: Frequency-hopping spread spectrum appears in a spectrogram as short bursts at pseudo-random frequencies. The burst duration (dwell time) and frequency hop rate are visible. Bluetooth Classic uses FHSS at 1600 hops/second (625 μs dwell); military FHSS systems use hop rates of 10-1000+ hops/second.
TDMA detection: Time-division multiple access appears as periodic bursts at a fixed frequency, separated by inter-burst gaps. The burst duration, guard time, and frame period are measurable from the spectrogram.
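Burst duration, frame period, and duty cycle can also be measured directly from the power envelope. The following is a minimal sketch under simple assumptions: bursts sit well above the noise floor, and a threshold 10 dB below the envelope peak separates them. The function name, the smoothing length, and the synthetic 2 ms / 10 ms frame are hypothetical.

```python
import numpy as np

def burst_timing(iq, fs, threshold_db=-10.0, smooth_samples=16):
    """Return (start times, durations, start-to-start periods, duty cycle)."""
    power = np.abs(iq) ** 2
    kernel = np.ones(smooth_samples) / smooth_samples
    env = np.convolve(power, kernel, mode='same')        # smoothed power envelope
    active = env > env.max() * 10 ** (threshold_db / 10)
    edges = np.diff(active.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    stops = np.flatnonzero(edges == -1) + 1
    if active[0]:                                        # capture begins mid-burst
        starts = np.r_[0, starts]
    if active[-1]:                                       # capture ends mid-burst
        stops = np.r_[stops, len(active)]
    durations = (stops - starts) / fs
    periods = np.diff(starts) / fs
    return starts / fs, durations, periods, float(active.mean())

# Synthetic TDMA-like capture: 2 ms bursts every 10 ms at 1 MS/s
fs = 1e6
rng = np.random.default_rng(0)
iq = 0.01 * (rng.standard_normal(50_000) + 1j * rng.standard_normal(50_000))
for frame in range(5):
    start = frame * 10_000 + 1_000
    iq[start:start + 2_000] += np.exp(2j * np.pi * 0.01 * np.arange(2_000))

starts, durations, periods, duty = burst_timing(iq, fs)
print(f"{len(durations)} bursts, median duration {np.median(durations) * 1e3:.2f} ms, "
      f"median period {np.median(periods) * 1e3:.2f} ms, duty cycle {duty:.2f}")
```

At low SNR a fixed threshold stops working and the envelope needs longer smoothing or a matched approach, which is worth noting in the evidence trail.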
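OFDM detection: OFDM produces a distinctive near-rectangular spectrum (flat across the occupied bandwidth, with steep roll-off at the band edges). The cyclic prefix adds cyclostationary features: an autocorrelation peak at a lag equal to the useful symbol duration, recurring at the symbol rate 1/T_sym.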
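The cyclic-prefix signature can be tested with a lagged autocorrelation: the lag of the peak gives the useful symbol length, and the periodicity of the lagged product gives the full symbol length including the prefix. The sketch below demonstrates this on a synthetic OFDM signal. The function names, the 64-point FFT with a 16-sample prefix, and the search limits are assumptions for the example, and a real capture would first need resampling so that the lags are meaningful.

```python
import numpy as np

def cp_autocorr(iq, max_lag):
    """Normalized autocorrelation magnitude for lags 1..max_lag."""
    energy = np.sum(np.abs(iq) ** 2)
    lags = np.arange(1, max_lag + 1)
    rho = np.array([np.abs(np.vdot(iq[:-lag], iq[lag:])) / energy for lag in lags])
    return lags, rho

def estimate_ofdm_timing(iq, max_lag=256, min_lag=8):
    """Return (useful symbol length, total symbol length) in samples."""
    lags, rho = cp_autocorr(iq, max_lag)
    mask = lags >= min_lag                     # skip short lags (pulse-shape correlation)
    useful_len = int(lags[mask][np.argmax(rho[mask])])
    # The lagged product is large only inside each cyclic prefix, so it
    # repeats once per OFDM symbol; its spectrum peaks at 1 / symbol length.
    prod = iq[useful_len:] * np.conj(iq[:-useful_len])
    spec = np.abs(np.fft.fft(prod - prod.mean()))
    k = 1 + int(np.argmax(spec[1:len(prod) // 2]))
    return useful_len, len(prod) / k

def synth_ofdm(n_sym=200, n_fft=64, cp=16, seed=0):
    """Synthetic OFDM: QPSK on every subcarrier, cyclic prefix, light noise."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, (n_sym, n_fft, 2)) * 2 - 1
    data = (bits[..., 0] + 1j * bits[..., 1]) / np.sqrt(2)
    time = np.fft.ifft(data, axis=1) * np.sqrt(n_fft)
    x = np.hstack([time[:, -cp:], time]).ravel()
    noise = 0.05 * (rng.standard_normal(len(x)) + 1j * rng.standard_normal(len(x)))
    return x + noise

useful_len, symbol_len = estimate_ofdm_timing(synth_ofdm())
print(f"Useful symbol: {useful_len} samples, total symbol: {symbol_len:.1f} samples")
```

The difference between the two estimates is the cyclic-prefix length, which is itself a useful discriminator between OFDM-based protocol families.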
7.5 Stage 4: Symbol Structure
Symbol rate estimation:
import numpy as np
from scipy.signal import welch, find_peaks

def estimate_symbol_rate(iq, fs, min_rate_hz=1000.0, min_prominence_db=10.0):
    """Estimate symbol rate from the power spectral density of the signal envelope."""
    # The symbol rate appears as a spectral line in |x|²
    power = np.abs(iq)**2
    # PSD of the envelope signal (welch removes the mean of each segment by default)
    f_welch, Pxx = welch(power, fs=fs, nperseg=4096)
    pxx_db = 10 * np.log10(Pxx + 1e-20)
    # Find narrow spectral lines: local maxima that stand out from their
    # surroundings by min_prominence_db (an illustrative threshold)
    peak_idx, _ = find_peaks(pxx_db, prominence=min_prominence_db)
    candidates = f_welch[peak_idx]
    # Ignore the DC region (below min_rate_hz)
    candidates = candidates[candidates > min_rate_hz]
    if len(candidates) > 0:
        # The lowest remaining line is typically the symbol rate; harmonics lie above it
        sym_rate_est = candidates[0]
        print(f"Estimated symbol rate: {sym_rate_est/1e3:.2f} kbaud")
        return sym_rate_est
    print("Symbol rate not clearly identifiable from PSD")
    return None
# After symbol rate estimation, resample to ~4-8 samples per symbol
# then look for preamble patterns using autocorrelation
def find_preamble(bits, pattern_candidates=None):
    """Search for a preamble pattern in a recovered bit sequence (values 0/1)."""
    if pattern_candidates is None:
        # Common preambles: alternating 1010..., run-length patterns, known sync words
        sync_bytes = np.array([0xAA, 0xAA, 0xD3, 0x91], dtype=np.uint8)
        pattern_candidates = [
            [1,0,1,0,1,0,1,0],  # alternating (OOK common)
            [1,1,1,1,0,0,0,0],  # 4+4 run-length
            # 0xAA preamble + 0xD391 sync word (TI CC1101 default), MSB first
            np.unpackbits(sync_bytes).tolist(),
        ]
    # Map 0/1 to -1/+1 so that matching zeros count toward the correlation
    sym = 2.0 * np.asarray(bits, dtype=float) - 1.0
    for pattern in pattern_candidates:
        p = 2.0 * np.asarray(pattern, dtype=float) - 1.0
        if len(p) <= len(sym):
            corr = np.correlate(sym, p, mode='valid')
            peak_loc = int(np.argmax(np.abs(corr)))
            # |corr| also catches an inverted-polarity match
            if np.abs(corr[peak_loc]) > 0.9 * len(p):
                print(f"Preamble candidate found at bit {peak_loc}: {pattern}")
                return peak_loc
    print("No common preamble detected")
    return None
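Between symbol-rate estimation and preamble search sits bit recovery. For a 2-FSK hypothesis, a quadrature discriminator followed by per-symbol averaging is one simple way to get a bit sequence. The sketch below assumes the samples-per-symbol value is already known, the capture starts on a symbol boundary, and the bits are roughly balanced; the function name and the synthetic test signal are hypothetical.

```python
import numpy as np

def fsk_demod_bits(iq, sps):
    """2-FSK bit recovery: phase-difference discriminator + integrate-and-dump.
    Assumes integer samples per symbol and a symbol-aligned first sample."""
    # Phase step between consecutive samples is proportional to instantaneous frequency
    dphi = np.angle(iq[1:] * np.conj(iq[:-1]))
    dphi = np.concatenate(([dphi[0]], dphi))       # pad to the original length
    n_sym = len(dphi) // sps
    per_symbol = dphi[:n_sym * sps].reshape(n_sym, sps).mean(axis=1)
    # Decision threshold midway between the two frequency clusters; this
    # tolerates a carrier offset but assumes both bit values occur often
    lo, hi = np.percentile(per_symbol, [5, 95])
    return (per_symbol > (lo + hi) / 2).astype(int)

# Synthetic 2-FSK: ±0.05 cycles/sample deviation, 8 samples per symbol
rng = np.random.default_rng(0)
tx_bits = rng.integers(0, 2, 200)
sps = 8
phase = np.cumsum(2 * np.pi * 0.05 * np.repeat(2 * tx_bits - 1, sps))
iq = np.exp(1j * phase)
iq = iq + 0.05 * (rng.standard_normal(len(iq)) + 1j * rng.standard_normal(len(iq)))

rx_bits = fsk_demod_bits(iq, sps)
print("Bit errors:", int(np.sum(rx_bits != tx_bits)))
```

Real captures need symbol timing recovery before the averaging step; an eye diagram of the discriminator output is the usual check that the alignment assumption holds.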
7.6 Stage 5: Protocol Hypothesis and Documentation
The final stage is synthesizing the observations into a protocol hypothesis:
Hypothesis document structure:
- Signal identification: center frequency, bandwidth, modulation (with evidence), multiple-access scheme
- Symbol parameters: symbol rate (with evidence), samples-per-symbol, bit order
- Frame structure: preamble (if identified), sync word, payload format, CRC/FEC (if detected)
- State machine hypothesis: what states does the transmitter cycle through? Are there request-response pairs? Acknowledgements?
- Protocol family hypothesis: what known protocol family does this most resemble? What are the differences?
- Confidence assessment: for each claim, one of {CONFIRMED (bit-for-bit verified), INFERRED (consistent with evidence but not confirmed), HYPOTHESIZED (plausible from inspection but not tested)}
The confidence assessment discipline is the central professional skill. RE work produces hypotheses, not facts. A professional SIGINT analyst or protocol RE engineer who states a hypothesis as a fact is unreliable. The confidence level must accompany every claim.
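One way to enforce the rule that a confidence level accompanies every claim is to make it a required field in the hypothesis record. The sketch below is a hypothetical structure, not a prescribed lab format. The class names and the example claims are invented for illustration, and the rule that anything above HYPOTHESIZED must cite evidence is an assumption drawn from the level definitions above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Confidence(Enum):
    CONFIRMED = "bit-for-bit verified"
    INFERRED = "consistent with evidence but not confirmed"
    HYPOTHESIZED = "plausible from inspection but not tested"

@dataclass
class Claim:
    statement: str
    confidence: Confidence
    evidence: List[str] = field(default_factory=list)

@dataclass
class ProtocolHypothesis:
    target: str
    claims: List[Claim] = field(default_factory=list)

    def add(self, statement: str, confidence: Confidence, *evidence: str) -> None:
        # Anything stronger than HYPOTHESIZED must point at recorded evidence
        if confidence is not Confidence.HYPOTHESIZED and not evidence:
            raise ValueError(f"{confidence.name} claims must cite evidence")
        self.claims.append(Claim(statement, confidence, list(evidence)))

    def report(self) -> str:
        lines = [f"Protocol hypothesis: {self.target}"]
        for claim in self.claims:
            lines.append(f"- [{claim.confidence.name}] {claim.statement}")
            for item in claim.evidence:
                lines.append(f"    evidence: {item}")
        return "\n".join(lines)

# Hypothetical entries for an unknown emitter
hyp = ProtocolHypothesis("unknown 433.92 MHz emitter")
hyp.add("2-FSK modulation", Confidence.INFERRED,
        "two-cluster instantaneous frequency histogram")
hyp.add("Symbol rate near 9.6 kbaud", Confidence.HYPOTHESIZED)
print(hyp.report())
```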
7.7 ML Signal Classifier Capstone Option
The capstone offers an ML signal classifier track: train a convolutional neural network on I/Q samples to classify modulation types, and use it to assist the SIGINT classification pipeline.
Reference datasets:
- RadioML 2016.10A (DeepSig): 220K examples, 11 modulation types, 20 SNR levels. Download from https://www.deepsig.ai/datasets
- RadioML 2018.01 (DeepSig): 2.55M examples, 24 modulation types
PyTorch CNN baseline:
import torch
import torch.nn as nn

class ModulationCNN(nn.Module):
    """Simple CNN for modulation classification from IQ samples."""
    def __init__(self, num_classes=11, input_length=128):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=8, padding='same'),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=8, padding='same'),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Dropout(0.3),
        )
        # Two MaxPool1d(2) layers reduce the sequence length by a factor of 4
        fc_input_size = 128 * (input_length // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(fc_input_size, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        # x: [batch, 2, seq_len] (I and Q as channels)
        x = self.conv_block(x)
        return self.classifier(x)

# Training sketch
model = ModulationCNN(num_classes=11, input_length=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
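This CNN architecture (modeled on the convolutional baseline of O'Shea, Corgan & Clancy 2016, and trained on the dataset described by O'Shea & West 2016) reaches roughly 80% accuracy at 10 dB SNR on RadioML 2016.10A; the exact figure depends on the training setup. For the capstone ML-classifier option, students extend this baseline with better architectures (ResNet, Transformer) and integrate it into the Lab 7 classification pipeline.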
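The model expects input shaped [batch, 2, seq_len], while captures are usually stored as complex arrays. A minimal conversion sketch follows, assuming complex input of shape [batch, seq_len] and per-example power normalization; the function name is hypothetical and the normalization choice is an assumption, not something the baseline prescribes.

```python
import numpy as np

def iq_to_channels(iq_batch):
    """Convert complex [batch, seq_len] IQ into float32 [batch, 2, seq_len],
    scaling each example to unit average power."""
    iq_batch = np.asarray(iq_batch)
    power = np.mean(np.abs(iq_batch) ** 2, axis=1, keepdims=True)
    iq_norm = iq_batch / np.sqrt(power + 1e-12)   # guard against all-zero examples
    return np.stack([iq_norm.real, iq_norm.imag], axis=1).astype(np.float32)

# Four random 128-sample examples with different gains
rng = np.random.default_rng(0)
raw = (rng.standard_normal((4, 128)) + 1j * rng.standard_normal((4, 128)))
raw = raw * np.array([[0.1], [1.0], [5.0], [20.0]])
x = iq_to_channels(raw)
print(x.shape, x.dtype)
```

The resulting array can be wrapped with `torch.from_numpy(x)` before being passed to the network. Normalizing per example removes absolute signal level as a feature, which is usually what you want when the receiver gain is not calibrated.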
7.8 Anchor Weave: Sklar + Wyglinski on SIGINT fundamentals
Bernard Sklar's Digital Communications (3rd ed.) contains the modulation-theory foundation for the classification pipeline. The key Sklar argument for SIGINT: the statistical properties of a modulated signal (autocorrelation, cyclostationary features, constellation statistics) are deterministic consequences of the modulation scheme. If you measure those properties correctly, the modulation scheme is uniquely identifiable in principle -- the only question is whether your SNR budget is sufficient.
Wyglinski et al. Ch 4-5 provides the receiver-chain framing: for a given received signal power, the SNR at which classification operates is set by the receiver chain (noise figure, gain staging, ADC dynamic range). If your noise figure is too high or your ADC is saturated, you will fail to classify correctly regardless of algorithm sophistication. The engineering and the algorithm are coupled.
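The coupling can be made concrete with a noise-floor calculation. The sketch below uses the standard thermal noise density of about -174 dBm/Hz at 290 K and the Friis cascade formula. The function names and the example receiver chain (an LNA ahead of an SDR front end, a 200 kHz channel, a -110 dBm signal) are hypothetical values chosen for illustration.

```python
import math

def noise_floor_dbm(bandwidth_hz: float, noise_figure_db: float) -> float:
    """Receiver noise floor: thermal noise (about -174 dBm/Hz at 290 K)
    scaled to the bandwidth, plus the system noise figure."""
    return -174.0 + 10 * math.log10(bandwidth_hz) + noise_figure_db

def cascade_noise_figure_db(stages) -> float:
    """Friis formula. stages = [(gain_db, nf_db), ...] ordered from the antenna."""
    total_f = 0.0
    gain_product = 1.0
    for i, (gain_db, nf_db) in enumerate(stages):
        f = 10 ** (nf_db / 10)
        total_f += f if i == 0 else (f - 1) / gain_product
        gain_product *= 10 ** (gain_db / 10)
    return 10 * math.log10(total_f)

# Hypothetical chain: SDR alone (NF 8 dB) vs. a 20 dB gain, 1 dB NF LNA in front
bandwidth_hz = 200e3
signal_dbm = -110.0
nf_sdr_only = cascade_noise_figure_db([(0.0, 8.0)])
nf_with_lna = cascade_noise_figure_db([(20.0, 1.0), (0.0, 8.0)])
for label, nf in (("SDR only", nf_sdr_only), ("LNA + SDR", nf_with_lna)):
    snr = signal_dbm - noise_floor_dbm(bandwidth_hz, nf)
    print(f"{label}: NF {nf:.2f} dB, SNR {snr:.1f} dB")
```

In this example the same signal and the same classifier see several dB more SNR purely from the front-end change, which can move a capture across the threshold where the classification heuristics start to work.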
Lab Introduction
Lab 7 (25 pts): SIGINT discipline lab. The instructor provides an unknown low-SNR capture (IQ file); students execute the full five-stage classification pipeline, document their hypothesis trail, and produce a confidence-assessed protocol hypothesis. See labs/lab-7.md.
Independent Practice
- Download the RadioML 2016.10A dataset and compute the classification accuracy of the analyze_signal() heuristic function above on 10 dB SNR examples. What is the most common misclassification?
- Implement a spectrogram-based FHSS detector: given a 5-second capture of Bluetooth Classic traffic, detect the 1600 hop/second pattern and estimate the hop rate
- Read the Sklar Ch 14 CDMA chapter section on the PN sequence structure. Explain in 100 words how knowing the PN sequence chip rate (but not the code) narrows your DSSS hypothesis