Classroom Glossary Public page

RF-301 Week 11c — ML-Based Modulation Classification (Optional)

1,120 words

"The neural network does not read the textbook. It does not know that FSK has constant amplitude and PSK has constant envelope. It learns to distinguish them the same way a child learns to distinguish a cat from a dog -- by looking at thousands of examples. The useful question is not 'how?' but 'when is it better than the expert?'" -- paraphrase of O'Shea & Hoydis (2017), "An Introduction to Deep Learning for the Physical Layer"


Lecture (90 min)

11.1 Why ML for Modulation Classification

The SIGINT pipeline in Week 7 classified modulations using hand-engineered features: normalized amplitude variance, instantaneous frequency variance, phase step standard deviation. These features work well for the clear-cut cases -- OOK is obviously different from BPSK, which is obviously different from FM. But consider what happens when:

  • The SNR drops below 5 dB
  • The signal uses a modulation not in the heuristic lookup table
  • Two modulations produce similar feature vectors (e.g., QPSK vs 8PSK at 10 dB SNR)

In these conditions, the hand-engineered classifier breaks down. A convolutional neural network trained on a large dataset of labeled IQ captures can, in principle, do better -- not because it knows more about RF physics, but because it has seen more examples and learned subtler discriminating patterns in the raw IQ.

This week covers that tradeoff: when ML-based automatic modulation classification (AMC) outperforms hand-engineered features, how to build and evaluate a CNN classifier, and what failure modes to expect when the training distribution does not match the real channel.


11.2 RadioML Datasets

RadioML 2016.10A (O'Shea, 2016) is the canonical benchmark dataset for AMC research:

  • 11 modulation classes: 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, WBFM
  • SNR range: -20 dB to +18 dB in 2 dB steps (20 SNR levels)
  • 1000 example IQ frames per (modulation, SNR) pair
  • Each frame: 128 I/Q samples (2 × 128 float array)
  • Channel model: AWGN + random phase rotation + CFO offset

RadioML 2018.01 (O'Shea, 2018) extends to 24 classes and 4,000 examples per pair, with a more realistic multipath fading channel model. The 2016.10A dataset is used in this course because it is smaller and loads faster in a lab environment.

import pickle
import numpy as np

def load_radioml_2016(path='RML2016.10a_dict.pkl'):
    """
    Load RadioML 2016.10A dataset.
    Returns: (X, labels) where X is (N, 2, 128) and labels is list of (modulation, snr).
    """
    with open(path, 'rb') as f:
        data = pickle.load(f, encoding='latin1')
    
    X_all = []
    Y_all = []
    
    for (modulation, snr), frames in data.items():
        # frames shape: (1000, 2, 128)
        for frame in frames:
            X_all.append(frame)
            Y_all.append((modulation, snr))
    
    return np.array(X_all, dtype=np.float32), Y_all

def dataset_summary(Y_labels):
    """Print dataset statistics."""
    modulations = sorted(set(m for m, _ in Y_labels))
    snrs = sorted(set(s for _, s in Y_labels))
    
    print(f"Modulation classes ({len(modulations)}): {modulations}")
    print(f"SNR range: {min(snrs)} to {max(snrs)} dB")
    print(f"Total examples: {len(Y_labels):,}")
    print(f"Examples per (modulation, SNR) pair: {len(Y_labels) // (len(modulations) * len(snrs))}")

Dataset access. RadioML 2016.10A is distributed at radioml.com (free registration required). File size: ~54 MB compressed. The dataset ships as a Python pickle; the loading function above handles the encoding quirk of Python 2 pickles loaded in Python 3.


11.3 CNN Architecture for AMC

The reference architecture from O'Shea (2016), simplified for the lab environment:

import torch
import torch.nn as nn

class ModulationCNN(nn.Module):
    """
    CNN for automatic modulation classification.
    Input: (batch, 2, 128) IQ frame
    Output: (batch, n_classes) log-probabilities
    """
    def __init__(self, n_classes=11, dropout=0.5):
        super().__init__()
        
        self.conv_block = nn.Sequential(
            # Block 1: feature extraction from raw IQ
            nn.Conv1d(2, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),   # 128 → 64
            nn.Dropout(dropout),
            
            # Block 2: temporal pattern learning
            nn.Conv1d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),   # 64 → 32
            nn.Dropout(dropout),
            
            # Block 3: higher-order features
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.MaxPool1d(2),   # 32 → 16
            nn.Dropout(dropout),
        )
        
        # Flatten then classify
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16, 256),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, n_classes),
        )
    
    def forward(self, x):
        return self.classifier(self.conv_block(x))

# Build and inspect the model
model = ModulationCNN(n_classes=11)
print(model)

# Parameter count
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal trainable parameters: {total_params:,}")

Why Conv1d over Conv2d? The IQ frame is a 1-D time series with two channels (I and Q), not a 2-D image. Conv1d with in_channels=2 treats I and Q as the channel dimension and slides the kernel along the 128-sample time axis. Using Conv2d would require reshaping and is architecturally inappropriate for temporal sequence data.


11.4 Training Loop

from torch.utils.data import Dataset, DataLoader, random_split
import torch.optim as optim

class RadioMLDataset(Dataset):
    def __init__(self, X, Y_labels, modulation_list, snr_filter=None):
        """
        Args:
            X: (N, 2, 128) numpy array
            Y_labels: list of (modulation_str, snr_int)
            modulation_list: ordered list of class names (defines class indices)
            snr_filter: if not None, only include samples at this SNR or above
        """
        self.mod2idx = {m: i for i, m in enumerate(modulation_list)}
        
        mask = np.ones(len(X), dtype=bool)
        if snr_filter is not None:
            mask = np.array([snr >= snr_filter for _, snr in Y_labels])
        
        self.X = torch.from_numpy(X[mask])
        self.Y = torch.tensor([self.mod2idx[m] for m, _ in [Y_labels[i] for i in np.where(mask)[0]]])
    
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]

def train_epoch(model, loader, optimizer, criterion, device):
    model.train()
    total_loss, correct, total = 0, 0, 0
    for X_batch, Y_batch in loader:
        X_batch, Y_batch = X_batch.to(device), Y_batch.to(device)
        optimizer.zero_grad()
        logits = model(X_batch)
        loss = criterion(logits, Y_batch)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * len(Y_batch)
        correct += (logits.argmax(1) == Y_batch).sum().item()
        total += len(Y_batch)
    return total_loss / total, correct / total

def eval_epoch(model, loader, criterion, device):
    model.eval()
    total_loss, correct, total = 0, 0, 0
    with torch.no_grad():
        for X_batch, Y_batch in loader:
            X_batch, Y_batch = X_batch.to(device), Y_batch.to(device)
            logits = model(X_batch)
            loss = criterion(logits, Y_batch)
            total_loss += loss.item() * len(Y_batch)
            correct += (logits.argmax(1) == Y_batch).sum().item()
            total += len(Y_batch)
    return total_loss / total, correct / total

def train_classifier(X, Y_labels, modulation_list, n_epochs=10, batch_size=256):
    """Full training pipeline."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Training on {device}")
    
    # Dataset and splits
    dataset = RadioMLDataset(X, Y_labels, modulation_list)
    n_train = int(0.8 * len(dataset))
    n_val = len(dataset) - n_train
    train_ds, val_ds = random_split(dataset, [n_train, n_val])
    
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=2)
    val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=False, num_workers=2)
    
    model = ModulationCNN(n_classes=len(modulation_list)).to(device)
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    
    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}
    
    for epoch in range(n_epochs):
        tr_loss, tr_acc = train_epoch(model, train_loader, optimizer, criterion, device)
        va_loss, va_acc = eval_epoch(model, val_loader, criterion, device)
        history['train_loss'].append(tr_loss)
        history['val_loss'].append(va_loss)
        history['train_acc'].append(tr_acc)
        history['val_acc'].append(va_acc)
        print(f"Epoch {epoch+1:02d}/{n_epochs}: "
              f"train_loss={tr_loss:.4f} train_acc={tr_acc:.3f} "
              f"val_loss={va_loss:.4f} val_acc={va_acc:.3f}")
    
    return model, history

Expected results at 10 epochs. On the full RadioML 2016.10A dataset, this architecture reaches approximately 65-75% validation accuracy when all SNR levels are included. At SNR = +10 dB alone, accuracy exceeds 85%. The wide gap between these numbers is the central lesson of the chapter.


11.5 Confusion Matrix Analysis

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

def evaluate_at_snr(model, X, Y_labels, modulation_list, target_snr, device='cpu'):
    """Evaluate model accuracy at a specific SNR level."""
    model.eval()
    
    # Filter to target SNR
    indices = [i for i, (_, snr) in enumerate(Y_labels) if snr == target_snr]
    X_snr = torch.from_numpy(X[indices]).to(device)
    Y_true = np.array([modulation_list.index(Y_labels[i][0]) for i in indices])
    
    with torch.no_grad():
        logits = model(X_snr)
        Y_pred = logits.argmax(1).cpu().numpy()
    
    acc = (Y_pred == Y_true).mean()
    cm = confusion_matrix(Y_true, Y_pred)
    return acc, cm, Y_true, Y_pred

def plot_confusion_matrix(cm, modulation_list, snr, output_path='confusion_matrix.png'):
    fig, ax = plt.subplots(figsize=(10, 8))
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=modulation_list)
    disp.plot(ax=ax, colorbar=True, cmap='Blues', xticks_rotation=45)
    ax.set_title(f'Confusion Matrix at SNR = {snr} dB')
    plt.tight_layout()
    plt.savefig(output_path, dpi=150)
    return fig

def accuracy_vs_snr(model, X, Y_labels, modulation_list, snrs=None, device='cpu'):
    """Compute per-SNR accuracy to generate accuracy vs. SNR curve."""
    if snrs is None:
        snrs = sorted(set(snr for _, snr in Y_labels))
    
    results = {}
    for snr in snrs:
        acc, _, _, _ = evaluate_at_snr(model, X, Y_labels, modulation_list, snr, device)
        results[snr] = acc
        print(f"SNR {snr:+3d} dB: {acc:.3f}")
    return results

Key pattern in the confusion matrix. At high SNR, BPSK/QPSK/8PSK confusion dominates -- they are all constant-envelope phase-shift keyed and differ only in the number of phase states. At low SNR, nearly every modulation gets confused with BPSK because the noise overwhelms the higher-order structure. This pattern is diagnostic: if your training distribution covers all SNRs equally but your deployment environment is always high-SNR, you are training a harder problem than you need to solve.


11.6 Transfer Learning for Novel Modulations

The RadioML training distribution contains 11 classes. If a new modulation is added -- say, the proprietary OOK-FHSS hybrid in Lab 9 -- the CNN trained on RadioML will misclassify it as whatever RadioML class it most resembles.

Transfer learning approach:

  1. Freeze the convolutional layers (features they learned are broadly useful).
  2. Replace only the final linear classifier with a new layer for the expanded class set.
  3. Fine-tune on a small labeled dataset of the new class (100-500 examples).
def add_new_class(model, n_existing_classes, n_new_classes, freeze_conv=True):
    """
    Adapt a trained model to include new modulation classes.
    Strategy: replace the output layer; optionally freeze conv weights.
    """
    if freeze_conv:
        for param in model.conv_block.parameters():
            param.requires_grad = False
        print("Convolutional layers frozen.")
    
    # Get the input size of the current final layer
    old_classifier = model.classifier
    in_features = old_classifier[-1].in_features
    
    # Replace the last linear layer
    new_out = n_existing_classes + n_new_classes
    old_classifier[-1] = nn.Linear(in_features, new_out)
    
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters after freeze: {trainable_params:,}")
    return model

Domain shift caveat. RadioML 2016.10A uses an AWGN channel with random phase and CFO. Real ISM-band channels have multipath fading, adjacent-channel interference, and time-varying noise. A model trained exclusively on RadioML may perform poorly in a real-world environment even at high SNR -- not because the model is poorly trained, but because the training distribution does not match the deployment distribution. This is the "domain shift" problem; it is common in applied ML and is not specific to RF.


11.7 GNU Radio Deployment

Once trained, the classifier can run inside a GNU Radio flowgraph via gr-pyblock:

# GNU Radio custom block: AMC classifier
# File: amc_classifier_block.py (place in /usr/lib/python3/dist-packages/gnuradio/)

import numpy as np
import torch
import gnuradio.gr as gr

class amc_classifier(gr.sync_block):
    """
    GNU Radio block: run ML modulation classifier on IQ stream.
    Input: complex IQ samples (128 per work call)
    Output: class index (int) as message PDU
    """
    def __init__(self, model_path='modulation_cnn.pt', n_classes=11, frame_len=128):
        gr.sync_block.__init__(self, name='amc_classifier',
                               in_sig=[np.complex64],
                               out_sig=None)
        
        self.frame_len = frame_len
        self.device = torch.device('cpu')
        self.model = ModulationCNN(n_classes=n_classes).to(self.device)
        self.model.load_state_dict(torch.load(model_path, map_location=self.device))
        self.model.eval()
        
        self.message_port_register_out(gr.pmt.intern('class'))
        self.set_output_multiple(frame_len)
    
    def work(self, input_items, output_items):
        n_frames = len(input_items[0]) // self.frame_len
        
        for i in range(n_frames):
            frame = input_items[0][i * self.frame_len : (i+1) * self.frame_len]
            
            # Convert complex64 to (2, 128) float32 array
            x = np.stack([frame.real, frame.imag]).astype(np.float32)
            x_tensor = torch.from_numpy(x).unsqueeze(0).to(self.device)
            
            with torch.no_grad():
                logits = self.model(x_tensor)
                pred_class = logits.argmax(1).item()
            
            # Publish as PDU
            self.message_port_pub(gr.pmt.intern('class'),
                                  gr.pmt.from_long(pred_class))
        
        return n_frames * self.frame_len

Throughput constraint. At 2.4 MSPS with 128 samples per inference call, the block processes 18,750 frames per second. A PyTorch CPU inference on ModulationCNN takes roughly 0.5-2 ms per call depending on hardware. This creates a maximum throughput of 500-2000 calls per second -- comfortable for 2.4 MSPS but tight for 20+ MSPS capture rates. For high-sample-rate deployments, batch multiple frames per call or move to a GPU.


11.8 Architecture Comparison Sidebar

Classifier Features Training data Compute Generalization
Template matching (matched filter) None (raw IQ correlation) None (uses known signal) Low (single correlation) Zero (works only for known signal)
Expert-feature classifier (Week 7) Hand-engineered (amp var, freq var) None (heuristic thresholds) Very low Good for well-separated modulations
RadioML CNN (this week) Learned by gradient descent 220,000 labeled IQ frames Moderate (GPU training) Good within distribution; domain shift risk
O'Shea 2021 transformer Learned attention Large RadioML + real captures High (GPU training + inference) Better out-of-distribution; slower

The key observation. The matched filter and expert-feature classifier require human knowledge -- either a known signal template or human-engineered features. The CNN requires data. The real boundary is not "smart vs. dumb" -- it is which resource is scarcer in your deployment: domain knowledge or labeled data. In an academic lab with the RadioML dataset, data is free and the CNN wins. In a real engagement targeting a novel proprietary protocol, labeled data may not exist and the expert-feature approach may be the only option until you can collect and label captures.


Lab Preview

Lab 11 trains a ModulationCNN on RadioML 2016.10A, generates confusion matrices at SNR = +10 dB and SNR = -5 dB, and applies the trained model to the Lab 7 unknown signal capture. See labs/lab-11.md.


Toolchain Diary Prompt

New this week: torch, torch.nn, torch.utils.data; sklearn.metrics.confusion_matrix; gr-pyblock (if extending to GNU Radio). Compare the accuracy-vs-SNR curve from Lab 11 to the hand-engineered classifier performance from Lab 7. At what SNR does the CNN begin to outperform your Stage 2 heuristic? At what SNR do both fail?