Duration: 2 hr lecture + 3 hr lab + 5 hr independent Lab: Lab 4 (HuggingFace Model Card Audit + Pickle Risk Analysis) OWASP anchor: LLM03:2025 Supply Chain / ASI04:2026 Agentic Supply Chain Vulnerabilities Foundational weave: Mitchell Ch 2 (the brittleness of trained systems when the distribution shifts; applies to supply chain contamination)
4.1 The AI Supply Chain
AI applications have a supply chain that parallels the software supply chain, with its own unique attack surfaces:
Training data --> Model weights --> Model hosting --> Application code --> Deployment
| | | |
Poisoning Weight tampering Malicious model Package injection
(LLM04) (LLM03) (LLM03) (LLM03)
LLM03:2025 covers the components in that chain that can be compromised before the application developer even writes a line of code:
- Pre-trained models downloaded from public repositories (HuggingFace Hub, Ollama library, model zoos)
- Fine-tune datasets sourced from public or third-party providers
- Python packages in the ML ecosystem (PyPI, conda-forge)
- Model-serving infrastructure (Ollama, vLLM, TGI, TensorRT-LLM)
- Plugins and tool integrations (LangChain community integrations, Semantic Kernel plugins)
4.2 Malicious Models: The Pickle Problem
Model weights are commonly serialized using Python's pickle format (PyTorch's .bin, .pt, and .pth files). Pickle is arbitrary code execution. When you unpickle a file, Python executes whatever __reduce__ method the pickle object specifies. A malicious model file can run arbitrary shell commands on the loading machine.
This is not theoretical. Multiple malicious models have been discovered on HuggingFace Hub running cryptominers, reverse shells, and data-exfiltration payloads on load.
Attack scenario:
- Attacker uploads a model to HuggingFace Hub with a convincing model card ("fine-tuned Llama 3.2 for cybersecurity tasks")
- Developer downloads the model:
from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("attacker/model") from_pretrained()callstorch.load(), which unpickles the weights, which executes the attacker's payload
4.3 Safetensors: The Defense
safetensors (by HuggingFace) is a format designed to eliminate the pickle attack surface. It stores tensors as raw binary data with a header; there is no code execution path during loading.
# Unsafe: loads a .bin or .pt file via pickle
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("model-id") # can execute arbitrary code
# Safe: explicitly request safetensors format
model = AutoModelForCausalLM.from_pretrained(
"model-id",
use_safetensors=True # fails if only pickle weights are available
)
HuggingFace now flags models that do not provide safetensors variants and prompts uploaders to add them. As of 2026, most major model families provide safetensors. The safetensors Python package also provides a safe_open() function that you can use to scan a weight file before loading.
Practitioner rule: Never load a model weight file with torch.load() directly. Use transformers.from_pretrained() with use_safetensors=True, or the safetensors.torch.load_file() API.
4.4 HuggingFace Model Card Auditing
HuggingFace Hub hosts over 500,000 model checkpoints. Before using any model:
Check 1: File manifest. Does the repository contain only expected file types? A legitimate model repository typically contains: config.json, tokenizer.json, model.safetensors (or sharded variants), special_tokens_map.json. Red flags: .py files in the root, .sh scripts, large .zip archives, unexpected binaries.
Check 2: Download count and age. A model uploaded yesterday with zero downloads and a polished-looking card is suspicious. Cross-reference the model card's claimed base model against the actual architecture.
Check 3: Organization verification. HuggingFace shows a verification badge for organizations that have gone through identity verification. Unverified organizations can use names that closely resemble verified ones.
Check 4: Pickle scanner. Run picklescan against downloaded files before loading:
pip install picklescan
picklescan -p model_directory/
Check 5: Model card license and intended use. Fine-tunes trained on proprietary data or distributed under unclear licenses create legal and operational risk in addition to security risk.
4.5 Dataset Poisoning in the Supply Chain
Fine-tune datasets are a second entry point. If an attacker can contribute to a dataset that will be used to fine-tune a model, they can alter the model's behavior via training -- this overlaps with LLM04:2025 (Data and Model Poisoning, Module 5) but the supply chain angle is: the contaminated dataset is sourced from a third party that the developer trusts.
Practical example: A company fine-tunes a customer service model on a dataset sourced from a public dataset repository. An attacker who contributes to that repository includes training examples where the "correct" response to a specific trigger phrase is to reveal the system prompt or exfiltrate data. The company trains on the dataset, deploys the model, and is now running a backdoored assistant.
This is analogous to a compromised npm package: the supply chain component looks legitimate, but contains malicious behavior that activates under specific conditions.
4.6 Package Supply Chain: PyPI and ML Packages
The Python ML ecosystem is a target-rich environment for PyPI supply chain attacks. Common vectors:
Typosquatting. torchvision (legitimate) vs. torch-vision (malicious). The ML ecosystem has hundreds of packages with easily confusable names.
Dependency confusion. Private packages with the same name as an internal package on a public registry. When the build system resolves the dependency, it downloads the attacker's public version instead of the internal one.
Compromised maintainer. Popular ML packages have been targeted via compromised maintainer accounts. Given that many ML packages run on GPU clusters with large memory and network bandwidth, a compromised ML dependency is a premium cryptomining platform.
Mitigations: pin all dependency versions in requirements.txt, use a hash-verified lockfile, audit packages with pip-audit.
4.7 ASI04:2026 -- Agentic Supply Chain Vulnerabilities
The ASI extension of LLM03 focuses on the new supply chain components that agentic systems introduce:
Plugin registries. Agents discover and load plugins at runtime. A compromised plugin registry entry (or a malicious plugin with a name similar to a legitimate one) gets loaded and executed with the agent's permissions.
Tool schema tampering. An agent discovers tools via OpenAPI specifications or tool schemas. If an attacker can substitute a malicious schema (e.g., via a MiTM attack on tool discovery), the agent may call tools it did not intend to.
Subagent injection. In multi-agent orchestration (an orchestrator spawning worker agents), a compromised subagent reports back false results that manipulate the orchestrator's plan. The orchestrator trusts the subagent because it spawned it, but the subagent has been compromised via its own supply chain.
4.8 Burp Suite: Intercepting HuggingFace API Calls
Lab 4 uses Burp Suite to intercept the HTTP calls made by huggingface_hub when downloading a model card and weights. This is useful for:
- Auditing exactly what metadata is fetched before model weights are loaded
- Verifying that
use_safetensors=Trueis causing the client to request.safetensorsfiles rather than.binfiles - Confirming that no unexpected network calls are made during model loading (a sign of a malicious
__reduce__payload)
Configuration:
import os
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8080"
os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
from huggingface_hub import hf_hub_download
hf_hub_download("distilbert/distilbert-base-uncased", "config.json")
In Burp, you will see the GET request to huggingface.co/distilbert/distilbert-base-uncased/resolve/main/config.json. Compare the request sequence for a safetensors download vs. a pickle download.
4.9 Module 4 Summary
| Concept | Key takeaway |
|---|---|
| Pickle = arbitrary code exec | Never call torch.load() on untrusted files; use safetensors |
| HuggingFace audit checklist | File manifest; download count; org verification; picklescan |
| Dataset supply chain | Poisoned training data from trusted third parties; overlaps LLM04 |
| PyPI supply chain | Typosquatting; dependency confusion; compromised maintainers |
| ASI04 | Plugin registries, tool schemas, and subagents add agentic-specific vectors |
Reading for Module 5
- OWASP LLM04:2025 (Data and Model Poisoning) advisory
- OWASP LLM05:2025 (Improper Output Handling) advisory
- Wan et al., "Backdoor Attacks on Language Models" (arXiv 2211.00144) -- abstract + Section 1 sufficient for Module 5 context