Module 9: OWASP LLM Top 10 Full Audit

Duration: 3 hr lecture + 5 hr lab + 6 hr independent
Lab: Lab 9.1 (Structured OWASP audit of the DVLA testbed; ATLAS technique cross-map)
Points: 20
MITRE ATLAS tactics: All 16 tactics covered via the OWASP cross-map
Required reading (before lab -- not optional):

  • OWASP LLM AI Security and Privacy Top 10 2025 (full document; available at owasp.org)
  • OWASP AI Exchange (owasp.org/www-project-ai-security-and-privacy-guide/)
  • MITRE ATLAS v5.1.0: at least 5 case studies across at least 4 distinct tactics
Christian weave: The Alignment Problem, Normativity arc Ch 9 ("Inverse Reward Design"): the difficulty of specifying what you actually want. OWASP LLM01-LLM10 is a taxonomy of the ways that specification fails in deployed systems.
Prerequisite: Modules 1-8 complete; Lab 8.1 topology results in hand

9.1 The Audit Discipline

Every prior lab in AI-301 attacked one vulnerability class at a time: prompt injection in Modules 3 and 5; output exploitation in Module 6; side channels in Module 7; supply-chain compromise in Module 7.5; multi-agent lateral movement in Module 8.

Module 9 introduces the audit discipline: instead of attacking one class at a time, the student performs a structured assessment of the full attack surface, applying a standard taxonomy to a target system. The target is the DVLA. The taxonomy is the OWASP LLM Top 10 2025.

The audit discipline is what separates a penetration tester from a security researcher. A researcher finds one interesting bug. A penetration tester accounts for the full surface and produces a structured report that a defender can act on. Belt-5 security work requires both skills; Module 9 develops the audit skill.


9.2 The OWASP LLM Top 10 (2025): Cross-Module Map

The ten categories, each cross-referenced to the AI-301 module where the student already studied or attacked it:

| # | OWASP Category | What it covers | AI-301 cross-reference |
|-------|-------|-------|-------|
| LLM01 | Prompt Injection | Direct and indirect injection via attacker-controlled input | Modules 3, 5, 8 |
| LLM02 | Sensitive Information Disclosure | Model reveals training data, system prompts, or user PII | Module 3 (system prompt extraction), Module 7 (latency side-channel) |
| LLM03 | Supply Chain Vulnerabilities | Compromised model weights, fine-tuning pipelines, or dependencies | Module 7.5 (fine-tuning attack), AI-201 LangGrinch CVE |
| LLM04 | Data and Model Poisoning | Training data manipulation alters model behavior | Module 7.5 (Qi et al.), Module 10 (RAG vector poisoning) |
| LLM05 | Improper Output Handling | Unvalidated model output interpreted as code, SQL, or HTML | Module 6 (type confusion, untyped output exploitation) |
| LLM06 | Excessive Agency | Agent with overly broad permissions causes real-world harm | Module 8 (multi-agent lateral movement) |
| LLM07 | System Prompt Leakage | System prompt extracted via jailbreak or model memorization | Module 3 |
| LLM08 | Vector and Embedding Weaknesses | RAG retrieval manipulated via poisoned vector store | Module 10 (ChromaDB) |
| LLM09 | Misinformation | Model generates plausible but false information that causes harm | Module 11 (deployment posture) |
| LLM10 | Unbounded Consumption | Denial-of-service via prompt length, token generation, or cost inflation | Module 11 |

Why this table is the organizing principle of Module 9. Each category is not a new concept; it is a name for a failure mode the student has already observed. The audit produces a report that maps each category to observed behavior in the target system. This is the same structure as a CWE-based audit: the CWE taxonomy provides weakness categories; the auditor maps findings to categories; the report documents coverage.
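The coverage idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the lab tooling: the category IDs and names are copied from the table above, while the function name `coverage_gaps` and the shape of the `observations` mapping are assumptions made for the example.

```python
# Category IDs and names as listed in the Section 9.2 table.
OWASP_LLM_TOP10 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM03": "Supply Chain Vulnerabilities",
    "LLM04": "Data and Model Poisoning",
    "LLM05": "Improper Output Handling",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM08": "Vector and Embedding Weaknesses",
    "LLM09": "Misinformation",
    "LLM10": "Unbounded Consumption",
}


def coverage_gaps(observations: dict[str, list[str]]) -> list[str]:
    """Return category IDs with no recorded observation, in taxonomy order.

    `observations` (hypothetical structure) maps a category ID to the list
    of behaviors the auditor observed for that category.
    """
    unknown = set(observations) - set(OWASP_LLM_TOP10)
    if unknown:
        raise ValueError(f"not an OWASP LLM Top 10 category: {sorted(unknown)}")
    # A category with a missing key or an empty list counts as a gap.
    return [cat for cat in OWASP_LLM_TOP10 if not observations.get(cat)]


if __name__ == "__main__":
    # Illustrative observations only; these are not real lab results.
    notes = {
        "LLM01": ["indirect injection via retrieved document"],
        "LLM06": ["agent invoked a tool outside its role"],
        "LLM09": [],
    }
    for cat in coverage_gaps(notes):
        print(f"{cat} {OWASP_LLM_TOP10[cat]}: no observation recorded")
```

Keeping the taxonomy as the outer loop, rather than the list of findings, is what makes a gap visible: a category with nothing recorded still appears in the output.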


9.3 The ATLAS Cross-Map

MITRE ATLAS and OWASP LLM Top 10 are complementary taxonomies. OWASP describes vulnerability classes from the defender's perspective (what can go wrong). ATLAS describes adversary behaviors from the attacker's perspective (what an attacker does). The cross-map below shows how each OWASP category corresponds to ATLAS tactics and techniques.

| OWASP Category | Primary ATLAS Tactic | Primary ATLAS Technique |
|-------|-------|-------|
| LLM01 Prompt Injection | ML Execution (AML.T0043) | Prompt Injection (AML.T0051) |
| LLM02 Sensitive Information Disclosure | ML Reconnaissance (AML.T0000) | System Prompt Extraction (AML.T0017) |
| LLM03 Supply Chain Vulnerabilities | ML Initial Access (AML.T0018) | Compromise ML Model via Dependency (AML.T0056) |
| LLM04 Data and Model Poisoning | ML Persistence (AML.T0023) | Poison Training Data (AML.T0020) |
| LLM05 Improper Output Handling | ML Impact (AML.T0048) | LLM Output Exploitation (derived) |
| LLM06 Excessive Agency | ML Lateral Movement (AML.T0056) | Compromise via Agent Chain (derived) |
| LLM07 System Prompt Leakage | ML Reconnaissance (AML.T0000) | Discover Model Ontology (AML.T0017) |
| LLM08 Vector and Embedding Weaknesses | ML Initial Access (AML.T0018) | Poison Training Data (RAG variant) |
| LLM09 Misinformation | ML Impact (AML.T0048) | Spearphishing via LLM (derived) |
| LLM10 Unbounded Consumption | ML Impact (AML.T0048) | Denial of ML Service (AML.T0029) |

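Note: some ATLAS techniques are marked "derived" or "variant" and carry no technique ID because ATLAS v5.1.0 does not yet have dedicated techniques for all LLM-specific failure modes; the mapped technique is the closest available analog. ATLAS tactic IDs take the form AML.TA#### and technique IDs the form AML.T####; verify each ID against the current ATLAS matrix before citing it in your report.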
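When transcribing ATLAS identifiers into a report, a quick shape check catches the most common slip, which is citing a technique ID where a tactic ID belongs. The sketch below relies only on the published ID convention (tactics `AML.TA` plus four digits; techniques `AML.T` plus four digits, with an optional three-digit sub-technique suffix). The function name `classify_atlas_id` is an assumption for illustration, and the check says nothing about whether an ID actually exists in ATLAS.

```python
import re

# Shape-only patterns; they do not confirm the ID is present in ATLAS.
TACTIC_ID = re.compile(r"AML\.TA\d{4}")
TECHNIQUE_ID = re.compile(r"AML\.T\d{4}(\.\d{3})?")


def classify_atlas_id(atlas_id: str) -> str:
    """Classify a string as 'tactic', 'technique', or 'unrecognized' by shape."""
    candidate = atlas_id.strip()
    if TACTIC_ID.fullmatch(candidate):
        return "tactic"
    if TECHNIQUE_ID.fullmatch(candidate):
        return "technique"
    return "unrecognized"


if __name__ == "__main__":
    for value in ["AML.T0051", "AML.TA0005", "AML.T0051.000", "derived"]:
        print(f"{value}: {classify_atlas_id(value)}")
```

Run over both columns of a cross-map, the check makes it easy to spot a tactic cell that carries a technique-shaped ID, or a "derived" entry that still needs a gap note.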


9.4 The Audit Structure

A structured OWASP audit has five steps per category:

  1. State the category. Name the OWASP category, its risk description, and the DVLA surface it applies to.
  2. Design a test case. Write a specific test: what input triggers the behavior; what observable output demonstrates the vulnerability; what counts as a finding.
  3. Run the test. Execute it against the DVLA. Record the model response verbatim.
  4. Map to ATLAS. Identify the most specific ATLAS technique that corresponds to the finding. If no specific technique exists, identify the closest tactic and note the gap.
  5. Propose a mitigation. One or two sentences: what specific change to the DVLA's code, configuration, or deployment would prevent or detect this class of attack?

This structure is repeatable. It is the same structure a security consultant uses to produce an audit report for a client. The deliverable is the structured report, not any individual finding.
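One way to keep the five steps honest is to record each category as a structured entry and refuse to call it complete while any step is blank. The following is a minimal sketch under stated assumptions: the class name `AuditEntry`, its field names, and the plain-text rendering are all hypothetical, and the lab's own report template takes precedence over them.

```python
from dataclasses import dataclass


@dataclass
class AuditEntry:
    """One category's audit record; field names are illustrative."""
    category: str            # step 1: OWASP category and risk description
    surface: str             # step 1: DVLA surface the category applies to
    test_case: str           # step 2: input that triggers the behavior
    finding_criterion: str   # step 2: what counts as a finding
    response_verbatim: str   # step 3: model response, recorded verbatim
    atlas_mapping: str       # step 4: most specific ATLAS technique or tactic
    mitigation: str          # step 5: proposed change to prevent or detect
    atlas_gap_note: str = "" # step 4: optional note when no technique fits

    REQUIRED = ("category", "surface", "test_case", "finding_criterion",
                "response_verbatim", "atlas_mapping", "mitigation")

    def missing_steps(self) -> list[str]:
        """Return names of required fields that are still empty."""
        return [name for name in self.REQUIRED if not getattr(self, name).strip()]

    def render(self) -> str:
        """Render the entry as a plain-text report block."""
        lines = [
            f"Category:   {self.category} (surface: {self.surface})",
            f"Test case:  {self.test_case}",
            f"Finding if: {self.finding_criterion}",
            f"Response:   {self.response_verbatim}",
            f"ATLAS:      {self.atlas_mapping}",
            f"Mitigation: {self.mitigation}",
        ]
        if self.atlas_gap_note:
            lines.insert(5, f"ATLAS gap:  {self.atlas_gap_note}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; the surface and test text are hypothetical.
    entry = AuditEntry(
        category="LLM01 Prompt Injection",
        surface="chat endpoint",
        test_case="instruction embedded in a retrieved document",
        finding_criterion="model follows the embedded instruction",
        response_verbatim="",
        atlas_mapping="AML.T0051",
        mitigation="treat retrieved text as data, not instructions",
    )
    print("missing:", entry.missing_steps())
```

Because `REQUIRED` has no type annotation, the dataclass machinery treats it as a plain class attribute rather than a field, so it does not appear in the constructor.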


9.5 OWASP vs MITRE: Why Two Taxonomies

Students sometimes ask why there are two overlapping taxonomies. They serve different stakeholders.

OWASP is written for defenders and developers. Its categories describe what can go wrong in a deployed system; its guidance is remediation-oriented; its audience is the team building or deploying the system. The OWASP LLM Top 10 is a checklist for a developer reviewing their own system.

MITRE ATLAS is written for security professionals and threat analysts. Its categories describe what adversaries do; its language is behavior-centric rather than failure-centric; its audience is the team assessing risk from external adversaries. ATLAS is an adversary playbook organized for threat modeling and detection engineering.

A complete AI security practice requires both: ATLAS to model the attacker (what they might do, what techniques they might use), OWASP to model the system (what vulnerabilities it has, how to remediate them). The Module 9 audit uses OWASP as the organizing framework and ATLAS as the technique-mapping layer.

The Christian connection. Ch 9 of The Alignment Problem ("Inverse Reward Design") argues that specifying what you want is harder than it looks -- and that the ways specification fails are systematic, not random. OWASP LLM01-LLM10 is an empirical enumeration of the ways that "build a helpful, safe AI system" has failed in deployed systems. The taxonomy is not a list of bugs; it is a list of failure modes in the specification-to-deployment pipeline. Normativity -- having the right values and acting on them consistently under adversarial pressure -- is the thing each OWASP category tests.


9.6 Lab 9.1 Scope and Preparation

The Lab 9.1 audit has three tiers:

Tier 1 (required for full credit): Test all 10 OWASP categories. For each: one test case designed; result recorded; ATLAS technique identified; mitigation proposed. The report uses the scoring card format from owasp.org.

Tier 2 (required for certificate endorsement): At least 3 of your 10 test cases must result in an actual finding (not just "tested; no vulnerability found"). If fewer than 3 categories produce findings against the hardened DVLA, relax the DVLA's defenses from Lab 7.5 and re-run. The goal is a system that is mostly defended but still has observable gaps in at least 3 categories.

Tier 3 (capstone-prep depth): For at least one finding, trace it through the full substrate-language mapping: what is the substrate-level analogue, what would the attack look like against the Tang Nano Virtus OS target, and what defense applies at each layer?
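The Tier 1 and Tier 2 criteria are mechanical enough to check before submitting. The sketch below is one possible self-check, not the grading script: the `Result` record, its field names, and the function `tier_status` are hypothetical, and it assumes Tier 2 presupposes a complete Tier 1. Tier 3 is a qualitative write-up and is not modeled.

```python
from dataclasses import dataclass

# LLM01 through LLM10, matching the OWASP LLM Top 10 category IDs.
CATEGORIES = [f"LLM{n:02d}" for n in range(1, 11)]


@dataclass
class Result:
    """Outcome for one category; field names are illustrative."""
    category: str
    result_recorded: bool   # test case designed, run, and result written down
    atlas_technique: str    # identified ATLAS technique (or closest tactic)
    mitigation: str         # proposed mitigation
    is_finding: bool        # True only if a vulnerability was actually observed


def tier_status(results: list[Result]) -> dict[str, bool]:
    """Report whether the Tier 1 and Tier 2 criteria appear to be met."""
    complete = {
        r.category
        for r in results
        if r.result_recorded and r.atlas_technique.strip() and r.mitigation.strip()
    }
    findings = {r.category for r in results if r.is_finding and r.category in complete}
    tier1 = set(CATEGORIES) <= complete
    # Assumption: Tier 2 requires Tier 1 plus at least 3 distinct findings.
    tier2 = tier1 and len(findings) >= 3
    return {"tier1": tier1, "tier2": tier2}


if __name__ == "__main__":
    # Placeholder data: every category tested, findings on three of them.
    demo = [
        Result(cat, True, "closest ATLAS technique", "proposed change",
               is_finding=cat in {"LLM01", "LLM06", "LLM07"})
        for cat in CATEGORIES
    ]
    print(tier_status(demo))
```

Counting findings per distinct category, rather than per test run, avoids crediting three variations of the same prompt-injection test as three findings.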

Before running the lab: review your Lab 8.1 results. Multi-agent topology directly produces LLM06 (Excessive Agency) and LLM01 (indirect injection) findings. Your Lab 8.1 report should give you two of the ten categories before you start.