Course: AI & Agentic Security: Intermediate
Version: v0.2
Prerequisites: AI-101 completion; basic Python; familiarity with LLM APIs
Module 1: The Production AI Attack Surface
Opening hook (5 min): Pull up the ATLAS Navigator live. Ask students: "Who here has heard of MITRE ATT&CK? This is its AI-specific cousin — 16 tactics, 84 techniques, all discovered by red teams hitting production AI systems in the last two years." Navigate to the October 2025 expansion tab and show the 14 new techniques added specifically for AI agents.
Pacing: 2 hr lecture. The OWASP-to-ATLAS mapping table is dense; walk through 3-4 entries live, let students complete the rest. Budget 20 min for the production scoping checklist.
Common issues:
- Students conflate ATLAS with ATT&CK. Clarify: ATLAS extends ATT&CK specifically for machine learning systems; the tactic names map but the techniques are ML-specific.
- Students ask why OWASP LLM Top 10 and ATLAS both exist. Answer: OWASP LLM is practitioner-focused (what to protect); ATLAS is attacker-model-focused (how attackers operate). They are complementary, not competing.
- Students underestimate the scope of "production" -- emphasize that multi-modal, agentic, and RAG-augmented deployments all have distinct attack surfaces.
Lab 1 timing: 2 hr. Part C (scope declaration JSON) is the hardest part; give 45 min minimum. Students who finish early: ask them to identify which ATLAS techniques from their scope declaration are NOT covered by OWASP LLM Top 10 -- this surfaces gaps.
Mitchell Ch 7 weave placement: best during the discussion of how LLMs process context. Ch 7 covers how statistical patterns in text shape model behavior -- this connects directly to why prompt injection works: the model cannot distinguish "trusted instruction" from "attacker-injected text" at the statistical level.
Module 2: Evaluation Methodology (D8 Framework)
Opening hook (5 min): "Nine models walk into a Signal group." Describe the D8 study setup verbally before showing any slides: 9 models, 47 sessions, real operational prompts. Ask students: "If you were comparing LLMs for a production deployment, how would you measure them?" Then show what the D8 study actually measured and why it differs from benchmark leaderboards.
Pacing: 2 hr lecture. Spend 30 min on the DVLA architecture (the 3-axis evaluation model). Spend 20 min on promptfoo config syntax -- students will struggle in Lab 2 if they haven't seen it.
Common issues:
- Students assume benchmark leaderboard ranking correlates with production suitability. The D8 finding -- that models scoring well on benchmarks failed on real operational prompts -- is the key teaching moment. Ask: "Why would a model score 90% on MMLU but fail on operational lifetime in a 4-hour session?"
- Students underestimate setup time for Lab 2. The Ollama + promptfoo integration has quirks; verify the endpoint configuration before lab session starts.
- "Productive ratio" is subjective. Clarify that the D8 operationalization uses output-length and concrete-artifact presence as proxies, not human judgment.
Lab 2 timing: 2 hr lab + 2 hr independent. The independent portion is where students run their suite against a second model (comparison mode). Common failure: students write tests that are too easy and get 100% PASS with no signal. Encourage tests that would distinguish between a model that just follows instructions vs. one that genuinely reasons.
Mitchell Ch 8 weave placement: during discussion of what "good" model behavior looks like. Ch 8 covers the limits of behavioral testing -- the model can produce correct output for the wrong reason. This connects to why promptfoo pass rates are necessary but not sufficient.
Module 3: ML Supply Chain Attacks
Opening hook (5 min): Show the CVE-2025-68664 (LangGrinch) CVSS score: 9.3 Critical. Ask: "A CVSS 9.3 in a machine learning library. What's the vector? Remote code execution via... model loading." Let that land. Then walk through why model artifacts became a code-execution surface.
Pacing: 2 hr lecture. The pickle opcode walkthrough (REDUCE/GLOBAL/INST/NEWOBJ) needs 20-30 min with live demo. Students who have not seen Python bytecode will find this abstract -- use the analogy: "A pickle file is not data. It's a program. When you call pickle.load(), you're running that program."
Common issues:
- Students think safe loading is a recent idea. It's not -- pickle's unsafe design has been known since Python 2. The gap is that ML frameworks adopted pickle as the default format because it was convenient, not secure.
- fickling installation sometimes fails on systems with strict Python environments. Fallback: use the fickling web demo or show pre-computed output.
- Students ask: "Can I use
pickle.loads()safely withRestrictedUnpickler?" Show whyRestrictedUnpickleris insufficient for arbitrary model artifacts -- the allowlist approach is fundamentally flawed against novel opcode chains. - The
__reduce__pattern is not the only attack vector:__getstate__/__setstate__and theINSTopcode also carry risk. Mention these but don't drill -- fickling catches all of them.
Lab 3 timing: 2 hr lab + 1 hr independent. Part A (payload construction) usually takes 30-40 min. Students get stuck finding the correct __subclasses__() index -- remind them the index is environment-specific and they should search for the class name, not guess the index.
Mitchell Ch 9 weave placement: during the discussion of why "secure by design" is harder than "secure by policy." Ch 9 covers the limits of behavioral constraints -- you can tell a model not to do X, but if X is in its capability space, it may find a way. The parallel: you can document "don't load untrusted pickles," but if the framework always uses pickle, the policy is hard to enforce.
Module 4: SSTI in LLM Pipelines
Opening hook (5 min): Show the Jinja2 {{7*7}} arithmetic test in a Python REPL live. "One line of user input. How much did the developer intend? Zero. How much did the model allow? Template evaluation." Then show the path from arithmetic to class disclosure to RCE.
Pacing: 2 hr lecture. The MRO traversal walkthrough needs to be done slowly -- students who have not seen Python's object model find __class__.__mro__[1].__subclasses__() opaque. Spend 15 min on "everything in Python is an object" before the MRO traversal.
Common issues:
- Students ask why LLM pipelines use Jinja2 at all. Common answer: prompt templates. Show a realistic prompt template:
"Answer this question about {{product_name}}: {{user_query}}". The template variable ({{product_name}}) is safe; the problem is when the developer interpolates user input into the template string before parsing. - The distinction between "template parsing" and "variable substitution" needs to be made explicit.
Template("{{name}}").render(name=user_input)is safe.env.from_string(f"{{{{ {user_input} }}}}")is not. - CVE-2025-9556 is a Go codebase. Students who have not seen Go will find the syntax unfamiliar. The key insight transcends language -- show the pattern similarity, not the Go syntax.
- Students ask: "Is this only a risk for Jinja2?" Walk through the 4-language table: Python/Jinja2, Go/Gonja, JavaScript/Eta, Java/FreeMarker. The sink pattern is identical across all four.
Lab 4 timing: 2 hr lab + 2 hr independent. The independent portion covers the 4-language table. Students often skip the Java/FreeMarker row -- it's the most important one in enterprise contexts.
Mitchell Ch 10 weave placement: during the discussion of how models generalize. Ch 10 covers the gap between training distribution and deployment distribution. SSTI is an example of this gap: the developer's mental model of "template" does not match the template engine's execution model.
Module 4.5: The 2023-2026 Academic Jailbreak Corpus
Opening hook (5 min): Show the GCG suffix. Read it aloud. "This is meaningless text. It has no semantic content. And it breaks every frontier model it has been tested against -- including models that would refuse the same request phrased in plain English." Pause. "That is what you are going to understand by the end of this module."
Pacing: 3 hr lecture. This is the densest lecture in AI-201. The GCG algorithm section needs 45+ min with slides. Budget separate 30-min blocks for AutoDAN and PAIR. The comparison table (Module 4.5.4) should be built live with students filling in cells.
Common issues:
- Students conflate GCG's transferability with AutoDAN's semantic meaningfulness. They are orthogonal properties: GCG transfers but looks like gibberish; AutoDAN is readable but may not transfer as well.
- Students ask if modern models are patched against GCG. Yes and no: frontier models have improved robustness, but HarmBench shows ASR is rarely zero. The arms race framing from JailbreakBench is important: patching one method often reopens surface for another.
- The "gradient" in GCG is confusing for students without ML background. Analogy: "Imagine you're trying to open a combination lock. GCG is trying every combination systematically, guided by feedback from the lock about which digits are close to correct."
- Lab 4.5 requires 7 hr of paper reading. Assign this before the lecture, not after. Students who haven't read the papers will not benefit from the lecture.
- The PAIR lab should use benign goals only. Reinforce the course RoE during setup.
Mitchell Ch 11 weave placement: during the explanation of why GCG works. Ch 11 covers how LLMs are statistical pattern matchers. The connection: a GCG suffix is not meaningful; it is a token sequence that statistically resembles the context preceding compliance in the training data. The model has no epistemology -- it matches patterns. This is why gibberish works.
Module 5: Tool-Calling Exploit Patterns
Opening hook (5 min): "The developer did everything right. The tool permissions were scoped correctly. The user had permission to send emails. And the attacker still got data exfiltrated." Show the agency confusion attack in 4 steps on a whiteboard before any code. The conceptual model must come before the technical implementation.
Pacing: 2 hr lecture. The "permission boundary is not enough" section (Module 5.4) is the most important section; give it 30 min. The multi-agent lateral movement pattern needs 20 min with a diagram.
Common issues:
- Students confuse excessive agency (AI-101 scope) with agency confusion (AI-201 scope). Drill: excessive agency = the developer granted too much; agency confusion = the developer granted appropriate permissions but the agent can be manipulated into using them wrong.
- Tool chaining amplification is often underestimated. Walk through a 3-step chain: read_file → format_content → send_email. Each step is individually within permissions. The chain exfiltrates data.
- The
safe_tool_call()pattern with context source tracking is the key defense. Students ask "why doesn't the system prompt just say 'don't follow document instructions'?" Walk through why LLM-level instructions are insufficient: the model doesn't have a semantic concept of "document content vs. user intent." The context-source check happens in application code, not in the prompt. - ATLAS discovery methods (direct query, error-based, schema inference) should be demonstrated live with a running agent. The direct-query method often surprises students -- many production agents will enumerate their tools if asked.
Christian Ch 1 weave placement: at the end, after covering agency confusion. The alignment problem framing -- the gap between specification and behavior under adversarial conditions -- is the meta-theme of AI-201. Module 5 is the first place where students see a specification (tool permission list) fail under adversarial conditions in a way the developer did not anticipate. This is the Christian "Prophecy" -- the early warning that specification gaps will be exploited.
Module 6: RAG-Poisoning and Indirect Prompt Injection at Scale
Opening hook (5 min): "What if the attacker's instructions were already in your knowledge base before you ever deployed?" Draw the RAG pipeline on the whiteboard: user query → embedding → retrieval → context → LLM → response. Then draw a red arrow: attacker uploads document → sits in vector store → retrieved by future user → injected into context → executed. "The attacker has persistence. They poisoned the pipeline before the session started."
Pacing: 2 hr lecture. The embedding similarity discussion (Module 6.4) is conceptually important -- students need to understand that retrieval is semantic, not keyword-based. The Chroma/vector-store setup review takes 20 min; don't skip it.
Common issues:
- Students assume RAG systems review documents before ingestion. Most do not. The ingestion pipeline is often fully automated.
- "Hidden content" in documents is harder to grasp without a live demo. Show the Python dictionary example from Module 6.4 -- the visible content is a legitimate summary; the hidden instructions appear after 200 newlines.
- The exfiltration chain (6 steps) should be walked step by step with students identifying which step is the failure point. Step 5 (the LLM follows the embedded instruction) is the failure, but the root cause is step 1 (no ingestion review) and step 4 (no context-source trust control).
- fickling is mentioned for PDF inspection. Students ask if fickling works on PDFs. Fickling works on pickle files; for PDFs, the equivalent tool is camelot, pdfminer, or pymupdf to extract content + a separate text scan. Clarify this distinction.
Lab 6 timing: 4 hr lab + 5 hr independent. The Chroma setup takes 30+ min for students who haven't used it. Verify the embedding model downloads correctly before lab -- all-MiniLM-L6-v2 is 80MB and may be slow on first load.
Mitchell Ch 13 weave placement: during the context-source trust control discussion. Ch 13 covers how LLMs derive meaning from context. The connection: the attacker's goal is to shift context so the model interprets its task differently. The trust control is a manual simulation of something Mitchell argues LLMs lack -- genuine understanding of instruction provenance.
Module 7: Agentic Web-Scraping and SSRF
Opening hook (5 min): "This is SSRF. You've seen it in OWASP A10. An application fetches a URL you supply -- you supply http://169.254.169.254 and get IAM credentials. Now: what if the attacker doesn't supply the URL? What if they just supply content that causes the agent to decide to fetch that URL on its own?" The LLM-mediated SSRF framing reframes a classic bug class for an AI context.
Pacing: 2 hr lecture. The SSRF target class table (Module 7.3) should be reviewed with students; all the cloud-provider metadata endpoints should be demonstrated in a test environment. The allow-list code (Module 7.4) should be walked through line by line -- the DNS rebinding protection step is the one students most often omit.
Common issues:
- Students ask "why doesn't the LLM just know not to fetch internal URLs?" The answer connects to Module 5's Christian weave: the specification ("retrieve useful information") doesn't include "don't fetch 169.254.169.254." The model follows its specification; the developer's intent was different from the specification.
- The DNS rebinding protection step is essential but easy to omit. Demo: a domain
trusted.example.compasses the allow-list check but resolves to10.0.0.1. The allow-list check and the DNS resolution must both happen, in sequence. - Students ask if
httpsenforcement alone is sufficient. No --https://169.254.169.254/is a valid HTTPS URL. HTTPS only encrypts the transport; it doesn't prevent SSRF. - Lab 7 Part B requires a running local HTTP server. Ensure students have the mock server running before starting the attack. Common failure: the mock server is not started, the fetch fails, and students think the attack didn't work.
Christian Ch 3 weave placement: during the defense discussion. Ch 3 covers reward specification and how incomplete specifications lead to unexpected behaviors. The allow-list is the "constrained action space" fix -- you cannot fix a specification error by making the agent smarter. The fix constrains what the agent can do, regardless of what it decides.
Module 7.5: Multi-Modal Adversarial Attacks
Opening hook (5 min): Show the VSH attack success rate: 89.0% on GPT-4o mini. "The same model that would refuse this request in text will comply 89% of the time when the request is delivered as pixels." Let the number land. Then explain why: safety alignment was developed for text modality, not vision modality. The two were added at different rates.
Pacing: 2 hr lecture. The Whisper transcription chain section needs a live audio demo if possible -- play a clean audio clip, then a clip with sub-perceptual injection, and ask students if they hear a difference. The visual prompt injection section benefits from a live demo with a simple image.
Common issues:
- Students ask why vision models extract text from images at all. The OCR-equivalent behavior is emergent -- vision models are not explicitly trained to run OCR, but they learn to do it because images with text are common in training data. This is precisely what makes visual injection dangerous: the behavior is implicit.
- The steganographic injection vector (EXIF/PNG metadata) is less reliable than visible text injection but students are often more interested in it. Don't spend too much lecture time here -- it's a lower-reliability attack surface.
- Compositional attacks (Module 7.5.4) are conceptually the hardest. Spend 15 min with a concrete 3-step example on the whiteboard. Students need to see how each individual step passes static evaluation while the combination does not.
- Lab 7.5 Part C uses a simulated Whisper injection because real adversarial audio generation requires GPU. Be explicit about this: students are building the detection layer, not reproducing the attack. The attack mechanics are explained in the module.
Christian Ch 4 weave placement: at the end, as a course-thread closing. Ch 4 on goal specification generalizes the module's lesson: the multi-modal safety gap is a goal-specification failure. The model was trained to refuse harmful text; the specification did not generalize to other modalities. The Belt-4 question -- "what contexts did the specification miss?" -- should be placed explicitly here as a forward pointer to the capstone.
General Assessment Notes
Capstone gate management: Students must pass Gates 1-2 (Modules 3-4) before attempting the SSTI/deserialization capstone submissions. Common failure: students attempt the capstone without having run Lab 4 and miss the safe-loading validator requirement.
Jailbreak lab ethics note: Reinforce at Lab 4.5 that the PAIR implementation is for understanding attack mechanics. The target in Lab 4.5 is a local model; the goal is to understand the algorithm, not to find novel jailbreaks. Course RoE applies.
Lab timing reality: Budget 20% extra time on Labs 3, 5, and 6. These have the most integration surface (APIs + local model + vector DB) and the most setup failure modes.
Module sequencing for repeat offerings: The jailbreak papers (Module 4.5) are the highest-decay content -- new papers appear frequently. Plan to update the Module 4.5 required reading list annually. The rest of the curriculum (CVEs, ATLAS, tool-calling patterns) is more stable.
Module 8: Adversarial Robustness Testing with HarmBench at Scale
Opening hook (5 min): Pull up the HarmBench GitHub leaderboard. Point to a frontier model: "This model scores 92% on MMLU. What's its jailbreak ASR?" Show the category-level ASR breakdown. "The model that aces every benchmark has a 34% attack success rate in the cybersecurity category when you hit it with 50 behaviors from the academic jailbreak corpus. That gap between benchmark rank and adversarial robustness is what this module is about."
Pacing: 2 hr lecture. The category-ASR discussion needs 30 min -- students need to understand WHY some categories have higher ASR (deployment context, specificity of training data, diversity of the attack corpus). The perplexity-ASR tradeoff section (Module 8.4) is the key insight: spend 20 min here.
Common issues:
- Students conflate ASR with "the model is unsafe." Clarify: ASR is a measurement of one model against one eval corpus under one judge. An ASR of 0.0 could mean strong safety alignment OR an overly strict judge. Both require investigation.
- The rule-based judge in Lab 8 is a teaching approximation; it classifies based on surface patterns and has both false positives and false negatives. An LLM judge is more accurate but expensive. Students ask: "Which judge should I use in production?" The answer depends on cost, latency, and false-positive tolerance -- walk through the tradeoff.
- HarmBench installation sometimes fails on CPU-only environments. The lab includes a
harmbench-evalpip install; if this fails, the course-fixture fallback (lab8-behaviors.json) is the correct path. Do not debugharmbench-evalinstall issues for more than 10 minutes -- just use the fixture. - Students ask: "Is running HarmBench on a model I don't own ethical?" The rule: HarmBench evaluation on local models (ollama-hosted) is clearly within scope. Running against production APIs requires the API provider's consent. Labs 8 only uses local models for this reason.
Lab 8 timing: 3 hr lab + 5 hr independent. The evaluation run itself (50 behaviors × 3 passes) takes 15-45 minutes on CPU. Budget setup time for the harmbench-eval / fixture setup. The finding-quality write-up (Part C) is where students spend the most time -- each finding takes 20-30 min to get right.
Mitchell Ch 11 weave placement: during the explanation of why ASR is not zero even for well-aligned models. Ch 11 covers how LLMs are statistical pattern matchers with no genuine understanding. This connects directly: the model refuses because the refusal pattern is strongly reinforced in training, but an adversarial suffix can shift the statistical context enough that the refusal pattern is overridden. The model has no epistemology -- it matches patterns.
Module 9: Agentic Memory and Persistent Instruction Injection
Opening hook (5 min): "What if you could give an instruction to an AI agent that would remain active indefinitely -- even after the conversation ends, even after the application restarts?" Show the memory injection attack chain on a whiteboard: inject → store → survive session boundary → activate → persist. "This is not a theoretical attack. It is a documented attack class in ATLAS. And it requires no technical sophistication -- just a carefully worded user message."
Pacing: 2 hr lecture. The memory architecture taxonomy (Module 9.1) needs 20 min -- students must understand the difference between in-context, external key-value, and vector DB memory before the attack mechanics make sense. The sleeper agent pattern (Module 9.3) is the most conceptually challenging; spend 25 min with a concrete example.
Common issues:
- Students ask why in-context memory is injectable at all -- "can't the model distinguish instructions from conversation history?" This is the core point: from the model's perspective, all tokens in the context window are equal. The model has no metadata about which tokens represent "authoritative instructions" vs. "user conversation history." The injection works because there is no such metadata boundary.
- The sanitization step in Lab 9 Part D uses regex patterns, which students quickly identify as bypassable. Acknowledge this immediately: "Yes, this is a first-generation defense. The value is understanding the shape of the problem -- what patterns look like injections." The evasion exercise (Part D question 2) is deliberately designed to surface the limitation.
- Students confuse memory injection (ATLAS AML.TA0008 Persistence) with prompt injection (ATLAS AML.T0051). Clarify: prompt injection is the mechanism; memory persistence is a persistence technique that can be achieved via prompt injection. They operate at different layers.
- The session boundary persistence test (Part C) is the most important part of Lab 9 for demonstrating the real-world threat. Students who see their injected instruction survive
memory.jsonread/write and re-initialization understand why memory is an attack surface.
Lab 9 timing: 3 hr lab + 5 hr independent. Part B (injection) usually takes 30-40 min including setup. Part C (session boundary test) is quick if memory.json is preserved correctly; common failure is students clear the memory file between Parts B and C. Part D (sanitization) is where students do the most independent thinking.
Christian Ch 2 weave placement: during the sleeper agent discussion. Ch 2 covers reward and goal pursuit -- the idea that a sufficiently capable system will find instrumental paths to its terminal goal even if those paths were not anticipated. The sleeper agent is a low-tech version of this: the injection is goal specification ("include this watermark"), and the goal persists because the agent's in-context "goal" is simply whatever its conversation history says it should do. This is Christian's reward misspecification problem in miniature.
Module 10: LLM-Powered Threat Intelligence Automation
Opening hook (5 min): "A security team processes 300 CVEs per week. They spend 4 minutes per CVE triage. That is 20 person-hours every week, before any analysis. Now: what if an LLM could triage the first 80% -- the ones with clear classification -- in 10 seconds each?" Show the CVE fetch → ATLAS enrichment pipeline diagram. "And then: what if an attacker used the same pipeline for the same CVEs, but to identify which of your disclosed technologies are vulnerable?"
Pacing: 2 hr lecture. The dual-use framing (Module 10.5) is the pedagogically important content; spend 20 min on it. The JSON schema enforcement discussion (Module 10.3) is technically important -- temperature=0.0 and explicit schema prompting are production skills students should leave with.
Common issues:
- Students ask if the NVD API requires authentication. No -- the public endpoint is rate-limited at 5 requests per 30 seconds without an API key; 50 requests per 30 seconds with a free NVD API key (takes 30 seconds to register). The lab respects the unauthenticated rate limit by default; students with keys can remove the rate-limiting sleep.
- Hallucinated ATLAS IDs are common in Part B. Students are often surprised that the model confidently asserts an ATLAS ID that does not exist. This is the key teaching moment: LLMs are not knowledge bases. The verify.py step is not optional -- it exists because hallucination rates on structured fields like technique IDs are non-trivial.
- The dual-use discussion is the highest-abstraction content in this module. Some students want to skip to the code. Resist: the dual-use frame is the reason AI security practitioners specifically (not just AI developers) need to think about this. The OSINT exfiltration risk from technology stack disclosure in prompts is a real concern for production deployments.
- NVD is occasionally unavailable or rate-limited during lab hours. The lab fixture (
lab10-cves.json) must be available before the lab session starts. Verify fixture presence in the lab environment.
Lab 10 timing: 3 hr lab + 5 hr independent. Part A (NVD fetch) is usually 20-30 min including rate-limit setup. Part B (enrichment) is 30-45 min -- the LLM inference takes real time. Part D (briefing) is where students do the most writing; the dual-use attack analysis (Part D question 3) requires thinking beyond the code.
Mitchell Ch 7 weave placement: during the structured-output discussion. Ch 7 covers how language models process context and generate continuations. The connection: JSON schema enforcement is an application-layer trick to constrain the continuation space -- the model is more likely to produce valid JSON if the prompt explicitly frames the expected format. The limitation (hallucination persists even with schema enforcement) follows from the same chapter: the model generates statistically plausible tokens, and a plausible-looking ATLAS ID is statistically indistinguishable from a real one without external verification.
Module 11: The D8 Methodology in Depth
Opening hook (5 min): Read the D8 study's opening sentence aloud: "Nine models walk into a Signal group." Then put up the tier results -- two STRONG PASSes, several PASSes, one FAIL. "What did the models that passed do that the models that failed did not? And why does none of this correlate with their MMLU scores?" Give the answer directly: "Because MMLU measures knowledge. The D8 study measures operational behavior. Those are different things. This module teaches you to measure operational behavior."
Pacing: 3 hr lecture. Budget 40 min for the OL/PR/W axis derivation -- students need to understand why these three axes were chosen, not just how to compute them. The 9-model scorecard table (Module 11.4) should be built live with student input. Spend 20 min on the "dimensions D8 did not measure" section (Module 11.5) -- this is the Lab 11 Part D foundation.
Common issues:
- Students ask why "Wordiness" is a relevant evaluation axis -- isn't shorter always better? The answer: no. Wordiness is not a negative metric; it is a fit metric. An analyst-operator (800+ chars, tables, multi-paragraph) is exactly right for a morning briefing to a technical team. A short-ops-operator (under 400 chars, no emojis) is exactly right for a pager alert to an on-call engineer. The deployment context determines which cluster is appropriate.
- The Productive Ratio metric generates the most debate. Students argue that probe calls (file reads, environment checks) are necessary for the agent to function, so penalizing them is unfair. Acknowledge this: "D8 doesn't penalize probe calls because they're wrong; it measures them because they are context-window consumption. High PR means the agent accomplishes more work per context token. In a deployment where sessions run for hours, PR directly determines how many tasks the agent can complete before handoff."
- The deepseek-v3.2:cloud FAIL result surprises students who assume larger/newer models are always better. Walk through the exact failure mode: hangs on P1 and P3 multi-step sequences in separate trials. "FAIL doesn't mean the model is bad at language. It means it is unreliable for this specific operational task. Context matters."
- Lab 11 Part B measures PR by classifying tool call names into productive vs. probe categories. The classification in
eval_prompts.py(productive =code_block,json_output) is a simplified proxy for the D8 study's operationalization. Students should understand this is a pedagogically simplified version of the real measurement.
Lab 11 timing: 4 hr lab + 6 hr independent. The three-session-per-model evaluation (Part B) is the longest step: 20-40 minutes per model. Set expectations clearly before starting: "This is real inference. Budget 90 minutes for the full evaluation." Part D (extended dimension) is the highest-depth work in the module; budget 60-90 min for students to design and run a meaningful test.
Mitchell Ch 8 weave placement: during the D8 study introduction. Ch 8 covers the limits of behavioral testing: a model that produces correct outputs may be doing so for wrong reasons. The D8 study is a practical illustration of this: models that pass capability benchmarks can fail on operational tasks, not because they lack capability, but because the benchmarks do not measure operational behavior. D8 is a behavioral test designed specifically to measure the right behavior for the deployment.
Module 12: Capstone
Opening hook (5 min): "Everything you have built in this course -- the attack reproductions, the ATLAS mappings, the regression suites, the harness runs -- was preparation for this. You now have two choices. You can write the report that a security researcher sends to a vendor. Or you can build the pipeline that a defensive engineer ships to production. Both require the same knowledge. The deliverable is different because the audience is different."
Pacing: 2 hr lecture. Spend 30 min on the Track A report structure -- walk through the Section 2 minimal reproduction case format in detail. Students who have not written coordinated disclosure reports before will need to see what "minimal" and "executable" mean in practice. Spend 20 min on the Track B test suite structure -- the legitimate-traffic test is the one students most often write poorly.
Common issues:
- Track A students underestimate the CVSSv3 scoring. The vector string requires justifying 8 dimensions; students often make internal-consistency errors (e.g., AV:N with PR:H is unusual for a web-facing application). Have students verify their base score with the NVD calculator before submitting.
- Track A students write Section 2 reproductions that are not executable. "The attacker submits a crafted query" is not a reproduction case. Enforce the standard: if I copy-paste your Section 2 input into VirtusChat (a hypothetical implementation), does the vulnerability fire? If not, it is not a reproduction case.
- Track B students often write tests that only test the attack path and not the legitimate-traffic path. Reinforce: a defense that blocks all traffic is trivially secure and completely useless. Every defense needs both tests.
- The Christian Ch 4 specification gap framing (Module 12.5) should be reviewed during the capstone introduction. Each attack class in VirtusChat is a specification gap: the system prompt specifies "answer customer questions" but does not specify "do not execute injected instructions from RAG context." Track A Section 6 is the specification fix. Track B is the implementation of specification constraints.
Lab 12 timing: 6 hr lab + 4 hr independent. Track A typically requires 4-5 hours for a complete report with all 6 sections. Track B requires 3-4 hours for implementation plus 1-2 hours for the validation test suite. Students underestimate the time needed for Section 5 (CVSSv3 scoring) and Section 6 (specific remediation). Do not accept "TBD" in either section.
Christian Ch 4 weave placement: opening the capstone lecture. Ch 4 closes Christian's "Prophecy" arc with the argument that the alignment problem is fundamentally a specification gap problem -- AI systems fail not because they are malicious, but because the specification did not anticipate the context of failure. This is the meta-theme of AI-201: every attack class in Labs 3-12 is a context the developer's specification did not anticipate. Track A names the specification gap; Track B constrains the deployment context so that the gap cannot be exploited.
General Assessment Notes (v0.2 Addendum)
Module 8-12 timing reality: Modules 10 and 11 have the highest external dependency surface (NVD API rate limits, Ollama local inference, promptfoo configuration). Pre-verify all external endpoints and fixture files before each lab session.
D8 evaluation runs are slow: Lab 11's 3-model × 3-session evaluation takes 60-120 minutes on CPU. Students who underestimate this will not finish in the lab window. Set the time expectation explicitly at the start of Lab 11 and encourage students to start the evaluation run before reading the Part C/D questions.
Capstone track selection: Students often want to do both tracks. This is not supported in the grading rubric (50 pts for one track). If a student submits both, grade the track where they score higher. The pedagogical goal is depth on one track, not breadth across both.
AI-301 forward pointer: Lab 12 closes AI-201. The natural follow-on conversation -- "what attacks require access to model weights?" -- is the AI-301 entry point. Surface it explicitly: training data poisoning (AML.T0020), model extraction (AML.T0005), membership inference (AML.T0024), and full red team operations are not in AI-201's scope because they require model-level access. AI-301 is the Belt-5 continuation.