Classroom Glossary Public page

AI-101 Instructor Guide

1,652 words

Course: VCA-AI-101: AI & Agentic Security: Foundations Version: v0.1 pilot Target audience: Cybersecurity practitioners with Python + HTTP fundamentals


Tool Installation Reference

Student machine minimum requirements

  • Python 3.10+ with pip
  • 8 GB RAM (16 GB recommended for local model labs)
  • CPU: 4-core minimum; GPU optional but speeds up Lab 5
  • 20 GB free disk space (Ollama models: ~2 GB; fine-tune checkpoint: ~500 MB)

Required tools and installation commands

# Core Python packages
pip install openai anthropic langchain langchain-openai langchain-community \
            langgraph tiktoken transformers safetensors huggingface_hub \
            datasets evaluate accelerate scikit-learn torch \
            requests httpx python-dotenv rich garak pyrit picklescan

# Ollama (local model server)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2:3b

# Verify garak
garak --version   # expect 0.15+

# Verify PyRIT
python3 -c "import pyrit; print(pyrit.__version__)"

Smoke tests

# Test API connectivity
python3 -c "
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
r = client.chat.completions.create(model='gpt-4o-mini',
    messages=[{'role':'user','content':'ping'}], max_tokens=5)
print('OpenAI:', r.choices[0].message.content)
"

# Test Ollama
ollama run llama3.2:3b 'Respond: READY'

# Test garak
garak -m ollama -n llama3.2:3b --probes promptinject.HijackHateHuman 2>&1 | tail -5

# Test transformers
python3 -c "from transformers import AutoTokenizer; t=AutoTokenizer.from_pretrained('distilbert-base-uncased'); print('Transformers OK')"

Week-by-Week Instructor Notes

Module 1: LLM Black-Box Mental Model

Prep: Ensure all students have API keys before Module 1 lab. Labs 1-3 incur API costs (~$0.10-0.25 total); inform students before they begin.

Common issues:

  • Students confuse "tokens" with "words." Demonstration: len(tiktoken.encode("ChatGPT's")) = 3 tokens, not 1 word.
  • Students install openai<1.0 (the old API). Require openai>=1.14.
  • Ollama not running when lab starts. Remind students to run ollama serve before the lab.

Teaching note: The black-box mental model (next_token = f(sequence, weights)) is the anchor concept. Spend time here. Students who internalize this reasoning will have a structural advantage in Modules 2-9 because they will understand why attacks work, not just what to do.

Karpathy companion: Assign as optional; approximately 30% of students will do it. For the ones who do, the micrograd exercise will make Modules 3 and 5 significantly cleaner.


Module 2: Prompt Injection

Lab 2 note: The lab uses Ollama for cost control. Some students will find that llama3.2:3b is more susceptible to injection attacks than GPT-4o. This is pedagogically correct: show the full range of model resistance. If students use GPT-4o instead, they may find more attacks blocked -- that is also a valid finding worth discussing.

Common mistakes:

  • Students write injection payloads that are too aggressive. The most effective injections often mimic the style and authority of legitimate instructions. Guide students toward subtle attacks, not just "ignore everything."
  • Students conflate the output validation defense with a content moderation system. Output validation checks whether the model is planning to do something it should not, not whether the output contains banned words.

Discussion anchor: Ask students: "EchoLeak (which you will study in Module 9) was a zero-click exploit. Given what you learned in Module 2, what properties of Copilot's architecture made that possible?" Good answers will mention: (1) indirect injection via email content, (2) Copilot had tools with wide scope, (3) the output (exfiltration URL) was not validated before the browser rendered it.


Module 3: Sensitive Information Disclosure

Lab 3 note: The canary extraction exercise uses few-shot prompting to simulate memorization. In a real fine-tuned model, extraction rates are much lower because the model has learned many training examples that compete with the canary. The lab is a proof-of-concept, not a realistic extraction rate estimate.

Common mistakes:

  • Students expect 100% canary extraction. Explain that fine-tuned model extraction depends on memorization rate (typically 0.1-5% of training examples are extractable via simple prefix attacks).
  • Students skip the context window leakage part (Part 4) as "not a model bug." Emphasize: this is a real deployment bug with real consequences; it falls under LLM02's umbrella.

Module 4: Supply Chain

Lab 4 note: The pickle payload in Part 2 executes print(). Students are instructed not to put real shellcode in the payload. Reiterate: the concept is the point. The __reduce__ mechanism is the attack primitive; the payload is irrelevant to the learning objective.

Burp Suite note: Students who have not used Burp before should follow the Burp interception setup from SEC-101. If they get SSL errors when intercepting HuggingFace HTTPS traffic, they need to install Burp's CA certificate. This is documented in SETUP.md.


Module 5: Data Poisoning

Lab 5 note: The fine-tuning step runs on Colab/Kaggle. Warn students that Colab may disconnect during training. The fix: use colab.research.google.com and enable "Background" execution, or use Kaggle Kernels which have a 12-hour runtime limit.

GPU availability issues: Colab free tier has a GPU availability limit. If students cannot get a GPU, the 2-epoch fine-tune with 2,000 examples takes ~15 min on CPU. Acceptable for the lab.

Common mistakes:

  • Students set poison rate to 50% to "see a stronger effect." Explain: high poison rates degrade overall model accuracy, which is detectable. The point of a backdoor attack is that it is stealthy with low poison rates.

Module 6: Excessive Agency

Lab 6 note: The human-in-the-loop confirmation in Part 2 requires interactive input (input() call). Students running in Pyodide cannot use input() -- they should run this part locally. Alternatively, mock the confirmation: get_human_confirmation = lambda action, details: True for the "approved" path and lambda action, details: False for the "rejected" path.

Teaching note: The minimal tool scope principle (Part 3) is the most important defense in this module. Emphasize: if the agent cannot send email, an injection that commands it to send email fails regardless of how clever the injection is.


Module 7: System Prompt Leakage + RAG Poisoning

Lab 7 note: The RAG implementation uses simple bag-of-words embeddings for portability. In a real system, you would use sentence-transformers or the OpenAI embeddings API. The lab's simplified embeddings may not retrieve the poisoned document as expected if the vocabulary overlap is low. If the injection does not trigger, have students manually examine the cosine similarity scores and adjust the knowledge base document content for better overlap.


Module 7.5: Red-Team Tooling

garak note: The full probe scan (all 50+ categories) takes 30-60 minutes against Ollama. Plan lab time accordingly. For a 90-minute lab session, use the targeted probe set from the lab instructions, not the full scan.

PyRIT note: The Crescendo orchestrator requires async Python. Some students will have issues with asyncio.run() in Jupyter notebooks (RuntimeError: event loop is already running). Fix: import nest_asyncio; nest_asyncio.apply() before calling asyncio.run().

Gandalf note: gandalf.lakera.ai requires no account. Students should use incognito/private mode to start from Level 1 without previously cached progress.


Module 8: CVE Deep Dive

Lab 8 note: The isolated venv setup is critical. If students install langchain-core==1.0.6 into their main virtualenv, they break their existing LangChain installation. Enforce the isolated venv in the lab instructions.

Patch diff note: GitHub shows the diff between tag versions. The relevant file is libs/core/langchain_core/prompts/prompt.py. Students should search for _RestrictedSandboxedEnvironment and the f-string validation function. If GitHub is unavailable, they can examine the installed source directly as shown in the lab.

OWASP mapping (Part 5): This is the highest-value written assessment in the module. Grade rigorously. A weak submission says "this is LLM05 because it involves output handling." A strong submission traces the specific code path, explains why Jinja2's Environment vs. SandboxedEnvironment choice matters, and explains what data was at risk in a production deployment.


Module 9: EchoLeak

EchoLeak reading: Assign the arXiv paper (arXiv 2509.10540) as homework before Module 9 lecture. The paper is 10 pages; assign Sections 1-3 (~5 pages). Students who read it before lecture will have a qualitatively different discussion experience.

1-pager grading: The executive briefing is graded on accessibility, not technical depth. If the CISO would understand it, it passes. If it uses terms like "IDOR" or "XPIA classifier" without explanation, it fails. Grade accordingly.


Module 10: Capstone

Target selection: If students use the ReAct agent example from LangChain, ensure they are using a recent version (1.0.7+) so they are not trivially "finding" CVE-2025-65106 in the target code. The target should be a running application, not just code; students should actually invoke the agent to collect D8 evaluation data.

Workshop presentations: 5 minutes per student is strict. Practice the format. Students who go over time should be cut off at 5 minutes -- this mirrors real security review board presentations.

Most common capstone failure modes:

  1. Attack scenarios are not specific ("prompt injection could happen"). Require: a named attacker, a named entry point, a step-by-step chain, a specific harm with business impact.
  2. All 10 OWASP entries are listed but most are "N/A -- not applicable." Push students to justify non-applicability with evidence, not assumption.
  3. D8 evaluation is skipped or minimal. This section should have actual test results, not estimates.

NSM Corpus / Lab Data

AI-101 does not use a PCAP corpus. Lab data files:

File Used in Purpose
None (API/local only) All labs No lab data files to distribute

Grading Breakdown

Component % of grade
Labs (Labs 1-10, raw 111 pts) 40%
Written components (threat model, EchoLeak 1-pager, Lab 8 analysis) 30%
CVE-disclosure assessments (Labs 8, 9; Module quizzes 7-10) 30%

Scaling: Grade all components to 100% basis using weights above. Threshold: A >= 90%, B >= 80%, C >= 70%.


API Cost Budget for Students

Estimated per-student API spend for the entire course (using gpt-4o-mini):

Lab Estimated cost
Lab 1 (10 prompts) < $0.01
Lab 2 (Ollama, free) $0.00
Lab 3 < $0.05
Lab 4 (metadata only) < $0.01
Lab 5 (Ollama / Colab) < $0.05
Lab 6 (Ollama) $0.00
Lab 7 < $0.05
Lab 7.5 (Ollama + free Gandalf) $0.00
Lab 8 (minimal API, local venv) < $0.02
Lab 9 < $0.10
Lab 10 (capstone evaluation) < $0.20
Total estimate < $0.50

Students who use GPT-4o instead of GPT-4o-mini will spend 10x more. Enforce gpt-4o-mini in all labs unless there is a specific reason to use a more capable model.


Classroom Cadence

Recommended pacing for a 10-week instructor-led cohort:

Week Content Lab
1 Modules 1 + 2 Lab 1 (in class) + Lab 2 (homework)
2 Module 3 Lab 3
3 Module 4 Lab 4
4 Module 5 Lab 5 (start in class; finish on Colab)
5 Module 6 Lab 6
6 Module 7 Lab 7
7 Module 7.5 Lab 7.5
8 Module 8 Lab 8
9 Module 9 Lab 9 (EchoLeak reading assigned week 8)
10 Module 10 Lab 10 workshop

Supplementary Resources

These resources are not required but are high-value for students who want depth:

  • Simon Willison's blog (simonwillison.net) -- prompt injection coverage is the best available
  • OWASP Gen AI Security Project site (genai.owasp.org) -- source documents for all advisories
  • garak documentation (github.com/NVIDIA/garak) -- probe catalog and new probe descriptions
  • PyRIT documentation (github.com/Azure/PyRIT) -- attack strategy descriptions
  • Karpathy nn-zero-to-hero playlist (YouTube) -- optional but highly recommended for students who want depth