Module 4: Cross-Language SSTI -- The Bug-Class Generalisation

Duration: 2 hr lecture + 4 hr lab + 5 hr independent
Lab: Lab 4 (CVE-2025-9556 LangChainGo Gonja SSTI -- signature lab)
MITRE ATLAS tactics: Execution (Tool-Chain Compromise)
Foundational weave: Mitchell Ch 10 (Trustworthy AI and the Problem of Meaning); Karpathy makemore Video 2


4.1 The Signature Lab

CVE-2025-9556 is the second of AI-201's two signature CVEs. It is a Server-Side Template Injection (SSTI) vulnerability in LangChainGo's Gonja template engine -- the Go-language cousin of CVE-2025-65106 (the LangChain Python Jinja2 SSTI you reproduced in AI-101 Module 8). Your reproduction harness from Lab 4 is the second required capstone gate.

This CVE pair is the AI-201 spine because it demonstrates a property that Belt-4 practitioners must internalize -- bug classes generalise across implementations. If a bug class exists in one language's agentic framework, the same bug class is likely present in every language that has an equivalent framework. The attacker does not need to discover a new vulnerability class; they need to find the next implementation.


4.2 SSTI: The Mechanism (Jinja2 Review)

You reproduced CVE-2025-65106 (Python LangChain Jinja2 SSTI) in AI-101 Module 8. Review the mechanism before extending it to Go.

Server-Side Template Injection occurs when user-controlled input is rendered by a server-side template engine with expression evaluation enabled. The template engine evaluates expressions in the input as if they were part of the template -- not as literal strings to be displayed.

In Jinja2 (Python):

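Template built as: "Hello, " + user_name + "!"   (user input becomes part of the template source)
With user_name = "Alice":  Renders: "Hello, Alice!"
With user_name = "{{ 7 * 7 }}":  Renders: "Hello, 49!"
With user_name = "{{ ''.__class__.__mro__[1].__subclasses__() }}":
  Renders: a list of all loaded Python classes -- class disclosure
With user_name = "{{ namespace.__init__.__globals__['os'].popen('id').read() }}":
  Renders: uid=1000(user) gid=... -- remote code execution

By contrast, with the fixed template "Hello, {{ user_name }}!" and user_name passed as a render variable, the payload "{{ 7 * 7 }}" renders literally as "Hello, {{ 7 * 7 }}!".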

The vulnerability: user input is interpolated into the template string before rendering, instead of being passed to the renderer as a variable. The template engine then evaluates the user's input as template code.
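
The difference between the two paths can be sketched in Go. This is a minimal illustration, not the LangChain code: it uses the standard library's text/template as a stand-in engine (its syntax is {{ .Name }} rather than Jinja2's {{ name }}), and the function names and the APIKey value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render parses src as template source and executes it against data.
func render(src string, data map[string]string) (string, error) {
	tpl, err := template.New("prompt").Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

// greetVulnerable concatenates user input into the template source,
// so the engine parses the input as template code.
func greetVulnerable(userName string) (string, error) {
	data := map[string]string{"APIKey": "sk-demo-123"} // hypothetical secret in the context
	return render("Hello, "+userName+"!", data)
}

// greetSafe keeps the template source fixed and passes the input as data.
func greetSafe(userName string) (string, error) {
	data := map[string]string{"APIKey": "sk-demo-123", "UserName": userName}
	return render("Hello, {{ .UserName }}!", data)
}

func main() {
	payload := "{{ .APIKey }}"
	leaked, _ := greetVulnerable(payload)
	literal, _ := greetSafe(payload)
	fmt.Println(leaked)  // Hello, sk-demo-123!
	fmt.Println(literal) // Hello, {{ .APIKey }}!
}
```

Both functions receive the same payload; only the point at which it enters the engine differs.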


4.3 Gonja: The Go Template Engine

Gonja is a Go implementation of the Jinja2 template engine. Its design goal is source-compatibility with Jinja2 templates: a template written for Jinja2 should work in Gonja with minimal changes. This source-compatibility goal is what makes CVE-2025-9556 structurally identical to CVE-2025-65106: the same Jinja2 expression syntax works in Gonja because Gonja implements the same expression grammar.

Key differences between Jinja2 and Gonja that matter for exploitation:

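  • Gonja runs on Go's type system; Python-style class traversal via __class__.__mro__ is not available
  • Instead, Gonja's evaluator uses Go reflection to resolve attributes and call methods on whatever values the application places in the template context, so the attacker's reach depends on what that context contains
  • A payload such as {{ exec("id") }} works only if the application has registered an exec-equivalent function in the template context; the payload depends on the application, not on Gonja alone
  • Without registered functions, the SSTI vector in Gonja still produces information disclosure (template variable dump) and may produce code execution if the context includes callable objects
  • The public advisory for CVE-2025-9556 describes a further path: Jinja2-style tags such as {% include %} and {% extends %} let injected template code read local files (for example /etc/passwd)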
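
The dependence on what the application puts in the context can be sketched as follows. This is an illustration under stated assumptions: text/template stands in for Gonja, and lookup, secrets, and renderWithHelpers are hypothetical names. A deliberately harmless helper replaces an exec-style function so the sketch is safe to run.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// secrets stands in for whatever a registered helper can reach.
var secrets = map[string]string{"DB_PASSWORD": "hunter2"}

// lookup is a hypothetical helper the application registers for its own templates.
func lookup(key string) string { return secrets[key] }

// renderWithHelpers parses src with the helper registered, then executes it.
func renderWithHelpers(src string, data any) (string, error) {
	funcs := template.FuncMap{"lookup": lookup}
	tpl, err := template.New("prompt").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	base := "Summarise this ticket: "

	// Injected template code reaches the registered helper.
	out, _ := renderWithHelpers(base+`{{ lookup "DB_PASSWORD" }}`, nil)
	fmt.Println(out) // Summarise this ticket: hunter2

	// With no helper involved, the injection can still dump the context.
	out, _ = renderWithHelpers(base+"{{ . }}", map[string]string{"tenant": "acme"})
	fmt.Println(out) // Summarise this ticket: map[tenant:acme]
}
```

The first call models the registered-function case and the second models the variable-dump case from the list above.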

4.4 CVE-2025-9556: The LangChainGo Vulnerability

CVE-2025-9556 occurs in LangChainGo's prompt-template component. The vulnerable code path:

// Vulnerable pattern -- generic illustration (fragment; assumes the errors and gonja imports)
func (t *PromptTemplate) Format(values map[string]any) (string, error) {
    userInput, ok := values["user_input"].(string)
    if !ok {
        return "", errors.New("user_input must be a string")
    }
    // VULNERABILITY: user input is concatenated into the template source
    // before Gonja parses it, so the input is evaluated as template code (SSTI)
    tpl, err := gonja.FromString(t.Template + userInput)
    if err != nil {
        return "", err
    }
    return tpl.Execute(gonja.Context(values))
}

The vulnerability: user_input is concatenated into the template string before the template engine parses it. The correct pattern keeps the template source fixed and supplies user input as a template variable, which the engine renders as a literal string rather than parsing as template code:

// Fixed pattern
func (t *PromptTemplate) Format(values map[string]any) (string, error) {
    // The template source is fixed and contains "{{ user_input }}" as a slot;
    // user input is supplied only as a variable, never as template code.
    tpl, err := gonja.FromString(t.Template)
    if err != nil {
        return "", err
    }
    // values["user_input"] is rendered as a string, not parsed as code
    return tpl.Execute(gonja.Context(values))
}

The reproduction harness (Lab 4) builds a minimal LangChainGo application using the pinned vulnerable version, sends a crafted prompt, and observes the SSTI output.
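
One way a harness might decide whether a response shows template evaluation is a benign arithmetic canary. The sketch below is an assumption about how such a check could look, not the Lab 4 harness itself: the probe string, the Verdict type, and classify are hypothetical, and it assumes Gonja evaluates Jinja2-style arithmetic such as {{ 1337 * 3 }}.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// probePayload is sent as user input; the arithmetic is harmless.
	probePayload = "ssti-{{ 1337 * 3 }}-probe"
	// evaluatedMarker is what the probe becomes if the engine evaluates it.
	evaluatedMarker = "ssti-4011-probe"
)

// Verdict describes what happened to the probe in the application's output.
type Verdict string

const (
	Evaluated Verdict = "evaluated" // the engine ran the expression: SSTI indicator
	Reflected Verdict = "reflected" // the probe came back as a literal string
	Absent    Verdict = "absent"    // the probe did not appear in the output
)

// classify inspects the rendered output for the probe in either form.
func classify(response string) Verdict {
	switch {
	case strings.Contains(response, evaluatedMarker):
		return Evaluated
	case strings.Contains(response, probePayload):
		return Reflected
	default:
		return Absent
	}
}

func main() {
	fmt.Println(classify("Prompt: ssti-4011-probe"))           // evaluated
	fmt.Println(classify("Prompt: ssti-{{ 1337 * 3 }}-probe")) // reflected
	fmt.Println(classify("Prompt: hello"))                     // absent
}
```

Wrapping the expression in fixed text on both sides reduces the chance that an unrelated "4011" elsewhere in the output is mistaken for evaluation.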


4.5 The 4-Language SSTI Family

CVE-2025-9556 is one instance of a pattern that this module tracks across four languages, each with an agentic LLM framework built on a template engine:

Language     Framework      Template engine   CVE / Finding
Python       LangChain      Jinja2            CVE-2025-65106 (AI-101 Module 8)
Go           LangChainGo    Gonja             CVE-2025-9556 (this module)
JavaScript   LangChain.js   Eta               (reference finding; same class)
Java         LangChain4J    FreeMarker        (reference finding; same class)

Why all four implementations share this bug class: each framework needed a template engine to build LLM prompt templates. Each chose an engine with expression evaluation. Each initially built prompts by concatenating user input into the template string rather than passing it through the engine's variable substitution mechanism. This is not coincidence: most template engines evaluate whatever source string they are handed, and framework developers who are asking "how do I make this flexible?" rather than "what does the security boundary look like?" reach for concatenation naturally.

The Belt-4 lesson: when you find a bug class in one framework, the next question is not "what is the CVE?" but "which other frameworks use the same pattern?" The answer often produces multiple additional CVEs.
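
A first pass at the "which other frameworks use the same pattern?" question can be a source search for template-from-string calls whose argument is built by concatenation. The sketch below is a rough heuristic, assuming call names of the form FromString or from_string; the regular expression, Finding, and scan are hypothetical, and a line-based match will miss multi-line calls and flag some safe code, so every hit needs manual review.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// concatIntoTemplate matches a FromString/from_string call whose argument
// contains a "+" before the first closing parenthesis.
var concatIntoTemplate = regexp.MustCompile(`(?i)from_?string\s*\([^)]*\+`)

// Finding records one suspicious line.
type Finding struct {
	Line int
	Text string
}

// scan returns every line of src that matches the heuristic.
func scan(src string) []Finding {
	var findings []Finding
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		if concatIntoTemplate.MatchString(sc.Text()) {
			findings = append(findings, Finding{Line: n, Text: strings.TrimSpace(sc.Text())})
		}
	}
	return findings
}

func main() {
	src := `tpl, err := gonja.FromString(t.Template + userInput)
tpl, err := gonja.FromString(t.Template)
tpl = env.from_string("Hello " + name)`
	for _, f := range scan(src) {
		fmt.Printf("line %d: %s\n", f.Line, f.Text)
	}
}
```

On the sample input this flags lines 1 and 3 and skips the fixed-template call on line 2.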


4.6 Cross-Language SSTI Analysis

For Lab 4, you will reproduce CVE-2025-9556 in Go. The lab report requires a written cross-language analysis:

  1. Identify the root cause of the SSTI pattern (concatenation before rendering, not concatenation of rendered output).

  2. Identify the consistent defense across all four languages: use the template engine's variable substitution mechanism, never string concatenation, for user-controlled input.

  3. Identify the structural property of agentic frameworks that makes this recurring: prompt construction is a first-class operation in LLM frameworks, and template engines are the natural tool for parameterized prompt construction. The security decision (how does user input enter the template?) is a framework-design decision, not an application-developer decision.

  4. Identify one additional framework not in the four-language family above that uses a template engine for prompt construction. Does it have the same SSTI pattern?

This analysis is a preview of Section 3 (Bug-Class Generalisation) in the capstone report.


4.7 ATLAS: Tool-Chain Compromise

CVE-2025-9556 maps to ATLAS AML.T0053: LLM Plugin Compromise (under the Execution tactic). The exploit occurs when the LLM's prompt construction is treated as a trusted tool in the agentic pipeline: the application trusts that prompt templates will produce sanitized output, and the SSTI vulnerability subverts that trust.

More broadly, the SSTI family maps to ATLAS AML.T0043: Craft Adversarial Data -- the attacker crafts a specific input (the SSTI payload) that the tool-chain processes differently than intended. The distinction from classical adversarial examples (which target the model's learned behavior) is that SSTI targets the application layer: the exploit has nothing to do with the LLM's behavior. The LLM could be replaced entirely and the SSTI would still work.

This is why Module 10 (model-intrinsic vs application-layer findings) matters: CVE-2025-9556 is a pure application-layer finding. The fix is in the framework code, not in the model.
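
The claim that the model is interchangeable can be sketched with two stub models behind one interface. This is an illustration only: Model, echoModel, refusingModel, and buildPrompt are hypothetical, and text/template again stands in for the framework's template engine.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Model is a hypothetical stand-in for any LLM client.
type Model interface {
	Complete(prompt string) string
}

type echoModel struct{}

func (echoModel) Complete(prompt string) string { return "echo: " + prompt }

type refusingModel struct{}

func (refusingModel) Complete(string) string { return "I can't help with that." }

// buildPrompt concatenates user input into the template source (the
// vulnerable pattern) and renders it before any model is involved.
func buildPrompt(userInput string, ctx map[string]string) (string, error) {
	tpl, err := template.New("prompt").Parse("Answer the user: " + userInput)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tpl.Execute(&out, ctx); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	ctx := map[string]string{"SystemSecret": "internal-token"}
	prompt, err := buildPrompt("{{ .SystemSecret }}", ctx)
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	// The injection has already executed; the model only receives the result.
	fmt.Println(prompt) // Answer the user: internal-token
	for _, m := range []Model{echoModel{}, refusingModel{}} {
		fmt.Printf("%T -> %s\n", m, m.Complete(prompt))
	}
}
```

The secret is in the rendered prompt before either stub is called, so swapping or hardening the model does not change the outcome.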


4.8 Mitchell Weave: The Problem of Meaning

Mitchell's Chapter 10 (Trustworthy AI and the Problem of Meaning) argues that current LLMs process syntax (token sequences and their statistical relationships) without reliable access to semantics (what those tokens mean in the world). The pedagogical connection:

SSTI exploits work because the template engine processes syntax (the template string) without security semantics (distinguishing user data from template code). The engine has no concept of "this part was supplied by an untrusted user" -- it evaluates everything with the same level of trust. This is the same structural failure Mitchell describes in LLMs: a system that processes patterns without understanding the security boundaries between those patterns.

The broader point for Belt-4: many LLM application vulnerabilities are not failures of the LLM itself. They are failures of the surrounding application to enforce the trust boundaries that the LLM's statistical pattern-matching cannot enforce on its own. The SSTI is a template engine bug, not an LLM bug. The excessive agency exploit (Module 5) is a tool-permission design failure, not an LLM bug. The LLM is a pattern-matching system; the application must provide the security context that the LLM's semantics-free processing cannot provide.