Jeff Weber - Portfolio

Prompt injection exploits work because models implicitly trust user supplied context. Hardened systems reduce the blast radius of untrusted text and limit downstream capability.

Threat model the orchestration graph

Map every surface where untrusted content can influence tool calls, retrieval context, or exec flows. Catalogue privileges for each tool.

Build layered mitigations

Structured prompts: Replace free-form system messages with templated slots and strict delimiters.
Context filters: Run untrusted text through regex/embedding classifiers to reject known bad patterns.
Capability isolation: Split risky tools (code exec, browsing) behind additional human-in-the-loop review.

Measure and iterate

Use a red team corpus of injection patterns to regression-test guardrails. Track detection precision/recall to avoid alert fatigue.

Defending LLMs from Prompt Injection

Threat model the orchestration graph

Build layered mitigations

Measure and iterate

Related

Case studies