Jeff Weber · Security Engineering Leader
prompt-securityllm

Defending LLMs from Prompt Injection

TL;DR
Red vs. blue techniques for hardening generative AI interfaces in enterprise applications.

Prompt injection exploits work because models implicitly trust user supplied context. Hardened systems reduce the blast radius of untrusted text and limit downstream capability.

Threat model the orchestration graph

Map every surface where untrusted content can influence tool calls, retrieval context, or exec flows. Catalogue privileges for each tool.

Build layered mitigations

  • Structured prompts: Replace free-form system messages with templated slots and strict delimiters.
  • Context filters: Run untrusted text through regex/embedding classifiers to reject known bad patterns.
  • Capability isolation: Split risky tools (code exec, browsing) behind additional human-in-the-loop review.

Measure and iterate

Use a red team corpus of injection patterns to regression-test guardrails. Track detection precision/recall to avoid alert fatigue.

Related

Case studies

Hardening the LLM Supply Chain
Implemented provenance tracking and secret scanning across the LLM lifecycle for a Fortune 100 fintech.