About guardrails

Protect LLM requests and responses from sensitive data exposure and harmful content using layered content safety controls.

About

In agentgateway, you can use guardrails to help prevent sensitive information from reaching LLM providers and block harmful content in both requests and responses. Guardrails broadly cover a range of content safety techniques including personally identifiable information (PII) detection, PII sanitization, data loss prevention, prompt guards, and other guardrail features.

You can layer multiple protection mechanisms to create comprehensive guardrail protection:

  • Regex-based filters: Fast, deterministic matching for known patterns like credit cards, SSNs, emails, and custom patterns
  • External moderation: Leverage built-in model moderation endpoints and cloud provider-specific guardrails for advanced content filtering
  • Custom webhooks: Integrate your own guardrail logic for specialized requirements

How guardrails works

Agentgateway checks for content safety in the request and response paths. You can configure multiple prompt guards that run in sequence, allowing you to combine different detection methods.

  sequenceDiagram
    participant Client
    participant Gateway as Agentgateway
    participant Guard as Guardrail
    participant LLM

    Client->>Gateway: Send prompt
    Gateway->>Guard: 1. Regex check (fast)
    Guard-->>Gateway: Pass/Reject/Mask

    alt Passed Regex
        Gateway->>Guard: 2. External moderation (if configured)
        Guard-->>Gateway: Pass/Reject/Mask

        alt Passed Moderation
            Gateway->>Guard: 3. Custom webhook (if configured)
            Guard-->>Gateway: Pass/Reject/Mask

            alt Passed All Guards
                Gateway->>LLM: Forward sanitized request
                LLM-->>Gateway: Generate response
                Gateway->>Guard: Response guards
                Guard-->>Gateway: Pass/Reject/Mask
                Gateway-->>Client: Return sanitized response
            end
        end
    else Rejected
        Gateway-->>Client: Return rejection message
    end

The diagram shows content flowing through multiple guard layers. Each layer can:

  • Pass: Allow content to proceed to the next layer
  • Reject: Block the request and return an error message
  • Mask: Replace sensitive patterns with placeholders and continue

Choosing the right approach

Use this table to decide which guardrail layer to use for your requirements:

RequirementRecommended ApproachReason
Detect known PII formats (SSN, credit cards, emails)Regex with builtinsFast, deterministic, no external dependencies
Block hate speech, violence, harmful contentExternal moderation (OpenAI, Bedrock)ML-based detection trained for content safety
Organization-specific restricted termsRegex with custom patternsSimple pattern matching for known strings
Named entity recognition (people, orgs, places)Custom webhookRequires NER models not available in built-in options
HIPAA, PCI-DSS, or other compliance requirementsLayered approachCombine regex + external moderation + custom validation
Jailbreak - DAN & Role HijackingRegex with custom patternsPattern-match known jailbreak phrases and role-injection strings before they reach the LLM
Credentials & Secrets (API keys, tokens, passwords)Regex with custom patternsDeterministic pattern matching for structured credential formats with no external dependencies
System prompt extractionRegex with custom patternsDetect phrases that attempt to reveal or override system instructions before the request is forwarded
Encoding Evasion & Delimiter InjectionRegex with custom patternsMatch encoded or delimiter-based bypass patterns to block evasion attempts early in the pipeline
Integration with existing DLP toolsCustom webhookAllows reuse of existing security infrastructure
Fastest performance with minimal latencyRegex onlyNo external API calls
Most comprehensive protectionAll three layersDefense-in-depth with multiple detection methods

Performance considerations

Each content safety layer adds latency to requests. Plan your configuration accordingly:

  • Regex guards: < 1ms per check, negligible latency impact
  • External moderation: 50-200ms depending on provider and network latency
  • Custom webhooks: Varies based on webhook implementation and location

To optimize performance:

  • Use regex for fast, deterministic checks before slower external checks
  • Deploy webhook servers in the same region as agentgateway
  • Configure appropriate timeouts for external moderation endpoints
  • Consider request size limits to avoid processing very large prompts

Next steps

Check out the following guides to build your guardrail system.

To track guardrails and content safety, see the following guide.

Agentgateway assistant

Ask me anything about agentgateway configuration, features, or usage.

Note: AI-generated content might contain errors; please verify and test all returned information.

Tip: one topic per conversation gives the best results. Use the + button in the chat header to start a new conversation.

Switching topics? Starting a new conversation improves accuracy.
↑↓ navigate select esc dismiss

What could be improved?

Your feedback helps us improve assistant answers and identify docs gaps we should fix.

Need more help? Join us on Discord: https://discord.gg/y9efgEmppm

Want to use your own agent? Add the Solo MCP server to query our docs directly. Get started here: https://search.solo.io/.