Guardrails AI
Guardrails AI is the open-source validation framework we layer on top of LLM responses to enforce structure, redact PII, and block unsafe outputs before they reach users. The principle: never trust a raw LLM response in production. Every output passes through a validation layer that checks for the things LLMs are likely to get wrong: malformed output, leaked private data, off-topic answers, responses to jailbreak attempts, and claims that contradict a known truth set. We pair Guardrails with our own evaluation harness for ongoing measurement. Together they turn 'AI shipped' into 'AI shipped safely'.
What you get
Real examples
PII redaction in healthcare AI
Illustrative scenario: a healthcare provider's AI assistant summarises patient records. Guardrails enforces PII redaction on every output: patient names, MRNs (medical record numbers) and dates of birth are stripped or hashed before the summary reaches the doctor's screen. Every redaction event is recorded in a mandatory audit trail.
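The redaction step can be sketched in plain Python. This is not the Guardrails API; it is a minimal illustration assuming that patient names are known from the record being summarised and that MRNs and dates of birth appear with `MRN` and `DOB` labels. The regex patterns, the `Redactor` class and the salt are all hypothetical. Matches are replaced with salted SHA-256 tokens, so the same value always maps to the same token, and each replacement appends an audit event that records the kind of value and its token but never the raw value.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical patterns: real MRN and DOB formats vary by provider and jurisdiction.
# Group 1 captures the bare value so "MRN 123..." and "MRN: 123..." hash identically.
MRN_RE = re.compile(r"\bMRN[:#]?\s*(\d{6,10})\b")
DOB_RE = re.compile(r"\bDOB:?\s*(\d{4}-\d{2}-\d{2})\b")


@dataclass
class AuditEvent:
    kind: str       # e.g. "MRN", "DOB", "NAME"
    token: str      # the replacement token; the raw value is never stored
    timestamp: str  # UTC, ISO 8601


@dataclass
class Redactor:
    salt: bytes            # hypothetical secret, kept out of prompts and logs
    known_names: list[str]  # names taken from the patient record being summarised
    audit_log: list[AuditEvent] = field(default_factory=list)

    def _token(self, kind: str, value: str) -> str:
        # Deterministic salted hash: the same value always yields the same token.
        digest = hashlib.sha256(self.salt + value.encode("utf-8")).hexdigest()[:12]
        token = f"[{kind}:{digest}]"
        self.audit_log.append(
            AuditEvent(kind, token, datetime.now(timezone.utc).isoformat())
        )
        return token

    def redact(self, text: str) -> str:
        # Replace the whole labelled match, but hash only the captured value.
        text = MRN_RE.sub(lambda m: self._token("MRN", m.group(1)), text)
        text = DOB_RE.sub(lambda m: self._token("DOB", m.group(1)), text)
        for name in self.known_names:
            pattern = re.compile(re.escape(name))
            text = pattern.sub(lambda m: self._token("NAME", m.group(0)), text)
        return text
```

Keeping the salt secret matters: MRNs and dates come from small value spaces, so a hash with a missing or leaked salt could be reversed by brute force.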
Structured output enforcement for downstream automation
Illustrative scenario: an invoice-processing pipeline expects strict JSON output from the LLM. Guardrails enforces the schema (date format, currency code, line-item structure) and re-prompts the LLM if validation fails. Downstream automation never receives malformed data.
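A minimal sketch of the validate-then-re-prompt loop, in plain Python rather than the Guardrails API. The invoice fields (`invoice_date`, `currency`, `line_items`), the short currency allowlist and the `llm` callable are hypothetical; any function that takes a prompt and returns text will do. On failure, the validator's error message is fed back into the next prompt, and if every attempt fails the function raises instead of returning, so nothing malformed flows downstream.

```python
import json
import re
from datetime import date
from typing import Callable

# Hypothetical subset of ISO 4217 codes; a real pipeline would load the full list.
CURRENCIES = {"USD", "EUR", "GBP", "AUD"}


class ValidationError(Exception):
    pass


def validate_invoice(raw: str) -> dict:
    """Parse and check one LLM response; raise ValidationError on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"output is not valid JSON ({exc.msg})") from None
    if not isinstance(data, dict):
        raise ValidationError("top-level value must be a JSON object")

    d = data.get("invoice_date")
    if not isinstance(d, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", d):
        raise ValidationError("invoice_date must use YYYY-MM-DD")
    try:
        date.fromisoformat(d)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        raise ValidationError("invoice_date is not a real calendar date") from None

    currency = data.get("currency")
    if not isinstance(currency, str) or currency not in CURRENCIES:
        raise ValidationError("currency must be a supported ISO 4217 code")

    items = data.get("line_items")
    if not isinstance(items, list) or not items:
        raise ValidationError("line_items must be a non-empty list")
    for i, item in enumerate(items):
        if not isinstance(item, dict) or not isinstance(item.get("description"), str):
            raise ValidationError(f"line_items[{i}].description must be a string")
        amount = item.get("amount")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise ValidationError(f"line_items[{i}].amount must be a number")
    return data


def generate_invoice(llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate, and re-prompt with the error until the output passes."""
    current = prompt
    for _ in range(max_attempts):
        raw = llm(current)
        try:
            return validate_invoice(raw)
        except ValidationError as err:
            current = (f"{prompt}\n\nYour previous answer was rejected: {err}. "
                       "Reply with corrected JSON only.")
    raise ValidationError(f"no valid output after {max_attempts} attempts")
```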
Off-topic + jailbreak detection in customer-facing chat
Illustrative scenario: a B2B SaaS chat assistant must stay within product-support scope. Guardrails filters outputs that drift off-topic, refuses politely on jailbreak attempts, and logs every refusal for review. This prevents the bot from being weaponised for unintended use cases.
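A deliberately crude sketch of the three behaviours: filter off-topic output, refuse on jailbreak attempts, log every refusal. The regex patterns, the keyword scope list and the refusal text are hypothetical stand-ins, and this is not how Guardrails implements these checks. Production deployments typically replace the keyword heuristics with trained classifiers, but the control flow stays the same.

```python
import logging
import re
from dataclasses import dataclass

logger = logging.getLogger("chat_guard")

# Hypothetical heuristics; real deployments usually use trained classifiers here.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"pretend (you are|to be)", re.IGNORECASE),
]
IN_SCOPE_TERMS = {"account", "api", "billing", "dashboard", "export",
                  "integration", "invoice", "login"}
REFUSAL = "Sorry, I can only help with questions about our product."

refusal_log: list[dict] = []  # reviewed by a human later


@dataclass
class Decision:
    allowed: bool
    reason: str
    text: str  # what the user actually sees


def _refuse(reason: str, user_msg: str) -> Decision:
    refusal_log.append({"reason": reason, "user_msg": user_msg})
    logger.info("refused chat turn: %s", reason)
    return Decision(False, reason, REFUSAL)


def check_turn(user_msg: str, model_output: str) -> Decision:
    # Jailbreak attempts are detected on the user's message and refused politely.
    if any(p.search(user_msg) for p in JAILBREAK_PATTERNS):
        return _refuse("jailbreak", user_msg)
    # Off-topic drift is detected on the model's output: no product term, no send.
    words = set(re.findall(r"[a-z]+", model_output.lower()))
    if not words & IN_SCOPE_TERMS:
        return _refuse("off_topic", user_msg)
    return Decision(True, "ok", model_output)
```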
Common questions
Why a separate validation layer when LLMs have built-in safety?
Built-in model safety is tuned for the general case: it optimises for breadth, not specificity. Your application has rules the model doesn't know: what counts as PII in your jurisdiction, which fields your downstream system requires, which topics are off-limits for your product. Guardrails encodes those rules as deterministic checks at the application boundary, independent of which model you're calling.
Does this add latency?
Some — typically 50-200ms per call depending on the validator stack. For most production workloads that's acceptable. For latency-critical realtime apps (sub-500ms response targets), we cut the validator stack to the safety-critical checks and run others asynchronously after the response is sent.
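The split between blocking and deferred checks can be sketched with `asyncio`. Assumptions: `no_ssn` stands in for a safety-critical validator and `within_length` for a lower-stakes one (both hypothetical), and deferred results are appended to a `findings` list rather than sent to a real monitoring system. The response is returned as soon as the blocking checks pass; the deferred checks run in a background task.

```python
import asyncio
import re
from typing import Callable, List, Optional, Tuple

Validator = Callable[[str], bool]
FALLBACK = "Sorry, I can't share that."


def no_ssn(text: str) -> bool:
    # Hypothetical safety-critical check: reject US SSN-shaped strings.
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) is None


def within_length(text: str) -> bool:
    # Hypothetical low-stakes check that can safely run after sending.
    return len(text) <= 2000


async def _run_deferred(output: str, deferred: List[Validator],
                        findings: List[Tuple[str, bool]]) -> None:
    await asyncio.sleep(0)  # yield to the event loop once before checking
    for check in deferred:
        findings.append((check.__name__, check(output)))


async def respond(output: str, blocking: List[Validator], deferred: List[Validator],
                  findings: List[Tuple[str, bool]]) -> Tuple[str, Optional[asyncio.Task]]:
    # Blocking checks sit on the critical path: any failure replaces the output.
    for check in blocking:
        if not check(output):
            return FALLBACK, None
    # Deferred checks are scheduled, not awaited, so they add no response latency.
    task = asyncio.create_task(_run_deferred(output, deferred, findings))
    return output, task
```

In a real service the caller would send `text` immediately and let `task` finish in the background, keeping a reference to it so the task is not garbage-collected before it completes.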
Can Guardrails replace our content-moderation policy?
No. It's enforcement, not policy. The policy still needs to come from your compliance / legal / product team — Guardrails just encodes that policy into deterministic runtime checks. We typically pair a Guardrails implementation with a documented content-moderation policy review.
What about hallucination detection?
Partial support. Guardrails has a 'provenance' validator that checks LLM claims against a known truth set (great for RAG). For genuinely novel hallucinations (where there's no truth set to check against), we layer in our own LLM-as-judge evaluation harness on top of Guardrails — judging accuracy with a different model is sometimes the best detection signal available.
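To make "checking claims against a known truth set" concrete, here is a crude lexical stand-in: each answer sentence is flagged when no single source passage covers at least `threshold` of its words. This is not how Guardrails' provenance validator works internally; the function names and the 0.6 threshold are hypothetical, and word overlap will miss paraphrases that a semantic check would catch.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_claims(answer: str, sources: List[str], threshold: float = 0.6) -> List[str]:
    """Return answer sentences whose words are poorly covered by every source passage."""
    source_tokens = [_tokens(s) for s in sources]
    flagged = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        # Best coverage by any single passage; 0.0 when there are no sources at all.
        best = max((len(words & st) / len(words) for st in source_tokens), default=0.0)
        if best < threshold:
            flagged.append(sentence)
    return flagged
```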
How do you decide what validators to use?
Three buckets. Always-on: PII redaction (any data-sensitive context), JSON schema enforcement (any structured output). Risk-driven: toxicity, jailbreak, off-topic filters for customer-facing chat. Domain-specific: custom validators for industry rules (financial advice disclaimers, healthcare HIPAA equivalents, etc.). We design the validator stack as part of the AI readiness scoping work.
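The three buckets can be read as a small decision function over properties of the deployment. A sketch under assumptions: the `Context` fields, the validator names and the domain mapping are all hypothetical labels, not Guardrails Hub identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical mapping from industry to custom validators.
DOMAIN_VALIDATORS: Dict[str, List[str]] = {
    "finance": ["financial_advice_disclaimer"],
    "healthcare": ["health_data_privacy"],
}


@dataclass(frozen=True)
class Context:
    handles_sensitive_data: bool
    structured_output: bool
    customer_facing_chat: bool
    domain: Optional[str] = None


def build_stack(ctx: Context) -> Dict[str, List[str]]:
    stack: Dict[str, List[str]] = {"always_on": [], "risk_driven": [], "domain_specific": []}
    # Always-on: applied whenever the deployment has the matching property.
    if ctx.handles_sensitive_data:
        stack["always_on"].append("pii_redaction")
    if ctx.structured_output:
        stack["always_on"].append("json_schema")
    # Risk-driven: customer-facing chat gets the abuse and scope filters.
    if ctx.customer_facing_chat:
        stack["risk_driven"] += ["toxicity", "jailbreak", "off_topic"]
    # Domain-specific: custom rules for the industry, if any are defined.
    if ctx.domain is not None:
        stack["domain_specific"] += DOMAIN_VALIDATORS.get(ctx.domain, [])
    return stack
```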
Ready to get started?
Tell us about your project and we'll tell you honestly how we can help.