Voice-specific guardrails
VoiceGateway provides project-scoped, LLM-side guardrails for voice agents. Guardrails are injected through the existing `voicegateway.inference.LLM(...)` drop-in path, so agent code keeps the same Li
Voice-specific guardrails
VoiceGateway provides project-scoped, LLM-side guardrails for voice agents. Guardrails are injected through the existing voicegateway.inference.LLM(...) drop-in path, so agent code keeps the same LiveKit construction pattern.
Guardrails do not create a proxy session service, do not inspect raw audio, and do not intercept arbitrary tool calls. They append a versioned system prompt block to the LiveKit chat context and register one reserved LiveKit function tool named report_guardrail_action.
Policy model
Guardrail policies live per project. The default is disabled, with every category set to off.
projects:
support:
name: Support Bot
guardrails:
enabled: true
categories:
pii: redact
financial: block
medical: alert
prompt_injection: block
off_topic: offCategories:
piifinancialmedicalprompt_injectionoff_topic
Actions:
redact: answer without repeating the sensitive detail.block: decline the current turn with a brief, neutral response.alert: continue normally and write an audit event.off: disable that category.
Runtime behavior
On the first guarded LLM chat in a session, VoiceGateway freezes the active project policy. Later dashboard or API edits affect new sessions only.
When guardrails are active:
- VoiceGateway appends a
<voicegateway_guardrails version="v0.6.0">block after existing system/developer instructions. - VoiceGateway registers
report_guardrail_action(category, action, context_excerpt). - A user-defined tool with the same name is rejected for that session.
- Audit rows are written to
guardrail_eventswithevent_type = fired.
Session detail responses include:
guardrails_activeguardrails_bypassedguardrail_policy_snapshotguardrail_events
This lets the dashboard distinguish "active policy, zero events" from "no guardrail audit".
Bypass
Use bypass only for trusted internal sessions where the operator intentionally wants no injection. VoiceGateway records a bypass audit event when the frozen policy would otherwise be active.
from voicegateway import inference
session_id = inference.start_session(bypass_guardrails=True)
# Or, when binding a custom LiveKit AgentSession:
inference.attach_session(agent_session, bypass_guardrails=True)Bypass skips prompt/tool injection for the session. The bypass row has event_type = bypassed; category and action are NULL.
CLI
voicegw guardrails talks to the dashboard API:
voicegw guardrails show --project support
voicegw guardrails set --project support --category pii --action redact
voicegw guardrails clear --project support
voicegw guardrails dry-run --project supportUse VOICEGW_API_KEY when your dashboard API requires auth.
API
Server API:
GET /v1/projects/{id}/guardrailsPOST /v1/projects/{id}/guardrailsGET /v1/guardrails/eventsGET /v1/guardrails/aggregate
Dashboard API mirrors these under /api/....
Aggregates count only fired rows. Event listings can include both fired and bypassed.
Caveats
These guardrails are prompt-side controls, not a deterministic safety classifier. They depend on the selected LLM following instructions and calling the reserved tool. Use provider-native moderation, contractual compliance review, and invoice/log reconciliation for higher-assurance workflows.