Basic Voice Agent
Build a voice agent with LiveKit Agents using VoiceGateway to route STT, LLM, and TTS requests.
Prerequisites
pip install "voicegateway[openai,deepgram,cartesia]"
pip install livekit-agents livekit-plugins-deepgram livekit-plugins-openai livekit-plugins-cartesia livekit-plugins-silero

Configuration
Create voicegw.yaml in your project root:
projects:
  voice-agent:
    name: Voice Agent
    daily_budget: 5.00
    budget_action: warn

providers:
  openai:
    api_key: ${OPENAI_API_KEY}
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}

default_project: voice-agent

cost_tracking:
  enabled: true

observability:
  latency_tracking: true

Basic Usage
from voicegateway import inference
# Each factory returns a wrapped LiveKit plugin instance, ready to drop
# into AgentSession. Cost, latency, and session correlation happen
# transparently in the middleware.
stt = inference.STT("deepgram/nova-3")
llm = inference.LLM("openai/gpt-4.1-mini")
tts = inference.TTS("cartesia/sonic-3")

LiveKit Agent Integration
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import silero

from voicegateway import inference


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=inference.STT("deepgram/nova-3"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3"),
    )

    await session.start(
        agent=Agent(
            instructions="You are a helpful voice assistant. Be concise in your responses.",
        ),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Multiple agents in one process
When one process serves multiple agents (e.g., one worker handling several entrypoints), set the active project per call context:
from voicegateway import inference


async def restaurant_entrypoint(ctx):
    inference.set_project("restaurant-agent")
    # all inference factories below charge the restaurant-agent project
    ...


async def support_entrypoint(ctx):
    inference.set_project("support-agent")
    # each asyncio task runs in its own copy of the context, so the
    # project set here (a ContextVar) does not leak into sibling tasks
    ...

Checking Costs
voicegw costs --project voice-agent
voicegw logs --project voice-agent
# Or open the dashboard in your browser (the daemon already serves it):
voicegw dashboard
# Default URL: http://localhost:8080

# From the HTTP API:
curl 'http://localhost:8080/v1/costs?period=today&project=voice-agent'

Monitoring Latency
VoiceGateway automatically records time to first byte (TTFB) and total latency for every request. View these metrics through the dashboard or the HTTP API:
curl 'http://localhost:8080/v1/metrics?period=today'

The latency.ttfb_warning_ms config value (default 500 ms) triggers a log warning when TTFB exceeds the threshold, which is useful for catching provider degradation early.
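To make the threshold check concrete, here is a minimal sketch of how TTFB and total latency could be measured around a streaming response. It is an illustration only, not VoiceGateway's implementation: `measure_ttfb`, `fake_stream`, and the logger name are hypothetical, and only the 500 ms default comes from the text above.

```python
import logging
import time
from typing import Iterable, Iterator, Optional, Tuple

logger = logging.getLogger("ttfb_sketch")  # hypothetical logger name

TTFB_WARNING_MS = 500.0  # default quoted for latency.ttfb_warning_ms


def measure_ttfb(
    chunks: Iterable[bytes], threshold_ms: float = TTFB_WARNING_MS
) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return (ttfb_ms, total_ms, chunk_count).

    ttfb_ms is None when the stream yields nothing.
    """
    start = time.perf_counter()
    ttfb_ms: Optional[float] = None
    count = 0
    for _ in chunks:
        if ttfb_ms is None:
            # The first chunk has arrived: this is the time to first byte.
            ttfb_ms = (time.perf_counter() - start) * 1000.0
            if ttfb_ms > threshold_ms:
                logger.warning(
                    "TTFB %.0f ms exceeded threshold %.0f ms", ttfb_ms, threshold_ms
                )
        count += 1
    total_ms = (time.perf_counter() - start) * 1000.0
    return ttfb_ms, total_ms, count


def fake_stream(first_delay_s: float, n: int) -> Iterator[bytes]:
    """Hypothetical provider stream that stalls before its first chunk."""
    time.sleep(first_delay_s)
    for _ in range(n):
        yield b"x"
```

Calling `measure_ttfb(fake_stream(0.6, 3))` would log a warning, because the first chunk arrives after roughly 600 ms, which is above the 500 ms default.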
What Happens Under the Hood
When you call inference.STT("deepgram/nova-3"):
- The factory parses "deepgram/nova-3" into provider "deepgram" and model "nova-3".
- The active project is resolved (set_project / env / yaml default / "default").
- The provider's API key is looked up: per-project entry first, then top-level providers:.
- The Registry lazily imports and instantiates DeepgramProvider.
- The provider creates a livekit.plugins.deepgram.STT instance.
- The instance is wrapped in InstrumentedSTT to track cost, latency, and the session ID.
- You get back an object that behaves exactly like the underlying LiveKit plugin instance.
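The first three steps above can be sketched in plain Python. This is a hedged illustration of the described lookup order, not the library's internals: every function name here, the `VOICEGW_PROJECT` environment variable, and the per-project `providers` nesting in the config dict are assumptions made for the example.

```python
import os
from contextvars import ContextVar
from typing import Any, Mapping, Optional, Tuple

# Hypothetical module-level state: the active project lives in a ContextVar,
# so each asyncio task (or copied context) sees its own value.
_active_project: ContextVar[Optional[str]] = ContextVar("active_project", default=None)


def set_project(name: str) -> None:
    """Set the active project for the current context only."""
    _active_project.set(name)


def parse_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model' into its two parts."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def resolve_project(config: Mapping[str, Any], env: Mapping[str, str] = os.environ) -> str:
    """Resolution order: set_project, env, yaml default, then 'default'."""
    return (
        _active_project.get()
        or env.get("VOICEGW_PROJECT")  # assumed variable name, for illustration
        or config.get("default_project")
        or "default"
    )


def resolve_api_key(config: Mapping[str, Any], project: str, provider: str) -> Optional[str]:
    """Prefer a per-project key, then fall back to top-level providers."""
    scoped = config.get("projects", {}).get(project, {}).get("providers", {}).get(provider, {})
    top_level = config.get("providers", {}).get(provider, {})
    return scoped.get("api_key") or top_level.get("api_key")
```

Because the active project is stored in a `ContextVar`, a value set inside one task or copied context is not visible from another, which is the isolation property the multi-agent section relies on.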
All of this is transparent: your LiveKit Agent code sees the same API surface whether it uses voicegateway.inference or direct plugin imports.
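One way a wrapper can stay transparent is to delegate every attribute to the wrapped object and time only the calls it cares about. The sketch below shows that pattern under stated assumptions: `InstrumentedSketch` and `FakeSTT` are hypothetical stand-ins, the timed method is synchronous for brevity (real LiveKit plugin methods are asynchronous), and a real wrapper such as InstrumentedSTT may well be built differently, for example by subclassing the plugin's base class.

```python
import time
from typing import Any, List


class InstrumentedSketch:
    """Hypothetical wrapper: forwards everything, records latency for one method."""

    def __init__(self, inner: Any, timed_method: str) -> None:
        self._inner = inner
        self._timed_method = timed_method
        self.latencies_ms: List[float] = []

    def __getattr__(self, name: str) -> Any:
        # __getattr__ runs only when normal lookup fails, so the wrapper's own
        # attributes win and everything else is forwarded to the inner object.
        attr = getattr(self._inner, name)
        if name != self._timed_method or not callable(attr):
            return attr

        def timed(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return attr(*args, **kwargs)
            finally:
                # Record latency even if the wrapped call raises.
                self.latencies_ms.append((time.perf_counter() - start) * 1000.0)

        return timed


class FakeSTT:
    """Hypothetical plugin used only to exercise the wrapper."""

    model = "nova-3"

    def recognize(self, audio: bytes) -> str:
        return f"{len(audio)} bytes transcribed"
```

Code that calls `wrapped.recognize(...)` or reads `wrapped.model` behaves the same as with the bare object, while `wrapped.latencies_ms` accumulates one measurement per timed call.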
Examples
Practical examples showing how to use VoiceGateway in real-world scenarios. Each example includes runnable code and a complete `voicegw.yaml` configuration.
Budget Enforcement
VoiceGateway supports per-project daily budgets with three enforcement modes: `warn`, `throttle`, and `block`. Budgets are enforced at request-completion time inside the cost tracker; the `inference`