What is VoiceGateway?

VoiceGateway is a thin routing layer for LiveKit voice agents with first-class cost tracking and reconciliation. It returns native LiveKit STT, LLM, and TTS plugin instances that drop straight into AgentSession. On top of those instances it layers modality-aware unit accounting (audio-minutes for STT, tokens for LLM, characters for TTS), resolver-time fallback chains, rate limiting, and per-project budget enforcement. STT, LLM, and TTS prices come from voice-prices, and the voicegw reconcile command checks the costs VoiceGateway recorded against your provider invoices.
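
To make the modality-aware unit accounting concrete, here is a minimal sketch of how one call's cost could be computed when each modality is billed in its own unit. The prices, the `Usage` type, and the function names are assumptions made up for this example; they are not VoiceGateway's API or the actual voice-prices data.

```python
from dataclasses import dataclass

# Hypothetical per-unit prices in USD, for illustration only.
# Real prices come from voice-prices and will differ.
PRICES = {
    "stt": 0.005,      # per audio-minute
    "llm": 0.000003,   # per token
    "tts": 0.000015,   # per character
}


@dataclass
class Usage:
    modality: str  # "stt", "llm", or "tts"
    units: float   # audio-minutes, tokens, or characters


def cost_usd(usage: Usage) -> float:
    """Price a single usage record in its modality's native unit."""
    return usage.units * PRICES[usage.modality]


def total_cost(usages) -> float:
    """Sum the cost of every usage record in one conversation turn."""
    return sum(cost_usd(u) for u in usages)


# One turn: 2 minutes of audio, 1000 LLM tokens, 400 synthesized characters.
turn = [Usage("stt", 2.0), Usage("llm", 1000), Usage("tts", 400)]
print(f"{total_cost(turn):.4f} USD")
```

Keeping the unit next to the modality is what makes a later comparison against a provider invoice meaningful, since invoices are itemized in the same units.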

The problem

Building a production voice AI agent means juggling multiple providers. You need Deepgram or AssemblyAI for transcription, OpenAI or Anthropic for reasoning, and Cartesia or ElevenLabs for speech synthesis. Each provider has its own SDK, authentication scheme, pricing model, and failure modes.

As your project grows, so do the operational headaches:

  • Vendor lock-in -- switching from one STT provider to another means rewriting integration code.
  • No unified cost tracking -- you have to log into each provider's dashboard separately to understand spend.
  • No fallback story -- if your primary TTS provider goes down at 2 AM, your agent goes silent.
  • Per-project budgets are impossible -- when multiple teams or customers share the same API keys, there is no easy way to track or cap usage per project.
  • Local/cloud split -- running Whisper locally for development but Deepgram in production requires maintaining two code paths.

The solution

VoiceGateway solves these problems with a thin routing layer that drops in for livekit.agents.inference. You describe your providers, models, and policies in a single YAML file (voicegw.yaml), then construct inference.STT/LLM/TTS from your Python code exactly the way you would on LiveKit Cloud. VoiceGateway handles the rest: provider instantiation, middleware execution (cost tracking, latency monitoring, rate limiting), and budget enforcement.

agent.py
from voicegateway import inference

stt = inference.STT("deepgram/nova-3")
llm = inference.LLM("anthropic/claude-sonnet-4-20250514")
tts = inference.TTS("cartesia/sonic-3")

Switching providers is a one-line config change. Per-project budgets are built in. Cost data flows to the dashboard, the CLI, and the MCP tools without any extra plumbing in your agent code.
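
The resolver-time fallback idea can be sketched in a few lines: try each candidate model ID in order and return the first one that can be constructed. Everything below (`ProviderUnavailable`, `resolve_with_fallback`, the stand-in factory, and the ElevenLabs model name) is hypothetical and only illustrates the pattern; it is not VoiceGateway's implementation.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a factory when a provider cannot be constructed."""


def resolve_with_fallback(
    model_ids: Sequence[str], build: Callable[[str], object]
) -> object:
    """Return the first candidate that builds; raise if all of them fail."""
    errors = []
    for model_id in model_ids:
        try:
            return build(model_id)
        except ProviderUnavailable as exc:
            errors.append(f"{model_id}: {exc}")
    raise ProviderUnavailable("all candidates failed: " + "; ".join(errors))


def fake_build(model_id: str) -> str:
    # Stand-in factory for the example: pretend Cartesia is unavailable.
    if model_id.startswith("cartesia/"):
        raise ProviderUnavailable("missing API key")
    return f"<TTS {model_id}>"


# "elevenlabs/some-model" is a placeholder, not a real model name.
tts = resolve_with_fallback(["cartesia/sonic-3", "elevenlabs/some-model"], fake_build)
print(tts)
```

Because the choice is made when the plugin is resolved, the agent receives a single ordinary instance and needs no fallback logic of its own.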

Who is it for?

  • Voice AI engineers building agents with LiveKit Agents or similar frameworks who want clean provider abstraction.
  • Platform teams running multi-tenant voice infrastructure that need per-project cost tracking and budget controls.
  • Indie developers who want to use local models (Whisper, Kokoro, Piper) during development and cloud providers in production, without changing application code.
  • Cost-conscious teams who need visibility into per-request costs across STT, LLM, and TTS with a single dashboard.

Feature comparison

| Feature | VoiceGateway | Direct SDK calls | LiteLLM |
|---|---|---|---|
| STT + LLM + TTS routing | Yes | Manual | LLM only |
| Unified config (YAML) | Yes | No | Partial |
| Fallback chains | Yes | Manual | Yes |
| Per-project cost tracking | Yes | No | No |
| Budget enforcement (warn/throttle/block) | Yes | No | No |
| Local model support | Yes (Whisper, Kokoro, Piper, Ollama) | N/A | Ollama only |
| Drop-in for livekit.agents.inference | Yes | No | No |
| Web dashboard | Yes | No | No |
| MCP server integration | Yes | No | No |
| LiveKit Agents compatible | Yes | Yes | Partial |
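
The three budget enforcement modes named in the table (warn, throttle, block) could be modeled as escalating thresholds on a project's spend. The sketch below is an assumption about how such a policy might look; the threshold values, `BudgetAction`, and `budget_action` are invented for illustration and are not taken from VoiceGateway.

```python
from enum import Enum


class BudgetAction(Enum):
    ALLOW = "allow"
    WARN = "warn"
    THROTTLE = "throttle"
    BLOCK = "block"


def budget_action(
    spent_usd: float,
    budget_usd: float,
    warn_at: float = 0.80,      # hypothetical default: warn at 80% of budget
    throttle_at: float = 0.95,  # hypothetical default: throttle at 95%
) -> BudgetAction:
    """Map a project's spend to the strictest action whose threshold it reached."""
    if budget_usd <= 0:
        return BudgetAction.BLOCK
    ratio = spent_usd / budget_usd
    if ratio >= 1.0:
        return BudgetAction.BLOCK
    if ratio >= throttle_at:
        return BudgetAction.THROTTLE
    if ratio >= warn_at:
        return BudgetAction.WARN
    return BudgetAction.ALLOW


for spent in (50, 85, 96, 100):
    print(spent, budget_action(spent, 100).value)
```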

Supported providers

VoiceGateway ships with 11 provider integrations spanning cloud and local:

Cloud providers:

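| Provider | STT | LLM | TTS |
|---|---|---|---|
| Deepgram | Yes | -- | Yes |
| OpenAI | Yes | Yes | Yes |
| Anthropic | -- | Yes | -- |
| Groq | Yes | Yes | -- |
| Cartesia | -- | -- | Yes |
| ElevenLabs | -- | -- | Yes |
| AssemblyAI | Yes | -- | -- |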

Local providers:

| Provider | STT | LLM | TTS |
|---|---|---|---|
| Whisper | Yes | -- | -- |
| Ollama | -- | Yes | -- |
| Kokoro | -- | -- | Yes |
| Piper | -- | -- | Yes |

Architecture overview

Each request flows through VoiceGateway in the following stages:

Your code
  --> voicegateway.inference.STT() / LLM() / TTS()
    --> Resolve "provider/model" + per-project key
      --> Wrap a livekit.plugins.<provider>.* instance
        --> Middleware pipeline
            - Cost tracking
            - Latency monitoring
            - Session correlation
            - Budget enforcement
        --> SQLite storage
          --> Dashboard (reads stored data)
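
The resolve step in the flow above starts from a "provider/model" string. As a rough sketch, assuming the ID is split on its first slash, parsing could look like this; `ModelRef` and `parse_model_id` are hypothetical names, not part of VoiceGateway.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str) -> ModelRef:
    """Split "provider/model" on the first slash and validate both halves."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelRef(provider, model)


ref = parse_model_id("anthropic/claude-sonnet-4-20250514")
print(ref.provider, ref.model)
```

Splitting only on the first slash leaves any later slashes inside the model name, which matters for providers whose model names are themselves path-like.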

Key architectural decisions:

  • Async throughout -- all database, HTTP, and provider operations use async/await.
  • Lazy provider instantiation -- providers are created on first use via a registry factory, so unused providers cost nothing.
  • Modular installs -- pip install voicegateway[openai,deepgram] installs only the SDKs you need.
  • Pydantic validation -- the config schema uses extra="forbid" to catch typos in your YAML before they cause runtime errors.
  • SQLite storage -- request logs, cost records, and project data are stored locally in a SQLite database. No external dependencies.
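
Lazy provider instantiation through a registry factory can be illustrated with a small cache keyed by provider name. This is a sketch of the general pattern only; `LazyRegistry` and the factory functions are hypothetical stand-ins, not VoiceGateway's registry.

```python
from typing import Callable, Dict


class LazyRegistry:
    """Stores factories up front and builds each provider on first use."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._instances: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Registering is cheap: nothing is constructed or imported here.
        self._factories[name] = factory

    def get(self, name: str) -> object:
        # Build on first request, then reuse the cached instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


created = []  # records which providers were actually built


def make_deepgram() -> object:
    created.append("deepgram")
    return object()


def make_openai() -> object:
    created.append("openai")
    return object()


registry = LazyRegistry()
registry.register("deepgram", make_deepgram)
registry.register("openai", make_openai)

first = registry.get("deepgram")
second = registry.get("deepgram")
print(created, first is second)
```

In this sketch the OpenAI factory never runs, which is the sense in which an unused provider costs nothing.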

For a deeper dive into the internal architecture, see the Architecture section.
