Core Concepts
This page defines the key abstractions in VoiceGateway. Understanding these concepts will help you navigate the configuration and API.
Inference module
The public Python surface. voicegateway.inference mirrors livekit.agents.inference so an agent written for LiveKit Cloud Inference moves to VoiceGateway with one import-line change. Each factory call (STT, LLM, TTS) constructs the matching LiveKit plugin and wraps it with VG's middleware.
from voicegateway import inference
stt = inference.STT("deepgram/nova-3")
See: Quick Start, First Agent, Python SDK Reference
Provider
A backend service that performs inference. VoiceGateway supports 11 providers: 7 cloud (Deepgram, OpenAI, Anthropic, Groq, Cartesia, ElevenLabs, AssemblyAI) and 4 local (Whisper, Ollama, Kokoro, Piper). Each provider wraps a corresponding livekit.plugins.<name> package and is instantiated lazily on first inference call.
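Lazy instantiation can be pictured with a small wrapper that defers construction until the first call. This is a minimal sketch, not VoiceGateway's code: LazyProvider and its factory argument are hypothetical names, and a plain object stands in for the LiveKit plugin instance.

```python
from typing import Any, Callable, Optional


class LazyProvider:
    """Hypothetical wrapper: builds the underlying plugin on first use only."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._instance: Optional[Any] = None

    def get(self) -> Any:
        # Construct the plugin the first time it is needed, then reuse it.
        if self._instance is None:
            self._instance = self._factory()
        return self._instance
```

Under this assumption, creating the wrapper costs nothing; the plugin import and client setup happen only when the first inference call asks for the instance.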
See: Providers
Model ID
A string in "provider/model" format that uniquely identifies a model, for example deepgram/nova-3, openai/gpt-4.1-mini, or cartesia/sonic-3. STT model IDs can include a language suffix (deepgram/nova-3:en), and TTS model IDs can include a voice suffix (cartesia/sonic-3:narrator). LLM model IDs are not split on colons: everything after the slash is preserved verbatim, so Ollama tags like ollama/qwen2.5:3b work as expected.
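The suffix rules above can be illustrated with a small parser. This is a sketch under stated assumptions: parse_model_id and ParsedModelId are hypothetical names, not part of the VoiceGateway API, and splitting on the first colon is a guess at how STT and TTS suffixes are separated.

```python
from typing import NamedTuple, Optional


class ParsedModelId(NamedTuple):
    provider: str
    model: str
    suffix: Optional[str]  # language for STT, voice for TTS, None for LLM


def parse_model_id(model_id: str, modality: str) -> ParsedModelId:
    """Split "provider/model[:suffix]" according to the modality (hypothetical helper)."""
    provider, sep, rest = model_id.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    if modality in ("stt", "tts") and ":" in rest:
        # Assumption: the suffix starts at the first colon.
        model, _, suffix = rest.partition(":")
        return ParsedModelId(provider, model, suffix)
    # LLM: colons stay part of the model name (for example, Ollama tags).
    return ParsedModelId(provider, rest, None)
```

With this sketch, "deepgram/nova-3:en" parsed as STT yields the model nova-3 with suffix en, while "ollama/qwen2.5:3b" parsed as LLM keeps qwen2.5:3b as the model name.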
See: Models
Modality
The type of inference operation: STT (speech-to-text), LLM (large language model), or TTS (text-to-speech). Each provider supports one or more modalities. The factory classes inference.STT, inference.LLM, and inference.TTS correspond directly to these three modalities.
See: Providers for a modality support matrix
Project
A logical grouping for per-project provider keys, cost tracking, and budget enforcement. Each project has a name, optional daily_budget, a budget_action (warn, throttle, or block), and an optional providers: block carrying its own API keys. The active project is set via inference.set_project(...), the VOICEGW_ACTIVE_PROJECT env var, the default_project field in YAML, or the auto-created default fallback.
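One way to picture the active-project lookup is as a chain of fallbacks. The sketch below assumes the four sources are consulted in the order listed and that the auto-created project is literally named "default"; neither is confirmed here, and resolve_active_project is a hypothetical helper rather than a VoiceGateway function.

```python
from typing import Mapping, Optional


def resolve_active_project(
    explicit: Optional[str],           # value passed to inference.set_project(...), if any
    env: Mapping[str, str],            # typically os.environ
    yaml_config: Mapping[str, object]  # parsed voicegw.yaml
) -> str:
    """Return the active project name (assumed precedence: call, env, YAML, default)."""
    if explicit:
        return explicit
    from_env = env.get("VOICEGW_ACTIVE_PROJECT")
    if from_env:
        return from_env
    from_yaml = yaml_config.get("default_project")
    if isinstance(from_yaml, str) and from_yaml:
        return from_yaml
    # Assumption: the auto-created fallback project is named "default".
    return "default"
```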
See: Projects
Session
A logical voice conversation: one caller turn through STT, LLM, and TTS. VoiceGateway tags every request from the same async context with a shared session_id ("vg-<uuid>"), accumulating cost and modality data into the sessions table.
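Tagging by async context can be sketched with Python's contextvars module. This is an illustration of the general technique, not VoiceGateway's implementation: current_session_id and the variable name are hypothetical, and only the "vg-<uuid>" format comes from the description above.

```python
import asyncio
import contextvars
import uuid
from typing import List, Optional, Tuple

# Hypothetical context variable holding the session ID for the current context.
_session_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "vg_session_id", default=None
)


def current_session_id() -> str:
    """Return the session ID for this context, creating one on first use."""
    sid = _session_id.get()
    if sid is None:
        sid = f"vg-{uuid.uuid4()}"
        _session_id.set(sid)
    return sid


async def one_turn() -> List[str]:
    # STT, LLM, and TTS requests made from the same context share one ID.
    return [current_session_id() for _ in ("stt", "llm", "tts")]


async def main() -> Tuple[List[str], List[str]]:
    # gather() runs each coroutine as a task with its own copy of the
    # context, so the two turns receive distinct session IDs.
    first, second = await asyncio.gather(one_turn(), one_turn())
    return first, second
```

Running asyncio.run(main()) returns two lists: within each list the three IDs are identical, and the two lists differ from each other.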
See: Python SDK Reference
Fallback Chain
An ordered list of model IDs in voicegw.yaml for resolver-time fallback. Walk the chain at agent startup using the inference factories; the first model whose provider plugin imports cleanly and whose key resolves wins. Once AgentSession starts, that model is used for the whole call.
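Walking the chain might look like the sketch below. It is a simplified illustration: first_usable and plugin_imports are hypothetical helpers, key resolution is reduced to a dictionary lookup, and local providers that need no key are not handled. In real use the candidates would be tried through the inference factories instead.

```python
import importlib
from typing import Callable, Mapping, Optional, Sequence


def plugin_imports(module_name: str) -> bool:
    """Return True if the module can be imported without error."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def first_usable(
    chain: Sequence[str],
    api_keys: Mapping[str, str],
    plugin_module: Callable[[str], str] = lambda p: f"livekit.plugins.{p}",
) -> Optional[str]:
    """Return the first model ID whose plugin imports and whose key is present."""
    for model_id in chain:
        provider = model_id.split("/", 1)[0]
        if plugin_imports(plugin_module(provider)) and api_keys.get(provider):
            return model_id
    return None  # nothing in the chain is usable
```

The selection happens once, before the call starts, which matches the resolver-time behaviour described above: there is no mid-call switch to a later entry in the chain.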
Budget Action
The enforcement behavior when a project exceeds its daily_budget. Three options:
- warn -- log a warning but allow requests to continue.
- throttle -- add artificial delay to requests to slow down consumption.
- block -- reject requests entirely until the budget resets.
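The three actions can be sketched as a single pre-request check. This is a hedged illustration rather than the actual enforcement code: enforce_budget, BudgetExceeded, and the 0.5 second throttle delay are all assumptions made for the example.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("budget_sketch")


class BudgetExceeded(Exception):
    """Raised when a project with budget_action 'block' is over budget."""


def enforce_budget(
    spent_today: float,
    daily_budget: Optional[float],
    budget_action: str,
    throttle_delay_s: float = 0.5,  # hypothetical delay, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Apply the project's budget_action before a request is sent."""
    if daily_budget is None or spent_today <= daily_budget:
        return  # no budget set, or still within budget
    if budget_action == "warn":
        logger.warning("over budget: %.2f > %.2f", spent_today, daily_budget)
    elif budget_action == "throttle":
        sleep(throttle_delay_s)  # slow the request down, then let it proceed
    elif budget_action == "block":
        raise BudgetExceeded(f"spent {spent_today:.2f} of {daily_budget:.2f}")
    else:
        raise ValueError(f"unknown budget_action: {budget_action!r}")
```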
See: Projects
Middleware
Processing layers that wrap every inference call. VoiceGateway includes four built-in middleware components: cost tracking, latency monitoring, rate limiting, and request logging. Middleware runs transparently around each provider invocation. You control which middleware is active via the observability config section.
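The wrapping pattern can be shown with plain function composition. The sketch below covers two of the four concerns (latency monitoring and request logging) and uses hypothetical names throughout; fake_provider_call stands in for a real provider invocation, and none of this reflects VoiceGateway's internal classes.

```python
import time
from typing import Any, Callable, Dict, List

Call = Callable[..., Any]


def latency_middleware(call: Call, metrics: List[Dict[str, float]]) -> Call:
    """Record how long each wrapped call takes, even if it raises."""
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return call(*args, **kwargs)
        finally:
            metrics.append({"latency_s": time.perf_counter() - start})
    return wrapped


def logging_middleware(call: Call, log: List[str], name: str) -> Call:
    """Append one log line per request before delegating to the wrapped call."""
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        log.append(f"request {name}")
        return call(*args, **kwargs)
    return wrapped


def fake_provider_call(text: str) -> str:
    # Stand-in for a provider invocation.
    return text.upper()


metrics: List[Dict[str, float]] = []
log: List[str] = []
wrapped_call = logging_middleware(
    latency_middleware(fake_provider_call, metrics), log, "fake/echo"
)
```

The caller uses wrapped_call exactly as it would use the provider call; the layers run around it without changing its arguments or return value.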
See: Observability
Config Layer
VoiceGateway manages configuration from two sources: the voicegw.yaml file and a SQLite database (for models and projects created at runtime via the dashboard or MCP). At startup, the ConfigManager merges both sources. Changes made through the API or MCP are persisted to SQLite and merged on next refresh_config().
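The merge of the two sources can be sketched with the standard sqlite3 module. The table layout, the merge_projects name, and the rule that SQLite rows override YAML entries with the same name are all assumptions made for illustration; the actual ConfigManager precedence is not specified here.

```python
import sqlite3
from typing import Any, Dict


def merge_projects(
    yaml_projects: Dict[str, Dict[str, Any]],
    conn: sqlite3.Connection,
) -> Dict[str, Dict[str, Any]]:
    """Combine projects from YAML with projects stored in SQLite."""
    # Start from a copy of the YAML-defined projects.
    merged = {name: dict(cfg) for name, cfg in yaml_projects.items()}
    # Hypothetical schema: projects(name, daily_budget, budget_action).
    rows = conn.execute("SELECT name, daily_budget, budget_action FROM projects")
    for name, daily_budget, budget_action in rows:
        # Assumption: runtime (SQLite) entries win on a name collision.
        merged[name] = {"daily_budget": daily_budget, "budget_action": budget_action}
    return merged
```

Calling a function like this again after a write to the database corresponds to the refresh step described above: runtime changes become visible only once the merge is repeated.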