Providers
All 11 providers VoiceGateway supports, with modality coverage, recommended models, and per-provider config notes.
VoiceGateway supports 11 providers across cloud and local
deployments. Each provider extends the BaseProvider interface and
is instantiated lazily on first use.
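Lazy instantiation means a provider object is built only when something first asks for it. The sketch below illustrates that pattern under stated assumptions: `BaseProvider` is named in the text, but its constructor shown here, the `ProviderRegistry` class, and the `EchoProvider` stand-in are hypothetical and are not taken from VoiceGateway's source.

```python
from typing import Callable, Dict


class BaseProvider:
    """Hypothetical stand-in for VoiceGateway's BaseProvider interface."""

    name = "base"

    def __init__(self, config: dict) -> None:
        self.config = config


class EchoProvider(BaseProvider):
    """Illustrative provider that exists only for this sketch."""

    name = "echo"


class ProviderRegistry:
    """Builds each provider on first access and caches the instance."""

    def __init__(self, factories: Dict[str, Callable[[], BaseProvider]]) -> None:
        self._factories = factories
        self._instances: Dict[str, BaseProvider] = {}

    def get(self, name: str) -> BaseProvider:
        if name not in self._instances:
            # First use: construct now and keep the instance for later calls.
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def is_loaded(self, name: str) -> bool:
        return name in self._instances


registry = ProviderRegistry({"echo": lambda: EchoProvider({"enabled": True})})
```

Nothing is constructed when the registry is created; the first `registry.get("echo")` call builds the provider, and later calls return the same instance.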
Cloud providers
Deepgram
- Modalities: STT, TTS
- Required config: `api_key`
- Recommended models:
  - STT: `deepgram/nova-3` (best accuracy), `deepgram/nova-2` (lower cost)
  - TTS: `deepgram/aura-asteria-en`
- Pricing notes: Pay-per-second for STT, pay-per-character for TTS. Nova-3 is priced higher than Nova-2 but offers better accuracy.
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}

OpenAI
- Modalities: STT, LLM, TTS
- Required config: `api_key`
- Recommended models:
  - STT: `openai/whisper-1`
  - LLM: `openai/gpt-4.1-mini` (balanced), `openai/gpt-4.1` (best quality)
  - TTS: `openai/tts-1` (fast), `openai/tts-1-hd` (high quality)
- Pricing notes: Different pricing tiers per model. GPT-4.1-mini offers a good cost / quality balance for voice agents.
providers:
  openai:
    api_key: ${OPENAI_API_KEY}

Anthropic
- Modalities: LLM
- Required config: `api_key`
- Recommended models:
  - LLM: `anthropic/claude-sonnet-4-5` (balanced), `anthropic/claude-opus-4-1` (highest quality)
- Pricing notes: Per-token pricing. Check Anthropic's pricing page for current rates.
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}

Groq
- Modalities: STT, LLM
- Required config: `api_key`
- Recommended models:
  - STT: `groq/whisper-large-v3`
  - LLM: `groq/llama-3.3-70b-versatile`, `groq/llama-3.1-8b-instant`
- Pricing notes: Very fast inference at competitive pricing. The Whisper endpoint is significantly cheaper than OpenAI's hosted Whisper.
providers:
  groq:
    api_key: ${GROQ_API_KEY}

Cartesia
- Modalities: TTS
- Required config: `api_key`
- Recommended models:
  - TTS: `cartesia/sonic-3` (latest, best quality)
- Pricing notes: Pay-per-character. Known for low-latency streaming TTS.
providers:
  cartesia:
    api_key: ${CARTESIA_API_KEY}

ElevenLabs
- Modalities: TTS
- Required config: `api_key`
- Recommended models:
  - TTS: `elevenlabs/eleven_multilingual_v2`, `elevenlabs/eleven_turbo_v2_5`
- Pricing notes: Per-character pricing with monthly quotas depending on plan. Multilingual v2 supports 29 languages.
providers:
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}

AssemblyAI
- Modalities: STT
- Required config: `api_key`
- Recommended models:
  - STT: `assemblyai/universal-2` (single-tier model)
- Pricing notes: Per-second pricing. Offers real-time streaming and batch transcription.
providers:
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}

Local providers
Local providers run on your own hardware with no API keys required. They are useful for development, privacy-sensitive deployments, and offline operation.
Whisper
- Modalities: STT
- Required config: None (downloads model on first use)
- Recommended models:
  - STT: `local/whisper-large-v3` (best accuracy), `local/whisper-base` (fastest)
- Notes: Runs OpenAI Whisper locally via faster-whisper. Requires a capable CPU or GPU.
providers:
  whisper:
    enabled: true

Ollama
- Modalities: LLM
- Required config: `base_url` (defaults to `http://localhost:11434`)
- Recommended models:
  - LLM: `ollama/llama3.2:3b`, `ollama/mistral:7b`, `ollama/phi3:mini`
- Notes: Requires a running Ollama server. Models are pulled on first use. Use `docker compose --profile local up -d` to start Ollama alongside VoiceGateway.
providers:
  ollama:
    base_url: http://localhost:11434

Kokoro
- Modalities: TTS
- Required config: None
- Recommended models:
  - TTS: `local/kokoro`
- Notes: Lightweight local TTS. Good for development and testing.
providers:
  kokoro:
    enabled: true

Piper
- Modalities: TTS
- Required config: None
- Recommended models:
  - TTS: `local/piper:en_US-lessac-medium`, `local/piper:en_US-amy-low` (the voice ID follows the `:`)
- Notes: Fast offline TTS using ONNX models. Supports multiple languages and voices. Voice models are downloaded on first use.
providers:
  piper:
    enabled: true

Provider modality matrix
| Provider | STT | LLM | TTS | Type |
|---|---|---|---|---|
| Deepgram | Yes | -- | Yes | Cloud |
| OpenAI | Yes | Yes | Yes | Cloud |
| Anthropic | -- | Yes | -- | Cloud |
| Groq | Yes | Yes | -- | Cloud |
| Cartesia | -- | -- | Yes | Cloud |
| ElevenLabs | -- | -- | Yes | Cloud |
| AssemblyAI | Yes | -- | -- | Cloud |
| Whisper | Yes | -- | -- | Local |
| Ollama | -- | Yes | -- | Local |
| Kokoro | -- | -- | Yes | Local |
| Piper | -- | -- | Yes | Local |
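The matrix can also be expressed as data, which is handy when checking whether a planned pipeline has every modality covered. The sketch below copies the coverage from the table; the `PROVIDER_MODALITIES` mapping and the `providers_for` helper are illustrative names and are not part of VoiceGateway.

```python
from typing import List

# Modality coverage copied from the matrix above.
PROVIDER_MODALITIES = {
    "deepgram": {"stt", "tts"},
    "openai": {"stt", "llm", "tts"},
    "anthropic": {"llm"},
    "groq": {"stt", "llm"},
    "cartesia": {"tts"},
    "elevenlabs": {"tts"},
    "assemblyai": {"stt"},
    "whisper": {"stt"},
    "ollama": {"llm"},
    "kokoro": {"tts"},
    "piper": {"tts"},
}

# Providers listed as "Local" in the Type column.
LOCAL_PROVIDERS = {"whisper", "ollama", "kokoro", "piper"}


def providers_for(modality: str, local_only: bool = False) -> List[str]:
    """Return the providers that cover a modality, sorted alphabetically."""
    names = [
        name
        for name, modalities in PROVIDER_MODALITIES.items()
        if modality in modalities and (not local_only or name in LOCAL_PROVIDERS)
    ]
    return sorted(names)
```

For example, `providers_for("tts", local_only=True)` returns the two local TTS options, Kokoro and Piper.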
Per-project provider keys
The top-level providers block sets the default keys. Each project
under projects: can override the providers it uses by declaring
its own providers block:
providers:
  openai:
    api_key: ${DEFAULT_OPENAI_KEY}
projects:
  tonys-pizza:
    name: Tony's Pizza
    providers:
      openai:
        api_key: ${TONYS_OPENAI_KEY}  # overrides for this project

The inference factories pick the right key automatically based on
the active project (set via `default_project`, the `set_project`
helper from `voicegateway.core.active_project`, or a virtual key's
project binding).
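The override rule can be pictured as a two-step lookup: check the active project's own `providers` block, then fall back to the top-level default. The sketch below assumes the YAML has already been loaded into a dict with environment variables substituted. The `resolve_api_key` function, the placeholder key values, and the `other-project` entry are hypothetical and only serve the illustration; the actual inference factories may work differently.

```python
from typing import Optional

# Mirrors the YAML above; key values are placeholders, and
# "other-project" is a hypothetical project with no overrides.
CONFIG = {
    "providers": {"openai": {"api_key": "default-openai-key"}},
    "projects": {
        "tonys-pizza": {
            "name": "Tony's Pizza",
            "providers": {"openai": {"api_key": "tonys-openai-key"}},
        },
        "other-project": {"name": "Other Project"},
    },
}


def resolve_api_key(
    config: dict, provider: str, project: Optional[str] = None
) -> Optional[str]:
    """Prefer the project's own providers block, then the top-level default."""
    if project is not None:
        project_cfg = config.get("projects", {}).get(project, {})
        override = project_cfg.get("providers", {}).get(provider, {})
        if "api_key" in override:
            return override["api_key"]
    # No project override: use the top-level key, or None if there is none.
    return config.get("providers", {}).get(provider, {}).get("api_key")
```

With this data, `tonys-pizza` resolves to its own key, while a project without an override resolves to the default.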
DB-managed providers
Beyond YAML, providers can be added at runtime via the MCP server
or the dashboard. These rows live in the managed_providers table
with their API keys Fernet-encrypted by VOICEGW_SECRET. The
runtime resolution order is: YAML providers (top-level + per-project)
first, then DB-managed providers for any missing entries.
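The resolution order described above amounts to a first-match search across configuration sources. The sketch below is one way to express it, with plain dicts standing in for the parsed YAML and for decrypted `managed_providers` rows. The `resolve_provider` function is hypothetical, and checking per-project YAML before top-level YAML is an assumption based on the override behaviour described in the previous section.

```python
from typing import Dict, Optional


def resolve_provider(
    name: str,
    yaml_project: Dict[str, dict],
    yaml_top_level: Dict[str, dict],
    db_managed: Dict[str, dict],
) -> Optional[dict]:
    """Return the first matching config: YAML entries win, DB rows fill gaps."""
    # Assumed order: per-project YAML, top-level YAML, then DB-managed rows.
    for source in (yaml_project, yaml_top_level, db_managed):
        if name in source:
            return source[name]
    return None
```

A provider defined in both YAML and the database resolves to the YAML entry; a provider that exists only in the database is still found.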
Common configuration options
All providers support these shared fields:
- `api_key` (string): API key, typically via `${ENV_VAR}` substitution.
- `base_url` (string): override the default API endpoint.
- `enabled` (bool, default `true`): disable a provider without removing its config.
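To make the `${ENV_VAR}` substitution concrete, here is a minimal sketch of how such placeholders could be expanded when a config value is read. The `substitute_env` function is hypothetical, and raising an error for an unset variable is this sketch's choice; VoiceGateway's own loader may handle missing variables differently.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME}, where NAME is a conventional environment variable name.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in value with env[NAME]; raise if NAME is unset."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_PATTERN.sub(replace, value)
```

Values without a placeholder, such as a literal `base_url`, pass through unchanged.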
See also: voicegw.yaml reference, Models.
Projects
Projects provide per-project cost tracking, budget enforcement, and organizational grouping. They are the primary mechanism for attributing costs to specific agents, teams, or customers.
Stacks
Stacks are named YAML bundles that map a single name to one STT model, one LLM model, and one TTS model. They are a documentation and dashboard hint only: the `voicegateway.inference` module does not