
Providers

All 11 providers VoiceGateway supports, with modality coverage, recommended models, and per-provider config notes.

VoiceGateway supports 11 providers across cloud and local deployments. Each provider extends the BaseProvider interface and is instantiated lazily on first use.
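The lazy-instantiation pattern can be pictured as a registry that stores one factory per provider and builds the instance only on the first lookup, caching it afterwards. This is a minimal sketch, not VoiceGateway's code: `BaseProvider` below is a hypothetical stand-in for the real interface, and `LazyProviderRegistry`, `register`, and `get` are illustrative names.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class BaseProvider(ABC):
    """Hypothetical stand-in for VoiceGateway's BaseProvider interface."""

    @abstractmethod
    def name(self) -> str:
        """Return the provider key, e.g. "deepgram"."""


class LazyProviderRegistry:
    """Illustrative registry: providers are constructed on first use, then cached."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], BaseProvider]] = {}
        self._instances: Dict[str, BaseProvider] = {}

    def register(self, key: str, factory: Callable[[], BaseProvider]) -> None:
        # Store only the factory; no provider object (or API client) is created yet.
        self._factories[key] = factory

    def get(self, key: str) -> BaseProvider:
        # The first lookup runs the factory; later lookups reuse the cached instance.
        if key not in self._instances:
            self._instances[key] = self._factories[key]()
        return self._instances[key]
```

Deferring construction this way means a provider that is configured but never used (for example, a TTS provider in an STT-only project) never opens a client or loads a model.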

Cloud providers

Deepgram

  • Modalities: STT, TTS
  • Required config: api_key
  • Recommended models:
    • STT: deepgram/nova-3 (best accuracy), deepgram/nova-2 (lower cost)
    • TTS: deepgram/aura-asteria-en
  • Pricing notes: Pay-per-second for STT, pay-per-character for TTS. Nova-3 is priced higher than Nova-2 but offers better accuracy.
voicegw.yaml
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}

OpenAI

  • Modalities: STT, LLM, TTS
  • Required config: api_key
  • Recommended models:
    • STT: openai/whisper-1
    • LLM: openai/gpt-4.1-mini (balanced), openai/gpt-4.1 (best quality)
    • TTS: openai/tts-1 (fast), openai/tts-1-hd (high quality)
  • Pricing notes: Pricing varies by model. GPT-4.1-mini offers a good cost/quality balance for voice agents.
voicegw.yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY}

Anthropic

  • Modalities: LLM
  • Required config: api_key
  • Recommended models:
    • LLM: anthropic/claude-sonnet-4-5 (balanced), anthropic/claude-opus-4-1 (highest quality)
  • Pricing notes: Per-token pricing. Check Anthropic's pricing page for current rates.
voicegw.yaml
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}

Groq

  • Modalities: STT, LLM
  • Required config: api_key
  • Recommended models:
    • STT: groq/whisper-large-v3
    • LLM: groq/llama-3.3-70b-versatile, groq/llama-3.1-8b-instant
  • Pricing notes: Very fast inference at competitive pricing. The Whisper endpoint is significantly cheaper than OpenAI's hosted Whisper.
voicegw.yaml
providers:
  groq:
    api_key: ${GROQ_API_KEY}

Cartesia

  • Modalities: TTS
  • Required config: api_key
  • Recommended models:
    • TTS: cartesia/sonic-3 (latest, best quality)
  • Pricing notes: Pay-per-character. Known for low-latency streaming TTS.
voicegw.yaml
providers:
  cartesia:
    api_key: ${CARTESIA_API_KEY}

ElevenLabs

  • Modalities: TTS
  • Required config: api_key
  • Recommended models:
    • TTS: elevenlabs/eleven_multilingual_v2, elevenlabs/eleven_turbo_v2_5
  • Pricing notes: Per-character pricing with monthly quotas depending on plan. Multilingual v2 supports 29 languages.
voicegw.yaml
providers:
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}

AssemblyAI

  • Modalities: STT
  • Required config: api_key
  • Recommended models:
    • STT: assemblyai/universal-2 (single-tier model)
  • Pricing notes: Per-second pricing. Offers real-time streaming and batch transcription.
voicegw.yaml
providers:
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}

Local providers

Local providers run on your own hardware with no API keys required. They are useful for development, privacy-sensitive deployments, and offline operation.

Whisper

  • Modalities: STT
  • Required config: None (downloads model on first use)
  • Recommended models:
    • STT: local/whisper-large-v3 (best accuracy), local/whisper-base (faster, less accurate)
  • Notes: Runs OpenAI Whisper locally via faster-whisper. Works on CPU, but larger models such as large-v3 run much faster on a GPU.
voicegw.yaml
providers:
  whisper:
    enabled: true

Ollama

  • Modalities: LLM
  • Required config: base_url (defaults to http://localhost:11434)
  • Recommended models:
    • LLM: ollama/llama3.2:3b, ollama/mistral:7b, ollama/phi3:mini
  • Notes: Requires a running Ollama server. Models are pulled on first use. Use docker compose --profile local up -d to start Ollama alongside VoiceGateway.
voicegw.yaml
providers:
  ollama:
    base_url: http://localhost:11434
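Because this provider depends on a running Ollama server, a pre-flight check before starting VoiceGateway can save debugging time. A sketch using only the Python standard library and Ollama's own GET /api/tags endpoint, which lists the models the server has pulled; `ollama_tags_url` and `list_local_models` are hypothetical helpers, not part of VoiceGateway.

```python
import json
import urllib.request
from typing import List, Mapping

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # the default base_url documented above


def ollama_tags_url(cfg: Mapping[str, str]) -> str:
    """Build the URL of Ollama's model-listing endpoint from a provider config."""
    base = cfg.get("base_url") or DEFAULT_OLLAMA_URL
    return base.rstrip("/") + "/api/tags"


def list_local_models(cfg: Mapping[str, str], timeout: float = 2.0) -> List[str]:
    """Return the names of models the Ollama server has pulled.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(ollama_tags_url(cfg), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

For example, list_local_models({"base_url": "http://localhost:11434"}) raises URLError if the server is down, and returns an empty list if it is up but no models have been pulled yet.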

Kokoro

  • Modalities: TTS
  • Required config: None
  • Recommended models:
    • TTS: local/kokoro
  • Notes: Lightweight local TTS. Good for development and testing.
voicegw.yaml
providers:
  kokoro:
    enabled: true

Piper

  • Modalities: TTS
  • Required config: None
  • Recommended models:
    • TTS: local/piper:en_US-lessac-medium, local/piper:en_US-amy-low (the Piper voice ID follows the colon)
  • Notes: Fast offline TTS using ONNX models. Supports multiple languages and voices. Voice models are downloaded on first use.
voicegw.yaml
providers:
  piper:
    enabled: true
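Model strings throughout this page follow a prefix/model shape, and Piper adds a voice ID after a colon. A sketch of parsing them, assuming the split happens at the first slash; note that colons elsewhere (such as the Ollama tag in ollama/llama3.2:3b) belong to the model name. `split_model_id` and `piper_voice` are hypothetical helpers, not VoiceGateway's parser.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split "prefix/model" at the first slash, e.g. "deepgram/nova-3"."""
    prefix, sep, model = model_id.partition("/")
    if not sep or not prefix or not model:
        raise ValueError(f"expected 'prefix/model', got {model_id!r}")
    return prefix, model


def piper_voice(model_id: str) -> str:
    """Extract the voice ID from a string such as "local/piper:en_US-amy-low"."""
    prefix, model = split_model_id(model_id)
    engine, sep, voice = model.partition(":")
    if prefix != "local" or engine != "piper" or not sep or not voice:
        raise ValueError(f"not a local/piper:<voice> model id: {model_id!r}")
    return voice
```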

Provider modality matrix

Provider     STT   LLM   TTS   Type
Deepgram     Yes   --    Yes   Cloud
OpenAI       Yes   Yes   Yes   Cloud
Anthropic    --    Yes   --    Cloud
Groq         Yes   Yes   --    Cloud
Cartesia     --    --    Yes   Cloud
ElevenLabs   --    --    Yes   Cloud
AssemblyAI   Yes   --    --    Cloud
Whisper      Yes   --    --    Local
Ollama       --    Yes   --    Local
Kokoro       --    --    Yes   Local
Piper        --    --    Yes   Local
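To check coverage programmatically (for example, when planning a fully local pipeline), the modality matrix can be encoded as data. A sketch: `PROVIDERS` and `providers_for` are illustrative names, not part of VoiceGateway; the keys mirror the providers: config keys used on this page.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# The modality matrix as data: provider key -> (modalities, deployment type).
PROVIDERS: Dict[str, Tuple[FrozenSet[str], str]] = {
    "deepgram": (frozenset({"stt", "tts"}), "cloud"),
    "openai": (frozenset({"stt", "llm", "tts"}), "cloud"),
    "anthropic": (frozenset({"llm"}), "cloud"),
    "groq": (frozenset({"stt", "llm"}), "cloud"),
    "cartesia": (frozenset({"tts"}), "cloud"),
    "elevenlabs": (frozenset({"tts"}), "cloud"),
    "assemblyai": (frozenset({"stt"}), "cloud"),
    "whisper": (frozenset({"stt"}), "local"),
    "ollama": (frozenset({"llm"}), "local"),
    "kokoro": (frozenset({"tts"}), "local"),
    "piper": (frozenset({"tts"}), "local"),
}


def providers_for(modality: str, kind: Optional[str] = None) -> List[str]:
    """List providers offering `modality`, optionally filtered to "cloud" or "local"."""
    return sorted(
        name
        for name, (modalities, deployment) in PROVIDERS.items()
        if modality in modalities and (kind is None or deployment == kind)
    )
```

Querying each modality with kind="local" shows that a fully offline STT, LLM, and TTS pipeline is possible with Whisper, Ollama, and either Kokoro or Piper.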

Per-project provider keys

The top-level providers block sets the default keys. Each project under projects: can override the providers it uses by declaring its own providers block:

voicegw.yaml
providers:
  openai:
    api_key: ${DEFAULT_OPENAI_KEY}

projects:
  tonys-pizza:
    name: Tony's Pizza
    providers:
      openai:
        api_key: ${TONYS_OPENAI_KEY}  # overrides for this project

The inference factories pick the right key automatically based on the active project (set via default_project, the set_project helper from voicegateway.core.active_project, or a virtual key's project binding).
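The per-project lookup can be sketched against the parsed YAML as a plain dict (for instance, the result of PyYAML's yaml.safe_load). This is a minimal sketch, assuming a project's entry replaces the top-level entry for that provider as a whole rather than merging field by field; `provider_config` is a hypothetical helper, not the inference factories' actual code.

```python
from typing import Any, Dict, Optional


def provider_config(
    config: Dict[str, Any], provider: str, project: Optional[str] = None
) -> Optional[Dict[str, Any]]:
    """Return the YAML block for `provider`, preferring the active project's override."""
    if project is not None:
        project_cfg = config.get("projects", {}).get(project, {})
        override = project_cfg.get("providers", {}).get(provider)
        if override is not None:
            return override  # assumption: the whole entry is replaced, not merged
    # Otherwise fall back to the top-level providers block (the defaults).
    return config.get("providers", {}).get(provider)
```

With the voicegw.yaml above, provider_config(cfg, "openai", "tonys-pizza") would return the block holding ${TONYS_OPENAI_KEY}, while any other project, or no project, gets the ${DEFAULT_OPENAI_KEY} block.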

DB-managed providers

Beyond YAML, providers can be added at runtime via the MCP server or the dashboard. These entries are stored in the managed_providers table, with their API keys Fernet-encrypted using VOICEGW_SECRET. At runtime, YAML providers (top-level and per-project) take precedence; DB-managed providers are used only for providers that the YAML does not define.
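The YAML-then-database order can be sketched as a single lookup function. A minimal sketch under stated assumptions: `resolve_api_key` is hypothetical, `yaml_providers` is assumed to already reflect any per-project override, and `managed_rows` is an assumed in-memory shape for the managed_providers table (provider name to encrypted key). Decryption is passed in as a callable; with the `cryptography` package it could be `lambda token: Fernet(key).decrypt(token).decode()`, though how VoiceGateway derives `key` from VOICEGW_SECRET is not described on this page.

```python
from typing import Callable, Dict, Mapping, Optional


def resolve_api_key(
    name: str,
    yaml_providers: Mapping[str, Dict[str, str]],
    managed_rows: Mapping[str, bytes],
    decrypt: Callable[[bytes], str],
) -> Optional[str]:
    """Resolve a provider's API key: YAML first, then DB-managed rows."""
    entry = yaml_providers.get(name)
    if entry is not None:
        # YAML wins whenever it defines the provider at all.
        return entry.get("api_key")
    token = managed_rows.get(name)
    if token is not None:
        # The database is consulted only for providers the YAML leaves out.
        return decrypt(token)
    return None
```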

Common configuration options

All providers support these shared fields:

  • api_key (string): the provider's API key, typically supplied via ${ENV_VAR} substitution.
  • base_url (string): overrides the default API endpoint.
  • enabled (bool, default true): set to false to disable a provider without removing its config.
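The ${ENV_VAR} substitution and the enabled default can be sketched as two small helpers. A minimal sketch: the regex, the names `expand_env` and `is_enabled`, and the choice to raise KeyError for an unset variable are assumptions; VoiceGateway's own config loader may handle missing variables differently.

```python
import os
import re
from typing import Any, Dict, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME} in `value` with env[NAME]; raise KeyError if it is unset."""
    return _ENV_REF.sub(lambda match: env[match.group(1)], value)


def is_enabled(entry: Dict[str, Any]) -> bool:
    """A provider entry counts as enabled unless it sets enabled: false."""
    return bool(entry.get("enabled", True))
```

Failing loudly on an unset variable surfaces a missing key at startup rather than as an authentication error on the first request.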

See also: the voicegw.yaml reference and Models.