Fallback Chains
Resolver-time fallback at agent startup: walk a chain of model ids and pass the first one whose inference.STT/LLM/TTS factory builds cleanly into AgentSession. Useful when a primary provider's credentials are temporarily wrong, its plugin SDK is missing, or its initialization handshake fails.
Once a model is wired into a LiveKit AgentSession, that resolved model is used for the entire call; VoiceGateway does not swap providers mid-call. For runtime failover when a provider degrades during an active call, wrap VoiceGateway inference.* instances in LiveKit's FallbackAdapter; see LiveKit FallbackAdapter integration.
VoiceGateway does not run automatic fallback middleware. The chain under fallbacks: in voicegw.yaml is a convention: it documents the intended order, and your own startup code walks it and picks the first model whose factory builds.
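To avoid keeping the chains in two places, you can read them from the parsed voicegw.yaml and hand them to the first_resolvable helper on this page instead of hard-coding STT_CHAIN, LLM_CHAIN and TTS_CHAIN. The sketch below is a minimal validator, assuming the fallbacks: layout shown under Configuration; chains_from_config and MODALITIES are hypothetical names, not VoiceGateway APIs.

```python
from typing import Any

# Modalities this sketch expects under fallbacks: (assumption based on this page).
MODALITIES = ("stt", "llm", "tts")


def chains_from_config(config: dict[str, Any]) -> dict[str, list[str]]:
    """Pull the stt/llm/tts chains out of a parsed voicegw.yaml mapping.

    Hypothetical helper: VoiceGateway does not walk these chains for you,
    so this only checks the shape documented on this page.
    """
    fallbacks = config.get("fallbacks") or {}
    if not isinstance(fallbacks, dict):
        raise ValueError("fallbacks must be a mapping of modality -> list")
    chains: dict[str, list[str]] = {}
    for modality in MODALITIES:
        chain = fallbacks.get(modality)
        if not isinstance(chain, list) or not chain:
            raise ValueError(f"fallbacks.{modality} must be a non-empty list")
        if not all(isinstance(model_id, str) for model_id in chain):
            raise ValueError(f"fallbacks.{modality} must contain model id strings")
        chains[modality] = list(chain)  # copy so callers can't mutate the config
    return chains
```

With PyYAML installed, passing yaml.safe_load of the open voicegw.yaml file to chains_from_config gives lists you can feed to first_resolvable, for example first_resolvable("stt", chains["stt"]). PyYAML leaves ${...} placeholders as plain strings, which does not matter here because only the fallbacks section is read.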
Configuration
```yaml
projects:
  prod:
    name: Production
    daily_budget: 50.00
    budget_action: warn

providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}
  groq:
    api_key: ${GROQ_API_KEY}

default_project: prod

# Fallback chains: the first model is the primary, the rest are backups.
# The YAML chain is documentation; the manual walk below picks the
# first model whose factory builds.
fallbacks:
  stt:
    - deepgram/nova-3                # Primary: fastest, best accuracy
    - openai/whisper-1               # Backup: good accuracy, higher latency
    - local/whisper-large-v3         # Last resort: local, no API dependency
  llm:
    - openai/gpt-4.1-mini            # Primary: best quality
    - groq/llama-3.3-70b-versatile   # Backup: fast, good quality
    - ollama/qwen2.5:3b              # Last resort: local
  tts:
    - cartesia/sonic-3               # Primary: lowest latency
    - elevenlabs/turbo-v2.5          # Backup: highest quality
    - local/kokoro                   # Last resort: local

cost_tracking:
  enabled: true
```

Using Fallback Chains
```python
from voicegateway import inference


def first_resolvable(modality: str, chain: list[str]):
    """Walk the chain and return the first inference instance that builds.

    Raises the last error if every model fails.
    """
    factory = {
        "stt": inference.STT,
        "llm": inference.LLM,
        "tts": inference.TTS,
    }[modality]
    last_error: Exception | None = None
    for model_id in chain:
        try:
            return factory(model_id)
        except Exception as exc:  # noqa: BLE001 - any build failure moves to the next model
            last_error = exc
    if last_error is None:
        # Only reachable when the chain is empty.
        raise ValueError(f"empty {modality} fallback chain")
    raise last_error


STT_CHAIN = ["deepgram/nova-3", "openai/whisper-1", "local/whisper-large-v3"]
LLM_CHAIN = [
    "openai/gpt-4.1-mini",
    "groq/llama-3.3-70b-versatile",
    "ollama/qwen2.5:3b",
]
TTS_CHAIN = ["cartesia/sonic-3", "elevenlabs/turbo-v2.5", "local/kokoro"]

stt = first_resolvable("stt", STT_CHAIN)
llm = first_resolvable("llm", LLM_CHAIN)
tts = first_resolvable("tts", TTS_CHAIN)
```

How Fallback Works
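The factory walk runs once, at construction time. For each model id in the chain, in order, it calls the modality's factory (inference.STT, inference.LLM, or inference.TTS). The first instance that builds without raising is returned; a model whose factory raises is skipped and its error remembered. If every model in the chain fails, the last error is re-raised.
Errors raised after the AgentSession has started are outside this walk: nothing retries them against the next model in the chain, and they propagate to the caller.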
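To see the walk in isolation, the sketch below swaps the VoiceGateway factories for a stub and also records which models were skipped and why, which is handy for startup logs. resolve_chain, Resolution and stub_stt_factory are hypothetical names for illustration, not VoiceGateway APIs; the stub simulates a rejected Deepgram key by raising RuntimeError.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Resolution:
    """Outcome of a walk: the winning model plus every failure before it."""
    model_id: str
    instance: Any
    skipped: list[tuple[str, Exception]] = field(default_factory=list)


def resolve_chain(factory: Callable[[str], Any], chain: list[str]) -> Resolution:
    """Try each model id in order and return the first whose factory builds.

    Hypothetical helper: same walk as first_resolvable, but it keeps the
    failures so you can report why a backup was chosen.
    """
    if not chain:
        raise ValueError("fallback chain is empty")
    skipped: list[tuple[str, Exception]] = []
    for model_id in chain:
        try:
            instance = factory(model_id)
        except Exception as exc:  # any build failure moves on to the next id
            skipped.append((model_id, exc))
            continue
        return Resolution(model_id, instance, skipped)
    # Every model failed: re-raise the last error, as first_resolvable does.
    raise skipped[-1][1]


def stub_stt_factory(model_id: str) -> str:
    """Stand-in for inference.STT that simulates a rejected Deepgram key."""
    if model_id.startswith("deepgram/"):
        raise RuntimeError("invalid DEEPGRAM_API_KEY")
    return f"<STT {model_id}>"


result = resolve_chain(
    stub_stt_factory,
    ["deepgram/nova-3", "openai/whisper-1", "local/whisper-large-v3"],
)
print(result.model_id)  # openai/whisper-1
print([model_id for model_id, _ in result.skipped])  # ['deepgram/nova-3']
```

In a real entrypoint you would pass inference.STT (or LLM/TTS) as the factory and log result.skipped before building the AgentSession.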
LiveKit Agent with Fallback
```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import silero
from voicegateway import inference

# (paste first_resolvable / STT_CHAIN / LLM_CHAIN / TTS_CHAIN from above)


async def entrypoint(ctx: JobContext):
    # Resolve all three models before joining the room, so a worker that
    # cannot build a model never connects as a silent participant.
    try:
        stt = first_resolvable("stt", STT_CHAIN)
        llm = first_resolvable("llm", LLM_CHAIN)
        tts = first_resolvable("tts", TTS_CHAIN)
    except Exception as e:
        # Every model in at least one of the chains failed to resolve at startup.
        print(f"Cannot start voice agent: {e}")
        return

    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=stt,
        llm=llm,
        tts=tts,
    )
    await session.start(
        agent=Agent(instructions="You are a helpful voice assistant."),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Cloud-to-Local Fallback Strategy
A common pattern is to use cloud models as primaries and local models as the final fallback. Provided the local models are installed and their factories build, the agent can come up even when every cloud provider is unreachable:
```yaml
fallbacks:
  stt:
    - deepgram/nova-3
    - local/whisper-large-v3
  llm:
    - openai/gpt-4.1-mini
    - ollama/qwen2.5:3b
  tts:
    - cartesia/sonic-3
    - local/kokoro
```

This handles the cold-start case: if every cloud provider is unreachable when the agent starts, the walk selects the local model and the agent still comes up. It does not handle the warm-failure case: if Deepgram is healthy at startup and starts returning 500s mid-call, VoiceGateway keeps the Deepgram instance for the rest of the call. For warm failover, see LiveKit FallbackAdapter integration.
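The cold-start behavior can be checked without network access by using a stub factory that only builds local models. This is a minimal sketch, not a VoiceGateway test utility: offline_factory, pick_first and LOCAL_PREFIXES are hypothetical, and treating local/ and ollama/ ids as needing no cloud access is an assumption taken from the chain comments on this page.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any

# Prefixes this sketch treats as local models (assumption, see lead-in).
LOCAL_PREFIXES = ("local/", "ollama/")

CHAINS = {
    "stt": ["deepgram/nova-3", "local/whisper-large-v3"],
    "llm": ["openai/gpt-4.1-mini", "ollama/qwen2.5:3b"],
    "tts": ["cartesia/sonic-3", "local/kokoro"],
}


def offline_factory(model_id: str) -> str:
    """Stub factory for a host with no cloud access: only local models build."""
    if model_id.startswith(LOCAL_PREFIXES):
        return f"<instance {model_id}>"
    raise ConnectionError(f"{model_id}: provider unreachable")


def pick_first(factory: Callable[[str], Any], chain: list[str]) -> str:
    """Return the id of the first model in the chain whose factory builds."""
    last_error: Exception | None = None
    for model_id in chain:
        try:
            factory(model_id)  # real code would keep this instance
            return model_id
        except Exception as exc:  # skip to the next model in the chain
            last_error = exc
    if last_error is None:
        raise ValueError("fallback chain is empty")
    raise last_error


selected = {modality: pick_first(offline_factory, chain) for modality, chain in CHAINS.items()}
print(selected)
# {'stt': 'local/whisper-large-v3', 'llm': 'ollama/qwen2.5:3b', 'tts': 'local/kokoro'}
```

The selection happens once; nothing in this walk runs again after the session starts, which is why a mid-call Deepgram outage still needs LiveKit's FallbackAdapter.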