Changelog
VoiceGateway SDK release notes.
All notable changes to VoiceGateway are documented here. This project follows Semantic Versioning and Conventional Commits.
v0.9.2: the Postgres fleet collector actually works
The Postgres collector backend carried several SQLite-only SQL constructs that crashed the server on startup or on first ingest. They are fixed and now covered by a CI job that boots the collector against a real Postgres.
Fixed
- Ambiguous ON CONFLICT columns. The project and session upserts referenced bare existing-row columns (COALESCE(excluded.x, x)), which Postgres rejects as ambiguous between the target table and excluded. They are now table-qualified (managed_projects.x, sessions.x), which both SQLite and Postgres accept.
- GROUP_CONCAT in the cost summary. Replaced with the dialect-appropriate aggregate (STRING_AGG on Postgres, GROUP_CONCAT on SQLite).
- datetime('now', ...) in the virtual-key staleness queries. Replaced with a Postgres-compatible cutoff (make_interval / a Python-computed timestamp).
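The first two fixes can be illustrated with a minimal sketch. The table layout (id, name) and the helper name aggregate_sql are hypothetical, chosen only for illustration; the sketch runs against in-memory SQLite. The same table-qualified ON CONFLICT clause is the form the entry above says Postgres accepts, although Postgres drivers use different parameter placeholders.

```python
import sqlite3


def aggregate_sql(dialect: str, column: str) -> str:
    """Return a comma-joining string aggregate for the given dialect.

    Hypothetical helper: the dialect names used here are assumptions.
    """
    if dialect == "postgresql":
        return f"STRING_AGG({column}, ',')"
    return f"GROUP_CONCAT({column})"


# The existing-row column is table-qualified (managed_projects.name), so it
# cannot be confused with the incoming row exposed as "excluded".
UPSERT = """
INSERT INTO managed_projects (id, name) VALUES (?, ?)
ON CONFLICT (id) DO UPDATE
SET name = COALESCE(excluded.name, managed_projects.name)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE managed_projects (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(UPSERT, ("p1", "prod"))
conn.execute(UPSERT, ("p1", None))  # a NULL incoming value keeps the stored name
name = conn.execute(
    "SELECT name FROM managed_projects WHERE id = 'p1'"
).fetchone()[0]
```

After the second upsert the stored name is unchanged, because COALESCE falls back to the qualified existing-row column.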
Added
- Postgres collector CI workflow. A dialect job runs the collector against a Postgres service, and an image-smoke job builds the published image and boots docker-compose.collector.yml against Postgres. Together they gate the collector on a working Postgres path before release.
v0.9.1: collector image and Postgres startup fixes
Fixed
- The core Docker image boots again. It shipped without the hatch-vcs generated _version.py (the runtime copied the raw source over the installed package), so import voicegateway crashed on startup. The generated file is now baked into the image.
- Postgres engine event-loop crash. Gateway.__init__ runs async startup through several short-lived asyncio.run() loops; asyncpg binds connections to their creating loop, so a pooled connection was reused across loops and crashed. The Postgres engine now uses NullPool.
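The event-loop failure can be reproduced in miniature with the standard library alone. The sketch below is a toy model, not the asyncpg or SQLAlchemy implementation: LoopBoundConnection, ReusingPool, and NoPool are hypothetical stand-ins showing why a connection cached across asyncio.run() calls breaks, and why opening a fresh connection per use (the behavior NullPool provides) avoids the problem.

```python
import asyncio


class LoopBoundConnection:
    """Toy stand-in for a driver connection tied to the loop that opened it."""

    def __init__(self) -> None:
        # Keep a reference to the creating loop so identity checks stay valid.
        self._loop = asyncio.get_running_loop()

    async def execute(self, sql: str) -> str:
        if asyncio.get_running_loop() is not self._loop:
            raise RuntimeError("connection used on a different event loop")
        return f"ok: {sql}"


class ReusingPool:
    """Caches one connection and hands it out again (pooled behavior)."""

    def __init__(self) -> None:
        self._conn = None

    async def execute(self, sql: str) -> str:
        if self._conn is None:
            self._conn = LoopBoundConnection()
        return await self._conn.execute(sql)


class NoPool:
    """Opens a fresh connection for every call (NullPool-like behavior)."""

    async def execute(self, sql: str) -> str:
        return await LoopBoundConnection().execute(sql)


def run_twice(pool) -> list[str]:
    """Run one query in each of two separate short-lived event loops."""
    results = []
    for _ in range(2):
        try:
            results.append(asyncio.run(pool.execute("SELECT 1")))
        except RuntimeError as exc:
            results.append(f"error: {exc}")
    return results
```

With ReusingPool the second asyncio.run() call fails because the cached connection belongs to the first loop; with NoPool both calls succeed, at the cost of a new connection each time.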
v0.9.0: per-session cost tracking for OpenRTC multi-agent workers
Added
- voicegateway.openrtc.VoiceGatewayObserver: one-line cost tracking for OpenRTC workers. OpenRTC runs many LiveKit voice agents in a single worker process. This observer implements OpenRTC's SessionObserver protocol and drives voicegateway.attach() for every session, so a whole multi-agent worker gets per-call STT, LLM, and TTS cost tracking by passing one argument: AgentPool(observers=[VoiceGatewayObserver(project="prod", collector_url=..., virtual_key=...)]). Attribution is automatic per call: agent_id from the resolved agent name, tenant_id from room or job metadata["tenant"], and project from the observer config. One sink is built lazily per worker and shared across all of that worker's sessions. The adapter is duck-typed (no hard runtime dependency on openrtc, so import voicegateway.openrtc works without it installed) and picklable for OpenRTC's process isolation mode. Install with pip install "voicegateway[openrtc]" (requires openrtc>=0.3.0). See the OpenRTC example for the full walkthrough.
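The attribution rule above can be sketched as a small pure function. This is an illustration only, not the observer's code: the names Attribution and resolve_attribution are hypothetical, metadata is assumed to be already parsed into mappings, and checking room metadata before job metadata is an assumption the entry does not confirm.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Attribution:
    project: str
    agent_id: str
    tenant_id: Optional[str]


def resolve_attribution(
    project: str,
    agent_name: str,
    room_metadata: Mapping[str, str],
    job_metadata: Mapping[str, str],
) -> Attribution:
    """Derive per-call attribution from observer config and call metadata."""
    # Assumption: room metadata wins over job metadata when both carry a tenant.
    tenant = room_metadata.get("tenant") or job_metadata.get("tenant")
    return Attribution(project=project, agent_id=agent_name, tenant_id=tenant)
```

A call with no tenant in either mapping yields tenant_id=None, leaving the session unattributed to any tenant.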
v0.8.6: minimal config records to storage so the dashboard works
Fixed
- voicegw init enables cost tracking by default again. The v0.8.4 minimal template dropped the cost_tracking block, so voicegw serve started with storage disabled (gateway.storage is None) and the dashboard showed no costs, sessions, or agents, even though the agent SDK's voicegateway.attach() was writing to the default SQLite database. The minimal template now enables cost_tracking at that same default path (~/.config/voicegateway/voicegw.db), so a first run sees its agents on the dashboard with no extra wiring. Existing configs are unaffected; if your server was already started with storage off, add cost_tracking: {enabled: true} or set VOICEGW_DB_PATH.
v0.8.5: fix the Docker image build
The published Docker images for v0.8.3 and v0.8.4 failed to build and were never pushed. This restores them. The Python package is unchanged from v0.8.4.
Fixed
- The Docker image builds again. v0.8.3 added a wheel force-include for the Alembic migrations, but neither Dockerfile copied alembic/ and alembic.ini into the build stage, so pip install failed during metadata generation with Forced include not found: /build/alembic. Both the core and dashboard Dockerfiles now copy the migrations into the builder before installing. PyPI was unaffected (it uses a different build path); only the Docker images failed.
v0.8.4: quieter agents, live dashboard version, friendlier init
Polish from dogfooding the agent SDK. Embedded telemetry no longer floods a
host agent's debug logs, the dashboard reports the real version, the model list
only shows models you can actually call, and voicegw init starts minimal.
Fixed
- Embedded storage no longer floods agent DEBUG logs. voicegateway.attach() runs SQLite storage in-process. Under a LiveKit console/dev run (root logger at DEBUG) the aiosqlite and alembic loggers emitted a line per query, burying the agent's own output. StorageService now quiets those two dependency loggers to WARNING, and only when the caller has not set a level of their own (an explicit aiosqlite=DEBUG still wins).
- The dashboard and /health report the real version. The footer pill and the /health endpoint were hardcoded to 0.5.0. Both now read the installed __version__ (PEP 440 local build segment stripped), exposed via /health and the dashboard /api/status.
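Both fixes rest on small, standard techniques, sketched here with the standard library only. The function names are hypothetical and this is not the StorageService code; it assumes "the caller has not set a level" means the logger's own level is still NOTSET, and that the local build segment is everything after the first "+" in a PEP 440 version.

```python
import logging

NOISY_LOGGERS = ("aiosqlite", "alembic")


def quiet_dependency_loggers(names=NOISY_LOGGERS) -> None:
    """Raise noisy dependency loggers to WARNING unless a level is already set."""
    for name in names:
        logger = logging.getLogger(name)
        # NOTSET means nobody configured this logger explicitly, so it would
        # otherwise inherit DEBUG from the root logger.
        if logger.level == logging.NOTSET:
            logger.setLevel(logging.WARNING)


def public_version(version: str) -> str:
    """Strip the PEP 440 local segment (the part after '+')."""
    return version.split("+", 1)[0]


# An explicit caller choice made before quieting is left untouched.
logging.getLogger("aiosqlite").setLevel(logging.DEBUG)
quiet_dependency_loggers()
```

Checking the logger's own level rather than its effective level matters: the effective level would report the root logger's DEBUG and hide the fact that no explicit choice was made.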
Changed
- The dashboard lists only callable models. /api/status now returns models whose provider is configured (a cloud API key is set, or it is a local provider). The sidebar count and the Models page stop advertising models the operator cannot reach.
- voicegw init writes a minimal config by default. First run gets a short STT + LLM + TTS starter (about 35 lines) instead of the 269-line reference. Run voicegw init --full for the complete annotated config.
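The "callable models" rule can be sketched as a filter. Everything named here is an assumption for illustration: model IDs are taken to be in provider/model form, the local provider set is guessed from the local/* and ollama/* prefixes mentioned elsewhere in this changelog, and the environment variable names in the usage are made up.

```python
import os
from typing import Iterable, Mapping

LOCAL_PROVIDERS = frozenset({"local", "ollama"})  # assumed set of local providers


def callable_models(
    models: Iterable[str],
    api_key_env: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> list[str]:
    """Keep models whose provider is local or has a non-empty API key set."""
    result = []
    for model in models:
        provider = model.split("/", 1)[0]
        key_var = api_key_env.get(provider, "")
        if provider in LOCAL_PROVIDERS or environ.get(key_var):
            result.append(model)
    return result
```

For example, with only a hypothetical OPENAI_API_KEY set, a roster of openai, anthropic, and ollama models would be reduced to the openai and ollama entries.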
v0.8.3: ship migrations in the wheel
Fixed
- The PyPI wheel now ships the Alembic migrations. alembic/ and alembic.ini live at the repo root, outside the src/ packages, so the published wheel carried no migrations and run_migrations() failed at runtime with alembic.ini not found whenever storage initialized (hit by voicegw serve, voicegw dashboard, and voicegateway.attach()'s local SQLite sink). They are now force-included under the package, so a pip install voicegateway can build its schema on first run.
v0.8.2: importable base install
Fixed
- pip install voicegateway is importable again. SQLAlchemy and SQLModel are pulled in at import time by voicegateway.models, and the embedded storage that voicegateway.attach() writes through needs Alembic, but all three lived only in the server extra. They are now core dependencies, so a base install (the agent SDK use case) no longer fails with ModuleNotFoundError: No module named 'sqlalchemy'.
- The server extra installs python-multipart. FastAPI's dashboard logo upload (an UploadFile route) requires it; it was missing, so voicegw serve warned and the upload endpoint would have failed.
v0.8.1: Docker fleet collector support
The official Docker image can now run as the Postgres-backed fleet collector.
Fixed
- The image ships the migrations. alembic.ini and the alembic/ tree are now copied into the image, so the server builds its schema on first start. It could not before (neither was copied in), which left storage broken on a fresh container for both SQLite and Postgres.
- VOICEGW_DB_URL enables storage. Pointing the collector at Postgres with VOICEGW_DB_URL alone now turns storage on. Previously it also required cost_tracking.enabled or VOICEGW_DB_PATH, so POST /v1/ingest returned 503 and the collector persisted nothing.
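The storage-enablement rule after this fix can be written as one predicate. This is a hedged reading of the entry, not the server's code: storage_enabled is a hypothetical name, and the config is assumed to be a plain mapping parsed from voicegw.yaml.

```python
from typing import Any, Mapping


def storage_enabled(config: Mapping[str, Any], env: Mapping[str, str]) -> bool:
    """Storage is on if any one of the three switches is set."""
    cost_tracking = config.get("cost_tracking") or {}
    return bool(
        env.get("VOICEGW_DB_URL")        # Postgres URL alone is now sufficient
        or env.get("VOICEGW_DB_PATH")    # explicit SQLite path
        or cost_tracking.get("enabled")  # cost_tracking: {enabled: true}
    )
```

Before the fix, the first branch was effectively missing, which is why a collector configured only through VOICEGW_DB_URL answered ingest requests with 503.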
Added
- The Docker image includes the postgres extra (asyncpg), so it can run against a Postgres collector backend out of the box.
- docker-compose.collector.yml: a ready-to-run Postgres + collector stack, with a deployment guide in the docs.
v0.8.0: fleet collector operational hardening
The self-hosted fleet collector becomes safe to run unattended: ingest rate limiting, data retention, a windowed per-agent dashboard rollup, and the background workers that keep them fresh. Two latent bugs that left the collector fragile are fixed along the way.
Added
- Ingest rate limiting. POST /v1/ingest enforces a per-caller token bucket (keyed by virtual key, then static API key, then client IP). Over-limit requests get 429 with a Retry-After header; oversized batches get 413. Configured under the new ingest block (requests_per_minute, burst, max_batch_size).
- Data retention. A background worker hard-deletes aged rows per project: sessions and their dependent rows (replay, turns, dead-air, guardrail) by ended_at, and requests by timestamp, in batches. Configured under the new retention block (default_days, default 90).
- Windowed fleet rollup. A new agent_observations table and a 15-minute worker pre-aggregate per-agent cost, requests, p95, and error rate over a 24h window, so the Agents dashboard list is fast and internally consistent.
- Background workers wired into the server. The latency rollup, agent rollup, and retention workers now start with the collector. Configured under the new workers block (enabled, rollup_interval_seconds, retention_interval_seconds).
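The ingest rate limiting described in the first item is a standard token bucket. The sketch below is a minimal version under stated assumptions, not the collector's implementation: caller_key, TokenBucket, and admit are hypothetical names, time is passed in explicitly so the logic is deterministic, and checking batch size before the bucket is an assumed ordering.

```python
import math
from typing import Optional, Tuple


def caller_key(virtual_key: Optional[str], api_key: Optional[str], client_ip: str) -> str:
    """Pick the bucket key: virtual key, then static API key, then client IP."""
    if virtual_key:
        return f"vk:{virtual_key}"
    if api_key:
        return f"key:{api_key}"
    return f"ip:{client_ip}"


class TokenBucket:
    def __init__(self, requests_per_minute: float, burst: int, now: float) -> None:
        self.rate = requests_per_minute / 60.0  # tokens refilled per second
        self.burst = float(burst)
        self.tokens = float(burst)
        self.updated_at = now

    def check(self, now: float) -> Tuple[int, int]:
        """Return (HTTP status, Retry-After seconds) for one request."""
        if self.rate <= 0:  # requests_per_minute: 0 opts out of limiting
            return 200, 0
        elapsed = now - self.updated_at
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 200, 0
        # Seconds until one whole token is available again.
        return 429, math.ceil((1.0 - self.tokens) / self.rate)


def admit(bucket: TokenBucket, batch_size: int, max_batch_size: int, now: float) -> Tuple[int, int]:
    """Reject oversized batches with 413, otherwise consult the bucket."""
    if batch_size > max_batch_size:
        return 413, 0
    return bucket.check(now)
```

In a server, one bucket would be kept per caller_key result, for example in a dictionary, so that each caller is limited independently.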
Changed
- Ingest rate limiting is on by default (120 requests per minute per caller). A collector already ingesting faster will start receiving 429s; the library's remote sink honors Retry-After and retries without dropping the batch. Set ingest.requests_per_minute: 0 or ingest.enabled: false to opt out.
- The Agents dashboard list now covers the last 24 hours instead of all time. The JSON shape is unchanged; cost, requests, p95, and error rate are now window-scoped. The per-agent detail view stays all-time.
Fixed
- The remote sink no longer drops telemetry under rate limiting. A 429 is treated as backpressure (parse Retry-After, clamp to 60s, retry the same batch) rather than dropped after a short fixed backoff.
- Background workers now actually run in production. The FastAPI lifespan was never attached, so the latency-rollup and retention workers were dormant; the collector now starts and stops all three workers on boot and shutdown.
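The backpressure handling in the first fix (parse Retry-After, clamp to 60s, retry the same batch) can be sketched as follows. The names, the injected post callable, the one-second fallback, and the max_attempts bound are all assumptions for illustration; the real sink's retry policy is not specified in this entry.

```python
import time
from typing import Callable, Mapping, Sequence, Tuple

MAX_RETRY_AFTER_SECONDS = 60.0

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def parse_retry_after(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds and clamp it to [0, 60]."""
    try:
        seconds = float(headers.get("Retry-After"))
    except (TypeError, ValueError):
        # Missing header or HTTP-date form: fall back to a short default.
        seconds = default
    return max(0.0, min(seconds, MAX_RETRY_AFTER_SECONDS))


def send_with_backpressure(
    batch: Sequence[dict],
    post: Callable[[Sequence[dict]], Response],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 5,
) -> bool:
    """Resend the same batch while the server answers 429.

    Returns True on a 2xx response. False means the caller should keep the
    batch queued rather than discard it.
    """
    for _ in range(max_attempts):
        status, headers = post(batch)
        if status != 429:
            return 200 <= status < 300
        sleep(parse_retry_after(headers))
    return False
```

Injecting post and sleep keeps the retry logic independent of any HTTP client and easy to exercise without real waiting.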
v0.7.0: voice-prices pricing backend
Pricing moves from pydantic/genai-prices to
voice-prices, a fork that
prices all three modalities (LLM, STT, and TTS) from one source.
Changed
- Single pricing backend. LLM, STT, and TTS costs now all resolve through voice-prices. The hand-maintained local STT/TTS rate catalogs are retired; voice-prices owns rates and freshness (each entry carries prices_checked and pricing_source_url).
- Pricing-source attribution. Cloud-priced records are tagged voice-prices@<version>; self-hosted (local/*, ollama/*) models are tagged voicegateway-local; unknown models stay unpriced. The catalog-only oldest_entry_date field is dropped from the /v1/status and /api/status responses (voice-prices owns freshness).
- STT and TTS rates now follow voice-prices and may differ from the previous local-catalog estimates. Reconcile against your provider invoices.
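The tagging rule in the attribution item reduces to a three-way decision. The sketch below is illustrative only: pricing_source is a hypothetical function, the set of priced models stands in for a real voice-prices lookup, and representing "unpriced" as None is an assumption.

```python
from typing import Optional, Set

LOCAL_PREFIXES = ("local/", "ollama/")


def pricing_source(
    model: str, priced_models: Set[str], voice_prices_version: str
) -> Optional[str]:
    """Return the pricing-source tag for a model, or None if it is unpriced."""
    if model.startswith(LOCAL_PREFIXES):
        return "voicegateway-local"
    if model in priced_models:
        return f"voice-prices@{voice_prices_version}"
    return None  # unknown models stay unpriced
```

Checking the self-hosted prefixes first means a local model is never attributed to the cloud catalog, even if an entry with the same name exists there.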
Dependencies
- genai-prices replaced by voice-prices>=0.0.8,<0.1.
v0.6.0: first public release
The first public release of VoiceGateway. A self-hosted gateway for LiveKit voice agents that tracks costs per modality (audio-minutes for STT, tokens for LLM, characters for TTS) and reconciles logged costs against provider invoices.
What you get out of the box
- Drop-in replacement for livekit.agents.inference. Swap one import line and your agent code keeps running: from voicegateway.inference import STT, LLM, TTS. Cost tracking, latency monitoring, and per-session correlation happen transparently.
- Cost tracking per modality. LLM cost per 1k tokens (prices from pydantic/genai-prices, 1100+ models). STT cost per audio-minute and TTS cost per character (catalog with source-date metadata). Cached LLM input tokens are billed at the provider's cache-read discount rate (OpenAI 50%, Anthropic ~10%) by surfacing LiveKit's prompt_cached_tokens through to genai-prices.cache_read_tokens.
- Background daemon. voicegw onboard runs a five-question wizard, writes voicegw.yaml, registers a user-scoped service (LaunchAgent on macOS, systemd --user unit on Linux, Scheduled Task on Windows), and starts the daemon.
- Web dashboard and HTTP API on a single port. The daemon serves the React dashboard at /, the dashboard API at /api/*, and the public HTTP API at /v1/*. voicegw dashboard opens your browser at the daemon URL.
- Reconciliation tooling. voicegw export-costs and voicegw reconcile compare your logged costs against your provider's usage export. Per-row pricing_source attribution shows exactly which catalog or version priced each call.
- MCP server for agent-managed config. Seventeen tools over stdio and HTTP/SSE let Claude Code, Cursor, Codex, and Cline manage providers, projects, budgets, and queries conversationally.
- Multi-tenant attribution. Virtual API keys carry a tenant id so sessions auto-tag for per-customer reporting. Virtual keys expose their plaintext exactly once at creation and support soft revocation.
- Cross-modality routing. Per-session, lowest-predicted-total-latency selection of (STT, LLM, TTS) from per-project rosters, with observed latency feeding the predictor.
- White-label branding. Per-project logo, accent color, and product name. The dashboard chrome reflects the brand for users scoped to that project.
- Conversation replay. Per-modality time-ordered capture of every request, with retention windows configurable per project.
- Guardrails. Per-project policy overlay (PII categories, action enforcement), audit log of fired and bypassed events.
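The per-modality cost units listed above (tokens for LLM, audio-minutes for STT, characters for TTS, with a cache-read discount on cached input tokens) work out as in this sketch. The function names and all rates in the usage are made up for illustration and are not real provider prices; it also assumes the input token count includes the cached tokens.

```python
from decimal import Decimal


def llm_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_per_1k: Decimal,
    output_per_1k: Decimal,
    cache_read_ratio: Decimal,
) -> Decimal:
    """Price an LLM call; cached input tokens are billed at a discounted ratio."""
    uncached = input_tokens - cached_tokens  # assumes input_tokens includes cached
    billable_input = Decimal(uncached) + Decimal(cached_tokens) * cache_read_ratio
    return (billable_input * input_per_1k + Decimal(output_tokens) * output_per_1k) / 1000


def stt_cost(audio_seconds: int, per_minute: Decimal) -> Decimal:
    """Price speech-to-text by audio-minute."""
    return Decimal(audio_seconds) / 60 * per_minute


def tts_cost(characters: int, per_character: Decimal) -> Decimal:
    """Price text-to-speech by character."""
    return Decimal(characters) * per_character


# Hypothetical rates: 2000 input tokens (1000 cached at a 50% ratio), 500 output.
call_total = (
    llm_cost(2000, 1000, 500, Decimal("0.01"), Decimal("0.03"), Decimal("0.5"))
    + stt_cost(90, Decimal("0.006"))
    + tts_cost(1000, Decimal("0.000015"))
)
```

Decimal is used instead of float so that small per-unit rates sum without binary rounding drift, which matters when logged costs are later reconciled against invoices.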
Install
curl -fsSL https://voicegateway.mahimai.ca/install.sh | bash
Or:
pipx install 'voicegateway[cloud,dashboard]'
uv tool install 'voicegateway[cloud,dashboard]'
See Get started for the full first-run flow.