Cost Tracking
VoiceGateway records the cost of every request that flows through it: tokens for LLM, audio seconds for STT, characters for TTS. Cost data lands in SQLite alongside latency metrics and is the source of truth for the dashboard, the voicegw reconcile command, and per-project budget enforcement.
This page covers the cost-tracking subsystem end-to-end: the pricing layer, the per-request flow, and the substitute-validation strategy that backs the streaming cost accuracy claim.
Architecture
Pricing layer
The pricing facade in src/voicegateway/pricing/catalog.py exposes two functions:
calculate_cost(
modality: str,
model: str,
*,
audio_seconds: float = 0.0,
input_tokens: int = 0,
output_tokens: int = 0,
character_count: int = 0,
) -> Decimal | None
pricing_source(modality: str) -> str

calculate_cost dispatches by modality:
- LLM (modality="llm"): uses input_tokens and output_tokens. Routes to pricing/llm.py, which wraps voice-prices. Returns the voice-prices total. pricing_source("llm") is voice-prices@<version>.
- STT (modality="stt"): uses audio_seconds. Routes to pricing/stt.py, which maps the duration onto a voice-prices lookup. pricing_source("stt") is voice-prices@<version>.
- TTS (modality="tts"): uses character_count. Routes to pricing/tts.py, same voice-prices pattern as STT.
- Self-hosted (local/*, ollama/*): priced at $0 by a facade guard, attributed as voicegateway-local.
All three modalities return None for unknown models (never a silent zero), so callers can distinguish "free" from "unknown."
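The dispatch and the None-versus-zero contract can be sketched as follows. This is a minimal illustration, not the facade's actual code: the price tables, model names, and rates are hypothetical placeholders rather than voice-prices values, and returning None for an unrecognized modality is an assumption.

```python
from __future__ import annotations

from decimal import Decimal

# Hypothetical per-unit prices, for illustration only (not real catalog values).
_LLM_PER_1K_TOKENS = {"example/llm-small": (Decimal("0.0005"), Decimal("0.0015"))}
_STT_PER_MINUTE = {"example/stt-base": Decimal("0.0060")}
_TTS_PER_1K_CHARS = {"example/tts-base": Decimal("0.0150")}
_SELF_HOSTED_PREFIXES = ("local/", "ollama/")


def calculate_cost_sketch(
    modality: str,
    model: str,
    *,
    audio_seconds: float = 0.0,
    input_tokens: int = 0,
    output_tokens: int = 0,
    character_count: int = 0,
) -> Decimal | None:
    # Facade guard: self-hosted models are free, which is distinct from unknown.
    if model.startswith(_SELF_HOSTED_PREFIXES):
        return Decimal("0")
    if modality == "llm":
        rates = _LLM_PER_1K_TOKENS.get(model)
        if rates is None:
            return None  # unknown model: never a silent zero
        in_rate, out_rate = rates
        return (in_rate * input_tokens + out_rate * output_tokens) / 1000
    if modality == "stt":
        rate = _STT_PER_MINUTE.get(model)
        if rate is None:
            return None
        # Convert the float through str to avoid binary-float noise in Decimal.
        return rate * Decimal(str(audio_seconds)) / 60
    if modality == "tts":
        rate = _TTS_PER_1K_CHARS.get(model)
        if rate is None:
            return None
        return rate * character_count / 1000
    # Assumption: an unrecognized modality is treated like an unknown model.
    return None
```

A caller can then branch on `cost is None` (unknown, worth alerting on) separately from `cost == 0` (self-hosted, genuinely free).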
A 60-day staleness gate fails CI when any local-catalog entry's pricing_source_date is older than that, which forces a refresh at least every two months.
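A staleness check of this kind might look like the sketch below. The catalog shape (a dict of entries, each with an ISO-formatted pricing_source_date string) and the function name are assumptions for illustration; the real gate may read its data differently.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=60)


def stale_entries(catalog: dict[str, dict], today: date) -> list[str]:
    """Return model names whose pricing_source_date is older than MAX_AGE."""
    stale = []
    for model, entry in catalog.items():
        # Assumption: dates are stored as ISO strings such as "2025-04-01".
        sourced = date.fromisoformat(entry["pricing_source_date"])
        if today - sourced > MAX_AGE:
            stale.append(model)
    return sorted(stale)
```

In CI, a non-empty return value would be turned into a test failure that names the offending entries.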
Per-request flow
Every wrapped request flows through _InstrumentedBase._log_request:
- Compute total latency as now - start_time.
- Compute TTFB as first_byte_time - start_time if the streaming hook fired; otherwise fall back to total latency.
- Build a RequestRecord via CostTracker.create_record(...), which calls into the pricing facade and attaches pricing_source to the record.
- Write to storage via SQLiteStorage.log_request(...). A failure logs at warning and is swallowed; in-memory accounting must not break because the disk is full.
- Notify the budget enforcer via CostTracker.notify_spend(...) so per-project caps stay accurate even during a storage outage.
Each RequestRecord carries the same pricing_source string the catalog returned, so voicegw reconcile can attribute the recorded number to a specific upstream catalog version.
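The per-request steps can be sketched in one function. This is an illustrative sketch under stated assumptions, not the body of _InstrumentedBase._log_request: RecordSketch, the callable parameters, the use of seconds for timestamps, and skipping the budget notification when the cost is unknown are all hypothetical choices.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable

logger = logging.getLogger("voicegateway.sketch")


@dataclass
class RecordSketch:
    total_latency_ms: float
    ttfb_ms: float
    cost_usd: Decimal | None
    pricing_source: str


def log_request_sketch(
    *,
    start_time: float,
    now: float,
    first_byte_time: float | None,
    cost_usd: Decimal | None,
    pricing_source: str,
    write: Callable[[RecordSketch], None],
    notify_spend: Callable[[Decimal], None],
) -> RecordSketch:
    # Assumption: timestamps are in seconds and are converted to milliseconds.
    total_latency_ms = (now - start_time) * 1000
    # Fall back to total latency when the streaming hook never fired.
    if first_byte_time is None:
        ttfb_ms = total_latency_ms
    else:
        ttfb_ms = (first_byte_time - start_time) * 1000
    record = RecordSketch(total_latency_ms, ttfb_ms, cost_usd, pricing_source)
    try:
        write(record)
    except Exception:
        # Storage failures are logged at warning and swallowed.
        logger.warning("storage write failed; continuing", exc_info=True)
    # The budget notification runs whether or not the write succeeded.
    if cost_usd is not None:
        notify_spend(cost_usd)
    return record
```

Passing the storage writer and the budget notifier in as callables keeps the sketch testable without a database; the ordering (write first, notify regardless) is the property the text above describes.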
How streaming cost accounting is validated
Streaming is where the real-world cost-tracking bugs hide: tokens that double at chunk boundaries, audio-second accumulators that drift, character counts that miss SSML markup. VoiceGateway closes the validation gap without requiring real production traffic.
The substitute strategy
Rather than dogfood the gateway in production and reconcile against provider invoices, VG records real provider streaming responses once via src/voicegateway/tests/fixtures/streaming/record_streaming_fixtures.py and replays them in CI forever. Each fixture is a JSON file with three load-bearing sections:
- request: the literal payload VG sent.
- response_stream: the chunks the provider returned, with received_at_ms timestamps.
- provider_reported_usage: the usage block the provider reported at end-of-stream (tokens for LLM, duration for STT, character count for TTS).
The fixture also pins expected_cost_usd, computed at recording time by passing provider_reported_usage through voicegateway.pricing.catalog.calculate_cost and quantizing the result to 8 decimal places. This locks the cost math at the recording's price: if the catalog is updated later, the fixture's expected_cost_usd stays at the price-at-recording. The fixture validates VG's math, not "today's price."
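Quantizing to 8 decimal places before comparing can be done with the standard decimal module, as in this small sketch. The helper names, the rounding mode, and storing expected_cost_usd as a string are assumptions; the recorder may make different choices.

```python
from decimal import Decimal, ROUND_HALF_EVEN

EIGHT_DP = Decimal("0.00000001")


def quantize_usd(cost: Decimal) -> Decimal:
    # Assumption: banker's rounding; the real recorder may use another mode.
    return cost.quantize(EIGHT_DP, rounding=ROUND_HALF_EVEN)


def costs_match(computed: Decimal, expected_cost_usd: str) -> bool:
    # Quantize both sides so trailing digits beyond 8 dp cannot cause a mismatch.
    return quantize_usd(computed) == quantize_usd(Decimal(expected_cost_usd))
```

Comparing Decimal values rather than floats avoids the representation error that would otherwise make an 8-dp equality check flaky.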
Filename convention is locked at <provider>_<model>_<modality>_<mode>_<YYYY-MM-DD>.json. The date drives the staleness check.
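A filename decoder for this convention could look like the sketch below. It is not the helper in _loader.py; the class and function names are hypothetical, and it assumes the provider segment contains no underscore while the model segment may.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FixtureName:
    provider: str
    model: str
    modality: str
    mode: str
    recorded_on: date


def parse_fixture_name(filename: str) -> FixtureName:
    stem = filename.removesuffix(".json")
    # Split from the right so underscores inside the model name survive.
    head, modality, mode, day = stem.rsplit("_", 3)
    # Assumption: the provider is the first underscore-free segment.
    provider, model = head.split("_", 1)
    return FixtureName(provider, model, modality, mode, date.fromisoformat(day))


def age_in_days(name: FixtureName, today: date) -> int:
    return (today - name.recorded_on).days
```

For a made-up name such as acme_big_model_llm_stream_2025-01-15.json, this yields provider acme, model big_model, modality llm, and mode stream; a malformed name raises ValueError instead of being silently accepted.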
What the replay tests assert
src/voicegateway/tests/test_streaming_cost_accounting.py parameterizes over every committed fixture and asserts three things per fixture:
- Unit-count consistency: provider_reported_usage agrees with the actual contents of response_stream. For LLM, the normalized input_tokens / output_tokens / total_tokens must equal the values inside the trailing ChatCompletion usage chunk. For STT, audio_seconds must equal Deepgram's metadata.duration. For TTS, character_count must equal len(request.transcript). Catches recorder field-name typos, provider schema drift, and off-by-one normalization.
- Cost calculation: calculate_cost(provider_reported_usage) quantized to 8 dp must equal fixture.expected_cost_usd quantized to 8 dp. Catches cost-layer regressions (modality-dispatch bugs, pricing-source attribution drift, Decimal precision losses).
- TTFB hook behavior (stream fixtures only): a wrapper that calls _mark_first_byte partway through must produce ttfb_ms < total_latency_ms. A wrapper that never calls it must produce ttfb_ms == total_latency_ms (the documented fallback). Catches modality refactors that forget to wire TTFB.
In addition, a separate suite, src/voicegateway/tests/test_ttfb_hook_coverage.py, runs the TTFB-hook contract against synthetic streams for every modality. It is gated against wrap_provider's dispatch table so a future modality cannot land without TTFB coverage.
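The unit-count consistency assertion can be illustrated against an in-memory fixture. This is a hedged sketch, not the code in test_streaming_cost_accounting.py: the top-level modality key and the exact nesting of the usage and metadata blocks inside response_stream are assumptions about the fixture layout.

```python
def check_unit_counts(fixture: dict) -> None:
    """Assert that provider_reported_usage agrees with response_stream."""
    usage = fixture["provider_reported_usage"]
    modality = fixture["modality"]  # assumption: stored at the top level
    if modality == "llm":
        # Assumption: the trailing chunk carries the provider's usage block.
        trailing = fixture["response_stream"][-1]["usage"]
        assert usage["input_tokens"] == trailing["prompt_tokens"]
        assert usage["output_tokens"] == trailing["completion_tokens"]
        assert usage["total_tokens"] == trailing["total_tokens"]
    elif modality == "stt":
        final = fixture["response_stream"][-1]
        assert usage["audio_seconds"] == final["metadata"]["duration"]
    elif modality == "tts":
        assert usage["character_count"] == len(fixture["request"]["transcript"])
    else:
        raise ValueError(f"unknown modality: {modality}")
```

A recorder typo such as writing output_tokens from the prompt count would make the normalized block disagree with the trailing chunk, which is exactly the class of bug this check is meant to surface.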
Honest limits of the substitute strategy
Fixture replay is not a complete substitute for production traffic. It does not catch:
- Real-time streaming behavior: replay is sequential and synchronous. We do not simulate network jitter, partial chunks split across TCP packets, or out-of-order delivery.
- Provider-side correctness: if Deepgram's reported usage is off by 0.1 seconds, the fixture accepts that as ground truth. The suite validates VG's accounting matches the provider's, not whether the provider is right.
- Stale fixtures: recorded fixtures capture provider behavior at a point in time. If a provider changes its streaming format, the fixture's response_stream no longer matches what VG would see today. The filename's date convention surfaces staleness; a quarterly refresh task is on the maintenance backlog.
- End-to-end LiveKit session validation: the wrappers are tested in isolation, not as part of a real AgentSession. Session-level integration testing is deferred (it sits in the OpenRTC-Python Phase 2 plan).
The architecture is honest about this scope: cost tracking is validated against fixture-recorded provider responses, not against real production traffic. Without the fixture-replay phase, that distinction would be invisible; with it, the per-fixture date and provider attribution make the validation surface explicit.
Where to find each piece
- src/voicegateway/pricing/catalog.py, llm.py, stt.py, tts.py: the pricing layer.
- src/voicegateway/middleware/cost_tracker.py: per-request record builder.
- src/voicegateway/middleware/instrumented_provider.py: _InstrumentedBase + wrap_provider + the TTFB / log_request hooks.
- src/voicegateway/tests/fixtures/streaming/: recorded fixtures, schema, loader.
  - _schema.py: StreamingFixture Pydantic model.
  - _loader.py: discover_fixtures, load_fixture, filename-decode helper.
  - README.md: the fixture format and refresh policy.
  - PLACEHOLDER.md: runbook for recording the six minimum fixtures.
- src/voicegateway/tests/fixtures/streaming/record_streaming_fixtures.py: the dev-only recorder, gated behind --record and --confirm. Its module docstring documents cost expectations and operational warnings.
- src/voicegateway/tests/test_streaming_cost_accounting.py: the three-assertion replay suite.
- src/voicegateway/tests/test_ttfb_hook_coverage.py: per-modality TTFB hardening.