Observability
Vox treats transcription telemetry as a first-class runtime feature.
Dimensions
Each performance sample is tagged with:
- clientId
- route
- modelId
Metrics
Current metrics include:
- fileCheckMs
- modelCheckMs
- modelLoadMs
- audioLoadMs
- audioPrepareMs
- inferenceMs
- totalMs
- audioDurationMs
Additional useful derived values:
- realtimeFactor
- warm vs cold path behavior from modelLoadMs
- effective audio-to-text speed from audioDurationMs / inferenceMs
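
The audio-to-text speed ratio can be computed directly from a single sample. The following is a minimal Python sketch, assuming a sample is a plain dictionary that uses the metric names listed above. The helper name audio_to_text_speed is hypothetical and not part of Vox, and whether Vox's own realtimeFactor uses exactly this ratio is not confirmed here.

```python
def audio_to_text_speed(sample: dict) -> float:
    """Return audioDurationMs / inferenceMs for one sample.

    Values above 1.0 mean inference took less time than the audio lasts.
    """
    inference_ms = sample["inferenceMs"]
    if inference_ms <= 0:
        raise ValueError("inferenceMs must be positive")
    return sample["audioDurationMs"] / inference_ms


# Illustrative values only; substitute fields from a real sample.
speed = audio_to_text_speed({"audioDurationMs": 5110, "inferenceMs": 151})
print(f"{speed:.1f}x")
```

A ratio well above 1.0 on warm samples suggests the loaded model comfortably outpaces the incoming audio.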
Storage
The runtime appends JSON lines to:
~/.vox/performance.jsonl
That local store powers the CLI dashboard today and can later be exported to another metrics backend if needed.
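
Because the store is a plain JSON Lines file, it can be read with a few lines of standard-library Python. This is a sketch under stated assumptions, not the way the Vox dashboard reads the file: it assumes one JSON object per line, and the function name load_samples is hypothetical.

```python
import json
from pathlib import Path


def load_samples(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line, skipping lines that fail to parse."""
    samples: list[dict] = []
    if not path.exists():
        return samples
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A partially written last line is possible while the file is being appended to.
                continue
            if isinstance(record, dict):
                samples.append(record)
    return samples


samples = load_samples(Path.home() / ".vox" / "performance.jsonl")
print(f"loaded {len(samples)} samples")
```

Skipping unparseable lines keeps the reader tolerant of a file that is still being written. A stricter exporter might prefer to raise instead.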
Operator Commands
vox transcribe file --metrics /tmp/sample.wav
vox transcribe bench /tmp/sample.wav 5
vox perf dashboard
vox perf dashboard --client vox-cli
Philosophy
Loaded-model inference speed and end-to-end latency are different things.
Vox records both:
- inferenceMs tells you how fast the hot model is.
- totalMs tells you what the user actually experienced.
Example sample
{
  "clientId": "raycast",
  "route": "transcribe.file",
  "modelId": "parakeet:v3",
  "audioDurationMs": 5110,
  "inferenceMs": 151,
  "totalMs": 165
}
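
To make the gap between the two numbers concrete, the sample above can be split into inference time and everything else. This is an illustrative Python sketch. The names overhead_ms and overhead_share are hypothetical and are not fields that Vox records.

```python
sample = {
    "clientId": "raycast",
    "route": "transcribe.file",
    "modelId": "parakeet:v3",
    "audioDurationMs": 5110,
    "inferenceMs": 151,
    "totalMs": 165,
}

# Time spent outside inference: checks, loading, preparation, and any other overhead.
overhead_ms = sample["totalMs"] - sample["inferenceMs"]
overhead_share = overhead_ms / sample["totalMs"]

print(overhead_ms)              # 14
print(f"{overhead_share:.1%}")  # 8.5%
```

In this warm sample almost all of the user-visible latency is inference. A cold sample with a large modelLoadMs would show a much larger share outside inference.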
How to read the dashboard
- Compare clients against each other only when the audio shape is similar
- Use inferenceMs to judge loaded-model speed
- Use totalMs to judge end-user experience
- Treat large modelLoadMs spikes as warm-up lifecycle events, not steady-state inference regressions
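
The same reading rules can be applied offline to the stored samples. The following Python sketch separates cold samples from warm ones before computing per-client medians. It is not how the Vox dashboard is implemented: the function name summarize_by_client, the summary keys, and the 100 ms cutoff for a cold start are all assumptions chosen for illustration, and the sample values are made up.

```python
from collections import defaultdict
from statistics import median

# Hypothetical cutoff: samples whose modelLoadMs exceeds this count as cold starts.
COLD_LOAD_THRESHOLD_MS = 100


def summarize_by_client(samples: list[dict]) -> dict[str, dict]:
    """Group samples by clientId and report medians over warm samples only."""
    warm = defaultdict(list)
    cold_counts = defaultdict(int)
    for s in samples:
        client = s.get("clientId", "unknown")
        if s.get("modelLoadMs", 0) > COLD_LOAD_THRESHOLD_MS:
            cold_counts[client] += 1  # warm-up event, excluded from the medians
        else:
            warm[client].append(s)

    summary = {}
    for client in set(warm) | set(cold_counts):
        rows = warm.get(client, [])
        summary[client] = {
            "warmSamples": len(rows),
            "coldSamples": cold_counts.get(client, 0),
            "medianInferenceMs": median(r["inferenceMs"] for r in rows) if rows else None,
            "medianTotalMs": median(r["totalMs"] for r in rows) if rows else None,
        }
    return summary


# Made-up samples for illustration only.
demo = [
    {"clientId": "raycast", "modelLoadMs": 0, "inferenceMs": 151, "totalMs": 165},
    {"clientId": "raycast", "modelLoadMs": 2400, "inferenceMs": 160, "totalMs": 2600},
    {"clientId": "vox-cli", "modelLoadMs": 0, "inferenceMs": 140, "totalMs": 150},
    {"clientId": "vox-cli", "modelLoadMs": 0, "inferenceMs": 150, "totalMs": 170},
]
summary = summarize_by_client(demo)
for client in sorted(summary):
    print(client, summary[client])
```

This sketch does not account for audio shape, so the per-client medians are only comparable when the clients submit audio of similar duration.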