fix(dictation): cut on ~1.5s silence instead of 0.64s
Streaming dictation sends one transcription request per ended speech segment. With redemptionMs=640 the VAD cut on every ~0.64s gap, so normal halting speech fragmented into many segments and flooded /ai-chat/transcribe, tripping the per-user rate limit even at modest real usage.

Raise redemptionMs to 1500 so a cut only happens on a real sentence/thought pause (~the "couple seconds" the feature was meant to use). Request count now tracks actual pauses rather than inter-word gaps; the server throttle is left unchanged (the earlier limit bump was treating the symptom).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -280,9 +280,14 @@ export function useStreamingDictation(
 // positive threshold, per Silero guidance).
 negativeSpeechThreshold: 0.35,
 // Silence to wait through before ending a segment (the "don't cut
-// immediately" delay) — ~0.6s. NOTE: vad-web 0.0.30 takes this in ms, not
-// frames (one Silero frame is ~32ms at 16k).
-redemptionMs: 640,
+// immediately" delay). Each ended segment is ONE transcription request, so
+// cutting on short gaps over-fragments normal speech into a flood of tiny
+// requests (and trips the server's per-user rate limit). Wait ~1.5s — a
+// real sentence/thought boundary — so request count tracks actual pauses,
+// not every inter-word gap. Higher = fewer requests but more latency
+// before text appears. NOTE: vad-web 0.0.30 takes this in ms, not frames
+// (one Silero frame is ~32ms at 16k).
+redemptionMs: 1500,
 // Audio kept before speech start (left padding so the first word isn't
 // clipped) — ~0.3s.
 preSpeechPadMs: 320,
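The effect of the threshold on request count can be illustrated with a deliberately simplified model. The sketch below is not how vad-web decides cuts internally (it works on per-frame speech probabilities); it only assumes that any silence gap at least as long as `redemptionMs` ends a segment. The helper `countSegments` and the gap values are hypothetical and made up for illustration.

```typescript
// Hypothetical helper, not part of vad-web: estimates how many segments
// (and therefore transcription requests) one stretch of dictation produces.
function countSegments(silenceGapsMs: number[], redemptionMs: number): number {
  // Assumption: a gap at least as long as redemptionMs ends the current segment.
  const cuts = silenceGapsMs.filter((gap) => gap >= redemptionMs).length;
  // The speech after the last cut still forms one final segment.
  return cuts + 1;
}

// Made-up silence gaps (ms) between words and phrases in halting speech.
const gapsMs: number[] = [200, 700, 300, 900, 1800, 650, 400, 2100];

const before = countSegments(gapsMs, 640); // 6 segments
const after = countSegments(gapsMs, 1500); // 3 segments

console.log(`redemptionMs=640  -> ${before} requests`);
console.log(`redemptionMs=1500 -> ${after} requests`);
```

Under this model only the two long pauses survive as cut points at 1500 ms, which matches the intent of the change: fewer requests, at the cost of waiting longer before text appears.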