From 373c56c0d3153d8791ac4b6732c171e83ecd2f49 Mon Sep 17 00:00:00 2001
From: claude_code
Date: Mon, 22 Jun 2026 18:04:35 +0300
Subject: [PATCH] fix(dictation): cut on ~1.5s silence instead of 0.64s
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Streaming dictation sends one transcription request per ended speech
segment. With redemptionMs=640 the VAD cut on every ~0.64s gap, so normal
halting speech fragmented into many segments and flooded
/ai-chat/transcribe, tripping the per-user rate limit even at modest
real usage.

Raise redemptionMs to 1500 so a cut only happens on a real
sentence/thought pause (roughly the "couple of seconds" the feature was
meant to wait). Request count now tracks actual pauses rather than
inter-word gaps. The server throttle is left unchanged; the earlier limit
bump only treated the symptom.

Co-Authored-By: Claude Opus 4.8
---
 .../dictation/hooks/use-streaming-dictation.ts | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/apps/client/src/features/dictation/hooks/use-streaming-dictation.ts b/apps/client/src/features/dictation/hooks/use-streaming-dictation.ts
index b8bae935..8128df91 100644
--- a/apps/client/src/features/dictation/hooks/use-streaming-dictation.ts
+++ b/apps/client/src/features/dictation/hooks/use-streaming-dictation.ts
@@ -280,9 +280,14 @@ export function useStreamingDictation(
       // positive threshold, per Silero guidance).
       negativeSpeechThreshold: 0.35,
       // Silence to wait through before ending a segment (the "don't cut
-      // immediately" delay) — ~0.6s. NOTE: vad-web 0.0.30 takes this in ms, not
-      // frames (one Silero frame is ~32ms at 16k).
-      redemptionMs: 640,
+      // immediately" delay). Each ended segment is ONE transcription request, so
+      // cutting on short gaps over-fragments normal speech into a flood of tiny
+      // requests (and trips the server's per-user rate limit). Wait ~1.5s — a
+      // real sentence/thought boundary — so request count tracks actual pauses,
+      // not every inter-word gap. Higher = fewer requests but more latency
+      // before text appears. NOTE: vad-web 0.0.30 takes this in ms, not frames
+      // (one Silero frame is ~32ms at 16k).
+      redemptionMs: 1500,
       // Audio kept before speech start (left padding so the first word isn't
       // clipped) — ~0.3s.
       preSpeechPadMs: 320,
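The commit's reasoning (a segment ends once trailing silence reaches redemptionMs, and each ended segment costs one request) can be illustrated with a toy simulation. This is a simplified model assumed for illustration only, not vad-web's implementation: it ignores the positive/negative threshold hysteresis, preSpeechPadMs, and any minimum-speech filtering, and `Span`, `toFrames`, and `countSegments` are hypothetical names. The 32 ms frame length is taken from the comment in the diff.

```typescript
// Toy model of VAD endpointing (hypothetical; NOT vad-web's code): a segment
// opens on the first speech frame and closes once accumulated trailing
// silence reaches redemptionMs. Each segment stands for one request.

// Assumed: 512 samples at 16 kHz = 32 ms, matching "~32ms at 16k" above.
const FRAME_MS = 32;

type Span = { speech: boolean; ms: number };

// Expand a coarse speech/silence pattern into per-frame speech flags.
function toFrames(pattern: Span[], frameMs: number = FRAME_MS): boolean[] {
  const frames: boolean[] = [];
  for (const { speech, ms } of pattern) {
    const n = Math.round(ms / frameMs);
    for (let i = 0; i < n; i++) frames.push(speech);
  }
  return frames;
}

// Count how many segments (i.e. requests) a redemption window produces.
function countSegments(
  frames: boolean[],
  redemptionMs: number,
  frameMs: number = FRAME_MS,
): number {
  let segments = 0;
  let inSegment = false;
  let silenceMs = 0;
  for (const isSpeech of frames) {
    if (isSpeech) {
      if (!inSegment) {
        inSegment = true; // speech after a cut starts a new segment
        segments++;
      }
      silenceMs = 0; // speech "redeems" the pause so far
    } else if (inSegment) {
      silenceMs += frameMs;
      if (silenceMs >= redemptionMs) {
        inSegment = false; // pause was long enough: cut here
        silenceMs = 0;
      }
    }
  }
  return segments;
}

// Two sentences of three words each, with 700 ms hesitations between words
// and a 2 s pause between sentences.
const word: Span = { speech: true, ms: 400 };
const hesitation: Span = { speech: false, ms: 700 };
const sentencePause: Span = { speech: false, ms: 2000 };
const sentence: Span[] = [word, hesitation, word, hesitation, word];
const frames = toFrames([...sentence, sentencePause, ...sentence]);

console.log(countSegments(frames, 640)); // 6
console.log(countSegments(frames, 1500)); // 2
```

Under this model a 700 ms hesitation outlasts the 640 ms window, so every word becomes its own request, but it does not outlast the 1500 ms window, while the 2 s sentence break exceeds both. Because silence accumulates in 32 ms steps here, the 1500 ms window closes after 47 silent frames (1504 ms); how vad-web itself converts ms to frames is not modeled.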