The streaming mic button only began recording on the SECOND click. The VAD
library creates its AudioContext inside vad.start() and never resumes it; on the
first click the lazy model load (import + MicVAD.new) ran first, so the context
was created after the user-gesture window expired and started suspended — the
audio worklet never ran, so nothing happened. The second click was fast (model
cached) so the context landed inside the gesture and worked.
Create and resume our own AudioContext synchronously at the top of start()
(inside the click gesture, before the model load) and inject it into MicVAD,
which then does not take ownership of it; it is reused across start/stop and
closed only on unmount. Add a "loading" status so the first click is shown as a
spinner (disabled) while the model loads, which also blocks a confusing second
click.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a lightweight "streaming" dictation mode as a simpler alternative to the
realtime-websocket path: detect speech with Silero VAD (@ricky0123/vad-web),
cut each segment on a pause and POST it to the existing /ai-chat/transcribe
endpoint, so text appears progressively. No server changes.
- new useStreamingDictation hook (same API as useDictation), lazy-loads VAD,
in-order seq emission, session-epoch guard against stop->start races
- new encodeWavPcm16 util (Float32 -> mono PCM16 WAV, accepted by the server)
- MicButton gains a `streaming` prop; enabled in the editor toolbar and chat
- VAD tuning: redemptionMs 640 / preSpeechPadMs 320 / minSpeechMs 96
- batch dictation kept as the fallback (streaming=false)
- deps: @ricky0123/vad-web@0.0.30, onnxruntime-web@1.27.0
Note: VAD assets load from the library CDN by default; for self-hosted/offline
set VAD_BASE_ASSET_PATH/VAD_ONNX_WASM_BASE_PATH and copy assets to public/vad/.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The byline mic rendered blue and with a smaller (16px) glyph next to the
gray 20px info icon, so it looked misaligned with an uneven gap. Add
optional color/iconSize props to MicButton (forwarded through
DictationGroup) and render the byline mic gray at 20px, wrapping it and
the info icon in a tight nowrap group so they read as a snug, aligned
pair. The AI chat mic is unchanged (passes neither prop).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The live mic-level halo around the stop button was frozen at a constant
scale (1.15) whenever the OS "Reduce motion" setting was on, so it never
reacted to the voice while dictating. Make haloScale unconditional so it
always follows audioLevel (amplitude 0.9), and drop the now-unused
useReducedMotion import and reduceMotion local.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a pulsing halo behind the stop button that scales with the
microphone input level, giving real-time feedback that recording is
active and the mic is picking up sound.
- use-dictation: meter the captured MediaStream via AudioContext +
AnalyserNode (analyser only, never connected to destination), compute
a smoothed RMS audioLevel (0..1) in a requestAnimationFrame loop, and
tear the meter down on every recording-end path (stop/cancel/auto-stop/
unmount); meter failure is non-fatal to recording
- mic-button: render a translucent red halo whose scale follows
audioLevel; honor prefers-reduced-motion with a static halo
- stop(): recover and release resources when no live recorder remains
- fix unhandled rejection from AudioContext.resume()
Add push-to-talk voice dictation that transcribes recorded audio on the
server via the workspace's OpenAI-compatible AI provider (Whisper /
gpt-4o-transcribe / self-hosted whisper), then inserts the text.
Backend:
- New `stt_api_key_enc` column + migration; STT creds parity with chat/
embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to
chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace
scope + throttle, 25MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).
Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the
Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine),
MicButton, transcribe service; integrated into the chat composer and a
new editor-toolbar dictation group, both gated by ai.dictation.