gitmost

Author	SHA1	Message	Date
claude_code	8f01a01122	fix(dictation): start streaming dictation on the first click The streaming mic button only began recording on the SECOND click. The VAD library creates its AudioContext inside vad.start() and never resumes it; on the first click the lazy model load (import + MicVAD.new) ran first, so the context was created after the user-gesture window expired and started suspended — the audio worklet never ran, so nothing happened. The second click was fast (model cached) so the context landed inside the gesture and worked. Create and resume our own AudioContext synchronously at the top of start() (inside the click gesture, before the model load) and inject it into MicVAD, which then does not take ownership of it; it is reused across start/stop and closed only on unmount. Add a "loading" status so the first click is shown as a spinner (disabled) while the model loads, which also blocks a confusing second click. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 18:42:44 +03:00
claude_code	4f0da42d88	feat(dictation): streaming STT via silence cut (Silero VAD) Add a lightweight "streaming" dictation mode as a simpler alternative to the realtime-websocket path: detect speech with Silero VAD (@ricky0123/vad-web), cut each segment on a pause and POST it to the existing /ai-chat/transcribe endpoint, so text appears progressively. No server changes. - new useStreamingDictation hook (same API as useDictation), lazy-loads VAD, in-order seq emission, session-epoch guard against stop->start races - new encodeWavPcm16 util (Float32 -> mono PCM16 WAV, accepted by the server) - MicButton gains a `streaming` prop; enabled in the editor toolbar and chat - VAD tuning: redemptionMs 640 / preSpeechPadMs 320 / minSpeechMs 96 - batch dictation kept as the fallback (streaming=false) - deps: @ricky0123/vad-web@0.0.30, onnxruntime-web@1.27.0 Note: VAD assets load from the library CDN by default; for self-hosted/offline set VAD_BASE_ASSET_PATH/VAD_ONNX_WASM_BASE_PATH and copy assets to public/vad/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 16:52:05 +03:00
claude_code	ef74058301	style(editor): align byline dictation mic with the info icon The byline mic rendered blue and with a smaller (16px) glyph next to the gray 20px info icon, so it looked misaligned with an uneven gap. Add optional color/iconSize props to MicButton (forwarded through DictationGroup) and render the byline mic gray at 20px, wrapping it and the info icon in a tight nowrap group so they read as a snug, aligned pair. The AI chat mic is unchanged (passes neither prop). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 02:48:50 +03:00
claude_code	0deded342d	fix(dictation): drive the recording halo from mic level under reduced-motion The live mic-level halo around the stop button was frozen at a constant scale (1.15) whenever the OS "Reduce motion" setting was on, so it never reacted to the voice while dictating. Make haloScale unconditional so it always follows audioLevel (amplitude 0.9), and drop the now-unused useReducedMotion import and reduceMotion local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 23:34:07 +03:00
claude_code	55625874c5	feat(dictation): show live mic level while recording Add a pulsing halo behind the stop button that scales with the microphone input level, giving real-time feedback that recording is active and the mic is picking up sound. - use-dictation: meter the captured MediaStream via AudioContext + AnalyserNode (analyser only, never connected to destination), compute a smoothed RMS audioLevel (0..1) in a requestAnimationFrame loop, and tear the meter down on every recording-end path (stop/cancel/auto-stop/unmount); meter failure is non-fatal to recording - mic-button: render a translucent red halo whose scale follows audioLevel; honor prefers-reduced-motion with a static halo - stop(): recover and release resources when no live recorder remains - fix unhandled rejection from AudioContext.resume()	2026-06-21 21:04:22 +03:00
vvzvlad	874bdd021c	feat(ai): server-side voice dictation (STT) with mic in chat and editor Add push-to-talk voice dictation that transcribes recorded audio on the server via the workspace's OpenAI-compatible AI provider (Whisper / gpt-4o-transcribe / self-hosted whisper), then inserts the text. Backend: - New `stt_api_key_enc` column + migration; STT creds parity with chat/embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to chat baseUrl/key). Both provider whitelists updated (service + repo). - AiService.getTranscriptionModel + AiTranscriptionService. - Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace scope + throttle, 25MB cap, MIME whitelist, never logs audio/key). - New `settings.ai.dictation` workspace flag (DTO + service + audit). Frontend: - Wire up the Voice/STT settings card (model/base URL/key) and the Voice-dictation toggle. - New `features/dictation`: useDictation (MediaRecorder state machine), MicButton, transcribe service; integrated into the chat composer and a new editor-toolbar dictation group, both gated by ai.dictation.	2026-06-18 18:45:33 +03:00
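
Commit 4f0da42d88 describes an encodeWavPcm16 util that turns Float32 samples into a mono PCM16 WAV. The log does not show its code, so the following is only a minimal sketch of how such an encoder might look. The signature, the 16000 Hz default sample rate, and the clamping behavior are assumptions for illustration, not the repository's actual implementation.

```typescript
// Hypothetical sketch; the real encodeWavPcm16 in the repository may differ.
// Encodes mono Float32 samples (expected range -1..1) as a 16-bit PCM WAV file.
function encodeWavPcm16(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical WAV header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range input, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped as `new Blob([buffer], { type: "audio/wav" })` before being posted to the transcription endpoint; whether the server expects exactly that MIME type is not stated in the log.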

6 Commits
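
Commit 4f0da42d88 also mentions "in-order seq emission" and a "session-epoch guard against stop->start races" in the useStreamingDictation hook. One possible reading is sketched below: each speech segment reserves a sequence number when it is cut, transcripts are released strictly in that order even if the HTTP responses arrive out of order, and responses tagged with an older session epoch are dropped. The function name createOrderedEmitter and its methods are hypothetical and do not come from the repository.

```typescript
// Hypothetical sketch of in-order emission with a session-epoch guard.
type OrderedEmitter = {
  startSession(): number; // begins a new session and returns its epoch
  reserve(): number; // reserves the next sequence number for a segment
  resolve(epoch: number, seq: number, text: string): void;
};

function createOrderedEmitter(onText: (text: string) => void): OrderedEmitter {
  let epoch = 0;
  let nextSeq = 0;
  let nextToEmit = 0;
  const pending = new Map<number, string>();

  return {
    startSession(): number {
      // Bumping the epoch invalidates every in-flight request of the old session.
      epoch += 1;
      nextSeq = 0;
      nextToEmit = 0;
      pending.clear();
      return epoch;
    },
    reserve(): number {
      return nextSeq++;
    },
    resolve(resultEpoch: number, seq: number, text: string): void {
      if (resultEpoch !== epoch) return; // stale response from a stopped session
      pending.set(seq, text);
      // Flush every transcript that is now contiguous with the last emitted one.
      while (pending.has(nextToEmit)) {
        const ready = pending.get(nextToEmit) ?? "";
        pending.delete(nextToEmit);
        nextToEmit += 1;
        if (ready.length > 0) onText(ready);
      }
    },
  };
}
```

A caller would capture the epoch from startSession() and the number from reserve() before sending each segment, then pass both back to resolve() when the transcript arrives. Buffering in a Map keeps the ordering logic independent of network timing, at the cost of holding later transcripts until the earlier ones return.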