gitmost

Author	SHA1	Message	Date
claude code agent 227	7c48bab1f2	test: add unit tests for 10 candidates from issue #139 Adds co-located unit tests for ten targets (client → vitest .test.ts(x), server → jest .spec.ts), plus minimal behavior-preserving extractions/exports where the issue required a pure function to test: - encode-wav: WAV header + PCM16 clamping - editor-ext embed-provider / utils (sanitizeUrl, isInternalFileUrl) / indent (export clampIndent) - label.dto @Matches regex - move-page.dto vs generateJitteredKeyBetween parity (bug locked via test.failing) - new-note-button canCreatePage (extracted to can-create-page.ts) - history-editor diff (extracted pure computeHistoryDiff into history-diff.ts) - notification getTypesForTab + repo contract (direct-tab divergence locked via test.failing) - search buildTsQuery (extracted + sanitizes operator inputs so adversarial queries no longer risk a to_tsquery 500) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 04:13:44 +03:00
claude_code	8f01a01122	fix(dictation): start streaming dictation on the first click The streaming mic button only began recording on the SECOND click. The VAD library creates its AudioContext inside vad.start() and never resumes it; on the first click the lazy model load (import + MicVAD.new) ran first, so the context was created after the user-gesture window expired and started suspended — the audio worklet never ran, so nothing happened. The second click was fast (model cached) so the context landed inside the gesture and worked. Create and resume our own AudioContext synchronously at the top of start() (inside the click gesture, before the model load) and inject it into MicVAD, which then does not take ownership of it; it is reused across start/stop and closed only on unmount. Add a "loading" status so the first click is shown as a spinner (disabled) while the model loads, which also blocks a confusing second click. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 18:42:44 +03:00
claude_code	373c56c0d3	fix(dictation): cut on ~1.5s silence instead of 0.64s Streaming dictation sends one transcription request per ended speech segment. With redemptionMs=640 the VAD cut on every ~0.64s gap, so normal halting speech fragmented into many segments and flooded /ai-chat/transcribe — tripping the per-user rate limit even at modest real usage. Raise redemptionMs to 1500 so a cut only happens on a real sentence/thought pause (~the "couple seconds" the feature was meant to use). Request count now tracks actual pauses rather than inter-word gaps; the server throttle is left unchanged (the earlier limit bump was treating the symptom). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 18:04:35 +03:00
claude_code	7093f184b2	fix(dictation): self-host Silero VAD / onnxruntime-web assets Streaming dictation failed at runtime with "no available backend found / 'text/html' is not a valid JavaScript MIME type": @ricky0123/vad-web 0.0.30 defaults baseAssetPath/onnxWASMBasePath to "./" (relative to the page URL), so the worklet, Silero model and ORT wasm/mjs were requested against the SPA catch-all and came back as index.html. Serve them from a fixed /vad/ instead: - scripts/copy-vad-assets.mjs copies the 4 runtime assets (vad worklet, silero_vad_v5.onnx, ort-wasm-simd-threaded.jsep.{mjs,wasm}) from node_modules into apps/client/public/vad/ (gitignored — the ORT wasm is ~26 MB) - client dev/build scripts run the copy first so the assets are always present - useStreamingDictation points both path constants at "/vad/" Verified: dev server serves all four under /vad/ with HTTP 200 and correct Content-Type (js/wasm, never text/html); tsc clean. Prod (Docker) build runs the copy step, so dist/vad/* ships in the image. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 17:19:11 +03:00
claude_code	4f0da42d88	feat(dictation): streaming STT via silence cut (Silero VAD) Add a lightweight "streaming" dictation mode as a simpler alternative to the realtime-websocket path: detect speech with Silero VAD (@ricky0123/vad-web), cut each segment on a pause and POST it to the existing /ai-chat/transcribe endpoint, so text appears progressively. No server changes. - new useStreamingDictation hook (same API as useDictation), lazy-loads VAD, in-order seq emission, session-epoch guard against stop->start races - new encodeWavPcm16 util (Float32 -> mono PCM16 WAV, accepted by the server) - MicButton gains a `streaming` prop; enabled in the editor toolbar and chat - VAD tuning: redemptionMs 640 / preSpeechPadMs 320 / minSpeechMs 96 - batch dictation kept as the fallback (streaming=false) - deps: @ricky0123/vad-web@0.0.30, onnxruntime-web@1.27.0 Note: VAD assets load from the library CDN by default; for self-hosted/offline set VAD_BASE_ASSET_PATH/VAD_ONNX_WASM_BASE_PATH and copy assets to public/vad/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 16:52:05 +03:00
claude_code	89ac8fa37b	style(dictation): match mic button halo radius to button shape Update the halo's border-radius from a fixed 50% circle to the theme's default radius variable. This ensures the red pulse follows the button's rounded-square outline instead of appearing circular.	2026-06-22 16:01:53 +03:00
claude_code	ef74058301	style(editor): align byline dictation mic with the info icon The byline mic rendered blue and with a smaller (16px) glyph next to the gray 20px info icon, so it looked misaligned with an uneven gap. Add optional color/iconSize props to MicButton (forwarded through DictationGroup) and render the byline mic gray at 20px, wrapping it and the info icon in a tight nowrap group so they read as a snug, aligned pair. The AI chat mic is unchanged (passes neither prop). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 02:48:50 +03:00
claude_code	0deded342d	fix(dictation): drive the recording halo from mic level under reduced-motion The live mic-level halo around the stop button was frozen at a constant scale (1.15) whenever the OS "Reduce motion" setting was on, so it never reacted to the voice while dictating. Make haloScale unconditional so it always follows audioLevel (amplitude 0.9), and drop the now-unused useReducedMotion import and reduceMotion local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 23:34:07 +03:00
claude_code	55625874c5	feat(dictation): show live mic level while recording Add a pulsing halo behind the stop button that scales with the microphone input level, giving real-time feedback that recording is active and the mic is picking up sound. - use-dictation: meter the captured MediaStream via AudioContext + AnalyserNode (analyser only, never connected to destination), compute a smoothed RMS audioLevel (0..1) in a requestAnimationFrame loop, and tear the meter down on every recording-end path (stop/cancel/auto-stop/unmount); meter failure is non-fatal to recording - mic-button: render a translucent red halo whose scale follows audioLevel; honor prefers-reduced-motion with a static halo - stop(): recover and release resources when no live recorder remains - fix unhandled rejection from AudioContext.resume()	2026-06-21 21:04:22 +03:00
vvzvlad	77249d59c6	feat(ai): OpenRouter STT support + real error surfacing + STT endpoint test - ai.service: route *.openrouter.ai STT to its JSON+base64 /audio/transcriptions API; keep the OpenAI multipart path (AI SDK) for OpenAI/self-hosted whisper. Unify transcription behind transcribe(). - /transcribe controller: surface the real provider/transport reason (describeProviderError) instead of an opaque 500; preserve HttpException. - testConnection: add an 'stt' capability (silent-WAV probe) + DTO; client gets a Test endpoint button and status dot on the Voice/STT card. - useDictation: log full errors to the console and show the real reason (mic start + transcription paths); handle NotReadable/Abort and missing mediaDevices. - docs(CLAUDE.md): require full error logging + specific user-facing messages.	2026-06-18 19:26:35 +03:00
vvzvlad	874bdd021c	feat(ai): server-side voice dictation (STT) with mic in chat and editor Add push-to-talk voice dictation that transcribes recorded audio on the server via the workspace's OpenAI-compatible AI provider (Whisper / gpt-4o-transcribe / self-hosted whisper), then inserts the text. Backend: - New `stt_api_key_enc` column + migration; STT creds parity with chat/embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to chat baseUrl/key). Both provider whitelists updated (service + repo). - AiService.getTranscriptionModel + AiTranscriptionService. - Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace scope + throttle, 25MB cap, MIME whitelist, never logs audio/key). - New `settings.ai.dictation` workspace flag (DTO + service + audit). Frontend: - Wire up the Voice/STT settings card (model/base URL/key) and the Voice-dictation toggle. - New `features/dictation`: useDictation (MediaRecorder state machine), MicButton, transcribe service; integrated into the chat composer and a new editor-toolbar dictation group, both gated by ai.dictation.	2026-06-18 18:45:33 +03:00
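
Commit 4f0da42d88 mentions an encodeWavPcm16 util (Float32 -> mono PCM16 WAV), and 7c48bab1f2 tests its "WAV header + PCM16 clamping". The repository's implementation is not shown in this log, so the following is only a minimal sketch of that kind of encoder. The function name encodeWavPcm16Sketch, the 16 kHz default sample rate and the exact rounding rule are assumptions for illustration.

```typescript
// Hypothetical sketch, not the repository's encodeWavPcm16.
// Encodes mono Float32 samples (nominally in [-1, 1]) as a 16-bit PCM WAV file.
function encodeWavPcm16Sketch(
  samples: Float32Array,
  sampleRate: number = 16000, // assumed default; the real util may differ
): ArrayBuffer {
  const bytesPerSample = 2;
  const headerSize = 44; // canonical RIFF/WAVE header for plain PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF container header (all multi-byte fields are little-endian).
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk describing uncompressed mono PCM16.
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk body size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample

  // "data" chunk with the samples.
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    // Clamp out-of-range input so it saturates instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, sample));
    const int16 =
      clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
    view.setInt16(headerSize + i * bytesPerSample, int16, true);
  });

  return buffer;
}
```

In a browser the result could be wrapped as `new Blob([buffer], { type: "audio/wav" })` before upload; whether the hook does exactly that is not stated in the log.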

11 Commits
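
Commit 4f0da42d88 also describes "in-order seq emission" and a "session-epoch guard against stop->start races" in useStreamingDictation. The hook's code is not part of this log, so the sketch below only illustrates the general idea under stated assumptions: each cut segment reserves a sequence number, transcription results may arrive out of order, and text is emitted strictly in sequence while results from an earlier session are dropped. The class InOrderEmitter and its method names are hypothetical.

```typescript
// Hypothetical sketch of ordered emission with a session-epoch guard.
// Not the repository's useStreamingDictation implementation.
interface Ticket {
  seq: number; // position of the segment within its session
  epoch: number; // session the segment belongs to
}

class InOrderEmitter {
  private readonly emit: (text: string) => void;
  private epoch = 0; // bumped on every reset (stop -> start)
  private issued = 0; // next sequence number to hand out
  private nextSeq = 0; // next sequence number allowed to emit
  private readonly pending = new Map<number, string>();

  constructor(emit: (text: string) => void) {
    this.emit = emit;
  }

  // Call when a speech segment is cut, before the request is sent.
  reserve(): Ticket {
    return { seq: this.issued++, epoch: this.epoch };
  }

  // Call when a transcription settles. Pass "" for a failed or empty
  // result so later segments are not blocked behind it.
  resolve(ticket: Ticket, text: string): void {
    if (ticket.epoch !== this.epoch) return; // stale result from an old session
    this.pending.set(ticket.seq, text);
    // Flush every result that is now contiguous with what was already emitted.
    while (this.pending.has(this.nextSeq)) {
      const ready = this.pending.get(this.nextSeq) ?? "";
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
      if (ready.length > 0) this.emit(ready);
    }
  }

  // Call on stop or restart; in-flight results from before this point are ignored.
  reset(): void {
    this.epoch++;
    this.issued = 0;
    this.nextSeq = 0;
    this.pending.clear();
  }
}
```

Buffering by sequence number keeps the inserted text in spoken order even when a short later segment is transcribed faster than a long earlier one.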