Commit Graph

11 Commits

Author SHA1 Message Date
claude code agent 227
7c48bab1f2 test: add unit tests for 10 candidates from issue #139
Adds co-located unit tests for ten targets (client → vitest *.test.ts(x),
server → jest *.spec.ts), plus minimal behavior-preserving extractions/exports
where the issue required a pure function to test:

- encode-wav: WAV header + PCM16 clamping
- editor-ext embed-provider / utils (sanitizeUrl, isInternalFileUrl) / indent
  (export clampIndent)
- label.dto @Matches regex
- move-page.dto vs generateJitteredKeyBetween parity (bug locked via test.failing)
- new-note-button canCreatePage (extracted to can-create-page.ts)
- history-editor diff (extracted pure computeHistoryDiff into history-diff.ts)
- notification getTypesForTab + repo contract (direct-tab divergence locked via
  test.failing)
- search buildTsQuery (extracted + sanitizes operator inputs so adversarial
  queries no longer risk a to_tsquery 500)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 04:13:44 +03:00
claude_code
8f01a01122 fix(dictation): start streaming dictation on the first click
The streaming mic button only began recording on the SECOND click. The VAD
library creates its AudioContext inside vad.start() and never resumes it; on the
first click the lazy model load (import + MicVAD.new) ran first, so the context
was created after the user-gesture window expired and started suspended — the
audio worklet never ran, so nothing happened. The second click was fast (model
cached) so the context landed inside the gesture and worked.

Create and resume our own AudioContext synchronously at the top of start()
(inside the click gesture, before the model load) and inject it into MicVAD,
which then does not take ownership of it; it is reused across start/stop and
closed only on unmount. Add a "loading" status so the first click is shown as a
spinner (disabled) while the model loads, which also blocks a confusing second
click.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 18:42:44 +03:00
claude_code
373c56c0d3 fix(dictation): cut on ~1.5s silence instead of 0.64s
Streaming dictation sends one transcription request per ended speech segment.
With redemptionMs=640 the VAD cut on every ~0.64s gap, so normal halting speech
fragmented into many segments and flooded /ai-chat/transcribe — tripping the
per-user rate limit even at modest real usage.

Raise redemptionMs to 1500 so a cut only happens on a real sentence/thought
pause (~the "couple seconds" the feature was meant to use). Request count now
tracks actual pauses rather than inter-word gaps; the server throttle is left
unchanged (the earlier limit bump was treating the symptom).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 18:04:35 +03:00
claude_code
7093f184b2 fix(dictation): self-host Silero VAD / onnxruntime-web assets
Streaming dictation failed at runtime with "no available backend found /
'text/html' is not a valid JavaScript MIME type": @ricky0123/vad-web 0.0.30
defaults baseAssetPath/onnxWASMBasePath to "./" (relative to the page URL),
so the worklet, Silero model and ORT wasm/mjs were requested against the SPA
catch-all and came back as index.html.

Serve them from a fixed /vad/ instead:
- scripts/copy-vad-assets.mjs copies the 4 runtime assets (vad worklet,
  silero_vad_v5.onnx, ort-wasm-simd-threaded.jsep.{mjs,wasm}) from node_modules
  into apps/client/public/vad/ (gitignored — the ORT wasm is ~26 MB)
- client dev/build scripts run the copy first so the assets are always present
- useStreamingDictation points both path constants at "/vad/"

Verified: dev server serves all four under /vad/ with HTTP 200 and correct
Content-Type (js/wasm, never text/html); tsc clean. Prod (Docker) build runs
the copy step, so dist/vad/* ships in the image.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 17:19:11 +03:00
claude_code
4f0da42d88 feat(dictation): streaming STT via silence cut (Silero VAD)
Add a lightweight "streaming" dictation mode as a simpler alternative to the
realtime-websocket path: detect speech with Silero VAD (@ricky0123/vad-web),
cut each segment on a pause and POST it to the existing /ai-chat/transcribe
endpoint, so text appears progressively. No server changes.

- new useStreamingDictation hook (same API as useDictation), lazy-loads VAD,
  in-order seq emission, session-epoch guard against stop->start races
- new encodeWavPcm16 util (Float32 -> mono PCM16 WAV, accepted by the server)
- MicButton gains a `streaming` prop; enabled in the editor toolbar and chat
- VAD tuning: redemptionMs 640 / preSpeechPadMs 320 / minSpeechMs 96
- batch dictation kept as the fallback (streaming=false)
- deps: @ricky0123/vad-web@0.0.30, onnxruntime-web@1.27.0

Note: VAD assets load from the library CDN by default; for self-hosted/offline
set VAD_BASE_ASSET_PATH/VAD_ONNX_WASM_BASE_PATH and copy assets to public/vad/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 16:52:05 +03:00
claude_code
89ac8fa37b style(dictation): match mic button halo radius to button shape
Update the halo's border-radius from a fixed 50% circle to the theme's default radius variable. This ensures the red pulse follows the button's rounded‑square outline instead of appearing circular.
2026-06-22 16:01:53 +03:00
claude_code
ef74058301 style(editor): align byline dictation mic with the info icon
The byline mic rendered blue and with a smaller (16px) glyph next to the
gray 20px info icon, so it looked misaligned with an uneven gap. Add
optional color/iconSize props to MicButton (forwarded through
DictationGroup) and render the byline mic gray at 20px, wrapping it and
the info icon in a tight nowrap group so they read as a snug, aligned
pair. The AI chat mic is unchanged (passes neither prop).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 02:48:50 +03:00
claude_code
0deded342d fix(dictation): drive the recording halo from mic level under reduced-motion
The live mic-level halo around the stop button was frozen at a constant
scale (1.15) whenever the OS "Reduce motion" setting was on, so it never
reacted to the voice while dictating. Make haloScale unconditional so it
always follows audioLevel (amplitude 0.9), and drop the now-unused
useReducedMotion import and reduceMotion local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 23:34:07 +03:00
claude_code
55625874c5 feat(dictation): show live mic level while recording
Add a pulsing halo behind the stop button that scales with the
microphone input level, giving real-time feedback that recording is
active and the mic is picking up sound.

- use-dictation: meter the captured MediaStream via AudioContext +
  AnalyserNode (analyser only, never connected to destination), compute
  a smoothed RMS audioLevel (0..1) in a requestAnimationFrame loop, and
  tear the meter down on every recording-end path (stop/cancel/auto-stop/
  unmount); meter failure is non-fatal to recording
- mic-button: render a translucent red halo whose scale follows
  audioLevel; honor prefers-reduced-motion with a static halo
- stop(): recover and release resources when no live recorder remains
- fix unhandled rejection from AudioContext.resume()
2026-06-21 21:04:22 +03:00
vvzvlad
77249d59c6 feat(ai): OpenRouter STT support + real error surfacing + STT endpoint test
- ai.service: route *.openrouter.ai STT to its JSON+base64
  /audio/transcriptions API; keep the OpenAI multipart path (AI SDK) for
  OpenAI/self-hosted whisper. Unify transcription behind transcribe().
- /transcribe controller: surface the real provider/transport reason
  (describeProviderError) instead of an opaque 500; preserve HttpException.
- testConnection: add an 'stt' capability (silent-WAV probe) + DTO; client
  gets a Test endpoint button and status dot on the Voice/STT card.
- useDictation: log full errors to the console and show the real reason
  (mic start + transcription paths); handle NotReadable/Abort and missing
  mediaDevices.
- docs(CLAUDE.md): require full error logging + specific user-facing messages.
2026-06-18 19:26:35 +03:00
vvzvlad
874bdd021c feat(ai): server-side voice dictation (STT) with mic in chat and editor
Add push-to-talk voice dictation that transcribes recorded audio on the
server via the workspace's OpenAI-compatible AI provider (Whisper /
gpt-4o-transcribe / self-hosted whisper), then inserts the text.

Backend:
- New `stt_api_key_enc` column + migration; STT creds parity with chat/
  embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to
  chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace
  scope + throttle, 25MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).

Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the
  Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine),
  MicButton, transcribe service; integrated into the chat composer and a
  new editor-toolbar dictation group, both gated by ai.dictation.
2026-06-18 18:45:33 +03:00