Add docs/backlog/stt-providers-and-async.md. It covers how to add new
synchronous STT request formats (Deepgram, native Gemini, ElevenLabs) via
the explicit sttApiStyle axis, identifies which providers are inherently
async and do not fit the current sync model, and proposes a target
job-based async architecture (BullMQ job table, sync+async unification,
polling -> push -> live streaming) with the migration path and
security/cleanup considerations.

Add docs/streaming-dictation-plan.md, a design document for true
"text appears as you speak" dictation via the OpenAI Realtime API.
- Maps the current batch dictation flow (client MediaRecorder -> single
blob -> POST /ai-chat/transcribe) and explains why streaming is impossible
there.
- Documents the Realtime API contract (transcription session, ephemeral
token, pcm16 audio, input_audio_buffer.append, input_audio_transcription
delta/completed events, server_vad).
- Recommends a server-side WS proxy transport (key stays server-side,
SSRF-guarded, provider-agnostic via sttBaseUrl) over direct browser
WebRTC, and a ProseMirror decoration for interim text with final-only
commit to avoid polluting Yjs collab/history.
- Covers config additions, AudioWorklet PCM16 capture, security per repo
conventions, edge cases, phased rollout, risks, and impacted files.
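
For reviewers unfamiliar with the PCM16 capture step, the conversion an
AudioWorklet would need before sending audio could look roughly like the
sketch below. The helper name floatToPcm16 and the clamping policy are
assumptions made for illustration; they are not taken from the plan
document.

```typescript
// Hypothetical helper (name and clamping policy are assumptions):
// converts Web Audio float samples in [-1, 1] to signed 16-bit PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Asymmetric scaling: -1 maps to -32768 and +1 maps to 32767.
    // Storing into an Int16Array truncates the fractional part.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

The resulting samples would still need to be base64-encoded and wrapped in
an input_audio_buffer.append event; that part depends on the transport
chosen and is left out here.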