gitmost/apps/client/src/features/dictation/services/dictation-service.ts
vvzvlad 874bdd021c feat(ai): server-side voice dictation (STT) with mic in chat and editor
Add push-to-talk voice dictation that transcribes recorded audio on the
server via the workspace's OpenAI-compatible AI provider (Whisper /
gpt-4o-transcribe / self-hosted whisper), then inserts the text.

Backend:
- New `stt_api_key_enc` column + migration; STT creds parity with chat/
  embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to
  chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace
  scope + throttle, 25MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).
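
A minimal sketch of the upload checks named above (25MB cap, MIME whitelist). The function name `checkAudioUpload`, the whitelist contents, the binary interpretation of "25MB", and the 413/415 status codes are all assumptions for illustration; the real endpoint's implementation is not shown here.

```typescript
// Hypothetical whitelist: the actual server-side list is not part of this file.
const ALLOWED_AUDIO_MIME = new Set<string>([
  "audio/webm",
  "audio/ogg",
  "audio/mp4",
  "audio/mpeg",
  "audio/wav",
]);

// Assumption: "25MB" is read as 25 MiB.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

type UploadCheck =
  | { ok: true }
  | { ok: false; status: 413 | 415; reason: string };

// Validate an upload by its declared MIME type and byte size.
function checkAudioUpload(mimeType: string, sizeBytes: number): UploadCheck {
  // Drop parameters such as ";codecs=opus" before comparing.
  const baseType = (mimeType.split(";", 1)[0] ?? "").trim().toLowerCase();
  if (!ALLOWED_AUDIO_MIME.has(baseType)) {
    return { ok: false, status: 415, reason: `unsupported type: ${baseType}` };
  }
  if (sizeBytes > MAX_AUDIO_BYTES) {
    return { ok: false, status: 413, reason: "audio exceeds size cap" };
  }
  return { ok: true };
}
```

Stripping MIME parameters first matters because recorders commonly report types like `audio/webm;codecs=opus`, which would otherwise miss an exact-match whitelist.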

Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the
  Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine),
  MicButton, transcribe service; integrated into the chat composer and a
  new editor-toolbar dictation group, both gated by ai.dictation.
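
One way to picture the "MediaRecorder state machine" mentioned for useDictation is a small transition table. This is only a sketch: the state names, event names, and `nextDictationState` are hypothetical and are not taken from the actual hook.

```typescript
// Hypothetical states and events; the real useDictation hook may differ.
type DictationState =
  | "idle"
  | "requesting"
  | "recording"
  | "transcribing"
  | "error";

type DictationEvent =
  | "start"
  | "micGranted"
  | "micDenied"
  | "stop"
  | "transcribed"
  | "failed"
  | "reset";

// Allowed transitions; any event not listed for a state is ignored.
const TRANSITIONS: Record<
  DictationState,
  Partial<Record<DictationEvent, DictationState>>
> = {
  idle: { start: "requesting" },
  requesting: { micGranted: "recording", micDenied: "error" },
  recording: { stop: "transcribing", failed: "error" },
  transcribing: { transcribed: "idle", failed: "error" },
  error: { reset: "idle", start: "requesting" },
};

// Pure transition function: returns the next state, or the same state
// when the event is not valid in the current state.
function nextDictationState(
  state: DictationState,
  event: DictationEvent,
): DictationState {
  return TRANSITIONS[state][event] ?? state;
}
```

Keeping the transitions in a pure function makes push-to-talk edge cases (for example a release event arriving before the microphone permission resolves) easy to unit-test without a real MediaRecorder.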
2026-06-18 18:45:33 +03:00


import api from "@/lib/api-client";

// POST the recorded audio as multipart/form-data; the server transcribes it with
// the workspace STT model and returns { text } (wrapped in the standard envelope,
// so the value is at res.data.text). `filename` only sets the part's filename; the
// server keys off the blob's MIME type.
export async function transcribeAudio(
  blob: Blob,
  filename = "speech.webm",
): Promise<string> {
  const form = new FormData();
  form.append("file", blob, filename);
  const res = await api.post<{ text: string }>("/ai-chat/transcribe", form, {
    headers: { "Content-Type": "multipart/form-data" },
  });
  return res.data.text;
}
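
The default `speech.webm` filename may not match the recording on browsers whose MediaRecorder emits another container (for example `audio/mp4`). Although the server keys off the MIME type, a caller could keep the filename consistent with it. The helper below is a hypothetical sketch; `filenameForMime` and its extension map are assumptions, not part of this module.

```typescript
// Hypothetical MIME-to-extension map for common recorder outputs.
const EXTENSION_BY_MIME: Record<string, string> = {
  "audio/webm": "webm",
  "audio/ogg": "ogg",
  "audio/mp4": "m4a",
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
};

// Derive an upload filename from a MIME type, ignoring parameters such as
// ";codecs=opus". Unknown or empty types fall back to the given default.
function filenameForMime(mimeType: string, fallback = "speech.webm"): string {
  const baseType = (mimeType.split(";", 1)[0] ?? "").trim().toLowerCase();
  const ext = EXTENSION_BY_MIME[baseType];
  return ext ? `speech.${ext}` : fallback;
}
```

A caller could then pass `filenameForMime(blob.type)` as the second argument to `transcribeAudio`.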