Add push-to-talk voice dictation that transcribes recorded audio on the server via the workspace's OpenAI-compatible AI provider (Whisper / gpt-4o-transcribe / self-hosted whisper), then inserts the text.

Backend:
- New `stt_api_key_enc` column + migration; STT creds parity with chat/embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace scope + throttle, 25MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).

Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine), MicButton, transcribe service; integrated into the chat composer and a new editor-toolbar dictation group, both gated by ai.dictation.
18 lines
659 B
TypeScript
import api from "@/lib/api-client";

// POST the recorded audio as multipart/form-data; the server transcribes it with
// the workspace STT model and returns { text } (wrapped in the standard envelope,
// so the value is at req.data.text). `filename` only sets the filename of the
// "file" part; the server keys off the blob's MIME type.
export async function transcribeAudio(
  blob: Blob,
  filename = "speech.webm",
): Promise<string> {
  const form = new FormData();
  form.append("file", blob, filename);
  const req = await api.post<{ text: string }>("/ai-chat/transcribe", form, {
    headers: { "Content-Type": "multipart/form-data" },
  });
  return req.data.text;
}
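Because the endpoint enforces a 25MB cap and a MIME whitelist, a caller might want to reject obviously bad recordings before uploading. The following is a minimal sketch of such a pre-check, not part of this commit: `checkAudio`, `baseMimeType`, `MAX_AUDIO_BYTES`, and `ASSUMED_ALLOWED_MIME` are hypothetical names, the allowed-type list is a guess rather than the server's actual whitelist, and the cap is assumed to mean 25 * 1024 * 1024 bytes. It strips MIME parameters because MediaRecorder commonly reports types such as `audio/webm;codecs=opus`.

```typescript
// Hypothetical client-side guard (not in the commit). The server remains the
// source of truth; this only avoids uploads that would likely be rejected.

// Assumption: the "25MB cap" is 25 * 1024 * 1024 bytes.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

// Assumption: illustrative list only; the real server whitelist may differ.
const ASSUMED_ALLOWED_MIME: readonly string[] = [
  "audio/webm",
  "audio/ogg",
  "audio/mp4",
  "audio/mpeg",
  "audio/wav",
];

// Drop parameters such as ";codecs=opus" and lowercase the base type.
function baseMimeType(type: string): string {
  const [base = ""] = type.split(";");
  return base.trim().toLowerCase();
}

// Returns null when the audio looks acceptable, otherwise a short reason.
// Accepts anything with `size` and `type`, so a Blob can be passed directly.
function checkAudio(audio: { size: number; type: string }): string | null {
  if (audio.size === 0) return "empty recording";
  if (audio.size > MAX_AUDIO_BYTES) return "recording exceeds 25MB";
  if (!ASSUMED_ALLOWED_MIME.includes(baseMimeType(audio.type))) {
    return `unsupported audio type: ${audio.type || "unknown"}`;
  }
  return null;
}
```

A caller could run `checkAudio(blob)` first and only call `transcribeAudio(blob)` when it returns `null`, surfacing the returned reason in the UI otherwise.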