feat(ai): server-side voice dictation (STT) with mic in chat and editor
Add push-to-talk voice dictation that transcribes recorded audio on the server via the workspace's OpenAI-compatible AI provider (Whisper / gpt-4o-transcribe / self-hosted whisper), then inserts the text. Backend: - New `stt_api_key_enc` column + migration; STT creds parity with chat/ embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to chat baseUrl/key). Both provider whitelists updated (service + repo). - AiService.getTranscriptionModel + AiTranscriptionService. - Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace scope + throttle, 25MB cap, MIME whitelist, never logs audio/key). - New `settings.ai.dictation` workspace flag (DTO + service + audit). Frontend: - Wire up the Voice/STT settings card (model/base URL/key) and the Voice-dictation toggle. - New `features/dictation`: useDictation (MediaRecorder state machine), MicButton, transcribe service; integrated into the chat composer and a new editor-toolbar dictation group, both gated by ai.dictation.
This commit is contained in:
@@ -4,6 +4,7 @@ import {
|
||||
generateText,
|
||||
type EmbeddingModel,
|
||||
type LanguageModel,
|
||||
type TranscriptionModel,
|
||||
} from 'ai';
|
||||
import { createOpenAI } from '@ai-sdk/openai';
|
||||
import { createGoogleGenerativeAI } from '@ai-sdk/google';
|
||||
@@ -11,6 +12,7 @@ import { createOllama } from 'ai-sdk-ollama';
|
||||
import { AiSettingsService } from './ai-settings.service';
|
||||
import { AiNotConfiguredException } from './ai-not-configured.exception';
|
||||
import { AiEmbeddingNotConfiguredException } from './ai-embedding-not-configured.exception';
|
||||
import { AiSttNotConfiguredException } from './ai-stt-not-configured.exception';
|
||||
import { describeProviderError } from './ai-error.util';
|
||||
|
||||
/**
|
||||
@@ -106,6 +108,26 @@ export class AiService {
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolve the workspace config and build the transcription (STT) model.
|
||||
* STT always speaks the OpenAI-compatible /v1/audio/transcriptions API
|
||||
* (only @ai-sdk/openai exposes .transcription()), regardless of the chat
|
||||
* driver. sttBaseUrl falls back to the chat baseUrl; the API key falls back
|
||||
* to the chat key (resolved by AiSettingsService.resolve). Built PER WORKSPACE
|
||||
* on demand; the decrypted key is never logged.
|
||||
*
|
||||
* Throws AiSttNotConfiguredException (-> 503) when no STT model is set.
|
||||
*/
|
||||
async getTranscriptionModel(workspaceId: string): Promise<TranscriptionModel> {
|
||||
const cfg = await this.aiSettings.resolve(workspaceId);
|
||||
if (!cfg?.sttModel) throw new AiSttNotConfiguredException();
|
||||
const baseURL = cfg.sttBaseUrl || cfg.baseUrl; // stt-specific, else chat
|
||||
// apiKey may be unused for keyless self-hosted whisper; pass a placeholder.
|
||||
return createOpenAI({ apiKey: cfg.sttApiKey ?? 'unused', baseURL }).transcription(
|
||||
cfg.sttModel,
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Embed a batch of texts with the workspace embedding model. Returns one
|
||||
* vector per input, in the same order. Thin wrapper over the AI SDK's
|
||||
|
||||
Reference in New Issue
Block a user