feat(ai): server-side voice dictation (STT) with mic in chat and editor

Add push-to-talk voice dictation that transcribes recorded audio on the
server via the workspace's OpenAI-compatible AI provider (Whisper /
gpt-4o-transcribe / self-hosted whisper), then inserts the text.

Backend:
- New `stt_api_key_enc` column + migration; STT credentials at parity with
  chat/embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, falling
  back to the chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (403 when the dictation flag is off, JWT +
  workspace scope + throttle, 25 MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).
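
  The gating order described for the transcribe endpoint can be sketched as a
  pure check. This is only an illustration: the function name, the whitelist
  contents, and the 415/413 status codes are assumptions (the commit message
  only states the 403 for the dictation flag, the 25 MB cap, and that a MIME
  whitelist exists).

```typescript
// Hypothetical limits mirroring the commit message: 25 MB cap + MIME whitelist.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

// Assumed whitelist; the actual list used by the endpoint is not shown here.
const ALLOWED_AUDIO_MIME = new Set<string>([
  "audio/webm",
  "audio/ogg",
  "audio/mpeg",
  "audio/mp4",
  "audio/wav",
]);

type UploadCheck =
  | { ok: true }
  | { ok: false; status: number; reason: string };

interface TranscribeUpload {
  dictationEnabled: boolean; // value of the workspace flag settings.ai.dictation
  mimeType: string;          // e.g. "audio/webm;codecs=opus"
  sizeBytes: number;
}

function checkTranscribeUpload(input: TranscribeUpload): UploadCheck {
  // Feature flag first: a disabled workspace gets 403 regardless of payload.
  if (!input.dictationEnabled) {
    return { ok: false, status: 403, reason: "dictation disabled" };
  }
  // Drop parameters such as ";codecs=opus" before consulting the whitelist.
  const baseMime = input.mimeType.split(";")[0].trim().toLowerCase();
  if (!ALLOWED_AUDIO_MIME.has(baseMime)) {
    // 415 is an assumed status code for this sketch.
    return { ok: false, status: 415, reason: "unsupported audio type" };
  }
  if (input.sizeBytes > MAX_AUDIO_BYTES) {
    // 413 is an assumed status code for this sketch.
    return { ok: false, status: 413, reason: "audio too large" };
  }
  return { ok: true };
}
```

  Keeping the check free of I/O means the flag, type, and size rules can be
  unit-tested without a running server; JWT, workspace scope, and throttling
  would sit in front of it as guards.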

Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the
  Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine),
  MicButton, transcribe service; integrated into the chat composer and a
  new editor-toolbar dictation group, both gated by ai.dictation.
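
  A push-to-talk recorder state machine of the kind useDictation is described
  as could look roughly like the transition table below. The state and event
  names are hypothetical, chosen for illustration; the actual hook may use
  different states and also drives a browser MediaRecorder, which is left out
  here so the sketch stays runnable outside a browser.

```typescript
// Hypothetical states and events; not taken from the actual hook.
type DictationState = "idle" | "recording" | "transcribing" | "error";
type DictationEvent = "START" | "STOP" | "TRANSCRIBED" | "FAILED" | "RESET";

// Allowed transitions. Any event not listed for a state is ignored.
const TRANSITIONS: Record<
  DictationState,
  Partial<Record<DictationEvent, DictationState>>
> = {
  idle: { START: "recording" },
  recording: { STOP: "transcribing", FAILED: "error" },
  transcribing: { TRANSCRIBED: "idle", FAILED: "error" },
  error: { RESET: "idle", START: "recording" },
};

// Pure transition function: returns the next state, or the current state
// unchanged when the event is not valid in that state.
function nextDictationState(
  state: DictationState,
  event: DictationEvent,
): DictationState {
  return TRANSITIONS[state][event] ?? state;
}
```

  Ignoring invalid events (rather than throwing) keeps a double click on the
  mic button or a late network response from putting the UI into an
  inconsistent state.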
vvzvlad
2026-06-18 18:45:33 +03:00
parent 49eba22201
commit 874bdd021c
24 changed files with 845 additions and 39 deletions

@@ -98,4 +98,42 @@ export class AiProviderCredentialsRepo {
       .where('driver', '=', driver)
       .execute();
   }
+
+  // Upsert the STT-specific encrypted key. If no row exists yet this inserts one
+  // with `apiKeyEnc` left null (the column is nullable). On conflict only
+  // `sttApiKeyEnc` / `updatedAt` are touched, so the chat & embedding keys are kept.
+  async upsertSttKey(
+    workspaceId: string,
+    driver: string,
+    sttApiKeyEnc: string,
+    trx?: KyselyTransaction,
+  ): Promise<AiProviderCredentials> {
+    const db = dbOrTx(this.db, trx);
+    return db
+      .insertInto('aiProviderCredentials')
+      .values({ workspaceId, driver, sttApiKeyEnc })
+      .onConflict((oc) =>
+        oc.columns(['workspaceId', 'driver']).doUpdateSet({
+          sttApiKeyEnc,
+          updatedAt: new Date(),
+        }),
+      )
+      .returningAll()
+      .executeTakeFirst();
+  }
+
+  // Clear only the STT-specific key; the chat & embedding keys are kept.
+  async clearSttKey(
+    workspaceId: string,
+    driver: string,
+    trx?: KyselyTransaction,
+  ): Promise<void> {
+    const db = dbOrTx(this.db, trx);
+    await db
+      .updateTable('aiProviderCredentials')
+      .set({ sttApiKeyEnc: null, updatedAt: new Date() })
+      .where('workspaceId', '=', workspaceId)
+      .where('driver', '=', driver)
+      .execute();
+  }
 }
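
The upsert/clear semantics described in the comments above (insert with a null chat key when no row exists, otherwise touch only the STT key and timestamp) can be modelled without a database. The sketch below is an in-memory stand-in, not the repository itself: `InMemoryCredentialsStore`, `CredentialsRow`, and `seed` are hypothetical names, and only the columns mentioned in the hunk are modelled.

```typescript
// Hypothetical row shape limited to the columns named in the hunk above.
interface CredentialsRow {
  workspaceId: string;
  driver: string;
  apiKeyEnc: string | null;    // chat key, must survive STT updates
  sttApiKeyEnc: string | null; // STT-specific key
  updatedAt: Date;
}

// In-memory model of the (workspaceId, driver) unique constraint.
class InMemoryCredentialsStore {
  private rows = new Map<string, CredentialsRow>();

  private key(workspaceId: string, driver: string): string {
    return JSON.stringify([workspaceId, driver]);
  }

  // Test helper for this sketch only: preload an existing row.
  seed(row: CredentialsRow): void {
    this.rows.set(this.key(row.workspaceId, row.driver), row);
  }

  get(workspaceId: string, driver: string): CredentialsRow | undefined {
    return this.rows.get(this.key(workspaceId, driver));
  }

  // Insert a new row with a null chat key, or update only the STT key and
  // timestamp when the (workspaceId, driver) pair already exists.
  upsertSttKey(workspaceId: string, driver: string, sttApiKeyEnc: string): CredentialsRow {
    const existing = this.get(workspaceId, driver);
    const row: CredentialsRow = existing
      ? { ...existing, sttApiKeyEnc, updatedAt: new Date() }
      : { workspaceId, driver, apiKeyEnc: null, sttApiKeyEnc, updatedAt: new Date() };
    this.rows.set(this.key(workspaceId, driver), row);
    return row;
  }

  // Null out only the STT key; a missing row is a no-op, like an UPDATE
  // that matches zero rows.
  clearSttKey(workspaceId: string, driver: string): void {
    const existing = this.get(workspaceId, driver);
    if (!existing) return;
    this.rows.set(this.key(workspaceId, driver), {
      ...existing,
      sttApiKeyEnc: null,
      updatedAt: new Date(),
    });
  }
}
```

A model like this is mainly useful in unit tests for the service layer, where the property worth asserting is that setting or clearing the STT key never disturbs the chat key.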

@@ -239,7 +239,7 @@ export class WorkspaceRepo {
     // is a real jsonb object, never a double-encoded string. The CASE self-heals
     // workspaces whose settings.ai.provider was previously corrupted into an
     // array/string.
-    const ALLOWED = ['driver', 'chatModel', 'embeddingModel', 'baseUrl', 'embeddingBaseUrl', 'systemPrompt'];
+    const ALLOWED = ['driver', 'chatModel', 'embeddingModel', 'baseUrl', 'embeddingBaseUrl', 'sttModel', 'sttBaseUrl', 'systemPrompt'];
     const entries = Object.entries(provider).filter(
       ([k, v]) => v !== undefined && ALLOWED.includes(k),
     );
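
The effect of the widened whitelist can be checked in isolation. The sketch below lifts the filter from the hunk above into a standalone function; `pickAllowedProviderEntries` is a hypothetical name, and the sample values in the usage note are made up.

```typescript
// Whitelist as it stands after this commit (copied from the hunk above).
const ALLOWED: readonly string[] = [
  'driver',
  'chatModel',
  'embeddingModel',
  'baseUrl',
  'embeddingBaseUrl',
  'sttModel',
  'sttBaseUrl',
  'systemPrompt',
];

// Keep only whitelisted keys whose value is not undefined. Unknown keys
// (including any raw API key passed by mistake) are dropped.
function pickAllowedProviderEntries(
  provider: Record<string, unknown>,
): [string, unknown][] {
  return Object.entries(provider).filter(
    ([k, v]) => v !== undefined && ALLOWED.includes(k),
  );
}
```

For example, passing `{ sttModel: 'whisper-1', sttApiKey: 'secret', baseUrl: undefined, driver: 'openai' }` keeps only the `sttModel` and `driver` entries: `sttApiKey` is not whitelisted (keys are stored separately, encrypted) and `baseUrl` is undefined.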