Unit tests for the safety-critical paths: crypto secret-box (round-trip,
tamper detection, wrong key), the SSRF guard (blocked ranges + DNS-rebinding),
the ai-chat tools service, the page-embedding repo, and the
assistant-parts/serialization helpers. Those server helpers (assistantParts,
rowToUiMessage, serializeSteps) are exported ONLY for the tests — no runtime
change.
Also: keyboard a11y on the chat history header and conversation rows
(role/tabIndex/Enter+Space), and DRY refactors that move shared logic into one
place (isToolPart -> tool-parts util; buildInitialValues in the MCP form).
The behaviour-changing edits that previously rode along in this commit are
split out into the following two commits, per review.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add push-to-talk voice dictation that transcribes recorded audio on the
server via the workspace's OpenAI-compatible AI provider (Whisper /
gpt-4o-transcribe / self-hosted whisper), then inserts the text.
Backend:
- New `stt_api_key_enc` column + migration; STT creds parity with chat/
embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to
chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace
scope + throttle, 25MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).
Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the
Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine),
MicButton, transcribe service; integrated into the chat composer and a
new editor-toolbar dictation group, both gated by ai.dictation.
Improve agent RAG quality with three changes, plus a roadmap doc for the rest.
- Indexer: prefix each chunk with its heading path ("Page > H1 > H2"), built by
walking the ProseMirror JSON (heading nodes) so a `#` inside a fenced code block
is never mistaken for a heading. Falls back to plain-text chunking on any error.
buildChunkRows: drop indexOf-against-source offsets (breadcrumb prefixes break
verbatim matching) for a cumulative cursor — offsets are provenance-only.
- Hybrid search: new migration adds a generated `fts` tsvector column + GIN index
to page_embeddings (same english+f_unaccent config as pages.tsv). New
PageEmbeddingRepo.hybridSearch fuses cosine + full-text rankings via Reciprocal
Rank Fusion (k=60, equal weights) in one SQL query at chunk granularity.
- Tools: collapse semanticSearch + searchPages into one hybrid `searchPages` tool
with a query-rewrite-oriented description; gracefully falls back to the REST
full-text path when embeddings are unconfigured. Access control (space scope +
page-permission post-filter) preserved. Add a query-rewrite hint to the default
system prompt.
- docs/rag-improvements-plan.md: record what shipped and the deferred backlog
(reranker, attachment indexing, eval harness, tuning).
Note: requires a corpus reindex to populate breadcrumbs on existing pages.
The WORKSPACE_CREATE_EMBEDDINGS / WORKSPACE_DELETE_EMBEDDINGS jobs were
enqueued (on AI Search enable/disable) but had no AI_QUEUE handler, so
existing pages were never indexed ("Indexed 0 of N pages") and disabling
never purged embeddings.
- EmbeddingProcessor: handle WORKSPACE_CREATE_EMBEDDINGS (bulk reindex all
live pages) and WORKSPACE_DELETE_EMBEDDINGS (purge workspace embeddings)
- EmbeddingIndexerService: add reindexWorkspace() (skips when embeddings
unconfigured; per-page error isolation) and removeWorkspace()
- PageRepo.getIdsByWorkspace(), PageEmbeddingRepo.deleteByWorkspace()
- AiSettingsService.reindex() + admin-only POST /workspace/ai-settings/reindex
- Frontend: "Reindex now" button, service call and mutation
- Stable per-workspace jobId with remove-before-add so a stale job can't
block future reindexes; cancel the delayed purge on enable/reindex so it
can't wipe freshly-built embeddings
Per-workspace AI provider config previously shared a single base URL and
a single API key between the chat model and the embedding model. Add
dedicated, optional embedding endpoint/token that fall back to the chat
values when empty, preserving backward compatibility.
- db: new migration adds nullable `embedding_api_key_enc` to
`ai_provider_credentials`; chat key stays in `api_key_enc`
- repo: add `upsertEmbeddingKey` / `clearEmbeddingKey` (on-conflict
touches only its own column, so chat/embedding keys never overwrite)
- ai-settings.service: store non-secret `embeddingBaseUrl`; resolve()
applies fallback (embeddingBaseUrl || baseUrl; embedding key || chat
key); getMasked() exposes raw `embeddingBaseUrl` + `hasEmbeddingApiKey`,
never the key; update() handles the embedding key write-only
- ai.service: getEmbeddingModel() builds openai/gemini/ollama with the
embedding-specific URL/key; chat path unchanged
- client: new "Embedding base URL" and "Embedding API key" fields with
fallback hints and a clear-key action
Requires running the DB migration on deploy.
Display "Indexed N of M pages" on the AI provider settings page so admins
can see how much of the wiki is covered by vector-RAG semantic search.
- page-embedding.repo: add countIndexedPages() — distinct non-deleted pages
that have stored embeddings in the workspace
- page.repo: add countByWorkspace() — total non-deleted pages
- ai-settings.service: compute both counts in getMasked() (Promise.all) and
return them with the masked settings; inject PageEmbeddingRepo + PageRepo
- MaskedAiSettings / IAiSettings: add indexedPages + totalPages
- ai-provider-settings: render a dimmed coverage line under "Embedding model"
- i18n: add the "Indexed {{indexed}} of {{total}} pages" key (en-US, ru-RU)
so the v6 hook stops re-creating its store every render on a new chat
(which wiped the optimistic user message + streamed deltas, so nothing
showed until the turn finished). Also send X-Accel-Buffering:no + flushHeaders.
- context: client sends the currently-open page {id,title}; the system prompt
tells the agent which page 'this page' refers to (it reads it via its
CASL-scoped getPage tool; id is prompt-context only, no server-side fetch).
- embeddings: make page_embeddings.embedding dimension-agnostic (drop the
HNSW index + ALTER to vector), remove the hard 1536 guard, filter search by
model_dimensions — so 3072-dim (and any) models index instead of being
skipped. Seq-scan <=> search (wiki scale); existing pages reindex on next edit.
- openai provider: use .chat() (Chat Completions) instead of the default callable
(Responses API), which gateways reject on multi-turn -> 400.
- updateAiProviderSettings: assemble settings.ai.provider via jsonb_build_object
with ::text-cast bound params + jsonb_typeof self-heal (postgres.js was
double-encoding it into an array; the ::text cast avoids 'could not determine
data type of parameter').
- chat agent: drop the hard maxOutputTokens cap (truncated complex tool calls);
keep a tiny cap only on the test-connection ping.
- testConnection + chat stream: surface the real provider error (statusCode+message)
to logs and the UI instead of generic masks; never log the API key.
- chat UI: typing indicator, incremental streaming render, tool 'running' status, Stop.
Also bundled (prior uncommitted ai-chat work):
- history 'AI agent' provenance badge; vector RAG (pgvector image + page_embeddings
+ AI_QUEUE indexer + space-scoped semanticSearch); external MCP servers backend
(@ai-sdk/mcp client, SSRF IP-pinning, encrypted headers, admin CRUD/Test);
yjs duplicate-instance fix via pnpm patch (single CJS instance server-side).
WIP checkpoint of the gitmost AI-chat backend (plan stages A + B1 + B3a).
The agent acts under the requesting user's JWT (Docmost CASL enforces page
access); the external service-account /mcp endpoint is untouched.
LLM provider config (A2-A4):
- integrations/crypto: AES-256-GCM SecretBoxService (key derived from APP_SECRET,
per-record salt/iv; clear error on rotation instead of crashing).
- ai_provider_credentials table/repo/types: encrypted API key stored outside
workspace settings/baseFields, write-only (never returned by any endpoint).
- integrations/ai: per-workspace AI SDK v6 provider driver (openai/gemini/ollama),
admin-gated GET(masked)/PATCH(write-only key)/Test endpoints; settings.ai.provider
holds non-secret config incl. systemPrompt. Removed unused AI_* env getters (DB is
the single source of truth).
Chat module (A1, A5-A8):
- ai_chats/ai_chat_messages repos (workspace-scoped, soft-delete, tsv never selected).
- core/ai-chat: CRUD + POST /ai-chat/stream (Fastify hijack + AI SDK v6
pipeUIMessageStreamToResponse, abort on disconnect, persist user/assistant msgs).
- Agent loop: streamText + stepCountIs(8); read tools searchPages/getPage via a
per-request DocmostClient over loopback REST under the user's minted access token.
- Gate settings.ai.chat (+ 503 when provider unconfigured); buildSystemPrompt with a
non-removable safety/anti-prompt-injection framework. Per-user rate limit.
Per-user auth (B1):
- @docmost/mcp DocmostClient gains an additive getToken variant (carry a user JWT,
re-fetch on 401) and exports DocmostClient; the email/password service-account path
(external /mcp, stdio) is unchanged.
Agent-edit provenance backbone (B3a):
- Migration: pages/page_history (last_updated_source, last_updated_ai_chat_id) and
comments (created_source, ai_chat_id, resolved_source).
- Signed actor/aiChatId claim in the collab token; onAuthenticate propagates it,
onStoreDocument writes it with a sticky agent marker, saveHistory copies it.
Migrations auto-run on boot (additive). Write tools, frontend, RAG and external MCP
servers are not in this checkpoint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>