Commit Graph

124 Commits

Author SHA1 Message Date
claude code agent 227
f1980cf425 test(ai-chat): safety-critical coverage + a11y + pure refactors
Unit tests for the safety-critical paths: crypto secret-box (round-trip,
tamper detection, wrong key), the SSRF guard (blocked ranges + DNS-rebinding),
the ai-chat tools service, the page-embedding repo, and the
assistant-parts/serialization helpers. Those server helpers (assistantParts,
rowToUiMessage, serializeSteps) are exported ONLY for the tests — no runtime
change.

Also: keyboard a11y on the chat history header and conversation rows
(role/tabIndex/Enter+Space), and DRY refactors that move shared logic into one
place (isToolPart -> tool-parts util; buildInitialValues in the MCP form).

The behaviour-changing edits that previously rode along in this commit are
split out into the following two commits, per review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 17:58:44 +03:00
vvzvlad
01a5a4b5d2 refactor(ai): explicit STT request format instead of OpenRouter host-sniffing
Replace the implicit `hostname endsWith openrouter.ai` detection with an
explicit, admin-chosen provider field `sttApiStyle` ('multipart' = OpenAI-
compatible multipart /audio/transcriptions; 'json' = OpenRouter-style JSON +
base64 input_audio). The transcription path now branches on the stored field,
not on the URL — nothing hidden from the admin.

- ai.types: add SttApiStyle + STT_API_STYLES; field on AiProviderSettings and
  MaskedAiSettings (resolved via ResolvedAiConfig).
- update-ai-settings.dto: validate sttApiStyle with @IsIn(STT_API_STYLES).
- ai-settings.service: plumb sttApiStyle through resolve()/getMasked() and the
  non-secret update whitelist; workspace.repo: add it to the ALLOWED array so it
  persists.
- ai.service: drop isOpenRouter(); transcribe() branches on cfg.sttApiStyle;
  rename helper to transcribeJsonBase64 with provider-neutral error text and a
  BadRequestException (400) when the base URL is missing for the JSON style.
- client: SttApiStyle type on IAiSettings/IAiSettingsUpdate; "Request format"
  Select on the Voice/STT settings card; i18n.
2026-06-18 19:40:05 +03:00
vvzvlad
5af40e0ee5 refactor(db): replace STT credentials migration
Remove the outdated 20260618T130000 migration file and add the updated
20260618T160000 version to correct the timestamp and ensure proper ordering.
2026-06-18 18:54:24 +03:00
vvzvlad
874bdd021c feat(ai): server-side voice dictation (STT) with mic in chat and editor
Add push-to-talk voice dictation that transcribes recorded audio on the
server via the workspace's OpenAI-compatible AI provider (Whisper /
gpt-4o-transcribe / self-hosted whisper), then inserts the text.

Backend:
- New `stt_api_key_enc` column + migration; STT creds parity with chat/
  embeddings (sttModel/sttBaseUrl/sttApiKey, write-only key, fallbacks to
  chat baseUrl/key). Both provider whitelists updated (service + repo).
- AiService.getTranscriptionModel + AiTranscriptionService.
- Gated POST /ai-chat/transcribe (dictation flag → 403, JWT + workspace
  scope + throttle, 25MB cap, MIME whitelist, never logs audio/key).
- New `settings.ai.dictation` workspace flag (DTO + service + audit).

Frontend:
- Wire up the Voice/STT settings card (model/base URL/key) and the
  Voice-dictation toggle.
- New `features/dictation`: useDictation (MediaRecorder state machine),
  MicButton, transcribe service; integrated into the chat composer and a
  new editor-toolbar dictation group, both gated by ai.dictation.
2026-06-18 18:45:33 +03:00
vvzvlad
c8e41e8916 feat(ai): hybrid RRF retrieval, heading-breadcrumb chunks, merged search tool
Improve agent RAG quality with three changes, plus a roadmap doc for the rest.

- Indexer: prefix each chunk with its heading path ("Page > H1 > H2"), built by
  walking the ProseMirror JSON (heading nodes) so a `#` inside a fenced code block
  is never mistaken for a heading. Falls back to plain-text chunking on any error.
  buildChunkRows: drop indexOf-against-source offsets (breadcrumb prefixes break
  verbatim matching) for a cumulative cursor — offsets are provenance-only.
- Hybrid search: new migration adds a generated `fts` tsvector column + GIN index
  to page_embeddings (same english+f_unaccent config as pages.tsv). New
  PageEmbeddingRepo.hybridSearch fuses cosine + full-text rankings via Reciprocal
  Rank Fusion (k=60, equal weights) in one SQL query at chunk granularity.
- Tools: collapse semanticSearch + searchPages into one hybrid `searchPages` tool
  with a query-rewrite-oriented description; gracefully falls back to the REST
  full-text path when embeddings are unconfigured. Access control (space scope +
  page-permission post-filter) preserved. Add a query-rewrite hint to the default
  system prompt.
- docs/rag-improvements-plan.md: record what shipped and the deferred backlog
  (reranker, attachment indexing, eval harness, tuning).

Note: requires a corpus reindex to populate breadcrumbs on existing pages.
2026-06-18 03:43:01 +03:00
vvzvlad
91a63f0b2c fix(ai): stop RAG coverage bar sticking below 100% on empty pages
"Indexed N of M pages" stayed at e.g. "27 of 34" forever even after a
successful full reindex. The numerator counted pages that have embeddings
while the denominator counted ALL non-deleted pages, so empty / text-less
pages (which legitimately store zero embeddings) could never be reached.

Add PageRepo.countEmbeddablePages: counts non-deleted pages that have
non-empty textContent OR already have a stored embedding row, and use it as
the totalPages denominator in AiSettingsService.getMasked. The "has
embeddings" clause covers pages indexed from the content JSON (null
textContent) and guarantees indexedPages <= totalPages. No DB migration.
2026-06-18 03:33:38 +03:00
vvzvlad
52e19fe678 feat(ai): wire up workspace RAG bulk reindex + manual "Reindex now"
The WORKSPACE_CREATE_EMBEDDINGS / WORKSPACE_DELETE_EMBEDDINGS jobs were
enqueued (on AI Search enable/disable) but had no AI_QUEUE handler, so
existing pages were never indexed ("Indexed 0 of N pages") and disabling
never purged embeddings.

- EmbeddingProcessor: handle WORKSPACE_CREATE_EMBEDDINGS (bulk reindex all
  live pages) and WORKSPACE_DELETE_EMBEDDINGS (purge workspace embeddings)
- EmbeddingIndexerService: add reindexWorkspace() (skips when embeddings
  unconfigured; per-page error isolation) and removeWorkspace()
- PageRepo.getIdsByWorkspace(), PageEmbeddingRepo.deleteByWorkspace()
- AiSettingsService.reindex() + admin-only POST /workspace/ai-settings/reindex
- Frontend: "Reindex now" button, service call and mutation
- Stable per-workspace jobId with remove-before-add so a stale job can't
  block future reindexes; cancel the delayed purge on enable/reindex so it
  can't wipe freshly-built embeddings
2026-06-18 02:15:18 +03:00
vvzvlad
a7f244053b feat(ai): separate base URL and API key for chat vs embedding model
Per-workspace AI provider config previously shared a single base URL and
a single API key between the chat model and the embedding model. Add
dedicated, optional embedding endpoint/token that fall back to the chat
values when empty, preserving backward compatibility.

- db: new migration adds nullable `embedding_api_key_enc` to
  `ai_provider_credentials`; chat key stays in `api_key_enc`
- repo: add `upsertEmbeddingKey` / `clearEmbeddingKey` (on-conflict
  touches only its own column, so chat/embedding keys never overwrite)
- ai-settings.service: store non-secret `embeddingBaseUrl`; resolve()
  applies fallback (embeddingBaseUrl || baseUrl; embedding key || chat
  key); getMasked() exposes raw `embeddingBaseUrl` + `hasEmbeddingApiKey`,
  never the key; update() handles the embedding key write-only
- ai.service: getEmbeddingModel() builds openai/gemini/ollama with the
  embedding-specific URL/key; chat path unchanged
- client: new "Embedding base URL" and "Embedding API key" fields with
  fallback hints and a clear-key action

Requires running the DB migration on deploy.
2026-06-18 01:33:45 +03:00
vvzvlad
1f2d20244e feat(ai-chat): show RAG indexing coverage in AI settings
Display "Indexed N of M pages" on the AI provider settings page so admins
can see how much of the wiki is covered by vector-RAG semantic search.

- page-embedding.repo: add countIndexedPages() — distinct non-deleted pages
  that have stored embeddings in the workspace
- page.repo: add countByWorkspace() — total non-deleted pages
- ai-settings.service: compute both counts in getMasked() (Promise.all) and
  return them with the masked settings; inject PageEmbeddingRepo + PageRepo
- MaskedAiSettings / IAiSettings: add indexedPages + totalPages
- ai-provider-settings: render a dimmed coverage line under "Embedding model"
- i18n: add the "Indexed {{indexed}} of {{total}} pages" key (en-US, ru-RU)
2026-06-17 23:18:51 +03:00
vvzvlad
0a9788e89a feat(collab): separate agent edits from human edits in page history
Page-history snapshots are debounced/coalesced (one per 1–5 min window,
jobId=page.id). A human edit followed by an agent edit in the same window
collapsed into a single snapshot, losing both the pre-agent human state and
a deterministic record of the agent's result.

Two provenance-aware boundaries now bracket an agent intervention:
- Before: on a user->agent transition, onStoreDocument synchronously pins the
  current (pre-agent) human content as its own history version tagged 'user',
  inside the page-write transaction, before the agent overwrites it.
- After: agent stores enqueue an immediate (delay 0), source-keyed history job
  (jobId=`${pageId}:agent`) so the agent's result snapshots deterministically
  as 'agent' and a later human edit (jobId=page.id) cannot coalesce/retag it.

Also add an `id desc` tie-break to findPageLastHistory so "last history" stays
deterministic when two snapshots share a created_at, consistent with
findPageHistoryByPageId.

Known trade-offs (Variant 1): the delay-0 worker re-reads the row, leaving a
millisecond mis-tag window; multiple agent edits in one turn may yield multiple
versions. The reverse agent->human boundary is intentionally out of scope.
2026-06-17 06:40:28 +03:00
vvzvlad
65f0713a70 fix(ai-chat): live streaming, open-page context, any-dimension embeddings" -m "- streaming: give useChat a STABLE store id (chatId ?? per-mount generated)
so the v6 hook stops re-creating its store every render on a new chat
  (which wiped the optimistic user message + streamed deltas, so nothing
  showed until the turn finished). Also send X-Accel-Buffering:no + flushHeaders.
- context: client sends the currently-open page {id,title}; the system prompt
  tells the agent which page 'this page' refers to (it reads it via its
  CASL-scoped getPage tool; id is prompt-context only, no server-side fetch).
- embeddings: make page_embeddings.embedding dimension-agnostic (drop the
  HNSW index + ALTER to vector), remove the hard 1536 guard, filter search by
  model_dimensions — so 3072-dim (and any) models index instead of being
  skipped. Seq-scan <=> search (wiki scale); existing pages reindex on next edit.
2026-06-17 04:58:06 +03:00
vvzvlad
a4b7919753 fix(ai-chat): OpenAI Chat Completions for multi-turn + provider settings, stream UX & errors" -m "Live-stand fixes (OpenRouter / OpenAI-compatible):
- openai provider: use .chat() (Chat Completions) instead of the default callable
  (Responses API), which gateways reject on multi-turn -> 400.
- updateAiProviderSettings: assemble settings.ai.provider via jsonb_build_object
  with ::text-cast bound params + jsonb_typeof self-heal (postgres.js was
  double-encoding it into an array; the ::text cast avoids 'could not determine
  data type of parameter').
- chat agent: drop the hard maxOutputTokens cap (truncated complex tool calls);
  keep a tiny cap only on the test-connection ping.
- testConnection + chat stream: surface the real provider error (statusCode+message)
  to logs and the UI instead of generic masks; never log the API key.
- chat UI: typing indicator, incremental streaming render, tool 'running' status, Stop.

Also bundled (prior uncommitted ai-chat work):
- history 'AI agent' provenance badge; vector RAG (pgvector image + page_embeddings
  + AI_QUEUE indexer + space-scoped semanticSearch); external MCP servers backend
  (@ai-sdk/mcp client, SSRF IP-pinning, encrypted headers, admin CRUD/Test);
  yjs duplicate-instance fix via pnpm patch (single CJS instance server-side).
2026-06-17 04:28:29 +03:00
vvzvlad
683da7a4c5 feat(ai-chat): per-user AI agent backend — LLM config, read-only agent, provenance schema
WIP checkpoint of the gitmost AI-chat backend (plan stages A + B1 + B3a).
The agent acts under the requesting user's JWT (Docmost CASL enforces page
access); the external service-account /mcp endpoint is untouched.

LLM provider config (A2-A4):
- integrations/crypto: AES-256-GCM SecretBoxService (key derived from APP_SECRET,
  per-record salt/iv; clear error on rotation instead of crashing).
- ai_provider_credentials table/repo/types: encrypted API key stored outside
  workspace settings/baseFields, write-only (never returned by any endpoint).
- integrations/ai: per-workspace AI SDK v6 provider driver (openai/gemini/ollama),
  admin-gated GET(masked)/PATCH(write-only key)/Test endpoints; settings.ai.provider
  holds non-secret config incl. systemPrompt. Removed unused AI_* env getters (DB is
  the single source of truth).

Chat module (A1, A5-A8):
- ai_chats/ai_chat_messages repos (workspace-scoped, soft-delete, tsv never selected).
- core/ai-chat: CRUD + POST /ai-chat/stream (Fastify hijack + AI SDK v6
  pipeUIMessageStreamToResponse, abort on disconnect, persist user/assistant msgs).
- Agent loop: streamText + stepCountIs(8); read tools searchPages/getPage via a
  per-request DocmostClient over loopback REST under the user's minted access token.
- Gate settings.ai.chat (+ 503 when provider unconfigured); buildSystemPrompt with a
  non-removable safety/anti-prompt-injection framework. Per-user rate limit.

Per-user auth (B1):
- @docmost/mcp DocmostClient gains an additive getToken variant (carry a user JWT,
  re-fetch on 401) and exports DocmostClient; the email/password service-account path
  (external /mcp, stdio) is unchanged.

Agent-edit provenance backbone (B3a):
- Migration: pages/page_history (last_updated_source, last_updated_ai_chat_id) and
  comments (created_source, ai_chat_id, resolved_source).
- Signed actor/aiChatId claim in the collab token; onAuthenticate propagates it,
  onStoreDocument writes it with a sticky agent marker, saveHistory copies it.

Migrations auto-run on boot (additive). Write tools, frontend, RAG and external MCP
servers are not in this checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 01:36:41 +03:00
Philip Okugbe
33895b0607 bug fixes (#2250)
* util

* fix page position collation

* support fixed toolbar in templates editor

* date localization

* fix clipped emoji in templates editor

* fix page updated time object

* fix flickers

* fix: remove redundant breadcrumb from destination modal
2026-05-28 16:20:37 +01:00
Philip Okugbe
6cf8101ab3 feat(ee): templates (#2215)
* feat(ee): templates
* fix tree
* fix
2026-05-19 02:41:52 +01:00
Peter Tripp
932c1ad5b7 Better trash (#2190)
* Better trash

I recently lost a bunch of time editing and searching for pages that were actually in the Trash. Docmost intentionally tries to not link to Trashed pages, but the url of that Trashed page and any inbound links still work.  This makes it clearer when a page you are interacting with is in the Trash.

- /trash
  - Refactored banner into `trash-banner.tsx`
  - Refactored "Restore" modal into `use-restore-page-modal.tsx`
- Page (when isDeleted)
  - Add: `trash-banner.tsx`
  - Add breadcrumbs: `Parent / Child / Page (Deleted)`
  - Change: Deleted Pages are read-only
  - Replace "Move to Trash" with "Restore" in page menu (invokes `use-restore-page-modal`)

I tried very hard to keep this simple and re-use existing translation strings wherever possible.

* cleanup

---------

Co-authored-by: Philipinho <16838612+Philipinho@users.noreply.github.com>
2026-05-14 14:41:10 +01:00
Philip Okugbe
f758091b2a perf(permissions): cache space role and page edit lookups (#2208) 2026-05-14 13:11:28 +01:00
Philip Okugbe
a689cca7a0 feat: page labels/tags (#2188)
* feat: labels (WIP)
* full implementation
2026-05-10 18:14:15 +01:00
Philip Okugbe
537e45bc11 feat: page details section and backlinks (#2186)
* feat: page details section and backlinks
2026-05-09 17:03:08 +01:00
Philip Okugbe
de60aa7e61 feat: synced blocks (transclusion) (#2163)
* feat: synced blocks (transclusion)

* fix:remove name

* make placeholders smaller

* feat: enforce strict transclusion schema

* fix: scope synced blocks to workspace, gate unsync on edit permission

* fix collab module error
2026-05-08 13:23:16 +01:00
Philip Okugbe
641ce142df feat(ee): SCIM (#1347)
* SCIM - init (EE)

* accept db transaction

* sync

* Content parser support for scim+json

* patch scimmy

* sync

* return early if userIds is empty

* sync

* SCIM db table

* fixes

* scim tokens

* backfill

* feat(audit): add scim token events

* rename scim migration

* fix

* fix translation

* cleanup
2026-05-01 14:53:30 +01:00
Philip Okugbe
a6a7e4370a feat(ee): PDF export api (#2112)
* feat(ee): server side PDF export

* feat: pdf export queue

* sync

* sync
2026-04-14 16:26:54 +01:00
Philip Okugbe
cc00e77dfb fix: space overview favorites (#2110) 2026-04-14 02:58:24 +01:00
Philip Okugbe
4056bd0104 feat: enhancements (#2107)
* refactor
* fix
* update packages
2026-04-13 23:34:40 +01:00
Philip Okugbe
bd68e47e03 feat(ee): page verification workflow (#2102)
* feat: page verification workflow

* feat: refactor page-verification

* sync

* fix type

* fix

* fix

* notification icon

* use full word

* accept .license file

* - update templates
- update migration and notification

* fix copy

* update audit labels

* sync

* add space name
2026-04-13 20:20:34 +01:00
Philip Okugbe
d42091ccb1 feat: favorites (#2103)
* feat: favorites and templates(ee)

* rename migrations

* fix sidebar

* cleanup tabs

* fix

* turn off templates

* cleanup

* uuid validation
2026-04-12 22:06:25 +01:00
Philip Okugbe
57efb91bd3 feat(ee): ai chat (#2098)
* feat: ai chat

* feat: ai chat

* sync

* cleanup

* view space button
2026-04-10 19:23:47 +01:00
Philip Okugbe
da9b43681e feat: watch space (#2096) 2026-04-09 00:37:51 +01:00
Philip Okugbe
879aa2c3d8 feat: page update notifications (#2074)
* feat: watchers notification and email preferences

* fix: email copy

* digests

* clean up

* fix

* clean up

* move backlinks queue-up to history processor

* fix

* fix keys

* feat: group notifications

* filter

* adjust email digest window
2026-03-31 16:03:59 +01:00
Philip Okugbe
cbd0dd4a0b feat: indexes (#2071) 2026-03-29 20:29:12 +01:00
Philip Okugbe
3829b6cbef feat(ee): viewer comments (#2060) 2026-03-28 19:32:52 +00:00
Philip Okugbe
803f1f0b81 feat: user session management (#2056)
* user session management

* WIP

* cleanup

* license

* cleanup

* don't cache index

* rename current device property

* fix
2026-03-26 20:00:04 +00:00
Philipinho
90c190df78 fix: space members view enhancement 2026-03-02 21:33:15 +00:00
Philipinho
17ec2f4ac5 lists sorting 2026-03-02 21:07:47 +00:00
Philipinho
616d9297eb sync 2026-03-02 04:08:59 +00:00
Philip Okugbe
2309d1434b feat: support cross-space page mentions (#1979) 2026-03-01 17:14:10 +00:00
Philip Okugbe
69d7532c6c feat(ee): audit logs (#1977)
feat: clickhouse driver
* sync
* updates
2026-03-01 01:29:03 +00:00
Philip Okugbe
59e945562d feat(ee): page-level access/permissions (#1971)
* Add page_hierarchy table

* feat(ee): page-level permissions

* pagination

* rename migration
fixes

* fix

* tabs

* fix theme

* cleanup

* sync

* page permissions notification
* other fixes

* sharing disbled

* fix column nodes

* toggle error handling
2026-02-26 19:49:10 +00:00
Philipinho
873c963043 fix db types duplication 2026-02-19 22:34:07 +00:00
Philip Okugbe
05b3c65b0f feat: notifications (#1947)
* feat: notifications
* feat: watchers

* improvements

* handle page move for watchers

* make watchers non-blocking

* more
2026-02-14 20:00:38 -08:00
Philipinho
ab7999a946 v0.25.3 2026-02-09 18:27:55 -08:00
Philip Okugbe
0f02261ee6 feat: page version history improvements (#1925)
* Refactor: use queue for page history

* feat: save multiple version contributors

* display contributor avatars in history list

* fix interval
2026-02-09 18:25:35 -08:00
Philip Okugbe
1ad53c2581 feat(ee): public sharing controls (#1910)
* feat(ee): public sharing controls
* lint
2026-02-06 10:35:36 -08:00
Philip Okugbe
5506eb194b feat: page history diff (#1891)
* Show actual history changes
* V2 - WIP
* feat: page history diff
* fix: exclude content from history listing

---------

Co-authored-by: Jason Norwood-Young <jason@10layer.com>
2026-02-03 11:55:20 -08:00
Philip Okugbe
78b1c1a453 feat: switch to cursor pagination (#1884)
* add cursor pagination function

* support custom order modifier
* refactor returned object

* feat(db): migrate paginated endpoints to cursor-based pagination

* sync

* support hasPrevPage boolean

* feat(client): migrate pagination from offset to cursor-based

* support beforeCursor/prevCursor

* wrap search results in items array for API consistency
2026-01-30 19:28:54 +00:00
Philipinho
3523600f40 add timestamps 2026-01-27 16:49:22 +00:00
Philip Okugbe
aa143ad79c refactor(db): migrate from node-postgres to postgres.js (#1846)
* refactor(db): migrate from node-postgres to postgres.js
* ignore schema param
2026-01-21 18:12:16 +00:00
Philip Okugbe
47097969a0 fix: use subquery (#1833)
- enhance file tasks list endpoint
2026-01-13 15:58:26 +00:00
Philip Okugbe
9fb16bc842 feat(EE): AI vector search (#1691)
* WIP

* AI module - init

* WIP

* sync

* WIP

* refactor naming

* new columns

* sync

* sync

* fix search bug

* stream response

* WIP

* feat embeddings sync

* refine

* Add workspaceId to page events

* refine

* WIP

* add translation string

* sync

* reset ai answer on query change

* hide AI search in cloud

* capture streaming error

* sync
2025-12-01 11:50:25 +00:00
Philip Okugbe
3164b6981c feat: api keys management (EE) (#1665)
* feat: api keys (EE)

* improvements

* fix table

* fix route

* remove token suffix

* api settings

* Fix

* fix

* fix

* fix
2025-10-07 21:05:13 +01:00