Compare commits

...

72 Commits

Author SHA1 Message Date
2524f39a36 Merge pull request 'docs: how to bring up a local dev stand (+ gotchas), referenced from AGENTS.md' (#272) from docs/dev-stand-guide into develop
Reviewed-on: #272
2026-07-01 18:32:35 +03:00
claude code agent 227
ef173f022d docs: add "Running a local dev stand" guide + reference it from AGENTS.md
Captures the non-obvious gotchas that make bringing up a local instance
painful: the collaboration server is a THIRD process (pnpm dev starts only
API + client) that must be built before running (tsx/ts-node fail on NestJS
DI); APP_SECRET must be identical between the API and collab servers or every
realtime connection is rejected with "Invalid collab token"; Vite binds
localhost so LAN access needs --host; a stale @docmost/editor-ext white-
screens the client; pgvector is mandatory; migrations don't auto-run in dev.
Also documents that demo/test passwords should be a simple one-word
alphanumeric (no special chars, which get mangled through shells/JSON/URLs).

Referenced from AGENTS.md (Commands + Two-server-processes sections).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 03:21:41 +03:00
38f9a7938a Merge pull request 'feat(editor): restore reading scroll position on reload (#266)' (#267) from feat/266-scroll-position into develop
Reviewed-on: #267
2026-06-30 19:59:50 +03:00
claude code agent 227
30cdd65b92 test/refactor(#266): cover anti-clobber capture + once-guard; log storage errors
Review round 1 on the scroll-position feature:
- F1: add two tests for the hook's subtlest invariants — (a2) the restore
  target is captured synchronously at mount and survives a fresh scroll@0
  overwriting storage on load (a regression moving the capture into an effect
  would now fail); (a3) restore runs at most once per mount even when called
  again (the wiring effect can re-run).
- F2: log instead of silently swallowing sessionStorage errors in
  readStorage/writeStorage (AGENTS.md "errors must never be swallowed" rule);
  no user notification since a missed scroll restore is not actionable.
- F3: document the hard dependency on PageEditor remounting per page
  (key={page.id}) at the refs declaration — the per-mount refs are not reset
  on an in-place pageId change, so removing that key would break restore on
  the 2nd page.

vitest 9/9, tsc 0, eslint 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 12:13:44 +03:00
claude code agent 227
b601c78c21 feat(editor): restore reading scroll position on reload (#266)
Adds useScrollPosition(pageId): saves window.scrollY to sessionStorage
(key gitmost:scroll-position:<pageId>) on throttled scroll / pagehide /
visibilitychange / cleanup, capturing the previously-saved value
synchronously at mount before any handler can overwrite it with the fresh 0.
restoreScrollPosition() (wired in page-editor.tsx to fire once the live
content is laid out, !showStatic && editor) yields to a #hash anchor, then
polls the document height and scrolls to the saved Y once the content is
tall enough, with a 5s timeout clamped to the max reachable position. All
storage access is try/caught so a disabled/quota'd Storage never breaks the
page. The in-flight restore poll is held in a ref and cancelled on unmount,
so a fast SPA navigation can't scroll the next page. closes #266

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 11:43:14 +03:00
79394b3ef8 Merge pull request 'test(#244): dictation ordered-emitter + internal-link paste (Phase 2 tail)' (#263) from test/244-phase2-tail into develop
Reviewed-on: #263
2026-06-30 11:21:17 +03:00
e3ec9a2965 Merge pull request 'fix(#262): reindex counter polls past the stale pre-reindex snapshot' (#264) from fix/262-reindex-progress-realtime into develop
Reviewed-on: #264
2026-06-30 11:21:01 +03:00
449a304657 Merge pull request 'fix(#260): open MCP collab docs by canonical UUID (slugId doc-name split)' (#265) from fix/260-collab-docname-slugid into develop
Reviewed-on: #265
2026-06-30 11:20:51 +03:00
claude code agent 227
e04afee629 test(#260): cover replaceImage's UUID lock-key invariant; drop dead cache line
Reviewer round 1 on the #260 collab-doc-name fix:

- F1: replaceImage is the one path where the resolved UUID gates BOTH the
  collab-doc open AND the per-page mutex key (withPageLock(pageUuid)). Add a
  deterministic test to resolve-page-id-collab-doc-name.test.mjs: it gates
  /files/upload so replaceImage parks mid-upload holding its lock, asserts the
  doc opened as page.<uuid> (never page.<slug>), and probes the SHARED
  page-lock chain — a withPageLock(UUID) probe must stay blocked while
  replaceImage holds it (with a free-key probe as a non-vacuity guard). The
  test fails if the lock key is reverted to the slugId (verified).
- F2: drop the dead `pageIdCache.set(uuid, uuid)` — resolvePageId returns on
  the isUuid() short-circuit before the cache is ever read with a uuid key, so
  only slugId->uuid entries are stored/read. Comment corrected to match.

MCP suite 430/430, tsc 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:46:07 +03:00
claude code agent 227
3b80285d57 fix(#260): open MCP collab docs by canonical UUID (slugId doc-name split)
Real root cause of the silent MCP edit loss: the web editor always opens the
collaboration document by the page UUID (`page.${page.id}`), but the MCP
opened it by the agent-supplied id — usually a slugId — so `page.${pageId}`
became `page.<slugId>`. For one DB page that is TWO independent Yjs documents;
both persist to the same `pages` row (findById/updatePage resolve id or
slugId), so the human tab's debounced store overwrites the agent edit
(last-store-wins) — gone after reload, never shown live. The slugId doc also
made the server's transclusion sync + embedding reindex throw Postgres 22P02.

Fix:
- MCP (primary): resolvePageId(pageId) returns the canonical UUID — a UUID
  short-circuits with no network call, a slugId resolves once via getPageRaw
  and is cached both ways. Every collab-write path (mutatePageContent /
  updatePageContentRealtime / replacePageContent and the mutate/replace/
  unlocked seams) now opens by the resolved UUID, so the MCP and the editor
  share ONE Yjs doc. replaceImage's whole-operation page lock also keys on the
  UUID so it serializes against the other (now-UUID-keyed) writes.
- Server (defense + kills the 22P02 noise): onStoreDocument passes the resolved
  page.id — not the raw doc-name id — to syncTransclusion, the embedding queue,
  the mention-notification job, addContributors, and the in-tx history read.
  Content store and the empty-guard are untouched.

Tests: a new MCP test stands up a real Hocuspocus server and asserts a slugId
input opens `page.<uuid>` (never `page.<slugId>`), with UUID short-circuit and
single-resolve caching; the server spec asserts the side-effects receive the
UUID for a `page.<slugId>` doc. closes #260

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:04:49 +03:00
claude code agent 227
42a1fa1d3a test(#244): cover the out-of-order failure branch of the dictation emitter (F1)
The reviewer noted the in-order emitter's else branch (a NOT-next-to-emit
segment failing → buffer an empty placeholder so the drain can skip it,
use-streaming-dictation.ts:215-218) was the one reachable ordering branch
left uncovered. Add a non-vacuous case: with 3 segments, reject seq 1
(out of order) → one notification, nothing emitted; resolve seq 0 → "alpha";
resolve seq 2 → "gamma". The seq-2 flush proves the empty placeholder let the
emitter advance PAST the failed seq 1 — without the else branch the drain
would stall at the missing seq 1 and "gamma" would never emit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:01:49 +03:00
claude code agent 227
67312a3753 fix(#262): keep polling the reindex counter past the stale pre-reindex snapshot
After "Reindex now" the "Indexed X of Y" counter froze at 0 until a manual
reload. Root cause is purely client-side: right after the mutation the
client still holds the PRE-reindex settings snapshot, which for an already
fully-indexed workspace reads reindexing=false, indexed>=total. The
deadline-clearing effect evaluated isReindexComplete() against that stale
snapshot, read it as "done", and cleared the poll deadline before the first
post-reindex poll ever landed — so polling never ran and the counter stayed
at 0 (a reload just fetched one fresh snapshot).

Gate completion on having actually observed the active run: a
reindexSeenActiveRef, reset on each new reindex (mutation onSuccess, before
setting the deadline) and latched true once a poll reports reindexing=true.
isReindexComplete(status, seenActive) and nextReindexPollInterval now require
seenActive, so the stale fully-indexed snapshot no longer reads as finished.
The server pre-seeds reindexing=true from enqueue time, so seenActive latches
early and a genuine completion still stops polling promptly; the
REINDEX_POLL_CAP_MS cap is checked first and always wins, so polling can
never run away. closes #262

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 09:12:15 +03:00
claude code agent 227
ef27b6d440 test(#244): cover dictation ordered-emitter + internal-link paste (Phase 2 tail)
Backfill the two genuinely-uncovered infra-free units from the #244 Part B
test backlog (the rest was already covered by #248/#257):

- use-streaming-dictation: the in-order transcription emitter. Drives the
  real hook via renderHook with mocked VAD + deferred transcribeAudio so the
  test controls response order. Asserts out-of-order HTTP responses still
  emit text in segment order; whitespace trimmed and empty results dropped
  while the sequence advances; a failed segment shows one notification and is
  skipped so later segments still flush; a response resolving after cancel()
  is dropped (stale-epoch guard).
- internal-link-paste (handleInternalLink / createMentionAction): validateFn
  reject → no resolve/dispatch; resolve → mention node with the resolved page
  + anchor dispatched via replaceWith at pos; "Untitled" fallback; reject →
  raw url inserted as text under a link mark; createMentionAction wiring to
  getPageById on success + failure.

Test-only; no production code changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 09:07:45 +03:00
c4842367af Merge pull request 'docs(changelog): sync compare-links for 0.94.0 (#258)' (#261) from fix/258-changelog-compare-links into develop
Reviewed-on: #261
2026-06-30 09:02:22 +03:00
claude_code
96b9ec11d6 ci: use mirror.gcr.io for postgres and redis
Update GitHub workflow services to pull PostgreSQL and Redis images from `mirror.gcr.io` instead of Docker Hub. This avoids anonymous pull rate‑limit failures on shared GitHub runner IPs by using the Docker Hub pull‑through cache.
2026-06-30 08:50:00 +03:00
claude code agent 227
24b802baa3 docs(changelog): sync compare-links for the 0.94.0 release (#258)
The [Unreleased] compare link still pointed at v0.93.0 even though the
0.94.0 release section already exists, and there was no [0.94.0]
link-reference at all (the header was unresolvable). Point [Unreleased] at
v0.94.0...HEAD and add [0.94.0]: v0.93.0...v0.94.0 so every version header
resolves. closes #258

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 04:01:13 +03:00
claude_code
f8d26420eb test(mcp): add stashPage to HOST_CONTRACT_METHODS (fix drift-guard)
stashPage is declared in the server's DocmostClientLike interface and
shipped as the stash_page MCP tool (client.ts, tool-specs.ts, index.ts),
but the hand-maintained HOST_CONTRACT_METHODS mirror in the contract test
was never updated — so the drift-guard test failed and broke CI's
unit-test job. Add the missing name; both directions now agree.
2026-06-30 03:44:29 +03:00
claude_code
5c1187b864 feat(editor): add Clear formatting button to bubble menu
The floating bubble menu had no way to clear formatting, so in the
default configuration (fixed toolbar disabled) users could not reset
inline formatting at all. Mirror the fixed-toolbar action into the
bubble menu: a new "Clear formatting" item running unsetAllMarks().

- bubble-menu.tsx: import IconClearFormatting; append a non-toggle
  "Clear formatting" item (isActive: () => false) to the items array.
- No i18n changes — the "Clear formatting" key already exists in all
  locales.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 03:26:17 +03:00
claude_code
14f83abe78 fix(editor-ext): remove duplicate escapeHtmlAttr (TS2393, broken CI)
Merging the image-captions (#221) and lossless-export branches each added
its own escapeHtmlAttr in turndown.utils.ts, producing two implementations
of the same function and breaking `tsc --build` (TS2393) — which failed the
Build editor-ext step across all CI jobs.

Drop the lighter image-captions duplicate (escapes & and ") and keep the
fuller version (escapes & " < >). It is a strict superset: both call sites
(serializeAttrs, the image rule) place the value inside a double-quoted HTML
attribute, where extra < > escaping is harmless and idempotent on re-import.
Verified: editor-ext builds; turndown.dataloss + image-markdown tests pass.
2026-06-30 02:51:20 +03:00
22ea387495 Merge pull request 'feat(#246): inline spoiler mark (blur + click-reveal, lossless Markdown)' (#259) from feat/246-spoiler into develop
Reviewed-on: #259
2026-06-30 01:47:46 +03:00
b56a1629d2 Merge pull request 'feat(editor): image captions (figcaption) with lossless markdown round-trip (#221)' (#233) from feat/221-image-captions into develop
Reviewed-on: #233
2026-06-30 01:47:27 +03:00
7e6dd457a4 Merge pull request 'refactor(#193): tool-host drift-guard + staged plan (shared spec registry already merged)' (#249) from refactor/193-tool-spec-registry into develop
Reviewed-on: #249
2026-06-30 01:47:13 +03:00
ad08458ac4 Merge pull request 'fix(#244): two HIGH data-loss bugs — lossless markdown export + store-side empty-guard' (#248) from fix/244-dataloss-bugs into develop
Reviewed-on: #248
2026-06-30 01:46:42 +03:00
claude code agent 227
9bbac29bc5 Merge remote-tracking branch 'gitea/develop' into HEAD
# Conflicts:
#	apps/server/src/collaboration/extensions/persistence-store.spec.ts
#	apps/server/src/collaboration/extensions/persistence.extension.ts
2026-06-30 01:44:27 +03:00
42f3a328c2 Merge pull request 'feat(#251): intentional-clear signal editor→store (persist deliberate clear, keep #248 guard)' (#253) from feat/251-intentional-clear into develop
Reviewed-on: #253
2026-06-30 01:36:46 +03:00
a8a7fad850 Merge pull request 'test(#244): Part B backlog — editor-ext/mcp/client/server unit+contract tests + findBreadcrumbPath mutation fix' (#257) from test/244-part-b into develop
Reviewed-on: #257
2026-06-30 01:36:00 +03:00
claude code agent 227
3c7b69d6d4 test(#221): make the caption escaping assertion non-vacuous (F1)
The special-chars test only checked substrings (data-caption=/Tom/Jerry) that
survive even if escapeHtmlAttr stopped escaping " or double-encoded &. Assert
the exact escaped attribute in the intermediate Markdown
(data-caption="Tom &amp; &quot;Jerry&quot;") and re-parse the rendered HTML to
confirm the recovered caption is exactly Tom & "Jerry".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 00:06:30 +03:00
d38a39e3e5 Merge pull request 'fix(ai): show live reindex progress so the embeddings counter resets to 0 and climbs' (#242) from fix/embeddings-reindex-progress into develop
Reviewed-on: #242
2026-06-29 23:44:13 +03:00
claude_code
0724d8d362 feat(mcp): expose resolve_comment tool to resolve/reopen comment threads
The Docmost backend (POST /comments/resolve) and the MCP client method
resolveComment() already supported resolving/reopening comment threads, but no
MCP tool surfaced it — so agents could only close threads destructively via
delete_comment. Register a resolve_comment tool wrapping the existing client
method.

- packages/mcp/src/index.ts: register resolve_comment (commentId + optional
  resolved, default true → close; false → reopen); extend SERVER_INSTRUCTIONS
- packages/mcp/build/index.js: regenerated via tsc
- packages/mcp/README.md / README.ru.md: document resolve_comment; bump tool
  count 40 → 41
- packages/mcp/test-e2e.mjs: add resolve → verify resolvedAt → reopen coverage

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 23:42:57 +03:00
116a231691 Merge pull request 'fix(#255): disconnect socket.io redis-adapter pub/sub clients on shutdown' (#256) from fix/255-ws-redis-adapter-leak into develop
Reviewed-on: #256
2026-06-29 23:23:47 +03:00
claude code agent 227
c4ab03d387 docs(editor-ext): correct why vitest skips the table-test helper (F1c)
The comment claimed vitest skips the file because it has no test cases; vitest
collects by filename glob, so the real reason is the name not matching
*.{test,spec}.ts. Reword to cite the glob and warn that adding test cases here
would not run them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 22:26:39 +03:00
claude code agent 227
b35950ef94 test(editor-ext): extract shared ProseMirror table-test fixture (F1)
The schema + cell/row/table/doc builders + grid/stateFor/trFor were copied
verbatim into the 3 new table-utils test files (and the pre-existing
table-utils.test.ts) — a schema change would have to be synced across all four.
Move them into a shared table-test-helpers.ts (test-only, excluded from the
build like footnote-corpus.ts) and import it everywhere; cell uses the
(txt, attrs?) superset (a drop-in for the bare (txt) copies). No assertion
changes — test counts unchanged (223 passed + 3 expected-fail).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 21:17:18 +03:00
claude code agent 227
97eef22bc3 test(#251): cover the change-origin guard; add CHANGELOG entry (F1,F2)
F1: add a test that empties a non-empty doc via a change-origin transaction
    (ySyncPluginKey meta, the shape y-tiptap sets for remote/merge updates) and
    asserts the intentional-clear signal is NOT emitted — pinning the
    isChangeOrigin early-return that keeps remote emptiness from punching through
    the #248 server guard. The 4 existing tests use local transactions and never
    exercised that true-path (verified: removing the guard fails only this test).
F2: record the #248 empty-overwrite guard and the #251 intentional-clear in the
    CHANGELOG [Unreleased] Fixed section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 21:14:36 +03:00
claude code agent 227
aa14ad6698 docs(ai): quote the content predicate verbatim; drop twin tautological assert (F16,F17)
F17: the header's content-clause literal omitted the [[:space:]]* tolerance;
     copy page.repo.ts's exact '"type"[[:space:]]*:[[:space:]]*"text"' (jsonb::text
     renders a space after the colon, which is why the tolerance exists).
F16: remove expect(ttl).toBeGreaterThan(0) — the twin of the F15 removal;
     expect(ttl).toBe(120) strictly subsumes it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 17:30:52 +03:00
claude code agent 227
1e5994573f docs(ai): list all three embeddable clauses in int-spec header; drop tautological assert (F14,F15)
F14: the lockstep int-spec header still described the pre-F6 two-clause set with
     'iff' — add the content-JSON text-node clause so it matches embeddablePredicate.
F15: remove the redundant expect(ttl).toBeLessThanOrEqual(120) that followed
     expect(ttl).toBe(120).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 17:02:32 +03:00
claude code agent 227
d0eae69086 fix(ai): raise reindex pre-seed TTL to the client poll cap; cover predicate clause; align docs (F11-F13)
F11: PRE_SEED_TTL_SECONDS 45->120 (= client REINDEX_POLL_CAP_MS). At concurrency
     1 a queued reindex can wait past the old 45s; if the pre-seed expired while
     pending, getMasked fell back to the COUNT and reported done, so the client
     stopped polling and missed the climb. Tie the pre-seed TTL to the client cap.
F12: extend the lockstep integration spec — insertPage takes content; a
     text_content=null + text-node-content page is IN and a math-only page is OUT,
     pinning the structural "type":"text" clause (and the jsonb space-after-colon).
F13: list all three embeddable clauses in the reindex JSDoc/inline comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 16:12:36 +03:00
claude code agent 227
91f24fc062 fix(ai): include content-bearing pages in reindex coverage; correct progress race & hot path (F6-F10)
F6: extend embeddablePredicate to pages with body content but null text_content,
    keyed on the text-node marker "type":"text" (not a bare "text": key, which
    also matched math nodes' attrs.text and would leave math-only pages stuck
    below 100%). Numerator and denominator share the predicate; tests assert the
    compiled WHERE is byte-identical and a math-only doc is excluded.
F7: correct the start() JSDoc (both totals are the real page count).
F8: nextReindexPollInterval reuses isReindexComplete.
F9: getMasked reads progress first and skips the two COUNTs while a reindex is active.
F10: pre-seed the progress entry with a short 45s TTL so a deduped enqueue's
     phantom "0 of N" expires quickly instead of sticking for the 1h TTL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:37:26 +03:00
claude code agent 227
888deba891 docs(#193): drop uploadImage from MCP-transport method list in contract-guard comment (F3)
uploadImage is internal to client.ts (called by insertImage/replaceImage);
the MCP transport (index.ts) does not call it directly. Remove it from the
comment's list of transport-called methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:07:02 +03:00
claude code agent 227
f9b58a0e3d test(server): SSRF guardedFetch, decryptHeaders fail-open, yjs.util, tool-spec parity, storage delegation
guardedFetch blocks loopback/private/link-local/metadata IPs and never calls
fetch; decryptHeaders fails open (returns undefined, warns once, no blob leak).
yjs.util setYjsMark/removeYjsMarkByAttribute/updateYjsMarkAttribute on real
Y.Docs. SHARED_TOOL_SPECS<->in-app parity (name/desc/input-schema; a dropped or
renamed wiring fails). Replace the tautological storage.service spec with
driver-delegation checks across every public method.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:49:56 +03:00
claude code agent 227
388894c257 fix(client): stop findBreadcrumbPath mutating the live tree + tests
findBreadcrumbPath set node.name='Untitled' in place, mutating the shared
sidebar tree (treeData passed from resolveBreadcrumbNodes). Surface 'Untitled'
via a shallow copy on the returned chain only; input nodes stay untouched.
Add tests for the non-mutation invariant plus applyUpdateOne reducer,
formatRelativeTime buckets, and the pure tree mappers (sortPositionKeys,
pageToTreeNode).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:49:48 +03:00
claude code agent 227
e2b7ff10d9 test(mcp): media round-trip attrs, cookie parsing, anchor apply, recreate drift
Extract pure extractAuthTokenFromSetCookie from performLogin (behavior-identical)
so cookie parsing is unit-testable without a network login. Add round-trip
coverage for media attrs (width/height/align/drawio/escaping) the existing
suite omitted; applyAnchorInDoc selection/ambiguity/atom-break cases; and a
cross-copy drift guard proving the vendored editor-ext recreate-transform and
the @fellow npm copy used by diff.ts emit identical steps (apply(diff)==target).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:49:41 +03:00
claude code agent 227
683a62a547 test(editor-ext): cover recreateTransform invariant, table move/selection, unique-id
recreateTransform: apply(diff)==target round-trip across text/mark/structural
edits and complexSteps/wordDiffs options. moveRow/moveColumn drive real PM
tables (reorder preserves content, self-move/no-table -> false, CellSelection
on select). getSelectionRangeInColumn: single/multi-column + colspan + range
guard. addUniqueIdsToDoc: only configured types, nested targets, idempotency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:49:31 +03:00
claude code agent 227
82b042209e fix(ws): make redis adapter error handlers actually log (were noop)
The pub/sub error handlers were `(err) => () => {}` — a noop returning an
inner arrow that never runs, so socket.io redis client errors were silently
swallowed. Log them via Nest Logger. Adjacent pre-existing bug surfaced in
review of #255.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:32:34 +03:00
claude code agent 227
a0f4c86a74 fix(ws): disconnect socket.io redis adapter pub/sub clients on shutdown
The WsRedisIoAdapter creates two ioredis clients (pubClient/subClient) for
@socket.io/redis-adapter but never closed them, leaking their TCP handles on
application shutdown (#255). The redis-adapter does not own these clients'
lifecycle, and the adapter is instantiated from main.ts (not a DI provider),
so no Nest lifecycle hook applied to it.

Keep references to both clients and override dispose(), which Nest's
SocketModule.close() invokes exactly once during shutdown after all socket.io
servers are closed. Use disconnect(false) to mirror the sibling pub/sub pair
in collaboration/extensions/redis-sync (onDestroy): immediate close, no QUIT
round-trip, no auto-reconnect. Refs are nulled to guard against double-close.
Runtime behavior is unchanged; only the shutdown path is added.

Verified with a script that boots connectToRedis() against a real Redis:
2 sockets to :6379 open after connect, 0 remain after dispose().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:28:56 +03:00
claude code agent 227
cce539e8e2 fix(collab): hoist intentional-clear consume out of the store retry loop (#251)
The store-side empty-guard consumed the per-document intentional-clear flag
INSIDE the bounded retry loop. consumeIntentionalClear always deletes the
in-memory Map entry, but a tx rollback cannot un-delete it: attempt 1
consumed the flag then updatePage threw a transient error and rolled back;
attempt 2 re-read the page non-empty, saw the flag gone, and the empty-guard
silently BLOCKED the write — dropping the user's deliberate clear and
defeating the retry guarantee for clears.

Hoist the decision out of the loop (like consumeContributors /
consumeAgentTouched): consume once into `allowIntentionalClear` before the
`for`, and only read that boolean on the empty-over-non-empty branch. The
single hoisted consume still drops a pending flag for a non-empty store
(the "cleared then retyped" case), since every store consumes regardless of
incoming emptiness.

Add a regression test: arm via the real onStateless transport, updatePage
throws once then succeeds, assert it is called twice and the retry writes the
empty doc (the clear survives). It fails on the old consume-in-loop ordering
(updatePage called once) and passes after the hoist.

Document the known fail-safe limitation near the TTL constant: if document
ownership transfers / a node crashes between the stateless signal and the
debounced store, the in-memory flag is lost and the clear is silently not
applied (the doc reloads non-empty) — fail-safe, content is never destroyed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:17:41 +03:00
claude code agent 227
3fdb1e05a4 feat(collab): persist a deliberate page clear via an intentional-clear signal (#251)
The #248 store-side empty-guard (onStoreDocument) unconditionally refuses to
overwrite non-empty persisted content with an empty document, because a
momentarily-empty live Y.Doc is indistinguishable from a real clear at the
store layer. That correctly blocks glitches/bad-merges, but also blocks a user
who genuinely wants to empty a page. This re-introduces a WORKING, narrow,
non-spoofable exception (the dead context.intentionalClear hatch #248 removed
never had a real channel).

Definition of an intentional clear (client, IntentionalClear editor extension):
a LOCAL user transaction (docChanged, NOT a remote y-sync change — filtered via
isChangeOrigin) that reduces a non-empty doc to the empty single-paragraph
shape. This is exactly the select-all + Delete/Backspace keystroke path.

Transport (option b — hocuspocus stateless message): on that transition the
client sends a `{type:'intentional-clear'}` stateless message. The server
(PersistenceExtension.onStateless) records a short-lived (TTL 60s > 45s
maxDebounce), single-use "pending clear" flag keyed by the connection's
document. The next debounced onStoreDocument consumes it on the empty-guard
branch to let that one empty write through.

Why this is the right channel and non-spoofable:
- Yjs transaction origin/metadata does not survive to the server store; awareness
  is per-connection and racy. A stateless message ties the signal to a specific
  clear, survives the debounce, and rides the authenticated connection.
- The document is taken from the connection, never the payload, so a client
  cannot target another page.
- The flag is read ONLY on the empty-over-non-empty branch, so the worst a forged
  signal can do is clear a page the connection may already edit; it can never
  force or alter a non-empty write. Read-only connections cannot arm it. Every
  non-empty store drops a pending flag, so "cleared then retyped" leaves nothing
  usable; the flag is single-use and TTL-bounded.

NOTE: #248 is not yet on develop, so the empty-guard block is included here as
the foundation this exception extends. If #248 lands first this rebases cleanly
(the guard logic is identical; the #251-unique additions are the exception,
onStateless, the pending-flag state, and the client extension).

Tests:
- Server (real transport path, not a hand-poke): onStateless sets the flag with
  the exact client payload, then the debounced onStoreDocument persists the empty
  doc; plus single-use consumption, read-only rejection, non-empty-store drops
  the flag, and the unchanged #248 guard tests (empty-over-non-empty blocked,
  empty-over-empty allowed).
- Client: a real Editor + the actual selectAll+deleteSelection command emits the
  signal; typing / non-emptying edits / already-empty docs do not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:06:39 +03:00
claude code agent 227
57308bc3f3 docs(#221): fix CHANGELOG grammar after setImageCaption removal (F8)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 02:07:41 +03:00
claude code agent 227
4c7b671950 docs(#193): correct contract-guard comment — interface is a subset, not superset
The DocmostClientLike mirror covers only methods the in-app adapter consumes;
the standalone MCP transport calls additional client methods not tracked here
(covered by its own typecheck). Fixes the misleading 'superset' wording (F2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:59:10 +03:00
claude code agent 227
90a3fa012d test(#248 F3): make empty-over-empty test actually reach the store empty-guard
The "does not block an empty store over an already-empty page" test set the
stored page.content to TiptapTransformer.fromYdoc(document,'default') — exactly
the value tiptapJson is computed from — so isDeepStrictEqual(tiptapJson,
page.content) was TRUE and onStoreDocument RETURNED at the unchanged short-circuit
before ever reaching the empty-guard. It exercised the old short-circuit, not the
new guard's `!isEmptyParagraphDoc(page.content)` branch (the only NEW branch
protecting empty existing pages from over-blocking); the condition could be
removed and the test would still pass (false coverage).

Set stored content to an empty paragraph with `content: []` — empty per
isEmptyParagraphDoc but NOT deep-equal to the live doc (which normalizes to a
paragraph with `attrs: { indent: 0 }` and no content key). Execution now skips
the short-circuit and enters the guard; reorient the assertion to "the write is
NOT blocked" (updatePage IS called). Verified the test now FAILS if the
`!isEmptyParagraphDoc(page.content)` condition is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:56:00 +03:00
claude code agent 227
bdc033e689 fix(ai): extract reindex-button loading predicate + correct poll comment (PR #242)
F4: extract the reindex button `loading` predicate into a pure, unit-tested
`isReindexButtonLoading({ mutationPending, deadline, status })` next to the
other reindex helpers, replacing the inline JSX expression. Covers the
load-bearing post-cap case (deadline nulled, reindexing stale-true -> not
loading) plus mutationPending, active-run, and finished cases.

F5: rewrite the `useAiSettingsQuery` poll comment to match the actual
`nextReindexPollInterval` stop condition (continues while reindexing===true OR
within deadline and not fully indexed; stops only when reindexing===false &&
indexed>=total, or the deadline cap) instead of the stale "until indexed===total".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:49:55 +03:00
claude code agent 227
1ddb386214 docs(#221): CHANGELOG — drop removed setImageCaption command mention
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:46:49 +03:00
claude code agent 227
43af3dd5f1 test(mcp): cover captioned image inside a column round-trip (F5)
A captioned image in a column is emitted via the imageToHtml helper, a
separate path from the top-level image case whose data-caption branch was
untested. Add a round-trip test with special chars (Tom & "Jerry") that
fails if the imageToHtml caption branch breaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:43:18 +03:00
claude code agent 227
b02101b58a docs(mcp): correct captioned-image import comment (F6)
The comment referenced markdownToHtml, which does not exist in the mcp
package; the import path is marked.parse + generateJSON (which runs the
image extension's parseHTML). Describe the actual step and regenerate the
build artifact in sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:43:13 +03:00
claude code agent 227
932bfce1d9 refactor(editor-ext): remove unused setImageCaption command (F7)
The setImageCaption command and its Commands<> declaration were dead:
captions are written via the generic updateAttributes in
useImageTextFieldControl, and a repo-wide grep finds zero callers.
Remove the speculative implementation (image.ts) and its type
declaration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:43:08 +03:00
claude code agent 227
04fda0c0b2 test(#248 F2): exercise <,> escape branches in raw-HTML export round-trip
The escaping round-trip test's data (A & "B") only contained & and ",
so the <,> branches of escapeHtmlAttr (&,",<,>) and escapeHtmlText (&,<,>)
were never exercised; a regression dropping <,> escaping would still pass.
Extend the data to A & <B> "C" in both the data-label attribute and the
visible text so both functions' <,> branches are genuinely covered. Assert
the well-formed escaped tag (attr: A &amp; &lt;B&gt; &quot;C&quot;, text:
A &amp; &lt;B&gt; "C"), explicitly reject the raw tag-corrupting forms,
and confirm markdownToHtml restores the originals. Comment updated to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 00:04:56 +03:00
claude code agent 227
4131deaabb test(mcp): robustify the client-host contract drift-guard parser
Architect-review hardening of the bidirectional DocmostClientLike <->
HOST_CONTRACT_METHODS guard (test-only, no production change):

- Interface method-name regex now accepts full TS identifiers
  (digits/_/$) and generic signatures (method<T>(), avoiding a future
  benign false-FAIL.
- Skip /* ... */ block comments in the interface body so a `name(` line
  inside one is not falsely parsed as a method.
- Wrap the cross-package readFileSync with a clear "expected monorepo
  layout" error instead of a bare ENOENT when run outside the monorepo.
- Narrow the guard's comments/error to state plainly it checks the
  method-NAME set only; signature parity remains the deferred staged-plan
  item.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:54:04 +03:00
claude code agent 227
5308f2fb65 test(#248 F2): cover HTML-escaping of attrs/text in lossless raw-HTML export
Round-1 review F2. The escapeHtmlAttr (&,",<,>) and escapeHtmlText (&,<,>)
helpers in turndown.utils were untested — every existing round-trip case used
alphanumeric values, so no escape branch ran. A mention/status carrying HTML
special chars would re-emit malformed HTML that import's parseHTML can't
restore → the same data loss this PR fixes, uncaught.

Add a round-trip case to turndown.dataloss.test.ts: a mention with `&` and `"`
in both data-label and visible text. Assert (a) the exported Markdown carries
the correctly-escaped, well-formed tag (data-label="A &amp; &quot;B&quot;",
text escapes &), not the raw malformed form; and (b) markdownToHtml restores
the original unescaped values (attribute `A & "B"`, text `@A & "B"`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:45:53 +03:00
claude code agent 227
78cc019492 fix(#248 F1): remove dead intentional-clear escape hatch from empty-guard
Round-1 review F1 (maintainer decision: variant A). The store-side
empty-guard's `context?.intentionalClear === true` branch was dead:
`intentionalClear` is never set in production (connection context is
{user, actor, aiChatId}); it appeared only in the guard and a hand-injected
spec, so the guard already blocked empty-over-non-empty unconditionally.

- persistence.extension.ts: drop the dead branch; the guard now simply
  skips empty-over-non-empty, full stop. Reference issue #251 (real
  intentional-clear UX) in the comment where the branch was.
- persistence-store.spec.ts: remove the misleading "persists an intentional
  clear" escape-hatch test (false coverage — green only because the flag was
  injected by hand). Real guard tests (empty-over-empty allowed,
  empty-over-non-empty blocked, etc.) kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:45:45 +03:00
claude code agent 227
85b38d6946 fix(ai): address reindex-progress review round 1 (PR #242)
F1: clear the "Reindex now" spinner once the poll cap fires. Gate the
reindexing part of the button's loading state on the active poll window
(reindexDeadline !== null) so a run that outlives the 120s cap no longer
leaves the button stuck-disabled with a stale `reindexing: true`; the
admin can restart.

F2: rewrite reindexWorkspace JSDoc to describe the EMBEDDABLE page set
(text OR existing embeddings), matching getEmbeddablePageIds /
countEmbeddablePages instead of the old "every non-deleted page".

F3: extract the shared embeddable-content predicate into a private
PageRepo.embeddablePredicate helper, called by both countEmbeddablePages
and getEmbeddablePageIds, removing the verbatim duplication. Behavior is
identical (lockstep int-spec stays green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:39:20 +03:00
claude code agent 227
d39b7ae67c refactor(editor): dedupe alt/caption controls via shared hook (F4)
Extract the ~110 duplicated lines into one parameterized
useImageTextFieldControl and make useAltTextControl/useCaptionControl
thin wrappers. Behavior identical; t("...") literals stay in the
wrappers so i18n extraction keeps working. sanitizeCaption still
exported for its unit test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:38:48 +03:00
claude code agent 227
c124fb1f2c test(editor): fix wrong sanitizeCaption collapse-cap comment (F3)
The comment claimed 250 groups -> 499 chars -> slice past 500; the
input is 120 "a  b " groups collapsing to 479 chars, under the cap
with no slice. Correct the comment and assert the 479 length.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:38:41 +03:00
claude code agent 227
d3ebae48cf test(mcp): cover image caption markdown round-trip (F2)
Add PM -> markdown -> PM round-trip assertions for image caption
(plain and special-char), which fail without F1 and pass with it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:38:36 +03:00
claude code agent 227
607aed5997 fix(mcp): restore image caption on markdown round-trip (F1)
Stock @tiptap/extension-image carries no caption attribute, so
markdownToProseMirror through docmostExtensions dropped the
data-caption the client emits, breaking the lossless claim. Extend the
Image node (mirroring editor-ext image.ts and the nearby Highlight
extend) to parse/render data-caption. Rebuilt build/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:38:28 +03:00
claude code agent 227
5b88e3dddf test(mcp): drift-guard HOST_CONTRACT_METHODS against DocmostClientLike both ways
The contract test only checked one direction (each name in
HOST_CONTRACT_METHODS exists on the real DocmostClient). But
HOST_CONTRACT_METHODS is itself a hand-copy of the server's
DocmostClientLike interface (docmost-client.loader.ts), and that
list<->interface link was untested: a method added to the interface +
consumed by the adapter but forgotten in the list (or removed from the
interface but left in the list) would escape both the server typecheck
(the pkg emits no .d.ts) and the existing test (name not in the list) ->
a runtime "x is not a function" in a tool call.

Parse the method names from the DocmostClientLike interface body (read
the .ts source via import.meta.url, scan member-signature lines) and
assert.deepEqual them against HOST_CONTRACT_METHODS BOTH ways. Lists are
currently identical (39=39), so this is a coverage hole closed, not a
live bug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:36:22 +03:00
claude code agent 227
d0ca127d83 refactor(ai-chat): drift-guard the DocmostClientLike hand-mirror (#193)
Issue #193's tool-half has two open items. The shared, zod-agnostic tool-spec
registry (SHARED_TOOL_SPECS) for the identical tools is already merged
(f3fa15e7) and consumed by both layers, so that subset is done. The remaining
items are: (a) deriving the layer-3 hand-mirror `DocmostClientLike` from the
real client type, and (b) folding more tools into the registry. Both were
deferred as risky, and that deferral still holds (verified, see below) — so
this change ships the safest concrete increment instead of forcing the risk.

What this adds (behaviour-neutral, test-only + a doc comment):

- packages/mcp/test/unit/client-host-contract.test.mjs: pins the layer-3
  contract from the ESM side, where the real DocmostClient is importable. It
  asserts every method the in-app `DocmostClientLike` mirror declares exists as
  a function on a real DocmostClient instance (constructor is side-effect-free).
  A rename/removal in client.ts now fails this test instead of silently shipping
  a runtime "x is not a function" into an agent tool call. Negative-case
  verified (a bogus method name is detected).

- docmost-client.loader.ts: replaces the vague mirror comment with a pointer to
  the guard test and a concrete, empirically-grounded staged plan for the full
  type-derivation. Verified blockers kept it deferred: @docmost/mcp emits no
  .d.ts (no `declaration`, no `types` export) and the server has no path mapping
  for it, so there is no type to import today; and the real methods' inferred
  CONCRETE return types conflict with the in-app adapter's loose
  Record<string,unknown> + `as`-cast result handling (deriving the exact type
  breaks the build / forces pervasive double-casts and full-surface test stubs).

Out of scope (noted in the issue): the PM<->Markdown converter unification.

Verified: server tsc clean; mcp tsc clean; mcp tests 369 pass (367 + 2 new);
ai-chat tools specs 51 pass. No behaviour change; committed mcp build untouched
(no mcp src changed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:07:43 +03:00
claude_code
78953cf775 fix(#244 Part A): close two HIGH data-loss bugs PR #230 only documented
mdrt-2 (markdown export): add lossless turndown rules for the custom nodes
that had no rule — transclusionReference, pageBreak, mention, status. Each
re-emits the node as inert raw HTML carrying every data-* attribute instead
of being silently dropped (childless atom divs) or collapsed to bare text
(mention/status losing data-id/data-color). Empty atom blocks are made
non-blank before turndown's blank-rule strips them (mirrors the footnote-ref
fix). markdownToHtml passes the raw HTML through and each node's parseHTML
rebuilds it, so the form round-trips. Flips the it.fails cases to passing and
adds export + import round-trip coverage.

persist-6 (collab store): add a store-side empty-guard in onStoreDocument.
Before updatePage, if the serialized live doc is an empty paragraph doc AND
the persisted page is non-empty, skip the write and log — unless an explicit
context.intentionalClear signal is present (deliberate select-all+delete).
New/empty pages and unchanged docs are unaffected. Flips the it.failing case
to passing and adds escape-hatch + empty-over-empty coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:48:31 +03:00
claude_code
bf09eec4e1 fix(ai): address reindex-progress review (PR #242)
- Delete the now-orphaned PageRepo.getIdsByWorkspace (its only caller,
  reindexWorkspace, switched to getEmbeddablePageIds). Its docstring still
  claimed "Used by the RAG bulk reindex"; re-grep confirmed zero callers.
- ai-settings.service.reindex(): if aiQueue.add() throws (Redis hiccup/
  shutdown) the worker never runs so its finally->clear() never fires,
  leaving the seeded progress record stuck for the full 1h TTL (button
  stuck "reindexing: 0 of N"). Roll back the seed THIS call wrote
  (seeded flag, only when get() was null) before re-throwing, so a
  concurrent active run's record is never wiped. Add tests for both the
  clear-on-throw and the don't-clear-a-concurrent-run paths.
- Add an integration spec (real Postgres) proving getEmbeddablePageIds'
  WHERE stays in lockstep with countEmbeddablePages: seeds every boundary
  case and asserts the returned id set equals the count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:39:18 +03:00
a
dc14a9a540 chore(editor): address image-caption review (#221)
- docs: add CHANGELOG Unreleased/Added entry for editable image captions
- test: export sanitizeCaption and add vitest unit coverage
  (whitespace collapse, trim, 500-char boundary)
- refactor: drop duplicate .imageCaption CSS module class, keep the
  global .image-caption as the single source
- docs: fix turndown image-caption comment (video rule emits a markdown
  link, not a <div>)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:36:30 +03:00
claude code agent 227
2aa482f62d feat(editor): add editable image captions (#221)
Add a visible caption (<figcaption>) under images, editable from the
image bubble-menu and persisted across all formats: native Yjs/JSON,
HTML export, and Markdown.

- image node: new plain-text `caption` attribute (parse/render
  `data-caption` on <img>, emitted only when set) + `setImageCaption`
  command. The node stays an atom; the schema shape is unchanged, so the
  server's generateHTML/generateJSON path round-trips it for free.
- resize node-view: re-parent the resizable wrapper into a <figure> and
  render the caption in a <figcaption> BELOW it, outside nodeView.wrapper
  (so onCommit's offsetHeight measurement and the left/right resize
  handles still cover the image only). This path also drives read-only /
  share rendering. React placeholder view renders the caption too.
- bubble-menu: new useCaptionControl panel modeled on useAltTextControl
  (own icon, Caption strings, softer sanitizer, ~500 char limit).
- markdown lossless round-trip: a captioned image is emitted as a raw
  <img data-caption> wrapped in a block <div> (same trick as <video>) in
  both the editor-ext turndown rule and the MCP converter; caption-less
  images stay clean ![alt](src). Import restores the caption via the
  shared markdownToHtml + parseHTML.
- styles + i18n keys; tests for the schema attr round-trip, markdown
  round-trip (editor-ext) and the MCP converter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:33:00 +03:00
a
95d07d8d6f fix(ai): align reindex live denominator with the steady-state count
Review fixes for the reindex-progress counter (#242):

1. Denominator jump (478 -> 500 -> 478): reindexWorkspace iterated
   getIdsByWorkspace() (ALL non-deleted pages) but the seed/status use
   countEmbeddablePages (text OR existing-embedding), so the live total exceeded
   the steady-state total whenever empty/text-less pages existed. Add
   PageRepo.getEmbeddablePageIds() that selects the IDs of the EXACT same set
   countEmbeddablePages counts (deletedAt IS NULL AND (text_content matches a
   non-whitespace char OR an EXISTS non-deleted pageEmbeddings row)), and have
   reindexWorkspace iterate THAT set with total = its length. Iteration set and
   count source change together, so done reaches exactly total == the
   steady-state denominator. Dropping text-less pages is correct (reindexPage
   no-ops on them; a page that lost its text but still has stale embeddings is in
   the set via the EXISTS clause and still gets its stale rows cleared). Removed
   the contradictory "worker overwrites with the real page count" / "denominator
   matches" comment.

2. Mid-run re-trigger reset: reindex() unconditionally re-seeded done=0 before an
   enqueue that de-dupes a running job, so a second click/admin/tab reset the
   visible counter while the worker kept incrementing. Now seed only when
   get(workspaceId) === null; the worker's own start() remains the single
   authoritative reset.

3. TTL: documented that it is intentionally tied to write progress
   (start/increment) and never refreshed on get(), so a dead worker's record
   can't be kept alive forever by client polling.

Tests: new embedding-reindex-progress.service.spec.ts (fake ioredis: hash ->
ReindexProgress, malformed/missing/non-numeric -> null, non-finite startedAt ->
0, hgetall throws -> null, start/increment issue hset/hincrby+expire and swallow
Redis errors); reindex() seed order + no-reseed-when-active guard; getMasked
live test now uses progress.total=500 vs DB 478 to pin the progress branch;
indexer specs updated to mock getEmbeddablePageIds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00
a
630939e8f3 feat(ai): tighten reindex-progress polling on the reindexing flag
Make the "Indexed N of N" counter update near-realtime during a reindex by
tracking the server's active-run state instead of a pure time window:

- Set REINDEX_POLL_INTERVAL to 5000ms (kept bounded by the cap).
- Extract two pure, exported, unit-tested helpers:
  - nextReindexPollInterval: keep polling while the server reports an ACTIVE run
    (reindexing===true) OR within the deadline and not yet done; stop once the
    run is finished AND fully indexed (reindexing===false && indexed>=total) or
    the deadline cap is hit (the cap always wins, so a stuck/never-clearing
    progress record can't poll forever).
  - isReindexComplete: deadline-clear predicate mirroring that stop condition.
- Wire the refetchInterval and the deadline-clearing effect to those helpers.
- Keep the Reindex button spinner active for the whole run (loading also while
  settings.reindexing), reusing the existing loading prop; also blocks a
  redundant mid-run re-trigger (server de-dupes regardless).

No SSE/websockets: polling keyed on the reindexing flag is the intended scope.
The counter now tracks the actual active-reindex state and stops promptly when
the server reports the run is done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00
a
72bb03918d fix(ai): show live reindex progress in semantic-search settings
The "Indexed X of Y pages" counter stayed stuck at "478 of 478" during a
manual "Reindex now" run instead of resetting to 0 and climbing. The status
reports indexedPages = countIndexedPages (DISTINCT pages with >=1 embedding
row), but reindex hard-replaces each page in its OWN small transaction, so
nearly all pages always have rows -> the count never drops.

Add a per-workspace live reindex-progress record in Redis (reusing the
existing global ioredis client via RedisService, no new Redis config):
- EmbeddingReindexProgressService: start/increment/clear/get over a Redis hash
  with a 1h TTL self-clean; all best-effort/cosmetic so a Redis failure degrades
  to the existing DB-count behavior.
- AiSettingsService.reindex seeds {total, done:0, startedAt} at enqueue time so
  the very first poll already reports done=0.
- EmbeddingIndexerService.reindexWorkspace overwrites total with the real page
  count at start, increments done per processed page (success or handled
  failure), and clears the record in a finally (covers success, fatal abort,
  and the unconfigured early-return) so a failed run never sticks.
- AiSettingsService.getMasked returns the live run numbers when a progress
  record is active (plus an optional reindexing flag), else falls back to
  countIndexedPages/countEmbeddablePages.

Per-page edits (reindexPage) never touch the workspace progress record, and no
mass up-front delete is introduced (search availability preserved).

Tests: indexer sets/increments/clears progress (incl. fatal abort and
unconfigured early-return); status reports run progress when active and falls
back when not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00
88 changed files with 7175 additions and 458 deletions

View File

@@ -75,7 +75,9 @@ jobs:
APP_URL: http://localhost:3000
services:
postgres:
image: pgvector/pgvector:pg18
# via mirror.gcr.io (Docker Hub pull-through cache; avoids Hub anonymous
# pull rate-limit that randomly fails on shared GitHub runner IPs).
image: mirror.gcr.io/pgvector/pgvector:pg18
env:
POSTGRES_DB: docmost
POSTGRES_USER: docmost
@@ -88,7 +90,8 @@ jobs:
--health-timeout 5s
--health-retries 20
redis:
image: redis:7
# via mirror.gcr.io (see postgres note above).
image: mirror.gcr.io/library/redis:7
ports:
- 6379:6379
options: >-
@@ -135,7 +138,9 @@ jobs:
NODE_ENV: production
services:
postgres:
image: pgvector/pgvector:pg18
# via mirror.gcr.io (Docker Hub pull-through cache; avoids Hub anonymous
# pull rate-limit that randomly fails on shared GitHub runner IPs).
image: mirror.gcr.io/pgvector/pgvector:pg18
env:
POSTGRES_DB: docmost
POSTGRES_USER: docmost
@@ -148,7 +153,8 @@ jobs:
--health-timeout 5s
--health-retries 20
redis:
image: redis:7
# via mirror.gcr.io (see postgres note above).
image: mirror.gcr.io/library/redis:7
ports:
- 6379:6379
options: >-

View File

@@ -27,7 +27,9 @@ jobs:
# TEST_*_URL overrides are needed.
services:
postgres:
image: pgvector/pgvector:pg18
# via mirror.gcr.io (Docker Hub pull-through cache; avoids Hub anonymous
# pull rate-limit that randomly fails on shared GitHub runner IPs).
image: mirror.gcr.io/pgvector/pgvector:pg18
env:
POSTGRES_USER: docmost
POSTGRES_PASSWORD: docmost_dev_pw
@@ -40,7 +42,8 @@ jobs:
--health-timeout 5s
--health-retries 5
redis:
image: redis:7
# via mirror.gcr.io (see postgres note above).
image: mirror.gcr.io/library/redis:7
ports:
- 6379:6379
options: >-

View File

@@ -197,6 +197,12 @@ pnpm workspace (`pnpm@10.4.0`) orchestrated by **Nx**. Four workspace packages:
Run from the repo root unless noted. The dev workflow needs **Postgres (with the `pgvector` extension) and Redis** reachable per `.env` (copy `.env.example``.env`).
> **Bringing up a full local stand** (API + client + the separate realtime
> collaboration process) has several non-obvious gotchas — a missing collab
> server, `APP_SECRET` mismatch between processes, a stale `editor-ext` white-
> screening the client, LAN exposure. See **[docs/dev-stand.md](docs/dev-stand.md)**
> for the step-by-step and the traps.
```bash
pnpm install # install all workspaces (uses pnpm patches; see package.json `pnpm.patchedDependencies`)
pnpm dev # client (Vite) + server (Nest watch) concurrently — primary dev loop
@@ -241,6 +247,8 @@ Migration files live in `apps/server/src/database/migrations/` and are named `YY
- **API server** — `dist/main` (`apps/server/src/main.ts`), the Fastify HTTP app (`AppModule`).
- **Collaboration server** — `dist/collaboration/server/collab-main` (`pnpm collab`), a Hocuspocus/Yjs WebSocket server (`apps/server/src/collaboration/`) handling real-time document editing, persistence, and page-history snapshots. It listens on `COLLAB_PORT` (default `3001`), separate from the API server's `PORT` (default `3000`), and shares state with the API server through Redis.
`pnpm dev` starts **only** the API server + client — the collaboration process is separate and must be started too, or the editor never connects. See **[docs/dev-stand.md](docs/dev-stand.md)** for running both locally (and why `APP_SECRET` must match between them).
The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes `robots.txt`, public share pages, and `mcp` from the prefix). A `preHandler` hook enforces that a resolved `workspaceId` exists for most `/api` routes (multi-tenant by hostname/subdomain via `DomainMiddleware`). `GET /api/sb/:id` (the anonymous blob-sandbox read route) is listed in that preHandler's `excludedPaths`, so it is exempt from workspace resolution and carries no session auth at all (its capability is the unguessable UUID + TTL + TLS) — unlike `/api/files/public/...`, which still resolves a workspace and requires a workspace-bound attachment JWT. Auth is JWT (cookie + bearer); authorization is **CASL** (`core/casl`) — every data access is scoped to the user's abilities.
### Module structure (server)

View File

@@ -12,6 +12,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **Editable captions for images.** Images gain an optional caption shown
below them, edited inline from the image bubble menu and stored as a `caption` attribute. Captions round-trip
losslessly through markdown as a `data-caption` attribute on the image, so
they survive export/import unchanged. (#221)
- **Quick-create regular and temporary notes from the Home and Space screens.**
The Home screen now shows a second action next to "New note" that creates a
*temporary* note (one that auto-moves to Trash after the workspace lifetime),
@@ -129,6 +134,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
"This address is in use. Saving will move it to this page." — and keeps Save
enabled, so the existing reassign-confirm flow (`409 ALIAS_REASSIGN_REQUIRED`
"Move custom address?") is discoverable instead of reading as terminal. (#227)
- **A non-empty page can no longer be silently lost to a momentarily-empty live
document.** The server's persistence guard now refuses to overwrite non-empty
persisted content with an empty live Y.Doc — a transient emptiness from a
glitch, a bad merge, or an emptying transclusion no longer wipes the saved
page. A *deliberate* clear still works: a select-all + Delete in the editor
emits a single-use "intentional clear" signal that lets exactly that one empty
write through the guard, so genuinely emptying a page is persisted while
accidental empties are blocked. (#248, #251)
### Security
@@ -501,6 +514,7 @@ knowledge layer, an embedded MCP server, and the Gitmost rebrand.
- Build: drop the private EE submodule, retarget CI to GHCR, and update the
Docker image to the GHCR registry.
[Unreleased]: https://github.com/vvzvlad/gitmost/compare/v0.93.0...HEAD
[Unreleased]: https://github.com/vvzvlad/gitmost/compare/v0.94.0...HEAD
[0.94.0]: https://github.com/vvzvlad/gitmost/compare/v0.93.0...v0.94.0
[0.93.0]: https://github.com/vvzvlad/gitmost/compare/v0.91.0...v0.93.0
[0.91.0]: https://github.com/vvzvlad/gitmost/compare/v0.90.1...v0.91.0

View File

@@ -286,6 +286,9 @@
"Alt text": "Alt text",
"Describe this for accessibility.": "Describe this for accessibility.",
"Add a description": "Add a description",
"Caption": "Caption",
"Add a caption": "Add a caption",
"Shown below the image.": "Shown below the image.",
"Justify": "Justify",
"Merge cells": "Merge cells",
"Split cell": "Split cell",

View File

@@ -0,0 +1,206 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import { renderHook, act } from "@testing-library/react";
// Shared, hoisted test state the module mocks write into. `onSpeechEnd` is the
// VAD callback the hook registers on MicVAD.new — capturing it lets us drive
// "a speech segment ended" deterministically. `pending` collects the deferred
// transcription promises so the test controls their resolution order, which is
// the whole point: out-of-order HTTP responses must NOT scramble the emitted
// text (the in-order emitter under test).
const h = vi.hoisted(() => {
return {
onSpeechEnd: null as null | ((audio: Float32Array) => void),
pending: [] as { resolve: (s: string) => void; reject: (e: unknown) => void }[],
notify: null as null | ReturnType<typeof Object>,
};
});
// Lazy-imported VAD: capture the onSpeechEnd handler and hand back a no-op
// instance (start/pause/destroy all resolve).
vi.mock("@ricky0123/vad-web", () => ({
MicVAD: {
new: vi.fn(async (opts: { onSpeechEnd: (a: Float32Array) => void }) => {
h.onSpeechEnd = opts.onSpeechEnd;
return {
start: vi.fn(async () => {}),
pause: vi.fn(async () => {}),
destroy: vi.fn(async () => {}),
};
}),
},
}));
// Each transcribeAudio call returns a promise we resolve/reject by index.
vi.mock("@/features/dictation/services/dictation-service", () => ({
transcribeAudio: vi.fn(
() =>
new Promise<string>((resolve, reject) => {
h.pending.push({ resolve, reject });
}),
),
}));
// Avoid real WAV encoding; the segment payload is irrelevant to ordering.
vi.mock("@/features/dictation/utils/encode-wav", () => ({
encodeWavPcm16: vi.fn(() => new Blob()),
}));
const notifyShow = vi.fn();
vi.mock("@mantine/notifications", () => ({
notifications: { show: (...args: unknown[]) => notifyShow(...args) },
}));
vi.mock("react-i18next", () => ({
useTranslation: () => ({ t: (s: string) => s }),
}));
import { useStreamingDictation } from "./use-streaming-dictation";
// jsdom has no AudioContext; the hook constructs one and calls resume(). A
// trivial stub is enough — the real audio path is irrelevant to ordering.
class FakeAudioContext {
state = "running";
resume() {
return Promise.resolve();
}
close() {
this.state = "closed";
return Promise.resolve();
}
}
async function startRecording(onText: (t: string) => void) {
const hook = renderHook(() => useStreamingDictation({ onText }));
await act(async () => {
await hook.result.current.start();
});
// The VAD registered its onSpeechEnd and start() resolved into "recording".
expect(h.onSpeechEnd).toBeTypeOf("function");
expect(hook.result.current.status).toBe("recording");
return hook;
}
// Fire N ended speech segments (seq 0..N-1), each kicking off one transcription.
async function emitSegments(n: number) {
await act(async () => {
for (let i = 0; i < n; i++) h.onSpeechEnd!(new Float32Array(8));
});
}
describe("useStreamingDictation — in-order segment emitter", () => {
beforeEach(() => {
vi.clearAllMocks();
h.onSpeechEnd = null;
h.pending = [];
notifyShow.mockClear();
(window as unknown as { AudioContext: unknown }).AudioContext =
FakeAudioContext;
});
it("emits transcriptions in segment order even when responses resolve out of order", async () => {
const emitted: string[] = [];
await startRecording((t) => emitted.push(t));
await emitSegments(3);
expect(h.pending).toHaveLength(3);
// Resolve seq 1 FIRST: it must be buffered, not emitted, because seq 0 is
// still outstanding (nextEmit == 0).
await act(async () => {
h.pending[1].resolve("second");
});
expect(emitted).toEqual([]);
// Resolve seq 0: this unblocks the buffer and flushes 0 then 1 in order.
await act(async () => {
h.pending[0].resolve("first");
});
expect(emitted).toEqual(["first", "second"]);
// seq 2 resolves last and flushes immediately (it is now next).
await act(async () => {
h.pending[2].resolve("third");
});
expect(emitted).toEqual(["first", "second", "third"]);
});
it("trims whitespace and drops empty/whitespace-only transcriptions while still advancing", async () => {
const emitted: string[] = [];
await startRecording((t) => emitted.push(t));
await emitSegments(3);
await act(async () => {
h.pending[0].resolve(" hello "); // leading/trailing space trimmed
h.pending[1].resolve(" "); // whitespace-only -> not emitted, but seq advances
h.pending[2].resolve("world");
});
expect(emitted).toEqual(["hello", "world"]);
});
it("a failed segment shows one notification and is skipped so later segments still flush in order", async () => {
const emitted: string[] = [];
await startRecording((t) => emitted.push(t));
await emitSegments(2);
// seq 0 fails: the user sees a notification and the emitter advances past it.
await act(async () => {
h.pending[0].reject({ message: "boom" });
});
expect(notifyShow).toHaveBeenCalledTimes(1);
expect(emitted).toEqual([]);
// seq 1 still flushes (it is now next), proving one failure did not stall.
await act(async () => {
h.pending[1].resolve("survivor");
});
expect(emitted).toEqual(["survivor"]);
});
it("an OUT-OF-ORDER failed segment is buffered as empty and skipped without stalling later text", async () => {
const emitted: string[] = [];
await startRecording((t) => emitted.push(t));
await emitSegments(3);
// seq 1 (NOT next-to-emit) fails first: it takes the else branch — an empty
// placeholder is buffered (resultsRef.set(seq, "")) so the emitter can later
// skip it. One notification, nothing emitted yet (seq 0 still gates).
await act(async () => {
h.pending[1].reject({ message: "boom" });
});
expect(notifyShow).toHaveBeenCalledTimes(1);
expect(emitted).toEqual([]);
// seq 0 flushes; the drain then reaches the buffered empty seq 1 and SKIPS
// past it to seq 2.
await act(async () => {
h.pending[0].resolve("alpha");
});
expect(emitted).toEqual(["alpha"]);
// seq 2 emits — proving the empty placeholder let the emitter advance past
// the failed seq 1. Without the else branch's placeholder the drain would
// stall at the missing seq 1 and "gamma" would never flush.
await act(async () => {
h.pending[2].resolve("gamma");
});
expect(emitted).toEqual(["alpha", "gamma"]);
});
it("ignores a transcription that resolves AFTER cancel() (stale epoch — no emit)", async () => {
const emitted: string[] = [];
const hook = await startRecording((t) => emitted.push(t));
await emitSegments(1);
// Hard discard the session: the in-flight request is now stale.
act(() => {
hook.result.current.cancel();
});
expect(hook.result.current.status).toBe("idle");
// Its late resolution must be dropped (no emit into the new/empty session).
await act(async () => {
h.pending[0].resolve("late");
});
expect(emitted).toEqual([]);
});
});

View File

@@ -10,6 +10,7 @@ import {
IconUnderline,
IconMessage,
IconEyeOff,
IconClearFormatting,
} from "@tabler/icons-react";
import clsx from "clsx";
import classes from "./bubble-menu.module.css";
@@ -117,6 +118,14 @@ export const EditorBubbleMenu: FC<EditorBubbleMenuProps> = (props) => {
command: () => props.editor.chain().focus().toggleSpoiler().run(),
icon: IconEyeOff,
},
{
name: "Clear formatting",
// Action, not a toggle — never show an active/highlighted state.
isActive: () => false,
// Mirror the fixed-toolbar behavior: strip all inline marks from the selection.
command: () => props.editor.chain().focus().unsetAllMarks().run(),
icon: IconClearFormatting,
},
];
const commentItem: BubbleMenuItem = {

View File

@@ -1,16 +1,7 @@
import React, { useCallback, useEffect, useState } from "react";
import { Editor } from "@tiptap/react";
import {
ActionIcon,
Button,
Group,
Paper,
Text,
Textarea,
Tooltip,
} from "@mantine/core";
import { IconAlt } from "@tabler/icons-react";
import { useTranslation } from "react-i18next";
import { useImageTextFieldControl } from "@/features/editor/components/common/use-image-text-field-control.tsx";
const ALT_MAX_LENGTH = 300;
@@ -27,113 +18,25 @@ type UseAltTextControlArgs = {
currentAlt: string;
};
// Thin wrapper over the shared image text-field popover; see
// useImageTextFieldControl. The t("...") literals stay here so they remain
// statically extractable for i18n.
export function useAltTextControl({
editor,
nodeName,
currentAlt,
}: UseAltTextControlArgs) {
const { t } = useTranslation();
const [showInput, setShowInput] = useState(false);
const [draft, setDraft] = useState("");
const open = useCallback(() => {
setDraft(currentAlt || "");
setShowInput(true);
}, [currentAlt]);
useEffect(() => {
const handler = () => {
if (!editor.isActive(nodeName)) {
setShowInput(false);
}
};
editor.on("selectionUpdate", handler);
return () => {
editor.off("selectionUpdate", handler);
};
}, [editor, nodeName]);
const cancel = useCallback(() => {
setShowInput(false);
}, []);
const save = useCallback(() => {
editor
.chain()
.focus(undefined, { scrollIntoView: false })
.updateAttributes(nodeName, { alt: sanitizeAlt(draft) || undefined })
.run();
setShowInput(false);
}, [editor, nodeName, draft]);
const onKeyDown = useCallback(
(e: React.KeyboardEvent) => {
if (e.key === "Enter" && (e.metaKey || e.ctrlKey)) {
e.preventDefault();
save();
} else if (e.key === "Escape") {
e.preventDefault();
cancel();
}
},
[save, cancel],
);
const button = (
<Tooltip position="top" label={t("Alt text")} withinPortal={false}>
<ActionIcon
onClick={open}
size="lg"
aria-label={t("Alt text")}
variant="subtle"
>
<IconAlt size={18} />
</ActionIcon>
</Tooltip>
);
const panel = showInput ? (
<Paper
withBorder
shadow="md"
radius={6}
p="sm"
w={320}
style={{ position: "relative", zIndex: 100 }}
>
<Text size="sm" fw={600} mb={2}>
{t("Alt text")}
</Text>
<Text size="xs" c="dimmed" mb="xs">
{t("Describe this for accessibility.")}
</Text>
<Textarea
size="xs"
placeholder={t("Add a description")}
value={draft}
onChange={(e) => setDraft(e.currentTarget.value)}
onKeyDown={onKeyDown}
autoFocus
autosize
minRows={2}
maxRows={5}
maxLength={ALT_MAX_LENGTH}
/>
<Group justify="space-between" align="center" mt="xs" wrap="nowrap">
<Text size="xs" c="dimmed">
{draft.length}/{ALT_MAX_LENGTH}
</Text>
<Group gap="xs">
<Button size="compact-xs" variant="default" onClick={cancel}>
{t("Cancel")}
</Button>
<Button size="compact-xs" onClick={save}>
{t("Save")}
</Button>
</Group>
</Group>
</Paper>
) : null;
return { button, panel, isEditing: showInput };
return useImageTextFieldControl({
editor,
nodeName,
currentValue: currentAlt,
attrName: "alt",
sanitize: sanitizeAlt,
maxLength: ALT_MAX_LENGTH,
icon: <IconAlt size={18} />,
label: t("Alt text"),
description: t("Describe this for accessibility."),
placeholder: t("Add a description"),
});
}

View File

@@ -0,0 +1,59 @@
import { describe, it, expect } from "vitest";
import { sanitizeCaption } from "@/features/editor/components/common/use-caption-control.tsx";
/**
* `sanitizeCaption` = collapse every whitespace run to a single space + trim +
* cap at 500 chars. Captions are plain visible text, so this is a softer
* normalization than alt-text sanitization.
*/
describe("sanitizeCaption", () => {
it("trims leading and trailing whitespace", () => {
expect(sanitizeCaption(" hello ")).toBe("hello");
});
it("collapses internal whitespace runs to a single space", () => {
expect(sanitizeCaption("a b c")).toBe("a b c");
});
it("treats tab, newline and CRLF as whitespace", () => {
expect(sanitizeCaption("a\tb")).toBe("a b");
expect(sanitizeCaption("a\nb")).toBe("a b");
expect(sanitizeCaption("a\r\nb")).toBe("a b");
expect(sanitizeCaption("line1\n\n\nline2")).toBe("line1 line2");
});
it("treats unicode whitespace (no-break space) as a separator", () => {
// U+00A0 NO-BREAK SPACE is matched by the \s class.
expect(sanitizeCaption("a b")).toBe("a b");
});
it("returns empty string for whitespace-only input", () => {
expect(sanitizeCaption(" ")).toBe("");
expect(sanitizeCaption("")).toBe("");
});
it("keeps a caption at the 500-char limit unchanged", () => {
const exact = "x".repeat(500);
expect(sanitizeCaption(exact)).toHaveLength(500);
expect(sanitizeCaption(exact)).toBe(exact);
});
it("slices a caption longer than 500 chars down to 500", () => {
const tooLong = "y".repeat(600);
const result = sanitizeCaption(tooLong);
expect(result).toHaveLength(500);
expect(result).toBe("y".repeat(500));
});
it("collapses whitespace before applying the 500-char cap", () => {
// 120 "a b " groups (600 raw chars) collapse to "a b a b ..." = 479 chars
// after trimming the trailing space, which stays under the 500 cap — so only
// the collapse is exercised here, no slice. (See the dedicated >500 test
// above for the slice boundary.)
const input = "a b ".repeat(120); // lots of double spaces
const result = sanitizeCaption(input);
expect(result).toHaveLength(479);
expect(result.length).toBeLessThanOrEqual(500);
expect(result).not.toMatch(/\s{2,}/);
});
});

View File

@@ -0,0 +1,42 @@
import { Editor } from "@tiptap/react";
import { IconTextCaption } from "@tabler/icons-react";
import { useTranslation } from "react-i18next";
import { useImageTextFieldControl } from "@/features/editor/components/common/use-image-text-field-control.tsx";
const CAPTION_MAX_LENGTH = 500;
// Caption is plain visible text (not a markdown link target like alt), so it is
// sanitized more softly than alt: collapse runs of whitespace/newlines into a
// single space and trim, keeping the limit generous.
export function sanitizeCaption(value: string): string {
return value.replace(/\s+/g, " ").trim().slice(0, CAPTION_MAX_LENGTH);
}
type UseCaptionControlArgs = {
editor: Editor;
nodeName: string;
currentCaption: string;
};
// Thin wrapper over the shared image text-field popover; see
// useImageTextFieldControl. The t("...") literals stay here so they remain
// statically extractable for i18n.
export function useCaptionControl({
editor,
nodeName,
currentCaption,
}: UseCaptionControlArgs) {
const { t } = useTranslation();
return useImageTextFieldControl({
editor,
nodeName,
currentValue: currentCaption,
attrName: "caption",
sanitize: sanitizeCaption,
maxLength: CAPTION_MAX_LENGTH,
icon: <IconTextCaption size={18} />,
label: t("Caption"),
description: t("Shown below the image."),
placeholder: t("Add a caption"),
});
}

View File

@@ -0,0 +1,145 @@
import React, { useCallback, useEffect, useState } from "react";
import { Editor } from "@tiptap/react";
import {
ActionIcon,
Button,
Group,
Paper,
Text,
Textarea,
Tooltip,
} from "@mantine/core";
import { useTranslation } from "react-i18next";
// Shared logic+UI for the image bubble-menu text-field popovers (alt text,
// caption, ...). Each field is the same popover — an ActionIcon that opens a
// titled Paper with a counted Textarea and Cancel/Save — differing only in the
// node attribute it writes, its sanitizer, length cap, icon and labels. The
// label/description/placeholder are passed already translated so the literal
// t("...") calls stay in the thin wrappers and remain extractable; the shared
// Cancel/Save strings are translated here.
type UseImageTextFieldControlArgs = {
editor: Editor;
nodeName: string;
currentValue: string;
attrName: string;
sanitize: (value: string) => string;
maxLength: number;
icon: React.ReactNode;
label: string;
description: string;
placeholder: string;
};
export function useImageTextFieldControl({
editor,
nodeName,
currentValue,
attrName,
sanitize,
maxLength,
icon,
label,
description,
placeholder,
}: UseImageTextFieldControlArgs) {
const { t } = useTranslation();
const [showInput, setShowInput] = useState(false);
const [draft, setDraft] = useState("");
const open = useCallback(() => {
setDraft(currentValue || "");
setShowInput(true);
}, [currentValue]);
useEffect(() => {
const handler = () => {
if (!editor.isActive(nodeName)) {
setShowInput(false);
}
};
editor.on("selectionUpdate", handler);
return () => {
editor.off("selectionUpdate", handler);
};
}, [editor, nodeName]);
const cancel = useCallback(() => {
setShowInput(false);
}, []);
const save = useCallback(() => {
editor
.chain()
.focus(undefined, { scrollIntoView: false })
.updateAttributes(nodeName, { [attrName]: sanitize(draft) || undefined })
.run();
setShowInput(false);
}, [editor, nodeName, attrName, sanitize, draft]);
const onKeyDown = useCallback(
(e: React.KeyboardEvent) => {
if (e.key === "Enter" && (e.metaKey || e.ctrlKey)) {
e.preventDefault();
save();
} else if (e.key === "Escape") {
e.preventDefault();
cancel();
}
},
[save, cancel],
);
const button = (
<Tooltip position="top" label={label} withinPortal={false}>
<ActionIcon onClick={open} size="lg" aria-label={label} variant="subtle">
{icon}
</ActionIcon>
</Tooltip>
);
const panel = showInput ? (
<Paper
withBorder
shadow="md"
radius={6}
p="sm"
w={320}
style={{ position: "relative", zIndex: 100 }}
>
<Text size="sm" fw={600} mb={2}>
{label}
</Text>
<Text size="xs" c="dimmed" mb="xs">
{description}
</Text>
<Textarea
size="xs"
placeholder={placeholder}
value={draft}
onChange={(e) => setDraft(e.currentTarget.value)}
onKeyDown={onKeyDown}
autoFocus
autosize
minRows={2}
maxRows={5}
maxLength={maxLength}
/>
<Group justify="space-between" align="center" mt="xs" wrap="nowrap">
<Text size="xs" c="dimmed">
{draft.length}/{maxLength}
</Text>
<Group gap="xs">
<Button size="compact-xs" variant="default" onClick={cancel}>
{t("Cancel")}
</Button>
<Button size="compact-xs" onClick={save}>
{t("Save")}
</Button>
</Group>
</Group>
</Paper>
) : null;
return { button, panel, isEditing: showInput };
}

View File

@@ -23,6 +23,7 @@ import { useTranslation } from "react-i18next";
import { getFileUrl } from "@/lib/config.ts";
import { uploadImageAction } from "@/features/editor/components/image/upload-image-action.tsx";
import { useAltTextControl } from "@/features/editor/components/common/use-alt-text-control.tsx";
import { useCaptionControl } from "@/features/editor/components/common/use-caption-control.tsx";
import classes from "../common/toolbar-menu.module.css";
export function ImageMenu({ editor }: EditorMenuProps) {
@@ -47,6 +48,7 @@ export function ImageMenu({ editor }: EditorMenuProps) {
isFloatRight: ctx.editor.isActive("image", { align: "floatRight" }),
src: imageAttrs?.src || null,
alt: imageAttrs?.alt || "",
caption: imageAttrs?.caption || "",
};
},
});
@@ -168,6 +170,16 @@ export function ImageMenu({ editor }: EditorMenuProps) {
currentAlt: editorState?.alt || "",
});
const {
button: captionButton,
panel: captionPanel,
isEditing: isEditingCaption,
} = useCaptionControl({
editor,
nodeName: "image",
currentCaption: editorState?.caption || "",
});
return (
<BaseBubbleMenu
editor={editor}
@@ -183,6 +195,8 @@ export function ImageMenu({ editor }: EditorMenuProps) {
>
{isEditingAlt ? (
altTextPanel
) : isEditingCaption ? (
captionPanel
) : (
<div className={classes.toolbar}>
<Tooltip position="top" label={t("Align left")} withinPortal={false}>
@@ -249,6 +263,8 @@ export function ImageMenu({ editor }: EditorMenuProps) {
{altTextButton}
{captionButton}
<div className={classes.divider} />
<Tooltip position="top" label={t("Download")} withinPortal={false}>

View File

@@ -9,7 +9,9 @@ import { useTranslation } from "react-i18next";
export default function ImageView(props: NodeViewProps) {
const { t } = useTranslation();
const { editor, node, selected } = props;
const { src, width, align, alt, aspectRatio, placeholder } = node.attrs;
const { src, width, align, alt, caption, aspectRatio, placeholder } =
node.attrs;
const captionText = (caption || "").trim();
const alignClass = useMemo(() => {
if (align === "left") return "alignLeft";
if (align === "right") return "alignRight";
@@ -29,6 +31,7 @@ export default function ImageView(props: NodeViewProps) {
return (
<NodeViewWrapper data-drag-handle>
<figure style={{ margin: 0 }}>
<div
className={clsx(
selected && "ProseMirror-selectednode",
@@ -66,6 +69,15 @@ export default function ImageView(props: NodeViewProps) {
</Group>
)}
</div>
{captionText && (
<Text
component="figcaption"
className="image-caption"
>
{captionText}
</Text>
)}
</figure>
</NodeViewWrapper>
);
}

View File

@@ -0,0 +1,194 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
// Mock the page-service so importing the module under test does not pull in the
// axios/api-client chain. `createMentionAction` is wired to `getPageById`; the
// spy lets us assert that wiring without any network. `vi.hoisted` keeps the spy
// available inside the hoisted vi.mock factory.
const { getPageById } = vi.hoisted(() => ({ getPageById: vi.fn() }));
vi.mock("@/features/page/services/page-service.ts", () => ({
getPageById,
}));
// `uuid` v7 is used for the mention node id; pin only v7 so assertions are
// stable, keeping the rest (e.g. `validate`, used by extractPageSlugId) real.
vi.mock("uuid", async (importOriginal) => ({
...(await importOriginal<typeof import("uuid")>()),
v7: () => "fixed-mention-uuid",
}));
import {
handleInternalLink,
createMentionAction,
} from "./internal-link-paste";
// Minimal ProseMirror-ish EditorView fake. We record what handleInternalLink
// builds and dispatches without standing up a real schema/state.
function makeView() {
const tr = {
replaceWith: vi.fn(function (this: unknown) {
return tr;
}),
insertText: vi.fn(function (this: unknown) {
return tr;
}),
addMark: vi.fn(function (this: unknown) {
return tr;
}),
};
const schema = {
nodes: {
mention: {
// Echo the attrs back so we can assert exactly what was created.
create: vi.fn((attrs: Record<string, unknown>) => ({
type: "mention",
attrs,
})),
},
},
marks: {
link: {
create: vi.fn((attrs: Record<string, unknown>) => ({
type: "link",
attrs,
})),
},
},
};
const view = {
state: { schema, tr },
dispatch: vi.fn(),
};
return { view, tr, schema };
}
describe("handleInternalLink", () => {
beforeEach(() => vi.clearAllMocks());
it("does nothing when validateFn rejects the url (no resolve, no dispatch)", async () => {
const onResolveLink = vi.fn();
const validateFn = vi.fn(() => false);
const { view } = makeView();
await handleInternalLink({ validateFn, onResolveLink })(
"any-url",
view as never,
3,
"creator-1",
);
expect(validateFn).toHaveBeenCalledWith("any-url", view);
expect(onResolveLink).not.toHaveBeenCalled();
expect(view.dispatch).not.toHaveBeenCalled();
});
it("on resolve: inserts a mention node carrying the resolved page + anchor and dispatches replaceWith at pos", async () => {
const page = {
id: "page-id-99",
title: "My Page",
slugId: "slugABC",
};
const onResolveLink = vi.fn().mockResolvedValue(page);
const { view, tr, schema } = makeView();
// extractPageSlugId("doc-slug-xyz789") -> "xyz789" (last hyphen segment).
await handleInternalLink({ validateFn: () => true, onResolveLink })(
"doc-slug-xyz789",
view as never,
5,
"creator-7",
"anchor-42",
);
// The linked page id is the extracted slug-id, not the whole url.
expect(onResolveLink).toHaveBeenCalledWith("xyz789", "creator-7");
expect(schema.nodes.mention.create).toHaveBeenCalledWith({
id: "fixed-mention-uuid",
label: "My Page",
entityType: "page",
entityId: "page-id-99",
slugId: "slugABC",
creatorId: "creator-7",
anchorId: "anchor-42",
});
expect(tr.replaceWith).toHaveBeenCalledWith(5, 5, {
type: "mention",
attrs: expect.objectContaining({ entityId: "page-id-99" }),
});
expect(tr.insertText).not.toHaveBeenCalled();
expect(view.dispatch).toHaveBeenCalledTimes(1);
expect(view.dispatch).toHaveBeenCalledWith(tr);
});
it("falls back to 'Untitled' label when the resolved page has no title", async () => {
const onResolveLink = vi
.fn()
.mockResolvedValue({ id: "p", title: "", slugId: "s" });
const { view, schema } = makeView();
await handleInternalLink({ validateFn: () => true, onResolveLink })(
"abc-id1",
view as never,
0,
"c",
);
expect(schema.nodes.mention.create).toHaveBeenCalledWith(
expect.objectContaining({ label: "Untitled" }),
);
});
it("on reject: inserts the raw url as plain text with a link mark and dispatches", async () => {
const onResolveLink = vi.fn().mockRejectedValue(new Error("not found"));
const { view, tr, schema } = makeView();
await handleInternalLink({ validateFn: () => true, onResolveLink })(
"http://x/page-id2",
view as never,
4,
"creator-1",
);
// No mention node on the failure path.
expect(schema.nodes.mention.create).not.toHaveBeenCalled();
expect(tr.insertText).toHaveBeenCalledWith("http://x/page-id2", 4);
expect(schema.marks.link.create).toHaveBeenCalledWith({
href: "http://x/page-id2",
});
// Mark spans exactly the inserted url text: [pos, pos + url.length].
expect(tr.addMark).toHaveBeenCalledWith(4, 4 + "http://x/page-id2".length, {
type: "link",
attrs: { href: "http://x/page-id2" },
});
expect(view.dispatch).toHaveBeenCalledTimes(1);
});
});
describe("createMentionAction", () => {
beforeEach(() => vi.clearAllMocks());
it("resolves the link via getPageById and inserts the mention", async () => {
getPageById.mockResolvedValue({
id: "real-page",
title: "Real",
slugId: "rslug",
});
const { view, schema } = makeView();
await createMentionAction("ref-pageABC", view as never, 2, "creator-9");
expect(getPageById).toHaveBeenCalledWith({ pageId: "pageABC" });
expect(schema.nodes.mention.create).toHaveBeenCalledWith(
expect.objectContaining({ entityId: "real-page", label: "Real" }),
);
});
it("propagates a getPageById failure to the plain-link fallback", async () => {
getPageById.mockRejectedValue(new Error("404"));
const { view, tr } = makeView();
await createMentionAction("ref-pageABC", view as never, 1, "creator-9");
// Failure path: the url is inserted as text, not as a mention node.
expect(tr.insertText).toHaveBeenCalledWith("ref-pageABC", 1);
});
});

View File

@@ -125,6 +125,7 @@ import { countWords } from "alfaaz";
import AutoJoiner from "@/features/editor/extensions/autojoiner.ts";
import GlobalDragHandle from "@/features/editor/extensions/drag-handle.ts";
import { CleanStyles } from "@/features/editor/extensions/clean-styles.ts";
import { IntentionalClear } from "@/features/editor/extensions/intentional-clear.ts";
const lowlight = createLowlight(common);
lowlight.register("mermaid", plaintext);
@@ -493,4 +494,10 @@ export const collabExtensions: CollabExtensions = (provider, user) => [
color: randomElement(userColors),
},
}),
// #251 — emit an intentional-clear signal to the server when the user
// deliberately empties the page, so the #248 store-side empty-guard lets that
// one clear through while still blocking accidental empties.
IntentionalClear.configure({
provider,
}),
];

View File

@@ -0,0 +1,120 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import { Editor } from "@tiptap/core";
import { Document } from "@tiptap/extension-document";
import { Paragraph } from "@tiptap/extension-paragraph";
import { Text } from "@tiptap/extension-text";
import { ySyncPluginKey } from "@tiptap/y-tiptap";
import {
IntentionalClear,
INTENTIONAL_CLEAR_MESSAGE_TYPE,
} from "./intentional-clear";
/**
* #251 — the intentional-clear signal is driven through the REAL editor path:
* a fresh Editor with the IntentionalClear extension, a fake provider that
* records sendStateless, and the actual select-all + delete command the user's
* keystroke runs. No hand-poke of any flag.
*/
describe("IntentionalClear extension", () => {
let sendStateless: ReturnType<typeof vi.fn>;
const makeEditor = (content: unknown) =>
new Editor({
extensions: [
Document,
Paragraph,
Text,
IntentionalClear.configure({
// Minimal provider stand-in: only sendStateless is exercised.
provider: { sendStateless } as any,
}),
],
content: content as any,
});
beforeEach(() => {
sendStateless = vi.fn();
});
it("emits the clear signal when a user empties a non-empty doc (select-all + delete)", () => {
const editor = makeEditor({
type: "doc",
content: [
{ type: "paragraph", content: [{ type: "text", text: "hello world" }] },
],
});
// The exact command path a select-all + Delete keystroke dispatches.
editor.chain().selectAll().deleteSelection().run();
expect(sendStateless).toHaveBeenCalledTimes(1);
const payload = JSON.parse(sendStateless.mock.calls[0][0]);
expect(payload).toEqual({ type: INTENTIONAL_CLEAR_MESSAGE_TYPE });
editor.destroy();
});
it("does NOT emit when typing into an empty doc (no non-empty → empty transition)", () => {
const editor = makeEditor({ type: "doc", content: [{ type: "paragraph" }] });
editor.chain().insertContent("typed text").run();
expect(sendStateless).not.toHaveBeenCalled();
editor.destroy();
});
it("does NOT emit on an edit that leaves the doc non-empty", () => {
const editor = makeEditor({
type: "doc",
content: [
{ type: "paragraph", content: [{ type: "text", text: "keep me" }] },
],
});
editor.chain().insertContent(" more").run();
expect(sendStateless).not.toHaveBeenCalled();
editor.destroy();
});
it("does NOT emit when a REMOTE/merge (change-origin) transaction empties the doc", () => {
// This pins the CENTRAL #248 protection: only a LOCAL user edit may emit the
// intentional-clear signal. An emptiness arriving from another client, a bad
// merge, or an emptied transclusion is applied as a y-sync transaction tagged
// with the ySyncPluginKey meta, which `isChangeOrigin` detects. The extension
// must early-return on it and NOT punch the empty write through the server
// guard.
const editor = makeEditor({
type: "doc",
content: [
{ type: "paragraph", content: [{ type: "text", text: "remote content" }] },
],
});
// Build a transaction that empties the non-empty doc and tag it exactly the
// way y-tiptap tags a remote y-sync update: `tr.setMeta(ySyncPluginKey,
// { isChangeOrigin: true })` (see @tiptap/y-tiptap sync-plugin). This makes
// the real `isChangeOrigin(tr)` predicate return true — not a stand-in.
const { state } = editor;
const tr = state.tr
.delete(0, state.doc.content.size)
.setMeta(ySyncPluginKey, { isChangeOrigin: true });
editor.view.dispatch(tr);
// The transaction really emptied the doc (became the single empty paragraph)…
expect(editor.state.doc.textContent).toBe("");
// …yet because it is change-origin, no signal is emitted.
expect(sendStateless).not.toHaveBeenCalled();
editor.destroy();
});
it("does NOT emit when the doc was already empty", () => {
const editor = makeEditor({ type: "doc", content: [{ type: "paragraph" }] });
// Selecting all + delete on an already-empty doc is a no-op transition.
editor.chain().selectAll().deleteSelection().run();
expect(sendStateless).not.toHaveBeenCalled();
editor.destroy();
});
});

View File

@@ -0,0 +1,94 @@
import { Extension } from "@tiptap/core";
import { isChangeOrigin } from "@tiptap/extension-collaboration";
import type { Node as PMNode } from "@tiptap/pm/model";
import type { HocuspocusProvider } from "@hocuspocus/provider";
/**
* Stateless message type sent to the server when a user deliberately clears a
* page to empty. Kept in one place so the client emitter and the server
* consumer (PersistenceExtension.onStateless) agree on the wire format.
*/
export const INTENTIONAL_CLEAR_MESSAGE_TYPE = "intentional-clear";
export interface IntentionalClearOptions {
/** The collab provider used to send the stateless clear signal. */
provider: HocuspocusProvider | null;
}
/**
* A "document is empty" check that mirrors the server's `isEmptyParagraphDoc`
* (collaboration.util.ts): exactly one top-level paragraph with no inline
* content. After a select-all + delete TipTap leaves precisely this shape, so
* matching it here keeps the client signal aligned with the server guard that
* consumes it.
*/
function isEmptyParagraphDoc(doc: PMNode): boolean {
if (doc.childCount !== 1) return false;
const child = doc.firstChild;
return (
child !== null &&
child !== undefined &&
child.type.name === "paragraph" &&
child.content.size === 0
);
}
/**
* #251 — intentional-clear signal.
*
* The server's #248 store-side empty-guard unconditionally refuses to overwrite
* non-empty persisted content with an empty document, because a momentarily
* empty live Y.Doc (a glitch, a bad merge, an emptying transclusion) is
* indistinguishable from a real clear *at the store layer*. That protection is
* correct, but it also blocks a user who genuinely wants to empty the page.
*
* This extension supplies the missing distinction. It watches LOCAL, user-driven
* transactions and, the moment one reduces a non-empty document to the empty
* single-paragraph shape, it sends a hocuspocus stateless message to the server.
* The server records a short-lived, single-use "intentional clear pending" flag
* for this document that the next (debounced) onStoreDocument consumes to let
* that one empty write through the guard.
*
* What counts as an intentional clear (precise definition):
* - the transaction actually changed the document (`docChanged`), AND
* - it is a LOCAL user edit, not a remote collab application — remote y-sync
* transactions are tagged and filtered out via `isChangeOrigin`, so an
* emptiness that arrives from another client / a merge never emits a signal,
* AND
* - the document was non-empty before the transaction and is the empty
* single-paragraph doc after it.
*
* This is exactly the select-all + Delete / Backspace (or any local command that
* empties the doc, e.g. clearContent) keystroke path. A transient/programmatic
* empty serialization that the server might see on the wire does NOT come with
* this signal, so the guard still blocks it.
*/
export const IntentionalClear = Extension.create<IntentionalClearOptions>({
name: "intentionalClear",
addOptions() {
return {
provider: null,
};
},
onTransaction({ transaction }) {
if (!transaction.docChanged) return;
// Only react to local user edits. Remote collaboration steps (and other
// y-sync-applied changes) carry the change origin and must never be treated
// as an intentional clear, otherwise a remote/merge-induced emptiness would
// punch through the server guard.
if (isChangeOrigin(transaction)) return;
const becameEmpty =
!isEmptyParagraphDoc(transaction.before) &&
isEmptyParagraphDoc(transaction.doc);
if (!becameEmpty) return;
// The server reads the originating document from the connection, so the
// payload only needs to declare intent — it cannot target another document.
this.options.provider?.sendStateless(
JSON.stringify({ type: INTENTIONAL_CLEAR_MESSAGE_TYPE }),
);
},
});

View File

@@ -0,0 +1,243 @@
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { renderHook, act } from "@testing-library/react";
import { useScrollPosition } from "./use-scroll-position";
const KEY_PREFIX = "gitmost:scroll-position:";
function setScrollY(value: number): void {
Object.defineProperty(window, "scrollY", {
configurable: true,
value,
});
}
function setScrollHeight(value: number): void {
Object.defineProperty(document.documentElement, "scrollHeight", {
configurable: true,
value,
});
}
function setInnerHeight(value: number): void {
Object.defineProperty(window, "innerHeight", {
configurable: true,
value,
});
}
describe("useScrollPosition", () => {
beforeEach(() => {
window.sessionStorage.clear();
setScrollY(0);
setScrollHeight(0);
setInnerHeight(800);
// jsdom does not implement window.scrollTo; stub it.
window.scrollTo = vi.fn();
// Ensure no anchor leaks between tests.
window.location.hash = "";
});
afterEach(() => {
vi.restoreAllMocks();
vi.useRealTimers();
window.location.hash = "";
});
it("(a) saves window.scrollY to sessionStorage under the pageId key, throttled", () => {
vi.useFakeTimers();
const { unmount } = renderHook(() => useScrollPosition("p1"));
// Leading-edge save fires immediately.
setScrollY(123);
act(() => {
window.dispatchEvent(new Event("scroll"));
});
expect(window.sessionStorage.getItem(`${KEY_PREFIX}p1`)).toBe("123");
// Within the throttle window the next scroll is suppressed.
setScrollY(456);
act(() => {
window.dispatchEvent(new Event("scroll"));
});
expect(window.sessionStorage.getItem(`${KEY_PREFIX}p1`)).toBe("123");
// After the throttle window elapses, the next scroll persists again.
act(() => {
vi.advanceTimersByTime(250);
});
setScrollY(789);
act(() => {
window.dispatchEvent(new Event("scroll"));
});
expect(window.sessionStorage.getItem(`${KEY_PREFIX}p1`)).toBe("789");
unmount();
});
it("(a2) the restore target is captured at mount and survives a fresh scroll@0 clobber", () => {
vi.useFakeTimers();
// A previous session saved 500.
window.sessionStorage.setItem(`${KEY_PREFIX}clob`, "500");
const { result } = renderHook(() => useScrollPosition("clob"));
// On load the page is at the top; a scroll@0 fires and overwrites storage
// with 0. This is exactly the clobber the synchronous mount-capture defends
// against: the stored value becomes "0", but the target was already captured.
setScrollY(0);
act(() => {
window.dispatchEvent(new Event("scroll"));
});
expect(window.sessionStorage.getItem(`${KEY_PREFIX}clob`)).toBe("0");
// Restore still scrolls to 500 (the captured target), NOT the clobbered 0.
// If the capture were moved into an effect (after handlers register), it
// would read the clobbered 0 and this assertion would fail.
setScrollHeight(2000); // maxScroll = 1200 >= 500
act(() => {
result.current.restoreScrollPosition();
});
expect(window.scrollTo).toHaveBeenCalledWith({ top: 500, behavior: "auto" });
});
it("(a3) restores at most once per mount even if called again", () => {
vi.useFakeTimers();
window.sessionStorage.setItem(`${KEY_PREFIX}once`, "500");
setScrollHeight(2000); // tall enough to restore synchronously
const { result } = renderHook(() => useScrollPosition("once"));
act(() => {
result.current.restoreScrollPosition();
});
expect(window.scrollTo).toHaveBeenCalledTimes(1);
// A second call (e.g. the wiring effect re-running on [showStatic, editor,
// restoreScrollPosition]) must NOT scroll again and yank the reader.
act(() => {
result.current.restoreScrollPosition();
});
expect(window.scrollTo).toHaveBeenCalledTimes(1);
});
it("(b) does not restore when the URL has a #hash anchor", () => {
vi.useFakeTimers();
window.sessionStorage.setItem(`${KEY_PREFIX}p2`, "500");
// Content is ALREADY tall enough (maxScroll = 2000 - 800 = 1200 >= 500), so
// without the hash guard tryRestore would call scrollTo synchronously on the
// first tick. The assertion below therefore genuinely proves the hash guard
// short-circuits before any scroll (not just that the poll has not fired).
setScrollHeight(2000);
window.location.hash = "#some-heading";
const { result } = renderHook(() => useScrollPosition("p2"));
act(() => {
result.current.restoreScrollPosition();
vi.advanceTimersByTime(5000);
});
expect(window.scrollTo).not.toHaveBeenCalled();
});
it("(f) cancels the in-flight restore poll on unmount (no scroll on the next page)", () => {
vi.useFakeTimers();
window.sessionStorage.setItem(`${KEY_PREFIX}p7`, "500");
setInnerHeight(800);
setScrollHeight(100); // maxScroll = -700: target not reachable yet, so it polls.
const { result, unmount } = renderHook(() => useScrollPosition("p7"));
act(() => {
result.current.restoreScrollPosition();
});
expect(window.scrollTo).not.toHaveBeenCalled(); // still polling
// Navigate away (the hook unmounts) BEFORE the content grows tall enough.
unmount();
// Content of the NEXT page becomes tall; advancing time must NOT resurrect
// the cancelled poll (without the cleanup it would scroll the new page).
setScrollHeight(2000);
act(() => {
vi.advanceTimersByTime(5000);
});
expect(window.scrollTo).not.toHaveBeenCalled();
});
it("(c) does nothing when nothing is saved or the saved value is <= 0", () => {
// Nothing saved.
const a = renderHook(() => useScrollPosition("nope"));
act(() => {
a.result.current.restoreScrollPosition();
});
expect(window.scrollTo).not.toHaveBeenCalled();
// Saved value <= 0.
window.sessionStorage.setItem(`${KEY_PREFIX}zero`, "0");
const b = renderHook(() => useScrollPosition("zero"));
act(() => {
b.result.current.restoreScrollPosition();
});
expect(window.scrollTo).not.toHaveBeenCalled();
});
it("(d) scrolls to the saved Y once the content is tall enough", () => {
vi.useFakeTimers();
window.sessionStorage.setItem(`${KEY_PREFIX}p4`, "500");
setInnerHeight(800);
setScrollHeight(100); // maxScroll = -700, target not yet reachable.
const { result } = renderHook(() => useScrollPosition("p4"));
act(() => {
result.current.restoreScrollPosition();
});
// Still polling: content not laid out yet.
expect(window.scrollTo).not.toHaveBeenCalled();
// Content becomes tall enough: maxScroll = 2000 - 800 = 1200 >= 500.
setScrollHeight(2000);
act(() => {
vi.advanceTimersByTime(100);
});
expect(window.scrollTo).toHaveBeenCalledWith({ top: 500, behavior: "auto" });
});
it("(d2) clamps to the max reachable position after the timeout", () => {
vi.useFakeTimers();
window.sessionStorage.setItem(`${KEY_PREFIX}p5`, "5000");
setInnerHeight(800);
setScrollHeight(1000); // maxScroll stays 200, never reaches 5000.
const { result } = renderHook(() => useScrollPosition("p5"));
act(() => {
result.current.restoreScrollPosition();
});
// Advance past the 5s timeout; restore should fire clamped to maxScroll.
act(() => {
vi.advanceTimersByTime(5000);
});
expect(window.scrollTo).toHaveBeenCalledWith({ top: 200, behavior: "auto" });
});
it("(e) never throws when storage access throws", () => {
const err = new Error("storage denied");
vi.spyOn(window.sessionStorage, "getItem").mockImplementation(() => {
throw err;
});
vi.spyOn(window.sessionStorage, "setItem").mockImplementation(() => {
throw err;
});
expect(() => {
const { result, unmount } = renderHook(() => useScrollPosition("p6"));
act(() => {
setScrollY(42);
window.dispatchEvent(new Event("scroll"));
result.current.restoreScrollPosition();
});
unmount();
}).not.toThrow();
});
});

View File

@@ -0,0 +1,177 @@
import { useCallback, useEffect, useRef } from "react";
// Throttle interval for persisting the scroll position while the user reads.
const SAVE_THROTTLE_MS = 250;
// Give up polling for the live content height after this long and restore to
// the furthest reachable position (handles "collab never finishes laying out").
const MAX_RESTORE_WAIT_MS = 5000;
// How often to re-check the document height while waiting for content to load.
const RESTORE_POLL_MS = 100;
// sessionStorage key prefix. sessionStorage survives an F5 in the same tab and
// is cleared on tab close, which is exactly the lifetime we want for an MVP
// "remember where I was reading" feature (self-limiting, no cross-tab leak).
const STORAGE_PREFIX = "gitmost:scroll-position:";
function storageKey(pageId: string): string {
return `${STORAGE_PREFIX}${pageId}`;
}
// All storage access is wrapped: private mode / quota / disabled storage must
// never throw out of the hook and break the page.
function readStorage(pageId: string): number | null {
try {
const raw = window.sessionStorage.getItem(storageKey(pageId));
if (raw === null) return null;
const value = Number.parseInt(raw, 10);
return Number.isFinite(value) ? value : null;
} catch (err) {
// Best-effort feature: storage may be unavailable (private mode / quota).
// No user-facing notification (a missed scroll restore is not actionable),
// but log per the AGENTS.md "errors must never be swallowed" rule.
console.warn("[useScrollPosition] sessionStorage read failed", err);
return null;
}
}
function writeStorage(pageId: string, scrollY: number): void {
try {
window.sessionStorage.setItem(storageKey(pageId), String(Math.round(scrollY)));
} catch (err) {
// Storage unavailable (private mode / quota). Non-actionable for the user,
// but log it rather than swallow silently (AGENTS.md error-handling rule).
console.warn("[useScrollPosition] sessionStorage write failed", err);
}
}
/**
* Persists and restores the window scroll position per page so a reader keeps
* their place across a reload (F5) or reopening the document.
*
* Returns `restoreScrollPosition`, which the page editor calls once the live
* (non-static) content is laid out. The two scroll mechanisms are mutually
* exclusive: if the URL has a `#hash` anchor, the existing anchor-scroll logic
* wins and restore is a no-op.
*/
export function useScrollPosition(pageId: string): {
restoreScrollPosition: () => void;
} {
// CONTRACT: this hook assumes PageEditor REMOUNTS per page — page.tsx renders
// `<MemoizedFullEditor key={page.id} ...>`, so switching pages creates a fresh
// hook instance with fresh refs. These refs latch per-mount and are NOT reset
// when `pageId` changes in place (only the effect re-runs on [pageId]). If that
// `key={page.id}` is ever removed, restore would silently break on the 2nd page
// (refs would hold the first page's target / already-restored flag) — in that
// case the refs must be reset on a pageId change.
//
// The target Y captured synchronously at mount, BEFORE any scroll/visibility
// handler can overwrite the stored value with a fresh 0 (the page starts
// scrolled to top on load). `null` means "not yet captured".
const initialTargetRef = useRef<number | null>(null);
// Guards so restore runs at most once per page mount.
const hasRestoredRef = useRef(false);
// Holds the in-flight restore poll timer so the cleanup can cancel it: without
// this, a fast SPA navigation away mid-poll would let the old page's poll fire
// window.scrollTo against the NEW page's document (visible wrong-page scroll).
const pollTimerRef = useRef<number | null>(null);
// Capture the previously-saved value synchronously during render, before the
// effect below registers handlers that would persist the current (0) scrollY.
if (initialTargetRef.current === null) {
const saved = readStorage(pageId);
// Store 0 when nothing is saved so the "already captured" check (!== null)
// holds; restore treats targetY <= 0 as a no-op anyway.
initialTargetRef.current = saved ?? 0;
}
useEffect(() => {
let throttleTimer: number | null = null;
const save = () => {
writeStorage(pageId, window.scrollY);
};
// Throttle the high-frequency scroll handler: persist immediately on the
// leading edge, then at most once per SAVE_THROTTLE_MS.
const onScroll = () => {
if (throttleTimer !== null) return;
save();
throttleTimer = window.setTimeout(() => {
throttleTimer = null;
}, SAVE_THROTTLE_MS);
};
// pagehide fires on reload/navigation (more reliable than unload); save now.
const onPageHide = () => {
save();
};
// Save when the tab is being backgrounded — covers mobile where pagehide is
// not always emitted.
const onVisibilityChange = () => {
if (document.visibilityState === "hidden") {
save();
}
};
window.addEventListener("scroll", onScroll, { passive: true });
window.addEventListener("pagehide", onPageHide);
document.addEventListener("visibilitychange", onVisibilityChange);
return () => {
window.removeEventListener("scroll", onScroll);
window.removeEventListener("pagehide", onPageHide);
document.removeEventListener("visibilitychange", onVisibilityChange);
if (throttleTimer !== null) {
window.clearTimeout(throttleTimer);
throttleTimer = null;
}
// Cancel any in-flight restore poll so it cannot scroll the next page.
if (pollTimerRef.current !== null) {
window.clearTimeout(pollTimerRef.current);
pollTimerRef.current = null;
}
// SPA navigation away from this page: persist the final position.
save();
};
}, [pageId]);
const restoreScrollPosition = useCallback(() => {
// Run at most once per page mount.
if (hasRestoredRef.current) return;
hasRestoredRef.current = true;
// Anchor priority: a `#hash` in the URL is handled by useEditorScroll.
if (window.location.hash) return;
const targetY = initialTargetRef.current ?? 0;
// Nothing meaningful to restore to.
if (targetY <= 0) return;
const start = Date.now();
const tryRestore = () => {
const maxScroll =
document.documentElement.scrollHeight - window.innerHeight;
const timedOut = Date.now() - start >= MAX_RESTORE_WAIT_MS;
// Restore once the content is tall enough to reach the target, or bail out
// after the timeout and scroll as far as currently possible.
if (maxScroll >= targetY || timedOut) {
window.scrollTo({
top: Math.min(targetY, Math.max(maxScroll, 0)),
behavior: "auto",
});
pollTimerRef.current = null;
return;
}
// Stored in a ref so the effect cleanup can cancel it on unmount.
pollTimerRef.current = window.setTimeout(tryRestore, RESTORE_POLL_MS);
};
tryRestore();
}, []);
return { restoreScrollPosition };
}

View File

@@ -77,6 +77,7 @@ import { PageEditMode } from "@/features/user/types/user.types.ts";
import { jwtDecode } from "jwt-decode";
import { searchSpotlight } from "@/features/search/constants.ts";
import { useEditorScroll } from "./hooks/use-editor-scroll";
import { useScrollPosition } from "./hooks/use-scroll-position";
import { EditorLinkMenu } from "@/features/editor/components/link/link-menu";
import ColumnsMenu from "@/features/editor/components/columns/columns-menu.tsx";
import { TransclusionLookupProvider } from "@/features/editor/components/transclusion/transclusion-lookup-context";
@@ -141,6 +142,7 @@ export default function PageEditor({
[isComponentMounted],
);
const { handleScrollTo } = useEditorScroll({ canScroll });
const { restoreScrollPosition } = useScrollPosition(pageId);
// Providers only created once per pageId
const providersRef = useRef<{
local: IndexeddbPersistence;
@@ -479,6 +481,11 @@ export default function PageEditor({
}
}, [yjsConnectionStatus, isSynced]);
// Restore the saved reading position once the live content is laid out.
useEffect(() => {
if (!showStatic && editor) restoreScrollPosition();
}, [showStatic, editor, restoreScrollPosition]);
return (
<TransclusionLookupProvider>
<PageEmbedLookupProvider>

View File

@@ -33,6 +33,15 @@
}
}
.image-caption {
text-align: center;
font-size: 0.875em;
color: var(--mantine-color-dimmed);
margin-top: 0.4em;
line-height: 1.35;
word-break: break-word;
}
.uploading-text {
font-size: var(--mantine-font-size-md);
line-height: var(--mantine-line-height-md);

View File

@@ -1,5 +1,7 @@
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import i18n from "@/i18n.ts";
import {
formatRelativeTime,
getTimeGroup,
groupNotificationsByTime,
} from "@/features/notification/notification.utils.ts";
@@ -132,3 +134,59 @@ describe("groupNotificationsByTime", () => {
expect(groupNotificationsByTime([], labels)).toEqual([]);
});
});
describe("formatRelativeTime — relative buckets and absolute-date fallback", () => {
// Distinct fixed clock for the relative formatter (uses Date.now via `new
// Date()`), so the bucket boundaries are deterministic under fake timers.
const NOW = new Date("2026-06-15T12:00:00.000Z");
const MIN = 60_000;
beforeEach(() => {
vi.setSystemTime(NOW);
});
// ISO string `ms` milliseconds before NOW.
function ago(ms: number): string {
return new Date(NOW.getTime() - ms).toISOString();
}
it("returns the i18n 'now' label for anything under a minute", () => {
expect(formatRelativeTime(ago(0))).toBe(i18n.t("now"));
expect(formatRelativeTime(ago(59_000))).toBe(i18n.t("now"));
});
it("crosses into the minutes bucket exactly at 1 minute", () => {
expect(formatRelativeTime(ago(MIN - 1000))).toBe(i18n.t("now"));
expect(formatRelativeTime(ago(MIN))).toBe("1m");
expect(formatRelativeTime(ago(5 * MIN))).toBe("5m");
expect(formatRelativeTime(ago(59 * MIN))).toBe("59m");
});
it("crosses into the hours bucket exactly at 60 minutes", () => {
expect(formatRelativeTime(ago(60 * MIN - 1000))).toBe("59m");
expect(formatRelativeTime(ago(HOUR))).toBe("1h");
expect(formatRelativeTime(ago(23 * HOUR))).toBe("23h");
});
it("crosses into the days bucket exactly at 24 hours", () => {
expect(formatRelativeTime(ago(24 * HOUR - 1000))).toBe("23h");
expect(formatRelativeTime(ago(DAY))).toBe("1d");
expect(formatRelativeTime(ago(6 * DAY))).toBe("6d");
});
it("falls back to an absolute short date once >= 7 days old", () => {
// 6d -> still relative; 7d -> absolute date (no longer N[mhd], and equal to
// the localized short-date of the source timestamp).
expect(formatRelativeTime(ago(6 * DAY))).toBe("6d");
const sevenDaysAgo = ago(7 * DAY);
const result = formatRelativeTime(sevenDaysAgo);
expect(result).not.toMatch(/^\d+[mhd]$/);
expect(result).not.toBe(i18n.t("now"));
const expected = new Intl.DateTimeFormat(i18n.language, {
month: "short",
day: "numeric",
}).format(new Date(sevenDaysAgo));
expect(result).toBe(expected);
});
});

View File

@@ -0,0 +1,79 @@
import { describe, it, expect } from "vitest";
import { findBreadcrumbPath } from "./utils";
import type { SpaceTreeNode } from "@/features/page/tree/types.ts";
// findBreadcrumbPath walks the live, SHARED sidebar tree. The high-value
// invariant: when a node has no usable name it must surface "Untitled" ONLY on
// the returned breadcrumb chain via a shallow copy — never by mutating the input
// node (which would silently rename the node in the sidebar). Also covers normal
// ancestor-chain resolution, the not-found case, and nested children.
function node(id: string, over: Partial<SpaceTreeNode> = {}): SpaceTreeNode {
return {
id,
slugId: `slug-${id}`,
name: id.toUpperCase(),
icon: undefined,
position: "a0",
spaceId: "space-1",
parentPageId: null as unknown as string,
hasChildren: false,
children: [],
...over,
};
}
describe("findBreadcrumbPath", () => {
it("does NOT mutate the input tree when a node has an empty/whitespace name", () => {
// A whitespace-only-named node nested under a blank-named root.
const target = node("target", { name: " " });
const root = node("root", { name: "", hasChildren: true, children: [target] });
const tree = [root];
const result = findBreadcrumbPath(tree, "target");
expect(result).not.toBeNull();
// The RETURNED chain shows "Untitled" for both blank nodes.
expect(result!.map((n) => n.name)).toEqual(["Untitled", "Untitled"]);
// The original input nodes are untouched (still blank).
expect(root.name).toBe("");
expect(target.name).toBe(" ");
// The renamed breadcrumb entries are fresh copies, not the input objects.
expect(result![0]).not.toBe(root);
expect(result![1]).not.toBe(target);
});
it("returns the SAME node reference (no copy) when the name is non-empty", () => {
// No rename needed -> the node is passed through by reference (cheap path).
const target = node("target", { name: "Real Title" });
const result = findBreadcrumbPath([target], "target");
expect(result![0]).toBe(target);
expect(result![0].name).toBe("Real Title");
});
it("resolves the full ancestor chain ending at the target", () => {
const target = node("c");
const mid = node("b", { hasChildren: true, children: [target] });
const root = node("a", { hasChildren: true, children: [mid] });
const result = findBreadcrumbPath([root], "c");
expect(result!.map((n) => n.id)).toEqual(["a", "b", "c"]);
});
it("finds a target nested under a deeper sibling branch", () => {
// Two root branches; the target lives inside the second branch's child.
const target = node("deep");
const branch2 = node("r2", {
hasChildren: true,
children: [node("x"), node("y", { hasChildren: true, children: [target] })],
});
const branch1 = node("r1", { hasChildren: true, children: [node("z")] });
const result = findBreadcrumbPath([branch1, branch2], "deep");
expect(result!.map((n) => n.id)).toEqual(["r2", "y", "deep"]);
});
it("returns null when the page id is not present in the tree", () => {
const root = node("root", { hasChildren: true, children: [node("child")] });
expect(findBreadcrumbPath([root], "missing")).toBeNull();
expect(findBreadcrumbPath([], "anything")).toBeNull();
});
});

View File

@@ -8,6 +8,8 @@ import {
closeIds,
mergeRootTrees,
loadedOpenBranchIds,
sortPositionKeys,
pageToTreeNode,
} from "./utils";
import type { IPage } from "@/features/page/types/page.types.ts";
import type { SpaceTreeNode } from "@/features/page/tree/types.ts";
@@ -60,6 +62,82 @@ function treeNode(id: string, children: SpaceTreeNode[] = []): SpaceTreeNode {
};
}
describe("sortPositionKeys", () => {
it("orders items ascending by their fractional `position` string", () => {
const items = [
{ id: "c", position: "a5" },
{ id: "a", position: "a1" },
{ id: "b", position: "a3" },
];
expect(sortPositionKeys(items).map((i) => i.id)).toEqual(["a", "b", "c"]);
});
it("is a stable sort: equal positions keep their input order", () => {
const items = [
{ id: "x", position: "a1" },
{ id: "y", position: "a1" },
{ id: "z", position: "a1" },
];
expect(sortPositionKeys(items).map((i) => i.id)).toEqual(["x", "y", "z"]);
});
});
describe("pageToTreeNode", () => {
function pageRow(over: Partial<IPage> = {}): IPage {
return {
id: "p1",
slugId: "slug-p1",
title: "My Page",
icon: "📄",
position: "a1",
hasChildren: true,
spaceId: "space-1",
parentPageId: null as unknown as string,
...over,
} as IPage;
}
it("maps page.title -> node.name and copies the core fields", () => {
const node = pageToTreeNode(pageRow());
// The non-trivial transform: a page's `title` becomes the tree node's `name`.
expect(node.name).toBe("My Page");
expect(node.id).toBe("p1");
expect(node.slugId).toBe("slug-p1");
expect(node.icon).toBe("📄");
expect(node.position).toBe("a1");
expect(node.spaceId).toBe("space-1");
expect(node.hasChildren).toBe(true);
// Always materialized with an empty children array.
expect(node.children).toEqual([]);
});
it("derives canEdit from page.permissions.canEdit when the flat field is absent", () => {
const node = pageToTreeNode(
pageRow({ canEdit: undefined, permissions: { canEdit: true } } as Partial<IPage>),
);
expect(node.canEdit).toBe(true);
});
it("prefers the flat page.canEdit over permissions.canEdit", () => {
const node = pageToTreeNode(
pageRow({ canEdit: false, permissions: { canEdit: true } } as Partial<IPage>),
);
expect(node.canEdit).toBe(false);
});
it("carries temporaryExpiresAt straight off the page", () => {
const expiresAt = "2026-06-27T21:00:00.000Z";
expect(pageToTreeNode(pageRow({ temporaryExpiresAt: expiresAt })).temporaryExpiresAt).toBe(
expiresAt,
);
});
it("applies overrides on top of the mapped fields (e.g. optimistic blank name)", () => {
const node = pageToTreeNode(pageRow(), { name: "" });
expect(node.name).toBe("");
});
});
describe("buildTree", () => {
it("builds one node per unique page", () => {
const tree = buildTree([page("a", "a1"), page("b", "a2")]);

View File

@@ -70,18 +70,22 @@ export function findBreadcrumbPath(
path: SpaceTreeNode[] = [],
): SpaceTreeNode[] | null {
for (const node of tree) {
if (!node.name || node.name.trim() === "") {
node.name = "Untitled";
}
// Never mutate the input tree (it is the live, shared sidebar tree state).
// When a node has no usable name, surface "Untitled" via a shallow copy that
// only the returned breadcrumb chain sees — the source node stays untouched.
const displayNode: SpaceTreeNode =
!node.name || node.name.trim() === ""
? { ...node, name: "Untitled" }
: node;
if (node.id === pageId) {
return [...path, node];
return [...path, displayNode];
}
if (node.children) {
const newPath = findBreadcrumbPath(node.children, pageId, [
...path,
node,
displayNode,
]);
if (newPath) {
return newPath;

View File

@@ -3,6 +3,7 @@ import {
applyAddTreeNode,
applyMoveTreeNode,
applyDeleteTreeNode,
applyUpdateOne,
} from "./tree-socket-reducers";
import { treeModel } from "@/features/page/tree/model/tree-model";
import { SpaceTreeNode } from "@/features/page/tree/types.ts";
@@ -338,3 +339,76 @@ describe("applyAddTreeNode", () => {
expect(treeModel.find(next, "temp")?.temporaryExpiresAt).toBe(expiresAt);
});
});
describe("applyUpdateOne", () => {
// A loaded two-level tree so we can patch both a root and a nested node.
const buildTree = (): SpaceTreeNode[] => [
node("root", {
position: "a0",
name: "Root",
icon: "📁",
hasChildren: true,
children: [node("child", { position: "a1", parentPageId: "root", name: "Child", icon: "📄" })],
}),
];
// Build the UpdateEvent envelope; only `id`/`payload` matter to the reducer.
const ev = (id: string, payload: Record<string, unknown>) =>
({
operation: "updateOne",
spaceId: "space-1",
entity: ["pages"],
id,
payload,
}) as unknown as Parameters<typeof applyUpdateOne>[1];
it("applies a title-only update to the node's name (icon untouched)", () => {
const tree = buildTree();
const next = applyUpdateOne(tree, ev("child", { title: "Renamed" }));
const child = treeModel.find(next, "child");
expect(child?.name).toBe("Renamed");
// Icon is left as it was.
expect(child?.icon).toBe("📄");
});
it("applies an icon-only update to the node's icon (name untouched)", () => {
const tree = buildTree();
const next = applyUpdateOne(tree, ev("root", { icon: "🔥" }));
const root = treeModel.find(next, "root");
expect(root?.icon).toBe("🔥");
expect(root?.name).toBe("Root");
});
it("applies a combined title + icon update", () => {
const tree = buildTree();
const next = applyUpdateOne(tree, ev("child", { title: "Both", icon: "⭐" }));
const child = treeModel.find(next, "child");
expect(child?.name).toBe("Both");
expect(child?.icon).toBe("⭐");
});
it("returns prev UNCHANGED (same reference) when the id is not loaded", () => {
const tree = buildTree();
const next = applyUpdateOne(tree, ev("ghost", { title: "Nope" }));
expect(next).toBe(tree);
});
it("returns prev UNCHANGED (same reference) for a no-op payload (no title/icon)", () => {
// The node exists, but the payload carries neither title nor icon -> nothing
// to patch, so the reducer must hand back the same array reference.
const tree = buildTree();
const next = applyUpdateOne(tree, ev("child", {}));
expect(next).toBe(tree);
});
it("treats an explicit null icon/title as a value to apply (undefined check, not truthiness)", () => {
// The reducer guards on `!== undefined`, so a clearing null IS applied.
const tree = buildTree();
const next = applyUpdateOne(tree, ev("child", { title: "", icon: null }));
const child = treeModel.find(next, "child");
expect(child?.name).toBe("");
expect(child?.icon).toBeNull();
// And it did change something -> a fresh reference, not prev.
expect(next).not.toBe(tree);
});
});

View File

@@ -3,6 +3,9 @@ import {
resolveCardStatus,
isEndpointConfigured,
resolveKeyField,
nextReindexPollInterval,
isReindexComplete,
isReindexButtonLoading,
} from './ai-provider-settings';
describe('resolveCardStatus', () => {
@@ -71,3 +74,195 @@ describe('resolveKeyField (write-only key payload)', () => {
expect(resolveKeyField('', false)).toEqual({ set: false });
});
});
describe('nextReindexPollInterval', () => {
const INTERVAL = 5000;
// `seenActive: true` is the steady state for most of a run — a poll has
// observed `reindexing === true` (the server pre-seeds it from enqueue time).
const base = { now: 1_000, intervalMs: INTERVAL, seenActive: true };
it('does not poll when no reindex deadline is set', () => {
expect(
nextReindexPollInterval({
...base,
deadline: null,
status: { reindexing: true, indexedPages: 0, totalPages: 478 },
}),
).toBe(false);
});
it('keeps polling while the server reports an active run', () => {
expect(
nextReindexPollInterval({
...base,
deadline: 10_000,
status: { reindexing: true, indexedPages: 120, totalPages: 478 },
}),
).toBe(INTERVAL);
});
it('keeps polling during an active run even if counts momentarily look full', () => {
// The run clears its progress record only at the very end, so a transient
// indexed==total while reindexing is still true must NOT stop polling.
expect(
nextReindexPollInterval({
...base,
deadline: 10_000,
status: { reindexing: true, indexedPages: 478, totalPages: 478 },
}),
).toBe(INTERVAL);
});
it('stops once the run is finished AND fully indexed (after having been active)', () => {
expect(
nextReindexPollInterval({
...base,
deadline: 10_000,
status: { reindexing: false, indexedPages: 478, totalPages: 478 },
}),
).toBe(false);
});
it('does NOT stop on the stale pre-reindex snapshot (fully indexed, never seen active)', () => {
// Regression for #262: right after "Reindex now" the client still holds the
// PRE-reindex settings (an already fully-indexed workspace reads as
// reindexing=false, indexed>=total). Without the seenActive gate this looked
// "done" and stopped polling on the very first tick, freezing the counter at
// 0 until a manual reload. The fresh window has not observed the active run,
// so polling must continue until the first real poll lands.
expect(
nextReindexPollInterval({
...base,
seenActive: false,
deadline: 10_000,
status: { reindexing: false, indexedPages: 478, totalPages: 478 },
}),
).toBe(INTERVAL);
});
it('keeps polling within the deadline when not yet done and no active flag', () => {
// First poll right after enqueue, before the worker publishes progress.
expect(
nextReindexPollInterval({
...base,
seenActive: false,
deadline: 10_000,
status: { reindexing: false, indexedPages: 0, totalPages: 478 },
}),
).toBe(INTERVAL);
});
it('cap always wins: stops once past the deadline even if still reindexing', () => {
expect(
nextReindexPollInterval({
deadline: 1_000,
now: 2_000, // past the deadline
intervalMs: INTERVAL,
seenActive: true,
status: { reindexing: true, indexedPages: 200, totalPages: 478 },
}),
).toBe(false);
});
it('stops on an empty workspace (0 of 0) once the run is finished', () => {
// The pre-seed publishes reindexing=true even for 0 pages, so a poll sees the
// run active before the worker clears -> seenActive latches true.
expect(
nextReindexPollInterval({
...base,
deadline: 10_000,
status: { reindexing: false, indexedPages: 0, totalPages: 0 },
}),
).toBe(false);
});
});
describe('isReindexComplete', () => {
it('false when no status yet', () => {
expect(isReindexComplete(undefined, true)).toBe(false);
});
it('false while a run is still active (even at indexed==total)', () => {
expect(
isReindexComplete(
{ reindexing: true, indexedPages: 478, totalPages: 478 },
true,
),
).toBe(false);
});
it('false when finished but not yet fully indexed', () => {
expect(
isReindexComplete(
{ reindexing: false, indexedPages: 120, totalPages: 478 },
true,
),
).toBe(false);
});
it('true once finished and fully indexed (after having been active)', () => {
expect(
isReindexComplete(
{ reindexing: false, indexedPages: 478, totalPages: 478 },
true,
),
).toBe(true);
});
it('false on the stale pre-reindex snapshot: finished+fully indexed but never seen active', () => {
// The just-started edge: the gate keeps this from clearing the poll deadline
// before the first post-reindex poll arrives.
expect(
isReindexComplete(
{ reindexing: false, indexedPages: 478, totalPages: 478 },
false,
),
).toBe(false);
});
});
describe('isReindexButtonLoading', () => {
it('loads while the POST mutation is pending', () => {
expect(
isReindexButtonLoading({
mutationPending: true,
deadline: null,
status: false,
}),
).toBe(true);
});
it('does NOT load post-cap: deadline nulled but reindexing left stale-true', () => {
// The key case: after the poll cap fires `reindexDeadline` is null while
// `settings.reindexing` can be a stale `true` from the last poll. Gating on
// the deadline keeps the spinner from sticking forever so the admin can
// restart.
expect(
isReindexButtonLoading({
mutationPending: false,
deadline: null,
status: true,
}),
).toBe(false);
});
it('loads during an active run within the poll window', () => {
expect(
isReindexButtonLoading({
mutationPending: false,
deadline: 10_000,
status: true,
}),
).toBe(true);
});
it('does not load once the run finished while still polling', () => {
expect(
isReindexButtonLoading({
mutationPending: false,
deadline: 10_000,
status: false,
}),
).toBe(false);
});
});

View File

@@ -1,4 +1,4 @@
import { useEffect, useState } from "react";
import { useEffect, useRef, useState } from "react";
import { z } from "zod/v4";
import {
ActionIcon,
@@ -37,6 +37,7 @@ import {
} from "@/features/workspace/queries/ai-settings-query.ts";
import {
AiTestCapability,
IAiSettings,
IAiSettingsUpdate,
SttApiStyle,
ChatApiStyle,
@@ -169,6 +170,95 @@ export function resolveKeyField(
return { set: false };
}
// Subset of the status payload that drives the reindex poll decisions.
type ReindexStatus = Pick<
IAiSettings,
"reindexing" | "indexedPages" | "totalPages"
>;
/**
* Decide the TanStack Query `refetchInterval` while a reindex may be running.
* Returns the poll interval (ms) to keep polling, or `false` to stop.
*
* Polls while the server reports an ACTIVE run (`reindexing === true`) OR we are
* still within the deadline window and not yet fully indexed. Stops once the run
* has finished AND everything is indexed (server cleared its progress record and
* fell back to the DB coverage count), or the deadline cap is hit — the cap
* always wins so a stuck/never-clearing progress record can't poll forever.
*
* `seenActive` guards the just-started window: right after "Reindex now" the
* client still holds the PRE-reindex settings snapshot, which for an already
* fully-indexed workspace reads as `reindexing=false, indexed>=total`. Treating
* that stale snapshot as "done" would stop polling before the first post-reindex
* poll ever lands (counter frozen at 0). So completion is only honored once a
* poll has actually observed the active run (the enqueue-time pre-seed makes
* `reindexing=true` visible from the first poll until the run truly clears).
*/
export function nextReindexPollInterval(args: {
deadline: number | null;
now: number;
intervalMs: number;
status?: ReindexStatus;
seenActive: boolean;
}): number | false {
const { deadline, now, intervalMs, status, seenActive } = args;
if (deadline === null) return false;
// Cap always wins.
if (now > deadline) return false;
// Active run → keep polling even if the momentary counts already look full.
if (status?.reindexing) return intervalMs;
// Finished and fully indexed (incl. an empty workspace, 0 >= 0) → stop. Reuse
// isReindexComplete so the completeness check lives in exactly one place.
if (isReindexComplete(status, seenActive)) return false;
// Within the deadline and not yet done → keep polling.
return intervalMs;
}
/**
* Whether the reindex poll deadline should be cleared: a poll has observed the
* active run (`seenActive`) AND the server now reports no active run AND the
* count is complete. The single source of truth for the "reindex finished"
* check — `nextReindexPollInterval` reuses it for its stop condition (sans the
* cap, which the effect handles via time).
*
* The `seenActive` requirement is what keeps the STALE pre-reindex snapshot
* (already fully indexed → `reindexing=false, indexed>=total`) from being read
* as "finished" in the window before the first post-reindex poll arrives. Once
* a poll has seen `reindexing=true` (guaranteed by the server's enqueue-time
* pre-seed for the whole run), this flips to a genuine completion check.
*/
export function isReindexComplete(
status: ReindexStatus | undefined,
seenActive: boolean,
): boolean {
return (
seenActive &&
!!status &&
!status.reindexing &&
status.indexedPages >= status.totalPages
);
}
/**
* Whether the reindex button should show its spinner (and stay disabled).
*
* Spins while the POST is in flight, and for the WHOLE background run while the
* server reports `reindexing === true`. The `deadline !== null` gate is the
* load-bearing part: once the 120s poll cap fires it nulls `reindexDeadline`
* and stops refetching, so `status` (settings?.reindexing) can be a stale
* `true` from the last poll. Without the gate the spinner would stick forever
* for a run that outlives the cap and block a restart; gating on the active
* poll window clears it so the admin can re-trigger.
*/
export function isReindexButtonLoading(args: {
mutationPending: boolean;
deadline: number | null;
status?: boolean;
}): boolean {
const { mutationPending, deadline, status } = args;
return mutationPending || (deadline !== null && status === true);
}
// Translate the dot's tooltip label. Kept in one place so all three endpoint
// cards share identical wording.
function cardStatusLabel(status: CardStatus, t: (k: string) => string): string {
@@ -215,31 +305,48 @@ export default function AiProviderSettings() {
// PRE-job counts immediately, so the only way the "Indexed X of Y" counter
// visibly climbs is to keep polling the settings query while the job runs.
// `reindexDeadline` is the timestamp until which we poll (set on reindex
// success); polling stops early once indexed === total. Bounded so a stuck
// job can never poll forever.
const REINDEX_POLL_INTERVAL = 3000; // ms between refetches while indexing
// success). Polling tracks the server's `reindexing` flag: it keeps going for
// the whole active run and stops promptly once the server reports the run is
// finished. Bounded by the cap so a stuck/never-clearing progress record can
// never poll forever.
const REINDEX_POLL_INTERVAL = 5000; // ms between refetches while indexing
const REINDEX_POLL_CAP_MS = 120000; // ~2 min hard cap
const [reindexDeadline, setReindexDeadline] = useState<number | null>(null);
// Whether any poll in the CURRENT window has actually observed the active run
// (`reindexing === true`). Reset when a new reindex is kicked off. Gates the
// completion check so the STALE pre-reindex snapshot (an already fully-indexed
// workspace reads as `reindexing=false, indexed>=total`) can't be mistaken for
// "finished" before the first post-reindex poll lands — which would freeze the
// counter at 0 until a manual reload. A ref (not state) because it must not
// trigger a render and is only ever read where `reindexing` is already false.
const reindexSeenActiveRef = useRef(false);
// Only admins may read the (masked) AI settings; the server enforces this too.
const { data: settings, isLoading } = useAiSettingsQuery(isAdmin, (query) => {
if (reindexDeadline === null) return false;
// Past the cap → stop polling (cleared via the effect below too).
if (Date.now() > reindexDeadline) return false;
const data = query.state.data;
// Stop once everything is indexed; otherwise keep polling.
if (data && data.indexedPages >= data.totalPages) return false;
return REINDEX_POLL_INTERVAL;
});
const { data: settings, isLoading } = useAiSettingsQuery(isAdmin, (query) =>
nextReindexPollInterval({
deadline: reindexDeadline,
now: Date.now(),
intervalMs: REINDEX_POLL_INTERVAL,
status: query.state.data,
seenActive: reindexSeenActiveRef.current,
}),
);
// Stop polling once the work is done or the cap is reached. Also clears on
// Stop polling once the run is finished or the cap is reached. Also clears on
// unmount because the deadline state goes away with the component.
useEffect(() => {
if (reindexDeadline === null) return;
// "Done" matches the refetchInterval stop condition (indexed >= total),
// including an empty workspace (0 >= 0), so the deadline clears promptly
// instead of waiting out the cap.
if (settings && settings.indexedPages >= settings.totalPages) {
// Latch "we have seen the active run" the moment a poll reports it, so the
// completion check below (and the refetchInterval's) only fires once the run
// has genuinely started — never on the stale pre-reindex snapshot.
if (settings?.reindexing) reindexSeenActiveRef.current = true;
// "Done" matches the refetchInterval stop condition: a poll has observed the
// active run AND the server now reports no active run AND the count is
// complete (indexed >= total, incl. an empty workspace 0 >= 0), so the
// deadline clears promptly instead of waiting out the cap. While `reindexing`
// is still true (or no poll has seen it active yet) we keep the deadline so
// polling continues for the whole run.
if (isReindexComplete(settings, reindexSeenActiveRef.current)) {
setReindexDeadline(null);
return;
}
@@ -1031,13 +1138,28 @@ export default function AiProviderSettings() {
<Button
variant="subtle"
size="compact-sm"
loading={reindexMutation.isPending}
// Spin for the WHOLE run: the POST resolves immediately, but the
// background job keeps running, so also stay loading while the
// server reports `reindexing` (this also blocks a redundant
// re-trigger mid-run; the server de-dupes regardless). The
// deadline gate (and why it matters post-cap) lives in
// `isReindexButtonLoading`, which is unit-tested.
loading={isReindexButtonLoading({
mutationPending: reindexMutation.isPending,
deadline: reindexDeadline,
status: settings?.reindexing,
})}
onClick={() =>
reindexMutation.mutate(undefined, {
// Begin bounded polling so the counter climbs as the async
// background job indexes (it does not update on its own).
onSuccess: () =>
setReindexDeadline(Date.now() + REINDEX_POLL_CAP_MS),
// Clear the "seen active" latch first so this fresh window
// doesn't inherit a previous run's completion state and stop
// immediately.
onSuccess: () => {
reindexSeenActiveRef.current = false;
setReindexDeadline(Date.now() + REINDEX_POLL_CAP_MS);
},
})
}
>

View File

@@ -23,8 +23,12 @@ export function useAiSettingsQuery(
enabled: boolean = true,
// While reindexing runs as an async background job, the counter only climbs
// if the client keeps refetching. The component passes a refetchInterval
// function that polls until indexed === total or a bounded deadline, then
// returns false to stop. See AiProviderSettings.
// function (`nextReindexPollInterval`) that keeps polling while the server
// reports an active run (reindexing === true) OR we are still within the
// bounded deadline and not yet fully indexed; it returns false to stop only
// once the run has finished AND indexed >= total, or the deadline cap is hit
// (the cap always wins). Note: a transient indexed === total during an active
// run does NOT stop polling. See AiProviderSettings.
refetchInterval?:
| number
| false

View File

@@ -48,6 +48,9 @@ export interface IAiSettings {
// RAG indexing coverage (pages indexed for semantic search).
indexedPages: number;
totalPages: number;
// True while a full workspace reindex is actively running; the counts above
// then reflect the live run progress (done climbs 0 -> total).
reindexing?: boolean;
}
// Update payload. Key semantics (same for `apiKey` and `embeddingApiKey`):

View File

@@ -205,31 +205,203 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
expect(historyQueue.add).toHaveBeenCalledTimes(1);
});
// #206 persist-6 — RED (it.failing): a momentarily-empty live Y.Doc must not
// overwrite non-empty persisted content. `onStoreDocument` empty-guards the
// LOAD path but not the STORE path, so today an empty doc (a client/agent
// glitch, a bad merge, an emptying transclusion) is written straight over the
// page and the content is wiped silently. A store-side empty-guard is a real
// behaviour change (a deliberate "select-all + delete" is also empty), so it
// is left UNFIXED pending a product decision; this documents the data-loss
// path and flips to a normal passing test the moment the guard lands.
it.failing(
'does NOT overwrite non-empty content with a momentarily-empty live doc (persist-6)',
async () => {
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
const document = ydocFor(emptyDoc);
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
// #206 persist-6 / #248 — a momentarily-empty live Y.Doc must not overwrite
// non-empty persisted content. The store-side empty-guard blocks an empty doc
// (a client/agent glitch, a bad merge, an emptying transclusion) from wiping
// the page silently when NO intentional-clear signal is present.
it('does NOT overwrite non-empty content with a momentarily-empty live doc (persist-6)', async () => {
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
const document = ydocFor(emptyDoc);
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
await ext.onStoreDocument(buildData(document, 'user') as any);
await ext.onStoreDocument(buildData(document, 'user') as any);
// Desired contract: the empty incoming doc is rejected and the rich page
// survives. Today updatePage is called with the empty content (data loss).
expect(pageRepo.updatePage).not.toHaveBeenCalled();
},
);
// The empty incoming doc is rejected and the rich page survives.
expect(pageRepo.updatePage).not.toHaveBeenCalled();
});
// #248 — an empty-over-empty store is allowed (nothing to lose); the guard
// only protects non-empty persisted content.
it('allows an empty store over already-empty content (#248)', async () => {
const liveEmptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
const document = ydocFor(liveEmptyDoc);
// Stored content is empty per isEmptyParagraphDoc (paragraph with content:[])
// but NOT deep-equal to the normalized live doc, so the unchanged
// short-circuit is skipped and the empty-guard is genuinely reached.
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: { type: 'doc', content: [{ type: 'paragraph', content: [] }] },
});
await ext.onStoreDocument(buildData(document, 'user') as any);
expect(pageRepo.updatePage).toHaveBeenCalledTimes(1);
});
// #251 — REAL-PATH regression test. The intentional-clear signal is set via
// the actual transport seam (ext.onStateless with the exact stateless payload
// the client's IntentionalClear extension sends), NOT a hand-injected
// context.intentionalClear poke. We then run the debounced store with an empty
// live doc over non-empty persisted content and assert the empty write goes
// through — i.e. the clear persists.
it('persists an intentional clear signalled via the real stateless transport (#251)', async () => {
const documentName = `page.${PAGE_ID}`;
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
const document = ydocFor(emptyDoc);
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
// The client signalled a deliberate clear over the live connection.
await ext.onStateless({
connection: { readOnly: false } as any,
documentName,
document: document as any,
payload: JSON.stringify({ type: 'intentional-clear' }),
} as any);
await ext.onStoreDocument(buildData(document, 'user') as any);
// The empty doc was written (the clear persisted). The persisted content is
// the Y.Doc round-trip of the empty doc (attrs normalized), so compare
// against fromYdoc rather than the raw literal.
expect(pageRepo.updatePage).toHaveBeenCalledTimes(1);
const expectedEmpty = TiptapTransformer.fromYdoc(document, 'default');
expect(pageRepo.updatePage.mock.calls[0][0].content).toEqual(expectedEmpty);
});
// #251 — retry correctness: a transient DB failure on the FIRST attempt must
// not silently drop the clear. The intentional-clear flag is consumed ONCE
// before the retry loop, so when attempt 1's updatePage throws (tx rolls back,
// but the in-memory flag delete cannot roll back) the retry on attempt 2 still
// sees the clear as allowed and writes the empty doc. On the pre-fix code
// (consumeIntentionalClear called INSIDE the loop) attempt 1 consumed the flag,
// attempt 2 re-read it as absent and the empty-guard BLOCKED the write — so
// updatePage would be called once and the clear would be lost. This test fails
// on that ordering and passes after the hoist.
it('persists an intentional clear even when the first store attempt fails transiently (#251)', async () => {
const documentName = `page.${PAGE_ID}`;
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
const document = ydocFor(emptyDoc);
// The page stays non-empty in the DB across both attempts (the rolled-back
// first attempt never changed it), exactly the failure scenario the WARNING
// describes.
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
let attempts = 0;
pageRepo.updatePage.mockImplementation(async () => {
attempts += 1;
if (attempts === 1) throw new Error('deadlock detected'); // transient
callOrder.push('updatePage');
});
// The client signalled a deliberate clear over the live connection.
await ext.onStateless({
connection: { readOnly: false } as any,
documentName,
document: document as any,
payload: JSON.stringify({ type: 'intentional-clear' }),
} as any);
await ext.onStoreDocument(buildData(document, 'user') as any);
// First attempt failed and rolled back; the retry still honoured the clear
// and wrote the empty doc (the clear survived the retry).
expect(pageRepo.updatePage).toHaveBeenCalledTimes(2);
const expectedEmpty = TiptapTransformer.fromYdoc(document, 'default');
expect(pageRepo.updatePage.mock.calls[1][0].content).toEqual(expectedEmpty);
});
// #251 — the signal is single-use: it is consumed by the first empty store,
// so a SECOND accidental empty (no fresh signal) is still blocked.
it('consumes the intentional-clear signal once; a later empty is blocked (#251)', async () => {
const documentName = `page.${PAGE_ID}`;
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
await ext.onStateless({
connection: { readOnly: false } as any,
documentName,
document: ydocFor(emptyDoc) as any,
payload: JSON.stringify({ type: 'intentional-clear' }),
} as any);
// First empty store consumes the signal and writes.
await ext.onStoreDocument(buildData(ydocFor(emptyDoc), 'user') as any);
expect(pageRepo.updatePage).toHaveBeenCalledTimes(1);
// Re-arm findById to non-empty (as if content came back) and fire another
// empty store WITHOUT a new signal — the guard must block it.
pageRepo.updatePage.mockClear();
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
await ext.onStoreDocument(buildData(ydocFor(emptyDoc), 'user') as any);
expect(pageRepo.updatePage).not.toHaveBeenCalled();
});
// #251 — a read-only connection cannot arm the clear, so its empty store is
// still blocked (defends the guard against a read-only spoof).
it('ignores an intentional-clear signal from a read-only connection (#251)', async () => {
const documentName = `page.${PAGE_ID}`;
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
const document = ydocFor(emptyDoc);
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
await ext.onStateless({
connection: { readOnly: true } as any,
documentName,
document: document as any,
payload: JSON.stringify({ type: 'intentional-clear' }),
} as any);
await ext.onStoreDocument(buildData(document, 'user') as any);
expect(pageRepo.updatePage).not.toHaveBeenCalled();
});
// #251 — a non-empty store between the signal and the empty store drops the
// pending flag ("cleared then retyped" can't leave a usable signal behind).
it('drops a pending clear when a non-empty store intervenes (#251)', async () => {
const documentName = `page.${PAGE_ID}`;
const emptyDoc = { type: 'doc', content: [{ type: 'paragraph' }] };
await ext.onStateless({
connection: { readOnly: false } as any,
documentName,
document: ydocFor(emptyDoc) as any,
payload: JSON.stringify({ type: 'intentional-clear' }),
} as any);
// A non-empty store lands first → consumes/drops the stale flag.
pageRepo.findById.mockResolvedValue(persistedHumanPage('NEW HUMAN TEXT'));
await ext.onStoreDocument(
buildData(ydocFor(doc('NEW HUMAN TEXT')), 'user') as any,
);
pageRepo.updatePage.mockClear();
// Now an empty store with no fresh signal must be blocked.
pageRepo.findById.mockResolvedValue({
...persistedHumanPage('IGNORED'),
content: doc('IMPORTANT RICH CONTENT'),
});
await ext.onStoreDocument(buildData(ydocFor(emptyDoc), 'user') as any);
expect(pageRepo.updatePage).not.toHaveBeenCalled();
});
// persist-1 — when every attempt fails the hook must NOT report a phantom
// success: no "page.updated" badge broadcast and no history snapshot for
@@ -250,4 +422,51 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
expect(historyQueue.add).not.toHaveBeenCalled();
expect(aiQueue.add).not.toHaveBeenCalled();
});
// #260 — when the collab doc name carries a SLUGID (`page.<slugId>`) the
// post-store side effects must use the resolved page.id (a UUID), NOT the
// slugId. The transclusion sync + embedding reindex write uuid-typed columns,
// so a slugId there threw Postgres 22P02; the contributors key must also match
// the PAGE_HISTORY job, which is enqueued with page.id.
it('uses the canonical page.id (not the slugId doc name) for post-store side effects (#260)', async () => {
const SLUG = 'slug-1'; // persistedHumanPage.slugId; findById resolves it
const document = ydocFor(doc('NEW AGENT CONTENT'));
pageRepo.findById.mockResolvedValue(persistedHumanPage('NEW AGENT CONTENT'));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
// A `page.<slugId>` document name (the bug's smoking gun), agent store over
// a human page so the in-tx history-boundary read is also exercised.
await ext.onStoreDocument({
documentName: `page.${SLUG}`,
document,
context: { user: { id: USER_ID, name: 'Alice' }, actor: 'agent' },
} as any);
// findById was queried with the slugId (it resolves either id or slugId).
expect(pageRepo.findById).toHaveBeenCalledWith(SLUG, expect.anything());
// The in-tx history-boundary read uses the canonical UUID, never the slugId.
expect(pageHistoryRepo.findPageLastHistory).toHaveBeenCalledWith(
PAGE_ID,
expect.anything(),
);
// Transclusion sync (uuid-typed columns) must receive the UUID.
expect(transclusionService.syncPageTransclusions.mock.calls[0][0]).toBe(
PAGE_ID,
);
expect(transclusionService.syncPageReferences.mock.calls[0][0]).toBe(
PAGE_ID,
);
expect(
transclusionService.syncPageTemplateReferences.mock.calls[0][0],
).toBe(PAGE_ID);
// Embedding reindex job keyed by the UUID (slugId there threw 22P02).
expect(aiQueue.add).toHaveBeenCalledTimes(1);
expect(aiQueue.add.mock.calls[0][1].pageIds).toEqual([PAGE_ID]);
// Contributors keyed by the UUID so they match the PAGE_HISTORY job (page.id).
expect(collabHistory.addContributors.mock.calls[0][0]).toBe(PAGE_ID);
});
});

View File

@@ -3,6 +3,7 @@ import {
Extension,
onChangePayload,
onLoadDocumentPayload,
onStatelessPayload,
onStoreDocumentPayload,
} from '@hocuspocus/server';
import * as Y from 'yjs';
@@ -41,6 +42,35 @@ import {
} from '../constants';
import { TransclusionService } from '../../core/page/transclusion/transclusion.service';
/**
* #251 — wire format of the client→server stateless message that signals a
* deliberate page clear. The client (IntentionalClear editor extension) sends
* `{ type: INTENTIONAL_CLEAR_MESSAGE_TYPE }`; the document is taken from the
* connection, not the payload, so the signal cannot be aimed at another page.
*/
export const INTENTIONAL_CLEAR_MESSAGE_TYPE = 'intentional-clear';
/**
* #251 — how long an intentional-clear signal stays "pending" before it is
* ignored. The signal is set on the clearing keystroke but consumed by the
* DEBOUNCED onStoreDocument, so the TTL must comfortably exceed the collab
* store debounce window (hocuspocus is configured with maxDebounce = 45s in
* collaboration.gateway.ts). 60s leaves a margin while keeping the window for a
* stale flag small; on top of the TTL, any non-empty store immediately drops a
* pending flag (see onStoreDocument), so a "cleared then retyped" sequence can
* never leave a usable flag behind.
*
* Known fail-safe limitation: the flag lives only in this node's process memory.
* If document ownership transfers to another node, or this node crashes/restarts,
* between the stateless signal (set on node A) and the debounced store, the
* in-memory flag is lost and the clear is silently NOT applied — the store-side
* empty-guard then reloads the document non-empty from the DB. This is
* deliberately fail-safe (a lost flag preserves content rather than destroying
* it), but it is a documented limitation, not a guarantee that every deliberate
* clear survives a node handoff.
*/
export const INTENTIONAL_CLEAR_TTL_MS = 60_000;
/**
* Resolve the provenance source for a coalesced snapshot.
*
@@ -96,6 +126,13 @@ export class PersistenceExtension implements Extension {
// coalescing window" per document and OR it across all edits in the window,
// so the snapshot is marked 'agent' regardless of who wrote last.
private agentTouched: Map<string, boolean> = new Map();
// #251 — per-document "intentional clear pending" flags. Keyed by
// documentName, value = expiry timestamp (ms). Set by onStateless when the
// client reports a deliberate clear; consumed once by the next
// onStoreDocument empty-guard branch. This is the per-EDIT channel the
// per-connection context cannot provide (a clear is an edit event, but the
// store is debounced and connection context is fixed at authentication).
private intentionalClear: Map<string, number> = new Map();
constructor(
private readonly pageRepo: PageRepo,
@@ -180,6 +217,19 @@ export class PersistenceExtension implements Extension {
this.consumeAgentTouched(documentName),
context?.actor,
);
// #251 — consume the intentional-clear flag ONCE, BEFORE the retry loop
// (like consumeContributors / consumeAgentTouched above). consumeIntentional-
// Clear ALWAYS deletes the in-memory Map entry, but a tx rollback cannot
// un-delete it. Calling it INSIDE the loop meant: a clear armed for attempt 1
// was consumed there, attempt 1's updatePage threw a transient error and
// rolled back, then attempt 2 re-read non-empty content and saw the flag
// already gone — silently downgrading the retry into a BLOCKED write, so the
// user's deliberate clear was dropped. Hoisting makes the decision stable
// across every attempt. This single call also preserves the "a non-empty
// store drops a pending flag" semantics (the cleared-then-retyped case):
// every store consumes the flag here regardless of incoming emptiness, so a
// subsequent non-empty store can never leave a usable flag behind.
const allowIntentionalClear = this.consumeIntentionalClear(documentName);
// Persist with a small bounded retry. The in-memory Y.Doc is the ONLY copy
// of the latest edit until this hook returns: hocuspocus destroys/unloads the
@@ -210,6 +260,46 @@ export class PersistenceExtension implements Extension {
return;
}
// #206 persist-6 / #248 — store-side empty-guard. A momentarily-empty
// live Y.Doc (a client/agent glitch, a bad merge, a transclusion that
// emptied) must NOT overwrite non-empty persisted content. The LOAD
// path already guards emptiness (onLoadDocument only hydrates from db
// when the live doc isEmpty); the STORE path did not, so an empty
// serialization was written straight over the page, wiping it
// silently.
//
// #251 — the ONE legitimate empty-over-non-empty write is a user who
// deliberately clears the page. That intent arrives out-of-band as a
// stateless message, NOT from the doc content, which is why it cannot
// be spoofed for non-clear writes: the flag is only ever read on this
// empty-incoming branch, so the worst a forged signal can do is clear
// a page the connection may already edit. The flag was consumed ONCE
// before the retry loop (`allowIntentionalClear`) so the decision is
// stable across retries; a non-empty store still drops any pending
// flag via that same hoisted consume (a "cleared then retyped"
// sequence can't leave a usable one behind).
const incomingEmpty = isEmptyParagraphDoc(tiptapJson as any);
if (
incomingEmpty &&
page.content &&
!isEmptyParagraphDoc(page.content as any)
) {
if (allowIntentionalClear) {
this.logger.debug(
`Intentional clear for ${pageId}: persisting empty doc over ` +
`non-empty content (user-signalled)`,
);
// fall through — the empty write is allowed exactly once.
} else {
this.logger.warn(
`Skipping store for ${pageId}: empty live doc would overwrite ` +
`non-empty persisted content`,
);
page = null;
return;
}
}
let contributorIds = undefined;
try {
const existingContributors = page.contributorIds || [];
@@ -239,8 +329,10 @@ export class PersistenceExtension implements Extension {
lastUpdatedSource === 'agent' &&
page.lastUpdatedSource !== 'agent'
) {
// pageHistory.pageId is uuid-typed; use page.id (never the doc-name
// slugId) so a `page.<slugId>` doc cannot throw 22P02 here (#260).
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
pageId,
page.id,
{ includeContent: true, trx },
);
const humanBaselineMissing =
@@ -308,11 +400,16 @@ export class PersistenceExtension implements Extension {
}),
);
await this.syncTransclusion(pageId, page.workspaceId, tiptapJson);
// Use the canonical page UUID (page.id), not the doc-name id, which may be
// a slugId for a `page.<slugId>` doc (#260). The transclusion/reference
// syncs write uuid-typed columns, so a slugId here threw Postgres 22P02.
await this.syncTransclusion(page.id, page.workspaceId, tiptapJson);
}
if (page) {
await this.collabHistory.addContributors(pageId, editingUserIds);
// Key contributors by the page UUID so they MATCH the PAGE_HISTORY job,
// which is enqueued with page.id and pops contributors by page.id (#260).
await this.collabHistory.addContributors(page.id, editingUserIds);
const mentions = extractMentions(tiptapJson);
@@ -330,14 +427,17 @@ export class PersistenceExtension implements Extension {
creatorId: m.creatorId,
})),
oldMentionedUserIds,
pageId,
// Canonical UUID, never the doc-name slugId (#260).
pageId: page.id,
spaceId: page.spaceId,
workspaceId: page.workspaceId,
} as IPageMentionNotificationJob);
}
await this.aiQueue.add(QueueJob.PAGE_CONTENT_UPDATED, {
pageIds: [pageId],
// Canonical UUID: the embedding reindex resolves pages by uuid, so a
// slugId here threw Postgres 22P02 invalid-uuid (#260).
pageIds: [page.id],
workspaceId: page.workspaceId,
});
@@ -345,6 +445,37 @@ export class PersistenceExtension implements Extension {
}
}
/**
* #251 — receive the client's deliberate-clear signal. Records a short-lived,
* single-use pending flag for the originating document so the next
* onStoreDocument may let one empty-over-non-empty write through the guard.
*
* Hardening: read-only connections cannot arm the flag, and the document is
* taken from the connection (`data.documentName`), never the payload, so a
* client cannot target a page it isn't editing. The flag only ever RELAXES
* the guard for an empty write (a clear); it can never force or alter a
* non-empty write, so it is not a guard bypass for normal content.
*/
async onStateless(data: onStatelessPayload) {
const { connection, documentName, payload } = data;
if (connection?.readOnly) return;
let message: { type?: string } | undefined;
try {
message = JSON.parse(payload);
} catch {
return; // unrelated / malformed stateless message
}
if (message?.type !== INTENTIONAL_CLEAR_MESSAGE_TYPE) return;
this.intentionalClear.set(
documentName,
Date.now() + INTENTIONAL_CLEAR_TTL_MS,
);
}
async onChange(data: onChangePayload) {
const documentName = data.documentName;
const userId = data.context?.user?.id;
@@ -368,6 +499,7 @@ export class PersistenceExtension implements Extension {
const documentName = data.documentName;
this.contributors.delete(documentName);
this.agentTouched.delete(documentName);
this.intentionalClear.delete(documentName);
}
private consumeContributors(documentName: string): string[] {
@@ -385,6 +517,18 @@ export class PersistenceExtension implements Extension {
return touched;
}
/**
* #251 — read and clear the intentional-clear flag for this document. Returns
* true only if a flag was pending AND still within its TTL. Always deletes the
* entry so the signal is strictly single-use (one clear → one allowed empty
* write); an expired flag is treated as absent (guard still blocks).
*/
private consumeIntentionalClear(documentName: string): boolean {
const expiry = this.intentionalClear.get(documentName);
this.intentionalClear.delete(documentName);
return expiry !== undefined && Date.now() < expiry;
}
private async enqueuePageHistory(
page: Page,
lastUpdatedSource: string,

View File

@@ -0,0 +1,278 @@
import * as Y from 'yjs';
import { getSchema } from '@tiptap/core';
import {
initProseMirrorDoc,
absolutePositionToRelativePosition,
prosemirrorJSONToYDoc,
} from '@tiptap/y-tiptap';
import { tiptapExtensions } from './collaboration.util';
import {
setYjsMark,
removeYjsMarkByAttribute,
updateYjsMarkAttribute,
type YjsSelection,
} from './yjs.util';
/**
* Unit tests for the server-side Yjs mark helpers used by the collaboration
* handler to set/resolve/delete comment marks directly on the shared Y.Doc
* (collaboration.handler.ts: setCommentMark / resolveCommentMark).
*
* The fragment shape mirrors production exactly: a `default` XmlFragment whose
* children are block XmlElements (paragraph) holding XmlText runs. For setYjsMark
* the selection is a pair of Yjs RelativePosition JSONs (what the client sends);
* we synthesize them from known ProseMirror absolute positions via
* absolutePositionToRelativePosition so the marked range is deterministic.
*/
const schema = getSchema(tiptapExtensions);
// Build a real Y.Doc from ProseMirror JSON (same path the collab handler uses
// via TiptapTransformer) and return the doc + its `default` fragment.
function buildFromPm(pmJson: unknown) {
const ydoc = prosemirrorJSONToYDoc(
schema,
pmJson as never,
'default',
) as unknown as Y.Doc;
const fragment = ydoc.getXmlFragment('default');
return { ydoc, fragment };
}
// Make a YjsSelection (anchor/head RelativePosition JSON) for two ProseMirror
// absolute positions in `fragment`.
function selectionFor(
fragment: Y.XmlFragment,
anchorPos: number,
headPos: number,
): YjsSelection {
const { mapping } = initProseMirrorDoc(fragment, schema);
const anchor = absolutePositionToRelativePosition(
anchorPos,
fragment as never,
mapping,
);
const head = absolutePositionToRelativePosition(
headPos,
fragment as never,
mapping,
);
return {
anchor: Y.relativePositionToJSON(anchor),
head: Y.relativePositionToJSON(head),
};
}
// The XmlText run of the i-th top-level paragraph.
function paragraphText(fragment: Y.XmlFragment, index = 0): Y.XmlText {
const para = fragment.get(index) as Y.XmlElement;
return para.get(0) as Y.XmlText;
}
// --- raw fragment builder for the remove/update tests (no schema needed) ---
//
// removeYjsMarkByAttribute / updateYjsMarkAttribute only read item.toDelta() and
// call item.format(); they never touch the ProseMirror schema. Build the runs
// directly so we control which segment carries which comment attrs.
function buildWithComments(
segments: Array<{
text: string;
comment?: { commentId: string; resolved: boolean };
}>,
): { fragment: Y.XmlFragment; text: Y.XmlText } {
const ydoc = new Y.Doc();
const fragment = ydoc.getXmlFragment('default');
const para = new Y.XmlElement('paragraph');
fragment.insert(0, [para]);
const text = new Y.XmlText();
para.insert(0, [text]);
let offset = 0;
for (const seg of segments) {
text.insert(offset, seg.text);
if (seg.comment) {
text.format(offset, seg.text.length, { comment: seg.comment });
}
offset += seg.text.length;
}
return { fragment, text };
}
describe('setYjsMark', () => {
it('applies the mark over exactly the selected sub-range (PM pos 1..6 = "Hello")', () => {
const { ydoc, fragment } = buildFromPm({
type: 'doc',
content: [
{ type: 'paragraph', content: [{ type: 'text', text: 'Hello world' }] },
],
});
// PM pos 1 = start of the paragraph text; pos 6 = just after "Hello".
const sel = selectionFor(fragment, 1, 6);
setYjsMark(ydoc as never, fragment, sel, 'comment', {
commentId: 'c1',
resolved: false,
});
// The run splits: "Hello" carries the comment mark, " world" stays clean.
expect(paragraphText(fragment).toDelta()).toEqual([
{
insert: 'Hello',
attributes: { comment: { commentId: 'c1', resolved: false } },
},
{ insert: ' world' },
]);
});
it('normalizes a reversed selection (head before anchor) to the same range', () => {
const { ydoc, fragment } = buildFromPm({
type: 'doc',
content: [
{ type: 'paragraph', content: [{ type: 'text', text: 'Hello world' }] },
],
});
// anchor=6, head=1 — reversed; setYjsMark takes min/max so it marks "Hello".
const sel = selectionFor(fragment, 6, 1);
setYjsMark(ydoc as never, fragment, sel, 'comment', {
commentId: 'c2',
resolved: false,
});
expect(paragraphText(fragment).toDelta()).toEqual([
{
insert: 'Hello',
attributes: { comment: { commentId: 'c2', resolved: false } },
},
{ insert: ' world' },
]);
});
it('marks across two paragraphs (range spans an element boundary)', () => {
const { ydoc, fragment } = buildFromPm({
type: 'doc',
content: [
{ type: 'paragraph', content: [{ type: 'text', text: 'aaa' }] },
{ type: 'paragraph', content: [{ type: 'text', text: 'bbb' }] },
],
});
// PM positions: "aaa" = 1..4; the </p><p> boundary consumes pos 4 and 5, so
// "bbb" starts at pos 6 (chars at 6,7,8). Select pos 2 (inside "aaa") to pos
// 8 (after the second "b").
const sel = selectionFor(fragment, 2, 8);
setYjsMark(ydoc as never, fragment, sel, 'comment', {
commentId: 'c3',
resolved: false,
});
// First paragraph: "a" clean, "aa" marked.
expect(paragraphText(fragment, 0).toDelta()).toEqual([
{ insert: 'a' },
{
insert: 'aa',
attributes: { comment: { commentId: 'c3', resolved: false } },
},
]);
// Second paragraph: "bb" marked, "b" clean.
expect(paragraphText(fragment, 1).toDelta()).toEqual([
{
insert: 'bb',
attributes: { comment: { commentId: 'c3', resolved: false } },
},
{ insert: 'b' },
]);
});
});
describe('removeYjsMarkByAttribute', () => {
it('removes only the run whose attribute value matches, leaving others', () => {
const { fragment, text } = buildWithComments([
{ text: 'AAA', comment: { commentId: 'c1', resolved: false } },
{ text: 'BBB', comment: { commentId: 'c2', resolved: false } },
]);
removeYjsMarkByAttribute(fragment, 'comment', 'commentId', 'c1');
// c1's run loses the mark; c2's run is untouched.
expect(text.toDelta()).toEqual([
{ insert: 'AAA' },
{
insert: 'BBB',
attributes: { comment: { commentId: 'c2', resolved: false } },
},
]);
});
it('does nothing when no run carries the requested value (no-match branch)', () => {
const { fragment, text } = buildWithComments([
{ text: 'AAA', comment: { commentId: 'c1', resolved: false } },
]);
const before = text.toDelta();
removeYjsMarkByAttribute(fragment, 'comment', 'commentId', 'does-not-exist');
expect(text.toDelta()).toEqual(before);
});
it('leaves a different mark type alone', () => {
// A run carrying only `bold` must survive a comment removal pass.
const ydoc = new Y.Doc();
const fragment = ydoc.getXmlFragment('default');
const para = new Y.XmlElement('paragraph');
fragment.insert(0, [para]);
const text = new Y.XmlText();
para.insert(0, [text]);
text.insert(0, 'XYZ');
text.format(0, 3, { bold: true });
removeYjsMarkByAttribute(fragment, 'comment', 'commentId', 'c1');
expect(text.toDelta()).toEqual([
{ insert: 'XYZ', attributes: { bold: true } },
]);
});
});
describe('updateYjsMarkAttribute', () => {
it('merges new attributes into the matching run, preserving the rest', () => {
const { fragment, text } = buildWithComments([
{ text: 'AAA', comment: { commentId: 'c1', resolved: false } },
{ text: 'BBB', comment: { commentId: 'c2', resolved: false } },
]);
updateYjsMarkAttribute(
fragment,
'comment',
{ name: 'commentId', value: 'c1' },
{ resolved: true },
);
// c1's run flips resolved=true (commentId preserved via merge); c2 untouched.
expect(text.toDelta()).toEqual([
{
insert: 'AAA',
attributes: { comment: { commentId: 'c1', resolved: true } },
},
{
insert: 'BBB',
attributes: { comment: { commentId: 'c2', resolved: false } },
},
]);
});
it('does nothing when no run matches (no-match branch)', () => {
const { fragment, text } = buildWithComments([
{ text: 'AAA', comment: { commentId: 'c1', resolved: false } },
]);
const before = text.toDelta();
updateYjsMarkAttribute(
fragment,
'comment',
{ name: 'commentId', value: 'nope' },
{ resolved: true },
);
expect(text.toDelta()).toEqual(before);
});
});

View File

@@ -3,6 +3,8 @@ import { PageRepo } from '@docmost/db/repos/page/page.repo';
import { PageEmbeddingRepo } from '@docmost/db/repos/ai-chat/page-embedding.repo';
import { KyselyDB } from '@docmost/db/types/kysely.types';
import { AiService } from '../../../integrations/ai/ai.service';
import { EmbeddingReindexProgressService } from '../../../integrations/ai/embedding-reindex-progress.service';
import { AiEmbeddingNotConfiguredException } from '../../../integrations/ai/ai-embedding-not-configured.exception';
/**
* Unit tests for EmbeddingIndexerService.reindexWorkspace's batch control flow.
@@ -12,7 +14,8 @@ import { AiService } from '../../../integrations/ai/ai.service';
* reindexWorkspace actually touches:
* - aiService.getEmbeddingModel -> a model string so the up-front configured
* check passes,
* - pageRepo.getIdsByWorkspace -> three page ids,
* - pageRepo.getEmbeddablePageIds -> three page ids (the embeddable set the
* reindex iterates),
* - service.reindexPage -> spied per test to drive the per-page outcome.
*
* The point under test is the catch block: a FATAL provider error (auth/billing)
@@ -24,21 +27,30 @@ describe('EmbeddingIndexerService.reindexWorkspace fail-fast', () => {
function makeService() {
const pageRepo = {
getIdsByWorkspace: jest.fn().mockResolvedValue(['p1', 'p2', 'p3']),
getEmbeddablePageIds: jest.fn().mockResolvedValue(['p1', 'p2', 'p3']),
};
const pageEmbeddingRepo = {};
const aiService = {
getEmbeddingModel: jest.fn().mockResolvedValue('some-model'),
};
// Progress is a best-effort cosmetic store; mock its async methods so the
// batch control flow can be tested without Redis.
const reindexProgress = {
start: jest.fn().mockResolvedValue(undefined),
increment: jest.fn().mockResolvedValue(undefined),
clear: jest.fn().mockResolvedValue(undefined),
get: jest.fn().mockResolvedValue(null),
};
const db = {};
const service = new EmbeddingIndexerService(
pageRepo as unknown as PageRepo,
pageEmbeddingRepo as unknown as PageEmbeddingRepo,
aiService as unknown as AiService,
reindexProgress as unknown as EmbeddingReindexProgressService,
db as unknown as KyselyDB,
);
return { service, pageRepo, aiService };
return { service, pageRepo, aiService, reindexProgress };
}
it('aborts after the first page on a FATAL (401) provider error', async () => {
@@ -78,3 +90,100 @@ describe('EmbeddingIndexerService.reindexWorkspace fail-fast', () => {
expect(reindexPage).toHaveBeenCalledTimes(3);
});
});
/**
* Live reindex-progress reporting: reindexWorkspace must publish a per-workspace
* progress record (total at start, done incremented per processed page) and ALWAYS
* clear it in a finally — including on a fatal abort and an unconfigured early
* return — so the settings status can show the counter climb without ever getting
* stuck in a "reindexing" state.
*/
describe('EmbeddingIndexerService.reindexWorkspace progress', () => {
const WORKSPACE_ID = 'ws-1';
function makeService(pageIds: string[] = ['p1', 'p2', 'p3']) {
const pageRepo = {
getEmbeddablePageIds: jest.fn().mockResolvedValue(pageIds),
};
const pageEmbeddingRepo = {};
const aiService = {
getEmbeddingModel: jest.fn().mockResolvedValue('some-model'),
};
const reindexProgress = {
start: jest.fn().mockResolvedValue(undefined),
increment: jest.fn().mockResolvedValue(undefined),
clear: jest.fn().mockResolvedValue(undefined),
get: jest.fn().mockResolvedValue(null),
};
const db = {};
const service = new EmbeddingIndexerService(
pageRepo as unknown as PageRepo,
pageEmbeddingRepo as unknown as PageEmbeddingRepo,
aiService as unknown as AiService,
reindexProgress as unknown as EmbeddingReindexProgressService,
db as unknown as KyselyDB,
);
return { service, pageRepo, aiService, reindexProgress };
}
it('sets total at start, increments done per page, and clears in finally', async () => {
const { service, reindexProgress } = makeService(['p1', 'p2', 'p3']);
jest.spyOn(service, 'reindexPage').mockResolvedValue(undefined);
await service.reindexWorkspace(WORKSPACE_ID);
expect(reindexProgress.start).toHaveBeenCalledWith(WORKSPACE_ID, 3);
// One increment per processed page.
expect(reindexProgress.increment).toHaveBeenCalledTimes(3);
expect(reindexProgress.increment).toHaveBeenCalledWith(WORKSPACE_ID);
// Cleared exactly once on completion.
expect(reindexProgress.clear).toHaveBeenCalledTimes(1);
expect(reindexProgress.clear).toHaveBeenCalledWith(WORKSPACE_ID);
});
it('counts a handled (non-fatal) per-page failure as processed', async () => {
const { service, reindexProgress } = makeService(['p1', 'p2', 'p3']);
// No statusCode -> non-fatal -> isolate and continue; each counts as done.
jest.spyOn(service, 'reindexPage').mockRejectedValue(new Error('boom'));
await service.reindexWorkspace(WORKSPACE_ID);
expect(reindexProgress.increment).toHaveBeenCalledTimes(3);
expect(reindexProgress.clear).toHaveBeenCalledTimes(1);
});
it('clears progress in finally even when a FATAL provider error aborts the batch', async () => {
const { service, reindexProgress } = makeService(['p1', 'p2', 'p3']);
// A 401 aborts on the first page (re-thrown) — the finally must still clear.
jest
.spyOn(service, 'reindexPage')
.mockRejectedValue({ statusCode: 401, message: 'User not found' });
await expect(service.reindexWorkspace(WORKSPACE_ID)).rejects.toMatchObject({
statusCode: 401,
});
expect(reindexProgress.start).toHaveBeenCalledWith(WORKSPACE_ID, 3);
// Aborted page is NOT counted as processed.
expect(reindexProgress.increment).not.toHaveBeenCalled();
// But progress is still cleared so the run never gets stuck.
expect(reindexProgress.clear).toHaveBeenCalledTimes(1);
});
it('clears the enqueue-seeded progress on an unconfigured early return', async () => {
const { service, aiService, reindexProgress } = makeService();
// Embeddings not configured: reindexWorkspace returns early WITHOUT starting
// a fresh record, but the finally must still clear the enqueue-time seed.
aiService.getEmbeddingModel = jest
.fn()
.mockRejectedValue(new AiEmbeddingNotConfiguredException());
await expect(
service.reindexWorkspace(WORKSPACE_ID),
).resolves.toBeUndefined();
expect(reindexProgress.start).not.toHaveBeenCalled();
expect(reindexProgress.clear).toHaveBeenCalledTimes(1);
expect(reindexProgress.clear).toHaveBeenCalledWith(WORKSPACE_ID);
});
});

View File

@@ -9,6 +9,7 @@ import { KyselyDB } from '@docmost/db/types/kysely.types';
import { InjectKysely } from 'nestjs-kysely';
import { executeTx } from '@docmost/db/utils';
import { AiService } from '../../../integrations/ai/ai.service';
import { EmbeddingReindexProgressService } from '../../../integrations/ai/embedding-reindex-progress.service';
import { AiEmbeddingNotConfiguredException } from '../../../integrations/ai/ai-embedding-not-configured.exception';
import {
describeProviderError,
@@ -48,6 +49,7 @@ export class EmbeddingIndexerService {
private readonly pageRepo: PageRepo,
private readonly pageEmbeddingRepo: PageEmbeddingRepo,
private readonly aiService: AiService,
private readonly reindexProgress: EmbeddingReindexProgressService,
@InjectKysely() private readonly db: KyselyDB,
) {}
@@ -183,7 +185,19 @@ export class EmbeddingIndexerService {
}
/**
* (Re)build embeddings for EVERY non-deleted page in a workspace. Used by the
* (Re)build embeddings for the EMBEDDABLE page set of a workspace — the same
* set countEmbeddablePages counts (via getEmbeddablePageIds): non-deleted pages
* that qualify under any of the three clauses of `embeddablePredicate` —
* non-empty textContent, OR an empty/null textContent whose ProseMirror
* `content` JSON has at least one text node (`"type":"text"`) that `jsonToText`
* can extract, OR an already-stored (non-deleted) embedding row — NOT every
* non-deleted page. Iterating this set keeps the live `total` equal to the
* steady-state denominator, so the progress counter climbs 0 -> total and
* matches the before/after DB coverage exactly. A page with truly no
* extractable text (empty textContent AND content with only non-text/atom
* nodes such as math) is correctly skipped (reindexPage no-ops on it); a page
* that lost its text but still has stale embeddings stays in the set (the
* EXISTS clause) so it is visited and its stale rows are cleared. Used by the
* bulk reindex (WORKSPACE_CREATE_EMBEDDINGS, fired when AI Search is enabled
* and by the manual "Reindex now" action).
*
@@ -194,69 +208,99 @@ export class EmbeddingIndexerService {
* the batch.
*/
async reindexWorkspace(workspaceId: string): Promise<void> {
// The whole run is wrapped so the per-workspace progress record is ALWAYS
// cleared in the finally — on success, on a fatal-provider abort, on an
// unconfigured early-return, or on any unexpected throw — so a failed run
// never leaves a stuck "reindexing" state (the status then falls back to the
// steady-state DB coverage count). A placeholder record may already exist
// (seeded at enqueue time); the finally cleans that too.
try {
await this.aiService.getEmbeddingModel(workspaceId);
} catch (err) {
if (err instanceof AiEmbeddingNotConfiguredException) {
this.logger.log(
`reindexWorkspace: embeddings not configured for workspace ${workspaceId}, skipping`,
);
return;
}
throw err;
}
const pageIds = await this.pageRepo.getIdsByWorkspace(workspaceId);
const total = pageIds.length;
const startedAt = Date.now();
this.logger.log(
`reindexWorkspace: starting reindex of ${total} page(s) for workspace ${workspaceId}`,
);
let failed = 0;
for (let i = 0; i < total; i++) {
const pageId = pageIds[i];
const position = i + 1;
// Log BEFORE the await: if the embedding call hangs, this is the last line
// in the log and it names the exact page that is stuck.
this.logger.log(
`reindexWorkspace: [${position}/${total}] indexing page ${pageId} (workspace ${workspaceId})`,
);
const pageStartedAt = Date.now();
try {
await this.reindexPage(pageId);
const elapsed = Date.now() - pageStartedAt;
if (elapsed >= SLOW_PAGE_MS) {
this.logger.warn(
`reindexWorkspace: [${position}/${total}] page ${pageId} took ${elapsed}ms`,
);
}
await this.aiService.getEmbeddingModel(workspaceId);
} catch (err) {
// A fatal provider error (invalid/missing key, no credits) recurs
// identically on EVERY remaining page. Abort the whole batch instead of
// issuing hundreds of doomed requests against the provider.
if (isFatalProviderError(err)) {
this.logger.error(
`reindexWorkspace: aborting at [${position}/${total}] for workspace ` +
`${workspaceId} — fatal provider error, remaining pages would fail ` +
`identically: ${describeProviderError(err)}`,
if (err instanceof AiEmbeddingNotConfiguredException) {
this.logger.log(
`reindexWorkspace: embeddings not configured for workspace ${workspaceId}, skipping`,
);
throw err;
return;
}
// Per-page isolation: one non-fatal failure (incl. an embedding timeout)
// must not abort the whole batch.
failed++;
this.logger.error(
`reindexWorkspace: [${position}/${total}] failed to reindex page ${pageId} ` +
`after ${Date.now() - pageStartedAt}ms: ${describeProviderError(err)}`,
);
throw err;
}
}
this.logger.log(
`reindexWorkspace: done for workspace ${workspaceId}: ` +
`${total - failed}/${total} indexed, ${failed} failed in ${Date.now() - startedAt}ms`,
);
// Iterate the EMBEDDABLE set (same three-clause predicate as
// countEmbeddablePages), NOT every non-deleted page: this makes `total`
// here equal the steady-state denominator, so the live counter climbs
// 0 -> total and matches the before/after DB count exactly (no
// 478 -> 500 -> 478 denominator jump). Pages whose text lives in the
// ProseMirror `content` JSON (a text node) even with empty text_content ARE
// in this set (the content-JSON clause) and get embedded; a page with no
// extractable text at all is correctly skipped — reindexPage no-ops on it —
// and a page that lost its text but still has stale embeddings IS in this
// set (the EXISTS clause) so it is still visited and its stale rows cleared.
const pageIds = await this.pageRepo.getEmbeddablePageIds(workspaceId);
const total = pageIds.length;
const startedAt = Date.now();
// Publish the live run progress over this same set (done reset to 0). The
// counter increments once per iterated page and reaches exactly `total`,
// which equals countEmbeddablePages — the steady-state denominator.
await this.reindexProgress.start(workspaceId, total);
this.logger.log(
`reindexWorkspace: starting reindex of ${total} page(s) for workspace ${workspaceId}`,
);
let failed = 0;
for (let i = 0; i < total; i++) {
const pageId = pageIds[i];
const position = i + 1;
// Log BEFORE the await: if the embedding call hangs, this is the last line
// in the log and it names the exact page that is stuck.
this.logger.log(
`reindexWorkspace: [${position}/${total}] indexing page ${pageId} (workspace ${workspaceId})`,
);
const pageStartedAt = Date.now();
try {
await this.reindexPage(pageId);
// Count this page as processed (matches the [position/total] log).
await this.reindexProgress.increment(workspaceId);
const elapsed = Date.now() - pageStartedAt;
if (elapsed >= SLOW_PAGE_MS) {
this.logger.warn(
`reindexWorkspace: [${position}/${total}] page ${pageId} took ${elapsed}ms`,
);
}
} catch (err) {
// A fatal provider error (invalid/missing key, no credits) recurs
// identically on EVERY remaining page. Abort the whole batch instead of
// issuing hundreds of doomed requests against the provider. Do NOT count
// it as processed — the run aborts here (the finally clears progress).
if (isFatalProviderError(err)) {
this.logger.error(
`reindexWorkspace: aborting at [${position}/${total}] for workspace ` +
`${workspaceId} — fatal provider error, remaining pages would fail ` +
`identically: ${describeProviderError(err)}`,
);
throw err;
}
// Per-page isolation: one non-fatal failure (incl. an embedding timeout)
// must not abort the whole batch. A handled failure still advances the
// counter (matches the [position/total] log, so done reaches total).
failed++;
await this.reindexProgress.increment(workspaceId);
this.logger.error(
`reindexWorkspace: [${position}/${total}] failed to reindex page ${pageId} ` +
`after ${Date.now() - pageStartedAt}ms: ${describeProviderError(err)}`,
);
}
}
this.logger.log(
`reindexWorkspace: done for workspace ${workspaceId}: ` +
`${total - failed}/${total} indexed, ${failed} failed in ${Date.now() - startedAt}ms`,
);
} finally {
// Always remove the progress record so the status reverts to the DB count.
await this.reindexProgress.clear(workspaceId);
}
}
/** Purge ALL embeddings for a workspace (WORKSPACE_DELETE_EMBEDDINGS). */

View File

@@ -0,0 +1,166 @@
import { McpClientsService } from './mcp-clients.service';
/**
* Unit tests for the two security-critical surfaces of McpClientsService that the
* sibling specs (ssrf-guard / validate-resolved-addresses / lease) do NOT cover:
*
* 1. `decryptHeaders` (private) — FAIL-OPEN behavior. A decrypt/parse failure
* (e.g. APP_SECRET rotated, tampered blob) must NEVER throw and must NEVER
* log the blob: it returns `undefined` so the connect proceeds WITHOUT the
* now-unreadable auth headers (which then 401s and the server is skipped),
* rather than crashing the whole turn.
*
* 2. `this.guardedFetch` (private, bound to the SSRF-pinned dispatcher) — the
* per-request DNS-rebinding guard. A blocked host (private/loopback/metadata
* IP literal, or an unparseable URL) must REJECT before any socket is opened;
* a public host is allowed through to the real `fetch` with the pinned
* dispatcher attached.
*
* No network and no DB: the repo + secretBox deps are stubbed, and global `fetch`
* is mocked for the single allow-path assertion.
*/
// Build the service with a SecretBoxService stub whose decryptSecret is supplied
// per-test. The repo dep is unused by the methods under test.
function buildService(decryptSecret: (blob: string) => string) {
const secretBox = { decryptSecret: jest.fn(decryptSecret) };
const service = new McpClientsService({} as never, secretBox as never);
return { service, secretBox };
}
describe('McpClientsService.decryptHeaders', () => {
// Reach the private method via the as-any pattern common in these NestJS specs.
const callDecrypt = (
service: McpClientsService,
blob: string | null,
): Record<string, string> | undefined =>
(
service as unknown as {
decryptHeaders: (b: string | null) => Record<string, string> | undefined;
}
).decryptHeaders(blob);
it('returns undefined for a null blob without decrypting', () => {
const { service, secretBox } = buildService(() => '{}');
expect(callDecrypt(service, null)).toBeUndefined();
expect(secretBox.decryptSecret).not.toHaveBeenCalled();
});
it('decrypts a valid blob and keeps only string-valued headers', () => {
const { service } = buildService(() =>
JSON.stringify({
Authorization: 'Bearer abc',
'X-Api-Key': 'k',
// Non-string values must be dropped, not coerced.
count: 5,
flag: true,
nested: { a: 1 },
}),
);
expect(callDecrypt(service, 'cipher')).toEqual({
Authorization: 'Bearer abc',
'X-Api-Key': 'k',
});
});
it('returns undefined when the decrypted object has no string headers', () => {
const { service } = buildService(() => JSON.stringify({ count: 5 }));
// No usable headers -> undefined (connect with no auth header), not {}.
expect(callDecrypt(service, 'cipher')).toBeUndefined();
});
it('FAILS OPEN: a decrypt error returns undefined instead of throwing', () => {
const { service } = buildService(() => {
throw new Error('Failed to decrypt secret — APP_SECRET may have changed');
});
const warnSpy = jest
.spyOn(
(service as unknown as { logger: { warn: (...a: unknown[]) => void } })
.logger,
'warn',
)
.mockImplementation(() => undefined);
let result: unknown;
expect(() => {
result = callDecrypt(service, 'tampered-blob');
}).not.toThrow();
expect(result).toBeUndefined();
// It warns (so ops sees degradation) but never logs the blob itself.
expect(warnSpy).toHaveBeenCalledTimes(1);
expect(String(warnSpy.mock.calls[0]?.[0])).not.toContain('tampered-blob');
});
it('FAILS OPEN: malformed JSON (decrypts to non-JSON) returns undefined', () => {
const { service } = buildService(() => 'not-json{');
jest
.spyOn(
(service as unknown as { logger: { warn: (...a: unknown[]) => void } })
.logger,
'warn',
)
.mockImplementation(() => undefined);
expect(callDecrypt(service, 'cipher')).toBeUndefined();
});
});
describe('McpClientsService.guardedFetch (SSRF per-request guard)', () => {
// The bound guardedFetch closure lives on the instance as a private field.
const guardedFetchOf = (service: McpClientsService) =>
(service as unknown as { guardedFetch: typeof fetch }).guardedFetch;
let fetchSpy: jest.SpiedFunction<typeof fetch>;
beforeEach(() => {
// Any reachable real fetch would be a network call; assert per-test that the
// blocked paths never reach it, and stub a Response for the allow path.
fetchSpy = jest
.spyOn(global, 'fetch')
.mockResolvedValue(new Response('ok', { status: 200 }));
});
afterEach(() => {
jest.restoreAllMocks();
});
const blocked: Array<[string, string]> = [
['loopback IPv4', 'http://127.0.0.1/mcp'],
['private 10/8', 'http://10.0.0.5/mcp'],
['private 192.168/16', 'http://192.168.1.1/mcp'],
['cloud metadata link-local', 'http://169.254.169.254/latest/meta-data/'],
['loopback IPv6 (bracketed)', 'http://[::1]:8080/mcp'],
];
it.each(blocked)(
'rejects a request to %s without opening a socket',
async (_label, url) => {
const { service } = buildService(() => '{}');
await expect(guardedFetchOf(service)(url)).rejects.toThrow(
/blocked request/,
);
expect(fetchSpy).not.toHaveBeenCalled();
},
);
it('rejects an unparseable URL as a blocked request', async () => {
const { service } = buildService(() => '{}');
await expect(
guardedFetchOf(service)('::: not a url :::'),
).rejects.toThrow('blocked request: invalid URL');
expect(fetchSpy).not.toHaveBeenCalled();
});
it('allows a public IP literal and forwards through the pinned dispatcher', async () => {
const { service } = buildService(() => '{}');
const res = await guardedFetchOf(service)('http://8.8.8.8/mcp');
expect(res.status).toBe(200);
expect(fetchSpy).toHaveBeenCalledTimes(1);
// The init MUST carry the SSRF-pinned undici dispatcher (the rebinding pin);
// dropping it would let undici do a second, unchecked DNS resolution.
const init = fetchSpy.mock.calls[0][1] as RequestInit & {
dispatcher?: unknown;
};
expect(init.dispatcher).toBeDefined();
});
});

View File

@@ -5,6 +5,34 @@ import { pathToFileURL } from 'node:url';
* ESM-only `@docmost/mcp` package. We only need the constructor + the read/write
* methods used by the per-user tool adapter; the full client surface lives in
* `packages/mcp/src/client.ts`. Signatures here mirror that file exactly.
*
* DRIFT GUARD: the method NAMES below are runtime-checked against the real
* `DocmostClient` by `packages/mcp/test/unit/client-host-contract.test.mjs`
* (which can import the ESM class directly). If you rename/remove a method here
* or in client.ts, that test fails — so a stale mirror cannot silently ship a
* runtime "x is not a function" into an agent tool call. Keep the two in sync.
*
* STAGED PLAN — full derivation `DocmostClientLike = <real DocmostClient type>`
* (issue #193, layer 3) is intentionally NOT done; it stays a hand-mirror for
* now because of two verified blockers across the ESM(mcp)/CJS(server) boundary:
* 1. `@docmost/mcp` emits NO declaration files (its tsconfig has no
* `declaration`, package.json has no `types`/types-export) and the server
* tsconfig has no path mapping for it — the server only loads it via the
* runtime `import()` trick below, so there is no type to import today.
* 2. The real client methods have inferred, CONCRETE return types; the in-app
* tool adapter reads results through loose `Record<string,unknown>` returns
* + `as` casts (e.g. `(result?.data ?? {}) as { title?: string }`).
* Deriving the exact type would make those casts non-overlapping ("may be a
* mistake") and break the build, and `Partial<DocmostClientLike>` test stubs
* would have to satisfy the full concrete surface.
* To do it safely later (incrementally): (a) turn on `declaration: true` in
* packages/mcp/tsconfig.json + add a `types` export condition and commit the
* emitted `.d.ts`; (b) `import type { DocmostClient } from '@docmost/mcp'` here
* and replace this interface with a `Pick<DocmostClient, ...>` of the consumed
* methods; (c) audit every `as` cast in ai-chat-tools.service.ts against the now
* concrete return types (double-cast through `unknown` only where genuinely
* needed); (d) keep the runtime guard test as a belt-and-braces check. Until
* then the guard test above is the cheap, behaviour-neutral protection.
*/
export interface DocmostClientLike {
// --- read ---

View File

@@ -0,0 +1,124 @@
import { z } from 'zod';
import { AiChatToolsService } from './ai-chat-tools.service';
import * as loader from './docmost-client.loader';
import type { DocmostClientLike } from './docmost-client.loader';
// The real zod-agnostic registry, imported from source so the contract is checked
// against exactly what the @docmost/mcp package ships (no hand-stub).
import { SHARED_TOOL_SPECS } from '../../../../../../packages/mcp/src/tool-specs';
/**
* CONTRACT: SHARED_TOOL_SPECS <-> in-app tool wiring parity.
*
* `packages/mcp/src/tool-specs.ts` is the single source of truth for the tools
* that are intentionally IDENTICAL across the standalone MCP server (zod v3) and
* the in-app AI-SDK service (zod v4). The in-app service builds each one via
* `sharedTool(sharedToolSpecs.<key>, execute)`, keyed by the spec's `inAppKey`.
*
* This test fails the build if a spec is added to the registry but never wired
* in-app, if an `inAppKey` is renamed without updating the service, if the
* description drifts between the registry and the exposed tool, if the
* snake_case `mcpName` <-> camelCase `inAppKey` convention is broken, or if the
* exposed tool's input-schema keys diverge from the spec's `buildShape`.
*
* It does NOT need @docmost/mcp built: the registry is imported from TS source,
* and the ESM loader is mocked so `forUser()` never dynamically imports the
* package.
*/
describe('SHARED_TOOL_SPECS contract parity', () => {
// Empty fake client: no tool is executed here — every assertion is on tool
// presence / metadata / schema, so the client methods are never called.
const fakeClient: Partial<DocmostClientLike> = {};
const tokenServiceStub = {
generateAccessToken: jest.fn().mockResolvedValue('access-token'),
generateCollabToken: jest.fn().mockResolvedValue('collab-token'),
};
let tools: Record<string, unknown>;
beforeAll(async () => {
jest.spyOn(loader, 'loadDocmostMcp').mockResolvedValue({
DocmostClient: function () {
return fakeClient as DocmostClientLike;
} as unknown as loader.DocmostClientCtor,
// Feed the service the SAME registry this test asserts against.
sharedToolSpecs: SHARED_TOOL_SPECS as unknown as Record<
string,
loader.SharedToolSpec
>,
});
const service = new AiChatToolsService(
tokenServiceStub as never,
{} as never,
{} as never,
{} as never,
{} as never,
{ asSink: () => ({ put: jest.fn(), has: jest.fn(), evict: jest.fn() }) } as never,
);
tools = (await service.forUser(
{ id: 'user-1', email: 'u@example.com', workspaceId: 'ws-1' } as never,
'session-1',
'ws-1',
'chat-1',
)) as unknown as Record<string, unknown>;
});
afterAll(() => jest.restoreAllMocks());
// camelCase -> snake_case, matching the registry's mcpName convention.
const toSnake = (s: string) =>
s.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
// Type as the (optional-buildShape) SharedToolSpec; the `satisfies` literal
// above otherwise narrows to a union where some members lack buildShape.
const specEntries = Object.entries(SHARED_TOOL_SPECS) as Array<
[string, loader.SharedToolSpec]
>;
// Sanity: the registry is non-empty, so the per-spec table below is not vacuous.
it('registry is non-empty', () => {
expect(specEntries.length).toBeGreaterThan(0);
});
describe.each(specEntries)('spec "%s"', (registryKey, spec) => {
it('registry key equals its inAppKey', () => {
// The service indexes the registry by property name; a key != inAppKey
// would wire the wrong (or no) tool.
expect(spec.inAppKey).toBe(registryKey);
});
it('mcpName is the snake_case form of inAppKey', () => {
expect(spec.mcpName).toBe(toSnake(spec.inAppKey));
});
it('is exposed in-app under its inAppKey', () => {
// Fails if a spec is added to the registry but never wired in forUser().
expect(tools[spec.inAppKey]).toBeDefined();
});
it("exposed tool's description matches the registry description", () => {
const tool = tools[spec.inAppKey] as { description: string };
expect(tool.description).toBe(spec.description);
});
it("exposed tool's input-schema keys match buildShape (incl. required)", () => {
const tool = tools[spec.inAppKey] as {
inputSchema: { jsonSchema: { properties?: Record<string, unknown>; required?: string[] } };
};
const json = tool.inputSchema.jsonSchema;
const actualKeys = Object.keys(json.properties ?? {}).sort();
// Derive the spec's declared shape with THIS layer's zod (v4) — the same
// call the service makes — then compare key sets and required-ness.
const shape = spec.buildShape ? spec.buildShape(z) : {};
const expectedKeys = Object.keys(shape).sort();
expect(actualKeys).toEqual(expectedKeys);
// A non-.optional() field must surface as required in the advertised schema.
const expectedRequired = Object.entries(shape)
.filter(([, field]) => !(field as z.ZodTypeAny).isOptional?.())
.map(([k]) => k)
.sort();
expect((json.required ?? []).slice().sort()).toEqual(expectedRequired);
});
});
});

View File

@@ -0,0 +1,167 @@
import { PageRepo } from './page.repo';
import {
DummyDriver,
Kysely,
PostgresAdapter,
PostgresIntrospector,
PostgresQueryCompiler,
} from 'kysely';
/**
* F6 regression guard for the embeddable-page predicate.
*
* The predicate is shared by `countEmbeddablePages` (the "Indexed N of M" coverage
* denominator) and `getEmbeddablePageIds` (the exact set a full reindex iterates).
* It MUST select pages whose `text_content` was never backfilled (null/empty) but
* whose ProseMirror `content` JSON still carries body text — `reindexPage` builds
* its chunks straight from `content`, so without a content clause such a page is
* silently SKIPPED by a mass reindex even though it is fully embeddable.
*
* The content clause keys on the structural text-node marker `"type":"text"`, NOT
* a bare `"text":` key. The bare key also appears as the `attrs.text` of atom
* nodes that carry NO extractable text — notably math (`mathBlock`/`mathInline`),
* whose LaTeX lives in `attrs.text` and has no `generateText` serializer. A
* math-ONLY page therefore yields empty `text_content` and zero embeddings; if the
* predicate matched its `attrs.text` it would land in the denominator but
* `reindexPage` would no-op on it, pinning "Indexed N of M" below 100% forever —
* the exact bug this feature fixes. The `"type":"text"` marker matches only real
* text nodes (what `jsonToText` extracts), keeping the predicate consistent with
* what gets indexed.
*
* There is no real Postgres here: a recording Kysely (DummyDriver wired to the
* Postgres query compiler) compiles the queries to SQL so we can assert the WHERE
* predicate ORs in the narrowed content clause alongside the existing text_content
* and stored-embeddings clauses — and that BOTH callers compile the identical
* clause (denominator and reindex set can never diverge).
*/
function makeRecordingDb() {
const sqls: string[] = [];
const db = new Kysely<any>({
dialect: {
createAdapter: () => new PostgresAdapter(),
createDriver: () =>
new (class extends DummyDriver {
async acquireConnection() {
return {
executeQuery: async (compiled: { sql: string }) => {
sqls.push(compiled.sql);
return { rows: [] };
},
// eslint-disable-next-line @typescript-eslint/no-empty-function
streamQuery: async function* () {},
} as any;
}
})(),
createIntrospector: (d: Kysely<any>) => new PostgresIntrospector(d),
createQueryCompiler: () => new PostgresQueryCompiler(),
},
});
return { db, sqls };
}
// The narrowed content clause, as it appears in the compiled SQL. Keying on the
// structural `"type":"text"` marker (not a bare `"text":` key) is what excludes
// math-only pages whose only `"text"` key is the atom node's `attrs.text`.
const NARROWED_CLAUSE = `"type"[[:space:]]*:[[:space:]]*"text"`;
const BARE_TEXT_KEY = `"text"[[:space:]]*:`;
describe('PageRepo embeddable predicate — content-bearing pages (F6)', () => {
it('selects content-bearing pages via the narrowed "type":"text" node marker', async () => {
const { db, sqls } = makeRecordingDb();
const repo = new PageRepo(db as any, {} as any, { emit: jest.fn() } as any);
await repo.getEmbeddablePageIds('ws-1');
expect(sqls).toHaveLength(1);
const sql = sqls[0];
// Clause 1 (existing): pages with extractable text_content.
expect(sql).toContain('text_content');
// Clause 3 (the F6 fix, now narrowed): a page whose content JSON carries a
// real text node is selected even when text_content is null/empty, so a full
// reindex visits it instead of silently skipping it.
expect(sql).toContain('content::text');
expect(sql).toContain(NARROWED_CLAUSE);
// It must NOT use the old bare `"text":` key, which also matches the
// `attrs.text` of math-only atom pages (false-positive denominator inflation).
expect(sql).not.toContain(BARE_TEXT_KEY);
// Clause 2 (existing): pages that already have stored embeddings stay in the
// set so a reindex can clear their stale rows.
expect(sql.toLowerCase()).toContain('embeddings');
});
it('countEmbeddablePages compiles the SAME narrowed clause as getEmbeddablePageIds', async () => {
// Consistency is the core requirement: the denominator (countEmbeddablePages)
// and the reindex set (getEmbeddablePageIds) MUST share the identical
// predicate, else the live "done" counter and the steady-state total diverge.
const { db, sqls } = makeRecordingDb();
const repo = new PageRepo(db as any, {} as any, { emit: jest.fn() } as any);
await repo.countEmbeddablePages('ws-1');
await repo.getEmbeddablePageIds('ws-1');
expect(sqls).toHaveLength(2);
const [countSql, idsSql] = sqls;
// Both carry the narrowed content clause...
expect(countSql).toContain(NARROWED_CLAUSE);
expect(idsSql).toContain(NARROWED_CLAUSE);
// ...neither carries the bare key...
expect(countSql).not.toContain(BARE_TEXT_KEY);
expect(idsSql).not.toContain(BARE_TEXT_KEY);
// ...and the full OR predicate (text_content + content node + embeddings
// EXISTS) is byte-identical between the two queries, so they can't drift.
const where = (s: string) => s.slice(s.indexOf('where'));
expect(where(countSql)).toEqual(where(idsSql));
});
it('the content regex matches a text-bearing doc but NOT a math-only doc', () => {
// Semantic check of the predicate against sample `content::text` payloads.
// Note: `jsonb::text` is NOT identical to JSON.stringify — Postgres renders a
// space after each colon (`"type": "text"`), which is exactly why the POSIX
// clause uses `[[:space:]]*`. The clause `"type"[[:space:]]*:[[:space:]]*"text"`
// maps to the JS regex below (`[[:space:]]` -> `\s`, tolerating both forms);
// we evaluate it the way Postgres would.
const re = /"type"\s*:\s*"text"/;
// A real paragraph with a text node -> embeddable.
const textDoc = JSON.stringify({
type: 'doc',
content: [
{
type: 'paragraph',
content: [{ type: 'text', text: 'hello world' }],
},
],
});
// A doc whose ONLY node is a math atom. Its LaTeX is in `attrs.text`, there is
// no text node, and `jsonToText`/`generateText` has no serializer for it -> it
// yields empty text_content and zero embeddings, so it must NOT qualify.
const mathOnlyDoc = JSON.stringify({
type: 'doc',
content: [
{ type: 'mathBlock', attrs: { text: 'E = mc^2' } },
{ type: 'mathInline', attrs: { text: '\\alpha' } },
],
});
// An empty doc has no text node either.
const emptyDoc = JSON.stringify({ type: 'doc', content: [] });
expect(re.test(textDoc)).toBe(true);
expect(re.test(mathOnlyDoc)).toBe(false);
expect(re.test(emptyDoc)).toBe(false);
// Sanity: the OLD bare-key regex WOULD have wrongly matched the math-only doc,
// which is precisely the false positive the narrowing removes.
expect(/"text"\s*:/.test(mathOnlyDoc)).toBe(true);
// A user literally TYPING `"type":"text"` in prose can't false-positive on an
// otherwise text-less page: in `content::text` the typed value's quotes are
// escaped (`\"type\":\"text\"`), so the literal-quote regex does not match the
// escaped form. (And such a page is a genuine text node anyway.)
const escapedLiteral = JSON.stringify({
type: 'doc',
content: [{ type: 'someAtom', attrs: { note: '"type":"text"' } }],
});
expect(re.test(escapedLiteral)).toBe(false);
});
});

View File

@@ -12,6 +12,7 @@ import { executeWithCursorPagination } from '@docmost/db/pagination/cursor-pagin
import { validate as isValidUUID } from 'uuid';
import { ExpressionBuilder, sql } from 'kysely';
import { DB } from '@docmost/db/types/db';
import { DbInterface } from '@docmost/db/types/db.interface';
import { jsonArrayFrom, jsonObjectFrom } from 'kysely/helpers/postgres';
import { SpaceMemberRepo } from '@docmost/db/repos/space/space-member.repo';
import { EventEmitter2 } from '@nestjs/event-emitter';
@@ -233,9 +234,9 @@ export class PageRepo {
* text-less pages (which legitimately store zero embeddings) don't keep the
* bar below 100% forever.
*
* A page qualifies if it has non-empty textContent OR already has stored
* embeddings. The second clause covers pages whose text the indexer extracted
* from the content JSON when textContent was null, and guarantees this total is
* A page qualifies if it has non-empty textContent, OR its content JSON has at
* least one text node (`"type":"text"`) when textContent was never backfilled,
* OR it already has stored embeddings. The last clause guarantees this total is
* always >= countIndexedPages (the indexed count can never exceed it).
*/
async countEmbeddablePages(workspaceId: string): Promise<number> {
@@ -243,37 +244,91 @@ export class PageRepo {
.selectFrom('pages as p')
.where('p.workspaceId', '=', workspaceId)
.where('p.deletedAt', 'is', null)
.where((eb) =>
eb.or([
// Has extractable body text. The regex matches any non-whitespace
// character, mirroring the indexer's `text.trim().length === 0` check
// (raw SQL -> use the snake_case column name).
sql<boolean>`p.text_content ~ '[^[:space:]]'`,
// OR already has at least one (non-deleted) embedding row.
eb.exists(
eb
.selectFrom('pageEmbeddings as pe')
.select(sql`1`.as('one'))
.whereRef('pe.pageId', '=', 'p.id')
.where('pe.deletedAt', 'is', null),
),
]),
)
.where((eb) => this.embeddablePredicate(eb))
.select((eb) => eb.fn.countAll().as('count'))
.executeTakeFirst();
return Number(row?.count ?? 0);
}
/**
* IDs of all non-deleted pages in a workspace. Used by the RAG bulk reindex to
* (re)build embeddings for every existing page.
* The "embeddable content" qualifying predicate, shared verbatim by
* countEmbeddablePages (the steady-state denominator) and getEmbeddablePageIds
* (the set the bulk reindex iterates). Both MUST use the exact same condition
* or the live total and steady-state total diverge — extracting it here is what
* guarantees that, replacing the previous hand-duplicated copy. Callers supply
* the trivial workspaceId/deletedAt filters inline; this returns only the
* non-trivial OR clause, evaluated against the `p` alias of `pages`.
*
* A page qualifies if it has non-empty textContent, OR its ProseMirror
* `content` JSON has at least one text node (`"type":"text"`) even though
* textContent was never backfilled, OR it already has a stored (non-deleted)
* embedding row.
*/
async getIdsByWorkspace(workspaceId: string): Promise<string[]> {
private embeddablePredicate(
eb: ExpressionBuilder<DbInterface & { p: DbInterface['pages'] }, 'p'>,
) {
return eb.or([
// Has extractable body text. The regex matches any non-whitespace
// character, mirroring the indexer's `text.trim().length === 0` check
// (raw SQL -> use the snake_case column name).
sql<boolean>`p.text_content ~ '[^[:space:]]'`,
// OR the ProseMirror `content` JSON has at least one text node (`"type":
// "text"`) the indexer can extract, even when `text_content` is null/empty
// (never backfilled): `reindexPage` runs `jsonToText` (generateText) over
// `content`, which only emits the text of ProseMirror text nodes, so such a
// page IS embeddable and a full reindex MUST visit it (otherwise it is
// silently skipped). A text node always serialises as
// `{"type":"text","text":"..."}`, so we key on the structural `"type":
// "text"` marker — NOT a bare `"text":` key, which also appears as the
// `attrs.text` of atom nodes that carry NO extractable text (e.g. math
// `mathBlock`/`mathInline`, whose LaTeX lives in `attrs.text` and has no
// text serializer). A math-only page thus produces empty `text_content` and
// zero embeddings; matching its `attrs.text` here would wrongly inflate the
// denominator and keep "Indexed N of M" below 100% forever. An empty doc
// (no text nodes) has no `"type":"text"` and is correctly excluded. A user
// who literally types `"type":"text"` in their prose can't false-positive:
// in `content::text` that text value's quotes are escaped (`\"type\"...`),
// so the literal-quote regex won't match the escaped form (and such a page
// is a real text node anyway).
sql<boolean>`p.content::text ~ '"type"[[:space:]]*:[[:space:]]*"text"'`,
// OR already has at least one (non-deleted) embedding row.
eb.exists(
eb
.selectFrom('pageEmbeddings as pe')
.select(sql`1`.as('one'))
.whereRef('pe.pageId', '=', 'p.id')
.where('pe.deletedAt', 'is', null),
),
]);
}
/**
* IDs of the EMBEDDABLE page set for a workspace — the exact same set that
* `countEmbeddablePages` counts (a page qualifies if it has non-empty
* textContent, OR content JSON with at least one text node (`"type":"text"`)
* and an empty/null textContent, OR already has a stored embedding row). The
* bulk reindex
* iterates THIS set so the live "done" counter reaches exactly
* `countEmbeddablePages` (the steady-state denominator), instead of iterating
* every non-deleted page (which would push the denominator above the
* steady-state value mid-run).
*
* IMPORTANT: the qualifying WHERE is shared with `countEmbeddablePages` via the
* private `embeddablePredicate` helper, so the two can no longer drift — if the
* embeddable definition changes, change it once there and both stay in lockstep
* (else the live total and steady-state total diverge again). Dropping
* text-less pages is correct: `reindexPage` no-ops on
* a page with no extractable content anyway, and a page that lost its text but
* still has stale embeddings IS in this set (the EXISTS clause), so it is still
* visited and its stale rows are cleared.
*/
async getEmbeddablePageIds(workspaceId: string): Promise<string[]> {
const rows = await this.db
.selectFrom('pages')
.select('id')
.where('workspaceId', '=', workspaceId)
.where('deletedAt', 'is', null)
.selectFrom('pages as p')
.select('p.id')
.where('p.workspaceId', '=', workspaceId)
.where('p.deletedAt', 'is', null)
.where((eb) => this.embeddablePredicate(eb))
.execute();
return rows.map((r) => r.id);
}

View File

@@ -1,4 +1,12 @@
import { parsePositiveInt } from './ai-settings.service';
import { AiSettingsService, parsePositiveInt } from './ai-settings.service';
import { WorkspaceRepo } from '@docmost/db/repos/workspace/workspace.repo';
import { AiAgentRoleRepo } from '@docmost/db/repos/ai-agent-roles/ai-agent-roles.repo';
import { AiProviderCredentialsRepo } from '@docmost/db/repos/ai-chat/ai-provider-credentials.repo';
import { PageEmbeddingRepo } from '@docmost/db/repos/ai-chat/page-embedding.repo';
import { PageRepo } from '@docmost/db/repos/page/page.repo';
import { SecretBoxService } from '../crypto/secret-box';
import { EmbeddingReindexProgressService } from './embedding-reindex-progress.service';
import type { Queue } from 'bullmq';
/**
* Round-trip coercion for numeric `::text` provider settings (e.g.
@@ -41,3 +49,196 @@ describe('parsePositiveInt', () => {
expect(parsePositiveInt(42)).toBe(42);
});
});
/**
* getMasked must surface the LIVE reindex run progress while a reindex is active
* (so the "Indexed X of Y" counter can climb 0 -> total), and fall back to the
* steady-state DB coverage count (countIndexedPages / countEmbeddablePages) when
* no reindex is running. This is the server side of the fix for the counter that
* otherwise stays stuck at "478 of 478" the whole reindex.
*/
describe('AiSettingsService.getMasked reindex progress', () => {
const WORKSPACE_ID = 'ws-1';
function makeService() {
// No driver configured -> the credentials lookup is skipped, keeping the
// setup minimal; we only care about the indexed/total numbers here.
const workspaceRepo = {
findById: jest.fn().mockResolvedValue({ settings: {} }),
};
const aiAgentRoleRepo = {};
const aiProviderCredentialsRepo = { find: jest.fn() };
const pageEmbeddingRepo = {
countIndexedPages: jest.fn().mockResolvedValue(478),
};
const pageRepo = {
countEmbeddablePages: jest.fn().mockResolvedValue(478),
};
const secretBox = {};
const reindexProgress = {
get: jest.fn().mockResolvedValue(null),
};
const aiQueue = {};
const service = new AiSettingsService(
workspaceRepo as unknown as WorkspaceRepo,
aiAgentRoleRepo as unknown as AiAgentRoleRepo,
aiProviderCredentialsRepo as unknown as AiProviderCredentialsRepo,
pageEmbeddingRepo as unknown as PageEmbeddingRepo,
pageRepo as unknown as PageRepo,
secretBox as unknown as SecretBoxService,
reindexProgress as unknown as EmbeddingReindexProgressService,
aiQueue as unknown as Queue,
);
return { service, reindexProgress, pageEmbeddingRepo };
}
it('reports the live run numbers when a reindex progress record is active', async () => {
const { service, reindexProgress } = makeService();
// Use a progress.total (500) DISTINCT from the DB count (478) so the test
// actually pins the progress.total branch rather than coincidentally
// matching the DB fallback. With fix #1 the two sources agree in practice,
// but getMasked must still return progress.total when a record is active.
reindexProgress.get.mockResolvedValue({
total: 500,
done: 120,
startedAt: Date.now(),
});
const masked = await service.getMasked(WORKSPACE_ID);
expect(masked.indexedPages).toBe(120); // progress.done, not DB 478
expect(masked.totalPages).toBe(500); // progress.total, not DB 478
expect(masked.reindexing).toBe(true);
});
it('falls back to countIndexedPages when no reindex is active', async () => {
const { service, reindexProgress } = makeService();
reindexProgress.get.mockResolvedValue(null);
const masked = await service.getMasked(WORKSPACE_ID);
expect(masked.indexedPages).toBe(478);
expect(masked.totalPages).toBe(478);
expect(masked.reindexing).toBe(false);
});
});
/**
* reindex() must seed a live progress record (done=0) BEFORE enqueueing so the
* first status poll shows 0 — but ONLY when no run is already active, since
* aiQueue.add() de-duplicates a running reindex and a re-seed would reset the
* visible counter to 0 while the live worker keeps incrementing from its real
* position.
*/
describe('AiSettingsService.reindex progress seed', () => {
const WORKSPACE_ID = 'ws-1';
function makeService() {
const order: string[] = [];
const aiQueue = {
remove: jest.fn().mockResolvedValue(undefined),
add: jest.fn().mockImplementation(async () => {
order.push('add');
}),
};
const pageRepo = {
countEmbeddablePages: jest.fn().mockResolvedValue(478),
};
const reindexProgress = {
// Default: no active run -> seed should happen.
get: jest.fn().mockResolvedValue(null),
start: jest.fn().mockImplementation(async () => {
order.push('start');
}),
clear: jest.fn().mockResolvedValue(undefined),
};
const service = new AiSettingsService(
{} as unknown as WorkspaceRepo,
{} as unknown as AiAgentRoleRepo,
{} as unknown as AiProviderCredentialsRepo,
{} as unknown as PageEmbeddingRepo,
pageRepo as unknown as PageRepo,
{} as unknown as SecretBoxService,
reindexProgress as unknown as EmbeddingReindexProgressService,
aiQueue as unknown as Queue,
);
return { service, aiQueue, pageRepo, reindexProgress, order };
}
it('seeds progress (workspace, count) BEFORE enqueue when no run is active', async () => {
const { service, aiQueue, reindexProgress, order } = makeService();
await service.reindex(WORKSPACE_ID);
// The pre-seed carries the real page count AND a SHORT ttl (3rd arg) so a
// de-duplicated enqueue against a just-finishing job can't leave a phantom
// "reindexing: 0 of N" stuck for the full record TTL (F10).
expect(reindexProgress.start).toHaveBeenCalledWith(
WORKSPACE_ID,
478,
expect.any(Number),
);
const ttl = reindexProgress.start.mock.calls[0][2];
// Short pre-seed TTL, distinct from the full 1h (3600s) record TTL, but
// pinned to the client poll cap (120s) so a still-pending run can't expire
// into a false "done" while the client is still polling (F11).
expect(ttl).toBe(120);
expect(aiQueue.add).toHaveBeenCalledTimes(1);
// Seed must precede the enqueue so the first poll already reports done=0.
expect(order).toEqual(['start', 'add']);
});
it('does NOT re-seed when a run is already active (mid-run re-trigger)', async () => {
const { service, aiQueue, reindexProgress } = makeService();
// An active record exists -> a second click must not reset the counter.
reindexProgress.get.mockResolvedValue({
total: 478,
done: 120,
startedAt: Date.now(),
});
await service.reindex(WORKSPACE_ID);
expect(reindexProgress.start).not.toHaveBeenCalled();
// The enqueue still runs (and de-duplicates against the active job).
expect(aiQueue.add).toHaveBeenCalledTimes(1);
});
it('clears the seed it just wrote and re-throws when enqueue fails', async () => {
const { service, aiQueue, reindexProgress } = makeService();
// This call seeds (get() is null) but the enqueue then blows up
// (Redis hiccup/shutdown) -> the worker never runs and never clear()s, so
// reindex() must roll back its own seed to avoid a 1h stuck "reindexing".
const boom = new Error('redis down');
aiQueue.add.mockRejectedValue(boom);
await expect(service.reindex(WORKSPACE_ID)).rejects.toBe(boom);
expect(reindexProgress.start).toHaveBeenCalledWith(
WORKSPACE_ID,
478,
expect.any(Number),
);
expect(reindexProgress.clear).toHaveBeenCalledWith(WORKSPACE_ID);
});
it('does NOT clear a concurrent active run when enqueue fails (no seed)', async () => {
const { service, aiQueue, reindexProgress } = makeService();
// A run is already active, so THIS call does not seed; if the enqueue then
// fails it must NOT wipe the live worker's record.
reindexProgress.get.mockResolvedValue({
total: 478,
done: 120,
startedAt: Date.now(),
});
const boom = new Error('redis down');
aiQueue.add.mockRejectedValue(boom);
await expect(service.reindex(WORKSPACE_ID)).rejects.toBe(boom);
expect(reindexProgress.start).not.toHaveBeenCalled();
expect(reindexProgress.clear).not.toHaveBeenCalled();
});
});

View File

@@ -8,6 +8,7 @@ import { AiProviderCredentialsRepo } from '@docmost/db/repos/ai-chat/ai-provider
import { PageEmbeddingRepo } from '@docmost/db/repos/ai-chat/page-embedding.repo';
import { PageRepo } from '@docmost/db/repos/page/page.repo';
import { SecretBoxService } from '../crypto/secret-box';
import { EmbeddingReindexProgressService } from './embedding-reindex-progress.service';
import {
AiDriver,
AiProviderSettings,
@@ -30,6 +31,30 @@ export function parsePositiveInt(raw: unknown): number | undefined {
return Number.isFinite(n) && n > 0 ? Math.floor(n) : undefined;
}
/**
* TTL (seconds) for the enqueue-time progress PRE-SEED written by `reindex()`
* before the worker starts. Deliberately SHORT relative to the full 1h record
* TTL: if `aiQueue.add()` de-duplicates against a job that is just finishing
* (the worker's finally already ran `clear()` but removeOnComplete hasn't yet
* removed the job), no new worker runs to overwrite/clear this seed — so this
* shorter TTL lets the phantom "reindexing: 0 of N" expire instead of sticking
* for the full 1h record TTL. A worker that DOES start re-seeds with the full
* TTL, so a real run is unaffected.
*
* It MUST be >= the client poll cap (REINDEX_POLL_CAP_MS = 120000ms in
* ai-provider-settings.tsx) though: the AI_QUEUE worker runs at concurrency 1
* and shares the queue with page-level embedding jobs, so a queued reindex can
* wait well beyond a few dozen seconds before the worker re-seeds with the full
* TTL. If the pre-seed expired while the job is still pending, `get()` returns
* null and getMasked() falls back to the steady-state COUNT (indexedPages ==
* totalPages, reindexing=false) — the client reads that as "done & fully
* indexed", clears its deadline and STOPS polling, so the admin never sees the
* real climb. Pinning the pre-seed TTL to the client cap means a deduped phantom
* is bounded to ~120s — the same window the client already polls — and a genuine
* pending run never expires-into-"done" inside that window.
*/
const PRE_SEED_TTL_SECONDS = 120;
/**
* Shape of the partial update accepted by `update`. Mirrors the validated
* controller DTO. `apiKey` / `embeddingApiKey` are write-only: undefined =
@@ -74,6 +99,7 @@ export class AiSettingsService {
private readonly pageEmbeddingRepo: PageEmbeddingRepo,
private readonly pageRepo: PageRepo,
private readonly secretBox: SecretBoxService,
private readonly reindexProgress: EmbeddingReindexProgressService,
@InjectQueue(QueueName.AI_QUEUE) private readonly aiQueue: Queue,
) {}
@@ -100,21 +126,63 @@ export class AiSettingsService {
.remove(`ai-search-disabled-${workspaceId}`)
.catch(() => undefined);
// Seed a live progress record BEFORE enqueueing so the very first status
// poll already reports done=0 (the reindex POST returns the PRE-job counts,
// so without this seed the first poll would still show "total of total").
// `totalPages` uses countEmbeddablePages — the SAME set the worker iterates
// and the SAME denominator the status endpoint reports, so the live and
// steady-state totals match.
//
// ONLY seed when no run is active: aiQueue.add() de-duplicates an already-
// running reindex, so a mid-run re-trigger (second click / second admin /
// second tab) must NOT reset the visible counter to 0 — that would
// understate the live worker's real position for the rest of the run. The
// worker's own start() at run begin is the single authoritative reset.
let seeded = false;
if ((await this.reindexProgress.get(workspaceId)) === null) {
const totalPages = await this.pageRepo.countEmbeddablePages(workspaceId);
// Short TTL (vs the full 1h record TTL): if add() below de-duplicates
// against a just-finishing job whose worker already clear()ed but isn't
// removed yet, no worker runs to clear this seed — the shorter TTL expires
// the phantom record rather than leaving a stuck "reindexing: 0 of N" for
// the full record TTL. It is kept >= the client poll cap (120s) so a
// genuine but still-pending run never expires into a false "done" while
// the client is still polling (see PRE_SEED_TTL_SECONDS).
await this.reindexProgress.start(
workspaceId,
totalPages,
PRE_SEED_TTL_SECONDS,
);
seeded = true;
}
const jobId = `ai-reindex-${workspaceId}`;
// Clear a prior non-active entry so a stale job can't block this reindex.
// A locked/active job is left in place (remove() no-ops) and the add() below
// de-duplicates against it, keeping the in-progress pass.
await this.aiQueue.remove(jobId).catch(() => undefined);
await this.aiQueue.add(
QueueJob.WORKSPACE_CREATE_EMBEDDINGS,
{ workspaceId },
{
jobId,
removeOnComplete: true,
removeOnFail: true,
},
);
try {
await this.aiQueue.add(
QueueJob.WORKSPACE_CREATE_EMBEDDINGS,
{ workspaceId },
{
jobId,
removeOnComplete: true,
removeOnFail: true,
},
);
} catch (err) {
// If the enqueue fails (Redis hiccup/shutdown) the worker never runs, so
// its finally->clear() never fires. Roll back the seed WE just wrote so
// the status endpoint doesn't report a stuck "reindexing: 0 of N" for the
// full TTL. Only clear when this call did the seed — never wipe a
// concurrent active run's record (get() was non-null, seeded=false).
if (seeded) {
await this.reindexProgress.clear(workspaceId);
}
throw err;
}
}
/**
@@ -253,13 +321,33 @@ export class AiSettingsService {
hasSttApiKey = !!creds?.sttApiKeyEnc;
}
// totalPages now counts only pages with embeddable content (non-empty text
// or already-stored embeddings), so empty/text-less pages don't keep the
// "Indexed N of M pages" bar below 100% forever.
const [indexedPages, totalPages] = await Promise.all([
this.pageEmbeddingRepo.countIndexedPages(workspaceId),
this.pageRepo.countEmbeddablePages(workspaceId),
]);
// While a reindex run is active, report its LIVE progress (done climbs 0 ->
// total) so the settings UI can watch it advance. Read progress FIRST and
// short-circuit: this endpoint is polled every ~5s for the whole run, so when
// a record is active we skip the two coverage COUNTs entirely (their results
// would be discarded anyway). Without the live progress the counter never
// drops: the per-page reindex hard-replaces rows in its own small
// transaction, so countIndexedPages stays ~= total for the whole run. With no
// active record we fall back to the steady-state DB coverage count, which
// preserves the existing display and the client's "done == total -> stop
// polling" condition (the run ends -> record cleared -> DB count == total).
//
// The fallback `totalPages` counts only pages with embeddable content
// (non-empty text, content-borne text, or already-stored embeddings), so
// empty/text-less pages don't keep the "Indexed N of M pages" bar below 100%
// forever.
const progress = await this.reindexProgress.get(workspaceId);
let indexedPages: number;
let totalPages: number;
if (progress) {
indexedPages = progress.done;
totalPages = progress.total;
} else {
[indexedPages, totalPages] = await Promise.all([
this.pageEmbeddingRepo.countIndexedPages(workspaceId),
this.pageRepo.countEmbeddablePages(workspaceId),
]);
}
return {
driver: provider.driver,
@@ -281,6 +369,8 @@ export class AiSettingsService {
hasSttApiKey,
indexedPages,
totalPages,
// Optional hint for the client: a reindex run is currently in progress.
reindexing: progress != null,
};
}

View File

@@ -5,6 +5,7 @@ import { QueueName } from '../queue/constants';
import { AiService } from './ai.service';
import { AiSettingsService } from './ai-settings.service';
import { AiSettingsController } from './ai-settings.controller';
import { EmbeddingReindexProgressService } from './embedding-reindex-progress.service';
/**
* LLM driver + provider-settings unit (§6.2/§6.4).
@@ -19,7 +20,7 @@ import { AiSettingsController } from './ai-settings.controller';
BullModule.registerQueue({ name: QueueName.AI_QUEUE }),
],
controllers: [AiSettingsController],
providers: [AiService, AiSettingsService],
exports: [AiService, AiSettingsService],
providers: [AiService, AiSettingsService, EmbeddingReindexProgressService],
exports: [AiService, AiSettingsService, EmbeddingReindexProgressService],
})
export class AiModule {}

View File

@@ -146,4 +146,7 @@ export interface MaskedAiSettings {
// RAG indexing coverage for the settings UI.
indexedPages: number;
totalPages: number;
// True while a full workspace reindex is actively running (the counts above
// then reflect the live run progress rather than the steady-state DB count).
reindexing?: boolean;
}

View File

@@ -0,0 +1,179 @@
import { EmbeddingReindexProgressService } from './embedding-reindex-progress.service';
import type { RedisService } from '@nestjs-labs/nestjs-ioredis';
import type { Redis } from 'ioredis';
/**
* Unit tests for the Redis-backed reindex-progress store.
*
* The store is a thin, BEST-EFFORT wrapper: writes (start/increment) issue an
* hset/hincrby + expire pipeline and must SWALLOW Redis errors (progress is
* cosmetic — it must never break a reindex); reads (get) must map a valid hash
* to a ReindexProgress and degrade to null on a malformed/missing record or a
* Redis failure. We drive it with a hand-rolled fake ioredis (the project mocks
* Redis with plain fakes, see public-share limiter specs).
*/
describe('EmbeddingReindexProgressService', () => {
const WORKSPACE_ID = 'ws-1';
const KEY = 'ai:reindex:progress:ws-1';
/**
* Build a fake ioredis whose `multi()` returns a chainable recorder and whose
* `hgetall`/`del` are configurable jest mocks. `execImpl` lets a test make the
* pipeline reject (to assert error-swallowing).
*/
function makeRedis(opts: { execImpl?: () => Promise<unknown> } = {}) {
const exec = jest
.fn()
.mockImplementation(opts.execImpl ?? (() => Promise.resolve([])));
// mockReturnThis() returns the call's `this` (the multi object), so the
// chain hset().expire().exec() resolves correctly.
const multiObj = {
hset: jest.fn().mockReturnThis(),
hincrby: jest.fn().mockReturnThis(),
expire: jest.fn().mockReturnThis(),
exec,
};
const multi = jest.fn(() => multiObj);
const hgetall = jest.fn().mockResolvedValue({});
const del = jest.fn().mockResolvedValue(1);
const redis = { multi, hgetall, del } as unknown as Redis;
return { redis, multiObj, multi, hgetall, del, exec };
}
function makeService(redis: Redis) {
const redisService = {
getOrThrow: () => redis,
} as unknown as RedisService;
return new EmbeddingReindexProgressService(redisService);
}
describe('get', () => {
it('maps a valid hash to a ReindexProgress object', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockResolvedValue({ total: '478', done: '120', startedAt: '1000' });
const service = makeService(redis);
await expect(service.get(WORKSPACE_ID)).resolves.toEqual({
total: 478,
done: 120,
startedAt: 1000,
});
expect(hgetall).toHaveBeenCalledWith(KEY);
});
it('returns null for an empty hash (no record)', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockResolvedValue({});
await expect(makeService(redis).get(WORKSPACE_ID)).resolves.toBeNull();
});
it('returns null when `total` is missing (partial record)', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockResolvedValue({ done: '5' });
await expect(makeService(redis).get(WORKSPACE_ID)).resolves.toBeNull();
});
it('returns null for a non-numeric total', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockResolvedValue({ total: 'abc', done: '1', startedAt: '1' });
await expect(makeService(redis).get(WORKSPACE_ID)).resolves.toBeNull();
});
it('returns null for a non-numeric done', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockResolvedValue({ total: '10', done: 'xyz', startedAt: '1' });
await expect(makeService(redis).get(WORKSPACE_ID)).resolves.toBeNull();
});
it('coerces a non-finite startedAt to 0', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockResolvedValue({ total: '10', done: '2', startedAt: 'nope' });
await expect(makeService(redis).get(WORKSPACE_ID)).resolves.toEqual({
total: 10,
done: 2,
startedAt: 0,
});
});
it('degrades to null when hgetall throws (degradation contract)', async () => {
const { redis, hgetall } = makeRedis();
hgetall.mockRejectedValue(new Error('redis down'));
await expect(makeService(redis).get(WORKSPACE_ID)).resolves.toBeNull();
});
});
describe('start', () => {
it('issues hset + expire on the workspace key', async () => {
const { redis, multiObj } = makeRedis();
await makeService(redis).start(WORKSPACE_ID, 478);
expect(multiObj.hset).toHaveBeenCalledWith(
KEY,
expect.objectContaining({ total: '478', done: '0' }),
);
expect(multiObj.expire).toHaveBeenCalledWith(KEY, expect.any(Number));
expect(multiObj.exec).toHaveBeenCalledTimes(1);
});
it('defaults the expire TTL to the full 1h record TTL', async () => {
const { redis, multiObj } = makeRedis();
await makeService(redis).start(WORKSPACE_ID, 478);
// Default ttl = full record TTL (60 * 60) so a real run never expires
// mid-flight before the worker refreshes it on each increment.
expect(multiObj.expire).toHaveBeenCalledWith(KEY, 60 * 60);
});
it('honours an explicit short ttlSeconds for the enqueue-time pre-seed (F10)', async () => {
const { redis, multiObj } = makeRedis();
// The reindex() pre-seed passes a short ttl so a phantom record left by a
// de-duplicated enqueue expires in seconds, not after the full 1h TTL.
await makeService(redis).start(WORKSPACE_ID, 478, 45);
expect(multiObj.expire).toHaveBeenCalledWith(KEY, 45);
});
it('swallows a thrown Redis error (best-effort)', async () => {
const { redis } = makeRedis({
execImpl: () => Promise.reject(new Error('redis down')),
});
await expect(
makeService(redis).start(WORKSPACE_ID, 1),
).resolves.toBeUndefined();
});
});
describe('increment', () => {
it('issues hincrby + expire on the workspace key', async () => {
const { redis, multiObj } = makeRedis();
await makeService(redis).increment(WORKSPACE_ID);
expect(multiObj.hincrby).toHaveBeenCalledWith(KEY, 'done', 1);
expect(multiObj.expire).toHaveBeenCalledWith(KEY, expect.any(Number));
expect(multiObj.exec).toHaveBeenCalledTimes(1);
});
it('swallows a thrown Redis error (best-effort)', async () => {
const { redis } = makeRedis({
execImpl: () => Promise.reject(new Error('redis down')),
});
await expect(
makeService(redis).increment(WORKSPACE_ID),
).resolves.toBeUndefined();
});
});
describe('clear', () => {
it('deletes the workspace key', async () => {
const { redis, del } = makeRedis();
await makeService(redis).clear(WORKSPACE_ID);
expect(del).toHaveBeenCalledWith(KEY);
});
it('swallows a thrown Redis error (best-effort)', async () => {
const { redis, del } = makeRedis();
del.mockRejectedValue(new Error('redis down'));
await expect(
makeService(redis).clear(WORKSPACE_ID),
).resolves.toBeUndefined();
});
});
});

View File

@@ -0,0 +1,162 @@
import { Injectable, Logger } from '@nestjs/common';
import { RedisService } from '@nestjs-labs/nestjs-ioredis';
import type { Redis } from 'ioredis';
/**
* Live progress of an in-flight workspace embeddings reindex run.
* `total` is the number of pages the run will process, `done` how many it has
* already processed (success OR handled failure), `startedAt` the epoch-ms the
* record was created.
*/
export interface ReindexProgress {
total: number;
done: number;
startedAt: number;
}
/** Redis key namespace for the per-workspace reindex-progress record. */
const KEY_PREFIX = 'ai:reindex:progress:';
/**
* TTL (seconds) on the progress record so a crashed/aborted worker that never
* reaches its `clear()` finally can still self-clean instead of leaving a stuck
* "reindexing" state. Refreshed on every increment so a long run never expires
* mid-flight; on a crash it disappears within TTL of the last processed page.
*
* INTENTIONALLY tied to WRITE progress (start/increment) only — never refreshed
* on get(). Refreshing on read would keep a dead worker's record alive forever
* as long as a client keeps polling (a permanently stuck reindexing:true). The
* clear() in the worker's finally handles normal completion; a dead worker's
* record expires after TTL, and the client's own poll cap stops polling anyway.
*/
const TTL_SECONDS = 60 * 60; // 1h
/**
* Cluster-wide store for the live progress of a workspace embeddings reindex.
*
* The reindex runs in a BullMQ worker (AI_QUEUE) that may be a DIFFERENT process
* than the API handling the settings-status GET, so the progress must live in
* the shared Redis — we reuse the same global ioredis client (RedisService from
* @nestjs-labs/nestjs-ioredis) that backs BullMQ and the other anti-abuse
* limiters, adding NO new Redis config.
*
* Everything here is best-effort and COSMETIC: progress only drives the "Indexed
* X of Y" counter while a reindex is running. Any Redis failure degrades to the
* existing steady-state behaviour (the status falls back to the DB coverage
* count), so reads fail to `null` and writes are swallowed — a reindex must
* never break because progress reporting did.
*
* Stored as a Redis HASH so `done` can be bumped with an atomic HINCRBY (the
* worker is the only writer of `done`, but HINCRBY also keeps us off a
* read-modify-write race and preserves the other fields).
*/
@Injectable()
export class EmbeddingReindexProgressService {
private readonly logger = new Logger(EmbeddingReindexProgressService.name);
private readonly redis: Redis;
constructor(redisService: RedisService) {
this.redis = redisService.getOrThrow();
}
private key(workspaceId: string): string {
return KEY_PREFIX + workspaceId;
}
/**
* Begin (or reset) the progress record for a workspace: `total` pages, `done`
* back to 0, `startedAt` now. Called twice for a run, BOTH with the real page
* count (countEmbeddablePages) so the two totals coincide: once at reindex
* enqueue time (so the very first status poll already reports done=0) and again
* at the worker start (which re-asserts the same total and resets `done`).
* Resets `done` to 0 so a re-trigger never inherits a stale count.
*
* `ttlSeconds` lets the caller pick the record's lifetime. The enqueue-time
* pre-seed passes a SHORT ttl: if `aiQueue.add()` de-duplicates against a job
* that is just finishing (its worker hasn't yet removed the job but already
* ran its `clear()`), no new worker starts to clear this phantom seed, so a
* short ttl lets it expire in seconds instead of sticking for the full TTL.
* The worker's own `start()` at the begin of a real run overwrites this entry
* and raises the ttl back to the default full TTL.
*/
async start(
workspaceId: string,
total: number,
ttlSeconds: number = TTL_SECONDS,
): Promise<void> {
const key = this.key(workspaceId);
try {
await this.redis
.multi()
.hset(key, {
total: String(total),
done: '0',
startedAt: String(Date.now()),
})
.expire(key, ttlSeconds)
.exec();
} catch (err) {
this.logger.warn(
`reindex-progress start failed for workspace ${workspaceId}; ` +
`progress reporting disabled for this run: ${(err as Error).message}`,
);
}
}
/**
* Bump the processed-page counter by one and refresh the TTL. Atomic and
* best-effort: a missing key (cleared/expired) would be recreated with only
* `done`, but `get()` treats a record without a numeric `total` as inactive,
* so that partial state safely reads as "no active reindex".
*/
async increment(workspaceId: string): Promise<void> {
const key = this.key(workspaceId);
try {
await this.redis.multi().hincrby(key, 'done', 1).expire(key, TTL_SECONDS).exec();
} catch (err) {
this.logger.warn(
`reindex-progress increment failed for workspace ${workspaceId}: ` +
`${(err as Error).message}`,
);
}
}
/**
* Remove the progress record. Called in the worker's `finally` so a completed,
* aborted, or unconfigured-early-return run never leaves a stuck record; the
* status then falls back to the DB coverage count.
*/
async clear(workspaceId: string): Promise<void> {
try {
await this.redis.del(this.key(workspaceId));
} catch (err) {
this.logger.warn(
`reindex-progress clear failed for workspace ${workspaceId} ` +
`(self-cleans via TTL): ${(err as Error).message}`,
);
}
}
/**
* Read the live progress, or `null` when no reindex is active (no record, an
* expired record, or a partial record without a numeric `total`). On a Redis
* error returns `null` so the status endpoint degrades to its DB count.
*/
async get(workspaceId: string): Promise<ReindexProgress | null> {
try {
const data = await this.redis.hgetall(this.key(workspaceId));
if (!data || data.total === undefined) return null;
const total = Number(data.total);
const done = Number(data.done);
const startedAt = Number(data.startedAt);
if (!Number.isFinite(total) || !Number.isFinite(done)) return null;
return { total, done, startedAt: Number.isFinite(startedAt) ? startedAt : 0 };
} catch (err) {
this.logger.warn(
`reindex-progress read failed for workspace ${workspaceId}; ` +
`falling back to DB count: ${(err as Error).message}`,
);
return null;
}
}
}

View File

@@ -1,18 +1,110 @@
import { Readable } from 'stream';
import { StorageService } from './storage.service';
import type { StorageDriver } from './interfaces';
// Direct instantiation with a stub driver. The Test.createTestingModule form
// failed to resolve the STORAGE_DRIVER_TOKEN at compile(); this smoke test only
// needs the service to construct.
describe('StorageService', () => {
/**
* StorageService is a thin facade over the injected StorageDriver: each public
* method must forward to the driver with the SAME arguments and return/await the
* driver's result unchanged (the read paths return it; the write paths await it).
* A mock driver lets us assert that delegation exactly, with no real S3/disk IO.
*/
describe('StorageService delegation', () => {
// Every driver method is a jest mock so we can assert call args + return passing.
function buildDriver(): jest.Mocked<StorageDriver> {
return {
upload: jest.fn().mockResolvedValue(undefined),
uploadStream: jest.fn().mockResolvedValue(undefined),
copy: jest.fn().mockResolvedValue(undefined),
read: jest.fn(),
readStream: jest.fn(),
readRangeStream: jest.fn(),
exists: jest.fn(),
getUrl: jest.fn(),
getSignedUrl: jest.fn(),
delete: jest.fn().mockResolvedValue(undefined),
getDriver: jest.fn(),
getDriverName: jest.fn(),
getConfig: jest.fn(),
} as unknown as jest.Mocked<StorageDriver>;
}
let driver: jest.Mocked<StorageDriver>;
let service: StorageService;
beforeEach(() => {
service = new StorageService(
{} as any, // storageDriver
);
driver = buildDriver();
service = new StorageService(driver as unknown as StorageDriver);
});
it('should be defined', () => {
expect(service).toBeDefined();
it('upload forwards path + content to the driver', async () => {
const buf = Buffer.from('data');
await service.upload('a/b.png', buf);
expect(driver.upload).toHaveBeenCalledWith('a/b.png', buf);
});
it('uploadStream forwards path, stream and options', async () => {
const stream = Readable.from(['x']);
await service.uploadStream('a/b.bin', stream, { recreateClient: true });
expect(driver.uploadStream).toHaveBeenCalledWith('a/b.bin', stream, {
recreateClient: true,
});
});
it('copy forwards both paths', async () => {
await service.copy('from.txt', 'to.txt');
expect(driver.copy).toHaveBeenCalledWith('from.txt', 'to.txt');
});
it('read returns the driver buffer unchanged', async () => {
const buf = Buffer.from('content');
driver.read.mockResolvedValue(buf);
await expect(service.read('f.txt')).resolves.toBe(buf);
expect(driver.read).toHaveBeenCalledWith('f.txt');
});
it('readStream returns the driver stream unchanged', async () => {
const stream = Readable.from(['y']);
driver.readStream.mockResolvedValue(stream);
await expect(service.readStream('f.bin')).resolves.toBe(stream);
expect(driver.readStream).toHaveBeenCalledWith('f.bin');
});
it('readRangeStream forwards the range object and returns the stream', async () => {
const stream = Readable.from(['z']);
driver.readRangeStream.mockResolvedValue(stream);
const range = { start: 0, end: 99 };
await expect(service.readRangeStream('f.bin', range)).resolves.toBe(stream);
expect(driver.readRangeStream).toHaveBeenCalledWith('f.bin', range);
});
it('exists returns the driver boolean', async () => {
driver.exists.mockResolvedValue(false);
await expect(service.exists('missing')).resolves.toBe(false);
expect(driver.exists).toHaveBeenCalledWith('missing');
});
it('getSignedUrl forwards path + expiry and returns the signed url', async () => {
driver.getSignedUrl.mockResolvedValue('https://signed/url');
await expect(service.getSignedUrl('f.png', 600)).resolves.toBe(
'https://signed/url',
);
expect(driver.getSignedUrl).toHaveBeenCalledWith('f.png', 600);
});
it('getUrl returns the driver url synchronously', () => {
driver.getUrl.mockReturnValue('https://cdn/f.png');
expect(service.getUrl('f.png')).toBe('https://cdn/f.png');
expect(driver.getUrl).toHaveBeenCalledWith('f.png');
});
it('delete forwards the path', async () => {
await service.delete('old.txt');
expect(driver.delete).toHaveBeenCalledWith('old.txt');
});
it('getDriverName returns the driver name', () => {
driver.getDriverName.mockReturnValue('s3');
expect(service.getDriverName()).toBe('s3');
expect(driver.getDriverName).toHaveBeenCalledTimes(1);
});
});

View File

@@ -1,3 +1,4 @@
import { Logger } from '@nestjs/common';
import { IoAdapter } from '@nestjs/platform-socket.io';
import { ServerOptions } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
@@ -9,8 +10,11 @@ import {
} from '../../common/helpers';
export class WsRedisIoAdapter extends IoAdapter {
private readonly logger = new Logger(WsRedisIoAdapter.name);
private adapterConstructor: ReturnType<typeof createAdapter>;
private redisConfig: RedisConfig;
private pubClient: Redis;
private subClient: Redis;
async connectToRedis(): Promise<void> {
this.redisConfig = parseRedisUrl(process.env.REDIS_URL);
@@ -23,8 +27,13 @@ export class WsRedisIoAdapter extends IoAdapter {
const pubClient = new Redis(process.env.REDIS_URL, options);
const subClient = new Redis(process.env.REDIS_URL, options);
pubClient.on('error', (err) => () => {});
subClient.on('error', (err) => () => {});
pubClient.on('error', (err) => this.logger.error('socket.io redis pub client error', err));
subClient.on('error', (err) => this.logger.error('socket.io redis sub client error', err));
// Hold references so the pub/sub connections can be torn down on shutdown
// (see dispose()); otherwise these ioredis sockets leak as active handles.
this.pubClient = pubClient;
this.subClient = subClient;
this.adapterConstructor = createAdapter(pubClient, subClient);
}
@@ -34,4 +43,26 @@ export class WsRedisIoAdapter extends IoAdapter {
server.adapter(this.adapterConstructor);
return server;
}
/**
* Called once by Nest's SocketModule during application shutdown, after every
* socket.io server has been closed. The @socket.io/redis-adapter never owns
* the lifecycle of the ioredis pub/sub clients it is handed, so we close them
* here to avoid leaking their TCP handles on shutdown (see issue #255).
*
* Uses disconnect(false) to mirror the sibling pub/sub pair in
* collaboration/extensions/redis-sync (redis-sync.extension.ts onDestroy):
* an immediate close with no graceful QUIT round-trip and no auto-reconnect,
* which is what we want for idle adapter clients during teardown.
*/
async dispose(): Promise<void> {
await super.dispose();
// dispose() is invoked once per shutdown; null the refs so a second call
// (or any post-shutdown path) cannot act on already-closed clients.
this.pubClient?.disconnect(false);
this.subClient?.disconnect(false);
this.pubClient = undefined;
this.subClient = undefined;
}
}

View File

@@ -0,0 +1,160 @@
import { Kysely } from 'kysely';
import { randomUUID } from 'node:crypto';
import { PageRepo } from '@docmost/db/repos/page/page.repo';
import { SpaceMemberRepo } from '@docmost/db/repos/space/space-member.repo';
import { EventEmitter2 } from '@nestjs/event-emitter';
import { getTestDb, destroyTestDb, createWorkspace, createSpace } from './db';
/**
* `PageRepo.getEmbeddablePageIds` MUST stay in lockstep with
* `PageRepo.countEmbeddablePages` (page.repo.ts) — the bulk reindex iterates the
* ID set while the status endpoint reports the count as the live denominator, so
* if the two predicates ever diverge the "done X of Y" counter ends on the wrong
* total. Both share the SAME WHERE: a page qualifies iff it is non-deleted AND
* (text_content has a non-whitespace char OR — when text_content is empty — its
* content JSON has a text node OR it has a non-deleted embedding row).
*
* This is a DB-level invariant: the predicate lives in raw SQL (`text_content ~
* '[^[:space:]]'`, `content::text ~ '"type"[[:space:]]*:[[:space:]]*"text"'`) and an EXISTS subquery, so a unit test with mocked Kysely
* cannot observe it. We seed every boundary case against real Postgres and
* assert the returned ID set EQUALS the count (and is exactly the expected set).
* A future edit that touches one predicate but not the other turns this red.
*/
describe('PageRepo embeddable-page set: getEmbeddablePageIds <-> countEmbeddablePages [integration]', () => {
let db: Kysely<any>;
let repo: PageRepo;
let workspaceId: string;
let spaceId: string;
beforeAll(async () => {
db = getTestDb();
// Only the Kysely-backed query methods under test are exercised, so the
// SpaceMemberRepo / EventEmitter2 deps are never touched — stub them.
repo = new PageRepo(
db as any,
{} as unknown as SpaceMemberRepo,
{} as unknown as EventEmitter2,
);
workspaceId = (await createWorkspace(db)).id;
spaceId = (await createSpace(db, workspaceId)).id;
});
afterAll(async () => {
await destroyTestDb();
});
// Insert a page with explicit text_content / content / deleted_at (createPage
// in db.ts sets none), returning its id so the test can assert membership.
// `content` is the ProseMirror doc JSON (jsonb): postgres.js serializes a plain
// object to JSON for jsonb columns, so we pass it through only when supplied so
// the rest of the rows keep the DB default.
async function insertPage(args: {
textContent: string | null;
content?: unknown;
deletedAt?: Date | null;
}): Promise<string> {
const id = randomUUID();
await db
.insertInto('pages')
.values({
id,
slugId: `slug-${id.slice(0, 8)}`,
title: `page-${id.slice(0, 8)}`,
spaceId,
workspaceId,
textContent: args.textContent,
...(args.content !== undefined ? { content: args.content as any } : {}),
deletedAt: args.deletedAt ?? null,
})
.execute();
return id;
}
// Insert one embedding chunk row for a page (NOT NULL columns + deleted_at).
async function insertEmbedding(
pageId: string,
opts: { deletedAt?: Date | null } = {},
): Promise<void> {
await db
.insertInto('pageEmbeddings')
.values({
id: randomUUID(),
workspaceId,
pageId,
spaceId,
chunkIndex: 0,
chunkStart: 0,
chunkLength: 1,
content: 'x',
modelName: 'test-model',
modelDimensions: 1,
deletedAt: opts.deletedAt ?? null,
})
.execute();
}
it('returns exactly the embeddable set and its size equals countEmbeddablePages', async () => {
// IN the set --------------------------------------------------------------
// (a) non-deleted page with real body text.
const withText = await insertPage({ textContent: 'hello world' });
// (b) non-deleted page with NO text but a live embedding row (EXISTS clause:
// a page that lost its text yet still has stale vectors must be visited
// so the reindex can clear them).
const noTextLiveEmbedding = await insertPage({ textContent: null });
await insertEmbedding(noTextLiveEmbedding);
// (c) non-deleted page with EMPTY text_content but ProseMirror `content` JSON
// carrying a real text node — the content-JSON clause. This pins BOTH the
// third OR-clause AND the space-after-colon: jsonb stores the key/value
// separator as `"type": "text"` (a space after the colon), which is why
// the predicate needs `[[:space:]]*`. `reindexPage` extracts this text, so
// the page IS embeddable and the reindex MUST visit it.
const noTextContentDoc = await insertPage({
textContent: null,
content: {
type: 'doc',
content: [
{ type: 'paragraph', content: [{ type: 'text', text: 'hello' }] },
],
},
});
// OUT of the set ----------------------------------------------------------
// (d) non-deleted, text_content NULL, no embeddings.
await insertPage({ textContent: null });
// (e) non-deleted, whitespace-only text (regex requires a non-space char).
await insertPage({ textContent: ' \n\t ' });
// (f) deleted page WITH body text — excluded by the non-deleted predicate.
await insertPage({
textContent: 'deleted but had text',
deletedAt: new Date(),
});
// (g) non-deleted, no text, with ONLY a DELETED embedding row — the EXISTS
// subquery filters pe.deleted_at IS NULL, so this stays out.
const onlyDeletedEmbedding = await insertPage({ textContent: null });
await insertEmbedding(onlyDeletedEmbedding, { deletedAt: new Date() });
// (h) non-deleted, empty text_content, content JSON with ONLY a math atom
// node — its LaTeX lives in `attrs.text` (a `"text":` KEY, not a
// `"type":"text"` text node) and has no text serializer, so `jsonToText`
// yields nothing and the page produces zero embeddings. The predicate
// keys on the structural `"type":"text"` marker, so this stays OUT (a
// bare `"text":` match would wrongly inflate the denominator).
await insertPage({
textContent: null,
content: {
type: 'doc',
content: [{ type: 'mathBlock', attrs: { text: 'E=mc^2' } }],
},
});
const ids = await repo.getEmbeddablePageIds(workspaceId);
const count = await repo.countEmbeddablePages(workspaceId);
// The two queries agree on the size (the load-bearing lockstep invariant)...
expect(ids.length).toBe(count);
// ...and the set is exactly the three qualifying pages, nothing else.
expect(new Set(ids)).toEqual(
new Set([withText, noTextLiveEmbedding, noTextContentDoc]),
);
expect(count).toBe(3);
});
});

135
docs/dev-stand.md Normal file
View File

@@ -0,0 +1,135 @@
# Running a local dev stand
How to bring up a working local instance (API + client + realtime collaboration)
and the non-obvious gotchas that will otherwise eat an hour. Written from real
setup pain — read the **Gotchas** section before you start.
## Prerequisites
- **Node 20+ / pnpm 10+.**
- **Postgres with pgvector.** Use the `pgvector/pgvector` image (e.g.
`pgvector/pgvector:pg18`). The stock `postgres` image will FAIL the
`CREATE EXTENSION vector` migration — the RAG feature stores embeddings in
`page_embeddings`.
- **Redis** — backs caching, BullMQ queues, the Socket.IO adapter, and collab
sync.
## 1. Environment (`.env`)
The client (`apps/client/vite.config.ts`) and both server processes read env via
`envPath` → the **workspace root `.env`**. Keep a single source of truth. Minimum:
```dotenv
APP_URL=http://localhost:3000
PORT=3000
APP_SECRET=<one long secret — SAME value everywhere, see gotcha #3>
DATABASE_URL="postgresql://<user>:<pass>@localhost:5432/<db>?schema=public"
REDIS_URL=redis://127.0.0.1:6379
COLLAB_URL=http://localhost:3001 # where the CLIENT connects for realtime
COLLAB_PORT=3001 # where the COLLAB server listens
STORAGE_DRIVER=local
DISABLE_TELEMETRY=true
```
> If you also keep an `apps/server/.env`, its `APP_SECRET` **must match** the
> root one (see gotcha #3).
## 2. Migrations
Migrations do **not** auto-run in local dev. After a fresh checkout or switching
branches, apply them yourself or endpoints touching a new column/table will 500:
```bash
pnpm --filter server migration:latest
```
## 3. Bring it up — THREE processes, not two
`pnpm dev` starts only the **API server** (Nest, `:3000`) and the **client**
(Vite). Realtime collaboration is a **separate process** and `pnpm dev` does NOT
start it. You need all three:
```bash
# 1) API + client (from the repo root)
pnpm dev
# → API http://localhost:3000
# → client http://localhost:5173 (Vite; localhost-only by default)
# 2) Collaboration server — SEPARATE process. Build first (see gotcha #2), then:
pnpm --filter server build # produces dist/collaboration/server/collab-main.js
pnpm collab:dev # node dist/.../collab-main → listens on :3001 (0.0.0.0)
```
Without step 2 the editor shows **"Real-time editor connection lost. Retrying…"**,
stays in read-only *static* mode, and anything that only mounts in the *live*
editor won't appear.
## Seeding a login
Register through the UI, or reset an existing user's password directly in the DB
(the server hashes with `bcrypt`):
```js
// node -e '...' with pg + bcrypt from the repo's node_modules
const bcrypt = require("bcrypt");
const { Client } = require("pg");
(async () => {
const hash = await bcrypt.hash("demopass", 10);
const c = new Client({ /* DATABASE_URL parts */ });
await c.connect();
await c.query("update users set password=$1 where email=$2", [hash, "admin@example.com"]);
await c.end();
})();
```
> **Use a simple one-word password with no special characters** (e.g. `demopass`,
> not `Str0ng!Pass@2026`). Demo/test credentials get passed through shells, JSON
> payloads, and URLs by scripts and automation, where `!` `@` `$` `&` etc. get
> mangled or need escaping — a plain alphanumeric word avoids a whole class of
> "wrong password" confusion.
## Gotchas (the грабли)
1. **Collaboration is a third process.** `pnpm dev` runs API + client only.
Start `pnpm collab:dev` (on `:3001`) separately or the live editor never
connects. The client connects to `COLLAB_URL` directly (default
`http://localhost:3001`), NOT through the Vite `/collab` proxy — the API
server on `:3000` does **not** serve the collab websocket.
2. **The collab server must be built — you can't run it from source.**
`collab:dev` runs `node dist/collaboration/server/collab-main.js`, so run
`pnpm --filter server build` first. Running the entry via `tsx`/`ts-node`
fails with a NestJS DI error ("dependency … appears to be undefined at
runtime") because direct TS execution doesn't emit the decorator metadata the
built output has.
3. **`APP_SECRET` must be identical for the API server and the collab server.**
The API issues a collab-token (JWT signed with `APP_SECRET`); the collab
server validates it with `APP_SECRET`. If they load different values (e.g. a
root `.env` and an `apps/server/.env` with different secrets), every realtime
connection is rejected with **`[onAuthenticate] Invalid collab token`** and
the editor shows "connection lost". Keep one secret everywhere.
4. **Vite binds localhost only.** To reach the stand from another machine on the
LAN, start the client with `--host` (`pnpm --filter client exec vite --host`)
and use the box's LAN IP. The `/api`, `/socket.io`, and `/collab` Vite proxies
forward to `APP_URL`, so the API just works over the LAN; realtime needs
`COLLAB_URL` reachable from the browser (point it at the LAN IP:3001, and run
collab on `0.0.0.0` — it does by default).
5. **A stale `@docmost/editor-ext` white-screens the client.** The client imports
from `@docmost/editor-ext` (a workspace package). If that package's source is
behind (missing a newer export, e.g. `Spoiler`), the client dies at load with
*"The requested module … does not provide an export named 'Spoiler'"* → blank
page. Make sure the workspace `packages/editor-ext` is current for the branch
you're running (a stale sibling checkout resolved through a shared
`node_modules` symlink is the usual cause).
6. **pgvector, not stock postgres** (see Prerequisites) — the `vector` extension
migration fails otherwise.
7. **Migrations don't auto-run in dev** — run `migration:latest` after every pull
or branch switch.
See also the **Commands** and **Architecture → Two server processes** sections in
[`AGENTS.md`](../AGENTS.md).

View File

@@ -0,0 +1,68 @@
import { describe, it, expect } from "vitest";
import { generateJSON } from "@tiptap/html";
import { Document } from "@tiptap/extension-document";
import { Paragraph } from "@tiptap/extension-paragraph";
import { Text } from "@tiptap/extension-text";
import { htmlToMarkdown } from "../markdown/utils/turndown.utils";
import { markdownToHtml } from "../markdown/utils/marked.utils";
import { TiptapImage } from "./image";
// Minimal schema for parsing markdownToHtml output back to JSON (mirrors
// image.spec.ts), so we can assert the recovered caption EXACTLY.
const parseExtensions = [Document, Paragraph, Text, TiptapImage];
// Lossless markdown round-trip for image captions (issue #221). An image WITH a
// caption can't be expressed as `![alt](src)`, so it is emitted as a raw <img>
// (carrying data-caption) wrapped in a block <div>, the same trick the <video>
// rule uses. marked passes the raw HTML through, so markdownToHtml keeps the
// data-caption, and the image extension's parseHTML restores the attribute.
describe("image caption markdown round-trip", () => {
it("HTML -> Markdown emits a raw <img data-caption> for captioned images", () => {
const html = `<p><img src="/files/a.png" alt="cat" data-caption="A grey cat"></p>`;
const md = htmlToMarkdown(html);
expect(md).toContain("data-caption=\"A grey cat\"");
expect(md).toContain('src="/files/a.png"');
expect(md).toContain('alt="cat"');
// It must NOT degrade to the lossy ![]() form.
expect(md).not.toContain("![cat]");
});
it("Markdown -> HTML restores data-caption on the <img>", async () => {
const html = `<p><img src="/files/a.png" alt="cat" data-caption="A grey cat"></p>`;
const md = htmlToMarkdown(html);
const back = await markdownToHtml(md);
expect(back).toContain('data-caption="A grey cat"');
expect(back).toContain('src="/files/a.png"');
});
it("special characters in the caption survive the round-trip (escaped)", async () => {
// The source caption is the decoded string `Tom & "Jerry"` (both an `&` and
// a `"`). escapeHtmlAttr must encode `&` -> `&amp;` and `"` -> `&quot;`.
const html = `<p><img src="/files/a.png" data-caption='Tom &amp; &quot;Jerry&quot;'></p>`;
const md = htmlToMarkdown(html);
// (a) The intermediate Markdown must carry the EXACT escaped attribute. This
// fails if escapeHtmlAttr stopped escaping `"` (attribute break-out:
// data-caption="Tom & "Jerry"") or double-encoded `&` (`&amp;amp;`).
expect(md).toContain('data-caption="Tom &amp; &quot;Jerry&quot;"');
const back = await markdownToHtml(md);
expect(back).toContain("data-caption=");
expect(back).toContain("Jerry");
expect(back).toContain("Tom");
// (b) Re-parse the rendered HTML through the image extension's parseHTML and
// assert the recovered caption is EXACTLY the original (no corruption, loss,
// or double-encoding).
const json = generateJSON(back, parseExtensions);
expect(json.content?.[0]?.attrs?.caption).toBe('Tom & "Jerry"');
});
it("caption-less images stay a clean ![alt](src) with no raw HTML", () => {
const html = `<p><img src="/files/a.png" alt="cat"></p>`;
const md = htmlToMarkdown(html);
expect(md).toContain("![cat](/files/a.png)");
expect(md).not.toContain("data-caption");
expect(md).not.toContain("<img");
});
});

View File

@@ -1,5 +1,16 @@
import { describe, it, expect, beforeEach } from "vitest";
import { applyAlignment } from "./image";
import { getSchema } from "@tiptap/core";
import { generateHTML, generateJSON } from "@tiptap/html";
import { Document } from "@tiptap/extension-document";
import { Paragraph } from "@tiptap/extension-paragraph";
import { Text } from "@tiptap/extension-text";
import { applyAlignment, TiptapImage } from "./image";
// CONTRACT tests for the image node's `caption` attribute (issue #221). The
// caption is a plain-text string stored on the image atom and serialized as
// `data-caption` on the <img>. If this mapping drifts, captions saved to HTML
// (and thus to native storage / search / markdown) are silently lost.
const extensions = [Document, Paragraph, Text, TiptapImage];
// applyAlignment is a pure DOM mutation: it sets the float / padding /
// justify-content / data-image-align on an image node-view container per the
@@ -65,3 +76,56 @@ describe("applyAlignment", () => {
expect(el.style.justifyContent).toBe("flex-start");
});
});
describe("image schema", () => {
it("registers the image node and keeps it an atom", () => {
const schema = getSchema(extensions);
expect(schema.nodes.image).toBeTruthy();
expect(schema.nodes.image.spec.atom).toBe(true);
});
});
describe("image caption parse/render round-trip", () => {
it("recovers caption from data-caption on parse (HTML -> JSON)", () => {
const html = `<img src="/files/a.png" alt="cat" data-caption="A grey cat">`;
const json = generateJSON(html, extensions);
const node = json.content?.[0];
expect(node?.type).toBe("image");
expect(node?.attrs?.caption).toBe("A grey cat");
expect(node?.attrs?.alt).toBe("cat");
});
it("emits data-caption on render when set (JSON -> HTML)", () => {
const json = {
type: "doc",
content: [
{
type: "image",
attrs: { src: "/files/a.png", alt: "cat", caption: "A grey cat" },
},
],
};
const html = generateHTML(json, extensions);
expect(html).toContain('data-caption="A grey cat"');
});
it("omits data-caption when there is no caption (caption-less images stay clean)", () => {
const json = {
type: "doc",
content: [{ type: "image", attrs: { src: "/files/a.png", alt: "cat" } }],
};
const html = generateHTML(json, extensions);
expect(html).not.toContain("data-caption");
});
it("full HTML -> JSON -> HTML round-trip preserves the caption", () => {
const html = `<img src="/files/a.png" alt="cat" data-caption="Caption with &amp; &quot;quotes&quot;">`;
const json = generateJSON(html, extensions);
expect(json.content?.[0]?.attrs?.caption).toBe('Caption with & "quotes"');
const out = generateHTML(json, extensions);
const back = generateJSON(out, extensions);
expect(back.content?.[0]?.attrs?.caption).toBe('Caption with & "quotes"');
});
});

View File

@@ -32,6 +32,7 @@ export interface ImageOptions extends DefaultImageOptions {
export interface ImageAttributes {
src?: string;
alt?: string;
caption?: string;
align?: string;
attachmentId?: string;
size?: number;
@@ -125,6 +126,13 @@ export const TiptapImage = Image.extend<ImageOptions>({
alt: attributes.alt,
}),
},
caption: {
default: undefined,
parseHTML: (element) => element.getAttribute("data-caption") || undefined,
// Emit data-caption only when set, so caption-less images stay clean.
renderHTML: (attributes: ImageAttributes) =>
attributes.caption ? { "data-caption": attributes.caption } : {},
},
attachmentId: {
default: undefined,
parseHTML: (element) => element.getAttribute("data-attachment-id"),
@@ -304,6 +312,10 @@ export const TiptapImage = Image.extend<ImageOptions>({
el.alt = updatedNode.attrs.alt || "";
}
if (updatedNode.attrs.caption !== currentNode.attrs.caption) {
applyCaption(updatedNode.attrs.caption);
}
const w = updatedNode.attrs.width;
const h = updatedNode.attrs.height;
if (w != null) {
@@ -335,6 +347,28 @@ export const TiptapImage = Image.extend<ImageOptions>({
const dom = nodeView.dom as HTMLElement;
// Re-parent the resizable wrapper into a <figure> so the caption sits BELOW
// the image, OUTSIDE nodeView.wrapper. onCommit measures the img's
// offsetHeight for the persisted height/aspectRatio, and the left/right
// resize handles span the wrapper — both must cover the image only. The
// <figure> stays the single flex child of the container, so applyAlignment
// and the float modes keep working. This path also drives read-only/share.
const figure = document.createElement("figure");
figure.style.margin = "0";
figure.style.display = "inline-block"; // shrink-to-fit to image width
figure.appendChild(nodeView.wrapper);
dom.appendChild(figure);
const figcaption = document.createElement("figcaption");
figcaption.className = "image-caption";
const applyCaption = (text?: string) => {
const value = (text || "").trim();
figcaption.textContent = value;
figcaption.style.display = value ? "block" : "none";
};
applyCaption(node.attrs.caption);
figure.appendChild(figcaption);
// Apply initial alignment
applyAlignment(dom, node.attrs.align || "center");

View File

@@ -1,77 +1,147 @@
import { describe, it, expect } from "vitest";
import { htmlToMarkdown } from "./turndown.utils";
import { markdownToHtml } from "./marked.utils";
/**
* #206 mdrt-2 — Markdown export must never SILENTLY drop a block.
* #206 mdrt-2 — Markdown export must never SILENTLY drop a block. (FIXED)
*
* `htmlToMarkdown` (turndown) only registers rules for a fixed set of custom
* nodes (callout, taskItem, details, math, iframe, htmlEmbed, image, video,
* footnote). Any other custom node — `transclusionReference`, `pageBreak`,
* `mention`, `status` — falls through to turndown's default handling: an empty
* wrapper is "blank" and removed, so the block disappears from the exported
* Markdown with no trace. The invariant "never silently lose a block" is broken.
* `htmlToMarkdown` (turndown) historically only registered rules for a fixed
* set of custom nodes (callout, taskItem, details, math, iframe, htmlEmbed,
* image, video, footnote). Any other custom node — `transclusionReference`,
* `pageBreak`, `mention`, `status` — fell through to turndown's default
* handling: an empty wrapper is "blank" and removed, so the block disappeared
* from the exported Markdown with no trace, and `mention`/`status` collapsed to
* bare text, losing their identity (data-id / data-color). The invariant
* "never silently lose a block" was broken.
*
* The `it.fails` cases assert the DESIRED contract (the block survives export in
* SOME form) and are RED today: they document the unfixed data loss and flip to
* green the moment a turndown rule (real syntax or a lossless HTML-comment
* placeholder) is added. A normal characterization `it` pins the exact current
* lossy output so the regression is unambiguous.
* The fix adds lossless turndown rules that re-emit each of these nodes as raw
* HTML carrying every `data-*` attribute. Plain-Markdown viewers ignore the
* inert tag; the import path round-trips it (`markdownToHtml` passes the raw
* HTML through and each node's `parseHTML` rebuilds the ProseMirror node). These
* tests assert the surviving contract (the block is preserved AND its identity
* round-trips back through import).
*/
describe("htmlToMarkdown — custom nodes without a turndown rule (#206 mdrt-2)", () => {
const wrap = (inner: string) =>
`<p>before</p>${inner}<p>after</p>`;
describe("htmlToMarkdown — custom nodes are preserved losslessly (#206 mdrt-2)", () => {
const wrap = (inner: string) => `<p>before</p>${inner}<p>after</p>`;
it("CURRENTLY drops a pageBreak entirely (data loss)", () => {
it("preserves a pageBreak block on Markdown export", () => {
const md = htmlToMarkdown(
wrap('<div data-type="pageBreak" class="page-break"></div>'),
);
// The page break vanishes: only the two paragraphs remain, nothing between.
expect(md).toContain("before");
expect(md).toContain("after");
expect(md).not.toMatch(/page-?break/i);
expect(md).not.toContain("---"); // not even a horizontal-rule fallback
// The break survives as an inert raw-HTML tag, not silently dropped.
expect(md).toMatch(/data-type="pageBreak"/);
expect(md).toMatch(/page-?break/i);
});
it("CURRENTLY drops a transclusionReference entirely (data loss)", () => {
it("preserves a transclusionReference's identity on Markdown export", () => {
const md = htmlToMarkdown(
wrap('<div data-type="transclusionReference" data-id="abc"></div>'),
);
expect(md).toContain("before");
expect(md).toContain("after");
// The data-id (the only thing that gives the reference identity) is gone.
expect(md).not.toContain("abc");
// The data-id (the only thing that gives the reference identity) survives.
expect(md).toContain("abc");
expect(md).toMatch(/data-type="transclusionReference"/);
});
it.fails(
"should NOT lose a pageBreak block on Markdown export",
() => {
it("preserves a mention's data-id (stable identity) on Markdown export", () => {
const md = htmlToMarkdown(
'<p>hi <span data-type="mention" data-id="u1" data-label="Bob">@Bob</span> there</p>',
);
// The mention keeps its stable identity (data-id), not just the text.
expect(md).toContain("u1");
expect(md).toContain("Bob");
expect(md).toMatch(/data-type="mention"/);
});
it("preserves a status chip's color on Markdown export", () => {
const md = htmlToMarkdown(
'<p>s <span data-type="status" data-color="green">Done</span></p>',
);
// The chip's color (its identity) survives, not just the visible text.
expect(md).toContain("green");
expect(md).toContain("Done");
expect(md).toMatch(/data-type="status"/);
});
// The export form is only lossless if the import path can rebuild it. These
// assert the full MD -> HTML round-trip restores the node + its attributes,
// which is the marker <-> node contract each `parseHTML` relies on.
describe("import round-trip (markdownToHtml restores the node)", () => {
it("round-trips a pageBreak through export + import", async () => {
const md = htmlToMarkdown(
wrap('<div data-type="pageBreak" class="page-break"></div>'),
);
// Desired: the break survives in some form (e.g. a `---` rule or marker).
expect(md).toMatch(/(-{3,}|page-?break)/i);
},
);
const html = await markdownToHtml(md);
expect(html).toMatch(/<div[^>]*data-type="pageBreak"[^>]*>/);
expect(html).toContain("before");
expect(html).toContain("after");
});
it.fails(
"should NOT lose a transclusionReference's identity on Markdown export",
() => {
it("round-trips a transclusionReference (keeps data-id)", async () => {
const md = htmlToMarkdown(
wrap('<div data-type="transclusionReference" data-id="abc"></div>'),
);
// Desired: the referenced id survives so the block can be rebuilt.
expect(md).toContain("abc");
},
);
const html = await markdownToHtml(md);
expect(html).toMatch(/<div[^>]*data-type="transclusionReference"[^>]*>/);
expect(html).toContain("abc");
});
it.fails(
"should NOT lose a mention's data-id on Markdown export",
() => {
it("round-trips a mention (keeps data-id + data-label)", async () => {
const md = htmlToMarkdown(
'<p>hi <span data-type="mention" data-id="u1" data-label="Bob">@Bob</span> there</p>',
);
// Desired: the mention keeps its stable identity (data-id), not just text.
expect(md).toContain("u1");
},
);
const html = await markdownToHtml(md);
expect(html).toMatch(/<span[^>]*data-type="mention"[^>]*>/);
expect(html).toContain("u1");
expect(html).toContain("Bob");
});
it("round-trips a status chip (keeps data-color)", async () => {
const md = htmlToMarkdown(
'<p>s <span data-type="status" data-color="green">Done</span></p>',
);
const html = await markdownToHtml(md);
expect(html).toMatch(/<span[^>]*data-type="status"[^>]*>/);
expect(html).toContain("green");
});
// HTML special chars in an attribute value or in a node's text must be
// ESCAPED when re-emitted as raw HTML, otherwise the exported tag is
// malformed and `markdownToHtml`'s parser cannot restore the original value
// (the same silent data loss this PR fixes). Dropping `<`/`>` escaping is the
// dangerous regression: a stray `<` or `>` corrupts the tag (or injects new
// markup), so the test data carries ALL of `&`, `"`, `<`, `>` in BOTH the
// data-label attribute and the visible text. That fully exercises
// escapeHtmlAttr's `&,",<,>` branches and escapeHtmlText's `&,<,>` branches
// (escapeHtmlText leaves `"` literal); the alphanumeric-only cases above hit
// none of them.
it("escapes HTML special chars (& \" < >) in attrs + text and round-trips them", async () => {
const md = htmlToMarkdown(
`<p>hi <span data-type="mention" data-id="u1" data-label="A &amp; &lt;B&gt; &quot;C&quot;">@A &amp; &lt;B&gt; "C"</span> there</p>`,
);
// (a) The exported Markdown carries a WELL-FORMED, correctly-escaped tag:
// the attribute escapes `&`, `<`, `>` AND `"`; the text escapes `&`, `<`,
// `>` (a `"` inside text content is legal, so it stays literal).
expect(md).toContain('data-label="A &amp; &lt;B&gt; &quot;C&quot;"');
expect(md).toContain('>@A &amp; &lt;B&gt; "C"</span>');
// And explicitly NOT the raw, tag-corrupting forms: a literal `<B>` (would
// mean `<`/`>` escaping was dropped in either the attr or the text)...
expect(md).not.toContain("<B>");
// ...nor the malformed attribute that an unescaped `"` would produce.
expect(md).not.toContain('data-label="A &amp; &lt;B&gt; "C""');
// (b) Import restores the ORIGINAL (unescaped) values, attribute and text.
const html = await markdownToHtml(md);
const dom = new DOMParser().parseFromString(html as string, "text/html");
const span = dom.querySelector('span[data-type="mention"]');
expect(span).not.toBeNull();
expect(span!.getAttribute("data-id")).toBe("u1");
expect(span!.getAttribute("data-label")).toBe('A & <B> "C"');
expect(span!.textContent).toBe('@A & <B> "C"');
});
});
});

View File

@@ -43,6 +43,54 @@ function fillEmptyFootnoteRefs(html: string): string {
);
}
/**
* `pageBreak` and `transclusionReference` are childless atom <div>s. Like an
* empty footnote ref (see above), turndown treats a childless block as "blank"
* and replaces it with the blankRule BEFORE any custom rule can fire — so the
* node disappears from the export with no trace (#206 mdrt-2). Inject a
* zero-width space so the node is non-blank and our lossless rule runs; the
* rule rebuilds the tag from the element's attributes, so the injected char
* never reaches the output.
*/
function fillEmptyAtomBlocks(html: string): string {
return html.replace(
/<div\b([^>]*\bdata-type="(?:pageBreak|transclusionReference)"[^>]*)>\s*<\/div>/gi,
(_m, attrs) => `<div${attrs}>​</div>`,
);
}
/** HTML-escape an attribute value so a re-emitted raw-HTML tag is well-formed. */
function escapeHtmlAttr(value: string): string {
return value
.replace(/&/g, '&amp;')
.replace(/"/g, '&quot;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
}
/** HTML-escape text placed inside a re-emitted raw-HTML element. */
function escapeHtmlText(value: string): string {
return value
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
}
/**
* Serialize ALL of an element's attributes back to a raw-HTML attribute string
* (leading space included). Generic on purpose: a custom node's identity lives
* entirely in its `data-*` attributes (data-id, data-color, data-source-page-id,
* data-transclusion-id, …), and serializing every attribute keeps the export
* lossless regardless of which attributes a given node carries.
*/
function serializeAttrs(node: any): string {
const attrs = node?.attributes;
if (!attrs) return '';
return Array.from(attrs as ArrayLike<{ name: string; value: string }>)
.map((attr) => ` ${attr.name}="${escapeHtmlAttr(attr.value ?? '')}"`)
.join('');
}
export function htmlToMarkdown(html: string): string {
const turndownService = new TurndownService({
headingStyle: 'atx',
@@ -70,12 +118,83 @@ export function htmlToMarkdown(html: string): string {
video,
footnoteReference,
footnotesList,
pageBreak,
transclusionReference,
mention,
status,
]);
return turndownService
.turndown(fillEmptyFootnoteRefs(html))
.turndown(fillEmptyAtomBlocks(fillEmptyFootnoteRefs(html)))
.replaceAll('<br>', ' ');
}
/**
* Lossless export rules for custom nodes that have NO native Markdown syntax
* (#206 mdrt-2). Markdown cannot represent a page break, a transclusion
* reference, a mention's stable id, or a status chip's color — so rather than
* letting turndown silently drop them, each rule re-emits the node as raw HTML
* carrying every `data-*` attribute. Plain-Markdown viewers ignore the inert
* tag, and the import path round-trips it: `markdownToHtml` passes raw HTML
* through and each node's `parseHTML` (`div[data-type="…"]`, `span[…]`) rebuilds
* the ProseMirror node with its attributes intact.
*/
function pageBreak(turndownService: _TurndownService) {
turndownService.addRule('pageBreak', {
filter: function (node: HTMLInputElement) {
return (
node.nodeName === 'DIV' &&
node.getAttribute('data-type') === 'pageBreak'
);
},
replacement: function (_content: string, node: HTMLInputElement) {
return `\n\n<div${serializeAttrs(node)}></div>\n\n`;
},
});
}
function transclusionReference(turndownService: _TurndownService) {
turndownService.addRule('transclusionReference', {
filter: function (node: HTMLInputElement) {
return (
node.nodeName === 'DIV' &&
node.getAttribute('data-type') === 'transclusionReference'
);
},
replacement: function (_content: string, node: HTMLInputElement) {
return `\n\n<div${serializeAttrs(node)}></div>\n\n`;
},
});
}
function mention(turndownService: _TurndownService) {
turndownService.addRule('mention', {
filter: function (node: HTMLInputElement) {
return (
node.nodeName === 'SPAN' &&
node.getAttribute('data-type') === 'mention'
);
},
replacement: function (_content: string, node: HTMLInputElement) {
const text = escapeHtmlText(node.textContent || '');
return `<span${serializeAttrs(node)}>${text}</span>`;
},
});
}
function status(turndownService: _TurndownService) {
turndownService.addRule('status', {
filter: function (node: HTMLInputElement) {
return (
node.nodeName === 'SPAN' && node.getAttribute('data-type') === 'status'
);
},
replacement: function (_content: string, node: HTMLInputElement) {
const text = escapeHtmlText(node.textContent || '');
return `<span${serializeAttrs(node)}>${text}</span>`;
},
});
}
/**
* Serialize the `htmlEmbed` node to Markdown.
*
@@ -282,6 +401,17 @@ function image(turndownService: _TurndownService) {
replacement: function (_content: string, node: HTMLInputElement) {
const src = node.getAttribute('src') || '';
if (!src) return '';
const caption = node.getAttribute('data-caption') || '';
if (caption) {
// ![]() can't carry a caption, so emit a raw <img> wrapped in a block
// <div>. marked passes it through and the image extension's parseHTML
// restores the caption from data-caption.
const parts = [`src="${escapeHtmlAttr(src)}"`];
const alt = node.getAttribute('alt') || '';
if (alt) parts.push(`alt="${escapeHtmlAttr(alt)}"`);
parts.push(`data-caption="${escapeHtmlAttr(caption)}"`);
return `<div><img ${parts.join(' ')}></div>`;
}
const alt = sanitizeMdLinkText(node.getAttribute('alt') || '');
const title = node.getAttribute('title') || '';
const titlePart = title ? ' "' + title.replace(/"/g, '\\"') + '"' : '';

View File

@@ -0,0 +1,133 @@
import { describe, it, expect } from "vitest";
import { schema } from "@tiptap/pm/schema-basic";
import type { Node as PMNode } from "@tiptap/pm/model";
import { Transform } from "@tiptap/pm/transform";
import { recreateTransform } from "./recreateTransform";
/**
* recreateTransform diffs two documents and produces ProseMirror steps that turn
* `fromDoc` into `toDoc`. It is the backbone of collaborative/version diffing, so
* THE invariant that matters is: replaying the produced steps on `fromDoc` must
* reproduce `toDoc` exactly. Every test below re-applies the steps onto a fresh
* Transform seeded from `fromDoc` (not just trusting `tr.doc`) and asserts node
* equality with `.eq()`. If a regression makes any step wrong, the round-trip
* breaks and the test fails.
*/
// Real ProseMirror schema (the standard basic schema) with paragraph/heading +
// strong/em marks — the same primitives the editor diffs in production.
const doc = (...c: PMNode[]) => schema.node("doc", null, c);
const p = (...c: PMNode[]) =>
schema.node("paragraph", null, c.length ? c : undefined);
const h = (level: number, ...c: PMNode[]) =>
schema.node("heading", { level }, c);
const t = (text: string, ...marks: any[]) =>
schema.text(text, marks.length ? marks : undefined);
const strong = schema.marks.strong.create();
const em = schema.marks.em.create();
// Replay the diff's steps onto a fresh Transform built from `fromDoc`. This is
// the faithful "apply(diff) == target" check — it exercises the actual Step
// objects rather than the transform's internal accumulated doc.
function applyDiff(fromDoc: PMNode, toDoc: PMNode, options?: any): PMNode {
const tr = recreateTransform(fromDoc, toDoc, options);
const replay = new Transform(fromDoc);
tr.steps.forEach((s) => {
const result = replay.maybeStep(s);
if (result.failed) throw new Error(`step failed: ${result.failed}`);
});
return replay.doc;
}
describe("recreateTransform round-trip (apply(diff) == target)", () => {
it("reconstructs the target on plain text insertion", () => {
// Inserting " world" must yield exactly the target paragraph.
const from = doc(p(t("hello")));
const to = doc(p(t("hello world")));
expect(applyDiff(from, to).eq(to)).toBe(true);
});
it("reconstructs the target on text deletion", () => {
// Deleting a trailing word is the inverse of insertion and must round-trip.
const from = doc(p(t("hello world")));
const to = doc(p(t("hello")));
expect(applyDiff(from, to).eq(to)).toBe(true);
});
it("reconstructs the target when a word is replaced mid-string", () => {
// A char-level replace in the middle must not corrupt the surrounding text.
const from = doc(p(t("the quick brown fox")));
const to = doc(p(t("the slow brown fox")));
expect(applyDiff(from, to).eq(to)).toBe(true);
});
it("reconstructs the target when a mark is added (complexSteps path)", () => {
// Mark-only changes are diffed in a separate pass; the bolded run must match.
const from = doc(p(t("hello")));
const to = doc(p(t("hello", strong)));
const out = applyDiff(from, to);
expect(out.eq(to)).toBe(true);
// Sanity: the produced doc actually carries the strong mark.
expect(out.firstChild!.firstChild!.marks.length).toBe(1);
});
it("reconstructs the target when a mark is removed", () => {
// Removing the only mark must leave the same text with no marks.
const from = doc(p(t("hello", strong)));
const to = doc(p(t("hello")));
const out = applyDiff(from, to);
expect(out.eq(to)).toBe(true);
expect(out.firstChild!.firstChild!.marks.length).toBe(0);
});
it("reconstructs the target on a paragraph split into two blocks", () => {
// Structural change (one block -> two) must replay as valid replace steps.
const from = doc(p(t("hello world")));
const to = doc(p(t("hello")), p(t("world")));
const out = applyDiff(from, to);
expect(out.eq(to)).toBe(true);
expect(out.childCount).toBe(2);
});
it("reconstructs the target on a node-type change (paragraph -> heading)", () => {
// Type/attrs changes drive the setNodeMarkup branch; the node must become a
// heading while keeping its text.
const from = doc(p(t("hello")));
const to = doc(h(1, t("hello")));
const out = applyDiff(from, to);
expect(out.eq(to)).toBe(true);
expect(out.firstChild!.type.name).toBe("heading");
});
it("reconstructs a combined structural + mark change", () => {
// Several diff kinds at once (new block + italic run) still round-trips.
const from = doc(p(t("alpha")));
const to = doc(p(t("alpha")), p(t("beta", em)));
const out = applyDiff(from, to);
expect(out.eq(to)).toBe(true);
});
it("produces an empty step list for identical documents", () => {
// No diff => no work; spurious steps would mean wasted/incorrect history.
const from = doc(p(t("same")));
const to = doc(p(t("same")));
const tr = recreateTransform(from, to);
expect(tr.steps.length).toBe(0);
expect(tr.doc.eq(to)).toBe(true);
});
it("round-trips with complexSteps:false (marks diffed as replaces)", () => {
// With complexSteps off, mark changes are folded into replace steps rather
// than dedicated mark steps — the result must still equal the target.
const from = doc(p(t("hello")));
const to = doc(p(t("hello", strong)));
expect(applyDiff(from, to, { complexSteps: false }).eq(to)).toBe(true);
});
it("round-trips with wordDiffs:true (whole-word text diffing)", () => {
// wordDiffs changes the granularity of the text diff, not the outcome.
const from = doc(p(t("the quick brown fox")));
const to = doc(p(t("the quick red fox")));
expect(applyDiff(from, to, { wordDiffs: true }).eq(to)).toBe(true);
});
});

View File

@@ -0,0 +1,75 @@
import { describe, it, expect } from "vitest";
import { getSelectionRangeInColumn } from "./get-selection-range-in-column";
import { cell, row, table, doc, trFor } from "./table-test-helpers";
/**
* getSelectionRangeInColumn computes the rectangular column range (the set of
* column indexes, plus anchor/head cell positions) that a drag-reorder or
* column-select operation should act on, accounting for merged (colspan) cells.
* It keys off the table found from the current selection, so we drive it with a
* real EditorState whose selection sits inside the table.
*/
// A 2-row x 3-col grid; each column is identifiable by its top-row letter.
const grid3x2 = () =>
doc(
table(
row(cell("a"), cell("b"), cell("c")),
row(cell("d"), cell("e"), cell("f")),
),
);
describe("getSelectionRangeInColumn", () => {
it("returns a single-column range for a single index", () => {
// Asking for column 1 yields exactly indexes [1].
const tr = trFor(grid3x2());
const range = getSelectionRangeInColumn(tr, 1);
expect(range).toBeTruthy();
expect(range!.indexes).toEqual([1]);
});
it("anchor/head resolve to the top and bottom cells OF the requested column", () => {
// $head must point at the column's first (top) cell and $anchor at its last
// (bottom) cell — pinning that the returned positions belong to column 1,
// not some other column.
const tr = trFor(grid3x2());
const range = getSelectionRangeInColumn(tr, 1)!;
expect(tr.doc.nodeAt(range.$head.pos)?.textContent).toBe("b"); // top of col 1
expect(tr.doc.nodeAt(range.$anchor.pos)?.textContent).toBe("e"); // bottom of col 1
});
it("returns the inclusive span of columns for a multi-column request", () => {
// A 0..2 request must enumerate every covered column, in order.
const tr = trFor(grid3x2());
const range = getSelectionRangeInColumn(tr, 0, 2);
expect(range!.indexes).toEqual([0, 1, 2]);
});
it("returns a two-column span for an adjacent pair", () => {
const tr = trFor(grid3x2());
const range = getSelectionRangeInColumn(tr, 1, 2);
expect(range!.indexes).toEqual([1, 2]);
});
it("expands the range to cover a horizontally merged (colspan) cell", () => {
// Row 0 col 0 spans 2 columns. Requesting just column 0 must pull column 1
// into the range because they are merged together in the top row.
const d = doc(
table(
row(cell("ab", { colspan: 2 }), cell("c")),
row(cell("d"), cell("e"), cell("f")),
),
);
const tr = trFor(d);
const range = getSelectionRangeInColumn(tr, 0);
expect(range!.indexes).toEqual([0, 1]);
});
it("throws when the requested column is entirely out of range", () => {
// No cells exist at column 5 of a 3-wide table, so the function cannot pick
// an anchor cell and dereferences undefined — pin this as the current
// (caller-guarded) contract so a silent behavior change is caught.
const tr = trFor(grid3x2());
expect(() => getSelectionRangeInColumn(tr, 5)).toThrow();
});
});

View File

@@ -0,0 +1,127 @@
import { describe, it, expect } from "vitest";
import { CellSelection } from "@tiptap/pm/tables";
import { moveColumn } from "./move-column";
import {
schema,
cell,
row,
table,
doc,
grid,
stateFor,
} from "./table-test-helpers";
/**
* moveColumn reorders whole columns of a real ProseMirror table by mutating a
* Transaction (transpose -> move row -> transpose back -> replace). The invariant
* is that after the call each column appears at its new position with every
* cell's content preserved and nothing dropped or duplicated.
*/
// 2-row x 3-col table; column k is (rowX-col-k). Columns: 0=(a,d) 1=(b,e) 2=(c,f).
const grid3x2 = () =>
doc(
table(
row(cell("a"), cell("b"), cell("c")),
row(cell("d"), cell("e"), cell("f")),
),
);
describe("moveColumn", () => {
it("moves the first column to the last index, preserving column content", () => {
// origin 0 -> target 2 sends column (a,d) to the right: cols become 1,2,0.
const state = stateFor(grid3x2());
const tr = state.tr;
const ok = moveColumn({
tr,
originIndex: 0,
targetIndex: 2,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(true);
expect(grid(tr)).toEqual([
["b", "c", "a"],
["e", "f", "d"],
]);
});
it("moves a later column to the first index", () => {
// origin 2 -> target 0 pulls column (c,f) to the front: cols become 2,0,1.
const state = stateFor(grid3x2());
const tr = state.tr;
const ok = moveColumn({
tr,
originIndex: 2,
targetIndex: 0,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(true);
expect(grid(tr)).toEqual([
["c", "a", "b"],
["f", "d", "e"],
]);
});
it("never drops or duplicates cells when reordering columns", () => {
const state = stateFor(grid3x2());
const tr = state.tr;
moveColumn({
tr,
originIndex: 1,
targetIndex: 2,
select: false,
pos: state.selection.from,
});
expect(grid(tr).flat().sort()).toEqual(
["a", "b", "c", "d", "e", "f"].sort(),
);
expect(grid(tr)[0].length).toBe(3);
});
it("returns false (no-op) when target equals origin", () => {
const state = stateFor(grid3x2());
const tr = state.tr;
const before = grid(tr);
const ok = moveColumn({
tr,
originIndex: 1,
targetIndex: 1,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(false);
expect(grid(tr)).toEqual(before);
});
it("returns false when pos is not inside a table", () => {
const d = doc(
schema.nodes.paragraph.createChecked(null, schema.text("plain")),
);
const state = stateFor(d);
const tr = state.tr;
const ok = moveColumn({
tr,
originIndex: 0,
targetIndex: 1,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(false);
});
it("installs a CellSelection on the moved column when select is true", () => {
const state = stateFor(grid3x2());
const tr = state.tr;
const ok = moveColumn({
tr,
originIndex: 0,
targetIndex: 2,
select: true,
pos: state.selection.from,
});
expect(ok).toBe(true);
expect(tr.selection instanceof CellSelection).toBe(true);
});
});

View File

@@ -0,0 +1,136 @@
import { describe, it, expect } from "vitest";
import { CellSelection } from "@tiptap/pm/tables";
import { moveRow } from "./move-row";
import {
schema,
cell,
row,
table,
doc,
grid,
stateFor,
} from "./table-test-helpers";
/**
* moveRow reorders whole rows of a real ProseMirror table by mutating a
* Transaction: it locates the table, computes origin/target row ranges, rebuilds
* the table with rows reordered, and replaces it in the doc. The invariant is
* that after the call the table's rows appear in the new order with every cell's
* content preserved, and no rows are dropped or duplicated.
*/
// 3-row x 2-col table; each row identifiable by its cells.
const grid2x3 = () =>
doc(
table(
row(cell("r0a"), cell("r0b")),
row(cell("r1a"), cell("r1b")),
row(cell("r2a"), cell("r2b")),
),
);
describe("moveRow", () => {
it("moves the first row down to the last index, preserving content", () => {
// origin 0 -> target 2 makes row 0 land after the other rows: [r1, r2, r0].
const state = stateFor(grid2x3());
const tr = state.tr;
const ok = moveRow({
tr,
originIndex: 0,
targetIndex: 2,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(true);
expect(grid(tr)).toEqual([
["r1a", "r1b"],
["r2a", "r2b"],
["r0a", "r0b"],
]);
});
it("moves a lower row up to an earlier index", () => {
// origin 2 -> target 0 lifts the last row above the rest: [r2, r0, r1].
const state = stateFor(grid2x3());
const tr = state.tr;
const ok = moveRow({
tr,
originIndex: 2,
targetIndex: 0,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(true);
expect(grid(tr)).toEqual([
["r2a", "r2b"],
["r0a", "r0b"],
["r1a", "r1b"],
]);
});
it("never drops or duplicates rows when reordering", () => {
// The full multiset of cell texts is invariant under any valid move.
const state = stateFor(grid2x3());
const tr = state.tr;
moveRow({
tr,
originIndex: 1,
targetIndex: 2,
select: false,
pos: state.selection.from,
});
const flat = grid(tr).flat().sort();
expect(flat).toEqual(
["r0a", "r0b", "r1a", "r1b", "r2a", "r2b"].sort(),
);
expect(grid(tr).length).toBe(3);
});
it("returns false (no-op) when target equals origin", () => {
// Moving a row onto itself is rejected and leaves the table unchanged.
const state = stateFor(grid2x3());
const tr = state.tr;
const before = grid(tr);
const ok = moveRow({
tr,
originIndex: 1,
targetIndex: 1,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(false);
expect(grid(tr)).toEqual(before);
});
it("returns false when pos is not inside a table", () => {
// Without a table at `pos`, the function bails out instead of throwing.
const d = doc(
schema.nodes.paragraph.createChecked(null, schema.text("plain")),
);
const state = stateFor(d);
const tr = state.tr;
const ok = moveRow({
tr,
originIndex: 0,
targetIndex: 1,
select: false,
pos: state.selection.from,
});
expect(ok).toBe(false);
});
it("installs a CellSelection on the moved row when select is true", () => {
// With select:true the moved row at the target index is selected.
const state = stateFor(grid2x3());
const tr = state.tr;
const ok = moveRow({
tr,
originIndex: 0,
targetIndex: 2,
select: true,
pos: state.selection.from,
});
expect(ok).toBe(true);
expect(tr.selection instanceof CellSelection).toBe(true);
});
});

View File

@@ -0,0 +1,60 @@
import { Schema } from "@tiptap/pm/model";
import type { Node as PMNode } from "@tiptap/pm/model";
import { tableNodes } from "@tiptap/pm/tables";
import { EditorState, Selection } from "@tiptap/pm/state";
import { findTable } from "./query";
import { convertTableNodeToArrayOfRows } from "./convert-table-node-to-array-of-rows";
/**
* Shared test fixtures for the table utility tests. Several test files exercise
* the row/column move and selection helpers against a real ProseMirror table
* schema (the same primitives the editor uses) so TableMap / cellsInRect behave
* exactly as in production. Keeping the schema and node builders in one place
* means a schema change (e.g. cellAttributes) is applied once instead of being
* copied across every test file.
*
* This is a test-only helper (not shipped). Its name does not match vitest's
* `*.{test,spec}.ts` include glob, so it is not collected as a spec file —
* adding test cases here would NOT make them run; put them in a `*.test.ts`.
*/
const tNodes = tableNodes({
tableGroup: "block",
cellContent: "inline*",
cellAttributes: {},
});
export const schema = new Schema({
nodes: {
doc: { content: "block+" },
paragraph: { group: "block", content: "inline*", toDOM: () => ["p", 0] },
text: { group: "inline" },
...tNodes,
},
marks: {},
});
export const cell = (txt: string, attrs?: Record<string, unknown>): PMNode =>
schema.nodes.table_cell.createChecked(attrs ?? null, schema.text(txt));
export const row = (...cells: PMNode[]): PMNode =>
schema.nodes.table_row.createChecked(null, cells);
export const table = (...rows: PMNode[]): PMNode =>
schema.nodes.table.createChecked(null, rows);
export const doc = (...content: PMNode[]): PMNode =>
schema.nodes.doc.createChecked(null, content);
// Read the table's content as a grid of cell texts (rows x cols) from whatever
// table currently lives in `tr.doc`.
export const grid = (tr: any): string[][] => {
const t = findTable(tr.doc.resolve(tr.selection.from))!;
return convertTableNodeToArrayOfRows(t.node).map((r) =>
r.map((c) => (c ? c.textContent : "")),
);
};
export const stateFor = (d: PMNode) =>
EditorState.create({ doc: d, selection: Selection.atStart(d) });
// Build a transaction whose selection is inside the doc (helpers locate the
// table via `tr.selection.$from`).
export const trFor = (d: PMNode) => stateFor(d).tr;

View File

@@ -1,11 +1,11 @@
import { describe, it, expect } from "vitest";
import { Schema } from "@tiptap/pm/model";
import type { Node as PMNode } from "@tiptap/pm/model";
import { tableNodes, TableMap } from "@tiptap/pm/tables";
import { TableMap } from "@tiptap/pm/tables";
import { transpose } from "./transpose";
import { moveRowInArrayOfRows } from "./move-row-in-array-of-rows";
import { convertTableNodeToArrayOfRows } from "./convert-table-node-to-array-of-rows";
import { convertArrayOfRowsToTableNode } from "./convert-array-of-rows-to-table-node";
import { cell, row, table } from "./table-test-helpers";
/**
* Unit tests for the pure table data-transformation utilities. These functions
@@ -14,30 +14,6 @@ import { convertArrayOfRowsToTableNode } from "./convert-array-of-rows-to-table-
* ProseMirror table schema (the same primitives the editor uses).
*/
// Minimal schema containing real ProseMirror table nodes so TableMap behaves
// exactly as it does in the editor (merged cells, colspan, etc.).
const tNodes = tableNodes({
tableGroup: "block",
cellContent: "inline*",
cellAttributes: {},
});
const schema = new Schema({
nodes: {
doc: { content: "block+" },
paragraph: { group: "block", content: "inline*", toDOM: () => ["p", 0] },
text: { group: "inline" },
...tNodes,
},
marks: {},
});
const cell = (txt: string, attrs?: Record<string, unknown>): PMNode =>
schema.nodes.table_cell.createChecked(attrs ?? null, schema.text(txt));
const row = (...cells: PMNode[]): PMNode =>
schema.nodes.table_row.createChecked(null, cells);
const table = (...rows: PMNode[]): PMNode =>
schema.nodes.table.createChecked(null, rows);
// Read the text content of each (non-null) cell so we can compare structure
// without depending on ProseMirror node identity.
const textGrid = (rows: (PMNode | null)[][]): (string | null)[][] =>

View File

@@ -100,4 +100,51 @@ describe("addUniqueIdsToDoc", () => {
const [id] = ids(out);
expect(id).toBeTruthy();
});
it("only assigns ids to configured node types, not to others", () => {
// `types` is ["heading","paragraph"]; a codeBlock is NOT addressed, so it
// must come back without an id while the sibling paragraph is filled. (The
// UniqueID attribute only exists on configured types in the schema.)
const doc = {
type: "doc",
content: [
{ type: "codeBlock", content: [{ type: "text", text: "x = 1" }] },
para(undefined, "after"),
],
};
const out = addUniqueIdsToDoc(doc, extensions);
const [codeId, paraId] = ids(out);
expect(codeId).toBeUndefined();
expect(paraId).toBeTruthy();
});
it("assigns ids to target nodes nested inside non-target containers", () => {
// findChildren walks the whole tree: a paragraph inside a blockquote still
// gets an id, while the (non-target) blockquote wrapper does not.
const doc = {
type: "doc",
content: [
{ type: "blockquote", content: [para(undefined, "quoted")] },
],
};
const out = addUniqueIdsToDoc(doc, extensions) as any;
const blockquote = out.content[0];
const nestedPara = blockquote.content[0];
expect(blockquote.attrs?.id).toBeUndefined();
expect(nestedPara.attrs.id).toBeTruthy();
});
it("is idempotent: a second pass keeps every already-unique id unchanged", () => {
// Once ids are assigned and unique, re-running must be a fixed point — no
// churn that would invalidate stored MCP anchors on every save.
const doc = {
type: "doc",
content: [para(undefined, "a"), para(undefined, "b"), para(undefined, "c")],
};
const once = addUniqueIdsToDoc(doc, extensions);
const twice = addUniqueIdsToDoc(once, extensions);
expect(ids(twice)).toEqual(ids(once));
// And all three are distinct, so the second pass had real ids to preserve.
expect(new Set(ids(once)).size).toBe(3);
});
});

View File

@@ -27,6 +27,7 @@
"dist",
"src/**/*.spec.ts",
"src/**/*.test.ts",
"src/lib/footnote/footnote-corpus.ts"
"src/lib/footnote/footnote-corpus.ts",
"src/lib/table/utils/table-test-helpers.ts"
]
}

View File

@@ -16,7 +16,7 @@ license.
> that interface. Other Docmost MCPs are human-shaped — they expose "open the page" and
> "replace the page"; this one exposes the editing primitives a model is good at.
It exposes **40 tools** built around three ideas that the other Docmost MCPs do not
It exposes **41 tools** built around three ideas that the other Docmost MCPs do not
combine:
1. **Surgical, token-cheap edits.** Address a single block by id and patch it, or run
@@ -106,7 +106,7 @@ There are several Docmost MCPs. Here is a capability-by-capability comparison.
## Tools
All 40 tools, grouped by what you'd reach for them.
All 41 tools, grouped by what you'd reach for them.
### Exploration & retrieval
@@ -219,6 +219,8 @@ All 40 tools, grouped by what you'd reach for them.
- **`list_comments`** — List a page's comments (content returned as Markdown).
- **`update_comment`** — Edit an existing comment.
- **`delete_comment`** — Delete a comment.
- **`resolve_comment`** — Resolve (close) or reopen a comment thread (reversible). Only top-level
comments can be resolved; the thread and its replies are kept, unlike `delete_comment`.
- **`check_new_comments`** — Find comments created after a given ISO-8601 timestamp across
a space, optionally scoped to a page subtree — ideal for an agent that watches a doc for
feedback.
@@ -262,7 +264,7 @@ so capable clients steer the model automatically.
- **Reads**: `get_page` (Markdown) / `get_page_json` (lossless ProseMirror with ids).
- **Review changes**: `list_page_history` → `diff_page_versions` → `restore_page_version`.
- **Comments**: `create_comment` (with optional inline anchoring) / `list_comments` /
`update_comment` / `delete_comment` / `check_new_comments`.
`update_comment` / `resolve_comment` / `delete_comment` / `check_new_comments`.
- **Navigate a page cheaply** (find a section/table, grab a block id): `get_outline` →
`get_node`.
- **Tables** (add/remove a row, set a cell): `table_get` / `table_insert_row` /

View File

@@ -17,7 +17,7 @@
> «открыть страницу» и «заменить страницу»; этот даёт примитивы редактирования, в которых
> модель сильна.
Сервер предоставляет **40 инструментов**, построенных вокруг трёх идей, которые другие
Сервер предоставляет **41 инструмент**, построенный вокруг трёх идей, которые другие
Docmost-MCP не сочетают:
1. **Точечные, экономичные по токенам правки.** Адресуйте отдельный блок по id и патчите
@@ -109,7 +109,7 @@ Docmost-MCP не сочетают:
## Инструменты
Все 40 инструментов, сгруппированы по задачам, для которых вы их возьмёте.
Все 41 инструмент, сгруппированы по задачам, для которых вы их возьмёте.
### Чтение и поиск
@@ -226,6 +226,8 @@ Docmost-MCP не сочетают:
- **`list_comments`** — Список комментариев страницы (контент возвращается как Markdown).
- **`update_comment`** — Изменить существующий комментарий.
- **`delete_comment`** — Удалить комментарий.
- **`resolve_comment`** — Закрыть (resolve) или переоткрыть тред комментария (обратимо). Resolve
доступен только для корневых комментариев; тред и ответы сохраняются, в отличие от `delete_comment`.
- **`check_new_comments`** — Найти комментарии, созданные после заданной метки времени
ISO-8601, по пространству, опционально в рамках поддерева страниц — идеально для агента,
который следит за обратной связью в документе.
@@ -271,7 +273,7 @@ Docmost-MCP не сочетают:
- **Просмотр изменений**: `list_page_history` → `diff_page_versions` →
`restore_page_version`.
- **Комментарии**: `create_comment` (с опциональной inline-привязкой) / `list_comments` /
`update_comment` / `delete_comment` / `check_new_comments`.
`update_comment` / `resolve_comment` / `delete_comment` / `check_new_comments`.
- **Дешёвая навигация по странице** (найти раздел/таблицу, получить id блока): `get_outline`
→ `get_node`.
- **Таблицы** (добавить/удалить строку, задать ячейку): `table_get` / `table_insert_row` /

View File

@@ -37,6 +37,15 @@ const MIME_TO_EXT = {
"image/webp": ".webp",
"image/svg+xml": ".svg",
};
// Canonical UUID shape (versions 1–8, matching the `uuid` package's `validate`
// that the server's isValidUUID uses). page.repo.ts treats any non-UUID pageId
// as a slugId, so the MCP detects a UUID locally and skips a /pages/info
// round-trip in resolvePageId. A 10-char nanoid slugId never contains dashes,
// so it can never be misread as a UUID here.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
function isUuid(value) {
return typeof value === "string" && UUID_RE.test(value);
}
export class DocmostClient {
client;
token = null;
@@ -64,6 +73,11 @@ export class DocmostClient {
// can all call login() at once. Memoizing a single promise collapses that
// thundering herd into ONE /auth/login request that everyone awaits.
loginPromise = null;
// Canonical-UUID cache for resolvePageId: maps an agent-supplied slugId to the
// page's canonical UUID, so repeated collab edits on the same page do not
// re-fetch /pages/info. A UUID input short-circuits before this cache (see
// resolvePageId), so only slugId->uuid entries are stored/read here.
pageIdCache = new Map();
constructor(configOrBaseURL, email, password) {
// Normalize the legacy positional form into the object union.
const config = typeof configOrBaseURL === "string"
@@ -572,6 +586,35 @@ export class DocmostClient {
const response = await this.client.post("/pages/info", { pageId });
return response.data?.data ?? response.data;
}
/**
* Resolve an agent-supplied pageId to the page's CANONICAL UUID (`page.id`),
* so every collaboration document the MCP opens is named `page.<uuid>` — the
* SAME name the web editor always uses (`page.${page.id}`).
*
* The agent commonly passes a 10-char public slugId (from URLs/listings) as
* the pageId. The web editor opens the collab doc by UUID, but the MCP used to
* pass that slugId straight into the collab doc name (`page.<slugId>`). For one
* DB row that produced TWO independent Yjs documents whose debounced stores
* clobbered each other — the agent's edit was silently lost (#260).
*
* A UUID input short-circuits with no network round-trip. A slugId is resolved
* once via getPageRaw and cached (both slugId->uuid and uuid->uuid), so
* repeated edits on the same page add no extra request.
*/
async resolvePageId(pageId) {
if (isUuid(pageId))
return pageId;
const cached = this.pageIdCache.get(pageId);
if (cached)
return cached;
const data = await this.getPageRaw(pageId);
const uuid = data?.id;
if (typeof uuid !== "string" || !uuid) {
throw new Error(`Could not resolve a canonical page id for "${pageId}"`);
}
this.pageIdCache.set(pageId, uuid);
return uuid;
}
async getPage(pageId) {
await this.ensureAuthenticated();
const resultData = await this.getPageRaw(pageId);
@@ -863,10 +906,12 @@ export class DocmostClient {
async tableInsertRow(pageId, tableRef, cells, index) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track insertion in an outer var, reset per-transform, so a collab retry
// recomputes it cleanly (mirrors insertNode's pattern).
let inserted = false;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
inserted = false;
const { doc: nd, inserted: ins } = insertTableRow(liveDoc, tableRef, cells, index);
inserted = ins;
@@ -892,8 +937,10 @@ export class DocmostClient {
async tableDeleteRow(pageId, tableRef, index) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
let deleted = false;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
deleted = false;
const { doc: nd, deleted: del } = deleteTableRow(liveDoc, tableRef, index);
deleted = del;
@@ -921,8 +968,10 @@ export class DocmostClient {
async tableUpdateCell(pageId, tableRef, row, col, text) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
let updated = false;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
updated = false;
const { doc: nd, updated: upd } = updateTableCell(liveDoc, tableRef, row, col, text);
updated = upd;
@@ -1034,6 +1083,10 @@ export class DocmostClient {
*/
async updatePage(pageId, content, title) {
await this.ensureAuthenticated();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// REST /pages/update title write below keeps the agent-supplied id (the
// server resolves a slugId there).
const pageUuid = await this.resolvePageId(pageId);
// Write the BODY first, then the title (#159 split-brain). If the collab
// body write fails (e.g. a persist timeout), the title must be left
// UNTOUCHED so the page never ends up with a new title over its old body.
@@ -1043,7 +1096,7 @@ export class DocmostClient {
let mutation;
try {
collabToken = await this.getCollabTokenWithReauth();
mutation = await updatePageContentRealtime(pageId, content, collabToken, this.apiUrl);
mutation = await updatePageContentRealtime(pageUuid, content, collabToken, this.apiUrl);
}
catch (error) {
// Verbose diagnostics (incl. anything that could expose a token prefix)
@@ -1259,7 +1312,9 @@ export class DocmostClient {
// Write the BODY first, then the title (#159 split-brain): a failed body
// write (e.g. persist timeout) must not leave a new title over the old body.
const collabToken = await this.getCollabTokenWithReauth();
const mutation = await this.replacePage(pageId, doc, collabToken, this.apiUrl);
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
const mutation = await this.replacePage(pageUuid, doc, collabToken, this.apiUrl);
// Body persisted successfully — now it is safe to set the title.
if (title) {
await this.client.post("/pages/update", { pageId, title });
@@ -1294,8 +1349,10 @@ export class DocmostClient {
throw new Error("insert_footnote: text is required");
}
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
let result = null;
const mutation = await this.mutatePage(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await this.mutatePage(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
const r = insertInlineFootnote(liveDoc, { anchorText, text });
if (!r.inserted) {
// Abort the page-locked write by throwing: mutatePageContent does not
@@ -1383,7 +1440,9 @@ export class DocmostClient {
// PAGE import: canonicalize footnotes (see markdownToProseMirrorCanonical).
const doc = await markdownToProseMirrorCanonical(body);
const collabToken = await this.getCollabTokenWithReauth();
const mutation = await replacePageContent(pageId, doc, collabToken, this.apiUrl);
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
const mutation = await replacePageContent(pageUuid, doc, collabToken, this.apiUrl);
// Collect distinct comment ids that actually became comment marks in the doc.
const collectCommentIds = (node, acc) => {
if (!node || typeof node !== "object")
@@ -1467,7 +1526,9 @@ export class DocmostClient {
// to the target (parity with the other full-doc write paths).
const canonical = canonicalizeFootnotes(content);
const collabToken = await this.getCollabTokenWithReauth();
const mutation = await this.replacePage(targetPageId, canonical, collabToken, this.apiUrl);
// Open the TARGET collab doc by its canonical UUID, never the slugId (#260).
const targetUuid = await this.resolvePageId(targetPageId);
const mutation = await this.replacePage(targetUuid, canonical, collabToken, this.apiUrl);
return {
success: true,
sourcePageId,
@@ -1483,6 +1544,8 @@ export class DocmostClient {
async editPageText(pageId, edits) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Apply the edits against the LIVE synced document, not the debounced REST
// snapshot, so concurrent human edits/comments are preserved. applyTextEdits
// records per-edit match problems in `failed` instead of throwing, and
@@ -1495,7 +1558,7 @@ export class DocmostClient {
// we must NOT write (no spurious history version) and must not claim a write
// happened.
let wrote = false;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
wrote = false;
const r = applyTextEdits(liveDoc, edits);
results = r.results;
@@ -1580,10 +1643,12 @@ export class DocmostClient {
target.attrs.id = nodeId;
}
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track the replacement count in an outer var, reset per-transform, so a
// collab retry recomputes it cleanly (mirrors replaceImage's pattern).
let replaced = 0;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
replaced = 0;
const { doc: nd, replaced: r } = replaceNodeById(liveDoc, nodeId, target);
replaced = r;
@@ -1636,10 +1701,12 @@ export class DocmostClient {
}
}
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track insertion in an outer var, reset per-transform, so a collab retry
// recomputes it cleanly (mirrors replaceImage's pattern).
let inserted = false;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
inserted = false;
const { doc: nd, inserted: ins } = insertNodeRelative(liveDoc, node, opts);
inserted = ins;
@@ -1675,10 +1742,12 @@ export class DocmostClient {
async deleteNode(pageId, nodeId) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track the deletion count in an outer var, reset per-transform, so a
// collab retry recomputes it cleanly (mirrors replaceImage's pattern).
let deleted = 0;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
deleted = 0;
const { doc: nd, deleted: d } = deleteNodeById(liveDoc, nodeId);
deleted = d;
@@ -1921,7 +1990,10 @@ export class DocmostClient {
let anchored = false;
try {
const collabToken = await this.getCollabTokenWithReauth();
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// /comments/create REST call above keeps the agent-supplied id.
const pageUuid = await this.resolvePageId(pageId);
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
const doc = liveDoc && liveDoc.type === "doc"
? liveDoc
: { type: "doc", content: [] };
@@ -2324,6 +2396,9 @@ export class DocmostClient {
if (opts.alt)
node.attrs.alt = opts.alt;
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// uploadImage /files/upload call above keeps the agent-supplied id.
const pageUuid = await this.resolvePageId(pageId);
// Recursively collect the plain text of a top-level block.
const blockText = (n) => {
let out = "";
@@ -2337,7 +2412,7 @@ export class DocmostClient {
// concurrent edits/comments/images are preserved and parallel insert_image
// calls (serialized by the per-page lock) each see the previous insertion.
let placement;
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, (liveDoc) => {
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, (liveDoc) => {
const doc = liveDoc && liveDoc.type === "doc"
? liveDoc
: { type: "doc", content: [] };
@@ -2424,6 +2499,13 @@ export class DocmostClient {
*/
async replaceImage(pageId, oldAttachmentId, url, opts = {}) {
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// page lock must ALSO key on the UUID so this operation serializes against
// other writes to the same page (mutatePageContent now locks by the resolved
// UUID too); locking by the raw slugId here would desync the mutex key and
// reopen the TOCTOU/orphan-attachment window the lock closes. uploadImage
// keeps the agent-supplied id (it hits REST, not the collab doc).
const pageUuid = await this.resolvePageId(pageId);
// Hold ONE per-page lock for the WHOLE operation (scan -> upload -> write).
// Previously the scan and the write were two separate mutatePageContent
// calls, each acquiring + releasing the lock, with the upload happening in
@@ -2435,7 +2517,7 @@ export class DocmostClient {
// reentrant, so the self-locking mutatePageContent would deadlock here)
// closes that TOCTOU window. uploadImage hits /files/upload over plain HTTP
// and does not touch the page lock, so it is safe to call while held.
return withPageLock(pageId, async () => {
return withPageLock(pageUuid, async () => {
// STEP 1: read-only live check. Scan the live document for any image node
// matching oldAttachmentId BEFORE uploading anything, so a wrong/stale id
// throws without ever creating an orphan attachment.
@@ -2453,7 +2535,7 @@ export class DocmostClient {
scan(node.content);
}
};
await this.mutateLiveContentUnlocked(pageId, collabToken, (liveDoc) => {
await this.mutateLiveContentUnlocked(pageUuid, collabToken, (liveDoc) => {
matchFound = false; // reset per-transform (collab may retry the read).
const doc = liveDoc && liveDoc.type === "doc"
? liveDoc
@@ -2501,7 +2583,7 @@ export class DocmostClient {
walk(node.content);
}
};
const mutation = await this.mutateLiveContentUnlocked(pageId, collabToken, (liveDoc) => {
const mutation = await this.mutateLiveContentUnlocked(pageUuid, collabToken, (liveDoc) => {
// Reset per-transform so collab retries recompute cleanly (no double-count).
replaced = 0;
const doc = liveDoc && liveDoc.type === "doc"
@@ -2598,7 +2680,10 @@ export class DocmostClient {
// JSON write path) before writing it back.
this.validateDocUrls(version.content);
const collabToken = await this.getCollabTokenWithReauth();
const mutation = await mutatePageContent(version.pageId, collabToken, this.apiUrl, () => version.content);
// version.pageId is the page entity id (already a UUID); resolvePageId
// short-circuits a UUID with no round-trip, so this is defensive only (#260).
const pageUuid = await this.resolvePageId(version.pageId);
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, () => version.content);
return {
pageId: version.pageId,
restoredFrom: historyId,
@@ -2767,7 +2852,9 @@ export class DocmostClient {
}
// Apply atomically against the live doc.
const collabToken = await this.getCollabTokenWithReauth();
const mutation = await mutatePageContent(pageId, collabToken, this.apiUrl, runTransform);
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
const mutation = await mutatePageContent(pageUuid, collabToken, this.apiUrl, runTransform);
// Optionally delete consumed comments (best-effort; a delete failure must
// not undo the successful write).
const deletedComments = [];

View File

@@ -27,7 +27,7 @@ const VERSION = packageJson.version;
// --- Modern McpServer Implementation ---
// Editing guide surfaced to MCP clients in the initialize result so they can
// pick the right tool by intent and avoid resending whole documents.
const SERVER_INSTRUCTIONS = "Docmost editing guide — choose the tool by intent: fix wording/typos/numbers (text inside blocks) -> edit_page_text (no node id needed). Change ONE block (paragraph/heading/callout/table cell/etc.) structurally -> patch_node (address by attrs.id from get_page_json). Add a block -> insert_node (before/after a block by attrs.id or by anchor text, or append). Remove a block -> delete_node (by attrs.id). Images -> insert_image (add an image from a web URL) / replace_image (swap an existing image for one from a web URL). New page -> create_page (Markdown). Bulk/structural rewrite or nodes without an id -> update_page_json (full ProseMirror replace; prefer the granular tools above to avoid resending the whole ~100KB+ document). Copy/replace a page's whole content from another page (server-side, no document through the model) -> copy_page_content. Rename a page (title only) -> rename_page. Read -> get_page (Markdown, lossy) or get_page_json (lossless ProseMirror with block ids). Comments -> create_comment (always inline; requires an EXACT selection — the contiguous text to anchor/highlight on; fails rather than leaving an unanchored comment), list_comments, update_comment, delete_comment, check_new_comments. Tip: read block ids via get_page_json, then use patch_node/insert_node/delete_node so you never resend the full document. " +
const SERVER_INSTRUCTIONS = "Docmost editing guide — choose the tool by intent: fix wording/typos/numbers (text inside blocks) -> edit_page_text (no node id needed). Change ONE block (paragraph/heading/callout/table cell/etc.) structurally -> patch_node (address by attrs.id from get_page_json). Add a block -> insert_node (before/after a block by attrs.id or by anchor text, or append). Remove a block -> delete_node (by attrs.id). Images -> insert_image (add an image from a web URL) / replace_image (swap an existing image for one from a web URL). New page -> create_page (Markdown). Bulk/structural rewrite or nodes without an id -> update_page_json (full ProseMirror replace; prefer the granular tools above to avoid resending the whole ~100KB+ document). Copy/replace a page's whole content from another page (server-side, no document through the model) -> copy_page_content. Rename a page (title only) -> rename_page. Read -> get_page (Markdown, lossy) or get_page_json (lossless ProseMirror with block ids). Comments -> create_comment (always inline; requires an EXACT selection — the contiguous text to anchor/highlight on; fails rather than leaving an unanchored comment), list_comments, update_comment, resolve_comment (resolve/reopen a thread, reversible — prefer over delete to close), delete_comment, check_new_comments. Tip: read block ids via get_page_json, then use patch_node/insert_node/delete_node so you never resend the full document. " +
"Complex/scripted rewrite (multiple coordinated edits, footnotes, renumbering) -> docmost_transform: write a JS `(doc, ctx) => doc` transform, preview the diff with dryRun (default), then apply with dryRun:false; ctx.helpers includes commentsToFootnotes for turning inline comments into numbered footnotes. " +
"Review what changed -> diff_page_versions (compare a historyId to current, or two history versions). See a page's saved versions -> list_page_history. Undo a bad edit -> restore_page_version (writes a past version back as current; itself revertible). " +
"Lossless markdown round-trip (download, edit, re-upload, incl. comment anchors) -> export_page_markdown / import_page_markdown.";
@@ -603,6 +603,27 @@ export function createDocmostMcpServer(config) {
],
};
});
// Tool: resolve_comment
server.registerTool("resolve_comment", {
description: "Resolve (close) or reopen a comment thread. Only top-level comments can " +
"be resolved — the server rejects resolving a reply. Reversible: pass " +
"resolved=false to reopen. Resolving keeps the thread and its replies " +
"(unlike delete_comment, which permanently removes them).",
inputSchema: {
commentId: z
.string()
.min(1)
.describe("ID of the top-level comment thread to resolve or reopen"),
resolved: z
.boolean()
.optional()
.default(true)
.describe("true (default) marks the thread resolved/closed; false reopens it"),
},
}, async ({ commentId, resolved }) => {
const result = await docmostClient.resolveComment(commentId, resolved);
return jsonContent(result);
});
// Tool: check_new_comments
server.registerTool("check_new_comments", {
description: "Check for new comments across pages in a space since a given timestamp. " +

View File

@@ -29,6 +29,41 @@ export async function getCollabToken(baseUrl, apiToken) {
throw error;
}
}
/**
* Pure cookie-parsing helper extracted from `performLogin` so the parsing logic
* can be unit-tested without performing the login network request. Given the
* raw `Set-Cookie` header array from the login response, return the `authToken`
* cookie's value.
*
* Behavior (kept identical to the original inline logic):
* - throws if there is no Set-Cookie header at all;
* - matches the cookie NAME exactly (`authToken`), so a future
* `authTokenRefresh=...` cookie is NOT picked up (a `startsWith` would be);
* - returns everything after the FIRST `=` up to the first `;`, so a base64
* value containing `=` padding is preserved (a naive `split("=")` would
* truncate it);
* - cookie attributes after the first `;` (Path, HttpOnly, Expires, …) are
* ignored;
* - throws if no `authToken` cookie is present.
*/
export function extractAuthTokenFromSetCookie(cookies) {
if (!cookies) {
throw new Error("No Set-Cookie header found in login response");
}
// Match the cookie name exactly to avoid matching a future
// authTokenRefresh cookie (startsWith would catch it).
const authCookie = cookies.find((c) => {
const kv = c.split(";")[0];
return kv.slice(0, kv.indexOf("=")) === "authToken";
});
if (!authCookie) {
throw new Error("No authToken cookie found in login response");
}
// Take everything after the FIRST "=" up to the first ";".
// Splitting on "=" would truncate base64 values containing "=" padding.
const kv = authCookie.split(";")[0];
return kv.slice(kv.indexOf("=") + 1);
}
export async function performLogin(baseUrl, email, password) {
try {
const response = await axios.post(`${baseUrl}/auth/login`, {
@@ -36,24 +71,7 @@ export async function performLogin(baseUrl, email, password) {
password,
});
// Extract token from Set-Cookie header
const cookies = response.headers["set-cookie"];
if (!cookies) {
throw new Error("No Set-Cookie header found in login response");
}
// Match the cookie name exactly to avoid matching a future
// authTokenRefresh cookie (startsWith would catch it).
const authCookie = cookies.find((c) => {
const kv = c.split(";")[0];
return kv.slice(0, kv.indexOf("=")) === "authToken";
});
if (!authCookie) {
throw new Error("No authToken cookie found in login response");
}
// Take everything after the FIRST "=" up to the first ";".
// Splitting on "=" would truncate base64 values containing "=" padding.
const kv = authCookie.split(";")[0];
const token = kv.slice(kv.indexOf("=") + 1);
return token;
return extractAuthTokenFromSetCookie(response.headers["set-cookie"]);
}
catch (error) {
// Avoid leaking the full server response body by default; log only the

View File

@@ -1089,7 +1089,24 @@ export const docmostExtensions = [
heading: {},
link: { openOnClick: false },
}),
Image.configure({ inline: false }),
// Stock @tiptap/extension-image has no caption attribute, so a round-trip
// through this schema would drop the data-caption the client TiptapImage
// emits. Mirror editor-ext image.ts: add a caption attribute that parses
// data-caption and re-renders it only when set (caption-less images stay
// clean), keeping the MCP markdown round-trip lossless.
Image.extend({
addAttributes() {
const parent = this.parent?.() ?? {};
return {
...parent,
caption: {
default: undefined,
parseHTML: (el) => el.getAttribute("data-caption") || undefined,
renderHTML: (attrs) => attrs.caption ? { "data-caption": attrs.caption } : {},
},
};
},
}).configure({ inline: false }),
TaskList,
TaskItem.configure({ nested: true }),
// Highlight stores its color unescaped and Docmost interpolates it into

View File

@@ -213,16 +213,27 @@ export function convertProseMirrorToMarkdown(content) {
// Two trailing spaces before the newline encode a markdown hard break;
// a bare "\n" would be reimported as a soft break and lost.
return " \n";
case "image":
case "image": {
const imgAlt = node.attrs?.alt || "";
const imgCaption = node.attrs?.caption || "";
if (imgCaption) {
// ![]() can't carry a caption, so (symmetric to video) emit a raw
// <img> wrapped in a block <div>. On import marked.parse keeps the raw
// HTML and generateJSON runs the image extension's parseHTML, which
// restores the caption from data-caption.
const parts = [`src="${escapeAttr(node.attrs?.src ?? "")}"`];
if (imgAlt)
parts.push(`alt="${escapeAttr(imgAlt)}"`);
parts.push(`data-caption="${escapeAttr(imgCaption)}"`);
return `<div><img ${parts.join(" ")}></div>`;
}
// Neutralize characters that could break out of the markdown image
// URL: spaces/newlines and parentheses would terminate the (...) target
// and let a stored src inject following markdown/HTML. Percent-encode
// them so the URL stays a single inert token.
const imgSrc = encodeMdUrl(node.attrs?.src);
// No "caption" attribute exists in the Docmost image schema, so we do
// not emit one (the previous caption branch was dead).
return `![${imgAlt}](${imgSrc})`;
}
case "video": {
// Emit the schema-matching <video> element so generateJSON rebuilds the
// node with its attrs intact. The schema's parseHTML reads src/aria-label
@@ -624,6 +635,8 @@ export function convertProseMirrorToMarkdown(content) {
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt)
parts.push(`alt="${escapeAttr(attrs.alt)}"`);
if (attrs.caption)
parts.push(`data-caption="${escapeAttr(attrs.caption)}"`);
if (attrs.title)
parts.push(`title="${escapeAttr(attrs.title)}"`);
if (attrs.width != null)

View File

@@ -133,6 +133,18 @@ export type DocmostMcpConfig = { apiUrl: string } & (
};
};
// Canonical UUID shape (versions 1–8, matching the `uuid` package's `validate`
// that the server's isValidUUID uses). page.repo.ts treats any non-UUID pageId
// as a slugId, so the MCP detects a UUID locally and skips a /pages/info
// round-trip in resolvePageId. A 10-char nanoid slugId never contains dashes,
// so it can never be misread as a UUID here.
const UUID_RE =
/^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
function isUuid(value: string): boolean {
return typeof value === "string" && UUID_RE.test(value);
}
export class DocmostClient {
private client: AxiosInstance;
private token: string | null = null;
@@ -160,6 +172,11 @@ export class DocmostClient {
// can all call login() at once. Memoizing a single promise collapses that
// thundering herd into ONE /auth/login request that everyone awaits.
private loginPromise: Promise<void> | null = null;
// Canonical-UUID cache for resolvePageId: maps an agent-supplied slugId to the
// page's canonical UUID, so repeated collab edits on the same page do not
// re-fetch /pages/info. A UUID input short-circuits before this cache (see
// resolvePageId), so only slugId->uuid entries are stored/read here.
private pageIdCache = new Map<string, string>();
// Two construction forms:
// - new DocmostClient(config) // discriminated union (current)
@@ -751,6 +768,36 @@ export class DocmostClient {
return response.data?.data ?? response.data;
}
/**
* Resolve an agent-supplied pageId to the page's CANONICAL UUID (`page.id`),
* so every collaboration document the MCP opens is named `page.<uuid>` — the
* SAME name the web editor always uses (`page.${page.id}`).
*
* The agent commonly passes a 10-char public slugId (from URLs/listings) as
* the pageId. The web editor opens the collab doc by UUID, but the MCP used to
* pass that slugId straight into the collab doc name (`page.<slugId>`). For one
* DB row that produced TWO independent Yjs documents whose debounced stores
* clobbered each other — the agent's edit was silently lost (#260).
*
* A UUID input short-circuits with no network round-trip. A slugId is resolved
* once via getPageRaw and cached (both slugId->uuid and uuid->uuid), so
* repeated edits on the same page add no extra request.
*/
private async resolvePageId(pageId: string): Promise<string> {
if (isUuid(pageId)) return pageId;
const cached = this.pageIdCache.get(pageId);
if (cached) return cached;
const data = await this.getPageRaw(pageId);
const uuid = data?.id;
if (typeof uuid !== "string" || !uuid) {
throw new Error(
`Could not resolve a canonical page id for "${pageId}"`,
);
}
this.pageIdCache.set(pageId, uuid);
return uuid;
}
async getPage(pageId: string) {
await this.ensureAuthenticated();
const resultData = await this.getPageRaw(pageId);
@@ -1083,12 +1130,14 @@ export class DocmostClient {
) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track insertion in an outer var, reset per-transform, so a collab retry
// recomputes it cleanly (mirrors insertNode's pattern).
let inserted = false;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -1126,10 +1175,12 @@ export class DocmostClient {
async tableDeleteRow(pageId: string, tableRef: string, index: number) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
let deleted = false;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -1174,10 +1225,12 @@ export class DocmostClient {
) {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
let updated = false;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -1313,6 +1366,10 @@ export class DocmostClient {
*/
async updatePage(pageId: string, content: string, title?: string) {
await this.ensureAuthenticated();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// REST /pages/update title write below keeps the agent-supplied id (the
// server resolves a slugId there).
const pageUuid = await this.resolvePageId(pageId);
// Write the BODY first, then the title (#159 split-brain). If the collab
// body write fails (e.g. a persist timeout), the title must be left
@@ -1324,7 +1381,7 @@ export class DocmostClient {
try {
collabToken = await this.getCollabTokenWithReauth();
mutation = await updatePageContentRealtime(
pageId,
pageUuid,
content,
collabToken,
this.apiUrl,
@@ -1587,8 +1644,10 @@ export class DocmostClient {
// Write the BODY first, then the title (#159 split-brain): a failed body
// write (e.g. persist timeout) must not leave a new title over the old body.
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
const mutation = await this.replacePage(
pageId,
pageUuid,
doc,
collabToken,
this.apiUrl,
@@ -1630,9 +1689,11 @@ export class DocmostClient {
throw new Error("insert_footnote: text is required");
}
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
let result: { footnoteId: string; reused: boolean } | null = null;
const mutation = await this.mutatePage(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc: any) => {
@@ -1740,8 +1801,10 @@ export class DocmostClient {
// PAGE import: canonicalize footnotes (see markdownToProseMirrorCanonical).
const doc = await markdownToProseMirrorCanonical(body);
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
const mutation = await replacePageContent(
pageId,
pageUuid,
doc,
collabToken,
this.apiUrl,
@@ -1840,8 +1903,10 @@ export class DocmostClient {
const canonical = canonicalizeFootnotes(content);
const collabToken = await this.getCollabTokenWithReauth();
// Open the TARGET collab doc by its canonical UUID, never the slugId (#260).
const targetUuid = await this.resolvePageId(targetPageId);
const mutation = await this.replacePage(
targetPageId,
targetUuid,
canonical,
collabToken,
this.apiUrl,
@@ -1864,6 +1929,8 @@ export class DocmostClient {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Apply the edits against the LIVE synced document, not the debounced REST
// snapshot, so concurrent human edits/comments are preserved. applyTextEdits
@@ -1878,7 +1945,7 @@ export class DocmostClient {
// happened.
let wrote = false;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -1978,12 +2045,14 @@ export class DocmostClient {
}
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track the replacement count in an outer var, reset per-transform, so a
// collab retry recomputes it cleanly (mirrors replaceImage's pattern).
let replaced = 0;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -2066,12 +2135,14 @@ export class DocmostClient {
}
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track insertion in an outer var, reset per-transform, so a collab retry
// recomputes it cleanly (mirrors replaceImage's pattern).
let inserted = false;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -2120,12 +2191,14 @@ export class DocmostClient {
await this.ensureAuthenticated();
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
// Track the deletion count in an outer var, reset per-transform, so a
// collab retry recomputes it cleanly (mirrors replaceImage's pattern).
let deleted = 0;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -2414,8 +2487,11 @@ export class DocmostClient {
let anchored = false;
try {
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// /comments/create REST call above keeps the agent-supplied id.
const pageUuid = await this.resolvePageId(pageId);
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -2893,6 +2969,9 @@ export class DocmostClient {
if (opts.alt) node.attrs.alt = opts.alt;
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// uploadImage /files/upload call above keeps the agent-supplied id.
const pageUuid = await this.resolvePageId(pageId);
// Recursively collect the plain text of a top-level block.
const blockText = (n: any): string => {
@@ -2907,7 +2986,7 @@ export class DocmostClient {
// calls (serialized by the per-page lock) each see the previous insertion.
let placement: "replaced" | "after" | "appended" | undefined;
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
(liveDoc) => {
@@ -3019,6 +3098,13 @@ export class DocmostClient {
opts: { align?: "left" | "center" | "right"; alt?: string } = {},
) {
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260). The
// page lock must ALSO key on the UUID so this operation serializes against
// other writes to the same page (mutatePageContent now locks by the resolved
// UUID too); locking by the raw slugId here would desync the mutex key and
// reopen the TOCTOU/orphan-attachment window the lock closes. uploadImage
// keeps the agent-supplied id (it hits REST, not the collab doc).
const pageUuid = await this.resolvePageId(pageId);
// Hold ONE per-page lock for the WHOLE operation (scan -> upload -> write).
// Previously the scan and the write were two separate mutatePageContent
@@ -3031,7 +3117,7 @@ export class DocmostClient {
// reentrant, so the self-locking mutatePageContent would deadlock here)
// closes that TOCTOU window. uploadImage hits /files/upload over plain HTTP
// and does not touch the page lock, so it is safe to call while held.
return withPageLock(pageId, async () => {
return withPageLock(pageUuid, async () => {
// STEP 1: read-only live check. Scan the live document for any image node
// matching oldAttachmentId BEFORE uploading anything, so a wrong/stale id
// throws without ever creating an orphan attachment.
@@ -3050,7 +3136,7 @@ export class DocmostClient {
}
};
await this.mutateLiveContentUnlocked(pageId, collabToken, (liveDoc) => {
await this.mutateLiveContentUnlocked(pageUuid, collabToken, (liveDoc) => {
matchFound = false; // reset per-transform (collab may retry the read).
const doc =
liveDoc && liveDoc.type === "doc"
@@ -3105,7 +3191,7 @@ export class DocmostClient {
};
const mutation = await this.mutateLiveContentUnlocked(
pageId,
pageUuid,
collabToken,
(liveDoc) => {
// Reset per-transform so collab retries recompute cleanly (no double-count).
@@ -3214,8 +3300,11 @@ export class DocmostClient {
// JSON write path) before writing it back.
this.validateDocUrls(version.content);
const collabToken = await this.getCollabTokenWithReauth();
// version.pageId is the page entity id (already a UUID); resolvePageId
// short-circuits a UUID with no round-trip, so this is defensive only (#260).
const pageUuid = await this.resolvePageId(version.pageId);
const mutation = await mutatePageContent(
version.pageId,
pageUuid,
collabToken,
this.apiUrl,
() => version.content,
@@ -3414,8 +3503,10 @@ export class DocmostClient {
// Apply atomically against the live doc.
const collabToken = await this.getCollabTokenWithReauth();
// Open the collab doc by the canonical UUID, never the slugId (#260).
const pageUuid = await this.resolvePageId(pageId);
const mutation = await mutatePageContent(
pageId,
pageUuid,
collabToken,
this.apiUrl,
runTransform,

View File

@@ -38,7 +38,7 @@ const VERSION = packageJson.version;
// Editing guide surfaced to MCP clients in the initialize result so they can
// pick the right tool by intent and avoid resending whole documents.
const SERVER_INSTRUCTIONS =
"Docmost editing guide — choose the tool by intent: fix wording/typos/numbers (text inside blocks) -> edit_page_text (no node id needed). Change ONE block (paragraph/heading/callout/table cell/etc.) structurally -> patch_node (address by attrs.id from get_page_json). Add a block -> insert_node (before/after a block by attrs.id or by anchor text, or append). Remove a block -> delete_node (by attrs.id). Images -> insert_image (add an image from a web URL) / replace_image (swap an existing image for one from a web URL). New page -> create_page (Markdown). Bulk/structural rewrite or nodes without an id -> update_page_json (full ProseMirror replace; prefer the granular tools above to avoid resending the whole ~100KB+ document). Copy/replace a page's whole content from another page (server-side, no document through the model) -> copy_page_content. Rename a page (title only) -> rename_page. Read -> get_page (Markdown, lossy) or get_page_json (lossless ProseMirror with block ids). Comments -> create_comment (always inline; requires an EXACT selection — the contiguous text to anchor/highlight on; fails rather than leaving an unanchored comment), list_comments, update_comment, delete_comment, check_new_comments. Tip: read block ids via get_page_json, then use patch_node/insert_node/delete_node so you never resend the full document. " +
"Docmost editing guide — choose the tool by intent: fix wording/typos/numbers (text inside blocks) -> edit_page_text (no node id needed). Change ONE block (paragraph/heading/callout/table cell/etc.) structurally -> patch_node (address by attrs.id from get_page_json). Add a block -> insert_node (before/after a block by attrs.id or by anchor text, or append). Remove a block -> delete_node (by attrs.id). Images -> insert_image (add an image from a web URL) / replace_image (swap an existing image for one from a web URL). New page -> create_page (Markdown). Bulk/structural rewrite or nodes without an id -> update_page_json (full ProseMirror replace; prefer the granular tools above to avoid resending the whole ~100KB+ document). Copy/replace a page's whole content from another page (server-side, no document through the model) -> copy_page_content. Rename a page (title only) -> rename_page. Read -> get_page (Markdown, lossy) or get_page_json (lossless ProseMirror with block ids). Comments -> create_comment (always inline; requires an EXACT selection — the contiguous text to anchor/highlight on; fails rather than leaving an unanchored comment), list_comments, update_comment, resolve_comment (resolve/reopen a thread, reversible — prefer over delete to close), delete_comment, check_new_comments. Tip: read block ids via get_page_json, then use patch_node/insert_node/delete_node so you never resend the full document. " +
"Complex/scripted rewrite (multiple coordinated edits, footnotes, renumbering) -> docmost_transform: write a JS `(doc, ctx) => doc` transform, preview the diff with dryRun (default), then apply with dryRun:false; ctx.helpers includes commentsToFootnotes for turning inline comments into numbered footnotes. " +
"Review what changed -> diff_page_versions (compare a historyId to current, or two history versions). See a page's saved versions -> list_page_history. Undo a bad edit -> restore_page_version (writes a past version back as current; itself revertible). " +
"Lossless markdown round-trip (download, edit, re-upload, incl. comment anchors) -> export_page_markdown / import_page_markdown.";
@@ -838,6 +838,35 @@ server.registerTool(
},
);
// Tool: resolve_comment
server.registerTool(
"resolve_comment",
{
description:
"Resolve (close) or reopen a comment thread. Only top-level comments can " +
"be resolved — the server rejects resolving a reply. Reversible: pass " +
"resolved=false to reopen. Resolving keeps the thread and its replies " +
"(unlike delete_comment, which permanently removes them).",
inputSchema: {
commentId: z
.string()
.min(1)
.describe("ID of the top-level comment thread to resolve or reopen"),
resolved: z
.boolean()
.optional()
.default(true)
.describe(
"true (default) marks the thread resolved/closed; false reopens it",
),
},
},
async ({ commentId, resolved }) => {
const result = await docmostClient.resolveComment(commentId, resolved);
return jsonContent(result);
},
);
// Tool: check_new_comments
server.registerTool(
"check_new_comments",

View File

@@ -38,6 +38,45 @@ export async function getCollabToken(
}
}
/**
* Pure cookie-parsing helper extracted from `performLogin` so the parsing logic
* can be unit-tested without performing the login network request. Given the
* raw `Set-Cookie` header array from the login response, return the `authToken`
* cookie's value.
*
* Behavior (kept identical to the original inline logic):
* - throws if there is no Set-Cookie header at all;
* - matches the cookie NAME exactly (`authToken`), so a future
* `authTokenRefresh=...` cookie is NOT picked up (a `startsWith` would be);
* - returns everything after the FIRST `=` up to the first `;`, so a base64
* value containing `=` padding is preserved (a naive `split("=")` would
* truncate it);
* - cookie attributes after the first `;` (Path, HttpOnly, Expires, …) are
* ignored;
* - throws if no `authToken` cookie is present.
*/
export function extractAuthTokenFromSetCookie(
cookies: string[] | undefined,
): string {
if (!cookies) {
throw new Error("No Set-Cookie header found in login response");
}
// Match the cookie name exactly to avoid matching a future
// authTokenRefresh cookie (startsWith would catch it).
const authCookie = cookies.find((c: string) => {
const kv = c.split(";")[0];
return kv.slice(0, kv.indexOf("=")) === "authToken";
});
if (!authCookie) {
throw new Error("No authToken cookie found in login response");
}
// Take everything after the FIRST "=" up to the first ";".
// Splitting on "=" would truncate base64 values containing "=" padding.
const kv = authCookie.split(";")[0];
return kv.slice(kv.indexOf("=") + 1);
}
export async function performLogin(
baseUrl: string,
email: string,
@@ -50,25 +89,7 @@ export async function performLogin(
});
// Extract token from Set-Cookie header
const cookies = response.headers["set-cookie"];
if (!cookies) {
throw new Error("No Set-Cookie header found in login response");
}
// Match the cookie name exactly to avoid matching a future
// authTokenRefresh cookie (startsWith would catch it).
const authCookie = cookies.find((c: string) => {
const kv = c.split(";")[0];
return kv.slice(0, kv.indexOf("=")) === "authToken";
});
if (!authCookie) {
throw new Error("No authToken cookie found in login response");
}
// Take everything after the FIRST "=" up to the first ";".
// Splitting on "=" would truncate base64 values containing "=" padding.
const kv = authCookie.split(";")[0];
const token = kv.slice(kv.indexOf("=") + 1);
return token;
return extractAuthTokenFromSetCookie(response.headers["set-cookie"]);
} catch (error: any) {
// Avoid leaking the full server response body by default; log only the
// HTTP status. Log the verbose body only when DEBUG is set.

View File

@@ -1184,7 +1184,26 @@ export const docmostExtensions = [
heading: {},
link: { openOnClick: false },
}),
Image.configure({ inline: false }),
// Stock @tiptap/extension-image has no caption attribute, so a round-trip
// through this schema would drop the data-caption the client TiptapImage
// emits. Mirror editor-ext image.ts: add a caption attribute that parses
// data-caption and re-renders it only when set (caption-less images stay
// clean), keeping the MCP markdown round-trip lossless.
Image.extend({
addAttributes() {
const parent = this.parent?.() ?? {};
return {
...parent,
caption: {
default: undefined,
parseHTML: (el: HTMLElement) =>
el.getAttribute("data-caption") || undefined,
renderHTML: (attrs: Record<string, any>) =>
attrs.caption ? { "data-caption": attrs.caption } : {},
},
};
},
}).configure({ inline: false }),
TaskList,
TaskItem.configure({ nested: true }),
// Highlight stores its color unescaped and Docmost interpolates it into

View File

@@ -234,16 +234,26 @@ export function convertProseMirrorToMarkdown(content: any): string {
// a bare "\n" would be reimported as a soft break and lost.
return " \n";
case "image":
case "image": {
const imgAlt = node.attrs?.alt || "";
const imgCaption = node.attrs?.caption || "";
if (imgCaption) {
// ![]() can't carry a caption, so (symmetric to video) emit a raw
// <img> wrapped in a block <div>. On import marked.parse keeps the raw
// HTML and generateJSON runs the image extension's parseHTML, which
// restores the caption from data-caption.
const parts: string[] = [`src="${escapeAttr(node.attrs?.src ?? "")}"`];
if (imgAlt) parts.push(`alt="${escapeAttr(imgAlt)}"`);
parts.push(`data-caption="${escapeAttr(imgCaption)}"`);
return `<div><img ${parts.join(" ")}></div>`;
}
// Neutralize characters that could break out of the markdown image
// URL: spaces/newlines and parentheses would terminate the (...) target
// and let a stored src inject following markdown/HTML. Percent-encode
// them so the URL stays a single inert token.
const imgSrc = encodeMdUrl(node.attrs?.src);
// No "caption" attribute exists in the Docmost image schema, so we do
// not emit one (the previous caption branch was dead).
return `![${imgAlt}](${imgSrc})`;
}
case "video": {
// Emit the schema-matching <video> element so generateJSON rebuilds the
@@ -684,6 +694,8 @@ export function convertProseMirrorToMarkdown(content: any): string {
const attrs = node.attrs || {};
const parts: string[] = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt) parts.push(`alt="${escapeAttr(attrs.alt)}"`);
if (attrs.caption)
parts.push(`data-caption="${escapeAttr(attrs.caption)}"`);
if (attrs.title) parts.push(`title="${escapeAttr(attrs.title)}"`);
if (attrs.width != null) parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null) parts.push(`height="${escapeAttr(attrs.height)}"`);

View File

@@ -469,6 +469,17 @@ async function main() {
check("update_comment + get_comment: content updated", got.data.content.includes("Обновлённый"), got.data.content);
const news = await client.checkNewComments(spaceId, beforeComments, pageId);
check("check_new_comments: finds new comments in subtree", news.totalNewComments >= 2, `total=${news.totalNewComments}`);
// resolve_comment: close the top-level thread, verify resolvedAt surfaces, then reopen
const resolvedRes = await client.resolveComment(c1.data.id, true);
check("resolve_comment: marks resolved", resolvedRes.success === true && resolvedRes.resolved === true);
const listResolved = await client.listComments(pageId);
const c1Resolved = listResolved.find((c) => c.id === c1.data.id);
check("resolve_comment: resolvedAt set in list", !!c1Resolved?.resolvedAt, `resolvedAt=${c1Resolved?.resolvedAt}`);
const reopenedRes = await client.resolveComment(c1.data.id, false);
check("resolve_comment: reopen succeeds", reopenedRes.resolved === false);
const listReopened = await client.listComments(pageId);
const c1Reopened = listReopened.find((c) => c.id === c1.data.id);
check("resolve_comment: resolvedAt cleared on reopen", !c1Reopened?.resolvedAt, `resolvedAt=${c1Reopened?.resolvedAt}`);
await client.deleteComment(reply.data.id);
await client.deleteComment(c1.data.id);
const listAfter = await client.listComments(pageId);

View File

@@ -132,7 +132,7 @@ test("patch_node REFUSES an ambiguous (duplicate) id without writing to collab",
await assert.rejects(
() =>
client.patchNode("page-1", DUP_ID, {
client.patchNode("11111111-1111-4111-8111-111111111111", DUP_ID, {
type: "paragraph",
content: [{ type: "text", text: "replacement" }],
}),
@@ -152,7 +152,7 @@ test("delete_node REFUSES an ambiguous (duplicate) id without writing to collab"
const client = new DocmostClient(baseURL, "user@example.com", "pw");
await assert.rejects(
() => client.deleteNode("page-2", DUP_ID),
() => client.deleteNode("22222222-2222-4222-8222-222222222222", DUP_ID),
/ambiguous/i,
"delete_node must reject a duplicate-id target with an 'ambiguous' error",
);

View File

@@ -37,6 +37,11 @@ function makeClient(liveDoc) {
async getCollabTokenWithReauth() {
return "collab-token";
}
// Identity resolution: this test isolates the footnote wrapper, so the
// slugId->uuid resolution (#260) is stubbed to a no-op and "p1" stays "p1".
async resolvePageId(pageId) {
return pageId;
}
async mutatePage(pageId, token, apiUrl, transform) {
calls.pageId = pageId;
calls.token = token;

View File

@@ -0,0 +1,387 @@
// Mock collab regression for the #260 data-loss bug: the MCP must open every
// collaboration document by the page's CANONICAL UUID (`page.<uuid>`) — the same
// name the web editor uses — even when the agent supplies a public slugId.
//
// Root cause: the agent commonly passes a 10-char slugId (from URLs/listings) as
// pageId. The web tab opens `page.<uuid>`, but the MCP used to pass the slugId
// straight into the collab doc name (`page.<slugId>`), so one DB page ended up
// with TWO independent Yjs documents whose debounced stores clobbered each other
// — the agent's edit was silently lost on reload.
//
// We stand up a real Hocuspocus server (like ambiguous-node-id.test.mjs) and
// capture the EXACT documentName each connection requests via onLoadDocument.
// The /pages/info mock resolves the slugId -> uuid, and counts its own hits so we
// can also prove the UUID short-circuit + cache (no redundant resolve round-trip).
import { test, after } from "node:test";
import assert from "node:assert/strict";
import http from "node:http";
import { WebSocketServer } from "ws";
import { Hocuspocus } from "@hocuspocus/server";
import { DocmostClient } from "../../build/client.js";
import { buildYDoc } from "../../build/lib/collaboration.js";
// Import the SAME page-lock module instance that build/client.js imports. ESM
// caches modules by resolved URL, so this `withPageLock` shares the very
// per-page mutex map (`chains`) the client uses — letting the replaceImage test
// probe which key the operation actually locks on (see that test for details).
import { withPageLock } from "../../build/lib/page-lock.js";
const SLUG = "dwzDdgPep2"; // 10-char nanoid public id (no dashes)
const UUID = "11111111-1111-4111-8111-111111111111"; // canonical page.id
// A simple one-paragraph document; "hello world" gives editPageText a match and
// insertFootnote an anchor. No table node, so tableInsertRow aborts with
// "no table found" — but the collab doc was still OPENED by then, which is what
// we assert (the doc NAME is fixed at connect time, before any transform runs).
function seedDoc() {
return {
type: "doc",
content: [
{
type: "paragraph",
attrs: { id: "p1" },
content: [{ type: "text", text: "hello world" }],
},
],
};
}
// Same shape as seedDoc but with one image node carrying attachmentId "att-old"
// (mirrors what client.addImage emits). replaceImage scans the live doc for this
// node, so it must survive the Yjs round-trip with attachmentId intact.
function seedDocWithImage() {
return {
type: "doc",
content: [
{
type: "paragraph",
attrs: { id: "p1" },
content: [{ type: "text", text: "hello world" }],
},
{
type: "image",
attrs: {
src: "/api/files/att-old/old.png",
attachmentId: "att-old",
size: 10,
align: "center",
width: null,
},
},
],
};
}
function readBody(req) {
return new Promise((resolve) => {
let raw = "";
req.on("data", (c) => (raw += c));
req.on("end", () => resolve(raw));
});
}
// Stand up an HTTP server that authenticates, hands out a collab token, serves
// /pages/info (slugId -> uuid resolution), and upgrades /collab to a Hocuspocus
// instance whose onLoadDocument records the requested documentName.
// opts.seed: a function returning the ProseMirror doc the collab server loads
// (defaults to seedDoc). opts.onUpload: an optional async hook invoked when
// /files/upload is hit, letting a test GATE the upload (hold replaceImage inside
// its page lock). Existing callers pass no opts and are unaffected.
async function spawnCollabStack(opts = {}) {
const seed = opts.seed ?? seedDoc;
const state = { docNames: [], pagesInfoCalls: [] };
const hocuspocus = new Hocuspocus({
quiet: true,
async onLoadDocument({ documentName }) {
state.docNames.push(documentName);
return buildYDoc(seed());
},
});
const wss = new WebSocketServer({ noServer: true });
const server = http.createServer(async (req, res) => {
const raw = await readBody(req);
if (req.url === "/api/auth/login") {
res.writeHead(200, {
"Content-Type": "application/json",
"Set-Cookie": "authToken=t; Path=/; HttpOnly",
});
res.end(JSON.stringify({ success: true }));
return;
}
if (req.url === "/api/auth/collab-token") {
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify({ data: { token: "collab-jwt" } }));
return;
}
if (req.url === "/api/pages/info") {
let pageId;
try {
pageId = JSON.parse(raw)?.pageId;
} catch {
pageId = undefined;
}
state.pagesInfoCalls.push(pageId);
// Always resolve to the SAME canonical record, mirroring the server's
// findById (which accepts either the uuid or the slugId).
res.writeHead(200, { "Content-Type": "application/json" });
res.end(
JSON.stringify({
data: {
id: UUID,
slugId: SLUG,
title: "Doc",
spaceId: "space-1",
content: seedDoc(),
},
}),
);
return;
}
if (req.url && req.url.endsWith(".png")) {
// Serve image bytes for fetchRemoteImage (replaceImage downloads the new
// image before uploading it). Any non-empty image/* body is enough;
// fetchRemoteImage does not validate PNG magic bytes.
res.writeHead(200, { "Content-Type": "image/png" });
res.end(Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]));
return;
}
if (req.url === "/api/files/upload") {
// Optional gate: a test can hold replaceImage parked here (inside its page
// lock, after the scan) to probe the lock key. Default: respond at once.
if (opts.onUpload) await opts.onUpload();
res.writeHead(200, { "Content-Type": "application/json" });
res.end(
JSON.stringify({
data: { id: "att-new", fileName: "replacement.png", fileSize: 8 },
}),
);
return;
}
// Title writes (/pages/update) and anything else: succeed quietly.
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify({ data: {} }));
});
// buildCollabWsUrl maps http://host:port/api -> ws://host:port/collab.
server.on("upgrade", (request, socket, head) => {
if (!request.url || !request.url.startsWith("/collab")) {
socket.destroy();
return;
}
wss.handleUpgrade(request, socket, head, (ws) => {
hocuspocus.handleConnection(ws, request);
});
});
const baseURL = await new Promise((resolve) => {
server.listen(0, "127.0.0.1", () => {
const { port } = server.address();
resolve(`http://127.0.0.1:${port}/api`);
});
});
openStacks.push({ server, hocuspocus });
return { state, baseURL };
}
const openStacks = [];
after(async () => {
await Promise.all(
openStacks.map(
({ server, hocuspocus }) =>
new Promise((resolve) => {
server.close(() => {
Promise.resolve(hocuspocus.destroy?.()).finally(resolve);
});
}),
),
);
});
test("editPageText with a slugId opens the collab doc by the resolved UUID (#260)", async () => {
const { state, baseURL } = await spawnCollabStack();
const client = new DocmostClient(baseURL, "user@example.com", "pw");
const res = await client.editPageText(SLUG, [
{ find: "hello", replace: "hi" },
]);
assert.equal(res.success, true);
assert.ok(
state.docNames.includes(`page.${UUID}`),
`collab doc must be opened as page.${UUID}, got ${JSON.stringify(state.docNames)}`,
);
assert.ok(
!state.docNames.includes(`page.${SLUG}`),
"collab doc must NEVER be opened by the slugId (that is the data-loss bug)",
);
// The slugId had to be resolved via /pages/info at least once.
assert.ok(state.pagesInfoCalls.length >= 1);
});
test("tableInsertRow with a slugId opens the collab doc by the resolved UUID (#260)", async () => {
const { state, baseURL } = await spawnCollabStack();
const client = new DocmostClient(baseURL, "user@example.com", "pw");
// No table in the seed doc, so this aborts with "no table found" — but the
// collab doc has ALREADY been opened (by UUID) before the transform decides.
await assert.rejects(
() => client.tableInsertRow(SLUG, "#0", ["a", "b"]),
/no table/i,
);
assert.deepEqual(
state.docNames,
[`page.${UUID}`],
"tableInsertRow must open the collab doc by the resolved UUID",
);
});
test("the generic mutate (insert_footnote) with a slugId opens by the resolved UUID (#260)", async () => {
const { state, baseURL } = await spawnCollabStack();
const client = new DocmostClient(baseURL, "user@example.com", "pw");
const res = await client.insertFootnote(SLUG, "world", "a note");
assert.equal(res.success, true);
assert.deepEqual(
state.docNames,
[`page.${UUID}`],
"insert_footnote (via the mutatePage seam) must open the collab doc by UUID",
);
});
test("a UUID input is passed through unchanged and triggers NO /pages/info fetch (short-circuit)", async () => {
const { state, baseURL } = await spawnCollabStack();
const client = new DocmostClient(baseURL, "user@example.com", "pw");
const res = await client.editPageText(UUID, [
{ find: "hello", replace: "hi" },
]);
assert.equal(res.success, true);
assert.deepEqual(state.docNames, [`page.${UUID}`]);
assert.equal(
state.pagesInfoCalls.length,
0,
"a UUID input must short-circuit resolvePageId with no /pages/info round-trip",
);
});
test("a repeated slugId edit resolves the UUID only once (cache)", async () => {
const { state, baseURL } = await spawnCollabStack();
const client = new DocmostClient(baseURL, "user@example.com", "pw");
// Each mock connection re-seeds a fresh "hello world" doc (the mock does not
// persist across connects), so both edits target "hello". The cache assertion
// only concerns the slugId->uuid resolution, not the document content.
await client.editPageText(SLUG, [{ find: "hello", replace: "hi" }]);
await client.editPageText(SLUG, [{ find: "hello", replace: "hey" }]);
assert.deepEqual(state.docNames, [`page.${UUID}`, `page.${UUID}`]);
assert.equal(
state.pagesInfoCalls.length,
1,
"the slugId->uuid resolution must be cached across edits on the same page",
);
});
// PR#265 reviewer finding F1. replaceImage is the one path where the resolved
// UUID gates BOTH (a) the collab-doc OPEN (mutateLiveContentUnlocked ->
// page.<uuid>) AND (b) the per-page mutex key withPageLock(uuid). The lock
// serializes the whole scan -> upload -> write against other writes to the same
// page (which now also lock by the resolved UUID), closing a TOCTOU/orphan-
// attachment window. A regression that re-keys this lock by the raw slugId would
// desync it from mutatePageContent's UUID key and silently reopen that window.
// This test pins both invariants and FAILS under either regression:
// - open by slugId -> assertion (a) sees page.<slug> in docNames;
// - lock by slugId -> assertion (b)'s UUID-keyed probe is no longer blocked.
test("replaceImage opens by the resolved UUID AND keys its page lock by that UUID, not the slugId (#260 / PR#265 F1)", async () => {
// A gate that holds the /files/upload response open, so replaceImage parks
// INSIDE its page lock (after the read-only scan, mid-upload) until released.
let releaseUpload;
const uploadReleased = new Promise((r) => (releaseUpload = r));
let uploadHit;
const uploadStarted = new Promise((r) => (uploadHit = r));
const { state, baseURL } = await spawnCollabStack({
seed: seedDocWithImage,
onUpload: async () => {
uploadHit(); // replaceImage is now holding its page lock...
await uploadReleased; // ...and stays parked until the test releases it.
},
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
// Kick off the replace but DO NOT await: it resolves SLUG->UUID, takes
// withPageLock(UUID), scan-opens page.<UUID>, finds the seeded "att-old"
// image, then blocks in uploadImage on our gate while still holding the lock.
// The image URL is served as image/png by the mock (the ".png" route above).
const imageUrl = `${baseURL}/x.png`;
const replacePromise = client.replaceImage(SLUG, "att-old", imageUrl);
await uploadStarted; // deterministic: replaceImage now holds its page lock.
// (a) OPEN BY UUID: the only collab doc opened so far (the scan pass) used the
// canonical UUID, never the slugId. (The write pass opens a second time after
// we release the gate; asserted at the end.)
assert.deepEqual(
state.docNames,
[`page.${UUID}`],
"replaceImage must scan-open the collab doc by the resolved UUID, never the slugId",
);
// (b) LOCK KEY == UUID (the distinct invariant). We share the SAME page-lock
// module instance as build/client.js, so enqueuing on key=UUID contends on the
// very chain replaceImage holds. Because replaceImage is deterministically
// parked mid-upload (still holding the lock), a UUID-keyed probe MUST stay
// queued; it cannot run until the lock frees. The contention here is pure
// in-memory promise-chain microtask scheduling (no timers, no socket I/O), so
// a single macrotask flush is a sufficient and deterministic observation.
// If replaceImage were reverted to lock by the slugId, the UUID chain would be
// free and this probe would run during the flush -> probeRan === true -> FAIL.
let probeRan = false;
const probeDone = withPageLock(UUID, async () => {
probeRan = true;
});
// setImmediate runs after the microtask queue fully drains, so a probe on a
// FREE chain would already have run by the time this resolves.
await new Promise((r) => setImmediate(r));
assert.equal(
probeRan,
false,
"a probe on key=UUID must stay blocked while replaceImage holds the lock; " +
"if it ran, replaceImage locked by a different key (e.g. the raw slugId)",
);
// Non-vacuity guard: a probe on an UNRELATED key DOES run after the same
// single flush. This proves the flush actually executes queued callbacks, so
// probeRan === false above means "blocked", not "the flush never ran anyone".
let freeRan = false;
const freeDone = withPageLock(`page.free-${UUID}`, async () => {
freeRan = true;
});
await new Promise((r) => setImmediate(r));
assert.equal(
freeRan,
true,
"sanity: a probe on a FREE key must run after one flush (the UUID probe was blocked by the held key, not by an inert flush)",
);
// Release the gate; replaceImage finishes and the queued UUID probe can run.
releaseUpload();
const res = await replacePromise;
await probeDone;
await freeDone;
assert.equal(res.success, true);
assert.equal(res.replaced, 1, "the one seeded image must be repointed");
// Both opens (scan pass + write pass) used the UUID; the slugId never appears.
assert.deepEqual(state.docNames, [`page.${UUID}`, `page.${UUID}`]);
assert.ok(
!state.docNames.includes(`page.${SLUG}`),
"replaceImage must NEVER open the collab doc by the slugId (the #260 bug)",
);
});

View File

@@ -66,6 +66,14 @@ function makeServer() {
sendJson(res, 200, { data: { token: "collab-jwt" } });
return;
}
if (req.url === "/api/pages/info") {
// Resolve the pageId -> canonical UUID (#260) so the test exercises the
// real body-write failure (no WS upgrade) rather than a resolve failure.
sendJson(res, 200, {
data: { id: "11111111-1111-4111-8111-111111111111", slugId: "page-1" },
});
return;
}
if (req.url === "/api/pages/update") {
state.titlePosted = true;
sendJson(res, 200, { data: {} });

View File

@@ -0,0 +1,93 @@
// Cookie parsing for the login flow.
//
// `performLogin` in auth-utils.ts does a real network POST and then extracts the
// auth token from the response's Set-Cookie header. The cookie-parsing logic was
// extracted into the pure, exported helper `extractAuthTokenFromSetCookie` so it
// can be tested without network I/O; `performLogin` now delegates to it, so these
// tests cover the exact parsing path the login uses.
import { test } from "node:test";
import assert from "node:assert/strict";
import { extractAuthTokenFromSetCookie } from "../../build/lib/auth-utils.js";
// ---------------------------------------------------------------------------
// Happy path: a single authToken cookie with attributes.
// ---------------------------------------------------------------------------
test("extracts the authToken value, ignoring trailing attributes", () => {
const cookies = [
"authToken=abc123; Path=/; HttpOnly; Secure; SameSite=Lax",
];
assert.equal(extractAuthTokenFromSetCookie(cookies), "abc123");
});
// ---------------------------------------------------------------------------
// A base64/JWT value containing "=" padding must NOT be truncated: only the
// FIRST "=" separates name from value.
// ---------------------------------------------------------------------------
test("preserves an '=' inside the value (base64 padding is not truncated)", () => {
const jwt = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0=";
const cookies = [`authToken=${jwt}; Path=/`];
assert.equal(extractAuthTokenFromSetCookie(cookies), jwt);
});
// ---------------------------------------------------------------------------
// Exact-name match: a different cookie whose name merely STARTS WITH "authToken"
// (e.g. authTokenRefresh) must not be picked up; the real authToken wins.
// ---------------------------------------------------------------------------
test("matches the cookie name exactly, not by prefix (authTokenRefresh ignored)", () => {
const cookies = [
"authTokenRefresh=refreshvalue; Path=/; HttpOnly",
"authToken=realtoken; Path=/; HttpOnly",
];
assert.equal(extractAuthTokenFromSetCookie(cookies), "realtoken");
});
// ---------------------------------------------------------------------------
// Picks the authToken out of several unrelated cookies regardless of order.
// ---------------------------------------------------------------------------
test("selects authToken among multiple unrelated cookies", () => {
const cookies = [
"session=xyz; Path=/",
"authToken=tok-7; Path=/; HttpOnly",
"theme=dark",
];
assert.equal(extractAuthTokenFromSetCookie(cookies), "tok-7");
});
// ---------------------------------------------------------------------------
// An empty value is valid and returns "".
// ---------------------------------------------------------------------------
test("returns an empty string when authToken has an empty value", () => {
assert.equal(extractAuthTokenFromSetCookie(["authToken=; Path=/"]), "");
});
// ---------------------------------------------------------------------------
// Missing Set-Cookie header -> documented error.
// ---------------------------------------------------------------------------
test("throws when there is no Set-Cookie header", () => {
assert.throws(
() => extractAuthTokenFromSetCookie(undefined),
/No Set-Cookie header/,
);
});
// ---------------------------------------------------------------------------
// Set-Cookie present but no authToken cookie -> documented error.
// ---------------------------------------------------------------------------
test("throws when no authToken cookie is present", () => {
assert.throws(
() => extractAuthTokenFromSetCookie(["session=xyz; Path=/", "theme=dark"]),
/No authToken cookie/,
);
});
// ---------------------------------------------------------------------------
// An empty cookie array also yields the "no authToken" error (header exists but
// is empty), distinct from the "no Set-Cookie header" case above.
// ---------------------------------------------------------------------------
test("throws 'no authToken' (not 'no header') for an empty cookie array", () => {
assert.throws(
() => extractAuthTokenFromSetCookie([]),
/No authToken cookie/,
);
});

View File

@@ -0,0 +1,213 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";
import { fileURLToPath } from "node:url";
import { dirname, resolve } from "node:path";
import { DocmostClient } from "../../build/index.js";
// Drift guard for the THIRD hand-written layer of the AI tool set (issue #193,
// layer 3): the in-app server hand-mirrors the DocmostClient method signatures
// it consumes as the `DocmostClientLike` interface in
// apps/server/src/core/ai-chat/tools/docmost-client.loader.ts ("Signatures here
// mirror that file exactly"). That mirror lives across the ESM(mcp)/CJS(server)
// boundary and the package ships NO .d.ts, so the server typecheck cannot verify
// the names against the real class — a rename/removal in client.ts would surface
// only as a runtime "x is not a function" inside an agent tool call.
//
// SCOPE: this guard checks the method-NAME set only, not signatures. It pins the
// contract from the mcp side (ESM, where the real class is directly importable):
// every method the embedding host depends on MUST exist as a function on a real
// DocmostClient instance. If you rename/remove a client method, this fails here
// AND you must update DocmostClientLike to match. It does NOT verify parameter or
// return-type parity — signature drift between the hand-mirror and client.ts can
// still ship silently; full signature/type parity is the deferred staged-plan
// item below.
//
// Keep the HOST_CONTRACT_METHODS NAME list aligned with the method NAMES declared
// in the server's DocmostClientLike interface (the in-app per-user tool adapter
// only — it is a SUBSET of the DocmostClient surface — covers only what the in-app adapter
// consumes; the standalone MCP transport (packages/mcp/src/index.ts) calls additional
// client methods (insertImage/replaceImage/deleteComment/updateComment/insertFootnote)
// that this guard does NOT track — the MCP transport's own typecheck covers those). Full type-derivation
// of DocmostClientLike from this class is deferred (see the staged plan in
// docmost-client.loader.ts): the package emits no declarations and the real
// (inferred, concrete) return types conflict with the host's loose
// `Record<string,unknown>` + `as`-cast result handling.
const HOST_CONTRACT_METHODS = [
// read
"search",
"getPage",
"getWorkspace",
"getSpaces",
"listPages",
"listSidebarPages",
"getOutline",
"getPageJson",
"getNode",
"getTable",
"listComments",
"getComment",
"checkNewComments",
"listShares",
"listPageHistory",
"getPageHistory",
"diffPageVersions",
"exportPageMarkdown",
// write (page)
"createPage",
"updatePage",
"renamePage",
"movePage",
"deletePage",
"editPageText",
"patchNode",
"insertNode",
"deleteNode",
"updatePageJson",
"tableInsertRow",
"tableDeleteRow",
"tableUpdateCell",
"copyPageContent",
"importPageMarkdown",
"sharePage",
"unsharePage",
"restorePageVersion",
"transformPage",
"stashPage",
// write (comment)
"createComment",
"resolveComment",
];
test("DocmostClient implements every method the in-app DocmostClientLike mirror declares", () => {
// The constructor is side-effect-free (no network/login on construction): it
// only stores config and creates an axios instance, so it is safe to build a
// throwaway instance here with a dummy token provider.
const client = new DocmostClient({
apiUrl: "http://127.0.0.1:1/api",
getToken: async () => "test-token",
});
const missing = HOST_CONTRACT_METHODS.filter(
(name) => typeof client[name] !== "function",
);
assert.deepEqual(
missing,
[],
`DocmostClient is missing host-contract method(s): ${missing.join(", ")}. ` +
`Update packages/mcp/src/client.ts and/or the server's DocmostClientLike ` +
`interface (apps/server/src/core/ai-chat/tools/docmost-client.loader.ts) ` +
`so the hand-mirrored method NAMES stay aligned (this guards names only, ` +
`not signatures).`,
);
});
test("HOST_CONTRACT_METHODS has no duplicates", () => {
assert.equal(
new Set(HOST_CONTRACT_METHODS).size,
HOST_CONTRACT_METHODS.length,
);
});
// Parse the method names declared in the server's `DocmostClientLike` interface
// body. We read the .ts source as plain text (no TS compiler dep, and the file
// lives in the CJS server tree across the ESM boundary): scan from the
// `export interface DocmostClientLike {` line to its closing brace at column 0,
// matching member-signature lines like ` methodName(`. Nested param-object
// braces (`opts: { ... }`) are indented, so only the interface's own closing
// `}` (column 0) ends the scan.
function parseDocmostClientLikeMethods() {
const here = dirname(fileURLToPath(import.meta.url));
// packages/mcp/test/unit -> repo root is four levels up.
const loaderPath = resolve(
here,
"../../../../apps/server/src/core/ai-chat/tools/docmost-client.loader.ts",
);
let source;
try {
source = readFileSync(loaderPath, "utf8");
} catch (err) {
if (err && err.code === "ENOENT") {
throw new Error(
`Expected monorepo layout; server tree at ${loaderPath} not found. ` +
`This drift-guard reads the server's DocmostClientLike interface via a ` +
`fixed relative path and must run from inside the monorepo checkout.`,
);
}
throw err;
}
const lines = source.split(/\r?\n/);
const startIdx = lines.findIndex((l) =>
/^export interface DocmostClientLike\s*\{/.test(l),
);
assert.notEqual(
startIdx,
-1,
`Could not find "export interface DocmostClientLike {" in ${loaderPath}. ` +
`If the interface was renamed/moved, update this drift-guard test.`,
);
const methods = [];
let closed = false;
// Track whether we are inside a `/* ... */` block comment. Inner lines of a
// block comment need NOT start with `*`, so a `name(` line inside one would be
// falsely parsed as an interface method without this. (`//` line comments can
// never match the method regex below since they start with `/`.)
let inBlockComment = false;
for (let i = startIdx + 1; i < lines.length; i++) {
const line = lines[i];
if (inBlockComment) {
// Stay in the block until we see its closing `*/`.
if (line.includes("*/")) inBlockComment = false;
continue;
}
// Enter a block comment only when it opens without closing on the same line;
// a self-contained `/* ... */` on one line cannot precede a method name we
// care about (such lines start with `/`, so the method regex won't match).
if (line.includes("/*") && !line.includes("*/")) {
inBlockComment = true;
continue;
}
if (/^\}/.test(line)) {
closed = true;
break;
}
// Method-name match: a TS identifier (letters/digits/`_`/`$`, not starting
// with a digit) optionally followed by a generic clause (`method<T>(`), then
// the opening paren of the signature.
const m = /^\s*([A-Za-z_$][A-Za-z0-9_$]*)\s*(?:<[^>]*>)?\(/.exec(line);
if (m) methods.push(m[1]);
}
assert.ok(
closed,
`Did not find the closing brace of DocmostClientLike in ${loaderPath}.`,
);
assert.ok(
methods.length > 0,
`Parsed zero methods from DocmostClientLike in ${loaderPath} — the parser ` +
`is likely out of date with the interface formatting.`,
);
return methods;
}
// The point of the guard is to protect the DocmostClientLike mirror <-> client.ts
// link, but HOST_CONTRACT_METHODS is itself a HAND-COPY of that interface kept in
// sync manually. The list<->interface link must be tested too: a method consumed
// by the adapter and added to DocmostClientLike but forgotten here (or removed
// from the interface but left here) would otherwise escape both the server
// typecheck (pkg emits no .d.ts) and the first test above (name not in the list).
// Assert the two agree BOTH ways.
test("HOST_CONTRACT_METHODS exactly mirrors the server's DocmostClientLike interface", () => {
const interfaceMethods = parseDocmostClientLikeMethods();
assert.deepEqual(
[...HOST_CONTRACT_METHODS].sort(),
[...interfaceMethods].sort(),
`HOST_CONTRACT_METHODS has drifted from the DocmostClientLike interface in ` +
`apps/server/src/core/ai-chat/tools/docmost-client.loader.ts. Add/remove ` +
`method names in HOST_CONTRACT_METHODS so it lists EXACTLY the methods ` +
`declared in that interface (both directions are checked).`,
);
});

View File

@@ -0,0 +1,111 @@
// applyAnchorInDoc — first-match / ambiguity / boundary behavior.
//
// comment-anchor.test.mjs already covers the core apply paths (single-node
// match, spanning adjacent text nodes, code/italic boundary mark preservation,
// smart-quote normalization, no-match-no-mutation, pre-existing comment mark
// replacement, nested-list DFS). This file focuses on the SELECTION/RESOLUTION
// behavior those tests don't pin down: which occurrence/block wins when a
// selection appears more than once, sub-word ranges, and the run boundary
// created by a non-text inline node.
import { test } from "node:test";
import assert from "node:assert/strict";
import { applyAnchorInDoc, canAnchorInDoc } from "../../build/lib/comment-anchor.js";
const commentMark = (node) =>
(Array.isArray(node.marks) ? node.marks : []).find((m) => m && m.type === "comment") || null;
const paragraphDoc = (content) => ({ type: "doc", content: [{ type: "paragraph", content }] });
// ---------------------------------------------------------------------------
// Document order: when two separate blocks both contain the selection, only the
// FIRST block (DFS document order) is anchored; the second is left untouched.
// ---------------------------------------------------------------------------
test("anchors only the FIRST block when the selection occurs in two blocks", () => {
const doc = {
type: "doc",
content: [
{ type: "paragraph", content: [{ type: "text", text: "first target here" }] },
{ type: "paragraph", content: [{ type: "text", text: "second target here" }] },
],
};
assert.equal(applyAnchorInDoc(doc, "target", "C"), true);
const marked0 = doc.content[0].content.filter((p) => commentMark(p));
const marked1 = doc.content[1].content.filter((p) => commentMark(p));
assert.equal(marked0.length, 1, "first block is anchored");
assert.equal(marked0[0].text, "target");
assert.equal(marked1.length, 0, "second block is left untouched");
});
// ---------------------------------------------------------------------------
// Ambiguity within one block: indexOf finds the FIRST occurrence, so only the
// first "ab" is marked; the later occurrences stay in one unmarked fragment.
// ---------------------------------------------------------------------------
test("anchors only the FIRST occurrence within a block (ambiguous selection)", () => {
const doc = paragraphDoc([{ type: "text", text: "ab ab ab" }]);
assert.equal(applyAnchorInDoc(doc, "ab", "C"), true);
const parts = doc.content[0].content;
assert.equal(parts.length, 2, "split into [marked, rest]");
assert.equal(parts[0].text, "ab");
assert.ok(commentMark(parts[0]), "first occurrence is marked");
assert.equal(parts[1].text, " ab ab");
assert.equal(commentMark(parts[1]), null, "later occurrences are not marked");
});
// ---------------------------------------------------------------------------
// Sub-word range: a selection that is a substring inside a single text node is
// spliced into before / marked / after, marking exactly the matched characters.
// ---------------------------------------------------------------------------
test("anchors a sub-word range inside a single text node", () => {
const doc = paragraphDoc([{ type: "text", text: "Hello" }]);
assert.equal(applyAnchorInDoc(doc, "ell", "C"), true);
const parts = doc.content[0].content;
assert.deepEqual(parts.map((p) => p.text), ["H", "ell", "o"]);
assert.equal(commentMark(parts[0]), null);
assert.ok(commentMark(parts[1]), "only the matched substring is marked");
assert.equal(commentMark(parts[2]), null);
});
// ---------------------------------------------------------------------------
// A non-text inline node (hardBreak) breaks the matching run: a selection that
// would span the break cannot match, but one wholly inside a run still does.
// ---------------------------------------------------------------------------
test("a non-text inline node breaks the run: cross-break selection does not match", () => {
const make = () =>
paragraphDoc([
{ type: "text", text: "foo" },
{ type: "hardBreak" },
{ type: "text", text: "bar" },
]);
// "foobar" straddles the hardBreak -> no match, no mutation.
const docA = make();
const before = JSON.stringify(docA);
assert.equal(canAnchorInDoc(docA, "foobar"), false);
assert.equal(applyAnchorInDoc(docA, "foobar", "C"), false);
assert.equal(JSON.stringify(docA), before, "failed match must not mutate");
// "foo" lives entirely in the first run -> matches and is marked; the
// hardBreak node is preserved untouched.
const docB = make();
assert.equal(applyAnchorInDoc(docB, "foo", "C"), true);
const parts = docB.content[0].content;
assert.equal(parts[0].text, "foo");
assert.ok(commentMark(parts[0]));
assert.equal(parts[1].type, "hardBreak", "the inline atom is preserved");
assert.equal(parts[2].text, "bar");
assert.equal(commentMark(parts[2]), null);
});
// ---------------------------------------------------------------------------
// A whitespace-only selection normalizes to empty and never anchors.
// ---------------------------------------------------------------------------
test("a whitespace-only selection does not anchor and does not mutate", () => {
const doc = paragraphDoc([{ type: "text", text: "hello world" }]);
const before = JSON.stringify(doc);
assert.equal(canAnchorInDoc(doc, " "), false);
assert.equal(applyAnchorInDoc(doc, " ", "C"), false);
assert.equal(JSON.stringify(doc), before);
});

View File

@@ -149,3 +149,37 @@ test("empty task item still emits its marker", () => {
assert.equal(convertProseMirrorToMarkdown(input), "- [ ]\n- [x]");
});
// Image captions (issue #221). An image WITHOUT a caption stays the lossy-free
// `![alt](src)`; WITH a caption it is emitted as a raw <img data-caption>
// wrapped in a block <div> (symmetric to video) so the round-trip md -> html ->
// json restores the caption via the image extension's parseHTML.
test("image without a caption emits plain ![alt](src)", () => {
const input = doc({
type: "image",
attrs: { src: "/files/a.png", alt: "cat" },
});
assert.equal(convertProseMirrorToMarkdown(input), "![cat](/files/a.png)");
});
test("image with a caption emits a raw <img data-caption> in a block div", () => {
const input = doc({
type: "image",
attrs: { src: "/files/a.png", alt: "cat", caption: "A grey cat" },
});
assert.equal(
convertProseMirrorToMarkdown(input),
'<div><img src="/files/a.png" alt="cat" data-caption="A grey cat"></div>',
);
});
test("image caption escapes & and \" in the data-caption attribute", () => {
const input = doc({
type: "image",
attrs: { src: "/files/a.png", caption: 'Tom & "Jerry"' },
});
assert.equal(
convertProseMirrorToMarkdown(input),
'<div><img src="/files/a.png" data-caption="Tom &amp; &quot;Jerry&quot;"></div>',
);
});

View File

@@ -0,0 +1,135 @@
// Extra media round-trip coverage (issue #244), complementing
// media-roundtrip.test.mjs.
//
// The existing media-roundtrip.test.mjs already asserts that video, youtube,
// embed, excalidraw, audio and pdf SURVIVE a PM -> markdown -> PM round-trip and
// keeps their identifying src / provider / name / attachmentId. It does NOT,
// however, exercise:
// * the `drawio` node (a distinct schema node that shares the excalidraw
// converter case) — not covered at all;
// * the dimension / layout attributes (width, height, align) that ride in
// data-* attributes — exactly where a converter<->schema mismatch silently
// drops a value while the node itself survives;
// * attribute escaping for a src containing `"` (escapeAttr) — a malformed
// value here would either break the round-trip or inject HTML.
//
// These are the gaps this file locks down.
import { test } from "node:test";
import assert from "node:assert/strict";
import { convertProseMirrorToMarkdown } from "../../build/lib/markdown-converter.js";
import { markdownToProseMirror } from "../../build/lib/collaboration.js";
const doc = (...content) => ({ type: "doc", content });
const findAll = (node, type, acc = []) => {
if (!node || typeof node !== "object") return acc;
if (node.type === type) acc.push(node);
for (const c of node.content || []) findAll(c, type, acc);
return acc;
};
// PM node -> markdown -> PM; return both the markdown and the matching nodes.
const roundtrip = async (node, type) => {
const md = convertProseMirrorToMarkdown(doc(node));
const pm = await markdownToProseMirror(md);
return { md, found: findAll(pm, type) };
};
// ---------------------------------------------------------------------------
// drawio: a separate schema node sharing the excalidraw converter case. Not
// covered by the existing file at all, so guard its full round-trip here.
// ---------------------------------------------------------------------------
test("round-trip: drawio diagram survives with src, title, dimensions, align, attachmentId", async () => {
const { md, found } = await roundtrip(
{
type: "drawio",
attrs: {
src: "/api/files/d.drawio",
title: "Flow",
width: 400,
height: 300,
align: "left",
attachmentId: "dz1",
},
},
"drawio",
);
// The converter must emit the schema-matching div[data-type="drawio"].
assert.match(md, /data-type="drawio"/);
assert.equal(found.length, 1, "drawio node must survive the round-trip");
const a = found[0].attrs;
assert.equal(a.src, "/api/files/d.drawio");
assert.equal(a.title, "Flow");
assert.equal(a.attachmentId, "dz1");
assert.equal(a.align, "left");
// Numeric dimensions come back as strings via the schema parseHTML.
assert.equal(String(a.width), "400");
assert.equal(String(a.height), "300");
});
// ---------------------------------------------------------------------------
// Dimension + align attrs ride in data-* (or width/height) attributes. The
// existing file checks only src/provider/name/attachmentId, so a dropped
// width/height/align would pass there but fail here.
// ---------------------------------------------------------------------------
test("round-trip: youtube preserves width/height/align (data-* attrs)", async () => {
const { found } = await roundtrip(
{ type: "youtube", attrs: { src: "https://youtube.com/watch?v=x", width: 560, height: 315, align: "left" } },
"youtube",
);
assert.equal(found.length, 1);
const a = found[0].attrs;
assert.equal(String(a.width), "560");
assert.equal(String(a.height), "315");
assert.equal(a.align, "left");
});
test("round-trip: embed preserves provider, width/height and align", async () => {
const { found } = await roundtrip(
{ type: "embed", attrs: { src: "https://e.com/x", provider: "iframe", width: 600, height: 480, align: "right" } },
"embed",
);
assert.equal(found.length, 1);
const a = found[0].attrs;
assert.equal(a.provider, "iframe");
assert.equal(String(a.width), "600");
assert.equal(String(a.height), "480");
assert.equal(a.align, "right");
});
test("round-trip: video preserves width/height and align (data-align)", async () => {
const { found } = await roundtrip(
{ type: "video", attrs: { src: "/api/files/v.mp4", attachmentId: "att1", width: 640, height: 360, align: "right" } },
"video",
);
assert.equal(found.length, 1);
const a = found[0].attrs;
assert.equal(String(a.width), "640");
assert.equal(String(a.height), "360");
assert.equal(a.align, "right");
});
test("round-trip: pdf preserves width/height (standard attrs) plus name", async () => {
const { found } = await roundtrip(
{ type: "pdf", attrs: { src: "/api/files/x.pdf", name: "x.pdf", attachmentId: "a4", width: 700, height: 900 } },
"pdf",
);
assert.equal(found.length, 1);
const a = found[0].attrs;
assert.equal(a.name, "x.pdf");
assert.equal(String(a.width), "700");
assert.equal(String(a.height), "900");
});
// ---------------------------------------------------------------------------
// Escaping: a src containing a double quote must survive the attribute-quoted
// HTML emission (escapeAttr) and re-parse to the exact original value, with no
// node loss and no HTML injection.
// ---------------------------------------------------------------------------
test("round-trip: a src containing a double quote is escaped and recovered intact", async () => {
const tricky = 'https://e.com/x?a="b"&c=1';
const { found } = await roundtrip({ type: "youtube", attrs: { src: tricky } }, "youtube");
assert.equal(found.length, 1, "node must survive a quote-bearing src");
assert.equal(found[0].attrs.src, tricky, "the exact src is recovered");
});

View File

@@ -142,3 +142,31 @@ test("round-trip: pdf node survives markdown export with src + name + attachment
assert.equal(found[0].attrs?.name, "x.pdf");
assert.equal(found[0].attrs?.attachmentId, "a4");
});
// The converter emits captioned images as a raw <img data-caption="...">; for
// the caption to survive the PM -> markdown -> PM round-trip the docmost-schema
// Image node must parse data-caption back into the `caption` attr. Without that
// (stock @tiptap/extension-image), the caption is silently lost — these guard
// the "lossless" claim.
test("round-trip: image caption survives markdown export (data-caption restored)", async () => {
const found = await roundtrip(
{ type: "image", attrs: { src: "/api/files/cat.png", alt: "cat", caption: "A grey cat" } },
"image",
);
assert.equal(found.length, 1, "image node should survive");
assert.equal(found[0].attrs?.src, "/api/files/cat.png");
assert.equal(found[0].attrs?.caption, "A grey cat", "caption must round-trip");
});
test("round-trip: image caption with special chars survives markdown export", async () => {
const found = await roundtrip(
{ type: "image", attrs: { src: "/api/files/cat.png", caption: 'Tom & "Jerry"' } },
"image",
);
assert.equal(found.length, 1, "image node should survive");
assert.equal(
found[0].attrs?.caption,
'Tom & "Jerry"',
"special-char caption must round-trip unescaped",
);
});

View File

@@ -0,0 +1,139 @@
// CONTRACT / DRIFT GUARD: mcp diff vs the vendored editor-ext recreate-transform.
//
// packages/mcp/src/lib/diff.ts computes its document diff with
// `recreateTransform` from the published @fellow/prosemirror-recreate-transform
// package. Docmost's in-app history editor computes the SAME diff with its own
// vendored copy at
// packages/editor-ext/src/lib/recreate-transform/recreateTransform.ts.
// diff.ts's header comment claims the two are "identical" — if they ever drift,
// the headless mcp diff would stop matching what a user sees in the app.
//
// This test guards that claim two ways, on representative doc pairs, using the
// EXACT options diff.ts passes (complexSteps:false, wordDiffs:true,
// simplifyDiff:true):
// 1. invariant: each implementation's transform reproduces the target doc
// (apply(diff) == target);
// 2. cross-copy parity: both implementations emit the SAME step sequence, so a
// behavioral divergence between the two copies fails this test.
//
// The vendored copy is TypeScript, so it is transpiled to CommonJS at test time
// and required directly — the test runs the ACTUAL vendored source, not a stand-in.
import { test, before } from "node:test";
import assert from "node:assert/strict";
import ts from "typescript";
import fs from "node:fs";
import path from "node:path";
import { createRequire } from "node:module";
import { fileURLToPath } from "node:url";
import { recreateTransform as fellowRecreate } from "@fellow/prosemirror-recreate-transform";
import { Node } from "@tiptap/pm/model";
import { docmostSchema } from "../../build/lib/docmost-schema.js";
const require = createRequire(import.meta.url);
const HERE = path.dirname(fileURLToPath(import.meta.url));
// .../packages/mcp/test/unit -> repo packages root.
const PACKAGES = path.resolve(HERE, "..", "..", "..");
const VENDOR_SRC = path.join(
PACKAGES,
"editor-ext",
"src",
"lib",
"recreate-transform",
);
// Emit transpiled CJS under mcp/build so Node resolves the hoisted deps
// (@tiptap/pm, rfc6902, diff) up the directory tree exactly as diff.js does.
const VENDOR_OUT = path.resolve(HERE, "..", "..", "build", "_vendored_editor_ext");
// The exact options the mcp diff pipeline uses (diff.ts).
const DIFF_OPTS = { complexSteps: false, wordDiffs: true, simplifyDiff: true };
let vendoredRecreate;
before(() => {
assert.ok(
fs.existsSync(VENDOR_SRC),
`vendored recreate-transform sources missing at ${VENDOR_SRC}`,
);
fs.rmSync(VENDOR_OUT, { recursive: true, force: true });
fs.mkdirSync(VENDOR_OUT, { recursive: true });
// Mark the output as CommonJS so relative `require("./x")` resolves to x.js.
fs.writeFileSync(
path.join(VENDOR_OUT, "package.json"),
JSON.stringify({ type: "commonjs" }),
);
for (const f of fs.readdirSync(VENDOR_SRC)) {
if (!f.endsWith(".ts")) continue;
const code = fs.readFileSync(path.join(VENDOR_SRC, f), "utf8");
const out = ts.transpileModule(code, {
compilerOptions: {
module: ts.ModuleKind.CommonJS,
target: ts.ScriptTarget.ES2020,
},
});
fs.writeFileSync(path.join(VENDOR_OUT, f.replace(/\.ts$/, ".js")), out.outputText);
}
vendoredRecreate = require(path.join(VENDOR_OUT, "index.js")).recreateTransform;
assert.equal(typeof vendoredRecreate, "function", "vendored recreateTransform loaded");
});
// ---------------------------------------------------------------------------
// Builders + representative doc pairs covering the diff shapes diff.ts handles.
// ---------------------------------------------------------------------------
const t = (text, marks) => (marks ? { type: "text", text, marks } : { type: "text", text });
const para = (...c) => ({ type: "paragraph", content: c });
const doc = (...c) => ({ type: "doc", content: c });
const PAIRS = [
// word inserted mid-sentence
["insert word", doc(para(t("Hello world"))), doc(para(t("Hello brave world")))],
// whole block deleted
["delete block", doc(para(t("keep this")), para(t("remove this"))), doc(para(t("keep this")))],
// word removed mid-sentence
["delete word", doc(para(t("one two three"))), doc(para(t("one three")))],
// pure mark addition (complexSteps:false treats it as a content step)
["add mark", doc(para(t("plain"))), doc(para(t("plain", [{ type: "bold" }])))],
// two blocks swapped (reorder)
["reorder blocks", doc(para(t("a")), para(t("b"))), doc(para(t("b")), para(t("a")))],
// structural insert: an image node appears
[
"insert image",
doc(para(t("caption"))),
doc(para(t("caption")), { type: "image", attrs: { src: "/api/files/a.png", attachmentId: "i1" } }),
],
];
const stepsJSON = (tr) => JSON.stringify(tr.steps.map((s) => s.toJSON()));
for (const [label, fromJSON, toJSON] of PAIRS) {
test(`invariant: @fellow recreateTransform reproduces the target (${label})`, () => {
const from = Node.fromJSON(docmostSchema, fromJSON);
const to = Node.fromJSON(docmostSchema, toJSON);
const tr = fellowRecreate(from, to, DIFF_OPTS);
// apply(diff) == target, comparing schema-normalized JSON on both sides.
assert.equal(JSON.stringify(tr.doc.toJSON()), JSON.stringify(to.toJSON()));
});
test(`drift: @fellow and vendored editor-ext emit identical steps (${label})`, () => {
const mk = () => [
Node.fromJSON(docmostSchema, fromJSON),
Node.fromJSON(docmostSchema, toJSON),
];
const [fA, tA] = mk();
const [fB, tB] = mk();
const trFellow = fellowRecreate(fA, tA, DIFF_OPTS);
const trVendor = vendoredRecreate(fB, tB, DIFF_OPTS);
// Both must reach the same target...
const target = JSON.stringify(tA.toJSON());
assert.equal(JSON.stringify(trFellow.doc.toJSON()), target, "fellow reaches target");
assert.equal(JSON.stringify(trVendor.doc.toJSON()), target, "vendored reaches target");
// ...and, critically, via the SAME step sequence. A divergence in the two
// recreate-transform copies' algorithm would change the steps and fail here.
assert.equal(
stepsJSON(trVendor),
stepsJSON(trFellow),
`vendored editor-ext drifted from @fellow on "${label}"`,
);
});
}

View File

@@ -82,6 +82,24 @@ test("round-trip: image inside a column survives as an image node (not literal m
assert.ok(!JSON.stringify(out).includes("![pic]"), "image must not become literal markdown text");
});
test("round-trip: captioned image inside a column preserves its caption (imageToHtml branch)", async () => {
// A captioned image in a column is emitted via the imageToHtml helper (raw
// HTML container), a different path from the top-level image case. Special
// chars in the caption exercise attribute escaping on the way out and in.
const caption = 'Tom & "Jerry"';
const input = doc({
type: "columns",
content: [
{ type: "column", content: [{ type: "image", attrs: { src: "/api/files/a/p.png", alt: "pic", caption } }] },
{ type: "column", content: [para(text("right"))] },
],
});
const out = await roundtrip(input);
const imgs = findNodes(out, "image");
assert.equal(imgs.length, 1, "captioned image inside a column must survive");
assert.equal(imgs[0].attrs?.caption, caption, "caption (incl. special chars) must be preserved");
});
test("round-trip: blockquote inside a column survives as a blockquote node", async () => {
const input = doc({
type: "columns",