293348f9dc4e77458d44b66db83f882dd813aebe
25 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e04afee629 |
test(#260): cover replaceImage's UUID lock-key invariant; drop dead cache line
Reviewer round 1 on the #260 collab-doc-name fix: - F1: replaceImage is the one path where the resolved UUID gates BOTH the collab-doc open AND the per-page mutex key (withPageLock(pageUuid)). Add a deterministic test to resolve-page-id-collab-doc-name.test.mjs: it gates /files/upload so replaceImage parks mid-upload holding its lock, asserts the doc opened as page.<uuid> (never page.<slug>), and probes the SHARED page-lock chain — a withPageLock(UUID) probe must stay blocked while replaceImage holds it (with a free-key probe as a non-vacuity guard). The test fails if the lock key is reverted to the slugId (verified). - F2: drop the dead `pageIdCache.set(uuid, uuid)` — resolvePageId returns on the isUuid() short-circuit before the cache is ever read with a uuid key, so only slugId->uuid entries are stored/read. Comment corrected to match. MCP suite 430/430, tsc 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
3b80285d57 |
fix(#260): open MCP collab docs by canonical UUID (slugId doc-name split)
Real root cause of the silent MCP edit loss: the web editor always opens the
collaboration document by the page UUID (`page.${page.id}`), but the MCP
opened it by the agent-supplied id — usually a slugId — so `page.${pageId}`
became `page.<slugId>`. For one DB page that is TWO independent Yjs documents;
both persist to the same `pages` row (findById/updatePage resolve id or
slugId), so the human tab's debounced store overwrites the agent edit
(last-store-wins) — gone after reload, never shown live. The slugId doc also
made the server's transclusion sync + embedding reindex throw Postgres 22P02.
Fix:
- MCP (primary): resolvePageId(pageId) returns the canonical UUID — a UUID
short-circuits with no network call, a slugId resolves once via getPageRaw
and is cached both ways. Every collab-write path (mutatePageContent /
updatePageContentRealtime / replacePageContent and the mutate/replace/
unlocked seams) now opens by the resolved UUID, so the MCP and the editor
share ONE Yjs doc. replaceImage's whole-operation page lock also keys on the
UUID so it serializes against the other (now-UUID-keyed) writes.
- Server (defense + kills the 22P02 noise): onStoreDocument passes the resolved
page.id — not the raw doc-name id — to syncTransclusion, the embedding queue,
the mention-notification job, addContributors, and the in-tx history read.
Content store and the empty-guard are untouched.
Tests: a new MCP test stands up a real Hocuspocus server and asserts a slugId
input opens `page.<uuid>` (never `page.<slugId>`), with UUID short-circuit and
single-resolve caching; the server spec asserts the side-effects receive the
UUID for a `page.<slugId>` doc. closes #260
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
8842bc8bf3 |
fix(sandbox): address PR #250 follow-up review — XSS hardening, eviction reconcile, doc sync (#243)
Security (must-fix):
- sandbox.controller: the anonymous GET /api/sb/:id response now sets
X-Content-Type-Options: nosniff, a restrictive CSP, and Content-Disposition=
attachment for any mime outside a raster-image allowlist (png/jpeg/gif/webp/
avif). entry.mime is attacker-controlled, so an evil.svg/evil.html could
otherwise execute script inline on the Docmost origin (stored XSS). Mirrors
the public attachment route's hardening.
Stability:
- client.stashPage: reconcile mirrors AFTER the final document put, not only
before it. The doc blob is the newest entry and FIFO eviction drops the
oldest = this stash's own images, so the stored doc could reference an
evicted blob (consumer 404) and over-report images.mirrored. A bounded loop
now reverts doc-put-evicted mirrors, drops the stale doc blob, and re-puts
until stable. Regenerated packages/mcp/build/.
- sandbox.controller: emit Cache-Control on the 304 branch too (ttlSeconds is
computed before the conditional check).
Docs:
- Bump the MCP tool count 39 -> 40 across all READMEs and AGENTS.md (the
registry now exposes exactly 40 tools).
Refactor:
- SandboxStore.asSink() centralizes the {put,has,evict} sink + uri<->id
mapping; the embedded-MCP and in-app agent-tools wiring sites share it.
Tests:
- security headers (inline vs attachment, nosniff, CSP), 304 Cache-Control,
putAndLink URL form, has()/remove(), asSink() round-trip, getSandboxPublicUrl
(trailing-slash trim + APP_URL fallback), and a stash test where the doc put
itself evicts a mirrored image.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|
|
6eb335d5e3 |
fix(sandbox): address PR #250 review — SSRF guard, eviction safety, cleanup (#243)
Security: - stash_page: reject path-traversal / percent-encoded srcs before the authed loopback fetch (resolveInternalFilePath), closing an SSRF/exfiltration hole where a crafted node.attrs.src could read an arbitrary internal GET endpoint into the anonymous sandbox. Stability: - stash_page: revert + recount mirrors FIFO-evicted by a later put in the same stash (no dangling sandbox refs, honest images.mirrored/failed); free image blobs if the final document put throws. - Reject/clamp non-positive SANDBOX_TTL_MS to the 1h default (warn once). - Log mirror failures unconditionally (console.warn, no blob bodies). Cleanup / architecture: - Remove dead expiresAt from SandboxPutResult. - Centralize the /api/sb route in SANDBOX_ROUTE_SEGMENT/SANDBOX_API_PATH and move URL composition into SandboxStore.putAndLink; drop the duplicated sink closures and the now-unused EnvironmentService injection from McpService and AiChatToolsService. - Un-export isInternalFileUrl; document the process-local (instance-bound) sandbox limitation in the tool description and .env.example. Docs/tests: - README/README.ru: 38 -> 39 tools + stash_page entry. - Add traversal/normalize/recursion unit tests, stash self-eviction + doc-put-throw + empty/octet-stream mock tests, controller If-None-Match (wildcard/weak/list) + Cache-Control tests, and SANDBOX_TTL_MS validation tests. Regenerate packages/mcp/build. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
2fe4ca8537 |
feat(sandbox): in-RAM blob sandbox for out-of-band page transfer (#243)
Add an ephemeral, process-local blob store so the in-app agent (and the
embedded MCP) can hand a large page document and its images to an external
consumer WITHOUT routing the bytes through the model context or Docmost auth.
- SandboxStore (@Injectable singleton): Map<uuid,{buf,mime,sha256,expiresAt}>
in RAM only. put() picks a per-blob cap by mime (image vs doc), enforces a
total-bytes RAM guard with oldest-first eviction, and stamps a TTL; get()
lazily expires. sha256 computed at put() doubles as the strong ETag. An
unref'd sweep interval clears expired entries and is cleared on destroy.
- GET /api/sb/:uuid anonymous controller: serves raw bytes with Content-Type,
Content-Length and ETag=sha256; 404 on missing/expired/non-UUID (anti-
traversal), 304 on a matching If-None-Match. No tokens, no 401 — the
capability is the unguessable UUID + short TTL + TLS. Auth-exempt the same
way as /api/files/public (no JwtAuthGuard) plus an /api/sb entry in main.ts's
workspace-resolution preHandler so a remote consumer with no workspace host
is not rejected.
- stash_page tool in both layers (MCP resource_link + in-app {uri,size,sha256,
images}). client.stashPage serializes the get_page_json shape, mirrors every
INTERNAL file/image src (type-agnostic, covers drawio/excalidraw/video/file)
into the sandbox under Docmost auth and rewrites src to the sandbox URL;
external http(s) srcs are left untouched; dedup by src; a failed image fetch
is counted, never aborts the doc.
- SANDBOX_PUBLIC_URL / SANDBOX_TTL_MS / SANDBOX_MAX_BYTES /
SANDBOX_MAX_IMAGE_BYTES / SANDBOX_MAX_TOTAL_BYTES wired through the
environment service + validation + .env.example.
- SandboxModule (@Global) provides the shared store to the controller,
McpService and AiChatToolsService (same instance for put and get).
Tests: SandboxStore (round-trip, sha256, TTL lazy + sweep, caps, eviction),
SandboxController (200+ETag+CT+CL, 404 missing/expired/non-UUID, 304), and a
mock-HTTP stashPage test (mirror+rewrite internal, keep external, dedup, failed
image counted, returns only a link). Interoperates with the vvzvlad/habr-mcp
consumer's anonymous-GET + sha256-ETag + resource_link contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
c4ed4a4855 |
fix(footnotes): strip bare definitions on rebuild; MCP full-doc + zip-import canonicalize tests (#228)
Review #6 (approve-with-comments) follow-ups: 1. canonicalize step 7 now strips bare footnoteDefinitions at ANY depth (stripFootnoteDefinitionsDeep), not just footnotesList, in BOTH copies. A definition hand-authored outside a list (e.g. nested in a callout via a raw-JSON write path) was left in place while a copy was also added to the rebuilt list -> duplicate, idempotent, self-perpetuating. Runs only in the rebuild path (after the lists are stripped); the fast-path / placement-keep branch is untouched. Added a shared-corpus case (bare def nested in a callout) to pin it in both mirrors. 2. markdown-clipboard: removed the dead top-level footnoteReference check in canonicalizePastedFootnotes (an inline atom is never a top-level slice child; only the descendants scan can find it). Test coverage: 4. New MCP binding tests (full-doc-write-canonicalize.test.mjs): update_page_json and copy_page_content canonicalize the persisted full doc, asserted via a new `replacePage` seam (symmetric to the existing `mutatePage` seam) so no live collab socket is needed. Routed both writers through the seam. 5. New server spec (file-import-task.service.footnote-canonicalize.spec.ts): the zip-import path (processGenericImport) canonicalizes footnotes — real markdown->HTML->JSON via a real ImportService over a temp-dir .md file, DB trx stubbed to capture the persisted page content. FileImportTaskService had no spec before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
9c1f952b2f |
fix(footnotes): guard insert against nested/bare definitions, skip definitions-only paste, doc + reorder fixes (#228)
Must-fix: - insertInlineFootnote could glue a footnoteReference inside an EXISTING definition (nested footnotesList, or a bare footnoteDefinition with no list wrapper), which canonicalize then dropped as an orphan — silently losing the definition's prose. Now: (a) the body/notes boundary is computed from the first top-level block that IS or CONTAINS (recursively) a footnotesList/ footnoteDefinition, not just a top-level list; and (b) the insertNodesAfterAnchor core skips footnotesList/footnoteDefinition subtrees entirely (skipSubtreeTypes), so an anchor whose only match is inside a definition -> inserted:false (clean abort, no write). Added tests: nested-definition, bare-definition, and body-before-nested-list-still-inserts. - editor-ext footnote-canonicalize header listed `markdownToProseMirror` among the canonicalizing MCP paths; it is the NON-canonicalizing primitive. Replaced with `markdownToProseMirrorCanonical` (+ note that the plain primitive is for comment bodies) and added copy_page_content. - Client paste: canonicalizePastedFootnotes now skips a definitions-ONLY paste (no footnoteReference anywhere) — canonicalizing it would strip the reference-less list and yield an EMPTY paste. Added a test. Suggestions: - docmost_transform now runs validateDocStructure/validateDocUrls on the RAW transform output BEFORE canonicalizeFootnotes (mirrors updatePageJson), so a too-deep doc gives the intended max-depth error instead of a stack overflow. - docmost_transform tool description now states the RESULT is footnote-canonical (dryRun diff may show tidy-ups; idempotent after first run). - insertFootnote: dropped the dead `result ? … : undefined` ternaries and the `as any` casts (result is always set by the time we return; the not-found path throws and aborts mutatePage). `const r = result!;`. Tests / architecture: - Added a LIVE-plugin golden case: the real footnoteSyncPlugin leaves a list with non-empty content after it in place, and canonicalize agrees (placement parity is now a driven property, not a hand-set expected). - Added generateFootnoteId uuidv7 shape + uniqueness test. - Item 9: added the ENFORCEMENT-RULE comments at the server parseProsemirrorContent and the MCP canonicalizer header (any NEW full-doc persist path MUST canonicalize; fragments/append/prepend and comment bodies MUST NOT). Kept per-call-site over a brittle grep CI test (the replace-vs-fragment + comment-vs-page nuance makes a single wrapper unsafe). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
3fd66b4245 |
fix(footnotes): don't canonicalize comment bodies (data loss); canonicalize only page write paths (#228)
Must-fix (REAL DATA LOSS): - markdownToProseMirror is reused for COMMENT bodies (createComment/updateComment). It unconditionally canonicalized, so a comment carrying a standalone footnote definition ([^1]: text with no matching reference) had its whole footnotesList stripped (referenceIds.length===0 -> stripFootnotesListsDeep) — the text vanished. Fix: markdownToProseMirror no longer canonicalizes (content-preserving primitive); a new markdownToProseMirrorCanonical wraps it for the PAGE write paths (markdown import via importPageMarkdown, update_page markdown via updatePageContentRealtime). Comment callers keep the non-canonicalizing primitive. Updated the now-false header comment and added create/update-comment inline notes. Added collaboration tests: comment path PRESERVES a reference-less definition; page path still drops it AND still reorders real footnotes. Updated the page-import canonicalization test to use the canonical variant. Suggestions / architecture: - #2: collapsed transforms.footnoteDefinition onto the shared makeFootnoteDefinition factory (adds only the inner paragraph block id); kept the dependency direction transforms -> footnote-authoring (no circular import, mirror stays pure). - #3: confirmed docmost_transform auto-canonicalization is documented (inline comment, tool description, CHANGELOG) — no code change. - #4: copyPageContent is a FULL-document write (replacePageContent of a type:"doc"); added a defensive canonicalizeFootnotes pass (no-op on already-canonical source). - CHANGELOG entry refined to list the FULL-document write paths (incl. copy_page_content) and to state canonicalization is NOT applied to comment bodies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
a77a0bc92b |
fix(footnotes): re-review #232 — refuse footnoteRef into codeBlock/definition, deep-strip nested lists, docs + cross-copy guard (#228)
Must-fix: - REAL BUG: insertInlineFootnote could splice a footnoteReference (inline atom) into a codeBlock or an existing footnoteDefinition, persisting a schema-invalid doc (insert_footnote skips validateDocStructure). Now the search is bounded to the BODY (before the first footnotesList) and the insertNodesAfterAnchor core refuses textblocks that can't hold the atom (codeBlock); when the only match is in such a place the insert returns inserted:false and the write aborts cleanly. Reachable via docmost_transform too. Added codeBlock / definition / fall-through tests. - Fixed the deepEqualJson doc comment in both copies: arrays are order-SENSITIVE (correctness depends on it), only object keys are order-insensitive. - README.ru.md MCP tool count 38 -> 39 (lines 36/47/63), matching README.md/AGENTS. - CHANGELOG [Unreleased] Added entry for insert_footnote + server-side footnote canonicalization on non-editor write paths (#228). Suggestions: - canonicalize step 5/7 now strips footnotesList at ANY depth (both copies), so a schema-valid list nested in a callout/blockquote can't leave duplicate defs. - Exclude the test-only footnote-corpus.ts fixture from the editor-ext build (tsconfig), so it no longer ships in dist/. - Removed the duplicate manual canonicalize cases from the MCP unit test (the shared corpus covers them via full deepEqual); kept idempotence + immutability. - insertInlineFootnote dedup key now keys off the inline array directly (footnoteContentKey({ content: inline })) instead of a throwaway node. Tests / architecture: - New client-wrapper test (#9): overrides a small mutatePage seam to assert the not-found path throws and persists NOTHING, and the success path shapes footnoteId/reused/message/verify and writes the right content. Fixed the misleading comment in footnote-write.test.mjs. - B: cross-copy corpus parity guard test (loads both corpora, asserts deep-equal) so a typo in one copy can't pass both suites green. - A: declined — the full-vs-fragment decision lives at the call site, so a prepareDocForPersist wrapper would be a bare alias for canonicalizeFootnotes; kept the existing per-call-site comments instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
30cb9d293c |
feat(footnotes): inline authoring + deterministic server-side canonicalization
Make footnotes author-inline: the agent/tool inserts a footnote at its point of use (anchor + text) and the numbering plus the bottom list are DERIVED deterministically server-side. The agent has no access to footnotesList and cannot desync — out-of-order lists, orphan definitions, and raw trailing [^id] blocks become structurally impossible. editor-ext: - canonicalizeFootnotes(docJSON) -> docJSON: a pure, EditorView-free port of footnoteSyncPlugin's end-state. Distinct reference ids in document order are the source of truth; exactly one trailing footnotesList holds one definition per referenced id in reference order (reusing the existing node or synthesizing an empty one); orphans dropped; duplicate definitions resolved deterministically (first wins, never lost); idempotent. - Unit tests + a golden parity suite: on every editor-reachable steady state the live footnoteSyncPlugin's JSON is a canonicalize no-op (byte-for-byte parity), and the canonicalizer additionally repairs the out-of-order list a non-editor write produces. mcp: - footnote-canonicalize.ts: behavioural mirror of the editor-ext canonicalizer (the MCP package is intentionally decoupled from the editor barrel, like footnote-lex/docmost-schema), plus footnoteContentKey for content dedup. - Auto-canonicalize on EVERY write path: markdownToProseMirror (fixes import ordering), update_page_json, and after every docmost_transform. Idempotent, so it is a no-op when footnotes are already canonical. - insert_footnote tool + insertInlineFootnote: anchor + markdown text -> a mark-safe footnoteReference and a content-dedup'd definition; the list and numbering are derived. Same-content footnotes reuse one number/definition. - canonicalizeFootnotes + insertInlineFootnote exposed as docmost_transform sandbox helpers. Tests: editor-ext 157 green; MCP 325 green; server + client tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
f80276d41a |
refactor(review): address PR #185 review (lease leak, tests, changelog, jsonb seam)
8-point multi-aspect review of the batch PR; security/regressions were clean. 1. Lease leak: the #180 reorder moved `toolsFor` (which leases external MCP clients, refCount+1) ahead of buildSystemPrompt + forUser, but the only release (closeExternalClients) was bound to the streamText callbacks. A throw in between leaked the lease (refCount stuck, undici sockets held until restart). Define closeExternalClients right after the lease and wrap buildSystemPrompt+forUser in try/catch that closes-then-rethrows. 2. Cover the patch_node/delete_node dup-id refusal (#159 #6): extract the guard into a pure `assertUnambiguousMatch` (node-ops) and unit-test 0/1/>1. 3. Regress the body-before-title order (#159 #10): mock-HTTP test (collab fails fast against a server with no WS upgrade) asserts /pages/update (title) is NEVER posted when the body write fails — for updatePage AND updatePageJson. 4. CHANGELOG [Unreleased]: #180, #168 (Added); #163 (Fixed). 5. Add the missing en-US i18n keys (Back to references / {{label}}). 6. Drop the duplicate content/empty/blank cases in ai-chat.prompt.spec.ts (they repeat the buildMcpToolingBlock unit tests); keep only sandwich placement + both-safety-copies. 7. CI Postgres pg16 -> pg18 (match docker-compose). 8. jsonb decode seam: shared `parseJsonbValue(value, guard)` in database/utils.ts holds the legacy double-encoding self-heal in one place; parseToolAllowlist / parseModelConfig keep only a type-guard. Verified: server build + 124 unit + 15 integration; mcp 311; prettier clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
8f1af676ba |
fix(mcp): write page body before title to avoid split-brain on failure (#159)
updatePage (markdown) and updatePageJson wrote the title via REST FIRST, then the body via collab. If the body write failed (e.g. a collab persist timeout), the page was left with the NEW title over its OLD body — a split-brain the tool reported as an error but never repaired (red-team finding #10). Reorder both: write the body first, and only set the title after the body has persisted. Now a body-write failure leaves the title untouched (no split-brain). A title write failing after a successful body is rarer (REST is fast) and leaves correct content under a stale title — the strictly lesser inconsistency — which is the same trade-off the issue's "atomic, or roll back the title" intends, without the fragility of a rollback write that could itself fail. No unit test: both paths require a live collab provider and the suite has no provider mock; the change is a pure reordering. All 306 mcp tests still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
fdaf20ca7b |
fix(mcp): refuse ambiguous patch_node/delete_node on duplicated ids (#159)
Docmost duplicates block ids on copy/paste, and copyPageContent writes the source document verbatim with the same ids. `patchNode`/`deleteNode` address a block by `attrs.id` via replaceNodeById/deleteNodeById, which act on EVERY node sharing the id — so a single patch_node/delete_node could silently replace/remove multiple unrelated blocks with no signal to the model (red-team finding #6). Guard both write paths: when more than one node matches the id, skip the write entirely (the transform returns null -> no mutation) and throw a clear "ambiguous id — N nodes share it" error so the model re-targets with a more specific anchor. Only an unambiguous single match is written; the 0-match and 1-match behavior is unchanged. The duplicate-count basis is covered by node-ops.test.mjs (replaceNodeById / deleteNodeById report count===2 for a 2-duplicate doc). The end-to-end guard is not unit-tested because patchNode/deleteNode require a live collab provider and the test suite has no provider mock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
a766672574 |
fix(mcp): replaceImage no longer yanks the cursor (#164)
`mutateLiveContentUnlocked` — the write path used by `replaceImage` — still did the pre-#152 destructive write (delete the whole fragment + applyUpdate a fresh Y.Doc), discarding every Yjs node id. y-prosemirror anchors the editor selection to those ids, so an open editor's cursor snapped to the document end on every image swap, exactly the #152 jump that the main write path no longer causes. Switch it to the same `applyDocToFragment(ydoc, newDoc)` structural diff (updateYFragment) as the main path, so unchanged nodes keep their ids and the live cursor stays put. It runs its own atomic transact, so the old explicit transact/delete is gone; the now-unused docmostExtensions import is dropped. Regression tests (cursor-stability suite): a sibling paragraph's RelativePosition survives a top-level image src/attachmentId swap, and an image nested in a callout, matching the shapes replaceImage produces. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
0e8af13122 |
test(footnotes): cover footnoteWarnings import plumbing + doc fixes (#169 second review)
Follow-up to the merged #166/#169. Addresses the second review pass (comment 1227): - footnoteWarnings plumbing: extract a single `footnoteWarningsField(markdown)` helper (footnote-analyze) and use it at all three call sites (create_page, update_page, import_page_markdown) so the field is attached identically. - New unit test footnote-warnings-import.test.mjs pins the contract that was uncovered: the field is present on problems / omitted on clean input, and the IMPORT path analyzes the BODY after the docmost:meta / docmost:comments blocks (a footnote-like token inside those JSON blocks must NOT warn; a real body marker must). Tested via the same pure composition the importer uses (footnoteWarningsField(parseDocmostMarkdown(full).body)) — no collab socket needed; a regression that analyzed fullMarkdown or skipped the body split would now go red. - footnote.marked.ts: correct the stale module header — it claimed "only definitions that have a matching reference are emitted", which was never true (orphan defs are emitted; the editor sync plugin reconciles). Now describes first-wins + reuse + sync reconciliation. - derive-id golden test: rename the describe from "(cross-package drift guard)" to "(deterministic-scheme pin)" — there is no second package to drift against. editor-ext 129, MCP 304 (+3), client+server tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
17e683a311 |
feat(footnotes): reuse semantics + import diagnostics (#166)
Footnotes were strict 1:1: a repeated `[^a]` reference was treated as a collision and re-id'd to `a__2`, and a reference with no definition synthesized its own empty one — so an agent-authored article with reused labels produced dozens of empty `kowiki__N` footnotes. Move to Pandoc REUSE semantics and add non-fatal import diagnostics. Reuse (core): - resolveCollisions (footnote-sync): repeated references sharing an id are REUSE (recorded once in document order, never re-id'd) — one number, one shared definition. Only a duplicate DEFINITION is re-id'd deterministically and, with no matching reference, dropped by the existing orphan policy (first-wins). CollisionPlan.refReids is now always empty (harmless no-op downstream). - extractFootnoteDefinitions (marked) and extractFootnotes (MCP): duplicate definition ids are FIRST-WINS (keep first, drop rest); reference markers are never rewritten. Removed the marker-rewriting and the now-dead deriveFootnoteId mirror + helpers from the MCP path. Import diagnostics: - New analyzeFootnotes() (MCP): fence-aware pure scan reporting dangling references, empty/duplicate definitions and `[^id]` markers inside table rows. - createPage / updatePage / importPageMarkdown now attach `footnoteWarnings` (only when non-empty) so an agent can fix its markup; the page is still created. Paste-reuse: - footnotePastePlugin remaps only ids the pasted slice DEFINES (a colliding definition); a pasted lone reference to an existing id keeps it (reuse). Tests: reuse/first-wins rewrites of footnote.test, footnote-markdown.test, footnote.marked.orphan.test and the MCP footnotes.test; new footnote-paste.test (editor-ext) and footnote-analyze.test (MCP). Deleted derive-id-parity.test.mjs (the MCP no longer derives ids; editor-ext's deriveFootnoteId keeps its own golden test). editor-ext 128, MCP 299, server roundtrip 2, client views 3, client+server tsc clean. Two review suggestions applied: corrected a stale "duplicated in MCP" comment and the dangling-reference warning wording. Note: the multi-backlink editor UI (a reused definition linking back to each of its references) is deferred to a follow-up — this PR delivers the data-integrity core (reuse + warnings + paste-reuse). Forward links and numbering already reuse correctly; the backlink currently targets the first reference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4201f0a313 |
feat(comments): make AI comments inline-only with robust anchoring
The in-app AI chat hardcoded type='page' and the shared createComment swallowed anchoring failures silently, so agent comments never got a text anchor/highlight. - Forbid page-type comments for the agent: top-level comments are always inline and require an exact `selection`; replies inherit the parent anchor (stored as the historical `page` type). - Throw and roll back the just-created comment when the selection cannot be anchored, instead of leaving an orphan unanchored comment. - Add comment-anchor module: text normalization (smart quotes, dashes, nbsp, collapsed whitespace) and matching across adjacent text nodes within a block, so selections crossing inline-code/bold/link anchor. - Update create_comment (MCP) and createComment (ai-chat) tool schemas and descriptions; add unit + mock-HTTP orchestration tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
1e7a306f96 |
feat(mcp): add hierarchical tree mode to list_pages
list_pages gains an opt-in `tree` parameter on both surfaces (the
@docmost/mcp server tool and the AI-chat agent tool), which share the
same DocmostClient.listPages. Default behavior (recent-by-updatedAt flat
list) is unchanged.
- client.ts: listPages(spaceId?, limit=50, tree=false); when tree is
true it requires spaceId (throws a specific error otherwise), walks the
sidebar tree via the existing bounded/cycle-safe enumerateSpacePages,
and returns a nested tree; limit is ignored in tree mode.
- lib/tree.ts: new pure buildPageTree() — lean nodes { id, slugId, title,
children? }, children sorted by position (code-unit order), orphans
promoted to roots, cycle-safe.
- index.ts + ai-chat-tools.service.ts: expose `tree` in the tool schemas
and descriptions; docmost-client.loader.ts: mirror the new signature.
- tests: add packages/mcp/test/unit/tree.test.mjs (nesting, ordering,
lean shape, orphan promotion, cycle/self-reference safety).
- rebuild @docmost/mcp (build/ is tracked and loaded at runtime).
|
||
|
|
a945b47749 |
fix(mcp): verifiable mutation results + refuse formatting edits in edit_page_text
edit_page_text reported "success" when asked to change formatting (e.g. remove strikethrough): the markdown-strip fallback matched the bare text, the replace preserved marks, and the tool returned success — so the agent believed it had fixed something that never changed. Two fixes, both in the shared @docmost/mcp DocmostClient so they reach BOTH the standalone MCP server and the in-app AI chat (which loads @docmost/mcp): - Verifiable result for every content mutator: mutatePageContent now computes a `verify` change-report (text inserted/deleted, blocks changed, per-mark-type delta, integrity/structure delta) via summarizeChange() and returns it on all mutators (incl. replaceImage via mutateLiveContentUnlocked). diffDocs is text-only, so the mark/structure delta is what surfaces formatting changes. - edit_page_text hard-refuses formatting edits: applyTextEdits rejects an edit whose find/replace differ only in markdown markers (via stripBalancedWrappers, which strips balanced wrappers/links without trimming whitespace/emoji, so plain-text edits like trailing-space trims, snake_case, math are NOT refused). A fully-refused batch errors instead of silently succeeding. Also updated the model-facing edit_page_text descriptions in BOTH tool layers (packages/mcp/src/index.ts and ai-chat-tools.service.ts) to drop the misleading "strip-and-retry tolerated" wording and point formatting changes to patch_node. New unit tests: test/unit/diff-verify.test.mjs, test/unit/json-edit-refuse.test.mjs. |
||
|
|
334a50f003 |
feat(mcp): fetch insert_image/replace_image sources from web URLs
The insert_image and replace_image MCP tools previously uploaded only local files (filePath), which an AI MCP client cannot provide — it has no access to the server filesystem. Replace filePath with a required imageUrl and download the image over http(s). - client.ts: add fetchRemoteImage(url, maxBytes) — http/https-only scheme allowlist, 20 MiB cap (maxContentLength + post-download length recheck), 30s timeout, Content-Type→MIME resolution with URL-extension fallback, filename derivation with canonical extension - client.ts: rewrite uploadImage(pageId, url) as URL-only; drop the local-file branch, imageMimeFromPath and the fs import; insertImage/ replaceImage now take a url - index.ts: drop filePath, add required imageUrl to both tools; update tool descriptions and SERVER_INSTRUCTIONS - README: document the web-URL behaviour |
||
|
|
afd2248a75 |
feat(ai-chat): tolerate markdown in edit_page_text/insert_node locators
Locators (edit_page_text `find`, insert_node `anchorText`) are matched against the document's plain text, so a model-supplied locator carrying markdown wrappers (**bold**, *italic*, `code`, [t](url)) or trailing emoji never matched and the edit/insert failed. Add stripInlineMarkdown() and a fallback: try the locator verbatim first (exact match wins, so literal asterisks/underscores still work), and only on zero matches retry with a markdown-stripped form. The ambiguity guard runs on the post-fallback count, and `replace` / inserted node content are never stripped, so no formatting is lost. Failed edits gain an atom-aware reason plus a bounded "closest block text" hint; the insert_node "anchor not found" error now points at plain-text anchors / anchorNodeId. New packages/mcp/src/lib/text-normalize.ts (+ unit tests); wired into json-edit.ts and node-ops.ts; tool descriptions updated. Tests: 212 pass. |
||
|
|
fc9088b74d |
fix(ai-chat): cross-mark text edits, partial batches, JSON-string node parity
edit_page_text (applyTextEdits) now matches at the inline-block level instead of
per text node, so a find/replace may cross bold/italic/link boundaries; the
replacement inherits marks from the unchanged common prefix/suffix via a diff
splice. Atom (non-text inline) slots can never be part of a match, making the
U+FFFC placeholder collision-safe, and inserted text never inherits an atom's
marks.
The edit batch is no longer all-or-nothing: applyTextEdits returns
{ doc, results, failed } and applies what it can; editPageText writes only on a
real change (no spurious history version for a no-op) and throws an aggregated,
actionable error only when nothing applied.
The AI-chat insert_node / patch_node / update_page_json tools now JSON.parse a
node/content argument that arrives as a string, matching the standalone MCP
server (this is what made insert_node fail under OpenAI tool calls).
Tool descriptions gain concrete ProseMirror examples and reflect the new
edit_page_text behavior. Adds/updates json-edit unit tests (183 pass).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
44b340dc1a |
feat(ai-chat): agent write tools, provenance wiring, chat panel + provider settings UI" -m "Backend:
- Add reversible write tools to the per-user agent toolset (page create/update/ move/soft-delete; comment reply + resolve), exposed under the user's JWT and enforced by Docmost CASL; no permanent/force delete (D3). - Non-spoofable agent provenance: sign actor/aiChatId into the access and collab tokens (TokenService), propagate via jwt.strategy onto the request, and set pages.last_updated_source/last_updated_ai_chat_id on REST create/update/move and comments.created_source/resolved_source/ai_chat_id. - packages/mcp: add an optional getCollabToken provider (content-edit provenance) and guard against empty tokens; service-account /mcp path unchanged. Frontend: - Admin 'AI / Models' settings section: provider/model/embedding/base URL, a write-only API key field, system prompt, and Test connection. - AI chat panel (useChat + DefaultChatTransport): conversation list, streamed messages, tool-call action log and page citations; header entry point gated on settings.ai.chat. Compile-verified (server nest build + client tsc/vite); not yet live-tested. Known gaps: history 'AI agent' badge (C3), vector RAG (D), external MCP (E); chat tool-card citation links pending a fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
683da7a4c5 |
feat(ai-chat): per-user AI agent backend — LLM config, read-only agent, provenance schema
WIP checkpoint of the gitmost AI-chat backend (plan stages A + B1 + B3a). The agent acts under the requesting user's JWT (Docmost CASL enforces page access); the external service-account /mcp endpoint is untouched. LLM provider config (A2-A4): - integrations/crypto: AES-256-GCM SecretBoxService (key derived from APP_SECRET, per-record salt/iv; clear error on rotation instead of crashing). - ai_provider_credentials table/repo/types: encrypted API key stored outside workspace settings/baseFields, write-only (never returned by any endpoint). - integrations/ai: per-workspace AI SDK v6 provider driver (openai/gemini/ollama), admin-gated GET(masked)/PATCH(write-only key)/Test endpoints; settings.ai.provider holds non-secret config incl. systemPrompt. Removed unused AI_* env getters (DB is the single source of truth). Chat module (A1, A5-A8): - ai_chats/ai_chat_messages repos (workspace-scoped, soft-delete, tsv never selected). - core/ai-chat: CRUD + POST /ai-chat/stream (Fastify hijack + AI SDK v6 pipeUIMessageStreamToResponse, abort on disconnect, persist user/assistant msgs). - Agent loop: streamText + stepCountIs(8); read tools searchPages/getPage via a per-request DocmostClient over loopback REST under the user's minted access token. - Gate settings.ai.chat (+ 503 when provider unconfigured); buildSystemPrompt with a non-removable safety/anti-prompt-injection framework. Per-user rate limit. Per-user auth (B1): - @docmost/mcp DocmostClient gains an additive getToken variant (carry a user JWT, re-fetch on 401) and exports DocmostClient; the email/password service-account path (external /mcp, stdio) is unchanged. Agent-edit provenance backbone (B3a): - Migration: pages/page_history (last_updated_source, last_updated_ai_chat_id) and comments (created_source, ai_chat_id, resolved_source). - Signed actor/aiChatId claim in the collab token; onAuthenticate propagates it, onStoreDocument writes it with a sticky agent marker, saveHistory copies it. Migrations auto-run on boot (additive). Write tools, frontend, RAG and external MCP servers are not in this checkpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
1f5987d6b0 |
feat(mcp): serve embedded community MCP server at /mcp
Replace the removed enterprise EE MCP (private apps/server/src/ee submodule,
license-gated /mcp route) with our docmost-mcp, vendored as an isolated ESM
workspace package and served by the server over HTTP — no enterprise license.
Backend:
- Add packages/mcp (@docmost/mcp): vendored docmost-mcp refactored into a
side-effect-free createDocmostMcpServer() factory (38 tools preserved),
stdio entry kept in stdio.ts, Streamable-HTTP session manager in http.ts.
- Add apps/server McpModule: @Post/@Get/@Delete('mcp') (served at /mcp via the
existing global-prefix exclude), @SkipTransform + reply.hijack to bridge raw
Fastify req/res into the SDK transport. The module dynamically imports the
ESM-only package from CommonJS via a Function-indirected import resolved with
require.resolve + file:// URL. Gated by the workspace ai.mcp toggle, a
service-account (MCP_DOCMOST_EMAIL/PASSWORD/API_URL) and optional MCP_TOKEN;
per-session idle eviction (MCP_SESSION_IDLE_MS).
- Drop the enterprise license check on mcpEnabled in workspace.service.
- Dockerfile: copy packages/mcp into the production image.
- .env.example: document MCP_DOCMOST_*, MCP_TOKEN, MCP_SESSION_IDLE_MS.
Frontend:
- Recreate the community "AI & MCP" workspace-settings panel (mcp-settings.tsx):
admin-only toggle on settings.ai.mcp with optimistic update, copyable
${APP_URL}/mcp URL; wired into workspace-settings page. Reuses existing i18n.
Fixes:
- Pin packages/mcp tiptap deps to 3.20.4 (matching the client) and inline
getStyleProperty, preventing a duplicate @tiptap/core@3.26.1 from leaking into
the client editor via pnpm shamefully-hoist (was breaking apps/client tsc).
|