PR #333 deliberately changed the canonical markdown export of callout
nodes to the Obsidian-native format ('> [!type]' + blockquote body,
pinned by packages/prosemirror-markdown unit tests); the importer still
parses both ':::type' fences and '> [!type]'. The get_page e2e assertion
was missed in that switch and still expected ':::info', failing the
e2e-mcp job on develop since 4369bbc5.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The in-app AI agent shipped all ~41 tool schemas on every model step. This
adds a two-tier catalog: core tools (frequent or one-line) stay always-active;
the rest are advertised as a compact catalog and their full schema is fetched
on demand via the loadTools meta-tool, wired through ai@6 prepareStep's
per-step activeTools.
- tools/tool-tiers.ts: CORE_TOOL_KEYS, INLINE_TOOL_TIERS, applyLoadTools,
catalog builders (+ tool-tiers.spec.ts, 13 cases).
- ai-chat.service.ts prepareAgentStep: returns activeTools =
[...CORE_TOOL_KEYS, loadTools, ...activatedTools]; per-turn activated Set.
- ai-chat.prompt.ts: buildToolCatalogBlock renders the deferred catalog.
- mcp/tool-specs.ts: tier + catalogLine metadata (external snake_case /mcp
transport unchanged).
- EnvironmentService.isAiChatDeferredToolsEnabled(): AI_CHAT_DEFERRED_TOOLS,
default ON per issue intent (kill-switch =false restores old behavior).
Gate: server ai-chat 631/631, tool-tiers 13/13, mcp 472/472, tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The RE2 swap narrowed the contract: regex:true rejects lookaround ((?=…)/(?<=…))
and backreferences (\1). The internal JSDoc was updated, but the AGENT-VISIBLE
tool-spec (the only text the agent reads at call time, single-sourced to both
transports) still said 'a JS regular expression' — so an agent would write a
lookahead/backref and hit an error. Updated the .description and the regex flag
.describe() to name RE2 (linear-time, ReDoS-safe), list that char classes / word
boundaries / anchors / quantifiers work while lookaround and backreferences do
NOT, and keep the 'invalid/unsupported regex -> clear error' note.
mcp: tsc clean; tool-specs / server-instructions / contract tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Maintainer decision on the escalated ReDoS fork: use re2. The regex path
compiled agent-supplied patterns with `new RegExp` and ran them synchronously in
the shared event-loop; a catastrophic-backtracking pattern (e.g. `(a+)+$`) hung
the whole Node backend for all users (the tool is in both transports incl. the
in-app apps/server agent), and size caps do NOT bound backtracking.
Switch the regex engine to re2 (Google RE2, linear-time, no backtracking):
- `new RE2(query, caseSensitive?'g':'gi')`. RE2 extends RegExp, so eachMatch and
the zero-length-match lastIndex guard are unchanged.
- Unsupported patterns are now a CLEAN error, not a hang: RE2 throws on invalid
syntax AND on the backtracking-only features it can't do (lookaround
(?=…)/(?<=…), backreferences \1) — caught at compile and returned as a clear
tool error telling the agent to rewrite without them.
- Removed MAX_CONTAINER_TEXT + the per-container slice (re2 is linear, so it's no
longer a ReDoS defense, and truncating risked silently dropping real matches in
a long container); kept MAX_PATTERN_LENGTH as a cheap query sanity cap.
- Verified: `(a+)+$` over 50k `a` completes in ~4ms; lookaround/backref throw.
- Added re2 (^1.21.0) to packages/mcp; lockfile updated.
Reviewer DO items:
- F1 [doc]: removed the false "pass nodeId as a comment anchor" claim
(create_comment has no nodeId param — it needs a text `selection`). Fixed in
tool-specs.ts + page-search.ts (module + SearchMatch JSDoc) + client.ts; the ref
is for get_node/patch_node, and for a comment you build a unique text selection
from before+match+after.
- F2 [doc]: clarified `#<index>` refs (id-less table/cell) are accepted by get_node
but NOT patch_node (id-only).
- F3 [test]: round-trip — each match's nodeId fed to the real getNodeByRef
(attrs.id node + `#<index>` table-cell) to prove the ref format is consumable.
- F4 [test]: before/after edge-pinning (match in first 40 chars of a long
container; index 0 → before==""; container end → after=="").
- New re2 tests: catastrophic patterns complete fast; lookaround/backref → error.
mcp: tsc clean; node --test 472 passed (+5). apps/server: tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The listComments Comment[] -> { items, resolvedThreadsHidden } shape change
reached every src/host consumer but not the live-server e2e harness (run via
`node test-e2e.mjs`, not the node --test gate — so the green suite missed it).
The 4 calls now read .items; the post-resolve check passes includeResolved:true
so it still sees the now-resolved root c1 (the default feed hides it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Editorial roles (Corrector/Factchecker) brute-forced `get_node` block-by-block to
find occurrences (unquoted «ё», straight quotes, «т.е.»), burning tokens. New
`search_in_page(pageId, query, {regex?, caseSensitive?, limit?})` reads the page's
ProseMirror JSON via the existing getPageRaw and searches it IN MEMORY — no server
endpoint, no DB/schema change, no touch to the packages/mcp/src/lib schema mirror.
New pure `searchInDoc(doc, query, opts)` (packages/mcp/src/lib/page-search.ts):
recursive descent to each TEXT CONTAINER (paragraph/heading/table-cell paragraph),
glues its inline text via `blockPlainText` (a match survives inline-mark
boundaries — e.g. «т.е.» split across bold/italic), searches literal (indexOf) or
regex, and returns `{ total, truncated, matches:[{ nodeId, blockIndex, type,
before, match, after }] }`. `nodeId` is the container's attrs.id or the
`#<topLevelIndex>` of the enclosing top-level block — the SAME ref format
get_node/patch_node/comment-anchoring accept (verified identical to getNodeByRef),
so the agent goes straight from a hit to a targeted comment; `before`/`after` are
~40-char windows for a unique selection. `total`/`truncated` always reported (never
silent truncation). Lives in the SHARED_TOOL_SPECS registry → exposed in BOTH
transports (external /mcp + in-app AI-chat), with a SERVER_INSTRUCTIONS line and a
DocmostClientLike signature + contract-test entry. Corrector/Factchecker prompts
get a one-line "use search_in_page first" hint (versions bumped, catalog hash lock
refreshed).
Guards: empty/whitespace query → clear error; invalid regex → clear error (not a
generic 500); zero-length regex matches (`\b`, `a*`) skipped with lastIndex
advanced (no loop/flood); MAX_PATTERN_LENGTH=1000, MAX_CONTAINER_TEXT=100k bound
each exec; limit clamped [1,200] (default 50).
Tests: new page-search.test.mjs (17) — literal+regex, case-sensitivity,
mark-boundary glue, nodeId for paragraph/heading (attrs.id) and table-cell
(#<index> fallback), context bounds, limit/total/truncated + clamp, invalid
regex/empty/over-long errors, zero-length skip, empty-doc null-safety.
mcp: tsc clean; node --test 467 passed (+17). apps/server: tsc --noEmit clean
(DocmostClientLike + wiring). catalog check.mjs OK.
Known limitations (from internal review, non-blocking):
- Residual ReDoS: a crafted catastrophic-backtracking pattern (e.g. `(a+)+$`)
against a large single container can hang the event loop — JS regex is not
interruptible, so the length caps bound the base but not the backtracking.
Realistic exposure is low (containers are small; the pattern is supplied by the
authenticated model). Candidate for a follow-up hardening (safe-regex validation
or a worker+timeout) if it matters.
- Case-insensitive LITERAL search folds via toLowerCase; a char whose lowercase
differs in length (e.g. Turkish İ) BEFORE a match could shift the context
window — negligible for the RU/EN editorial scenario.
- On a `#<index>` table-cell fallback, `type` is the inline container ("paragraph")
while nodeId addresses the top-level block — addressing is correct; the field is
documented as the container's type.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The AI agent (MCP + in-app chat) saw ALL comments incl. resolved via two
channels, cluttering its context and breaking fragment search. Default now:
the agent sees only ACTIVE discussions; resolved is opt-in. Active anchors and
threads are always kept.
Channel 1 — resolved comment anchors on agent reads (converter option):
`convertProseMirrorToMarkdown(content, options?)` gains
`options.dropResolvedCommentAnchors` (default false — zero change for every
existing caller incl. git-sync). Both `case "comment"` emitters (top-level and
the raw-HTML inlineToHtml path) emit BARE text (no `<span data-comment-id>`) when
`resolved && the flag`; active anchors keep their wrapper. mcp `getPage` passes
the flag; `export_page_markdown` does NOT (lossless export must preserve resolved
anchors — that is why it is an opt-in option, not unconditional); `get_page_json`
is untouched (lossless PM JSON). Built on the #293 package converter.
Channel 2 — `list_comments` default active-only: `listComments(pageId,
includeResolved=false)` now returns `{ items, resolvedThreadsHidden }` (was a
bare array). By default a RESOLVED top-level thread is hidden wholesale — the
root AND every reply anchored to it (a thread is gated only by its root's
resolvedAt; a resolved reply under an ACTIVE root stays). `resolvedThreadsHidden`
counts hidden threads so the agent knows to re-query. `includeResolved:true`
returns everything. The `includeResolved` param is added to both tool
registrations (MCP index.ts + in-app ai-chat-tools.service.ts); `DocmostClientLike`
signature updated. Server `findPageComments` is NOT touched — the web UI's tabs
depend on the full feed; filtering is only at the mcp-client level. All internal
call sites (export_page_markdown / checkNewComments / transformPage) updated to
`.items` with `includeResolved:true` to keep their full-feed behavior.
The comment model is assumed FLAT (a reply's parentCommentId points at the
thread root) — documented in the filter; a future reply-of-reply model would
need a root-walk there.
Tests: resolved-comment-anchors.test.ts (6 — anchor dropped with flag / kept
without, for BOTH emitters; active always kept); list-comments-resolved.test.mjs
(4 — resolved thread+reply hidden + counter; includeResolved:true returns all;
an ACTIVE thread with a RESOLVED reply is NOT hidden).
package vitest: 664 passed; tsc clean. mcp: node --test 458 passed; tsc clean.
apps/server + git-sync: tsc clean (converter option default-off).
NOTE: based on feat/293-B (#293/#326 STEP 5) — the converter lives in the
package; this PR is stacked on #333 and its base retargets to develop once #333
merges. mcp/build is gitignored (not committed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
F1 [critical, data-loss] — escape the image alt in ``. Canon #4 moved
the top-level image off the lossless <img> form onto markdown ``, but
the alt was inserted raw; the importer re-parses the `![alt]` label as CommonMark
inline, so a markdown-active char in a realistic description ("Figure [1]", "the
*new* logo", "a]b[c") broke the round-trip — the image node vanished or emphasis
collapsed. Now `escapeLinkText(imgAttrs.alt ?? "")`, exactly as the link-form
media (attachment/pdf/embed) already escape their visible text. Regression test
added: six active-punctuation alts round-trip byte-stable with the node intact.
F2 [drift] — re-export `clampCalloutType` / `sanitizeCssColor` from the package
barrel and drop the verbatim copies in the mcp schema shim. The copies had
already drifted (the mcp `clampCalloutType` lost the callout-type alias mapping
the package applies), which is exactly the schema drift #293 exists to kill. The
sanitizers now live only in the package; mcp `schema.test.mjs` exercises the
single alias-aware implementation.
F3 [docs] — AGENTS.md:296 said `packages/mcp/build/` is committed; this branch
gitignored it (git-sync/prosemirror-markdown convention). Updated the line to say
it is gitignored and rebuilt in CI/Docker via `pnpm build`.
F4 [cleanup] — removed the dead `test.typecheck` block from the package
vitest.config.ts and deleted tsconfig.vitest.json. Both were copied verbatim from
git-sync; this package has zero `*.test-d.ts` files, and the ported comments
referenced git-sync-only entities. Kept the `docmost-client` resolve alias
(22 tests use it) and the runtime include/environment.
package vitest: 658 passed (+1 F1 regression); tsc clean. git-sync: 268 passed.
mcp: node --test 454 passed; tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mcp had its OWN drifted copy of the converter (markdown-converter.ts ~900 lines,
docmost-schema.ts ~1270 lines, markdown-document.ts) — older than the shared
package, missing the git-sync fixes AND the #293 canon. This switches mcp's
converter CORE to @docmost/prosemirror-markdown, so mcp jumps straight to the
canonical format and the drift-generating second copy is gone.
- markdown-converter.ts / markdown-document.ts / docmost-schema.ts become thin
re-export shims of the package (convertProseMirrorToMarkdown, the docmost:meta
envelope, docmostExtensions + docmostSchema=getSchema(docmostExtensions)). The
mcp-only helpers clampCalloutType/sanitizeCssColor are preserved verbatim in
the schema shim (the package doesn't expose them via its barrel). ~2170 lines
of the drifted converter/schema bodies deleted.
- collaboration.ts drops its own ~360-line marked pipeline (preprocessCallouts,
bridgeTaskLists, extractFootnotes, the footnoteRef extension) and re-points to
the package's markdownToProseMirror, keeping markdownToProseMirrorCanonical and
all the yjs/collab write glue. footnote-lex/analyze doc comments updated (they
now describe advisory legacy-syntax diagnostics, not an importer).
Schema parity verified: the package schema is a strict SUPERSET of mcp's old
schema — every node and attr mcp declared is present (the package only adds
status/pageEmbed/transclusion*/subpages.recursive/etc.), so nothing is silently
dropped on the switch. The switch actually FIXES two pre-existing mcp data-loss
bugs its own tests documented: htmlEmbed and pageBreak now round-trip (were
dropped by the old mcp converter).
Footnotes: the package assembles inline ^[body] footnotes on import (sequential
fn-N ids, identical bodies merged), so mcp's canonicalizeFootnotes is now an
idempotent no-op after it (verified). Legacy reference footnotes [^id]/[^id]:
are inert literal text (canon #2 no-backward-compat) — lossless, the text
survives verbatim.
Build hygiene: packages/mcp/build/ is now gitignored and untracked, matching the
git-sync/prosemirror-markdown convention (private package, rebuilt in CI/Docker,
so src and prod can never silently diverge). This also removes a dead untracked
build/_vendored_editor_ext/ artifact that a broad `git add` would otherwise
commit.
Dependency: packages/mcp/package.json gains @docmost/prosemirror-markdown
(workspace:*); pnpm-lock.yaml gets the matching link importer (mirrors git-sync).
mcp tests updated deliberately to the canonical forms (highlight ==, math $…$,
image <!--img-->, drawio/media discriminators, subpages/pageBreak
comments, textAlign, inline ^[…] footnotes) with strict assertions; 4 structural
safety-net round-trip tests added.
mcp: node --test 454 passed; tsc clean. package: 657 passed. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit of all 41 tool descriptions against the actual implementation found
factually wrong or misleading texts:
- list_comments claimed '(paginated)' — it takes only pageId and returns ALL
comments in one call (internal pagination); now also states that RESOLVED
threads are included and how to filter them. In-app twin synced.
- search claimed the limit default is 'applied by the client' — the client
deliberately omits it so the SERVER applies its default.
- create_page's '(automatically moves it to the correct hierarchy)' said
nothing useful — now documents parentPageId nesting semantics; move_page
drops the stale 'essential for organizing pages created via create_page'.
- share_page now warns the page becomes accessible to ANYONE with the URL.
- get_page (both transports) now explains inline <span data-comment-id> tags
are comment anchors (incl. resolved) — markup, not page text.
- patch_node/delete_node/insert_node pointed only at the expensive page-JSON
view for block ids — now route through the cheap page outline first.
- docmost_transform marks 'Примечания переводчика' as the DEFAULT
notesHeading, overridable for non-Russian pages.
Checks: @docmost/mcp tests 450/450 (incl. the server-instructions guard);
server ai-chat-tools spec 20/20; mcp build/ artifacts rebuilt.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The intent-routing guide had rotted: 17 of 41 registered tools were absent
(get_outline, get_node, the whole table_* family, search, stash_page, sharing,
page lifecycle), and two tips were actively harmful — 'read block ids via
get_page_json' told agents to pull the whole ~100KB document when get_outline
exists precisely to grab ids cheaply, and 'table cell -> patch_node by
attrs.id' dead-ends because table nodes carry no attrs.id.
- Rewrite SERVER_INSTRUCTIONS as intent clusters (READ / EDIT / PAGES /
COMMENTS / HISTORY) covering every tool except get_workspace; add safety
notes (share_page = PUBLIC, delete_page = soft) and a comment-anchor
markup warning for get_page.
- delete_page tool description: state SOFT delete / restorable explicitly.
- MAINTENANCE RULE comments at both registration sites (index.ts,
tool-specs.ts) + an AGENTS.md convention bullet: adding/renaming/removing
a tool REQUIRES updating the guide.
- New guard test (test/unit/server-instructions.test.mjs): extracts every
registered tool name from source and fails when one is not mentioned in
the shipped SERVER_INSTRUCTIONS (word-boundary match, so get_page can't
hide behind get_page_json); EXCEPTIONS list is itself validated against
the registry. SERVER_INSTRUCTIONS exported for the test.
Tests: @docmost/mcp 450/450 (448 + 2 new).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- editorial roles (ru/en): proofreader and line editor attach suggestedText
replacements to targeted fixes; fact-checker ALWAYS attaches the ready
correction for [Incorrect] verdicts; structural editor and narrator get a
light-touch rule for in-place rewordings; role versions bumped and the
content-hash lock refreshed
- MCP SERVER_INSTRUCTIONS: route 'propose a concrete text fix for one-click
human approval' to create_comment with suggestedText (unique-selection
reminder); build/ artifacts rebuilt
- AI-chat SAFETY_FRAMEWORK: mention the comment-suggestion capability so the
default assistant offers ready fixes instead of only describing changes
Checks: catalog check.mjs OK; @docmost/mcp tests 448/448; server
ai-chat.prompt spec 28/28.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
F1 [blocking]: a suggestion whose anchor matched via normalization could never
be applied (spurious 409). The comment mark lands on the doc's ACTUAL text
(Docmost auto-converts to typographic quotes/dashes/nbsp), but the stored
selection — used as expectedText at apply — was the raw ASCII agent input
(+substring(0,250)). So replaceYjsMarkedText's strict joined!==expectedText
always failed and threw "text changed" though nobody edited. Fix: new pure
getAnchoredText(doc, selection) reconstructs the exact raw doc substring the mark
covers (slicing identical to spliceCommentMark); on the suggestion path
client.createComment stores THAT as selection, so expectedText equals the marked
text and apply returns applied:true. Live anchoring still uses the raw agent
selection (normalization still finds the anchor). Truncation raised 250->2000
(+ DTO @MaxLength(2000)) so the anchored substring is never cut below the mark
span. Ordinary comments unchanged. AI-chat shares client.createComment, so
covered. Regression tests: getAnchoredText raw-vs-ASCII; create payload selection
is the typographic substring; apply with typographic expectedText -> applied.
F2 [blocking]: added comment.controller.spec.ts pinning that validateCanEdit runs
before applySuggestion (Forbidden -> applySuggestion never called; happy path ->
called; missing comment -> 404 without authorizing).
MCP 448 pass; server comment+yjs 54 pass. MCP build/ rebuilt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agents can attach a suggested replacement when creating an inline comment, via
both the MCP create_comment tool and the AI-chat createComment tool.
Because applying a suggestion edits the EXACT anchored text, an ambiguous anchor
would let Apply corrupt the wrong occurrence. So when suggestedText is set the
selection must occur EXACTLY ONCE:
- new countAnchorMatches(doc, selection) counts occurrences across all blocks
(same normalization/traversal as canAnchorInDoc), counting occurrences (2 in
one block => 2) — stricter than block-count, never under-counting distinct
occurrences (false-unique is the dangerous direction).
- client.createComment gains suggestedText: a pre-check (getPageJson +
countAnchorMatches: 0 => not-found, >=2 => ambiguity error) before create, and
an AUTHORITATIVE live check inside the anchoring mutation that recomputes on the
live doc and, if != 1, aborts and rolls back the just-created comment (reusing
the existing safeDeleteComment "anchor not found" path). Ordinary comments keep
first-occurrence behavior unchanged.
- suggestedText is rejected on a reply or without selection in all three layers
(MCP handler, MCP client, AI-chat tool), mirroring the server DTO/service.
- filterComment surfaces suggestedText/suggestionAppliedAt/suggestionAppliedById.
- DocmostClientLike.createComment signature updated. MCP build/ rebuilt.
Tests: countAnchorMatches (0/1/N, within/across/nested block, span nodes,
quote normalization); createComment (ambiguous refused pre-create, reply and
no-selection rejected, unique succeeds and forwards suggestedText, filterComment
surfaces it); ai-chat schema accepts suggestedText. MCP 443 pass; ai-chat 601 pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
F1: remove an accidentally-committed self-referential symlink
packages/mcp/node_modules/node_modules -> an absolute build-machine path (leaked a dev
home path, a pnpm artifact useless in the repo), and add a targeted ignore so it can't
recommit.
F2: the commentUpdated broadcast re-emitted the caller's pre-loaded comment mutated in
place, so the {agent,launcher} stack survived only because the controller happened to
load it with includeCreator:true — the fragile coupling that let the stack vanish on
edit once already. update() now RE-FETCHES the enriched comment before broadcasting,
symmetric with create()/resolveComment() (the row is already persisted), so all three
broadcasts carry the stack regardless of any caller's pre-load. Adds a caller-contract
test asserting all three broadcasts emit agent/launcher for an agent comment and neither
for a non-agent one, spotlighting the update path (non-vacuous vs the old re-emit).
F3: add a direct test of the page-history attachPageHistoryAgent mapping (its distinct
lastUpdatedSource/lastUpdatedAiChatId/lastUpdatedBy column set): role / no-role / MCP /
non-agent, and that the internal agentRole join column is stripped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The same tool metadata (zod schema + model-facing description) was hand-duplicated
between the standalone MCP server and the in-app AI-chat agent, so every tweak had to
land in two places and copies drifted (a materialized parity bug). The shared
transport-agnostic registry (packages/mcp/src/tool-specs.ts) already de-duplicates 14
tools; this migrates two more genuinely-identical ones — patch_node/patchNode and
insert_node/insertNode. The canonical description is a strict SUPERSET of both originals
(keeps MCP's "without resending the whole document" + table-structure/anchor guidance
AND the in-app "reversible via page history" / "exactly one of anchorNodeId or
anchorText" framing — no model-facing guidance dropped); the schema is identical (the
in-app side just gains MCP's .min(1) on ids, a safe tightening). Each transport keeps its
own execute/auth wrapper, and the in-app parseNodeArg node-arg normalization is unchanged.
The three table tools are intentionally NOT merged (a real param-name divergence:
table vs tableRef) — documented on both sides. Other per-transport divergences
(search/share/create_comment/transform/list_pages) are left separate with a short comment
explaining why (the issue asked to flag these as intentional). DocmostClientLike stays a
hand-mirror (the ESM/CJS boundary blocks a compile-time type import; a runtime drift-guard
already pins it). Also fixes a latent contract-spec bug: derive `required` from
`instanceof z.ZodOptional` (matches the emitted JSON schema) instead of `isOptional()`,
which wrongly reported z.any() fields as optional.
Partially addresses #294.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
For AI-agent-authored content (comments + page history), replace the text AI-AGENT
badge with an avatar stack: the agent in front, the human who launched it smaller and
behind. This fixes the inverted hierarchy (the action was the agent's; the human just
launched it). closes#300.
Backend: a single server-authoritative resolver resolveAgentProvenance normalizes to
{ agent, launcher } from server columns only (createdSource/lastUpdatedSource, aiChatId,
creator, chat role) — nothing from request input, so agent identity can't be spoofed.
Internal chat -> agent = chat role (name/emoji), launcher = human; external MCP
(aiChatId null) -> agent = the agent account, launcher = null; non-agent -> neither.
The role join (aiChatId -> ai_chats.role_id -> ai_agent_roles) deliberately does NOT
filter enabled/deleted_at, so a later-disabled role still labels historical content
(mirrors findById, not findLiveEnabled). Enrichment is applied on BOTH findPageComments
(list) AND findById (the create/resolve/update broadcast path), so the stack shows on
live comment events and doesn't vanish on resolve/edit.
Frontend: new AgentAvatarStack + AgentGlyph (avatarUrl -> role emoji on violet ->
IconSparkles on violet), integrated into comment-list-item and history-item where the
badge was; the deep-link-to-chat click moved onto the stack. ai-agent-badge removed.
Tests: AgentAvatarStack (role/no-role/MCP/click/non-clickable), the provenance resolver
+ recorder tests proving the role join never filters enabled/deleted, and findById
enrichment (guards the live-broadcast regression).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewer round 1 on the #260 collab-doc-name fix:
- F1: replaceImage is the one path where the resolved UUID gates BOTH the
collab-doc open AND the per-page mutex key (withPageLock(pageUuid)). Add a
deterministic test to resolve-page-id-collab-doc-name.test.mjs: it gates
/files/upload so replaceImage parks mid-upload holding its lock, asserts the
doc opened as page.<uuid> (never page.<slug>), and probes the SHARED
page-lock chain — a withPageLock(UUID) probe must stay blocked while
replaceImage holds it (with a free-key probe as a non-vacuity guard). The
test fails if the lock key is reverted to the slugId (verified).
- F2: drop the dead `pageIdCache.set(uuid, uuid)` — resolvePageId returns on
the isUuid() short-circuit before the cache is ever read with a uuid key, so
only slugId->uuid entries are stored/read. Comment corrected to match.
MCP suite 430/430, tsc 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Real root cause of the silent MCP edit loss: the web editor always opens the
collaboration document by the page UUID (`page.${page.id}`), but the MCP
opened it by the agent-supplied id — usually a slugId — so `page.${pageId}`
became `page.<slugId>`. For one DB page that is TWO independent Yjs documents;
both persist to the same `pages` row (findById/updatePage resolve id or
slugId), so the human tab's debounced store overwrites the agent edit
(last-store-wins) — gone after reload, never shown live. The slugId doc also
made the server's transclusion sync + embedding reindex throw Postgres 22P02.
Fix:
- MCP (primary): resolvePageId(pageId) returns the canonical UUID — a UUID
short-circuits with no network call, a slugId resolves once via getPageRaw
and is cached both ways. Every collab-write path (mutatePageContent /
updatePageContentRealtime / replacePageContent and the mutate/replace/
unlocked seams) now opens by the resolved UUID, so the MCP and the editor
share ONE Yjs doc. replaceImage's whole-operation page lock also keys on the
UUID so it serializes against the other (now-UUID-keyed) writes.
- Server (defense + kills the 22P02 noise): onStoreDocument passes the resolved
page.id — not the raw doc-name id — to syncTransclusion, the embedding queue,
the mention-notification job, addContributors, and the in-tx history read.
Content store and the empty-guard are untouched.
Tests: a new MCP test stands up a real Hocuspocus server and asserts a slugId
input opens `page.<uuid>` (never `page.<slugId>`), with UUID short-circuit and
single-resolve caching; the server spec asserts the side-effects receive the
UUID for a `page.<slugId>` doc. closes#260
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
stashPage is declared in the server's DocmostClientLike interface and
shipped as the stash_page MCP tool (client.ts, tool-specs.ts, index.ts),
but the hand-maintained HOST_CONTRACT_METHODS mirror in the contract test
was never updated — so the drift-guard test failed and broke CI's
unit-test job. Add the missing name; both directions now agree.
F1 (data loss): packages/mcp keeps its own copy of the document schema
(AGENTS.md), and the spoiler mark was only added to editor-ext + the server
tiptapExtensions, so a doc with a spoiler silently lost the mark through /mcp.
Add a local Spoiler mark to docmostExtensions (span[data-spoiler] parse,
data-spoiler="true"+class render) and a case "spoiler" in markdown-converter
emitting the same <span data-spoiler="true">…</span> as the editor-ext turndown
rule; add an MCP json->md->json round-trip test. Regenerated build/lib output.
F2: add the #259 inline-spoiler entry to CHANGELOG [Unreleased] Added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
uploadImage is internal to client.ts (called by insertImage/replaceImage);
the MCP transport (index.ts) does not call it directly. Remove it from the
comment's list of transport-called methods.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract pure extractAuthTokenFromSetCookie from performLogin (behavior-identical)
so cookie parsing is unit-testable without a network login. Add round-trip
coverage for media attrs (width/height/align/drawio/escaping) the existing
suite omitted; applyAnchorInDoc selection/ambiguity/atom-break cases; and a
cross-copy drift guard proving the vendored editor-ext recreate-transform and
the @fellow npm copy used by diff.ts emit identical steps (apply(diff)==target).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DocmostClientLike mirror covers only methods the in-app adapter consumes;
the standalone MCP transport calls additional client methods not tracked here
(covered by its own typecheck). Fixes the misleading 'superset' wording (F2).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A captioned image in a column is emitted via the imageToHtml helper, a
separate path from the top-level image case whose data-caption branch was
untested. Add a round-trip test with special chars (Tom & "Jerry") that
fails if the imageToHtml caption branch breaks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The comment referenced markdownToHtml, which does not exist in the mcp
package; the import path is marked.parse + generateJSON (which runs the
image extension's parseHTML). Describe the actual step and regenerate the
build artifact in sync.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Architect-review hardening of the bidirectional DocmostClientLike <->
HOST_CONTRACT_METHODS guard (test-only, no production change):
- Interface method-name regex now accepts full TS identifiers
(digits/_/$) and generic signatures (method<T>(), avoiding a future
benign false-FAIL.
- Skip /* ... */ block comments in the interface body so a `name(` line
inside one is not falsely parsed as a method.
- Wrap the cross-package readFileSync with a clear "expected monorepo
layout" error instead of a bare ENOENT when run outside the monorepo.
- Narrow the guard's comments/error to state plainly it checks the
method-NAME set only; signature parity remains the deferred staged-plan
item.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add PM -> markdown -> PM round-trip assertions for image caption
(plain and special-char), which fail without F1 and pass with it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stock @tiptap/extension-image carries no caption attribute, so
markdownToProseMirror through docmostExtensions dropped the
data-caption the client emits, breaking the lossless claim. Extend the
Image node (mirroring editor-ext image.ts and the nearby Highlight
extend) to parse/render data-caption. Rebuilt build/.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The contract test only checked one direction (each name in
HOST_CONTRACT_METHODS exists on the real DocmostClient). But
HOST_CONTRACT_METHODS is itself a hand-copy of the server's
DocmostClientLike interface (docmost-client.loader.ts), and that
list<->interface link was untested: a method added to the interface +
consumed by the adapter but forgotten in the list (or removed from the
interface but left in the list) would escape both the server typecheck
(the pkg emits no .d.ts) and the existing test (name not in the list) ->
a runtime "x is not a function" in a tool call.
Parse the method names from the DocmostClientLike interface body (read
the .ts source via import.meta.url, scan member-signature lines) and
assert.deepEqual them against HOST_CONTRACT_METHODS BOTH ways. Lists are
currently identical (39=39), so this is a coverage hole closed, not a
live bug.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mandatory (test-coverage):
- internal-file-urls.test: pin the SSRF/traversal ACCEPT path of
resolveInternalFilePath (the sole guard for content-controlled `src`): an
absolute/protocol-relative URL has its foreign host dropped and only an
/api/files/ pathname survives (http://evil.com/api/files/x/y.png -> /files/x/y.png),
while a host-dropped path that escapes /api/files/ (https://evil.com/api/auth/whoami)
or a backslash-traversal (/api/files\..\auth\whoami) is rejected. Locks the
behavior so a future prefix-only refactor cannot silently open a bypass.
Suggestions:
- index.ts: the stash_page MCP tool now returns structuredContent
{ uri, sha256, size, images } alongside the resource_link, so the MCP output
matches the documented shape (clients get the blob's sha256/ETag and the
mirror counts, not just the link). No outputSchema registered. Rebuilt build/.
- new stash-page-mcp-result.test: server round-trip via InMemoryTransport asserts
both the resource_link and the structuredContent mirror.
- internal-file-urls.test: cover the new URL parse-failure catch branch
(http://[ -> "Invalid internal file src").
- environment.service.spec: assert getPositiveIntEnv warns once per key and
independently across keys (the invalidPositiveIntWarned dedup).
Tests: packages/mcp 383 pass; apps/server sandbox/environment/mcp 235 pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Security (must-fix):
- sandbox.controller: the anonymous GET /api/sb/:id response now sets
X-Content-Type-Options: nosniff, a restrictive CSP, and Content-Disposition=
attachment for any mime outside a raster-image allowlist (png/jpeg/gif/webp/
avif). entry.mime is attacker-controlled, so an evil.svg/evil.html could
otherwise execute script inline on the Docmost origin (stored XSS). Mirrors
the public attachment route's hardening.
Stability:
- client.stashPage: reconcile mirrors AFTER the final document put, not only
before it. The doc blob is the newest entry and FIFO eviction drops the
oldest = this stash's own images, so the stored doc could reference an
evicted blob (consumer 404) and over-report images.mirrored. A bounded loop
now reverts doc-put-evicted mirrors, drops the stale doc blob, and re-puts
until stable. Regenerated packages/mcp/build/.
- sandbox.controller: emit Cache-Control on the 304 branch too (ttlSeconds is
computed before the conditional check).
Docs:
- Bump the MCP tool count 39 -> 40 across all READMEs and AGENTS.md (the
registry now exposes exactly 40 tools).
Refactor:
- SandboxStore.asSink() centralizes the {put,has,evict} sink + uri<->id
mapping; the embedded-MCP and in-app agent-tools wiring sites share it.
Tests:
- security headers (inline vs attachment, nosniff, CSP), 304 Cache-Control,
putAndLink URL form, has()/remove(), asSink() round-trip, getSandboxPublicUrl
(trailing-slash trim + APP_URL fallback), and a stash test where the doc put
itself evicts a mirrored image.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Security:
- stash_page: reject path-traversal / percent-encoded srcs before the authed
loopback fetch (resolveInternalFilePath), closing an SSRF/exfiltration hole
where a crafted node.attrs.src could read an arbitrary internal GET endpoint
into the anonymous sandbox.
Stability:
- stash_page: revert + recount mirrors FIFO-evicted by a later put in the same
stash (no dangling sandbox refs, honest images.mirrored/failed); free image
blobs if the final document put throws.
- Reject/clamp non-positive SANDBOX_TTL_MS to the 1h default (warn once).
- Log mirror failures unconditionally (console.warn, no blob bodies).
Cleanup / architecture:
- Remove dead expiresAt from SandboxPutResult.
- Centralize the /api/sb route in SANDBOX_ROUTE_SEGMENT/SANDBOX_API_PATH and
move URL composition into SandboxStore.putAndLink; drop the duplicated sink
closures and the now-unused EnvironmentService injection from McpService and
AiChatToolsService.
- Un-export isInternalFileUrl; document the process-local (instance-bound)
sandbox limitation in the tool description and .env.example.
Docs/tests:
- README/README.ru: 38 -> 39 tools + stash_page entry.
- Add traversal/normalize/recursion unit tests, stash self-eviction +
doc-put-throw + empty/octet-stream mock tests, controller If-None-Match
(wildcard/weak/list) + Cache-Control tests, and SANDBOX_TTL_MS validation
tests. Regenerate packages/mcp/build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add an ephemeral, process-local blob store so the in-app agent (and the
embedded MCP) can hand a large page document and its images to an external
consumer WITHOUT routing the bytes through the model context or Docmost auth.
- SandboxStore (@Injectable singleton): Map<uuid,{buf,mime,sha256,expiresAt}>
in RAM only. put() picks a per-blob cap by mime (image vs doc), enforces a
total-bytes RAM guard with oldest-first eviction, and stamps a TTL; get()
lazily expires. sha256 computed at put() doubles as the strong ETag. An
unref'd sweep interval clears expired entries and is cleared on destroy.
- GET /api/sb/:uuid anonymous controller: serves raw bytes with Content-Type,
Content-Length and ETag=sha256; 404 on missing/expired/non-UUID (anti-
traversal), 304 on a matching If-None-Match. No tokens, no 401 — the
capability is the unguessable UUID + short TTL + TLS. Auth-exempt the same
way as /api/files/public (no JwtAuthGuard) plus an /api/sb entry in main.ts's
workspace-resolution preHandler so a remote consumer with no workspace host
is not rejected.
- stash_page tool in both layers (MCP resource_link + in-app {uri,size,sha256,
images}). client.stashPage serializes the get_page_json shape, mirrors every
INTERNAL file/image src (type-agnostic, covers drawio/excalidraw/video/file)
into the sandbox under Docmost auth and rewrites src to the sandbox URL;
external http(s) srcs are left untouched; dedup by src; a failed image fetch
is counted, never aborts the doc.
- SANDBOX_PUBLIC_URL / SANDBOX_TTL_MS / SANDBOX_MAX_BYTES /
SANDBOX_MAX_IMAGE_BYTES / SANDBOX_MAX_TOTAL_BYTES wired through the
environment service + validation + .env.example.
- SandboxModule (@Global) provides the shared store to the controller,
McpService and AiChatToolsService (same instance for put and get).
Tests: SandboxStore (round-trip, sha256, TTL lazy + sweep, caps, eviction),
SandboxController (200+ETag+CT+CL, 404 missing/expired/non-UUID, 304), and a
mock-HTTP stashPage test (mirror+rewrite internal, keep external, dedup, failed
image counted, returns only a link). Interoperates with the vvzvlad/habr-mcp
consumer's anonymous-GET + sha256-ETag + resource_link contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Issue #193's tool-half has two open items. The shared, zod-agnostic tool-spec
registry (SHARED_TOOL_SPECS) for the identical tools is already merged
(f3fa15e7) and consumed by both layers, so that subset is done. The remaining
items are: (a) deriving the layer-3 hand-mirror `DocmostClientLike` from the
real client type, and (b) folding more tools into the registry. Both were
deferred as risky, and that deferral still holds (verified, see below) — so
this change ships the safest concrete increment instead of forcing the risk.
What this adds (behaviour-neutral, test-only + a doc comment):
- packages/mcp/test/unit/client-host-contract.test.mjs: pins the layer-3
contract from the ESM side, where the real DocmostClient is importable. It
asserts every method the in-app `DocmostClientLike` mirror declares exists as
a function on a real DocmostClient instance (constructor is side-effect-free).
A rename/removal in client.ts now fails this test instead of silently shipping
a runtime "x is not a function" into an agent tool call. Negative-case
verified (a bogus method name is detected).
- docmost-client.loader.ts: replaces the vague mirror comment with a pointer to
the guard test and a concrete, empirically-grounded staged plan for the full
type-derivation. Verified blockers kept it deferred: @docmost/mcp emits no
.d.ts (no `declaration`, no `types` export) and the server has no path mapping
for it, so there is no type to import today; and the real methods' inferred
CONCRETE return types conflict with the in-app adapter's loose
Record<string,unknown> + `as`-cast result handling (deriving the exact type
breaks the build / forces pervasive double-casts and full-surface test stubs).
Out of scope (noted in the issue): the PM<->Markdown converter unification.
Verified: server tsc clean; mcp tsc clean; mcp tests 369 pass (367 + 2 new);
ai-chat tools specs 51 pass. No behaviour change; committed mcp build untouched
(no mcp src changed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a visible caption (<figcaption>) under images, editable from the
image bubble-menu and persisted across all formats: native Yjs/JSON,
HTML export, and Markdown.
- image node: new plain-text `caption` attribute (parse/render
`data-caption` on <img>, emitted only when set) + `setImageCaption`
command. The node stays an atom; the schema shape is unchanged, so the
server's generateHTML/generateJSON path round-trips it for free.
- resize node-view: re-parent the resizable wrapper into a <figure> and
render the caption in a <figcaption> BELOW it, outside nodeView.wrapper
(so onCommit's offsetHeight measurement and the left/right resize
handles still cover the image only). This path also drives read-only /
share rendering. React placeholder view renders the caption too.
- bubble-menu: new useCaptionControl panel modeled on useAltTextControl
(own icon, Caption strings, softer sanitizer, ~500 char limit).
- markdown lossless round-trip: a captioned image is emitted as a raw
<img data-caption> wrapped in a block <div> (same trick as <video>) in
both the editor-ext turndown rule and the MCP converter; caption-less
images stay clean . Import restores the caption via the
shared markdownToHtml + parseHTML.
- styles + i18n keys; tests for the schema attr round-trip, markdown
round-trip (editor-ext) and the MCP converter.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review #6 (approve-with-comments) follow-ups:
1. canonicalize step 7 now strips bare footnoteDefinitions at ANY depth
(stripFootnoteDefinitionsDeep), not just footnotesList, in BOTH copies. A
definition hand-authored outside a list (e.g. nested in a callout via a
raw-JSON write path) was left in place while a copy was also added to the
rebuilt list -> duplicate, idempotent, self-perpetuating. Runs only in the
rebuild path (after the lists are stripped); the fast-path / placement-keep
branch is untouched. Added a shared-corpus case (bare def nested in a callout)
to pin it in both mirrors.
2. markdown-clipboard: removed the dead top-level footnoteReference check in
canonicalizePastedFootnotes (an inline atom is never a top-level slice child;
only the descendants scan can find it).
Test coverage:
4. New MCP binding tests (full-doc-write-canonicalize.test.mjs): update_page_json
and copy_page_content canonicalize the persisted full doc, asserted via a new
`replacePage` seam (symmetric to the existing `mutatePage` seam) so no live
collab socket is needed. Routed both writers through the seam.
5. New server spec (file-import-task.service.footnote-canonicalize.spec.ts): the
zip-import path (processGenericImport) canonicalizes footnotes — real
markdown->HTML->JSON via a real ImportService over a temp-dir .md file, DB trx
stubbed to capture the persisted page content. FileImportTaskService had no
spec before.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Must-fix:
- insertInlineFootnote could glue a footnoteReference inside an EXISTING
definition (nested footnotesList, or a bare footnoteDefinition with no list
wrapper), which canonicalize then dropped as an orphan — silently losing the
definition's prose. Now: (a) the body/notes boundary is computed from the first
top-level block that IS or CONTAINS (recursively) a footnotesList/
footnoteDefinition, not just a top-level list; and (b) the insertNodesAfterAnchor
core skips footnotesList/footnoteDefinition subtrees entirely (skipSubtreeTypes),
so an anchor whose only match is inside a definition -> inserted:false (clean
abort, no write). Added tests: nested-definition, bare-definition, and
body-before-nested-list-still-inserts.
- editor-ext footnote-canonicalize header listed `markdownToProseMirror` among the
canonicalizing MCP paths; it is the NON-canonicalizing primitive. Replaced with
`markdownToProseMirrorCanonical` (+ note that the plain primitive is for comment
bodies) and added copy_page_content.
- Client paste: canonicalizePastedFootnotes now skips a definitions-ONLY paste
(no footnoteReference anywhere) — canonicalizing it would strip the
reference-less list and yield an EMPTY paste. Added a test.
Suggestions:
- docmost_transform now runs validateDocStructure/validateDocUrls on the RAW
transform output BEFORE canonicalizeFootnotes (mirrors updatePageJson), so a
too-deep doc gives the intended max-depth error instead of a stack overflow.
- docmost_transform tool description now states the RESULT is footnote-canonical
(dryRun diff may show tidy-ups; idempotent after first run).
- insertFootnote: dropped the dead `result ? … : undefined` ternaries and the
`as any` casts (result is always set by the time we return; the not-found path
throws and aborts mutatePage). `const r = result!;`.
Tests / architecture:
- Added a LIVE-plugin golden case: the real footnoteSyncPlugin leaves a list with
non-empty content after it in place, and canonicalize agrees (placement parity
is now a driven property, not a hand-set expected).
- Added generateFootnoteId uuidv7 shape + uniqueness test.
- Item 9: added the ENFORCEMENT-RULE comments at the server parseProsemirrorContent
and the MCP canonicalizer header (any NEW full-doc persist path MUST canonicalize;
fragments/append/prepend and comment bodies MUST NOT). Kept per-call-site over a
brittle grep CI test (the replace-vs-fragment + comment-vs-page nuance makes a
single wrapper unsafe).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Must-fix (REAL DATA LOSS):
- markdownToProseMirror is reused for COMMENT bodies (createComment/updateComment).
It unconditionally canonicalized, so a comment carrying a standalone footnote
definition ([^1]: text with no matching reference) had its whole footnotesList
stripped (referenceIds.length===0 -> stripFootnotesListsDeep) — the text
vanished. Fix: markdownToProseMirror no longer canonicalizes (content-preserving
primitive); a new markdownToProseMirrorCanonical wraps it for the PAGE write
paths (markdown import via importPageMarkdown, update_page markdown via
updatePageContentRealtime). Comment callers keep the non-canonicalizing
primitive. Updated the now-false header comment and added create/update-comment
inline notes. Added collaboration tests: comment path PRESERVES a reference-less
definition; page path still drops it AND still reorders real footnotes. Updated
the page-import canonicalization test to use the canonical variant.
Suggestions / architecture:
- #2: collapsed transforms.footnoteDefinition onto the shared
makeFootnoteDefinition factory (adds only the inner paragraph block id); kept
the dependency direction transforms -> footnote-authoring (no circular import,
mirror stays pure).
- #3: confirmed docmost_transform auto-canonicalization is documented (inline
comment, tool description, CHANGELOG) — no code change.
- #4: copyPageContent is a FULL-document write (replacePageContent of a
type:"doc"); added a defensive canonicalizeFootnotes pass (no-op on
already-canonical source).
- CHANGELOG entry refined to list the FULL-document write paths (incl.
copy_page_content) and to state canonicalization is NOT applied to comment
bodies.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Must-fix:
- REAL BUG: insertInlineFootnote could splice a footnoteReference (inline atom)
into a codeBlock or an existing footnoteDefinition, persisting a schema-invalid
doc (insert_footnote skips validateDocStructure). Now the search is bounded to
the BODY (before the first footnotesList) and the insertNodesAfterAnchor core
refuses textblocks that can't hold the atom (codeBlock); when the only match is
in such a place the insert returns inserted:false and the write aborts cleanly.
Reachable via docmost_transform too. Added codeBlock / definition / fall-through
tests.
- Fixed the deepEqualJson doc comment in both copies: arrays are order-SENSITIVE
(correctness depends on it), only object keys are order-insensitive.
- README.ru.md MCP tool count 38 -> 39 (lines 36/47/63), matching README.md/AGENTS.
- CHANGELOG [Unreleased] Added entry for insert_footnote + server-side footnote
canonicalization on non-editor write paths (#228).
Suggestions:
- canonicalize step 5/7 now strips footnotesList at ANY depth (both copies), so a
schema-valid list nested in a callout/blockquote can't leave duplicate defs.
- Exclude the test-only footnote-corpus.ts fixture from the editor-ext build
(tsconfig), so it no longer ships in dist/.
- Removed the duplicate manual canonicalize cases from the MCP unit test (the
shared corpus covers them via full deepEqual); kept idempotence + immutability.
- insertInlineFootnote dedup key now keys off the inline array directly
(footnoteContentKey({ content: inline })) instead of a throwaway node.
Tests / architecture:
- New client-wrapper test (#9): overrides a small mutatePage seam to assert the
not-found path throws and persists NOTHING, and the success path shapes
footnoteId/reused/message/verify and writes the right content. Fixed the
misleading comment in footnote-write.test.mjs.
- B: cross-copy corpus parity guard test (loads both corpora, asserts deep-equal)
so a typo in one copy can't pass both suites green.
- A: declined — the full-vs-fragment decision lives at the call site, so a
prepareDocForPersist wrapper would be a bare alias for canonicalizeFootnotes;
kept the existing per-call-site comments instead.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Must-fix:
- Move canonicalizeFootnotes OUT of parseProsemirrorContent. It now runs only
on FULL writes (createPage, updatePageContent operation==='replace'), never on
an append/prepend fragment (a fragment would lose definition-only footnotes or
synthesize a bogus empty list). Add a server binding spec.
- Match the live plugin's list PLACEMENT: a single already-canonical
footnotesList is left exactly where it sits (the plugin never repositions a
sole correct list), so the first write no longer reorders content that follows
the list. Applied to BOTH the editor-ext copy and the MCP mirror; pinned by a
shared golden corpus case with content after the list.
- Fix MCP tool count 38 -> 39 (README x3, AGENTS.md) and the transformJs param
help (add canonicalizeFootnotes/insertInlineFootnote).
Simplifications:
- Remove the dead duplicate re-id mechanism (deriveFootnoteId/suffix/occurrence)
from the PURE canonicalizer in both copies — references are never renamed, so
the derived ids were never requested; first-wins-drop is the real behaviour.
This also makes the editor-ext footnote-util note about "no cross-package copy"
true again.
- Remove the sentinel round-trip in insertInlineFootnote: a generalized
insertNodesAfterAnchor core inserts the footnoteReference node directly.
- Drop the redundant per-definition deep clone in step 4 (shallow id-normalizing
copy; out is already deep-cloned).
Docs / architecture:
- Correct the editor-ext copy's "It exists because…" header to its real
consumers (server import, page.service create/update, client paste).
- Note markdownToProseMirror reuse for create/update comment in collaboration.ts.
- A: shared golden JSON corpus exercised by BOTH the editor-ext copy and the MCP
mirror (footnote-corpus.ts / .mjs) so "the two copies behave identically" is
checkable.
- C: split the MCP canonicalizer into a pure mirror + footnote-authoring.ts.
- B: import services persist via a different path, so left one-line consolidation
comments at the call sites rather than folding (does not fall out cleanly).
Tests: insertFootnote wrapper guards + docmost_transform dryRun auto-canonicalize
(MCP mock), page.service create/update + append/prepend binding (server jest),
shared corpus incl. nested-container reference.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make footnotes author-inline: the agent/tool inserts a footnote at its point
of use (anchor + text) and the numbering plus the bottom list are DERIVED
deterministically server-side. The agent has no access to footnotesList and
cannot desync — out-of-order lists, orphan definitions, and raw trailing
[^id] blocks become structurally impossible.
editor-ext:
- canonicalizeFootnotes(docJSON) -> docJSON: a pure, EditorView-free port of
footnoteSyncPlugin's end-state. Distinct reference ids in document order are
the source of truth; exactly one trailing footnotesList holds one definition
per referenced id in reference order (reusing the existing node or
synthesizing an empty one); orphans dropped; duplicate definitions resolved
deterministically (first wins, never lost); idempotent.
- Unit tests + a golden parity suite: on every editor-reachable steady state
the live footnoteSyncPlugin's JSON is a canonicalize no-op (byte-for-byte
parity), and the canonicalizer additionally repairs the out-of-order list a
non-editor write produces.
mcp:
- footnote-canonicalize.ts: behavioural mirror of the editor-ext canonicalizer
(the MCP package is intentionally decoupled from the editor barrel, like
footnote-lex/docmost-schema), plus footnoteContentKey for content dedup.
- Auto-canonicalize on EVERY write path: markdownToProseMirror (fixes import
ordering), update_page_json, and after every docmost_transform. Idempotent,
so it is a no-op when footnotes are already canonical.
- insert_footnote tool + insertInlineFootnote: anchor + markdown text -> a
mark-safe footnoteReference and a content-dedup'd definition; the list and
numbering are derived. Same-content footnotes reuse one number/definition.
- canonicalizeFootnotes + insertInlineFootnote exposed as docmost_transform
sandbox helpers.
Tests: editor-ext 157 green; MCP 325 green; server + client tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>