diff --git a/AGENTS.md b/AGENTS.md index 50f86b17..e8eed03d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -254,7 +254,7 @@ The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes - **Redis** backs caching, the BullMQ queues, the WebSocket Socket.IO adapter, and collaboration sync. ### The two AI subsystems (the main fork additions) -1. **Embedded MCP server** (`integrations/mcp/` + `packages/mcp`). The standalone `@docmost/mcp` server (38 agent-native tools: per-block patch/insert/delete by id, scripted `(doc)=>doc` transforms with dry-run diff, table editing, version diff/restore, comments, images, shares) is bundled and served over HTTP at `/mcp`. It writes through Docmost's real-time-collaboration layer so concurrent human edits aren't clobbered. Each request authenticates **per-user** via the `Authorization` header — either HTTP Basic (`base64(email:password)`, the user's own Docmost login, validated through `AuthService`) or a Bearer access JWT (the user's `authToken`) — and the session acts under that user's permissions. `MCP_DOCMOST_EMAIL` / `MCP_DOCMOST_PASSWORD` are an **optional service-account fallback**, used only when a request carries neither Basic nor Bearer credentials (back-compat for CI/scripts). An admin enables MCP with a workspace toggle (Workspace settings → AI). Optionally protected by a shared `MCP_TOKEN`: when set, every `/mcp` request must carry a matching `X-MCP-Token` header (its own header, separate from `Authorization`, which now carries the per-user Basic/Bearer credentials). Note: this changed from the older `Authorization: Bearer ` scheme — see `.env.example` and the CHANGELOG Breaking Changes entry. +1. **Embedded MCP server** (`integrations/mcp/` + `packages/mcp`). 
The standalone `@docmost/mcp` server (39 agent-native tools: per-block patch/insert/delete by id, scripted `(doc)=>doc` transforms with dry-run diff, table editing, version diff/restore, comments, images, shares) is bundled and served over HTTP at `/mcp`. It writes through Docmost's real-time-collaboration layer so concurrent human edits aren't clobbered. Each request authenticates **per-user** via the `Authorization` header — either HTTP Basic (`base64(email:password)`, the user's own Docmost login, validated through `AuthService`) or a Bearer access JWT (the user's `authToken`) — and the session acts under that user's permissions. `MCP_DOCMOST_EMAIL` / `MCP_DOCMOST_PASSWORD` are an **optional service-account fallback**, used only when a request carries neither Basic nor Bearer credentials (back-compat for CI/scripts). An admin enables MCP with a workspace toggle (Workspace settings → AI). Optionally protected by a shared `MCP_TOKEN`: when set, every `/mcp` request must carry a matching `X-MCP-Token` header (its own header, separate from `Authorization`, which now carries the per-user Basic/Bearer credentials). Note: this changed from the older `Authorization: Bearer ` scheme — see `.env.example` and the CHANGELOG Breaking Changes entry. 2. **AI agent chat** (`core/ai-chat/` server + `apps/client/src/features/ai-chat/` client). A built-in agent over the wiki using the Vercel **AI SDK** (`ai`, `@ai-sdk/*`) against any OpenAI-compatible provider configured per workspace (`integrations/ai/` — credentials encrypted at rest via `integrations/crypto`, stored in `ai_provider_credentials`). Key pieces: - `core/ai-chat/tools/` — the agent's ~40 read+write tools. Every tool runs under the **calling user's** CASL permissions via a per-user loopback access token (`docmost-client.loader.ts`), so the agent can never exceed what the user could do. Only **reversible** operations are exposed (page history + trash; no permanent delete). 
Agent edits get an "AI agent" provenance badge in page history (`20260616T130000-agent-provenance` migration). - `core/ai-chat/embedding/` — RAG indexer + a BullMQ consumer on `AI_QUEUE` that embeds pages into `page_embeddings` (vector search), complementing Postgres full-text search. Pages are (re)indexed on edit; `AI_EMBEDDING_TIMEOUT_MS` bounds a hung embeddings endpoint. diff --git a/README.md b/README.md index cbbbdcab..8fd95cf5 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,7 @@ The goal of the fork is a **100% open, AGPL-only build with no Enterprise-Editio | --- | --- | | **EE code removed** | Stripped all client and server Enterprise-Edition code; ships as a clean community/AGPL build with no license checks. | | **Comment resolution** | Re-implemented from scratch as a community feature (resolve / re-open with Open/Resolved tabs). No EE code reused, available to anyone who can comment. | -| **Embedded MCP server** | A community MCP server (`@docmost/mcp`, 38 tools) is served over HTTP at `/mcp` — no enterprise license required. Replaces the removed license-gated EE MCP. | +| **Embedded MCP server** | A community MCP server (`@docmost/mcp`, 39 tools) is served over HTTP at `/mcp` — no enterprise license required. Replaces the removed license-gated EE MCP. | | **AI agent chat** | Built-in AI agent chat over your wiki, written from scratch as a community feature — no enterprise license. The agent reads and edits pages on your behalf (scoped to your permissions), with full-text + vector (RAG) search and optional web access via external MCP servers. | | **Rebranding** | App logo / name changed from *Docmost* to *Gitmost*. | | **Compact page tree** | Default page-tree indentation reduced from 16px to 8px per nesting level. 
| @@ -44,7 +44,7 @@ The goal of the fork is a **100% open, AGPL-only build with no Enterprise-Editio ### Embedded MCP server Gitmost has **our own MCP server** — [docmost-mcp](https://github.com/vvzvlad/docmost-mcp), -which we wrote — **built directly into the app** and served at `/mcp`. It exposes **38 +which we wrote — **built directly into the app** and served at `/mcp`. It exposes **39 agent-native tools**: surgical per-block edits (patch / insert / delete by id), structure-preserving find/replace, scripted `(doc) => doc` transforms with a dry-run diff, structured table editing, version history with diff / restore, comments, images and share @@ -60,7 +60,7 @@ every little fix. And it needs no enterprise license. | | **Gitmost `/mcp` (our docmost-mcp)** | Docmost's built-in MCP | | --- | :---: | :---: | | **Enterprise license** | Not required | Required | -| **Tools** | 38, agent-native | Coarse (read Markdown, page CRUD, replace whole page) | +| **Tools** | 39, agent-native | Coarse (read Markdown, page CRUD, replace whole page) | | **Per-block edits / find-replace / scripted transforms** | ✅ | — | | **Structured table editing, version diff / restore** | ✅ | — | | **Comments, images, share links** | ✅ | — | diff --git a/apps/server/src/core/page/services/page.service.footnote-canonicalize.spec.ts b/apps/server/src/core/page/services/page.service.footnote-canonicalize.spec.ts new file mode 100644 index 00000000..3d2dac75 --- /dev/null +++ b/apps/server/src/core/page/services/page.service.footnote-canonicalize.spec.ts @@ -0,0 +1,153 @@ +// Binding test for issue #228 must-fix #1 / test-coverage #12: footnote +// canonicalization moved OUT of parseProsemirrorContent and is now applied only +// on FULL-document writes (createPage, and updatePageContent with operation +// 'replace'), NEVER on an append/prepend FRAGMENT. 
+// +// The Yjs encode / plain-text extract are stubbed (partial module mock keeps the +// REAL canonicalizeFootnotes) and parseProsemirrorContent is spied to return the +// raw fixture, so the test isolates the canonicalize BINDING from schema/Yjs. +jest.mock('@docmost/editor-ext', () => { + const actual = jest.requireActual('@docmost/editor-ext'); + return { + ...actual, + createYdocFromJson: jest.fn(() => Buffer.from([])), + jsonToText: jest.fn(() => ''), + }; +}); + +import { PageService } from './page.service'; + +const refNode = (id: string) => ({ type: 'footnoteReference', attrs: { id } }); +const defNode = (id: string, text: string) => ({ + type: 'footnoteDefinition', + attrs: { id }, + content: [{ type: 'paragraph', content: [{ type: 'text', text }] }], +}); +const doc = (...content: any[]) => ({ type: 'doc', content }); + +/** A full doc whose footnote definitions are OUT of reference order (b,a refs; + * a,b defs) — canonicalization must reorder the definitions to [b, a]. */ +const outOfOrderFull = () => + doc( + { type: 'paragraph', content: [{ type: 'text', text: 'x' }, refNode('b'), refNode('a')] }, + { type: 'footnotesList', content: [defNode('a', 'A'), defNode('b', 'B')] }, + ); + +/** A definition-ONLY fragment (no references): canonicalizing it would drop the + * whole footnotesList (referenceIds is empty) — i.e. LOSE the footnote. */ +const defOnlyFragment = () => + doc({ type: 'footnotesList', content: [defNode('a', 'appended note')] }); + +/** A reference-only fragment that REUSES an id defined elsewhere in the live + * doc: canonicalizing it would synthesize a bogus empty footnotesList/def. */ +const refReuseFragment = () => + doc({ type: 'paragraph', content: [{ type: 'text', text: 'more' }, refNode('a')] }); + +function listDefIds(content: any): string[] { + const list = (content.content ?? []).find((n: any) => n.type === 'footnotesList'); + return (list?.content ?? 
[]) + .filter((n: any) => n.type === 'footnoteDefinition') + .map((n: any) => n.attrs?.id); +} +function hasFootnotesList(content: any): boolean { + return (content.content ?? []).some((n: any) => n.type === 'footnotesList'); +} + +describe('PageService footnote canonicalization binding (#228)', () => { + function makeService() { + let insertedContent: any = null; + let yjsPayload: any = null; + + const pageRepo = { + insertPage: jest.fn(async (values: any) => { + insertedContent = values.content; + return { id: 'page-id', slugId: 'slug-id' }; + }), + }; + const generalQueue = { add: jest.fn().mockReturnValue({ catch: jest.fn() }) }; + const collaborationGateway = { + handleYjsEvent: jest.fn(async (_evt: string, _name: string, payload: any) => { + yjsPayload = payload; + }), + }; + + const service = new PageService( + pageRepo as any, + {} as any, // pagePermissionRepo + {} as any, // attachmentRepo + {} as any, // db + {} as any, // storageService + {} as any, // attachmentQueue + {} as any, // aiQueue + generalQueue as any, + {} as any, // eventEmitter + collaborationGateway as any, + {} as any, // watcherService + {} as any, // transclusionService + ); + // Isolate the canonicalize BINDING: return the raw fixture (a deep clone so + // canonicalize never mutates the caller's object) instead of running the + // real markdown/HTML/JSON parse + schema validation. 
+ jest + .spyOn(service as any, 'parseProsemirrorContent') + .mockImplementation(async (content: any) => structuredClone(content)); + jest.spyOn(service as any, 'nextPagePosition').mockResolvedValue('a0'); + + return { service, getInsertedContent: () => insertedContent, getYjsPayload: () => yjsPayload }; + } + + it('createPage (full write) canonicalizes footnotes into reference order', async () => { + const { service, getInsertedContent } = makeService(); + await service.create('user-id', 'workspace-id', { + spaceId: 'space-id', + content: outOfOrderFull(), + format: 'json', + } as any); + // Definitions reordered to reference order [b, a]. + expect(listDefIds(getInsertedContent())).toEqual(['b', 'a']); + }); + + it("updatePageContent operation 'replace' canonicalizes footnotes", async () => { + const { service, getYjsPayload } = makeService(); + await service.updatePageContent( + 'page-id', + outOfOrderFull(), + 'replace' as any, + 'json' as any, + { id: 'user-id' } as any, + ); + expect(getYjsPayload().operation).toBe('replace'); + expect(listDefIds(getYjsPayload().prosemirrorJson)).toEqual(['b', 'a']); + }); + + it("append of a definition-only fragment is NOT canonicalized (footnote preserved, not dropped)", async () => { + const { service, getYjsPayload } = makeService(); + await service.updatePageContent( + 'page-id', + defOnlyFragment(), + 'append' as any, + 'json' as any, + { id: 'user-id' } as any, + ); + // Canonicalizing a reference-less fragment would DROP the whole list; the + // fragment must pass through untouched so the merge keeps the definition. 
+ expect(getYjsPayload().operation).toBe('append'); + expect(hasFootnotesList(getYjsPayload().prosemirrorJson)).toBe(true); + expect(listDefIds(getYjsPayload().prosemirrorJson)).toEqual(['a']); + }); + + it('prepend of a reference-reuse fragment is NOT canonicalized (no synthesized garbage list)', async () => { + const { service, getYjsPayload } = makeService(); + await service.updatePageContent( + 'page-id', + refReuseFragment(), + 'prepend' as any, + 'json' as any, + { id: 'user-id' } as any, + ); + // Canonicalizing would synthesize a bogus empty footnotesList for the reused + // reference; the fragment must pass through with no list at all. + expect(getYjsPayload().operation).toBe('prepend'); + expect(hasFootnotesList(getYjsPayload().prosemirrorJson)).toBe(false); + }); +}); diff --git a/apps/server/src/core/page/services/page.service.ts b/apps/server/src/core/page/services/page.service.ts index 97133e01..44382d8a 100644 --- a/apps/server/src/core/page/services/page.service.ts +++ b/apps/server/src/core/page/services/page.service.ts @@ -160,9 +160,14 @@ export class PageService { let ydoc = undefined; if (createPageDto?.content && createPageDto?.format) { - const prosemirrorJson = await this.parseProsemirrorContent( - createPageDto.content, - createPageDto.format, + // createPage always writes a FULL document, so canonicalize footnotes to + // the editor's invariant before persisting (issue #228). Pure + idempotent + // + shape-safe: a doc with no footnotes is returned unchanged. 
+ const prosemirrorJson = canonicalizeFootnotes( + await this.parseProsemirrorContent( + createPageDto.content, + createPageDto.format, + ), ); content = prosemirrorJson; @@ -343,7 +348,17 @@ export class PageService { format: ContentFormat, user: User, ): Promise { - const prosemirrorJson = await this.parseProsemirrorContent(content, format); + let prosemirrorJson = await this.parseProsemirrorContent(content, format); + + // Canonicalize footnotes ONLY for a full-document write ('replace'). For an + // append/prepend FRAGMENT, canonicalizing is semantically wrong (it would + // drop a definition-only fragment's list, or synthesize a duplicate empty + // definition for a fragment reusing an existing id) — the fragment merges + // into the live doc where the editor's footnoteSyncPlugin keeps the invariant + // (issue #228, must-fix #1). + if (operation === 'replace') { + prosemirrorJson = canonicalizeFootnotes(prosemirrorJson); + } const documentName = `page.${pageId}`; await this.collaborationGateway.handleYjsEvent( @@ -1301,14 +1316,18 @@ export class PageService { } } - // markdown/html are converted via markdownToHtml -> htmlToJson and json may - // be written programmatically (API createPage/updatePageContent) — none of - // these run the editor's footnoteSyncPlugin, so footnotes keep the source's - // physical order, orphans survive, and reused references aren't collapsed. - // Canonicalize to the editor's invariant before persisting (issue #228). - // Pure + idempotent + shape-safe: a doc with no footnotes is unchanged. - prosemirrorJson = canonicalizeFootnotes(prosemirrorJson); - + // NOTE: footnote canonicalization is intentionally NOT done here. This + // method serves BOTH full writes (createPage / updatePageContent with + // operation 'replace') AND fragment writes (append / prepend). Canonicalizing + // a FRAGMENT is semantically wrong — e.g. 
a definition-only fragment has no + // references, so the canonicalizer would drop its whole footnotesList (lost + // footnotes), and a fragment reusing an existing id would synthesize an empty + // duplicate definition. The canonicalizer therefore runs only at the + // FULL-DOCUMENT callers (createPage, and updatePageContent for 'replace'), + // never on a fragment (issue #228, must-fix #1). + // (Future consolidation, architecture B: the import services persist via a + // different path; folding all of these into one "prepare JSON for persist" + // helper would centralize the canonicalize call — left as follow-up.) try { jsonToNode(prosemirrorJson); } catch (err) { diff --git a/apps/server/src/integrations/import/services/file-import-task.service.ts b/apps/server/src/integrations/import/services/file-import-task.service.ts index 7666e9b7..5ec2fe8d 100644 --- a/apps/server/src/integrations/import/services/file-import-task.service.ts +++ b/apps/server/src/integrations/import/services/file-import-task.service.ts @@ -504,6 +504,9 @@ export class FileImportTaskService { // zip-imported page's footnotes are reference-ordered, deduped, and // orphan-free like the editor's invariant (issue #228). Pure + // idempotent + shape-safe; a footnote-free doc is unchanged. + // (Future consolidation, architecture B: like import.service, this + // path persists directly rather than via PageService — a shared + // "prepare JSON for persist" helper would centralize this call.) const prosemirrorJson = canonicalizeFootnotes(extractedJson); const insertablePage: InsertablePage = { diff --git a/apps/server/src/integrations/import/services/import.service.ts b/apps/server/src/integrations/import/services/import.service.ts index c2057a73..75418e55 100644 --- a/apps/server/src/integrations/import/services/import.service.ts +++ b/apps/server/src/integrations/import/services/import.service.ts @@ -91,6 +91,10 @@ export class ImportService { // retains orphan definitions, and is not deduped. 
Canonicalize before // persisting so the stored page matches the editor's invariant (issue #228). // Pure + idempotent + shape-safe: a doc with no footnotes is unchanged. + // (Future consolidation, architecture B: this import path persists directly + // via pageRepo.insertPage rather than through PageService.createPage, so the + // canonicalize call lives here; folding both into one "prepare JSON for + // persist" helper is a sensible follow-up.) const prosemirrorJson = canonicalizeFootnotes(extracted.prosemirrorJson); const pageTitle = title || fileName; diff --git a/packages/editor-ext/src/lib/footnote/footnote-canonicalize.test.ts b/packages/editor-ext/src/lib/footnote/footnote-canonicalize.test.ts index 543c2028..80b56874 100644 --- a/packages/editor-ext/src/lib/footnote/footnote-canonicalize.test.ts +++ b/packages/editor-ext/src/lib/footnote/footnote-canonicalize.test.ts @@ -7,6 +7,7 @@ import { FootnoteReference } from './footnote-reference'; import { FootnotesList } from './footnotes-list'; import { FootnoteDefinition } from './footnote-definition'; import { canonicalizeFootnotes } from './footnote-canonicalize'; +import { FOOTNOTE_CORPUS } from './footnote-corpus'; import { collectReferenceIds, computeFootnoteNumbers, @@ -325,3 +326,21 @@ describe('canonicalizeFootnotes golden parity with footnoteSyncPlugin', () => { expect(new Set(defOrder(steady))).toEqual(new Set(defOrder(canon))); }); }); + +/** + * SHARED golden corpus: this editor-ext copy of `canonicalizeFootnotes` and the + * MCP mirror (`packages/mcp/src/lib/footnote-canonicalize.ts`) are BOTH run + * against the identical { input -> expected } corpus. Pinning the same expected + * outputs in both suites makes "the two pure copies behave identically" a + * checkable property without coupling the packages (architecture item A). The + * MCP mirror of these assertions lives in `test/unit/footnote-corpus.test.mjs`. 
+ */ +describe('canonicalizeFootnotes shared golden corpus (editor-ext copy)', () => { + for (const { name, input, expected } of FOOTNOTE_CORPUS) { + it(`matches the corpus expected output: ${name}`, () => { + expect(canonicalizeFootnotes(input)).toEqual(expected); + // Idempotent on the corpus too. + expect(canonicalizeFootnotes(expected)).toEqual(expected); + }); + } +}); diff --git a/packages/editor-ext/src/lib/footnote/footnote-canonicalize.ts b/packages/editor-ext/src/lib/footnote/footnote-canonicalize.ts index 5017dc05..db543519 100644 --- a/packages/editor-ext/src/lib/footnote/footnote-canonicalize.ts +++ b/packages/editor-ext/src/lib/footnote/footnote-canonicalize.ts @@ -2,7 +2,6 @@ import { FOOTNOTE_REFERENCE_NAME, FOOTNOTES_LIST_NAME, FOOTNOTE_DEFINITION_NAME, - deriveFootnoteId, } from './footnote-util'; /** @@ -11,14 +10,20 @@ import { * `appendTransaction` that only runs inside a ProseMirror `EditorView`, this is * a PURE function over ProseMirror JSON: `canonicalizeFootnotes(doc) -> doc`. * - * It exists because every NON-editor write path (the MCP `markdownToProseMirror` - * importer, `update_page_json`, `docmost_transform`, the future git-sync writer) - * builds ProseMirror JSON directly via `TiptapTransformer`/`updateYFragment`, - * which NEVER runs the editor's plugins — so the canonical footnote topology was - * never enforced on those writes. That is the root cause of the symptom in the - * issue: footnotes rendered out of order (`1, 4, 2, 3, …`), a raw trailing - * `[^id]: …` block, and orphan definitions, all of which are simply the result - * of content written PAST the canonicalizer. + * It exists because the NON-editor write paths served by THIS copy build + * ProseMirror JSON directly (never running the editor's plugins), so the + * canonical footnote topology was never enforced on those writes. 
The consumers + * of this editor-ext copy are: the server markdown/HTML import + * (`markdownToHtml -> htmlToJson` in import.service / file-import-task.service), + * `PageService` create/update (`parseProsemirrorContent` for the JSON/markdown/ + * HTML REST write paths), and the client markdown PASTE path + * (`markdown-clipboard.ts`). (The MCP package mirrors this canonicalizer in + * `packages/mcp/src/lib/footnote-canonicalize.ts` for its own write paths — + * `markdownToProseMirror`, `update_page_json`, `docmost_transform`, + * `insert_footnote` — see that file's header.) All of these are the root cause + * of the symptom in the issue: footnotes rendered out of order (`1, 4, 2, 3, …`), + * a raw trailing `[^id]: …` block, and orphan definitions, all of which are + * simply the result of content written PAST the canonicalizer. * * The desired end-state (identical to the plugin's) is: * @@ -31,12 +36,14 @@ import { * or synthesizing an empty one when missing. The list sits after the last * meaningful block (only trailing empty paragraphs may follow it). * 3. Orphan definitions (no matching reference) are dropped. - * 4. Duplicate DEFINITIONS (two nodes sharing an id) are resolved - * deterministically: the first keeps the id; each later duplicate is re-id'd - * via `deriveFootnoteId` (never random) so it is never silently lost — and, - * lacking a matching reference, it then falls under the orphan policy and is - * dropped. This matches the editor's never-lose-by-collision rule and the - * importer's first-wins rule (both converge to "one definition per id"). + * 4. Duplicate DEFINITIONS (two nodes sharing an id) are resolved first-wins: + * the first definition for an id is kept; later duplicates carry the SAME + * id, so they can never be referenced separately and are simply dropped. + * This matches the importer's first-wins rule ("one definition per id"). 
+ * (The LIVE editor instead re-id's a duplicate definition so a paste/collab + * merge cannot silently lose live user data; the artifacts this copy + * sanitizes are agent/import-authored, so first-wins is the right policy — + * see footnote-sync.ts `resolveCollisions`.) * 5. Idempotent: a document that already satisfies the invariant is returned * structurally unchanged (the existing definition/list nodes are reused * verbatim), so re-running the canonicalizer — or running it on a write that @@ -47,10 +54,18 @@ import { * PHYSICAL order of existing definition nodes to keep their Yjs/CRDT subtree * identity stable across collaborators (numbering is decoration-derived, so the * displayed numbers are correct regardless of physical order). This function has - * no live CRDT to protect, so it physically REORDERS the list into reference - * order — which is exactly the repair the out-of-order import needs. On every - * editor-reachable steady state (where the list is already reference-ordered) the - * two agree byte-for-byte; see the golden test. + * no live CRDT to protect, so when a REPAIR is needed it physically REORDERS the + * list into reference order — which is exactly the fix the out-of-order import + * needs. + * + * Placement PARITY with the plugin: when the document is already in the canonical + * single-list state, this function leaves that list EXACTLY where it sits (it + * does not move it to the end). The plugin behaves the same — it treats one + * footnotesList holding the canonical definition set as canonical regardless of + * whether content follows it (footnote-sync.ts: `primaryList` falls back to the + * last list and `noChangeNeeded` stays true). So on every editor-reachable steady + * state the two agree byte-for-byte, including when non-empty content follows the + * list; see the golden parity test and the shared corpus. * * Pure: deep-clones its input, never mutates the caller's object, and is * deterministic (no `Math.random`/`Date.now`). 
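The invariant described in the header comment above (definitions in reference order, first-wins on duplicate ids, orphans dropped, no list when there are no references) can be illustrated with a minimal TypeScript sketch. This is not the project's implementation: `PmNode`, `collectRefIds`, `sketchCanonicalize`, `ref`, and `def` are hypothetical names, the shape of the synthesized empty definition is an assumption, and the sketch deliberately omits the placement-parity shortcut and the trailing-empty-paragraph handling that the real `canonicalizeFootnotes` performs.

```typescript
// Illustrative sketch only: a simplified stand-in for the real canonicalizer.
// All names here are hypothetical and are not project APIs.
type PmNode = {
  type: string;
  attrs?: { id?: string };
  content?: PmNode[];
  text?: string;
};

// Collect footnote reference ids in document order, each id once.
function collectRefIds(node: PmNode, ids: string[]): void {
  // Assumption: references inside a footnotesList are not scanned.
  if (node.type === 'footnotesList') return;
  const id = node.attrs ? node.attrs.id : undefined;
  if (node.type === 'footnoteReference' && id && ids.indexOf(id) === -1) {
    ids.push(id);
  }
  for (const child of node.content || []) collectRefIds(child, ids);
}

function sketchCanonicalize(doc: PmNode): PmNode {
  // Work on a deep copy so the caller's object is never mutated.
  const out: PmNode = JSON.parse(JSON.stringify(doc));
  const top = out.content || [];

  const refIds: string[] = [];
  collectRefIds(out, refIds);

  // First definition per id wins; later duplicates are ignored.
  const firstDef: { [id: string]: PmNode } = {};
  for (const block of top) {
    if (block.type !== 'footnotesList') continue;
    for (const d of block.content || []) {
      const id = d.attrs ? d.attrs.id : undefined;
      if (id && !Object.prototype.hasOwnProperty.call(firstDef, id)) {
        firstDef[id] = d;
      }
    }
  }

  // One definition per referenced id, in reference order. Unreferenced
  // definitions (orphans) are never copied over. The empty-definition shape
  // below is an assumption made for this sketch.
  const ordered: PmNode[] = refIds.map((id) =>
    Object.prototype.hasOwnProperty.call(firstDef, id)
      ? firstDef[id]
      : { type: 'footnoteDefinition', attrs: { id }, content: [{ type: 'paragraph' }] },
  );

  // Strip every existing list; re-append a single one only if references exist.
  const body = top.filter((n) => n.type !== 'footnotesList');
  out.content =
    refIds.length === 0
      ? body
      : body.concat([{ type: 'footnotesList', content: ordered }]);
  return out;
}

// Small fixture helpers (hypothetical, for the usage example only).
const ref = (id: string): PmNode => ({ type: 'footnoteReference', attrs: { id } });
const def = (id: string, text: string): PmNode => ({
  type: 'footnoteDefinition',
  attrs: { id },
  content: [{ type: 'paragraph', content: [{ type: 'text', text }] }],
});

const sketchResult = sketchCanonicalize({
  type: 'doc',
  content: [
    { type: 'paragraph', content: [ref('b'), ref('a')] },
    {
      type: 'footnotesList',
      content: [def('a', 'A'), def('b', 'B'), def('a', 'dup'), def('orphan', 'O')],
    },
  ],
});
console.log(JSON.stringify(sketchResult, null, 2));
```

In this example the references appear as `b` then `a`, so the rebuilt list holds the `b` definition first and the first `a` definition second; the duplicate `a` and the `orphan` definition do not survive.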
@@ -76,62 +91,69 @@ export function canonicalizeFootnotes(doc: T): T { const defNodes: any[] = []; collectDefinitions(out, defNodes); - // 3) Resolve the id topology deterministically. The first definition for an id - // keeps it; a later duplicate is re-id'd to a fresh derived id (never lost), - // which — having no matching reference — is dropped as an orphan in step 4. - const taken = new Set(referenceIds); + // 3) First definition per id wins. Later duplicates carry the SAME id, so they + // can never be referenced separately and would be orphans — they are simply + // dropped (first-wins; see the file header, item 4). + const defById = new Map(); for (const d of defNodes) { const id = d?.attrs?.id; - if (id) taken.add(id); - } - const occurrenceOf = new Map(); - const seenDefIds = new Set(); - // finalId -> definition node (the node reference inside `out`). - const defByFinalId = new Map(); - for (const d of defNodes) { - const origId = d?.attrs?.id; - if (!origId) continue; - if (!seenDefIds.has(origId)) { - seenDefIds.add(origId); - defByFinalId.set(origId, d); - } else { - const next = (occurrenceOf.get(origId) ?? 1) + 1; - occurrenceOf.set(origId, next); - const newId = deriveFootnoteId(origId, next, taken); - taken.add(newId); - defByFinalId.set(newId, d); - } + if (id && !defById.has(id)) defById.set(id, d); } // 4) Build the ordered definition list: one per referenced id, in REFERENCE // order, reusing the existing node (content preserved, id normalized) or - // synthesizing an empty definition. Definitions whose final id is NOT - // referenced are orphans and are simply never added. + // synthesizing an empty definition. Definitions whose id is NOT referenced + // are orphans and are simply never added. The reused node is SHALLOW-copied + // (id normalized): `out` is already a deep clone and the old lists are cut, + // so a second per-definition deep clone is needless. 
const orderedDefs: any[] = []; for (const id of referenceIds) { - const existing = defByFinalId.get(id); + const existing = defById.get(id); if (existing) { - const node = cloneJson(existing); - node.attrs = { ...(node.attrs ?? {}), id }; - orderedDefs.push(node); + orderedDefs.push({ + ...existing, + attrs: { ...(existing.attrs ?? {}), id }, + }); } else { orderedDefs.push(emptyDefinition(id)); } } - // 5) Strip every existing top-level footnotesList; we rebuild a single one. - const top: any[] = out.content.filter( - (n: any) => !(n && n.type === FOOTNOTES_LIST_NAME), - ); - - // 6) No references -> there must be NO list at all. + // 5) No references -> there must be NO list at all. if (referenceIds.length === 0) { - out.content = top; + out.content = out.content.filter( + (n: any) => !(n && n.type === FOOTNOTES_LIST_NAME), + ); return out; } - // 7) Insert exactly one footnotesList after the last meaningful (non-empty - // paragraph) block, so it coexists with a trailing-node empty paragraph. + // 6) Placement parity with the live plugin: when the document is ALREADY in the + // canonical single-list state, leave that list exactly where it sits instead + // of cutting and re-inserting it at the end. The plugin never repositions a + // sole correct list (footnote-sync.ts), so moving it here would silently + // reorder any user content that follows the list on the first write. The doc + // is in that state when there is exactly one top-level footnotesList, every + // definition in the doc is referenced (no orphans / duplicates: the def count + // equals the canonical count), and the list already holds exactly the + // canonical definitions in reference order. 
+ const topLevelLists = out.content.filter( + (n: any) => n && n.type === FOOTNOTES_LIST_NAME, + ); + if ( + topLevelLists.length === 1 && + defNodes.length === orderedDefs.length && + deepEqualJson(topLevelLists[0].content, orderedDefs) + ) { + return out; + } + + // 7) Otherwise rebuild: strip every footnotesList and re-insert exactly one + // after the last meaningful (non-empty paragraph) block, so it coexists with + // a trailing-node empty paragraph. This both repairs a non-canonical doc and + // (in the import case) physically reorders the list into reference order. + const top: any[] = out.content.filter( + (n: any) => !(n && n.type === FOOTNOTES_LIST_NAME), + ); let insertAt = top.length; while (insertAt > 0 && isEmptyParagraph(top[insertAt - 1])) insertAt--; top.splice(insertAt, 0, { type: FOOTNOTES_LIST_NAME, content: orderedDefs }); @@ -139,6 +161,36 @@ export function canonicalizeFootnotes(doc: T): T { return out; } +/** + * Deep equality over plain JSON; object key order is ignored, array order is + * significant. Used to detect an already-canonical footnotesList so its + * physical position is preserved (placement parity with the live plugin). + */ +function deepEqualJson(a: any, b: any): boolean { + if (a === b) return true; + if (a == null || b == null || typeof a !== typeof b) return false; + if (Array.isArray(a) || Array.isArray(b)) { + if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) { + return false; + } + for (let i = 0; i < a.length; i++) { + if (!deepEqualJson(a[i], b[i])) return false; + } + return true; + } + if (typeof a === 'object') { + const ka = Object.keys(a); + const kb = Object.keys(b); + if (ka.length !== kb.length) return false; + for (const k of ka) { + if (!Object.prototype.hasOwnProperty.call(b, k)) return false; + if (!deepEqualJson(a[k], b[k])) return false; + } + return true; + } + return false; +} + /** A fresh empty definition node for a referenced id with no definition. 
*/ function emptyDefinition(id: string): any { return { diff --git a/packages/editor-ext/src/lib/footnote/footnote-corpus.ts b/packages/editor-ext/src/lib/footnote/footnote-corpus.ts new file mode 100644 index 00000000..e8521b74 --- /dev/null +++ b/packages/editor-ext/src/lib/footnote/footnote-corpus.ts @@ -0,0 +1,1179 @@ +/** + * SHARED golden corpus for the footnote canonicalizer (issue #228). + * + * Each case is { name, input, expected } where `expected` is exactly what + * `canonicalizeFootnotes(input)` must return. This is the CANONICAL copy; it is + * mirrored verbatim (data only) in `packages/mcp/test/unit/footnote-corpus.mjs`. + * Both the editor-ext copy and the MCP mirror of `canonicalizeFootnotes` are run + * against this corpus by their respective test suites, which turns "the two + * pure copies behave identically" into a checkable property without coupling the + * packages at build time. When you change one corpus, change the other. + * + * Coverage includes (besides ordering/orphan/reuse/dedup/synth/merge): a single + * canonical list with NON-EMPTY content after it (must NOT be repositioned — + * plugin placement parity, must-fix #2) and a reference nested inside a callout + * (the recursive collection, test-coverage #14). 
+ */ +export interface FootnoteCorpusCase { + name: string; + input: any; + expected: any; +} + +export const FOOTNOTE_CORPUS: FootnoteCorpusCase[] = [ + { + "name": "out-of-order defs ordered by first reference", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "b" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "c" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "c" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "C" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "b" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "B" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "D" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "b" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "c" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "b" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + 
"type": "text", + "text": "B" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "D" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "c" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "C" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "orphan definition dropped", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "orphan" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "O" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "no references removes the list", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "plain" + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": 
"footnoteDefinition", + "attrs": { + "id": "orphan" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "O" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "plain" + } + ] + } + ] + } + }, + { + "name": "reuse: repeated references collapse to one definition", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "text", + "text": " a " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "shared" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "text", + "text": " a " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "shared" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "duplicate definitions: first wins", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + 
"type": "paragraph", + "content": [ + { + "type": "text", + "text": "first" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "second" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "third" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "first" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "synthesizes an empty definition for a reference with none", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "missing" + } + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "missing" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "missing" + }, + "content": [ + { + "type": "paragraph" + } + ] + } + ] + } + ] + } + }, + { + "name": "merges multiple footnotesList nodes into one", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "a" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "x" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "y" + } + } + ] + }, + { + "type": 
"footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "x" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "X" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "tail" + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "y" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "Y" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "a" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "x" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "y" + } + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "tail" + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "x" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "X" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "y" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "Y" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "single canonical list before a trailing empty paragraph stays put", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph" + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + 
"content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph" + } + ] + } + }, + { + "name": "single canonical list with NON-EMPTY content after it is NOT moved (plugin parity)", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "epilogue text" + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "epilogue text" + } + ] + } + ] + } + }, + { + "name": "reference inside a nested container (callout) is collected", + "input": { + "type": "doc", + "content": [ + { + "type": "callout", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "see " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "n" + } + } + ] + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + 
"type": "footnoteDefinition", + "attrs": { + "id": "n" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "note" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "callout", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "see " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "n" + } + } + ] + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "n" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "note" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "no footnotes at all is unchanged", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "just text" + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "just text" + } + ] + } + ] + } + } +]; diff --git a/packages/mcp/build/index.js b/packages/mcp/build/index.js index 06bc19ea..58197d09 100644 --- a/packages/mcp/build/index.js +++ b/packages/mcp/build/index.js @@ -656,7 +656,8 @@ export function createDocmostMcpServer(config) { "parenthesized function). 
It receives a clone of the live doc and " + "ctx (comments, log, consume(id), helpers: blockText/walk/getList/" + "insertMarkerAfter/setCalloutRange/noteItem/mdToInlineNodes/" + - "commentsToFootnotes) and must return a {type:'doc'} node."), + "commentsToFootnotes/canonicalizeFootnotes/insertInlineFootnote) " + + "and must return a {type:'doc'} node."), dryRun: z .boolean() .optional() diff --git a/packages/mcp/build/lib/collaboration.js b/packages/mcp/build/lib/collaboration.js index 1b6b1a10..67942c6d 100644 --- a/packages/mcp/build/lib/collaboration.js +++ b/packages/mcp/build/lib/collaboration.js @@ -344,7 +344,16 @@ function extractFootnotes(markdown) { section: `
${inner}
`, }; } -/** Convert markdown to a ProseMirror doc using the full Docmost schema. */ +/** + * Convert markdown to a ProseMirror doc using the full Docmost schema. + * + * NOTE: besides the page-import write paths, this is also reused for comment + * bodies (createComment / updateComment). For an ordinary comment the + * canonicalize call below is a no-op (a comment carries no footnotes), so the + * reuse is safe; the only theoretical effect is if footnote markup were ever + * authored INSIDE a comment — a narrow case where canonicalizing the comment's + * own (self-contained) footnotes is still the correct behaviour. + */ export async function markdownToProseMirror(markdownContent) { const withCallouts = await preprocessCallouts(markdownContent); const { body, section } = extractFootnotes(withCallouts); diff --git a/packages/mcp/build/lib/footnote-authoring.js b/packages/mcp/build/lib/footnote-authoring.js new file mode 100644 index 00000000..ab8d7eb2 --- /dev/null +++ b/packages/mcp/build/lib/footnote-authoring.js @@ -0,0 +1,88 @@ +/** + * Inline-authoring helpers for footnotes (MCP). + * + * These build/identify footnote DEFINITION nodes for the author-inline tool + * (`insertInlineFootnote` in transforms.ts): a content key to de-duplicate notes + * by text, a definition-node factory, and a fresh uuidv7-style id generator. + * + * Split out of `footnote-canonicalize.ts` so that module stays a pure MIRROR of + * the editor-ext canonicalizer (compositionally symmetric to the editor-ext + * copy, which keeps its authoring helpers in `footnote-util.ts`). The pure + * canonicalizer has no dependency on these. + */ +const FOOTNOTE_DEFINITION_NAME = "footnoteDefinition"; +function cloneJson(v) { + if (typeof structuredClone === "function") + return structuredClone(v); + return JSON.parse(JSON.stringify(v)); +} +/** + * Normalized content key for de-duplicating footnote DEFINITIONS by their text. 
+ * + * Two definitions with the same key are the SAME footnote — so the inline + * authoring tool reuses one id (one number, one definition, several references) + * instead of minting a second definition. Key = plaintext (whitespace-collapsed, + * trimmed) PLUS a signature of the inline mark types in order, so two notes that + * read the same but differ in formatting (one bold, one plain) are NOT merged. + * Conservative: only an exact match merges. + */ +export function footnoteContentKey(defNode) { + const parts = []; + const visit = (n) => { + if (!n || typeof n !== "object") + return; + if (n.type === "text" && typeof n.text === "string") { + const marks = Array.isArray(n.marks) + ? n.marks.map((m) => m?.type).filter(Boolean).sort().join(",") + : ""; + parts.push(`${n.text}${marks}`); + } + if (Array.isArray(n.content)) + for (const c of n.content) + visit(c); + }; + visit(defNode); + // Collapse the assembled text's whitespace and trim, keeping the mark + // signature attached so formatting differences still distinguish notes. + return parts + .join("") + .replace(/[ \t\r\n]+/g, " ") + .trim(); +} +/** + * Build a footnoteDefinition node from inline ProseMirror nodes, keyed by id. + */ +export function makeFootnoteDefinition(id, inlineNodes) { + const content = Array.isArray(inlineNodes) ? cloneJson(inlineNodes) : []; + return { + type: FOOTNOTE_DEFINITION_NAME, + attrs: { id }, + content: [{ type: "paragraph", content }], + }; +} +/** + * Generate a uuidv7-style id (time-ordered), matching editor-ext's + * `generateFootnoteId`. Used for a genuinely-new inline footnote id. 
+ */ +export function generateFootnoteId() { + const now = Date.now(); + const timeHex = now.toString(16).padStart(12, "0"); + const rand = (length) => { + let s = ""; + for (let i = 0; i < length; i++) + s += Math.floor(Math.random() * 16).toString(16); + return s; + }; + const versioned = "7" + rand(3); + const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); + const variant = variantNibble + rand(3); + return (timeHex.slice(0, 8) + + "-" + + timeHex.slice(8, 12) + + "-" + + versioned + + "-" + + variant + + "-" + + rand(12)); +} diff --git a/packages/mcp/build/lib/footnote-canonicalize.js b/packages/mcp/build/lib/footnote-canonicalize.js index 056a2d31..92511ae1 100644 --- a/packages/mcp/build/lib/footnote-canonicalize.js +++ b/packages/mcp/build/lib/footnote-canonicalize.js @@ -1,5 +1,5 @@ /** - * Server-side footnote canonicalizer + inline authoring helper (MCP mirror). + * Server-side footnote canonicalizer (MCP mirror — PURE). * * `canonicalizeFootnotes(doc)` is a pure ProseMirror-JSON port of the editor's * `footnoteSyncPlugin` end-state, identical in behaviour to @@ -8,7 +8,13 @@ * `docmost-schema.ts` nodes are mirrored: the MCP package is deliberately * decoupled from the browser/React-heavy editor barrel and operates on plain * JSON. The editor-ext copy owns the golden test against the live plugin; this - * copy must stay behaviourally identical. + * copy must stay behaviourally identical (a SHARED golden corpus, exercised by + * both test suites, pins that — see `test/unit/footnote-corpus.mjs`). + * + * This module is the pure MIRROR only. The inline-authoring helpers + * (`footnoteContentKey`, `makeFootnoteDefinition`, `generateFootnoteId`) used by + * `insertInlineFootnote` live in the sibling `footnote-authoring.ts`, so this + * file is compositionally symmetric to the editor-ext copy. 
* * Why it exists: every NON-editor write path (markdown import, update_page_json, * docmost_transform, insert_footnote) builds ProseMirror JSON directly, so the @@ -26,32 +32,6 @@ function cloneJson(v) { return structuredClone(v); return JSON.parse(JSON.stringify(v)); } -/** - * Deterministic unique id for the k-th (k >= 2) duplicate of an id during - * collision resolution. Pure function of (originalId, occurrence, taken) — no - * Math.random/Date.now — mirroring editor-ext's `deriveFootnoteId`. Kept local - * (the importer's first-wins de-dup means duplicates are rare here, but the - * canonicalizer must still resolve them deterministically). - */ -export function deriveFootnoteId(originalId, occurrence, taken) { - let candidate = `${originalId}__${occurrence}`; - let n = 0; - while (taken.has(candidate)) { - n += 1; - candidate = `${originalId}__${occurrence}${suffix(n)}`; - } - return candidate; -} -function suffix(n) { - let out = ""; - let x = n; - while (x > 0) { - const rem = (x - 1) % 25; - out = String.fromCharCode(98 + rem) + out; // 98 = 'b' - x = Math.floor((x - 1) / 25); - } - return out; -} function isEmptyParagraph(node) { return (!!node && node.type === "paragraph" && @@ -89,6 +69,41 @@ function emptyDefinition(id) { content: [{ type: "paragraph" }], }; } +/** + * Key-order-insensitive deep equality over plain JSON; array order still matters. + * Used to detect an already-canonical footnotesList so its physical position is + * preserved (placement parity with the live plugin). 
+ */ +function deepEqualJson(a, b) { + if (a === b) + return true; + if (a == null || b == null || typeof a !== typeof b) + return false; + if (Array.isArray(a) || Array.isArray(b)) { + if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) { + return false; + } + for (let i = 0; i < a.length; i++) { + if (!deepEqualJson(a[i], b[i])) + return false; + } + return true; + } + if (typeof a === "object") { + const ka = Object.keys(a); + const kb = Object.keys(b); + if (ka.length !== kb.length) + return false; + for (const k of ka) { + if (!Object.prototype.hasOwnProperty.call(b, k)) + return false; + if (!deepEqualJson(a[k], b[k])) + return false; + } + return true; + } + return false; +} /** * Canonicalize footnotes in a ProseMirror-JSON document. See the file header and * the editor-ext twin for the full contract. Pure (deep-clones input, @@ -101,52 +116,57 @@ export function canonicalizeFootnotes(doc) { return doc; } const out = cloneJson(doc); + // 1) Distinct reference ids in document order (deep — refs can live in + // callouts, tables, list items, ...). The ordering/numbering truth. const referenceIds = []; collectReferenceIds(out, referenceIds, new Set()); + // 2) Every definition node in document order (deep). const defNodes = []; collectDefinitions(out, defNodes); - const taken = new Set(referenceIds); + // 3) First definition per id wins; later duplicates carry the SAME id, so they + // cannot be referenced separately and would be orphans — they are dropped. + const defById = new Map(); for (const d of defNodes) { const id = d?.attrs?.id; - if (id) - taken.add(id); - } - const occurrenceOf = new Map(); - const seenDefIds = new Set(); - const defByFinalId = new Map(); - for (const d of defNodes) { - const origId = d?.attrs?.id; - if (!origId) - continue; - if (!seenDefIds.has(origId)) { - seenDefIds.add(origId); - defByFinalId.set(origId, d); - } - else { - const next = (occurrenceOf.get(origId) ?? 
1) + 1; - occurrenceOf.set(origId, next); - const newId = deriveFootnoteId(origId, next, taken); - taken.add(newId); - defByFinalId.set(newId, d); - } + if (id && !defById.has(id)) + defById.set(id, d); } + // 4) Build the ordered definition list: one per referenced id, in REFERENCE + // order, reusing the existing node (shallow-copied, id normalized — `out` is + // already deep-cloned and the old lists are cut) or synthesizing an empty + // one. Definitions whose id is not referenced are orphans and never added. const orderedDefs = []; for (const id of referenceIds) { - const existing = defByFinalId.get(id); + const existing = defById.get(id); if (existing) { - const node = cloneJson(existing); - node.attrs = { ...(node.attrs ?? {}), id }; - orderedDefs.push(node); + orderedDefs.push({ + ...existing, + attrs: { ...(existing.attrs ?? {}), id }, + }); } else { orderedDefs.push(emptyDefinition(id)); } } - const top = out.content.filter((n) => !(n && n.type === FOOTNOTES_LIST_NAME)); + // 5) No references -> there must be NO list at all. if (referenceIds.length === 0) { - out.content = top; + out.content = out.content.filter((n) => !(n && n.type === FOOTNOTES_LIST_NAME)); return out; } + // 6) Placement parity with the live plugin: when the document is ALREADY in the + // canonical single-list state, leave that list exactly where it sits rather + // than cutting and re-inserting it at the end (the plugin never repositions a + // sole correct list, so moving it would silently reorder any content that + // follows the list on the first write). + const topLevelLists = out.content.filter((n) => n && n.type === FOOTNOTES_LIST_NAME); + if (topLevelLists.length === 1 && + defNodes.length === orderedDefs.length && + deepEqualJson(topLevelLists[0].content, orderedDefs)) { + return out; + } + // 7) Otherwise rebuild: strip every footnotesList and re-insert exactly one + // after the last meaningful (non-empty paragraph) block. 
+ const top = out.content.filter((n) => !(n && n.type === FOOTNOTES_LIST_NAME)); let insertAt = top.length; while (insertAt > 0 && isEmptyParagraph(top[insertAt - 1])) insertAt--; @@ -154,73 +174,3 @@ export function canonicalizeFootnotes(doc) { out.content = top; return out; } -/** - * Normalized content key for de-duplicating footnote DEFINITIONS by their text. - * - * Two definitions with the same key are the SAME footnote — so the inline - * authoring tool reuses one id (one number, one definition, several references) - * instead of minting a second definition. Key = plaintext (whitespace-collapsed, - * trimmed) PLUS a signature of the inline mark types in order, so two notes that - * read the same but differ in formatting (one bold, one plain) are NOT merged. - * Conservative: only an exact match merges. - */ -export function footnoteContentKey(defNode) { - const parts = []; - const visit = (n) => { - if (!n || typeof n !== "object") - return; - if (n.type === "text" && typeof n.text === "string") { - const marks = Array.isArray(n.marks) - ? n.marks.map((m) => m?.type).filter(Boolean).sort().join(",") - : ""; - parts.push(`${n.text}${marks}`); - } - if (Array.isArray(n.content)) - for (const c of n.content) - visit(c); - }; - visit(defNode); - // Collapse the assembled text's whitespace and trim, keeping the mark - // signature attached so formatting differences still distinguish notes. - return parts - .join("") - .replace(/[ \t\r\n]+/g, " ") - .trim(); -} -/** - * Build a footnoteDefinition node from inline ProseMirror nodes, keyed by id. - */ -export function makeFootnoteDefinition(id, inlineNodes) { - const content = Array.isArray(inlineNodes) ? cloneJson(inlineNodes) : []; - return { - type: FOOTNOTE_DEFINITION_NAME, - attrs: { id }, - content: [{ type: "paragraph", content }], - }; -} -/** - * Generate a uuidv7-style id (time-ordered), matching editor-ext's - * `generateFootnoteId`. Used for a genuinely-new inline footnote id. 
- */ -export function generateFootnoteId() { - const now = Date.now(); - const timeHex = now.toString(16).padStart(12, "0"); - const rand = (length) => { - let s = ""; - for (let i = 0; i < length; i++) - s += Math.floor(Math.random() * 16).toString(16); - return s; - }; - const versioned = "7" + rand(3); - const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); - const variant = variantNibble + rand(3); - return (timeHex.slice(0, 8) + - "-" + - timeHex.slice(8, 12) + - "-" + - versioned + - "-" + - variant + - "-" + - rand(12)); -} diff --git a/packages/mcp/build/lib/transforms.js b/packages/mcp/build/lib/transforms.js index 76147f02..ff5862a6 100644 --- a/packages/mcp/build/lib/transforms.js +++ b/packages/mcp/build/lib/transforms.js @@ -14,7 +14,8 @@ * - `marks` arrays are preserved verbatim when fragments are split/reordered. */ import { blockPlainText } from "./node-ops.js"; -import { canonicalizeFootnotes, footnoteContentKey, makeFootnoteDefinition, generateFootnoteId, } from "./footnote-canonicalize.js"; +import { canonicalizeFootnotes } from "./footnote-canonicalize.js"; +import { footnoteContentKey, makeFootnoteDefinition, generateFootnoteId, } from "./footnote-authoring.js"; export { canonicalizeFootnotes } from "./footnote-canonicalize.js"; /** Deep-clone a JSON-serializable value without mutating the original. */ function clone(value) { @@ -85,6 +86,19 @@ export function getList(doc, predicate) { * false when the anchor text was not found in any in-scope block. */ export function insertMarkerAfter(doc, anchor, marker, opts = {}) { + // A plain marker is a leading-space-padded unmarked text run. + return insertNodesAfterAnchor(doc, anchor, () => [{ type: "text", text: " " + marker }], opts); +} +/** + * Mark-safe insertion CORE: split the inline text run that holds the END of + * `anchor` (preserving the surrounding marks) and splice the nodes produced by + * `makeMiddle()` in at the split point. 
`insertMarkerAfter` (plain text marker) + * and `insertInlineFootnote` (a `footnoteReference` node) are both thin callers — + * the only difference is WHAT is inserted (a space-padded text run vs. a node + * that should hug the preceding word), which is exactly what `makeMiddle` + * decides. Operates on a clone; returns `{ doc, inserted }`. + */ +function insertNodesAfterAnchor(doc, anchor, makeMiddle, opts = {}) { const out = clone(doc); if (!isObject(out) || !Array.isArray(out.content) || !anchor) { return { doc: out, inserted: false }; @@ -138,8 +152,9 @@ export function insertMarkerAfter(doc, anchor, marker, opts = {}) { if (before.length > 0) { parts.push({ ...n, text: before, marks: [...marks] }); } - // Marker is a PLAIN run: no marks copied. Leading space separates it. - parts.push({ type: "text", text: " " + marker }); + // The inserted nodes are caller-decided (a space-padded marker run, + // or a node that hugs the word). They carry no copied marks. + parts.push(...makeMiddle()); if (after.length > 0) { parts.push({ ...n, text: after, marks: [...marks] }); } @@ -473,8 +488,6 @@ export function commentsToFootnotes(doc, comments, opts = {}) { const synced = setCalloutRange(working, definitions.length); return { doc: synced.doc, consumed }; } -/** A NUL-delimited sentinel that cannot occur in real prose. */ -const INLINE_FOOTNOTE_SENTINEL = "\u0000IFN\u0000"; /** * AUTHOR-INLINE footnote insertion. The caller supplies WHERE (anchorText) and * WHAT (markdown text); numbering and the bottom list are derived server-side by @@ -488,10 +501,10 @@ const INLINE_FOOTNOTE_SENTINEL = "\u0000IFN\u0000"; * minted and a new definition added. Conservative — only an exact content match * merges. * - * Mechanics: the marker is inserted with the same mark-safe `insertMarkerAfter` - * split used elsewhere, via a sentinel that is then replaced by a real - * `footnoteReference` node (dropping the inserted leading space so the marker - * attaches to the preceding word). 
The whole document is then canonicalized. + * Mechanics: the `footnoteReference` node is inserted DIRECTLY at the anchor via + * the same mark-safe split as `insertMarkerAfter` (the shared + * `insertNodesAfterAnchor` core), so it hugs the preceding word with no text + * sentinel round-trip. The whole document is then canonicalized. * * Operates on a clone of `doc`. When the anchor is not found, returns the input * unchanged with `inserted:false`. @@ -518,14 +531,13 @@ export function insertInlineFootnote(doc, opts) { } if (footnoteId == null) footnoteId = generateFootnoteId(); - // Insert a sentinel marker after the anchor (mark-safe split). - const r = insertMarkerAfter(doc, (opts.anchorText ?? "").trimEnd(), INLINE_FOOTNOTE_SENTINEL); + // Insert the footnoteReference node directly after the anchor (mark-safe + // split); it hugs the preceding word with no leading space. + const r = insertNodesAfterAnchor(doc, (opts.anchorText ?? "").trimEnd(), () => [{ type: "footnoteReference", attrs: { id: footnoteId } }]); if (!r.inserted) { return { doc: clone(doc), inserted: false, footnoteId, reused }; } let working = r.doc; - // Replace the sentinel run with a real footnoteReference node. - replaceSentinelWithReference(working, footnoteId); // Add a NEW definition (canonicalize will order/place it); a reused id needs // no new definition (the existing one is shared). if (!reused) { @@ -535,48 +547,6 @@ export function insertInlineFootnote(doc, opts) { working = canonicalizeFootnotes(working); return { doc: working, inserted: true, footnoteId, reused }; } -/** - * Replace the lone sentinel text run (created by insertMarkerAfter as - * `" " + sentinel`) with a footnoteReference node, dropping the leading space so - * the marker attaches to the preceding word. Mutates `doc` in place. 
- */ -function replaceSentinelWithReference(doc, footnoteId) { - let done = false; - const visit = (container) => { - if (done || !isObject(container) || !Array.isArray(container.content)) - return; - const arr = container.content; - for (let i = 0; i < arr.length; i++) { - const n = arr[i]; - if (isObject(n) && - n.type === "text" && - typeof n.text === "string" && - n.text.includes(INLINE_FOOTNOTE_SENTINEL)) { - const idx = n.text.indexOf(INLINE_FOOTNOTE_SENTINEL); - // Text before the sentinel, with a single trailing space (the one - // insertMarkerAfter prepended) stripped so the ref hugs the word. - const before = n.text.slice(0, idx).replace(/ $/, ""); - const after = n.text.slice(idx + INLINE_FOOTNOTE_SENTINEL.length); - const marks = Array.isArray(n.marks) ? n.marks : []; - const parts = []; - if (before.length > 0) - parts.push({ ...n, text: before, marks: [...marks] }); - parts.push({ type: "footnoteReference", attrs: { id: footnoteId } }); - if (after.length > 0) - parts.push({ ...n, text: after, marks: [...marks] }); - arr.splice(i, 1, ...parts); - done = true; - return; - } - } - for (const child of arr) { - visit(child); - if (done) - return; - } - }; - visit(doc); -} /** * Append a definition node so the canonicalizer can order/place it: into the * first existing footnotesList, or a new trailing list when none exists. diff --git a/packages/mcp/src/index.ts b/packages/mcp/src/index.ts index b980c8cc..d439229a 100644 --- a/packages/mcp/src/index.ts +++ b/packages/mcp/src/index.ts @@ -912,7 +912,8 @@ server.registerTool( "parenthesized function). 
It receives a clone of the live doc and " + "ctx (comments, log, consume(id), helpers: blockText/walk/getList/" + "insertMarkerAfter/setCalloutRange/noteItem/mdToInlineNodes/" + - "commentsToFootnotes) and must return a {type:'doc'} node.", + "commentsToFootnotes/canonicalizeFootnotes/insertInlineFootnote) " + + "and must return a {type:'doc'} node.", ), dryRun: z .boolean() diff --git a/packages/mcp/src/lib/collaboration.ts b/packages/mcp/src/lib/collaboration.ts index 55159ef9..e6f57aa8 100644 --- a/packages/mcp/src/lib/collaboration.ts +++ b/packages/mcp/src/lib/collaboration.ts @@ -393,7 +393,16 @@ function extractFootnotes(markdown: string): { }; } -/** Convert markdown to a ProseMirror doc using the full Docmost schema. */ +/** + * Convert markdown to a ProseMirror doc using the full Docmost schema. + * + * NOTE: besides the page-import write paths, this is also reused for comment + * bodies (createComment / updateComment). For an ordinary comment the + * canonicalize call below is a no-op (a comment carries no footnotes), so the + * reuse is safe; the only theoretical effect is if footnote markup were ever + * authored INSIDE a comment — a narrow case where canonicalizing the comment's + * own (self-contained) footnotes is still the correct behaviour. + */ export async function markdownToProseMirror( markdownContent: string, ): Promise { diff --git a/packages/mcp/src/lib/footnote-authoring.ts b/packages/mcp/src/lib/footnote-authoring.ts new file mode 100644 index 00000000..9dfcd7fa --- /dev/null +++ b/packages/mcp/src/lib/footnote-authoring.ts @@ -0,0 +1,91 @@ +/** + * Inline-authoring helpers for footnotes (MCP). + * + * These build/identify footnote DEFINITION nodes for the author-inline tool + * (`insertInlineFootnote` in transforms.ts): a content key to de-duplicate notes + * by text, a definition-node factory, and a fresh uuidv7-style id generator. 
+ * + * Split out of `footnote-canonicalize.ts` so that module stays a pure MIRROR of + * the editor-ext copy (compositionally symmetric to the editor-ext + * copy, which keeps its authoring helpers in `footnote-util.ts`). The pure + * canonicalizer has no dependency on these. + */ + +const FOOTNOTE_DEFINITION_NAME = "footnoteDefinition"; + +function cloneJson<T>(v: T): T { + if (typeof structuredClone === "function") return structuredClone(v); + return JSON.parse(JSON.stringify(v)) as T; +} + +/** + * Normalized content key for de-duplicating footnote DEFINITIONS by their text. + * + * Two definitions with the same key are the SAME footnote — so the inline + * authoring tool reuses one id (one number, one definition, several references) + * instead of minting a second definition. Key = plaintext (whitespace-collapsed, + * trimmed) PLUS a signature of the inline mark types in order, so two notes that + * read the same but differ in formatting (one bold, one plain) are NOT merged. + * Conservative: only an exact match merges. + */ +export function footnoteContentKey(defNode: any): string { + const parts: string[] = []; + const visit = (n: any): void => { + if (!n || typeof n !== "object") return; + if (n.type === "text" && typeof n.text === "string") { + const marks = Array.isArray(n.marks) + ? n.marks.map((m: any) => m?.type).filter(Boolean).sort().join(",") + : ""; + parts.push(`${n.text}${marks}`); + } + if (Array.isArray(n.content)) for (const c of n.content) visit(c); + }; + visit(defNode); + // Collapse the assembled text's whitespace and trim, keeping the mark + // signature attached so formatting differences still distinguish notes. + return parts + .join("") + .replace(/[ \t\r\n]+/g, " ") + .trim(); +} + +/** + * Build a footnoteDefinition node from inline ProseMirror nodes, keyed by id. + */ +export function makeFootnoteDefinition(id: string, inlineNodes: any[]): any { + const content = Array.isArray(inlineNodes) ? 
cloneJson(inlineNodes) : []; + return { + type: FOOTNOTE_DEFINITION_NAME, + attrs: { id }, + content: [{ type: "paragraph", content }], + }; +} + +/** + * Generate a uuidv7-style id (time-ordered), matching editor-ext's + * `generateFootnoteId`. Used for a genuinely-new inline footnote id. + */ +export function generateFootnoteId(): string { + const now = Date.now(); + const timeHex = now.toString(16).padStart(12, "0"); + const rand = (length: number) => { + let s = ""; + for (let i = 0; i < length; i++) + s += Math.floor(Math.random() * 16).toString(16); + return s; + }; + const versioned = "7" + rand(3); + const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); + const variant = variantNibble + rand(3); + return ( + timeHex.slice(0, 8) + + "-" + + timeHex.slice(8, 12) + + "-" + + versioned + + "-" + + variant + + "-" + + rand(12) + ); +} diff --git a/packages/mcp/src/lib/footnote-canonicalize.ts b/packages/mcp/src/lib/footnote-canonicalize.ts index c05af3da..d5a4a257 100644 --- a/packages/mcp/src/lib/footnote-canonicalize.ts +++ b/packages/mcp/src/lib/footnote-canonicalize.ts @@ -1,5 +1,5 @@ /** - * Server-side footnote canonicalizer + inline authoring helper (MCP mirror). + * Server-side footnote canonicalizer (MCP mirror — PURE). * * `canonicalizeFootnotes(doc)` is a pure ProseMirror-JSON port of the editor's * `footnoteSyncPlugin` end-state, identical in behaviour to @@ -8,7 +8,13 @@ * `docmost-schema.ts` nodes are mirrored: the MCP package is deliberately * decoupled from the browser/React-heavy editor barrel and operates on plain * JSON. The editor-ext copy owns the golden test against the live plugin; this - * copy must stay behaviourally identical. + * copy must stay behaviourally identical (a SHARED golden corpus, exercised by + * both test suites, pins that — see `test/unit/footnote-corpus.mjs`). + * + * This module is the pure MIRROR only. 
The inline-authoring helpers + * (`footnoteContentKey`, `makeFootnoteDefinition`, `generateFootnoteId`) used by + * `insertInlineFootnote` live in the sibling `footnote-authoring.ts`, so this + * file is compositionally symmetric to the editor-ext copy. * * Why it exists: every NON-editor write path (markdown import, update_page_json, * docmost_transform, insert_footnote) builds ProseMirror JSON directly, so the @@ -28,38 +34,6 @@ function cloneJson<T>(v: T): T { return JSON.parse(JSON.stringify(v)) as T; } -/** - * Deterministic unique id for the k-th (k >= 2) duplicate of an id during - * collision resolution. Pure function of (originalId, occurrence, taken) — no - * Math.random/Date.now — mirroring editor-ext's `deriveFootnoteId`. Kept local - * (the importer's first-wins de-dup means duplicates are rare here, but the - * canonicalizer must still resolve them deterministically). - */ -export function deriveFootnoteId( - originalId: string, - occurrence: number, - taken: Set<string> | ReadonlySet<string>, -): string { - let candidate = `${originalId}__${occurrence}`; - let n = 0; - while (taken.has(candidate)) { - n += 1; - candidate = `${originalId}__${occurrence}${suffix(n)}`; - } - return candidate; -} - -function suffix(n: number): string { - let out = ""; - let x = n; - while (x > 0) { - const rem = (x - 1) % 25; - out = String.fromCharCode(98 + rem) + out; // 98 = 'b' - x = Math.floor((x - 1) / 25); - } - return out; -} - function isEmptyParagraph(node: any): boolean { return ( !!node && @@ -98,6 +72,36 @@ function emptyDefinition(id: string): any { }; } +/** + * Order-insensitive deep equality over plain JSON (objects/arrays/primitives). + * Used to detect an already-canonical footnotesList so its physical position is + * preserved (placement parity with the live plugin). 
+ */ +function deepEqualJson(a: any, b: any): boolean { + if (a === b) return true; + if (a == null || b == null || typeof a !== typeof b) return false; + if (Array.isArray(a) || Array.isArray(b)) { + if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) { + return false; + } + for (let i = 0; i < a.length; i++) { + if (!deepEqualJson(a[i], b[i])) return false; + } + return true; + } + if (typeof a === "object") { + const ka = Object.keys(a); + const kb = Object.keys(b); + if (ka.length !== kb.length) return false; + for (const k of ka) { + if (!Object.prototype.hasOwnProperty.call(b, k)) return false; + if (!deepEqualJson(a[k], b[k])) return false; + } + return true; + } + return false; +} + /** * Canonicalize footnotes in a ProseMirror-JSON document. See the file header and * the editor-ext twin for the full contract. Pure (deep-clones input, @@ -113,131 +117,72 @@ export function canonicalizeFootnotes<T>(doc: T): T { } const out = cloneJson(doc) as any; + // 1) Distinct reference ids in document order (deep — refs can live in + // callouts, tables, list items, ...). The ordering/numbering truth. const referenceIds: string[] = []; collectReferenceIds(out, referenceIds, new Set()); + // 2) Every definition node in document order (deep). const defNodes: any[] = []; collectDefinitions(out, defNodes); - const taken = new Set(referenceIds); + // 3) First definition per id wins; later duplicates carry the SAME id, so they + // cannot be referenced separately and would be orphans — they are dropped. + const defById = new Map(); for (const d of defNodes) { const id = d?.attrs?.id; - if (id) taken.add(id); - } - const occurrenceOf = new Map(); - const seenDefIds = new Set(); - const defByFinalId = new Map(); - for (const d of defNodes) { - const origId = d?.attrs?.id; - if (!origId) continue; - if (!seenDefIds.has(origId)) { - seenDefIds.add(origId); - defByFinalId.set(origId, d); - } else { - const next = (occurrenceOf.get(origId) ?? 
1) + 1; - occurrenceOf.set(origId, next); - const newId = deriveFootnoteId(origId, next, taken); - taken.add(newId); - defByFinalId.set(newId, d); - } + if (id && !defById.has(id)) defById.set(id, d); } + // 4) Build the ordered definition list: one per referenced id, in REFERENCE + // order, reusing the existing node (shallow-copied, id normalized — `out` is + // already deep-cloned and the old lists are cut) or synthesizing an empty + // one. Definitions whose id is not referenced are orphans and never added. const orderedDefs: any[] = []; for (const id of referenceIds) { - const existing = defByFinalId.get(id); + const existing = defById.get(id); if (existing) { - const node = cloneJson(existing); - node.attrs = { ...(node.attrs ?? {}), id }; - orderedDefs.push(node); + orderedDefs.push({ + ...existing, + attrs: { ...(existing.attrs ?? {}), id }, + }); } else { orderedDefs.push(emptyDefinition(id)); } } - const top: any[] = out.content.filter( - (n: any) => !(n && n.type === FOOTNOTES_LIST_NAME), - ); - + // 5) No references -> there must be NO list at all. if (referenceIds.length === 0) { - out.content = top; + out.content = out.content.filter( + (n: any) => !(n && n.type === FOOTNOTES_LIST_NAME), + ); return out; } + // 6) Placement parity with the live plugin: when the document is ALREADY in the + // canonical single-list state, leave that list exactly where it sits rather + // than cutting and re-inserting it at the end (the plugin never repositions a + // sole correct list, so moving it would silently reorder any content that + // follows the list on the first write). 
+ const topLevelLists = out.content.filter( + (n: any) => n && n.type === FOOTNOTES_LIST_NAME, + ); + if ( + topLevelLists.length === 1 && + defNodes.length === orderedDefs.length && + deepEqualJson(topLevelLists[0].content, orderedDefs) + ) { + return out; + } + + // 7) Otherwise rebuild: strip every footnotesList and re-insert exactly one + // after the last meaningful (non-empty paragraph) block. + const top: any[] = out.content.filter( + (n: any) => !(n && n.type === FOOTNOTES_LIST_NAME), + ); let insertAt = top.length; while (insertAt > 0 && isEmptyParagraph(top[insertAt - 1])) insertAt--; top.splice(insertAt, 0, { type: FOOTNOTES_LIST_NAME, content: orderedDefs }); out.content = top; return out; } - -/** - * Normalized content key for de-duplicating footnote DEFINITIONS by their text. - * - * Two definitions with the same key are the SAME footnote — so the inline - * authoring tool reuses one id (one number, one definition, several references) - * instead of minting a second definition. Key = plaintext (whitespace-collapsed, - * trimmed) PLUS a signature of the inline mark types in order, so two notes that - * read the same but differ in formatting (one bold, one plain) are NOT merged. - * Conservative: only an exact match merges. - */ -export function footnoteContentKey(defNode: any): string { - const parts: string[] = []; - const visit = (n: any): void => { - if (!n || typeof n !== "object") return; - if (n.type === "text" && typeof n.text === "string") { - const marks = Array.isArray(n.marks) - ? n.marks.map((m: any) => m?.type).filter(Boolean).sort().join(",") - : ""; - parts.push(`${n.text}${marks}`); - } - if (Array.isArray(n.content)) for (const c of n.content) visit(c); - }; - visit(defNode); - // Collapse the assembled text's whitespace and trim, keeping the mark - // signature attached so formatting differences still distinguish notes. 
- return parts - .join("") - .replace(/[ \t\r\n]+/g, " ") - .trim(); -} - -/** - * Build a footnoteDefinition node from inline ProseMirror nodes, keyed by id. - */ -export function makeFootnoteDefinition(id: string, inlineNodes: any[]): any { - const content = Array.isArray(inlineNodes) ? cloneJson(inlineNodes) : []; - return { - type: FOOTNOTE_DEFINITION_NAME, - attrs: { id }, - content: [{ type: "paragraph", content }], - }; -} - -/** - * Generate a uuidv7-style id (time-ordered), matching editor-ext's - * `generateFootnoteId`. Used for a genuinely-new inline footnote id. - */ -export function generateFootnoteId(): string { - const now = Date.now(); - const timeHex = now.toString(16).padStart(12, "0"); - const rand = (length: number) => { - let s = ""; - for (let i = 0; i < length; i++) - s += Math.floor(Math.random() * 16).toString(16); - return s; - }; - const versioned = "7" + rand(3); - const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); - const variant = variantNibble + rand(3); - return ( - timeHex.slice(0, 8) + - "-" + - timeHex.slice(8, 12) + - "-" + - versioned + - "-" + - variant + - "-" + - rand(12) - ); -} diff --git a/packages/mcp/src/lib/transforms.ts b/packages/mcp/src/lib/transforms.ts index 5c595f86..65313d49 100644 --- a/packages/mcp/src/lib/transforms.ts +++ b/packages/mcp/src/lib/transforms.ts @@ -15,12 +15,12 @@ */ import { blockPlainText } from "./node-ops.js"; +import { canonicalizeFootnotes } from "./footnote-canonicalize.js"; import { - canonicalizeFootnotes, footnoteContentKey, makeFootnoteDefinition, generateFootnoteId, -} from "./footnote-canonicalize.js"; +} from "./footnote-authoring.js"; export { canonicalizeFootnotes } from "./footnote-canonicalize.js"; @@ -113,6 +113,30 @@ export function insertMarkerAfter( anchor: string, marker: string, opts: InsertMarkerOptions = {}, +): { doc: any; inserted: boolean } { + // A plain marker is a leading-space-padded unmarked text run. 
+ return insertNodesAfterAnchor( + doc, + anchor, + () => [{ type: "text", text: " " + marker }], + opts, + ); +} + +/** + * Mark-safe insertion CORE: split the inline text run that holds the END of + * `anchor` (preserving the surrounding marks) and splice the nodes produced by + * `makeMiddle()` in at the split point. `insertMarkerAfter` (plain text marker) + * and `insertInlineFootnote` (a `footnoteReference` node) are both thin callers — + * the only difference is WHAT is inserted (a space-padded text run vs. a node + * that should hug the preceding word), which is exactly what `makeMiddle` + * decides. Operates on a clone; returns `{ doc, inserted }`. + */ +function insertNodesAfterAnchor( + doc: any, + anchor: string, + makeMiddle: () => any[], + opts: InsertMarkerOptions = {}, ): { doc: any; inserted: boolean } { const out = clone(doc); if (!isObject(out) || !Array.isArray(out.content) || !anchor) { @@ -174,8 +198,9 @@ export function insertMarkerAfter( if (before.length > 0) { parts.push({ ...n, text: before, marks: [...marks] }); } - // Marker is a PLAIN run: no marks copied. Leading space separates it. - parts.push({ type: "text", text: " " + marker }); + // The inserted nodes are caller-decided (a space-padded marker run, + // or a node that hugs the word). They carry no copied marks. + parts.push(...makeMiddle()); if (after.length > 0) { parts.push({ ...n, text: after, marks: [...marks] }); } @@ -587,9 +612,6 @@ export interface InsertInlineFootnoteResult { reused: boolean; } -/** A NUL-delimited sentinel that cannot occur in real prose. */ -const INLINE_FOOTNOTE_SENTINEL = "\u0000IFN\u0000"; - /** * AUTHOR-INLINE footnote insertion. The caller supplies WHERE (anchorText) and * WHAT (markdown text); numbering and the bottom list are derived server-side by @@ -603,10 +625,10 @@ const INLINE_FOOTNOTE_SENTINEL = "\u0000IFN\u0000"; * minted and a new definition added. Conservative — only an exact content match * merges. 
* - * Mechanics: the marker is inserted with the same mark-safe `insertMarkerAfter` - * split used elsewhere, via a sentinel that is then replaced by a real - * `footnoteReference` node (dropping the inserted leading space so the marker - * attaches to the preceding word). The whole document is then canonicalized. + * Mechanics: the `footnoteReference` node is inserted DIRECTLY at the anchor via + * the same mark-safe split as `insertMarkerAfter` (the shared + * `insertNodesAfterAnchor` core), so it hugs the preceding word with no text + * sentinel round-trip. The whole document is then canonicalized. * * Operates on a clone of `doc`. When the anchor is not found, returns the input * unchanged with `inserted:false`. @@ -639,16 +661,18 @@ export function insertInlineFootnote( } if (footnoteId == null) footnoteId = generateFootnoteId(); - // Insert a sentinel marker after the anchor (mark-safe split). - const r = insertMarkerAfter(doc, (opts.anchorText ?? "").trimEnd(), INLINE_FOOTNOTE_SENTINEL); + // Insert the footnoteReference node directly after the anchor (mark-safe + // split); it hugs the preceding word with no leading space. + const r = insertNodesAfterAnchor( + doc, + (opts.anchorText ?? "").trimEnd(), + () => [{ type: "footnoteReference", attrs: { id: footnoteId } }], + ); if (!r.inserted) { return { doc: clone(doc), inserted: false, footnoteId, reused }; } let working = r.doc; - // Replace the sentinel run with a real footnoteReference node. - replaceSentinelWithReference(working, footnoteId); - // Add a NEW definition (canonicalize will order/place it); a reused id needs // no new definition (the existing one is shared). 
if (!reused) { @@ -660,47 +684,6 @@ export function insertInlineFootnote( return { doc: working, inserted: true, footnoteId, reused }; } -/** - * Replace the lone sentinel text run (created by insertMarkerAfter as - * `" " + sentinel`) with a footnoteReference node, dropping the leading space so - * the marker attaches to the preceding word. Mutates `doc` in place. - */ -function replaceSentinelWithReference(doc: any, footnoteId: string): void { - let done = false; - const visit = (container: any): void => { - if (done || !isObject(container) || !Array.isArray(container.content)) return; - const arr = container.content; - for (let i = 0; i < arr.length; i++) { - const n = arr[i]; - if ( - isObject(n) && - n.type === "text" && - typeof n.text === "string" && - n.text.includes(INLINE_FOOTNOTE_SENTINEL) - ) { - const idx = n.text.indexOf(INLINE_FOOTNOTE_SENTINEL); - // Text before the sentinel, with a single trailing space (the one - // insertMarkerAfter prepended) stripped so the ref hugs the word. - const before = n.text.slice(0, idx).replace(/ $/, ""); - const after = n.text.slice(idx + INLINE_FOOTNOTE_SENTINEL.length); - const marks = Array.isArray(n.marks) ? n.marks : []; - const parts: any[] = []; - if (before.length > 0) parts.push({ ...n, text: before, marks: [...marks] }); - parts.push({ type: "footnoteReference", attrs: { id: footnoteId } }); - if (after.length > 0) parts.push({ ...n, text: after, marks: [...marks] }); - arr.splice(i, 1, ...parts); - done = true; - return; - } - } - for (const child of arr) { - visit(child); - if (done) return; - } - }; - visit(doc); -} - /** * Append a definition node so the canonicalizer can order/place it: into the * first existing footnotesList, or a new trailing list when none exists. 
diff --git a/packages/mcp/test/mock/footnote-write.test.mjs b/packages/mcp/test/mock/footnote-write.test.mjs new file mode 100644 index 00000000..d013d7a3 --- /dev/null +++ b/packages/mcp/test/mock/footnote-write.test.mjs @@ -0,0 +1,152 @@ +// Mock-HTTP orchestration tests for the footnote WRITE wrappers on DocmostClient +// (issue #228): +// - insertFootnote (#11): the required-argument guards reject BEFORE any write, +// and never touch the collab/mutate path. +// - transformPage / docmost_transform (#13): the auto-canonicalize step +// (`result = canonicalizeFootnotes(raw)`) runs after every transform, so a +// transform that introduces an orphan footnote definition is silently tidied +// away — observable as an EMPTY diff in a dryRun preview. +// +// These stand a local http.createServer in for Docmost and only exercise plain +// HTTP routes (login / comments / pages.info), deliberately avoiding the live +// Hocuspocus collab WebSocket: the insertFootnote guards short-circuit before it, +// and docmost_transform's dryRun preview never opens it. The full collab mutate +// path (abort-via-throw on a missing anchor, the reused/message response branch) +// is covered at the pure level by insertInlineFootnote in +// test/unit/footnote-canonicalize.test.mjs. 
+import { test, after } from "node:test"; +import assert from "node:assert/strict"; +import http from "node:http"; +import { DocmostClient } from "../../build/client.js"; + +function readBody(req) { + return new Promise((resolve) => { + let raw = ""; + req.on("data", (c) => (raw += c)); + req.on("end", () => resolve(raw)); + }); +} +function startServer(handler) { + return new Promise((resolve) => { + const server = http.createServer(handler); + server.listen(0, "127.0.0.1", () => { + const { port } = server.address(); + resolve({ server, baseURL: `http://127.0.0.1:${port}/api` }); + }); + }); +} +function sendJson(res, status, obj, extraHeaders = {}) { + res.writeHead(status, { "Content-Type": "application/json", ...extraHeaders }); + res.end(JSON.stringify(obj)); +} +const openServers = []; +async function spawn(handler) { + const { server, baseURL } = await startServer(handler); + openServers.push(server); + return { baseURL }; +} +after(async () => { + await Promise.all(openServers.map((s) => new Promise((r) => s.close(r)))); +}); + +const ref = (id) => ({ type: "footnoteReference", attrs: { id } }); +const def = (id, text) => ({ + type: "footnoteDefinition", + attrs: { id }, + content: [{ type: "paragraph", content: [{ type: "text", text }] }], +}); + +// --------------------------------------------------------------------------- +// #11 insertFootnote guards: missing anchorText / text reject and never write. 
+// --------------------------------------------------------------------------- +test("insertFootnote rejects a missing anchorText before any write", async () => { + const otherRoutes = []; + const { baseURL } = await spawn(async (req, res) => { + await readBody(req); + if (req.url === "/api/auth/login") { + return sendJson(res, 200, { success: true }, { + "Set-Cookie": "authToken=t; Path=/; HttpOnly", + }); + } + otherRoutes.push(req.url); + sendJson(res, 404, { message: "not found" }); + }); + const client = new DocmostClient(baseURL, "user@example.com", "pw"); + await assert.rejects( + () => client.insertFootnote("page-1", " ", "a note"), + /anchorText is required/i, + ); + assert.deepEqual(otherRoutes, [], "must not hit any write route"); +}); + +test("insertFootnote rejects an empty text before any write", async () => { + const otherRoutes = []; + const { baseURL } = await spawn(async (req, res) => { + await readBody(req); + if (req.url === "/api/auth/login") { + return sendJson(res, 200, { success: true }, { + "Set-Cookie": "authToken=t; Path=/; HttpOnly", + }); + } + otherRoutes.push(req.url); + sendJson(res, 404, { message: "not found" }); + }); + const client = new DocmostClient(baseURL, "user@example.com", "pw"); + await assert.rejects( + () => client.insertFootnote("page-1", "anchor", " "), + /text is required/i, + ); + assert.deepEqual(otherRoutes, [], "must not hit any write route"); +}); + +// --------------------------------------------------------------------------- +// #13 docmost_transform auto-canonicalization: a transform that adds an orphan +// footnote definition produces NO net change (the canonicalizer drops it), so a +// dryRun preview reports an empty diff. Without the auto-canonicalize step the +// orphan would survive and the diff would be non-empty. 
+// --------------------------------------------------------------------------- +test("transformPage dryRun auto-canonicalizes footnotes (orphan def is dropped)", async () => { + // A page already in canonical footnote state (refs b,a; defs b,a). + const pageContent = { + type: "doc", + content: [ + { type: "paragraph", content: [{ type: "text", text: "x" }, ref("b"), ref("a")] }, + { type: "footnotesList", content: [def("b", "B"), def("a", "A")] }, + ], + }; + const { baseURL } = await spawn(async (req, res) => { + await readBody(req); + if (req.url === "/api/auth/login") { + return sendJson(res, 200, { success: true }, { + "Set-Cookie": "authToken=t; Path=/; HttpOnly", + }); + } + if (req.url === "/api/comments") { + return sendJson(res, 200, { data: { items: [], meta: { nextCursor: null } } }); + } + if (req.url === "/api/pages/info") { + return sendJson(res, 200, { + data: { id: "page-1", slugId: "s", title: "P", spaceId: "sp", content: pageContent }, + }); + } + sendJson(res, 404, { message: "not found" }); + }); + const client = new DocmostClient(baseURL, "user@example.com", "pw"); + + // The transform appends an ORPHAN definition (id "z", no matching reference). + const transformJs = `(doc) => { + const list = doc.content.find((n) => n.type === "footnotesList"); + list.content.push({ + type: "footnoteDefinition", + attrs: { id: "z" }, + content: [{ type: "paragraph", content: [{ type: "text", text: "orphan" }] }], + }); + return doc; + }`; + + const result = await client.transformPage("page-1", transformJs, { dryRun: true }); + assert.equal(result.pushed, false); + // Auto-canonicalize dropped the orphan, so the doc is unchanged => empty diff. 
+ assert.equal(result.diff.summary.inserted, 0, "orphan def must be canonicalized away"); + assert.equal(result.diff.summary.deleted, 0); +}); diff --git a/packages/mcp/test/unit/footnote-canonicalize.test.mjs b/packages/mcp/test/unit/footnote-canonicalize.test.mjs index c2dd3005..d25a265b 100644 --- a/packages/mcp/test/unit/footnote-canonicalize.test.mjs +++ b/packages/mcp/test/unit/footnote-canonicalize.test.mjs @@ -1,10 +1,8 @@ import { test } from "node:test"; import assert from "node:assert/strict"; -import { - canonicalizeFootnotes, - footnoteContentKey, -} from "../../build/lib/footnote-canonicalize.js"; +import { canonicalizeFootnotes } from "../../build/lib/footnote-canonicalize.js"; +import { footnoteContentKey } from "../../build/lib/footnote-authoring.js"; import { insertInlineFootnote } from "../../build/lib/transforms.js"; import { markdownToProseMirror } from "../../build/lib/collaboration.js"; diff --git a/packages/mcp/test/unit/footnote-corpus.mjs b/packages/mcp/test/unit/footnote-corpus.mjs new file mode 100644 index 00000000..3a213491 --- /dev/null +++ b/packages/mcp/test/unit/footnote-corpus.mjs @@ -0,0 +1,1164 @@ +// MIRROR (data only) of +// packages/editor-ext/src/lib/footnote/footnote-corpus.ts — keep the two in +// sync. Shared golden corpus for the footnote canonicalizer (issue #228): each +// case is { name, input, expected } where `expected` is exactly what +// `canonicalizeFootnotes(input)` must return. Running BOTH the editor-ext copy +// and this MCP mirror against the same corpus makes "the two pure copies behave +// identically" a checkable property without coupling the packages. 
+export const FOOTNOTE_CORPUS = [ + { + "name": "out-of-order defs ordered by first reference", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "b" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "c" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "c" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "C" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "b" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "B" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "D" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "b" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "c" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "b" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "B" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + 
"content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "D" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "c" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "C" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "orphan definition dropped", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "orphan" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "O" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "no references removes the list", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "plain" + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "orphan" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": 
"text", + "text": "O" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "plain" + } + ] + } + ] + } + }, + { + "name": "reuse: repeated references collapse to one definition", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "text", + "text": " a " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "shared" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "text", + "text": " a " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "shared" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "duplicate definitions: first wins", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "first" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + 
"attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "second" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "third" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "d" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "d" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "first" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "synthesizes an empty definition for a reference with none", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "missing" + } + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "missing" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "missing" + }, + "content": [ + { + "type": "paragraph" + } + ] + } + ] + } + ] + } + }, + { + "name": "merges multiple footnotesList nodes into one", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "a" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "x" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "y" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "x" + }, + "content": [ + { + "type": "paragraph", + 
"content": [ + { + "type": "text", + "text": "X" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "tail" + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "y" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "Y" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "a" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "x" + } + }, + { + "type": "footnoteReference", + "attrs": { + "id": "y" + } + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "tail" + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "x" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "X" + } + ] + } + ] + }, + { + "type": "footnoteDefinition", + "attrs": { + "id": "y" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "Y" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "single canonical list before a trailing empty paragraph stays put", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph" + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": 
"footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph" + } + ] + } + }, + { + "name": "single canonical list with NON-EMPTY content after it is NOT moved (plugin parity)", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "epilogue text" + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "x" + }, + { + "type": "footnoteReference", + "attrs": { + "id": "a" + } + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "a" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "A" + } + ] + } + ] + } + ] + }, + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "epilogue text" + } + ] + } + ] + } + }, + { + "name": "reference inside a nested container (callout) is collected", + "input": { + "type": "doc", + "content": [ + { + "type": "callout", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "see " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "n" + } + } + ] + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "n" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + 
"text": "note" + } + ] + } + ] + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "callout", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "see " + }, + { + "type": "footnoteReference", + "attrs": { + "id": "n" + } + } + ] + } + ] + }, + { + "type": "footnotesList", + "content": [ + { + "type": "footnoteDefinition", + "attrs": { + "id": "n" + }, + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "note" + } + ] + } + ] + } + ] + } + ] + } + }, + { + "name": "no footnotes at all is unchanged", + "input": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "just text" + } + ] + } + ] + }, + "expected": { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "text": "just text" + } + ] + } + ] + } + } +]; diff --git a/packages/mcp/test/unit/footnote-corpus.test.mjs b/packages/mcp/test/unit/footnote-corpus.test.mjs new file mode 100644 index 00000000..c58fa02a --- /dev/null +++ b/packages/mcp/test/unit/footnote-corpus.test.mjs @@ -0,0 +1,19 @@ +// Runs the MCP mirror of `canonicalizeFootnotes` against the SHARED golden +// corpus (the same { input -> expected } cases the editor-ext copy is tested +// against in footnote-canonicalize.test.ts). Pinning identical expected outputs +// in both suites makes "the editor-ext copy and the MCP mirror behave +// identically" a checkable property without coupling the two packages +// (architecture item A). The corpus data is mirrored in footnote-corpus.mjs. 
+import { test } from "node:test";
+import assert from "node:assert/strict";
+
+import { canonicalizeFootnotes } from "../../build/lib/footnote-canonicalize.js";
+import { FOOTNOTE_CORPUS } from "./footnote-corpus.mjs";
+
+for (const { name, input, expected } of FOOTNOTE_CORPUS) {
+  test(`shared corpus (MCP mirror): ${name}`, () => {
+    assert.deepEqual(canonicalizeFootnotes(input), expected);
+    // Idempotent on the corpus too.
+    assert.deepEqual(canonicalizeFootnotes(expected), expected);
+  });
+}
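
Read together, the corpus cases above appear to pin down a small set of rules: definitions are ordered by first reference, orphan definitions are dropped, the first of several duplicate definitions wins, a reference with no definition gets an empty one, a single existing list is rewritten in place, and multiple (or zero) lists collapse into one list at the end of the document. The sketch below is only an illustration inferred from those cases. It is not the actual `footnote-canonicalize.js` implementation, and `canonicalizeSketch` is a hypothetical name. It also assumes that `footnotesList` nodes are top-level children of the doc and that references inside definitions are ignored, neither of which the corpus confirms.

```javascript
// Hypothetical sketch inferred from the corpus cases; NOT the real
// build/lib/footnote-canonicalize.js. Assumes footnotesList nodes are
// top-level children of the doc (true for every corpus case).
function canonicalizeSketch(doc) {
  const blocks = doc.content ?? [];
  const lists = blocks.filter((n) => n.type === "footnotesList");

  // Referenced ids in first-reference order, found anywhere outside the lists
  // (so references nested in containers such as a callout are collected).
  const refIds = [];
  const collectRefs = (node) => {
    if (node.type === "footnotesList") return; // assumption: skip list contents
    if (node.type === "footnoteReference" && !refIds.includes(node.attrs.id)) {
      refIds.push(node.attrs.id);
    }
    (node.content ?? []).forEach(collectRefs);
  };
  blocks.forEach(collectRefs);

  // First definition seen for each id wins; later duplicates are ignored.
  const defs = new Map();
  for (const list of lists) {
    for (const def of list.content ?? []) {
      if (!defs.has(def.attrs.id)) defs.set(def.attrs.id, def);
    }
  }

  const body = blocks.filter((n) => n.type !== "footnotesList");

  // No references at all: every list is removed.
  if (refIds.length === 0) return { ...doc, content: body };

  // One definition per referenced id; synthesize an empty one if missing.
  const canonical = {
    type: "footnotesList",
    content: refIds.map(
      (id) =>
        defs.get(id) ?? {
          type: "footnoteDefinition",
          attrs: { id },
          content: [{ type: "paragraph" }],
        },
    ),
  };

  // A single existing list is rewritten where it stands; otherwise all lists
  // are removed and one merged list is appended at the end.
  if (lists.length === 1) {
    return {
      ...doc,
      content: blocks.map((n) => (n === lists[0] ? canonical : n)),
    };
  }
  return { ...doc, content: [...body, canonical] };
}
```

Running this sketch over `FOOTNOTE_CORPUS` in the same loop as the test above would be one way to check whether the inferred rules really cover every case. Any mismatch would point to a rule the real implementation applies that the sketch misses.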