Footnotes were strict 1:1: a repeated `[^a]` reference was treated as a collision and re-id'd to `a__2`, and a reference with no definition synthesized its own empty one — so an agent-authored article with reused labels produced dozens of empty `kowiki__N` footnotes. Move to Pandoc REUSE semantics and add non-fatal import diagnostics. Reuse (core): - resolveCollisions (footnote-sync): repeated references sharing an id are REUSE (recorded once in document order, never re-id'd) — one number, one shared definition. Only a duplicate DEFINITION is re-id'd deterministically and, with no matching reference, dropped by the existing orphan policy (first-wins). CollisionPlan.refReids is now always empty (harmless no-op downstream). - extractFootnoteDefinitions (marked) and extractFootnotes (MCP): duplicate definition ids are FIRST-WINS (keep first, drop rest); reference markers are never rewritten. Removed the marker-rewriting and the now-dead deriveFootnoteId mirror + helpers from the MCP path. Import diagnostics: - New analyzeFootnotes() (MCP): fence-aware pure scan reporting dangling references, empty/duplicate definitions and `[^id]` markers inside table rows. - createPage / updatePage / importPageMarkdown now attach `footnoteWarnings` (only when non-empty) so an agent can fix its markup; the page is still created. Paste-reuse: - footnotePastePlugin remaps only ids the pasted slice DEFINES (a colliding definition); a pasted lone reference to an existing id keeps it (reuse). Tests: reuse/first-wins rewrites of footnote.test, footnote-markdown.test, footnote.marked.orphan.test and the MCP footnotes.test; new footnote-paste.test (editor-ext) and footnote-analyze.test (MCP). Deleted derive-id-parity.test.mjs (the MCP no longer derives ids; editor-ext's deriveFootnoteId keeps its own golden test). editor-ext 128, MCP 299, server roundtrip 2, client views 3, client+server tsc clean. Two review suggestions applied: corrected a stale "duplicated in MCP" comment and the dangling-reference warning wording. Note: the multi-backlink editor UI (a reused definition linking back to each of its references) is deferred to a follow-up — this PR delivers the data-integrity core (reuse + warnings + paste-reuse). Forward links and numbering already reuse correctly; the backlink currently targets the first reference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
132 lines
4.8 KiB
TypeScript
132 lines
4.8 KiB
TypeScript
import { describe, it, expect } from "vitest";
|
|
import { htmlToMarkdown } from "../markdown/utils/turndown.utils";
|
|
import { markdownToHtml } from "../markdown/utils/marked.utils";
|
|
import { extractFootnoteDefinitions } from "../markdown/utils/footnote.marked";
|
|
|
|
// HTML the editor-ext nodes render (sup[data-footnote-ref], section/div).
|
|
const HTML =
|
|
`<p>Water<sup data-footnote-ref data-id="fn1"></sup> and clay<sup data-footnote-ref data-id="fn2"></sup>.</p>` +
|
|
`<section data-footnotes>` +
|
|
`<div data-footnote-def data-id="fn1"><p>First note.</p></div>` +
|
|
`<div data-footnote-def data-id="fn2"><p>Second note.</p></div>` +
|
|
`</section>`;
|
|
|
|
describe("footnote markdown round-trip", () => {
|
|
it("HTML -> Markdown produces pandoc footnote syntax", () => {
|
|
const md = htmlToMarkdown(HTML);
|
|
expect(md).toContain("[^fn1]");
|
|
expect(md).toContain("[^fn2]");
|
|
expect(md).toContain("[^fn1]: First note.");
|
|
expect(md).toContain("[^fn2]: Second note.");
|
|
});
|
|
|
|
it("Markdown -> HTML rebuilds the footnote nodes' HTML", async () => {
|
|
const md = htmlToMarkdown(HTML);
|
|
const html = await markdownToHtml(md);
|
|
expect(html).toContain('data-footnote-ref data-id="fn1"');
|
|
expect(html).toContain('data-footnote-ref data-id="fn2"');
|
|
expect(html).toContain("data-footnotes");
|
|
expect(html).toContain('data-footnote-def data-id="fn1"');
|
|
expect(html).toContain("First note.");
|
|
expect(html).toContain("Second note.");
|
|
});
|
|
|
|
it("preserves a [^id]: line shown inside a fenced code block (not a definition)", async () => {
|
|
// A document that DOCUMENTS footnote syntax inside a code fence. The
|
|
// `[^demo]: ...` line is example text, not a real definition, and must
|
|
// survive the Markdown -> HTML conversion verbatim.
|
|
const md = [
|
|
"Here is how footnotes look:",
|
|
"",
|
|
"```markdown",
|
|
"Some text[^demo]",
|
|
"",
|
|
"[^demo]: this is the definition",
|
|
"```",
|
|
"",
|
|
"End of doc.",
|
|
].join("\n");
|
|
|
|
const html = await markdownToHtml(md);
|
|
// The example definition line is kept inside the rendered code block.
|
|
expect(html).toContain("[^demo]: this is the definition");
|
|
// It did NOT get pulled out into a real footnotes section.
|
|
expect(html).not.toContain("data-footnotes");
|
|
expect(html).not.toContain("data-footnote-def");
|
|
});
|
|
|
|
it("extractFootnoteDefinitions keeps the FIRST duplicate definition and reuses markers", () => {
|
|
// Two definitions share id `d`, and the body has two `[^d]` markers. Under
|
|
// the import model (#166) duplicate definition ids are FIRST-WINS: only the
|
|
// first definition is kept; markers are NEVER rewritten, so the two `[^d]`
|
|
// references reuse the single footnote.
|
|
const md = [
|
|
"See here[^d] and there[^d].",
|
|
"",
|
|
"[^d]: first",
|
|
"[^d]: second",
|
|
].join("\n");
|
|
|
|
const { body, section } = extractFootnoteDefinitions(md);
|
|
|
|
const defIds = Array.from(
|
|
section.matchAll(/data-footnote-def data-id="([^"]+)"/g),
|
|
).map((m) => m[1]);
|
|
expect(defIds).toEqual(["d"]); // first-wins: one definition
|
|
expect(section).toContain("first");
|
|
expect(section).not.toContain("second"); // duplicate dropped
|
|
|
|
// Both markers stay `[^d]` (reuse) — no `d__2` minting.
|
|
const refIds = Array.from(body.matchAll(/\[\^([^\]\s]+)\]/g)).map(
|
|
(m) => m[1],
|
|
);
|
|
expect(refIds).toEqual(["d", "d"]);
|
|
});
|
|
|
|
it("extractFootnoteDefinitions is DETERMINISTIC and stable (same input -> same output)", () => {
|
|
// The output must be a pure function of the input markdown so importing the
|
|
// same source twice (or via the editor and the MCP mirror) is identical.
|
|
const md = [
|
|
"See[^d] one[^d] two[^d].",
|
|
"",
|
|
"[^d]: first",
|
|
"[^d]: second",
|
|
"[^d]: third",
|
|
].join("\n");
|
|
|
|
const run = () => {
|
|
const { body, section } = extractFootnoteDefinitions(md);
|
|
const defIds = Array.from(
|
|
section.matchAll(/data-footnote-def data-id="([^"]+)"/g),
|
|
).map((m) => m[1]);
|
|
const refIds = Array.from(body.matchAll(/\[\^([^\]\s]+)\]/g)).map(
|
|
(m) => m[1],
|
|
);
|
|
return { defIds, refIds };
|
|
};
|
|
|
|
const a = run();
|
|
const b = run();
|
|
expect(a).toEqual(b);
|
|
// First-wins: one kept definition `d`; all three reuse markers stay `d`.
|
|
expect(a.defIds).toEqual(["d"]);
|
|
expect(a.refIds).toEqual(["d", "d", "d"]);
|
|
});
|
|
|
|
it("markdownToHtml with a reused id renders ONE shared footnote def", async () => {
|
|
const md = [
|
|
"See here[^d] and there[^d].",
|
|
"",
|
|
"[^d]: first",
|
|
"[^d]: second",
|
|
].join("\n");
|
|
const html = await markdownToHtml(md);
|
|
const defIds = Array.from(
|
|
html.matchAll(/data-footnote-def data-id="([^"]+)"/g),
|
|
).map((m) => m[1]);
|
|
expect(defIds).toEqual(["d"]); // one shared definition
|
|
expect(html).toContain("first");
|
|
expect(html).not.toContain("second");
|
|
});
|
|
});
|