test(git-sync): exhaustive converter coverage + fix 3 round-trip data-loss bugs
Coder↔reviewer design loop (9 rounds, reviewer verdict: exhaustive) produced 92 specs; implemented +123 tests (465 -> 588 passing). The new round-trip coverage exposed three genuine data-loss bugs in the Markdown<->ProseMirror converter, all now FIXED (round-trip is lossless for these): 1. pageBreak was lost on export (no converter case -> rendered to "" and the node vanished). Now emits <div data-type="pageBreak"></div>, which the schema parses back -> round-trips. 2. A block image between blocks left an empty <p> artifact after import-hoisting, producing a phantom blank-gap diff on every sync. markdownToProseMirror now strips content-less paragraphs after generateJSON — with a schema-validity guard that keeps the obligatory single empty paragraph in `content: "block+"` containers (tableCell/tableHeader/blockquote/column/callout/doc), so empty cells/quotes never become an invalid `content: []`. 3. The `code` mark combined with another mark was not byte-stable (emitted nested HTML that the schema's `code` `excludes:"_"` collapsed on import). The converter now emits code-only when `code` co-occurs, matching the editor. New coverage spans media/diagram/details/columns/math/mention attribute round-trips, converter emission branches, git error paths, and engine decision branches. A dedicated test pins the empty-container schema validity (the review catch on the bug-2 fix). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -68,21 +68,21 @@ export function convertProseMirrorToMarkdown(content: any): string {
|
||||
let textContent = node.text || "";
|
||||
// Apply marks (bold, italic, code, etc.)
|
||||
if (node.marks) {
|
||||
// Markdown code spans (`...`) cannot carry inner formatting, so when a
|
||||
// run has the `code` mark alongside ANY other mark, backtick syntax
|
||||
// would leak literal ** / []() into the code text. In that case emit
|
||||
// nested HTML (<code> innermost, the other marks wrapping it as HTML)
|
||||
// so the output is at least well-formed and re-parseable.
|
||||
//
|
||||
// NOTE: this does NOT round-trip both marks. The schema's `code` mark
|
||||
// has `excludes: "_"` (it excludes every other mark), so on import the
|
||||
// co-occurring mark is always dropped — the run comes back as `code`
|
||||
// only. We keep the emission simple and accept that the other mark is
|
||||
// lost; preserving both is impossible while `code` excludes them.
|
||||
// Only use the backtick form when `code` is the sole mark.
|
||||
// The schema's `code` mark declares `excludes: "_"` — it excludes every
|
||||
// other inline mark — so the editor can NEVER produce a text run that
|
||||
// carries `code` together with another mark, and on import any
|
||||
// co-occurring mark is always dropped (the run comes back as code-only).
|
||||
// The lossless, byte-stable behavior is therefore: when a run has the
|
||||
// `code` mark, emit ONLY the backtick code span and ignore every other
|
||||
// mark, so md1 is already code-only and md2 === md1. Runs WITHOUT a code
|
||||
// mark are rendered exactly as before.
|
||||
const markTypes = node.marks.map((m: any) => m.type);
|
||||
const hasCode = markTypes.includes("code");
|
||||
const codeCombined = hasCode && markTypes.length > 1;
|
||||
if (hasCode) {
|
||||
textContent = `\`${textContent}\``;
|
||||
return textContent;
|
||||
}
|
||||
const codeCombined = false;
|
||||
for (const mark of node.marks) {
|
||||
switch (mark.type) {
|
||||
case "bold":
|
||||
@@ -571,6 +571,13 @@ export function convertProseMirrorToMarkdown(content: any): string {
|
||||
return `<div ${parts.join(" ")}>${inner}</div>`;
|
||||
}
|
||||
|
||||
case "pageBreak":
|
||||
// Emit the schema-matching div[data-type="pageBreak"] so marked passes
|
||||
// it through as a block and generateJSON rebuilds the pageBreak atom.
|
||||
// Without this case the node fell through to `default` and rendered ""
|
||||
// (the divider silently disappeared and could not round-trip).
|
||||
return `<div data-type="pageBreak"></div>`;
|
||||
|
||||
case "subpages":
|
||||
return "{{SUBPAGES}}";
|
||||
|
||||
|
||||
@@ -337,6 +337,44 @@ function bridgeTaskLists(html: string): string {
|
||||
return document.body.innerHTML;
|
||||
}
|
||||
|
||||
/**
|
||||
* Recursively strip content-less paragraph nodes from a generated doc.
|
||||
*
|
||||
* A block-level atom whose markdown form is INLINE (e.g. the block `image`'s
|
||||
* ``, or a bare media element) is wrapped by marked in a <p>; the schema
|
||||
* then HOISTS the block atom out of that paragraph, leaving an EMPTY paragraph
|
||||
* sibling. On the next export that empty `<p>` renders to "" and the doc "\n\n"
|
||||
* join injects a phantom blank gap, so the markdown is not byte-stable.
|
||||
*
|
||||
* Markdown blank lines are separators, never content, so generateJSON only ever
|
||||
* produces an empty paragraph as such a hoist artifact — removing them is safe
|
||||
* and general (it also subsumes the <div>-wrapper workaround the `video` case
|
||||
* uses). We remove ONLY `type === 'paragraph'` nodes whose `content` is absent
|
||||
* or an empty array; every other node (including atoms without `content`) is
|
||||
* preserved, and we recurse into the content of any node that has children.
|
||||
*/
|
||||
function stripEmptyParagraphs(node: any): any {
|
||||
if (!node || !Array.isArray(node.content)) {
|
||||
// Atom / leaf node (no children to recurse into): keep as-is.
|
||||
return node;
|
||||
}
|
||||
const mapped = node.content.map((child: any) => stripEmptyParagraphs(child));
|
||||
const isEmptyParagraph = (child: any): boolean =>
|
||||
!!child &&
|
||||
child.type === "paragraph" &&
|
||||
(!Array.isArray(child.content) || child.content.length === 0);
|
||||
const filtered = mapped.filter((child: any) => !isEmptyParagraph(child));
|
||||
// Schema-validity guard: several nodes require NON-empty block content
|
||||
// (`content: "block+"` — tableCell, tableHeader, blockquote, column, callout,
|
||||
// and the doc root). For an empty one of those, generateJSON materializes a
|
||||
// single empty paragraph as its OBLIGATORY content — that is not a hoist
|
||||
// artifact. If stripping would empty the container, keep ONE empty paragraph
|
||||
// so the result stays schema-valid (an empty cell/quote must not become `[]`).
|
||||
const cleaned =
|
||||
filtered.length === 0 && mapped.length > 0 ? [mapped[0]] : filtered;
|
||||
return { ...node, content: cleaned };
|
||||
}
|
||||
|
||||
/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
|
||||
export async function markdownToProseMirror(
|
||||
markdownContent: string,
|
||||
@@ -345,5 +383,6 @@ export async function markdownToProseMirror(
|
||||
const withCallouts = await preprocessCallouts(markdownContent);
|
||||
const html = await marked.parse(withCallouts);
|
||||
const bridged = bridgeTaskLists(html);
|
||||
return generateJSON(bridged, docmostExtensions);
|
||||
const doc = generateJSON(bridged, docmostExtensions);
|
||||
return stripEmptyParagraphs(doc);
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user