test(git-sync): exhaustive converter coverage + fix 3 round-trip data-loss bugs

Coder↔reviewer design loop (9 rounds, reviewer verdict: exhaustive) produced 92 specs; implemented +123 tests (465 -> 588 passing).

The new round-trip coverage exposed three genuine data-loss bugs in the Markdown<->ProseMirror converter, all now FIXED (round-trip is lossless for these):

1. pageBreak was lost on export (no converter case -> rendered to "" and the node vanished). Now emits <div data-type="pageBreak"></div>, which the schema parses back -> round-trips.
2. A block image between blocks left an empty <p> artifact after import-hoisting, producing a phantom blank-gap diff on every sync. markdownToProseMirror now strips content-less paragraphs after generateJSON, with a schema-validity guard that keeps the obligatory single empty paragraph in `content: "block+"` containers (tableCell/tableHeader/blockquote/column/callout/doc), so empty cells/quotes never become an invalid `content: []`.
3. The `code` mark combined with another mark was not byte-stable (emitted nested HTML that the schema's `code` `excludes:"_"` collapsed on import). The converter now emits code-only when `code` co-occurs, matching the editor.

New coverage spans media/diagram/details/columns/math/mention attribute round-trips, converter emission branches, git error paths, and engine decision branches. A dedicated test pins the empty-container schema validity (the review catch on the bug-2 fix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 06:50:20 +03:00
parent 04032ae677
commit d06cf97ed6
18 changed files with 2902 additions and 50 deletions
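
Fix 3 in the commit message (the `code` mark rule) can be illustrated with a small standalone sketch. It is not the converter's code: `Mark`, `TextNode`, and `renderTextRun` are hypothetical names, and only `bold` and `italic` are handled as non-code marks, on the assumption that they map to `**` and `*`.

```typescript
// Hypothetical, simplified node shapes; the real converter reads untyped JSON.
interface Mark {
  type: string;
}
interface TextNode {
  type: "text";
  text?: string;
  marks?: Mark[];
}

// Sketch of the rule: when `code` is present, emit only the backtick span
// and ignore every co-occurring mark, because the schema would drop them
// on import anyway.
function renderTextRun(node: TextNode): string {
  const text = node.text ?? "";
  const marks = node.marks ?? [];
  if (marks.some((m) => m.type === "code")) {
    return "`" + text + "`";
  }
  // Runs without `code` keep their marks (only two are modeled here).
  return marks.reduce((acc, m) => {
    switch (m.type) {
      case "bold":
        return `**${acc}**`;
      case "italic":
        return `*${acc}*`;
      default:
        return acc;
    }
  }, text);
}

const codeOnly = renderTextRun({
  type: "text",
  text: "x",
  marks: [{ type: "code" }],
});
const codeAndBold = renderTextRun({
  type: "text",
  text: "x",
  marks: [{ type: "bold" }, { type: "code" }],
});
const boldOnly = renderTextRun({
  type: "text",
  text: "x",
  marks: [{ type: "bold" }],
});
```

Under these assumptions `codeOnly` and `codeAndBold` are the same string, which is what makes the first export already match the second one after a re-import.
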
--- a/packages/git-sync/src/lib/markdown-converter.ts
+++ b/packages/git-sync/src/lib/markdown-converter.ts
@@ -68,21 +68,21 @@ export function convertProseMirrorToMarkdown(content: any): string {
        let textContent = node.text || "";
        // Apply marks (bold, italic, code, etc.)
        if (node.marks) {
-          // Markdown code spans (`...`) cannot carry inner formatting, so when a
-          // run has the `code` mark alongside ANY other mark, backtick syntax
-          // would leak literal ** / []() into the code text. In that case emit
-          // nested HTML (<code> innermost, the other marks wrapping it as HTML)
-          // so the output is at least well-formed and re-parseable.
-          //
-          // NOTE: this does NOT round-trip both marks. The schema's `code` mark
-          // has `excludes: "_"` (it excludes every other mark), so on import the
-          // co-occurring mark is always dropped — the run comes back as `code`
-          // only. We keep the emission simple and accept that the other mark is
-          // lost; preserving both is impossible while `code` excludes them.
-          // Only use the backtick form when `code` is the sole mark.
+          // The schema's `code` mark declares `excludes: "_"` — it excludes every
+          // other inline mark — so the editor can NEVER produce a text run that
+          // carries `code` together with another mark, and on import any
+          // co-occurring mark is always dropped (the run comes back as code-only).
+          // The lossless, byte-stable behavior is therefore: when a run has the
+          // `code` mark, emit ONLY the backtick code span and ignore every other
+          // mark, so md1 is already code-only and md2 === md1. Runs WITHOUT a code
+          // mark are rendered exactly as before.
          const markTypes = node.marks.map((m: any) => m.type);
          const hasCode = markTypes.includes("code");
-          const codeCombined = hasCode && markTypes.length > 1;
+          if (hasCode) {
+            textContent = `\`${textContent}\``;
+            return textContent;
+          }
+          const codeCombined = false;
          for (const mark of node.marks) {
            switch (mark.type) {
              case "bold":
@@ -571,6 +571,13 @@ export function convertProseMirrorToMarkdown(content: any): string {
        return `<div ${parts.join(" ")}>${inner}</div>`;
      }

+      case "pageBreak":
+        // Emit the schema-matching div[data-type="pageBreak"] so marked passes
+        // it through as a block and generateJSON rebuilds the pageBreak atom.
+        // Without this case the node fell through to `default` and rendered ""
+        // (the divider silently disappeared and could not round-trip).
+        return `<div data-type="pageBreak"></div>`;
+
      case "subpages":
        return "{{SUBPAGES}}";

--- a/packages/git-sync/src/lib/markdown-to-prosemirror.ts
+++ b/packages/git-sync/src/lib/markdown-to-prosemirror.ts
@@ -337,6 +337,44 @@ function bridgeTaskLists(html: string): string {
  return document.body.innerHTML;
 }

+/**
+ * Recursively strip content-less paragraph nodes from a generated doc.
+ *
+ * A block-level atom whose markdown form is INLINE (e.g. the block `image`'s
+ * `![](url)`, or a bare media element) is wrapped by marked in a <p>; the schema
+ * then HOISTS the block atom out of that paragraph, leaving an EMPTY paragraph
+ * sibling. On the next export that empty `<p>` renders to "" and the doc "\n\n"
+ * join injects a phantom blank gap, so the markdown is not byte-stable.
+ *
+ * Markdown blank lines are separators, never content, so generateJSON only ever
+ * produces an empty paragraph as such a hoist artifact — removing them is safe
+ * and general (it also subsumes the <div>-wrapper workaround the `video` case
+ * uses). We remove ONLY `type === 'paragraph'` nodes whose `content` is absent
+ * or an empty array; every other node (including atoms without `content`) is
+ * preserved, and we recurse into the content of any node that has children.
+ */
+function stripEmptyParagraphs(node: any): any {
+  if (!node || !Array.isArray(node.content)) {
+    // Atom / leaf node (no children to recurse into): keep as-is.
+    return node;
+  }
+  const mapped = node.content.map((child: any) => stripEmptyParagraphs(child));
+  const isEmptyParagraph = (child: any): boolean =>
+    !!child &&
+    child.type === "paragraph" &&
+    (!Array.isArray(child.content) || child.content.length === 0);
+  const filtered = mapped.filter((child: any) => !isEmptyParagraph(child));
+  // Schema-validity guard: several nodes require NON-empty block content
+  // (`content: "block+"` — tableCell, tableHeader, blockquote, column, callout,
+  // and the doc root). For an empty one of those, generateJSON materializes a
+  // single empty paragraph as its OBLIGATORY content — that is not a hoist
+  // artifact. If stripping would empty the container, keep ONE empty paragraph
+  // so the result stays schema-valid (an empty cell/quote must not become `[]`).
+  const cleaned =
+    filtered.length === 0 && mapped.length > 0 ? [mapped[0]] : filtered;
+  return { ...node, content: cleaned };
+}
+
 /** Convert markdown to a ProseMirror doc using the full Docmost schema. */
 export async function markdownToProseMirror(
  markdownContent: string,
@@ -345,5 +383,6 @@ export async function markdownToProseMirror(
  const withCallouts = await preprocessCallouts(markdownContent);
  const html = await marked.parse(withCallouts);
  const bridged = bridgeTaskLists(html);
-  return generateJSON(bridged, docmostExtensions);
+  const doc = generateJSON(bridged, docmostExtensions);
+  return stripEmptyParagraphs(doc);
 }
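
To see what the stripping pass and its guard do, here is a minimal standalone sketch that mirrors `stripEmptyParagraphs` from the hunk above and runs it on a hand-written doc. The `PMNode` interface and the sample document are assumptions made for illustration; the real function takes the untyped output of `generateJSON`.

```typescript
// Hypothetical minimal node shape for the example.
interface PMNode {
  type: string;
  content?: PMNode[];
  text?: string;
  attrs?: Record<string, unknown>;
}

// Typed copy of the logic in the diff: drop empty paragraphs, but never
// leave a container that had children with an empty content array.
function stripEmptyParagraphs(node: PMNode): PMNode {
  if (!Array.isArray(node.content)) {
    return node; // leaf or atom: nothing to recurse into
  }
  const mapped = node.content.map((child) => stripEmptyParagraphs(child));
  const isEmptyParagraph = (child: PMNode): boolean =>
    child.type === "paragraph" &&
    (!Array.isArray(child.content) || child.content.length === 0);
  const filtered = mapped.filter((child) => !isEmptyParagraph(child));
  const cleaned =
    filtered.length === 0 && mapped.length > 0 ? [mapped[0]] : filtered;
  return { ...node, content: cleaned };
}

// Sample input: an image followed by a hoist-artifact paragraph, a real
// paragraph, and a blockquote whose only child is an empty paragraph.
const sampleDoc: PMNode = {
  type: "doc",
  content: [
    { type: "image", attrs: { src: "a.png" } },
    { type: "paragraph" },
    { type: "paragraph", content: [{ type: "text", text: "after" }] },
    { type: "blockquote", content: [{ type: "paragraph" }] },
  ],
};

const stripped = stripEmptyParagraphs(sampleDoc);
const topLevelTypes = (stripped.content ?? []).map((n) => n.type);
const quoteChildren = stripped.content?.[2]?.content ?? [];
```

The stray top-level paragraph is removed, while the blockquote keeps its single empty paragraph. As written, the guard is not keyed on node type: it keeps one paragraph in any container that would otherwise end up with no children.
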