feat(git-sync): vendor pure converter + engine into @docmost/git-sync (Phase A.1)

First step of docs/git-sync-plan.md. New workspace package @docmost/git-sync vendoring the PURE parts from docmost-sync (HEAD b03eb35):

- lib: markdown-converter, markdown-document, canonicalize, docmost-schema, node-ops, diff, and an extracted markdown-to-prosemirror (only the pure marked->HTML->generateJSON path from upstream collaboration.ts; no websocket).
- engine (pure, no IO): reconcile, layout, sanitize, stabilize, loop-guard.

Ported the upstream pure-module + round-trip corpus tests (vitest): 314 pass, 3 expected upstream known-limitation fails. tsc clean. No server wiring yet.

docmost-schema inlines getStyleProperty (as packages/mcp does; @tiptap/core 3.20.4 doesn't export it).

IO engine (pull/push/git/settings) deferred to later Phase A/B steps; the editor-ext idempotency gate (plan §13.1) is the next step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 13:55:23 +03:00
parent 406921ac6a
commit c0dbc97fe2
61 changed files with 9729 additions and 1817 deletions
--- a/packages/git-sync/build/engine/layout.js
+++ b/packages/git-sync/build/engine/layout.js
@@ -1,170 +0,0 @@
-/**
- * Pure page-tree -> vault path mapping (SPEC §12).
- *
- * Given the flat list of page nodes for a space (as returned by
- * `listAllSpacePages`), compute for every page a deterministic, collision-free
- * destination: a folder path (root -> leaf ancestors) plus a file stem (the
- * page's own name, no extension). This module is intentionally PURE and
- * dependency-free apart from the sanitization helpers, so the whole tree ->
- * path logic is unit-testable without any I/O. The names are COSMETIC; identity
- * lives in each file's meta block (pageId / slugId).
- */
-import { sanitizeTitle, disambiguate } from "./sanitize.js";
-/**
- * Build the full vault layout for a space.
- *
- * Returns a Map keyed by pageId -> `{ segments, stem }`. The result is
- * deterministic for a given input and guarantees every full destination path
- * (`[...segments, stem].join("/")`) is unique, so no page can silently overwrite
- * another.
- *
- * Disambiguation is layered:
- *   1. Sibling collisions (same sanitized title under the same parent) are
- *      resolved with a stable ` ~<slugId>` suffix (the suffix is itself
- *      sanitized, since slugId/id is untrusted data that must never inject a
- *      path separator).
- *   2. A final full-path pass catches residual collisions that sibling-scoping
- *      cannot see — e.g. two pages whose parents are BOTH outside the input set
- *      both bucket at the root with `segments: []`.
- */
-export function buildVaultLayout(pages) {
-    // Index pages by id so the parent chain can be walked. Guard against
-    // duplicate ids in the input (first one wins).
-    const byId = new Map();
-    for (const p of pages) {
-        if (p && p.id && !byId.has(p.id))
-            byId.set(p.id, p);
-    }
-    // Resolve each node's display name once, deterministically, tracking sibling
-    // collisions per parent. `usedBySibling` maps a parent key -> set of names
-    // already taken under that parent. The bucket key is the node's parent ONLY
-    // when that parent is actually present in `byId`; otherwise (null parent, or
-    // an orphan whose parent is outside the input set) the node buckets at
-    // `"__root__"`. This is critical: orphans land at the vault root (see
-    // `folderSegmentsFor`), so they MUST share the root bucket with real root
-    // pages to be disambiguated against each other here — making `nameById` final
-    // before any `segments` are computed, so no ancestor name can drift later.
-    const usedBySibling = new Map();
-    const nameById = new Map();
-    for (const p of pages) {
-        if (p && p.id && !nameById.has(p.id)) {
-            const parentKey = p.parentPageId && byId.has(p.parentPageId) ? p.parentPageId : "__root__";
-            nameById.set(p.id, nameForNode(p, parentKey, usedBySibling));
-        }
-    }
-    // Every id we index above MUST get a resolved name; this helper returns it
-    // and THROWS if it is somehow absent, rather than silently recomputing a
-    // DIFFERENT, non-disambiguated name (which would desync a folder segment from
-    // its target file).
-    const nameOf = (id) => {
-        const name = nameById.get(id);
-        if (name === undefined) {
-            throw new Error(`buildVaultLayout: no resolved name for page id ${id}`);
-        }
-        return name;
-    };
-    // Build the folder path for a page by walking parentPageId to the root. The
-    // page's OWN name is the file stem; its ancestors become folders. A `visited`
-    // guard prevents an infinite loop on a malformed parent cycle.
-    const folderSegmentsFor = (node) => {
-        const ancestors = [];
-        const visited = new Set();
-        let current = node.parentPageId
-            ? byId.get(node.parentPageId)
-            : undefined;
-        while (current && current.id && !visited.has(current.id)) {
-            visited.add(current.id);
-            ancestors.unshift(nameOf(current.id));
-            current = current.parentPageId
-                ? byId.get(current.parentPageId)
-                : undefined;
-        }
-        return ancestors;
-    };
-    // First pass: compute the provisional { segments, stem } for every node.
-    const layout = new Map();
-    for (const p of pages) {
-        if (!p || !p.id || layout.has(p.id))
-            continue;
-        layout.set(p.id, {
-            segments: folderSegmentsFor(p),
-            stem: nameOf(p.id),
-        });
-    }
-    // FOLDER-NOTE transform (native-Obsidian layout): a page WITH CHILDREN lives at
-    // `<…>/<stem>/<stem>.md` — its body is the folder-note INSIDE its own folder
-    // (LostPaul Folder Notes convention), and its children sit alongside it in that
-    // folder. A leaf stays `<…>/<stem>.md`. Children's segments already point into
-    // the parent's folder (folderSegmentsFor walks ancestor NAMES), so only the
-    // parent's own file relocates here; the sibling name pass above already made
-    // the parent name unique, so folder == file name stays consistent.
-    for (const p of pages) {
-        if (!p || !p.id)
-            continue;
-        const entry = layout.get(p.id);
-        if (entry && p.hasChildren) {
-            entry.segments = [...entry.segments, entry.stem];
-        }
-    }
-    // Final full-path uniqueness pass — a belt-and-suspenders safety net. Note
-    // that cross-bucket (orphan/root) collisions are now resolved in the name pass
-    // above (orphans share the "__root__" bucket), so ancestor names are final
-    // before `segments` are built and this pass should rarely/never re-stem an
-    // ancestor. It only re-stems the colliding LATER leaf via the sanitized
-    // slugId/id, then (if still colliding) appends the id.
-    //
-    // Process FOLDER-NOTES (pages with children) FIRST so a parent claims its
-    // canonical `<name>/<name>.md` before a same-named CHILD — the child (a leaf)
-    // is the one that disambiguates, never the folder-note.
-    const usedPaths = new Set();
-    const seenIds = new Set();
-    const pathKey = (e) => [...e.segments, e.stem].join("/");
-    const ordered = pages
-        .filter((p) => Boolean(p && p.id))
-        .sort((a, b) => Number(Boolean(b.hasChildren)) - Number(Boolean(a.hasChildren)));
-    for (const p of ordered) {
-        if (seenIds.has(p.id))
-            continue;
-        seenIds.add(p.id);
-        const entry = layout.get(p.id);
-        if (!entry)
-            continue;
-        if (usedPaths.has(pathKey(entry))) {
-            // First attempt: disambiguate the stem with the sanitized slugId (or id).
-            entry.stem = disambiguate(entry.stem, sanitizeTitle(p.slugId ?? p.id));
-            if (usedPaths.has(pathKey(entry))) {
-                // Still colliding: append the (sanitized) id as a last resort. The id
-                // is globally unique, so this always resolves the collision.
-                entry.stem = disambiguate(entry.stem, sanitizeTitle(p.id));
-            }
-        }
-        usedPaths.add(pathKey(entry));
-    }
-    return layout;
-}
-/**
- * Compute a deterministic, collision-free name for a node among its SIBLINGS.
- * `usedBySibling` maps a parent key -> set of names already taken, so two
- * siblings that sanitize to the same name get a stable ` ~slugId` suffix
- * (SPEC §12). The suffix is itself passed through `sanitizeTitle`, because the
- * slugId/id is a second untrusted-data channel that must never leak a path
- * separator into the name. `parentKey` is supplied by the caller (it resolves
- * to `"__root__"` for root pages AND for orphans whose parent is outside the
- * input set, so they share one bucket). The name is COSMETIC; identity lives in
- * the meta block.
- */
-function nameForNode(node, parentKey, usedBySibling) {
-    let used = usedBySibling.get(parentKey);
-    if (!used) {
-        used = new Set();
-        usedBySibling.set(parentKey, used);
-    }
-    let name = sanitizeTitle(node.title ?? "");
-    if (used.has(name)) {
-        // Sibling collision: disambiguate with the stable, sanitized slugId (fall
-        // back to the sanitized pageId if no slugId is present).
-        name = disambiguate(name, sanitizeTitle(node.slugId ?? node.id));
-    }
-    used.add(name);
-    return name;
-}
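The removed `layout.js` above can be hard to picture from the diff alone, so here is a minimal sketch of its two central ideas: sibling-scoped name disambiguation (with orphans sharing the root bucket) and the folder-note transform. It is not the vendored implementation: `sketchLayout` is a hypothetical name, the two sanitize helpers are stand-ins for `./sanitize.js` (which this diff does not show), and the final full-path uniqueness pass is omitted.

```javascript
// Hypothetical stand-ins for ./sanitize.js; the real helpers are not shown in this diff.
const sanitizeTitle = (s) => String(s).replace(/[\\/]/g, "-").trim() || "Untitled";
const disambiguate = (name, suffix) => `${name} ~${suffix}`;

function sketchLayout(pages) {
  // Note: unlike the original, this Map keeps the LAST page on duplicate ids.
  const byId = new Map(pages.map((p) => [p.id, p]));

  // Pass 1: resolve a unique name per sibling bucket. Orphans (parent not in
  // the input) share the "__root__" bucket with real root pages.
  const used = new Map(); // parent key -> Set of names already taken
  const nameById = new Map();
  for (const p of pages) {
    const key = p.parentPageId && byId.has(p.parentPageId) ? p.parentPageId : "__root__";
    if (!used.has(key)) used.set(key, new Set());
    const taken = used.get(key);
    let name = sanitizeTitle(p.title ?? "");
    if (taken.has(name)) name = disambiguate(name, sanitizeTitle(p.slugId ?? p.id));
    taken.add(name);
    nameById.set(p.id, name);
  }

  // Pass 2: ancestors become folders; a page with children moves into its own folder.
  const layout = new Map();
  for (const p of pages) {
    const segments = [];
    const visited = new Set(); // guards against a malformed parent cycle
    let cur = p.parentPageId ? byId.get(p.parentPageId) : undefined;
    while (cur && !visited.has(cur.id)) {
      visited.add(cur.id);
      segments.unshift(nameById.get(cur.id));
      cur = cur.parentPageId ? byId.get(cur.parentPageId) : undefined;
    }
    const stem = nameById.get(p.id);
    if (p.hasChildren) segments.push(stem); // folder-note: <stem>/<stem>.md
    layout.set(p.id, { segments, stem });
  }
  return layout;
}

const layout = sketchLayout([
  { id: "p1", slugId: "aaa", title: "Guide", parentPageId: null, hasChildren: true },
  { id: "p2", slugId: "bbb", title: "Setup", parentPageId: "p1" },
  { id: "p3", slugId: "ccc", title: "Setup", parentPageId: "p1" },
  { id: "p4", slugId: "ddd", title: "Notes", parentPageId: "missing" },
]);
const paths = [...layout.values()].map((e) => [...e.segments, e.stem].join("/") + ".md");
console.log(paths);
// [ 'Guide/Guide.md', 'Guide/Setup.md', 'Guide/Setup ~ccc.md', 'Notes.md' ]
```

The second `Setup` sibling picks up its slugId suffix, the parent `Guide` becomes a folder-note inside its own folder, and the orphan `p4` lands at the vault root.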
--- a/packages/git-sync/build/engine/roundtrip-helpers.d.ts
+++ b/packages/git-sync/build/engine/roundtrip-helpers.d.ts
@@ -1,21 +0,0 @@
-/**
- * Pure, IO-free comparison helpers for the idempotency round-trip checks. The
- * round-trip harness that drives these lives in the package's tests, not in the
- * engine.
- */
-/**
- * Recursively strip every `attrs.id` from a ProseMirror node tree. Block ids
- * are regenerated by `markdownToProseMirror` (SPEC §11), so they must be
- * ignored when comparing the semantic shape of two documents. Returns a NEW
- * tree; the input is not mutated.
- */
-export declare function stripBlockIds(node: any): any;
-/**
- * Find the first divergence between two values via a recursive deep compare.
- * Returns a short path + the two differing values, or null if they are equal.
- */
-export declare function firstDivergence(a: any, b: any, path?: string): {
-    path: string;
-    a: any;
-    b: any;
-} | null;
--- a/packages/git-sync/build/engine/roundtrip-helpers.js
+++ b/packages/git-sync/build/engine/roundtrip-helpers.js
@@ -1,70 +0,0 @@
-/**
- * Pure, IO-free comparison helpers for the idempotency round-trip checks. The
- * round-trip harness that drives these lives in the package's tests, not in the
- * engine.
- */
-/**
- * Recursively strip every `attrs.id` from a ProseMirror node tree. Block ids
- * are regenerated by `markdownToProseMirror` (SPEC §11), so they must be
- * ignored when comparing the semantic shape of two documents. Returns a NEW
- * tree; the input is not mutated.
- */
-export function stripBlockIds(node) {
-    if (Array.isArray(node)) {
-        return node.map(stripBlockIds);
-    }
-    if (node && typeof node === "object") {
-        const out = {};
-        for (const key of Object.keys(node)) {
-            if (key === "attrs" && node.attrs && typeof node.attrs === "object") {
-                // Drop the `id` attr; keep every other attribute.
-                const { id, ...rest } = node.attrs;
-                void id;
-                out.attrs = stripBlockIds(rest);
-            }
-            else {
-                out[key] = stripBlockIds(node[key]);
-            }
-        }
-        return out;
-    }
-    return node;
-}
-/**
- * Find the first divergence between two values via a recursive deep compare.
- * Returns a short path + the two differing values, or null if they are equal.
- */
-export function firstDivergence(a, b, path = "$") {
-    if (a === b)
-        return null;
-    const ta = typeof a;
-    const tb = typeof b;
-    if (ta !== tb || a === null || b === null) {
-        return { path, a, b };
-    }
-    if (ta !== "object") {
-        return { path, a, b };
-    }
-    const aIsArr = Array.isArray(a);
-    const bIsArr = Array.isArray(b);
-    if (aIsArr !== bIsArr)
-        return { path, a, b };
-    if (aIsArr) {
-        if (a.length !== b.length) {
-            return { path: `${path}.length`, a: a.length, b: b.length };
-        }
-        for (let i = 0; i < a.length; i++) {
-            const d = firstDivergence(a[i], b[i], `${path}[${i}]`);
-            if (d)
-                return d;
-        }
-        return null;
-    }
-    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
-    for (const k of keys) {
-        const d = firstDivergence(a[k], b[k], `${path}.${k}`);
-        if (d)
-            return d;
-    }
-    return null;
-}
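As a rough illustration of how the two removed helpers are meant to work together, the sketch below restates them in condensed form (so it runs on its own) and compares a document against a re-imported copy whose block id was regenerated. The sample documents are invented for the example; the real round-trip harness lives in the package's tests and is not shown in this diff.

```javascript
// Condensed restatements of stripBlockIds / firstDivergence from the hunk above.
function stripBlockIds(node) {
  if (Array.isArray(node)) return node.map(stripBlockIds);
  if (node && typeof node === "object") {
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      if (key === "attrs" && value && typeof value === "object") {
        const { id: _id, ...rest } = value; // drop the block id, keep the rest
        out.attrs = stripBlockIds(rest);
      } else {
        out[key] = stripBlockIds(value);
      }
    }
    return out;
  }
  return node;
}

function firstDivergence(a, b, path = "$") {
  if (a === b) return null;
  if (typeof a !== typeof b || a === null || b === null || typeof a !== "object") {
    return { path, a, b };
  }
  if (Array.isArray(a) !== Array.isArray(b)) return { path, a, b };
  if (Array.isArray(a) && a.length !== b.length) {
    return { path: `${path}.length`, a: a.length, b: b.length };
  }
  for (const k of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const childPath = Array.isArray(a) ? `${path}[${k}]` : `${path}.${k}`;
    const d = firstDivergence(a[k], b[k], childPath);
    if (d) return d;
  }
  return null;
}

// Invented sample: same paragraph, but the re-import regenerated the block id
// and (for the sake of the example) changed the text.
const before = {
  type: "doc",
  content: [{ type: "paragraph", attrs: { id: "a1" }, content: [{ type: "text", text: "Hello" }] }],
};
const after = {
  type: "doc",
  content: [{ type: "paragraph", attrs: { id: "z9" }, content: [{ type: "text", text: "Hello!" }] }],
};

const raw = firstDivergence(before, after);
const semantic = firstDivergence(stripBlockIds(before), stripBlockIds(after));
console.log(raw.path);      // $.content[0].attrs.id
console.log(semantic.path); // $.content[0].content[0].text
```

Without stripping, the first reported difference is the regenerated id, which hides the change that actually matters; after stripping, the path points at the text node.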
--- a/packages/git-sync/build/engine/stabilize.d.ts
+++ b/packages/git-sync/build/engine/stabilize.d.ts
@@ -1,41 +0,0 @@
-/**
- * Meta object as `exportPageBody` builds it (SPEC §4). Kept byte-for-byte
- * compatible so files produced here match `exportPageBody`'s output exactly.
- */
-export interface PageMeta {
-    version: 1;
-    pageId: string;
-    slugId: string;
-    title: string;
-    spaceId: string;
-    parentPageId: string | null;
-}
-/**
- * Produce the self-contained `.md` file text for a page from its raw
- * ProseMirror `content` + identity meta, in the verified fixpoint form.
- *
- *   md1        = convertProseMirrorToMarkdown(content)
- *   doc2       = markdownToProseMirror(md1)            // one import...
- *   stableBody = convertProseMirrorToMarkdown(doc2)    // ...and re-export
- *   file       = serializeDocmostMarkdownBody(meta, stableBody)
- *
- * The single export->import->export pass is the verified fixpoint (SPEC §11):
- * idempotent for already-stable content, and the convergence point for the
- * known converter asymmetries.
- */
-export declare function stabilizePageFile(content: unknown, meta: PageMeta): Promise<string>;
-/**
- * The fixpoint markdown BODY for a page's ProseMirror `content`, WITHOUT any meta
- * envelope:
- *
- *   md1        = convertProseMirrorToMarkdown(content)   // export...
- *   doc2       = markdownToProseMirror(md1)              // ...import...
- *   stableBody = convertProseMirrorToMarkdown(doc2)      // ...re-export
- *
- * The single export->import->export pass is the verified fixpoint (SPEC §11):
- * idempotent for already-stable content, and the convergence point for the known
- * converter asymmetries. The native-Obsidian writer (`serializePageFile`) wraps
- * this body with a minimal `gitmost_id` frontmatter; determinism here is what
- * keeps re-pulls of an unchanged page byte-identical (no churn, loop-guard).
- */
-export declare function stabilizePageBody(content: unknown): Promise<string>;
--- a/packages/git-sync/build/engine/stabilize.js
+++ b/packages/git-sync/build/engine/stabilize.js
@@ -1,52 +0,0 @@
-/**
- * Normalize-on-write helper (SPEC §11 "Resolution").
- *
- * git diffs byte-for-byte, so writing a page in a NON-fixpoint markdown form
- * would make the next pull re-export it to a slightly different (but stable)
- * form and produce a phantom diff -> churny commits. The converter has a couple
- * of known one-pass asymmetries (a block image after a paragraph adds an empty
- * paragraph; a diagram materializes `data-align`), all of which converge to a
- * fixpoint after ONE `export -> import -> export` round-trip.
- *
- * So at write time we run exactly that one pass and persist the fixpoint form.
- * Already-stable content is unaffected (the pass is idempotent), so re-pulls of
- * unchanged pages produce identical bytes and git sees no diff.
- */
-import { convertProseMirrorToMarkdown, markdownToProseMirror, serializeDocmostMarkdownBody, } from "../lib/index.js";
-/**
- * Produce the self-contained `.md` file text for a page from its raw
- * ProseMirror `content` + identity meta, in the verified fixpoint form.
- *
- *   md1        = convertProseMirrorToMarkdown(content)
- *   doc2       = markdownToProseMirror(md1)            // one import...
- *   stableBody = convertProseMirrorToMarkdown(doc2)    // ...and re-export
- *   file       = serializeDocmostMarkdownBody(meta, stableBody)
- *
- * The single export->import->export pass is the verified fixpoint (SPEC §11):
- * idempotent for already-stable content, and the convergence point for the
- * known converter asymmetries.
- */
-export async function stabilizePageFile(content, meta) {
-    // The meta shape is exactly what `exportPageBody` writes; it is assignable
-    // to the lib's DocmostMdMeta (a superset with optional fields), which the
-    // serializer expects.
-    return serializeDocmostMarkdownBody(meta, await stabilizePageBody(content));
-}
-/**
- * The fixpoint markdown BODY for a page's ProseMirror `content`, WITHOUT any meta
- * envelope:
- *
- *   md1        = convertProseMirrorToMarkdown(content)   // export...
- *   doc2       = markdownToProseMirror(md1)              // ...import...
- *   stableBody = convertProseMirrorToMarkdown(doc2)      // ...re-export
- *
- * The single export->import->export pass is the verified fixpoint (SPEC §11):
- * idempotent for already-stable content, and the convergence point for the known
- * converter asymmetries. The native-Obsidian writer (`serializePageFile`) wraps
- * this body with a minimal `gitmost_id` frontmatter; determinism here is what
- * keeps re-pulls of an unchanged page byte-identical (no churn, loop-guard).
- */
-export async function stabilizePageBody(content) {
-    const md1 = convertProseMirrorToMarkdown(content);
-    const doc2 = await markdownToProseMirror(md1);
-    return convertProseMirrorToMarkdown(doc2);
-}
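To make the "one `export -> import -> export` pass reaches a fixpoint" argument concrete, here is a toy sketch. The converter pair below is entirely hypothetical (`toyExport`, `toyImport` and the document shape are invented for illustration and are unrelated to the real `convertProseMirrorToMarkdown` / `markdownToProseMirror`); it only reproduces the shape of the problem: an exporter that is slightly more permissive than the importer. The toy importer is synchronous to keep the example short, whereas the real `markdownToProseMirror` is awaited.

```javascript
// Toy exporter: joins paragraph texts with blank lines, without normalizing them.
const toyExport = (doc) => doc.content.map((p) => p.text).join("\n\n");

// Toy importer: splits on blank lines, trims each paragraph, drops empty ones.
const toyImport = (md) => ({
  type: "doc",
  content: md
    .split(/\n{2,}/)
    .map((t) => ({ type: "paragraph", text: t.trim() }))
    .filter((p) => p.text !== ""),
});

// Same structure as stabilizePageBody: export, import once, re-export.
const stabilize = (doc) => toyExport(toyImport(toyExport(doc)));

const doc = {
  type: "doc",
  content: [
    { type: "paragraph", text: "Hello  " }, // trailing spaces: lost on import
    { type: "paragraph", text: "" },        // empty paragraph: lost on import
    { type: "paragraph", text: "World" },
  ],
};

const once = toyExport(doc);                 // non-fixpoint form
const stable = stabilize(doc);               // fixpoint form
const again = toyExport(toyImport(stable));  // a later "re-pull" of the stored form

console.log(JSON.stringify(once));   // "Hello  \n\n\n\nWorld"
console.log(JSON.stringify(stable)); // "Hello\n\nWorld"
console.log(again === stable);       // true
```

Persisting `once` would make the next export differ byte-for-byte and show up as a phantom git diff; persisting `stable` does not, because a further import and export reproduces it exactly.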
--- a/packages/git-sync/build/index.d.ts
+++ b/packages/git-sync/build/index.d.ts
@@ -1,31 +0,0 @@
-/**
- * Public surface of `@docmost/git-sync`.
- *
- * Exposes the pure converter (markdown <-> ProseMirror, file envelope,
- * canonicalization) and the sync engine (reconcile planner, vault layout,
- * pull/push, the git wrapper, and the settings parser) that the gitmost server
- * drives in-process.
- */
-export { serializeDocmostMarkdown, serializeDocmostMarkdownBody, parseDocmostMarkdown, convertProseMirrorToMarkdown, markdownToProseMirror, canonicalizeContent, docsCanonicallyEqual, } from "./lib/index.js";
-export type { DocmostMdMeta } from "./lib/index.js";
-export { planReconciliation, decideAbsenceDeletions, MASS_DELETE_MIN_EXISTING, MASS_DELETE_FRACTION, } from "./engine/reconcile.js";
-export type { LiveEntry, ExistingEntry, WriteEntry, MovedEntry, ReconciliationPlan, DeletionDecision, } from "./engine/reconcile.js";
-export { buildVaultLayout } from "./engine/layout.js";
-export type { PageNode, VaultEntry } from "./engine/layout.js";
-export { sanitizeTitle, disambiguate } from "./engine/sanitize.js";
-export { stabilizePageFile } from "./engine/stabilize.js";
-export type { PageMeta } from "./engine/stabilize.js";
-export { bodyHash } from "./engine/loop-guard.js";
-export type { GitSyncClient, GitSyncPageNodeLite } from "./engine/client.types.js";
-export { VaultGit, vaultGitEnv, buildCommitMessage, BOT_AUTHOR_NAME, BOT_AUTHOR_EMAIL, DEFAULT_BRANCH, } from "./engine/git.js";
-export type { DiffEntry, MergeResult, CommitOptions } from "./engine/git.js";
-export { readExisting, computePullActions, applyPullActions, } from "./engine/pull.js";
-export type { ReadExistingDeps, PullActionsInput, PullActions, ApplyPullActionsDeps, ApplyResult, } from "./engine/pull.js";
-export { classifyRenameMoves, computePushActions, applyPushActions, runPush, parentFolderFile, parseArgs, LAST_PUSHED_REF, DOCMOST_BRANCH, LOCAL_AUTHOR_NAME, LOCAL_AUTHOR_EMAIL, LOCAL_SOURCE_TRAILER, } from "./engine/push.js";
-export type { CreateAction, UpdateAction, DeleteAction, RenameMoveAction, RenameMoveActionClassified, ClassifyRenameMovesDeps, PushActions, PushActionsInput, MetaSide, ApplyPushDeps, WrittenBackPage, PushedPageRecord, PushFailure, PushNoop, ApplyPushResult, PushDeps, PushRunResult, PushParsedArgs, } from "./engine/push.js";
-export { parseSettings, envSchema } from "./engine/settings.js";
-export type { Settings } from "./engine/settings.js";
-export { loadSettingsOrExit } from "./engine/config-errors.js";
-export { runCycle } from "./engine/cycle.js";
-export type { RunCycleDeps, RunCycleResult, CycleFs, } from "./engine/cycle.js";
-export { parsePageFile, serializePageFile } from "./lib/page-file.js";
--- a/packages/git-sync/build/index.js
+++ b/packages/git-sync/build/index.js
@@ -1,24 +0,0 @@
-/**
- * Public surface of `@docmost/git-sync`.
- *
- * Exposes the pure converter (markdown <-> ProseMirror, file envelope,
- * canonicalization) and the sync engine (reconcile planner, vault layout,
- * pull/push, the git wrapper, and the settings parser) that the gitmost server
- * drives in-process.
- */
-// Pure converter (markdown <-> ProseMirror, file envelope, canonicalization).
-export { serializeDocmostMarkdown, serializeDocmostMarkdownBody, parseDocmostMarkdown, convertProseMirrorToMarkdown, markdownToProseMirror, canonicalizeContent, docsCanonicallyEqual, } from "./lib/index.js";
-// Pure engine (no IO): reconcile planner, vault layout, sanitize, stabilize,
-// loop-guard body hash.
-export { planReconciliation, decideAbsenceDeletions, MASS_DELETE_MIN_EXISTING, MASS_DELETE_FRACTION, } from "./engine/reconcile.js";
-export { buildVaultLayout } from "./engine/layout.js";
-export { sanitizeTitle, disambiguate } from "./engine/sanitize.js";
-export { stabilizePageFile } from "./engine/stabilize.js";
-export { bodyHash } from "./engine/loop-guard.js";
-export { VaultGit, vaultGitEnv, buildCommitMessage, BOT_AUTHOR_NAME, BOT_AUTHOR_EMAIL, DEFAULT_BRANCH, } from "./engine/git.js";
-export { readExisting, computePullActions, applyPullActions, } from "./engine/pull.js";
-export { classifyRenameMoves, computePushActions, applyPushActions, runPush, parentFolderFile, parseArgs, LAST_PUSHED_REF, DOCMOST_BRANCH, LOCAL_AUTHOR_NAME, LOCAL_AUTHOR_EMAIL, LOCAL_SOURCE_TRAILER, } from "./engine/push.js";
-export { parseSettings, envSchema } from "./engine/settings.js";
-export { loadSettingsOrExit } from "./engine/config-errors.js";
-export { runCycle } from "./engine/cycle.js";
-export { parsePageFile, serializePageFile } from "./lib/page-file.js";
--- a/packages/git-sync/build/lib/canonicalize.d.ts
+++ b/packages/git-sync/build/lib/canonicalize.d.ts
@@ -1,38 +0,0 @@
-/**
- * Semantic canonicalization of ProseMirror/TipTap documents for the round-trip
- * idempotency check (SPEC §11, "Task No. 0", option (b): compare a CANONICALIZED
- * form rather than raw bytes).
- *
- * `markdownToProseMirror` reconstructs schema DEFAULT attributes (e.g.
- * `indent: null` where the source omitted it) and regenerates per-block ids on
- * every import. A raw deep-equal of the source doc against the re-imported doc
- * therefore diverges even when the two are semantically identical. This module
- * normalizes a document so that two semantically-equal docs compare deep-equal
- * regardless of block ids and absent-vs-explicit-default-null attributes.
- *
- * It is a self-contained module with no external dependencies.
- */
-/**
- * Return a DEEP COPY of a ProseMirror node tree, canonicalized so that two
- * semantically-equal documents compare deep-equal. Rules (applied recursively
- * to the node, its `content`, and its `marks`):
- *
- *  1. Remove node-level `attrs.id` (regenerated on import). Mark attrs are NOT
- *     touched for `id` (marks carry no block id; only their meaningful attrs).
- *  2. In any `attrs` object (node OR mark) drop keys whose value is `null`/
- *     `undefined` (absent ≡ explicit default null) OR equals that node/mark
- *     type's known non-null schema default (absent ≡ explicit default).
- *     Keep every non-default value. The type is passed into the attrs
- *     normalizer so it can look up `KNOWN_DEFAULTS`.
- *  3. If an `attrs` object becomes empty after pruning, drop the `attrs` key.
- *  4. Preserve `marks` (including the `comment` mark and its `commentId` — a
- *     meaningful anchor per SPEC §3; never strip it).
- *  5. Preserve `text`, `type`, and `content` order exactly.
- *  6. Never mutate the input.
- */
-export declare function canonicalizeContent(node: any): any;
-/**
- * True when two ProseMirror documents are semantically equal: equal after
- * canonicalization (block ids stripped, absent-vs-default-null normalized).
- */
-export declare function docsCanonicallyEqual(a: any, b: any): boolean;
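The canonicalization rules listed above can be sketched in a few lines. This is a condensed illustration, not the vendored module: `canonicalize`, `pruneAttrs` and `canonicallyEqual` are hypothetical names, only two of the known defaults are included, and comparing via `JSON.stringify` is an assumption (the implementation of `docsCanonicallyEqual` is not visible in this diff).

```javascript
// Subset of the known non-null schema defaults named in this diff.
const KNOWN_DEFAULTS = {
  link: { target: "_blank", rel: "noopener noreferrer nofollow" },
  orderedList: { start: 1 },
};

// Rules 1-3: drop block ids (nodes only), null/undefined, and known defaults.
function pruneAttrs(attrs, dropId, type) {
  const defaults = KNOWN_DEFAULTS[type] ?? {};
  const out = {};
  for (const key of Object.keys(attrs).sort()) {
    const value = attrs[key];
    if (dropId && key === "id") continue;
    if (value === null || value === undefined) continue;
    if (key in defaults && value === defaults[key]) continue;
    out[key] = value;
  }
  return Object.keys(out).length > 0 ? out : undefined;
}

// Returns a new tree; marks keep their `id` attr, nodes do not.
function canonicalize(node, isMark = false) {
  if (Array.isArray(node)) return node.map((n) => canonicalize(n, isMark));
  if (node === null || typeof node !== "object") return node;
  const out = {};
  for (const key of Object.keys(node)) {
    if (key === "attrs" && node.attrs && typeof node.attrs === "object") {
      const attrs = pruneAttrs(node.attrs, !isMark, node.type);
      if (attrs) out.attrs = attrs;
    } else if (key === "marks" && Array.isArray(node.marks)) {
      if (node.marks.length > 0) out.marks = canonicalize(node.marks, true);
    } else {
      out[key] = canonicalize(node[key], false);
    }
  }
  return out;
}

// Assumption: equality is checked on the serialized canonical form.
const canonicallyEqual = (a, b) =>
  JSON.stringify(canonicalize(a)) === JSON.stringify(canonicalize(b));

const link = (attrs) => [{ type: "link", attrs }];
const source = {
  type: "paragraph",
  attrs: { id: "b1" },
  content: [{ type: "text", text: "docs", marks: link({ href: "https://example.com" }) }],
};
const reimported = {
  type: "paragraph",
  attrs: { id: "x7", indent: null },
  content: [{
    type: "text",
    text: "docs",
    marks: link({ href: "https://example.com", target: "_blank", rel: "noopener noreferrer nofollow" }),
  }],
};

console.log(canonicallyEqual(source, reimported)); // true
console.log(canonicalize({ type: "orderedList", attrs: { start: 5 } }));
// { type: 'orderedList', attrs: { start: 5 } }
```

The regenerated id, the explicit `indent: null`, and the materialized link defaults all disappear, while a non-default `start: 5` is kept.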
--- a/packages/git-sync/build/lib/canonicalize.js
+++ b/packages/git-sync/build/lib/canonicalize.js
@@ -1,245 +0,0 @@
-/**
- * Semantic canonicalization of ProseMirror/TipTap documents for the round-trip
- * idempotency check (SPEC §11, "Task No. 0", option (b): compare a CANONICALIZED
- * form rather than raw bytes).
- *
- * `markdownToProseMirror` reconstructs schema DEFAULT attributes (e.g.
- * `indent: null` where the source omitted it) and regenerates per-block ids on
- * every import. A raw deep-equal of the source doc against the re-imported doc
- * therefore diverges even when the two are semantically identical. This module
- * normalizes a document so that two semantically-equal docs compare deep-equal
- * regardless of block ids and absent-vs-explicit-default-null attributes.
- *
- * It is a self-contained module with no external dependencies.
- */
-/**
- * Known NON-NULL schema defaults that `markdownToProseMirror` materializes on
- * import, keyed by node/mark type → { attr: defaultValue }.
- *
- * Why this exists: `canonicalizeAttrs` already treats an absent attr as
- * equivalent to an explicit `null`/`undefined`. But several Docmost schema
- * attributes default to a NON-null value, so import fills them in even when the
- * source omitted them — making "attr absent" diverge from "attr at its default
- * value" under a raw deep-equal. To keep "absent ≡ explicit-default", we ALSO
- * drop any attr whose value equals its known schema default. A non-default
- * value (e.g. `orderedList.start: 5`) is NOT a default, so it is KEPT.
- *
- * Every entry below was read from `packages/docmost-client/src/lib/
- * docmost-schema.ts` (the line refs are the exact `default:` declarations) and
- * confirmed to be materialized by an export→import→export round-trip:
- *   - mark `link`    target / rel  — DocmostAttributes + StarterKit link.
- *       StarterKit's link extension defaults `target: "_blank"` and
- *       `rel: "noopener noreferrer nofollow"`; both materialize on import
- *       (empirically confirmed) even when the source had only `href`.
- *   - mark `comment` resolved      — docmost-schema.ts L213-214 (`default: false`).
- *   - node `orderedList` start     — provided by StarterKit's orderedList
- *       (`default: 1`); materializes on import (empirically confirmed).
- *   - node `drawio`/`excalidraw`/`video`/`youtube`/`embed` align — the diagram
- *       attribute set and the media nodes declare `align: { default: "center" }`
- *       (docmost-schema.ts L745-750 diagramAttributes; L564 video; L626 youtube;
- *       L667 embed). The diagram `align` is the one the round-trip materializes
- *       (docmost-schema.ts L745); the media/embed entries normalize the SAME
- *       `align` default for consistency. Note: this only normalizes `align` —
- *       full canonical stability of `embed` is separately limited by the
- *       converter coercing numeric `width`/`height` to strings, which is outside
- *       canonicalize's scope.
- *
- * NOTE: `image` has NO non-null align default — its `align` defaults to `null`
- * (docmost-schema.ts L174), so it is already handled by the null-drop rule and
- * is intentionally NOT listed here.
- */
-const KNOWN_DEFAULTS = {
-    // mark types
-    link: {
-        target: "_blank",
-        rel: "noopener noreferrer nofollow",
-    },
-    comment: {
-        resolved: false,
-    },
-    // node types
-    orderedList: {
-        start: 1,
-    },
-    drawio: {
-        align: "center",
-    },
-    excalidraw: {
-        align: "center",
-    },
-    video: {
-        align: "center",
-    },
-    youtube: {
-        align: "center",
-    },
-    embed: {
-        align: "center",
-    },
-};
-/**
- * Prune an `attrs` object into a fresh copy (the input is not mutated): drop keys whose value is
- * `null` or `undefined` (an absent attribute and an explicit default of `null`
- * are semantically equivalent here). Optionally also drop a node-level `id`
- * (block ids are regenerated on import, SPEC §11). ALSO drop any attr whose
- * value equals the node/mark `type`'s known NON-null schema default
- * (`KNOWN_DEFAULTS`), so "attr absent" ≡ "attr at its default value" — without
- * this, the import-materialized `link.target`/`comment.resolved`/
- * `orderedList.start`/diagram `align` defaults would be a phantom diff. Every
- * non-default attribute value is KEPT (level, language, src, href, commentId,
- * width, a non-default `start`/`align`, ...).
- *
- * Returns the pruned attrs object, or `undefined` if nothing meaningful is
- * left (so the caller can drop the `attrs` key entirely: `{attrs:{}}` ≡ no
- * attrs).
- */
-function canonicalizeAttrs(attrs, dropId, type) {
-    const defaults = type ? KNOWN_DEFAULTS[type] : undefined;
-    const out = {};
-    // Stable key order so a JSON.stringify of the canonical form is comparable
-    // regardless of the input's key order.
-    for (const key of Object.keys(attrs).sort()) {
-        // Block ids are regenerated on import; drop them on NODE attrs only.
-        if (dropId && key === "id")
-            continue;
-        const value = attrs[key];
-        // Absent ≡ explicit-default-null/undefined.
-        if (value === null || value === undefined)
-            continue;
-        // Absent ≡ explicit known non-null default (e.g. link.target="_blank").
-        // A non-default value (e.g. orderedList.start=5) does NOT match, so it is
-        // kept. The `comment` mark's `commentId` is never a default, so it always
-        // survives (SPEC §3); only its `resolved: false` default is normalized away.
-        if (defaults && key in defaults && value === defaults[key])
-            continue;
-        out[key] = value;
-    }
-    return Object.keys(out).length > 0 ? out : undefined;
-}
-/**
- * Return a DEEP COPY of a ProseMirror node tree, canonicalized so that two
- * semantically-equal documents compare deep-equal. Rules (applied recursively
- * to the node, its `content`, and its `marks`):
- *
- *  1. Remove node-level `attrs.id` (regenerated on import). Mark attrs are NOT
- *     touched for `id` (marks carry no block id; only their meaningful attrs).
- *  2. In any `attrs` object (node OR mark) drop keys whose value is `null`/
- *     `undefined` (absent ≡ explicit default null) OR equals that node/mark
- *     type's known non-null schema default (absent ≡ explicit default).
- *     Keep every non-default value. The type is passed into the attrs
- *     normalizer so it can look up `KNOWN_DEFAULTS`.
- *  3. If an `attrs` object becomes empty after pruning, drop the `attrs` key.
- *  4. Preserve `marks` (including the `comment` mark and its `commentId` — a
- *     meaningful anchor per SPEC §3; never strip it).
- *  5. Preserve `text`, `type`, and `content` order exactly.
- *  6. Never mutate the input.
- */
-export function canonicalizeContent(node) {
-    if (Array.isArray(node)) {
-        return node.map((child) => canonicalizeContent(child));
-    }
-    if (node === null || typeof node !== "object") {
-        // Primitive leaf (string/number/boolean/null): returned as-is.
-        return node;
-    }
-    // A node is a mark when it has a `type` but never carries block `content`
-    // and lives inside a `marks` array. We cannot tell from the node alone, so
-    // we distinguish at the recursion site: node `attrs` drop `id`, mark `attrs`
-    // do not. This is handled by passing a `dropId` flag down for the `attrs`
-    // key specifically (nodes) vs the `marks[].attrs` path (marks).
-    const out = {};
-    for (const key of Object.keys(node)) {
-        if (key === "attrs" && node.attrs && typeof node.attrs === "object") {
-            // Node-level attrs: drop the block id, null/undefined attrs, and any
-            // attr at this node type's known non-null schema default.
-            const canon = canonicalizeAttrs(node.attrs, true, typeof node.type === "string" ? node.type : undefined);
-            if (canon !== undefined)
-                out.attrs = canon;
-            // else: drop the `attrs` key entirely (rule 3).
-        }
-        else if (key === "marks" && Array.isArray(node.marks)) {
-            // Marks: keep them all (incl. comment); canonicalize their attrs but do
-            // NOT drop `id` (a mark's `id` would be a meaningful attr, not a block
-            // id). An empty marks array is dropped so `marks:[]` ≡ no marks.
-            const marks = node.marks.map((mark) => canonicalizeMark(mark));
-            if (marks.length > 0)
-                out.marks = marks;
-        }
-        else {
-            out[key] = canonicalizeContent(node[key]);
-        }
-    }
-    return out;
-}
-/**
- * Canonicalize a single mark: keep `type`, prune its `attrs` (null/undefined
- * AND known non-null defaults dropped, empty attrs removed) but NEVER drop a
- * mark's attribute as a "block id" — marks have no block id, only meaningful
- * attrs (href, commentId, color, level, ...). Meaningful NON-default attrs
- * survive (the `comment` mark's `commentId` is never a default, so it always
- * survives — SPEC §3); only known defaults like `link.target="_blank"`,
- * `link.rel="noopener…"` and `comment.resolved=false` are normalized away.
- */
-function canonicalizeMark(mark) {
-    if (mark === null || typeof mark !== "object")
-        return mark;
-    const out = {};
-    for (const key of Object.keys(mark)) {
-        if (key === "attrs" && mark.attrs && typeof mark.attrs === "object") {
-            const canon = canonicalizeAttrs(mark.attrs, false, typeof mark.type === "string" ? mark.type : undefined);
-            if (canon !== undefined)
-                out.attrs = canon;
-        }
-        else {
-            out[key] = canonicalizeContent(mark[key]);
-        }
-    }
-    return out;
-}
-/**
- * Deep structural equality of two values that is key-order-insensitive.
- * Used to compare canonical forms. (`canonicalizeContent` already emits
- * `attrs` in a stable key order, but the top-level node keys preserve input
- * order, so we compare structurally rather than by string.)
- */
-function deepEqual(a, b) {
-    if (a === b)
-        return true;
-    if (typeof a !== typeof b)
-        return false;
-    if (a === null || b === null)
-        return a === b;
-    if (typeof a !== "object")
-        return false;
-    const aIsArr = Array.isArray(a);
-    const bIsArr = Array.isArray(b);
-    if (aIsArr !== bIsArr)
-        return false;
-    if (aIsArr) {
-        if (a.length !== b.length)
-            return false;
-        for (let i = 0; i < a.length; i++) {
-            if (!deepEqual(a[i], b[i]))
-                return false;
-        }
-        return true;
-    }
-    const aKeys = Object.keys(a);
-    const bKeys = Object.keys(b);
-    if (aKeys.length !== bKeys.length)
-        return false;
-    for (const k of aKeys) {
-        if (!Object.prototype.hasOwnProperty.call(b, k))
-            return false;
-        if (!deepEqual(a[k], b[k]))
-            return false;
-    }
-    return true;
-}
-/**
- * True when two ProseMirror documents are semantically equal: equal after
- * canonicalization (block ids stripped; null and known-default attrs pruned).
- */
-export function docsCanonicallyEqual(a, b) {
-    return deepEqual(canonicalizeContent(a), canonicalizeContent(b));
-}
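Reviewer note: the canonicalization rules above are easier to check against a concrete pair of documents. The following is a minimal sketch, not the removed module itself. `pruneAttrs`, `canonicalize`, `stableStringify` and `canonicallyEqual` are illustrative names, and the `KNOWN_DEFAULTS` table is an assumed two-entry subset built from the defaults named in the comments (`link.target="_blank"`, `comment.resolved=false`).

```javascript
// Assumed subset of the schema defaults; the real table is not shown in this hunk.
const KNOWN_DEFAULTS = {
  link: { target: "_blank" },
  comment: { resolved: false },
};

// Drop the block id (nodes only), null/undefined values, and known defaults.
function pruneAttrs(attrs, dropId, type) {
  const defaults = type ? KNOWN_DEFAULTS[type] : undefined;
  const out = {};
  for (const key of Object.keys(attrs).sort()) {
    const value = attrs[key];
    if (dropId && key === "id") continue;
    if (value === null || value === undefined) continue;
    if (defaults && key in defaults && value === defaults[key]) continue;
    out[key] = value;
  }
  return Object.keys(out).length > 0 ? out : undefined;
}

// Deep copy with canonical attrs; `isMark` is true only inside a `marks` array.
function canonicalize(node, isMark = false) {
  if (Array.isArray(node)) return node.map((n) => canonicalize(n, isMark));
  if (node === null || typeof node !== "object") return node;
  const out = {};
  for (const key of Object.keys(node)) {
    if (key === "attrs" && node.attrs && typeof node.attrs === "object") {
      const canon = pruneAttrs(node.attrs, !isMark, node.type);
      if (canon !== undefined) out.attrs = canon;
    } else if (key === "marks" && Array.isArray(node.marks)) {
      const marks = node.marks.map((m) => canonicalize(m, true));
      if (marks.length > 0) out.marks = marks;
    } else {
      out[key] = canonicalize(node[key], isMark);
    }
  }
  return out;
}

// Key-order-insensitive serialization, used here in place of a deepEqual.
function stableStringify(v) {
  if (Array.isArray(v)) return "[" + v.map(stableStringify).join(",") + "]";
  if (v !== null && typeof v === "object") {
    const body = Object.keys(v)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(v[k]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(v);
}

const canonicallyEqual = (x, y) =>
  stableStringify(canonicalize(x)) === stableStringify(canonicalize(y));

const mkDoc = (paraAttrs, markAttrs) => ({
  type: "doc",
  content: [{
    type: "paragraph",
    attrs: paraAttrs,
    content: [{ type: "text", text: "hi", marks: [{ type: "comment", attrs: markAttrs }] }],
  }],
});

const a = mkDoc({ id: "abc", textAlign: null }, { commentId: "c1", resolved: false });
const b = mkDoc({ id: "xyz" }, { commentId: "c1" });
const c = mkDoc({ id: "xyz" }, { commentId: "c2" });

console.log(canonicallyEqual(a, b)); // true: only block ids and defaults differ
console.log(canonicallyEqual(a, c)); // false: commentId is a meaningful attr
```

The two documents compare equal because the differing block ids, the explicit `textAlign: null` and the default `resolved: false` are all pruned, while a different `commentId` still breaks equality.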
--- a/packages/git-sync/build/lib/index.d.ts
+++ b/packages/git-sync/build/lib/index.d.ts
@@ -1,16 +0,0 @@
-/**
- * Public surface of the pure converter (`lib/`). This barrel re-exports the
- * PURE, IO-free pieces the sync engine needs: the self-contained markdown
- * (de)serializers, the lossless ProseMirror <-> Markdown converter, the
- * markdown -> ProseMirror import path, and semantic canonicalization for the
- * round-trip idempotency check (SPEC §11).
- *
- * There is no REST client, websocket/collab write-path, auth-utils or page-lock
- * here — the gitmost server writes natively.
- */
-export { serializeDocmostMarkdown, parseDocmostMarkdown, serializeDocmostMarkdownBody, } from "./markdown-document.js";
-export type { DocmostMdMeta } from "./markdown-document.js";
-export { convertProseMirrorToMarkdown } from "./markdown-converter.js";
-export { markdownToProseMirror } from "./markdown-to-prosemirror.js";
-export { canonicalizeContent, docsCanonicallyEqual, } from "./canonicalize.js";
-export { parsePageFile, serializePageFile } from "./page-file.js";
--- a/packages/git-sync/build/lib/index.js
+++ b/packages/git-sync/build/lib/index.js
@@ -1,15 +0,0 @@
-/**
- * Public surface of the pure converter (`lib/`). This barrel re-exports the
- * PURE, IO-free pieces the sync engine needs: the self-contained markdown
- * (de)serializers, the lossless ProseMirror <-> Markdown converter, the
- * markdown -> ProseMirror import path, and semantic canonicalization for the
- * round-trip idempotency check (SPEC §11).
- *
- * There is no REST client, websocket/collab write-path, auth-utils or page-lock
- * here — the gitmost server writes natively.
- */
-export { serializeDocmostMarkdown, parseDocmostMarkdown, serializeDocmostMarkdownBody, } from "./markdown-document.js";
-export { convertProseMirrorToMarkdown } from "./markdown-converter.js";
-export { markdownToProseMirror } from "./markdown-to-prosemirror.js";
-export { canonicalizeContent, docsCanonicallyEqual, } from "./canonicalize.js";
-export { parsePageFile, serializePageFile } from "./page-file.js";
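Reviewer note: the converter removed in the next file relies on three small escaping helpers, one per output context (HTML attribute value, HTML text content, markdown URL target). The sketch below copies their bodies from that file so the contexts can be compared side by side. The sample inputs are made up for illustration and are not taken from the test corpus.

```javascript
// HTML double-quoted attribute value: only & and " are escaped.
const escapeAttr = (value) =>
  String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");

// HTML element text content: &, < and > are all significant.
const escapeHtmlText = (value) =>
  String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

// Markdown (...) URL target: whitespace and parentheses are percent-encoded.
const encodeMdUrl = (value) =>
  String(value || "")
    .replace(/\s/g, (c) => (c === " " ? "%20" : encodeURIComponent(c)))
    .replace(/\(/g, "%28")
    .replace(/\)/g, "%29");

// Illustrative inputs (assumptions, not from the original corpus).
const attrOut = escapeAttr('a < b & "c"');
const textOut = escapeHtmlText("<b>R&D</b>");
const urlOut = encodeMdUrl("my file (1).png");

console.log(attrOut); // a < b &amp; &quot;c&quot;
console.log(textOut); // &lt;b&gt;R&amp;D&lt;/b&gt;
console.log(urlOut);  // my%20file%20%281%29.png
```

Note that `<` passes through `escapeAttr` untouched, which matches the design choice explained in the converter's comments, while `escapeHtmlText` escapes it because the value sits between tags.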
--- a/packages/git-sync/build/lib/markdown-converter.js
+++ b/packages/git-sync/build/lib/markdown-converter.js
@@ -1,801 +0,0 @@
-/**
- * Convert ProseMirror/TipTap JSON content to Markdown
- * Supports all Docmost-specific node types and extensions
- */
-export function convertProseMirrorToMarkdown(content) {
-    if (!content || !content.content)
-        return "";
-    // Escape a value interpolated into an HTML double-quoted attribute value
-    // (textAlign, colors, image src, math `text`, all data-* attrs, etc.). In the
-    // ATTRIBUTE context only the quote that delimits the value and the ampersand
-    // that starts an entity are special, so we escape ONLY & and " (not ', per
-    // the NOTE below). We deliberately do NOT escape < or
-    // >: the HTML re-parser (parse5/jsdom via @tiptap/html) does NOT decode
-    // &lt;/&gt; back inside attribute values, so escaping them would corrupt the
-    // stored data (e.g. a math node's LaTeX `a < b`) and ACCUMULATE escapes on
-    // every round-trip (`a < b` -> `a &lt; b` -> `a &amp;lt; b`). Escaping & "
-    // keeps the value inert against attribute-injection while staying idempotent.
-    // NOTE: escape ONLY & and " here. The value is always wrapped in double
-    // quotes, so " is the only delimiter; ' is NOT special in a double-quoted
-    // value, and parse5 does not decode &#39; back inside attribute values, so
-    // escaping ' would (like < >) corrupt the value and accumulate &amp; on every
-    // round-trip. Escaping & and " is idempotent (parse5 decodes them back).
-    const escapeAttr = (value) => String(value)
-        .replace(/&/g, "&amp;")
-        .replace(/"/g, "&quot;");
-    // Escape a value placed as HTML element TEXT content (between tags), where
-    // <, >, and & are all significant. Used for text rendered inside raw-HTML
-    // blocks (table cells / columns) so stored characters cannot inject markup.
-    const escapeHtmlText = (value) => String(value)
-        .replace(/&/g, "&amp;")
-        .replace(/</g, "&lt;")
-        .replace(/>/g, "&gt;");
-    // Percent-encode characters that would break out of a markdown URL target
-    // (...) — whitespace/newlines and parentheses — so a stored src stays a
-    // single inert token (used for image/video/youtube srcs).
-    const encodeMdUrl = (value) => String(value || "")
-        .replace(/\s/g, (c) => (c === " " ? "%20" : encodeURIComponent(c)))
-        .replace(/\(/g, "%28")
-        .replace(/\)/g, "%29");
-    const processNode = (node) => {
-        const type = node.type;
-        const nodeContent = node.content || [];
-        switch (type) {
-            case "doc":
-                return nodeContent.map(processNode).join("\n\n");
-            case "paragraph":
-                const text = nodeContent.map(processNode).join("");
-                const align = node.attrs?.textAlign;
-                if (align && align !== "left") {
-                    return `<div align="${escapeAttr(align)}">${text}</div>`;
-                }
-                return text || "";
-            case "heading":
-                const level = node.attrs?.level || 1;
-                const headingText = nodeContent.map(processNode).join("");
-                return "#".repeat(level) + " " + headingText;
-            case "text":
-                let textContent = node.text || "";
-                // Apply marks (bold, italic, code, etc.)
-                if (node.marks) {
-                    // The schema's `code` mark declares `excludes: "_"` — it excludes every
-                    // other inline mark — so the editor can NEVER produce a text run that
-                    // carries `code` together with another mark, and on import any
-                    // co-occurring mark is always dropped (the run comes back as code-only).
-                    // The lossless, byte-stable behavior is therefore: when a run has the
-                    // `code` mark, emit ONLY the backtick code span and ignore every other
-                    // mark, so md1 is already code-only and md2 === md1. Runs WITHOUT a code
-                    // mark are rendered exactly as before.
-                    const markTypes = node.marks.map((m) => m.type);
-                    const hasCode = markTypes.includes("code");
-                    if (hasCode) {
-                        textContent = `\`${textContent}\``;
-                        return textContent;
-                    }
-                    const codeCombined = false;
-                    for (const mark of node.marks) {
-                        switch (mark.type) {
-                            case "bold":
-                                textContent = codeCombined
-                                    ? `<strong>${textContent}</strong>`
-                                    : `**${textContent}**`;
-                                break;
-                            case "italic":
-                                textContent = codeCombined
-                                    ? `<em>${textContent}</em>`
-                                    : `*${textContent}*`;
-                                break;
-                            case "code":
-                                // Unreachable in practice: a run carrying `code` returns early
-                                // above and `codeCombined` is always false, so only the plain
-                                // backtick span could ever be emitted here.
-                                textContent = codeCombined
-                                    ? `<code>${textContent}</code>`
-                                    : `\`${textContent}\``;
-                                break;
-                            case "link": {
-                                const href = mark.attrs?.href || "";
-                                const title = mark.attrs?.title;
-                                if (codeCombined) {
-                                    // Emit an HTML anchor so it can wrap the nested <code>.
-                                    const safeHref = escapeAttr(href);
-                                    if (title) {
-                                        textContent = `<a href="${safeHref}" title="${escapeAttr(String(title))}">${textContent}</a>`;
-                                    }
-                                    else {
-                                        textContent = `<a href="${safeHref}">${textContent}</a>`;
-                                    }
-                                }
-                                else if (title) {
-                                    // Emit the optional markdown link title; escape an embedded
-                                    // double-quote so it cannot terminate the title string early.
-                                    const safeTitle = String(title).replace(/"/g, '\\"');
-                                    textContent = `[${textContent}](${href} "${safeTitle}")`;
-                                }
-                                else {
-                                    textContent = `[${textContent}](${href})`;
-                                }
-                                break;
-                            }
-                            case "strike":
-                                textContent = codeCombined
-                                    ? `<s>${textContent}</s>`
-                                    : `~~${textContent}~~`;
-                                break;
-                            case "underline":
-                                textContent = `<u>${textContent}</u>`;
-                                break;
-                            case "subscript":
-                                textContent = `<sub>${textContent}</sub>`;
-                                break;
-                            case "superscript":
-                                textContent = `<sup>${textContent}</sup>`;
-                                break;
-                            case "highlight": {
-                                // Preserve a null/empty color as a plain highlight (a bare
-                                // <mark> with no background-color); only emit the style when a
-                                // color is actually set, so a plain highlight is not forced to
-                                // yellow on export.
-                                const color = mark.attrs?.color;
-                                textContent = color
-                                    ? `<mark style="background-color: ${escapeAttr(color)}">${textContent}</mark>`
-                                    : `<mark>${textContent}</mark>`;
-                                break;
-                            }
-                            case "textStyle":
-                                if (mark.attrs?.color) {
-                                    textContent = `<span style="color: ${escapeAttr(mark.attrs.color)}">${textContent}</span>`;
-                                }
-                                break;
-                            case "comment": {
-                                // Emit the inline comment anchor so highlights round-trip. The
-                                // schema's Comment mark parses span[data-comment-id] (attrs
-                                // commentId/resolved).
-                                const cid = mark.attrs?.commentId;
-                                if (cid) {
-                                    const resolvedAttr = mark.attrs?.resolved
-                                        ? ` data-resolved="true"`
-                                        : "";
-                                    textContent = `<span data-comment-id="${escapeAttr(cid)}"${resolvedAttr}>${textContent}</span>`;
-                                }
-                                break;
-                            }
-                        }
-                    }
-                }
-                return textContent;
-            case "codeBlock":
-                const language = node.attrs?.language || "";
-                // Strip ALL trailing newlines so the export is idempotent: marked
-                // re-adds exactly one trailing "\n" on import, so trimming only one
-                // here would let the text grow by "\n" on each round-trip. Removing
-                // every trailing newline makes repeated cycles stable.
-                const code = nodeContent
-                    .map(processNode)
-                    .join("")
-                    .replace(/\n+$/, "");
-                return "```" + language + "\n" + code + "\n```";
-            case "bulletList":
-                return nodeContent
-                    .map((item) => processListItem(item, "-"))
-                    .join("\n");
-            case "orderedList":
-                return nodeContent
-                    .map((item, index) => processListItem(item, `${index + 1}.`))
-                    .join("\n");
-            case "taskList":
-                return nodeContent.map((item) => processTaskItem(item)).join("\n");
-            case "taskItem":
-                // Delegate to the same helper used by taskList so multi-block and
-                // nested task items render and indent consistently.
-                return processTaskItem(node);
-            case "listItem":
-                return nodeContent.map(processNode).join("\n");
-            case "blockquote":
-                // Prefix EVERY line of EVERY child with "> " and separate block-level
-                // children with a blank ">" line so code blocks / multi-paragraph
-                // quotes round-trip correctly.
-                return nodeContent
-                    .map((n) => processNode(n)
-                    .split("\n")
-                    .map((line) => (line.length ? `> ${line}` : ">"))
-                    .join("\n"))
-                    .join("\n>\n");
-            case "horizontalRule":
-                return "---";
-            case "hardBreak":
-                // Two trailing spaces before the newline encode a markdown hard break;
-                // a bare "\n" would be reimported as a soft break and lost.
-                return "  \n";
-            case "image":
-                const imgAlt = node.attrs?.alt || "";
-                // Neutralize characters that could break out of the markdown image
-                // URL: spaces/newlines and parentheses would terminate the (...) target
-                // and let a stored src inject following markdown/HTML. Percent-encode
-                // them so the URL stays a single inert token.
-                const imgSrc = encodeMdUrl(node.attrs?.src);
-                // No "caption" attribute exists in the Docmost image schema, so we do
-                // not emit one (the previous caption branch was dead).
-                return `![${imgAlt}](${imgSrc})`;
-            case "video": {
-                // Emit the schema-matching <video> element so generateJSON rebuilds the
-                // node with its attrs intact. The schema's parseHTML reads src/aria-label
-                // from the standard attributes and the remaining attrs from data-*.
-                const attrs = node.attrs || {};
-                const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
-                if (attrs.alt)
-                    parts.push(`aria-label="${escapeAttr(attrs.alt)}"`);
-                if (attrs.attachmentId)
-                    parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
-                if (attrs.width != null)
-                    parts.push(`width="${escapeAttr(attrs.width)}"`);
-                if (attrs.height != null)
-                    parts.push(`height="${escapeAttr(attrs.height)}"`);
-                if (attrs.size != null)
-                    parts.push(`data-size="${escapeAttr(attrs.size)}"`);
-                if (attrs.align)
-                    parts.push(`data-align="${escapeAttr(attrs.align)}"`);
-                if (attrs.aspectRatio != null)
-                    parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
-                // Wrap in a block <div> so marked treats it as a block (a bare <video>
-                // is inline-level HTML and marked wraps it in <p>, leaving a spurious
-                // empty paragraph beside the hoisted block atom). The wrapper has no
-                // data-type, so the schema parser ignores it and just hoists the video.
-                return `<div><video ${parts.join(" ")}></video></div>`;
-            }
-            case "youtube": {
-                // Emit the schema-matching div[data-type="youtube"]; the schema reads
-                // src from data-src and width/height/align from data-* attributes.
-                const attrs = node.attrs || {};
-                const parts = [
-                    `data-type="youtube"`,
-                    `data-src="${escapeAttr(attrs.src ?? "")}"`,
-                ];
-                if (attrs.width != null)
-                    parts.push(`data-width="${escapeAttr(attrs.width)}"`);
-                if (attrs.height != null)
-                    parts.push(`data-height="${escapeAttr(attrs.height)}"`);
-                if (attrs.align)
-                    parts.push(`data-align="${escapeAttr(attrs.align)}"`);
-                return `<div ${parts.join(" ")}></div>`;
-            }
-            case "table": {
-                // A GFM pipe table cannot represent merged cells. If ANY cell carries
-                // colspan>1 or rowspan>1, a pipe table would corrupt the grid on
-                // re-import, so emit the WHOLE table as raw HTML <table> instead: the
-                // schema's table family parseHTML (tag table/tr/td/th, with colspan/
-                // rowspan read from the same-named HTML attrs and align via parseHTML)
-                // round-trips it faithfully. Otherwise keep the lighter GFM pipe table.
-                const tableRows = nodeContent;
-                if (tableRows.length === 0)
-                    return "";
-                const hasSpan = tableRows.some((row) => (row.content || []).some((cell) => (cell.attrs?.colspan ?? 1) > 1 || (cell.attrs?.rowspan ?? 1) > 1));
-                if (hasSpan) {
-                    // Render each cell's block children to HTML (marked does NOT parse
-                    // markdown inside a raw HTML block, so emitting markdown here would
-                    // leak literal ** / `` into the cell). blockToHtml mirrors the schema
-                    // HTML so inner formatting re-parses into the right marks/nodes.
-                    const renderHtmlCell = (cell) => {
-                        const tag = cell.type === "tableHeader" ? "th" : "td";
-                        const a = cell.attrs || {};
-                        const cellParts = [];
-                        if ((a.colspan ?? 1) > 1)
-                            cellParts.push(`colspan="${escapeAttr(a.colspan)}"`);
-                        if ((a.rowspan ?? 1) > 1)
-                            cellParts.push(`rowspan="${escapeAttr(a.rowspan)}"`);
-                        if (a.align)
-                            cellParts.push(`align="${escapeAttr(a.align)}"`);
-                        const open = cellParts.length
-                            ? `<${tag} ${cellParts.join(" ")}>`
-                            : `<${tag}>`;
-                        const inner = (cell.content || [])
-                            .map((block) => blockToHtml(block))
-                            .join("");
-                        return `${open}${inner}</${tag}>`;
-                    };
-                    const htmlRows = tableRows
-                        .map((row) => `<tr>${(row.content || []).map(renderHtmlCell).join("")}</tr>`)
-                        .join("");
-                    return `<table><tbody>${htmlRows}</tbody></table>`;
-                }
-                // No merged cells: emit a GFM table (header row + separator) so the
-                // markdown can be parsed back into a table on re-import.
-                const rows = tableRows.map(processNode);
-                const headerCells = tableRows[0]?.content || [];
-                const columns = headerCells.length || 1;
-                // Derive alignment markers (:--, :-:, --:) from each header cell.
-                const markers = Array.from({ length: columns }, (_, i) => {
-                    const align = headerCells[i]?.attrs?.align;
-                    switch (align) {
-                        case "left":
-                            return ":--";
-                        case "center":
-                            return ":-:";
-                        case "right":
-                            return "--:";
-                        default:
-                            return "---";
-                    }
-                });
-                const separator = "| " + markers.join(" | ") + " |";
-                return [rows[0], separator, ...rows.slice(1)].join("\n");
-            }
-            case "tableRow":
-                return "| " + nodeContent.map(processNode).join(" | ") + " |";
-            case "tableCell":
-            case "tableHeader": {
-                // Join multiple block children with a space (not "") so adjacent blocks
-                // like a paragraph followed by a list don't collide into "line1- a".
-                // Then collapse newlines and escape pipes so a cell containing "|" or a
-                // line break cannot corrupt the surrounding GFM row.
-                return nodeContent
-                    .map(processNode)
-                    .join(" ")
-                    .replace(/\r?\n/g, " ")
-                    .replace(/\|/g, "\\|");
-            }
-            case "callout":
-                const calloutType = node.attrs?.type || "info";
-                const calloutContent = nodeContent.map(processNode).join("\n");
-                return `:::${calloutType.toLowerCase()}\n${calloutContent}\n:::`;
-            case "details":
-                return nodeContent.map(processNode).join("\n");
-            case "detailsSummary":
-                const summaryText = nodeContent.map(processNode).join("");
-                return `<details>\n<summary>${summaryText}</summary>\n`;
-            case "detailsContent":
-                const detailsText = nodeContent.map(processNode).join("\n");
-                return `${detailsText}\n</details>`;
-            case "mathInline": {
-                // The schema's `text` attribute has no parseHTML, so TipTap's default
-                // parser reads it from the `text` HTML attribute (NOT the element's text
-                // content). Emit span[data-type="mathInline"] carrying the LaTeX in a
-                // `text="..."` attribute so it round-trips. marked cannot parse $...$
-                // back, so the previous form was lossy.
-                const inlineMath = node.attrs?.text || "";
-                return `<span data-type="mathInline" data-katex="true" text="${escapeAttr(inlineMath)}"></span>`;
-            }
-            case "mathBlock": {
-                // Same as mathInline: the LaTeX must ride in the `text` HTML attribute
-                // for the schema's default parser to recover it.
-                const blockMath = node.attrs?.text || "";
-                return `<div data-type="mathBlock" data-katex="true" text="${escapeAttr(blockMath)}"></div>`;
-            }
-            case "mention": {
-                // Emit span[data-type="mention"] with the schema's data-* attributes so
-                // generateJSON rebuilds the mention node instead of leaving "@label"
-                // plain text that cannot re-parse.
-                const attrs = node.attrs || {};
-                const parts = [`data-type="mention"`];
-                if (attrs.id)
-                    parts.push(`data-id="${escapeAttr(attrs.id)}"`);
-                if (attrs.label)
-                    parts.push(`data-label="${escapeAttr(attrs.label)}"`);
-                if (attrs.entityType)
-                    parts.push(`data-entity-type="${escapeAttr(attrs.entityType)}"`);
-                if (attrs.entityId)
-                    parts.push(`data-entity-id="${escapeAttr(attrs.entityId)}"`);
-                if (attrs.slugId)
-                    parts.push(`data-slug-id="${escapeAttr(attrs.slugId)}"`);
-                if (attrs.creatorId)
-                    parts.push(`data-creator-id="${escapeAttr(attrs.creatorId)}"`);
-                if (attrs.anchorId)
-                    parts.push(`data-anchor-id="${escapeAttr(attrs.anchorId)}"`);
-                // Keep the label as visible text content too; the schema reads attrs
-                // from data-*, so the inner text is purely cosmetic and harmless.
-                const mentionLabel = attrs.label || attrs.id || "";
-                // The label is visible element TEXT content here (the data-* attrs above
-                // carry the real values), so escape it for the text context, not attrs.
-                return `<span ${parts.join(" ")}>@${escapeHtmlText(mentionLabel)}</span>`;
-            }
-            case "attachment": {
-                // BUG FIX: the old code read node.attrs.fileName / node.attrs.src, but
-                // the schema stores name/url (plus mime/size/attachmentId). Emit the
-                // schema-matching div[data-type="attachment"] with data-attachment-*
-                // attrs so the node round-trips instead of degrading to a markdown link.
-                const attrs = node.attrs || {};
-                const parts = [
-                    `data-type="attachment"`,
-                    `data-attachment-url="${escapeAttr(attrs.url ?? "")}"`,
-                ];
-                if (attrs.name)
-                    parts.push(`data-attachment-name="${escapeAttr(attrs.name)}"`);
-                if (attrs.mime)
-                    parts.push(`data-attachment-mime="${escapeAttr(attrs.mime)}"`);
-                if (attrs.size != null)
-                    parts.push(`data-attachment-size="${escapeAttr(attrs.size)}"`);
-                if (attrs.attachmentId)
-                    parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
-                return `<div ${parts.join(" ")}></div>`;
-            }
-            case "drawio":
-            case "excalidraw": {
-                // Emit the schema-matching div[data-type=...] carrying the diagram's
-                // attrs as data-* (the schema's diagramAttributes reads src/title/alt/
-                // width/height/size/aspectRatio/align/attachmentId from data-*), so the
-                // diagram round-trips instead of degrading to a lossy placeholder.
-                const attrs = node.attrs || {};
-                const parts = [
-                    `data-type="${type}"`,
-                    `data-src="${escapeAttr(attrs.src ?? "")}"`,
-                ];
-                if (attrs.title != null)
-                    parts.push(`data-title="${escapeAttr(attrs.title)}"`);
-                if (attrs.alt != null)
-                    parts.push(`data-alt="${escapeAttr(attrs.alt)}"`);
-                if (attrs.width != null)
-                    parts.push(`data-width="${escapeAttr(attrs.width)}"`);
-                if (attrs.height != null)
-                    parts.push(`data-height="${escapeAttr(attrs.height)}"`);
-                if (attrs.size != null)
-                    parts.push(`data-size="${escapeAttr(attrs.size)}"`);
-                if (attrs.aspectRatio != null)
-                    parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
-                if (attrs.align)
-                    parts.push(`data-align="${escapeAttr(attrs.align)}"`);
-                if (attrs.attachmentId)
-                    parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
-                return `<div ${parts.join(" ")}></div>`;
-            }
-            case "embed": {
-                // Emit the schema-matching div[data-type="embed"]; the schema reads
-                // src/provider/align/width/height from data-* attributes so the node
-                // (and its provider iframe info) survives the round-trip.
-                const attrs = node.attrs || {};
-                const parts = [
-                    `data-type="embed"`,
-                    `data-src="${escapeAttr(attrs.src ?? "")}"`,
-                    `data-provider="${escapeAttr(attrs.provider ?? "")}"`,
-                ];
-                if (attrs.align)
-                    parts.push(`data-align="${escapeAttr(attrs.align)}"`);
-                if (attrs.width != null)
-                    parts.push(`data-width="${escapeAttr(attrs.width)}"`);
-                if (attrs.height != null)
-                    parts.push(`data-height="${escapeAttr(attrs.height)}"`);
-                return `<div ${parts.join(" ")}></div>`;
-            }
-            case "audio": {
-                // Emit the schema-matching <audio> element (was emitting nothing). The
-                // schema reads src from src and attachmentId/size from data-*.
-                const attrs = node.attrs || {};
-                const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
-                if (attrs.attachmentId)
-                    parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
-                if (attrs.size != null)
-                    parts.push(`data-size="${escapeAttr(attrs.size)}"`);
-                // Wrap in a block <div> for the same reason as video: a bare <audio> is
-                // inline-level HTML that marked would wrap in <p>.
-                return `<div><audio ${parts.join(" ")}></audio></div>`;
-            }
-            case "pdf": {
-                // Emit the schema-matching div[data-type="pdf"] (was emitting nothing).
-                // The schema reads src/width/height from standard attrs and name/
-                // attachmentId/size from data-*.
-                const attrs = node.attrs || {};
-                const parts = [
-                    `data-type="pdf"`,
-                    `src="${escapeAttr(attrs.src ?? "")}"`,
-                ];
-                if (attrs.name)
-                    parts.push(`data-name="${escapeAttr(attrs.name)}"`);
-                if (attrs.attachmentId)
-                    parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
-                if (attrs.size != null)
-                    parts.push(`data-size="${escapeAttr(attrs.size)}"`);
-                if (attrs.width != null)
-                    parts.push(`width="${escapeAttr(attrs.width)}"`);
-                if (attrs.height != null)
-                    parts.push(`height="${escapeAttr(attrs.height)}"`);
-                return `<div ${parts.join(" ")}></div>`;
-            }
-            case "columns": {
-                // Emit the schema-matching div[data-type="columns"] wrapper so the
-                // multi-column layout survives. Without this case the children were
-                // concatenated with no separator and the text merged. The schema reads
-                // layout from data-layout and widthMode from data-width-mode. The whole
-                // block is raw HTML, so render children via blockToHtml (NOT markdown,
-                // which marked would not re-parse inside a raw HTML block).
-                const attrs = node.attrs || {};
-                const parts = [`data-type="columns"`];
-                if (attrs.layout)
-                    parts.push(`data-layout="${escapeAttr(attrs.layout)}"`);
-                if (attrs.widthMode && attrs.widthMode !== "normal")
-                    parts.push(`data-width-mode="${escapeAttr(attrs.widthMode)}"`);
-                const inner = nodeContent.map((n) => blockToHtml(n)).join("");
-                return `<div ${parts.join(" ")}>${inner}</div>`;
-            }
-            case "column": {
-                // Emit the schema-matching div[data-type="column"]; the schema reads the
-                // column width from data-width. Children are rendered as HTML so their
-                // formatting survives inside this raw HTML block.
-                const attrs = node.attrs || {};
-                const parts = [`data-type="column"`];
-                if (attrs.width)
-                    parts.push(`data-width="${escapeAttr(attrs.width)}"`);
-                const inner = nodeContent.map((n) => blockToHtml(n)).join("");
-                return `<div ${parts.join(" ")}>${inner}</div>`;
-            }
-            case "pageBreak":
-                // Emit the schema-matching div[data-type="pageBreak"] so marked passes
-                // it through as a block and generateJSON rebuilds the pageBreak atom.
-                // Without this case the node fell through to `default` and rendered ""
-                // (the divider silently disappeared and could not round-trip).
-                return `<div data-type="pageBreak"></div>`;
-            case "subpages":
-                return "{{SUBPAGES}}";
-            default:
-                // Fallback: process children
-                return nodeContent.map(processNode).join("");
-        }
-    };
-    // Render inline content (text runs + their marks) to HTML. Used by the raw
-    // HTML fallbacks (spanned tables, columns) where marked will NOT re-parse
-    // markdown, so backtick/asterisk/bracket syntax would otherwise leak as
-    // literal characters. Each mark is mirrored to the HTML the schema's parseHTML
-    // accepts so it re-imports as the matching ProseMirror mark.
-    const inlineToHtml = (inlineNodes) => (inlineNodes || [])
-        .map((n) => {
-        if (n.type === "hardBreak")
-            return "<br>";
-        if (n.type !== "text") {
-            // Inline atoms (mention, mathInline) already emit schema HTML.
-            return processNode(n);
-        }
-        let t = escapeHtmlText(n.text || "");
-        for (const mark of n.marks || []) {
-            switch (mark.type) {
-                case "bold":
-                    t = `<strong>${t}</strong>`;
-                    break;
-                case "italic":
-                    t = `<em>${t}</em>`;
-                    break;
-                case "code":
-                    t = `<code>${t}</code>`;
-                    break;
-                case "strike":
-                    t = `<s>${t}</s>`;
-                    break;
-                case "underline":
-                    t = `<u>${t}</u>`;
-                    break;
-                case "subscript":
-                    t = `<sub>${t}</sub>`;
-                    break;
-                case "superscript":
-                    t = `<sup>${t}</sup>`;
-                    break;
-                case "link":
-                    t = `<a href="${escapeAttr(mark.attrs?.href || "")}">${t}</a>`;
-                    break;
-                case "highlight":
-                    t = mark.attrs?.color
-                        ? `<mark style="background-color: ${escapeAttr(mark.attrs.color)}">${t}</mark>`
-                        : `<mark>${t}</mark>`;
-                    break;
-                case "textStyle":
-                    if (mark.attrs?.color)
-                        t = `<span style="color: ${escapeAttr(mark.attrs.color)}">${t}</span>`;
-                    break;
-                case "comment":
-                    // Inline comment anchor inside a raw-HTML container (columns /
-                    // spanned table cells), so commented text there also round-trips.
-                    if (mark.attrs?.commentId) {
-                        const r = mark.attrs?.resolved ? ` data-resolved="true"` : "";
-                        t = `<span data-comment-id="${escapeAttr(mark.attrs.commentId)}"${r}>${t}</span>`;
-                    }
-                    break;
-            }
-        }
-        return t;
-    })
-        .join("");
-    // Emit the schema-matching <img> for an image node. Shared so the image is
-    // emitted as real HTML wherever a raw-HTML container needs it (inside a column
-    // or a spanned table cell), where markdown `![](...)` would NOT be re-parsed
-    // and would survive as literal text. The Image extension reads src/alt from
-    // the standard attributes; the Docmost extra attrs (width/height/align/size/
-    // attachmentId/aspectRatio) are global attributes read from same-named DOM
-    // attributes, so emit them by name.
-    const imageToHtml = (node) => {
-        const attrs = node.attrs || {};
-        const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
-        if (attrs.alt)
-            parts.push(`alt="${escapeAttr(attrs.alt)}"`);
-        if (attrs.title)
-            parts.push(`title="${escapeAttr(attrs.title)}"`);
-        if (attrs.width != null)
-            parts.push(`width="${escapeAttr(attrs.width)}"`);
-        if (attrs.height != null)
-            parts.push(`height="${escapeAttr(attrs.height)}"`);
-        if (attrs.align)
-            parts.push(`align="${escapeAttr(attrs.align)}"`);
-        if (attrs.size != null)
-            parts.push(`data-size="${escapeAttr(attrs.size)}"`);
-        if (attrs.attachmentId)
-            parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
-        if (attrs.aspectRatio != null)
-            parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
-        return `<img ${parts.join(" ")}>`;
-    };
-    // Emit the schema-matching div[data-type="callout"] for a callout node. The
-    // schema reads the banner type from data-callout-type. Children are rendered
-    // as HTML so they survive inside a raw-HTML container.
-    const calloutToHtml = (node) => {
-        const type = (node.attrs?.type || "info").toLowerCase();
-        const inner = (node.content || []).map(blockToHtml).join("");
-        return `<div data-type="callout" data-callout-type="${escapeAttr(type)}">${inner}</div>`;
-    };
-    // Emit a schema-matching <details> tree. The schema parses <details>,
-    // summary[data-type="detailsSummary"], and div[data-type="detailsContent"].
-    const detailsToHtml = (node) => {
-        const inner = (node.content || []).map(blockToHtml).join("");
-        return `<details>${inner}</details>`;
-    };
-    const detailsSummaryToHtml = (node) => `<summary data-type="detailsSummary">${inlineToHtml(node.content || [])}</summary>`;
-    const detailsContentToHtml = (node) => {
-        const inner = (node.content || []).map(blockToHtml).join("");
-        return `<div data-type="detailsContent">${inner}</div>`;
-    };
-    // Emit the schema-matching taskList/taskItem HTML. bridgeTaskLists (in
-    // collaboration.ts) recognizes ul[data-type="taskList"] with
-    // li[data-type="taskItem"][data-checked]; emitting that directly here keeps
-    // task lists inside columns/cells from degrading to literal "- [ ]" text.
-    const taskListToHtml = (node) => {
-        const items = (node.content || [])
-            .map((it) => {
-            const checked = it.attrs?.checked ? "true" : "false";
-            return `<li data-type="taskItem" data-checked="${checked}">${blockChildrenToHtml(it)}</li>`;
-        })
-            .join("");
-        return `<ul data-type="taskList">${items}</ul>`;
-    };
-    // Render a block node to HTML for the raw-HTML containers (spanned tables,
-    // columns). marked does NOT re-parse markdown inside a raw-HTML block, so
-    // EVERY block type that can appear inside a column or a spanned cell must be
-    // emitted as schema-matching HTML here — never as markdown, or it would land
-    // as literal text on re-import. Nodes whose processNode case already produces
-    // schema-matching HTML (math/media/embed/attachment/nested columns/spanned
-    // table) are delegated to processNode; the markdown-emitting cases
-    // (image/blockquote/callout/details/hr/taskList) get explicit HTML here.
-    const blockToHtml = (block) => {
-        const children = block.content || [];
-        switch (block.type) {
-            case "paragraph":
-                return `<p>${inlineToHtml(children)}</p>`;
-            case "heading": {
-                const level = block.attrs?.level || 1;
-                return `<h${level}>${inlineToHtml(children)}</h${level}>`;
-            }
-            case "bulletList":
-                return `<ul>${children
-                    .map((li) => `<li>${blockChildrenToHtml(li)}</li>`)
-                    .join("")}</ul>`;
-            case "orderedList":
-                return `<ol>${children
-                    .map((li) => `<li>${blockChildrenToHtml(li)}</li>`)
-                    .join("")}</ol>`;
-            case "codeBlock": {
-                const lang = block.attrs?.language || "";
-                // The code itself is element TEXT content (between <code> tags), so it
-                // must escape < > & — NOT the attribute escaper. The language rides in
-                // a class ATTRIBUTE, so it uses escapeAttr.
-                const code = escapeHtmlText(children
-                    .map(processNode)
-                    .join("")
-                    .replace(/\n+$/, ""));
-                const cls = lang ? ` class="language-${escapeAttr(lang)}"` : "";
-                return `<pre><code${cls}>${code}</code></pre>`;
-            }
-            case "image":
-                return imageToHtml(block);
-            case "blockquote":
-                return `<blockquote>${children.map(blockToHtml).join("")}</blockquote>`;
-            case "horizontalRule":
-                return "<hr>";
-            case "callout":
-                return calloutToHtml(block);
-            case "details":
-                return detailsToHtml(block);
-            case "detailsSummary":
-                return detailsSummaryToHtml(block);
-            case "detailsContent":
-                return detailsContentToHtml(block);
-            case "taskList":
-                return taskListToHtml(block);
-            case "taskItem":
-                // A bare taskItem (outside a taskList) still needs a wrapping list so
-                // the schema parses it; wrap it in a single-item taskList.
-                return taskListToHtml({ content: [block] });
-            // table (incl. spanned), columns/column, math, media, embed, attachment,
-            // mention, etc. already emit schema-matching HTML from processNode.
-            case "table":
-            case "columns":
-            case "column":
-            case "mathBlock":
-            case "video":
-            case "audio":
-            case "pdf":
-            case "youtube":
-            case "embed":
-            case "attachment":
-            case "drawio":
-            case "excalidraw":
-                return processNode(block);
-            default:
-                // Any still-unhandled block type: NEVER fall back to markdown inside a
-                // raw-HTML block (it would become literal text). Wrap its rendered
-                // children in a <div> so their content is preserved; if it has no block
-                // children, render its inline content instead.
-                if (children.length && children.some((c) => c.type !== "text")) {
-                    return `<div>${children.map(blockToHtml).join("")}</div>`;
-                }
-                return `<div>${inlineToHtml(children)}</div>`;
-        }
-    };
-    // Render the block children of a list item to HTML (a listItem holds block+
-    // content). Mirrors processListItem but for the HTML fallback path.
-    const blockChildrenToHtml = (item) => (item.content || []).map((b) => blockToHtml(b)).join("");
-    // Indent the rendered children of a list item under a marker prefix.
-    // Each child block is a (possibly multi-line) string. The very first physical
-    // line of the first child carries the marker (e.g. "- " or "1. "); EVERY
-    // other line — the remaining lines of the first child AND all lines of every
-    // subsequent child (nested lists, code blocks, extra paragraphs) — is indented
-    // to align under the marker. Without indenting these continuation lines, the
-    // 2nd/3rd line of a nested child collapses to column 0 and escapes the list.
-    //
-    // The continuation indent MUST equal the LIST marker width, which is not the
-    // same as the visible prefix width:
-    //   - bullet "- "          -> 2 columns
-    //   - task   "- [ ] "      -> marker is still "- " (the "[ ] " is content), 2
-    //   - ordered "1. "/"10. " -> 3/4 columns, scaling with the number's digits
-    // CommonMark anchors nested content to the marker column, so an ordered item
-    // indented to only 2 columns would be re-parsed as sibling or loose content on
-    // re-import. Callers therefore pass the exact indent width to use.
-    const indentItemChildren = (childStrings, prefix, indentWidth) => {
-        const indent = " ".repeat(indentWidth);
-        const lines = [];
-        childStrings.forEach((child, childIndex) => {
-            child.split("\n").forEach((line, lineIndex) => {
-                if (childIndex === 0 && lineIndex === 0) {
-                    // First physical line of the first block gets the marker.
-                    lines.push(`${prefix} ${line}`);
-                }
-                else {
-                    // Indent every continuation line by the marker width; keep blank
-                    // lines blank rather than emitting trailing whitespace.
-                    lines.push(line.length ? `${indent}${line}` : "");
-                }
-            });
-        });
-        return lines.join("\n");
-    };
-    const processListItem = (item, prefix) => {
-        const itemContent = item.content || [];
-        const childStrings = itemContent.map(processNode);
-        if (childStrings.length === 0)
-            return prefix;
-        // The rendered marker is `${prefix} ` (prefix + one space), so its width —
-        // and thus the continuation indent — is prefix.length + 1. This is correct
-        // for both bullet ("-" -> 2) and ordered ("1." -> 3, "10." -> 4) markers,
-        // since for those the visible prefix IS the list marker.
-        return indentItemChildren(childStrings, prefix, prefix.length + 1);
-    };
-    const processTaskItem = (item) => {
-        const checked = item.attrs?.checked || false;
-        const checkbox = checked ? "[x]" : "[ ]";
-        const prefix = `- ${checkbox}`;
-        const itemContent = item.content || [];
-        const childStrings = itemContent.map(processNode);
-        // An empty task item still needs its checkbox marker; without this guard
-        // the indent below produces "" and the "- [ ]"/"- [x]" row disappears.
-        if (childStrings.length === 0)
-            return prefix;
-        // The list marker for a task item is just "- " (2 columns); the "[ ] "/"[x] "
-        // checkbox is item content, NOT part of the marker. So the continuation
-        // indent is a fixed 2 — do NOT derive it from the wider prefix.length.
-        return indentItemChildren(childStrings, prefix, 2);
-    };
-    return processNode(content).trim();
-}
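The continuation-indent rule documented above `indentItemChildren` can be illustrated with a small standalone sketch. It restates the helper from the removed file outside its enclosing closure, so the standalone function and the sample inputs are assumptions for illustration only, not an export of the package.

```javascript
// Standalone restatement of the helper above (illustrative, not a package export).
// The first physical line of the first child carries the marker; every other
// non-empty line is indented by the LIST MARKER width supplied by the caller.
function indentItemChildren(childStrings, prefix, indentWidth) {
  const indent = " ".repeat(indentWidth);
  const lines = [];
  childStrings.forEach((child, childIndex) => {
    child.split("\n").forEach((line, lineIndex) => {
      if (childIndex === 0 && lineIndex === 0) {
        lines.push(`${prefix} ${line}`);
      } else {
        // Blank lines stay blank so no trailing whitespace is emitted.
        lines.push(line.length ? `${indent}${line}` : "");
      }
    });
  });
  return lines.join("\n");
}

// Ordered item "10.": the marker is "10. " (4 columns), so nested content
// is indented by 4.
const ordered = indentItemChildren(["first", "- nested a\n- nested b"], "10.", 4);

// Task item: the visible prefix is "- [ ]", but the list marker is only "- ",
// so the continuation indent is a fixed 2.
const task = indentItemChildren(["todo", "more text"], "- [ ]", 2);

console.log(ordered);
console.log(task);
```

Passing the indent width explicitly, rather than deriving it from `prefix.length`, is what lets the task-item caller keep the 2-column indent despite its wider visible prefix.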
--- a/packages/git-sync/build/lib/markdown-document.d.ts
+++ b/packages/git-sync/build/lib/markdown-document.d.ts
@@ -1,68 +0,0 @@
-/**
- * Self-contained Docmost-flavoured Markdown document (custom extensions).
- *
- * A single `.md` file that packages everything needed to losslessly round-trip
- * a page through "download -> edit body -> re-upload":
- *   - a leading `docmost:meta` block: a one-line JSON object with page identity;
- *   - the Markdown body (carrying inline comment anchors and diagrams as HTML);
- *   - a trailing `docmost:comments` block: a one-line JSON array of comment
- *     threads.
- *
- * Both metadata blocks are HTML comments on purpose: `marked`/`generateJSON`
- * drop HTML comments, so even if the WHOLE file were ever fed straight to the
- * importer without first stripping the blocks, the metadata cannot leak into the
- * document. (A fenced ```docmost-comments``` block would WRONGLY become a
- * codeBlock node, so a fenced block is deliberately NOT used.)
- *
- * The delimiter literals may legitimately appear in the BODY too (e.g. a user
- * re-pastes an exported `.md` into a page, or a page documents this very
- * format). To stay robust, parsing treats only the FINAL, document-ending
- * `docmost:comments` block as metadata: it is the last `<!-- docmost:comments`
- * opener whose closing `-->` sits at the very end of the file. Any earlier
- * literal occurrence is left in the body untouched.
- *
- * NOTE on comments: in this version the comment THREAD records are preserved in
- * the file but are NOT pushed back to the server on import — only the inline
- * comment marks (anchors) embedded in the body are restored. Managing comment
- * records stays with the comment tools/UI.
- */
-export interface DocmostMdMeta {
-    version: number;
-    pageId?: string;
-    slugId?: string;
-    title?: string;
-    spaceId?: string;
-    parentPageId?: string | null;
-}
-/**
- * Assemble the full self-contained markdown file: meta block, body, and the
- * comments block. The meta block is always emitted; the comments block is always
- * emitted too (with `[]` when there are no comments) so the format stays uniform
- * and parsing stays simple.
- */
-export declare function serializeDocmostMarkdown(meta: DocmostMdMeta, body: string, comments: any[]): string;
-/**
- * Split a self-contained file back into its parts. Tolerant: if the meta or
- * comments block is missing (e.g. a hand-written plain-markdown file), the
- * corresponding value is returned as `null` and the whole input is treated as
- * the body. This never throws on a MISSING block; only a `JSON.parse` failure
- * inside a block that IS present is surfaced as a thrown Error with a clear
- * message. Robust to `\r\n` line endings.
- */
-export declare function parseDocmostMarkdown(full: string): {
-    meta: DocmostMdMeta | null;
-    body: string;
-    comments: any[] | null;
-};
-/**
- * Serialize a self-contained markdown file with the meta block + body ONLY —
- * NO trailing `docmost:comments` block. The sync engine never touches
- * `/comments` (SPEC §3): the synced file carries just page identity (meta) and
- * the body, where comment threads survive only as inline `<span
- * data-comment-id>` anchor marks inside the body.
- *
- * `parseDocmostMarkdown` already tolerates a missing comments block (it returns
- * `comments: null` and treats the rest as body), so a file produced here
- * round-trips cleanly through the parser.
- */
-export declare function serializeDocmostMarkdownBody(meta: DocmostMdMeta, body: string): string;
--- a/packages/git-sync/build/lib/markdown-document.js
+++ b/packages/git-sync/build/lib/markdown-document.js
@@ -1,118 +0,0 @@
-/**
- * Self-contained Docmost-flavoured Markdown document (custom extensions).
- *
- * A single `.md` file that packages everything needed to losslessly round-trip
- * a page through "download -> edit body -> re-upload":
- *   - a leading `docmost:meta` block: a one-line JSON object with page identity;
- *   - the Markdown body (carrying inline comment anchors and diagrams as HTML);
- *   - a trailing `docmost:comments` block: a one-line JSON array of comment
- *     threads.
- *
- * Both metadata blocks are HTML comments on purpose: `marked`/`generateJSON`
- * drop HTML comments, so even if the WHOLE file were ever fed straight to the
- * importer without first stripping the blocks, the metadata cannot leak into the
- * document. (A fenced ```docmost-comments``` block would WRONGLY become a
- * codeBlock node, so a fenced block is deliberately NOT used.)
- *
- * The delimiter literals may legitimately appear in the BODY too (e.g. a user
- * re-pastes an exported `.md` into a page, or a page documents this very
- * format). To stay robust, parsing treats only the FINAL, document-ending
- * `docmost:comments` block as metadata: it is the last `<!-- docmost:comments`
- * opener whose closing `-->` sits at the very end of the file. Any earlier
- * literal occurrence is left in the body untouched.
- *
- * NOTE on comments: in this version the comment THREAD records are preserved in
- * the file but are NOT pushed back to the server on import — only the inline
- * comment marks (anchors) embedded in the body are restored. Managing comment
- * records stays with the comment tools/UI.
- */
-// Match the leading meta block (allow leading whitespace). Capture group 1 is
-// the JSON text between the markers.
-const META_RE = /^\s*<!--\s*docmost:meta\s*\n([\s\S]*?)\n-->/;
-// Match a `docmost:comments` opener. Used globally to scan for the LAST opener
-// rather than end-anchoring a single regex (which would mis-capture across a
-// literal opener that appears earlier in the body).
-const COMMENTS_OPEN_RE = /<!--[ \t]*docmost:comments[ \t]*\r?\n/g;
-/**
- * Assemble the full self-contained markdown file: meta block, body, and the
- * comments block. The meta block is always emitted; the comments block is always
- * emitted too (with `[]` when there are no comments) so the format stays uniform
- * and parsing stays simple.
- */
-export function serializeDocmostMarkdown(meta, body, comments) {
-    const metaJson = JSON.stringify(meta);
-    const commentsJson = JSON.stringify(Array.isArray(comments) ? comments : []);
-    const trimmedBody = (body ?? "").trim();
-    return (`<!-- docmost:meta\n${metaJson}\n-->\n\n` +
-        `${trimmedBody}\n\n` +
-        `<!-- docmost:comments\n${commentsJson}\n-->\n`);
-}
-/**
- * Split a self-contained file back into its parts. Tolerant: if the meta or
- * comments block is missing (e.g. a hand-written plain-markdown file), the
- * corresponding value is returned as `null` and the whole input is treated as
- * the body. This never throws on a MISSING block; only a `JSON.parse` failure
- * inside a block that IS present is surfaced as a thrown Error with a clear
- * message. Robust to `\r\n` line endings.
- */
-export function parseDocmostMarkdown(full) {
-    // Normalize line endings so the anchored regexes work regardless of CRLF.
-    const normalized = (full ?? "").replace(/\r\n/g, "\n");
-    // Extract the leading meta block (start-anchored — already unambiguous).
-    let meta = null;
-    let metaEnd = 0;
-    const metaMatch = normalized.match(META_RE);
-    if (metaMatch) {
-        try {
-            meta = JSON.parse(metaMatch[1]);
-        }
-        catch (e) {
-            throw new Error(`Invalid docmost:meta JSON block: ${e instanceof Error ? e.message : String(e)}`);
-        }
-        // Body starts right after the matched meta block.
-        metaEnd = (metaMatch.index ?? 0) + metaMatch[0].length;
-    }
-    // Find the LAST `<!-- docmost:comments` opener; the real file-level block is
-    // the final one whose closing `-->` ends the document. Any earlier literal
-    // occurrence inside the body (e.g. a re-pasted export) is left in the body.
-    let lastOpenStart = -1;
-    let lastOpenEnd = -1;
-    let m;
-    COMMENTS_OPEN_RE.lastIndex = 0;
-    while ((m = COMMENTS_OPEN_RE.exec(normalized)) !== null) {
-        lastOpenStart = m.index;
-        lastOpenEnd = m.index + m[0].length;
-    }
-    let comments = null;
-    let bodyEnd = normalized.length;
-    if (lastOpenStart !== -1) {
-        const rest = normalized.slice(lastOpenEnd);
-        const close = rest.match(/\r?\n-->[ \t]*\r?\n?\s*$/); // closer must end the doc
-        if (close) {
-            const jsonText = rest.slice(0, close.index);
-            try {
-                comments = JSON.parse(jsonText);
-            }
-            catch (e) {
-                throw new Error(`Invalid docmost:comments JSON block: ${e instanceof Error ? e.message : String(e)}`);
-            }
-            bodyEnd = lastOpenStart; // strip from the opener to end of document
-        }
-    }
-    const body = normalized.slice(metaEnd, bodyEnd).trim();
-    return { meta, body, comments };
-}
-/**
- * Serialize a self-contained markdown file with the meta block + body ONLY —
- * NO trailing `docmost:comments` block. The sync engine never touches
- * `/comments` (SPEC §3): the synced file carries just page identity (meta) and
- * the body, where comment threads survive only as inline `<span
- * data-comment-id>` anchor marks inside the body.
- *
- * `parseDocmostMarkdown` already tolerates a missing comments block (it returns
- * `comments: null` and treats the rest as body), so a file produced here
- * round-trips cleanly through the parser.
- */
-export function serializeDocmostMarkdownBody(meta, body) {
-    return `<!-- docmost:meta\n${JSON.stringify(meta)}\n-->\n\n${(body ?? "").trim()}\n`;
-}
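The "last opener whose closer ends the document" rule used by `parseDocmostMarkdown` can be sketched in isolation. This is a minimal sketch that assumes line endings are already normalized to `\n` and ignores the meta block; `splitTrailingComments` is a hypothetical name, not a function from the package.

```javascript
// Hypothetical helper: treat only the FINAL docmost:comments block as metadata.
function splitTrailingComments(text) {
  const openRe = /<!--[ \t]*docmost:comments[ \t]*\n/g;
  let start = -1;
  let end = -1;
  let m;
  // Scan every opener and remember only the last one.
  while ((m = openRe.exec(text)) !== null) {
    start = m.index;
    end = m.index + m[0].length;
  }
  if (start === -1) return { body: text.trim(), comments: null };
  const rest = text.slice(end);
  // The closer must be the last thing in the document.
  const close = rest.match(/\n-->\s*$/);
  if (!close) return { body: text.trim(), comments: null };
  return {
    body: text.slice(0, start).trim(),
    comments: JSON.parse(rest.slice(0, close.index)),
  };
}

// A body that itself quotes the delimiter, followed by the real trailing block.
const file =
  "Body quoting the format:\n" +
  "<!-- docmost:comments\n[\"literal\"]\n-->\n" +
  "more body\n\n" +
  "<!-- docmost:comments\n[{\"id\":\"c1\"}]\n-->\n";

const parsed = splitTrailingComments(file);
console.log(parsed.comments.length, parsed.body.endsWith("more body"));
```

The earlier literal block stays in `body` untouched, which is the behaviour the header comment describes for re-pasted exports.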
--- a/packages/git-sync/build/lib/markdown-to-prosemirror.js
+++ b/packages/git-sync/build/lib/markdown-to-prosemirror.js
@@ -1,306 +0,0 @@
-/**
- * Pure markdown -> ProseMirror conversion.
- *
- * The converter path is `markdownToProseMirror` (marked -> HTML ->
- * generateJSON) plus the two pre/post processors it needs (`preprocessCallouts`,
- * `bridgeTaskLists`). The gitmost server writes the resulting page bodies
- * natively through the collab gateway, so no websocket/Yjs write-path lives
- * here.
- */
-import { generateJSON } from "@tiptap/html";
-import { JSDOM } from "jsdom";
-import { marked } from "marked";
-import { docmostExtensions } from "./docmost-schema.js";
-// Setup DOM environment for Tiptap HTML parsing in Node.js
-const dom = new JSDOM("<!DOCTYPE html><html><body></body></html>");
-global.window = dom.window;
-global.document = dom.window.document;
-// @ts-ignore
-global.Element = dom.window.Element;
-/**
- * Hard ceiling above which we skip callout preprocessing entirely. The linear
- * scanner below has no quadratic blow-up, but we still cap input defensively so
- * a pathological multi-megabyte payload cannot tie up the event loop; in that
- * case the markdown is passed through verbatim (callouts are simply not
- * detected) rather than risking a slow scan.
- */
-const MAX_CALLOUT_PREPROCESS_BYTES = 4 * 1024 * 1024; // 4 MB
-/** Matches an opening callout fence: `:::type` (type captured, lower-cased). */
-const CALLOUT_OPEN_RE = /^:::\s*(\w+)\s*$/;
-/** Matches a bare closing callout fence: `:::`. */
-const CALLOUT_CLOSE_RE = /^:::\s*$/;
-/** Matches the start/end of a code fence (``` or ~~~), capturing its indentation and marker. */
-const CODE_FENCE_RE = /^(\s*)(`{3,}|~{3,})/;
-/**
- * Pre-process Docmost-flavoured markdown: convert `:::type ... :::`
- * callout blocks (the syntax our markdown export produces) into HTML
- * divs that the callout extension parses. The inner content is rendered
- * through marked as regular markdown.
- *
- * Implemented as a single linear pass over the lines (no quadratic regex
- * rescan). It:
- *   - tracks fenced code regions (```...``` and ~~~...~~~) and never treats a
- *     `:::` line that lives inside a code fence as a callout delimiter, so a
- *     callout body that itself contains a fenced code block with a `:::` line is
- *     no longer corrupted;
- *   - matches an opening `:::type` line with the next CLOSING `:::` at the SAME
- *     nesting level, supporting NESTED callouts via a depth counter (an inner
- *     `:::type` opens a deeper level and consumes a matching `:::`);
- *   - emits the same `<div data-type="callout" data-callout-type="TYPE">` output
- *     (inner rendered through marked) as the previous regex implementation.
- */
-async function preprocessCallouts(markdown) {
-    // Defensive cap (compared against string length, i.e. UTF-16 code units): skip oversized inputs.
-    if (markdown.length > MAX_CALLOUT_PREPROCESS_BYTES) {
-        return markdown;
-    }
-    // Recursively transform a slice of lines, converting top-level callouts in
-    // that slice into <div> blocks and rendering their inner content (which may
-    // itself contain nested callouts) through this same function.
-    const transform = async (lines) => {
-        const out = [];
-        let inCodeFence = false;
-        let codeFenceMarker = ""; // the exact run of backticks/tildes that opened it
-        let i = 0;
-        while (i < lines.length) {
-            const line = lines[i];
-            // Inside a code fence, only its matching closing fence is significant;
-            // everything else (including `:::` lines) is copied through verbatim.
-            if (inCodeFence) {
-                out.push(line);
-                const fence = line.match(CODE_FENCE_RE);
-                if (fence && fence[2].startsWith(codeFenceMarker[0]) &&
-                    fence[2].length >= codeFenceMarker.length) {
-                    inCodeFence = false;
-                    codeFenceMarker = "";
-                }
-                i++;
-                continue;
-            }
-            // A code fence opening at the top level of the current slice: enter code-fence mode.
-            const fenceOpen = line.match(CODE_FENCE_RE);
-            if (fenceOpen) {
-                inCodeFence = true;
-                codeFenceMarker = fenceOpen[2];
-                out.push(line);
-                i++;
-                continue;
-            }
-            // An opening callout fence: scan forward (with code-fence and nested
-            // callout awareness) for its matching closing `:::` at the same level.
-            const open = line.match(CALLOUT_OPEN_RE);
-            if (open) {
-                const type = open[1].toLowerCase();
-                const bodyLines = [];
-                let depth = 1;
-                let innerInCodeFence = false;
-                let innerCodeFenceMarker = "";
-                let j = i + 1;
-                for (; j < lines.length; j++) {
-                    const bl = lines[j];
-                    if (innerInCodeFence) {
-                        const f = bl.match(CODE_FENCE_RE);
-                        if (f && f[2].startsWith(innerCodeFenceMarker[0]) &&
-                            f[2].length >= innerCodeFenceMarker.length) {
-                            innerInCodeFence = false;
-                            innerCodeFenceMarker = "";
-                        }
-                        bodyLines.push(bl);
-                        continue;
-                    }
-                    const innerFence = bl.match(CODE_FENCE_RE);
-                    if (innerFence) {
-                        innerInCodeFence = true;
-                        innerCodeFenceMarker = innerFence[2];
-                        bodyLines.push(bl);
-                        continue;
-                    }
-                    if (CALLOUT_OPEN_RE.test(bl)) {
-                        depth++;
-                        bodyLines.push(bl);
-                        continue;
-                    }
-                    if (CALLOUT_CLOSE_RE.test(bl)) {
-                        depth--;
-                        if (depth === 0)
-                            break; // matching close for THIS callout
-                        bodyLines.push(bl);
-                        continue;
-                    }
-                    bodyLines.push(bl);
-                }
-                if (j < lines.length) {
-                    // Found the matching closing fence: render the body (recursively, so
-                    // nested callouts are handled) and emit the callout div.
-                    const inner = await transform(bodyLines);
-                    const renderedInner = await marked.parse(inner);
-                    out.push(`\n<div data-type="callout" data-callout-type="${type}">${renderedInner}</div>\n`);
-                    i = j + 1; // skip past the closing `:::`
-                    continue;
-                }
-                // No matching close (unterminated callout): treat the opener as a
-                // literal line and continue, preserving the original text.
-                out.push(line);
-                i++;
-                continue;
-            }
-            out.push(line);
-            i++;
-        }
-        return out.join("\n");
-    };
-    return transform(markdown.split("\n"));
-}
-/**
- * Bridge marked's checkbox lists to TipTap task lists.
- *
- * marked renders GitHub task list items (`- [x] done`) as a plain
- * `<ul><li><p><input type="checkbox" checked> text</p></li></ul>` WITHOUT the
- * markup TipTap's TaskList/TaskItem extensions parse. This rewrites such lists
- * into the shape those extensions expect:
- *   TaskList parseHTML matches `ul[data-type="taskList"]`,
- *   TaskItem matches `li[data-type="taskItem"]`,
- *   the checked state is read from `data-checked === "true"`.
- *
- * A list is only converted when it has at least one `<li>` and EVERY direct
- * `<li>` contains a checkbox input. Both `<ul>` and `<ol>` are considered: a
- * numbered checklist (`1. [x] a`, which marked renders as an `<ol>` of checkbox
- * `<li>`s) would otherwise lose its task state. TipTap task lists are unordered,
- * so a matching `<ol>` is emitted as `data-type="taskList"` exactly like a
- * `<ul>`. Mixed or ordinary lists (including ordinary `<ol>` lists) are left
- * untouched so they keep rendering as bullet/numbered lists. The marked `<p>`
- * wrapper is kept inside the `<li>` because TaskItem content allows paragraphs.
- */
-function bridgeTaskLists(html) {
-    // Cheap early-out: if the markup contains no checkbox input at all there is
-    // nothing to bridge, so skip the expensive JSDOM parse entirely. This is the
-    // common case (most pages have no task lists).
-    if (!/type=["']?checkbox/i.test(html)) {
-        return html;
-    }
-    // Defensive cap (consistent with preprocessCallouts): skip the bridge for
-    // pathologically large inputs rather than running a second expensive JSDOM
-    // parse on a multi-megabyte payload. The markup is passed through verbatim.
-    if (html.length > MAX_CALLOUT_PREPROCESS_BYTES) {
-        return html;
-    }
-    const dom = new JSDOM(html);
-    const document = dom.window.document;
-    // Collect the checkbox(es) that belong to THIS <li> directly: either direct
-    // child <input type="checkbox"> elements or ones inside the <li>'s direct <p>
-    // child (the shape marked emits: `<li><p><input type="checkbox"> text</p></li>`).
-    // Checkboxes nested deeper (e.g. inside a child <ul>/<ol>) are excluded so a
-    // bullet <li> that merely contains a nested task sublist is not misdetected.
-    // Raw inline HTML can put more than one checkbox in a single <li>; we gather
-    // ALL of them so none survive into the converted item.
-    const directCheckboxes = (li) => {
-        const found = [];
-        for (const child of Array.from(li.children)) {
-            if (child.tagName === "INPUT" &&
-                child.getAttribute("type") === "checkbox") {
-                found.push(child);
-                continue;
-            }
-            if (child.tagName === "P") {
-                for (const inp of Array.from(child.querySelectorAll(":scope > input[type='checkbox']"))) {
-                    found.push(inp);
-                }
-            }
-        }
-        return found;
-    };
-    // Both <ul> and <ol> are candidates: an <ol> whose every direct <li> carries
-    // its own checkbox is a numbered checklist that must also become a taskList.
-    const lists = Array.from(document.querySelectorAll("ul, ol"));
-    for (const list of lists) {
-        // Only consider DIRECT child <li> elements; nested lists are handled by
-        // their own iteration of the outer loop.
-        const items = Array.from(list.children).filter((child) => child.tagName === "LI");
-        if (items.length === 0)
-            continue;
-        const itemCheckboxes = items.map((li) => directCheckboxes(li));
-        // Convert only when every direct <li> carries at least one OWN checkbox.
-        if (!itemCheckboxes.every((boxes) => boxes.length > 0))
-            continue;
-        // A numbered checklist arrives as an <ol>. We must NOT leave the tag as
-        // <ol> while tagging it data-type="taskList": generateJSON would then match
-        // BOTH the orderedList rule (tag ol) and the taskList rule (data-type),
-        // emitting a phantom empty orderedList beside the real taskList. So rename a
-        // qualifying <ol> to a <ul> — move its <li> children over and replace it —
-        // leaving only the taskList rule to match. Already-<ul> lists are unchanged.
-        let target = list;
-        if (list.tagName === "OL") {
-            const ul = document.createElement("ul");
-            // Carry over existing attributes (e.g. class) so nothing is silently lost.
-            for (const attr of Array.from(list.attributes)) {
-                ul.setAttribute(attr.name, attr.value);
-            }
-            // Move every child node (including the <li>s we collected) into the <ul>.
-            while (list.firstChild) {
-                ul.appendChild(list.firstChild);
-            }
-            list.replaceWith(ul);
-            target = ul;
-        }
-        target.setAttribute("data-type", "taskList");
-        items.forEach((li, index) => {
-            const boxes = itemCheckboxes[index];
-            // The first checkbox determines the checked state (matches the previous
-            // single-checkbox behaviour); any extras only need removing.
-            const input = boxes[0] ?? null;
-            li.setAttribute("data-type", "taskItem");
-            const checked = input != null &&
-                (input.hasAttribute("checked") || input.checked);
-            li.setAttribute("data-checked", checked ? "true" : "false");
-            // Remove ALL direct checkbox inputs so none survive into the content
-            // (a raw-inline-HTML <li> may carry more than one).
-            for (const box of boxes) {
-                box.remove();
-            }
-        });
-    }
-    return document.body.innerHTML;
-}
-/**
- * Recursively strip content-less paragraph nodes from a generated doc.
- *
- * A block-level atom whose markdown form is INLINE (e.g. the block `image`'s
- * `![](url)`, or a bare media element) is wrapped by marked in a <p>; the schema
- * then HOISTS the block atom out of that paragraph, leaving an EMPTY paragraph
- * sibling. On the next export that empty paragraph renders to "" and the
- * document's "\n\n" block join injects a phantom blank gap, so the markdown is
- * not byte-stable.
- *
- * Markdown blank lines are separators, never content, so generateJSON only ever
- * produces an empty paragraph as such a hoist artifact — removing them is safe
- * and general (it also subsumes the <div>-wrapper workaround the `video` case
- * uses). We remove ONLY `type === 'paragraph'` nodes whose `content` is absent
- * or an empty array; every other node (including atoms without `content`) is
- * preserved, and we recurse into the content of any node that has children.
- */
-function stripEmptyParagraphs(node) {
-    if (!node || !Array.isArray(node.content)) {
-        // Atom / leaf node (no children to recurse into): keep as-is.
-        return node;
-    }
-    const mapped = node.content.map((child) => stripEmptyParagraphs(child));
-    const isEmptyParagraph = (child) => !!child &&
-        child.type === "paragraph" &&
-        (!Array.isArray(child.content) || child.content.length === 0);
-    const filtered = mapped.filter((child) => !isEmptyParagraph(child));
-    // Schema-validity guard: several nodes require NON-empty block content
-    // (`content: "block+"` — tableCell, tableHeader, blockquote, column, callout,
-    // and the doc root). For an empty one of those, generateJSON materializes a
-    // single empty paragraph as its OBLIGATORY content — that is not a hoist
-    // artifact. If stripping would empty the container, keep ONE empty paragraph
-    // so the result stays schema-valid (an empty cell/quote must not become `[]`).
-    const cleaned = filtered.length === 0 && mapped.length > 0 ? [mapped[0]] : filtered;
-    return { ...node, content: cleaned };
-}
-/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
-export async function markdownToProseMirror(markdownContent) {
-    const withCallouts = await preprocessCallouts(markdownContent);
-    const html = await marked.parse(withCallouts);
-    const bridged = bridgeTaskLists(html);
-    const doc = generateJSON(bridged, docmostExtensions);
-    return stripEmptyParagraphs(doc);
-}