Move every SERVER Markdown->ProseMirror path off the editor-ext markdown layer
(`markdownToHtml`, a second marked-based parser) onto the canonical
`@docmost/prosemirror-markdown` package, and add a foreign-markdown normalizer at
the import boundary.
Code:
- `ImportService.processMarkdown` (single `.md` upload) now parses
`markdownToProseMirror(normalizeForeignMarkdown(md))` directly — no HTML hop.
- `PageService.parseProsemirrorContent` markdown case (page create/update with
`format: 'markdown'`) does the same.
- `FileImportTaskService` (zip import) parses markdown with the package, then
serializes to HTML (`jsonToHtml`) so the SHARED HTML attachment / internal-link
pipeline (processAttachments + formatImportHtml + processHTML) keeps handling
`.md` and `.html` imports uniformly. The markdown PARSE — the drift source — no
longer goes through editor-ext; the PM->HTML->PM hop that follows is lossless
plumbing for attachment resolution, not a second parse.
- `canonicalizeFootnotes` stays as an idempotent #228 safety net for the HTML
path (a no-op on the already-canonical markdown output).
Normalizer (`integrations/import/utils/foreign-markdown.ts`): a TEXT pre-pass,
NOT a parser fork. The strict canonical parser does not accept GFM `[^id]`
reference footnotes (and would misread `[^id]: def` as a CommonMark link-ref
definition, silently corrupting the ref into a bogus link), so the normalizer
rewrites reference footnotes into canonical inline `^[def]` before parsing.
Callout surfaces (`:::type` and `> [!type]`) are intentionally NOT touched — the
canonical parser already accepts BOTH natively, so normalizing them would be
redundant and risk degrading its nesting/code-fence-aware handling.
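The reference-footnote pre-pass can be sketched roughly as follows. `rewriteReferenceFootnotes` is a hypothetical name, not the contents of `foreign-markdown.ts`. The sketch assumes single-line `[^id]: def` definitions, uses a simplified fence tracker (any line starting with ``` or ~~~ toggles), does not escape `]` inside definitions, and leaves references with no matching definition untouched.

```typescript
// Hypothetical sketch of a reference-footnote pre-pass; NOT the real
// foreign-markdown.ts. Simplifications are noted inline.

// `[^id]: definition` on a single line (multi-line definitions unsupported).
const DEF_RE = /^\[\^([^\]\s]+)\]:[ \t]*(.*)$/;
// `[^id]` used as a reference anywhere in a line.
const REF_RE = /\[\^([^\]\s]+)\]/g;
// Simplified fence detector: any line starting with ``` or ~~~ toggles.
const FENCE_RE = /^\s*(```|~~~)/;

// Marks lines that are fence delimiters or sit inside a fenced block.
function codeMask(lines: string[]): boolean[] {
  let inFence = false;
  return lines.map((line) => {
    if (FENCE_RE.test(line)) {
      inFence = !inFence;
      return true;
    }
    return inFence;
  });
}

export function rewriteReferenceFootnotes(md: string): string {
  const lines = md.split("\n");
  const inCode = codeMask(lines);
  const defs = new Map<string, string>();
  const defLines = new Set<number>();

  // Pass 1: collect definitions outside code and remember their lines.
  lines.forEach((line, i) => {
    if (inCode[i]) return;
    const m = DEF_RE.exec(line);
    if (m) {
      defs.set(m[1], m[2].trim());
      defLines.add(i);
    }
  });

  // Pass 2: drop definition lines; inline known references outside code.
  // Note: `]` inside a definition is not escaped in this sketch.
  const out: string[] = [];
  lines.forEach((line, i) => {
    if (defLines.has(i)) return;
    if (inCode[i]) {
      out.push(line);
      return;
    }
    out.push(
      line.replace(REF_RE, (whole: string, id: string) => {
        const def = defs.get(id);
        return def === undefined ? whole : `^[${def}]`;
      }),
    );
  });
  return out.join("\n");
}
```

Removing the definition lines before the parse is what keeps a CommonMark parser from ever seeing `[^id]: def` and treating it as a link reference definition.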
Fixtures-first: foreign-markdown.spec pins the normalizer and the end-to-end
acceptance (no literal `[^id]`/`:::` leaks; re-export is canonical). The two
footnote-canonicalize specs are updated to the canonical output — the parser
assigns fresh `fn-*` ids, so they now assert by definition BODY order (still
reference-ordered, deduped, orphan-free).
FINAL CHECK: `grep -rn "htmlToMarkdown\|markdownToHtml" apps/server/src` (non-test
files) is now empty: both editor-ext markdown-layer functions are gone from the
server.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move every SERVER ProseMirror->Markdown path off the editor-ext markdown layer
(`htmlToMarkdown`, a second turndown-based converter) onto the canonical
`@docmost/prosemirror-markdown` package.
- `ExportService.exportPage` (page/space markdown export) and
`collaboration.util.jsonToMarkdown` (used by page.controller's markdown
responses and the AI public-share chat tool) now serialize DIRECTLY from
ProseMirror JSON via `convertProseMirrorToMarkdown` — no HTML intermediate, no
`<colgroup>` scrub (the converter emits GFM tables directly).
This is the SAME serializer the git-sync vault writer feeds, so an exported page
BODY is byte-identical to its vault representation: no more export-md vs vault-md
drift. The HTML export path is unchanged (still `jsonToHtml`).
Emitted markdown moves to the canonical forms: callouts `> [!type]` (not
`:::type`), inline footnotes `^[…]` (not `[^id]`), lossless images
` <!--img {…}-->` (editor-ext dropped width/height/align).
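To make the canonical callout form concrete, here is a minimal sketch of wrapping an already-serialized callout body in `> [!type]` syntax. `serializeCallout` is a hypothetical helper, not the package's API, and it assumes the body has no trailing newline. Every body line, including code-fence lines, gets a `> ` prefix, and blank lines become a bare `>` so the blockquote is not split.

```typescript
// Hypothetical helper; the package's real serializer API is not shown here.
// `body` is the callout's content, already serialized to markdown.
export function serializeCallout(kind: string, body: string): string {
  const quoted = body
    .split("\n")
    // Blank lines become a bare `>` so the blockquote is not interrupted.
    .map((line) => (line === "" ? ">" : `> ${line}`));
  return [`> [!${kind}]`, ...quoted].join("\n");
}
```

For example, `serializeCallout("warning", "Line one\n\nLine two")` yields a four-line blockquote whose first line is `> [!warning]`.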
Fixtures-first: export-markdown.spec asserts those canonical forms and the
export==vault-by-construction equality (both call the package converter). The
one deliberate export/vault delta — export prepends the page title as an H1
while the vault carries it in frontmatter — is pinned by a test.
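The export/vault delta can be illustrated with a sketch in which both outputs wrap the same body string (standing in for `convertProseMirrorToMarkdown` output). The helper names and the frontmatter layout (a YAML `title` key) are assumptions, not the vault writer's actual format; the point is that stripping each wrapper yields the identical body.

```typescript
// Hypothetical helpers; `body` stands in for the package converter's output.
export function exportMarkdown(title: string, body: string): string {
  // Export: page title prepended as an H1.
  return `# ${title}\n\n${body}`;
}

export function vaultMarkdown(title: string, body: string): string {
  // Vault: title carried in frontmatter (assumed YAML `title` key).
  return `---\ntitle: ${JSON.stringify(title)}\n---\n\n${body}`;
}

export function stripExportTitle(md: string): string {
  // Remove a leading "# ..." line and the blank line after it.
  return md.replace(/^# [^\n]*\n\n/, "");
}

export function stripFrontmatter(md: string): string {
  // Remove a leading ---...--- block and the blank line after it.
  return md.replace(/^---\n[\s\S]*?\n---\n\n/, "");
}
```

A test pinning the delta would assert that `stripExportTitle(exportMarkdown(t, b))` and `stripFrontmatter(vaultMarkdown(t, b))` both equal `b`.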
Test infra: declare the `@docmost/prosemirror-markdown` workspace dep; teach
jest to load its ESM build (babel-jest) and stub `@tiptap/react` (server code
imports editor-ext, whose node views reference React renderers only used in a
live browser editor — never on the server).
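A hypothetical `jest.config.ts` showing the two Jest options involved: a `babel-jest` transform for ESM `.js`/`.mjs` files and a `moduleNameMapper` entry that swaps `@tiptap/react` for a stub. The stub path is invented, the real config may live in `package.json`, and babel-jest also needs a Babel config (for example with `@babel/preset-env`) that this sketch does not show. If the workspace package resolves to its real path outside `node_modules`, Jest's default `transformIgnorePatterns` will not skip it; otherwise that pattern may need an exemption as well.

```typescript
// jest.config.ts: hypothetical sketch, not the repository's actual config.
import type { Config } from "jest";

const config: Config = {
  moduleFileExtensions: ["js", "json", "ts"],
  transform: {
    // Server TypeScript keeps going through ts-jest.
    "^.+\\.ts$": "ts-jest",
    // ESM JavaScript (e.g. the package's build output) goes through babel-jest.
    "^.+\\.m?js$": "babel-jest",
  },
  moduleNameMapper: {
    // Server code never renders React node views, so an empty stub suffices.
    // The stub path is an assumption.
    "^@tiptap/react$": "<rootDir>/__mocks__/tiptap-react.js",
  },
};

export default config;
```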
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>