F7 [CRITICAL] The round-1 F2(a) fix built ONE alternation regex over all
definition ids (`(id1|id2|...)`). On prefix-chain ids (a, aa, aaa, ...) V8's
regex compiler blows its stack with a fatal, UNCATCHABLE 'RegExpCompiler
Allocation failed' that kills the whole process — strictly worse than the
original per-def thread-hang, and its match cost was still O(text x defs).
Replaced with a single FIXED generic scanner `/\[\^([^\]]+)\]/g` plus a map
lookup in the replacer: genuinely O(total text), no per-document regex
compilation, cannot blow up. Output is identical (only real def ids are inlined).
F8 [WARNING] The frontmatter strip regex was not line-anchored: it closed on the
FIRST `---` anywhere, so a value containing a triple-dash (e.g.
'title: Q1 --- Q2') truncated the frontmatter and leaked the rest into the body.
Replaced with the line-anchored shape the canonical parser already uses
(page-file.ts): open on `---\n`, close on a `\n---` line.
Adds tests: 4000 prefix-chain ids do not crash and stay fast; a frontmatter
value containing '---' is stripped whole.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses the round-1 review of #369:
F1 [CRITICAL] Restore prom-client. The prior commit removed it as a 'stray dep',
but metrics.registry.ts imports it unconditionally at startup (main.ts boot), so
a clean frozen install had no prom-client -> server tsc TS2307 + boot crash. It
was surviving only via hoisting from a warm store. Restored to apps/server
dependencies + regenerated the lock (prom-client/tdigest/bintrees return),
keeping the @docmost/prosemirror-markdown dep. Verified: clean frozen install ->
require.resolve('prom-client') ok, server tsc EXIT 0.
F2 [HIGH] Two quadratic ReDoS vectors in foreign-markdown.ts on untrusted import
(runs synchronously on the request thread, 30MB cap):
(a) pass-2 was O(lines x defs) — a per-def RegExp rebuilt and run over every
line. Replaced with ONE precompiled alternation regex over all def ids,
built once per document, with an id->body lookup in the replacer: O(text).
(b) the inline-code split alternation backtracks quadratically on a long
UNCLOSED backtick run. Lines over 8KB now skip the split (left untouched) —
a real footnote line is never that long.
F3 [WARNING] Restore the leading YAML front-matter strip that the retired
markdownToHtml layer did. Without it, Obsidian/Hugo/Jekyll/git-sync files leak
their front-matter into the body (and 'title:' renders as a setext heading that
title extraction can hijack).
F4 [WARNING] Extend the zip-import spec with an image (width+align) + callout
fidelity assertion through the PM->HTML->PM hop (the one hop the package suite
does not cover).
F5/F6 Update AGENTS.md (apps/server is now a prosemirror-markdown consumer) and
make the server pretest build prosemirror-markdown too.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The foreign-markdown import normalizer rewrote GFM reference footnotes
(`[^id]` + `[^id]: def`) into canonical inline `^[def]` footnotes, but two
edge cases corrupted content:
1. A `[^id]` inside an inline-code span (backticks) was rewritten like prose
text — only fenced code blocks were protected. Now the rewrite pass splits
each line on inline-code spans and only touches the text outside them.
2. An unbalanced `]` in a definition body truncated the resulting `^[...]`
footnote at the canonical tokenizer, leaking the tail as literal text. The
body's square brackets are now backslash-escaped before wrapping.
Adds golden cases for both.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move every SERVER Markdown->ProseMirror path off the editor-ext markdown layer
(`markdownToHtml`, a second marked-based parser) onto the canonical
`@docmost/prosemirror-markdown` package, and add a foreign-markdown normalizer at
the import boundary.
Code:
- `ImportService.processMarkdown` (single `.md` upload) now parses
`markdownToProseMirror(normalizeForeignMarkdown(md))` directly — no HTML hop.
- `PageService.parseProsemirrorContent` markdown case (page create/update with
`format: 'markdown'`) same.
- `FileImportTaskService` (zip import) parses markdown with the package, then
serializes to HTML (`jsonToHtml`) so the SHARED HTML attachment / internal-link
pipeline (processAttachments + formatImportHtml + processHTML) keeps handling
`.md` and `.html` imports uniformly. The markdown PARSE — the drift source — no
longer goes through editor-ext; the PM->HTML->PM hop that follows is lossless
plumbing for attachment resolution, not a second parse.
- `canonicalizeFootnotes` stays as an idempotent #228 safety net for the HTML
path (a no-op on the already-canonical markdown output).
Normalizer (`integrations/import/utils/foreign-markdown.ts`): a TEXT pre-pass,
NOT a parser fork. The strict canonical parser does not accept GFM `[^id]`
reference footnotes (and would misread `[^id]: def` as a CommonMark link-ref
definition, silently corrupting the ref into a bogus link), so the normalizer
rewrites reference footnotes into canonical inline `^[def]` before parsing.
Callout surfaces (`:::type` and `> [!type]`) are intentionally NOT touched — the
canonical parser already accepts BOTH natively, so normalizing them would be
redundant and risk degrading its nesting/code-fence-aware handling.
Fixtures-first: foreign-markdown.spec pins the normalizer and the end-to-end
acceptance (no literal `[^id]`/`:::` leaks; re-export is canonical). The two
footnote-canonicalize specs are updated to the canonical output — the parser
assigns fresh `fn-*` ids, so they now assert by definition BODY order (still
reference-ordered, deduped, orphan-free).
FINAL CHECK: `grep -rn "htmlToMarkdown\|markdownToHtml" apps/server/src` (non
-test) is now empty — both editor-ext markdown-layer functions are gone from the
server.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>