refactor(converter): single @docmost/prosemirror-markdown package + canonical formats, git-sync and mcp switched over (#293, steps 2–5) #333
Open
agent_coder
wants to merge 13 commits from
feat/293-B-prosemirror-markdown-pkg into develop
pull from: feat/293-B-prosemirror-markdown-pkg
merge into: vvzvlad:develop
13 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
08222345ef |
fix(prosemirror-markdown): escape canon inline-extension triggers = $ ^ in link/alt text (#333 review F5)
F1 (round 1) wrapped the image alt in escapeLinkText, and that helper also guards the link-form media captions (attachment/pdf/embed). But its character class covered only stock CommonMark, NOT the Docmost inline EXTENSIONS this same PR registers on the marked instance: highlight `==x==` (canon #7), math `$x$` (canon #6), footnote `^[x]` (canon #2). Their triggers `= $ ^` are not inline-active in stock CommonMark, so an alt or media filename like `x $A$ y`, `use ==bold==`, `^[fn]`, or `data $A$.csv` was silently turned into a math/highlight/footnote node on import: the same class of round-trip data loss F1 closed, reintroduced by this PR's own canon.
Fix: add `= $ ^` to the escapeLinkText class (`/[\\`*_~[\]<&!()=$^]/g`). `\= \$ \^` decode back to literals (all three are ASCII punctuation) AND, being escape tokens, stop the extension tokenizer from matching; verified lossless byte-stable round-trip. Updated the helper comment to name the two trigger sets (CommonMark + Docmost inline extensions).
Extended the adversarial round-trip tests: image alt gains `x $A$ y` / `5$ and 10$` / `use ==bold==` / `^[fn]` / `cost $5 == price`; pdf name gains `data $A$.csv` / `q3 ==final==.pdf` / `5$ and 10$.pdf` / `note ^[x].pdf`. All are byte-stable with the node intact, so the hole can't reopen.
package vitest: 658 passed; tsc clean. git-sync: 268. mcp: 454.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
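The F5 fix above can be illustrated with a minimal sketch. The character class is the one quoted in the commit message; the function bodies and the `unescapePunctuation` helper are assumptions made for illustration and are not taken from the package source.

```typescript
// Character class quoted in the commit message: CommonMark inline-active
// punctuation plus the Docmost inline-extension triggers `= $ ^`.
const LINK_TEXT_ACTIVE = /[\\`*_~[\]<&!()=$^]/g;

// Sketch of the helper: prefix every active character with a backslash.
function escapeLinkText(text: string): string {
  return text.replace(LINK_TEXT_ACTIVE, "\\$&");
}

// Hypothetical inverse, standing in for what a CommonMark parser does with
// backslash escapes: `\` followed by ASCII punctuation decodes to the literal.
function unescapePunctuation(text: string): string {
  return text.replace(/\\([!-/:-@[-`{-~])/g, "$1");
}

// Samples taken from the adversarial fixtures listed in the commit message.
const samples = ["x $A$ y", "use ==bold==", "^[fn]", "data $A$.csv"];
const roundTripped = samples.map((s) => unescapePunctuation(escapeLinkText(s)));
```

Because every trigger character is ASCII punctuation, the backslash form decodes back to the literal, which is why the escape can stay lossless.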
|
|
1a7b817250 |
fix(prosemirror-markdown): escape image alt + consolidate schema sanitizers + tidy (#333 review F1-F4)
F1 [critical, data-loss]: escape the image alt in `![alt](src)`. Canon #4 moved the top-level image off the lossless <img> form onto markdown `![alt](src)`, but the alt was inserted raw; the importer re-parses the `![alt]` label as CommonMark inline, so a markdown-active char in a realistic description ("Figure [1]", "the *new* logo", "a]b[c") broke the round-trip: the image node vanished or emphasis collapsed. Now `escapeLinkText(imgAttrs.alt ?? "")`, exactly as the link-form media (attachment/pdf/embed) already escape their visible text. Regression test added: six active-punctuation alts round-trip byte-stable with the node intact.
F2 [drift]: re-export `clampCalloutType` / `sanitizeCssColor` from the package barrel and drop the verbatim copies in the mcp schema shim. The copies had already drifted (the mcp `clampCalloutType` lost the callout-type alias mapping the package applies), which is exactly the schema drift #293 exists to kill. The sanitizers now live only in the package; mcp `schema.test.mjs` exercises the single alias-aware implementation.
F3 [docs]: AGENTS.md:296 said `packages/mcp/build/` is committed; this branch gitignored it (git-sync/prosemirror-markdown convention). Updated the line to say it is gitignored and rebuilt in CI/Docker via `pnpm build`.
F4 [cleanup]: removed the dead `test.typecheck` block from the package vitest.config.ts and deleted tsconfig.vitest.json. Both were copied verbatim from git-sync; this package has zero `*.test-d.ts` files, and the ported comments referenced git-sync-only entities. Kept the `docmost-client` resolve alias (22 tests use it) and the runtime include/environment.
package vitest: 658 passed (+1 F1 regression); tsc clean. git-sync: 268 passed. mcp: node --test 454 passed; tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
124f5a45a2 |
refactor(mcp): consume @docmost/prosemirror-markdown, drop the drifted converter copy (#293/#326 step 5)
mcp had its OWN drifted copy of the converter (markdown-converter.ts ~900 lines, docmost-schema.ts ~1270 lines, markdown-document.ts), older than the shared package and missing the git-sync fixes AND the #293 canon. This switches mcp's converter CORE to @docmost/prosemirror-markdown, so mcp jumps straight to the canonical format and the drift-generating second copy is gone.
- markdown-converter.ts / markdown-document.ts / docmost-schema.ts become thin re-export shims of the package (convertProseMirrorToMarkdown, the docmost:meta envelope, docmostExtensions + docmostSchema=getSchema(docmostExtensions)). The mcp-only helpers clampCalloutType/sanitizeCssColor are preserved verbatim in the schema shim (the package doesn't expose them via its barrel). ~2170 lines of the drifted converter/schema bodies deleted.
- collaboration.ts drops its own ~360-line marked pipeline (preprocessCallouts, bridgeTaskLists, extractFootnotes, the footnoteRef extension) and re-points to the package's markdownToProseMirror, keeping markdownToProseMirrorCanonical and all the yjs/collab write glue. footnote-lex/analyze doc comments updated (they now describe advisory legacy-syntax diagnostics, not an importer).
Schema parity verified: the package schema is a strict SUPERSET of mcp's old schema. Every node and attr mcp declared is present (the package only adds status/pageEmbed/transclusion*/subpages.recursive/etc.), so nothing is silently dropped on the switch. The switch actually FIXES two pre-existing mcp data-loss bugs its own tests documented: htmlEmbed and pageBreak now round-trip (were dropped by the old mcp converter).
Footnotes: the package assembles inline ^[body] footnotes on import (sequential fn-N ids, identical bodies merged), so mcp's canonicalizeFootnotes is now an idempotent no-op after it (verified). Legacy reference footnotes [^id]/[^id]: are inert literal text (canon #2 no-backward-compat): lossless, the text survives verbatim.
Build hygiene: packages/mcp/build/ is now gitignored and untracked, matching the git-sync/prosemirror-markdown convention (private package, rebuilt in CI/Docker, so src and prod can never silently diverge). This also removes a dead untracked build/_vendored_editor_ext/ artifact that a broad `git add` would otherwise commit.
Dependency: packages/mcp/package.json gains @docmost/prosemirror-markdown (workspace:*); pnpm-lock.yaml gets the matching link importer (mirrors git-sync).
mcp tests updated deliberately to the canonical forms (highlight ==, math $…$, image <!--img-->, drawio/media discriminators, subpages/pageBreak comments, textAlign, inline ^[…] footnotes) with strict assertions; 4 structural safety-net round-trip tests added.
mcp: node --test 454 passed; tsc clean. package: 657 passed. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
b751852425 |
fix(prosemirror-markdown): converter inventory bugs — spoiler/link-title in raw-HTML, contract test, codeCombined dead code (#293)
The four bugs found during the #293 HTML-emission inventory, fixed in the package:
1. Spoiler mark was silently lost in the raw-HTML path: inlineToHtml (columns / spanned cells) had no `case "spoiler"`, so spoilered text there dropped the mark on round-trip. Now emits `<span data-spoiler="true">`, the same form the top-level serializer uses and exactly what the schema's Spoiler mark parses.
2. Link `title` was dropped in the raw-HTML path: inlineToHtml's link case emitted `<a href>` without the title. The schema's link mark carries a `title` global attr (DocmostAttributes), so a titled link inside a column now round-trips via `<a href … title=…>`.
3. Serializer contract test: emoji/date/toc were flagged as possibly caseless inline atoms. Verified they exist in NEITHER the package schema NOR editor-ext, so no node handling is needed today. Added serializer-contract.test.ts, which derives every node type from the live schema (getSchema(docmostExtensions)) and asserts each has an explicit serializer `case`; all 45 current node types are covered and present, and a future node added without a case will fail this test loudly.
4. codeCombined dead code: `const codeCombined = false` was hardcoded, so every `codeCombined ? <html> : <markdown>` ternary always took the markdown branch. Removed the variable and the dead HTML-alternative branches (bold/italic/code/link/strike). Pure cleanup: output is byte-identical (goldens + full suite pass unchanged). The `hasCode` early-return (code excludes other marks) stays.
Tests: spoiler-inside-column and link-title-inside-column round-trips, the serializer contract test + inline-atom non-empty behavioral checks.
package vitest: 657 passed; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
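The idea behind the serializer contract test (item 3 above) can be sketched as a set difference. Everything here is hypothetical: the node names are placeholders, and the real test derives its list from `getSchema(docmostExtensions)` rather than from a hand-written array.

```typescript
// Report schema node types that have no explicit serializer case.
// Both inputs are hypothetical stand-ins for the live schema and the
// serializer's switch statement.
function findUnhandledNodeTypes(
  schemaNodeNames: readonly string[],
  handledCases: ReadonlySet<string>,
): string[] {
  return schemaNodeNames.filter((name) => !handledCases.has(name)).sort();
}

// Illustrative names only; "newNode" plays the role of a node type that was
// added to the schema without a matching serializer case.
const schemaNodes = ["paragraph", "heading", "newNode"];
const handled = new Set(["paragraph", "heading"]);
const missing = findUnhandledNodeTypes(schemaNodes, handled);
```

A test built this way fails as soon as `missing` is non-empty, which is the "fail loudly" behavior the commit describes.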
|
|
65d81f745a |
feat(prosemirror-markdown): inline footnotes ^[text] (#293 canon #2)
Footnotes now use the single canonical Pandoc/Obsidian inline form: the note body is written AT the reference as `^[body]`, and the separate `<section data-footnotes>` list is NOT emitted in markdown; it is reassembled on import. New shared module src/lib/footnote.ts.
Serialize (markdown-converter.ts): a top-of-convert pre-scan builds Map<id, definition> from the footnotesList; a footnoteReference emits `^[<rendered body>]` (body paragraphs joined by a literal `\n`, real backslash-n written `\\n`, stray unbalanced `[`/`]` escaped via balanceBrackets while a balanced `[link](url)` stays intact); footnotesList/footnoteDefinition emit nothing; an ORPHAN definition (no ref) is appended at doc end as its own `^[body]` line so bodies are never lost (intentional, documented). The raw-HTML path (inlineToHtml, columns) emits `<sup data-footnote-ref data-fn-text="…">`, carrying the text at the ref there too; blockToHtml keeps the schema `<section>`/`<div>` form for a list nested in a column.
Parse (markdown-to-prosemirror.ts): a `^[…]` inline extension on the dedicated marked instance BALANCES brackets with a depth counter (respecting `\`-escapes), so `^[note [a] b]` captures the full content, unbalanced `^[` fails open to literal text. A post-marked assembleFootnotes pass collects every `<sup data-fn-text>`, dedups by the EXACT body string, assigns sequential ids (fn-1, fn-2, … first-seen), builds one `<div data-footnote-def>` per unique body in a single `<section data-footnotes>`, and strips data-fn-text. No hash is used (F1): dedup keying on the exact text makes an id collision between DIFFERENT bodies impossible, while identical bodies still merge; ids are never written to markdown, so round-trips stay byte-stable, and all id assignment is local to the one call (race-free).
Correctness hardening from internal review:
- F2: raw user backslashes in a footnote body are doubled (`\`->`\\`) at text emission (via a per-conversion inFootnoteBody closure flag) BEFORE the serializer's own escapes (`\[ \] \= \$`) are layered on, so a body ending in `\` (Windows path, LaTeX, regex) no longer breaks the `^[…]` envelope and round-trips exactly; parseInline decodes `\\`->`\`. The old `\n`->`\\n` step is subsumed by this and removed.
- N1: assembleFootnotes runs to a FIXED POINT: parseInline of a def body can spawn a nested `<sup data-fn-text>` (a legal nested footnote `^[a ^[b] c]`), so the section is attached before the loop (querySelectorAll only sees attached nodes) and the scan repeats until no pending sup remains; the dedup map persists across rounds. Nested and 3+-level footnotes now round-trip byte-stably instead of silently dropping the inner body. Bounded by MAX_FOOTNOTE_ROUNDS as a fail-open safety net.
- N2: the id counter is seeded past the highest existing fn-<N> so a reused section's ids can never collide with generated ones.
- A literal `^[` in prose text is escaped `^\[` so it does not become a phantom footnote on re-import (codeBlock/inline-code excluded).
No backward compat: reference form `[^id]`/`[^id]: def` is not parsed (stays literal). No existing golden asserted the old footnote HTML output.
Tests: new footnote.test.ts (22 cases: basic byte-stable round-trip, bracket balancing, multi-paragraph `\n`, real backslash-n, dedup both directions, NESTED + 3-level nest, F1 hash-collision pair surviving as distinct defs, F2 backslash bodies byte-stable, N2 id-seed, column data-fn-text form, orphan def, no-backward-compat, literal-`^[` prose, fail-open, empty `^[]`).
package vitest: 607 passed; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
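Two pieces of the parse side described above lend themselves to a short sketch: the depth-counting bracket matcher and the exact-body dedup with sequential ids. Both functions are hypothetical reconstructions from the commit text (the names `matchInlineFootnote` and `assignFootnoteIds` are not from the source), and the sketch leaves out the N2 id seeding and the fixed-point loop.

```typescript
// Match a `^[...]` footnote at the start of `src`, balancing nested brackets
// with a depth counter and skipping backslash-escaped characters.
// Returns null (fail-open to literal text) when the brackets never balance.
function matchInlineFootnote(src: string): { raw: string; body: string } | null {
  if (src.slice(0, 2) !== "^[") return null;
  let depth = 1;
  for (let i = 2; i < src.length; i++) {
    const ch = src[i];
    if (ch === "\\") {
      i++; // skip the escaped character, whatever it is
      continue;
    }
    if (ch === "[") {
      depth++;
    } else if (ch === "]") {
      depth--;
      if (depth === 0) return { raw: src.slice(0, i + 1), body: src.slice(2, i) };
    }
  }
  return null;
}

// Deduplicate by the EXACT body string and hand out fn-1, fn-2, ... in
// first-seen order. Keying on the full text (no hash) means two different
// bodies can never share an id.
function assignFootnoteIds(bodies: readonly string[]): Map<string, string> {
  const ids = new Map<string, string>();
  for (const body of bodies) {
    if (!ids.has(body)) ids.set(body, "fn-" + (ids.size + 1));
  }
  return ids;
}
```

Since the ids exist only in the assembled document and never in the markdown, reassigning them on every import does not disturb byte-stability.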
|
|
bfbd927866 |
feat(prosemirror-markdown): math as $…$ / $$…$$ (#293 canon #6)
mathInline serializes as `$LaTeX$` and mathBlock as an own-line `$$\n<latex>\n$$` fence (multi-line safe), closing hand-authoring gap A18. The LaTeX still lives in node.attrs.text; a literal `$` inside it is escaped `\$`. On the raw-HTML path (columns/cells) math keeps the schema-HTML `<span data-type="mathInline">` / `<div data-type="mathBlock">` form (markdown is not re-parsed inside raw HTML); blockToHtml gets an explicit mathBlock case and inlineToHtml a mathInline case, sharing the mathInlineHtml/mathBlockHtml helpers with the fallbacks so the two forms cannot drift.
Parse: mathInlineExtension (inline) + mathBlockExtension (block) are added to the SAME dedicated marked instance introduced for canon #7 (global singleton untouched). The inline extension uses a currency-safe PANDOC rule: an opening `$` must not be followed by whitespace, and the closing `$` must not be preceded by whitespace nor followed by a digit, so `$5`, `$5 and $10`, `a $5 b $6 c`, `100$` stay literal text while `$x^2$` is math. The block extension matches a `$$` fence line and captures multi-line LaTeX non-greedily up to the next `$$` line.
The pandoc boundary rule lives ONCE in the new math-inline.ts (INLINE_MATH_SOURCE) and is shared by the import tokenizer (^-anchored) and the export prose escaper (global), so parse and serialize cannot disagree about what is math. escapeProseMath (case "text", non-code runs only) escapes ONLY the two delimiting `$` of a span the rule WOULD match, so a would-be-math prose span like `the set $A$` re-imports as literal text while currency `$5 and $10` is emitted CLEAN (zero backslash churn). marked decodes `\$`→`$` on re-parse, byte-stable.
Fallbacks to the lossless schema-HTML form (all documented + tested): mathInline → <span> when empty / whitespace-edged / multi-line / pre-existing `\$` / trailing `\` / immediately before a digit-text sibling (renderInlineChildren guard, so `$…$5` can't lose the node); mathBlock → <div> when the LaTeX contains `$$`. Each fallback round-trips losslessly and byte-stably.
Code safety (guards the canon #7 regression class): codeBlock reads raw child text and inline `code` runs are excluded from escapeProseMath, so `$5`/`$x$` in code stay literal with no math and no backslash corruption. ReDoS-checked on adversarial 40k-char inputs (0–1 ms).
Tests: new math.test.ts (26 cases: serialize exactness, multi-line block, `\$` escaping, currency ×5 asserting no `\$`, prose escape, columns schema-HTML, inline-code/codeBlock safety, fail-open). Goldens in roundtrip / markdown-converter flipped top-level math to `$…$`/`$$…$$`; the escapeAttr-idempotence golden wraps math in a column (still exercises escapeAttr); columns/raw-HTML math assertions unchanged.
package vitest: 585 passed; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
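The "one source string, two regexes" arrangement described above might look roughly like the following. The name `INLINE_MATH_SOURCE` comes from the commit message, but the pattern itself is a guess at the stated Pandoc boundary rule, not the contents of math-inline.ts; `matchInlineMath` is a hypothetical wrapper.

```typescript
// Assumed pattern for the boundary rule: the opening `$` is not followed by
// whitespace; the closing `$` is not preceded by whitespace and not followed
// by a digit. A backslash escapes the next character inside the span.
const INLINE_MATH_SOURCE = String.raw`\$(?!\s)((?:\\.|[^$\\])+?)(?<!\s)\$(?!\d)`;

// Import side: anchored at the start of the remaining input.
const MATH_TOKENIZER = new RegExp("^" + INLINE_MATH_SOURCE);
// Export side: global scan over a prose text run.
const MATH_IN_PROSE = new RegExp(INLINE_MATH_SOURCE, "g");

// Returns the LaTeX body when `src` starts with inline math, else null.
function matchInlineMath(src: string): string | null {
  const m = MATH_TOKENIZER.exec(src);
  return m ? m[1] : null;
}

// Escape only the two delimiting `$` of spans the rule WOULD match, so plain
// currency text passes through with no backslashes added.
function escapeProseMath(text: string): string {
  return text.replace(
    MATH_IN_PROSE,
    (_match: string, body: string) => "\\$" + body + "\\$",
  );
}
```

Building both regexes from the same string is what keeps the tokenizer and the escaper from disagreeing: any change to the rule reaches both sides at once.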
|
|
77f5224b55 |
feat(prosemirror-markdown): highlight without color as ==text== (#293 canon #7)
A `highlight` mark WITHOUT a color now serializes as the Obsidian/GFM `==text==`
syntax (closing hand-authoring gap A19); a highlight WITH a color keeps the
`<mark style="background-color: …">` HTML form (condition is deterministic on
the color attr). On the raw-HTML path (columns/spanned cells) BOTH forms stay
`<mark>` via inlineToHtml — markdown is not re-parsed inside a raw-HTML block.
Parse: `==` is not standard markdown, so the importer uses a DEDICATED marked
instance (`new Marked().use({extensions:[highlightMark]})`) rather than the
global singleton — registered once, never leaks `==` behavior to other callers.
The inline extension tokenizes `==text==` (non-empty, non-space-leading inner,
lazy so `==a== ==b==` is two marks; inner re-tokenized so nested marks survive;
`====`/`==x` fail-open to literal) into `<mark>` with no color, which the schema
parses as a color-less highlight. Inline code (`` `a == b` ``) stays code via
marked token precedence. marked 17 defaults (gfm:true, breaks:false) are
identical for the fresh instance, so tables/strike/autolinks are unaffected.
Losslessness: a LITERAL `==` in a text run would otherwise be misparsed as a
highlight on the next import, so `case "text"` backslash-escapes each `=` of a
`==` pair (marked decodes `\=` back to `=`), and this round-trips byte-stably.
The escape does NOT run for inline-code runs, and — CRITICALLY — codeBlock now
reads its child text RAW (schema `content: "text*"`) instead of routing through
`case "text"`: marked does not decode `\=` inside a fence, so escaping there
would permanently stamp backslashes into any `==` comparison (ubiquitous in
source code) and corrupt the block on the git-sync data path.
Tests: new highlight.test.ts (19 cases incl. serialize forms, colored vs plain,
column `<mark>` path, nested marks, inline-code exclusion, literal-`==` escape,
fail-open, AND a codeBlock-with-`==` regression proving no backslash corruption
+ byte-stable round-trip). Golden inline-mark matrix flipped top-level no-color
highlight to `==m==`; the kept `<mark style=…>` assertions are the colored/
raw-HTML cases.
package vitest: 559 passed; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
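The tokenizer and escape rules for `==text==` can be sketched as two pure functions. In the package the matching is registered as an inline extension on a dedicated `Marked` instance; the regex and function names below are assumptions that only mirror the rules stated in the commit message (non-empty inner, no leading space, lazy match, fail-open).

```typescript
// Assumed tokenizer pattern: `==`, a non-space first character, the shortest
// possible single-line inner text, then the closing `==`.
const HIGHLIGHT = /^==(?=\S)([^\n]+?)==/;

// Returns the matched span and its inner text, or null (literal text).
function matchHighlight(src: string): { raw: string; inner: string } | null {
  const m = HIGHLIGHT.exec(src);
  return m ? { raw: m[0], inner: m[1] } : null;
}

// Export side: backslash-escape each `=` of a literal `==` pair in a prose
// text run so it is not re-imported as a highlight. This must NOT be applied
// to inline code or code blocks, where the backslashes would stay in place.
function escapeLiteralHighlight(text: string): string {
  return text.replace(/==/g, "\\=\\=");
}
```

The lazy quantifier is what makes `==a== ==b==` two separate marks, and the required non-empty inner text is what leaves `====` and `==x` as plain text.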
|
|
e2a3b5fc4d |
feat(prosemirror-markdown): media family as md-form + discriminator comment (#293 canon #8)
Ten media/embed node types move their TOP-LEVEL serialization off raw schema HTML onto a readable markdown target plus an always-emitted discriminator comment whose NAME selects the node type. The schema-HTML form is retained on the raw-HTML/columns path (comments are dropped by the DOM parse stage there).
  image-form   ![](src)<!--name …-->                        youtube, video, audio, drawio, excalidraw
  link-form    [text](src)<!--name …-->                     pdf, attachment, embed (text=filename/provider)
  standalone   <!--pageembed …--> / <!--transclusion …-->   pageEmbed, transclusionReference
The comment NAME is the node-type discriminator and is ALWAYS emitted, even when the attr JSON is empty (`<!--youtube-->`), so a bare `![](src)` is never mistaken for an `image` and a bare `[t](u)` stays a plain link, with no URL-sniffing. src rides in the markdown target; every other non-default attr (incl. the id links attachmentId/sourcePageId/transclusionId) rides in the comment JSON (stable key order, numerics stringified, align="center" omitted).
New src/lib/media-html.ts: byte-exact builders reproducing the schema HTML each old processNode case returned. Both the serializer's raw-HTML path (blockToHtml, now de-delegated from `return processNode(block)` to explicit per-type cases) and the importer call these, so serialize and parse cannot drift.
Import (applyCommentDirectives): image-form binds the preceding <img> (src from it), link-form the preceding <a> (src=href, text=filename/provider), standalone replaces the comment (same leading-doc-level handling as #5). Each rebuilds the schema element via the media-html builder, then swaps it in; the empty-<p> hoist is absorbed by stripEmptyParagraphs. Fail-open: wrong element/position/name or malformed JSON -> inert, no throw.
Link-form visible text is escaped (escapeLinkText) for the FULL set of CommonMark inline-active punctuation (\ ` * _ ~ [ ] < & ! ( )), not just [ ] \: the label is parsed as inline content, so a filename/provider like `report *v2*.pdf` or `.pdf` would otherwise lose the markup (or fragment the parse) when the importer reads a.textContent back, a data-loss regression vs the old data-attachment-name form. Adversarial round-trip fixtures lock byte- and value-stability for emphasis/code/strike/autolink/entity/image markers and nested-link names.
Tests: new media-comments.test.ts (40 cases: per-type exact md + lossless byte-stable round-trip incl. id links, minimal-node discriminator-still-emitted, in-column schema-HTML form, discriminator integrity, fail-open, active-punct filenames). Goldens in media-roundtrip / markdown-converter-golden / markdown-converter / diagram-roundtrip updated to the md+comment form (columns stay schema-HTML). The former known-limitation image-diagrams fixture is now byte- AND canonically-stable (canon #8 omits the diagram align="center" default) and was promoted from an it.fails into the green corpus (11-image-diagrams.json). git-sync stabilize.test.ts: the "diagram materializes data-align=center" fixpoint moved into a column (where the raw-HTML asymmetry still holds), since top level is now byte-stable.
package vitest: 540 passed; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
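A rough sketch of the three top-level forms follows. It assumes the image-form target is `![](src)` and uses alphabetical key order as a stand-in for the "stable key order" the commit mentions; `serializeMedia` and `discriminatorComment` are hypothetical names. The sketch also expects the link text to be escaped already and omits the `--` defusing of the JSON payload.

```typescript
type MediaForm = "image" | "link" | "standalone";

// Comment name -> markdown form, following the grouping in the commit message.
const MEDIA_FORMS = new Map<string, MediaForm>([
  ["youtube", "image"], ["video", "image"], ["audio", "image"],
  ["drawio", "image"], ["excalidraw", "image"],
  ["pdf", "link"], ["attachment", "link"], ["embed", "link"],
  ["pageembed", "standalone"], ["transclusion", "standalone"],
]);

// The discriminator comment is always emitted, even with an empty payload.
function discriminatorComment(name: string, attrs: Record<string, string | number>): string {
  const payload: Record<string, string> = {};
  for (const key of Object.keys(attrs).sort()) { // assumption: alphabetical order
    if (key === "align" && attrs[key] === "center") continue; // default omitted
    payload[key] = String(attrs[key]); // numerics are stringified
  }
  return Object.keys(payload).length === 0
    ? "<!--" + name + "-->"
    : "<!--" + name + " " + JSON.stringify(payload) + "-->";
}

function serializeMedia(
  name: string,
  src: string,
  escapedText: string,
  attrs: Record<string, string | number>,
): string {
  const form = MEDIA_FORMS.get(name);
  if (form === undefined) throw new Error("unknown media comment name: " + name);
  const comment = discriminatorComment(name, attrs);
  if (form === "image") return "![](" + src + ")" + comment;
  if (form === "link") return "[" + escapedText + "](" + src + ")" + comment;
  return comment; // standalone: the comment is the whole block
}
```

Because the comment name alone selects the node type, the importer never has to inspect the URL to decide what a target is.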
|
|
d7d8db2102 |
feat(prosemirror-markdown): images as `![alt](src)` + attached img-comment (#293 canon #4)
Every image now serializes as `![alt](src)`; non-default layout/identity attrs
that markdown cannot express ride along in an attached `<!--img {…}-->` comment
on the same line, replacing the prior "image-with-attrs -> raw <img>" split for
the top-level path:
![alt](src) <!--img {"width":"420","align":"left","attachmentId":"…"}-->
Keys (emitted only when non-default, stable order): width, height, align, size,
aspectRatio, attachmentId, caption, title. Numeric sizing attrs are stringified
in the payload (the import side reads DOM attributes back as strings), so a
numeric `width:420` round-trips byte-stably instead of churning `420 -> "420"`.
attachedCommentFor defuses any `--` in a value (e.g. a caption containing the
comment-closing `-->`) so the payload can never close the comment early.
Align default unified to "center" (#293 canon #4): editor-ext declares
image.align default "center" while this package's schema declared null — keeping
null would make the clean `![alt](src)` form dead code (every editor image is
"center"). Now the schema default is "center" (docmost-schema image align, with
explicit parseHTML/renderHTML), canonicalize KNOWN_DEFAULTS drops align=="center"
for image, and the serializer omits align when it is null OR "center". A null
align collapses to "center" on re-import (a null align is not a distinct editor
state) — stable, no ping-pong. Only left/right emit a comment.
Import: applyCommentDirectives gains an `img` handler that targets the comment's
previousElementSibling <img> and writes each decoded key to the DOM attribute
the schema reads (align, width, height, data-size, data-aspect-ratio,
data-attachment-id, data-caption, title), then removes the comment. Attached
only: a standalone `<!--img-->` with no adjacent image is inert. Fail-open on
malformed JSON / unknown keys.
Raw-HTML path unchanged in spirit: images inside columns/cells keep the
`<img …>` form (comments are dropped by the DOM parse stage); imageToHtml now
omits a redundant align="center" to match the unified default.
Tests: new image-comment.test.ts (21 cases incl. caption == `-->`, numeric-size
byte-stability, image-in-column <img> form, fail-open). Goldens updated
deliberately: markdown-roundtrip-spoiler-caption (captioned image -> comment
form), markdown-converter-gaps spec 14/15 (title now round-trips via comment;
column image drops redundant align), canonicalize-extra (center+null dropped,
left kept).
package vitest: 498 passed | 1 expected-fail; tsc clean. git-sync (rebuilt
build): 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
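The payload rules for the img-comment (fixed key order, stringified numerics, default align omitted) can be sketched as below. The key list is the one given in the commit message; treating `null`/`undefined` as "default" for every key is a simplifying assumption, and `imgPayload` is a hypothetical name. Wrapping the payload into the comment is left to `attachedCommentFor`.

```typescript
// Key order as listed in the commit message.
const IMG_KEYS = [
  "width", "height", "align", "size",
  "aspectRatio", "attachmentId", "caption", "title",
] as const;

type ImgKey = (typeof IMG_KEYS)[number];
type ImgAttrs = Partial<Record<ImgKey, string | number | null>>;

// Build the JSON payload for the attached comment, or null when every attr is
// at its default and the image needs no comment at all.
function imgPayload(attrs: ImgAttrs): Record<string, string> | null {
  const payload: Record<string, string> = {};
  for (const key of IMG_KEYS) {
    const value = attrs[key];
    if (value === null || value === undefined) continue; // assumed default
    if (key === "align" && value === "center") continue; // unified default
    payload[key] = String(value); // numeric 420 becomes "420"
  }
  return Object.keys(payload).length > 0 ? payload : null;
}
```

Stringifying on export matches what the import side reads back from DOM attributes, which is why a numeric width no longer churns between `420` and `"420"`.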
|
|
e814bca243 |
feat(prosemirror-markdown): subpages/pageBreak as standalone comments (#293 canon #5)
Move the two "invisible machinery" atoms off the <div data-type="..."> HTML
form onto standalone HTML comments on their own line, keeping the markdown
human-readable while still round-tripping:
subpages -> <!--subpages--> / <!--subpages {"recursive":true}-->
pageBreak -> <!--pagebreak-->
Adds standaloneCommentFor(name, attrs?) to attached-comment.ts (emits
`<!--name-->` when attrs are empty/absent, else `<!--name {compact-json}-->`).
The `--`-escaping + compact-JSON logic is factored into a shared internal
escapeCommentJson() so standaloneCommentFor and attachedCommentFor cannot drift
(verified byte-identical output for attachedCommentFor — no #9 regression).
Position determines legality (canon #5): subpages/pagebreak are honored ONLY
standalone; the same comment attached after visible text is inert. The parser
pass (applyAttachedComments renamed applyCommentDirectives) now also
materializes these standalone comments into the schema `<div data-type=...>`
element before generateJSON drops the comment node. A LEADING standalone
comment is parsed at document level (outside <body>); the pass walks the whole
document and re-inserts leading comments into <body> in document order, so
block order is preserved.
Raw-HTML path: blockToHtml gains explicit subpages/pageBreak cases emitting the
`<div data-type=...>` form. Comments are dropped by the DOM parse stage inside
columns/cells, so the div-form must stay there — this also fixes a latent
default-fallthrough (`<div></div>`) that silently dropped these atoms inside a
column.
Tests: new machinery-comments.test.ts (primitive, subpages default/recursive
exact strings + round-trip, pageBreak, subpages-inside-column div-form,
fail-open for attached-position/malformed, and multi-node document-order
regression locking the leading/mid/trailing comment ordering). Top-level
goldens in markdown-converter-golden/gaps updated deliberately to the comment
form; the columns/raw-HTML goldens keep the div-form.
package vitest: 477 passed | 1 expected-fail; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
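A minimal sketch of `standaloneCommentFor` and the shared `escapeCommentJson` helper follows. The function names are from the commit message, but the bodies are assumptions; in particular, the exact escape used for `--` is not recoverable from this page, so the sketch substitutes the JSON escape `\u002d`, which decodes back to a hyphen.

```typescript
// Compact JSON with every `--` pair defused so the payload cannot close the
// HTML comment early. ASSUMPTION: `\u002d` is used as the escape here.
function escapeCommentJson(attrs: Record<string, unknown>): string {
  return JSON.stringify(attrs).replace(/--/g, "-\\u002d");
}

// `<!--name-->` when attrs are empty or absent, else `<!--name {json}-->`.
function standaloneCommentFor(name: string, attrs?: Record<string, unknown>): string {
  if (!attrs || Object.keys(attrs).length === 0) return "<!--" + name + "-->";
  return "<!--" + name + " " + escapeCommentJson(attrs) + "-->";
}
```

With these definitions, `standaloneCommentFor("subpages", { recursive: true })` reproduces the `<!--subpages {"recursive":true}-->` form quoted in the commit message.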
|
|
f1ab76e879 |
feat(prosemirror-markdown): serialize textAlign as attached comment (#293 canon #9)
Move paragraph/heading textAlign off the HTML-wrapper form
(<p style="text-align:…"> / <hN style=…>) onto a trailing attached HTML
comment on the block line: `text <!--attrs {"textAlign":"center"}-->`. This
keeps the readable markdown block form (plain `text` / `## Title`) while
preserving alignment losslessly. "left"/null stay bare (no churn).
Adds a reusable attached-comment primitive (attached-comment.ts) that #4
(image) and #8 (media) will reuse:
- attachedCommentFor(name, json) -> `<!--name {compact-json}-->`, escaping any
`--` pair inside the JSON so the payload can never close the comment early;
- parseAttachedComment(data) with grammar `^\s*([A-Za-z][\w-]*)(?:\s+({…}))?\s*$`
whose name excludes `:`, so envelope comments (docmost:meta / docmost:comments)
never match — fail-open on anything malformed.
On import, applyAttachedComments runs AFTER marked.parse but BEFORE generateJSON
(parse5 drops comments), re-expressing the attrs comment as an inline
text-align style on the parent block, then removing the comment node.
Guards: emit only when there is a visible element to attach to — paragraph
requires non-empty text, heading requires non-empty headingText (symmetry:
an empty aligned heading stays bare `##`, no orphan comment).
Goldens in markdown-converter-golden/gaps updated deliberately to the
attached-comment form (assertions stay strict: exact output + lossless
round-trip). New textalign.test.ts (19 tests) covers center/right/justify on
paragraph and heading, byte-stable re-export, and fail-open branches.
Raw-HTML containers (columns/cells/callout via blockToHtml) keep the inline
text-align form intentionally — comments are dropped inside raw HTML.
package vitest: 462 passed | 1 expected-fail; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
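The attached-comment primitive can be sketched as an emit/parse pair. The grammar follows the regex quoted in the commit message, with the elided `{…}` part filled by an assumed `\{[\s\S]*\}`; the `\u002d` escape for `--` is likewise an assumption, and the function bodies are illustrative only.

```typescript
// Emit `<!--name {compact-json}-->`, defusing `--` inside the JSON.
// ASSUMPTION: `\u002d` stands in for the package's actual escape.
function attachedCommentFor(name: string, attrs: Record<string, unknown>): string {
  const json = JSON.stringify(attrs).replace(/--/g, "-\\u002d");
  return "<!--" + name + " " + json + "-->";
}

// Grammar from the commit message; the name excludes `:`, so envelope
// comments such as docmost:meta never match.
const COMMENT_GRAMMAR = /^\s*([A-Za-z][\w-]*)(?:\s+(\{[\s\S]*\}))?\s*$/;

// Parse the text between `<!--` and `-->`. Fail-open: anything malformed
// yields null rather than throwing.
function parseAttachedComment(
  data: string,
): { name: string; attrs: Record<string, unknown> } | null {
  const m = COMMENT_GRAMMAR.exec(data);
  if (!m) return null;
  if (m[2] === undefined) return { name: m[1], attrs: {} };
  try {
    const parsed: unknown = JSON.parse(m[2]);
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return null;
    return { name: m[1], attrs: parsed as Record<string, unknown> };
  } catch {
    return null;
  }
}
```

For example, `attachedCommentFor("attrs", { textAlign: "center" })` yields the `<!--attrs {"textAlign":"center"}-->` suffix shown in the commit message, and parsing its inner text returns the same attrs.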
|
|
6dcc19ce59 |
refactor(git-sync): consume @docmost/prosemirror-markdown, drop the duplicate lib (#293 stage 3 / no-op)
git-sync's converter-core (src/lib) was a byte-identical duplicate of the new @docmost/prosemirror-markdown package (created in the previous commit). Switch git-sync to consume the package and delete its copy, ending the duplication that the whole #293 effort targets. Pure no-op: NO format/behavior change.
- git-sync depends on @docmost/prosemirror-markdown (workspace:*); engine (stabilize/push/pull) + src/index barrel + 12 engine tests re-point their converter imports to the package.
- Delete git-sync/src/lib (8 files) and the 23 duplicate converter-core test files + their fixtures: the converter and its ~440 tests now live once, in the package. git-sync keeps only its ENGINE tests, which exercise the converter through the package (the no-op proof). Kept roundtrip-helpers.ts (an engine test imports firstDivergence from it; pure helper, no double-run).
- Added docmostExtensions to the package barrel (a kept engine schema-validity test needs it).
Verified: editor-ext + prosemirror-markdown + git-sync all tsc EXIT 0; git-sync vitest 28 files, 268 passed, 0 failures (engine cycle/roundtrip/push/pull/reconcile green = no-op proof); prosemirror-markdown vitest still 443 passed | 1 expected-fail; pnpm --frozen-lockfile EXIT 0; no ../lib refs remain in git-sync.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
d6d7dd82f6 |
feat(prosemirror-markdown): new headless converter package seeded from git-sync (#293 stage 1)
Create @docmost/prosemirror-markdown, the single framework-free ProseMirror<->Markdown converter + schema mirror that git-sync and mcp will both consume, ending the three-hand-synced-copies drift (#293). This step only CREATES the package (no consumer yet; git-sync untouched); the switch of git-sync and mcp onto it, plus the canonical format decisions, come in later commits of this PR.
- packages/prosemirror-markdown/src/lib/: the 8 converter-core files copied VERBATIM from packages/git-sync/src/lib (docmost-schema, markdown-converter, markdown-to-prosemirror, canonicalize, markdown-document, node-ops, page-file, index). Confirmed byte-identical: no behavioral drift introduced.
- src/index.ts barrel; package.json (@tiptap/* + jsdom/marked/zod, editor-ext workspace devDep for the contract test); tsconfig/vitest configs.
- 24 converter-core test files + fixtures copied (engine-coupled layout/redteam-layout-title tests correctly excluded because they import ../src/engine).
- pnpm-lock importer added; build/ gitignored (CI-built).
Verified (clean checkout, no network): pnpm --frozen-lockfile EXIT 0; tsc EXIT 0; vitest 23 files, 443 passed | 1 expected-fail (the same image-diagrams known-limitation carried from git-sync): a faithful extraction. git-sync untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|