2c2d60a5dcc0fc308ffd0ee2bb664757c7c2199a
237 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
6c82c54470 |
test(mcp): expect Obsidian '> [!info]' callout export in e2e (#333 canon)
PR #333 deliberately changed the canonical markdown export of callout
nodes to the Obsidian-native format ('> [!type]' + blockquote body,
pinned by packages/prosemirror-markdown unit tests); the importer still
parses both ':::type' fences and '> [!type]'. The get_page e2e assertion
was missed in that switch and still expected ':::info', failing the
e2e-mcp job on develop since
|
||
|
|
8978d69f3e |
Merge pull request 'fix(converter): стабильность round-trip image/медиа — «» ≡ absent (класс defaults-instability)' (#350) from fix/media-roundtrip-stability into develop
Reviewed-on: #350 |
||
|
|
c192f2a2e1 |
test(prosemirror-markdown): pin the third state — explicit "" converges once, then idempotent
Reviewer addition to the round-trip stability matrix: besides "attr absent" and "attr has a real value", a string attr in the empty-string class has a third, degenerate state — a LITERAL "" (a user types alt/title/name in the editor then deletes it, and Tiptap persists `attr: ""`, distinct from never-set). The fix's `getAttribute(...) || null` coercion normalizes such a stored "" to the default on the FIRST round-trip (a one-time "" -> null diff) and is byte-stable from the SECOND round-trip on. Adds a convergence contract to the reusable matrix helper (emptyStringClass flag + runConvergenceCase): pass 1 must converge the attr to its schema default (NOT asserted byte-stable vs the "" input — that is the intended one-time normalization); pass 2 must deep-equal pass 1 (idempotent thereafter). Driven for every empty-string-class attr across image + the media family (image/drawio alt+title, video alt via aria-label, pdf/attachment name, attachment mime). Documents the one-time normalization so a future sync/QA diff does not flag the single "" -> null change as converter corruption. Gate: package suite 33 files / 682 tests passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
2ce672709a |
fix(prosemirror-markdown): stabilize image round-trip — "" ≡ absent on parse (empty-string class)
A stored image authored without `alt` gained a phantom `alt: ""` on every
round-trip (`markdownToProseMirror(convertProseMirrorToMarkdown(doc))`): `marked`
renders `` as `<img alt="">`, and the stock tiptap Image `alt` parseHTML
(`getAttribute("alt")`) materialized the empty string where the original had no
attribute. That false diff is a real GS-EDIT-REVERT churn source — an agent /
git-sync touch of a page with an image mutates the stored JSON (`absent -> ""`),
producing phantom diffs that can overwrite live edits.
Fix is PARSE-SIDE ("" ≡ absent), so the RAW round-trip is idempotent — not only
the canonical form (history / stored JSON diff on the raw shape; masking it only
in canonicalize would leave that noise). `image.alt`/`title` parseHTML now coerce
`getAttribute(...) || null`, plus defense-in-depth `|| null` across the at-risk
empty-string class (video aria-label, drawio/excalidraw title+alt, pdf name,
attachment name+mime) matching the existing `image.caption || null` precedent.
NOTE — image `align` is NOT changed: it round-trips correctly (center via the
schema default "center", left/right via the `<!--img {...}-->` comment). Its
`toBeUndefined()` in the git-sync gate is canonical-form normalization, not a loss.
Intentional divergence from editor-ext: editor-ext's literal `alt` parseHTML
returns "" verbatim, but this coercion CONVERGES on editor-ext's real STORED
shape (an image inserted without alt has no `alt` attribute -> re-parses absent,
never ""), so the round-trip is idempotent and matches real documents.
Adds a reusable, node-agnostic round-trip-stability matrix helper
(test/roundtrip-stability.helper.ts) — given a node + attr spec it enumerates
default/non-default combos and asserts byte-stability of BOTH the raw and the
canonical round-trip (the documented numeric width/height→string coercion encoded
as an explicit allowed normalization) — driven over image + the whole media
family (video/audio/pdf/attachment/embed/drawio/excalidraw). The only raw
empty-string instability it found was image.alt; the family was already stable.
Gate: package suite 33 files / 672 tests passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
e431b33bb1 |
feat(ai-chat): deferred tool loading (tiers + loadTools meta-tool) (#332)
The in-app AI agent shipped all ~41 tool schemas on every model step. This adds a two-tier catalog: core tools (frequent or one-line) stay always-active; the rest are advertised as a compact catalog and their full schema is fetched on demand via the loadTools meta-tool, wired through ai@6 prepareStep's per-step activeTools. - tools/tool-tiers.ts: CORE_TOOL_KEYS, INLINE_TOOL_TIERS, applyLoadTools, catalog builders (+ tool-tiers.spec.ts, 13 cases). - ai-chat.service.ts prepareAgentStep: returns activeTools = [...CORE_TOOL_KEYS, loadTools, ...activatedTools]; per-turn activated Set. - ai-chat.prompt.ts: buildToolCatalogBlock renders the deferred catalog. - mcp/tool-specs.ts: tier + catalogLine metadata (external snake_case /mcp transport unchanged). - EnvironmentService.isAiChatDeferredToolsEnabled(): AI_CHAT_DEFERRED_TOOLS, default ON per issue intent (kill-switch =false restores old behavior). Gate: server ai-chat 631/631, tool-tiers 13/13, mcp 472/472, tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
eacc1c4811 |
Merge branch 'develop' of https://gitea.vvzvlad.xyz/vvzvlad/gitmost into feat/293-B-prosemirror-markdown-pkg
# Conflicts: # packages/mcp/build/client.js # packages/mcp/build/index.js # packages/mcp/build/tool-specs.js |
||
|
|
348dcd0802 |
Merge pull request 'feat(mcp): search_in_page — внутристраничный поиск для агента (#330)' (#339) from fix/330-search-in-page into develop
Reviewed-on: #339 |
||
|
|
086bc1bf8b |
docs(mcp): search_in_page regex desc names RE2, not JS regex (#330 review F5)
The RE2 swap narrowed the contract: regex:true rejects lookaround ((?=…)/(?<=…)) and backreferences (\1). The internal JSDoc was updated, but the AGENT-VISIBLE tool-spec (the only text the agent reads at call time, single-sourced to both transports) still said 'a JS regular expression' — so an agent would write a lookahead/backref and hit an error. Updated the .description and the regex flag .describe() to name RE2 (linear-time, ReDoS-safe), list that char classes / word boundaries / anchors / quantifiers work while lookaround and backreferences do NOT, and keep the 'invalid/unsupported regex -> clear error' note. mcp: tsc clean; tool-specs / server-instructions / contract tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
77b245461f |
fix(mcp): search_in_page regex via re2 (ReDoS-safe) + review DO F1-F4 (#330 review)
Maintainer decision on the escalated ReDoS fork: use re2. The regex path compiled agent-supplied patterns with `new RegExp` and ran them synchronously in the shared event-loop; a catastrophic-backtracking pattern (e.g. `(a+)+$`) hung the whole Node backend for all users (the tool is in both transports incl. the in-app apps/server agent), and size caps do NOT bound backtracking. Switch the regex engine to re2 (Google RE2, linear-time, no backtracking): - `new RE2(query, caseSensitive?'g':'gi')`. RE2 extends RegExp, so eachMatch and the zero-length-match lastIndex guard are unchanged. - Unsupported patterns are now a CLEAN error, not a hang: RE2 throws on invalid syntax AND on the backtracking-only features it can't do (lookaround (?=…)/(?<=…), backreferences \1) — caught at compile and returned as a clear tool error telling the agent to rewrite without them. - Removed MAX_CONTAINER_TEXT + the per-container slice (re2 is linear, so it's no longer a ReDoS defense, and truncating risked silently dropping real matches in a long container); kept MAX_PATTERN_LENGTH as a cheap query sanity cap. - Verified: `(a+)+$` over 50k `a` completes in ~4ms; lookaround/backref throw. - Added re2 (^1.21.0) to packages/mcp; lockfile updated. Reviewer DO items: - F1 [doc]: removed the false "pass nodeId as a comment anchor" claim (create_comment has no nodeId param — it needs a text `selection`). Fixed in tool-specs.ts + page-search.ts (module + SearchMatch JSDoc) + client.ts; the ref is for get_node/patch_node, and for a comment you build a unique text selection from before+match+after. - F2 [doc]: clarified `#<index>` refs (id-less table/cell) are accepted by get_node but NOT patch_node (id-only). - F3 [test]: round-trip — each match's nodeId fed to the real getNodeByRef (attrs.id node + `#<index>` table-cell) to prove the ref format is consumable. - F4 [test]: before/after edge-pinning (match in first 40 chars of a long container; index 0 → before==""; container end → after==""). - New re2 tests: catastrophic patterns complete fast; lookaround/backref → error. mcp: tsc clean; node --test 472 passed (+5). apps/server: tsc --noEmit clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
832c3cafdf |
test(mcp): update test-e2e.mjs listComments calls to the {items} shape (#328 review F1)
The listComments Comment[] -> { items, resolvedThreadsHidden } shape change
reached every src/host consumer but not the live-server e2e harness (run via
`node test-e2e.mjs`, not the node --test gate — so the green suite missed it).
The 4 calls now read .items; the post-resolve check passes includeResolved:true
so it still sees the now-resolved root c1 (the default feed hides it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
40d42d61e6 |
feat(mcp): search_in_page tool — in-page substring/regex search for the agent (#330)
Editorial roles (Corrector/Factchecker) brute-forced `get_node` block-by-block to
find occurrences (unquoted «ё», straight quotes, «т.е.»), burning tokens. New
`search_in_page(pageId, query, {regex?, caseSensitive?, limit?})` reads the page's
ProseMirror JSON via the existing getPageRaw and searches it IN MEMORY — no server
endpoint, no DB/schema change, no touch to the packages/mcp/src/lib schema mirror.
New pure `searchInDoc(doc, query, opts)` (packages/mcp/src/lib/page-search.ts):
recursive descent to each TEXT CONTAINER (paragraph/heading/table-cell paragraph),
glues its inline text via `blockPlainText` (a match survives inline-mark
boundaries — e.g. «т.е.» split across bold/italic), searches literal (indexOf) or
regex, and returns `{ total, truncated, matches:[{ nodeId, blockIndex, type,
before, match, after }] }`. `nodeId` is the container's attrs.id or the
`#<topLevelIndex>` of the enclosing top-level block — the SAME ref format
get_node/patch_node/comment-anchoring accept (verified identical to getNodeByRef),
so the agent goes straight from a hit to a targeted comment; `before`/`after` are
~40-char windows for a unique selection. `total`/`truncated` always reported (never
silent truncation). Lives in the SHARED_TOOL_SPECS registry → exposed in BOTH
transports (external /mcp + in-app AI-chat), with a SERVER_INSTRUCTIONS line and a
DocmostClientLike signature + contract-test entry. Corrector/Factchecker prompts
get a one-line "use search_in_page first" hint (versions bumped, catalog hash lock
refreshed).
Guards: empty/whitespace query → clear error; invalid regex → clear error (not a
generic 500); zero-length regex matches (`\b`, `a*`) skipped with lastIndex
advanced (no loop/flood); MAX_PATTERN_LENGTH=1000, MAX_CONTAINER_TEXT=100k bound
each exec; limit clamped [1,200] (default 50).
Tests: new page-search.test.mjs (17) — literal+regex, case-sensitivity,
mark-boundary glue, nodeId for paragraph/heading (attrs.id) and table-cell
(#<index> fallback), context bounds, limit/total/truncated + clamp, invalid
regex/empty/over-long errors, zero-length skip, empty-doc null-safety.
mcp: tsc clean; node --test 467 passed (+17). apps/server: tsc --noEmit clean
(DocmostClientLike + wiring). catalog check.mjs OK.
Known limitations (from internal review, non-blocking):
- Residual ReDoS: a crafted catastrophic-backtracking pattern (e.g. `(a+)+$`)
against a large single container can hang the event loop — JS regex is not
interruptible, so the length caps bound the base but not the backtracking.
Realistic exposure is low (containers are small; the pattern is supplied by the
authenticated model). Candidate for a follow-up hardening (safe-regex validation
or a worker+timeout) if it matters.
- Case-insensitive LITERAL search folds via toLowerCase; a char whose lowercase
differs in length (e.g. Turkish İ) BEFORE a match could shift the context
window — negligible for the RU/EN editorial scenario.
- On a `#<index>` table-cell fallback, `type` is the inline container ("paragraph")
while nodeId addresses the top-level block — addressing is correct; the field is
documented as the container's type.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
bcd194ee5d |
feat(mcp): hide resolved-comment anchors + feed from the agent (#328)
The AI agent (MCP + in-app chat) saw ALL comments incl. resolved via two channels, cluttering its context and breaking fragment search. Default now: the agent sees only ACTIVE discussions; resolved is opt-in. Active anchors and threads are always kept. Channel 1 — resolved comment anchors on agent reads (converter option): `convertProseMirrorToMarkdown(content, options?)` gains `options.dropResolvedCommentAnchors` (default false — zero change for every existing caller incl. git-sync). Both `case "comment"` emitters (top-level and the raw-HTML inlineToHtml path) emit BARE text (no `<span data-comment-id>`) when `resolved && the flag`; active anchors keep their wrapper. mcp `getPage` passes the flag; `export_page_markdown` does NOT (lossless export must preserve resolved anchors — that is why it is an opt-in option, not unconditional); `get_page_json` is untouched (lossless PM JSON). Built on the #293 package converter. Channel 2 — `list_comments` default active-only: `listComments(pageId, includeResolved=false)` now returns `{ items, resolvedThreadsHidden }` (was a bare array). By default a RESOLVED top-level thread is hidden wholesale — the root AND every reply anchored to it (a thread is gated only by its root's resolvedAt; a resolved reply under an ACTIVE root stays). `resolvedThreadsHidden` counts hidden threads so the agent knows to re-query. `includeResolved:true` returns everything. The `includeResolved` param is added to both tool registrations (MCP index.ts + in-app ai-chat-tools.service.ts); `DocmostClientLike` signature updated. Server `findPageComments` is NOT touched — the web UI's tabs depend on the full feed; filtering is only at the mcp-client level. All internal call sites (export_page_markdown / checkNewComments / transformPage) updated to `.items` with `includeResolved:true` to keep their full-feed behavior. The comment model is assumed FLAT (a reply's parentCommentId points at the thread root) — documented in the filter; a future reply-of-reply model would need a root-walk there. Tests: resolved-comment-anchors.test.ts (6 — anchor dropped with flag / kept without, for BOTH emitters; active always kept); list-comments-resolved.test.mjs (4 — resolved thread+reply hidden + counter; includeResolved:true returns all; an ACTIVE thread with a RESOLVED reply is NOT hidden). package vitest: 664 passed; tsc clean. mcp: node --test 458 passed; tsc clean. apps/server + git-sync: tsc clean (converter option default-off). NOTE: based on feat/293-B (#293/#326 STEP 5) — the converter lives in the package; this PR is stacked on #333 and its base retargets to develop once #333 merges. mcp/build is gitignored (not committed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
08222345ef |
fix(prosemirror-markdown): escape canon inline-extension triggers = $ ^ in link/alt text (#333 review F5)
F1 (round 1) wrapped the image alt in escapeLinkText, and that helper also guards the link-form media captions (attachment/pdf/embed). But its character class covered only stock CommonMark — NOT the Docmost inline EXTENSIONS this same PR registers on the marked instance: highlight `==x==` (canon #7), math `$x$` (canon #6), footnote `^[x]` (canon #2). Their triggers `= $ ^` are not CommonMark punctuation, so an alt or media filename like `x $A$ y`, `use ==bold==`, `^[fn]`, or `data $A$.csv` was silently turned into a math/highlight/footnote node on import — the same class of round-trip data loss F1 closed, reintroduced by this PR's own canon. Fix: add `= $ ^` to the escapeLinkText class (`/[\\`*_~[\]<&!()=$^]/g`). `\= \$ \^` decode back to literals (all ASCII punctuation) AND, being escape tokens, stop the extension tokenizer from matching — verified lossless byte-stable round-trip. Updated the helper comment to name the two trigger sets (CommonMark + Docmost inline extensions). Extended the adversarial round-trip tests: image alt gains `x $A$ y` / `5$ and 10$` / `use ==bold==` / `^[fn]` / `cost $5 == price`; pdf name gains `data $A$.csv` / `q3 ==final==.pdf` / `5$ and 10$.pdf` / `note ^[x].pdf` — all byte-stable with the node intact, so the hole can't reopen. package vitest: 658 passed; tsc clean. git-sync: 268. mcp: 454. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
baa41d66ad |
test(infra): coverage-gate + acceptInvitation atomicity int-spec + turn-end unit (#324)
Tail of #244. Three items: 1. Coverage-gate (main). develop had no coverage tooling at all. Added @vitest/coverage-v8@4.1.6 (pinned to the vitest already in use) to the three vitest packages — git-sync, editor-ext (which also gains its missing direct `vitest` devDep), apps/client — and enabled v8 coverage with per-package thresholds (no root vitest config exists, so per-package is the only meaningful scope). v8 provider is chosen deliberately: istanbul broke on the ESM `@docmost/editor-ext` barrel; v8 collects native runtime coverage and never re-parses ESM. `enabled: true` wires the gate into the plain `test` script, so `pnpm -r test` (the CI entrypoint) enforces it without a manual `--coverage`. Thresholds set ~4-5 pts below measured current coverage so the gate PASSES today and FAILS on regression (verified: forcing lines=95 on editor-ext exits 1). `all: false` — coverage counts test-touched files; documented in the configs (with `all: true` the many untested type/barrel files would sink the % and make the gate meaningless). Measured→threshold (S/B/F/L): git-sync 91.78/79.16/76.76/92.46 → 88/75/72/88; editor-ext 58.58/48.1/64.96/58.91 → 54/44/60/54; client 59.93/58/48.47/59.39 → 55/53/44/55. All exit 0. 2. acceptInvitation atomicity int-spec. New apps/server/test/integration/workspace-accept-invitation-atomicity.int-spec.ts (+ createDefaultGroup/createInvitation seeders in test/integration/db.ts per its convention). Wires the real WorkspaceInvitationService with real User/Group/GroupUser repos against the test Kysely, stubbing only the post-commit collaborators. Asserts the invariant protected by users_email_workspace_id_unique: (a) two CONCURRENT accepts → exactly one fulfilled, one BadRequestException('Invitation already accepted'), membership count == 1, invitation consumed; (b) repeated sequential accept → still one membership; (c) the survivor is in the workspace default group (whole-tx, no torn state). Ran against real Postgres+Redis: 3/3 pass. 3. turn-end decision unit test. `decideTurnEnd` does not exist as a symbol; the turn-end logic lives in chat-thread.tsx's onFinish handler. Added a focused block to the existing chat-thread.test.tsx (matching its hoisted-mock style): clean finish → flush queued (continue); abort/disconnect/error → queue preserved (end) with the correct notice; parent notified on every terminal outcome. 8 passed (3 existing + 5 new). Verified: git-sync 712, editor-ext 247, client 888 (all with the gate, exit 0); int-spec 3/3 (real Postgres); tsc --noEmit clean for client + server; pnpm install --frozen-lockfile consistent (lockfile additive). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
1a7b817250 |
fix(prosemirror-markdown): escape image alt + consolidate schema sanitizers + tidy (#333 review F1-F4)
F1 [critical, data-loss] — escape the image alt in ``. Canon #4 moved the top-level image off the lossless <img> form onto markdown ``, but the alt was inserted raw; the importer re-parses the `![alt]` label as CommonMark inline, so a markdown-active char in a realistic description ("Figure [1]", "the *new* logo", "a]b[c") broke the round-trip — the image node vanished or emphasis collapsed. Now `escapeLinkText(imgAttrs.alt ?? "")`, exactly as the link-form media (attachment/pdf/embed) already escape their visible text. Regression test added: six active-punctuation alts round-trip byte-stable with the node intact. F2 [drift] — re-export `clampCalloutType` / `sanitizeCssColor` from the package barrel and drop the verbatim copies in the mcp schema shim. The copies had already drifted (the mcp `clampCalloutType` lost the callout-type alias mapping the package applies), which is exactly the schema drift #293 exists to kill. The sanitizers now live only in the package; mcp `schema.test.mjs` exercises the single alias-aware implementation. F3 [docs] — AGENTS.md:296 said `packages/mcp/build/` is committed; this branch gitignored it (git-sync/prosemirror-markdown convention). Updated the line to say it is gitignored and rebuilt in CI/Docker via `pnpm build`. F4 [cleanup] — removed the dead `test.typecheck` block from the package vitest.config.ts and deleted tsconfig.vitest.json. Both were copied verbatim from git-sync; this package has zero `*.test-d.ts` files, and the ported comments referenced git-sync-only entities. Kept the `docmost-client` resolve alias (22 tests use it) and the runtime include/environment. package vitest: 658 passed (+1 F1 regression); tsc clean. git-sync: 268 passed. mcp: node --test 454 passed; tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
124f5a45a2 |
refactor(mcp): consume @docmost/prosemirror-markdown, drop the drifted converter copy (#293/#326 step 5)
mcp had its OWN drifted copy of the converter (markdown-converter.ts ~900 lines, docmost-schema.ts ~1270 lines, markdown-document.ts) — older than the shared package, missing the git-sync fixes AND the #293 canon. This switches mcp's converter CORE to @docmost/prosemirror-markdown, so mcp jumps straight to the canonical format and the drift-generating second copy is gone. - markdown-converter.ts / markdown-document.ts / docmost-schema.ts become thin re-export shims of the package (convertProseMirrorToMarkdown, the docmost:meta envelope, docmostExtensions + docmostSchema=getSchema(docmostExtensions)). The mcp-only helpers clampCalloutType/sanitizeCssColor are preserved verbatim in the schema shim (the package doesn't expose them via its barrel). ~2170 lines of the drifted converter/schema bodies deleted. - collaboration.ts drops its own ~360-line marked pipeline (preprocessCallouts, bridgeTaskLists, extractFootnotes, the footnoteRef extension) and re-points to the package's markdownToProseMirror, keeping markdownToProseMirrorCanonical and all the yjs/collab write glue. footnote-lex/analyze doc comments updated (they now describe advisory legacy-syntax diagnostics, not an importer). Schema parity verified: the package schema is a strict SUPERSET of mcp's old schema — every node and attr mcp declared is present (the package only adds status/pageEmbed/transclusion*/subpages.recursive/etc.), so nothing is silently dropped on the switch. The switch actually FIXES two pre-existing mcp data-loss bugs its own tests documented: htmlEmbed and pageBreak now round-trip (were dropped by the old mcp converter). Footnotes: the package assembles inline ^[body] footnotes on import (sequential fn-N ids, identical bodies merged), so mcp's canonicalizeFootnotes is now an idempotent no-op after it (verified). Legacy reference footnotes [^id]/[^id]: are inert literal text (canon #2 no-backward-compat) — lossless, the text survives verbatim. Build hygiene: packages/mcp/build/ is now gitignored and untracked, matching the git-sync/prosemirror-markdown convention (private package, rebuilt in CI/Docker, so src and prod can never silently diverge). This also removes a dead untracked build/_vendored_editor_ext/ artifact that a broad `git add` would otherwise commit. Dependency: packages/mcp/package.json gains @docmost/prosemirror-markdown (workspace:*); pnpm-lock.yaml gets the matching link importer (mirrors git-sync). mcp tests updated deliberately to the canonical forms (highlight ==, math $…$, image <!--img-->, drawio/media discriminators, subpages/pageBreak comments, textAlign, inline ^[…] footnotes) with strict assertions; 4 structural safety-net round-trip tests added. mcp: node --test 454 passed; tsc clean. package: 657 passed. git-sync: 268 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
b751852425 |
fix(prosemirror-markdown): converter inventory bugs — spoiler/link-title in raw-HTML, contract test, codeCombined dead code (#293)
The four bugs found during the #293 HTML-emission inventory, fixed in the package: 1. Spoiler mark was silently lost in the raw-HTML path: inlineToHtml (columns / spanned cells) had no `case "spoiler"`, so spoilered text there dropped the mark on round-trip. Now emits `<span data-spoiler="true">` — the same form the top-level serializer uses and exactly what the schema's Spoiler mark parses. 2. Link `title` was dropped in the raw-HTML path: inlineToHtml's link case emitted `<a href>` without the title. The schema's link mark carries a `title` global attr (DocmostAttributes), so a titled link inside a column now round-trips via `<a href … title=…>`. 3. Serializer contract test: emoji/date/toc were flagged as possibly caseless inline atoms. Verified they exist in NEITHER the package schema NOR editor-ext, so no node handling is needed today. Added serializer-contract.test.ts, which derives every node type from the live schema (getSchema(docmostExtensions)) and asserts each has an explicit serializer `case` — all 45 current node types are covered and present, and a future node added without a case will fail this test loudly. 4. codeCombined dead code: `const codeCombined = false` was hardcoded, so every `codeCombined ? <html> : <markdown>` ternary always took the markdown branch. Removed the variable and the dead HTML-alternative branches (bold/italic/code/ link/strike). Pure cleanup — output is byte-identical (goldens + full suite pass unchanged). The `hasCode` early-return (code excludes other marks) stays. Tests: spoiler-inside-column and link-title-inside-column round-trips, the serializer contract test + inline-atom non-empty behavioral checks. package vitest: 657 passed; tsc clean. git-sync: 268 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
65d81f745a |
feat(prosemirror-markdown): inline footnotes ^[text] (#293 canon #2)
Footnotes now use the single canonical Pandoc/Obsidian inline form: the note body is written AT the reference as `^[body]`, and the separate `<section data-footnotes>` list is NOT emitted in markdown — it is reassembled on import. New shared module src/lib/footnote.ts. Serialize (markdown-converter.ts): a top-of-convert pre-scan builds Map<id, definition> from the footnotesList; a footnoteReference emits `^[<rendered body>]` (body paragraphs joined by a literal `\n`, real backslash-n written `\\n`, stray unbalanced `[`/`]` escaped via balanceBrackets while a balanced `[link](url)` stays intact); footnotesList/footnoteDefinition emit nothing; an ORPHAN definition (no ref) is appended at doc end as its own `^[body]` line so bodies are never lost (intentional, documented). The raw-HTML path (inlineToHtml, columns) emits `<sup data-footnote-ref data-fn-text="…">`, carrying the text at the ref there too; blockToHtml keeps the schema `<section>`/`<div>` form for a list nested in a column. Parse (markdown-to-prosemirror.ts): a `^[…]` inline extension on the dedicated marked instance BALANCES brackets with a depth counter (respecting `\`-escapes), so `^[note [a] b]` captures the full content, unbalanced `^[` fails open to literal text. A post-marked assembleFootnotes pass collects every `<sup data-fn-text>`, dedups by the EXACT body string, assigns sequential ids (fn-1, fn-2, … first-seen), builds one `<div data-footnote-def>` per unique body in a single `<section data-footnotes>`, and strips data-fn-text. No hash is used (F1): dedup keying on the exact text makes an id collision between DIFFERENT bodies impossible, while identical bodies still merge; ids are never written to markdown, so round-trips stay byte-stable, and all id assignment is local to the one call (race-free). Correctness hardening from internal review: - F2: raw user backslashes in a footnote body are doubled (`\`->`\\`) at text emission (via a per-conversion inFootnoteBody closure flag) BEFORE the serializer's own escapes (`\[ \] \= \$`) are layered on, so a body ending in `\` (Windows path, LaTeX, regex) no longer breaks the `^[…]` envelope and round-trips exactly; parseInline decodes `\\`->`\`. The old `\n`->`\\n` step is subsumed by this and removed. - N1: assembleFootnotes runs to a FIXED POINT — parseInline of a def body can spawn a nested `<sup data-fn-text>` (a legal nested footnote `^[a ^[b] c]`), so the section is attached before the loop (querySelectorAll only sees attached nodes) and the scan repeats until no pending sup remains; the dedup map persists across rounds. Nested and 3+-level footnotes now round-trip byte-stably instead of silently dropping the inner body. Bounded by MAX_FOOTNOTE_ROUNDS as a fail-open safety net. - N2: the id counter is seeded past the highest existing fn-<N> so a reused section's ids can never collide with generated ones. - A literal `^[` in prose text is escaped `^\[` so it does not become a phantom footnote on re-import (codeBlock/inline-code excluded). No backward compat: reference form `[^id]`/`[^id]: def` is not parsed (stays literal). No existing golden asserted the old footnote HTML output. Tests: new footnote.test.ts (22 cases: basic byte-stable round-trip, bracket balancing, multi-paragraph `\n`, real backslash-n, dedup both directions, NESTED + 3-level nest, F1 hash-collision pair surviving as distinct defs, F2 backslash bodies byte-stable, N2 id-seed, column data-fn-text form, orphan def, no-backward-compat, literal-`^[` prose, fail-open, empty `^[]`). package vitest: 607 passed; tsc clean. git-sync: 268 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
bfbd927866 |
feat(prosemirror-markdown): math as $…$ / $$…$$ (#293 canon #6)
mathInline serializes as `$LaTeX$` and mathBlock as an own-line `$$\n<latex>\n$$` fence (multi-line safe), closing hand-authoring gap A18. The LaTeX still lives in node.attrs.text; a literal `$` inside it is escaped `\$`. On the raw-HTML path (columns/cells) math keeps the schema-HTML `<span data-type="mathInline">` / `<div data-type="mathBlock">` form (markdown is not re-parsed inside raw HTML) — blockToHtml gets an explicit mathBlock case and inlineToHtml a mathInline case, sharing the mathInlineHtml/mathBlockHtml helpers with the fallbacks so the two forms cannot drift. Parse: mathInlineExtension (inline) + mathBlockExtension (block) are added to the SAME dedicated marked instance introduced for canon #7 (global singleton untouched). The inline extension uses a currency-safe PANDOC rule: an opening `$` must not be followed by whitespace, and the closing `$` must not be preceded by whitespace nor followed by a digit — so `$5`, `$5 and $10`, `a $5 b $6 c`, `100$` stay literal text while `$x^2$` is math. The block extension matches a `$$` fence line and captures multi-line LaTeX non-greedily up to the next `$$` line. The pandoc boundary rule lives ONCE in the new math-inline.ts (INLINE_MATH_SOURCE) and is shared by the import tokenizer (^-anchored) and the export prose escaper (global), so parse and serialize cannot disagree about what is math. escapeProseMath (case "text", non-code runs only) escapes ONLY the two delimiting `$` of a span the rule WOULD match, so a would-be-math prose span like `the set $A$` re-imports as literal text while currency `$5 and $10` is emitted CLEAN (zero backslash churn). marked decodes `\$`→`$` on re-parse, byte-stable. Fallbacks to the lossless schema-HTML form (all documented + tested): mathInline → <span> when empty / whitespace-edged / multi-line / pre-existing `\$` / trailing `\` / immediately before a digit-text sibling (renderInlineChildren guard, so `$…$5` can't lose the node); mathBlock → <div> when the LaTeX contains `$$`. Each fallback round-trips losslessly and byte-stably. Code safety (guards the canon #7 regression class): codeBlock reads raw child text and inline `code` runs are excluded from escapeProseMath, so `$5`/`$x$` in code stay literal with no math and no backslash corruption. ReDoS-checked on adversarial 40k-char inputs (0–1 ms). Tests: new math.test.ts (26 cases: serialize exactness, multi-line block, `\$` escaping, currency ×5 asserting no `\$`, prose escape, columns schema-HTML, inline-code/codeBlock safety, fail-open). Goldens in roundtrip / markdown-converter flipped top-level math to `$…$`/`$$…$$`; the escapeAttr-idempotence golden wraps math in a column (still exercises escapeAttr); columns/raw-HTML math assertions unchanged. package vitest: 585 passed; tsc clean. git-sync: 268 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
77f5224b55 |
feat(prosemirror-markdown): highlight without color as ==text== (#293 canon #7)
A `highlight` mark WITHOUT a color now serializes as the Obsidian/GFM `==text==`
syntax (closing hand-authoring gap A19); a highlight WITH a color keeps the
`<mark style="background-color: …">` HTML form (condition is deterministic on
the color attr). On the raw-HTML path (columns/spanned cells) BOTH forms stay
`<mark>` via inlineToHtml — markdown is not re-parsed inside a raw-HTML block.
Parse: `==` is not standard markdown, so the importer uses a DEDICATED marked
instance (`new Marked().use({extensions:[highlightMark]})`) rather than the
global singleton — registered once, never leaks `==` behavior to other callers.
The inline extension tokenizes `==text==` (non-empty, non-space-leading inner,
lazy so `==a== ==b==` is two marks; inner re-tokenized so nested marks survive;
`====`/`==x` fail-open to literal) into `<mark>` with no color, which the schema
parses as a color-less highlight. Inline code (`` `a == b` ``) stays code via
marked token precedence. marked 17 defaults (gfm:true, breaks:false) are
identical for the fresh instance, so tables/strike/autolinks are unaffected.
Losslessness: a LITERAL `==` in a text run would otherwise be misparsed as a
highlight on the next import, so `case "text"` backslash-escapes each `=` of a
`==` pair (marked decodes `\=` back to `=`), and this round-trips byte-stably.
The escape does NOT run for inline-code runs, and — CRITICALLY — codeBlock now
reads its child text RAW (schema `content: "text*"`) instead of routing through
`case "text"`: marked does not decode `\=` inside a fence, so escaping there
would permanently stamp backslashes into any `==` comparison (ubiquitous in
source code) and corrupt the block on the git-sync data path.
Tests: new highlight.test.ts (19 cases incl. serialize forms, colored vs plain,
column `<mark>` path, nested marks, inline-code exclusion, literal-`==` escape,
fail-open, AND a codeBlock-with-`==` regression proving no backslash corruption
+ byte-stable round-trip). Golden inline-mark matrix flipped top-level no-color
highlight to `==m==`; the kept `<mark style=…>` assertions are the colored/
raw-HTML cases.
package vitest: 559 passed; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
e2a3b5fc4d |
feat(prosemirror-markdown): media family as md-form + discriminator comment (#293 canon #8)
Ten media/embed node types move their TOP-LEVEL serialization off raw schema HTML onto a readable markdown target plus an always-emitted discriminator comment whose NAME selects the node type. The schema-HTML form is retained on the raw-HTML/columns path (comments are dropped by the DOM parse stage there). image-form <!--name …--> youtube, video, audio, drawio, excalidraw link-form [text](src)<!--name …--> pdf, attachment, embed (text=filename/provider) standalone <!--pageembed …--> / <!--transclusion …--> pageEmbed, transclusionReference The comment NAME is the node-type discriminator and is ALWAYS emitted, even when the attr JSON is empty (`<!--youtube-->`), so a bare `` is never mistaken for an `image` and a bare `[t](u)` stays a plain link — no URL-sniffing. src rides in the markdown target; every other non-default attr (incl. the id links attachmentId/sourcePageId/transclusionId) rides in the comment JSON (stable key order, numerics stringified, align="center" omitted). New src/lib/media-html.ts: byte-exact builders reproducing the schema HTML each old processNode case returned. Both the serializer's raw-HTML path (blockToHtml, now de-delegated from `return processNode(block)` to explicit per-type cases) and the importer call these, so serialize and parse cannot drift. Import (applyCommentDirectives): image-form binds the preceding <img> (src from it), link-form the preceding <a> (src=href, text=filename/provider), standalone replaces the comment (same leading-doc-level handling as #5). Each rebuilds the schema element via the media-html builder, then swaps it in; the empty-<p> hoist is absorbed by stripEmptyParagraphs. Fail-open: wrong element/position/name or malformed JSON -> inert, no throw. Link-form visible text is escaped (escapeLinkText) for the FULL set of CommonMark inline-active punctuation (\ ` * _ ~ [ ] < & ! ( )), not just [ ] \: the label is parsed as inline content, so a filename/provider like `report *v2*.pdf` or `.pdf` would otherwise lose the markup (or fragment the parse) when the importer reads a.textContent back — a data-loss regression vs the old data-attachment-name form. Adversarial round-trip fixtures lock byte- and value-stability for emphasis/code/strike/autolink/entity/image markers and nested-link names. Tests: new media-comments.test.ts (40 cases: per-type exact md + lossless byte-stable round-trip incl. id links, minimal-node discriminator-still-emitted, in-column schema-HTML form, discriminator integrity, fail-open, active-punct filenames). Goldens in media-roundtrip / markdown-converter-golden / markdown-converter / diagram-roundtrip updated to the md+comment form (columns stay schema-HTML). The former known-limitation image-diagrams fixture is now byte- AND canonically-stable (canon #8 omits the diagram align="center" default) and was promoted from an it.fails into the green corpus (11-image-diagrams.json). git-sync stabilize.test.ts: the "diagram materializes data-align=center" fixpoint moved into a column (where the raw-HTML asymmetry still holds), since top level is now byte-stable. package vitest: 540 passed; tsc clean. git-sync: 268 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
d7d8db2102 |
feat(prosemirror-markdown): images as  + attached img-comment (#293 canon #4)
Every image now serializes as ``; non-default layout/identity attrs
that markdown cannot express ride along in an attached `<!--img {…}-->` comment
on the same line, replacing the prior "image-with-attrs -> raw <img>" split for
the top-level path:
 <!--img {"width":"420","align":"left","attachmentId":"…"}-->
Keys (emitted only when non-default, stable order): width, height, align, size,
aspectRatio, attachmentId, caption, title. Numeric sizing attrs are stringified
in the payload (the import side reads DOM attributes back as strings), so a
numeric `width:420` round-trips byte-stably instead of churning `420 -> "420"`.
attachedCommentFor defuses any `--` in a value (e.g. a caption containing the
comment-closing `-->`) so the payload can never close the comment early.
Align default unified to "center" (#293 canon #4): editor-ext declares
image.align default "center" while this package's schema declared null — keeping
null would make the clean `` form dead code (every editor image is
"center"). Now the schema default is "center" (docmost-schema image align, with
explicit parseHTML/renderHTML), canonicalize KNOWN_DEFAULTS drops align=="center"
for image, and the serializer omits align when it is null OR "center". A null
align collapses to "center" on re-import (a null align is not a distinct editor
state) — stable, no ping-pong. Only left/right emit a comment.
Import: applyCommentDirectives gains an `img` handler that targets the comment's
previousElementSibling <img> and writes each decoded key to the DOM attribute
the schema reads (align, width, height, data-size, data-aspect-ratio,
data-attachment-id, data-caption, title), then removes the comment. Attached
only: a standalone `<!--img-->` with no adjacent image is inert. Fail-open on
malformed JSON / unknown keys.
Raw-HTML path unchanged in spirit: images inside columns/cells keep the
`<img …>` form (comments are dropped by the DOM parse stage); imageToHtml now
omits a redundant align="center" to match the unified default.
Tests: new image-comment.test.ts (21 cases incl. caption == `-->`, numeric-size
byte-stability, image-in-column <img> form, fail-open). Goldens updated
deliberately: markdown-roundtrip-spoiler-caption (captioned image -> comment
form), markdown-converter-gaps spec 14/15 (title now round-trips via comment;
column image drops redundant align), canonicalize-extra (center+null dropped,
left kept).
package vitest: 498 passed | 1 expected-fail; tsc clean. git-sync (rebuilt
build): 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
e814bca243 |
feat(prosemirror-markdown): subpages/pageBreak as standalone comments (#293 canon #5)
Move the two "invisible machinery" atoms off the <div data-type="..."> HTML
form onto standalone HTML comments on their own line, keeping the markdown
human-readable while still round-tripping:
subpages -> <!--subpages--> / <!--subpages {"recursive":true}-->
pageBreak -> <!--pagebreak-->
Adds standaloneCommentFor(name, attrs?) to attached-comment.ts (emits
`<!--name-->` when attrs are empty/absent, else `<!--name {compact-json}-->`).
The `--`-escaping + compact-JSON logic is factored into a shared internal
escapeCommentJson() so standaloneCommentFor and attachedCommentFor cannot drift
(verified byte-identical output for attachedCommentFor — no #9 regression).
Position determines legality (canon #5): subpages/pagebreak are honored ONLY
standalone; the same comment attached after visible text is inert. The parser
pass (applyAttachedComments renamed applyCommentDirectives) now also
materializes these standalone comments into the schema `<div data-type=...>`
element before generateJSON drops the comment node. A LEADING standalone
comment is parsed at document level (outside <body>); the pass walks the whole
document and re-inserts leading comments into <body> in document order, so
block order is preserved.
Raw-HTML path: blockToHtml gains explicit subpages/pageBreak cases emitting the
`<div data-type=...>` form. Comments are dropped by the DOM parse stage inside
columns/cells, so the div-form must stay there — this also fixes a latent
default-fallthrough (`<div></div>`) that silently dropped these atoms inside a
column.
Tests: new machinery-comments.test.ts (primitive, subpages default/recursive
exact strings + round-trip, pageBreak, subpages-inside-column div-form,
fail-open for attached-position/malformed, and multi-node document-order
regression locking the leading/mid/trailing comment ordering). Top-level
goldens in markdown-converter-golden/gaps updated deliberately to the comment
form; the columns/raw-HTML goldens keep the div-form.
package vitest: 477 passed | 1 expected-fail; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
f1ab76e879 |
feat(prosemirror-markdown): serialize textAlign as attached comment (#293 canon #9)
Move paragraph/heading textAlign off the HTML-wrapper form
(<p style="text-align:…"> / <hN style=…>) onto a trailing attached HTML
comment on the block line: `text <!--attrs {"textAlign":"center"}-->`. This
keeps the readable markdown block form (plain `text` / `## Title`) while
preserving alignment losslessly. "left"/null stay bare (no churn).
Adds a reusable attached-comment primitive (attached-comment.ts) that #4
(image) and #8 (media) will reuse:
- attachedCommentFor(name, json) -> `<!--name {compact-json}-->`, escaping any
`--` pair inside the JSON as -- so the payload can never close the
comment early;
- parseAttachedComment(data) with grammar `^\s*([A-Za-z][\w-]*)(?:\s+({…}))?\s*$`
whose name excludes `:`, so envelope comments (docmost:meta / docmost:comments)
never match — fail-open on anything malformed.
On import, applyAttachedComments runs AFTER marked.parse but BEFORE generateJSON
(parse5 drops comments), re-expressing the attrs comment as an inline
text-align style on the parent block, then removing the comment node.
Guards: emit only when there is a visible element to attach to — paragraph
requires non-empty text, heading requires non-empty headingText (symmetry:
an empty aligned heading stays bare `##`, no orphan comment).
Goldens in markdown-converter-golden/gaps updated deliberately to the
attached-comment form (assertions stay strict: exact output + lossless
round-trip). New textalign.test.ts (19 tests) covers center/right/justify on
paragraph and heading, byte-stable re-export, and fail-open branches.
Raw-HTML containers (columns/cells/callout via blockToHtml) keep the inline
text-align form intentionally — comments are dropped inside raw HTML.
package vitest: 462 passed | 1 expected-fail; tsc clean. git-sync: 268 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
6dcc19ce59 |
refactor(git-sync): consume @docmost/prosemirror-markdown, drop the duplicate lib (#293 stage 3 / no-op)
git-sync's converter-core (src/lib) was a byte-identical duplicate of the new @docmost/prosemirror-markdown package (created in the previous commit). Switch git-sync to consume the package and delete its copy — ending the duplication that the whole #293 effort targets. Pure no-op: NO format/behavior change. - git-sync depends on @docmost/prosemirror-markdown (workspace:*); engine (stabilize/push/pull) + src/index barrel + 12 engine tests re-point their converter imports to the package. - Delete git-sync/src/lib (8 files) and the 23 duplicate converter-core test files + their fixtures — the converter and its ~440 tests now live once, in the package. git-sync keeps only its ENGINE tests, which exercise the converter through the package (the no-op proof). Kept roundtrip-helpers.ts (an engine test imports firstDivergence from it; pure helper, no double-run). - Added docmostExtensions to the package barrel (a kept engine schema-validity test needs it). Verified: editor-ext + prosemirror-markdown + git-sync all tsc EXIT 0; git-sync vitest 28 files, 268 passed, 0 failures (engine cycle/roundtrip/push/ pull/reconcile green = no-op proof); prosemirror-markdown vitest still 443 passed | 1 expected-fail; pnpm --frozen-lockfile EXIT 0; no ../lib refs remain in git-sync. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
d6d7dd82f6 |
feat(prosemirror-markdown): new headless converter package seeded from git-sync (#293 stage 1)
Create @docmost/prosemirror-markdown — the single framework-free ProseMirror<-> Markdown converter + schema mirror that git-sync and mcp will both consume, ending the three-hand-synced-copies drift (#293). This step only CREATES the package (no consumer yet; git-sync untouched); the switch of git-sync and mcp onto it, plus the canonical format decisions, come in later commits of this PR. - packages/prosemirror-markdown/src/lib/: the 8 converter-core files copied VERBATIM from packages/git-sync/src/lib (docmost-schema, markdown-converter, markdown-to-prosemirror, canonicalize, markdown-document, node-ops, page-file, index). Confirmed byte-identical — no behavioral drift introduced. - src/index.ts barrel; package.json (@tiptap/* + jsdom/marked/zod, editor-ext workspace devDep for the contract test); tsconfig/vitest configs. - 24 converter-core test files + fixtures copied (engine-coupled layout/ redteam-layout-title tests correctly excluded — they import ../src/engine). - pnpm-lock importer added; build/ gitignored (CI-built). Verified (clean checkout, no network): pnpm --frozen-lockfile EXIT 0; tsc EXIT 0; vitest 23 files, 443 passed | 1 expected-fail (the same image-diagrams known-limitation carried from git-sync) — faithful extraction. git-sync untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
f5d19f9728 |
Merge pull request 'build(git-sync): пакет @docmost/git-sync в develop, code-only (#326 step 1 / PR-A)' (#327) from feat/293-A-git-sync-package into develop
Reviewed-on: #327 |
||
|
|
351615e5bc |
prompt(mcp): fix inaccurate and misleading tool descriptions
Audit of all 41 tool descriptions against the actual implementation found factually wrong or misleading texts: - list_comments claimed '(paginated)' — it takes only pageId and returns ALL comments in one call (internal pagination); now also states that RESOLVED threads are included and how to filter them. In-app twin synced. - search claimed the limit default is 'applied by the client' — the client deliberately omits it so the SERVER applies its default. - create_page's '(automatically moves it to the correct hierarchy)' said nothing useful — now documents parentPageId nesting semantics; move_page drops the stale 'essential for organizing pages created via create_page'. - share_page now warns the page becomes accessible to ANYONE with the URL. - get_page (both transports) now explains inline <span data-comment-id> tags are comment anchors (incl. resolved) — markup, not page text. - patch_node/delete_node/insert_node pointed only at the expensive page-JSON view for block ids — now route through the cheap page outline first. - docmost_transform marks 'Примечания переводчика' as the DEFAULT notesHeading, overridable for non-Russian pages. Checks: @docmost/mcp tests 450/450 (incl. the server-instructions guard); server ai-chat-tools spec 20/20; mcp build/ artifacts rebuilt. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> |
||
|
|
1fda0ec8b0 |
prompt(mcp): rewrite SERVER_INSTRUCTIONS to cover all tools + guard test
The intent-routing guide had rotted: 17 of 41 registered tools were absent (get_outline, get_node, the whole table_* family, search, stash_page, sharing, page lifecycle), and two tips were actively harmful — 'read block ids via get_page_json' told agents to pull the whole ~100KB document when get_outline exists precisely to grab ids cheaply, and 'table cell -> patch_node by attrs.id' dead-ends because table nodes carry no attrs.id. - Rewrite SERVER_INSTRUCTIONS as intent clusters (READ / EDIT / PAGES / COMMENTS / HISTORY) covering every tool except get_workspace; add safety notes (share_page = PUBLIC, delete_page = soft) and a comment-anchor markup warning for get_page. - delete_page tool description: state SOFT delete / restorable explicitly. - MAINTENANCE RULE comments at both registration sites (index.ts, tool-specs.ts) + an AGENTS.md convention bullet: adding/renaming/removing a tool REQUIRES updating the guide. - New guard test (test/unit/server-instructions.test.mjs): extracts every registered tool name from source and fails when one is not mentioned in the shipped SERVER_INSTRUCTIONS (word-boundary match, so get_page can't hide behind get_page_json); EXCEPTIONS list is itself validated against the registry. SERVER_INSTRUCTIONS exported for the test. Tests: @docmost/mcp 450/450 (448 + 2 new). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> |
||
|
|
5edd75da42 |
build(git-sync): remove committed node_modules + close the nested-node_modules gitignore class (#327 review)
PR-A inherited a committed packages/git-sync/node_modules (31 files: pnpm store symlinks, .bin shims, and a committed vitest .vite cache) that arrived in develop with the dead build/ — the F2 junk class. The root .gitignore `/node_modules` is anchored, so nested packages/*/node_modules slipped through. - git rm --cached the 31 files. - .gitignore: `/node_modules` -> `node_modules/` (non-anchored) so nested package node_modules are ignored at any depth — closes the class, not just this instance. - Add explicit "@docmost/editor-ext": "workspace:*" devDependency to git-sync (schema-editor-ext-contract.test imported it via hoist; now declared). Re-verified in a clean checkout (all from local store, no network): pnpm install --frozen-lockfile EXIT 0; git-sync tsc EXIT 0; vitest 51 files, 711 passed | 1 expected-fail, 0 failures; schema-editor-ext-contract 2/2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
24b903aaf3 |
build(git-sync): land the @docmost/git-sync package into develop, code-only (#326 step 1 / PR-A)
The git-sync converter + engine source lived only on the #119 branch; develop had just the dead compiled build/. Bring the whole package (src + ~700 tests) onto develop under CI, with NO consumer wired — git-sync stays fully inert in develop (nothing in apps/server imports it), so runtime behavior is unchanged. This unblocks #293 (extract the shared converter package from the landed source) and lets #119's functionality land LAST, already writing the canonical format (per the #326 landing order). - packages/git-sync: src (lib converter + engine) + test corpus + configs. - Remove develop's dead committed packages/git-sync/build/; gitignore it (built in CI/Docker via pnpm build, never committed — no src/build drift). - pnpm-lock.yaml: add the @docmost/git-sync importer (a missing workspace package in the lock is a CI blocker). `pnpm install --frozen-lockfile` passes. - NO server integration / loader / Dockerfile runtime changes (those come with #119 at step 6). Verified: tsc clean; vitest 711 passed | 1 expected-fail, 0 failures, 0 type errors; pnpm --frozen-lockfile EXIT 0; apps/server has no git-sync import. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
588596fb2f |
prompt(agents): teach agent prompts to use comment suggestedText fixes (#315)
- editorial roles (ru/en): proofreader and line editor attach suggestedText replacements to targeted fixes; fact-checker ALWAYS attaches the ready correction for [Incorrect] verdicts; structural editor and narrator get a light-touch rule for in-place rewordings; role versions bumped and the content-hash lock refreshed - MCP SERVER_INSTRUCTIONS: route 'propose a concrete text fix for one-click human approval' to create_comment with suggestedText (unique-selection reminder); build/ artifacts rebuilt - AI-chat SAFETY_FRAMEWORK: mention the comment-suggestion capability so the default assistant offers ready fixes instead of only describing changes Checks: catalog check.mjs OK; @docmost/mcp tests 448/448; server ai-chat.prompt spec 28/28. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> |
||
|
|
48c1ec46f7 |
fix(comment): store the real anchored substring as expectedText + pin authz (#318 F1/F2)
F1 [blocking]: a suggestion whose anchor matched via normalization could never be applied (spurious 409). The comment mark lands on the doc's ACTUAL text (Docmost auto-converts to typographic quotes/dashes/nbsp), but the stored selection — used as expectedText at apply — was the raw ASCII agent input (+substring(0,250)). So replaceYjsMarkedText's strict joined!==expectedText always failed and threw "text changed" though nobody edited. Fix: new pure getAnchoredText(doc, selection) reconstructs the exact raw doc substring the mark covers (slicing identical to spliceCommentMark); on the suggestion path client.createComment stores THAT as selection, so expectedText equals the marked text and apply returns applied:true. Live anchoring still uses the raw agent selection (normalization still finds the anchor). Truncation raised 250->2000 (+ DTO @MaxLength(2000)) so the anchored substring is never cut below the mark span. Ordinary comments unchanged. AI-chat shares client.createComment, so covered. Regression tests: getAnchoredText raw-vs-ASCII; create payload selection is the typographic substring; apply with typographic expectedText -> applied. F2 [blocking]: added comment.controller.spec.ts pinning that validateCanEdit runs before applySuggestion (Forbidden -> applySuggestion never called; happy path -> called; missing comment -> 404 without authorizing). MCP 448 pass; server comment+yjs 54 pass. MCP build/ rebuilt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
cd539558ed |
feat(agent-tools): suggestedText on create_comment with strict anchor uniqueness (#315 phase 6)
Agents can attach a suggested replacement when creating an inline comment, via both the MCP create_comment tool and the AI-chat createComment tool. Because applying a suggestion edits the EXACT anchored text, an ambiguous anchor would let Apply corrupt the wrong occurrence. So when suggestedText is set the selection must occur EXACTLY ONCE: - new countAnchorMatches(doc, selection) counts occurrences across all blocks (same normalization/traversal as canAnchorInDoc), counting occurrences (2 in one block => 2) — stricter than block-count, never under-counting distinct occurrences (false-unique is the dangerous direction). - client.createComment gains suggestedText: a pre-check (getPageJson + countAnchorMatches: 0 => not-found, >=2 => ambiguity error) before create, and an AUTHORITATIVE live check inside the anchoring mutation that recomputes on the live doc and, if != 1, aborts and rolls back the just-created comment (reusing the existing safeDeleteComment "anchor not found" path). Ordinary comments keep first-occurrence behavior unchanged. - suggestedText is rejected on a reply or without selection in all three layers (MCP handler, MCP client, AI-chat tool), mirroring the server DTO/service. - filterComment surfaces suggestedText/suggestionAppliedAt/suggestionAppliedById. - DocmostClientLike.createComment signature updated. MCP build/ rebuilt. Tests: countAnchorMatches (0/1/N, within/across/nested block, span nodes, quote normalization); createComment (ambiguous refused pre-create, reply and no-selection rejected, unique succeeds and forwards suggestedText, filterComment surfaces it); ai-chat schema accepts suggestedText. MCP 443 pass; ai-chat 601 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
36b3539571 |
Merge pull request 'refactor(ai-chat): move patch_node/insert_node into the shared tool-spec registry (#294)' (#305) from refactor/294-tool-spec-registry into develop
Reviewed-on: #305 |
||
|
|
86c1307ed2 |
fix(#300 review): drop stray symlink, re-fetch enriched on comment update, cover history mapping (F1/F2/F3)
F1: remove an accidentally-committed self-referential symlink
packages/mcp/node_modules/node_modules -> an absolute build-machine path (leaked a dev
home path, a pnpm artifact useless in the repo), and add a targeted ignore so it can't
recommit.
F2: the commentUpdated broadcast re-emitted the caller's pre-loaded comment mutated in
place, so the {agent,launcher} stack survived only because the controller happened to
load it with includeCreator:true — the fragile coupling that let the stack vanish on
edit once already. update() now RE-FETCHES the enriched comment before broadcasting,
symmetric with create()/resolveComment() (the row is already persisted), so all three
broadcasts carry the stack regardless of any caller's pre-load. Adds a caller-contract
test asserting all three broadcasts emit agent/launcher for an agent comment and neither
for a non-agent one, spotlighting the update path (non-vacuous vs the old re-emit).
F3: add a direct test of the page-history attachPageHistoryAgent mapping (its distinct
lastUpdatedSource/lastUpdatedAiChatId/lastUpdatedBy column set): role / no-role / MCP /
non-agent, and that the internal agentRole join column is stripped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
f720151c63 |
refactor(ai-chat): move patch_node/insert_node metadata into the shared tool-spec registry (#294)
The same tool metadata (zod schema + model-facing description) was hand-duplicated between the standalone MCP server and the in-app AI-chat agent, so every tweak had to land in two places and copies drifted (a materialized parity bug). The shared transport-agnostic registry (packages/mcp/src/tool-specs.ts) already de-duplicates 14 tools; this migrates two more genuinely-identical ones — patch_node/patchNode and insert_node/insertNode. The canonical description is a strict SUPERSET of both originals (keeps MCP's "without resending the whole document" + table-structure/anchor guidance AND the in-app "reversible via page history" / "exactly one of anchorNodeId or anchorText" framing — no model-facing guidance dropped); the schema is identical (the in-app side just gains MCP's .min(1) on ids, a safe tightening). Each transport keeps its own execute/auth wrapper, and the in-app parseNodeArg node-arg normalization is unchanged. The three table tools are intentionally NOT merged (a real param-name divergence: table vs tableRef) — documented on both sides. Other per-transport divergences (search/share/create_comment/transform/list_pages) are left separate with a short comment explaining why (the issue asked to flag these as intentional). DocmostClientLike stays a hand-mirror (the ESM/CJS boundary blocks a compile-time type import; a runtime drift-guard already pins it). Also fixes a latent contract-spec bug: derive `required` from `instanceof z.ZodOptional` (matches the emitted JSON schema) instead of `isOptional()`, which wrongly reported z.any() fields as optional. Partially addresses #294. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
0968ea97d2 |
feat(ai-chat): agent avatar stack — agent in front, launcher behind (#300)
For AI-agent-authored content (comments + page history), replace the text AI-AGENT badge with an avatar stack: the agent in front, the human who launched it smaller and behind. This fixes the inverted hierarchy (the action was the agent's; the human just launched it). closes #300. Backend: a single server-authoritative resolver resolveAgentProvenance normalizes to { agent, launcher } from server columns only (createdSource/lastUpdatedSource, aiChatId, creator, chat role) — nothing from request input, so agent identity can't be spoofed. Internal chat -> agent = chat role (name/emoji), launcher = human; external MCP (aiChatId null) -> agent = the agent account, launcher = null; non-agent -> neither. The role join (aiChatId -> ai_chats.role_id -> ai_agent_roles) deliberately does NOT filter enabled/deleted_at, so a later-disabled role still labels historical content (mirrors findById, not findLiveEnabled). Enrichment is applied on BOTH findPageComments (list) AND findById (the create/resolve/update broadcast path), so the stack shows on live comment events and doesn't vanish on resolve/edit. Frontend: new AgentAvatarStack + AgentGlyph (avatarUrl -> role emoji on violet -> IconSparkles on violet), integrated into comment-list-item and history-item where the badge was; the deep-link-to-chat click moved onto the stack. ai-agent-badge removed. Tests: AgentAvatarStack (role/no-role/MCP/click/non-clickable), the provenance resolver + recorder tests proving the role join never filters enabled/deleted, and findById enrichment (guards the live-broadcast regression). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4d8315da5c |
docs(#298 review): document the browser-safety invariant of the isNodeRuntime guard (F1)
The whole fix's correctness rests on isNodeRuntime being false in the browser (so the interactive live-DOM comment branch still runs), and that is NOT covered by any test (client vitest runs under jsdom->node where isNodeRuntime is true). Document it: Vite substitutes only process.env, not the bare process object, so typeof process is undefined in the client bundle; do not add a process polyfill without revisiting this guard, or comment interactivity dies silently. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
3f7e1bdc7b |
fix(export): stop comment.renderHTML returning a live jsdom node on the server (#298)
Page/space export (Markdown & HTML, both via jsonToHtml -> generateHTML) crashed with "Export failed:undefined" on any page carrying a `comment` mark. Root cause: comment.renderHTML returned a LIVE DOM node (document.createElement + a click listener) whenever a global `document` existed — and the in-process MCP module injects a jsdom global.window+global.document into the Node server, defeating the old `typeof document === "undefined"` guard. The server export runs happy-dom's DOMSerializer, which crashes appending the foreign jsdom node (NodeUtility.isInclusiveAncestor -> "Cannot read properties of undefined (reading 'length')"). comment is the only extension returning a live node. Fix: widen the guard with an isNodeRuntime check (process.versions.node) so on any Node runtime renderHTML returns the plain, serializable spec array — even when MCP injected jsdom globals. The browser branch (createElement + click -> ACTIVE_COMMENT_EVENT) is untouched, so in-editor comment interactivity is preserved (Vite defines only process.env as a member-expression substitution, no `process` object in the browser bundle, so isNodeRuntime is false there). The mcp schema mirror already returns a spec array and is not on the export path (tiptapExtensions imports Comment from @docmost/editor-ext), so no mirror change is needed. Also: export-modal now reads the real error text from the response Blob (responseType:'blob' made err.response.data.message always undefined) so a failed export shows the server's message instead of "undefined". Adds a regression test that runs the real jsonToHtml on a comment-marked doc with jsdom globals injected (reproduces the crash on the unpatched code, passes after). closes #298 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
5664da57ad |
feat(editor): center inline image rows by default via CSS :has()
Follow-up to #284: rows of inline-aligned images were pinned left while a single image defaults to centered — inconsistent. A row has no DOM wrapper (each image is an independent block node), so its placement is controlled by the text-align of the nearest block ancestor. - media.css: enable text-align:center only on containers that actually hold a direct inline-image child (:has), and reset every other child back to text-align:start so ordinary text is unaffected; explicit per-block toolbar alignment (inline style) still wins; browsers without :has() keep the previous start-pinned rows - image.ts: comment in the inline branch now points to the media.css rule (cross-package discoverability), no code change Reviewed: math/caption/table-header/footnote text-align rules audited; React node views are wrapped in .react-renderer, so .mathBlock is not a direct child and keeps its own centering (verified in happy-dom). |
||
|
|
20032be921 |
feat(editor): inline image alignment — place several images side by side
Add a new value "inline" to the image align attribute (alongside
left/center/right/floatLeft/floatRight). Inline images render as
inline-block containers, so consecutive ones form a row that wraps
naturally on narrow viewports; unlike the float modes, text does not
wrap around them.
- applyAlignment: reset-then-apply extended to display/vertical-align;
the reset restores the constructor's inline display:flex so non-inline
modes keep byte-identical styles and editor-ext stays independent of
the client CSS class
- image bubble menu: new "Inline (side by side)" button (IconLayoutColumns)
with active state, mirroring the float buttons
- i18n: key registered in en-US and ru-RU ("В ряд"), like the float labels
- tests: 3 new applyAlignment specs (apply, reset on switch-away, float->inline)
- no schema/MCP/markdown changes needed: align round-trips as data-align
|
||
|
|
e04afee629 |
test(#260): cover replaceImage's UUID lock-key invariant; drop dead cache line
Reviewer round 1 on the #260 collab-doc-name fix: - F1: replaceImage is the one path where the resolved UUID gates BOTH the collab-doc open AND the per-page mutex key (withPageLock(pageUuid)). Add a deterministic test to resolve-page-id-collab-doc-name.test.mjs: it gates /files/upload so replaceImage parks mid-upload holding its lock, asserts the doc opened as page.<uuid> (never page.<slug>), and probes the SHARED page-lock chain — a withPageLock(UUID) probe must stay blocked while replaceImage holds it (with a free-key probe as a non-vacuity guard). The test fails if the lock key is reverted to the slugId (verified). - F2: drop the dead `pageIdCache.set(uuid, uuid)` — resolvePageId returns on the isUuid() short-circuit before the cache is ever read with a uuid key, so only slugId->uuid entries are stored/read. Comment corrected to match. MCP suite 430/430, tsc 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
3b80285d57 |
fix(#260): open MCP collab docs by canonical UUID (slugId doc-name split)
Real root cause of the silent MCP edit loss: the web editor always opens the
collaboration document by the page UUID (`page.${page.id}`), but the MCP
opened it by the agent-supplied id — usually a slugId — so `page.${pageId}`
became `page.<slugId>`. For one DB page that is TWO independent Yjs documents;
both persist to the same `pages` row (findById/updatePage resolve id or
slugId), so the human tab's debounced store overwrites the agent edit
(last-store-wins) — gone after reload, never shown live. The slugId doc also
made the server's transclusion sync + embedding reindex throw Postgres 22P02.
Fix:
- MCP (primary): resolvePageId(pageId) returns the canonical UUID — a UUID
short-circuits with no network call, a slugId resolves once via getPageRaw
and is cached both ways. Every collab-write path (mutatePageContent /
updatePageContentRealtime / replacePageContent and the mutate/replace/
unlocked seams) now opens by the resolved UUID, so the MCP and the editor
share ONE Yjs doc. replaceImage's whole-operation page lock also keys on the
UUID so it serializes against the other (now-UUID-keyed) writes.
- Server (defense + kills the 22P02 noise): onStoreDocument passes the resolved
page.id — not the raw doc-name id — to syncTransclusion, the embedding queue,
the mention-notification job, addContributors, and the in-tx history read.
Content store and the empty-guard are untouched.
Tests: a new MCP test stands up a real Hocuspocus server and asserts a slugId
input opens `page.<uuid>` (never `page.<slugId>`), with UUID short-circuit and
single-resolve caching; the server spec asserts the side-effects receive the
UUID for a `page.<slugId>` doc. closes #260
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
f8d26420eb |
test(mcp): add stashPage to HOST_CONTRACT_METHODS (fix drift-guard)
stashPage is declared in the server's DocmostClientLike interface and shipped as the stash_page MCP tool (client.ts, tool-specs.ts, index.ts), but the hand-maintained HOST_CONTRACT_METHODS mirror in the contract test was never updated — so the drift-guard test failed and broke CI's unit-test job. Add the missing name; both directions now agree. |
||
|
|
14f83abe78 |
fix(editor-ext): remove duplicate escapeHtmlAttr (TS2393, broken CI)
Merging the image-captions (#221) and lossless-export branches each added its own escapeHtmlAttr in turndown.utils.ts, producing two implementations of the same function and breaking `tsc --build` (TS2393) — which failed the Build editor-ext step across all CI jobs. Drop the lighter image-captions duplicate (escapes & and ") and keep the fuller version (escapes & " < >). It is a strict superset: both call sites (serializeAttrs, the image rule) place the value inside a double-quoted HTML attribute, where extra < > escaping is harmless and idempotent on re-import. Verified: editor-ext builds; turndown.dataloss + image-markdown tests pass. |
||
|
|
22ea387495 |
Merge pull request 'feat(#246): inline spoiler mark (blur + click-reveal, lossless Markdown)' (#259) from feat/246-spoiler into develop
Reviewed-on: #259 |
||
|
|
b56a1629d2 |
Merge pull request 'feat(editor): image captions (figcaption) with lossless markdown round-trip (#221)' (#233) from feat/221-image-captions into develop
Reviewed-on: #233 |
||
|
|
7e6dd457a4 |
Merge pull request 'refactor(#193): tool-host drift-guard + staged plan (shared spec registry already merged)' (#249) from refactor/193-tool-spec-registry into develop
Reviewed-on: #249 |
||
|
|
9bbac29bc5 |
Merge remote-tracking branch 'gitea/develop' into HEAD
# Conflicts: # apps/server/src/collaboration/extensions/persistence-store.spec.ts # apps/server/src/collaboration/extensions/persistence.extension.ts |