feat(mcp): serve embedded community MCP server at /mcp

Replace the removed enterprise EE MCP (private apps/server/src/ee submodule,
license-gated /mcp route) with our docmost-mcp, vendored as an isolated ESM
workspace package and served by the server over HTTP — no enterprise license.

Backend:
- Add packages/mcp (@docmost/mcp): vendored docmost-mcp refactored into a
  side-effect-free createDocmostMcpServer() factory (38 tools preserved),
  stdio entry kept in stdio.ts, Streamable-HTTP session manager in http.ts.
- Add apps/server McpModule: @Post/@Get/@Delete('mcp') (served at /mcp via the
  existing global-prefix exclude), @SkipTransform + reply.hijack to bridge raw
  Fastify req/res into the SDK transport. The module dynamically imports the
  ESM-only package from CommonJS via a Function-indirected import resolved with
  require.resolve + file:// URL. Gated by the workspace ai.mcp toggle, a
  service-account (MCP_DOCMOST_EMAIL/PASSWORD/API_URL) and optional MCP_TOKEN;
  per-session idle eviction (MCP_SESSION_IDLE_MS).
- Drop the enterprise license check on mcpEnabled in workspace.service.
- Dockerfile: copy packages/mcp into the production image.
- .env.example: document MCP_DOCMOST_*, MCP_TOKEN, MCP_SESSION_IDLE_MS.

Frontend:
- Recreate the community "AI & MCP" workspace-settings panel (mcp-settings.tsx):
  admin-only toggle on settings.ai.mcp with optimistic update, copyable
  ${APP_URL}/mcp URL; wired into workspace-settings page. Reuses existing i18n.

Fixes:
- Pin packages/mcp tiptap deps to 3.20.4 (matching the client) and inline
  getStyleProperty, preventing a duplicate @tiptap/core@3.26.1 from leaking into
  the client editor via pnpm shamefully-hoist (was breaking apps/client tsc).
vvzvlad
2026-06-16 23:54:53 +03:00
parent 1b693edf2b
commit 1f5987d6b0
92 changed files with 21690 additions and 7 deletions

packages/mcp/README.md Normal file

@@ -0,0 +1,357 @@
# Docmost MCP Server
**English** · [Русский](README.ru.md)
A Model Context Protocol (MCP) server for [Docmost](https://docmost.com/) that lets
AI agents **read, search, write, restructure, review, version, comment on, illustrate
and publish** documentation — safely, against a live instance, without an enterprise
license.
> **Written by an agent, for agents.** A human edits a document with their eyes and hands:
> they read it, click into the editor, and retype. An agent works differently — it is far
> better at *writing a small function that fixes the text* than at re-reading and
> re-emitting a whole document. So this server is built around the way a model actually
> wants to edit: address a block by id, run a find/replace, or hand it a
> `(doc, ctx) => doc` transform and let it *program* the change. `docmost_transform` is
> that interface. Other Docmost MCPs are human-shaped — they expose "open the page" and
> "replace the page"; this one exposes the editing primitives a model is good at.
It exposes **38 tools** built around three ideas that the other Docmost MCPs do not
combine:
1. **Surgical, token-cheap edits.** Address a single block by id and patch it, or run
a find/replace, instead of round-tripping a whole ~100 KB document through the model.
2. **Safe live writes.** Every mutation goes through Docmost's real-time collaboration
layer (the same WebSocket the web editor uses), serialized per page, so it never
clobbers a concurrent human edit and is confirmed persisted before the tool returns.
3. **A real safety net.** Version history, a Docmost-equivalent diff, a one-call
restore, and a dry-run preview for scripted rewrites — so an agent can edit
boldly and you can always see and undo what it did.
---
## Why this server (vs. the alternatives)
There are several Docmost MCPs. Here is a capability-by-capability comparison.
"Official" is Docmost's built-in MCP; the others are the community projects on GitHub.
| Capability | **This server** | Official (built-in) | MrMartiniMo/docmost-mcp | cyborgx0x/mcp-docmost | aleksvin8888 / isak-landin |
| --- | :---: | :---: | :---: | :---: | :---: |
| **Enterprise license required** | **No** | **Yes** | No | No | No |
| Authentication | email + password, **auto re-auth** | API key | email + password | cookie `authToken` (copy from DevTools) | Docmost API / **direct PostgreSQL** |
| Read page as Markdown | ✅ | ✅ | ✅ | ✅ | ✅ (read-only) |
| **Lossless Markdown round-trip** (export / import, keeps comment anchors) | ✅ | — | — | — | — |
| Read **lossless ProseMirror JSON** (with block ids) | ✅ | — | — | — | — |
| **Compact page outline** (cheap block-id lookup) | ✅ | — | — | — | — |
| **Fetch a single block** (by id or index) | ✅ | — | — | — | — |
| Create / move / delete pages | ✅ | ✅ | ✅ | ✅ | — |
| **Per-block edits** (patch/insert/delete by id) | ✅ | — | — | — | — |
| **Surgical find/replace** (structure-preserving) | ✅ | — | — | — | — |
| **Scripted JS transform** (sandboxed, dry-run diff) | ✅ | — | — | — | — |
| **Structured table editing** (row / cell CRUD) | ✅ | — | — | — | — |
| Page **version history** | ✅ | — | — | ✅ | — |
| **Diff two versions** | ✅ | — | — | — | — |
| **Restore a version** (revertible) | ✅ | — | — | — | — |
| **Comments** (CRUD + inline anchoring) | ✅ | — | — | ✅ | — |
| **Poll for new comments** since a timestamp | ✅ | — | — | — | — |
| **Images** (insert / replace) | ✅ | — | — | — | — |
| **Public share links** (create / revoke / list) | ✅ | — | — | — | — |
| Export to HTML / PDF | — | — | — | ✅ | — |
| **Safe real-time-collab writes** (no clobber, confirmed) | ✅ | n/a | ✅ | — | n/a (read-only) |
### What that means in practice
- **No enterprise tax.** Docmost's official MCP is an enterprise feature: it needs an
active enterprise license. This server is MIT-licensed and talks to *any* self-hosted
Docmost over the standard API and collaboration socket, using nothing but an account
email and password.
- **Token-efficient editing.** Most Docmost MCPs (and the official one) only offer
"replace the whole page" writes — the agent must download the entire document, mutate
it, and upload it back, paying for the full document **twice** on every tiny fix.
This server lets the agent change exactly one block (`patch_node` / `insert_node` /
`delete_node`), do a structure-preserving find/replace (`edit_page_text`), or copy a
whole page server-side (`copy_page_content`) — **without the document ever passing
through the model**.
- **Writes that don't fight the editor.** Naive REST writes race with whatever a human
is typing and can silently overwrite their edits, or fail against Docmost's debounced
save. This server applies every change through the live collaboration document
(Hocuspocus/Yjs), reading and writing **synchronously inside one sync tick** so no
concurrent edit can interleave, serializing writes **per page** with a mutex, and
**waiting for the server to acknowledge persistence** before returning. If the socket
drops mid-write, the tool errors instead of falsely reporting success.
- **Agent-native editing model.** Human-facing servers expose "open the page" and "replace
the page", because that mirrors how a person works. A model edits better by *programming*
the change — addressing blocks by id, running a find/replace, or supplying a
`(doc, ctx) => doc` transform (`docmost_transform`, with a dry-run diff before it
commits). This server is shaped around that, which is why it has editing primitives the
others simply don't.
- **An editing safety net the others lack.** `list_page_history` → `diff_page_versions` →
`restore_page_version` give an agent (and you) a full view-and-undo loop. The diff
uses the *same* `recreateTransform → ChangeSet → simplifyChanges` pipeline Docmost's
own history viewer uses, so what you see matches the product.
- **Convenience over cookie-scraping.** Some community servers authenticate by making
you copy a session cookie out of your browser's DevTools (it expires), or by reaching
**directly into the PostgreSQL database**. This server logs in with credentials and
**transparently re-authenticates on
a 401/403** (with in-flight de-duplication), so long-running agents don't die when a
token expires. It also respects Docmost's own access control, because it goes through
the API and the collaboration server like a normal user.
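
The re-authentication behaviour in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not the server's actual code: `AuthClient`, `HttpResult`, `doLogin` and `send` are hypothetical names, and the network is replaced by injected functions so the example runs on its own.

```typescript
// Hypothetical sketch: retry a request once after a 401/403, with all
// concurrent callers sharing a single in-flight login.
type HttpResult = { status: number; body: string };

class AuthClient {
  loginCalls = 0;
  private token: string | null = null;
  private loginInFlight: Promise<string> | null = null;
  // Injected stand-ins (assumptions); a real client would call the Docmost API.
  private readonly doLogin: () => Promise<string>;
  private readonly send: (token: string | null) => Promise<HttpResult>;

  constructor(
    doLogin: () => Promise<string>,
    send: (token: string | null) => Promise<HttpResult>,
  ) {
    this.doLogin = doLogin;
    this.send = send;
  }

  // The first caller starts the login; later callers reuse the same promise.
  private login(): Promise<string> {
    if (this.loginInFlight === null) {
      this.loginCalls += 1;
      this.loginInFlight = (async () => {
        try {
          const fresh = await this.doLogin();
          this.token = fresh;
          return fresh;
        } finally {
          this.loginInFlight = null; // allow a later re-login
        }
      })();
    }
    return this.loginInFlight;
  }

  async request(): Promise<HttpResult> {
    const first = await this.send(this.token);
    if (first.status !== 401 && first.status !== 403) return first;
    const fresh = await this.login();
    return this.send(fresh); // single retry with the refreshed token
  }
}
```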
---
## Tools
All 38 tools, grouped by what you'd reach for them.
### Exploration & retrieval
- **`get_workspace`** — Information about the current Docmost workspace.
- **`list_spaces`** — All spaces in the workspace.
- **`list_pages`** — Recent pages in a space, ordered by `updatedAt` desc (default 50,
max 100). Use `search` for lookups in large spaces.
- **`search`** — Full-text search across pages and content (bounded by `limit`, max 100).
- **`get_page`** — A page's content as clean **Markdown** (convenient, but a *lossy*
view — block ids and exact table/callout structure are approximated).
- **`get_page_json`** — A page's **lossless ProseMirror/TipTap JSON**, including every
block's `attrs.id` and the `slugId` used in URLs. This is what the per-block editing
tools consume.
- **`get_outline`** — A compact outline of a page's top-level blocks (`{index, type, id,
level, firstText}`; tables add row/column counts and their header-cell texts, lists add
item counts) **without** the document body. The cheap way to locate a section or table
and grab its block id before
`get_node` / `patch_node` / `insert_node`.
- **`get_node`** — Fetch a single block's full ProseMirror subtree (lossless) without
pulling the whole page. Address it by a block id (from `get_outline` / `get_page_json`),
or by `#<index>` for a top-level block — use the `#<index>` form for tables/rows/cells,
which carry no id.
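
To make the outline shape concrete, here is a minimal sketch of how `{index, type, id, level, firstText}` entries could be derived from a ProseMirror-style JSON document. The `DocNode` type, the `buildOutline` and `firstTextOf` names and the 60-character truncation are assumptions for illustration; the real `get_outline` also adds table and list counts, which are omitted here.

```typescript
// Assumed minimal ProseMirror-style node shape (only the fields used below).
interface DocNode {
  type: string;
  attrs?: { id?: string; level?: number };
  content?: DocNode[];
  text?: string;
}

interface OutlineEntry {
  index: number; // position among top-level blocks, usable as `#<index>`
  type: string;
  id: string | null; // null for blocks that carry no id, such as tables
  level: number | null; // heading level, null for other blocks
  firstText: string;
}

// Depth-first search for the first non-empty text inside a block.
function firstTextOf(node: DocNode): string {
  if (node.text !== undefined) return node.text;
  for (const child of node.content ?? []) {
    const found = firstTextOf(child);
    if (found !== "") return found;
  }
  return "";
}

// One entry per top-level block; the body of each block is not included.
function buildOutline(doc: DocNode, maxChars = 60): OutlineEntry[] {
  return (doc.content ?? []).map((block, index) => ({
    index,
    type: block.type,
    id: block.attrs?.id ?? null,
    level: block.attrs?.level ?? null,
    firstText: firstTextOf(block).slice(0, maxChars),
  }));
}
```

An entry whose `id` is `null` (a table, for instance) would then be addressed as `#<index>` in `get_node` or `table_get`.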
### Page lifecycle
- **`create_page`** — Create a page from Markdown and place it in the hierarchy (optional
`parentPageId`) in one call. Uses Docmost's import API for clean Markdown→ProseMirror.
- **`rename_page`** — Change a page's title only, without touching or resending content.
- **`move_page`** — Re-parent a page (nest it, or move to root); supports fractional-index
positioning. Returns only on a *positively confirmed* success.
- **`delete_page`** — Delete a single page.
- **`copy_page_content`** — Replace one page's body with a copy of another's, **entirely
server-side** — the document never passes through the model. The target keeps its own
title and slug (so its URL is preserved).
### Editing
- **`edit_page_text`** — Surgical find/replace inside a page's text. Preserves **all**
structure: block ids, marks, links, callouts, tables. The preferred tool for fixing
wording, typos, numbers and names.
- **`patch_node`** — Replace a single block addressed by its `attrs.id` (from
`get_page_json`), without resending the document.
- **`insert_node`** — Insert a block before/after another (by `attrs.id` or anchor text),
or append at the end.
- **`delete_node`** — Remove a single block by its `attrs.id`.
- **`update_page_json`** — Replace a page's entire content with a ProseMirror document
(bulk rewrites, or when nodes lack ids). `content` is optional — omit it to update only
the title. Keeps the block ids you pass in, so heading anchors and history stay stable.
- **`docmost_transform`** — The agent-native editing interface: instead of retyping a
document, the agent **writes a function that fixes it**. Edit a page by running an
arbitrary **`(doc, ctx) => doc` JavaScript transform** against its *live* ProseMirror
document. Runs **sandboxed**
(no `require`/`process`/`fs`/network, 5 s timeout). **Dry-run by default**: returns a
diff preview without writing; set `dryRun:false` to apply atomically. `ctx` exposes the
page's comments and a toolbox of helpers (`walk`, `getList`, `blockText`,
`insertMarkerAfter`, `setCalloutRange`, `commentsToFootnotes`, …) for multi-step,
coordinated rewrites such as renumbering, or turning inline comments into numbered
footnotes.
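
As an illustration of the kind of function an agent might hand to `docmost_transform`, the sketch below renumbers level-2 headings. It rests on assumptions: the `PMNode` type only approximates ProseMirror JSON, and because the signatures of the `ctx` helpers are not specified in this README, it defines its own `walkNodes` instead of calling `ctx.walk` and leaves `ctx` unused.

```typescript
// Assumed ProseMirror-style JSON node shape; only the fields used here are typed.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// Local depth-first walker (hypothetical stand-in for the real `ctx.walk`).
function walkNodes(node: PMNode, visit: (n: PMNode) => void): void {
  visit(node);
  for (const child of node.content ?? []) walkNodes(child, visit);
}

// A `(doc, ctx) => doc` transform: prefix each level-2 heading with "1. ", "2. ", ...
// and replace any numeric prefix that is already there. Block ids in `attrs` are untouched.
const renumberHeadings = (doc: PMNode, _ctx: unknown): PMNode => {
  const copy: PMNode = JSON.parse(JSON.stringify(doc)); // do not mutate the input
  let counter = 0;
  walkNodes(copy, (n) => {
    if (n.type !== "heading" || n.attrs?.level !== 2) return;
    const first = n.content?.[0];
    if (!first || first.type !== "text" || first.text === undefined) return;
    counter += 1;
    first.text = `${counter}. ${first.text.replace(/^\d+\.\s*/, "")}`;
  });
  return copy;
};
```

Because the tool is dry-run by default, a transform like this would first return a diff preview and only write once it is re-run with `dryRun:false`.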
### Tables
- **`table_get`** — Read a table as a matrix: `{rows, cols, cells (text[][]), cellIds}`
(a paragraph id per cell, or `null`). Address the table by `#<index>` (from
`get_outline`) or any block id inside it. Use `cellIds` with `patch_node` for
rich-formatted cell edits.
- **`table_insert_row`** — Insert a row of plain-text cells, padded to the table's column
count (passing more cells than columns is an error). `index` is the 0-based insert
position (0 inserts before the header); omit it to append at the end.
- **`table_delete_row`** — Delete the row at a 0-based `index`. Refuses to delete a table's
only row; deleting row 0 promotes the next row to header.
- **`table_update_cell`** — Set the plain-text content of cell `[row, col]` (0-based). For
rich formatting, `patch_node` the cell's paragraph id from `table_get`.
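
The row-insertion rules above (pad short rows, reject rows that are too long, 0-based `index`, append when omitted) can be sketched on a plain text matrix. `insertRow` is a hypothetical helper, not the tool's implementation, and rejecting an out-of-range `index` is an assumption the README does not state.

```typescript
// Hypothetical helper mirroring the documented `table_insert_row` rules on a
// text matrix such as the `cells` field returned by `table_get`.
function insertRow(table: string[][], cells: string[], index?: number): string[][] {
  const cols = table[0]?.length ?? cells.length;
  if (cells.length > cols) {
    throw new Error(`Row has ${cells.length} cells but the table has ${cols} columns`);
  }
  const position = index ?? table.length; // omitted index appends at the end
  if (position < 0 || position > table.length || Math.floor(position) !== position) {
    // Assumption: out-of-range positions are treated as an error.
    throw new RangeError(`index ${position} is outside 0..${table.length}`);
  }
  const padded = cells.slice();
  while (padded.length < cols) padded.push(""); // pad to the column count
  return [...table.slice(0, position), padded, ...table.slice(position)];
}
```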
### Markdown round-trip
- **`export_page_markdown`** — Export a page to a single self-contained, **lossless
Docmost-flavoured Markdown** file: a meta header, the body with inline comment anchors
and diagrams, and a trailing comments-thread block. Built for a download → edit body →
`import_page_markdown` round-trip that preserves everything, including comment highlights.
- **`import_page_markdown`** — Replace a page's content from a Docmost-flavoured Markdown
file produced by `export_page_markdown`, restoring comment-highlight anchors and diagrams
from their inline HTML. (Comment *threads* in the file are not re-created on the server —
only the page body and inline comment marks are written; manage threads via the comment
tools/UI.)
### Images
- **`insert_image`** — Upload a local image and insert it in one step: append it, drop it
in place of a text placeholder (`replaceText`), or put it after a given block
(`afterText`). Preserves all other block ids.
- **`replace_image`** — Swap an existing image. Uploads the new file as a **fresh
attachment** (clean URL that renders and busts browser caches), then re-points every
node referencing the old attachment (recursively, including callouts/tables) via the
live document, preserving comments, alignment and alt text. (In-place overwrite is
deliberately avoided — some Docmost versions corrupt the attachment on overwrite.)
### Comments
- **`create_comment`** — Add a page comment, optionally **anchored inline** to an exact
span of text (the first occurrence is wrapped in a comment mark).
- **`list_comments`** — List a page's comments (content returned as Markdown).
- **`update_comment`** — Edit an existing comment.
- **`delete_comment`** — Delete a comment.
- **`check_new_comments`** — Find comments created after a given ISO-8601 timestamp across
a space, optionally scoped to a page subtree — ideal for an agent that watches a doc for
feedback.
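
The "created after a timestamp" semantics of `check_new_comments` amount to a simple filter. The sketch below assumes a minimal `CommentInfo` shape with an ISO-8601 `createdAt` field; the field names and the `commentsSince` helper are illustrative, not the tool's actual response format.

```typescript
// Assumed minimal comment shape for illustration.
interface CommentInfo {
  id: string;
  pageId: string;
  createdAt: string; // ISO-8601 timestamp
}

// Keep only comments created strictly after `sinceIso`.
function commentsSince(comments: CommentInfo[], sinceIso: string): CommentInfo[] {
  const since = Date.parse(sinceIso);
  if (isNaN(since)) throw new Error(`Invalid ISO-8601 timestamp: ${sinceIso}`);
  return comments.filter((c) => Date.parse(c.createdAt) > since);
}
```

A watching agent would typically remember the newest `createdAt` it has seen and pass that as the timestamp on its next poll.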
### Versioning & history
- **`list_page_history`** — A page's saved versions (Docmost auto-snapshots on save),
newest first, cursor-paginated. Each item's id is the `historyId`.
- **`diff_page_versions`** — Diff two versions (or a version against the live page).
Returns inserted/deleted text, integrity counts (images, links, tables, callouts,
footnote markers), and a human-readable Markdown summary — computed with the same
pipeline Docmost's own history viewer uses.
- **`restore_page_version`** — Write a saved version back as the current content. Docmost
has no restore endpoint, so this creates a **new** snapshot — the restore is itself
revertible.
### Sharing
- **`share_page`** — Make a page publicly accessible (idempotent) and return its public
URL (`<app>/share/<key>/p/<slugId>`); optional search-engine indexing.
- **`unshare_page`** — Revoke a page's public share.
- **`list_shares`** — All public shares in the workspace, with titles and public URLs.
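
For reference, the public URL pattern `<app>/share/<key>/p/<slugId>` can be assembled as below. `buildShareUrl` is a hypothetical helper, and the trailing-slash handling and percent-encoding are assumptions about sensible behaviour rather than documented details.

```typescript
// Hypothetical helper that assembles the documented public share URL pattern.
function buildShareUrl(appUrl: string, shareKey: string, slugId: string): string {
  const base = appUrl.replace(/\/+$/, ""); // drop trailing slashes from the app URL
  return `${base}/share/${encodeURIComponent(shareKey)}/p/${encodeURIComponent(slugId)}`;
}
```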
---
## Choosing the right editing tool
This same guidance is also delivered at runtime via the MCP server `instructions` field,
so capable clients steer the model automatically.
- **Text fixes** (wording, typos, numbers): `edit_page_text`.
- **One block** (paragraph/heading/callout/table cell): `patch_node` / `insert_node` /
`delete_node`, addressing the node by its `attrs.id` from `get_page_json`.
- **Images**: `insert_image` / `replace_image`.
- **A new page**: `create_page`.
- **Bulk rewrite, or nodes without ids**: `update_page_json`.
- **Multi-step / scripted rewrite** (renumbering, footnotes, coordinated edits):
`docmost_transform` — preview with `dryRun`, then apply.
- **Copy a whole page's content from another page** (server-side): `copy_page_content`.
- **Rename a page** (title only): `rename_page`.
- **Reads**: `get_page` (Markdown) / `get_page_json` (lossless ProseMirror with ids).
- **Review changes**: `list_page_history` → `diff_page_versions` → `restore_page_version`.
- **Comments**: `create_comment` (with optional inline anchoring) / `list_comments` /
`update_comment` / `delete_comment` / `check_new_comments`.
- **Navigate a page cheaply** (find a section/table, grab a block id): `get_outline` →
`get_node`.
- **Tables** (add/remove a row, set a cell): `table_get` / `table_insert_row` /
`table_delete_row` / `table_update_cell`.
- **Round-trip a page as Markdown** (download, edit, re-upload losslessly with comments):
`export_page_markdown` / `import_page_markdown`.
---
## How it works (technical details)
- **Safe real-time-collaboration writes.** Content mutations are applied through Docmost's
collaboration WebSocket (Hocuspocus + Yjs). The server connects, waits for the initial
sync so its local doc mirrors the authoritative server doc (including edits not yet in
the debounced REST snapshot), then **reads → transforms → writes synchronously** in one
tick so no remote update can interleave, and **waits for persistence acknowledgement**
before returning.
- **Per-page write serialization.** A per-`pageId` async mutex ensures two MCP writes to
the same page never overlap; different pages never block each other.
- **Transparent re-authentication.** Login uses email/password; expired tokens are
refreshed automatically on the first 401/403 (covering JSON, multipart upload, and the
collaboration-token path), with in-flight login de-duplication so a burst of calls
triggers a single re-login.
- **Lossless and lossy reads.** `get_page_json` returns the exact ProseMirror tree with
block ids; `get_page` returns clean Markdown for convenience.
- **Full Docmost schema.** Markdown↔ProseMirror conversion supports callouts (including
nested), task lists (bullet *and* numbered checklists), tables, math blocks, embeds,
highlights, sub/superscript and more, with defensive caps against pathological input.
- **Structured tables & lossless Markdown round-trip.** Tables can be edited as a matrix
(read, insert/delete rows, set cells by `[row,col]`) without resending the document, and
a page can be exported to and re-imported from a self-contained Docmost-flavoured
Markdown file that preserves inline comment anchors and diagrams.
- **Token-optimized responses.** API responses are filtered down to the fields agents
actually need, and large collections (spaces, pages, comments, history) are paginated.
- **Hardened runtime.** Global handlers keep a stray socket error from tearing down the
stdio server; `move_page` requires a positively confirmed success; the diff engine
falls back to a coarse block diff rather than hard-failing on a pathological document.
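
The per-page write serialization described above can be sketched as a keyed promise chain. This is one common way to build an async mutex, shown under assumptions: `PageLocks` and `runExclusive` are hypothetical names, and the real server's mutex may be implemented differently.

```typescript
// Hypothetical per-key async mutex: tasks for the same pageId run one at a
// time, tasks for different pageIds do not wait for each other.
class PageLocks {
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(pageId: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(pageId) ?? Promise.resolve();
    let release: () => void = () => {};
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    // The next writer for this page waits until `current` is released.
    const tail = previous.then(() => current);
    this.tails.set(pageId, tail);
    await previous;
    try {
      return await task();
    } finally {
      release();
      // Clean up when nobody queued behind this task.
      if (this.tails.get(pageId) === tail) this.tails.delete(pageId);
    }
  }
}
```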
---
## Installation
```bash
npm install
npm run build
```
## Configuration
The server requires three environment variables:
- `DOCMOST_API_URL` — full URL to your Docmost API (e.g. `https://docs.example.com/api`).
- `DOCMOST_EMAIL` — account email for authentication.
- `DOCMOST_PASSWORD` — account password.
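
A startup check for these three variables might look like the sketch below. `loadConfig` and `DocmostConfig` are hypothetical names for illustration; the package's actual validation, if any, may differ.

```typescript
// Hypothetical startup check for the three required environment variables.
interface DocmostConfig {
  apiUrl: string;
  email: string;
  password: string;
}

function loadConfig(env: Record<string, string | undefined>): DocmostConfig {
  const required = ["DOCMOST_API_URL", "DOCMOST_EMAIL", "DOCMOST_PASSWORD"] as const;
  // Report every missing (or empty) variable at once rather than one per run.
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    apiUrl: env["DOCMOST_API_URL"] as string,
    email: env["DOCMOST_EMAIL"] as string,
    password: env["DOCMOST_PASSWORD"] as string,
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`.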
## Usage with Claude Desktop / a generic MCP client
Add the server to your MCP configuration (e.g. `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "docmost-local": {
      "command": "node",
      "args": ["./build/index.js"],
      "env": {
        "DOCMOST_API_URL": "http://localhost:3000/api",
        "DOCMOST_EMAIL": "test@docmost.com",
        "DOCMOST_PASSWORD": "test"
      }
    }
  }
}
```
## Development
```bash
# Watch mode
npm run watch
# Build
npm run build
# Tests (unit + mock; the live end-to-end suite needs a running Docmost)
npm test
npm run test:e2e
```
## Lineage & acknowledgements
This project began as a fork of [MrMartiniMo/docmost-mcp](https://github.com/MrMartiniMo/docmost-mcp)
(by Moritz Krause) and extends it substantially — adding per-block node editing,
surgical text edits, the sandboxed `docmost_transform`, version history / diff / restore,
comments, image insert/replace, public sharing, server-side page copy, dual
JSON/Markdown reads, transparent re-authentication and significant hardening. The comment
tools were ported from upstream PR #3 by Max Nikitin. Thanks to both.
## License
MIT

packages/mcp/README.ru.md Normal file

@@ -0,0 +1,371 @@
# Docmost MCP Server
[English](README.md) · **Русский**
Сервер Model Context Protocol (MCP) для [Docmost](https://docmost.com/), который
позволяет ИИ-агентам **читать, искать, писать, реструктурировать, рецензировать, вести
версии, комментировать, иллюстрировать и публиковать** документацию — безопасно, на живом
инстансе и без enterprise-лицензии.
> **Написан агентом для агентов.** Человек правит документ глазами и руками: читает,
> заходит в редактор, перепечатывает. Агент работает иначе — ему гораздо проще *написать
> небольшую функцию, которая чинит текст*, чем перечитывать и заново выдавать весь
> документ. Поэтому сервер построен вокруг того, как модели на самом деле удобно
> редактировать: адресовать блок по id, сделать find/replace или передать трансформ
> `(doc, ctx) => doc` и позволить модели *запрограммировать* правку. `docmost_transform` —
> это и есть такой интерфейс. Другие Docmost-MCP «заточены под человека» — они дают
> «открыть страницу» и «заменить страницу»; этот даёт примитивы редактирования, в которых
> модель сильна.
Сервер предоставляет **38 инструментов**, построенных вокруг трёх идей, которые другие
Docmost-MCP не сочетают:
1. **Точечные, экономичные по токенам правки.** Адресуйте отдельный блок по id и патчите
его или делайте find/replace вместо того, чтобы гонять весь документ ~100 КБ через
модель.
2. **Безопасная запись на живой документ.** Каждая мутация проходит через слой
коллаборации реального времени (тот же WebSocket, что использует веб-редактор),
сериализуется по странице, поэтому никогда не затирает параллельную правку человека и
подтверждается как сохранённая до возврата из инструмента.
3. **Настоящая страховка.** История версий, дифф, эквивалентный Docmost, восстановление
одним вызовом и предпросмотр (dry-run) для скриптовых правок — чтобы агент мог
редактировать смело, а вы всегда могли увидеть и откатить сделанное.
---
## Почему именно этот сервер (в сравнении с альтернативами)
Существует несколько Docmost-MCP. Ниже — сравнение по возможностям.
«Официальный» — встроенный MCP Docmost; остальные — community-проекты на GitHub.
| Возможность | **Этот сервер** | Официальный (встроенный) | MrMartiniMo/docmost-mcp | cyborgx0x/mcp-docmost | aleksvin8888 / isak-landin |
| --- | :---: | :---: | :---: | :---: | :---: |
| **Нужна enterprise-лицензия** | **Нет** | **Да** | Нет | Нет | Нет |
| Аутентификация | email + пароль, **авто-переавторизация** | API-ключ | email + пароль | cookie `authToken` (копировать из DevTools) | API Docmost / **напрямую PostgreSQL** |
| Чтение страницы как Markdown | ✅ | ✅ | ✅ | ✅ | ✅ (только чтение) |
| **Lossless Markdown round-trip** (экспорт/импорт, сохраняет якоря комментариев) | ✅ | — | — | — | — |
| Чтение **lossless ProseMirror JSON** (с id блоков) | ✅ | — | — | — | — |
| **Компактная структура страницы** (дешёвый поиск id блока) | ✅ | — | — | — | — |
| **Получение одного блока** (по id или индексу) | ✅ | — | — | — | — |
| Создание / перемещение / удаление страниц | ✅ | ✅ | ✅ | ✅ | — |
| **Поблочные правки** (patch/insert/delete по id) | ✅ | — | — | — | — |
| **Хирургический find/replace** (с сохранением структуры) | ✅ | — | — | — | — |
| **Скриптовый JS-трансформ** (песочница, dry-run дифф) | ✅ | — | — | — | — |
| **Структурное редактирование таблиц** (CRUD строк/ячеек) | ✅ | — | — | — | — |
| **История версий** страницы | ✅ | — | — | ✅ | — |
| **Дифф двух версий** | ✅ | — | — | — | — |
| **Восстановление версии** (обратимое) | ✅ | — | — | — | — |
| **Комментарии** (CRUD + inline-привязка) | ✅ | — | — | ✅ | — |
| **Поллинг новых комментариев** с момента времени | ✅ | — | — | — | — |
| **Изображения** (вставка / замена) | ✅ | — | — | — | — |
| **Публичные ссылки** (создать / отозвать / список) | ✅ | — | — | — | — |
| Экспорт в HTML / PDF | — | — | — | ✅ | — |
| **Безопасная запись через real-time-collab** (без затирания, с подтверждением) | ✅ | n/a | ✅ | — | n/a (только чтение) |
### Что это даёт на практике
- **Никакого enterprise-налога.** Официальный MCP Docmost — enterprise-функция: нужна
активная enterprise-лицензия. Этот сервер — MIT и работает с *любым* self-hosted Docmost
через стандартный API + сокет коллаборации, имея лишь email и пароль аккаунта.
- **Экономия токенов при редактировании.** Большинство Docmost-MCP (и официальный)
предлагают только запись «заменить всю страницу» — агент вынужден скачать весь документ,
изменить и загрузить обратно, оплачивая весь документ **дважды** на каждой мелкой
правке. Этот сервер позволяет агенту изменить ровно один блок (`patch_node` /
`insert_node` / `delete_node`), сделать find/replace с сохранением структуры
(`edit_page_text`) или скопировать страницу на стороне сервера (`copy_page_content`) —
**причём документ ни разу не проходит через модель**.
- **Записи, которые не воюют с редактором.** Наивная запись через REST конфликтует с тем,
что в этот момент печатает человек, и может молча затереть его правки или упасть на
дебаунс-сохранении Docmost. Этот сервер применяет каждое изменение через живой документ
коллаборации (Hocuspocus/Yjs), читая и записывая **синхронно в пределах одного тика
синхронизации**, чтобы никакая параллельная правка не вклинилась, сериализует записи
**по странице** мьютексом и **ждёт подтверждения сохранения от сервера** до возврата.
Если сокет отвалился посреди записи, инструмент возвращает ошибку, а не ложный успех.
- **Агентоориентированная модель редактирования.** Серверы «под человека» дают «открыть
страницу» и «заменить страницу», потому что это отражает то, как работает человек. Модель
редактирует лучше, *программируя* правку — адресуя блоки по id, делая find/replace или
передавая трансформ `(doc, ctx) => doc` (`docmost_transform`, с dry-run диффом перед
коммитом). Этот сервер построен вокруг этого — поэтому у него есть примитивы
редактирования, которых у остальных просто нет.
- **Страховка при редактировании, которой нет у других.** `list_page_history` →
`diff_page_versions` → `restore_page_version` дают агенту (и вам) полный цикл «посмотреть
и откатить». Дифф использует *тот же* конвейер `recreateTransform → ChangeSet →
simplifyChanges`, что и встроенный просмотр истории Docmost, так что результат совпадает
с продуктом.
- **Удобство вместо выковыривания cookie.** Некоторые community-серверы аутентифицируются,
заставляя вас копировать сессионный cookie из DevTools браузера (он истекает), либо лезут
**напрямую в базу PostgreSQL**. Этот сервер логинится по учётным данным и **прозрачно
переавторизуется на 401/403** (с дедупликацией
параллельных логинов), поэтому долгоживущие агенты не падают, когда токен истёк. Он также
соблюдает контроль доступа Docmost, потому что ходит через API и сервер коллаборации как
обычный пользователь.
---
## Инструменты
Все 38 инструментов, сгруппированы по задачам, для которых вы их возьмёте.
### Чтение и поиск
- **`get_workspace`** — Информация о текущем воркспейсе Docmost.
- **`list_spaces`** — Все пространства воркспейса.
- **`list_pages`** — Недавние страницы пространства, по убыванию `updatedAt` (по умолчанию
50, максимум 100). Для поиска в больших пространствах используйте `search`.
- **`search`** — Полнотекстовый поиск по страницам и контенту (ограничен `limit`, максимум
100).
- **`get_page`** — Контент страницы как чистый **Markdown** (удобно, но это
*lossy*-представление — id блоков и точная структура таблиц/коллаутов аппроксимируются).
- **`get_page_json`** — **Lossless ProseMirror/TipTap JSON** страницы, включая `attrs.id`
каждого блока и `slugId`, используемый в URL. Именно его потребляют инструменты
поблочного редактирования.
- **`get_outline`** — Компактная структура страницы из блоков верхнего уровня (`{index,
type, id, level, firstText}`; для таблиц добавляются число строк/столбцов и тексты ячеек
заголовка, для списков — число пунктов) **без** тела документа. Дешёвый способ найти раздел или таблицу и получить
id блока перед `get_node` / `patch_node` / `insert_node`.
- **`get_node`** — Получить полное ProseMirror-поддерево одного блока (lossless), не
вытягивая всю страницу. Адресуйте его по id блока (из `get_outline` / `get_page_json`)
или формой `#<index>` для блока верхнего уровня — используйте `#<index>` для
таблиц/строк/ячеек, у которых нет id.
### Жизненный цикл страниц
- **`create_page`** — Создать страницу из Markdown и поместить в иерархию (опционально
`parentPageId`) одним вызовом. Использует import API Docmost для чистой конвертации
Markdown→ProseMirror.
- **`rename_page`** — Изменить только заголовок страницы, не трогая и не пересылая контент.
- **`move_page`** — Сменить родителя страницы (вложить или вынести в корень); поддерживает
позиционирование по fractional-index. Возвращает успех только при *положительно
подтверждённом* результате.
- **`delete_page`** — Удалить одну страницу.
- **`copy_page_content`** — Заменить тело одной страницы копией тела другой, **полностью на
стороне сервера** — документ не проходит через модель. У целевой страницы сохраняются
собственные заголовок и slug (URL не меняется).
### Редактирование
- **`edit_page_text`** — Хирургический find/replace внутри текста страницы. Сохраняет
**всю** структуру: id блоков, marks, ссылки, коллауты, таблицы. Предпочтительный
инструмент для правки формулировок, опечаток, чисел и имён.
- **`patch_node`** — Заменить один блок, адресованный по `attrs.id` (из `get_page_json`),
без пересылки документа.
- **`insert_node`** — Вставить блок до/после другого (по `attrs.id` или по якорному тексту)
либо добавить в конец.
- **`delete_node`** — Удалить один блок по его `attrs.id`.
- **`update_page_json`** — Заменить весь контент страницы документом ProseMirror (массовые
перезаписи или когда у узлов нет id). `content` опционален — опустите его, чтобы изменить
только заголовок. Сохраняет переданные id блоков, поэтому якоря заголовков и история
остаются стабильными.
- **`docmost_transform`** — Агентоориентированный интерфейс редактирования: вместо
перепечатывания документа агент **пишет функцию, которая его чинит**. Редактирует
страницу, запуская произвольный **JS-трансформ `(doc, ctx) => doc`** на её *живом*
документе ProseMirror. Работает в **песочнице** (без `require`/`process`/`fs`/сети,
таймаут 5 с). **По умолчанию dry-run**: возвращает предпросмотр диффа без записи;
установите `dryRun:false`, чтобы применить атомарно. `ctx` даёт доступ к комментариям
страницы и набору хелперов (`walk`, `getList`, `blockText`, `insertMarkerAfter`,
`setCalloutRange`, `commentsToFootnotes`, …) для многошаговых согласованных перезаписей —
например перенумерации или превращения inline-комментариев в нумерованные сноски.
### Таблицы
- **`table_get`** — Прочитать таблицу как матрицу: `{rows, cols, cells (text[][]),
cellIds}` (id абзаца на ячейку или `null`). Адресуйте таблицу через `#<index>` (из
`get_outline`) или любой id блока внутри неё. Используйте `cellIds` вместе с `patch_node`
для правок ячеек с форматированием.
- **`table_insert_row`** — Вставить строку из текстовых ячеек, дополненную до числа
столбцов таблицы (передать ячеек больше числа столбцов — ошибка). `index` — 0-based
позиция вставки (0 вставляет перед заголовком); опустите, чтобы добавить в конец.
- **`table_delete_row`** — Удалить строку по 0-based `index`. Отказывается удалять
единственную строку таблицы; удаление строки 0 делает заголовком следующую строку.
- **`table_update_cell`** — Задать текстовое содержимое ячейки `[row, col]` (0-based). Для
форматирования используйте `patch_node` по id абзаца ячейки из `table_get`.
### Markdown: export and import
- **`export_page_markdown`**: Export a page to a single self-contained, **lossless
Docmost-flavoured Markdown** file: a meta header, a body with inline comment anchors and
diagrams, and a trailing comment-threads block. Designed for a "download → edit the
body → `import_page_markdown`" round-trip that preserves everything, including comment
highlights.
- **`import_page_markdown`**: Replace a page's content from a Docmost-flavoured Markdown
file produced by `export_page_markdown`, restoring comment highlight anchors and diagrams
from their inline HTML. (Comment threads in the file are not re-created on the server:
only the page body and inline comment marks are written; manage threads via the comment
tools/UI.)
### Images
- **`insert_image`**: Upload a local image and insert it in one step: append it at the
end, put it in place of a text placeholder (`replaceText`), or place it after a given
block (`afterText`). Preserves the ids of all other blocks.
- **`replace_image`**: Replace an existing image. Uploads the new file as a **new
attachment** (a clean URL that renders and busts the browser cache), then repoints every
node that referenced the old attachment (recursively, including callouts/tables) through
the live document, preserving comments, alignment and alt text. (In-place overwrite is
deliberately avoided: some Docmost versions corrupt the attachment on overwrite.)
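The recursive repointing step of `replace_image` can be pictured roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and the image attribute names (`attachmentId`, `src`, `alt`, `align`) are assumed from the surrounding description rather than taken from the package source.

```javascript
// Recursively repoint every image node that references `oldId` to a new
// attachment, keeping all other attributes (alt text, alignment, ids).
// Returns a new tree; the input is not mutated.
function repointImages(node, oldId, replacement) {
  if (node.type === "image" && node.attrs?.attachmentId === oldId) {
    return {
      ...node,
      attrs: {
        ...node.attrs,
        attachmentId: replacement.attachmentId,
        src: replacement.src,
      },
    };
  }
  // Descend into containers such as callouts, tables and list items.
  if (Array.isArray(node.content)) {
    return {
      ...node,
      content: node.content.map((child) => repointImages(child, oldId, replacement)),
    };
  }
  return node;
}
```

Because only `attachmentId` and `src` are swapped, anything else attached to the node survives the replacement.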
### Comments
- **`create_comment`**: Add a comment to a page, optionally **anchored inline** to an exact
text fragment (the first occurrence is wrapped in a comment mark).
- **`list_comments`**: List a page's comments (content is returned as Markdown).
- **`update_comment`**: Edit an existing comment.
- **`delete_comment`**: Delete a comment.
- **`check_new_comments`**: Find comments created after a given ISO-8601 timestamp within a
space, optionally scoped to a page subtree. Ideal for an agent that watches a document
for feedback.
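The "created after a timestamp" filter behind a feedback-watching loop might look like the sketch below. The function name and the `createdAt` field are assumptions for illustration; the actual response shape of the comment tools is not shown in this document.

```javascript
// Keep only comments created strictly after an ISO-8601 timestamp.
// Assumes each comment carries an ISO-8601 `createdAt` string.
function commentsCreatedAfter(comments, sinceIso) {
  const since = Date.parse(sinceIso);
  if (Number.isNaN(since)) {
    throw new Error(`Not a valid ISO-8601 timestamp: ${sinceIso}`);
  }
  return comments.filter((comment) => Date.parse(comment.createdAt) > since);
}
```

An agent polling for feedback would remember the newest `createdAt` it has seen and pass it as the next `sinceIso`.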
### Versions and history
- **`list_page_history`**: A page's saved versions (Docmost auto-snapshots on every save),
newest first, with cursor pagination. Each item's id is its `historyId`.
- **`diff_page_versions`**: Diff two versions (or a version against the live page). Returns
inserted/deleted text, integrity counters (images, links, tables, callouts, footnote
markers) and a human-readable Markdown summary, computed by the same pipeline as Docmost's
built-in history viewer.
- **`restore_page_version`**: Write a saved version back as the current content. Docmost
has no restore endpoint, so this creates a **new** snapshot, which means the restore is
itself revertible.
### Sharing
- **`share_page`**: Make a page publicly accessible (idempotent) and return its public URL
(`<app>/share/<key>/p/<slugId>`); optionally allow search-engine indexing.
- **`unshare_page`**: Revoke public access to a page.
- **`list_shares`**: All public links in the workspace, with page titles and public URLs.
---
## Choosing an editing tool
The same guidance is served at runtime through the MCP server's `instructions` field, so
compatible clients steer the model automatically.
- **Text edits** (wording, typos, numbers): `edit_page_text`.
- **A single block** (paragraph/heading/callout/table cell): `patch_node` / `insert_node` /
`delete_node`, addressing the node by its `attrs.id` from `get_page_json`.
- **Images**: `insert_image` / `replace_image`.
- **New page**: `create_page`.
- **Bulk rewrite, or nodes without an id**: `update_page_json`.
- **Multi-step / scripted rewrite** (renumbering, footnotes, coordinated edits):
`docmost_transform` (preview with `dryRun`, then apply).
- **Copy a whole page's content from another page** (server-side):
`copy_page_content`.
- **Rename a page** (title only): `rename_page`.
- **Read**: `get_page` (Markdown) / `get_page_json` (lossless ProseMirror with ids).
- **Review changes**: `list_page_history` → `diff_page_versions` →
`restore_page_version`.
- **Comments**: `create_comment` (with optional inline anchoring) / `list_comments` /
`update_comment` / `delete_comment` / `check_new_comments`.
- **Cheap page navigation** (locate a section/table, get a block id): `get_outline`
→ `get_node`.
- **Tables** (add/delete a row, set a cell): `table_get` / `table_insert_row` /
`table_delete_row` / `table_update_cell`.
- **Round-trip a page through Markdown** (download, edit, re-upload losslessly, comments
included): `export_page_markdown` / `import_page_markdown`.
---
## How it works (technical details)
- **Safe writes through real-time collaboration.** Content mutations are applied over
Docmost's collaboration WebSocket (Hocuspocus + Yjs). The server connects, waits for the
initial sync so that the local document reflects the authoritative server one (including
edits not yet in the debounced REST snapshot), then **reads → transforms → writes
synchronously** in a single tick so that no remote update can interleave, and **waits for
the save acknowledgement** before returning.
- **Per-page write serialisation.** An async mutex keyed by `pageId` guarantees that two
MCP writes to the same page never overlap; different pages do not block each other.
- **Transparent re-authentication.** Login is by email/password; expired tokens are
refreshed automatically on the first 401/403 (covering JSON requests, multipart uploads
and the collaboration-token path), with concurrent logins de-duplicated so that a burst
of calls triggers a single re-login.
- **Lossless and lossy reads.** `get_page_json` returns the exact ProseMirror tree with
block ids; `get_page` returns clean Markdown for convenience.
- **Full Docmost schema.** The Markdown↔ProseMirror conversion supports callouts (including
nested ones), task lists (bulleted *and* numbered checklists), tables, math blocks,
embeds, highlights, sub/superscript and more, with guard limits against pathological
input.
- **Structural tables and lossless Markdown round-trip.** Tables can be edited as a matrix
(read, insert/delete rows, set cells by `[row, col]`) without resending the document, and
a page can be exported and re-imported as a self-contained Docmost-flavoured Markdown
file that preserves inline comment anchors and diagrams.
- **Token-optimised responses.** API responses are trimmed to the fields agents actually
need, and large collections (spaces, pages, comments, history) are paginated.
- **Hardened runtime.** Global handlers keep a stray socket error from crashing the stdio
server; `move_page` requires positively confirmed success; the diff engine falls back to
a coarse block-level diff instead of crashing on a pathological document.
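The per-page write serialisation listed above can be approximated with a promise chain per key. The sketch below is one common way to build such a keyed async mutex, not necessarily how the package does it; `withPageLock` and the `tails` map are hypothetical names.

```javascript
// One promise "tail" per pageId; each new task is chained onto the tail,
// so tasks for the same page run strictly one after another.
const tails = new Map();

function withPageLock(pageId, task) {
  const previous = tails.get(pageId) ?? Promise.resolve();
  // Stored tails never reject, so `then` always runs the next task.
  const run = previous.then(() => task());
  // Swallow errors on the tail only; the caller still sees them via `run`.
  const tail = run.catch(() => {});
  tails.set(pageId, tail);
  // Drop the map entry once this task is the last one queued for the page.
  tail.then(() => {
    if (tails.get(pageId) === tail) tails.delete(pageId);
  });
  return run;
}
```

Tasks queued for different `pageId` values use separate chains and therefore do not wait for each other, while a failing task does not block later writes to the same page.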
---
## Installation
```bash
npm install
npm run build
```
## Configuration
The server needs three environment variables:
- `DOCMOST_API_URL`: the full URL of your Docmost API (for example,
`https://docs.example.com/api`).
- `DOCMOST_EMAIL`: the email of the account used for authentication.
- `DOCMOST_PASSWORD`: the account password.
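A small validation step for these three variables could look like the sketch below. `readDocmostConfig` is a hypothetical helper, not part of the package; the returned `{ apiUrl, email, password }` shape mirrors the config fields that `createDocmostMcpServer` reads in this commit.

```javascript
// Collect the three required settings from an env-like object and fail fast
// with a single message that names every missing variable.
function readDocmostConfig(env) {
  const required = ["DOCMOST_API_URL", "DOCMOST_EMAIL", "DOCMOST_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    apiUrl: env.DOCMOST_API_URL,
    email: env.DOCMOST_EMAIL,
    password: env.DOCMOST_PASSWORD,
  };
}
```

In a real entry point the argument would be `process.env`; taking it as a parameter keeps the helper easy to test.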
## Usage with Claude Desktop / any MCP client
Add the server to your MCP configuration (for example, `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "docmost-local": {
      "command": "node",
      "args": ["./build/index.js"],
      "env": {
        "DOCMOST_API_URL": "http://localhost:3000/api",
        "DOCMOST_EMAIL": "test@docmost.com",
        "DOCMOST_PASSWORD": "test"
      }
    }
  }
}
```
## Development
```bash
# Watch mode
npm run watch
# Build
npm run build
# Tests (unit + mock; the live end-to-end suite requires a running Docmost)
npm test
npm run test:e2e
```
## Origin and acknowledgements
The project started as a fork of
[MrMartiniMo/docmost-mcp](https://github.com/MrMartiniMo/docmost-mcp) (by Moritz Krause)
and extends it substantially: it adds block-level node editing, surgical text edits, the
`docmost_transform` sandbox, version history / diff / restore, comments, image
insert/replace, public share links, server-side page copy, dual JSON/Markdown reads,
transparent re-authentication and significant hardening. The comment tools were ported
from upstream PR #3 by Max Nikitin. Thanks to both.
## License
MIT

89
packages/mcp/TEST-PLAN.md Normal file

@@ -0,0 +1,89 @@
# Docmost MCP — Test Plan (editing & image tools)
Manual/E2E test plan for every content-mutating tool, with special focus on
images and image replacement. Executed against a live Docmost instance
(`docs.vvzvlad.xyz`) and verified visually in Chrome (public share + authenticated
editor).
## How to run the automated part
```bash
DOCMOST_API_URL=https://<host>/api \
DOCMOST_EMAIL=<email> \
DOCMOST_PASSWORD=<password> \
node test-e2e.mjs
```
`test-e2e.mjs` creates a throwaway page, exercises every code path (including the
image upload/insert/replace cycle) and deletes the page afterwards. Collab writes
are debounced server-side, so the script waits ~16 s before reading back via REST.
## Test matrix
| # | Tool / path | What is checked | Expected |
|---|-------------|-----------------|----------|
| 1 | `create_page` | title with spaces, slugId returned | page created, title intact |
| 2 | `update_page` (markdown) | headings, **bold**/*italic*/~~strike~~/`code`/link, nested bullet + ordered lists, blockquote, code block, `:::callout:::`, table | all structures survive re-import |
| 3 | `get_page_json` | lossless ProseMirror, block ids, callout/table nodes | present (note: reads the **debounced** REST snapshot — recent collab writes may lag a few seconds) |
| 4 | `edit_page_text` | surgical replace; block ids + marks preserved; ambiguous match rejected; missing match reported | edits applied, ids stable, errors correct |
| 5 | `update_page_json` | full lossless write; custom block ids preserved; existing content (text edits, images, callout, table) not lost | round-trips intact |
| 6 | `upload_image` | uploads attachment, returns node | src is a **clean** `/api/files/<id>/<file>` URL, served `200 image/*` |
| 7 | `insert_image` (append / `replaceText` / `afterText`) | three placements | image lands in the right place, all other block ids preserved |
| 8 | **`replace_image`** | swap an existing figure for new bytes; comments/align/alt preserved; **the new URL must actually serve the image** | new image renders (`200`), old node repointed |
## Image-specific assertions (the recurring bug area)
For every uploaded/inserted/replaced image, assert at the HTTP level that the
`src` actually serves bytes — this is what catches "broken image" regressions:
* `GET <src>` → `200`, `Content-Type: image/*`, body starts with the image magic
(`89 50 4E 47` for PNG, etc.).
* `src` does **not** contain a `?v=` query (see "Known pitfalls").
* After `replace_image`: the returned `newAttachmentId` **differs** from the old
one (replacement uses a fresh attachment → fresh URL), and `GET <new src>` → `200`.
* The old image node on the page is repointed to the new attachmentId.
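The HTTP-level checks above can be expressed as a small pure function over an already-fetched response. This is an illustrative sketch rather than code from `test-e2e.mjs`: the function names and the `{ status, contentType, bytes, src }` input shape are assumptions, and only the PNG signature is checked.

```javascript
// PNG files start with the bytes 89 50 4E 47.
const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47];

function startsWithPngMagic(bytes) {
  return PNG_MAGIC.every((value, i) => bytes[i] === value);
}

// True when the URL carries a `v` query parameter (the `?v=` cache-buster).
function hasVersionQuery(src) {
  return /[?&]v=/.test(src);
}

// Return a list of human-readable problems; an empty list means the image passes.
function checkImageResponse({ status, contentType, bytes, src }) {
  const problems = [];
  if (status !== 200) problems.push(`expected status 200, got ${status}`);
  if (!/^image\//.test(contentType ?? "")) problems.push(`unexpected Content-Type: ${contentType}`);
  if (!startsWithPngMagic(bytes)) problems.push("body does not start with the PNG magic bytes");
  if (hasVersionQuery(src)) problems.push("src contains a ?v= query");
  return problems;
}
```

Returning every problem at once, instead of throwing on the first one, makes a broken-image regression easier to diagnose from a single run.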
## Browser verification (Chrome)
Open the page (public `/share/<key>/p/<slug>` URL, or the authenticated editor)
and check each `<img>`:
```js
[...document.querySelectorAll('.ProseMirror img')].map(im => ({
src: im.getAttribute('src'),
loaded: im.naturalWidth > 0, // 0 ⇒ broken
}));
```
`loaded === true` (naturalWidth > 0) means the image really rendered; `0` means a
broken/empty figure.
## Known pitfalls (root-caused during testing)
1. **In-place attachment overwrite corrupts the file (HTTP 500).**
Uploading with an existing `attachmentId` (`POST /files/upload` + `attachmentId`)
overwrites the bytes in place. On this Docmost instance the attachment then returns
**500 for every URL** (clean, `?v=`, any filename) → broken image. Therefore
`replace_image` must upload a **new** attachment and repoint the nodes; the new
id yields a new URL that both renders and busts the browser cache. The old
attachment is left as an unreferenced orphan: Docmost exposes **no HTTP API to
delete a single content attachment** (verified against the attachment
controller/service and by probing ~20 route variants live — all 404; an
attachment unlinked from a page stays reachable with no auto-GC). Attachments
are removed only by cascade (page/space/user deletion). This matches Docmost's
own editor, which also orphans attachments on image removal/replacement.
2. **`?v=<hash>` cache-buster is unnecessary and was a red herring.**
The file endpoint serves `…/file.png?v=<hash>` exactly like the clean URL
(`200 image/*`) — verified at the HTTP layer, on the public share, and in the
authenticated editor. The broken images people saw came from pitfall #1, not
from `?v=`. Image `src` is kept clean (`/api/files/<id>/<file>`); cache-busting
on replace is achieved by the new attachment id.
3. **REST snapshot lag.** `get_page_json` reads the debounced DB snapshot, so a
write made moments earlier may not be visible yet. Wait (~16 s) before reading
back, and never feed a possibly-stale snapshot straight into `update_page_json`.
4. **Callout type narrowing (minor, open).** A `:::warning` callout is imported as
`type: "info"` — the markdown→callout conversion does not carry non-`info`
types through. Cosmetic; tracked separately.

2159
packages/mcp/build/client.js Normal file

File diff suppressed because it is too large.


@@ -0,0 +1,92 @@
import { randomUUID } from "node:crypto";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { isInitializeRequest } from "@modelcontextprotocol/sdk/types.js";
import { createDocmostMcpServer } from "./index.js";
/**
* Build a stateful Streamable-HTTP handler for the Docmost MCP server. The
* embedding host (the gitmost NestJS server) bridges its raw Node req/res into
* `handleRequest`. One McpServer + transport is created per MCP session and
* kept alive between requests, keyed by the `mcp-session-id` header.
*/
export function createMcpHttpHandler(config) {
// One transport (and one McpServer) per MCP session, keyed by session id.
const transports = {};
// Last activity timestamp per session id, used for idle eviction.
const lastSeen = {};
// Idle session TTL (ms): a session with no activity for this long is evicted.
// Defaults to 30 min; overridable via MCP_SESSION_IDLE_MS.
const idleTtlMs = (() => {
const parsed = parseInt(process.env.MCP_SESSION_IDLE_MS ?? "", 10);
return Number.isFinite(parsed) && parsed > 0 ? parsed : 30 * 60 * 1000;
})();
// Periodically close transports idle longer than the TTL. transport.close()
// triggers its onclose, which removes it from `transports`; we also drop the
// lastSeen entry. unref() so this timer never keeps the process alive.
const sweepIntervalMs = 5 * 60 * 1000;
const sweepTimer = setInterval(() => {
const now = Date.now();
for (const sid of Object.keys(transports)) {
if (now - (lastSeen[sid] ?? 0) > idleTtlMs) {
void transports[sid].close();
delete lastSeen[sid];
}
}
}, sweepIntervalMs);
sweepTimer.unref();
async function handleRequest(req, res, parsedBody) {
const sessionId = req.headers["mcp-session-id"];
const method = (req.method || "GET").toUpperCase();
let transport = sessionId ? transports[sessionId] : undefined;
if (method === "POST" && !transport) {
// A new session may only be created by an initialize request without a
// session id.
if (sessionId || !isInitializeRequest(parsedBody)) {
res.statusCode = 400;
res.setHeader("Content-Type", "application/json");
res.end(JSON.stringify({
jsonrpc: "2.0",
error: {
code: -32000,
message: "Bad Request: no valid session ID provided",
},
id: null,
}));
return;
}
transport = new StreamableHTTPServerTransport({
sessionIdGenerator: () => randomUUID(),
onsessioninitialized: (sid) => {
transports[sid] = transport;
lastSeen[sid] = Date.now();
},
});
transport.onclose = () => {
const sid = transport.sessionId;
if (!sid)
return;
// Drop both entries so sessions closed by the client do not leak idle timestamps.
delete transports[sid];
delete lastSeen[sid];
};
const server = createDocmostMcpServer(config);
await server.connect(transport);
await transport.handleRequest(req, res, parsedBody);
return;
}
if (!transport) {
res.statusCode = 400;
res.setHeader("Content-Type", "application/json");
res.end(JSON.stringify({
jsonrpc: "2.0",
error: {
code: -32000,
message: "Bad Request: no valid session ID provided",
},
id: null,
}));
return;
}
// Routing to an existing transport: refresh its idle timestamp.
if (sessionId)
lastSeen[sessionId] = Date.now();
await transport.handleRequest(req, res, parsedBody);
}
return { handleRequest };
}

777
packages/mcp/build/index.js Normal file

@@ -0,0 +1,777 @@
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { readFileSync } from "fs";
import { fileURLToPath } from "url";
import { dirname, join } from "path";
import { DocmostClient } from "./client.js";
// Read version from package.json
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const packageJson = JSON.parse(readFileSync(join(__dirname, "../package.json"), "utf-8"));
const VERSION = packageJson.version;
// --- Modern McpServer Implementation ---
// Editing guide surfaced to MCP clients in the initialize result so they can
// pick the right tool by intent and avoid resending whole documents.
const SERVER_INSTRUCTIONS = "Docmost editing guide — choose the tool by intent: fix wording/typos/numbers (text inside blocks) -> edit_page_text (no node id needed). Change ONE block (paragraph/heading/callout/table cell/etc.) structurally -> patch_node (address by attrs.id from get_page_json). Add a block -> insert_node (before/after a block by attrs.id or by anchor text, or append). Remove a block -> delete_node (by attrs.id). Images -> insert_image (place a local image file) / replace_image (swap an existing image file). New page -> create_page (Markdown). Bulk/structural rewrite or nodes without an id -> update_page_json (full ProseMirror replace; prefer the granular tools above to avoid resending the whole ~100KB+ document). Copy/replace a page's whole content from another page (server-side, no document through the model) -> copy_page_content. Rename a page (title only) -> rename_page. Read -> get_page (Markdown, lossy) or get_page_json (lossless ProseMirror with block ids). Comments -> create_comment (an inline comment anchors to its selection text), list_comments, update_comment, delete_comment, check_new_comments. Tip: read block ids via get_page_json, then use patch_node/insert_node/delete_node so you never resend the full document. " +
"Complex/scripted rewrite (multiple coordinated edits, footnotes, renumbering) -> docmost_transform: write a JS `(doc, ctx) => doc` transform, preview the diff with dryRun (default), then apply with dryRun:false; ctx.helpers includes commentsToFootnotes for turning inline comments into numbered footnotes. " +
"Review what changed -> diff_page_versions (compare a historyId to current, or two history versions). See a page's saved versions -> list_page_history. Undo a bad edit -> restore_page_version (writes a past version back as current; itself revertible). " +
"Lossless markdown round-trip (download, edit, re-upload, incl. comment anchors) -> export_page_markdown / import_page_markdown.";
// Helper to format JSON responses
const jsonContent = (data) => ({
content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
});
/**
* Create a fully configured Docmost MCP server. Side-effect-free: it does not
* read environment variables and does not connect any transport — the caller
* decides how to expose it (stdio or HTTP). The client talks to Docmost over
* REST + the collaboration WebSocket using the provided service-account
* credentials and auto-re-authenticates.
*/
export function createDocmostMcpServer(config) {
const docmostClient = new DocmostClient(config.apiUrl, config.email, config.password);
const server = new McpServer({
name: "docmost-mcp",
version: VERSION,
}, { instructions: SERVER_INSTRUCTIONS });
// Tool: get_workspace
server.registerTool("get_workspace", {
description: "Get the current Docmost workspace",
}, async () => {
const workspace = await docmostClient.getWorkspace();
return jsonContent(workspace);
});
// Tool: list_spaces
server.registerTool("list_spaces", {
description: "List all available spaces in Docmost",
}, async () => {
const spaces = await docmostClient.getSpaces();
return jsonContent(spaces);
});
// Tool: list_pages
server.registerTool("list_pages", {
description: "List most recent pages in a space ordered by updatedAt (descending). " +
"Returns a bounded list (default 50, max 100) — use search for lookups " +
"in large spaces.",
inputSchema: {
spaceId: z.string().optional(),
limit: z
.number()
.int()
.min(1)
.max(100)
.optional()
.describe("Max pages to return (default 50, max 100)"),
},
}, async ({ spaceId, limit }) => {
const result = await docmostClient.listPages(spaceId, limit ?? 50);
return jsonContent(result);
});
// Tool: get_page
server.registerTool("get_page", {
description: "Get page details with content converted to Markdown. The conversion is " +
"LOSSY (block ids, exact table/callout structure are approximated); for a " +
"lossless representation use get_page_json.",
inputSchema: {
pageId: z.string().min(1),
},
}, async ({ pageId }) => {
const page = await docmostClient.getPage(pageId);
return jsonContent(page);
});
// Tool: get_page_json
server.registerTool("get_page_json", {
description: "Get page details with the raw ProseMirror JSON content (lossless: " +
"includes block ids, callouts, tables, link/image attributes) plus the " +
"slugId used in URLs. Use together with update_page_json for precise " +
"structural edits, or edit_page_text for simple text fixes.",
inputSchema: {
pageId: z.string().min(1),
},
}, async ({ pageId }) => {
const page = await docmostClient.getPageJson(pageId);
return jsonContent(page);
});
// Tool: get_outline
server.registerTool("get_outline", {
description: "Return a COMPACT outline of a page's top-level blocks ({index, type, " +
"id, level, firstText}; tables add rows/cols/header; lists add item " +
"count) WITHOUT the full document body. Use it to locate sections/tables " +
"and grab block ids cheaply before get_node / patch_node / insert_node.",
inputSchema: {
pageId: z.string().min(1),
},
}, async ({ pageId }) => {
const result = await docmostClient.getOutline(pageId);
return jsonContent(result);
});
// Tool: get_node
server.registerTool("get_node", {
description: "Fetch a single node's full ProseMirror subtree (lossless) without " +
"pulling the whole document. `nodeId` is a block id from get_outline/" +
"get_page_json (works for headings/paragraphs/callouts/images), OR " +
"`#<index>` to fetch a top-level block by its outline index — use the " +
"`#<index>` form for tables/rows/cells, which carry no id.",
inputSchema: {
pageId: z.string().min(1),
nodeId: z.string().min(1),
},
}, async ({ pageId, nodeId }) => {
const result = await docmostClient.getNode(pageId, nodeId);
return jsonContent(result);
});
// Tool: table_get
server.registerTool("table_get", {
description: "Read a table as a matrix. Returns {rows, cols, cells (text[][]), " +
"cellIds (paragraph id per cell, or null)}. `table` = `#<index>` from " +
"get_outline, or any block id inside the table. Use cellIds with " +
"patch_node for rich-formatted cell edits. `cols` is the FIRST row's " +
"width; ragged tables may vary per row, so use the per-row length of " +
"`cells` for each row.",
inputSchema: {
pageId: z.string().min(1),
table: z.string().min(1),
},
}, async ({ pageId, table }) => {
const result = await docmostClient.getTable(pageId, table);
return jsonContent(result);
});
// Tool: table_insert_row
server.registerTool("table_insert_row", {
description: "Insert a row of plain-text cells into a table. `table` = `#<index>` or " +
"a block id inside it. `cells` = text per column (padded to the table's " +
"column count; error if more cells than columns). `index` = 0-based " +
"insert position (0 inserts before the header); omit to append at the end.",
inputSchema: {
pageId: z.string().min(1),
table: z.string().min(1),
cells: z.array(z.string()),
index: z.number().int().optional(),
},
}, async ({ pageId, table, cells, index }) => {
const result = await docmostClient.tableInsertRow(pageId, table, cells, index);
return jsonContent(result);
});
// Tool: table_delete_row
server.registerTool("table_delete_row", {
description: "Delete the row at 0-based `index` from a table (`table` = `#<index>` or " +
"a block id inside it). Refuses to delete the table's only row. An " +
"out-of-range `index` throws. Deleting `index` 0 removes the header row, " +
"and the next row becomes the new header.",
inputSchema: {
pageId: z.string().min(1),
table: z.string().min(1),
index: z.number().int(),
},
}, async ({ pageId, table, index }) => {
const result = await docmostClient.tableDeleteRow(pageId, table, index);
return jsonContent(result);
});
// Tool: table_update_cell
server.registerTool("table_update_cell", {
description: "Set the plain-text content of cell [row,col] (0-based) in a table " +
"(`table` = `#<index>` or a block id inside it). Replaces the cell's " +
"content with a single text paragraph; for rich formatting use patch_node " +
"on the cell's paragraph id from table_get.",
inputSchema: {
pageId: z.string().min(1),
table: z.string().min(1),
row: z.number().int(),
col: z.number().int(),
text: z.string(),
},
}, async ({ pageId, table, row, col, text }) => {
const result = await docmostClient.tableUpdateCell(pageId, table, row, col, text);
return jsonContent(result);
});
// Tool: create_page
server.registerTool("create_page", {
description: "Create a new page with content (automatically moves it to the correct hierarchy).",
inputSchema: {
title: z.string().min(1).describe("Title of the page"),
content: z.string().min(1).describe("Markdown content"),
spaceId: z.string().min(1),
parentPageId: z
.string()
.optional()
.describe("Optional parent page ID to nest under"),
},
}, async ({ title, content, spaceId, parentPageId }) => {
const result = await docmostClient.createPage(title, content, spaceId, parentPageId);
return jsonContent(result);
});
// Tool: update_page_json
server.registerTool("update_page_json", {
description: "Replace a page's content with a raw ProseMirror JSON document " +
"(lossless write: preserves the block ids, callouts, tables and " +
"attributes you pass in). Typical flow: get_page_json -> modify the " +
"JSON -> update_page_json. Keep existing node ids intact so heading " +
"anchors and history stay stable. `content` is OPTIONAL: omit it to " +
"update only the title (though prefer rename_page for a title-only " +
"change). Supplying neither content nor title is an error.",
inputSchema: {
pageId: z.string().min(1).describe("ID of the page to update"),
content: z
.any()
.optional()
.describe('ProseMirror document: {"type":"doc","content":[...]}. Omit to rename only.'),
title: z.string().optional().describe("Optional new title"),
},
}, async ({ pageId, content, title }) => {
// Only parse/validate the document when it was actually supplied; when it
// is omitted, pass it straight through so the client performs a title-only
// (or no-op) update.
let doc;
if (content === undefined || content === null) {
doc = undefined;
}
else if (typeof content === "string") {
try {
doc = JSON.parse(content);
}
catch {
throw new Error("content was a string but not valid JSON");
}
}
else {
doc = content;
}
const result = await docmostClient.updatePageJson(pageId, doc, title);
return jsonContent(result);
});
// Tool: export_page_markdown
server.registerTool("export_page_markdown", {
description: "Export a page to a single self-contained, lossless Docmost-flavoured " +
"Markdown file (custom extensions): YAML-free meta header, body with " +
"inline comment anchors and diagrams, and a trailing comments-thread " +
"block. Designed for a download -> edit body -> import_page_markdown " +
"round-trip that preserves everything, including comment highlights. " +
"Comment THREADS are preserved in the file but are not re-pushed to the " +
"server on import.",
inputSchema: {
pageId: z.string().min(1),
},
}, async ({ pageId }) => {
const md = await docmostClient.exportPageMarkdown(pageId);
return { content: [{ type: "text", text: md }] };
});
// Tool: import_page_markdown
server.registerTool("import_page_markdown", {
description: "Replace a page's content from a self-contained Docmost-flavoured " +
"Markdown file produced by export_page_markdown. Restores comment " +
"highlight anchors and diagrams from their inline HTML. NOTE: comment " +
"thread records are NOT created/updated/deleted on the server by this " +
"tool — only the page body + inline comment marks are written; manage " +
"comment threads via the comment tools/UI.",
inputSchema: {
pageId: z.string().min(1),
markdown: z.string().min(1),
},
}, async ({ pageId, markdown }) => {
const res = await docmostClient.importPageMarkdown(pageId, markdown);
return jsonContent(res);
});
// Tool: copy_page_content
server.registerTool("copy_page_content", {
description: "Replace targetPageId's content with a copy of sourcePageId's content, " +
"entirely server-side — the document is NOT sent through the model. The " +
"target keeps its own title and slug; only its body is replaced. Ideal " +
"for 'make page A's content equal to B' or 'replace A with B but keep A's URL'.",
inputSchema: {
sourcePageId: z.string().min(1).describe("Page to copy content FROM"),
targetPageId: z
.string()
.min(1)
.describe("Page whose content is REPLACED (title/slug kept)"),
},
}, async ({ sourcePageId, targetPageId }) => {
const result = await docmostClient.copyPageContent(sourcePageId, targetPageId);
return jsonContent(result);
});
// Tool: rename_page
server.registerTool("rename_page", {
description: "Rename a page (change its title only) without touching or resending " +
"its content.",
inputSchema: {
pageId: z.string().min(1).describe("ID of the page to rename"),
title: z.string().min(1).describe("New title"),
},
}, async ({ pageId, title }) => {
const result = await docmostClient.renamePage(pageId, title);
return jsonContent(result);
});
// Tool: edit_page_text
server.registerTool("edit_page_text", {
description: "Surgical find/replace inside a page's text. Preserves ALL structure: " +
"block ids, marks, links, callouts, tables. Each `find` must match " +
"exactly once (or set replaceAll). A match must lie inside one " +
"formatting run; if the target text crosses bold/link boundaries the " +
"tool reports it — use a shorter fragment or update_page_json then. " +
"This is the preferred tool for fixing wording, typos, numbers, names.",
inputSchema: {
pageId: z.string().describe("ID of the page to edit"),
edits: z
.array(z.object({
find: z.string().describe("Exact text to find"),
replace: z.string().describe("Replacement text (may be empty)"),
replaceAll: z
.boolean()
.optional()
.describe("Replace every occurrence (default: must match once)"),
}))
.min(1)
.describe("List of find/replace operations, applied in order"),
},
}, async ({ pageId, edits }) => {
const result = await docmostClient.editPageText(pageId, edits);
return jsonContent(result);
});
// Tool: patch_node
server.registerTool("patch_node", {
description: "Replaces a single block identified by its attrs.id WITHOUT resending the " +
"whole document. Get the block id from get_page_json, then pass a " +
"ProseMirror node to put in its place. Cheaper and safer than " +
"update_page_json for one-block structural edits.",
inputSchema: {
pageId: z.string().min(1),
nodeId: z.string().min(1),
node: z
.any()
.describe("ProseMirror node JSON to put in place of the node with this id"),
},
}, async ({ pageId, nodeId, node }) => {
let parsedNode;
if (typeof node === "string") {
try {
parsedNode = JSON.parse(node);
}
catch {
throw new Error("node was a string but not valid JSON");
}
}
else {
parsedNode = node;
}
const result = await docmostClient.patchNode(pageId, nodeId, parsedNode);
return jsonContent(result);
});
// Tool: insert_node
server.registerTool("insert_node", {
description: "Insert a block before/after another block (by attrs.id or anchor text) " +
"or append at the end. Get anchor block ids from get_page_json. Avoids " +
"resending the whole document. Can also insert table structure: to add a " +
"tableRow, pass a tableRow node with position before/after and anchor " +
"INSIDE the target table — anchorNodeId of any block/cell in it, or " +
"anchorText matching the table; to add a tableCell/tableHeader, use " +
"anchorNodeId of a block inside the target row (anchorText only resolves " +
"top-level blocks, so it cannot target a row). Note: append is top-level " +
"only and rejects structural table nodes.",
inputSchema: {
pageId: z.string().min(1),
node: z.any(),
position: z.enum(["before", "after", "append"]),
anchorNodeId: z.string().optional(),
anchorText: z.string().optional(),
},
}, async ({ pageId, node, position, anchorNodeId, anchorText }) => {
let parsedNode;
if (typeof node === "string") {
try {
parsedNode = JSON.parse(node);
}
catch {
throw new Error("node was a string but not valid JSON");
}
}
else {
parsedNode = node;
}
const result = await docmostClient.insertNode(pageId, parsedNode, {
position,
anchorNodeId,
anchorText,
});
return jsonContent(result);
});
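/*
Illustration only: one possible argument object for an insert_node call that
adds a table row, following the description above. Every id is a placeholder,
and the cell shape (tableCell > paragraph > text) follows the usual Tiptap
table nodes, which is an assumption about the Docmost schema.

```javascript
// Hypothetical insert_node arguments. The anchor is a block inside the
// target table, as the tool description requires for structural table nodes.
const insertRowArgs = {
  pageId: "PAGE_ID",
  position: "after",
  anchorNodeId: "ID_OF_A_BLOCK_INSIDE_THE_TABLE",
  node: {
    type: "tableRow",
    // One cell per column; each cell wraps its text in a paragraph.
    content: ["Q3", "42"].map((text) => ({
      type: "tableCell",
      content: [{ type: "paragraph", content: [{ type: "text", text }] }],
    })),
  },
};
```

Because position "append" is top-level only, a row like this needs "before"
or "after" together with an anchor inside the table.
*/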
// Tool: delete_node
server.registerTool("delete_node", {
description: "Remove a single block by its attrs.id (from get_page_json) WITHOUT " +
"resending the whole document.",
inputSchema: {
pageId: z.string().min(1),
nodeId: z.string().min(1),
},
}, async ({ pageId, nodeId }) => {
const result = await docmostClient.deleteNode(pageId, nodeId);
return jsonContent(result);
});
// Tool: insert_image
server.registerTool("insert_image", {
description: "Upload a local image and insert it into a page in one step. By default " +
"appends the image at the end of the page. With replaceText, replaces the " +
"first top-level block whose text contains that string (handy for " +
'swapping a text placeholder like "[image: foo.png]" for the real image). ' +
"With afterText, inserts the image right after the first block containing " +
"that string. Preserves all other block ids.",
inputSchema: {
pageId: z.string().min(1),
filePath: z
.string()
.min(1)
.describe("Absolute local path to the image file"),
align: z.enum(["left", "center", "right"]).optional(),
alt: z.string().optional(),
replaceText: z
.string()
.optional()
.describe("Replace the first top-level block whose text contains this string with the image"),
afterText: z
.string()
.optional()
.describe("Insert the image right after the first top-level block whose text contains this string"),
},
}, async ({ pageId, filePath, align, alt, replaceText, afterText }) => {
const result = await docmostClient.insertImage(pageId, filePath, {
align,
alt,
replaceText,
afterText,
});
return jsonContent(result);
});
// Tool: replace_image
server.registerTool("replace_image", {
description: "Replace an existing image on a page: uploads the new file as a NEW " +
"attachment (fresh clean URL that renders and busts browser caches), then " +
"repoints every image node referencing the old attachmentId (recursively, " +
"incl. callouts/tables) via the live document, preserving comments, " +
"alignment and alt. The old attachment is left as an unreferenced orphan " +
"(Docmost has no API to delete a single attachment; it is removed only when " +
"the page/space is deleted). In-place byte overwrite is avoided because some " +
"Docmost versions corrupt the attachment (HTTP 500) on overwrite.",
inputSchema: {
pageId: z.string().min(1),
attachmentId: z
.string()
.min(1)
.describe("attachmentId of the image currently in the page to replace"),
filePath: z
.string()
.min(1)
.describe("Absolute local path to the new image file"),
align: z.enum(["left", "center", "right"]).optional(),
alt: z.string().optional(),
},
}, async ({ pageId, attachmentId, filePath, align, alt }) => {
const result = await docmostClient.replaceImage(pageId, attachmentId, filePath, {
align,
alt,
});
return jsonContent(result);
});
// Tool: share_page
server.registerTool("share_page", {
description: "Make a page publicly accessible (idempotent) and return its public " +
"URL. The URL format is <app>/share/<key>/p/<slugId>.",
inputSchema: {
pageId: z.string().min(1).describe("ID of the page to share"),
searchIndexing: z
.boolean()
.optional()
.describe("Allow search engines to index the page (default true)"),
},
}, async ({ pageId, searchIndexing }) => {
const result = await docmostClient.sharePage(pageId, searchIndexing ?? true);
return jsonContent(result);
});
// Tool: unshare_page
server.registerTool("unshare_page", {
description: "Remove the public share of a page (revokes the public URL).",
inputSchema: {
pageId: z.string().min(1).describe("ID of the page to unshare"),
},
}, async ({ pageId }) => {
const result = await docmostClient.unsharePage(pageId);
return jsonContent(result);
});
// Tool: list_shares
server.registerTool("list_shares", {
description: "List all public shares in the workspace with page titles and public URLs.",
}, async () => {
const result = await docmostClient.listShares();
return jsonContent(result);
});
// Tool: move_page
server.registerTool("move_page", {
description: "Move a page to a new parent (nesting) or root. Essential for organizing pages created via 'create_page'.",
inputSchema: {
pageId: z.string().min(1),
parentPageId: z
.string()
.nullable()
.optional()
.describe("Target parent page ID. Pass 'null' or empty string to move to root."),
position: z
.string()
.min(5)
.optional()
.describe("Fractional-index position key (min 5 chars). Omit to append at the end."),
},
}, async ({ pageId, parentPageId, position }) => {
const finalParentId = parentPageId === "" || parentPageId === "null" ? null : parentPageId;
// Cheap cycle guard: a page cannot be moved directly under itself.
// (Deeper descendant-cycle detection is intentionally out of scope.)
if (finalParentId !== null && finalParentId === pageId) {
throw new Error("cannot move a page under itself");
}
const result = await docmostClient.movePage(pageId, finalParentId || null, position);
// Require POSITIVE confirmation: the live /pages/move success shape is
// exactly { success: true, status: 200 }. An empty body, a 204, or any odd
// shape lacking success === true must NOT be reported as a successful move,
// so we surface the raw API result instead of declaring success.
if (!(result && typeof result === "object" && result.success === true)) {
throw new Error(`Failed to move page ${pageId}: ${JSON.stringify(result)}`);
}
return jsonContent({
message: `Successfully moved page ${pageId} to parent ${finalParentId || "root"}`,
result,
});
});
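/*
Illustration only: the two guards in the move_page handler restated as
standalone functions, so their edge cases are easy to check in isolation.
normalizeParentId and isConfirmedMove are hypothetical names.

```javascript
// "", "null", null and undefined all mean "move to root".
function normalizeParentId(parentPageId) {
  if (parentPageId === "" || parentPageId === "null") return null;
  return parentPageId ?? null;
}

// Only an object carrying success === true counts as a confirmed move;
// empty bodies, strings and other shapes are treated as failures.
function isConfirmedMove(result) {
  return Boolean(result) && typeof result === "object" && result.success === true;
}
```

Requiring positive confirmation means an empty or unexpected response is
reported as an error rather than as a successful move.
*/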
// Tool: delete_page
server.registerTool("delete_page", {
description: "Delete a single page by ID.",
inputSchema: {
pageId: z.string().min(1),
},
}, async ({ pageId }) => {
await docmostClient.deletePage(pageId);
return {
content: [
{ type: "text", text: `Successfully deleted page ${pageId}` },
],
};
});
// --- Comment tools (ported from upstream PR #3 by Max Nikitin) ---
// Tool: list_comments
server.registerTool("list_comments", {
description: "List all comments on a page (paginated). Content is returned as Markdown.",
inputSchema: {
pageId: z.string().describe("ID of the page"),
},
}, async ({ pageId }) => {
const comments = await docmostClient.listComments(pageId);
return jsonContent(comments);
});
// Tool: create_comment
server.registerTool("create_comment", {
description: "Create a new comment on a page. Content is provided as Markdown and " +
"automatically converted to the required format.",
inputSchema: {
pageId: z.string().describe("ID of the page to comment on"),
content: z.string().min(1).describe("Comment content in Markdown format"),
type: z
.enum(["page", "inline"])
.optional()
.describe("Comment type: 'page' for general page comment (default), 'inline' for text selection comment"),
selection: z
.string()
// Enforce the 250-char cap documented in the description below.
.max(250)
.optional()
.describe("For an inline comment, the EXACT text in the page to anchor/highlight the comment on (the first occurrence of this text is wrapped in a comment mark). Max 250 chars. Required when type is 'inline'."),
parentCommentId: z
.string()
.optional()
.describe("Parent comment ID to create a reply (max 2 nesting levels)"),
},
}, async ({ pageId, content, type, selection, parentCommentId }) => {
const result = await docmostClient.createComment(pageId, content, type || "page", selection, parentCommentId);
return jsonContent(result);
});
// Tool: update_comment
server.registerTool("update_comment", {
description: "Update an existing comment's content. Only the comment creator can " +
"update it. Content is provided as Markdown.",
inputSchema: {
commentId: z.string().min(1).describe("ID of the comment to update"),
content: z
.string()
.min(1)
.describe("New comment content in Markdown format"),
},
}, async ({ commentId, content }) => {
const result = await docmostClient.updateComment(commentId, content);
return jsonContent(result);
});
// Tool: delete_comment
server.registerTool("delete_comment", {
description: "Delete a comment. Only the comment creator or space admin can delete it.",
inputSchema: {
commentId: z.string().min(1).describe("ID of the comment to delete"),
},
}, async ({ commentId }) => {
await docmostClient.deleteComment(commentId);
return {
content: [
{
type: "text",
text: `Successfully deleted comment ${commentId}`,
},
],
};
});
// Tool: check_new_comments
server.registerTool("check_new_comments", {
description: "Check for new comments across pages in a space since a given timestamp. " +
"Optionally scope to a page subtree (folder). Returns only comments " +
"created after the specified time.",
inputSchema: {
spaceId: z.string().describe("Space ID to check for new comments"),
since: z
.string()
.min(1)
.describe("ISO 8601 timestamp — only return comments created after this time (e.g. '2026-03-10T00:00:00Z')"),
parentPageId: z
.string()
.optional()
.describe("Optional root page ID to scope the check to a subtree (folder). " +
"Only pages under this parent will be checked."),
},
}, async ({ spaceId, since, parentPageId }) => {
// Reject an unparseable timestamp up front: otherwise the comparison
// against NaN silently treats every comment as "not new" and the tool
// returns zero results without signalling the bad input.
if (Number.isNaN(Date.parse(since))) {
throw new Error(`Invalid 'since' timestamp: ${JSON.stringify(since)} — expected an ISO 8601 date (e.g. '2026-03-10T00:00:00Z')`);
}
const result = await docmostClient.checkNewComments(spaceId, since, parentPageId);
return jsonContent(result);
});
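/*
Illustration only: why the handler above rejects an unparseable `since`. Any
comparison against NaN is false, so a filter built on it returns nothing and
gives no hint that the input was bad. newerThan is a hypothetical stand-in
for whatever comparison the client performs.

```javascript
const comments = [
  { id: "c1", createdAt: "2026-03-09T12:00:00Z" },
  { id: "c2", createdAt: "2026-03-11T08:30:00Z" },
];

// Hypothetical filter: keeps comments created strictly after `since`.
function newerThan(items, since) {
  const cutoff = Date.parse(since);
  return items.filter((c) => Date.parse(c.createdAt) > cutoff);
}

// A valid ISO 8601 cutoff keeps only the later comment.
const valid = newerThan(comments, "2026-03-10T00:00:00Z");
// An unparseable cutoff yields NaN, and every comparison is false.
const broken = newerThan(comments, "yesterday");
```
*/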
// Tool: search
server.registerTool("search", {
description: "Search for pages and content. Results are bounded by `limit` " +
"(default applied by the client, max 100).",
inputSchema: {
query: z.string().min(1).describe("Search query"),
limit: z
.number()
.int()
.min(1)
.max(100)
.optional()
.describe("Max results to return (max 100)"),
},
}, async ({ query, limit }) => {
// The tool exposes no spaceId filter, so pass undefined for the client's
// optional spaceId parameter and forward limit into its correct slot.
const result = await docmostClient.search(query, undefined, limit);
return jsonContent(result);
});
// Tool: docmost_transform
server.registerTool("docmost_transform", {
description: "Edit a page by running an arbitrary JS transform `(doc, ctx) => doc` " +
"against its LIVE ProseMirror document, with a diff preview and page " +
"history as the safety net. By default dryRun=true: returns a diff " +
"preview WITHOUT writing. Set dryRun=false to apply (atomic, won't " +
"clobber concurrent edits). `doc` is the lossless ProseMirror document " +
"({type:'doc',content:[...]}); return a new doc of the same shape. " +
"`ctx` gives you: comments (the page's comments, each {id, content " +
"(markdown), selection, type}); log (array; console.log pushes to it); " +
"consume(id) (mark a comment id as consumed — those are deleted when " +
"deleteComments=true after a successful apply); and helpers: " +
"blockText(node) (plain text), walk(node, fn) (depth-first over all " +
"nodes incl. callouts/tables/lists), getList(doc, predicate) (find a " +
"node even without attrs.id), insertMarkerAfter(doc, anchor, marker, " +
"{beforeBlock}) (insert a plain unmarked text run after anchor, " +
"mark-safe), setCalloutRange(doc, n) (sync a [1]…[K] callout range to " +
"[1]…[n]), noteItem(inlineNodes) (wrap inline nodes in a listItem with a " +
"fresh id), mdToInlineNodes(markdown) (comment markdown -> inline nodes), " +
"and commentsToFootnotes(doc, comments, {notesHeading}) (turn inline " +
"comments into numbered footnotes). Footnote convention: markers are " +
"plain '[N]' text in the body; the notes are an orderedList under a " +
"heading whose text is 'Примечания переводчика'. The transform runs " +
"sandboxed (no require/process/fs/network, 5s timeout) and must return a " +
"{type:'doc'} node.",
inputSchema: {
pageId: z.string().min(1),
transformJs: z
.string()
.min(1)
.describe("A JS function `(doc, ctx) => doc` (expression-arrow or " +
"parenthesized function). It receives a clone of the live doc and " +
"ctx (comments, log, consume(id), helpers: blockText/walk/getList/" +
"insertMarkerAfter/setCalloutRange/noteItem/mdToInlineNodes/" +
"commentsToFootnotes) and must return a {type:'doc'} node."),
dryRun: z
.boolean()
.optional()
.default(true)
.describe("Preview only (no write) when true (default)."),
deleteComments: z
.boolean()
.optional()
.default(false)
.describe("After a successful apply, delete every comment id passed to " +
"ctx.consume(id)."),
},
}, async ({ pageId, transformJs, dryRun, deleteComments }) => {
const result = await docmostClient.transformPage(pageId, transformJs, {
dryRun,
deleteComments,
});
return jsonContent(result);
});
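/*
Illustration only: a transform in the `(doc, ctx) => doc` shape that the
docmost_transform description asks for. It uses plain recursion rather than
the ctx helpers, so it makes no assumption about how those helpers are
exposed. The sample doc and the stub ctx are made up for this sketch.

```javascript
// Rewrites text nodes and returns a NEW doc; the input is left untouched.
const transform = (doc, ctx) => {
  const fix = (node) => {
    if (node.type === "text") {
      return { ...node, text: node.text.replace(/colour/g, "color") };
    }
    if (!Array.isArray(node.content)) return node;
    // Spread keeps attrs (including block ids) and marks as they were.
    return { ...node, content: node.content.map(fix) };
  };
  ctx.log.push("normalised spelling");
  return fix(doc);
};

const sampleDoc = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      attrs: { id: "p1" },
      content: [{ type: "text", text: "A colour swatch" }],
    },
  ],
};
// Stub ctx with only the fields this transform touches.
const ctx = { comments: [], log: [], consume() {} };
const next = transform(sampleDoc, ctx);
```

With the default dryRun=true, a transform like this would only produce a diff
preview; nothing is written until dryRun is set to false.
*/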
// Tool: diff_page_versions
server.registerTool("diff_page_versions", {
description: "Diff two versions of a page and return a Docmost-equivalent change set " +
"(inserted/deleted text, integrity counts for images/links/tables/" +
"callouts/footnote markers, and a human-readable markdown summary). " +
"`from`/`to` each accept a historyId, or 'current'/omitted for the page's " +
"current content (defaults: from=current, to=current; pass a historyId " +
"from list_page_history to compare against the live page).",
inputSchema: {
pageId: z.string().min(1),
from: z
.string()
.optional()
.describe("historyId, or 'current'/omit for current content"),
to: z
.string()
.optional()
.describe("historyId, or 'current'/omit for current content"),
},
}, async ({ pageId, from, to }) => {
const result = await docmostClient.diffPageVersions(pageId, from, to);
return jsonContent(result);
});
// Tool: list_page_history
server.registerTool("list_page_history", {
description: "List a page's saved versions (Docmost auto-snapshots on every save), " +
"newest first, cursor-paginated. Returns { items, nextCursor }; each " +
"item's id is the historyId to pass to diff_page_versions or " +
"restore_page_version.",
inputSchema: {
pageId: z.string().min(1),
cursor: z
.string()
.optional()
.describe("Pagination cursor from a previous nextCursor"),
},
}, async ({ pageId, cursor }) => {
const result = await docmostClient.listPageHistory(pageId, cursor);
return jsonContent(result);
});
// Tool: restore_page_version
server.registerTool("restore_page_version", {
description: "Restore a page to a saved version: writes that version's content back " +
"as the page's current content (Docmost has no restore endpoint, so " +
"this creates a NEW history snapshot — the restore is itself revertible). " +
"Get the historyId from list_page_history.",
inputSchema: {
historyId: z.string().min(1),
},
}, async ({ historyId }) => {
const result = await docmostClient.restorePageVersion(historyId);
return jsonContent(result);
});
return server;
}


@@ -0,0 +1,74 @@
import axios from "axios";
export async function getCollabToken(baseUrl, apiToken) {
try {
const response = await axios.post(`${baseUrl}/auth/collab-token`, {}, {
headers: {
Authorization: `Bearer ${apiToken}`,
"Content-Type": "application/json",
},
});
// console.error('Collab Token Response:', response.data);
// Response is wrapped in { data: { token: ... } }
return response.data.data?.token || response.data.token;
}
catch (error) {
if (axios.isAxiosError(error)) {
// Attach the HTTP status to the plain Error so callers (e.g.
// getCollabTokenWithReauth) can still detect a 401/403 after the
// original AxiosError has been wrapped away.
// Avoid leaking the full server response body by default; include only
// status + statusText. Append the body only when DEBUG is set.
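// A network failure has no response at all; report that instead of
// printing "undefined undefined".
let message = error.response
? `Failed to get collab token: ${error.response.status} ${error.response.statusText}`
: `Failed to get collab token: no response (${error.message})`;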
if (process.env.DEBUG) {
message += ` - ${JSON.stringify(error.response?.data)}`;
}
const err = new Error(message);
err.status = error.response?.status;
throw err;
}
throw error;
}
}
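/*
Illustration only: how a caller might use the `status` attached to the error
above to retry once after re-authenticating. This is NOT the actual
getCollabTokenWithReauth mentioned in the comment; withReauth, fetchToken and
relogin are hypothetical names for this sketch.

```javascript
// Retries exactly once after re-login when the first attempt fails with an
// auth status; any other error is rethrown unchanged.
async function withReauth(fetchToken, relogin) {
  try {
    return await fetchToken();
  } catch (err) {
    if (err.status !== 401 && err.status !== 403) throw err;
    await relogin();
    // A second failure propagates to the caller; there is no retry loop.
    return fetchToken();
  }
}
```

Keeping the status on the plain Error is what makes this check possible once
the original AxiosError is gone.
*/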
export async function performLogin(baseUrl, email, password) {
try {
const response = await axios.post(`${baseUrl}/auth/login`, {
email,
password,
});
// Extract token from Set-Cookie header
const cookies = response.headers["set-cookie"];
if (!cookies) {
throw new Error("No Set-Cookie header found in login response");
}
// Match the cookie name exactly to avoid matching a future
// authTokenRefresh cookie (startsWith would catch it).
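const authCookie = cookies.find((c) => {
const kv = c.split(";")[0];
const eq = kv.indexOf("=");
// Without "=", indexOf returns -1 and slice(0, -1) would drop the last
// character, so a bare "authTokenX" could falsely match.
return eq !== -1 && kv.slice(0, eq) === "authToken";
});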
if (!authCookie) {
throw new Error("No authToken cookie found in login response");
}
// Take everything after the FIRST "=" up to the first ";".
// Splitting on "=" would truncate base64 values containing "=" padding.
const kv = authCookie.split(";")[0];
const token = kv.slice(kv.indexOf("=") + 1);
return token;
}
catch (error) {
// Avoid leaking the full server response body by default; log only the
// HTTP status. Log the verbose body only when DEBUG is set.
if (axios.isAxiosError(error)) {
if (process.env.DEBUG) {
console.error("Login failed:", error.response?.data);
}
else {
console.error("Login failed:", error.response?.status);
}
}
else {
console.error("Login failed:", error.message);
}
throw error;
}
}
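/*
Illustration only: the cookie parsing from performLogin as a standalone
function, run against made-up Set-Cookie values. extractCookie is a
hypothetical name; the header strings are sample data, not real responses.

```javascript
// Returns the value of the cookie with exactly this name, or undefined.
function extractCookie(setCookieHeaders, name) {
  for (const header of setCookieHeaders) {
    const kv = header.split(";")[0];
    const eq = kv.indexOf("=");
    // Compare the full name, then take everything after the FIRST "=" so
    // base64 padding inside the value survives.
    if (eq !== -1 && kv.slice(0, eq) === name) return kv.slice(eq + 1);
  }
  return undefined;
}

const headers = [
  "authTokenRefresh=zzz; Path=/; HttpOnly",
  "authToken=abc.def==; Path=/; HttpOnly",
];
const token = extractCookie(headers, "authToken");
```

A startsWith check would have picked the first header here, and splitting on
"=" would have cut the padding off the value.
*/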


@@ -0,0 +1,553 @@
import { HocuspocusProvider } from "@hocuspocus/provider";
import { TiptapTransformer } from "@hocuspocus/transformer";
import * as Y from "yjs";
import WebSocket from "ws";
import { marked } from "marked";
import { generateJSON } from "@tiptap/html";
import { JSDOM } from "jsdom";
import { docmostExtensions } from "./docmost-schema.js";
import { withPageLock } from "./page-lock.js";
import { sanitizeForYjs, findUnstorableAttr } from "./node-ops.js";
// Setup DOM environment for Tiptap HTML parsing in Node.js
const dom = new JSDOM("<!DOCTYPE html><html><body></body></html>");
global.window = dom.window;
global.document = dom.window.document;
// @ts-ignore
global.Element = dom.window.Element;
// @ts-ignore
global.WebSocket = WebSocket;
// Navigator is read-only in newer Node versions and already exists
// global.navigator = dom.window.navigator;
/**
* Hard ceiling above which we skip callout preprocessing entirely. The linear
* scanner below has no quadratic blow-up, but we still cap input defensively so
* a pathological multi-megabyte payload cannot tie up the event loop; in that
* case the markdown is passed through verbatim (callouts are simply not
* detected) rather than risking a slow scan.
*/
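// Note: callers compare this against string.length (UTF-16 code units), so
// the limit is approximate rather than an exact byte count.
const MAX_CALLOUT_PREPROCESS_BYTES = 4 * 1024 * 1024; // ~4 MB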
/** Matches an opening callout fence: `:::type` (type captured, lower-cased). */
const CALLOUT_OPEN_RE = /^:::\s*(\w+)\s*$/;
/** Matches a bare closing callout fence: `:::`. */
const CALLOUT_CLOSE_RE = /^:::\s*$/;
/** Matches the start/end of a code fence (``` or ~~~), capturing the marker. */
const CODE_FENCE_RE = /^(\s*)(`{3,}|~{3,})/;
/**
* Pre-process Docmost-flavoured markdown: convert `:::type ... :::`
* callout blocks (the syntax our markdown export produces) into HTML
* divs that the callout extension parses. The inner content is rendered
* through marked as regular markdown.
*
* Implemented as a single linear pass over the lines (no quadratic regex
* rescan). It:
* - tracks fenced code regions (```...``` and ~~~...~~~) and never treats a
* `:::` line that lives inside a code fence as a callout delimiter, so a
* callout body that itself contains a fenced code block with a `:::` line is
* no longer corrupted;
* - matches an opening `:::type` line with the next CLOSING `:::` at the SAME
* nesting level, supporting NESTED callouts via a depth counter (an inner
* `:::type` opens a deeper level and consumes a matching `:::`);
* - emits the same `<div data-type="callout" data-callout-type="TYPE">` output
* (inner rendered through marked) as the previous regex implementation.
*/
async function preprocessCallouts(markdown) {
// Defensive cap: skip preprocessing for pathologically large inputs.
if (markdown.length > MAX_CALLOUT_PREPROCESS_BYTES) {
return markdown;
}
// Recursively transform a slice of lines, converting top-level callouts in
// that slice into <div> blocks and rendering their inner content (which may
// itself contain nested callouts) through this same function.
const transform = async (lines) => {
const out = [];
let inCodeFence = false;
let codeFenceMarker = ""; // the exact run of backticks/tildes that opened it
let i = 0;
while (i < lines.length) {
const line = lines[i];
// Inside a code fence, only its matching closing fence is significant;
// everything else (including `:::` lines) is copied through verbatim.
if (inCodeFence) {
out.push(line);
const fence = line.match(CODE_FENCE_RE);
if (fence && fence[2].startsWith(codeFenceMarker[0]) &&
fence[2].length >= codeFenceMarker.length) {
inCodeFence = false;
codeFenceMarker = "";
}
i++;
continue;
}
// A code fence opening outside any callout body: enter code-fence mode.
const fenceOpen = line.match(CODE_FENCE_RE);
if (fenceOpen) {
inCodeFence = true;
codeFenceMarker = fenceOpen[2];
out.push(line);
i++;
continue;
}
// An opening callout fence: scan forward (with code-fence and nested
// callout awareness) for its matching closing `:::` at the same level.
const open = line.match(CALLOUT_OPEN_RE);
if (open) {
const type = open[1].toLowerCase();
const bodyLines = [];
let depth = 1;
let innerInCodeFence = false;
let innerCodeFenceMarker = "";
let j = i + 1;
for (; j < lines.length; j++) {
const bl = lines[j];
if (innerInCodeFence) {
const f = bl.match(CODE_FENCE_RE);
if (f && f[2].startsWith(innerCodeFenceMarker[0]) &&
f[2].length >= innerCodeFenceMarker.length) {
innerInCodeFence = false;
innerCodeFenceMarker = "";
}
bodyLines.push(bl);
continue;
}
const innerFence = bl.match(CODE_FENCE_RE);
if (innerFence) {
innerInCodeFence = true;
innerCodeFenceMarker = innerFence[2];
bodyLines.push(bl);
continue;
}
if (CALLOUT_OPEN_RE.test(bl)) {
depth++;
bodyLines.push(bl);
continue;
}
if (CALLOUT_CLOSE_RE.test(bl)) {
depth--;
if (depth === 0)
break; // matching close for THIS callout
bodyLines.push(bl);
continue;
}
bodyLines.push(bl);
}
if (j < lines.length) {
// Found the matching closing fence: render the body (recursively, so
// nested callouts are handled) and emit the callout div.
const inner = await transform(bodyLines);
const renderedInner = await marked.parse(inner);
out.push(`\n<div data-type="callout" data-callout-type="${type}">${renderedInner}</div>\n`);
i = j + 1; // skip past the closing `:::`
continue;
}
// No matching close (unterminated callout): treat the opener as a
// literal line and continue, preserving the original text.
out.push(line);
i++;
continue;
}
out.push(line);
i++;
}
return out.join("\n");
};
return transform(markdown.split("\n"));
}
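/*
Illustration only: the depth-counter idea behind preprocessCallouts, reduced
to finding the closing fence of one callout. Code-fence tracking is left out
for brevity, so this is a simplification of the scanner above, not a
replacement. findClose is a hypothetical name.

```javascript
const OPEN = /^:::\s*(\w+)\s*$/;
const CLOSE = /^:::\s*$/;

// Returns the index of the `:::` line closing the callout opened at `start`,
// or -1 when the callout is unterminated.
function findClose(lines, start) {
  let depth = 1;
  for (let j = start + 1; j < lines.length; j++) {
    if (OPEN.test(lines[j])) depth++; // nested callout opens a deeper level
    else if (CLOSE.test(lines[j]) && --depth === 0) return j;
  }
  return -1;
}

const lines = [":::info", "outer", ":::warning", "inner", ":::", "tail", ":::"];
const outerClose = findClose(lines, 0);
const innerClose = findClose(lines, 2);
```

The first bare `:::` belongs to the inner callout, so the outer one closes on
the last line rather than on the first closing fence it meets.
*/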
/**
* Bridge marked's checkbox lists to TipTap task lists.
*
* marked renders GitHub task list items (`- [x] done`) as a plain
* `<ul><li><p><input type="checkbox" checked> text</p></li></ul>` WITHOUT the
* markup TipTap's TaskList/TaskItem extensions parse. This rewrites such lists
* into the shape those extensions expect:
* TaskList parseHTML matches `ul[data-type="taskList"]`,
* TaskItem matches `li[data-type="taskItem"]`,
* the checked state is read from `data-checked === "true"`.
*
* A list is only converted when it has at least one `<li>` and EVERY direct
* `<li>` contains a checkbox input. Both `<ul>` and `<ol>` are considered: a
* numbered checklist (`1. [x] a`, which marked renders as an `<ol>` of checkbox
* `<li>`s) would otherwise lose its task state. TipTap task lists are unordered,
* so a matching `<ol>` is emitted as `data-type="taskList"` exactly like a
* `<ul>`. Mixed or ordinary lists (including ordinary `<ol>` lists) are left
* untouched so they keep rendering as bullet/numbered lists. The marked `<p>`
* wrapper is kept inside the `<li>` because TaskItem content allows paragraphs.
*/
function bridgeTaskLists(html) {
// Cheap early-out: if the markup contains no checkbox input at all there is
// nothing to bridge, so skip the expensive JSDOM parse entirely. This is the
// common case (most pages have no task lists).
if (!/type=["']?checkbox/i.test(html)) {
return html;
}
// Defensive cap (consistent with preprocessCallouts): skip the bridge for
// pathologically large inputs rather than running a second expensive JSDOM
// parse on a multi-megabyte payload. The markup is passed through verbatim.
if (html.length > MAX_CALLOUT_PREPROCESS_BYTES) {
return html;
}
const dom = new JSDOM(html);
const document = dom.window.document;
// Collect the checkbox(es) that belong to THIS <li> directly: either direct
// child <input type="checkbox"> elements or ones inside the <li>'s direct <p>
// child (the shape marked emits: `<li><p><input type="checkbox"> text</p></li>`).
// Checkboxes nested deeper (e.g. inside a child <ul>/<ol>) are excluded so a
// bullet <li> that merely contains a nested task sublist is not misdetected.
// Raw inline HTML can put more than one checkbox in a single <li>; we gather
// ALL of them so none survive into the converted item.
const directCheckboxes = (li) => {
const found = [];
for (const child of Array.from(li.children)) {
if (child.tagName === "INPUT" &&
child.getAttribute("type") === "checkbox") {
found.push(child);
continue;
}
if (child.tagName === "P") {
for (const inp of Array.from(child.querySelectorAll(":scope > input[type='checkbox']"))) {
found.push(inp);
}
}
}
return found;
};
// Both <ul> and <ol> are candidates: an <ol> whose every direct <li> carries
// its own checkbox is a numbered checklist that must also become a taskList.
const lists = Array.from(document.querySelectorAll("ul, ol"));
for (const list of lists) {
// Only consider DIRECT child <li> elements; nested lists are handled by
// their own iteration of the outer loop.
const items = Array.from(list.children).filter((child) => child.tagName === "LI");
if (items.length === 0)
continue;
const itemCheckboxes = items.map((li) => directCheckboxes(li));
// Convert only when every direct <li> carries at least one OWN checkbox.
if (!itemCheckboxes.every((boxes) => boxes.length > 0))
continue;
// A numbered checklist arrives as an <ol>. We must NOT leave the tag as
// <ol> while tagging it data-type="taskList": generateJSON would then match
// BOTH the orderedList rule (tag ol) and the taskList rule (data-type),
// emitting a phantom empty orderedList beside the real taskList. So rename a
// qualifying <ol> to a <ul> — move its <li> children over and replace it —
// leaving only the taskList rule to match. Already-<ul> lists are unchanged.
let target = list;
if (list.tagName === "OL") {
const ul = document.createElement("ul");
// Carry over existing attributes (e.g. class) so nothing is silently lost.
for (const attr of Array.from(list.attributes)) {
ul.setAttribute(attr.name, attr.value);
}
// Move every child node (including the <li>s we collected) into the <ul>.
while (list.firstChild) {
ul.appendChild(list.firstChild);
}
list.replaceWith(ul);
target = ul;
}
target.setAttribute("data-type", "taskList");
items.forEach((li, index) => {
const boxes = itemCheckboxes[index];
// The first checkbox determines the checked state (matches the previous
// single-checkbox behaviour); any extras only need removing.
const input = boxes[0] ?? null;
li.setAttribute("data-type", "taskItem");
const checked = input != null &&
(input.hasAttribute("checked") || input.checked);
li.setAttribute("data-checked", checked ? "true" : "false");
// Remove ALL direct checkbox inputs so none survive into the content
// (a raw-inline-HTML <li> may carry more than one).
for (const box of boxes) {
box.remove();
}
});
}
return document.body.innerHTML;
}
/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
export async function markdownToProseMirror(markdownContent) {
const withCallouts = await preprocessCallouts(markdownContent);
const html = await marked.parse(withCallouts);
const bridged = bridgeTaskLists(html);
return generateJSON(bridged, docmostExtensions);
}
/**
* Build the collaboration WebSocket URL from an API base URL:
* switch http(s)->ws(s), strip a trailing /api, mount on /collab.
* Shared by the live read and the mutate path so both target the same socket.
*/
export function buildCollabWsUrl(baseUrl) {
let wsUrl = baseUrl.replace(/^http/, "ws");
try {
const urlObj = new URL(wsUrl);
if (urlObj.pathname.endsWith("/api") || urlObj.pathname.endsWith("/api/")) {
urlObj.pathname = urlObj.pathname.replace(/\/api\/?$/, "");
}
urlObj.pathname = urlObj.pathname.replace(/\/$/, "") + "/collab";
// Drop any query/hash from the base URL so it is not carried into the
// collaboration ws URL.
urlObj.search = "";
urlObj.hash = "";
wsUrl = urlObj.toString();
}
catch (e) {
// Fallback if URL parsing fails
if (!wsUrl.endsWith("/collab")) {
wsUrl = wsUrl.replace(/\/$/, "") + "/collab";
}
}
return wsUrl;
}
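/*
Illustration only: the URL rules of buildCollabWsUrl condensed into one
function (without the string fallback) and run on sample inputs. The host
names are placeholders and toCollabUrl is a hypothetical name.

```javascript
// Condensed restatement: http(s) -> ws(s), drop a trailing /api, mount on
// /collab, and discard any query string or hash.
function toCollabUrl(baseUrl) {
  const url = new URL(baseUrl.replace(/^http/, "ws"));
  url.pathname = url.pathname.replace(/\/api\/?$/, "").replace(/\/$/, "") + "/collab";
  url.search = "";
  url.hash = "";
  return url.toString();
}

const examples = [
  "https://docs.example.com/api",
  "http://localhost:3000/api/",
  "https://example.com/wiki/api?debug=1",
].map(toCollabUrl);
```

A sub-path deployment such as /wiki keeps its prefix, so the socket is
mounted at /wiki/collab rather than at the host root.
*/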
/**
* Encode a ProseMirror doc to a Yjs document, sanitizing it first and turning
* the opaque yjs "Unexpected content type" failure into a descriptive error.
*
* `sanitizeForYjs` strips `undefined` node/mark attributes (the common cause of
* the failure); if `toYdoc` still throws, `findUnstorableAttr` is used to point
* at the offending attribute path.
*/
export function buildYDoc(doc) {
const safe = sanitizeForYjs(doc);
try {
return TiptapTransformer.toYdoc(safe, "default", docmostExtensions);
}
catch (e) {
const bad = findUnstorableAttr(safe);
throw new Error(`Failed to encode document to Yjs (toYdoc): ${e instanceof Error ? e.message : String(e)}.${bad ? ` Offending attribute: ${bad}.` : " A node/mark attribute likely holds a value Yjs cannot store (e.g. undefined)."}`);
}
}
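/*
Illustration only: a guess at the general idea behind stripping `undefined`
attributes before Yjs encoding. The real sanitizeForYjs lives in node-ops.js
and is not shown here, so stripUndefinedAttrs is a hypothetical stand-in and
may differ from it in detail.

```javascript
// Returns a copy of a ProseMirror JSON node with every undefined attribute
// removed from the node, its marks, and all descendants.
function stripUndefinedAttrs(node) {
  const clean = (attrs) =>
    Object.fromEntries(Object.entries(attrs).filter(([, v]) => v !== undefined));
  const out = { ...node };
  if (node.attrs) out.attrs = clean(node.attrs);
  if (node.marks) {
    out.marks = node.marks.map((m) => (m.attrs ? { ...m, attrs: clean(m.attrs) } : m));
  }
  if (node.content) out.content = node.content.map(stripUndefinedAttrs);
  return out;
}
```

Values such as null, false or 0 are kept; only keys whose value is undefined
are dropped.
*/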
/**
* Validate that a doc is Yjs-encodable by building (and discarding) a Y.Doc.
* Throws the same descriptive error as the apply path when it is not. Used by
* the dry-run preview so it fails identically to apply.
*/
export function assertYjsEncodable(doc) {
buildYDoc(doc);
}
/** Time we wait for the initial handshake/sync before giving up. */
const CONNECT_TIMEOUT_MS = 25000;
/** Time we wait for the server to acknowledge our write before giving up. */
const PERSIST_TIMEOUT_MS = 20000;
/**
* Safely mutate the live content of a page over the collaboration websocket.
*
* This is the single safe write path for every MCP content mutation. It:
* 1. serializes per-page writes through withPageLock (no two MCP writes on
* the same page overlap);
* 2. connects to Hocuspocus and waits for the initial sync so the local ydoc
* mirrors the authoritative server doc — INCLUDING edits/comments/images
* that are not yet in the debounced REST snapshot;
* 3. inside onSynced, SYNCHRONOUSLY reads the live doc, runs `transform`, and
* writes the result back — with no `await` between read and write so no
* remote update can interleave and clobber concurrent human edits;
* 4. waits for the server to acknowledge the write (unsyncedChanges -> 0)
* before resolving, so the next operation observes our change.
*
* `transform` receives the live ProseMirror doc and returns the NEW full
* ProseMirror doc to write, or `null` to abort with no write (a no-op). If
* `transform` throws, the error is propagated to the caller (not swallowed).
*
* Returns the doc that was written, or the live doc when the transform aborted.
*/
export async function mutatePageContent(pageId, collabToken, baseUrl, transform) {
return withPageLock(pageId, () => {
if (process.env.DEBUG) {
console.error(`Starting realtime content mutate for page ${pageId}`);
// Token prefix is sensitive; only log it under DEBUG.
console.error(`Token prefix: ${collabToken ? collabToken.substring(0, 5) : "NONE"}...`);
}
const ydoc = new Y.Doc();
const wsUrl = buildCollabWsUrl(baseUrl);
if (process.env.DEBUG)
console.error(`Connecting to WebSocket: ${wsUrl}`);
return new Promise((resolve, reject) => {
let provider;
let applied = false; // onSynced may fire again on reconnect — apply once.
let settled = false;
// Set true on disconnect/close so a reconnect-driven unsyncedChanges->0
// cannot be mistaken for a successful persist of our write.
let connectionLost = false;
let connectTimer;
let persistTimer;
let unsyncedHandler;
const cleanup = () => {
if (connectTimer)
clearTimeout(connectTimer);
if (persistTimer)
clearTimeout(persistTimer);
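if (provider) {
if (unsyncedHandler) {
try {
provider.off("unsyncedChanges", unsyncedHandler);
}
catch (err) {
// Best-effort cleanup: the provider may already be torn down.
}
}
try {
provider.destroy();
}
catch (err) {
// Best-effort cleanup: a failing destroy must not mask the real result.
}
}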
};
const finish = (err, value) => {
if (settled)
return;
settled = true;
cleanup();
if (err)
reject(err);
else
resolve(value);
};
connectTimer = setTimeout(() => {
finish(new Error("Connection timeout to collaboration server"));
}, CONNECT_TIMEOUT_MS);
// Resolve once the server has acknowledged our update. The provider
// increments unsyncedChanges when our local update is sent and
// decrements it when the server replies with a SyncStatus(applied=true);
// reaching 0 means the authoritative in-memory ydoc on the server now
// contains our write.
const waitForPersistence = () => {
if (settled)
return;
// A missing provider is a failure, not a success: without it the write
// can never have been acknowledged. Only an actual unsyncedChanges===0
// on a live provider counts as persisted.
if (!provider) {
finish(new Error("collab provider gone before persistence"));
return;
}
if (provider.unsyncedChanges === 0) {
finish(null, lastWrittenDoc);
return;
}
persistTimer = setTimeout(() => {
finish(new Error("Timeout waiting for collaboration server to persist the update"));
}, PERSIST_TIMEOUT_MS);
unsyncedHandler = (data) => {
// Only treat unsyncedChanges->0 as success when the connection is
// still up. A transient disconnect + reconnect handshake can drive
// the counter back to 0 without our write being re-transmitted; in
// that case let the disconnect/close error win instead.
if (data.number === 0 && !connectionLost) {
finish(null, lastWrittenDoc);
}
};
provider.on("unsyncedChanges", unsyncedHandler);
};
let lastWrittenDoc;
provider = new HocuspocusProvider({
url: wsUrl,
name: `page.${pageId}`,
document: ydoc,
token: collabToken,
// @ts-ignore - Required for Node.js environment
WebSocketPolyfill: WebSocket,
onConnect: () => {
if (process.env.DEBUG)
console.error("WS Connect");
},
// An unexpected disconnect/close while we are still waiting (during the
// connect-wait before onSynced, or during the persistence wait after the
// write) means the update will never be acknowledged — surface it now
// instead of hanging until the connect/persist timeout fires. `finish`
// is idempotent via the `settled` flag, so the onClose that our own
// cleanup()->provider.destroy() triggers (after settled=true is set) is
// a harmless no-op and cannot cause a double-resolve.
onDisconnect: () => {
if (process.env.DEBUG)
console.error("WS Disconnect");
// Mark BEFORE finish so the unsyncedChanges handler (if it races)
// sees the connection as lost and won't report a false success.
connectionLost = true;
finish(new Error("Collaboration connection closed before the update was persisted/synced"));
},
onClose: () => {
if (process.env.DEBUG)
console.error("WS Close");
// Mark BEFORE finish so the unsyncedChanges handler (if it races)
// sees the connection as lost and won't report a false success.
connectionLost = true;
finish(new Error("Collaboration connection closed before the update was persisted/synced"));
},
onSynced: () => {
if (applied || settled)
return;
applied = true;
if (process.env.DEBUG)
console.error("Connected and synced!");
// CRITICAL: everything between reading the live doc and writing it
// back must stay synchronous (no await). While the JS event loop is
// not yielded, no incoming remote update can interleave, so any
// already-synced concurrent edits are preserved in liveDoc.
let newDoc;
try {
let liveDoc = TiptapTransformer.fromYdoc(ydoc, "default");
if (!liveDoc ||
typeof liveDoc !== "object" ||
!Array.isArray(liveDoc.content)) {
liveDoc = { type: "doc", content: [] };
}
newDoc = transform(liveDoc);
if (newDoc == null) {
// Transform aborted — write nothing, return the live doc.
lastWrittenDoc = liveDoc;
finish(null, liveDoc);
return;
}
const tempDoc = buildYDoc(newDoc);
// Fetch the fragment immediately before the transact that mutates
// it, rather than reusing a handle grabbed across the transform.
const fragment = ydoc.getXmlFragment("default");
ydoc.transact(() => {
if (fragment.length > 0) {
fragment.delete(0, fragment.length);
}
Y.applyUpdate(ydoc, Y.encodeStateAsUpdate(tempDoc));
});
}
catch (e) {
// Includes errors thrown by transform (e.g. "afterText not found",
// "text not found"): propagate them verbatim to the caller.
finish(e instanceof Error ? e : new Error(String(e)));
return;
}
lastWrittenDoc = newDoc;
if (process.env.DEBUG)
console.error("Content written, waiting for server to persist...");
waitForPersistence();
},
onAuthenticationFailed: () => {
finish(new Error("Authentication failed for collaboration connection"));
},
});
});
});
}
/**
* Replace the live content of a page over the collaboration websocket.
* Accepts a ready ProseMirror JSON document; the caller controls whether
* it was produced from markdown (ids regenerate) or edited in place
* (existing block ids preserved).
*
* This is an intentional full replace (used by update_page / update_page_json),
* but now runs under the per-page lock and waits for server persistence via
* mutatePageContent.
*/
export async function replacePageContent(pageId, prosemirrorDoc, collabToken, baseUrl) {
// Fail fast on a bad document instead of deferring the failure into the
// collaboration write (where TiptapTransformer.toYdoc(undefined) used to
// throw). The transform must return a valid ProseMirror doc.
if (prosemirrorDoc == null ||
typeof prosemirrorDoc !== "object" ||
prosemirrorDoc.type !== "doc") {
throw new Error("replacePageContent: invalid ProseMirror document");
}
await mutatePageContent(pageId, collabToken, baseUrl, () => prosemirrorDoc);
}
/**
* Markdown update path (kept for backwards compatibility).
* NOTE: this re-imports the whole document — block ids are regenerated.
* Tables and :::callout::: blocks survive thanks to the full schema.
*/
export async function updatePageContentRealtime(pageId, markdownContent, collabToken, baseUrl) {
const tiptapJson = await markdownToProseMirror(markdownContent);
await mutatePageContent(pageId, collabToken, baseUrl, () => tiptapJson);
}
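/*
Illustrative sketch only, not part of this module. The two wrappers above always
hand mutatePageContent a full replacement document; the transform contract also
allows an in-place edit or an abort. `appendParagraphOnce` is a hypothetical
helper showing both outcomes. It assumes plain-text paragraphs and, like every
transform, must stay synchronous.

```javascript
// Hypothetical transform factory: append a paragraph unless the page already
// ends with the same text. Returning null means "abort, write nothing".
function appendParagraphOnce(text) {
  return (liveDoc) => {
    const blocks = Array.isArray(liveDoc.content) ? liveDoc.content : [];
    const last = blocks[blocks.length - 1];
    const lastText = last && Array.isArray(last.content)
      ? last.content.map((n) => n.text || "").join("")
      : "";
    if (lastText === text) return null; // already present: no-op
    // Return a NEW full doc; existing blocks (and their ids) are reused as-is.
    return {
      ...liveDoc,
      content: [...blocks, { type: "paragraph", content: [{ type: "text", text }] }],
    };
  };
}

const transform = appendParagraphOnce("Reviewed.");
const before = {
  type: "doc",
  content: [{ type: "paragraph", attrs: { id: "a1" }, content: [{ type: "text", text: "Draft" }] }],
};
const after = transform(before); // doc with two blocks
const again = transform(after);  // null: the paragraph is already there
```

Such a transform would be passed as the fourth argument, e.g.
mutatePageContent(pageId, collabToken, baseUrl, appendParagraphOnce("Reviewed.")).
Because it edits the live doc rather than a stale snapshot, block ids of
untouched content are carried over unchanged.
*/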

@@ -0,0 +1,273 @@
/**
* Headless, Docmost-equivalent document diff.
*
* Docmost's history editor computes a change set with the exact pipeline below
* (recreateTransform -> ChangeSet.addSteps -> simplifyChanges) and renders it as
* editor decorations. This module runs the SAME computation but serializes the
* result to text + integrity counts instead of decorations, so a diff can be
* previewed without a browser.
*
* recreateTransform here comes from @fellow/prosemirror-recreate-transform, the
* maintained published fork of the MIT prosemirror-recreate-steps source that
* Docmost vendors in @docmost/editor-ext; it exposes the identical
* recreateTransform(fromDoc, toDoc, { complexSteps, wordDiffs, simplifyDiff })
* signature.
*
* If recreateTransform / the changeset throws on a pathological document pair,
* we fall back to a coarse block-level text diff so the tool never hard-fails.
*/
import { getSchema } from "@tiptap/core";
import { Node } from "@tiptap/pm/model";
import { ChangeSet, simplifyChanges } from "@tiptap/pm/changeset";
import { recreateTransform } from "@fellow/prosemirror-recreate-transform";
import { docmostExtensions } from "./docmost-schema.js";
/** Build the schema once; it is pure and reused across calls. */
const schema = getSchema(docmostExtensions);
/** Recursively concatenate the plain text of a JSON node. */
function plainText(node) {
if (!node || typeof node !== "object")
return "";
let out = "";
if (typeof node.text === "string")
out += node.text;
if (Array.isArray(node.content)) {
for (const child of node.content)
out += plainText(child);
}
return out;
}
/** Count nodes in a JSON doc that satisfy `pred` (recursive). */
function countNodes(doc, pred) {
let n = 0;
const visit = (node) => {
if (!node || typeof node !== "object")
return;
if (pred(node))
n++;
if (Array.isArray(node.content))
for (const c of node.content)
visit(c);
};
visit(doc);
return n;
}
/**
* Count UNIQUE links in a JSON doc by their `href`. A single link can be split
* across several adjacent text runs (e.g. a "link+bold" run followed by a "link"
* run); counting link-bearing runs would over-count it. Walking the tree and
* collecting hrefs into a Set keys each distinct link once. Link marks with a
* missing/empty href are bucketed under a single "" key so a malformed link is
* still counted as one.
*/
function countUniqueLinks(doc) {
const hrefs = new Set();
const visit = (node) => {
if (!node || typeof node !== "object")
return;
if (node.type === "text" && Array.isArray(node.marks)) {
for (const m of node.marks) {
if (m && m.type === "link") {
const href = m.attrs && typeof m.attrs.href === "string" ? m.attrs.href : "";
hrefs.add(href);
}
}
}
if (Array.isArray(node.content))
for (const c of node.content)
visit(c);
};
visit(doc);
return hrefs.size;
}
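/*
Illustrative sketch only, not part of this module. It shows why the count is
keyed by href rather than by link-bearing text run. The sample paragraph and the
example.com URL are made up for the illustration.

```javascript
// One link whose first word is bold: stored as two adjacent text runs that
// both carry the same link mark.
const paragraph = {
  type: "paragraph",
  content: [
    {
      type: "text",
      text: "Docmost",
      marks: [{ type: "link", attrs: { href: "https://example.com" } }, { type: "bold" }],
    },
    {
      type: "text",
      text: " docs",
      marks: [{ type: "link", attrs: { href: "https://example.com" } }],
    },
    { type: "text", text: " and more." },
  ],
};

const linkMarks = paragraph.content.flatMap((run) =>
  (run.marks || []).filter((mark) => mark.type === "link")
);
const linkRuns = linkMarks.length; // counts link-bearing runs
const uniqueLinks = new Set(linkMarks.map((m) => m.attrs.href)).size; // counts hrefs
console.log(linkRuns, uniqueLinks);
```

A consequence of keying by href is that two separate links pointing at the same
URL also count as one, so the figure is best read as "distinct link targets".
*/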
/**
* Parse the ordered list of integers from `[N]` footnote markers found in the
* BODY only: every top-level block before the first heading whose trimmed text
* equals `notesHeading` (the translator's-notes heading), or the whole doc when
* no such heading exists. Returned in reading order.
*/
function footnoteMarkers(doc, notesHeading) {
const top = Array.isArray(doc?.content) ? doc.content : [];
const notesIdx = top.findIndex((n) => n &&
n.type === "heading" &&
plainText(n).trim() === notesHeading);
const bodyBlocks = notesIdx >= 0 ? top.slice(0, notesIdx) : top;
const markers = [];
const re = /\[(\d+)\]/g;
for (const block of bodyBlocks) {
const text = plainText(block);
let m;
re.lastIndex = 0;
while ((m = re.exec(text)) !== null) {
markers.push(Number(m[1]));
}
}
return markers;
}
/** Compute the [old,new] integrity tuples for two JSON docs. */
function computeIntegrity(oldDoc, newDoc, notesHeading) {
const images = [
countNodes(oldDoc, (n) => n.type === "image"),
countNodes(newDoc, (n) => n.type === "image"),
];
const links = [
countUniqueLinks(oldDoc),
countUniqueLinks(newDoc),
];
const tables = [
countNodes(oldDoc, (n) => n.type === "table"),
countNodes(newDoc, (n) => n.type === "table"),
];
const callouts = [
countNodes(oldDoc, (n) => n.type === "callout"),
countNodes(newDoc, (n) => n.type === "callout"),
];
const fns = [
footnoteMarkers(oldDoc, notesHeading),
footnoteMarkers(newDoc, notesHeading),
];
return { images, links, tables, callouts, footnoteMarkers: fns };
}
/**
* Resolve the lead text of the top-level block in a ProseMirror Node that
* contains the given document position. Returns "" when out of range.
*/
function blockContextAt(node, pos) {
try {
const clamped = Math.max(0, Math.min(pos, node.content.size));
const $pos = node.resolve(clamped);
// depth 1 is the top-level block in a doc node.
const block = $pos.depth >= 1 ? $pos.node(1) : $pos.node(0);
const text = block.textContent || "";
return text.length > 80 ? text.slice(0, 77) + "..." : text;
}
catch {
return "";
}
}
/** Truncate a string for the markdown summary. */
function truncate(s, n = 120) {
return s.length > n ? s.slice(0, n - 3) + "..." : s;
}
/**
* Coarse fallback: a block-by-block plain-text diff. Used only when the precise
* changeset pipeline throws, so the tool degrades gracefully instead of failing.
*/
function coarseDiff(oldDoc, newDoc) {
const oldBlocks = Array.isArray(oldDoc?.content) ? oldDoc.content : [];
const newBlocks = Array.isArray(newDoc?.content) ? newDoc.content : [];
const oldTexts = oldBlocks.map(plainText);
const newTexts = newBlocks.map(plainText);
const oldSet = new Set(oldTexts);
const newSet = new Set(newTexts);
const changes = [];
for (const t of oldTexts) {
if (!newSet.has(t) && t.trim() !== "") {
changes.push({ op: "delete", block: truncate(t, 80), text: t });
}
}
for (const t of newTexts) {
if (!oldSet.has(t) && t.trim() !== "") {
changes.push({ op: "insert", block: truncate(t, 80), text: t });
}
}
return changes;
}
/** Build the human-readable unified-ish markdown summary. */
function renderMarkdown(result, fellBack) {
const lines = [];
const { summary, integrity, changes } = result;
lines.push(`# Diff: ${summary.inserted} inserted / ${summary.deleted} deleted (${summary.blocksChanged} blocks changed)`);
if (fellBack) {
lines.push("");
lines.push("> note: precise diff failed; coarse block-level diff shown.");
}
lines.push("");
lines.push("## Integrity (old -> new)");
lines.push(`- images: ${integrity.images[0]} -> ${integrity.images[1]}`);
lines.push(`- links: ${integrity.links[0]} -> ${integrity.links[1]}`);
lines.push(`- tables: ${integrity.tables[0]} -> ${integrity.tables[1]}`);
lines.push(`- callouts: ${integrity.callouts[0]} -> ${integrity.callouts[1]}`);
lines.push(`- footnoteMarkers: [${integrity.footnoteMarkers[0].join(", ")}] -> [${integrity.footnoteMarkers[1].join(", ")}]`);
lines.push("");
lines.push("## Changes");
if (changes.length === 0) {
lines.push("(no textual changes)");
}
else {
for (const c of changes) {
const sign = c.op === "insert" ? "+" : "-";
const ctx = c.block ? ` @ ${truncate(c.block, 60)}` : "";
lines.push(`${sign} ${truncate(c.text)}${ctx}`);
}
}
return lines.join("\n");
}
/**
* Diff two ProseMirror JSON documents the way Docmost's history editor does and
* serialize the result to text + integrity counts.
*
* @param oldDocJson the earlier document
* @param newDocJson the later document
* @param notesHeading heading delimiting body from notes for footnote counting
* (default "Примечания переводчика", Russian for "Translator's notes")
*/
export function diffDocs(oldDocJson, newDocJson, notesHeading = "Примечания переводчика") {
const integrity = computeIntegrity(oldDocJson, newDocJson, notesHeading);
let changes = [];
let inserted = 0;
let deleted = 0;
let fellBack = false;
const changedBlocks = new Set();
try {
const oldNode = Node.fromJSON(schema, oldDocJson);
const newNode = Node.fromJSON(schema, newDocJson);
const tr = recreateTransform(oldNode, newNode, {
complexSteps: false,
wordDiffs: true,
simplifyDiff: true,
});
const changeSet = ChangeSet.create(oldNode).addSteps(tr.doc, tr.mapping.maps, []);
const simplified = simplifyChanges(changeSet.changes, newNode);
for (const change of simplified) {
// Deleted text lives in the OLD doc coordinate range [fromA, toA).
if (change.toA > change.fromA) {
const text = oldNode.textBetween(change.fromA, change.toA, "\n", " ");
if (text.length > 0) {
deleted += text.length;
const block = blockContextAt(oldNode, change.fromA);
changes.push({ op: "delete", block, text });
if (block)
changedBlocks.add("d:" + block);
}
}
// Inserted text lives in the NEW doc coordinate range [fromB, toB).
if (change.toB > change.fromB) {
const text = newNode.textBetween(change.fromB, change.toB, "\n", " ");
if (text.length > 0) {
inserted += text.length;
const block = blockContextAt(newNode, change.fromB);
changes.push({ op: "insert", block, text });
if (block)
changedBlocks.add("i:" + block);
}
}
}
}
catch {
// Pathological pair: degrade to a coarse block-level diff so we never throw.
fellBack = true;
changes = coarseDiff(oldDocJson, newDocJson);
for (const c of changes) {
if (c.op === "insert")
inserted += c.text.length;
else
deleted += c.text.length;
if (c.block)
changedBlocks.add(c.op[0] + ":" + c.block);
}
}
const partial = {
summary: { inserted, deleted, blocksChanged: changedBlocks.size },
integrity,
changes,
};
return { ...partial, markdown: renderMarkdown(partial, fellBack) };
}
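/*
Illustrative sketch only, not part of this module. A caller of diffDocs may want
to turn the `integrity` tuples into a go/no-go signal before applying an edit.
`integrityWarnings` is a hypothetical helper, and the sample counts are made up;
only the shape of `integrity` is taken from computeIntegrity above.

```javascript
// Hypothetical helper: list the integrity counters that dropped between the
// old and new document, plus any change in the footnote marker sequence.
function integrityWarnings(integrity) {
  const warnings = [];
  for (const key of ["images", "links", "tables", "callouts"]) {
    const [before, after] = integrity[key];
    if (after < before) warnings.push(`${key}: ${before} -> ${after}`);
  }
  const [oldMarkers, newMarkers] = integrity.footnoteMarkers;
  if (oldMarkers.join(",") !== newMarkers.join(",")) {
    warnings.push(
      `footnoteMarkers: [${oldMarkers.join(", ")}] -> [${newMarkers.join(", ")}]`
    );
  }
  return warnings;
}

const sample = {
  images: [3, 3],
  links: [5, 4],
  tables: [1, 1],
  callouts: [2, 2],
  footnoteMarkers: [[1, 2, 3], [1, 3]],
};
const warnings = integrityWarnings(sample);
console.log(warnings);
```

Only decreases are flagged for the counters, on the assumption that an edit
adding an image or table is intentional while one that loses it is suspect.
*/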

@@ -0,0 +1,999 @@
/**
* Full TipTap extension set matching the real Docmost document schema.
*
* The default StarterKit-only schema silently destroys Docmost-specific
* nodes (callout, table) and drops attributes it does not know about
* (node ids, image sizing, link targets). Every code path that converts
* to or from ProseMirror JSON must use THIS set, otherwise a round-trip
* loses content.
*/
import StarterKit from "@tiptap/starter-kit";
import Image from "@tiptap/extension-image";
import TaskList from "@tiptap/extension-task-list";
import TaskItem from "@tiptap/extension-task-item";
import Highlight from "@tiptap/extension-highlight";
import Subscript from "@tiptap/extension-subscript";
import Superscript from "@tiptap/extension-superscript";
import { Node, Extension, Mark } from "@tiptap/core";
// Inlined from @tiptap/core's getStyleProperty (added after 3.20.x) so this
// package can stay on the same @tiptap/core version as the editor and avoid a
// duplicate-tiptap version split in the monorepo. Reads a single declaration
// from an element's inline `style` attribute, last-wins, case-insensitive.
function getStyleProperty(element, propertyName) {
const styleAttr = element.getAttribute("style");
if (!styleAttr) {
return null;
}
const decls = styleAttr.split(";").map((decl) => decl.trim()).filter(Boolean);
const target = propertyName.toLowerCase();
for (let i = decls.length - 1; i >= 0; i -= 1) {
const decl = decls[i];
const colonIndex = decl.indexOf(":");
if (colonIndex === -1) {
continue;
}
const prop = decl.slice(0, colonIndex).trim().toLowerCase();
if (prop === target) {
return decl.slice(colonIndex + 1).trim();
}
}
return null;
}
/** Allowed Docmost callout types; anything else falls back to "info". */
const CALLOUT_TYPES = ["info", "warning", "danger", "success"];
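// The typeof guard keeps a non-string attribute (e.g. a number from hand-built
// JSON) from throwing on toLowerCase(); it falls back to "info" instead.
export const clampCalloutType = (value) => typeof value === "string" && CALLOUT_TYPES.includes(value.toLowerCase())
? value.toLowerCase()
: "info";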
/**
* Allowlist guard for CSS color values imported from HTML.
*
* Docmost interpolates stored mark colors straight into an inline style
* attribute (e.g. style="background-color: ${color}" / "color: ${color}").
* An unsanitized value such as `red; --x: url(...)` or `red"><script>` would
* let a crafted document break out of the style attribute. We therefore only
* accept a narrow, well-formed subset of CSS <color> syntax and reject (-> null)
* anything else.
*
* Accepted forms:
* - named colors: letters only, e.g. "red", "rebeccapurple"
* - hex: #rgb, #rgba, #rrggbb, #rrggbbaa
* - functional notation: rgb()/rgba()/hsl()/hsla() containing only
* digits, %, ., commas, spaces and slashes
*/
const SAFE_COLOR_RE = /^(?:[a-zA-Z]+|#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})|(?:rgb|rgba|hsl|hsla)\([0-9.,%/\s]+\))$/;
export const sanitizeCssColor = (value) => {
if (typeof value !== "string")
return null;
const color = value.trim();
return color && SAFE_COLOR_RE.test(color) ? color : null;
};
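/*
Illustrative sketch only, not part of this module. It exercises the allowlist on
a few inputs. The pattern and guard are copied from above purely so the snippet
runs on its own; `SAFE_COLOR`, `check` and the sample values are names chosen
for the illustration.

```javascript
// Copy of the allowlist pattern above, repeated only so the sketch runs alone.
const SAFE_COLOR = /^(?:[a-zA-Z]+|#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})|(?:rgb|rgba|hsl|hsla)\([0-9.,%/\s]+\))$/;
const check = (value) => {
  if (typeof value !== "string") return null;
  const color = value.trim();
  return color && SAFE_COLOR.test(color) ? color : null;
};

const results = {
  named: check(" red "),                   // trimmed, letters only: accepted
  hex: check("#1a2b3c"),                   // 6-digit hex: accepted
  rgb: check("rgb(255, 0, 0)"),            // functional notation: accepted
  shortHex: check("#12345"),               // 5 hex digits: rejected
  injection: check("red; --x: url(evil)"), // extra declaration: rejected
  breakout: check('red"><script>'),        // attribute breakout: rejected
  nonString: check(42),                    // not a string: rejected
};
console.log(results);
```

Note that the named-color branch accepts any letters-only word, so it guards
against breaking out of the style attribute rather than checking that the word
is a real CSS color.
*/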
/** Docmost callout (info/warning/danger/success banner). */
const Callout = Node.create({
name: "callout",
group: "block",
content: "block+",
defining: true,
addAttributes() {
return {
// Read the type from data-callout-type so generateJSON(html) preserves
// it; without an explicit parseHTML every imported callout became "info".
type: {
default: "info",
parseHTML: (el) => clampCalloutType(el.getAttribute("data-callout-type")),
renderHTML: (attrs) => ({
"data-callout-type": clampCalloutType(attrs.type),
}),
},
icon: {
default: null,
parseHTML: (el) => el.getAttribute("data-icon"),
renderHTML: (attrs) => attrs.icon ? { "data-icon": attrs.icon } : {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="callout"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "callout", ...HTMLAttributes }, 0];
},
});
/** Minimal table family: enough for schema round-trips and HTML parsing. */
const Table = Node.create({
name: "table",
group: "block",
content: "tableRow+",
isolating: true,
parseHTML() {
return [{ tag: "table" }];
},
renderHTML() {
return ["table", ["tbody", 0]];
},
});
const TableRow = Node.create({
name: "tableRow",
content: "(tableCell | tableHeader)*",
parseHTML() {
return [{ tag: "tr" }];
},
renderHTML() {
return ["tr", 0];
},
});
const cellAttributes = () => ({
colspan: { default: 1 },
rowspan: { default: 1 },
colwidth: { default: null },
backgroundColor: { default: null },
backgroundColorName: { default: null },
// Column alignment so GFM aligned tables (|:--|:-:|--:|) round-trip.
align: {
default: null,
parseHTML: (el) => el.getAttribute("align") || el.style.textAlign || null,
renderHTML: (attrs) => attrs.align ? { align: attrs.align } : {},
},
});
const TableCell = Node.create({
name: "tableCell",
content: "block+",
isolating: true,
addAttributes: cellAttributes,
parseHTML() {
return [{ tag: "td" }];
},
renderHTML() {
return ["td", 0];
},
});
const TableHeader = Node.create({
name: "tableHeader",
content: "block+",
isolating: true,
addAttributes: cellAttributes,
parseHTML() {
return [{ tag: "th" }];
},
renderHTML() {
return ["th", 0];
},
});
/**
* Attributes Docmost stores on standard nodes that the stock extensions
* do not declare. Without these, Node.fromJSON silently drops them —
* including the block ids that heading anchors rely on.
*/
const DocmostAttributes = Extension.create({
name: "docmostAttributes",
addGlobalAttributes() {
return [
{
types: ["heading", "paragraph"],
attributes: {
id: { default: null },
indent: { default: null },
textAlign: { default: null },
},
},
{
types: ["image"],
attributes: {
align: { default: null },
attachmentId: { default: null },
aspectRatio: { default: null },
height: { default: null },
placeholder: { default: null },
size: { default: null },
width: { default: null },
},
},
{
types: ["orderedList"],
attributes: { type: { default: null } },
},
{
types: ["link"],
attributes: { internal: { default: null }, title: { default: null } },
},
];
},
});
/**
* Docmost inline comment mark. Anchors a comment thread to a text range via
* `commentId`. Without it, any document containing comment highlights fails to
* round-trip through the schema ("There is no mark type comment in this schema"),
* which breaks update_page_json and edit_page_text on every commented page.
* Mirrors Docmost's @docmost/editor-ext comment mark (commentId / resolved).
*/
const Comment = Mark.create({
name: "comment",
exitable: true,
inclusive: false,
addAttributes() {
return {
commentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-comment-id"),
renderHTML: (attrs) => attrs.commentId ? { "data-comment-id": attrs.commentId } : {},
},
resolved: {
default: false,
parseHTML: (el) => el.getAttribute("data-resolved") === "true",
renderHTML: (attrs) => attrs.resolved ? { "data-resolved": "true" } : {},
},
};
},
parseHTML() {
return [{ tag: "span[data-comment-id]" }];
},
renderHTML({ HTMLAttributes }) {
return ["span", { class: "comment-mark", ...HTMLAttributes }, 0];
},
});
/**
* Text color mark. The markdown-converter emits colored text as
* <span style="color: ...">, but with no mark parsing it back the color was
* silently dropped on import. This mirrors TipTap's @tiptap/extension-text-style
* `textStyle` mark (the name Docmost expects) and carries a single `color`
* attribute. The parsed color is passed through the allowlist guard so a crafted
* style cannot break out of the attribute when Docmost re-renders it.
*/
const TextStyle = Mark.create({
name: "textStyle",
addAttributes() {
return {
color: {
default: null,
parseHTML: (el) => sanitizeCssColor(el.style.color || el.getAttribute("data-color")),
renderHTML: (attrs) => {
const color = sanitizeCssColor(attrs.color);
return color ? { style: `color: ${color}` } : {};
},
},
};
},
parseHTML() {
return [
{
tag: "span",
// Only claim a plain colored span. Do NOT match spans that are already a
// comment mark (data-comment-id) or a mention node (data-type=mention),
// otherwise importing such HTML would silently drop the comment/mention.
getAttrs: (el) => el.style.color &&
!el.getAttribute("data-comment-id") &&
el.getAttribute("data-type") !== "mention"
? {}
: false,
},
];
},
renderHTML({ HTMLAttributes }) {
return ["span", HTMLAttributes, 0];
},
});
/**
* Passthrough definitions for the remaining Docmost-specific nodes.
*
* TiptapTransformer.toYdoc (the write path every mutation uses) throws
* "Unknown node type: X" for any node not registered here, so editing ANY
* page that contains one of these nodes used to fail outright. The read path
* (fromYdoc) accepts them, which is why they appear in real documents.
*
* Each node below mirrors the real @docmost/editor-ext definition's name,
* group, content, inline/atom flags and attribute keys (with the same data-*
* HTML mapping) so that a fromYdoc -> transform -> toYdoc round-trip both
* validates and preserves attributes faithfully. Interactive concerns
* (node views, commands, keyboard shortcuts, input rules, suggestion plugins)
* are intentionally omitted: the MCP server never renders these nodes, it only
* needs the schema to accept and carry them. The Callout node above is the
* pattern these follow.
*/
/** Docmost @mention (user/page reference). Inline atom. */
const Mention = Node.create({
name: "mention",
group: "inline",
inline: true,
selectable: true,
atom: true,
draggable: true,
addAttributes() {
return {
id: {
default: null,
parseHTML: (el) => el.getAttribute("data-id"),
renderHTML: (attrs) => attrs.id ? { "data-id": attrs.id } : {},
},
label: {
default: null,
parseHTML: (el) => el.getAttribute("data-label"),
renderHTML: (attrs) => attrs.label ? { "data-label": attrs.label } : {},
},
entityType: {
default: null,
parseHTML: (el) => el.getAttribute("data-entity-type"),
renderHTML: (attrs) => attrs.entityType ? { "data-entity-type": attrs.entityType } : {},
},
entityId: {
default: null,
parseHTML: (el) => el.getAttribute("data-entity-id"),
renderHTML: (attrs) => attrs.entityId ? { "data-entity-id": attrs.entityId } : {},
},
slugId: {
default: null,
parseHTML: (el) => el.getAttribute("data-slug-id"),
renderHTML: (attrs) => attrs.slugId ? { "data-slug-id": attrs.slugId } : {},
},
creatorId: {
default: null,
parseHTML: (el) => el.getAttribute("data-creator-id"),
renderHTML: (attrs) => attrs.creatorId ? { "data-creator-id": attrs.creatorId } : {},
},
anchorId: {
default: null,
parseHTML: (el) => el.getAttribute("data-anchor-id"),
renderHTML: (attrs) => attrs.anchorId ? { "data-anchor-id": attrs.anchorId } : {},
},
};
},
parseHTML() {
return [{ tag: 'span[data-type="mention"]' }];
},
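renderHTML({ HTMLAttributes }) {
// No content hole: mention is a leaf (atom, no content), and ProseMirror's
// DOMSerializer rejects a hole in a leaf node spec.
return ["span", { "data-type": "mention", ...HTMLAttributes }];
},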
});
/** Inline KaTeX expression. Carries the LaTeX source in `text`. */
const MathInline = Node.create({
name: "mathInline",
group: "inline",
inline: true,
atom: true,
addAttributes() {
return {
text: { default: "" },
};
},
parseHTML() {
return [{ tag: 'span[data-type="mathInline"]' }];
},
renderHTML({ HTMLAttributes }) {
return [
"span",
{ "data-type": "mathInline", "data-katex": "true" },
`${HTMLAttributes.text ?? ""}`,
];
},
});
/** Block KaTeX expression. Carries the LaTeX source in `text`. */
const MathBlock = Node.create({
name: "mathBlock",
group: "block",
atom: true,
isolating: true,
addAttributes() {
return {
text: { default: "" },
};
},
parseHTML() {
return [{ tag: 'div[data-type="mathBlock"]' }];
},
renderHTML({ HTMLAttributes }) {
return [
"div",
{ "data-type": "mathBlock", "data-katex": "true" },
`${HTMLAttributes.text ?? ""}`,
];
},
});
/** Collapsible <details> wrapper: summary + content children. */
const Details = Node.create({
name: "details",
group: "block",
content: "detailsSummary detailsContent",
defining: true,
isolating: true,
addAttributes() {
return {
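open: {
default: false,
// `open` is a boolean attribute: getAttribute("open") returns "" (falsy)
// for <details open>, so test for presence instead.
parseHTML: (el) => el.hasAttribute("open"),
renderHTML: (attrs) => attrs.open ? { open: "" } : {},
},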
};
},
parseHTML() {
return [{ tag: "details" }];
},
renderHTML({ HTMLAttributes }) {
return ["details", { ...HTMLAttributes }, 0];
},
});
/** Clickable summary line of a <details> block. */
const DetailsSummary = Node.create({
name: "detailsSummary",
group: "block",
content: "inline*",
defining: true,
isolating: true,
selectable: false,
parseHTML() {
return [{ tag: "summary" }];
},
renderHTML({ HTMLAttributes }) {
return ["summary", { "data-type": "detailsSummary", ...HTMLAttributes }, 0];
},
});
/** Body of a <details> block. Permissive content so fromYdoc output validates. */
const DetailsContent = Node.create({
name: "detailsContent",
group: "block",
// Docmost declares block* (an empty details body is valid); block+ would
// reject a collapsed/empty details on round-trip.
content: "block*",
defining: true,
selectable: false,
parseHTML() {
return [{ tag: 'div[data-type="detailsContent"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "detailsContent", ...HTMLAttributes }, 0];
},
});
/** File attachment card (non-image upload). Block atom. */
const Attachment = Node.create({
name: "attachment",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
url: {
default: "",
parseHTML: (el) => el.getAttribute("data-attachment-url"),
renderHTML: (attrs) => ({
"data-attachment-url": attrs.url ?? "",
}),
},
name: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-name"),
renderHTML: (attrs) => attrs.name ? { "data-attachment-name": attrs.name } : {},
},
mime: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-mime"),
renderHTML: (attrs) => attrs.mime ? { "data-attachment-mime": attrs.mime } : {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-attachment-size": attrs.size } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
// Docmost declares `placeholder` (a transient upload key, not rendered
// to HTML). Carry it so a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: 'div[data-type="attachment"]' }];
},
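renderHTML({ HTMLAttributes }) {
// No content hole: attachment is a leaf (atom, no content), and ProseMirror's
// DOMSerializer rejects a hole in a leaf node spec.
return ["div", { "data-type": "attachment", ...HTMLAttributes }];
},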
});
/** Uploaded <video> player. Block atom. */
const Video = Node.create({
name: "video",
group: "block",
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("src"),
renderHTML: (attrs) => ({ src: attrs.src ?? "" }),
},
alt: {
default: null,
parseHTML: (el) => el.getAttribute("aria-label"),
renderHTML: (attrs) => attrs.alt ? { "aria-label": attrs.alt } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("width"),
renderHTML: (attrs) => attrs.width != null ? { width: attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("height"),
renderHTML: (attrs) => attrs.height != null ? { height: attrs.height } : {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => attrs.align ? { "data-align": attrs.align } : {},
},
aspectRatio: {
default: null,
parseHTML: (el) => el.getAttribute("data-aspect-ratio"),
renderHTML: (attrs) => attrs.aspectRatio != null
? { "data-aspect-ratio": attrs.aspectRatio }
: {},
},
// Docmost declares `placeholder` (a transient upload key, not rendered
// to HTML). Carry it so a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: "video" }];
},
renderHTML({ HTMLAttributes }) {
return ["video", { controls: "true", ...HTMLAttributes }];
},
});
/**
* Defensive passthrough for a `youtube` node. Docmost itself has no dedicated
* youtube node (YouTube is handled via `embed`), but the converter read path
* references this type, so accept it as a generic block atom that preserves
* its src so legacy/external documents survive a round-trip.
*/
const Youtube = Node.create({
name: "youtube",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("data-src"),
renderHTML: (attrs) => ({
"data-src": attrs.src ?? "",
}),
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("data-width"),
renderHTML: (attrs) => attrs.width != null ? { "data-width": attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("data-height"),
renderHTML: (attrs) => attrs.height != null ? { "data-height": attrs.height } : {},
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => attrs.align ? { "data-align": attrs.align } : {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="youtube"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "youtube", ...HTMLAttributes }];
},
});
/** Generic embed (provider iframe). Block atom. */
const Embed = Node.create({
name: "embed",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("data-src"),
renderHTML: (attrs) => ({
"data-src": attrs.src ?? "",
}),
},
provider: {
default: "",
parseHTML: (el) => el.getAttribute("data-provider"),
renderHTML: (attrs) => ({
"data-provider": attrs.provider ?? "",
}),
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => ({
"data-align": attrs.align ?? "center",
}),
},
width: {
default: 800,
parseHTML: (el) => el.getAttribute("data-width"),
renderHTML: (attrs) => ({
"data-width": attrs.width,
}),
},
height: {
default: 600,
parseHTML: (el) => el.getAttribute("data-height"),
renderHTML: (attrs) => ({
"data-height": attrs.height,
}),
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="embed"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "embed", ...HTMLAttributes }];
},
});
/** Shared attribute set for drawio/excalidraw diagram nodes. */
const diagramAttributes = () => ({
src: {
default: "",
parseHTML: (el) => el.getAttribute("data-src"),
renderHTML: (attrs) => ({
"data-src": attrs.src ?? "",
}),
},
title: {
default: null,
parseHTML: (el) => el.getAttribute("data-title"),
renderHTML: (attrs) => attrs.title ? { "data-title": attrs.title } : {},
},
alt: {
default: null,
parseHTML: (el) => el.getAttribute("data-alt"),
renderHTML: (attrs) => attrs.alt ? { "data-alt": attrs.alt } : {},
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("data-width"),
renderHTML: (attrs) => attrs.width != null ? { "data-width": attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("data-height"),
renderHTML: (attrs) => attrs.height != null ? { "data-height": attrs.height } : {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
aspectRatio: {
default: null,
parseHTML: (el) => el.getAttribute("data-aspect-ratio"),
renderHTML: (attrs) => attrs.aspectRatio != null
? { "data-aspect-ratio": attrs.aspectRatio }
: {},
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => attrs.align ? { "data-align": attrs.align } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId ? { "data-attachment-id": attrs.attachmentId } : {},
},
});
/** draw.io diagram. Block atom (image-backed). */
const Drawio = Node.create({
name: "drawio",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes: diagramAttributes,
parseHTML() {
return [{ tag: 'div[data-type="drawio"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "drawio", ...HTMLAttributes }];
},
});
/** Excalidraw diagram. Block atom (image-backed). */
const Excalidraw = Node.create({
name: "excalidraw",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes: diagramAttributes,
parseHTML() {
return [{ tag: 'div[data-type="excalidraw"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "excalidraw", ...HTMLAttributes }];
},
});
/** Multi-column layout container holding one or more `column` children. */
const Columns = Node.create({
name: "columns",
group: "block",
content: "column+",
defining: true,
isolating: true,
addAttributes() {
return {
layout: {
default: "two_equal",
parseHTML: (el) => el.getAttribute("data-layout"),
renderHTML: (attrs) => attrs.layout ? { "data-layout": attrs.layout } : {},
},
widthMode: {
default: "normal",
parseHTML: (el) => el.getAttribute("data-width-mode") || "normal",
renderHTML: (attrs) => attrs.widthMode && attrs.widthMode !== "normal"
? { "data-width-mode": attrs.widthMode }
: {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="columns"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "columns", ...HTMLAttributes }, 0];
},
});
/** Single column within a `columns` layout. */
const Column = Node.create({
name: "column",
group: "block",
content: "block+",
defining: true,
isolating: true,
selectable: false,
addAttributes() {
return {
width: {
default: null,
parseHTML: (el) => {
const value = el.getAttribute("data-width");
return value ? parseFloat(value) : null;
},
renderHTML: (attrs) => attrs.width ? { "data-width": attrs.width } : {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="column"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "column", ...HTMLAttributes }, 0];
},
});
/**
* Subpages listing block (auto-generated index of child pages). Docmost
* declares no attributes; the markdown-converter has a `case "subpages"`, so
* the read path can emit it and toYdoc must accept it. Block atom.
*/
const Subpages = Node.create({
name: "subpages",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
parseHTML() {
return [{ tag: 'div[data-type="subpages"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "subpages", ...HTMLAttributes }];
},
});
/** Uploaded <audio> player. Block atom. Mirrors Docmost audio attrs. */
const Audio = Node.create({
name: "audio",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("src"),
renderHTML: (attrs) => ({ src: attrs.src ?? "" }),
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
// Transient upload key Docmost declares with rendered:false; carried so
// a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: "audio" }];
},
renderHTML({ HTMLAttributes }) {
return ["audio", { controls: "true", ...HTMLAttributes }];
},
});
/** Embedded PDF viewer. Block atom. Mirrors Docmost pdf attrs. */
const Pdf = Node.create({
name: "pdf",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("src"),
renderHTML: (attrs) => ({ src: attrs.src ?? "" }),
},
name: {
default: null,
parseHTML: (el) => el.getAttribute("data-name"),
renderHTML: (attrs) => attrs.name ? { "data-name": attrs.name } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("width"),
renderHTML: (attrs) => attrs.width != null ? { width: attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("height"),
renderHTML: (attrs) => attrs.height != null ? { height: attrs.height } : {},
},
// Transient upload key Docmost declares with rendered:false; carried so
// a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: 'div[data-type="pdf"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "pdf", ...HTMLAttributes }];
},
});
/** Page break (print/export divider). Block atom; Docmost declares no attrs. */
const PageBreak = Node.create({
name: "pageBreak",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
parseHTML() {
return [{ tag: 'div[data-type="pageBreak"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "pageBreak", ...HTMLAttributes }];
},
});
/**
* Full extension list. Image is block-level (matches Docmost); the
* ProseMirror DOM parser hoists <img> found inside <p> automatically.
* StarterKit v3 already bundles the link extension, configured here.
*/
export const docmostExtensions = [
StarterKit.configure({
codeBlock: {},
heading: {},
link: { openOnClick: false },
}),
Image.configure({ inline: false }),
TaskList,
TaskItem.configure({ nested: true }),
// Highlight stores its color unescaped and Docmost interpolates it into
// style="background-color: ${color}". Wrap the color attribute's parseHTML
// with the same allowlist guard used by textStyle so a crafted import color
// cannot break out of the style attribute. Multicolor behavior is preserved.
Highlight.extend({
addAttributes() {
const parent = this.parent?.() ?? {};
return {
...parent,
color: {
...parent.color,
parseHTML: (el) => sanitizeCssColor(el.getAttribute("data-color") ||
getStyleProperty(el, "background-color") ||
el.style.backgroundColor),
},
};
},
}).configure({ multicolor: true }),
Subscript,
Superscript,
// StarterKit does not provide a textStyle mark, so register ours; without it
// generateJSON drops <span style="color: ...">, defeating the color import.
TextStyle,
Comment,
Callout,
Table,
TableRow,
TableCell,
TableHeader,
Mention,
MathInline,
MathBlock,
Details,
DetailsSummary,
DetailsContent,
Attachment,
Video,
Youtube,
Embed,
Drawio,
Excalidraw,
Columns,
Column,
Subpages,
Audio,
Pdf,
PageBreak,
DocmostAttributes,
];
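
Most attributes in the node definitions above repeat the same pattern: read one HTML attribute in `parseHTML`, and emit it in `renderHTML` only when a value is set. A minimal sketch of that pattern as a helper follows. `dataAttr` and `fakeEl` are hypothetical names used for illustration only; the package spells each attribute out by hand, and some of its attributes use a truthiness check rather than the `!= null` check shown here.

```javascript
// Hypothetical helper (not part of the package): builds the
// { default, parseHTML, renderHTML } triple for one attribute.
function dataAttr(name, htmlName, defaultValue = null) {
  return {
    default: defaultValue,
    // Read the raw HTML attribute; a missing attribute yields null.
    parseHTML: (el) => el.getAttribute(htmlName),
    // Emit the HTML attribute only when the node attribute is set.
    renderHTML: (attrs) =>
      attrs[name] != null ? { [htmlName]: attrs[name] } : {},
  };
}

// Stand-in for a DOM element so the sketch runs without a browser or jsdom.
const fakeEl = {
  getAttribute: (n) => ({ "data-size": "2048" })[n] ?? null,
};

const size = dataAttr("size", "data-size");
console.log(size.parseHTML(fakeEl));            // "2048" (attributes parse as strings)
console.log(size.renderHTML({ size: 2048 }));   // { 'data-size': 2048 }
console.log(size.renderHTML({ size: null }));   // {}
```

Returning an empty object for unset values keeps the serialized HTML free of `data-*="null"` noise, which is the same reason the definitions above guard each `renderHTML`.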

@@ -0,0 +1,87 @@
/**
* Filter functions that keep only the fields an agent needs from API
* responses, dropping everything else.
*/
export function filterWorkspace(data) {
return {
id: data.id,
name: data.name,
description: data.description,
defaultSpaceId: data.defaultSpaceId,
createdAt: data.createdAt,
updatedAt: data.updatedAt,
deletedAt: data.deletedAt,
};
}
export function filterSpace(space) {
return {
id: space.id,
name: space.name,
description: space.description,
slug: space.slug,
visibility: space.visibility,
createdAt: space.createdAt,
updatedAt: space.updatedAt,
deletedAt: space.deletedAt,
};
}
export function filterGroup(group) {
return {
id: group.id,
name: group.name,
description: group.description,
workspaceId: group.workspaceId,
createdAt: group.createdAt,
updatedAt: group.updatedAt,
deletedAt: group.deletedAt,
};
}
export function filterPage(page, content, subpages) {
return {
id: page.id,
slugId: page.slugId,
title: page.title,
parentPageId: page.parentPageId,
spaceId: page.spaceId,
isLocked: page.isLocked,
createdAt: page.createdAt,
updatedAt: page.updatedAt,
deletedAt: page.deletedAt,
// Include the converted markdown content whenever it is a string (even an empty one)
...(typeof content === "string" && { content }),
// Include subpages if provided
...(subpages &&
subpages.length > 0 && {
subpages: subpages.map((p) => ({ id: p.id, title: p.title })),
}),
};
}
export function filterComment(comment, markdownContent) {
return {
id: comment.id,
pageId: comment.pageId,
content: markdownContent ?? comment.content,
selection: comment.selection || null,
type: comment.type || "page",
parentCommentId: comment.parentCommentId || null,
creatorId: comment.creatorId,
creatorName: comment.creator?.name || null,
createdAt: comment.createdAt,
editedAt: comment.editedAt || null,
resolvedAt: comment.resolvedAt || null,
resolvedById: comment.resolvedById || null,
};
}
export function filterSearchResult(result) {
return {
id: result.id,
title: result.title,
parentPageId: result.parentPageId,
createdAt: result.createdAt,
updatedAt: result.updatedAt,
rank: result.rank,
highlight: result.highlight,
spaceId: result.space?.id,
spaceName: result.space?.name,
};
}
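
The optional fields in `filterPage` rely on spreading a falsy value being a no-op. The sketch below is a reduced copy of that logic, not the function itself: `pickPage` is a hypothetical name, and the sample page objects are made up for illustration.

```javascript
// Reduced copy of filterPage above, keeping only the optional-field logic.
function pickPage(page, content, subpages) {
  return {
    id: page.id,
    title: page.title,
    // Spreading `false` adds nothing, so `content` appears only for strings.
    ...(typeof content === "string" && { content }),
    // Omitted when subpages is missing or empty.
    ...(subpages &&
      subpages.length > 0 && {
        subpages: subpages.map((p) => ({ id: p.id, title: p.title })),
      }),
  };
}

const page = { id: "p1", title: "Home", internalField: "dropped" };

// An empty string is still a valid body, so `content` is kept.
console.log(pickPage(page, "", []));
// { id: 'p1', title: 'Home', content: '' }

// No content fetched, one child page: `content` is absent, `subpages` is trimmed.
console.log(pickPage(page, undefined, [{ id: "c1", title: "Child", extra: 1 }]));
// { id: 'p1', title: 'Home', subpages: [ { id: 'c1', title: 'Child' } ] }
```

Checking `typeof content === "string"` rather than truthiness is what lets an empty page report `content: ""` instead of silently dropping the key.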

@@ -0,0 +1,100 @@
/**
* Surgical text edits on a ProseMirror document without re-importing it.
*
* Each edit replaces an exact substring inside individual text nodes,
* preserving every node id, mark and attribute around it. This is the
* safe alternative to a full markdown re-import for small wording fixes.
*/
/** Collect plain text of the whole document (for span-detection hints). */
function collectText(node) {
let out = "";
if (node.type === "text")
out += node.text || "";
for (const child of node.content || [])
out += collectText(child);
return out;
}
function countOccurrences(haystack, needle) {
if (!needle)
return 0;
let count = 0;
let idx = haystack.indexOf(needle);
while (idx !== -1) {
count++;
idx = haystack.indexOf(needle, idx + needle.length);
}
return count;
}
/**
* Apply text edits to a deep copy of a ProseMirror doc; the input is left
* untouched. Returns `{ doc, results }`, where `results` records how many
* replacements each edit made.
* Throws a descriptive error when an edit matches zero times, or matches
* multiple times without replaceAll, so the caller can refine `find`.
*/
export function applyTextEdits(doc, edits) {
const copy = JSON.parse(JSON.stringify(doc));
const results = [];
for (const edit of edits) {
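if (!edit.find)
throw new Error("edit.find must be a non-empty string");
// A non-string replacement would be coerced into the text ("undefined"),
// and in the replaceAll path join(undefined) falls back to "," separators.
if (typeof edit.replace !== "string")
throw new Error("edit.replace must be a string");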
// Count matches inside individual text nodes first.
let nodeMatches = 0;
(function count(node) {
if (node.type === "text" && node.text) {
nodeMatches += countOccurrences(node.text, edit.find);
}
for (const child of node.content || [])
count(child);
})(copy);
if (nodeMatches === 0) {
// Distinguish "text not present" from "text spans formatting runs".
const fullText = collectText(copy);
if (fullText.includes(edit.find)) {
throw new Error(`Edit "${truncate(edit.find)}": the text exists in the document but spans ` +
`multiple formatting runs (bold/link/italic boundaries). Use a shorter ` +
`fragment that stays inside one run, or use update_page_json for ` +
`structural changes.`);
}
throw new Error(`Edit "${truncate(edit.find)}": text not found in the document.`);
}
if (nodeMatches > 1 && !edit.replaceAll) {
throw new Error(`Edit "${truncate(edit.find)}": matches ${nodeMatches} times. ` +
`Provide a longer, unique fragment or set replaceAll: true.`);
}
// Perform the replacement(s).
let done = 0;
(function replace(node) {
if (node.type === "text" && node.text && node.text.includes(edit.find)) {
if (edit.replaceAll) {
done += countOccurrences(node.text, edit.find);
node.text = node.text.split(edit.find).join(edit.replace);
}
else if (done === 0) {
// Avoid String.replace: its second arg treats $&, $1, $`, $', $$ as
// special patterns, expanding them instead of inserting literally.
// Splice the first occurrence by index to keep the replacement literal.
const idx = node.text.indexOf(edit.find);
node.text =
node.text.slice(0, idx) +
edit.replace +
node.text.slice(idx + edit.find.length);
done = 1;
}
}
for (const child of node.content || [])
replace(child);
})(copy);
results.push({ find: edit.find, replacements: done });
}
// Drop text nodes that became empty (ProseMirror forbids empty text nodes).
(function prune(node) {
if (Array.isArray(node.content)) {
node.content = node.content.filter((child) => !(child.type === "text" && child.text === ""));
for (const child of node.content)
prune(child);
}
})(copy);
return { doc: copy, results };
}
function truncate(s) {
return s.length > 60 ? s.slice(0, 57) + "..." : s;
}
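
Two behaviours of `applyTextEdits` are easy to miss: matching happens per text node (one node per formatting run), and the single-replacement path splices by index to keep the replacement literal. The sketch below illustrates both in isolation. It does not import the module; the sample document and the helper names `textRuns`, `inOneRun` and `inFullText` are assumptions made for the example.

```javascript
// Minimal document: "Hello " (plain) followed by "world" (bold).
const doc = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "Hello " },
        { type: "text", text: "world", marks: [{ type: "bold" }] },
      ],
    },
  ],
};

// Collect the text of each text node separately (one entry per run).
function textRuns(node, out = []) {
  if (node.type === "text") out.push(node.text || "");
  for (const child of node.content || []) textRuns(child, out);
  return out;
}

const runs = textRuns(doc);
const inOneRun = (find) => runs.some((t) => t.includes(find));
const inFullText = (find) => runs.join("").includes(find);

console.log(inOneRun("world"));         // true: fits inside the bold run
console.log(inOneRun("Hello world"));   // false: crosses the bold boundary
console.log(inFullText("Hello world")); // true: hence the "spans formatting runs" error

// Literal replacement: String.replace expands "$&" to the matched text,
// while splicing by index inserts the replacement verbatim.
const text = "price: X";
const viaReplace = text.replace("X", "$&$&");
const idx = text.indexOf("X");
const viaSplice = text.slice(0, idx) + "$&$&" + text.slice(idx + "X".length);

console.log(viaReplace); // "price: XX"
console.log(viaSplice);  // "price: $&$&"
```

When an edit fails with the "spans multiple formatting runs" message, shortening `find` to a fragment that sits inside one run (here `"world"`) is usually enough.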

@@ -0,0 +1,795 @@
/**
* Convert ProseMirror/TipTap JSON content to Markdown
* Supports all Docmost-specific node types and extensions
*/
export function convertProseMirrorToMarkdown(content) {
if (!content || !content.content)
return "";
// Escape a value interpolated into a double-quoted HTML attribute value
// (textAlign, colors, image src, math `text`, all data-* attrs, etc.). The
// value is always wrapped in double quotes, so only the delimiting " and the
// entity-starting & are special; escape ONLY those two. We deliberately do
// NOT escape <, > or ': the HTML re-parser (parse5/jsdom via @tiptap/html)
// does NOT decode &lt;/&gt;/&#39; back inside attribute values, so escaping
// them would corrupt the stored data (e.g. a math node's LaTeX `a < b`) and
// ACCUMULATE escapes on every round-trip (`a < b` -> `a &lt; b` ->
// `a &amp;lt; b`). Escaping & and " keeps the value inert against
// attribute injection while staying idempotent (parse5 decodes both back).
const escapeAttr = (value) => String(value)
.replace(/&/g, "&amp;")
.replace(/"/g, "&quot;");
// Escape a value placed as HTML element TEXT content (between tags), where
// <, >, and & are all significant. Used for text rendered inside raw-HTML
// blocks (table cells / columns) so stored characters cannot inject markup.
const escapeHtmlText = (value) => String(value)
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;");
// Percent-encode characters that would break out of a markdown URL target
// (...) — whitespace/newlines and parentheses — so a stored src stays a
// single inert token (used for image/video/youtube srcs).
const encodeMdUrl = (value) => String(value || "")
.replace(/\s/g, (c) => (c === " " ? "%20" : encodeURIComponent(c)))
.replace(/\(/g, "%28")
.replace(/\)/g, "%29");
const processNode = (node) => {
const type = node.type;
const nodeContent = node.content || [];
switch (type) {
case "doc":
return nodeContent.map(processNode).join("\n\n");
case "paragraph":
const text = nodeContent.map(processNode).join("");
const align = node.attrs?.textAlign;
if (align && align !== "left") {
return `<div align="${escapeAttr(align)}">${text}</div>`;
}
return text || "";
case "heading":
const level = node.attrs?.level || 1;
const headingText = nodeContent.map(processNode).join("");
return "#".repeat(level) + " " + headingText;
case "text":
let textContent = node.text || "";
// Apply marks (bold, italic, code, etc.)
if (node.marks) {
// Markdown code spans (`...`) cannot carry inner formatting, so when a
// run has the `code` mark alongside ANY other mark, backtick syntax
// would leak literal ** / []() into the code text. In that case emit
// nested HTML (<code> innermost, the other marks wrapping it as HTML)
// so the output is at least well-formed and re-parseable.
//
// NOTE: this does NOT round-trip both marks. The schema's `code` mark
// has `excludes: "_"` (it excludes every other mark), so on import the
// co-occurring mark is always dropped — the run comes back as `code`
// only. We keep the emission simple and accept that the other mark is
// lost; preserving both is impossible while `code` excludes them.
// Only use the backtick form when `code` is the sole mark.
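const markTypes = node.marks.map((m) => m.type);
const hasCode = markTypes.includes("code");
const codeCombined = hasCode && markTypes.length > 1;
// Marks are applied inside-out, so apply `code` first when it is combined
// with other marks; this keeps <code> innermost regardless of the order
// in which the marks are stored on the node.
const orderedMarks = codeCombined
? [...node.marks].sort((a, b) => Number(b.type === "code") - Number(a.type === "code"))
: node.marks;
for (const mark of orderedMarks) {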
switch (mark.type) {
case "bold":
textContent = codeCombined
? `<strong>${textContent}</strong>`
: `**${textContent}**`;
break;
case "italic":
textContent = codeCombined
? `<em>${textContent}</em>`
: `*${textContent}*`;
break;
case "code":
// When combined with another mark, wrap as <code> so the
// surrounding HTML marks can nest around it; otherwise use the
// plain backtick span.
textContent = codeCombined
? `<code>${textContent}</code>`
: `\`${textContent}\``;
break;
case "link": {
const href = mark.attrs?.href || "";
const title = mark.attrs?.title;
if (codeCombined) {
// Emit an HTML anchor so it can wrap the nested <code>.
const safeHref = escapeAttr(href);
if (title) {
textContent = `<a href="${safeHref}" title="${escapeAttr(String(title))}">${textContent}</a>`;
}
else {
textContent = `<a href="${safeHref}">${textContent}</a>`;
}
}
else if (title) {
// Emit the optional markdown link title; escape an embedded
// double-quote so it cannot terminate the title string early.
const safeTitle = String(title).replace(/"/g, '\\"');
textContent = `[${textContent}](${href} "${safeTitle}")`;
}
else {
textContent = `[${textContent}](${href})`;
}
break;
}
case "strike":
textContent = codeCombined
? `<s>${textContent}</s>`
: `~~${textContent}~~`;
break;
case "underline":
textContent = `<u>${textContent}</u>`;
break;
case "subscript":
textContent = `<sub>${textContent}</sub>`;
break;
case "superscript":
textContent = `<sup>${textContent}</sup>`;
break;
case "highlight": {
// Preserve a null/empty color as a plain highlight (a bare
// <mark> with no background-color); only emit the style when a
// color is actually set, so a plain highlight is not forced to
// yellow on export.
const color = mark.attrs?.color;
textContent = color
? `<mark style="background-color: ${escapeAttr(color)}">${textContent}</mark>`
: `<mark>${textContent}</mark>`;
break;
}
case "textStyle":
if (mark.attrs?.color) {
textContent = `<span style="color: ${escapeAttr(mark.attrs.color)}">${textContent}</span>`;
}
break;
case "comment": {
// Emit the inline comment anchor so highlights round-trip. The
// schema's Comment mark parses span[data-comment-id] (attrs
// commentId/resolved).
const cid = mark.attrs?.commentId;
if (cid) {
const resolvedAttr = mark.attrs?.resolved
? ` data-resolved="true"`
: "";
textContent = `<span data-comment-id="${escapeAttr(cid)}"${resolvedAttr}>${textContent}</span>`;
}
break;
}
}
}
}
return textContent;
case "codeBlock":
const language = node.attrs?.language || "";
// Strip ALL trailing newlines so the export is idempotent: marked
// re-adds exactly one trailing "\n" on import, so trimming only one
// here would let the text grow by "\n" on each round-trip. Removing
// every trailing newline makes repeated cycles stable.
const code = nodeContent
.map(processNode)
.join("")
.replace(/\n+$/, "");
return "```" + language + "\n" + code + "\n```";
case "bulletList":
return nodeContent
.map((item) => processListItem(item, "-"))
.join("\n");
case "orderedList":
return nodeContent
.map((item, index) => processListItem(item, `${index + 1}.`))
.join("\n");
case "taskList":
return nodeContent.map((item) => processTaskItem(item)).join("\n");
case "taskItem":
// Delegate to the same helper used by taskList so multi-block and
// nested task items render and indent consistently.
return processTaskItem(node);
case "listItem":
return nodeContent.map(processNode).join("\n");
case "blockquote":
// Prefix EVERY line of EVERY child with "> " and separate block-level
// children with a blank ">" line so code blocks / multi-paragraph
// quotes round-trip correctly.
return nodeContent
.map((n) => processNode(n)
.split("\n")
.map((line) => (line.length ? `> ${line}` : ">"))
.join("\n"))
.join("\n>\n");
case "horizontalRule":
return "---";
case "hardBreak":
// Two trailing spaces before the newline encode a markdown hard break;
// a bare "\n" would be reimported as a soft break and lost.
return "  \n";
case "image":
const imgAlt = node.attrs?.alt || "";
// Neutralize characters that could break out of the markdown image
// URL: spaces/newlines and parentheses would terminate the (...) target
// and let a stored src inject following markdown/HTML. Percent-encode
// them so the URL stays a single inert token.
const imgSrc = encodeMdUrl(node.attrs?.src);
// No "caption" attribute exists in the Docmost image schema, so we do
// not emit one (the previous caption branch was dead).
return `![${imgAlt}](${imgSrc})`;
case "video": {
// Emit the schema-matching <video> element so generateJSON rebuilds the
// node with its attrs intact. The schema's parseHTML reads src/aria-label
// from the standard attributes and the remaining attrs from data-*.
const attrs = node.attrs || {};
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt)
parts.push(`aria-label="${escapeAttr(attrs.alt)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
// Wrap in a block <div> so marked treats it as a block (a bare <video>
// is inline-level HTML and marked wraps it in <p>, leaving a spurious
// empty paragraph beside the hoisted block atom). The wrapper has no
// data-type, so the schema parser ignores it and just hoists the video.
return `<div><video ${parts.join(" ")}></video></div>`;
}
case "youtube": {
// Emit the schema-matching div[data-type="youtube"]; the schema reads
// src from data-src and width/height/align from data-* attributes.
const attrs = node.attrs || {};
const parts = [
`data-type="youtube"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "table": {
// A GFM pipe table cannot represent merged cells. If ANY cell carries
// colspan>1 or rowspan>1, a pipe table would corrupt the grid on
// re-import, so emit the WHOLE table as raw HTML <table> instead: the
// schema's table family parseHTML (tag table/tr/td/th, with colspan/
// rowspan read from the same-named HTML attrs and align via parseHTML)
// round-trips it faithfully. Otherwise keep the lighter GFM pipe table.
const tableRows = nodeContent;
if (tableRows.length === 0)
return "";
const hasSpan = tableRows.some((row) => (row.content || []).some((cell) => (cell.attrs?.colspan ?? 1) > 1 || (cell.attrs?.rowspan ?? 1) > 1));
if (hasSpan) {
// Render each cell's block children to HTML (marked does NOT parse
// markdown inside a raw HTML block, so emitting markdown here would
// leak literal ** / `` into the cell). blockToHtml mirrors the schema
// HTML so inner formatting re-parses into the right marks/nodes.
const renderHtmlCell = (cell) => {
const tag = cell.type === "tableHeader" ? "th" : "td";
const a = cell.attrs || {};
const cellParts = [];
if ((a.colspan ?? 1) > 1)
cellParts.push(`colspan="${escapeAttr(a.colspan)}"`);
if ((a.rowspan ?? 1) > 1)
cellParts.push(`rowspan="${escapeAttr(a.rowspan)}"`);
if (a.align)
cellParts.push(`align="${escapeAttr(a.align)}"`);
const open = cellParts.length
? `<${tag} ${cellParts.join(" ")}>`
: `<${tag}>`;
const inner = (cell.content || [])
.map((block) => blockToHtml(block))
.join("");
return `${open}${inner}</${tag}>`;
};
const htmlRows = tableRows
.map((row) => `<tr>${(row.content || []).map(renderHtmlCell).join("")}</tr>`)
.join("");
return `<table><tbody>${htmlRows}</tbody></table>`;
}
// No merged cells: emit a GFM table (header row + separator) so the
// markdown can be parsed back into a table on re-import.
const rows = tableRows.map(processNode);
const headerCells = tableRows[0]?.content || [];
const columns = headerCells.length || 1;
// Derive alignment markers (:--, :-:, --:) from each header cell.
const markers = Array.from({ length: columns }, (_, i) => {
const align = headerCells[i]?.attrs?.align;
switch (align) {
case "left":
return ":--";
case "center":
return ":-:";
case "right":
return "--:";
default:
return "---";
}
});
const separator = "| " + markers.join(" | ") + " |";
return [rows[0], separator, ...rows.slice(1)].join("\n");
}
case "tableRow":
return "| " + nodeContent.map(processNode).join(" | ") + " |";
case "tableCell":
case "tableHeader": {
// Join multiple block children with a space (not "") so adjacent blocks
// like a paragraph followed by a list don't collide into "line1- a".
// Then collapse newlines and escape pipes so a cell containing "|" or a
// line break cannot corrupt the surrounding GFM row.
return nodeContent
.map(processNode)
.join(" ")
.replace(/\r?\n/g, " ")
.replace(/\|/g, "\\|");
}
case "callout":
const calloutType = node.attrs?.type || "info";
const calloutContent = nodeContent.map(processNode).join("\n");
return `:::${calloutType.toLowerCase()}\n${calloutContent}\n:::`;
case "details":
return nodeContent.map(processNode).join("\n");
case "detailsSummary":
const summaryText = nodeContent.map(processNode).join("");
return `<details>\n<summary>${summaryText}</summary>\n`;
case "detailsContent":
const detailsText = nodeContent.map(processNode).join("\n");
return `${detailsText}\n</details>`;
case "mathInline": {
// The schema's `text` attribute has no parseHTML, so TipTap's default
// parser reads it from the `text` HTML attribute (NOT the element's text
// content). Emit span[data-type="mathInline"] carrying the LaTeX in a
// `text="..."` attribute so it round-trips. marked cannot parse $...$
// back, so the previous form was lossy.
const inlineMath = node.attrs?.text || "";
return `<span data-type="mathInline" data-katex="true" text="${escapeAttr(inlineMath)}"></span>`;
}
case "mathBlock": {
// Same as mathInline: the LaTeX must ride in the `text` HTML attribute
// for the schema's default parser to recover it.
const blockMath = node.attrs?.text || "";
return `<div data-type="mathBlock" data-katex="true" text="${escapeAttr(blockMath)}"></div>`;
}
case "mention": {
// Emit span[data-type="mention"] with the schema's data-* attributes so
// generateJSON rebuilds the mention node instead of leaving "@label"
// plain text that cannot re-parse.
const attrs = node.attrs || {};
const parts = [`data-type="mention"`];
if (attrs.id)
parts.push(`data-id="${escapeAttr(attrs.id)}"`);
if (attrs.label)
parts.push(`data-label="${escapeAttr(attrs.label)}"`);
if (attrs.entityType)
parts.push(`data-entity-type="${escapeAttr(attrs.entityType)}"`);
if (attrs.entityId)
parts.push(`data-entity-id="${escapeAttr(attrs.entityId)}"`);
if (attrs.slugId)
parts.push(`data-slug-id="${escapeAttr(attrs.slugId)}"`);
if (attrs.creatorId)
parts.push(`data-creator-id="${escapeAttr(attrs.creatorId)}"`);
if (attrs.anchorId)
parts.push(`data-anchor-id="${escapeAttr(attrs.anchorId)}"`);
// Keep the label as visible text content too. The schema reads attrs from
// data-*, so the inner text is purely cosmetic. Because it is element TEXT
// content (not an attribute value), it is escaped for the text context with
// escapeHtmlText rather than escapeAttr.
const mentionLabel = attrs.label || attrs.id || "";
return `<span ${parts.join(" ")}>@${escapeHtmlText(mentionLabel)}</span>`;
}
case "attachment": {
// BUG FIX: the old code read node.attrs.fileName / node.attrs.src, but
// the schema stores name/url (plus mime/size/attachmentId). Emit the
// schema-matching div[data-type="attachment"] with data-attachment-*
// attrs so the node round-trips instead of degrading to a markdown link.
const attrs = node.attrs || {};
const parts = [
`data-type="attachment"`,
`data-attachment-url="${escapeAttr(attrs.url ?? "")}"`,
];
if (attrs.name)
parts.push(`data-attachment-name="${escapeAttr(attrs.name)}"`);
if (attrs.mime)
parts.push(`data-attachment-mime="${escapeAttr(attrs.mime)}"`);
if (attrs.size != null)
parts.push(`data-attachment-size="${escapeAttr(attrs.size)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "drawio":
case "excalidraw": {
// Emit the schema-matching div[data-type=...] carrying the diagram's
// attrs as data-* (the schema's diagramAttributes reads src/title/alt/
// width/height/size/aspectRatio/align/attachmentId from data-*), so the
// diagram round-trips instead of degrading to a lossy placeholder.
const attrs = node.attrs || {};
const parts = [
`data-type="${type}"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.title != null)
parts.push(`data-title="${escapeAttr(attrs.title)}"`);
if (attrs.alt != null)
parts.push(`data-alt="${escapeAttr(attrs.alt)}"`);
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "embed": {
// Emit the schema-matching div[data-type="embed"]; the schema reads
// src/provider/align/width/height from data-* attributes so the node
// (and its provider iframe info) survives the round-trip.
const attrs = node.attrs || {};
const parts = [
`data-type="embed"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
`data-provider="${escapeAttr(attrs.provider ?? "")}"`,
];
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "audio": {
// Emit the schema-matching <audio> element (was emitting nothing). The
// schema reads src from src and attachmentId/size from data-*.
const attrs = node.attrs || {};
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
// Wrap in a block <div> for the same reason as video: a bare <audio> is
// inline-level HTML that marked would wrap in <p>.
return `<div><audio ${parts.join(" ")}></audio></div>`;
}
case "pdf": {
// Emit the schema-matching div[data-type="pdf"] (was emitting nothing).
// The schema reads src/width/height from standard attrs and name/
// attachmentId/size from data-*.
const attrs = node.attrs || {};
const parts = [
`data-type="pdf"`,
`src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.name)
parts.push(`data-name="${escapeAttr(attrs.name)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "columns": {
// Emit the schema-matching div[data-type="columns"] wrapper so the
// multi-column layout survives. Without a case the children were
// concatenated with no separator and the text merged. The schema reads
// layout from data-layout and widthMode from data-width-mode. The whole
// block is raw HTML, so render children via blockToHtml (NOT markdown,
// which marked would not re-parse inside a raw HTML block).
const attrs = node.attrs || {};
const parts = [`data-type="columns"`];
if (attrs.layout)
parts.push(`data-layout="${escapeAttr(attrs.layout)}"`);
if (attrs.widthMode && attrs.widthMode !== "normal")
parts.push(`data-width-mode="${escapeAttr(attrs.widthMode)}"`);
const inner = nodeContent.map((n) => blockToHtml(n)).join("");
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "column": {
// Emit the schema-matching div[data-type="column"]; the schema reads the
// column width from data-width. Children are rendered as HTML so their
// formatting survives inside this raw HTML block.
const attrs = node.attrs || {};
const parts = [`data-type="column"`];
if (attrs.width)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
const inner = nodeContent.map((n) => blockToHtml(n)).join("");
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "subpages":
return "{{SUBPAGES}}";
default:
// Fallback: process children
return nodeContent.map(processNode).join("");
}
};
// Render inline content (text runs + their marks) to HTML. Used by the raw
// HTML fallbacks (spanned tables, columns) where marked will NOT re-parse
// markdown, so backtick/asterisk/bracket syntax would otherwise leak as
// literal characters. Each mark is mirrored to the HTML the schema's parseHTML
// accepts so it re-imports as the matching ProseMirror mark.
const inlineToHtml = (inlineNodes) => (inlineNodes || [])
.map((n) => {
if (n.type === "hardBreak")
return "<br>";
if (n.type !== "text") {
// Inline atoms (mention, mathInline) already emit schema HTML.
return processNode(n);
}
let t = escapeHtmlText(n.text || "");
for (const mark of n.marks || []) {
switch (mark.type) {
case "bold":
t = `<strong>${t}</strong>`;
break;
case "italic":
t = `<em>${t}</em>`;
break;
case "code":
t = `<code>${t}</code>`;
break;
case "strike":
t = `<s>${t}</s>`;
break;
case "underline":
t = `<u>${t}</u>`;
break;
case "subscript":
t = `<sub>${t}</sub>`;
break;
case "superscript":
t = `<sup>${t}</sup>`;
break;
case "link":
t = `<a href="${escapeAttr(mark.attrs?.href || "")}">${t}</a>`;
break;
case "highlight":
t = mark.attrs?.color
? `<mark style="background-color: ${escapeAttr(mark.attrs.color)}">${t}</mark>`
: `<mark>${t}</mark>`;
break;
case "textStyle":
if (mark.attrs?.color)
t = `<span style="color: ${escapeAttr(mark.attrs.color)}">${t}</span>`;
break;
case "comment":
// Inline comment anchor inside a raw-HTML container (columns /
// spanned table cells), so commented text there also round-trips.
if (mark.attrs?.commentId) {
const r = mark.attrs?.resolved ? ` data-resolved="true"` : "";
t = `<span data-comment-id="${escapeAttr(mark.attrs.commentId)}"${r}>${t}</span>`;
}
break;
}
}
return t;
})
.join("");
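/*
Illustrative sketch, not part of the converter: how the mark loop above nests
wrappers around a single text run. `escapeText` and `escapeAttribute` are
hypothetical stand-ins for the module's escapeHtmlText / escapeAttr (whose real
implementations are not shown in this section), and only three marks are handled.

```javascript
// Hypothetical stand-ins for the module's escaping helpers (assumed behavior).
const escapeText = (s) =>
  String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
const escapeAttribute = (s) => escapeText(s).replace(/"/g, "&quot;");

// Wrap one text node in the HTML for each of its marks, in array order,
// so the first mark in the array ends up innermost.
function textRunToHtml(node) {
  let html = escapeText(node.text || "");
  for (const mark of node.marks || []) {
    if (mark.type === "bold") html = `<strong>${html}</strong>`;
    else if (mark.type === "code") html = `<code>${html}</code>`;
    else if (mark.type === "link")
      html = `<a href="${escapeAttribute(mark.attrs?.href || "")}">${html}</a>`;
  }
  return html;
}

const sampleRun = {
  type: "text",
  text: "a < b",
  marks: [{ type: "bold" }, { type: "link", attrs: { href: 'https://example.com/?q="x"' } }],
};

console.log(textRunToHtml(sampleRun));
```

The text is escaped once for the element-text context before any wrapper is
added, while the href is escaped separately for the attribute context.
*/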
// Emit the schema-matching <img> for an image node. Shared so the image is
// emitted as real HTML wherever a raw-HTML container needs it (inside a column
// or a spanned table cell), where markdown `![](...)` would NOT be re-parsed
// and would survive as literal text. The Image extension reads src/alt from
// the standard attributes; the Docmost extra attrs (width/height/align/size/
// attachmentId/aspectRatio) are global attributes read from same-named DOM
// attributes, so emit them by name.
const imageToHtml = (node) => {
const attrs = node.attrs || {};
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt)
parts.push(`alt="${escapeAttr(attrs.alt)}"`);
if (attrs.title)
parts.push(`title="${escapeAttr(attrs.title)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
if (attrs.align)
parts.push(`align="${escapeAttr(attrs.align)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
return `<img ${parts.join(" ")}>`;
};
// Emit the schema-matching div[data-type="callout"] for a callout node. The
// schema reads the banner type from data-callout-type. Children are rendered
// as HTML so they survive inside a raw-HTML container.
const calloutToHtml = (node) => {
const type = (node.attrs?.type || "info").toLowerCase();
const inner = (node.content || []).map(blockToHtml).join("");
return `<div data-type="callout" data-callout-type="${escapeAttr(type)}">${inner}</div>`;
};
// Emit a schema-matching <details> tree. The schema parses <details>,
// summary[data-type="detailsSummary"], and div[data-type="detailsContent"].
const detailsToHtml = (node) => {
const inner = (node.content || []).map(blockToHtml).join("");
return `<details>${inner}</details>`;
};
const detailsSummaryToHtml = (node) => `<summary data-type="detailsSummary">${inlineToHtml(node.content || [])}</summary>`;
const detailsContentToHtml = (node) => {
const inner = (node.content || []).map(blockToHtml).join("");
return `<div data-type="detailsContent">${inner}</div>`;
};
// Emit the schema-matching taskList/taskItem HTML. bridgeTaskLists (in
// collaboration.ts) recognizes ul[data-type="taskList"] with
// li[data-type="taskItem"][data-checked]; emitting that directly here keeps
// task lists inside columns/cells from degrading to literal "- [ ]" text.
const taskListToHtml = (node) => {
const items = (node.content || [])
.map((it) => {
const checked = it.attrs?.checked ? "true" : "false";
return `<li data-type="taskItem" data-checked="${checked}">${blockChildrenToHtml(it)}</li>`;
})
.join("");
return `<ul data-type="taskList">${items}</ul>`;
};
// Render a block node to HTML for the raw-HTML containers (spanned tables,
// columns). marked does NOT re-parse markdown inside a raw-HTML block, so
// EVERY block type that can appear inside a column or a spanned cell must be
// emitted as schema-matching HTML here — never as markdown, or it would land
// as literal text on re-import. Nodes whose processNode case already produces
// schema-matching HTML (math/media/embed/attachment/nested columns/spanned
// table) are delegated to processNode; the markdown-emitting cases
// (image/blockquote/callout/details/hr/taskList) get explicit HTML here.
const blockToHtml = (block) => {
const children = block.content || [];
switch (block.type) {
case "paragraph":
return `<p>${inlineToHtml(children)}</p>`;
case "heading": {
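// Clamp to a valid h1..h6 so a malformed or non-numeric level can never be
// interpolated into the tag name.
const rawLevel = Math.trunc(Number(block.attrs?.level)) || 1;
const level = Math.min(6, Math.max(1, rawLevel));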
return `<h${level}>${inlineToHtml(children)}</h${level}>`;
}
case "bulletList":
return `<ul>${children
.map((li) => `<li>${blockChildrenToHtml(li)}</li>`)
.join("")}</ul>`;
case "orderedList":
return `<ol>${children
.map((li) => `<li>${blockChildrenToHtml(li)}</li>`)
.join("")}</ol>`;
case "codeBlock": {
const lang = block.attrs?.language || "";
// The code itself is element TEXT content (between <code> tags), so it
// must escape < > & — NOT the attribute escaper. The language rides in
// a class ATTRIBUTE, so it uses escapeAttr.
const code = escapeHtmlText(children
.map(processNode)
.join("")
.replace(/\n+$/, ""));
const cls = lang ? ` class="language-${escapeAttr(lang)}"` : "";
return `<pre><code${cls}>${code}</code></pre>`;
}
case "image":
return imageToHtml(block);
case "blockquote":
return `<blockquote>${children.map(blockToHtml).join("")}</blockquote>`;
case "horizontalRule":
return "<hr>";
case "callout":
return calloutToHtml(block);
case "details":
return detailsToHtml(block);
case "detailsSummary":
return detailsSummaryToHtml(block);
case "detailsContent":
return detailsContentToHtml(block);
case "taskList":
return taskListToHtml(block);
case "taskItem":
// A bare taskItem (outside a taskList) still needs a wrapping list so
// the schema parses it; wrap it in a single-item taskList.
return taskListToHtml({ content: [block] });
// table (incl. spanned), columns/column, math, media, embed, attachment,
// mention, etc. already emit schema-matching HTML from processNode.
case "table":
case "columns":
case "column":
case "mathBlock":
case "video":
case "audio":
case "pdf":
case "youtube":
case "embed":
case "attachment":
case "drawio":
case "excalidraw":
return processNode(block);
default:
// Any still-unhandled block type: NEVER fall back to markdown inside a
// raw-HTML block (it would become literal text). Wrap its rendered
// children in a <div> so their content is preserved; if it has no block
// children, render its inline content instead.
if (children.length && children.some((c) => c.type !== "text")) {
return `<div>${children.map(blockToHtml).join("")}</div>`;
}
return `<div>${inlineToHtml(children)}</div>`;
}
};
// Render the block children of a list item to HTML (a listItem holds block+
// content). Mirrors processListItem but for the HTML fallback path.
const blockChildrenToHtml = (item) => (item.content || []).map((b) => blockToHtml(b)).join("");
// Indent the rendered children of a list item under a marker prefix.
// Each child block is a (possibly multi-line) string. The very first physical
// line of the first child carries the marker (e.g. "- " or "1. "); EVERY
// other line — the remaining lines of the first child AND all lines of every
// subsequent child (nested lists, code blocks, extra paragraphs) — is indented
// to align under the marker. Without indenting these continuation lines, the
// 2nd/3rd line of a nested child collapses to column 0 and escapes the list.
//
// The continuation indent MUST equal the LIST marker width, which is not the
// same as the visible prefix width:
// - bullet "- " -> 2 columns
// - task "- [ ] " -> marker is still "- " (the "[ ] " is content), 2
// - ordered "1. "/"10. " -> 3/4 columns, scaling with the number's digits
// CommonMark anchors nested content to the marker column, so an ordered item
// whose children are indented by only 2 columns would be re-parsed as a
// sibling, or as loose content, on re-import. Callers therefore pass the exact
// indent width to use.
const indentItemChildren = (childStrings, prefix, indentWidth) => {
const indent = " ".repeat(indentWidth);
const lines = [];
childStrings.forEach((child, childIndex) => {
child.split("\n").forEach((line, lineIndex) => {
if (childIndex === 0 && lineIndex === 0) {
// First physical line of the first block gets the marker.
lines.push(`${prefix} ${line}`);
}
else {
// Indent every continuation line by the marker width; keep blank
// lines blank rather than emitting trailing whitespace.
lines.push(line.length ? `${indent}${line}` : "");
}
});
});
return lines.join("\n");
};
const processListItem = (item, prefix) => {
const itemContent = item.content || [];
const childStrings = itemContent.map(processNode);
if (childStrings.length === 0)
return prefix;
// The rendered marker is `${prefix} ` (prefix + one space), so its width —
// and thus the continuation indent — is prefix.length + 1. This is correct
// for both bullet ("-" -> 2) and ordered ("1." -> 3, "10." -> 4) markers,
// since for those the visible prefix IS the list marker.
return indentItemChildren(childStrings, prefix, prefix.length + 1);
};
const processTaskItem = (item) => {
const checked = item.attrs?.checked || false;
const checkbox = checked ? "[x]" : "[ ]";
const prefix = `- ${checkbox}`;
const itemContent = item.content || [];
const childStrings = itemContent.map(processNode);
// An empty task item still needs its checkbox marker; without this guard
// the indent below produces "" and the "- [ ]"/"- [x]" row disappears.
if (childStrings.length === 0)
return prefix;
// The list marker for a task item is just "- " (2 columns); the "[ ] "/"[x] "
// checkbox is item content, NOT part of the marker. So the continuation
// indent is a fixed 2 — do NOT derive it from the wider prefix.length.
return indentItemChildren(childStrings, prefix, 2);
};
return processNode(content).trim();
}
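/*
Illustrative sketch, not part of the converter: the marker-width rule that
indentItemChildren, processListItem and processTaskItem implement above.
`indentUnderMarker` is a hypothetical condensed copy of that rule, written out
only so the three indent widths can be compared side by side.

```javascript
// The first physical line carries the marker; every later non-blank line is
// indented by indentWidth columns so it stays inside the list item.
function indentUnderMarker(childStrings, prefix, indentWidth) {
  const indent = " ".repeat(indentWidth);
  const lines = [];
  childStrings.forEach((child, childIndex) => {
    child.split("\n").forEach((line, lineIndex) => {
      if (childIndex === 0 && lineIndex === 0) lines.push(`${prefix} ${line}`);
      else lines.push(line.length ? `${indent}${line}` : "");
    });
  });
  return lines.join("\n");
}

// Bullet and ordered items: indent = prefix.length + 1 (the marker width).
const bullet = indentUnderMarker(["first", "- nested"], "-", 2);
const ordered = indentUnderMarker(["tenth", "- nested"], "10.", 4);
// Task item: the checkbox is item content, so the indent stays at 2.
const task = indentUnderMarker(["todo", "- nested"], "- [ ]", 2);

console.log([bullet, ordered, task].join("\n---\n"));
```

The nested line lands under column 2 for the bullet and the task item, and
under column 4 for the two-digit ordered marker.
*/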

@@ -0,0 +1,104 @@
/**
* Self-contained Docmost-flavoured Markdown document (custom extensions).
*
* A single `.md` file that packages everything needed to losslessly round-trip
* a page through "download -> edit body -> re-upload":
* - a leading `docmost:meta` block: a one-line JSON object with page identity;
* - the Markdown body (carrying inline comment anchors and diagrams as HTML);
* - a trailing `docmost:comments` block: a one-line JSON array of comment
* threads.
*
* Both metadata blocks are HTML comments on purpose: `marked`/`generateJSON`
* drop HTML comments, so even if the WHOLE file were ever fed straight to the
* importer without first stripping the blocks, the metadata cannot leak into the
* document. (A fenced ```docmost-comments``` block would WRONGLY become a
* codeBlock node, so a fenced block is deliberately NOT used.)
*
* The delimiter literals may legitimately appear in the BODY too (e.g. a user
* re-pastes an exported `.md` into a page, or a page documents this very
* format). To stay robust, parsing treats only the FINAL, document-ending
* `docmost:comments` block as metadata: it is the last `<!-- docmost:comments`
* opener whose closing `-->` sits at the very end of the file. Any earlier
* literal occurrence is left in the body untouched.
*
* NOTE on comments: in this version the comment THREAD records are preserved in
* the file but are NOT pushed back to the server on import — only the inline
* comment marks (anchors) embedded in the body are restored. Managing comment
* records stays with the comment tools/UI.
*/
// Match the leading meta block (allow leading whitespace). Capture group 1 is
// the JSON text between the markers.
const META_RE = /^\s*<!--\s*docmost:meta\s*\n([\s\S]*?)\n-->/;
// Match a `docmost:comments` opener. Used globally to scan for the LAST opener
// rather than end-anchoring a single regex (which would mis-capture across a
// literal opener that appears earlier in the body).
const COMMENTS_OPEN_RE = /<!--[ \t]*docmost:comments[ \t]*\r?\n/g;
/**
* Assemble the full self-contained markdown file: meta block, body, and the
* comments block. The meta block is always emitted; the comments block is always
* emitted too (with `[]` when there are no comments) so the format stays uniform
* and parsing stays simple.
*/
export function serializeDocmostMarkdown(meta, body, comments) {
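// A "-->" inside a JSON string would end the HTML comment early and let the
// remainder leak into the document, so escape its ">" as \u003e (JSON.parse
// restores it). JSON.stringify(undefined) is undefined; fall back to "null".
const safeJson = (value) => (JSON.stringify(value) ?? "null").replace(/-->/g, "--\\u003e");
const metaJson = safeJson(meta);
const commentsJson = safeJson(Array.isArray(comments) ? comments : []);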
const trimmedBody = (body ?? "").trim();
return (`<!-- docmost:meta\n${metaJson}\n-->\n\n` +
`${trimmedBody}\n\n` +
`<!-- docmost:comments\n${commentsJson}\n-->\n`);
}
/**
* Split a self-contained file back into its parts. Tolerant: if the meta or
* comments block is missing (e.g. a hand-written plain-markdown file), the
* corresponding value is returned as `null` and the whole input is treated as
* the body. This never throws on a MISSING block; only a `JSON.parse` failure
* inside a block that IS present is surfaced as a thrown Error with a clear
* message. Robust to `\r\n` line endings.
*/
export function parseDocmostMarkdown(full) {
// Normalize line endings so the anchored regexes work regardless of CRLF.
const normalized = (full ?? "").replace(/\r\n/g, "\n");
// Extract the leading meta block (start-anchored — already unambiguous).
let meta = null;
let metaEnd = 0;
const metaMatch = normalized.match(META_RE);
if (metaMatch) {
try {
meta = JSON.parse(metaMatch[1]);
}
catch (e) {
throw new Error(`Invalid docmost:meta JSON block: ${e instanceof Error ? e.message : String(e)}`);
}
// Body starts right after the matched meta block.
metaEnd = (metaMatch.index ?? 0) + metaMatch[0].length;
}
// Find the LAST `<!-- docmost:comments` opener; the real file-level block is
// the final one whose closing `-->` ends the document. Any earlier literal
// occurrence inside the body (e.g. a re-pasted export) is left in the body.
let lastOpenStart = -1;
let lastOpenEnd = -1;
let m;
COMMENTS_OPEN_RE.lastIndex = 0;
while ((m = COMMENTS_OPEN_RE.exec(normalized)) !== null) {
lastOpenStart = m.index;
lastOpenEnd = m.index + m[0].length;
}
let comments = null;
let bodyEnd = normalized.length;
if (lastOpenStart !== -1) {
const rest = normalized.slice(lastOpenEnd);
const close = rest.match(/\r?\n-->[ \t]*\r?\n?\s*$/); // closer must end the doc
if (close) {
const jsonText = rest.slice(0, close.index);
try {
comments = JSON.parse(jsonText);
}
catch (e) {
throw new Error(`Invalid docmost:comments JSON block: ${e instanceof Error ? e.message : String(e)}`);
}
bodyEnd = lastOpenStart; // strip from the opener to end of document
}
}
const body = normalized.slice(metaEnd, bodyEnd).trim();
return { meta, body, comments };
}
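/*
Illustrative sketch, not part of the module: why only the LAST
`docmost:comments` opener is treated as metadata. `splitTrailingComments` is a
hypothetical condensed stand-in for the trailing-block scan in
parseDocmostMarkdown; it assumes "\n" line endings and leaves out the meta
block and the wrapped JSON error messages.

```javascript
// Scan for the last opener; accept it only if its closing "-->" ends the text.
function splitTrailingComments(text) {
  const openRe = /<!--[ \t]*docmost:comments[ \t]*\n/g;
  let start = -1;
  let end = -1;
  let m;
  while ((m = openRe.exec(text)) !== null) {
    start = m.index;
    end = m.index + m[0].length;
  }
  if (start === -1) return { body: text.trim(), comments: null };
  const rest = text.slice(end);
  const close = rest.match(/\n-->\s*$/);
  if (!close) return { body: text.trim(), comments: null };
  return {
    body: text.slice(0, start).trim(),
    comments: JSON.parse(rest.slice(0, close.index)),
  };
}

// The first block is ordinary body text (e.g. a page documenting the format);
// only the final, document-ending block is metadata.
const sampleFile = [
  "Docs about the format:",
  "<!-- docmost:comments",
  '["not metadata"]',
  "-->",
  "",
  "<!-- docmost:comments",
  '[{"id":"c1"}]',
  "-->",
  "",
].join("\n");

const parsed = splitTrailingComments(sampleFile);
console.log(parsed.comments, parsed.body);
```

An end-anchored single regex would instead start capturing at the first
opener and try to parse both blocks as one JSON payload.
*/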

@@ -0,0 +1,770 @@
/**
* Pure, network-free helpers for manipulating a ProseMirror/TipTap document
* tree by node id.
*
* A ProseMirror node here is a plain JSON object of the shape produced by
* Docmost: `{ type, attrs?, content?, text?, marks? }`. Children live in the
* `content` array; a node carries a stable id in `attrs.id`. Callouts and
* table cells hold their children in `content` just like any other block, so a
* single recursive walk reaches them all.
*
* Every exported function operates on a DEEP CLONE of the input document and
* returns the new document. The input doc and any `newNode`/`node` argument are
* never mutated. All functions are defensively null-safe: missing or non-array
* `content`, non-object nodes, and absent `attrs` are tolerated.
*/
/** Deep-clone a JSON-serializable value without mutating the original. */
function clone(value) {
if (typeof structuredClone === "function") {
return structuredClone(value);
}
// Fallback for environments without structuredClone.
return JSON.parse(JSON.stringify(value));
}
/** True if `value` is a non-null object (and not an array). */
function isObject(value) {
return value != null && typeof value === "object" && !Array.isArray(value);
}
/** True if `node` carries the given id in `node.attrs.id`. */
function matchesId(node, nodeId) {
return isObject(node) && isObject(node.attrs) && node.attrs.id === nodeId;
}
/**
* Recursively concatenate all text contained in a node.
*
* Text nodes contribute their `text` string; container nodes contribute the
* joined `blockPlainText` of their `content` children. Returns "" for nullish
* or non-object inputs.
*/
export function blockPlainText(node) {
if (!isObject(node))
return "";
let out = "";
if (typeof node.text === "string") {
out += node.text;
}
if (Array.isArray(node.content)) {
for (const child of node.content) {
out += blockPlainText(child);
}
}
return out;
}
/** Truncate `text` to at most `n` chars, appending an ellipsis when cut. */
function truncate(text, n) {
return text.length > n ? text.slice(0, n) + "…" : text;
}
/**
* Build a COMPACT outline of the TOP-LEVEL blocks of `doc` (the entries in
* `doc.content`). Deliberately does NOT recurse into paragraphs, list items, or
* table cells — compactness is the point; use `getNodeByRef` to drill into a
* specific block.
*
* Each entry carries `{ index, type, id, firstText }`, plus type-specific
* extras: headings add `level`; tables add `rows`/`cols` and the first row's
* cell texts as `header`; list blocks (types ending in "List") add `items`.
* `firstText` is the block's plain text truncated to 100 chars. Null-safe:
* a missing or non-object doc/content yields `[]`.
*/
export function buildOutline(doc) {
if (!isObject(doc) || !Array.isArray(doc.content))
return [];
const out = [];
for (let i = 0; i < doc.content.length; i++) {
const block = doc.content[i];
const type = isObject(block) ? block.type : undefined;
const entry = {
index: i,
type,
id: isObject(block) && isObject(block.attrs) ? block.attrs.id ?? null : null,
firstText: truncate(blockPlainText(block), 100),
};
if (type === "heading") {
entry.level = isObject(block.attrs) ? block.attrs.level ?? null : null;
}
else if (type === "table") {
const headerRow = block.content?.[0]?.content ?? [];
entry.rows = block.content?.length ?? 0;
entry.cols = block.content?.[0]?.content?.length ?? 0;
entry.header = headerRow.map((cell) => truncate(blockPlainText(cell), 40));
}
else if (typeof type === "string" && type.endsWith("List")) {
entry.items = block.content?.length ?? 0;
}
out.push(entry);
}
return out;
}
/**
* Resolve a single node by reference and return `{ node, path, type }`, or
* `null` when nothing matches.
*
* - `ref` of the form `#<n>` (e.g. `#2`) selects the TOP-LEVEL block at index
*   `n` in `doc.content`. This is the only way to reach a table: table/
*   tableRow/tableCell nodes carry no `attrs.id`, so rows and cells are read
*   from the `content` of the returned table node.
* - Otherwise `ref` is treated as a block id: the FIRST node anywhere in the
* tree with `attrs.id === ref` is returned.
*
* `path` is the array of child indices from the doc root down to the node
* (so a top-level block is `[index]`). The returned `node` is a DEEP CLONE,
* so callers can mutate it without touching the input doc. Null-safe.
*/
export function getNodeByRef(doc, ref) {
if (!isObject(doc))
return null;
// "#<n>": index into the top-level content array.
const indexMatch = typeof ref === "string" ? ref.match(/^#(\d+)$/) : null;
if (indexMatch) {
const index = Number(indexMatch[1]);
const block = Array.isArray(doc.content) ? doc.content[index] : undefined;
if (!isObject(block))
return null;
return { node: clone(block), path: [index], type: block.type };
}
// Otherwise: depth-first search for the first node with attrs.id === ref.
const search = (node, trail) => {
if (!isObject(node))
return null;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const child = node.content[i];
const path = [...trail, i];
if (matchesId(child, ref)) {
return { node: clone(child), path, type: child.type };
}
const hit = search(child, path);
if (hit != null)
return hit;
}
}
return null;
};
return search(doc, []);
}
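/*
Illustrative sketch, not part of the module: the two addressing modes described
for getNodeByRef above. `resolveRef` is a hypothetical condensed stand-in that
returns only `{ path, type }` and skips the cloning and null-safety checks of
the real function.

```javascript
// "#<n>" indexes the top-level blocks; anything else is a depth-first search
// for the first node whose attrs.id equals the ref.
function resolveRef(doc, ref) {
  const indexMatch = typeof ref === "string" ? ref.match(/^#(\d+)$/) : null;
  if (indexMatch) {
    const index = Number(indexMatch[1]);
    const block = doc.content?.[index];
    return block ? { path: [index], type: block.type } : null;
  }
  const search = (node, trail) => {
    for (const [i, child] of (node.content || []).entries()) {
      const path = [...trail, i];
      if (child.attrs?.id === ref) return { path, type: child.type };
      const hit = search(child, path);
      if (hit) return hit;
    }
    return null;
  };
  return search(doc, []);
}

const sampleDoc = {
  type: "doc",
  content: [
    { type: "heading", attrs: { id: "h1", level: 1 } },
    {
      type: "table", // no attrs.id here, so "#1" is how this block is reached
      content: [
        {
          type: "tableRow",
          content: [
            { type: "tableCell", content: [{ type: "paragraph", attrs: { id: "p1" } }] },
          ],
        },
      ],
    },
  ],
};

console.log(resolveRef(sampleDoc, "#1")); // the table, addressed by index
console.log(resolveRef(sampleDoc, "p1")); // a paragraph nested inside a cell
```

The returned path lists child indices from the doc root, so the paragraph
inside the first cell of the first row of block 1 resolves to [1, 0, 0, 0].
*/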
/**
* Replace EVERY node whose `attrs.id === nodeId` with a deep clone of
* `newNode`, anywhere in the tree (including inside callouts and table cells).
*
* Operates on a clone of `doc`; returns `{ doc, replaced }` where `replaced`
* is the number of nodes substituted. A fresh clone of `newNode` is used for
* each match so they do not share references.
*/
export function replaceNodeById(doc, nodeId, newNode) {
const out = clone(doc);
let replaced = 0;
// Walk a content array, replacing direct matches and recursing into the
// (possibly new) children of non-matching nodes.
const walkContent = (content) => {
for (let i = 0; i < content.length; i++) {
const child = content[i];
if (matchesId(child, nodeId)) {
content[i] = clone(newNode);
replaced++;
// Do not recurse into a freshly substituted node.
continue;
}
if (isObject(child) && Array.isArray(child.content)) {
walkContent(child.content);
}
}
};
if (isObject(out) && Array.isArray(out.content)) {
walkContent(out.content);
}
return { doc: out, replaced };
}
/**
* Remove EVERY node whose `attrs.id === nodeId` from its parent `content`
* array, anywhere in the tree (recursive, including callouts and tables).
*
* Operates on a clone of `doc`; returns `{ doc, deleted }` where `deleted` is
* the number of nodes removed.
*/
export function deleteNodeById(doc, nodeId) {
const out = clone(doc);
let deleted = 0;
// Build a filtered copy of a content array, dropping matches and recursing
// into the surviving children (the caller reassigns the returned array).
const walkContent = (content) => {
const kept = [];
for (const child of content) {
if (matchesId(child, nodeId)) {
deleted++;
continue;
}
if (isObject(child) && Array.isArray(child.content)) {
child.content = walkContent(child.content);
}
kept.push(child);
}
return kept;
};
if (isObject(out) && Array.isArray(out.content)) {
out.content = walkContent(out.content);
}
return { doc: out, deleted };
}
/**
* Deep-clone `doc` and strip every node/mark attribute whose value is strictly
* `undefined`, so the result is safe to hand to Yjs (which throws an opaque
* "Unexpected content type" when asked to store an `undefined` attribute value).
*
* Only `undefined` keys are removed; `null`, `false`, `0`, and `""` are all
* legitimate JSON-storable values and are preserved. Operates on a clone and
* returns it; the input is never mutated. Defensively null-safe like the rest
* of the file.
*/
export function sanitizeForYjs(doc) {
const out = clone(doc);
// Drop every key whose value is strictly `undefined` from an attrs object.
const stripUndefined = (attrs) => {
if (!isObject(attrs))
return;
for (const key of Object.keys(attrs)) {
if (attrs[key] === undefined) {
delete attrs[key];
}
}
};
const walk = (node) => {
if (!isObject(node))
return;
stripUndefined(node.attrs);
if (Array.isArray(node.marks)) {
for (const mark of node.marks) {
if (isObject(mark))
stripUndefined(mark.attrs);
}
}
if (Array.isArray(node.content)) {
for (const child of node.content) {
walk(child);
}
}
};
walk(out);
return out;
}
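/*
Illustrative sketch, not part of the module: which attribute values the
sanitizing pass above removes and which it keeps. `stripUndefinedAttrs` is a
hypothetical condensed stand-in for sanitizeForYjs; it assumes a runtime with a
global structuredClone (which, unlike a JSON round-trip, keeps `undefined`
properties, so they still have to be deleted explicitly).

```javascript
// Clone, then delete only the attrs keys whose value is strictly undefined,
// on nodes and on their marks.
function stripUndefinedAttrs(doc) {
  const out = structuredClone(doc);
  const walk = (node) => {
    if (node === null || typeof node !== "object") return;
    for (const holder of [node, ...(node.marks || [])]) {
      const attrs = holder?.attrs;
      if (attrs && typeof attrs === "object") {
        for (const key of Object.keys(attrs)) {
          if (attrs[key] === undefined) delete attrs[key];
        }
      }
    }
    (node.content || []).forEach((child) => walk(child));
  };
  walk(out);
  return out;
}

const input = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      attrs: { id: "p1", indent: undefined, align: null, level: 0, note: "" },
      content: [
        {
          type: "text",
          text: "hi",
          marks: [{ type: "link", attrs: { href: "x", title: undefined } }],
        },
      ],
    },
  ],
};

const cleaned = stripUndefinedAttrs(input);
console.log(Object.keys(cleaned.content[0].attrs));
console.log(Object.keys(cleaned.content[0].content[0].marks[0].attrs));
```

`null`, `0` and `""` survive because they are storable values; only the
`undefined` keys disappear, and the input object keeps its original keys.
*/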
/**
* Diagnostics helper: walk the tree and return a human-readable path string for
* the FIRST attribute value (in any `node.attrs` or `mark.attrs`) that Yjs
* cannot store — i.e. `undefined`, a `function`, a `symbol`, or a `bigint`
* (e.g. `content[3].content[0].attrs.indent (undefined)`). Returns `null` when
* every attribute is storable. Null-safe.
*/
export function findUnstorableAttr(doc) {
const isUnstorable = (value) => {
if (value === undefined)
return "undefined";
const t = typeof value;
if (t === "function")
return "function";
if (t === "symbol")
return "symbol";
if (t === "bigint")
return "bigint";
return null;
};
// Check an attrs object; return the offending sub-path or null.
const checkAttrs = (attrs, basePath) => {
if (!isObject(attrs))
return null;
for (const key of Object.keys(attrs)) {
const kind = isUnstorable(attrs[key]);
if (kind != null)
return `${basePath}.${key} (${kind})`;
}
return null;
};
const walk = (node, path) => {
if (!isObject(node))
return null;
const attrHit = checkAttrs(node.attrs, `${path}.attrs`);
if (attrHit != null)
return attrHit;
if (Array.isArray(node.marks)) {
for (let i = 0; i < node.marks.length; i++) {
const markHit = checkAttrs(node.marks[i]?.attrs, `${path}.marks[${i}].attrs`);
if (markHit != null)
return markHit;
}
}
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const childHit = walk(node.content[i], `${path}.content[${i}]`);
if (childHit != null)
return childHit;
}
}
return null;
};
// The root doc node has no index of its own, so reported paths carry no root
// prefix: they start directly at "attrs" or "content[i]".
if (!isObject(doc))
return null;
const attrHit = checkAttrs(doc.attrs, "attrs");
if (attrHit != null)
return attrHit;
if (Array.isArray(doc.content)) {
for (let i = 0; i < doc.content.length; i++) {
const childHit = walk(doc.content[i], `content[${i}]`);
if (childHit != null)
return childHit;
}
}
return null;
}
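/*
Illustrative sketch, not part of the module: the shape of the path string that
the diagnostics helper above reports. `firstUnstorable` is a hypothetical
condensed stand-in for findUnstorableAttr that checks node attrs only (marks
are left out for brevity).

```javascript
// Return "content[i]...attrs.key (kind)" for the first attribute value that
// cannot be stored, or null when every attribute is storable.
function firstUnstorable(node, path = "") {
  const unstorableKinds = new Set(["undefined", "function", "symbol", "bigint"]);
  const join = (segment) => (path ? `${path}.${segment}` : segment);
  for (const [key, value] of Object.entries(node.attrs || {})) {
    if (unstorableKinds.has(typeof value)) {
      return `${join(`attrs.${key}`)} (${typeof value})`;
    }
  }
  for (const [i, child] of (node.content || []).entries()) {
    const hit = firstUnstorable(child, join(`content[${i}]`));
    if (hit) return hit;
  }
  return null;
}

const badDoc = {
  type: "doc",
  content: [
    {
      type: "bulletList",
      content: [
        { type: "listItem", attrs: { id: "a" } },
        { type: "listItem", attrs: { id: "b", indent: undefined } },
      ],
    },
  ],
};

console.log(firstUnstorable(badDoc));
```

Running the diagnostics before a write turns an opaque storage error into a
concrete location that can be fixed or sanitized.
*/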
/**
* Table structural node types and the container each must live directly inside.
* Used by `insertNodeRelative` to splice rows/cells into the correct ancestor
* rather than blindly into the anchor's direct parent (which would corrupt the
* table's nesting).
*/
const STRUCTURAL_TYPES = new Set(["tableRow", "tableCell", "tableHeader"]);
const REQUIRED_CONTAINER = {
tableRow: "table",
tableCell: "tableRow",
tableHeader: "tableRow",
};
/**
* Locate an anchor and return its ancestor chain (from `doc` down to and
* including the matched node). Each chain entry is `{ node, index }` where
* `index` is the node's position inside its parent's `content` array (the root
* doc has index -1). Returns `null` when the anchor cannot be resolved.
*/
function findAnchorChain(doc, opts) {
if (!isObject(doc))
return null;
// DFS by id anywhere in the tree, accumulating the path.
if (opts.anchorNodeId != null) {
const targetId = opts.anchorNodeId;
const search = (node, index, trail) => {
if (!isObject(node))
return null;
const here = [...trail, { node, index }];
if (matchesId(node, targetId))
return here;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const hit = search(node.content[i], i, here);
if (hit != null)
return hit;
}
}
return null;
};
return search(doc, -1, []);
}
// By text: only top-level blocks are scanned (same rule as the JSON path).
if (opts.anchorText != null && Array.isArray(doc.content)) {
for (let i = 0; i < doc.content.length; i++) {
if (blockPlainText(doc.content[i]).includes(opts.anchorText)) {
return [
{ node: doc, index: -1 },
{ node: doc.content[i], index: i },
];
}
}
}
return null;
}
/**
* Insert a deep clone of `node` relative to an anchor.
*
* - position "append": push the node onto the top-level `doc.content`.
* - position "before"/"after": locate the anchor and splice the node into the
* anchor's parent `content` array immediately before / after it.
*
* Anchor resolution for before/after:
* - if `anchorNodeId` is given, find the node with `attrs.id === anchorNodeId`
* anywhere in the tree (recursive);
* - otherwise, if `anchorText` is given, scan only TOP-LEVEL `doc.content`
* blocks and pick the first whose `blockPlainText` includes `anchorText`.
*
* Table structural nodes (tableRow/tableCell/tableHeader) are special-cased:
* "append" throws, and "before"/"after" splice into the nearest enclosing
* table/tableRow on the anchor's ancestor chain, throwing when the anchor is
* not inside one.
*
* Operates on a clone of `doc`; returns `{ doc, inserted }`. `inserted` is
* false when the anchor could not be resolved (the doc is returned unchanged
* apart from being cloned).
*/
export function insertNodeRelative(doc, node, opts) {
const out = clone(doc);
const fresh = clone(node);
// Defensive: stay null-safe like the other exports — a missing opts means
// there is nothing actionable to do.
if (!isObject(opts))
return { doc: out, inserted: false };
const isStructural = isObject(node) && STRUCTURAL_TYPES.has(node.type);
// "append": top-level push.
if (opts.position === "append") {
// Structural table nodes (tableRow/tableCell/tableHeader) cannot live at the
// top level — appending one would produce invalid nesting.
if (isStructural) {
throw new Error(`insert_node: cannot append a ${node.type} at the top level; use ` +
`position before/after with an anchor inside the target table`);
}
if (isObject(out)) {
if (!Array.isArray(out.content))
out.content = [];
out.content.push(fresh);
return { doc: out, inserted: true };
}
return { doc: out, inserted: false };
}
const offset = opts.position === "after" ? 1 : 0;
// Structural insert (before/after a tableRow/tableCell/tableHeader): splice
// into the nearest enclosing table/tableRow rather than the anchor's direct
// parent, so the row/cell lands at the correct level of the table.
if (isStructural) {
const containerType = REQUIRED_CONTAINER[node.type];
const chain = findAnchorChain(out, opts);
// Anchor not resolved at all — keep the existing "anchor not found" path.
if (chain == null)
return { doc: out, inserted: false };
// Find the DEEPEST ancestor (including the anchor itself) of the required
// container type.
let containerIdx = -1;
for (let i = chain.length - 1; i >= 0; i--) {
if (isObject(chain[i].node) && chain[i].node.type === containerType) {
containerIdx = i;
break;
}
}
if (containerIdx === -1) {
throw new Error(`insert_node: cannot insert a ${node.type} here — the anchor is not ` +
`inside a ${containerType}. Anchor on a cell's text or a block id ` +
`that lives inside the target table.`);
}
const container = chain[containerIdx].node;
if (!Array.isArray(container.content))
container.content = [];
if (containerIdx === chain.length - 1) {
// The matched container IS the anchor node itself (e.g. anchorText
// resolved to the table block): append/prepend within it.
const at = opts.position === "after" ? container.content.length : 0;
container.content.splice(at, 0, fresh);
}
else {
// The immediate child on the path leading to the anchor is the row/cell
// to splice next to.
const enclosingChildIndex = chain[containerIdx + 1].index;
container.content.splice(enclosingChildIndex + offset, 0, fresh);
}
return { doc: out, inserted: true };
}
// Resolve by id anywhere in the tree: splice into the parent content array.
if (opts.anchorNodeId != null) {
let inserted = false;
const walkContent = (content) => {
for (let i = 0; i < content.length; i++) {
const child = content[i];
if (matchesId(child, opts.anchorNodeId)) {
content.splice(i + offset, 0, fresh);
inserted = true;
return;
}
if (isObject(child) && Array.isArray(child.content)) {
walkContent(child.content);
if (inserted)
return;
}
}
};
if (isObject(out) && Array.isArray(out.content)) {
walkContent(out.content);
}
return { doc: out, inserted };
}
// Resolve by text: only top-level doc.content blocks are scanned.
if (opts.anchorText != null && isObject(out) && Array.isArray(out.content)) {
for (let i = 0; i < out.content.length; i++) {
if (blockPlainText(out.content[i]).includes(opts.anchorText)) {
out.content.splice(i + offset, 0, fresh);
return { doc: out, inserted: true };
}
}
}
return { doc: out, inserted: false };
}
// ===========================================================================
// Table editing helpers
//
// A Docmost table is a ProseMirror subtree with NO ids on the structural nodes:
// table -> { type:"table", content:[tableRow...] }
// row -> { type:"tableRow", content:[tableCell|tableHeader...] }
// cell -> { type:"tableCell"|"tableHeader", attrs:{colspan,rowspan,colwidth},
// content:[paragraph...] }
// para -> { type:"paragraph", attrs:{id,indent}, content:[textNode...] }
// Only paragraphs/headings carry an `attrs.id`, so a cell is addressed via the
// id of the paragraph inside it. The helpers below all operate on a DEEP CLONE
// of the input doc (via `clone`) and never mutate their inputs.
// ===========================================================================
/**
* Collect EVERY `attrs.id` present anywhere in `node` into `used`. Used to seed
* `makeFreshId` so generated paragraph ids never collide with existing ones.
*/
function collectIds(node, used) {
if (!isObject(node))
return;
if (isObject(node.attrs) && typeof node.attrs.id === "string") {
used.add(node.attrs.id);
}
if (Array.isArray(node.content)) {
for (const child of node.content)
collectIds(child, used);
}
}
/**
* Fresh-id generator: returns a random Docmost-style id (12 chars from
* lowercase `a-z0-9`) that is not already in `used`, and records it. On the
* rare collision the id is regenerated. Callers rely on uniqueness, not on the
* exact string, so randomness is fine — and unlike a module-local counter it
* needs no reset and cannot become predictable across calls.
*/
function makeFreshId(used) {
const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
let id;
do {
id = "";
for (let i = 0; i < 12; i++) {
id += alphabet[Math.floor(Math.random() * alphabet.length)];
}
} while (used.has(id));
used.add(id);
return id;
}
/**
* Resolve a table reference against an ALREADY-CLONED doc and return the LIVE
* table node (a reference inside `rootClone`, so the caller may mutate it) plus
* its index path. Returns null when no table matches.
*
* - `#<n>`: the top-level block at index `n`, only if its `type === "table"`.
* - otherwise: DFS for the node with `attrs.id === tableRef`, then walk UP its
* ancestor chain to the nearest `type === "table"` ancestor.
*/
function locateTable(rootClone, tableRef) {
if (!isObject(rootClone))
return null;
// "#<n>": index into the top-level content array; must be a table.
const indexMatch = typeof tableRef === "string" ? tableRef.match(/^#(\d+)$/) : null;
if (indexMatch) {
const index = Number(indexMatch[1]);
const block = Array.isArray(rootClone.content)
? rootClone.content[index]
: undefined;
if (isObject(block) && block.type === "table") {
return { table: block, path: [index] };
}
return null;
}
// Otherwise: DFS for attrs.id === tableRef, tracking the ancestor chain, then
// climb to the nearest enclosing table.
const search = (node, trail) => {
if (!isObject(node))
return null;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const child = node.content[i];
const here = [...trail, { node: child, index: i }];
if (matchesId(child, tableRef)) {
// Walk UP to the nearest table ancestor (including the match itself).
for (let j = here.length - 1; j >= 0; j--) {
if (isObject(here[j].node) && here[j].node.type === "table") {
return {
table: here[j].node,
path: here.slice(0, j + 1).map((e) => e.index),
};
}
}
return null; // id found but no enclosing table
}
const hit = search(child, here);
if (hit != null)
return hit;
}
}
return null;
};
return search(rootClone, []);
}
/** Build the plain-text → single-paragraph cell content used by all writers. */
function makeCellParagraph(id, text) {
return {
type: "paragraph",
attrs: { id, indent: 0 },
// Empty string → a paragraph with an empty content array.
content: text ? [{ type: "text", text }] : [],
};
}
/**
* Read a table as a matrix. Returns null when `tableRef` resolves to no table.
*
* - `rows`/`cols`: the table's row count and the column count of its FIRST row.
* Tables may be ragged (rows of differing length), so `cols` reflects only
* row 0; use the per-row length of `cells`/`cellIds` for each row's actual
* width.
* - `cells`: `string[][]` of each cell's `blockPlainText`.
* - `cellIds`: `(string|null)[][]` of each cell's FIRST paragraph id (or null),
* so callers can `patch_node` a cell for rich-formatted edits.
* - `path`: index path of the table within the doc.
*/
export function readTable(doc, tableRef) {
const root = clone(doc);
const located = locateTable(root, tableRef);
if (located == null)
return null;
const { table, path } = located;
const rowNodes = Array.isArray(table.content) ? table.content : [];
const rows = rowNodes.length;
const cols = rowNodes[0]?.content?.length ?? 0;
const cells = [];
const cellIds = [];
for (const rowNode of rowNodes) {
const cellNodes = Array.isArray(rowNode?.content) ? rowNode.content : [];
const rowText = [];
const rowIds = [];
for (const cellNode of cellNodes) {
rowText.push(blockPlainText(cellNode));
// The cell's first paragraph carries the id used for patch_node.
const firstPara = Array.isArray(cellNode?.content)
? cellNode.content[0]
: undefined;
const id = isObject(firstPara) && isObject(firstPara.attrs)
? firstPara.attrs.id ?? null
: null;
rowIds.push(id);
}
cells.push(rowText);
cellIds.push(rowIds);
}
return { rows, cols, cells, cellIds, path };
}
/**
* Insert a row of plain-text cells into a table. Returns `{ doc, inserted }`;
* `inserted` is false only when the table cannot be located.
*
* The row is padded to the table's column count, taken from its WIDEST existing
* row (`cells[i] ?? ""`); supplying MORE cells than columns throws. Each new
* cell copies `colwidth` for its column from the header row when present, and
* gets a fresh-id paragraph and `colspan:1, rowspan:1` attrs. `index` (when an
* integer in `[0, rows]`) splices the row there; otherwise the row is appended
* at the end. A row landing at index 0 inherits the current header row's cell
* types per column.
*/
export function insertTableRow(doc, tableRef, cells, index) {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null)
return { doc: out, inserted: false };
const { table } = located;
if (!Array.isArray(table.content))
table.content = [];
const rows = table.content.length;
const headerRow = table.content[0];
const headerCells = Array.isArray(headerRow?.content) ? headerRow.content : [];
// Column count is the WIDEST existing row, so the guard below stays
// meaningful for ragged tables and the new row matches the table's width.
// Fall back to the supplied cell count only when the table has no rows.
let colCount = 0;
for (const r of table.content) {
if (isObject(r) && Array.isArray(r.content))
colCount = Math.max(colCount, r.content.length);
}
if (colCount === 0)
colCount = Array.isArray(cells) ? cells.length : 0;
if (Array.isArray(cells) && cells.length > colCount) {
throw new Error(`table_insert_row: got ${cells.length} cell(s) but the table has ${colCount} column(s)`);
}
// Resolve the landing index up front so the cell-type decision and the splice
// below agree: a valid integer in [0, rows] splices there, else we append.
const landingIndex = typeof index === "number" && Number.isInteger(index) && index >= 0 && index <= rows
? index
: rows;
// Seed the id generator with every id already in the doc so the new cell
// paragraph ids are unique within the whole document.
const used = new Set();
collectIds(out, used);
const newCells = [];
for (let i = 0; i < colCount; i++) {
const text = (Array.isArray(cells) ? cells[i] : undefined) ?? "";
const attrs = { colspan: 1, rowspan: 1 };
// Copy this column's colwidth from the header row's cell when present.
const colwidth = headerCells[i]?.attrs?.colwidth;
if (colwidth !== undefined)
attrs.colwidth = colwidth;
// A row landing at index 0 becomes the new header row, so inherit the
// current header cell's type per column (Docmost uses "tableHeader" there);
// every other position is a plain data cell.
const cellType = landingIndex === 0 ? headerCells[i]?.type ?? "tableCell" : "tableCell";
newCells.push({
type: cellType,
attrs,
content: [makeCellParagraph(makeFreshId(used), text)],
});
}
const newRow = { type: "tableRow", content: newCells };
// Splice at the resolved landing index (append when index was omitted/invalid).
table.content.splice(landingIndex, 0, newRow);
return { doc: out, inserted: true };
}
/**
* Delete the row at 0-based `index` from a table. Returns `{ doc, deleted }`.
* `deleted` is false only when the table cannot be located. Throws on an
* out-of-range index, and refuses to delete the table's only row.
*/
export function deleteTableRow(doc, tableRef, index) {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null)
return { doc: out, deleted: false };
const { table } = located;
if (!Array.isArray(table.content))
table.content = [];
const rows = table.content.length;
if (!Number.isInteger(index) || index < 0 || index >= rows) {
throw new Error(`table_delete_row: row index ${index} out of range (table has ${rows} row(s))`);
}
if (rows <= 1) {
throw new Error("table_delete_row: refusing to delete the only row of the table");
}
table.content.splice(index, 1);
return { doc: out, deleted: true };
}
/**
* Set the plain-text content of cell `[row, col]` (0-based) to `text`. Returns
* `{ doc, updated }`; `updated` is false only when the table cannot be located.
* Throws when `row`/`col` is out of range. The cell's own attrs (colspan/
* rowspan/colwidth) are preserved; its content becomes a single text paragraph
* that reuses the cell's existing first-paragraph id when present, else a fresh
* one.
*/
export function updateTableCell(doc, tableRef, row, col, text) {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null)
return { doc: out, updated: false };
const { table } = located;
const rowNodes = Array.isArray(table.content) ? table.content : [];
const rows = rowNodes.length;
const rowNode = rowNodes[row];
const cols = isObject(rowNode) && Array.isArray(rowNode.content)
? rowNode.content.length
: 0;
if (!Number.isInteger(row) ||
row < 0 ||
row >= rows ||
!Number.isInteger(col) ||
col < 0 ||
col >= cols) {
throw new Error(`table_update_cell: cell [${row},${col}] out of range`);
}
const cellNode = rowNode.content[col];
// Reuse the cell's existing first-paragraph id, or mint a fresh unique one.
const existingPara = Array.isArray(cellNode?.content)
? cellNode.content[0]
: undefined;
let id = isObject(existingPara) && isObject(existingPara.attrs)
? existingPara.attrs.id
: undefined;
if (typeof id !== "string" || id.length === 0) {
const used = new Set();
collectIds(out, used);
id = makeFreshId(used);
}
cellNode.content = [makeCellParagraph(id, text)];
return { doc: out, updated: true };
}
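
To make the table shape described in this file concrete, here is a minimal sketch of reading a ragged table as a matrix. It is not the module's code: `plainText` and `tableToMatrix` are hypothetical, simplified stand-ins for `blockPlainText` and `readTable`, and the sketch assumes plain text is just the concatenation of text nodes. The extra `widest` field illustrates the widest-row rule that `insertTableRow` uses for its column count.

```javascript
// Hypothetical stand-in for blockPlainText: concatenates every text node's text.
function plainText(node) {
  if (node == null || typeof node !== "object") return "";
  if (node.type === "text" && typeof node.text === "string") return node.text;
  return Array.isArray(node.content) ? node.content.map(plainText).join("") : "";
}

// Simplified stand-in for readTable that takes the table node directly.
function tableToMatrix(table) {
  const rowNodes = Array.isArray(table?.content) ? table.content : [];
  const cells = [];
  const cellIds = [];
  for (const row of rowNodes) {
    const cellNodes = Array.isArray(row?.content) ? row.content : [];
    cells.push(cellNodes.map(plainText));
    // A cell is addressed by the id of its FIRST paragraph (or null).
    cellIds.push(cellNodes.map((c) => c?.content?.[0]?.attrs?.id ?? null));
  }
  // Widest row: the column count a new row would be padded to.
  const widest = cells.reduce((max, r) => Math.max(max, r.length), 0);
  return { rows: rowNodes.length, cols: cells[0]?.length ?? 0, widest, cells, cellIds };
}

// Hypothetical builder for a cell holding one paragraph of plain text.
const cell = (type, id, text) => ({
  type,
  attrs: { colspan: 1, rowspan: 1 },
  content: [
    { type: "paragraph", attrs: { id, indent: 0 }, content: text ? [{ type: "text", text }] : [] },
  ],
});

// A ragged table: the header row has 2 cells, the data row has 3.
const table = {
  type: "table",
  content: [
    { type: "tableRow", content: [cell("tableHeader", "h1", "Name"), cell("tableHeader", "h2", "Role")] },
    {
      type: "tableRow",
      content: [cell("tableCell", "c1", "Ada"), cell("tableCell", "c2", "Engineer"), cell("tableCell", "c3", "extra")],
    },
  ],
};

const matrix = tableToMatrix(table);
console.log(matrix.cols, matrix.widest, matrix.cells);
```

Here `cols` reflects only row 0 while `widest` reflects the longest row, which is why a caller that needs each row's real width should look at the per-row arrays.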


@@ -0,0 +1,31 @@
/**
* Per-page async mutex.
*
* Content writes over the collaboration websocket must never overlap for the
* same page: two concurrent full-document replaces would race on the live Yjs
* fragment. We serialize them with a per-pageId promise chain — each new
* operation waits for the previous one on that page to settle (success or
* failure) before it runs. Different pages never block each other.
*/
const chains = new Map();
// The returned promise carries the real result/rejection of `fn` and MUST be
// awaited/handled by the caller; only the internal chaining tail swallows
// errors (purely to gate ordering).
export function withPageLock(pageId, fn) {
// Wait for the previous op on this page; swallow its error so a failure does
// not poison the queue for the next caller.
const prev = (chains.get(pageId) ?? Promise.resolve()).catch(() => { });
const run = prev.then(fn);
// The tail used for chaining must also swallow errors (it only gates order).
const tail = run.catch(() => { });
chains.set(pageId, tail);
// Drop the map entry once this op is the tail and has settled, to avoid an
// unbounded map of resolved promises.
tail.then(() => {
if (chains.get(pageId) === tail) {
chains.delete(pageId);
}
});
// Callers get the real result/rejection of fn.
return run;
}
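
The ordering guarantees described above can be exercised with a small sketch. The `withPageLock` body below restates the function from this file so the example runs on its own; `sleep`, `demo`, and the page ids are hypothetical names used only for illustration.

```javascript
const chains = new Map();

// Restated from the file above (without `export`) so the sketch is self-contained.
function withPageLock(pageId, fn) {
  const prev = (chains.get(pageId) ?? Promise.resolve()).catch(() => {});
  const run = prev.then(fn);
  const tail = run.catch(() => {});
  chains.set(pageId, tail);
  tail.then(() => {
    if (chains.get(pageId) === tail) chains.delete(pageId);
  });
  return run;
}

// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const log = [];
  // Three writes queued on the same page, the second of which fails.
  const slow = withPageLock("page-a", async () => {
    log.push("a1:start");
    await sleep(20);
    log.push("a1:end");
  });
  const failing = withPageLock("page-a", async () => {
    log.push("a2:start");
    throw new Error("write failed");
  });
  const next = withPageLock("page-a", async () => {
    log.push("a3:start");
    return "ok";
  });
  // A write on a different page is not blocked by page-a's queue.
  const other = withPageLock("page-b", async () => {
    log.push("b1:start");
  });
  const results = await Promise.allSettled([slow, failing, next, other]);
  return { log, statuses: results.map((r) => r.status) };
}

demo().then(({ log, statuses }) => console.log(log, statuses));
```

The failing write rejects for its own caller only: the third write on `page-a` still runs afterwards, and the `page-b` write starts without waiting for the slow `page-a` write to finish.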


@@ -0,0 +1,405 @@
/**
* Pure, network-free transform primitives for a ProseMirror/TipTap document
* tree, plus one higher-level orchestration (commentsToFootnotes).
*
* A ProseMirror node here is a plain JSON object of the shape produced by
* Docmost: `{ type, attrs?, content?, text?, marks? }`. Children live in the
* `content` array; callouts, tables, lists all hold their children in
* `content`, so a single recursive walk reaches them all.
*
* Conventions (matching node-ops.ts):
* - functions that produce a new document deep-clone their input and return a
* `{ doc, ... }` object; the caller's objects are never mutated.
* - functions are defensively null-safe.
* - `marks` arrays are preserved verbatim when fragments are split/reordered.
*/
import { blockPlainText } from "./node-ops.js";
/** Deep-clone a JSON-serializable value without mutating the original. */
function clone(value) {
if (typeof structuredClone === "function") {
return structuredClone(value);
}
// Fallback for environments without structuredClone.
return JSON.parse(JSON.stringify(value));
}
/** True if `value` is a non-null object (and not an array). */
function isObject(value) {
return value != null && typeof value === "object" && !Array.isArray(value);
}
/**
* Plain text of a node (re-export of node-ops' blockPlainText so transform
* authors have a single import surface). Recurses through nested content.
*/
export function blockText(node) {
return blockPlainText(node);
}
/**
* Depth-first visit of every node in the tree, including the root and the
* nested content of callouts, tables, lists, etc. `fn` is called once per node.
* Null-safe: a nullish or non-object node is ignored.
*/
export function walk(node, fn) {
if (!isObject(node))
return;
fn(node);
if (Array.isArray(node.content)) {
for (const child of node.content) {
walk(child, fn);
}
}
}
/**
* Find the FIRST node (depth-first) matching `predicate`, anywhere in the tree.
* Works even when the node carries no `attrs.id` (it searches the raw tree, not
* an id index). Returns the live node reference inside `doc` (NOT a clone), or
* null when nothing matches. Typical use: `getList(doc, n => n.type ===
* "orderedList")`.
*/
export function getList(doc, predicate) {
let found = null;
walk(doc, (node) => {
if (found == null && predicate(node)) {
found = node;
}
});
return found;
}
/**
* Insert `marker` as a PLAIN (unmarked) text run right after the first
* occurrence of `anchor`.
*
* The text run that contains the END of the anchor is SPLIT at the anchor end,
* so all existing marks (links, bold, ...) on the surrounding text are
* preserved, while the inserted marker run carries NO marks. The marker is
* inserted as a leading-space-padded run (`" " + marker`) so it visually
* separates from the preceding word.
*
* The anchor is matched against the concatenated plain text of each top-level
* block (so an anchor that spans several text/mark runs still matches). The
* insertion happens inside the inline content array that holds the anchor's
* final character.
*
* Operates on a clone of `doc`; returns `{ doc, inserted }`. `inserted` is
* false when the anchor text was not found in any in-scope block.
*/
export function insertMarkerAfter(doc, anchor, marker, opts = {}) {
const out = clone(doc);
if (!isObject(out) || !Array.isArray(out.content) || !anchor) {
return { doc: out, inserted: false };
}
const limit = typeof opts.beforeBlock === "number"
? Math.min(opts.beforeBlock, out.content.length)
: out.content.length;
for (let b = 0; b < limit; b++) {
const block = out.content[b];
if (!isObject(block))
continue;
// Quick reject: skip blocks whose plain text cannot contain the anchor.
if (!blockPlainText(block).includes(anchor))
continue;
// Walk the inline content arrays inside this block, tracking a running
// character offset so we can locate the inline array + text run that holds
// the END of the anchor's first occurrence.
let inserted = false;
let offset = 0; // characters of plain text seen so far in this block
const anchorEnd = blockPlainText(block).indexOf(anchor) + anchor.length;
// Recurse into inline-bearing containers (paragraph, heading, table cell,
// callout child paragraphs, ...). We only split inside an array of inline
// nodes (text/inline atoms); the FIRST array whose cumulative range covers
// anchorEnd receives the split + marker.
const visit = (container) => {
if (inserted || !isObject(container) || !Array.isArray(container.content)) {
return;
}
const inline = container.content;
// Detect whether this array is an inline array (contains text nodes).
const hasText = inline.some((n) => isObject(n) && n.type === "text");
if (hasText) {
for (let i = 0; i < inline.length; i++) {
const n = inline[i];
const len = isObject(n) ? blockPlainText(n).length : 0;
const runStart = offset;
const runEnd = offset + len;
// The run that contains the anchor end (anchorEnd lands inside this
// run, i.e. runStart < anchorEnd <= runEnd) is the split point.
if (!inserted &&
isObject(n) &&
n.type === "text" &&
typeof n.text === "string" &&
anchorEnd > runStart &&
anchorEnd <= runEnd) {
const cut = anchorEnd - runStart; // split index within this text run
const before = n.text.slice(0, cut);
const after = n.text.slice(cut);
const marks = Array.isArray(n.marks) ? n.marks : [];
const parts = [];
if (before.length > 0) {
parts.push({ ...n, text: before, marks: [...marks] });
}
// Marker is a PLAIN run: no marks copied. Leading space separates it.
parts.push({ type: "text", text: " " + marker });
if (after.length > 0) {
parts.push({ ...n, text: after, marks: [...marks] });
}
inline.splice(i, 1, ...parts);
inserted = true;
return;
}
offset = runEnd;
}
}
else {
// Not an inline array: recurse into children (e.g. callout -> paragraph).
for (const child of inline) {
visit(child);
if (inserted)
return;
}
}
};
visit(block);
if (inserted) {
return { doc: out, inserted: true };
}
// If the block matched in plain text but we could not split (e.g. anchor
// lands inside an atom), fall through to the next block rather than failing.
}
return { doc: out, inserted: false };
}
/**
* In the disclaimer callout, replace a `[1]…[K]` range marker with `[1]…[n]`.
*
* Docmost translations use a callout that states the footnote range, e.g.
* "[1]…[5]". When the number of notes changes, this rewrites the trailing
* number of any `[1]…[K]` (or `[1]...[K]`, ASCII ellipsis) occurrence found in a
* callout's text nodes to `[1]…[n]`. Operates on a clone; returns
* `{ doc, changed }` where `changed` is the number of text nodes rewritten.
*/
export function setCalloutRange(doc, n) {
const out = clone(doc);
let changed = 0;
// Match "[1]" + (… or ...) + "[<digits>]"; rewrite the last number to n.
const rangeRe = /(\[1\]\s*(?:…|\.\.\.)\s*\[)\d+(\])/g;
walk(out, (node) => {
if (node.type === "callout") {
walk(node, (inner) => {
if (inner.type === "text" &&
typeof inner.text === "string" &&
rangeRe.test(inner.text)) {
rangeRe.lastIndex = 0;
inner.text = inner.text.replace(rangeRe, `$1${n}$2`);
changed++;
}
rangeRe.lastIndex = 0;
});
}
});
return { doc: out, changed };
}
/**
* Generate a short random id for a new block's `attrs.id`. Docmost uses nanoid;
* a base36 random string is sufficient here (uniqueness within one document).
*/
function freshId() {
return (Math.random().toString(36).slice(2, 12) +
Math.random().toString(36).slice(2, 6));
}
/**
* Wrap inline ProseMirror nodes in a list item:
* { type:"listItem", content:[{ type:"paragraph", attrs:{id}, content: inlineNodes }] }
* with a fresh random block id on the paragraph. The inline nodes are cloned so
* the result shares no references with the caller's input.
*/
export function noteItem(inlineNodes) {
const content = Array.isArray(inlineNodes) ? clone(inlineNodes) : [];
return {
type: "listItem",
content: [
{
type: "paragraph",
attrs: { id: freshId() },
content,
},
],
};
}
/**
* Convert a comment's markdown (e.g. `**Lead.** body...`) into inline
* ProseMirror nodes.
*
* A leading `комментарий: ` (Russian for "comment: "; matched
* case-insensitively) or `N. ` numeric prefix is stripped first. Then a
* minimal bold-split is applied: a leading
* `**bold lead**` run becomes a text node with a bold mark, and the remainder
* becomes a plain text node. This keeps the conversion synchronous (the
* transform sandbox runs synchronously) and dependency-free; the existing
* async markdownToProseMirror is intentionally NOT used here.
*/
export function mdToInlineNodes(markdown) {
let md = typeof markdown === "string" ? markdown : "";
// Strip a leading "комментарий: " prefix (Russian for "comment: ",
// case-insensitive) or a "N. " prefix.
md = md.replace(/^\s*комментарий\s*:\s*/i, "");
md = md.replace(/^\s*\d+\.\s+/, "");
md = md.trim();
if (md === "")
return [];
const nodes = [];
// Leading bold lead: **...** at the very start. The second group captures the
// whitespace that separated the lead from the rest.
const leadMatch = /^\*\*([^*]+)\*\*(\s*)/.exec(md);
if (leadMatch) {
const leadText = leadMatch[1];
nodes.push({
type: "text",
text: leadText,
marks: [{ type: "bold" }],
});
const rest = md.slice(leadMatch[0].length);
if (rest.length > 0) {
// Preserve the separating space that followed the bold lead.
nodes.push({ type: "text", text: leadMatch[2] + rest });
}
return nodes;
}
// No bold lead: emit the whole thing as a single plain text node, with any
// remaining **bold** spans split out inline.
return splitInlineBold(md);
}
/**
* Split a string with inline `**bold**` spans into text nodes, bolding the
* spans. Used as the no-lead fallback in mdToInlineNodes.
*/
function splitInlineBold(text) {
const nodes = [];
const re = /\*\*([^*]+)\*\*/g;
let last = 0;
let m;
while ((m = re.exec(text)) !== null) {
if (m.index > last) {
nodes.push({ type: "text", text: text.slice(last, m.index) });
}
nodes.push({ type: "text", text: m[1], marks: [{ type: "bold" }] });
last = m.index + m[0].length;
}
if (last < text.length) {
nodes.push({ type: "text", text: text.slice(last) });
}
return nodes.length > 0 ? nodes : [{ type: "text", text }];
}
/**
* Turn inline comments into numbered footnotes.
*
* For each inline comment that carries a `selection`:
* 1. insert a placeholder marker (a NUL-delimited "\u0000FN<i>\u0000"
* sentinel) right after the selection text in the BODY (before the
* notes heading);
* 2. build a note list item from the comment's markdown content.
*
* Then RENUMBER every footnote marker in the body by reading order: existing
* `[N]` markers and the new "\u0000FN<i>\u0000" placeholders are both replaced by a
* sequential `[seq]`, and the notes orderedList is reordered so each note lines
* up with its marker's reading-order position. Finally the disclaimer callout
* range is synced to the new note count.
*
* Returns `{ doc, consumed }` where `consumed` lists the ids of comments that
* were successfully anchored (their selection was found and a placeholder
* inserted). Operates on a clone of `doc`.
*/
export function commentsToFootnotes(doc, comments, opts = {}) {
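let working = clone(doc);
// The default heading is Russian for "Translator's notes".
const notesHeading = opts.notesHeading ?? "Примечания переводчика";
// Null-safe: a non-object doc yields an empty block list, so the heading check
// below fails with a clear error instead of a TypeError.
const top = isObject(working) && Array.isArray(working.content) ? working.content : [];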
const notesIdx = top.findIndex((n) => isObject(n) && n.type === "heading" && blockText(n).trim() === notesHeading);
if (notesIdx < 0) {
throw new Error(`heading "${notesHeading}" not found`);
}
// The notes orderedList lives at or after the heading.
const notesList = top
.slice(notesIdx)
.find((n) => isObject(n) && n.type === "orderedList");
if (!notesList) {
throw new Error("notes orderedList not found");
}
const consumed = [];
const noteByPh = new Map();
(Array.isArray(comments) ? comments : []).forEach((c, i) => {
if (!c || !c.selection)
return;
// Collision-proof sentinel delimited by NUL control chars, which never occur
// in real Docmost prose — so the renumber regex below cannot mistake any body
// text (e.g. "Press F1 for help", model "FN2") for a placeholder. The NUL is
// transient: the placeholder round-trips within this function (insertMarkerAfter
// inserts it, the renumber pass replaces it with "[N]"), so it never persists
// in a returned/pushed document.
const ph = `\u0000FN${i}\u0000`;
// insertMarkerAfter returns a NEW cloned doc; reassign `working` here. The
// `top` / `notesList` references are re-resolved once, after this loop.
const r = insertMarkerAfter(working, c.selection.trimEnd(), ph, {
beforeBlock: notesIdx,
});
if (!r.inserted)
return;
working = r.doc;
noteByPh.set(ph, noteItem(mdToInlineNodes(c.content)));
consumed.push(c.id);
});
// Re-resolve references into the (possibly re-cloned) working doc.
const top2 = Array.isArray(working.content) ? working.content : [];
const notesList2 = top2
.slice(notesIdx)
.find((n) => isObject(n) && n.type === "orderedList");
if (!notesList2) {
throw new Error("notes orderedList not found");
}
const oldNotes = Array.isArray(notesList2.content)
? notesList2.content
: [];
const newNotes = [];
let seq = 0;
// Match either an existing "[N]" marker or a NUL-delimited "\u0000FN<i>\u0000"
// placeholder, in reading order across the body (blocks before the notes heading).
const re = /\[(\d+)\]|\u0000FN(\d+)\u0000/g;
// Same range regex setCalloutRange uses to detect the disclaimer callout's
// "[1]…[K]" range; used here to decide whether a top-level callout is the
// disclaimer (skip) or an ordinary callout (renumber normally).
const disclaimerRangeRe = /(\[1\]\s*(?:…|\.\.\.)\s*\[)\d+(\])/;
for (let i = 0; i < notesIdx; i++) {
// Skip ONLY the disclaimer callout: its "[1]…[K]" range is NOT a footnote
// marker and is synced separately by setCalloutRange. Renumbering it here
// would consume note slots and corrupt the sequence. Other top-level
// callouts may carry legitimate "[N]" body markers and are renumbered.
if (isObject(top2[i]) &&
top2[i].type === "callout" &&
disclaimerRangeRe.test(blockText(top2[i]))) {
continue;
}
walk(top2[i], (node) => {
if (node.type !== "text" || typeof node.text !== "string")
return;
node.text = node.text.replace(re, (_m, oldNum, phIdx) => {
if (oldNum != null) {
const note = oldNotes[Number(oldNum) - 1];
// Every existing body marker MUST map to a real note. An out-of-range
// marker means the document is internally inconsistent; fail loudly
// rather than silently dropping the note and desyncing the callout.
if (note === undefined) {
throw new Error(`footnote [${oldNum}] has no matching note (notes list has ${oldNotes.length} items); document is inconsistent`);
}
newNotes.push(note);
}
else {
newNotes.push(noteByPh.get(`\u0000FN${phIdx}\u0000`));
}
return `[${++seq}]`;
});
});
}
// Reorder the notes list IN PLACE on `working` first, THEN sync the callout
// range. setCalloutRange clones `working`, so the reordered notes (mutated
// before the clone) are carried into its result automatically. No null-filter
// here: marker count and note count must stay exactly equal (the out-of-range
// guard above guarantees no undefined entry is ever pushed).
notesList2.content = newNotes;
const synced = setCalloutRange(working, notesList2.content.length);
return { doc: synced.doc, consumed };
}
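
The renumbering pass above can be sketched on a much simpler model. This is only an illustration, assuming body blocks are plain strings and notes are plain strings; `renumberFootnotes` and `RenumberResult` are hypothetical names, and the real code walks ProseMirror nodes, handles placeholders, and skips the disclaimer callout.

```typescript
// Hypothetical, simplified model of the renumbering pass: markers are
// rewritten in reading order and the notes list is reordered to match.
interface RenumberResult {
  body: string[];
  notes: string[];
}

function renumberFootnotes(body: string[], oldNotes: string[]): RenumberResult {
  const newNotes: string[] = [];
  let seq = 0;
  const newBody = body.map((text) =>
    text.replace(/\[(\d+)\]/g, (_match: string, oldNum: string) => {
      const note = oldNotes[Number(oldNum) - 1];
      // Fail loudly on a marker that points outside the notes list.
      if (note === undefined) {
        throw new Error(`footnote [${oldNum}] has no matching note`);
      }
      newNotes.push(note);
      seq += 1;
      return `[${seq}]`;
    }),
  );
  return { body: newBody, notes: newNotes };
}

// Markers appear out of order in the body; after the pass they are sequential
// and the notes follow the same reading order.
const result = renumberFootnotes(
  ["Second claim [2] comes first.", "First claim [1] follows."],
  ["note A", "note B"],
);
console.log(result.body, result.notes);
```

Because one note is pushed per marker, the marker count and the note count stay equal by construction, which is the invariant the comments above rely on.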

40
packages/mcp/build/stdio.js Executable file
View File

@@ -0,0 +1,40 @@
#!/usr/bin/env node
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { createDocmostMcpServer } from "./index.js";
// Standalone stdio entrypoint. This restores the original behavior of the
// package when run as a CLI (`docmost-mcp`): it reads credentials from the
// environment and serves the MCP protocol over stdin/stdout. The factory in
// index.ts stays side-effect-free; all the process/transport lifecycle lives
// here.
const API_URL = process.env.DOCMOST_API_URL;
const EMAIL = process.env.DOCMOST_EMAIL;
const PASSWORD = process.env.DOCMOST_PASSWORD;
if (!API_URL || !EMAIL || !PASSWORD) {
console.error("Error: DOCMOST_API_URL, DOCMOST_EMAIL, and DOCMOST_PASSWORD environment variables are required.");
process.exit(1);
}
async function run() {
// Global safety nets so a stray rejection/exception cannot silently kill
// the stdio server. Per-tool errors still flow through the SDK and are not
// affected by these handlers; these only catch errors raised OUTSIDE a tool
// call (e.g. a transient ws/collab socket "error" event). Such errors must
// NOT tear down the whole stdio server, so we log only and keep running.
// Genuine startup failures are still fatal via run().catch(...) below.
process.on("unhandledRejection", (reason) => {
console.error("Unhandled promise rejection:", reason);
});
process.on("uncaughtException", (error) => {
console.error("Uncaught exception:", error);
});
const server = createDocmostMcpServer({
apiUrl: API_URL,
email: EMAIL,
password: PASSWORD,
});
const transport = new StdioServerTransport();
await server.connect(transport);
}
run().catch((error) => {
console.error("Fatal error running server:", error);
process.exit(1);
});

View File

@@ -0,0 +1,13 @@
{
"mcpServers": {
"docmost-local": {
"command": "node",
"args": ["./build/stdio.js"],
"env": {
"DOCMOST_API_URL": "http://localhost:3000/api",
"DOCMOST_EMAIL": "test@docmost.com",
"DOCMOST_PASSWORD": "test"
}
}
}
}

17
packages/mcp/node_modules/.bin/marked generated vendored Executable file
View File

@@ -0,0 +1,17 @@
#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
*CYGWIN*) basedir=`cygpath -w "$basedir"`;;
esac
if [ -z "$NODE_PATH" ]; then
export NODE_PATH="/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/marked@17.0.5/node_modules/marked/bin/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/marked@17.0.5/node_modules/marked/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/marked@17.0.5/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/node_modules"
else
export NODE_PATH="/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/marked@17.0.5/node_modules/marked/bin/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/marked@17.0.5/node_modules/marked/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/marked@17.0.5/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/node_modules:$NODE_PATH"
fi
if [ -x "$basedir/node" ]; then
exec "$basedir/node" "$basedir/../marked/bin/marked.js" "$@"
else
exec node "$basedir/../marked/bin/marked.js" "$@"
fi

17
packages/mcp/node_modules/.bin/tsc generated vendored Executable file
View File

@@ -0,0 +1,17 @@
#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
*CYGWIN*) basedir=`cygpath -w "$basedir"`;;
esac
if [ -z "$NODE_PATH" ]; then
export NODE_PATH="/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/bin/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/node_modules"
else
export NODE_PATH="/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/bin/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/node_modules:$NODE_PATH"
fi
if [ -x "$basedir/node" ]; then
exec "$basedir/node" "$basedir/../typescript/bin/tsc" "$@"
else
exec node "$basedir/../typescript/bin/tsc" "$@"
fi

17
packages/mcp/node_modules/.bin/tsserver generated vendored Executable file
View File

@@ -0,0 +1,17 @@
#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
*CYGWIN*) basedir=`cygpath -w "$basedir"`;;
esac
if [ -z "$NODE_PATH" ]; then
export NODE_PATH="/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/bin/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/node_modules"
else
export NODE_PATH="/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/bin/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules/typescript/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/typescript@5.9.3/node_modules:/Users/vvzvlad/Data/Projects/gitmost/node_modules/.pnpm/node_modules:$NODE_PATH"
fi
if [ -x "$basedir/node" ]; then
exec "$basedir/node" "$basedir/../typescript/bin/tsserver" "$@"
else
exec node "$basedir/../typescript/bin/tsserver" "$@"
fi

View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@fellow+prosemirror-recreate-transform@1.2.3/node_modules/@fellow/prosemirror-recreate-transform

1
packages/mcp/node_modules/@hocuspocus/provider generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@hocuspocus+provider@3.4.4_y-protocols@1.0.6_yjs@13.6.30__yjs@13.6.30/node_modules/@hocuspocus/provider

1
packages/mcp/node_modules/@hocuspocus/transformer generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@hocuspocus+transformer@3.4.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4__@tiptap+pm@3.20.4__d2104a828d218219abc1c54b602a69ac/node_modules/@hocuspocus/transformer

1
packages/mcp/node_modules/@modelcontextprotocol/sdk generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@modelcontextprotocol+sdk@1.29.0_@cfworker+json-schema@4.1.1_zod@3.25.76/node_modules/@modelcontextprotocol/sdk

1
packages/mcp/node_modules/@tiptap/core generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+core@3.20.4_@tiptap+pm@3.20.4/node_modules/@tiptap/core

1
packages/mcp/node_modules/@tiptap/extension-highlight generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-highlight@3.20.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4_/node_modules/@tiptap/extension-highlight

1
packages/mcp/node_modules/@tiptap/extension-image generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-image@3.20.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4_/node_modules/@tiptap/extension-image

1
packages/mcp/node_modules/@tiptap/extension-link generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-link@3.20.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4__@tiptap+pm@3.20.4/node_modules/@tiptap/extension-link

1
packages/mcp/node_modules/@tiptap/extension-subscript generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-subscript@3.20.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4__@tiptap+pm@3.20.4/node_modules/@tiptap/extension-subscript

1
packages/mcp/node_modules/@tiptap/extension-superscript generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-superscript@3.20.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4__@tiptap+pm@3.20.4/node_modules/@tiptap/extension-superscript

1
packages/mcp/node_modules/@tiptap/extension-task-item generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-task-item@3.20.4_@tiptap+extension-list@3.20.4_@tiptap+core@3.20.4_@t_f120fce1a3d9fc85461b67496f03c362/node_modules/@tiptap/extension-task-item

1
packages/mcp/node_modules/@tiptap/extension-task-list generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+extension-task-list@3.20.4_@tiptap+extension-list@3.20.4_@tiptap+core@3.20.4_@t_c94f69f56aee3556ec680ab7491aa1d4/node_modules/@tiptap/extension-task-list

1
packages/mcp/node_modules/@tiptap/html generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+html@3.20.4_@tiptap+core@3.20.4_@tiptap+pm@3.20.4__@tiptap+pm@3.20.4_happy-dom@20.8.9/node_modules/@tiptap/html

1
packages/mcp/node_modules/@tiptap/starter-kit generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@tiptap+starter-kit@3.20.4/node_modules/@tiptap/starter-kit

1
packages/mcp/node_modules/@types/form-data generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@types+form-data@2.5.2/node_modules/@types/form-data

1
packages/mcp/node_modules/@types/jsdom generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@types+jsdom@27.0.0/node_modules/@types/jsdom

1
packages/mcp/node_modules/@types/node generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../../node_modules/.pnpm/@types+node@20.19.43/node_modules/@types/node

1
packages/mcp/node_modules/axios generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/axios@1.16.0/node_modules/axios

1
packages/mcp/node_modules/form-data generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/form-data@4.0.5/node_modules/form-data

1
packages/mcp/node_modules/jsdom generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/jsdom@27.4.0_@noble+hashes@2.0.1/node_modules/jsdom

1
packages/mcp/node_modules/marked generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/marked@17.0.5/node_modules/marked

1
packages/mcp/node_modules/typescript generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/typescript@5.9.3/node_modules/typescript

1
packages/mcp/node_modules/ws generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/ws@8.20.1/node_modules/ws

1
packages/mcp/node_modules/yjs generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/yjs@13.6.30/node_modules/yjs

1
packages/mcp/node_modules/zod generated vendored Symbolic link
View File

@@ -0,0 +1 @@
../../../node_modules/.pnpm/zod@3.25.76/node_modules/zod

63
packages/mcp/package.json Normal file
View File

@@ -0,0 +1,63 @@
{
"name": "@docmost/mcp",
"version": "1.0.0",
"description": "A Model Context Protocol (MCP) server for Docmost, allowing AI agents to manage documentation spaces and pages.",
"private": true,
"type": "module",
"main": "./build/index.js",
"exports": {
".": "./build/index.js",
"./http": "./build/http.js"
},
"bin": {
"docmost-mcp": "./build/stdio.js"
},
"scripts": {
"build": "tsc",
"start": "node build/stdio.js",
"watch": "tsc --watch",
"pretest": "tsc",
"test": "node --test \"test/unit/*.test.mjs\" \"test/mock/*.test.mjs\"",
"test:unit": "node --test \"test/unit/*.test.mjs\"",
"test:mock": "node --test \"test/mock/*.test.mjs\"",
"test:e2e": "node test-e2e.mjs"
},
"keywords": [
"mcp",
"docmost",
"documentation",
"ai",
"agent"
],
"author": "Moritz Krause",
"license": "MIT",
"dependencies": {
"@fellow/prosemirror-recreate-transform": "^1.2.3",
"@hocuspocus/provider": "^3.4.4",
"@hocuspocus/transformer": "^3.4.4",
"@modelcontextprotocol/sdk": "^1.25.3",
"@tiptap/core": "3.20.4",
"@tiptap/extension-highlight": "3.20.4",
"@tiptap/extension-image": "3.20.4",
"@tiptap/extension-link": "3.20.4",
"@tiptap/extension-subscript": "3.20.4",
"@tiptap/extension-superscript": "3.20.4",
"@tiptap/extension-task-item": "3.20.4",
"@tiptap/extension-task-list": "3.20.4",
"@tiptap/html": "3.20.4",
"@tiptap/starter-kit": "3.20.4",
"@types/jsdom": "^27.0.0",
"axios": "^1.6.0",
"form-data": "^4.0.0",
"jsdom": "^27.4.0",
"marked": "^17.0.1",
"ws": "^8.19.0",
"yjs": "^13.6.29",
"zod": "^3.22.0"
},
"devDependencies": {
"@types/form-data": "^2.5.0",
"@types/node": "^20.0.0",
"typescript": "^5.0.0"
}
}

2577
packages/mcp/src/client.ts Normal file

File diff suppressed because it is too large Load Diff

106
packages/mcp/src/http.ts Normal file
View File

@@ -0,0 +1,106 @@
import { randomUUID } from "node:crypto";
import { IncomingMessage, ServerResponse } from "node:http";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { isInitializeRequest } from "@modelcontextprotocol/sdk/types.js";
import { createDocmostMcpServer, DocmostMcpConfig } from "./index.js";
/**
* Build a stateful Streamable-HTTP handler for the Docmost MCP server. The
* embedding host (the gitmost NestJS server) bridges its raw Node req/res into
* `handleRequest`. One McpServer + transport is created per MCP session and
* kept alive between requests, keyed by the `mcp-session-id` header.
*/
export function createMcpHttpHandler(config: DocmostMcpConfig) {
// One transport (and one McpServer) per MCP session, keyed by session id.
const transports: Record<string, StreamableHTTPServerTransport> = {};
// Last activity timestamp per session id, used for idle eviction.
const lastSeen: Record<string, number> = {};
// Idle session TTL (ms): a session with no activity for this long is evicted.
// Defaults to 30 min; overridable via MCP_SESSION_IDLE_MS.
const idleTtlMs = (() => {
const parsed = parseInt(process.env.MCP_SESSION_IDLE_MS ?? "", 10);
return Number.isFinite(parsed) && parsed > 0 ? parsed : 30 * 60 * 1000;
})();
// Periodically close transports idle longer than the TTL. transport.close()
// triggers its onclose, which removes it from `transports`; we also drop the
// lastSeen entry. unref() so this timer never keeps the process alive.
const sweepIntervalMs = 5 * 60 * 1000;
const sweepTimer = setInterval(() => {
const now = Date.now();
for (const sid of Object.keys(transports)) {
if (now - (lastSeen[sid] ?? 0) > idleTtlMs) {
void transports[sid].close();
delete lastSeen[sid];
}
}
}, sweepIntervalMs);
sweepTimer.unref();
async function handleRequest(
req: IncomingMessage,
res: ServerResponse,
parsedBody?: unknown,
): Promise<void> {
const sessionId = req.headers["mcp-session-id"] as string | undefined;
const method = (req.method || "GET").toUpperCase();
let transport = sessionId ? transports[sessionId] : undefined;
if (method === "POST" && !transport) {
// A new session may only be created by an initialize request without a
// session id.
if (sessionId || !isInitializeRequest(parsedBody)) {
res.statusCode = 400;
res.setHeader("Content-Type", "application/json");
res.end(
JSON.stringify({
jsonrpc: "2.0",
error: {
code: -32000,
message: "Bad Request: no valid session ID provided",
},
id: null,
}),
);
return;
}
transport = new StreamableHTTPServerTransport({
sessionIdGenerator: () => randomUUID(),
onsessioninitialized: (sid: string) => {
transports[sid] = transport!;
lastSeen[sid] = Date.now();
},
});
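transport.onclose = () => {
const sid = transport!.sessionId;
if (sid) {
delete transports[sid];
// Also drop the idle timestamp so a session closed by the client does
// not leave a stale lastSeen entry behind.
delete lastSeen[sid];
}
};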
const server = createDocmostMcpServer(config);
await server.connect(transport);
await transport.handleRequest(req, res, parsedBody);
return;
}
if (!transport) {
res.statusCode = 400;
res.setHeader("Content-Type", "application/json");
res.end(
JSON.stringify({
jsonrpc: "2.0",
error: {
code: -32000,
message: "Bad Request: no valid session ID provided",
},
id: null,
}),
);
return;
}
// Routing to an existing transport: refresh its idle timestamp.
if (sessionId) lastSeen[sessionId] = Date.now();
await transport.handleRequest(req, res, parsedBody);
}
return { handleRequest };
}
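
The idle-eviction logic in this handler can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the handler's actual code: `parseIdleTtl` and `sweepIdle` are hypothetical names, the session table is a plain `Map`, and the clock is passed in as a parameter so the behaviour can be checked without timers.

```typescript
// Default idle TTL of 30 minutes, mirroring the default described above.
const DEFAULT_IDLE_TTL_MS = 30 * 60 * 1000;

// Parse a raw setting (e.g. the value of MCP_SESSION_IDLE_MS); anything that
// is not a positive integer falls back to the default.
function parseIdleTtl(raw: string | undefined): number {
  const parsed = parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_IDLE_TTL_MS;
}

// Remove every session whose last activity is older than the TTL and return
// the evicted ids (the real handler would also close each transport).
function sweepIdle(
  lastSeen: Map<string, number>,
  ttlMs: number,
  now: number,
): string[] {
  const evicted: string[] = [];
  for (const [sid, seenAt] of Array.from(lastSeen.entries())) {
    if (now - seenAt > ttlMs) {
      lastSeen.delete(sid);
      evicted.push(sid);
    }
  }
  return evicted;
}

const sessions = new Map<string, number>();
sessions.set("a", 0); // last seen at t = 0 ms
sessions.set("b", 50000); // last seen at t = 50 000 ms
const evicted = sweepIdle(sessions, parseIdleTtl("60000"), 70000);
console.log(evicted, Array.from(sessions.keys()));
```

Passing the current time in explicitly keeps the sweep a pure function of its inputs; the handler itself drives the equivalent check from a `setInterval` timer.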

1088
packages/mcp/src/index.ts Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,86 @@
import axios from "axios";
export async function getCollabToken(
baseUrl: string,
apiToken: string,
): Promise<string> {
try {
const response = await axios.post(
`${baseUrl}/auth/collab-token`,
{},
{
headers: {
Authorization: `Bearer ${apiToken}`,
"Content-Type": "application/json",
},
},
);
// The token is normally wrapped as { data: { token: ... } }; fall back to a
// top-level `token` field for unwrapped responses.
return response.data.data?.token || response.data.token;
} catch (error) {
if (axios.isAxiosError(error)) {
// Attach the HTTP status to the plain Error so callers (e.g.
// getCollabTokenWithReauth) can still detect a 401/403 after the
// original AxiosError has been wrapped away.
// Avoid leaking the full server response body by default; include only
// status + statusText. Append the body only when DEBUG is set.
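// When no HTTP response was received (e.g. a network error) fall back to the
// Axios error message instead of printing "undefined undefined".
let message = error.response
? `Failed to get collab token: ${error.response.status} ${error.response.statusText}`
: `Failed to get collab token: ${error.message}`;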
if (process.env.DEBUG) {
message += ` - ${JSON.stringify(error.response?.data)}`;
}
const err: any = new Error(message);
err.status = error.response?.status;
throw err;
}
throw error;
}
}
export async function performLogin(
baseUrl: string,
email: string,
password: string,
): Promise<string> {
try {
const response = await axios.post(`${baseUrl}/auth/login`, {
email,
password,
});
// Extract token from Set-Cookie header
const cookies = response.headers["set-cookie"];
if (!cookies) {
throw new Error("No Set-Cookie header found in login response");
}
// Match the cookie name exactly to avoid matching a future
// authTokenRefresh cookie (startsWith would catch it).
const authCookie = cookies.find((c: string) => {
const kv = c.split(";")[0];
return kv.slice(0, kv.indexOf("=")) === "authToken";
});
if (!authCookie) {
throw new Error("No authToken cookie found in login response");
}
// Take everything after the FIRST "=" up to the first ";".
// Splitting on "=" would truncate base64 values containing "=" padding.
const kv = authCookie.split(";")[0];
const token = kv.slice(kv.indexOf("=") + 1);
return token;
} catch (error: any) {
// Avoid leaking the full server response body by default; log only the
// HTTP status. Log the verbose body only when DEBUG is set.
if (axios.isAxiosError(error)) {
if (process.env.DEBUG) {
console.error("Login failed:", error.response?.data);
} else {
console.error("Login failed:", error.response?.status);
}
} else {
console.error("Login failed:", error.message);
}
throw error;
}
}
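
The cookie handling in `performLogin` comes down to two rules: match the cookie name exactly, and cut the value at the first `=` only. Here is a small standalone sketch of those rules, assuming the `Set-Cookie` values arrive as an array of strings; `extractAuthToken` is a hypothetical helper name, not a function in this package.

```typescript
// Return the value of the cookie named exactly "authToken", or undefined.
function extractAuthToken(setCookie: string[]): string | undefined {
  for (const cookie of setCookie) {
    // Keep only the "name=value" part before the first attribute separator.
    const semi = cookie.indexOf(";");
    const pair = semi === -1 ? cookie : cookie.slice(0, semi);
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // malformed entry without a value
    // Exact name comparison: "authTokenRefresh" must not match.
    if (pair.slice(0, eq) === "authToken") {
      // Everything after the FIRST "=" is the value, so base64 "=" padding
      // at the end of the token is preserved.
      return pair.slice(eq + 1);
    }
  }
  return undefined;
}

const token = extractAuthToken([
  "authTokenRefresh=zzz; Path=/",
  "authToken=abc.def==; HttpOnly; Path=/",
]);
console.log(token);
```

A prefix check such as `startsWith("authToken")` would have picked the first cookie here, and splitting on every `=` would have dropped the trailing padding.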

View File

@@ -0,0 +1,618 @@
import { HocuspocusProvider } from "@hocuspocus/provider";
import { TiptapTransformer } from "@hocuspocus/transformer";
import * as Y from "yjs";
import WebSocket from "ws";
import { marked } from "marked";
import { generateJSON } from "@tiptap/html";
import { JSDOM } from "jsdom";
import { docmostExtensions } from "./docmost-schema.js";
import { withPageLock } from "./page-lock.js";
import { sanitizeForYjs, findUnstorableAttr } from "./node-ops.js";
// Setup DOM environment for Tiptap HTML parsing in Node.js
const dom = new JSDOM("<!DOCTYPE html><html><body></body></html>");
global.window = dom.window as any;
global.document = dom.window.document;
// @ts-ignore
global.Element = dom.window.Element;
// @ts-ignore
global.WebSocket = WebSocket;
// Navigator is read-only in newer Node versions and already exists
// global.navigator = dom.window.navigator;
/**
* Hard ceiling above which we skip callout preprocessing entirely. The linear
* scanner below has no quadratic blow-up, but we still cap input defensively so
* a pathological multi-megabyte payload cannot tie up the event loop; in that
* case the markdown is passed through verbatim (callouts are simply not
* detected) rather than risking a slow scan.
*/
const MAX_CALLOUT_PREPROCESS_BYTES = 4 * 1024 * 1024; // 4 MB
/** Matches an opening callout fence: `:::type` (type captured; the caller lower-cases it). */
const CALLOUT_OPEN_RE = /^:::\s*(\w+)\s*$/;
/** Matches a bare closing callout fence: `:::`. */
const CALLOUT_CLOSE_RE = /^:::\s*$/;
/** Matches the start/end of a code fence (``` or ~~~), capturing the marker. */
const CODE_FENCE_RE = /^(\s*)(`{3,}|~{3,})/;
/**
* Pre-process Docmost-flavoured markdown: convert `:::type ... :::`
* callout blocks (the syntax our markdown export produces) into HTML
* divs that the callout extension parses. The inner content is rendered
* through marked as regular markdown.
*
* Implemented as a single linear pass over the lines (no quadratic regex
* rescan). It:
* - tracks fenced code regions (```...``` and ~~~...~~~) and never treats a
* `:::` line that lives inside a code fence as a callout delimiter, so a
* callout body that itself contains a fenced code block with a `:::` line is
* no longer corrupted;
* - matches an opening `:::type` line with the next CLOSING `:::` at the SAME
* nesting level, supporting NESTED callouts via a depth counter (an inner
* `:::type` opens a deeper level and consumes a matching `:::`);
* - emits the same `<div data-type="callout" data-callout-type="TYPE">` output
* (inner rendered through marked) as the previous regex implementation.
*/
async function preprocessCallouts(markdown: string): Promise<string> {
// Defensive cap: skip preprocessing for pathologically large inputs. Note that
// `length` counts UTF-16 code units, so the byte limit is approximate.
if (markdown.length > MAX_CALLOUT_PREPROCESS_BYTES) {
return markdown;
}
// Recursively transform a slice of lines, converting top-level callouts in
// that slice into <div> blocks and rendering their inner content (which may
// itself contain nested callouts) through this same function.
const transform = async (lines: string[]): Promise<string> => {
const out: string[] = [];
let inCodeFence = false;
let codeFenceMarker = ""; // the exact run of backticks/tildes that opened it
let i = 0;
while (i < lines.length) {
const line = lines[i];
// Inside a code fence, only its matching closing fence is significant;
// everything else (including `:::` lines) is copied through verbatim.
if (inCodeFence) {
out.push(line);
const fence = line.match(CODE_FENCE_RE);
if (fence && fence[2].startsWith(codeFenceMarker[0]) &&
fence[2].length >= codeFenceMarker.length) {
inCodeFence = false;
codeFenceMarker = "";
}
i++;
continue;
}
// A code fence opening outside any callout body: enter code-fence mode.
const fenceOpen = line.match(CODE_FENCE_RE);
if (fenceOpen) {
inCodeFence = true;
codeFenceMarker = fenceOpen[2];
out.push(line);
i++;
continue;
}
// An opening callout fence: scan forward (with code-fence and nested
// callout awareness) for its matching closing `:::` at the same level.
const open = line.match(CALLOUT_OPEN_RE);
if (open) {
const type = open[1].toLowerCase();
const bodyLines: string[] = [];
let depth = 1;
let innerInCodeFence = false;
let innerCodeFenceMarker = "";
let j = i + 1;
for (; j < lines.length; j++) {
const bl = lines[j];
if (innerInCodeFence) {
const f = bl.match(CODE_FENCE_RE);
if (f && f[2].startsWith(innerCodeFenceMarker[0]) &&
f[2].length >= innerCodeFenceMarker.length) {
innerInCodeFence = false;
innerCodeFenceMarker = "";
}
bodyLines.push(bl);
continue;
}
const innerFence = bl.match(CODE_FENCE_RE);
if (innerFence) {
innerInCodeFence = true;
innerCodeFenceMarker = innerFence[2];
bodyLines.push(bl);
continue;
}
if (CALLOUT_OPEN_RE.test(bl)) {
depth++;
bodyLines.push(bl);
continue;
}
if (CALLOUT_CLOSE_RE.test(bl)) {
depth--;
if (depth === 0) break; // matching close for THIS callout
bodyLines.push(bl);
continue;
}
bodyLines.push(bl);
}
if (j < lines.length) {
// Found the matching closing fence: render the body (recursively, so
// nested callouts are handled) and emit the callout div.
const inner = await transform(bodyLines);
const renderedInner = await marked.parse(inner);
out.push(
`\n<div data-type="callout" data-callout-type="${type}">${renderedInner}</div>\n`,
);
i = j + 1; // skip past the closing `:::`
continue;
}
// No matching close (unterminated callout): treat the opener as a
// literal line and continue, preserving the original text.
out.push(line);
i++;
continue;
}
out.push(line);
i++;
}
return out.join("\n");
};
return transform(markdown.split("\n"));
}
/**
* Bridge marked's checkbox lists to TipTap task lists.
*
* marked renders GitHub task list items (`- [x] done`) as a plain
* `<ul><li><p><input type="checkbox" checked> text</p></li></ul>` WITHOUT the
* markup TipTap's TaskList/TaskItem extensions parse. This rewrites such lists
* into the shape those extensions expect:
* TaskList parseHTML matches `ul[data-type="taskList"]`,
* TaskItem matches `li[data-type="taskItem"]`,
* the checked state is read from `data-checked === "true"`.
*
* A list is only converted when it has at least one `<li>` and EVERY direct
* `<li>` contains a checkbox input. Both `<ul>` and `<ol>` are considered: a
* numbered checklist (`1. [x] a`, which marked renders as an `<ol>` of checkbox
* `<li>`s) would otherwise lose its task state. TipTap task lists are unordered,
* so a matching `<ol>` is emitted as `data-type="taskList"` exactly like a
* `<ul>`. Mixed or ordinary lists (including ordinary `<ol>` lists) are left
* untouched so they keep rendering as bullet/numbered lists. The marked `<p>`
* wrapper is kept inside the `<li>` because TaskItem content allows paragraphs.
*/
function bridgeTaskLists(html: string): string {
// Cheap early-out: if the markup contains no checkbox input at all there is
// nothing to bridge, so skip the expensive JSDOM parse entirely. This is the
// common case (most pages have no task lists).
if (!/type=["']?checkbox/i.test(html)) {
return html;
}
// Defensive cap (consistent with preprocessCallouts): skip the bridge for
// pathologically large inputs rather than running a second expensive JSDOM
// parse on a multi-megabyte payload. The markup is passed through verbatim.
if (html.length > MAX_CALLOUT_PREPROCESS_BYTES) {
return html;
}
const dom = new JSDOM(html);
const document = dom.window.document;
// Collect the checkbox(es) that belong to THIS <li> directly: either direct
// child <input type="checkbox"> elements or ones inside the <li>'s direct <p>
// child (the shape marked emits: `<li><p><input type="checkbox"> text</p></li>`).
// Checkboxes nested deeper (e.g. inside a child <ul>/<ol>) are excluded so a
// bullet <li> that merely contains a nested task sublist is not misdetected.
// Raw inline HTML can put more than one checkbox in a single <li>; we gather
// ALL of them so none survive into the converted item.
const directCheckboxes = (li: Element): Element[] => {
const found: Element[] = [];
for (const child of Array.from(li.children)) {
if (
child.tagName === "INPUT" &&
child.getAttribute("type") === "checkbox"
) {
found.push(child);
continue;
}
if (child.tagName === "P") {
for (const inp of Array.from(
child.querySelectorAll(":scope > input[type='checkbox']"),
)) {
found.push(inp);
}
}
}
return found;
};
// Both <ul> and <ol> are candidates: an <ol> whose every direct <li> carries
// its own checkbox is a numbered checklist that must also become a taskList.
const lists = Array.from(document.querySelectorAll("ul, ol"));
for (const list of lists) {
// Only consider DIRECT child <li> elements; nested lists are handled by
// their own iteration of the outer loop.
const items = Array.from(list.children).filter(
(child) => child.tagName === "LI",
);
if (items.length === 0) continue;
const itemCheckboxes = items.map((li) => directCheckboxes(li));
// Convert only when every direct <li> carries at least one OWN checkbox.
if (!itemCheckboxes.every((boxes) => boxes.length > 0)) continue;
// A numbered checklist arrives as an <ol>. We must NOT leave the tag as
// <ol> while tagging it data-type="taskList": generateJSON would then match
// BOTH the orderedList rule (tag ol) and the taskList rule (data-type),
// emitting a phantom empty orderedList beside the real taskList. So rename a
// qualifying <ol> to a <ul> — move its <li> children over and replace it —
// leaving only the taskList rule to match. Already-<ul> lists are unchanged.
let target: Element = list;
if (list.tagName === "OL") {
const ul = document.createElement("ul");
// Carry over existing attributes (e.g. class) so nothing is silently lost.
for (const attr of Array.from(list.attributes)) {
ul.setAttribute(attr.name, attr.value);
}
// Move every child node (including the <li>s we collected) into the <ul>.
while (list.firstChild) {
ul.appendChild(list.firstChild);
}
list.replaceWith(ul);
target = ul;
}
target.setAttribute("data-type", "taskList");
items.forEach((li, index) => {
const boxes = itemCheckboxes[index];
// The first checkbox determines the checked state (matches the previous
// single-checkbox behaviour); any extras only need removing.
const input = boxes[0] ?? null;
li.setAttribute("data-type", "taskItem");
const checked =
input != null &&
(input.hasAttribute("checked") || (input as any).checked);
li.setAttribute("data-checked", checked ? "true" : "false");
// Remove ALL direct checkbox inputs so none survive into the content
// (a raw-inline-HTML <li> may carry more than one).
for (const box of boxes) {
box.remove();
}
});
}
return document.body.innerHTML;
}
/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
export async function markdownToProseMirror(
markdownContent: string,
): Promise<any> {
const withCallouts = await preprocessCallouts(markdownContent);
const html = await marked.parse(withCallouts);
const bridged = bridgeTaskLists(html);
return generateJSON(bridged, docmostExtensions);
}
/**
* Build the collaboration WebSocket URL from an API base URL:
* switch http(s)->ws(s), strip a trailing /api, mount on /collab.
* Shared by the live read and the mutate path so both target the same socket.
*/
export function buildCollabWsUrl(baseUrl: string): string {
let wsUrl = baseUrl.replace(/^http/, "ws");
try {
const urlObj = new URL(wsUrl);
if (urlObj.pathname.endsWith("/api") || urlObj.pathname.endsWith("/api/")) {
urlObj.pathname = urlObj.pathname.replace(/\/api\/?$/, "");
}
urlObj.pathname = urlObj.pathname.replace(/\/$/, "") + "/collab";
// Drop any query/hash from the base URL so it is not carried into the
// collaboration ws URL.
urlObj.search = "";
urlObj.hash = "";
wsUrl = urlObj.toString();
} catch (e) {
// Fallback if URL parsing fails
if (!wsUrl.endsWith("/collab")) {
wsUrl = wsUrl.replace(/\/$/, "") + "/collab";
}
}
return wsUrl;
}
/**
* Encode a ProseMirror doc to a Yjs document, sanitizing it first and turning
* the opaque yjs "Unexpected content type" failure into a descriptive error.
*
* `sanitizeForYjs` strips `undefined` node/mark attributes (the common cause of
* the failure); if `toYdoc` still throws, `findUnstorableAttr` is used to point
* at the offending attribute path.
*/
export function buildYDoc(doc: any): Y.Doc {
const safe = sanitizeForYjs(doc);
try {
return TiptapTransformer.toYdoc(safe, "default", docmostExtensions);
} catch (e) {
const bad = findUnstorableAttr(safe);
throw new Error(
`Failed to encode document to Yjs (toYdoc): ${e instanceof Error ? e.message : String(e)}.${bad ? ` Offending attribute: ${bad}.` : " A node/mark attribute likely holds a value Yjs cannot store (e.g. undefined)."}`,
);
}
}
/**
* Validate that a doc is Yjs-encodable by building (and discarding) a Y.Doc.
* Throws the same descriptive error as the apply path when it is not. Used by
* the dry-run preview so it fails identically to apply.
*/
export function assertYjsEncodable(doc: any): void {
buildYDoc(doc);
}
/** Time we wait for the initial handshake/sync before giving up. */
const CONNECT_TIMEOUT_MS = 25000;
/** Time we wait for the server to acknowledge our write before giving up. */
const PERSIST_TIMEOUT_MS = 20000;
/**
* Safely mutate the live content of a page over the collaboration websocket.
*
* This is the single safe write path for every MCP content mutation. It:
* 1. serializes per-page writes through withPageLock (no two MCP writes on
* the same page overlap);
* 2. connects to Hocuspocus and waits for the initial sync so the local ydoc
* mirrors the authoritative server doc — INCLUDING edits/comments/images
* that are not yet in the debounced REST snapshot;
* 3. inside onSynced, SYNCHRONOUSLY reads the live doc, runs `transform`, and
* writes the result back — with no `await` between read and write so no
* remote update can interleave and clobber concurrent human edits;
* 4. waits for the server to acknowledge the write (unsyncedChanges -> 0)
* before resolving, so the next operation observes our change.
*
* `transform` receives the live ProseMirror doc and returns the NEW full
* ProseMirror doc to write, or `null` to abort with no write (a no-op). If
* `transform` throws, the error is propagated to the caller (not swallowed).
*
* Returns the doc that was written, or the live doc when the transform aborted.
*/
export async function mutatePageContent(
pageId: string,
collabToken: string,
baseUrl: string,
transform: (liveDoc: any) => any | null,
): Promise<any> {
return withPageLock(pageId, () => {
if (process.env.DEBUG) {
console.error(`Starting realtime content mutate for page ${pageId}`);
// Token prefix is sensitive; only log it under DEBUG.
console.error(
`Token prefix: ${collabToken ? collabToken.substring(0, 5) : "NONE"}...`,
);
}
const ydoc = new Y.Doc();
const wsUrl = buildCollabWsUrl(baseUrl);
if (process.env.DEBUG) console.error(`Connecting to WebSocket: ${wsUrl}`);
return new Promise<any>((resolve, reject) => {
let provider: HocuspocusProvider | undefined;
let applied = false; // onSynced may fire again on reconnect — apply once.
let settled = false;
// Set true on disconnect/close so a reconnect-driven unsyncedChanges->0
// cannot be mistaken for a successful persist of our write.
let connectionLost = false;
let connectTimer: ReturnType<typeof setTimeout> | undefined;
let persistTimer: ReturnType<typeof setTimeout> | undefined;
let unsyncedHandler: ((data: { number: number }) => void) | undefined;
const cleanup = () => {
if (connectTimer) clearTimeout(connectTimer);
if (persistTimer) clearTimeout(persistTimer);
if (provider) {
if (unsyncedHandler) {
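try {
provider.off("unsyncedChanges", unsyncedHandler);
} catch (err) {
// Best-effort teardown: a failed unsubscribe must not mask the result.
}
}
try {
provider.destroy();
} catch (err) {
// Best-effort teardown: ignore errors from an already-closed provider.
}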
}
};
const finish = (err: Error | null, value?: any) => {
if (settled) return;
settled = true;
cleanup();
if (err) reject(err);
else resolve(value);
};
connectTimer = setTimeout(() => {
finish(new Error("Connection timeout to collaboration server"));
}, CONNECT_TIMEOUT_MS);
// Resolve once the server has acknowledged our update. The provider
// increments unsyncedChanges when our local update is sent and
// decrements it when the server replies with a SyncStatus(applied=true);
// reaching 0 means the authoritative in-memory ydoc on the server now
// contains our write.
const waitForPersistence = () => {
if (settled) return;
// A missing provider is a failure, not a success: without it the write
// can never have been acknowledged. Only an actual unsyncedChanges===0
// on a live provider counts as persisted.
if (!provider) {
finish(new Error("collab provider gone before persistence"));
return;
}
if (provider.unsyncedChanges === 0) {
finish(null, lastWrittenDoc);
return;
}
persistTimer = setTimeout(() => {
finish(
new Error(
"Timeout waiting for collaboration server to persist the update",
),
);
}, PERSIST_TIMEOUT_MS);
unsyncedHandler = (data: { number: number }) => {
// Only treat unsyncedChanges->0 as success when the connection is
// still up. A transient disconnect + reconnect handshake can drive
// the counter back to 0 without our write being re-transmitted; in
// that case let the disconnect/close error win instead.
if (data.number === 0 && !connectionLost) {
finish(null, lastWrittenDoc);
}
};
provider.on("unsyncedChanges", unsyncedHandler);
};
let lastWrittenDoc: any;
provider = new HocuspocusProvider({
url: wsUrl,
name: `page.${pageId}`,
document: ydoc,
token: collabToken,
// @ts-ignore - Required for Node.js environment
WebSocketPolyfill: WebSocket,
onConnect: () => {
if (process.env.DEBUG) console.error("WS Connect");
},
// An unexpected disconnect/close while we are still waiting (during the
// connect-wait before onSynced, or during the persistence wait after the
// write) means the update will never be acknowledged — surface it now
// instead of hanging until the connect/persist timeout fires. `finish`
// is idempotent via the `settled` flag, so the onClose that our own
// cleanup()->provider.destroy() triggers (after settled=true is set) is
// a harmless no-op and cannot cause a double-resolve.
onDisconnect: () => {
if (process.env.DEBUG) console.error("WS Disconnect");
// Mark BEFORE finish so the unsyncedChanges handler (if it races)
// sees the connection as lost and won't report a false success.
connectionLost = true;
finish(
new Error(
"Collaboration connection closed before the update was persisted/synced",
),
);
},
onClose: () => {
if (process.env.DEBUG) console.error("WS Close");
// Mark BEFORE finish so the unsyncedChanges handler (if it races)
// sees the connection as lost and won't report a false success.
connectionLost = true;
finish(
new Error(
"Collaboration connection closed before the update was persisted/synced",
),
);
},
onSynced: () => {
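if (applied || settled) return;
applied = true;
// The handshake completed: stop the connect timer so it cannot fire
// during the persistence wait and misreport a connection timeout.
if (connectTimer) clearTimeout(connectTimer);
if (process.env.DEBUG) console.error("Connected and synced!");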
// CRITICAL: everything between reading the live doc and writing it
// back must stay synchronous (no await). While the JS event loop is
// not yielded, no incoming remote update can interleave, so any
// already-synced concurrent edits are preserved in liveDoc.
let newDoc: any;
try {
let liveDoc = TiptapTransformer.fromYdoc(ydoc, "default");
if (
!liveDoc ||
typeof liveDoc !== "object" ||
!Array.isArray(liveDoc.content)
) {
liveDoc = { type: "doc", content: [] };
}
newDoc = transform(liveDoc);
if (newDoc == null) {
// Transform aborted — write nothing, return the live doc.
lastWrittenDoc = liveDoc;
finish(null, liveDoc);
return;
}
const tempDoc = buildYDoc(newDoc);
// Fetch the fragment immediately before the transact that mutates
// it, rather than reusing a handle grabbed across the transform.
const fragment = ydoc.getXmlFragment("default");
ydoc.transact(() => {
if (fragment.length > 0) {
fragment.delete(0, fragment.length);
}
Y.applyUpdate(ydoc, Y.encodeStateAsUpdate(tempDoc));
});
} catch (e) {
// Includes errors thrown by transform (e.g. "afterText not found",
// "text not found"): propagate them verbatim to the caller.
finish(e instanceof Error ? e : new Error(String(e)));
return;
}
lastWrittenDoc = newDoc;
if (process.env.DEBUG)
console.error("Content written, waiting for server to persist...");
waitForPersistence();
},
onAuthenticationFailed: () => {
finish(
new Error("Authentication failed for collaboration connection"),
);
},
});
});
});
}
/**
* Replace the live content of a page over the collaboration websocket.
* Accepts a ready ProseMirror JSON document; the caller controls whether
* it was produced from markdown (ids regenerate) or edited in place
* (existing block ids preserved).
*
* This is an intentional full replace (used by update_page / update_page_json).
* It runs under the per-page lock and waits for server persistence via
* mutatePageContent.
*/
export async function replacePageContent(
pageId: string,
prosemirrorDoc: any,
collabToken: string,
baseUrl: string,
): Promise<void> {
// Fail fast on a bad document instead of deferring the failure into the
// collaboration write (where TiptapTransformer.toYdoc(undefined) used to
// throw). The transform must return a valid ProseMirror doc.
if (
prosemirrorDoc == null ||
typeof prosemirrorDoc !== "object" ||
prosemirrorDoc.type !== "doc"
) {
throw new Error("replacePageContent: invalid ProseMirror document");
}
await mutatePageContent(pageId, collabToken, baseUrl, () => prosemirrorDoc);
}
/**
* Markdown update path (kept for backwards compatibility).
* NOTE: this re-imports the whole document — block ids are regenerated.
* Tables and :::callout::: blocks survive thanks to the full schema.
*/
export async function updatePageContentRealtime(
pageId: string,
markdownContent: string,
collabToken: string,
baseUrl: string,
): Promise<void> {
const tiptapJson = await markdownToProseMirror(markdownContent);
await mutatePageContent(pageId, collabToken, baseUrl, () => tiptapJson);
}
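
The `finish` helper in `mutatePageContent` depends on setting `settled` before cleanup runs, so the `onClose` fired by `provider.destroy()` cannot settle the promise a second time. Below is a minimal sketch of that settle-once pattern in isolation. `createSettler` and its callback-based shape are illustrative assumptions, not part of the module:

```typescript
// Hypothetical helper (not in the module): a settle-once guard modeled on `finish`.
type Settled<T> = { ok: true; value: T } | { ok: false; error: Error };

function createSettler<T>(cleanup: () => void) {
  let settled = false;
  let result: Settled<T> | undefined;

  const finish = (err: Error | null, value?: T): void => {
    if (settled) return; // every call after the first is a no-op
    settled = true; // flip BEFORE cleanup so re-entrant calls are ignored
    cleanup();
    result = err ? { ok: false, error: err } : { ok: true, value: value as T };
  };

  return { finish, getResult: (): Settled<T> | undefined => result };
}

let cleanupCalls = 0;
const settler = createSettler<string>(() => {
  cleanupCalls++;
  // Simulates the onClose that provider.destroy() fires during cleanup.
  settler.finish(new Error("connection closed"));
});

settler.finish(null, "written"); // first outcome wins
settler.finish(new Error("late timeout")); // ignored
console.log(settler.getResult(), cleanupCalls);
```

If the flag were set after `cleanup()`, the simulated close would win and the successful write would be reported as a failure.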


@@ -0,0 +1,319 @@
/**
* Headless, Docmost-equivalent document diff.
*
* Docmost's history editor computes a change set with the exact pipeline below
* (recreateTransform -> ChangeSet.addSteps -> simplifyChanges) and renders it as
* editor decorations. This module runs the SAME computation but serializes the
* result to text + integrity counts instead of decorations, so a diff can be
* previewed without a browser.
*
* recreateTransform here comes from @fellow/prosemirror-recreate-transform, the
* maintained published fork of the MIT prosemirror-recreate-steps source that
* Docmost vendors in @docmost/editor-ext; it exposes the identical
* recreateTransform(fromDoc, toDoc, { complexSteps, wordDiffs, simplifyDiff })
* signature.
*
* If recreateTransform / the changeset throws on a pathological document pair,
* we fall back to a coarse block-level text diff so the tool never hard-fails.
*/
import { getSchema } from "@tiptap/core";
import { Node } from "@tiptap/pm/model";
import { ChangeSet, simplifyChanges } from "@tiptap/pm/changeset";
import { recreateTransform } from "@fellow/prosemirror-recreate-transform";
import { docmostExtensions } from "./docmost-schema.js";
/** A single inserted/deleted change with its containing-block context. */
export interface DiffChange {
op: "insert" | "delete";
/** Lead (plain) text of the block that contains the change, for context. */
block: string;
/** The inserted or deleted text. */
text: string;
}
/** Integrity counts as [old, new] tuples; footnoteMarkers as [oldList, newList]. */
export interface DiffIntegrity {
images: [number, number];
links: [number, number];
tables: [number, number];
callouts: [number, number];
footnoteMarkers: [number[], number[]];
}
export interface DiffResult {
summary: { inserted: number; deleted: number; blocksChanged: number };
integrity: DiffIntegrity;
changes: DiffChange[];
/** Human-readable unified-ish summary. */
markdown: string;
}
/** Build the schema once; it is pure and reused across calls. */
const schema = getSchema(docmostExtensions);
/** Recursively concatenate the plain text of a JSON node. */
function plainText(node: any): string {
if (!node || typeof node !== "object") return "";
let out = "";
if (typeof node.text === "string") out += node.text;
if (Array.isArray(node.content)) {
for (const child of node.content) out += plainText(child);
}
return out;
}
/** Count nodes in a JSON doc that satisfy `pred` (recursive). */
function countNodes(doc: any, pred: (node: any) => boolean): number {
let n = 0;
const visit = (node: any): void => {
if (!node || typeof node !== "object") return;
if (pred(node)) n++;
if (Array.isArray(node.content)) for (const c of node.content) visit(c);
};
visit(doc);
return n;
}
/**
* Count UNIQUE links in a JSON doc by their `href`. A single link can be split
* across several adjacent text runs (e.g. a "link+bold" run followed by a "link"
* run); counting link-bearing runs would over-count it. Walking the tree and
* collecting hrefs into a Set counts each distinct href once, so two separate
* links to the same URL also count as one. Link marks with a missing/empty href
* are bucketed under a single "" key so a malformed link is still counted once.
*/
function countUniqueLinks(doc: any): number {
const hrefs = new Set<string>();
const visit = (node: any): void => {
if (!node || typeof node !== "object") return;
if (node.type === "text" && Array.isArray(node.marks)) {
for (const m of node.marks) {
if (m && m.type === "link") {
const href = m.attrs && typeof m.attrs.href === "string" ? m.attrs.href : "";
hrefs.add(href);
}
}
}
if (Array.isArray(node.content)) for (const c of node.content) visit(c);
};
visit(doc);
return hrefs.size;
}
/**
* Parse the ordered list of integers from `[N]` footnote markers found in the
* BODY only: every top-level block before the first heading whose trimmed text
* equals `notesHeading` (the whole doc if there is no such heading). Returned
* in reading order.
*/
function footnoteMarkers(doc: any, notesHeading: string): number[] {
const top: any[] = Array.isArray(doc?.content) ? doc.content : [];
const notesIdx = top.findIndex(
(n) =>
n &&
n.type === "heading" &&
plainText(n).trim() === notesHeading,
);
const bodyBlocks = notesIdx >= 0 ? top.slice(0, notesIdx) : top;
const markers: number[] = [];
const re = /\[(\d+)\]/g;
for (const block of bodyBlocks) {
const text = plainText(block);
let m: RegExpExecArray | null;
re.lastIndex = 0;
while ((m = re.exec(text)) !== null) {
markers.push(Number(m[1]));
}
}
return markers;
}
/** Compute the [old,new] integrity tuples for two JSON docs. */
function computeIntegrity(
oldDoc: any,
newDoc: any,
notesHeading: string,
): DiffIntegrity {
const images: [number, number] = [
countNodes(oldDoc, (n) => n.type === "image"),
countNodes(newDoc, (n) => n.type === "image"),
];
const links: [number, number] = [
countUniqueLinks(oldDoc),
countUniqueLinks(newDoc),
];
const tables: [number, number] = [
countNodes(oldDoc, (n) => n.type === "table"),
countNodes(newDoc, (n) => n.type === "table"),
];
const callouts: [number, number] = [
countNodes(oldDoc, (n) => n.type === "callout"),
countNodes(newDoc, (n) => n.type === "callout"),
];
const fns: [number[], number[]] = [
footnoteMarkers(oldDoc, notesHeading),
footnoteMarkers(newDoc, notesHeading),
];
return { images, links, tables, callouts, footnoteMarkers: fns };
}
/**
* Resolve the lead text (at most 80 characters) of the top-level block in a
* ProseMirror Node that contains the given document position. Out-of-range
* positions are clamped into the document; returns "" if resolution throws.
*/
function blockContextAt(node: Node, pos: number): string {
try {
const clamped = Math.max(0, Math.min(pos, node.content.size));
const $pos = node.resolve(clamped);
// depth 1 is the top-level block in a doc node.
const block = $pos.depth >= 1 ? $pos.node(1) : $pos.node(0);
const text = block.textContent || "";
return text.length > 80 ? text.slice(0, 77) + "..." : text;
} catch {
return "";
}
}
/** Truncate a string for the markdown summary. */
function truncate(s: string, n = 120): string {
return s.length > n ? s.slice(0, n - 3) + "..." : s;
}
/**
* Coarse fallback: a block-by-block plain-text diff. Used only when the precise
* changeset pipeline throws, so the tool degrades gracefully instead of failing.
*/
function coarseDiff(oldDoc: any, newDoc: any): DiffChange[] {
const oldBlocks: any[] = Array.isArray(oldDoc?.content) ? oldDoc.content : [];
const newBlocks: any[] = Array.isArray(newDoc?.content) ? newDoc.content : [];
const oldTexts = oldBlocks.map(plainText);
const newTexts = newBlocks.map(plainText);
const oldSet = new Set(oldTexts);
const newSet = new Set(newTexts);
const changes: DiffChange[] = [];
for (const t of oldTexts) {
if (!newSet.has(t) && t.trim() !== "") {
changes.push({ op: "delete", block: truncate(t, 80), text: t });
}
}
for (const t of newTexts) {
if (!oldSet.has(t) && t.trim() !== "") {
changes.push({ op: "insert", block: truncate(t, 80), text: t });
}
}
return changes;
}
/** Build the human-readable unified-ish markdown summary. */
function renderMarkdown(
result: Omit<DiffResult, "markdown">,
fellBack: boolean,
): string {
const lines: string[] = [];
const { summary, integrity, changes } = result;
lines.push(
`# Diff: ${summary.inserted} inserted / ${summary.deleted} deleted (${summary.blocksChanged} blocks changed)`,
);
if (fellBack) {
lines.push("");
lines.push("> note: precise diff failed; coarse block-level diff shown.");
}
lines.push("");
lines.push("## Integrity (old -> new)");
lines.push(`- images: ${integrity.images[0]} -> ${integrity.images[1]}`);
lines.push(`- links: ${integrity.links[0]} -> ${integrity.links[1]}`);
lines.push(`- tables: ${integrity.tables[0]} -> ${integrity.tables[1]}`);
lines.push(`- callouts: ${integrity.callouts[0]} -> ${integrity.callouts[1]}`);
lines.push(
`- footnoteMarkers: [${integrity.footnoteMarkers[0].join(", ")}] -> [${integrity.footnoteMarkers[1].join(", ")}]`,
);
lines.push("");
lines.push("## Changes");
if (changes.length === 0) {
lines.push("(no textual changes)");
} else {
for (const c of changes) {
const sign = c.op === "insert" ? "+" : "-";
const ctx = c.block ? ` @ ${truncate(c.block, 60)}` : "";
lines.push(`${sign} ${truncate(c.text)}${ctx}`);
}
}
return lines.join("\n");
}
/**
* Diff two ProseMirror JSON documents the way Docmost's history editor does and
* serialize the result to text + integrity counts.
*
* @param oldDocJson the earlier document
* @param newDocJson the later document
* @param notesHeading exact text of the heading that separates body from notes
*   for footnote counting (default: "Примечания переводчика", Russian for
*   "Translator's notes")
*/
export function diffDocs(
oldDocJson: any,
newDocJson: any,
notesHeading: string = "Примечания переводчика",
): DiffResult {
const integrity = computeIntegrity(oldDocJson, newDocJson, notesHeading);
let changes: DiffChange[] = [];
let inserted = 0;
let deleted = 0;
let fellBack = false;
const changedBlocks = new Set<string>();
try {
const oldNode = Node.fromJSON(schema, oldDocJson);
const newNode = Node.fromJSON(schema, newDocJson);
const tr = recreateTransform(oldNode, newNode, {
complexSteps: false,
wordDiffs: true,
simplifyDiff: true,
});
const changeSet = ChangeSet.create(oldNode).addSteps(
tr.doc,
tr.mapping.maps,
[],
);
const simplified = simplifyChanges(changeSet.changes, newNode);
for (const change of simplified) {
// Deleted text lives in the OLD doc coordinate range [fromA, toA).
if (change.toA > change.fromA) {
const text = oldNode.textBetween(change.fromA, change.toA, "\n", " ");
if (text.length > 0) {
deleted += text.length;
const block = blockContextAt(oldNode, change.fromA);
changes.push({ op: "delete", block, text });
if (block) changedBlocks.add("d:" + block);
}
}
// Inserted text lives in the NEW doc coordinate range [fromB, toB).
if (change.toB > change.fromB) {
const text = newNode.textBetween(change.fromB, change.toB, "\n", " ");
if (text.length > 0) {
inserted += text.length;
const block = blockContextAt(newNode, change.fromB);
changes.push({ op: "insert", block, text });
if (block) changedBlocks.add("i:" + block);
}
}
}
} catch {
// Pathological pair: degrade to a coarse block-level diff so we never throw.
fellBack = true;
changes = coarseDiff(oldDocJson, newDocJson);
for (const c of changes) {
if (c.op === "insert") inserted += c.text.length;
else deleted += c.text.length;
if (c.block) changedBlocks.add(c.op[0] + ":" + c.block);
}
}
const partial: Omit<DiffResult, "markdown"> = {
summary: { inserted, deleted, blocksChanged: changedBlocks.size },
integrity,
changes,
};
return { ...partial, markdown: renderMarkdown(partial, fellBack) };
}
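
One way a caller might use the `integrity` tuples returned by `diffDocs` is to flag edits that silently dropped structure. This is a minimal sketch, assuming a hypothetical `integrityWarnings` helper that is not part of this module. The `DiffIntegrity` shape is restated so the block is self-contained:

```typescript
// Shape restated from DiffIntegrity above so the sketch is self-contained.
interface DiffIntegrity {
  images: [number, number];
  links: [number, number];
  tables: [number, number];
  callouts: [number, number];
  footnoteMarkers: [number[], number[]];
}

// Hypothetical helper: report every count that dropped and any change in the
// ordered footnote marker list.
function integrityWarnings(integrity: DiffIntegrity): string[] {
  const warnings: string[] = [];
  const counted: Array<[string, [number, number]]> = [
    ["images", integrity.images],
    ["links", integrity.links],
    ["tables", integrity.tables],
    ["callouts", integrity.callouts],
  ];
  for (const [name, [before, after]] of counted) {
    if (after < before) warnings.push(`${name}: ${before} -> ${after}`);
  }
  const [oldMarkers, newMarkers] = integrity.footnoteMarkers;
  if (oldMarkers.join(",") !== newMarkers.join(",")) {
    warnings.push(
      `footnoteMarkers: [${oldMarkers.join(", ")}] -> [${newMarkers.join(", ")}]`,
    );
  }
  return warnings;
}

const sampleWarnings = integrityWarnings({
  images: [3, 2],
  links: [5, 5],
  tables: [1, 1],
  callouts: [0, 1],
  footnoteMarkers: [[1, 2, 3], [1, 3]],
});
console.log(sampleWarnings.join("\n"));
```

Only decreases are reported for the counted node types, on the assumption that added images or callouts are intentional. Footnote markers are compared as ordered lists, so a renumbering is reported as well as a removal.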

File diff suppressed because it is too large


@@ -0,0 +1,93 @@
/**
* Filter functions to extract only relevant information from API responses
* for better agent consumption
*/
export function filterWorkspace(data: any) {
return {
id: data.id,
name: data.name,
description: data.description,
defaultSpaceId: data.defaultSpaceId,
createdAt: data.createdAt,
updatedAt: data.updatedAt,
deletedAt: data.deletedAt,
};
}
export function filterSpace(space: any) {
return {
id: space.id,
name: space.name,
description: space.description,
slug: space.slug,
visibility: space.visibility,
createdAt: space.createdAt,
updatedAt: space.updatedAt,
deletedAt: space.deletedAt,
};
}
export function filterGroup(group: any) {
return {
id: group.id,
name: group.name,
description: group.description,
workspaceId: group.workspaceId,
createdAt: group.createdAt,
updatedAt: group.updatedAt,
deletedAt: group.deletedAt,
};
}
export function filterPage(page: any, content?: string, subpages?: any[]) {
return {
id: page.id,
slugId: page.slugId,
title: page.title,
parentPageId: page.parentPageId,
spaceId: page.spaceId,
isLocked: page.isLocked,
createdAt: page.createdAt,
updatedAt: page.updatedAt,
deletedAt: page.deletedAt,
// Include converted markdown content if valid string (even empty)
...(typeof content === "string" && { content }),
// Include subpages if provided
...(subpages &&
subpages.length > 0 && {
subpages: subpages.map((p) => ({ id: p.id, title: p.title })),
}),
};
}
export function filterComment(comment: any, markdownContent?: string) {
return {
id: comment.id,
pageId: comment.pageId,
content: markdownContent ?? comment.content,
selection: comment.selection || null,
type: comment.type || "page",
parentCommentId: comment.parentCommentId || null,
creatorId: comment.creatorId,
creatorName: comment.creator?.name || null,
createdAt: comment.createdAt,
editedAt: comment.editedAt || null,
resolvedAt: comment.resolvedAt || null,
resolvedById: comment.resolvedById || null,
};
}
export function filterSearchResult(result: any) {
return {
id: result.id,
title: result.title,
parentPageId: result.parentPageId,
createdAt: result.createdAt,
updatedAt: result.updatedAt,
rank: result.rank,
highlight: result.highlight,
spaceId: result.space?.id,
spaceName: result.space?.name,
};
}
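
The optional fields in `filterPage` use a conditional spread: a key appears only when its guard is truthy. The sketch below reduces it to the two optional fields. `pickPage` and `PageRef` are hypothetical names introduced only for illustration:

```typescript
interface PageRef {
  id: string;
  title: string;
}

// Hypothetical reduced version of filterPage, keeping only the optional fields.
function pickPage(page: PageRef, content?: string, subpages?: PageRef[]) {
  return {
    id: page.id,
    title: page.title,
    // Spreading `false` adds nothing, so `content` appears for any string,
    // including the empty string.
    ...(typeof content === "string" && { content }),
    // An empty or missing subpage list leaves the key out entirely.
    ...(subpages &&
      subpages.length > 0 && {
        subpages: subpages.map((p) => ({ id: p.id, title: p.title })),
      }),
  };
}

const emptyBody = pickPage({ id: "p1", title: "Draft" }, "", []);
const noBody = pickPage({ id: "p2", title: "Stub" });
console.log(Object.keys(emptyBody).join(","));
console.log(Object.keys(noBody).join(","));
```

The `typeof` guard rather than a truthiness check is what lets an empty page body be reported as `content: ""` instead of being dropped.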


@@ -0,0 +1,127 @@
/**
* Surgical text edits on a ProseMirror document without re-importing it.
*
* Each edit replaces an exact substring inside individual text nodes,
* preserving every node id, mark and attribute around it. This is the
* safe alternative to a full markdown re-import for small wording fixes.
*/
export interface TextEdit {
find: string;
replace: string;
/** Replace every occurrence; otherwise the edit must match exactly once. */
replaceAll?: boolean;
}
export interface TextEditResult {
find: string;
replacements: number;
}
/** Collect plain text of the whole document (for span-detection hints). */
function collectText(node: any): string {
let out = "";
if (node.type === "text") out += node.text || "";
for (const child of node.content || []) out += collectText(child);
return out;
}
function countOccurrences(haystack: string, needle: string): number {
if (!needle) return 0;
let count = 0;
let idx = haystack.indexOf(needle);
while (idx !== -1) {
count++;
idx = haystack.indexOf(needle, idx + needle.length);
}
return count;
}
/**
* Apply text edits to a ProseMirror doc (mutates a deep copy, returns it).
* Throws a descriptive error when an edit matches zero times or matches
* multiple times without replaceAll — so the caller can refine `find`.
*/
export function applyTextEdits(
doc: any,
edits: TextEdit[],
): { doc: any; results: TextEditResult[] } {
const copy = JSON.parse(JSON.stringify(doc));
const results: TextEditResult[] = [];
for (const edit of edits) {
if (!edit.find) throw new Error("edit.find must be a non-empty string");
// Count matches inside individual text nodes first.
let nodeMatches = 0;
(function count(node: any) {
if (node.type === "text" && node.text) {
nodeMatches += countOccurrences(node.text, edit.find);
}
for (const child of node.content || []) count(child);
})(copy);
if (nodeMatches === 0) {
// Distinguish "text not present" from "text spans formatting runs".
const fullText = collectText(copy);
if (fullText.includes(edit.find)) {
throw new Error(
`Edit "${truncate(edit.find)}": the text exists in the document but spans ` +
`multiple formatting runs (bold/link/italic boundaries). Use a shorter ` +
`fragment that stays inside one run, or use update_page_json for ` +
`structural changes.`,
);
}
throw new Error(
`Edit "${truncate(edit.find)}": text not found in the document.`,
);
}
if (nodeMatches > 1 && !edit.replaceAll) {
throw new Error(
`Edit "${truncate(edit.find)}": matches ${nodeMatches} times. ` +
`Provide a longer, unique fragment or set replaceAll: true.`,
);
}
// Perform the replacement(s).
let done = 0;
(function replace(node: any) {
if (node.type === "text" && node.text && node.text.includes(edit.find)) {
if (edit.replaceAll) {
done += countOccurrences(node.text, edit.find);
node.text = node.text.split(edit.find).join(edit.replace);
} else if (done === 0) {
// Avoid String.replace: its second arg treats $&, $1, $`, $', $$ as
// special patterns, expanding them instead of inserting literally.
// Splice the first occurrence by index to keep the replacement literal.
const idx = node.text.indexOf(edit.find);
node.text =
node.text.slice(0, idx) +
edit.replace +
node.text.slice(idx + edit.find.length);
done = 1;
}
}
for (const child of node.content || []) replace(child);
})(copy);
results.push({ find: edit.find, replacements: done });
}
// Drop text nodes that became empty (ProseMirror forbids empty text nodes).
(function prune(node: any) {
if (Array.isArray(node.content)) {
node.content = node.content.filter(
(child: any) => !(child.type === "text" && child.text === ""),
);
for (const child of node.content) prune(child);
}
})(copy);
return { doc: copy, results };
}
function truncate(s: string): string {
return s.length > 60 ? s.slice(0, 57) + "..." : s;
}
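
The single-replacement branch of `applyTextEdits` splices by index instead of calling `String.prototype.replace`, because `replace` interprets `$`-patterns in its replacement argument. A short sketch of the difference follows; `replaceFirstLiteral` is a hypothetical standalone helper, not a function in this module:

```typescript
// Hypothetical helper mirroring the index splice used for single replacements.
function replaceFirstLiteral(
  text: string,
  find: string,
  replacement: string,
): string {
  const idx = text.indexOf(find);
  if (idx === -1) return text;
  return text.slice(0, idx) + replacement + text.slice(idx + find.length);
}

const source = "pay AMOUNT now";
// String.replace expands "$&" to the matched substring.
const viaReplace = source.replace("AMOUNT", "$&5");
// The splice inserts the replacement exactly as written.
const viaSplice = replaceFirstLiteral(source, "AMOUNT", "$&5");
console.log(viaReplace);
console.log(viaSplice);
```

The `replaceAll` branch gets the same literal behavior from `split(find).join(replace)`, which never interprets the replacement string.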


@@ -0,0 +1,861 @@
/**
* Convert ProseMirror/TipTap JSON content to Markdown
* Supports all Docmost-specific node types and extensions
*/
export function convertProseMirrorToMarkdown(content: any): string {
if (!content || !content.content) return "";
// Escape a value interpolated into an HTML double-quoted attribute value
// (textAlign, colors, image src, math `text`, all data-* attrs, etc.). The
// value is always wrapped in double quotes, so in the ATTRIBUTE context only
// the delimiting quote and the ampersand that starts an entity are special:
// we escape ONLY & and ". We deliberately do NOT escape <, > or ': the HTML
// re-parser (parse5/jsdom via @tiptap/html) does NOT decode &lt;/&gt;/&#39;
// back inside attribute values, so escaping them would corrupt the stored
// data (e.g. a math node's LaTeX `a < b`) and ACCUMULATE escapes on every
// round-trip (`a < b` -> `a &lt; b` -> `a &amp;lt; b`). Escaping & and "
// keeps the value inert against attribute-injection while staying idempotent
// (parse5 decodes them back).
const escapeAttr = (value: unknown): string =>
String(value)
.replace(/&/g, "&amp;")
.replace(/"/g, "&quot;");
// Escape a value placed as HTML element TEXT content (between tags), where
// <, >, and & are all significant. Used for text rendered inside raw-HTML
// blocks (table cells / columns) so stored characters cannot inject markup.
const escapeHtmlText = (value: unknown): string =>
String(value)
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;");
// Percent-encode characters that would break out of a markdown URL target
// (...) — whitespace/newlines and parentheses — so a stored src stays a
// single inert token (used for image/video/youtube srcs).
const encodeMdUrl = (value: unknown): string =>
String(value || "")
.replace(/\s/g, (c: string) => (c === " " ? "%20" : encodeURIComponent(c)))
.replace(/\(/g, "%28")
.replace(/\)/g, "%29");
const processNode = (node: any): string => {
const type = node.type;
const nodeContent = node.content || [];
switch (type) {
case "doc":
return nodeContent.map(processNode).join("\n\n");
case "paragraph":
const text = nodeContent.map(processNode).join("");
const align = node.attrs?.textAlign;
if (align && align !== "left") {
return `<div align="${escapeAttr(align)}">${text}</div>`;
}
return text || "";
case "heading":
const level = node.attrs?.level || 1;
const headingText = nodeContent.map(processNode).join("");
return "#".repeat(level) + " " + headingText;
case "text":
let textContent = node.text || "";
// Apply marks (bold, italic, code, etc.)
if (node.marks) {
// Markdown code spans (`...`) cannot carry inner formatting, so when a
// run has the `code` mark alongside ANY other mark, backtick syntax
// would leak literal ** / []() into the code text. In that case emit
// nested HTML (<code> innermost, the other marks wrapping it as HTML)
// so the output is at least well-formed and re-parseable.
//
// NOTE: this does NOT round-trip both marks. The schema's `code` mark
// has `excludes: "_"` (it excludes every other mark), so on import the
// co-occurring mark is always dropped — the run comes back as `code`
// only. We keep the emission simple and accept that the other mark is
// lost; preserving both is impossible while `code` excludes them.
// Only use the backtick form when `code` is the sole mark.
const markTypes = node.marks.map((m: any) => m.type);
const hasCode = markTypes.includes("code");
const codeCombined = hasCode && markTypes.length > 1;
for (const mark of node.marks) {
switch (mark.type) {
case "bold":
textContent = codeCombined
? `<strong>${textContent}</strong>`
: `**${textContent}**`;
break;
case "italic":
textContent = codeCombined
? `<em>${textContent}</em>`
: `*${textContent}*`;
break;
case "code":
// When combined with another mark, wrap as <code> so the
// surrounding HTML marks can nest around it; otherwise use the
// plain backtick span.
textContent = codeCombined
? `<code>${textContent}</code>`
: `\`${textContent}\``;
break;
case "link": {
const href = mark.attrs?.href || "";
const title = mark.attrs?.title;
if (codeCombined) {
// Emit an HTML anchor so it can wrap the nested <code>.
const safeHref = escapeAttr(href);
if (title) {
textContent = `<a href="${safeHref}" title="${escapeAttr(String(title))}">${textContent}</a>`;
} else {
textContent = `<a href="${safeHref}">${textContent}</a>`;
}
} else if (title) {
// Emit the optional markdown link title; escape an embedded
// double-quote so it cannot terminate the title string early.
const safeTitle = String(title).replace(/"/g, '\\"');
textContent = `[${textContent}](${href} "${safeTitle}")`;
} else {
textContent = `[${textContent}](${href})`;
}
break;
}
case "strike":
textContent = codeCombined
? `<s>${textContent}</s>`
: `~~${textContent}~~`;
break;
case "underline":
textContent = `<u>${textContent}</u>`;
break;
case "subscript":
textContent = `<sub>${textContent}</sub>`;
break;
case "superscript":
textContent = `<sup>${textContent}</sup>`;
break;
case "highlight": {
// Preserve a null/empty color as a plain highlight (a bare
// <mark> with no background-color); only emit the style when a
// color is actually set, so a plain highlight is not forced to
// yellow on export.
const color = mark.attrs?.color;
textContent = color
? `<mark style="background-color: ${escapeAttr(color)}">${textContent}</mark>`
: `<mark>${textContent}</mark>`;
break;
}
case "textStyle":
if (mark.attrs?.color) {
textContent = `<span style="color: ${escapeAttr(mark.attrs.color)}">${textContent}</span>`;
}
break;
case "comment": {
// Emit the inline comment anchor so highlights round-trip. The
// schema's Comment mark parses span[data-comment-id] (attrs
// commentId/resolved).
const cid = mark.attrs?.commentId;
if (cid) {
const resolvedAttr = mark.attrs?.resolved
? ` data-resolved="true"`
: "";
textContent = `<span data-comment-id="${escapeAttr(cid)}"${resolvedAttr}>${textContent}</span>`;
}
break;
}
}
}
}
return textContent;
case "codeBlock":
const language = node.attrs?.language || "";
// Strip ALL trailing newlines so the export is idempotent: marked
// re-adds exactly one trailing "\n" on import, so trimming only one
// here would let the text grow by "\n" on each round-trip. Removing
// every trailing newline makes repeated cycles stable.
const code = nodeContent
.map(processNode)
.join("")
.replace(/\n+$/, "");
return "```" + language + "\n" + code + "\n```";
case "bulletList":
return nodeContent
.map((item: any) => processListItem(item, "-"))
.join("\n");
case "orderedList":
return nodeContent
.map((item: any, index: number) =>
processListItem(item, `${index + 1}.`),
)
.join("\n");
case "taskList":
return nodeContent.map((item: any) => processTaskItem(item)).join("\n");
case "taskItem":
// Delegate to the same helper used by taskList so multi-block and
// nested task items render and indent consistently.
return processTaskItem(node);
case "listItem":
return nodeContent.map(processNode).join("\n");
case "blockquote":
// Prefix EVERY line of EVERY child with "> " and separate block-level
// children with a blank ">" line so code blocks / multi-paragraph
// quotes round-trip correctly.
return nodeContent
.map((n: any) =>
processNode(n)
.split("\n")
.map((line: string) => (line.length ? `> ${line}` : ">"))
.join("\n"),
)
.join("\n>\n");
case "horizontalRule":
return "---";
case "hardBreak":
// Two trailing spaces before the newline encode a markdown hard break;
// a bare "\n" would be reimported as a soft break and lost.
return " \n";
case "image":
const imgAlt = node.attrs?.alt || "";
// Neutralize characters that could break out of the markdown image
// URL: spaces/newlines and parentheses would terminate the (...) target
// and let a stored src inject following markdown/HTML. Percent-encode
// them so the URL stays a single inert token.
const imgSrc = encodeMdUrl(node.attrs?.src);
// No "caption" attribute exists in the Docmost image schema, so we do
// not emit one (the previous caption branch was dead).
return `![${imgAlt}](${imgSrc})`;
case "video": {
// Emit the schema-matching <video> element so generateJSON rebuilds the
// node with its attrs intact. The schema's parseHTML reads src/aria-label
// from the standard attributes and the remaining attrs from data-*.
const attrs = node.attrs || {};
const parts: string[] = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt) parts.push(`aria-label="${escapeAttr(attrs.alt)}"`);
if (attrs.attachmentId)
parts.push(
`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`,
);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
// Wrap in a block <div> so marked treats it as a block (a bare <video>
// is inline-level HTML and marked wraps it in <p>, leaving a spurious
// empty paragraph beside the hoisted block atom). The wrapper has no
// data-type, so the schema parser ignores it and just hoists the video.
return `<div><video ${parts.join(" ")}></video></div>`;
}
case "youtube": {
// Emit the schema-matching div[data-type="youtube"]; the schema reads
// src from data-src and width/height/align from data-* attributes.
const attrs = node.attrs || {};
const parts: string[] = [
`data-type="youtube"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "table": {
// A GFM pipe table cannot represent merged cells. If ANY cell carries
// colspan>1 or rowspan>1, a pipe table would corrupt the grid on
// re-import, so emit the WHOLE table as raw HTML <table> instead: the
// schema's table family parseHTML (tag table/tr/td/th, with colspan/
// rowspan read from the same-named HTML attrs and align via parseHTML)
// round-trips it faithfully. Otherwise keep the lighter GFM pipe table.
const tableRows: any[] = nodeContent;
if (tableRows.length === 0) return "";
const hasSpan = tableRows.some((row: any) =>
(row.content || []).some(
(cell: any) =>
(cell.attrs?.colspan ?? 1) > 1 || (cell.attrs?.rowspan ?? 1) > 1,
),
);
if (hasSpan) {
// Render each cell's block children to HTML (marked does NOT parse
// markdown inside a raw HTML block, so emitting markdown here would
// leak literal ** / `` into the cell). blockToHtml mirrors the schema
// HTML so inner formatting re-parses into the right marks/nodes.
const renderHtmlCell = (cell: any): string => {
const tag = cell.type === "tableHeader" ? "th" : "td";
const a = cell.attrs || {};
const cellParts: string[] = [];
if ((a.colspan ?? 1) > 1)
cellParts.push(`colspan="${escapeAttr(a.colspan)}"`);
if ((a.rowspan ?? 1) > 1)
cellParts.push(`rowspan="${escapeAttr(a.rowspan)}"`);
if (a.align) cellParts.push(`align="${escapeAttr(a.align)}"`);
const open = cellParts.length
? `<${tag} ${cellParts.join(" ")}>`
: `<${tag}>`;
const inner = (cell.content || [])
.map((block: any) => blockToHtml(block))
.join("");
return `${open}${inner}</${tag}>`;
};
const htmlRows = tableRows
.map(
(row: any) =>
`<tr>${(row.content || []).map(renderHtmlCell).join("")}</tr>`,
)
.join("");
return `<table><tbody>${htmlRows}</tbody></table>`;
}
// No merged cells: emit a GFM table (header row + separator) so the
// markdown can be parsed back into a table on re-import.
const rows = tableRows.map(processNode);
const headerCells = tableRows[0]?.content || [];
const columns = headerCells.length || 1;
// Derive alignment markers (:--, :-:, --:) from each header cell.
const markers = Array.from({ length: columns }, (_, i) => {
const align = headerCells[i]?.attrs?.align;
switch (align) {
case "left":
return ":--";
case "center":
return ":-:";
case "right":
return "--:";
default:
return "---";
}
});
const separator = "| " + markers.join(" | ") + " |";
return [rows[0], separator, ...rows.slice(1)].join("\n");
}
case "tableRow":
return "| " + nodeContent.map(processNode).join(" | ") + " |";
case "tableCell":
case "tableHeader": {
// Join multiple block children with a space (not "") so adjacent blocks
// like a paragraph followed by a list don't collide into "line1- a".
// Then collapse newlines and escape pipes so a cell containing "|" or a
// line break cannot corrupt the surrounding GFM row.
return nodeContent
.map(processNode)
.join(" ")
.replace(/\r?\n/g, " ")
.replace(/\|/g, "\\|");
}
case "callout":
const calloutType = node.attrs?.type || "info";
const calloutContent = nodeContent.map(processNode).join("\n");
return `:::${calloutType.toLowerCase()}\n${calloutContent}\n:::`;
case "details":
return nodeContent.map(processNode).join("\n");
case "detailsSummary":
const summaryText = nodeContent.map(processNode).join("");
return `<details>\n<summary>${summaryText}</summary>\n`;
case "detailsContent":
const detailsText = nodeContent.map(processNode).join("\n");
return `${detailsText}\n</details>`;
case "mathInline": {
// The schema's `text` attribute has no parseHTML, so TipTap's default
// parser reads it from the `text` HTML attribute (NOT the element's text
// content). Emit span[data-type="mathInline"] carrying the LaTeX in a
// `text="..."` attribute so it round-trips. marked cannot parse $...$
// back, so the previous form was lossy.
const inlineMath = node.attrs?.text || "";
return `<span data-type="mathInline" data-katex="true" text="${escapeAttr(inlineMath)}"></span>`;
}
case "mathBlock": {
// Same as mathInline: the LaTeX must ride in the `text` HTML attribute
// for the schema's default parser to recover it.
const blockMath = node.attrs?.text || "";
return `<div data-type="mathBlock" data-katex="true" text="${escapeAttr(blockMath)}"></div>`;
}
case "mention": {
// Emit span[data-type="mention"] with the schema's data-* attributes so
// generateJSON rebuilds the mention node instead of leaving "@label"
// plain text that cannot re-parse.
const attrs = node.attrs || {};
const parts: string[] = [`data-type="mention"`];
if (attrs.id) parts.push(`data-id="${escapeAttr(attrs.id)}"`);
if (attrs.label)
parts.push(`data-label="${escapeAttr(attrs.label)}"`);
if (attrs.entityType)
parts.push(`data-entity-type="${escapeAttr(attrs.entityType)}"`);
if (attrs.entityId)
parts.push(`data-entity-id="${escapeAttr(attrs.entityId)}"`);
if (attrs.slugId)
parts.push(`data-slug-id="${escapeAttr(attrs.slugId)}"`);
if (attrs.creatorId)
parts.push(`data-creator-id="${escapeAttr(attrs.creatorId)}"`);
if (attrs.anchorId)
parts.push(`data-anchor-id="${escapeAttr(attrs.anchorId)}"`);
// Keep the label as visible text content too; the schema reads attrs
// from data-*, so the inner text is purely cosmetic and harmless.
const mentionLabel = attrs.label || attrs.id || "";
// The label is visible element TEXT content here (the data-* attrs above
// carry the real values), so escape it for the text context, not attrs.
return `<span ${parts.join(" ")}>@${escapeHtmlText(mentionLabel)}</span>`;
}
case "attachment": {
// BUG FIX: the old code read node.attrs.fileName / node.attrs.src, but
// the schema stores name/url (plus mime/size/attachmentId). Emit the
// schema-matching div[data-type="attachment"] with data-attachment-*
// attrs so the node round-trips instead of degrading to a markdown link.
const attrs = node.attrs || {};
const parts: string[] = [
`data-type="attachment"`,
`data-attachment-url="${escapeAttr(attrs.url ?? "")}"`,
];
if (attrs.name)
parts.push(`data-attachment-name="${escapeAttr(attrs.name)}"`);
if (attrs.mime)
parts.push(`data-attachment-mime="${escapeAttr(attrs.mime)}"`);
if (attrs.size != null)
parts.push(`data-attachment-size="${escapeAttr(attrs.size)}"`);
if (attrs.attachmentId)
parts.push(
`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`,
);
return `<div ${parts.join(" ")}></div>`;
}
case "drawio":
case "excalidraw": {
// Emit the schema-matching div[data-type=...] carrying the diagram's
// attrs as data-* (the schema's diagramAttributes reads src/title/alt/
// width/height/size/aspectRatio/align/attachmentId from data-*), so the
// diagram round-trips instead of degrading to a lossy placeholder.
const attrs = node.attrs || {};
const parts: string[] = [
`data-type="${type}"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.title != null)
parts.push(`data-title="${escapeAttr(attrs.title)}"`);
if (attrs.alt != null) parts.push(`data-alt="${escapeAttr(attrs.alt)}"`);
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.attachmentId)
parts.push(
`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`,
);
return `<div ${parts.join(" ")}></div>`;
}
case "embed": {
// Emit the schema-matching div[data-type="embed"]; the schema reads
// src/provider/align/width/height from data-* attributes so the node
// (and its provider iframe info) survives the round-trip.
const attrs = node.attrs || {};
const parts: string[] = [
`data-type="embed"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
`data-provider="${escapeAttr(attrs.provider ?? "")}"`,
];
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "audio": {
// Emit the schema-matching <audio> element (was emitting nothing). The
// schema reads src from src and attachmentId/size from data-*.
const attrs = node.attrs || {};
const parts: string[] = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.attachmentId)
parts.push(
`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`,
);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
// Wrap in a block <div> for the same reason as video: a bare <audio> is
// inline-level HTML that marked would wrap in <p>.
return `<div><audio ${parts.join(" ")}></audio></div>`;
}
case "pdf": {
// Emit the schema-matching div[data-type="pdf"] (was emitting nothing).
// The schema reads src/width/height from standard attrs and name/
// attachmentId/size from data-*.
const attrs = node.attrs || {};
const parts: string[] = [
`data-type="pdf"`,
`src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.name) parts.push(`data-name="${escapeAttr(attrs.name)}"`);
if (attrs.attachmentId)
parts.push(
`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`,
);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "columns": {
// Emit the schema-matching div[data-type="columns"] wrapper so the
// multi-column layout survives. Without a case the children were
// concatenated with no separator and the text merged. The schema reads
// layout from data-layout and widthMode from data-width-mode. The whole
// block is raw HTML, so render children via blockToHtml (NOT markdown,
// which marked would not re-parse inside a raw HTML block).
const attrs = node.attrs || {};
const parts: string[] = [`data-type="columns"`];
if (attrs.layout)
parts.push(`data-layout="${escapeAttr(attrs.layout)}"`);
if (attrs.widthMode && attrs.widthMode !== "normal")
parts.push(`data-width-mode="${escapeAttr(attrs.widthMode)}"`);
const inner = nodeContent.map((n: any) => blockToHtml(n)).join("");
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "column": {
// Emit the schema-matching div[data-type="column"]; the schema reads the
// column width from data-width. Children are rendered as HTML so their
// formatting survives inside this raw HTML block.
const attrs = node.attrs || {};
const parts: string[] = [`data-type="column"`];
if (attrs.width)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
const inner = nodeContent.map((n: any) => blockToHtml(n)).join("");
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "subpages":
return "{{SUBPAGES}}";
default:
// Fallback: process children
return nodeContent.map(processNode).join("");
}
};
// Render inline content (text runs + their marks) to HTML. Used by the raw
// HTML fallbacks (spanned tables, columns) where marked will NOT re-parse
// markdown, so backtick/asterisk/bracket syntax would otherwise leak as
// literal characters. Each mark is mirrored to the HTML the schema's parseHTML
// accepts so it re-imports as the matching ProseMirror mark.
const inlineToHtml = (inlineNodes: any[]): string =>
(inlineNodes || [])
.map((n: any) => {
if (n.type === "hardBreak") return "<br>";
if (n.type !== "text") {
// Inline atoms (mention, mathInline) already emit schema HTML.
return processNode(n);
}
let t = escapeHtmlText(n.text || "");
for (const mark of n.marks || []) {
switch (mark.type) {
case "bold":
t = `<strong>${t}</strong>`;
break;
case "italic":
t = `<em>${t}</em>`;
break;
case "code":
t = `<code>${t}</code>`;
break;
case "strike":
t = `<s>${t}</s>`;
break;
case "underline":
t = `<u>${t}</u>`;
break;
case "subscript":
t = `<sub>${t}</sub>`;
break;
case "superscript":
t = `<sup>${t}</sup>`;
break;
case "link":
t = `<a href="${escapeAttr(mark.attrs?.href || "")}">${t}</a>`;
break;
case "highlight":
t = mark.attrs?.color
? `<mark style="background-color: ${escapeAttr(mark.attrs.color)}">${t}</mark>`
: `<mark>${t}</mark>`;
break;
case "textStyle":
if (mark.attrs?.color)
t = `<span style="color: ${escapeAttr(mark.attrs.color)}">${t}</span>`;
break;
case "comment":
// Inline comment anchor inside a raw-HTML container (columns /
// spanned table cells), so commented text there also round-trips.
if (mark.attrs?.commentId) {
const r = mark.attrs?.resolved ? ` data-resolved="true"` : "";
t = `<span data-comment-id="${escapeAttr(mark.attrs.commentId)}"${r}>${t}</span>`;
}
break;
}
}
return t;
})
.join("");
// Emit the schema-matching <img> for an image node. Shared so the image is
// emitted as real HTML wherever a raw-HTML container needs it (inside a column
// or a spanned table cell), where markdown `![](...)` would NOT be re-parsed
// and would survive as literal text. The Image extension reads src/alt from
// the standard attributes; the Docmost extra attrs (width/height/align/size/
// attachmentId/aspectRatio) are global attributes read from same-named DOM
// attributes, so emit them by name.
const imageToHtml = (node: any): string => {
const attrs = node.attrs || {};
const parts: string[] = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt) parts.push(`alt="${escapeAttr(attrs.alt)}"`);
if (attrs.title) parts.push(`title="${escapeAttr(attrs.title)}"`);
if (attrs.width != null) parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null) parts.push(`height="${escapeAttr(attrs.height)}"`);
if (attrs.align) parts.push(`align="${escapeAttr(attrs.align)}"`);
if (attrs.size != null) parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
return `<img ${parts.join(" ")}>`;
};
// Emit the schema-matching div[data-type="callout"] for a callout node. The
// schema reads the banner type from data-callout-type. Children are rendered
// as HTML so they survive inside a raw-HTML container.
const calloutToHtml = (node: any): string => {
const type = (node.attrs?.type || "info").toLowerCase();
const inner = (node.content || []).map(blockToHtml).join("");
return `<div data-type="callout" data-callout-type="${escapeAttr(type)}">${inner}</div>`;
};
// Emit a schema-matching <details> tree. The schema parses <details>,
// summary[data-type="detailsSummary"], and div[data-type="detailsContent"].
const detailsToHtml = (node: any): string => {
const inner = (node.content || []).map(blockToHtml).join("");
return `<details>${inner}</details>`;
};
const detailsSummaryToHtml = (node: any): string =>
`<summary data-type="detailsSummary">${inlineToHtml(node.content || [])}</summary>`;
const detailsContentToHtml = (node: any): string => {
const inner = (node.content || []).map(blockToHtml).join("");
return `<div data-type="detailsContent">${inner}</div>`;
};
// Emit the schema-matching taskList/taskItem HTML. bridgeTaskLists (in
// collaboration.ts) recognizes ul[data-type="taskList"] with
// li[data-type="taskItem"][data-checked]; emitting that directly here keeps
// task lists inside columns/cells from degrading to literal "- [ ]" text.
const taskListToHtml = (node: any): string => {
const items = (node.content || [])
.map((it: any) => {
const checked = it.attrs?.checked ? "true" : "false";
return `<li data-type="taskItem" data-checked="${checked}">${blockChildrenToHtml(it)}</li>`;
})
.join("");
return `<ul data-type="taskList">${items}</ul>`;
};
// Render a block node to HTML for the raw-HTML containers (spanned tables,
// columns). marked does NOT re-parse markdown inside a raw-HTML block, so
// EVERY block type that can appear inside a column or a spanned cell must be
// emitted as schema-matching HTML here — never as markdown, or it would land
// as literal text on re-import. Nodes whose processNode case already produces
// schema-matching HTML (math/media/embed/attachment/nested columns/spanned
// table) are delegated to processNode; the markdown-emitting cases
// (image/blockquote/callout/details/hr/taskList) get explicit HTML here.
const blockToHtml = (block: any): string => {
const children = block.content || [];
switch (block.type) {
case "paragraph":
return `<p>${inlineToHtml(children)}</p>`;
case "heading": {
const level = block.attrs?.level || 1;
return `<h${level}>${inlineToHtml(children)}</h${level}>`;
}
case "bulletList":
return `<ul>${children
.map((li: any) => `<li>${blockChildrenToHtml(li)}</li>`)
.join("")}</ul>`;
case "orderedList":
return `<ol>${children
.map((li: any) => `<li>${blockChildrenToHtml(li)}</li>`)
.join("")}</ol>`;
case "codeBlock": {
const lang = block.attrs?.language || "";
// The code itself is element TEXT content (between <code> tags), so it
// must escape < > & — NOT the attribute escaper. The language rides in
// a class ATTRIBUTE, so it uses escapeAttr.
const code = escapeHtmlText(
children
.map(processNode)
.join("")
.replace(/\n+$/, ""),
);
const cls = lang ? ` class="language-${escapeAttr(lang)}"` : "";
return `<pre><code${cls}>${code}</code></pre>`;
}
case "image":
return imageToHtml(block);
case "blockquote":
return `<blockquote>${children.map(blockToHtml).join("")}</blockquote>`;
case "horizontalRule":
return "<hr>";
case "callout":
return calloutToHtml(block);
case "details":
return detailsToHtml(block);
case "detailsSummary":
return detailsSummaryToHtml(block);
case "detailsContent":
return detailsContentToHtml(block);
case "taskList":
return taskListToHtml(block);
case "taskItem":
// A bare taskItem (outside a taskList) still needs a wrapping list so
// the schema parses it; wrap it in a single-item taskList.
return taskListToHtml({ content: [block] });
// table (incl. spanned), columns/column, math, media, embed, attachment,
// mention, etc. already emit schema-matching HTML from processNode.
case "table":
case "columns":
case "column":
case "mathBlock":
case "video":
case "audio":
case "pdf":
case "youtube":
case "embed":
case "attachment":
case "drawio":
case "excalidraw":
return processNode(block);
default:
// Any still-unhandled block type: NEVER fall back to markdown inside a
// raw-HTML block (it would become literal text). Wrap its rendered
// children in a <div> so their content is preserved; if it has no block
// children, render its inline content instead.
if (children.length && children.some((c: any) => c.type !== "text")) {
return `<div>${children.map(blockToHtml).join("")}</div>`;
}
return `<div>${inlineToHtml(children)}</div>`;
}
};
// Render the block children of a list item to HTML (a listItem holds block+
// content). Mirrors processListItem but for the HTML fallback path.
const blockChildrenToHtml = (item: any): string =>
(item.content || []).map((b: any) => blockToHtml(b)).join("");
// Indent the rendered children of a list item under a marker prefix.
// Each child block is a (possibly multi-line) string. The very first physical
// line of the first child carries the marker (e.g. "- " or "1. "); EVERY
// other line — the remaining lines of the first child AND all lines of every
// subsequent child (nested lists, code blocks, extra paragraphs) — is indented
// to align under the marker. Without indenting these continuation lines, the
// 2nd/3rd line of a nested child collapses to column 0 and escapes the list.
//
// The continuation indent MUST equal the LIST marker width, which is not the
// same as the visible prefix width:
// - bullet "- " -> 2 columns
// - task "- [ ] " -> marker is still "- " (the "[ ] " is content), 2
// - ordered "1. "/"10. " -> 3/4 columns, scaling with the number's digits
// CommonMark anchors nested content to the marker column, so an ordered item
// indented to only 2 columns would be re-parsed as sibling or loose content on
// re-import. Callers therefore pass the exact indent width to use.
const indentItemChildren = (
childStrings: string[],
prefix: string,
indentWidth: number,
): string => {
const indent = " ".repeat(indentWidth);
const lines: string[] = [];
childStrings.forEach((child, childIndex) => {
child.split("\n").forEach((line, lineIndex) => {
if (childIndex === 0 && lineIndex === 0) {
// First physical line of the first block gets the marker.
lines.push(`${prefix} ${line}`);
} else {
// Indent every continuation line by the marker width; keep blank
// lines blank rather than emitting trailing whitespace.
lines.push(line.length ? `${indent}${line}` : "");
}
});
});
return lines.join("\n");
};
const processListItem = (item: any, prefix: string): string => {
const itemContent = item.content || [];
const childStrings = itemContent.map(processNode);
if (childStrings.length === 0) return prefix;
// The rendered marker is `${prefix} ` (prefix + one space), so its width —
// and thus the continuation indent — is prefix.length + 1. This is correct
// for both bullet ("-" -> 2) and ordered ("1." -> 3, "10." -> 4) markers,
// since for those the visible prefix IS the list marker.
return indentItemChildren(childStrings, prefix, prefix.length + 1);
};
const processTaskItem = (item: any): string => {
const checked = item.attrs?.checked || false;
const checkbox = checked ? "[x]" : "[ ]";
const prefix = `- ${checkbox}`;
const itemContent = item.content || [];
const childStrings = itemContent.map(processNode);
// An empty task item still needs its checkbox marker; without this guard
// the indent below produces "" and the "- [ ]"/"- [x]" row disappears.
if (childStrings.length === 0) return prefix;
// The list marker for a task item is just "- " (2 columns); the "[ ] "/"[x] "
// checkbox is item content, NOT part of the marker. So the continuation
// indent is a fixed 2 — do NOT derive it from the wider prefix.length.
return indentItemChildren(childStrings, prefix, 2);
};
return processNode(content).trim();
}
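
The continuation-indent rule described in the list-item comments above can be sketched in isolation. The block below is a minimal standalone copy of `indentItemChildren`, written only for illustration: the sample inputs are made up, and in the real converter the helper lives inside the enclosing function rather than at module level.

```typescript
// Standalone copy of the continuation-indent rule, for illustration only.
function indentItemChildren(
  childStrings: string[],
  prefix: string,
  indentWidth: number,
): string {
  const indent = " ".repeat(indentWidth);
  const lines: string[] = [];
  childStrings.forEach((child, childIndex) => {
    child.split("\n").forEach((line, lineIndex) => {
      if (childIndex === 0 && lineIndex === 0) {
        // Only the very first physical line carries the list marker.
        lines.push(`${prefix} ${line}`);
      } else {
        // Every other line is indented; blank lines stay blank.
        lines.push(line.length ? `${indent}${line}` : "");
      }
    });
  });
  return lines.join("\n");
}

// Ordered marker "10." renders as "10. " (4 columns), so children indent by 4.
const ordered = indentItemChildren(
  ["first", "- nested a\n- nested b"],
  "10.",
  "10.".length + 1,
);

// Task marker is "- " (2 columns); the checkbox is item content, so the
// indent stays at 2 even though the visible prefix is wider.
const task = indentItemChildren(["todo", "more detail"], "- [ ]", 2);

console.log(ordered);
console.log(task);
```

Passing the indent width explicitly, instead of deriving it from the prefix inside the helper, is what lets the task-item caller use a fixed 2 while the bullet and ordered callers use `prefix.length + 1`.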

@@ -0,0 +1,136 @@
/**
* Self-contained Docmost-flavoured Markdown document (custom extensions).
*
* A single `.md` file that packages everything needed to losslessly round-trip
* a page through "download -> edit body -> re-upload":
* - a leading `docmost:meta` block: a one-line JSON object with page identity;
* - the Markdown body (carrying inline comment anchors and diagrams as HTML);
* - a trailing `docmost:comments` block: a one-line JSON array of comment
* threads.
*
* Both metadata blocks are HTML comments on purpose: `marked`/`generateJSON`
* drop HTML comments, so even if the WHOLE file were ever fed straight to the
* importer without first stripping the blocks, the metadata cannot leak into the
* document. (A fenced ```docmost-comments``` block would WRONGLY become a
* codeBlock node, so a fenced block is deliberately NOT used.)
*
* The delimiter literals may legitimately appear in the BODY too (e.g. a user
* re-pastes an exported `.md` into a page, or a page documents this very
* format). To stay robust, parsing treats only the FINAL, document-ending
* `docmost:comments` block as metadata: it is the last `<!-- docmost:comments`
* opener whose closing `-->` sits at the very end of the file. Any earlier
* literal occurrence is left in the body untouched.
*
* NOTE on comments: in this version the comment THREAD records are preserved in
* the file but are NOT pushed back to the server on import — only the inline
* comment marks (anchors) embedded in the body are restored. Managing comment
* records stays with the comment tools/UI.
*/
export interface DocmostMdMeta {
version: number;
pageId?: string;
slugId?: string;
title?: string;
spaceId?: string;
parentPageId?: string | null;
}
// Match the leading meta block (allow leading whitespace). Capture group 1 is
// the JSON text between the markers.
const META_RE = /^\s*<!--\s*docmost:meta\s*\n([\s\S]*?)\n-->/;
// Match a `docmost:comments` opener. Used globally to scan for the LAST opener
// rather than end-anchoring a single regex (which would mis-capture across a
// literal opener that appears earlier in the body).
const COMMENTS_OPEN_RE = /<!--[ \t]*docmost:comments[ \t]*\r?\n/g;
/**
* Assemble the full self-contained markdown file: meta block, body, and the
* comments block. The meta block is always emitted; the comments block is always
* emitted too (with `[]` when there are no comments) so the format stays uniform
* and parsing stays simple.
*/
export function serializeDocmostMarkdown(
meta: DocmostMdMeta,
body: string,
comments: any[],
): string {
const metaJson = JSON.stringify(meta);
const commentsJson = JSON.stringify(Array.isArray(comments) ? comments : []);
const trimmedBody = (body ?? "").trim();
return (
`<!-- docmost:meta\n${metaJson}\n-->\n\n` +
`${trimmedBody}\n\n` +
`<!-- docmost:comments\n${commentsJson}\n-->\n`
);
}
/**
* Split a self-contained file back into its parts. Tolerant: if the meta or
* comments block is missing (e.g. a hand-written plain-markdown file), the
* corresponding value is returned as `null` and the whole input is treated as
* the body. This never throws on a MISSING block; only a `JSON.parse` failure
* inside a block that IS present is surfaced as a thrown Error with a clear
* message. Robust to `\r\n` line endings.
*/
export function parseDocmostMarkdown(full: string): {
meta: DocmostMdMeta | null;
body: string;
comments: any[] | null;
} {
// Normalize line endings so the anchored regexes work regardless of CRLF.
const normalized = (full ?? "").replace(/\r\n/g, "\n");
// Extract the leading meta block (start-anchored — already unambiguous).
let meta: DocmostMdMeta | null = null;
let metaEnd = 0;
const metaMatch = normalized.match(META_RE);
if (metaMatch) {
try {
meta = JSON.parse(metaMatch[1]);
} catch (e) {
throw new Error(
`Invalid docmost:meta JSON block: ${
e instanceof Error ? e.message : String(e)
}`,
);
}
// Body starts right after the matched meta block.
metaEnd = (metaMatch.index ?? 0) + metaMatch[0].length;
}
// Find the LAST `<!-- docmost:comments` opener; the real file-level block is
// the final one whose closing `-->` ends the document. Any earlier literal
// occurrence inside the body (e.g. a re-pasted export) is left in the body.
let lastOpenStart = -1;
let lastOpenEnd = -1;
let m: RegExpExecArray | null;
COMMENTS_OPEN_RE.lastIndex = 0;
while ((m = COMMENTS_OPEN_RE.exec(normalized)) !== null) {
lastOpenStart = m.index;
lastOpenEnd = m.index + m[0].length;
}
let comments: any[] | null = null;
let bodyEnd = normalized.length;
if (lastOpenStart !== -1) {
const rest = normalized.slice(lastOpenEnd);
const close = rest.match(/\r?\n-->[ \t]*\r?\n?\s*$/); // closer must end the doc
if (close) {
const jsonText = rest.slice(0, close.index);
try {
comments = JSON.parse(jsonText);
} catch (e) {
throw new Error(
`Invalid docmost:comments JSON block: ${
e instanceof Error ? e.message : String(e)
}`,
);
}
bodyEnd = lastOpenStart; // strip from the opener to end of document
}
}
const body = normalized.slice(metaEnd, bodyEnd).trim();
return { meta, body, comments };
}
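
The "last opener wins" rule used for the comments block can be shown with a condensed sketch. `splitTrailingComments` below is a hypothetical helper, not one of the module's exports: it assumes line endings are already normalized to `\n`, ignores the meta block, and returns `null` comments instead of throwing when no document-ending block is found. The sample file content is invented.

```typescript
// Condensed sketch of the trailing-comments scan; not the module's real API.
const OPEN_RE = /<!--[ \t]*docmost:comments[ \t]*\n/g;

function splitTrailingComments(text: string): {
  body: string;
  comments: unknown[] | null;
} {
  let start = -1;
  let end = -1;
  let m: RegExpExecArray | null;
  OPEN_RE.lastIndex = 0;
  // Keep overwriting so that only the LAST opener is remembered.
  while ((m = OPEN_RE.exec(text)) !== null) {
    start = m.index;
    end = m.index + m[0].length;
  }
  if (start === -1) return { body: text.trim(), comments: null };

  const rest = text.slice(end);
  // The closer must end the document, otherwise this is not the metadata block.
  const close = rest.match(/\n-->\s*$/);
  if (!close || close.index === undefined) {
    return { body: text.trim(), comments: null };
  }
  const comments = JSON.parse(rest.slice(0, close.index)) as unknown[];
  return { body: text.slice(0, start).trim(), comments };
}

// A body that itself contains a literal comments block (e.g. a re-pasted
// export), followed by the real document-ending block.
const file =
  "Intro paragraph.\n\n" +
  '<!-- docmost:comments\n["pasted from an old export"]\n-->\n\n' +
  "Closing paragraph.\n\n" +
  '<!-- docmost:comments\n[{"id":"c1"}]\n-->\n';

const parsed = splitTrailingComments(file);
console.log(parsed.comments);
console.log(parsed.body);
```

In this sketch the earlier literal block stays in the body untouched, and only the final block is parsed as metadata.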

@@ -0,0 +1,897 @@
/**
* Pure, network-free helpers for manipulating a ProseMirror/TipTap document
* tree by node id.
*
* A ProseMirror node here is a plain JSON object of the shape produced by
* Docmost: `{ type, attrs?, content?, text?, marks? }`. Children live in the
* `content` array; a node carries a stable id in `attrs.id`. Callouts and
* table cells hold their children in `content` just like any other block, so a
* single recursive walk reaches them all.
*
* Every exported function operates on a DEEP CLONE of the input document and
* returns the new document. The input doc and any `newNode`/`node` argument are
* never mutated. All functions are defensively null-safe: a missing or non-array
* `content`, non-object nodes, and absent `attrs` are tolerated.
*/
/** Deep-clone a JSON-serializable value without mutating the original. */
function clone<T>(value: T): T {
if (typeof structuredClone === "function") {
return structuredClone(value);
}
// Fallback for environments without structuredClone.
return JSON.parse(JSON.stringify(value)) as T;
}
/** True if `value` is a non-null object (and not an array). */
function isObject(value: any): value is Record<string, any> {
return value != null && typeof value === "object" && !Array.isArray(value);
}
/** True if `node` carries the given id in `node.attrs.id`. */
function matchesId(node: any, nodeId: string): boolean {
return isObject(node) && isObject(node.attrs) && node.attrs.id === nodeId;
}
/**
* Recursively concatenate all text contained in a node.
*
* Text nodes contribute their `text` string; container nodes contribute the
* joined `blockPlainText` of their `content` children. Returns "" for nullish
* or non-object inputs.
*/
export function blockPlainText(node: any): string {
if (!isObject(node)) return "";
let out = "";
if (typeof node.text === "string") {
out += node.text;
}
if (Array.isArray(node.content)) {
for (const child of node.content) {
out += blockPlainText(child);
}
}
return out;
}
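/*
Usage sketch (not part of this module): a minimal, self-contained illustration
of the node shape described in the header comment and of how plain text is
gathered from it. The `SketchNode` type, the `plainTextOf` name, and the sample
paragraph are assumptions made for this sketch; the function restates the
recursion of `blockPlainText` instead of importing it.

```typescript
// Hypothetical node type for the sketch; the module itself uses `any`.
interface SketchNode {
  type?: string;
  attrs?: Record<string, unknown>;
  content?: SketchNode[];
  text?: string;
}

// Concatenate every `text` string in document order (depth-first).
function plainTextOf(node: SketchNode | null | undefined): string {
  if (node == null || typeof node !== "object") return "";
  let out = typeof node.text === "string" ? node.text : "";
  for (const child of Array.isArray(node.content) ? node.content : []) {
    out += plainTextOf(child);
  }
  return out;
}

// Invented sample: a paragraph whose text is split across two runs.
const sampleParagraph: SketchNode = {
  type: "paragraph",
  attrs: { id: "p1" },
  content: [
    { type: "text", text: "Hello, " },
    { type: "text", text: "world" },
  ],
};

const sampleText: string = plainTextOf(sampleParagraph);
console.log(sampleText);
```

Runs are joined with no separator, so text taken from adjacent child blocks is
not space-delimited.
*/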
/** Truncate `text` to at most `n` chars, appending an ellipsis when cut. */
function truncate(text: string, n: number): string {
return text.length > n ? text.slice(0, n) + "…" : text;
}
/** One compact outline entry for a single top-level block. */
export interface OutlineEntry {
index: number;
type: string | undefined;
id: string | null;
firstText: string;
/** Present for headings only. */
level?: number | null;
/** Present for tables only. */
rows?: number;
cols?: number;
header?: string[];
/** Present for list blocks only (bulletList/orderedList/taskList). */
items?: number;
}
/**
* Build a COMPACT outline of the TOP-LEVEL blocks of `doc` (the entries in
* `doc.content`). Deliberately does NOT recurse into paragraphs, list items, or
* table cells — compactness is the point; use `getNodeByRef` to drill into a
* specific block.
*
* Each entry carries `{ index, type, id, firstText }`, plus type-specific
* extras: headings add `level`; tables add `rows`/`cols` and the first row's
* cell texts as `header`; list blocks (types ending in "List") add `items`.
* `firstText` is the block's plain text truncated to 100 chars. Null-safe:
* a missing or non-object doc/content yields `[]`.
*/
export function buildOutline(doc: any): OutlineEntry[] {
if (!isObject(doc) || !Array.isArray(doc.content)) return [];
const out: OutlineEntry[] = [];
for (let i = 0; i < doc.content.length; i++) {
const block = doc.content[i];
const type = isObject(block) ? block.type : undefined;
const entry: OutlineEntry = {
index: i,
type,
id: isObject(block) && isObject(block.attrs) ? block.attrs.id ?? null : null,
firstText: truncate(blockPlainText(block), 100),
};
if (type === "heading") {
entry.level = isObject(block.attrs) ? block.attrs.level ?? null : null;
} else if (type === "table") {
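// Guard with Array.isArray so a malformed table cannot break the outline.
const rowNodes = Array.isArray(block.content) ? block.content : [];
const headerRow = Array.isArray(rowNodes[0]?.content) ? rowNodes[0].content : [];
entry.rows = rowNodes.length;
entry.cols = headerRow.length;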
entry.header = headerRow.map((cell: any) =>
truncate(blockPlainText(cell), 40),
);
} else if (typeof type === "string" && type.endsWith("List")) {
entry.items = block.content?.length ?? 0;
}
out.push(entry);
}
return out;
}
/**
* Resolve a single node by reference and return `{ node, path, type }`, or
* `null` when nothing matches.
*
* - `ref` of the form `#<n>` (e.g. `#2`) selects the TOP-LEVEL block at index
* `n` in `doc.content`. This is the only way to reach a table (and, through
* the returned node's `content`, its tableRow/tableCell children), since none
* of these nodes carry an `attrs.id`.
* - Otherwise `ref` is treated as a block id: the FIRST node anywhere in the
* tree with `attrs.id === ref` is returned.
*
* `path` is the array of child indices from the doc root down to the node
* (so a top-level block is `[index]`). The returned `node` is a DEEP CLONE,
* so callers can mutate it without touching the input doc. Null-safe.
*/
export function getNodeByRef(
doc: any,
ref: string,
): { node: any; path: number[]; type: string | undefined } | null {
if (!isObject(doc)) return null;
// "#<n>": index into the top-level content array.
const indexMatch = typeof ref === "string" ? ref.match(/^#(\d+)$/) : null;
if (indexMatch) {
const index = Number(indexMatch[1]);
const block = Array.isArray(doc.content) ? doc.content[index] : undefined;
if (!isObject(block)) return null;
return { node: clone(block), path: [index], type: block.type };
}
// Otherwise: depth-first search for the first node with attrs.id === ref.
const search = (
node: any,
trail: number[],
): { node: any; path: number[]; type: string | undefined } | null => {
if (!isObject(node)) return null;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const child = node.content[i];
const path = [...trail, i];
if (matchesId(child, ref)) {
return { node: clone(child), path, type: child.type };
}
const hit = search(child, path);
if (hit != null) return hit;
}
}
return null;
};
return search(doc, []);
}
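/*
Usage sketch (not part of this module): the two reference forms accepted by
`getNodeByRef`, shown as a standalone classifier. `ParsedRef` and `parseRef` are
hypothetical names invented for this sketch; only the `#<n>` pattern is taken
from the function above.

```typescript
// Hypothetical tagged result: either a top-level index or a block id.
type ParsedRef =
  | { kind: "index"; index: number }
  | { kind: "id"; id: string };

// "#<digits>" selects a top-level block; anything else is treated as an id.
function parseRef(ref: string): ParsedRef {
  const m = ref.match(/^#(\d+)$/);
  return m ? { kind: "index", index: Number(m[1]) } : { kind: "id", id: ref };
}

const byIndex = parseRef("#2");
const byId = parseRef("abc123");
// Not purely digits after "#", so it falls through to the id branch.
const notAnIndex = parseRef("#2a");

console.log(byIndex, byId, notAnIndex);
```

Because the pattern is anchored at both ends, a block id that merely starts
with `#` is still looked up as an id.
*/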
/**
* Replace EVERY node whose `attrs.id === nodeId` with a deep clone of
* `newNode`, anywhere in the tree (including inside callouts and table cells).
*
* Operates on a clone of `doc`; returns `{ doc, replaced }` where `replaced`
* is the number of nodes substituted. A fresh clone of `newNode` is used for
* each match so they do not share references.
*/
export function replaceNodeById(
doc: any,
nodeId: string,
newNode: any,
): { doc: any; replaced: number } {
const out = clone(doc);
let replaced = 0;
// Walk a content array, replacing direct matches and recursing into the
// (possibly new) children of non-matching nodes.
const walkContent = (content: any[]): void => {
for (let i = 0; i < content.length; i++) {
const child = content[i];
if (matchesId(child, nodeId)) {
content[i] = clone(newNode);
replaced++;
// Do not recurse into a freshly substituted node.
continue;
}
if (isObject(child) && Array.isArray(child.content)) {
walkContent(child.content);
}
}
};
if (isObject(out) && Array.isArray(out.content)) {
walkContent(out.content);
}
return { doc: out, replaced };
}
/**
* Remove EVERY node whose `attrs.id === nodeId` from its parent `content`
* array, anywhere in the tree (recursive, including callouts and tables).
*
* Operates on a clone of `doc`; returns `{ doc, deleted }` where `deleted` is
* the number of nodes removed.
*/
export function deleteNodeById(
doc: any,
nodeId: string,
): { doc: any; deleted: number } {
const out = clone(doc);
let deleted = 0;
// Build a filtered copy of a content array, dropping matches and recursing
// into the surviving children; the caller assigns the result back.
const walkContent = (content: any[]): any[] => {
const kept: any[] = [];
for (const child of content) {
if (matchesId(child, nodeId)) {
deleted++;
continue;
}
if (isObject(child) && Array.isArray(child.content)) {
child.content = walkContent(child.content);
}
kept.push(child);
}
return kept;
};
if (isObject(out) && Array.isArray(out.content)) {
out.content = walkContent(out.content);
}
return { doc: out, deleted };
}
/**
* Deep-clone `doc` and strip every node/mark attribute whose value is strictly
* `undefined`, so the result is safe to hand to Yjs (which throws an opaque
* "Unexpected content type" when asked to store an `undefined` attribute value).
*
* Only `undefined` keys are removed; `null`, `false`, `0`, and `""` are all
* legitimate JSON-storable values and are preserved. Operates on a clone and
* returns it; the input is never mutated. Defensively null-safe like the rest
* of the file.
*/
export function sanitizeForYjs(doc: any): any {
const out = clone(doc);
// Drop every key whose value is strictly `undefined` from an attrs object.
const stripUndefined = (attrs: any): void => {
if (!isObject(attrs)) return;
for (const key of Object.keys(attrs)) {
if (attrs[key] === undefined) {
delete attrs[key];
}
}
};
const walk = (node: any): void => {
if (!isObject(node)) return;
stripUndefined(node.attrs);
if (Array.isArray(node.marks)) {
for (const mark of node.marks) {
if (isObject(mark)) stripUndefined(mark.attrs);
}
}
if (Array.isArray(node.content)) {
for (const child of node.content) {
walk(child);
}
}
};
walk(out);
return out;
}
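/*
Usage sketch (not part of this module): which attribute values survive the
`undefined` stripping described above. `Attrs`, `stripUndefinedKeys`, and the
sample attrs are assumptions made for this sketch; unlike the in-place `delete`
used by `sanitizeForYjs`, this variant returns a filtered copy.

```typescript
type Attrs = Record<string, unknown>;

// Keep every key except those whose value is strictly `undefined`.
function stripUndefinedKeys(attrs: Attrs): Attrs {
  const out: Attrs = {};
  for (const key of Object.keys(attrs)) {
    if (attrs[key] !== undefined) out[key] = attrs[key];
  }
  return out;
}

// Invented attrs mixing one `undefined` with falsy but storable values.
const cleaned = stripUndefinedKeys({
  id: "p1",
  indent: undefined,
  align: null,
  level: 0,
  checked: false,
  title: "",
});
const keptKeys: string[] = Object.keys(cleaned);

console.log(keptKeys);
```

Only `indent` is dropped; `null`, `0`, `false`, and `""` are kept because they
are ordinary JSON values.
*/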
/**
* Diagnostics helper: walk the tree and return a human-readable path string for
* the FIRST attribute value (in any `node.attrs` or `mark.attrs`) that Yjs
* cannot store — i.e. `undefined`, a `function`, a `symbol`, or a `bigint`
* (e.g. `content[3].content[0].attrs.indent (undefined)`). Returns `null` when
* every attribute is storable. Null-safe.
*/
export function findUnstorableAttr(doc: any): string | null {
const isUnstorable = (value: any): string | null => {
if (value === undefined) return "undefined";
const t = typeof value;
if (t === "function") return "function";
if (t === "symbol") return "symbol";
if (t === "bigint") return "bigint";
return null;
};
// Check an attrs object; return the offending sub-path or null.
const checkAttrs = (attrs: any, basePath: string): string | null => {
if (!isObject(attrs)) return null;
for (const key of Object.keys(attrs)) {
const kind = isUnstorable(attrs[key]);
if (kind != null) return `${basePath}.${key} (${kind})`;
}
return null;
};
const walk = (node: any, path: string): string | null => {
if (!isObject(node)) return null;
const attrHit = checkAttrs(node.attrs, `${path}.attrs`);
if (attrHit != null) return attrHit;
if (Array.isArray(node.marks)) {
for (let i = 0; i < node.marks.length; i++) {
const markHit = checkAttrs(
node.marks[i]?.attrs,
`${path}.marks[${i}].attrs`,
);
if (markHit != null) return markHit;
}
}
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const childHit = walk(node.content[i], `${path}.content[${i}]`);
if (childHit != null) return childHit;
}
}
return null;
};
// The root doc node carries no index, so paths start directly at the root's
// own `attrs` / `content[i]` keys, with no leading "doc" prefix.
if (!isObject(doc)) return null;
const attrHit = checkAttrs(doc.attrs, "attrs");
if (attrHit != null) return attrHit;
if (Array.isArray(doc.content)) {
for (let i = 0; i < doc.content.length; i++) {
const childHit = walk(doc.content[i], `content[${i}]`);
if (childHit != null) return childHit;
}
}
return null;
}
/**
* Table structural node types and the container each must live directly inside.
* Used by `insertNodeRelative` to splice rows/cells into the correct ancestor
* rather than blindly into the anchor's direct parent (which would corrupt the
* table's nesting).
*/
const STRUCTURAL_TYPES = new Set(["tableRow", "tableCell", "tableHeader"]);
const REQUIRED_CONTAINER: Record<string, string> = {
tableRow: "table",
tableCell: "tableRow",
tableHeader: "tableRow",
};
/**
* Locate an anchor and return its ancestor chain (from `doc` down to and
* including the matched node). Each chain entry is `{ node, index }` where
* `index` is the node's position inside its parent's `content` array (the root
* doc has index -1). Returns `null` when the anchor cannot be resolved.
*/
function findAnchorChain(
doc: any,
opts: InsertOptions,
): { node: any; index: number }[] | null {
if (!isObject(doc)) return null;
// DFS by id anywhere in the tree, accumulating the path.
if (opts.anchorNodeId != null) {
const targetId = opts.anchorNodeId;
const search = (
node: any,
index: number,
trail: { node: any; index: number }[],
): { node: any; index: number }[] | null => {
if (!isObject(node)) return null;
const here = [...trail, { node, index }];
if (matchesId(node, targetId)) return here;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const hit = search(node.content[i], i, here);
if (hit != null) return hit;
}
}
return null;
};
return search(doc, -1, []);
}
// By text: only top-level blocks are scanned (same rule as the JSON path).
if (opts.anchorText != null && Array.isArray(doc.content)) {
for (let i = 0; i < doc.content.length; i++) {
if (blockPlainText(doc.content[i]).includes(opts.anchorText)) {
return [
{ node: doc, index: -1 },
{ node: doc.content[i], index: i },
];
}
}
}
return null;
}
/** Options controlling where `insertNodeRelative` places the new node. */
export interface InsertOptions {
position: "before" | "after" | "append";
/** Resolve the anchor by node id anywhere in the tree (preferred). */
anchorNodeId?: string;
/** Fallback: first TOP-LEVEL block whose plain text includes this string. */
anchorText?: string;
}
/**
* Insert a deep clone of `node` relative to an anchor.
*
* - position "append": push the node onto the top-level `doc.content`.
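* - position "before"/"after": locate the anchor and splice the node into the
* anchor's parent `content` array immediately before / after it. Table
* structural nodes (tableRow/tableCell/tableHeader) are instead spliced into
* the nearest enclosing table/tableRow on the anchor's ancestor chain.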
*
* Anchor resolution for before/after:
* - if `anchorNodeId` is given, find the node with `attrs.id === anchorNodeId`
* anywhere in the tree (recursive);
* - otherwise, if `anchorText` is given, scan only TOP-LEVEL `doc.content`
* blocks and pick the first whose `blockPlainText` includes `anchorText`.
*
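* Operates on a clone of `doc`; returns `{ doc, inserted }`. `inserted` is
* false when the anchor could not be resolved (the doc is returned unchanged
* apart from being cloned). Throws when a table structural node is appended at
* the top level or anchored outside its required container.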
*/
export function insertNodeRelative(
doc: any,
node: any,
opts: InsertOptions,
): { doc: any; inserted: boolean } {
const out = clone(doc);
const fresh = clone(node);
// Defensive: stay null-safe like the other exports — a missing opts means
// there is nothing actionable to do.
if (!isObject(opts)) return { doc: out, inserted: false };
const isStructural = isObject(node) && STRUCTURAL_TYPES.has(node.type);
// "append": top-level push.
if (opts.position === "append") {
// Structural table nodes (tableRow/tableCell/tableHeader) cannot live at the
// top level — appending one would produce invalid nesting.
if (isStructural) {
throw new Error(
`insert_node: cannot append a ${node.type} at the top level; use ` +
`position before/after with an anchor inside the target table`,
);
}
if (isObject(out)) {
if (!Array.isArray(out.content)) out.content = [];
out.content.push(fresh);
return { doc: out, inserted: true };
}
return { doc: out, inserted: false };
}
const offset = opts.position === "after" ? 1 : 0;
// Structural insert (before/after a tableRow/tableCell/tableHeader): splice
// into the nearest enclosing table/tableRow rather than the anchor's direct
// parent, so the row/cell lands at the correct level of the table.
if (isStructural) {
const containerType = REQUIRED_CONTAINER[node.type];
const chain = findAnchorChain(out, opts);
// Anchor not resolved at all — keep the existing "anchor not found" path.
if (chain == null) return { doc: out, inserted: false };
// Find the DEEPEST ancestor (including the anchor itself) of the required
// container type.
let containerIdx = -1;
for (let i = chain.length - 1; i >= 0; i--) {
if (isObject(chain[i].node) && chain[i].node.type === containerType) {
containerIdx = i;
break;
}
}
if (containerIdx === -1) {
throw new Error(
`insert_node: cannot insert a ${node.type} here — the anchor is not ` +
`inside a ${containerType}. Anchor on a cell's text or a block id ` +
`that lives inside the target table.`,
);
}
const container = chain[containerIdx].node;
if (!Array.isArray(container.content)) container.content = [];
if (containerIdx === chain.length - 1) {
// The matched container IS the anchor node itself (e.g. anchorText
// resolved to the table block): append/prepend within it.
const at = opts.position === "after" ? container.content.length : 0;
container.content.splice(at, 0, fresh);
} else {
// The immediate child on the path leading to the anchor is the row/cell
// to splice next to.
const enclosingChildIndex = chain[containerIdx + 1].index;
container.content.splice(enclosingChildIndex + offset, 0, fresh);
}
return { doc: out, inserted: true };
}
// Resolve by id anywhere in the tree: splice into the parent content array.
if (opts.anchorNodeId != null) {
let inserted = false;
const walkContent = (content: any[]): void => {
for (let i = 0; i < content.length; i++) {
const child = content[i];
if (matchesId(child, opts.anchorNodeId as string)) {
content.splice(i + offset, 0, fresh);
inserted = true;
return;
}
if (isObject(child) && Array.isArray(child.content)) {
walkContent(child.content);
if (inserted) return;
}
}
};
if (isObject(out) && Array.isArray(out.content)) {
walkContent(out.content);
}
return { doc: out, inserted };
}
// Resolve by text: only top-level doc.content blocks are scanned.
if (opts.anchorText != null && isObject(out) && Array.isArray(out.content)) {
for (let i = 0; i < out.content.length; i++) {
if (blockPlainText(out.content[i]).includes(opts.anchorText)) {
out.content.splice(i + offset, 0, fresh);
return { doc: out, inserted: true };
}
}
}
return { doc: out, inserted: false };
}
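/*
Usage sketch (not part of this module): how the structural branch above picks
the container for a new row or cell. The ancestor chain is reduced to a list of
node types; `requiredContainer`, `deepestContainerIndex`, and the sample chains
are names and data invented for this sketch.

```typescript
// Same mapping as REQUIRED_CONTAINER, restated so the sketch is standalone.
const requiredContainer: Record<string, string> = {
  tableRow: "table",
  tableCell: "tableRow",
  tableHeader: "tableRow",
};

// chainTypes lists node types from the doc root down to the anchor.
// Returns the index of the deepest entry of the required type, or -1.
function deepestContainerIndex(chainTypes: string[], insertedType: string): number {
  const wanted = requiredContainer[insertedType];
  if (wanted === undefined) return -1;
  return chainTypes.lastIndexOf(wanted);
}

// Anchor is a paragraph inside a table cell.
const anchorChain = ["doc", "table", "tableRow", "tableCell", "paragraph"];
const forRow = deepestContainerIndex(anchorChain, "tableRow");
const forCell = deepestContainerIndex(anchorChain, "tableCell");
// Anchor is a plain top-level paragraph: no table on the chain.
const outside = deepestContainerIndex(["doc", "paragraph"], "tableRow");

console.log(forRow, forCell, outside);
```

A new row is spliced into the `table` entry and a new cell into the `tableRow`
entry, next to the chain element that follows the container. A result of -1
corresponds to the "anchor is not inside a ..." error thrown above.
*/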
// ===========================================================================
// Table editing helpers
//
// A Docmost table is a ProseMirror subtree with NO ids on the structural nodes:
// table -> { type:"table", content:[tableRow...] }
// row -> { type:"tableRow", content:[tableCell|tableHeader...] }
// cell -> { type:"tableCell"|"tableHeader", attrs:{colspan,rowspan,colwidth},
// content:[paragraph...] }
// para -> { type:"paragraph", attrs:{id,indent}, content:[textNode...] }
// Only paragraphs/headings carry an `attrs.id`, so a cell is addressed via the
// id of the paragraph inside it. The helpers below all operate on a DEEP CLONE
// of the input doc (via `clone`) and never mutate their inputs.
// ===========================================================================
/**
* Collect EVERY `attrs.id` present anywhere in `node` into `used`. Used to seed
* `makeFreshId` so generated paragraph ids never collide with existing ones.
*/
function collectIds(node: any, used: Set<string>): void {
if (!isObject(node)) return;
if (isObject(node.attrs) && typeof node.attrs.id === "string") {
used.add(node.attrs.id);
}
if (Array.isArray(node.content)) {
for (const child of node.content) collectIds(child, used);
}
}
/**
* Fresh-id generator: returns a random Docmost-style id (12 chars from
* lowercase `a-z0-9`) that is not already in `used`, and records it. On the
* rare collision the id is regenerated. Callers rely on uniqueness, not on the
* exact string, so randomness is fine — and unlike a module-local counter it
* needs no reset and cannot become predictable across calls.
*/
function makeFreshId(used: Set<string>): string {
const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
let id: string;
do {
id = "";
for (let i = 0; i < 12; i++) {
id += alphabet[Math.floor(Math.random() * alphabet.length)];
}
} while (used.has(id)); // the inner loop always yields 12 chars, so id is never ""
used.add(id);
return id;
}
/**
* Resolve a table reference against an ALREADY-CLONED doc and return the LIVE
* table node (a reference inside `rootClone`, so the caller may mutate it) plus
* its index path. Returns null when no table matches.
*
* - `#<n>`: the top-level block at index `n`, only if its `type === "table"`.
* - otherwise: DFS for the node with `attrs.id === tableRef`, then walk UP its
* ancestor chain to the nearest `type === "table"` ancestor.
*/
function locateTable(
rootClone: any,
tableRef: string,
): { table: any; path: number[] } | null {
if (!isObject(rootClone)) return null;
// "#<n>": index into the top-level content array; must be a table.
const indexMatch = typeof tableRef === "string" ? tableRef.match(/^#(\d+)$/) : null;
if (indexMatch) {
const index = Number(indexMatch[1]);
const block = Array.isArray(rootClone.content)
? rootClone.content[index]
: undefined;
if (isObject(block) && block.type === "table") {
return { table: block, path: [index] };
}
return null;
}
// Otherwise: DFS for attrs.id === tableRef, tracking the ancestor chain, then
// climb to the nearest enclosing table.
const search = (
node: any,
trail: { node: any; index: number }[],
): { table: any; path: number[] } | null => {
if (!isObject(node)) return null;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const child = node.content[i];
const here = [...trail, { node: child, index: i }];
if (matchesId(child, tableRef)) {
// Walk UP to the nearest table ancestor (including the match itself).
for (let j = here.length - 1; j >= 0; j--) {
if (isObject(here[j].node) && here[j].node.type === "table") {
return {
table: here[j].node,
path: here.slice(0, j + 1).map((e) => e.index),
};
}
}
return null; // id found but no enclosing table
}
const hit = search(child, here);
if (hit != null) return hit;
}
}
return null;
};
return search(rootClone, []);
}
/** Build the plain-text → single-paragraph cell content used by all writers. */
function makeCellParagraph(id: string, text: string): any {
return {
type: "paragraph",
attrs: { id, indent: 0 },
// Empty string → a paragraph with an empty content array.
content: text ? [{ type: "text", text }] : [],
};
}
/**
* Read a table as a matrix. Returns null when `tableRef` resolves to no table.
*
* - `rows`/`cols`: the table's row count and the column count of its FIRST row.
* Tables may be ragged (rows of differing length), so `cols` reflects only
* row 0; use the per-row length of `cells`/`cellIds` for each row's actual
* width.
* - `cells`: `string[][]` of each cell's `blockPlainText`.
* - `cellIds`: `(string|null)[][]` of each cell's FIRST paragraph id (or null),
* so callers can `patch_node` a cell for rich-formatted edits.
* - `path`: index path of the table within the doc.
*/
export function readTable(
doc: any,
tableRef: string,
): {
rows: number;
cols: number;
cells: string[][];
cellIds: (string | null)[][];
path: number[];
} | null {
const root = clone(doc);
const located = locateTable(root, tableRef);
if (located == null) return null;
const { table, path } = located;
const rowNodes = Array.isArray(table.content) ? table.content : [];
const rows = rowNodes.length;
const cols = rowNodes[0]?.content?.length ?? 0;
const cells: string[][] = [];
const cellIds: (string | null)[][] = [];
for (const rowNode of rowNodes) {
const cellNodes = Array.isArray(rowNode?.content) ? rowNode.content : [];
const rowText: string[] = [];
const rowIds: (string | null)[] = [];
for (const cellNode of cellNodes) {
rowText.push(blockPlainText(cellNode));
// The cell's first paragraph carries the id used for patch_node.
const firstPara = Array.isArray(cellNode?.content)
? cellNode.content[0]
: undefined;
const id =
isObject(firstPara) && isObject(firstPara.attrs)
? firstPara.attrs.id ?? null
: null;
rowIds.push(id);
}
cells.push(rowText);
cellIds.push(rowIds);
}
return { rows, cols, cells, cellIds, path };
}
/**
* Insert a row of plain-text cells into a table. Returns `{ doc, inserted }`.
*
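* The row is padded to the table's column count, taken from its WIDEST existing
* row (`cells[i] ?? ""`); supplying MORE cells than columns throws. Each new
* cell copies `colwidth` for its column from the first (header) row when
* present, and gets a fresh-id paragraph plus `colspan: 1, rowspan: 1` attrs.
* `index` (when an integer in `[0, rows]`) splices the row there; otherwise the
* row is appended at the end. A row landing at index 0 inherits each column's
* cell type from the current first row; every other row uses `tableCell`.
* `inserted` is false only when the table cannot be located.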
*/
export function insertTableRow(
doc: any,
tableRef: string,
cells: string[],
index?: number,
): { doc: any; inserted: boolean } {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null) return { doc: out, inserted: false };
const { table } = located;
if (!Array.isArray(table.content)) table.content = [];
const rows = table.content.length;
const headerRow = table.content[0];
const headerCells = Array.isArray(headerRow?.content) ? headerRow.content : [];
// Column count is the WIDEST existing row, so the guard below stays
// meaningful for ragged tables and the new row matches the table's width.
// Fall back to the supplied cell count only when the table has no rows.
let colCount = 0;
for (const r of table.content) {
if (isObject(r) && Array.isArray(r.content)) colCount = Math.max(colCount, r.content.length);
}
if (colCount === 0) colCount = Array.isArray(cells) ? cells.length : 0;
if (Array.isArray(cells) && cells.length > colCount) {
throw new Error(
`table_insert_row: got ${cells.length} cell(s) but the table has ${colCount} column(s)`,
);
}
// Resolve the landing index up front so the cell-type decision and the splice
// below agree: a valid integer in [0, rows] splices there, else we append.
const landingIndex =
typeof index === "number" && Number.isInteger(index) && index >= 0 && index <= rows
? index
: rows;
// Seed the id generator with every id already in the doc so the new cell
// paragraph ids are unique within the whole document.
const used = new Set<string>();
collectIds(out, used);
const newCells: any[] = [];
for (let i = 0; i < colCount; i++) {
const text = (Array.isArray(cells) ? cells[i] : undefined) ?? "";
const attrs: Record<string, any> = { colspan: 1, rowspan: 1 };
// Copy this column's colwidth from the header row's cell when present
// (cloned so the new cell does not share a reference with the header cell).
const colwidth = headerCells[i]?.attrs?.colwidth;
if (colwidth !== undefined) attrs.colwidth = clone(colwidth);
// A row landing at index 0 becomes the new header row, so inherit the
// current header cell's type per column (Docmost uses "tableHeader" there);
// every other position is a plain data cell.
const cellType = landingIndex === 0 ? headerCells[i]?.type ?? "tableCell" : "tableCell";
newCells.push({
type: cellType,
attrs,
content: [makeCellParagraph(makeFreshId(used), text)],
});
}
const newRow = { type: "tableRow", content: newCells };
// Splice at the resolved landing index (append when index was omitted/invalid).
table.content.splice(landingIndex, 0, newRow);
return { doc: out, inserted: true };
}
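/*
Usage sketch (not part of this module): the padding and width guard applied to
a new row, reduced to plain string arrays. `padRow` and its inputs are
hypothetical; the rule (widest existing row wins, fall back to the supplied
cell count for an empty table, reject surplus cells) follows `insertTableRow`.

```typescript
// existingRowWidths holds the cell count of each row already in the table.
function padRow(cells: string[], existingRowWidths: number[]): string[] {
  let colCount = Math.max(0, ...existingRowWidths);
  if (colCount === 0) colCount = cells.length;
  if (cells.length > colCount) {
    throw new Error(
      `got ${cells.length} cell(s) but the table has ${colCount} column(s)`,
    );
  }
  const padded: string[] = [];
  for (let i = 0; i < colCount; i++) padded.push(cells[i] ?? "");
  return padded;
}

// Ragged table with rows of width 3 and 2: the new row is padded to 3.
const paddedRow = padRow(["Ada"], [3, 2]);
// Empty table: the supplied cells define the width.
const firstRow = padRow(["Name", "Role"], []);

console.log(paddedRow, firstRow);
```

Passing four cells against a three-column table would throw instead of
silently widening the table.
*/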
/**
* Delete the row at 0-based `index` from a table. Returns `{ doc, deleted }`.
* `deleted` is false only when the table cannot be located. Throws on an
* out-of-range index, and refuses to delete the table's only row.
*/
export function deleteTableRow(
doc: any,
tableRef: string,
index: number,
): { doc: any; deleted: boolean } {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null) return { doc: out, deleted: false };
const { table } = located;
if (!Array.isArray(table.content)) table.content = [];
const rows = table.content.length;
if (!Number.isInteger(index) || index < 0 || index >= rows) {
throw new Error(
`table_delete_row: row index ${index} out of range (table has ${rows} row(s))`,
);
}
if (rows <= 1) {
throw new Error(
"table_delete_row: refusing to delete the only row of the table",
);
}
table.content.splice(index, 1);
return { doc: out, deleted: true };
}
/**
* Set the plain-text content of cell `[row, col]` (0-based) to `text`. Returns
* `{ doc, updated }`; `updated` is false only when the table cannot be located.
* Throws when `row`/`col` is out of range. The cell's own attrs (colspan/
* rowspan/colwidth) are preserved; its content becomes a single text paragraph
* that reuses the cell's existing first-paragraph id when present, else a fresh
* one.
*/
export function updateTableCell(
doc: any,
tableRef: string,
row: number,
col: number,
text: string,
): { doc: any; updated: boolean } {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null) return { doc: out, updated: false };
const { table } = located;
const rowNodes = Array.isArray(table.content) ? table.content : [];
const rows = rowNodes.length;
const rowNode = rowNodes[row];
const cols = isObject(rowNode) && Array.isArray(rowNode.content)
? rowNode.content.length
: 0;
if (
!Number.isInteger(row) ||
row < 0 ||
row >= rows ||
!Number.isInteger(col) ||
col < 0 ||
col >= cols
) {
throw new Error(`table_update_cell: cell [${row},${col}] out of range`);
}
const cellNode = rowNode.content[col];
// Reuse the cell's existing first-paragraph id, or mint a fresh unique one.
const existingPara = Array.isArray(cellNode?.content)
? cellNode.content[0]
: undefined;
let id =
isObject(existingPara) && isObject(existingPara.attrs)
? existingPara.attrs.id
: undefined;
if (typeof id !== "string" || id.length === 0) {
const used = new Set<string>();
collectIds(out, used);
id = makeFreshId(used);
}
cellNode.content = [makeCellParagraph(id, text)];
return { doc: out, updated: true };
}


@@ -0,0 +1,39 @@
/**
* Per-page async mutex.
*
* Content writes over the collaboration websocket must never overlap for the
* same page: two concurrent full-document replaces would race on the live Yjs
* fragment. We serialize them with a per-pageId promise chain — each new
* operation waits for the previous one on that page to settle (success or
* failure) before it runs. Different pages never block each other.
*/
const chains = new Map<string, Promise<unknown>>();
// The returned promise carries the real result/rejection of `fn` and MUST be
// awaited/handled by the caller; only the internal chaining tail swallows
// errors (purely to gate ordering).
export function withPageLock<T>(
pageId: string,
fn: () => Promise<T>,
): Promise<T> {
// Wait for the previous op on this page; swallow its error so a failure does
// not poison the queue for the next caller.
const prev = (chains.get(pageId) ?? Promise.resolve()).catch(() => {});
const run = prev.then(fn);
// The tail used for chaining must also swallow errors (it only gates order).
const tail = run.catch(() => {});
chains.set(pageId, tail);
// Drop the map entry once this op is the tail and has settled, to avoid an
// unbounded map of resolved promises.
tail.then(() => {
if (chains.get(pageId) === tail) {
chains.delete(pageId);
}
});
// Callers get the real result/rejection of fn.
return run;
}
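/*
Usage sketch (not part of this module): a standalone restatement of the
per-key promise chain, with a small demo showing that operations on one key run
in submission order and that a rejected operation does not block the next one.
`demoChains`, `demoLock`, `sleep`, `runDemo`, and the "page-1" key are
assumptions made for this sketch.

```typescript
const demoChains = new Map<string, Promise<unknown>>();

function demoLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  // The stored tail never rejects, so it is safe to chain on directly.
  const prev = demoChains.get(key) ?? Promise.resolve();
  const run = prev.then(fn);
  const tail = run.catch(() => undefined);
  demoChains.set(key, tail);
  // Drop the entry once this operation is still the tail and has settled.
  void tail.then(() => {
    if (demoChains.get(key) === tail) demoChains.delete(key);
  });
  return run;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function runDemo(): Promise<{ log: string[]; failingOutcome: string }> {
  const log: string[] = [];
  const slow = demoLock("page-1", async (): Promise<void> => {
    await sleep(20);
    log.push("first");
  });
  // The rejection is observed here, by the caller, not by the queue.
  const failingOutcome = demoLock("page-1", async (): Promise<void> => {
    log.push("second");
    throw new Error("boom");
  }).then(
    () => "resolved",
    () => "rejected",
  );
  const last = demoLock("page-1", async (): Promise<void> => {
    log.push("third");
  });
  await slow;
  await last;
  return { log, failingOutcome: await failingOutcome };
}

const demoResult = runDemo();
void demoResult.then((result) => console.log(result.log, result.failingOutcome));
```

The second operation starts only after the slow first one settles, and the
third still runs even though the second rejected.
*/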


@@ -0,0 +1,477 @@
/**
* Pure, network-free transform primitives for a ProseMirror/TipTap document
* tree, plus one higher-level orchestration (commentsToFootnotes).
*
* A ProseMirror node here is a plain JSON object of the shape produced by
* Docmost: `{ type, attrs?, content?, text?, marks? }`. Children live in the
* `content` array; callouts, tables, lists all hold their children in
* `content`, so a single recursive walk reaches them all.
*
* Conventions (matching node-ops.ts):
* - functions that produce a new document deep-clone their input and return a
* `{ doc, ... }` object; the caller's objects are never mutated.
* - functions are defensively null-safe.
* - `marks` arrays are preserved verbatim when fragments are split/reordered.
*/
import { blockPlainText } from "./node-ops.js";
/** Deep-clone a JSON-serializable value without mutating the original. */
function clone<T>(value: T): T {
if (typeof structuredClone === "function") {
return structuredClone(value);
}
// Fallback for environments without structuredClone.
return JSON.parse(JSON.stringify(value)) as T;
}
/** True if `value` is a non-null object (and not an array). */
function isObject(value: any): value is Record<string, any> {
return value != null && typeof value === "object" && !Array.isArray(value);
}
/**
* Plain text of a node (re-export of node-ops' blockPlainText so transform
* authors have a single import surface). Recurses through nested content.
*/
export function blockText(node: any): string {
return blockPlainText(node);
}
/**
* Depth-first visit of every node in the tree, including the root and the
* nested content of callouts, tables, lists, etc. `fn` is called once per node.
* Null-safe: a nullish or non-object node is ignored.
*/
export function walk(node: any, fn: (node: any) => void): void {
if (!isObject(node)) return;
fn(node);
if (Array.isArray(node.content)) {
for (const child of node.content) {
walk(child, fn);
}
}
}
/**
* Find the FIRST node (depth-first) matching `predicate`, anywhere in the tree.
* Works even when the node carries no `attrs.id` (it searches the raw tree, not
* an id index). Returns the live node reference inside `doc` (NOT a clone), or
* null when nothing matches. Typical use: `getList(doc, n => n.type ===
* "orderedList")`.
*/
export function getList(
doc: any,
predicate: (node: any) => boolean,
): any | null {
let found: any | null = null;
walk(doc, (node) => {
if (found == null && predicate(node)) {
found = node;
}
});
return found;
}
/** Options for insertMarkerAfter. */
export interface InsertMarkerOptions {
/**
* Limit the search to TOP-LEVEL blocks with index < beforeBlock. Used to keep
* footnote markers in the body and out of the notes section.
*/
beforeBlock?: number;
}
/**
* Insert `marker` as a PLAIN (unmarked) text run right after the first
* occurrence of `anchor`.
*
* The text run that contains the END of the anchor is SPLIT at the anchor end,
* so all existing marks (links, bold, ...) on the surrounding text are
* preserved, while the inserted marker run carries NO marks. The marker is
* inserted as a leading-space-padded run (`" " + marker`) so it visually
* separates from the preceding word.
*
* The anchor is matched against the concatenated plain text of each top-level
* block (so an anchor that spans several text/mark runs still matches). The
* insertion happens inside the inline content array that holds the anchor's
* final character.
*
* Operates on a clone of `doc`; returns `{ doc, inserted }`. `inserted` is
* false when the anchor text was not found in any in-scope block.
*/
export function insertMarkerAfter(
doc: any,
anchor: string,
marker: string,
opts: InsertMarkerOptions = {},
): { doc: any; inserted: boolean } {
const out = clone(doc);
if (!isObject(out) || !Array.isArray(out.content) || !anchor) {
return { doc: out, inserted: false };
}
const limit =
typeof opts.beforeBlock === "number"
? Math.min(opts.beforeBlock, out.content.length)
: out.content.length;
for (let b = 0; b < limit; b++) {
const block = out.content[b];
if (!isObject(block)) continue;
// Quick reject: skip blocks whose plain text cannot contain the anchor.
// The plain text is computed once and reused for the anchor offset below.
const plain = blockPlainText(block);
const anchorStart = plain.indexOf(anchor);
if (anchorStart < 0) continue;
// Walk the inline content arrays inside this block, tracking a running
// character offset so we can locate the inline array + text run that holds
// the END of the anchor's first occurrence.
let inserted = false;
let offset = 0; // characters of plain text seen so far in this block
const anchorEnd = anchorStart + anchor.length;
// Recurse into inline-bearing containers (paragraph, heading, table cell,
// callout child paragraphs, ...). We only split inside an array of inline
// nodes (text/inline atoms); the FIRST array whose cumulative range covers
// anchorEnd receives the split + marker.
const visit = (container: any): void => {
if (inserted || !isObject(container) || !Array.isArray(container.content)) {
return;
}
const inline = container.content;
// Detect whether this array is an inline array (contains text nodes).
const hasText = inline.some(
(n: any) => isObject(n) && n.type === "text",
);
if (hasText) {
for (let i = 0; i < inline.length; i++) {
const n = inline[i];
const len = isObject(n) ? blockPlainText(n).length : 0;
const runStart = offset;
const runEnd = offset + len;
// The run that contains the anchor end (anchorEnd lands inside this
// run, i.e. runStart < anchorEnd <= runEnd) is the split point.
if (
!inserted &&
isObject(n) &&
n.type === "text" &&
typeof n.text === "string" &&
anchorEnd > runStart &&
anchorEnd <= runEnd
) {
const cut = anchorEnd - runStart; // split index within this text run
const before = n.text.slice(0, cut);
const after = n.text.slice(cut);
const marks = Array.isArray(n.marks) ? n.marks : [];
const parts: any[] = [];
if (before.length > 0) {
parts.push({ ...n, text: before, marks: [...marks] });
}
// Marker is a PLAIN run: no marks copied. Leading space separates it.
parts.push({ type: "text", text: " " + marker });
if (after.length > 0) {
parts.push({ ...n, text: after, marks: [...marks] });
}
inline.splice(i, 1, ...parts);
inserted = true;
return;
}
offset = runEnd;
}
} else {
// Not an inline array: recurse into children (e.g. callout -> paragraph).
for (const child of inline) {
visit(child);
if (inserted) return;
}
}
};
visit(block);
if (inserted) {
return { doc: out, inserted: true };
}
// If the block matched in plain text but we could not split (e.g. anchor
// lands inside an atom), fall through to the next block rather than failing.
}
return { doc: out, inserted: false };
}
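// Illustrative sketch (not part of the original module): the run-splitting step
// of insertMarkerAfter in isolation. `splitRunExample` is a hypothetical name.
// A text run is cut at `cut`; both halves keep a copy of the original marks,
// while the marker run placed between them carries none.
function splitRunExample(
  run: { type: string; text: string; marks?: any[] },
  cut: number,
  marker: string,
): any[] {
  const marks = Array.isArray(run.marks) ? run.marks : [];
  const before = run.text.slice(0, cut);
  const after = run.text.slice(cut);
  const parts: any[] = [];
  if (before.length > 0) parts.push({ ...run, text: before, marks: [...marks] });
  parts.push({ type: "text", text: " " + marker }); // plain run, no marks
  if (after.length > 0) parts.push({ ...run, text: after, marks: [...marks] });
  return parts;
}
// Usage: splitRunExample({ type: "text", text: "bold word", marks: [{ type: "bold" }] }, 4, "[1]")
// yields three runs: "bold" (bold), " [1]" (plain), " word" (bold).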
/**
* In the disclaimer callout, replace a `[1]…[K]` range marker with `[1]…[n]`.
*
* Docmost translations use a callout that states the footnote range, e.g.
* "[1]…[5]". When the number of notes changes, this rewrites the trailing
* number of any `[1]…[K]` (or `[1]...[K]`, ASCII ellipsis) occurrence found in a
* callout's text nodes to `[1]…[n]`. Operates on a clone; returns
* `{ doc, changed }` where `changed` is the number of text nodes rewritten.
*/
export function setCalloutRange(
doc: any,
n: number,
): { doc: any; changed: number } {
const out = clone(doc);
let changed = 0;
// Match "[1]" + (… or ...) + "[<digits>]"; rewrite the last number to n.
const rangeRe = /(\[1\]\s*(?:…|\.\.\.)\s*\[)\d+(\])/g;
walk(out, (node) => {
if (node.type === "callout") {
walk(node, (inner) => {
if (
inner.type === "text" &&
typeof inner.text === "string" &&
rangeRe.test(inner.text)
) {
rangeRe.lastIndex = 0;
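// Replacer function instead of a "$1${n}$2" pattern: the digits of `n` can then
// never be parsed as part of a "$nn" group reference.
inner.text = inner.text.replace(rangeRe, (_m: string, open: string, close: string) => `${open}${n}${close}`);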
changed++;
}
rangeRe.lastIndex = 0;
});
}
});
return { doc: out, changed };
}
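// Illustrative sketch (not part of the original module): the range rewrite that
// setCalloutRange applies to each callout text node, shown on a plain string.
// `rewriteRangeExample` is a hypothetical name. A replacer function is used so
// the new number can never be read as part of a "$1"-style group reference.
function rewriteRangeExample(text: string, n: number): string {
  const re = /(\[1\]\s*(?:…|\.\.\.)\s*\[)\d+(\])/g;
  return text.replace(re, (_m: string, open: string, close: string) => `${open}${n}${close}`);
}
// Usage: rewriteRangeExample("See notes [1]…[5].", 7) gives "See notes [1]…[7].".
// The ASCII form "[1]...[5]" is rewritten the same way; text without a range is left alone.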
/**
* Generate a short random id for a new block's `attrs.id`. Docmost uses nanoid;
* a base36 random string is sufficient here (uniqueness within one document).
*/
function freshId(): string {
return (
Math.random().toString(36).slice(2, 12) +
Math.random().toString(36).slice(2, 6)
);
}
/**
* Wrap inline ProseMirror nodes in a list item:
* { type:"listItem", content:[{ type:"paragraph", attrs:{id}, content: inlineNodes }] }
* with a fresh random block id on the paragraph. The inline nodes are cloned so
* the result shares no references with the caller's input.
*/
export function noteItem(inlineNodes: any[]): any {
const content = Array.isArray(inlineNodes) ? clone(inlineNodes) : [];
return {
type: "listItem",
content: [
{
type: "paragraph",
attrs: { id: freshId() },
content,
},
],
};
}
/**
* Convert a comment's markdown (e.g. `**Lead.** body...`) into inline
* ProseMirror nodes.
*
* A leading `комментарий: ` (case-insensitive) or `N. ` numeric prefix is
* stripped first. Then a minimal bold-split is applied: a leading
* `**bold lead**` run becomes a text node with a bold mark, and the remainder
* becomes a plain text node. This keeps the conversion synchronous (the
* transform sandbox runs synchronously) and dependency-free; the existing
* async markdownToProseMirror is intentionally NOT used here.
*/
export function mdToInlineNodes(markdown: string): any[] {
let md = typeof markdown === "string" ? markdown : "";
// Strip a leading "комментарий: " prefix (case-insensitive) or a "N. " prefix.
md = md.replace(/^\s*комментарий\s*:\s*/i, "");
md = md.replace(/^\s*\d+\.\s+/, "");
md = md.trim();
if (md === "") return [];
const nodes: any[] = [];
// Leading bold lead: **...** at the very start. The second group captures the
// whitespace that separated the lead from the rest, so it can be preserved.
const leadMatch = /^\*\*([^*]+)\*\*(\s*)/.exec(md);
if (leadMatch) {
const [whole, leadText, spacing] = leadMatch;
nodes.push({
type: "text",
text: leadText,
marks: [{ type: "bold" }],
});
const rest = md.slice(whole.length);
if (rest.length > 0) {
nodes.push({ type: "text", text: spacing + rest });
}
return nodes;
}
// No bold lead: split any inline **bold** spans into bold text nodes and emit
// the text around them as plain text nodes.
return splitInlineBold(md);
}
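// Illustrative sketch (not part of the original module): the prefix stripping
// that mdToInlineNodes performs before the bold split, shown on its own.
// `stripNotePrefixExample` is a hypothetical name.
function stripNotePrefixExample(markdown: string): string {
  return markdown
    .replace(/^\s*комментарий\s*:\s*/i, "") // leading "комментарий: " label, any case
    .replace(/^\s*\d+\.\s+/, "") // leading "N. " list number
    .trim();
}
// Usage: stripNotePrefixExample("Комментарий: 3. **Lead.** body") gives "**Lead.** body".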
/**
* Split a string with inline `**bold**` spans into text nodes, bolding the
* spans. Used as the no-lead fallback in mdToInlineNodes.
*/
function splitInlineBold(text: string): any[] {
const nodes: any[] = [];
const re = /\*\*([^*]+)\*\*/g;
let last = 0;
let m: RegExpExecArray | null;
while ((m = re.exec(text)) !== null) {
if (m.index > last) {
nodes.push({ type: "text", text: text.slice(last, m.index) });
}
nodes.push({ type: "text", text: m[1], marks: [{ type: "bold" }] });
last = m.index + m[0].length;
}
if (last < text.length) {
nodes.push({ type: "text", text: text.slice(last) });
}
return nodes.length > 0 ? nodes : [{ type: "text", text }];
}
/** Options for commentsToFootnotes. */
export interface CommentsToFootnotesOptions {
/** Heading text under which the notes orderedList lives. */
notesHeading?: string;
}
/** A comment shape as returned by DocmostClient.listComments. */
export interface FootnoteComment {
id: string;
content: string;
selection?: string | null;
[k: string]: any;
}
/**
* Turn inline comments into numbered footnotes.
*
* For each inline comment that carries a `selection`:
* 1. insert a placeholder marker (a NUL-delimited "\u0000FN<i>\u0000"
* sentinel) right after the selection text in the BODY (before the
* notes heading);
* 2. build a note list item from the comment's markdown content.
*
* Then RENUMBER every footnote marker in the body by reading order: existing
* `[N]` markers and the new "\u0000FN<i>\u0000" placeholders are both replaced by a
* sequential `[seq]`, and the notes orderedList is reordered so each note lines
* up with its marker's reading-order position. Finally the disclaimer callout
* range is synced to the new note count.
*
* Returns `{ doc, consumed }` where `consumed` lists the ids of comments that
* were successfully anchored (their selection was found and a placeholder
* inserted). Operates on a clone of `doc`.
*/
export function commentsToFootnotes(
doc: any,
comments: FootnoteComment[],
opts: CommentsToFootnotesOptions = {},
): { doc: any; consumed: string[] } {
let working = clone(doc);
const notesHeading = opts.notesHeading ?? "Примечания переводчика";
const top: any[] = Array.isArray(working.content) ? working.content : [];
const notesIdx = top.findIndex(
(n) => isObject(n) && n.type === "heading" && blockText(n).trim() === notesHeading,
);
if (notesIdx < 0) {
throw new Error(`heading "${notesHeading}" not found`);
}
// The notes orderedList lives at or after the heading.
const notesList = top
.slice(notesIdx)
.find((n) => isObject(n) && n.type === "orderedList");
if (!notesList) {
throw new Error("notes orderedList not found");
}
const consumed: string[] = [];
const noteByPh = new Map<string, any>();
(Array.isArray(comments) ? comments : []).forEach((c, i) => {
if (!c || !c.selection) return;
// Collision-proof sentinel delimited by NUL control chars, which never occur
// in real Docmost prose — so the renumber regex below cannot mistake any body
// text (e.g. "Press F1 for help", model "FN2") for a placeholder. The NUL is
// transient: the placeholder round-trips within this function (insertMarkerAfter
// inserts it, the renumber pass replaces it with "[N]"), so it never persists
// in a returned/pushed document.
const ph = `\u0000FN${i}\u0000`;
// insertMarkerAfter returns a NEW cloned doc; reassign `working` and refresh
// the `top` / `notesList` references that point into it.
const r = insertMarkerAfter(working, c.selection.trimEnd(), ph, {
beforeBlock: notesIdx,
});
if (!r.inserted) return;
working = r.doc;
noteByPh.set(ph, noteItem(mdToInlineNodes(c.content)));
consumed.push(c.id);
});
// Re-resolve references into the (possibly re-cloned) working doc.
const top2: any[] = Array.isArray(working.content) ? working.content : [];
const notesList2 = top2
.slice(notesIdx)
.find((n) => isObject(n) && n.type === "orderedList");
if (!notesList2) {
throw new Error("notes orderedList not found");
}
const oldNotes: any[] = Array.isArray(notesList2.content)
? notesList2.content
: [];
const newNotes: any[] = [];
let seq = 0;
// Match either an existing "[N]" marker or a NUL-delimited "\u0000FN<i>\u0000"
// placeholder, in reading order across the body (blocks before the notes heading).
const re = /\[(\d+)\]|\u0000FN(\d+)\u0000/g;
// Same range regex setCalloutRange uses to detect the disclaimer callout's
// "[1]…[K]" range; used here to decide whether a top-level callout is the
// disclaimer (skip) or an ordinary callout (renumber normally).
const disclaimerRangeRe = /(\[1\]\s*(?:…|\.\.\.)\s*\[)\d+(\])/;
for (let i = 0; i < notesIdx; i++) {
// Skip ONLY the disclaimer callout: its "[1]…[K]" range is NOT a footnote
// marker and is synced separately by setCalloutRange. Renumbering it here
// would consume note slots and corrupt the sequence. Other top-level
// callouts may carry legitimate "[N]" body markers and are renumbered.
if (
isObject(top2[i]) &&
top2[i].type === "callout" &&
disclaimerRangeRe.test(blockText(top2[i]))
) {
continue;
}
walk(top2[i], (node) => {
if (node.type !== "text" || typeof node.text !== "string") return;
node.text = node.text.replace(re, (_m: string, oldNum?: string, phIdx?: string) => {
if (oldNum != null) {
const note = oldNotes[Number(oldNum) - 1];
// Every existing body marker MUST map to a real note. An out-of-range
// marker means the document is internally inconsistent; fail loudly
// rather than silently dropping the note and desyncing the callout.
if (note === undefined) {
throw new Error(
`footnote [${oldNum}] has no matching note (notes list has ${oldNotes.length} items); document is inconsistent`,
);
}
newNotes.push(note);
} else {
newNotes.push(noteByPh.get(`\u0000FN${phIdx}\u0000`));
}
return `[${++seq}]`;
});
});
}
// Reorder the notes list IN PLACE on `working` first, THEN sync the callout
// range. setCalloutRange clones `working`, so the reordered notes (mutated
// before the clone) are carried into its result automatically. No null-filter
// here: marker count and note count must stay exactly equal (the out-of-range
// guard above guarantees no undefined entry is ever pushed).
notesList2.content = newNotes;
const synced = setCalloutRange(working, notesList2.content.length);
return { doc: synced.doc, consumed };
}
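// Illustrative sketch (not part of the original module): the reading-order
// renumbering at the heart of commentsToFootnotes, reduced to plain strings.
// `renumberExample` is a hypothetical name; `notes` stands in for the notes list
// and `pending` maps a placeholder index to the note built from a new comment.
function renumberExample(
  body: string,
  notes: string[],
  pending: Map<number, string>,
): { body: string; notes: string[] } {
  const reordered: string[] = [];
  let seq = 0;
  const re = /\[(\d+)\]|\u0000FN(\d+)\u0000/g;
  const out = body.replace(re, (_m: string, oldNum?: string, phIdx?: string) => {
    // An existing "[N]" marker pulls its old note; a placeholder pulls the new one.
    const note = oldNum != null ? notes[Number(oldNum) - 1] : pending.get(Number(phIdx));
    if (note === undefined) {
      throw new Error(`marker ${JSON.stringify(_m)} has no matching note`);
    }
    reordered.push(note);
    return `[${++seq}]`;
  });
  return { body: out, notes: reordered };
}
// Usage: with body "A[1] B<NUL>FN0<NUL> C[2]" (where <NUL> is the \u0000 character),
// notes ["n1", "n2"] and pending {0 -> "new"}, the result is body "A[1] B[2] C[3]"
// and notes ["n1", "new", "n2"].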

48
packages/mcp/src/stdio.ts Normal file

@@ -0,0 +1,48 @@
#!/usr/bin/env node
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { createDocmostMcpServer } from "./index.js";
// Standalone stdio entrypoint. This restores the original behavior of the
// package when run as a CLI (`docmost-mcp`): it reads credentials from the
// environment and serves the MCP protocol over stdin/stdout. The factory in
// index.ts stays side-effect-free; all the process/transport lifecycle lives
// here.
const API_URL = process.env.DOCMOST_API_URL;
const EMAIL = process.env.DOCMOST_EMAIL;
const PASSWORD = process.env.DOCMOST_PASSWORD;
if (!API_URL || !EMAIL || !PASSWORD) {
console.error(
"Error: DOCMOST_API_URL, DOCMOST_EMAIL, and DOCMOST_PASSWORD environment variables are required.",
);
process.exit(1);
}
async function run() {
// Global safety nets so a stray rejection/exception cannot silently kill
// the stdio server. Per-tool errors still flow through the SDK and are not
// affected by these handlers; these only catch errors raised OUTSIDE a tool
// call (e.g. a transient ws/collab socket "error" event). Such errors must
// NOT tear down the whole stdio server, so we log only and keep running.
// Genuine startup failures are still fatal via run().catch(...) below.
process.on("unhandledRejection", (reason) => {
console.error("Unhandled promise rejection:", reason);
});
process.on("uncaughtException", (error) => {
console.error("Uncaught exception:", error);
});
const server = createDocmostMcpServer({
apiUrl: API_URL!,
email: EMAIL!,
password: PASSWORD!,
});
const transport = new StdioServerTransport();
await server.connect(transport);
}
run().catch((error) => {
console.error("Fatal error running server:", error);
process.exit(1);
});

474
packages/mcp/test-e2e.mjs Normal file

@@ -0,0 +1,474 @@
// End-to-end test of the docmost-mcp client against a live Docmost server.
// Creates a throwaway page, exercises every code path, cleans up after itself.
// Usage: DOCMOST_API_URL=... DOCMOST_EMAIL=... DOCMOST_PASSWORD=... node test-e2e.mjs
import { DocmostClient } from "./build/client.js";
import axios from "axios";
import { writeFileSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { deflateSync } from "node:zlib";
const API = process.env.DOCMOST_API_URL;
if (!API || !process.env.DOCMOST_EMAIL || !process.env.DOCMOST_PASSWORD) {
console.error("Set DOCMOST_API_URL, DOCMOST_EMAIL and DOCMOST_PASSWORD env variables.");
process.exit(2);
}
const APP = API.replace(/\/api\/?$/, "");
const client = new DocmostClient(API, process.env.DOCMOST_EMAIL, process.env.DOCMOST_PASSWORD);
let failed = 0;
const check = (name, cond, extra = "") => {
console.log(`${cond ? "OK " : "FAIL"} ${name}${extra ? " — " + extra : ""}`);
if (!cond) failed++;
};
// Minimal solid-color PNG encoder using Node built-ins only (no dependencies).
// Returns a valid PNG buffer for a 1x1 image of the given RGB color.
const crc32 = (buf) => {
let crc = 0xffffffff;
for (let i = 0; i < buf.length; i++) {
crc ^= buf[i];
for (let k = 0; k < 8; k++) crc = crc & 1 ? (crc >>> 1) ^ 0xedb88320 : crc >>> 1;
}
return (crc ^ 0xffffffff) >>> 0;
};
const pngChunk = (type, data) => {
const len = Buffer.alloc(4);
len.writeUInt32BE(data.length, 0);
const typeBuf = Buffer.from(type, "ascii");
const crc = Buffer.alloc(4);
crc.writeUInt32BE(crc32(Buffer.concat([typeBuf, data])), 0);
return Buffer.concat([len, typeBuf, data, crc]);
};
const makePng = (r, g, b) => {
const sig = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const ihdr = Buffer.alloc(13);
ihdr.writeUInt32BE(1, 0); // width
ihdr.writeUInt32BE(1, 4); // height
ihdr[8] = 8; // bit depth
ihdr[9] = 2; // color type: truecolor RGB
ihdr[10] = 0; // compression
ihdr[11] = 0; // filter
ihdr[12] = 0; // interlace
// One scanline: filter byte 0 followed by one RGB pixel.
const raw = Buffer.from([0, r, g, b]);
const idat = deflateSync(raw);
return Buffer.concat([
sig,
pngChunk("IHDR", ihdr),
pngChunk("IDAT", idat),
pngChunk("IEND", Buffer.alloc(0)),
]);
};
const MD = `:::info
**Тестовый callout.** Он должен стать узлом callout, а не blockquote.
:::
Первый абзац с **жирным** и [ссылкой](https://example.com). Маркер тут [1] стоит.
## Раздел два
| Колонка А | Колонка Б |
| --- | --- |
| раз | два |
| три | четыре |
Последний абзац со словом БУКВОЕД для замены.
`;
async function main() {
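const spaces = await client.getSpaces();
// Fail early with a clear message when the account has no space to test in.
if (!spaces?.[0]?.id) throw new Error("No space available for the e2e test page.");
const spaceId = spaces[0].id;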
let pageId = null;
try {
// 1. create_page: title with spaces must survive (regression check for the old underscores bug)
const created = await client.createPage("Тест апгрейда MCP сервера", MD, spaceId);
pageId = created.data.id;
check("create_page: title keeps spaces", created.data.title === "Тест апгрейда MCP сервера", created.data.title);
check("create_page: slugId exposed", typeof created.data.slugId === "string" && created.data.slugId.length > 0, created.data.slugId);
// 2. get_page_json: raw ProseMirror with callout + table
const pj = await client.getPageJson(pageId);
const types = pj.content.content.map((n) => n.type);
check("get_page_json: callout node present", types.includes("callout"), types.join(","));
check("get_page_json: table node present", types.includes("table"));
check("get_page_json: slugId present", !!pj.slugId);
// 3. edit_page_text: surgical replace, ids preserved
const idsBefore = JSON.stringify(
pj.content.content.filter((n) => n.attrs?.id).map((n) => n.attrs.id),
);
const editRes = await client.editPageText(pageId, [
{ find: "БУКВОЕД", replace: "КНИГОЛЮБ" },
{ find: "[1]", replace: "[42]" },
]);
check("edit_page_text: both edits applied", editRes.edits.every((e) => e.replacements === 1));
await new Promise((r) => setTimeout(r, 16000)); // wait for server persistence
const pj2 = await client.getPageJson(pageId);
const text2 = JSON.stringify(pj2.content);
check("edit_page_text: replacement visible", text2.includes("КНИГОЛЮБ") && text2.includes("[42]"));
check("edit_page_text: old text gone", !text2.includes("БУКВОЕД"));
const idsAfter = JSON.stringify(
pj2.content.content.filter((n) => n.attrs?.id).map((n) => n.attrs.id),
);
check("edit_page_text: block ids preserved", idsBefore === idsAfter);
check("edit_page_text: callout survived", JSON.stringify(pj2.content).includes('"callout"'));
check("edit_page_text: table survived", pj2.content.content.some((n) => n.type === "table"));
// 4. error reporting: ambiguous and missing finds
let err1 = "";
try { await client.editPageText(pageId, [{ find: "Колонка", replace: "X" }]); } catch (e) { err1 = e.message; }
check("edit_page_text: ambiguous match rejected", err1.includes("matches"), err1);
let err2 = "";
try { await client.editPageText(pageId, [{ find: "НЕСУЩЕСТВУЮЩЕЕ", replace: "X" }]); } catch (e) { err2 = e.message; }
check("edit_page_text: missing text reported", err2.includes("not found"), err2);
// 5. update_page (markdown): table + callout must survive the re-import
await client.updatePage(pageId, MD + "\nДобавленный абзац.\n");
await new Promise((r) => setTimeout(r, 16000));
const pj3 = await client.getPageJson(pageId);
const types3 = pj3.content.content.map((n) => n.type);
check("update_page md: callout survives re-import", types3.includes("callout"), types3.join(","));
check("update_page md: table survives re-import", types3.includes("table"));
const tableNode = pj3.content.content.find((n) => n.type === "table");
const cellText = JSON.stringify(tableNode);
check("update_page md: table cells intact", cellText.includes("четыре") && cellText.includes("Колонка А"));
// 6. update_page_json: lossless write round-trip
pj3.content.content.push({
type: "paragraph",
attrs: { id: "testidjsonpush", indent: 0, textAlign: null },
content: [{ type: "text", text: "Абзац, добавленный через update_page_json." }],
});
await client.updatePageJson(pageId, pj3.content);
await new Promise((r) => setTimeout(r, 16000));
const pj4 = await client.getPageJson(pageId);
const lastNode = pj4.content.content[pj4.content.content.length - 1];
check("update_page_json: paragraph appended", JSON.stringify(pj4.content).includes("добавленный через update_page_json"));
check("update_page_json: custom node id preserved", lastNode.attrs?.id === "testidjsonpush", lastNode.attrs?.id);
// 6b. images: upload / insert / replace (clean src, fresh attachment on replace)
const pngA = join(tmpdir(), `mcp-e2e-img-a-${Date.now()}.png`);
const pngB = join(tmpdir(), `mcp-e2e-img-b-${Date.now()}.png`);
writeFileSync(pngA, makePng(255, 0, 0)); // red
writeFileSync(pngB, makePng(0, 0, 255)); // blue (a DIFFERENT valid PNG)
try {
// Independent login to fetch file bytes with the same cookie the editor uses.
const login = await axios.post(
`${API}/auth/login`,
{ email: process.env.DOCMOST_EMAIL, password: process.env.DOCMOST_PASSWORD },
{ validateStatus: () => true },
);
const token = (login.headers["set-cookie"] || [])
.find((c) => c.startsWith("authToken="))
?.split(";")[0]
.split("=")[1];
const fetchFile = (src) =>
axios.get(`${APP}${src}`, {
headers: { Cookie: `authToken=${token}` },
responseType: "arraybuffer",
validateStatus: () => true,
});
// insert_image: append the first PNG, src must be clean (no ?v=) and fetchable.
const ins = await client.insertImage(pageId, pngA);
check("insert_image: src has no ?v= cache-buster", !ins.src.includes("?v="), ins.src);
const fileA = await fetchFile(ins.src);
check("insert_image: file fetch returns 200", fileA.status === 200, `status=${fileA.status}`);
check(
"insert_image: content-type is image/*",
String(fileA.headers["content-type"] || "").startsWith("image/"),
String(fileA.headers["content-type"]),
);
await new Promise((r) => setTimeout(r, 16000));
const pjImg = await client.getPageJson(pageId);
const findImage = (nodes, id) => {
for (const n of nodes || []) {
if (n.type === "image" && (!id || n.attrs?.attachmentId === id)) return n;
const found = findImage(n.content, id);
if (found) return found;
}
return null;
};
const imgNode = findImage(pjImg.content.content);
const oldAttachmentId = imgNode?.attrs?.attachmentId;
check("insert_image: image node present after persist", !!oldAttachmentId, oldAttachmentId);
// replace_image: must create a NEW attachment with a clean, fetchable URL.
// The 200 fetch is the assertion that catches the in-place-overwrite HTTP 500 regression.
const rep = await client.replaceImage(pageId, oldAttachmentId, pngB);
check("replace_image: new attachment id differs from old", rep.newAttachmentId !== oldAttachmentId, `${oldAttachmentId} -> ${rep.newAttachmentId}`);
check("replace_image: src has no ?v= cache-buster", !rep.src.includes("?v="), rep.src);
const fileB = await fetchFile(rep.src);
check("replace_image: new file fetch returns 200", fileB.status === 200, `status=${fileB.status}`);
check(
"replace_image: new content-type is image/*",
String(fileB.headers["content-type"] || "").startsWith("image/"),
String(fileB.headers["content-type"]),
);
await new Promise((r) => setTimeout(r, 16000));
const pjImg2 = await client.getPageJson(pageId);
check("replace_image: page has new attachment id", !!findImage(pjImg2.content.content, rep.newAttachmentId), rep.newAttachmentId);
check("replace_image: old attachment id repointed away", !findImage(pjImg2.content.content, oldAttachmentId), oldAttachmentId);
} finally {
try { unlinkSync(pngA); } catch {}
try { unlinkSync(pngB); } catch {}
}
// 6c. rich formatting: callout type, task list, inline marks, table alignment,
// and literal $-pattern edits. Runs on its own throwaway page so it does not
// disturb the markdown-export assumptions of later sections.
{
const findNodes = (n, t, acc = []) => {
if (!n) return acc;
if (n.type === t) acc.push(n);
for (const ch of n.content || []) findNodes(ch, t, acc);
return acc;
};
const marksOf = (n, acc = new Set()) => {
if (!n) return acc;
for (const m of n.marks || []) acc.add(m.type);
for (const ch of n.content || []) marksOf(ch, acc);
return acc;
};
const FMD = [
":::warning", "Warning callout with СЛОВО.", ":::", "",
"- [x] done", "- [ ] todo", "",
"Marks: <mark>hl</mark> <sub>lo</sub> <sup>hi</sup>.", "",
"| L | C | R |", "|:--|:-:|--:|", "| a | b | c |", "",
"Edit anchor PRICEMARK.",
].join("\n");
const featPng = join(tmpdir(), `mcp-e2e-feat-${Date.now()}.png`);
writeFileSync(featPng, makePng(0, 255, 0));
const fp = await client.createPage("E2E features " + Date.now(), "init", spaceId);
const fid = fp.data.id;
try {
await client.updatePage(fid, FMD);
await new Promise((r) => setTimeout(r, 16000));
const fj = (await client.getPageJson(fid)).content;
check("feature: callout type 'warning' preserved (was coerced to info)", findNodes(fj, "callout").some((n) => n.attrs?.type === "warning"), JSON.stringify(findNodes(fj, "callout").map((n) => n.attrs?.type)));
check("feature: task list imported (taskList + 2 taskItems)", findNodes(fj, "taskList").length >= 1 && findNodes(fj, "taskItem").length === 2, `tl=${findNodes(fj, "taskList").length} ti=${findNodes(fj, "taskItem").length}`);
check("feature: task checked states preserved", findNodes(fj, "taskItem").some((n) => n.attrs?.checked === true) && findNodes(fj, "taskItem").some((n) => n.attrs?.checked === false));
const mk = [...marksOf(fj)];
check("feature: highlight/subscript/superscript marks imported", ["highlight", "subscript", "superscript"].every((m) => mk.includes(m)), mk.join(","));
check("feature: table cell alignment imported", JSON.stringify(findNodes(fj, "tableHeader").map((n) => n.attrs?.align)) === '["left","center","right"]', JSON.stringify(findNodes(fj, "tableHeader").map((n) => n.attrs?.align)));
const fmd = (await client.getPage(fid)).data.content;
check("feature: md export emits task checkboxes", fmd.includes("- [x]") && fmd.includes("- [ ]"));
check("feature: md export emits table alignment markers", /:--|:-:|--:/.test(fmd));
await client.editPageText(fid, [{ find: "PRICEMARK", replace: "$& costs $100" }]);
await new Promise((r) => setTimeout(r, 16000));
const ftext = JSON.stringify((await client.getPageJson(fid)).content);
check("feature: edit_page_text inserts $-pattern literally (no $& expansion)", ftext.includes("$& costs $100") && !ftext.includes("PRICEMARK costs"));
let badThrew = false;
try { await client.replaceImage(fid, "00000000-0000-0000-0000-000000000000", featPng); } catch (e) { badThrew = /no image with attachmentId/.test(e.message); }
check("feature: replace_image with unknown id throws (no orphan upload)", badThrew);
} finally {
try { await client.deletePage(fid); } catch {}
try { unlinkSync(featPng); } catch {}
}
}
// 6d. node ops: patch / insert / delete a block by id on a throwaway page.
// Three paragraphs are written with KNOWN ids via update_page_json so the
// ids can be targeted directly; each op is verified via getPageJson after
// the standard 16s persistence wait.
{
const np = await client.createPage("E2E node-ops " + Date.now(), "init", spaceId);
const nid = np.data.id;
try {
const mkPara = (id, text) => ({
type: "paragraph",
attrs: { id, indent: 0, textAlign: null },
content: [{ type: "text", text }],
});
// Seed three paragraphs with known ids.
await client.updatePageJson(nid, {
type: "doc",
content: [
mkPara("nodeops-a", "Alpha paragraph."),
mkPara("nodeops-b", "Bravo paragraph."),
mkPara("nodeops-c", "Charlie paragraph."),
],
});
await new Promise((r) => setTimeout(r, 16000));
// Read back the ids the server actually assigned.
const seed = (await client.getPageJson(nid)).content;
const seedIds = seed.content.map((n) => n.attrs?.id);
check("node_ops: three seed paragraphs present", seed.content.length === 3, seedIds.join(","));
const [idA, idB, idC] = seedIds;
// patchNode: replace the middle paragraph; siblings' ids must be unchanged.
await client.patchNode(nid, idB, mkPara(idB, "Bravo PATCHED."));
await new Promise((r) => setTimeout(r, 16000));
const afterPatch = (await client.getPageJson(nid)).content;
const patchText = JSON.stringify(afterPatch);
check("node_ops: patchNode applied new text", patchText.includes("Bravo PATCHED.") && !patchText.includes("Bravo paragraph."));
const patchIds = afterPatch.content.map((n) => n.attrs?.id);
check("node_ops: patchNode kept sibling ids", patchIds[0] === idA && patchIds[2] === idC, patchIds.join(","));
// insertNode: place a new block after the first paragraph.
await client.insertNode(
nid,
mkPara("nodeops-ins", "Inserted paragraph."),
{ position: "after", anchorNodeId: idA },
);
await new Promise((r) => setTimeout(r, 16000));
const afterIns = (await client.getPageJson(nid)).content;
const insIds = afterIns.content.map((n) => n.attrs?.id);
const insText = afterIns.content.map((n) => JSON.stringify(n.content)).join("|");
check("node_ops: insertNode added a block", afterIns.content.length === 4 && insText.includes("Inserted paragraph."));
check("node_ops: insertNode placed block right after anchor", insIds[0] === idA && insIds[1] !== idB && insIds[2] === idB, insIds.join(","));
// deleteNode: remove the last (Charlie) paragraph.
await client.deleteNode(nid, idC);
await new Promise((r) => setTimeout(r, 16000));
const afterDel = (await client.getPageJson(nid)).content;
const delText = JSON.stringify(afterDel);
check("node_ops: deleteNode removed the block", !delText.includes("Charlie paragraph.") && !afterDel.content.some((n) => n.attrs?.id === idC));
} finally {
try { await client.deletePage(nid); } catch {}
}
}
// 6e. rename_page: title-only update must leave the content untouched.
{
const rp = await client.createPage("E2E rename before " + Date.now(), "Rename body marker RENAMEBODY.", spaceId);
const rid = rp.data.id;
try {
const beforeJson = (await client.getPageJson(rid)).content;
const beforeContent = JSON.stringify(beforeJson);
const newTitle = "E2E rename AFTER " + Date.now();
const rr = await client.renamePage(rid, newTitle);
check("rename_page: returns success+title", rr.success === true && rr.title === newTitle, JSON.stringify(rr));
await new Promise((r) => setTimeout(r, 16000));
const afterJson = await client.getPageJson(rid);
check("rename_page: title changed", afterJson.title === newTitle, afterJson.title);
check("rename_page: content unchanged", JSON.stringify(afterJson.content) === beforeContent && beforeContent.includes("RENAMEBODY"));
const afterMd = (await client.getPage(rid)).data;
check("rename_page: get_page reflects new title", afterMd.title === newTitle, afterMd.title);
} finally {
try { await client.deletePage(rid); } catch {}
}
}
// 6f. update_page_json title-only: omitting content updates the title and
// leaves the body intact; supplying neither content nor title throws.
{
const up = await client.createPage("E2E upj-title before " + Date.now(), "Title-only body marker UPJTITLEBODY.", spaceId);
const uid = up.data.id;
try {
const beforeContent = JSON.stringify((await client.getPageJson(uid)).content);
const newTitle = "E2E upj-title AFTER " + Date.now();
const ur = await client.updatePageJson(uid, undefined, newTitle);
check("update_page_json title-only: succeeds", ur.success === true, JSON.stringify(ur));
await new Promise((r) => setTimeout(r, 16000));
const afterJson = await client.getPageJson(uid);
check("update_page_json title-only: title updated", afterJson.title === newTitle, afterJson.title);
check("update_page_json title-only: content intact", JSON.stringify(afterJson.content) === beforeContent && beforeContent.includes("UPJTITLEBODY"));
let upjErr = "";
try { await client.updatePageJson(uid); } catch (e) { upjErr = e.message; }
check("update_page_json: neither content nor title throws", upjErr.includes("nothing to update"), upjErr);
} finally {
try { await client.deletePage(uid); } catch {}
}
}
// 6g. copy_page_content: B's body becomes a copy of A's body, server-side,
// while B's title/slugId stay put. Both pages are throwaways.
{
let aid = null;
let bid = null;
try {
const aPage = await client.createPage("E2E copy SOURCE " + Date.now(), "Source marker COPYSOURCE only here.\n\nSecond source paragraph.", spaceId);
aid = aPage.data.id;
const bPage = await client.createPage("E2E copy TARGET " + Date.now(), "Target marker COPYTARGET only here.", spaceId);
bid = bPage.data.id;
const aJson = await client.getPageJson(aid);
const bBefore = await client.getPageJson(bid);
const bTitleBefore = bBefore.title;
const bSlugBefore = bBefore.slugId;
const aNodeCount = aJson.content.content.length;
const cr = await client.copyPageContent(aid, bid);
check("copy_page_content: returns success + node count", cr.success === true && cr.copiedNodes === aNodeCount, JSON.stringify(cr));
await new Promise((r) => setTimeout(r, 16000));
const bAfter = await client.getPageJson(bid);
const bText = JSON.stringify(bAfter.content);
check("copy_page_content: B now has A's marker", bText.includes("COPYSOURCE"));
check("copy_page_content: B's old marker gone", !bText.includes("COPYTARGET"));
check("copy_page_content: B node count equals A's", bAfter.content.content.length === aNodeCount, `${bAfter.content.content.length} vs ${aNodeCount}`);
check("copy_page_content: B title unchanged", bAfter.title === bTitleBefore, bAfter.title);
check("copy_page_content: B slugId unchanged", bAfter.slugId === bSlugBefore, bAfter.slugId);
// Source must be left untouched by the copy.
const aAfter = JSON.stringify((await client.getPageJson(aid)).content);
check("copy_page_content: source page unchanged", aAfter === JSON.stringify(aJson.content) && aAfter.includes("COPYSOURCE"));
let copyErr = "";
try { await client.copyPageContent(aid, aid); } catch (e) { copyErr = e.message; }
check("copy_page_content: self-copy rejected", copyErr.includes("same page"), copyErr);
} finally {
try { if (bid) await client.deletePage(bid); } catch {}
try { if (aid) await client.deletePage(aid); } catch {}
}
}
// 7. shares: create (idempotent), public access, list, unshare
const share = await client.sharePage(pageId);
check("share_page: returns public URL", share.publicUrl?.startsWith(`${APP}/share/`), share.publicUrl);
const share2 = await client.sharePage(pageId);
check("share_page: idempotent", share2.key === share.key);
const anon = await axios.post(`${API}/shares/page-info`, { pageId: pj4.slugId, shareId: share.key }, { validateStatus: () => true });
check("share_page: anonymous access works", anon.status === 200);
const shares = await client.listShares();
check("list_shares: contains our page", shares.some((s) => s.pageId === pageId && s.publicUrl === share.publicUrl));
const un = await client.unsharePage(pageId);
check("unshare_page: success", un.success === true);
const anon2 = await axios.post(`${API}/shares/page-info`, { pageId: pj4.slugId, shareId: share.key }, { validateStatus: () => true });
check("unshare_page: public access revoked", anon2.status !== 200, `status=${anon2.status}`);
// 8. get_page markdown round-trip sanity (table separator present)
const md = await client.getPage(pageId);
check("get_page md: table separator emitted", md.data.content.includes("| --- |"), "");
check("get_page md: callout exported as :::", md.data.content.includes(":::info"));
// 9. comments: create / list / reply / update / check_new / delete
const beforeComments = new Date(Date.now() - 1000).toISOString();
const c1 = await client.createComment(pageId, "Первый **комментарий** с [ссылкой](https://example.com).");
check("create_comment: created", !!c1.data.id, c1.data.id);
check("create_comment: markdown round-trip", c1.data.content.includes("**комментарий**"), c1.data.content);
const reply = await client.createComment(pageId, "Ответ на комментарий.", "page", undefined, c1.data.id);
check("create_comment: reply has parent", reply.data.parentCommentId === c1.data.id);
const list = await client.listComments(pageId);
check("list_comments: both visible", list.length === 2, `count=${list.length}`);
await client.updateComment(c1.data.id, "Обновлённый текст комментария.");
const got = await client.getComment(c1.data.id);
check("update_comment + get_comment: content updated", got.data.content.includes("Обновлённый"), got.data.content);
const news = await client.checkNewComments(spaceId, beforeComments, pageId);
check("check_new_comments: finds new comments in subtree", news.totalNewComments >= 2, `total=${news.totalNewComments}`);
await client.deleteComment(reply.data.id);
await client.deleteComment(c1.data.id);
const listAfter = await client.listComments(pageId);
check("delete_comment: comments removed", listAfter.length === 0, `count=${listAfter.length}`);
} finally {
if (pageId) {
await client.deletePage(pageId);
console.log("cleanup: test page deleted");
}
}
console.log(failed === 0 ? "\nALL TESTS PASSED" : `\n${failed} TESTS FAILED`);
process.exit(failed === 0 ? 0 : 1);
}
main().catch((e) => {
console.error("FATAL:", e.message);
process.exit(2);
});

@@ -0,0 +1,440 @@
// Mock-HTTP tests for the re-auth / multipart / pagination paths in
// DocmostClient that the live e2e (which always starts with a FRESH token)
// can never reach: expired-token replay, concurrent-login dedup, the
// no-infinite-loop guard, exact cookie parsing, and the paginateAll loop
// guards. A local http.createServer stands in for Docmost so everything
// stays deterministic and offline.
import { test, after } from "node:test";
import assert from "node:assert/strict";
import http from "node:http";
import { DocmostClient } from "../../build/client.js";
// Read a request body to completion (used to assert /auth/login receives the
// email/password JSON, and just to drain the stream before responding).
function readBody(req) {
return new Promise((resolve) => {
let raw = "";
req.on("data", (chunk) => {
raw += chunk;
});
req.on("end", () => resolve(raw));
});
}
// Start an http server bound to an ephemeral port and resolve once it is
// listening, returning the server plus the api base URL the client should use.
function startServer(handler) {
return new Promise((resolve) => {
const server = http.createServer(handler);
server.listen(0, "127.0.0.1", () => {
const { port } = server.address();
resolve({ server, baseURL: `http://127.0.0.1:${port}/api` });
});
});
}
function closeServer(server) {
return new Promise((resolve) => server.close(resolve));
}
// JSON helper.
function sendJson(res, status, obj, extraHeaders = {}) {
res.writeHead(status, { "Content-Type": "application/json", ...extraHeaders });
res.end(JSON.stringify(obj));
}
// Track every server so the after() hook can guarantee nothing is left open.
const openServers = [];
async function spawn(handler) {
const { server, baseURL } = await startServer(handler);
openServers.push(server);
return { server, baseURL };
}
after(async () => {
await Promise.all(openServers.map((s) => closeServer(s)));
});
// -----------------------------------------------------------------------------
// 1) 401-then-200: the interceptor re-logs-in and replays the request once.
// -----------------------------------------------------------------------------
test("401 on a JSON endpoint triggers re-login and a successful replay", async () => {
let loginCalls = 0;
let infoCalls = 0;
let replayedAuthHeader = null;
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
loginCalls++;
// Hand back a fresh token via Set-Cookie (HttpOnly, like Docmost).
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=fresh-token-123; Path=/; HttpOnly",
});
return;
}
if (req.url === "/api/workspace/info") {
infoCalls++;
// First hit: token is stale -> 401. Second hit (the replay): 200, and
// record the Authorization header so we can confirm the new Bearer.
if (infoCalls === 1) {
sendJson(res, 401, { message: "Unauthorized" });
} else {
replayedAuthHeader = req.headers["authorization"];
sendJson(res, 200, { success: true, data: { id: "ws-1", name: "WS" } });
}
return;
}
sendJson(res, 404, { message: "not found" });
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
// Pre-seed a stale token so the FIRST /workspace/info uses it and 401s,
// exercising the interceptor replay rather than the initial-login path.
client.token = "stale-token";
client.client.defaults.headers.common["Authorization"] = "Bearer stale-token";
const result = await client.getWorkspace();
assert.equal(result.success, true);
assert.equal(loginCalls, 1, "/auth/login should be called exactly once");
assert.equal(infoCalls, 2, "the endpoint should be hit twice (401 then replay)");
assert.equal(
replayedAuthHeader,
"Bearer fresh-token-123",
"the replay must carry the freshly minted Bearer token",
);
});
// -----------------------------------------------------------------------------
// 2) Login dedup: concurrent 401s collapse into a single /auth/login.
// -----------------------------------------------------------------------------
test("concurrent 401s deduplicate into a single /auth/login call", async () => {
let loginCalls = 0;
const infoState = new Map(); // per-endpoint hit counter
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
loginCalls++;
// Delay the login response a touch so all concurrent requests are still
// in flight and genuinely share the one in-flight loginPromise.
setTimeout(() => {
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=shared-token; Path=/; HttpOnly",
});
}, 40);
return;
}
// Several distinct JSON endpoints, each 401 on the first hit then 200.
const n = (infoState.get(req.url) || 0) + 1;
infoState.set(req.url, n);
if (n === 1) {
sendJson(res, 401, { message: "Unauthorized" });
} else {
sendJson(res, 200, { success: true, data: { items: [], meta: {} } });
}
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
client.token = "stale-token";
client.client.defaults.headers.common["Authorization"] = "Bearer stale-token";
// Fire several different requests concurrently; each one's first attempt 401s
// and triggers a re-login, but the in-flight loginPromise must coalesce them.
await Promise.all([
client.getWorkspace(),
client.getSpaces(),
client.search("anything"),
client.listShares(),
]);
assert.equal(
loginCalls,
1,
"all concurrent 401s must share ONE in-flight /auth/login",
);
});
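// Illustration only: a minimal sketch of the in-flight promise coalescing that
// the dedup test above exercises. `sketchDedupeLogin` and `doLogin` are
// hypothetical names and are not taken from DocmostClient; the real client may
// structure its loginPromise differently.

```javascript
// Hypothetical helper: wraps `doLogin` so that overlapping callers share one promise.
function sketchDedupeLogin(doLogin) {
  let inFlight = null;
  return function login() {
    if (inFlight === null) {
      // The executor runs synchronously, so a synchronous throw inside
      // doLogin() is turned into a rejection instead of escaping.
      const attempt = new Promise((resolve) => resolve(doLogin()));
      // Forget the attempt once it settles so a later 401 can log in again.
      const clear = () => {
        if (inFlight === attempt) inFlight = null;
      };
      attempt.then(clear, clear);
      inFlight = attempt;
    }
    return inFlight;
  };
}
```

// Every caller that arrives while a login is pending receives the same
// promise, which is why the test can assert a single /auth/login hit.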
// -----------------------------------------------------------------------------
// 3) Persistent 401: exactly one retry, no infinite loop; a 401 on the login
// endpoint itself is NOT retried.
// -----------------------------------------------------------------------------
test("a persistently-401 endpoint fails after exactly one retry", async () => {
let loginCalls = 0;
let infoCalls = 0;
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
loginCalls++;
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=t; Path=/; HttpOnly",
});
return;
}
if (req.url === "/api/workspace/info") {
infoCalls++;
// ALWAYS 401, even after a fresh login: the retry guard must stop here.
sendJson(res, 401, { message: "Unauthorized" });
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
client.token = "stale-token";
client.client.defaults.headers.common["Authorization"] = "Bearer stale-token";
await assert.rejects(() => client.getWorkspace());
// Original request + exactly ONE replay = 2 hits, never more (no loop).
assert.equal(infoCalls, 2, "endpoint hit exactly twice (original + one retry)");
assert.equal(loginCalls, 1, "re-login attempted exactly once");
});
test("a 401 on /auth/login itself is not retried", async () => {
let loginCalls = 0;
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
loginCalls++;
// The login endpoint rejects credentials. The interceptor must NOT try
// to "re-login to fix a failed login" — that would loop forever.
sendJson(res, 401, { message: "Invalid credentials" });
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "wrong-pw");
// login() -> performLogin POSTs /auth/login, gets 401; the interceptor sees
// isLoginRequest and rejects without retrying. So /auth/login is hit once.
await assert.rejects(() => client.login());
assert.equal(loginCalls, 1, "/auth/login must be attempted exactly once");
});
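// Illustration only: a minimal sketch of the retry-once guard that the two
// tests above pin down. `sketchRequestWithReauth`, `send`, `relogin`, and the
// `retried` / `isLoginRequest` flags on the request object are assumptions
// made for this example; the real interceptor is not shown in this file and
// may be written differently.

```javascript
// Hypothetical wrapper: replays a request at most once after a 401.
async function sketchRequestWithReauth(send, relogin, request) {
  try {
    return await send(request);
  } catch (err) {
    const is401 = Boolean(err) && err.status === 401;
    // Give up when the failure is not a 401, when the failing call is the
    // login itself, or when this request has already been replayed once.
    if (!is401 || request.isLoginRequest || request.retried) throw err;
    await relogin();
    // Mark the replay so a second 401 falls through to the guard above.
    return sketchRequestWithReauth(send, relogin, { ...request, retried: true });
  }
}
```

// With an endpoint that always answers 401 this performs two sends and one
// re-login, matching the counts asserted above.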
// -----------------------------------------------------------------------------
// 4) performLogin cookie parsing: base64 "=" padding survives intact, and a
// cookie literally named authTokenRefresh is not mistaken for authToken.
// -----------------------------------------------------------------------------
test("a token with base64 '=' padding round-trips intact to the server", async () => {
// A realistic JWT-ish value whose final segment ends in base64 "=" padding.
const paddedToken = "header.payload.c2lnbmF0dXJl==";
let sentBearer = null;
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
sendJson(res, 200, { success: true }, {
// Include attributes AND a base64 value containing "=" so we verify the
// parser keeps everything after the FIRST "=" up to the first ";".
"Set-Cookie": `authToken=${paddedToken}; Path=/; HttpOnly; SameSite=Lax`,
});
return;
}
if (req.url === "/api/workspace/info") {
sentBearer = req.headers["authorization"];
sendJson(res, 200, { success: true, data: { id: "ws" } });
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
await client.login();
// The parsed token equals exactly what the server set (padding preserved).
assert.equal(client.token, paddedToken);
// And the client sends that exact token back on a subsequent request.
await client.getWorkspace();
assert.equal(sentBearer, `Bearer ${paddedToken}`);
});
test("an authTokenRefresh cookie is not mistaken for authToken", async () => {
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
// Set BOTH cookies. The exact-name match must pick authToken=real and
// ignore authTokenRefresh=should-not-match (a prefix match would grab it).
res.writeHead(200, {
"Content-Type": "application/json",
"Set-Cookie": [
"authTokenRefresh=should-not-match; Path=/; HttpOnly",
"authToken=real-token; Path=/; HttpOnly",
],
});
res.end(JSON.stringify({ success: true }));
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
await client.login();
assert.equal(client.token, "real-token");
});
test("a response with ONLY authTokenRefresh (no authToken) rejects login", async () => {
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authTokenRefresh=nope; Path=/; HttpOnly",
});
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
// No authToken cookie present -> performLogin throws.
await assert.rejects(() => client.login(), /No authToken cookie/);
});
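// Illustration only: a minimal sketch of exact-name cookie parsing of the kind
// the three tests above require. `sketchReadCookie` is a hypothetical helper,
// not the parser inside performLogin; it only shows why splitting at the first
// "=" and comparing the full name satisfies those tests.

```javascript
// Hypothetical parser: returns the value of the cookie named exactly `name`, or null.
function sketchReadCookie(setCookie, name) {
  // A Set-Cookie header may arrive as one string or as an array of strings.
  const headers = Array.isArray(setCookie) ? setCookie : setCookie ? [setCookie] : [];
  for (const header of headers) {
    const pair = header.split(";")[0]; // "name=value" with the attributes dropped
    const eq = pair.indexOf("=");      // split at the FIRST "=" only
    if (eq === -1) continue;
    // Exact comparison: "authTokenRefresh" does not match "authToken".
    if (pair.slice(0, eq).trim() === name) return pair.slice(eq + 1).trim();
  }
  return null;
}
```

// Because only the first "=" separates name from value, base64 padding at the
// end of the token is kept as part of the value.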
// -----------------------------------------------------------------------------
// 5) paginateAll loop guards.
// -----------------------------------------------------------------------------
test("paginateAll stops at the MAX_PAGES cap when hasNextPage is always true", async () => {
let pageRequests = 0;
const LIMIT = 100;
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=t; Path=/; HttpOnly",
});
return;
}
if (req.url === "/api/spaces") {
pageRequests++;
// Always return a FULL page (== requested limit) AND hasNextPage:true.
// Both the page-length check and the hasNextPage flag say "keep going",
// so only the MAX_PAGES ceiling can stop the loop.
const items = Array.from({ length: LIMIT }, (_, i) => ({
id: `s-${pageRequests}-${i}`,
}));
sendJson(res, 200, {
success: true,
data: { items, meta: { hasNextPage: true } },
});
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
const all = await client.paginateAll("/spaces", {}, LIMIT);
// MAX_PAGES is 50; the loop must terminate there, not run unbounded.
assert.ok(
pageRequests <= 50,
`expected <= 50 page requests, got ${pageRequests}`,
);
assert.equal(pageRequests, 50, "should fetch exactly the MAX_PAGES cap");
assert.equal(all.length, 50 * LIMIT, "accumulates one full page per request");
});
test("paginateAll stops early on a short page even if hasNextPage is true", async () => {
let pageRequests = 0;
const LIMIT = 100;
const { baseURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=t; Path=/; HttpOnly",
});
return;
}
if (req.url === "/api/spaces") {
pageRequests++;
// First page is full; second page is SHORT (fewer than limit). The short
// page must stop the loop immediately even though hasNextPage stays true.
const count = pageRequests === 1 ? LIMIT : 3;
const items = Array.from({ length: count }, (_, i) => ({
id: `s-${pageRequests}-${i}`,
}));
sendJson(res, 200, {
success: true,
data: { items, meta: { hasNextPage: true } },
});
return;
}
sendJson(res, 404, {});
});
const client = new DocmostClient(baseURL, "user@example.com", "pw");
const all = await client.paginateAll("/spaces", {}, LIMIT);
assert.equal(pageRequests, 2, "stops right after the first short page");
assert.equal(all.length, LIMIT + 3, "full page + short page accumulated");
});
test("paginateAll handles both {data:{items,meta}} and {items,meta} envelopes", async () => {
// Bare envelope: { items, meta } with no { data } wrapper.
const bareRequests = [];
const { baseURL: bareURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=t; Path=/; HttpOnly",
});
return;
}
if (req.url === "/api/groups") {
bareRequests.push(1);
// Page 1: full page, hasNextPage true. Page 2: short page -> stop.
if (bareRequests.length === 1) {
sendJson(res, 200, {
items: Array.from({ length: 100 }, (_, i) => ({ id: `g${i}` })),
meta: { hasNextPage: true },
});
} else {
sendJson(res, 200, {
items: [{ id: "tail" }],
meta: { hasNextPage: false },
});
}
return;
}
sendJson(res, 404, {});
});
const bareClient = new DocmostClient(bareURL, "user@example.com", "pw");
const bare = await bareClient.paginateAll("/groups", {}, 100);
assert.equal(bare.length, 101, "bare {items,meta} envelope handled");
assert.equal(bare[bare.length - 1].id, "tail");
// Wrapped envelope: { data: { items, meta } }.
const wrappedRequests = [];
const { baseURL: wrappedURL } = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
sendJson(res, 200, { success: true }, {
"Set-Cookie": "authToken=t; Path=/; HttpOnly",
});
return;
}
if (req.url === "/api/groups") {
wrappedRequests.push(1);
// Single short page -> stops after one request.
sendJson(res, 200, {
data: {
items: [{ id: "w1" }, { id: "w2" }],
meta: { hasNextPage: false },
},
});
return;
}
sendJson(res, 404, {});
});
const wrappedClient = new DocmostClient(wrappedURL, "user@example.com", "pw");
const wrapped = await wrappedClient.paginateAll("/groups", {}, 100);
assert.equal(wrapped.length, 2, "wrapped {data:{items,meta}} envelope handled");
assert.equal(wrappedRequests.length, 1, "single short page -> one request");
});
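// Illustration only: a minimal sketch of a pagination loop with the guards the
// paginateAll tests above describe. `sketchPaginateAll` and its `fetchPage`
// callback are hypothetical; the default of 50 pages mirrors the MAX_PAGES
// value quoted in the test, and stopping on hasNextPage === false is an
// assumption that the tests do not check in isolation.

```javascript
// Hypothetical loop: `fetchPage(page, limit)` resolves to the response body.
async function sketchPaginateAll(fetchPage, limit = 100, maxPages = 50) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const body = await fetchPage(page, limit);
    // Accept both { data: { items, meta } } and bare { items, meta }.
    const payload = body && body.data ? body.data : body;
    const items = Array.isArray(payload?.items) ? payload.items : [];
    all.push(...items);
    // A short page ends the walk even if the server still claims hasNextPage.
    if (items.length < limit) break;
    // Assumption: an explicit hasNextPage === false also ends the walk.
    if (payload?.meta?.hasNextPage === false) break;
  }
  return all;
}
```

// The for-loop bound is the hard ceiling: a server that always returns full
// pages with hasNextPage true is still cut off after maxPages requests.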

@@ -0,0 +1,126 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
buildCollabWsUrl,
markdownToProseMirror,
} from "../../build/lib/collaboration.js";
/** Recursively find the first descendant node (or self) of the given type. */
function find(node, type) {
if (!node || typeof node !== "object") return null;
if (node.type === type) return node;
const kids = Array.isArray(node.content) ? node.content : [];
for (const k of kids) {
const r = find(k, type);
if (r) return r;
}
return null;
}
/** Recursively collect every descendant node (and self) of the given type. */
function findAll(node, type, acc = []) {
if (!node || typeof node !== "object") return acc;
if (node.type === type) acc.push(node);
const kids = Array.isArray(node.content) ? node.content : [];
for (const k of kids) findAll(k, type, acc);
return acc;
}
/** Collect the set of mark types present anywhere in the document tree. */
function collectMarkTypes(node, set = new Set()) {
if (!node || typeof node !== "object") return set;
if (Array.isArray(node.marks)) {
for (const m of node.marks) set.add(m.type);
}
const kids = Array.isArray(node.content) ? node.content : [];
for (const k of kids) collectMarkTypes(k, set);
return set;
}
test("buildCollabWsUrl: https + /api -> wss + /collab", () => {
assert.equal(buildCollabWsUrl("https://h/api"), "wss://h/collab");
});
test("buildCollabWsUrl: http (no /api) -> ws + /collab", () => {
assert.equal(buildCollabWsUrl("http://h"), "ws://h/collab");
});
test("buildCollabWsUrl: trailing slash on /api/ is handled", () => {
assert.equal(buildCollabWsUrl("https://h/api/"), "wss://h/collab");
});
test("buildCollabWsUrl: a base with trailing slash maps to /collab", () => {
assert.equal(buildCollabWsUrl("https://h/"), "wss://h/collab");
});
test("buildCollabWsUrl: query and hash on the base are dropped", () => {
assert.equal(buildCollabWsUrl("https://h/api?foo=1#bar"), "wss://h/collab");
});
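// Illustration only: a minimal sketch of a mapping that satisfies the five
// buildCollabWsUrl cases above, using the standard URL class.
// `sketchBuildCollabWsUrl` is a hypothetical stand-in, not the function in
// build/lib/collaboration.js. Keeping any path prefix in front of /api is an
// assumption; the tests above only cover bases without such a prefix.

```javascript
// Hypothetical mapping from an API base URL to the collaboration WebSocket URL.
function sketchBuildCollabWsUrl(apiBase) {
  const u = new URL(apiBase);
  // https -> wss, anything else (http) -> ws.
  const protocol = u.protocol === "https:" ? "wss:" : "ws:";
  // Drop trailing slashes, then a trailing "/api" segment if present.
  const prefix = u.pathname.replace(/\/+$/, "").replace(/\/api$/, "");
  // Query string and hash are not copied over.
  return `${protocol}//${u.host}${prefix}/collab`;
}
```

// Reading only protocol, host, and pathname from the parsed URL is what drops
// the query and hash without any manual string handling.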
test("markdownToProseMirror: a :::warning block becomes a callout node typed warning", async () => {
const doc = await markdownToProseMirror(":::warning\nhello\n:::");
const callout = find(doc, "callout");
assert.ok(callout, "expected a callout node");
assert.equal(callout.attrs.type, "warning");
});
test("markdownToProseMirror: a ::: line inside a fenced code block is not a callout delimiter", async () => {
const doc = await markdownToProseMirror("```\n:::warning\nx\n:::\n```");
assert.equal(find(doc, "callout"), null, "code-fenced ::: must not open a callout");
assert.ok(find(doc, "codeBlock"), "the fenced block should stay a codeBlock");
});
test("markdownToProseMirror: GFM checkbox list -> one taskList, two taskItems, no bulletList", async () => {
const doc = await markdownToProseMirror("- [x] a\n- [ ] b");
const taskLists = findAll(doc, "taskList");
assert.equal(taskLists.length, 1, "expected exactly one taskList");
const items = findAll(doc, "taskItem");
assert.equal(items.length, 2, "expected two taskItems");
assert.deepEqual(
items.map((i) => i.attrs.checked),
[true, false],
);
assert.equal(find(doc, "bulletList"), null, "no bulletList should remain");
});
test("markdownToProseMirror: numbered checklist -> one taskList, no orderedList (ol phantom regression)", async () => {
const doc = await markdownToProseMirror("1. [x] a\n2. [ ] b");
const taskLists = findAll(doc, "taskList");
assert.equal(taskLists.length, 1, "expected exactly one taskList");
assert.equal(
find(doc, "orderedList"),
null,
"a numbered checklist must not leave a phantom orderedList",
);
assert.deepEqual(
findAll(doc, "taskItem").map((i) => i.attrs.checked),
[true, false],
);
});
test("markdownToProseMirror: a plain numbered list stays an orderedList", async () => {
const doc = await markdownToProseMirror("1. a\n2. b");
assert.ok(find(doc, "orderedList"), "plain numbered list should be an orderedList");
assert.equal(find(doc, "taskList"), null, "plain numbered list must not become a taskList");
});
test("markdownToProseMirror: mark/sub/sup produce highlight, subscript, superscript marks", async () => {
const doc = await markdownToProseMirror("<mark>h</mark> <sub>x</sub> <sup>y</sup>");
const marks = collectMarkTypes(doc);
assert.ok(marks.has("highlight"), "expected a highlight mark");
assert.ok(marks.has("subscript"), "expected a subscript mark");
assert.ok(marks.has("superscript"), "expected a superscript mark");
});
test("markdownToProseMirror: an aligned GFM table maps header alignment", async () => {
const doc = await markdownToProseMirror(
"| a | b | c |\n|:--|:-:|--:|\n| 1 | 2 | 3 |",
);
const headers = findAll(doc, "tableHeader");
assert.equal(headers.length, 3, "expected three header cells");
assert.deepEqual(
headers.map((h) => h.attrs.align),
["left", "center", "right"],
);
});

@@ -0,0 +1,136 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { diffDocs } from "../../build/lib/diff.js";
// ---------------------------------------------------------------------------
// Builders
// ---------------------------------------------------------------------------
const t = (text, marks) => (marks ? { type: "text", text, marks } : { type: "text", text });
const para = (...children) => ({ type: "paragraph", content: children });
const doc = (...children) => ({ type: "doc", content: children });
// ---------------------------------------------------------------------------
// Core diff: one inserted word
// ---------------------------------------------------------------------------
test("diffDocs detects a single inserted word", () => {
const oldDoc = doc(para(t("Hello world")));
const newDoc = doc(para(t("Hello brave world")));
const r = diffDocs(oldDoc, newDoc);
assert.ok(r.summary.inserted > 0, "reports insertion length");
assert.equal(r.summary.deleted, 0, "no deletions");
const ins = r.changes.find((c) => c.op === "insert");
assert.ok(ins, "has an insert change");
assert.match(ins.text, /brave/);
assert.match(r.markdown, /inserted/);
});
// ---------------------------------------------------------------------------
// Core diff: one deleted block
// ---------------------------------------------------------------------------
test("diffDocs detects a deleted block", () => {
const oldDoc = doc(para(t("keep this")), para(t("remove this block")));
const newDoc = doc(para(t("keep this")));
const r = diffDocs(oldDoc, newDoc);
assert.ok(r.summary.deleted > 0, "reports deletion length");
const del = r.changes.find((c) => c.op === "delete");
assert.ok(del, "has a delete change");
assert.match(del.text, /remove this block/);
});
// ---------------------------------------------------------------------------
// Integrity counts
// ---------------------------------------------------------------------------
test("diffDocs reports integrity counts as [old,new] tuples", () => {
const link = [{ type: "link", attrs: { href: "http://x" } }];
const image = { type: "image", attrs: { src: "/api/files/a.png" } };
const callout = {
type: "callout",
attrs: { type: "info" },
content: [para(t("note"))],
};
const oldDoc = doc(
para(t("a link", link)),
image,
callout,
para(t("body with [1] and [2]")),
);
// new doc: drop the image, drop one footnote marker, keep link + callout.
const newDoc = doc(
para(t("a link", link)),
callout,
para(t("body with [1]")),
);
const r = diffDocs(oldDoc, newDoc);
assert.deepEqual(r.integrity.images, [1, 0]);
assert.deepEqual(r.integrity.links, [1, 1]);
assert.deepEqual(r.integrity.callouts, [1, 1]);
assert.deepEqual(r.integrity.tables, [0, 0]);
// footnote markers parsed in reading order from the body.
assert.deepEqual(r.integrity.footnoteMarkers, [[1, 2], [1]]);
});
// ---------------------------------------------------------------------------
// Footnote markers stop at the notes heading
// ---------------------------------------------------------------------------
test("diffDocs footnote markers ignore the notes section", () => {
const oldDoc = doc(
para(t("body [1]")),
// The heading text is Russian for "Translator's notes"; it marks the notes section.
{ type: "heading", attrs: { level: 2 }, content: [t("Примечания переводчика")] },
{
type: "orderedList",
content: [
{ type: "listItem", content: [para(t("note [1] inside list"))] },
],
},
);
const r = diffDocs(oldDoc, oldDoc);
// Only the body [1] is counted, not the [1] inside the notes list.
assert.deepEqual(r.integrity.footnoteMarkers, [[1], [1]]);
assert.equal(r.summary.inserted, 0);
assert.equal(r.summary.deleted, 0);
});
// ---------------------------------------------------------------------------
// Bug 3: links integrity counts UNIQUE links by href, not link-bearing runs.
// A single link split across two runs (link+bold, then link) is one link.
// ---------------------------------------------------------------------------
test("diffDocs counts a link split across two runs as one link", () => {
const link = [{ type: "link", attrs: { href: "http://x" } }];
const linkBold = [
{ type: "link", attrs: { href: "http://x" } },
{ type: "bold" },
];
// One logical link to http://x rendered as two adjacent runs.
const splitDoc = doc(para(t("see ", linkBold), t("the link", link), t(" here")));
// Same single href represented as a single run.
const wholeDoc = doc(para(t("see the link", link), t(" here")));
const r = diffDocs(splitDoc, wholeDoc);
// Unique-by-href: both sides have exactly one distinct link.
assert.deepEqual(r.integrity.links, [1, 1]);
});
test("diffDocs counts two distinct hrefs as two links", () => {
const a = [{ type: "link", attrs: { href: "http://a" } }];
const b = [{ type: "link", attrs: { href: "http://b" } }];
const oldDoc = doc(para(t("one", a), t(" two", b)));
// new doc drops the second link.
const newDoc = doc(para(t("one", a), t(" two")));
const r = diffDocs(oldDoc, newDoc);
assert.deepEqual(r.integrity.links, [2, 1]);
});
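// Illustration only: a minimal sketch of counting links by distinct href, the
// behaviour the two tests above assert for integrity.links.
// `sketchCountUniqueLinks` is a hypothetical helper and is not the code inside
// diffDocs; it only shows why a link split across runs counts once.

```javascript
// Hypothetical counter: walks a ProseMirror JSON tree and counts distinct link hrefs.
function sketchCountUniqueLinks(node, hrefs = new Set()) {
  if (!node || typeof node !== "object") return hrefs.size;
  // Collect the href of every link mark carried by this node.
  for (const mark of Array.isArray(node.marks) ? node.marks : []) {
    if (mark.type === "link" && mark.attrs?.href) hrefs.add(mark.attrs.href);
  }
  // Recurse into children, sharing the same Set so duplicates collapse.
  for (const child of Array.isArray(node.content) ? node.content : []) {
    sketchCountUniqueLinks(child, hrefs);
  }
  return hrefs.size;
}
```

// Counting Set entries instead of link-bearing text runs keeps the number
// stable when an extra mark, such as bold, splits one link into several runs.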
// ---------------------------------------------------------------------------
// Identical docs produce no changes
// ---------------------------------------------------------------------------
test("diffDocs on identical docs reports no changes", () => {
const d = doc(para(t("unchanged")));
const r = diffDocs(d, d);
assert.equal(r.changes.length, 0);
assert.equal(r.summary.blocksChanged, 0);
});

@@ -0,0 +1,190 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
serializeDocmostMarkdown,
parseDocmostMarkdown,
} from "../../build/lib/markdown-document.js";
import { convertProseMirrorToMarkdown } from "../../build/lib/markdown-converter.js";
import { markdownToProseMirror } from "../../build/lib/collaboration.js";
/** Recursively find the first descendant node (or self) of the given type. */
function find(node, type) {
if (!node || typeof node !== "object") return null;
if (node.type === type) return node;
const kids = Array.isArray(node.content) ? node.content : [];
for (const k of kids) {
const r = find(k, type);
if (r) return r;
}
return null;
}
/** Recursively collect every descendant node (and self) of the given type. */
function findAll(node, type, acc = []) {
if (!node || typeof node !== "object") return acc;
if (node.type === type) acc.push(node);
const kids = Array.isArray(node.content) ? node.content : [];
for (const k of kids) findAll(k, type, acc);
return acc;
}
/** Find the first text node carrying a mark of the given type. */
function findTextWithMark(node, markType) {
for (const t of findAll(node, "text")) {
if (Array.isArray(t.marks) && t.marks.some((m) => m.type === markType)) {
return t;
}
}
return null;
}
test("serialize/parse: meta and comments survive a round-trip; body recovered", () => {
const meta = {
version: 1,
pageId: "p1",
slugId: "s1",
title: "Hello",
spaceId: "sp1",
parentPageId: null,
};
const body = "# Title\n\nSome **bold** body text.";
const comments = [
{ id: "c1", content: "a note", resolved: false },
{ id: "c2", content: "another", resolved: true },
];
const full = serializeDocmostMarkdown(meta, body, comments);
const parsed = parseDocmostMarkdown(full);
assert.deepEqual(parsed.meta, meta);
assert.deepEqual(parsed.comments, comments);
assert.equal(parsed.body, body);
});
test("serialize: a page with no comments still emits an empty comments block", () => {
const full = serializeDocmostMarkdown({ version: 1 }, "body", []);
assert.match(full, /<!--\s*docmost:comments\s*\n\[\]\n-->/);
const parsed = parseDocmostMarkdown(full);
assert.deepEqual(parsed.comments, []);
});
test("parse: plain markdown with no blocks -> meta=null, comments=null, body=input", () => {
const input = " # Just a heading\n\nplain body ";
const parsed = parseDocmostMarkdown(input);
assert.equal(parsed.meta, null);
assert.equal(parsed.comments, null);
assert.equal(parsed.body, input.trim());
});
test("parse: tolerant to CRLF line endings", () => {
const meta = { version: 1, pageId: "p9" };
const body = "line one\n\nline two";
const full = serializeDocmostMarkdown(meta, body, []).replace(/\n/g, "\r\n");
const parsed = parseDocmostMarkdown(full);
assert.deepEqual(parsed.meta, meta);
assert.deepEqual(parsed.comments, []);
assert.equal(parsed.body, body);
});
test("parse: a malformed present meta block throws a clear error", () => {
const bad = "<!-- docmost:meta\n{not valid json}\n-->\n\nbody\n";
assert.throws(() => parseDocmostMarkdown(bad), /docmost:meta JSON/);
});
test("parse: a literal comments-block in the body is left in the body when a real trailing block follows", () => {
// The body documents the format (e.g. inside a fenced code block) AND there is
// a real trailing comments block. Only the final, document-ending block is
// metadata; the literal stays in the body verbatim.
const meta = { version: 1, pageId: "p-literal" };
const literal = "```\n<!-- docmost:comments\n[1]\n-->\n```";
const body = `# Doc\n\nExample of the format:\n\n${literal}`;
const realComments = [{ id: "c1", content: "real" }];
const full = serializeDocmostMarkdown(meta, body, realComments);
const parsed = parseDocmostMarkdown(full);
// The REAL trailing comments are parsed.
assert.deepEqual(parsed.comments, realComments);
// The literal block text is still present in the recovered body.
assert.ok(
parsed.body.includes("<!-- docmost:comments\n[1]\n-->"),
"expected the literal comments block to remain in the body",
);
assert.equal(parsed.body, body.trim());
});
test("parse: a body-ending literal comments block (no real trailing block) is treated as the final block", () => {
// Hand-written file whose ONLY `docmost:comments` opener is a literal that
// also ends the document. Per the implementation, the final document-ending
// block IS treated as metadata, so it is parsed and stripped from the body.
const input = "# Doc\n\nsome text\n\n<!-- docmost:comments\n[1]\n-->\n";
const parsed = parseDocmostMarkdown(input);
assert.equal(parsed.meta, null);
assert.deepEqual(parsed.comments, [1]);
assert.equal(parsed.body, "# Doc\n\nsome text");
});
test("parse: a literal comments block NOT ending the document stays entirely in the body", () => {
// The literal opener/closer is followed by more body content, so it does not
// end the document and is therefore left untouched in the body.
const input =
"# Doc\n\n<!-- docmost:comments\n[1]\n-->\n\nmore body after it\n";
const parsed = parseDocmostMarkdown(input);
assert.equal(parsed.meta, null);
assert.equal(parsed.comments, null);
assert.equal(parsed.body, input.trim());
});
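
The three literal-block tests above pin down one rule: only a `docmost:comments` block that ends the document counts as metadata. A minimal sketch of that rule follows, assuming the block delimiters seen in the test inputs. `splitTrailingComments` is a hypothetical helper written for illustration; it is not the package's `parseDocmostMarkdown` and it ignores the meta block entirely.

```javascript
// Hypothetical sketch of the "final document-ending block" rule.
// Delimiters are copied from the test inputs above.
const COMMENTS_OPEN = "<!-- docmost:comments\n";
const COMMENTS_CLOSE = "\n-->";

function splitTrailingComments(input) {
  // Normalize CRLF and drop surrounding whitespace, as the tests expect.
  const text = input.replace(/\r\n/g, "\n").trim();

  // A block is metadata only if its closer is the last thing in the document.
  if (!text.endsWith(COMMENTS_CLOSE)) return { body: text, comments: null };

  // Use the LAST opener so earlier literal copies stay in the body.
  const start = text.lastIndexOf(COMMENTS_OPEN);
  if (start === -1) return { body: text, comments: null };

  const json = text.slice(
    start + COMMENTS_OPEN.length,
    text.length - COMMENTS_CLOSE.length,
  );
  // JSON.parse throws on a malformed block; a real parser would wrap the error.
  return { body: text.slice(0, start).trim(), comments: JSON.parse(json) };
}
```

Searching from the end with `lastIndexOf` is what keeps a documented example of the format (for instance inside a fenced code block) in the body when a real trailing block follows it.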
test("export emits comment anchors and they round-trip back to a comment mark", () => {
// A small ProseMirror doc with a text run carrying a `comment` mark.
const doc = {
type: "doc",
content: [
{
type: "paragraph",
content: [
{ type: "text", text: "before " },
{
type: "text",
text: "anchored",
marks: [{ type: "comment", attrs: { commentId: "cm-123" } }],
},
{ type: "text", text: " after" },
],
},
],
};
const body = convertProseMirrorToMarkdown(doc);
assert.match(body, /data-comment-id="cm-123"/);
return markdownToProseMirror(body).then((rebuilt) => {
const commented = findTextWithMark(rebuilt, "comment");
assert.ok(commented, "expected a text node with a comment mark");
const mark = commented.marks.find((m) => m.type === "comment");
assert.equal(mark.attrs.commentId, "cm-123");
});
});
test("drawio round-trips through export and import", () => {
const doc = {
type: "doc",
content: [
{
type: "drawio",
attrs: { src: "https://example/diagram.xml", attachmentId: "att-7" },
},
],
};
const body = convertProseMirrorToMarkdown(doc);
assert.match(body, /data-type="drawio"/);
assert.match(body, /data-src="https:\/\/example\/diagram\.xml"/);
return markdownToProseMirror(body).then((rebuilt) => {
const diagram = find(rebuilt, "drawio");
assert.ok(diagram, "expected a drawio node after import");
assert.equal(diagram.attrs.src, "https://example/diagram.xml");
});
});


@@ -0,0 +1,173 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { filterComment, filterPage } from "../../build/lib/filters.js";
test("filterComment includes resolvedAt/resolvedById as null when absent", () => {
const result = filterComment({
id: "c1",
pageId: "p1",
content: "hello",
createdAt: "2026-01-01T00:00:00.000Z",
});
assert.equal(result.resolvedAt, null);
assert.equal(result.resolvedById, null);
});
test("filterComment passes through resolvedAt/resolvedById when present", () => {
const result = filterComment({
id: "c1",
pageId: "p1",
content: "hello",
createdAt: "2026-01-01T00:00:00.000Z",
resolvedAt: "2026-02-02T10:00:00.000Z",
resolvedById: "user-42",
});
assert.equal(result.resolvedAt, "2026-02-02T10:00:00.000Z");
assert.equal(result.resolvedById, "user-42");
});
test("filterComment still includes id/content/createdAt", () => {
const result = filterComment({
id: "c-id",
pageId: "p1",
content: "the body",
createdAt: "2026-03-03T03:03:03.000Z",
});
assert.equal(result.id, "c-id");
assert.equal(result.content, "the body");
assert.equal(result.createdAt, "2026-03-03T03:03:03.000Z");
});
test("filterComment uses markdownContent override when provided", () => {
const result = filterComment(
{
id: "c1",
pageId: "p1",
content: "raw json content",
createdAt: "2026-01-01T00:00:00.000Z",
},
"**markdown** content",
);
assert.equal(result.content, "**markdown** content");
});
test("filterComment is null-safe on missing creator", () => {
const result = filterComment({
id: "c1",
pageId: "p1",
content: "hello",
createdAt: "2026-01-01T00:00:00.000Z",
creatorId: "u1",
// no `creator` object present
});
assert.equal(result.creatorName, null);
assert.equal(result.creatorId, "u1");
});
test("filterComment reads creator.name when creator present", () => {
const result = filterComment({
id: "c1",
pageId: "p1",
content: "hello",
createdAt: "2026-01-01T00:00:00.000Z",
creator: { name: "Alice" },
});
assert.equal(result.creatorName, "Alice");
});
test("filterComment defaults selection/type/parentCommentId/editedAt", () => {
const result = filterComment({
id: "c1",
pageId: "p1",
content: "hello",
createdAt: "2026-01-01T00:00:00.000Z",
});
assert.equal(result.selection, null);
assert.equal(result.type, "page");
assert.equal(result.parentCommentId, null);
assert.equal(result.editedAt, null);
});
test("filterPage selects expected fields", () => {
const result = filterPage({
id: "page-1",
slugId: "slug-1",
title: "My Page",
parentPageId: "parent-1",
spaceId: "space-1",
isLocked: false,
createdAt: "2026-01-01T00:00:00.000Z",
updatedAt: "2026-01-02T00:00:00.000Z",
deletedAt: null,
// extra fields that must be dropped
extraneous: "should not appear",
content: "should be ignored when not passed as arg",
});
assert.deepEqual(result, {
id: "page-1",
slugId: "slug-1",
title: "My Page",
parentPageId: "parent-1",
spaceId: "space-1",
isLocked: false,
createdAt: "2026-01-01T00:00:00.000Z",
updatedAt: "2026-01-02T00:00:00.000Z",
deletedAt: null,
});
});
test("filterPage omits content key when content arg is not a string", () => {
const result = filterPage({ id: "p1", title: "t" });
assert.equal("content" in result, false);
});
test("filterPage includes content when arg is a string", () => {
const result = filterPage({ id: "p1", title: "t" }, "# Heading");
assert.equal(result.content, "# Heading");
});
test("filterPage includes content when arg is an empty string", () => {
const result = filterPage({ id: "p1", title: "t" }, "");
assert.equal("content" in result, true);
assert.equal(result.content, "");
});
test("filterPage omits subpages when none provided", () => {
const result = filterPage({ id: "p1", title: "t" });
assert.equal("subpages" in result, false);
});
test("filterPage omits subpages when an empty array is provided", () => {
const result = filterPage({ id: "p1", title: "t" }, undefined, []);
assert.equal("subpages" in result, false);
});
test("filterPage maps subpages to id/title only", () => {
const result = filterPage({ id: "p1", title: "t" }, undefined, [
{ id: "s1", title: "Sub One", extra: "drop" },
{ id: "s2", title: "Sub Two" },
]);
assert.deepEqual(result.subpages, [
{ id: "s1", title: "Sub One" },
{ id: "s2", title: "Sub Two" },
]);
});
test("filterPage includes both content and subpages together", () => {
const result = filterPage({ id: "p1", title: "t" }, "body", [
{ id: "s1", title: "Sub" },
]);
assert.equal(result.content, "body");
assert.deepEqual(result.subpages, [{ id: "s1", title: "Sub" }]);
});


@@ -0,0 +1,173 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { applyTextEdits } from "../../build/lib/json-edit.js";
// Helpers to build small ProseMirror docs.
const textNode = (text, extra = {}) => ({ type: "text", text, ...extra });
const paragraph = (...children) => ({ type: "paragraph", content: children });
const doc = (...children) => ({ type: "doc", content: children });
test("single-match replace preserves ids/marks and reports replacements===1", () => {
const input = doc({
type: "paragraph",
attrs: { id: "para-1" },
content: [
textNode("Hello world", { marks: [{ type: "bold" }] }),
],
});
const { doc: out, results } = applyTextEdits(input, [
{ find: "world", replace: "there" },
]);
assert.deepEqual(results, [{ find: "world", replacements: 1 }]);
const para = out.content[0];
// Paragraph id attribute is preserved.
assert.equal(para.attrs.id, "para-1");
const tnode = para.content[0];
// Text node marks are preserved.
assert.deepEqual(tnode.marks, [{ type: "bold" }]);
assert.equal(tnode.text, "Hello there");
});
test("zero match throws not found", () => {
const input = doc(paragraph(textNode("Hello world")));
assert.throws(
() => applyTextEdits(input, [{ find: "absent", replace: "x" }]),
/not found/,
);
});
test("text split across two text nodes (one bold) throws spans-multiple-runs", () => {
// "Hello world" is split: "Hello " (plain) + "world" (bold). No single text
// node contains "Hello world", but the collected document text does.
const input = doc(
paragraph(
textNode("Hello "),
textNode("world", { marks: [{ type: "bold" }] }),
),
);
assert.throws(
() => applyTextEdits(input, [{ find: "Hello world", replace: "x" }]),
/spans/,
);
});
test("multi-match without replaceAll throws matches", () => {
// "ab" appears twice inside a single text node.
const input = doc(paragraph(textNode("ab cd ab")));
assert.throws(
() => applyTextEdits(input, [{ find: "ab", replace: "x" }]),
/matches/,
);
});
test("replaceAll replaces all occurrences", () => {
const input = doc(
paragraph(textNode("foo and foo")),
paragraph(textNode("more foo")),
);
const { doc: out, results } = applyTextEdits(input, [
{ find: "foo", replace: "bar", replaceAll: true },
]);
// 2 in the first paragraph, 1 in the second = 3 total.
assert.deepEqual(results, [{ find: "foo", replacements: 3 }]);
assert.equal(out.content[0].content[0].text, "bar and bar");
assert.equal(out.content[1].content[0].text, "more bar");
});
test("replacement containing $&, $1, $$ is inserted LITERALLY (regression)", () => {
const input = doc(paragraph(textNode("token here")));
const literal = "price $& cost $1 dollars $$ end";
const { doc: out } = applyTextEdits(input, [
{ find: "token", replace: literal },
]);
// The replacement must appear verbatim, NOT regex-expanded.
assert.equal(out.content[0].content[0].text, `${literal} here`);
// Be explicit that the find text was not re-injected via $&.
assert.ok(out.content[0].content[0].text.includes("$&"));
assert.ok(!out.content[0].content[0].text.includes("token"));
});
test("$ patterns are inserted literally under replaceAll too", () => {
const input = doc(paragraph(textNode("x and x")));
const { doc: out } = applyTextEdits(input, [
{ find: "x", replace: "$&$1$$", replaceAll: true },
]);
assert.equal(out.content[0].content[0].text, "$&$1$$ and $&$1$$");
});
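
The two `$` regression tests above guard against a known pitfall: `String.prototype.replace` expands patterns such as `$&` and `$$` in a replacement string even when the search value is a plain string. The sketch below shows two ways to insert a replacement verbatim. `replaceLiteral` and `replaceAllLiteral` are hypothetical names for illustration; how `applyTextEdits` does this internally is not shown in this diff.

```javascript
// Replace the first occurrence of `find`, inserting `replacement` verbatim.
function replaceLiteral(text, find, replacement) {
  const at = text.indexOf(find);
  if (at === -1 || find === "") return text;
  // Plain concatenation never interprets `$` patterns.
  return text.slice(0, at) + replacement + text.slice(at + find.length);
}

// Replace every occurrence; split/join also treats the replacement as data.
function replaceAllLiteral(text, find, replacement) {
  if (find === "") return text; // splitting on "" would split per character
  return text.split(find).join(replacement);
}

// The pitfall, for contrast: `$&` re-injects the matched text.
console.log("token here".replace("token", "$&!")); // token! here
console.log(replaceLiteral("token here", "token", "$&!")); // $&! here
```

Passing a function as the second argument to `replace` (for example `() => replacement`) avoids the expansion as well, since a function's return value is used as is.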
test("empty replacement prunes the emptied text node", () => {
// A paragraph whose only text node becomes empty: the node must be pruned.
const input = doc(
paragraph(
textNode("DELETE", { marks: [{ type: "italic" }] }),
textNode(" kept"),
),
);
const { doc: out, results } = applyTextEdits(input, [
{ find: "DELETE", replace: "" },
]);
assert.deepEqual(results, [{ find: "DELETE", replacements: 1 }]);
const para = out.content[0];
// The emptied first text node is gone; only the " kept" node remains.
assert.equal(para.content.length, 1);
assert.equal(para.content[0].text, " kept");
});
test("multi-edit array applied in order", () => {
const input = doc(paragraph(textNode("alpha beta")));
const { doc: out, results } = applyTextEdits(input, [
{ find: "alpha", replace: "ALPHA" },
{ find: "beta", replace: "BETA" },
]);
assert.deepEqual(results, [
{ find: "alpha", replacements: 1 },
{ find: "beta", replacements: 1 },
]);
assert.equal(out.content[0].content[0].text, "ALPHA BETA");
});
test("second edit can target text produced by the first (ordered application)", () => {
const input = doc(paragraph(textNode("one")));
const { doc: out, results } = applyTextEdits(input, [
{ find: "one", replace: "two" },
{ find: "two", replace: "three" },
]);
assert.deepEqual(results, [
{ find: "one", replacements: 1 },
{ find: "two", replacements: 1 },
]);
assert.equal(out.content[0].content[0].text, "three");
});
test("input doc is not mutated", () => {
const input = doc(paragraph(textNode("immutable source")));
const snapshot = JSON.parse(JSON.stringify(input));
const { doc: out } = applyTextEdits(input, [
{ find: "immutable", replace: "changed" },
]);
// Original is untouched; the returned doc is a distinct object.
assert.deepEqual(input, snapshot);
assert.notEqual(out, input);
assert.equal(out.content[0].content[0].text, "changed source");
});


@@ -0,0 +1,151 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { convertProseMirrorToMarkdown } from "../../build/lib/markdown-converter.js";
// ProseMirror builders.
const text = (t, marks) => (marks ? { type: "text", text: t, marks } : { type: "text", text: t });
const paragraph = (...content) => ({ type: "paragraph", content });
const doc = (...content) => ({ type: "doc", content });
const listItem = (...content) => ({ type: "listItem", content });
const bulletList = (...items) => ({ type: "bulletList", content: items });
const orderedList = (...items) => ({ type: "orderedList", content: items });
test("nested bulletList with 3 children keeps all children indented under the parent", () => {
const input = doc(
bulletList(
listItem(
paragraph(text("Parent")),
bulletList(
listItem(paragraph(text("A"))),
listItem(paragraph(text("B"))),
listItem(paragraph(text("C"))),
),
),
),
);
assert.equal(
convertProseMirrorToMarkdown(input),
"- Parent\n  - A\n  - B\n  - C",
);
});
test("nested list under an ordered item indents 3 spaces", () => {
const input = doc(
orderedList(
listItem(
paragraph(text("Parent")),
bulletList(listItem(paragraph(text("Child")))),
),
),
);
assert.equal(
convertProseMirrorToMarkdown(input),
"1. Parent\n   - Child",
);
});
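
The two nested-list tests above suggest that a child list is indented by the width of its parent's marker (3 columns under `1. `, and by the same reasoning 2 under `- `). A small sketch of that rule follows, assuming a simplified item shape `{ text, children: { ordered, items } }`. Both the shape and `renderList` are hypothetical; they are not the ProseMirror schema or the converter's code.

```javascript
// Hypothetical sketch: indent nested lists by the parent's marker width.
function renderList(items, ordered = false, indent = "") {
  const lines = [];
  items.forEach((item, i) => {
    const marker = ordered ? `${i + 1}. ` : "- ";
    lines.push(indent + marker + item.text);
    if (item.children) {
      // Children align under the parent's text, so the indent grows by the
      // marker width: 2 for "- ", 3 for "1. ".
      const childIndent = indent + " ".repeat(marker.length);
      lines.push(
        renderList(item.children.items, item.children.ordered, childIndent),
      );
    }
  });
  return lines.join("\n");
}

const nested = [
  { text: "Parent", children: { ordered: false, items: [{ text: "Child" }] } },
];
console.log(JSON.stringify(renderList(nested, true))); // "1. Parent\n   - Child"
```

Tying the indent to the marker width keeps the child inside the parent item under CommonMark's list-item rules, which is why an ordered parent needs one more space than a bulleted one.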
test("link with title -> [t](url \"title\")", () => {
const input = doc(
paragraph(
text("click", [
{ type: "link", attrs: { href: "https://example.com", title: "the title" } },
]),
),
);
assert.equal(
convertProseMirrorToMarkdown(input),
'[click](https://example.com "the title")',
);
});
test("hardBreak -> trailing two-spaces+newline", () => {
const input = doc(
paragraph(text("line1"), { type: "hardBreak" }, text("line2")),
);
assert.equal(convertProseMirrorToMarkdown(input), "line1  \nline2");
});
test("table cell with two block children joined by a space (and a pipe escaped)", () => {
const input = doc({
type: "table",
content: [
{
type: "tableRow",
content: [
{
type: "tableCell",
content: [paragraph(text("a|b")), paragraph(text("c"))],
},
],
},
],
});
// Single-column header row + separator. The cell joins its two paragraphs
// with a space ("a|b c") then escapes the pipe -> "a\|b c".
assert.equal(
convertProseMirrorToMarkdown(input),
"| a\\|b c |\n| --- |",
);
});
test("code block trailing newline trimmed", () => {
const input = doc({
type: "codeBlock",
attrs: { language: "js" },
content: [text("const a = 1;\n")],
});
// The single trailing newline inside the code is trimmed; fences add one.
assert.equal(
convertProseMirrorToMarkdown(input),
"```js\nconst a = 1;\n```",
);
});
test("textAlign value: delimiting double-quote escaped (attribute-safe, idempotent; < > left literal/inert)", () => {
const input = doc({
type: "paragraph",
attrs: { textAlign: 'right"><b' },
content: [text("body")],
});
// Attribute values escape only & and " so the value cannot break out of the
// quoted attribute. < and > are left literal: parse5/jsdom does NOT decode
// &lt;/&gt; inside attribute values, so escaping them would corrupt the value
// and accumulate on every round-trip. The literal < > are inert inside quotes.
assert.equal(
convertProseMirrorToMarkdown(input),
'<div align="right&quot;><b">body</div>',
);
});
test("highlight color: delimiting double-quote escaped (attribute-safe; < > inert, and import sanitizes the color)", () => {
const input = doc(
paragraph(
text("hi", [{ type: "highlight", attrs: { color: 'red"><script' } }]),
),
);
assert.equal(
convertProseMirrorToMarkdown(input),
'<mark style="background-color: red&quot;><script">hi</mark>',
);
});
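
The `textAlign` and highlight tests above describe an attribute-escaping policy: escape only `&` and `"`, and leave `<` and `>` literal. A minimal sketch of such a helper follows. `escapeAttr` is a hypothetical name and the converter's real helper may differ; the expected strings are taken from the two tests.

```javascript
// Hypothetical sketch: escape only what can break out of a double-quoted
// attribute value. `&` is escaped first so the `&quot;` entities produced
// by the second step are not escaped again.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

const align = 'right"><b';
console.log(`<div align="${escapeAttr(align)}">body</div>`);
// <div align="right&quot;><b">body</div>
```

The order of the two `replace` calls matters: escaping `"` first and `&` second would turn `&quot;` into `&amp;quot;`.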
test("empty task item still emits its marker", () => {
const input = doc({
type: "taskList",
content: [
{ type: "taskItem", attrs: { checked: false }, content: [] },
{ type: "taskItem", attrs: { checked: true }, content: [] },
],
});
assert.equal(convertProseMirrorToMarkdown(input), "- [ ]\n- [x]");
});


@@ -0,0 +1,301 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
insertNodeRelative,
sanitizeForYjs,
findUnstorableAttr,
} from "../../build/lib/node-ops.js";
// ProseMirror builders. Blocks carry a stable id in attrs.id.
const textNode = (text) => ({ type: "text", text });
const para = (id, ...children) => ({
type: "paragraph",
attrs: { id },
content: children,
});
const doc = (...children) => ({ type: "doc", content: children });
const snapshot = (v) => JSON.parse(JSON.stringify(v));
// A table cell holding a single paragraph.
const cell = (id, innerPara) => ({
type: "tableCell",
attrs: { id },
content: [innerPara],
});
const row = (id, ...cells) => ({
type: "tableRow",
attrs: { id },
content: cells,
});
const table = (id, ...rows) => ({
type: "table",
attrs: { id },
content: rows,
});
// A 2x2 table: rows r1/r2, cells c1..c4, each cell holds a paragraph p1..p4.
const make2x2Table = () =>
doc(
table(
"t1",
row("r1", cell("c1", para("p1", textNode("A1"))), cell("c2", para("p2", textNode("A2")))),
row("r2", cell("c3", para("p3", textNode("B1"))), cell("c4", para("p4", textNode("B2")))),
),
);
const freshRow = () => row("rNEW", cell("cNEW", para("pNEW", textNode("NEW"))));
const freshCell = () => cell("cNEW", para("pNEW", textNode("NEW")));
// ---------------------------------------------------------------------------
// sanitizeForYjs
// ---------------------------------------------------------------------------
test("sanitizeForYjs strips undefined node-attr keys, preserves null/false/0/''", () => {
const input = doc({
type: "paragraph",
attrs: {
id: "p-1",
gone: undefined,
keptNull: null,
keptFalse: false,
keptZero: 0,
keptEmpty: "",
},
content: [textNode("x")],
});
const out = sanitizeForYjs(input);
const attrs = out.content[0].attrs;
assert.equal("gone" in attrs, false);
assert.equal("keptNull" in attrs, true);
assert.equal(attrs.keptNull, null);
assert.equal(attrs.keptFalse, false);
assert.equal(attrs.keptZero, 0);
assert.equal(attrs.keptEmpty, "");
// Input must not be mutated.
assert.equal("gone" in input.content[0].attrs, true);
});
test("sanitizeForYjs strips undefined mark-attr keys, preserves falsy values", () => {
const input = doc({
type: "paragraph",
attrs: { id: "p-1" },
content: [
{
type: "text",
text: "x",
marks: [
{
type: "link",
attrs: { href: "", target: undefined, rel: null },
},
],
},
],
});
const out = sanitizeForYjs(input);
const markAttrs = out.content[0].content[0].marks[0].attrs;
assert.equal("target" in markAttrs, false);
assert.equal(markAttrs.href, "");
assert.equal(markAttrs.rel, null);
});
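
The two `sanitizeForYjs` tests above fix its contract: drop attribute keys whose value is `undefined`, keep every other falsy value, and leave the input untouched. One way to meet that contract is sketched below. `stripUndefinedAttrs` is a hypothetical stand-in, not the code in `node-ops.js`, and it copies `attrs` only one level deep.

```javascript
// Hypothetical sketch: rebuild the tree, filtering `undefined` out of every
// `attrs` object (nodes and marks alike). Nothing is written to the input.
function stripUndefinedAttrs(node) {
  if (Array.isArray(node)) return node.map(stripUndefinedAttrs);
  if (!node || typeof node !== "object") return node;

  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === "attrs" && value && typeof value === "object") {
      // Keep null, false, 0 and "" ; drop only undefined.
      out.attrs = Object.fromEntries(
        Object.entries(value).filter(([, v]) => v !== undefined),
      );
    } else {
      // Recurses into `content` and `marks` arrays.
      out[key] = stripUndefinedAttrs(value);
    }
  }
  return out;
}
```

Comparing against `undefined` explicitly, rather than testing truthiness, is what preserves `null`, `false`, `0` and `""`.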
// ---------------------------------------------------------------------------
// findUnstorableAttr
// ---------------------------------------------------------------------------
test("findUnstorableAttr returns a path for an undefined node attr", () => {
const input = doc(
para("p-0", textNode("ok")),
{
type: "paragraph",
attrs: { id: "p-1", indent: undefined },
content: [textNode("y")],
},
);
const hit = findUnstorableAttr(input);
assert.equal(hit, "content[1].attrs.indent (undefined)");
});
test("findUnstorableAttr finds an unstorable mark attr", () => {
const input = doc({
type: "paragraph",
attrs: { id: "p-1" },
content: [
{
type: "text",
text: "x",
marks: [{ type: "link", attrs: { href: () => {} } }],
},
],
});
const hit = findUnstorableAttr(input);
assert.equal(hit, "content[0].content[0].marks[0].attrs.href (function)");
});
test("findUnstorableAttr returns null for a clean doc", () => {
const input = doc(para("p-1", textNode("clean")));
assert.equal(findUnstorableAttr(input), null);
});
// ---------------------------------------------------------------------------
// insertNodeRelative — table-structure-aware
// ---------------------------------------------------------------------------
test("insertNodeRelative inserting a tableRow anchored on a paragraph INSIDE a cell appends a sibling row to the table", () => {
const input = make2x2Table();
const { doc: out, inserted } = insertNodeRelative(input, freshRow(), {
position: "after",
anchorNodeId: "p4", // paragraph inside last cell of the last row
});
assert.equal(inserted, true);
const tbl = out.content[0];
// table.content length +1 (the row is a direct child of the table).
assert.equal(tbl.content.length, 3);
// The new row is a direct child of the table, NOT nested inside a cell.
const newRow = tbl.content[2];
assert.equal(newRow.type, "tableRow");
assert.equal(newRow.attrs.id, "rNEW");
// Existing rows' cells are intact.
assert.deepEqual(
tbl.content[0].content.map((c) => c.attrs.id),
["c1", "c2"],
);
assert.deepEqual(
tbl.content[1].content.map((c) => c.attrs.id),
["c3", "c4"],
);
// Assert the new row is NOT nested inside any existing cell.
for (const r of [tbl.content[0], tbl.content[1]]) {
for (const c of r.content) {
const ids = (c.content || []).map((n) => n.attrs?.id);
assert.equal(ids.includes("rNEW"), false);
}
}
});
test("insertNodeRelative before/after place the new row at the correct index relative to the enclosing row", () => {
// "before" the first row.
{
const input = make2x2Table();
const { doc: out } = insertNodeRelative(input, freshRow(), {
position: "before",
anchorNodeId: "p1", // paragraph in first row
});
assert.deepEqual(
out.content[0].content.map((r) => r.attrs.id),
["rNEW", "r1", "r2"],
);
}
// "after" the first row.
{
const input = make2x2Table();
const { doc: out } = insertNodeRelative(input, freshRow(), {
position: "after",
anchorNodeId: "p1", // paragraph in first row
});
assert.deepEqual(
out.content[0].content.map((r) => r.attrs.id),
["r1", "rNEW", "r2"],
);
}
});
test("insertNodeRelative inserting a tableCell anchored inside a cell adds it to the enclosing row", () => {
const input = make2x2Table();
const { doc: out, inserted } = insertNodeRelative(input, freshCell(), {
position: "after",
anchorNodeId: "p1", // paragraph inside first cell of first row
});
assert.equal(inserted, true);
// The cell is spliced into the enclosing row (r1) after c1.
assert.deepEqual(
out.content[0].content[0].content.map((c) => c.attrs.id),
["c1", "cNEW", "c2"],
);
// The other row is untouched.
assert.deepEqual(
out.content[0].content[1].content.map((c) => c.attrs.id),
["c3", "c4"],
);
});
test("insertNodeRelative inserting a tableRow with an anchor NOT inside a table throws", () => {
const input = doc(para("p-1", textNode("plain")));
assert.throws(
() =>
insertNodeRelative(input, freshRow(), {
position: "after",
anchorNodeId: "p-1",
}),
/not inside a table/,
);
});
test("insertNodeRelative append + tableRow throws", () => {
const input = make2x2Table();
assert.throws(
() => insertNodeRelative(input, freshRow(), { position: "append" }),
/cannot append a tableRow at the top level/,
);
});
test("insertNodeRelative structural insert with unresolved anchor returns inserted:false (no throw)", () => {
const input = make2x2Table();
const { doc: out, inserted } = insertNodeRelative(input, freshRow(), {
position: "after",
anchorNodeId: "does-not-exist",
});
assert.equal(inserted, false);
assert.deepEqual(out, input);
});
test("insertNodeRelative tableRow by anchorText resolving to the table block appends within the table", () => {
const input = make2x2Table();
// anchorText "A1" lives in the first cell; the matched top-level block is the
// table itself, so the row appends at the end of the table.
const { doc: out, inserted } = insertNodeRelative(input, freshRow(), {
position: "after",
anchorText: "A1",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content[0].content.map((r) => r.attrs.id),
["r1", "r2", "rNEW"],
);
});
// ---------------------------------------------------------------------------
// Regression: a normal (non-structural) paragraph insert is unchanged.
// ---------------------------------------------------------------------------
test("insertNodeRelative regression: normal paragraph before/after a top-level block behaves as before", () => {
const before = doc(para("p-1", textNode("one")), para("p-2", textNode("two")));
{
const { doc: out, inserted } = insertNodeRelative(
before,
para("new", textNode("NEW")),
{ position: "before", anchorNodeId: "p-2" },
);
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "new", "p-2"],
);
}
{
const snap = snapshot(before);
const { doc: out, inserted } = insertNodeRelative(
before,
para("new", textNode("NEW")),
{ position: "after", anchorNodeId: "p-1" },
);
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "new", "p-2"],
);
// Input not mutated.
assert.deepEqual(before, snap);
}
});


@@ -0,0 +1,402 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
blockPlainText,
replaceNodeById,
deleteNodeById,
insertNodeRelative,
} from "../../build/lib/node-ops.js";
// ProseMirror builders. Blocks carry a stable id in attrs.id.
const textNode = (text) => ({ type: "text", text });
const para = (id, ...children) => ({
type: "paragraph",
attrs: { id },
content: children,
});
const doc = (...children) => ({ type: "doc", content: children });
const snapshot = (v) => JSON.parse(JSON.stringify(v));
// A callout / table-cell wraps its children in `content`, just like any other
// block, so recursion reaches a paragraph nested inside it.
const callout = (id, ...children) => ({
type: "callout",
attrs: { id, type: "info" },
content: children,
});
const tableDoc = (innerPara) =>
doc({
type: "table",
attrs: { id: "table-1" },
content: [
{
type: "tableRow",
attrs: { id: "row-1" },
content: [
{
type: "tableCell",
attrs: { id: "cell-1" },
content: [innerPara],
},
],
},
],
});
// ---------------------------------------------------------------------------
// blockPlainText
// ---------------------------------------------------------------------------
test("blockPlainText concatenates nested text", () => {
const node = {
type: "callout",
content: [
para("p-1", textNode("Hello "), textNode("world")),
para("p-2", textNode("!")),
],
};
assert.equal(blockPlainText(node), "Hello world!");
});
test("blockPlainText returns '' for nullish / non-object", () => {
assert.equal(blockPlainText(null), "");
assert.equal(blockPlainText(undefined), "");
assert.equal(blockPlainText("just a string"), "");
});
test("blockPlainText reads a bare text node", () => {
assert.equal(blockPlainText(textNode("solo")), "solo");
});
// ---------------------------------------------------------------------------
// replaceNodeById
// ---------------------------------------------------------------------------
test("replaceNodeById replaces the matching block and leaves others, count===1", () => {
const input = doc(
para("p-1", textNode("one")),
para("p-2", textNode("two")),
para("p-3", textNode("three")),
);
const newNode = para("p-2", textNode("REPLACED"));
const { doc: out, replaced } = replaceNodeById(input, "p-2", newNode);
assert.equal(replaced, 1);
// Target replaced.
assert.equal(out.content[1].content[0].text, "REPLACED");
// Siblings untouched (text and ids).
assert.equal(out.content[0].content[0].text, "one");
assert.equal(out.content[2].content[0].text, "three");
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "p-2", "p-3"],
);
});
test("replaceNodeById on no-match returns replaced===0 and does not throw", () => {
const input = doc(para("p-1", textNode("one")));
const { doc: out, replaced } = replaceNodeById(
input,
"missing",
para("x", textNode("x")),
);
assert.equal(replaced, 0);
// Document content is preserved.
assert.equal(out.content[0].content[0].text, "one");
});
test("replaceNodeById replaces EVERY node sharing the id (count reflects all)", () => {
const input = doc(
para("dup", textNode("a")),
para("dup", textNode("b")),
para("keep", textNode("c")),
);
const { doc: out, replaced } = replaceNodeById(
input,
"dup",
para("dup", textNode("NEW")),
);
assert.equal(replaced, 2);
assert.equal(out.content[0].content[0].text, "NEW");
assert.equal(out.content[1].content[0].text, "NEW");
assert.equal(out.content[2].content[0].text, "c");
// The two replacements must not share a reference (deep clone per match).
assert.notEqual(out.content[0], out.content[1]);
});
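
The duplicate-id test above requires that every match be replaced and that the replacements not share a reference. A sketch of a traversal with those properties follows. `replaceById` is a hypothetical illustration, not the exported `replaceNodeById`, and it uses a JSON round-trip for cloning on the assumption that nodes are plain JSON data.

```javascript
// Hypothetical sketch: copy-on-walk replacement with one fresh clone per match.
const cloneJson = (value) => JSON.parse(JSON.stringify(value));

function replaceById(doc, id, replacement) {
  let replaced = 0;
  const visit = (node) => {
    if (node.attrs && node.attrs.id === id) {
      replaced += 1;
      return cloneJson(replacement); // fresh copy for every match
    }
    // Shallow-copy the node and rebuild `content`; the input is never written.
    if (!Array.isArray(node.content)) return { ...node };
    return { ...node, content: node.content.map(visit) };
  };
  const out = visit(doc);
  return { doc: out, replaced };
}
```

Cloning inside the match branch, rather than once up front, is what keeps a later edit to one replaced block from leaking into the other.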
test("replaceNodeById reaches a node nested inside a callout", () => {
const input = doc(callout("c-1", para("inner", textNode("old"))));
const { doc: out, replaced } = replaceNodeById(
input,
"inner",
para("inner", textNode("new")),
);
assert.equal(replaced, 1);
assert.equal(out.content[0].content[0].content[0].text, "new");
});
test("replaceNodeById reaches a node nested inside a table cell", () => {
const input = tableDoc(para("deep", textNode("before")));
const { doc: out, replaced } = replaceNodeById(
input,
"deep",
para("deep", textNode("after")),
);
assert.equal(replaced, 1);
const cellPara = out.content[0].content[0].content[0].content[0];
assert.equal(cellPara.content[0].text, "after");
});
test("replaceNodeById does NOT mutate input (deep-equal snapshot)", () => {
const input = doc(
para("p-1", textNode("one")),
callout("c-1", para("inner", textNode("old"))),
);
const snap = snapshot(input);
const { doc: out } = replaceNodeById(
input,
"inner",
para("inner", textNode("changed")),
);
assert.deepEqual(input, snap);
assert.notEqual(out, input);
});
// ---------------------------------------------------------------------------
// deleteNodeById
// ---------------------------------------------------------------------------
test("deleteNodeById removes the block and reports deleted===1", () => {
const input = doc(
para("p-1", textNode("one")),
para("p-2", textNode("two")),
para("p-3", textNode("three")),
);
const { doc: out, deleted } = deleteNodeById(input, "p-2");
assert.equal(deleted, 1);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "p-3"],
);
});
test("deleteNodeById on no-match returns deleted===0 and leaves content", () => {
const input = doc(para("p-1", textNode("one")));
const { doc: out, deleted } = deleteNodeById(input, "missing");
assert.equal(deleted, 0);
assert.equal(out.content.length, 1);
});
test("deleteNodeById removes a node nested inside a callout", () => {
const input = doc(
callout("c-1", para("inner", textNode("x")), para("keep", textNode("y"))),
);
const { doc: out, deleted } = deleteNodeById(input, "inner");
assert.equal(deleted, 1);
assert.deepEqual(
out.content[0].content.map((n) => n.attrs.id),
["keep"],
);
});
test("deleteNodeById removes EVERY node sharing the id", () => {
const input = doc(
para("dup", textNode("a")),
para("keep", textNode("b")),
para("dup", textNode("c")),
);
const { doc: out, deleted } = deleteNodeById(input, "dup");
assert.equal(deleted, 2);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["keep"],
);
});
test("deleteNodeById does NOT mutate input (deep-equal snapshot)", () => {
const input = doc(
para("p-1", textNode("one")),
para("p-2", textNode("two")),
);
const snap = snapshot(input);
const { doc: out } = deleteNodeById(input, "p-2");
assert.deepEqual(input, snap);
assert.notEqual(out, input);
});
// ---------------------------------------------------------------------------
// insertNodeRelative
// ---------------------------------------------------------------------------
test("insertNodeRelative before by anchorNodeId", () => {
const input = doc(para("p-1", textNode("one")), para("p-2", textNode("two")));
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "before",
anchorNodeId: "p-2",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "new", "p-2"],
);
});
test("insertNodeRelative after by anchorNodeId", () => {
const input = doc(para("p-1", textNode("one")), para("p-2", textNode("two")));
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "after",
anchorNodeId: "p-1",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "new", "p-2"],
);
});
test("insertNodeRelative after by anchorNodeId reaches a nested sibling", () => {
const input = doc(
callout("c-1", para("a", textNode("a")), para("b", textNode("b"))),
);
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "after",
anchorNodeId: "a",
});
assert.equal(inserted, true);
// Inserted as a sibling inside the callout's content array.
assert.deepEqual(
out.content[0].content.map((n) => n.attrs.id),
["a", "new", "b"],
);
});
test("insertNodeRelative before by anchorText (top-level)", () => {
const input = doc(
para("p-1", textNode("alpha")),
para("p-2", textNode("beta")),
);
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "before",
anchorText: "beta",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "new", "p-2"],
);
});
test("insertNodeRelative after by anchorText (top-level)", () => {
const input = doc(
para("p-1", textNode("alpha")),
para("p-2", textNode("beta")),
);
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "after",
anchorText: "alpha",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "new", "p-2"],
);
});
test("insertNodeRelative anchorText scans TOP-LEVEL blocks via recursive plain text", () => {
// anchorText matches the FIRST top-level block whose (recursive) blockPlainText
// includes the string. "deeptext" lives nested in a top-level callout, so the
// callout itself is the matched top-level block and the node lands as its
// sibling at the top level (not inside the callout).
const input = doc(
callout("c-1", para("inner", textNode("deeptext"))),
para("p-2", textNode("tail")),
);
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "after",
anchorText: "deeptext",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["c-1", "new", "p-2"],
);
});
test("insertNodeRelative anchorText that matches only nested text still inserts at the top level, never inside the container", () => {
// "lonely" appears only in a paragraph nested inside the callout. The top-level
// scan still finds it, because the callout's recursive plain text includes the
// nested paragraph's text. To prove that insertion targets the TOP-LEVEL
// (parent-array) position only, assert that the node lands beside the callout
// and never inside it.
const input = doc(callout("c-1", para("inner", textNode("lonely word"))));
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "before",
anchorText: "lonely",
});
assert.equal(inserted, true);
// Inserted at the top level (siblings of the callout), not into the callout.
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["new", "c-1"],
);
// The callout's own children are untouched.
assert.deepEqual(
out.content[1].content.map((n) => n.attrs.id),
["inner"],
);
});
test("insertNodeRelative append pushes the node at the end of top-level content", () => {
const input = doc(para("p-1", textNode("one")), para("p-2", textNode("two")));
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "append",
});
assert.equal(inserted, true);
assert.deepEqual(
out.content.map((n) => n.attrs.id),
["p-1", "p-2", "new"],
);
});
test("insertNodeRelative inserted===false when anchorNodeId missing", () => {
const input = doc(para("p-1", textNode("one")));
const node = para("new", textNode("NEW"));
const { doc: out, inserted } = insertNodeRelative(input, node, {
position: "after",
anchorNodeId: "nope",
});
assert.equal(inserted, false);
assert.deepEqual(out, input);
});
test("insertNodeRelative inserted===false when anchorText missing", () => {
const input = doc(para("p-1", textNode("one")));
const node = para("new", textNode("NEW"));
const { inserted } = insertNodeRelative(input, node, {
position: "before",
anchorText: "nomatch",
});
assert.equal(inserted, false);
});
test("insertNodeRelative does NOT mutate input (deep-equal snapshot)", () => {
const input = doc(para("p-1", textNode("one")), para("p-2", textNode("two")));
const snap = snapshot(input);
const node = para("new", textNode("NEW"));
const { doc: out } = insertNodeRelative(input, node, {
position: "after",
anchorNodeId: "p-1",
});
assert.deepEqual(input, snap);
assert.notEqual(out, input);
});
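
The `*ById` tests above pin down three properties: every node sharing the id is affected, nested containers are searched, and the input document is never mutated. A minimal sketch of a delete with those properties might look like the following. `deleteNodeByIdSketch` is a hypothetical name, and this is not the code in `build/lib/node-ops.js`, only one way to satisfy the same assertions. It assumes ProseMirror-style JSON nodes and `structuredClone` (Node 17+).

```javascript
// Illustrative sketch only: NOT the implementation in build/lib/node-ops.js.
// Assumes ProseMirror-style JSON ({ type, attrs?, content? }) and Node 17+
// for structuredClone.
function deleteNodeByIdSketch(doc, id) {
  // Work on a deep copy so the caller's document is never mutated.
  const out = structuredClone(doc);
  let deleted = 0;

  const prune = (node) => {
    if (!node || !Array.isArray(node.content)) return;
    // Drop every direct child carrying the id, counting each removal.
    node.content = node.content.filter((child) => {
      if (child?.attrs?.id === id) {
        deleted += 1;
        return false;
      }
      return true;
    });
    // Recurse into the survivors to reach callouts, table cells, and so on.
    node.content.forEach(prune);
  };

  prune(out);
  return { doc: out, deleted };
}
```

Cloning up front keeps the traversal simple at the cost of copying the whole document. An implementation that cares about large pages could instead copy only the path to each match.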


@@ -0,0 +1,109 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { buildOutline, getNodeByRef } from "../../build/lib/node-ops.js";
// Helpers to build the small fixture doc.
const textNode = (text) => ({ type: "text", text });
const paragraph = (id, text) => ({
type: "paragraph",
attrs: { id },
content: [textNode(text)],
});
// A table cell holds a paragraph; cells/rows/table carry NO attrs.id.
const cell = (text) => ({
type: "tableCell",
content: [{ type: "paragraph", content: [textNode(text)] }],
});
const row = (...texts) => ({
type: "tableRow",
content: texts.map(cell),
});
const listItem = (text) => ({
type: "listItem",
content: [{ type: "paragraph", content: [textNode(text)] }],
});
// A long paragraph to exercise truncation (>100 chars).
const longText = "x".repeat(150);
const buildDoc = () => ({
type: "doc",
content: [
{ type: "heading", attrs: { id: "h1", level: 2 }, content: [textNode("Title")] },
paragraph("p1", longText),
{
type: "table",
content: [row("A", "B", "C"), row("1", "2", "3")],
},
{
type: "bulletList",
attrs: { id: "list1" },
content: [listItem("one"), listItem("two")],
},
],
});
test("buildOutline returns one compact entry per top-level block", () => {
const outline = buildOutline(buildDoc());
assert.equal(outline.length, 4);
// Heading: level + id + firstText.
assert.equal(outline[0].type, "heading");
assert.equal(outline[0].level, 2);
assert.equal(outline[0].id, "h1");
assert.equal(outline[0].firstText, "Title");
// Long paragraph text is truncated to 100 chars + ellipsis.
assert.equal(outline[1].id, "p1");
assert.equal(outline[1].firstText, "x".repeat(100) + "…");
assert.equal(outline[1].firstText.length, 101);
// Table: rows/cols/header from the first row; no id on the table itself.
assert.equal(outline[2].type, "table");
assert.equal(outline[2].rows, 2);
assert.equal(outline[2].cols, 3);
assert.deepEqual(outline[2].header, ["A", "B", "C"]);
assert.equal(outline[2].id, null);
// List: item count.
assert.equal(outline[3].type, "bulletList");
assert.equal(outline[3].items, 2);
});
test("buildOutline is null-safe", () => {
assert.deepEqual(buildOutline(undefined), []);
assert.deepEqual(buildOutline({ type: "doc" }), []);
assert.deepEqual(buildOutline(42), []);
});
test("getNodeByRef resolves a block id to its node and path", () => {
const doc = buildDoc();
const hit = getNodeByRef(doc, "h1");
assert.ok(hit);
assert.equal(hit.type, "heading");
assert.deepEqual(hit.path, [0]);
assert.equal(hit.node.attrs.id, "h1");
});
test("getNodeByRef resolves #<index> to a top-level block (table)", () => {
const doc = buildDoc();
const hit = getNodeByRef(doc, "#2");
assert.ok(hit);
assert.equal(hit.type, "table");
assert.deepEqual(hit.path, [2]);
});
test("getNodeByRef returns null for an unknown ref", () => {
assert.equal(getNodeByRef(buildDoc(), "nope"), null);
});
test("getNodeByRef returns a clone (mutating it does not change the input)", () => {
const doc = buildDoc();
const hit = getNodeByRef(doc, "h1");
hit.node.attrs.id = "MUTATED";
hit.node.content[0].text = "changed";
// Original doc is untouched.
assert.equal(doc.content[0].attrs.id, "h1");
assert.equal(doc.content[0].content[0].text, "Title");
});
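
Two behaviours asserted above are easy to get subtly wrong: the 100-character preview limit with a single trailing ellipsis, and the `#<index>` reference form that addresses blocks without an `attrs.id`. The sketch below shows one plausible shape for both. The function names are hypothetical and the real `buildOutline` / `getNodeByRef` may differ; the 100-character limit and the `{ type, path, node }` result shape are taken from the tests.

```javascript
// Illustrative sketch only: NOT the implementation in build/lib/node-ops.js.
const MAX_FIRST_TEXT = 100; // limit taken from the test's expectation

// Shorten a preview string to at most `max` characters plus one ellipsis.
// Note: slice() counts UTF-16 code units, which is an assumption here.
function truncateFirstTextSketch(text, max = MAX_FIRST_TEXT) {
  return text.length > max ? text.slice(0, max) + "…" : text;
}

// Resolve "#<index>" to a top-level block; return null for anything else.
function resolveIndexRefSketch(doc, ref) {
  const match = /^#(\d+)$/.exec(ref);
  if (!match || !Array.isArray(doc?.content)) return null;
  const index = Number(match[1]);
  const node = doc.content[index];
  if (!node) return null;
  // Return a clone so callers cannot mutate the source document.
  return { type: node.type, path: [index], node: structuredClone(node) };
}
```

Returning a clone mirrors the last test above, where mutating the hit must leave the original document untouched.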


@@ -0,0 +1,153 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { withPageLock } from "../../build/lib/page-lock.js";
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
test("two ops on the same pageId run strictly sequentially (no overlap)", async () => {
const events = [];
const pageId = "same-page";
const p1 = withPageLock(pageId, async () => {
events.push("start-1");
await delay(40);
events.push("end-1");
return "r1";
});
// Queue the second op while the first is still running.
const p2 = withPageLock(pageId, async () => {
events.push("start-2");
await delay(10);
events.push("end-2");
return "r2";
});
const [r1, r2] = await Promise.all([p1, p2]);
assert.equal(r1, "r1");
assert.equal(r2, "r2");
// First op must fully finish before the second one begins.
assert.deepEqual(events, ["start-1", "end-1", "start-2", "end-2"]);
});
test("same pageId ordering holds for many queued ops", async () => {
const pageId = "ordered-page";
const order = [];
const active = { count: 0, maxConcurrent: 0 };
const ops = [];
for (let i = 0; i < 6; i++) {
ops.push(
withPageLock(pageId, async () => {
active.count += 1;
active.maxConcurrent = Math.max(active.maxConcurrent, active.count);
order.push(i);
await delay(5);
active.count -= 1;
return i;
}),
);
}
const results = await Promise.all(ops);
assert.deepEqual(results, [0, 1, 2, 3, 4, 5]);
assert.deepEqual(order, [0, 1, 2, 3, 4, 5]);
// Strictly sequential: never more than one op running at a time.
assert.equal(active.maxConcurrent, 1);
});
test("a rejecting op does not poison the chain for the same page", async () => {
const pageId = "poison-page";
const events = [];
const failing = withPageLock(pageId, async () => {
events.push("fail-start");
await delay(20);
events.push("fail-throw");
throw new Error("boom");
});
// The caller of the failing op must still see the rejection.
await assert.rejects(failing, /boom/);
const following = withPageLock(pageId, async () => {
events.push("next-run");
await delay(5);
return "ok";
});
const result = await following;
assert.equal(result, "ok");
// The next op ran after the failing one settled and was not blocked by it.
assert.deepEqual(events, ["fail-start", "fail-throw", "next-run"]);
});
test("failing op queued before a success both resolve/reject correctly", async () => {
const pageId = "poison-page-2";
const order = [];
const failing = withPageLock(pageId, async () => {
order.push("fail");
await delay(20);
throw new Error("nope");
});
const ok = withPageLock(pageId, async () => {
order.push("ok");
await delay(5);
return 123;
});
await assert.rejects(failing, /nope/);
assert.equal(await ok, 123);
// The failing op still ran first (it was queued first), then the success.
assert.deepEqual(order, ["fail", "ok"]);
});
test("ops on different pageIds run concurrently (overlap)", async () => {
const events = [];
const pA = withPageLock("page-A", async () => {
events.push("A-start");
await delay(40);
events.push("A-end");
return "A";
});
const pB = withPageLock("page-B", async () => {
events.push("B-start");
await delay(10);
events.push("B-end");
return "B";
});
const [rA, rB] = await Promise.all([pA, pB]);
assert.equal(rA, "A");
assert.equal(rB, "B");
// B starts before A finishes (concurrent), and B finishes before A.
assert.deepEqual(events, ["A-start", "B-start", "B-end", "A-end"]);
});
test("no functional leak: many sequential ops on same page keep working", async () => {
const pageId = "leak-page";
// Run a long series of fully sequential ops (each awaited before the next is
// queued) so the internal map entry is created and dropped repeatedly.
for (let i = 0; i < 50; i++) {
const value = await withPageLock(pageId, async () => {
await delay(1);
return i;
});
assert.equal(value, i);
}
// After the chain has drained, a brand new op on the same page still works,
// confirming the entry was not left in a broken state.
const final = await withPageLock(pageId, async () => "still-works");
assert.equal(final, "still-works");
});
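
Taken together, the page-lock tests describe a per-key mutex: strict FIFO ordering on one `pageId`, no ordering between different ids, rejections visible to the caller but not to later ops, and no stale map entry once the queue drains. One common way to get all four is a promise chain per key. The sketch below assumes that approach; `withPageLockSketch` and `chains` are hypothetical names, and the real `build/lib/page-lock.js` may be structured differently.

```javascript
// Illustrative sketch only: NOT the implementation in build/lib/page-lock.js.
const chains = new Map(); // pageId -> tail promise of the last queued op

function withPageLockSketch(pageId, fn) {
  const prev = chains.get(pageId) ?? Promise.resolve();
  // Start fn only after the previous op on this page has settled.
  const result = prev.then(() => fn());
  // The stored tail never rejects, so one failure cannot poison later ops.
  const tail = result.then(
    () => {},
    () => {},
  );
  chains.set(pageId, tail);
  // Drop the entry once the chain drains, unless a newer op replaced the tail.
  tail.then(() => {
    if (chains.get(pageId) === tail) chains.delete(pageId);
  });
  // The caller still sees fn's own resolution or rejection.
  return result;
}
```

The identity check before `chains.delete` matters: without it, an op that finishes while a newer one is already queued would remove the newer op's tail and let a third op start early.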


@@ -0,0 +1,149 @@
// Round-trip regression tests: PM -> markdown -> PM must preserve rich nodes.
// These lock in the converter/schema fixes (math, mention, attachment, columns,
// nested blocks, text color) and the attribute-escaping idempotency fix.
import { test } from "node:test";
import assert from "node:assert/strict";
import { convertProseMirrorToMarkdown } from "../../build/lib/markdown-converter.js";
import { markdownToProseMirror } from "../../build/lib/collaboration.js";
const doc = (...content) => ({ type: "doc", content });
const para = (...content) => ({ type: "paragraph", content });
const text = (t, marks) => (marks ? { type: "text", text: t, marks } : { type: "text", text: t });
// Recursively collect nodes of a given type.
const findNodes = (node, type, acc = []) => {
if (!node) return acc;
if (node.type === type) acc.push(node);
for (const c of node.content || []) findNodes(c, type, acc);
return acc;
};
// Recursively collect the set of mark types present.
const markTypes = (node, acc = new Set()) => {
if (!node) return acc;
for (const m of node.marks || []) acc.add(m.type);
for (const c of node.content || []) markTypes(c, acc);
return acc;
};
const roundtrip = async (pmDoc) => markdownToProseMirror(convertProseMirrorToMarkdown(pmDoc));
test("round-trip: text color (textStyle mark) survives", async () => {
const input = doc(para(text("colored", [{ type: "textStyle", attrs: { color: "red" } }])));
const out = await roundtrip(input);
const ts = findNodes(out, "text").flatMap((n) => n.marks || []).filter((m) => m.type === "textStyle");
assert.ok(ts.length >= 1, "textStyle mark should survive");
assert.equal(ts[0].attrs?.color, "red");
});
test("round-trip: mathInline with '<' survives and is idempotent", async () => {
const input = doc(para(text("x"), { type: "mathInline", attrs: { text: "a < b \\leq c" } }));
const md1 = convertProseMirrorToMarkdown(input);
const md2 = convertProseMirrorToMarkdown(await markdownToProseMirror(md1));
assert.equal(md1, md2, "markdown must be idempotent across a round-trip (no escape accumulation)");
const out = await markdownToProseMirror(md1);
const math = findNodes(out, "mathInline");
assert.equal(math.length, 1, "mathInline node should survive");
assert.equal(math[0].attrs?.text, "a < b \\leq c", "LaTeX (incl. '<') preserved exactly");
});
test("round-trip: mathBlock survives", async () => {
const input = doc({ type: "mathBlock", attrs: { text: "E = mc^2" } });
const out = await roundtrip(input);
const math = findNodes(out, "mathBlock");
assert.equal(math.length, 1);
assert.equal(math[0].attrs?.text, "E = mc^2");
});
test("round-trip: mention node survives (not flattened to @text)", async () => {
const input = doc(para(text("hi "), { type: "mention", attrs: { id: "u1", label: "Alice", entityType: "user", entityId: "u1" } }));
const out = await roundtrip(input);
assert.equal(findNodes(out, "mention").length, 1, "mention node should survive");
});
test("round-trip: attachment node survives with url + name", async () => {
const input = doc({ type: "attachment", attrs: { url: "/api/files/x/report.pdf", name: "report.pdf", mime: "application/pdf" } });
const out = await roundtrip(input);
const att = findNodes(out, "attachment");
assert.equal(att.length, 1, "attachment node should survive");
assert.equal(att[0].attrs?.url, "/api/files/x/report.pdf");
assert.equal(att[0].attrs?.name, "report.pdf");
});
test("round-trip: image inside a column survives as an image node (not literal markdown)", async () => {
const input = doc({
type: "columns",
content: [
{ type: "column", content: [para(text("left")), { type: "image", attrs: { src: "/api/files/a/p.png", alt: "pic" } }] },
{ type: "column", content: [para(text("right"))] },
],
});
const out = await roundtrip(input);
assert.equal(findNodes(out, "image").length, 1, "image inside a column must survive");
// and it must NOT leak as literal markdown text
assert.ok(!JSON.stringify(out).includes("![pic]"), "image must not become literal markdown text");
});
test("round-trip: blockquote inside a column survives as a blockquote node", async () => {
const input = doc({
type: "columns",
content: [
{ type: "column", content: [{ type: "blockquote", content: [para(text("quoted"))] }] },
{ type: "column", content: [para(text("r"))] },
],
});
const out = await roundtrip(input);
assert.equal(findNodes(out, "blockquote").length, 1, "blockquote inside a column must survive");
});
test("round-trip: table cell with colspan>1 keeps the grid (HTML fallback)", async () => {
const cell = (t, attrs = {}) => ({ type: "tableCell", attrs, content: [para(text(t))] });
const header = (t) => ({ type: "tableHeader", attrs: {}, content: [para(text(t))] });
const input = doc({
type: "table",
content: [
{ type: "tableRow", content: [header("A"), header("B")] },
{ type: "tableRow", content: [cell("wide", { colspan: 2 })] },
],
});
const out = await roundtrip(input);
const tables = findNodes(out, "table");
assert.equal(tables.length, 1, "table should survive");
const spanned = findNodes(out, "tableCell").find((c) => (c.attrs?.colspan ?? 1) > 1);
assert.ok(spanned, "colspan>1 cell should be preserved via the HTML fallback");
});
test("import: an unsafe highlight color (raw data-color) is sanitized to null (no style breakout)", async () => {
// data-color is read verbatim (no CSSOM isolation), so it is the real
// injection surface; a value with quotes/semicolons must be clamped to null.
const out = await markdownToProseMirror('<mark data-color="red&quot;; background:url(x)">hi</mark>');
const hl = findNodes(out, "text").flatMap((n) => n.marks || []).filter((m) => m.type === "highlight");
assert.ok(hl.length >= 1, "highlight mark present");
assert.equal(hl[0].attrs?.color ?? null, null, "unsafe color must be clamped to null");
});
test("import: a safe highlight color is preserved", async () => {
const out = await markdownToProseMirror('<mark style="background-color: #ff0000">hi</mark>');
const hl = findNodes(out, "text").flatMap((n) => n.marks || []).filter((m) => m.type === "highlight");
assert.ok(hl.length >= 1);
assert.equal(hl[0].attrs?.color, "#ff0000");
});
test("round-trip: attribute value with an apostrophe is idempotent (no &amp; accumulation)", async () => {
const input = doc({ type: "attachment", attrs: { url: "/api/files/x/o'brien's file.pdf", name: "o'brien's file.pdf" } });
const md1 = convertProseMirrorToMarkdown(input);
const md2 = convertProseMirrorToMarkdown(await markdownToProseMirror(md1));
assert.equal(md1, md2, "apostrophe in an attribute value must not accumulate escapes across round-trips");
const att = findNodes(await markdownToProseMirror(md1), "attachment");
assert.equal(att.length, 1);
assert.equal(att[0].attrs?.name, "o'brien's file.pdf", "apostrophe preserved verbatim");
});
test("import: a colored span that is also a comment keeps the comment mark", async () => {
const out = await markdownToProseMirror('<span data-comment-id="c1" style="color: red">x</span>');
const marks = findNodes(out, "text").flatMap((n) => n.marks || []).map((m) => m.type);
assert.ok(marks.includes("comment"), "comment mark must survive (textStyle must not steal the span)");
});
test("import: a colored mention span keeps the mention node", async () => {
const out = await markdownToProseMirror('<span data-type="mention" data-id="u1" data-label="Alice" style="color: blue">@Alice</span>');
assert.equal(findNodes(out, "mention").length, 1, "mention node must survive a colored span");
});
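
The two highlight-color import tests above imply an allowlist: a value such as `#ff0000` passes, while anything containing quotes or semicolons collapses to `null`. A conservative sketch of such a check is shown below. The accepted grammar (hex, bare keyword, `rgb()`/`rgba()`) and the name `clampColorSketch` are assumptions for illustration, not the converter's actual rules.

```javascript
// Illustrative sketch only: NOT the sanitizer used by the converter.
// The accepted grammar below is an assumption, chosen to be conservative.
const SAFE_COLOR = /^(#[0-9a-f]{3,8}|[a-z]+|rgba?\(\s*[\d.%\s,/]+\))$/i;

function clampColorSketch(value) {
  if (typeof value !== "string") return null;
  const v = value.trim();
  // Anything outside the allowlist (quotes, semicolons, url(...)) becomes null.
  return SAFE_COLOR.test(v) ? v : null;
}
```

Matching against a whole-string allowlist, rather than stripping known-bad characters, means an unexpected payload fails closed.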


@@ -0,0 +1,77 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
docmostExtensions,
clampCalloutType,
} from "../../build/lib/docmost-schema.js";
import { TiptapTransformer } from "@hocuspocus/transformer";
test("clampCalloutType: a known type passes through", () => {
assert.equal(clampCalloutType("warning"), "warning");
});
test("clampCalloutType: an uppercase known type folds to lower case", () => {
assert.equal(clampCalloutType("WARNING"), "warning");
assert.equal(clampCalloutType("Info"), "info");
});
test("clampCalloutType: an unknown type falls back to info", () => {
assert.equal(clampCalloutType("bogus"), "info");
});
test("clampCalloutType: null and undefined fall back to info", () => {
assert.equal(clampCalloutType(null), "info");
assert.equal(clampCalloutType(undefined), "info");
});
// Minimal-doc builders for the toYdoc acceptance loop.
const text = (t) => ({ type: "text", text: t });
const paragraph = (inline) => ({ type: "paragraph", content: inline });
const docOf = (...content) => ({ type: "doc", content });
// Each entry is a minimal valid doc for one Docmost node type. Inline atoms
// (mention, mathInline) and inline-capable nodes go inside a paragraph; block
// atoms and block containers go at the top level.
const cases = {
mention: docOf(
paragraph([{ type: "mention", attrs: { id: "u1", label: "Bob" } }]),
),
mathInline: docOf(paragraph([{ type: "mathInline", attrs: { text: "x^2" } }])),
mathBlock: docOf({ type: "mathBlock", attrs: { text: "x^2" } }),
details: docOf({
type: "details",
content: [
{ type: "detailsSummary", content: [text("Summary")] },
{ type: "detailsContent", content: [paragraph([text("body")])] },
],
}),
attachment: docOf({
type: "attachment",
attrs: { url: "http://x/f.zip", name: "f.zip" },
}),
video: docOf({ type: "video", attrs: { src: "http://x/v.mp4" } }),
youtube: docOf({ type: "youtube", attrs: { src: "http://y/watch" } }),
embed: docOf({ type: "embed", attrs: { src: "http://e", provider: "iframe" } }),
drawio: docOf({ type: "drawio", attrs: { src: "http://d" } }),
excalidraw: docOf({ type: "excalidraw", attrs: { src: "http://e" } }),
columns: docOf({
type: "columns",
content: [
{ type: "column", content: [paragraph([text("c1")])] },
{ type: "column", content: [paragraph([text("c2")])] },
],
}),
subpages: docOf({ type: "subpages" }),
audio: docOf({ type: "audio", attrs: { src: "http://a.mp3" } }),
pdf: docOf({ type: "pdf", attrs: { src: "http://p.pdf" } }),
pageBreak: docOf({ type: "pageBreak" }),
};
for (const [name, doc] of Object.entries(cases)) {
test(`toYdoc accepts a ${name} node without throwing`, () => {
assert.doesNotThrow(() => {
TiptapTransformer.toYdoc(doc, "default", docmostExtensions);
});
});
}
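
The four `clampCalloutType` tests fix its contract: case-insensitive match against a known set, with `info` as the fallback for unknown, `null`, and `undefined` input. A minimal sketch of that contract follows. The tests only confirm `warning` and `info` as known types, so the other entries in the set are an assumption, as is the name `clampCalloutTypeSketch`.

```javascript
// Illustrative sketch only: NOT the implementation in build/lib/docmost-schema.js.
// Only "info" and "warning" are confirmed by the tests; the rest is assumed.
const KNOWN_CALLOUT_TYPES = new Set(["info", "success", "warning", "danger"]);

function clampCalloutTypeSketch(value) {
  // Non-strings (null, undefined, numbers) fall through to the default.
  const folded = typeof value === "string" ? value.toLowerCase() : "";
  return KNOWN_CALLOUT_TYPES.has(folded) ? folded : "info";
}
```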


@@ -0,0 +1,338 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
readTable,
insertTableRow,
deleteTableRow,
updateTableCell,
} from "../../build/lib/node-ops.js";
// ---------------------------------------------------------------------------
// Builders. Tables/rows/cells carry NO attrs.id — only the paragraph inside a
// cell does. A cell holds a single plain-text paragraph.
// ---------------------------------------------------------------------------
const textNode = (text) => ({ type: "text", text });
const para = (id, text) => ({
type: "paragraph",
attrs: { id, indent: 0 },
content: text ? [textNode(text)] : [],
});
const cell = (paraId, text, colwidth) => ({
type: "tableCell",
attrs: { colspan: 1, rowspan: 1, ...(colwidth ? { colwidth } : {}) },
content: [para(paraId, text)],
});
const row = (...cells) => ({ type: "tableRow", content: cells });
const doc = (...children) => ({ type: "doc", content: children });
const snapshot = (v) => JSON.parse(JSON.stringify(v));
// Heading at index 0, a 3x3 table at index 1.
// Header row "A"/"B"/"C" with colwidths [120]/[200]/[150]; two data rows.
const makeDoc = () =>
doc(
{ type: "heading", attrs: { id: "h1", level: 1 }, content: [textNode("Title")] },
{
type: "table",
content: [
row(
cell("hpA", "A", [120]),
cell("hpB", "B", [200]),
cell("hpC", "C", [150]),
),
row(cell("p10", "r1c0"), cell("p11", "r1c1"), cell("p12", "r1c2")),
row(cell("p20", "r2c0"), cell("p21", "r2c1"), cell("p22", "r2c2")),
],
},
);
// Gather every attrs.id present anywhere in a doc.
const allIds = (node, acc = new Set()) => {
if (node && typeof node === "object" && !Array.isArray(node)) {
if (node.attrs && typeof node.attrs.id === "string") acc.add(node.attrs.id);
if (Array.isArray(node.content)) node.content.forEach((c) => allIds(c, acc));
}
return acc;
};
// ---------------------------------------------------------------------------
// readTable
// ---------------------------------------------------------------------------
test("readTable('#1') returns the 3x3 matrix, cell ids, and path", () => {
const t = readTable(makeDoc(), "#1");
assert.ok(t);
assert.equal(t.rows, 3);
assert.equal(t.cols, 3);
assert.deepEqual(t.cells, [
["A", "B", "C"],
["r1c0", "r1c1", "r1c2"],
["r2c0", "r2c1", "r2c2"],
]);
assert.deepEqual(t.cellIds, [
["hpA", "hpB", "hpC"],
["p10", "p11", "p12"],
["p20", "p21", "p22"],
]);
assert.deepEqual(t.path, [1]);
});
test("readTable(<cell paragraph id>) resolves the enclosing table", () => {
const t = readTable(makeDoc(), "p21"); // a paragraph inside a data cell
assert.ok(t);
assert.equal(t.rows, 3);
assert.equal(t.cols, 3);
assert.deepEqual(t.path, [1]);
});
test("readTable on a non-table block / unknown ref returns null", () => {
assert.equal(readTable(makeDoc(), "#0"), null); // heading, not a table
assert.equal(readTable(makeDoc(), "nope"), null); // no such id
});
// ---------------------------------------------------------------------------
// insertTableRow
// ---------------------------------------------------------------------------
test("insertTableRow appends a 4th row, copies header colwidths, fresh unique ids", () => {
const input = makeDoc();
const snap = snapshot(input);
const existingIds = allIds(input);
const { doc: out, inserted } = insertTableRow(input, "#1", ["x", "y", "z"]);
assert.equal(inserted, true);
// Input not mutated.
assert.deepEqual(input, snap);
const tbl = out.content[1];
assert.equal(tbl.content.length, 4);
const newRow = tbl.content[3];
assert.equal(newRow.type, "tableRow");
assert.equal(newRow.content.length, 3);
// Cell texts.
assert.deepEqual(
newRow.content.map((c) => c.content[0].content[0]?.text),
["x", "y", "z"],
);
// Colwidths copied from the header row.
assert.deepEqual(
newRow.content.map((c) => c.attrs.colwidth),
[[120], [200], [150]],
);
// colspan/rowspan present.
for (const c of newRow.content) {
assert.equal(c.attrs.colspan, 1);
assert.equal(c.attrs.rowspan, 1);
}
// New paragraph ids are unique and not equal to any existing id.
const newIds = newRow.content.map((c) => c.content[0].attrs.id);
assert.equal(new Set(newIds).size, 3);
for (const id of newIds) {
assert.ok(typeof id === "string" && id.length > 0);
assert.equal(existingIds.has(id), false);
}
});
test("insertTableRow at index 0 inserts before the header and pads to 3 cells", () => {
const { doc: out, inserted } = insertTableRow(makeDoc(), "#1", ["x"], 0);
assert.equal(inserted, true);
const tbl = out.content[1];
assert.equal(tbl.content.length, 4);
const newRow = tbl.content[0]; // inserted at the front
assert.equal(newRow.content.length, 3);
// First cell "x", remaining two empty.
assert.deepEqual(
newRow.content.map((c) => c.content[0].content.length),
[1, 0, 0],
);
assert.equal(newRow.content[0].content[0].content[0].text, "x");
});
test("insertTableRow throws when given more cells than columns", () => {
assert.throws(
() => insertTableRow(makeDoc(), "#1", ["a", "b", "c", "d"]),
/table_insert_row: got 4 cell\(s\) but the table has 3 column\(s\)/,
);
});
test("insertTableRow on a missing table returns inserted:false", () => {
const { inserted } = insertTableRow(makeDoc(), "#0", ["x"]);
assert.equal(inserted, false);
});
// A header cell uses type "tableHeader" (vs. "tableCell" for data cells).
const headerCell = (paraId, text, colwidth) => ({
type: "tableHeader",
attrs: { colspan: 1, rowspan: 1, ...(colwidth ? { colwidth } : {}) },
content: [para(paraId, text)],
});
// Table whose first row uses tableHeader cells.
const makeHeaderDoc = () =>
doc({
type: "table",
content: [
row(headerCell("hA", "A"), headerCell("hB", "B")),
row(cell("p10", "r1c0"), cell("p11", "r1c1")),
],
});
test("insertTableRow at index 0 inherits the header cell type (tableHeader)", () => {
const { doc: out, inserted } = insertTableRow(makeHeaderDoc(), "#0", ["x", "y"], 0);
assert.equal(inserted, true);
const tbl = out.content[0];
const newRow = tbl.content[0]; // landed at index 0
// The new row's cells inherit the header type.
assert.deepEqual(
newRow.content.map((c) => c.type),
["tableHeader", "tableHeader"],
);
assert.equal(newRow.content[0].content[0].content[0].text, "x");
});
test("insertTableRow append produces data cells (tableCell), not header cells", () => {
const { doc: out, inserted } = insertTableRow(makeHeaderDoc(), "#0", ["x", "y"]);
assert.equal(inserted, true);
const tbl = out.content[0];
const newRow = tbl.content[tbl.content.length - 1]; // appended last
assert.deepEqual(
newRow.content.map((c) => c.type),
["tableCell", "tableCell"],
);
});
// Ragged table: row 0 has 2 cols, a later row has 3.
const makeRaggedDoc = () =>
doc({
type: "table",
content: [
row(cell("a0", "a0"), cell("a1", "a1")),
row(cell("b0", "b0"), cell("b1", "b1"), cell("b2", "b2")),
],
});
test("insertTableRow uses the max column count across all rows (ragged table)", () => {
// colCount is 3 (the widest row), so 3 cells are accepted...
const { doc: out, inserted } = insertTableRow(makeRaggedDoc(), "#0", ["x", "y", "z"]);
assert.equal(inserted, true);
const tbl = out.content[0];
const newRow = tbl.content[tbl.content.length - 1];
assert.equal(newRow.content.length, 3);
assert.deepEqual(
newRow.content.map((c) => c.content[0].content[0]?.text),
["x", "y", "z"],
);
// ...but 4 cells exceed the widest row and throw.
assert.throws(
() => insertTableRow(makeRaggedDoc(), "#0", ["a", "b", "c", "d"]),
/table_insert_row: got 4 cell\(s\) but the table has 3 column\(s\)/,
);
});
test("insertTableRow into an empty table uses colCount = supplied cells", () => {
const empty = doc({ type: "table", content: [] });
const { doc: out, inserted } = insertTableRow(empty, "#0", ["x", "y", "z"]);
assert.equal(inserted, true);
const tbl = out.content[0];
assert.equal(tbl.content.length, 1);
assert.equal(tbl.content[0].content.length, 3);
assert.deepEqual(
tbl.content[0].content.map((c) => c.content[0].content[0]?.text),
["x", "y", "z"],
);
});
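// A minimal sketch of the column-count rule the tests above exercise. The helper
// names resolveColCount and assertCellsFit are hypothetical (they are not taken
// from the package); the sketch only assumes that the widest row decides the
// count and that an empty table falls back to the number of supplied cells.

```javascript
// Hypothetical helper: derive the column count used for a new row.
function resolveColCount(table, suppliedCount) {
  const rows = table.content ?? [];
  if (rows.length === 0) return suppliedCount; // empty table: use the caller's count
  // Ragged table: the widest existing row wins.
  return Math.max(...rows.map((r) => (r.content ?? []).length));
}

// Hypothetical guard that mirrors the error message asserted in the tests above.
function assertCellsFit(table, cells) {
  const colCount = resolveColCount(table, cells.length);
  if (cells.length > colCount) {
    throw new Error(
      `table_insert_row: got ${cells.length} cell(s) but the table has ${colCount} column(s)`,
    );
  }
  return colCount;
}

// A ragged fixture: row 0 has 2 cells, row 1 has 3.
const raggedSketch = {
  type: "table",
  content: [
    { type: "tableRow", content: [{}, {}] },
    { type: "tableRow", content: [{}, {}, {}] },
  ],
};
```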
test("insertTableRow mints 12-char [a-z0-9] ids that are unique and non-colliding", () => {
const input = makeDoc();
const existingIds = allIds(input);
const { doc: out } = insertTableRow(input, "#1", ["x", "y", "z"]);
const tbl = out.content[1];
const newRow = tbl.content[tbl.content.length - 1];
const newIds = newRow.content.map((c) => c.content[0].attrs.id);
// Docmost-style: exactly 12 chars from lowercase a-z0-9.
for (const id of newIds) {
assert.match(id, /^[a-z0-9]{12}$/);
assert.equal(existingIds.has(id), false); // no collision with the doc
}
// All distinct within the new row.
assert.equal(new Set(newIds).size, newIds.length);
});
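// One way to mint ids with the shape asserted above (12 chars from [a-z0-9], no
// collision with ids already in the document). mintId and ID_ALPHABET are
// hypothetical names, and Math.random is an assumption made for brevity; the
// package may well use a different generator.

```javascript
const ID_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Hypothetical helper: mint a 12-char id that is not already in `taken`.
function mintId(taken) {
  for (;;) {
    let id = "";
    for (let i = 0; i < 12; i++) {
      id += ID_ALPHABET[Math.floor(Math.random() * ID_ALPHABET.length)];
    }
    if (!taken.has(id)) {
      taken.add(id); // reserve it so later calls cannot return the same id
      return id;
    }
  }
}
```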
// ---------------------------------------------------------------------------
// deleteTableRow
// ---------------------------------------------------------------------------
test("deleteTableRow removes the 3rd row -> rows:2", () => {
const { doc: out, deleted } = deleteTableRow(makeDoc(), "#1", 2);
assert.equal(deleted, true);
const tbl = out.content[1];
assert.equal(tbl.content.length, 2);
// The removed row was the second data row (r2*).
assert.deepEqual(
tbl.content.map((r) => r.content[0].content[0].content[0]?.text ?? ""),
["A", "r1c0"],
);
});
test("deleteTableRow out-of-range index throws", () => {
assert.throws(
() => deleteTableRow(makeDoc(), "#1", 9),
/table_delete_row: row index 9 out of range \(table has 3 row\(s\)\)/,
);
});
test("deleteTableRow refuses to delete the only row", () => {
const single = doc({
type: "table",
content: [row(cell("only", "x"))],
});
assert.throws(
() => deleteTableRow(single, "#0", 0),
/refusing to delete the only row of the table/,
);
});
// ---------------------------------------------------------------------------
// updateTableCell
// ---------------------------------------------------------------------------
test("updateTableCell sets cell [1,1] to 'Z' and preserves the paragraph id", () => {
const input = makeDoc();
const snap = snapshot(input);
const { doc: out, updated } = updateTableCell(input, "#1", 1, 1, "Z");
assert.equal(updated, true);
// Input not mutated.
assert.deepEqual(input, snap);
const targetCell = out.content[1].content[1].content[1];
assert.equal(targetCell.content.length, 1);
const p = targetCell.content[0];
assert.equal(p.type, "paragraph");
assert.equal(p.attrs.id, "p11"); // preserved
assert.equal(p.content[0].text, "Z");
// Cell attrs preserved.
assert.equal(targetCell.attrs.colspan, 1);
assert.equal(targetCell.attrs.rowspan, 1);
});
test("updateTableCell out-of-range row/col throws", () => {
assert.throws(
() => updateTableCell(makeDoc(), "#1", 9, 0, "x"),
/table_update_cell: cell \[9,0\] out of range/,
);
assert.throws(
() => updateTableCell(makeDoc(), "#1", 0, 9, "x"),
/table_update_cell: cell \[0,9\] out of range/,
);
});

@@ -0,0 +1,303 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import {
blockText,
walk,
getList,
insertMarkerAfter,
setCalloutRange,
noteItem,
mdToInlineNodes,
commentsToFootnotes,
} from "../../build/lib/transforms.js";
// ---------------------------------------------------------------------------
// Builders
// ---------------------------------------------------------------------------
const t = (text, marks) => (marks ? { type: "text", text, marks } : { type: "text", text });
const para = (id, ...children) => ({
type: "paragraph",
attrs: { id },
content: children,
});
const heading = (id, text) => ({
type: "heading",
attrs: { id, level: 2 },
content: [t(text)],
});
const olist = (...items) => ({ type: "orderedList", content: items });
const li = (text) => ({
type: "listItem",
content: [{ type: "paragraph", content: [t(text)] }],
});
const doc = (...children) => ({ type: "doc", content: children });
const snapshot = (v) => JSON.parse(JSON.stringify(v));
// ---------------------------------------------------------------------------
// blockText / walk / getList
// ---------------------------------------------------------------------------
test("blockText concatenates nested inline text", () => {
assert.equal(blockText(para("p", t("a"), t("b"), t("c"))), "abc");
});
test("walk visits every node depth-first", () => {
const d = doc(para("p1", t("x")), olist(li("y")));
const types = [];
walk(d, (n) => types.push(n.type));
assert.deepEqual(types, [
"doc",
"paragraph",
"text",
"orderedList",
"listItem",
"paragraph",
"text",
]);
});
test("getList finds an orderedList without an id", () => {
const d = doc(para("p", t("x")), olist(li("one")));
const found = getList(d, (n) => n.type === "orderedList");
assert.ok(found);
assert.equal(found.type, "orderedList");
});
// ---------------------------------------------------------------------------
// insertMarkerAfter — mark-safe split
// ---------------------------------------------------------------------------
test("insertMarkerAfter splits a marked run and inserts an UNMARKED marker", () => {
// A paragraph: "see " (plain) + "the link" (link mark) + " here" (plain).
const link = [{ type: "link", attrs: { href: "http://x" } }];
const original = doc(
para("p1", t("see "), t("the link", link), t(" here")),
);
const before = snapshot(original);
const { doc: out, inserted } = insertMarkerAfter(
original,
"the link",
"[1]",
);
assert.equal(inserted, true);
// The caller's object is untouched (deep clone).
assert.deepEqual(original, before);
const inline = out.content[0].content;
// Expect: "see "(plain), "the link"(link), " [1]"(NO marks), " here"(plain).
const marker = inline.find((n) => n.text === " [1]");
assert.ok(marker, "marker run present");
assert.equal(marker.marks, undefined, "marker carries no marks");
// The link run kept its mark verbatim.
const linkRun = inline.find((n) => n.text === "the link");
assert.deepEqual(linkRun.marks, link);
// Plain text reads correctly with the marker placed right after the anchor.
assert.equal(blockText(out.content[0]), "see the link [1] here");
});
test("insertMarkerAfter respects beforeBlock and reports not-found", () => {
const d = doc(para("p1", t("alpha")), para("p2", t("beta")));
// The anchor appears only in block index 1, but the search is limited to blocks before index 1.
const r = insertMarkerAfter(d, "beta", "[1]", { beforeBlock: 1 });
assert.equal(r.inserted, false);
});
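// A rough sketch of the mark-safe split checked above: the anchor run keeps its
// marks and the marker is added as a separate run with no marks at all.
// insertUnmarkedAfter is a hypothetical helper, and it assumes the anchor sits
// inside a single text run, which the real insertMarkerAfter may not require.

```javascript
// Hypothetical helper: insert an unmarked " <marker>" run after the first anchor hit.
function insertUnmarkedAfter(runs, anchor, marker) {
  const out = [];
  let inserted = false;
  for (const run of runs) {
    const at = inserted ? -1 : run.text.indexOf(anchor);
    if (at === -1) {
      out.push({ ...run }); // shallow copy; the input array is left as-is
      continue;
    }
    const end = at + anchor.length;
    // The head of the split run keeps the original marks.
    out.push({ ...run, text: run.text.slice(0, end) });
    // The marker deliberately carries no marks.
    out.push({ type: "text", text: ` ${marker}` });
    // Any remainder of the split run keeps the original marks too.
    if (end < run.text.length) out.push({ ...run, text: run.text.slice(end) });
    inserted = true;
  }
  return { runs: out, inserted };
}
```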
// ---------------------------------------------------------------------------
// setCalloutRange
// ---------------------------------------------------------------------------
test("setCalloutRange rewrites [1]…[K] to [1]…[n]", () => {
const d = doc({
type: "callout",
attrs: { type: "info" },
content: [para("c", t("Footnotes [1]…[3] are translator notes."))],
});
const { doc: out, changed } = setCalloutRange(d, 7);
assert.equal(changed, 1);
assert.equal(blockText(out), "Footnotes [1]…[7] are translator notes.");
});
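// The range rewrite above can be pictured at the string level as follows.
// syncRangeText is a hypothetical helper that works on plain text only; the real
// setCalloutRange operates on the document tree. The ellipsis is written as
// \u2026 so the sketch does not depend on file encoding.

```javascript
// Hypothetical helper: rewrite the upper bound of every "[1]…[K]" range.
function syncRangeText(text, noteCount) {
  let changed = 0;
  const next = text.replace(/\[1\]\u2026\[\d+\]/g, () => {
    changed += 1;
    return `[1]\u2026[${noteCount}]`;
  });
  return { text: next, changed };
}
```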
// ---------------------------------------------------------------------------
// noteItem / mdToInlineNodes
// ---------------------------------------------------------------------------
test("noteItem wraps inline nodes in a listItem with a fresh paragraph id", () => {
const item = noteItem([t("hello")]);
assert.equal(item.type, "listItem");
assert.equal(item.content[0].type, "paragraph");
assert.ok(item.content[0].attrs.id, "has a fresh id");
assert.deepEqual(item.content[0].content, [t("hello")]);
});
test("mdToInlineNodes splits a bold lead and strips a prefix", () => {
const nodes = mdToInlineNodes("комментарий: **Lead.** body text");
// bold lead node + plain remainder
assert.equal(nodes[0].text, "Lead.");
assert.deepEqual(nodes[0].marks, [{ type: "bold" }]);
assert.ok(nodes[1].text.includes("body text"));
assert.equal(nodes[1].marks, undefined);
});
test("mdToInlineNodes strips a 'N. ' numeric prefix", () => {
const nodes = mdToInlineNodes("3. plain note");
assert.equal(nodes.map((n) => n.text).join(""), "plain note");
});
// ---------------------------------------------------------------------------
// commentsToFootnotes — renumber by reading position on a small fixture
// ---------------------------------------------------------------------------
test("commentsToFootnotes anchors comments and renumbers by position", () => {
// Body has an EXISTING footnote [1] in the second paragraph; we add two
// inline comments anchored to text in the first and third paragraphs. After
// running, markers must be renumbered 1,2,3 in reading order and the notes
// list reordered to match.
const callout = {
type: "callout",
attrs: { type: "info" },
content: [para("c", t("Notes [1]…[1] follow."))],
};
const d = doc(
callout,
para("p1", t("First mentions apple.")),
para("p2", t("Second already has a note [1] here.")),
para("p3", t("Third mentions banana.")),
heading("h", "Примечания переводчика"),
olist(li("existing note one")), // matches the existing [1]
);
const comments = [
{ id: "cA", content: "apple note", selection: "apple" },
{ id: "cB", content: "banana note", selection: "banana" },
];
const { doc: out, consumed } = commentsToFootnotes(d, comments);
assert.deepEqual(consumed.sort(), ["cA", "cB"]);
// Markers in reading order: p1 "apple"->[1], p2 existing->[2], p3 "banana"->[3]
assert.match(blockText(out.content[1]), /\[1\]/);
assert.match(blockText(out.content[2]), /\[2\]/);
assert.match(blockText(out.content[3]), /\[3\]/);
// No stray placeholders remain.
const allText = blockText(out);
assert.doesNotMatch(allText, / F\d+ /);
// Notes list reordered to [apple, existing, banana] (reading order).
const list = out.content.find((n) => n.type === "orderedList");
assert.equal(list.content.length, 3);
assert.equal(blockText(list.content[0]), "apple note");
assert.equal(blockText(list.content[1]), "existing note one");
assert.equal(blockText(list.content[2]), "banana note");
// Callout range synced to 3 notes.
assert.match(blockText(out.content[0]), /\[1\]…\[3\]/);
});
test("commentsToFootnotes throws when the notes heading is missing", () => {
const d = doc(para("p", t("no notes section")));
assert.throws(
() => commentsToFootnotes(d, [{ id: "x", content: "y", selection: "no" }]),
/heading .* not found/,
);
});
// ---------------------------------------------------------------------------
// Bug 1: the placeholder sentinel must not collide with real "F<digits>" /
// "FN<digits>" text. Body text "F1"/"FN2"/"F12" near a real comment anchor must
// be left untouched; only the real comment becomes a footnote. "FN2" is the key
// case: the old printable " FN<i> " sentinel could collide with prose like "FN2",
// which the NUL-delimited "\u0000FN<i>\u0000" sentinel makes impossible.
// ---------------------------------------------------------------------------
test("commentsToFootnotes leaves literal 'F1'/'FN2'/'F12' body text untouched", () => {
const d = doc(
para("p1", t("Press F1 for help, model FN2 and F12 for tools near apple here.")),
heading("h", "Примечания переводчика"),
olist(), // empty notes list; the single comment supplies the only note
);
const comments = [{ id: "cA", content: "apple note", selection: "apple" }];
const { doc: out, consumed } = commentsToFootnotes(d, comments);
assert.deepEqual(consumed, ["cA"]);
const bodyText = blockText(out.content[0]);
// The literal "F1"/"FN2"/"F12" prose is preserved verbatim (no bogus
// footnotes, no eaten spaces around them).
assert.match(bodyText, /Press F1 for help, model FN2 and F12 for tools/);
// Exactly one real footnote marker was produced, at the anchored word.
const markerCount = (bodyText.match(/\[\d+\]/g) || []).length;
assert.equal(markerCount, 1);
assert.match(bodyText, /apple \[1\]/);
// Exactly one note in the list — "F1"/"FN2"/"F12" did not spawn extra notes.
const list = out.content.find((n) => n.type === "orderedList");
assert.equal(list.content.length, 1);
assert.equal(blockText(list.content[0]), "apple note");
// No stray placeholder sentinel remains anywhere: the NUL-delimited sentinel
// is fully consumed by the renumber pass, so no raw NUL control char persists
// in the returned doc. We deliberately do NOT assert absence of the printable
// " FN<i> " shape: the body intentionally contains real prose "model FN2 and",
// which must survive verbatim (see the match assertion above) - that is exactly
// why the old printable sentinel was unsafe and the NUL sentinel is not.
assert.doesNotMatch(blockText(out), /\u0000/);
});
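// A small illustration of why the NUL-delimited sentinel is safe where the
// printable one was not. makeSentinel and renumberSentinels are hypothetical
// stand-ins for the internal renumber pass, and the sketch assumes the prose
// itself contains no NUL characters.

```javascript
const NUL = "\u0000";

// Hypothetical placeholder for the i-th pending comment.
const makeSentinel = (i) => `${NUL}FN${i}${NUL}`;

// Replace every sentinel with a final "[n]" marker, numbered in reading order.
function renumberSentinels(text) {
  let next = 0;
  return text.replace(/\u0000FN\d+\u0000/g, () => `[${++next}]`);
}

const prose = "Press F1 for help, model FN2 and F12 for tools near apple";
const withPlaceholder = `${prose} ${makeSentinel(0)} here.`;
const renumbered = renumberSentinels(withPlaceholder);

// The old printable shape matches ordinary prose; the NUL shape does not.
const printableCollides = / FN\d+ /.test(prose);
const nulCollides = /\u0000FN\d+\u0000/.test(prose);
```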
// ---------------------------------------------------------------------------
// Bug 2: an out-of-range body marker must throw, not silently drop the note.
// ---------------------------------------------------------------------------
test("commentsToFootnotes throws on an out-of-range body marker", () => {
// Body marker [9] but the notes list has only 1 item -> inconsistent doc.
const d = doc(
para("p1", t("Some text with a dangling marker [9] here.")),
heading("h", "Примечания переводчика"),
olist(li("the only note")),
);
assert.throws(
() => commentsToFootnotes(d, []),
/footnote \[9\] has no matching note \(notes list has 1 items\); document is inconsistent/,
);
});
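// A sketch of the consistency check behind the error above, assuming a
// hypothetical assertMarkersInRange helper that scans plain body text. The
// message format is copied from the assertion in the test; everything else is
// an assumption about how such a check could be written.

```javascript
// Hypothetical helper: every "[n]" marker must point at an existing note.
function assertMarkersInRange(bodyText, noteCount) {
  for (const match of bodyText.matchAll(/\[(\d+)\]/g)) {
    const n = Number(match[1]);
    if (n < 1 || n > noteCount) {
      throw new Error(
        `footnote [${n}] has no matching note (notes list has ${noteCount} items); document is inconsistent`,
      );
    }
  }
  return true;
}
```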
// ---------------------------------------------------------------------------
// Bug 4: a non-disclaimer callout in the body gets its [N] markers renumbered;
// a disclaimer callout carrying a "[1]…[K]" range is left out of renumbering.
// ---------------------------------------------------------------------------
test("commentsToFootnotes renumbers body callouts but skips the disclaimer range", () => {
const disclaimer = {
type: "callout",
attrs: { type: "info" },
content: [para("d", t("Notes [1]…[2] follow."))],
};
const bodyCallout = {
type: "callout",
attrs: { type: "warning" },
content: [para("bc", t("Important point already noted [1] above."))],
};
const d = doc(
disclaimer,
bodyCallout,
para("p2", t("Then a second mention with [2] too.")),
heading("h", "Примечания переводчика"),
olist(li("first note"), li("second note")),
);
const { doc: out, consumed } = commentsToFootnotes(d, []);
assert.deepEqual(consumed, []);
// The disclaimer's "[1]…[K]" range is NOT treated as body markers: it stays
// a range and is synced to the note count (2), not renumbered into [1],[2].
assert.match(blockText(out.content[0]), /\[1\]…\[2\]/);
// The body callout's [1] is renumbered as a real reading-order marker.
assert.match(blockText(out.content[1]), /noted \[1\] above/);
// The following paragraph's [2] keeps reading order.
assert.match(blockText(out.content[2]), /with \[2\] too/);
// Notes list still has the two original notes in order.
const list = out.content.find((n) => n.type === "orderedList");
assert.equal(list.content.length, 2);
assert.equal(blockText(list.content[0]), "first note");
assert.equal(blockText(list.content[1]), "second note");
});

@@ -0,0 +1,14 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"outDir": "./build",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true
},
"include": ["src/**/*"]
}