Compare commits

..

7 Commits

Author SHA1 Message Date
agent_coder c5bff2d84a fix(#345): normalize CRLF before front-matter strip (review round 3)
F9 [WARNING] The line-anchored front-matter regex from round 2 requires a bare
LF after the opening `---`, so a Windows/CRLF foreign file (`---\r\n...`) slips
past the strip and leaks its front-matter into the body (where `title: Foo`
renders as a setext heading that title extraction hijacks). The canonical parser
whose regex shape this copied (page-file.ts) normalizes CRLF -> LF BEFORE its
FRONTMATTER_RE; the import path copied the regex but missed the normalization.
normalizeForeignMarkdown now replaces CRLF with LF first (which also makes
convertReferenceFootnotes' split('\n') consistent). Adds a CRLF fixture.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 05:38:07 +03:00
agent_coder 80fc30633b fix(#345): replace id-alternation regex with a fixed generic scanner + line-anchor frontmatter (review round 2)
F7 [CRITICAL] The round-1 F2(a) fix built ONE alternation regex over all
definition ids (`(id1|id2|...)`). On prefix-chain ids (a, aa, aaa, ...) V8's
regex compiler blows its stack with a fatal, UNCATCHABLE 'RegExpCompiler
Allocation failed' that kills the whole process — strictly worse than the
original per-def thread-hang, and its match cost was still O(text x defs).
Replaced with a single FIXED generic scanner `/\[\^([^\]]+)\]/g` plus a map
lookup in the replacer: genuinely O(total text), no per-document regex
compilation, cannot blow up. Output is identical (only real def ids are inlined).

F8 [WARNING] The frontmatter strip regex was not line-anchored: it closed on the
FIRST `---` anywhere, so a value containing a triple-dash (e.g.
'title: Q1 --- Q2') truncated the frontmatter and leaked the rest into the body.
Replaced with the line-anchored shape the canonical parser already uses
(page-file.ts): open on `---\n`, close on a `\n---` line.

Adds tests: 4000 prefix-chain ids do not crash and stay fast; a frontmatter
value containing '---' is stripped whole.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 05:18:44 +03:00
agent_coder e17d5bc060 fix(#345): restore prom-client, harden normalizer against ReDoS, strip frontmatter (review round 1)
Addresses the round-1 review of #369:

F1 [CRITICAL] Restore prom-client. The prior commit removed it as a 'stray dep',
but metrics.registry.ts imports it unconditionally at startup (main.ts boot), so
a clean frozen install had no prom-client -> server tsc TS2307 + boot crash. It
was surviving only via hoisting from a warm store. Restored to apps/server
dependencies + regenerated the lock (prom-client/tdigest/bintrees return),
keeping the @docmost/prosemirror-markdown dep. Verified: clean frozen install ->
require.resolve('prom-client') ok, server tsc EXIT 0.

F2 [HIGH] Two quadratic ReDoS vectors in foreign-markdown.ts on untrusted import
(runs synchronously on the request thread, 30MB cap):
  (a) pass-2 was O(lines x defs) — a per-def RegExp rebuilt and run over every
      line. Replaced with ONE precompiled alternation regex over all def ids,
      built once per document, with an id->body lookup in the replacer: O(text).
  (b) the inline-code split alternation backtracks quadratically on a long
      UNCLOSED backtick run. Lines over 8KB now skip the split (left untouched) —
      a real footnote line is never that long.

F3 [WARNING] Restore the leading YAML front-matter strip that the retired
markdownToHtml layer did. Without it, Obsidian/Hugo/Jekyll/git-sync files leak
their front-matter into the body (and 'title:' renders as a setext heading that
title extraction can hijack).

F4 [WARNING] Extend the zip-import spec with an image (width+align) + callout
fidelity assertion through the PM->HTML->PM hop (the one hop the package suite
does not cover).

F5/F6 Update AGENTS.md (apps/server is now a prosemirror-markdown consumer) and
make the server pretest build prosemirror-markdown too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 04:54:07 +03:00
agent_coder 2c2d60a5dc fix(#345): protect inline-code refs and escape footnote-body brackets
The foreign-markdown import normalizer rewrote GFM reference footnotes
(`[^id]` + `[^id]: def`) into canonical inline `^[def]` footnotes, but two
edge cases corrupted content:

1. A `[^id]` inside an inline-code span (backticks) was rewritten like prose
   text — only fenced code blocks were protected. Now the rewrite pass splits
   each line on inline-code spans and only touches the text outside them.

2. An unbalanced `]` in a definition body truncated the resulting `^[...]`
   footnote at the canonical tokenizer, leaking the tail as literal text. The
   body's square brackets are now backslash-escaped before wrapping.

Adds golden cases for both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 03:39:01 +03:00
agent_coder 1417209915 fix(#345): drop stray prom-client dep + add prosemirror-markdown to the lock
The step-1 package.json declared the new @docmost/prosemirror-markdown workspace
dep but the lock was not regenerated (CI frozen install would fail), and it also
added a stray prom-client dep (a coder env-workaround for a pre-existing hoisted
import, unrelated to #345 — removed). Regenerated the lock with only the
prosemirror-markdown dep; faithful frozen install now passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 03:27:01 +03:00
agent_coder f555fc87da refactor(#345 step 2): server markdown IMPORT via canonical parser + normalizer
Move every SERVER Markdown->ProseMirror path off the editor-ext markdown layer
(`markdownToHtml`, a second marked-based parser) onto the canonical
`@docmost/prosemirror-markdown` package, and add a foreign-markdown normalizer at
the import boundary.

Code:
- `ImportService.processMarkdown` (single `.md` upload) now parses
  `markdownToProseMirror(normalizeForeignMarkdown(md))` directly — no HTML hop.
- `PageService.parseProsemirrorContent` markdown case (page create/update with
  `format: 'markdown'`) same.
- `FileImportTaskService` (zip import) parses markdown with the package, then
  serializes to HTML (`jsonToHtml`) so the SHARED HTML attachment / internal-link
  pipeline (processAttachments + formatImportHtml + processHTML) keeps handling
  `.md` and `.html` imports uniformly. The markdown PARSE — the drift source — no
  longer goes through editor-ext; the PM->HTML->PM hop that follows is lossless
  plumbing for attachment resolution, not a second parse.
- `canonicalizeFootnotes` stays as an idempotent #228 safety net for the HTML
  path (a no-op on the already-canonical markdown output).

Normalizer (`integrations/import/utils/foreign-markdown.ts`): a TEXT pre-pass,
NOT a parser fork. The strict canonical parser does not accept GFM `[^id]`
reference footnotes (and would misread `[^id]: def` as a CommonMark link-ref
definition, silently corrupting the ref into a bogus link), so the normalizer
rewrites reference footnotes into canonical inline `^[def]` before parsing.
Callout surfaces (`:::type` and `> [!type]`) are intentionally NOT touched — the
canonical parser already accepts BOTH natively, so normalizing them would be
redundant and risk degrading its nesting/code-fence-aware handling.

Fixtures-first: foreign-markdown.spec pins the normalizer and the end-to-end
acceptance (no literal `[^id]`/`:::` leaks; re-export is canonical). The two
footnote-canonicalize specs are updated to the canonical output — the parser
assigns fresh `fn-*` ids, so they now assert by definition BODY order (still
reference-ordered, deduped, orphan-free).

FINAL CHECK: `grep -rn "htmlToMarkdown\|markdownToHtml" apps/server/src` (non
-test) is now empty — both editor-ext markdown-layer functions are gone from the
server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 03:21:07 +03:00
agent_coder d6d1195abd refactor(#345 step 1): server markdown EXPORT via canonical converter
Move every SERVER ProseMirror->Markdown path off the editor-ext markdown layer
(`htmlToMarkdown`, a second turndown-based converter) onto the canonical
`@docmost/prosemirror-markdown` package.

- `ExportService.exportPage` (page/space markdown export) and
  `collaboration.util.jsonToMarkdown` (used by page.controller's markdown
  responses and the AI public-share chat tool) now serialize DIRECTLY from
  ProseMirror JSON via `convertProseMirrorToMarkdown` — no HTML intermediate, no
  `<colgroup>` scrub (the converter emits GFM tables directly).

This is the SAME serializer the git-sync vault writer feeds, so an exported page
BODY is byte-identical to its vault representation: no more export-md vs vault-md
drift. The HTML export path is unchanged (still `jsonToHtml`).

Emitted markdown moves to the canonical forms: callouts `> [!type]` (not
`:::type`), inline footnotes `^[…]` (not `[^id]`), lossless images
`![alt](src) <!--img {…}-->` (editor-ext dropped width/height/align).

Fixtures-first: export-markdown.spec asserts those canonical forms and the
export==vault-by-construction equality (both call the package converter). The
one deliberate export/vault delta — export prepends the page title as an H1
while the vault carries it in frontmatter — is pinned by a test.

Test infra: declare the `@docmost/prosemirror-markdown` workspace dep; teach
jest to load its ESM build (babel-jest) and stub `@tiptap/react` (server code
imports editor-ext, whose node views reference React renderers only used in a
live browser editor — never on the server).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 03:20:25 +03:00
36 changed files with 1081 additions and 1187 deletions
+2 -2
View File
@@ -201,7 +201,7 @@ pnpm workspace (`pnpm@10.4.0`) orchestrated by **Nx**. Four workspace packages:
| `apps/client` | `client` | React 18 + Vite + Mantine 8 + TanStack Query + Jotai | SPA frontend |
| `packages/editor-ext` | `@docmost/editor-ext` | Tiptap/ProseMirror | Shared Tiptap node/mark extensions, imported by both the client and the server |
| `packages/mcp` | `@docmost/mcp` | MCP SDK, Tiptap, Yjs | Standalone MCP server, also bundled into the server at `/mcp`. Consumes the shared converter/schema from `@docmost/prosemirror-markdown` (#293) — it no longer carries its own vendored converter/schema copy |
| `packages/prosemirror-markdown` | `@docmost/prosemirror-markdown` | Tiptap, marked, jsdom | The single, canonical ProseMirror↔Markdown converter + Docmost schema mirror (#293). Consumed by `mcp` and `git-sync`; there is exactly ONE copy of the converter now |
| `packages/prosemirror-markdown` | `@docmost/prosemirror-markdown` | Tiptap, marked, jsdom | The single, canonical ProseMirror↔Markdown converter + Docmost schema mirror (#293). Consumed by `mcp`, `git-sync`, AND `apps/server` (server-side markdown import/export, #345); there is exactly ONE copy of the converter now |
`build` targets are Nx-cached and dependency-ordered (`dependsOn: ["^build"]`), so `editor-ext` builds before the apps. `nx.json` sets `affected.defaultBase: main`.
@@ -284,7 +284,7 @@ The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes
### Client structure
Vite SPA. Code is organized by feature under `apps/client/src/features/*` (mirrors the server domains: `page`, `space`, `comment`, `ai-chat`, `editor`, …). Conventions:
- **TanStack Query** for server state (one `queries/` file per feature), **Jotai** atoms for local/shared UI state, **Mantine 8** + CSS modules (`*.module.css`) + `postcss-preset-mantine` for UI.
- The editor is Tiptap; shared node/mark extensions live in `packages/editor-ext` and are imported by **both the client and the server** (collaboration, import/export) — editor schema changes often need to be made in `editor-ext`, not just the client. The ProseMirror↔Markdown converter and its Docmost schema mirror now live in a SINGLE package, `@docmost/prosemirror-markdown` (#293), consumed by both `mcp` and `git-sync` — do NOT reintroduce a per-package copy. `editor-ext` is the upstream source of the Tiptap schema; the package's `docmost-schema.ts` mirrors it and a serializer-contract test (`packages/prosemirror-markdown/test/serializer-contract.test.ts`) guards the boundary (every schema node must have a converter case), so a drift surfaces as a failing test rather than silent divergence.
- The editor is Tiptap; shared node/mark extensions live in `packages/editor-ext` and are imported by **both the client and the server** (collaboration, schema, `canonicalizeFootnotes`) — editor schema changes often need to be made in `editor-ext`, not just the client. Server-side markdown import/export no longer lives in `editor-ext`: it goes through the canonical converter (#345, see below). The ProseMirror↔Markdown converter and its Docmost schema mirror now live in a SINGLE package, `@docmost/prosemirror-markdown` (#293), consumed by `mcp`, `git-sync`, and `apps/server` (#345) — do NOT reintroduce a per-package copy. `editor-ext` is the upstream source of the Tiptap schema; the package's `docmost-schema.ts` mirrors it and a serializer-contract test (`packages/prosemirror-markdown/test/serializer-contract.test.ts`) guards the boundary (every schema node must have a converter case), so a drift surfaces as a failing test rather than silent divergence.
- API access goes through `apps/client/src/lib/api-client.ts` (axios). The `@` alias maps to `apps/client/src`.
- Runtime config is injected at build time by `vite.config.ts` via `define` (`APP_URL`, `COLLAB_URL`, `APP_VERSION`, …) — these come from the root `.env`, not from `import.meta.env`.
-8
View File
@@ -12,14 +12,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **Save intentional page versions.** Press `Cmd/Ctrl+S` (or use the page menu)
to save a named version of a page. The history panel now distinguishes
intentional versions (a "Saved" / "Agent version" badge) from automatic
snapshots, dims autosaves, and offers an "Only versions" filter. Automatic
snapshots switched from a fixed interval to a trailing idle-flush with a
max-wait ceiling, and a boundary snapshot is pinned whenever the editing source
changes (e.g. a person's edits followed by the AI agent). (#370)
- **Place several images side by side in a row.** A new "Inline (side by
side)" alignment mode in the image bubble menu renders consecutive inline
images as a row that wraps onto the next line on narrow screens. The row is
@@ -1385,14 +1385,5 @@
"The commented text changed since this suggestion was made; it was not applied.": "The commented text changed since this suggestion was made; it was not applied.",
"Dismiss": "Dismiss",
"Suggestion dismissed": "Suggestion dismissed",
"Failed to dismiss suggestion": "Failed to dismiss suggestion",
"Save version": "Save version",
"Ctrl+S": "Ctrl+S",
"Version saved": "Version saved",
"Already saved as the latest version": "Already saved as the latest version",
"Agent version": "Agent version",
"Boundary": "Boundary",
"Autosave": "Autosave",
"Only versions": "Only versions",
"No saved versions yet.": "No saved versions yet."
"Failed to dismiss suggestion": "Failed to dismiss suggestion"
}
@@ -1248,14 +1248,5 @@
"The commented text changed since this suggestion was made; it was not applied.": "Прокомментированный текст изменился после создания предложения; оно не было применено.",
"Dismiss": "Не применять",
"Suggestion dismissed": "Предложение отклонено",
"Failed to dismiss suggestion": "Не удалось отклонить предложение",
"Save version": "Сохранить версию",
"Ctrl+S": "Ctrl+S",
"Version saved": "Версия сохранена",
"Already saved as the latest version": "Уже сохранено как последняя версия",
"Agent version": "Версия агента",
"Boundary": "Граница",
"Autosave": "Автосейв",
"Only versions": "Только версии",
"No saved versions yet.": "Пока нет сохранённых версий."
"Failed to dismiss suggestion": "Не удалось отклонить предложение"
}
@@ -1,19 +1,10 @@
import { atom } from "jotai";
import { Editor } from "@tiptap/core";
import type { HocuspocusProvider } from "@hocuspocus/provider";
import { PageEditMode } from "@/features/user/types/user.types.ts";
import type { DictationUnavailableReason } from "@/features/dictation/dictation-status";
export const pageEditorAtom = atom<Editor | null>(null);
// #370 — the active page's collab provider, published by the page editor so the
// header menu can emit the "save-version" stateless signal (Cmd+S / button).
// Null when the page is read-only / collab isn't connected. A typed initial
// value (rather than an explicit generic) keeps jotai's overload resolution on
// the writable PrimitiveAtom branch.
const initialCollabProvider: HocuspocusProvider | null = null;
export const collabProviderAtom = atom(initialCollabProvider);
export const titleEditorAtom = atom<Editor | null>(null);
export const readOnlyEditorAtom = atom<Editor | null>(null);
@@ -31,18 +31,11 @@ import { useAtom, useAtomValue, useSetAtom } from "jotai";
import useCollaborationUrl from "@/features/editor/hooks/use-collaboration-url";
import { currentUserAtom } from "@/features/user/atoms/current-user-atom";
import {
collabProviderAtom,
currentPageEditModeAtom,
dictationAvailabilityAtom,
pageEditorAtom,
yjsConnectionStatusAtom,
} from "@/features/editor/atoms/editor-atoms";
import { notifications } from "@mantine/notifications";
import {
VERSION_SAVED_MESSAGE_TYPE,
type VersionSavedMessage,
saveVersionPending,
} from "@/features/page-history/version-messages";
import { asideStateAtom } from "@/components/layouts/global/hooks/atoms/sidebar-atom";
import {
activeCommentIdAtom,
@@ -130,7 +123,6 @@ export default function PageEditor({
const [currentUser] = useAtom(currentUserAtom);
const [, setEditor] = useAtom(pageEditorAtom);
const setCollabProvider = useSetAtom(collabProviderAtom);
const [, setAsideState] = useAtom(asideStateAtom);
const [, setActiveCommentId] = useAtom(activeCommentIdAtom);
const [showCommentPopup, setShowCommentPopup] = useAtom(showCommentPopupAtom);
@@ -188,24 +180,6 @@ export default function PageEditor({
const onStatelessHandler = ({ payload }: onStatelessParameters) => {
try {
const message = JSON.parse(payload);
// #370 — a version was saved somewhere; live-refresh the history panel
// on every client. Only the client that pressed Save (tracked by the
// module-level flag) shows the confirmation toast.
if (message?.type === VERSION_SAVED_MESSAGE_TYPE) {
const versionMsg = message as VersionSavedMessage;
queryClient.invalidateQueries({
queryKey: ["page-history-list"],
});
if (saveVersionPending.current) {
saveVersionPending.current = false;
notifications.show({
message: versionMsg.alreadySaved
? t("Already saved as the latest version")
: t("Version saved"),
});
}
return;
}
if (message?.type !== "page.updated" || !message.updatedAt) return;
const pageData = queryClient.getQueryData<IPage>(["pages", slugId]);
if (pageData) {
@@ -263,16 +237,12 @@ export default function PageEditor({
local.on("synced", onLocalSyncedHandler);
providersRef.current = { socket, local, remote };
// #370 — publish the provider so the header menu can emit save-version.
setCollabProvider(remote);
setProvidersReady(true);
} else {
setCollabProvider(providersRef.current.remote);
setProvidersReady(true);
}
// Only destroy on final unmount
return () => {
setCollabProvider(null);
providersRef.current?.socket.destroy();
providersRef.current?.remote.destroy();
providersRef.current?.local.destroy();
@@ -1,11 +1,4 @@
import {
Text,
Group,
UnstyledButton,
Avatar,
Tooltip,
Badge,
} from "@mantine/core";
import { Text, Group, UnstyledButton, Avatar, Tooltip } from "@mantine/core";
import { CustomAvatar } from "@/components/ui/custom-avatar.tsx";
import { AgentAvatarStack } from "@/components/ui/agent-avatar-stack.tsx";
import { formattedDate } from "@/lib/time";
@@ -14,59 +7,36 @@ import clsx from "clsx";
import { IPageHistory } from "@/features/page-history/types/page.types";
import { memo, useCallback } from "react";
import { useSetAtom } from "jotai";
import { useTranslation } from "react-i18next";
import { historyAtoms } from "@/features/page-history/atoms/history-atoms.ts";
const MAX_VISIBLE_AVATARS = 5;
/**
* #370 — map a snapshot's intentionality tier to its badge. `version: true`
* marks the intentional points (manual / agent); autosaves (boundary / idle /
* legacy null) are non-versions and get dimmed in the list.
*/
type HistoryKindMeta = { labelKey: string; color: string; version: boolean };
export function historyKindMeta(kind?: string | null): HistoryKindMeta {
switch (kind) {
case "manual":
return { labelKey: "Saved", color: "blue", version: true };
case "agent":
return { labelKey: "Agent version", color: "violet", version: true };
case "boundary":
return { labelKey: "Boundary", color: "gray", version: false };
default: // "idle" | null | undefined (legacy autosave)
return { labelKey: "Autosave", color: "gray", version: false };
}
}
interface HistoryItemProps {
historyItem: IPageHistory;
// The previous snapshot for diff/restore is resolved by id from the FULL list
// in the parent (resolvePrevSnapshotId), so the item only needs to report its
// own id — never a list index (which would be the filtered-view index).
onSelect: (id: string) => void;
onHover?: (id: string) => void;
index: number;
onSelect: (id: string, index: number) => void;
onHover?: (id: string, index: number) => void;
onHoverEnd?: () => void;
isActive: boolean;
}
const HistoryItem = memo(function HistoryItem({
historyItem,
index,
onSelect,
onHover,
onHoverEnd,
isActive,
}: HistoryItemProps) {
const setHistoryModalOpen = useSetAtom(historyAtoms);
const { t } = useTranslation();
const kindMeta = historyKindMeta(historyItem.kind);
const handleClick = useCallback(() => {
onSelect(historyItem.id);
}, [onSelect, historyItem.id]);
onSelect(historyItem.id, index);
}, [onSelect, historyItem.id, index]);
const handleMouseEnter = useCallback(() => {
onHover?.(historyItem.id);
}, [onHover, historyItem.id]);
onHover?.(historyItem.id, index);
}, [onHover, historyItem.id, index]);
const contributors = historyItem.contributors;
const hasContributors = contributors && contributors.length > 0;
@@ -79,20 +49,8 @@ const HistoryItem = memo(function HistoryItem({
onMouseEnter={handleMouseEnter}
onMouseLeave={onHoverEnd}
className={clsx(classes.history, { [classes.active]: isActive })}
// #370 — dim autosnapshots so intentional versions stand out.
style={{ opacity: kindMeta.version ? 1 : 0.55 }}
>
<Group gap={6} wrap="nowrap" justify="space-between">
<Text size="sm">{formattedDate(new Date(historyItem.createdAt))}</Text>
<Badge
size="xs"
radius="sm"
variant={kindMeta.version ? "filled" : "light"}
color={kindMeta.color}
>
{t(kindMeta.labelKey)}
</Badge>
</Group>
<Text size="sm">{formattedDate(new Date(historyItem.createdAt))}</Text>
<Group gap={6} wrap="nowrap" mt={4}>
{hasContributors ? (
@@ -9,7 +9,7 @@ import {
historyAtoms,
} from "@/features/page-history/atoms/history-atoms";
import { useAtom, useSetAtom } from "jotai";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import { useCallback, useEffect, useMemo, useRef } from "react";
import {
Button,
ScrollArea,
@@ -17,12 +17,9 @@ import {
Divider,
Loader,
Center,
Switch,
Text,
} from "@mantine/core";
import { useTranslation } from "react-i18next";
import { useHistoryRestore } from "@/features/page-history/hooks";
import { resolvePrevSnapshotId } from "@/features/page-history/utils/resolve-prev-snapshot";
const PREFETCH_DELAY_MS = 150;
@@ -50,23 +47,6 @@ function HistoryList({ pageId }: Props) {
[pageHistoryData],
);
// #370 — "only versions" filter: hide autosnapshots (idle/boundary/legacy
// null), keep only intentional points (manual/agent). Filtering is over the
// already-loaded pages; the diff/restore still targets the true previous
// snapshot, so items carry their index within the FULL list.
const [onlyVersions, setOnlyVersions] = useState(false);
const isVersion = useCallback(
(kind?: string | null) => kind === "manual" || kind === "agent",
[],
);
const visibleItems = useMemo(
() =>
onlyVersions
? historyItems.filter((item) => isVersion(item.kind))
: historyItems,
[historyItems, onlyVersions, isVersion],
);
const loadMoreRef = useRef<HTMLDivElement>(null);
const prefetchTimeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);
@@ -80,13 +60,11 @@ function HistoryList({ pageId }: Props) {
}, []);
const handleHover = useCallback(
(historyId: string) => {
(historyId: string, index: number) => {
clearPrefetchTimeout();
prefetchTimeoutRef.current = setTimeout(() => {
prefetchPageHistory(historyId);
// The true previous snapshot in the FULL list (not the previous visible
// one under the "only versions" filter).
const prevId = resolvePrevSnapshotId(historyItems, historyId);
const prevId = historyItems[index + 1]?.id;
if (prevId) {
prefetchPageHistory(prevId);
}
@@ -100,11 +78,9 @@ function HistoryList({ pageId }: Props) {
}, [clearPrefetchTimeout]);
const handleSelect = useCallback(
(id: string) => {
(id: string, index: number) => {
setActiveHistoryId(id);
// Baseline = true previous snapshot in the FULL list, so the "only
// versions" filter never diffs/restores against the wrong item.
setActiveHistoryPrevId(resolvePrevSnapshotId(historyItems, id));
setActiveHistoryPrevId(historyItems[index + 1]?.id ?? "");
},
[historyItems, setActiveHistoryId, setActiveHistoryPrevId],
);
@@ -152,27 +128,12 @@ function HistoryList({ pageId }: Props) {
return (
<div>
<Group px="xs" py={6} justify="flex-end">
<Switch
size="xs"
checked={onlyVersions}
onChange={(e) => setOnlyVersions(e.currentTarget.checked)}
label={t("Only versions")}
/>
</Group>
<ScrollArea h={620} w="100%" type="scroll" scrollbarSize={5}>
{onlyVersions && visibleItems.length === 0 && (
<Center py="md">
<Text size="sm" c="dimmed">
{t("No saved versions yet.")}
</Text>
</Center>
)}
{visibleItems.map((historyItem) => (
{historyItems.map((historyItem, index) => (
<HistoryItem
key={historyItem.id}
historyItem={historyItem}
index={index}
onSelect={handleSelect}
onHover={handleHover}
onHoverEnd={clearPrefetchTimeout}
@@ -24,10 +24,6 @@ export interface IPageHistory {
updatedAt: string;
lastUpdatedBy: IPageHistoryUser;
contributors?: IPageHistoryUser[];
// #370 — intentionality tier: 'manual'/'agent' are versions (intentional
// points), 'idle'/'boundary' are autosnapshots; null/undefined = legacy
// autosave. Derived server-side, drives the history badge + "versions" filter.
kind?: "manual" | "agent" | "idle" | "boundary" | null;
// Provenance markers copied off the page row when the snapshot was saved.
// `'agent'` marks a version written by the AI agent; `lastUpdatedAiChatId`
// (when present) deep-links to the chat that produced the edit.
@@ -1,42 +0,0 @@
import { describe, it, expect } from "vitest";
import { resolvePrevSnapshotId } from "./resolve-prev-snapshot";
// #370 F4 — the risky client path: with the "only versions" filter active, diff
// and restore must still baseline against the TRUE previous snapshot in the FULL
// list, never the previous VISIBLE version (which would skip the autosnapshots
// between two versions). These pin that the resolution is by FULL-list order.
describe("resolvePrevSnapshotId", () => {
// Newest-first, as the history list stores it: a version, then two autosaves,
// then an older version.
const full = [
{ id: "v2", kind: "manual" },
{ id: "a2", kind: "idle" },
{ id: "a1", kind: "boundary" },
{ id: "v1", kind: "manual" },
{ id: "a0", kind: null },
];
it("returns the immediate FULL-list successor, not the previous visible version", () => {
// Selecting v2 while filtered to versions-only must baseline against a2 (the
// real chronological predecessor), NOT v1 (the previous visible version).
expect(resolvePrevSnapshotId(full, "v2")).toBe("a2");
});
it("resolves an autosnapshot's predecessor by full-list order", () => {
expect(resolvePrevSnapshotId(full, "a1")).toBe("v1");
});
it("returns '' for the oldest item (no predecessor)", () => {
expect(resolvePrevSnapshotId(full, "a0")).toBe("");
});
it("returns '' for an id not in the list", () => {
expect(resolvePrevSnapshotId(full, "missing")).toBe("");
});
it("does not depend on a filtered subset — same result whatever is visible", () => {
// The helper only ever sees the full list; a filtered view cannot change the
// baseline it computes.
expect(resolvePrevSnapshotId(full, "v1")).toBe("a0");
});
});
@@ -1,22 +0,0 @@
/**
* #370 — resolve the TRUE previous snapshot for a history item.
*
* The history panel can be filtered to "only versions" (manual/agent), but diff
* and restore must always compare against the immediately-preceding snapshot in
* the FULL, unfiltered list — NOT the previous VISIBLE item. Comparing against
* the previous visible version would silently skip the autosnapshots between two
* versions and diff/restore the wrong baseline.
*
* Given the full (newest-first) list and an item id, this returns the id of the
* item right after it in the full list (its chronological predecessor), or "" if
* it is the oldest / not found. Pure and list-order-preserving so it can be unit
* tested without mounting the component.
*/
export function resolvePrevSnapshotId(
fullItems: ReadonlyArray<{ id: string }>,
id: string,
): string {
const index = fullItems.findIndex((item) => item.id === id);
if (index === -1) return "";
return fullItems[index + 1]?.id ?? "";
}
@@ -1,28 +0,0 @@
/**
* #370 — page-version stateless wire formats. Kept in one place so the client
* emitter (Save hotkey / button) and the client listener (page-editor) agree
* with the server (PersistenceExtension) on the message shapes.
*/
/** Client → server: "save a version now". The server derives the tier
* (manual/agent) from the signed connection actor, never from this payload. */
export const SAVE_VERSION_MESSAGE_TYPE = "save-version";
/** Server → all clients: a version was saved (or promoted / already existed). */
export const VERSION_SAVED_MESSAGE_TYPE = "version.saved";
export interface VersionSavedMessage {
type: typeof VERSION_SAVED_MESSAGE_TYPE;
historyId: string;
kind: "manual" | "agent";
/** True when the latest snapshot was already a manual version (a no-op save). */
alreadySaved: boolean;
}
/**
* Cross-component coordination flag so only the client that pressed Save shows
* the confirmation toast, while every other client silently refreshes its
* history panel on the broadcast. A module-level ref avoids stale-closure
* pitfalls in the editor's long-lived stateless handler.
*/
export const saveVersionPending = { current: false };
@@ -3,7 +3,6 @@ import {
IconArrowRight,
IconArrowsHorizontal,
IconClockHour4,
IconDeviceFloppy,
IconDots,
IconEye,
IconEyeOff,
@@ -18,7 +17,7 @@ import {
IconTrash,
IconWifiOff,
} from "@tabler/icons-react";
import React, { useCallback, useEffect, useRef, useState } from "react";
import React, { useEffect, useRef, useState } from "react";
import { useAsideTriggerProps } from "@/hooks/use-toggle-aside.tsx";
import { useAtom, useAtomValue } from "jotai";
import { historyAtoms } from "@/features/page-history/atoms/history-atoms.ts";
@@ -40,14 +39,9 @@ import { Trans, useTranslation } from "react-i18next";
import ExportModal from "@/components/common/export-modal";
import { htmlToMarkdown } from "@docmost/editor-ext";
import {
collabProviderAtom,
pageEditorAtom,
yjsConnectionStatusAtom,
} from "@/features/editor/atoms/editor-atoms.ts";
import {
SAVE_VERSION_MESSAGE_TYPE,
saveVersionPending,
} from "@/features/page-history/version-messages.ts";
import { formattedDate } from "@/lib/time.ts";
import { PageEditModeToggle } from "@/features/user/components/page-state-pref.tsx";
import MovePageModal from "@/features/page/components/move-page-modal.tsx";
@@ -78,34 +72,9 @@ export default function PageHeaderMenu({ readOnly }: PageHeaderMenuProps) {
});
const isDeleted = !!page?.deletedAt;
const [workspace] = useAtom(workspaceAtom);
const collabProvider = useAtomValue(collabProviderAtom);
// Community public-sharing entry point (replaces the removed EE PageShareModal)
const workspaceSharingDisabled = workspace?.settings?.sharing?.disabled === true;
// #370 — explicit "save a version" (Cmd+S / Save button). One path for the
// human; the server derives the tier from the signed actor. Readers can't save
// (the button is hidden and the collab connection is read-only server-side).
const handleSaveVersion = useCallback(() => {
if (readOnly || !collabProvider) return;
// Flag this client as the initiator so only it shows the confirmation toast;
// a safety timeout clears it if no broadcast comes back (e.g. offline).
saveVersionPending.current = true;
window.setTimeout(() => {
saveVersionPending.current = false;
}, 5000);
collabProvider.sendStateless(
JSON.stringify({ type: SAVE_VERSION_MESSAGE_TYPE }),
);
}, [readOnly, collabProvider]);
// mod+S must also block the browser's "Save page" dialog. `triggerOnContent-
// Editable` + empty ignore-list so it fires while typing in the editor/title.
useHotkeys(
[["mod+S", handleSaveVersion, { preventDefault: true }]],
[],
true,
);
useHotkeys(
[
[
@@ -164,16 +133,15 @@ export default function PageHeaderMenu({ readOnly }: PageHeaderMenuProps) {
</ActionIcon>
</Tooltip>
<PageActionMenu readOnly={readOnly} onSaveVersion={handleSaveVersion} />
<PageActionMenu readOnly={readOnly} />
</>
);
}
interface PageActionMenuProps {
readOnly?: boolean;
onSaveVersion?: () => void;
}
function PageActionMenu({ readOnly, onSaveVersion }: PageActionMenuProps) {
function PageActionMenu({ readOnly }: PageActionMenuProps) {
const { t } = useTranslation();
const [, setHistoryModalOpen] = useAtom(historyAtoms);
const clipboard = useClipboard({ timeout: 500 });
@@ -334,20 +302,6 @@ function PageActionMenu({ readOnly, onSaveVersion }: PageActionMenuProps) {
</Group>
</Menu.Item>
{!readOnly && (
<Menu.Item
leftSection={<IconDeviceFloppy size={16} />}
onClick={onSaveVersion}
rightSection={
<Text size="xs" c="dimmed">
{t("Ctrl+S")}
</Text>
}
>
{t("Save version")}
</Menu.Item>
)}
<Menu.Item
leftSection={<IconHistory size={16} />}
onClick={openHistoryModal}
+6 -4
View File
@@ -23,7 +23,7 @@
"migration:reset": "tsx src/database/migrate.ts down-to NO_MIGRATIONS",
"migration:codegen": "kysely-codegen --dialect=postgres --camel-case --env-file=../../.env --out-file=./src/database/types/db.d.ts",
"lint": "eslint \"{src,apps,libs,test}/**/*.ts\" --fix",
"pretest": "pnpm --filter @docmost/editor-ext build",
"pretest": "pnpm --filter @docmost/editor-ext build && pnpm --filter @docmost/prosemirror-markdown build",
"test": "jest",
"test:int": "jest --config test/jest-integration.json",
"test:watch": "jest --watch",
@@ -43,6 +43,7 @@
"@clickhouse/client": "^1.18.2",
"@docmost/mcp": "workspace:*",
"@docmost/pdf-inspector": "1.9.6",
"@docmost/prosemirror-markdown": "workspace:*",
"@fastify/cookie": "^11.0.2",
"@fastify/multipart": "^10.0.0",
"@fastify/static": "^9.1.3",
@@ -175,7 +176,7 @@
"/node_modules/"
],
"transform": {
"happy-dom.+\\.js$": [
"(happy-dom.+|prosemirror-markdown/build/.+)\\.js$": [
"babel-jest",
{
"presets": [
@@ -193,7 +194,7 @@
"^.+\\.(t|j)sx?$": "ts-jest"
},
"transformIgnorePatterns": [
"/node_modules/(?!(\\.pnpm/)?(nanoid|uuid|image-dimensions|marked|happy-dom|lib0)(@|/))"
"/node_modules/(?!(\\.pnpm/)?(nanoid|uuid|image-dimensions|marked|happy-dom|lib0|@docmost/prosemirror-markdown)(@|/))"
],
"collectCoverageFrom": [
"**/*.(t|j)s"
@@ -204,7 +205,8 @@
"^@docmost/db/(.*)$": "<rootDir>/database/$1",
"^@docmost/transactional/(.*)$": "<rootDir>/integrations/transactional/$1",
"^@docmost/ee/(.*)$": "<rootDir>/ee/$1",
"^src/(.*)$": "<rootDir>/$1"
"^src/(.*)$": "<rootDir>/$1",
"^@tiptap/react$": "<rootDir>/../test/stubs/tiptap-react.js"
}
}
}
@@ -43,7 +43,6 @@ import {
Column,
Status,
addUniqueIdsToDoc,
htmlToMarkdown,
TransclusionSource,
TransclusionReference,
FootnoteReference,
@@ -51,6 +50,7 @@ import {
FootnoteDefinition,
PageEmbed,
} from '@docmost/editor-ext';
import { convertProseMirrorToMarkdown } from '@docmost/prosemirror-markdown';
import { generateText, getSchema, JSONContent } from '@tiptap/core';
import { generateHTML, generateJSON } from '../common/helpers/prosemirror/html';
// @tiptap/html library works best for generating prosemirror json state but not HTML
@@ -239,6 +239,10 @@ export function prosemirrorNodeToYElement(node: any): Y.XmlElement | Y.XmlText {
}
export function jsonToMarkdown(tiptapJson: any): string {
const html = jsonToHtml(tiptapJson);
return htmlToMarkdown(html);
// Direct ProseMirror JSON -> Markdown via the canonical converter
// (`@docmost/prosemirror-markdown`) — no HTML intermediate, no second
// editor-ext markdown layer. Same serializer as the page/space export and the
// git-sync vault writer, so every server PM->MD path emits identical canonical
// markdown (issue #345).
return convertProseMirrorToMarkdown(tiptapJson);
}
+3 -29
View File
@@ -1,29 +1,3 @@
/**
* #370 — page-history intentionality tiers. Domain of `page_history.kind`.
* - 'manual' / 'agent' → Tier 1 versions (intentional points)
* - 'idle' / 'boundary' → Tier 0 autosnapshots (safety net)
* A legacy `null` kind is treated as an autosave.
*/
export type PageHistoryKind = 'manual' | 'agent' | 'idle' | 'boundary';
/**
* #370 — trailing idle-flush windows. A page's pending idle snapshot is
* re-armed on every store and fires this long after edits go quiet, so a burst
* of edits collapses into a single autosnapshot instead of one-per-store. Human
* sessions are noisier and less risky, so they flush less often than the agent.
*/
export const IDLE_INTERVAL_USER = 60 * 60 * 1000; // 60m
export const IDLE_INTERVAL_AGENT = 15 * 60 * 1000; // 15m
/**
* #370 — max-wait ceiling for the idle flush. Pure trailing debounce starves the
* safety net: hocuspocus stores at least every ~45s, so a CONTINUOUS editing
* session would re-arm the trailing timer forever and never take an idle
* snapshot until edits finally go quiet (up to IDLE_INTERVAL_USER = 60m). This
* ceiling bounds the actual wait from the FIRST edit of a burst, so an idle
* snapshot fires at least this often during a long unbroken session — restoring
* a recovery point cadence closer to the old heuristic without one-per-store
* noise. Mirrors hocuspocus's own maxDebounce idea.
*/
export const IDLE_MAX_WAIT_USER = 10 * 60 * 1000; // 10m
export const IDLE_MAX_WAIT_AGENT = 5 * 60 * 1000; // 5m
export const HISTORY_INTERVAL = 5 * 60 * 1000;
export const HISTORY_FAST_INTERVAL = 60 * 1000;
export const HISTORY_FAST_THRESHOLD = 5 * 60 * 1000;
@@ -1,93 +1,84 @@
import { computeHistoryJob, resolveSource } from './persistence.extension';
import {
IDLE_INTERVAL_AGENT,
IDLE_INTERVAL_USER,
IDLE_MAX_WAIT_AGENT,
IDLE_MAX_WAIT_USER,
computeHistoryJob,
resolveSource,
} from './persistence.extension';
import {
HISTORY_FAST_INTERVAL,
HISTORY_FAST_THRESHOLD,
HISTORY_INTERVAL,
} from '../constants';
// A fixed clock + fixed createdAt make pageAge deterministic.
const NOW = 1_700_000_000_000;
const PAGE_ID = '550e8400-e29b-41d4-a716-446655440000';
const page = { id: PAGE_ID };
// Build a minimal page whose age (NOW - createdAt) is exactly `ageMs`.
const pageAged = (ageMs: number) => ({
id: PAGE_ID,
createdAt: new Date(NOW - ageMs),
});
describe('computeHistoryJob (#370 — shared trailing idle pipeline)', () => {
it('human edit → user idle window, bare page.id job', () => {
// Humans and the agent now share ONE idle job per page (jobId = page.id).
// The agent's old delay=0 fast path is GONE — intentional agent points now
// arrive via the explicit save-version signal, not a zero-delay snapshot.
const { jobId, delay } = computeHistoryJob(page, 'user');
expect(delay).toBe(IDLE_INTERVAL_USER);
describe('computeHistoryJob', () => {
it('agent edit → delay MUST be 0 and job id is source-keyed', () => {
// INVARIANT (§15 H2 / persistence.extension): the agent delay MUST stay 0.
// The worker re-reads the page row at run time, so any non-zero delay risks
// snapshotting content a later human edit has already overwritten. This is
// the load-bearing assertion of this spec — do not relax it.
const { jobId, delay } = computeHistoryJob(pageAged(0), 'agent', NOW);
expect(delay).toBe(0);
expect(jobId).toBe(`${PAGE_ID}-agent`);
});
it('agent edit on an OLD page is still delay 0 (age never applies to agents)', () => {
// Even when the page is far older than the fast threshold, the agent path
// must short-circuit to 0 — age-based debounce is a human-only concern.
const { jobId, delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD + 60_000),
'agent',
NOW,
);
expect(delay).toBe(0);
expect(jobId).toBe(`${PAGE_ID}-agent`);
});
it('human edit on a YOUNG page (age < threshold) → fast interval, bare job id', () => {
const { jobId, delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD - 1),
'user',
NOW,
);
expect(delay).toBe(HISTORY_FAST_INTERVAL);
expect(jobId).toBe(PAGE_ID);
});
it('agent edit → agent idle window (shorter), still the bare page.id job', () => {
const { jobId, delay } = computeHistoryJob(page, 'agent');
expect(delay).toBe(IDLE_INTERVAL_AGENT);
// No `-agent` suffix anymore: the agent joins the common idle pipeline.
it('human edit on an OLD page (age > threshold) → standard interval', () => {
const { jobId, delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD + 1),
'user',
NOW,
);
expect(delay).toBe(HISTORY_INTERVAL);
expect(jobId).toBe(PAGE_ID);
});
it('agent flushes sooner than a human', () => {
expect(IDLE_INTERVAL_AGENT).toBeLessThan(IDLE_INTERVAL_USER);
it('boundary: pageAge EXACTLY === threshold takes the slow branch (the `<` is strict)', () => {
// Off-by-one guard: the condition is `pageAge < HISTORY_FAST_THRESHOLD`, so
// an age of exactly the threshold is NOT "fast" — it must use HISTORY_INTERVAL.
const { delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD),
'user',
NOW,
);
expect(delay).toBe(HISTORY_INTERVAL);
});
it('treats any non-"agent" source string as human (keys strictly on === agent)', () => {
const { jobId, delay } = computeHistoryJob(page, 'user');
expect(delay).toBe(IDLE_INTERVAL_USER);
it('treats any non-"agent" source string as human', () => {
// resolveSource only ever yields 'agent' | 'user', but guard the contract:
// the agent branch keys strictly on === 'agent'.
const { jobId, delay } = computeHistoryJob(pageAged(0), 'user', NOW);
expect(delay).toBe(HISTORY_FAST_INTERVAL);
expect(jobId).toBe(PAGE_ID);
});
// #370 review round-1 WARNING: the max-wait ceiling prevents autosnapshot
// starvation during a continuous editing session (the trailing timer would
// otherwise re-arm forever and never fire).
describe('max-wait ceiling', () => {
const T0 = 1_000_000; // arbitrary fixed epoch for deterministic tests
it('once a burst is armed, delay clamps to the remaining max-wait budget', () => {
// 1 minute into the burst the USER interval (60m) far exceeds the remaining
// max-wait budget (10m - 1m = 9m), so the delay is clamped DOWN to that
// remaining budget — the full interval is NOT used once a ceiling applies.
const { delay } = computeHistoryJob(page, 'user', T0, T0 + 60_000);
expect(delay).toBe(IDLE_MAX_WAIT_USER - 60_000);
});
it('never waits longer than the max-wait budget from the burst start', () => {
// A store arriving right at the ceiling → delay 0 (fire promptly).
const { delay } = computeHistoryJob(
page,
'user',
T0,
T0 + IDLE_MAX_WAIT_USER,
);
expect(delay).toBe(0);
});
it('past the ceiling never returns a negative delay', () => {
const { delay } = computeHistoryJob(
page,
'user',
T0,
T0 + IDLE_MAX_WAIT_USER + 5 * 60_000,
);
expect(delay).toBe(0);
});
it('the agent ceiling is shorter than the user ceiling', () => {
expect(IDLE_MAX_WAIT_AGENT).toBeLessThan(IDLE_MAX_WAIT_USER);
const { delay } = computeHistoryJob(
page,
'agent',
T0,
T0 + IDLE_MAX_WAIT_AGENT,
);
expect(delay).toBe(0);
});
it('without a burstStart there is no ceiling (backward-compatible)', () => {
expect(computeHistoryJob(page, 'user').delay).toBe(IDLE_INTERVAL_USER);
expect(computeHistoryJob(page, 'agent').delay).toBe(IDLE_INTERVAL_AGENT);
});
});
});
describe('resolveSource (truth table)', () => {
@@ -40,12 +40,11 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
let pageHistoryRepo: {
saveHistory: jest.Mock;
findPageLastHistory: jest.Mock;
updateHistoryKind: jest.Mock;
};
let aiQueue: { add: jest.Mock };
let historyQueue: { add: jest.Mock; remove: jest.Mock };
let historyQueue: { add: jest.Mock };
let notificationQueue: { add: jest.Mock };
let collabHistory: { addContributors: jest.Mock; popContributors: jest.Mock };
let collabHistory: { addContributors: jest.Mock };
let transclusionService: {
syncPageTransclusions: jest.Mock;
syncPageReferences: jest.Mock;
@@ -94,22 +93,13 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
pageHistoryRepo = {
saveHistory: jest.fn().mockImplementation(async () => {
callOrder.push('saveHistory');
return { id: 'history-1' };
}),
findPageLastHistory: jest.fn().mockResolvedValue(null),
updateHistoryKind: jest.fn().mockResolvedValue(undefined),
};
aiQueue = { add: jest.fn().mockResolvedValue(undefined) };
historyQueue = {
add: jest.fn().mockResolvedValue(undefined),
// #370 — enqueuePageHistory now removes any pending idle job before re-adding.
remove: jest.fn().mockResolvedValue(undefined),
};
historyQueue = { add: jest.fn().mockResolvedValue(undefined) };
notificationQueue = { add: jest.fn().mockResolvedValue(undefined) };
collabHistory = {
addContributors: jest.fn().mockResolvedValue(undefined),
popContributors: jest.fn().mockResolvedValue([]),
};
collabHistory = { addContributors: jest.fn().mockResolvedValue(undefined) };
transclusionService = {
syncPageTransclusions: jest.fn().mockResolvedValue(undefined),
syncPageReferences: jest.fn().mockResolvedValue(undefined),
@@ -175,50 +165,6 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
expect(pageRepo.updatePage.mock.calls[0][0].lastUpdatedSource).toBe('user');
});
// #370 review round-1 SUGGESTION: the boundary was GENERALIZED from a
// user→agent special-case to ANY lastUpdatedSource transition. These pin the
// generalized behaviour it was rebuilt for.
describe('generalized boundary — any source transition', () => {
// Same persisted page but with an explicit prior source.
const pageWithPriorSource = (prior: string | null) => ({
...persistedHumanPage('NEW CONTENT'),
lastUpdatedSource: prior,
});
it('agent→user transition fires the boundary (pins the prior agent revision)', async () => {
const document = ydocFor(doc('NEW CONTENT'));
pageRepo.findById.mockResolvedValue(pageWithPriorSource('agent'));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await ext.onStoreDocument(buildData(document, 'user') as any);
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledTimes(1);
expect(callOrder).toEqual(['saveHistory', 'updatePage']);
expect(pageRepo.updatePage.mock.calls[0][0].lastUpdatedSource).toBe('user');
});
it('git→user transition fires the boundary (git-sync overwrite is a source change)', async () => {
const document = ydocFor(doc('NEW CONTENT'));
pageRepo.findById.mockResolvedValue(pageWithPriorSource('git'));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await ext.onStoreDocument(buildData(document, 'user') as any);
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledTimes(1);
expect(callOrder).toEqual(['saveHistory', 'updatePage']);
});
it('a null prior source (first-ever edit) does NOT fire the boundary', async () => {
const document = ydocFor(doc('NEW CONTENT'));
pageRepo.findById.mockResolvedValue(pageWithPriorSource(null));
await ext.onStoreDocument(buildData(document, 'agent') as any);
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
expect(pageRepo.updatePage).toHaveBeenCalledTimes(1);
});
});
it('idempotency: unchanged content → no updatePage, no history, no queues', async () => {
// The Y.Doc content equals the persisted content deeply → early skip.
// A Y.Doc round-trip normalizes attrs (e.g. paragraph indent), so derive
@@ -523,125 +469,4 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
// Contributors keyed by the UUID so they match the PAGE_HISTORY job (page.id).
expect(collabHistory.addContributors.mock.calls[0][0]).toBe(PAGE_ID);
});
// #370 — explicit save-version (Cmd+S / agent save tool) over the stateless
// seam. The tier is derived from the SIGNED connection actor, the store path
// is reused, and promote-not-dup avoids duplicating heavy content rows.
describe('save-version (#370)', () => {
const emitSave = (document: any, actor: 'user' | 'agent') =>
ext.onStateless({
connection: {
readOnly: false,
context: { user: { id: USER_ID, name: 'Alice' }, actor },
} as any,
documentName: `page.${PAGE_ID}`,
document: document as any,
payload: JSON.stringify({ type: 'save-version' }),
} as any);
// findById returns a page whose content already equals the live doc, so the
// store path is a no-op and we isolate the versioning decision.
const pageMatchingDoc = (document: any) => ({
...persistedHumanPage('IGNORED'),
content: TiptapTransformer.fromYdoc(document, 'default'),
});
it('human save with no prior snapshot → writes a manual version + broadcasts', async () => {
const document = ydocFor(doc('VERSION ME'));
pageRepo.findById.mockResolvedValue(pageMatchingDoc(document));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await emitSave(document, 'user');
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledTimes(1);
expect(pageHistoryRepo.saveHistory.mock.calls[0][1]).toEqual(
expect.objectContaining({ kind: 'manual' }),
);
// The pending idle autosnapshot is cancelled by the explicit version.
expect(historyQueue.remove).toHaveBeenCalledWith(PAGE_ID);
const msg = JSON.parse(
(document as any).broadcastStateless.mock.calls[(document as any).broadcastStateless.mock.calls.length - 1][0],
);
expect(msg).toMatchObject({
type: 'version.saved',
kind: 'manual',
alreadySaved: false,
});
});
it('agent save derives kind=agent from the signed actor', async () => {
const document = ydocFor(doc('AGENT VERSION'));
pageRepo.findById.mockResolvedValue(pageMatchingDoc(document));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await emitSave(document, 'agent');
expect(pageHistoryRepo.saveHistory.mock.calls[pageHistoryRepo.saveHistory.mock.calls.length - 1][1]).toEqual(
expect.objectContaining({ kind: 'agent' }),
);
});
it('promote-not-dup: latest snapshot is an autosave with identical content → upgrades in place', async () => {
const document = ydocFor(doc('SAME'));
const page = pageMatchingDoc(document);
pageRepo.findById.mockResolvedValue(page);
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
id: 'auto-1',
content: page.content,
kind: 'idle',
});
await emitSave(document, 'user');
// No heavy new content row — the existing autosave is promoted to manual.
expect(pageHistoryRepo.updateHistoryKind).toHaveBeenCalledWith(
'auto-1',
'manual',
expect.anything(),
);
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
const msg = JSON.parse(
(document as any).broadcastStateless.mock.calls[(document as any).broadcastStateless.mock.calls.length - 1][0],
);
expect(msg).toMatchObject({ historyId: 'auto-1', alreadySaved: false });
});
it('no-op when the latest snapshot is already a manual version of this content', async () => {
const document = ydocFor(doc('ALREADY SAVED'));
const page = pageMatchingDoc(document);
pageRepo.findById.mockResolvedValue(page);
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
id: 'ver-1',
content: page.content,
kind: 'manual',
});
await emitSave(document, 'user');
expect(pageHistoryRepo.updateHistoryKind).not.toHaveBeenCalled();
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
const msg = JSON.parse(
(document as any).broadcastStateless.mock.calls[(document as any).broadcastStateless.mock.calls.length - 1][0],
);
expect(msg).toMatchObject({ alreadySaved: true, kind: 'manual' });
});
it('a read-only connection cannot save a version', async () => {
const document = ydocFor(doc('READER'));
pageRepo.findById.mockResolvedValue(pageMatchingDoc(document));
await ext.onStateless({
connection: {
readOnly: true,
context: { user: { id: USER_ID }, actor: 'user' },
} as any,
documentName: `page.${PAGE_ID}`,
document: document as any,
payload: JSON.stringify({ type: 'save-version' }),
} as any);
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
expect(pageHistoryRepo.updateHistoryKind).not.toHaveBeenCalled();
});
});
});
@@ -36,11 +36,9 @@ import {
import { Page } from '@docmost/db/types/entity.types';
import { CollabHistoryService } from '../services/collab-history.service';
import {
IDLE_INTERVAL_AGENT,
IDLE_INTERVAL_USER,
IDLE_MAX_WAIT_AGENT,
IDLE_MAX_WAIT_USER,
PageHistoryKind,
HISTORY_FAST_INTERVAL,
HISTORY_FAST_THRESHOLD,
HISTORY_INTERVAL,
} from '../constants';
import { TransclusionService } from '../../core/page/transclusion/transclusion.service';
import { observeCollabStore } from '../../integrations/metrics/metrics.registry';
@@ -53,16 +51,6 @@ import { observeCollabStore } from '../../integrations/metrics/metrics.registry'
*/
export const INTENTIONAL_CLEAR_MESSAGE_TYPE = 'intentional-clear';
/**
* #370 — wire format of the client→server "save a version" signal. Sent by the
* human (Cmd+S / Save button) and by the agent's explicit save tool over the
* SAME stateless channel. The intentionality tier ('manual' vs 'agent') is
* derived SERVER-SIDE from the signed connection actor, never from this
* payload, so a version's type is unforgeable. The document is taken from the
* connection (not the payload), so the signal cannot be aimed at another page.
*/
export const SAVE_VERSION_MESSAGE_TYPE = 'save-version';
/**
* #251 — how long an intentional-clear signal stays "pending" before it is
* ignored. The signal is set on the clearing keystroke but consumed by the
@@ -99,39 +87,35 @@ export function resolveSource(
}
/**
* #370 — compute the BullMQ job id + delay for a page's trailing idle-flush
* autosnapshot. Pure so the timing is unit-testable.
* Compute the BullMQ job id + delay for a page-history snapshot job. Pure so
* the data-loss-sensitive timing arithmetic is unit-testable; `now` is injected
* (caller passes `Date.now()`) for determinism.
*
* Both humans and the agent now share ONE idle pipeline (the agent's old
* `delay=0` fast path is gone — intentional agent points arrive via the
* explicit save-version signal instead). The job id is the bare `page.id`, so a
* page has at most one pending idle job; the caller removes-and-re-adds it on
* every store to keep it debounced to the trailing edge of an edit burst. The
* window differs by source only: the agent flushes sooner than a human.
* - Agent edits: delay 0 and a source-keyed job id `${page.id}-agent`. The
* delay MUST stay 0 — the worker re-reads the page row at run time, so any
* delay risks reading content a later human edit has already overwritten
* (mis-tagged snapshot). 0 minimizes that window. The `-agent` suffix keeps
* the job from coalescing with the bare-page.id human job.
* - Human edits: age-based debounce so rapid human edits coalesce into one
* snapshot; job id is the bare `page.id`.
*
* BullMQ forbids ':' in custom job ids (Redis key separator), so '-' is used;
* page.id is a UUID, so `${page.id}-agent` cannot collide with a human job.
*/
export function computeHistoryJob(
page: Pick<Page, 'id'>,
page: Pick<Page, 'id' | 'createdAt'>,
source: string,
// Epoch ms of the FIRST edit in the current burst (when the pending idle job
// was first armed). Used to enforce the max-wait ceiling so a continuous
// editing session cannot re-arm the trailing timer forever. `now` is injectable
// for tests; both default to a live clock / no ceiling when omitted.
burstStart?: number,
now: number = Date.now(),
now: number,
): { jobId: string; delay: number } {
const isAgent = source === 'agent';
const interval = isAgent ? IDLE_INTERVAL_AGENT : IDLE_INTERVAL_USER;
const maxWait = isAgent ? IDLE_MAX_WAIT_AGENT : IDLE_MAX_WAIT_USER;
let delay = interval;
if (burstStart !== undefined) {
// Time already elapsed since the burst's first edit; the snapshot must fire
// no later than `maxWait` after that, so shrink the trailing delay to the
// remaining budget (never negative, so BullMQ fires it promptly).
const remaining = burstStart + maxWait - now;
delay = Math.max(0, Math.min(interval, remaining));
}
return { jobId: page.id, delay };
const pageAge = now - new Date(page.createdAt).getTime();
const delay = isAgent
? 0
: pageAge < HISTORY_FAST_THRESHOLD
? HISTORY_FAST_INTERVAL
: HISTORY_INTERVAL;
const jobId = isAgent ? `${page.id}-agent` : page.id;
return { jobId, delay };
}
@Injectable()
@@ -143,11 +127,6 @@ export class PersistenceExtension implements Extension {
// coalescing window" per document and OR it across all edits in the window,
// so the snapshot is marked 'agent' regardless of who wrote last.
private agentTouched: Map<string, boolean> = new Map();
// #370 — epoch ms of the FIRST edit in the current idle-flush burst, per page.
// Set when the pending idle job is first armed (empty entry), read to enforce
// the max-wait ceiling in computeHistoryJob, and cleared when the idle job is
// consumed/cancelled so the next burst starts a fresh window.
private idleBurstStart: Map<string, number> = new Map();
// #251 — per-document "intentional clear pending" flags. Keyed by
// documentName, value = expiry timestamp (ms). Set by onStateless when the
// client reports a deliberate clear; consumed once by the next
@@ -347,19 +326,20 @@ export class PersistenceExtension implements Extension {
//this.logger.debug('Contributors error:' + err?.['message']);
}
// #370 — boundary snapshot on ANY source transition. When the store
// flips the page's provenance (user↔agent↔git), pin the OUTGOING
// state as its own history version BEFORE the incoming source
// overwrites it. `page` still holds the OLD content/provenance here,
// so saveHistory(page) captures the pre-transition state tagged with
// its own source, kind='boundary'. The incoming content is snapshotted
// later by the debounced idle job. Skip if the page is effectively
// empty or if the latest existing snapshot already equals this state
// (the shared isDeepStrictEqual gate — avoids duplicates). Generalizing
// beyond the old user→agent special-case also covers git-sync for free.
// Approach A — boundary snapshot before the agent's first edit.
// When this store is the agent's and the page's currently persisted
// state was authored by a human, pin that human state as its own
// history version BEFORE the agent overwrites it. `page` still holds
// the OLD content/provenance here, so saveHistory(page) captures the
// pre-agent state tagged 'user'. The agent's new content is
// snapshotted later by the debounced PAGE_HISTORY job ('agent'). Skip
// if the prior state is already agent-authored (boundary already
// pinned on the user->agent transition), if the page is effectively
// empty, or if the latest existing snapshot already equals this human
// state (avoid duplicates).
if (
page.lastUpdatedSource &&
page.lastUpdatedSource !== lastUpdatedSource
lastUpdatedSource === 'agent' &&
page.lastUpdatedSource !== 'agent'
) {
// pageHistory.pageId is uuid-typed; use page.id (never the doc-name
// slugId) so a `page.<slugId>` doc cannot throw 22P02 here (#260).
@@ -367,13 +347,15 @@ export class PersistenceExtension implements Extension {
page.id,
{ includeContent: true, trx },
);
const baselineMissing =
const humanBaselineMissing =
!lastHistory ||
!isDeepStrictEqual(lastHistory.content, page.content);
if (!isEmptyParagraphDoc(page.content as any) && baselineMissing) {
if (
!isEmptyParagraphDoc(page.content as any) &&
humanBaselineMissing
) {
await this.pageHistoryRepo.saveHistory(page, {
contributorIds: page.contributorIds ?? undefined,
kind: 'boundary',
trx,
});
}
@@ -498,14 +480,6 @@ export class PersistenceExtension implements Extension {
return; // unrelated / malformed stateless message
}
// #370 — explicit "save a version" (human Cmd+S / agent save tool). Edit
// rights are already enforced by the readOnly reject above (a reader can't
// create a version), exactly as intentional-clear requires.
if (message?.type === SAVE_VERSION_MESSAGE_TYPE) {
await this.handleSaveVersion(data);
return;
}
if (message?.type !== INTENTIONAL_CLEAR_MESSAGE_TYPE) return;
this.intentionalClear.set(
@@ -514,117 +488,6 @@ export class PersistenceExtension implements Extension {
);
}
/**
* #370 — persist an intentional version from the live in-memory ydoc.
*
* One stateless path serves BOTH the human and the agent; the tier is derived
* SERVER-SIDE from the signed connection actor ('agent' → 'agent', anything
* else → 'manual'), so the version type cannot be spoofed by the client. We
* take the fresh ydoc from the collab process memory and run it through the
* EXISTING store path first (so pages.content/ydoc reflect the exact content
* being versioned — a REST endpoint would race the up-to-10s-stale page row),
* then snapshot it into page_history with the intentional kind.
*
* Promote-not-dup: if the latest history row already holds this exact content
* and it is an autosave (idle/boundary/legacy-null), upgrade its kind in place
* instead of duplicating a heavy content row; if it is already 'manual', it is
* a no-op (the client shows an "already saved" toast). Otherwise a fresh
* version row is written, popping the aggregated contributors from Redis.
*/
private async handleSaveVersion(data: onStatelessPayload): Promise<void> {
const { connection, document, documentName } = data;
const context = connection?.context;
const pageId = getPageId(documentName);
// Unforgeable: 'agent' only for a signed agent connection, else 'manual'.
const kind: PageHistoryKind =
context?.actor === 'agent' ? 'agent' : 'manual';
// Flush the live ydoc through the normal store path so the page row + ydoc
// hold exactly what we are about to version (also fires the idle enqueue we
// supersede below, plus any source-transition boundary). onStoreDocument
// only needs document/documentName/context.
await this.onStoreDocument({
document,
documentName,
context,
} as onStoreDocumentPayload);
let result:
| { historyId: string; kind: PageHistoryKind; alreadySaved: boolean }
| undefined;
await executeTx(this.db, async (trx) => {
const page = await this.pageRepo.findById(pageId, {
withLock: true,
includeContent: true,
trx,
});
if (!page) return;
// Never version an effectively-empty page (mirrors the processor's
// first-history guard); there is nothing intentional to pin.
if (isEmptyParagraphDoc(page.content as any)) return;
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
page.id,
{ includeContent: true, trx },
);
if (
lastHistory &&
isDeepStrictEqual(lastHistory.content, page.content)
) {
// Content is already snapshotted. Promote-not-dup.
if (lastHistory.kind === 'manual') {
result = {
historyId: lastHistory.id,
kind: 'manual',
alreadySaved: true,
};
return;
}
await this.pageHistoryRepo.updateHistoryKind(
lastHistory.id,
kind,
trx,
);
result = { historyId: lastHistory.id, kind, alreadySaved: false };
return;
}
// Fresh version row. Pop the contributors aggregated since the last
// snapshot (SPOP); restore them if the write fails so they aren't lost.
const contributorIds = await this.collabHistory.popContributors(page.id);
try {
const saved = await this.pageHistoryRepo.saveHistory(page, {
contributorIds,
kind,
trx,
});
result = { historyId: saved.id, kind, alreadySaved: false };
} catch (err) {
await this.collabHistory.addContributors(page.id, contributorIds);
throw err;
}
});
// Housekeeping: this explicit version supersedes the page's pending idle
// autosnapshot, so cancel it (delayed job → remove() just deletes it) and
// end the current idle burst so the next edit starts a fresh max-wait window.
await this.historyQueue.remove(pageId).catch(() => undefined);
this.idleBurstStart.delete(pageId);
if (result) {
document.broadcastStateless(
JSON.stringify({
type: 'version.saved',
historyId: result.historyId,
kind: result.kind,
alreadySaved: result.alreadySaved,
}),
);
}
}
async onChange(data: onChangePayload) {
const documentName = data.documentName;
const userId = data.context?.user?.id;
@@ -682,45 +545,17 @@ export class PersistenceExtension implements Extension {
page: Page,
lastUpdatedSource: string,
): Promise<void> {
// #370 — trailing idle debounce with a max-wait ceiling. One pending idle
// job per page (jobId = page.id); on every store we remove the pending
// delayed job and re-add it, so the snapshot lands `delay` after edits go
// quiet rather than once per store (precedent: workspace.service.ts).
// remove() on a delayed job simply deletes it (0 if absent, no throw); if the
// job is already ACTIVE and the remove is a no-op, the add still de-dups and
// the processor's isDeepStrictEqual gate collapses the duplicate content.
//
// The FIRST arm of a burst records `burstStart`; computeHistoryJob shrinks
// the delay to the remaining max-wait budget from that point, so a continuous
// session cannot re-arm the trailing timer forever and starve the snapshot.
// A burst marker older than THIS TIER's max-wait means the previous idle job
// has already fired — start a fresh window instead of firing immediately on
// the next edit. Must use the SAME source-specific max-wait computeHistoryJob
// uses (agent 5m / user 10m): a hardcoded USER ceiling would leave an agent
// burst's marker stale for 5..10m, forcing delay=0 on every store in that
// window and writing one idle row per store — exactly the per-store bloat the
// debounce exists to prevent, on the continuous-agent path.
const maxWait =
lastUpdatedSource === 'agent' ? IDLE_MAX_WAIT_AGENT : IDLE_MAX_WAIT_USER;
const now = Date.now();
let burstStart = this.idleBurstStart.get(page.id);
if (burstStart === undefined || now - burstStart >= maxWait) {
burstStart = now;
this.idleBurstStart.set(page.id, burstStart);
}
// Job id + delay arithmetic lives in the pure `computeHistoryJob` (see its
// doc comment for the agent-delay-0 / age-based-debounce invariants).
const { jobId, delay } = computeHistoryJob(
page,
lastUpdatedSource,
burstStart,
now,
Date.now(),
);
await this.historyQueue.remove(jobId).catch(() => undefined);
await this.historyQueue.add(
QueueJob.PAGE_HISTORY,
{ pageId: page.id, kind: 'idle' } as IPageHistoryJob,
{ pageId: page.id } as IPageHistoryJob,
{ jobId, delay },
);
}
@@ -66,15 +66,6 @@ describe('HistoryProcessor.process', () => {
notificationQueue = { add: jest.fn().mockResolvedValue(undefined) };
generalQueue = { add: jest.fn().mockResolvedValue(undefined) };
// #370 F3 — the processor now serializes its find+save under a page-row lock
// via executeTx. A db whose transaction().execute(fn) runs fn with a trx stub
// drives the real executeTx() helper without a database.
const db = {
transaction: () => ({
execute: (fn: (trx: any) => Promise<any>) => fn({ __trx: true }),
}),
};
// WorkerHost's constructor reads `this.worker`; passing repos positionally
// matches the constructor and avoids the Nest DI container.
proc = new HistoryProcessor(
@@ -82,7 +73,6 @@ describe('HistoryProcessor.process', () => {
pageRepo as any,
collabHistory as any,
watcherService as any,
db as any,
notificationQueue as any,
generalQueue as any,
);
@@ -136,26 +126,15 @@ describe('HistoryProcessor.process', () => {
await proc.process(buildJob());
expect(collabHistory.popContributors).toHaveBeenCalledWith(PAGE_ID);
// #370 F3/F9 — the snapshot decision runs under a page-row lock. Pin the lock
// structurally so a refactor that drops withLock/trx (silently reintroducing
// the TOCTOU double-insert) turns this red. The tx stub is { __trx: true }.
expect(pageRepo.findById).toHaveBeenCalledWith(
PAGE_ID,
expect.objectContaining({ withLock: true, trx: { __trx: true } }),
);
// #370 F7 — addPageWatchers MUST receive the trx, or its FK-check runs on a
// separate connection and self-deadlocks against our FOR UPDATE. Asserting
// the trx arg here is exactly what would have caught that regression.
expect(watcherService.addPageWatchers).toHaveBeenCalledWith(
['u1', 'u2'],
PAGE_ID,
SPACE_ID,
WORKSPACE_ID,
{ __trx: true },
);
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledWith(
expect.objectContaining({ id: PAGE_ID }),
{ contributorIds: ['u1', 'u2'], kind: 'idle', trx: { __trx: true } },
{ contributorIds: ['u1', 'u2'] },
);
expect(generalQueue.add).toHaveBeenCalledWith(
QueueJob.PAGE_BACKLINKS,
@@ -207,48 +186,6 @@ describe('HistoryProcessor.process', () => {
]);
});
it('COMMIT failure (throw outside the tx callback) → contributors RESTORED', async () => {
// #370 F8 — a commit-time failure throws OUTSIDE the callback, so the inner
// try/catch does not run; the outer catch must restore the popped set (else a
// BullMQ retry writes an unattributed version). Use a db whose execute() runs
// the callback THEN throws, simulating a commit abort.
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
content: { type: 'doc', content: [] },
});
const commitFail = {
transaction: () => ({
execute: async (fn: (trx: any) => Promise<any>) => {
await fn({ __trx: true }); // callback succeeds (saveHistory ok)
throw new Error('commit aborted'); // ...but the COMMIT fails
},
}),
};
const procCommitFail = new HistoryProcessor(
pageHistoryRepo as any,
pageRepo as any,
collabHistory as any,
watcherService as any,
commitFail as any,
notificationQueue as any,
generalQueue as any,
);
jest
.spyOn(procCommitFail['logger'], 'error')
.mockImplementation(() => undefined);
await expect(procCommitFail.process(buildJob())).rejects.toThrow(
'commit aborted',
);
// The inner catch did NOT run (save succeeded), so only the outer catch can
// restore — assert it did.
expect(collabHistory.addContributors).toHaveBeenCalledWith(PAGE_ID, [
'u1',
'u2',
]);
// And the post-snapshot queue work must NOT have run (we rethrew).
expect(generalQueue.add).not.toHaveBeenCalled();
});
it('backlinks + notification queue failures are swallowed (history still committed)', async () => {
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
content: { type: 'doc', content: [] },
@@ -19,9 +19,6 @@ import { isDeepStrictEqual } from 'node:util';
import { CollabHistoryService } from '../services/collab-history.service';
import { WatcherService } from '../../core/watcher/watcher.service';
import { isEmptyParagraphDoc } from '../collaboration.util';
import { InjectKysely } from 'nestjs-kysely';
import { KyselyDB } from '@docmost/db/types/kysely.types';
import { executeTx } from '@docmost/db/utils';
@Processor(QueueName.HISTORY_QUEUE)
export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
@@ -32,7 +29,6 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
private readonly pageRepo: PageRepo,
private readonly collabHistory: CollabHistoryService,
private readonly watcherService: WatcherService,
@InjectKysely() private readonly db: KyselyDB,
@InjectQueue(QueueName.NOTIFICATION_QUEUE) private notificationQueue: Queue,
@InjectQueue(QueueName.GENERAL_QUEUE) private generalQueue: Queue,
) {
@@ -45,9 +41,6 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
try {
const { pageId } = job.data;
// Read the page WITHOUT a lock first, only to bail early on the two cheap
// no-write cases (page gone / empty first snapshot) without opening a
// transaction. The authoritative check-then-write happens locked below.
const page = await this.pageRepo.findById(pageId, {
includeContent: true,
});
@@ -58,109 +51,40 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
return;
}
// #370 F3 — the snapshot decision (findPageLastHistory → saveHistory) must
// be serialized against manual-save/boundary writers, which run under a
// page-row lock in onStoreDocument. Without it, this processor and a
// concurrent manual-save each read the same lastHistory (MVCC), both see
// content != lastHistory, and both insert — producing two page_history rows
// with IDENTICAL content (one 'idle', one 'manual'), defeating
// promote-not-dup and the version-vs-autosave split. Taking the same
// page-row lock makes the second writer observe the first's committed row so
// the isDeepStrictEqual gate collapses the duplicate. Only the read+write
// is transacted; the post-snapshot queue work stays outside.
let contributorIds: string[] = [];
let snapshotWritten = false;
let lastHistoryContent: unknown;
// #370 F8 — the contributor set popped from Redis (destructive SPOP) must be
// restored if the snapshot does not durably land. The inner try/catch only
// covers a throw INSIDE the callback; a COMMIT failure (connection drop,
// serialization/deadlock abort on commit — the transient class the epic
// already retries) throws OUTSIDE it, rolling the snapshot back while the
// pop is already gone. We track the popped set here and restore it in the
// outer catch so a BullMQ retry re-attributes the version. addContributors
// is an idempotent Redis SADD, so a double-restore is harmless.
let poppedForRestore: string[] = [];
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
pageId,
{ includeContent: true },
);
try {
await executeTx(this.db, async (trx) => {
const lockedPage = await this.pageRepo.findById(pageId, {
includeContent: true,
withLock: true,
trx,
});
if (!lockedPage) return;
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
pageId,
{ includeContent: true, trx },
);
lastHistoryContent = lastHistory?.content;
if (!lastHistory && isEmptyParagraphDoc(lockedPage.content as any)) {
this.logger.debug(
`Skipping first history for page ${pageId}: empty content`,
);
return;
}
if (
lastHistory &&
isDeepStrictEqual(lastHistory.content, lockedPage.content)
) {
return; // already snapshotted at this content — nothing to write
}
contributorIds = await this.collabHistory.popContributors(pageId);
poppedForRestore = contributorIds;
try {
// Pass `trx` so the watcher insert's FK check (FOR KEY SHARE on
// pages[pageId]) runs on the SAME connection that already holds the
// FOR UPDATE lock from findById — otherwise it takes the FK lock on a
// separate pool connection and self-deadlocks against our own tx.
await this.watcherService.addPageWatchers(
contributorIds,
pageId,
lockedPage.spaceId,
lockedPage.workspaceId,
trx,
);
// #370 — every job on this queue is a trailing idle-flush autosnapshot.
await this.pageHistoryRepo.saveHistory(lockedPage, {
contributorIds,
kind: job.data.kind ?? 'idle',
trx,
});
snapshotWritten = true;
this.logger.debug(`History created for page: ${pageId}`);
} catch (err) {
await this.collabHistory.addContributors(pageId, contributorIds);
poppedForRestore = [];
throw err;
}
});
} catch (err) {
// A throw here means the tx did NOT commit (callback threw, or the commit
// itself failed and rolled back). If we popped contributors and the inner
// catch did not already restore them, restore now so the retry keeps
// attribution. snapshotWritten is irrelevant: it is set before commit, so
// it can be true even when the commit rolled the snapshot back.
if (poppedForRestore.length) {
await this.collabHistory.addContributors(pageId, poppedForRestore);
}
throw err;
}
// No snapshot written (page vanished / empty-first / unchanged content) →
// clear the contributor set for the skip cases and stop.
if (!snapshotWritten) {
if (!lastHistoryContent && isEmptyParagraphDoc(page.content as any)) {
await this.collabHistory.clearContributors(pageId);
}
if (!lastHistory && isEmptyParagraphDoc(page.content as any)) {
this.logger.debug(
`Skipping first history for page ${pageId}: empty content`,
);
await this.collabHistory.clearContributors(pageId);
return;
}
{
if (
!lastHistory ||
!isDeepStrictEqual(lastHistory.content, page.content)
) {
const contributorIds = await this.collabHistory.popContributors(pageId);
try {
await this.watcherService.addPageWatchers(
contributorIds,
pageId,
page.spaceId,
page.workspaceId,
);
await this.pageHistoryRepo.saveHistory(page, { contributorIds });
this.logger.debug(`History created for page: ${pageId}`);
} catch (err) {
await this.collabHistory.addContributors(pageId, contributorIds);
throw err;
}
const mentions = extractMentions(page.content);
const pageMentions = extractPageMentions(mentions);
const internalLinkSlugIds = extractInternalLinkSlugIds(page.content);
@@ -178,7 +102,7 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
);
});
if (contributorIds.length > 0 && lastHistoryContent) {
if (contributorIds.length > 0 && lastHistory?.content) {
await this.notificationQueue
.add(QueueJob.PAGE_UPDATED, {
pageId,
@@ -52,7 +52,9 @@ import {
INTERNAL_LINK_REGEX,
extractPageSlugId,
} from '../../../integrations/export/utils';
import { markdownToHtml, canonicalizeFootnotes } from '@docmost/editor-ext';
import { canonicalizeFootnotes } from '@docmost/editor-ext';
import { markdownToProseMirror } from '@docmost/prosemirror-markdown';
import { normalizeForeignMarkdown } from '../../../integrations/import/utils/foreign-markdown';
import { WatcherService } from '../../watcher/watcher.service';
import { sql } from 'kysely';
import { TransclusionService } from '../transclusion/transclusion.service';
@@ -1301,8 +1303,14 @@ export class PageService {
switch (format) {
case 'markdown': {
const html = await markdownToHtml(content as string);
prosemirrorJson = htmlToJson(html as string);
// Canonical markdown -> ProseMirror JSON directly via
// `@docmost/prosemirror-markdown` (issue #345) — no HTML intermediate,
// no editor-ext markdown layer. Foreign markdown surfaces the strict
// parser rejects (GFM `[^id]` reference footnotes) are normalized to the
// canonical inline form first.
prosemirrorJson = await markdownToProseMirror(
normalizeForeignMarkdown(content as string),
);
break;
}
case 'html': {
@@ -1,27 +0,0 @@
import { type Kysely } from 'kysely';
/**
* #370 — page-versioning intentionality tier on a history snapshot.
*
* Adds `page_history.kind`, the three-tier "how intentional was this snapshot"
* marker that lets versions (intentional points) be told apart from autosaves:
* - 'manual' — a human explicitly saved a version (Cmd+S / Save button)
* - 'agent' — the AI agent explicitly saved a version
* - 'idle' — trailing idle-flush autosnapshot (safety net)
* - 'boundary' — autosnapshot pinned on a source transition (user↔agent↔git)
*
* Nullable with NO default (mirrors last_updated_source in the agent-provenance
* migration): legacy rows predate the marker and read back as `null`, which the
* client renders as a plain autosave. Stored as a short varchar to stay
* forward-compatible without an enum migration.
*/
export async function up(db: Kysely<any>): Promise<void> {
await db.schema
.alterTable('page_history')
.addColumn('kind', 'varchar(20)', (col) => col)
.execute();
}
export async function down(db: Kysely<any>): Promise<void> {
await db.schema.alterTable('page_history').dropColumn('kind').execute();
}
@@ -13,7 +13,6 @@ import { jsonArrayFrom, jsonObjectFrom } from 'kysely/helpers/postgres';
import { ExpressionBuilder, sql } from 'kysely';
import { DB } from '@docmost/db/types/db';
import { resolveAgentProvenance } from '../agent-provenance';
import { PageHistoryKind } from '../../../collaboration/constants';
/**
* Role-resolution subquery for a page-history row's bound AI chat (#300). Joins
@@ -47,9 +46,6 @@ export class PageHistoryRepo {
'lastUpdatedById',
'lastUpdatedSource',
'lastUpdatedAiChatId',
// #370 — intentionality tier ('manual' | 'agent' | 'idle' | 'boundary');
// null on legacy rows (= autosave). Selected so callers can read/promote it.
'kind',
'contributorIds',
'spaceId',
'workspaceId',
@@ -89,15 +85,9 @@ export class PageHistoryRepo {
async saveHistory(
page: Page,
opts?: {
contributorIds?: string[];
// #370 — intentionality tier for this snapshot. Omitted → null (legacy
// autosave semantics). Callers derive it server-side, never from a client.
kind?: PageHistoryKind;
trx?: KyselyTransaction;
},
): Promise<PageHistory> {
return await this.insertPageHistory(
opts?: { contributorIds?: string[]; trx?: KyselyTransaction },
): Promise<void> {
await this.insertPageHistory(
{
pageId: page.id,
slugId: page.slugId,
@@ -109,7 +99,6 @@ export class PageHistoryRepo {
// Copy the provenance marker off the page row, as for lastUpdatedById.
lastUpdatedSource: page.lastUpdatedSource,
lastUpdatedAiChatId: page.lastUpdatedAiChatId,
kind: opts?.kind ?? null,
contributorIds: opts?.contributorIds,
spaceId: page.spaceId,
workspaceId: page.workspaceId,
@@ -118,25 +107,6 @@ export class PageHistoryRepo {
);
}
/**
* #370 — promote an existing snapshot's intentionality tier in place. Used by
* the manual-save "promote-not-dup" path: when the latest history row already
* holds the exact content being versioned, we upgrade its `kind` instead of
* duplicating a heavy content row.
*/
async updateHistoryKind(
pageHistoryId: string,
kind: PageHistoryKind,
trx?: KyselyTransaction,
): Promise<void> {
const db = dbOrTx(this.db, trx);
await db
.updateTable('pageHistory')
.set({ kind })
.where('id', '=', pageHistoryId)
.execute();
}
async findPageHistoryByPageId(pageId: string, pagination: PaginationOptions) {
const query = this.db
.selectFrom('pageHistory')
-1
View File
@@ -280,7 +280,6 @@ export interface PageHistory {
createdAt: Generated<Timestamp>;
icon: string | null;
id: Generated<string>;
kind: string | null;
lastUpdatedAiChatId: string | null;
lastUpdatedById: string | null;
lastUpdatedSource: string | null;
@@ -0,0 +1,145 @@
// export.service.ts imports the ESM-only @sindresorhus/slugify (not in jest's
// transform allowlist). It is irrelevant to the markdown-serialization path under
// test (only used for page-mention link slugs on the DB path), so it is mocked
// out to keep the module graph loadable under ts-jest (mirrors the import specs).
jest.mock('@sindresorhus/slugify', () => ({
__esModule: true,
default: (input: string) => String(input),
}));
import { convertProseMirrorToMarkdown } from '@docmost/prosemirror-markdown';
import { ExportService } from './export.service';
import { ExportFormat } from './dto/export-dto';
/**
* STEP 1 golden test for issue #345: server MARKDOWN export runs DIRECTLY through
* the canonical converter (`convertProseMirrorToMarkdown`) no HTML intermediate
* and no `@docmost/editor-ext` markdown layer so the emitted markdown is in the
* canonical package forms and is byte-identical to the git-sync vault body.
*
* These are the goldens the swap has to satisfy: they assert the CANONICAL
* surface (callout `> [!type]`, inline footnote `^[…]`, lossless image
* `<!--img …-->`) rather than the old editor-ext forms (`:::type`, `[^id]`,
* lossy `![alt](src)`).
*
* `exportPage(..., singlePage=false)` takes no DB path (no mention rewriting), so
* the service is constructed with null collaborators and only the pure
* PM -> Markdown path is exercised.
*/
function makeService(): ExportService {
return new ExportService(
null as any, // pageRepo
null as any, // pagePermissionRepo
null as any, // db
null as any, // storageService
null as any, // environmentService
null as any, // domainService
);
}
// A representative page exercising the node types whose canonical markdown form
// changed with the move off the editor-ext layer: callout, inline footnote, and a
// lossless image carrying width/align attrs that the old layer dropped.
const REPRESENTATIVE_DOC = {
type: 'doc',
content: [
{
type: 'paragraph',
content: [
{ type: 'text', text: 'Body ' },
{ type: 'footnoteReference', attrs: { id: 'fn-1' } },
{ type: 'text', text: ' end.' },
],
},
{
type: 'callout',
attrs: { type: 'info', icon: null },
content: [
{
type: 'paragraph',
content: [{ type: 'text', text: 'Heads up' }],
},
],
},
{
type: 'image',
attrs: {
src: '/files/pic.png',
alt: 'Pic',
width: 320,
align: 'left',
},
},
{
type: 'footnotesList',
content: [
{
type: 'footnoteDefinition',
attrs: { id: 'fn-1' },
content: [
{
type: 'paragraph',
content: [{ type: 'text', text: 'the note' }],
},
],
},
],
},
],
};
describe('ExportService — markdown export via the canonical converter (#345)', () => {
it('emits canonical callout, inline footnote and lossless image forms', async () => {
const service = makeService();
const md = (await service.exportPage(ExportFormat.Markdown, {
title: '',
content: REPRESENTATIVE_DOC,
} as any)) as string;
// Callout: Obsidian `> [!type]`, NOT the legacy `:::type`.
expect(md).toContain('> [!info]');
expect(md).not.toContain(':::');
// Inline footnote: `^[…]`, NOT the reference `[^id]` form.
expect(md).toContain('^[the note]');
expect(md).not.toMatch(/\[\^/);
// Lossless image: trailing `<!--img …-->` carrying the dropped attrs.
expect(md).toContain('![Pic](/files/pic.png)');
expect(md).toContain('<!--img');
expect(md).toContain('"width":"320"');
expect(md).toContain('"align":"left"');
});
it('export body is byte-identical to the git-sync vault serializer (export == vault)', async () => {
const service = makeService();
// A title-less page: exportPage prepends NO heading, so the whole output is
// the page BODY — exactly what git-sync serializes (git-sync stores the title
// in frontmatter / the filename, never as an in-body H1).
const exported = (await service.exportPage(ExportFormat.Markdown, {
title: '',
content: REPRESENTATIVE_DOC,
} as any)) as string;
// The git-sync vault writer feeds this SAME converter (git-sync
// `stabilizePageBody` = convertProseMirrorToMarkdown(content) at the
// fixpoint). For an already-stable doc the single pass IS the fixpoint, so
// the two are byte-identical by construction — assert it.
const vaultBody = convertProseMirrorToMarkdown(REPRESENTATIVE_DOC);
expect(exported).toBe(vaultBody);
});
it('prepends the page title as an H1 heading (the one documented export/vault delta)', async () => {
const service = makeService();
const md = (await service.exportPage(ExportFormat.Markdown, {
title: 'My Page',
content: { type: 'doc', content: [] },
} as any)) as string;
// Export makes standalone files, so it prepends the title as an H1. This is
// the ONE deliberate difference from the vault body (which carries the title
// in frontmatter). The body below the heading still serializes canonically.
expect(md.startsWith('# My Page')).toBe(true);
});
});
@@ -37,7 +37,7 @@ import {
getAttachmentIds,
getProsemirrorContent,
} from '../../common/helpers/prosemirror/utils';
import { htmlToMarkdown } from '@docmost/editor-ext';
import { convertProseMirrorToMarkdown } from '@docmost/prosemirror-markdown';
type AllowedAttachment = { id: string; fileName: string; filePath: string };
@@ -79,9 +79,8 @@ export class ExportService {
prosemirrorJson.content.unshift(titleNode);
}
const pageHtml = jsonToHtml(prosemirrorJson);
if (format === ExportFormat.HTML) {
const pageHtml = jsonToHtml(prosemirrorJson);
return `<!DOCTYPE html>
<html>
<head>
@@ -92,11 +91,14 @@ export class ExportService {
}
if (format === ExportFormat.Markdown) {
const newPageHtml = pageHtml.replace(
/<colgroup[^>]*>[\s\S]*?<\/colgroup>/gim,
'',
);
return htmlToMarkdown(newPageHtml);
// Direct ProseMirror JSON -> Markdown via the canonical converter
// (`@docmost/prosemirror-markdown`). This is the SAME serializer the
// git-sync vault writer feeds (see git-sync `stabilizePageBody`), so an
// exported page body is byte-identical to its vault representation — no
// HTML intermediate, no second markdown layer, no format drift (issue
// #345). The old `<colgroup>` scrub is gone with the HTML step: the
// converter emits GFM tables directly and never produces `<colgroup>`.
return convertProseMirrorToMarkdown(prosemirrorJson);
}
return;
@@ -17,6 +17,22 @@ jest.mock('image-dimensions', () => ({
__esModule: true,
imageDimensionsFromData: () => undefined,
}));
// FileImportTaskService -> PageService -> collaboration.gateway ->
// metrics.registry imports `prom-client`, which is not resolvable in this
// workspace's node_modules (types-only stub, no runtime entry). Metrics are
// disabled on this path, so a virtual no-op mock keeps the module graph loadable.
jest.mock(
'prom-client',
() => ({
collectDefaultMetrics: () => undefined,
Registry: class {},
Histogram: class {},
Gauge: class {},
Counter: class {},
Summary: class {},
}),
{ virtual: true },
);
import { promises as fs } from 'fs';
import * as os from 'os';
@@ -26,14 +42,17 @@ import { ImportService } from './import.service';
/**
* Binding test for issue #228 / review #5: FileImportTaskService.processGenericImport
* is a NON-editor write path (markdownToHtml -> processHTML -> JSON, never runs
* footnoteSyncPlugin), so it canonicalizes footnotes before persisting. This pins
* that binding the same one import.service has a spec for which previously had
* NO spec at all.
* is a NON-editor write path, so a zip-imported `.md` page ends up with canonical
* footnotes before persisting: ordered by first reference, reused refs deduped,
* orphan definitions dropped.
*
* The markdown -> HTML -> ProseMirror conversion is REAL (a real ImportService,
* its createYdoc stubbed); the filesystem is a real temp dir with one .md file;
* the DB transaction is stubbed to capture the persisted page content.
* Since #345 the `.md` parse runs `normalizeForeignMarkdown` ->
* `markdownToProseMirror` -> `jsonToHtml` (feeding the shared HTML attachment /
* link pipeline) -> `processHTML` -> `canonicalizeFootnotes`. The parser assigns
* fresh `fn-*` ids, so we assert by definition BODY order rather than the source
* labels. The conversion is REAL (a real ImportService, its createYdoc stubbed);
* the filesystem is a real temp dir with one .md file; the DB transaction is
* stubbed to capture the persisted page content.
*/
// Out-of-order references (c, a, b), a REUSED reference ([^a] twice), and an
@@ -49,13 +68,14 @@ const MARKDOWN = [
'[^z]: orphan note',
].join('\n');
function footnoteListIds(content: any): string[] {
/** Definition body texts of the (single) footnotesList, in list order. */
function footnoteListBodies(content: any): string[] {
const list = (content?.content ?? []).find(
(n: any) => n.type === 'footnotesList',
);
return (list?.content ?? [])
.filter((n: any) => n.type === 'footnoteDefinition')
.map((n: any) => n.attrs?.id);
.map((n: any) => n.content?.[0]?.content?.[0]?.text);
}
// A permissive chainable stub for the spaces lookup (selectFrom(...).select(...)
@@ -71,80 +91,127 @@ function chainable(result: any): any {
return proxy;
}
/**
* Run one markdown file through the REAL zip-import pipeline
* (`processGenericImport` -> `markdownToProseMirror` -> `jsonToHtml` ->
* `processHTML`/`htmlToJson`) and return the persisted page `content`. This is
* the server-specific PM->HTML->PM hop that the package's own PM<->MD tests do
* NOT cover.
*/
async function runZipImport(markdown: string): Promise<any> {
const extractDir = await fs.mkdtemp(path.join(os.tmpdir(), 'fit-canon-'));
await fs.writeFile(path.join(extractDir, 'note.md'), markdown, 'utf-8');
const importService = new ImportService(
{} as any,
{} as any,
{} as any,
{} as any,
);
jest
.spyOn(importService as any, 'createYdoc')
.mockResolvedValue(Buffer.from([]) as any);
let captured: any = null;
const trx = {
insertInto: (table: string) => ({
values: (v: any) => {
if (table === 'pages') captured = v;
return { execute: async () => {} };
},
}),
};
const db: any = {
selectFrom: () => chainable({ slug: 'space-slug' }),
transaction: () => ({ execute: (fn: any) => fn(trx) }),
};
const importAttachmentService = {
processAttachments: async ({ html }: any) => html,
};
const service = new FileImportTaskService(
{} as any, // storageService
importService as any,
{ nextPagePosition: async () => 'a0' } as any,
{ insertBacklink: jest.fn() } as any,
db,
importAttachmentService as any,
{ emit: jest.fn() } as any,
{ logBatchWithContext: jest.fn() } as any,
);
const fileTask: any = {
id: 'task-1',
source: 'generic',
spaceId: 'space-1',
workspaceId: 'ws-1',
creatorId: 'user-1',
};
try {
await service.processGenericImport({ extractDir, fileTask });
expect(captured).toBeTruthy();
return captured.content;
} finally {
await fs.rm(extractDir, { recursive: true, force: true });
}
}
/** Find the first node of a given type anywhere in a PM content tree. */
function findFirst(node: any, type: string): any {
if (!node || typeof node !== 'object') return null;
if (node.type === type) return node;
for (const child of node.content ?? []) {
const hit = findFirst(child, type);
if (hit) return hit;
}
return null;
}
describe('FileImportTaskService.processGenericImport — footnote canonicalization (#228)', () => {
it('orders footnotes by first reference, dedupes reuse, and drops orphans on zip import', async () => {
const extractDir = await fs.mkdtemp(path.join(os.tmpdir(), 'fit-canon-'));
await fs.writeFile(path.join(extractDir, 'note.md'), MARKDOWN, 'utf-8');
// Real ImportService for the html -> JSON conversion; stub the yjs encode.
const importService = new ImportService(
{} as any,
{} as any,
{} as any,
{} as any,
const content = await runZipImport(MARKDOWN);
// Definitions ordered by FIRST REFERENCE (C, A, B), NOT the markdown
// definition order (A, B, C). Ids are the parser's fresh `fn-*`, so pin
// the BODIES.
expect(footnoteListBodies(content)).toEqual(['note C', 'note A', 'note B']);
// Orphan [^z] dropped; reused [^a] collapses to one definition; one list.
expect(footnoteListBodies(content)).not.toContain('orphan note');
const lists = (content.content ?? []).filter(
(n: any) => n.type === 'footnotesList',
);
jest
.spyOn(importService as any, 'createYdoc')
.mockResolvedValue(Buffer.from([]) as any);
expect(lists).toHaveLength(1);
expect(
footnoteListBodies(content).filter((b) => b === 'note A'),
).toHaveLength(1);
});
let captured: any = null;
const trx = {
insertInto: (table: string) => ({
values: (v: any) => {
if (table === 'pages') captured = v;
return { execute: async () => {} };
},
}),
};
const db: any = {
selectFrom: () => chainable({ slug: 'space-slug' }),
transaction: () => ({ execute: (fn: any) => fn(trx) }),
};
// #345 F4: the zip path routes markdown through jsonToHtml -> processHTML ->
// htmlToJson (the shared HTML attachment pipeline). #345's headline is LOSSLESS
// image width/align via the `<!--img {...}-->` comment; a callout carries its
// `type`. This asserts those survive the PM->HTML->PM hop — the one hop the
// package's PM<->MD suite does not exercise.
it('preserves image width/align and callout type through the PM->HTML->PM hop', async () => {
const md = [
'# Doc',
'',
'![a picture](https://example.com/i.png) <!--img {"width":"320","align":"left"}-->',
'',
':::warning',
'Careful now.',
':::',
].join('\n');
const importAttachmentService = {
processAttachments: async ({ html }: any) => html,
};
const backlinkRepo = { insertBacklink: jest.fn() };
const eventEmitter = { emit: jest.fn() };
const auditService = { logBatchWithContext: jest.fn() };
const content = await runZipImport(md);
const pageService = { nextPagePosition: async () => 'a0' };
const image = findFirst(content, 'image');
expect(image).toBeTruthy();
// The lossless sizing/alignment must survive the HTML hop.
expect(String(image.attrs?.width)).toBe('320');
expect(image.attrs?.align).toBe('left');
const service = new FileImportTaskService(
{} as any, // storageService
importService as any,
pageService as any,
backlinkRepo as any,
db,
importAttachmentService as any,
eventEmitter as any,
auditService as any,
);
const fileTask: any = {
id: 'task-1',
source: 'generic',
spaceId: 'space-1',
workspaceId: 'ws-1',
creatorId: 'user-1',
};
try {
await service.processGenericImport({ extractDir, fileTask });
expect(captured).toBeTruthy();
const content = captured.content;
// Reference order is c, a, b (NOT the markdown definition order a, b, c).
expect(footnoteListIds(content)).toEqual(['c', 'a', 'b']);
// Orphan [^z] dropped; reused [^a] collapses to one definition; one list.
expect(footnoteListIds(content)).not.toContain('z');
const lists = (content.content ?? []).filter(
(n: any) => n.type === 'footnotesList',
);
expect(lists).toHaveLength(1);
expect(footnoteListIds(content).filter((id) => id === 'a')).toHaveLength(1);
} finally {
await fs.rm(extractDir, { recursive: true, force: true });
}
const callout = findFirst(content, 'callout');
expect(callout).toBeTruthy();
expect(callout.attrs?.type).toBe('warning');
});
});
@@ -1,6 +1,9 @@
import { Inject, Injectable, Logger } from '@nestjs/common';
import * as path from 'path';
import { jsonToText } from '../../../collaboration/collaboration.util';
import {
jsonToHtml,
jsonToText,
} from '../../../collaboration/collaboration.util';
import { InjectKysely } from 'nestjs-kysely';
import { KyselyDB } from '@docmost/db/types/kysely.types';
import {
@@ -18,9 +21,11 @@ import { generateSlugId } from '../../../common/helpers';
import { v7 } from 'uuid';
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
import { FileTask, InsertablePage } from '@docmost/db/types/entity.types';
import { markdownToHtml, canonicalizeFootnotes } from '@docmost/editor-ext';
import { canonicalizeFootnotes } from '@docmost/editor-ext';
import { markdownToProseMirror } from '@docmost/prosemirror-markdown';
import { getProsemirrorContent } from '../../../common/helpers/prosemirror/utils';
import { formatImportHtml } from '../utils/import-formatter';
import { normalizeForeignMarkdown } from '../utils/foreign-markdown';
import {
buildAttachmentCandidates,
collectMarkdownAndHtmlFiles,
@@ -461,7 +466,18 @@ export class FileImportTaskService {
content = await fs.readFile(absPath, 'utf-8');
if (page.fileExtension.toLowerCase() === '.md') {
content = await markdownToHtml(content);
// Parse markdown with the single canonical converter
// (`@docmost/prosemirror-markdown`), after normalizing foreign
// reference footnotes, then serialize to HTML so the shared HTML
// pipeline below (processAttachments + formatImportHtml +
// processHTML) keeps handling `.md` and `.html` imports
// uniformly. The markdown PARSE no longer goes through the
// editor-ext markdown layer (issue #345) — the drift source is
// gone. The PM -> HTML -> PM hop that follows is lossless
// plumbing for attachment/link resolution, NOT a second parse.
content = jsonToHtml(
await markdownToProseMirror(normalizeForeignMarkdown(content)),
);
}
} catch (err: any) {
if (err?.code === 'ENOENT') {
@@ -500,10 +516,12 @@ export class FileImportTaskService {
this.importService.extractTitleAndRemoveHeading(pmState);
// Canonicalize footnote topology on this non-editor write path
// (markdownToHtml/processHTML never runs footnoteSyncPlugin), so a
// zip-imported page's footnotes are reference-ordered, deduped, and
// (the HTML pipeline's processHTML never runs footnoteSyncPlugin), so
// a zip-imported page's footnotes are reference-ordered, deduped, and
// orphan-free like the editor's invariant (issue #228). Pure +
// idempotent + shape-safe; a footnote-free doc is unchanged.
// idempotent + shape-safe; a footnote-free doc is unchanged. (For a
// `.md` file the package parser already yields canonical footnotes,
// so this is a no-op there.)
// (Future consolidation, architecture B: like import.service, this
// path persists directly rather than via PageService — a shared
// "prepare JSON for persist" helper would centralize this call.)
@@ -12,13 +12,19 @@ import { canonicalizeFootnotes } from '@docmost/editor-ext';
/**
* Integration-ish test for the USER-FACING markdown import path
* (`ImportService.importPage`). It exercises the REAL markdown -> HTML -> JSON
* conversion and asserts that the stored page content has its footnotes
* canonicalized the gap that issue #228 fixes: the import path builds
* ProseMirror JSON directly (never running the editor's footnoteSyncPlugin), so
* before this wiring the stored footnotes kept the markdown's physical
* definition order (out of order vs. references), retained orphan definitions,
* and did not collapse reused references.
* (`ImportService.importPage`). It exercises the REAL markdown -> ProseMirror
* conversion and asserts the stored page's footnotes are canonical: ordered by
* FIRST REFERENCE (not markdown definition order), reused references deduped to a
* single definition, and orphan definitions dropped.
*
* Since #345 the markdown parse runs through the canonical package
* (`normalizeForeignMarkdown` -> `markdownToProseMirror`), which owns this
* canonicalization: the input's GFM `[^id]` reference footnotes are normalized to
* inline `^[…]`, and the parser assigns fresh sequential ids (`fn-*`) in
* reference order while merging identical bodies so we assert by definition
* BODY order, not by the source labels. `canonicalizeFootnotes` remains wired as
* an idempotent safety net (issue #228) and is a no-op on this already-canonical
* output.
*
* The DB/ydoc side-effects are stubbed: `getNewPagePosition` (DB query) and
* `createYdoc` (Yjs encode) are spied, and `pageRepo.insertPage` captures the
@@ -67,24 +73,14 @@ function makeService() {
}
/** List the footnote-definition ids of the (single) footnotesList, in order. */
function footnoteListIds(content: any): string[] {
/** Definition body texts of the (single) footnotesList, in list order. */
function footnoteListBodies(content: any): string[] {
const list = (content.content ?? []).find(
(n: any) => n.type === 'footnotesList',
);
if (!list) return [];
return (list.content ?? [])
return (list?.content ?? [])
.filter((n: any) => n.type === 'footnoteDefinition')
.map((n: any) => n.attrs?.id);
}
function definitionText(content: any, id: string): string | undefined {
const list = (content.content ?? []).find(
(n: any) => n.type === 'footnotesList',
);
const def = (list?.content ?? []).find(
(n: any) => n.type === 'footnoteDefinition' && n.attrs?.id === id,
);
return def?.content?.[0]?.content?.[0]?.text;
.map((n: any) => n.content?.[0]?.content?.[0]?.text);
}
describe('ImportService.importPage — footnote canonicalization (#228)', () => {
@@ -101,23 +97,23 @@ describe('ImportService.importPage — footnote canonicalization (#228)', () =>
const content = getCaptured().content;
expect(content).toBeTruthy();
// Reference order is c, a, b (NOT the markdown definition order a, b, c).
expect(footnoteListIds(content)).toEqual(['c', 'a', 'b']);
// Definitions preserved and attached to the right ids.
expect(definitionText(content, 'c')).toBe('note C');
expect(definitionText(content, 'a')).toBe('note A');
expect(definitionText(content, 'b')).toBe('note B');
// Definitions ordered by FIRST REFERENCE (C, A, B) — NOT the markdown
// definition order (A, B, C) — with the orphan [^z] dropped and the reused
// [^a] collapsed to a single definition. (Ids are the parser's fresh `fn-*`,
// so we pin the BODIES.)
expect(footnoteListBodies(content)).toEqual(['note C', 'note A', 'note B']);
// Orphan definition [^z] is dropped.
expect(footnoteListIds(content)).not.toContain('z');
expect(footnoteListBodies(content)).not.toContain('orphan note');
// Reused [^a] yields exactly ONE definition, and exactly one list.
const lists = (content.content ?? []).filter(
(n: any) => n.type === 'footnotesList',
);
expect(lists).toHaveLength(1);
expect(footnoteListIds(content).filter((id) => id === 'a')).toHaveLength(1);
expect(
footnoteListBodies(content).filter((b) => b === 'note A'),
).toHaveLength(1);
});
it('is idempotent: canonicalizing the stored output again is a no-op', async () => {
@@ -134,6 +130,6 @@ describe('ImportService.importPage — footnote canonicalization (#228)', () =>
// time must not change it (safe to wire into every write path).
const second = canonicalizeFootnotes(stored);
expect(second).toEqual(stored);
expect(footnoteListIds(second)).toEqual(['c', 'a', 'b']);
expect(footnoteListBodies(second)).toEqual(['note C', 'note A', 'note B']);
});
});
@@ -17,7 +17,9 @@ import {
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
import { TiptapTransformer } from '@hocuspocus/transformer';
import * as Y from 'yjs';
import { markdownToHtml, canonicalizeFootnotes } from '@docmost/editor-ext';
import { canonicalizeFootnotes } from '@docmost/editor-ext';
import { markdownToProseMirror } from '@docmost/prosemirror-markdown';
import { normalizeForeignMarkdown } from '../utils/foreign-markdown';
import {
FileTaskStatus,
FileTaskType,
@@ -85,11 +87,13 @@ export class ImportService {
const extracted = this.extractTitleAndRemoveHeading(prosemirrorState);
const title = extracted.title;
// Imported markdown/HTML is built via markdownToHtml -> htmlToJson, which
// never runs the editor's footnoteSyncPlugin, so the footnote topology keeps
// the source's PHYSICAL definition order (out of order vs. references),
// retains orphan definitions, and is not deduped. Canonicalize before
// persisting so the stored page matches the editor's invariant (issue #228).
// The markdown path now canonicalizes footnotes itself (the package parser),
// but the HTML path (processHTML -> htmlToJson) does NOT run the editor's
// footnoteSyncPlugin, so an imported HTML doc can keep its source's PHYSICAL
// definition order (out of order vs. references), retain orphan definitions,
// and not be deduped. Canonicalize before persisting so the stored page
// matches the editor's invariant (issue #228); it is an idempotent no-op on
// the already-canonical markdown output.
// Pure + idempotent + shape-safe: a doc with no footnotes is unchanged.
// (Future consolidation, architecture B: this import path persists directly
// via pageRepo.insertPage rather than through PageService.createPage, so the
@@ -133,12 +137,15 @@ export class ImportService {
}
async processMarkdown(markdownInput: string): Promise<any> {
try {
const html = await markdownToHtml(markdownInput);
return this.processHTML(html);
} catch (err) {
throw err;
}
// Canonical markdown -> ProseMirror JSON directly via
// `@docmost/prosemirror-markdown` (issue #345) — no HTML intermediate and no
// second editor-ext markdown layer. Foreign markdown surfaces the strict
// canonical parser does not accept (GFM `[^id]` reference footnotes) are
// rewritten to the canonical inline form by `normalizeForeignMarkdown` first.
// The HTML-cleanup pass (`normalizeImportHtml`) is intentionally skipped here:
// it targets foreign *HTML* (Notion/XWiki), which only ever arrives on the
// `.html` path (`processHTML`), never as canonical markdown.
return markdownToProseMirror(normalizeForeignMarkdown(markdownInput));
}
async processHTML(htmlInput: string): Promise<any> {
@@ -0,0 +1,218 @@
import {
convertProseMirrorToMarkdown,
markdownToProseMirror,
} from '@docmost/prosemirror-markdown';
import { normalizeForeignMarkdown } from './foreign-markdown';
/**
* STEP 2 goldens for issue #345: the foreign-markdown normalizer that runs at the
* import boundary BEFORE the strict canonical parser (`markdownToProseMirror`).
*
* Two layers:
* 1. PURE stringstring cases pinning the normalizer's own behavior (GFM
* reference footnotes inline `^[…]`).
* 2. END-TO-END acceptance: for a foreign corpus, `normalizeForeignMarkdown`
* then `markdownToProseMirror` then `convertProseMirrorToMarkdown` must leave
* NO literal `[^id]` / `:::` garbage in the document and must re-export in the
* canonical forms.
*/
describe('normalizeForeignMarkdown — GFM reference footnotes', () => {
it('inlines a single-line reference footnote and drops its definition', () => {
const out = normalizeForeignMarkdown(
'A note[^1] here.\n\n[^1]: The definition.',
);
expect(out).toBe('A note^[The definition.] here.\n');
});
it('inlines every reference to a reused id (downstream dedups)', () => {
const out = normalizeForeignMarkdown(
'X[^a] and Y[^a].\n\n[^a]: shared.',
);
expect(out).toBe('X^[shared.] and Y^[shared.].\n');
});
it('joins indented continuation lines of a definition with a space', () => {
const out = normalizeForeignMarkdown(
'See[^n].\n\n[^n]: line one\n line two',
);
expect(out).toBe('See^[line one line two].\n');
});
it('never rewrites a reference inside a fenced code block', () => {
const out = normalizeForeignMarkdown(
'```\ncode[^1] here\n```\n\n[^1]: def.',
);
expect(out).toContain('code[^1] here');
// The (now orphaned) definition line is still removed.
expect(out).not.toContain('[^1]: def.');
});
it('never rewrites a reference inside an INLINE-code span (backticks)', () => {
// The `[^1]` inside backticks is literal code and must survive verbatim;
// the one outside is rewritten. (Bug #1: only fenced blocks were protected.)
const out = normalizeForeignMarkdown(
'Use `arr[^1]` in code but note[^1] in prose.\n\n[^1]: def.',
);
expect(out).toBe('Use `arr[^1]` in code but note^[def.] in prose.\n');
});
it('escapes brackets in a body so an unbalanced ] cannot truncate the footnote', () => {
// A foreign definition body with a stray `]` would, unescaped, close the
// canonical `^[...]` early and leak the tail as text (bug #2). The body's
// brackets are backslash-escaped so the footnote stays whole.
const out = normalizeForeignMarkdown(
'Ref[^1] here.\n\n[^1]: see item ] and [more] later',
);
expect(out).toBe('Ref^[see item \\] and \\[more\\] later] here.\n');
// The tokenizer must see exactly one unescaped closing bracket (our own).
expect(out.match(/(?<!\\)\]/g)).toHaveLength(1);
});
it('leaves a reference with no matching definition literal (no body to inline)', () => {
const out = normalizeForeignMarkdown('Dangling[^x] ref.');
expect(out).toBe('Dangling[^x] ref.');
});
it('returns the input unchanged when there are no reference footnotes', () => {
const md = '# Title\n\nJust text with `inline code` and a [link](/x).';
expect(normalizeForeignMarkdown(md)).toBe(md);
});
it('does NOT touch callout surfaces — the canonical parser handles them', () => {
const callouts = ':::info\nHi\n:::\n\n> [!warning]\n> Careful';
expect(normalizeForeignMarkdown(callouts)).toBe(callouts);
});
it('strips a leading YAML front-matter block (Obsidian/Hugo/git-sync files)', () => {
const out = normalizeForeignMarkdown(
'---\ntitle: My Page\ntags: [a, b]\n---\n\n# Heading\n\nBody.',
);
expect(out).toBe('# Heading\n\nBody.');
// The front-matter must not leak into the body as a setext heading.
expect(out).not.toContain('title: My Page');
expect(out).not.toContain('---');
});
it('does not strip a horizontal rule that is not leading front-matter', () => {
const md = 'Intro paragraph.\n\n---\n\nAfter the rule.';
expect(normalizeForeignMarkdown(md)).toBe(md);
});
it('is linear on a document with thousands of definitions (no quadratic blowup)', () => {
// F2(a): the pass-2 rewrite must be O(text), not O(text × defs). Build a
// pathological doc (many defs + many plain text lines) and assert it
// completes well under a second — a quadratic implementation took ~14s.
const N = 4000;
const refs = Array.from({ length: N }, (_, i) => `line ${i} plain text`).join('\n');
const defs = Array.from({ length: N }, (_, i) => `[^n${i}]: def ${i}`).join('\n');
const doc = `start[^n0] and[^n${N - 1}] end\n\n${refs}\n\n${defs}`;
const t0 = Date.now();
const out = normalizeForeignMarkdown(doc);
const elapsed = Date.now() - t0;
expect(elapsed).toBeLessThan(2000);
// Sanity: the two real references were still inlined.
expect(out).toContain('^[def 0]');
expect(out).toContain(`^[def ${N - 1}]`);
});
it('is bounded on a long unclosed backtick run (no inline-split ReDoS)', () => {
// F2(b): a huge unterminated backtick run must not cause quadratic
// backtracking in the inline-code split. Oversized lines skip the split
// entirely (left untouched), so this returns promptly.
const line = 'x' + '`'.repeat(200000);
const doc = `${line}\n\n[^1]: def`;
const t0 = Date.now();
normalizeForeignMarkdown(doc);
expect(Date.now() - t0).toBeLessThan(2000);
});
it('does not crash or slow down on thousands of prefix-chain definition ids', () => {
// F7: the rewrite must use a FIXED generic scanner, not an alternation built
// from the ids. A `(a|aa|aaa|…)` alternation over prefix-chain ids blows the
// V8 regex compiler (FATAL RegExpCompiler Allocation failed — uncatchable,
// kills the process). A fixed scanner has no id-dependent compilation cost.
const N = 4000;
const ids = Array.from({ length: N }, (_, i) => 'a'.repeat(i + 1));
const defs = ids.map((id) => `[^${id}]: body ${id.length}`).join('\n');
const doc = `ref[^${ids[0]}] and[^${ids[N - 1]}] end\n\n${defs}`;
const t0 = Date.now();
const out = normalizeForeignMarkdown(doc);
expect(Date.now() - t0).toBeLessThan(2000);
// Prefix disambiguation is correct: [^a] and [^aaaa...] inline their OWN body.
expect(out).toContain('^[body 1]');
expect(out).toContain(`^[body ${N}]`);
});
it('strips a CRLF (Windows) front-matter block, not just LF', () => {
// F9: the line-anchored regex needs LF after the opening `---`, so a Windows
// file (`---\r\n…`) would slip past the strip and leak the front-matter into
// the body. normalizeForeignMarkdown normalizes CRLF -> LF first.
const out = normalizeForeignMarkdown(
'---\r\ntitle: Foo\r\ntags: [a]\r\n---\r\n\r\n# Heading\r\n\r\nBody.',
);
expect(out).toBe('# Heading\n\nBody.');
expect(out).not.toContain('title: Foo');
expect(out).not.toContain('---');
});
it('strips front-matter whose value contains a triple-dash (line-anchored)', () => {
// F8: the block must close only on a `\n---` LINE, not the first inline
// `---`. A value like `title: Q1 --- Q2` must not truncate the front-matter
// and leak the rest (author/closing ---) into the body.
const out = normalizeForeignMarkdown(
'---\ntitle: Q1 --- Q2 results\nauthor: bob\n---\n\nReal body.',
);
expect(out).toBe('Real body.');
expect(out).not.toContain('author: bob');
expect(out).not.toContain('Q2 results');
});
});
describe('foreign markdown import acceptance (normalizer + canonical parser)', () => {
const FOREIGN = [
'# Doc',
'',
'Body refs [^c] and [^a] and [^b] and again [^a].',
'',
':::info',
'A legacy callout.',
':::',
'',
'| h1 | h2 |',
'| --- | --- |',
'| 1 | 2 |',
'',
'[^a]: note A',
'[^b]: note B',
'[^c]: note C',
'[^z]: orphan note',
].join('\n');
it('leaves no literal [^id] or ::: in the imported doc and re-exports canonically', async () => {
const normalized = normalizeForeignMarkdown(FOREIGN);
const doc = await markdownToProseMirror(normalized);
const reexport = convertProseMirrorToMarkdown(doc);
// No foreign garbage leaks into the document.
expect(reexport).not.toMatch(/\[\^/); // no reference footnote refs/defs
expect(reexport).not.toContain(':::'); // no legacy callout fences
// Canonical forms are present.
expect(reexport).toContain('^[note C]');
expect(reexport).toContain('> [!info]');
expect(reexport).toContain('| h1 | h2 |');
// Footnotes: ordered by first reference (C, A, B), reused [^a] deduped to one,
// orphan [^z] dropped (it had no reference after normalization).
const list = doc.content.find((n: any) => n.type === 'footnotesList');
const bodies = list.content.map(
(d: any) => d.content[0].content[0].text,
);
expect(bodies).toEqual(['note C', 'note A', 'note B']);
expect(bodies).not.toContain('orphan note');
expect(
doc.content.filter((n: any) => n.type === 'footnotesList'),
).toHaveLength(1);
});
});
@@ -0,0 +1,265 @@
/**
* Foreign-markdown normalizer an input-liberal / output-canonical adapter that
* runs at the IMPORT boundary, BEFORE the canonical parser
* (`markdownToProseMirror` from `@docmost/prosemirror-markdown`).
*
* The canonical parser is deliberately STRICT: it only understands Docmost's
* canonical markdown surface (Obsidian-style `> [!type]` callouts, Pandoc/Obsidian
* inline footnotes `^[body]`, lossless `![alt](src) <!--img {...}-->` images, ).
* Import, however, ingests FOREIGN files (GitHub/GFM, Notion, old Docmost
* exports). Those use surfaces the canonical parser does not accept, most notably
* GitHub-flavoured *reference* footnotes:
*
* Text with a note[^1] and another[^long].
*
* [^1]: The first definition.
* [^long]: A second one.
*
* Left untouched, the parser does NOT recognise `[^id]` (it only parses `^[body]`),
* so the reference leaks as literal text and worse, the trailing `[^id]: def`
* line is a valid CommonMark *link-reference definition*, so `[^id]` is silently
* rendered as a bogus link. This normalizer rewrites reference footnotes into the
* canonical inline form so the parser materialises real footnote nodes.
*
* This is a TEXT pre-pass, NOT a second parser fork: it does not re-implement any
* converter logic. Callout surfaces (`:::type` and `> [!type]`) are intentionally
* NOT touched here the canonical parser already accepts BOTH natively (its
* `preprocessCallouts` pass), so normalizing them would be redundant and would
* only risk degrading the parser's nesting/code-fence-aware handling.
*/
/** Matches a fenced code block delimiter (``` or ~~~), capturing the marker run. */
const CODE_FENCE_RE = /^(\s*)(`{3,}|~{3,})/;
/**
* Matches a GFM footnote DEFINITION line: `[^id]: body`. The id is any run of
* non-`]` characters; the body is the remainder of the line (possibly empty).
*/
const FOOTNOTE_DEF_RE = /^\[\^([^\]]+)\]:[ \t]?(.*)$/;
/** True when a line is a code-fence delimiter that toggles fenced-code state. */
function fenceMarker(line: string): string | null {
const m = line.match(CODE_FENCE_RE);
return m ? m[2] : null;
}
/** True when a line is indented (leading space/tab) and not blank — a continuation. */
function isIndentedContinuation(line: string): boolean {
return /^[ \t]+\S/.test(line);
}
function escapeRegExp(value: string): string {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
/**
* Backslash-escape any square bracket in a footnote body before it is wrapped in
* `^[...]`. The canonical inline-footnote tokenizer scans the body with bracket
* balancing and closes on the first UNMATCHED `]`, so an unbalanced bracket in a
* foreign definition (e.g. `[^1]: see item ] later`) would otherwise truncate the
* footnote and leak the tail as literal text. Escaping every `[`/`]` makes the
* body an inert run of characters the tokenizer then closes only on our own
* closing `]`. (A balanced `[link](url)` inside a body still round-trips because
* the escaped form renders the literal brackets, which is the safe reading for a
* footnote body; the alternative brittle balance tracking risks worse.)
*/
function escapeFootnoteBody(body: string): string {
return body.replace(/[[\]]/g, '\\$&');
}
/**
* Rewrite every `[^id]` reference on a line to its `^[body]` form, but ONLY in the
* text OUTSIDE inline-code spans. A `[^id]` inside backticks is literal code
* content and must be preserved verbatim (a footnote ref never lives inside code).
* We split the line on inline-code spans (paired backtick runs) and rewrite only
* the non-code segments.
*/
// Above this length a single line is not split into inline-code spans (see
// below). A genuine markdown line carrying a footnote reference is never tens of
// KB; the cap only bypasses the inline-code protection for pathological lines.
const INLINE_SPLIT_MAX_LINE = 8192;
function rewriteRefsOutsideInlineCode(
line: string,
replace: (text: string) => string,
): string {
// The inline-code split alternation `(`+)(?:(?!\1)[\s\S])*\1` backtracks
// quadratically on a long UNCLOSED backtick run (its middle can consume the
// rest of the line, then fail to find a closing run and retry from each
// position). On an untrusted import this is a request-thread ReDoS. A real
// footnote line is short, so for an oversized line we skip the inline-code
// protection entirely and leave the line UNTOUCHED (rewriting it wholesale
// could corrupt a `[^id]` that legitimately lives inside inline code). This is
// a conservative bypass: an over-8KB line simply does not get its reference
// footnotes inlined — acceptable for a pathological input.
if (line.length > INLINE_SPLIT_MAX_LINE) return line;
// Alternation: an inline-code span (one or more backticks, then anything up to
// the SAME run of backticks) OR a run of non-backtick text. Unterminated
// backticks fall through as ordinary text (matched by the second branch on the
// leftover), so a stray backtick never swallows the rest of the line.
const parts = line.match(/(`+)(?:(?!\1)[\s\S])*\1|[^`]+|`+/g);
if (!parts) return line;
return parts
.map((seg) => (seg.startsWith('`') ? seg : replace(seg)))
.join('');
}
/**
* Convert GFM reference footnotes (`[^id]` + `[^id]: def`) into canonical inline
* footnotes (`^[def]`).
*
* - Definitions are collected first (a leading `[^id]: text` line plus any
* immediately-following indented continuation lines, joined with a space) and
* removed from the output.
* - Each in-text reference `[^id]` for which a definition was found is replaced by
* `^[def]`. References with no matching definition are left literal (there is no
* body to inline; the parser fails them open the same way).
* - Code is respected on both passes: `[^id]` inside a fenced ``` / ~~~ block is
* never rewritten and a `[^id]:` line inside a fence is never a definition; and
* on the rewrite pass a `[^id]` inside an INLINE-code span (backticks) is left
* literal too.
* - The inlined body is bracket-escaped so an unbalanced `[`/`]` in a foreign
* definition cannot truncate the resulting `^[...]` footnote.
*
* Deduplication / reference-ordering / orphan-dropping of the resulting footnotes
* is handled downstream by the canonical parser (`assembleFootnotes`); this pass
* only changes the surface syntax.
*/
function convertReferenceFootnotes(markdown: string): string {
const lines = markdown.split('\n');
// Pass 1: collect definitions and mark their lines for removal.
const defs = new Map<string, string>();
const dropped = new Array<boolean>(lines.length).fill(false);
let inFence = false;
let fence = '';
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
const marker = fenceMarker(line);
if (inFence) {
if (marker && marker[0] === fence[0] && marker.length >= fence.length) {
inFence = false;
fence = '';
}
continue;
}
if (marker) {
inFence = true;
fence = marker;
continue;
}
const def = line.match(FOOTNOTE_DEF_RE);
if (!def) continue;
const id = def[1];
const body: string[] = [def[2].trim()];
dropped[i] = true;
// Consume immediately-following indented continuation lines (GFM lazy
// continuation is not supported by design — keep it simple and predictable).
let j = i + 1;
while (j < lines.length && isIndentedContinuation(lines[j])) {
body.push(lines[j].trim());
dropped[j] = true;
j++;
}
i = j - 1;
// Last definition wins for a duplicated id (matches CommonMark link-ref
// semantics closely enough for a foreign-input adapter).
defs.set(id, body.filter((s) => s.length > 0).join(' '));
}
if (defs.size === 0) {
return markdown;
}
// ONE fixed, generic scanner regex — NOT one built from the definition ids.
// It matches ANY `[^id]` shape, and the replacer decides per match via a map
// lookup whether that id is a real definition (replace) or not (leave as-is).
// This is genuinely O(total text) with no per-document regex compilation.
//
// Do NOT rebuild this as an alternation over `[...defs.keys()]`: a giant
// `(id1|id2|...)` alternation over thousands of ids can blow the V8 regex
// compiler's stack — a fatal, UNCATCHABLE "RegExpCompiler Allocation failed"
// on prefix-chain ids (`a`, `aa`, `aaa`, ...) that kills the whole process
// (worse than the earlier per-def thread-hang). A fixed scanner has no
// id-dependent compilation cost and cannot blow up.
const refRe = /\[\^([^\]]+)\]/g;
const rewriteSegment = (segment: string): string =>
segment.replace(refRe, (whole, id: string) => {
const body = defs.get(id);
// Only real definitions are inlined; an unknown id is left literal (same as
// the old per-def loop, which simply never matched it).
return body === undefined ? whole : `^[${escapeFootnoteBody(body)}]`;
});
// Pass 2: rewrite in-text references, skipping fenced code and dropped lines.
const out: string[] = [];
inFence = false;
fence = '';
for (let i = 0; i < lines.length; i++) {
if (dropped[i]) continue;
let line = lines[i];
const marker = fenceMarker(line);
if (inFence) {
out.push(line);
if (marker && marker[0] === fence[0] && marker.length >= fence.length) {
inFence = false;
fence = '';
}
continue;
}
if (marker) {
inFence = true;
fence = marker;
out.push(line);
continue;
}
line = rewriteRefsOutsideInlineCode(line, rewriteSegment);
out.push(line);
}
return out.join('\n');
}
/**
* Strip a single leading YAML front-matter block (`---\n…\n---`). Foreign files
* from Obsidian / Hugo / Jekyll / Notion and Docmost's OWN git-sync page files
* open with front-matter that the canonical parser does not consume, so
* without this it leaks into the body (and `title: Foo` above the closing `---`
* renders as a setext `<h2>` that `extractTitleAndRemoveHeading` can hijack as
* the page title). It is a no-op for front-matter-free input.
*
* LINE-ANCHORED (the same shape the canonical parser uses in
* prosemirror-markdown/page-file.ts): the block opens only on `---\n` at the
* very start and closes only on a `\n---` line. The retired `markdownToHtml`
* strip closed on the FIRST `---` ANYWHERE (an unanchored close), so a value
* containing a triple-dash (e.g. `title: Q1 --- Q2`) truncated the front-matter
* and leaked the rest into the body. An optional leading BOM is tolerated.
*/
const YAML_FRONT_MATTER_RE = /^\uFEFF?---\n[\s\S]*?\n---\n?/;
/**
* Normalize a foreign markdown string into Docmost's canonical markdown surface
* so the strict canonical parser accepts it losslessly: normalize line endings,
* strip a leading YAML front-matter block, then rewrite GFM reference footnotes
* into inline footnotes. Add further fixture-driven foreign-surface cases here as
* they are found.
*/
export function normalizeForeignMarkdown(markdown: string): string {
if (!markdown) return markdown;
// Normalize CRLF -> LF FIRST. The line-anchored front-matter regex requires a
// bare `\n` after the opening `---`, and convertReferenceFootnotes splits on
// `\n`; a Windows/CRLF foreign file (`---\r\n…`) would otherwise slip past the
// front-matter strip and leak into the body. The canonical parser
// (page-file.ts parsePageFile) normalizes the same way before its FRONTMATTER_RE.
const src = markdown.replace(/\r\n/g, '\n');
const withoutFrontMatter = src.replace(YAML_FRONT_MATTER_RE, '').trimStart();
return convertReferenceFootnotes(withoutFrontMatter);
}
@@ -20,10 +20,6 @@ export interface IStripeSeatsSyncJob {
export interface IPageHistoryJob {
pageId: string;
// #370 — intentionality tier the worker stamps on the snapshot. All jobs on
// this queue are trailing idle-flush autosnapshots, so this is 'idle' (absent
// → treated as 'idle' by the processor).
kind?: 'idle';
}
/**
+23
View File
@@ -0,0 +1,23 @@
// Jest stub for @tiptap/react.
//
// The server export/import code paths transitively import editor-ext, whose node
// extensions import from `@tiptap/react`. The real module re-exports all of
// `@tiptap/core` (headless, safe under node) AND adds React view helpers
// (`ReactNodeViewRenderer`, …) that eagerly pull in react-dom — which throws
// `navigator is not defined` under jest's node environment.
//
// So this stub DELEGATES to the real `@tiptap/core` (keeping `mergeAttributes`,
// `Node`, `Mark`, `nodeInputRule`, … working — they are used by
// `jsonToHtml`/`htmlToJson` on the server) and overrides ONLY the React view
// helpers with no-ops. Those helpers are referenced solely inside `addNodeView()`
// — code that runs only in a live browser editor, never on the server; if any
// were actually invoked here it would (correctly) surface as a test failure.
const core = require('@tiptap/core');
module.exports = {
...core,
ReactNodeViewRenderer: () => () => ({}),
NodeViewWrapper: () => null,
NodeViewContent: () => null,
ReactRenderer: class {},
};
+3
View File
@@ -543,6 +543,9 @@ importers:
'@docmost/pdf-inspector':
specifier: 1.9.6
version: 1.9.6
'@docmost/prosemirror-markdown':
specifier: workspace:*
version: link:../../packages/prosemirror-markdown
'@fastify/cookie':
specifier: ^11.0.2
version: 11.0.2