Compare commits

..

3 Commits

Author SHA1 Message Date
agent_coder c1b2210a4e fix(#370): thread trx into addPageWatchers (F7 self-deadlock) + restore contributors on commit-failure (F8) + assert the lock (F9) (review round 2)
The round-1 F3 fix (wrapping the processor's find+save in a locked tx) itself
introduced two regressions:

F7 [CRITICAL] addPageWatchers ran WITHOUT trx inside the tx holding FOR UPDATE on
pages[pageId]. The watcher insert's FK check takes FOR KEY SHARE on the same row,
but on a DIFFERENT pool connection — a true self-deadlock (our tx connection sits
idle-in-transaction awaiting the JS await, the insert connection blocks on the
lock). Now passes trx (addPageWatchers already accepts it and routes it through
insertMany), so the FK lock is taken on the connection that already holds FOR
UPDATE — no self-conflict.

F8 [WARNING] popContributors is a destructive Redis SPOP; the inner catch only
restores on a throw INSIDE the callback. A COMMIT failure throws OUTSIDE it,
rolling the snapshot back while the pop is gone → a retry writes an unattributed
version. Now tracks the popped set and restores it in an outer catch (idempotent
SADD), leaving BullMQ to retry with attribution intact.

F9 [WARNING] The spec asserted saveHistory args with a loosened objectContaining
that stopped verifying trx, and never pinned withLock/trx on findById or the trx
on addPageWatchers — which is exactly why F7 slipped. Restored the exact
saveHistory(trx) assertion and added findById({withLock,trx}) + addPageWatchers
trx assertions (the latter would have caught F7), plus a commit-failure test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 05:50:19 +03:00
agent_coder b1e5193b37 fix(#370): ES2021-safe spec, source-specific idle ceiling, processor lock, tested index mapping (review round 1)
F1 [BLOCKER] persistence-store.spec used Array.prototype.at(-1) (ES2022) but the
server targets ES2021, so server tsc failed (TS2550) and ts-jest could not
compile the suite — 22 core manual-save/idle/boundary tests silently did not run
in CI. Replaced with [length - 1] index access.

F2 [WARNING] The idle burst-reset used a hardcoded IDLE_MAX_WAIT_USER for both
tiers, but computeHistoryJob's ceiling is source-specific. On a continuously
agent-edited page the burst marker stayed stale for 5..10m, forcing delay=0 on
every store and writing one idle row per store — the exact per-store bloat the
debounce prevents. The reset now uses the same source-specific max-wait.

F3 [WARNING] The processor did an unlocked findPageLastHistory -> saveHistory,
which TOCTOU-races a concurrent manual-save (that runs under a page-row lock),
producing two page_history rows with identical content (one idle, one manual) and
defeating promote-not-dup. The snapshot decision is now wrapped in executeTx with
the same page-row lock, so the second writer observes the first's committed row
and the isDeepStrictEqual gate collapses the duplicate.

F4 [WARNING] The risky client filtered-index -> full-list mapping had no tests.
Extracted it to a pure resolvePrevSnapshotId(fullItems, id) helper (diff/restore
baseline against the true previous snapshot in the FULL list, never the previous
visible version) and unit-tested it; removed the now-vestigial index threading.

F5/F6 [low] Renamed the misleading ceiling test + fixed its comment; added a
CHANGELOG entry for the user-facing versioning feature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 05:31:26 +03:00
agent_coder 1542c99979 feat(#370): page-history intentionality tiers — kind column + intentional/idle/boundary triggers (PR-1 core)
PR-1 'core' of #370: introduces page_history.kind ('manual'|'agent'|'idle'|
'boundary'; legacy null = autosave) and rebuilds the snapshot triggers around a
three-tier intentionality model. Draft durability (pages/ydoc hocuspocus
autosave) is unchanged; only the frequency and labelling of history points change.

- Migration 20260705T120000: page_history.kind nullable varchar(20), no default.
- Manual Save: one stateless 'save-version' path for human AND agent; kind is
  derived SERVER-SIDE from the signed context.actor (never the payload), readOnly
  connections rejected, the fresh ydoc runs through the existing store path (no
  REST race), then broadcasts version.saved.
- Idle-flush: trailing debounce (one BullMQ job per page, remove-then-readd) with
  IDLE_INTERVAL_USER=60m / AGENT=15m AND a max-wait ceiling
  (IDLE_MAX_WAIT_USER=10m / AGENT=5m) so a continuous editing session can't starve
  the autosnapshot (review round-1 WARNING).
- Boundary: generalized from the user→agent special-case to ANY lastUpdatedSource
  transition (user↔agent↔git), same isDeepStrictEqual gate — covers git-sync free.
- Removed the agent delay=0 fast path and the old HISTORY_FAST_* constants; the
  agent joins the common idle pipeline.
- Promote-not-dup: a manual save on unchanged content promotes the latest
  autosave's kind in place (or no-ops if already manual) instead of duplicating a
  heavy content row.
- Client: mod+S hotkey + menu button (hidden when readOnly), history-panel kind
  badges, dimmed autosaves, a 'versions only' filter (indices map to the full
  list so diff/restore still target the true previous snapshot), live refresh on
  version.saved.

Internal review: APPROVE-with-suggestions; the round-1 WARNING (idle starvation)
is fixed here via the max-wait ceiling, and the generalized-boundary + ceiling
behaviours are pinned with new tests (115 collab/repo specs green, server tsc 0).

Deferred to later PRs: shares.published_mode (PR-2), the save_page_version MCP
tool + role prompts (PR-3), actor='git' wiring into #359 (PR-4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 05:00:12 +03:00
36 changed files with 1187 additions and 1081 deletions
+2 -2
View File
@@ -201,7 +201,7 @@ pnpm workspace (`pnpm@10.4.0`) orchestrated by **Nx**. Four workspace packages:
| `apps/client` | `client` | React 18 + Vite + Mantine 8 + TanStack Query + Jotai | SPA frontend |
| `packages/editor-ext` | `@docmost/editor-ext` | Tiptap/ProseMirror | Shared Tiptap node/mark extensions, imported by both the client and the server |
| `packages/mcp` | `@docmost/mcp` | MCP SDK, Tiptap, Yjs | Standalone MCP server, also bundled into the server at `/mcp`. Consumes the shared converter/schema from `@docmost/prosemirror-markdown` (#293) — it no longer carries its own vendored converter/schema copy |
| `packages/prosemirror-markdown` | `@docmost/prosemirror-markdown` | Tiptap, marked, jsdom | The single, canonical ProseMirror↔Markdown converter + Docmost schema mirror (#293). Consumed by `mcp`, `git-sync`, AND `apps/server` (server-side markdown import/export, #345); there is exactly ONE copy of the converter now |
| `packages/prosemirror-markdown` | `@docmost/prosemirror-markdown` | Tiptap, marked, jsdom | The single, canonical ProseMirror↔Markdown converter + Docmost schema mirror (#293). Consumed by `mcp` and `git-sync`; there is exactly ONE copy of the converter now |
`build` targets are Nx-cached and dependency-ordered (`dependsOn: ["^build"]`), so `editor-ext` builds before the apps. `nx.json` sets `affected.defaultBase: main`.
@@ -284,7 +284,7 @@ The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes
### Client structure
Vite SPA. Code is organized by feature under `apps/client/src/features/*` (mirrors the server domains: `page`, `space`, `comment`, `ai-chat`, `editor`, …). Conventions:
- **TanStack Query** for server state (one `queries/` file per feature), **Jotai** atoms for local/shared UI state, **Mantine 8** + CSS modules (`*.module.css`) + `postcss-preset-mantine` for UI.
- The editor is Tiptap; shared node/mark extensions live in `packages/editor-ext` and are imported by **both the client and the server** (collaboration, schema, `canonicalizeFootnotes`) — editor schema changes often need to be made in `editor-ext`, not just the client. Server-side markdown import/export no longer lives in `editor-ext`: it goes through the canonical converter (#345, see below). The ProseMirror↔Markdown converter and its Docmost schema mirror now live in a SINGLE package, `@docmost/prosemirror-markdown` (#293), consumed by `mcp`, `git-sync`, and `apps/server` (#345) — do NOT reintroduce a per-package copy. `editor-ext` is the upstream source of the Tiptap schema; the package's `docmost-schema.ts` mirrors it and a serializer-contract test (`packages/prosemirror-markdown/test/serializer-contract.test.ts`) guards the boundary (every schema node must have a converter case), so a drift surfaces as a failing test rather than silent divergence.
- The editor is Tiptap; shared node/mark extensions live in `packages/editor-ext` and are imported by **both the client and the server** (collaboration, import/export) — editor schema changes often need to be made in `editor-ext`, not just the client. The ProseMirror↔Markdown converter and its Docmost schema mirror now live in a SINGLE package, `@docmost/prosemirror-markdown` (#293), consumed by both `mcp` and `git-sync` — do NOT reintroduce a per-package copy. `editor-ext` is the upstream source of the Tiptap schema; the package's `docmost-schema.ts` mirrors it and a serializer-contract test (`packages/prosemirror-markdown/test/serializer-contract.test.ts`) guards the boundary (every schema node must have a converter case), so a drift surfaces as a failing test rather than silent divergence.
- API access goes through `apps/client/src/lib/api-client.ts` (axios). The `@` alias maps to `apps/client/src`.
- Runtime config is injected at build time by `vite.config.ts` via `define` (`APP_URL`, `COLLAB_URL`, `APP_VERSION`, …) — these come from the root `.env`, not from `import.meta.env`.
+8
View File
@@ -12,6 +12,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **Save intentional page versions.** Press `Cmd/Ctrl+S` (or use the page menu)
to save a named version of a page. The history panel now distinguishes
intentional versions (a "Saved" / "Agent version" badge) from automatic
snapshots, dims autosaves, and offers an "Only versions" filter. Automatic
snapshots switched from a fixed interval to a trailing idle-flush with a
max-wait ceiling, and a boundary snapshot is pinned whenever the editing source
changes (e.g. a person's edits followed by the AI agent). (#370)
- **Place several images side by side in a row.** A new "Inline (side by
side)" alignment mode in the image bubble menu renders consecutive inline
images as a row that wraps onto the next line on narrow screens. The row is
@@ -1385,5 +1385,14 @@
"The commented text changed since this suggestion was made; it was not applied.": "The commented text changed since this suggestion was made; it was not applied.",
"Dismiss": "Dismiss",
"Suggestion dismissed": "Suggestion dismissed",
"Failed to dismiss suggestion": "Failed to dismiss suggestion"
"Failed to dismiss suggestion": "Failed to dismiss suggestion",
"Save version": "Save version",
"Ctrl+S": "Ctrl+S",
"Version saved": "Version saved",
"Already saved as the latest version": "Already saved as the latest version",
"Agent version": "Agent version",
"Boundary": "Boundary",
"Autosave": "Autosave",
"Only versions": "Only versions",
"No saved versions yet.": "No saved versions yet."
}
@@ -1248,5 +1248,14 @@
"The commented text changed since this suggestion was made; it was not applied.": "Прокомментированный текст изменился после создания предложения; оно не было применено.",
"Dismiss": "Не применять",
"Suggestion dismissed": "Предложение отклонено",
"Failed to dismiss suggestion": "Не удалось отклонить предложение"
"Failed to dismiss suggestion": "Не удалось отклонить предложение",
"Save version": "Сохранить версию",
"Ctrl+S": "Ctrl+S",
"Version saved": "Версия сохранена",
"Already saved as the latest version": "Уже сохранено как последняя версия",
"Agent version": "Версия агента",
"Boundary": "Граница",
"Autosave": "Автосейв",
"Only versions": "Только версии",
"No saved versions yet.": "Пока нет сохранённых версий."
}
@@ -1,10 +1,19 @@
import { atom } from "jotai";
import { Editor } from "@tiptap/core";
import type { HocuspocusProvider } from "@hocuspocus/provider";
import { PageEditMode } from "@/features/user/types/user.types.ts";
import type { DictationUnavailableReason } from "@/features/dictation/dictation-status";
export const pageEditorAtom = atom<Editor | null>(null);
// #370 — the active page's collab provider, published by the page editor so the
// header menu can emit the "save-version" stateless signal (Cmd+S / button).
// Null when the page is read-only / collab isn't connected. A typed initial
// value (rather than an explicit generic) keeps jotai's overload resolution on
// the writable PrimitiveAtom branch.
const initialCollabProvider: HocuspocusProvider | null = null;
export const collabProviderAtom = atom(initialCollabProvider);
export const titleEditorAtom = atom<Editor | null>(null);
export const readOnlyEditorAtom = atom<Editor | null>(null);
@@ -31,11 +31,18 @@ import { useAtom, useAtomValue, useSetAtom } from "jotai";
import useCollaborationUrl from "@/features/editor/hooks/use-collaboration-url";
import { currentUserAtom } from "@/features/user/atoms/current-user-atom";
import {
collabProviderAtom,
currentPageEditModeAtom,
dictationAvailabilityAtom,
pageEditorAtom,
yjsConnectionStatusAtom,
} from "@/features/editor/atoms/editor-atoms";
import { notifications } from "@mantine/notifications";
import {
VERSION_SAVED_MESSAGE_TYPE,
type VersionSavedMessage,
saveVersionPending,
} from "@/features/page-history/version-messages";
import { asideStateAtom } from "@/components/layouts/global/hooks/atoms/sidebar-atom";
import {
activeCommentIdAtom,
@@ -123,6 +130,7 @@ export default function PageEditor({
const [currentUser] = useAtom(currentUserAtom);
const [, setEditor] = useAtom(pageEditorAtom);
const setCollabProvider = useSetAtom(collabProviderAtom);
const [, setAsideState] = useAtom(asideStateAtom);
const [, setActiveCommentId] = useAtom(activeCommentIdAtom);
const [showCommentPopup, setShowCommentPopup] = useAtom(showCommentPopupAtom);
@@ -180,6 +188,24 @@ export default function PageEditor({
const onStatelessHandler = ({ payload }: onStatelessParameters) => {
try {
const message = JSON.parse(payload);
// #370 — a version was saved somewhere; live-refresh the history panel
// on every client. Only the client that pressed Save (tracked by the
// module-level flag) shows the confirmation toast.
if (message?.type === VERSION_SAVED_MESSAGE_TYPE) {
const versionMsg = message as VersionSavedMessage;
queryClient.invalidateQueries({
queryKey: ["page-history-list"],
});
if (saveVersionPending.current) {
saveVersionPending.current = false;
notifications.show({
message: versionMsg.alreadySaved
? t("Already saved as the latest version")
: t("Version saved"),
});
}
return;
}
if (message?.type !== "page.updated" || !message.updatedAt) return;
const pageData = queryClient.getQueryData<IPage>(["pages", slugId]);
if (pageData) {
@@ -237,12 +263,16 @@ export default function PageEditor({
local.on("synced", onLocalSyncedHandler);
providersRef.current = { socket, local, remote };
// #370 — publish the provider so the header menu can emit save-version.
setCollabProvider(remote);
setProvidersReady(true);
} else {
setCollabProvider(providersRef.current.remote);
setProvidersReady(true);
}
// Only destroy on final unmount
return () => {
setCollabProvider(null);
providersRef.current?.socket.destroy();
providersRef.current?.remote.destroy();
providersRef.current?.local.destroy();
@@ -1,4 +1,11 @@
import { Text, Group, UnstyledButton, Avatar, Tooltip } from "@mantine/core";
import {
Text,
Group,
UnstyledButton,
Avatar,
Tooltip,
Badge,
} from "@mantine/core";
import { CustomAvatar } from "@/components/ui/custom-avatar.tsx";
import { AgentAvatarStack } from "@/components/ui/agent-avatar-stack.tsx";
import { formattedDate } from "@/lib/time";
@@ -7,36 +14,59 @@ import clsx from "clsx";
import { IPageHistory } from "@/features/page-history/types/page.types";
import { memo, useCallback } from "react";
import { useSetAtom } from "jotai";
import { useTranslation } from "react-i18next";
import { historyAtoms } from "@/features/page-history/atoms/history-atoms.ts";
const MAX_VISIBLE_AVATARS = 5;
/**
* #370 — map a snapshot's intentionality tier to its badge. `version: true`
* marks the intentional points (manual / agent); autosaves (boundary / idle /
* legacy null) are non-versions and get dimmed in the list.
*/
type HistoryKindMeta = { labelKey: string; color: string; version: boolean };
export function historyKindMeta(kind?: string | null): HistoryKindMeta {
switch (kind) {
case "manual":
return { labelKey: "Saved", color: "blue", version: true };
case "agent":
return { labelKey: "Agent version", color: "violet", version: true };
case "boundary":
return { labelKey: "Boundary", color: "gray", version: false };
default: // "idle" | null | undefined (legacy autosave)
return { labelKey: "Autosave", color: "gray", version: false };
}
}
interface HistoryItemProps {
historyItem: IPageHistory;
index: number;
onSelect: (id: string, index: number) => void;
onHover?: (id: string, index: number) => void;
// The previous snapshot for diff/restore is resolved by id from the FULL list
// in the parent (resolvePrevSnapshotId), so the item only needs to report its
// own id — never a list index (which would be the filtered-view index).
onSelect: (id: string) => void;
onHover?: (id: string) => void;
onHoverEnd?: () => void;
isActive: boolean;
}
const HistoryItem = memo(function HistoryItem({
historyItem,
index,
onSelect,
onHover,
onHoverEnd,
isActive,
}: HistoryItemProps) {
const setHistoryModalOpen = useSetAtom(historyAtoms);
const { t } = useTranslation();
const kindMeta = historyKindMeta(historyItem.kind);
const handleClick = useCallback(() => {
onSelect(historyItem.id, index);
}, [onSelect, historyItem.id, index]);
onSelect(historyItem.id);
}, [onSelect, historyItem.id]);
const handleMouseEnter = useCallback(() => {
onHover?.(historyItem.id, index);
}, [onHover, historyItem.id, index]);
onHover?.(historyItem.id);
}, [onHover, historyItem.id]);
const contributors = historyItem.contributors;
const hasContributors = contributors && contributors.length > 0;
@@ -49,8 +79,20 @@ const HistoryItem = memo(function HistoryItem({
onMouseEnter={handleMouseEnter}
onMouseLeave={onHoverEnd}
className={clsx(classes.history, { [classes.active]: isActive })}
// #370 — dim autosnapshots so intentional versions stand out.
style={{ opacity: kindMeta.version ? 1 : 0.55 }}
>
<Text size="sm">{formattedDate(new Date(historyItem.createdAt))}</Text>
<Group gap={6} wrap="nowrap" justify="space-between">
<Text size="sm">{formattedDate(new Date(historyItem.createdAt))}</Text>
<Badge
size="xs"
radius="sm"
variant={kindMeta.version ? "filled" : "light"}
color={kindMeta.color}
>
{t(kindMeta.labelKey)}
</Badge>
</Group>
<Group gap={6} wrap="nowrap" mt={4}>
{hasContributors ? (
@@ -9,7 +9,7 @@ import {
historyAtoms,
} from "@/features/page-history/atoms/history-atoms";
import { useAtom, useSetAtom } from "jotai";
import { useCallback, useEffect, useMemo, useRef } from "react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import {
Button,
ScrollArea,
@@ -17,9 +17,12 @@ import {
Divider,
Loader,
Center,
Switch,
Text,
} from "@mantine/core";
import { useTranslation } from "react-i18next";
import { useHistoryRestore } from "@/features/page-history/hooks";
import { resolvePrevSnapshotId } from "@/features/page-history/utils/resolve-prev-snapshot";
const PREFETCH_DELAY_MS = 150;
@@ -47,6 +50,23 @@ function HistoryList({ pageId }: Props) {
[pageHistoryData],
);
// #370 — "only versions" filter: hide autosnapshots (idle/boundary/legacy
// null), keep only intentional points (manual/agent). Filtering is over the
// already-loaded pages; the diff/restore still targets the true previous
// snapshot, so items carry their index within the FULL list.
const [onlyVersions, setOnlyVersions] = useState(false);
const isVersion = useCallback(
(kind?: string | null) => kind === "manual" || kind === "agent",
[],
);
const visibleItems = useMemo(
() =>
onlyVersions
? historyItems.filter((item) => isVersion(item.kind))
: historyItems,
[historyItems, onlyVersions, isVersion],
);
const loadMoreRef = useRef<HTMLDivElement>(null);
const prefetchTimeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);
@@ -60,11 +80,13 @@ function HistoryList({ pageId }: Props) {
}, []);
const handleHover = useCallback(
(historyId: string, index: number) => {
(historyId: string) => {
clearPrefetchTimeout();
prefetchTimeoutRef.current = setTimeout(() => {
prefetchPageHistory(historyId);
const prevId = historyItems[index + 1]?.id;
// The true previous snapshot in the FULL list (not the previous visible
// one under the "only versions" filter).
const prevId = resolvePrevSnapshotId(historyItems, historyId);
if (prevId) {
prefetchPageHistory(prevId);
}
@@ -78,9 +100,11 @@ function HistoryList({ pageId }: Props) {
}, [clearPrefetchTimeout]);
const handleSelect = useCallback(
(id: string, index: number) => {
(id: string) => {
setActiveHistoryId(id);
setActiveHistoryPrevId(historyItems[index + 1]?.id ?? "");
// Baseline = true previous snapshot in the FULL list, so the "only
// versions" filter never diffs/restores against the wrong item.
setActiveHistoryPrevId(resolvePrevSnapshotId(historyItems, id));
},
[historyItems, setActiveHistoryId, setActiveHistoryPrevId],
);
@@ -128,12 +152,27 @@ function HistoryList({ pageId }: Props) {
return (
<div>
<Group px="xs" py={6} justify="flex-end">
<Switch
size="xs"
checked={onlyVersions}
onChange={(e) => setOnlyVersions(e.currentTarget.checked)}
label={t("Only versions")}
/>
</Group>
<ScrollArea h={620} w="100%" type="scroll" scrollbarSize={5}>
{historyItems.map((historyItem, index) => (
{onlyVersions && visibleItems.length === 0 && (
<Center py="md">
<Text size="sm" c="dimmed">
{t("No saved versions yet.")}
</Text>
</Center>
)}
{visibleItems.map((historyItem) => (
<HistoryItem
key={historyItem.id}
historyItem={historyItem}
index={index}
onSelect={handleSelect}
onHover={handleHover}
onHoverEnd={clearPrefetchTimeout}
@@ -24,6 +24,10 @@ export interface IPageHistory {
updatedAt: string;
lastUpdatedBy: IPageHistoryUser;
contributors?: IPageHistoryUser[];
// #370 — intentionality tier: 'manual'/'agent' are versions (intentional
// points), 'idle'/'boundary' are autosnapshots; null/undefined = legacy
// autosave. Derived server-side, drives the history badge + "versions" filter.
kind?: "manual" | "agent" | "idle" | "boundary" | null;
// Provenance markers copied off the page row when the snapshot was saved.
// `'agent'` marks a version written by the AI agent; `lastUpdatedAiChatId`
// (when present) deep-links to the chat that produced the edit.
@@ -0,0 +1,42 @@
import { describe, it, expect } from "vitest";
import { resolvePrevSnapshotId } from "./resolve-prev-snapshot";
// #370 F4 — the risky client path: with the "only versions" filter active, diff
// and restore must still baseline against the TRUE previous snapshot in the FULL
// list, never the previous VISIBLE version (which would skip the autosnapshots
// between two versions). These pin that the resolution is by FULL-list order.
describe("resolvePrevSnapshotId", () => {
// Newest-first, as the history list stores it: a version, then two autosaves,
// then an older version.
const full = [
{ id: "v2", kind: "manual" },
{ id: "a2", kind: "idle" },
{ id: "a1", kind: "boundary" },
{ id: "v1", kind: "manual" },
{ id: "a0", kind: null },
];
it("returns the immediate FULL-list successor, not the previous visible version", () => {
// Selecting v2 while filtered to versions-only must baseline against a2 (the
// real chronological predecessor), NOT v1 (the previous visible version).
expect(resolvePrevSnapshotId(full, "v2")).toBe("a2");
});
it("resolves an autosnapshot's predecessor by full-list order", () => {
expect(resolvePrevSnapshotId(full, "a1")).toBe("v1");
});
it("returns '' for the oldest item (no predecessor)", () => {
expect(resolvePrevSnapshotId(full, "a0")).toBe("");
});
it("returns '' for an id not in the list", () => {
expect(resolvePrevSnapshotId(full, "missing")).toBe("");
});
it("does not depend on a filtered subset — same result whatever is visible", () => {
// The helper only ever sees the full list; a filtered view cannot change the
// baseline it computes.
expect(resolvePrevSnapshotId(full, "v1")).toBe("a0");
});
});
@@ -0,0 +1,22 @@
/**
* #370 — resolve the TRUE previous snapshot for a history item.
*
* The history panel can be filtered to "only versions" (manual/agent), but diff
* and restore must always compare against the immediately-preceding snapshot in
* the FULL, unfiltered list — NOT the previous VISIBLE item. Comparing against
* the previous visible version would silently skip the autosnapshots between two
* versions and diff/restore the wrong baseline.
*
* Given the full (newest-first) list and an item id, this returns the id of the
* item right after it in the full list (its chronological predecessor), or "" if
* it is the oldest / not found. Pure and list-order-preserving so it can be unit
* tested without mounting the component.
*/
export function resolvePrevSnapshotId(
fullItems: ReadonlyArray<{ id: string }>,
id: string,
): string {
const index = fullItems.findIndex((item) => item.id === id);
if (index === -1) return "";
return fullItems[index + 1]?.id ?? "";
}
@@ -0,0 +1,28 @@
/**
* #370 — page-version stateless wire formats. Kept in one place so the client
* emitter (Save hotkey / button) and the client listener (page-editor) agree
* with the server (PersistenceExtension) on the message shapes.
*/
/** Client → server: "save a version now". The server derives the tier
* (manual/agent) from the signed connection actor, never from this payload. */
export const SAVE_VERSION_MESSAGE_TYPE = "save-version";
/** Server → all clients: a version was saved (or promoted / already existed). */
export const VERSION_SAVED_MESSAGE_TYPE = "version.saved";
export interface VersionSavedMessage {
type: typeof VERSION_SAVED_MESSAGE_TYPE;
historyId: string;
kind: "manual" | "agent";
/** True when the latest snapshot was already a manual version (a no-op save). */
alreadySaved: boolean;
}
/**
* Cross-component coordination flag so only the client that pressed Save shows
* the confirmation toast, while every other client silently refreshes its
* history panel on the broadcast. A module-level ref avoids stale-closure
* pitfalls in the editor's long-lived stateless handler.
*/
export const saveVersionPending = { current: false };
@@ -3,6 +3,7 @@ import {
IconArrowRight,
IconArrowsHorizontal,
IconClockHour4,
IconDeviceFloppy,
IconDots,
IconEye,
IconEyeOff,
@@ -17,7 +18,7 @@ import {
IconTrash,
IconWifiOff,
} from "@tabler/icons-react";
import React, { useEffect, useRef, useState } from "react";
import React, { useCallback, useEffect, useRef, useState } from "react";
import { useAsideTriggerProps } from "@/hooks/use-toggle-aside.tsx";
import { useAtom, useAtomValue } from "jotai";
import { historyAtoms } from "@/features/page-history/atoms/history-atoms.ts";
@@ -39,9 +40,14 @@ import { Trans, useTranslation } from "react-i18next";
import ExportModal from "@/components/common/export-modal";
import { htmlToMarkdown } from "@docmost/editor-ext";
import {
collabProviderAtom,
pageEditorAtom,
yjsConnectionStatusAtom,
} from "@/features/editor/atoms/editor-atoms.ts";
import {
SAVE_VERSION_MESSAGE_TYPE,
saveVersionPending,
} from "@/features/page-history/version-messages.ts";
import { formattedDate } from "@/lib/time.ts";
import { PageEditModeToggle } from "@/features/user/components/page-state-pref.tsx";
import MovePageModal from "@/features/page/components/move-page-modal.tsx";
@@ -72,9 +78,34 @@ export default function PageHeaderMenu({ readOnly }: PageHeaderMenuProps) {
});
const isDeleted = !!page?.deletedAt;
const [workspace] = useAtom(workspaceAtom);
const collabProvider = useAtomValue(collabProviderAtom);
// Community public-sharing entry point (replaces the removed EE PageShareModal)
const workspaceSharingDisabled = workspace?.settings?.sharing?.disabled === true;
// #370 — explicit "save a version" (Cmd+S / Save button). One path for the
// human; the server derives the tier from the signed actor. Readers can't save
// (the button is hidden and the collab connection is read-only server-side).
const handleSaveVersion = useCallback(() => {
if (readOnly || !collabProvider) return;
// Flag this client as the initiator so only it shows the confirmation toast;
// a safety timeout clears it if no broadcast comes back (e.g. offline).
saveVersionPending.current = true;
window.setTimeout(() => {
saveVersionPending.current = false;
}, 5000);
collabProvider.sendStateless(
JSON.stringify({ type: SAVE_VERSION_MESSAGE_TYPE }),
);
}, [readOnly, collabProvider]);
// mod+S must also block the browser's "Save page" dialog. `triggerOnContent-
// Editable` + empty ignore-list so it fires while typing in the editor/title.
useHotkeys(
[["mod+S", handleSaveVersion, { preventDefault: true }]],
[],
true,
);
useHotkeys(
[
[
@@ -133,15 +164,16 @@ export default function PageHeaderMenu({ readOnly }: PageHeaderMenuProps) {
</ActionIcon>
</Tooltip>
<PageActionMenu readOnly={readOnly} />
<PageActionMenu readOnly={readOnly} onSaveVersion={handleSaveVersion} />
</>
);
}
interface PageActionMenuProps {
readOnly?: boolean;
onSaveVersion?: () => void;
}
function PageActionMenu({ readOnly }: PageActionMenuProps) {
function PageActionMenu({ readOnly, onSaveVersion }: PageActionMenuProps) {
const { t } = useTranslation();
const [, setHistoryModalOpen] = useAtom(historyAtoms);
const clipboard = useClipboard({ timeout: 500 });
@@ -302,6 +334,20 @@ function PageActionMenu({ readOnly }: PageActionMenuProps) {
</Group>
</Menu.Item>
{!readOnly && (
<Menu.Item
leftSection={<IconDeviceFloppy size={16} />}
onClick={onSaveVersion}
rightSection={
<Text size="xs" c="dimmed">
{t("Ctrl+S")}
</Text>
}
>
{t("Save version")}
</Menu.Item>
)}
<Menu.Item
leftSection={<IconHistory size={16} />}
onClick={openHistoryModal}
+4 -6
View File
@@ -23,7 +23,7 @@
"migration:reset": "tsx src/database/migrate.ts down-to NO_MIGRATIONS",
"migration:codegen": "kysely-codegen --dialect=postgres --camel-case --env-file=../../.env --out-file=./src/database/types/db.d.ts",
"lint": "eslint \"{src,apps,libs,test}/**/*.ts\" --fix",
"pretest": "pnpm --filter @docmost/editor-ext build && pnpm --filter @docmost/prosemirror-markdown build",
"pretest": "pnpm --filter @docmost/editor-ext build",
"test": "jest",
"test:int": "jest --config test/jest-integration.json",
"test:watch": "jest --watch",
@@ -43,7 +43,6 @@
"@clickhouse/client": "^1.18.2",
"@docmost/mcp": "workspace:*",
"@docmost/pdf-inspector": "1.9.6",
"@docmost/prosemirror-markdown": "workspace:*",
"@fastify/cookie": "^11.0.2",
"@fastify/multipart": "^10.0.0",
"@fastify/static": "^9.1.3",
@@ -176,7 +175,7 @@
"/node_modules/"
],
"transform": {
"(happy-dom.+|prosemirror-markdown/build/.+)\\.js$": [
"happy-dom.+\\.js$": [
"babel-jest",
{
"presets": [
@@ -194,7 +193,7 @@
"^.+\\.(t|j)sx?$": "ts-jest"
},
"transformIgnorePatterns": [
"/node_modules/(?!(\\.pnpm/)?(nanoid|uuid|image-dimensions|marked|happy-dom|lib0|@docmost/prosemirror-markdown)(@|/))"
"/node_modules/(?!(\\.pnpm/)?(nanoid|uuid|image-dimensions|marked|happy-dom|lib0)(@|/))"
],
"collectCoverageFrom": [
"**/*.(t|j)s"
@@ -205,8 +204,7 @@
"^@docmost/db/(.*)$": "<rootDir>/database/$1",
"^@docmost/transactional/(.*)$": "<rootDir>/integrations/transactional/$1",
"^@docmost/ee/(.*)$": "<rootDir>/ee/$1",
"^src/(.*)$": "<rootDir>/$1",
"^@tiptap/react$": "<rootDir>/../test/stubs/tiptap-react.js"
"^src/(.*)$": "<rootDir>/$1"
}
}
}
@@ -43,6 +43,7 @@ import {
Column,
Status,
addUniqueIdsToDoc,
htmlToMarkdown,
TransclusionSource,
TransclusionReference,
FootnoteReference,
@@ -50,7 +51,6 @@ import {
FootnoteDefinition,
PageEmbed,
} from '@docmost/editor-ext';
import { convertProseMirrorToMarkdown } from '@docmost/prosemirror-markdown';
import { generateText, getSchema, JSONContent } from '@tiptap/core';
import { generateHTML, generateJSON } from '../common/helpers/prosemirror/html';
// @tiptap/html library works best for generating prosemirror json state but not HTML
@@ -239,10 +239,6 @@ export function prosemirrorNodeToYElement(node: any): Y.XmlElement | Y.XmlText {
}
export function jsonToMarkdown(tiptapJson: any): string {
// Direct ProseMirror JSON -> Markdown via the canonical converter
// (`@docmost/prosemirror-markdown`) — no HTML intermediate, no second
// editor-ext markdown layer. Same serializer as the page/space export and the
// git-sync vault writer, so every server PM->MD path emits identical canonical
// markdown (issue #345).
return convertProseMirrorToMarkdown(tiptapJson);
const html = jsonToHtml(tiptapJson);
return htmlToMarkdown(html);
}
+29 -3
View File
@@ -1,3 +1,29 @@
export const HISTORY_INTERVAL = 5 * 60 * 1000;
export const HISTORY_FAST_INTERVAL = 60 * 1000;
export const HISTORY_FAST_THRESHOLD = 5 * 60 * 1000;
/**
* #370 page-history intentionality tiers. Domain of `page_history.kind`.
* - 'manual' / 'agent' Tier 1 versions (intentional points)
* - 'idle' / 'boundary' Tier 0 autosnapshots (safety net)
* A legacy `null` kind is treated as an autosave.
*/
export type PageHistoryKind = 'manual' | 'agent' | 'idle' | 'boundary';
/**
* #370 trailing idle-flush windows. A page's pending idle snapshot is
* re-armed on every store and fires this long after edits go quiet, so a burst
* of edits collapses into a single autosnapshot instead of one-per-store. Human
* sessions are noisier and less risky, so they flush less often than the agent.
*/
export const IDLE_INTERVAL_USER = 60 * 60 * 1000; // 60m
export const IDLE_INTERVAL_AGENT = 15 * 60 * 1000; // 15m
/**
* #370 max-wait ceiling for the idle flush. Pure trailing debounce starves the
* safety net: hocuspocus stores at least every ~45s, so a CONTINUOUS editing
* session would re-arm the trailing timer forever and never take an idle
* snapshot until edits finally go quiet (up to IDLE_INTERVAL_USER = 60m). This
* ceiling bounds the actual wait from the FIRST edit of a burst, so an idle
* snapshot fires at least this often during a long unbroken session restoring
* a recovery point cadence closer to the old heuristic without one-per-store
* noise. Mirrors hocuspocus's own maxDebounce idea.
*/
export const IDLE_MAX_WAIT_USER = 10 * 60 * 1000; // 10m
export const IDLE_MAX_WAIT_AGENT = 5 * 60 * 1000; // 5m
@@ -1,84 +1,93 @@
import { computeHistoryJob, resolveSource } from './persistence.extension';
import {
computeHistoryJob,
resolveSource,
} from './persistence.extension';
import {
HISTORY_FAST_INTERVAL,
HISTORY_FAST_THRESHOLD,
HISTORY_INTERVAL,
IDLE_INTERVAL_AGENT,
IDLE_INTERVAL_USER,
IDLE_MAX_WAIT_AGENT,
IDLE_MAX_WAIT_USER,
} from '../constants';
// A fixed clock + fixed createdAt make pageAge deterministic.
const NOW = 1_700_000_000_000;
const PAGE_ID = '550e8400-e29b-41d4-a716-446655440000';
// Build a minimal page whose age (NOW - createdAt) is exactly `ageMs`.
const pageAged = (ageMs: number) => ({
id: PAGE_ID,
createdAt: new Date(NOW - ageMs),
});
const page = { id: PAGE_ID };
describe('computeHistoryJob', () => {
it('agent edit → delay MUST be 0 and job id is source-keyed', () => {
// INVARIANT (§15 H2 / persistence.extension): the agent delay MUST stay 0.
// The worker re-reads the page row at run time, so any non-zero delay risks
// snapshotting content a later human edit has already overwritten. This is
// the load-bearing assertion of this spec — do not relax it.
const { jobId, delay } = computeHistoryJob(pageAged(0), 'agent', NOW);
expect(delay).toBe(0);
expect(jobId).toBe(`${PAGE_ID}-agent`);
});
it('agent edit on an OLD page is still delay 0 (age never applies to agents)', () => {
// Even when the page is far older than the fast threshold, the agent path
// must short-circuit to 0 — age-based debounce is a human-only concern.
const { jobId, delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD + 60_000),
'agent',
NOW,
);
expect(delay).toBe(0);
expect(jobId).toBe(`${PAGE_ID}-agent`);
});
it('human edit on a YOUNG page (age < threshold) → fast interval, bare job id', () => {
const { jobId, delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD - 1),
'user',
NOW,
);
expect(delay).toBe(HISTORY_FAST_INTERVAL);
describe('computeHistoryJob (#370 — shared trailing idle pipeline)', () => {
it('human edit → user idle window, bare page.id job', () => {
// Humans and the agent now share ONE idle job per page (jobId = page.id).
// The agent's old delay=0 fast path is GONE — intentional agent points now
// arrive via the explicit save-version signal, not a zero-delay snapshot.
const { jobId, delay } = computeHistoryJob(page, 'user');
expect(delay).toBe(IDLE_INTERVAL_USER);
expect(jobId).toBe(PAGE_ID);
});
it('human edit on an OLD page (age > threshold) → standard interval', () => {
const { jobId, delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD + 1),
'user',
NOW,
);
expect(delay).toBe(HISTORY_INTERVAL);
it('agent edit → agent idle window (shorter), still the bare page.id job', () => {
const { jobId, delay } = computeHistoryJob(page, 'agent');
expect(delay).toBe(IDLE_INTERVAL_AGENT);
// No `-agent` suffix anymore: the agent joins the common idle pipeline.
expect(jobId).toBe(PAGE_ID);
});
it('boundary: pageAge EXACTLY === threshold takes the slow branch (the `<` is strict)', () => {
// Off-by-one guard: the condition is `pageAge < HISTORY_FAST_THRESHOLD`, so
// an age of exactly the threshold is NOT "fast" — it must use HISTORY_INTERVAL.
const { delay } = computeHistoryJob(
pageAged(HISTORY_FAST_THRESHOLD),
'user',
NOW,
);
expect(delay).toBe(HISTORY_INTERVAL);
it('agent flushes sooner than a human', () => {
expect(IDLE_INTERVAL_AGENT).toBeLessThan(IDLE_INTERVAL_USER);
});
it('treats any non-"agent" source string as human', () => {
// resolveSource only ever yields 'agent' | 'user', but guard the contract:
// the agent branch keys strictly on === 'agent'.
const { jobId, delay } = computeHistoryJob(pageAged(0), 'user', NOW);
expect(delay).toBe(HISTORY_FAST_INTERVAL);
it('treats any non-"agent" source string as human (keys strictly on === agent)', () => {
const { jobId, delay } = computeHistoryJob(page, 'user');
expect(delay).toBe(IDLE_INTERVAL_USER);
expect(jobId).toBe(PAGE_ID);
});
// #370 review round-1 WARNING: the max-wait ceiling prevents autosnapshot
// starvation during a continuous editing session (the trailing timer would
// otherwise re-arm forever and never fire).
describe('max-wait ceiling', () => {
const T0 = 1_000_000; // arbitrary fixed epoch for deterministic tests
it('once a burst is armed, delay clamps to the remaining max-wait budget', () => {
// 1 minute into the burst the USER interval (60m) far exceeds the remaining
// max-wait budget (10m - 1m = 9m), so the delay is clamped DOWN to that
// remaining budget — the full interval is NOT used once a ceiling applies.
const { delay } = computeHistoryJob(page, 'user', T0, T0 + 60_000);
expect(delay).toBe(IDLE_MAX_WAIT_USER - 60_000);
});
it('never waits longer than the max-wait budget from the burst start', () => {
// A store arriving right at the ceiling → delay 0 (fire promptly).
const { delay } = computeHistoryJob(
page,
'user',
T0,
T0 + IDLE_MAX_WAIT_USER,
);
expect(delay).toBe(0);
});
it('past the ceiling never returns a negative delay', () => {
const { delay } = computeHistoryJob(
page,
'user',
T0,
T0 + IDLE_MAX_WAIT_USER + 5 * 60_000,
);
expect(delay).toBe(0);
});
it('the agent ceiling is shorter than the user ceiling', () => {
expect(IDLE_MAX_WAIT_AGENT).toBeLessThan(IDLE_MAX_WAIT_USER);
const { delay } = computeHistoryJob(
page,
'agent',
T0,
T0 + IDLE_MAX_WAIT_AGENT,
);
expect(delay).toBe(0);
});
it('without a burstStart there is no ceiling (backward-compatible)', () => {
expect(computeHistoryJob(page, 'user').delay).toBe(IDLE_INTERVAL_USER);
expect(computeHistoryJob(page, 'agent').delay).toBe(IDLE_INTERVAL_AGENT);
});
});
});
describe('resolveSource (truth table)', () => {
@@ -40,11 +40,12 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
let pageHistoryRepo: {
saveHistory: jest.Mock;
findPageLastHistory: jest.Mock;
updateHistoryKind: jest.Mock;
};
let aiQueue: { add: jest.Mock };
let historyQueue: { add: jest.Mock };
let historyQueue: { add: jest.Mock; remove: jest.Mock };
let notificationQueue: { add: jest.Mock };
let collabHistory: { addContributors: jest.Mock };
let collabHistory: { addContributors: jest.Mock; popContributors: jest.Mock };
let transclusionService: {
syncPageTransclusions: jest.Mock;
syncPageReferences: jest.Mock;
@@ -93,13 +94,22 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
pageHistoryRepo = {
saveHistory: jest.fn().mockImplementation(async () => {
callOrder.push('saveHistory');
return { id: 'history-1' };
}),
findPageLastHistory: jest.fn().mockResolvedValue(null),
updateHistoryKind: jest.fn().mockResolvedValue(undefined),
};
aiQueue = { add: jest.fn().mockResolvedValue(undefined) };
historyQueue = { add: jest.fn().mockResolvedValue(undefined) };
historyQueue = {
add: jest.fn().mockResolvedValue(undefined),
// #370 — enqueuePageHistory now removes any pending idle job before re-adding.
remove: jest.fn().mockResolvedValue(undefined),
};
notificationQueue = { add: jest.fn().mockResolvedValue(undefined) };
collabHistory = { addContributors: jest.fn().mockResolvedValue(undefined) };
collabHistory = {
addContributors: jest.fn().mockResolvedValue(undefined),
popContributors: jest.fn().mockResolvedValue([]),
};
transclusionService = {
syncPageTransclusions: jest.fn().mockResolvedValue(undefined),
syncPageReferences: jest.fn().mockResolvedValue(undefined),
@@ -165,6 +175,50 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
expect(pageRepo.updatePage.mock.calls[0][0].lastUpdatedSource).toBe('user');
});
// #370 review round-1 SUGGESTION: the boundary was GENERALIZED from a
// user→agent special-case to ANY lastUpdatedSource transition. These pin the
// generalized behaviour it was rebuilt for.
describe('generalized boundary — any source transition', () => {
// Same persisted page but with an explicit prior source.
const pageWithPriorSource = (prior: string | null) => ({
...persistedHumanPage('NEW CONTENT'),
lastUpdatedSource: prior,
});
it('agent→user transition fires the boundary (pins the prior agent revision)', async () => {
const document = ydocFor(doc('NEW CONTENT'));
pageRepo.findById.mockResolvedValue(pageWithPriorSource('agent'));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await ext.onStoreDocument(buildData(document, 'user') as any);
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledTimes(1);
expect(callOrder).toEqual(['saveHistory', 'updatePage']);
expect(pageRepo.updatePage.mock.calls[0][0].lastUpdatedSource).toBe('user');
});
it('git→user transition fires the boundary (git-sync overwrite is a source change)', async () => {
const document = ydocFor(doc('NEW CONTENT'));
pageRepo.findById.mockResolvedValue(pageWithPriorSource('git'));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await ext.onStoreDocument(buildData(document, 'user') as any);
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledTimes(1);
expect(callOrder).toEqual(['saveHistory', 'updatePage']);
});
it('a null prior source (first-ever edit) does NOT fire the boundary', async () => {
const document = ydocFor(doc('NEW CONTENT'));
pageRepo.findById.mockResolvedValue(pageWithPriorSource(null));
await ext.onStoreDocument(buildData(document, 'agent') as any);
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
expect(pageRepo.updatePage).toHaveBeenCalledTimes(1);
});
});
it('idempotency: unchanged content → no updatePage, no history, no queues', async () => {
// The Y.Doc content equals the persisted content deeply → early skip.
// A Y.Doc round-trip normalizes attrs (e.g. paragraph indent), so derive
@@ -469,4 +523,125 @@ describe('PersistenceExtension.onStoreDocument — Approach-A boundary snapshot'
// Contributors keyed by the UUID so they match the PAGE_HISTORY job (page.id).
expect(collabHistory.addContributors.mock.calls[0][0]).toBe(PAGE_ID);
});
// #370 — explicit save-version (Cmd+S / agent save tool) over the stateless
// seam. The tier is derived from the SIGNED connection actor, the store path
// is reused, and promote-not-dup avoids duplicating heavy content rows.
describe('save-version (#370)', () => {
const emitSave = (document: any, actor: 'user' | 'agent') =>
ext.onStateless({
connection: {
readOnly: false,
context: { user: { id: USER_ID, name: 'Alice' }, actor },
} as any,
documentName: `page.${PAGE_ID}`,
document: document as any,
payload: JSON.stringify({ type: 'save-version' }),
} as any);
// findById returns a page whose content already equals the live doc, so the
// store path is a no-op and we isolate the versioning decision.
const pageMatchingDoc = (document: any) => ({
...persistedHumanPage('IGNORED'),
content: TiptapTransformer.fromYdoc(document, 'default'),
});
it('human save with no prior snapshot → writes a manual version + broadcasts', async () => {
const document = ydocFor(doc('VERSION ME'));
pageRepo.findById.mockResolvedValue(pageMatchingDoc(document));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await emitSave(document, 'user');
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledTimes(1);
expect(pageHistoryRepo.saveHistory.mock.calls[0][1]).toEqual(
expect.objectContaining({ kind: 'manual' }),
);
// The pending idle autosnapshot is cancelled by the explicit version.
expect(historyQueue.remove).toHaveBeenCalledWith(PAGE_ID);
const msg = JSON.parse(
(document as any).broadcastStateless.mock.calls[(document as any).broadcastStateless.mock.calls.length - 1][0],
);
expect(msg).toMatchObject({
type: 'version.saved',
kind: 'manual',
alreadySaved: false,
});
});
it('agent save derives kind=agent from the signed actor', async () => {
const document = ydocFor(doc('AGENT VERSION'));
pageRepo.findById.mockResolvedValue(pageMatchingDoc(document));
pageHistoryRepo.findPageLastHistory.mockResolvedValue(null);
await emitSave(document, 'agent');
expect(pageHistoryRepo.saveHistory.mock.calls[pageHistoryRepo.saveHistory.mock.calls.length - 1][1]).toEqual(
expect.objectContaining({ kind: 'agent' }),
);
});
it('promote-not-dup: latest snapshot is an autosave with identical content → upgrades in place', async () => {
const document = ydocFor(doc('SAME'));
const page = pageMatchingDoc(document);
pageRepo.findById.mockResolvedValue(page);
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
id: 'auto-1',
content: page.content,
kind: 'idle',
});
await emitSave(document, 'user');
// No heavy new content row — the existing autosave is promoted to manual.
expect(pageHistoryRepo.updateHistoryKind).toHaveBeenCalledWith(
'auto-1',
'manual',
expect.anything(),
);
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
const msg = JSON.parse(
(document as any).broadcastStateless.mock.calls[(document as any).broadcastStateless.mock.calls.length - 1][0],
);
expect(msg).toMatchObject({ historyId: 'auto-1', alreadySaved: false });
});
it('no-op when the latest snapshot is already a manual version of this content', async () => {
const document = ydocFor(doc('ALREADY SAVED'));
const page = pageMatchingDoc(document);
pageRepo.findById.mockResolvedValue(page);
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
id: 'ver-1',
content: page.content,
kind: 'manual',
});
await emitSave(document, 'user');
expect(pageHistoryRepo.updateHistoryKind).not.toHaveBeenCalled();
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
const msg = JSON.parse(
(document as any).broadcastStateless.mock.calls[(document as any).broadcastStateless.mock.calls.length - 1][0],
);
expect(msg).toMatchObject({ alreadySaved: true, kind: 'manual' });
});
it('a read-only connection cannot save a version', async () => {
const document = ydocFor(doc('READER'));
pageRepo.findById.mockResolvedValue(pageMatchingDoc(document));
await ext.onStateless({
connection: {
readOnly: true,
context: { user: { id: USER_ID }, actor: 'user' },
} as any,
documentName: `page.${PAGE_ID}`,
document: document as any,
payload: JSON.stringify({ type: 'save-version' }),
} as any);
expect(pageHistoryRepo.saveHistory).not.toHaveBeenCalled();
expect(pageHistoryRepo.updateHistoryKind).not.toHaveBeenCalled();
});
});
});
@@ -36,9 +36,11 @@ import {
import { Page } from '@docmost/db/types/entity.types';
import { CollabHistoryService } from '../services/collab-history.service';
import {
HISTORY_FAST_INTERVAL,
HISTORY_FAST_THRESHOLD,
HISTORY_INTERVAL,
IDLE_INTERVAL_AGENT,
IDLE_INTERVAL_USER,
IDLE_MAX_WAIT_AGENT,
IDLE_MAX_WAIT_USER,
PageHistoryKind,
} from '../constants';
import { TransclusionService } from '../../core/page/transclusion/transclusion.service';
import { observeCollabStore } from '../../integrations/metrics/metrics.registry';
@@ -51,6 +53,16 @@ import { observeCollabStore } from '../../integrations/metrics/metrics.registry'
*/
export const INTENTIONAL_CLEAR_MESSAGE_TYPE = 'intentional-clear';
/**
* #370 wire format of the clientserver "save a version" signal. Sent by the
* human (Cmd+S / Save button) and by the agent's explicit save tool over the
* SAME stateless channel. The intentionality tier ('manual' vs 'agent') is
* derived SERVER-SIDE from the signed connection actor, never from this
* payload, so a version's type is unforgeable. The document is taken from the
* connection (not the payload), so the signal cannot be aimed at another page.
*/
export const SAVE_VERSION_MESSAGE_TYPE = 'save-version';
/**
* #251 how long an intentional-clear signal stays "pending" before it is
* ignored. The signal is set on the clearing keystroke but consumed by the
@@ -87,35 +99,39 @@ export function resolveSource(
}
/**
* Compute the BullMQ job id + delay for a page-history snapshot job. Pure so
* the data-loss-sensitive timing arithmetic is unit-testable; `now` is injected
* (caller passes `Date.now()`) for determinism.
* #370 compute the BullMQ job id + delay for a page's trailing idle-flush
* autosnapshot. Pure so the timing is unit-testable.
*
* - Agent edits: delay 0 and a source-keyed job id `${page.id}-agent`. The
* delay MUST stay 0 the worker re-reads the page row at run time, so any
* delay risks reading content a later human edit has already overwritten
* (mis-tagged snapshot). 0 minimizes that window. The `-agent` suffix keeps
* the job from coalescing with the bare-page.id human job.
* - Human edits: age-based debounce so rapid human edits coalesce into one
* snapshot; job id is the bare `page.id`.
*
* BullMQ forbids ':' in custom job ids (Redis key separator), so '-' is used;
* page.id is a UUID, so `${page.id}-agent` cannot collide with a human job.
* Both humans and the agent now share ONE idle pipeline (the agent's old
* `delay=0` fast path is gone intentional agent points arrive via the
* explicit save-version signal instead). The job id is the bare `page.id`, so a
* page has at most one pending idle job; the caller removes-and-re-adds it on
* every store to keep it debounced to the trailing edge of an edit burst. The
* window differs by source only: the agent flushes sooner than a human.
*/
export function computeHistoryJob(
page: Pick<Page, 'id' | 'createdAt'>,
page: Pick<Page, 'id'>,
source: string,
now: number,
// Epoch ms of the FIRST edit in the current burst (when the pending idle job
// was first armed). Used to enforce the max-wait ceiling so a continuous
// editing session cannot re-arm the trailing timer forever. `now` is injectable
// for tests; both default to a live clock / no ceiling when omitted.
burstStart?: number,
now: number = Date.now(),
): { jobId: string; delay: number } {
const isAgent = source === 'agent';
const pageAge = now - new Date(page.createdAt).getTime();
const delay = isAgent
? 0
: pageAge < HISTORY_FAST_THRESHOLD
? HISTORY_FAST_INTERVAL
: HISTORY_INTERVAL;
const jobId = isAgent ? `${page.id}-agent` : page.id;
return { jobId, delay };
const interval = isAgent ? IDLE_INTERVAL_AGENT : IDLE_INTERVAL_USER;
const maxWait = isAgent ? IDLE_MAX_WAIT_AGENT : IDLE_MAX_WAIT_USER;
let delay = interval;
if (burstStart !== undefined) {
// Time already elapsed since the burst's first edit; the snapshot must fire
// no later than `maxWait` after that, so shrink the trailing delay to the
// remaining budget (never negative, so BullMQ fires it promptly).
const remaining = burstStart + maxWait - now;
delay = Math.max(0, Math.min(interval, remaining));
}
return { jobId: page.id, delay };
}
@Injectable()
@@ -127,6 +143,11 @@ export class PersistenceExtension implements Extension {
// coalescing window" per document and OR it across all edits in the window,
// so the snapshot is marked 'agent' regardless of who wrote last.
private agentTouched: Map<string, boolean> = new Map();
// #370 — epoch ms of the FIRST edit in the current idle-flush burst, per page.
// Set when the pending idle job is first armed (empty entry), read to enforce
// the max-wait ceiling in computeHistoryJob, and cleared when the idle job is
// consumed/cancelled so the next burst starts a fresh window.
private idleBurstStart: Map<string, number> = new Map();
// #251 — per-document "intentional clear pending" flags. Keyed by
// documentName, value = expiry timestamp (ms). Set by onStateless when the
// client reports a deliberate clear; consumed once by the next
@@ -326,20 +347,19 @@ export class PersistenceExtension implements Extension {
//this.logger.debug('Contributors error:' + err?.['message']);
}
// Approach A — boundary snapshot before the agent's first edit.
// When this store is the agent's and the page's currently persisted
// state was authored by a human, pin that human state as its own
// history version BEFORE the agent overwrites it. `page` still holds
// the OLD content/provenance here, so saveHistory(page) captures the
// pre-agent state tagged 'user'. The agent's new content is
// snapshotted later by the debounced PAGE_HISTORY job ('agent'). Skip
// if the prior state is already agent-authored (boundary already
// pinned on the user->agent transition), if the page is effectively
// empty, or if the latest existing snapshot already equals this human
// state (avoid duplicates).
// #370 — boundary snapshot on ANY source transition. When the store
// flips the page's provenance (user↔agent↔git), pin the OUTGOING
// state as its own history version BEFORE the incoming source
// overwrites it. `page` still holds the OLD content/provenance here,
// so saveHistory(page) captures the pre-transition state tagged with
// its own source, kind='boundary'. The incoming content is snapshotted
// later by the debounced idle job. Skip if the page is effectively
// empty or if the latest existing snapshot already equals this state
// (the shared isDeepStrictEqual gate — avoids duplicates). Generalizing
// beyond the old user→agent special-case also covers git-sync for free.
if (
lastUpdatedSource === 'agent' &&
page.lastUpdatedSource !== 'agent'
page.lastUpdatedSource &&
page.lastUpdatedSource !== lastUpdatedSource
) {
// pageHistory.pageId is uuid-typed; use page.id (never the doc-name
// slugId) so a `page.<slugId>` doc cannot throw 22P02 here (#260).
@@ -347,15 +367,13 @@ export class PersistenceExtension implements Extension {
page.id,
{ includeContent: true, trx },
);
const humanBaselineMissing =
const baselineMissing =
!lastHistory ||
!isDeepStrictEqual(lastHistory.content, page.content);
if (
!isEmptyParagraphDoc(page.content as any) &&
humanBaselineMissing
) {
if (!isEmptyParagraphDoc(page.content as any) && baselineMissing) {
await this.pageHistoryRepo.saveHistory(page, {
contributorIds: page.contributorIds ?? undefined,
kind: 'boundary',
trx,
});
}
@@ -480,6 +498,14 @@ export class PersistenceExtension implements Extension {
return; // unrelated / malformed stateless message
}
// #370 — explicit "save a version" (human Cmd+S / agent save tool). Edit
// rights are already enforced by the readOnly reject above (a reader can't
// create a version), exactly as intentional-clear requires.
if (message?.type === SAVE_VERSION_MESSAGE_TYPE) {
await this.handleSaveVersion(data);
return;
}
if (message?.type !== INTENTIONAL_CLEAR_MESSAGE_TYPE) return;
this.intentionalClear.set(
@@ -488,6 +514,117 @@ export class PersistenceExtension implements Extension {
);
}
/**
* #370 persist an intentional version from the live in-memory ydoc.
*
* One stateless path serves BOTH the human and the agent; the tier is derived
* SERVER-SIDE from the signed connection actor ('agent' 'agent', anything
* else 'manual'), so the version type cannot be spoofed by the client. We
* take the fresh ydoc from the collab process memory and run it through the
* EXISTING store path first (so pages.content/ydoc reflect the exact content
* being versioned a REST endpoint would race the up-to-10s-stale page row),
* then snapshot it into page_history with the intentional kind.
*
* Promote-not-dup: if the latest history row already holds this exact content
* and it is an autosave (idle/boundary/legacy-null), upgrade its kind in place
* instead of duplicating a heavy content row; if it is already 'manual', it is
* a no-op (the client shows an "already saved" toast). Otherwise a fresh
* version row is written, popping the aggregated contributors from Redis.
*/
private async handleSaveVersion(data: onStatelessPayload): Promise<void> {
const { connection, document, documentName } = data;
const context = connection?.context;
const pageId = getPageId(documentName);
// Unforgeable: 'agent' only for a signed agent connection, else 'manual'.
const kind: PageHistoryKind =
context?.actor === 'agent' ? 'agent' : 'manual';
// Flush the live ydoc through the normal store path so the page row + ydoc
// hold exactly what we are about to version (also fires the idle enqueue we
// supersede below, plus any source-transition boundary). onStoreDocument
// only needs document/documentName/context.
await this.onStoreDocument({
document,
documentName,
context,
} as onStoreDocumentPayload);
let result:
| { historyId: string; kind: PageHistoryKind; alreadySaved: boolean }
| undefined;
await executeTx(this.db, async (trx) => {
const page = await this.pageRepo.findById(pageId, {
withLock: true,
includeContent: true,
trx,
});
if (!page) return;
// Never version an effectively-empty page (mirrors the processor's
// first-history guard); there is nothing intentional to pin.
if (isEmptyParagraphDoc(page.content as any)) return;
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
page.id,
{ includeContent: true, trx },
);
if (
lastHistory &&
isDeepStrictEqual(lastHistory.content, page.content)
) {
// Content is already snapshotted. Promote-not-dup.
if (lastHistory.kind === 'manual') {
result = {
historyId: lastHistory.id,
kind: 'manual',
alreadySaved: true,
};
return;
}
await this.pageHistoryRepo.updateHistoryKind(
lastHistory.id,
kind,
trx,
);
result = { historyId: lastHistory.id, kind, alreadySaved: false };
return;
}
// Fresh version row. Pop the contributors aggregated since the last
// snapshot (SPOP); restore them if the write fails so they aren't lost.
const contributorIds = await this.collabHistory.popContributors(page.id);
try {
const saved = await this.pageHistoryRepo.saveHistory(page, {
contributorIds,
kind,
trx,
});
result = { historyId: saved.id, kind, alreadySaved: false };
} catch (err) {
await this.collabHistory.addContributors(page.id, contributorIds);
throw err;
}
});
// Housekeeping: this explicit version supersedes the page's pending idle
// autosnapshot, so cancel it (delayed job → remove() just deletes it) and
// end the current idle burst so the next edit starts a fresh max-wait window.
await this.historyQueue.remove(pageId).catch(() => undefined);
this.idleBurstStart.delete(pageId);
if (result) {
document.broadcastStateless(
JSON.stringify({
type: 'version.saved',
historyId: result.historyId,
kind: result.kind,
alreadySaved: result.alreadySaved,
}),
);
}
}
async onChange(data: onChangePayload) {
const documentName = data.documentName;
const userId = data.context?.user?.id;
@@ -545,17 +682,45 @@ export class PersistenceExtension implements Extension {
page: Page,
lastUpdatedSource: string,
): Promise<void> {
// Job id + delay arithmetic lives in the pure `computeHistoryJob` (see its
// doc comment for the agent-delay-0 / age-based-debounce invariants).
// #370 — trailing idle debounce with a max-wait ceiling. One pending idle
// job per page (jobId = page.id); on every store we remove the pending
// delayed job and re-add it, so the snapshot lands `delay` after edits go
// quiet rather than once per store (precedent: workspace.service.ts).
// remove() on a delayed job simply deletes it (0 if absent, no throw); if the
// job is already ACTIVE and the remove is a no-op, the add still de-dups and
// the processor's isDeepStrictEqual gate collapses the duplicate content.
//
// The FIRST arm of a burst records `burstStart`; computeHistoryJob shrinks
// the delay to the remaining max-wait budget from that point, so a continuous
// session cannot re-arm the trailing timer forever and starve the snapshot.
// A burst marker older than THIS TIER's max-wait means the previous idle job
// has already fired — start a fresh window instead of firing immediately on
// the next edit. Must use the SAME source-specific max-wait computeHistoryJob
// uses (agent 5m / user 10m): a hardcoded USER ceiling would leave an agent
// burst's marker stale for 5..10m, forcing delay=0 on every store in that
// window and writing one idle row per store — exactly the per-store bloat the
// debounce exists to prevent, on the continuous-agent path.
const maxWait =
lastUpdatedSource === 'agent' ? IDLE_MAX_WAIT_AGENT : IDLE_MAX_WAIT_USER;
const now = Date.now();
let burstStart = this.idleBurstStart.get(page.id);
if (burstStart === undefined || now - burstStart >= maxWait) {
burstStart = now;
this.idleBurstStart.set(page.id, burstStart);
}
const { jobId, delay } = computeHistoryJob(
page,
lastUpdatedSource,
Date.now(),
burstStart,
now,
);
await this.historyQueue.remove(jobId).catch(() => undefined);
await this.historyQueue.add(
QueueJob.PAGE_HISTORY,
{ pageId: page.id } as IPageHistoryJob,
{ pageId: page.id, kind: 'idle' } as IPageHistoryJob,
{ jobId, delay },
);
}
@@ -66,6 +66,15 @@ describe('HistoryProcessor.process', () => {
notificationQueue = { add: jest.fn().mockResolvedValue(undefined) };
generalQueue = { add: jest.fn().mockResolvedValue(undefined) };
// #370 F3 — the processor now serializes its find+save under a page-row lock
// via executeTx. A db whose transaction().execute(fn) runs fn with a trx stub
// drives the real executeTx() helper without a database.
const db = {
transaction: () => ({
execute: (fn: (trx: any) => Promise<any>) => fn({ __trx: true }),
}),
};
// WorkerHost's constructor reads `this.worker`; passing repos positionally
// matches the constructor and avoids the Nest DI container.
proc = new HistoryProcessor(
@@ -73,6 +82,7 @@ describe('HistoryProcessor.process', () => {
pageRepo as any,
collabHistory as any,
watcherService as any,
db as any,
notificationQueue as any,
generalQueue as any,
);
@@ -126,15 +136,26 @@ describe('HistoryProcessor.process', () => {
await proc.process(buildJob());
expect(collabHistory.popContributors).toHaveBeenCalledWith(PAGE_ID);
// #370 F3/F9 — the snapshot decision runs under a page-row lock. Pin the lock
// structurally so a refactor that drops withLock/trx (silently reintroducing
// the TOCTOU double-insert) turns this red. The tx stub is { __trx: true }.
expect(pageRepo.findById).toHaveBeenCalledWith(
PAGE_ID,
expect.objectContaining({ withLock: true, trx: { __trx: true } }),
);
// #370 F7 — addPageWatchers MUST receive the trx, or its FK-check runs on a
// separate connection and self-deadlocks against our FOR UPDATE. Asserting
// the trx arg here is exactly what would have caught that regression.
expect(watcherService.addPageWatchers).toHaveBeenCalledWith(
['u1', 'u2'],
PAGE_ID,
SPACE_ID,
WORKSPACE_ID,
{ __trx: true },
);
expect(pageHistoryRepo.saveHistory).toHaveBeenCalledWith(
expect.objectContaining({ id: PAGE_ID }),
{ contributorIds: ['u1', 'u2'] },
{ contributorIds: ['u1', 'u2'], kind: 'idle', trx: { __trx: true } },
);
expect(generalQueue.add).toHaveBeenCalledWith(
QueueJob.PAGE_BACKLINKS,
@@ -186,6 +207,48 @@ describe('HistoryProcessor.process', () => {
]);
});
it('COMMIT failure (throw outside the tx callback) → contributors RESTORED', async () => {
// #370 F8 — a commit-time failure throws OUTSIDE the callback, so the inner
// try/catch does not run; the outer catch must restore the popped set (else a
// BullMQ retry writes an unattributed version). Use a db whose execute() runs
// the callback THEN throws, simulating a commit abort.
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
content: { type: 'doc', content: [] },
});
const commitFail = {
transaction: () => ({
execute: async (fn: (trx: any) => Promise<any>) => {
await fn({ __trx: true }); // callback succeeds (saveHistory ok)
throw new Error('commit aborted'); // ...but the COMMIT fails
},
}),
};
const procCommitFail = new HistoryProcessor(
pageHistoryRepo as any,
pageRepo as any,
collabHistory as any,
watcherService as any,
commitFail as any,
notificationQueue as any,
generalQueue as any,
);
jest
.spyOn(procCommitFail['logger'], 'error')
.mockImplementation(() => undefined);
await expect(procCommitFail.process(buildJob())).rejects.toThrow(
'commit aborted',
);
// The inner catch did NOT run (save succeeded), so only the outer catch can
// restore — assert it did.
expect(collabHistory.addContributors).toHaveBeenCalledWith(PAGE_ID, [
'u1',
'u2',
]);
// And the post-snapshot queue work must NOT have run (we rethrew).
expect(generalQueue.add).not.toHaveBeenCalled();
});
it('backlinks + notification queue failures are swallowed (history still committed)', async () => {
pageHistoryRepo.findPageLastHistory.mockResolvedValue({
content: { type: 'doc', content: [] },
@@ -19,6 +19,9 @@ import { isDeepStrictEqual } from 'node:util';
import { CollabHistoryService } from '../services/collab-history.service';
import { WatcherService } from '../../core/watcher/watcher.service';
import { isEmptyParagraphDoc } from '../collaboration.util';
import { InjectKysely } from 'nestjs-kysely';
import { KyselyDB } from '@docmost/db/types/kysely.types';
import { executeTx } from '@docmost/db/utils';
@Processor(QueueName.HISTORY_QUEUE)
export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
@@ -29,6 +32,7 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
private readonly pageRepo: PageRepo,
private readonly collabHistory: CollabHistoryService,
private readonly watcherService: WatcherService,
@InjectKysely() private readonly db: KyselyDB,
@InjectQueue(QueueName.NOTIFICATION_QUEUE) private notificationQueue: Queue,
@InjectQueue(QueueName.GENERAL_QUEUE) private generalQueue: Queue,
) {
@@ -41,6 +45,9 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
try {
const { pageId } = job.data;
// Read the page WITHOUT a lock first, only to bail early on the two cheap
// no-write cases (page gone / empty first snapshot) without opening a
// transaction. The authoritative check-then-write happens locked below.
const page = await this.pageRepo.findById(pageId, {
includeContent: true,
});
@@ -51,40 +58,109 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
return;
}
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
pageId,
{ includeContent: true },
);
// #370 F3 — the snapshot decision (findPageLastHistory → saveHistory) must
// be serialized against manual-save/boundary writers, which run under a
// page-row lock in onStoreDocument. Without it, this processor and a
// concurrent manual-save each read the same lastHistory (MVCC), both see
// content != lastHistory, and both insert — producing two page_history rows
// with IDENTICAL content (one 'idle', one 'manual'), defeating
// promote-not-dup and the version-vs-autosave split. Taking the same
// page-row lock makes the second writer observe the first's committed row so
// the isDeepStrictEqual gate collapses the duplicate. Only the read+write
// is transacted; the post-snapshot queue work stays outside.
let contributorIds: string[] = [];
let snapshotWritten = false;
let lastHistoryContent: unknown;
// #370 F8 — the contributor set popped from Redis (destructive SPOP) must be
// restored if the snapshot does not durably land. The inner try/catch only
// covers a throw INSIDE the callback; a COMMIT failure (connection drop,
// serialization/deadlock abort on commit — the transient class the epic
// already retries) throws OUTSIDE it, rolling the snapshot back while the
// pop is already gone. We track the popped set here and restore it in the
// outer catch so a BullMQ retry re-attributes the version. addContributors
// is an idempotent Redis SADD, so a double-restore is harmless.
let poppedForRestore: string[] = [];
if (!lastHistory && isEmptyParagraphDoc(page.content as any)) {
this.logger.debug(
`Skipping first history for page ${pageId}: empty content`,
);
await this.collabHistory.clearContributors(pageId);
try {
await executeTx(this.db, async (trx) => {
const lockedPage = await this.pageRepo.findById(pageId, {
includeContent: true,
withLock: true,
trx,
});
if (!lockedPage) return;
const lastHistory = await this.pageHistoryRepo.findPageLastHistory(
pageId,
{ includeContent: true, trx },
);
lastHistoryContent = lastHistory?.content;
if (!lastHistory && isEmptyParagraphDoc(lockedPage.content as any)) {
this.logger.debug(
`Skipping first history for page ${pageId}: empty content`,
);
return;
}
if (
lastHistory &&
isDeepStrictEqual(lastHistory.content, lockedPage.content)
) {
return; // already snapshotted at this content — nothing to write
}
contributorIds = await this.collabHistory.popContributors(pageId);
poppedForRestore = contributorIds;
try {
// Pass `trx` so the watcher insert's FK check (FOR KEY SHARE on
// pages[pageId]) runs on the SAME connection that already holds the
// FOR UPDATE lock from findById — otherwise it takes the FK lock on a
// separate pool connection and self-deadlocks against our own tx.
await this.watcherService.addPageWatchers(
contributorIds,
pageId,
lockedPage.spaceId,
lockedPage.workspaceId,
trx,
);
// #370 — every job on this queue is a trailing idle-flush autosnapshot.
await this.pageHistoryRepo.saveHistory(lockedPage, {
contributorIds,
kind: job.data.kind ?? 'idle',
trx,
});
snapshotWritten = true;
this.logger.debug(`History created for page: ${pageId}`);
} catch (err) {
await this.collabHistory.addContributors(pageId, contributorIds);
poppedForRestore = [];
throw err;
}
});
} catch (err) {
// A throw here means the tx did NOT commit (callback threw, or the commit
// itself failed and rolled back). If we popped contributors and the inner
// catch did not already restore them, restore now so the retry keeps
// attribution. snapshotWritten is irrelevant: it is set before commit, so
// it can be true even when the commit rolled the snapshot back.
if (poppedForRestore.length) {
await this.collabHistory.addContributors(pageId, poppedForRestore);
}
throw err;
}
// No snapshot written (page vanished / empty-first / unchanged content) →
// clear the contributor set for the skip cases and stop.
if (!snapshotWritten) {
if (!lastHistoryContent && isEmptyParagraphDoc(page.content as any)) {
await this.collabHistory.clearContributors(pageId);
}
return;
}
if (
!lastHistory ||
!isDeepStrictEqual(lastHistory.content, page.content)
) {
const contributorIds = await this.collabHistory.popContributors(pageId);
try {
await this.watcherService.addPageWatchers(
contributorIds,
pageId,
page.spaceId,
page.workspaceId,
);
await this.pageHistoryRepo.saveHistory(page, { contributorIds });
this.logger.debug(`History created for page: ${pageId}`);
} catch (err) {
await this.collabHistory.addContributors(pageId, contributorIds);
throw err;
}
{
const mentions = extractMentions(page.content);
const pageMentions = extractPageMentions(mentions);
const internalLinkSlugIds = extractInternalLinkSlugIds(page.content);
@@ -102,7 +178,7 @@ export class HistoryProcessor extends WorkerHost implements OnModuleDestroy {
);
});
if (contributorIds.length > 0 && lastHistory?.content) {
if (contributorIds.length > 0 && lastHistoryContent) {
await this.notificationQueue
.add(QueueJob.PAGE_UPDATED, {
pageId,
@@ -52,9 +52,7 @@ import {
INTERNAL_LINK_REGEX,
extractPageSlugId,
} from '../../../integrations/export/utils';
import { canonicalizeFootnotes } from '@docmost/editor-ext';
import { markdownToProseMirror } from '@docmost/prosemirror-markdown';
import { normalizeForeignMarkdown } from '../../../integrations/import/utils/foreign-markdown';
import { markdownToHtml, canonicalizeFootnotes } from '@docmost/editor-ext';
import { WatcherService } from '../../watcher/watcher.service';
import { sql } from 'kysely';
import { TransclusionService } from '../transclusion/transclusion.service';
@@ -1303,14 +1301,8 @@ export class PageService {
switch (format) {
case 'markdown': {
// Canonical markdown -> ProseMirror JSON directly via
// `@docmost/prosemirror-markdown` (issue #345) — no HTML intermediate,
// no editor-ext markdown layer. Foreign markdown surfaces the strict
// parser rejects (GFM `[^id]` reference footnotes) are normalized to the
// canonical inline form first.
prosemirrorJson = await markdownToProseMirror(
normalizeForeignMarkdown(content as string),
);
const html = await markdownToHtml(content as string);
prosemirrorJson = htmlToJson(html as string);
break;
}
case 'html': {
@@ -0,0 +1,27 @@
import { type Kysely } from 'kysely';
/**
* #370 page-versioning intentionality tier on a history snapshot.
*
* Adds `page_history.kind`, the three-tier "how intentional was this snapshot"
* marker that lets versions (intentional points) be told apart from autosaves:
* - 'manual' a human explicitly saved a version (Cmd+S / Save button)
* - 'agent' the AI agent explicitly saved a version
* - 'idle' trailing idle-flush autosnapshot (safety net)
* - 'boundary' autosnapshot pinned on a source transition (useragentgit)
*
* Nullable with NO default (mirrors last_updated_source in the agent-provenance
* migration): legacy rows predate the marker and read back as `null`, which the
* client renders as a plain autosave. Stored as a short varchar to stay
* forward-compatible without an enum migration.
*/
export async function up(db: Kysely<any>): Promise<void> {
await db.schema
.alterTable('page_history')
.addColumn('kind', 'varchar(20)', (col) => col)
.execute();
}
export async function down(db: Kysely<any>): Promise<void> {
await db.schema.alterTable('page_history').dropColumn('kind').execute();
}
@@ -13,6 +13,7 @@ import { jsonArrayFrom, jsonObjectFrom } from 'kysely/helpers/postgres';
import { ExpressionBuilder, sql } from 'kysely';
import { DB } from '@docmost/db/types/db';
import { resolveAgentProvenance } from '../agent-provenance';
import { PageHistoryKind } from '../../../collaboration/constants';
/**
* Role-resolution subquery for a page-history row's bound AI chat (#300). Joins
@@ -46,6 +47,9 @@ export class PageHistoryRepo {
'lastUpdatedById',
'lastUpdatedSource',
'lastUpdatedAiChatId',
// #370 — intentionality tier ('manual' | 'agent' | 'idle' | 'boundary');
// null on legacy rows (= autosave). Selected so callers can read/promote it.
'kind',
'contributorIds',
'spaceId',
'workspaceId',
@@ -85,9 +89,15 @@ export class PageHistoryRepo {
async saveHistory(
page: Page,
opts?: { contributorIds?: string[]; trx?: KyselyTransaction },
): Promise<void> {
await this.insertPageHistory(
opts?: {
contributorIds?: string[];
// #370 — intentionality tier for this snapshot. Omitted → null (legacy
// autosave semantics). Callers derive it server-side, never from a client.
kind?: PageHistoryKind;
trx?: KyselyTransaction;
},
): Promise<PageHistory> {
return await this.insertPageHistory(
{
pageId: page.id,
slugId: page.slugId,
@@ -99,6 +109,7 @@ export class PageHistoryRepo {
// Copy the provenance marker off the page row, as for lastUpdatedById.
lastUpdatedSource: page.lastUpdatedSource,
lastUpdatedAiChatId: page.lastUpdatedAiChatId,
kind: opts?.kind ?? null,
contributorIds: opts?.contributorIds,
spaceId: page.spaceId,
workspaceId: page.workspaceId,
@@ -107,6 +118,25 @@ export class PageHistoryRepo {
);
}
/**
* #370 promote an existing snapshot's intentionality tier in place. Used by
* the manual-save "promote-not-dup" path: when the latest history row already
* holds the exact content being versioned, we upgrade its `kind` instead of
* duplicating a heavy content row.
*/
async updateHistoryKind(
pageHistoryId: string,
kind: PageHistoryKind,
trx?: KyselyTransaction,
): Promise<void> {
const db = dbOrTx(this.db, trx);
await db
.updateTable('pageHistory')
.set({ kind })
.where('id', '=', pageHistoryId)
.execute();
}
async findPageHistoryByPageId(pageId: string, pagination: PaginationOptions) {
const query = this.db
.selectFrom('pageHistory')
+1
View File
@@ -280,6 +280,7 @@ export interface PageHistory {
createdAt: Generated<Timestamp>;
icon: string | null;
id: Generated<string>;
kind: string | null;
lastUpdatedAiChatId: string | null;
lastUpdatedById: string | null;
lastUpdatedSource: string | null;
@@ -1,145 +0,0 @@
// export.service.ts imports the ESM-only @sindresorhus/slugify (not in jest's
// transform allowlist). It is irrelevant to the markdown-serialization path under
// test (only used for page-mention link slugs on the DB path), so it is mocked
// out to keep the module graph loadable under ts-jest (mirrors the import specs).
jest.mock('@sindresorhus/slugify', () => ({
__esModule: true,
default: (input: string) => String(input),
}));
import { convertProseMirrorToMarkdown } from '@docmost/prosemirror-markdown';
import { ExportService } from './export.service';
import { ExportFormat } from './dto/export-dto';
/**
* STEP 1 golden test for issue #345: server MARKDOWN export runs DIRECTLY through
* the canonical converter (`convertProseMirrorToMarkdown`) no HTML intermediate
* and no `@docmost/editor-ext` markdown layer so the emitted markdown is in the
* canonical package forms and is byte-identical to the git-sync vault body.
*
* These are the goldens the swap has to satisfy: they assert the CANONICAL
* surface (callout `> [!type]`, inline footnote `^[…]`, lossless image
* `<!--img …-->`) rather than the old editor-ext forms (`:::type`, `[^id]`,
* lossy `![alt](src)`).
*
* `exportPage(..., singlePage=false)` takes no DB path (no mention rewriting), so
* the service is constructed with null collaborators and only the pure
* PM -> Markdown path is exercised.
*/
function makeService(): ExportService {
return new ExportService(
null as any, // pageRepo
null as any, // pagePermissionRepo
null as any, // db
null as any, // storageService
null as any, // environmentService
null as any, // domainService
);
}
// A representative page exercising the node types whose canonical markdown form
// changed with the move off the editor-ext layer: callout, inline footnote, and a
// lossless image carrying width/align attrs that the old layer dropped.
const REPRESENTATIVE_DOC = {
type: 'doc',
content: [
{
type: 'paragraph',
content: [
{ type: 'text', text: 'Body ' },
{ type: 'footnoteReference', attrs: { id: 'fn-1' } },
{ type: 'text', text: ' end.' },
],
},
{
type: 'callout',
attrs: { type: 'info', icon: null },
content: [
{
type: 'paragraph',
content: [{ type: 'text', text: 'Heads up' }],
},
],
},
{
type: 'image',
attrs: {
src: '/files/pic.png',
alt: 'Pic',
width: 320,
align: 'left',
},
},
{
type: 'footnotesList',
content: [
{
type: 'footnoteDefinition',
attrs: { id: 'fn-1' },
content: [
{
type: 'paragraph',
content: [{ type: 'text', text: 'the note' }],
},
],
},
],
},
],
};
describe('ExportService — markdown export via the canonical converter (#345)', () => {
it('emits canonical callout, inline footnote and lossless image forms', async () => {
const service = makeService();
const md = (await service.exportPage(ExportFormat.Markdown, {
title: '',
content: REPRESENTATIVE_DOC,
} as any)) as string;
// Callout: Obsidian `> [!type]`, NOT the legacy `:::type`.
expect(md).toContain('> [!info]');
expect(md).not.toContain(':::');
// Inline footnote: `^[…]`, NOT the reference `[^id]` form.
expect(md).toContain('^[the note]');
expect(md).not.toMatch(/\[\^/);
// Lossless image: trailing `<!--img …-->` carrying the dropped attrs.
expect(md).toContain('![Pic](/files/pic.png)');
expect(md).toContain('<!--img');
expect(md).toContain('"width":"320"');
expect(md).toContain('"align":"left"');
});
it('export body is byte-identical to the git-sync vault serializer (export == vault)', async () => {
const service = makeService();
// A title-less page: exportPage prepends NO heading, so the whole output is
// the page BODY — exactly what git-sync serializes (git-sync stores the title
// in frontmatter / the filename, never as an in-body H1).
const exported = (await service.exportPage(ExportFormat.Markdown, {
title: '',
content: REPRESENTATIVE_DOC,
} as any)) as string;
// The git-sync vault writer feeds this SAME converter (git-sync
// `stabilizePageBody` = convertProseMirrorToMarkdown(content) at the
// fixpoint). For an already-stable doc the single pass IS the fixpoint, so
// the two are byte-identical by construction — assert it.
const vaultBody = convertProseMirrorToMarkdown(REPRESENTATIVE_DOC);
expect(exported).toBe(vaultBody);
});
it('prepends the page title as an H1 heading (the one documented export/vault delta)', async () => {
const service = makeService();
const md = (await service.exportPage(ExportFormat.Markdown, {
title: 'My Page',
content: { type: 'doc', content: [] },
} as any)) as string;
// Export makes standalone files, so it prepends the title as an H1. This is
// the ONE deliberate difference from the vault body (which carries the title
// in frontmatter). The body below the heading still serializes canonically.
expect(md.startsWith('# My Page')).toBe(true);
});
});
@@ -37,7 +37,7 @@ import {
getAttachmentIds,
getProsemirrorContent,
} from '../../common/helpers/prosemirror/utils';
import { convertProseMirrorToMarkdown } from '@docmost/prosemirror-markdown';
import { htmlToMarkdown } from '@docmost/editor-ext';
type AllowedAttachment = { id: string; fileName: string; filePath: string };
@@ -79,8 +79,9 @@ export class ExportService {
prosemirrorJson.content.unshift(titleNode);
}
const pageHtml = jsonToHtml(prosemirrorJson);
if (format === ExportFormat.HTML) {
const pageHtml = jsonToHtml(prosemirrorJson);
return `<!DOCTYPE html>
<html>
<head>
@@ -91,14 +92,11 @@ export class ExportService {
}
if (format === ExportFormat.Markdown) {
// Direct ProseMirror JSON -> Markdown via the canonical converter
// (`@docmost/prosemirror-markdown`). This is the SAME serializer the
// git-sync vault writer feeds (see git-sync `stabilizePageBody`), so an
// exported page body is byte-identical to its vault representation — no
// HTML intermediate, no second markdown layer, no format drift (issue
// #345). The old `<colgroup>` scrub is gone with the HTML step: the
// converter emits GFM tables directly and never produces `<colgroup>`.
return convertProseMirrorToMarkdown(prosemirrorJson);
const newPageHtml = pageHtml.replace(
/<colgroup[^>]*>[\s\S]*?<\/colgroup>/gim,
'',
);
return htmlToMarkdown(newPageHtml);
}
return;
@@ -17,22 +17,6 @@ jest.mock('image-dimensions', () => ({
__esModule: true,
imageDimensionsFromData: () => undefined,
}));
// FileImportTaskService -> PageService -> collaboration.gateway ->
// metrics.registry imports `prom-client`, which is not resolvable in this
// workspace's node_modules (types-only stub, no runtime entry). Metrics are
// disabled on this path, so a virtual no-op mock keeps the module graph loadable.
jest.mock(
'prom-client',
() => ({
collectDefaultMetrics: () => undefined,
Registry: class {},
Histogram: class {},
Gauge: class {},
Counter: class {},
Summary: class {},
}),
{ virtual: true },
);
import { promises as fs } from 'fs';
import * as os from 'os';
@@ -42,17 +26,14 @@ import { ImportService } from './import.service';
/**
* Binding test for issue #228 / review #5: FileImportTaskService.processGenericImport
* is a NON-editor write path, so a zip-imported `.md` page ends up with canonical
* footnotes before persisting: ordered by first reference, reused refs deduped,
* orphan definitions dropped.
* is a NON-editor write path (markdownToHtml -> processHTML -> JSON, never runs
* footnoteSyncPlugin), so it canonicalizes footnotes before persisting. This pins
* that binding the same one import.service has a spec for which previously had
* NO spec at all.
*
* Since #345 the `.md` parse runs `normalizeForeignMarkdown` ->
* `markdownToProseMirror` -> `jsonToHtml` (feeding the shared HTML attachment /
* link pipeline) -> `processHTML` -> `canonicalizeFootnotes`. The parser assigns
* fresh `fn-*` ids, so we assert by definition BODY order rather than the source
* labels. The conversion is REAL (a real ImportService, its createYdoc stubbed);
* the filesystem is a real temp dir with one .md file; the DB transaction is
* stubbed to capture the persisted page content.
* The markdown -> HTML -> ProseMirror conversion is REAL (a real ImportService,
* its createYdoc stubbed); the filesystem is a real temp dir with one .md file;
* the DB transaction is stubbed to capture the persisted page content.
*/
// Out-of-order references (c, a, b), a REUSED reference ([^a] twice), and an
@@ -68,14 +49,13 @@ const MARKDOWN = [
'[^z]: orphan note',
].join('\n');
/** Definition body texts of the (single) footnotesList, in list order. */
function footnoteListBodies(content: any): string[] {
function footnoteListIds(content: any): string[] {
const list = (content?.content ?? []).find(
(n: any) => n.type === 'footnotesList',
);
return (list?.content ?? [])
.filter((n: any) => n.type === 'footnoteDefinition')
.map((n: any) => n.content?.[0]?.content?.[0]?.text);
.map((n: any) => n.attrs?.id);
}
// A permissive chainable stub for the spaces lookup (selectFrom(...).select(...)
@@ -91,127 +71,80 @@ function chainable(result: any): any {
return proxy;
}
/**
* Run one markdown file through the REAL zip-import pipeline
* (`processGenericImport` -> `markdownToProseMirror` -> `jsonToHtml` ->
* `processHTML`/`htmlToJson`) and return the persisted page `content`. This is
* the server-specific PM->HTML->PM hop that the package's own PM<->MD tests do
* NOT cover.
*/
async function runZipImport(markdown: string): Promise<any> {
const extractDir = await fs.mkdtemp(path.join(os.tmpdir(), 'fit-canon-'));
await fs.writeFile(path.join(extractDir, 'note.md'), markdown, 'utf-8');
const importService = new ImportService(
{} as any,
{} as any,
{} as any,
{} as any,
);
jest
.spyOn(importService as any, 'createYdoc')
.mockResolvedValue(Buffer.from([]) as any);
let captured: any = null;
const trx = {
insertInto: (table: string) => ({
values: (v: any) => {
if (table === 'pages') captured = v;
return { execute: async () => {} };
},
}),
};
const db: any = {
selectFrom: () => chainable({ slug: 'space-slug' }),
transaction: () => ({ execute: (fn: any) => fn(trx) }),
};
const importAttachmentService = {
processAttachments: async ({ html }: any) => html,
};
const service = new FileImportTaskService(
{} as any, // storageService
importService as any,
{ nextPagePosition: async () => 'a0' } as any,
{ insertBacklink: jest.fn() } as any,
db,
importAttachmentService as any,
{ emit: jest.fn() } as any,
{ logBatchWithContext: jest.fn() } as any,
);
const fileTask: any = {
id: 'task-1',
source: 'generic',
spaceId: 'space-1',
workspaceId: 'ws-1',
creatorId: 'user-1',
};
try {
await service.processGenericImport({ extractDir, fileTask });
expect(captured).toBeTruthy();
return captured.content;
} finally {
await fs.rm(extractDir, { recursive: true, force: true });
}
}
/** Find the first node of a given type anywhere in a PM content tree. */
function findFirst(node: any, type: string): any {
if (!node || typeof node !== 'object') return null;
if (node.type === type) return node;
for (const child of node.content ?? []) {
const hit = findFirst(child, type);
if (hit) return hit;
}
return null;
}
describe('FileImportTaskService.processGenericImport — footnote canonicalization (#228)', () => {
it('orders footnotes by first reference, dedupes reuse, and drops orphans on zip import', async () => {
const content = await runZipImport(MARKDOWN);
// Definitions ordered by FIRST REFERENCE (C, A, B), NOT the markdown
// definition order (A, B, C). Ids are the parser's fresh `fn-*`, so pin
// the BODIES.
expect(footnoteListBodies(content)).toEqual(['note C', 'note A', 'note B']);
// Orphan [^z] dropped; reused [^a] collapses to one definition; one list.
expect(footnoteListBodies(content)).not.toContain('orphan note');
const lists = (content.content ?? []).filter(
(n: any) => n.type === 'footnotesList',
const extractDir = await fs.mkdtemp(path.join(os.tmpdir(), 'fit-canon-'));
await fs.writeFile(path.join(extractDir, 'note.md'), MARKDOWN, 'utf-8');
// Real ImportService for the html -> JSON conversion; stub the yjs encode.
const importService = new ImportService(
{} as any,
{} as any,
{} as any,
{} as any,
);
expect(lists).toHaveLength(1);
expect(
footnoteListBodies(content).filter((b) => b === 'note A'),
).toHaveLength(1);
});
jest
.spyOn(importService as any, 'createYdoc')
.mockResolvedValue(Buffer.from([]) as any);
// #345 F4: the zip path routes markdown through jsonToHtml -> processHTML ->
// htmlToJson (the shared HTML attachment pipeline). #345's headline is LOSSLESS
// image width/align via the `<!--img {...}-->` comment; a callout carries its
// `type`. This asserts those survive the PM->HTML->PM hop — the one hop the
// package's PM<->MD suite does not exercise.
it('preserves image width/align and callout type through the PM->HTML->PM hop', async () => {
const md = [
'# Doc',
'',
'![a picture](https://example.com/i.png) <!--img {"width":"320","align":"left"}-->',
'',
':::warning',
'Careful now.',
':::',
].join('\n');
let captured: any = null;
const trx = {
insertInto: (table: string) => ({
values: (v: any) => {
if (table === 'pages') captured = v;
return { execute: async () => {} };
},
}),
};
const db: any = {
selectFrom: () => chainable({ slug: 'space-slug' }),
transaction: () => ({ execute: (fn: any) => fn(trx) }),
};
const content = await runZipImport(md);
const importAttachmentService = {
processAttachments: async ({ html }: any) => html,
};
const backlinkRepo = { insertBacklink: jest.fn() };
const eventEmitter = { emit: jest.fn() };
const auditService = { logBatchWithContext: jest.fn() };
const image = findFirst(content, 'image');
expect(image).toBeTruthy();
// The lossless sizing/alignment must survive the HTML hop.
expect(String(image.attrs?.width)).toBe('320');
expect(image.attrs?.align).toBe('left');
const pageService = { nextPagePosition: async () => 'a0' };
const callout = findFirst(content, 'callout');
expect(callout).toBeTruthy();
expect(callout.attrs?.type).toBe('warning');
const service = new FileImportTaskService(
{} as any, // storageService
importService as any,
pageService as any,
backlinkRepo as any,
db,
importAttachmentService as any,
eventEmitter as any,
auditService as any,
);
const fileTask: any = {
id: 'task-1',
source: 'generic',
spaceId: 'space-1',
workspaceId: 'ws-1',
creatorId: 'user-1',
};
try {
await service.processGenericImport({ extractDir, fileTask });
expect(captured).toBeTruthy();
const content = captured.content;
// Reference order is c, a, b (NOT the markdown definition order a, b, c).
expect(footnoteListIds(content)).toEqual(['c', 'a', 'b']);
// Orphan [^z] dropped; reused [^a] collapses to one definition; one list.
expect(footnoteListIds(content)).not.toContain('z');
const lists = (content.content ?? []).filter(
(n: any) => n.type === 'footnotesList',
);
expect(lists).toHaveLength(1);
expect(footnoteListIds(content).filter((id) => id === 'a')).toHaveLength(1);
} finally {
await fs.rm(extractDir, { recursive: true, force: true });
}
});
});
@@ -1,9 +1,6 @@
import { Inject, Injectable, Logger } from '@nestjs/common';
import * as path from 'path';
import {
jsonToHtml,
jsonToText,
} from '../../../collaboration/collaboration.util';
import { jsonToText } from '../../../collaboration/collaboration.util';
import { InjectKysely } from 'nestjs-kysely';
import { KyselyDB } from '@docmost/db/types/kysely.types';
import {
@@ -21,11 +18,9 @@ import { generateSlugId } from '../../../common/helpers';
import { v7 } from 'uuid';
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
import { FileTask, InsertablePage } from '@docmost/db/types/entity.types';
import { canonicalizeFootnotes } from '@docmost/editor-ext';
import { markdownToProseMirror } from '@docmost/prosemirror-markdown';
import { markdownToHtml, canonicalizeFootnotes } from '@docmost/editor-ext';
import { getProsemirrorContent } from '../../../common/helpers/prosemirror/utils';
import { formatImportHtml } from '../utils/import-formatter';
import { normalizeForeignMarkdown } from '../utils/foreign-markdown';
import {
buildAttachmentCandidates,
collectMarkdownAndHtmlFiles,
@@ -466,18 +461,7 @@ export class FileImportTaskService {
content = await fs.readFile(absPath, 'utf-8');
if (page.fileExtension.toLowerCase() === '.md') {
// Parse markdown with the single canonical converter
// (`@docmost/prosemirror-markdown`), after normalizing foreign
// reference footnotes, then serialize to HTML so the shared HTML
// pipeline below (processAttachments + formatImportHtml +
// processHTML) keeps handling `.md` and `.html` imports
// uniformly. The markdown PARSE no longer goes through the
// editor-ext markdown layer (issue #345) — the drift source is
// gone. The PM -> HTML -> PM hop that follows is lossless
// plumbing for attachment/link resolution, NOT a second parse.
content = jsonToHtml(
await markdownToProseMirror(normalizeForeignMarkdown(content)),
);
content = await markdownToHtml(content);
}
} catch (err: any) {
if (err?.code === 'ENOENT') {
@@ -516,12 +500,10 @@ export class FileImportTaskService {
this.importService.extractTitleAndRemoveHeading(pmState);
// Canonicalize footnote topology on this non-editor write path
// (the HTML pipeline's processHTML never runs footnoteSyncPlugin), so
// a zip-imported page's footnotes are reference-ordered, deduped, and
// (markdownToHtml/processHTML never runs footnoteSyncPlugin), so a
// zip-imported page's footnotes are reference-ordered, deduped, and
// orphan-free like the editor's invariant (issue #228). Pure +
// idempotent + shape-safe; a footnote-free doc is unchanged. (For a
// `.md` file the package parser already yields canonical footnotes,
// so this is a no-op there.)
// idempotent + shape-safe; a footnote-free doc is unchanged.
// (Future consolidation, architecture B: like import.service, this
// path persists directly rather than via PageService — a shared
// "prepare JSON for persist" helper would centralize this call.)
@@ -12,19 +12,13 @@ import { canonicalizeFootnotes } from '@docmost/editor-ext';
/**
* Integration-ish test for the USER-FACING markdown import path
* (`ImportService.importPage`). It exercises the REAL markdown -> ProseMirror
* conversion and asserts the stored page's footnotes are canonical: ordered by
* FIRST REFERENCE (not markdown definition order), reused references deduped to a
* single definition, and orphan definitions dropped.
*
* Since #345 the markdown parse runs through the canonical package
* (`normalizeForeignMarkdown` -> `markdownToProseMirror`), which owns this
* canonicalization: the input's GFM `[^id]` reference footnotes are normalized to
* inline `^[…]`, and the parser assigns fresh sequential ids (`fn-*`) in
* reference order while merging identical bodies so we assert by definition
* BODY order, not by the source labels. `canonicalizeFootnotes` remains wired as
* an idempotent safety net (issue #228) and is a no-op on this already-canonical
* output.
* (`ImportService.importPage`). It exercises the REAL markdown -> HTML -> JSON
* conversion and asserts that the stored page content has its footnotes
* canonicalized the gap that issue #228 fixes: the import path builds
* ProseMirror JSON directly (never running the editor's footnoteSyncPlugin), so
* before this wiring the stored footnotes kept the markdown's physical
* definition order (out of order vs. references), retained orphan definitions,
* and did not collapse reused references.
*
* The DB/ydoc side-effects are stubbed: `getNewPagePosition` (DB query) and
* `createYdoc` (Yjs encode) are spied, and `pageRepo.insertPage` captures the
@@ -73,14 +67,24 @@ function makeService() {
}
/** List the footnote-definition ids of the (single) footnotesList, in order. */
/** Definition body texts of the (single) footnotesList, in list order. */
function footnoteListBodies(content: any): string[] {
function footnoteListIds(content: any): string[] {
const list = (content.content ?? []).find(
(n: any) => n.type === 'footnotesList',
);
return (list?.content ?? [])
if (!list) return [];
return (list.content ?? [])
.filter((n: any) => n.type === 'footnoteDefinition')
.map((n: any) => n.content?.[0]?.content?.[0]?.text);
.map((n: any) => n.attrs?.id);
}
function definitionText(content: any, id: string): string | undefined {
const list = (content.content ?? []).find(
(n: any) => n.type === 'footnotesList',
);
const def = (list?.content ?? []).find(
(n: any) => n.type === 'footnoteDefinition' && n.attrs?.id === id,
);
return def?.content?.[0]?.content?.[0]?.text;
}
describe('ImportService.importPage — footnote canonicalization (#228)', () => {
@@ -97,23 +101,23 @@ describe('ImportService.importPage — footnote canonicalization (#228)', () =>
const content = getCaptured().content;
expect(content).toBeTruthy();
// Definitions ordered by FIRST REFERENCE (C, A, B) — NOT the markdown
// definition order (A, B, C) — with the orphan [^z] dropped and the reused
// [^a] collapsed to a single definition. (Ids are the parser's fresh `fn-*`,
// so we pin the BODIES.)
expect(footnoteListBodies(content)).toEqual(['note C', 'note A', 'note B']);
// Reference order is c, a, b (NOT the markdown definition order a, b, c).
expect(footnoteListIds(content)).toEqual(['c', 'a', 'b']);
// Definitions preserved and attached to the right ids.
expect(definitionText(content, 'c')).toBe('note C');
expect(definitionText(content, 'a')).toBe('note A');
expect(definitionText(content, 'b')).toBe('note B');
// Orphan definition [^z] is dropped.
expect(footnoteListBodies(content)).not.toContain('orphan note');
expect(footnoteListIds(content)).not.toContain('z');
// Reused [^a] yields exactly ONE definition, and exactly one list.
const lists = (content.content ?? []).filter(
(n: any) => n.type === 'footnotesList',
);
expect(lists).toHaveLength(1);
expect(
footnoteListBodies(content).filter((b) => b === 'note A'),
).toHaveLength(1);
expect(footnoteListIds(content).filter((id) => id === 'a')).toHaveLength(1);
});
it('is idempotent: canonicalizing the stored output again is a no-op', async () => {
@@ -130,6 +134,6 @@ describe('ImportService.importPage — footnote canonicalization (#228)', () =>
// time must not change it (safe to wire into every write path).
const second = canonicalizeFootnotes(stored);
expect(second).toEqual(stored);
expect(footnoteListBodies(second)).toEqual(['note C', 'note A', 'note B']);
expect(footnoteListIds(second)).toEqual(['c', 'a', 'b']);
});
});
@@ -17,9 +17,7 @@ import {
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
import { TiptapTransformer } from '@hocuspocus/transformer';
import * as Y from 'yjs';
import { canonicalizeFootnotes } from '@docmost/editor-ext';
import { markdownToProseMirror } from '@docmost/prosemirror-markdown';
import { normalizeForeignMarkdown } from '../utils/foreign-markdown';
import { markdownToHtml, canonicalizeFootnotes } from '@docmost/editor-ext';
import {
FileTaskStatus,
FileTaskType,
@@ -87,13 +85,11 @@ export class ImportService {
const extracted = this.extractTitleAndRemoveHeading(prosemirrorState);
const title = extracted.title;
// The markdown path now canonicalizes footnotes itself (the package parser),
// but the HTML path (processHTML -> htmlToJson) does NOT run the editor's
// footnoteSyncPlugin, so an imported HTML doc can keep its source's PHYSICAL
// definition order (out of order vs. references), retain orphan definitions,
// and not be deduped. Canonicalize before persisting so the stored page
// matches the editor's invariant (issue #228); it is an idempotent no-op on
// the already-canonical markdown output.
// Imported markdown/HTML is built via markdownToHtml -> htmlToJson, which
// never runs the editor's footnoteSyncPlugin, so the footnote topology keeps
// the source's PHYSICAL definition order (out of order vs. references),
// retains orphan definitions, and is not deduped. Canonicalize before
// persisting so the stored page matches the editor's invariant (issue #228).
// Pure + idempotent + shape-safe: a doc with no footnotes is unchanged.
// (Future consolidation, architecture B: this import path persists directly
// via pageRepo.insertPage rather than through PageService.createPage, so the
@@ -137,15 +133,12 @@ export class ImportService {
}
async processMarkdown(markdownInput: string): Promise<any> {
// Canonical markdown -> ProseMirror JSON directly via
// `@docmost/prosemirror-markdown` (issue #345) — no HTML intermediate and no
// second editor-ext markdown layer. Foreign markdown surfaces the strict
// canonical parser does not accept (GFM `[^id]` reference footnotes) are
// rewritten to the canonical inline form by `normalizeForeignMarkdown` first.
// The HTML-cleanup pass (`normalizeImportHtml`) is intentionally skipped here:
// it targets foreign *HTML* (Notion/XWiki), which only ever arrives on the
// `.html` path (`processHTML`), never as canonical markdown.
return markdownToProseMirror(normalizeForeignMarkdown(markdownInput));
try {
const html = await markdownToHtml(markdownInput);
return this.processHTML(html);
} catch (err) {
throw err;
}
}
async processHTML(htmlInput: string): Promise<any> {
@@ -1,218 +0,0 @@
import {
convertProseMirrorToMarkdown,
markdownToProseMirror,
} from '@docmost/prosemirror-markdown';
import { normalizeForeignMarkdown } from './foreign-markdown';
/**
* STEP 2 goldens for issue #345: the foreign-markdown normalizer that runs at the
* import boundary BEFORE the strict canonical parser (`markdownToProseMirror`).
*
* Two layers:
* 1. PURE stringstring cases pinning the normalizer's own behavior (GFM
* reference footnotes inline `^[…]`).
* 2. END-TO-END acceptance: for a foreign corpus, `normalizeForeignMarkdown`
* then `markdownToProseMirror` then `convertProseMirrorToMarkdown` must leave
* NO literal `[^id]` / `:::` garbage in the document and must re-export in the
* canonical forms.
*/
describe('normalizeForeignMarkdown — GFM reference footnotes', () => {
it('inlines a single-line reference footnote and drops its definition', () => {
const out = normalizeForeignMarkdown(
'A note[^1] here.\n\n[^1]: The definition.',
);
expect(out).toBe('A note^[The definition.] here.\n');
});
it('inlines every reference to a reused id (downstream dedups)', () => {
const out = normalizeForeignMarkdown(
'X[^a] and Y[^a].\n\n[^a]: shared.',
);
expect(out).toBe('X^[shared.] and Y^[shared.].\n');
});
it('joins indented continuation lines of a definition with a space', () => {
const out = normalizeForeignMarkdown(
'See[^n].\n\n[^n]: line one\n line two',
);
expect(out).toBe('See^[line one line two].\n');
});
it('never rewrites a reference inside a fenced code block', () => {
const out = normalizeForeignMarkdown(
'```\ncode[^1] here\n```\n\n[^1]: def.',
);
expect(out).toContain('code[^1] here');
// The (now orphaned) definition line is still removed.
expect(out).not.toContain('[^1]: def.');
});
it('never rewrites a reference inside an INLINE-code span (backticks)', () => {
// The `[^1]` inside backticks is literal code and must survive verbatim;
// the one outside is rewritten. (Bug #1: only fenced blocks were protected.)
const out = normalizeForeignMarkdown(
'Use `arr[^1]` in code but note[^1] in prose.\n\n[^1]: def.',
);
expect(out).toBe('Use `arr[^1]` in code but note^[def.] in prose.\n');
});
it('escapes brackets in a body so an unbalanced ] cannot truncate the footnote', () => {
// A foreign definition body with a stray `]` would, unescaped, close the
// canonical `^[...]` early and leak the tail as text (bug #2). The body's
// brackets are backslash-escaped so the footnote stays whole.
const out = normalizeForeignMarkdown(
'Ref[^1] here.\n\n[^1]: see item ] and [more] later',
);
expect(out).toBe('Ref^[see item \\] and \\[more\\] later] here.\n');
// The tokenizer must see exactly one unescaped closing bracket (our own).
expect(out.match(/(?<!\\)\]/g)).toHaveLength(1);
});
it('leaves a reference with no matching definition literal (no body to inline)', () => {
const out = normalizeForeignMarkdown('Dangling[^x] ref.');
expect(out).toBe('Dangling[^x] ref.');
});
it('returns the input unchanged when there are no reference footnotes', () => {
const md = '# Title\n\nJust text with `inline code` and a [link](/x).';
expect(normalizeForeignMarkdown(md)).toBe(md);
});
it('does NOT touch callout surfaces — the canonical parser handles them', () => {
const callouts = ':::info\nHi\n:::\n\n> [!warning]\n> Careful';
expect(normalizeForeignMarkdown(callouts)).toBe(callouts);
});
it('strips a leading YAML front-matter block (Obsidian/Hugo/git-sync files)', () => {
const out = normalizeForeignMarkdown(
'---\ntitle: My Page\ntags: [a, b]\n---\n\n# Heading\n\nBody.',
);
expect(out).toBe('# Heading\n\nBody.');
// The front-matter must not leak into the body as a setext heading.
expect(out).not.toContain('title: My Page');
expect(out).not.toContain('---');
});
it('does not strip a horizontal rule that is not leading front-matter', () => {
const md = 'Intro paragraph.\n\n---\n\nAfter the rule.';
expect(normalizeForeignMarkdown(md)).toBe(md);
});
it('is linear on a document with thousands of definitions (no quadratic blowup)', () => {
// F2(a): the pass-2 rewrite must be O(text), not O(text × defs). Build a
// pathological doc (many defs + many plain text lines) and assert it
// completes well under a second — a quadratic implementation took ~14s.
const N = 4000;
const refs = Array.from({ length: N }, (_, i) => `line ${i} plain text`).join('\n');
const defs = Array.from({ length: N }, (_, i) => `[^n${i}]: def ${i}`).join('\n');
const doc = `start[^n0] and[^n${N - 1}] end\n\n${refs}\n\n${defs}`;
const t0 = Date.now();
const out = normalizeForeignMarkdown(doc);
const elapsed = Date.now() - t0;
expect(elapsed).toBeLessThan(2000);
// Sanity: the two real references were still inlined.
expect(out).toContain('^[def 0]');
expect(out).toContain(`^[def ${N - 1}]`);
});
it('is bounded on a long unclosed backtick run (no inline-split ReDoS)', () => {
// F2(b): a huge unterminated backtick run must not cause quadratic
// backtracking in the inline-code split. Oversized lines skip the split
// entirely (left untouched), so this returns promptly.
const line = 'x' + '`'.repeat(200000);
const doc = `${line}\n\n[^1]: def`;
const t0 = Date.now();
normalizeForeignMarkdown(doc);
expect(Date.now() - t0).toBeLessThan(2000);
});
it('does not crash or slow down on thousands of prefix-chain definition ids', () => {
// F7: the rewrite must use a FIXED generic scanner, not an alternation built
// from the ids. A `(a|aa|aaa|…)` alternation over prefix-chain ids blows the
// V8 regex compiler (FATAL RegExpCompiler Allocation failed — uncatchable,
// kills the process). A fixed scanner has no id-dependent compilation cost.
const N = 4000;
const ids = Array.from({ length: N }, (_, i) => 'a'.repeat(i + 1));
const defs = ids.map((id) => `[^${id}]: body ${id.length}`).join('\n');
const doc = `ref[^${ids[0]}] and[^${ids[N - 1]}] end\n\n${defs}`;
const t0 = Date.now();
const out = normalizeForeignMarkdown(doc);
expect(Date.now() - t0).toBeLessThan(2000);
// Prefix disambiguation is correct: [^a] and [^aaaa...] inline their OWN body.
expect(out).toContain('^[body 1]');
expect(out).toContain(`^[body ${N}]`);
});
it('strips a CRLF (Windows) front-matter block, not just LF', () => {
// F9: the line-anchored regex needs LF after the opening `---`, so a Windows
// file (`---\r\n…`) would slip past the strip and leak the front-matter into
// the body. normalizeForeignMarkdown normalizes CRLF -> LF first.
const out = normalizeForeignMarkdown(
'---\r\ntitle: Foo\r\ntags: [a]\r\n---\r\n\r\n# Heading\r\n\r\nBody.',
);
expect(out).toBe('# Heading\n\nBody.');
expect(out).not.toContain('title: Foo');
expect(out).not.toContain('---');
});
it('strips front-matter whose value contains a triple-dash (line-anchored)', () => {
// F8: the block must close only on a `\n---` LINE, not the first inline
// `---`. A value like `title: Q1 --- Q2` must not truncate the front-matter
// and leak the rest (author/closing ---) into the body.
const out = normalizeForeignMarkdown(
'---\ntitle: Q1 --- Q2 results\nauthor: bob\n---\n\nReal body.',
);
expect(out).toBe('Real body.');
expect(out).not.toContain('author: bob');
expect(out).not.toContain('Q2 results');
});
});
describe('foreign markdown import acceptance (normalizer + canonical parser)', () => {
const FOREIGN = [
'# Doc',
'',
'Body refs [^c] and [^a] and [^b] and again [^a].',
'',
':::info',
'A legacy callout.',
':::',
'',
'| h1 | h2 |',
'| --- | --- |',
'| 1 | 2 |',
'',
'[^a]: note A',
'[^b]: note B',
'[^c]: note C',
'[^z]: orphan note',
].join('\n');
it('leaves no literal [^id] or ::: in the imported doc and re-exports canonically', async () => {
const normalized = normalizeForeignMarkdown(FOREIGN);
const doc = await markdownToProseMirror(normalized);
const reexport = convertProseMirrorToMarkdown(doc);
// No foreign garbage leaks into the document.
expect(reexport).not.toMatch(/\[\^/); // no reference footnote refs/defs
expect(reexport).not.toContain(':::'); // no legacy callout fences
// Canonical forms are present.
expect(reexport).toContain('^[note C]');
expect(reexport).toContain('> [!info]');
expect(reexport).toContain('| h1 | h2 |');
// Footnotes: ordered by first reference (C, A, B), reused [^a] deduped to one,
// orphan [^z] dropped (it had no reference after normalization).
const list = doc.content.find((n: any) => n.type === 'footnotesList');
const bodies = list.content.map(
(d: any) => d.content[0].content[0].text,
);
expect(bodies).toEqual(['note C', 'note A', 'note B']);
expect(bodies).not.toContain('orphan note');
expect(
doc.content.filter((n: any) => n.type === 'footnotesList'),
).toHaveLength(1);
});
});
@@ -1,265 +0,0 @@
/**
* Foreign-markdown normalizer an input-liberal / output-canonical adapter that
* runs at the IMPORT boundary, BEFORE the canonical parser
* (`markdownToProseMirror` from `@docmost/prosemirror-markdown`).
*
* The canonical parser is deliberately STRICT: it only understands Docmost's
* canonical markdown surface (Obsidian-style `> [!type]` callouts, Pandoc/Obsidian
* inline footnotes `^[body]`, lossless `![alt](src) <!--img {...}-->` images, ).
* Import, however, ingests FOREIGN files (GitHub/GFM, Notion, old Docmost
* exports). Those use surfaces the canonical parser does not accept, most notably
* GitHub-flavoured *reference* footnotes:
*
* Text with a note[^1] and another[^long].
*
* [^1]: The first definition.
* [^long]: A second one.
*
* Left untouched, the parser does NOT recognise `[^id]` (it only parses `^[body]`),
* so the reference leaks as literal text and worse, the trailing `[^id]: def`
* line is a valid CommonMark *link-reference definition*, so `[^id]` is silently
* rendered as a bogus link. This normalizer rewrites reference footnotes into the
* canonical inline form so the parser materialises real footnote nodes.
*
* This is a TEXT pre-pass, NOT a second parser fork: it does not re-implement any
* converter logic. Callout surfaces (`:::type` and `> [!type]`) are intentionally
* NOT touched here the canonical parser already accepts BOTH natively (its
* `preprocessCallouts` pass), so normalizing them would be redundant and would
* only risk degrading the parser's nesting/code-fence-aware handling.
*/
/** Matches a fenced code block delimiter (``` or ~~~), capturing the marker run. */
const CODE_FENCE_RE = /^(\s*)(`{3,}|~{3,})/;
/**
* Matches a GFM footnote DEFINITION line: `[^id]: body`. The id is any run of
* non-`]` characters; the body is the remainder of the line (possibly empty).
*/
const FOOTNOTE_DEF_RE = /^\[\^([^\]]+)\]:[ \t]?(.*)$/;
/** True when a line is a code-fence delimiter that toggles fenced-code state. */
function fenceMarker(line: string): string | null {
const m = line.match(CODE_FENCE_RE);
return m ? m[2] : null;
}
/** True when a line is indented (leading space/tab) and not blank — a continuation. */
function isIndentedContinuation(line: string): boolean {
return /^[ \t]+\S/.test(line);
}
function escapeRegExp(value: string): string {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
/**
* Backslash-escape any square bracket in a footnote body before it is wrapped in
* `^[...]`. The canonical inline-footnote tokenizer scans the body with bracket
* balancing and closes on the first UNMATCHED `]`, so an unbalanced bracket in a
* foreign definition (e.g. `[^1]: see item ] later`) would otherwise truncate the
* footnote and leak the tail as literal text. Escaping every `[`/`]` makes the
* body an inert run of characters the tokenizer then closes only on our own
* closing `]`. (A balanced `[link](url)` inside a body still round-trips because
* the escaped form renders the literal brackets, which is the safe reading for a
* footnote body; the alternative brittle balance tracking risks worse.)
*/
function escapeFootnoteBody(body: string): string {
return body.replace(/[[\]]/g, '\\$&');
}
/**
* Rewrite every `[^id]` reference on a line to its `^[body]` form, but ONLY in the
* text OUTSIDE inline-code spans. A `[^id]` inside backticks is literal code
* content and must be preserved verbatim (a footnote ref never lives inside code).
* We split the line on inline-code spans (paired backtick runs) and rewrite only
* the non-code segments.
*/
// Above this length a single line is not split into inline-code spans (see
// below). A genuine markdown line carrying a footnote reference is never tens of
// KB; the cap only bypasses the inline-code protection for pathological lines.
const INLINE_SPLIT_MAX_LINE = 8192;
function rewriteRefsOutsideInlineCode(
line: string,
replace: (text: string) => string,
): string {
// The inline-code split alternation `(`+)(?:(?!\1)[\s\S])*\1` backtracks
// quadratically on a long UNCLOSED backtick run (its middle can consume the
// rest of the line, then fail to find a closing run and retry from each
// position). On an untrusted import this is a request-thread ReDoS. A real
// footnote line is short, so for an oversized line we skip the inline-code
// protection entirely and leave the line UNTOUCHED (rewriting it wholesale
// could corrupt a `[^id]` that legitimately lives inside inline code). This is
// a conservative bypass: an over-8KB line simply does not get its reference
// footnotes inlined — acceptable for a pathological input.
if (line.length > INLINE_SPLIT_MAX_LINE) return line;
// Alternation: an inline-code span (one or more backticks, then anything up to
// the SAME run of backticks) OR a run of non-backtick text. Unterminated
// backticks fall through as ordinary text (matched by the second branch on the
// leftover), so a stray backtick never swallows the rest of the line.
const parts = line.match(/(`+)(?:(?!\1)[\s\S])*\1|[^`]+|`+/g);
if (!parts) return line;
return parts
.map((seg) => (seg.startsWith('`') ? seg : replace(seg)))
.join('');
}
/**
* Convert GFM reference footnotes (`[^id]` + `[^id]: def`) into canonical inline
* footnotes (`^[def]`).
*
* - Definitions are collected first (a leading `[^id]: text` line plus any
* immediately-following indented continuation lines, joined with a space) and
* removed from the output.
* - Each in-text reference `[^id]` for which a definition was found is replaced by
* `^[def]`. References with no matching definition are left literal (there is no
* body to inline; the parser fails them open the same way).
* - Code is respected on both passes: `[^id]` inside a fenced ``` / ~~~ block is
* never rewritten and a `[^id]:` line inside a fence is never a definition; and
* on the rewrite pass a `[^id]` inside an INLINE-code span (backticks) is left
* literal too.
* - The inlined body is bracket-escaped so an unbalanced `[`/`]` in a foreign
* definition cannot truncate the resulting `^[...]` footnote.
*
* Deduplication / reference-ordering / orphan-dropping of the resulting footnotes
* is handled downstream by the canonical parser (`assembleFootnotes`); this pass
* only changes the surface syntax.
*/
function convertReferenceFootnotes(markdown: string): string {
const lines = markdown.split('\n');
// Pass 1: collect definitions and mark their lines for removal.
const defs = new Map<string, string>();
const dropped = new Array<boolean>(lines.length).fill(false);
let inFence = false;
let fence = '';
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
const marker = fenceMarker(line);
if (inFence) {
if (marker && marker[0] === fence[0] && marker.length >= fence.length) {
inFence = false;
fence = '';
}
continue;
}
if (marker) {
inFence = true;
fence = marker;
continue;
}
const def = line.match(FOOTNOTE_DEF_RE);
if (!def) continue;
const id = def[1];
const body: string[] = [def[2].trim()];
dropped[i] = true;
// Consume immediately-following indented continuation lines (GFM lazy
// continuation is not supported by design — keep it simple and predictable).
let j = i + 1;
while (j < lines.length && isIndentedContinuation(lines[j])) {
body.push(lines[j].trim());
dropped[j] = true;
j++;
}
i = j - 1;
// Last definition wins for a duplicated id (matches CommonMark link-ref
// semantics closely enough for a foreign-input adapter).
defs.set(id, body.filter((s) => s.length > 0).join(' '));
}
if (defs.size === 0) {
return markdown;
}
// ONE fixed, generic scanner regex — NOT one built from the definition ids.
// It matches ANY `[^id]` shape, and the replacer decides per match via a map
// lookup whether that id is a real definition (replace) or not (leave as-is).
// This is genuinely O(total text) with no per-document regex compilation.
//
// Do NOT rebuild this as an alternation over `[...defs.keys()]`: a giant
// `(id1|id2|...)` alternation over thousands of ids can blow the V8 regex
// compiler's stack — a fatal, UNCATCHABLE "RegExpCompiler Allocation failed"
// on prefix-chain ids (`a`, `aa`, `aaa`, ...) that kills the whole process
// (worse than the earlier per-def thread-hang). A fixed scanner has no
// id-dependent compilation cost and cannot blow up.
const refRe = /\[\^([^\]]+)\]/g;
const rewriteSegment = (segment: string): string =>
segment.replace(refRe, (whole, id: string) => {
const body = defs.get(id);
// Only real definitions are inlined; an unknown id is left literal (same as
// the old per-def loop, which simply never matched it).
return body === undefined ? whole : `^[${escapeFootnoteBody(body)}]`;
});
// Pass 2: rewrite in-text references, skipping fenced code and dropped lines.
const out: string[] = [];
inFence = false;
fence = '';
for (let i = 0; i < lines.length; i++) {
if (dropped[i]) continue;
let line = lines[i];
const marker = fenceMarker(line);
if (inFence) {
out.push(line);
if (marker && marker[0] === fence[0] && marker.length >= fence.length) {
inFence = false;
fence = '';
}
continue;
}
if (marker) {
inFence = true;
fence = marker;
out.push(line);
continue;
}
line = rewriteRefsOutsideInlineCode(line, rewriteSegment);
out.push(line);
}
return out.join('\n');
}
/**
* Strip a single leading YAML front-matter block (`---\n…\n---`). Foreign files
* from Obsidian / Hugo / Jekyll / Notion and Docmost's OWN git-sync page files
* open with front-matter that the canonical parser does not consume, so
* without this it leaks into the body (and `title: Foo` above the closing `---`
* renders as a setext `<h2>` that `extractTitleAndRemoveHeading` can hijack as
* the page title). It is a no-op for front-matter-free input.
*
* LINE-ANCHORED (the same shape the canonical parser uses in
* prosemirror-markdown/page-file.ts): the block opens only on `---\n` at the
* very start and closes only on a `\n---` line. The retired `markdownToHtml`
* strip closed on the FIRST `---` ANYWHERE (an unanchored close), so a value
* containing a triple-dash (e.g. `title: Q1 --- Q2`) truncated the front-matter
* and leaked the rest into the body. An optional leading BOM is tolerated.
*/
const YAML_FRONT_MATTER_RE = /^\uFEFF?---\n[\s\S]*?\n---\n?/;
/**
* Normalize a foreign markdown string into Docmost's canonical markdown surface
* so the strict canonical parser accepts it losslessly: normalize line endings,
* strip a leading YAML front-matter block, then rewrite GFM reference footnotes
* into inline footnotes. Add further fixture-driven foreign-surface cases here as
* they are found.
*/
export function normalizeForeignMarkdown(markdown: string): string {
if (!markdown) return markdown;
// Normalize CRLF -> LF FIRST. The line-anchored front-matter regex requires a
// bare `\n` after the opening `---`, and convertReferenceFootnotes splits on
// `\n`; a Windows/CRLF foreign file (`---\r\n…`) would otherwise slip past the
// front-matter strip and leak into the body. The canonical parser
// (page-file.ts parsePageFile) normalizes the same way before its FRONTMATTER_RE.
const src = markdown.replace(/\r\n/g, '\n');
const withoutFrontMatter = src.replace(YAML_FRONT_MATTER_RE, '').trimStart();
return convertReferenceFootnotes(withoutFrontMatter);
}
@@ -20,6 +20,10 @@ export interface IStripeSeatsSyncJob {
export interface IPageHistoryJob {
pageId: string;
// #370 — intentionality tier the worker stamps on the snapshot. All jobs on
// this queue are trailing idle-flush autosnapshots, so this is 'idle' (absent
// → treated as 'idle' by the processor).
kind?: 'idle';
}
/**
-23
View File
@@ -1,23 +0,0 @@
// Jest stub for @tiptap/react.
//
// The server export/import code paths transitively import editor-ext, whose node
// extensions import from `@tiptap/react`. The real module re-exports all of
// `@tiptap/core` (headless, safe under node) AND adds React view helpers
// (`ReactNodeViewRenderer`, …) that eagerly pull in react-dom — which throws
// `navigator is not defined` under jest's node environment.
//
// So this stub DELEGATES to the real `@tiptap/core` (keeping `mergeAttributes`,
// `Node`, `Mark`, `nodeInputRule`, … working — they are used by
// `jsonToHtml`/`htmlToJson` on the server) and overrides ONLY the React view
// helpers with no-ops. Those helpers are referenced solely inside `addNodeView()`
// — code that runs only in a live browser editor, never on the server; if any
// were actually invoked here it would (correctly) surface as a test failure.
const core = require('@tiptap/core');
module.exports = {
...core,
ReactNodeViewRenderer: () => () => ({}),
NodeViewWrapper: () => null,
NodeViewContent: () => null,
ReactRenderer: class {},
};
-3
View File
@@ -543,9 +543,6 @@ importers:
'@docmost/pdf-inspector':
specifier: 1.9.6
version: 1.9.6
'@docmost/prosemirror-markdown':
specifier: workspace:*
version: link:../../packages/prosemirror-markdown
'@fastify/cookie':
specifier: ^11.0.2
version: 11.0.2