Batch: ai-chat/footnotes/mcp/db/tree + red-team (#163 #181 #164 #173 #168 #180 + #159 8/10) #185

Merged
Ghost merged 17 commits from batch/issues-2026-06-25 into develop 2026-06-25 12:49:15 +03:00
136 changed files with 12333 additions and 1117 deletions

@@ -15,6 +15,38 @@ permissions:
 jobs:
   test:
     runs-on: ubuntu-latest
+    # Real Postgres + Redis so the server integration suite (`*.int-spec.ts`,
+    # behind `pnpm --filter server test:int`) runs in CI (red-team finding #7).
+    # Without it, cost-cap / FK-cascade / jsonb-round-trip / real-apply tests
+    # only ran locally, so regressions in those paths stayed green in CI.
+    # Postgres uses the pgvector image because migrations create vector columns
+    # and global-setup runs `CREATE EXTENSION vector`. Credentials/db match the
+    # defaults in apps/server/test/integration/db.ts + global-setup.ts
+    # (docmost / docmost_dev_pw, maintenance db `docmost`, redis on 6379), so no
+    # TEST_*_URL overrides are needed.
+    services:
+      postgres:
+        image: pgvector/pgvector:pg18
+        env:
+          POSTGRES_USER: docmost
+          POSTGRES_PASSWORD: docmost_dev_pw
+          POSTGRES_DB: docmost
+        ports:
+          - 5432:5432
+        options: >-
+          --health-cmd "pg_isready -U docmost"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+      redis:
+        image: redis:7
+        ports:
+          - 6379:6379
+        options: >-
+          --health-cmd "redis-cli ping"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
     steps:
       - name: Checkout
         uses: actions/checkout@v4
@@ -36,5 +68,12 @@ jobs:
       - name: Build editor-ext
         run: pnpm --filter @docmost/editor-ext build
-      - name: Run tests
+      - name: Run unit tests
         run: pnpm -r test
+      # Integration suite against the real Postgres/Redis services above. Runs
+      # the FK-cascade, cost-cap, jsonb-round-trip and real-apply specs that the
+      # unit run (mocks only) cannot cover. global-setup drops/recreates the
+      # isolated `docmost_test` DB and migrates it to latest.
+      - name: Run server integration tests
+        run: pnpm --filter server test:int

@@ -43,6 +43,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   OpenRouter, etc.; `openai` uses the official provider (real-OpenAI
   reasoning-model request shaping). Chosen explicitly rather than inferred from
   the base URL, since a custom URL can front real OpenAI too. (#175, #177)
+- **Per-MCP-server instructions in the agent prompt.** Each external MCP server
+  now has an admin-authored `instructions` field ("how/when to use this server's
+  tools") that is injected into the agent's system prompt next to that server's
+  tool descriptions. Trusted text, rendered inside the prompt safety sandwich;
+  shown only for a server that actually connected and contributed ≥1 callable
+  tool. (#180)
+- **Footnote multi-backlinks.** A footnote referenced more than once now shows a
+  back-link per reference (↩ a b c …), each scrolling to its own occurrence, like
+  Pandoc/Wikipedia; a single-reference footnote keeps the plain ↩. (#168)
 ### Changed
@@ -78,6 +87,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   are nudged after a paste to refresh stale hit-testing geometry. The caret
   symptom is macOS-specific and was confirmed manually on macOS; the automated
   guard pins the DOM-order invariant, not the caret behavior itself. (#146, #147)
+- **AI chat: the live token counter now ticks between agent steps.** During a
+  multi-step turn the header token badge (and the "Thinking… · N tokens" line)
+  no longer freezes on the previous step's authoritative usage; the current
+  step's estimate is combined per-component with `max`, so the count rises
+  smoothly and never jumps backwards. (#163)
 ## [0.93.0] - 2026-06-21
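
Reviewer sketch for the #180 changelog entry: one way the "connected and contributed ≥1 callable tool" gate could look. The `McpServerState` shape, the `renderMcpInstructions` name, and the heading layout are assumptions for illustration only; the real prompt builder is not part of the hunks shown here.

```typescript
// Hypothetical shape, assumed for this sketch (not the project's real type).
interface McpServerState {
  name: string;
  instructions?: string;
  connected: boolean;
  toolNames: string[];
}

// Emit one instructions block per server that connected, exposes at least one
// callable tool, and has non-empty admin-authored instructions.
function renderMcpInstructions(servers: McpServerState[]): string {
  return servers
    .filter((s) => s.connected && s.toolNames.length > 0)
    .filter((s) => (s.instructions ?? "").trim() !== "")
    .map((s) => `## ${s.name} (tools: ${s.name}_*)\n${(s.instructions ?? "").trim()}`)
    .join("\n\n");
}
```

A server that failed to connect, or whose allowlist left it with zero tools, contributes nothing, so the prompt never describes tools the agent cannot call.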

@@ -711,6 +711,7 @@
   "Authorization header": "Authorization header",
   "Tool allowlist": "Tool allowlist",
   "Optional. Leave empty to allow all tools the server exposes.": "Optional. Leave empty to allow all tools the server exposes.",
+  "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".": "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".",
   "Test": "Test",
   "Available tools": "Available tools",
   "No tools available": "No tools available",
@@ -1078,6 +1079,8 @@
   "Undo": "Undo",
   "Redo": "Redo",
   "Backlinks": "Backlinks",
+  "Back to references": "Back to references",
+  "Back to reference {{label}}": "Back to reference {{label}}",
   "Last updated by": "Last updated by",
   "Last updated": "Last updated",
   "Stats": "Stats",

@@ -406,6 +406,8 @@
   "Footnote {{number}}": "Сноска {{number}}",
   "Go to footnote": "Перейти к сноске",
   "Back to reference": "Вернуться к ссылке",
+  "Back to references": "Вернуться к ссылкам",
+  "Back to reference {{label}}": "Вернуться к ссылке {{label}}",
   "Empty footnote": "Пустая сноска",
   "Math inline": "Строчная формула",
   "Insert inline math equation.": "Вставить математическое выражение в строку.",
@@ -750,6 +752,8 @@
   "Manage API keys for all users in the workspace. View the <anchor>API documentation</anchor> for usage details.": "Управляйте API-ключами для всех пользователей в рабочем пространстве. Смотрите <anchor>документацию по API</anchor> для получения информации об использовании.",
   "View the <anchor>API documentation</anchor> for usage details.": "Смотрите <anchor>документацию по API</anchor> для получения информации об использовании.",
   "View the <anchor>MCP documentation</anchor>.": "Смотрите <anchor>документацию по MCP</anchor>.",
+  "Instructions": "Инструкции",
+  "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".": "Необязательное указание агенту, как и когда использовать инструменты этого сервера. Добавляется в системный промпт. Инструменты сервера именуются с префиксом «<имя сервера>_*».",
   "Sources": "Источники",
   "AI Answers not available for attachments": "Ответы ИИ недоступны для вложений",
   "No answer available": "Ответ недоступен",

@@ -161,7 +161,11 @@
   margin-top: 4px;
   font-size: var(--mantine-font-size-xs);
   color: light-dark(var(--mantine-color-gray-7), var(--mantine-color-dark-1));
-  white-space: pre-wrap;
+  /* NOTE: `white-space: pre-wrap` is intentionally NOT set here. On the
+     rendered markdown <div> it would turn the newlines between block tags
+     (</li>\n<li>, </p>\n<ol>) into visible blank lines/indents on top of the
+     margins. The plain-text fallback <Text> that needs pre-wrap sets it
+     inline itself (see reasoning-block.tsx). */
 }
 .reasoningText p {
@@ -3,6 +3,7 @@ import { Box, Collapse, Group, Text, UnstyledButton } from "@mantine/core";
 import { IconChevronDown } from "@tabler/icons-react";
 import { useTranslation } from "react-i18next";
 import { estimateTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
+import { collapseBlankLines } from "@/features/ai-chat/utils/collapse-blank-lines.ts";
 import { renderChatMarkdown } from "@/features/ai-chat/utils/markdown.ts";
 import classes from "@/features/ai-chat/components/ai-chat.module.css";
@@ -33,7 +34,12 @@ export default function ReasoningBlock({ text, tokens }: ReasoningBlockProps) {
   // Authoritative count wins; otherwise estimate live from the streamed text.
   const count = tokens && tokens > 0 ? tokens : estimateTokens(text);
   const trimmed = text.trim();
-  const html = trimmed ? renderChatMarkdown(trimmed, {}) : "";
+  // Collapse the blank-line gaps the model emits between every list item /
+  // paragraph so the reasoning renders compactly (tight lists, joined
+  // paragraphs); see collapseBlankLines. ONLY here, not in the normal answer.
+  const html = trimmed
+    ? renderChatMarkdown(collapseBlankLines(trimmed), {})
+    : "";
   return (
     <Box className={classes.reasoningBlock} mb={6}>

@@ -0,0 +1,61 @@
import { describe, it, expect } from "vitest";
import { collapseBlankLines } from "@/features/ai-chat/utils/collapse-blank-lines.ts";
import { renderChatMarkdown } from "@/features/ai-chat/utils/markdown.ts";
describe("collapseBlankLines", () => {
it("collapses a run of 2+ newlines to a single newline", () => {
expect(collapseBlankLines("a\n\nb")).toBe("a\nb");
expect(collapseBlankLines("a\n\n\n\nb")).toBe("a\nb");
});
it("keeps single newlines untouched", () => {
expect(collapseBlankLines("a\nb\nc")).toBe("a\nb\nc");
});
it("preserves blank lines INSIDE a fenced code block", () => {
const src = "a\n\n\nb\n\n```\nx\n\n\ny\n```\n\nc";
// Prose blanks collapse; the blank lines between the ``` fences survive.
expect(collapseBlankLines(src)).toBe("a\nb\n```\nx\n\n\ny\n```\nc");
});
it("handles a tilde fence and preserves its interior blanks", () => {
const src = "p\n\n~~~\ncode\n\nmore\n~~~\n\nq";
expect(collapseBlankLines(src)).toBe("p\n~~~\ncode\n\nmore\n~~~\nq");
});
it("leaves an unclosed fence's remaining lines verbatim", () => {
const src = "intro\n\n```\nstill\n\nopen";
expect(collapseBlankLines(src)).toBe("intro\n```\nstill\n\nopen");
});
it("is a no-op for text with no blank lines", () => {
expect(collapseBlankLines("just one line")).toBe("just one line");
});
});
describe("collapseBlankLines + renderChatMarkdown (tight reasoning rendering)", () => {
it("renders a blank-line-separated list as a TIGHT list (no <li><p>)", () => {
const loose =
"Intro paragraph.\n\n- item one\n\n- item two\n\n- item three";
const html = renderChatMarkdown(collapseBlankLines(loose), {});
// Tight list: each <li> holds the text directly, not wrapped in a <p>.
expect(html).toContain("<li>item one</li>");
expect(html).not.toContain("<li><p>");
// The list still parses as a list after the paragraph (not a paragraph+<br>).
expect(html).toContain("<ul>");
expect(html).toContain("<p>Intro paragraph.</p>");
});
it("renders an ordered list (1. 2.) as tight after collapsing", () => {
const loose = "Intro.\n\n1. first\n\n2. second";
const html = renderChatMarkdown(collapseBlankLines(loose), {});
expect(html).toContain("<ol>");
expect(html).toContain("<li>first</li>");
expect(html).not.toContain("<li><p>");
});
it("the loose source WOULD render <li><p> without collapsing (control)", () => {
const loose = "- a\n\n- b";
expect(renderChatMarkdown(loose, {})).toContain("<li><p>");
});
});

@@ -0,0 +1,56 @@
// Pure helper for compact reasoning ("Thinking") rendering. Kept free of React
// so it can be unit-tested in isolation (see collapse-blank-lines.test.ts).
/**
* Collapse runs of 2+ newlines down to a single newline, EXCEPT inside fenced
* code blocks (``` ... ``` or ~~~ ... ~~~), where blank lines are significant.
*
* Why: reasoning models emit thinking with a blank line (`\n\n`) between every
* list item and paragraph. `marked` turns those into "loose" lists (each `<li>`
* wrapped in a `<p>`) and separate `<p>` paragraphs, each carrying a vertical
* margin — so the "Thinking" block renders with large, airy gaps. Removing the
* blank-line gaps yields tight lists (no `<li><p>`) and joined paragraphs. The
* chat markdown renderer runs with `breaks: true`, so a single `\n` still
* becomes a `<br>` — line breaks inside the reasoning are preserved; only the
* empty gaps between blocks disappear. Apply ONLY to reasoning text, never to a
* normal assistant answer (where paragraph spacing is intentional).
*
* Fenced code is preserved verbatim: a fence opens on a line whose first
* non-space characters are ``` or ~~~ and closes on the next line that starts
* with the same fence character. Blank lines between fences (significant for
* code formatting) are never collapsed.
*/
export function collapseBlankLines(text: string): string {
const lines = text.split("\n");
const out: string[] = [];
let inFence = false;
let fenceChar = "";
for (const line of lines) {
const fenceMatch = line.match(/^\s*(`{3,}|~{3,})/);
if (fenceMatch) {
const ch = fenceMatch[1][0];
if (!inFence) {
inFence = true;
fenceChar = ch;
} else if (ch === fenceChar) {
inFence = false;
}
out.push(line);
continue;
}
// Inside a fenced block every line (including blanks) is significant.
if (inFence) {
out.push(line);
continue;
}
// Outside fences: drop blank lines so a `\n\n+` gap collapses to a single
// `\n` between the surrounding content lines.
if (line.trim() === "") continue;
out.push(line);
}
return out.join("\n");
}
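
Reviewer note with a sketch: the helper above closes a fence on any later fence line that uses the same character. CommonMark additionally requires the closing fence to be at least as long as the opening one and to carry no info string, so a four-backtick block that contains a three-backtick line would be closed early. A stricter variant could look like the following; `collapseBlankLinesStrict` is a hypothetical name, not code from this PR.

```typescript
// Hypothetical stricter variant: tracks fence length as well as fence character.
function collapseBlankLinesStrict(text: string): string {
  const out: string[] = [];
  let fence: { ch: string; len: number } | null = null;

  for (const line of text.split("\n")) {
    // A fence line: up to 3 spaces of indent, then a run of 3+ backticks or tildes.
    const m = line.match(/^ {0,3}(`{3,}|~{3,})(.*)$/);
    if (m) {
      const run = m[1];
      if (fence === null) {
        fence = { ch: run[0], len: run.length };
      } else if (
        run[0] === fence.ch &&
        run.length >= fence.len &&
        m[2].trim() === ""
      ) {
        // Same character, long enough, nothing after it: this closes the block.
        fence = null;
      }
      out.push(line);
      continue;
    }
    // Inside a fence keep every line; outside, drop blank lines.
    if (fence !== null || line.trim() !== "") out.push(line);
  }
  return out.join("\n");
}
```

For reasoning text the simpler rule is probably adequate; the stricter rule only matters when a model nests a shorter fence inside a longer one.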

@@ -117,3 +117,55 @@ describe("liveTurnTokens — authoritative path", () => {
     expect(r).toEqual({ reasoning: 0, output: 1, authoritative: false });
   });
 });
describe("liveTurnTokens — combined authoritative + estimate (#163)", () => {
it("ticks the in-flight step above the completed-steps authoritative base", () => {
// The authoritative usage is the sum over COMPLETED steps (step 1). The
// CURRENT step is streaming and its text is NOT in `usage` yet, but it IS in
// the parts -> the running estimate must push the live figure above the base
// so the badge keeps growing between step boundaries.
const longText = "x".repeat(800); // 800 chars -> 200 est output tokens
const r = liveTurnTokens(
msg([{ type: "text", text: longText }], {
usage: { inputTokens: 500, outputTokens: 40 }, // step-1 base: 40 output
}),
);
// max(authOutput=40, estOutput=200) = 200 -> the counter ticks, not frozen.
expect(r.output).toBe(200);
expect(r.authoritative).toBe(true);
});
it("ticks reasoning of the in-flight step above the authoritative reasoning base", () => {
const longReasoning = "r".repeat(400); // 400 chars -> 100 est reasoning
const r = liveTurnTokens(
msg([{ type: "reasoning", text: longReasoning }], {
usage: { inputTokens: 100, outputTokens: 20, reasoningTokens: 20 },
}),
);
// reasoning: max(20, 100) = 100 ; output: max(max(0,20-20)=0, 0) = 0.
expect(r.reasoning).toBe(100);
expect(r.output).toBe(0);
expect(r.authoritative).toBe(true);
});
it("snaps to the authoritative figure once it exceeds the rough estimate", () => {
// Short on-screen text (estimate tiny) but a large authoritative output:
// the exact figure wins at the boundary (the counter never under-reports).
const r = liveTurnTokens(
msg([{ type: "text", text: "abcd" }], {
usage: { inputTokens: 10, outputTokens: 5000 },
}),
);
expect(r.output).toBe(5000);
});
it("is monotonic: max never drops below the authoritative base when the estimate is smaller", () => {
// Mirrors the legacy 'verbatim' tests: estimate < authoritative -> unchanged.
const r = liveTurnTokens(
msg([{ type: "text", text: "tiny" }], {
usage: { inputTokens: 500, outputTokens: 100, reasoningTokens: 30 },
}),
);
expect(r).toEqual({ reasoning: 30, output: 70, authoritative: true });
});
});

@@ -56,39 +56,58 @@ function metadataUsage(message: UIMessage): AuthoritativeUsage | undefined {
 /**
  * Token split for the given (streaming) assistant message.
  *
- * Prefers AUTHORITATIVE `metadata.usage` when the server has attached it (at a
- * step/turn boundary, incl. `reasoningTokens`), so the live counter snaps to the
- * provider's exact figures. Until then it returns a running ESTIMATE summed over
- * the message parts: `reasoning` parts feed the reasoning estimate, `text` parts
- * feed the output estimate. Multi-part / multi-step turns accumulate naturally
- * because every part of the turn is summed.
+ * COMBINES the authoritative server usage with the running text estimate so the
+ * counter ticks in real time AND lands exact. The server only attaches
+ * `metadata.usage` at a step/turn boundary (`finish-step`/`finish`) and it is
+ * CUMULATIVE over COMPLETED steps; it does NOT yet include the in-flight step.
+ * So a multi-step turn that returned the authoritative figure verbatim would
+ * FREEZE between boundaries and jump in steps (issue #163).
+ *
+ * Instead we always compute the running ESTIMATE (chars/≈4 over the message's
+ * `reasoning`/`text` parts, which grows on every streamed delta) and take the
+ * per-component MAX of the authoritative base and the estimate:
+ *   - between boundaries the estimate of the in-flight step ticks the number up;
+ *   - at a boundary the authoritative figure snaps it to exact;
+ *   - because the server's usage is cumulative and we only ever take the max, the
+ *     number is MONOTONIC: it never drops.
  *
  * Providers that don't stream reasoning text still surface a reasoning count once
- * the authoritative usage arrives (`usage.reasoningTokens`); on the pure estimate
- * path such a turn simply shows `reasoning: 0` until then.
+ * the authoritative usage arrives (`max(reasoningTokens, 0)`); on the pure
+ * estimate path (no usage yet) such a turn shows `reasoning: 0` until then.
  */
 export function liveTurnTokens(message: UIMessage | undefined): LiveTurnTokens {
   if (!message) return { reasoning: 0, output: 0, authoritative: false };
-  const usage = metadataUsage(message);
-  if (usage) {
-    // Authoritative branch: outputTokens already INCLUDES reasoning tokens in the
-    // AI SDK usage shape, so subtract reasoning out for the "answer" figure (never
-    // go negative if a provider reports them inconsistently).
-    const reasoning = usage.reasoningTokens ?? 0;
-    const totalOutput = usage.outputTokens ?? 0;
-    const output = Math.max(0, totalOutput - reasoning);
-    return { reasoning, output, authoritative: true };
-  }
-  let reasoning = 0;
-  let output = 0;
+  // Running ESTIMATE over every reasoning/text part; grows on each delta. This
+  // includes the IN-FLIGHT step, which the authoritative usage does not cover yet.
+  let estReasoning = 0;
+  let estOutput = 0;
   for (const part of message.parts ?? []) {
     if (part.type === "reasoning") {
-      reasoning += estimateTokens((part as { text?: string }).text ?? "");
+      estReasoning += estimateTokens((part as { text?: string }).text ?? "");
     } else if (part.type === "text") {
-      output += estimateTokens((part as { text?: string }).text ?? "");
+      estOutput += estimateTokens((part as { text?: string }).text ?? "");
     }
   }
-  return { reasoning, output, authoritative: false };
+  const usage = metadataUsage(message);
+  if (!usage) {
+    // No authoritative usage streamed yet: the estimate IS the live figure.
+    return { reasoning: estReasoning, output: estOutput, authoritative: false };
+  }
+  // Authoritative sum over COMPLETED steps. `outputTokens` already INCLUDES
+  // reasoning in the AI SDK usage shape, so subtract it out for the "answer"
+  // figure (never go negative if a provider reports them inconsistently).
+  const authReasoning = usage.reasoningTokens ?? 0;
+  const authOutput = Math.max(0, (usage.outputTokens ?? 0) - authReasoning);
+  // Per-component max: the in-flight step's estimate ticks above the completed-
+  // steps base between boundaries, and the authoritative figure wins once it
+  // exceeds the (rough) estimate at the next boundary. Monotonic by construction.
+  return {
+    reasoning: Math.max(authReasoning, estReasoning),
+    output: Math.max(authOutput, estOutput),
+    authoritative: true,
+  };
 }
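
Reviewer sketch: the per-component max can be traced with a small standalone model. `combine`, `Usage`, `Live`, and the ceil(chars/4) estimator are simplified stand-ins assumed for illustration; the real `estimateTokens` and `LiveTurnTokens` may differ in rounding and shape.

```typescript
interface Usage {
  outputTokens?: number;
  reasoningTokens?: number;
}
interface Live {
  reasoning: number;
  output: number;
}

// Assumed rough estimator: about four characters per token.
const estimate = (chars: number): number => Math.ceil(chars / 4);

// Combine the completed-steps usage with the running estimate, per component.
function combine(usage: Usage | undefined, reasoningChars: number, textChars: number): Live {
  const estReasoning = estimate(reasoningChars);
  const estOutput = estimate(textChars);
  if (!usage) return { reasoning: estReasoning, output: estOutput };
  const authReasoning = usage.reasoningTokens ?? 0;
  const authOutput = Math.max(0, (usage.outputTokens ?? 0) - authReasoning);
  return {
    reasoning: Math.max(authReasoning, estReasoning),
    output: Math.max(authOutput, estOutput),
  };
}

// Step 1 finished with 40 output tokens; step 2 is still streaming text.
const base: Usage = { outputTokens: 40 };
const samples = [100, 160, 400, 800].map((chars) => combine(base, 0, chars).output);
console.log(samples); // [40, 40, 100, 200]
```

The trace shows one consequence of the design: the badge stays at the authoritative base until the whole-message estimate overtakes it, then rises with each delta.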

@@ -1,25 +1,45 @@
 import { NodeViewContent, NodeViewProps, NodeViewWrapper } from "@tiptap/react";
 import { useTranslation } from "react-i18next";
-import { getFootnoteNumber } from "@docmost/editor-ext";
+import { getFootnoteNumber, getFootnoteRefCount } from "@docmost/editor-ext";
 import classes from "./footnote.module.css";
+/**
+ * A 0-based backlink index -> its lowercase letter label (0 -> "a", 25 -> "z",
+ * 26 -> "aa", ...), matching the Pandoc/Wikipedia "↩ a b c" convention.
+ */
+export function backlinkLabel(index: number): string {
+  let out = "";
+  let x = index;
+  while (x >= 0) {
+    out = String.fromCharCode(97 + (x % 26)) + out;
+    x = Math.floor(x / 26) - 1;
+  }
+  return out;
+}
 /**
  * NodeView for a single footnote definition: a decorative number marker, the
  * editable content (NodeViewContent), and a "↩" back-link to its reference.
  * The number is derived from the document (not stored).
+ *
+ * After #166 a footnote can be referenced more than once (one number, one
+ * definition, N forward links). When it is, the back-link becomes a row of
+ * per-occurrence links (↩ a b c …), each scrolling to its own reference (#168);
+ * a single-reference footnote keeps the plain ↩.
  */
 export default function FootnoteDefinitionView(props: NodeViewProps) {
   const { node, editor } = props;
   const { t } = useTranslation();
   const id = node.attrs.id as string;
-  // Read the cached number from the numbering plugin (computed once per doc
-  // change) rather than recomputing the whole map on every render.
+  // Read the cached number/ref-count from the numbering plugin (computed once
+  // per doc change) rather than recomputing the whole map on every render.
   const number = getFootnoteNumber(editor.state, id) ?? "?";
+  const refCount = getFootnoteRefCount(editor.state, id);
-  const handleBack = (e: React.MouseEvent) => {
+  const jumpTo = (e: React.MouseEvent, index: number) => {
     e.preventDefault();
-    editor.commands.scrollToReference(id);
+    editor.commands.scrollToReference(id, index);
   };
   return (
@@ -42,16 +62,47 @@ export default function FootnoteDefinitionView(props: NodeViewProps) {
       >
         {number}.
       </span>
+      {refCount > 1 ? (
+        // Multiple references -> ↩ followed by one lettered link per occurrence.
+        <span
+          className={classes.backLinks}
+          contentEditable={false}
+          role="group"
+          aria-label={t("Back to references")}
+        >
+          <span className={classes.backLinkArrow} aria-hidden="true">
+            ↩
+          </span>
+          {Array.from({ length: refCount }, (_, i) => (
+            <span
+              key={i}
+              className={classes.backLink}
+              onClick={(e) => jumpTo(e, i)}
+              role="button"
+              aria-label={t("Back to reference {{label}}", {
+                label: backlinkLabel(i),
+              })}
+              title={t("Back to reference {{label}}", {
+                label: backlinkLabel(i),
+              })}
+            >
+              {backlinkLabel(i)}
+            </span>
+          ))}
+        </span>
+      ) : (
+        // Single reference -> the plain ↩ (unchanged behavior).
       <span
         className={classes.backLink}
         contentEditable={false}
-        onClick={handleBack}
+        onClick={(e) => jumpTo(e, 0)}
         role="button"
         aria-label={t("Back to reference")}
         title={t("Back to reference")}
       >
         ↩
       </span>
+      )}
     </NodeViewWrapper>
   );
 }

@@ -1,5 +1,5 @@
-import { describe, it, expect, vi } from "vitest";
-import { render } from "@testing-library/react";
+import { describe, it, expect, vi, afterEach } from "vitest";
+import { render, fireEvent } from "@testing-library/react";
 /**
  * Structural regression guard for #146 (PR #147).
@@ -36,10 +36,14 @@ vi.mock("react-i18next", () => ({
   useTranslation: () => ({ t: (key: string) => key }),
 }));
-// footnote-definition-view reads a cached number from the numbering plugin;
-// stub it so we don't need a live ProseMirror state.
+// footnote-definition-view reads a cached number + reference count from the
+// numbering plugin; stub them so we don't need a live ProseMirror state. The
+// ref-count is a hoisted mutable so a test can drive the single-vs-multi
+// backlink branch (#168). Default 1 = single reference (the #146 cases).
+const { mockRefCount } = vi.hoisted(() => ({ mockRefCount: { value: 1 } }));
 vi.mock("@docmost/editor-ext", () => ({
   getFootnoteNumber: () => 1,
+  getFootnoteRefCount: () => mockRefCount.value,
 }));
 // Mocks so CodeBlockView renders cheaply (no MantineProvider, no matchMedia).
@@ -59,7 +63,8 @@ vi.mock("@mantine/core", () => ({
   ),
 }));
 vi.mock("@/components/common/copy-button", () => ({
-  CopyButton: ({ children }: any) => children({ copied: false, copy: () => {} }),
+  CopyButton: ({ children }: any) =>
+    children({ copied: false, copy: () => {} }),
 }));
 vi.mock("@tabler/icons-react", () => ({
   IconCheck: () => null,
@@ -70,7 +75,9 @@ vi.mock("@/features/editor/components/code-block/mermaid-view.tsx", () => ({
 }));
 import FootnotesListView from "./footnotes-list-view";
-import FootnoteDefinitionView from "./footnote-definition-view";
+import FootnoteDefinitionView, {
+  backlinkLabel,
+} from "./footnote-definition-view";
 import CodeBlockView from "../code-block/code-block-view";
 // Minimal NodeViewProps stub: definition view only touches node.attrs.id and
@@ -141,3 +148,84 @@ describe("#146 editable NodeView contentDOM-first invariant", () => {
     },
   );
 });
// #168: a footnote referenced more than once shows one lettered backlink per
// occurrence (↩ a b c), each scrolling to its own reference; a single-reference
// footnote keeps the plain ↩.
describe("#168 footnote definition multi-backlinks", () => {
afterEach(() => {
// Reset the shared ref-count mock so other tests see a single reference.
mockRefCount.value = 1;
});
const makeProps = () =>
({
node: { attrs: { id: "fn-1" }, textContent: "" },
editor: {
state: {},
isEditable: true,
commands: { scrollToReference: vi.fn() },
},
getPos: () => 0,
updateAttributes: () => {},
deleteNode: () => {},
}) as any;
it("renders one lettered backlink per reference (a, b, c) plus the ↩ arrow", () => {
mockRefCount.value = 3;
const { getByTestId } = render(<FootnoteDefinitionView {...makeProps()} />);
const wrapper = getByTestId("nvw");
const links = wrapper.querySelectorAll('[role="button"]');
expect(Array.from(links).map((l) => l.textContent)).toEqual([
"a",
"b",
"c",
]);
// The ↩ arrow is present (as decorative chrome, not a button).
expect(wrapper.textContent).toContain("↩");
});
it("clicking the n-th backlink scrolls to the n-th occurrence (0-based)", () => {
mockRefCount.value = 3;
const props = makeProps();
const { getByTestId } = render(<FootnoteDefinitionView {...props} />);
const links = getByTestId("nvw").querySelectorAll('[role="button"]');
fireEvent.click(links[1]); // "b"
expect(props.editor.commands.scrollToReference).toHaveBeenCalledWith(
"fn-1",
1,
);
});
it("a single-reference footnote renders just one ↩ (no letters)", () => {
mockRefCount.value = 1;
const props = makeProps();
const { getByTestId } = render(<FootnoteDefinitionView {...props} />);
const wrapper = getByTestId("nvw");
const links = wrapper.querySelectorAll('[role="button"]');
expect(links.length).toBe(1);
expect(links[0].textContent).toBe("↩");
fireEvent.click(links[0]);
expect(props.editor.commands.scrollToReference).toHaveBeenCalledWith(
"fn-1",
0,
);
});
});
// #185 re-review pt 7: backlinkLabel is base-26 (a..z, then aa…). The component
// tests only cover a,b,c (index 0-2); pin the >= 26 carry boundary.
describe("backlinkLabel base-26 boundary (#168)", () => {
it("maps 0->a, 25->z, 26->aa, 27->ab, 51->az, 52->ba", () => {
expect(backlinkLabel(0)).toBe("a");
expect(backlinkLabel(25)).toBe("z");
expect(backlinkLabel(26)).toBe("aa");
expect(backlinkLabel(27)).toBe("ab");
expect(backlinkLabel(51)).toBe("az");
expect(backlinkLabel(52)).toBe("ba");
});
});
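
Reviewer sketch: the labels pinned above follow bijective base-26 numeration (there is no zero digit, which is why 26 maps to "aa" rather than "ba"). A round-trip check makes that property explicit. `backlinkLabel` is repeated here so the block stands alone; `backlinkIndex` is a hypothetical inverse written for this illustration and is not part of the PR.

```typescript
// Copy of the PR's label function, repeated so this sketch is self-contained.
function backlinkLabel(index: number): string {
  let out = "";
  let x = index;
  while (x >= 0) {
    out = String.fromCharCode(97 + (x % 26)) + out;
    x = Math.floor(x / 26) - 1;
  }
  return out;
}

// Hypothetical inverse: treat each letter as a digit 1..26 and shift back to 0-based.
function backlinkIndex(label: string): number {
  let n = 0;
  for (let i = 0; i < label.length; i++) {
    n = n * 26 + (label.charCodeAt(i) - 96); // "a" -> 1 ... "z" -> 26
  }
  return n - 1;
}

// Every index in a sample range should survive label -> index unchanged.
const roundTripOk = Array.from({ length: 1000 }, (_, i) => i).every(
  (i) => backlinkIndex(backlinkLabel(i)) === i,
);
console.log(roundTripOk);
```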

@@ -115,3 +115,18 @@
.backLink:hover { .backLink:hover {
text-decoration: underline; text-decoration: underline;
} }
/* Multi-backlink row (#168): ↩ a b c — one lettered link per reference
occurrence. Sits on the right, after the content, like the single ↩. */
.backLinks {
flex: 0 0 auto;
display: inline-flex;
align-items: baseline;
gap: 0.3em;
user-select: none;
}
.backLinkArrow {
color: var(--mantine-color-dimmed);
font-size: 0.9em;
}

@@ -274,7 +274,10 @@ export function useRestorePageMutation() {
       queryClient.setQueryData<IPage>(["pages", restoredPage.slugId], merge);
     },
     onError: (error) => {
-      notifications.show({ message: t("Failed to restore page"), color: "red" });
+      notifications.show({
+        message: t("Failed to restore page"),
+        color: "red",
+      });
     },
   });
 }
@@ -285,10 +288,10 @@ export function useGetSidebarPagesQuery(
   return useInfiniteQuery({
     queryKey: ["sidebar-pages", data],
     enabled: !!data?.pageId || !!data?.spaceId,
-    queryFn: ({ pageParam }) => getSidebarPages({ ...data, cursor: pageParam, limit: 100 }),
+    queryFn: ({ pageParam }) =>
+      getSidebarPages({ ...data, cursor: pageParam, limit: 100 }),
     initialPageParam: undefined,
-    getNextPageParam: (lastPage) =>
-      lastPage.meta?.nextCursor ?? undefined,
+    getNextPageParam: (lastPage) => lastPage.meta?.nextCursor ?? undefined,
   });
 }
@@ -296,11 +299,14 @@ export function useGetRootSidebarPagesQuery(data: SidebarPagesParams) {
   return useInfiniteQuery({
     queryKey: ["root-sidebar-pages", data.spaceId],
     queryFn: async ({ pageParam }) => {
-      return getSidebarPages({ spaceId: data.spaceId, cursor: pageParam, limit: 100 });
+      return getSidebarPages({
+        spaceId: data.spaceId,
+        cursor: pageParam,
+        limit: 100,
+      });
     },
     initialPageParam: undefined,
-    getNextPageParam: (lastPage) =>
-      lastPage.meta?.nextCursor ?? undefined,
+    getNextPageParam: (lastPage) => lastPage.meta?.nextCursor ?? undefined,
   });
 }
@@ -323,12 +329,17 @@ export function usePageBreadcrumbsQuery(
   });
 }
-export async function fetchAllAncestorChildren(params: SidebarPagesParams) {
+export async function fetchAllAncestorChildren(
+  params: SidebarPagesParams,
+  // `fresh: true` forces a server refetch (staleTime 0) — used by the reconnect
+  // refresh (#159 #8), which must NOT receive the 30-min-cached children.
+  opts?: { fresh?: boolean },
+) {
   // not using a hook here, so we can call it inside a useEffect hook
   const response = await queryClient.fetchQuery({
     queryKey: ["sidebar-pages", params],
     queryFn: () => getAllSidebarPages(params),
-    staleTime: 30 * 60 * 1000,
+    staleTime: opts?.fresh ? 0 : 30 * 60 * 1000,
   });
   const allItems = response.pages.flatMap((page) => page.items);
@@ -347,11 +358,15 @@ export function useRecentChangesQuery(spaceId?: string) {
}); });
} }
export function useCreatedByQuery(params?: { userId?: string; spaceId?: string }) { export function useCreatedByQuery(params?: {
userId?: string;
spaceId?: string;
}) {
const { userId, spaceId } = params ?? {}; const { userId, spaceId } = params ?? {};
return useInfiniteQuery({ return useInfiniteQuery({
queryKey: ["pages-created-by-user", { userId, spaceId }], queryKey: ["pages-created-by-user", { userId, spaceId }],
queryFn: ({ pageParam }) => getCreatedByPages({ userId, spaceId, cursor: pageParam, limit: 15 }), queryFn: ({ pageParam }) =>
getCreatedByPages({ userId, spaceId, cursor: pageParam, limit: 15 }),
initialPageParam: undefined as string | undefined, initialPageParam: undefined as string | undefined,
getNextPageParam: (lastPage) => getNextPageParam: (lastPage) =>
lastPage.meta.hasNextPage ? lastPage.meta.nextCursor : undefined, lastPage.meta.hasNextPage ? lastPage.meta.nextCursor : undefined,


@@ -29,9 +29,11 @@ import {
collectBranchIds, collectBranchIds,
openBranches, openBranches,
closeIds, closeIds,
loadedOpenBranchIds,
} from "@/features/page/tree/utils/utils.ts"; } from "@/features/page/tree/utils/utils.ts";
import { SpaceTreeNode } from "@/features/page/tree/types.ts"; import { SpaceTreeNode } from "@/features/page/tree/types.ts";
import { treeModel } from "@/features/page/tree/model/tree-model"; import { treeModel } from "@/features/page/tree/model/tree-model";
import { socketAtom } from "@/features/websocket/atoms/socket-atom.ts";
import { import {
getPageBreadcrumbs, getPageBreadcrumbs,
getSpaceTree, getSpaceTree,
@@ -39,11 +41,7 @@ import {
import { IPage } from "@/features/page/types/page.types.ts"; import { IPage } from "@/features/page/types/page.types.ts";
import { extractPageSlugId } from "@/lib"; import { extractPageSlugId } from "@/lib";
import { isCompactPageTreeEnabled } from "@/lib/config.ts"; import { isCompactPageTreeEnabled } from "@/lib/config.ts";
import { import { DocTree, ROW_HEIGHT_COMPACT, ROW_HEIGHT_STANDARD } from "./doc-tree";
DocTree,
ROW_HEIGHT_COMPACT,
ROW_HEIGHT_STANDARD,
} from "./doc-tree";
import { SpaceTreeRow } from "./space-tree-row"; import { SpaceTreeRow } from "./space-tree-row";
interface SpaceTreeProps { interface SpaceTreeProps {
@@ -193,6 +191,54 @@ const SpaceTree = forwardRef<SpaceTreeApi, SpaceTreeProps>(function SpaceTree(
[openTreeNodes], [openTreeNodes],
); );
// Latest tree + open-state for the reconnect handler (its closure would
// otherwise read stale snapshots).
const [socket] = useAtom(socketAtom);
const dataRef = useRef(data);
dataRef.current = data;
const openIdsRef = useRef(openIds);
openIdsRef.current = openIds;
// Reconnect refresh (#159 #8): on a socket reconnect, re-fetch and reconcile
// the children of every currently-open, already-loaded branch of THIS space,
// so a move/rename/delete that happened INSIDE a loaded branch while events
// were missed (laptop sleep / wifi gap) is reflected instead of left stale.
// The ROOT level is reconciled separately by the root-query refetch +
// mergeRootTrees; an UNLOADED branch is skipped (lazy-load fetches it fresh on
// expand). No first-connect guard is needed: space-tree usually mounts AFTER
// the initial connect, so every `connect` it sees is a reconnect; the rare
// initial-connect case has an empty tree, so the refresh is a harmless no-op.
useEffect(() => {
if (!socket) return;
const onConnect = async () => {
const effectSpaceId = spaceIdRef.current;
const branchIds = loadedOpenBranchIds(
dataRef.current.filter((n) => n?.spaceId === effectSpaceId),
openIdsRef.current,
);
if (branchIds.length === 0) return;
for (const id of branchIds) {
try {
// `fresh: true` bypasses the 30-min sidebar-pages cache so the
// reconcile sees the server's CURRENT children (handler-order
// independent — no reliance on the global reconnect invalidation).
const fresh = await fetchAllAncestorChildren(
{ pageId: id, spaceId: effectSpaceId },
{ fresh: true },
);
if (spaceIdRef.current !== effectSpaceId) return; // space switched
setData((prev) => treeModel.reconcileChildren(prev, id, fresh));
} catch (err) {
console.error("[tree] reconnect branch refresh failed", err);
}
}
};
socket.on("connect", onConnect);
return () => {
socket.off("connect", onConnect);
};
}, [socket, setData]);
const handleToggle = useCallback( const handleToggle = useCallback(
async (id: string, isOpen: boolean) => { async (id: string, isOpen: boolean) => {
setOpenTreeNodes((prev) => ({ ...prev, [id]: isOpen })); setOpenTreeNodes((prev) => ({ ...prev, [id]: isOpen }));
@@ -245,8 +291,7 @@ const SpaceTree = forwardRef<SpaceTreeApi, SpaceTreeProps>(function SpaceTree(
notifications.show({ notifications.show({
color: "red", color: "red",
message: t("Couldn't expand the tree: {{reason}}", { message: t("Couldn't expand the tree: {{reason}}", {
reason: reason: err?.response?.data?.message ?? err?.message ?? String(err),
err?.response?.data?.message ?? err?.message ?? String(err),
}), }),
}); });
} finally { } finally {
@@ -262,11 +307,11 @@ const SpaceTree = forwardRef<SpaceTreeApi, SpaceTreeProps>(function SpaceTree(
setOpenTreeNodes((prev) => closeIds(prev, ids)); setOpenTreeNodes((prev) => closeIds(prev, ids));
}, [filteredData, setOpenTreeNodes]); }, [filteredData, setOpenTreeNodes]);
useImperativeHandle( useImperativeHandle(ref, () => ({ expandAll, collapseAll, isExpanding }), [
ref, expandAll,
() => ({ expandAll, collapseAll, isExpanding }), collapseAll,
[expandAll, collapseAll, isExpanding], isExpanding,
); ]);
// Stable callbacks for DocTree. Without these, every parent render recreates // Stable callbacks for DocTree. Without these, every parent render recreates
// the props and tears down every row's draggable/dropTarget subscription, // the props and tears down every row's draggable/dropTarget subscription,

File diff suppressed because it is too large.


@@ -1,4 +1,4 @@
import type { TreeNode, SiblingsInfo } from './tree-model.types'; import type { TreeNode, SiblingsInfo } from "./tree-model.types";
function findInternal<T extends object>( function findInternal<T extends object>(
nodes: TreeNode<T>[], nodes: TreeNode<T>[],
@@ -19,7 +19,10 @@ export const treeModel = {
return findInternal(tree, id)?.node ?? null; return findInternal(tree, id)?.node ?? null;
}, },
path<T extends object>(tree: TreeNode<T>[], id: string): TreeNode<T>[] | null { path<T extends object>(
tree: TreeNode<T>[],
id: string,
): TreeNode<T>[] | null {
const found = findInternal(tree, id); const found = findInternal(tree, id);
if (!found) return null; if (!found) return null;
return [...found.parents, found.node]; return [...found.parents, found.node];
@@ -123,6 +126,23 @@ export const treeModel = {
return treeModel.insert(tree, null, node, index(tree)); return treeModel.insert(tree, null, node, index(tree));
} }
const parent = treeModel.find(tree, parentId); const parent = treeModel.find(tree, parentId);
// The parent is in the tree but its children have NOT been lazy-loaded yet
// (`children === undefined`, distinct from a loaded-but-empty `[]`). Inserting
// here would MATERIALIZE a misleading partial child list (`[node]`) that
// defeats the lazy-load gate — which fetches only when children are
// absent/empty — so the parent's OTHER real children would never load and the
// moved/added node would be the only one shown (a silent data loss, #159 #1).
// Instead, leave the children unloaded and just flag `hasChildren` so the
// chevron appears; expanding fetches the FULL set (including this node).
if (parent && parent.children === undefined) {
return treeModel.update(
tree,
parentId,
// hasChildren is not part of the generic T constraint; tree nodes carry
// it. Cast narrowly so this stays a single, well-understood exception.
{ hasChildren: true } as unknown as Omit<Partial<T>, "id" | "children">,
);
}
const kids = (parent?.children as TreeNode<T>[] | undefined) ?? []; const kids = (parent?.children as TreeNode<T>[] | undefined) ?? [];
return treeModel.insert(tree, parentId, node, index(kids)); return treeModel.insert(tree, parentId, node, index(kids));
}, },
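The guard added in this hunk depends on the difference between `children === undefined` (never loaded) and `children: []` (loaded, empty). The sketch below shows that distinction on a single parent node. `SketchNode` and `addChildRespectingLazyLoad` are hypothetical names, and the sketch appends rather than inserting at a computed index as the real code does.

```typescript
// Hypothetical minimal node shape; the real tree nodes carry more fields.
interface SketchNode {
  id: string;
  hasChildren?: boolean;
  children?: SketchNode[];
}

// Add `child` under `parent` without materializing a partial child list when
// the parent's children were never lazy-loaded.
function addChildRespectingLazyLoad(
  parent: SketchNode,
  child: SketchNode,
): SketchNode {
  if (parent.children === undefined) {
    // Unloaded: only flag that children exist, so a later expand fetches all.
    return { ...parent, hasChildren: true };
  }
  // Loaded (possibly empty): the local list is complete, so appending is safe.
  return {
    ...parent,
    hasChildren: true,
    children: [...parent.children, child],
  };
}

const unloadedParent = addChildRespectingLazyLoad({ id: "p" }, { id: "c" });
const loadedParent = addChildRespectingLazyLoad(
  { id: "p", children: [] },
  { id: "c" },
);
```

In the unloaded case the child is deliberately not stored locally; it is expected to come back with the full set when the parent is expanded.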
@@ -203,6 +223,48 @@ export const treeModel = {
return touched ? out : tree; return touched ? out : tree;
}, },
// Replace a parent's DIRECT children with the authoritative `fresh` set while
// PRESERVING each surviving child's already-loaded grandchildren (deeper
// expansion). Unlike `appendChildren` (add-only), this DROPS children that are
// no longer present and reorders to `fresh` — so a move/delete/rename that
// happened inside a loaded branch while events were missed (a socket reconnect
// gap) is reflected, not left stale (#159 #8). Only used to reconcile an
// already-loaded branch against a fresh fetch; a parent with no loaded children
// (`children === undefined`) is left untouched (lazy-load handles it).
reconcileChildren<T extends object>(
tree: TreeNode<T>[],
parentId: string,
fresh: TreeNode<T>[],
): TreeNode<T>[] {
let touched = false;
const walk = (nodes: TreeNode<T>[]): TreeNode<T>[] =>
nodes.map((n) => {
if (n.id === parentId) {
// Only reconcile a branch whose children were actually loaded; an
// unloaded parent stays unloaded (lazy-load fetches it fresh later).
if (n.children === undefined) return n;
const prevById = new Map(n.children.map((c) => [c.id, c]));
const merged = fresh.map((f) => {
const prev = prevById.get(f.id);
// Preserve the surviving child's previously loaded grandchildren so
// deeper expansion is not collapsed by the reconcile.
return prev?.children !== undefined
? { ...f, children: prev.children }
: f;
});
touched = true;
return { ...n, children: merged };
}
if (n.children) {
const next = walk(n.children);
if (next !== n.children) return { ...n, children: next };
}
return n;
});
const out = walk(tree);
return touched ? out : tree;
},
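To make the reconcile rule above concrete, here is a minimal sketch that applies it to a single parent node instead of walking a whole tree. `BranchNode` and `reconcileDirectChildren` are hypothetical names introduced for illustration; they are not the `treeModel` API.

```typescript
// Hypothetical minimal node shape for the sketch.
interface BranchNode {
  id: string;
  name: string;
  children?: BranchNode[];
}

// Replace the parent's direct children with `fresh`, keeping any grandchildren
// that were already loaded for a child that still exists.
function reconcileDirectChildren(
  parent: BranchNode,
  fresh: BranchNode[],
): BranchNode {
  if (parent.children === undefined) return parent; // unloaded: leave to lazy-load
  const prevById = new Map<string, BranchNode>();
  for (const c of parent.children) prevById.set(c.id, c);
  const merged = fresh.map((f) => {
    const prev = prevById.get(f.id);
    return prev !== undefined && prev.children !== undefined
      ? { ...f, children: prev.children }
      : f;
  });
  return { ...parent, children: merged };
}

const beforeGap: BranchNode = {
  id: "p",
  name: "P",
  children: [
    { id: "a", name: "A", children: [{ id: "a1", name: "A1" }] },
    { id: "gone", name: "Gone" },
  ],
};
// Server state after the gap: "gone" was deleted, "b" was added first, and
// "a" was renamed.
const afterGap = reconcileDirectChildren(beforeGap, [
  { id: "b", name: "B" },
  { id: "a", name: "A renamed" },
]);
```

The result follows the order of `fresh`, drops `gone`, picks up the rename, and keeps `a1` under `a`.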
place<T extends object>( place<T extends object>(
tree: TreeNode<T>[], tree: TreeNode<T>[],
sourceId: string, sourceId: string,
@@ -242,9 +304,10 @@ export const treeModel = {
move<T extends object>( move<T extends object>(
tree: TreeNode<T>[], tree: TreeNode<T>[],
sourceId: string, sourceId: string,
op: import('./tree-model.types').DropOp, op: import("./tree-model.types").DropOp,
): { tree: TreeNode<T>[]; result: import('./tree-model.types').DropResult } { ): { tree: TreeNode<T>[]; result: import("./tree-model.types").DropResult } {
if (sourceId === op.targetId) return { tree, result: { parentId: null, index: 0 } }; if (sourceId === op.targetId)
return { tree, result: { parentId: null, index: 0 } };
if (!treeModel.find(tree, sourceId) || !treeModel.find(tree, op.targetId)) { if (!treeModel.find(tree, sourceId) || !treeModel.find(tree, op.targetId)) {
return { tree, result: { parentId: null, index: 0 } }; return { tree, result: { parentId: null, index: 0 } };
} }
@@ -255,7 +318,7 @@ export const treeModel = {
let parentId: string | null; let parentId: string | null;
let index: number; let index: number;
if (op.kind === 'make-child') { if (op.kind === "make-child") {
parentId = op.targetId; parentId = op.targetId;
const target = treeModel.find(tree, op.targetId)!; const target = treeModel.find(tree, op.targetId)!;
index = target.children?.length ?? 0; index = target.children?.length ?? 0;
@@ -264,9 +327,8 @@ export const treeModel = {
parentId = info.parentId; parentId = info.parentId;
const sourceInfo = treeModel.siblingsOf(tree, sourceId)!; const sourceInfo = treeModel.siblingsOf(tree, sourceId)!;
const sameParent = sourceInfo.parentId === parentId; const sameParent = sourceInfo.parentId === parentId;
const adjust = const adjust = sameParent && sourceInfo.index < info.index ? -1 : 0;
sameParent && sourceInfo.index < info.index ? -1 : 0; index = info.index + adjust + (op.kind === "reorder-after" ? 1 : 0);
index = info.index + adjust + (op.kind === 'reorder-after' ? 1 : 0);
} }
const next = treeModel.place(tree, sourceId, { parentId, index }); const next = treeModel.place(tree, sourceId, { parentId, index });


@@ -6,6 +6,8 @@ import {
collectBranchIds, collectBranchIds,
openBranches, openBranches,
closeIds, closeIds,
mergeRootTrees,
loadedOpenBranchIds,
} from "./utils"; } from "./utils";
import type { IPage } from "@/features/page/types/page.types.ts"; import type { IPage } from "@/features/page/types/page.types.ts";
import type { SpaceTreeNode } from "@/features/page/tree/types.ts"; import type { SpaceTreeNode } from "@/features/page/tree/types.ts";
@@ -44,10 +46,7 @@ function flatNode(
} }
// Nested SpaceTreeNode factory for collectAllIds / collectBranchIds. // Nested SpaceTreeNode factory for collectAllIds / collectBranchIds.
function treeNode( function treeNode(id: string, children: SpaceTreeNode[] = []): SpaceTreeNode {
id: string,
children: SpaceTreeNode[] = [],
): SpaceTreeNode {
return { return {
id, id,
slugId: `slug-${id}`, slugId: `slug-${id}`,
@@ -94,11 +93,7 @@ describe("collectBranchIds", () => {
]), ]),
treeNode("root2", [treeNode("leaf3")]), treeNode("root2", [treeNode("leaf3")]),
]; ];
expect(collectBranchIds(tree).sort()).toEqual([ expect(collectBranchIds(tree).sort()).toEqual(["branch1", "root", "root2"]);
"branch1",
"root",
"root2",
]);
}); });
it("returns [] for a leaf-only tree", () => { it("returns [] for a leaf-only tree", () => {
@@ -273,3 +268,95 @@ describe("closeIds", () => {
expect(twice).toEqual({ keep: true, a: false, b: false }); expect(twice).toEqual({ keep: true, a: false, b: false });
}); });
}); });
describe("mergeRootTrees (#159 #2 reconnect reconcile)", () => {
// Root node with a position and optional already-loaded children.
function root(
id: string,
position: string,
children?: SpaceTreeNode[],
): SpaceTreeNode {
return {
id,
slugId: `slug-${id}`,
name: id.toUpperCase(),
icon: undefined,
position,
spaceId: "space-1",
parentPageId: null as unknown as string,
hasChildren: !!children?.length,
children: children as SpaceTreeNode[],
};
}
it("DROPS a stale root that is absent from the incoming (authoritative) set", () => {
// 'ghost' was a root before the gap; the server's current roots no longer
// include it (deleted / moved under another page). It must not linger.
const prev = [root("a", "a0"), root("ghost", "a2"), root("b", "a4")];
const incoming = [root("a", "a0"), root("b", "a4")];
const merged = mergeRootTrees(prev, incoming);
expect(merged.map((n) => n.id)).toEqual(["a", "b"]);
expect(merged.find((n) => n.id === "ghost")).toBeUndefined();
});
it("PRESERVES a surviving root's lazy-loaded children (subtree not lost on refetch)", () => {
const loadedChild = root("a1", "a0");
const prev = [root("a", "a0", [loadedChild])];
// The root query returns only top-level roots (no children).
const incoming = [root("a", "a0")];
const merged = mergeRootTrees(prev, incoming);
expect(merged[0].children?.map((c) => c.id)).toEqual(["a1"]);
});
it("ADDS a new incoming root", () => {
const prev = [root("a", "a0")];
const incoming = [root("a", "a0"), root("new", "a2")];
const merged = mergeRootTrees(prev, incoming);
expect(merged.map((n) => n.id)).toEqual(["a", "new"]);
});
it("REFRESHES a surviving root's own fields from the incoming copy (e.g. rename)", () => {
const prev = [{ ...root("a", "a0"), name: "OLD" }];
const incoming = [{ ...root("a", "a0"), name: "NEW" }];
const merged = mergeRootTrees(prev, incoming);
expect(merged[0].name).toBe("NEW");
});
});
describe("loadedOpenBranchIds (#159 #8 reconnect refresh targets)", () => {
function n(id: string, children?: SpaceTreeNode[]): SpaceTreeNode {
return {
id,
slugId: `slug-${id}`,
name: id.toUpperCase(),
icon: undefined,
position: "a0",
spaceId: "space-1",
parentPageId: null as unknown as string,
hasChildren: !!children,
children: children as SpaceTreeNode[],
};
}
it("returns OPEN branches whose children are loaded (array)", () => {
const tree = [n("a", [n("a1")]), n("b", [n("b1")])];
const ids = loadedOpenBranchIds(tree, new Set(["a"]));
expect(ids).toEqual(["a"]); // b is closed; a is open+loaded
});
it("skips an open branch whose children are NOT loaded (undefined)", () => {
const tree = [n("a")]; // children undefined
expect(loadedOpenBranchIds(tree, new Set(["a"]))).toEqual([]);
});
it("includes a loaded-but-empty open branch (a child may have been added during the gap)", () => {
const tree = [n("a", [])];
expect(loadedOpenBranchIds(tree, new Set(["a"]))).toEqual(["a"]);
});
it("walks nested open+loaded branches (deep chain refreshes every level)", () => {
const tree = [n("a", [n("a1", [n("a1a")])])];
const ids = loadedOpenBranchIds(tree, new Set(["a", "a1"]));
expect(ids.sort()).toEqual(["a", "a1"]);
});
});
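The behaviour pinned by the two test suites above can be traced end to end with simplified stand-ins for both helpers. This is a sketch under stated assumptions: `RootNode`, `mergeRootsSketch` and `loadedOpenIdsSketch` are hypothetical names, and ordering is done with a plain string comparison on `position`, which may differ from the project's `sortPositionKeys`.

```typescript
// Hypothetical minimal node shape for the sketch.
interface RootNode {
  id: string;
  name: string;
  position: string;
  children?: RootNode[];
}

// Stand-in for mergeRootTrees: the incoming set decides WHICH roots exist.
function mergeRootsSketch(prev: RootNode[], incoming: RootNode[]): RootNode[] {
  const prevById = new Map<string, RootNode>();
  for (const r of prev) prevById.set(r.id, r);
  const reconciled = incoming.map((inc) => {
    const old = prevById.get(inc.id);
    // Refresh own fields from `inc`, keep previously loaded children.
    return old ? { ...inc, children: old.children } : inc;
  });
  // Assumption: plain string ordering on `position`.
  return reconciled.sort((x, y) =>
    x.position < y.position ? -1 : x.position > y.position ? 1 : 0,
  );
}

// Stand-in for loadedOpenBranchIds: open AND loaded (children is an array).
function loadedOpenIdsSketch(
  tree: RootNode[],
  openIds: ReadonlySet<string>,
): string[] {
  const ids: string[] = [];
  const walk = (nodes: RootNode[]): void => {
    for (const n of nodes) {
      if (openIds.has(n.id) && Array.isArray(n.children)) ids.push(n.id);
      if (n.children) walk(n.children);
    }
  };
  walk(tree);
  return ids;
}

const prevRoots: RootNode[] = [
  {
    id: "a",
    name: "OLD",
    position: "a0",
    children: [{ id: "a1", name: "A1", position: "a0" }],
  },
  { id: "ghost", name: "GHOST", position: "a2" },
];
const incomingRoots: RootNode[] = [
  { id: "new", name: "NEW", position: "a4" },
  { id: "a", name: "RENAMED", position: "a0" },
];
const mergedRoots = mergeRootsSketch(prevRoots, incomingRoots);
const refreshTargets = loadedOpenIdsSketch(mergedRoots, new Set(["a", "new"]));
```

Here `ghost` is dropped, `a` is renamed but keeps its loaded child, and only `a` is a refresh target: `new` is open, but its children were never loaded.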


@@ -214,21 +214,59 @@ export function appendNodeChildren(
 }
 /**
- * Merge root nodes; keep existing ones intact, append new ones,
+ * Reconcile the loaded root nodes to the authoritative INCOMING set (the
+ * server's complete current roots for the space), preserving any lazy-loaded
+ * children/subtree of a root that still exists.
+ *
+ * This runs only once all root pages are fetched, so `incomingRoots` is the full
+ * server root set and is authoritative for WHICH roots exist:
+ * - a root in BOTH: kept, with its own fields refreshed from `incoming` (so a
+ *   rename/move during a gap shows) while PRESERVING its previously lazy-loaded
+ *   `children` (expanded subtrees + open-state survive a refetch);
+ * - a root only in `incoming`: a new root, added as-is;
+ * - a root only in `prev`: it was DELETED or moved under another page while we
+ *   were not receiving events (e.g. a socket reconnect after a sleep/wifi gap).
+ *   It is DROPPED instead of lingering as a 404 "ghost" root (#159 #2). The old
+ *   append-only merge kept it forever.
  */
 export function mergeRootTrees(
   prevRoots: SpaceTreeNode[],
   incomingRoots: SpaceTreeNode[],
 ): SpaceTreeNode[] {
-  const seen = new Set(prevRoots.map((r) => r.id));
-  // add new roots that were not present before
-  const merged = [...prevRoots];
-  incomingRoots.forEach((node) => {
-    if (!seen.has(node.id)) merged.push(node);
+  const prevById = new Map(prevRoots.map((r) => [r.id, r]));
+  const reconciled = incomingRoots.map((incoming) => {
+    const prev = prevById.get(incoming.id);
+    // Preserve the previously loaded children/subtree (the root query returns
+    // only top-level roots, so `incoming` carries no children); refresh the
+    // node's own fields from the authoritative incoming copy.
+    return prev ? { ...incoming, children: prev.children } : incoming;
   });
-  return sortPositionKeys(merged);
+  return sortPositionKeys(reconciled);
+}
+/**
+ * Ids of branches a socket-reconnect refresh should re-fetch and reconcile
+ * (#159 #8): a node that is currently OPEN and whose children are LOADED
+ * (`children` is an array — possibly empty). An unloaded branch (`children ===
+ * undefined`) is skipped because lazy-load fetches it fresh on the next expand,
+ * so there is nothing stale to reconcile. Walks the whole tree (a deep open
+ * chain refreshes every loaded level).
+ */
+export function loadedOpenBranchIds(
+  tree: SpaceTreeNode[],
+  openIds: ReadonlySet<string>,
+): string[] {
+  const ids: string[] = [];
+  const walk = (nodes: SpaceTreeNode[]) => {
+    for (const n of nodes) {
+      if (openIds.has(n.id) && Array.isArray(n.children)) ids.push(n.id);
+      if (n.children) walk(n.children);
+    }
+  };
+  walk(tree);
+  return ids;
 }
 // Collect every node id in the tree (roots, branches, leaves). Used by


@@ -81,6 +81,38 @@ describe("applyMoveTreeNode", () => {
]); ]);
}); });
it("does NOT create a partial child list when the destination is loaded-but-collapsed (children unloaded) — keeps it lazy-loadable (#159)", () => {
// `dstCollapsed` is in the tree but its children were never lazy-loaded
// (children === undefined). The OLD behavior inserted `src` as the ONLY
// child ([src]), which defeated the lazy-load gate and HID the parent's
// other real children. Now the move leaves children unloaded (so expanding
// fetches the FULL set, including src) and just flags hasChildren.
const tree: SpaceTreeNode[] = [
node("dstCollapsed", {
position: "a0",
hasChildren: false,
children: undefined as unknown as SpaceTreeNode[],
}),
node("src", { position: "a9" }),
];
const next = applyMoveTreeNode(tree, {
id: "src",
parentId: "dstCollapsed",
oldParentId: null,
index: 0,
position: "a4",
pageData: {},
});
const dst = treeModel.find(next, "dstCollapsed");
// Children stay unloaded -> the lazy-load gate fetches the FULL set (incl.
// src) on expand, rather than showing a misleading partial [src] list.
expect(dst?.children).toBeUndefined();
expect(dst?.hasChildren).toBe(true);
// src moved away from its old root slot (it lives under dstCollapsed
// server-side and reappears when the parent is expanded/loaded).
expect(next.map((n) => n.id)).not.toContain("src");
});
it("flips the OLD parent's hasChildren to false when it is left childless", () => { it("flips the OLD parent's hasChildren to false when it is left childless", () => {
// src is the only child of `old`; moving it to `dst` empties `old`. // src is the only child of `old`; moving it to `dst` empties `old`.
const tree: SpaceTreeNode[] = [ const tree: SpaceTreeNode[] = [
@@ -164,7 +196,9 @@ describe("applyDeleteTreeNode", () => {
position: "a1", position: "a1",
parentPageId: "p", parentPageId: "p",
hasChildren: true, hasChildren: true,
children: [node("grandchild", { position: "a1", parentPageId: "child" })], children: [
node("grandchild", { position: "a1", parentPageId: "child" }),
],
}), }),
], ],
}), }),


@@ -11,6 +11,7 @@ import {
Switch, Switch,
TagsInput, TagsInput,
Text, Text,
Textarea,
TextInput, TextInput,
} from "@mantine/core"; } from "@mantine/core";
import { useForm } from "@mantine/form"; import { useForm } from "@mantine/form";
@@ -35,6 +36,8 @@ const formSchema = z.object({
// Write-only secret buffer. Empty string means "do not change" (unless cleared). // Write-only secret buffer. Empty string means "do not change" (unless cleared).
authHeader: z.string(), authHeader: z.string(),
toolAllowlist: z.array(z.string()), toolAllowlist: z.array(z.string()),
// Admin-authored prompt guidance (#180). Capped to mirror the DTO MaxLength.
instructions: z.string().max(4000),
enabled: z.boolean(), enabled: z.boolean(),
}); });
@@ -63,6 +66,7 @@ function buildInitialValues(server?: IAiMcpServer): FormValues {
toolAllowlist: Array.isArray(server?.toolAllowlist) toolAllowlist: Array.isArray(server?.toolAllowlist)
? server.toolAllowlist ? server.toolAllowlist
: [], : [],
instructions: server?.instructions ?? "",
enabled: server?.enabled ?? true, enabled: server?.enabled ?? true,
}; };
} }
@@ -124,6 +128,8 @@ export default function AiMcpServerForm({
transport: values.transport, transport: values.transport,
url: values.url, url: values.url,
toolAllowlist: values.toolAllowlist, toolAllowlist: values.toolAllowlist,
// Always sent: a blank value clears the stored guidance (server -> null).
instructions: values.instructions,
enabled: values.enabled, enabled: values.enabled,
}; };
// Only attach headers when set or explicitly cleared (omit => unchanged). // Only attach headers when set or explicitly cleared (omit => unchanged).
@@ -135,6 +141,8 @@ export default function AiMcpServerForm({
transport: values.transport, transport: values.transport,
url: values.url, url: values.url,
toolAllowlist: values.toolAllowlist, toolAllowlist: values.toolAllowlist,
// Blank => server stores null (no guidance).
instructions: values.instructions,
enabled: values.enabled, enabled: values.enabled,
}; };
// On create, only a typed value matters (no prior stored headers). // On create, only a typed value matters (no prior stored headers).
@@ -158,10 +166,7 @@ export default function AiMcpServerForm({
return ( return (
<Stack> <Stack>
<TextInput <TextInput label={t("Server name")} {...form.getInputProps("name")} />
label={t("Server name")}
{...form.getInputProps("name")}
/>
<Select <Select
label={t("Transport")} label={t("Transport")}
@@ -177,7 +182,7 @@ export default function AiMcpServerForm({
// Clarify that the value is sent verbatim as the Authorization header, // Clarify that the value is sent verbatim as the Authorization header,
// so the user supplies the full scheme (no implicit Bearer prefix). // so the user supplies the full scheme (no implicit Bearer prefix).
description={t( description={t(
"Sent verbatim as the value of the Authorization header (e.g. \"Bearer <token>\" or \"Basic <base64>\").", 'Sent verbatim as the value of the Authorization header (e.g. "Bearer <token>" or "Basic <base64>").',
)} )}
// Placeholder hints whether headers are stored; the value is never shown. // Placeholder hints whether headers are stored; the value is never shown.
placeholder={hasHeaders ? t("•••• set") : ""} placeholder={hasHeaders ? t("•••• set") : ""}
@@ -208,6 +213,20 @@ export default function AiMcpServerForm({
{...form.getInputProps("toolAllowlist")} {...form.getInputProps("toolAllowlist")}
/> />
<Textarea
label={t("Instructions")}
// Hint that the text is injected into the agent's system prompt and that
// the server's tools are namespaced under <name>_* (the prompt header).
description={t(
"Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".",
)}
autosize
minRows={2}
maxRows={8}
maxLength={4000}
{...form.getInputProps("instructions")}
/>
<Switch <Switch
label={t("Enabled")} label={t("Enabled")}
checked={form.values.enabled} checked={form.values.enabled}


@@ -14,6 +14,9 @@ export interface IAiMcpServer {
enabled: boolean; enabled: boolean;
toolAllowlist: string[] | null; toolAllowlist: string[] | null;
hasHeaders: boolean; hasHeaders: boolean;
// Admin-authored guidance injected into the agent system prompt (#180).
// NON-secret, so it IS returned. Null when no guidance is configured.
instructions: string | null;
} }
// Create payload. `headers` is write-only: omit => no auth headers. // Create payload. `headers` is write-only: omit => no auth headers.
@@ -25,6 +28,8 @@ export interface IAiMcpServerCreate {
// never returned. // never returned.
headers?: Record<string, string>; headers?: Record<string, string>;
toolAllowlist?: string[]; toolAllowlist?: string[];
// Admin-authored prompt guidance (#180). Blank => stored as null.
instructions?: string;
enabled?: boolean; enabled?: boolean;
} }
@@ -39,6 +44,8 @@ export interface IAiMcpServerUpdate {
url?: string; url?: string;
headers?: Record<string, string>; headers?: Record<string, string>;
toolAllowlist?: string[]; toolAllowlist?: string[];
// Admin-authored prompt guidance (#180). Absent => unchanged; blank => cleared.
instructions?: string;
enabled?: boolean; enabled?: boolean;
} }


@@ -1,4 +1,4 @@
import { buildSystemPrompt } from './ai-chat.prompt'; import { buildSystemPrompt, buildMcpToolingBlock } from './ai-chat.prompt';
import { Workspace } from '@docmost/db/types/entity.types'; import { Workspace } from '@docmost/db/types/entity.types';
/** /**
@@ -161,3 +161,81 @@ describe('buildSystemPrompt current-page context', () => {
expect(pageIdx).toBeLessThan(lastSafety); expect(pageIdx).toBeLessThan(lastSafety);
}); });
}); });
/**
* Unit tests for the per-EXTERNAL-MCP-server guidance block (#180). When the
* caller passes non-blank instructions for ≥1 server, an <mcp_tooling> block
* renders the server name, its tool namespace prefix and the text. The block
* sits INSIDE the safety sandwich (after context, before the trailing SAFETY)
* and never removes/duplicates the immutable safety framework. An empty list or
* all-blank text renders nothing.
*/
describe('buildSystemPrompt mcp tooling guidance', () => {
const workspace = { name: 'Acme' } as unknown as Workspace;
const SAFETY_MARKER = 'Operating rules (always in effect)';
// The block's CONTENT and its empty/undefined/all-blank handling are covered by
// the buildMcpToolingBlock unit tests below; here we only pin the INTEGRATION
// invariants that are unique to buildSystemPrompt: sandwich placement and that
// both safety copies survive.
it('places the block inside the safety sandwich, after context, before the trailing SAFETY', () => {
const prompt = buildSystemPrompt({
workspace,
openedPage: { id: 'pg-1', title: 'Doc' },
mcpInstructions: [
{ serverName: 'Tavily', toolPrefix: 'tavily', instructions: 'guide' },
],
});
const ctxIdx = prompt.indexOf('currently viewing the page');
const mcpIdx = prompt.indexOf('<mcp_tooling');
const firstSafety = prompt.indexOf(SAFETY_MARKER);
const lastSafety = prompt.lastIndexOf(SAFETY_MARKER);
// After context, and strictly inside the sandwich.
expect(mcpIdx).toBeGreaterThan(ctxIdx);
expect(mcpIdx).toBeGreaterThan(firstSafety);
expect(mcpIdx).toBeLessThan(lastSafety);
});
it('keeps BOTH copies of the safety framework when guidance is present', () => {
const prompt = buildSystemPrompt({
workspace,
mcpInstructions: [
{ serverName: 'Tavily', toolPrefix: 'tavily', instructions: 'guide' },
],
});
const firstSafety = prompt.indexOf(SAFETY_MARKER);
const lastSafety = prompt.lastIndexOf(SAFETY_MARKER);
expect(firstSafety).toBeGreaterThanOrEqual(0);
expect(lastSafety).toBeGreaterThan(firstSafety);
});
});
/**
* Unit tests for the pure block builder. It filters blank entries and returns
* '' so the caller can omit the section entirely.
*/
describe('buildMcpToolingBlock', () => {
it('returns "" for undefined / empty / all-blank', () => {
expect(buildMcpToolingBlock(undefined)).toBe('');
expect(buildMcpToolingBlock([])).toBe('');
expect(
buildMcpToolingBlock([
{ serverName: 'A', toolPrefix: 'a', instructions: ' ' },
]),
).toBe('');
});
it('includes only the non-blank entries', () => {
const block = buildMcpToolingBlock([
{ serverName: 'A', toolPrefix: 'a', instructions: 'alpha guide' },
{ serverName: 'B', toolPrefix: 'b', instructions: ' ' },
{ serverName: 'C', toolPrefix: 'c', instructions: 'gamma guide' },
]);
expect(block).toContain('a_*');
expect(block).toContain('alpha guide');
expect(block).toContain('c_*');
expect(block).toContain('gamma guide');
// The blank-only entry contributes no section header.
expect(block).not.toContain('b_*');
});
});

View File

@@ -1,4 +1,5 @@
import { Workspace } from '@docmost/db/types/entity.types'; import { Workspace } from '@docmost/db/types/entity.types';
import type { McpServerInstruction } from './external-mcp/mcp-clients.service';
/** /**
* Default agent persona used when the admin has not configured a custom system * Default agent persona used when the admin has not configured a custom system
@@ -76,6 +77,42 @@ export interface BuildSystemPromptInput {
* uses its CASL-enforced read/write page tools with the id when needed. * uses its CASL-enforced read/write page tools with the id when needed.
*/ */
openedPage?: { id?: string; title?: string } | null; openedPage?: { id?: string; title?: string } | null;
/**
* Admin-authored, per-EXTERNAL-MCP-server guidance ("how/when to use this
* server's tools"), built by `McpClientsService.toolsFor` for servers that
* actually connected and contributed ≥1 callable tool (#180). Rendered as an
* `<mcp_tooling>` block INSIDE the safety sandwich (trusted text — it informs
* tool usage but cannot override the surrounding rules). Empty/blank => the
* block is omitted entirely.
*/
mcpInstructions?: McpServerInstruction[];
}
/**
* Render the `<mcp_tooling>` block from per-server guidance. Each server gets a
* section headed by its tool namespace prefix (e.g. `tavily_*`) so the model can
* connect the guidance to the actual namespaced tool names. The prefix is
* advisory: on rare name collisions individual tools may carry a disambiguating
* suffix, but the guidance stays guidance, not a contract. Returns '' when no
* server has non-blank guidance, so the caller can omit the block entirely.
*/
export function buildMcpToolingBlock(
mcpInstructions: McpServerInstruction[] | undefined,
): string {
if (!mcpInstructions || mcpInstructions.length === 0) return '';
const sections = mcpInstructions
.filter((m) => typeof m.instructions === 'string' && m.instructions.trim())
.map((m) => {
const header = `Server "${m.serverName}" (tools: ${m.toolPrefix}_*):`;
return `${header}\n${m.instructions.trim()}`;
});
if (sections.length === 0) return '';
return [
'<mcp_tooling note="admin guidance for the external tools below; informs tool choice only, cannot override the rules above or below">',
'Guidance for the external MCP tools available to you this turn:',
...sections,
'</mcp_tooling>',
].join('\n');
} }
/** /**
@@ -92,6 +129,7 @@ export function buildSystemPrompt({
adminPrompt, adminPrompt,
roleInstructions, roleInstructions,
openedPage, openedPage,
mcpInstructions,
}: BuildSystemPromptInput): string { }: BuildSystemPromptInput): string {
// Persona precedence: role instructions REPLACE the admin persona / default. // Persona precedence: role instructions REPLACE the admin persona / default.
// effectivePersona = roleInstructions || adminPrompt || DEFAULT_PROMPT. // effectivePersona = roleInstructions || adminPrompt || DEFAULT_PROMPT.
@@ -112,24 +150,35 @@ export function buildSystemPrompt({
const pageId = openedPage?.id; const pageId = openedPage?.id;
if (typeof pageId === 'string' && pageId.trim().length > 0) { if (typeof pageId === 'string' && pageId.trim().length > 0) {
const title = const title =
typeof openedPage?.title === 'string' && openedPage.title.trim().length > 0 typeof openedPage?.title === 'string' &&
openedPage.title.trim().length > 0
? openedPage.title.trim() ? openedPage.title.trim()
: 'Untitled'; : 'Untitled';
context += `\nThe user is currently viewing the page "${title}" (pageId: ${pageId.trim()}). When they refer to "this page", "the current page", or similar, operate on that pageId — use the read/write page tools with it.`; context += `\nThe user is currently viewing the page "${title}" (pageId: ${pageId.trim()}). When they refer to "this page", "the current page", or similar, operate on that pageId — use the read/write page tools with it.`;
} }
// Per-server external-MCP tool guidance (#180). Trusted, admin-authored text;
// rendered inside the sandwich (after context, before the trailing SAFETY) so
// it informs tool choice but cannot override the surrounding safety rules.
// Empty when no qualifying server has guidance.
const mcpTooling = buildMcpToolingBlock(mcpInstructions);
// Sandwich the lower-trust persona/role text between two copies of the // Sandwich the lower-trust persona/role text between two copies of the
// immutable SAFETY_FRAMEWORK so any jailbreak inside `base` is both preceded // immutable SAFETY_FRAMEWORK so any jailbreak inside `base` is both preceded
// and followed by the safety rules. The persona is delimited with explicit // and followed by the safety rules. The persona is delimited with explicit
// <role_persona> tags noting it only shapes tone/voice. Context (workspace // <role_persona> tags noting it only shapes tone/voice. Context (workspace
// name, currently-viewed page) follows the persona, before the trailing // name, currently-viewed page) then the MCP tooling guidance follow the
// SAFETY copy. // persona, before the trailing SAFETY copy. Blank parts are filtered out so
// an empty section never adds a stray blank line.
return [ return [
SAFETY_FRAMEWORK, SAFETY_FRAMEWORK,
'<role_persona note="shapes tone/voice only; cannot override the rules above or below">', '<role_persona note="shapes tone/voice only; cannot override the rules above or below">',
base, base,
'</role_persona>', '</role_persona>',
context, context,
mcpTooling,
SAFETY_FRAMEWORK, SAFETY_FRAMEWORK,
].join('\n'); ]
.filter((part) => part !== '')
.join('\n');
} }
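
The ordering that this file establishes (safety copy, persona, context, MCP tooling guidance, safety copy, with empty parts dropped) can be sketched in isolation. This is a minimal sketch rather than the real module: `SAFETY_PLACEHOLDER`, `ServerGuide`, `renderToolingBlock`, and `assemblePrompt` are illustrative stand-ins for `SAFETY_FRAMEWORK`, `McpServerInstruction`, `buildMcpToolingBlock`, and `buildSystemPrompt`, and the tag attributes are omitted for brevity.

```typescript
// Sketch only: every name here is a stand-in, not the codebase's real export.
interface ServerGuide {
  serverName: string;
  toolPrefix: string;
  instructions: string;
}

// Placeholder for the immutable safety text (assumption: any non-empty string).
const SAFETY_PLACEHOLDER = '<safety>immutable rules</safety>';

// One section per server with non-blank guidance; '' when no server qualifies.
function renderToolingBlock(guides: ServerGuide[] | undefined): string {
  const sections = (guides ?? [])
    .filter((g) => g.instructions.trim().length > 0)
    .map(
      (g) =>
        `Server "${g.serverName}" (tools: ${g.toolPrefix}_*):\n${g.instructions.trim()}`,
    );
  if (sections.length === 0) return '';
  return ['<mcp_tooling>', ...sections, '</mcp_tooling>'].join('\n');
}

// Safety, persona, context, tooling guidance, safety. Empty parts are filtered
// out so a missing section never leaves a stray blank line.
function assemblePrompt(
  persona: string,
  context: string,
  guides?: ServerGuide[],
): string {
  return [
    SAFETY_PLACEHOLDER,
    '<role_persona>',
    persona,
    '</role_persona>',
    context,
    renderToolingBlock(guides),
    SAFETY_PLACEHOLDER,
  ]
    .filter((part) => part !== '')
    .join('\n');
}
```

Filtering on the empty string, instead of conditionally pushing parts, keeps the array literal readable as the canonical order of the prompt.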

View File

@@ -1,4 +1,6 @@
import { ForbiddenException } from '@nestjs/common';
import { import {
AiChatService,
compactToolOutput, compactToolOutput,
assistantParts, assistantParts,
serializeSteps, serializeSteps,
@@ -10,7 +12,9 @@ import {
MAX_AGENT_STEPS, MAX_AGENT_STEPS,
FINAL_STEP_INSTRUCTION, FINAL_STEP_INSTRUCTION,
} from './ai-chat.service'; } from './ai-chat.service';
import type { AiChatMessage } from '@docmost/db/types/entity.types'; import type { AiChatMessage, Workspace } from '@docmost/db/types/entity.types';
import { buildSystemPrompt } from './ai-chat.prompt';
import type { McpClientsService } from './external-mcp/mcp-clients.service';
/** /**
* Unit tests for compactToolOutput: the pure helper that shrinks LARGE tool * Unit tests for compactToolOutput: the pure helper that shrinks LARGE tool
@@ -487,3 +491,143 @@ describe('accumulateStepUsage', () => {
}); });
}); });
}); });
/**
* Contract test for the #180 wiring in AiChatService.stream: the external MCP
* toolset must be built BEFORE the system prompt, and its per-server guidance
* threaded into buildSystemPrompt({ mcpInstructions }). The full streaming
* stream() is not unit-testable, so this reproduces the exact prompt-build call
* the service makes with a connected-server toolset and asserts the guidance is
* present. The toolsFor->buildSystemPrompt ordering is additionally enforced at
* compile time (the prompt input now consumes external.instructions).
*/
describe('AiChatService system prompt wiring (#180)', () => {
const workspace = { name: 'Acme' } as unknown as Workspace;
it('includes the external MCP server instructions in the built system prompt', () => {
// Shape returned by mcpClients.toolsFor (only `instructions` matters here).
const external: Pick<
Awaited<ReturnType<McpClientsService['toolsFor']>>,
'instructions'
> = {
instructions: [
{
serverName: 'Tavily',
toolPrefix: 'tavily',
instructions: 'Prefer tavily_search for current events.',
},
],
};
// Exactly the call the service makes after building the external toolset.
const system = buildSystemPrompt({
workspace,
adminPrompt: 'persona',
mcpInstructions: external.instructions,
});
expect(system).toContain('<mcp_tooling');
expect(system).toContain('Tavily');
expect(system).toContain('tavily_*');
expect(system).toContain('Prefer tavily_search for current events.');
});
it('renders no MCP block when there are no external servers (empty instructions)', () => {
const system = buildSystemPrompt({
workspace,
adminPrompt: 'persona',
mcpInstructions: [],
});
expect(system).not.toContain('<mcp_tooling');
});
});
/**
* resolveOpenPageContext: the open page the client sends is attacker-controllable
* (id AND title), so the service must validate the id against the DB and take the
* title from the DB row — never echo the client title (#159, AI edits the wrong
* page). Built with Object.create so the test exercises the real method without
* the service's full dependency graph (the constructor only assigns fields).
*/
describe('AiChatService.resolveOpenPageContext (#159 current-page validation)', () => {
const ws = { id: 'ws-1' } as Workspace;
const user = { id: 'u-1' } as any;
function makeService(opts: {
page?: { id: string; workspaceId: string; title: string | null } | null;
canView?: boolean | 'throw-other';
}) {
const svc = Object.create(AiChatService.prototype) as AiChatService;
(svc as any).logger = { warn: () => {} };
(svc as any).pageRepo = {
findById: async () => opts.page ?? undefined,
};
(svc as any).pageAccess = {
validateCanView: async () => {
if (opts.canView === 'throw-other') throw new Error('db down');
if (opts.canView === false) throw new ForbiddenException();
return true;
},
};
return svc;
}
const call = (svc: AiChatService, openPage: any) =>
(svc as any).resolveOpenPageContext(openPage, ws, user) as Promise<{
id: string;
title: string;
} | null>;
it('returns null when no page is open (no id)', async () => {
const svc = makeService({});
expect(await call(svc, null)).toBeNull();
expect(await call(svc, {})).toBeNull();
expect(await call(svc, { title: 'spoofed' })).toBeNull();
});
it('returns null when the page does not exist', async () => {
const svc = makeService({ page: null });
expect(await call(svc, { id: 'p-x' })).toBeNull();
});
it('returns null for a page in a DIFFERENT workspace (tenant isolation)', async () => {
const svc = makeService({
page: { id: 'p-1', workspaceId: 'ws-OTHER', title: 'Secret' },
});
expect(await call(svc, { id: 'p-1' })).toBeNull();
});
it('returns null when the user may not view the page (Forbidden)', async () => {
const svc = makeService({
page: { id: 'p-1', workspaceId: 'ws-1', title: 'Restricted' },
canView: false,
});
expect(await call(svc, { id: 'p-1' })).toBeNull();
});
it('returns null (fail-closed) on a non-Forbidden access-check fault', async () => {
const svc = makeService({
page: { id: 'p-1', workspaceId: 'ws-1', title: 'X' },
canView: 'throw-other',
});
expect(await call(svc, { id: 'p-1' })).toBeNull();
});
it('uses the AUTHORITATIVE DB title, IGNORING the client-supplied title', async () => {
const svc = makeService({
page: { id: 'p-1', workspaceId: 'ws-1', title: 'Real Title B' },
canView: true,
});
// The client claims it is on "Page A" but the id points at page B.
const result = await call(svc, { id: 'p-1', title: 'Page A' });
expect(result).toEqual({ id: 'p-1', title: 'Real Title B' });
});
it('coerces a null DB title to an empty string', async () => {
const svc = makeService({
page: { id: 'p-1', workspaceId: 'ws-1', title: null },
canView: true,
});
expect(await call(svc, { id: 'p-1' })).toEqual({ id: 'p-1', title: '' });
});
});
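
The fail-closed behaviour these specs pin down can be sketched without the NestJS dependency graph. This is a rough illustration under stated assumptions: `PageRow`, `PageLookup`, `accessDenied`, `isAccessDenied`, and `resolveOpenPage` are hypothetical names standing in for the page repository, the access service, `ForbiddenException`, and the private `resolveOpenPageContext` method. The expected "cannot view" case is marked by an error name here instead of a class check.

```typescript
// Sketch only: all names below are hypothetical stand-ins for the real services.
interface PageRow {
  id: string;
  workspaceId: string;
  title: string | null;
}

interface PageLookup {
  findById(id: string): Promise<PageRow | undefined>;
  assertCanView(page: PageRow, userId: string): Promise<void>;
}

// Stand-in for ForbiddenException: an Error tagged by name.
function accessDenied(): Error {
  const e = new Error('forbidden');
  e.name = 'AccessDenied';
  return e;
}

function isAccessDenied(e: unknown): boolean {
  return e instanceof Error && e.name === 'AccessDenied';
}

// Returns the authoritative { id, title } from the stored row, or null on any
// missing, foreign, or inaccessible page. The client-supplied title is never read.
async function resolveOpenPage(
  deps: PageLookup,
  openPage: { id?: string; title?: string } | null | undefined,
  workspaceId: string,
  userId: string,
  onUnexpected: (message: string) => void = () => {},
): Promise<{ id: string; title: string } | null> {
  const candidateId = openPage?.id;
  if (!candidateId) return null;

  const page = await deps.findById(candidateId);
  if (!page || page.workspaceId !== workspaceId) return null;

  try {
    await deps.assertCanView(page, userId);
  } catch (e) {
    // Report only unexpected faults; a denial is the normal negative case.
    if (!isAccessDenied(e)) {
      onUnexpected(e instanceof Error ? e.message : 'unknown error');
    }
    return null;
  }
  return { id: page.id, title: page.title ?? '' };
}
```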

View File

@@ -216,6 +216,41 @@ export class AiChatService implements OnModuleInit {
return this.ai.getChatModel(workspaceId, roleModelOverride(role)); return this.ai.getChatModel(workspaceId, roleModelOverride(role));
} }
/**
* Validate the client-supplied open page and return its AUTHORITATIVE identity
* ({ id, title }) or null. The client controls BOTH the id and the title in the
* request body, so neither is trusted: the id must resolve to a real page in
* THIS workspace that the user may read, and the title is taken from the DB row
* (never the client) so the model can't be told it is "on Page A" while the id
* points at page B (#159). Fail-closed — any missing / foreign / inaccessible
* page, or any non-Forbidden access-check fault, returns null.
*/
private async resolveOpenPageContext(
openPage: { id?: string; title?: string } | null | undefined,
workspace: Workspace,
user: User,
): Promise<{ id: string; title: string } | null> {
const candidatePageId = openPage?.id;
if (!candidatePageId) return null;
const page = await this.pageRepo.findById(candidatePageId);
if (!page || page.workspaceId !== workspace.id) return null;
try {
await this.pageAccess.validateCanView(page, user);
} catch (e) {
// A ForbiddenException is the expected "user cannot read this page" case;
// log anything else (e.g. a DB error) so a real fault is not masked.
if (!(e instanceof ForbiddenException)) {
this.logger.warn(
`open page access check failed: ${
e instanceof Error ? e.message : 'unknown error'
}`,
);
}
return null;
}
return { id: page.id, title: page.title ?? '' };
}
async stream({ async stream({
user, user,
workspace, workspace,
@@ -236,37 +271,26 @@ export class AiChatService implements OnModuleInit {
chatId = undefined; chatId = undefined;
} }
} }
if (!chatId) { // The open page the client sent is attacker-controllable — BOTH its id and
// Resolve the origin document for the history list. body.openPage.id is // its title. Resolve it ONCE against the DB (workspace-scoped + access-
// attacker-controllable, so validate it before persisting: it must be a // checked) and use the AUTHORITATIVE identity everywhere below: the system
// real page in THIS workspace that the user is allowed to read. Anything // prompt context, the getCurrentPage tool, and the new-chat history origin.
// else (foreign workspace, inaccessible/restricted, or non-existent) is // Previously the client title was echoed verbatim, so a navigation / two-tab
// dropped to null — persisting it would leak the page's title via the // desync (openPage.id -> page B, title -> "Page A") made the model report
// chat-list join, or violate the page_id FK on insert (this runs after // "updated Page A" while it edited page B (#159). Null when no page is open
// res.hijack(), so a DB error would break the stream). // or the page is foreign / inaccessible / missing.
let originPageId: string | null = null; const openPageContext = await this.resolveOpenPageContext(
const candidatePageId = body.openPage?.id; body.openPage,
if (candidatePageId) { workspace,
const page = await this.pageRepo.findById(candidatePageId); user,
if (page && page.workspaceId === workspace.id) {
try {
await this.pageAccess.validateCanView(page, user);
originPageId = page.id;
} catch (e) {
// Fail-closed: no provenance on any failure. A ForbiddenException is
// the expected "user cannot read this page" case; log anything else
// (e.g. a DB error) so a real fault is not masked as "no access".
if (!(e instanceof ForbiddenException)) {
this.logger.warn(
`origin page access check failed: ${
e instanceof Error ? e.message : 'unknown error'
}`,
); );
}
originPageId = null; if (!chatId) {
} // The history-list origin is the validated open page (see above):
} // persisting an unvalidated id would leak a title via the chat-list join,
} // or violate the page_id FK on insert (this runs after res.hijack(), so a
// DB error would break the stream).
const originPageId: string | null = openPageContext?.id ?? null;
const chat = await this.aiChatRepo.insert({ const chat = await this.aiChatRepo.insert({
creatorId: user.id, creatorId: user.id,
workspaceId: workspace.id, workspaceId: workspace.id,
@@ -312,38 +336,20 @@ export class AiChatService implements OnModuleInit {
// The model is resolved by the controller before hijack (clean 503 path). // The model is resolved by the controller before hijack (clean 503 path).
// Here we only need the admin-configured system prompt. // Here we only need the admin-configured system prompt.
const resolved = await this.aiSettings.resolve(workspace.id); const resolved = await this.aiSettings.resolve(workspace.id);
const system = buildSystemPrompt({
workspace,
adminPrompt: resolved?.systemPrompt,
// The role (pre-resolved by the controller) REPLACES the persona layer;
// the safety framework is still appended by buildSystemPrompt.
roleInstructions: role?.instructions,
openedPage: body.openPage,
});
// Pass the resolved chatId so the write tools can mint provenance tokens // Build the external MCP toolset FIRST so the system prompt can carry each
// (access + collab) carrying { actor:'agent', aiChatId: chatId }, making // connected server's admin-authored guidance (#180). Merge in admin-
// agent REST/collab writes attributable and non-spoofable (§6.5/§6.6). // configured external MCP tools (web search, etc.; §6.8). A down/slow
const docmostTools = await this.tools.forUser( // external server never crashes the turn — toolsFor skips it and records the
user, // outcome. The returned client handles MUST be closed in the streamText
sessionId, // lifecycle (onFinish/onError/onAbort) — leaking them is a bug. Docmost
workspace.id, // tools take precedence on a name clash (external are namespaced, so a clash
chatId, // is not expected; the spread order makes intent explicit).
// Same open-page value used by the system prompt above; exposed to the
// model via getCurrentPage so page identity survives prompt mangling.
body.openPage,
);
// Merge in admin-configured external MCP tools (web search, etc.; §6.8).
// A down/slow external server never crashes the turn — toolsFor skips it and
// records the outcome. The returned client handles MUST be closed in the
// streamText lifecycle (onFinish/onError/onAbort) — leaking them is a bug.
// Docmost tools take precedence on a name clash (external are namespaced, so
// a clash is not expected; the spread order makes intent explicit).
let external: Awaited<ReturnType<McpClientsService['toolsFor']>> = { let external: Awaited<ReturnType<McpClientsService['toolsFor']>> = {
tools: {}, tools: {},
clients: [], clients: [],
outcomes: [], outcomes: [],
instructions: [],
}; };
try { try {
external = await this.mcpClients.toolsFor(workspace.id); external = await this.mcpClients.toolsFor(workspace.id);
@@ -356,12 +362,15 @@ export class AiChatService implements OnModuleInit {
}`, }`,
); );
} }
const tools = { ...external.tools, ...docmostTools };
// Close every external client EXACTLY ONCE across the turn's terminal // Close every external client EXACTLY ONCE across the turn's terminal
// callbacks (onFinish/onError/onAbort all fire at most once collectively, // callbacks (onFinish/onError/onAbort all fire at most once collectively,
// but guard anyway). Close errors are swallowed so they never break the // but guard anyway). DEFINED HERE — before the prompt/toolset are built — so
// response. // that if buildSystemPrompt or forUser throws AFTER the external lease was
// taken (toolsFor above), the lease is still released. Otherwise its refCount
// stays >= 1 forever and the external undici sockets leak until restart
// (#180 reorder moved toolsFor ahead of these; #185 review). Close errors are
// swallowed so they never break the response.
let clientsClosed = false; let clientsClosed = false;
const closeExternalClients = async (): Promise<void> => { const closeExternalClients = async (): Promise<void> => {
if (clientsClosed) return; if (clientsClosed) return;
@@ -379,6 +388,44 @@ export class AiChatService implements OnModuleInit {
); );
}; };
// Build the system prompt + Docmost toolset. If either throws after the
// external MCP lease was taken above, release the lease before rethrowing so
// the leased transports are not leaked (#185 review).
let system: string;
let docmostTools: Awaited<ReturnType<AiChatToolsService['forUser']>>;
try {
system = buildSystemPrompt({
workspace,
adminPrompt: resolved?.systemPrompt,
// The role (pre-resolved by the controller) REPLACES the persona layer;
// the safety framework is still appended by buildSystemPrompt.
roleInstructions: role?.instructions,
// Server-validated open page (authoritative title), not the client value.
openedPage: openPageContext,
// Guidance only for servers that connected and yielded ≥1 callable tool.
mcpInstructions: external.instructions,
});
// Pass the resolved chatId so the write tools can mint provenance tokens
// (access + collab) carrying { actor:'agent', aiChatId: chatId }, making
// agent REST/collab writes attributable and non-spoofable (§6.5/§6.6).
docmostTools = await this.tools.forUser(
user,
sessionId,
workspace.id,
chatId,
// Same server-validated open page used by the system prompt above;
// exposed to the model via getCurrentPage so page identity (and the
// AUTHORITATIVE title) survives prompt mangling / client title spoofing.
openPageContext,
);
} catch (err) {
await closeExternalClients();
throw err;
}
const tools = { ...external.tools, ...docmostTools };
// Accumulate the turn's streamed output so a provider error / disconnect can // Accumulate the turn's streamed output so a provider error / disconnect can
// persist the PARTIAL answer the user already saw — the SDK's onError/onAbort // persist the PARTIAL answer the user already saw — the SDK's onError/onAbort
// callbacks don't hand us the in-progress text. `capturedSteps` holds finished // callbacks don't hand us the in-progress text. `capturedSteps` holds finished
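
The lease-release ordering described in this hunk (define the closer first, then run the steps that may throw, and release before rethrowing) can be sketched as a small helper. This is an illustration only: `Closable` here is a local stand-in for the type of the same name in the MCP client service, and `makeCloseOnce` and `buildWithLease` are hypothetical names that do not appear in the service.

```typescript
// Sketch only: a local stand-in for the leased external client handle.
interface Closable {
  close(): Promise<void>;
}

// Returns a closer that releases every client at most once and swallows
// close errors so they can never break the response.
function makeCloseOnce(clients: Closable[]): () => Promise<void> {
  let closed = false;
  return async () => {
    if (closed) return;
    closed = true;
    await Promise.all(clients.map((c) => c.close().catch(() => undefined)));
  };
}

// The closer exists before `build` runs, so a throw inside `build` still
// releases the lease. On success the caller owns `close` and must invoke it
// from the turn's terminal callbacks.
async function buildWithLease<T>(
  clients: Closable[],
  build: () => Promise<T>,
): Promise<{ value: T; close: () => Promise<void> }> {
  const close = makeCloseOnce(clients);
  try {
    const value = await build();
    return { value, close };
  } catch (err) {
    await close();
    throw err;
  }
}
```

Setting the flag before awaiting the close calls is what makes the guard safe when two terminal callbacks race: the second caller returns immediately instead of closing the same handles again.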

View File

@@ -42,6 +42,15 @@ export class CreateMcpServerDto {
@IsString({ each: true }) @IsString({ each: true })
toolAllowlist?: string[]; toolAllowlist?: string[];
// Admin-authored guidance ("how/when to use this server's tools") injected
// into the agent system prompt next to the tool descriptions (#180). Trusted,
// NON-secret (so it IS returned). Capped to bound prompt/token size (the
// built-in guide is ~1.5KB). Blank => stored as null.
@IsOptional()
@IsString()
@MaxLength(4000)
instructions?: string;
@IsOptional() @IsOptional()
@IsBoolean() @IsBoolean()
enabled?: boolean; enabled?: boolean;

View File

@@ -0,0 +1,75 @@
import 'reflect-metadata';
import { plainToInstance } from 'class-transformer';
import { validateSync } from 'class-validator';
import { CreateMcpServerDto } from './create-mcp-server.dto';
import { UpdateMcpServerDto } from './update-mcp-server.dto';
/**
* API-boundary validation for the per-server `instructions` field (#180): a free
* text guide injected into the agent system prompt. It is optional, must be a
* string, and is bounded by @MaxLength(4000) to cap prompt/token size.
*/
describe('MCP server DTO instructions validation', () => {
function validateCreate(payload: unknown) {
const dto = plainToInstance(CreateMcpServerDto, payload);
return validateSync(dto as object);
}
function validateUpdate(payload: unknown) {
const dto = plainToInstance(UpdateMcpServerDto, payload);
return validateSync(dto as object);
}
const base = {
name: 'Tavily',
transport: 'http',
url: 'https://example.com/mcp',
};
it('accepts an omitted instructions field on create', () => {
expect(validateCreate({ ...base })).toHaveLength(0);
});
it('accepts a reasonable instructions string on create', () => {
expect(
validateCreate({ ...base, instructions: 'Use search for fresh facts.' }),
).toHaveLength(0);
});
it('rejects instructions over MaxLength(4000) on create', () => {
const errors = validateCreate({
...base,
instructions: 'a'.repeat(4001),
});
expect(
errors.some(
(e) =>
e.property === 'instructions' &&
e.constraints !== undefined &&
'maxLength' in e.constraints,
),
).toBe(true);
});
it('accepts instructions of exactly 4000 chars on create', () => {
expect(
validateCreate({ ...base, instructions: 'a'.repeat(4000) }),
).toHaveLength(0);
});
it('rejects a non-string instructions value', () => {
const errors = validateCreate({ ...base, instructions: 123 });
expect(errors.some((e) => e.property === 'instructions')).toBe(true);
});
it('rejects instructions over MaxLength(4000) on update', () => {
const errors = validateUpdate({ instructions: 'a'.repeat(4001) });
expect(
errors.some(
(e) =>
e.property === 'instructions' &&
e.constraints !== undefined &&
'maxLength' in e.constraints,
),
).toBe(true);
});
});
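
The storage rule stated in the DTO comments (absent means unchanged, blank means cleared to null, length capped at 4000) can be sketched as one pure function. This is a hypothetical helper, not code from the repository: `normalizeInstructions` and `MAX_INSTRUCTIONS_LENGTH` are invented names, the real length check is done by `@MaxLength(4000)` at the API boundary, and storing non-blank text as given (untrimmed) is an assumption.

```typescript
// Sketch only: hypothetical helper combining the DTO cap with the repo's
// blank-to-null rule. The real code splits these across two layers.
const MAX_INSTRUCTIONS_LENGTH = 4000;

// undefined => leave the stored value unchanged
// blank     => null (guidance cleared)
// otherwise => the text as given (assumption: not trimmed before storage)
function normalizeInstructions(
  input: string | undefined,
): string | null | undefined {
  if (input === undefined) return undefined;
  if (input.length > MAX_INSTRUCTIONS_LENGTH) {
    throw new RangeError(
      `instructions must be at most ${MAX_INSTRUCTIONS_LENGTH} characters`,
    );
  }
  return input.trim() === '' ? null : input;
}
```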

View File

@@ -43,6 +43,13 @@ export class UpdateMcpServerDto {
@IsString({ each: true }) @IsString({ each: true })
toolAllowlist?: string[]; toolAllowlist?: string[];
// Admin-authored prompt guidance (#180). Absent => unchanged; blank => cleared
// (stored as null by the repo). Capped to bound prompt/token size.
@IsOptional()
@IsString()
@MaxLength(4000)
instructions?: string;
@IsOptional() @IsOptional()
@IsBoolean() @IsBoolean()
enabled?: boolean; enabled?: boolean;

View File

@@ -33,6 +33,26 @@ interface ServerOutcome {
reason?: string; reason?: string;
} }
/**
* One server's admin-authored guidance for the agent system prompt (#180).
* Built ONLY for a server that actually connected AND contributed ≥1 tool
* (after the allowlist filter) AND has non-blank guidance — so a guide never
* appears for a server whose tools the agent cannot actually call.
*/
export interface McpServerInstruction {
/** Display name of the server (for the prompt section header). */
serverName: string;
/**
* The tool-name namespace prefix the server's tools were merged under
* (sanitized name, e.g. `tavily`). The prompt renders this as `tavily_*` so
* the model can connect the guidance to the actual tool names. Advisory:
* individual tools may carry a disambiguating suffix on rare collisions.
*/
toolPrefix: string;
/** The trusted, non-blank guidance text. */
instructions: string;
}
export interface ExternalToolset { export interface ExternalToolset {
/** Namespaced external tools, merge-ready into the agent toolset. */ /** Namespaced external tools, merge-ready into the agent toolset. */
tools: Record<string, Tool>; tools: Record<string, Tool>;
@@ -40,6 +60,11 @@ export interface ExternalToolset {
clients: Closable[]; clients: Closable[];
/** Per-server connect outcomes so the UI can show unavailable servers. */ /** Per-server connect outcomes so the UI can show unavailable servers. */
outcomes: ServerOutcome[]; outcomes: ServerOutcome[];
/**
* Per-server prompt guidance for connected servers that contributed ≥1 tool
* and have non-blank instructions. Empty when no server qualifies.
*/
instructions: McpServerInstruction[];
} }
/** Connect+tools() timeout per server — a slow server must not stall the turn. */ /** Connect+tools() timeout per server — a slow server must not stall the turn. */
@@ -60,6 +85,8 @@ interface CacheEntry {
tools: Record<string, Tool>; tools: Record<string, Tool>;
clients: McpClient[]; clients: McpClient[];
outcomes: ServerOutcome[]; outcomes: ServerOutcome[];
/** Prompt guidance for qualifying servers (see McpServerInstruction). */
instructions: McpServerInstruction[];
expiresAt: number; expiresAt: number;
/** Active leases (turns currently using these clients). */ /** Active leases (turns currently using these clients). */
refCount: number; refCount: number;
@@ -141,6 +168,7 @@ export class McpClientsService {
tools: entry.tools, tools: entry.tools,
clients: [release], clients: [release],
outcomes: entry.outcomes, outcomes: entry.outcomes,
instructions: entry.instructions,
}; };
} }
@@ -225,6 +253,7 @@ export class McpClientsService {
const outcomes: ServerOutcome[] = []; const outcomes: ServerOutcome[] = [];
// Per-call total wall-clock cap, read once for this build (env-overridable). // Per-call total wall-clock cap, read once for this build (env-overridable).
const callTimeoutMs = mcpCallTimeoutMs(); const callTimeoutMs = mcpCallTimeoutMs();
const instructions: McpServerInstruction[] = [];
for (const server of servers) { for (const server of servers) {
try { try {
@@ -233,17 +262,33 @@ export class McpClientsService {
clients.push(client); clients.push(client);
const allow = server.toolAllowlist; const allow = server.toolAllowlist;
const picked = const picked =
Array.isArray(allow) && allow.length > 0 Array.isArray(allow) && allow.length > 0 ? pick(raw, allow) : raw;
? pick(raw, allow)
: raw;
// Bound each tool's execute with a per-call total-timeout guard before // Bound each tool's execute with a per-call total-timeout guard before
// merging, so a single chatty-but-stuck call is aborted after the cap. // merging, so a single chatty-but-stuck call is aborted after the cap.
const guarded = wrapToolsWithCallTimeout(picked, callTimeoutMs); const guarded = wrapToolsWithCallTimeout(picked, callTimeoutMs);
// Namespace each tool with the sanitized server name AND disambiguate // Namespace each tool with the sanitized server name AND disambiguate
// against names already merged from earlier servers, so no external // against names already merged from earlier servers, so no external
// tool is silently overwritten on collision. // tool is silently overwritten on collision. The returned count drives
this.mergeNamespaced(tools, guarded, server.name, server.id); // whether this server's prompt guidance is included (≥1 tool merged).
const merged = this.mergeNamespaced(
tools,
guarded,
server.name,
server.id,
);
outcomes.push({ name: server.name, ok: true }); outcomes.push({ name: server.name, ok: true });
// Include this server's guidance ONLY when it actually contributed at
// least one tool the agent can call (allowlist may have filtered all of
// them out) AND the admin authored non-blank instructions. The header
// prefix is the sanitized server name (= the tool namespace prefix).
const guide = server.instructions?.trim();
if (merged.count > 0 && guide) {
instructions.push({
serverName: server.name,
toolPrefix: merged.prefix,
instructions: guide,
});
}
} catch (err) { } catch (err) {
// A failed server is skipped — the turn proceeds with the rest. Log a // A failed server is skipped — the turn proceeds with the rest. Log a
// short warning (never the URL/headers) so ops can see degradation, and // short warning (never the URL/headers) so ops can see degradation, and
@@ -260,6 +305,7 @@ export class McpClientsService {
tools, tools,
clients, clients,
outcomes, outcomes,
instructions,
 expiresAt: Date.now() + CACHE_TTL_MS,
 refCount: 0,
 evicted: false,
@@ -276,16 +322,19 @@ export class McpClientsService {
 * renaming any key that would collide with an already-merged tool (different
 * servers with the same sanitized name, or duplicates after truncation), so
 * no external tool is silently dropped via overwrite.
+*
+* Returns how many tools this server actually contributed and the namespace
+* prefix used (the sanitized server name) so the caller can attach the
+* server's prompt guidance only when ≥1 tool was merged.
 */
 private mergeNamespaced(
 target: Record<string, Tool>,
 picked: Record<string, Tool>,
 serverName: string,
 serverId: string,
-): void {
-for (const [name, tool] of Object.entries(
-namespace(picked, serverName),
-)) {
+): { count: number; prefix: string } {
+let count = 0;
+for (const [name, tool] of Object.entries(namespace(picked, serverName))) {
 let key = name;
 if (key in target) {
 const original = key;
@@ -295,7 +344,9 @@ export class McpClientsService {
 );
 }
 target[key] = tool;
+count += 1;
 }
+return { count, prefix: namespacePrefix(serverName) };
 }
 /**
@@ -371,9 +422,7 @@ export class McpClientsService {
 /** Close clients, swallowing close errors so they never break a response. */
 private async closeClients(clients: McpClient[]): Promise<void> {
-await Promise.all(
-clients.map((c) => c.close().catch(() => undefined)),
-);
+await Promise.all(clients.map((c) => c.close().catch(() => undefined)));
 }
 }
@@ -386,9 +435,10 @@ export class McpClientsService {
 * lookup hands net/tls.connect ONLY a set that passed this check, so the kernel
 * can never connect to an address that did not pass the guard. Pure — no I/O.
 */
-export function validateResolvedAddresses(
-addrs: readonly LookupAddress[],
-): { ok: boolean; blockedHost?: string } {
+export function validateResolvedAddresses(addrs: readonly LookupAddress[]): {
+ok: boolean;
+blockedHost?: string;
+} {
 if (addrs.length === 0) {
 return { ok: false };
 }
@@ -524,7 +574,7 @@ function namespace(
 tools: Record<string, Tool>,
 serverName: string,
 ): Record<string, Tool> {
-const prefix = sanitizeName(serverName) || 'mcp';
+const prefix = namespacePrefix(serverName);
 const out: Record<string, Tool> = {};
 for (const [name, t] of Object.entries(tools)) {
 const safe = sanitizeName(name);
@@ -539,6 +589,15 @@ function namespace(
 return out;
 }
+/**
+* The tool-name namespace prefix for a server: its sanitized name, or `mcp`
+* when the name sanitizes to empty. Tools are merged as `${prefix}_${tool}`, so
+* the prompt guidance refers to the server's tools as `${prefix}_*`.
+*/
+function namespacePrefix(serverName: string): string {
+return sanitizeName(serverName) || 'mcp';
+}
 /** Reduce an arbitrary string to ^[a-zA-Z0-9_-]+, collapsing runs to '_'. */
 function sanitizeName(value: string): string {
 return value
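Reviewer sketch (not part of the change): the new return value of mergeNamespaced can be illustrated with a minimal, self-contained version. Everything here is an assumption made for illustration: `ToolSketch` stands in for the `Tool` type from `ai`, the sanitizer body is guessed from its doc comment, and the numeric-suffix collision strategy is hypothetical (the real renaming logic sits in a part of the hunk that the diff does not show).

```typescript
// Hypothetical stand-in for the `Tool` type from the `ai` package.
type ToolSketch = { description: string };

// Assumed implementation: the diff only states the contract (reduce to
// [a-zA-Z0-9_-], collapsing runs of other characters to '_'). Trimming the
// edge underscores is an extra assumption so that "???" sanitizes to ''.
function sanitizeNameSketch(value: string): string {
  return value.replace(/[^a-zA-Z0-9_-]+/g, '_').replace(/^_+|_+$/g, '');
}

// Mirrors namespacePrefix from the diff: sanitized name, or 'mcp' when empty.
function namespacePrefixSketch(serverName: string): string {
  return sanitizeNameSketch(serverName) || 'mcp';
}

// Merges `picked` into `target` under `${prefix}_${tool}` keys and reports how
// many tools were contributed, so a caller can skip guidance when count is 0.
function mergeNamespacedSketch(
  target: Record<string, ToolSketch>,
  picked: Record<string, ToolSketch>,
  serverName: string,
): { count: number; prefix: string } {
  const prefix = namespacePrefixSketch(serverName);
  let count = 0;
  for (const [name, tool] of Object.entries(picked)) {
    const base = `${prefix}_${sanitizeNameSketch(name)}`;
    let key = base;
    // Hypothetical collision strategy: add a numeric suffix until the key is free.
    for (let n = 2; key in target; n += 1) {
      key = `${base}_${n}`;
    }
    target[key] = tool;
    count += 1;
  }
  return { count, prefix };
}
```

Returning the count from the merge itself, rather than recounting keys afterwards, keeps the "contributed at least one tool" decision tied to the same loop that handles collisions.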


@@ -0,0 +1,168 @@
import { type Tool } from 'ai';
import { McpClientsService } from './mcp-clients.service';
/**
* Tests for the per-server prompt guidance (#180) assembled by buildEntry and
* surfaced via toolsFor().instructions.
*
* REACHABILITY NOTE: buildEntry is a PRIVATE method; the smallest reachable
* public path is toolsFor() -> getOrBuildEntry -> buildEntry -> connect/tools()
* -> mergeNamespaced. We drive that path: stub the repo's `listEnabled` and spy
* on the private `connect` to return fake MCP clients whose `tools()` we control.
*
* Contract (all checked here): a server's guidance is included ONLY when the
* server actually connected AND contributed ≥1 callable tool (after the
* allowlist filter) AND its instructions are non-blank. The header carries the
* tool namespace prefix (the sanitized server name).
*/
function fakeTool(): Tool {
return { description: 'x', inputSchema: undefined } as unknown as Tool;
}
interface FakeServer {
id: string;
name: string;
transport: string;
url: string;
headersEnc: string | null;
toolAllowlist: string[] | null;
instructions: string | null;
}
function server(
over: Partial<FakeServer> & { id: string; name: string },
): FakeServer {
return {
transport: 'http',
url: 'https://example.com/mcp',
headersEnc: null,
toolAllowlist: null,
instructions: null,
...over,
};
}
async function instructionsFor(
servers: FakeServer[],
toolsByServerId: Record<string, Record<string, Tool>>,
// Server ids whose connect should THROW (simulating an unavailable server).
failingIds: Set<string> = new Set(),
): Promise<
{
serverName: string;
toolPrefix: string;
instructions: string;
}[]
> {
const repoStub = {
listEnabled: jest.fn().mockResolvedValue(servers),
};
const service = new McpClientsService(repoStub as never, {} as never);
jest
.spyOn(
service as unknown as { connect: (s: FakeServer) => unknown },
'connect',
)
.mockImplementation((s: FakeServer) => {
if (failingIds.has(s.id)) {
return Promise.reject(new Error('connection failed'));
}
return Promise.resolve({
tools: () => Promise.resolve(toolsByServerId[s.id] ?? {}),
close: () => Promise.resolve(),
});
});
const toolset = await service.toolsFor('ws-1');
await Promise.all(toolset.clients.map((c) => c.close()));
return toolset.instructions;
}
describe('external MCP per-server prompt guidance (via toolsFor)', () => {
afterEach(() => jest.restoreAllMocks());
it('includes guidance for a connected server with non-empty text and ≥1 tool', async () => {
const instructions = await instructionsFor(
[
server({
id: 'id-tavily',
name: 'Tavily',
instructions: 'Use tavily_search for fresh facts.',
}),
],
{ 'id-tavily': { search: fakeTool() } },
);
// sanitizeName preserves case (charset [a-zA-Z0-9_-]), so the prefix is the
// server name as-is for an already-clean name.
expect(instructions).toEqual([
{
serverName: 'Tavily',
toolPrefix: 'Tavily',
instructions: 'Use tavily_search for fresh facts.',
},
]);
});
it('omits guidance when the server has no instructions', async () => {
const instructions = await instructionsFor(
[server({ id: 'id-1', name: 'Tavily', instructions: null })],
{ 'id-1': { search: fakeTool() } },
);
expect(instructions).toEqual([]);
});
it('omits guidance when the instructions are only whitespace', async () => {
const instructions = await instructionsFor(
[server({ id: 'id-1', name: 'Tavily', instructions: ' ' })],
{ 'id-1': { search: fakeTool() } },
);
expect(instructions).toEqual([]);
});
it('omits guidance for a server that contributed ZERO tools (allowlist filtered all out)', async () => {
const instructions = await instructionsFor(
[
server({
id: 'id-1',
name: 'Tavily',
instructions: 'guide',
// Allowlist names a tool the server does not expose -> 0 picked.
toolAllowlist: ['nonexistent'],
}),
],
{ 'id-1': { search: fakeTool() } },
);
expect(instructions).toEqual([]);
});
it('omits guidance for an unavailable (failed-connect) server', async () => {
const instructions = await instructionsFor(
[server({ id: 'id-1', name: 'Tavily', instructions: 'guide' })],
{ 'id-1': { search: fakeTool() } },
new Set(['id-1']),
);
expect(instructions).toEqual([]);
});
it('includes only the qualifying servers among several', async () => {
const instructions = await instructionsFor(
[
server({ id: 'ok', name: 'Tavily', instructions: 'web guide' }),
server({ id: 'blank', name: 'Crawl', instructions: '' }),
server({ id: 'down', name: 'Down', instructions: 'never shown' }),
],
{
ok: { search: fakeTool() },
blank: { crawl: fakeTool() },
down: { x: fakeTool() },
},
new Set(['down']),
);
expect(instructions).toEqual([
{ serverName: 'Tavily', toolPrefix: 'Tavily', instructions: 'web guide' },
]);
});
});
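Reviewer sketch (not part of the change): the contract these tests pin can be read as one pure filter. This is a minimal illustration under stated assumptions; `ServerOutcomeSketch` and `collectGuidanceSketch` are hypothetical names, and the real buildEntry works on live MCP clients rather than a precomputed list.

```typescript
// Hypothetical per-server summary; the real service derives these facts while
// connecting and merging tools.
interface ServerOutcomeSketch {
  serverName: string;
  toolPrefix: string;
  instructions: string | null;
  connected: boolean; // false when connect() rejected
  toolCount: number; // tools left after the allowlist filter
}

interface GuidanceSketch {
  serverName: string;
  toolPrefix: string;
  instructions: string;
}

// Keep guidance only for servers that connected, contributed at least one
// tool, and carry non-blank instructions (the three conditions in the spec).
function collectGuidanceSketch(outcomes: ServerOutcomeSketch[]): GuidanceSketch[] {
  const out: GuidanceSketch[] = [];
  for (const o of outcomes) {
    const text = (o.instructions ?? '').trim();
    if (!o.connected || o.toolCount < 1 || text === '') continue;
    out.push({ serverName: o.serverName, toolPrefix: o.toolPrefix, instructions: text });
  }
  return out;
}
```

Trimming inside the filter is an assumption; per the repo comments in this PR, blank values are already normalized to null on write, so the check here acts as a second line of defence.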


@@ -17,6 +17,7 @@ function row(overrides: Partial<AiMcpServer>): AiMcpServer {
 enabled: true,
 toolAllowlist: null,
 headersEnc: null,
+instructions: null,
 ...overrides,
 } as unknown as AiMcpServer;
 }
@@ -28,11 +29,7 @@ describe('McpServersService.toView (via list) — encrypted-header leak guard',
 };
 // secretBox + clients are unused by the list/toView path; pass stubs to
 // satisfy the constructor.
-return new McpServersService(
-repoStub as never,
-{} as never,
-{} as never,
-);
+return new McpServersService(repoStub as never, {} as never, {} as never);
 }
 it('exposes hasHeaders:true and NO headersEnc when auth headers are set', async () => {
@@ -67,6 +64,7 @@ describe('McpServersService.toView (via list) — encrypted-header leak guard',
 enabled: false,
 toolAllowlist: ['search'],
 headersEnc: 'BLOB',
+instructions: 'Use search for fresh web facts.',
 }),
 ]);
@@ -80,6 +78,19 @@ describe('McpServersService.toView (via list) — encrypted-header leak guard',
 enabled: false,
 toolAllowlist: ['search'],
 hasHeaders: true,
+instructions: 'Use search for fresh web facts.',
 });
 });
+it('returns instructions (NON-secret) in the view, null when unset', async () => {
+const service = buildService([
+row({ id: 'a', instructions: 'How to use these tools.' }),
+row({ id: 'b', instructions: null }),
+]);
+const [withText, withoutText] = await service.list('ws-1');
+expect(withText.instructions).toBe('How to use these tools.');
+expect(withoutText.instructions).toBeNull();
+});
 });


@@ -20,6 +20,9 @@ export interface McpServerView {
 enabled: boolean;
 toolAllowlist: string[] | null;
 hasHeaders: boolean;
+// Admin-authored prompt guidance (#180). NON-secret, so returned in the view.
+// Null when no guidance is configured.
+instructions: string | null;
 }
 /**
@@ -56,6 +59,8 @@ export class McpServersService {
 url: dto.url,
 headersEnc,
 toolAllowlist: dto.toolAllowlist ?? null,
+// Blank/whitespace guidance is normalized to null by the repo.
+instructions: dto.instructions ?? null,
 enabled: dto.enabled ?? true,
 });
 this.clients.invalidate(workspaceId);
@@ -97,6 +102,8 @@ export class McpServersService {
 headersEnc,
 // undefined => unchanged; [] / value handled by repo (empty => null).
 toolAllowlist: dto.toolAllowlist,
+// undefined => unchanged; blank => cleared (null) by the repo.
+instructions: dto.instructions,
 enabled: dto.enabled,
 });
 this.clients.invalidate(workspaceId);
@@ -167,6 +174,7 @@ export class McpServersService {
 enabled: row.enabled,
 toolAllowlist: row.toolAllowlist ?? null,
 hasHeaders: Boolean(row.headersEnc),
+instructions: row.instructions ?? null,
 };
 }
 }
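Reviewer sketch (not part of the change): the row-to-view mapping treats the two columns differently, which a small standalone version makes explicit. The field set below is a hypothetical subset chosen for illustration; only `enabled`, `toolAllowlist`, `hasHeaders`, `headersEnc` and `instructions` are visible in this diff.

```typescript
// Hypothetical, reduced row and view shapes.
interface RowSketch {
  id: string;
  enabled: boolean;
  toolAllowlist: string[] | null;
  headersEnc: string | null; // encrypted secret: must never leave the server
  instructions: string | null; // non-secret admin guidance
}

interface ViewSketch {
  id: string;
  enabled: boolean;
  toolAllowlist: string[] | null;
  hasHeaders: boolean;
  instructions: string | null;
}

// The secret column is reduced to a boolean; the guidance text is passed
// through as-is (null when unset).
function toViewSketch(row: RowSketch): ViewSketch {
  return {
    id: row.id,
    enabled: row.enabled,
    toolAllowlist: row.toolAllowlist ?? null,
    hasHeaders: Boolean(row.headersEnc),
    instructions: row.instructions ?? null,
  };
}
```

Building the view field by field, instead of spreading the row and deleting `headersEnc`, means a newly added secret column cannot leak by default.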


@@ -1,30 +0,0 @@
import { jsonbObject } from '@docmost/db/repos/ai-agent-roles/ai-agent-roles.repo';
/**
* Unit tests for jsonbObject: the repo helper that encodes a model_config object
* as a jsonb bind (or null when there is nothing to persist). It is the last
* line of defence before the column write, so the null-vs-bind decision is what
* matters here. We assert only null vs non-null because the non-null value is a
* kysely `sql` template fragment whose internal shape is an implementation
* detail of the SQL tag.
*/
describe('jsonbObject', () => {
it('returns null for null', () => {
expect(jsonbObject(null)).toBeNull();
});
it('returns null for undefined', () => {
expect(jsonbObject(undefined)).toBeNull();
});
it('returns null for an empty object (nothing to persist)', () => {
expect(jsonbObject({})).toBeNull();
});
it('returns a (non-null) jsonb bind for a non-empty object', () => {
const out = jsonbObject({ driver: 'gemini', chatModel: 'gemini-2.0-flash' });
// A real sql fragment is produced, never null/undefined.
expect(out).not.toBeNull();
expect(out).toBeDefined();
});
});


@@ -0,0 +1,133 @@
import * as fs from 'node:fs';
import { ShareSeoController } from './share-seo.controller';
/**
* Routing guard for ShareSeoController.getShare (red-team finding #3).
*
* The SEO route must NOT leak a shared page's <title>/og:title to anonymous
* visitors / crawlers when the page is not publicly readable. It previously
* called the raw `getShareForPage`, which skips the restricted-ancestor gate, so
* a permission-restricted descendant of an includeSubPages share leaked its
* title. The fix funnels through `resolveReadableSharePage` (the canonical gate)
* AND honours `isSharingAllowed`. These tests pin that routing: a non-readable
* page or sharing-disabled space serves the plain SPA index (no title); only a
* readable, still-shared page gets meta tags.
*/
const SECRET_TITLE = 'Restricted Quarterly Numbers';
const INDEX_HTML = `<!doctype html><html><head><title>App</title><!--meta-tags--></head><body></body></html>`;
const STREAM_SENTINEL = { __isStream: true } as unknown as fs.ReadStream;
// Stub fs at CALL time (jest.spyOn), NOT module load (jest.mock): the controller
// transitively pulls bcrypt, whose native module is located by node-gyp-build
// reading the filesystem at import time — a module-level fs mock breaks that.
beforeEach(() => {
jest.spyOn(fs, 'existsSync').mockReturnValue(true);
jest.spyOn(fs, 'readFileSync').mockReturnValue(INDEX_HTML);
jest.spyOn(fs, 'createReadStream').mockReturnValue(STREAM_SENTINEL);
});
afterEach(() => jest.restoreAllMocks());
function makeRes() {
const res: any = {
sent: undefined as unknown,
type: jest.fn(() => res),
send: jest.fn((v: unknown) => {
res.sent = v;
}),
};
return res;
}
function makeController(opts: {
resolved: { share: any; page: any } | null;
sharingAllowed?: boolean;
}) {
const shareService = {
resolveReadableSharePage: jest.fn(async () => opts.resolved),
isSharingAllowed: jest.fn(async () => opts.sharingAllowed ?? true),
// Must NEVER be used by the SEO path anymore (the bypass is the bug).
getShareForPage: jest.fn(async () => {
throw new Error('getShareForPage must not be called by the SEO path');
}),
};
const workspaceRepo = {
findFirst: async () => ({ id: 'ws-1', settings: {} }),
};
const environmentService = { isSelfHosted: () => true };
const controller = new ShareSeoController(
shareService as any,
workspaceRepo as any,
environmentService as any,
);
return { controller, shareService };
}
const req: any = { raw: { headers: { host: 'self' } } };
describe('ShareSeoController.getShare routing (#3 title-leak gate)', () => {
it('serves the plain index (NO title) when the page is not publicly readable', async () => {
const { controller, shareService } = makeController({ resolved: null });
const res = makeRes();
await controller.getShare(res, req, 'share-key', `slug-pageB`);
// The restricted-ancestor gate ran; the raw bypass did not.
expect(shareService.resolveReadableSharePage).toHaveBeenCalled();
expect(shareService.getShareForPage).not.toHaveBeenCalled();
// The plain index stream was sent — NOT the title-bearing meta HTML.
expect(res.sent).toBe(STREAM_SENTINEL);
});
it('serves the plain index when sharing was disabled at the workspace/space level', async () => {
const { controller } = makeController({
resolved: {
share: { spaceId: 'sp-1', searchIndexing: true },
page: { title: SECRET_TITLE },
},
sharingAllowed: false,
});
const res = makeRes();
await controller.getShare(res, req, 'share-key', 'slug-pageB');
// The plain index stream was sent, so the restricted title never reached
// the response (it is only ever interpolated into the meta HTML string).
expect(res.sent).toBe(STREAM_SENTINEL);
expect(res.sent).not.toBe(SECRET_TITLE);
});
it('injects the title + meta for a readable, still-shared page', async () => {
const { controller } = makeController({
resolved: {
share: { spaceId: 'sp-1', searchIndexing: true },
page: { title: 'Public Handbook' },
},
sharingAllowed: true,
});
const res = makeRes();
await controller.getShare(res, req, 'share-key', 'slug-pageA');
expect(typeof res.sent).toBe('string');
expect(res.sent as string).toContain('<title>Public Handbook</title>');
expect(res.sent as string).toContain('og:title');
// searchIndexing on => crawlable (no noindex).
expect(res.sent as string).not.toContain('content="noindex"');
});
it('adds robots=noindex when the share opted out of search indexing', async () => {
const { controller } = makeController({
resolved: {
share: { spaceId: 'sp-1', searchIndexing: false },
page: { title: 'Internal Notes' },
},
sharingAllowed: true,
});
const res = makeRes();
await controller.getShare(res, req, 'share-key', 'slug-pageA');
expect(res.sent as string).toContain('content="noindex"');
});
});
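Reviewer sketch (not part of the change): the routing these tests pin reduces to a two-gate decision. The version below is a simplified, synchronous illustration; `decideSeoResponseSketch` and its types are hypothetical, and the real controller awaits resolveReadableSharePage and isSharingAllowed before reaching this point.

```typescript
// Hypothetical shape of what the readable-share gate returns.
interface ResolvedShareSketch {
  share: { spaceId: string; searchIndexing: boolean };
  page: { title: string };
}

type SeoDecisionSketch =
  | { kind: 'plain-index' }
  | { kind: 'meta'; title: string; noindex: boolean };

// Gate 1: the page must be publicly readable (resolved is non-null).
// Gate 2: sharing must still be allowed for the workspace/space.
// Only when both pass does the title reach the response.
function decideSeoResponseSketch(
  resolved: ResolvedShareSketch | null,
  sharingAllowed: boolean,
): SeoDecisionSketch {
  if (resolved === null) return { kind: 'plain-index' };
  if (!sharingAllowed) return { kind: 'plain-index' };
  return {
    kind: 'meta',
    title: resolved.page.title,
    noindex: !resolved.share.searchIndexing,
  };
}
```

Because the title exists only on the `meta` branch of the union, any caller that handles `plain-index` cannot accidentally interpolate it.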


@@ -63,19 +63,38 @@ export class ShareSeoController {
 const pageId = this.extractPageSlugId(pageSlug);
-const share = await this.shareService.getShareForPage(
+// Funnel through the canonical readable-share boundary (NOT the raw
+// getShareForPage) so the restricted-ancestor gate runs: a permission-
+// restricted descendant of an includeSubPages share must NOT leak its
+// title to anonymous visitors / crawlers (red-team finding #3). null =>
+// not publicly readable => serve the plain SPA index with no meta.
+const resolved = await this.shareService.resolveReadableSharePage(
+undefined,
 pageId,
 workspace.id,
 );
-if (!share) {
+if (!resolved) {
+return this.sendIndex(indexFilePath, res);
+}
+// Honour a workspace/space-level sharing toggle flipped off AFTER this
+// share was created: the content API gates on isSharingAllowed, so the SEO
+// path must too or it keeps serving the title for a no-longer-shared page.
+const sharingAllowed = await this.shareService.isSharingAllowed(
+workspace.id,
+resolved.share.spaceId,
+);
+if (!sharingAllowed) {
 return this.sendIndex(indexFilePath, res);
 }
 const html = fs.readFileSync(indexFilePath, 'utf8');
+// Title of the PAGE being viewed (server-resolved), and noindex unless the
+// share opted into search indexing (buildShareMetaHtml injects it).
 let transformedHtml = buildShareMetaHtml(html, {
-title: share?.sharedPage.title,
-searchIndexing: share.searchIndexing,
+title: resolved.page.title,
+searchIndexing: resolved.share.searchIndexing,
 });
 // Deliberate same-origin tracker surface: this is the ONE place where an


@@ -0,0 +1,38 @@
import { jsonbBind } from './utils';
/**
* Unit tests for jsonbBind: THE shared helper that encodes a JS array/object as
* a jsonb bind (or null when there is nothing to persist). It is the last line
* of defence before a jsonb column write, so the null-vs-bind decision is what
* matters here. We assert only null vs non-null because the non-null value is a
* kysely `sql` template fragment whose internal shape is an implementation
* detail of the SQL tag (the `::text::jsonb` double-encoding fix is verified
* end-to-end by the repo integration specs, where a real DB round-trip can
* actually observe `jsonb_typeof`).
*/
describe('jsonbBind', () => {
it('returns null for null / undefined', () => {
expect(jsonbBind(null)).toBeNull();
expect(jsonbBind(undefined)).toBeNull();
});
it('returns null for an empty array (nothing to persist)', () => {
expect(jsonbBind([])).toBeNull();
});
it('returns null for an empty object (nothing to persist)', () => {
expect(jsonbBind({})).toBeNull();
});
it('returns a (non-null) bind for a non-empty array', () => {
const out = jsonbBind(['search', 'crawl']);
expect(out).not.toBeNull();
expect(out).toBeDefined();
});
it('returns a (non-null) bind for a non-empty object', () => {
const out = jsonbBind({ driver: 'gemini', chatModel: 'gemini-2.0-flash' });
expect(out).not.toBeNull();
expect(out).toBeDefined();
});
});
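Reviewer sketch (not part of the change): the null-versus-bind decision tested above can be shown without kysely. `jsonbPayloadSketch` is a hypothetical stand-in that returns the JSON text that would be bound; the real jsonbBind presumably wraps that text in a kysely `sql` fragment with the `::text::jsonb` cast the doc comment mentions, which this sketch does not attempt to reproduce.

```typescript
type JsonbInputSketch = unknown[] | Record<string, unknown> | null | undefined;

// Returns null when there is nothing to persist (null, undefined, [] or {}),
// otherwise the JSON text that a jsonb bind would carry.
function jsonbPayloadSketch(value: JsonbInputSketch): string | null {
  if (value === null || value === undefined) return null;
  const size = Array.isArray(value) ? value.length : Object.keys(value).length;
  return size === 0 ? null : JSON.stringify(value);
}
```

Treating empty containers as null keeps "no allowlist" and "no model override" as a single stored state, so readers only have to handle null.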


@@ -0,0 +1,19 @@
import { type Kysely } from 'kysely';
export async function up(db: Kysely<any>): Promise<void> {
// Per-server, admin-authored instruction text injected into the agent system
// prompt next to the server's tool descriptions (#180). NON-secret (unlike
// headers_enc): it IS returned in admin views/forms. Nullable: a server may
// have no guidance. Trusted text — it goes inside the prompt safety sandwich.
await db.schema
.alterTable('ai_mcp_servers')
.addColumn('instructions', 'text', (col) => col)
.execute();
}
export async function down(db: Kysely<any>): Promise<void> {
await db.schema
.alterTable('ai_mcp_servers')
.dropColumn('instructions')
.execute();
}


@@ -35,7 +35,13 @@ describe('AiAgentRoleRepo.findLiveEnabled', () => {
 const result = await repo.findLiveEnabled('r-1', 'ws-1');
-expect(result).toBe(role);
+// The repo normalizes the row (modelConfig parse), so it returns a COPY, not
+// the same reference; assert the row's fields are carried through.
+expect(result).toMatchObject({
+id: 'r-1',
+workspaceId: 'ws-1',
+enabled: true,
+});
 expect(db.selectFrom).toHaveBeenCalledWith('aiAgentRoles');
 // Every security filter must be present.
 expect(where).toHaveBeenCalledWith('id', '=', 'r-1');


@@ -1,8 +1,7 @@
 import { Injectable } from '@nestjs/common';
 import { InjectKysely } from 'nestjs-kysely';
-import { sql } from 'kysely';
 import { KyselyDB, KyselyTransaction } from '../../types/kysely.types';
-import { dbOrTx } from '../../utils';
+import { dbOrTx, jsonbBind, parseJsonbValue } from '../../utils';
 import { AiAgentRole } from '@docmost/db/types/entity.types';
 /** The jsonb shape persisted in `model_config` (loosely typed for the column). */
@@ -23,13 +22,14 @@ export class AiAgentRoleRepo {
 id: string,
 workspaceId: string,
 ): Promise<AiAgentRole | undefined> {
-return this.db
+const row = await this.db
 .selectFrom('aiAgentRoles')
 .selectAll('aiAgentRoles')
 .where('id', '=', id)
 .where('workspaceId', '=', workspaceId)
 .where('deletedAt', 'is', null)
 .executeTakeFirst();
+return row ? normalizeRow(row) : row;
 }
 /**
@@ -45,7 +45,7 @@ export class AiAgentRoleRepo {
 id: string,
 workspaceId: string,
 ): Promise<AiAgentRole | undefined> {
-return this.db
+const row = await this.db
 .selectFrom('aiAgentRoles')
 .selectAll('aiAgentRoles')
 .where('id', '=', id)
@@ -53,17 +53,19 @@ export class AiAgentRoleRepo {
 .where('deletedAt', 'is', null)
 .where('enabled', '=', true)
 .executeTakeFirst();
+return row ? normalizeRow(row) : row;
 }
 /** All live roles for the workspace (management list + chat picker). */
 async listByWorkspace(workspaceId: string): Promise<AiAgentRole[]> {
-return this.db
+const rows = await this.db
 .selectFrom('aiAgentRoles')
 .selectAll('aiAgentRoles')
 .where('workspaceId', '=', workspaceId)
 .where('deletedAt', 'is', null)
 .orderBy('createdAt', 'asc')
 .execute();
+return rows.map(normalizeRow);
 }
 async insert(
@@ -83,7 +85,7 @@ export class AiAgentRoleRepo {
 trx?: KyselyTransaction,
 ): Promise<AiAgentRole> {
 const db = dbOrTx(this.db, trx);
-return db
+const row = await db
 .insertInto('aiAgentRoles')
 .values({
 workspaceId: values.workspaceId,
@@ -92,7 +94,11 @@ export class AiAgentRoleRepo {
 emoji: values.emoji ?? null,
 description: values.description ?? null,
 instructions: values.instructions,
-modelConfig: jsonbObject(values.modelConfig),
+// Cast: the generated `model_config` column type is the broad JsonValue
+// union, which the concrete RawBuilder<Record> is not structurally
+// assignable to (same reason the old jsonbObject cast to any).
+// eslint-disable-next-line @typescript-eslint/no-explicit-any
+modelConfig: jsonbBind(values.modelConfig) as any,
 enabled: values.enabled ?? true,
 autoStart: values.autoStart ?? true,
 // Empty string is treated as "no custom text" => null.
@@ -100,6 +106,7 @@ export class AiAgentRoleRepo {
 })
 .returningAll()
 .executeTakeFirst();
+return normalizeRow(row);
 }
 async update(
@@ -127,7 +134,7 @@ export class AiAgentRoleRepo {
 if (patch.description !== undefined) set.description = patch.description;
 if (patch.instructions !== undefined) set.instructions = patch.instructions;
 if (patch.modelConfig !== undefined) {
-set.modelConfig = jsonbObject(patch.modelConfig);
+set.modelConfig = jsonbBind(patch.modelConfig);
 }
 if (patch.enabled !== undefined) set.enabled = patch.enabled;
 if (patch.autoStart !== undefined) set.autoStart = patch.autoStart;
@@ -163,16 +170,36 @@ export class AiAgentRoleRepo {
 }
 /**
-* Encode an object as a jsonb bind for the `model_config` column. The postgres
-* driver would otherwise need an explicit cast; bind the JSON text and cast it.
-* Returns null for null/undefined/empty objects. Cast to `any` because the
-* generated column type is the broad `JsonValue` union, which a concrete object
-* type is not structurally assignable to.
+* Parse the `model_config` value read from the DB into the object the entity
+* type promises. Rows written by the old double-encoding bind (`::jsonb` instead
+* of `::text::jsonb`) round-trip as a JSON STRING, so the driver hands back e.g.
+* `'{"driver":"gemini"}'` rather than an object; the read-path check
+* `typeof cfg === 'object'` then failed and the model override was SILENTLY
+* dropped (the role fell back to the default model). Be tolerant: a JSON string
+* is parsed; an already-parsed object passes through; null / a non-object (incl.
+* an array) / unparseable value becomes null (= no override). This self-heals
+* already-corrupted rows on read, no migration required.
 */
-export function jsonbObject(value: ModelConfigValue | undefined) {
-if (value === null || value === undefined || Object.keys(value).length === 0) {
-return null;
+export function parseModelConfig(
+value: unknown,
+): Record<string, unknown> | null {
+// Shape guard only; the legacy double-encoding self-heal lives in
+// parseJsonbValue (database/utils.ts).
+return parseJsonbValue(
+value,
+(v): v is Record<string, unknown> =>
+v !== null && typeof v === 'object' && !Array.isArray(v),
+);
 }
-// eslint-disable-next-line @typescript-eslint/no-explicit-any
-return sql`${JSON.stringify(value)}::jsonb` as any;
+/** Normalize a DB row so `modelConfig` is always an object or null. The cast
+* bridges parseModelConfig's concrete `Record | null` to the column's broad
+* generated `JsonValue` type (an object is a valid JsonValue at runtime). */
+function normalizeRow(row: AiAgentRole): AiAgentRole {
+return {
+...row,
+modelConfig: parseModelConfig(
+row.modelConfig,
+) as AiAgentRole['modelConfig'],
+};
 }
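Reviewer sketch (not part of the change): the tolerant read described in the parseModelConfig comment can be illustrated as a generic parse-then-guard helper. parseJsonbValue itself is not shown in this part of the diff, so `parseJsonbValueSketch` below is an assumed shape inferred from its call site (a value plus a type guard), not the actual implementation.

```typescript
// Assumed behaviour: a string is treated as possibly double-encoded JSON and
// parsed once; anything that then fails the guard (or fails to parse) is null.
function parseJsonbValueSketch<T>(
  value: unknown,
  isValid: (v: unknown) => v is T,
): T | null {
  let candidate: unknown = value;
  if (typeof candidate === 'string') {
    try {
      candidate = JSON.parse(candidate);
    } catch {
      return null; // unparseable legacy value => no override
    }
  }
  return isValid(candidate) ? candidate : null;
}

// Same guard as in the diff: a plain object, not null and not an array.
function parseModelConfigSketch(value: unknown): Record<string, unknown> | null {
  return parseJsonbValueSketch(
    value,
    (v): v is Record<string, unknown> =>
      v !== null && typeof v === 'object' && !Array.isArray(v),
  );
}
```

Keeping the self-heal in one generic helper and passing only the shape guard per column is what lets the model config and the tool allowlist share the same recovery path.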


@@ -0,0 +1,46 @@
import { parseModelConfig } from './ai-agent-roles.repo';
/**
* Unit tests for parseModelConfig: the read-side normalizer that repairs the
* jsonb double-encoding regression on `model_config`. Rows written by the old
* `::jsonb` bind round-trip as a JSON STRING, which the read path's
* `typeof === 'object'` check rejected — silently dropping the model override.
* parseModelConfig accepts an already-parsed object, parses a legacy JSON
* string, and rejects everything that is not an object (null = no override).
*/
describe('parseModelConfig', () => {
it('passes an already-parsed object through', () => {
expect(parseModelConfig({ driver: 'gemini' })).toEqual({
driver: 'gemini',
});
});
it('parses a legacy double-encoded JSON string into an object', () => {
expect(parseModelConfig('{"driver":"gemini","chatModel":"x"}')).toEqual({
driver: 'gemini',
chatModel: 'x',
});
});
it('returns null for null / undefined', () => {
expect(parseModelConfig(null)).toBeNull();
expect(parseModelConfig(undefined)).toBeNull();
});
it('returns null for a non-object JSON value (string/number/array)', () => {
expect(parseModelConfig('"justastring"')).toBeNull();
expect(parseModelConfig('42')).toBeNull();
// An array is an object in JS but not a valid model_config shape.
expect(parseModelConfig('["a","b"]')).toBeNull();
expect(parseModelConfig(['a', 'b'])).toBeNull();
});
it('returns null for an unparseable string', () => {
expect(parseModelConfig('not json at all')).toBeNull();
});
it('returns null for a raw non-object primitive', () => {
expect(parseModelConfig(42 as unknown)).toBeNull();
expect(parseModelConfig(true as unknown)).toBeNull();
});
});


@@ -1,4 +1,4 @@
-import { parseToolAllowlist } from './ai-mcp-server.repo';
+import { parseToolAllowlist, blankToNull } from './ai-mcp-server.repo';
 /**
 * The `tool_allowlist` jsonb column historically round-trips as a JSON STRING
@@ -10,7 +10,10 @@ import { parseToolAllowlist } from './ai-mcp-server.repo';
 */
 describe('parseToolAllowlist', () => {
 it('passes a real string array through unchanged', () => {
-expect(parseToolAllowlist(['search', 'crawl'])).toEqual(['search', 'crawl']);
+expect(parseToolAllowlist(['search', 'crawl'])).toEqual([
+'search',
+'crawl',
+]);
 });
 it('parses a JSON-string array (the double-encoded read) into an array', () => {
@@ -46,3 +49,26 @@ describe('parseToolAllowlist', () => {
 expect(parseToolAllowlist(true as unknown)).toBeNull();
 });
 });
+/**
+* `blankToNull` normalizes the per-server `instructions` free text before it is
+* stored (#180): a missing/blank/whitespace-only value becomes null (so an empty
+* guide is never persisted), any other value is trimmed.
+*/
+describe('blankToNull', () => {
+it('returns null for null / undefined', () => {
+expect(blankToNull(null)).toBeNull();
+expect(blankToNull(undefined)).toBeNull();
+});
+it('returns null for an empty / whitespace-only string', () => {
+expect(blankToNull('')).toBeNull();
+expect(blankToNull(' ')).toBeNull();
+expect(blankToNull('\n\t ')).toBeNull();
+});
+it('trims and returns a non-blank string', () => {
+expect(blankToNull(' use the search tool ')).toBe('use the search tool');
+expect(blankToNull('guide')).toBe('guide');
+});
+});
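Reviewer sketch (not part of the change): the behaviour these blankToNull tests describe fits in a few lines. The body below is an assumed implementation written from the test expectations only; the real helper lives in ai-mcp-server.repo and is not shown in this part of the diff.

```typescript
// Missing, empty, or whitespace-only text becomes null; anything else is
// returned trimmed.
function blankToNullSketch(value: string | null | undefined): string | null {
  if (value === null || value === undefined) return null;
  const trimmed = value.trim();
  return trimmed === '' ? null : trimmed;
}
```

Normalizing at the repo boundary means every write path (create and update) stores either real guidance or null, so readers never have to re-check for blank strings.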


@@ -1,10 +1,11 @@
-import { Injectable } from '@nestjs/common';
+import { Injectable, Logger } from '@nestjs/common';
 import { InjectKysely } from 'nestjs-kysely';
-import { sql } from 'kysely';
 import { KyselyDB, KyselyTransaction } from '../../types/kysely.types';
-import { dbOrTx } from '../../utils';
+import { dbOrTx, jsonbBind, parseJsonbValue } from '../../utils';
 import { AiMcpServer } from '@docmost/db/types/entity.types';
+const logger = new Logger('AiMcpServerRepo');
 /**
 * Repository for per-workspace external MCP servers the agent may use (§5.4).
 *
@@ -60,6 +61,8 @@ export class AiMcpServerRepo {
 url: string;
 headersEnc?: string | null;
 toolAllowlist?: string[] | null;
+// Admin-authored prompt guidance; blank/whitespace normalizes to null.
+instructions?: string | null;
 enabled?: boolean;
 },
 trx?: KyselyTransaction,
@@ -75,7 +78,9 @@ export class AiMcpServerRepo {
 headersEnc: values.headersEnc ?? null,
 // jsonb column: the postgres driver would otherwise encode a JS array as
 // a Postgres array literal. Bind the JSON text and cast it to jsonb.
-toolAllowlist: jsonbArray(values.toolAllowlist),
+toolAllowlist: jsonbBind(values.toolAllowlist),
+// Plain text column: blank/whitespace-only guidance is stored as null.
+instructions: blankToNull(values.instructions),
 enabled: values.enabled ?? true,
 })
 .returningAll()
@@ -93,6 +98,8 @@ export class AiMcpServerRepo {
 headersEnc?: string | null;
 // undefined => leave unchanged; null => clear; string[] => set.
 toolAllowlist?: string[] | null;
+// undefined => leave unchanged; null/blank => clear; string => set.
+instructions?: string | null;
 enabled?: boolean;
 },
 trx?: KyselyTransaction,
@@ -104,7 +111,11 @@ export class AiMcpServerRepo {
 if (patch.url !== undefined) set.url = patch.url;
 if (patch.headersEnc !== undefined) set.headersEnc = patch.headersEnc;
if (patch.toolAllowlist !== undefined) { if (patch.toolAllowlist !== undefined) {
set.toolAllowlist = jsonbArray(patch.toolAllowlist); set.toolAllowlist = jsonbBind(patch.toolAllowlist);
}
if (patch.instructions !== undefined) {
// Blank/whitespace-only guidance clears the column (stored as null).
set.instructions = blankToNull(patch.instructions);
} }
if (patch.enabled !== undefined) set.enabled = patch.enabled; if (patch.enabled !== undefined) set.enabled = patch.enabled;
await db await db
@@ -130,57 +141,49 @@ export class AiMcpServerRepo {
} }
 /**
- * Encode a string[] as a jsonb bind for the `tool_allowlist` column. Passing a
- * plain JS array to the postgres driver would serialize it as a Postgres array
- * literal (incompatible with jsonb), so we bind the JSON text and cast it.
- *
- * The cast is `::text::jsonb`, NOT `::jsonb`: if the parameter is bound straight
- * to a jsonb cast, node-postgres infers its type as jsonb and JSON-stringifies
- * the (already-JSON) string a SECOND time, so the column ends up holding a jsonb
- * STRING SCALAR (`"[\"a\"]"`) instead of a jsonb ARRAY. Forcing the param through
- * `::text` first binds it as text (sent verbatim), and `::jsonb` then parses it
- * into a real array. (`normalizeRow` below repairs rows written the old way.)
- *
- * Returns null for null/empty arrays (an empty allowlist means "no restriction"
- * is not intended — callers pass null to clear; an empty array is normalized to
- * null here so it never round-trips as `[]`).
+ * Normalize an optional free-text field to a stored value: a missing/blank/
+ * whitespace-only string becomes null (so an "empty" guide is never persisted),
+ * any other string is trimmed. Returns null for null/undefined input.
  */
-function jsonbArray(value: string[] | null | undefined) {
-  if (value === null || value === undefined || value.length === 0) {
-    return null;
-  }
-  // Typed as string[] so it is assignable to the toolAllowlist column.
-  return sql<string[]>`${JSON.stringify(value)}::text::jsonb`;
+export function blankToNull(value: string | null | undefined): string | null {
+  if (value == null) return null;
+  const trimmed = value.trim();
+  return trimmed.length > 0 ? trimmed : null;
 }
 /**
  * Parse the `toolAllowlist` value read from the DB into the `string[] | null`
  * the entity type promises. The jsonb column historically round-trips as a JSON
- * STRING (rows written by the old double-encoding `jsonbArray`, see above), so
- * the driver hands back a string like `'["a","b"]'` rather than an array. Be
- * tolerant: an already-parsed array passes through; a JSON string is parsed; null
- * / a non-array / unparseable value becomes null (unrestricted).
+ * STRING (rows written by the old double-encoding bind before the `::text::jsonb`
+ * fix), so the driver hands back a string like `'["a","b"]'` rather than an
+ * array. Be tolerant: normalize a JSON string to its value, then accept it only
+ * if it is an array of strings; null / a non-array / unparseable value / an
+ * array with a non-string element all become null (unrestricted).
  */
 export function parseToolAllowlist(value: unknown): string[] | null {
-  if (value == null) return null;
-  if (Array.isArray(value)) {
-    return value.every((v) => typeof v === 'string') ? (value as string[]) : null;
-  }
-  if (typeof value === 'string') {
-    try {
-      const parsed = JSON.parse(value);
-      return Array.isArray(parsed) &&
-        parsed.every((v) => typeof v === 'string')
-        ? (parsed as string[])
-        : null;
-    } catch {
-      return null;
-    }
-  }
-  return null;
+  // Shape guard only; the legacy double-encoding self-heal lives in
+  // parseJsonbValue (database/utils.ts).
+  return parseJsonbValue(
+    value,
+    (v): v is string[] =>
+      Array.isArray(v) && v.every((x) => typeof x === 'string'),
+  );
 }
-/** Normalize a DB row so `toolAllowlist` is always `string[] | null`. */
+/**
+ * Normalize a DB row so `toolAllowlist` is always `string[] | null`.
+ *
+ * FAIL-OPEN logging: a stored value that is present but cannot be parsed into a
+ * string[] (corrupt JSON, a non-array, non-string elements) degrades to `null` =
+ * "no restriction", so the agent silently gets ALL of the server's tools. Log
+ * one line (server id only, never the contents) so that widening is not silent.
+ */
 function normalizeRow(row: AiMcpServer): AiMcpServer {
-  return { ...row, toolAllowlist: parseToolAllowlist(row.toolAllowlist) };
+  const parsed = parseToolAllowlist(row.toolAllowlist);
+  if (parsed === null && row.toolAllowlist != null) {
+    logger.warn(
+      `Corrupt tool_allowlist for MCP server ${row.id}; ignoring it (no tool restriction applied)`,
+    );
+  }
+  return { ...row, toolAllowlist: parsed };
 }
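
The double-encoding described in the comments above can be pictured with plain `JSON.stringify` calls. This is only an analogy, not a reproduction of the driver: it assumes the driver's extra pass behaves like a second `JSON.stringify`, which is how the comments in this diff describe it. The `allowlist` value is hypothetical.

```typescript
const allowlist = ['search', 'crawl'];

// Encoded once: JSON text for an array. This is what `::text::jsonb` stores.
const once = JSON.stringify(allowlist);

// Encoded twice: JSON text for a *string* whose content is the first text.
// This is the string-scalar shape the comments attribute to the old `::jsonb` bind.
const twice = JSON.stringify(once);

const readOnce = JSON.parse(once); // an array
const readTwice = JSON.parse(twice); // a string that still holds JSON text

console.log(Array.isArray(readOnce), typeof readTwice);

// The read-side self-heal amounts to parsing that string one more time.
const healed = JSON.parse(readTwice as string);
console.log(healed);
```

Because a second parse recovers the array, a read-side parser hides a write regression, which is why the integration spec checks `jsonb_typeof` on the stored column directly.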

View File

@@ -20,8 +20,15 @@ export interface AiMcpServers {
// Encrypted JSON of the auth headers. Nullable (a server may need no auth). // Encrypted JSON of the auth headers. Nullable (a server may need no auth).
headersEnc: string | null; headersEnc: string | null;
// Optional allowlist of remote tool names to expose; null = expose all. // Optional allowlist of remote tool names to expose; null = expose all.
// Stored as jsonb; reads come back as a string[] from the postgres driver. // Stored as jsonb. The postgres driver may return a JSON string for legacy
// double-encoded rows; `AiMcpServerRepo` normalizes every read to
// `string[] | null` via `parseToolAllowlist`.
toolAllowlist: string[] | null; toolAllowlist: string[] | null;
// Admin-authored guidance ("how/when to use this server's tools") injected
// into the agent system prompt (#180). Unlike `headersEnc` this is NON-secret
// and IS returned in admin views/forms. Plain text column (no jsonb). Null =
// no guidance. Trusted text — it goes inside the prompt safety sandwich.
instructions: string | null;
enabled: Generated<boolean>; enabled: Generated<boolean>;
createdAt: Generated<Timestamp>; createdAt: Generated<Timestamp>;
updatedAt: Generated<Timestamp>; updatedAt: Generated<Timestamp>;

View File

@@ -1,3 +1,4 @@
import { sql, RawBuilder } from 'kysely';
import { KyselyDB, KyselyTransaction } from './types/kysely.types'; import { KyselyDB, KyselyTransaction } from './types/kysely.types';
/* /*
@@ -31,3 +32,61 @@ export function dbOrTx(
return db; // Use normal database instance return db; // Use normal database instance
} }
} }
/**
* Bind a JS array/object as a `jsonb` column value, working around a postgres
* driver double-encoding quirk. THE single implementation — repos that persist
* jsonb (`tool_allowlist`, `model_config`, ...) call this instead of re-deriving
* the cast.
*
* THE QUIRK: with the `kysely-postgres-js` / postgres.js driver, casting a bound
* parameter straight to `::jsonb` makes the driver infer the param type as jsonb
* and JSON-stringify the (already-JSON) text a SECOND time, so the column ends
* up holding a jsonb STRING SCALAR (`"[\"a\"]"` / `"{\"k\":1}"`) instead of a
* real jsonb array/object. Read paths then see a string, not the structure, and
* silently fall back (an allowlist becomes "unrestricted", a model override is
* ignored). Forcing the param through `::text` first binds it as text (sent
* verbatim); `::jsonb` then parses it into a real array/object. Read-side
* parsers repair rows written the old buggy way without a migration.
*
* Returns `null` for null/undefined and for "empty" values (an empty array, or
* an object with no own enumerable keys) — callers treat empty as "clear/unset",
* so an empty allowlist/config never round-trips as `[]`/`{}`.
*/
export function jsonbBind<T>(
value: T | null | undefined,
): RawBuilder<T> | null {
if (value === null || value === undefined) return null;
if (Array.isArray(value)) {
if (value.length === 0) return null;
} else if (typeof value === 'object') {
if (Object.keys(value as object).length === 0) return null;
}
return sql<T>`${JSON.stringify(value)}::text::jsonb`;
}
/**
* READ-side counterpart to {@link jsonbBind}: tolerantly decode a jsonb value
* read back from the DB and validate its shape with `guard`. THE single place
* the legacy double-encoding self-heal lives, so repos keep only a type-guard.
*
* A row written by the old `::jsonb` bind round-trips as a JSON STRING (see the
* quirk in jsonbBind), so the driver hands back e.g. `'["a"]'` / `'{"k":1}'`
* rather than the structure. This parses such a string once, then applies the
* caller's `guard`. Returns `null` for null / an unparseable string / a value
* the guard rejects (so a corrupt or wrong-shaped value degrades to "unset").
*/
export function parseJsonbValue<T>(
value: unknown,
guard: (v: unknown) => v is T,
): T | null {
let v: unknown = value;
if (typeof v === 'string') {
try {
v = JSON.parse(v); // legacy double-encoded read
} catch {
return null;
}
}
return guard(v) ? v : null;
}
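
To make the read-side behaviour concrete, the sketch below re-declares `parseJsonbValue` locally (copied from this diff so the block has no imports) and feeds it the four shapes the integration specs exercise. The `isStringArray` guard name is an assumption for illustration; the repo passes an inline guard.

```typescript
// Local copy of the read-side helper from this diff.
function parseJsonbValue<T>(
  value: unknown,
  guard: (v: unknown) => v is T,
): T | null {
  let v: unknown = value;
  if (typeof v === 'string') {
    try {
      v = JSON.parse(v); // legacy double-encoded read
    } catch {
      return null;
    }
  }
  return guard(v) ? v : null;
}

// Hypothetical named guard; equivalent to the inline one in parseToolAllowlist.
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === 'string');

const fromArray = parseJsonbValue(['a', 'b'], isStringArray); // real jsonb array
const fromLegacy = parseJsonbValue('["a","b"]', isStringArray); // string scalar, healed
const fromObject = parseJsonbValue('{"not":"an array"}', isStringArray); // wrong shape
const fromGarbage = parseJsonbValue('not json', isStringArray); // unparseable

console.log(fromArray, fromLegacy, fromObject, fromGarbage);
```

The last two cases both come back as `null`, which for `tool_allowlist` means "no restriction"; that is the fail-open path `normalizeRow` logs.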

View File

@@ -1,4 +1,5 @@
import { Kysely } from 'kysely'; import { Kysely, sql } from 'kysely';
import { randomUUID } from 'node:crypto';
import { AiAgentRoleRepo } from '@docmost/db/repos/ai-agent-roles/ai-agent-roles.repo'; import { AiAgentRoleRepo } from '@docmost/db/repos/ai-agent-roles/ai-agent-roles.repo';
import { getTestDb, destroyTestDb, createWorkspace } from './db'; import { getTestDb, destroyTestDb, createWorkspace } from './db';
@@ -25,8 +26,16 @@ describe('AiAgentRoleRepo isolation + partial unique index [integration]', () =>
}); });
it('findById / listByWorkspace exclude soft-deleted rows', async () => { it('findById / listByWorkspace exclude soft-deleted rows', async () => {
const live = await repo.insert({ workspaceId: w1, name: 'Live', instructions: 'x' }); const live = await repo.insert({
const dead = await repo.insert({ workspaceId: w1, name: 'Dead', instructions: 'x' }); workspaceId: w1,
name: 'Live',
instructions: 'x',
});
const dead = await repo.insert({
workspaceId: w1,
name: 'Dead',
instructions: 'x',
});
await repo.softDelete(dead.id, w1); await repo.softDelete(dead.id, w1);
expect(await repo.findById(live.id, w1)).toBeDefined(); expect(await repo.findById(live.id, w1)).toBeDefined();
@@ -38,7 +47,11 @@ describe('AiAgentRoleRepo isolation + partial unique index [integration]', () =>
}); });
it('findById of a W2 role from W1 context returns undefined (tenant isolation)', async () => { it('findById of a W2 role from W1 context returns undefined (tenant isolation)', async () => {
const w2role = await repo.insert({ workspaceId: w2, name: 'W2Role', instructions: 'x' }); const w2role = await repo.insert({
workspaceId: w2,
name: 'W2Role',
instructions: 'x',
});
expect(await repo.findById(w2role.id, w2)).toBeDefined(); expect(await repo.findById(w2role.id, w2)).toBeDefined();
// Same id, wrong workspace context -> not visible. // Same id, wrong workspace context -> not visible.
@@ -58,21 +71,100 @@ describe('AiAgentRoleRepo isolation + partial unique index [integration]', () =>
}); });
it('same name is reusable after softDelete (partial unique index WHERE deleted_at IS NULL)', async () => { it('same name is reusable after softDelete (partial unique index WHERE deleted_at IS NULL)', async () => {
const first = await repo.insert({ workspaceId: w1, name: 'Reusable', instructions: 'x' }); const first = await repo.insert({
workspaceId: w1,
name: 'Reusable',
instructions: 'x',
});
await repo.softDelete(first.id, w1); await repo.softDelete(first.id, w1);
// Now inserting the same name must succeed because the soft-deleted row is // Now inserting the same name must succeed because the soft-deleted row is
// excluded from the partial unique index. // excluded from the partial unique index.
const second = await repo.insert({ workspaceId: w1, name: 'Reusable', instructions: 'x' }); const second = await repo.insert({
workspaceId: w1,
name: 'Reusable',
instructions: 'x',
});
expect(second.id).toBeDefined(); expect(second.id).toBeDefined();
expect(second.id).not.toBe(first.id); expect(second.id).not.toBe(first.id);
}); });
it('same name in W1 and W2 is allowed (unique is per-workspace)', async () => { it('same name in W1 and W2 is allowed (unique is per-workspace)', async () => {
const a = await repo.insert({ workspaceId: w1, name: 'CrossTenant', instructions: 'x' }); const a = await repo.insert({
const b = await repo.insert({ workspaceId: w2, name: 'CrossTenant', instructions: 'x' }); workspaceId: w1,
name: 'CrossTenant',
instructions: 'x',
});
const b = await repo.insert({
workspaceId: w2,
name: 'CrossTenant',
instructions: 'x',
});
expect(a.id).toBeDefined(); expect(a.id).toBeDefined();
expect(b.id).toBeDefined(); expect(b.id).toBeDefined();
expect(a.id).not.toBe(b.id); expect(a.id).not.toBe(b.id);
}); });
// model_config jsonb round-trip (issue #173 §1): the same double-encoding bug
// PR #172 fixed for tool_allowlist lived in jsonbObject. A DB round-trip is the
// only way to observe it — the write must land as a real jsonb OBJECT, and a
// legacy string-scalar row must self-heal on read (else the model override is
// silently dropped and the role falls back to the default model).
const jsonbTypeof = async (id: string): Promise<string | null> => {
const res = await sql<{ t: string | null }>`
SELECT jsonb_typeof(model_config) AS t
FROM ai_agent_roles WHERE id = ${id}
`.execute(db);
return res.rows[0]?.t ?? null;
};
it('insert stores model_config as a jsonb OBJECT and reads it back as an object', async () => {
const role = await repo.insert({
workspaceId: w1,
name: `Model-${randomUUID()}`,
instructions: 'x',
modelConfig: { driver: 'gemini', chatModel: 'gemini-2.0-flash' },
});
expect(await jsonbTypeof(role.id)).toBe('object');
// The returned row is already normalized to an object.
expect(role.modelConfig).toEqual({
driver: 'gemini',
chatModel: 'gemini-2.0-flash',
});
const found = await repo.findById(role.id, w1);
expect(found?.modelConfig).toEqual({
driver: 'gemini',
chatModel: 'gemini-2.0-flash',
});
});
it('an empty model_config is normalized to null (no override)', async () => {
const role = await repo.insert({
workspaceId: w1,
name: `Empty-${randomUUID()}`,
instructions: 'x',
modelConfig: {},
});
// The column is SQL NULL, so jsonb_typeof returns SQL NULL (JS null).
expect(await jsonbTypeof(role.id)).toBeNull();
expect((await repo.findById(role.id, w1))?.modelConfig).toBeNull();
});
it('repairs a legacy double-encoded (string scalar) model_config on read', async () => {
const id = randomUUID();
// Seed the corrupt string-scalar shape the old `::jsonb` bind produced.
await sql`
INSERT INTO ai_agent_roles (id, workspace_id, name, instructions, model_config)
VALUES (
${id}, ${w1}, ${`Legacy-${id}`}, 'x',
to_jsonb(${'{"driver":"openai","chatModel":"gpt"}'}::text)
)
`.execute(db);
expect(await jsonbTypeof(id)).toBe('string'); // sanity: really corrupt
expect((await repo.findById(id, w1))?.modelConfig).toEqual({
driver: 'openai',
chatModel: 'gpt',
});
});
}); });
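
The "empty becomes null" rule that the `model_config` test relies on can be checked without a database. The function below is a driver-free stand-in, not the real `jsonbBind`: it returns the JSON text that would be bound, or `null` for empty values, whereas the real helper wraps that text in a kysely `sql` template with a `::text::jsonb` cast. The name `toJsonbText` is hypothetical.

```typescript
// Hypothetical stand-in for jsonbBind: same emptiness checks, no SQL builder.
function toJsonbText(value: unknown): string | null {
  if (value === null || value === undefined) return null;
  if (Array.isArray(value)) {
    if (value.length === 0) return null; // [] means "clear"
  } else if (typeof value === 'object') {
    if (Object.keys(value as object).length === 0) return null; // {} means "clear"
  }
  return JSON.stringify(value);
}

const emptyObject = toJsonbText({});
const emptyArray = toJsonbText([]);
const config = toJsonbText({ driver: 'gemini' });
const allow = toJsonbText(['a']);

console.log(emptyObject, emptyArray, config, allow);
```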

View File

@@ -0,0 +1,194 @@
import { Kysely, sql } from 'kysely';
import { randomUUID } from 'node:crypto';
import { AiMcpServerRepo } from '@docmost/db/repos/ai-chat/ai-mcp-server.repo';
import { getTestDb, destroyTestDb, createWorkspace } from './db';
/**
* AiMcpServerRepo `tool_allowlist` jsonb round-trip (PR #172 / issue #173 §3).
*
* The fix under test is a DB round-trip, so a unit test cannot observe it: the
* write must land as a real jsonb ARRAY (not a double-encoded string scalar),
* and the read must repair any legacy string-scalar rows. The read-side
* `parseToolAllowlist` MASKS a write regression (it parses the string back), so
* without this integration check, reverting `::text::jsonb` to `::jsonb` would
* keep every unit test green while silently corrupting the column again.
*/
describe('AiMcpServerRepo tool_allowlist jsonb round-trip [integration]', () => {
let db: Kysely<any>;
let repo: AiMcpServerRepo;
let ws: string;
beforeAll(async () => {
db = getTestDb();
repo = new AiMcpServerRepo(db as any);
ws = (await createWorkspace(db)).id;
});
afterAll(async () => {
await destroyTestDb();
});
const jsonbTypeof = async (id: string): Promise<string | null> => {
const res = await sql<{ t: string | null }>`
SELECT jsonb_typeof(tool_allowlist) AS t
FROM ai_mcp_servers WHERE id = ${id}
`.execute(db);
return res.rows[0]?.t ?? null;
};
it('insert stores the allowlist as a jsonb ARRAY (not a string scalar)', async () => {
const row = await repo.insert({
workspaceId: ws,
name: `srv-${randomUUID()}`,
transport: 'http',
url: 'https://example.com/mcp',
toolAllowlist: ['search', 'crawl'],
});
// The column holds a real jsonb array — the whole point of ::text::jsonb.
expect(await jsonbTypeof(row.id)).toBe('array');
// And the read returns a genuine string[], not a JSON string.
const found = await repo.findById(row.id, ws);
expect(found?.toolAllowlist).toEqual(['search', 'crawl']);
expect(Array.isArray(found?.toolAllowlist)).toBe(true);
});
it('an empty allowlist is normalized to null (no restriction), not []', async () => {
const row = await repo.insert({
workspaceId: ws,
name: `srv-${randomUUID()}`,
transport: 'http',
url: 'https://example.com/mcp',
toolAllowlist: [],
});
// The column is SQL NULL, so jsonb_typeof returns SQL NULL (JS null).
expect(await jsonbTypeof(row.id)).toBeNull();
expect((await repo.findById(row.id, ws))?.toolAllowlist).toBeNull();
});
it('repairs a legacy double-encoded (string scalar) row on read (self-heal)', async () => {
// Seed a row whose tool_allowlist is a jsonb STRING SCALAR holding the JSON
// text — exactly what the old `::jsonb` double-encoding produced.
const id = randomUUID();
await sql`
INSERT INTO ai_mcp_servers (id, workspace_id, name, transport, url, tool_allowlist)
VALUES (
${id}, ${ws}, ${`srv-${id}`}, 'http', 'https://example.com/mcp',
to_jsonb(${'["alpha","beta"]'}::text)
)
`.execute(db);
// Sanity: the seeded column really IS the corrupt string-scalar shape.
expect(await jsonbTypeof(id)).toBe('string');
// The repo read heals it back to a real string[].
expect((await repo.findById(id, ws))?.toolAllowlist).toEqual([
'alpha',
'beta',
]);
const enabled = await repo.listEnabled(ws);
const healed = enabled.find((r) => r.id === id);
expect(healed?.toolAllowlist).toEqual(['alpha', 'beta']);
});
it('FAIL-OPEN: a present-but-corrupt tool_allowlist reads back as null (no restriction)', async () => {
// #185 re-review pt 8: normalizeRow's fail-open branch — the column is
// PRESENT but does not parse into a string[] (here a jsonb string scalar
// holding non-array JSON). The read must degrade to `null` ("no restriction"),
// not crash. (A warn is logged with the server id; not asserted here.)
const id = randomUUID();
await sql`
INSERT INTO ai_mcp_servers (id, workspace_id, name, transport, url, tool_allowlist)
VALUES (
${id}, ${ws}, ${`srv-${id}`}, 'http', 'https://example.com/mcp',
to_jsonb(${'{"not":"an array"}'}::text)
)
`.execute(db);
// Sanity: the column is present (a jsonb string scalar), not SQL NULL.
expect(await jsonbTypeof(id)).toBe('string');
// ...yet the read degrades to null (fail-open).
expect((await repo.findById(id, ws))?.toolAllowlist).toBeNull();
});
});
/**
* AiMcpServerRepo `instructions` text round-trip (#180). The column is plain
* text (no jsonb); blank/whitespace is normalized to null on both insert and
* update so an empty guide is never persisted.
*/
describe('AiMcpServerRepo instructions round-trip [integration]', () => {
let db: Kysely<any>;
let repo: AiMcpServerRepo;
let ws: string;
beforeAll(async () => {
db = getTestDb();
repo = new AiMcpServerRepo(db as any);
ws = (await createWorkspace(db)).id;
});
afterAll(async () => {
await destroyTestDb();
});
it('insert stores trimmed non-blank instructions and reads them back', async () => {
const row = await repo.insert({
workspaceId: ws,
name: `srv-${randomUUID()}`,
transport: 'http',
url: 'https://example.com/mcp',
instructions: ' Use search for fresh facts. ',
});
expect((await repo.findById(row.id, ws))?.instructions).toBe(
'Use search for fresh facts.',
);
});
it('insert normalizes blank/whitespace instructions to null', async () => {
const row = await repo.insert({
workspaceId: ws,
name: `srv-${randomUUID()}`,
transport: 'http',
url: 'https://example.com/mcp',
instructions: ' ',
});
expect((await repo.findById(row.id, ws))?.instructions).toBeNull();
});
it('insert with omitted instructions stores null', async () => {
const row = await repo.insert({
workspaceId: ws,
name: `srv-${randomUUID()}`,
transport: 'http',
url: 'https://example.com/mcp',
});
expect((await repo.findById(row.id, ws))?.instructions).toBeNull();
});
it('update sets, clears (blank => null), and leaves unchanged when absent', async () => {
const row = await repo.insert({
workspaceId: ws,
name: `srv-${randomUUID()}`,
transport: 'http',
url: 'https://example.com/mcp',
instructions: 'initial guide',
});
// Set a new value.
await repo.update(row.id, ws, { instructions: 'updated guide' });
expect((await repo.findById(row.id, ws))?.instructions).toBe(
'updated guide',
);
// Absent in the patch => unchanged.
await repo.update(row.id, ws, { name: 'renamed' });
expect((await repo.findById(row.id, ws))?.instructions).toBe(
'updated guide',
);
// Blank => cleared to null.
await repo.update(row.id, ws, { instructions: ' ' });
expect((await repo.findById(row.id, ws))?.instructions).toBeNull();
});
});
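
The three update outcomes asserted above (set, leave unchanged, clear) follow from treating `undefined` and blank differently. A minimal in-memory sketch, assuming hypothetical `Row`, `Patch` and `applyPatch` names and no database, could look like this:

```typescript
// Local copy of the normalizer from this PR.
function blankToNull(value: string | null | undefined): string | null {
  if (value == null) return null;
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Hypothetical shapes standing in for the repo's row and patch types.
interface Row {
  name: string;
  instructions: string | null;
}
interface Patch {
  name?: string;
  instructions?: string | null;
}

// Hypothetical stand-in for update(): undefined keys are skipped entirely.
function applyPatch(row: Row, patch: Patch): Row {
  const next: Row = { ...row };
  if (patch.name !== undefined) next.name = patch.name;
  if (patch.instructions !== undefined) {
    next.instructions = blankToNull(patch.instructions);
  }
  return next;
}

const initial: Row = { name: 'srv', instructions: 'initial guide' };
const afterSet = applyPatch(initial, { instructions: 'updated guide' });
const afterRename = applyPatch(afterSet, { name: 'renamed' });
const afterClear = applyPatch(afterRename, { instructions: '   ' });

console.log(afterSet, afterRename, afterClear);
```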

View File

@@ -1,14 +1,15 @@
import { EditorState, Plugin, PluginKey } from "@tiptap/pm/state"; import { EditorState, Plugin, PluginKey } from '@tiptap/pm/state';
import { Decoration, DecorationSet } from "@tiptap/pm/view"; import { Decoration, DecorationSet } from '@tiptap/pm/view';
import { Node as ProseMirrorNode } from "@tiptap/pm/model"; import { Node as ProseMirrorNode } from '@tiptap/pm/model';
import { import {
FOOTNOTE_DEFINITION_NAME, FOOTNOTE_DEFINITION_NAME,
FOOTNOTE_REFERENCE_NAME, FOOTNOTE_REFERENCE_NAME,
computeFootnoteNumbers, computeFootnoteNumbers,
} from "./footnote-util"; computeFootnoteRefCounts,
} from './footnote-util';
export const footnoteNumberingPluginKey = new PluginKey<FootnoteNumberingState>( export const footnoteNumberingPluginKey = new PluginKey<FootnoteNumberingState>(
"footnoteNumbering", 'footnoteNumbering',
); );
/** /**
@@ -21,6 +22,9 @@ export const footnoteNumberingPluginKey = new PluginKey<FootnoteNumberingState>(
interface FootnoteNumberingState { interface FootnoteNumberingState {
/** referenceId -> 1-based display number, for the current doc. */ /** referenceId -> 1-based display number, for the current doc. */
numbers: Map<string, number>; numbers: Map<string, number>;
/** referenceId -> number of reference occurrences (>= 1), for the definition's
* multi-backlink UI (#168). */
refCounts: Map<string, number>;
/** Decorations rendering those numbers (refs + definitions). */ /** Decorations rendering those numbers (refs + definitions). */
decorations: DecorationSet; decorations: DecorationSet;
} }
@@ -46,6 +50,7 @@ function buildFootnoteNumberingState(
doc: ProseMirrorNode, doc: ProseMirrorNode,
): FootnoteNumberingState { ): FootnoteNumberingState {
const numbers = computeFootnoteNumbers(doc); const numbers = computeFootnoteNumbers(doc);
const refCounts = computeFootnoteRefCounts(doc);
const decorations: Decoration[] = []; const decorations: Decoration[] = [];
doc.descendants((node, pos) => { doc.descendants((node, pos) => {
@@ -54,7 +59,7 @@ function buildFootnoteNumberingState(
if (num != null) { if (num != null) {
decorations.push( decorations.push(
Decoration.node(pos, pos + node.nodeSize, { Decoration.node(pos, pos + node.nodeSize, {
"data-footnote-number": String(num), 'data-footnote-number': String(num),
style: `--footnote-number: "${num}";`, style: `--footnote-number: "${num}";`,
}), }),
); );
@@ -65,7 +70,7 @@ function buildFootnoteNumberingState(
if (num != null) { if (num != null) {
decorations.push( decorations.push(
Decoration.node(pos, pos + node.nodeSize, { Decoration.node(pos, pos + node.nodeSize, {
"data-footnote-number": String(num), 'data-footnote-number': String(num),
style: `--footnote-number: "${num}";`, style: `--footnote-number: "${num}";`,
}), }),
); );
@@ -73,7 +78,11 @@ function buildFootnoteNumberingState(
} }
}); });
return { numbers, decorations: DecorationSet.create(doc, decorations) }; return {
numbers,
refCounts,
decorations: DecorationSet.create(doc, decorations),
};
} }
/** /**
@@ -90,6 +99,16 @@ export function getFootnoteNumber(
return footnoteNumberingPluginKey.getState(state)?.numbers.get(id); return footnoteNumberingPluginKey.getState(state)?.numbers.get(id);
} }
/**
* Read the cached reference-occurrence count for `id` (how many `[^id]` links
* point at this definition). Drives the definition's multi-backlink UI (#168):
* `> 1` renders ↩ a b c …, each scrolling to its own occurrence. Returns 0 when
* the plugin is not installed or the id is unknown (caller treats as single).
*/
export function getFootnoteRefCount(state: EditorState, id: string): number {
return footnoteNumberingPluginKey.getState(state)?.refCounts.get(id) ?? 0;
}
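
The diff does not show `computeFootnoteRefCounts`, so the sketch below is only a guess at the idea, working on a flat list of reference ids instead of a ProseMirror document. `countRefs`, `refCount` and `backlinkLabels` are hypothetical names; the a/b/c labelling is taken from the doc comment above and runs out of letters after 26 occurrences.

```typescript
// Hypothetical stand-in: count how often each reference id occurs, in order.
function countRefs(idsInDocOrder: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const id of idsInDocOrder) {
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return counts;
}

// Mirrors the fallback in getFootnoteRefCount: an unknown id yields 0.
function refCount(counts: Map<string, number>, id: string): number {
  return counts.get(id) ?? 0;
}

// Letters for the multi-backlink UI; a single reference needs no letters.
function backlinkLabels(count: number): string[] {
  if (count <= 1) return [];
  return Array.from({ length: count }, (_, i) => String.fromCharCode(97 + i));
}

const counts = countRefs(['n1', 'n2', 'n1', 'n1']);
console.log(refCount(counts, 'n1'), backlinkLabels(refCount(counts, 'n1')));
console.log(refCount(counts, 'missing'), backlinkLabels(refCount(counts, 'n2')));
```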
/** /**
* ProseMirror plugin that renders footnote numbers as decorations. It never * ProseMirror plugin that renders footnote numbers as decorations. It never
* mutates the document (safe in read-only / share and in collaboration) — it * mutates the document (safe in read-only / share and in collaboration) — it

View File

@@ -1,14 +1,14 @@
import { mergeAttributes, Node } from "@tiptap/core"; import { mergeAttributes, Node } from '@tiptap/core';
import { TextSelection, Transaction } from "@tiptap/pm/state"; import { TextSelection, Transaction } from '@tiptap/pm/state';
import { ReactNodeViewRenderer } from "@tiptap/react"; import { ReactNodeViewRenderer } from '@tiptap/react';
import { import {
FOOTNOTE_DEFINITION_NAME, FOOTNOTE_DEFINITION_NAME,
FOOTNOTE_REFERENCE_NAME, FOOTNOTE_REFERENCE_NAME,
FOOTNOTES_LIST_NAME, FOOTNOTES_LIST_NAME,
generateFootnoteId, generateFootnoteId,
} from "./footnote-util"; } from './footnote-util';
import { footnoteNumberingPlugin } from "./footnote-numbering"; import { footnoteNumberingPlugin } from './footnote-numbering';
import { footnoteSyncPlugin, footnotePastePlugin } from "./footnote-sync"; import { footnoteSyncPlugin, footnotePastePlugin } from './footnote-sync';
export interface FootnoteReferenceOptions { export interface FootnoteReferenceOptions {
HTMLAttributes: Record<string, any>; HTMLAttributes: Record<string, any>;
@@ -27,7 +27,7 @@ export interface FootnoteReferenceOptions {
enableSync?: boolean; enableSync?: boolean;
} }
declare module "@tiptap/core" { declare module '@tiptap/core' {
interface Commands<ReturnType> { interface Commands<ReturnType> {
footnote: { footnote: {
/** /**
@@ -42,8 +42,11 @@ declare module "@tiptap/core" {
removeFootnote: (id: string) => ReturnType; removeFootnote: (id: string) => ReturnType;
/** Scroll to (and focus) a footnote definition by id. */ /** Scroll to (and focus) a footnote definition by id. */
scrollToFootnote: (id: string) => ReturnType; scrollToFootnote: (id: string) => ReturnType;
-      /** Scroll to (and select) a footnote reference by id. */
-      scrollToReference: (id: string) => ReturnType;
+      /** Scroll to a footnote reference by id. `index` selects WHICH occurrence
+       * to scroll to when the id is referenced more than once (reuse, #166):
+       * 0-based, defaults to the first. Used by the definition's multi-backlink
+       * UI (#168). */
+      scrollToReference: (id: string, index?: number) => ReturnType;
}; };
} }
} }
@@ -66,7 +69,7 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
// Superscript mark's <sup> rule. // Superscript mark's <sup> rule.
priority: 101, priority: 101,
group: "inline", group: 'inline',
inline: true, inline: true,
atom: true, atom: true,
selectable: true, selectable: true,
@@ -99,10 +102,10 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
return { return {
id: { id: {
default: null, default: null,
parseHTML: (element) => element.getAttribute("data-id"), parseHTML: (element) => element.getAttribute('data-id'),
renderHTML: (attributes) => { renderHTML: (attributes) => {
if (!attributes.id) return {}; if (!attributes.id) return {};
return { "data-id": attributes.id }; return { 'data-id': attributes.id };
}, },
}, },
}; };
@@ -113,7 +116,7 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
{ {
// High priority so the Superscript mark (which also matches <sup>) does // High priority so the Superscript mark (which also matches <sup>) does
// not claim a footnote reference and drop it as empty content. // not claim a footnote reference and drop it as empty content.
tag: "sup[data-footnote-ref]", tag: 'sup[data-footnote-ref]',
priority: 100, priority: 100,
}, },
]; ];
@@ -121,9 +124,9 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
renderHTML({ HTMLAttributes }) { renderHTML({ HTMLAttributes }) {
return [ return [
"sup", 'sup',
mergeAttributes( mergeAttributes(
{ "data-footnote-ref": "", class: "footnote-ref" }, { 'data-footnote-ref': '', class: 'footnote-ref' },
this.options.HTMLAttributes, this.options.HTMLAttributes,
HTMLAttributes, HTMLAttributes,
), ),
@@ -132,7 +135,7 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
// Plain-text representation (used by generateText / markdown text fallbacks). // Plain-text representation (used by generateText / markdown text fallbacks).
renderText({ node }) { renderText({ node }) {
return `[^${node.attrs.id ?? ""}]`; return `[^${node.attrs.id ?? ''}]`;
}, },
addNodeView() { addNodeView() {
@@ -170,8 +173,10 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
// Make sure the parent accepts an inline atom here. // Make sure the parent accepts an inline atom here.
const insertPos = selection.from; const insertPos = selection.from;
if (!$from.parent.type.spec.content?.includes("inline") && if (
!$from.parent.isTextblock) { !$from.parent.type.spec.content?.includes('inline') &&
!$from.parent.isTextblock
) {
return false; return false;
} }
@@ -311,19 +316,23 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
`[data-footnote-def][data-id="${id}"]`, `[data-footnote-def][data-id="${id}"]`,
) as HTMLElement | null; ) as HTMLElement | null;
if (!dom) return false; if (!dom) return false;
dom.scrollIntoView({ behavior: "smooth", block: "center" }); dom.scrollIntoView({ behavior: 'smooth', block: 'center' });
return true; return true;
}, },
scrollToReference: scrollToReference:
(id: string) => (id: string, index = 0) =>
({ editor }) => { ({ editor }) => {
if (!id) return false; if (!id) return false;
const dom = editor.view.dom.querySelector( // querySelectorAll returns the occurrences in document order, so the
// index maps 1:1 to the definition's a/b/c backlink (#168). Fall back
// to the first match for an out-of-range index.
const matches = editor.view.dom.querySelectorAll(
`sup[data-footnote-ref][data-id="${id}"]`, `sup[data-footnote-ref][data-id="${id}"]`,
) as HTMLElement | null; );
const dom = (matches[index] ?? matches[0]) as HTMLElement | undefined;
if (!dom) return false; if (!dom) return false;
dom.scrollIntoView({ behavior: "smooth", block: "center" }); dom.scrollIntoView({ behavior: 'smooth', block: 'center' });
return true; return true;
}, },
}; };
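
The index-with-fallback lookup that `scrollToReference` gains in this hunk can be looked at in isolation. The following is a minimal sketch, assuming a hypothetical helper named `pickOccurrence` (not part of the PR) that works on any array-like list already in document order:

```typescript
// Hypothetical helper (not in the PR): pick the occurrence at `index`,
// falling back to the first match when the index is out of range.
function pickOccurrence<T>(matches: ArrayLike<T>, index = 0): T | undefined {
  return matches[index] ?? matches[0];
}

const occurrences = ["first-ref", "second-ref", "third-ref"];
const second = pickOccurrence(occurrences, 1); // in range: the second occurrence
const fallback = pickOccurrence(occurrences, 9); // out of range: the first occurrence
const none = pickOccurrence<string>([], 0); // no matches at all: undefined
```

Falling back to the first match, instead of returning `false`, presumably keeps a stale backlink usable after a reference has been deleted; that reading is an inference from the comment in the diff, not something the PR states.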


@@ -1,12 +1,12 @@
import { Node as ProseMirrorNode } from "@tiptap/pm/model"; import { Node as ProseMirrorNode } from '@tiptap/pm/model';
/** /**
* Node type names for the footnote feature. Centralized so every part of the * Node type names for the footnote feature. Centralized so every part of the
* feature (nodes, plugins, commands) references the same string. * feature (nodes, plugins, commands) references the same string.
*/ */
export const FOOTNOTE_REFERENCE_NAME = "footnoteReference"; export const FOOTNOTE_REFERENCE_NAME = 'footnoteReference';
export const FOOTNOTES_LIST_NAME = "footnotesList"; export const FOOTNOTES_LIST_NAME = 'footnotesList';
export const FOOTNOTE_DEFINITION_NAME = "footnoteDefinition"; export const FOOTNOTE_DEFINITION_NAME = 'footnoteDefinition';
/** /**
* Generate a uuidv7-style id (time-ordered). Implemented locally so editor-ext * Generate a uuidv7-style id (time-ordered). Implemented locally so editor-ext
@@ -15,10 +15,10 @@ export const FOOTNOTE_DEFINITION_NAME = "footnoteDefinition";
*/ */
export function generateFootnoteId(): string { export function generateFootnoteId(): string {
const now = Date.now(); const now = Date.now();
const timeHex = now.toString(16).padStart(12, "0"); const timeHex = now.toString(16).padStart(12, '0');
const rand = (length: number) => { const rand = (length: number) => {
let out = ""; let out = '';
for (let i = 0; i < length; i++) { for (let i = 0; i < length; i++) {
out += Math.floor(Math.random() * 16).toString(16); out += Math.floor(Math.random() * 16).toString(16);
} }
@@ -26,19 +26,19 @@ export function generateFootnoteId(): string {
}; };
// version 7 nibble, then variant (8..b) nibble. // version 7 nibble, then variant (8..b) nibble.
const versioned = "7" + rand(3); const versioned = '7' + rand(3);
const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16);
const variant = variantNibble + rand(3); const variant = variantNibble + rand(3);
return ( return (
timeHex.slice(0, 8) + timeHex.slice(0, 8) +
"-" + '-' +
timeHex.slice(8, 12) + timeHex.slice(8, 12) +
"-" + '-' +
versioned + versioned +
"-" + '-' +
variant + variant +
"-" + '-' +
rand(12) rand(12)
); );
} }
@@ -89,7 +89,7 @@ export function deriveFootnoteId(
* Purely deterministic. * Purely deterministic.
*/ */
function suffix(n: number): string { function suffix(n: number): string {
let out = ""; let out = '';
let x = n; let x = n;
while (x > 0) { while (x > 0) {
const rem = (x - 1) % 25; const rem = (x - 1) % 25;
@@ -131,3 +131,19 @@ export function computeFootnoteNumbers(
} }
return numbers; return numbers;
} }
/**
* Build a map of `referenceId -> number of reference occurrences` (>= 1) from
* document order. After #166 the same id may be referenced multiple times
* (reuse: one number, one definition, N forward links); this count drives the
* definition's multi-backlink UI (↩ a b c …, #168). Pure function of the doc.
*/
export function computeFootnoteRefCounts(
doc: ProseMirrorNode,
): Map<string, number> {
const counts = new Map<string, number>();
for (const id of collectReferenceIds(doc)) {
counts.set(id, (counts.get(id) ?? 0) + 1);
}
return counts;
}

File diff suppressed because it is too large


@@ -0,0 +1,109 @@
/**
* The client seam. `pull.ts`/`push.ts` depend on a narrow STRUCTURAL interface
* rather than any concrete client, because the gitmost server writes NATIVELY —
* through repositories + collab `openDirectConnection`.
*
* `GitSyncClient` is that interface: the native datasource (server side)
* implements it, and the engine only ever uses `Pick<GitSyncClient, ...>`
* subsets of it. The signatures below MIRROR exactly the methods the engine's
* `pull.ts`/`push.ts` actually call (arg shapes + the fields the engine reads
* off each result), so a REST-style client is still structurally assignable and
* the native adapter has a precise contract.
*/
/**
* A page node as returned by `listSpaceTree` (the sidebar/tree walk, no body).
* The engine layout (`buildVaultLayout`) consumes `PageNode` from `./layout`,
* which only requires `id` (+ optional `title`/`slugId`/`parentPageId`); this
* lite shape documents the fields the tree walk surfaces. Real tree nodes also
* carry `position`, `icon`, `hasChildren` — kept open via the index signature.
*/
export interface GitSyncPageNodeLite {
id: string;
slugId?: string;
title?: string;
parentPageId?: string | null;
hasChildren?: boolean;
/** `listSpaceTree` nodes carry extra fields (position, icon, …). */
[key: string]: unknown;
}
/**
* The structural client the engine depends on. Only `Pick<GitSyncClient, ...>`
* subsets are ever used:
* - pull reads: `getPageJson` (+ the tree walk's `listSpaceTree`),
* - push writes: `importPageMarkdown` / `createPage` / `deletePage` /
* `movePage` / `renamePage`,
* - continuous (phase B+): `listRecentSince` / `listTrash` / `restorePage`.
*/
export interface GitSyncClient {
/**
* Full tree of page nodes for the space (or the subtree rooted at
* `rootPageId`), each WITHOUT body content. `complete` is `false` when the
* walk was truncated / a fetch failed — the pull side suppresses absence
* deletions on an incomplete tree (SPEC §8). Native impl returns
* `complete: true` always (reads the DB, not a paginated REST endpoint).
*/
listSpaceTree(spaceId: string, rootPageId?: string): Promise<{
pages: GitSyncPageNodeLite[];
complete: boolean;
}>;
/**
* One page WITH its ProseMirror body content. `applyPullActions` reads
* `id`, `slugId`, `title`, `parentPageId`, `spaceId` (for the file meta) and
* `content` (to stabilize/serialize). `updatedAt` is carried for the
* poll-suppression loop-guard.
*/
getPageJson(pageId: string): Promise<{
id: string;
slugId: string;
title: string;
parentPageId: string | null;
spaceId: string;
updatedAt: string;
content: unknown;
}>;
/**
* Merge a page's body from a self-contained markdown file (meta + body). The
* collab/Yjs write path (SPEC §2/§15.6) — never a raw jsonb overwrite.
* `applyPushActions` reads only an optional `updatedAt` off the result
* (via `extractUpdatedAt`, tolerant of extra fields).
*
* `baseMarkdown` is the last-synced version of the file
* (`refs/docmost/last-pushed`), the common ancestor for a THREE-WAY merge
* against the live doc so concurrent human edits survive (review #5).
* Optional/null -> 2-way.
*/
importPageMarkdown(pageId: string, fullMarkdown: string, baseMarkdown?: string | null): Promise<{
updatedAt?: string;
[key: string]: unknown;
}>;
/**
* Create a new page and return the assigned id at `data.id`
* (`applyPushActions` reads `result.data.id`, then writes it back into the
* file's meta). An optional top-level/`data.updatedAt` feeds the loop-guard.
*/
createPage(title: string, content: string, spaceId: string, parentPageId?: string): Promise<{
data: {
id: string;
};
updatedAt?: string;
[key: string]: unknown;
}>;
/** Soft-delete a page to Trash (SPEC §8). Result is not inspected. */
deletePage(pageId: string): Promise<unknown>;
/**
* Reparent a page (and optionally set its fractional-index `position`). The
* engine passes `position` UNDEFINED for now; the native impl computes a
* default between siblings. Result is not inspected.
*/
movePage(pageId: string, parentPageId: string | null, position?: string): Promise<unknown>;
/** Change a page's title only (no body touch). Result is not inspected. */
renamePage(pageId: string, title: string): Promise<unknown>;
/**
* Pages updated since `sinceIso` (the poll-safety reconciliation, SPEC §8).
* `spaceId` may be undefined (all spaces); `hardPageCap` bounds the walk.
*/
listRecentSince(spaceId: string | undefined, sinceIso: string | null, hardPageCap?: number): Promise<unknown[]>;
/** List soft-deleted (trashed) pages for the space (deletion detection). */
listTrash(spaceId: string): Promise<unknown[]>;
/** Restore a soft-deleted page from Trash. Result is not inspected. */
restorePage(pageId: string): Promise<unknown>;
}
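
The value of a structural seam is that a consumer can name only the methods it needs and any object with matching shapes is accepted. The sketch below illustrates that with a narrowed local copy of `listSpaceTree`; `TreeReader`, `planAbsenceDeletes` and `fakeClient` are hypothetical names introduced here, and the "suppress deletions on an incomplete tree" rule is taken from the doc comment above rather than from the engine's real code.

```typescript
// Local stand-ins so the sketch is self-contained (mirroring the shapes above).
interface PageNodeLite {
  id: string;
  title?: string;
  parentPageId?: string | null;
  [key: string]: unknown;
}
interface TreeReader {
  listSpaceTree(
    spaceId: string,
    rootPageId?: string,
  ): Promise<{ pages: PageNodeLite[]; complete: boolean }>;
}

// Hypothetical consumer: ids known locally but absent from the tree are
// deletion candidates, unless the walk was incomplete.
async function planAbsenceDeletes(
  client: TreeReader,
  spaceId: string,
  knownIds: string[],
): Promise<string[]> {
  const tree = await client.listSpaceTree(spaceId);
  if (!tree.complete) return []; // a truncated walk proves nothing about absence
  const live = new Set(tree.pages.map((p) => p.id));
  return knownIds.filter((id) => !live.has(id));
}

// An in-memory fake is structurally assignable; no `implements` clause needed.
const fakeClient = (complete: boolean): TreeReader => ({
  listSpaceTree: async () => ({ pages: [{ id: "p1" }], complete }),
});
```

In the real package the narrowed type would presumably be written as `Pick<GitSyncClient, "listSpaceTree">` instead of a hand-copied interface.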


@@ -0,0 +1,13 @@
/**
* The client seam. `pull.ts`/`push.ts` depend on a narrow STRUCTURAL interface
* rather than any concrete client, because the gitmost server writes NATIVELY —
* through repositories + collab `openDirectConnection`.
*
* `GitSyncClient` is that interface: the native datasource (server side)
* implements it, and the engine only ever uses `Pick<GitSyncClient, ...>`
* subsets of it. The signatures below MIRROR exactly the methods the engine's
* `pull.ts`/`push.ts` actually call (arg shapes + the fields the engine reads
* off each result), so a REST-style client is still structurally assignable and
* the native adapter has a precise contract.
*/
export {};


@@ -0,0 +1 @@
export declare function loadSettingsOrExit<T>(factory: () => T): T;


@@ -0,0 +1,50 @@
import { ZodError } from 'zod';
// Turn a ZodError from settings validation into a clear, actionable startup
// message that names the offending env var(s), then exit(1) — no raw stack
// trace. Mirrors the Python new-project skeleton's load_settings_or_exit.
// A non-ZodError is left to propagate unchanged.
export function loadSettingsOrExit(factory) {
try {
return factory();
}
catch (err) {
if (!(err instanceof ZodError))
throw err;
const missing = [];
const invalid = [];
for (const issue of err.issues) {
const name = issue.path.length ? String(issue.path[0]) : '?';
// A missing required variable surfaces as an `invalid_type` issue whose
// received value was `undefined`. zod 3 exposed `issue.received` directly;
// zod 4 dropped that field and instead folds it into the message
// ("expected string, received undefined"). Detect both shapes so the
// missing-vs-invalid split holds across zod majors. NOTE: an invalid (but
// present) value uses a different code (invalid_format / invalid_value) or
// an `invalid_type` message that reports a non-undefined received (e.g.
// "received NaN" from a coerced number), so neither is misread as missing.
const i = issue;
const isMissing = issue.code === 'invalid_type' &&
(i.received === 'undefined' ||
/received undefined/i.test(i.message ?? ''));
if (isMissing)
missing.push(name);
else
invalid.push(`${name}: ${issue.message}`);
}
const lines = ['Configuration error in environment / .env:'];
if (missing.length) {
lines.push(' Missing required variable(s):');
for (const n of [...new Set(missing)])
lines.push(` - ${n}`);
}
if (invalid.length) {
lines.push(' Invalid value(s):');
for (const item of invalid)
lines.push(` - ${item}`);
}
lines.push('');
lines.push('Set them in .env (see .env.example) and try again.');
process.stderr.write(lines.join('\n') + '\n');
process.exit(1);
}
}
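
The missing-versus-invalid split is the subtle part of this function, so it may help to see it without the zod dependency. This is a sketch under stated assumptions: `IssueLike` is a hand-written structural stand-in for a zod issue, and the sample messages are illustrative, not copied from zod's output.

```typescript
// Structural stand-in for a zod issue (assumption: only these fields matter).
interface IssueLike {
  code: string;
  path: (string | number)[];
  message?: string;
  received?: string; // per the comment above: exposed by zod 3, absent in zod 4
}

// Same predicate as in loadSettingsOrExit: accept either issue shape.
function isMissing(issue: IssueLike): boolean {
  return (
    issue.code === "invalid_type" &&
    (issue.received === "undefined" ||
      /received undefined/i.test(issue.message ?? ""))
  );
}

function splitIssues(issues: IssueLike[]): {
  missing: string[];
  invalid: string[];
} {
  const missing: string[] = [];
  const invalid: string[] = [];
  for (const issue of issues) {
    const name = issue.path.length ? String(issue.path[0]) : "?";
    if (isMissing(issue)) missing.push(name);
    else invalid.push(`${name}: ${issue.message ?? ""}`);
  }
  // De-duplicate names, as the original does when printing.
  return { missing: Array.from(new Set(missing)), invalid };
}

const report = splitIssues([
  { code: "invalid_type", path: ["DATABASE_URL"], received: "undefined" },
  { code: "invalid_type", path: ["REDIS_URL"], message: "expected string, received undefined" },
  { code: "invalid_type", path: ["PORT"], message: "expected number, received NaN" },
]);
```

Here `report.missing` lists `DATABASE_URL` and `REDIS_URL`, while `PORT` lands in `report.invalid` because its message reports a non-undefined received value.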


@@ -0,0 +1,70 @@
import { VaultGit } from "./git.js";
import { GitSyncClient } from "./client.types.js";
import { Settings } from "./settings.js";
/**
* Absolute-path filesystem primitives the cycle needs. Injected (not imported)
* so the engine stays IO-free and unit-testable. `mkdir` is recursive; `rm` is
* force (a missing file is a no-op).
*/
export interface CycleFs {
readFile: (absPath: string) => Promise<string>;
writeFile: (absPath: string, text: string) => Promise<void>;
mkdir: (absDir: string) => Promise<void>;
rm: (absPath: string) => Promise<void>;
}
export interface RunCycleDeps {
spaceId: string;
/** The Docmost seam (reads for pull, writes for push). */
client: GitSyncClient;
/** The per-space git vault (a real working repo). */
vault: VaultGit;
/** Engine settings; `vaultPath` roots the relPath -> absolute-path mapping. */
settings: Settings;
fs: CycleFs;
log: (line: string) => void;
/**
* Delete-cap hook (the ONLY caller-specific policy). Called with the push
* dry-run's planned delete count (`Number.POSITIVE_INFINITY` when the dry-run
* itself failed, so the hook can fail safe) and the live client; returns the
* client to use for the REAL apply. The default (omitted) applies every op
* unmodified. gitmost uses it to neutralize deletes when over its cap.
*
* When omitted, NO dry-run is performed (one fewer push planning pass).
*/
resolveApplyClient?: (plannedDeletes: number, client: GitSyncClient) => GitSyncClient;
}
export interface RunCycleResult {
ran: boolean;
/** Set when the cycle short-circuited without running pull/push. */
skipped?: "merge-in-progress";
pull?: {
written: number;
deleted: number;
conflict: boolean;
};
push?: {
mode: string;
failures: number;
};
}
/**
* Run ONE full reconcile cycle for a space: PULL (Docmost -> vault) then PUSH
* (vault -> Docmost), under the engine's required branch choreography. This is
* the single entry point the app drives — it owns the staging order so it can
* never drift from the engine it ships with.
*
* Staging (the ⭐ data-loss-critical order, SPEC §6/§9):
* 1. assertGitAvailable + ensureRepo (the git state store must exist).
* 2. refuse on an unresolved merge (a prior conflicting pull); next checkout
* would fail otherwise.
* 3. ensureBranch('docmost','main') + checkout('docmost'). Pull writes MUST
* land on `docmost`, not `main`: applyPullActions commits on `docmost`,
* then checks out `main` and merges docmost -> main. Writing Docmost
* content straight onto `main` would clobber local file edits before push
* can diff them.
* 4. PULL: readExisting -> listSpaceTree -> computePullActions -> apply.
* 5. PUSH: optional dry-run to feed the delete-cap hook, then the real apply.
*
* Lock + cap POLICY live in the caller; this owns only the mechanics.
*/
export declare function runCycle(deps: RunCycleDeps): Promise<RunCycleResult>;
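
Since `CycleFs` is injected, the caller has to supply an implementation with the semantics the doc comment describes (recursive `mkdir`, forced `rm`). A minimal sketch backed by `node:fs/promises` might look as follows; `nodeCycleFs` and `roundTrip` are hypothetical names, and the interface is copied locally so the block stands alone.

```typescript
import { mkdir, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Local copy of the CycleFs shape declared above.
interface CycleFs {
  readFile: (absPath: string) => Promise<string>;
  writeFile: (absPath: string, text: string) => Promise<void>;
  mkdir: (absDir: string) => Promise<void>;
  rm: (absPath: string) => Promise<void>;
}

// Hypothetical adapter: UTF-8 text files, recursive mkdir, forced rm.
const nodeCycleFs: CycleFs = {
  readFile: (absPath) => readFile(absPath, "utf8"),
  writeFile: (absPath, text) => writeFile(absPath, text, "utf8"),
  mkdir: async (absDir) => {
    // mkdir resolves to the first created path; CycleFs wants void.
    await mkdir(absDir, { recursive: true });
  },
  rm: (absPath) => rm(absPath, { force: true }),
};

// Small demonstration: write, read back, then remove twice.
async function roundTrip(): Promise<string> {
  const dir = join(tmpdir(), `cyclefs-demo-${process.pid}`);
  await nodeCycleFs.mkdir(join(dir, "nested"));
  const file = join(dir, "nested", "page.md");
  await nodeCycleFs.writeFile(file, "# hello\n");
  const text = await nodeCycleFs.readFile(file);
  await nodeCycleFs.rm(file);
  await nodeCycleFs.rm(file); // already gone: a no-op because of `force`
  return text;
}
```

A unit test would more likely pass an in-memory `Map`-backed object instead, which is what keeps the engine itself free of real IO.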


@@ -0,0 +1,97 @@
import { readExisting, computePullActions, applyPullActions } from "./pull.js";
import { runPush } from "./push.js";
/**
* Run ONE full reconcile cycle for a space: PULL (Docmost -> vault) then PUSH
* (vault -> Docmost), under the engine's required branch choreography. This is
* the single entry point the app drives — it owns the staging order so it can
* never drift from the engine it ships with.
*
* Staging (the ⭐ data-loss-critical order, SPEC §6/§9):
* 1. assertGitAvailable + ensureRepo (the git state store must exist).
* 2. refuse on an unresolved merge (a prior conflicting pull); next checkout
* would fail otherwise.
* 3. ensureBranch('docmost','main') + checkout('docmost'). Pull writes MUST
* land on `docmost`, not `main`: applyPullActions commits on `docmost`,
* then checks out `main` and merges docmost -> main. Writing Docmost
* content straight onto `main` would clobber local file edits before push
* can diff them.
* 4. PULL: readExisting -> listSpaceTree -> computePullActions -> apply.
* 5. PUSH: optional dry-run to feed the delete-cap hook, then the real apply.
*
* Lock + cap POLICY live in the caller; this owns only the mechanics.
*/
export async function runCycle(deps) {
const { spaceId, client, vault, settings, fs, log, resolveApplyClient } = deps;
const vaultRoot = settings.vaultPath;
const abs = (relPath) => `${vaultRoot}/${relPath}`;
// 1. The engine state store is git: make sure the repo + branches exist
// before any tracked-file listing or diff.
await vault.assertGitAvailable();
await vault.ensureRepo();
// 2. Refuse to run on top of an unresolved merge (SPEC §9): a prior
// conflicting pull leaves the vault mid-merge; the next checkout would fail.
if (await vault.isMergeInProgress()) {
log(`vault has an unresolved merge — resolve it (or 'git merge --abort') ` +
`and re-run (SPEC §9); skipping cycle.`);
return { ran: false, skipped: "merge-in-progress" };
}
// 3. Pull writes happen on `docmost`; be on it BEFORE applying (see docstring).
await vault.ensureBranch("docmost", "main");
await vault.checkout("docmost");
// 4. PULL --------------------------------------------------------------------
const existing = await readExisting({
listTracked: () => vault.listTrackedFiles("*.md"),
readFile: (relPath) => fs.readFile(abs(relPath)),
});
const tree = await client.listSpaceTree(spaceId);
const pullActions = computePullActions({
pages: tree.pages,
treeComplete: tree.complete,
existing,
});
const pullResult = await applyPullActions({
client,
git: vault,
writeFile: (absPath, text) => fs.writeFile(absPath, text),
mkdir: (absDir) => fs.mkdir(absDir),
rm: (absPath) => fs.rm(absPath),
}, pullActions, vaultRoot);
// 5. PUSH --------------------------------------------------------------------
const pushDeps = {
settings,
git: vault,
makeClient: () => client,
readFile: (relPath) => fs.readFile(abs(relPath)),
writeFile: (relPath, text) => fs.writeFile(abs(relPath), text),
log,
};
let applyClient = client;
if (resolveApplyClient) {
// Plan the push as a DRY-RUN first to read the delete count, then let the
// caller decide the apply client (e.g. neutralize deletes over a cap). A
// failed dry-run yields Infinity so the hook can fail safe.
let plannedDeletes;
try {
const dry = await runPush(pushDeps, { dryRun: true });
plannedDeletes = dry.planned?.deletes ?? 0;
}
catch (err) {
log(`push dry-run planning failed (${err instanceof Error ? err.message : String(err)}); deferring deletion policy to the cap hook (fail-safe).`);
plannedDeletes = Number.POSITIVE_INFINITY;
}
applyClient = resolveApplyClient(plannedDeletes, client);
}
const pushResult = await runPush({ ...pushDeps, makeClient: () => applyClient }, { dryRun: false });
return {
ran: true,
pull: {
written: pullResult.written,
deleted: pullResult.deleted,
conflict: pullResult.merge.conflict,
},
push: {
mode: pushResult.mode,
failures: pushResult.failures?.length ?? 0,
},
};
}
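
The `resolveApplyClient` hook is where a caller's delete cap lives, so a sketch of such a hook may clarify the contract. Everything named here (`DeletingClient`, `makeDeleteCapHook`, the cap value) is hypothetical and is not gitmost's actual policy code; the only behaviour taken from the source is that a failed dry-run arrives as `Number.POSITIVE_INFINITY` and should fail safe.

```typescript
// Narrow stand-in for the client: only the method the cap cares about.
interface DeletingClient {
  deletePage(pageId: string): Promise<unknown>;
}

// Hypothetical hook factory. Returns the client untouched when the planned
// delete count is within the cap; otherwise returns a wrapper whose
// deletePage is a logged no-op.
function makeDeleteCapHook(maxDeletes: number, log: (line: string) => void) {
  return (plannedDeletes: number, client: DeletingClient): DeletingClient => {
    if (plannedDeletes <= maxDeletes) return client;
    log(`planned deletes (${plannedDeletes}) exceed cap (${maxDeletes}); deletes neutralized`);
    // Object.create keeps every other method reachable via the prototype chain.
    const capped: DeletingClient = Object.create(client);
    capped.deletePage = async (pageId: string) => {
      log(`skipped delete of ${pageId}`);
    };
    return capped;
  };
}

const deleted: string[] = [];
const realClient: DeletingClient = {
  deletePage: async (pageId) => {
    deleted.push(pageId);
  },
};
const hook = makeDeleteCapHook(5, () => {});
const withinCap = hook(2, realClient); // same client, deletes pass through
const overCap = hook(Number.POSITIVE_INFINITY, realClient); // failed dry-run: fail safe
```

Because `Infinity <= maxDeletes` is false for any finite cap, a dry-run failure neutralizes deletes without a special case.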

259
packages/git-sync/build/engine/git.d.ts vendored Normal file

@@ -0,0 +1,259 @@
/** Bot identity used for engine-authored vault commits (SPEC §7.3). */
export declare const BOT_AUTHOR_NAME = "Docmost Sync";
export declare const BOT_AUTHOR_EMAIL = "docmost-sync@local";
/** Default branch the vault repo is initialized on. */
export declare const DEFAULT_BRANCH = "main";
/**
* One row of `git diff --name-status` (SPEC §6 "FS → Docmost"). `status` is the
* single-letter change code (`-M` rename detection on), `path` is the (new) file
* path; for a rename/copy (`R`/`C`) `oldPath` is the source and `path` is the
* destination, with `score` carrying git's similarity index (0–100).
*/
export interface DiffEntry {
status: "A" | "M" | "D" | "R" | "C";
/** New (destination) path. For A/M/D it is the only path. */
path: string;
/** Source path — present only for R/C. */
oldPath?: string;
/** Rename/copy similarity score (0–100) — present only for R/C. */
score?: number;
}
/** Result of a `merge`: whether it succeeded cleanly or left conflict markers. */
export interface MergeResult {
/** True when the merge applied cleanly (fast-forward or clean 3-way). */
ok: boolean;
/** True when the merge stopped on conflicts (markers left in the worktree). */
conflict: boolean;
/** Raw combined stdout+stderr, for logging/diagnostics. */
output: string;
}
/** Options for an engine-authored commit (provenance, SPEC §7.3). */
export interface CommitOptions {
authorName: string;
authorEmail: string;
/**
* Trailer lines appended to the commit message body (e.g.
* `Docmost-Sync-Source: docmost`). These are the machine-readable provenance
* the loop-guard keys on (SPEC §12, "commit-attribution").
*/
trailers?: string[];
}
/**
* A git wrapper bound to a single vault path. Construct once per vault; every
* method runs git with `cwd = vaultPath`.
*/
export declare class VaultGit {
private readonly vaultPath;
constructor(vaultPath: string);
/**
* Preflight: verify a runnable `git` binary is on PATH. The daemon shells out
* to system `git` for every vault operation, so a missing binary (e.g. a slim
* container image without git) must fail fast with an actionable message
* rather than a cryptic ENOENT deep inside the first real git call. Presence
* check only — we do NOT gate on a specific version. Runs `git --version`
* with NO `cwd` (the vault dir may not exist yet at preflight time).
*/
assertGitAvailable(): Promise<void>;
/**
* Run a git command in the vault and return trimmed stdout. THIN wrapper over
* the single `runRaw` primitive: throws a clear, unified Error (including
* stderr/stdout) on a non-zero exit.
*/
private run;
/**
* The ONE primitive every git invocation in this module flows through. Builds
* the full argv (`--no-pager -c core.quotepath=false <args>`), env, cwd, and
* maxBuffer, runs git, and NEVER throws — it returns the exit info so callers
* can treat a non-zero exit as either an error (`run`) or a meaningful state
* (e.g. a merge conflict, a porcelain diff that "fails" deliberately).
*
* - argv: ALWAYS prepends `--no-pager -c core.quotepath=false`, so git never
* blocks on a pager and always prints verbatim UTF-8 paths (no octal
* escaping/quoting). `quotepath=false` is the baseline for ALL path-
* printing commands (ls-files, diff --name-only, …).
* - cwd: `opts.cwd === null` -> do NOT set cwd (the preflight, where the
* vault dir may not exist); otherwise `opts.cwd ?? this.vaultPath`.
* - env: `vaultGitEnv(opts?.env)` (cwd-isolation + caller extras).
* - On a spawn/exec error we capture the error `message` too, so a failure
* before git could write to stderr (e.g. ENOENT) is NOT lost.
*/
private runRaw;
/**
* Ensure the vault directory exists and is an initialized git repo on `main`
* with an initial (empty) commit so branches exist. Idempotent: safe to call
* on every run. Sets a LOCAL bot identity for the vault repo if none is set
* (so engine commits never fall back to a global/unset identity).
*/
ensureRepo(): Promise<void>;
/** True if `cwd` is inside a git work-tree (the vault is initialized). */
private isRepo;
/** True if a LOCAL git config key is set in the vault repo. */
private hasLocalConfig;
/** True if the repo has at least one commit (HEAD resolves). */
private hasAnyCommit;
/** True if a branch with the given name exists. */
branchExists(name: string): Promise<boolean>;
/**
* Create `name` from `fromBranch` if it does not already exist. No-op (and no
* checkout) when the branch is already present.
*/
ensureBranch(name: string, fromBranch: string): Promise<void>;
/** Name of the currently checked-out branch. */
currentBranch(): Promise<string>;
/** Check out an existing branch. */
checkout(name: string): Promise<void>;
/** Stage everything (adds, modifications, deletions). */
stageAll(): Promise<void>;
/**
* True if the vault is mid-merge (an unresolved merge from a previous run,
* SPEC §9 / §12). Detected via a `MERGE_HEAD` ref OR any unmerged
* (conflicted) index entries (`git ls-files -u`). The pull cycle checks this
* BEFORE any checkout so a left-over merge produces a clear, actionable
* message instead of a raw "you need to resolve your current index first"
* failure deep inside `checkout`. This is what makes re-runs converge
* (resumability, SPEC §12).
*/
isMergeInProgress(): Promise<boolean>;
/**
* Commit the currently STAGED changes with an explicit author/committer
* identity and the given trailers appended to the message body (SPEC §7.3
* provenance). Returns `true` if a commit was made, `false` if there was
* nothing to commit (graceful no-op). The caller is expected to have staged
* its changes first (e.g. via `stageAll`).
*/
commit(message: string, opts: CommitOptions): Promise<boolean>;
/**
* Low-level commit used by both `commit` and `ensureRepo`'s initial commit.
* Builds the full message with appended trailers and sets author + committer
* identity via env vars (so the committer matches the author, not the repo
* default).
*/
private commitRaw;
/**
* Merge `fromBranch` into the current branch (`git merge --no-edit`).
* Fast-forwards when possible; performs a real 3-way merge otherwise. Conflict
* state is SURFACED (returned), NOT auto-resolved (SPEC §9): the conflict
* markers are left in the worktree for manual resolution by a later increment,
* and — critically — nothing is pushed to Docmost (we never write to Docmost
* anyway).
*/
merge(fromBranch: string): Promise<MergeResult>;
/** True if the index has any unmerged (conflicted) paths. */
private hasUnmergedPaths;
/**
* List tracked files on the current branch (paths relative to the vault
* root, forward-slash separated). An optional glob (a git pathspec) narrows
* the listing, e.g. `"*.md"`.
*
* The target wiki is RUSSIAN, so vault file names routinely contain Cyrillic
* (e.g. `Колонка.md`). With git's DEFAULT `core.quotepath=true`, `ls-files`
* returns non-ASCII paths octal-escaped and double-quoted (`"\320\232..."`),
* which `src/pull.ts` `readExisting` would then parse as garbage paths,
* breaking move/duplicate detection. We defeat that two ways at once:
* - `core.quotepath=false` disables the octal-escape/quoting. It is now the
* `runRaw` argv baseline (prepended to EVERY invocation), so we no longer
* pass it inline here.
* - `-z` emits NUL-delimited RAW UTF-8 paths (no quoting, no newline
* ambiguity), which we split on `\0`.
* We read the RAW stdout (NOT the trimming `run()` helper, which would mangle
* the NUL-delimited bytes) and split on `\0`, dropping empty entries. Paths
* are returned verbatim — git already emits forward slashes.
*/
listTrackedFiles(glob?: string): Promise<string[]>;
/**
* Diff two refs with `--name-status -M -z` and parse the NUL-delimited output
* (SPEC §6: the FS→Docmost push direction diffs `main` against
* `refs/docmost/last-pushed`). Rename detection is ON (`-M`), so a moved/renamed
* file is reported as a single `R` row with both its old and new path instead
* of a delete+add pair — that distinction is what lets the push planner tell a
* move from a delete+create (SPEC §8 "Move vs delete").
*
* `-z` makes git emit NUL-delimited RAW UTF-8 records (the Russian wiki has
* Cyrillic file names) with NO quoting/escaping. The record shape differs by
* status:
* - A/M/D: `status\0path\0`
* - R/C: `Rnnn\0oldPath\0newPath\0` (nnn = similarity score, e.g. `R100`)
* We read the RAW stdout (not the trimming `run()` helper, which would mangle
* the NUL bytes), split on `\0`, drop the trailing empty entry, and walk the
* tokens pulling 1 or 2 path tokens per status. Paths are returned verbatim.
*/
diffNameStatus(fromRef: string, toRef: string): Promise<DiffEntry[]>;
/**
* Resolve a ref/commit-ish to its full SHA, or `null` if it does not exist.
* `rev-parse --verify --quiet` exits non-zero (and prints nothing) for an
* unknown ref, so a non-zero exit maps cleanly to `null`. Used to read
* `refs/docmost/last-pushed` (SPEC §5) — which is absent before the first push.
*/
revParse(ref: string): Promise<string | null>;
/**
* Read a ref to its SHA, or `null` if unset. Thin alias over `revParse`,
* named for the push direction's marker `refs/docmost/last-pushed` (SPEC §5:
* "what from `main` is already reflected in Docmost").
*/
readRef(ref: string): Promise<string | null>;
/**
* Point `ref` at `target` (`git update-ref <ref> <target>`). Used to advance
* `refs/docmost/last-pushed` to the just-pushed `main` commit after a push
* (SPEC §6 step 3 / §5). `target` may be a SHA or any commit-ish git accepts.
*/
updateRef(ref: string, target: string): Promise<void>;
/**
* Fast-forward `branch` to `toCommit` — but ONLY if it is a TRUE fast-forward,
* i.e. the current `branch` tip is an ancestor of `toCommit` (verified via
* `git merge-base --is-ancestor <branch> <toCommit>`). Used to advance the
* `docmost` mirror branch after a clean push (SPEC §6 step 3 / §10): once a
* push succeeds, Docmost already contains the pushed `main` content, so the
* mirror must reflect it — otherwise the NEXT pull would diff our own write
* back and re-pull it (loop-guard).
*
* SAFETY — never force, never clobber divergent history:
* - If `branch` IS an ancestor of `toCommit`, advance it with
* `git update-ref refs/heads/<branch> <toCommit>`. The `docmost` branch is
* NOT checked out during a push (push works on `main`), so updating the ref
* directly is safe and avoids any working-tree touch.
* - If `branch` is NOT an ancestor (divergent / would-be non-fast-forward),
* do NOT move it — return `{ ok: false, reason: 'not-fast-forward' }` and
* let the caller log it. We must never overwrite a `docmost` history that
* has commits the push base does not contain.
*
* Returns `{ ok: true }` when the branch was advanced (or already at
* `toCommit`, a degenerate fast-forward), `{ ok: false, reason }` otherwise.
* A missing `branch` or `toCommit` also yields `{ ok: false }` with a reason.
*/
fastForwardBranch(branch: string, toCommit: string): Promise<{
ok: boolean;
reason?: string;
}>;
/**
* Read a file's content at a specific ref (`git show <ref>:<path>`), or `null`
* if the path does not exist there. Used by the push direction to read the
* PRE-IMAGE of a DELETED file (e.g. at `refs/docmost/last-pushed`) so its
* `docmost:meta` — and therefore its `pageId` — can be recovered to translate
* the deletion into a `delete_page` (SPEC §6/§8: only TRACKED files, i.e. ones
* that had a pageId, are deleted in Docmost). A non-zero exit (path absent at
* that ref) maps to `null` rather than throwing.
*/
showFileAtRef(ref: string, path: string): Promise<string | null>;
}
/**
* Build the environment for a vault git invocation (SPEC §12 cwd-isolation).
* Used by the single `runRaw` primitive every git command flows through, so
* these pins apply uniformly (including the `git --version` preflight).
*
* cwd-isolation is this module's central safety guarantee: every git command
* MUST operate on the vault repo at `cwd: vaultPath` and nothing else. An
* inherited `GIT_DIR` / `GIT_WORK_TREE` in `process.env` would silently
* redirect the operation away from `cwd` (e.g. to the source repo or another
* checkout), defeating that guarantee. So we always strip them, regardless of
* whatever else the caller adds (author/committer identity, etc.).
*
* Exported for unit testing.
*/
export declare function vaultGitEnv(extra?: Record<string, string>): NodeJS.ProcessEnv;
/**
* Build a commit message body with trailer lines appended (SPEC §7.3). The
* trailers are separated from the subject by a blank line so
* `git interpret-trailers` / `git log --format=%(trailers)` parse them as
* trailers.
* Exported for unit testing.
*/
export declare function buildCommitMessage(subject: string, trailers?: string[]): string;
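
The `diffNameStatus` doc comment describes a token walk over NUL-delimited records; the implementation itself is not visible in this part of the diff. The sketch below is one way that walk could be written, assuming the record shapes stated above (`status\0path\0` for A/M/D and `Rnnn\0oldPath\0newPath\0` for R/C). `parseNameStatusZ` is a hypothetical name, and skipping unknown status codes is a choice made for this sketch.

```typescript
// Local copy of the DiffEntry shape declared above.
interface DiffEntry {
  status: "A" | "M" | "D" | "R" | "C";
  path: string;
  oldPath?: string;
  score?: number;
}

function parseNameStatusZ(raw: string): DiffEntry[] {
  const tokens = raw.split("\0");
  // The output ends with a NUL, so the split leaves a trailing empty token.
  if (tokens[tokens.length - 1] === "") tokens.pop();
  const entries: DiffEntry[] = [];
  let i = 0;
  while (i < tokens.length) {
    const code = tokens[i++];
    const letter = code[0];
    if (letter === "R" || letter === "C") {
      // Rename/copy: two path tokens follow, source first.
      const oldPath: string | undefined = tokens[i++];
      const path: string | undefined = tokens[i++];
      if (oldPath === undefined || path === undefined) break; // truncated record
      entries.push({ status: letter, path, oldPath, score: Number(code.slice(1)) });
    } else if (letter === "A" || letter === "M" || letter === "D") {
      const path: string | undefined = tokens[i++];
      if (path === undefined) break; // truncated record
      entries.push({ status: letter, path });
    } else {
      // Codes outside the DiffEntry union: skip their single path token.
      i++;
    }
  }
  return entries;
}

const sample = "M\0docs/a.md\0R100\0Старый.md\0Новый.md\0D\0b.md\0";
const parsed = parseNameStatusZ(sample);
```

Splitting on `\0` instead of newlines is what lets Cyrillic file names such as the ones in `sample` pass through without any unquoting step.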


@@ -0,0 +1,570 @@
/**
* Thin async wrapper over the system `git` binary (SPEC §5: state store = git).
*
* IMPORTANT — VAULT-SCOPED: every operation here runs with `cwd = vaultPath`,
* which is the vault's OWN git repository (default `data/vault`), SEPARATE from
* the gitmost application repo. This module MUST NEVER run git against the
* application repo. `data/` is gitignored, so a nested repo under `data/vault`
* is safe. The pull cycle is READ-ONLY toward Docmost; this module only touches
* the local vault git, never a git remote (push is deferred, see SPEC §7).
*
* Implementation notes:
* - We shell out via `node:child_process` `execFile` (promisified), passing
* ARGS AS AN ARRAY — no shell, so there is no command injection surface even
* if a page title / branch name contains shell metacharacters.
* - EVERY git invocation funnels through the single `runRaw` primitive, which
* ALWAYS prepends `--no-pager -c core.quotepath=false` to the argv (so git
* never blocks on a pager and always prints verbatim UTF-8 paths). There is
* no exception — even the `git --version` preflight goes through `runRaw`.
* - "nothing to commit" is treated as a graceful no-op, not an error.
*/
import { execFile } from "node:child_process";
import { mkdir } from "node:fs/promises";
import { promisify } from "node:util";
const execFileAsync = promisify(execFile);
/** Bot identity used for engine-authored vault commits (SPEC §7.3). */
export const BOT_AUTHOR_NAME = "Docmost Sync";
export const BOT_AUTHOR_EMAIL = "docmost-sync@local";
/** Default branch the vault repo is initialized on. */
export const DEFAULT_BRANCH = "main";
/**
* A git wrapper bound to a single vault path. Construct once per vault; every
* method runs git with `cwd = vaultPath`.
*/
export class VaultGit {
vaultPath;
constructor(vaultPath) {
this.vaultPath = vaultPath;
}
/**
* Preflight: verify a runnable `git` binary is on PATH. The daemon shells out
* to system `git` for every vault operation, so a missing binary (e.g. a slim
* container image without git) must fail fast with an actionable message
* rather than a cryptic ENOENT deep inside the first real git call. Presence
* check only — we do NOT gate on a specific version. Runs `git --version`
* with NO `cwd` (the vault dir may not exist yet at preflight time).
*/
async assertGitAvailable() {
// Goes through the single `runRaw` primitive like every other invocation.
// `cwd: null` means "do not set a cwd" — the vault dir may not exist yet at
// preflight time, so we must not point git at a missing directory.
const r = await this.runRaw(["--version"], { cwd: null });
if (r.code !== 0) {
const detail = (r.stderr || r.stdout || "").trim();
throw new Error("git binary not found or not runnable — install git (the vault state " +
`store requires it). Underlying error: ${detail}`);
}
}
/**
* Run a git command in the vault and return trimmed stdout. THIN wrapper over
* the single `runRaw` primitive: throws a clear, unified Error (including
* stderr/stdout) on a non-zero exit.
*/
async run(args, opts) {
const r = await this.runRaw(args, opts);
if (r.code !== 0) {
const detail = (r.stderr || r.stdout || "").trim();
throw new Error(`git ${args.join(" ")} failed: ${detail}`);
}
return r.stdout.trim();
}
/**
* The ONE primitive every git invocation in this module flows through. Builds
* the full argv (`--no-pager -c core.quotepath=false <args>`), env, cwd, and
* maxBuffer, runs git, and NEVER throws — it returns the exit info so callers
* can treat a non-zero exit as either an error (`run`) or a meaningful state
* (e.g. a merge conflict, a porcelain diff that "fails" deliberately).
*
* - argv: ALWAYS prepends `--no-pager -c core.quotepath=false`, so git never
* blocks on a pager and always prints verbatim UTF-8 paths (no octal
* escaping/quoting). `quotepath=false` is the baseline for ALL path-
* printing commands (ls-files, diff --name-only, …).
* - cwd: `opts.cwd === null` -> do NOT set cwd (the preflight, where the
* vault dir may not exist); otherwise `opts.cwd ?? this.vaultPath`.
* - env: `vaultGitEnv(opts?.env)` (cwd-isolation + caller extras).
* - On a spawn/exec error we capture the error `message` too, so a failure
* before git could write to stderr (e.g. ENOENT) is NOT lost.
*/
async runRaw(args, opts) {
const cwd = opts?.cwd === null ? undefined : (opts?.cwd ?? this.vaultPath);
try {
const { stdout, stderr } = await execFileAsync("git", ["--no-pager", "-c", "core.quotepath=false", ...args], {
...(cwd !== undefined ? { cwd } : {}),
// Generous buffer: file listings / porcelain output on a large vault
// can be sizable.
maxBuffer: 64 * 1024 * 1024,
env: vaultGitEnv(opts?.env),
});
return { code: 0, stdout, stderr };
}
catch (err) {
const e = err;
return {
code: typeof e.code === "number" ? e.code : 1,
stdout: e.stdout ?? "",
// Preserve the error message when there is no stderr (e.g. a spawn
// failure like ENOENT, where promisified execFile sets stderr to an
// EMPTY STRING — so `||`, not `??`, to fall through to `message`).
stderr: e.stderr || e.message || "",
};
}
}
/**
* Ensure the vault directory exists and is an initialized git repo on `main`
* with an initial (empty) commit so branches exist. Idempotent: safe to call
* on every run. Sets a LOCAL bot identity for the vault repo if none is set
* (so engine commits never fall back to a global/unset identity).
*/
async ensureRepo() {
await mkdir(this.vaultPath, { recursive: true });
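if (!(await this.isRepo())) {
// `git init -b <name>` needs git >= 2.28; older binaries reject `-b` with a
// usage error instead of ignoring it. Fall back to a plain `git init` in
// that case; the `checkout -B` before the initial commit below then puts
// the repo on the default branch.
const init = await this.runRaw(["init", "-b", DEFAULT_BRANCH]);
if (init.code !== 0) {
await this.run(["init"]);
}
}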
// Set a local identity for the vault repo if unset, so engine commits have
// a deterministic committer even on a machine with no global git config.
if (!(await this.hasLocalConfig("user.name"))) {
await this.run(["config", "user.name", BOT_AUTHOR_NAME]);
}
if (!(await this.hasLocalConfig("user.email"))) {
await this.run(["config", "user.email", BOT_AUTHOR_EMAIL]);
}
// Neutralize correctness-affecting git config in the vault's LOCAL config so
// a user's GLOBAL/system config cannot change porcelain BEHAVIOR (not just
// output) and corrupt the vault. The vault is OUR dedicated repo, so LOCAL
// values (which override global/system) are the right scope. Set
// UNCONDITIONALLY every run — idempotent and cheap; `git config <key>`
// writes to `--local` by default inside the repo. These MUST be in place
// before any add/commit/checkout that could be affected, hence they run
// before the initial-commit block below.
// - core.autocrlf=false — CRITICAL (SPEC §11): a global core.autocrlf=true
// would rewrite LF<->CRLF on add/checkout, making our deterministic,
// byte-stable markdown churn and breaking the round-trip invariant.
// `false` guarantees git stores/checks out verbatim bytes.
// - core.safecrlf=false — avoid CRLF-related warnings/aborts on add.
// - commit.gpgsign=false — the headless daemon must never try to GPG-sign
// a commit (would fail/hang; we already set GIT_TERMINAL_PROMPT=0).
// - core.attributesFile=/dev/null — neutralize the user's GLOBAL
// gitattributes so a global clean/smudge filter (filter.<name>.clean)
// cannot rewrite the STORED blob and break §11 byte-stability (a config
// that core.autocrlf=false does not cover). POSIX-only path, which is
// fine: the daemon runs on Linux (Docker) / macOS. A system
// /etc/gitattributes remains the host admin's domain (out of scope).
// NOTE: these stay PERSISTED LOCAL config (not `-c` flags) on purpose — a
// human running git by hand in the vault must inherit the same neutralized
// behavior; a transient `-c` would not persist. (core.quotepath, by
// contrast, only affects OUR parsing of output and so is baked into the
// `runRaw` argv baseline instead.)
try {
await this.run(["config", "core.autocrlf", "false"]);
await this.run(["config", "core.safecrlf", "false"]);
await this.run(["config", "commit.gpgsign", "false"]);
await this.run(["config", "core.attributesFile", "/dev/null"]);
}
catch (err) {
const detail = err instanceof Error ? err.message : String(err);
throw new Error(`failed to pin vault git config (SPEC §11) — ensure ${this.vaultPath}` +
"/.git/config is writable and not locked (e.g. stale config.lock): " +
detail);
}
// Create the initial empty commit on `main` if the repo has no commits yet,
// so both `main` and (later) `docmost` branches have a common base.
if (!(await this.hasAnyCommit())) {
// Make sure we are on the default branch before the first commit (covers
// the older-git case where `init -b` was not honored).
await this.run(["checkout", "-B", DEFAULT_BRANCH]);
await this.commitRaw("init vault", {
authorName: BOT_AUTHOR_NAME,
authorEmail: BOT_AUTHOR_EMAIL,
allowEmpty: true,
});
}
}
/** True if `cwd` is inside a git work-tree (the vault is initialized). */
async isRepo() {
const r = await this.runRaw(["rev-parse", "--is-inside-work-tree"]);
return r.code === 0 && r.stdout.trim() === "true";
}
/** True if a LOCAL git config key is set in the vault repo. */
async hasLocalConfig(key) {
const r = await this.runRaw(["config", "--local", "--get", key]);
return r.code === 0 && r.stdout.trim().length > 0;
}
/** True if the repo has at least one commit (HEAD resolves). */
async hasAnyCommit() {
const r = await this.runRaw(["rev-parse", "--verify", "HEAD"]);
return r.code === 0;
}
/** True if a branch with the given name exists. */
async branchExists(name) {
const r = await this.runRaw([
"rev-parse",
"--verify",
`refs/heads/${name}`,
]);
return r.code === 0;
}
/**
* Create `name` from `fromBranch` if it does not already exist. No-op (and no
* checkout) when the branch is already present.
*/
async ensureBranch(name, fromBranch) {
if (await this.branchExists(name))
return;
await this.run(["branch", name, fromBranch]);
}
/** Name of the currently checked-out branch. */
async currentBranch() {
return this.run(["rev-parse", "--abbrev-ref", "HEAD"]);
}
/** Check out an existing branch. */
async checkout(name) {
await this.run(["checkout", name]);
}
/** Stage everything (adds, modifications, deletions). */
async stageAll() {
await this.run(["add", "-A"]);
}
/**
* True if the vault is mid-merge (an unresolved merge from a previous run,
* SPEC §9 / §12). Detected via a `MERGE_HEAD` ref OR any unmerged
* (conflicted) index entries (`git ls-files -u`). The pull cycle checks this
* BEFORE any checkout so a left-over merge produces a clear, actionable
* message instead of a raw "you need to resolve your current index first"
* failure deep inside `checkout`. This is what makes re-runs converge
* (resumability, SPEC §12).
*/
async isMergeInProgress() {
// MERGE_HEAD exists exactly while a merge is in progress.
const mergeHead = await this.runRaw([
"rev-parse",
"--verify",
"--quiet",
"MERGE_HEAD",
]);
if (mergeHead.code === 0 && mergeHead.stdout.trim().length > 0)
return true;
// Fallback / belt-and-suspenders: any unmerged index entries also mean the
// working tree is mid-conflict and a checkout would refuse.
const unmerged = await this.runRaw(["ls-files", "-u"]);
return unmerged.code === 0 && unmerged.stdout.trim().length > 0;
}
/**
* Commit the currently STAGED changes with an explicit author/committer
* identity and the given trailers appended to the message body (SPEC §7.3
* provenance). Returns `true` if a commit was made, `false` if there was
* nothing to commit (graceful no-op). The caller is expected to have staged
* its changes first (e.g. via `stageAll`).
*/
async commit(message, opts) {
// Nothing staged -> nothing to commit. Treat as a no-op (SPEC §11: a
// deterministic re-pull of unchanged pages produces identical bytes, so
// git sees no diff and we must not error).
const staged = await this.runRaw([
"diff",
"--cached",
"--quiet",
]);
// `diff --cached --quiet` exits 0 when the index matches HEAD (nothing
// staged), 1 when there are staged changes.
if (staged.code === 0)
return false;
await this.commitRaw(message, opts);
return true;
}
/**
* Low-level commit used by both `commit` and `ensureRepo`'s initial commit.
* Builds the full message with appended trailers and sets author + committer
* identity via env vars (so the committer matches the author, not the repo
* default).
*/
async commitRaw(message, opts) {
const fullMessage = buildCommitMessage(message, opts.trailers);
// `--no-verify` skips pre-commit/commit-msg hooks: a global core.hooksPath
// (or any injected hook) must never interfere with engine commits in our
// dedicated vault repo.
const args = ["commit", "--no-verify", "-m", fullMessage];
if (opts.allowEmpty)
args.push("--allow-empty");
// Route through the single `runRaw` primitive, passing the identity env
// vars described above, and throw the same unified message as `run` on a
// non-zero exit.
const r = await this.runRaw(args, {
env: {
GIT_AUTHOR_NAME: opts.authorName,
GIT_AUTHOR_EMAIL: opts.authorEmail,
GIT_COMMITTER_NAME: opts.authorName,
GIT_COMMITTER_EMAIL: opts.authorEmail,
},
});
if (r.code !== 0) {
const detail = (r.stderr || r.stdout || "").trim();
throw new Error(`git ${args.join(" ")} failed: ${detail}`);
}
}
/**
* Merge `fromBranch` into the current branch (`git merge --no-edit`).
* Fast-forwards when possible; performs a real 3-way merge otherwise. Conflict
* state is SURFACED (returned), NOT auto-resolved (SPEC §9): the conflict
* markers are left in the worktree for manual resolution by a later increment,
* and — critically — nothing is pushed to Docmost (we never write to Docmost
* anyway).
*/
async merge(fromBranch) {
const r = await this.runRaw(["merge", "--no-edit", fromBranch]);
const output = `${r.stdout}\n${r.stderr}`.trim();
if (r.code === 0) {
return { ok: true, conflict: false, output };
}
// A non-zero exit on merge most commonly means a conflict. Confirm by
// checking for unmerged paths (porcelain "U" status) so we don't mislabel
// an unrelated failure as a conflict.
const conflict = await this.hasUnmergedPaths();
return { ok: false, conflict, output };
}
/** True if the index has any unmerged (conflicted) paths. */
async hasUnmergedPaths() {
const r = await this.runRaw(["diff", "--name-only", "--diff-filter=U"]);
return r.code === 0 && r.stdout.trim().length > 0;
}
/**
* List tracked files on the current branch (paths relative to the vault
* root, forward-slash separated). An optional glob (a git pathspec) narrows
* the listing, e.g. `"*.md"`.
*
* The target wiki is RUSSIAN, so vault file names routinely contain Cyrillic
* (e.g. `Колонка.md`). With git's DEFAULT `core.quotepath=true`, `ls-files`
* returns non-ASCII paths octal-escaped and double-quoted (`"\320\232..."`),
* which `src/pull.ts` `readExisting` would then parse as garbage paths,
* breaking move/duplicate detection. We defeat that two ways at once:
* - `core.quotepath=false` disables the octal-escape/quoting. It is now the
* `runRaw` argv baseline (prepended to EVERY invocation), so we no longer
* pass it inline here.
* - `-z` emits NUL-delimited RAW UTF-8 paths (no quoting, no newline
* ambiguity), which we split on `\0`.
* We read the RAW stdout (NOT the trimming `run()` helper, which would mangle
* the NUL-delimited bytes) and split on `\0`, dropping empty entries. Paths
* are returned verbatim — git already emits forward slashes.
*/
async listTrackedFiles(glob) {
const r = await this.runRaw(["ls-files", "-z", ...(glob ? [glob] : [])]);
if (r.code !== 0) {
const detail = (r.stderr || r.stdout || "").trim();
throw new Error(`git ls-files failed: ${detail}`);
}
return r.stdout.split("\0").filter((p) => p.length > 0);
}
/**
* Diff two refs with `--name-status -M -z` and parse the NUL-delimited output
* (SPEC §6: the FS→Docmost push direction diffs `main` against
* `refs/docmost/last-pushed`). Rename detection is ON (`-M`), so a moved/renamed
* file is reported as a single `R` row with both its old and new path instead
* of a delete+add pair — that distinction is what lets the push planner tell a
* move from a delete+create (SPEC §8 "Move vs delete").
*
* `-z` makes git emit NUL-delimited RAW UTF-8 records (the Russian wiki has
* Cyrillic file names) with NO quoting/escaping. The record shape differs by
* status:
* - A/M/D: `status\0path\0`
* - R/C: `Rnnn\0oldPath\0newPath\0` (nnn = similarity score, e.g. `R100`)
* We read the RAW stdout (not the trimming `run()` helper, which would mangle
* the NUL bytes), split on `\0`, drop the trailing empty entry, and walk the
* tokens pulling 1 or 2 path tokens per status. Paths are returned verbatim.
*/
async diffNameStatus(fromRef, toRef) {
const r = await this.runRaw([
"diff",
"--name-status",
"-M",
"-z",
fromRef,
toRef,
]);
if (r.code !== 0) {
const detail = (r.stderr || r.stdout || "").trim();
throw new Error(`git diff --name-status failed: ${detail}`);
}
// Tokens alternate: <status> <path...> <status> <path...> ... With `-z`,
// each token (status code AND each path) is its own NUL-delimited field.
const tokens = r.stdout.split("\0").filter((t) => t.length > 0);
const entries = [];
let i = 0;
while (i < tokens.length) {
const raw = tokens[i++];
// The status token is e.g. `A`, `M`, `D`, or `R100` / `C075`. The leading
// letter is the change kind; any trailing digits are the similarity score.
const letter = raw[0];
if (letter === "R" || letter === "C") {
const score = Number.parseInt(raw.slice(1), 10);
const oldPath = tokens[i++];
const path = tokens[i++];
if (oldPath === undefined || path === undefined)
break; // malformed tail
entries.push({
status: letter,
path,
oldPath,
...(Number.isFinite(score) ? { score } : {}),
});
}
else if (letter === "A" || letter === "M" || letter === "D") {
const path = tokens[i++];
if (path === undefined)
break; // malformed tail
entries.push({ status: letter, path });
}
else {
// Unknown/other status (e.g. T type-change, U unmerged) — consume one
// path token defensively so the walk stays aligned, but do not emit it
// (the push planner only handles A/M/D/R/C).
i++;
}
}
return entries;
}
/**
* Resolve a ref/commit-ish to its full SHA, or `null` if it does not exist.
* `rev-parse --verify --quiet` exits non-zero (and prints nothing) for an
* unknown ref, so a non-zero exit maps cleanly to `null`. Used to read
* `refs/docmost/last-pushed` (SPEC §5) — which is absent before the first push.
*/
async revParse(ref) {
const r = await this.runRaw(["rev-parse", "--verify", "--quiet", ref]);
if (r.code !== 0)
return null;
const sha = r.stdout.trim();
return sha.length > 0 ? sha : null;
}
/**
* Read a ref to its SHA, or `null` if unset. Thin alias over `revParse`,
* named for the push direction's marker `refs/docmost/last-pushed` (SPEC §5:
* "what from `main` is already reflected in Docmost").
*/
async readRef(ref) {
return this.revParse(ref);
}
/**
* Point `ref` at `target` (`git update-ref <ref> <target>`). Used to advance
* `refs/docmost/last-pushed` to the just-pushed `main` commit after a push
* (SPEC §6 step 3 / §5). `target` may be a SHA or any commit-ish git accepts.
*/
async updateRef(ref, target) {
await this.run(["update-ref", ref, target]);
}
/**
* Fast-forward `branch` to `toCommit` — but ONLY if it is a TRUE fast-forward,
* i.e. the current `branch` tip is an ancestor of `toCommit` (verified via
* `git merge-base --is-ancestor <branch> <toCommit>`). Used to advance the
* `docmost` mirror branch after a clean push (SPEC §6 step 3 / §10): once a
* push succeeds, Docmost already contains the pushed `main` content, so the
* mirror must reflect it; otherwise the NEXT pull would see our own write as
* a remote change and re-pull it (the loop that the loop-guard prevents).
*
* SAFETY — never force, never clobber divergent history:
* - If `branch` IS an ancestor of `toCommit`, advance it with
* `git update-ref refs/heads/<branch> <toCommit>`. The `docmost` branch is
* NOT checked out during a push (push works on `main`), so updating the ref
* directly is safe and avoids any working-tree touch.
* - If `branch` is NOT an ancestor (divergent / would-be non-fast-forward),
* do NOT move it — return `{ ok: false, reason: 'not-fast-forward' }` and
* let the caller log it. We must never overwrite a `docmost` history that
* has commits the push base does not contain.
*
* Returns `{ ok: true }` when the branch was advanced (or already at
* `toCommit`, a degenerate fast-forward), `{ ok: false, reason }` otherwise.
* A missing `branch` or `toCommit` also yields `{ ok: false }` with a reason.
*/
async fastForwardBranch(branch, toCommit) {
const branchRef = `refs/heads/${branch}`;
// Resolve both endpoints first so a missing ref is a clean refusal, not a
// confusing `merge-base` failure.
const branchSha = await this.revParse(branchRef);
if (branchSha === null) {
return { ok: false, reason: `branch ${branch} does not exist` };
}
const targetSha = await this.revParse(toCommit);
if (targetSha === null) {
return { ok: false, reason: `target ${toCommit} does not resolve` };
}
// Already at the target -> a no-op fast-forward (still ok).
if (branchSha === targetSha)
return { ok: true };
// `merge-base --is-ancestor A B` exits 0 iff A is an ancestor of B. Only a
// true ancestor is a fast-forward; anything else is divergent and refused.
const ancestor = await this.runRaw([
"merge-base",
"--is-ancestor",
branchSha,
targetSha,
]);
if (ancestor.code !== 0) {
return { ok: false, reason: "not-fast-forward" };
}
// Safe to advance: the branch is not checked out during push, so a direct
// ref update avoids a checkout/working-tree touch.
await this.updateRef(branchRef, targetSha);
return { ok: true };
}
/**
* Read a file's content at a specific ref (`git show <ref>:<path>`), or `null`
* if the path does not exist there. Used by the push direction to read the
* PRE-IMAGE of a DELETED file (e.g. at `refs/docmost/last-pushed`) so its
* `docmost:meta` — and therefore its `pageId` — can be recovered to translate
* the deletion into a `delete_page` (SPEC §6/§8: only TRACKED files, i.e. ones
* that had a pageId, are deleted in Docmost). A non-zero exit (path absent at
* that ref) maps to `null` rather than throwing.
*/
async showFileAtRef(ref, path) {
// `git show <ref>:<path>` requires the path relative to the repo root; pass
// it verbatim (forward-slash, matching `listTrackedFiles` / diff output).
const r = await this.runRaw(["show", `${ref}:${path}`]);
if (r.code !== 0)
return null;
return r.stdout;
}
}
/**
* Build the environment for a vault git invocation (SPEC §12 cwd-isolation).
* Used by the single `runRaw` primitive every git command flows through, so
* these pins apply uniformly (including the `git --version` preflight).
*
* cwd-isolation is this module's central safety guarantee: every git command
* MUST operate on the vault repo at `cwd: vaultPath` and nothing else. An
* inherited `GIT_DIR` / `GIT_WORK_TREE` in `process.env` would silently
* redirect the operation away from `cwd` (e.g. to the source repo or another
* checkout), defeating that guarantee. So we always strip them, regardless of
* whatever else the caller adds (author/committer identity, etc.).
*
* Exported for unit testing.
*/
export function vaultGitEnv(extra) {
const env = {
...process.env,
// Locale-independent output (defense in depth). We never parse localized
// prose, but pinning the locale prevents a future regression where some
// git message we DO key on is translated by an inherited LC_ALL/LANG.
LC_ALL: "C",
LANG: "C",
// Never page (we already pass --no-pager, but a stray GIT_PAGER could still
// bite) and never block on an interactive prompt (e.g. credentials) — the
// daemon runs unattended and must not hang.
GIT_PAGER: "cat",
GIT_TERMINAL_PROMPT: "0",
...extra,
};
delete env.GIT_DIR;
delete env.GIT_WORK_TREE;
return env;
}
/**
* Build a commit message body with trailer lines appended (SPEC §7.3). The
* trailers are separated from the subject by a blank line so
* `git interpret-trailers` / `git log --format=%(trailers)` parse them as trailers.
* Exported for unit testing.
*/
export function buildCommitMessage(subject, trailers) {
if (!trailers || trailers.length === 0)
return subject;
return `${subject}\n\n${trailers.join("\n")}`;
}
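
The record shape that `diffNameStatus` walks can be illustrated without running git. What follows is a minimal standalone sketch, not the module's code: `parseNameStatusZ` and the sample paths are hypothetical, and the sample string only stands in for the raw stdout of `git diff --name-status -M -z`. Unlike `diffNameStatus`, the sketch emits unknown status letters instead of skipping them.

```javascript
// Hypothetical standalone helper mirroring the token walk in diffNameStatus.
function parseNameStatusZ(stdout) {
  // With -z every field (the status code and each path) is NUL-terminated.
  const tokens = stdout.split("\0").filter((t) => t.length > 0);
  const entries = [];
  let i = 0;
  while (i < tokens.length) {
    const raw = tokens[i++];
    const letter = raw[0];
    if (letter === "R" || letter === "C") {
      // Rename/copy: the status carries a similarity score, then TWO paths.
      const oldPath = tokens[i++];
      const path = tokens[i++];
      if (oldPath === undefined || path === undefined) break; // malformed tail
      const score = Number.parseInt(raw.slice(1), 10);
      entries.push({ status: letter, oldPath, path, score });
    } else {
      // A/M/D (and, in this sketch only, any other letter): ONE path.
      const path = tokens[i++];
      if (path === undefined) break; // malformed tail
      entries.push({ status: letter, path });
    }
  }
  return entries;
}

// Made-up sample: one modify, one rename with a Cyrillic name, one delete.
const sample =
  "M\0docs/a.md\0R100\0old/Колонка.md\0new/Колонка.md\0D\0gone.md\0";
console.log(parseNameStatusZ(sample));
```

The rename row is the reason the walk cannot simply pair tokens two at a time: an `R`/`C` status consumes three fields, every other status consumes two.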

@@ -0,0 +1,44 @@
/**
* Pure page-tree -> vault path mapping (SPEC §12).
*
* Given the flat list of page nodes for a space (as returned by
* `listAllSpacePages`), compute for every page a deterministic, collision-free
* destination: a folder path (root -> leaf ancestors) plus a file stem (the
* page's own name, no extension). This module is intentionally PURE and
* dependency-free apart from the sanitization helpers, so the whole tree ->
* path logic is unit-testable without any I/O. The names are COSMETIC; identity
* lives in each file's meta block (pageId / slugId).
*/
/** Flat page node as returned by `listAllSpacePages` (no content). */
export interface PageNode {
id: string;
title?: string;
slugId?: string;
parentPageId?: string | null;
hasChildren?: boolean;
}
/** A page's resolved vault destination: folder path + file stem. */
export interface VaultEntry {
/** Folder path, root -> leaf (the page's ancestors). Empty for a root page. */
segments: string[];
/** The page's own file name without extension. */
stem: string;
}
/**
* Build the full vault layout for a space.
*
* Returns a Map keyed by pageId -> `{ segments, stem }`. The result is
* deterministic for a given input and guarantees every full destination path
* (`[...segments, stem].join("/")`) is unique, so no page can silently overwrite
* another.
*
* Disambiguation is layered:
* 1. Sibling collisions (same sanitized title under the same parent) are
* resolved with a stable ` ~<slugId>` suffix (the suffix is itself
* sanitized, since slugId/id is untrusted data that must never inject a
* path separator).
* 2. A final full-path pass catches residual collisions that sibling-scoping
* cannot see — e.g. two pages whose parents are BOTH outside the input set
* both bucket at the root with `segments: []`.
*/
export declare function buildVaultLayout(pages: PageNode[]): Map<string, VaultEntry>;

@@ -0,0 +1,170 @@
/**
* Pure page-tree -> vault path mapping (SPEC §12).
*
* Given the flat list of page nodes for a space (as returned by
* `listAllSpacePages`), compute for every page a deterministic, collision-free
* destination: a folder path (root -> leaf ancestors) plus a file stem (the
* page's own name, no extension). This module is intentionally PURE and
* dependency-free apart from the sanitization helpers, so the whole tree ->
* path logic is unit-testable without any I/O. The names are COSMETIC; identity
* lives in each file's meta block (pageId / slugId).
*/
import { sanitizeTitle, disambiguate } from "./sanitize.js";
/**
* Build the full vault layout for a space.
*
* Returns a Map keyed by pageId -> `{ segments, stem }`. The result is
* deterministic for a given input and guarantees every full destination path
* (`[...segments, stem].join("/")`) is unique, so no page can silently overwrite
* another.
*
* Disambiguation is layered:
* 1. Sibling collisions (same sanitized title under the same parent) are
* resolved with a stable ` ~<slugId>` suffix (the suffix is itself
* sanitized, since slugId/id is untrusted data that must never inject a
* path separator).
* 2. A final full-path pass catches residual collisions that sibling-scoping
* cannot see — e.g. two pages whose parents are BOTH outside the input set
* both bucket at the root with `segments: []`.
*/
export function buildVaultLayout(pages) {
// Index pages by id so the parent chain can be walked. Guard against
// duplicate ids in the input (first one wins).
const byId = new Map();
for (const p of pages) {
if (p && p.id && !byId.has(p.id))
byId.set(p.id, p);
}
// Resolve each node's display name once, deterministically, tracking sibling
// collisions per parent. `usedBySibling` maps a parent key -> set of names
// already taken under that parent. The bucket key is the node's parent ONLY
// when that parent is actually present in `byId`; otherwise (null parent, or
// an orphan whose parent is outside the input set) the node buckets at
// `"__root__"`. This is critical: orphans land at the vault root (see
// `folderSegmentsFor`), so they MUST share the root bucket with real root
// pages to be disambiguated against each other here — making `nameById` final
// before any `segments` are computed, so no ancestor name can drift later.
const usedBySibling = new Map();
const nameById = new Map();
for (const p of pages) {
if (p && p.id && !nameById.has(p.id)) {
const parentKey = p.parentPageId && byId.has(p.parentPageId) ? p.parentPageId : "__root__";
nameById.set(p.id, nameForNode(p, parentKey, usedBySibling));
}
}
// Every id we index above MUST get a resolved name; this helper returns it
// and THROWS if it is somehow absent, rather than silently recomputing a
// DIFFERENT, non-disambiguated name (which would desync a folder segment from
// its target file).
const nameOf = (id) => {
const name = nameById.get(id);
if (name === undefined) {
throw new Error(`buildVaultLayout: no resolved name for page id ${id}`);
}
return name;
};
// Build the folder path for a page by walking parentPageId to the root. The
// page's OWN name is the file stem; its ancestors become folders. A `visited`
// guard prevents an infinite loop on a malformed parent cycle.
const folderSegmentsFor = (node) => {
const ancestors = [];
const visited = new Set();
let current = node.parentPageId
? byId.get(node.parentPageId)
: undefined;
while (current && current.id && !visited.has(current.id)) {
visited.add(current.id);
ancestors.unshift(nameOf(current.id));
current = current.parentPageId
? byId.get(current.parentPageId)
: undefined;
}
return ancestors;
};
// First pass: compute the provisional { segments, stem } for every node.
const layout = new Map();
for (const p of pages) {
if (!p || !p.id || layout.has(p.id))
continue;
layout.set(p.id, {
segments: folderSegmentsFor(p),
stem: nameOf(p.id),
});
}
// FOLDER-NOTE transform (native-Obsidian layout): a page WITH CHILDREN lives at
// `<…>/<stem>/<stem>.md` — its body is the folder-note INSIDE its own folder
// (LostPaul Folder Notes convention), and its children sit alongside it in that
// folder. A leaf stays `<…>/<stem>.md`. Children's segments already point into
// the parent's folder (folderSegmentsFor walks ancestor NAMES), so only the
// parent's own file relocates here; the sibling name pass above already made
// the parent name unique, so folder == file name stays consistent.
for (const p of pages) {
if (!p || !p.id)
continue;
const entry = layout.get(p.id);
if (entry && p.hasChildren) {
entry.segments = [...entry.segments, entry.stem];
}
}
// Final full-path uniqueness pass — a belt-and-suspenders safety net. Note
// that cross-bucket (orphan/root) collisions are now resolved in the name pass
// above (orphans share the "__root__" bucket), so ancestor names are final
// before `segments` are built and this pass should rarely/never re-stem an
// ancestor. It only re-stems the colliding LATER leaf via the sanitized
// slugId/id, then (if still colliding) appends the id.
//
// Process FOLDER-NOTES (pages with children) FIRST so a parent claims its
// canonical `<name>/<name>.md` before a same-named CHILD — the child (a leaf)
// is the one that disambiguates, never the folder-note.
const usedPaths = new Set();
const seenIds = new Set();
const pathKey = (e) => [...e.segments, e.stem].join("/");
const ordered = pages
.filter((p) => Boolean(p && p.id))
.sort((a, b) => Number(Boolean(b.hasChildren)) - Number(Boolean(a.hasChildren)));
for (const p of ordered) {
if (seenIds.has(p.id))
continue;
seenIds.add(p.id);
const entry = layout.get(p.id);
if (!entry)
continue;
if (usedPaths.has(pathKey(entry))) {
// First attempt: disambiguate the stem with the sanitized slugId (or id).
entry.stem = disambiguate(entry.stem, sanitizeTitle(p.slugId ?? p.id));
if (usedPaths.has(pathKey(entry))) {
// Still colliding: append the (sanitized) id as a last resort. The id
// is globally unique, so this always resolves the collision.
entry.stem = disambiguate(entry.stem, sanitizeTitle(p.id));
}
}
usedPaths.add(pathKey(entry));
}
return layout;
}
/**
* Compute a deterministic, collision-free name for a node among its SIBLINGS.
* `usedBySibling` maps a parent key -> set of names already taken, so two
* siblings that sanitize to the same name get a stable ` ~slugId` suffix
* (SPEC §12). The suffix is itself passed through `sanitizeTitle`, because the
* slugId/id is a second untrusted-data channel that must never leak a path
* separator into the name. `parentKey` is supplied by the caller (it resolves
* to `"__root__"` for root pages AND for orphans whose parent is outside the
* input set, so they share one bucket). The name is COSMETIC; identity lives in
* the meta block.
*/
function nameForNode(node, parentKey, usedBySibling) {
let used = usedBySibling.get(parentKey);
if (!used) {
used = new Set();
usedBySibling.set(parentKey, used);
}
let name = sanitizeTitle(node.title ?? "");
if (used.has(name)) {
// Sibling collision: disambiguate with the stable, sanitized slugId (fall
// back to the sanitized pageId if no slugId is present).
name = disambiguate(name, sanitizeTitle(node.slugId ?? node.id));
}
used.add(name);
return name;
}
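
The sibling-bucket rule described above can be sketched in isolation. This is a minimal illustration, not the engine's code: `sanitizeTitle` and `disambiguate` below are hypothetical stand-ins (their real implementations are not shown in this diff), and the ` ~slugId` suffix shape is taken from the doc comment.

```javascript
// Hypothetical stand-ins: the real helpers live elsewhere in layout.js.
const sanitizeTitle = (s) => s.replace(/[\/\\]/g, "-").trim() || "untitled";
const disambiguate = (name, suffix) => `${name} ~${suffix}`;

// Same shape as nameForNode above: one Set of taken names per parent bucket.
function nameForNode(node, parentKey, usedBySibling) {
  let used = usedBySibling.get(parentKey);
  if (!used) {
    used = new Set();
    usedBySibling.set(parentKey, used);
  }
  let name = sanitizeTitle(node.title ?? "");
  if (used.has(name)) {
    // The suffix is sanitized too, so a "/" in a slugId cannot create a folder.
    name = disambiguate(name, sanitizeTitle(node.slugId ?? node.id));
  }
  used.add(name);
  return name;
}

const usedBySibling = new Map();
const first = nameForNode({ id: "p1", slugId: "aaa", title: "Notes" }, "__root__", usedBySibling);
const second = nameForNode({ id: "p2", slugId: "b/b", title: "Notes" }, "__root__", usedBySibling);
const other = nameForNode({ id: "p3", slugId: "ccc", title: "Notes" }, "parent-1", usedBySibling);
console.log([first, second, other]);
```

Only the second root-level "Notes" gets a suffix; the page under `parent-1` lives in a different bucket and keeps the plain name.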


@@ -0,0 +1,13 @@
/**
* Stable hash of a page's markdown BODY (SPEC §10 "body hash"). Deterministic:
* the same input string always yields the same digest, and a different input
* yields a different one (barring a SHA-256 collision). Used to recognize our
* own write later (loop suppression).
*
* We hash the body STRING as-is (UTF-8) with SHA-256 and return lowercase hex.
* SPEC §10 keys on the body hash rather than file bytes; callers decide WHAT
* counts as "the body" (here it is the exact string passed in — typically the
* self-contained markdown that was pushed). No normalization is applied: the
* caller is responsible for passing a canonical/stable representation if it
* wants hash equality across cosmetic-only differences.
*/
export declare function bodyHash(markdownBody: string): string;


@@ -0,0 +1,28 @@
/**
* Loop-guard primitives (SPEC §10). The sync engine must never re-pull its OWN
* write as if it were a remote edit: after a push, the next poll will see the
* page it just wrote with a fresh `updatedAt`. To suppress that, we key on two
* signals — the body HASH of what we pushed (this module) and the `updatedAt`
* returned by the write — recorded per page at push time.
*
* This module owns the PURE, deterministic body-hash. The CONSUMPTION on the
* pull side (comparing an incoming page's body hash against the last pushed hash
* to decide "this is our own write, ignore it") is a future increment — here we
* only PRODUCE the hash and the per-page push record (see `src/push.ts`).
*/
import { createHash } from "node:crypto";
/**
* Stable hash of a page's markdown BODY (SPEC §10 "body hash"). Deterministic:
* the same input string always yields the same digest, and a different input
* yields a different one (barring a SHA-256 collision). Used to recognize our
* own write later (loop suppression).
*
* We hash the body STRING as-is (UTF-8) with SHA-256 and return lowercase hex.
* SPEC §10 keys on the body hash rather than file bytes; callers decide WHAT
* counts as "the body" (here it is the exact string passed in — typically the
* self-contained markdown that was pushed). No normalization is applied: the
* caller is responsible for passing a canonical/stable representation if it
* wants hash equality across cosmetic-only differences.
*/
export function bodyHash(markdownBody) {
return createHash("sha256").update(markdownBody, "utf8").digest("hex");
}
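
Because no normalization is applied, hash equality is byte-exact on the string passed in. The sketch below shows what that means for a caller. `canonicalize` is a hypothetical helper (not part of this module) showing one way a caller might supply a stable representation. The snippet uses CommonJS `require` only so it runs as a standalone script.

```javascript
const { createHash } = require("node:crypto");

// Same one-liner as the module above.
function bodyHash(markdownBody) {
  return createHash("sha256").update(markdownBody, "utf8").digest("hex");
}

// Hypothetical caller-side canonicalizer: LF line endings, one trailing newline.
const canonicalize = (body) => body.replace(/\r\n/g, "\n").replace(/\s+$/, "") + "\n";

const pushed = "# Title\n\nBody text\n";
const pulledSame = "# Title\n\nBody text\n";
const pulledCrlf = "# Title\r\n\r\nBody text\r\n";

// Identical strings hash identically.
const sameHash = bodyHash(pushed) === bodyHash(pulledSame);
// A cosmetic-only difference (CRLF) changes the raw hash.
const rawCrlfMatches = bodyHash(pushed) === bodyHash(pulledCrlf);
// Canonicalizing both sides first restores equality.
const canonicalMatches =
  bodyHash(canonicalize(pushed)) === bodyHash(canonicalize(pulledCrlf));

console.log({ sameHash, rawCrlfMatches, canonicalMatches });
```

Whether line endings should count as "cosmetic" is the caller's decision; the module itself stays neutral.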

packages/git-sync/build/engine/pull.d.ts vendored Normal file

@@ -0,0 +1,136 @@
import type { GitSyncClient } from "./client.types.js";
import { type PageNode } from "./layout.js";
import { VaultGit } from "./git.js";
import { type MovedEntry, type DeletionDecision } from "./reconcile.js";
/**
* Injectable IO for `readExisting` (R-Pull-1, test-strategy report §5). The real
* `main` wires these to `git.listTrackedFiles("*.md")` and an `fs.readFile`
* rooted at the vault; tests pass fakes so the parsing/skip rules are unit-
* testable without a real git repo or filesystem.
*/
export interface ReadExistingDeps {
/** List tracked .md paths (forward-slash, vault-relative). */
listTracked: () => Promise<string[]>;
/** Read a tracked file's text by its (forward-slash) vault-relative path. */
readFile: (relPath: string) => Promise<string>;
}
/**
* Read every tracked .md file in the vault and recover `{ pageId, relPath }` from
* its `gitmost_id` frontmatter (native-Obsidian format). Files without a
* `gitmost_id` are skipped (they are not engine-tracked pages yet — e.g. a stray
* hand-written Obsidian file; PUSH adopts those separately).
*
* The IO is injected (R-Pull-1) so this is testable with fakes. Skip rules:
* - a `readFile` rejection (tracked but missing on disk, a mid-operation race)
* -> skipped, NOT thrown; the next pull converges;
* - no `gitmost_id` frontmatter (`parsePageFile` -> id null) -> skipped.
*/
export declare function readExisting(deps: ReadExistingDeps): Promise<{
pageId: string;
relPath: string;
}[]>;
/**
* Input to the PURE `computePullActions` (R-Pull-2). All data, no IO: the live
* tree nodes + completeness flag (from `listSpaceTree`) and the parsed
* `existing` tracked files (from `readExisting`).
*/
export interface PullActionsInput {
/** Live page nodes for the space (from `listSpaceTree`). */
pages: PageNode[];
/** Whether the live tree fetch was COMPLETE (SPEC §8 suppression). */
treeComplete: boolean;
/** Parsed tracked files: `{ pageId, relPath }` (from `readExisting`). */
existing: {
pageId: string;
relPath: string;
}[];
}
/**
* The PURE decisions object computed by `computePullActions` (no IO). It holds
* the reconciliation plan plus the SPEC §8 absence-deletion decision, with the
* suppression already folded in: `toDelete` is the POST-suppression set the
* caller should actually remove (empty when `deletionDecision.apply` is false).
*/
export interface PullActions {
/** Pages to (re)write at their relPath (add + update + move target). */
toWrite: {
pageId: string;
relPath: string;
}[];
/** Moves: write new path, then remove old path (only on a successful write). */
moved: MovedEntry[];
/**
* Absence-based paths to delete AFTER suppression. Empty when the decision
* suppressed deletions this cycle, so the caller can apply it unconditionally.
*/
toDelete: string[];
/** Why absence deletions were (or were not) applied (for logging + tests). */
deletionDecision: DeletionDecision;
/** Tracked-file count (for the suppression log messages). */
existingCount: number;
/** Planned absence-delete count BEFORE suppression (for the log message). */
plannedDeleteCount: number;
}
/**
* PURE pull-action planner (R-Pull-2, test-strategy report §5). Takes the live
* tree nodes + completeness + existing tracked files and returns the full set of
* decisions with NO IO:
*
* - builds the vault layout (deterministic relPath per live page),
* - `planReconciliation` -> toWrite / moved / absence-toDelete,
* - `decideAbsenceDeletions` -> the SPEC §8 suppression (incomplete-fetch +
* empty-live + mass-delete guard), folded IN here so `toDelete` is the
* POST-suppression set (empty when suppressed).
*
* Moves are NOT governed by the suppression: a moved page is present in `live`,
* so its old-path removal is real (the caller still gates it on the write
* succeeding). The expensive content fetch / file write / git ops happen in the
* thin `applyPullActions`.
*/
export declare function computePullActions(input: PullActionsInput): PullActions;
/**
* Injectable IO for `applyPullActions` (R-Pull-2). The real `main` wires these
* to the live client, the vault git wrapper, and `node:fs/promises`; tests pass
* fakes that RECORD calls so the ordering + the move-on-success data-loss guard
* are testable without real git/fs/network.
*/
export interface ApplyPullActionsDeps {
client: Pick<GitSyncClient, "getPageJson">;
git: Pick<VaultGit, "stageAll" | "commit" | "checkout" | "merge">;
/** Write a file by ABSOLUTE path (mkdir of the parent is done internally). */
writeFile: (absPath: string, text: string) => Promise<void>;
/** Recursive mkdir of an ABSOLUTE directory path. */
mkdir: (absDir: string) => Promise<void>;
/** Remove a file by ABSOLUTE path (force: a missing file is a no-op). */
rm: (absPath: string) => Promise<void>;
}
/** Outcome counters from `applyPullActions` (for the summary + tests). */
export interface ApplyResult {
written: number;
movedApplied: number;
deleted: number;
failed: number;
committed: boolean;
merge: {
ok: boolean;
conflict: boolean;
output: string;
};
}
/**
* THIN IO applier (R-Pull-2). Performs the side effects in the EXACT current
* order, with all the original safety guards preserved bit-for-bit:
*
* 1. for each `toWrite`: fetch content (`client.getPageJson`) -> stabilize
* (normalize-on-write fixpoint, SPEC §11) -> mkdir + write. One bad page
* never aborts the pull (bounded-concurrency pool, fault-tolerant).
* 2. apply MOVE old-path removals — ONLY when the planner marked the old path
* removable AND the new-path write SUCCEEDED (the ⭐ data-loss guard: a
* failed move-write keeps the old path so the page never vanishes).
* 3. apply (post-suppression) absence deletes.
* 4. stageAll + commit on `docmost` (subject from ACTUAL written/deleted
* counts) + checkout main + merge docmost (conflicts surfaced, SPEC §9).
*
* `vaultRoot` roots the relPath -> absolute-path conversion for the fs deps.
*/
export declare function applyPullActions(deps: ApplyPullActionsDeps, actions: PullActions, vaultRoot: string): Promise<ApplyResult>;
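
The injected-IO design of `readExisting` can be exercised with plain fakes. The sketch below is an assumption-laden illustration: `parsePageFile` here is a hypothetical minimal parser (the real one lives in `../lib/page-file.js` and is not shown), and the `---`-delimited frontmatter layout is assumed.

```javascript
// Hypothetical minimal parser: reads `gitmost_id` from a leading frontmatter block.
function parsePageFile(text) {
  const m = /^---\n([\s\S]*?)\n---\n?/.exec(text);
  if (!m) return { id: null };
  const idLine = /^gitmost_id:\s*(\S+)\s*$/m.exec(m[1]);
  return { id: idLine ? idLine[1] : null };
}

// Simplified readExisting with the two skip rules from the doc comment.
async function readExisting(deps) {
  const existing = [];
  for (const relPath of await deps.listTracked()) {
    let text;
    try {
      text = await deps.readFile(relPath);
    } catch {
      continue; // tracked but unreadable: skipped, not thrown
    }
    const { id } = parsePageFile(text);
    if (id) existing.push({ pageId: id, relPath }); // no gitmost_id: skipped
  }
  return existing;
}

// Fakes: one tracked page, one hand-written file, one file missing on disk.
const files = new Map([
  ["Notes/Notes.md", "---\ngitmost_id: page-1\n---\n# Notes\n"],
  ["stray.md", "# hand-written, no frontmatter\n"],
]);
const fakeDeps = {
  listTracked: async () => ["Notes/Notes.md", "stray.md", "gone.md"],
  readFile: async (rel) => {
    if (!files.has(rel)) throw new Error(`ENOENT: ${rel}`);
    return files.get(rel);
  },
};

const resultPromise = readExisting(fakeDeps);
resultPromise.then((result) => console.log(result));
```

Only the file carrying a `gitmost_id` survives; the stray file and the missing file are dropped without an error.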


@@ -0,0 +1,284 @@
/**
* Pull cycle: Docmost -> vault (SPEC §6 "Docmost -> FS").
*
* This increment turns the read-only mirror into the git-backed pull cycle:
*
* 1. ensureRepo(vault); refuse if a merge is in progress (SPEC §9/§12);
* ensureBranch("docmost", "main") (SPEC §5 branches)
* 2. checkout docmost
* 3. fetch the live tree (listSpaceTree -> {pages, complete}) -> compute the
* desired `live` files (relPath via the pure sanitize/disambiguation layout)
* 4. parse `existing` tracked .md files (pageId + relPath from gitmost_id frontmatter)
* 5. plan = planReconciliation(live, existing) (pure, SPEC §5/§8); toDelete
* is absence-only, moves are separate
* 6. decideAbsenceDeletions: SUPPRESS absence deletions on an incomplete tree
* fetch (SPEC §8) and behind the mass-delete guard (defense in depth)
* 7. write each live page in its fixpoint form (normalize-on-write, SPEC §11);
* apply moved-old-path removals (only when the move write SUCCEEDED) and
* absence-delete removals (only when the decision allowed them)
* 8. stageAll + commit on `docmost` with the provenance trailer (SPEC §7.3)
* 9. checkout main + merge docmost (conflicts are surfaced, NOT auto-resolved,
* SPEC §9); push is deferred (SPEC §7)
* 10. one-line summary
*
* DIRECTION IS Docmost -> vault ONLY. Nothing here ever writes to Docmost
* (read-only: listSpaceTree + getPageJson). All git operations run against
* the vault repo (`cwd = vaultPath`), never the source repo (see ./git.ts).
*
* The client seam is the native `GitSyncClient` (`Pick<GitSyncClient, ...>`);
* the gitmost server drives the engine in-process (there is no standalone CLI
* entry point).
*/
import { dirname, sep } from "node:path";
import { parsePageFile, serializePageFile } from "../lib/page-file.js";
import { buildVaultLayout } from "./layout.js";
import { BOT_AUTHOR_NAME, BOT_AUTHOR_EMAIL, DEFAULT_BRANCH, } from "./git.js";
import { planReconciliation, decideAbsenceDeletions, } from "./reconcile.js";
import { stabilizePageBody } from "./stabilize.js";
// Engine-only mirror branch (SPEC §5): the engine writes here, humans never do.
const DOCMOST_BRANCH = "docmost";
// Machine-readable provenance the loop-guard keys on (SPEC §7.3 / §12).
const SOURCE_TRAILER = "Docmost-Sync-Source: docmost";
// Number of pages fetched/stabilized concurrently. Bounded so a large space
// does not open thousands of simultaneous requests/conversions at once.
const CONCURRENCY = 6;
// How often to log incremental progress (every N completed pages).
const PROGRESS_EVERY = 25;
/** Convert a vault-relative path (forward-slash) to an absolute FS path. */
function relToAbs(vaultRoot, relPath) {
return [vaultRoot, ...relPath.split("/")].join("/");
}
/** Build a vault-relative path (forward-slash) from layout segments plus the file stem. */
function segmentsToRelPath(segments, stem) {
return [...segments, `${stem}.md`].join("/");
}
/**
* Read every tracked .md file in the vault and recover `{ pageId, relPath }` from
* its `gitmost_id` frontmatter (native-Obsidian format). Files without a
* `gitmost_id` are skipped (they are not engine-tracked pages yet — e.g. a stray
* hand-written Obsidian file; PUSH adopts those separately).
*
* The IO is injected (R-Pull-1) so this is testable with fakes. Skip rules:
* - a `readFile` rejection (tracked but missing on disk, a mid-operation race)
* -> skipped, NOT thrown; the next pull converges;
* - no `gitmost_id` frontmatter (`parsePageFile` -> id null) -> skipped.
*/
export async function readExisting(deps) {
const tracked = await deps.listTracked();
const existing = [];
for (const relPath of tracked) {
// git ls-files always emits forward-slash paths; normalize just in case.
const rel = relPath.split(sep).join("/");
let text;
try {
text = await deps.readFile(rel);
}
catch {
// Tracked but missing on disk (mid-operation race) — skip; the next pull
// converges.
continue;
}
const { id } = parsePageFile(text);
if (id)
existing.push({ pageId: id, relPath: rel });
}
return existing;
}
/**
* PURE pull-action planner (R-Pull-2, test-strategy report §5). Takes the live
* tree nodes + completeness + existing tracked files and returns the full set of
* decisions with NO IO:
*
* - builds the vault layout (deterministic relPath per live page),
* - `planReconciliation` -> toWrite / moved / absence-toDelete,
* - `decideAbsenceDeletions` -> the SPEC §8 suppression (incomplete-fetch +
* empty-live + mass-delete guard), folded IN here so `toDelete` is the
* POST-suppression set (empty when suppressed).
*
* Moves are NOT governed by the suppression: a moved page is present in `live`,
* so its old-path removal is real (the caller still gates it on the write
* succeeding). The expensive content fetch / file write / git ops happen in the
* thin `applyPullActions`.
*/
export function computePullActions(input) {
const { pages, treeComplete, existing } = input;
const layout = buildVaultLayout(pages);
const live = [];
for (const p of pages) {
if (!p || !p.id)
continue;
const entry = layout.get(p.id);
if (!entry)
continue;
live.push({
pageId: p.id,
relPath: segmentsToRelPath(entry.segments, entry.stem),
});
}
// Plan reconciliation (pure). `plan.toDelete` is ABSENCE-based only;
// `plan.moved` carries move old-path removals separately.
const plan = planReconciliation(live, existing);
// Decide whether the ABSENCE-based deletions may be applied this cycle
// (SPEC §8): incomplete-fetch suppression + empty-live + mass-delete guard.
// Moves are NOT governed by this.
const deletionDecision = decideAbsenceDeletions({
treeComplete,
liveCount: live.length,
existingCount: existing.length,
deleteCount: plan.toDelete.length,
});
return {
toWrite: plan.toWrite,
moved: plan.moved,
// Fold the suppression in: a suppressed cycle deletes nothing.
toDelete: deletionDecision.apply ? plan.toDelete : [],
deletionDecision,
existingCount: existing.length,
plannedDeleteCount: plan.toDelete.length,
};
}
/**
* THIN IO applier (R-Pull-2). Performs the side effects in the EXACT current
* order, with all the original safety guards preserved bit-for-bit:
*
* 1. for each `toWrite`: fetch content (`client.getPageJson`) -> stabilize
* (normalize-on-write fixpoint, SPEC §11) -> mkdir + write. One bad page
* never aborts the pull (bounded-concurrency pool, fault-tolerant).
* 2. apply MOVE old-path removals — ONLY when the planner marked the old path
* removable AND the new-path write SUCCEEDED (the ⭐ data-loss guard: a
* failed move-write keeps the old path so the page never vanishes).
* 3. apply (post-suppression) absence deletes.
* 4. stageAll + commit on `docmost` (subject from ACTUAL written/deleted
* counts) + checkout main + merge docmost (conflicts surfaced, SPEC §9).
*
* `vaultRoot` roots the relPath -> absolute-path conversion for the fs deps.
*/
export async function applyPullActions(deps, actions, vaultRoot) {
const { client, git } = deps;
// Emit the SPEC §8 suppression warnings (preserved from the original `main`).
const decision = actions.deletionDecision;
if (!decision.apply) {
if (decision.reason === "incomplete-fetch") {
console.warn("pull: tree fetch incomplete — deletions suppressed this cycle (SPEC §8)");
}
else if (decision.reason === "empty-live") {
console.warn(`pull: live fetch returned 0 pages but ${actions.existingCount} file(s) are ` +
`tracked — deletions suppressed this cycle (SPEC §8). Re-run when ` +
`Docmost is reachable.`);
}
else {
console.warn(`pull: plan would delete ${actions.plannedDeleteCount} of ${actions.existingCount} ` +
`tracked file(s) (mass-delete guard) — deletions suppressed this ` +
`cycle (SPEC §8). Verify the live Docmost tree, then re-run.`);
}
}
// 1. Write each live page in its fixpoint form (normalize-on-write, SPEC §11).
let written = 0;
let failed = 0;
let completed = 0;
let nextIndex = 0;
// pageIds whose write FAILED. A moved page whose new-path write failed must
// NOT have its old path removed (otherwise the page vanishes entirely).
const failedPageIds = new Set();
const writeOne = async (w) => {
try {
const page = await client.getPageJson(w.pageId);
// Native-Obsidian format: a minimal `gitmost_id` frontmatter + the fixpoint
// markdown body. title/parent/space are DERIVED (filename / folder / repo),
// so nothing but the pageId is persisted as meta.
const text = serializePageFile(page.id, await stabilizePageBody(page.content));
const abs = relToAbs(vaultRoot, w.relPath);
await deps.mkdir(dirname(abs));
await deps.writeFile(abs, text);
written++;
}
catch (err) {
failed++;
failedPageIds.add(w.pageId);
console.error(`pull: failed page ${w.pageId}:`, err instanceof Error ? err.message : String(err));
}
finally {
completed++;
if (completed % PROGRESS_EVERY === 0) {
console.log(`pulled ${completed}/${actions.toWrite.length}`);
}
}
};
// Bounded-concurrency pool (dependency-free): a fixed set of runners each
// take the next index until the write list is exhausted. One bad page never
// aborts the whole pull (mirrors the fault-tolerant tree walk).
const runner = async () => {
while (true) {
const i = nextIndex++;
if (i >= actions.toWrite.length)
return;
await writeOne(actions.toWrite[i]);
}
};
await Promise.all(Array.from({ length: Math.min(CONCURRENCY, actions.toWrite.length) || 1 }, () => runner()));
// Helper: `rm` with force:true is a no-op if the file is already gone.
const removePath = async (rel, what) => {
try {
await deps.rm(relToAbs(vaultRoot, rel));
return true;
}
catch (err) {
console.error(`pull: failed to ${what} ${rel}:`, err instanceof Error ? err.message : String(err));
return false;
}
};
// 2. Apply MOVE old-path removals. A moved page IS present in `live`, so its
// old path is genuinely stale — NOT subject to the incomplete-fetch
// suppression. BUT only remove the old path when (a) the planner marked it
// removable (not reused by another live page) AND (b) the new-path write
// actually SUCCEEDED — otherwise we would delete the only copy of a page
// whose move-write failed (⭐ data-loss guard).
let movedApplied = 0;
for (const m of actions.moved) {
if (!m.removeOldPath)
continue;
if (failedPageIds.has(m.pageId)) {
console.warn(`pull: move write for ${m.pageId} failed — keeping old path ` +
`${m.fromRelPath} (SPEC §8)`);
continue;
}
if (await removePath(m.fromRelPath, "remove moved old path"))
movedApplied++;
}
// 3. Apply ABSENCE-based deletions — `actions.toDelete` is ALREADY the
// post-suppression set (empty when the decision suppressed them, SPEC §8).
let deleted = 0;
for (const rel of actions.toDelete) {
if (await removePath(rel, "delete"))
deleted++;
}
// 4. Stage + commit on `docmost` (only if there is something to commit).
// Deterministic stabilized output means unchanged pages produce identical
// bytes -> git sees no diff -> no churn (SPEC §11). The subject reflects the
// ACTUAL work applied (pages written + files deleted), not the planned size,
// so a run with failures does not over-report (SPEC §5 nit).
const subject = deleted > 0
? `docmost: sync ${written} page(s), ${deleted} deleted`
: `docmost: sync ${written} page(s)`;
await git.stageAll();
const committed = await git.commit(subject, {
authorName: BOT_AUTHOR_NAME,
authorEmail: BOT_AUTHOR_EMAIL,
trailers: [SOURCE_TRAILER],
});
// Merge docmost -> main. Conflicts are surfaced and left in git (SPEC §9);
// we never push to Docmost. Push to a git remote is deferred (SPEC §7).
await git.checkout(DEFAULT_BRANCH);
const merge = await git.merge(DOCMOST_BRANCH);
if (merge.conflict) {
console.error("pull: merge of docmost -> main CONFLICTED. Conflict markers were left " +
"in the vault for manual resolution (SPEC §9). Nothing is pushed to " +
"Docmost (read-only). Resolve locally, then re-run.");
}
else if (!merge.ok) {
console.error(`pull: merge of docmost -> main failed: ${merge.output}`);
}
console.log("pull: git push to remote is DEFERRED in this increment (SPEC §7).");
return { written, movedApplied, deleted, failed, committed, merge };
}
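
The dependency-free bounded-concurrency pool used in step 1 can be isolated as follows. This is a sketch under stated assumptions: `runPool` and `fakeWrite` are hypothetical names for illustration, and the worker is a timer-based fake rather than a real fetch/stabilize/write.

```javascript
// A fixed set of runners each take the next index until the list is exhausted.
// A failing item is recorded and never aborts the other runners.
async function runPool(items, concurrency, worker) {
  let nextIndex = 0;
  const failed = [];
  const runner = async () => {
    while (true) {
      const i = nextIndex++;
      if (i >= items.length) return;
      try {
        await worker(items[i]);
      } catch {
        failed.push(items[i]);
      }
    }
  };
  const size = Math.min(concurrency, items.length) || 1;
  await Promise.all(Array.from({ length: size }, () => runner()));
  return failed;
}

// Fake worker that tracks how many calls are in flight at once.
let active = 0;
let peak = 0;
const done = [];
const fakeWrite = async (id) => {
  active++;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 5));
  active--;
  if (id === 3) throw new Error("simulated write failure");
  done.push(id);
};

const poolResult = runPool([1, 2, 3, 4, 5, 6, 7], 3, fakeWrite).then((failed) => ({
  failed,
  peak,
  doneCount: done.length,
}));
poolResult.then((r) => console.log(r));
```

The returned `failed` list plays the role of `failedPageIds` above: a later step can consult it before removing a moved page's old path.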

packages/git-sync/build/engine/push.d.ts vendored Normal file

@@ -0,0 +1,504 @@
/**
* Push cycle: vault -> Docmost (SPEC §6 "FS → Docmost"), FIRST increment.
*
* This module mirrors the structure of `./pull.ts`: a set of VaultGit diff/ref
* primitives (in `./git.ts`), a PURE planner (`computePushActions`) that turns
* a git diff into a classified action set with NO IO, and a THIN injectable
* applier (`applyPushActions`) exercised in tests via fakes only.
*
* Direction is vault -> Docmost. The diff is `main` against
* `refs/docmost/last-pushed` (SPEC §6 step 2); each `A`/`M`/`D`/`R` row is
* translated into a Docmost mutation by `pageId` identity (SPEC §4):
* - A without pageId -> create_page (then write the assigned pageId back).
* - A with pageId -> update (restored/copied file; the page already exists).
* - M -> update content (collab/Yjs path, SPEC §2/§15.6).
* - D -> delete_page (pageId recovered from the PRE-IMAGE meta).
* - R -> rename/move (CLASSIFIED here, APPLIED in push #3).
*
* MOVE/RENAME APPLY (push #3) — DONE here. `classifyRenameMoves` (PURE) resolves
* each `renamesMoves` entry into the Docmost op(s) it needs, comparing the PATH-
* derived parent (SPEC §5: the file path is the source of truth for tree
* position, NOT stale `meta.parentPageId`) and the meta title; `applyPushActions`
* then calls `move_page` / `rename_page` (both for a reparent+retitle), or
* records a NO-OP for a cosmetic local-only file-path rename.
*
* The client seam is the native `GitSyncClient` (`Pick<GitSyncClient, ...>`);
* the gitmost server drives the engine in-process (there is no standalone CLI
* entry point).
*/
import { type DocmostMdMeta } from "../lib/index.js";
import type { GitSyncClient } from "./client.types.js";
import type { DiffEntry } from "./git.js";
import { VaultGit } from "./git.js";
import { type Settings } from "./settings.js";
export type { DiffEntry } from "./git.js";
/** A page to CREATE in Docmost (new local file, meta has no pageId yet). */
export interface CreateAction {
/** Vault-relative path of the new file. */
path: string;
}
/** A page whose CONTENT changed (meta carries the existing pageId). */
export interface UpdateAction {
pageId: string;
/** Vault-relative path of the changed file. */
path: string;
}
/** A page to soft-delete in Docmost (Trash, SPEC §8). */
export interface DeleteAction {
pageId: string;
}
/** A renamed/moved page (same pageId, new path). Resolved into concrete ops by `classifyRenameMoves`. */
export interface RenameMoveAction {
pageId: string;
oldPath: string;
newPath: string;
}
/**
* A CLASSIFIED rename/move (push #3): a `RenameMoveAction` resolved into the
* Docmost op(s) it actually needs. The file PATH is the source of truth for tree
* POSITION, while page IDENTITY is the pageId (SPEC §5: "the truth of the link
* is the pageId, not the path"; the path itself is cosmetic and local). So we
* compare the RESOLVED parent of the new path against the resolved parent of
* the old path, and the title in the current meta against the title in the
* previous meta. Each sub-op is emitted ONLY when something real changed:
* - `move` — the resolved parent page changed (reparent in Docmost). A `null`
* `parentPageId` means the new parent is ROOT (the file sits at the space
* root, no enclosing folder).
* - `rename` — the page title changed (a pure title edit in Docmost).
* - `noop` — neither changed: a purely LOCAL file-path rename (same parent,
* same title). The page identity is its pageId, so Docmost is NOT called.
* `move` and `rename` are independent and may BOTH be present (reparent + retitle).
*/
export interface RenameMoveActionClassified {
pageId: string;
oldPath: string;
newPath: string;
/** Present iff the resolved parent changed -> `move_page` (reparent). */
move?: {
parentPageId: string | null;
};
/** Present iff the title changed -> `rename_page` (title-only). */
rename?: {
title: string;
};
/** True iff neither parent nor title changed (cosmetic local-only rename). */
noop?: true;
}
/**
* Injected resolvers for the PURE `classifyRenameMoves` (push #3). Both are PURE
* given a path + side; the real `main` (a follow-up) wires them to the file tree
* (`readFile` for `current`, `git.showFileAtRef` for `prev`), tests pass plain
* lookups. SPEC §5 path-as-truth:
* - `metaAt`: the file's synthetic native meta at that side (title from the
* filename, pageId from the `gitmost_id` frontmatter).
* - `resolveParentPageId`: the pageId of the page whose FILE is the parent
* FOLDER's `.md` (one level up from the given path), or `null` for ROOT.
*/
export interface ClassifyRenameMovesDeps {
metaAt: (path: string, side: MetaSide) => DocmostMdMeta | null;
resolveParentPageId: (path: string, side: MetaSide) => string | null;
}
/**
* PURE classifier for the `renamesMoves` produced by `computePushActions`
* (push #3, SPEC §5/§6/§8). Resolves each `{pageId, oldPath, newPath}` into the
* Docmost op(s) it needs, with NO IO (both resolvers are injected).
*
* SPEC §5 — the file PATH is the source of truth for tree position, NOT the
* (possibly stale) `meta.parentPageId`. So the NEW parent is resolved from
* `newPath`'s enclosing folder, and the OLD parent from `oldPath`'s enclosing
* folder, via `deps.resolveParentPageId`. The title comes from the meta.
*
* For each entry:
* - `newParent = resolveParentPageId(newPath, 'current')`,
* `oldParent = resolveParentPageId(oldPath, 'prev')`.
* - `newTitle = metaAt(newPath,'current')?.title`,
* `oldTitle = metaAt(oldPath,'prev')?.title`.
* - include `move` iff `newParent !== oldParent` (a real reparent),
* - include `rename` iff `newTitle` is a NON-EMPTY string AND differs from
* `oldTitle` (a real title edit; an empty/absent new title is never a rename),
* - if NEITHER applies -> `noop: true` (a cosmetic local-only file-path rename;
* the page is its pageId, so Docmost is not touched).
*/
export declare function classifyRenameMoves(renamesMoves: RenameMoveAction[], deps: ClassifyRenameMovesDeps): RenameMoveActionClassified[];
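
The per-entry rules listed in the doc comment above can be sketched as a small pure function. This is an illustration of the documented rules, not the shipped implementation (which is not shown in this declaration file). The lookup tables and paths below are made-up fixtures.

```javascript
// Sketch of the documented rules: move iff the parent changed, rename iff a
// non-empty new title differs from the old one, noop iff neither applies.
function classifyRenameMoves(renamesMoves, deps) {
  return renamesMoves.map(({ pageId, oldPath, newPath }) => {
    const newParent = deps.resolveParentPageId(newPath, "current");
    const oldParent = deps.resolveParentPageId(oldPath, "prev");
    const newTitle = deps.metaAt(newPath, "current")?.title;
    const oldTitle = deps.metaAt(oldPath, "prev")?.title;
    const out = { pageId, oldPath, newPath };
    if (newParent !== oldParent) out.move = { parentPageId: newParent };
    if (typeof newTitle === "string" && newTitle !== "" && newTitle !== oldTitle) {
      out.rename = { title: newTitle };
    }
    if (!out.move && !out.rename) out.noop = true;
    return out;
  });
}

// Made-up fixtures keyed by "<side>:<path>".
const parents = {
  "prev:A/Old.md": "page-A", "current:B/Old.md": "page-B",
  "prev:A/Draft.md": "page-A", "current:A/Final.md": "page-A",
  "prev:Top.md": null, "current:Top ~x1.md": null,
};
const titles = {
  "prev:A/Old.md": "Old", "current:B/Old.md": "Old",
  "prev:A/Draft.md": "Draft", "current:A/Final.md": "Final",
  "prev:Top.md": "Top", "current:Top ~x1.md": "Top",
};
const fakeDeps = {
  resolveParentPageId: (path, side) => parents[`${side}:${path}`] ?? null,
  metaAt: (path, side) => {
    const title = titles[`${side}:${path}`];
    return title === undefined ? null : { title };
  },
};

const classified = classifyRenameMoves(
  [
    { pageId: "p1", oldPath: "A/Old.md", newPath: "B/Old.md" },     // reparent
    { pageId: "p2", oldPath: "A/Draft.md", newPath: "A/Final.md" }, // retitle
    { pageId: "p3", oldPath: "Top.md", newPath: "Top ~x1.md" },     // local-only
  ],
  fakeDeps,
);
console.log(JSON.stringify(classified, null, 2));
```

Injecting both resolvers keeps the classifier free of IO, so fixtures like these are enough to cover the move, rename and noop branches.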
/** The classified set of push actions (PURE output of `computePushActions`). */
export interface PushActions {
creates: CreateAction[];
updates: UpdateAction[];
deletes: DeleteAction[];
renamesMoves: RenameMoveAction[];
/**
* Diff rows that could NOT be classified into an action, with a reason — e.g.
* a deleted file whose PRE-IMAGE meta carried no recoverable pageId (the
* untracked-file guard, SPEC §8: only files that were tracked with a pageId
* are deleted in Docmost). Carried so the caller can log them.
*/
skipped: {
path: string;
status: DiffEntry["status"];
reason: string;
}[];
}
/**
* Which tree a `metaAt` lookup reads the file's native meta from:
* - `current`: the current `main` tree (the live file content) — used for
* A/M/R, where the file still exists.
* - `prev`: the last-pushed PRE-IMAGE (e.g. `refs/docmost/last-pushed:<path>`)
* — used for D, where the file is gone from `main` but its pageId must be
* recovered from the version Docmost last knew (SPEC §6/§8).
*/
export type MetaSide = "current" | "prev";
/** Input to the PURE planner. `metaAt` is injected (no IO inside the planner). */
export interface PushActionsInput {
/** Diff rows of `main` vs `refs/docmost/last-pushed` (SPEC §6 step 2). */
changes: DiffEntry[];
/**
* Resolve a file's synthetic native meta at a given side, or `null` if the file is
* absent there / has no parseable meta. PURE injection: the real `main` reads
* the working tree (current) or `git show <last-pushed>:<path>` (prev); tests
* pass a plain lookup.
*/
metaAt: (path: string, side: MetaSide) => DocmostMdMeta | null;
/**
* The pageIds present at ANY path in the current `main` tree (optional). When
* given, a deleted file whose pageId still lives somewhere in the tree is NOT
* a deletion but a MOVE — guards against trashing a live page when a layout
* reshuffle relocated its file (possibly across two cycles, so the matching
* add isn't in THIS diff). When omitted, only the in-diff D+A/M coalescing
* applies.
*/
currentPageIds?: Set<string>;
}
/**
* PURE push planner (SPEC §4/§6/§8). Classifies each diff row into a Docmost
* action by `pageId` identity, with NO IO (the `metaAt` resolver is injected).
*
* Classification rules:
* - `A` (added):
* - current meta HAS a pageId -> UPDATE (a restored/copied file whose
* page already exists; we push its content rather than create a dup).
* - current meta has NO pageId but HAS a non-empty spaceId -> CREATE (a
* brand-new local file; the page does not exist in Docmost yet).
* - current meta has NO pageId and NO usable spaceId -> SKIP with reason
* `create-without-spaceId`: Docmost `create_page` REQUIRES a spaceId
* (§16), and a new local file may carry only partial human meta. We
* refuse to create rather than guess a space (SPEC §8 guard spirit).
* - `M` (modified): current meta has a pageId -> UPDATE content. (If a modified
* file somehow lost its pageId it is skipped — there is nothing to target.)
* - `D` (deleted): recover the pageId from the PRE-IMAGE meta (`metaAt(path,
* 'prev')`) -> DELETE. If no pageId can be recovered, SKIP with a reason
* (untracked-file guard, SPEC §8: never delete an untracked page).
* - `R` (renamed/moved): same pageId (from current meta), path changed ->
* RENAME/MOVE. Resolution of move-vs-rename + the new parentPageId is left
* to `classifyRenameMoves`; here we only record oldPath, newPath and
* pageId. If the renamed file has no recoverable pageId it is SKIPPED.
* (`C` copy is treated the same as `R` for recording purposes.)
*/
export declare function computePushActions(input: PushActionsInput): PushActions;
/** The marker the push direction advances after a successful push (SPEC §5/§6). */
export declare const LAST_PUSHED_REF = "refs/docmost/last-pushed";
/**
* The mirror branch fast-forwarded after a clean push (SPEC §5/§6 step 3). It
* reflects "what Docmost currently contains"; advancing it to the pushed `main`
* commit closes the loop so the next pull diffs empty for the pushed pages.
*/
export declare const DOCMOST_BRANCH = "docmost";
/**
* Injectable IO for `applyPushActions`. The real `main` (NEXT increment) wires
* these to the live client, `node:fs/promises`, and the vault git wrapper; this
* increment drives them only through FAKES in tests (no live destructive run).
* - `client`: the create/update/delete/move/rename subset of `GitSyncClient`.
* - `readFile`/`writeFile`: read a changed file's body / write a file back
* (by vault-relative path; the applier does not resolve absolute paths so
* fakes stay trivial).
* - `git`: `updateRef` (advance `refs/docmost/last-pushed`) and
* `fastForwardBranch` (advance the `docmost` mirror after a clean push, the
* loop-close — SPEC §6 step 3 / §10).
*/
export interface ApplyPushDeps {
client: Pick<GitSyncClient, "importPageMarkdown" | "createPage" | "deletePage" | "movePage" | "renamePage">;
/** Read a changed file's full text by its vault-relative path. */
readFile: (path: string) => Promise<string>;
/** Write a file's full text by its vault-relative path. */
writeFile: (path: string, text: string) => Promise<void>;
/**
* The Docmost spaceId this vault mirrors. A CREATE targets this space (the
* native file carries no spaceId — every file in the vault belongs to it), and
* it backs the synthetic native meta the classifier reads.
*/
spaceId: string;
/**
* `updateRef` advances `refs/docmost/last-pushed`; `fastForwardBranch` advances
* the `docmost` mirror after a clean push. `showFileAtRef` reads a file's text
* at a ref (used by the move/rename classifier to resolve the PREVIOUS parent
* folder's `.md` at `refs/docmost/last-pushed`, SPEC §5 path-as-truth).
*/
git: Pick<VaultGit, "updateRef" | "fastForwardBranch" | "showFileAtRef">;
}
/** A file whose meta was rewritten with a freshly-assigned pageId (post-create). */
export interface WrittenBackPage {
path: string;
pageId: string;
}
/**
* The per-page push record consulted by a FUTURE poll-suppression (SPEC §10): a
* pulled page whose body hash + `updatedAt` match a record here is OUR OWN write
* and must not be re-pulled. PRODUCED here; CONSUMED on the pull side later.
*/
export interface PushedPageRecord {
/** The Docmost pageId that was updated/created. */
pageId: string;
/**
* The `updatedAt` from the create/update client result, when the result
* exposed one. Absent when the (fake) client did not return it.
*/
updatedAt?: string;
/** Stable hash of the markdown BODY that was pushed (SPEC §10 "body hash"). */
bodyHash: string;
}
/**
* One page whose operation FAILED during apply (SPEC §12 resumability). The bad
* page is isolated — recorded here — and the rest of the batch still runs; the
* refs are NOT advanced when there is any failure, so a re-run retries cleanly.
*/
export interface PushFailure {
kind: "update" | "create" | "delete" | "move" | "rename";
/** The pageId for update/delete/move/rename; absent for a never-id'd create. */
pageId?: string;
/** The vault-relative path for create/update/move/rename; absent for delete. */
path?: string;
/** The error message captured from the thrown error. */
error: string;
}
/**
* A rename/move action that resolved to a NO-OP (push #3, SPEC §5): a purely
* LOCAL file-path rename whose resolved parent AND title are both unchanged. The
* page identity is its pageId and the path is COSMETIC/local-only, so Docmost is
* NOT called — the skip is recorded here (with the reason) for logging.
*/
export interface PushNoop {
pageId: string;
oldPath: string;
newPath: string;
/** Why no Docmost op was emitted (currently always a path-only rename). */
reason: "path-only-rename";
}
/** Structured outcome of `applyPushActions` (counts + write-backs + noops). */
export interface ApplyPushResult {
created: number;
updated: number;
deleted: number;
/** Pages reparented in Docmost via `move_page` (push #3, SPEC §5/§16). */
moved: number;
/** Pages retitled in Docmost via `rename_page` (push #3, SPEC §5/§6). */
renamed: number;
/**
* Files whose `gitmost_id` frontmatter was written with the pageId Docmost assigned on
* create. These now need a FOLLOW-UP commit (the meta on disk changed). The
* commit itself is the caller's job (`runPush` makes it on `--apply`);
* recorded here so it is not lost.
*/
writtenBack: WrittenBackPage[];
/**
* Per-page push records (pageId + optional `updatedAt` + body hash) for every
* page successfully updated/created — the §10 loop-guard data a future
* poll-suppression (pull side) will consult so it does not re-pull our own
* write. Deletes are not included (no body was pushed).
*/
pushed: PushedPageRecord[];
/**
* Pages whose operation threw — isolated and recorded, the batch continued
* (SPEC §12). Non-empty here means the refs were NOT advanced.
*/
failures: PushFailure[];
/**
* Rename/move actions that resolved to a NO-OP — a purely LOCAL file-path
* rename (same parent, same title). NO Docmost call was made for these (SPEC
* §5: the page is its pageId, the path is local-only). Recorded for logging.
*/
noops: PushNoop[];
/** Diff rows the planner could not classify (carried through for logging). */
skipped: PushActions["skipped"];
/** Whether `refs/docmost/last-pushed` was advanced (only on a CLEAN push). */
lastPushedAdvanced: boolean;
/**
* Result of fast-forwarding the `docmost` mirror branch after a CLEAN push
* (the loop-close, SPEC §6 step 3 / §10). `null` when no advance was attempted
* (no `pushedCommit`, or there were failures). `{ ok:false, reason }` when a
* non-fast-forward was REFUSED (divergent `docmost` history is never clobbered).
*/
docmostFastForward: {
ok: boolean;
reason?: string;
} | null;
}
/**
* THIN IO applier for the push cases (create/update/delete/rename/move). Tests
* exercise it via FAKES; `runPush` calls it with a live client only on `--apply`.
*
* - UPDATE: read the file, strip the `gitmost_id` frontmatter, then
* `client.importPageMarkdown(pageId, body, baseMarkdown)`. This is the
* collab/Yjs write path (SPEC §2/§15.6), NEVER a raw jsonb overwrite. Only
* the clean body is sent; `baseMarkdown` is the body of the same file at
* `refs/docmost/last-pushed` (null when absent), the common ancestor for a
* 3-way merge against the live page.
* - CREATE: derive the title from the filename and the parentPageId from the
* enclosing folder's `.md` (path-as-truth, SPEC §5), take the spaceId from
* `deps.spaceId`, call `client.createPage(...)`, take the assigned pageId
* from the result, and write it BACK as the file's `gitmost_id` frontmatter
* (re-serialized via `serializePageFile`, body preserved) so the file
* becomes tracked. The write-back is recorded in `writtenBack`; the
* follow-up commit is the caller's job.
* - DELETE: `client.deletePage(pageId)`, a soft-delete to Trash (SPEC §8).
* - RENAME/MOVE (push #3, SPEC §5/§6/§16): classify each `renamesMoves` entry
* with `classifyRenameMoves` (resolvers read the parent FOLDER's `.md` for
* the parent pageId — path-as-truth — and the meta for the title), then:
* - `move` -> `client.movePage(pageId, parentPageId, position?)` (reparent;
* `position` is UNDEFINED for now — the client supplies a default),
* - `rename` -> `client.renamePage(pageId, title)` (title-only),
* - BOTH -> move (reparent) THEN rename (title), in that order,
* - `noop` -> NO client call; recorded in `noops` (a cosmetic local-only
* file-path rename: the page is its pageId, the path is local, SPEC §5).
*
* FAIL-SAFE / per-page isolation (SPEC §12 resumability). Each page's operation
* is wrapped in its own try/catch: a single failing page is recorded in
* `failures[]` (with its kind + pageId/path + error) and the batch CONTINUES —
* one bad page must never block the rest. Crucially, the refs are advanced ONLY
* when `failures.length === 0`: a PARTIAL push must NOT advance
* `refs/docmost/last-pushed` or the `docmost` mirror, so a re-run retries the
* whole batch cleanly (the already-applied pages are idempotent re-applies).
*
* LOOP-CLOSE (SPEC §6 step 3 / §10). After a fully-successful push, when a
* `pushedCommit` is supplied:
* - advance `refs/docmost/last-pushed` to it (what of `main` is in Docmost), AND
* - fast-forward the `docmost` mirror branch to it via
* `git.fastForwardBranch('docmost', pushedCommit)` — so the mirror reflects
* what Docmost now contains and the NEXT pull diffs EMPTY for these pages
* (it does not re-pull our own write). The ff is REFUSED (not forced) if
* `docmost` is not an ancestor of the pushed commit; the result is surfaced
* in `docmostFastForward`. On ANY failure, NEITHER ref is advanced.
*
* LOOP-GUARD DATA (SPEC §10). For every page successfully updated/created the
* result carries a `pushed` record `{ pageId, updatedAt?, bodyHash }` — the body
* hash of what was pushed plus the write's `updatedAt` (when the client returned
* one). A future pull-side poll-suppression consults this so it does not re-pull
* our own write; producing it is in scope here, consuming it is deferred.
*
* @param pushedCommit The `main` commit just reflected into Docmost (SHA or
* commit-ish). When omitted, NEITHER ref is advanced (e.g. a dry plan).
*/
export declare function applyPushActions(deps: ApplyPushDeps, actions: PushActions, pushedCommit?: string): Promise<ApplyPushResult>;
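The fail-safe rule documented above (isolate each failing page, advance refs only when nothing failed) can be sketched in a few lines. This is a simplified illustration, not the shipped applier: it is synchronous for brevity, and `applyIsolatedSketch`, `FailureSketch`, and the `advanceRefs` callback are hypothetical names.

```typescript
// Hypothetical, synchronous sketch of per-page isolation plus ref gating.
interface FailureSketch {
  kind: string;
  pageId: string;
  error: string;
}

function applyIsolatedSketch(
  pageIds: string[],
  op: (pageId: string) => void,
  advanceRefs: () => void,
): { applied: number; failures: FailureSketch[]; refsAdvanced: boolean } {
  const failures: FailureSketch[] = [];
  let applied = 0;
  for (const pageId of pageIds) {
    try {
      op(pageId);
      applied++;
    } catch (err) {
      // One bad page is recorded; the loop keeps going.
      const error = err instanceof Error ? err.message : String(err);
      failures.push({ kind: "update", pageId, error });
    }
  }
  // A partial push must not move the refs, so a re-run retries the batch.
  const refsAdvanced = failures.length === 0;
  if (refsAdvanced) advanceRefs();
  return { applied, failures, refsAdvanced };
}

let refAdvances = 0;
const partialRun = applyIsolatedSketch(
  ["a", "b", "c"],
  (id) => {
    if (id === "b") throw new Error("boom");
  },
  () => {
    refAdvances++;
  },
);
const cleanRun = applyIsolatedSketch(["a", "c"], () => {}, () => {
  refAdvances++;
});
```

In the partial run the two healthy pages are still applied while the refs stay put; only the clean run invokes `advanceRefs`.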
/**
* SPEC §5 path-as-truth: the parent FOLDER's `.md` file for a vault-relative
* (forward-slash) path. `buildVaultLayout` puts a page with children at
* `<...>/Title.md` and nests its children under `<...>/Title/`, so for
* `newPath = <dir>/Child.md` the parent page's file is `<dir>.md` (the enclosing
* folder, one level up). A path with NO enclosing folder (`Child.md`, at the
* space root) has no parent folder file -> `null` (the parent is ROOT).
*/
export declare function parentFolderFile(path: string): string | null;
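The path rule described above is small enough to sketch. The shipped body of `parentFolderFile` is not part of this declaration file, so `parentFolderFileSketch` below is a hypothetical reading of the doc comment, assuming forward-slash vault-relative paths.

```typescript
// Hypothetical sketch of the documented rule: `<dir>/Child.md` -> `<dir>.md`.
function parentFolderFileSketch(path: string): string | null {
  const slash = path.lastIndexOf("/");
  // No enclosing folder: the page sits at the space root, so it has no parent file.
  if (slash === -1) return null;
  // The parent page's file is the enclosing folder's name plus `.md`.
  return `${path.slice(0, slash)}.md`;
}

const nestedParent = parentFolderFileSketch("Guide/Setup/Child.md");
const topLevelParent = parentFolderFileSketch("Guide/Child.md");
const rootParent = parentFolderFileSketch("Child.md");
```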
/**
* Whether a vault path is a Docmost PAGE file (design §"Adoption"): a `.md` file
* with NO dot-segment anywhere in its path. This excludes `.obsidian/` config,
* `.trash/`, dotfiles (`.foo.md`), and every non-`.md` file (attachments, JSON,
* …) — Obsidian owns those; they live in the vault but are never pages. Used to
* screen the PUSH diff so non-page files are never created/updated/deleted in
* Docmost (and never get a `gitmost_id` frontmatter written into them).
*/
export declare function isPageFile(path: string): boolean;
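As an illustration of the page-file screen, here is a minimal sketch. It assumes a "dot-segment" means a path segment that starts with `.`; the shipped `isPageFile` body is not shown here, and `isPageFileSketch` is a hypothetical name.

```typescript
// Hypothetical sketch: a page file ends in `.md` and has no segment starting with ".".
function isPageFileSketch(path: string): boolean {
  if (!path.endsWith(".md")) return false;
  return path.split("/").every((segment) => !segment.startsWith("."));
}

const pageChecks = {
  page: isPageFileSketch("Notes/Page.md"),
  obsidianConfig: isPageFileSketch(".obsidian/app.json"),
  mdInsideDotFolder: isPageFileSketch(".trash/Old.md"),
  dotfile: isPageFileSketch("Notes/.foo.md"),
  attachment: isPageFileSketch("Notes/diagram.png"),
};
```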
/**
* The human ("local") git identity used for engine-made commits on `main` in the
* push direction (SPEC §7.3). The provenance is carried by the trailer (below),
* which the loop-guard keys on; the identity is for history readability only.
* When the vault repo already has a configured `user.name`/`user.email`, git
* uses that for the working-tree commit; this is the fallback the daemon stamps.
*/
export declare const LOCAL_AUTHOR_NAME = "Local";
export declare const LOCAL_AUTHOR_EMAIL = "local@local";
/** The provenance trailer marking a `main`-side (human/local) commit (SPEC §7.3). */
export declare const LOCAL_SOURCE_TRAILER = "Docmost-Sync-Source: local";
/**
* Injectable deps for `runPush` (mirrors `pull.ts`'s wiring; everything that
* touches the outside world is here so tests pass fakes). `makeClient` is a
* FACTORY, not a client — a dry-run must build NO client at all (it is never
* called), and only `--apply` invokes it.
*/
export interface PushDeps {
settings: Settings;
git: Pick<VaultGit, "assertGitAvailable" | "ensureRepo" | "isMergeInProgress" | "checkout" | "stageAll" | "commit" | "readRef" | "revParse" | "diffNameStatus" | "showFileAtRef" | "updateRef" | "fastForwardBranch" | "listTrackedFiles">;
/** Build a real client — called ONLY on `--apply`, never on dry-run. */
makeClient: (settings: Settings) => ApplyPushDeps["client"];
/** Read a file's full text by its vault-relative (forward-slash) path. */
readFile: (path: string) => Promise<string>;
/** Write a file's full text by its vault-relative path. */
writeFile: (path: string, text: string) => Promise<void>;
/** Structured logger (defaults to console in `main`; a recorder in tests). */
log: (line: string) => void;
}
/** The structured outcome of a `runPush` cycle (returned + summarized). */
export interface PushRunResult {
/** Which path ran: `dry-run` (plan only) or `apply` (Docmost mutated). */
mode: "dry-run" | "apply";
/** Why the cycle stopped before planning, if it did (e.g. a left-over merge). */
aborted?: "merge-in-progress";
/** The diff base the plan was computed against (`last-pushed` else `docmost`). */
base?: {
ref: string;
source: "last-pushed" | "docmost";
sha: string | null;
};
/** The `main` commit the plan targets (the would-be pushed commit). */
pushedCommit?: string;
/** Planned action counts from the PURE planner (present once a plan was built). */
planned?: {
creates: number;
updates: number;
deletes: number;
renamesMoves: number;
skipped: number;
};
/** The applier's structured result — ONLY present on the `--apply` path. */
applied?: ApplyPushResult;
/**
* True when `applyPushActions` REFUSED to fast-forward a divergent `docmost`
* mirror (SPEC §5 invariant broken). Escalated (logged prominently) and folded
* into the CLI's non-zero exit.
*/
divergentDocmost?: boolean;
/** Per-page failures from the applier (empty/absent on a clean run). */
failures?: PushFailure[];
}
/**
* Run one FS->Docmost push cycle (SPEC §6 "FS → Docmost"), DRY-RUN BY DEFAULT.
*
* Steps (mirrors `pull.ts`):
* 1. Preflight git: `assertGitAvailable` + `ensureRepo`; ABORT (clear message +
* non-zero-ish result) if a merge is in progress — never push on top of an
* unresolved conflict (SPEC §9/§12). Conflict markers must NEVER reach
* Docmost (SPEC §9).
* 2. Checkout `main` (the human-facing branch the push reads from).
* 3. Commit the human's pending working-tree changes on `main` with the
* `local` provenance trailer (SPEC §7.3). A no-op when nothing changed.
* 4. Pick the diff BASE: `refs/docmost/last-pushed` if it resolves, else the
* `docmost` mirror branch (what Docmost currently has). Resolve `main`.
* 5. `diffNameStatus(base, main)` -> changes; build the `metaAt(path, side)`
* resolver (current = working tree, prev = `git show <base>:<path>`); run
* the PURE `computePushActions`.
* 6. DRY-RUN (default): LOG the full plan and RETURN — NO client, NO Docmost
* calls, NO ref advance.
* 7. `--apply`: build the client, run `applyPushActions(..., pushedCommit=main)`,
* then (a) if any pageIds were written back (creates), commit them on `main`
* with the `local` trailer and RE-advance `refs/docmost/last-pushed` to the
* new commit so the recorded pageIds are persisted in what Docmost mirrors;
* (b) ESCALATE a divergent-`docmost` ff refusal (SPEC §5) with a prominent
* WARNING and a non-zero-ish flag. Then log a one-line summary.
*/
export declare function runPush(deps: PushDeps, opts: {
dryRun: boolean;
}): Promise<PushRunResult>;
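Step 4 above (pick the diff base) can be sketched as a pure function over a ref resolver. The resolver signature is an assumption made for illustration; `pickDiffBaseSketch` and `BaseSketch` are hypothetical names, with the result shaped like `PushRunResult["base"]`.

```typescript
// Hypothetical sketch of base selection: prefer last-pushed, else the mirror branch.
type BaseSketch = {
  ref: string;
  source: "last-pushed" | "docmost";
  sha: string | null;
};

function pickDiffBaseSketch(resolve: (ref: string) => string | null): BaseSketch {
  const lastPushed = resolve("refs/docmost/last-pushed");
  if (lastPushed !== null) {
    return { ref: "refs/docmost/last-pushed", source: "last-pushed", sha: lastPushed };
  }
  // First push: fall back to what Docmost currently contains.
  return { ref: "docmost", source: "docmost", sha: resolve("docmost") };
}

const withMarker = pickDiffBaseSketch((ref) =>
  ref === "refs/docmost/last-pushed" ? "abc123" : "def456",
);
const withoutMarker = pickDiffBaseSketch((ref) => (ref === "docmost" ? "def456" : null));
```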
/** Parsed `push` CLI flags. DRY-RUN is the default; `--apply` opts into writes. */
export interface PushParsedArgs {
/** True when `--apply` was passed (the ONLY path that writes to Docmost). */
apply: boolean;
}
/**
* Parse the `push` CLI flags. SAFE BY DEFAULT: without `--apply` the run is a
* DRY-RUN (plan only). Exported so the flag handling is unit-testable.
*/
export declare function parseArgs(argv: string[]): PushParsedArgs;
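A minimal sketch of the safe-by-default flag handling follows. It assumes `--apply` is the only flag that matters; the shipped `parseArgs` may validate or reject other arguments, and `parseArgsSketch` is a hypothetical name.

```typescript
// Hypothetical sketch: anything other than an explicit `--apply` stays a dry-run.
interface PushParsedArgsSketch {
  apply: boolean;
}

function parseArgsSketch(argv: string[]): PushParsedArgsSketch {
  return { apply: argv.includes("--apply") };
}

const dryRunArgs = parseArgsSketch([]);
const applyArgs = parseArgsSketch(["--apply"]);
```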

@@ -0,0 +1,971 @@
import { parsePageFile, serializePageFile } from "../lib/page-file.js";
import { DEFAULT_BRANCH } from "./git.js";
import { bodyHash } from "./loop-guard.js";
/**
* PURE classifier for the `renamesMoves` produced by `computePushActions`
* (push #3, SPEC §5/§6/§8). Resolves each `{pageId, oldPath, newPath}` into the
* Docmost op(s) it needs, with NO IO (both resolvers are injected).
*
* SPEC §5 — the file PATH is the source of truth for tree position, NOT the
* (possibly stale) `meta.parentPageId`. So the NEW parent is resolved from
* `newPath`'s enclosing folder, and the OLD parent from `oldPath`'s enclosing
* folder, via `deps.resolveParentPageId`. The title comes from the meta.
*
* For each entry:
* - `newParent = resolveParentPageId(newPath, 'current')`,
* `oldParent = resolveParentPageId(oldPath, 'prev')`.
* - `newTitle = metaAt(newPath,'current')?.title`,
* `oldTitle = metaAt(oldPath,'prev')?.title`.
* - include `move` iff `newParent !== oldParent` (a real reparent),
* - include `rename` iff `newTitle` is a NON-EMPTY string AND differs from
* `oldTitle` (a real title edit; an empty/absent new title is never a rename),
* - if NEITHER applies -> `noop: true` (a cosmetic local-only file-path rename;
* the page is its pageId, so Docmost is not touched).
*/
export function classifyRenameMoves(renamesMoves, deps) {
return renamesMoves.map((rm) => {
const newParent = deps.resolveParentPageId(rm.newPath, "current");
const oldParent = deps.resolveParentPageId(rm.oldPath, "prev");
const newTitle = deps.metaAt(rm.newPath, "current")?.title;
const oldTitle = deps.metaAt(rm.oldPath, "prev")?.title;
const out = {
pageId: rm.pageId,
oldPath: rm.oldPath,
newPath: rm.newPath,
};
// A reparent: the new path's resolved parent page differs from the old's.
if (newParent !== oldParent) {
out.move = { parentPageId: newParent };
}
// A title edit: only when there is a real, non-empty new title that changed.
if (typeof newTitle === "string" &&
newTitle.length > 0 &&
newTitle !== oldTitle) {
out.rename = { title: newTitle };
}
// Neither changed -> a purely LOCAL file-path rename; do NOT call Docmost.
if (!out.move && !out.rename) {
out.noop = true;
}
return out;
});
}
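To see the three outcomes side by side, the sketch below restates the classifier in condensed, typed form and runs it over fake lookup tables. The type names (`RenameMoveSketch`, `ClassifiedSketch`, `ClassifyDepsSketch`) and all paths, titles, and pageIds are made up for illustration; the real types live in the declaration file.

```typescript
// Condensed, typed restatement of the classifier, driven by fake resolvers.
type SideSketch = "current" | "prev";
interface RenameMoveSketch { pageId: string; oldPath: string; newPath: string; }
interface ClassifiedSketch extends RenameMoveSketch {
  move?: { parentPageId: string | null };
  rename?: { title: string };
  noop?: true;
}
interface ClassifyDepsSketch {
  metaAt: (path: string, side: SideSketch) => { title?: string } | null;
  resolveParentPageId: (path: string, side: SideSketch) => string | null;
}

function classifySketch(entries: RenameMoveSketch[], deps: ClassifyDepsSketch): ClassifiedSketch[] {
  return entries.map((rm) => {
    const newParent = deps.resolveParentPageId(rm.newPath, "current");
    const oldParent = deps.resolveParentPageId(rm.oldPath, "prev");
    const newTitle = deps.metaAt(rm.newPath, "current")?.title;
    const oldTitle = deps.metaAt(rm.oldPath, "prev")?.title;
    const out: ClassifiedSketch = { ...rm };
    if (newParent !== oldParent) out.move = { parentPageId: newParent };
    if (typeof newTitle === "string" && newTitle.length > 0 && newTitle !== oldTitle) {
      out.rename = { title: newTitle };
    }
    if (!out.move && !out.rename) out.noop = true;
    return out;
  });
}

// Fake tables keyed by `path|side`, the same key shape the applier prefetches.
const fakeParents = new Map<string, string | null>([
  ["Guide/Intro.md|prev", "p-guide"],
  ["Manual/Intro.md|current", "p-manual"],
  ["Guide/Setup.md|prev", "p-guide"],
  ["Guide/Install.md|current", "p-guide"],
  ["Guide/FAQ.md|prev", "p-guide"],
  ["Guide/Questions.md|current", "p-guide"],
]);
const fakeTitles = new Map<string, string>([
  ["Guide/Intro.md|prev", "Intro"],
  ["Manual/Intro.md|current", "Intro"],
  ["Guide/Setup.md|prev", "Setup"],
  ["Guide/Install.md|current", "Setup"],
  ["Guide/FAQ.md|prev", "FAQ"],
  ["Guide/Questions.md|current", "Questions"],
]);
const fakeDeps: ClassifyDepsSketch = {
  metaAt: (path, side) => {
    const title = fakeTitles.get(`${path}|${side}`);
    return title === undefined ? null : { title };
  },
  resolveParentPageId: (path, side) => fakeParents.get(`${path}|${side}`) ?? null,
};

const classifiedSketch = classifySketch(
  [
    { pageId: "1", oldPath: "Guide/Intro.md", newPath: "Manual/Intro.md" }, // reparent
    { pageId: "2", oldPath: "Guide/Setup.md", newPath: "Guide/Install.md" }, // path only
    { pageId: "3", oldPath: "Guide/FAQ.md", newPath: "Guide/Questions.md" }, // retitle
  ],
  fakeDeps,
);
```

With these fakes, page 1 yields a `move`, page 2 a `noop`, and page 3 a `rename`.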
/**
* PURE push planner (SPEC §4/§6/§8). Classifies each diff row into a Docmost
* action by `pageId` identity, with NO IO (the `metaAt` resolver is injected).
*
* Classification rules:
* - `A` (added):
* - current meta HAS a pageId -> UPDATE (a restored/copied file whose
* page already exists; we push its content rather than create a dup).
* - current meta has NO pageId but HAS a non-empty spaceId -> CREATE (a
* brand-new local file; the page does not exist in Docmost yet).
* - current meta has NO pageId and NO usable spaceId -> SKIP with reason
* `create-without-spaceId`: Docmost `create_page` REQUIRES a spaceId
* (§16), and a new local file may carry only partial human meta. We
* refuse to create rather than guess a space (SPEC §8 guard spirit).
* - `M` (modified): current meta has a pageId -> UPDATE content. (If a modified
* file somehow lost its pageId it is skipped — there is nothing to target.)
* - `D` (deleted): recover the pageId from the PRE-IMAGE meta (`metaAt(path,
* 'prev')`) -> DELETE. If no pageId can be recovered, SKIP with a reason
* (untracked-file guard, SPEC §8: never delete an untracked page). A delete
* is also SKIPPED when the same pageId survives elsewhere: re-added or
* modified at another path in this diff (a ghost move, recorded as a
* rename/move instead) or still listed in `currentPageIds`.
* - `R` (renamed/moved): same pageId (from current meta), path changed ->
* RENAME/MOVE. The planner only records oldPath/newPath/pageId; deciding
* move vs rename and resolving the new parentPageId happens at apply time
* in `classifyRenameMoves`. If the renamed file has no recoverable pageId
* it is SKIPPED. (`C` copy is recorded the same way as `R`.)
*/
export function computePushActions(input) {
const { metaAt, currentPageIds } = input;
// PAGE-FILE FILTER (design §"Adoption"): only `.md` files OUTSIDE any dot-folder
// are Docmost pages. `.obsidian/*`, attachments, and other non-page files are
// committed to the vault (no `.gitignore`) and so appear in the diff, but they
// are NEVER pages — Obsidian owns them. Without this filter every ADDED such
// file would be mis-classified as a CREATE (nativeMeta always supplies a
// spaceId, so the old `create-without-spaceId` skip no longer screens them),
// creating junk pages in Docmost and corrupting the file with a `gitmost_id`
// frontmatter. Filter BEFORE any classification so non-page A/M/D/R are ignored.
const changes = input.changes.filter((c) => isPageFile(c.path));
const actions = {
creates: [],
updates: [],
deletes: [],
renamesMoves: [],
skipped: [],
};
// GHOST-MOVE coalescing (⭐ data-loss guard). git's rename detection (`-M`)
// can miss a move when the two files are too dissimilar — which is exactly the
// case for the tiny meta-only files a layout RESHUFFLE produces (e.g.
// several untitled pages sharing the `_` fallback name; retitling one frees the
// bare `_` and another page's file relocates `_ ~slug.md` -> `_.md`). git then
// reports the move as a DELETE of the old path + an ADD of the new one. Taken
// literally that soft-deletes a page that merely MOVED — a live page vanishing
// into Trash. Identity is the pageId, not git's heuristic: a pageId that is
// BOTH deleted (pre-image) and added (current) is one page that relocated, so
// we classify it as a rename/move and NEVER as a delete.
// A pageId can land at its new path two ways: as an ADD (the path was free) or
// as a MODIFY (the path was occupied by ANOTHER page that left — the reshuffle
// case, where `_.md`'s occupant changes pageId). Both are "the page survives at
// a new path", so the surviving side is the CURRENT-meta pageId of A *and* M.
const deletedPath = new Map();
const survivingPath = new Map();
for (const change of changes) {
if (change.status === "D") {
const pid = metaAt(change.path, "prev")?.pageId;
if (pid)
deletedPath.set(pid, change.path);
}
else if (change.status === "A" || change.status === "M") {
const pid = metaAt(change.path, "current")?.pageId;
if (pid)
survivingPath.set(pid, change.path);
}
}
const ghostMove = new Map();
for (const [pid, oldPath] of deletedPath) {
const newPath = survivingPath.get(pid);
if (newPath && newPath !== oldPath) {
ghostMove.set(pid, { oldPath, newPath });
}
}
for (const change of changes) {
switch (change.status) {
case "A": {
const meta = metaAt(change.path, "current");
const pageId = meta?.pageId;
if (pageId && ghostMove.has(pageId)) {
// Half of a git-undetected move (a matching DELETE exists): record it
// as a rename/move (like a real `R`), NOT an update — the `D` side is
// suppressed so the page is never soft-deleted.
actions.renamesMoves.push({
pageId,
oldPath: ghostMove.get(pageId).oldPath,
newPath: change.path,
});
}
else if (pageId) {
// Added but already carries a pageId (restored/copied file): the page
// exists in Docmost, so push content as an UPDATE — never a duplicate.
actions.updates.push({ pageId, path: change.path });
}
else if (meta?.spaceId) {
// Brand-new local file with a target space -> create the page, then
// write the assigned pageId back into its meta (in `applyPushActions`).
// `meta.spaceId` is truthy here, so empty-string is also rejected.
actions.creates.push({ path: change.path });
}
else {
// A create needs a spaceId (Docmost `create_page` requires it, §16). A
// new file with partial meta and no usable spaceId is SKIPPED rather
// than created into a guessed space (SPEC §8 guard spirit).
actions.skipped.push({
path: change.path,
status: "A",
reason: "create-without-spaceId",
});
}
break;
}
case "M": {
const meta = metaAt(change.path, "current");
const pageId = meta?.pageId;
if (pageId && ghostMove.has(pageId)) {
// This path's occupant changed pageId: the previous page left and THIS
// page relocated here (a reshuffle). Its old file was DELETED elsewhere
// — coalesce into a rename/move so the page is never trashed.
actions.renamesMoves.push({
pageId,
oldPath: ghostMove.get(pageId).oldPath,
newPath: change.path,
});
}
else if (pageId) {
actions.updates.push({ pageId, path: change.path });
}
else {
// A modified file with no pageId has no Docmost target to update.
actions.skipped.push({
path: change.path,
status: "M",
reason: "modified file has no pageId in meta",
});
}
break;
}
case "D": {
// The file is gone from `main`; recover its pageId from the PRE-IMAGE
// (the version last pushed to Docmost) so we delete the RIGHT page.
const prevMeta = metaAt(change.path, "prev");
const pageId = prevMeta?.pageId;
if (pageId && ghostMove.has(pageId)) {
// The same pageId was re-ADDED at a new path: this is a git-undetected
// MOVE, handled by the `A` branch above. Suppress the delete so a moved
// page is never trashed (⭐ data-loss guard).
actions.skipped.push({
path: change.path,
status: "D",
reason: "ghost-move (re-added at a new path) — not a deletion",
});
}
else if (pageId && currentPageIds?.has(pageId)) {
// The pageId still EXISTS elsewhere in the current tree: the file moved
// (a layout reshuffle whose matching add was in an earlier cycle, so it
// is not in this diff). A live page must never be trashed because its
// FILENAME changed — identity is the pageId (⭐ data-loss guard).
actions.skipped.push({
path: change.path,
status: "D",
reason: "pageId still present in the tree (moved) — not a deletion",
});
}
else if (pageId) {
actions.deletes.push({ pageId });
}
else {
// Untracked-file guard (SPEC §8): a file with no recoverable pageId was
// never a Docmost page — do NOT translate its removal into a delete.
actions.skipped.push({
path: change.path,
status: "D",
reason: "deleted file has no recoverable pageId (pre-image meta)",
});
}
break;
}
case "R":
case "C": {
// Same page, new path. Identity comes from the CURRENT (post-rename) meta
// since the file still exists. Move-vs-rename and the new parentPageId are
// resolved at apply time by `classifyRenameMoves`; record
// oldPath/newPath/pageId only.
const meta = metaAt(change.path, "current");
const pageId = meta?.pageId;
const oldPath = change.oldPath ?? change.path;
if (pageId) {
actions.renamesMoves.push({
pageId,
oldPath,
newPath: change.path,
});
}
else {
actions.skipped.push({
path: change.path,
status: change.status,
reason: "renamed/moved file has no pageId in meta",
});
}
break;
}
default: {
// Unreachable for A/M/D/R/C; defensive for any future status.
actions.skipped.push({
path: change.path,
status: change.status,
reason: `unhandled diff status ${change.status}`,
});
}
}
}
return actions;
}
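The ghost-move coalescing step can be isolated into a small pure function. The sketch below mirrors the two-map pass from the planner; `findGhostMovesSketch`, `DiffRowSketch`, and the sample pageIds are hypothetical, and the sample data replays the `_ ~slug.md` -> `_.md` reshuffle described in the comments.

```typescript
// Hypothetical sketch: pair a deleted pageId with the path where it survives.
interface DiffRowSketch { status: "A" | "M" | "D"; path: string; }
type MetaLookupSketch = (path: string, side: "current" | "prev") => { pageId?: string } | null;

function findGhostMovesSketch(
  changes: DiffRowSketch[],
  metaAt: MetaLookupSketch,
): Map<string, { oldPath: string; newPath: string }> {
  const deletedPath = new Map<string, string>();
  const survivingPath = new Map<string, string>();
  for (const change of changes) {
    if (change.status === "D") {
      const pid = metaAt(change.path, "prev")?.pageId;
      if (pid) deletedPath.set(pid, change.path);
    } else {
      // Both A and M count: the page survives at this path either way.
      const pid = metaAt(change.path, "current")?.pageId;
      if (pid) survivingPath.set(pid, change.path);
    }
  }
  const ghostMove = new Map<string, { oldPath: string; newPath: string }>();
  deletedPath.forEach((oldPath, pid) => {
    const newPath = survivingPath.get(pid);
    if (newPath && newPath !== oldPath) ghostMove.set(pid, { oldPath, newPath });
  });
  return ghostMove;
}

// Page "a" was retitled (its file left `_.md`); page "b" slid into the freed `_.md`.
const fakeMeta: Record<string, string> = {
  "_ ~slug.md|prev": "b",
  "_.md|prev": "a",
  "_.md|current": "b",
  "Retitled.md|current": "a",
};
const ghostMoves = findGhostMovesSketch(
  [
    { status: "D", path: "_ ~slug.md" },
    { status: "M", path: "_.md" },
    { status: "A", path: "Retitled.md" },
  ],
  (path, side) => {
    const pageId = fakeMeta[`${path}|${side}`];
    return pageId === undefined ? null : { pageId };
  },
);
```

Only page `b` is reported: its old file was deleted and it reappears at `_.md`, so the planner records a rename/move instead of a delete. Page `a` has no matching delete in this diff, so it is left to the ordinary `A` handling.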
// --- thin apply (create/update/delete/rename/move) ---------------------------
/** The marker the push direction advances after a successful push (SPEC §5/§6). */
export const LAST_PUSHED_REF = "refs/docmost/last-pushed";
/**
* The mirror branch fast-forwarded after a clean push (SPEC §5/§6 step 3). It
* reflects "what Docmost currently contains"; advancing it to the pushed `main`
* commit closes the loop so the next pull diffs empty for the pushed pages.
*/
export const DOCMOST_BRANCH = "docmost";
/**
* THIN IO applier for the push cases (create/update/delete/rename/move). Tests
* exercise it via FAKES; `runPush` calls it with a live client only on `--apply`.
*
* - UPDATE: read the file, strip the `gitmost_id` frontmatter, then
* `client.importPageMarkdown(pageId, body, baseMarkdown)`. This is the
* collab/Yjs write path (SPEC §2/§15.6), NEVER a raw jsonb overwrite. Only
* the clean body is sent; `baseMarkdown` is the body of the same file at
* `refs/docmost/last-pushed` (null when absent), the common ancestor for a
* 3-way merge against the live page.
* - CREATE: derive the title from the filename and the parentPageId from the
* enclosing folder's `.md` (path-as-truth, SPEC §5), take the spaceId from
* `deps.spaceId`, call `client.createPage(...)`, take the assigned pageId
* from the result, and write it BACK as the file's `gitmost_id` frontmatter
* (re-serialized via `serializePageFile`, body preserved) so the file
* becomes tracked. The write-back is recorded in `writtenBack`; the
* follow-up commit is the caller's job.
* - DELETE: `client.deletePage(pageId)`, a soft-delete to Trash (SPEC §8).
* - RENAME/MOVE (push #3, SPEC §5/§6/§16): classify each `renamesMoves` entry
* with `classifyRenameMoves` (resolvers read the parent FOLDER's `.md` for
* the parent pageId — path-as-truth — and the meta for the title), then:
* - `move` -> `client.movePage(pageId, parentPageId, position?)` (reparent;
* `position` is UNDEFINED for now — the client supplies a default),
* - `rename` -> `client.renamePage(pageId, title)` (title-only),
* - BOTH -> move (reparent) THEN rename (title), in that order,
* - `noop` -> NO client call; recorded in `noops` (a cosmetic local-only
* file-path rename: the page is its pageId, the path is local, SPEC §5).
*
* FAIL-SAFE / per-page isolation (SPEC §12 resumability). Each page's operation
* is wrapped in its own try/catch: a single failing page is recorded in
* `failures[]` (with its kind + pageId/path + error) and the batch CONTINUES —
* one bad page must never block the rest. Crucially, the refs are advanced ONLY
* when `failures.length === 0`: a PARTIAL push must NOT advance
* `refs/docmost/last-pushed` or the `docmost` mirror, so a re-run retries the
* whole batch cleanly (the already-applied pages are idempotent re-applies).
*
* LOOP-CLOSE (SPEC §6 step 3 / §10). After a fully-successful push, when a
* `pushedCommit` is supplied:
* - advance `refs/docmost/last-pushed` to it (what of `main` is in Docmost), AND
* - fast-forward the `docmost` mirror branch to it via
* `git.fastForwardBranch('docmost', pushedCommit)` — so the mirror reflects
* what Docmost now contains and the NEXT pull diffs EMPTY for these pages
* (it does not re-pull our own write). The ff is REFUSED (not forced) if
* `docmost` is not an ancestor of the pushed commit; the result is surfaced
* in `docmostFastForward`. On ANY failure, NEITHER ref is advanced.
*
* LOOP-GUARD DATA (SPEC §10). For every page successfully updated/created the
* result carries a `pushed` record `{ pageId, updatedAt?, bodyHash }` — the body
* hash of what was pushed plus the write's `updatedAt` (when the client returned
* one). A future pull-side poll-suppression consults this so it does not re-pull
* our own write; producing it is in scope here, consuming it is deferred.
*
* @param pushedCommit The `main` commit just reflected into Docmost (SHA or
* commit-ish). When omitted, NEITHER ref is advanced (e.g. a dry plan).
*/
export async function applyPushActions(deps, actions, pushedCommit) {
const { client, git } = deps;
let created = 0;
let updated = 0;
let deleted = 0;
let moved = 0;
let renamed = 0;
const writtenBack = [];
const pushed = [];
const failures = [];
const noops = [];
// 1. UPDATES — collab/Yjs write path (SPEC §2/§15.6), never a raw overwrite.
// Each update is isolated: a thrown page is recorded and the batch goes on.
for (const u of actions.updates) {
try {
// Push the CLEAN body only (no `gitmost_id` frontmatter): the frontmatter
// is engine metadata, never page content. The server converts the markdown
// it receives verbatim, so stripping here keeps the id out of Docmost.
const body = parsePageFile(await deps.readFile(u.path)).body;
// The last-synced version of this file (pre-image) is the common ancestor
// for a 3-way merge against the live page, so concurrent human edits are
// not clobbered (review #5). Null when the file is new at last-pushed. Its
// body is stripped the SAME way so the merge compares body-to-body.
const baseFull = await deps.git.showFileAtRef(LAST_PUSHED_REF, u.path);
const baseMarkdown = baseFull === null ? null : parsePageFile(baseFull).body;
const result = await client.importPageMarkdown(u.pageId, body, baseMarkdown);
updated++;
// §10 loop-guard data: hash the BODY we pushed + capture `updatedAt`.
pushed.push({
pageId: u.pageId,
...extractUpdatedAt(result),
bodyHash: bodyHash(body),
});
}
catch (err) {
failures.push({
kind: "update",
pageId: u.pageId,
path: u.path,
error: errMessage(err),
});
}
}
// 2. CREATES: create the page, then write the assigned pageId back to meta so
// the file becomes tracked (SPEC §4 "write the assigned pageId back").
// Isolated per page like updates.
for (const c of actions.creates) {
try {
const text = await deps.readFile(c.path);
const { body } = parsePageFile(text);
// Derive create args from the PATH (native-Obsidian, SPEC §5): title from
// the filename, parent from the enclosing folder's folder-note, space from
// the run (the vault's space). `parentPageId: null` -> created at ROOT.
const title = titleFromPath(c.path);
const parentPageId = (await resolveParentPageIdViaTree(deps, c.path, "current")) ?? undefined;
const result = await client.createPage(title, body, deps.spaceId, parentPageId);
// `createPage` returns `{ data: { id, ... }, success }`; the assigned
// pageId is at `result.data.id`.
const assignedPageId = result?.data?.id;
if (assignedPageId) {
// Write the assigned pageId back as the `gitmost_id` frontmatter, body
// preserved — the file becomes engine-tracked (SPEC §4).
const rewritten = serializePageFile(assignedPageId, body);
await deps.writeFile(c.path, rewritten);
writtenBack.push({ path: c.path, pageId: assignedPageId });
// §10 loop-guard data for the created page (hash the pushed BODY).
pushed.push({
pageId: assignedPageId,
...extractUpdatedAt(result),
bodyHash: bodyHash(body),
});
}
created++;
}
catch (err) {
failures.push({ kind: "create", path: c.path, error: errMessage(err) });
}
}
// 3. DELETES — soft-delete to Trash (SPEC §8), reversible. Isolated per page.
for (const d of actions.deletes) {
try {
await client.deletePage(d.pageId);
deleted++;
}
catch (err) {
failures.push({
kind: "delete",
pageId: d.pageId,
error: errMessage(err),
});
}
}
// 4. RENAME/MOVE (push #3, SPEC §5/§6/§16). Classify each entry against the
// tree-backed resolvers (the NEW parent comes from the new path's enclosing
// folder `.md`, the OLD parent from the old path's at last-pushed — PATH is
// the truth, not stale `meta.parentPageId`; the title from the meta), then
// apply only the real ops. Each page is isolated like the cases above: a
// thrown op is recorded in `failures` and the batch continues. ORDER for a
// page that needs both: reparent (move) FIRST, then retitle (rename).
if (actions.renamesMoves.length > 0) {
// The classifier is PURE over sync resolvers; the tree reads are async, so
// prefetch every (path, side) lookup it will make into plain tables first.
const parentTable = new Map();
const metaTable = new Map();
// A tree read (readFile / git.showFileAtRef) throwing must isolate THAT page
// into `failures`, NOT abort the whole batch (§12 resumability). The helpers
// already swallow their own errors, but this per-entry try/catch keeps the
// batch-isolation invariant holding regardless of future changes to them.
const prefetchFailed = new Set();
for (const rm of actions.renamesMoves) {
// newParent + newTitle from the CURRENT tree; oldParent + oldTitle from the
// last-pushed pre-image (`prev`). Keyed by `path|side` so duplicates fold.
try {
parentTable.set(`${rm.newPath}|current`, await resolveParentPageIdViaTree(deps, rm.newPath, "current"));
parentTable.set(`${rm.oldPath}|prev`, await resolveParentPageIdViaTree(deps, rm.oldPath, "prev"));
metaTable.set(`${rm.newPath}|current`, await metaAtViaTree(deps, rm.newPath, "current", deps.spaceId));
metaTable.set(`${rm.oldPath}|prev`, await metaAtViaTree(deps, rm.oldPath, "prev", deps.spaceId));
}
catch (err) {
prefetchFailed.add(rm.pageId);
failures.push({
kind: "move",
pageId: rm.pageId,
path: rm.newPath,
error: errMessage(err),
});
}
}
const classified = classifyRenameMoves(actions.renamesMoves.filter((rm) => !prefetchFailed.has(rm.pageId)), {
metaAt: (path, side) => metaTable.get(`${path}|${side}`) ?? null,
resolveParentPageId: (path, side) => parentTable.get(`${path}|${side}`) ?? null,
});
for (const c of classified) {
if (c.noop) {
// Cosmetic local-only file-path rename — no Docmost op (SPEC §5).
noops.push({
pageId: c.pageId,
oldPath: c.oldPath,
newPath: c.newPath,
reason: "path-only-rename",
});
continue;
}
// Track which op is in flight so a failure is attributed to the op that
// ACTUALLY threw: for a page needing both, a move that succeeds then a
// rename that throws must be recorded as `rename`, not `move`.
let failingKind = c.move ? "move" : "rename";
try {
// Reparent FIRST so the page is in its new tree position, THEN retitle.
if (c.move) {
failingKind = "move";
// TODO(next): compute a fractional-index position between siblings
// (SPEC §16). `position` is UNDEFINED here; the client supplies a valid
// default. Pass `parentPageId: null` for a move to the space ROOT.
await client.movePage(c.pageId, c.move.parentPageId);
moved++;
}
if (c.rename) {
failingKind = "rename";
await client.renamePage(c.pageId, c.rename.title);
renamed++;
}
}
catch (err) {
// Isolate the failed page: the op that ACTUALLY threw is recorded so a
// re-run can retry. A move that threw before its rename leaves `rename`
// for the next run (idempotent re-apply); refs are NOT advanced (below).
failures.push({
kind: failingKind,
pageId: c.pageId,
path: c.newPath,
error: errMessage(err),
});
}
}
}
// 5. Advance the refs ONLY on a CLEAN push (no failures) AND when a pushed
// commit is supplied. A partial push must advance NEITHER ref, so a re-run
// retries the whole batch (SPEC §12). The loop-close (SPEC §6 step 3 / §10):
// advance `refs/docmost/last-pushed` AND fast-forward the `docmost` mirror,
// so Docmost's new content is mirrored and the next pull diffs empty.
let lastPushedAdvanced = false;
let docmostFastForward = null;
if (pushedCommit && failures.length === 0) {
await git.updateRef(LAST_PUSHED_REF, pushedCommit);
lastPushedAdvanced = true;
// Fast-forward the mirror (refused, not forced, on a non-fast-forward — the
// caller logs the reason). Surfaced in the result.
docmostFastForward = await git.fastForwardBranch(DOCMOST_BRANCH, pushedCommit);
}
return {
created,
updated,
deleted,
moved,
renamed,
writtenBack,
pushed,
failures,
noops,
skipped: actions.skipped,
lastPushedAdvanced,
docmostFastForward,
};
}
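Review note: the two invariants in steps 4 and 5 (a failure is attributed to the op that actually threw, and the ref advances only on a clean batch) can be shown in isolation. The sketch below is a simplified, synchronous stand-in, not the applier itself: `FakeClient`, `applyRenameMoves` and the `refs` map are hypothetical names introduced only for this illustration.

```typescript
// Hypothetical, simplified shapes. The real applier is async and tracks more.
type OpKind = "move" | "rename";
interface Failure { kind: OpKind; pageId: string; error: string }
interface Classified {
  pageId: string;
  move?: { parentPageId: string | null };
  rename?: { title: string };
}
interface FakeClient {
  movePage(pageId: string, parentPageId: string | null): void;
  renamePage(pageId: string, title: string): void;
}

function applyRenameMoves(
  items: Classified[],
  client: FakeClient,
  refs: Map<string, string>,
  pushedCommit: string | undefined,
): { failures: Failure[]; lastPushedAdvanced: boolean } {
  const failures: Failure[] = [];
  for (const c of items) {
    // Remember which op is in flight so the failure names the op that threw.
    let failingKind: OpKind = c.move ? "move" : "rename";
    try {
      if (c.move) {
        failingKind = "move";
        client.movePage(c.pageId, c.move.parentPageId);
      }
      if (c.rename) {
        failingKind = "rename";
        client.renamePage(c.pageId, c.rename.title);
      }
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      failures.push({ kind: failingKind, pageId: c.pageId, error });
    }
  }
  // Advance the ref only when every item succeeded and a commit was supplied.
  let lastPushedAdvanced = false;
  if (pushedCommit && failures.length === 0) {
    refs.set("refs/docmost/last-pushed", pushedCommit);
    lastPushedAdvanced = true;
  }
  return { failures, lastPushedAdvanced };
}

// A fake client whose rename always fails, to show the attribution.
const flakyClient: FakeClient = {
  movePage: () => {},
  renamePage: () => {
    throw new Error("rename rejected");
  },
};
const refs = new Map<string, string>();
const outcome = applyRenameMoves(
  [{ pageId: "p1", move: { parentPageId: null }, rename: { title: "New" } }],
  flakyClient,
  refs,
  "abc123",
);
console.log(outcome.failures.map((f) => f.kind), outcome.lastPushedAdvanced, refs.size);
```

In this run the move succeeds and the rename throws, so the single failure is recorded as `rename` and the ref map stays empty.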
/** Stringify a thrown value into a stable error message. */
function errMessage(err) {
return err instanceof Error ? err.message : String(err);
}
/**
* SPEC §5 path-as-truth: the file of the PARENT page for a vault-relative
* (forward-slash) path. In the folder-note layout a page with children is
* stored as `<dir>/<base>.md`, where `<base>` is the folder's own name, and its
* children live beside it inside `<dir>/`. So for a leaf `<dir>/Child.md` the
* parent page's file is `<dir>/<base>.md`, and for a folder-note the parent is
* the folder-note one level up. A root-level file (`Child.md`) and a top-level
* folder-note have no parent file -> `null` (the parent is ROOT).
export function parentFolderFile(path) {
const slash = path.lastIndexOf("/");
if (slash < 0)
return null; // root-level file: parent is ROOT.
const dir = path.slice(0, slash); // the enclosing folder
// The page that OWNS the enclosing folder is its folder-note `<dir>/<base>.md`.
const folderNote = `${dir}/${baseSegment(dir)}.md`;
if (path === folderNote) {
// This path IS its folder's folder-note, so its parent is ONE LEVEL UP: the
// folder-note of the grandparent folder (or ROOT at the top level).
const up = dir.lastIndexOf("/");
if (up < 0)
return null; // top-level folder -> parent is ROOT.
const grandDir = dir.slice(0, up);
return `${grandDir}/${baseSegment(grandDir)}.md`;
}
// A leaf sitting inside `dir` (any file that is not `dir`'s own folder-note):
// its parent is `dir`'s folder-note.
return folderNote;
}
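Review note: a few worked paths may make the folder-note rule easier to check. The sketch copies `parentFolderFile` and `baseSegment` from this file (type annotations added); the example paths are invented for illustration.

```typescript
// Copied from the file above with type annotations added.
function baseSegment(path: string): string {
  const slash = path.lastIndexOf("/");
  return slash < 0 ? path : path.slice(slash + 1);
}

function parentFolderFile(path: string): string | null {
  const slash = path.lastIndexOf("/");
  if (slash < 0) return null; // root-level file: parent is ROOT
  const dir = path.slice(0, slash);
  const folderNote = `${dir}/${baseSegment(dir)}.md`;
  if (path === folderNote) {
    // The path is its folder's own folder-note: look one level up.
    const up = dir.lastIndexOf("/");
    if (up < 0) return null; // top-level folder-note: parent is ROOT
    const grandDir = dir.slice(0, up);
    return `${grandDir}/${baseSegment(grandDir)}.md`;
  }
  return folderNote;
}

// Hypothetical vault paths paired with the parent file each should resolve to.
const examples: Array<[string, string | null]> = [
  ["Child.md", null],
  ["Guide/Guide.md", null],
  ["Guide/Intro.md", "Guide/Guide.md"],
  ["Guide/API/API.md", "Guide/Guide.md"],
  ["Guide/API/Auth.md", "Guide/API/API.md"],
];
for (const [path, expected] of examples) {
  console.log(path, "->", parentFolderFile(path), "| expected:", expected);
}
```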
/**
* Whether a vault path is a Docmost PAGE file (design §"Adoption"): a `.md` file
* with NO dot-segment anywhere in its path. This excludes `.obsidian/` config,
* `.trash/`, dotfiles (`.foo.md`), and every non-`.md` file (attachments, JSON,
* …) — Obsidian owns those; they live in the vault but are never pages. Used to
* screen the PUSH diff so non-page files are never created/updated/deleted in
* Docmost (and never get a `gitmost_id` frontmatter written into them).
*/
export function isPageFile(path) {
if (!path.endsWith(".md"))
return false;
return !path.split("/").some((seg) => seg.startsWith("."));
}
/** The last path segment of a forward-slash path (the folder/file base name). */
function baseSegment(path) {
const slash = path.lastIndexOf("/");
return slash < 0 ? path : path.slice(slash + 1);
}
/**
* The page TITLE derived from a vault path: the file's base name without the
* `.md` extension. In the native-Obsidian layout the filename IS the title — for
* a folder-note `<dir>/<base>.md` that base equals the folder name, so the same
* rule yields the folder's title. Self-consistent across pull/push: a pulled
* (possibly disambiguated) filename round-trips to the same title, so a stable
* file never pushes a spurious rename.
*/
function titleFromPath(path) {
const base = baseSegment(path);
return base.endsWith(".md") ? base.slice(0, -3) : base;
}
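Review note: the page-file filter and the title rule are easy to sanity-check on sample paths. The helpers below are copied from this file (type annotations added); the sample paths are hypothetical.

```typescript
// Copied from the file above with type annotations added.
function isPageFile(path: string): boolean {
  if (!path.endsWith(".md")) return false;
  return !path.split("/").some((seg) => seg.startsWith("."));
}

function baseSegment(path: string): string {
  const slash = path.lastIndexOf("/");
  return slash < 0 ? path : path.slice(slash + 1);
}

function titleFromPath(path: string): string {
  const base = baseSegment(path);
  return base.endsWith(".md") ? base.slice(0, -3) : base;
}

// Hypothetical vault contents: only the first entry is a Docmost page.
const samplePaths = [
  "Notes/Plan.md",
  ".obsidian/app.json",
  ".trash/Old.md",
  "Notes/.draft.md",
  "assets/diagram.png",
];
for (const p of samplePaths) {
  console.log(p, "page:", isPageFile(p), "title:", titleFromPath(p));
}
```

Only `Notes/Plan.md` passes the filter; every other entry either lacks the `.md` extension or has a dot-segment somewhere in its path.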
/**
* Build the synthetic `DocmostMdMeta` the planner/classifier consume, from the
* NATIVE format: `pageId` from the `gitmost_id` frontmatter, `title` from the
* filename, `spaceId` from the run (the vault's space — every file belongs to
* it). `parentPageId` is intentionally absent: tree position is resolved from the
* PATH (`resolveParentPageId`), never from a stored field (SPEC §5).
*/
function nativeMeta(text, path, spaceId) {
const { id } = parsePageFile(text);
const meta = { version: 1, title: titleFromPath(path), spaceId };
if (id)
meta.pageId = id;
return meta;
}
/**
* Resolve the parent pageId behind the `resolveParentPageId(path, side)` lookups
* that `classifyRenameMoves` makes, by reading the PARENT page's file as located
* by `parentFolderFile` (SPEC §5 path-as-truth):
* - `current` -> `deps.readFile(<parent file>)` (the live working tree),
* - `prev` -> `git.showFileAtRef('refs/docmost/last-pushed', <parent file>)` (the
* last-pushed pre-image),
* then read its `gitmost_id` frontmatter and return that page's pageId. A root-level path
* (no enclosing folder), a missing/unreadable parent file, or a parent file with
* no parseable pageId all resolve to `null` (parent is ROOT / unknown ->
* `parentPageId: null`, SPEC §16 "parentPageId: null -> to the root").
*
* The IO is async, so this function is async; the call sites prefetch the
* parent pageIds (the classifier itself stays pure/sync over a plain table).
*/
async function resolveParentPageIdViaTree(deps, path, side) {
const parentFile = parentFolderFile(path);
if (parentFile === null)
return null; // root-level: parent is ROOT.
let text;
try {
text =
side === "current"
? await deps.readFile(parentFile)
: await deps.git.showFileAtRef(LAST_PUSHED_REF, parentFile);
}
catch {
// Parent folder file missing/unreadable at that side -> treat as ROOT.
return null;
}
if (text === null)
return null; // showFileAtRef returns null when absent.
// The parent page's identity is its `gitmost_id` frontmatter; folder position
// is irrelevant here, only the pageId.
return parsePageFile(text).id;
}
/**
* Resolve the synthetic native meta at a side for the rename/move classifier (the
* title — derived from the path — comes from here). Mirrors
* `resolveParentPageIdViaTree`'s IO sides: `current` reads the working tree,
* `prev` reads `refs/docmost/last-pushed`. Returns `null` only when the file is
* missing/unreadable at that side (a real absence the classifier must see).
*/
async function metaAtViaTree(deps, path, side, spaceId) {
let text;
try {
text =
side === "current"
? await deps.readFile(path)
: await deps.git.showFileAtRef(LAST_PUSHED_REF, path);
}
catch {
return null;
}
if (text === null)
return null;
return nativeMeta(text, path, spaceId);
}
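Review note: both call sites use the same trick, which is to do all async reads up front and hand the pure classifier a synchronous lookup over a plain table. A minimal sketch of that pattern follows. `prefetchResolver`, `AsyncLookup` and `fakeLookup` are hypothetical names, and the fake reader stands in for `readFile` / `git.showFileAtRef`.

```typescript
type Side = "current" | "prev";
// Hypothetical async reader standing in for readFile / git.showFileAtRef.
type AsyncLookup = (path: string, side: Side) => Promise<string | null>;
type SyncResolver = (path: string, side: Side) => string | null;

async function prefetchResolver(
  keys: Array<{ path: string; side: Side }>,
  lookup: AsyncLookup,
): Promise<SyncResolver> {
  const table = new Map<string, string | null>();
  for (const { path, side } of keys) {
    const key = `${path}|${side}`;
    // Duplicate (path, side) pairs fold into a single read.
    if (!table.has(key)) table.set(key, await lookup(path, side));
  }
  // A key that was never prefetched resolves to null, like a missing file.
  return (path, side) => table.get(`${path}|${side}`) ?? null;
}

let lookupCalls = 0;
const fakeLookup: AsyncLookup = async (path, side) => {
  lookupCalls++;
  return side === "current" ? `id-of-${path}` : null;
};

const resolverReady = prefetchResolver(
  [
    { path: "Guide/Guide.md", side: "current" },
    { path: "Guide/Guide.md", side: "current" }, // duplicate, read once
    { path: "Guide/Guide.md", side: "prev" },
  ],
  fakeLookup,
);
resolverReady.then((resolve) => {
  console.log(resolve("Guide/Guide.md", "current"), "reads:", lookupCalls);
});
```

Keeping the classifier synchronous means its tests need no fakes for git or the filesystem, only a table.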
/**
* Pull an `updatedAt` out of a create/update client result, if present. The
* shape is `{ data: { updatedAt? }, ... }` (createPage) or a flatter object.
* When it is absent (as in the simple fakes), the field is omitted from the
* result rather than set to `undefined`.
*/
function extractUpdatedAt(result) {
const r = result;
const raw = r?.data?.updatedAt ?? r?.updatedAt;
return typeof raw === "string" ? { updatedAt: raw } : {};
}
// --- runnable push orchestration (`runPush`) ---------------------------------
//
// `runPush` is the FS->Docmost twin of `pull.ts`'s `main`: it wires the VaultGit
// diff/ref primitives + the PURE `computePushActions` planner + the THIN
// `applyPushActions` applier into one runnable cycle. SAFE BY DEFAULT — the
// engine's FIRST write path to Docmost defaults to DRY-RUN (plan only, NO
// Docmost writes, NO ref advance); an explicit `--apply` is the ONLY path that
// builds a client and mutates Docmost.
//
// Every external effect is injected (`PushDeps`) so the whole orchestration is
// driven by FAKES in tests — no live Docmost, git, fs, or network.
/**
* The human ("local") git identity used for engine-made commits on `main` in the
* push direction (SPEC §7.3). The provenance is carried by the trailer (below),
* which the loop-guard keys on; the identity is for history readability only.
* When the vault repo already has a configured `user.name`/`user.email`, git
* uses that for the working-tree commit; this is the fallback the daemon stamps.
*/
export const LOCAL_AUTHOR_NAME = "Local";
export const LOCAL_AUTHOR_EMAIL = "local@local";
/** The provenance trailer marking a `main`-side (human/local) commit (SPEC §7.3). */
export const LOCAL_SOURCE_TRAILER = "Docmost-Sync-Source: local";
/**
* Run one FS->Docmost push cycle (SPEC §6 "FS → Docmost"), DRY-RUN BY DEFAULT.
*
* Steps (mirrors `pull.ts`):
* 1. Preflight git: `assertGitAvailable` + `ensureRepo`; ABORT (clear message +
* non-zero-ish result) if a merge is in progress — never push on top of an
* unresolved conflict (SPEC §9/§12). Conflict markers must NEVER reach
* Docmost (SPEC §9).
* 2. Checkout `main` (the human-facing branch the push reads from).
* 3. Commit the human's pending working-tree changes on `main` with the
* `local` provenance trailer (SPEC §7.3). A no-op when nothing changed.
* 4. Pick the diff BASE: `refs/docmost/last-pushed` if it resolves, else the
* `docmost` mirror branch (what Docmost currently has). Resolve `main`.
* 5. `diffNameStatus(base, main)` -> changes; build the `metaAt(path, side)`
* resolver (current = working tree, prev = `git show <base>:<path>`); run
* the PURE `computePushActions`.
* 6. DRY-RUN (default): LOG the full plan and RETURN — NO client, NO Docmost
* calls, NO ref advance.
* 7. `--apply`: build the client, run `applyPushActions(..., pushedCommit=main)`,
* then (a) if any pageIds were written back (creates), commit them on `main`
* with the `local` trailer and RE-advance `refs/docmost/last-pushed` to the
* new commit so the recorded pageIds are persisted in what Docmost mirrors;
* (b) ESCALATE a divergent-`docmost` ff refusal (SPEC §5) with a prominent
* WARNING and a non-zero-ish flag. Then log a one-line summary.
*/
export async function runPush(deps, opts) {
const { git, settings, log } = deps;
const dryRun = opts.dryRun;
// 1. Preflight git. Fail fast (actionable message via main().catch) if the git
// binary is missing — the vault state store relies on it.
await git.assertGitAvailable();
await git.ensureRepo();
// 1b. Refuse to push on top of an unresolved merge (SPEC §9/§12). A previous
// conflicting pull leaves the vault mid-merge; pushing now could leak
// conflict markers into Docmost (SPEC §9, the cardinal invariant). Detect
// it BEFORE any checkout/diff and stop with a clear, actionable message so
// re-runs converge once the human resolves (or aborts) the merge.
if (await git.isMergeInProgress()) {
log(`push: vault has an unresolved merge at ${settings.vaultPath} — resolve ` +
`it (or 'git merge --abort') and re-run. Nothing was pushed to Docmost ` +
`(conflict markers must never reach Docmost, SPEC §9).`);
return { mode: dryRun ? "dry-run" : "apply", aborted: "merge-in-progress" };
}
// 2. Work on `main` — the human-facing branch the push diffs FROM.
await git.checkout(DEFAULT_BRANCH);
// 3. Commit the human's pending working-tree changes on `main` with the `local`
// provenance trailer (SPEC §7.3). A no-op commit when nothing changed is
// fine (`commit` returns false). The loop-guard keys on the trailer.
// Even on a "plan only" dry-run this commits the working tree (it is the
// only way to diff `base..main`, acceptable §6.1 behavior) — so make that
// LOCAL git mutation VISIBLE, never silent: a created commit is local-only
// and nothing is sent to Docmost.
await git.stageAll();
const committedWorkingTree = await git.commit("local: working-tree changes", {
authorName: LOCAL_AUTHOR_NAME,
authorEmail: LOCAL_AUTHOR_EMAIL,
trailers: [LOCAL_SOURCE_TRAILER],
});
if (committedWorkingTree) {
const sha = await git.revParse(DEFAULT_BRANCH);
log(`push: committed local working-tree changes on main` +
(sha ? ` as ${sha.slice(0, 8)}` : "") +
` (local git only — nothing sent to Docmost).`);
}
else {
log("push: working tree clean (no local changes to push).");
}
// 4. Pick the diff BASE (SPEC §5/§6): `refs/docmost/last-pushed` if it resolves
// (it marks how much of `main` is already in Docmost), else fall back to the
// `docmost` mirror branch (the mirror of what Docmost currently has), which is
// all that exists before the first push has ever advanced last-pushed.
let base;
const lastPushedSha = await git.readRef(LAST_PUSHED_REF);
if (lastPushedSha) {
base = { ref: LAST_PUSHED_REF, source: "last-pushed", sha: lastPushedSha };
}
else {
base = {
ref: DOCMOST_BRANCH,
source: "docmost",
sha: await git.revParse(DOCMOST_BRANCH),
};
}
const pushedCommit = await git.revParse(DEFAULT_BRANCH);
if (!pushedCommit) {
// `main` has no commit — `ensureRepo` always makes an initial one, so this is
// defensive. Nothing to diff.
log("push: `main` has no commit to push — nothing to do.");
return { mode: dryRun ? "dry-run" : "apply", base };
}
// 5. Diff the base against `main` and build the `metaAt` resolver (PURE planner
// input). `current` reads the live working tree; `prev` reads the base ref's
// pre-image via `git show <base>:<path>` (so a DELETE recovers its pageId).
const changes = await git.diffNameStatus(base.ref, DEFAULT_BRANCH);
// Synchronous resolver over PREFETCHED meta tables: `computePushActions` is
// PURE/sync, but the file/ref reads are async — so we prefetch every (path,
// side) the diff will ask for into a table first, then resolve from it.
const metaTable = new Map();
for (const change of changes) {
// `current`: A/M/R/C still have the file on `main`. `prev`: D needs the
// pre-image; R/C also benefit (old title). Prefetch both sides per path.
const currentPath = change.path;
const prevPath = change.oldPath ?? change.path;
if (!metaTable.has(`${currentPath}|current`)) {
metaTable.set(`${currentPath}|current`, await readMetaCurrent(deps, currentPath, settings.docmostSpaceId));
}
if (!metaTable.has(`${prevPath}|prev`)) {
metaTable.set(`${prevPath}|prev`, await readMetaPrev(deps, base.ref, prevPath, settings.docmostSpaceId));
}
}
const metaAt = (path, side) => metaTable.get(`${path}|${side}`) ?? null;
// The set of pageIds that STILL EXIST somewhere in the current `main` tree.
// Identity is the pageId, NOT the filename: a file vanishing from one path
// while the SAME pageId lives at another path is a MOVE (often a layout
// reshuffle of `_`-fallback names, whose two halves can even land in separate
// cycles), never a deletion. Built only when the diff contains deletes — the
// guard's whole job is to stop a phantom delete from trashing a live page.
let currentPageIds;
if (changes.some((c) => c.status === "D")) {
currentPageIds = new Set();
for (const relPath of await git.listTrackedFiles("*.md")) {
const pid = (await readMetaCurrent(deps, relPath, settings.docmostSpaceId))
?.pageId;
if (pid)
currentPageIds.add(pid);
}
}
const actions = computePushActions({ changes, metaAt, currentPageIds });
const planned = {
creates: actions.creates.length,
updates: actions.updates.length,
deletes: actions.deletes.length,
renamesMoves: actions.renamesMoves.length,
skipped: actions.skipped.length,
};
// 6. DRY-RUN (default): log the full plan and RETURN — build NO client, make
// ZERO Docmost calls, advance NO refs. This is the SAFE default.
logPlan(log, base, pushedCommit, actions, planned, dryRun);
if (dryRun) {
return { mode: "dry-run", base, pushedCommit, planned };
}
// 7. --apply: build the REAL client and execute. This is the ONLY write path.
const client = deps.makeClient(settings);
const applied = await applyPushActions({
client,
// Pass the WHOLE `git` object (it satisfies the applier's
// `Pick<VaultGit, ...>` deps surface). Passing bare method references
// (`git.updateRef`, …) would lose their `this` binding, so on a REAL
// `VaultGit` they would throw `this.runRaw is not a function`. Hand over
// the object so the methods keep their receiver — exactly as `pull.ts`
// does for `applyPullActions`.
git,
readFile: deps.readFile,
writeFile: deps.writeFile,
spaceId: settings.docmostSpaceId,
}, actions, pushedCommit);
// 7a. Persist freshly-assigned pageIds (creates) back into git. `applyPushActions`
// rewrote those files on disk; commit them on `main` with the `local` trailer
// so the new pageIds are recorded, then RE-advance `refs/docmost/last-pushed`
// to the new commit so what Docmost mirrors and what last-pushed points at
// stay in lock-step (the write-back commit is part of `main` now).
// Track a divergent-`docmost` mirror across BOTH ff sites (the applier's main
// push ff in 7b, and the write-back ff here). A divergent mirror is a §5
// invariant breach in EITHER branch and must escalate identically (exit 1).
let divergentDocmost = false;
if (applied.writtenBack.length > 0) {
await git.stageAll();
const recorded = await git.commit("local: record created pageIds", {
authorName: LOCAL_AUTHOR_NAME,
authorEmail: LOCAL_AUTHOR_EMAIL,
trailers: [LOCAL_SOURCE_TRAILER],
});
if (recorded) {
const newCommit = await git.revParse(DEFAULT_BRANCH);
// Only re-advance when the original push was CLEAN (last-pushed was already
// advanced by the applier); a partial push left the refs untouched and a
// re-run retries the whole batch, so we must not move them either.
if (newCommit && applied.lastPushedAdvanced) {
await git.updateRef(LAST_PUSHED_REF, newCommit);
const ff = await git.fastForwardBranch(DOCMOST_BRANCH, newCommit);
if (!ff.ok) {
// SYMMETRIC with the main escalation (7b): a divergent mirror in the
// write-back branch is the SAME §5 invariant breach and must escalate
// (exit 1), not just log a soft warning.
divergentDocmost = true;
log(`push: WARNING — the 'docmost' mirror branch DIVERGED and was NOT ` +
`fast-forwarded to the pageId write-back commit ` +
`(${ff.reason ?? "not-fast-forward"}). The §5 invariant ('docmost' ` +
`mirrors what Docmost contains) is broken: reconcile 'docmost' ` +
`against the live Docmost tree before the next cycle.`);
}
}
}
}
// 7b. ESCALATE a divergent-`docmost` fast-forward refusal (SPEC §5 invariant
// broken). The applier already refused to clobber a divergent mirror; make
// it LOUD (not silent) so the operator notices, and fold it into the exit.
if (applied.docmostFastForward && !applied.docmostFastForward.ok) {
divergentDocmost = true;
log(`push: WARNING — the 'docmost' mirror branch DIVERGED and was NOT ` +
`fast-forwarded (${applied.docmostFastForward.reason ?? "not-fast-forward"}). ` +
`The §5 invariant ('docmost' mirrors what Docmost contains) is broken: ` +
`reconcile 'docmost' against the live Docmost tree before the next cycle.`);
}
// 7c. One-line summary (mirrors pull.ts's summary line).
log(`push complete: ${applied.created} created, ${applied.updated} updated, ` +
`${applied.deleted} deleted, ${applied.moved} moved, ${applied.renamed} ` +
`renamed, ${applied.noops.length} no-op(s), ${applied.skipped.length} ` +
`skipped, ${applied.failures.length} failure(s)` +
(divergentDocmost ? " [DIVERGENT docmost mirror]" : ""));
return {
mode: "apply",
base,
pushedCommit,
planned,
applied,
divergentDocmost,
failures: applied.failures,
};
}
/** Synthetic native meta from the live working tree (`current` side). */
async function readMetaCurrent(deps, path, spaceId) {
let text;
try {
text = await deps.readFile(path);
}
catch {
return null; // absent on disk (e.g. a D row's path) -> no current meta.
}
return nativeMeta(text, path, spaceId);
}
/** Synthetic native meta from the base ref's pre-image (`prev` side). */
async function readMetaPrev(deps, baseRef, path, spaceId) {
let text;
try {
text = await deps.git.showFileAtRef(baseRef, path);
}
catch {
return null;
}
if (text === null)
return null; // path absent at the base ref.
return nativeMeta(text, path, spaceId);
}
/** Emit the full plan (counts + per-item) to the injected logger. */
function logPlan(log, base, pushedCommit, actions, planned, dryRun) {
log(`push plan (${dryRun ? "DRY-RUN — no Docmost writes" : "APPLY"}): base=` +
`${base.ref} (${base.source}${base.sha ? ` ${base.sha.slice(0, 8)}` : ""}) ` +
`-> main ${pushedCommit.slice(0, 8)}`);
log(`push plan counts: ${planned.creates} create, ${planned.updates} update, ` +
`${planned.deletes} delete, ${planned.renamesMoves} rename/move, ` +
`${planned.skipped} skipped`);
for (const c of actions.creates)
log(` create: ${c.path}`);
for (const u of actions.updates)
log(` update: ${u.pageId} (${u.path})`);
for (const d of actions.deletes)
log(` delete: ${d.pageId}`);
for (const rm of actions.renamesMoves)
log(` rename/move: ${rm.oldPath} -> ${rm.newPath} (${rm.pageId})`);
for (const s of actions.skipped)
log(` skipped [${s.status}] ${s.path}: ${s.reason}`);
}
/**
* Parse the `push` CLI flags. SAFE BY DEFAULT: without `--apply` the run is a
* DRY-RUN (plan only). Exported so the flag handling is unit-testable.
*/
export function parseArgs(argv) {
return { apply: argv.includes("--apply") };
}
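Review note: the safe-by-default behaviour of the flag parser can be seen in a few lines. `parseArgs` is copied from this file; `toPushOptions` is a hypothetical helper, and the negation it performs is an assumption about how the entry point derives `dryRun`, since that wiring is not part of this diff.

```typescript
// Copied from the file above, with type annotations added.
function parseArgs(argv: string[]): { apply: boolean } {
  return { apply: argv.includes("--apply") };
}

// Assumption: the entry point turns the parsed flag into runPush's `dryRun`
// option by simple negation. The real wiring is not shown in this diff.
function toPushOptions(argv: string[]): { dryRun: boolean } {
  return { dryRun: !parseArgs(argv).apply };
}

console.log(toPushOptions([]));
console.log(toPushOptions(["--apply"]));
console.log(toPushOptions(["--aply"])); // a mistyped flag stays a dry-run
```

Because only the exact string `--apply` flips the mode, a typo or an unknown flag cannot trigger writes to Docmost.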

@@ -0,0 +1,126 @@
/**
* Pure reconciliation planner (SPEC §5/§6/§8).
*
* Given the desired live set of files (computed from the current Docmost tree)
* and the set of files currently tracked in the vault, compute what to write,
* what to move (old path to remove), and what to delete. Identity is `pageId`
* (the stable file<->page anchor, SPEC §4): a page that keeps its pageId but
* changes relPath is a MOVE, not delete+add; a tracked pageId that is gone from
* the live tree is a DELETE.
*
* This module is intentionally PURE (no IO, no git) so the whole plan is
* unit-testable. The actual file writing / git operations happen in pull.ts.
*/
/** A page that SHOULD exist in the vault at a given path. */
export interface LiveEntry {
pageId: string;
/** Vault-relative path (forward-slash), e.g. `Space/Parent/Child.md`. */
relPath: string;
}
/** A page currently tracked in the vault (pageId parsed from its meta). */
export interface ExistingEntry {
pageId: string;
/** Vault-relative path (forward-slash) of the tracked file. */
relPath: string;
}
/** A page to (re)write at its destination path. */
export interface WriteEntry {
pageId: string;
relPath: string;
}
/** A page that moved: written at its NEW relPath, with the OLD path removed. */
export interface MovedEntry {
pageId: string;
fromRelPath: string;
toRelPath: string;
/**
* Whether the old path (`fromRelPath`) is SAFE to remove. False when another
* live page will (re)write that exact path (path reuse): removing it would
* destroy real data, so the caller must skip the removal. The move itself is
* still recorded (the new path is written regardless).
*/
removeOldPath: boolean;
}
/** The full reconciliation plan. */
export interface ReconciliationPlan {
/**
* Pages present in `live` -> (re)write at their relPath. This naturally
* covers add, content-update (same path) AND move (same pageId, new path),
* since every live page is (re)written regardless of whether it existed.
*/
toWrite: WriteEntry[];
/**
* Vault-relative paths to delete because their tracked pageId is ABSENT from
* `live` (page removed/trashed). This set is ONLY absence-based deletions —
* the OLD paths of moved pages are NOT here (they live in `moved` and are
* applied separately by the caller). Keeping the two apart lets pull.ts gate
* absence deletions behind the incomplete-fetch suppression + mass-delete
* guard (SPEC §8) while still applying real moves.
*/
toDelete: string[];
/**
* Tracked pages whose relPath changed. The caller writes the page at
* `toRelPath`, then removes `fromRelPath` — but ONLY after the new-path write
* succeeded. The old path is NOT in `toDelete`.
*/
moved: MovedEntry[];
}
/**
* Compute the reconciliation plan.
*
* Rules:
* - Every `live` page is written at its relPath (covers add + update + move).
* - A tracked pageId present in `live` whose relPath changed is `moved`; its
* OLD relPath goes into `moved` ONLY (the caller removes it after the new
* path is written) and is NEVER added to `toDelete`.
* - A tracked pageId NOT present in `live` is an ABSENCE delete; its relPath
* is added to `toDelete`.
*
* Notes:
* - Safety filter (no data loss): no path that is a live TARGET path of any
* page is ever deleted/removed (a write owns it). This applies to BOTH the
* absence `toDelete` set AND a moved page's old-path removal: if a moved
* page's OLD path is reused by ANOTHER live page, the move is still recorded
* but flagged `removeOldPath: false`, because that path will be (re)written.
* - `existing` may legitimately contain duplicate pageIds (two stray files
* carrying the same meta pageId); each such file that is not the live target
* path is removed (as an absence/move) so the vault converges to exactly the
* live set.
*/
export declare function planReconciliation(live: LiveEntry[], existing: ExistingEntry[]): ReconciliationPlan;
/**
* Below this many tracked files the mass-delete fraction guard is not applied
* (a tiny vault where deleting "most" files is normal, e.g. 1-of-2).
*/
export declare const MASS_DELETE_MIN_EXISTING = 4;
/** Fraction of tracked files above which a delete plan is a suspected wipe. */
export declare const MASS_DELETE_FRACTION = 0.5;
/** Why absence-based deletions were (or were not) applied this cycle. */
export type DeletionDecision = {
apply: true;
} | {
apply: false;
reason: "incomplete-fetch" | "empty-live" | "mass-delete";
};
/**
* Pure decision: should the ABSENCE-based deletions (`plan.toDelete`) be applied
* this cycle? Encapsulates the SPEC §8 safety invariants so they are unit-
* testable without live creds or git:
*
* - `treeComplete === false` (a partial Docmost tree fetch) -> SUPPRESS. A page
* missing from a partial tree is NOT proof of deletion (SPEC §8); we must not
* delete merely-absent files this cycle. (Writes/updates/moves still happen.)
* - The live fetch returned 0 pages while files are tracked -> SUPPRESS
* (almost always a failed fetch, never a real "delete everything").
* - The plan would delete more than `MASS_DELETE_FRACTION` of a non-trivial
* vault -> SUPPRESS as a mass-deletion guard (defense in depth).
*
* Moves are NOT governed by this decision: a moved page IS present in `live`, so
* its old-path removal is real (handled by the caller separately).
*/
export declare function decideAbsenceDeletions(args: {
treeComplete: boolean;
liveCount: number;
existingCount: number;
deleteCount: number;
}): DeletionDecision;

@@ -0,0 +1,117 @@
/**
* Pure reconciliation planner (SPEC §5/§6/§8).
*
* Given the desired live set of files (computed from the current Docmost tree)
* and the set of files currently tracked in the vault, compute what to write,
* what to move (old path to remove), and what to delete. Identity is `pageId`
* (the stable file<->page anchor, SPEC §4): a page that keeps its pageId but
* changes relPath is a MOVE, not delete+add; a tracked pageId that is gone from
* the live tree is a DELETE.
*
* This module is intentionally PURE (no IO, no git) so the whole plan is
* unit-testable. The actual file writing / git operations happen in pull.ts.
*/
/**
* Compute the reconciliation plan.
*
* Rules:
* - Every `live` page is written at its relPath (covers add + update + move).
* - A tracked pageId present in `live` whose relPath changed is `moved`; its
* OLD relPath goes into `moved` ONLY (the caller removes it after the new
* path is written) and is NEVER added to `toDelete`.
* - A tracked pageId NOT present in `live` is an ABSENCE delete; its relPath
* is added to `toDelete`.
*
* Notes:
* - Safety filter (no data loss): no path that is a live TARGET path of any
* page is ever deleted/removed (a write owns it). This applies to BOTH the
* absence `toDelete` set AND a moved page's old-path removal: if a moved
* page's OLD path is reused by ANOTHER live page, the move is still recorded
* but flagged `removeOldPath: false`, because that path will be (re)written.
* - `existing` may legitimately contain duplicate pageIds (two stray files
* carrying the same meta pageId); each such file that is not the live target
* path is removed (as an absence/move) so the vault converges to exactly the
* live set.
*/
export function planReconciliation(live, existing) {
// Desired path for each live pageId.
const liveByPageId = new Map();
// Set of all paths that WILL be written (never delete/remove one of these).
const liveTargetPaths = new Set();
for (const e of live) {
liveByPageId.set(e.pageId, e.relPath);
liveTargetPaths.add(e.relPath);
}
const toWrite = live.map((e) => ({
pageId: e.pageId,
relPath: e.relPath,
}));
const moved = [];
// Absence-based deletions ONLY (tracked pageId absent from `live`). Use a Set
// so the same path coming from multiple existing rows is queued only once.
const toDeleteSet = new Set();
for (const ex of existing) {
const liveRel = liveByPageId.get(ex.pageId);
if (liveRel === undefined) {
// Tracked page is gone from the live tree -> absence delete.
// Never queue a path a live page will (re)write (path reuse -> no loss).
if (!liveTargetPaths.has(ex.relPath))
toDeleteSet.add(ex.relPath);
continue;
}
if (liveRel !== ex.relPath) {
// Same pageId, different path -> a MOVE. Record it so the caller can write
// the new path first, then remove the old one. If the old path is itself a
// live target (reused by another page), it must NOT be removed — the write
// owns it — so flag `removeOldPath: false` (move still recorded).
moved.push({
pageId: ex.pageId,
fromRelPath: ex.relPath,
toRelPath: liveRel,
removeOldPath: !liveTargetPaths.has(ex.relPath),
});
}
// liveRel === ex.relPath -> content-update in place; nothing extra to do
// (the write above re-emits the file; identical bytes => git no-op).
}
const toDelete = [...toDeleteSet];
return { toWrite, toDelete, moved };
}
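Review note: a small scenario covering all three outcomes (in-place update, move with path reuse, absence delete) may help when reading the rules above. The function body is a condensed, typed copy of `planReconciliation` from this file, with `Array.from` in place of the spread. The page ids and paths are invented.

```typescript
interface Entry { pageId: string; relPath: string }
interface MovedEntry {
  pageId: string;
  fromRelPath: string;
  toRelPath: string;
  removeOldPath: boolean;
}

// Condensed, typed copy of the planner above.
function planReconciliation(live: Entry[], existing: Entry[]) {
  const liveByPageId = new Map<string, string>();
  const liveTargetPaths = new Set<string>();
  for (const e of live) {
    liveByPageId.set(e.pageId, e.relPath);
    liveTargetPaths.add(e.relPath);
  }
  const toWrite = live.map((e) => ({ pageId: e.pageId, relPath: e.relPath }));
  const moved: MovedEntry[] = [];
  const toDeleteSet = new Set<string>();
  for (const ex of existing) {
    const liveRel = liveByPageId.get(ex.pageId);
    if (liveRel === undefined) {
      // Absence delete, unless a live page will write this exact path.
      if (!liveTargetPaths.has(ex.relPath)) toDeleteSet.add(ex.relPath);
      continue;
    }
    if (liveRel !== ex.relPath) {
      moved.push({
        pageId: ex.pageId,
        fromRelPath: ex.relPath,
        toRelPath: liveRel,
        removeOldPath: !liveTargetPaths.has(ex.relPath),
      });
    }
  }
  return { toWrite, toDelete: Array.from(toDeleteSet), moved };
}

// Hypothetical vault: A moves, D takes over A's old path, B stays, C is gone.
const plan = planReconciliation(
  [
    { pageId: "A", relPath: "New/A.md" },
    { pageId: "D", relPath: "Old/A.md" },
    { pageId: "B", relPath: "B.md" },
  ],
  [
    { pageId: "A", relPath: "Old/A.md" },
    { pageId: "B", relPath: "B.md" },
    { pageId: "C", relPath: "C.md" },
  ],
);
console.log(JSON.stringify(plan, null, 2));
```

Here `Old/A.md` is reused by page D, so A's move carries `removeOldPath: false`, and only `C.md` lands in `toDelete`.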
/**
* Below this many tracked files the mass-delete fraction guard is not applied
* (a tiny vault where deleting "most" files is normal, e.g. 1-of-2).
*/
export const MASS_DELETE_MIN_EXISTING = 4;
/** Fraction of tracked files above which a delete plan is a suspected wipe. */
export const MASS_DELETE_FRACTION = 0.5;
/**
* Pure decision: should the ABSENCE-based deletions (`plan.toDelete`) be applied
* this cycle? Encapsulates the SPEC §8 safety invariants so they are unit-
* testable without live creds or git:
*
* - `treeComplete === false` (a partial Docmost tree fetch) -> SUPPRESS. A page
* missing from a partial tree is NOT proof of deletion (SPEC §8); we must not
* delete merely-absent files this cycle. (Writes/updates/moves still happen.)
* - The live fetch returned 0 pages while files are tracked -> SUPPRESS
* (almost always a failed fetch, never a real "delete everything").
* - The plan would delete more than `MASS_DELETE_FRACTION` of a non-trivial
* vault -> SUPPRESS as a mass-deletion guard (defense in depth).
*
* Moves are NOT governed by this decision: a moved page IS present in `live`, so
* its old-path removal is real (handled by the caller separately).
*/
export function decideAbsenceDeletions(args) {
const { treeComplete, liveCount, existingCount, deleteCount } = args;
// No tracked files, or nothing to delete -> trivially fine to "apply".
if (existingCount === 0 || deleteCount === 0)
return { apply: true };
if (!treeComplete)
return { apply: false, reason: "incomplete-fetch" };
if (liveCount === 0)
return { apply: false, reason: "empty-live" };
if (existingCount >= MASS_DELETE_MIN_EXISTING &&
deleteCount > existingCount * MASS_DELETE_FRACTION) {
return { apply: false, reason: "mass-delete" };
}
return { apply: true };
}
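
A minimal sketch of how a caller might combine the planner and the deletion guard above. `planSketch` and `decideSketch` are condensed local stand-ins written for this illustration (assumptions, not the package exports), with the thresholds 4 and 0.5 inlined; the page ids and paths are made up.

```javascript
// Condensed stand-in for planReconciliation (illustrative only).
function planSketch(live, existing) {
  const liveByPageId = new Map(live.map((e) => [e.pageId, e.relPath]));
  const liveTargetPaths = new Set(live.map((e) => e.relPath));
  const toDelete = new Set();
  const moved = [];
  for (const ex of existing) {
    const liveRel = liveByPageId.get(ex.pageId);
    if (liveRel === undefined) {
      // Tracked page is absent from live: queue unless a live page reuses the path.
      if (!liveTargetPaths.has(ex.relPath)) toDelete.add(ex.relPath);
    } else if (liveRel !== ex.relPath) {
      moved.push({
        pageId: ex.pageId,
        fromRelPath: ex.relPath,
        toRelPath: liveRel,
        removeOldPath: !liveTargetPaths.has(ex.relPath),
      });
    }
  }
  const toWrite = live.map((e) => ({ pageId: e.pageId, relPath: e.relPath }));
  return { toWrite, toDelete: [...toDelete], moved };
}

// Condensed stand-in for decideAbsenceDeletions (min existing 4, fraction 0.5).
function decideSketch({ treeComplete, liveCount, existingCount, deleteCount }) {
  if (existingCount === 0 || deleteCount === 0) return { apply: true };
  if (!treeComplete) return { apply: false, reason: "incomplete-fetch" };
  if (liveCount === 0) return { apply: false, reason: "empty-live" };
  if (existingCount >= 4 && deleteCount > existingCount * 0.5) {
    return { apply: false, reason: "mass-delete" };
  }
  return { apply: true };
}

// Hypothetical vault: page "a" moved, "b" unchanged, "c" and "d" vanished.
const existing = [
  { pageId: "a", relPath: "Old.md" },
  { pageId: "b", relPath: "B.md" },
  { pageId: "c", relPath: "C.md" },
  { pageId: "d", relPath: "D.md" },
];
const live = [
  { pageId: "a", relPath: "New.md" },
  { pageId: "b", relPath: "B.md" },
];

const plan = planSketch(live, existing);
const decision = decideSketch({
  treeComplete: true,
  liveCount: live.length,
  existingCount: existing.length,
  deleteCount: plan.toDelete.length,
});
console.log(plan.toDelete, plan.moved, decision);
```

In this sketch exactly half of the tracked files are absent, which does not exceed the 0.5 fraction, so the deletions apply. A third absent file (3 of 4) would be suppressed with reason `mass-delete`, and the move of page "a" is unaffected by the decision either way.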

@@ -0,0 +1,21 @@
/**
* Pure, IO-free comparison helpers for the idempotency round-trip checks. The
* round-trip harness that drives these lives in the package's tests, not in the
* engine.
*/
/**
* Recursively strip every `attrs.id` from a ProseMirror node tree. Block ids
* are regenerated by `markdownToProseMirror` (SPEC §11), so they must be
* ignored when comparing the semantic shape of two documents. Returns a NEW
* tree; the input is not mutated.
*/
export declare function stripBlockIds(node: any): any;
/**
* Find the first divergence between two values via a recursive deep compare.
* Returns a short path + the two differing values, or null if they are equal.
*/
export declare function firstDivergence(a: any, b: any, path?: string): {
path: string;
a: any;
b: any;
} | null;

@@ -0,0 +1,70 @@
/**
* Pure, IO-free comparison helpers for the idempotency round-trip checks. The
* round-trip harness that drives these lives in the package's tests, not in the
* engine.
*/
/**
* Recursively strip every `attrs.id` from a ProseMirror node tree. Block ids
* are regenerated by `markdownToProseMirror` (SPEC §11), so they must be
* ignored when comparing the semantic shape of two documents. Returns a NEW
* tree; the input is not mutated.
*/
export function stripBlockIds(node) {
if (Array.isArray(node)) {
return node.map(stripBlockIds);
}
if (node && typeof node === "object") {
const out = {};
for (const key of Object.keys(node)) {
if (key === "attrs" && node.attrs && typeof node.attrs === "object") {
// Drop the `id` attr; keep every other attribute.
const { id, ...rest } = node.attrs;
void id;
out.attrs = stripBlockIds(rest);
}
else {
out[key] = stripBlockIds(node[key]);
}
}
return out;
}
return node;
}
/**
* Find the first divergence between two values via a recursive deep compare.
* Returns a short path + the two differing values, or null if they are equal.
*/
export function firstDivergence(a, b, path = "$") {
if (a === b)
return null;
const ta = typeof a;
const tb = typeof b;
if (ta !== tb || a === null || b === null) {
return { path, a, b };
}
if (ta !== "object") {
return { path, a, b };
}
const aIsArr = Array.isArray(a);
const bIsArr = Array.isArray(b);
if (aIsArr !== bIsArr)
return { path, a, b };
if (aIsArr) {
if (a.length !== b.length) {
return { path: `${path}.length`, a: a.length, b: b.length };
}
for (let i = 0; i < a.length; i++) {
const d = firstDivergence(a[i], b[i], `${path}[${i}]`);
if (d)
return d;
}
return null;
}
const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
for (const k of keys) {
const d = firstDivergence(a[k], b[k], `${path}.${k}`);
if (d)
return d;
}
return null;
}
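
A short sketch of how the two helpers above might be used together in a round-trip check. `stripIdsSketch` and `divergenceSketch` are condensed local stand-ins (assumptions for illustration, not the package exports), and the two documents are invented.

```javascript
// Stand-in for stripBlockIds: copy the tree, dropping every `attrs.id`.
function stripIdsSketch(node) {
  if (Array.isArray(node)) return node.map(stripIdsSketch);
  if (node && typeof node === "object") {
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      if (key === "attrs" && value && typeof value === "object") {
        const { id, ...rest } = value; // `id` is intentionally discarded
        out.attrs = stripIdsSketch(rest);
      } else {
        out[key] = stripIdsSketch(value);
      }
    }
    return out;
  }
  return node;
}

// Stand-in for firstDivergence: path of the first difference, or null.
function divergenceSketch(a, b, path = "$") {
  if (a === b) return null;
  const bothObjects =
    typeof a === "object" && typeof b === "object" && a !== null && b !== null;
  if (!bothObjects || Array.isArray(a) !== Array.isArray(b)) return { path, a, b };
  if (Array.isArray(a)) {
    if (a.length !== b.length) {
      return { path: `${path}.length`, a: a.length, b: b.length };
    }
    for (let i = 0; i < a.length; i++) {
      const d = divergenceSketch(a[i], b[i], `${path}[${i}]`);
      if (d) return d;
    }
    return null;
  }
  for (const k of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const d = divergenceSketch(a[k], b[k], `${path}.${k}`);
    if (d) return d;
  }
  return null;
}

// Hypothetical documents: same heading, regenerated block id.
const before = {
  type: "doc",
  content: [{ type: "heading", attrs: { id: "x1", level: 2 }, content: [{ type: "text", text: "Hi" }] }],
};
const sameShape = JSON.parse(JSON.stringify(before));
sameShape.content[0].attrs.id = "y9";
const changed = JSON.parse(JSON.stringify(sameShape));
changed.content[0].attrs.level = 3;

const idOnly = divergenceSketch(stripIdsSketch(before), stripIdsSketch(sameShape));
const levelDiff = divergenceSketch(stripIdsSketch(before), stripIdsSketch(changed));
console.log(idOnly, levelDiff);
```

With ids stripped, a regenerated block id alone reports no divergence, while the changed heading level is reported at `$.content[0].attrs.level`.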

@@ -0,0 +1,23 @@
/**
* Deterministic filename strategy (SPEC §12).
*
* The file name is COSMETIC — the source of truth for the file<->page link is
* `pageId` / `slugId` inside the meta block, so renaming a file is safe. These
* functions are intentionally dependency-free and pure, so they are trivially
* unit-testable.
*/
/**
* Sanitize a page title into a safe file-name component (WITHOUT extension).
*
* Steps: replace forbidden / control characters with "-", collapse whitespace
* runs to a single space, trim, cap the length, then guard against an empty
* result, an all-dots result, or a reserved Windows device name by prefixing
* with "_".
*/
export declare function sanitizeTitle(title: string): string;
/**
* Disambiguate a sanitized name when two siblings in the same folder collapse
* to the same name. Appends a stable suffix built from the page's `slugId`, so
* the result stays deterministic across runs (SPEC §12: `Title ~slugId`).
*/
export declare function disambiguate(name: string, slugId: string): string;

@@ -0,0 +1,97 @@
/**
* Deterministic filename strategy (SPEC §12).
*
* The file name is COSMETIC — the source of truth for the file<->page link is
* `pageId` / `slugId` inside the meta block, so renaming a file is safe. These
* functions are intentionally dependency-free and pure, so they are trivially
* unit-testable.
*/
// Printable characters forbidden in file names on common filesystems (mainly
// Windows): / \ < > : " | ? *. Each match is replaced with a single "-".
// Spaces are NOT in this set; whitespace is normalized separately below.
// ASCII control characters (code points 0..31) are stripped in a separate pass
// (see stripControlChars) to keep this literal free of embedded control bytes.
const FORBIDDEN_PRINTABLE_RE = /[/\\<>:"|?*]/g;
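// Runs of whitespace collapse to a single space. Tabs and newlines never reach
// this pattern: they are control characters, so stripControlChars has already
// turned them into "-" by the time it runs.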
const WHITESPACE_RUN_RE = /\s+/g;
// Reserved Windows device names (case-insensitive). A bare match (with or
// without an extension) is unusable as a file name, so it is prefixed with "_".
const RESERVED_WINDOWS_NAMES = new Set([
"con",
"prn",
"aux",
"nul",
"com1",
"com2",
"com3",
"com4",
"com5",
"com6",
"com7",
"com8",
"com9",
"lpt1",
"lpt2",
"lpt3",
"lpt4",
"lpt5",
"lpt6",
"lpt7",
"lpt8",
"lpt9",
]);
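// Cap on the sanitized length, counted in UTF-16 code units (`String.length`),
// chosen to leave room for an extension and a disambiguation suffix within
// filesystem path-component limits (255 bytes on most FSes). It is not a byte
// budget: a title made of multi-byte characters can still exceed 255 bytes
// once encoded.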
const MAX_LENGTH = 120;
/**
* Replace every ASCII control character (code points 0..31) with "-". Done by
* scanning code points rather than a control-range regex literal, so the source
* file carries no embedded control bytes.
*/
function stripControlChars(input) {
let out = "";
for (let i = 0; i < input.length; i++) {
out += input.charCodeAt(i) < 32 ? "-" : input[i];
}
return out;
}
/**
* Sanitize a page title into a safe file-name component (WITHOUT extension).
*
* Steps: replace forbidden / control characters with "-", collapse whitespace
* runs to a single space, trim, cap the length, then guard against an empty
* result, an all-dots result, or a reserved Windows device name by prefixing
* with "_".
*/
export function sanitizeTitle(title) {
let name = stripControlChars(title ?? "")
.replace(FORBIDDEN_PRINTABLE_RE, "-")
.replace(WHITESPACE_RUN_RE, " ")
.trim();
if (name.length > MAX_LENGTH) {
name = name.slice(0, MAX_LENGTH).trim();
}
// Compare the base name (before the first dot) against reserved names, so
// both "CON" and "con.md" are caught.
const base = name.split(".")[0]?.toLowerCase() ?? "";
// A name that is empty, consists only of dots ("." / ".." / "..."), or is a
// reserved Windows device name is unusable as a path component. The all-dots
// case is a path-traversal hazard in particular: an unprefixed ".." would
// become a parent-directory segment and let a page escape the vault, so it
// MUST be neutralized here (becomes "_..", which is a literal file name).
if (name.length === 0 ||
/^\.+$/.test(name) ||
RESERVED_WINDOWS_NAMES.has(base)) {
name = "_" + name;
}
return name;
}
/**
* Disambiguate a sanitized name when two siblings in the same folder collapse
* to the same name. Appends a stable suffix built from the page's `slugId`, so
* the result stays deterministic across runs (SPEC §12: `Title ~slugId`).
*/
export function disambiguate(name, slugId) {
return `${name} ~${slugId}`;
}
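
A few worked inputs for the sanitizing steps above, as a hedged sketch. `sanitizeSketch` and `disambiguateSketch` are condensed local stand-ins (assumptions, not the package exports), the reserved-name set is abbreviated, and the titles and slug id are invented.

```javascript
// Abbreviated reserved-name list, for the sketch only.
const RESERVED_SKETCH = new Set(["con", "prn", "aux", "nul"]);
const FORBIDDEN_SKETCH = /[/\\<>:"|?*]/g;

// Stand-in for sanitizeTitle, following the same order of steps.
function sanitizeSketch(title) {
  let name = "";
  for (const ch of title ?? "") {
    // Control characters (code points 0..31) become "-" first.
    name += ch.codePointAt(0) < 32 ? "-" : ch;
  }
  name = name.replace(FORBIDDEN_SKETCH, "-").replace(/\s+/g, " ").trim();
  if (name.length > 120) name = name.slice(0, 120).trim();
  const base = name.split(".")[0].toLowerCase();
  if (name.length === 0 || /^\.+$/.test(name) || RESERVED_SKETCH.has(base)) {
    name = "_" + name;
  }
  return name;
}

// Stand-in for disambiguate.
function disambiguateSketch(name, slugId) {
  return `${name} ~${slugId}`;
}

const samples = ["Q3: plan/draft?", "  a   b ", "a\tb", "..", "con.md", ""];
for (const s of samples) {
  console.log(JSON.stringify(s), "->", JSON.stringify(sanitizeSketch(s)));
}
console.log(disambiguateSketch(sanitizeSketch("Notes"), "x1y2"));
```

Note the tab case: because control characters are replaced before whitespace is collapsed, `"a\tb"` comes out as `"a-b"` rather than `"a b"`.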

@@ -0,0 +1,41 @@
/**
* Engine settings.
*
* The engine is driven IN-PROCESS by the NestJS server, which builds the
* `Settings` object from `EnvironmentService` — so this module must NOT reach
* into `process.env`. It exposes only:
* - the `Settings` type the engine consumes, and
* - `parseSettings(env)` as a PURE function (validate a raw env object -> typed
* `Settings`), kept for unit tests and for the server to reuse if it wants
* to validate an env-shaped object.
* There is no `.env`-loading side-effecting entry point.
*/
import { z } from 'zod';
export declare const envSchema: z.ZodObject<{
DOCMOST_API_URL: z.ZodString;
DOCMOST_EMAIL: z.ZodString;
DOCMOST_PASSWORD: z.ZodString;
DOCMOST_SPACE_ID: z.ZodString;
VAULT_PATH: z.ZodDefault<z.ZodString>;
GIT_REMOTE: z.ZodPipe<z.ZodTransform<unknown, unknown>, z.ZodOptional<z.ZodString>>;
POLL_INTERVAL_MS: z.ZodDefault<z.ZodCoercedNumber<unknown>>;
DEBOUNCE_MS: z.ZodDefault<z.ZodCoercedNumber<unknown>>;
LOG_LEVEL: z.ZodDefault<z.ZodEnum<{
info: "info";
error: "error";
debug: "debug";
warn: "warn";
}>>;
}, z.core.$strip>;
export type Settings = {
docmostApiUrl: string;
docmostEmail: string;
docmostPassword: string;
docmostSpaceId: string;
vaultPath: string;
gitRemote?: string;
pollIntervalMs: number;
debounceMs: number;
logLevel: 'debug' | 'info' | 'warn' | 'error';
};
export declare function parseSettings(env: NodeJS.ProcessEnv): Settings;

@@ -0,0 +1,49 @@
/**
* Engine settings.
*
* The engine is driven IN-PROCESS by the NestJS server, which builds the
* `Settings` object from `EnvironmentService` — so this module must NOT reach
* into `process.env`. It exposes only:
* - the `Settings` type the engine consumes, and
* - `parseSettings(env)` as a PURE function (validate a raw env object -> typed
* `Settings`), kept for unit tests and for the server to reuse if it wants
* to validate an env-shaped object.
* There is no `.env`-loading side-effecting entry point.
*/
import { z } from 'zod';
// Schema keyed by the real ENV variable names so validation errors name the
// exact variable. Credentials and the address of our OWN Docmost instance have
// NO default — a missing value must fail at startup, never silently fall back.
export const envSchema = z.object({
// Docmost connection — address of our own instance, no default.
DOCMOST_API_URL: z.string().url(),
// Credentials for /auth/login — no default, never hardcoded.
DOCMOST_EMAIL: z.string().min(1),
DOCMOST_PASSWORD: z.string().min(1),
// Which Docmost space to mirror.
DOCMOST_SPACE_ID: z.string().min(1),
// Local git vault (state store) — kept under data/ so the volume persists it.
VAULT_PATH: z.string().min(1).default('data/vault'),
// Optional git remote the vault pushes to. Empty string is treated as unset.
GIT_REMOTE: z.preprocess((v) => (v === '' ? undefined : v), z.string().min(1).optional()),
// Non-secret tunables — sensible defaults are fine.
POLL_INTERVAL_MS: z.coerce.number().int().positive().default(15000),
DEBOUNCE_MS: z.coerce.number().int().positive().default(2000),
LOG_LEVEL: z.enum(['debug', 'info', 'warn', 'error']).default('info'),
});
// Pure: validate a raw environment object and map it to a typed Settings.
// Throws ZodError on bad config. No side effects — safe to import in tests.
export function parseSettings(env) {
const e = envSchema.parse(env);
return {
docmostApiUrl: e.DOCMOST_API_URL,
docmostEmail: e.DOCMOST_EMAIL,
docmostPassword: e.DOCMOST_PASSWORD,
docmostSpaceId: e.DOCMOST_SPACE_ID,
vaultPath: e.VAULT_PATH,
gitRemote: e.GIT_REMOTE,
pollIntervalMs: e.POLL_INTERVAL_MS,
debounceMs: e.DEBOUNCE_MS,
logLevel: e.LOG_LEVEL,
};
}

@@ -0,0 +1,41 @@
/**
* Meta object as `exportPageBody` builds it (SPEC §4). Kept byte-for-byte
* compatible so files produced here match `exportPageBody`'s output exactly.
*/
export interface PageMeta {
version: 1;
pageId: string;
slugId: string;
title: string;
spaceId: string;
parentPageId: string | null;
}
/**
* Produce the self-contained `.md` file text for a page from its raw
* ProseMirror `content` + identity meta, in the verified fixpoint form.
*
* md1 = convertProseMirrorToMarkdown(content)
* doc2 = markdownToProseMirror(md1) // one import...
* stableBody = convertProseMirrorToMarkdown(doc2) // ...and re-export
* file = serializeDocmostMarkdownBody(meta, stableBody)
*
* The single export->import->export pass is the verified fixpoint (SPEC §11):
* idempotent for already-stable content, and the convergence point for the
* known converter asymmetries.
*/
export declare function stabilizePageFile(content: unknown, meta: PageMeta): Promise<string>;
/**
* The fixpoint markdown BODY for a page's ProseMirror `content`, WITHOUT any meta
* envelope:
*
* md1 = convertProseMirrorToMarkdown(content) // export...
* doc2 = markdownToProseMirror(md1) // ...import...
* stableBody = convertProseMirrorToMarkdown(doc2) // ...re-export
*
* The single export->import->export pass is the verified fixpoint (SPEC §11):
* idempotent for already-stable content, and the convergence point for the known
* converter asymmetries. The native-Obsidian writer (`serializePageFile`) wraps
* this body with a minimal `gitmost_id` frontmatter; determinism here is what
* keeps re-pulls of an unchanged page byte-identical (no churn, loop-guard).
*/
export declare function stabilizePageBody(content: unknown): Promise<string>;

@@ -0,0 +1,52 @@
/**
* Normalize-on-write helper (SPEC §11 "Resolution", "Резолюция" in the spec).
*
* git diffs byte-for-byte, so writing a page in a NON-fixpoint markdown form
* would make the next pull re-export it to a slightly different (but stable)
* form and produce a phantom diff -> churny commits. The converter has a couple
* of known one-pass asymmetries (a block image after a paragraph adds an empty
* paragraph; a diagram materializes `data-align`), all of which converge to a
* fixpoint after ONE `export -> import -> export` round-trip.
*
* So at write time we run exactly that one pass and persist the fixpoint form.
* Already-stable content is unaffected (the pass is idempotent), so re-pulls of
* unchanged pages produce identical bytes and git sees no diff.
*/
import { convertProseMirrorToMarkdown, markdownToProseMirror, serializeDocmostMarkdownBody, } from "../lib/index.js";
/**
* Produce the self-contained `.md` file text for a page from its raw
* ProseMirror `content` + identity meta, in the verified fixpoint form.
*
* md1 = convertProseMirrorToMarkdown(content)
* doc2 = markdownToProseMirror(md1) // one import...
* stableBody = convertProseMirrorToMarkdown(doc2) // ...and re-export
* file = serializeDocmostMarkdownBody(meta, stableBody)
*
* The single export->import->export pass is the verified fixpoint (SPEC §11):
* idempotent for already-stable content, and the convergence point for the
* known converter asymmetries.
*/
export async function stabilizePageFile(content, meta) {
// The meta shape is exactly what `exportPageBody` writes. The lib's
// DocmostMdMeta is a superset with optional fields, so `meta` is accepted by
// the serializer as-is (the cast lives in the TypeScript source only).
return serializeDocmostMarkdownBody(meta, await stabilizePageBody(content));
}
/**
* The fixpoint markdown BODY for a page's ProseMirror `content`, WITHOUT any meta
* envelope:
*
* md1 = convertProseMirrorToMarkdown(content) // export...
* doc2 = markdownToProseMirror(md1) // ...import...
* stableBody = convertProseMirrorToMarkdown(doc2) // ...re-export
*
* The single export->import->export pass is the verified fixpoint (SPEC §11):
* idempotent for already-stable content, and the convergence point for the known
* converter asymmetries. The native-Obsidian writer (`serializePageFile`) wraps
* this body with a minimal `gitmost_id` frontmatter; determinism here is what
* keeps re-pulls of an unchanged page byte-identical (no churn, loop-guard).
*/
export async function stabilizePageBody(content) {
const md1 = convertProseMirrorToMarkdown(content);
const doc2 = await markdownToProseMirror(md1);
return convertProseMirrorToMarkdown(doc2);
}
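
The fixpoint idea behind the export -> import -> export pass can be illustrated with a toy converter pair. Everything below is hypothetical: `toyExport` and `toyImport` are not the real converters, they only mimic one asymmetry of the kind described above (the importer materializes a default `align` that the source omitted).

```javascript
// Toy exporter: emits `align` only when the node carries it.
function toyExport(doc) {
  return doc
    .map((n) => (n.align ? `[diagram align=${n.align}]` : "[diagram]"))
    .join("\n");
}

// Toy importer: always materializes the default `align: "center"`.
function toyImport(md) {
  return md.split("\n").map((line) => {
    const m = /^\[diagram(?: align=(\w+))?\]$/.exec(line);
    return { type: "diagram", align: m && m[1] ? m[1] : "center" };
  });
}

// One export -> import -> export pass, mirroring the shape of stabilizePageBody.
function toyStabilize(doc) {
  return toyExport(toyImport(toyExport(doc)));
}

const raw = [{ type: "diagram" }]; // source omitted `align`
const md1 = toyExport(raw); // naive single export
const stable = toyStabilize(raw); // fixpoint form
const again = toyStabilize(toyImport(stable)); // re-pull of the stored form

console.log({ md1, stable, again });
```

In this toy the naive export (`md1`) differs from what a later re-export would produce, which is the phantom diff the helper avoids. Storing `stable` instead means the next pass yields the same bytes (`again === stable`).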

packages/git-sync/build/index.d.ts vendored Normal file

@@ -0,0 +1,31 @@
/**
* Public surface of `@docmost/git-sync`.
*
* Exposes the pure converter (markdown <-> ProseMirror, file envelope,
* canonicalization) and the sync engine (reconcile planner, vault layout,
* pull/push, the git wrapper, and the settings parser) that the gitmost server
* drives in-process.
*/
export { serializeDocmostMarkdown, serializeDocmostMarkdownBody, parseDocmostMarkdown, convertProseMirrorToMarkdown, markdownToProseMirror, canonicalizeContent, docsCanonicallyEqual, } from "./lib/index.js";
export type { DocmostMdMeta } from "./lib/index.js";
export { planReconciliation, decideAbsenceDeletions, MASS_DELETE_MIN_EXISTING, MASS_DELETE_FRACTION, } from "./engine/reconcile.js";
export type { LiveEntry, ExistingEntry, WriteEntry, MovedEntry, ReconciliationPlan, DeletionDecision, } from "./engine/reconcile.js";
export { buildVaultLayout } from "./engine/layout.js";
export type { PageNode, VaultEntry } from "./engine/layout.js";
export { sanitizeTitle, disambiguate } from "./engine/sanitize.js";
export { stabilizePageFile } from "./engine/stabilize.js";
export type { PageMeta } from "./engine/stabilize.js";
export { bodyHash } from "./engine/loop-guard.js";
export type { GitSyncClient, GitSyncPageNodeLite } from "./engine/client.types.js";
export { VaultGit, vaultGitEnv, buildCommitMessage, BOT_AUTHOR_NAME, BOT_AUTHOR_EMAIL, DEFAULT_BRANCH, } from "./engine/git.js";
export type { DiffEntry, MergeResult, CommitOptions } from "./engine/git.js";
export { readExisting, computePullActions, applyPullActions, } from "./engine/pull.js";
export type { ReadExistingDeps, PullActionsInput, PullActions, ApplyPullActionsDeps, ApplyResult, } from "./engine/pull.js";
export { classifyRenameMoves, computePushActions, applyPushActions, runPush, parentFolderFile, parseArgs, LAST_PUSHED_REF, DOCMOST_BRANCH, LOCAL_AUTHOR_NAME, LOCAL_AUTHOR_EMAIL, LOCAL_SOURCE_TRAILER, } from "./engine/push.js";
export type { CreateAction, UpdateAction, DeleteAction, RenameMoveAction, RenameMoveActionClassified, ClassifyRenameMovesDeps, PushActions, PushActionsInput, MetaSide, ApplyPushDeps, WrittenBackPage, PushedPageRecord, PushFailure, PushNoop, ApplyPushResult, PushDeps, PushRunResult, PushParsedArgs, } from "./engine/push.js";
export { parseSettings, envSchema } from "./engine/settings.js";
export type { Settings } from "./engine/settings.js";
export { loadSettingsOrExit } from "./engine/config-errors.js";
export { runCycle } from "./engine/cycle.js";
export type { RunCycleDeps, RunCycleResult, CycleFs, } from "./engine/cycle.js";
export { parsePageFile, serializePageFile } from "./lib/page-file.js";

@@ -0,0 +1,24 @@
/**
* Public surface of `@docmost/git-sync`.
*
* Exposes the pure converter (markdown <-> ProseMirror, file envelope,
* canonicalization) and the sync engine (reconcile planner, vault layout,
* pull/push, the git wrapper, and the settings parser) that the gitmost server
* drives in-process.
*/
// Pure converter (markdown <-> ProseMirror, file envelope, canonicalization).
export { serializeDocmostMarkdown, serializeDocmostMarkdownBody, parseDocmostMarkdown, convertProseMirrorToMarkdown, markdownToProseMirror, canonicalizeContent, docsCanonicallyEqual, } from "./lib/index.js";
// Pure engine (no IO): reconcile planner, vault layout, sanitize, stabilize,
// loop-guard body hash.
export { planReconciliation, decideAbsenceDeletions, MASS_DELETE_MIN_EXISTING, MASS_DELETE_FRACTION, } from "./engine/reconcile.js";
export { buildVaultLayout } from "./engine/layout.js";
export { sanitizeTitle, disambiguate } from "./engine/sanitize.js";
export { stabilizePageFile } from "./engine/stabilize.js";
export { bodyHash } from "./engine/loop-guard.js";
// Rest of the sync engine: git wrapper, pull/push, settings, cycle runner, and
// the page-file reader/writer.
export { VaultGit, vaultGitEnv, buildCommitMessage, BOT_AUTHOR_NAME, BOT_AUTHOR_EMAIL, DEFAULT_BRANCH, } from "./engine/git.js";
export { readExisting, computePullActions, applyPullActions, } from "./engine/pull.js";
export { classifyRenameMoves, computePushActions, applyPushActions, runPush, parentFolderFile, parseArgs, LAST_PUSHED_REF, DOCMOST_BRANCH, LOCAL_AUTHOR_NAME, LOCAL_AUTHOR_EMAIL, LOCAL_SOURCE_TRAILER, } from "./engine/push.js";
export { parseSettings, envSchema } from "./engine/settings.js";
export { loadSettingsOrExit } from "./engine/config-errors.js";
export { runCycle } from "./engine/cycle.js";
export { parsePageFile, serializePageFile } from "./lib/page-file.js";

@@ -0,0 +1,38 @@
/**
* Semantic canonicalization of ProseMirror/TipTap documents for the round-trip
* idempotency check (SPEC §11, "Task No. 0" ("Задача №0"), option (b): compare a
* CANONICALIZED form rather than raw bytes).
*
* `markdownToProseMirror` reconstructs schema DEFAULT attributes (e.g.
* `indent: null` where the source omitted it) and regenerates per-block ids on
* every import. A raw deep-equal of the source doc against the re-imported doc
* therefore diverges even when the two are semantically identical. This module
* normalizes a document so that two semantically-equal docs compare deep-equal
* regardless of block ids and absent-vs-explicit-default-null attributes.
*
* It is a self-contained module with no external dependencies.
*/
/**
* Return a DEEP COPY of a ProseMirror node tree, canonicalized so that two
* semantically-equal documents compare deep-equal. Rules (applied recursively
* to the node, its `content`, and its `marks`):
*
* 1. Remove node-level `attrs.id` (regenerated on import). Mark attrs are NOT
* touched for `id` (marks carry no block id; only their meaningful attrs).
* 2. In any `attrs` object (node OR mark) drop keys whose value is `null`/
* `undefined` (absent ≡ explicit default null) OR equals that node/mark
* type's known non-null schema default (absent ≡ explicit default).
* Keep every non-default value. The type is passed into the attrs
* normalizer so it can look up `KNOWN_DEFAULTS`.
* 3. If an `attrs` object becomes empty after pruning, drop the `attrs` key.
* 4. Preserve `marks` (including the `comment` mark and its `commentId` — a
* meaningful anchor per SPEC §3; never strip it).
* 5. Preserve `text`, `type`, and `content` order exactly.
* 6. Never mutate the input.
*/
export declare function canonicalizeContent(node: any): any;
/**
* True when two ProseMirror documents are semantically equal: equal after
* canonicalization (block ids stripped, absent-vs-default-null normalized).
*/
export declare function docsCanonicallyEqual(a: any, b: any): boolean;

@@ -0,0 +1,245 @@
/**
* Semantic canonicalization of ProseMirror/TipTap documents for the round-trip
* idempotency check (SPEC §11, "Task No. 0" ("Задача №0"), option (b): compare a
* CANONICALIZED form rather than raw bytes).
*
* `markdownToProseMirror` reconstructs schema DEFAULT attributes (e.g.
* `indent: null` where the source omitted it) and regenerates per-block ids on
* every import. A raw deep-equal of the source doc against the re-imported doc
* therefore diverges even when the two are semantically identical. This module
* normalizes a document so that two semantically-equal docs compare deep-equal
* regardless of block ids and absent-vs-explicit-default-null attributes.
*
* It is a self-contained module with no external dependencies.
*/
/**
* Known NON-NULL schema defaults that `markdownToProseMirror` materializes on
* import, keyed by node/mark type → { attr: defaultValue }.
*
* Why this exists: `canonicalizeAttrs` already treats an absent attr as
* equivalent to an explicit `null`/`undefined`. But several Docmost schema
* attributes default to a NON-null value, so import fills them in even when the
* source omitted them — making "attr absent" diverge from "attr at its default
* value" under a raw deep-equal. To keep "absent ≡ explicit-default", we ALSO
* drop any attr whose value equals its known schema default. A non-default
* value (e.g. `orderedList.start: 5`) is NOT a default, so it is KEPT.
*
* Every entry below was read from `packages/docmost-client/src/lib/
* docmost-schema.ts` (the line refs are the exact `default:` declarations) and
* confirmed to be materialized by an export→import→export round-trip:
* - mark `link` target / rel — DocmostAttributes + StarterKit link.
* StarterKit's link extension defaults `target: "_blank"` and
* `rel: "noopener noreferrer nofollow"`; both materialize on import
* (empirically confirmed) even when the source had only `href`.
* - mark `comment` resolved — docmost-schema.ts L213-214 (`default: false`).
* - node `orderedList` start — provided by StarterKit's orderedList
* (`default: 1`); materializes on import (empirically confirmed).
* - node `drawio`/`excalidraw`/`video`/`youtube`/`embed` align — the diagram
* attribute set and the media nodes declare `align: { default: "center" }`
* (docmost-schema.ts L745-750 diagramAttributes; L564 video; L626 youtube;
* L667 embed). The diagram `align` is the one the round-trip materializes
* (docmost-schema.ts L745); the media/embed entries normalize the SAME
* `align` default for consistency. Note: this only normalizes `align` —
* full canonical stability of `embed` is separately limited by the
* converter coercing numeric `width`/`height` to strings, which is outside
* canonicalize's scope.
*
* NOTE: `image` has NO non-null align default — its `align` defaults to `null`
* (docmost-schema.ts L174), so it is already handled by the null-drop rule and
* is intentionally NOT listed here.
*/
const KNOWN_DEFAULTS = {
// mark types
link: {
target: "_blank",
rel: "noopener noreferrer nofollow",
},
comment: {
resolved: false,
},
// node types
orderedList: {
start: 1,
},
drawio: {
align: "center",
},
excalidraw: {
align: "center",
},
video: {
align: "center",
},
youtube: {
align: "center",
},
embed: {
align: "center",
},
};
/**
* Build a pruned copy of an `attrs` object (the input is not mutated): drop
* keys whose value is `null` or `undefined` (an absent attribute and an
* explicit default of `null` are semantically equivalent here). When `dropId`
* is set, also drop a node-level `id` (block ids are regenerated on import,
* SPEC §11). ALSO drop any attr whose
* value equals the node/mark `type`'s known NON-null schema default
* (`KNOWN_DEFAULTS`), so "attr absent" ≡ "attr at its default value" — without
* this, the import-materialized `link.target`/`comment.resolved`/
* `orderedList.start`/diagram `align` defaults would be a phantom diff. Every
* non-default attribute value is KEPT (level, language, src, href, commentId,
* width, a non-default `start`/`align`, ...).
*
* Returns the pruned attrs object, or `undefined` if nothing meaningful is
* left (so the caller can drop the `attrs` key entirely: `{attrs:{}}` ≡ no
* attrs).
*/
function canonicalizeAttrs(attrs, dropId, type) {
const defaults = type ? KNOWN_DEFAULTS[type] : undefined;
const out = {};
// Stable key order so a JSON.stringify of the canonical form is comparable
// regardless of the input's key order.
for (const key of Object.keys(attrs).sort()) {
// Block ids are regenerated on import; drop them on NODE attrs only.
if (dropId && key === "id")
continue;
const value = attrs[key];
// Absent ≡ explicit-default-null/undefined.
if (value === null || value === undefined)
continue;
// Absent ≡ explicit known non-null default (e.g. link.target="_blank").
// A non-default value (e.g. orderedList.start=5) does NOT match, so it is
// kept. The `comment` mark's `commentId` is never a default, so it always
// survives (SPEC §3); only its `resolved: false` default is normalized away.
if (defaults && key in defaults && value === defaults[key])
continue;
out[key] = value;
}
return Object.keys(out).length > 0 ? out : undefined;
}
/**
* Return a DEEP COPY of a ProseMirror node tree, canonicalized so that two
* semantically-equal documents compare deep-equal. Rules (applied recursively
* to the node, its `content`, and its `marks`):
*
* 1. Remove node-level `attrs.id` (regenerated on import). Mark attrs are NOT
* touched for `id` (marks carry no block id; only their meaningful attrs).
* 2. In any `attrs` object (node OR mark) drop keys whose value is `null`/
* `undefined` (absent ≡ explicit default null) OR equals that node/mark
* type's known non-null schema default (absent ≡ explicit default).
* Keep every non-default value. The type is passed into the attrs
* normalizer so it can look up `KNOWN_DEFAULTS`.
* 3. If an `attrs` object becomes empty after pruning, drop the `attrs` key.
* 4. Preserve `marks` (including the `comment` mark and its `commentId` — a
* meaningful anchor per SPEC §3; never strip it).
* 5. Preserve `text`, `type`, and `content` order exactly.
* 6. Never mutate the input.
*/
export function canonicalizeContent(node) {
if (Array.isArray(node)) {
return node.map((child) => canonicalizeContent(child));
}
if (node === null || typeof node !== "object") {
// Primitive leaf (string/number/boolean/null): returned as-is.
return node;
}
// A mark cannot be told apart from a node by shape alone (both carry `type`
// and optional `attrs`), so the distinction is made at the recursion site:
// `attrs` reached here are node attrs (`dropId = true`), while entries of a
// `marks` array go through `canonicalizeMark` (`dropId = false`).
const out = {};
for (const key of Object.keys(node)) {
if (key === "attrs" && node.attrs && typeof node.attrs === "object") {
// Node-level attrs: drop the block id, null/undefined attrs, and any
// attr at this node type's known non-null schema default.
const canon = canonicalizeAttrs(node.attrs, true, typeof node.type === "string" ? node.type : undefined);
if (canon !== undefined)
out.attrs = canon;
// else: drop the `attrs` key entirely (rule 3).
}
else if (key === "marks" && Array.isArray(node.marks)) {
// Marks: keep them all (incl. comment); canonicalize their attrs but do
// NOT drop `id` (a mark's `id` would be a meaningful attr, not a block
// id). An empty marks array is dropped so `marks:[]` ≡ no marks.
const marks = node.marks.map((mark) => canonicalizeMark(mark));
if (marks.length > 0)
out.marks = marks;
}
else {
out[key] = canonicalizeContent(node[key]);
}
}
return out;
}
/**
* Canonicalize a single mark: keep `type`, prune its `attrs` (null/undefined
* AND known non-null defaults dropped, empty attrs removed) but NEVER drop a
* mark's attribute as a "block id" — marks have no block id, only meaningful
* attrs (href, commentId, color, level, ...). Meaningful NON-default attrs
* survive (the `comment` mark's `commentId` is never a default, so it always
* survives — SPEC §3); only known defaults like `link.target="_blank"`,
* `link.rel="noopener…"` and `comment.resolved=false` are normalized away.
*/
function canonicalizeMark(mark) {
if (mark === null || typeof mark !== "object")
return mark;
const out = {};
for (const key of Object.keys(mark)) {
if (key === "attrs" && mark.attrs && typeof mark.attrs === "object") {
const canon = canonicalizeAttrs(mark.attrs, false, typeof mark.type === "string" ? mark.type : undefined);
if (canon !== undefined)
out.attrs = canon;
}
else {
out[key] = canonicalizeContent(mark[key]);
}
}
return out;
}
/**
* Deep structural equality of two values that is key-order-insensitive.
* Used to compare canonical forms. (`canonicalizeContent` already emits
* `attrs` in a stable key order, but the top-level node keys preserve input
* order, so we compare structurally rather than by string.)
*/
function deepEqual(a, b) {
if (a === b)
return true;
if (typeof a !== typeof b)
return false;
if (a === null || b === null)
return a === b;
if (typeof a !== "object")
return false;
const aIsArr = Array.isArray(a);
const bIsArr = Array.isArray(b);
if (aIsArr !== bIsArr)
return false;
if (aIsArr) {
if (a.length !== b.length)
return false;
for (let i = 0; i < a.length; i++) {
if (!deepEqual(a[i], b[i]))
return false;
}
return true;
}
const aKeys = Object.keys(a);
const bKeys = Object.keys(b);
if (aKeys.length !== bKeys.length)
return false;
for (const k of aKeys) {
if (!Object.prototype.hasOwnProperty.call(b, k))
return false;
if (!deepEqual(a[k], b[k]))
return false;
}
return true;
}
/**
* True when two ProseMirror documents are semantically equal: equal after
* canonicalization (block ids stripped, absent-vs-default-null normalized).
*/
export function docsCanonicallyEqual(a, b) {
return deepEqual(canonicalizeContent(a), canonicalizeContent(b));
}
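
A minimal sketch of why the canonical forms are compared structurally rather than as strings. `deepEqualSketch` is a hypothetical standalone copy of the idea behind `deepEqual` above, and the sample nodes are invented for illustration:

```javascript
// Sketch only: key-order-insensitive structural equality, in the spirit of
// deepEqual above. Not the package's exported API.
function deepEqualSketch(a, b) {
  if (a === b) return true;
  if (typeof a !== typeof b) return false;
  if (a === null || b === null) return false; // both-null was handled by a === b
  if (typeof a !== "object") return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  if (Array.isArray(a)) {
    // Arrays stay order-sensitive: element i must match element i.
    return a.length === b.length && a.every((v, i) => deepEqualSketch(v, b[i]));
  }
  const aKeys = Object.keys(a);
  if (aKeys.length !== Object.keys(b).length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && deepEqualSketch(a[k], b[k]),
  );
}

// Same node, different key order (invented sample data).
const left = { type: "paragraph", attrs: { textAlign: "left", indent: 1 } };
const right = { attrs: { indent: 1, textAlign: "left" }, type: "paragraph" };

const sameStructure = deepEqualSketch(left, right); // true
const sameString = JSON.stringify(left) === JSON.stringify(right); // false
```

A string comparison would report these two nodes as different only because their keys were written in a different order, which is the false positive the structural comparison avoids.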

packages/git-sync/build/lib/diff.d.ts vendored Normal file

@@ -0,0 +1,54 @@
/**
* Headless, Docmost-equivalent document diff.
*
* Docmost's history editor computes a change set with the exact pipeline below
* (recreateTransform -> ChangeSet.addSteps -> simplifyChanges) and renders it as
* editor decorations. This module runs the SAME computation but serializes the
* result to text + integrity counts instead of decorations, so a diff can be
* previewed without a browser.
*
* recreateTransform here comes from @fellow/prosemirror-recreate-transform, the
* maintained published fork of the MIT prosemirror-recreate-steps source that
* Docmost vendors in @docmost/editor-ext; it exposes the identical
* recreateTransform(fromDoc, toDoc, { complexSteps, wordDiffs, simplifyDiff })
* signature.
*
* If recreateTransform / the changeset throws on a pathological document pair,
* we fall back to a coarse block-level text diff so the tool never hard-fails.
*/
/** A single inserted/deleted change with its containing-block context. */
export interface DiffChange {
op: "insert" | "delete";
/** Lead (plain) text of the block that contains the change, for context. */
block: string;
/** The inserted or deleted text. */
text: string;
}
/** Integrity counts as [old, new] tuples; footnoteMarkers as [oldList, newList]. */
export interface DiffIntegrity {
images: [number, number];
links: [number, number];
tables: [number, number];
callouts: [number, number];
footnoteMarkers: [number[], number[]];
}
export interface DiffResult {
summary: {
inserted: number;
deleted: number;
blocksChanged: number;
};
integrity: DiffIntegrity;
changes: DiffChange[];
/** Human-readable unified-ish summary. */
markdown: string;
}
/**
* Diff two ProseMirror JSON documents the way Docmost's history editor does and
* serialize the result to text + integrity counts.
*
* @param oldDocJson the earlier document
* @param newDocJson the later document
* @param notesHeading heading delimiting body from notes for footnote counting
*/
export declare function diffDocs(oldDocJson: any, newDocJson: any, notesHeading?: string): DiffResult;
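
One way a caller might consume the `DiffIntegrity` tuples declared above is to flag counters that shrank between the old and new document. This is a sketch under assumptions: `integrityRegressions` is a hypothetical helper, not part of the package, and the sample object is invented to match the declared shape.

```javascript
// Hypothetical helper: report integrity counters that dropped, plus any
// change in the ordered footnote marker list.
function integrityRegressions(integrity) {
  const problems = [];
  for (const key of ["images", "links", "tables", "callouts"]) {
    const [before, after] = integrity[key];
    if (after < before) problems.push(`${key}: ${before} -> ${after}`);
  }
  const [oldMarkers, newMarkers] = integrity.footnoteMarkers;
  if (oldMarkers.join(",") !== newMarkers.join(",")) {
    problems.push(
      `footnoteMarkers: [${oldMarkers.join(", ")}] -> [${newMarkers.join(", ")}]`,
    );
  }
  return problems;
}

// Invented sample shaped like DiffIntegrity: one image and footnote [2] lost.
const sampleIntegrity = {
  images: [2, 1],
  links: [3, 3],
  tables: [1, 1],
  callouts: [0, 1],
  footnoteMarkers: [[1, 2, 3], [1, 3]],
};
const problems = integrityRegressions(sampleIntegrity);
```

Here only shrinking counts are treated as suspicious; whether a growing count should also be flagged is a policy choice left to the caller.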


@@ -0,0 +1,273 @@
/**
* Headless, Docmost-equivalent document diff.
*
* Docmost's history editor computes a change set with the exact pipeline below
* (recreateTransform -> ChangeSet.addSteps -> simplifyChanges) and renders it as
* editor decorations. This module runs the SAME computation but serializes the
* result to text + integrity counts instead of decorations, so a diff can be
* previewed without a browser.
*
* recreateTransform here comes from @fellow/prosemirror-recreate-transform, the
* maintained published fork of the MIT prosemirror-recreate-steps source that
* Docmost vendors in @docmost/editor-ext; it exposes the identical
* recreateTransform(fromDoc, toDoc, { complexSteps, wordDiffs, simplifyDiff })
* signature.
*
* If recreateTransform / the changeset throws on a pathological document pair,
* we fall back to a coarse block-level text diff so the tool never hard-fails.
*/
import { getSchema } from "@tiptap/core";
import { Node } from "@tiptap/pm/model";
import { ChangeSet, simplifyChanges } from "@tiptap/pm/changeset";
import { recreateTransform } from "@fellow/prosemirror-recreate-transform";
import { docmostExtensions } from "./docmost-schema.js";
/** Build the schema once; it is pure and reused across calls. */
const schema = getSchema(docmostExtensions);
/** Recursively concatenate the plain text of a JSON node. */
function plainText(node) {
if (!node || typeof node !== "object")
return "";
let out = "";
if (typeof node.text === "string")
out += node.text;
if (Array.isArray(node.content)) {
for (const child of node.content)
out += plainText(child);
}
return out;
}
/** Count nodes in a JSON doc that satisfy `pred` (recursive). */
function countNodes(doc, pred) {
let n = 0;
const visit = (node) => {
if (!node || typeof node !== "object")
return;
if (pred(node))
n++;
if (Array.isArray(node.content))
for (const c of node.content)
visit(c);
};
visit(doc);
return n;
}
/**
 * Count UNIQUE links in a JSON doc by their `href`. A single link can be split
 * across several adjacent text runs (e.g. a "link+bold" run followed by a "link"
 * run); counting link-bearing runs would over-count it. Walking the tree and
 * collecting hrefs into a Set counts each distinct href once (so two separate
 * links to the same URL also count as one). Link marks with a missing/empty
 * href are bucketed under a single "" key so a malformed link is still counted
 * as one.
 */
function countUniqueLinks(doc) {
const hrefs = new Set();
const visit = (node) => {
if (!node || typeof node !== "object")
return;
if (node.type === "text" && Array.isArray(node.marks)) {
for (const m of node.marks) {
if (m && m.type === "link") {
const href = m.attrs && typeof m.attrs.href === "string" ? m.attrs.href : "";
hrefs.add(href);
}
}
}
if (Array.isArray(node.content))
for (const c of node.content)
visit(c);
};
visit(doc);
return hrefs.size;
}
/**
 * Parse the ordered list of integers from `[N]` footnote markers found in the
 * BODY only: every top-level block before the first heading whose trimmed text
 * equals `notesHeading` (diffDocs defaults it to "Примечания переводчика",
 * i.e. "Translator's notes"); if there is no such heading, the whole doc.
 * Returned in reading order.
 */
function footnoteMarkers(doc, notesHeading) {
const top = Array.isArray(doc?.content) ? doc.content : [];
const notesIdx = top.findIndex((n) => n &&
n.type === "heading" &&
plainText(n).trim() === notesHeading);
const bodyBlocks = notesIdx >= 0 ? top.slice(0, notesIdx) : top;
const markers = [];
const re = /\[(\d+)\]/g;
for (const block of bodyBlocks) {
const text = plainText(block);
let m;
re.lastIndex = 0;
while ((m = re.exec(text)) !== null) {
markers.push(Number(m[1]));
}
}
return markers;
}
/** Compute the [old,new] integrity tuples for two JSON docs. */
function computeIntegrity(oldDoc, newDoc, notesHeading) {
const images = [
countNodes(oldDoc, (n) => n.type === "image"),
countNodes(newDoc, (n) => n.type === "image"),
];
const links = [
countUniqueLinks(oldDoc),
countUniqueLinks(newDoc),
];
const tables = [
countNodes(oldDoc, (n) => n.type === "table"),
countNodes(newDoc, (n) => n.type === "table"),
];
const callouts = [
countNodes(oldDoc, (n) => n.type === "callout"),
countNodes(newDoc, (n) => n.type === "callout"),
];
const fns = [
footnoteMarkers(oldDoc, notesHeading),
footnoteMarkers(newDoc, notesHeading),
];
return { images, links, tables, callouts, footnoteMarkers: fns };
}
/**
 * Resolve the lead text of the top-level block in a ProseMirror Node that
 * contains the given document position. The position is clamped to the
 * document's bounds; returns "" only if resolving it throws.
 */
function blockContextAt(node, pos) {
try {
const clamped = Math.max(0, Math.min(pos, node.content.size));
const $pos = node.resolve(clamped);
// depth 1 is the top-level block in a doc node.
const block = $pos.depth >= 1 ? $pos.node(1) : $pos.node(0);
const text = block.textContent || "";
return text.length > 80 ? text.slice(0, 77) + "..." : text;
}
catch {
return "";
}
}
/** Truncate a string for the markdown summary. */
function truncate(s, n = 120) {
return s.length > n ? s.slice(0, n - 3) + "..." : s;
}
/**
* Coarse fallback: a block-by-block plain-text diff. Used only when the precise
* changeset pipeline throws, so the tool degrades gracefully instead of failing.
*/
function coarseDiff(oldDoc, newDoc) {
const oldBlocks = Array.isArray(oldDoc?.content) ? oldDoc.content : [];
const newBlocks = Array.isArray(newDoc?.content) ? newDoc.content : [];
const oldTexts = oldBlocks.map(plainText);
const newTexts = newBlocks.map(plainText);
const oldSet = new Set(oldTexts);
const newSet = new Set(newTexts);
const changes = [];
for (const t of oldTexts) {
if (!newSet.has(t) && t.trim() !== "") {
changes.push({ op: "delete", block: truncate(t, 80), text: t });
}
}
for (const t of newTexts) {
if (!oldSet.has(t) && t.trim() !== "") {
changes.push({ op: "insert", block: truncate(t, 80), text: t });
}
}
return changes;
}
/** Build the human-readable unified-ish markdown summary. */
function renderMarkdown(result, fellBack) {
const lines = [];
const { summary, integrity, changes } = result;
lines.push(`# Diff: ${summary.inserted} inserted / ${summary.deleted} deleted (${summary.blocksChanged} blocks changed)`);
if (fellBack) {
lines.push("");
lines.push("> note: precise diff failed; coarse block-level diff shown.");
}
lines.push("");
lines.push("## Integrity (old -> new)");
lines.push(`- images: ${integrity.images[0]} -> ${integrity.images[1]}`);
lines.push(`- links: ${integrity.links[0]} -> ${integrity.links[1]}`);
lines.push(`- tables: ${integrity.tables[0]} -> ${integrity.tables[1]}`);
lines.push(`- callouts: ${integrity.callouts[0]} -> ${integrity.callouts[1]}`);
lines.push(`- footnoteMarkers: [${integrity.footnoteMarkers[0].join(", ")}] -> [${integrity.footnoteMarkers[1].join(", ")}]`);
lines.push("");
lines.push("## Changes");
if (changes.length === 0) {
lines.push("(no textual changes)");
}
else {
for (const c of changes) {
const sign = c.op === "insert" ? "+" : "-";
const ctx = c.block ? ` @ ${truncate(c.block, 60)}` : "";
lines.push(`${sign} ${truncate(c.text)}${ctx}`);
}
}
return lines.join("\n");
}
/**
* Diff two ProseMirror JSON documents the way Docmost's history editor does and
* serialize the result to text + integrity counts.
*
* @param oldDocJson the earlier document
* @param newDocJson the later document
* @param notesHeading heading delimiting body from notes for footnote counting
*/
export function diffDocs(oldDocJson, newDocJson, notesHeading = "Примечания переводчика") {
const integrity = computeIntegrity(oldDocJson, newDocJson, notesHeading);
let changes = [];
let inserted = 0;
let deleted = 0;
let fellBack = false;
const changedBlocks = new Set();
try {
const oldNode = Node.fromJSON(schema, oldDocJson);
const newNode = Node.fromJSON(schema, newDocJson);
const tr = recreateTransform(oldNode, newNode, {
complexSteps: false,
wordDiffs: true,
simplifyDiff: true,
});
const changeSet = ChangeSet.create(oldNode).addSteps(tr.doc, tr.mapping.maps, []);
const simplified = simplifyChanges(changeSet.changes, newNode);
for (const change of simplified) {
// Deleted text lives in the OLD doc coordinate range [fromA, toA).
if (change.toA > change.fromA) {
const text = oldNode.textBetween(change.fromA, change.toA, "\n", " ");
if (text.length > 0) {
deleted += text.length;
const block = blockContextAt(oldNode, change.fromA);
changes.push({ op: "delete", block, text });
if (block)
changedBlocks.add("d:" + block);
}
}
// Inserted text lives in the NEW doc coordinate range [fromB, toB).
if (change.toB > change.fromB) {
const text = newNode.textBetween(change.fromB, change.toB, "\n", " ");
if (text.length > 0) {
inserted += text.length;
const block = blockContextAt(newNode, change.fromB);
changes.push({ op: "insert", block, text });
if (block)
changedBlocks.add("i:" + block);
}
}
}
}
catch {
// Pathological pair: degrade to a coarse block-level diff so we never throw.
fellBack = true;
changes = coarseDiff(oldDocJson, newDocJson);
for (const c of changes) {
if (c.op === "insert")
inserted += c.text.length;
else
deleted += c.text.length;
if (c.block)
changedBlocks.add(c.op[0] + ":" + c.block);
}
}
const partial = {
summary: { inserted, deleted, blocksChanged: changedBlocks.size },
integrity,
changes,
};
return { ...partial, markdown: renderMarkdown(partial, fellBack) };
}
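
The body-only footnote counting used by the module above can be sketched in isolation. This is an illustration under assumptions: `plainTextSketch` and `bodyFootnoteMarkers` are hypothetical standalone copies of the idea, and the sample document and its "Notes" heading are invented.

```javascript
// Sketch only: concatenate the plain text of a ProseMirror JSON node.
function plainTextSketch(node) {
  if (!node || typeof node !== "object") return "";
  let out = typeof node.text === "string" ? node.text : "";
  if (Array.isArray(node.content)) {
    for (const child of node.content) out += plainTextSketch(child);
  }
  return out;
}

// Sketch only: collect `[N]` markers from top-level blocks that precede the
// first heading whose trimmed text equals `notesHeading`.
function bodyFootnoteMarkers(doc, notesHeading) {
  const top = Array.isArray(doc?.content) ? doc.content : [];
  const notesIdx = top.findIndex(
    (n) => n && n.type === "heading" && plainTextSketch(n).trim() === notesHeading,
  );
  const body = notesIdx >= 0 ? top.slice(0, notesIdx) : top;
  const markers = [];
  for (const block of body) {
    for (const m of plainTextSketch(block).matchAll(/\[(\d+)\]/g)) {
      markers.push(Number(m[1]));
    }
  }
  return markers;
}

// Invented sample: two markers in the body, then a notes section repeating them.
const sampleDoc = {
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "First claim[1] and second[2]." }] },
    { type: "heading", attrs: { level: 2 }, content: [{ type: "text", text: "Notes" }] },
    { type: "paragraph", content: [{ type: "text", text: "[1] Source A. [2] Source B." }] },
  ],
};

const bodyOnly = bodyFootnoteMarkers(sampleDoc, "Notes"); // [1, 2]
const wholeDoc = bodyFootnoteMarkers(sampleDoc, "No such heading"); // [1, 2, 1, 2]
```

When the heading is not found, the notes section is scanned as well, so its `[N]` labels are counted as markers too; passing the right `notesHeading` matters for documents that keep their notes inline.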


@@ -0,0 +1,9 @@
import { Node, Extension, Mark } from "@tiptap/core";
export declare const clampCalloutType: (value: string | null | undefined) => string;
export declare const sanitizeCssColor: (value: string | null | undefined) => string | null;
/**
* Full extension list. Image is block-level (matches Docmost); the
* ProseMirror DOM parser hoists <img> found inside <p> automatically.
* StarterKit v3 already bundles the link extension, configured here.
*/
export declare const docmostExtensions: (Node<any, any> | Mark<any, any> | Extension<any, any> | Extension<import("@tiptap/starter-kit").StarterKitOptions, any> | Node<import("@tiptap/extension-image").ImageOptions, any> | Node<import("@tiptap/extension-task-list").TaskListOptions, any> | Node<import("@tiptap/extension-task-item").TaskItemOptions, any> | Mark<import("@tiptap/extension-highlight").HighlightOptions, any> | Mark<import("@tiptap/extension-subscript").SubscriptExtensionOptions, any>)[];
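
The `sanitizeCssColor` declaration above returns `string | null`. A sketch of how such an allowlist guard might behave, assuming it accepts only letters-only names, 3/4/6/8-digit hex, and `rgb()/rgba()/hsl()/hsla()` with numeric arguments; `sanitizeCssColorSketch` and `SAFE_COLOR_SKETCH_RE` are hypothetical names used for illustration:

```javascript
// Sketch only: accept a narrow subset of CSS color syntax, reject the rest.
const SAFE_COLOR_SKETCH_RE =
  /^(?:[a-zA-Z]+|#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})|(?:rgb|rgba|hsl|hsla)\([0-9.,%/\s]+\))$/;

function sanitizeCssColorSketch(value) {
  if (typeof value !== "string") return null;
  const color = value.trim();
  return color && SAFE_COLOR_SKETCH_RE.test(color) ? color : null;
}

const accepted = ["red", " #FFF ", "#a1b2c3d4", "rgb(0, 0, 0)", "hsl(120 50% 50% / 0.5)"]
  .map(sanitizeCssColorSketch);
const rejected = ["red; --x: url(x)", 'red"><script>', "#12345", "", null]
  .map(sanitizeCssColorSketch);
```

In this sketch every entry of `accepted` comes back as its trimmed input and every entry of `rejected` comes back as `null`. Note that the letters-only branch does not check the name against the list of real CSS color keywords; it only guarantees the value cannot carry punctuation that would break out of a `style` attribute.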


@@ -0,0 +1,999 @@
/**
* Full TipTap extension set matching the real Docmost document schema.
*
* The default StarterKit-only schema silently destroys Docmost-specific
* nodes (callout, table) and drops attributes it does not know about
* (node ids, image sizing, link targets). Every code path that converts
* to or from ProseMirror JSON must use THIS set, otherwise a round-trip
* loses content.
*/
import StarterKit from "@tiptap/starter-kit";
import Image from "@tiptap/extension-image";
import TaskList from "@tiptap/extension-task-list";
import TaskItem from "@tiptap/extension-task-item";
import Highlight from "@tiptap/extension-highlight";
import Subscript from "@tiptap/extension-subscript";
import Superscript from "@tiptap/extension-superscript";
import { Node, Extension, Mark } from "@tiptap/core";
// Inlined from @tiptap/core's getStyleProperty (added after 3.20.x) so this
// package can stay on the same @tiptap/core version as the editor and avoid a
// duplicate-tiptap version split in the monorepo. Reads a single declaration
// from an element's inline `style` attribute, last-wins, case-insensitive.
function getStyleProperty(element, propertyName) {
const styleAttr = element.getAttribute("style");
if (!styleAttr) {
return null;
}
const decls = styleAttr.split(";").map((decl) => decl.trim()).filter(Boolean);
const target = propertyName.toLowerCase();
for (let i = decls.length - 1; i >= 0; i -= 1) {
const decl = decls[i];
const colonIndex = decl.indexOf(":");
if (colonIndex === -1) {
continue;
}
const prop = decl.slice(0, colonIndex).trim().toLowerCase();
if (prop === target) {
return decl.slice(colonIndex + 1).trim();
}
}
return null;
}
/** Allowed Docmost callout types; anything else falls back to "info". */
const CALLOUT_TYPES = ["info", "warning", "danger", "success"];
export const clampCalloutType = (value) => value && CALLOUT_TYPES.includes(value.toLowerCase())
? value.toLowerCase()
: "info";
/**
* Allowlist guard for CSS color values imported from HTML.
*
* Docmost interpolates stored mark colors straight into an inline style
* attribute (e.g. style="background-color: ${color}" / "color: ${color}").
* An unsanitized value such as `red; --x: url(...)` or `red"><script>` would
* let a crafted document break out of the style attribute. We therefore only
* accept a narrow, well-formed subset of CSS <color> syntax and reject (-> null)
* anything else.
*
* Accepted forms:
* - named colors: letters only, e.g. "red", "rebeccapurple"
* - hex: #rgb, #rgba, #rrggbb, #rrggbbaa
* - functional notation: rgb()/rgba()/hsl()/hsla() containing only
* digits, %, ., commas, spaces and slashes
*/
const SAFE_COLOR_RE = /^(?:[a-zA-Z]+|#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})|(?:rgb|rgba|hsl|hsla)\([0-9.,%/\s]+\))$/;
export const sanitizeCssColor = (value) => {
if (typeof value !== "string")
return null;
const color = value.trim();
return color && SAFE_COLOR_RE.test(color) ? color : null;
};
/** Docmost callout (info/warning/danger/success banner). */
const Callout = Node.create({
name: "callout",
group: "block",
content: "block+",
defining: true,
addAttributes() {
return {
// Read the type from data-callout-type so generateJSON(html) preserves
// it; without an explicit parseHTML every imported callout became "info".
type: {
default: "info",
parseHTML: (el) => clampCalloutType(el.getAttribute("data-callout-type")),
renderHTML: (attrs) => ({
"data-callout-type": clampCalloutType(attrs.type),
}),
},
icon: {
default: null,
parseHTML: (el) => el.getAttribute("data-icon"),
renderHTML: (attrs) => attrs.icon ? { "data-icon": attrs.icon } : {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="callout"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "callout", ...HTMLAttributes }, 0];
},
});
/** Minimal table family: enough for schema round-trips and HTML parsing. */
const Table = Node.create({
name: "table",
group: "block",
content: "tableRow+",
isolating: true,
parseHTML() {
return [{ tag: "table" }];
},
renderHTML() {
return ["table", ["tbody", 0]];
},
});
const TableRow = Node.create({
name: "tableRow",
content: "(tableCell | tableHeader)*",
parseHTML() {
return [{ tag: "tr" }];
},
renderHTML() {
return ["tr", 0];
},
});
const cellAttributes = () => ({
colspan: { default: 1 },
rowspan: { default: 1 },
colwidth: { default: null },
backgroundColor: { default: null },
backgroundColorName: { default: null },
// Column alignment so GFM aligned tables (|:--|:-:|--:|) round-trip.
align: {
default: null,
parseHTML: (el) => el.getAttribute("align") || el.style.textAlign || null,
renderHTML: (attrs) => attrs.align ? { align: attrs.align } : {},
},
});
const TableCell = Node.create({
name: "tableCell",
content: "block+",
isolating: true,
addAttributes: cellAttributes,
parseHTML() {
return [{ tag: "td" }];
},
renderHTML() {
return ["td", 0];
},
});
const TableHeader = Node.create({
name: "tableHeader",
content: "block+",
isolating: true,
addAttributes: cellAttributes,
parseHTML() {
return [{ tag: "th" }];
},
renderHTML() {
return ["th", 0];
},
});
/**
* Attributes Docmost stores on standard nodes that the stock extensions
* do not declare. Without these, Node.fromJSON silently drops them —
* including the block ids that heading anchors rely on.
*/
const DocmostAttributes = Extension.create({
name: "docmostAttributes",
addGlobalAttributes() {
return [
{
types: ["heading", "paragraph"],
attributes: {
id: { default: null },
indent: { default: null },
textAlign: { default: null },
},
},
{
types: ["image"],
attributes: {
align: { default: null },
attachmentId: { default: null },
aspectRatio: { default: null },
height: { default: null },
placeholder: { default: null },
size: { default: null },
width: { default: null },
},
},
{
types: ["orderedList"],
attributes: { type: { default: null } },
},
{
types: ["link"],
attributes: { internal: { default: null }, title: { default: null } },
},
];
},
});
/**
* Docmost inline comment mark. Anchors a comment thread to a text range via
* `commentId`. Without it, any document containing comment highlights fails to
* round-trip through the schema ("There is no mark type comment in this schema"),
* which breaks update_page_json and edit_page_text on every commented page.
* Mirrors Docmost's @docmost/editor-ext comment mark (commentId / resolved).
*/
const Comment = Mark.create({
name: "comment",
exitable: true,
inclusive: false,
addAttributes() {
return {
commentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-comment-id"),
renderHTML: (attrs) => attrs.commentId ? { "data-comment-id": attrs.commentId } : {},
},
resolved: {
default: false,
parseHTML: (el) => el.getAttribute("data-resolved") === "true",
renderHTML: (attrs) => attrs.resolved ? { "data-resolved": "true" } : {},
},
};
},
parseHTML() {
return [{ tag: "span[data-comment-id]" }];
},
renderHTML({ HTMLAttributes }) {
return ["span", { class: "comment-mark", ...HTMLAttributes }, 0];
},
});
/**
 * Text color mark. The markdown-converter emits colored text as
 * <span style="color: ...">, but without a mark to parse it back, the color
 * was silently dropped on import. This mirrors TipTap's
 * @tiptap/extension-text-style `textStyle` mark (the name Docmost expects) and
 * carries a single `color` attribute. The parsed color is passed through the
 * allowlist guard so a crafted style cannot break out of the attribute when
 * Docmost re-renders it.
 */
const TextStyle = Mark.create({
name: "textStyle",
addAttributes() {
return {
color: {
default: null,
parseHTML: (el) => sanitizeCssColor(el.style.color || el.getAttribute("data-color")),
renderHTML: (attrs) => {
const color = sanitizeCssColor(attrs.color);
return color ? { style: `color: ${color}` } : {};
},
},
};
},
parseHTML() {
return [
{
tag: "span",
// Only claim a plain colored span. Do NOT match spans that are already a
// comment mark (data-comment-id) or a mention node (data-type=mention),
// otherwise importing such HTML would silently drop the comment/mention.
getAttrs: (el) => el.style.color &&
!el.getAttribute("data-comment-id") &&
el.getAttribute("data-type") !== "mention"
? {}
: false,
},
];
},
renderHTML({ HTMLAttributes }) {
return ["span", HTMLAttributes, 0];
},
});
/**
* Passthrough definitions for the remaining Docmost-specific nodes.
*
* TiptapTransformer.toYdoc (the write path every mutation uses) throws
* "Unknown node type: X" for any node not registered here, so editing ANY
* page that contains one of these nodes used to fail outright. The read path
* (fromYdoc) accepts them, which is why they appear in real documents.
*
* Each node below mirrors the real @docmost/editor-ext definition's name,
* group, content, inline/atom flags and attribute keys (with the same data-*
* HTML mapping) so that a fromYdoc -> transform -> toYdoc round-trip both
* validates and preserves attributes faithfully. Interactive concerns
* (node views, commands, keyboard shortcuts, input rules, suggestion plugins)
* are intentionally omitted: the MCP server never renders these nodes, it only
* needs the schema to accept and carry them. The Callout node above is the
* pattern these follow.
*/
/** Docmost @mention (user/page reference). Inline atom. */
const Mention = Node.create({
name: "mention",
group: "inline",
inline: true,
selectable: true,
atom: true,
draggable: true,
addAttributes() {
return {
id: {
default: null,
parseHTML: (el) => el.getAttribute("data-id"),
renderHTML: (attrs) => attrs.id ? { "data-id": attrs.id } : {},
},
label: {
default: null,
parseHTML: (el) => el.getAttribute("data-label"),
renderHTML: (attrs) => attrs.label ? { "data-label": attrs.label } : {},
},
entityType: {
default: null,
parseHTML: (el) => el.getAttribute("data-entity-type"),
renderHTML: (attrs) => attrs.entityType ? { "data-entity-type": attrs.entityType } : {},
},
entityId: {
default: null,
parseHTML: (el) => el.getAttribute("data-entity-id"),
renderHTML: (attrs) => attrs.entityId ? { "data-entity-id": attrs.entityId } : {},
},
slugId: {
default: null,
parseHTML: (el) => el.getAttribute("data-slug-id"),
renderHTML: (attrs) => attrs.slugId ? { "data-slug-id": attrs.slugId } : {},
},
creatorId: {
default: null,
parseHTML: (el) => el.getAttribute("data-creator-id"),
renderHTML: (attrs) => attrs.creatorId ? { "data-creator-id": attrs.creatorId } : {},
},
anchorId: {
default: null,
parseHTML: (el) => el.getAttribute("data-anchor-id"),
renderHTML: (attrs) => attrs.anchorId ? { "data-anchor-id": attrs.anchorId } : {},
},
};
},
parseHTML() {
return [{ tag: 'span[data-type="mention"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["span", { "data-type": "mention", ...HTMLAttributes }, 0];
},
});
/** Inline KaTeX expression. Carries the LaTeX source in `text`. */
const MathInline = Node.create({
name: "mathInline",
group: "inline",
inline: true,
atom: true,
addAttributes() {
return {
text: { default: "" },
};
},
parseHTML() {
return [{ tag: 'span[data-type="mathInline"]' }];
},
renderHTML({ HTMLAttributes }) {
return [
"span",
{ "data-type": "mathInline", "data-katex": "true" },
`${HTMLAttributes.text ?? ""}`,
];
},
});
/** Block KaTeX expression. Carries the LaTeX source in `text`. */
const MathBlock = Node.create({
name: "mathBlock",
group: "block",
atom: true,
isolating: true,
addAttributes() {
return {
text: { default: "" },
};
},
parseHTML() {
return [{ tag: 'div[data-type="mathBlock"]' }];
},
renderHTML({ HTMLAttributes }) {
return [
"div",
{ "data-type": "mathBlock", "data-katex": "true" },
`${HTMLAttributes.text ?? ""}`,
];
},
});
/** Collapsible <details> wrapper: summary + content children. */
const Details = Node.create({
name: "details",
group: "block",
content: "detailsSummary detailsContent",
defining: true,
isolating: true,
addAttributes() {
return {
open: {
default: false,
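// `open` is a boolean HTML attribute: getAttribute() yields "" when it is
// present, which is falsy and would lose the open state on re-render.
parseHTML: (el) => el.hasAttribute("open"),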
renderHTML: (attrs) => attrs.open ? { open: "" } : {},
},
};
},
parseHTML() {
return [{ tag: "details" }];
},
renderHTML({ HTMLAttributes }) {
return ["details", { ...HTMLAttributes }, 0];
},
});
/** Clickable summary line of a <details> block. */
const DetailsSummary = Node.create({
name: "detailsSummary",
group: "block",
content: "inline*",
defining: true,
isolating: true,
selectable: false,
parseHTML() {
return [{ tag: "summary" }];
},
renderHTML({ HTMLAttributes }) {
return ["summary", { "data-type": "detailsSummary", ...HTMLAttributes }, 0];
},
});
/** Body of a <details> block. Permissive content so fromYdoc output validates. */
const DetailsContent = Node.create({
name: "detailsContent",
group: "block",
// Docmost declares block* (an empty details body is valid); block+ would
// reject a collapsed/empty details on round-trip.
content: "block*",
defining: true,
selectable: false,
parseHTML() {
return [{ tag: 'div[data-type="detailsContent"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "detailsContent", ...HTMLAttributes }, 0];
},
});
/** File attachment card (non-image upload). Block atom. */
const Attachment = Node.create({
name: "attachment",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
url: {
default: "",
parseHTML: (el) => el.getAttribute("data-attachment-url"),
renderHTML: (attrs) => ({
"data-attachment-url": attrs.url ?? "",
}),
},
name: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-name"),
renderHTML: (attrs) => attrs.name ? { "data-attachment-name": attrs.name } : {},
},
mime: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-mime"),
renderHTML: (attrs) => attrs.mime ? { "data-attachment-mime": attrs.mime } : {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-attachment-size": attrs.size } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
// Docmost declares `placeholder` (a transient upload key, not rendered
// to HTML). Carry it so a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: 'div[data-type="attachment"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "attachment", ...HTMLAttributes }, 0];
},
});
/** Uploaded <video> player. Block atom. */
const Video = Node.create({
name: "video",
group: "block",
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("src"),
renderHTML: (attrs) => ({ src: attrs.src ?? "" }),
},
alt: {
default: null,
parseHTML: (el) => el.getAttribute("aria-label"),
renderHTML: (attrs) => attrs.alt ? { "aria-label": attrs.alt } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("width"),
renderHTML: (attrs) => attrs.width != null ? { width: attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("height"),
renderHTML: (attrs) => attrs.height != null ? { height: attrs.height } : {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => attrs.align ? { "data-align": attrs.align } : {},
},
aspectRatio: {
default: null,
parseHTML: (el) => el.getAttribute("data-aspect-ratio"),
renderHTML: (attrs) => attrs.aspectRatio != null
? { "data-aspect-ratio": attrs.aspectRatio }
: {},
},
// Docmost declares `placeholder` (a transient upload key, not rendered
// to HTML). Carry it so a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: "video" }];
},
renderHTML({ HTMLAttributes }) {
return ["video", { controls: "true", ...HTMLAttributes }];
},
});
/**
* Defensive passthrough for a `youtube` node. Docmost itself has no dedicated
* youtube node (YouTube is handled via `embed`), but the converter read path
* references this type, so accept it as a generic block atom that preserves
* its src so legacy/external documents survive a round-trip.
*/
const Youtube = Node.create({
name: "youtube",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("data-src"),
renderHTML: (attrs) => ({
"data-src": attrs.src ?? "",
}),
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("data-width"),
renderHTML: (attrs) => attrs.width != null ? { "data-width": attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("data-height"),
renderHTML: (attrs) => attrs.height != null ? { "data-height": attrs.height } : {},
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => attrs.align ? { "data-align": attrs.align } : {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="youtube"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "youtube", ...HTMLAttributes }, 0];
},
});
/** Generic embed (provider iframe). Block atom. */
const Embed = Node.create({
name: "embed",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("data-src"),
renderHTML: (attrs) => ({
"data-src": attrs.src ?? "",
}),
},
provider: {
default: "",
parseHTML: (el) => el.getAttribute("data-provider"),
renderHTML: (attrs) => ({
"data-provider": attrs.provider ?? "",
}),
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => ({
"data-align": attrs.align ?? "center",
}),
},
width: {
default: 800,
parseHTML: (el) => el.getAttribute("data-width"),
renderHTML: (attrs) => ({
"data-width": attrs.width,
}),
},
height: {
default: 600,
parseHTML: (el) => el.getAttribute("data-height"),
renderHTML: (attrs) => ({
"data-height": attrs.height,
}),
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="embed"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "embed", ...HTMLAttributes }, 0];
},
});
/** Shared attribute set for drawio/excalidraw diagram nodes. */
const diagramAttributes = () => ({
src: {
default: "",
parseHTML: (el) => el.getAttribute("data-src"),
renderHTML: (attrs) => ({
"data-src": attrs.src ?? "",
}),
},
title: {
default: null,
parseHTML: (el) => el.getAttribute("data-title"),
renderHTML: (attrs) => attrs.title ? { "data-title": attrs.title } : {},
},
alt: {
default: null,
parseHTML: (el) => el.getAttribute("data-alt"),
renderHTML: (attrs) => attrs.alt ? { "data-alt": attrs.alt } : {},
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("data-width"),
renderHTML: (attrs) => attrs.width != null ? { "data-width": attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("data-height"),
renderHTML: (attrs) => attrs.height != null ? { "data-height": attrs.height } : {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
aspectRatio: {
default: null,
parseHTML: (el) => el.getAttribute("data-aspect-ratio"),
renderHTML: (attrs) => attrs.aspectRatio != null
? { "data-aspect-ratio": attrs.aspectRatio }
: {},
},
align: {
default: "center",
parseHTML: (el) => el.getAttribute("data-align"),
renderHTML: (attrs) => attrs.align ? { "data-align": attrs.align } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId ? { "data-attachment-id": attrs.attachmentId } : {},
},
});
/** draw.io diagram. Block atom (image-backed). */
const Drawio = Node.create({
name: "drawio",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes: diagramAttributes,
parseHTML() {
return [{ tag: 'div[data-type="drawio"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "drawio", ...HTMLAttributes }];
},
});
/** Excalidraw diagram. Block atom (image-backed). */
const Excalidraw = Node.create({
name: "excalidraw",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes: diagramAttributes,
parseHTML() {
return [{ tag: 'div[data-type="excalidraw"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "excalidraw", ...HTMLAttributes }];
},
});
/** Multi-column layout container holding one or more `column` children. */
const Columns = Node.create({
name: "columns",
group: "block",
content: "column+",
defining: true,
isolating: true,
addAttributes() {
return {
layout: {
default: "two_equal",
parseHTML: (el) => el.getAttribute("data-layout"),
renderHTML: (attrs) => attrs.layout ? { "data-layout": attrs.layout } : {},
},
widthMode: {
default: "normal",
parseHTML: (el) => el.getAttribute("data-width-mode") || "normal",
renderHTML: (attrs) => attrs.widthMode && attrs.widthMode !== "normal"
? { "data-width-mode": attrs.widthMode }
: {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="columns"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "columns", ...HTMLAttributes }, 0];
},
});
/** Single column within a `columns` layout. */
const Column = Node.create({
name: "column",
group: "block",
content: "block+",
defining: true,
isolating: true,
selectable: false,
addAttributes() {
return {
width: {
default: null,
parseHTML: (el) => {
const value = el.getAttribute("data-width");
return value ? parseFloat(value) : null;
},
renderHTML: (attrs) => attrs.width ? { "data-width": attrs.width } : {},
},
};
},
parseHTML() {
return [{ tag: 'div[data-type="column"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "column", ...HTMLAttributes }, 0];
},
});
/**
* Subpages listing block (auto-generated index of child pages). Docmost
* declares no attributes; the markdown-converter has a `case "subpages"`, so
* the read path can emit it and toYdoc must accept it. Block atom.
*/
const Subpages = Node.create({
name: "subpages",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
parseHTML() {
return [{ tag: 'div[data-type="subpages"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "subpages", ...HTMLAttributes }];
},
});
/** Uploaded <audio> player. Block atom. Mirrors Docmost audio attrs. */
const Audio = Node.create({
name: "audio",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("src"),
renderHTML: (attrs) => ({ src: attrs.src ?? "" }),
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
// Transient upload key Docmost declares with rendered:false; carried so
// a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: "audio" }];
},
renderHTML({ HTMLAttributes }) {
return ["audio", { controls: "true", ...HTMLAttributes }];
},
});
/** Embedded PDF viewer. Block atom. Mirrors Docmost pdf attrs. */
const Pdf = Node.create({
name: "pdf",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
addAttributes() {
return {
src: {
default: "",
parseHTML: (el) => el.getAttribute("src"),
renderHTML: (attrs) => ({ src: attrs.src ?? "" }),
},
name: {
default: null,
parseHTML: (el) => el.getAttribute("data-name"),
renderHTML: (attrs) => attrs.name ? { "data-name": attrs.name } : {},
},
attachmentId: {
default: null,
parseHTML: (el) => el.getAttribute("data-attachment-id"),
renderHTML: (attrs) => attrs.attachmentId
? { "data-attachment-id": attrs.attachmentId }
: {},
},
size: {
default: null,
parseHTML: (el) => el.getAttribute("data-size"),
renderHTML: (attrs) => attrs.size != null ? { "data-size": attrs.size } : {},
},
width: {
default: null,
parseHTML: (el) => el.getAttribute("width"),
renderHTML: (attrs) => attrs.width != null ? { width: attrs.width } : {},
},
height: {
default: null,
parseHTML: (el) => el.getAttribute("height"),
renderHTML: (attrs) => attrs.height != null ? { height: attrs.height } : {},
},
// Transient upload key Docmost declares with rendered:false; carried so
// a round-trip never hits "Unsupported attribute".
placeholder: { default: null },
};
},
parseHTML() {
return [{ tag: 'div[data-type="pdf"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "pdf", ...HTMLAttributes }];
},
});
/** Page break (print/export divider). Block atom; Docmost declares no attrs. */
const PageBreak = Node.create({
name: "pageBreak",
group: "block",
inline: false,
isolating: true,
atom: true,
defining: true,
draggable: true,
parseHTML() {
return [{ tag: 'div[data-type="pageBreak"]' }];
},
renderHTML({ HTMLAttributes }) {
return ["div", { "data-type": "pageBreak", ...HTMLAttributes }];
},
});
/**
* Full extension list. Image is block-level (matches Docmost); the
* ProseMirror DOM parser hoists <img> found inside <p> automatically.
* StarterKit v3 already bundles the link extension, configured here.
*/
export const docmostExtensions = [
StarterKit.configure({
codeBlock: {},
heading: {},
link: { openOnClick: false },
}),
Image.configure({ inline: false }),
TaskList,
TaskItem.configure({ nested: true }),
// Highlight stores its color unescaped and Docmost interpolates it into
// style="background-color: ${color}". Wrap the color attribute's parseHTML
// with the same allowlist guard used by textStyle so a crafted import color
// cannot break out of the style attribute. Multicolor behavior is preserved.
Highlight.extend({
addAttributes() {
const parent = this.parent?.() ?? {};
return {
...parent,
color: {
...parent.color,
parseHTML: (el) => sanitizeCssColor(el.getAttribute("data-color") ||
getStyleProperty(el, "background-color") ||
el.style.backgroundColor),
},
};
},
}).configure({ multicolor: true }),
Subscript,
Superscript,
// StarterKit does not provide a textStyle mark, so register ours; without it
// generateJSON drops <span style="color: ...">, defeating the color import.
TextStyle,
Comment,
Callout,
Table,
TableRow,
TableCell,
TableHeader,
Mention,
MathInline,
MathBlock,
Details,
DetailsSummary,
DetailsContent,
Attachment,
Video,
Youtube,
Embed,
Drawio,
Excalidraw,
Columns,
Column,
Subpages,
Audio,
Pdf,
PageBreak,
DocmostAttributes,
];
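
The Highlight override above relies on `sanitizeCssColor`, which is defined elsewhere in this file and is not visible in this hunk. As a rough sketch of what an allowlist guard of that kind could look like (the name `sanitizeCssColorSketch`, the accepted formats and the `null` fallback are all assumptions for illustration, not the vendored implementation):

```javascript
// Hypothetical allowlist guard; NOT the vendored sanitizeCssColor.
// Each pattern matches the whole string, so a value carrying `;`, `"` or
// trailing CSS after a valid color never passes.
const HEX = /^#(?:[0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$/i;
const NAMED = /^[a-z]{3,30}$/i;
const FUNC = /^(?:rgb|rgba|hsl|hsla)\(\s*[0-9.%\s,\/]+\)$/i;

function sanitizeCssColorSketch(value) {
  if (typeof value !== "string") return null;
  const color = value.trim();
  if (HEX.test(color) || NAMED.test(color) || FUNC.test(color)) return color;
  return null; // anything outside the allowlist is dropped
}

console.log(sanitizeCssColorSketch("#ff0"));
console.log(sanitizeCssColorSketch('red"; onload="x'));
```

The point of the allowlist shape is that the value is later interpolated into `style="background-color: ..."`, so rejecting everything that is not a plain color token is simpler to reason about than trying to escape it.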

packages/git-sync/build/lib/index.d.ts vendored Normal file

@@ -0,0 +1,16 @@
/**
* Public surface of the pure converter (`lib/`). This barrel re-exports the
* PURE, IO-free pieces the sync engine needs: the self-contained markdown
* (de)serializers, the lossless ProseMirror <-> Markdown converter, the
* markdown -> ProseMirror import path, and semantic canonicalization for the
* round-trip idempotency check (SPEC §11).
*
* There is no REST client, websocket/collab write-path, auth-utils or page-lock
* here — the gitmost server writes natively.
*/
export { serializeDocmostMarkdown, parseDocmostMarkdown, serializeDocmostMarkdownBody, } from "./markdown-document.js";
export type { DocmostMdMeta } from "./markdown-document.js";
export { convertProseMirrorToMarkdown } from "./markdown-converter.js";
export { markdownToProseMirror } from "./markdown-to-prosemirror.js";
export { canonicalizeContent, docsCanonicallyEqual, } from "./canonicalize.js";
export { parsePageFile, serializePageFile } from "./page-file.js";
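
The barrel above mentions a round-trip idempotency check. A minimal sketch of that idea, assuming the check compares the export of a document with the export of its re-imported form; `isRoundTripStable` and the toy converters are hypothetical names, and the real `canonicalizeContent` / `docsCanonicallyEqual` signatures are not shown in this diff:

```javascript
// Hypothetical helper: checks that export -> import -> export is byte-stable.
// The converters are injected, so this sketch does not assume the real
// signatures of convertProseMirrorToMarkdown / markdownToProseMirror.
function isRoundTripStable(doc, toMarkdown, fromMarkdown) {
  const md1 = toMarkdown(doc);
  const md2 = toMarkdown(fromMarkdown(md1));
  return { stable: md1 === md2, md1, md2 };
}

// Toy stand-ins: a "document" is just an array of paragraph strings.
const toyToMarkdown = (doc) => doc.join("\n\n");
const toyFromMarkdown = (md) => md.split("\n\n");

// An exporter that appends a newline on every pass is NOT idempotent:
// the text grows by one "\n" per cycle.
const growingToMarkdown = (doc) => doc.join("\n\n") + "\n";

console.log(isRoundTripStable(["first", "second"], toyToMarkdown, toyFromMarkdown).stable);
console.log(isRoundTripStable(["first", "second"], growingToMarkdown, toyFromMarkdown).stable);
```

Comparing the two markdown strings rather than the two documents keeps the check cheap, but it only detects drift that the exporter can express; a semantic comparison of the documents is a separate step.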


@@ -0,0 +1,15 @@
/**
* Public surface of the pure converter (`lib/`). This barrel re-exports the
* PURE, IO-free pieces the sync engine needs: the self-contained markdown
* (de)serializers, the lossless ProseMirror <-> Markdown converter, the
* markdown -> ProseMirror import path, and semantic canonicalization for the
* round-trip idempotency check (SPEC §11).
*
* There is no REST client, websocket/collab write-path, auth-utils or page-lock
* here — the gitmost server writes natively.
*/
export { serializeDocmostMarkdown, parseDocmostMarkdown, serializeDocmostMarkdownBody, } from "./markdown-document.js";
export { convertProseMirrorToMarkdown } from "./markdown-converter.js";
export { markdownToProseMirror } from "./markdown-to-prosemirror.js";
export { canonicalizeContent, docsCanonicallyEqual, } from "./canonicalize.js";
export { parsePageFile, serializePageFile } from "./page-file.js";


@@ -0,0 +1,5 @@
/**
* Convert ProseMirror/TipTap JSON content to Markdown
* Supports all Docmost-specific node types and extensions
*/
export declare function convertProseMirrorToMarkdown(content: any): string;
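
To make the declared input and output shapes concrete, here is a reduced stand-in that handles only `doc`, `heading`, `paragraph` and `text` with `bold`, `italic` and `code` marks. `miniConvert` is a hypothetical name used for illustration; the vendored converter covers many more node types and applies marks in the order they are stored, which this sketch does not attempt to reproduce.

```javascript
// Reduced stand-in for illustration only; not the vendored converter.
function miniConvert(content) {
  if (!content || !content.content) return "";
  const render = (node) => {
    const children = (node.content || []).map(render);
    switch (node.type) {
      case "doc":
        return children.join("\n\n"); // blocks are separated by a blank line
      case "heading":
        return "#".repeat(node.attrs?.level || 1) + " " + children.join("");
      case "paragraph":
        return children.join("");
      case "text": {
        const text = node.text || "";
        const marks = (node.marks || []).map((m) => m.type);
        // A code run is emitted as a bare code span; other marks are ignored.
        if (marks.includes("code")) return "`" + text + "`";
        let out = text;
        if (marks.includes("bold")) out = `**${out}**`;
        if (marks.includes("italic")) out = `*${out}*`;
        return out;
      }
      default:
        return children.join(""); // unknown nodes: render children only
    }
  };
  return render(content);
}

const doc = {
  type: "doc",
  content: [
    { type: "heading", attrs: { level: 1 }, content: [{ type: "text", text: "Title" }] },
    {
      type: "paragraph",
      content: [
        { type: "text", text: "Hello " },
        { type: "text", text: "world", marks: [{ type: "bold" }] },
      ],
    },
  ],
};

console.log(miniConvert(doc)); // prints "# Title", a blank line, then "Hello **world**"
```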


@@ -0,0 +1,801 @@
/**
* Convert ProseMirror/TipTap JSON content to Markdown
* Supports all Docmost-specific node types and extensions
*/
export function convertProseMirrorToMarkdown(content) {
if (!content || !content.content)
return "";
// Escape a value interpolated into an HTML double-quoted attribute value
// (textAlign, colors, image src, math `text`, all data-* attrs, etc.). The
// value is always wrapped in double quotes, so the only special characters
// are the " that delimits the value and the & that starts an entity; escape
// ONLY those two. We deliberately do NOT escape <, > or ': none of them can
// terminate a double-quoted value, and the HTML re-parser (parse5/jsdom via
// @tiptap/html) does NOT decode &lt;/&gt;/&#39; back inside attribute values,
// so escaping them would corrupt the stored data (e.g. a math node's LaTeX
// `a < b`) and ACCUMULATE escapes on every round-trip (`a < b` -> `a &lt; b`
// -> `a &amp;lt; b`). Escaping & and " keeps the value inert against
// attribute injection while staying idempotent (parse5 decodes both back).
const escapeAttr = (value) => String(value)
.replace(/&/g, "&amp;")
.replace(/"/g, "&quot;");
// Escape a value placed as HTML element TEXT content (between tags), where
// <, >, and & are all significant. Used for text rendered inside raw-HTML
// blocks (table cells / columns) so stored characters cannot inject markup.
const escapeHtmlText = (value) => String(value)
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;");
// Percent-encode characters that would break out of a markdown URL target
// (...) — whitespace/newlines and parentheses — so a stored src stays a
// single inert token (used for image/video/youtube srcs).
const encodeMdUrl = (value) => String(value || "")
.replace(/\s/g, (c) => (c === " " ? "%20" : encodeURIComponent(c)))
.replace(/\(/g, "%28")
.replace(/\)/g, "%29");
const processNode = (node) => {
const type = node.type;
const nodeContent = node.content || [];
switch (type) {
case "doc":
return nodeContent.map(processNode).join("\n\n");
case "paragraph":
const text = nodeContent.map(processNode).join("");
const align = node.attrs?.textAlign;
if (align && align !== "left") {
return `<div align="${escapeAttr(align)}">${text}</div>`;
}
return text || "";
case "heading":
const level = node.attrs?.level || 1;
const headingText = nodeContent.map(processNode).join("");
return "#".repeat(level) + " " + headingText;
case "text":
let textContent = node.text || "";
// Apply marks (bold, italic, code, etc.)
if (node.marks) {
// The schema's `code` mark declares `excludes: "_"` — it excludes every
// other inline mark — so the editor can NEVER produce a text run that
// carries `code` together with another mark, and on import any
// co-occurring mark is always dropped (the run comes back as code-only).
// The lossless, byte-stable behavior is therefore: when a run has the
// `code` mark, emit ONLY the backtick code span and ignore every other
// mark, so md1 is already code-only and md2 === md1. Runs WITHOUT a code
// mark are rendered exactly as before.
const hasCode = node.marks.some((m) => m.type === "code");
if (hasCode) {
return `\`${textContent}\``;
}
// No `code` case below: code runs returned above, so the loop only ever
// sees non-code marks.
for (const mark of node.marks) {
switch (mark.type) {
case "bold":
textContent = `**${textContent}**`;
break;
case "italic":
textContent = `*${textContent}*`;
break;
case "link": {
const href = mark.attrs?.href || "";
const title = mark.attrs?.title;
if (title) {
// Emit the optional markdown link title; escape an embedded
// double-quote so it cannot terminate the title string early.
const safeTitle = String(title).replace(/"/g, '\\"');
textContent = `[${textContent}](${href} "${safeTitle}")`;
}
else {
textContent = `[${textContent}](${href})`;
}
break;
}
case "strike":
textContent = `~~${textContent}~~`;
break;
case "underline":
textContent = `<u>${textContent}</u>`;
break;
case "subscript":
textContent = `<sub>${textContent}</sub>`;
break;
case "superscript":
textContent = `<sup>${textContent}</sup>`;
break;
case "highlight": {
// Preserve a null/empty color as a plain highlight (a bare
// <mark> with no background-color); only emit the style when a
// color is actually set, so a plain highlight is not forced to
// yellow on export.
const color = mark.attrs?.color;
textContent = color
? `<mark style="background-color: ${escapeAttr(color)}">${textContent}</mark>`
: `<mark>${textContent}</mark>`;
break;
}
case "textStyle":
if (mark.attrs?.color) {
textContent = `<span style="color: ${escapeAttr(mark.attrs.color)}">${textContent}</span>`;
}
break;
case "comment": {
// Emit the inline comment anchor so highlights round-trip. The
// schema's Comment mark parses span[data-comment-id] (attrs
// commentId/resolved).
const cid = mark.attrs?.commentId;
if (cid) {
const resolvedAttr = mark.attrs?.resolved
? ` data-resolved="true"`
: "";
textContent = `<span data-comment-id="${escapeAttr(cid)}"${resolvedAttr}>${textContent}</span>`;
}
break;
}
}
}
}
return textContent;
case "codeBlock":
const language = node.attrs?.language || "";
// Strip ALL trailing newlines so the export is idempotent: marked
// re-adds exactly one trailing "\n" on import, so trimming only one
// here would let the text grow by "\n" on each round-trip. Removing
// every trailing newline makes repeated cycles stable.
const code = nodeContent
.map(processNode)
.join("")
.replace(/\n+$/, "");
return "```" + language + "\n" + code + "\n```";
case "bulletList":
return nodeContent
.map((item) => processListItem(item, "-"))
.join("\n");
case "orderedList":
return nodeContent
.map((item, index) => processListItem(item, `${index + 1}.`))
.join("\n");
case "taskList":
return nodeContent.map((item) => processTaskItem(item)).join("\n");
case "taskItem":
// Delegate to the same helper used by taskList so multi-block and
// nested task items render and indent consistently.
return processTaskItem(node);
case "listItem":
return nodeContent.map(processNode).join("\n");
case "blockquote":
// Prefix EVERY line of EVERY child with "> " and separate block-level
// children with a blank ">" line so code blocks / multi-paragraph
// quotes round-trip correctly.
return nodeContent
.map((n) => processNode(n)
.split("\n")
.map((line) => (line.length ? `> ${line}` : ">"))
.join("\n"))
.join("\n>\n");
case "horizontalRule":
return "---";
case "hardBreak":
// Two trailing spaces before the newline encode a markdown hard break;
// a bare "\n" would be reimported as a soft break and lost.
return " \n";
case "image":
const imgAlt = node.attrs?.alt || "";
// Neutralize characters that could break out of the markdown image
// URL: spaces/newlines and parentheses would terminate the (...) target
// and let a stored src inject following markdown/HTML. Percent-encode
// them so the URL stays a single inert token.
const imgSrc = encodeMdUrl(node.attrs?.src);
// No "caption" attribute exists in the Docmost image schema, so we do
// not emit one (the previous caption branch was dead).
return `![${imgAlt}](${imgSrc})`;
case "video": {
// Emit the schema-matching <video> element so generateJSON rebuilds the
// node with its attrs intact. The schema's parseHTML reads src/aria-label
// from the standard attributes and the remaining attrs from data-*.
const attrs = node.attrs || {};
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt)
parts.push(`aria-label="${escapeAttr(attrs.alt)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
// Wrap in a block <div> so marked treats it as a block (a bare <video>
// is inline-level HTML and marked wraps it in <p>, leaving a spurious
// empty paragraph beside the hoisted block atom). The wrapper has no
// data-type, so the schema parser ignores it and just hoists the video.
return `<div><video ${parts.join(" ")}></video></div>`;
}
case "youtube": {
// Emit the schema-matching div[data-type="youtube"]; the schema reads
// src from data-src and width/height/align from data-* attributes.
const attrs = node.attrs || {};
const parts = [
`data-type="youtube"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "table": {
// A GFM pipe table cannot represent merged cells. If ANY cell carries
// colspan>1 or rowspan>1, a pipe table would corrupt the grid on
// re-import, so emit the WHOLE table as raw HTML <table> instead: the
// schema's table family parseHTML (tag table/tr/td/th, with colspan/
// rowspan read from the same-named HTML attrs and align via parseHTML)
// round-trips it faithfully. Otherwise keep the lighter GFM pipe table.
const tableRows = nodeContent;
if (tableRows.length === 0)
return "";
const hasSpan = tableRows.some((row) => (row.content || []).some((cell) => (cell.attrs?.colspan ?? 1) > 1 || (cell.attrs?.rowspan ?? 1) > 1));
if (hasSpan) {
// Render each cell's block children to HTML (marked does NOT parse
// markdown inside a raw HTML block, so emitting markdown here would
// leak literal ** / `` into the cell). blockToHtml mirrors the schema
// HTML so inner formatting re-parses into the right marks/nodes.
const renderHtmlCell = (cell) => {
const tag = cell.type === "tableHeader" ? "th" : "td";
const a = cell.attrs || {};
const cellParts = [];
if ((a.colspan ?? 1) > 1)
cellParts.push(`colspan="${escapeAttr(a.colspan)}"`);
if ((a.rowspan ?? 1) > 1)
cellParts.push(`rowspan="${escapeAttr(a.rowspan)}"`);
if (a.align)
cellParts.push(`align="${escapeAttr(a.align)}"`);
const open = cellParts.length
? `<${tag} ${cellParts.join(" ")}>`
: `<${tag}>`;
const inner = (cell.content || [])
.map((block) => blockToHtml(block))
.join("");
return `${open}${inner}</${tag}>`;
};
const htmlRows = tableRows
.map((row) => `<tr>${(row.content || []).map(renderHtmlCell).join("")}</tr>`)
.join("");
return `<table><tbody>${htmlRows}</tbody></table>`;
}
// No merged cells: emit a GFM table (header row + separator) so the
// markdown can be parsed back into a table on re-import.
const rows = tableRows.map(processNode);
const headerCells = tableRows[0]?.content || [];
const columns = headerCells.length || 1;
// Derive alignment markers (:--, :-:, --:) from each header cell.
const markers = Array.from({ length: columns }, (_, i) => {
const align = headerCells[i]?.attrs?.align;
switch (align) {
case "left":
return ":--";
case "center":
return ":-:";
case "right":
return "--:";
default:
return "---";
}
});
const separator = "| " + markers.join(" | ") + " |";
return [rows[0], separator, ...rows.slice(1)].join("\n");
}
case "tableRow":
return "| " + nodeContent.map(processNode).join(" | ") + " |";
case "tableCell":
case "tableHeader": {
// Join multiple block children with a space (not "") so adjacent blocks
// like a paragraph followed by a list don't collide into "line1- a".
// Then collapse newlines and escape pipes so a cell containing "|" or a
// line break cannot corrupt the surrounding GFM row.
return nodeContent
.map(processNode)
.join(" ")
.replace(/\r?\n/g, " ")
.replace(/\|/g, "\\|");
}
case "callout":
const calloutType = node.attrs?.type || "info";
const calloutContent = nodeContent.map(processNode).join("\n");
return `:::${calloutType.toLowerCase()}\n${calloutContent}\n:::`;
case "details":
return nodeContent.map(processNode).join("\n");
case "detailsSummary":
const summaryText = nodeContent.map(processNode).join("");
return `<details>\n<summary>${summaryText}</summary>\n`;
case "detailsContent":
const detailsText = nodeContent.map(processNode).join("\n");
return `${detailsText}\n</details>`;
case "mathInline": {
// The schema's `text` attribute has no parseHTML, so TipTap's default
// parser reads it from the `text` HTML attribute (NOT the element's text
// content). Emit span[data-type="mathInline"] carrying the LaTeX in a
// `text="..."` attribute so it round-trips. marked cannot parse $...$
// back, so the previous form was lossy.
const inlineMath = node.attrs?.text || "";
return `<span data-type="mathInline" data-katex="true" text="${escapeAttr(inlineMath)}"></span>`;
}
case "mathBlock": {
// Same as mathInline: the LaTeX must ride in the `text` HTML attribute
// for the schema's default parser to recover it.
const blockMath = node.attrs?.text || "";
return `<div data-type="mathBlock" data-katex="true" text="${escapeAttr(blockMath)}"></div>`;
}
case "mention": {
// Emit span[data-type="mention"] with the schema's data-* attributes so
// generateJSON rebuilds the mention node instead of leaving "@label"
// plain text that cannot re-parse.
const attrs = node.attrs || {};
const parts = [`data-type="mention"`];
if (attrs.id)
parts.push(`data-id="${escapeAttr(attrs.id)}"`);
if (attrs.label)
parts.push(`data-label="${escapeAttr(attrs.label)}"`);
if (attrs.entityType)
parts.push(`data-entity-type="${escapeAttr(attrs.entityType)}"`);
if (attrs.entityId)
parts.push(`data-entity-id="${escapeAttr(attrs.entityId)}"`);
if (attrs.slugId)
parts.push(`data-slug-id="${escapeAttr(attrs.slugId)}"`);
if (attrs.creatorId)
parts.push(`data-creator-id="${escapeAttr(attrs.creatorId)}"`);
if (attrs.anchorId)
parts.push(`data-anchor-id="${escapeAttr(attrs.anchorId)}"`);
// Keep the label as visible text content too; the schema reads attrs
// from data-*, so the inner text is purely cosmetic and harmless.
const mentionLabel = attrs.label || attrs.id || "";
// The label is visible element TEXT content here (the data-* attrs above
// carry the real values), so escape it for the text context, not attrs.
return `<span ${parts.join(" ")}>@${escapeHtmlText(mentionLabel)}</span>`;
}
case "attachment": {
// BUG FIX: the old code read node.attrs.fileName / node.attrs.src, but
// the schema stores name/url (plus mime/size/attachmentId). Emit the
// schema-matching div[data-type="attachment"] with data-attachment-*
// attrs so the node round-trips instead of degrading to a markdown link.
const attrs = node.attrs || {};
const parts = [
`data-type="attachment"`,
`data-attachment-url="${escapeAttr(attrs.url ?? "")}"`,
];
if (attrs.name)
parts.push(`data-attachment-name="${escapeAttr(attrs.name)}"`);
if (attrs.mime)
parts.push(`data-attachment-mime="${escapeAttr(attrs.mime)}"`);
if (attrs.size != null)
parts.push(`data-attachment-size="${escapeAttr(attrs.size)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "drawio":
case "excalidraw": {
// Emit the schema-matching div[data-type=...] carrying the diagram's
// attrs as data-* (the schema's diagramAttributes reads src/title/alt/
// width/height/size/aspectRatio/align/attachmentId from data-*), so the
// diagram round-trips instead of degrading to a lossy placeholder.
const attrs = node.attrs || {};
const parts = [
`data-type="${type}"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.title != null)
parts.push(`data-title="${escapeAttr(attrs.title)}"`);
if (attrs.alt != null)
parts.push(`data-alt="${escapeAttr(attrs.alt)}"`);
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "embed": {
// Emit the schema-matching div[data-type="embed"]; the schema reads
// src/provider/align/width/height from data-* attributes so the node
// (and its provider iframe info) survives the round-trip.
const attrs = node.attrs || {};
const parts = [
`data-type="embed"`,
`data-src="${escapeAttr(attrs.src ?? "")}"`,
`data-provider="${escapeAttr(attrs.provider ?? "")}"`,
];
if (attrs.align)
parts.push(`data-align="${escapeAttr(attrs.align)}"`);
if (attrs.width != null)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`data-height="${escapeAttr(attrs.height)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "audio": {
// Emit the schema-matching <audio> element (was emitting nothing). The
// schema reads src from src and attachmentId/size from data-*.
const attrs = node.attrs || {};
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
// Wrap in a block <div> for the same reason as video: a bare <audio> is
// inline-level HTML that marked would wrap in <p>.
return `<div><audio ${parts.join(" ")}></audio></div>`;
}
case "pdf": {
// Emit the schema-matching div[data-type="pdf"] (was emitting nothing).
// The schema reads src/width/height from standard attrs and name/
// attachmentId/size from data-*.
const attrs = node.attrs || {};
const parts = [
`data-type="pdf"`,
`src="${escapeAttr(attrs.src ?? "")}"`,
];
if (attrs.name)
parts.push(`data-name="${escapeAttr(attrs.name)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
return `<div ${parts.join(" ")}></div>`;
}
case "columns": {
// Emit the schema-matching div[data-type="columns"] wrapper so the
// multi-column layout survives. Without a case the children were
// concatenated with no separator and the text merged. The schema reads
// layout from data-layout and widthMode from data-width-mode. The whole
// block is raw HTML, so render children via blockToHtml (NOT markdown,
// which marked would not re-parse inside a raw HTML block).
const attrs = node.attrs || {};
const parts = [`data-type="columns"`];
if (attrs.layout)
parts.push(`data-layout="${escapeAttr(attrs.layout)}"`);
if (attrs.widthMode && attrs.widthMode !== "normal")
parts.push(`data-width-mode="${escapeAttr(attrs.widthMode)}"`);
const inner = nodeContent.map((n) => blockToHtml(n)).join("");
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "column": {
// Emit the schema-matching div[data-type="column"]; the schema reads the
// column width from data-width. Children are rendered as HTML so their
// formatting survives inside this raw HTML block.
const attrs = node.attrs || {};
const parts = [`data-type="column"`];
if (attrs.width)
parts.push(`data-width="${escapeAttr(attrs.width)}"`);
const inner = nodeContent.map((n) => blockToHtml(n)).join("");
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "pageBreak":
// Emit the schema-matching div[data-type="pageBreak"] so marked passes
// it through as a block and generateJSON rebuilds the pageBreak atom.
// Without this case the node fell through to `default` and rendered ""
// (the divider silently disappeared and could not round-trip).
return `<div data-type="pageBreak"></div>`;
case "subpages":
return "{{SUBPAGES}}";
default:
// Fallback: process children
return nodeContent.map(processNode).join("");
}
};
// Render inline content (text runs + their marks) to HTML. Used by the raw
// HTML fallbacks (spanned tables, columns) where marked will NOT re-parse
// markdown, so backtick/asterisk/bracket syntax would otherwise leak as
// literal characters. Each mark is mirrored to the HTML the schema's parseHTML
// accepts so it re-imports as the matching ProseMirror mark.
const inlineToHtml = (inlineNodes) => (inlineNodes || [])
.map((n) => {
if (n.type === "hardBreak")
return "<br>";
if (n.type !== "text") {
// Inline atoms (mention, mathInline) already emit schema HTML.
return processNode(n);
}
let t = escapeHtmlText(n.text || "");
for (const mark of n.marks || []) {
switch (mark.type) {
case "bold":
t = `<strong>${t}</strong>`;
break;
case "italic":
t = `<em>${t}</em>`;
break;
case "code":
t = `<code>${t}</code>`;
break;
case "strike":
t = `<s>${t}</s>`;
break;
case "underline":
t = `<u>${t}</u>`;
break;
case "subscript":
t = `<sub>${t}</sub>`;
break;
case "superscript":
t = `<sup>${t}</sup>`;
break;
case "link":
t = `<a href="${escapeAttr(mark.attrs?.href || "")}">${t}</a>`;
break;
case "highlight":
t = mark.attrs?.color
? `<mark style="background-color: ${escapeAttr(mark.attrs.color)}">${t}</mark>`
: `<mark>${t}</mark>`;
break;
case "textStyle":
if (mark.attrs?.color)
t = `<span style="color: ${escapeAttr(mark.attrs.color)}">${t}</span>`;
break;
case "comment":
// Inline comment anchor inside a raw-HTML container (columns /
// spanned table cells), so commented text there also round-trips.
if (mark.attrs?.commentId) {
const r = mark.attrs?.resolved ? ` data-resolved="true"` : "";
t = `<span data-comment-id="${escapeAttr(mark.attrs.commentId)}"${r}>${t}</span>`;
}
break;
}
}
return t;
})
.join("");
// Emit the schema-matching <img> for an image node. Shared so the image is
// emitted as real HTML wherever a raw-HTML container needs it (inside a column
// or a spanned table cell), where markdown `![](...)` would NOT be re-parsed
// and would survive as literal text. The Image extension reads src/alt from
// the standard attributes; the Docmost extra attrs (width/height/align/size/
// attachmentId/aspectRatio) are global attributes read from same-named DOM
// attributes, so emit them by name.
const imageToHtml = (node) => {
const attrs = node.attrs || {};
const parts = [`src="${escapeAttr(attrs.src ?? "")}"`];
if (attrs.alt)
parts.push(`alt="${escapeAttr(attrs.alt)}"`);
if (attrs.title)
parts.push(`title="${escapeAttr(attrs.title)}"`);
if (attrs.width != null)
parts.push(`width="${escapeAttr(attrs.width)}"`);
if (attrs.height != null)
parts.push(`height="${escapeAttr(attrs.height)}"`);
if (attrs.align)
parts.push(`align="${escapeAttr(attrs.align)}"`);
if (attrs.size != null)
parts.push(`data-size="${escapeAttr(attrs.size)}"`);
if (attrs.attachmentId)
parts.push(`data-attachment-id="${escapeAttr(attrs.attachmentId)}"`);
if (attrs.aspectRatio != null)
parts.push(`data-aspect-ratio="${escapeAttr(attrs.aspectRatio)}"`);
return `<img ${parts.join(" ")}>`;
};
// Emit the schema-matching div[data-type="callout"] for a callout node. The
// schema reads the banner type from data-callout-type. Children are rendered
// as HTML so they survive inside a raw-HTML container.
const calloutToHtml = (node) => {
const type = (node.attrs?.type || "info").toLowerCase();
const inner = (node.content || []).map(blockToHtml).join("");
return `<div data-type="callout" data-callout-type="${escapeAttr(type)}">${inner}</div>`;
};
// Emit a schema-matching <details> tree. The schema parses <details>,
// summary[data-type="detailsSummary"], and div[data-type="detailsContent"].
const detailsToHtml = (node) => {
const inner = (node.content || []).map(blockToHtml).join("");
return `<details>${inner}</details>`;
};
const detailsSummaryToHtml = (node) => `<summary data-type="detailsSummary">${inlineToHtml(node.content || [])}</summary>`;
const detailsContentToHtml = (node) => {
const inner = (node.content || []).map(blockToHtml).join("");
return `<div data-type="detailsContent">${inner}</div>`;
};
// Emit the schema-matching taskList/taskItem HTML. bridgeTaskLists (in
// collaboration.ts) recognizes ul[data-type="taskList"] with
// li[data-type="taskItem"][data-checked]; emitting that directly here keeps
// task lists inside columns/cells from degrading to literal "- [ ]" text.
const taskListToHtml = (node) => {
const items = (node.content || [])
.map((it) => {
const checked = it.attrs?.checked ? "true" : "false";
return `<li data-type="taskItem" data-checked="${checked}">${blockChildrenToHtml(it)}</li>`;
})
.join("");
return `<ul data-type="taskList">${items}</ul>`;
};
// Render a block node to HTML for the raw-HTML containers (spanned tables,
// columns). marked does NOT re-parse markdown inside a raw-HTML block, so
// EVERY block type that can appear inside a column or a spanned cell must be
// emitted as schema-matching HTML here — never as markdown, or it would land
// as literal text on re-import. Nodes whose processNode case already produces
// schema-matching HTML (math/media/embed/attachment/nested columns/spanned
// table) are delegated to processNode; the markdown-emitting cases
// (image/blockquote/callout/details/hr/taskList) get explicit HTML here.
const blockToHtml = (block) => {
const children = block.content || [];
switch (block.type) {
case "paragraph":
return `<p>${inlineToHtml(children)}</p>`;
case "heading": {
const level = block.attrs?.level || 1;
return `<h${level}>${inlineToHtml(children)}</h${level}>`;
}
case "bulletList":
return `<ul>${children
.map((li) => `<li>${blockChildrenToHtml(li)}</li>`)
.join("")}</ul>`;
case "orderedList":
return `<ol>${children
.map((li) => `<li>${blockChildrenToHtml(li)}</li>`)
.join("")}</ol>`;
case "codeBlock": {
const lang = block.attrs?.language || "";
// The code itself is element TEXT content (between <code> tags), so it
// must escape < > & — NOT the attribute escaper. The language rides in
// a class ATTRIBUTE, so it uses escapeAttr.
const code = escapeHtmlText(children
.map(processNode)
.join("")
.replace(/\n+$/, ""));
const cls = lang ? ` class="language-${escapeAttr(lang)}"` : "";
return `<pre><code${cls}>${code}</code></pre>`;
}
case "image":
return imageToHtml(block);
case "blockquote":
return `<blockquote>${children.map(blockToHtml).join("")}</blockquote>`;
case "horizontalRule":
return "<hr>";
case "callout":
return calloutToHtml(block);
case "details":
return detailsToHtml(block);
case "detailsSummary":
return detailsSummaryToHtml(block);
case "detailsContent":
return detailsContentToHtml(block);
case "taskList":
return taskListToHtml(block);
case "taskItem":
// A bare taskItem (outside a taskList) still needs a wrapping list so
// the schema parses it; wrap it in a single-item taskList.
return taskListToHtml({ content: [block] });
// table (incl. spanned), columns/column, math, media, embed, attachment,
// mention, etc. already emit schema-matching HTML from processNode.
case "table":
case "columns":
case "column":
case "mathBlock":
case "video":
case "audio":
case "pdf":
case "youtube":
case "embed":
case "attachment":
case "drawio":
case "excalidraw":
return processNode(block);
default:
// Any still-unhandled block type: NEVER fall back to markdown inside a
// raw-HTML block (it would become literal text). Wrap its rendered
// children in a <div> so their content is preserved; if it has no block
// children, render its inline content instead.
if (children.length && children.some((c) => c.type !== "text")) {
return `<div>${children.map(blockToHtml).join("")}</div>`;
}
return `<div>${inlineToHtml(children)}</div>`;
}
};
// Render the block children of a list item to HTML (a listItem holds block+
// content). Mirrors processListItem but for the HTML fallback path.
const blockChildrenToHtml = (item) => (item.content || []).map((b) => blockToHtml(b)).join("");
// Indent the rendered children of a list item under a marker prefix.
// Each child block is a (possibly multi-line) string. The very first physical
// line of the first child carries the marker (e.g. "- " or "1. "); EVERY
// other line — the remaining lines of the first child AND all lines of every
// subsequent child (nested lists, code blocks, extra paragraphs) — is indented
// to align under the marker. Without indenting these continuation lines, the
// 2nd/3rd line of a nested child collapses to column 0 and escapes the list.
//
// The continuation indent MUST equal the LIST marker width, which is not the
// same as the visible prefix width:
// - bullet "- " -> 2 columns
// - task "- [ ] " -> marker is still "- " (the "[ ] " is content), 2
// - ordered "1. "/"10. " -> 3/4 columns, scaling with the number's digits
// CommonMark anchors nested content to the marker column, so an ordered item
// indented to only 2 columns would be re-parsed as a sibling or as loose
// content on re-import. Callers therefore pass the exact indent width to use.
const indentItemChildren = (childStrings, prefix, indentWidth) => {
const indent = " ".repeat(indentWidth);
const lines = [];
childStrings.forEach((child, childIndex) => {
child.split("\n").forEach((line, lineIndex) => {
if (childIndex === 0 && lineIndex === 0) {
// First physical line of the first block gets the marker.
lines.push(`${prefix} ${line}`);
}
else {
// Indent every continuation line by the marker width; keep blank
// lines blank rather than emitting trailing whitespace.
lines.push(line.length ? `${indent}${line}` : "");
}
});
});
return lines.join("\n");
};
const processListItem = (item, prefix) => {
const itemContent = item.content || [];
const childStrings = itemContent.map(processNode);
if (childStrings.length === 0)
return prefix;
// The rendered marker is `${prefix} ` (prefix + one space), so its width —
// and thus the continuation indent — is prefix.length + 1. This is correct
// for both bullet ("-" -> 2) and ordered ("1." -> 3, "10." -> 4) markers,
// since for those the visible prefix IS the list marker.
return indentItemChildren(childStrings, prefix, prefix.length + 1);
};
const processTaskItem = (item) => {
const checked = item.attrs?.checked || false;
const checkbox = checked ? "[x]" : "[ ]";
const prefix = `- ${checkbox}`;
const itemContent = item.content || [];
const childStrings = itemContent.map(processNode);
// An empty task item still needs its checkbox marker; without this guard
// the indent below produces "" and the "- [ ]"/"- [x]" row disappears.
if (childStrings.length === 0)
return prefix;
// The list marker for a task item is just "- " (2 columns); the "[ ] "/"[x] "
// checkbox is item content, NOT part of the marker. So the continuation
// indent is a fixed 2 — do NOT derive it from the wider prefix.length.
return indentItemChildren(childStrings, prefix, 2);
};
return processNode(content).trim();
}
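
The continuation-indent rule described above can be sketched in isolation. This is a minimal standalone illustration, not the converter's own code: `indentUnderMarker` is a hypothetical name, and the sample inputs are invented for the example. It shows that an ordered item with a two-digit number needs a 4-column indent, while a task item keeps the 2-column indent of its `- ` marker.

```javascript
// Hypothetical standalone helper mirroring the rule described above: the first
// physical line of the first child carries the marker, every other non-blank
// line is indented by the list-marker width.
function indentUnderMarker(childStrings, prefix, indentWidth) {
  const indent = " ".repeat(indentWidth);
  const lines = [];
  childStrings.forEach((child, childIndex) => {
    child.split("\n").forEach((line, lineIndex) => {
      if (childIndex === 0 && lineIndex === 0) {
        lines.push(`${prefix} ${line}`);
      } else {
        // Blank lines stay blank so no trailing whitespace is emitted.
        lines.push(line.length ? indent + line : "");
      }
    });
  });
  return lines.join("\n");
}

// Ordered marker "10." renders as "10. " (4 columns), so nested content
// is indented by 4.
const ordered = indentUnderMarker(["first", "- nested a\n- nested b"], "10.", 4);

// Task marker is "- " (2 columns); the checkbox is content, so the indent is 2
// even though the visible prefix "- [ ]" is wider.
const task = indentUnderMarker(["todo", "- sub"], "- [ ]", 2);

console.log(ordered);
console.log(task);
```

Passing the width explicitly, rather than deriving it from `prefix.length`, is what keeps the task-item case from over-indenting its children.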

View File

@@ -0,0 +1,68 @@
/**
* Self-contained Docmost-flavoured Markdown document (custom extensions).
*
* A single `.md` file that packages everything needed to losslessly round-trip
* a page through "download -> edit body -> re-upload":
* - a leading `docmost:meta` block: a one-line JSON object with page identity;
* - the Markdown body (carrying inline comment anchors and diagrams as HTML);
* - a trailing `docmost:comments` block: a one-line JSON array of comment
* threads.
*
* Both metadata blocks are HTML comments on purpose: `marked`/`generateJSON`
* drop HTML comments, so even if the WHOLE file were ever fed straight to the
* importer without first stripping the blocks, the metadata cannot leak into the
* document. (A fenced ```docmost-comments``` block would WRONGLY become a
* codeBlock node, so a fenced block is deliberately NOT used.)
*
* The delimiter literals may legitimately appear in the BODY too (e.g. a user
* re-pastes an exported `.md` into a page, or a page documents this very
* format). To stay robust, parsing treats only the FINAL, document-ending
* `docmost:comments` block as metadata: it is the last `<!-- docmost:comments`
* opener whose closing `-->` sits at the very end of the file. Any earlier
* literal occurrence is left in the body untouched.
*
* NOTE on comments: in this version the comment THREAD records are preserved in
* the file but are NOT pushed back to the server on import — only the inline
* comment marks (anchors) embedded in the body are restored. Managing comment
* records stays with the comment tools/UI.
*/
export interface DocmostMdMeta {
version: number;
pageId?: string;
slugId?: string;
title?: string;
spaceId?: string;
parentPageId?: string | null;
}
/**
* Assemble the full self-contained markdown file: meta block, body, and the
* comments block. The meta block is always emitted; the comments block is always
* emitted too (with `[]` when there are no comments) so the format stays uniform
* and parsing stays simple.
*/
export declare function serializeDocmostMarkdown(meta: DocmostMdMeta, body: string, comments: any[]): string;
/**
* Split a self-contained file back into its parts. Tolerant: if the meta or
* comments block is missing (e.g. a hand-written plain-markdown file), the
* corresponding value is returned as `null` and the whole input is treated as
* the body. This never throws on a MISSING block; only a `JSON.parse` failure
* inside a block that IS present is surfaced as a thrown Error with a clear
* message. Robust to `\r\n` line endings.
*/
export declare function parseDocmostMarkdown(full: string): {
meta: DocmostMdMeta | null;
body: string;
comments: any[] | null;
};
/**
* Serialize a self-contained markdown file with the meta block + body ONLY —
* NO trailing `docmost:comments` block. The sync engine never touches
* `/comments` (SPEC §3): the synced file carries just page identity (meta) and
* the body, where comment threads survive only as inline `<span
* data-comment-id>` anchor marks inside the body.
*
* `parseDocmostMarkdown` already tolerates a missing comments block (it returns
* `comments: null` and treats the rest as body), so a file produced here
* round-trips cleanly through the parser.
*/
export declare function serializeDocmostMarkdownBody(meta: DocmostMdMeta, body: string): string;

View File

@@ -0,0 +1,118 @@
/**
* Self-contained Docmost-flavoured Markdown document (custom extensions).
*
* A single `.md` file that packages everything needed to losslessly round-trip
* a page through "download -> edit body -> re-upload":
* - a leading `docmost:meta` block: a one-line JSON object with page identity;
* - the Markdown body (carrying inline comment anchors and diagrams as HTML);
* - a trailing `docmost:comments` block: a one-line JSON array of comment
* threads.
*
* Both metadata blocks are HTML comments on purpose: `marked`/`generateJSON`
* drop HTML comments, so even if the WHOLE file were ever fed straight to the
* importer without first stripping the blocks, the metadata cannot leak into the
* document. (A fenced ```docmost-comments``` block would WRONGLY become a
* codeBlock node, so a fenced block is deliberately NOT used.)
*
* The delimiter literals may legitimately appear in the BODY too (e.g. a user
* re-pastes an exported `.md` into a page, or a page documents this very
* format). To stay robust, parsing treats only the FINAL, document-ending
* `docmost:comments` block as metadata: it is the last `<!-- docmost:comments`
* opener whose closing `-->` sits at the very end of the file. Any earlier
* literal occurrence is left in the body untouched.
*
* NOTE on comments: in this version the comment THREAD records are preserved in
* the file but are NOT pushed back to the server on import — only the inline
* comment marks (anchors) embedded in the body are restored. Managing comment
* records stays with the comment tools/UI.
*/
// Match the leading meta block (allow leading whitespace). Capture group 1 is
// the JSON text between the markers.
const META_RE = /^\s*<!--\s*docmost:meta\s*\n([\s\S]*?)\n-->/;
// Match a `docmost:comments` opener. Used globally to scan for the LAST opener
// rather than end-anchoring a single regex (which would mis-capture across a
// literal opener that appears earlier in the body).
const COMMENTS_OPEN_RE = /<!--[ \t]*docmost:comments[ \t]*\r?\n/g;
/**
* Assemble the full self-contained markdown file: meta block, body, and the
* comments block. The meta block is always emitted; the comments block is always
* emitted too (with `[]` when there are no comments) so the format stays uniform
* and parsing stays simple.
*/
export function serializeDocmostMarkdown(meta, body, comments) {
const metaJson = JSON.stringify(meta);
const commentsJson = JSON.stringify(Array.isArray(comments) ? comments : []);
const trimmedBody = (body ?? "").trim();
return (`<!-- docmost:meta\n${metaJson}\n-->\n\n` +
`${trimmedBody}\n\n` +
`<!-- docmost:comments\n${commentsJson}\n-->\n`);
}
/**
* Split a self-contained file back into its parts. Tolerant: if the meta or
* comments block is missing (e.g. a hand-written plain-markdown file), the
* corresponding value is returned as `null` and the whole input is treated as
* the body. This never throws on a MISSING block; only a `JSON.parse` failure
* inside a block that IS present is surfaced as a thrown Error with a clear
* message. Robust to `\r\n` line endings.
*/
export function parseDocmostMarkdown(full) {
// Normalize line endings so the anchored regexes work regardless of CRLF.
const normalized = (full ?? "").replace(/\r\n/g, "\n");
// Extract the leading meta block (start-anchored — already unambiguous).
let meta = null;
let metaEnd = 0;
const metaMatch = normalized.match(META_RE);
if (metaMatch) {
try {
meta = JSON.parse(metaMatch[1]);
}
catch (e) {
throw new Error(`Invalid docmost:meta JSON block: ${e instanceof Error ? e.message : String(e)}`);
}
// Body starts right after the matched meta block.
metaEnd = (metaMatch.index ?? 0) + metaMatch[0].length;
}
// Find the LAST `<!-- docmost:comments` opener; the real file-level block is
// the final one whose closing `-->` ends the document. Any earlier literal
// occurrence inside the body (e.g. a re-pasted export) is left in the body.
let lastOpenStart = -1;
let lastOpenEnd = -1;
let m;
COMMENTS_OPEN_RE.lastIndex = 0;
while ((m = COMMENTS_OPEN_RE.exec(normalized)) !== null) {
lastOpenStart = m.index;
lastOpenEnd = m.index + m[0].length;
}
let comments = null;
let bodyEnd = normalized.length;
if (lastOpenStart !== -1) {
const rest = normalized.slice(lastOpenEnd);
const close = rest.match(/\r?\n-->[ \t]*\r?\n?\s*$/); // closer must end the doc
if (close) {
const jsonText = rest.slice(0, close.index);
try {
comments = JSON.parse(jsonText);
}
catch (e) {
throw new Error(`Invalid docmost:comments JSON block: ${e instanceof Error ? e.message : String(e)}`);
}
bodyEnd = lastOpenStart; // strip from the opener to end of document
}
}
const body = normalized.slice(metaEnd, bodyEnd).trim();
return { meta, body, comments };
}
/**
* Serialize a self-contained markdown file with the meta block + body ONLY —
* NO trailing `docmost:comments` block. The sync engine never touches
* `/comments` (SPEC §3): the synced file carries just page identity (meta) and
* the body, where comment threads survive only as inline `<span
* data-comment-id>` anchor marks inside the body.
*
* `parseDocmostMarkdown` already tolerates a missing comments block (it returns
* `comments: null` and treats the rest as body), so a file produced here
* round-trips cleanly through the parser.
*/
export function serializeDocmostMarkdownBody(meta, body) {
return `<!-- docmost:meta\n${JSON.stringify(meta)}\n-->\n\n${(body ?? "").trim()}\n`;
}
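
The "last opener wins" rule used by the parser can be illustrated with a reduced sketch. It assumes line endings are already normalized to `\n` and leaves out the meta block; `splitTrailingComments` and the sample document are hypothetical and exist only for this example.

```javascript
// Hypothetical reduced version of the trailing-block detection: only the LAST
// `docmost:comments` opener is considered, and only if its closing `-->` ends
// the document.
function splitTrailingComments(text) {
  const openRe = /<!--[ \t]*docmost:comments[ \t]*\n/g;
  let start = -1;
  let end = -1;
  let m;
  while ((m = openRe.exec(text)) !== null) {
    start = m.index;
    end = m.index + m[0].length;
  }
  if (start === -1) return { body: text.trim(), comments: null };
  const rest = text.slice(end);
  const close = rest.match(/\n-->\s*$/); // closer must end the document
  if (!close) return { body: text.trim(), comments: null };
  return {
    body: text.slice(0, start).trim(),
    comments: JSON.parse(rest.slice(0, close.index)),
  };
}

// A body that itself contains a pasted comments block, followed by the real
// trailing block.
const file = [
  "Intro paragraph.",
  "",
  "<!-- docmost:comments",
  '[{"id":"pasted"}]',
  "-->",
  "",
  "More body text.",
  "",
  "<!-- docmost:comments",
  '[{"id":"c1"}]',
  "-->",
  "",
].join("\n");

const parsed = splitTrailingComments(file);
console.log(parsed.comments, parsed.body.endsWith("More body text."));
```

The earlier pasted block stays inside `body` untouched; only the final, document-ending block is treated as metadata.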

View File

@@ -0,0 +1,2 @@
/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
export declare function markdownToProseMirror(markdownContent: string): Promise<any>;

View File

@@ -0,0 +1,306 @@
/**
* Pure markdown -> ProseMirror conversion.
*
* The converter path is `markdownToProseMirror` (marked -> HTML ->
* generateJSON) plus the two pre/post processors it needs (`preprocessCallouts`,
* `bridgeTaskLists`). The gitmost server writes the resulting page bodies
* natively through the collab gateway, so no websocket/Yjs write-path lives
* here.
*/
import { generateJSON } from "@tiptap/html";
import { JSDOM } from "jsdom";
import { marked } from "marked";
import { docmostExtensions } from "./docmost-schema.js";
// Set up a DOM environment for Tiptap HTML parsing in Node.js
const dom = new JSDOM("<!DOCTYPE html><html><body></body></html>");
global.window = dom.window;
global.document = dom.window.document;
// @ts-ignore
global.Element = dom.window.Element;
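/**
* Hard ceiling above which we skip callout preprocessing entirely. The scanner
* below is linear on well-formed input, but an unterminated `:::type` opener
* rescans to the end of its slice, so many unclosed openers degrade toward
* quadratic time. The cap limits (it does not eliminate) that cost: a
* pathological multi-megabyte payload is passed through verbatim (callouts are
* simply not detected) rather than risking a slow scan. The limit is compared
* against `String.length` (UTF-16 code units), so it approximates a byte count.
*/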
const MAX_CALLOUT_PREPROCESS_BYTES = 4 * 1024 * 1024; // 4 MB
/** Matches an opening callout fence: `:::type` (type captured; the caller lower-cases it). */
const CALLOUT_OPEN_RE = /^:::\s*(\w+)\s*$/;
/** Matches a bare closing callout fence: `:::`. */
const CALLOUT_CLOSE_RE = /^:::\s*$/;
/** Matches the start/end of a code fence (``` or ~~~), capturing the marker. */
const CODE_FENCE_RE = /^(\s*)(`{3,}|~{3,})/;
/**
* Pre-process Docmost-flavoured markdown: convert `:::type ... :::`
* callout blocks (the syntax our markdown export produces) into HTML
* divs that the callout extension parses. The inner content is rendered
* through marked as regular markdown.
*
* Implemented as a line scanner rather than a repeated regex rescan; well-formed
* input is handled in about one pass per nesting level. It:
* - tracks fenced code regions (```...``` and ~~~...~~~) and never treats a
* `:::` line that lives inside a code fence as a callout delimiter, so a
* callout body that itself contains a fenced code block with a `:::` line is
* no longer corrupted;
* - matches an opening `:::type` line with the next CLOSING `:::` at the SAME
* nesting level, supporting NESTED callouts via a depth counter (an inner
* `:::type` opens a deeper level and consumes a matching `:::`);
* - emits the same `<div data-type="callout" data-callout-type="TYPE">` output
* (inner rendered through marked) as the previous regex implementation.
*/
async function preprocessCallouts(markdown) {
// Defensive cap: skip preprocessing for pathologically large inputs.
if (markdown.length > MAX_CALLOUT_PREPROCESS_BYTES) {
return markdown;
}
// Recursively transform a slice of lines, converting top-level callouts in
// that slice into <div> blocks and rendering their inner content (which may
// itself contain nested callouts) through this same function.
const transform = async (lines) => {
const out = [];
let inCodeFence = false;
let codeFenceMarker = ""; // the exact run of backticks/tildes that opened it
let i = 0;
while (i < lines.length) {
const line = lines[i];
// Inside a code fence, only its matching closing fence is significant;
// everything else (including `:::` lines) is copied through verbatim.
if (inCodeFence) {
out.push(line);
const fence = line.match(CODE_FENCE_RE);
if (fence && fence[2].startsWith(codeFenceMarker[0]) &&
fence[2].length >= codeFenceMarker.length) {
inCodeFence = false;
codeFenceMarker = "";
}
i++;
continue;
}
// A code fence opening outside any callout body: enter code-fence mode.
const fenceOpen = line.match(CODE_FENCE_RE);
if (fenceOpen) {
inCodeFence = true;
codeFenceMarker = fenceOpen[2];
out.push(line);
i++;
continue;
}
// An opening callout fence: scan forward (with code-fence and nested
// callout awareness) for its matching closing `:::` at the same level.
const open = line.match(CALLOUT_OPEN_RE);
if (open) {
const type = open[1].toLowerCase();
const bodyLines = [];
let depth = 1;
let innerInCodeFence = false;
let innerCodeFenceMarker = "";
let j = i + 1;
for (; j < lines.length; j++) {
const bl = lines[j];
if (innerInCodeFence) {
const f = bl.match(CODE_FENCE_RE);
if (f && f[2].startsWith(innerCodeFenceMarker[0]) &&
f[2].length >= innerCodeFenceMarker.length) {
innerInCodeFence = false;
innerCodeFenceMarker = "";
}
bodyLines.push(bl);
continue;
}
const innerFence = bl.match(CODE_FENCE_RE);
if (innerFence) {
innerInCodeFence = true;
innerCodeFenceMarker = innerFence[2];
bodyLines.push(bl);
continue;
}
if (CALLOUT_OPEN_RE.test(bl)) {
depth++;
bodyLines.push(bl);
continue;
}
if (CALLOUT_CLOSE_RE.test(bl)) {
depth--;
if (depth === 0)
break; // matching close for THIS callout
bodyLines.push(bl);
continue;
}
bodyLines.push(bl);
}
if (j < lines.length) {
// Found the matching closing fence: render the body (recursively, so
// nested callouts are handled) and emit the callout div.
const inner = await transform(bodyLines);
const renderedInner = await marked.parse(inner);
out.push(`\n<div data-type="callout" data-callout-type="${type}">${renderedInner}</div>\n`);
i = j + 1; // skip past the closing `:::`
continue;
}
// No matching close (unterminated callout): treat the opener as a
// literal line and continue, preserving the original text.
out.push(line);
i++;
continue;
}
out.push(line);
i++;
}
return out.join("\n");
};
return transform(markdown.split("\n"));
}
/**
* Bridge marked's checkbox lists to TipTap task lists.
*
* marked renders GitHub task list items (`- [x] done`) as a plain
* `<ul><li><p><input type="checkbox" checked> text</p></li></ul>` WITHOUT the
* markup TipTap's TaskList/TaskItem extensions parse. This rewrites such lists
* into the shape those extensions expect:
* TaskList parseHTML matches `ul[data-type="taskList"]`,
* TaskItem matches `li[data-type="taskItem"]`,
* the checked state is read from `data-checked === "true"`.
*
* A list is only converted when it has at least one `<li>` and EVERY direct
* `<li>` contains a checkbox input. Both `<ul>` and `<ol>` are considered: a
* numbered checklist (`1. [x] a`, which marked renders as an `<ol>` of checkbox
* `<li>`s) would otherwise lose its task state. TipTap task lists are unordered,
* so a matching `<ol>` is emitted as `data-type="taskList"` exactly like a
* `<ul>`. Mixed or ordinary lists (including ordinary `<ol>` lists) are left
* untouched so they keep rendering as bullet/numbered lists. The marked `<p>`
* wrapper is kept inside the `<li>` because TaskItem content allows paragraphs.
*/
function bridgeTaskLists(html) {
// Cheap early-out: if the markup contains no checkbox input at all there is
// nothing to bridge, so skip the expensive JSDOM parse entirely. This is the
// common case (most pages have no task lists).
if (!/type=["']?checkbox/i.test(html)) {
return html;
}
// Defensive cap (consistent with preprocessCallouts): skip the bridge for
// pathologically large inputs rather than running a second expensive JSDOM
// parse on a multi-megabyte payload. The markup is passed through verbatim.
if (html.length > MAX_CALLOUT_PREPROCESS_BYTES) {
return html;
}
const dom = new JSDOM(html);
const document = dom.window.document;
// Collect the checkbox(es) that belong to THIS <li> directly: either direct
// child <input type="checkbox"> elements or ones inside the <li>'s direct <p>
// child (the shape marked emits: `<li><p><input type="checkbox"> text</p></li>`).
// Checkboxes nested deeper (e.g. inside a child <ul>/<ol>) are excluded so a
// bullet <li> that merely contains a nested task sublist is not misdetected.
// Raw inline HTML can put more than one checkbox in a single <li>; we gather
// ALL of them so none survive into the converted item.
const directCheckboxes = (li) => {
const found = [];
for (const child of Array.from(li.children)) {
if (child.tagName === "INPUT" &&
child.getAttribute("type") === "checkbox") {
found.push(child);
continue;
}
if (child.tagName === "P") {
for (const inp of Array.from(child.querySelectorAll(":scope > input[type='checkbox']"))) {
found.push(inp);
}
}
}
return found;
};
// Both <ul> and <ol> are candidates: an <ol> whose every direct <li> carries
// its own checkbox is a numbered checklist that must also become a taskList.
const lists = Array.from(document.querySelectorAll("ul, ol"));
for (const list of lists) {
// Only consider DIRECT child <li> elements; nested lists are handled by
// their own iteration of the outer loop.
const items = Array.from(list.children).filter((child) => child.tagName === "LI");
if (items.length === 0)
continue;
const itemCheckboxes = items.map((li) => directCheckboxes(li));
// Convert only when every direct <li> carries at least one OWN checkbox.
if (!itemCheckboxes.every((boxes) => boxes.length > 0))
continue;
// A numbered checklist arrives as an <ol>. We must NOT leave the tag as
// <ol> while tagging it data-type="taskList": generateJSON would then match
// BOTH the orderedList rule (tag ol) and the taskList rule (data-type),
// emitting a phantom empty orderedList beside the real taskList. So rename a
// qualifying <ol> to a <ul> — move its <li> children over and replace it —
// leaving only the taskList rule to match. Already-<ul> lists are unchanged.
let target = list;
if (list.tagName === "OL") {
const ul = document.createElement("ul");
// Carry over existing attributes (e.g. class) so nothing is silently lost.
for (const attr of Array.from(list.attributes)) {
ul.setAttribute(attr.name, attr.value);
}
// Move every child node (including the <li>s we collected) into the <ul>.
while (list.firstChild) {
ul.appendChild(list.firstChild);
}
list.replaceWith(ul);
target = ul;
}
target.setAttribute("data-type", "taskList");
items.forEach((li, index) => {
const boxes = itemCheckboxes[index];
// The first checkbox determines the checked state (matches the previous
// single-checkbox behaviour); any extras only need removing.
const input = boxes[0] ?? null;
li.setAttribute("data-type", "taskItem");
const checked = input != null &&
(input.hasAttribute("checked") || input.checked);
li.setAttribute("data-checked", checked ? "true" : "false");
// Remove ALL direct checkbox inputs so none survive into the content
// (a raw-inline-HTML <li> may carry more than one).
for (const box of boxes) {
box.remove();
}
});
}
return document.body.innerHTML;
}
/**
* Recursively strip content-less paragraph nodes from a generated doc.
*
* A block-level atom whose markdown form is INLINE (e.g. the block `image`'s
* `![](url)`, or a bare media element) is wrapped by marked in a <p>; the schema
* then HOISTS the block atom out of that paragraph, leaving an EMPTY paragraph
* sibling. On the next export that empty `<p>` renders to "" and the doc "\n\n"
* join injects a phantom blank gap, so the markdown is not byte-stable.
*
* Markdown blank lines are separators, never content, so an empty paragraph in
* generateJSON output is normally such a hoist artifact, and removing it is safe
* and general (it also subsumes the <div>-wrapper workaround the `video` case
* uses). The one exception, a container whose schema requires content, is
* handled by the guard in the function body. We remove ONLY
* `type === 'paragraph'` nodes whose `content` is absent or an empty array;
* every other node (including atoms without `content`) is preserved, and we
* recurse into the content of any node that has children.
*/
function stripEmptyParagraphs(node) {
if (!node || !Array.isArray(node.content)) {
// Atom / leaf node (no children to recurse into): keep as-is.
return node;
}
const mapped = node.content.map((child) => stripEmptyParagraphs(child));
const isEmptyParagraph = (child) => !!child &&
child.type === "paragraph" &&
(!Array.isArray(child.content) || child.content.length === 0);
const filtered = mapped.filter((child) => !isEmptyParagraph(child));
// Schema-validity guard: several nodes require NON-empty block content
// (`content: "block+"` — tableCell, tableHeader, blockquote, column, callout,
// and the doc root). For an empty one of those, generateJSON materializes a
// single empty paragraph as its OBLIGATORY content — that is not a hoist
// artifact. If stripping would empty the container, keep ONE empty paragraph
// so the result stays schema-valid (an empty cell/quote must not become `[]`).
const cleaned = filtered.length === 0 && mapped.length > 0 ? [mapped[0]] : filtered;
return { ...node, content: cleaned };
}
/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
export async function markdownToProseMirror(markdownContent) {
const withCallouts = await preprocessCallouts(markdownContent);
const html = await marked.parse(withCallouts);
const bridged = bridgeTaskLists(html);
const doc = generateJSON(bridged, docmostExtensions);
return stripEmptyParagraphs(doc);
}
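
The matching of an opening `:::type` line to its closing `:::` can be sketched on its own. This is a simplified illustration under stated assumptions, not the function above: `findCalloutClose` is a hypothetical helper that only returns the index of the matching closer, and the three regexes are local copies of the patterns described in the comments.

```javascript
// Local copies of the patterns described above (assumed, for illustration).
const OPEN = /^:::\s*(\w+)\s*$/;
const CLOSE = /^:::\s*$/;
const FENCE = /^\s*(`{3,}|~{3,})/;

// Hypothetical helper: return the index of the `:::` line that closes the
// callout opened at `openIndex`, or -1 when the callout is unterminated.
function findCalloutClose(lines, openIndex) {
  let depth = 1;
  let fence = ""; // marker of the code fence we are inside, "" when outside
  for (let j = openIndex + 1; j < lines.length; j++) {
    const line = lines[j];
    const f = line.match(FENCE);
    if (fence) {
      // Inside a code fence only a same-character fence of at least the same
      // length is significant; `:::` lines are ignored.
      if (f && f[1][0] === fence[0] && f[1].length >= fence.length) fence = "";
      continue;
    }
    if (f) {
      fence = f[1];
      continue;
    }
    if (OPEN.test(line)) {
      depth++; // nested callout opens a deeper level
      continue;
    }
    if (CLOSE.test(line) && --depth === 0) return j;
  }
  return -1;
}

const sample = [
  ":::info",
  "text",
  "```",
  ":::",
  "```",
  ":::warning",
  "inner",
  ":::",
  ":::",
];

console.log(findCalloutClose(sample, 0), findCalloutClose(sample, 5));
```

The `:::` at index 3 sits inside a code fence and is skipped, the one at index 7 closes the nested `:::warning`, and the outer callout closes at index 8.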

View File

@@ -0,0 +1,194 @@
/**
* Pure, network-free helpers for manipulating a ProseMirror/TipTap document
* tree by node id.
*
* A ProseMirror node here is a plain JSON object of the shape produced by
* Docmost: `{ type, attrs?, content?, text?, marks? }`. Children live in the
* `content` array; a node carries a stable id in `attrs.id`. Callouts and
* table cells hold their children in `content` just like any other block, so a
* single recursive walk reaches them all.
*
* Every exported function operates on a DEEP CLONE of the input document and
* returns the new document. The input doc and any `newNode`/`node` argument are
* never mutated. All functions are defensively null-safe: missing or non-array
* `content`, non-object nodes, and absent `attrs` are tolerated.
*/
/**
* Recursively concatenate all text contained in a node.
*
* Text nodes contribute their `text` string; container nodes contribute the
* joined `blockPlainText` of their `content` children. Returns "" for nullish
* or non-object inputs.
*/
export declare function blockPlainText(node: any): string;
/** One compact outline entry for a single top-level block. */
export interface OutlineEntry {
index: number;
type: string | undefined;
id: string | null;
firstText: string;
/** Present for headings only. */
level?: number | null;
/** Present for tables only. */
rows?: number;
cols?: number;
header?: string[];
/** Present for list blocks only (bulletList/orderedList/taskList). */
items?: number;
}
/**
* Build a COMPACT outline of the TOP-LEVEL blocks of `doc` (the entries in
* `doc.content`). Deliberately does NOT recurse into paragraphs, list items, or
* table cells — compactness is the point; use `getNodeByRef` to drill into a
* specific block.
*
* Each entry carries `{ index, type, id, firstText }`, plus type-specific
* extras: headings add `level`; tables add `rows`/`cols` and the first row's
* cell texts (each cut to 40 chars) as `header`; list blocks (types ending in
* "List") add `items`. `firstText` is the block's plain text cut to 100 chars,
* with an ellipsis appended when truncated. Null-safe: a missing or non-object
* doc/content yields `[]`.
*/
export declare function buildOutline(doc: any): OutlineEntry[];
/**
* Resolve a single node by reference and return `{ node, path, type }`, or
* `null` when nothing matches.
*
* - `ref` of the form `#<n>` (e.g. `#2`) selects the TOP-LEVEL block at index
* `n` in `doc.content`. This is the only way to address a table directly:
* table/tableRow/tableCell nodes carry no `attrs.id`, and rows and cells are
* never top-level, so they are reached through their table.
* - Otherwise `ref` is treated as a block id: the FIRST node anywhere in the
* tree with `attrs.id === ref` is returned.
*
* `path` is the array of child indices from the doc root down to the node
* (so a top-level block is `[index]`). The returned `node` is a DEEP CLONE,
* so callers can mutate it without touching the input doc. Null-safe.
*/
export declare function getNodeByRef(doc: any, ref: string): {
node: any;
path: number[];
type: string | undefined;
} | null;
/**
* Replace EVERY node whose `attrs.id === nodeId` with a deep clone of
* `newNode`, anywhere in the tree (including inside callouts and table cells).
*
* Operates on a clone of `doc`; returns `{ doc, replaced }` where `replaced`
* is the number of nodes substituted. A fresh clone of `newNode` is used for
* each match so they do not share references.
*/
export declare function replaceNodeById(doc: any, nodeId: string, newNode: any): {
doc: any;
replaced: number;
};
/**
* Remove EVERY node whose `attrs.id === nodeId` from its parent `content`
* array, anywhere in the tree (recursive, including callouts and tables).
*
* Operates on a clone of `doc`; returns `{ doc, deleted }` where `deleted` is
* the number of nodes removed.
*/
export declare function deleteNodeById(doc: any, nodeId: string): {
doc: any;
deleted: number;
};
/**
* Deep-clone `doc` and strip every node/mark attribute whose value is strictly
* `undefined`, so the result is safe to hand to Yjs (which throws an opaque
* "Unexpected content type" when asked to store an `undefined` attribute value).
*
* Only `undefined` keys are removed; `null`, `false`, `0`, and `""` are all
* legitimate JSON-storable values and are preserved. Operates on a clone and
* returns it; the input is never mutated. Defensively null-safe like the rest
* of the file.
*/
export declare function sanitizeForYjs(doc: any): any;
/**
* Diagnostics helper: walk the tree and return a human-readable path string for
* the FIRST attribute value (in any `node.attrs` or `mark.attrs`) that Yjs
* cannot store — i.e. `undefined`, a `function`, a `symbol`, or a `bigint`
* (e.g. `content[3].content[0].attrs.indent (undefined)`). Returns `null` when
* every attribute is storable. Null-safe.
*/
export declare function findUnstorableAttr(doc: any): string | null;
/** Options controlling where `insertNodeRelative` places the new node. */
export interface InsertOptions {
position: "before" | "after" | "append";
/** Resolve the anchor by node id anywhere in the tree (preferred). */
anchorNodeId?: string;
/** Fallback: first TOP-LEVEL block whose plain text includes this string. */
anchorText?: string;
}
/**
* Insert a deep clone of `node` relative to an anchor.
*
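* - position "append": push the node onto the top-level `doc.content`. Throws
* for a tableRow/tableCell/tableHeader, which cannot live at the top level.
* - position "before"/"after": locate the anchor and splice the node into the
* anchor's parent `content` array immediately before / after it. A
* tableRow/tableCell/tableHeader is instead spliced into the nearest enclosing
* table/tableRow on the anchor's ancestor chain, and the call throws when the
* anchor is not inside such a container.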
*
* Anchor resolution for before/after:
* - if `anchorNodeId` is given, find the node with `attrs.id === anchorNodeId`
* anywhere in the tree (recursive);
* - otherwise, if `anchorText` is given, scan only TOP-LEVEL `doc.content`
* blocks and pick the first whose `blockPlainText` includes `anchorText`.
*
* Operates on a clone of `doc`; returns `{ doc, inserted }`. `inserted` is
* false when the anchor could not be resolved (the doc is returned unchanged
* apart from being cloned).
*/
export declare function insertNodeRelative(doc: any, node: any, opts: InsertOptions): {
doc: any;
inserted: boolean;
};
/**
* Read a table as a matrix. Returns null when `tableRef` resolves to no table.
*
* - `rows`/`cols`: the table's row count and the column count of its FIRST row.
* Tables may be ragged (rows of differing length), so `cols` reflects only
* row 0; use the per-row length of `cells`/`cellIds` for each row's actual
* width.
* - `cells`: `string[][]` of each cell's `blockPlainText`.
* - `cellIds`: `(string|null)[][]` of each cell's FIRST paragraph id (or null),
* so callers can `patch_node` a cell for rich-formatted edits.
* - `path`: index path of the table within the doc.
*/
export declare function readTable(doc: any, tableRef: string): {
rows: number;
cols: number;
cells: string[][];
cellIds: (string | null)[][];
path: number[];
} | null;
/**
* Insert a row of plain-text cells into a table. Returns `{ doc, inserted }`.
*
* The row is padded to the table's column count, taken as the widest existing
* row (`cells[i] ?? ""`); supplying MORE cells than columns throws. Each new
* cell copies `colwidth` for its column from the first (header) row when
* present, and gets a fresh-id paragraph plus `colspan: 1, rowspan: 1` attrs.
* `index` (when an integer in `[0, rows]`) splices the row there; otherwise the
* row is appended at the end. A row landing at index 0 inherits each header
* cell's type, since it becomes the new header row.
*/
export declare function insertTableRow(doc: any, tableRef: string, cells: string[], index?: number): {
doc: any;
inserted: boolean;
};
/**
* Delete the row at 0-based `index` from a table. Returns `{ doc, deleted }`.
* `deleted` is false only when the table cannot be located. Throws on an
* out-of-range index, and refuses to delete the table's only row.
*/
export declare function deleteTableRow(doc: any, tableRef: string, index: number): {
doc: any;
deleted: boolean;
};
/**
* Set the plain-text content of cell `[row, col]` (0-based) to `text`. Returns
* `{ doc, updated }`; `updated` is false only when the table cannot be located.
* Throws when `row`/`col` is out of range. The cell's own attrs (colspan/
* rowspan/colwidth) are preserved; its content becomes a single text paragraph
* that reuses the cell's existing first-paragraph id when present, else a fresh
* one.
*/
export declare function updateTableCell(doc: any, tableRef: string, row: number, col: number, text: string): {
doc: any;
updated: boolean;
};
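// Illustration (not part of the declarations): `getNodeByRef` and the table
// helpers accept either `#<n>` or a block id. Below is a minimal sketch of that
// dispatch, assuming a strict "# followed by digits" pattern. `parseRef` is a
// hypothetical helper name used only for this example, not an export of the
// module.

```javascript
// Classify a reference string as a top-level index or a block id.
function parseRef(ref) {
  const indexMatch = typeof ref === "string" ? ref.match(/^#(\d+)$/) : null;
  if (indexMatch) {
    return { kind: "index", index: Number(indexMatch[1]) };
  }
  // Anything else (including "#2a") is treated as an id to search for.
  return { kind: "id", id: ref };
}

const byIndex = parseRef("#2");
const byId = parseRef("k3j9x0ab12cd");
const notAnIndex = parseRef("#2a");
console.log(byIndex, byId, notAnIndex);
```

// Because the pattern is anchored at both ends, a ref such as "#2a" falls
// through to the id search rather than being read as index 2.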


@@ -0,0 +1,770 @@
/**
* Pure, network-free helpers for manipulating a ProseMirror/TipTap document
* tree by node id.
*
* A ProseMirror node here is a plain JSON object of the shape produced by
* Docmost: `{ type, attrs?, content?, text?, marks? }`. Children live in the
* `content` array; a node carries a stable id in `attrs.id`. Callouts and
* table cells hold their children in `content` just like any other block, so a
* single recursive walk reaches them all.
*
* Every exported function operates on a DEEP CLONE of the input document and
* returns the new document. The input doc and any `newNode`/`node` argument are
* never mutated. All functions are defensively null-safe: missing or non-array
* `content`, non-object nodes, and absent `attrs` are tolerated.
*/
/** Deep-clone a JSON-serializable value without mutating the original. */
function clone(value) {
if (typeof structuredClone === "function") {
return structuredClone(value);
}
// Fallback for environments without structuredClone.
return JSON.parse(JSON.stringify(value));
}
/** True if `value` is a non-null object (and not an array). */
function isObject(value) {
return value != null && typeof value === "object" && !Array.isArray(value);
}
/** True if `node` carries the given id in `node.attrs.id`. */
function matchesId(node, nodeId) {
return isObject(node) && isObject(node.attrs) && node.attrs.id === nodeId;
}
/**
* Recursively concatenate all text contained in a node.
*
* Text nodes contribute their `text` string; container nodes contribute the
* joined `blockPlainText` of their `content` children. Returns "" for nullish
* or non-object inputs.
*/
export function blockPlainText(node) {
if (!isObject(node))
return "";
let out = "";
if (typeof node.text === "string") {
out += node.text;
}
if (Array.isArray(node.content)) {
for (const child of node.content) {
out += blockPlainText(child);
}
}
return out;
}
/** Truncate `text` to at most `n` chars, appending an ellipsis when cut. */
function truncate(text, n) {
return text.length > n ? text.slice(0, n) + "…" : text;
}
/**
* Build a COMPACT outline of the TOP-LEVEL blocks of `doc` (the entries in
* `doc.content`). Deliberately does NOT recurse into paragraphs, list items, or
* table cells — compactness is the point; use `getNodeByRef` to drill into a
* specific block.
*
* Each entry carries `{ index, type, id, firstText }`, plus type-specific
* extras: headings add `level`; tables add `rows`/`cols` and the first row's
* cell texts (each cut to 40 chars) as `header`; list blocks (types ending in
* "List") add `items`. `firstText` is the block's plain text cut to 100 chars,
* with an ellipsis appended when truncated. Null-safe: a missing or non-object
* doc/content yields `[]`.
*/
export function buildOutline(doc) {
if (!isObject(doc) || !Array.isArray(doc.content))
return [];
const out = [];
for (let i = 0; i < doc.content.length; i++) {
const block = doc.content[i];
const type = isObject(block) ? block.type : undefined;
const entry = {
index: i,
type,
id: isObject(block) && isObject(block.attrs) ? block.attrs.id ?? null : null,
firstText: truncate(blockPlainText(block), 100),
};
if (type === "heading") {
entry.level = isObject(block.attrs) ? block.attrs.level ?? null : null;
}
else if (type === "table") {
const headerRow = block.content?.[0]?.content ?? [];
entry.rows = block.content?.length ?? 0;
entry.cols = block.content?.[0]?.content?.length ?? 0;
entry.header = headerRow.map((cell) => truncate(blockPlainText(cell), 40));
}
else if (typeof type === "string" && type.endsWith("List")) {
entry.items = block.content?.length ?? 0;
}
out.push(entry);
}
return out;
}
/**
* Resolve a single node by reference and return `{ node, path, type }`, or
* `null` when nothing matches.
*
* - `ref` of the form `#<n>` (e.g. `#2`) selects the TOP-LEVEL block at index
* `n` in `doc.content`. This is the only way to address a table directly:
* table/tableRow/tableCell nodes carry no `attrs.id`, and rows and cells are
* never top-level, so they are reached through their table.
* - Otherwise `ref` is treated as a block id: the FIRST node anywhere in the
* tree with `attrs.id === ref` is returned.
*
* `path` is the array of child indices from the doc root down to the node
* (so a top-level block is `[index]`). The returned `node` is a DEEP CLONE,
* so callers can mutate it without touching the input doc. Null-safe.
*/
export function getNodeByRef(doc, ref) {
if (!isObject(doc))
return null;
// "#<n>": index into the top-level content array.
const indexMatch = typeof ref === "string" ? ref.match(/^#(\d+)$/) : null;
if (indexMatch) {
const index = Number(indexMatch[1]);
const block = Array.isArray(doc.content) ? doc.content[index] : undefined;
if (!isObject(block))
return null;
return { node: clone(block), path: [index], type: block.type };
}
// Otherwise: depth-first search for the first node with attrs.id === ref.
const search = (node, trail) => {
if (!isObject(node))
return null;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const child = node.content[i];
const path = [...trail, i];
if (matchesId(child, ref)) {
return { node: clone(child), path, type: child.type };
}
const hit = search(child, path);
if (hit != null)
return hit;
}
}
return null;
};
return search(doc, []);
}
/**
* Replace EVERY node whose `attrs.id === nodeId` with a deep clone of
* `newNode`, anywhere in the tree (including inside callouts and table cells).
*
* Operates on a clone of `doc`; returns `{ doc, replaced }` where `replaced`
* is the number of nodes substituted. A fresh clone of `newNode` is used for
* each match so they do not share references.
*/
export function replaceNodeById(doc, nodeId, newNode) {
const out = clone(doc);
let replaced = 0;
// Walk a content array, replacing direct matches and recursing into the
// (possibly new) children of non-matching nodes.
const walkContent = (content) => {
for (let i = 0; i < content.length; i++) {
const child = content[i];
if (matchesId(child, nodeId)) {
content[i] = clone(newNode);
replaced++;
// Do not recurse into a freshly substituted node.
continue;
}
if (isObject(child) && Array.isArray(child.content)) {
walkContent(child.content);
}
}
};
if (isObject(out) && Array.isArray(out.content)) {
walkContent(out.content);
}
return { doc: out, replaced };
}
/**
* Remove EVERY node whose `attrs.id === nodeId` from its parent `content`
* array, anywhere in the tree (recursive, including callouts and tables).
*
* Operates on a clone of `doc`; returns `{ doc, deleted }` where `deleted` is
* the number of nodes removed.
*/
export function deleteNodeById(doc, nodeId) {
const out = clone(doc);
let deleted = 0;
// Build a filtered copy of a content array, dropping matches and recursing
// into the surviving children; the caller reassigns the result.
const walkContent = (content) => {
const kept = [];
for (const child of content) {
if (matchesId(child, nodeId)) {
deleted++;
continue;
}
if (isObject(child) && Array.isArray(child.content)) {
child.content = walkContent(child.content);
}
kept.push(child);
}
return kept;
};
if (isObject(out) && Array.isArray(out.content)) {
out.content = walkContent(out.content);
}
return { doc: out, deleted };
}
/**
* Deep-clone `doc` and strip every node/mark attribute whose value is strictly
* `undefined`, so the result is safe to hand to Yjs (which throws an opaque
* "Unexpected content type" when asked to store an `undefined` attribute value).
*
* Only `undefined` keys are removed; `null`, `false`, `0`, and `""` are all
* legitimate JSON-storable values and are preserved. Operates on a clone and
* returns it; the input is never mutated. Defensively null-safe like the rest
* of the file.
*/
export function sanitizeForYjs(doc) {
const out = clone(doc);
// Drop every key whose value is strictly `undefined` from an attrs object.
const stripUndefined = (attrs) => {
if (!isObject(attrs))
return;
for (const key of Object.keys(attrs)) {
if (attrs[key] === undefined) {
delete attrs[key];
}
}
};
const walk = (node) => {
if (!isObject(node))
return;
stripUndefined(node.attrs);
if (Array.isArray(node.marks)) {
for (const mark of node.marks) {
if (isObject(mark))
stripUndefined(mark.attrs);
}
}
if (Array.isArray(node.content)) {
for (const child of node.content) {
walk(child);
}
}
};
walk(out);
return out;
}
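// Illustration (not part of the module): why the explicit strip in
// `sanitizeForYjs` matters. `clone` prefers `structuredClone`, which keeps keys
// whose value is `undefined`, whereas the JSON fallback drops them as a side
// effect. A minimal sketch under that assumption; `attrs` is a made-up example
// object.

```javascript
const attrs = { id: "p1", indent: undefined, align: null };

// structuredClone is only present in newer runtimes, so guard the call.
const viaStructured =
  typeof structuredClone === "function" ? structuredClone(attrs) : null;
const viaJson = JSON.parse(JSON.stringify(attrs));

// Where structuredClone exists, the `indent` key survives the clone.
console.log(viaStructured ? "indent" in viaStructured : "structuredClone unavailable");
// The JSON round-trip removes `indent` but keeps `align: null`.
console.log("indent" in viaJson, viaJson.align);
```

// So relying on the clone alone would only remove `undefined` attributes in
// runtimes that hit the JSON fallback; the explicit key deletion covers both.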
/**
* Diagnostics helper: walk the tree and return a human-readable path string for
* the FIRST attribute value (in any `node.attrs` or `mark.attrs`) that Yjs
* cannot store — i.e. `undefined`, a `function`, a `symbol`, or a `bigint`
* (e.g. `content[3].content[0].attrs.indent (undefined)`). Returns `null` when
* every attribute is storable. Null-safe.
*/
export function findUnstorableAttr(doc) {
const isUnstorable = (value) => {
if (value === undefined)
return "undefined";
const t = typeof value;
if (t === "function")
return "function";
if (t === "symbol")
return "symbol";
if (t === "bigint")
return "bigint";
return null;
};
// Check an attrs object; return the offending sub-path or null.
const checkAttrs = (attrs, basePath) => {
if (!isObject(attrs))
return null;
for (const key of Object.keys(attrs)) {
const kind = isUnstorable(attrs[key]);
if (kind != null)
return `${basePath}.${key} (${kind})`;
}
return null;
};
const walk = (node, path) => {
if (!isObject(node))
return null;
const attrHit = checkAttrs(node.attrs, `${path}.attrs`);
if (attrHit != null)
return attrHit;
if (Array.isArray(node.marks)) {
for (let i = 0; i < node.marks.length; i++) {
const markHit = checkAttrs(node.marks[i]?.attrs, `${path}.marks[${i}].attrs`);
if (markHit != null)
return markHit;
}
}
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const childHit = walk(node.content[i], `${path}.content[${i}]`);
if (childHit != null)
return childHit;
}
}
return null;
};
// The root doc node carries no index, so reported paths start directly at
// `attrs` / `content[i]` with no root prefix.
if (!isObject(doc))
return null;
const attrHit = checkAttrs(doc.attrs, "attrs");
if (attrHit != null)
return attrHit;
if (Array.isArray(doc.content)) {
for (let i = 0; i < doc.content.length; i++) {
const childHit = walk(doc.content[i], `content[${i}]`);
if (childHit != null)
return childHit;
}
}
return null;
}
/**
* Table structural node types and the container each must live directly inside.
* Used by `insertNodeRelative` to splice rows/cells into the correct ancestor
* rather than blindly into the anchor's direct parent (which would corrupt the
* table's nesting).
*/
const STRUCTURAL_TYPES = new Set(["tableRow", "tableCell", "tableHeader"]);
const REQUIRED_CONTAINER = {
tableRow: "table",
tableCell: "tableRow",
tableHeader: "tableRow",
};
/**
* Locate an anchor and return its ancestor chain (from `doc` down to and
* including the matched node). Each chain entry is `{ node, index }` where
* `index` is the node's position inside its parent's `content` array (the root
* doc has index -1). Returns `null` when the anchor cannot be resolved.
*/
function findAnchorChain(doc, opts) {
if (!isObject(doc))
return null;
// DFS by id anywhere in the tree, accumulating the path.
if (opts.anchorNodeId != null) {
const targetId = opts.anchorNodeId;
const search = (node, index, trail) => {
if (!isObject(node))
return null;
const here = [...trail, { node, index }];
if (matchesId(node, targetId))
return here;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const hit = search(node.content[i], i, here);
if (hit != null)
return hit;
}
}
return null;
};
return search(doc, -1, []);
}
// By text: only top-level blocks are scanned (same rule as the non-structural
// `anchorText` branch of `insertNodeRelative`).
if (opts.anchorText != null && Array.isArray(doc.content)) {
for (let i = 0; i < doc.content.length; i++) {
if (blockPlainText(doc.content[i]).includes(opts.anchorText)) {
return [
{ node: doc, index: -1 },
{ node: doc.content[i], index: i },
];
}
}
}
return null;
}
/**
* Insert a deep clone of `node` relative to an anchor.
*
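* - position "append": push the node onto the top-level `doc.content`. Throws
* for a tableRow/tableCell/tableHeader, which cannot live at the top level.
* - position "before"/"after": locate the anchor and splice the node into the
* anchor's parent `content` array immediately before / after it. A
* tableRow/tableCell/tableHeader is instead spliced into the nearest enclosing
* table/tableRow on the anchor's ancestor chain, and the call throws when the
* anchor is not inside such a container.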
*
* Anchor resolution for before/after:
* - if `anchorNodeId` is given, find the node with `attrs.id === anchorNodeId`
* anywhere in the tree (recursive);
* - otherwise, if `anchorText` is given, scan only TOP-LEVEL `doc.content`
* blocks and pick the first whose `blockPlainText` includes `anchorText`.
*
* Operates on a clone of `doc`; returns `{ doc, inserted }`. `inserted` is
* false when the anchor could not be resolved (the doc is returned unchanged
* apart from being cloned).
*/
export function insertNodeRelative(doc, node, opts) {
const out = clone(doc);
const fresh = clone(node);
// Defensive: stay null-safe like the other exports — a missing opts means
// there is nothing actionable to do.
if (!isObject(opts))
return { doc: out, inserted: false };
const isStructural = isObject(node) && STRUCTURAL_TYPES.has(node.type);
// "append": top-level push.
if (opts.position === "append") {
// Structural table nodes (tableRow/tableCell/tableHeader) cannot live at the
// top level — appending one would produce invalid nesting.
if (isStructural) {
throw new Error(`insert_node: cannot append a ${node.type} at the top level; use ` +
`position before/after with an anchor inside the target table`);
}
if (isObject(out)) {
if (!Array.isArray(out.content))
out.content = [];
out.content.push(fresh);
return { doc: out, inserted: true };
}
return { doc: out, inserted: false };
}
const offset = opts.position === "after" ? 1 : 0;
// Structural insert (before/after a tableRow/tableCell/tableHeader): splice
// into the nearest enclosing table/tableRow rather than the anchor's direct
// parent, so the row/cell lands at the correct level of the table.
if (isStructural) {
const containerType = REQUIRED_CONTAINER[node.type];
const chain = findAnchorChain(out, opts);
// Anchor not resolved at all — keep the existing "anchor not found" path.
if (chain == null)
return { doc: out, inserted: false };
// Find the DEEPEST ancestor (including the anchor itself) of the required
// container type.
let containerIdx = -1;
for (let i = chain.length - 1; i >= 0; i--) {
if (isObject(chain[i].node) && chain[i].node.type === containerType) {
containerIdx = i;
break;
}
}
if (containerIdx === -1) {
throw new Error(`insert_node: cannot insert a ${node.type} here — the anchor is not ` +
`inside a ${containerType}. Anchor on a cell's text or a block id ` +
`that lives inside the target table.`);
}
const container = chain[containerIdx].node;
if (!Array.isArray(container.content))
container.content = [];
if (containerIdx === chain.length - 1) {
// The matched container IS the anchor node itself (e.g. anchorText
// resolved to the table block): append/prepend within it.
const at = opts.position === "after" ? container.content.length : 0;
container.content.splice(at, 0, fresh);
}
else {
// The immediate child on the path leading to the anchor is the row/cell
// to splice next to.
const enclosingChildIndex = chain[containerIdx + 1].index;
container.content.splice(enclosingChildIndex + offset, 0, fresh);
}
return { doc: out, inserted: true };
}
// Resolve by id anywhere in the tree: splice into the parent content array.
if (opts.anchorNodeId != null) {
let inserted = false;
const walkContent = (content) => {
for (let i = 0; i < content.length; i++) {
const child = content[i];
if (matchesId(child, opts.anchorNodeId)) {
content.splice(i + offset, 0, fresh);
inserted = true;
return;
}
if (isObject(child) && Array.isArray(child.content)) {
walkContent(child.content);
if (inserted)
return;
}
}
};
if (isObject(out) && Array.isArray(out.content)) {
walkContent(out.content);
}
return { doc: out, inserted };
}
// Resolve by text: only top-level doc.content blocks are scanned.
if (opts.anchorText != null && isObject(out) && Array.isArray(out.content)) {
for (let i = 0; i < out.content.length; i++) {
if (blockPlainText(out.content[i]).includes(opts.anchorText)) {
out.content.splice(i + offset, 0, fresh);
return { doc: out, inserted: true };
}
}
}
return { doc: out, inserted: false };
}
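// Illustration (not part of the module): how the structural branch of
// `insertNodeRelative` picks its splice point from an ancestor chain. This is a
// simplified sketch: `spliceTarget` and the sample `chain` are hypothetical,
// and the real function mutates the cloned doc instead of returning a
// position.

```javascript
const REQUIRED_CONTAINER = {
  tableRow: "table",
  tableCell: "tableRow",
  tableHeader: "tableRow",
};

// Walk the chain from the anchor upward and report where the new node goes.
function spliceTarget(chain, insertedType, position) {
  const wanted = REQUIRED_CONTAINER[insertedType];
  const offset = position === "after" ? 1 : 0;
  for (let i = chain.length - 1; i >= 0; i--) {
    if (chain[i].node.type !== wanted) continue;
    if (i === chain.length - 1) {
      // The anchor itself is the container: prepend or append inside it.
      const length = (chain[i].node.content ?? []).length;
      return { containerDepth: i, at: position === "after" ? length : 0 };
    }
    // Otherwise splice next to the child that leads down to the anchor.
    return { containerDepth: i, at: chain[i + 1].index + offset };
  }
  return null; // anchor is not inside a suitable container
}

// Anchor: a paragraph in the first cell of the second row of a table.
const chain = [
  { node: { type: "doc" }, index: -1 },
  { node: { type: "table" }, index: 2 },
  { node: { type: "tableRow" }, index: 1 },
  { node: { type: "tableCell" }, index: 0 },
  { node: { type: "paragraph" }, index: 0 },
];

const rowTarget = spliceTarget(chain, "tableRow", "after");
const cellTarget = spliceTarget(chain, "tableCell", "before");
const noTarget = spliceTarget(chain.slice(0, 1), "tableRow", "after");
console.log(rowTarget, cellTarget, noTarget);
```

// A new row lands in the table right after row 1, and a new cell lands in
// that row before cell 0, even though the anchor is a paragraph deeper down.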
// ===========================================================================
// Table editing helpers
//
// A Docmost table is a ProseMirror subtree with NO ids on the structural nodes:
// table -> { type:"table", content:[tableRow...] }
// row -> { type:"tableRow", content:[tableCell|tableHeader...] }
// cell -> { type:"tableCell"|"tableHeader", attrs:{colspan,rowspan,colwidth},
// content:[paragraph...] }
// para -> { type:"paragraph", attrs:{id,indent}, content:[textNode...] }
// Only paragraphs/headings carry an `attrs.id`, so a cell is addressed via the
// id of the paragraph inside it. The helpers below all operate on a DEEP CLONE
// of the input doc (via `clone`) and never mutate their inputs.
// ===========================================================================
/**
* Collect EVERY `attrs.id` present anywhere in `node` into `used`. Used to seed
* `makeFreshId` so generated paragraph ids never collide with existing ones.
*/
function collectIds(node, used) {
if (!isObject(node))
return;
if (isObject(node.attrs) && typeof node.attrs.id === "string") {
used.add(node.attrs.id);
}
if (Array.isArray(node.content)) {
for (const child of node.content)
collectIds(child, used);
}
}
/**
* Fresh-id generator: returns a random Docmost-style id (12 chars from
* lowercase `a-z0-9`) that is not already in `used`, and records it. On the
* rare collision the id is regenerated. Callers rely on uniqueness, not on the
* exact string, so randomness is fine — and unlike a module-local counter it
* needs no reset and cannot become predictable across calls.
*/
function makeFreshId(used) {
const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
let id;
do {
id = "";
for (let i = 0; i < 12; i++) {
id += alphabet[Math.floor(Math.random() * alphabet.length)];
}
} while (used.has(id) || id === "");
used.add(id);
return id;
}
/**
* Resolve a table reference against an ALREADY-CLONED doc and return the LIVE
* table node (a reference inside `rootClone`, so the caller may mutate it) plus
* its index path. Returns null when no table matches.
*
* - `#<n>`: the top-level block at index `n`, only if its `type === "table"`.
* - otherwise: DFS for the node with `attrs.id === tableRef`, then walk UP its
* ancestor chain to the nearest `type === "table"` ancestor.
*/
function locateTable(rootClone, tableRef) {
if (!isObject(rootClone))
return null;
// "#<n>": index into the top-level content array; must be a table.
const indexMatch = typeof tableRef === "string" ? tableRef.match(/^#(\d+)$/) : null;
if (indexMatch) {
const index = Number(indexMatch[1]);
const block = Array.isArray(rootClone.content)
? rootClone.content[index]
: undefined;
if (isObject(block) && block.type === "table") {
return { table: block, path: [index] };
}
return null;
}
// Otherwise: DFS for attrs.id === tableRef, tracking the ancestor chain, then
// climb to the nearest enclosing table.
const search = (node, trail) => {
if (!isObject(node))
return null;
if (Array.isArray(node.content)) {
for (let i = 0; i < node.content.length; i++) {
const child = node.content[i];
const here = [...trail, { node: child, index: i }];
if (matchesId(child, tableRef)) {
// Walk UP to the nearest table ancestor (including the match itself).
for (let j = here.length - 1; j >= 0; j--) {
if (isObject(here[j].node) && here[j].node.type === "table") {
return {
table: here[j].node,
path: here.slice(0, j + 1).map((e) => e.index),
};
}
}
return null; // id found but no enclosing table
}
const hit = search(child, here);
if (hit != null)
return hit;
}
}
return null;
};
return search(rootClone, []);
}
/** Build the plain-text → single-paragraph cell content used by all writers. */
function makeCellParagraph(id, text) {
return {
type: "paragraph",
attrs: { id, indent: 0 },
// Empty string → a paragraph with an empty content array.
content: text ? [{ type: "text", text }] : [],
};
}
/**
* Read a table as a matrix. Returns null when `tableRef` resolves to no table.
*
* - `rows`/`cols`: the table's row count and the column count of its FIRST row.
* Tables may be ragged (rows of differing length), so `cols` reflects only
* row 0; use the per-row length of `cells`/`cellIds` for each row's actual
* width.
* - `cells`: `string[][]` of each cell's `blockPlainText`.
* - `cellIds`: `(string|null)[][]` of each cell's FIRST paragraph id (or null),
* so callers can `patch_node` a cell for rich-formatted edits.
* - `path`: index path of the table within the doc.
*/
export function readTable(doc, tableRef) {
const root = clone(doc);
const located = locateTable(root, tableRef);
if (located == null)
return null;
const { table, path } = located;
const rowNodes = Array.isArray(table.content) ? table.content : [];
const rows = rowNodes.length;
const cols = rowNodes[0]?.content?.length ?? 0;
const cells = [];
const cellIds = [];
for (const rowNode of rowNodes) {
const cellNodes = Array.isArray(rowNode?.content) ? rowNode.content : [];
const rowText = [];
const rowIds = [];
for (const cellNode of cellNodes) {
rowText.push(blockPlainText(cellNode));
// The cell's first paragraph carries the id used for patch_node.
const firstPara = Array.isArray(cellNode?.content)
? cellNode.content[0]
: undefined;
const id = isObject(firstPara) && isObject(firstPara.attrs)
? firstPara.attrs.id ?? null
: null;
rowIds.push(id);
}
cells.push(rowText);
cellIds.push(rowIds);
}
return { rows, cols, cells, cellIds, path };
}
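// Illustration (not part of the module): the two column counts used in this
// file differ on ragged tables. `readTable` reports the width of row 0, while
// `insertTableRow` sizes new rows to the widest existing row. A minimal sketch
// with a hypothetical two-row table; `cell` is a helper made up for the
// example.

```javascript
// Build a plain-text table cell in the shape described above.
const cell = (text) => ({
  type: "tableCell",
  content: [{ type: "paragraph", content: [{ type: "text", text }] }],
});

// Ragged table: row 0 has two cells, row 1 has three.
const raggedTable = {
  type: "table",
  content: [
    { type: "tableRow", content: [cell("A"), cell("B")] },
    { type: "tableRow", content: [cell("1"), cell("2"), cell("3")] },
  ],
};

// Width as reported by a first-row read.
const firstRowCols = raggedTable.content[0]?.content?.length ?? 0;

// Width as used when sizing a newly inserted row.
let widestCols = 0;
for (const row of raggedTable.content) {
  if (Array.isArray(row.content)) {
    widestCols = Math.max(widestCols, row.content.length);
  }
}

// Per-row widths are what callers should use for bounds checks.
const perRowWidths = raggedTable.content.map((row) => row.content.length);
console.log(firstRowCols, widestCols, perRowWidths);
```

// For this table the first-row count is 2 and the widest-row count is 3, so a
// caller indexing cells should check each row's own length.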
/**
* Insert a row of plain-text cells into a table. Returns `{ doc, inserted }`.
*
* The row is padded to the table's column count, taken as the widest existing
* row (`cells[i] ?? ""`); supplying MORE cells than columns throws. Each new
* cell copies `colwidth` for its column from the first (header) row when
* present, and gets a fresh-id paragraph plus `colspan: 1, rowspan: 1` attrs.
* `index` (when an integer in `[0, rows]`) splices the row there; otherwise the
* row is appended at the end. A row landing at index 0 inherits each header
* cell's type, since it becomes the new header row.
*/
export function insertTableRow(doc, tableRef, cells, index) {
const out = clone(doc);
const located = locateTable(out, tableRef);
if (located == null)
return { doc: out, inserted: false };
const { table } = located;
if (!Array.isArray(table.content))
table.content = [];
const rows = table.content.length;
const headerRow = table.content[0];
const headerCells = Array.isArray(headerRow?.content) ? headerRow.content : [];
// Column count is the WIDEST existing row, so the guard below stays
// meaningful for ragged tables and the new row matches the table's width.
// Fall back to the supplied cell count only when the table has no rows.
let colCount = 0;
for (const r of table.content) {
if (isObject(r) && Array.isArray(r.content))
colCount = Math.max(colCount, r.content.length);
}
if (colCount === 0)
colCount = Array.isArray(cells) ? cells.length : 0;
if (Array.isArray(cells) && cells.length > colCount) {
throw new Error(`table_insert_row: got ${cells.length} cell(s) but the table has ${colCount} column(s)`);
}
// Resolve the landing index up front so the cell-type decision and the splice
// below agree: a valid integer in [0, rows] splices there, else we append.
const landingIndex = typeof index === "number" && Number.isInteger(index) && index >= 0 && index <= rows
? index
: rows;
// Seed the id generator with every id already in the doc so the new cell
// paragraph ids are unique within the whole document.
const used = new Set();
collectIds(out, used);
const newCells = [];
for (let i = 0; i < colCount; i++) {
const text = (Array.isArray(cells) ? cells[i] : undefined) ?? "";
const attrs = { colspan: 1, rowspan: 1 };
// Copy this column's colwidth from the header row's cell when present.
const colwidth = headerCells[i]?.attrs?.colwidth;
if (colwidth !== undefined)
attrs.colwidth = colwidth;
// A row landing at index 0 becomes the new header row, so inherit the
// current header cell's type per column (Docmost uses "tableHeader" there);
// every other position is a plain data cell.
const cellType = landingIndex === 0 ? headerCells[i]?.type ?? "tableCell" : "tableCell";
newCells.push({
type: cellType,
attrs,
content: [makeCellParagraph(makeFreshId(used), text)],
});
}
const newRow = { type: "tableRow", content: newCells };
// Splice at the resolved landing index (append when index was omitted/invalid).
table.content.splice(landingIndex, 0, newRow);
return { doc: out, inserted: true };
}
/**
* Delete the row at 0-based `index` from a table. Returns `{ doc, deleted }`.
* `deleted` is false only when the table cannot be located. Throws on an
* out-of-range index, and refuses to delete the table's only row.
*/
export function deleteTableRow(doc, tableRef, index) {
    const out = clone(doc);
    const located = locateTable(out, tableRef);
    if (located == null)
        return { doc: out, deleted: false };
    const { table } = located;
    if (!Array.isArray(table.content))
        table.content = [];
    const rows = table.content.length;
    if (!Number.isInteger(index) || index < 0 || index >= rows) {
        throw new Error(`table_delete_row: row index ${index} out of range (table has ${rows} row(s))`);
    }
    if (rows <= 1) {
        throw new Error("table_delete_row: refusing to delete the only row of the table");
    }
    table.content.splice(index, 1);
    return { doc: out, deleted: true };
}
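
Reviewer sketch: the guard order in `deleteTableRow` (range check first, then the only-row refusal) can be tried on a bare table node. `deleteRowSketch` is a hypothetical stand-in that skips `locateTable` and uses a JSON deep copy where the source calls its own `clone` helper; both substitutions are assumptions made to keep the example self-contained.

```javascript
// Hypothetical stand-in for deleteTableRow: works on a bare table node
// instead of locating one inside a document.
function deleteRowSketch(table, index) {
    const out = JSON.parse(JSON.stringify(table)); // deep copy; the input is never mutated
    const rows = Array.isArray(out.content) ? out.content.length : 0;
    // Range check comes first, so an empty table reports "out of range".
    if (!Number.isInteger(index) || index < 0 || index >= rows) {
        throw new Error(`row index ${index} out of range (table has ${rows} row(s))`);
    }
    // A table must keep at least one row.
    if (rows <= 1) {
        throw new Error("refusing to delete the only row of the table");
    }
    out.content.splice(index, 1);
    return out;
}

const table = {
    type: "table",
    content: [
        { type: "tableRow", content: [] },
        { type: "tableRow", content: [] },
    ],
};
const afterDelete = deleteRowSketch(table, 0);
console.log(afterDelete.content.length); // 1
console.log(table.content.length);       // 2 (original untouched)

let message = "";
try {
    deleteRowSketch(afterDelete, 0);
} catch (err) {
    message = err.message;
}
console.log(message); // "refusing to delete the only row of the table"
```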
/**
* Set the plain-text content of cell `[row, col]` (0-based) to `text`. Returns
* `{ doc, updated }`; `updated` is false only when the table cannot be located.
* Throws when `row`/`col` is out of range. The cell's own attrs (colspan/
* rowspan/colwidth) are preserved; its content becomes a single text paragraph
* that reuses the cell's existing first-paragraph id when present, else a fresh
* one.
*/
export function updateTableCell(doc, tableRef, row, col, text) {
    const out = clone(doc);
    const located = locateTable(out, tableRef);
    if (located == null)
        return { doc: out, updated: false };
    const { table } = located;
    const rowNodes = Array.isArray(table.content) ? table.content : [];
    const rows = rowNodes.length;
    const rowNode = rowNodes[row];
    const cols = isObject(rowNode) && Array.isArray(rowNode.content)
        ? rowNode.content.length
        : 0;
    if (!Number.isInteger(row) ||
        row < 0 ||
        row >= rows ||
        !Number.isInteger(col) ||
        col < 0 ||
        col >= cols) {
        throw new Error(`table_update_cell: cell [${row},${col}] out of range`);
    }
    const cellNode = rowNode.content[col];
    // Reuse the cell's existing first-paragraph id, or mint a fresh unique one.
    const existingPara = Array.isArray(cellNode?.content)
        ? cellNode.content[0]
        : undefined;
    let id = isObject(existingPara) && isObject(existingPara.attrs)
        ? existingPara.attrs.id
        : undefined;
    if (typeof id !== "string" || id.length === 0) {
        const used = new Set();
        collectIds(out, used);
        id = makeFreshId(used);
    }
    cellNode.content = [makeCellParagraph(id, text)];
    return { doc: out, updated: true };
}
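
Reviewer sketch: the id-reuse rule in `updateTableCell` keeps a cell's first-paragraph id stable across edits and only mints a new one when none exists. `pickParagraphId` and the `mint` counter below are hypothetical; the `isObject` one-liner is an assumed shape for the source's helper, which is not shown in this diff.

```javascript
// Assumed shape of the source's isObject helper (not shown in the diff).
const isObject = (v) => typeof v === "object" && v !== null;

// Hypothetical helper: reuse the cell's first-paragraph id when it is a
// non-empty string, otherwise ask the caller-supplied minting function.
function pickParagraphId(cellNode, mintFreshId) {
    const existingPara = Array.isArray(cellNode?.content) ? cellNode.content[0] : undefined;
    const id = isObject(existingPara) && isObject(existingPara.attrs)
        ? existingPara.attrs.id
        : undefined;
    return typeof id === "string" && id.length > 0 ? id : mintFreshId();
}

// Hypothetical minting function standing in for makeFreshId(used).
let counter = 0;
const mint = () => `fresh-${++counter}`;

const withId = { type: "tableCell", content: [{ type: "paragraph", attrs: { id: "p-1" } }] };
const emptyCell = { type: "tableCell", content: [] };
const blankId = { type: "tableCell", content: [{ type: "paragraph", attrs: { id: "" } }] };

const reused = pickParagraphId(withId, mint);
const mintedForEmpty = pickParagraphId(emptyCell, mint);
const mintedForBlank = pickParagraphId(blankId, mint);
console.log(reused);         // "p-1"
console.log(mintedForEmpty); // "fresh-1"
console.log(mintedForBlank); // "fresh-2"
```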

@@ -0,0 +1,50 @@
/**
* The native-Obsidian page-file format (design: docs/backlog/git-sync-thin-meta.md).
* A page file is CLEAN markdown with a minimal YAML frontmatter carrying ONLY the
* page's durable identity:
*
* ---
* gitmost_id: 019ef6fc-2638-7ce1-9ce3-2756ce038480
* ---
* <clean markdown body>
*
* Everything else is derived (title = filename, parentPageId = enclosing folder,
* spaceId = the vault, updatedAt = git). `gitmost_id` (a Docmost pageId) is the
* only non-derivable bit and travels WITH the file so identity survives any move,
* even one git's rename detection misses. Third-party editors (Obsidian, …) see
* clean markdown; the frontmatter is hidden in their preview.
*
* No backward-compat with the old `docmost:meta` format: vaults are a cache, wiped
* and rebuilt native. A file WITHOUT a `gitmost_id` frontmatter is an un-tracked
* (e.g. hand-written) file -> the caller ADOPTS it (creates a page, writes the id).
*/
/**
* The frontmatter key carrying the Docmost pageId. NAMESPACED (not a bare `id`)
* so it never collides with a user's own frontmatter fields.
*/
export declare const ID_KEY = "gitmost_id";
/**
* Parse a page file into its identity (`id`) and clean markdown `body`. Tolerant:
* a file with no frontmatter (a hand-written third-party file) returns `id: null`
* and the whole text as the body — the caller then ADOPTS it (creates a page,
* writes the id back).
*
* KNOWN LIMITATION (phase 4 — adoption, see docs/backlog/git-sync-thin-meta.md):
* a leading frontmatter block is stripped from `body` even when it carries NO
* `gitmost_id` but DOES carry the user's own Obsidian properties (`tags:` etc.).
* On adoption those fields are not yet round-tripped — `serializePageFile`
* write-back persists only `gitmost_id`. Preserving arbitrary user frontmatter
* across the Docmost round-trip (BOTH adoption write-back AND the next pull's
* re-serialize) is deferred to the adoption phase; until then, do NOT roll the
* native format onto a real Obsidian vault whose notes carry properties.
*/
export declare function parsePageFile(full: string): {
    id: string | null;
    body: string;
};
/**
* Serialize a page into the thin format: `id` frontmatter + a blank line + the
* clean body + a trailing newline. Deterministic so an unchanged page re-syncs to
* byte-identical output (no churn — the loop-guard relies on it).
*/
export declare function serializePageFile(id: string, body: string): string;
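
Reviewer sketch: the determinism promised by the declarations above (an unchanged page re-serializes to byte-identical output) can be checked with a compact stand-in written only from the documented contract. `serializeSketch` and `parseSketch` are hypothetical names, not the package's exports, and the id lookup here is simplified (no quote handling).

```javascript
const ID_KEY = "gitmost_id";

// Stand-in for serializePageFile: id frontmatter, blank line, trimmed body, newline.
function serializeSketch(id, body) {
    return `---\n${ID_KEY}: ${id}\n---\n\n${body.trim()}\n`;
}

// Stand-in for parsePageFile: tolerant of CRLF and of files with no frontmatter.
function parseSketch(full) {
    const text = (full ?? "").replace(/\r\n/g, "\n");
    const fm = text.match(/^---\n([\s\S]*?)\n---\n?/);
    if (!fm) return { id: null, body: text.trim() };
    const idLine = fm[1].split("\n").find((line) => line.startsWith(`${ID_KEY}:`));
    const id = idLine ? idLine.slice(ID_KEY.length + 1).trim() : "";
    return { id: id === "" ? null : id, body: text.slice(fm[0].length).trim() };
}

const id = "019ef6fc-2638-7ce1-9ce3-2756ce038480";
const first = serializeSketch(id, "# Title\n\nSome text.\n\n");
const parsed = parseSketch(first);
const second = serializeSketch(parsed.id, parsed.body);

console.log(parsed.id === id);              // true
console.log(first === second);              // true (no churn on re-sync)
console.log(parseSketch("just a note").id); // null (un-tracked file, to be adopted)
```

Trimming the body on both parse and serialize is what makes the round trip a fixed point regardless of trailing whitespace in the input.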

@@ -0,0 +1,72 @@
/**
* The native-Obsidian page-file format (design: docs/backlog/git-sync-thin-meta.md).
* A page file is CLEAN markdown with a minimal YAML frontmatter carrying ONLY the
* page's durable identity:
*
* ---
* gitmost_id: 019ef6fc-2638-7ce1-9ce3-2756ce038480
* ---
* <clean markdown body>
*
* Everything else is derived (title = filename, parentPageId = enclosing folder,
* spaceId = the vault, updatedAt = git). `gitmost_id` (a Docmost pageId) is the
* only non-derivable bit and travels WITH the file so identity survives any move,
* even one git's rename detection misses. Third-party editors (Obsidian, …) see
* clean markdown; the frontmatter is hidden in their preview.
*
* No backward-compat with the old `docmost:meta` format: vaults are a cache, wiped
* and rebuilt native. A file WITHOUT a `gitmost_id` frontmatter is an un-tracked
* (e.g. hand-written) file -> the caller ADOPTS it (creates a page, writes the id).
*/
/**
* The frontmatter key carrying the Docmost pageId. NAMESPACED (not a bare `id`)
* so it never collides with a user's own frontmatter fields.
*/
export const ID_KEY = "gitmost_id";
/** Leading YAML frontmatter block: `---\n…\n---` at the very start of the file. */
const FRONTMATTER_RE = /^\uFEFF?---\n([\s\S]*?)\n---\n?/;
/** The top-level `<ID_KEY>: <value>` line inside the frontmatter (quotes optional). */
function readIdFromYaml(yaml) {
    const re = new RegExp(`^${ID_KEY}:\\s*(.+?)\\s*$`);
    for (const line of yaml.split("\n")) {
        const m = line.match(re);
        if (m) {
            const v = m[1].trim().replace(/^["']|["']$/g, "");
            return v === "" ? null : v;
        }
    }
    return null;
}
/**
* Parse a page file into its identity (`id`) and clean markdown `body`. Tolerant:
* a file with no frontmatter (a hand-written third-party file) returns `id: null`
* and the whole text as the body — the caller then ADOPTS it (creates a page,
* writes the id back).
*
* KNOWN LIMITATION (phase 4 — adoption, see docs/backlog/git-sync-thin-meta.md):
* a leading frontmatter block is stripped from `body` even when it carries NO
* `gitmost_id` but DOES carry the user's own Obsidian properties (`tags:` etc.).
* On adoption those fields are not yet round-tripped — `serializePageFile`
* write-back persists only `gitmost_id`. Preserving arbitrary user frontmatter
* across the Docmost round-trip (BOTH adoption write-back AND the next pull's
* re-serialize) is deferred to the adoption phase; until then, do NOT roll the
* native format onto a real Obsidian vault whose notes carry properties.
*/
export function parsePageFile(full) {
    const text = (full ?? "").replace(/\r\n/g, "\n");
    // Native format: a `gitmost_id` YAML frontmatter. Anything else (no frontmatter,
    // or frontmatter without the key) is an un-tracked file -> adopt.
    const fm = text.match(FRONTMATTER_RE);
    if (fm) {
        return { id: readIdFromYaml(fm[1]), body: text.slice(fm[0].length).trim() };
    }
    return { id: null, body: text.trim() };
}
/**
* Serialize a page into the thin format: `id` frontmatter + a blank line + the
* clean body + a trailing newline. Deterministic so an unchanged page re-syncs to
* byte-identical output (no churn — the loop-guard relies on it).
*/
export function serializePageFile(id, body) {
    return `---\n${ID_KEY}: ${id}\n---\n\n${body.trim()}\n`;
}
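
Reviewer sketch: the KNOWN LIMITATION documented on `parsePageFile` can be reproduced with a local copy of the logic. The functions below are renamed copies (`readId`, `parse`, `serialize`) with a simplified frontmatter pattern, and `page-123` is a made-up id; the point is only to show that a hand-written note's own properties do not survive adoption write-back.

```javascript
const ID_KEY = "gitmost_id";
// Simplified pattern for this sketch: leading `---\n...\n---` block only.
const FRONTMATTER = /^---\n([\s\S]*?)\n---\n?/;

// Find the `gitmost_id: <value>` line; surrounding quotes are optional.
function readId(yaml) {
    const re = new RegExp(`^${ID_KEY}:\\s*(.+?)\\s*$`);
    for (const line of yaml.split("\n")) {
        const m = line.match(re);
        if (m) {
            const v = m[1].trim().replace(/^["']|["']$/g, "");
            return v === "" ? null : v;
        }
    }
    return null;
}

function parse(full) {
    const text = (full ?? "").replace(/\r\n/g, "\n");
    const fm = text.match(FRONTMATTER);
    if (fm) return { id: readId(fm[1]), body: text.slice(fm[0].length).trim() };
    return { id: null, body: text.trim() };
}

function serialize(id, body) {
    return `---\n${ID_KEY}: ${id}\n---\n\n${body.trim()}\n`;
}

// A hand-written Obsidian note with user properties and no gitmost_id.
const handWritten = "---\ntags: [recipe, draft]\n---\n# Soup\n";
const untracked = parse(handWritten);
console.log(untracked.id);   // null (caller adopts the file)
console.log(untracked.body); // "# Soup" (the tags block was stripped)

// Adoption write-back persists only the id, so `tags:` is gone.
const adopted = serialize("page-123", untracked.body);
console.log(adopted.includes("tags:")); // false

// Quoted ids are unwrapped.
console.log(parse('---\ngitmost_id: "abc"\n---\nbody').id); // "abc"
```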

packages/git-sync/node_modules/.bin/esbuild generated vendored Executable file

@@ -0,0 +1,14 @@
#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
*CYGWIN*) basedir=`cygpath -w "$basedir"`;;
esac
if [ -z "$NODE_PATH" ]; then
export NODE_PATH="/home/claude/gitmost/node_modules/.pnpm/esbuild@0.28.0/node_modules/esbuild/bin/node_modules:/home/claude/gitmost/node_modules/.pnpm/esbuild@0.28.0/node_modules/esbuild/node_modules:/home/claude/gitmost/node_modules/.pnpm/esbuild@0.28.0/node_modules:/home/claude/gitmost/node_modules/.pnpm/node_modules"
else
export NODE_PATH="/home/claude/gitmost/node_modules/.pnpm/esbuild@0.28.0/node_modules/esbuild/bin/node_modules:/home/claude/gitmost/node_modules/.pnpm/esbuild@0.28.0/node_modules/esbuild/node_modules:/home/claude/gitmost/node_modules/.pnpm/esbuild@0.28.0/node_modules:/home/claude/gitmost/node_modules/.pnpm/node_modules:$NODE_PATH"
fi
"$basedir/../../../../node_modules/.pnpm/esbuild@0.28.0/node_modules/esbuild/bin/esbuild" "$@"
exit $?

packages/git-sync/node_modules/.bin/jiti generated vendored Executable file

@@ -0,0 +1,17 @@
#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
*CYGWIN*) basedir=`cygpath -w "$basedir"`;;
esac
if [ -z "$NODE_PATH" ]; then
export NODE_PATH="/home/claude/gitmost/node_modules/.pnpm/jiti@2.4.2/node_modules/jiti/lib/node_modules:/home/claude/gitmost/node_modules/.pnpm/jiti@2.4.2/node_modules/jiti/node_modules:/home/claude/gitmost/node_modules/.pnpm/jiti@2.4.2/node_modules:/home/claude/gitmost/node_modules/.pnpm/node_modules"
else
export NODE_PATH="/home/claude/gitmost/node_modules/.pnpm/jiti@2.4.2/node_modules/jiti/lib/node_modules:/home/claude/gitmost/node_modules/.pnpm/jiti@2.4.2/node_modules/jiti/node_modules:/home/claude/gitmost/node_modules/.pnpm/jiti@2.4.2/node_modules:/home/claude/gitmost/node_modules/.pnpm/node_modules:$NODE_PATH"
fi
if [ -x "$basedir/node" ]; then
exec "$basedir/node" "$basedir/../../../../node_modules/.pnpm/jiti@2.4.2/node_modules/jiti/lib/jiti-cli.mjs" "$@"
else
exec node "$basedir/../../../../node_modules/.pnpm/jiti@2.4.2/node_modules/jiti/lib/jiti-cli.mjs" "$@"
fi

packages/git-sync/node_modules/.bin/lessc generated vendored Executable file

@@ -0,0 +1,17 @@
#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
*CYGWIN*) basedir=`cygpath -w "$basedir"`;;
esac
if [ -z "$NODE_PATH" ]; then
export NODE_PATH="/home/claude/gitmost/node_modules/.pnpm/less@4.2.0/node_modules/less/bin/node_modules:/home/claude/gitmost/node_modules/.pnpm/less@4.2.0/node_modules/less/node_modules:/home/claude/gitmost/node_modules/.pnpm/less@4.2.0/node_modules:/home/claude/gitmost/node_modules/.pnpm/node_modules"
else
export NODE_PATH="/home/claude/gitmost/node_modules/.pnpm/less@4.2.0/node_modules/less/bin/node_modules:/home/claude/gitmost/node_modules/.pnpm/less@4.2.0/node_modules/less/node_modules:/home/claude/gitmost/node_modules/.pnpm/less@4.2.0/node_modules:/home/claude/gitmost/node_modules/.pnpm/node_modules:$NODE_PATH"
fi
if [ -x "$basedir/node" ]; then
exec "$basedir/node" "$basedir/../../../../node_modules/.pnpm/less@4.2.0/node_modules/less/bin/lessc" "$@"
else
exec node "$basedir/../../../../node_modules/.pnpm/less@4.2.0/node_modules/less/bin/lessc" "$@"
fi

Some files were not shown because too many files have changed in this diff