Compare commits

..

5 Commits

Author SHA1 Message Date
claude code agent 227
888deba891 docs(#193): drop uploadImage from MCP-transport method list in contract-guard comment (F3)
uploadImage is internal to client.ts (called by insertImage/replaceImage);
the MCP transport (index.ts) does not call it directly. Remove it from the
comment's list of transport-called methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:07:02 +03:00
claude code agent 227
4c7b671950 docs(#193): correct contract-guard comment — interface is a subset, not superset
The DocmostClientLike mirror covers only methods the in-app adapter consumes;
the standalone MCP transport calls additional client methods not tracked here
(covered by its own typecheck). Fixes the misleading 'superset' wording (F2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:59:10 +03:00
claude code agent 227
4131deaabb test(mcp): robustify the client-host contract drift-guard parser
Architect-review hardening of the bidirectional DocmostClientLike <->
HOST_CONTRACT_METHODS guard (test-only, no production change):

- Interface method-name regex now accepts full TS identifiers
  (digits/_/$) and generic signatures (method<T>(), avoiding a future
  benign false-FAIL.
- Skip /* ... */ block comments in the interface body so a `name(` line
  inside one is not falsely parsed as a method.
- Wrap the cross-package readFileSync with a clear "expected monorepo
  layout" error instead of a bare ENOENT when run outside the monorepo.
- Narrow the guard's comments/error to state plainly it checks the
  method-NAME set only; signature parity remains the deferred staged-plan
  item.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:54:04 +03:00
claude code agent 227
5b88e3dddf test(mcp): drift-guard HOST_CONTRACT_METHODS against DocmostClientLike both ways
The contract test only checked one direction (each name in
HOST_CONTRACT_METHODS exists on the real DocmostClient). But
HOST_CONTRACT_METHODS is itself a hand-copy of the server's
DocmostClientLike interface (docmost-client.loader.ts), and that
list<->interface link was untested: a method added to the interface +
consumed by the adapter but forgotten in the list (or removed from the
interface but left in the list) would escape both the server typecheck
(the pkg emits no .d.ts) and the existing test (name not in the list) ->
a runtime "x is not a function" in a tool call.

Parse the method names from the DocmostClientLike interface body (read
the .ts source via import.meta.url, scan member-signature lines) and
assert.deepEqual them against HOST_CONTRACT_METHODS BOTH ways. Lists are
currently identical (39=39), so this is a coverage hole closed, not a
live bug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:36:22 +03:00
claude code agent 227
d0ca127d83 refactor(ai-chat): drift-guard the DocmostClientLike hand-mirror (#193)
Issue #193's tool-half has two open items. The shared, zod-agnostic tool-spec
registry (SHARED_TOOL_SPECS) for the identical tools is already merged
(f3fa15e7) and consumed by both layers, so that subset is done. The remaining
items are: (a) deriving the layer-3 hand-mirror `DocmostClientLike` from the
real client type, and (b) folding more tools into the registry. Both were
deferred as risky, and that deferral still holds (verified, see below) — so
this change ships the safest concrete increment instead of forcing the risk.

What this adds (behaviour-neutral, test-only + a doc comment):

- packages/mcp/test/unit/client-host-contract.test.mjs: pins the layer-3
  contract from the ESM side, where the real DocmostClient is importable. It
  asserts every method the in-app `DocmostClientLike` mirror declares exists as
  a function on a real DocmostClient instance (constructor is side-effect-free).
  A rename/removal in client.ts now fails this test instead of silently shipping
  a runtime "x is not a function" into an agent tool call. Negative-case
  verified (a bogus method name is detected).

- docmost-client.loader.ts: replaces the vague mirror comment with a pointer to
  the guard test and a concrete, empirically-grounded staged plan for the full
  type-derivation. Verified blockers kept it deferred: @docmost/mcp emits no
  .d.ts (no `declaration`, no `types` export) and the server has no path mapping
  for it, so there is no type to import today; and the real methods' inferred
  CONCRETE return types conflict with the in-app adapter's loose
  Record<string,unknown> + `as`-cast result handling (deriving the exact type
  breaks the build / forces pervasive double-casts and full-surface test stubs).

Out of scope (noted in the issue): the PM<->Markdown converter unification.

Verified: server tsc clean; mcp tsc clean; mcp tests 369 pass (367 + 2 new);
ai-chat tools specs 51 pass. No behaviour change; committed mcp build untouched
(no mcp src changed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:07:43 +03:00
37 changed files with 266 additions and 2539 deletions

View File

@@ -124,26 +124,6 @@ MCP_DOCMOST_PASSWORD=
# MCP_TOKEN=
# MCP_SESSION_IDLE_MS=1800000
#
# BLOB SANDBOX (stash_page). An in-RAM, process-local store that hands large page
# content + images to an external consumer WITHOUT bloating the model context or
# requiring Docmost auth. The stash_page tool serializes a page, mirrors its
# internal images into the store, and returns ONLY a short anonymous URL; the
# consumer fetches blobs via `GET /api/sb/<uuid>` (no token — the capability is
# the unguessable UUID + short TTL + TLS). Blobs are RAM-only and cleared on
# restart. ETag = the blob's sha256 (integrity check).
# SANDBOX_PUBLIC_URL is the base used to build those URLs; it MUST be reachable
# by the consumer (do NOT use a loopback address if the consumer is remote).
# Defaults to APP_URL when unset.
# NOTE: the store is process-local — blobs live only on the instance that
# created them. Behind a multi-replica load balancer WITHOUT sticky sessions a
# consumer may hit a different instance and get a 404 (indistinguishable from an
# expired blob). Single-host deployments are unaffected.
# SANDBOX_PUBLIC_URL=https://docs.example.com
# SANDBOX_TTL_MS=3600000
# SANDBOX_MAX_BYTES=8388608
# SANDBOX_MAX_IMAGE_BYTES=20971520
# SANDBOX_MAX_TOTAL_BYTES=134217728
#
# AI-AGENT ATTRIBUTION (comments/pages written via MCP are badged as "AI"):
# attribution is driven by a per-user `is_agent` flag on the users row. There is
# NO admin UI/API for it — set it out-of-band with SQL. Use a DEDICATED service

View File

@@ -241,7 +241,7 @@ Migration files live in `apps/server/src/database/migrations/` and are named `YY
- **API server** — `dist/main` (`apps/server/src/main.ts`), the Fastify HTTP app (`AppModule`).
- **Collaboration server** — `dist/collaboration/server/collab-main` (`pnpm collab`), a Hocuspocus/Yjs WebSocket server (`apps/server/src/collaboration/`) handling real-time document editing, persistence, and page-history snapshots. It listens on `COLLAB_PORT` (default `3001`), separate from the API server's `PORT` (default `3000`), and shares state with the API server through Redis.
The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes `robots.txt`, public share pages, and `mcp` from the prefix). A `preHandler` hook enforces that a resolved `workspaceId` exists for most `/api` routes (multi-tenant by hostname/subdomain via `DomainMiddleware`). `GET /api/sb/:id` (the anonymous blob-sandbox read route) is listed in that preHandler's `excludedPaths`, so it is exempt from workspace resolution and carries no session auth at all (its capability is the unguessable UUID + TTL + TLS) — unlike `/api/files/public/...`, which still resolves a workspace and requires a workspace-bound attachment JWT. Auth is JWT (cookie + bearer); authorization is **CASL** (`core/casl`) — every data access is scoped to the user's abilities.
The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes `robots.txt`, public share pages, and `mcp` from the prefix). A `preHandler` hook enforces that a resolved `workspaceId` exists for most `/api` routes (multi-tenant by hostname/subdomain via `DomainMiddleware`). Auth is JWT (cookie + bearer); authorization is **CASL** (`core/casl`) — every data access is scoped to the user's abilities.
### Module structure (server)
`AppModule` wires integration modules (`integrations/*`: storage [local/S3/Azure], mail, queue [BullMQ on Redis], security, telemetry, throttle, `mcp`, `ai`) plus `CoreModule`, `DatabaseModule`, and `CollaborationModule`. `CoreModule` (`core/*`) holds the domain modules: `page`, `space`, `comment`, `workspace`, `user`, `auth`, `group`, `attachment`, `search`, `share`, `ai-chat`, etc. Each domain module follows NestJS controller → service → repo layering; DB repos live under `database/repos` and are injected app-wide from the global `DatabaseModule`.
@@ -254,7 +254,7 @@ The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes
- **Redis** backs caching, the BullMQ queues, the WebSocket Socket.IO adapter, and collaboration sync.
### The two AI subsystems (the main fork additions)
1. **Embedded MCP server** (`integrations/mcp/` + `packages/mcp`). The standalone `@docmost/mcp` server (40 agent-native tools: per-block patch/insert/delete by id, scripted `(doc)=>doc` transforms with dry-run diff, table editing, version diff/restore, comments, images, shares) is bundled and served over HTTP at `/mcp`. It writes through Docmost's real-time-collaboration layer so concurrent human edits aren't clobbered. Each request authenticates **per-user** via the `Authorization` header — either HTTP Basic (`base64(email:password)`, the user's own Docmost login, validated through `AuthService`) or a Bearer access JWT (the user's `authToken`) — and the session acts under that user's permissions. `MCP_DOCMOST_EMAIL` / `MCP_DOCMOST_PASSWORD` are an **optional service-account fallback**, used only when a request carries neither Basic nor Bearer credentials (back-compat for CI/scripts). An admin enables MCP with a workspace toggle (Workspace settings → AI). Optionally protected by a shared `MCP_TOKEN`: when set, every `/mcp` request must carry a matching `X-MCP-Token` header (its own header, separate from `Authorization`, which now carries the per-user Basic/Bearer credentials). Note: this changed from the older `Authorization: Bearer <MCP_TOKEN>` scheme — see `.env.example` and the CHANGELOG Breaking Changes entry.
1. **Embedded MCP server** (`integrations/mcp/` + `packages/mcp`). The standalone `@docmost/mcp` server (39 agent-native tools: per-block patch/insert/delete by id, scripted `(doc)=>doc` transforms with dry-run diff, table editing, version diff/restore, comments, images, shares) is bundled and served over HTTP at `/mcp`. It writes through Docmost's real-time-collaboration layer so concurrent human edits aren't clobbered. Each request authenticates **per-user** via the `Authorization` header — either HTTP Basic (`base64(email:password)`, the user's own Docmost login, validated through `AuthService`) or a Bearer access JWT (the user's `authToken`) — and the session acts under that user's permissions. `MCP_DOCMOST_EMAIL` / `MCP_DOCMOST_PASSWORD` are an **optional service-account fallback**, used only when a request carries neither Basic nor Bearer credentials (back-compat for CI/scripts). An admin enables MCP with a workspace toggle (Workspace settings → AI). Optionally protected by a shared `MCP_TOKEN`: when set, every `/mcp` request must carry a matching `X-MCP-Token` header (its own header, separate from `Authorization`, which now carries the per-user Basic/Bearer credentials). Note: this changed from the older `Authorization: Bearer <MCP_TOKEN>` scheme — see `.env.example` and the CHANGELOG Breaking Changes entry.
2. **AI agent chat** (`core/ai-chat/` server + `apps/client/src/features/ai-chat/` client). A built-in agent over the wiki using the Vercel **AI SDK** (`ai`, `@ai-sdk/*`) against any OpenAI-compatible provider configured per workspace (`integrations/ai/` — credentials encrypted at rest via `integrations/crypto`, stored in `ai_provider_credentials`). Key pieces:
- `core/ai-chat/tools/` — the agent's ~40 read+write tools. Every tool runs under the **calling user's** CASL permissions via a per-user loopback access token (`docmost-client.loader.ts`), so the agent can never exceed what the user could do. Only **reversible** operations are exposed (page history + trash; no permanent delete). Agent edits get an "AI agent" provenance badge in page history (`20260616T130000-agent-provenance` migration).
- `core/ai-chat/embedding/` — RAG indexer + a BullMQ consumer on `AI_QUEUE` that embeds pages into `page_embeddings` (vector search), complementing Postgres full-text search. Pages are (re)indexed on edit; `AI_EMBEDDING_TIMEOUT_MS` bounds a hung embeddings endpoint.

View File

@@ -58,15 +58,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
append/prepend fragments, nor to COMMENT bodies — a comment may legitimately
contain a standalone footnote definition, which canonicalization would drop.
(#228)
- **Out-of-band page transfer via an in-RAM blob sandbox (`stash_page`).** A
new MCP tool serializes a whole page (its full ProseMirror JSON, with every
internal image/file mirrored) into an ephemeral in-RAM blob and returns only
a short anonymous URL, so a large page can be handed to an external consumer
without flooding the model context. Blobs are served by unguessable UUID over
a new anonymous `GET /api/sb/:id` route (strong sha256 ETag, short TTL,
`nosniff` + restrictive CSP + attachment disposition for non-image mimes) and
are RAM-only, bound to the instance that created them. Tunable via five
`SANDBOX_*` env vars (see `.env.example`). (#243)
### Changed

View File

@@ -34,7 +34,7 @@ The goal of the fork is a **100% open, AGPL-only build with no Enterprise-Editio
| --- | --- |
| **EE code removed** | Stripped all client and server Enterprise-Edition code; ships as a clean community/AGPL build with no license checks. |
| **Comment resolution** | Re-implemented from scratch as a community feature (resolve / re-open with Open/Resolved tabs). No EE code reused, available to anyone who can comment. |
| **Embedded MCP server** | A community MCP server (`@docmost/mcp`, 40 tools) is served over HTTP at `/mcp` — no enterprise license required. Replaces the removed license-gated EE MCP. |
| **Embedded MCP server** | A community MCP server (`@docmost/mcp`, 39 tools) is served over HTTP at `/mcp` — no enterprise license required. Replaces the removed license-gated EE MCP. |
| **AI agent chat** | Built-in AI agent chat over your wiki, written from scratch as a community feature — no enterprise license. The agent reads and edits pages on your behalf (scoped to your permissions), with full-text + vector (RAG) search and optional web access via external MCP servers. |
| **Rebranding** | App logo / name changed from *Docmost* to *Gitmost*. |
| **Compact page tree** | Default page-tree indentation reduced from 16px to 8px per nesting level. |
@@ -44,7 +44,7 @@ The goal of the fork is a **100% open, AGPL-only build with no Enterprise-Editio
### Embedded MCP server
Gitmost has **our own MCP server** — [docmost-mcp](https://github.com/vvzvlad/docmost-mcp),
which we wrote — **built directly into the app** and served at `/mcp`. It exposes **40
which we wrote — **built directly into the app** and served at `/mcp`. It exposes **39
agent-native tools**: surgical per-block edits (patch / insert / delete by id),
structure-preserving find/replace, scripted `(doc) => doc` transforms with a dry-run diff,
structured table editing, version history with diff / restore, comments, images and share
@@ -60,7 +60,7 @@ every little fix. And it needs no enterprise license.
| | **Gitmost `/mcp` (our docmost-mcp)** | Docmost's built-in MCP |
| --- | :---: | :---: |
| **Enterprise license** | Not required | Required |
| **Tools** | 40, agent-native | Coarse (read Markdown, page CRUD, replace whole page) |
| **Tools** | 39, agent-native | Coarse (read Markdown, page CRUD, replace whole page) |
| **Per-block edits / find-replace / scripted transforms** | ✅ | — |
| **Structured table editing, version diff / restore** | ✅ | — |
| **Comments, images, share links** | ✅ | — |

View File

@@ -33,7 +33,7 @@
| --- | --- |
| **Удалён EE-код** | Вырезан весь код Enterprise-редакции на клиенте и сервере; это чистая community/AGPL-сборка без лицензионных проверок. |
| **Резолв комментариев** | Переписан с нуля как community-функция (резолв / переоткрытие с вкладками «Открытые» / «Решённые»). EE-код не используется, доступно любому, кто может комментировать. |
| **Встроенный MCP-сервер** | Community MCP-сервер (`@docmost/mcp`, 40 инструментов) отдаётся по HTTP на `/mcp` — без enterprise-лицензии. Заменяет удалённый лицензируемый EE MCP. |
| **Встроенный MCP-сервер** | Community MCP-сервер (`@docmost/mcp`, 39 инструментов) отдаётся по HTTP на `/mcp` — без enterprise-лицензии. Заменяет удалённый лицензируемый EE MCP. |
| **Чат с AI-агентом** | Встроенный чат с AI-агентом по содержимому вики, написанный с нуля как community-функция — без enterprise-лицензии. Агент читает и редактирует страницы от вашего имени (в рамках ваших прав), с полнотекстовым + векторным (RAG) поиском и опциональным доступом в интернет через внешние MCP-серверы. |
| **Ребрендинг** | Логотип / название приложения изменены с *Docmost* на *Gitmost*. |
| **Компактное дерево страниц** | Отступ дерева страниц по умолчанию уменьшен с 16px до 8px на уровень вложенности. |
@@ -44,7 +44,7 @@
В Gitmost есть **наш собственный MCP-сервер** — [docmost-mcp](https://github.com/vvzvlad/docmost-mcp),
который мы написали сами, — **встроенный прямо в приложение** и доступный на `/mcp`. Он даёт
**40 agent-native инструментов**: точечное редактирование по блокам (patch / insert / delete
**39 agent-native инструментов**: точечное редактирование по блокам (patch / insert / delete
по id), find/replace с сохранением структуры, скриптовые трансформации `(doc) => doc` с
предпросмотром диффа, структурное редактирование таблиц, история версий с диффом /
восстановлением, комментарии, изображения и ссылки на шаринг — всё применяется через слой
@@ -60,7 +60,7 @@ real-time-коллаборации Docmost, поэтому запись нико
| | **`/mcp` в Gitmost (наш docmost-mcp)** | Родной MCP у Docmost |
| --- | :---: | :---: |
| **Enterprise-лицензия** | Не нужна | Нужна |
| **Инструменты** | 40, agent-native | Примитивные (Markdown, CRUD страниц, замена целиком) |
| **Инструменты** | 39, agent-native | Примитивные (Markdown, CRUD страниц, замена целиком) |
| **Правки по блокам / find-replace / скриптовые трансформации** | ✅ | — |
| **Структурное редактирование таблиц, дифф / восстановление версий** | ✅ | — |
| **Комментарии, изображения, ссылки на шаринг** | ✅ | — |

View File

@@ -28,7 +28,6 @@ import { ClsModule } from 'nestjs-cls';
import { NoopAuditModule } from './integrations/audit/audit.module';
import { ThrottleModule } from './integrations/throttle/throttle.module';
import { McpModule } from './integrations/mcp/mcp.module';
import { SandboxModule } from './integrations/sandbox/sandbox.module';
import { AiModule } from './integrations/ai/ai.module';
import { AiChatModule } from './core/ai-chat/ai-chat.module';
@@ -90,7 +89,6 @@ try {
TelemetryModule,
ThrottleModule,
McpModule,
SandboxModule,
AiModule,
AiChatModule,
...enterpriseModules,

View File

@@ -63,9 +63,6 @@ describe('AiChatToolsService deletePage guardrail (H4)', () => {
{} as never,
{} as never,
{} as never,
// sandboxStore (only used by the stash tool closure, which these tests do
// not execute).
{} as never,
);
});
@@ -178,9 +175,6 @@ describe('AiChatToolsService expanded toolset guardrails', () => {
{} as never,
{} as never,
{} as never,
// sandboxStore (only used by the stash tool closure, which these tests do
// not execute).
{} as never,
);
});
@@ -296,9 +290,6 @@ describe('AiChatToolsService node-arg JSON-string coercion', () => {
{} as never,
{} as never,
{} as never,
// sandboxStore (only used by the stash tool closure, which these tests do
// not execute).
{} as never,
);
});
@@ -449,9 +440,6 @@ describe('AiChatToolsService model-friendly input validation (#190)', () => {
{} as never,
{} as never,
{} as never,
// sandboxStore (only used by the stash tool closure, which these tests do
// not execute).
{} as never,
);
});

View File

@@ -16,7 +16,6 @@ import {
import { resolveCurrentPageResult } from './current-page.util';
import { parseNodeArg } from './parse-node-arg';
import { modelFriendlyInput } from './model-friendly-input';
import { SandboxStore } from '../../../integrations/sandbox/sandbox.store';
/**
* Per-user, per-request adapter that exposes Docmost READ operations to the
@@ -42,8 +41,6 @@ export class AiChatToolsService {
private readonly pageEmbeddingRepo: PageEmbeddingRepo,
private readonly spaceMemberRepo: SpaceMemberRepo,
private readonly pagePermissionRepo: PagePermissionRepo,
// Shared singleton in-RAM blob store backing the stash tool.
private readonly sandboxStore: SandboxStore,
) {}
async forUser(
@@ -89,17 +86,11 @@ export class AiChatToolsService {
aiChatId,
});
// Bind the stash tool to the shared in-RAM SandboxStore. The store owns the
// anonymous-URL composition (putAndLink) and the live/evict probes the MCP
// package needs to keep its mirror counts honest under FIFO eviction (the
// package never touches env or the store). asSink() centralizes the uri↔id
// mapping next to putAndLink, shared with the embedded-MCP wiring site.
const { DocmostClient, sharedToolSpecs } = await loadDocmostMcp();
const client: DocmostClientLike = new DocmostClient({
apiUrl,
getToken,
getCollabToken,
sandbox: this.sandboxStore.asSink(),
});
// Build an ai-SDK tool from a shared, zod-agnostic spec. The spec owns the
@@ -634,14 +625,6 @@ export class AiChatToolsService {
async ({ pageId, edits }) => await client.editPageText(pageId, edits),
),
// Returns ONLY the short link object — never the document body — so a
// large page can be handed to an external consumer without bloating
// context.
stashPage: sharedTool(
sharedToolSpecs.stashPage,
async ({ pageId }) => await client.stashPage(pageId),
),
patchNode: tool({
description:
'Replace a single content block (by id) with a new ProseMirror ' +

View File

@@ -5,6 +5,34 @@ import { pathToFileURL } from 'node:url';
* ESM-only `@docmost/mcp` package. We only need the constructor + the read/write
* methods used by the per-user tool adapter; the full client surface lives in
* `packages/mcp/src/client.ts`. Signatures here mirror that file exactly.
*
* DRIFT GUARD: the method NAMES below are runtime-checked against the real
* `DocmostClient` by `packages/mcp/test/unit/client-host-contract.test.mjs`
* (which can import the ESM class directly). If you rename/remove a method here
* or in client.ts, that test fails — so a stale mirror cannot silently ship a
* runtime "x is not a function" into an agent tool call. Keep the two in sync.
*
* STAGED PLAN — full derivation `DocmostClientLike = <real DocmostClient type>`
* (issue #193, layer 3) is intentionally NOT done; it stays a hand-mirror for
* now because of two verified blockers across the ESM(mcp)/CJS(server) boundary:
* 1. `@docmost/mcp` emits NO declaration files (its tsconfig has no
* `declaration`, package.json has no `types`/types-export) and the server
* tsconfig has no path mapping for it — the server only loads it via the
* runtime `import()` trick below, so there is no type to import today.
* 2. The real client methods have inferred, CONCRETE return types; the in-app
* tool adapter reads results through loose `Record<string,unknown>` returns
* + `as` casts (e.g. `(result?.data ?? {}) as { title?: string }`).
* Deriving the exact type would make those casts non-overlapping ("may be a
* mistake") and break the build, and `Partial<DocmostClientLike>` test stubs
* would have to satisfy the full concrete surface.
* To do it safely later (incrementally): (a) turn on `declaration: true` in
* packages/mcp/tsconfig.json + add a `types` export condition and commit the
* emitted `.d.ts`; (b) `import type { DocmostClient } from '@docmost/mcp'` here
* and replace this interface with a `Pick<DocmostClient, ...>` of the consumed
* methods; (c) audit every `as` cast in ai-chat-tools.service.ts against the now
* concrete return types (double-cast through `unknown` only where genuinely
* needed); (d) keep the runtime guard test as a belt-and-braces check. Until
* then the guard test above is the cheap, behaviour-neutral protection.
*/
export interface DocmostClientLike {
// --- read ---
@@ -154,14 +182,6 @@ export interface DocmostClientLike {
commentId: string,
resolved: boolean,
): Promise<Record<string, unknown>>;
// Serialize a page + mirror its internal images into the blob sandbox; returns
// ONLY a short anonymous URL (the body never enters the model context).
stashPage(pageId: string): Promise<{
uri: string;
sha256: string;
size: number;
images: { mirrored: number; failed: number };
}>;
}
export type DocmostClientConfig = {
@@ -169,18 +189,6 @@ export type DocmostClientConfig = {
getToken: () => Promise<string>;
// Provenance collab-token provider for content mutations (signed agent claim).
getCollabToken?: () => Promise<string>;
// Optional blob-sandbox sink for the stash tool. `put` stores a blob in the
// host's in-RAM SandboxStore and returns the anonymous read URL + integrity.
// The optional `has`/`evict` probes let stashPage keep its mirror counts
// honest under the store's FIFO eviction (mirror of the package's sink type).
sandbox?: {
put: (
buf: Buffer,
mime: string,
) => { uri: string; sha256: string; size: number };
has?: (uri: string) => boolean;
evict?: (uri: string) => void;
};
};
export interface DocmostClientCtor {

View File

@@ -14,148 +14,4 @@ describe('EnvironmentService', () => {
it('should be defined', () => {
expect(service).toBeDefined();
});
describe('getSandboxTtlMs', () => {
// ConfigService stub: get(key, def) returns the configured value for the key
// (falling back to def), matching the @nestjs/config contract the service
// calls with (key, default).
const build = (sandboxTtl?: string) =>
new EnvironmentService({
get: (key: string, def?: string) =>
key === 'SANDBOX_TTL_MS' ? (sandboxTtl ?? def) : def,
} as any);
it.each(['0', '-5', 'abc'])(
'falls back to the 3600000 default for invalid value %s',
(value) => {
expect(build(value).getSandboxTtlMs()).toBe(3_600_000);
},
);
it('returns the parsed value for a valid positive integer', () => {
expect(build('120000').getSandboxTtlMs()).toBe(120_000);
});
it('uses the 3600000 default when SANDBOX_TTL_MS is unset', () => {
expect(build(undefined).getSandboxTtlMs()).toBe(3_600_000);
});
});
// The three byte caps share the same getPositiveIntEnv() helper as the TTL,
// so a non-integer / non-positive value ('0'/'-5'/'abc') falls back to the
// documented default and a valid positive integer is returned parsed. Note
// parseInt truncates '1.5' -> 1 (a valid positive integer), so that value is
// accepted, not rejected — same as the pre-existing TTL getter.
describe.each([
{
name: 'getSandboxMaxBytes',
key: 'SANDBOX_MAX_BYTES',
def: 8_388_608,
getter: (s: EnvironmentService) => s.getSandboxMaxBytes(),
},
{
name: 'getSandboxMaxImageBytes',
key: 'SANDBOX_MAX_IMAGE_BYTES',
def: 20_971_520,
getter: (s: EnvironmentService) => s.getSandboxMaxImageBytes(),
},
{
name: 'getSandboxMaxTotalBytes',
key: 'SANDBOX_MAX_TOTAL_BYTES',
def: 134_217_728,
getter: (s: EnvironmentService) => s.getSandboxMaxTotalBytes(),
},
])('$name', ({ key, def, getter }) => {
// ConfigService stub: get(k, d) returns the configured value for THIS cap's
// key (falling back to d), and the default for every other key.
const build = (value?: string) =>
new EnvironmentService({
get: (k: string, d?: string) =>
k === key ? (value ?? d) : d,
} as any);
it.each(['0', '-5', 'abc'])(
`falls back to the ${def} default for invalid value %s`,
(value) => {
expect(getter(build(value))).toBe(def);
},
);
it('returns the parsed value for a valid positive integer', () => {
expect(getter(build('4096'))).toBe(4096);
});
it('truncates a non-integer like "1.5" to 1 via parseInt (not rejected)', () => {
expect(getter(build('1.5'))).toBe(1);
});
it(`uses the ${def} default when the env is unset`, () => {
expect(getter(build(undefined))).toBe(def);
});
});
// getPositiveIntEnv keeps a one-shot `invalidPositiveIntWarned` set so a bad
// value is logged ONCE per key (not on every getter call, which the sandbox
// hits per-put). These tests pin that dedup so a regression to per-call logging
// would fail loudly.
describe('invalid-value warn dedup', () => {
it('warns only once per key across repeated getter calls', () => {
const service = new EnvironmentService({
get: (k: string, d?: string) =>
k === 'SANDBOX_MAX_TOTAL_BYTES' ? '-5' : d,
} as any);
const warnSpy = jest
.spyOn((service as any).logger, 'warn')
.mockImplementation(() => undefined);
service.getSandboxMaxTotalBytes();
service.getSandboxMaxTotalBytes();
expect(warnSpy).toHaveBeenCalledTimes(1);
});
it('warns independently per key (dedup is per-key, not global)', () => {
// Two DIFFERENT SANDBOX_* keys are both invalid -> each warns once, so two
// warns total. This proves the dedup set is keyed, not a single global flag.
const service = new EnvironmentService({
get: (k: string, d?: string) =>
k === 'SANDBOX_MAX_BYTES' || k === 'SANDBOX_MAX_TOTAL_BYTES'
? '-5'
: d,
} as any);
const warnSpy = jest
.spyOn((service as any).logger, 'warn')
.mockImplementation(() => undefined);
service.getSandboxMaxBytes();
service.getSandboxMaxTotalBytes();
expect(warnSpy).toHaveBeenCalledTimes(2);
});
});
describe('getSandboxPublicUrl', () => {
// Stub that resolves BOTH keys the public-url logic consults.
const build = (vals: { sandboxUrl?: string; appUrl?: string }) =>
new EnvironmentService({
get: (key: string, def?: string) =>
key === 'SANDBOX_PUBLIC_URL'
? (vals.sandboxUrl ?? def)
: key === 'APP_URL'
? (vals.appUrl ?? def)
: def,
} as any);
it('uses SANDBOX_PUBLIC_URL and trims a trailing slash', () => {
expect(
build({ sandboxUrl: 'https://docs.example.com/' }).getSandboxPublicUrl(),
).toBe('https://docs.example.com');
});
it('falls back to APP_URL (origin) when SANDBOX_PUBLIC_URL is unset', () => {
expect(
build({ appUrl: 'https://app.example.com' }).getSandboxPublicUrl(),
).toBe('https://app.example.com');
});
});
});

View File

@@ -1,15 +1,9 @@
import { Injectable, Logger } from '@nestjs/common';
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import ms, { StringValue } from 'ms';
@Injectable()
export class EnvironmentService {
private readonly logger = new Logger(EnvironmentService.name);
// Env keys already warned about for an invalid value (one-shot per key, so a
// bad SANDBOX_* value is not logged on every blob put). Mirrors the original
// sandboxTtlWarned guard, generalized across the TTL + the three byte caps.
private readonly invalidPositiveIntWarned = new Set<string>();
constructor(private configService: ConfigService) {}
getNodeEnv(): string {
@@ -338,63 +332,4 @@ export class EnvironmentService {
.map((o) => o.trim())
.filter(Boolean);
}
// --- Blob sandbox (in-RAM ephemeral blob transfer; see SandboxModule) ---
// Base URL the sandbox `uri` is built from. It MUST be reachable over the
// network by the external consumer that fetches the blobs (not a loopback
// address if that consumer is remote). Falls back to APP_URL when unset so a
// single-host deployment works out of the box; set it explicitly when the
// consumer lives on another host.
getSandboxPublicUrl(): string {
const raw =
this.configService.get<string>('SANDBOX_PUBLIC_URL') || this.getAppUrl();
// Drop any trailing slash so `${base}/api/sb/${id}` never doubles up.
return raw.replace(/\/+$/, '');
}
// Parse a REQUIRED positive-integer env (TTL in ms or a byte cap). A
// non-integer or <= 0 value would break the sandbox silently (instant expiry,
// or every put failing against a 0-byte cap), so warn once and fall back to
// the default instead. Blob bodies are never logged.
private getPositiveIntEnv(key: string, def: number): number {
const parsed = parseInt(
this.configService.get<string>(key, String(def)),
10,
);
if (!Number.isInteger(parsed) || parsed <= 0) {
if (!this.invalidPositiveIntWarned.has(key)) {
this.invalidPositiveIntWarned.add(key);
this.logger.warn(
`Invalid ${key} (must be a positive integer); falling back to the ${def} default`,
);
}
return def;
}
return parsed;
}
// Blob time-to-live. Default 1h. The unguessable UUID + this short TTL + TLS
// are the whole capability model (no tokens). A non-positive or non-integer
// value would make every blob expire instantly (silent 404s), so reject it and
// fall back to the 1h default (warned about once to avoid per-put log spam).
getSandboxTtlMs(): number {
return this.getPositiveIntEnv('SANDBOX_TTL_MS', 3_600_000);
}
// Per-blob cap for non-image blobs (the serialized document). Default 8 MiB.
getSandboxMaxBytes(): number {
return this.getPositiveIntEnv('SANDBOX_MAX_BYTES', 8_388_608);
}
// Per-blob cap for mirrored image blobs. Default 20 MiB.
getSandboxMaxImageBytes(): number {
return this.getPositiveIntEnv('SANDBOX_MAX_IMAGE_BYTES', 20_971_520);
}
// RAM guard: total bytes the whole store may hold. Default 128 MiB. On
// overflow the store evicts oldest entries to make room.
getSandboxMaxTotalBytes(): number {
return this.getPositiveIntEnv('SANDBOX_MAX_TOTAL_BYTES', 134_217_728);
}
}

View File

@@ -2,7 +2,6 @@ import {
IsIn,
IsNotEmpty,
IsNotIn,
IsNumberString,
IsOptional,
IsString,
IsUrl,
@@ -171,35 +170,6 @@ export class EnvironmentVariables {
},
)
CLICKHOUSE_URL: string;
// --- Blob sandbox (in-RAM ephemeral blob transfer; see SandboxModule) ---
@IsOptional()
@ValidateIf((obj) => obj.SANDBOX_PUBLIC_URL != '' && obj.SANDBOX_PUBLIC_URL != null)
@IsUrl(
{ protocols: ['http', 'https'], require_tld: false },
{
message:
'SANDBOX_PUBLIC_URL must be a valid http(s) URL reachable by the external blob consumer',
},
)
SANDBOX_PUBLIC_URL: string;
@IsOptional()
@IsNumberString({}, { message: 'SANDBOX_TTL_MS must be an integer (milliseconds)' })
SANDBOX_TTL_MS: string;
@IsOptional()
@IsNumberString({}, { message: 'SANDBOX_MAX_BYTES must be an integer (bytes)' })
SANDBOX_MAX_BYTES: string;
@IsOptional()
@IsNumberString({}, { message: 'SANDBOX_MAX_IMAGE_BYTES must be an integer (bytes)' })
SANDBOX_MAX_IMAGE_BYTES: string;
@IsOptional()
@IsNumberString({}, { message: 'SANDBOX_MAX_TOTAL_BYTES must be an integer (bytes)' })
SANDBOX_MAX_TOTAL_BYTES: string;
}
export function validate(config: Record<string, any>) {

View File

@@ -131,25 +131,10 @@ export class FailedLoginLimiter {
}
// The per-session DocmostMcpConfig shape understood by @docmost/mcp: either the
// service-account credentials variant OR the per-user getToken variant. The
// optional `sandbox` sink (blob store for the stash tool) is common to both and
// injected by McpService after the auth decision.
export type DocmostMcpConfig = (
// service-account credentials variant OR the per-user getToken variant.
export type DocmostMcpConfig =
| { apiUrl: string; email: string; password: string }
| { apiUrl: string; getToken: () => Promise<string> }
) & {
sandbox?: {
put: (
buf: Buffer,
mime: string,
) => { uri: string; sha256: string; size: number };
// Optional live/evict probes the package uses to keep stash_page's mirror
// counts honest under the store's FIFO eviction (mirror of the package's
// sink type); older bindings omit them.
has?: (uri: string) => boolean;
evict?: (uri: string) => void;
};
};
| { apiUrl: string; getToken: () => Promise<string> };
export interface ResolvedMcpAuth {
config: DocmostMcpConfig;

View File

@@ -109,13 +109,13 @@ function makeService(opts: {
};
const service = new McpService(
undefined as never, // environmentService
undefined as never, // workspaceRepo
undefined as never, // authService
undefined as never, // tokenService
undefined as never, // userRepo
undefined as never, // userSessionRepo
moduleRef as never, // moduleRef (read by the MFA branch)
undefined as never, // sandboxStore (unused by the login-gate path)
);
// Stop the constructor's unref'd sweep timer leaking across tests.
service.onModuleDestroy();

View File

@@ -2,15 +2,17 @@ import { Module } from '@nestjs/common';
import { McpController } from './mcp.controller';
import { McpService } from './mcp.service';
import { DatabaseModule } from '@docmost/db/database.module';
import { EnvironmentModule } from '../environment/environment.module';
import { AuthModule } from '../../core/auth/auth.module';
import { TokenModule } from '../../core/auth/token.module';
// Community MCP feature: the server itself serves the Model Context Protocol
// over HTTP at /mcp. DatabaseModule (global) provides WorkspaceRepo. AuthModule
// supplies AuthService (per-user HTTP-Basic login validation) and TokenModule
// supplies TokenService (Bearer access-JWT verification for the token fallback).
// over HTTP at /mcp. DatabaseModule (global) provides WorkspaceRepo and
// EnvironmentModule (global) provides EnvironmentService. AuthModule supplies
// AuthService (per-user HTTP-Basic login validation) and TokenModule supplies
// TokenService (Bearer access-JWT verification for the token fallback).
@Module({
imports: [DatabaseModule, AuthModule, TokenModule],
imports: [DatabaseModule, EnvironmentModule, AuthModule, TokenModule],
controllers: [McpController],
providers: [McpService],
})

View File

@@ -8,6 +8,7 @@ import { ModuleRef } from '@nestjs/core';
import { pathToFileURL } from 'node:url';
import { IncomingMessage } from 'node:http';
import { FastifyReply, FastifyRequest } from 'fastify';
import { EnvironmentService } from '../environment/environment.service';
import { WorkspaceRepo } from '@docmost/db/repos/workspace/workspace.repo';
import { UserRepo } from '@docmost/db/repos/user/user.repo';
import { UserSessionRepo } from '@docmost/db/repos/session/user-session.repo';
@@ -29,7 +30,6 @@ import {
DocmostMcpConfig,
ResolvedMcpAuth,
} from './mcp-auth.helpers';
import { SandboxStore } from '../sandbox/sandbox.store';
// Minimal shape of the embedded MCP HTTP handler exported by @docmost/mcp/http.
interface McpHttpHandler {
@@ -92,14 +92,13 @@ export class McpService implements OnModuleDestroy {
private readonly sweepTimer: NodeJS.Timeout;
constructor(
private readonly environmentService: EnvironmentService,
private readonly workspaceRepo: WorkspaceRepo,
private readonly authService: AuthService,
private readonly tokenService: TokenService,
private readonly userRepo: UserRepo,
private readonly userSessionRepo: UserSessionRepo,
private readonly moduleRef: ModuleRef,
// Shared singleton in-RAM blob store backing the stash tool.
private readonly sandboxStore: SandboxStore,
) {
this.sweepTimer = setInterval(() => {
try {
@@ -327,11 +326,7 @@ export class McpService implements OnModuleDestroy {
// Should never happen: handle() always stashes before delegating.
throw new UnauthorizedException('MCP authentication missing.');
}
// Inject the blob-sandbox sink after the auth decision so stash_page
// can store blobs in the shared in-RAM store regardless of which
// credential variant resolved. The sink (put/has/evict + uri↔id
// mapping) is owned by SandboxStore.asSink().
return { ...resolved.config, sandbox: this.sandboxStore.asSink() };
return resolved.config;
},
{
identify: (req: IncomingMessage) => {

View File

@@ -1,6 +0,0 @@
// Single source of truth for the anonymous blob-sandbox route. The controller
// is mounted under the global `/api` prefix, so its decorator uses the bare
// segment while the public URL and the workspace-gate exclusion need the full
// path — derive the latter from the former so the two never drift.
export const SANDBOX_ROUTE_SEGMENT = 'sb';
export const SANDBOX_API_PATH = `/api/${SANDBOX_ROUTE_SEGMENT}`;

View File

@@ -1,265 +0,0 @@
import { SandboxController } from './sandbox.controller';
import { SandboxEntry } from './sandbox.store';
// Capturing fake of the FastifyReply surface the controller uses:
// status()/header()/headers()/send(), all chainable.
function makeRes() {
const sent: { status: number; headers: Record<string, any>; body: any } = {
status: 200,
headers: {},
body: undefined,
};
const res: any = {
status(code: number) {
sent.status = code;
return res;
},
header(key: string, value: any) {
sent.headers[key.toLowerCase()] = value;
return res;
},
headers(obj: Record<string, any>) {
for (const k of Object.keys(obj)) sent.headers[k.toLowerCase()] = obj[k];
return res;
},
send(body?: any) {
sent.body = body;
return res;
},
_sent: sent,
};
return res;
}
function makeReq(headers: Record<string, any> = {}) {
return { headers } as any;
}
// A syntactically valid v4 UUID (version nibble 4, variant nibble 8). The
// shared `uuid` validator is stricter than a bare hex-shape regex, so the id
// must carry a real version/variant.
const VALID_ID = 'aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee';
function entry(buf: Buffer, mime: string, sha256: string): SandboxEntry {
return { buf, mime, sha256, expiresAt: Date.now() + 60_000 };
}
describe('SandboxController', () => {
it('serves 200 with body, Content-Type, Content-Length and sha256 ETag', async () => {
const buf = Buffer.from('{"ok":true}', 'utf8');
const sha = 'a'.repeat(64);
const store = { get: jest.fn().mockReturnValue(entry(buf, 'application/json', sha)) };
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq(), res);
expect(store.get).toHaveBeenCalledWith(VALID_ID);
expect(res._sent.status).toBe(200);
expect(res._sent.headers['content-type']).toBe('application/json');
expect(res._sent.headers['content-length']).toBe(buf.length);
expect(res._sent.headers['etag']).toBe(`"${sha}"`);
expect(res._sent.body).toBe(buf);
});
it('returns 404 for a missing/expired blob', async () => {
const store = { get: jest.fn().mockReturnValue(undefined) };
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq(), res);
expect(res._sent.status).toBe(404);
expect(res._sent.body).toBeUndefined();
});
it('returns 404 for a non-UUID id WITHOUT touching the store (anti-traversal)', async () => {
const store = { get: jest.fn() };
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get('../../etc/passwd', makeReq(), res);
expect(store.get).not.toHaveBeenCalled();
expect(res._sent.status).toBe(404);
});
it('returns 304 (no body) when If-None-Match matches the ETag', async () => {
const sha = 'b'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq({ 'if-none-match': `"${sha}"` }), res);
expect(res._sent.status).toBe(304);
expect(res._sent.body).toBeUndefined();
expect(res._sent.headers['etag']).toBe(`"${sha}"`);
});
it('accepts a bare (unquoted) sha256 in If-None-Match too', async () => {
const sha = 'c'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq({ 'if-none-match': sha }), res);
expect(res._sent.status).toBe(304);
});
it('serves 200 when If-None-Match does NOT match', async () => {
const sha = 'd'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq({ 'if-none-match': '"stale"' }), res);
expect(res._sent.status).toBe(200);
});
it('returns 304 for a wildcard "*" If-None-Match', async () => {
const sha = 'e'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq({ 'if-none-match': '*' }), res);
expect(res._sent.status).toBe(304);
});
it('returns 304 for a weak validator W/"<sha>"', async () => {
const sha = 'f'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq({ 'if-none-match': `W/"${sha}"` }), res);
expect(res._sent.status).toBe(304);
});
it('returns 304 when a comma-separated If-None-Match list contains the sha', async () => {
const sha = '1'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(
VALID_ID,
makeReq({ 'if-none-match': `"other", "${sha}"` }),
res,
);
expect(res._sent.status).toBe(304);
});
it('sets a private, immutable Cache-Control with a max-age within the TTL on 200', async () => {
const sha = '2'.repeat(64);
// Known TTL: ~30s out, so the floored max-age must land within [0, 60].
const e: SandboxEntry = {
buf: Buffer.from('x'),
mime: 'application/json',
sha256: sha,
expiresAt: Date.now() + 30_000,
};
const store = { get: jest.fn().mockReturnValue(e) };
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq(), res);
expect(res._sent.status).toBe(200);
const cc = res._sent.headers['cache-control'] as string;
expect(cc).toMatch(/^private, max-age=\d+, immutable$/);
const maxAge = Number(cc.match(/max-age=(\d+)/)![1]);
expect(maxAge).toBeGreaterThanOrEqual(0);
expect(maxAge).toBeLessThanOrEqual(60);
});
it('emits Cache-Control alongside ETag on the 304 branch', async () => {
const sha = '3'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'application/json', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq({ 'if-none-match': `"${sha}"` }), res);
expect(res._sent.status).toBe(304);
expect(res._sent.headers['cache-control']).toMatch(
/^private, max-age=\d+, immutable$/,
);
});
it('sets nosniff + restrictive CSP and serves an allowlisted image inline', async () => {
const sha = '4'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('x'), 'image/png', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq(), res);
expect(res._sent.status).toBe(200);
expect(res._sent.headers['x-content-type-options']).toBe('nosniff');
expect(res._sent.headers['content-security-policy']).toBe(
"base-uri 'none'; object-src 'self'; default-src 'self';",
);
expect(res._sent.headers['content-disposition']).toBe('inline');
});
it('forces an SVG to download (attachment) while keeping nosniff + CSP', async () => {
const sha = '5'.repeat(64);
const store = {
get: jest.fn().mockReturnValue(entry(Buffer.from('<svg/>'), 'image/svg+xml', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq(), res);
expect(res._sent.status).toBe(200);
expect(res._sent.headers['content-disposition']).toBe('attachment');
expect(res._sent.headers['x-content-type-options']).toBe('nosniff');
expect(res._sent.headers['content-security-policy']).toBe(
"base-uri 'none'; object-src 'self'; default-src 'self';",
);
});
it('forces text/html to download (attachment) while keeping nosniff + CSP', async () => {
const sha = '6'.repeat(64);
const store = {
get: jest
.fn()
.mockReturnValue(entry(Buffer.from('<h1>x</h1>'), 'text/html', sha)),
};
const controller = new SandboxController(store as any);
const res = makeRes();
await controller.get(VALID_ID, makeReq(), res);
expect(res._sent.status).toBe(200);
expect(res._sent.headers['content-disposition']).toBe('attachment');
expect(res._sent.headers['x-content-type-options']).toBe('nosniff');
expect(res._sent.headers['content-security-policy']).toBe(
"base-uri 'none'; object-src 'self'; default-src 'self';",
);
});
});

View File

@@ -1,130 +0,0 @@
import { Controller, Get, Param, Req, Res } from '@nestjs/common';
import { FastifyReply, FastifyRequest } from 'fastify';
import { validate as isValidUUID } from 'uuid';
import { SandboxStore } from './sandbox.store';
import { SANDBOX_ROUTE_SEGMENT } from './sandbox.constants';
// MIME types safe to render inline in a browser. SVG is deliberately EXCLUDED
// (it can carry script), as are text/html and the JSON document blob — anything
// not on this list is served as an attachment so an attacker-controlled mime can
// never execute script on this origin (the route is anonymous + same-origin).
const INLINE_SAFE_MIME = new Set([
'image/png',
'image/jpeg',
'image/gif',
'image/webp',
'image/avif',
]);
/**
* Anonymous read endpoint for the in-RAM blob sandbox.
*
* Mounted under the global `/api` prefix as `GET /api/sb/:id`. It carries NO
* `@UseGuards(JwtAuthGuard)`, so — exactly like the public attachment route
* `GET /api/files/public/...` — it is exempt from Docmost session auth. The
* route is ALSO listed in the workspace-resolution preHandler's excludedPaths
* in main.ts so a request from a remote consumer (which carries no workspace
* host) is not rejected with "Workspace not found".
*
* It only ever serves blobs looked up from the SandboxStore by a validated
* UUID; `:id` is never used as a filesystem path, so there is no traversal
* surface. Never returns tokens, never 401s.
*
* Anti-XSS hardening mirrors the public attachment route: every response sets
* `X-Content-Type-Options: nosniff` and a restrictive CSP, and serves any mime
* NOT on the inline-safe allowlist (svg/html/the JSON document blob) as an
* attachment, so an attacker-controlled `entry.mime` can never execute script
* on this same-origin anonymous route.
*/
@Controller(SANDBOX_ROUTE_SEGMENT)
export class SandboxController {
constructor(private readonly store: SandboxStore) {}
@Get(':id')
async get(
@Param('id') id: string,
@Req() req: FastifyRequest,
@Res() res: FastifyReply,
): Promise<void> {
// Validate `:id` as a real UUID via the shared `uuid` validator (same as the
// attachment routes). This is anti-traversal / input hygiene (so `:id` can
// never be a path like `../...`), NOT authorization — the capability is the
// unguessable id itself plus the short TTL plus TLS. A non-UUID id (including
// any traversal attempt) → 404 before touching the store; no stack trace
// leaks out.
if (!isValidUUID(id)) {
res.status(404).send();
return;
}
const entry = this.store.get(id);
if (!entry) {
// Missing or expired — indistinguishable to the caller, by design.
res.status(404).send();
return;
}
// Strong validator: quoted sha256, no W/ weak prefix. Same value computed
// at put() time, so an external consumer can detect a truncated/corrupted
// body — the original bug this whole channel exists to fix.
const etag = `"${entry.sha256}"`;
// Compute freshness BEFORE the conditional check: a 304 conditional
// revalidation must not lose the Cache-Control freshness directives, or a
// revalidating client would forget how long the blob stays fresh.
const ttlSeconds = Math.max(
0,
Math.floor((entry.expiresAt - Date.now()) / 1000),
);
// Capability URL — keep it out of shared caches; immutable for its TTL.
const cacheControl = `private, max-age=${ttlSeconds}, immutable`;
// Conditional request: an exact ETag match → 304 with no body. The blob is
// immutable, so the validator is stable for the blob's whole lifetime.
if (this.ifNoneMatchMatches(req.headers['if-none-match'], entry.sha256)) {
res
.status(304)
.header('ETag', etag)
.header('Cache-Control', cacheControl)
.send();
return;
}
// Non-allowlisted mimes (svg/html/the JSON blob) are forced to download so
// an attacker-controlled mime can never run script inline on this origin.
const disposition = INLINE_SAFE_MIME.has(entry.mime)
? 'inline'
: 'attachment';
// Use @Res() + res.send(Buffer) with an explicit Content-Type so the binary
// body bypasses the global JSON response transform/serializer.
res
.status(200)
.headers({
'Content-Type': entry.mime,
'Content-Length': entry.buf.length,
ETag: etag,
'Cache-Control': cacheControl,
'X-Content-Type-Options': 'nosniff',
'Content-Security-Policy':
"base-uri 'none'; object-src 'self'; default-src 'self';",
'Content-Disposition': disposition,
})
.send(entry.buf);
}
// Accept the consumer's If-None-Match whether it sends the quoted ETag, a bare
// sha256, a weak "W/"-prefixed validator, or a comma-separated list.
private ifNoneMatchMatches(
header: string | string[] | undefined,
sha256: string,
): boolean {
if (!header) return false;
const raw = Array.isArray(header) ? header.join(',') : header;
if (raw.trim() === '*') return true;
return raw
.split(',')
.map((t) => t.trim().replace(/^W\//, '').replace(/^"|"$/g, ''))
.some((t) => t === sha256);
}
}

View File

@@ -1,19 +0,0 @@
import { Global, Module } from '@nestjs/common';
import { SandboxController } from './sandbox.controller';
import { SandboxStore } from './sandbox.store';
/**
* In-RAM blob sandbox: a SINGLE shared SandboxStore (the @Injectable singleton)
* is written to by the stash tool (via McpService / AiChatToolsService) and read
* back by the anonymous SandboxController. Marked @Global so the same store
* instance is injectable everywhere without import churn — put() and get() MUST
* hit the same Map. EnvironmentService (caps/TTL/public URL) is provided by the
* global EnvironmentModule.
*/
@Global()
@Module({
controllers: [SandboxController],
providers: [SandboxStore],
exports: [SandboxStore],
})
export class SandboxModule {}

View File

@@ -1,163 +0,0 @@
import { createHash } from 'node:crypto';
import { validate as isValidUUID } from 'uuid';
import { SandboxStore } from './sandbox.store';
// Build a minimal EnvironmentService stub with overridable caps/TTL.
function makeEnv(
overrides: Partial<{
ttlMs: number;
maxBytes: number;
maxImageBytes: number;
maxTotalBytes: number;
}> = {},
) {
const cfg = {
ttlMs: 3_600_000,
maxBytes: 8_388_608,
maxImageBytes: 20_971_520,
maxTotalBytes: 134_217_728,
...overrides,
};
return {
getSandboxTtlMs: () => cfg.ttlMs,
getSandboxMaxBytes: () => cfg.maxBytes,
getSandboxMaxImageBytes: () => cfg.maxImageBytes,
getSandboxMaxTotalBytes: () => cfg.maxTotalBytes,
getSandboxPublicUrl: () => 'https://example.test',
} as any;
}
describe('SandboxStore', () => {
let store: SandboxStore;
afterEach(() => {
// Clear the unref'd sweep interval so it never leaks across tests.
store?.onModuleDestroy();
jest.useRealTimers();
});
it('put/get round-trips the exact bytes + mime and returns a UUID id', () => {
store = new SandboxStore(makeEnv());
const buf = Buffer.from('{"type":"doc","content":[]}', 'utf8');
const res = store.put(buf, 'application/json');
expect(isValidUUID(res.id)).toBe(true);
expect(res.size).toBe(buf.length);
const entry = store.get(res.id);
expect(entry).toBeDefined();
expect(entry!.buf.equals(buf)).toBe(true);
expect(entry!.mime).toBe('application/json');
});
it('computes sha256 over the body (matches a manual digest)', () => {
store = new SandboxStore(makeEnv());
const buf = Buffer.from('hello sandbox', 'utf8');
const expected = createHash('sha256').update(buf).digest('hex');
const res = store.put(buf, 'text/plain');
expect(res.sha256).toBe(expected);
expect(store.get(res.id)!.sha256).toBe(expected);
});
it('returns undefined for a missing id', () => {
store = new SandboxStore(makeEnv());
expect(store.get('11111111-1111-1111-1111-111111111111')).toBeUndefined();
});
it('lazily expires entries past the TTL (get returns undefined)', () => {
jest.useFakeTimers();
jest.setSystemTime(new Date('2026-01-01T00:00:00Z'));
store = new SandboxStore(makeEnv({ ttlMs: 1000 }));
const res = store.put(Buffer.from('x'), 'text/plain');
expect(store.get(res.id)).toBeDefined();
jest.setSystemTime(new Date('2026-01-01T00:00:02Z')); // +2s > 1s TTL
expect(store.get(res.id)).toBeUndefined();
// Eviction also frees the byte accounting.
expect(store.bytes).toBe(0);
});
it('background sweep drops expired entries without a get()', () => {
jest.useFakeTimers();
jest.setSystemTime(new Date('2026-01-01T00:00:00Z'));
store = new SandboxStore(makeEnv({ ttlMs: 1000 }));
store.put(Buffer.from('x'), 'text/plain');
expect(store.size).toBe(1);
jest.setSystemTime(new Date('2026-01-01T00:01:30Z')); // past TTL
jest.advanceTimersByTime(60_000); // fire the sweep interval
expect(store.size).toBe(0);
});
it('rejects a non-image blob over SANDBOX_MAX_BYTES', () => {
store = new SandboxStore(makeEnv({ maxBytes: 16 }));
expect(() => store.put(Buffer.alloc(17), 'application/json')).toThrow(
/per-blob cap/,
);
});
it('uses the larger image cap for image/* blobs', () => {
// 100 bytes exceeds the doc cap (16) but fits the image cap (1024).
store = new SandboxStore(makeEnv({ maxBytes: 16, maxImageBytes: 1024 }));
expect(() => store.put(Buffer.alloc(100), 'image/png')).not.toThrow();
// SVG counts as an image too.
expect(() => store.put(Buffer.alloc(100), 'image/svg+xml')).not.toThrow();
});
it('evicts oldest entries when the total cap would be exceeded', () => {
// Total cap 250 bytes; each blob 100 bytes -> only 2 fit at a time.
store = new SandboxStore(
makeEnv({ maxTotalBytes: 250, maxBytes: 1024 }),
);
const a = store.put(Buffer.alloc(100), 'application/json');
const b = store.put(Buffer.alloc(100), 'application/json');
const c = store.put(Buffer.alloc(100), 'application/json'); // evicts a
expect(store.get(a.id)).toBeUndefined(); // oldest evicted
expect(store.get(b.id)).toBeDefined();
expect(store.get(c.id)).toBeDefined();
expect(store.bytes).toBeLessThanOrEqual(250);
});
it('rejects a single blob larger than the whole total cap', () => {
store = new SandboxStore(
makeEnv({ maxTotalBytes: 50, maxBytes: 1024 }),
);
expect(() => store.put(Buffer.alloc(100), 'application/json')).toThrow(
/total store cap/,
);
});
it('putAndLink composes the anonymous /api/sb/<id> url with matching integrity', () => {
store = new SandboxStore(makeEnv());
const buf = Buffer.from('hello link', 'utf8');
const expected = createHash('sha256').update(buf).digest('hex');
const res = store.putAndLink(buf, 'image/png');
expect(res.uri).toMatch(/^https:\/\/example\.test\/api\/sb\/[0-9a-f-]{36}$/);
expect(res.sha256).toBe(expected);
expect(res.size).toBe(buf.length);
});
it('has()/remove() report and free a blob by id', () => {
store = new SandboxStore(makeEnv());
const { id } = store.put(Buffer.from('x'), 'text/plain');
expect(store.has(id)).toBe(true);
store.remove(id);
expect(store.has(id)).toBe(false);
expect(store.bytes).toBe(0);
});
it('asSink() round-trips put/has/evict through the anonymous uri', () => {
store = new SandboxStore(makeEnv());
const sink = store.asSink();
const buf = Buffer.from('sink bytes', 'utf8');
const r = sink.put(buf, 'image/png');
expect(sink.has(r.uri)).toBe(true);
sink.evict(r.uri);
expect(sink.has(r.uri)).toBe(false);
});
});

View File

@@ -1,178 +0,0 @@
import { Injectable, Logger, OnModuleDestroy } from '@nestjs/common';
import { createHash, randomUUID } from 'node:crypto';
import { EnvironmentService } from '../environment/environment.service';
import { SANDBOX_API_PATH } from './sandbox.constants';
// In-RAM, process-local blob store. No disk, no DB. Ephemeral by design: a
// restart empties it. A blob is addressed by an unguessable randomUUID() which
// IS the read capability — there are NO tokens. Each blob is immutable (its id
// never maps to changing content), so its sha256 is a perfect strong ETag.
export interface SandboxEntry {
buf: Buffer;
mime: string;
sha256: string;
expiresAt: number;
}
export interface SandboxPutResult {
id: string;
sha256: string;
size: number;
}
@Injectable()
export class SandboxStore implements OnModuleDestroy {
private readonly logger = new Logger(SandboxStore.name);
// Map preserves insertion order, so the first key is the oldest entry — used
// for FIFO eviction when the total-bytes RAM guard is exceeded.
private readonly map = new Map<string, SandboxEntry>();
private totalBytes = 0;
// Background sweep clears expired entries so never-fetched blobs do not linger
// until the next get(). unref()'d so it never holds the event loop open;
// cleared on module destroy. Mirrors the sweepTimer pattern in
// integrations/mcp/mcp.service.ts and packages/mcp/src/http.ts.
private readonly sweepIntervalMs = 60_000;
private readonly sweepTimer: NodeJS.Timeout;
constructor(private readonly environmentService: EnvironmentService) {
this.sweepTimer = setInterval(() => {
try {
this.sweep();
} catch (err) {
this.logger.error('Sandbox sweep failed', err as Error);
}
}, this.sweepIntervalMs);
this.sweepTimer.unref?.();
}
onModuleDestroy(): void {
clearInterval(this.sweepTimer);
}
/**
* Store a blob and return its read capability id + integrity metadata. The
* per-blob cap is chosen by mime (images get the larger image cap), and the
* total-store RAM guard evicts oldest entries to make room. Throws a clear
* error when a single blob cannot fit even after eviction. Blob bodies are
* never logged.
*/
put(buf: Buffer, mime: string): SandboxPutResult {
const perBlobCap = mime.startsWith('image/')
? this.environmentService.getSandboxMaxImageBytes()
: this.environmentService.getSandboxMaxBytes();
if (buf.length > perBlobCap) {
throw new Error(
`Sandbox blob of ${buf.length} bytes exceeds the ${perBlobCap}-byte per-blob cap`,
);
}
const maxTotal = this.environmentService.getSandboxMaxTotalBytes();
if (buf.length > maxTotal) {
throw new Error(
`Sandbox blob of ${buf.length} bytes exceeds the total store cap of ${maxTotal} bytes`,
);
}
// Drop expired entries first, then evict oldest until the new blob fits.
this.sweep();
while (this.totalBytes + buf.length > maxTotal && this.map.size > 0) {
const oldest = this.map.keys().next().value as string;
this.evict(oldest);
}
const id = randomUUID();
const sha256 = createHash('sha256').update(buf).digest('hex');
const expiresAt = Date.now() + this.environmentService.getSandboxTtlMs();
this.map.set(id, { buf, mime, sha256, expiresAt });
this.totalBytes += buf.length;
return { id, sha256, size: buf.length };
}
/**
* Store a blob and return its anonymous read URL plus integrity metadata.
* Owns the single sandbox-URL composition (`${publicBase}${SANDBOX_API_PATH}/
* <id>`) so callers never hand-build the route; the raw put() stays public for
* tests/low-level callers. sha256 is also the blob's strong ETag.
*/
putAndLink(
buf: Buffer,
mime: string,
): { uri: string; sha256: string; size: number } {
const stored = this.put(buf, mime);
const base = this.environmentService.getSandboxPublicUrl();
return {
uri: `${base}${SANDBOX_API_PATH}/${stored.id}`,
sha256: stored.sha256,
size: stored.size,
};
}
/**
* Adapter to the package's blob-sandbox sink contract `{ put, has, evict }`.
* The sink speaks anonymous `uri`s while the store is keyed by `id`, so this is
* the ONE place that maps a sandbox uri back to its id (the last path segment).
* Both wiring sites (embedded MCP + in-app agent tools) use this so the uri↔id
* mapping and URL composition live next to putAndLink, not copy-pasted.
*/
asSink(): {
put: (buf: Buffer, mime: string) => { uri: string; sha256: string; size: number };
has: (uri: string) => boolean;
evict: (uri: string) => void;
} {
const idOf = (uri: string) => uri.substring(uri.lastIndexOf('/') + 1);
return {
put: (buf, mime) => this.putAndLink(buf, mime),
has: (uri) => this.has(idOf(uri)),
evict: (uri) => this.remove(idOf(uri)),
};
}
/** True if the blob is still live (not evicted/expired). */
has(id: string): boolean {
return this.get(id) !== undefined;
}
/** Drop a blob by id (public wrapper over the private FIFO evict). */
remove(id: string): void {
this.evict(id);
}
/** Returns the entry, or undefined if missing OR expired (lazy expiry). */
get(id: string): SandboxEntry | undefined {
const entry = this.map.get(id);
if (!entry) return undefined;
if (entry.expiresAt <= Date.now()) {
this.evict(id);
return undefined;
}
return entry;
}
/** Current number of live entries (test/diagnostic helper). */
get size(): number {
return this.map.size;
}
/** Current total bytes held (test/diagnostic helper). */
get bytes(): number {
return this.totalBytes;
}
private evict(id: string): void {
const entry = this.map.get(id);
if (entry) {
this.totalBytes -= entry.buf.length;
this.map.delete(id);
}
}
private sweep(): void {
const now = Date.now();
for (const [id, entry] of this.map) {
if (entry.expiresAt <= now) {
this.evict(id);
}
}
}
}

View File

@@ -13,7 +13,6 @@ import fastifyCookie from '@fastify/cookie';
import fastifyIp from 'fastify-ip';
import { InternalLogFilter } from './common/logger/internal-log-filter';
import { EnvironmentService } from './integrations/environment/environment.service';
import { SANDBOX_API_PATH } from './integrations/sandbox/sandbox.constants';
import { resolveFrameHeader } from './common/helpers';
import { resolveTrustProxy } from './integrations/environment/trust-proxy.util';
@@ -127,10 +126,6 @@ async function bootstrap() {
'/api/workspace/create',
'/api/workspace/joined',
'/api/workspace/find-by-email',
// Anonymous in-RAM blob sandbox: a remote consumer fetches blobs by an
// unguessable UUID without any workspace host context, so the
// workspace-resolution gate must not apply.
SANDBOX_API_PATH,
];
if (

View File

@@ -16,7 +16,7 @@ license.
> that interface. Other Docmost MCPs are human-shaped — they expose "open the page" and
> "replace the page"; this one exposes the editing primitives a model is good at.
It exposes **40 tools** built around three ideas that the other Docmost MCPs do not
It exposes **38 tools** built around three ideas that the other Docmost MCPs do not
combine:
1. **Surgical, token-cheap edits.** Address a single block by id and patch it, or run
@@ -106,7 +106,7 @@ There are several Docmost MCPs. Here is a capability-by-capability comparison.
## Tools
All 40 tools, grouped by what you'd reach for them.
All 38 tools, grouped by what you'd reach for them.
### Exploration & retrieval
@@ -203,14 +203,6 @@ All 40 tools, grouped by what you'd reach for them.
node referencing the old attachment (recursively, including callouts/tables) via the
live document, preserving comments, alignment and alt text. (In-place overwrite is
deliberately avoided — some Docmost versions corrupt the attachment on overwrite.)
- **`stash_page`** — Serialize a whole page (its full ProseMirror JSON) into an ephemeral
in-RAM blob and return ONLY a short anonymous URL — the body never enters the model
context, so it is the way to hand a large page (and its images) to an external consumer
without truncation. Every internal file/image attachment is mirrored into the same
sandbox and its `src` rewritten to a sandbox URL; external http(s) images are left
untouched. Returns `{ uri, size, sha256, images:{ mirrored, failed } }` (`sha256` is also
the blob's ETag). Blobs are RAM-only, expire after a short TTL (~1h) and are bound to the
server instance that created them.
### Comments

View File

@@ -17,7 +17,7 @@
> «открыть страницу» и «заменить страницу»; этот даёт примитивы редактирования, в которых
> модель сильна.
Сервер предоставляет **40 инструментов**, построенных вокруг трёх идей, которые другие
Сервер предоставляет **38 инструментов**, построенных вокруг трёх идей, которые другие
Docmost-MCP не сочетают:
1. **Точечные, экономичные по токенам правки.** Адресуйте отдельный блок по id и патчите
@@ -109,7 +109,7 @@ Docmost-MCP не сочетают:
## Инструменты
Все 40 инструментов, сгруппированы по задачам, для которых вы их возьмёте.
Все 38 инструментов, сгруппированы по задачам, для которых вы их возьмёте.
### Чтение и поиск
@@ -209,15 +209,6 @@ Docmost-MCP не сочетают:
коллауты/таблицы), через живой документ, сохраняя комментарии, выравнивание и alt-текст.
(Перезапись «по месту» намеренно не используется — некоторые версии Docmost портят
вложение при перезаписи.)
- **`stash_page`** — Сериализовать страницу целиком (её полный ProseMirror JSON) в
эфемерный blob в оперативной памяти и вернуть ТОЛЬКО короткий анонимный URL — тело
никогда не попадает в контекст модели, поэтому это способ передать большую страницу
(вместе с её изображениями) внешнему потребителю без усечения. Каждое внутреннее
файловое/графическое вложение зеркалируется в тот же sandbox, а его `src` переписывается
на URL sandbox; внешние http(s)-изображения остаются нетронутыми. Возвращает
`{ uri, size, sha256, images:{ mirrored, failed } }` (`sha256` — это также ETag blob'а).
Blob'ы хранятся только в оперативной памяти, истекают через короткий TTL (~1 ч) и
привязаны к тому экземпляру сервера, который их создал.
### Комментарии

View File

@@ -7,7 +7,6 @@ import { TiptapTransformer } from "@hocuspocus/transformer";
import * as Y from "yjs";
import WebSocket from "ws";
import { convertProseMirrorToMarkdown } from "./lib/markdown-converter.js";
import { collectInternalFileNodes, normalizeFileUrl, resolveInternalFilePath, } from "./lib/internal-file-urls.js";
import { updatePageContentRealtime, replacePageContent, markdownToProseMirror, markdownToProseMirrorCanonical, mutatePageContent, buildCollabWsUrl, assertYjsEncodable, applyDocToFragment, } from "./lib/collaboration.js";
import { footnoteWarningsField } from "./lib/footnote-analyze.js";
import { buildPageTree } from "./lib/tree.js";
@@ -52,13 +51,6 @@ export class DocmostClient {
// its token instead of calling POST /auth/collab-token; on a 401/403 it is
// re-invoked once. Used by the internal agent to carry signed provenance.
getCollabTokenFn = null;
// Optional blob-sandbox sink for the stash tool. Null when not configured.
sandboxPut = null;
// Optional probes paired with the sink. `has` lets stashPage detect a blob
// FIFO-evicted by a LATER put in the same stash; `evict` lets it free this
// op's image blobs if the final doc put throws. Null when the sink omits them.
sandboxHas = null;
sandboxEvict = null;
// In-flight login dedup: when the token expires, the 401 interceptor,
// ensureAuthenticated, getCollabTokenWithReauth and the two multipart retries
// can all call login() at once. Memoizing a single promise collapses that
@@ -85,11 +77,6 @@ export class DocmostClient {
if (config.getCollabToken) {
this.getCollabTokenFn = config.getCollabToken;
}
if (config.sandbox) {
this.sandboxPut = config.sandbox.put;
this.sandboxHas = config.sandbox.has ?? null;
this.sandboxEvict = config.sandbox.evict ?? null;
}
this.client = axios.create({
baseURL: this.apiUrl,
// Default request timeout so a hung connection cannot wedge a per-page
@@ -618,181 +605,6 @@ export class DocmostClient {
content: data.content || { type: "doc", content: [] },
};
}
/**
* Fetch an INTERNAL Docmost file (authed loopback) for sandbox mirroring.
* `src` is normalized to `/api/files/<id>/<file>`; `this.client.baseURL`
* already ends in `/api`, so we strip the leading `/api` and request the
* relative path with the client's Authorization header. Returns the raw bytes
* and the response Content-Type (mime), defaulting to octet-stream.
*
* The fetch is size-bounded (hard 64 MiB ceiling) purely to protect memory;
* the authoritative per-blob cap is enforced by the sandbox `put`. The path is
* resolved via resolveInternalFilePath, which REJECTS (throws) any traversal
* or percent-encoded src that would let an attacker-controlled `attrs.src`
* escape `/api/files/` and reach another internal endpoint (SSRF). That throw
* happens before this.client.get, so a malicious src is counted as a failed
* mirror — it never reaches the network.
*/
async fetchInternalFile(src) {
const HARD_CEILING = 64 * 1024 * 1024; // 64 MiB memory guard
const relPath = resolveInternalFilePath(src);
const response = await this.client.get(relPath, {
responseType: "arraybuffer",
timeout: 30000,
maxContentLength: HARD_CEILING,
maxBodyLength: HARD_CEILING,
});
const buffer = Buffer.from(response.data);
if (buffer.length === 0) {
throw new Error(`Empty file response from "${src}"`);
}
const rawCt = response.headers?.["content-type"];
const mime = typeof rawCt === "string" && rawCt.length > 0
? rawCt.split(";")[0].trim().toLowerCase()
: "application/octet-stream";
return { buffer, mime };
}
/**
* Stash a page's full content into the in-RAM blob sandbox and return ONLY a
* short anonymous URL — the body never enters the model context (this is the
* whole point: ~30KB+ ProseMirror docs blow the model context if passed as a
* tool argument). Every INTERNAL file/image src (the type-agnostic criterion,
* so drawio/excalidraw/video/file nodes are covered too) is mirrored into the
* sandbox and its `src` rewritten to the sandbox URL, so an external consumer
* can fetch the images anonymously. External http(s) srcs are left untouched.
*
* Blobs live in RAM with a short TTL and are cleared on restart — consume the
* URLs within the TTL and one uptime. A failed image fetch never aborts the
* doc: the original src is kept and the failure counted.
*
* Returns { uri, sha256, size, images:{mirrored, failed} }. `uri` and `sha256`
* are for the document blob; `sha256` is also the blob's ETag (integrity).
*/
async stashPage(pageId) {
if (!this.sandboxPut) {
throw new Error("stash_page is unavailable: the blob sandbox is not configured on this server");
}
await this.ensureAuthenticated();
// Stash the SAME shape get_page_json returns (id/title/.../content), with a
// deep clone so the rewrite never mutates anything shared.
const pageJson = await this.getPageJson(pageId);
const cloned = structuredClone(pageJson);
// Group internal-file nodes by normalized src so each unique resource is
// fetched + stored ONCE (dedup), and every node sharing that src points at
// the one sandbox blob. Capture each node's ORIGINAL raw src per-node:
// dedup groups nodes whose normalized src is equal even when their raw srcs
// differ (e.g. `/api/files/...` vs the bare `/files/...`), so on a revert we
// must restore each node's own original value, not the group key.
const bySrc = new Map();
for (const node of collectInternalFileNodes(cloned.content)) {
const origSrc = String(node.attrs.src);
const src = normalizeFileUrl(origSrc);
const entry = { node, origSrc };
const group = bySrc.get(src);
if (group)
group.push(entry);
else
bySrc.set(src, [entry]);
}
let mirrored = 0;
let failed = 0;
// Record every successful mirror so it can be (a) reverted if its blob gets
// FIFO-evicted by a LATER put in this same stash, and (b) freed if the final
// doc put throws.
const mirrors = [];
const MAX_CONCURRENCY = 5;
const groups = [...bySrc.entries()];
for (let i = 0; i < groups.length; i += MAX_CONCURRENCY) {
const batch = groups.slice(i, i + MAX_CONCURRENCY);
await Promise.all(batch.map(async ([src, entries]) => {
try {
const { buffer, mime } = await this.fetchInternalFile(src);
// put may throw if the blob exceeds the per-blob/total caps.
const stored = this.sandboxPut(buffer, mime);
for (const entry of entries)
entry.node.attrs.src = stored.uri;
mirrors.push({ uri: stored.uri, entries });
mirrored++;
}
catch (err) {
// One bad/oversized image (or a rejected traversal src) must not
// abort the document. Logged unconditionally (never the blob body),
// matching the package's ungated console.warn convention.
failed++;
console.warn(`stash_page: failed to mirror "${src}": ${err instanceof Error ? err.message : String(err)}`);
}
}));
}
// Revert one mirror's nodes to their original internal srcs and re-count it
// as failed (its blob was FIFO-evicted before the doc could reference it
// safely).
const revertMirror = (mirror) => {
for (const entry of mirror.entries)
entry.node.attrs.src = entry.origSrc;
mirrored--;
failed++;
console.warn(`stash_page: mirrored blob ${mirror.uri} was evicted before the doc ` +
`could safely reference it; reverted its src and counted it as failed`);
};
// Pre-put reconciliation: an image put earlier in THIS stash can FIFO-evict
// an even-earlier image of the same stash. Drop those from the live set
// first so the first serialized doc is already mostly correct.
let liveMirrors = mirrors;
if (this.sandboxHas) {
liveMirrors = [];
for (const mirror of mirrors) {
if (this.sandboxHas(mirror.uri))
liveMirrors.push(mirror);
else
revertMirror(mirror);
}
}
// Put the document, then reconcile against eviction caused by the doc put
// ITSELF (the doc is newest, FIFO drops oldest = this stash's images). Each
// iteration reverts >=1 mirror, so the loop terminates (worst case: all
// images reverted and the doc references no sandbox image URLs).
let stored;
for (;;) {
const docBuf = Buffer.from(JSON.stringify(cloned), "utf8");
let docStored;
try {
docStored = this.sandboxPut(docBuf, "application/json");
}
catch (err) {
// The doc put failed (e.g. doc exceeds the cap). Free this op's image
// blobs instead of leaking them in RAM for the whole TTL, then
// re-throw.
if (this.sandboxEvict) {
for (const mirror of liveMirrors)
this.sandboxEvict(mirror.uri);
}
throw err;
}
if (!this.sandboxHas) {
stored = docStored;
break;
}
const evictedNow = liveMirrors.filter((m) => !this.sandboxHas(m.uri));
if (evictedNow.length === 0) {
stored = docStored;
break;
}
// The doc we just stored references now-dead blobs. Revert those nodes,
// drop the stale doc blob, and loop to re-serialize + re-put the
// corrected doc.
for (const mirror of evictedNow)
revertMirror(mirror);
liveMirrors = liveMirrors.filter((m) => this.sandboxHas(m.uri));
if (this.sandboxEvict)
this.sandboxEvict(docStored.uri);
}
return {
uri: stored.uri,
sha256: stored.sha256,
size: stored.size,
images: { mirrored, failed },
};
}
/**
* Compact outline of a page's top-level blocks (no full document body).
* Cheap way to locate sections/tables and grab block ids before drilling in

View File

@@ -285,38 +285,6 @@ export function createDocmostMcpServer(config) {
const result = await docmostClient.editPageText(pageId, edits);
return jsonContent(result);
});
// Tool: stash_page — returns a resource_link (NOT embedded text) so the doc
// body never enters the model context. Registered directly (not via
// registerShared) because that helper only emits text content. Also returns
// `structuredContent` carrying the full documented `{uri, sha256, size, images}`
// shape alongside the resource_link, so MCP clients receive the blob's sha256
// (its ETag, for integrity) and mirror counts, not just the link.
server.registerTool(SHARED_TOOL_SPECS.stashPage.mcpName, {
description: SHARED_TOOL_SPECS.stashPage.description,
inputSchema: SHARED_TOOL_SPECS.stashPage.buildShape(z),
}, async ({ pageId }) => {
const result = await docmostClient.stashPage(pageId);
return {
content: [
{
type: "resource_link",
uri: result.uri,
name: "page.json",
mimeType: "application/json",
size: result.size,
},
],
// Mirror the full documented result shape ({ uri, size, sha256, images })
// as structuredContent so MCP clients get the blob's sha256 (its ETag, for
// integrity) and the mirror counts, not just the resource_link.
structuredContent: {
uri: result.uri,
sha256: result.sha256,
size: result.size,
images: result.images,
},
};
});
// Tool: patch_node
server.registerTool("patch_node", {
description: "Replaces a single block identified by its attrs.id WITHOUT resending the " +

View File

@@ -1,110 +0,0 @@
// Detection + collection of INTERNAL Docmost file URLs inside a ProseMirror doc.
//
// An internal file URL is a relative path served by Docmost's authenticated
// attachment route (`GET /api/files/:fileId/:fileName`). It is useless to an
// external consumer (relative + needs a Docmost session), so the stash tool
// mirrors every such resource into the blob sandbox and rewrites its `src`.
//
// The criterion is "internal file URL", NOT the node TYPE: image, drawio,
// excalidraw, video and file nodes all carry such a `src`, so a type-agnostic
// walker covers them all. External http(s) srcs (CDNs) are left untouched.
//
// Mirrors editor-ext's isInternalFileUrl / normalizeFileUrl (kept as a local
// dup so the ESM mcp package does not depend on the editor-ext build).
function isInternalFileUrl(url) {
if (typeof url !== "string")
return false;
const normalized = url.trim();
return (normalized.startsWith("/api/files/") || normalized.startsWith("/files/"));
}
/** Normalize a bare `/files/...` src to the canonical `/api/files/...` form. */
export function normalizeFileUrl(src) {
const trimmed = src.trim();
if (trimmed.startsWith("/files/"))
return "/api" + trimmed;
return trimmed;
}
/**
* Resolve a page-content `src` into the safe, `/api`-relative path the stash
* tool may fetch over the authenticated loopback client — or THROW.
*
* SECURITY (SSRF / path-traversal): `src` comes from page content and is fully
* attacker-controllable. The mirroring fetch runs through the AUTHENTICATED
* loopback axios client whose baseURL ends in `/api`, so a naive
* `src.replace(/^\/api/, "")` lets a crafted value like
* `/api/files/../auth/whoami` collapse (via axios/WHATWG URL `..` resolution)
* into an ARBITRARY internal GET endpoint, whose authed response would then be
* stored in the anonymous sandbox (SSRF + data exfiltration). A prefix-only
* `startsWith("/api/files/")` check does NOT defend against this because the
* `..` segments are still present in the raw string and resolved later.
*
* This function defeats that by resolving the canonical pathname FIRST and only
* then asserting it still lives under `/api/files/`:
* - it rejects any percent-encoded dot/slash (`%2e` / `%2f`): the WHATWG URL
* parser collapses LITERAL `../` but does NOT decode `%2f` separators, so a
* content-controlled src must never be allowed to smuggle those past the
* canonicalization;
* - it resolves `new URL(trimmed, "http://internal.invalid").pathname`, which
* normalizes `..`/`.` segments (e.g. `/api/files/../auth/whoami` →
* `/api/auth/whoami`);
* - it then requires the canonical pathname to start with `/api/files/`, so a
* traversal that escaped that subtree is rejected.
*
* Returns the path RELATIVE to the `/api` base (e.g. `/files/<id>/<name>`),
* ready to hand to the loopback client. The throw happens BEFORE any network
* call, so a rejected src is counted as a failed mirror and its original src is
* kept (the per-image try/catch in stashPage never aborts the whole document).
*/
export function resolveInternalFilePath(src) {
const trimmed = src.trim();
// Percent-encoded dot/slash must never reach the URL canonicalizer: the
// WHATWG parser does NOT decode `%2f` into a path separator, so an encoded
// `..%2fauth` would survive canonicalization and still escape /api/files/.
if (/%2e|%2f/i.test(trimmed)) {
throw new Error(`Refusing internal file src with percent-encoded path segment: "${src}"`);
}
let pathname;
try {
// The base host is irrelevant (never contacted); it only lets the parser
// resolve a relative `src` and normalize `..`/`.` segments.
pathname = new URL(trimmed, "http://internal.invalid").pathname;
}
catch {
throw new Error(`Invalid internal file src: "${src}"`);
}
if (!pathname.startsWith("/api/files/")) {
throw new Error(`Refusing internal file src that escapes /api/files/: "${src}"`);
}
// Strip the `/api` base prefix; the loopback client's baseURL already ends
// in `/api`, so it expects the path relative to that (e.g. /files/<id>/<f>).
return pathname.replace(/^\/api/, "");
}
/**
* Recursively collect every node whose `attrs.src` is an internal file URL.
* Returns references to the live nodes (so the caller can rewrite `attrs.src`
* in place on its clone). Descends `content` arrays, covering callouts, tables,
* details and any other nested container.
*/
export function collectInternalFileNodes(doc) {
const out = [];
const visit = (node) => {
if (!node)
return;
if (Array.isArray(node)) {
for (const child of node)
visit(child);
return;
}
if (typeof node !== "object")
return;
if (node.attrs && isInternalFileUrl(node.attrs.src)) {
out.push(node);
}
if (Array.isArray(node.content)) {
for (const child of node.content)
visit(child);
}
};
visit(doc);
return out;
}

View File

@@ -209,27 +209,4 @@ export const SHARED_TOOL_SPECS = {
.describe('List of find/replace operations, applied in order'),
}),
},
// --- hand a large page to an external consumer without bloating context ---
stashPage: {
mcpName: 'stash_page',
inAppKey: 'stashPage',
description: 'Serialize a whole page (the full ProseMirror JSON, as get_page_json ' +
'returns) into an ephemeral in-memory blob and return ONLY a short ' +
'anonymous URL to it — the body NEVER enters the model context, so this ' +
'is the way to hand a large page (or its images) to an external consumer ' +
'without truncation. Every internal file/image attachment is mirrored ' +
'into the same sandbox and its src rewritten to a sandbox URL, so the ' +
'consumer can fetch the images anonymously too; external http(s) images ' +
'are left untouched. Returns { uri, size, sha256, images:{mirrored, ' +
'failed} }. Integrity: the blob is served with ETag = its sha256, so a ' +
'truncated/corrupted fetch is detectable. Blobs are RAM-only: they expire ' +
'after a short TTL (~1h) and are cleared on restart — consume the URL ' +
'within the TTL and one uptime, or re-stash. A blob is bound to the ' +
'server instance that created it: in a multi-replica deployment without ' +
'sticky sessions a blob stored on one instance is not retrievable via the ' +
'sandbox URL on another (it 404s like an expired one).',
buildShape: (z) => ({
pageId: z.string().min(1),
}),
},
};

View File

@@ -13,11 +13,6 @@ import { TiptapTransformer } from "@hocuspocus/transformer";
import * as Y from "yjs";
import WebSocket from "ws";
import { convertProseMirrorToMarkdown } from "./lib/markdown-converter.js";
import {
collectInternalFileNodes,
normalizeFileUrl,
resolveInternalFilePath,
} from "./lib/internal-file-urls.js";
import {
updatePageContentRealtime,
replacePageContent,
@@ -107,14 +102,6 @@ const MIME_TO_EXT: Record<string, string> = {
* Housed here (not in index.ts) so client.ts has no type dependency on index.ts;
* index.ts re-exports it for the package's public surface.
*/
// Sink the stash tool writes blobs into. The host app binds this to its in-RAM
// SandboxStore and composes the public `uri` (the package never sees the store
// or any env). `put` returns the anonymous read URL plus integrity metadata.
export type SandboxPut = (
buf: Buffer,
mime: string,
) => { uri: string; sha256: string; size: number };
export type DocmostMcpConfig = { apiUrl: string } & (
| { email: string; password: string }
| { getToken: () => Promise<string> } // returns a BARE JWT; the client adds "Bearer "
@@ -122,15 +109,6 @@ export type DocmostMcpConfig = { apiUrl: string } & (
// Optional collab-token provider (returns a ready collab JWT). Common to
// both branches; see the type doc above.
getCollabToken?: () => Promise<string>;
// Optional blob sandbox sink. Present only where the stash tool is wired;
// when absent, stash_page throws a clear "not configured" error. The
// optional `has`/`evict` probes let stashPage keep its mirror counts honest
// under the store's FIFO eviction (see stashPage); older sinks omit them.
sandbox?: {
put: SandboxPut;
has?: (uri: string) => boolean;
evict?: (uri: string) => void;
};
};
export class DocmostClient {
@@ -148,13 +126,6 @@ export class DocmostClient {
// its token instead of calling POST /auth/collab-token; on a 401/403 it is
// re-invoked once. Used by the internal agent to carry signed provenance.
private getCollabTokenFn: (() => Promise<string>) | null = null;
// Optional blob-sandbox sink for the stash tool. Null when not configured.
private sandboxPut: SandboxPut | null = null;
// Optional probes paired with the sink. `has` lets stashPage detect a blob
// FIFO-evicted by a LATER put in the same stash; `evict` lets it free this
// op's image blobs if the final doc put throws. Null when the sink omits them.
private sandboxHas: ((uri: string) => boolean) | null = null;
private sandboxEvict: ((uri: string) => void) | null = null;
// In-flight login dedup: when the token expires, the 401 interceptor,
// ensureAuthenticated, getCollabTokenWithReauth and the two multipart retries
// can all call login() at once. Memoizing a single promise collapses that
@@ -194,11 +165,6 @@ export class DocmostClient {
if (config.getCollabToken) {
this.getCollabTokenFn = config.getCollabToken;
}
if (config.sandbox) {
this.sandboxPut = config.sandbox.put;
this.sandboxHas = config.sandbox.has ?? null;
this.sandboxEvict = config.sandbox.evict ?? null;
}
this.client = axios.create({
baseURL: this.apiUrl,
// Default request timeout so a hung connection cannot wedge a per-page
@@ -801,203 +767,6 @@ export class DocmostClient {
};
}
/**
* Fetch an INTERNAL Docmost file (authed loopback) for sandbox mirroring.
* `src` is normalized to `/api/files/<id>/<file>`; `this.client.baseURL`
* already ends in `/api`, so we strip the leading `/api` and request the
* relative path with the client's Authorization header. Returns the raw bytes
* and the response Content-Type (mime), defaulting to octet-stream.
*
* The fetch is size-bounded (hard 64 MiB ceiling) purely to protect memory;
* the authoritative per-blob cap is enforced by the sandbox `put`. The path is
* resolved via resolveInternalFilePath, which REJECTS (throws) any traversal
* or percent-encoded src that would let an attacker-controlled `attrs.src`
* escape `/api/files/` and reach another internal endpoint (SSRF). That throw
* happens before this.client.get, so a malicious src is counted as a failed
* mirror — it never reaches the network.
*/
private async fetchInternalFile(
src: string,
): Promise<{ buffer: Buffer; mime: string }> {
const HARD_CEILING = 64 * 1024 * 1024; // 64 MiB memory guard
const relPath = resolveInternalFilePath(src);
const response = await this.client.get(relPath, {
responseType: "arraybuffer",
timeout: 30000,
maxContentLength: HARD_CEILING,
maxBodyLength: HARD_CEILING,
});
const buffer = Buffer.from(response.data);
if (buffer.length === 0) {
throw new Error(`Empty file response from "${src}"`);
}
const rawCt = response.headers?.["content-type"];
const mime =
typeof rawCt === "string" && rawCt.length > 0
? rawCt.split(";")[0].trim().toLowerCase()
: "application/octet-stream";
return { buffer, mime };
}
/**
* Stash a page's full content into the in-RAM blob sandbox and return ONLY a
* short anonymous URL — the body never enters the model context (this is the
* whole point: ~30KB+ ProseMirror docs blow the model context if passed as a
* tool argument). Every INTERNAL file/image src (the type-agnostic criterion,
* so drawio/excalidraw/video/file nodes are covered too) is mirrored into the
* sandbox and its `src` rewritten to the sandbox URL, so an external consumer
* can fetch the images anonymously. External http(s) srcs are left untouched.
*
* Blobs live in RAM with a short TTL and are cleared on restart — consume the
* URLs within the TTL and one uptime. A failed image fetch never aborts the
* doc: the original src is kept and the failure counted.
*
* Returns { uri, sha256, size, images:{mirrored, failed} }. `uri` and `sha256`
* are for the document blob; `sha256` is also the blob's ETag (integrity).
*/
async stashPage(pageId: string): Promise<{
uri: string;
sha256: string;
size: number;
images: { mirrored: number; failed: number };
}> {
if (!this.sandboxPut) {
throw new Error(
"stash_page is unavailable: the blob sandbox is not configured on this server",
);
}
await this.ensureAuthenticated();
// Stash the SAME shape get_page_json returns (id/title/.../content), with a
// deep clone so the rewrite never mutates anything shared.
const pageJson = await this.getPageJson(pageId);
const cloned: any = structuredClone(pageJson);
// Group internal-file nodes by normalized src so each unique resource is
// fetched + stored ONCE (dedup), and every node sharing that src points at
// the one sandbox blob. Capture each node's ORIGINAL raw src per-node:
// dedup groups nodes whose normalized src is equal even when their raw srcs
// differ (e.g. `/api/files/...` vs the bare `/files/...`), so on a revert we
// must restore each node's own original value, not the group key.
const bySrc = new Map<string, Array<{ node: any; origSrc: string }>>();
for (const node of collectInternalFileNodes(cloned.content)) {
const origSrc = String(node.attrs.src);
const src = normalizeFileUrl(origSrc);
const entry = { node, origSrc };
const group = bySrc.get(src);
if (group) group.push(entry);
else bySrc.set(src, [entry]);
}
let mirrored = 0;
let failed = 0;
// Record every successful mirror so it can be (a) reverted if its blob gets
// FIFO-evicted by a LATER put in this same stash, and (b) freed if the final
// doc put throws.
const mirrors: Array<{
uri: string;
entries: Array<{ node: any; origSrc: string }>;
}> = [];
const MAX_CONCURRENCY = 5;
const groups = [...bySrc.entries()];
for (let i = 0; i < groups.length; i += MAX_CONCURRENCY) {
const batch = groups.slice(i, i + MAX_CONCURRENCY);
await Promise.all(
batch.map(async ([src, entries]) => {
try {
const { buffer, mime } = await this.fetchInternalFile(src);
// put may throw if the blob exceeds the per-blob/total caps.
const stored = this.sandboxPut!(buffer, mime);
for (const entry of entries) entry.node.attrs.src = stored.uri;
mirrors.push({ uri: stored.uri, entries });
mirrored++;
} catch (err) {
// One bad/oversized image (or a rejected traversal src) must not
// abort the document. Logged unconditionally (never the blob body),
// matching the package's ungated console.warn convention.
failed++;
console.warn(
`stash_page: failed to mirror "${src}": ${
err instanceof Error ? err.message : String(err)
}`,
);
}
}),
);
}
// Revert one mirror's nodes to their original internal srcs and re-count it
// as failed (its blob was FIFO-evicted before the doc could reference it
// safely).
const revertMirror = (mirror: {
uri: string;
entries: Array<{ node: any; origSrc: string }>;
}) => {
for (const entry of mirror.entries) entry.node.attrs.src = entry.origSrc;
mirrored--;
failed++;
console.warn(
`stash_page: mirrored blob ${mirror.uri} was evicted before the doc ` +
`could safely reference it; reverted its src and counted it as failed`,
);
};
// Pre-put reconciliation: an image put earlier in THIS stash can FIFO-evict
// an even-earlier image of the same stash. Drop those from the live set
// first so the first serialized doc is already mostly correct.
let liveMirrors = mirrors;
if (this.sandboxHas) {
liveMirrors = [];
for (const mirror of mirrors) {
if (this.sandboxHas(mirror.uri)) liveMirrors.push(mirror);
else revertMirror(mirror);
}
}
// Put the document, then reconcile against eviction caused by the doc put
// ITSELF (the doc is newest, FIFO drops oldest = this stash's images). Each
// iteration reverts >=1 mirror, so the loop terminates (worst case: all
// images reverted and the doc references no sandbox image URLs).
let stored: { uri: string; sha256: string; size: number };
for (;;) {
const docBuf = Buffer.from(JSON.stringify(cloned), "utf8");
let docStored: { uri: string; sha256: string; size: number };
try {
docStored = this.sandboxPut(docBuf, "application/json");
} catch (err) {
// The doc put failed (e.g. doc exceeds the cap). Free this op's image
// blobs instead of leaking them in RAM for the whole TTL, then
// re-throw.
if (this.sandboxEvict) {
for (const mirror of liveMirrors) this.sandboxEvict(mirror.uri);
}
throw err;
}
if (!this.sandboxHas) {
stored = docStored;
break;
}
const evictedNow = liveMirrors.filter((m) => !this.sandboxHas!(m.uri));
if (evictedNow.length === 0) {
stored = docStored;
break;
}
// The doc we just stored references now-dead blobs. Revert those nodes,
// drop the stale doc blob, and loop to re-serialize + re-put the
// corrected doc.
for (const mirror of evictedNow) revertMirror(mirror);
liveMirrors = liveMirrors.filter((m) => this.sandboxHas!(m.uri));
if (this.sandboxEvict) this.sandboxEvict(docStored.uri);
}
return {
uri: stored.uri,
sha256: stored.sha256,
size: stored.size,
images: { mirrored, failed },
};
}
/**
* Compact outline of a page's top-level blocks (no full document body).
* Cheap way to locate sections/tables and grab block ids before drilling in

View File

@@ -408,43 +408,6 @@ registerShared(SHARED_TOOL_SPECS.editPageText, async ({ pageId, edits }) => {
return jsonContent(result);
});
// Tool: stash_page — returns a resource_link (NOT embedded text) so the doc
// body never enters the model context. Registered directly (not via
// registerShared) because that helper only emits text content. Also returns
// `structuredContent` carrying the full documented `{uri, sha256, size, images}`
// shape alongside the resource_link, so MCP clients receive the blob's sha256
// (its ETag, for integrity) and mirror counts, not just the link.
server.registerTool(
SHARED_TOOL_SPECS.stashPage.mcpName,
{
description: SHARED_TOOL_SPECS.stashPage.description,
inputSchema: SHARED_TOOL_SPECS.stashPage.buildShape!(z),
},
async ({ pageId }: { pageId: string }) => {
const result = await docmostClient.stashPage(pageId);
return {
content: [
{
type: "resource_link" as const,
uri: result.uri,
name: "page.json",
mimeType: "application/json",
size: result.size,
},
],
// Mirror the full documented result shape ({ uri, size, sha256, images })
// as structuredContent so MCP clients get the blob's sha256 (its ETag, for
// integrity) and the mirror counts, not just the resource_link.
structuredContent: {
uri: result.uri,
sha256: result.sha256,
size: result.size,
images: result.images,
},
};
},
);
// Tool: patch_node
server.registerTool(
"patch_node",

View File

@@ -1,113 +0,0 @@
// Detection + collection of INTERNAL Docmost file URLs inside a ProseMirror doc.
//
// An internal file URL is a relative path served by Docmost's authenticated
// attachment route (`GET /api/files/:fileId/:fileName`). It is useless to an
// external consumer (relative + needs a Docmost session), so the stash tool
// mirrors every such resource into the blob sandbox and rewrites its `src`.
//
// The criterion is "internal file URL", NOT the node TYPE: image, drawio,
// excalidraw, video and file nodes all carry such a `src`, so a type-agnostic
// walker covers them all. External http(s) srcs (CDNs) are left untouched.
//
// Mirrors editor-ext's isInternalFileUrl / normalizeFileUrl (kept as a local
// dup so the ESM mcp package does not depend on the editor-ext build).
function isInternalFileUrl(url: unknown): boolean {
if (typeof url !== "string") return false;
const normalized = url.trim();
return (
normalized.startsWith("/api/files/") || normalized.startsWith("/files/")
);
}
/** Normalize a bare `/files/...` src to the canonical `/api/files/...` form. */
export function normalizeFileUrl(src: string): string {
const trimmed = src.trim();
if (trimmed.startsWith("/files/")) return "/api" + trimmed;
return trimmed;
}
/**
* Resolve a page-content `src` into the safe, `/api`-relative path the stash
* tool may fetch over the authenticated loopback client — or THROW.
*
* SECURITY (SSRF / path-traversal): `src` comes from page content and is fully
* attacker-controllable. The mirroring fetch runs through the AUTHENTICATED
* loopback axios client whose baseURL ends in `/api`, so a naive
* `src.replace(/^\/api/, "")` lets a crafted value like
* `/api/files/../auth/whoami` collapse (via axios/WHATWG URL `..` resolution)
* into an ARBITRARY internal GET endpoint, whose authed response would then be
* stored in the anonymous sandbox (SSRF + data exfiltration). A prefix-only
* `startsWith("/api/files/")` check does NOT defend against this because the
* `..` segments are still present in the raw string and resolved later.
*
* This function defeats that by resolving the canonical pathname FIRST and only
* then asserting it still lives under `/api/files/`:
* - it rejects any percent-encoded dot/slash (`%2e` / `%2f`): the WHATWG URL
* parser collapses LITERAL `../` but does NOT decode `%2f` separators, so a
* content-controlled src must never be allowed to smuggle those past the
* canonicalization;
* - it resolves `new URL(trimmed, "http://internal.invalid").pathname`, which
* normalizes `..`/`.` segments (e.g. `/api/files/../auth/whoami` →
* `/api/auth/whoami`);
* - it then requires the canonical pathname to start with `/api/files/`, so a
* traversal that escaped that subtree is rejected.
*
* Returns the path RELATIVE to the `/api` base (e.g. `/files/<id>/<name>`),
* ready to hand to the loopback client. The throw happens BEFORE any network
* call, so a rejected src is counted as a failed mirror and its original src is
* kept (the per-image try/catch in stashPage never aborts the whole document).
*/
export function resolveInternalFilePath(src: string): string {
const trimmed = src.trim();
// Percent-encoded dot/slash must never reach the URL canonicalizer: the
// WHATWG parser does NOT decode `%2f` into a path separator, so an encoded
// `..%2fauth` would survive canonicalization and still escape /api/files/.
if (/%2e|%2f/i.test(trimmed)) {
throw new Error(
`Refusing internal file src with percent-encoded path segment: "${src}"`,
);
}
let pathname: string;
try {
// The base host is irrelevant (never contacted); it only lets the parser
// resolve a relative `src` and normalize `..`/`.` segments.
pathname = new URL(trimmed, "http://internal.invalid").pathname;
} catch {
throw new Error(`Invalid internal file src: "${src}"`);
}
if (!pathname.startsWith("/api/files/")) {
throw new Error(
`Refusing internal file src that escapes /api/files/: "${src}"`,
);
}
// Strip the `/api` base prefix; the loopback client's baseURL already ends
// in `/api`, so it expects the path relative to that (e.g. /files/<id>/<f>).
return pathname.replace(/^\/api/, "");
}
/**
* Recursively collect every node whose `attrs.src` is an internal file URL.
* Returns references to the live nodes (so the caller can rewrite `attrs.src`
* in place on its clone). Descends `content` arrays, covering callouts, tables,
* details and any other nested container.
*/
export function collectInternalFileNodes(doc: unknown): any[] {
const out: any[] = [];
const visit = (node: any): void => {
if (!node) return;
if (Array.isArray(node)) {
for (const child of node) visit(child);
return;
}
if (typeof node !== "object") return;
if (node.attrs && isInternalFileUrl(node.attrs.src)) {
out.push(node);
}
if (Array.isArray(node.content)) {
for (const child of node.content) visit(child);
}
};
visit(doc);
return out;
}

View File

@@ -266,29 +266,4 @@ export const SHARED_TOOL_SPECS = {
.describe('List of find/replace operations, applied in order'),
}),
},
// --- hand a large page to an external consumer without bloating context ---
stashPage: {
mcpName: 'stash_page',
inAppKey: 'stashPage',
description:
'Serialize a whole page (the full ProseMirror JSON, as get_page_json ' +
'returns) into an ephemeral in-memory blob and return ONLY a short ' +
'anonymous URL to it — the body NEVER enters the model context, so this ' +
'is the way to hand a large page (or its images) to an external consumer ' +
'without truncation. Every internal file/image attachment is mirrored ' +
'into the same sandbox and its src rewritten to a sandbox URL, so the ' +
'consumer can fetch the images anonymously too; external http(s) images ' +
'are left untouched. Returns { uri, size, sha256, images:{mirrored, ' +
'failed} }. Integrity: the blob is served with ETag = its sha256, so a ' +
'truncated/corrupted fetch is detectable. Blobs are RAM-only: they expire ' +
'after a short TTL (~1h) and are cleared on restart — consume the URL ' +
'within the TTL and one uptime, or re-stash. A blob is bound to the ' +
'server instance that created it: in a multi-replica deployment without ' +
'sticky sessions a blob stored on one instance is not retrievable via the ' +
'sandbox URL on another (it 404s like an expired one).',
buildShape: (z) => ({
pageId: z.string().min(1),
}),
},
} satisfies Record<string, SharedToolSpec>;

View File

@@ -1,155 +0,0 @@
// Server round-trip test for the stash_page MCP tool result shape. The in-app
// path returns the full documented `{ uri, size, sha256, images }` object, but
// the MCP transport must deliver the SAME shape: a resource_link (primary
// payload) PLUS a `structuredContent` mirror carrying sha256 + image counts.
// This connects a real MCP Client to the server over a linked in-memory
// transport pair and asserts both halves of the result, end to end.
import { test, after } from "node:test";
import assert from "node:assert/strict";
import http from "node:http";
import { createHash } from "node:crypto";
import { createDocmostMcpServer } from "../../build/index.js";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
function readBody(req) {
return new Promise((resolve) => {
let raw = "";
req.on("data", (c) => (raw += c));
req.on("end", () => resolve(raw));
});
}
function startServer(handler) {
return new Promise((resolve) => {
const server = http.createServer(handler);
server.listen(0, "127.0.0.1", () => {
const { port } = server.address();
resolve({ server, baseURL: `http://127.0.0.1:${port}/api` });
});
});
}
const openServers = [];
async function spawn(handler) {
const { server, baseURL } = await startServer(handler);
openServers.push(server);
return baseURL;
}
after(async () => {
await Promise.all(openServers.map((s) => new Promise((r) => s.close(r))));
});
// Minimal in-memory sandbox sink: store the blob and return a uri + sha256 +
// size, with has/evict probes the client's reconciliation may call.
function makeSandbox() {
const live = new Map();
const idOf = (uri) => uri.substring(uri.lastIndexOf("/") + 1);
let n = 0;
return {
put(buf) {
const sha256 = createHash("sha256").update(buf).digest("hex");
const id = `id-${n++}`;
live.set(id, buf.length);
return { uri: `https://sb.test/api/sb/${id}`, sha256, size: buf.length };
},
has(uri) {
return live.has(idOf(uri));
},
evict(uri) {
live.delete(idOf(uri));
},
};
}
const IMAGE_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a]);
// One internal image (so images.mirrored === 1) inside a normal page doc.
function pageDoc() {
return {
type: "doc",
content: [
{
type: "image",
attrs: { src: "/api/files/att-1/pic.png", attachmentId: "att-1" },
},
],
};
}
// Mock Docmost: login, page info, internal file bytes — same pattern as
// stash-page.test.mjs.
async function buildBaseURL() {
return spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
res.writeHead(200, {
"Content-Type": "application/json",
"Set-Cookie": "authToken=tok; HttpOnly",
});
res.end(JSON.stringify({ token: "tok" }));
return;
}
if (req.url === "/api/pages/info") {
res.writeHead(200, { "Content-Type": "application/json" });
res.end(
JSON.stringify({ data: { id: "page-1", title: "T", content: pageDoc() } }),
);
return;
}
if (req.url.startsWith("/api/files/")) {
res.writeHead(200, { "Content-Type": "image/png" });
res.end(IMAGE_BYTES);
return;
}
res.writeHead(404);
res.end();
});
}
test("stash_page MCP tool returns a resource_link AND a structuredContent mirror", async () => {
const baseURL = await buildBaseURL();
const sandbox = makeSandbox();
const server = createDocmostMcpServer({
apiUrl: baseURL,
email: "u@example.com",
password: "pw",
sandbox,
});
const client = new Client({ name: "test-client", version: "0.0.0" });
const [a, b] = InMemoryTransport.createLinkedPair();
await server.connect(b);
await client.connect(a);
try {
const res = await client.callTool({
name: "stash_page",
arguments: { pageId: "page-1" },
});
// Primary payload: a resource_link pointing at the sandbox doc blob.
const link = res.content[0];
assert.equal(link.type, "resource_link");
assert.match(link.uri, /^https:\/\/sb\.test\/api\/sb\//);
// structuredContent mirrors the full documented shape.
const sc = res.structuredContent;
assert.equal(typeof sc, "object");
assert.equal(sc.uri, link.uri); // same blob as the link
assert.match(sc.sha256, /^[0-9a-f]{64}$/); // 64-hex ETag
assert.equal(typeof sc.size, "number");
assert.deepEqual(sc.images, { mirrored: 1, failed: 0 });
// Deep-equal the whole structured payload against what the mock implies.
assert.deepEqual(sc, {
uri: link.uri,
sha256: sc.sha256,
size: sc.size,
images: { mirrored: 1, failed: 0 },
});
} finally {
await client.close();
await server.close();
}
});

View File

@@ -1,378 +0,0 @@
// Mock-HTTP test for DocmostClient.stashPage: a local http server stands in for
// Docmost so the whole flow stays deterministic and offline. Asserts the tool
// (1) serializes the page into the sandbox and returns ONLY a link (uri + sha256
// + size), never the body; (2) mirrors INTERNAL image srcs into the sandbox and
// rewrites them to the sandbox uri; (3) leaves EXTERNAL http(s) srcs untouched;
// (4) de-duplicates a repeated internal src to a single blob; (5) counts a
// failed image fetch without aborting the document.
import { test, after } from "node:test";
import assert from "node:assert/strict";
import http from "node:http";
import { createHash } from "node:crypto";
import { DocmostClient } from "../../build/client.js";
function readBody(req) {
return new Promise((resolve) => {
let raw = "";
req.on("data", (c) => (raw += c));
req.on("end", () => resolve(raw));
});
}
function startServer(handler) {
return new Promise((resolve) => {
const server = http.createServer(handler);
server.listen(0, "127.0.0.1", () => {
const { port } = server.address();
resolve({ server, baseURL: `http://127.0.0.1:${port}/api` });
});
});
}
const openServers = [];
async function spawn(handler) {
const { server, baseURL } = await startServer(handler);
openServers.push(server);
return baseURL;
}
after(async () => {
await Promise.all(openServers.map((s) => new Promise((r) => s.close(r))));
});
// In-memory sandbox sink mirroring the host binding: store the blob, return a
// uri + sha256 + size. Records every put so the test can inspect what was
// stashed (and verify the doc body never leaves via the return value). Models
// the real store's FIFO eviction + cap + the has/evict probes so B1 (self-
// eviction reconciliation and doc-put-throw cleanup) is testable. Default
// maxTotal is effectively unlimited so the happy-path tests behave as before.
//
// `throwOnJson` forces the final document put to throw, standing in for "doc
// exceeds the cap".
function makeSandbox({ maxTotal = Infinity, throwOnJson = false } = {}) {
const puts = [];
const evicted = [];
// id -> size, in insertion order (Map preserves it) so the oldest is first.
const live = new Map();
let total = 0;
const idOf = (uri) => uri.substring(uri.lastIndexOf("/") + 1);
return {
puts,
evicted,
put(buf, mime) {
if (throwOnJson && mime === "application/json") {
throw new Error("doc blob exceeds the sandbox cap");
}
const sha256 = createHash("sha256").update(buf).digest("hex");
const id = `id-${puts.length}`;
puts.push({ buf, mime, sha256, id });
live.set(id, buf.length);
total += buf.length;
// FIFO-evict the oldest live blobs until this put fits under the cap.
while (total > maxTotal && live.size > 0) {
const oldest = live.keys().next().value;
if (oldest === id) break; // never evict the blob we just stored
total -= live.get(oldest);
live.delete(oldest);
evicted.push(oldest);
}
return { uri: `https://sb.test/api/sb/${id}`, sha256, size: buf.length };
},
has(uri) {
return live.has(idOf(uri));
},
evict(uri) {
const id = idOf(uri);
if (live.has(id)) {
total -= live.get(id);
live.delete(id);
}
evicted.push(id);
},
};
}
const IMAGE_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a]); // "PNG" header-ish
function pageDoc() {
return {
type: "doc",
content: [
{
type: "image",
attrs: { src: "/api/files/att-1/pic.png", attachmentId: "att-1", width: 100 },
},
// Same internal src again -> must dedup to ONE blob, both rewritten.
{
type: "image",
attrs: { src: "/api/files/att-1/pic.png", attachmentId: "att-1", width: 50 },
},
// External CDN image -> must be left untouched.
{
type: "image",
attrs: { src: "https://cdn.example.com/remote.png" },
},
],
};
}
// Build a client wired to a server that logs in, serves the page, and serves the
// internal file bytes. `fileStatus` lets a test force the file fetch to fail;
// `doc` overrides the served page; `fileBytes`/`fileHeaders` shape the file
// response (used by the empty-body / missing-Content-Type branch tests).
async function buildClient(
sandbox,
{
fileStatus = 200,
doc = pageDoc(),
fileBytes = IMAGE_BYTES,
fileHeaders = { "Content-Type": "image/png" },
} = {},
) {
const baseURL = await spawn(async (req, res) => {
await readBody(req);
if (req.url === "/api/auth/login") {
res.writeHead(200, {
"Content-Type": "application/json",
"Set-Cookie": "authToken=tok; HttpOnly",
});
res.end(JSON.stringify({ token: "tok" }));
return;
}
if (req.url === "/api/pages/info") {
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify({ data: { id: "page-1", title: "T", content: doc } }));
return;
}
if (req.url.startsWith("/api/files/")) {
if (fileStatus !== 200) {
res.writeHead(fileStatus);
res.end();
return;
}
res.writeHead(200, fileHeaders);
res.end(fileBytes);
return;
}
res.writeHead(404);
res.end();
});
return new DocmostClient({
apiUrl: baseURL,
email: "u@example.com",
password: "pw",
sandbox: {
put: (buf, mime) => sandbox.put(buf, mime),
has: (uri) => sandbox.has(uri),
evict: (uri) => sandbox.evict(uri),
},
});
}
// A page with several DISTINCT internal images (each a unique attachment id) so
// each is its own sandbox blob — needed to exercise FIFO self-eviction.
function multiImageDoc(n) {
return {
type: "doc",
content: Array.from({ length: n }, (_, i) => ({
type: "image",
attrs: { src: `/api/files/att-${i}/pic.png`, attachmentId: `att-${i}` },
})),
};
}
test("stashPage stores the doc + mirrors/rewrites internal images, returns only a link", async () => {
const sandbox = makeSandbox();
const client = await buildClient(sandbox);
const result = await client.stashPage("page-1");
// Returns ONLY a link shape — never the document body.
assert.equal(typeof result.uri, "string");
assert.match(result.uri, /^https:\/\/sb\.test\/api\/sb\//);
assert.equal(typeof result.sha256, "string");
assert.equal(typeof result.size, "number");
assert.ok(!("doc" in result) && !("content" in result) && !("body" in result));
assert.deepEqual(result.images, { mirrored: 1, failed: 0 });
// One image blob (dedup) + one doc blob = 2 puts.
assert.equal(sandbox.puts.length, 2);
const imagePut = sandbox.puts[0];
const docPut = sandbox.puts[1];
assert.equal(imagePut.mime, "image/png");
assert.ok(imagePut.buf.equals(IMAGE_BYTES));
assert.equal(docPut.mime, "application/json");
// The returned uri/sha256 are the DOCUMENT blob's.
assert.equal(result.sha256, docPut.sha256);
// Inspect the stashed document: internal srcs rewritten, external untouched.
const stashed = JSON.parse(docPut.buf.toString("utf8"));
const imgs = stashed.content.content.filter((n) => n.type === "image");
assert.equal(imgs[0].attrs.src, "https://sb.test/api/sb/id-0");
assert.equal(imgs[1].attrs.src, "https://sb.test/api/sb/id-0"); // same blob (dedup)
assert.equal(imgs[2].attrs.src, "https://cdn.example.com/remote.png"); // external kept
});
test("stashPage counts a failed image fetch without aborting the document", async () => {
const sandbox = makeSandbox();
const client = await buildClient(sandbox, { fileStatus: 500 });
const result = await client.stashPage("page-1");
assert.deepEqual(result.images, { mirrored: 0, failed: 1 });
// Only the doc blob was stored (image fetch failed).
assert.equal(sandbox.puts.length, 1);
assert.equal(sandbox.puts[0].mime, "application/json");
// The failed internal src is LEFT as-is so nothing is silently dropped.
const stashed = JSON.parse(sandbox.puts[0].buf.toString("utf8"));
const imgs = stashed.content.content.filter((n) => n.type === "image");
assert.equal(imgs[0].attrs.src, "/api/files/att-1/pic.png");
});
test("stashPage throws a clear error when no sandbox is configured", async () => {
const baseURL = await spawn(async (req, res) => {
await readBody(req);
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify({}));
});
const client = new DocmostClient({
apiUrl: baseURL,
email: "u@example.com",
password: "pw",
});
await assert.rejects(() => client.stashPage("page-1"), /not configured/);
});
test("stashPage reverts a FIFO-evicted image and counts it as failed (B1)", async () => {
// 3 distinct images of S=4000 bytes each; doc JSON is far smaller than one
// image. With a cap of 4500: storing img1 evicts img0, storing img2 evicts
// img1 — so only img2 survives the loop (img0 + img1 reverted). The doc
// (4000 + a few hundred bytes <= 4500) then fits alongside the survivor, so it
// does NOT trigger further eviction. The stored doc must therefore reference
// exactly one live blob and revert the other two to their internal srcs.
const BIG = Buffer.alloc(4000, 0x41);
const sandbox = makeSandbox({ maxTotal: 4500 });
const client = await buildClient(sandbox, {
doc: multiImageDoc(3),
fileBytes: BIG,
});
const result = await client.stashPage("page-1");
// Two images were evicted before the doc was stored -> counted as failed.
assert.deepEqual(result.images, { mirrored: 1, failed: 2 });
// Inspect the stashed doc: no node may point at an evicted (now-dead) blob,
// and every reverted node carries its ORIGINAL internal src again.
const docPut = sandbox.puts.find((p) => p.mime === "application/json");
const stashed = JSON.parse(docPut.buf.toString("utf8"));
const imgs = stashed.content.content.filter((n) => n.type === "image");
let live = 0;
let reverted = 0;
for (const img of imgs) {
const src = img.attrs.src;
if (src.startsWith("https://sb.test/api/sb/")) {
assert.ok(sandbox.has(src), `doc references evicted blob ${src}`);
live++;
} else {
// Reverted to the original internal src.
assert.match(src, /^\/api\/files\/att-\d+\/pic\.png$/);
reverted++;
}
}
assert.equal(live, 1);
assert.equal(reverted, 2);
});
test("stashPage reverts an image evicted by the DOC put itself (after-put reconcile, B1)", async () => {
// Both images (1000 bytes each) survive the image phase: total 2000 <= cap
// 2500. The doc, however, serializes large (a node with a ~700-byte string
// attr), so putting it (newest) tips total over the cap and FIFO-evicts the
// OLDEST image (img0) — an eviction caused by the doc put itself, which only
// the after-put reconciliation can catch. The loop then reverts img0, drops
// the stale doc blob, and re-puts the corrected doc (now total = img1 +
// docSize <= cap, so img1 survives).
const BIG = Buffer.alloc(1000, 0x41);
const sandbox = makeSandbox({ maxTotal: 2500 });
const doc = {
type: "doc",
content: [
{ type: "image", attrs: { src: "/api/files/att-0/pic.png", attachmentId: "att-0" } },
{ type: "image", attrs: { src: "/api/files/att-1/pic.png", attachmentId: "att-1" } },
// Bulk the doc JSON up so the doc put crosses the cap on its own. Stays in
// the doc across reverts, so each re-serialization is similarly large.
{ type: "paragraph", attrs: { filler: "x".repeat(700) }, content: [] },
],
};
const client = await buildClient(sandbox, { doc, fileBytes: BIG });
const result = await client.stashPage("page-1");
// The doc put evicted exactly one image -> reverted + counted as failed.
assert.deepEqual(result.images, { mirrored: 1, failed: 1 });
// Use the LAST json put: the first (stale) doc referenced the now-dead blob
// and was itself evicted; the corrected re-put is the one that stands.
const docPut = sandbox.puts.filter((p) => p.mime === "application/json").at(-1);
const stashed = JSON.parse(docPut.buf.toString("utf8"));
const imgs = stashed.content.content.filter((n) => n.type === "image");
let live = 0;
let reverted = 0;
for (const img of imgs) {
const src = img.attrs.src;
if (src.startsWith("https://sb.test/api/sb/")) {
assert.ok(sandbox.has(src), `final doc references evicted blob ${src}`);
live++;
} else {
assert.match(src, /^\/api\/files\/att-\d+\/pic\.png$/);
reverted++;
}
}
assert.equal(live, 1);
assert.equal(reverted, 1);
});
test("stashPage frees image blobs when the doc put throws (B1)", async () => {
// Two distinct images mirror fine; the final JSON doc put throws (doc exceeds
// cap). stashPage must reject AND evict every image blob it stored this op.
const sandbox = makeSandbox({ throwOnJson: true });
const client = await buildClient(sandbox, { doc: multiImageDoc(2) });
await assert.rejects(() => client.stashPage("page-1"));
// Both image blobs were stored, then evicted on the doc-put failure.
const imagePuts = sandbox.puts.filter((p) => p.mime === "image/png");
assert.equal(imagePuts.length, 2);
for (const p of imagePuts) {
assert.ok(sandbox.evicted.includes(p.id), `image ${p.id} was not freed`);
}
});
test("stashPage counts an empty file response as failed (B1/fetchInternalFile)", async () => {
const sandbox = makeSandbox();
const client = await buildClient(sandbox, {
fileBytes: Buffer.alloc(0),
fileHeaders: { "Content-Type": "image/png", "Content-Length": "0" },
});
const result = await client.stashPage("page-1");
// The single internal image (deduped) yielded an empty body -> failed.
assert.deepEqual(result.images, { mirrored: 0, failed: 1 });
// Only the doc blob was stored.
assert.equal(sandbox.puts.filter((p) => p.mime === "image/png").length, 0);
});
test("stashPage mirrors a file with no Content-Type as octet-stream (fetchInternalFile)", async () => {
const sandbox = makeSandbox();
// No Content-Type header at all -> fetchInternalFile defaults to octet-stream.
const client = await buildClient(sandbox, { fileHeaders: {} });
const result = await client.stashPage("page-1");
assert.equal(result.images.mirrored, 1);
const imagePut = sandbox.puts.find((p) => p.mime !== "application/json");
assert.ok(imagePut, "expected an image put");
assert.equal(imagePut.mime, "application/octet-stream");
});

View File

@@ -0,0 +1,212 @@
import { test } from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";
import { fileURLToPath } from "node:url";
import { dirname, resolve } from "node:path";
import { DocmostClient } from "../../build/index.js";
// Drift guard for the THIRD hand-written layer of the AI tool set (issue #193,
// layer 3): the in-app server hand-mirrors the DocmostClient method signatures
// it consumes as the `DocmostClientLike` interface in
// apps/server/src/core/ai-chat/tools/docmost-client.loader.ts ("Signatures here
// mirror that file exactly"). That mirror lives across the ESM(mcp)/CJS(server)
// boundary and the package ships NO .d.ts, so the server typecheck cannot verify
// the names against the real class — a rename/removal in client.ts would surface
// only as a runtime "x is not a function" inside an agent tool call.
//
// SCOPE: this guard checks the method-NAME set only, not signatures. It pins the
// contract from the mcp side (ESM, where the real class is directly importable):
// every method the embedding host depends on MUST exist as a function on a real
// DocmostClient instance. If you rename/remove a client method, this fails here
// AND you must update DocmostClientLike to match. It does NOT verify parameter or
// return-type parity — signature drift between the hand-mirror and client.ts can
// still ship silently; full signature/type parity is the deferred staged-plan
// item below.
//
// Keep the HOST_CONTRACT_METHODS NAME list aligned with the method NAMES declared
// in the server's DocmostClientLike interface (the in-app per-user tool adapter
// only — it is a SUBSET of the DocmostClient surface — covers only what the in-app adapter
// consumes; the standalone MCP transport (packages/mcp/src/index.ts) calls additional
// client methods (insertImage/replaceImage/deleteComment/updateComment/insertFootnote)
// that this guard does NOT track — the MCP transport's own typecheck covers those). Full type-derivation
// of DocmostClientLike from this class is deferred (see the staged plan in
// docmost-client.loader.ts): the package emits no declarations and the real
// (inferred, concrete) return types conflict with the host's loose
// `Record<string,unknown>` + `as`-cast result handling.
const HOST_CONTRACT_METHODS = [
// read
"search",
"getPage",
"getWorkspace",
"getSpaces",
"listPages",
"listSidebarPages",
"getOutline",
"getPageJson",
"getNode",
"getTable",
"listComments",
"getComment",
"checkNewComments",
"listShares",
"listPageHistory",
"getPageHistory",
"diffPageVersions",
"exportPageMarkdown",
// write (page)
"createPage",
"updatePage",
"renamePage",
"movePage",
"deletePage",
"editPageText",
"patchNode",
"insertNode",
"deleteNode",
"updatePageJson",
"tableInsertRow",
"tableDeleteRow",
"tableUpdateCell",
"copyPageContent",
"importPageMarkdown",
"sharePage",
"unsharePage",
"restorePageVersion",
"transformPage",
// write (comment)
"createComment",
"resolveComment",
];
test("DocmostClient implements every method the in-app DocmostClientLike mirror declares", () => {
// The constructor is side-effect-free (no network/login on construction): it
// only stores config and creates an axios instance, so it is safe to build a
// throwaway instance here with a dummy token provider.
const client = new DocmostClient({
apiUrl: "http://127.0.0.1:1/api",
getToken: async () => "test-token",
});
const missing = HOST_CONTRACT_METHODS.filter(
(name) => typeof client[name] !== "function",
);
assert.deepEqual(
missing,
[],
`DocmostClient is missing host-contract method(s): ${missing.join(", ")}. ` +
`Update packages/mcp/src/client.ts and/or the server's DocmostClientLike ` +
`interface (apps/server/src/core/ai-chat/tools/docmost-client.loader.ts) ` +
`so the hand-mirrored method NAMES stay aligned (this guards names only, ` +
`not signatures).`,
);
});
test("HOST_CONTRACT_METHODS has no duplicates", () => {
assert.equal(
new Set(HOST_CONTRACT_METHODS).size,
HOST_CONTRACT_METHODS.length,
);
});
// Parse the method names declared in the server's `DocmostClientLike` interface
// body. We read the .ts source as plain text (no TS compiler dep, and the file
// lives in the CJS server tree across the ESM boundary): scan from the
// `export interface DocmostClientLike {` line to its closing brace at column 0,
// matching member-signature lines like ` methodName(`. Nested param-object
// braces (`opts: { ... }`) are indented, so only the interface's own closing
// `}` (column 0) ends the scan.
function parseDocmostClientLikeMethods() {
const here = dirname(fileURLToPath(import.meta.url));
// packages/mcp/test/unit -> repo root is four levels up.
const loaderPath = resolve(
here,
"../../../../apps/server/src/core/ai-chat/tools/docmost-client.loader.ts",
);
let source;
try {
source = readFileSync(loaderPath, "utf8");
} catch (err) {
if (err && err.code === "ENOENT") {
throw new Error(
`Expected monorepo layout; server tree at ${loaderPath} not found. ` +
`This drift-guard reads the server's DocmostClientLike interface via a ` +
`fixed relative path and must run from inside the monorepo checkout.`,
);
}
throw err;
}
const lines = source.split(/\r?\n/);
const startIdx = lines.findIndex((l) =>
/^export interface DocmostClientLike\s*\{/.test(l),
);
assert.notEqual(
startIdx,
-1,
`Could not find "export interface DocmostClientLike {" in ${loaderPath}. ` +
`If the interface was renamed/moved, update this drift-guard test.`,
);
const methods = [];
let closed = false;
// Track whether we are inside a `/* ... */` block comment. Inner lines of a
// block comment need NOT start with `*`, so a `name(` line inside one would be
// falsely parsed as an interface method without this. (`//` line comments can
// never match the method regex below since they start with `/`.)
let inBlockComment = false;
for (let i = startIdx + 1; i < lines.length; i++) {
const line = lines[i];
if (inBlockComment) {
// Stay in the block until we see its closing `*/`.
if (line.includes("*/")) inBlockComment = false;
continue;
}
// Enter a block comment only when it opens without closing on the same line;
// a self-contained `/* ... */` on one line cannot precede a method name we
// care about (such lines start with `/`, so the method regex won't match).
if (line.includes("/*") && !line.includes("*/")) {
inBlockComment = true;
continue;
}
if (/^\}/.test(line)) {
closed = true;
break;
}
// Method-name match: a TS identifier (letters/digits/`_`/`$`, not starting
// with a digit) optionally followed by a generic clause (`method<T>(`), then
// the opening paren of the signature.
const m = /^\s*([A-Za-z_$][A-Za-z0-9_$]*)\s*(?:<[^>]*>)?\(/.exec(line);
if (m) methods.push(m[1]);
}
assert.ok(
closed,
`Did not find the closing brace of DocmostClientLike in ${loaderPath}.`,
);
assert.ok(
methods.length > 0,
`Parsed zero methods from DocmostClientLike in ${loaderPath} — the parser ` +
`is likely out of date with the interface formatting.`,
);
return methods;
}
// The point of the guard is to protect the DocmostClientLike mirror <-> client.ts
// link, but HOST_CONTRACT_METHODS is itself a HAND-COPY of that interface kept in
// sync manually. The list<->interface link must be tested too: a method consumed
// by the adapter and added to DocmostClientLike but forgotten here (or removed
// from the interface but left here) would otherwise escape both the server
// typecheck (pkg emits no .d.ts) and the first test above (name not in the list).
// Assert the two agree BOTH ways.
test("HOST_CONTRACT_METHODS exactly mirrors the server's DocmostClientLike interface", () => {
const interfaceMethods = parseDocmostClientLikeMethods();
assert.deepEqual(
[...HOST_CONTRACT_METHODS].sort(),
[...interfaceMethods].sort(),
`HOST_CONTRACT_METHODS has drifted from the DocmostClientLike interface in ` +
`apps/server/src/core/ai-chat/tools/docmost-client.loader.ts. Add/remove ` +
`method names in HOST_CONTRACT_METHODS so it lists EXACTLY the methods ` +
`declared in that interface (both directions are checked).`,
);
});

View File

@@ -1,101 +0,0 @@
// Unit tests for the internal-file URL helpers the stash tool relies on. The
// critical case is resolveInternalFilePath, whose whole job is to REJECT a
// content-controlled `src` that tries to escape /api/files/ (SSRF / traversal)
// before it ever reaches the authenticated loopback client.
import { test } from "node:test";
import assert from "node:assert/strict";
import {
resolveInternalFilePath,
normalizeFileUrl,
collectInternalFileNodes,
} from "../../build/lib/internal-file-urls.js";
test("resolveInternalFilePath accepts a normal internal src", () => {
assert.equal(
resolveInternalFilePath("/api/files/att-1/pic.png"),
"/files/att-1/pic.png",
);
});
test("resolveInternalFilePath rejects traversal / encoded variants (SSRF guard)", () => {
// `..` collapses to /api/auth/whoami -> outside /api/files/ -> rejected.
assert.throws(() => resolveInternalFilePath("/api/files/../auth/whoami"));
// Escapes the /api base entirely.
assert.throws(() => resolveInternalFilePath("/api/files/../../internal"));
// Percent-encoded dot -> rejected before canonicalization.
assert.throws(() => resolveInternalFilePath("/api/files/%2e%2e/x"));
// Percent-encoded slash separator -> rejected before canonicalization.
assert.throws(() => resolveInternalFilePath("/api/files/..%2fauth"));
});
test("resolveInternalFilePath drops a foreign host and keeps only the /api/files/ pathname (SSRF accept-path)", () => {
// ACCEPT path: an absolute URL has its host dropped; only the canonical
// pathname survives, and it must still start with /api/files/. This is SAFE
// because the loopback axios client ignores any host in `src` and uses its own
// /api baseURL — so a foreign host like evil.com is never contacted. This is
// the SOLE SSRF/traversal guard for content-controlled `src`, so it must be
// pinned: a future refactor to a prefix-only check would silently open a
// bypass with no failing test.
assert.equal(
resolveInternalFilePath("http://evil.com/api/files/x/y.png"),
"/files/x/y.png",
);
// Protocol-relative URL: host likewise dropped, pathname kept.
assert.equal(
resolveInternalFilePath("//evil.com/api/files/x/y.png"),
"/files/x/y.png",
);
});
test("resolveInternalFilePath rejects a foreign-host src whose pathname escapes /api/files/", () => {
// Even though the host is dropped, the canonical pathname /api/auth/whoami
// does NOT start with /api/files/, so it is rejected.
assert.throws(() =>
resolveInternalFilePath("https://evil.com/api/auth/whoami"),
);
// The WHATWG URL parser converts backslashes to `/` for http(s), so this
// collapses to /api/auth/whoami and escapes the /api/files/ subtree.
assert.throws(() => resolveInternalFilePath("/api/files\\..\\auth\\whoami"));
});
test("resolveInternalFilePath wraps a new URL parse failure in a clear error", () => {
// `http://[` has no %2e/%2f so it passes the first guard, then fails the
// `new URL(...)` parse — exercising the catch branch that re-throws with a
// clear message.
assert.throws(
() => resolveInternalFilePath("http://["),
/Invalid internal file src/,
);
});
test("normalizeFileUrl rewrites the bare /files/ branch and leaves /api/files/ alone", () => {
assert.equal(
normalizeFileUrl("/files/att-1/pic.png"),
"/api/files/att-1/pic.png",
);
assert.equal(
normalizeFileUrl("/api/files/att-1/pic.png"),
"/api/files/att-1/pic.png",
);
});
test("collectInternalFileNodes recurses into nested content containers", () => {
// The internal image is buried inside a callout's content array, so a
// regression on the recursion (e.g. a shallow .filter()) would miss it.
const nested = {
type: "image",
attrs: { src: "/api/files/att-9/deep.png", attachmentId: "att-9" },
};
const doc = {
type: "doc",
content: [
{
type: "callout",
content: [{ type: "paragraph", content: [nested] }],
},
],
};
const found = collectInternalFileNodes(doc);
assert.equal(found.length, 1);
assert.equal(found[0], nested);
});