gitmost/packages/mcp/build/lib/internal-file-urls.js
claude code agent 227 2fe4ca8537 feat(sandbox): in-RAM blob sandbox for out-of-band page transfer (#243)
Add an ephemeral, process-local blob store so the in-app agent (and the
embedded MCP) can hand a large page document and its images to an external
consumer WITHOUT routing the bytes through the model context or Docmost auth.

- SandboxStore (@Injectable singleton): Map<uuid,{buf,mime,sha256,expiresAt}>
  in RAM only. put() picks a per-blob cap by mime (image vs doc), enforces a
  total-bytes RAM guard with oldest-first eviction, and stamps a TTL; get()
  lazily expires. sha256 computed at put() doubles as the strong ETag. An
  unref'd sweep interval clears expired entries and is cleared on destroy.
- GET /api/sb/:uuid anonymous controller: serves raw bytes with Content-Type,
  Content-Length and ETag=sha256; 404 on missing/expired/non-UUID (anti-
  traversal), 304 on a matching If-None-Match. No tokens, no 401 — the
  capability is the unguessable UUID + short TTL + TLS. Auth-exempt the same
  way as /api/files/public (no JwtAuthGuard) plus an /api/sb entry in main.ts's
  workspace-resolution preHandler so a remote consumer with no workspace host
  is not rejected.
- stash_page tool in both layers (MCP resource_link + in-app {uri,size,sha256,
  images}). client.stashPage serializes the get_page_json shape, mirrors every
  INTERNAL file/image src (type-agnostic, covers drawio/excalidraw/video/file)
  into the sandbox under Docmost auth and rewrites src to the sandbox URL;
  external http(s) srcs are left untouched; dedup by src; a failed image fetch
  is counted, never aborts the doc.
- SANDBOX_PUBLIC_URL / SANDBOX_TTL_MS / SANDBOX_MAX_BYTES /
  SANDBOX_MAX_IMAGE_BYTES / SANDBOX_MAX_TOTAL_BYTES wired through the
  environment service + validation + .env.example.
- SandboxModule (@Global) provides the shared store to the controller,
  McpService and AiChatToolsService (same instance for put and get).
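
The store behaviour listed above (per-mime cap, total-bytes guard with oldest-first eviction, TTL with lazy expiry, sha256 at put) could be sketched roughly as follows. This is a minimal illustration, not the actual SandboxStore: the class name `SandboxStoreSketch`, the option names, the default limits and the injectable `now` clock are all assumptions, the sweep interval is left out, and CommonJS `require` is used only so the snippet runs standalone.

```javascript
const crypto = require("node:crypto");

// Hypothetical limits; in the real service they come from the SANDBOX_* env vars.
const DEFAULTS = { ttlMs: 60_000, maxDocBytes: 1024, maxImageBytes: 2048, maxTotalBytes: 4096 };

class SandboxStoreSketch {
  constructor(opts = {}, now = () => Date.now()) {
    this.opts = { ...DEFAULTS, ...opts };
    this.now = now; // injectable clock (assumption, makes TTL testable)
    this.blobs = new Map(); // Map keeps insertion order, so the first key is the oldest
    this.totalBytes = 0;
  }

  put(buf, mime) {
    // Per-blob cap depends on the mime family: image vs. everything else.
    const cap = mime.startsWith("image/") ? this.opts.maxImageBytes : this.opts.maxDocBytes;
    if (buf.length > cap) throw new Error(`blob exceeds per-blob cap of ${cap} bytes`);

    // Total-bytes RAM guard: evict oldest entries until the new blob fits.
    while (this.totalBytes + buf.length > this.opts.maxTotalBytes && this.blobs.size > 0) {
      this.delete(this.blobs.keys().next().value);
    }
    if (this.totalBytes + buf.length > this.opts.maxTotalBytes) {
      throw new Error("blob does not fit in the total-bytes budget");
    }

    const uuid = crypto.randomUUID();
    const sha256 = crypto.createHash("sha256").update(buf).digest("hex");
    this.blobs.set(uuid, { buf, mime, sha256, expiresAt: this.now() + this.opts.ttlMs });
    this.totalBytes += buf.length;
    return { uuid, sha256, size: buf.length };
  }

  get(uuid) {
    const entry = this.blobs.get(uuid);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.delete(uuid); // lazy expiry on read
      return undefined;
    }
    return entry;
  }

  delete(uuid) {
    const entry = this.blobs.get(uuid);
    if (!entry) return;
    this.totalBytes -= entry.buf.length;
    this.blobs.delete(uuid);
  }
}
```

Relying on Map insertion order keeps oldest-first eviction free of extra bookkeeping; whether the real store tracks age the same way is not stated in this commit.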

Tests: SandboxStore (round-trip, sha256, TTL lazy + sweep, caps, eviction),
SandboxController (200+ETag+CT+CL, 404 missing/expired/non-UUID, 304), and a
mock-HTTP stashPage test (mirror+rewrite internal, keep external, dedup, failed
image counted, returns only a link). Interoperates with the vvzvlad/habr-mcp
consumer's anonymous-GET + sha256-ETag + resource_link contract.
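
The response contract these controller tests exercise (200 with ETag, Content-Type and Content-Length; 404 on missing, expired or non-UUID ids; 304 on a matching If-None-Match) might look like the framework-free sketch below. The function name `serveBlob`, its plain-object return shape and the UUID regex are hypothetical, and quoting the sha256 as the ETag value is an assumption based on standard HTTP entity-tag syntax rather than something this commit states.

```javascript
// Shape check only; anything that is not UUID-like is rejected before the lookup.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// `blobs` is a Map<uuid, {buf, mime, sha256, expiresAt}> as described in the commit.
function serveBlob(blobs, uuid, ifNoneMatch, now = Date.now()) {
  // Non-UUID ids get the same 404 as unknown ones, so path tricks never reach the store.
  if (typeof uuid !== "string" || !UUID_RE.test(uuid)) return { status: 404 };

  const entry = blobs.get(uuid);
  if (!entry || entry.expiresAt <= now) return { status: 404 };

  const etag = `"${entry.sha256}"`;
  // If-None-Match may carry a comma-separated list of tags, or "*".
  const candidates = (ifNoneMatch || "").split(",").map((v) => v.trim());
  if (candidates.includes(etag) || candidates.includes("*")) {
    return { status: 304, headers: { ETag: etag } };
  }

  return {
    status: 200,
    headers: {
      "Content-Type": entry.mime,
      "Content-Length": String(entry.buf.length),
      ETag: etag,
    },
    body: entry.buf,
  };
}
```

Returning 404 for every failure mode means a caller cannot tell an expired blob from one that never existed, which fits the capability model described above.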

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:13:11 +03:00

56 lines
2.1 KiB
JavaScript

// Detection + collection of INTERNAL Docmost file URLs inside a ProseMirror doc.
//
// An internal file URL is a relative path served by Docmost's authenticated
// attachment route (`GET /api/files/:fileId/:fileName`). It is useless to an
// external consumer (relative + needs a Docmost session), so the stash tool
// mirrors every such resource into the blob sandbox and rewrites its `src`.
//
// The criterion is "internal file URL", NOT the node TYPE: image, drawio,
// excalidraw, video and file nodes all carry such a `src`, so a type-agnostic
// walker covers them all. External http(s) srcs (CDNs) are left untouched.
//
// Mirrors editor-ext's isInternalFileUrl / normalizeFileUrl (kept as a local
// dup so the ESM mcp package does not depend on the editor-ext build).
export function isInternalFileUrl(url) {
    if (typeof url !== "string")
        return false;
    const normalized = url.trim();
    return (normalized.startsWith("/api/files/") || normalized.startsWith("/files/"));
}
/** Normalize a bare `/files/...` src to the canonical `/api/files/...` form. */
export function normalizeFileUrl(src) {
    const trimmed = src.trim();
    if (trimmed.startsWith("/files/"))
        return "/api" + trimmed;
    return trimmed;
}
/**
 * Recursively collect every node whose `attrs.src` is an internal file URL.
 * Returns references to the live nodes (so the caller can rewrite `attrs.src`
 * in place on its clone). Descends `content` arrays, covering callouts, tables,
 * details and any other nested container.
 */
export function collectInternalFileNodes(doc) {
    const out = [];
    const visit = (node) => {
        if (!node)
            return;
        if (Array.isArray(node)) {
            for (const child of node)
                visit(child);
            return;
        }
        if (typeof node !== "object")
            return;
        if (node.attrs && isInternalFileUrl(node.attrs.src)) {
            out.push(node);
        }
        if (Array.isArray(node.content)) {
            for (const child of node.content)
                visit(child);
        }
    };
    visit(doc);
    return out;
}
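
As a rough illustration of how a caller might combine these helpers, the sketch below clones a document, mirrors each distinct internal src once, and rewrites `attrs.src` in place on the clone. `rewriteInternalSrcs` and the synchronous `mirror` callback are hypothetical stand-ins for the stash tool's authenticated fetch and sandbox put, which are presumably asynchronous in the real client. The three helpers are repeated in compact form only so the snippet runs on its own.

```javascript
// Compact copies of the module's helpers so the sketch is standalone.
function isInternalFileUrl(url) {
  if (typeof url !== "string") return false;
  const u = url.trim();
  return u.startsWith("/api/files/") || u.startsWith("/files/");
}
function normalizeFileUrl(src) {
  const trimmed = src.trim();
  return trimmed.startsWith("/files/") ? "/api" + trimmed : trimmed;
}
function collectInternalFileNodes(doc) {
  const out = [];
  const visit = (node) => {
    if (!node) return;
    if (Array.isArray(node)) return node.forEach(visit);
    if (typeof node !== "object") return;
    if (node.attrs && isInternalFileUrl(node.attrs.src)) out.push(node);
    if (Array.isArray(node.content)) node.content.forEach(visit);
  };
  visit(doc);
  return out;
}

// Hypothetical rewrite step: `mirror(path)` returns a sandbox URL or throws.
function rewriteInternalSrcs(doc, mirror) {
  const clone = structuredClone(doc); // leave the caller's document untouched
  const cache = new Map(); // normalized src -> sandbox URL, or null on failure
  let failed = 0;
  for (const node of collectInternalFileNodes(clone)) {
    const key = normalizeFileUrl(node.attrs.src);
    if (!cache.has(key)) {
      try {
        cache.set(key, mirror(key));
      } catch {
        cache.set(key, null); // count the failure, keep processing the doc
        failed += 1;
      }
    }
    const url = cache.get(key);
    if (url) node.attrs.src = url;
  }
  return { doc: clone, failed };
}

// Usage: two nodes point at the same file, one node is external.
const sampleDoc = {
  type: "doc",
  content: [
    { type: "image", attrs: { src: "/files/a1/cat.png" } },
    { type: "callout", content: [{ type: "drawio", attrs: { src: "/api/files/a1/cat.png" } }] },
    { type: "image", attrs: { src: "https://cdn.example.com/x.png" } },
  ],
};
const rewritten = rewriteInternalSrcs(sampleDoc, (path) => "https://sb.example/api/sb/" + path.split("/")[3]);
console.log(JSON.stringify(rewritten.doc.content.map((n) => n.attrs?.src)));
```

Keying the cache on the normalized path means `/files/...` and `/api/files/...` references to the same attachment are mirrored once, in the spirit of the "dedup by src" behaviour the commit describes.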