fix(mcp): write page body before title to avoid split-brain on failure (#159 )

updatePage (markdown) and updatePageJson wrote the title via REST FIRST, then the body via collab. If the body write failed (e.g. a collab persist timeout), the page was left with the NEW title over its OLD body — a split-brain the tool reported as an error but never repaired (red-team finding #10). Reorder both: write the body first, and only set the title after the body has persisted. Now a body-write failure leaves the title untouched (no split-brain). A title write failing after a successful body is rarer (REST is fast) and leaves correct content under a stale title — the strictly lesser inconsistency — which is the same trade-off the issue's "atomic, or roll back the title" intends, without the fragility of a rollback write that could itself fail. No unit test: both paths require a live collab provider and the suite has no provider mock; the change is a pure reordering. All 306 mcp tests still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(share): SEO route must not leak a restricted page's title (#159 )
2026-06-25 05:14:42 +03:00 · 2026-06-25 05:08:02 +03:00 · 2026-06-25 05:02:44 +03:00 · 2026-06-25 04:52:05 +03:00 · 2026-06-25 04:35:06 +03:00 · 2026-06-25 04:33:57 +03:00
7 changed files with 14 additions and 381 deletions
--- a/.env.example
+++ b/.env.example
@@ -149,19 +149,6 @@ MCP_DOCMOST_PASSWORD=
 # your egress drops idle connections faster than ~10s. Default 10000 (10 s).
 # AI_STREAM_KEEPALIVE_MS=10000

-# Silence timeout (ms) for EXTERNAL-MCP transport ONLY (not the chat provider).
-# Tighter than AI_STREAM_TIMEOUT_MS so a byte-silent/hung MCP server is broken in
-# ~5 min instead of 15. Note it also cuts a legitimately long but byte-silent
-# single tool call (a slow crawl that emits nothing until done) and an SSE
-# transport idling >5 min BETWEEN tool calls. Default 300000 (5 min).
-# AI_MCP_STREAM_TIMEOUT_MS=300000
-
-# Total wall-clock cap (ms) for ONE external MCP tool call (app-level, not
-# transport). Aborts a tool that keeps the socket warm (SSE heartbeats / trickle)
-# but never returns a result — which the silence timeout above never breaks.
-# Default 900000 (15 min).
-# AI_MCP_CALL_TIMEOUT_MS=900000
-
 # --- Anonymous public-share AI assistant ---
 # Opt-in per workspace (AI settings -> "public share assistant"; off by default).
 # When enabled, anonymous visitors of a published share can ask an AI about that
--- a/apps/server/package.json
+++ b/apps/server/package.json
@@ -11,7 +11,7 @@
    "start": "cross-env NODE_ENV=development nest start",
    "start:dev": "cross-env NODE_ENV=development nest start --watch",
    "start:debug": "cross-env NODE_ENV=development nest start --debug --watch",
-    "start:prod": "cross-env NODE_ENV=production node --heapsnapshot-near-heap-limit=2 dist/main",
+    "start:prod": "cross-env NODE_ENV=production node dist/main",
    "collab:prod": "cross-env NODE_ENV=production node dist/collaboration/server/collab-main",
    "collab:dev": "cross-env NODE_ENV=development node dist/collaboration/server/collab-main",
    "email:dev": "email dev -p 5019 -d ./src/integrations/transactional/emails",
--- a/apps/server/src/core/ai-chat/ai-chat.service.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.service.ts
@@ -575,19 +575,6 @@ export class AiChatService {
        },
      });

-      // Drain the stream independently of the client socket so the turn always
-      // runs to completion (or to its abort) and the terminal callbacks
-      // (onFinish/onError/onAbort) fire — releasing the per-turn object graph
-      // (history, the per-request toolset closures, captured steps, SDK buffers)
-      // and closing leased MCP clients. WITHOUT this, a client disconnect leaves
-      // the pipe's dead socket as the only reader; backpressure stalls the stream,
-      // the callbacks never run, and every dropped turn stays rooted in memory —
-      // the heap-OOM leak. consumeStream removes that backpressure (AI SDK v6
-      // "Handling client disconnects"). NOT awaited (fire-and-forget); the stream
-      // errors are already logged by the streamText `onError` callback above, so
-      // swallow here to avoid an unhandledRejection.
-      void result.consumeStream({ onError: () => undefined });
-
      // Stream the UI-message protocol straight to the hijacked Node response.
      // Without onError the AI SDK masks the cause ('An error occurred.') and the
      // UI shows a generic failure. Surface the real provider message instead.
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-call-timeout.spec.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-call-timeout.spec.ts
@@ -1,205 +0,0 @@
-import { type Tool, type ToolCallOptions } from 'ai';
-import {
-  wrapToolWithCallTimeout,
-  wrapToolsWithCallTimeout,
-} from './mcp-clients.service';
-import {
-  mcpStreamTimeoutMs,
-  mcpCallTimeoutMs,
-} from '../../../integrations/ai/ai-streaming-fetch';
-
-/**
- * Per-call total-timeout guard for external MCP tools (mcp-clients.service).
- *
- * `@ai-sdk/mcp`'s tool execute has NO built-in per-call timeout — a tool that
- * keeps the connection warm but never returns is otherwise unbounded. The
- * wrapper attaches a fresh AbortController + timer per CALL and composes it with
- * the turn's abortSignal via AbortSignal.any, so EITHER the per-call timeout OR a
- * client disconnect aborts the in-flight call.
- *
- * Fake timers prove the timeout fires WITHOUT real waiting; no leaked timer keeps
- * the process alive after a fast resolve.
- */
-const CALL_TIMEOUT_MS = 900_000;
-
-/** Build a Tool around an `execute` impl, mirroring the SDK's minimal shape. */
-function toolWith(
-  execute: (args: unknown, options: ToolCallOptions) => unknown,
-): Tool {
-  return { description: 'x', inputSchema: undefined, execute } as unknown as Tool;
-}
-
-/** Invoke a (possibly wrapped) tool's execute with an optional turn signal. */
-function callExecute(
-  tool: Tool,
-  args: unknown,
-  abortSignal?: AbortSignal,
-): unknown {
-  const execute = tool.execute as (
-    args: unknown,
-    options: ToolCallOptions,
-  ) => unknown;
-  return execute(args, { abortSignal } as ToolCallOptions);
-}
-
-describe('wrapToolWithCallTimeout', () => {
-  beforeEach(() => jest.useFakeTimers());
-  afterEach(() => {
-    jest.clearAllTimers();
-    jest.useRealTimers();
-  });
-
-  it('aborts a tool that only rejects when its abortSignal fires, after ms elapses', async () => {
-    // The tool resolves NEVER on its own — it only settles when the abortSignal
-    // it is handed aborts. So a resolution proves the per-call timer fired and
-    // aborted the call (not the tool finishing by itself).
-    let received: AbortSignal | undefined;
-    const tool = toolWith((_args, options) => {
-      received = options.abortSignal;
-      return new Promise((_resolve, reject) => {
-        options.abortSignal?.addEventListener('abort', () => {
-          reject(options.abortSignal?.reason ?? new Error('aborted'));
-        });
-      });
-    });
-
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-    const promise = callExecute(wrapped, { q: 'x' }) as Promise<unknown>;
-    // Attach the rejection handler synchronously so advancing timers cannot mark
-    // it an unhandled rejection.
-    const settled = promise.then(
-      () => ({ ok: true as const }),
-      (err: unknown) => ({ ok: false as const, err }),
-    );
-
-    // Nothing fired yet.
-    jest.advanceTimersByTime(CALL_TIMEOUT_MS - 1);
-    // Past the cap -> the per-call timer aborts the composed signal.
-    jest.advanceTimersByTime(2);
-
-    const result = await settled;
-    expect(result.ok).toBe(false);
-    expect(received).toBeInstanceOf(AbortSignal);
-    // The abort reason / rejection mentions the timeout.
-    const message =
-      (result as { err: unknown }).err instanceof Error
-        ? ((result as { err: Error }).err.message)
-        : String((result as { err: unknown }).err);
-    expect(message).toMatch(/timed out after 900000ms/);
-  });
-
-  it('aborts a REAL-client-style tool that never settles and ignores abort (race fix)', async () => {
-    // Models the ACTUAL @ai-sdk/mcp semantics: its in-flight promise does NOT
-    // reject on abort (it only checks the signal when a response arrives), so a
-    // warm-but-stuck call NEVER settles on its own and does NOT listen to the
-    // abort signal. The wrapper must still reject after `ms` via the race — an
-    // implementation that merely `await original(...)` would hang here forever.
-    // This test FAILS against the old await-only code and PASSES with the race.
-    const tool = toolWith(() => new Promise(() => {})); // never settles, no abort
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-    const promise = callExecute(wrapped, { q: 'x' }) as Promise<unknown>;
-    // Assert the rejection without hanging: drive fake time async so the timer's
-    // abort -> race rejection microtasks flush, then await the rejection.
-    const expectation = expect(promise).rejects.toThrow(/timed out after 900000ms/);
-    await jest.advanceTimersByTimeAsync(CALL_TIMEOUT_MS + 1);
-    await expectation;
-  });
-
-  it('passes a fast tool through and leaks no timer (advancing later does not throw)', async () => {
-    const tool = toolWith(() => Promise.resolve('fast-result'));
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-
-    const value = await (callExecute(wrapped, {}) as Promise<unknown>);
-    expect(value).toBe('fast-result');
-
-    // The timer was cleared in the finally — advancing past the cap aborts
-    // nothing and throws nothing.
-    expect(() => jest.advanceTimersByTime(CALL_TIMEOUT_MS * 2)).not.toThrow();
-  });
-
-  it('aborts when the caller turn signal aborts before the timeout (disconnect path)', async () => {
-    // Real-client semantics: the tool never settles and does NOT listen to abort,
-    // so the wrapper must reject via the race when the caller's turn signal (a
-    // client disconnect) aborts BEFORE the per-call cap. The race propagates the
-    // caller's abort reason.
-    const tool = toolWith(() => new Promise(() => {})); // never settles, no abort
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-    const turn = new AbortController();
-    const promise = callExecute(wrapped, {}, turn.signal) as Promise<unknown>;
-    const settled = promise.then(
-      () => ({ ok: true as const }),
-      (err: unknown) => ({ ok: false as const, err }),
-    );
-
-    // Disconnect well before the cap; the per-call timer never fires here.
-    turn.abort(new Error('client disconnected'));
-    const result = await settled;
-    expect(result.ok).toBe(false);
-    const message =
-      (result as { err: unknown }).err instanceof Error
-        ? (result as { err: Error }).err.message
-        : String((result as { err: unknown }).err);
-    // The caller's abort reason propagates through the race.
-    expect(message).toMatch(/client disconnected/);
-  });
-
-  it('passes a tool with no execute through unchanged', () => {
-    const noExecute = { description: 'x', inputSchema: undefined } as unknown as Tool;
-    const wrapped = wrapToolWithCallTimeout(noExecute, CALL_TIMEOUT_MS);
-    // Same object back, execute still absent.
-    expect(wrapped).toBe(noExecute);
-    expect((wrapped as { execute?: unknown }).execute).toBeUndefined();
-  });
-});
-
-describe('wrapToolsWithCallTimeout', () => {
-  beforeEach(() => jest.useFakeTimers());
-  afterEach(() => {
-    jest.clearAllTimers();
-    jest.useRealTimers();
-  });
-
-  it('wraps every tool in the map (each call gets its own guard)', async () => {
-    const tools: Record<string, Tool> = {
-      a: toolWith(() => Promise.resolve('A')),
-      b: toolWith(() => Promise.resolve('B')),
-    };
-    const out = wrapToolsWithCallTimeout(tools, CALL_TIMEOUT_MS);
-    expect(Object.keys(out)).toEqual(['a', 'b']);
-    expect(await (callExecute(out.a, {}) as Promise<unknown>)).toBe('A');
-    expect(await (callExecute(out.b, {}) as Promise<unknown>)).toBe('B');
-  });
-});
-
-describe('mcp timeout env helpers', () => {
-  const ORIG_SILENCE = process.env.AI_MCP_STREAM_TIMEOUT_MS;
-  const ORIG_CALL = process.env.AI_MCP_CALL_TIMEOUT_MS;
-  afterEach(() => {
-    if (ORIG_SILENCE === undefined) delete process.env.AI_MCP_STREAM_TIMEOUT_MS;
-    else process.env.AI_MCP_STREAM_TIMEOUT_MS = ORIG_SILENCE;
-    if (ORIG_CALL === undefined) delete process.env.AI_MCP_CALL_TIMEOUT_MS;
-    else process.env.AI_MCP_CALL_TIMEOUT_MS = ORIG_CALL;
-  });
-
-  it('mcpStreamTimeoutMs defaults to 5 min and honors a positive override', () => {
-    delete process.env.AI_MCP_STREAM_TIMEOUT_MS;
-    expect(mcpStreamTimeoutMs()).toBe(300_000);
-    process.env.AI_MCP_STREAM_TIMEOUT_MS = '60000';
-    expect(mcpStreamTimeoutMs()).toBe(60_000);
-    for (const bad of ['0', '-1', 'x', '']) {
-      process.env.AI_MCP_STREAM_TIMEOUT_MS = bad;
-      expect(mcpStreamTimeoutMs()).toBe(300_000);
-    }
-  });
-
-  it('mcpCallTimeoutMs defaults to 15 min and honors a positive override', () => {
-    delete process.env.AI_MCP_CALL_TIMEOUT_MS;
-    expect(mcpCallTimeoutMs()).toBe(900_000);
-    process.env.AI_MCP_CALL_TIMEOUT_MS = '120000';
-    expect(mcpCallTimeoutMs()).toBe(120_000);
-    for (const bad of ['0', '-1', 'x', '']) {
-      process.env.AI_MCP_CALL_TIMEOUT_MS = bad;
-      expect(mcpCallTimeoutMs()).toBe(900_000);
-    }
-  });
-});
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-clients.service.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-clients.service.ts
@@ -1,16 +1,12 @@
 import { isIP } from 'node:net';
 import { lookup as dnsLookup, type LookupAddress } from 'node:dns';
 import { Injectable, Logger } from '@nestjs/common';
-import { type Tool, type ToolCallOptions } from 'ai';
+import { type Tool } from 'ai';
 import { createMCPClient } from '@ai-sdk/mcp';
 import { Agent, type Dispatcher } from 'undici';
 import { AiMcpServerRepo } from '@docmost/db/repos/ai-chat/ai-mcp-server.repo';
 import { AiMcpServer } from '@docmost/db/types/entity.types';
-import {
-  streamingDispatcherOptions,
-  mcpStreamTimeoutMs,
-  mcpCallTimeoutMs,
-} from '../../../integrations/ai/ai-streaming-fetch';
+import { streamingDispatcherOptions } from '../../../integrations/ai/ai-streaming-fetch';
 import { SecretBoxService } from '../../../integrations/crypto/secret-box';
 import { isUrlAllowed, isIpAllowed } from './ssrf-guard';

@@ -251,8 +247,6 @@ export class McpClientsService {
    const tools: Record<string, Tool> = {};
    const clients: McpClient[] = [];
    const outcomes: ServerOutcome[] = [];
-    // Per-call total wall-clock cap, read once for this build (env-overridable).
-    const callTimeoutMs = mcpCallTimeoutMs();
    const instructions: McpServerInstruction[] = [];

    for (const server of servers) {
@@ -263,16 +257,12 @@ export class McpClientsService {
        const allow = server.toolAllowlist;
        const picked =
          Array.isArray(allow) && allow.length > 0 ? pick(raw, allow) : raw;
-        // Bound each tool's execute with a per-call total-timeout guard before
-        // merging, so a single chatty-but-stuck call is aborted after the cap.
-        const guarded = wrapToolsWithCallTimeout(picked, callTimeoutMs);
        // Namespace each tool with the sanitized server name AND disambiguate
        // against names already merged from earlier servers, so no external
-        // tool is silently overwritten on collision. The returned count drives
-        // whether this server's prompt guidance is included (≥1 tool merged).
+        // tool is silently overwritten on collision.
        const merged = this.mergeNamespaced(
          tools,
-          guarded,
+          picked,
          server.name,
          server.id,
        );
@@ -459,21 +449,17 @@ export function validateResolvedAddresses(addrs: readonly LookupAddress[]): {
 * to an IP literal).
 */
 function buildPinnedDispatcher(): Agent {
-  // External-MCP traffic uses a DEDICATED, shorter silence timeout
-  // (`AI_MCP_STREAM_TIMEOUT_MS`, default 5 min) — deliberately tighter than the
-  // chat provider's 15-min `streamTimeoutMs()` — so a byte-silent/hung MCP
-  // upstream is broken in ~5 min instead of 15. We keep the keep-alive options
-  // from `streamingDispatcherOptions()` but OVERRIDE headers/body timeouts.
-  // Accepted trade-off: a legitimately long but byte-silent single tool call,
-  // and an SSE transport idling >5 min BETWEEN tool calls, are also cut here; the
-  // per-call total cap (wrapToolsWithCallTimeout, `AI_MCP_CALL_TIMEOUT_MS`) is the
-  // complementary guard for chatty-but-stuck calls that keep the socket warm yet
-  // never return.
-  const mcpSilenceMs = mcpStreamTimeoutMs();
  return new Agent({
+    // Raise undici's default 300s headers/body timeouts on external MCP traffic
+    // to the same generous-but-finite silence timeout the chat fetch uses (#175).
+    // A long agent turn keeps an SSE transport (e.g. crawl4ai's /mcp/sse) open
+    // across the whole turn; that connection can idle BETWEEN tool calls longer
+    // than 5 min, and undici's bodyTimeout would otherwise sever it mid-task — a
+    // tool-call failure that aborts the streamed turn and shows the user "Lost
+    // connection to the AI provider". A slow single tool call (a crawl) can
+    // likewise exceed headersTimeout. The timeout stays FINITE so a genuinely
+    // hung server is still broken eventually.
    ...streamingDispatcherOptions(),
-    headersTimeout: mcpSilenceMs,
-    bodyTimeout: mcpSilenceMs,
    connect: {
      lookup: (hostname, _options, callback) => {
        // Always resolve ALL addresses ourselves; do not trust the caller's
@@ -644,78 +630,6 @@ function disambiguate(
  return capName(`${name.slice(0, MAX_TOOL_NAME_LENGTH - 14)}_${Date.now()}`);
 }

-/**
- * Wrap every tool's execute with a per-call total-timeout guard so a single
- * external MCP tool call that keeps the connection warm but never returns is
- * aborted after `ms` wall-clock (complements the transport silence timeout).
- */
-export function wrapToolsWithCallTimeout(
-  tools: Record<string, Tool>,
-  ms: number,
-): Record<string, Tool> {
-  const out: Record<string, Tool> = {};
-  for (const [name, t] of Object.entries(tools)) {
-    out[name] = wrapToolWithCallTimeout(t, ms);
-  }
-  return out;
-}
-
-/**
- * Per-call total-timeout wrapper for one MCP tool. A fresh AbortController +
- * timer bounds the call; it is composed with the turn's abortSignal via
- * AbortSignal.any so EITHER the per-call timeout OR a client disconnect aborts
- * the call. We RACE the call against the composed abort signal rather than just
- * awaiting it, because @ai-sdk/mcp does NOT settle its in-flight promise on abort
- * (verified in @ai-sdk/mcp@1.0.52: request() only does throwIfAborted() once
- * before send and only re-checks the signal inside the response-message handler,
- * which runs ONLY when a response arrives). So for a warm-but-stuck call awaiting
- * `original` alone would hang forever even after the timer aborts.
- */
-export function wrapToolWithCallTimeout(tool: Tool, ms: number): Tool {
-  const original = tool.execute;
-  if (typeof original !== 'function') return tool;
-  const execute = async (args: unknown, options: ToolCallOptions) => {
-    const controller = new AbortController();
-    const timer = setTimeout(() => {
-      controller.abort(new Error(`MCP tool call timed out after ${ms}ms`));
-    }, ms);
-    timer.unref?.();
-    const abortSignal = options?.abortSignal
-      ? AbortSignal.any([options.abortSignal, controller.signal])
-      : controller.signal;
-    // Reject as soon as the composed signal fires, independent of whether
-    // `original` ever settles. The losing `original` promise is left pending; it
-    // is cleaned up when the client is closed at turn end, and Promise.race
-    // attaches a rejection handler to BOTH inputs so a late rejection of either
-    // is never an unhandled rejection (do NOT add an extra .catch — it could
-    // swallow the real result and would break the race semantics).
-    const aborted = new Promise<never>((_, reject) => {
-      const fail = () => reject(abortReason(abortSignal));
-      if (abortSignal.aborted) fail();
-      else abortSignal.addEventListener('abort', fail, { once: true });
-    });
-    try {
-      return await Promise.race([
-        original(args, { ...options, abortSignal }),
-        aborted,
-      ]);
-    } finally {
-      clearTimeout(timer);
-    }
-  };
-  // `Tool` is a union whose `execute` overloads conflict; cast narrowly so the
-  // wrapped tool keeps every other field while swapping only `execute`.
-  return { ...tool, execute } as unknown as Tool;
-}
-
-/** The signal's reason as an Error (informative thrown value on abort/timeout). */
-function abortReason(signal: AbortSignal): Error {
-  const r = signal.reason;
-  return r instanceof Error
-    ? r
-    : new Error(typeof r === 'string' ? r : 'MCP tool call aborted');
-}
-
 /** Reject a promise after `ms`, so a hung connect/tools() never stalls a turn. */
 function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
--- a/apps/server/src/core/ai-chat/public-share-chat.service.ts
+++ b/apps/server/src/core/ai-chat/public-share-chat.service.ts
@@ -244,15 +244,6 @@ export class PublicShareChatService {
        },
      });

-      // Drain the stream independently of the client socket so the turn always
-      // runs to completion (or to its abort) even when the anonymous client
-      // disconnects — otherwise the dead socket is the only reader, backpressure
-      // stalls the stream, and the per-turn object graph stays rooted (heap-OOM
-      // leak). consumeStream removes that backpressure (AI SDK v6 "Handling
-      // client disconnects"). Fire-and-forget; stream errors are already logged
-      // by the streamText `onError` callback above.
-      void result.consumeStream({ onError: () => undefined });
-
      // Stream the UI-message protocol straight to the hijacked Node response.
      // Surface the real provider message (AI SDK error bodies never carry the
      // API key, so this is safe; we never dump the resolved config).
--- a/apps/server/src/integrations/ai/ai-streaming-fetch.ts
+++ b/apps/server/src/integrations/ai/ai-streaming-fetch.ts
@@ -70,47 +70,6 @@ export function streamKeepAliveMs(): number {
  return positiveEnv('AI_STREAM_KEEPALIVE_MS', DEFAULT_STREAM_KEEPALIVE_MS);
 }

-/** Default SILENCE timeout for EXTERNAL-MCP transport (5 min). */
-const DEFAULT_MCP_STREAM_TIMEOUT_MS = 300_000;
-
-/** Default total wall-clock cap for ONE external MCP tool call (15 min). */
-const DEFAULT_MCP_CALL_TIMEOUT_MS = 900_000;
-
-/**
- * SILENCE timeout (ms) for EXTERNAL-MCP transport ONLY. Override with
- * `AI_MCP_STREAM_TIMEOUT_MS`; a missing/invalid/non-positive value falls back to
- * {@link DEFAULT_MCP_STREAM_TIMEOUT_MS} (5 min).
- *
- * Deliberately tighter than the chat provider's {@link streamTimeoutMs} (15 min)
- * so a byte-silent/hung MCP upstream is broken in ~5 min instead of 15. This is
- * the undici `headersTimeout`/`bodyTimeout` for the external-MCP dispatcher only
- * — it must NOT change the chat provider, which legitimately needs 15 min between
- * reasoning chunks (#175).
- *
- * Trade-off: a legitimately long but byte-silent single tool call (a slow crawl
- * that emits nothing until done) and an SSE transport that idles >5 min BETWEEN
- * tool calls are also cut here. The per-call total cap ({@link mcpCallTimeoutMs},
- * applied in mcp-clients.service) is the complementary guard for chatty-but-stuck
- * calls that keep the socket warm yet never return.
- */
-export function mcpStreamTimeoutMs(): number {
-  return positiveEnv('AI_MCP_STREAM_TIMEOUT_MS', DEFAULT_MCP_STREAM_TIMEOUT_MS);
-}
-
-/**
- * Total wall-clock cap (ms) for ONE external MCP tool call — APP-LEVEL, not
- * transport. Override with `AI_MCP_CALL_TIMEOUT_MS`; a missing/invalid/
- * non-positive value falls back to {@link DEFAULT_MCP_CALL_TIMEOUT_MS} (15 min).
- *
- * Catches a tool that keeps the connection warm (SSE heartbeats / trickle) but
- * never returns a result — which the transport silence timeout
- * ({@link mcpStreamTimeoutMs}) would never break because the socket never goes
- * byte-silent.
- */
-export function mcpCallTimeoutMs(): number {
-  return positiveEnv('AI_MCP_CALL_TIMEOUT_MS', DEFAULT_MCP_CALL_TIMEOUT_MS);
-}
-
 /**
 * undici `Agent` options for streaming AI traffic — the (generous, finite)
 * silence timeouts plus the keep-alive recycle window. Shared by the chat
Author	SHA1	Message	Date
claude code agent 227	a670de7638	fix(mcp): write page body before title to avoid split-brain on failure (#159 ) updatePage (markdown) and updatePageJson wrote the title via REST FIRST, then the body via collab. If the body write failed (e.g. a collab persist timeout), the page was left with the NEW title over its OLD body — a split-brain the tool reported as an error but never repaired (red-team finding #10). Reorder both: write the body first, and only set the title after the body has persisted. Now a body-write failure leaves the title untouched (no split-brain). A title write failing after a successful body is rarer (REST is fast) and leaves correct content under a stale title — the strictly lesser inconsistency — which is the same trade-off the issue's "atomic, or roll back the title" intends, without the fragility of a rollback write that could itself fail. No unit test: both paths require a live collab provider and the suite has no provider mock; the change is a pure reordering. All 306 mcp tests still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 05:14:42 +03:00
claude code agent 227	9703fc2b36	fix(share): SEO route must not leak a restricted page's title (#159 ) `ShareSeoController.getShare` resolved the inherited share with the RAW `getShareForPage`, which does NOT run the restricted-ancestor gate. So for a page shared with includeSubPages whose descendant is permission-restricted, the SEO route served that descendant's real title in <title>/og:title/twitter:title to anonymous visitors and crawlers — even though the content API returns 404 for it (red-team finding #3). Funnel the SEO path through the canonical `resolveReadableSharePage` boundary (the single place that checks `hasRestrictedAncestor`): a non-readable page now serves the plain SPA index with no meta. Also honour `isSharingAllowed` — a share whose workspace/space sharing toggle was flipped off after creation no longer leaks its title via SEO. Title comes from the server-resolved page; `buildShareMetaHtml` already emits robots=noindex when the share opted out of indexing. Tests (controller routing, fs spied at call time so bcrypt's native loader is untouched): non-readable page => plain index, no title; sharing-disabled => plain index; readable+indexing => title + og:title, no noindex; readable+no- indexing => noindex. Asserts getShareForPage is never called by the SEO path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 05:08:02 +03:00
claude code agent 227	86e631034c	fix(ai-chat): validate the open page server-side so the agent edits the right one (#159 ) The client sends the "current page" as { id, title } in the request body and the server echoed BOTH verbatim into the system prompt context and the getCurrentPage tool. id and title are independently attacker/desync-controllable (two tabs, stale navigation), so openPage.id could point at page B while openPage.title said "Page A" — the model then reported "updated Page A" while it actually edited page B (CASL still allowed it; the user has access). Red-team finding #4. Resolve the open page ONCE against the DB via a new `resolveOpenPageContext`: workspace-scoped lookup + access check, returning the AUTHORITATIVE { id, title } (title from the DB row, never the client) or null (fail-closed) for a missing / foreign / inaccessible page. That validated value now feeds the system prompt, the getCurrentPage tool, AND the new-chat history origin (which previously did this validation inline, for the id only — now shared, and the title is fixed too). Tests: resolveOpenPageContext covers no-id, not-found, foreign-workspace, Forbidden, non-Forbidden-fault (fail-closed), the DB-title-wins-over-client case, and null-title coercion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 05:02:44 +03:00
claude code agent 227	524d69d089	feat(ai-chat): per-MCP-server instructions in the agent system prompt (#180 ) Admins can now give each EXTERNAL MCP server a free-text instruction ("how/ when to use this server's tools") that the agent receives in its SYSTEM PROMPT next to the tool descriptions — porting the built-in SERVER_INSTRUCTIONS idea to admin-configured servers. Trusted, admin-authored text (like a system prompt); NON-secret, so unlike headersEnc it IS returned in views/forms. - Migration: nullable `instructions text` on ai_mcp_servers (old rows = null = no guidance). Table type + repo insert/update (blank/whitespace -> null via blankToNull). DTO `@MaxLength(4000)`. Service threads it through McpServerView/toView. - mcp-clients: `McpServerInstruction { serverName, toolPrefix, instructions }` threaded through the toolset/cache/lease. Guidance is built ONLY for a server that actually connected AND contributed >=1 callable tool (the allowlist may filter all of them out) AND has non-blank text — so a guide never appears for tools the agent cannot call. Cached with the toolset, so an edit is picked up next turn via the existing CRUD cache invalidation. - System prompt: `buildMcpToolingBlock` renders an <mcp_tooling> block INSIDE the safety sandwich (after context, before the trailing SAFETY_FRAMEWORK) so it informs tool choice but cannot override the rules; each section is headed by the server's `prefix_*` namespace. Empty/blank -> block omitted. The caller (ai-chat.service) now builds the external toolset BEFORE the prompt and passes external.instructions; client-handle lifecycle (close-once) unchanged. - Client: instructions field in types + a Textarea (autosize, maxLength 4000) in the MCP-server form with a namespace-prefix hint; i18n (en/ru). Tests across every layer (prompt block placement + both SAFETY copies; view blank->null; buildEntry includes guidance only for connected+>=1-tool+non-blank; DTO MaxLength; repo + integration round-trip; service wiring). Delegated impl reviewed (APPROVE); applied the import-type follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:52:05 +03:00
claude code agent 227	6c6253caac	ci(test): run the server integration suite against real Postgres/Redis (#159 ) The only test command in CI was `pnpm -r test` (unit `.spec.ts` on mocks). `test:int` (`.int-spec.ts`, real Postgres/Redis) ran nowhere in CI — there were no DB `services:` — so the cost-cap, FK-cascade, jsonb round-trip and real AI-apply integration tests never gated a PR, and regressions in those high-severity paths stayed green (red-team finding #7). Add `services: postgres (pgvector) + redis` and a `pnpm --filter server test:int` step. The pgvector image is required because migrations create vector columns and global-setup runs `CREATE EXTENSION vector`. Service credentials/db match the defaults in apps/server/test/integration (docmost / docmost_dev_pw, maintenance db `docmost`, redis 6379), so no TEST_*_URL overrides are needed; global-setup drops/recreates the isolated docmost_test DB and migrates it. NOTE: the workflow change itself can only be validated by an actual CI run (YAML parses locally); the int-spec suite is verified passing locally on this branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:35:06 +03:00
claude code agent 227	791bff419a	fix(mcp): refuse ambiguous patch_node/delete_node on duplicated ids (#159 ) Docmost duplicates block ids on copy/paste, and copyPageContent writes the source document verbatim with the same ids. `patchNode`/`deleteNode` address a block by `attrs.id` via replaceNodeById/deleteNodeById, which act on EVERY node sharing the id — so a single patch_node/delete_node could silently replace/remove multiple unrelated blocks with no signal to the model (red-team finding #6). Guard both write paths: when more than one node matches the id, skip the write entirely (the transform returns null -> no mutation) and throw a clear "ambiguous id — N nodes share it" error so the model re-targets with a more specific anchor. Only an unambiguous single match is written; the 0-match and 1-match behavior is unchanged. The duplicate-count basis is covered by node-ops.test.mjs (replaceNodeById / deleteNodeById report count===2 for a 2-duplicate doc). The end-to-end guard is not unit-tested because patchNode/deleteNode require a live collab provider and the test suite has no provider mock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:33:57 +03:00
claude code agent 227	4ba8991bca	feat(footnotes): multi-backlinks — definition returns to ALL its references (#168 ) After #166 a repeated `[^a]` is one footnote (reuse): one number, one definition, N forward links. But the definition's ↩ only returned to the FIRST reference. Now a definition with N references shows ↩ a b c …, each backlink scrolling to its own occurrence (Pandoc/Wikipedia convention); a single-reference footnote keeps the plain ↩ unchanged. - editor-ext: `computeFootnoteRefCounts(doc)` (id -> occurrence count) cached alongside the number map in the numbering plugin state; `getFootnoteRefCount` getter (O(1), no per-render doc walk). `scrollToReference(id, index?)` picks the index-th `sup[data-footnote-ref][data-id]` occurrence (document order), falling back to the first. - client: FootnoteDefinitionView renders one lettered link (a, b, c, … aa …) per occurrence when refCount > 1; the chrome stays after the contentDOM so the #146 caret invariant holds. i18n keys (ru) added. Tests: computeFootnoteRefCounts + getFootnoteRefCount (reuse counts, unknown id => 0); structure test gains 3 cases (N lettered links render, click jumps to the n-th occorrence, single ref => one ↩). NOTE: the visual layout of the backlink row needs a real browser to verify (jsdom can't); the structural and behavioral contract is covered headless. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:27:29 +03:00
claude code agent 227	c79e7e549d	fix(db): jsonb double-encoding follow-ups from PR #172 review (#173 ) PR #172 fixed the jsonb double-encoding for `tool_allowlist` but the same class of bug, and the same re-derived workaround, remained elsewhere. 1. model_config (agent roles): jsonbObject still used the buggy `::jsonb` bind, so `ai_agent_roles.model_config` round-tripped as a jsonb STRING SCALAR. The read-path `typeof === 'object'` check then failed and the model override was SILENTLY dropped (role fell back to the default model). Fixed to `::text::jsonb` and added `parseModelConfig` + `normalizeRow` so every read self-heals already-corrupted rows (no migration). 2. Centralized the write workaround as `jsonbBind()` in database/utils.ts — one implementation with one explanation of the quirk — replacing the per-repo `jsonbArray` (mcp) and `jsonbObject` (roles). 3. Integration coverage (the fix is a DB round-trip a unit test cannot see; the read-side parser MASKS a write regression): new ai-mcp-server-repo.int-spec asserts `jsonb_typeof(tool_allowlist)='array'` after insert + heals a seeded string-scalar row; ai-agent-roles-repo int-spec gains the same for `model_config` (`'object'` + heal). 4. Updated the stale `ai-mcp-servers.types.ts` comment (the driver returns a JSON string for legacy rows; the repo normalizes every read). 5. Fail-open logging: a corrupt tool_allowlist degrades to "no restriction" (agent gets ALL tools) — normalizeRow now warns (server id only, never contents) so the silent widening leaves a trace. 6. Simplified parseToolAllowlist (normalize the string once, then a single array-of-strings check) — identical behaviour, all 12 cases still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:19:34 +03:00
claude code agent 227	443ad3a856	fix(mcp): replaceImage no longer yanks the cursor (#164 ) `mutateLiveContentUnlocked` — the write path used by `replaceImage` — still did the pre-#152 destructive write (delete the whole fragment + applyUpdate a fresh Y.Doc), discarding every Yjs node id. y-prosemirror anchors the editor selection to those ids, so an open editor's cursor snapped to the document end on every image swap, exactly the #152 jump that the main write path no longer causes. Switch it to the same `applyDocToFragment(ydoc, newDoc)` structural diff (updateYFragment) as the main path, so unchanged nodes keep their ids and the live cursor stays put. It runs its own atomic transact, so the old explicit transact/delete is gone; the now-unused docmostExtensions import is dropped. Regression tests (cursor-stability suite): a sibling paragraph's RelativePosition survives a top-level image src/attachmentId swap, and an image nested in a callout, matching the shapes replaceImage produces. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:07:36 +03:00
claude code agent 227	66225294b2	feat(ai-chat): compact reasoning rendering — collapse blank lines (#181 ) The "Thinking" (reasoning) block rendered with large vertical gaps: models emit reasoning with a blank line (\n\n) between every list item and paragraph, which `marked` turns into loose lists (each <li> wrapped in a <p>) and separate <p> paragraphs, each carrying a margin. - Add `collapseBlankLines(text)`: collapse 2+ newlines to one, EXCEPT inside fenced code blocks (``` / ~~~) where blank lines are significant. Applied in reasoning-block.tsx before renderChatMarkdown, so loose lists become tight (no <li><p>) and paragraphs join; `breaks: true` keeps single \n as <br>, preserving line breaks. Reasoning-only — the normal answer is untouched. - Drop `white-space: pre-wrap` from `.reasoningText`: on the rendered markdown <div> it turned the newlines between block tags into visible blank lines on top of the margins. The plain-text fallback <Text> that needs pre-wrap already sets it inline. Tests: collapseBlankLines unit (collapse, fence preservation incl. tilde and unclosed fences) + rendered-HTML assertions that a blank-line-separated list becomes a tight list and still parses as a list after a paragraph. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 04:04:11 +03:00
claude code agent 227	25121e269e	fix(ai-chat): tick the live token counter between agent steps (#163 ) The header token badge (and the "Thinking… · N tokens" line) froze between agent steps and jumped in chunks instead of ticking smoothly. liveTurnTokens returned the authoritative server `usage` VERBATIM as soon as it appeared, but the server only attaches usage at a step boundary and it is cumulative over COMPLETED steps — so during the next (in-flight) step the figure stayed frozen at the previous boundary and the running text estimate was ignored. Combine both sources per component via max: always compute the running estimate (chars/≈4 over the message's reasoning/text parts, which includes the in-flight step) and take max(authoritativeBase, estimate). Between boundaries the estimate ticks the number up; at a boundary the authoritative figure snaps it exact; and because the server usage is cumulative and we only ever take the max, the counter is monotonic (never drops). Reasoning/output stay split; the #151 reasoning-only authoritative count is preserved. Backward compatible: in every existing test the estimate is <= the authoritative figure, so max returns the same value. +4 tests for the in-flight-step-exceeds- base case (output + reasoning), the authoritative-wins case, and monotonicity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 03:59:56 +03:00