fix(ai-chat): don't sever long agent turns at undici's 300s stream timeout (#175 )

Long research turns failed mid-task with "Lost connection to the AI provider". Node's global fetch (undici) defaults BOTH headersTimeout and bodyTimeout to 300_000ms, and the chat provider + the external-MCP dispatcher both ran on it with no override, so: - the z.ai chat stream dropped when a late step's huge accumulated context pushed the model's time-to-first-token past 5 min (reproduced: even a trivial glm-5.2 query has a ~4-8s first-chunk latency; the live telemetry shows it scaling with context — and a long run reaches 400k+-token steps), or a reasoning model paused >5 min between chunks (bodyTimeout); - the crawl4ai SSE transport, held open across the whole turn, dropped when it idled >5 min between tool calls — a tool failure that aborts the turn and surfaces the same banner. Fix: a dedicated undici dispatcher with both stream timeouts DISABLED (0) on each path. Cancellation is unchanged — the turn is bound to the request abortSignal (client disconnect) and capped by MAX_AGENT_STEPS, so it still terminates; it just no longer dies at an arbitrary 5-minute wall-clock. - ai-streaming-fetch.ts: createStreamingFetch() (+ exported option contract). - ai.service: the chat provider's fetch is now createStreamingFetch(), wrapped by the existing passive ECONNRESET telemetry (createDiagnosticFetch gained an optional baseFetch) so the telemetry observes the SAME transport the turn uses. - mcp-clients: headersTimeout/bodyTimeout: 0 on the SSRF-pinned Agent. Investigation: reproduced the transport mechanism against the real z.ai endpoint (a 1ms headersTimeout throws UND_ERR_HEADERS_TIMEOUT — the exact drop) and ran the actual research agent to a ~428k-token context. Verified the fixed path streams cleanly live (glm-5.2 turns finish; telemetry confirms the streaming fetch is in use). Tests: ai-streaming-fetch.spec (option contract + streams a delayed response); ai-http-diagnostics + ai/mcp specs green. server tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 21:50:41 +03:00
180 changed files with 2666 additions and 15964 deletions
--- a/.env.example
+++ b/.env.example
@@ -136,32 +136,6 @@ MCP_DOCMOST_PASSWORD=
 # A slow/hung embeddings endpoint fails after this and the batch continues.
 # AI_EMBEDDING_TIMEOUT_MS=120000

-# Silence timeout (ms) for streaming chat/agent AI calls AND external-MCP traffic.
-# Bounds time-to-first-byte and the gap BETWEEN chunks (NOT the total turn length),
-# so an arbitrarily long turn that keeps streaming is never cut. Finite so a hung
-# provider is eventually broken instead of leaking forever. Default 900000 (15 min).
-# AI_STREAM_TIMEOUT_MS=900000
-
-# Keep-alive recycle window (ms) for streaming chat/agent AI + external-MCP calls.
-# A pooled connection idle longer than this is closed instead of reused, so a
-# NAT / egress firewall / reverse proxy that silently drops idle connections
-# cannot poison a reused socket into a PRE-RESPONSE `read ECONNRESET`. Lower it if
-# your egress drops idle connections faster than ~10s. Default 10000 (10 s).
-# AI_STREAM_KEEPALIVE_MS=10000
-
-# Silence timeout (ms) for EXTERNAL-MCP transport ONLY (not the chat provider).
-# Tighter than AI_STREAM_TIMEOUT_MS so a byte-silent/hung MCP server is broken in
-# ~5 min instead of 15. Note it also cuts a legitimately long but byte-silent
-# single tool call (a slow crawl that emits nothing until done) and an SSE
-# transport idling >5 min BETWEEN tool calls. Default 300000 (5 min).
-# AI_MCP_STREAM_TIMEOUT_MS=300000
-
-# Total wall-clock cap (ms) for ONE external MCP tool call (app-level, not
-# transport). Aborts a tool that keeps the socket warm (SSE heartbeats / trickle)
-# but never returns a result — which the silence timeout above never breaks.
-# Default 900000 (15 min).
-# AI_MCP_CALL_TIMEOUT_MS=900000
-
 # --- Anonymous public-share AI assistant ---
 # Opt-in per workspace (AI settings -> "public share assistant"; off by default).
 # When enabled, anonymous visitors of a published share can ask an AI about that
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -15,38 +15,6 @@ permissions:
 jobs:
  test:
    runs-on: ubuntu-latest
-    # Real Postgres + Redis so the server integration suite (`*.int-spec.ts`,
-    # behind `pnpm --filter server test:int`) runs in CI (red-team finding #7).
-    # Without it, cost-cap / FK-cascade / jsonb-round-trip / real-apply tests
-    # only ran locally, so regressions in those paths stayed green in CI.
-    # Postgres uses the pgvector image because migrations create vector columns
-    # and global-setup runs `CREATE EXTENSION vector`. Credentials/db match the
-    # defaults in apps/server/test/integration/db.ts + global-setup.ts
-    # (docmost / docmost_dev_pw, maintenance db `docmost`, redis on 6379), so no
-    # TEST_*_URL overrides are needed.
-    services:
-      postgres:
-        image: pgvector/pgvector:pg18
-        env:
-          POSTGRES_USER: docmost
-          POSTGRES_PASSWORD: docmost_dev_pw
-          POSTGRES_DB: docmost
-        ports:
-          - 5432:5432
-        options: >-
-          --health-cmd "pg_isready -U docmost"
-          --health-interval 10s
-          --health-timeout 5s
-          --health-retries 5
-      redis:
-        image: redis:7
-        ports:
-          - 6379:6379
-        options: >-
-          --health-cmd "redis-cli ping"
-          --health-interval 10s
-          --health-timeout 5s
-          --health-retries 5
    steps:
      - name: Checkout
        uses: actions/checkout@v4
@@ -68,12 +36,5 @@ jobs:
      - name: Build editor-ext
        run: pnpm --filter @docmost/editor-ext build

-      - name: Run unit tests
+      - name: Run tests
        run: pnpm -r test
-
-      # Integration suite against the real Postgres/Redis services above. Runs
-      # the FK-cascade, cost-cap, jsonb-round-trip and real-apply specs that the
-      # unit run (mocks only) cannot cover. global-setup drops/recreates the
-      # isolated `docmost_test` DB and migrates it to latest.
-      - name: Run server integration tests
-        run: pnpm --filter server test:int
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,21 +12,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added

- **Persistent AI-chat history as the source of truth + server-side export.**
-  An assistant turn is now persisted to the database step by step: the row is
-  inserted upfront as `streaming` and updated as each agent step finishes, then
-  finalized once to `completed`/`error`/`aborted`. A process that dies mid-turn
-  keeps every finished step, and a startup sweep flips any dangling `streaming`
-  row (untouched for 10 minutes) to `aborted`. Chat "Copy" now exports
-  server-side from these rows (`POST /ai-chat/export`) rather than from live
-  client state, so the export is identical whether a chat is freshly streaming,
-  just switched to, or reloaded — and is available from the first turn of a new
-  chat. (#183, #174)
-
 - **AI-agent attribution for MCP writes.** Comments (and pages) created through
  the MCP endpoint by a dedicated agent account are now badged as "AI", with
  unspoofable provenance derived from a per-user `is_agent` flag (not from the
-  request body). **Operator setup:** use a _dedicated_ service account for the
+  request body). **Operator setup:** use a *dedicated* service account for the
  MCP fallback and set the flag with SQL —
  `UPDATE users SET is_agent = true WHERE email = '<mcp-account>'`. Never flag a
  human or shared account, or its normal edits get mis-attributed as AI. See the
@@ -36,44 +25,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  flagging dangling references, empty or duplicate definitions, and `[^id]`
  markers inside table rows, so an agent can fix its own markup. The page is
  still created; the field is omitted when there are no problems. (#166)
- **AI chat "Protocol" setting (`chatApiStyle`).** A new admin choice in AI
-  settings for the `openai` driver: `openai-compatible` (default) routes chat
-  through `@ai-sdk/openai-compatible`, which surfaces a provider's streamed
-  reasoning (`reasoning_content` → reasoning parts) for z.ai/GLM, DeepSeek,
-  OpenRouter, etc.; `openai` uses the official provider (real-OpenAI
-  reasoning-model request shaping). Chosen explicitly rather than inferred from
-  the base URL, since a custom URL can front real OpenAI too. (#175, #177)
- **AI chat "Context window (tokens)" setting (`chatContextWindow`).** A new
-  admin field in AI settings that records the chat model's context-window size.
-  When set (> 0) it becomes the denominator of the header context-badge, which
-  now reads "used / max"; `0`/empty clears the limit and the badge shows only
-  the current context as before. There is no provider-independent way to read a
-  model's window automatically, so it is an explicit workspace-level value.
-  (#189)
- **Per-MCP-server instructions in the agent prompt.** Each external MCP server
-  now has an admin-authored `instructions` field ("how/when to use this server's
-  tools") that is injected into the agent's system prompt next to that server's
-  tool descriptions. Trusted text, rendered inside the prompt safety sandwich;
-  shown only for a server that actually connected and contributed ≥1 callable
-  tool. (#180)
- **Footnote multi-backlinks.** A footnote referenced more than once now shows a
-  back-link per reference (↩ a b c …), each scrolling to its own occurrence, like
-  Pandoc/Wikipedia; a single-reference footnote keeps the plain ↩. (#168)

 ### Changed

- **AI chat default provider is now `openai-compatible` (reasoning surfaced).**
-  For the `openai` driver the chat provider defaults to the openai-compatible
-  implementation, so a workspace pointing at z.ai/GLM/DeepSeek now streams the
-  model's reasoning out of the box. An endpoint that is real OpenAI behind a
-  custom base URL should set the new `chatApiStyle` "Protocol" to `openai`. (#177)
-
- **AI chat header context-badge now shows "used / max".** When an admin sets
-  the new `chatContextWindow`, the badge displays the current context size over
-  the configured window (e.g. `120k / 200k`) instead of switching to a live
-  per-turn token counter during streaming. With no window configured the badge
-  keeps showing just the current context. (#189)
-
 - **Footnotes now reuse (Pandoc semantics).** Multiple `[^a]` references to the
  same id are ONE footnote — one number, one definition, several back-references
  — instead of being renamed to `a__2`, `a__3`. Duplicate `[^a]:` definitions are
@@ -100,11 +54,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  are nudged after a paste to refresh stale hit-testing geometry. The caret
  symptom is macOS-specific and was confirmed manually on macOS; the automated
  guard pins the DOM-order invariant, not the caret behavior itself. (#146, #147)
- **AI chat: the live token counter now ticks between agent steps.** During a
-  multi-step turn the header token badge (and the "Thinking… · N tokens" line)
-  no longer froze on the previous step's authoritative usage; the current step's
-  estimate is combined per-component with `max`, so the count rises smoothly and
-  never jumps backwards. (#163)

 ## [0.93.0] - 2026-06-21

@@ -188,7 +137,8 @@ embeds — plus a large batch of security hardening and test coverage.
 - Page templates: import `ThrottleModule` so collab boots, never strand an
  in-flight page-embed id, and add defense-in-depth workspace checks.
 - Pages: `movePage` cycle guard with no phantom `PAGE_MOVED` event.
- Import: surface the real error cause from `/pages/import` instead of a generic 400.
+- Import: surface the real error cause from `/pages/import` instead of a generic
+  400.

 ### Security

--- a/apps/client/public/locales/en-US/translation.json
+++ b/apps/client/public/locales/en-US/translation.json
@@ -258,7 +258,6 @@
  "Copy to space": "Copy to space",
  "Copy chat": "Copy chat",
  "Copied": "Copied",
-  "Failed to export chat": "Failed to export chat",
  "Duplicate": "Duplicate",
  "Select a user": "Select a user",
  "Select a group": "Select a group",
@@ -711,7 +710,6 @@
  "Authorization header": "Authorization header",
  "Tool allowlist": "Tool allowlist",
  "Optional. Leave empty to allow all tools the server exposes.": "Optional. Leave empty to allow all tools the server exposes.",
-  "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".": "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".",
  "Test": "Test",
  "Available tools": "Available tools",
  "No tools available": "No tools available",
@@ -1079,8 +1077,6 @@
  "Undo": "Undo",
  "Redo": "Redo",
  "Backlinks": "Backlinks",
-  "Back to references": "Back to references",
-  "Back to reference {{label}}": "Back to reference {{label}}",
  "Last updated by": "Last updated by",
  "Last updated": "Last updated",
  "Stats": "Stats",
@@ -1168,10 +1164,7 @@
  "Built-in assistant persona": "Built-in assistant persona",
  "Minimize": "Minimize",
  "Current context size": "Current context size",
-  "Context size / model limit": "Context size / model limit",
-  "Context window (tokens)": "Context window (tokens)",
-  "Shows used / total in the chat header badge; empty hides the total.": "Shows used / total in the chat header badge; empty hides the total.",
-  "e.g. 200000": "e.g. 200000",
+  "Tokens generated this turn": "Tokens generated this turn",
  "AI agent": "AI agent",
  "Take a look at the current document": "Take a look at the current document",
  "AI agent is typing…": "AI agent is typing…",
@@ -1314,9 +1307,5 @@
  "Page tree (child pages, recursive)": "Page tree (child pages, recursive)",
  "Render the full nested tree of all descendant pages": "Render the full nested tree of all descendant pages",
  "Showing {{count}} subpages_one": "Showing {{count}} subpage",
-  "Showing {{count}} subpages_other": "Showing {{count}} subpages",
-  "Protocol": "Protocol",
-  "How chat requests are sent and how reasoning is surfaced": "How chat requests are sent and how reasoning is surfaced",
-  "OpenAI-compatible (surfaces reasoning)": "OpenAI-compatible (surfaces reasoning)",
-  "OpenAI (official)": "OpenAI (official)"
+  "Showing {{count}} subpages_other": "Showing {{count}} subpages"
 }
--- a/apps/client/public/locales/ru-RU/translation.json
+++ b/apps/client/public/locales/ru-RU/translation.json
@@ -257,7 +257,6 @@
  "Copy": "Копировать",
  "Copy to space": "Копировать в пространство",
  "Copied": "Скопировано",
-  "Failed to export chat": "Не удалось экспортировать чат",
  "Duplicate": "Дублировать",
  "Select a user": "Выберите пользователя",
  "Select a group": "Выберите группу",
@@ -406,8 +405,6 @@
  "Footnote {{number}}": "Сноска {{number}}",
  "Go to footnote": "Перейти к сноске",
  "Back to reference": "Вернуться к ссылке",
-  "Back to references": "Вернуться к ссылкам",
-  "Back to reference {{label}}": "Вернуться к ссылке {{label}}",
  "Empty footnote": "Пустая сноска",
  "Math inline": "Строчная формула",
  "Insert inline math equation.": "Вставить математическое выражение в строку.",
@@ -705,10 +702,7 @@
  "Copy chat": "Копировать чат",
  "Created successfully": "Успешно создано",
  "Current context size": "Текущий размер контекста",
-  "Context size / model limit": "Размер контекста / лимит модели",
-  "Context window (tokens)": "Размер окна контекста (токены)",
-  "Shows used / total in the chat header badge; empty hides the total.": "Показывает использовано/всего в шапке чата; пусто — скрыть лимит.",
-  "e.g. 200000": "напр. 200000",
+  "Tokens generated this turn": "Токенов сгенерировано за ход",
  "Delete this chat?": "Удалить этот чат?",
  "Deleted successfully": "Успешно удалено",
  "Edited by AI agent on behalf of {{name}}": "Отредактировано AI-агентом от имени {{name}}",
@@ -755,8 +749,6 @@
  "Manage API keys for all users in the workspace. View the <anchor>API documentation</anchor> for usage details.": "Управляйте API-ключами для всех пользователей в рабочем пространстве. Смотрите <anchor>документацию по API</anchor> для получения информации об использовании.",
  "View the <anchor>API documentation</anchor> for usage details.": "Смотрите <anchor>документацию по API</anchor> для получения информации об использовании.",
  "View the <anchor>MCP documentation</anchor>.": "Смотрите <anchor>документацию по MCP</anchor>.",
-  "Instructions": "Инструкции",
-  "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".": "Необязательное указание агенту, как и когда использовать инструменты этого сервера. Добавляется в системный промпт. Инструменты сервера именуются с префиксом «<имя сервера>_*».",
  "Sources": "Источники",
  "AI Answers not available for attachments": "Ответы ИИ недоступны для вложений",
  "No answer available": "Ответ недоступен",
@@ -1168,9 +1160,5 @@
  "Render the full nested tree of all descendant pages": "Показать полное вложенное дерево всех дочерних страниц",
  "Showing {{count}} subpages_one": "Показано {{count}} подстраница",
  "Showing {{count}} subpages_few": "Показано {{count}} подстраницы",
-  "Showing {{count}} subpages_many": "Показано {{count}} подстраниц",
-  "Protocol": "Протокол",
-  "How chat requests are sent and how reasoning is surfaced": "Как отправляются запросы чата и как показывается reasoning",
-  "OpenAI-compatible (surfaces reasoning)": "OpenAI-совместимый (показывает reasoning)",
-  "OpenAI (official)": "OpenAI (официальный)"
+  "Showing {{count}} subpages_many": "Показано {{count}} подстраниц"
 }
--- a/apps/client/src/features/ai-chat/components/ai-chat-window.tsx
+++ b/apps/client/src/features/ai-chat/components/ai-chat-window.tsx
@@ -6,7 +6,8 @@ import {
  useRef,
  useState,
 } from "react";
-import { Group, Loader } from "@mantine/core";
+import { type UIMessage } from "@ai-sdk/react";
+import { Group, Loader, Tooltip } from "@mantine/core";
 import {
  IconArrowsDiagonal,
  IconCheck,
@@ -39,8 +40,7 @@ import {
 } from "@/features/ai-chat/queries/ai-chat-query.ts";
 import ConversationList from "@/features/ai-chat/components/conversation-list.tsx";
 import ChatThread from "@/features/ai-chat/components/chat-thread.tsx";
-import { ContextBadge } from "@/features/ai-chat/components/context-badge.tsx";
-import { exportAiChat } from "@/features/ai-chat/services/ai-chat-service.ts";
+import { buildChatMarkdown } from "@/features/ai-chat/utils/chat-markdown.ts";
 import { useChatSession } from "@/features/ai-chat/hooks/use-chat-session.ts";
 import {
  shouldCollapseOnOutsidePointer,
@@ -61,6 +61,13 @@ const MIN_HEIGHT = 400;
 // Margin kept between the window and the viewport edges while dragging.
 const EDGE_MARGIN = 8;

+/** Compact token formatter: 1.2M / 3.4k / 950. */
+function formatTokens(n: number): string {
+  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
+  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}k`;
+  return String(n);
+}
+
 // Compute the initial top-right placement at the default size, fitted to the
 // current viewport. Reads `window` only when called (inside an effect).
 function computeInitialGeom() {
@@ -73,31 +80,17 @@ function computeInitialGeom() {
    Math.min(DEFAULT_HEIGHT, window.innerHeight - 2 * EDGE_MARGIN),
  );
  const left = Math.max(EDGE_MARGIN, window.innerWidth - width - 24);
-  const maxTop = Math.max(
-    EDGE_MARGIN,
-    window.innerHeight - height - EDGE_MARGIN,
-  );
+  const maxTop = Math.max(EDGE_MARGIN, window.innerHeight - height - EDGE_MARGIN);
  const top = Math.min(60, maxTop);
  return { left, top, width, height };
 }

 // Clamp a geometry so the window stays within the current viewport.
-function clampGeom(g: {
-  left: number;
-  top: number;
-  width: number;
-  height: number;
-}) {
+function clampGeom(g: { left: number; top: number; width: number; height: number }) {
  const effWidth = Math.max(g.width, MIN_WIDTH);
  const effHeight = Math.max(g.height, MIN_HEIGHT);
-  const maxLeft = Math.max(
-    EDGE_MARGIN,
-    window.innerWidth - effWidth - EDGE_MARGIN,
-  );
-  const maxTop = Math.max(
-    EDGE_MARGIN,
-    window.innerHeight - effHeight - EDGE_MARGIN,
-  );
+  const maxLeft = Math.max(EDGE_MARGIN, window.innerWidth - effWidth - EDGE_MARGIN);
+  const maxTop = Math.max(EDGE_MARGIN, window.innerHeight - effHeight - EDGE_MARGIN);
  return {
    ...g,
    left: Math.min(Math.max(EDGE_MARGIN, g.left), maxLeft),
@@ -114,7 +107,7 @@ function clampGeom(g: {
 * ported from the GitmostAgent.jsx design.
 */
 export default function AiChatWindow() {
-  const { t, i18n } = useTranslation();
+  const { t } = useTranslation();
  const clipboard = useClipboard({ timeout: 500 });
  const queryClient = useQueryClient();
  const [windowOpen, setWindowOpen] = useAtom(aiChatWindowOpenAtom);
@@ -155,6 +148,20 @@ export default function AiChatWindow() {
  const { data: messageRows, isLoading: messagesLoading } =
    useAiChatMessagesQuery(activeChatId ?? undefined);

+  // Live snapshot of the active thread's useChat state, kept up to date by
+  // ChatThread. Lets the export include the in-progress (not-yet-persisted)
+  // streaming turn. A ref avoids re-rendering this window on every token.
+  const liveThreadRef = useRef<{ messages: UIMessage[]; isStreaming: boolean }>({
+    messages: [],
+    isStreaming: false,
+  });
+
+  // Live turn-token total (reasoning + output) for the in-flight turn, pushed up
+  // (THROTTLED to ~8 Hz inside ChatThread) so the header badge ticks mid-stream.
+  // `null` means no turn is in flight -> the badge falls back to the persisted
+  // context size below.
+  const [liveTurnTokens, setLiveTurnTokens] = useState<number | null>(null);
+
  // The page the user is currently viewing. AiChatWindow lives in a pathless
  // parent layout route, so useParams() can't see :pageSlug. Match the full
  // pathname against the authenticated page route instead so "the current page"
@@ -178,22 +185,17 @@ export default function AiChatWindow() {
  // The invalidate closures are passed inline: `onTurnFinished` is read live by
  // useChat's onFinish (never in an effect dep array), so their identity does not
  // matter — no memoization ceremony needed.
-  const {
-    threadKey,
-    waitingForHistory,
-    onTurnFinished,
-    onServerChatId,
-    cancelPendingAdoption,
-  } = useChatSession({
-    activeChatId,
-    setActiveChatId,
-    chats,
-    messagesLoading,
-    onInvalidateChatList: () =>
-      queryClient.invalidateQueries({ queryKey: AI_CHATS_RQ_KEY }),
-    onInvalidateChatMessages: (id) =>
-      queryClient.invalidateQueries({ queryKey: AI_CHAT_MESSAGES_RQ_KEY(id) }),
-  });
+  const { threadKey, waitingForHistory, onTurnFinished, cancelPendingAdoption } =
+    useChatSession({
+      activeChatId,
+      setActiveChatId,
+      chats,
+      messagesLoading,
+      onInvalidateChatList: () =>
+        queryClient.invalidateQueries({ queryKey: AI_CHATS_RQ_KEY }),
+      onInvalidateChatMessages: (id) =>
+        queryClient.invalidateQueries({ queryKey: AI_CHAT_MESSAGES_RQ_KEY(id) }),
+    });

  // startNewChat/selectChat set the public atom; the hook's render-phase
  // reconciler handles the remount when activeChatId actually CHANGES. But
@@ -223,28 +225,19 @@ export default function AiChatWindow() {
    [cancelPendingAdoption, setActiveChatId, setDraft, setSelectedRoleId],
  );

-  // The active chat object (for its title) and an export gate. The export is now
-  // SERVER-sourced (the DB is the single source of truth — #183): the assistant
-  // row is persisted upfront + per step, so even a brand-new chat whose first
-  // turn is streaming/interrupted has a server row to render. Enable the button
-  // whenever a persisted chat is active (`activeChatId` is set). For a BRAND-NEW
-  // chat that id is adopted EARLY — at the stream's `start` chunk via
-  // onServerChatId (#174) — so the Copy button is available during the first
-  // turn's stream, not only after it terminates.
+  // The active chat object (for its title) and an export gate: only enable the
+  // export button when an existing chat with loaded persisted rows is active.
  const activeChat = useMemo(
    () => chats?.items?.find((c) => c.id === activeChatId) ?? null,
    [chats, activeChatId],
  );
-  const canExport = !!activeChatId;
+  const canExport = !!activeChatId && !!messageRows && messageRows.length > 0;

  // The role to display in the header and as the assistant's name. Prefer the
  // persisted role of an existing chat (chat-list JOIN); fall back to the role
  // picked via a card click for a brand-new or just-adopted chat. selectChat
  // resets selectedRoleId, so this fallback never leaks into an unrelated chat.
-  const currentRole = useMemo<{
-    name: string;
-    emoji: string | null;
-  } | null>(() => {
+  const currentRole = useMemo<{ name: string; emoji: string | null } | null>(() => {
    if (activeChat?.roleName) {
      return { name: activeChat.roleName, emoji: activeChat.roleEmoji ?? null };
    }
@@ -252,21 +245,37 @@ export default function AiChatWindow() {
    return picked ? { name: picked.name, emoji: picked.emoji } : null;
  }, [activeChat, enabledRoles, selectedRoleId]);

-  // Fetch the server-rendered Markdown export and copy it to the clipboard. The
-  // server is the single source of truth (#183): it renders the transcript from
-  // the persisted rows — including an interrupted turn's in-progress row — so the
-  // export is identical whether the chat is freshly streaming, just switched to,
-  // or reloaded. The `lang` of the active i18n drives the few localized labels.
-  const handleCopy = useCallback(async () => {
-    if (!activeChatId) return;
-    try {
-      const markdown = await exportAiChat(activeChatId, i18n.language);
-      clipboard.copy(markdown);
-      notifications.show({ message: t("Copied") });
-    } catch {
-      notifications.show({ message: t("Failed to export chat"), color: "red" });
-    }
-  }, [activeChatId, clipboard, t, i18n.language]);
+  // Build a Markdown export from the already-loaded persisted rows (no network
+  // call) and copy it to the clipboard. The "Copied" notification is the
+  // feedback.
+  const handleCopy = useCallback(() => {
+    if (!activeChatId || !messageRows || messageRows.length === 0) return;
+    // While the active thread is streaming, the current user message and the
+    // in-progress assistant reply are NOT yet in messageRows (the persisted
+    // query is only refetched after the turn finishes). Pull the live tail —
+    // messages whose id is not among the persisted rows — and append them,
+    // flagging the streaming assistant message as still generating.
+    const live = liveThreadRef.current;
+    const rowIds = new Set(messageRows.map((r) => r.id));
+    const pending = live.isStreaming
+      ? live.messages
+          .filter((m) => !rowIds.has(m.id))
+          .map((m) => ({
+            role: m.role,
+            parts: (m.parts ?? []) as { type: string; text?: string }[],
+            generating: m.role === "assistant",
+          }))
+      : [];
+    const markdown = buildChatMarkdown({
+      title: activeChat?.title ?? null,
+      chatId: activeChatId,
+      rows: messageRows,
+      pending,
+      t,
+    });
+    clipboard.copy(markdown);
+    notifications.show({ message: t("Copied") });
+  }, [activeChatId, messageRows, activeChat, clipboard, t]);

  // Current context size for the active chat: how much the conversation now
  // occupies in the model's context window — NOT the cumulative tokens spent.
@@ -294,21 +303,6 @@ export default function AiChatWindow() {
    return 0;
  }, [activeChatId, messageRows]);

-  // The model's context-window size (badge denominator), read from the most
-  // recent assistant row that carries it. Admin-configured in AI settings and
-  // stamped onto the turn server-side, so it travels with the message metadata —
-  // no client-side model resolution, and it survives public shares / per-role
-  // models automatically. 0 (no limit configured, or older rows) → the badge
-  // hides the denominator and shows only the current context size.
-  const maxContextTokens = useMemo(() => {
-    if (!activeChatId || !messageRows) return 0;
-    for (let i = messageRows.length - 1; i >= 0; i--) {
-      const max = messageRows[i].metadata?.maxContextTokens;
-      if (typeof max === "number" && max > 0) return max;
-    }
-    return 0;
-  }, [activeChatId, messageRows]);
-
  // On (re)open, settle the geometry before paint (useLayoutEffect → no
  // first-frame jump): compute an initial top-right placement the first time,
  // and re-clamp an existing geometry to the current viewport on later opens
@@ -357,8 +351,7 @@ export default function AiChatWindow() {
      const width = el.offsetWidth;
      const height = el.offsetHeight;
      setGeom((prev) => {
-        if (!prev || (prev.width === width && prev.height === height))
-          return prev;
+        if (!prev || (prev.width === width && prev.height === height)) return prev;
        return { ...prev, width, height };
      });
    });
@@ -498,14 +491,19 @@ export default function AiChatWindow() {
        )}

        <div style={{ flex: 1, display: "flex", justifyContent: "center" }}>
-          {/* Context badge: always "current / max" context size (or just current
-              when no model limit is configured). It no longer flips to a live
-              per-turn generation counter mid-stream — that live feedback lives in
-              the chat body's "Thinking · N tokens" block. */}
-          <ContextBadge
-            contextTokens={contextTokens}
-            maxContextTokens={maxContextTokens}
-          />
+          {/* While a turn streams, show the LIVE turn-token count (ticks ~8 Hz);
+              once it finishes, fall back to the persisted context size. Require
+              > 0 so the very first emit (an empty tail message, count 0) does not
+              flash a "0" badge before any token streams in (#151 review). */}
+          {liveTurnTokens !== null && liveTurnTokens > 0 ? (
+            <Tooltip label={t("Tokens generated this turn")} withArrow>
+              <span className={classes.badge}>{formatTokens(liveTurnTokens)}</span>
+            </Tooltip>
+          ) : contextTokens > 0 ? (
+            <Tooltip label={t("Current context size")} withArrow>
+              <span className={classes.badge}>{formatTokens(contextTokens)}</span>
+            </Tooltip>
+          ) : null}
        </div>

        <div style={{ display: "flex", alignItems: "center", gap: 1 }}>
@@ -517,11 +515,7 @@ export default function AiChatWindow() {
              aria-label={t("Copy chat")}
              onClick={handleCopy}
            >
-              {clipboard.copied ? (
-                <IconCheck size={14} />
-              ) : (
-                <IconCopy size={14} />
-              )}
+              {clipboard.copied ? <IconCheck size={14} /> : <IconCopy size={14} />}
            </button>
          )}
          <button
@@ -627,7 +621,8 @@ export default function AiChatWindow() {
              onRolePicked={(role) => setSelectedRoleId(role.id)}
              assistantName={currentRole?.name}
              onTurnFinished={onTurnFinished}
-              onServerChatId={onServerChatId}
+              liveStateRef={liveThreadRef}
+              onLiveTurnTokens={setLiveTurnTokens}
            />
          )}
        </div>
--- a/apps/client/src/features/ai-chat/components/ai-chat.module.css
+++ b/apps/client/src/features/ai-chat/components/ai-chat.module.css
@@ -55,45 +55,6 @@
    padding-inline-start: 1.4em;
 }

-/* GFM tables in assistant markdown. The chat lives in a NARROW side panel, so a
-   wide LLM table must scroll horizontally instead of collapsing its columns:
-   `.markdown` sets `word-break: break-word`, which (with the default table
-   layout) shrinks columns to a single glyph and wraps headers mid-word
-   ("Секция" -> "Секци / я"). Make the table a horizontally scrollable block,
-   give cells a readable minimum width, and restore word-boundary wrapping. */
-.markdown table {
-    display: block;
-    /* lets the table scroll horizontally on its own */
-    max-width: 100%;
-    overflow-x: auto;
-    border-collapse: collapse;
-    margin-block-end: 0.5em;
-}
-
-.markdown th,
-.markdown td {
-    border: 1px solid light-dark(var(--mantine-color-gray-3), var(--mantine-color-dark-4));
-    padding: 3px 8px;
-    /* readable floor; the block scrolls when the row exceeds the panel */
-    min-width: 6em;
-    text-align: left;
-    vertical-align: top;
-    /* cancel the inherited break-word so words don't split mid-glyph */
-    word-break: normal;
-    /* still wrap genuinely long words / URLs at the cell edge */
-    overflow-wrap: break-word;
-}
-
-.markdown th {
-    background: light-dark(var(--mantine-color-gray-1), var(--mantine-color-dark-5));
-    font-weight: 600;
-}
-
-/* GFM wraps cell text in <p>; drop its default block margin inside cells. */
-.markdown table p {
-    margin: 0;
-}
-
 /* Animated three-dot "typing" indicator shown while the agent is thinking but
   has not yet produced any visible text/tool parts. */
 .typingDots {
@@ -161,11 +122,7 @@
    margin-top: 4px;
    font-size: var(--mantine-font-size-xs);
    color: light-dark(var(--mantine-color-gray-7), var(--mantine-color-dark-1));
-    /* NOTE: `white-space: pre-wrap` is intentionally NOT set here. On the
-       rendered markdown <div> it would turn the newlines between block tags
-       (</li>\n<li>, </p>\n<ol>) into visible blank lines/indents on top of the
-       margins. The plain-text fallback <Text> that needs pre-wrap sets it
-       inline itself (see reasoning-block.tsx). */
+    white-space: pre-wrap;
 }

 .reasoningText p {
--- a/apps/client/src/features/ai-chat/components/chat-thread.tsx
+++ b/apps/client/src/features/ai-chat/components/chat-thread.tsx
@@ -1,4 +1,11 @@
-import { useCallback, useEffect, useMemo, useRef, useState } from "react";
+import {
+  useCallback,
+  useEffect,
+  useMemo,
+  useRef,
+  useState,
+  type MutableRefObject,
+} from "react";
 import { generateId } from "ai";
 import { ActionIcon, Box, Group, Stack, Text } from "@mantine/core";
 import { IconClockHour4, IconX } from "@tabler/icons-react";
@@ -20,6 +27,7 @@ import {
 } from "@/features/ai-chat/utils/role-launch.ts";
 import { describeChatError } from "@/features/ai-chat/utils/error-message.ts";
 import { extractServerChatId } from "@/features/ai-chat/utils/adopt-chat-id.ts";
+import { liveTurnTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
 import {
  dequeue,
  enqueueMessage,
@@ -60,12 +68,18 @@ interface ChatThreadProps {
   *  authoritative id the server streamed on the assistant message metadata, or
   *  undefined on a failed turn — see adopt-chat-id.ts for the full #137 design. */
  onTurnFinished: (serverChatId?: string) => void;
-  /** Called EARLY (at the stream's `start` chunk) with the authoritative server
-   *  chat id streamed on the assistant message metadata, so a brand-new chat
-   *  adopts its real id WHILE the first turn is still streaming (#174 — makes the
-   *  Copy/export button available mid-stream). Distinct from onTurnFinished,
-   *  which fires only at the terminal outcome. */
-  onServerChatId?: (serverChatId?: string) => void;
+  /** Parent-owned ref that this thread keeps updated with its live useChat
+   *  snapshot (full message list + streaming flag), so the header's
+   *  "Copy chat" export can include the in-progress, not-yet-persisted
+   *  assistant message. A ref (not state) avoids re-rendering the parent on
+   *  every streamed delta. */
+  liveStateRef?: MutableRefObject<{ messages: UIMessage[]; isStreaming: boolean }>;
+  /** Reports the live turn-token total (reasoning + output) for the in-flight
+   *  turn so the parent can show a header badge that ticks mid-stream. THROTTLED
+   *  here (~8 Hz) so the parent re-renders a handful of times a second, not on
+   *  every streamed delta. Called with `null` when no turn is in flight (the
+   *  parent then reverts the badge to the persisted context size). */
+  onLiveTurnTokens?: (tokens: number | null) => void;
 }

 /**
@@ -109,7 +123,8 @@ export default function ChatThread({
  onRolePicked,
  assistantName,
  onTurnFinished,
-  onServerChatId,
+  liveStateRef,
+  onLiveTurnTokens,
 }: ChatThreadProps) {
  const { t } = useTranslation();

@@ -278,26 +293,6 @@ export default function ChatThread({
  // Keep the flush helper pointed at the latest sendMessage instance.
  sendMessageRef.current = sendMessage;

-  // EARLY chat-id adoption (#174): the server streams the authoritative chat id
-  // on the assistant message metadata at the `start` chunk (message.metadata.
-  // chatId — see adopt-chat-id.ts / chatStreamMetadata). Forward it to the parent
-  // AS SOON AS it appears (mid-stream), so a brand-new chat adopts its real id
-  // WHILE the first turn is still streaming and activeChatId-gated affordances
-  // (the Copy/export button) light up immediately, instead of only at onFinish.
-  // Keyed by the last-seen id so we forward each distinct id exactly once. The
-  // parent's onServerChatId is idempotent and a no-op once the chat has an id.
-  const lastForwardedChatIdRef = useRef<string | undefined>(undefined);
-  useEffect(() => {
-    if (!onServerChatId) return;
-    const tail = messages[messages.length - 1];
-    if (tail?.role !== "assistant") return;
-    const serverChatId = extractServerChatId(tail);
-    if (!serverChatId || serverChatId === lastForwardedChatIdRef.current)
-      return;
-    lastForwardedChatIdRef.current = serverChatId;
-    onServerChatId(serverChatId);
-  }, [messages, onServerChatId]);
-
  // Live "turn was interrupted" marker for the CURRENT session. The red error
  // banner (driven by `error`) covers the error case; this covers an aborted
  // turn, distinguishing a manual Stop (`isAbort`) from a dropped connection
@@ -314,10 +309,70 @@ export default function ChatThread({
    if (isStreaming) setStopNotice(null);
  }, [isStreaming]);

+  // Mirror the live useChat snapshot into the parent-owned ref so the export
+  // (handled in AiChatWindow) can include the in-progress streaming turn. The
+  // cleanup clears the ref on unmount so a thread torn down by `key` on chat
+  // switch can't leak its (possibly still-streaming) tail into the next chat's
+  // export before the new thread's effect repopulates the ref.
+  useEffect(() => {
+    if (!liveStateRef) return;
+    liveStateRef.current = { messages, isStreaming };
+    return () => {
+      liveStateRef.current = { messages: [], isStreaming: false };
+    };
+  }, [liveStateRef, messages, isStreaming]);
+
+  // Report the live turn-token total to the parent header badge, THROTTLED to
+  // ~8 Hz so the parent re-renders a few times a second instead of on every
+  // streamed delta. The tail assistant message's reasoning+output (estimate while
+  // streaming, authoritative once a step reports usage) is the live figure. When
+  // the turn ends we emit a final exact value, then `null` so the parent reverts
+  // the badge to the persisted context size.
+  const lastEmitRef = useRef(0);
+  const emitTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+  useEffect(() => {
+    if (!onLiveTurnTokens) return;
+    if (!isStreaming) {
+      // Turn ended (or never started): clear any pending throttle and revert.
+      if (emitTimerRef.current) {
+        clearTimeout(emitTimerRef.current);
+        emitTimerRef.current = null;
+      }
+      lastEmitRef.current = 0;
+      onLiveTurnTokens(null);
+      return;
+    }
+    const tail = messages[messages.length - 1];
+    const live =
+      tail?.role === "assistant" ? liveTurnTokens(tail) : null;
+    const total = live ? live.reasoning + live.output : 0;
+    const now = Date.now();
+    const MIN_INTERVAL = 120; // ms (~8 Hz)
+    const elapsed = now - lastEmitRef.current;
+    if (elapsed >= MIN_INTERVAL) {
+      lastEmitRef.current = now;
+      onLiveTurnTokens(total);
+    } else if (!emitTimerRef.current) {
+      // Schedule a trailing emit so the FINAL value of a burst is not dropped.
+      emitTimerRef.current = setTimeout(() => {
+        emitTimerRef.current = null;
+        lastEmitRef.current = Date.now();
+        onLiveTurnTokens(total);
+      }, MIN_INTERVAL - elapsed);
+    }
+  }, [messages, isStreaming, onLiveTurnTokens]);
+
+  // Clear any pending throttle timer on unmount (chat switch via `key`) so a
+  // trailing emit can't fire into a torn-down thread's parent.
+  useEffect(() => {
+    return () => {
+      if (emitTimerRef.current) clearTimeout(emitTimerRef.current);
+    };
+  }, []);
+
  // Classify the turn error into a heading + detail so the banner names the cause
  // (connection reset, timeout, rate limit, context overflow, quota, ...) instead
-  // of a generic "Something went wrong". Computed here (not only in the JSX) so
-  // the SAME on-screen banner text can be mirrored into the export (issue #160).
+  // of a generic "Something went wrong".
  const errorView = error ? describeChatError(error.message ?? "", t) : null;

  // A role was picked with autoStart=false: the role is bound but NOTHING was
--- a/apps/client/src/features/ai-chat/components/context-badge.test.tsx
+++ b/apps/client/src/features/ai-chat/components/context-badge.test.tsx
@@ -1,69 +0,0 @@
-import { describe, it, expect } from "vitest";
-import { render, screen, fireEvent } from "@testing-library/react";
-import { MantineProvider } from "@mantine/core";
-import { ContextBadge, formatTokens } from "./context-badge";
-
-// matchMedia (read by MantineProvider) is stubbed globally in vitest.setup.ts.
-// Without an I18nextProvider, `t(key)` returns the key verbatim, so tooltip
-// labels assert against their English source strings.
-
-function renderBadge(props: {
-  contextTokens: number;
-  maxContextTokens?: number;
-}) {
-  return render(
-    <MantineProvider>
-      <ContextBadge {...props} />
-    </MantineProvider>,
-  );
-}
-
-describe("formatTokens", () => {
-  it("formats with k / M suffixes", () => {
-    expect(formatTokens(572)).toBe("572");
-    expect(formatTokens(200_000)).toBe("200.0k");
-    expect(formatTokens(1_500_000)).toBe("1.5M");
-  });
-});
-
-describe("ContextBadge", () => {
-  it("shows `current / max` when a limit is configured", () => {
-    renderBadge({ contextTokens: 572, maxContextTokens: 200_000 });
-    expect(screen.getByText("572 / 200.0k")).toBeDefined();
-  });
-
-  it("shows only the current size when no limit is configured", () => {
-    renderBadge({ contextTokens: 572, maxContextTokens: 0 });
-    expect(screen.getByText("572")).toBeDefined();
-    // No denominator rendered.
-    expect(screen.queryByText(/\//)).toBeNull();
-  });
-
-  it("treats an undefined limit as no limit", () => {
-    renderBadge({ contextTokens: 1234 });
-    expect(screen.getByText("1.2k")).toBeDefined();
-    expect(screen.queryByText(/\//)).toBeNull();
-  });
-
-  it("renders nothing until there is a current context size", () => {
-    const { container } = renderBadge({
-      contextTokens: 0,
-      maxContextTokens: 200_000,
-    });
-    expect(container.querySelector("span")).toBeNull();
-  });
-
-  it("never flips to a live per-turn counter (no live mode); shows context as-is even above max", () => {
-    // `current > max` (estimate drift / smaller-model role) is shown unclamped.
-    renderBadge({ contextTokens: 210_000, maxContextTokens: 200_000 });
-    expect(screen.getByText("210.0k / 200.0k")).toBeDefined();
-  });
-
-  it("exposes the limit tooltip label on hover", async () => {
-    renderBadge({ contextTokens: 572, maxContextTokens: 200_000 });
-    fireEvent.mouseEnter(screen.getByText("572 / 200.0k"));
-    expect(
-      await screen.findByText("Context size / model limit"),
-    ).toBeDefined();
-  });
-});
--- a/apps/client/src/features/ai-chat/components/context-badge.tsx
+++ b/apps/client/src/features/ai-chat/components/context-badge.tsx
@@ -1,61 +0,0 @@
-import { Tooltip } from "@mantine/core";
-import { useTranslation } from "react-i18next";
-import classes from "@/features/ai-chat/components/ai-chat-window.module.css";
-
-/** Compact token formatter: 1.2M / 3.4k / 950. */
-export function formatTokens(n: number): string {
-  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
-  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}k`;
-  return String(n);
-}
-
-interface ContextBadgeProps {
-  // Current context size for the active chat (tokens occupied in the model's
-  // window). 0 = unknown → nothing is rendered.
-  contextTokens: number;
-  // The model's context-window size (tokens), from AI settings. 0/undefined =
-  // no limit known → only the current size is shown (no denominator).
-  maxContextTokens?: number;
-}
-
-/**
- * Header badge that ALWAYS shows the current context size, and — when the model's
- * context-window size is configured — appends "/ max" so the badge reads
- * "current / max" (e.g. `572 / 200k`). This is a single, stable meaning: unlike
- * the previous design it never flips to a live per-turn generation counter while
- * streaming (that live feedback lives in the chat body's "Thinking · N tokens").
- *
- * No limit configured (or older history rows without it) → the denominator is
- * hidden and the badge shows the current size only, matching the prior at-rest
- * behaviour. `context > max` (estimate drift, or a role on a smaller model) is
- * shown as-is, without clamping.
- */
-export function ContextBadge({
-  contextTokens,
-  maxContextTokens,
-}: ContextBadgeProps) {
-  const { t } = useTranslation();
-
-  // Nothing to show until the first persisted context figure exists.
-  if (!(contextTokens > 0)) return null;
-
-  const hasMax = typeof maxContextTokens === "number" && maxContextTokens > 0;
-  const label = hasMax
-    ? `${formatTokens(contextTokens)} / ${formatTokens(maxContextTokens)}`
-    : formatTokens(contextTokens);
-
-  return (
-    <Tooltip
-      label={
-        hasMax
-          ? t("Context size / model limit")
-          : t("Current context size")
-      }
-      withArrow
-    >
-      <span className={classes.badge}>{label}</span>
-    </Tooltip>
-  );
-}
-
-export default ContextBadge;
--- a/apps/client/src/features/ai-chat/components/message-list.tsx
+++ b/apps/client/src/features/ai-chat/components/message-list.tsx
@@ -6,6 +6,7 @@ import MessageItem from "@/features/ai-chat/components/message-item.tsx";
 import TypingIndicator from "@/features/ai-chat/components/typing-indicator.tsx";
 import { isToolPart, toolRunState, ToolUiPart } from "@/features/ai-chat/utils/tool-parts.tsx";
 import { assistantMessageHasVisibleContent } from "@/features/ai-chat/utils/message-content.ts";
+import { liveTurnTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
 import classes from "@/features/ai-chat/components/ai-chat.module.css";

 interface MessageListProps {
@@ -50,9 +51,7 @@ const BOTTOM_THRESHOLD = 40;
 * assistant message's LAST part is not live output:
 *  - the last message is still the user's (assistant hasn't started a row), or
 *  - the assistant row has no parts yet, or
- *  - its last part is an empty/whitespace text part, or a finished ("done")
- *    text part while the turn continues (the model paused after some narration
- *    and is thinking about its next step), or
+ *  - its last part is an empty/whitespace text part, or
 *  - its last part is a finished/errored tool (the model is thinking about the
 *    next step between tool calls).
 * It hides only while output is actively rendering: a non-empty streaming text
@@ -66,19 +65,7 @@ export function showTypingIndicator(messages: UIMessage[], isStreaming: boolean)
  const lastPart = last.parts[last.parts.length - 1];
  if (!lastPart) return true; // assistant row exists but has no parts yet.
  // The answer text is actively streaming in -> MessageItem renders it; no dots.
-  // Only while it is STILL streaming, though: once a non-empty text part is
-  // finalized ("done") but the turn is still in flight, the model has paused
-  // after some narration and is working on its next step (e.g. about to call a
-  // tool) — nothing is visibly progressing, so the dots must show. A text part
-  // without a `state` is treated as still-rendering (kept suppressed); this
-  // branch only runs while streaming, where live parts always carry a state.
-  if (
-    lastPart.type === "text" &&
-    lastPart.text.trim().length > 0 &&
-    (lastPart as { state?: "streaming" | "done" }).state !== "done"
-  ) {
-    return false;
-  }
+  if (lastPart.type === "text" && lastPart.text.trim().length > 0) return false;
  // A tool still in flight shows its own Loader in ToolCallCard -> no dots.
  if (
    isToolPart(lastPart.type) &&
@@ -108,6 +95,19 @@ export function typingIndicatorShowsName(messages: UIMessage[]): boolean {
  return !assistantMessageHasVisibleContent(last);
 }

+/**
+ * The live thinking-token count to show on the standalone typing indicator. It
+ * is the reasoning split of the tail assistant message (estimate while streaming,
+ * authoritative once the server attaches usage at a step/turn boundary). Returns
+ * 0 when the turn has produced no reasoning yet — the indicator then shows the
+ * plain "Thinking…" line.
+ */
+export function tailThinkingTokens(messages: UIMessage[]): number {
+  const last = messages[messages.length - 1];
+  if (!last || last.role !== "assistant") return 0;
+  return liveTurnTokens(last).reasoning;
+}
+
 /**
 * Scrollable transcript. Auto-scrolls to the newest message as it streams in,
 * but only while the user is pinned to the bottom — if they scrolled up to read
@@ -208,6 +208,7 @@ export default function MessageList({
          <TypingIndicator
            assistantName={assistantName}
            showName={typingIndicatorShowsName(messages)}
+            thinkingTokens={tailThinkingTokens(messages)}
          />
        )}
      </Stack>
--- a/apps/client/src/features/ai-chat/components/reasoning-block.tsx
+++ b/apps/client/src/features/ai-chat/components/reasoning-block.tsx
@@ -3,7 +3,6 @@ import { Box, Collapse, Group, Text, UnstyledButton } from "@mantine/core";
 import { IconChevronDown } from "@tabler/icons-react";
 import { useTranslation } from "react-i18next";
 import { estimateTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
-import { collapseBlankLines } from "@/features/ai-chat/utils/collapse-blank-lines.ts";
 import { renderChatMarkdown } from "@/features/ai-chat/utils/markdown.ts";
 import classes from "@/features/ai-chat/components/ai-chat.module.css";

@@ -34,12 +33,7 @@ export default function ReasoningBlock({ text, tokens }: ReasoningBlockProps) {
  // Authoritative count wins; otherwise estimate live from the streamed text.
  const count = tokens && tokens > 0 ? tokens : estimateTokens(text);
  const trimmed = text.trim();
-  // Collapse the blank-line gaps the model emits between every list item /
-  // paragraph so the reasoning renders compactly (tight lists, joined
-  // paragraphs) — see collapseBlankLines. ONLY here, not in the normal answer.
-  const html = trimmed
-    ? renderChatMarkdown(collapseBlankLines(trimmed), {})
-    : "";
+  const html = trimmed ? renderChatMarkdown(trimmed, {}) : "";

  return (
    <Box className={classes.reasoningBlock} mb={6}>
--- a/apps/client/src/features/ai-chat/components/show-typing-indicator.test.ts
+++ b/apps/client/src/features/ai-chat/components/show-typing-indicator.test.ts
@@ -82,14 +82,4 @@ describe("showTypingIndicator", () => {
      showTypingIndicator([msg("assistant", [doneTool, text])], true),
    ).toBe(false);
  });
-
-  it("shows while streaming after a text part is finalized (paused before the next step)", () => {
-    const doneText = { type: "text", text: "Now creating the page in", state: "done" } as unknown as UIMessage["parts"][number];
-    expect(showTypingIndicator([msg("assistant", [doneText])], true)).toBe(true);
-  });
-
-  it("hides while a text part is actively streaming (state: streaming)", () => {
-    const streamingText = { type: "text", text: "Now writ", state: "streaming" } as unknown as UIMessage["parts"][number];
-    expect(showTypingIndicator([msg("assistant", [streamingText])], true)).toBe(false);
-  });
 });
--- a/apps/client/src/features/ai-chat/components/tail-thinking-tokens.test.ts
+++ b/apps/client/src/features/ai-chat/components/tail-thinking-tokens.test.ts
@@ -0,0 +1,50 @@
+import { describe, expect, it } from "vitest";
+import type { UIMessage } from "@ai-sdk/react";
+import { tailThinkingTokens } from "@/features/ai-chat/components/message-list.tsx";
+
+/**
+ * Pure-helper tests for `tailThinkingTokens`: the live thinking-token count the
+ * standalone typing indicator shows. It is the reasoning split of the tail
+ * assistant message (estimate while streaming, authoritative once usage arrives).
+ */
+const msg = (
+  role: "user" | "assistant",
+  parts: unknown[],
+  metadata?: unknown,
+): UIMessage =>
+  ({ id: Math.random().toString(), role, parts, metadata }) as UIMessage;
+
+describe("tailThinkingTokens", () => {
+  it("is 0 when there are no messages", () => {
+    expect(tailThinkingTokens([])).toBe(0);
+  });
+
+  it("is 0 when the tail message is the user's", () => {
+    expect(tailThinkingTokens([msg("user", [{ type: "text", text: "q" }])])).toBe(0);
+  });
+
+  it("is 0 when the assistant has produced no reasoning yet", () => {
+    expect(
+      tailThinkingTokens([msg("assistant", [{ type: "text", text: "answer" }])]),
+    ).toBe(0);
+  });
+
+  it("estimates reasoning tokens from streamed reasoning text", () => {
+    // 8 chars -> 2 tokens.
+    expect(
+      tailThinkingTokens([
+        msg("assistant", [{ type: "reasoning", text: "12345678" }]),
+      ]),
+    ).toBe(2);
+  });
+
+  it("uses authoritative usage.reasoningTokens once the server attaches it", () => {
+    expect(
+      tailThinkingTokens([
+        msg("assistant", [{ type: "reasoning", text: "x" }], {
+          usage: { outputTokens: 100, reasoningTokens: 42 },
+        }),
+      ]),
+    ).toBe(42);
+  });
+});
--- a/apps/client/src/features/ai-chat/components/typing-indicator.tsx
+++ b/apps/client/src/features/ai-chat/components/typing-indicator.tsx
@@ -16,6 +16,12 @@ interface TypingIndicatorProps {
   * assistant row above already shows the same name, to avoid a duplicate label.
   */
  showName?: boolean;
+  /**
+   * Live thinking/reasoning token count for the in-flight turn. When > 0 the
+   * typing line becomes `Thinking… · {count} tokens` (like Claude Code). Omitted
+   * / 0 keeps the plain `Thinking…` line.
+   */
+  thinkingTokens?: number;
 }

 /**
@@ -26,20 +32,23 @@ interface TypingIndicatorProps {
 *
 * Mirrors the assistant row layout in MessageItem (the dimmed label), so it reads
 * as the assistant's bubble taking shape. The dimmed label uses the configured
- * identity name when provided (otherwise the generic "AI agent"); below it the
- * animated dots stand in for the nascent bubble until content arrives.
+ * identity name when provided (otherwise the generic "AI agent"), while the
+ * typing line is always the generic "Thinking…" (it never includes the
+ * role/identity name).
 */
-export default function TypingIndicator({ assistantName, showName = true }: TypingIndicatorProps) {
+export default function TypingIndicator({ assistantName, showName = true, thinkingTokens }: TypingIndicatorProps) {
  const { t } = useTranslation();
  const name = resolveAssistantName(assistantName);
+  // Show the running thinking-token count only once there is something to count.
+  const thinkingLine =
+    thinkingTokens && thinkingTokens > 0
+      ? t("Thinking… · {{count}} tokens", { count: thinkingTokens })
+      : t("Thinking…");

  return (
    <Box className={classes.messageRow}>
      {showName !== false && (
-        // Extra bottom gap (vs MessageItem's mb={4}) gives the small bouncing
-        // dots room below the name label; without it they crowd the label. Only
-        // applies when the name is shown — the nameless case spaces fine on its own.
-        <Text size="xs" c="dimmed" mb={8}>
+        <Text size="xs" c="dimmed" mb={4}>
          {name ?? t("AI agent")}
        </Text>
      )}
@@ -49,6 +58,9 @@ export default function TypingIndicator({ assistantName, showName = true }: Typi
          <span />
          <span />
        </span>
+        <Text size="sm" c="dimmed">
+          {thinkingLine}
+        </Text>
      </Group>
    </Box>
  );
--- a/apps/client/src/features/ai-chat/hooks/use-chat-session.test.tsx
+++ b/apps/client/src/features/ai-chat/hooks/use-chat-session.test.tsx
@@ -64,10 +64,7 @@ describe("useChatSession", () => {
    result.current.onTurnFinished(undefined);
    expect(setActiveChatId).not.toHaveBeenCalled();
    // The refetch lands with the new row => adopt it.
-    rerender({
-      activeChatId: null,
-      chats: { items: [{ id: "x" }, { id: "new" }] },
-    });
+    rerender({ activeChatId: null, chats: { items: [{ id: "x" }, { id: "new" }] } });
    expect(setActiveChatId).toHaveBeenCalledWith("new");
  });

@@ -91,10 +88,7 @@ describe("useChatSession", () => {
    });
    result.current.onTurnFinished(undefined);
    // a was deleted, new was added — same length, but membership changed.
-    rerender({
-      activeChatId: null,
-      chats: { items: [{ id: "b" }, { id: "new" }] },
-    });
+    rerender({ activeChatId: null, chats: { items: [{ id: "b" }, { id: "new" }] } });
    expect(setActiveChatId).toHaveBeenCalledWith("new");
  });

@@ -177,40 +171,6 @@ describe("useChatSession", () => {
    expect(setActiveChatId).not.toHaveBeenCalledWith("late");
  });

-  it("#174 early adopt: onServerChatId adopts the streamed id mid-stream (Copy button available during the first turn)", () => {
-    // Brand-new chat: no id yet. The server streams the real chat id "A" on the
-    // `start` chunk WHILE the first turn is still streaming (before onTurnFinished
-    // fires at the terminal outcome). The hook must adopt it immediately so the
-    // window's activeChatId-gated Copy/export button lights up during the stream.
-    const { result, setActiveChatId } = setup({
-      activeChatId: null,
-      chats: { items: [] },
-    });
-    result.current.onServerChatId("A");
-    expect(setActiveChatId).toHaveBeenCalledWith("A");
-  });
-
-  it("#174 early adopt is in-place: threadKey stays stable (live stream not torn down)", () => {
-    const chats = { items: [] };
-    const { result, rerender } = setup({ activeChatId: null, chats });
-    const keyBefore = result.current.threadKey;
-    result.current.onServerChatId("A");
-    // Parent reflects the adopted id back in; the SAME mount key is kept so the
-    // in-flight useChat store (the streaming turn) is preserved.
-    rerender({ activeChatId: "A", chats });
-    expect(result.current.threadKey).toBe(keyBefore);
-  });
-
-  it("#174 early adopt: no-op for an existing chat and for a missing id", () => {
-    const { result, setActiveChatId } = setup({
-      activeChatId: "chat-1",
-      chats: { items: [{ id: "chat-1" }] },
-    });
-    result.current.onServerChatId("chat-1"); // already has an id
-    result.current.onServerChatId(undefined); // no streamed id
-    expect(setActiveChatId).not.toHaveBeenCalled();
-  });
-
  it("in-place adopt keeps threadKey stable; an external switch remounts", () => {
    const chats = { items: [{ id: "B" }] };
    const { result, rerender } = setup({ activeChatId: null, chats });
--- a/apps/client/src/features/ai-chat/hooks/use-chat-session.ts
+++ b/apps/client/src/features/ai-chat/hooks/use-chat-session.ts
@@ -34,13 +34,6 @@ export interface UseChatSessionResult {
  /** Call when a turn finishes; `serverChatId` is the authoritative streamed id
   *  (undefined on a failed turn). Handles new-chat id adoption + invalidations. */
  onTurnFinished: (serverChatId?: string) => void;
-  /** Call EARLY (at the stream's `start` chunk) with the authoritative streamed
-   *  chat id so a brand-new chat adopts its real id WHILE its first turn is still
-   *  streaming — making `activeChatId`-gated affordances (e.g. the Copy/export
-   *  button, #174) available immediately. In-place adoption only (same mount key,
-   *  no list/messages invalidation — that is left to onTurnFinished at the end).
-   *  Idempotent and a no-op once the chat already has an id. */
-  onServerChatId: (serverChatId?: string) => void;
  /** Disarm any pending error-path new-chat fallback. The window calls this from
   *  startNewChat/selectChat so a late refetch can't yank the user back into a
   *  just-failed chat after they explicitly moved on. */
@@ -92,10 +85,13 @@ export function useChatSession(
  // `newThread`/`switchThread` to (re)mount, `adoptThread` for in-place adoption.
  // Initial: a non-null activeChatId switches to it; a null one gets a fresh
  // session key with no chat id yet.
-  const [thread, dispatch] = useReducer(threadSessionReducer, undefined, () =>
-    activeChatId === null
-      ? newThread(`new-${generateId()}`)
-      : switchThread(activeChatId),
+  const [thread, dispatch] = useReducer(
+    threadSessionReducer,
+    undefined,
+    () =>
+      activeChatId === null
+        ? newThread(`new-${generateId()}`)
+        : switchThread(activeChatId),
  );

  // Error-path fallback for new-chat id adoption. When a brand-new chat's first
@@ -154,31 +150,6 @@ export function useChatSession(
    [chats, setActiveChatId, onInvalidateChatList, onInvalidateChatMessages],
  );

-  // EARLY adoption (#174): adopt the authoritative streamed chat id the moment
-  // the server emits it on the `start` chunk, so a brand-new chat gets its real
-  // `activeChatId` WHILE its first turn streams — not only at terminal
-  // onTurnFinished. This makes the activeChatId-gated Copy/export button
-  // available during the first turn. Pure in-place adoption (same mount key, like
-  // the primary path) with NO invalidation: the list/messages refresh stays on
-  // onTurnFinished at the end of the turn. Reads the live id from the ref so a
-  // repeat call after adoption is a no-op (resolveAdoptedChatId only fires for a
-  // still-new chat).
-  const onServerChatId = useCallback(
-    (serverChatId?: string) => {
-      const adopted = resolveAdoptedChatId(
-        activeChatIdRef.current,
-        serverChatId,
-      );
-      if (!adopted) return;
-      activeChatIdRef.current = adopted;
-      setActiveChatId(adopted);
-      dispatch({ type: "adopt", chatId: adopted });
-      // Early adoption beat the error-path fallback to it — disarm.
-      pendingNewChatRef.current = null;
-    },
-    [setActiveChatId],
-  );
-
  // FALLBACK resolver. Armed only by onTurnFinished when a brand-new chat's first
  // turn errored before the `start` chunk (no authoritative id streamed). Once
  // the per-user list refetch lands with the just-created row, adopt the SINGLE
@@ -262,7 +233,6 @@ export function useChatSession(
    threadKey: thread.key,
    waitingForHistory,
    onTurnFinished,
-    onServerChatId,
    cancelPendingAdoption,
  };
 }
--- a/apps/client/src/features/ai-chat/services/ai-chat-service.ts
+++ b/apps/client/src/features/ai-chat/services/ai-chat-service.ts
@@ -50,24 +50,6 @@ export async function deleteAiChat(chatId: string): Promise<void> {
  await api.post("/ai-chat/delete", { chatId });
 }

-/**
- * Export a chat to Markdown (#183). The server renders the transcript from the
- * persisted rows (the DB is the single source of truth — including an
- * interrupted turn's in-progress row, persisted upfront + per step), so the
- * client just copies the returned string. `lang` localizes the few fixed
- * role/tool labels; defaults to English server-side when omitted.
- */
-export async function exportAiChat(
-  chatId: string,
-  lang?: string,
-): Promise<string> {
-  const req = await api.post<{ markdown: string }>("/ai-chat/export", {
-    chatId,
-    lang,
-  });
-  return req.data.markdown;
-}
-
 /**
 * Agent roles API (`/ai-chat/roles`). `list` is available to any workspace
 * member (for the chat-creation picker); create/update/delete are admin-only
@@ -94,8 +76,6 @@ export async function updateAiRole(data: IAiRoleUpdate): Promise<IAiRole> {

 /** Soft-delete a role (admin). */
 export async function deleteAiRole(id: string): Promise<{ success: true }> {
-  const req = await api.post<{ success: true }>("/ai-chat/roles/delete", {
-    id,
-  });
+  const req = await api.post<{ success: true }>("/ai-chat/roles/delete", { id });
  return req.data;
 }
--- a/apps/client/src/features/ai-chat/types/ai-chat.types.ts
+++ b/apps/client/src/features/ai-chat/types/ai-chat.types.ts
@@ -113,14 +113,9 @@ export interface IAiChatMessageRow {
    };
    // Current context size for the turn = final-step (input+output) tokens, i.e.
    // how much the conversation occupies in the model's context window after this
-    // turn. Distinct from `usage` (legacy cumulative totalUsage). Shown as the
-    // numerator of the floating window's "current / max" header badge.
+    // turn. Distinct from `usage` (legacy cumulative totalUsage). Shown in the
+    // floating window's header badge.
    contextTokens?: number;
-    // The model's context-window size (tokens), admin-configured in AI settings
-    // and stamped onto the turn server-side. The denominator of the header badge.
-    // Absent/0 (older rows, or no limit configured) → the badge hides the
-    // denominator and shows only the current context size (`contextTokens`).
-    maxContextTokens?: number;
    // Set on an assistant row whose turn ended in a provider/stream error; the
    // raw provider error text (e.g. "402: ...") for inline display in the thread.
    error?: string;
--- a/apps/client/src/features/ai-chat/utils/chat-markdown.test.ts
+++ b/apps/client/src/features/ai-chat/utils/chat-markdown.test.ts
@@ -0,0 +1,491 @@
+import { describe, it, expect } from "vitest";
+import { buildChatMarkdown } from "@/features/ai-chat/utils/chat-markdown.ts";
+import type { IAiChatMessageRow } from "@/features/ai-chat/types/ai-chat.types.ts";
+
+/**
+ * Tests for the client-only Markdown export builder. The output embeds a live
+ * `new Date().toISOString()` export timestamp; we never assert that value, only
+ * the deterministic structure (headings, numbering, fenced blocks, totals).
+ *
+ * A pass-through translator keeps role/tool labels predictable so the
+ * structural assertions are stable without an i18n runtime.
+ */
+const t = (key: string, values?: Record<string, unknown>): string => {
+  if (values && typeof values.name === "string") {
+    return key.replace("{{name}}", values.name);
+  }
+  return key;
+};
+
+function row(partial: Partial<IAiChatMessageRow>): IAiChatMessageRow {
+  return {
+    id: partial.id ?? "id",
+    role: partial.role ?? "user",
+    content: partial.content ?? null,
+    metadata: partial.metadata ?? null,
+    createdAt: partial.createdAt ?? "2026-06-21T00:00:00.000Z",
+  };
+}
+
+describe("buildChatMarkdown — structure", () => {
+  it("emits the title heading, chat id and message count", () => {
+    const md = buildChatMarkdown({
+      title: "My chat",
+      chatId: "chat-123",
+      rows: [],
+      t,
+    });
+    expect(md).toContain("# My chat");
+    expect(md).toContain("- Chat ID: `chat-123`");
+    expect(md).toContain("- Messages: 0");
+    expect(md).toContain("- Exported:"); // timestamp present, value not asserted
+  });
+
+  it("falls back to the translated 'Untitled chat' for empty/blank titles", () => {
+    expect(
+      buildChatMarkdown({ title: null, chatId: "c", rows: [], t }),
+    ).toContain("# Untitled chat");
+    expect(
+      buildChatMarkdown({ title: "   ", chatId: "c", rows: [], t }),
+    ).toContain("# Untitled chat");
+  });
+
+  it("numbers rows sequentially with role headings", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({ role: "user", content: "hi" }),
+        row({ role: "assistant", content: "hello" }),
+        row({ role: "user", content: "again" }),
+      ],
+      t,
+    });
+    expect(md).toContain("## 1. You");
+    expect(md).toContain("## 2. AI agent");
+    expect(md).toContain("## 3. You");
+    // Heading numbering is strictly index+1, not e.g. role-relative.
+    expect(md).not.toContain("## 0.");
+  });
+
+  it("renders the per-row text content from `content` when no metadata.parts", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [row({ role: "user", content: "plain body" })],
+      t,
+    });
+    expect(md).toContain("plain body");
+  });
+});
+
+describe("buildChatMarkdown — text parts", () => {
+  it("skips empty / whitespace-only text parts", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "ignored-content",
+          metadata: {
+            parts: [
+              { type: "text", text: "   " },
+              { type: "text", text: "" },
+              { type: "text", text: "kept line" },
+              // eslint-disable-next-line @typescript-eslint/no-explicit-any
+            ] as any,
+          },
+        }),
+      ],
+      t,
+    });
+    expect(md).toContain("kept line");
+    // Whitespace-only part contributed no block of its own.
+    expect(md).not.toContain("   \n\n");
+    // When metadata.parts exists, the plain `content` fallback is NOT used.
+    expect(md).not.toContain("ignored-content");
+  });
+});
+
+describe("buildChatMarkdown — tool parts", () => {
+  it("renders a tool label, name, state and fenced Input/Output blocks", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "",
+          metadata: {
+            parts: [
+              {
+                type: "tool-getPage",
+                state: "output-available",
+                input: { pageId: "p1" },
+                output: { id: "p1", title: "Home" },
+                // eslint-disable-next-line @typescript-eslint/no-explicit-any
+              } as any,
+            ],
+          },
+        }),
+      ],
+      t,
+    });
+    // Known tool name maps to its label key; raw name in backticks; done state.
+    expect(md).toContain("**Tool: Read page** (`getPage`) — done");
+    expect(md).toContain("Input:");
+    expect(md).toContain("Output:");
+    // Fenced JSON blocks contain the stringified payloads.
+    expect(md).toContain('"pageId": "p1"');
+    expect(md).toContain('"title": "Home"');
+    expect(md).toContain("```json");
+  });
+
+  it("renders the generic label for an unknown tool and surfaces errorText", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "",
+          metadata: {
+            parts: [
+              {
+                type: "tool-mysteryTool",
+                state: "output-error",
+                input: { a: 1 },
+                errorText: "boom",
+                // eslint-disable-next-line @typescript-eslint/no-explicit-any
+              } as any,
+            ],
+          },
+        }),
+      ],
+      t,
+    });
+    expect(md).toContain("**Tool: Ran tool mysteryTool** (`mysteryTool`) — error");
+    expect(md).toContain("**Error:** boom");
+  });
+
+  it("does not throw on a circular tool input (falls back to String)", () => {
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const circular: any = {};
+    circular.self = circular;
+    expect(() =>
+      buildChatMarkdown({
+        title: "t",
+        chatId: "c",
+        rows: [
+          row({
+            role: "assistant",
+            content: "",
+            metadata: {
+              parts: [
+                {
+                  type: "tool-getPage",
+                  state: "input-available",
+                  input: circular,
+                  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+                } as any,
+              ],
+            },
+          }),
+        ],
+        t,
+      }),
+    ).not.toThrow();
+  });
+});
+
+describe("buildChatMarkdown — fence anti-breakout", () => {
+  it("lengthens the delimiter so embedded ``` cannot break out of the block", () => {
+    // Tool input whose stringified string form contains a literal ``` run.
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "",
+          metadata: {
+            parts: [
+              {
+                type: "tool-getPage",
+                state: "output-available",
+                // A bare string passes through stringify() verbatim.
+                input: "before ``` after",
+                output: "x",
+                // eslint-disable-next-line @typescript-eslint/no-explicit-any
+              } as any,
+            ],
+          },
+        }),
+      ],
+      t,
+    });
+    // The fence around the 3-backtick content must use at least 4 backticks so
+    // the embedded ``` run cannot terminate the block.
+    expect(md).toContain("````json\nbefore ``` after\n````");
+    // Robust anti-breakout check: the opening fence delimiter is strictly
+    // longer than the longest backtick run inside the wrapped content. (A naive
+    // `not.toContain("```json...")` is a false negative — a 4-backtick fence
+    // textually contains the 3-backtick substring.)
+    const open = md.match(/(`{3,})json\nbefore/);
+    expect(open).not.toBeNull();
+    expect(open![1].length).toBeGreaterThan(3); // > the 3-backtick run in content
+  });
+
+  it("uses a 5-backtick fence when the content has a 4-backtick run", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "",
+          metadata: {
+            parts: [
+              {
+                type: "tool-getPage",
+                state: "output-available",
+                input: "a ```` b",
+                // eslint-disable-next-line @typescript-eslint/no-explicit-any
+              } as any,
+            ],
+          },
+        }),
+      ],
+      t,
+    });
+    expect(md).toContain("`````json\na ```` b\n`````");
+  });
+});
+
+describe("buildChatMarkdown — token totals", () => {
+  it("prints the total-tokens line only when the summed usage is > 0", () => {
+    const withTokens = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "x",
+          metadata: { usage: { inputTokens: 10, outputTokens: 5 } },
+        }),
+      ],
+      t,
+    });
+    expect(withTokens).toContain("- Total tokens: 15");
+    // Per-row usage footer too.
+    expect(withTokens).toContain("_Tokens — in: 10, out: 5, total: 15_");
+  });
+
+  it("omits the total-tokens line when the sum is 0 / usage absent", () => {
+    const noTokens = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({ role: "user", content: "hi" }),
+        row({
+          role: "assistant",
+          content: "x",
+          metadata: { usage: { inputTokens: 0, outputTokens: 0 } },
+        }),
+      ],
+      t,
+    });
+    expect(noTokens).not.toContain("- Total tokens:");
+  });
+
+  it("uses totalTokens when present rather than summing in/out", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "x",
+          metadata: { usage: { inputTokens: 3, outputTokens: 4, totalTokens: 99 } },
+        }),
+      ],
+      t,
+    });
+    expect(md).toContain("- Total tokens: 99");
+  });
+
+  it("appends the reasoning figure to the row footer when reasoningTokens > 0", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "x",
+          metadata: {
+            usage: { inputTokens: 10, outputTokens: 8, reasoningTokens: 3 },
+          },
+        }),
+      ],
+      t,
+    });
+    expect(md).toContain("_Tokens — in: 10, out: 8, reasoning: 3, total: 18_");
+  });
+
+  it("omits the reasoning figure when reasoningTokens is 0 / absent", () => {
+    const zero = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "x",
+          metadata: {
+            usage: { inputTokens: 10, outputTokens: 5, reasoningTokens: 0 },
+          },
+        }),
+      ],
+      t,
+    });
+    expect(zero).toContain("_Tokens — in: 10, out: 5, total: 15_");
+    expect(zero).not.toContain("reasoning:");
+
+    const absent = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({
+          role: "assistant",
+          content: "x",
+          metadata: { usage: { inputTokens: 10, outputTokens: 5 } },
+        }),
+      ],
+      t,
+    });
+    expect(absent).not.toContain("reasoning:");
+  });
+});
+
+describe("buildChatMarkdown — pending / in-progress messages", () => {
+  it("continues the heading numbering after the persisted rows", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [row({ role: "user", content: "persisted" })],
+      pending: [
+        {
+          role: "user",
+          parts: [{ type: "text", text: "live question" }],
+          generating: false,
+        },
+        {
+          role: "assistant",
+          parts: [{ type: "text", text: "live answer" }],
+          generating: true,
+        },
+      ],
+      t,
+    });
+    expect(md).toContain("## 1. You");
+    expect(md).toContain("## 2. You");
+    expect(md).toContain("## 3. AI agent");
+    expect(md).toContain("live question");
+    expect(md).toContain("live answer");
+  });
+
+  it("flags a generating assistant pending message as still being generated", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [row({ role: "user", content: "persisted" })],
+      pending: [
+        {
+          role: "assistant",
+          parts: [{ type: "text", text: "partial reply" }],
+          generating: true,
+        },
+      ],
+      t,
+    });
+    expect(md).toContain("partial reply");
+    expect(md).toContain("still being generated");
+  });
+
+  it("renders a non-generating user pending message without the note", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [row({ role: "user", content: "persisted" })],
+      pending: [
+        {
+          role: "user",
+          parts: [{ type: "text", text: "my live message" }],
+          generating: false,
+        },
+      ],
+      t,
+    });
+    expect(md).toContain("my live message");
+    expect(md).not.toContain("still being generated");
+  });
+
+  it("includes the pending messages in the metadata message count", () => {
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [
+        row({ role: "user", content: "a" }),
+        row({ role: "assistant", content: "b" }),
+      ],
+      pending: [
+        {
+          role: "user",
+          parts: [{ type: "text", text: "c" }],
+          generating: false,
+        },
+        {
+          role: "assistant",
+          parts: [{ type: "text", text: "d" }],
+          generating: true,
+        },
+      ],
+      t,
+    });
+    // 2 persisted rows + 2 pending = 4.
+    expect(md).toContain("- Messages: 4");
+  });
+
+  it("emits the heading and note for a generating assistant with empty parts", () => {
+    expect(() =>
+      buildChatMarkdown({
+        title: "t",
+        chatId: "c",
+        rows: [row({ role: "user", content: "persisted" })],
+        pending: [
+          {
+            role: "assistant",
+            parts: [],
+            generating: true,
+          },
+        ],
+        t,
+      }),
+    ).not.toThrow();
+    const md = buildChatMarkdown({
+      title: "t",
+      chatId: "c",
+      rows: [row({ role: "user", content: "persisted" })],
+      pending: [
+        {
+          role: "assistant",
+          parts: [],
+          generating: true,
+        },
+      ],
+      t,
+    });
+    expect(md).toContain("## 2. AI agent");
+    expect(md).toContain("still being generated");
+  });
+});
--- a/apps/client/src/features/ai-chat/utils/chat-markdown.ts
+++ b/apps/client/src/features/ai-chat/utils/chat-markdown.ts
@@ -0,0 +1,215 @@
+/**
+ * Client-only Markdown builder for an AI agent chat. Serializes the already
+ * persisted message rows (loaded via `useAiChatMessagesQuery`) into a single
+ * Markdown string suitable for copying to the clipboard. NO network call is
+ * made and NO server/DB code is touched — this reuses the rich "request
+ * internals" (tool calls with input/output, per-message token usage,
+ * finish/error info) that the chat already holds client-side.
+ *
+ * Only role labels and tool action labels are localized via the passed-in `t`
+ * translator; the structural document words (Input/Output/Error/Tokens/...) are
+ * plain English constants because the output is a technical artifact.
+ */
+
+import type { IAiChatMessageRow } from "@/features/ai-chat/types/ai-chat.types.ts";
+import {
+  ToolUiPart,
+  getToolName,
+  toolRunState,
+  toolLabelKey,
+} from "@/features/ai-chat/utils/tool-parts.tsx";
+
+// Minimal translator signature compatible with react-i18next's `t`.
+type Translate = (key: string, values?: Record<string, unknown>) => string;
+
+interface BuildChatMarkdownArgs {
+  title: string | null;
+  chatId: string;
+  rows: IAiChatMessageRow[];
+  /** In-progress, not-yet-persisted live messages (the current streaming
+   *  turn) to append after the persisted rows. `generating: true` adds a
+   *  note that the message is still being produced. */
+  pending?: PendingMessage[];
+  t: Translate;
+}
+
+/** A single AI SDK UIMessage part (text part or other). */
+interface TextLikePart {
+  type: string;
+  text?: string;
+}
+
+/** A live, not-yet-persisted message (current streaming turn) to append. */
+interface PendingMessage {
+  role: "user" | "assistant" | string;
+  parts: TextLikePart[];
+  generating: boolean;
+}
+
+/**
+ * Stringify an arbitrary tool input/output value for a fenced block. Strings
+ * pass through as-is; everything else is pretty-printed JSON, falling back to
+ * `String(value)` if serialization throws (e.g. a circular structure).
+ */
+function stringify(value: unknown): string {
+  if (typeof value === "string") return value;
+  try {
+    return JSON.stringify(value, null, 2);
+  } catch {
+    return String(value);
+  }
+}
+
+/**
+ * Wrap `code` in a fenced code block whose backtick delimiter is LONGER than
+ * the longest backtick run inside the content, so embedded backticks (or even
+ * a literal ``` fence) never break out of the block. Minimum 3 backticks.
+ */
+function fence(code: string, lang = ""): string {
+  const runs: string[] = code.match(/`+/g) ?? [];
+  const longest = runs.reduce((m, s) => Math.max(m, s.length), 0);
+  const delim = "`".repeat(Math.max(3, longest + 1));
+  return `${delim}${lang}\n${code}\n${delim}`;
+}
+
+/** Per-row token count, mirroring the header sum in ai-chat-window.tsx. */
+function rowTokens(usage: {
+  inputTokens?: number;
+  outputTokens?: number;
+  totalTokens?: number;
+  reasoningTokens?: number;
+}): number {
+  return (
+    usage.totalTokens ?? (usage.inputTokens ?? 0) + (usage.outputTokens ?? 0)
+  );
+}
+
+/** Render one message's UIMessage parts into an array of Markdown blocks
+ *  (text blocks + tool blocks). Mirrors MessageItem's part handling. */
+function renderMessageParts(parts: TextLikePart[], t: Translate): string[] {
+  const out: string[] = [];
+
+  for (const part of parts) {
+    if (part.type === "text") {
+      const text = (part.text ?? "").trim();
+      // Skip empty/whitespace-only text parts (matches MessageItem).
+      if (text.length > 0) out.push(text);
+      continue;
+    }
+
+    const isToolPart =
+      part.type.startsWith("tool-") || part.type === "dynamic-tool";
+    if (!isToolPart) continue;
+
+    const tp = part as unknown as ToolUiPart;
+    const name = getToolName(tp);
+    const { key, values } = toolLabelKey(name);
+    const label = t(key, values);
+    const state = toolRunState(tp.state);
+
+    const toolLines: string[] = [
+      `**Tool: ${label}** (\`${name}\`) — ${state}`,
+    ];
+    if (tp.input !== undefined) {
+      toolLines.push("Input:");
+      toolLines.push(fence(stringify(tp.input), "json"));
+    }
+    if (tp.output !== undefined) {
+      toolLines.push("Output:");
+      toolLines.push(fence(stringify(tp.output), "json"));
+    }
+    if (tp.errorText) {
+      toolLines.push(`**Error:** ${tp.errorText}`);
+    }
+    out.push(toolLines.join("\n\n"));
+  }
+
+  return out;
+}
+
+/**
+ * Serialize a chat to a Markdown string. Pure (apart from `new Date()` for the
+ * export timestamp), so it is straightforward to unit-test.
+ */
+export function buildChatMarkdown(args: BuildChatMarkdownArgs): string {
+  const { title, chatId, rows, pending, t } = args;
+  const blocks: string[] = [];
+
+  const heading = (title ?? "").trim() || t("Untitled chat");
+  blocks.push(`# ${heading}`);
+
+  // Metadata bullet list. Total tokens is only shown when there is a sum.
+  const totalTokens = rows.reduce((sum, row) => {
+    const usage = row.metadata?.usage;
+    return usage ? sum + rowTokens(usage) : sum;
+  }, 0);
+  const meta = [
+    `- Chat ID: \`${chatId}\``,
+    `- Exported: ${new Date().toISOString()}`,
+    `- Messages: ${rows.length + (pending?.length ?? 0)}`,
+  ];
+  if (totalTokens > 0) meta.push(`- Total tokens: ${totalTokens}`);
+  blocks.push(meta.join("\n"));
+
+  rows.forEach((row, index) => {
+    blocks.push("---");
+
+    const roleLabel = row.role === "assistant" ? t("AI agent") : t("You");
+    blocks.push(`## ${index + 1}. ${roleLabel}`);
+
+    // Created-at kept in source as an HTML comment (out of the rendered prose).
+    blocks.push(`<!-- ${row.createdAt} -->`);
+
+    // Resolve parts: prefer the rich persisted parts, else a single text part
+    // built from the plain-text content (mirrors `rowToUiMessage`).
+    const parts: TextLikePart[] =
+      Array.isArray(row.metadata?.parts) && row.metadata.parts.length > 0
+        ? (row.metadata.parts as TextLikePart[])
+        : [{ type: "text", text: row.content ?? "" }];
+
+    blocks.push(...renderMessageParts(parts, t));
+
+    if (row.metadata?.error) {
+      blocks.push(`**⚠️ Error:** ${row.metadata.error}`);
+    }
+
+    const usage = row.metadata?.usage;
+    if (usage) {
+      const total = usage.totalTokens ?? rowTokens(usage);
+      // Reasoning (thinking) tokens are shown only when the provider reported a
+      // positive count; old rows / non-reasoning providers omit it.
+      const reasoning =
+        usage.reasoningTokens && usage.reasoningTokens > 0
+          ? `, reasoning: ${usage.reasoningTokens}`
+          : "";
+      blocks.push(
+        `_Tokens — in: ${usage.inputTokens ?? "?"}, out: ${usage.outputTokens ?? "?"}${reasoning}, total: ${total}_`,
+      );
+    }
+  });
+
+  // Append the in-progress, not-yet-persisted live messages (the current
+  // streaming turn) after the persisted rows. Heading numbering CONTINUES from
+  // the persisted rows. A `generating` assistant gets a note that the captured
+  // response is partial; pending messages carry no usage/token footer yet.
+  (pending ?? []).forEach((message, p) => {
+    blocks.push("---");
+
+    const num = rows.length + p + 1;
+    const roleLabel = message.role === "assistant" ? t("AI agent") : t("You");
+    blocks.push(`## ${num}. ${roleLabel}`);
+
+    blocks.push(...renderMessageParts(message.parts, t));
+
+    // A generating assistant may have empty/no parts yet — still emit the
+    // heading (above) and this note so the export shows the in-progress turn.
+    if (message.generating === true) {
+      blocks.push(
+        "_⏳ This message is still being generated — the export captured a partial, in-progress response._",
+      );
+    }
+  });
+
+  // Blank line between blocks so the Markdown renders cleanly.
+  return blocks.join("\n\n");
+}
--- a/apps/client/src/features/ai-chat/utils/collapse-blank-lines.test.ts
+++ b/apps/client/src/features/ai-chat/utils/collapse-blank-lines.test.ts
@@ -1,61 +0,0 @@
-import { describe, it, expect } from "vitest";
-import { collapseBlankLines } from "@/features/ai-chat/utils/collapse-blank-lines.ts";
-import { renderChatMarkdown } from "@/features/ai-chat/utils/markdown.ts";
-
-describe("collapseBlankLines", () => {
-  it("collapses a run of 2+ newlines to a single newline", () => {
-    expect(collapseBlankLines("a\n\nb")).toBe("a\nb");
-    expect(collapseBlankLines("a\n\n\n\nb")).toBe("a\nb");
-  });
-
-  it("keeps single newlines untouched", () => {
-    expect(collapseBlankLines("a\nb\nc")).toBe("a\nb\nc");
-  });
-
-  it("preserves blank lines INSIDE a fenced code block", () => {
-    const src = "a\n\n\nb\n\n```\nx\n\n\ny\n```\n\nc";
-    // Prose blanks collapse; the blank lines between the ``` fences survive.
-    expect(collapseBlankLines(src)).toBe("a\nb\n```\nx\n\n\ny\n```\nc");
-  });
-
-  it("handles a tilde fence and preserves its interior blanks", () => {
-    const src = "p\n\n~~~\ncode\n\nmore\n~~~\n\nq";
-    expect(collapseBlankLines(src)).toBe("p\n~~~\ncode\n\nmore\n~~~\nq");
-  });
-
-  it("leaves an unclosed fence's remaining lines verbatim", () => {
-    const src = "intro\n\n```\nstill\n\nopen";
-    expect(collapseBlankLines(src)).toBe("intro\n```\nstill\n\nopen");
-  });
-
-  it("is a no-op for text with no blank lines", () => {
-    expect(collapseBlankLines("just one line")).toBe("just one line");
-  });
-});
-
-describe("collapseBlankLines + renderChatMarkdown (tight reasoning rendering)", () => {
-  it("renders a blank-line-separated list as a TIGHT list (no <li><p>)", () => {
-    const loose =
-      "Intro paragraph.\n\n- item one\n\n- item two\n\n- item three";
-    const html = renderChatMarkdown(collapseBlankLines(loose), {});
-    // Tight list: each <li> holds the text directly, not wrapped in a <p>.
-    expect(html).toContain("<li>item one</li>");
-    expect(html).not.toContain("<li><p>");
-    // The list still parses as a list after the paragraph (not a paragraph+<br>).
-    expect(html).toContain("<ul>");
-    expect(html).toContain("<p>Intro paragraph.</p>");
-  });
-
-  it("renders an ordered list (1. 2.) as tight after collapsing", () => {
-    const loose = "Intro.\n\n1. first\n\n2. second";
-    const html = renderChatMarkdown(collapseBlankLines(loose), {});
-    expect(html).toContain("<ol>");
-    expect(html).toContain("<li>first</li>");
-    expect(html).not.toContain("<li><p>");
-  });
-
-  it("the loose source WOULD render <li><p> without collapsing (control)", () => {
-    const loose = "- a\n\n- b";
-    expect(renderChatMarkdown(loose, {})).toContain("<li><p>");
-  });
-});
--- a/apps/client/src/features/ai-chat/utils/collapse-blank-lines.ts
+++ b/apps/client/src/features/ai-chat/utils/collapse-blank-lines.ts
@@ -1,56 +0,0 @@
-// Pure helper for compact reasoning ("Thinking") rendering. Kept free of React
-// so it can be unit-tested in isolation (see collapse-blank-lines.test.ts).
-
-/**
- * Collapse runs of 2+ newlines down to a single newline, EXCEPT inside fenced
- * code blocks (``` ... ``` or ~~~ ... ~~~), where blank lines are significant.
- *
- * Why: reasoning models emit thinking with a blank line (`\n\n`) between every
- * list item and paragraph. `marked` turns those into "loose" lists (each `<li>`
- * wrapped in a `<p>`) and separate `<p>` paragraphs, each carrying a vertical
- * margin — so the "Thinking" block renders with large, airy gaps. Removing the
- * blank-line gaps yields tight lists (no `<li><p>`) and joined paragraphs. The
- * chat markdown renderer runs with `breaks: true`, so a single `\n` still
- * becomes a `<br>` — line breaks inside the reasoning are preserved; only the
- * empty gaps between blocks disappear. Apply ONLY to reasoning text, never to a
- * normal assistant answer (where paragraph spacing is intentional).
- *
- * Fenced code is preserved verbatim: a fence opens on a line whose first
- * non-space characters are ``` or ~~~ and closes on the next line that starts
- * with the same fence character. Blank lines between fences (significant for
- * code formatting) are never collapsed.
- */
-export function collapseBlankLines(text: string): string {
-  const lines = text.split("\n");
-  const out: string[] = [];
-  let inFence = false;
-  let fenceChar = "";
-
-  for (const line of lines) {
-    const fenceMatch = line.match(/^\s*(`{3,}|~{3,})/);
-    if (fenceMatch) {
-      const ch = fenceMatch[1][0];
-      if (!inFence) {
-        inFence = true;
-        fenceChar = ch;
-      } else if (ch === fenceChar) {
-        inFence = false;
-      }
-      out.push(line);
-      continue;
-    }
-
-    // Inside a fenced block every line (including blanks) is significant.
-    if (inFence) {
-      out.push(line);
-      continue;
-    }
-
-    // Outside fences: drop blank lines so a `\n\n+` gap collapses to a single
-    // `\n` between the surrounding content lines.
-    if (line.trim() === "") continue;
-    out.push(line);
-  }
-
-  return out.join("\n");
-}
--- a/apps/client/src/features/ai-chat/utils/count-stream-tokens.test.ts
+++ b/apps/client/src/features/ai-chat/utils/count-stream-tokens.test.ts
@@ -1,5 +1,17 @@
 import { describe, expect, it } from "vitest";
-import { estimateTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
+import type { UIMessage } from "@ai-sdk/react";
+import {
+  estimateTokens,
+  liveTurnTokens,
+} from "@/features/ai-chat/utils/count-stream-tokens.ts";
+
+const msg = (parts: unknown[], metadata?: unknown): UIMessage =>
+  ({
+    id: Math.random().toString(),
+    role: "assistant",
+    parts,
+    metadata,
+  }) as UIMessage;

 describe("estimateTokens", () => {
  it("returns 0 for the empty string", () => {
@@ -13,3 +25,95 @@ describe("estimateTokens", () => {
    expect(estimateTokens("12345678")).toBe(2);
  });
 });
+
+describe("liveTurnTokens — estimate path", () => {
+  it("is all zeros for an undefined message", () => {
+    expect(liveTurnTokens(undefined)).toEqual({
+      reasoning: 0,
+      output: 0,
+      authoritative: false,
+    });
+  });
+
+  it("is all zeros for a parts-less message", () => {
+    expect(liveTurnTokens({ id: "x", role: "assistant" } as UIMessage)).toEqual({
+      reasoning: 0,
+      output: 0,
+      authoritative: false,
+    });
+  });
+
+  it("estimates output from text parts", () => {
+    // 8 chars -> 2 tokens.
+    const r = liveTurnTokens(msg([{ type: "text", text: "12345678" }]));
+    expect(r).toEqual({ reasoning: 0, output: 2, authoritative: false });
+  });
+
+  it("estimates reasoning from reasoning parts (kept separate from output)", () => {
+    const r = liveTurnTokens(
+      msg([
+        { type: "reasoning", text: "12345678" },
+        { type: "text", text: "abcd" },
+      ]),
+    );
+    expect(r).toEqual({ reasoning: 2, output: 1, authoritative: false });
+  });
+
+  it("accumulates across multiple text + reasoning parts (multi-step)", () => {
+    const r = liveTurnTokens(
+      msg([
+        { type: "reasoning", text: "abcd" }, // 1
+        { type: "text", text: "abcd" }, // 1
+        { type: "tool-getPage", state: "output-available" }, // ignored
+        { type: "reasoning", text: "abcd" }, // 1
+        { type: "text", text: "abcdefgh" }, // 2
+      ]),
+    );
+    expect(r).toEqual({ reasoning: 2, output: 3, authoritative: false });
+  });
+
+  it("ignores non text/reasoning parts (tools, step-start)", () => {
+    const r = liveTurnTokens(
+      msg([
+        { type: "step-start" },
+        { type: "tool-getPage", state: "input-available" },
+      ]),
+    );
+    expect(r).toEqual({ reasoning: 0, output: 0, authoritative: false });
+  });
+});
+
+describe("liveTurnTokens — authoritative path", () => {
+  it("returns authoritative usage verbatim, splitting reasoning out of output", () => {
+    // outputTokens INCLUDES reasoning in the AI SDK shape -> answer = 100 - 30.
+    const r = liveTurnTokens(
+      msg([{ type: "text", text: "estimate would be tiny" }], {
+        usage: { inputTokens: 500, outputTokens: 100, reasoningTokens: 30 },
+      }),
+    );
+    expect(r).toEqual({ reasoning: 30, output: 70, authoritative: true });
+  });
+
+  it("treats missing reasoningTokens as 0 and keeps full output", () => {
+    const r = liveTurnTokens(
+      msg([{ type: "text", text: "x" }], {
+        usage: { inputTokens: 10, outputTokens: 42 },
+      }),
+    );
+    expect(r).toEqual({ reasoning: 0, output: 42, authoritative: true });
+  });
+
+  it("never returns a negative output when reasoning exceeds reported output", () => {
+    const r = liveTurnTokens(
+      msg([], { usage: { outputTokens: 10, reasoningTokens: 40 } }),
+    );
+    expect(r).toEqual({ reasoning: 40, output: 0, authoritative: true });
+  });
+
+  it("falls back to the estimate when metadata has no usage object", () => {
+    const r = liveTurnTokens(
+      msg([{ type: "text", text: "abcd" }], { chatId: "c1" }),
+    );
+    expect(r).toEqual({ reasoning: 0, output: 1, authoritative: false });
+  });
+});
--- a/apps/client/src/features/ai-chat/utils/count-stream-tokens.ts
+++ b/apps/client/src/features/ai-chat/utils/count-stream-tokens.ts
@@ -1,16 +1,18 @@
+import type { UIMessage } from "@ai-sdk/react";
+
 /**
- * Live token ESTIMATION for a streaming AI-chat turn.
+ * Live token counting for a streaming AI-chat turn — split into REASONING
+ * (thinking) and OUTPUT (answer) tokens, mirroring how Claude Code shows
+ * `Thinking… · 60 tokens` next to its thinking indicator.
 *
 * No provider streams exact per-token usage mid-stream, so the live number is a
- * CLIENT ESTIMATE (chars/≈4 heuristic). It powers the chat body's
- * `Thinking… · N tokens` indicator (see `ReasoningBlock`), which reconciles to
- * the authoritative server usage once it lands. Pure + unit-testable: it never
- * runs a real BPE tokenizer (that would be O(n²) on the hot path, bloat the
+ * CLIENT ESTIMATE (chars/≈4 heuristic) that is reconciled to AUTHORITATIVE usage
+ * once the server attaches it on a step/turn boundary (see the server's
+ * `chatStreamMetadata` + the client's read of `message.metadata.usage`). When
+ * authoritative usage is present we return it verbatim (the number "jumps to
+ * exact"); otherwise we return the running estimate. Pure + unit-testable: it
+ * never runs a real BPE tokenizer (that would be O(n²) on the hot path, bloat the
 * bundle, and be wrong for Gemini/Ollama anyway).
- *
- * The former header-badge `liveTurnTokens()` split was removed with #189 (the
- * header badge now shows the stable "current / max" context size, not a live
- * per-turn counter); the live feedback remains in `ReasoningBlock`.
 */

 /**
@@ -22,3 +24,71 @@ export function estimateTokens(text: string): number {
  if (!text) return 0;
  return Math.ceil(text.length / 4);
 }
+
+/** Authoritative per-step/turn usage the server attaches to message metadata. */
+export interface AuthoritativeUsage {
+  inputTokens?: number;
+  outputTokens?: number;
+  totalTokens?: number;
+  reasoningTokens?: number;
+}
+
+/** Live token split for a turn's tail (streaming) assistant message. */
+export interface LiveTurnTokens {
+  /** Thinking/reasoning tokens (estimate, or authoritative when available). */
+  reasoning: number;
+  /** Answer/output tokens (estimate, or authoritative when available). */
+  output: number;
+  /** True when the numbers come from authoritative server usage, not estimate. */
+  authoritative: boolean;
+}
+
+/** Read the authoritative usage off a UIMessage's metadata, if the server set it. */
+function metadataUsage(message: UIMessage): AuthoritativeUsage | undefined {
+  const meta = message?.metadata as
+    | { usage?: AuthoritativeUsage }
+    | undefined;
+  const usage = meta?.usage;
+  if (!usage || typeof usage !== "object") return undefined;
+  return usage;
+}
+
+/**
+ * Token split for the given (streaming) assistant message.
+ *
+ * Prefers AUTHORITATIVE `metadata.usage` when the server has attached it (at a
+ * step/turn boundary, incl. `reasoningTokens`) — so the live counter snaps to the
+ * provider's exact figures. Until then it returns a running ESTIMATE summed over
+ * the message parts: `reasoning` parts feed the reasoning estimate, `text` parts
+ * feed the output estimate. Multi-part / multi-step turns accumulate naturally
+ * because every part of the turn is summed.
+ *
+ * Providers that don't stream reasoning text still surface a reasoning count once
+ * the authoritative usage arrives (`usage.reasoningTokens`); on the pure estimate
+ * path such a turn simply shows `reasoning: 0` until then.
+ */
+export function liveTurnTokens(message: UIMessage | undefined): LiveTurnTokens {
+  if (!message) return { reasoning: 0, output: 0, authoritative: false };
+
+  const usage = metadataUsage(message);
+  if (usage) {
+    // Authoritative branch: outputTokens already INCLUDES reasoning tokens in the
+    // AI SDK usage shape, so subtract reasoning out for the "answer" figure (never
+    // go negative if a provider reports them inconsistently).
+    const reasoning = usage.reasoningTokens ?? 0;
+    const totalOutput = usage.outputTokens ?? 0;
+    const output = Math.max(0, totalOutput - reasoning);
+    return { reasoning, output, authoritative: true };
+  }
+
+  let reasoning = 0;
+  let output = 0;
+  for (const part of message.parts ?? []) {
+    if (part.type === "reasoning") {
+      reasoning += estimateTokens((part as { text?: string }).text ?? "");
+    } else if (part.type === "text") {
+      output += estimateTokens((part as { text?: string }).text ?? "");
+    }
+  }
+  return { reasoning, output, authoritative: false };
+}
--- a/apps/client/src/features/editor/components/footnote/footnote-definition-view.tsx
+++ b/apps/client/src/features/editor/components/footnote/footnote-definition-view.tsx
@@ -1,45 +1,25 @@
 import { NodeViewContent, NodeViewProps, NodeViewWrapper } from "@tiptap/react";
 import { useTranslation } from "react-i18next";
-import { getFootnoteNumber, getFootnoteRefCount } from "@docmost/editor-ext";
+import { getFootnoteNumber } from "@docmost/editor-ext";
 import classes from "./footnote.module.css";

-/**
- * A 0-based backlink index -> its lowercase letter label (0 -> "a", 25 -> "z",
- * 26 -> "aa", ...), matching the Pandoc/Wikipedia "↩ a b c" convention.
- */
-export function backlinkLabel(index: number): string {
-  let out = "";
-  let x = index;
-  while (x >= 0) {
-    out = String.fromCharCode(97 + (x % 26)) + out;
-    x = Math.floor(x / 26) - 1;
-  }
-  return out;
-}
-
 /**
 * NodeView for a single footnote definition: a decorative number marker, the
 * editable content (NodeViewContent), and a "↩" back-link to its reference.
 * The number is derived from the document (not stored).
- *
- * After #166 a footnote can be referenced more than once (one number, one
- * definition, N forward links). When it is, the back-link becomes a row of
- * per-occurrence links — ↩ a b c … — each scrolling to its own reference (#168);
- * a single-reference footnote keeps the plain ↩.
 */
 export default function FootnoteDefinitionView(props: NodeViewProps) {
  const { node, editor } = props;
  const { t } = useTranslation();
  const id = node.attrs.id as string;

-  // Read the cached number/ref-count from the numbering plugin (computed once
-  // per doc change) rather than recomputing the whole map on every render.
+  // Read the cached number from the numbering plugin (computed once per doc
+  // change) rather than recomputing the whole map on every render.
  const number = getFootnoteNumber(editor.state, id) ?? "?";
-  const refCount = getFootnoteRefCount(editor.state, id);

-  const jumpTo = (e: React.MouseEvent, index: number) => {
+  const handleBack = (e: React.MouseEvent) => {
    e.preventDefault();
-    editor.commands.scrollToReference(id, index);
+    editor.commands.scrollToReference(id);
  };

  return (
@@ -62,47 +42,16 @@ export default function FootnoteDefinitionView(props: NodeViewProps) {
      >
        {number}.
      </span>
-      {refCount > 1 ? (
-        // Multiple references -> ↩ followed by one lettered link per occurrence.
-        <span
-          className={classes.backLinks}
-          contentEditable={false}
-          role="group"
-          aria-label={t("Back to references")}
-        >
-          <span className={classes.backLinkArrow} aria-hidden="true">
-            ↩
-          </span>
-          {Array.from({ length: refCount }, (_, i) => (
-            <span
-              key={i}
-              className={classes.backLink}
-              onClick={(e) => jumpTo(e, i)}
-              role="button"
-              aria-label={t("Back to reference {{label}}", {
-                label: backlinkLabel(i),
-              })}
-              title={t("Back to reference {{label}}", {
-                label: backlinkLabel(i),
-              })}
-            >
-              {backlinkLabel(i)}
-            </span>
-          ))}
-        </span>
-      ) : (
-        // Single reference -> the plain ↩ (unchanged behavior).
-        <span
-          className={classes.backLink}
-          contentEditable={false}
-          onClick={(e) => jumpTo(e, 0)}
-          role="button"
-          aria-label={t("Back to reference")}
-          title={t("Back to reference")}
-        >
-          ↩
-        </span>
-      )}
+      <span
+        className={classes.backLink}
+        contentEditable={false}
+        onClick={handleBack}
+        role="button"
+        aria-label={t("Back to reference")}
+        title={t("Back to reference")}
+      >
+        ↩
+      </span>
    </NodeViewWrapper>
  );
 }
--- a/apps/client/src/features/editor/components/footnote/footnote-views.structure.test.tsx
+++ b/apps/client/src/features/editor/components/footnote/footnote-views.structure.test.tsx
@@ -1,5 +1,5 @@
-import { describe, it, expect, vi, afterEach } from "vitest";
-import { render, fireEvent } from "@testing-library/react";
+import { describe, it, expect, vi } from "vitest";
+import { render } from "@testing-library/react";

 /**
 * Structural regression guard for #146 (PR #147).
@@ -36,14 +36,10 @@ vi.mock("react-i18next", () => ({
  useTranslation: () => ({ t: (key: string) => key }),
 }));

-// footnote-definition-view reads a cached number + reference count from the
-// numbering plugin; stub them so we don't need a live ProseMirror state. The
-// ref-count is a hoisted mutable so a test can drive the single-vs-multi
-// backlink branch (#168). Default 1 = single reference (the #146 cases).
-const { mockRefCount } = vi.hoisted(() => ({ mockRefCount: { value: 1 } }));
+// footnote-definition-view reads a cached number from the numbering plugin;
+// stub it so we don't need a live ProseMirror state.
 vi.mock("@docmost/editor-ext", () => ({
  getFootnoteNumber: () => 1,
-  getFootnoteRefCount: () => mockRefCount.value,
 }));

 // Mocks so CodeBlockView renders cheaply (no MantineProvider, no matchMedia).
@@ -63,8 +59,7 @@ vi.mock("@mantine/core", () => ({
  ),
 }));
 vi.mock("@/components/common/copy-button", () => ({
-  CopyButton: ({ children }: any) =>
-    children({ copied: false, copy: () => {} }),
+  CopyButton: ({ children }: any) => children({ copied: false, copy: () => {} }),
 }));
 vi.mock("@tabler/icons-react", () => ({
  IconCheck: () => null,
@@ -75,9 +70,7 @@ vi.mock("@/features/editor/components/code-block/mermaid-view.tsx", () => ({
 }));

 import FootnotesListView from "./footnotes-list-view";
-import FootnoteDefinitionView, {
-  backlinkLabel,
-} from "./footnote-definition-view";
+import FootnoteDefinitionView from "./footnote-definition-view";
 import CodeBlockView from "../code-block/code-block-view";

 // Minimal NodeViewProps stub: definition view only touches node.attrs.id and
@@ -148,84 +141,3 @@ describe("#146 editable NodeView contentDOM-first invariant", () => {
    },
  );
 });
-
-// #168: a footnote referenced more than once shows one lettered backlink per
-// occurrence (↩ a b c), each scrolling to its own reference; a single-reference
-// footnote keeps the plain ↩.
-describe("#168 footnote definition multi-backlinks", () => {
-  afterEach(() => {
-    // Reset the shared ref-count mock so other tests see a single reference.
-    mockRefCount.value = 1;
-  });
-
-  const makeProps = () =>
-    ({
-      node: { attrs: { id: "fn-1" }, textContent: "" },
-      editor: {
-        state: {},
-        isEditable: true,
-        commands: { scrollToReference: vi.fn() },
-      },
-      getPos: () => 0,
-      updateAttributes: () => {},
-      deleteNode: () => {},
-    }) as any;
-
-  it("renders one lettered backlink per reference (a, b, c) plus the ↩ arrow", () => {
-    mockRefCount.value = 3;
-    const { getByTestId } = render(<FootnoteDefinitionView {...makeProps()} />);
-    const wrapper = getByTestId("nvw");
-
-    const links = wrapper.querySelectorAll('[role="button"]');
-    expect(Array.from(links).map((l) => l.textContent)).toEqual([
-      "a",
-      "b",
-      "c",
-    ]);
-    // The ↩ arrow is present (as decorative chrome, not a button).
-    expect(wrapper.textContent).toContain("↩");
-  });
-
-  it("clicking the n-th backlink scrolls to the n-th occurrence (0-based)", () => {
-    mockRefCount.value = 3;
-    const props = makeProps();
-    const { getByTestId } = render(<FootnoteDefinitionView {...props} />);
-    const links = getByTestId("nvw").querySelectorAll('[role="button"]');
-
-    fireEvent.click(links[1]); // "b"
-    expect(props.editor.commands.scrollToReference).toHaveBeenCalledWith(
-      "fn-1",
-      1,
-    );
-  });
-
-  it("a single-reference footnote renders just one ↩ (no letters)", () => {
-    mockRefCount.value = 1;
-    const props = makeProps();
-    const { getByTestId } = render(<FootnoteDefinitionView {...props} />);
-    const wrapper = getByTestId("nvw");
-
-    const links = wrapper.querySelectorAll('[role="button"]');
-    expect(links.length).toBe(1);
-    expect(links[0].textContent).toBe("↩");
-
-    fireEvent.click(links[0]);
-    expect(props.editor.commands.scrollToReference).toHaveBeenCalledWith(
-      "fn-1",
-      0,
-    );
-  });
-});
-
-// #185 re-review pt 7: backlinkLabel is base-26 (a..z, then aa…). The component
-// tests only cover a,b,c (index 0-2); pin the >= 26 carry boundary.
-describe("backlinkLabel base-26 boundary (#168)", () => {
-  it("maps 0->a, 25->z, 26->aa, 27->ab, 51->az, 52->ba", () => {
-    expect(backlinkLabel(0)).toBe("a");
-    expect(backlinkLabel(25)).toBe("z");
-    expect(backlinkLabel(26)).toBe("aa");
-    expect(backlinkLabel(27)).toBe("ab");
-    expect(backlinkLabel(51)).toBe("az");
-    expect(backlinkLabel(52)).toBe("ba");
-  });
-});
--- a/apps/client/src/features/editor/components/footnote/footnote.module.css
+++ b/apps/client/src/features/editor/components/footnote/footnote.module.css
@@ -115,18 +115,3 @@
 .backLink:hover {
  text-decoration: underline;
 }
-
-/* Multi-backlink row (#168): ↩ a b c — one lettered link per reference
-   occurrence. Sits on the right, after the content, like the single ↩. */
-.backLinks {
-  flex: 0 0 auto;
-  display: inline-flex;
-  align-items: baseline;
-  gap: 0.3em;
-  user-select: none;
-}
-
-.backLinkArrow {
-  color: var(--mantine-color-dimmed);
-  font-size: 0.9em;
-}
--- a/apps/client/src/features/page/queries/page-query.ts
+++ b/apps/client/src/features/page/queries/page-query.ts
@@ -274,10 +274,7 @@ export function useRestorePageMutation() {
      queryClient.setQueryData<IPage>(["pages", restoredPage.slugId], merge);
    },
    onError: (error) => {
-      notifications.show({
-        message: t("Failed to restore page"),
-        color: "red",
-      });
+      notifications.show({ message: t("Failed to restore page"), color: "red" });
    },
  });
 }
@@ -288,10 +285,10 @@ export function useGetSidebarPagesQuery(
  return useInfiniteQuery({
    queryKey: ["sidebar-pages", data],
    enabled: !!data?.pageId || !!data?.spaceId,
-    queryFn: ({ pageParam }) =>
-      getSidebarPages({ ...data, cursor: pageParam, limit: 100 }),
+    queryFn: ({ pageParam }) => getSidebarPages({ ...data, cursor: pageParam, limit: 100 }),
    initialPageParam: undefined,
-    getNextPageParam: (lastPage) => lastPage.meta?.nextCursor ?? undefined,
+    getNextPageParam: (lastPage) =>
+      lastPage.meta?.nextCursor ?? undefined,
  });
 }

@@ -299,14 +296,11 @@ export function useGetRootSidebarPagesQuery(data: SidebarPagesParams) {
  return useInfiniteQuery({
    queryKey: ["root-sidebar-pages", data.spaceId],
    queryFn: async ({ pageParam }) => {
-      return getSidebarPages({
-        spaceId: data.spaceId,
-        cursor: pageParam,
-        limit: 100,
-      });
+      return getSidebarPages({ spaceId: data.spaceId, cursor: pageParam, limit: 100 });
    },
    initialPageParam: undefined,
-    getNextPageParam: (lastPage) => lastPage.meta?.nextCursor ?? undefined,
+    getNextPageParam: (lastPage) =>
+      lastPage.meta?.nextCursor ?? undefined,
  });
 }

@@ -329,17 +323,12 @@ export function usePageBreadcrumbsQuery(
  });
 }

-export async function fetchAllAncestorChildren(
-  params: SidebarPagesParams,
-  // `fresh: true` forces a server refetch (staleTime 0) — used by the reconnect
-  // refresh (#159 #8), which must NOT receive the 30-min-cached children.
-  opts?: { fresh?: boolean },
-) {
+export async function fetchAllAncestorChildren(params: SidebarPagesParams) {
  // not using a hook here, so we can call it inside a useEffect hook
  const response = await queryClient.fetchQuery({
    queryKey: ["sidebar-pages", params],
    queryFn: () => getAllSidebarPages(params),
-    staleTime: opts?.fresh ? 0 : 30 * 60 * 1000,
+    staleTime: 30 * 60 * 1000,
  });

  const allItems = response.pages.flatMap((page) => page.items);
@@ -358,15 +347,11 @@ export function useRecentChangesQuery(spaceId?: string) {
  });
 }

-export function useCreatedByQuery(params?: {
-  userId?: string;
-  spaceId?: string;
-}) {
+export function useCreatedByQuery(params?: { userId?: string; spaceId?: string }) {
  const { userId, spaceId } = params ?? {};
  return useInfiniteQuery({
    queryKey: ["pages-created-by-user", { userId, spaceId }],
-    queryFn: ({ pageParam }) =>
-      getCreatedByPages({ userId, spaceId, cursor: pageParam, limit: 15 }),
+    queryFn: ({ pageParam }) => getCreatedByPages({ userId, spaceId, cursor: pageParam, limit: 15 }),
    initialPageParam: undefined as string | undefined,
    getNextPageParam: (lastPage) =>
      lastPage.meta.hasNextPage ? lastPage.meta.nextCursor : undefined,
--- a/apps/client/src/features/page/tree/components/space-tree.tsx
+++ b/apps/client/src/features/page/tree/components/space-tree.tsx
@@ -29,11 +29,9 @@ import {
  collectBranchIds,
  openBranches,
  closeIds,
-  loadedOpenBranchIds,
 } from "@/features/page/tree/utils/utils.ts";
 import { SpaceTreeNode } from "@/features/page/tree/types.ts";
 import { treeModel } from "@/features/page/tree/model/tree-model";
-import { socketAtom } from "@/features/websocket/atoms/socket-atom.ts";
 import {
  getPageBreadcrumbs,
  getSpaceTree,
@@ -41,7 +39,11 @@ import {
 import { IPage } from "@/features/page/types/page.types.ts";
 import { extractPageSlugId } from "@/lib";
 import { isCompactPageTreeEnabled } from "@/lib/config.ts";
-import { DocTree, ROW_HEIGHT_COMPACT, ROW_HEIGHT_STANDARD } from "./doc-tree";
+import {
+  DocTree,
+  ROW_HEIGHT_COMPACT,
+  ROW_HEIGHT_STANDARD,
+} from "./doc-tree";
 import { SpaceTreeRow } from "./space-tree-row";

 interface SpaceTreeProps {
@@ -191,54 +193,6 @@ const SpaceTree = forwardRef<SpaceTreeApi, SpaceTreeProps>(function SpaceTree(
    [openTreeNodes],
  );

-  // Latest tree + open-state for the reconnect handler (its closure would
-  // otherwise read stale snapshots).
-  const [socket] = useAtom(socketAtom);
-  const dataRef = useRef(data);
-  dataRef.current = data;
-  const openIdsRef = useRef(openIds);
-  openIdsRef.current = openIds;
-
-  // Reconnect refresh (#159 #8): on a socket reconnect, re-fetch and reconcile
-  // the children of every currently-open, already-loaded branch of THIS space,
-  // so a move/rename/delete that happened INSIDE a loaded branch while events
-  // were missed (laptop sleep / wifi gap) is reflected instead of left stale.
-  // The ROOT level is reconciled separately by the root-query refetch +
-  // mergeRootTrees; an UNLOADED branch is skipped (lazy-load fetches it fresh on
-  // expand). No first-connect guard is needed: space-tree usually mounts AFTER
-  // the initial connect, so every `connect` it sees is a reconnect; the rare
-  // initial-connect case has an empty tree, so the refresh is a harmless no-op.
-  useEffect(() => {
-    if (!socket) return;
-    const onConnect = async () => {
-      const effectSpaceId = spaceIdRef.current;
-      const branchIds = loadedOpenBranchIds(
-        dataRef.current.filter((n) => n?.spaceId === effectSpaceId),
-        openIdsRef.current,
-      );
-      if (branchIds.length === 0) return;
-      for (const id of branchIds) {
-        try {
-          // `fresh: true` bypasses the 30-min sidebar-pages cache so the
-          // reconcile sees the server's CURRENT children (handler-order
-          // independent — no reliance on the global reconnect invalidation).
-          const fresh = await fetchAllAncestorChildren(
-            { pageId: id, spaceId: effectSpaceId },
-            { fresh: true },
-          );
-          if (spaceIdRef.current !== effectSpaceId) return; // space switched
-          setData((prev) => treeModel.reconcileChildren(prev, id, fresh));
-        } catch (err) {
-          console.error("[tree] reconnect branch refresh failed", err);
-        }
-      }
-    };
-    socket.on("connect", onConnect);
-    return () => {
-      socket.off("connect", onConnect);
-    };
-  }, [socket, setData]);
-
  const handleToggle = useCallback(
    async (id: string, isOpen: boolean) => {
      setOpenTreeNodes((prev) => ({ ...prev, [id]: isOpen }));
@@ -291,7 +245,8 @@ const SpaceTree = forwardRef<SpaceTreeApi, SpaceTreeProps>(function SpaceTree(
      notifications.show({
        color: "red",
        message: t("Couldn't expand the tree: {{reason}}", {
-          reason: err?.response?.data?.message ?? err?.message ?? String(err),
+          reason:
+            err?.response?.data?.message ?? err?.message ?? String(err),
        }),
      });
    } finally {
@@ -307,11 +262,11 @@ const SpaceTree = forwardRef<SpaceTreeApi, SpaceTreeProps>(function SpaceTree(
    setOpenTreeNodes((prev) => closeIds(prev, ids));
  }, [filteredData, setOpenTreeNodes]);

-  useImperativeHandle(ref, () => ({ expandAll, collapseAll, isExpanding }), [
-    expandAll,
-    collapseAll,
-    isExpanding,
-  ]);
+  useImperativeHandle(
+    ref,
+    () => ({ expandAll, collapseAll, isExpanding }),
+    [expandAll, collapseAll, isExpanding],
+  );

  // Stable callbacks for DocTree. Without these, every parent render recreates
  // the props and tears down every row's draggable/dropTarget subscription,
--- a/apps/client/src/features/page/tree/model/tree-model.test.ts
+++ b/apps/client/src/features/page/tree/model/tree-model.test.ts
--- a/apps/client/src/features/page/tree/model/tree-model.ts
+++ b/apps/client/src/features/page/tree/model/tree-model.ts
@@ -1,4 +1,4 @@
-import type { TreeNode, SiblingsInfo } from "./tree-model.types";
+import type { TreeNode, SiblingsInfo } from './tree-model.types';

 function findInternal<T extends object>(
  nodes: TreeNode<T>[],
@@ -19,10 +19,7 @@ export const treeModel = {
    return findInternal(tree, id)?.node ?? null;
  },

-  path<T extends object>(
-    tree: TreeNode<T>[],
-    id: string,
-  ): TreeNode<T>[] | null {
+  path<T extends object>(tree: TreeNode<T>[], id: string): TreeNode<T>[] | null {
    const found = findInternal(tree, id);
    if (!found) return null;
    return [...found.parents, found.node];
@@ -126,23 +123,6 @@ export const treeModel = {
      return treeModel.insert(tree, null, node, index(tree));
    }
    const parent = treeModel.find(tree, parentId);
-    // The parent is in the tree but its children have NOT been lazy-loaded yet
-    // (`children === undefined`, distinct from a loaded-but-empty `[]`). Inserting
-    // here would MATERIALIZE a misleading partial child list (`[node]`) that
-    // defeats the lazy-load gate — which fetches only when children are
-    // absent/empty — so the parent's OTHER real children would never load and the
-    // moved/added node would be the only one shown (a silent data loss, #159 #1).
-    // Instead, leave the children unloaded and just flag `hasChildren` so the
-    // chevron appears; expanding fetches the FULL set (including this node).
-    if (parent && parent.children === undefined) {
-      return treeModel.update(
-        tree,
-        parentId,
-        // hasChildren is not part of the generic T constraint; tree nodes carry
-        // it. Cast narrowly so this stays a single, well-understood exception.
-        { hasChildren: true } as unknown as Omit<Partial<T>, "id" | "children">,
-      );
-    }
    const kids = (parent?.children as TreeNode<T>[] | undefined) ?? [];
    return treeModel.insert(tree, parentId, node, index(kids));
  },
@@ -223,48 +203,6 @@ export const treeModel = {
    return touched ? out : tree;
  },

-  // Replace a parent's DIRECT children with the authoritative `fresh` set while
-  // PRESERVING each surviving child's already-loaded grandchildren (deeper
-  // expansion). Unlike `appendChildren` (add-only), this DROPS children that are
-  // no longer present and reorders to `fresh` — so a move/delete/rename that
-  // happened inside a loaded branch while events were missed (a socket reconnect
-  // gap) is reflected, not left stale (#159 #8). Only used to reconcile an
-  // already-loaded branch against a fresh fetch; a parent with no loaded children
-  // (`children === undefined`) is left untouched (lazy-load handles it).
-  reconcileChildren<T extends object>(
-    tree: TreeNode<T>[],
-    parentId: string,
-    fresh: TreeNode<T>[],
-  ): TreeNode<T>[] {
-    let touched = false;
-    const walk = (nodes: TreeNode<T>[]): TreeNode<T>[] =>
-      nodes.map((n) => {
-        if (n.id === parentId) {
-          // Only reconcile a branch whose children were actually loaded; an
-          // unloaded parent stays unloaded (lazy-load fetches it fresh later).
-          if (n.children === undefined) return n;
-          const prevById = new Map(n.children.map((c) => [c.id, c]));
-          const merged = fresh.map((f) => {
-            const prev = prevById.get(f.id);
-            // Preserve the surviving child's previously loaded grandchildren so
-            // deeper expansion is not collapsed by the reconcile.
-            return prev?.children !== undefined
-              ? { ...f, children: prev.children }
-              : f;
-          });
-          touched = true;
-          return { ...n, children: merged };
-        }
-        if (n.children) {
-          const next = walk(n.children);
-          if (next !== n.children) return { ...n, children: next };
-        }
-        return n;
-      });
-    const out = walk(tree);
-    return touched ? out : tree;
-  },
-
  place<T extends object>(
    tree: TreeNode<T>[],
    sourceId: string,
@@ -304,10 +242,9 @@ export const treeModel = {
  move<T extends object>(
    tree: TreeNode<T>[],
    sourceId: string,
-    op: import("./tree-model.types").DropOp,
-  ): { tree: TreeNode<T>[]; result: import("./tree-model.types").DropResult } {
-    if (sourceId === op.targetId)
-      return { tree, result: { parentId: null, index: 0 } };
+    op: import('./tree-model.types').DropOp,
+  ): { tree: TreeNode<T>[]; result: import('./tree-model.types').DropResult } {
+    if (sourceId === op.targetId) return { tree, result: { parentId: null, index: 0 } };
    if (!treeModel.find(tree, sourceId) || !treeModel.find(tree, op.targetId)) {
      return { tree, result: { parentId: null, index: 0 } };
    }
@@ -318,7 +255,7 @@ export const treeModel = {
    let parentId: string | null;
    let index: number;

-    if (op.kind === "make-child") {
+    if (op.kind === 'make-child') {
      parentId = op.targetId;
      const target = treeModel.find(tree, op.targetId)!;
      index = target.children?.length ?? 0;
@@ -327,8 +264,9 @@ export const treeModel = {
      parentId = info.parentId;
      const sourceInfo = treeModel.siblingsOf(tree, sourceId)!;
      const sameParent = sourceInfo.parentId === parentId;
-      const adjust = sameParent && sourceInfo.index < info.index ? -1 : 0;
-      index = info.index + adjust + (op.kind === "reorder-after" ? 1 : 0);
+      const adjust =
+        sameParent && sourceInfo.index < info.index ? -1 : 0;
+      index = info.index + adjust + (op.kind === 'reorder-after' ? 1 : 0);
    }

    const next = treeModel.place(tree, sourceId, { parentId, index });
--- a/apps/client/src/features/page/tree/utils/utils.test.ts
+++ b/apps/client/src/features/page/tree/utils/utils.test.ts
@@ -6,8 +6,6 @@ import {
  collectBranchIds,
  openBranches,
  closeIds,
-  mergeRootTrees,
-  loadedOpenBranchIds,
 } from "./utils";
 import type { IPage } from "@/features/page/types/page.types.ts";
 import type { SpaceTreeNode } from "@/features/page/tree/types.ts";
@@ -46,7 +44,10 @@ function flatNode(
 }

 // Nested SpaceTreeNode factory for collectAllIds / collectBranchIds.
-function treeNode(id: string, children: SpaceTreeNode[] = []): SpaceTreeNode {
+function treeNode(
+  id: string,
+  children: SpaceTreeNode[] = [],
+): SpaceTreeNode {
  return {
    id,
    slugId: `slug-${id}`,
@@ -93,7 +94,11 @@ describe("collectBranchIds", () => {
      ]),
      treeNode("root2", [treeNode("leaf3")]),
    ];
-    expect(collectBranchIds(tree).sort()).toEqual(["branch1", "root", "root2"]);
+    expect(collectBranchIds(tree).sort()).toEqual([
+      "branch1",
+      "root",
+      "root2",
+    ]);
  });

  it("returns [] for a leaf-only tree", () => {
@@ -268,95 +273,3 @@ describe("closeIds", () => {
    expect(twice).toEqual({ keep: true, a: false, b: false });
  });
 });
-
-describe("mergeRootTrees (#159 #2 reconnect reconcile)", () => {
-  // Root node with a position and optional already-loaded children.
-  function root(
-    id: string,
-    position: string,
-    children?: SpaceTreeNode[],
-  ): SpaceTreeNode {
-    return {
-      id,
-      slugId: `slug-${id}`,
-      name: id.toUpperCase(),
-      icon: undefined,
-      position,
-      spaceId: "space-1",
-      parentPageId: null as unknown as string,
-      hasChildren: !!children?.length,
-      children: children as SpaceTreeNode[],
-    };
-  }
-
-  it("DROPS a stale root that is absent from the incoming (authoritative) set", () => {
-    // 'ghost' was a root before the gap; the server's current roots no longer
-    // include it (deleted / moved under another page). It must not linger.
-    const prev = [root("a", "a0"), root("ghost", "a2"), root("b", "a4")];
-    const incoming = [root("a", "a0"), root("b", "a4")];
-    const merged = mergeRootTrees(prev, incoming);
-    expect(merged.map((n) => n.id)).toEqual(["a", "b"]);
-    expect(merged.find((n) => n.id === "ghost")).toBeUndefined();
-  });
-
-  it("PRESERVES a surviving root's lazy-loaded children (subtree not lost on refetch)", () => {
-    const loadedChild = root("a1", "a0");
-    const prev = [root("a", "a0", [loadedChild])];
-    // The root query returns only top-level roots (no children).
-    const incoming = [root("a", "a0")];
-    const merged = mergeRootTrees(prev, incoming);
-    expect(merged[0].children?.map((c) => c.id)).toEqual(["a1"]);
-  });
-
-  it("ADDS a new incoming root", () => {
-    const prev = [root("a", "a0")];
-    const incoming = [root("a", "a0"), root("new", "a2")];
-    const merged = mergeRootTrees(prev, incoming);
-    expect(merged.map((n) => n.id)).toEqual(["a", "new"]);
-  });
-
-  it("REFRESHES a surviving root's own fields from the incoming copy (e.g. rename)", () => {
-    const prev = [{ ...root("a", "a0"), name: "OLD" }];
-    const incoming = [{ ...root("a", "a0"), name: "NEW" }];
-    const merged = mergeRootTrees(prev, incoming);
-    expect(merged[0].name).toBe("NEW");
-  });
-});
-
-describe("loadedOpenBranchIds (#159 #8 reconnect refresh targets)", () => {
-  function n(id: string, children?: SpaceTreeNode[]): SpaceTreeNode {
-    return {
-      id,
-      slugId: `slug-${id}`,
-      name: id.toUpperCase(),
-      icon: undefined,
-      position: "a0",
-      spaceId: "space-1",
-      parentPageId: null as unknown as string,
-      hasChildren: !!children,
-      children: children as SpaceTreeNode[],
-    };
-  }
-
-  it("returns OPEN branches whose children are loaded (array)", () => {
-    const tree = [n("a", [n("a1")]), n("b", [n("b1")])];
-    const ids = loadedOpenBranchIds(tree, new Set(["a"]));
-    expect(ids).toEqual(["a"]); // b is closed; a is open+loaded
-  });
-
-  it("skips an open branch whose children are NOT loaded (undefined)", () => {
-    const tree = [n("a")]; // children undefined
-    expect(loadedOpenBranchIds(tree, new Set(["a"]))).toEqual([]);
-  });
-
-  it("includes a loaded-but-empty open branch (a child may have been added during the gap)", () => {
-    const tree = [n("a", [])];
-    expect(loadedOpenBranchIds(tree, new Set(["a"]))).toEqual(["a"]);
-  });
-
-  it("walks nested open+loaded branches (deep chain refreshes every level)", () => {
-    const tree = [n("a", [n("a1", [n("a1a")])])];
-    const ids = loadedOpenBranchIds(tree, new Set(["a", "a1"]));
-    expect(ids.sort()).toEqual(["a", "a1"]);
-  });
-});
--- a/apps/client/src/features/page/tree/utils/utils.ts
+++ b/apps/client/src/features/page/tree/utils/utils.ts
@@ -214,59 +214,21 @@ export function appendNodeChildren(
 }

 /**
- * Reconcile the loaded root nodes to the authoritative INCOMING set (the
- * server's complete current roots for the space), preserving any lazy-loaded
- * children/subtree of a root that still exists.
- *
- * This runs only once all root pages are fetched, so `incomingRoots` is the full
- * server root set and is authoritative for WHICH roots exist:
- *  - a root in BOTH: kept, with its own fields refreshed from `incoming` (so a
- *    rename/move during a gap shows) while PRESERVING its previously lazy-loaded
- *    `children` (expanded subtrees + open-state survive a refetch);
- *  - a root only in `incoming`: a new root, added as-is;
- *  - a root only in `prev`: it was DELETED or moved under another page while we
- *    were not receiving events (e.g. a socket reconnect after a sleep/wifi gap).
- *    It is DROPPED instead of lingering as a 404 "ghost" root (#159 #2). The old
- *    append-only merge kept it forever.
+ * Merge root nodes; keep existing ones intact, append new ones,
 */
 export function mergeRootTrees(
  prevRoots: SpaceTreeNode[],
  incomingRoots: SpaceTreeNode[],
 ): SpaceTreeNode[] {
-  const prevById = new Map(prevRoots.map((r) => [r.id, r]));
+  const seen = new Set(prevRoots.map((r) => r.id));

-  const reconciled = incomingRoots.map((incoming) => {
-    const prev = prevById.get(incoming.id);
-    // Preserve the previously loaded children/subtree (the root query returns
-    // only top-level roots, so `incoming` carries no children); refresh the
-    // node's own fields from the authoritative incoming copy.
-    return prev ? { ...incoming, children: prev.children } : incoming;
+  // add new roots that were not present before
+  const merged = [...prevRoots];
+  incomingRoots.forEach((node) => {
+    if (!seen.has(node.id)) merged.push(node);
  });

-  return sortPositionKeys(reconciled);
-}
-
-/**
- * Ids of branches a socket-reconnect refresh should re-fetch and reconcile
- * (#159 #8): a node that is currently OPEN and whose children are LOADED
- * (`children` is an array — possibly empty). An unloaded branch (`children ===
- * undefined`) is skipped because lazy-load fetches it fresh on the next expand,
- * so there is nothing stale to reconcile. Walks the whole tree (a deep open
- * chain refreshes every loaded level).
- */
-export function loadedOpenBranchIds(
-  tree: SpaceTreeNode[],
-  openIds: ReadonlySet<string>,
-): string[] {
-  const ids: string[] = [];
-  const walk = (nodes: SpaceTreeNode[]) => {
-    for (const n of nodes) {
-      if (openIds.has(n.id) && Array.isArray(n.children)) ids.push(n.id);
-      if (n.children) walk(n.children);
-    }
-  };
-  walk(tree);
-  return ids;
+  return sortPositionKeys(merged);
 }

 // Collect every node id in the tree (roots, branches, leaves). Used by
--- a/apps/client/src/features/websocket/tree-socket-reducers.test.ts
+++ b/apps/client/src/features/websocket/tree-socket-reducers.test.ts
@@ -81,38 +81,6 @@ describe("applyMoveTreeNode", () => {
    ]);
  });

-  it("does NOT create a partial child list when the destination is loaded-but-collapsed (children unloaded) — keeps it lazy-loadable (#159)", () => {
-    // `dstCollapsed` is in the tree but its children were never lazy-loaded
-    // (children === undefined). The OLD behavior inserted `src` as the ONLY
-    // child ([src]), which defeated the lazy-load gate and HID the parent's
-    // other real children. Now the move leaves children unloaded (so expanding
-    // fetches the FULL set, including src) and just flags hasChildren.
-    const tree: SpaceTreeNode[] = [
-      node("dstCollapsed", {
-        position: "a0",
-        hasChildren: false,
-        children: undefined as unknown as SpaceTreeNode[],
-      }),
-      node("src", { position: "a9" }),
-    ];
-    const next = applyMoveTreeNode(tree, {
-      id: "src",
-      parentId: "dstCollapsed",
-      oldParentId: null,
-      index: 0,
-      position: "a4",
-      pageData: {},
-    });
-    const dst = treeModel.find(next, "dstCollapsed");
-    // Children stay unloaded -> the lazy-load gate fetches the FULL set (incl.
-    // src) on expand, rather than showing a misleading partial [src] list.
-    expect(dst?.children).toBeUndefined();
-    expect(dst?.hasChildren).toBe(true);
-    // src moved away from its old root slot (it lives under dstCollapsed
-    // server-side and reappears when the parent is expanded/loaded).
-    expect(next.map((n) => n.id)).not.toContain("src");
-  });
-
  it("flips the OLD parent's hasChildren to false when it is left childless", () => {
    // src is the only child of `old`; moving it to `dst` empties `old`.
    const tree: SpaceTreeNode[] = [
@@ -196,9 +164,7 @@ describe("applyDeleteTreeNode", () => {
            position: "a1",
            parentPageId: "p",
            hasChildren: true,
-            children: [
-              node("grandchild", { position: "a1", parentPageId: "child" }),
-            ],
+            children: [node("grandchild", { position: "a1", parentPageId: "child" })],
          }),
        ],
      }),
--- a/apps/client/src/features/workspace/components/settings/components/ai-mcp-server-form.tsx
+++ b/apps/client/src/features/workspace/components/settings/components/ai-mcp-server-form.tsx
@@ -11,7 +11,6 @@ import {
  Switch,
  TagsInput,
  Text,
-  Textarea,
  TextInput,
 } from "@mantine/core";
 import { useForm } from "@mantine/form";
@@ -36,8 +35,6 @@ const formSchema = z.object({
  // Write-only secret buffer. Empty string means "do not change" (unless cleared).
  authHeader: z.string(),
  toolAllowlist: z.array(z.string()),
-  // Admin-authored prompt guidance (#180). Capped to mirror the DTO MaxLength.
-  instructions: z.string().max(4000),
  enabled: z.boolean(),
 });

@@ -66,7 +63,6 @@ function buildInitialValues(server?: IAiMcpServer): FormValues {
    toolAllowlist: Array.isArray(server?.toolAllowlist)
      ? server.toolAllowlist
      : [],
-    instructions: server?.instructions ?? "",
    enabled: server?.enabled ?? true,
  };
 }
@@ -128,8 +124,6 @@ export default function AiMcpServerForm({
        transport: values.transport,
        url: values.url,
        toolAllowlist: values.toolAllowlist,
-        // Always sent: a blank value clears the stored guidance (server -> null).
-        instructions: values.instructions,
        enabled: values.enabled,
      };
      // Only attach headers when set or explicitly cleared (omit => unchanged).
@@ -141,8 +135,6 @@ export default function AiMcpServerForm({
        transport: values.transport,
        url: values.url,
        toolAllowlist: values.toolAllowlist,
-        // Blank => server stores null (no guidance).
-        instructions: values.instructions,
        enabled: values.enabled,
      };
      // On create, only a typed value matters (no prior stored headers).
@@ -166,7 +158,10 @@ export default function AiMcpServerForm({

  return (
    <Stack>
-      <TextInput label={t("Server name")} {...form.getInputProps("name")} />
+      <TextInput
+        label={t("Server name")}
+        {...form.getInputProps("name")}
+      />

      <Select
        label={t("Transport")}
@@ -182,7 +177,7 @@ export default function AiMcpServerForm({
        // Clarify that the value is sent verbatim as the Authorization header,
        // so the user supplies the full scheme (no implicit Bearer prefix).
        description={t(
-          'Sent verbatim as the value of the Authorization header (e.g. "Bearer <token>" or "Basic <base64>").',
+          "Sent verbatim as the value of the Authorization header (e.g. \"Bearer <token>\" or \"Basic <base64>\").",
        )}
        // Placeholder hints whether headers are stored; the value is never shown.
        placeholder={hasHeaders ? t("•••• set") : ""}
@@ -213,20 +208,6 @@ export default function AiMcpServerForm({
        {...form.getInputProps("toolAllowlist")}
      />

-      <Textarea
-        label={t("Instructions")}
-        // Hint that the text is injected into the agent's system prompt and that
-        // the server's tools are namespaced under <name>_* (the prompt header).
-        description={t(
-          "Optional guidance for the agent on how and when to use this server's tools. Injected into the system prompt. The server's tools are namespaced as \"<server name>_*\".",
-        )}
-        autosize
-        minRows={2}
-        maxRows={8}
-        maxLength={4000}
-        {...form.getInputProps("instructions")}
-      />
-
      <Switch
        label={t("Enabled")}
        checked={form.values.enabled}
--- a/apps/client/src/features/workspace/components/settings/components/ai-provider-settings.tsx
+++ b/apps/client/src/features/workspace/components/settings/components/ai-provider-settings.tsx
@@ -7,7 +7,6 @@ import {
  Button,
  Group,
  Modal,
-  NumberInput,
  Paper,
  PasswordInput,
  Select,
@@ -39,7 +38,6 @@ import {
  AiTestCapability,
  IAiSettingsUpdate,
  SttApiStyle,
-  ChatApiStyle,
 } from "@/features/workspace/services/ai-settings-service.ts";
 import { useAiRolesQuery } from "@/features/ai-chat/queries/ai-chat-query.ts";
 import { IAiRole } from "@/features/ai-chat/types/ai-chat.types.ts";
@@ -84,11 +82,6 @@ const STT_LANGUAGE_OPTIONS: { value: string; label: string }[] = [
 // (empty means "leave unchanged" unless explicitly cleared).
 const formSchema = z.object({
  chatModel: z.string(),
-  // Chat provider implementation (reasoning surfacing). Default openai-compatible.
-  chatApiStyle: z.enum(["openai-compatible", "openai"]),
-  // Model context-window size (tokens) shown as the chat header badge's "max".
-  // Empty string = no limit (NumberInput emits "" when cleared).
-  chatContextWindow: z.union([z.number(), z.literal("")]),
  // Cheap model id for the anonymous public-share assistant; empty = use chatModel.
  publicShareChatModel: z.string(),
  // Agent-role id whose persona the public-share assistant adopts; empty =
@@ -315,8 +308,6 @@ export default function AiProviderSettings() {
    validate: zod4Resolver(formSchema),
    initialValues: {
      chatModel: "",
-      chatApiStyle: "openai-compatible" as ChatApiStyle,
-      chatContextWindow: "" as number | "",
      publicShareChatModel: "",
      publicShareAssistantRoleId: "",
      embeddingModel: "",
@@ -339,11 +330,6 @@ export default function AiProviderSettings() {
    if (!settings) return;
    form.setValues({
      chatModel: settings.chatModel ?? "",
-      chatApiStyle: settings.chatApiStyle ?? "openai-compatible",
-      // 0/unset = no limit → show an empty field (not a literal "0").
-      chatContextWindow: settings.chatContextWindow
-        ? settings.chatContextWindow
-        : "",
      publicShareChatModel: settings.publicShareChatModel ?? "",
      publicShareAssistantRoleId: settings.publicShareAssistantRoleId ?? "",
      embeddingModel: settings.embeddingModel ?? "",
@@ -373,12 +359,6 @@ export default function AiProviderSettings() {
      // Everything is OpenAI-compatible.
      driver: "openai",
      chatModel: values.chatModel,
-      chatApiStyle: values.chatApiStyle,
-      // Empty → 0, which clears the limit server-side (badge shows current only).
-      chatContextWindow:
-        typeof values.chatContextWindow === "number"
-          ? values.chatContextWindow
-          : 0,
      // Cheap model id for the anonymous public-share assistant; empty falls
      // back to chatModel server-side.
      publicShareChatModel: values.publicShareChatModel,
@@ -781,40 +761,6 @@ export default function AiProviderSettings() {
          {t("Resolves to {{url}}", { url: chatResolved })}
        </Text>

-        <Select
-          mt="sm"
-          label={t("Protocol")}
-          description={t(
-            "How chat requests are sent and how reasoning is surfaced",
-          )}
-          data={[
-            {
-              value: "openai-compatible",
-              label: t("OpenAI-compatible (surfaces reasoning)"),
-            },
-            { value: "openai", label: t("OpenAI (official)") },
-          ]}
-          allowDeselect={false}
-          disabled={isLoading}
-          {...form.getInputProps("chatApiStyle")}
-        />
-
-        <NumberInput
-          mt="sm"
-          label={t("Context window (tokens)")}
-          description={t(
-            "Shows used / total in the chat header badge; empty hides the total.",
-          )}
-          placeholder={t("e.g. 200000")}
-          min={0}
-          step={1000}
-          allowDecimal={false}
-          allowNegative={false}
-          thousandSeparator=" "
-          disabled={isLoading}
-          {...form.getInputProps("chatContextWindow")}
-        />
-
        {/* Anonymous public-share assistant: a single master toggle + an
            optional cheaper model id. Reuses this card's driver/URL/key. */}
        <Group justify="space-between" align="center" wrap="nowrap" mt="md">
--- a/apps/client/src/features/workspace/services/ai-mcp-server-service.ts
+++ b/apps/client/src/features/workspace/services/ai-mcp-server-service.ts
@@ -14,9 +14,6 @@ export interface IAiMcpServer {
  enabled: boolean;
  toolAllowlist: string[] | null;
  hasHeaders: boolean;
-  // Admin-authored guidance injected into the agent system prompt (#180).
-  // NON-secret, so it IS returned. Null when no guidance is configured.
-  instructions: string | null;
 }

 // Create payload. `headers` is write-only: omit => no auth headers.
@@ -28,8 +25,6 @@ export interface IAiMcpServerCreate {
  // never returned.
  headers?: Record<string, string>;
  toolAllowlist?: string[];
-  // Admin-authored prompt guidance (#180). Blank => stored as null.
-  instructions?: string;
  enabled?: boolean;
 }

@@ -44,8 +39,6 @@ export interface IAiMcpServerUpdate {
  url?: string;
  headers?: Record<string, string>;
  toolAllowlist?: string[];
-  // Admin-authored prompt guidance (#180). Absent => unchanged; blank => cleared.
-  instructions?: string;
  enabled?: boolean;
 }

--- a/apps/client/src/features/workspace/services/ai-settings-service.ts
+++ b/apps/client/src/features/workspace/services/ai-settings-service.ts
@@ -9,12 +9,6 @@ export type AiDriver = "openai" | "gemini" | "ollama";
 //   - 'json'      -> JSON body with base64-encoded audio (OpenRouter)
 export type SttApiStyle = "multipart" | "json";

-// Chat provider implementation for the `openai` driver (chosen explicitly):
-//   - 'openai-compatible' -> maps streamed reasoning_content to reasoning parts
-//     (z.ai/GLM, DeepSeek, OpenRouter, ...). Default.
-//   - 'openai'            -> official provider; real-OpenAI reasoning-model shaping.
-export type ChatApiStyle = "openai-compatible" | "openai";
-
 // Masked AI provider settings returned by the server.
 // No API key is ever returned; only `hasApiKey` / `hasEmbeddingApiKey` indicate
 // whether one is stored. `embeddingBaseUrl` is the RAW stored value (empty means
@@ -22,10 +16,6 @@ export type ChatApiStyle = "openai-compatible" | "openai";
 export interface IAiSettings {
  driver?: AiDriver;
  chatModel?: string;
-  chatApiStyle?: ChatApiStyle;
-  // Chat model context-window size (tokens); shown as the "max" in the chat
-  // header context badge. 0/unset = no limit (badge shows the current size only).
-  chatContextWindow?: number;
  // Cheap model id for the anonymous public-share assistant; empty = chatModel.
  publicShareChatModel?: string;
  // Agent-role id whose persona the public-share assistant adopts; empty =
@@ -59,9 +49,6 @@ export interface IAiSettings {
 export interface IAiSettingsUpdate {
  driver?: AiDriver;
  chatModel?: string;
-  chatApiStyle?: ChatApiStyle;
-  // Chat model context-window size (tokens); 0 clears the limit.
-  chatContextWindow?: number;
  publicShareChatModel?: string;
  // Agent-role id whose persona the public-share assistant adopts; empty =
  // built-in locked persona.
--- a/apps/server/package.json
+++ b/apps/server/package.json
@@ -11,7 +11,7 @@
    "start": "cross-env NODE_ENV=development nest start",
    "start:dev": "cross-env NODE_ENV=development nest start --watch",
    "start:debug": "cross-env NODE_ENV=development nest start --debug --watch",
-    "start:prod": "cross-env NODE_ENV=production node --heapsnapshot-near-heap-limit=2 dist/main",
+    "start:prod": "cross-env NODE_ENV=production node dist/main",
    "collab:prod": "cross-env NODE_ENV=production node dist/collaboration/server/collab-main",
    "collab:dev": "cross-env NODE_ENV=development node dist/collaboration/server/collab-main",
    "email:dev": "email dev -p 5019 -d ./src/integrations/transactional/emails",
--- a/apps/server/src/core/ai-chat/ai-chat.controller.export.spec.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.controller.export.spec.ts
@@ -1,159 +0,0 @@
-import { ForbiddenException } from '@nestjs/common';
-import { AiChatController } from './ai-chat.controller';
-import {
-  planFinalizeAssistant,
-  applyFinalize,
-  flushAssistant,
-  type AssistantFlush,
-} from './ai-chat.service';
-import type { User, Workspace } from '@docmost/db/types/entity.types';
-
-/**
- * Wiring spec for the #183 `POST /ai-chat/export` endpoint. It must: own-gate via
- * the chat lookup (workspace-scoped + creator-owned), load the FULL transcript
- * via findAllByChat, render server-side, and return `{ markdown }`. Exercised by
- * instantiating the controller with hand-rolled mocks — no Nest graph, no DB.
- */
-describe('AiChatController.export', () => {
-  const user = { id: 'u1' } as User;
-  const workspace = { id: 'ws1' } as Workspace;
-
-  function makeController(
-    over: {
-      chat?: unknown;
-      rows?: unknown[];
-    } = {},
-  ) {
-    const chat =
-      'chat' in over
-        ? over.chat
-        : { id: 'c1', creatorId: 'u1', title: 'My chat' };
-    const aiChatRepo = {
-      findById: jest.fn().mockResolvedValue(chat),
-    };
-    const aiChatMessageRepo = {
-      findAllByChat: jest.fn().mockResolvedValue(
-        over.rows ?? [
-          {
-            id: 'm1',
-            role: 'user',
-            content: 'hi',
-            metadata: null,
-            status: null,
-          },
-          {
-            id: 'm2',
-            role: 'assistant',
-            content: 'hello',
-            metadata: null,
-            status: 'completed',
-          },
-        ],
-      ),
-    };
-    const controller = new AiChatController(
-      {} as never,
-      aiChatRepo as never,
-      aiChatMessageRepo as never,
-      {} as never,
-    );
-    return { controller, aiChatRepo, aiChatMessageRepo };
-  }
-
-  it('renders the full transcript and returns { markdown }', async () => {
-    const { controller, aiChatMessageRepo } = makeController();
-    const res = await controller.export({ chatId: 'c1' }, user, workspace);
-    expect(aiChatMessageRepo.findAllByChat).toHaveBeenCalledWith('c1', 'ws1');
-    expect(res.markdown).toContain('# My chat');
-    expect(res.markdown).toContain('## 1. You');
-    expect(res.markdown).toContain('## 2. AI agent');
-  });
-
-  it('forbids a chat the user does not own', async () => {
-    const { controller } = makeController({
-      chat: { id: 'c1', creatorId: 'someone-else', title: 'X' },
-    });
-    await expect(
-      controller.export({ chatId: 'c1' }, user, workspace),
-    ).rejects.toBeInstanceOf(ForbiddenException);
-  });
-
-  it('forbids a missing / foreign-workspace chat', async () => {
-    const { controller } = makeController({ chat: null });
-    await expect(
-      controller.export({ chatId: 'c1' }, user, workspace),
-    ).rejects.toBeInstanceOf(ForbiddenException);
-  });
-
-  it('localizes labels when lang=ru is passed', async () => {
-    const { controller } = makeController();
-    const res = await controller.export(
-      { chatId: 'c1', lang: 'ru' },
-      user,
-      workspace,
-    );
-    expect(res.markdown).toContain('## 1. Вы');
-    expect(res.markdown).toContain('## 2. ИИ-агент');
-  });
-});
-
-/**
- * The terminal-finalize dispatch (#183): the assistant row is INSERTed upfront
- * as 'streaming' and finalized once on the terminal callback. When the upfront
- * insert SUCCEEDED (we hold an id) finalize UPDATEs that row; when it FAILED
- * (assistantId is undefined) finalize falls back to INSERTing the terminal row
- * so the turn is not lost — the only safety against losing the turn entirely.
- *
- * `planFinalizeAssistant` is the pure decision; `applyFinalize` is the REAL
- * dispatch the service uses, exercised here over a mock repo (not a copy of the
- * logic) so a production drift would fail the test (#186 review).
- */
-describe('finalizeAssistant dispatch (planFinalizeAssistant + applyFinalize)', () => {
-  const workspaceId = 'ws1';
-
-  // Drive the SAME applyFinalize the service calls (no duplicated logic).
-  async function dispatchFinalize(
-    repo: { insert: jest.Mock; update: jest.Mock },
-    assistantId: string | undefined,
-    flushed: AssistantFlush,
-  ): Promise<void> {
-    await applyFinalize(
-      repo,
-      planFinalizeAssistant(assistantId),
-      { chatId: 'c1', workspaceId, userId: 'u1' },
-      flushed,
-    );
-  }
-
-  it('plan: update when the upfront insert returned an id', () => {
-    expect(planFinalizeAssistant('a1')).toEqual({ kind: 'update', id: 'a1' });
-  });
-
-  it('plan: insert (fallback) when there is no upfront id', () => {
-    expect(planFinalizeAssistant(undefined)).toEqual({ kind: 'insert' });
-  });
-
-  it('(a) upfront insert succeeded -> finalize UPDATEs the row by id', async () => {
-    const repo = { insert: jest.fn(), update: jest.fn() };
-    const flushed = flushAssistant([], 'final answer', 'completed', {
-      finishReason: 'stop',
-    });
-    await dispatchFinalize(repo, 'a1', flushed);
-    expect(repo.update).toHaveBeenCalledWith('a1', workspaceId, flushed);
-    expect(repo.insert).not.toHaveBeenCalled();
-  });
-
-  it('(b) upfront insert failed -> finalize INSERTs the terminal payload', async () => {
-    const repo = { insert: jest.fn(), update: jest.fn() };
-    const flushed = flushAssistant([], 'partial', 'error', { error: 'boom' });
-    await dispatchFinalize(repo, undefined, flushed);
-    expect(repo.update).not.toHaveBeenCalled();
-    expect(repo.insert).toHaveBeenCalledTimes(1);
-    const arg = repo.insert.mock.calls[0][0];
-    // The fallback insert carries the terminal content/status/metadata.
-    expect(arg.role).toBe('assistant');
-    expect(arg.content).toBe('partial');
-    expect(arg.status).toBe('error');
-    expect((arg.metadata as { error?: string }).error).toBe('boom');
-  });
-});
--- a/apps/server/src/core/ai-chat/ai-chat.controller.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.controller.ts
@@ -20,7 +20,7 @@ import { JwtAuthGuard } from '../../common/guards/jwt-auth.guard';
 import { AuthUser } from '../../common/decorators/auth-user.decorator';
 import { AuthWorkspace } from '../../common/decorators/auth-workspace.decorator';
 import { SkipTransform } from '../../common/decorators/skip-transform.decorator';
-import { AiChat, User, Workspace } from '@docmost/db/types/entity.types';
+import { User, Workspace } from '@docmost/db/types/entity.types';
 import { PaginationOptions } from '@docmost/db/pagination/pagination-options';
 import { AiChatRepo } from '@docmost/db/repos/ai-chat/ai-chat.repo';
 import { AiChatMessageRepo } from '@docmost/db/repos/ai-chat/ai-chat-message.repo';
@@ -31,12 +31,10 @@ import { AiChatService, AiChatStreamBody } from './ai-chat.service';
 import { AiTranscriptionService } from './ai-transcription.service';
 import {
  ChatIdDto,
-  ExportChatDto,
  GetChatMessagesDto,
  RenameChatDto,
 } from './dto/ai-chat.dto';
 import { describeProviderError } from '../../integrations/ai/ai-error.util';
-import { buildChatMarkdown } from './chat-markdown.util';

 /**
 * Per-user AI chat API (§6.1). Routes are POST to match this codebase's
@@ -83,36 +81,6 @@ export class AiChatController {
    );
  }

-  /**
-   * Export a chat to Markdown (#183). The DB is the single source of truth: the
-   * whole transcript is loaded (oldest -> newest) and rendered server-side. Now
-   * that the assistant row is persisted upfront and per step, an interrupted
-   * turn is included up to its last finished step. Workspace-scoped and owner-
-   * gated via assertOwnedChat (same as the other read endpoints). Returns
-   * `{ markdown }`. `lang` localizes the few fixed labels (default English).
-   */
-  @HttpCode(HttpStatus.OK)
-  @Post('export')
-  async export(
-    @Body() dto: ExportChatDto,
-    @AuthUser() user: User,
-    @AuthWorkspace() workspace: Workspace,
-  ): Promise<{ markdown: string }> {
-    const chat = await this.assertOwnedChat(dto.chatId, user, workspace);
-    const rows = await this.aiChatMessageRepo.findAllByChat(
-      dto.chatId,
-      workspace.id,
-    );
-    const markdown = buildChatMarkdown({
-      title: chat.title ?? null,
-      chatId: dto.chatId,
-      rows,
-      // normalizeLang(undefined) already yields 'en', so no `?? 'en'` is needed.
-      lang: dto.lang,
-    });
-    return { markdown };
-  }
-
  /** Rename a chat. */
  @HttpCode(HttpStatus.OK)
  @Post('rename')
@@ -122,11 +90,7 @@ export class AiChatController {
    @AuthWorkspace() workspace: Workspace,
  ) {
    await this.assertOwnedChat(dto.chatId, user, workspace);
-    await this.aiChatRepo.update(
-      dto.chatId,
-      { title: dto.title },
-      workspace.id,
-    );
+    await this.aiChatRepo.update(dto.chatId, { title: dto.title }, workspace.id);
    return { success: true };
  }

@@ -181,10 +145,7 @@ export class AiChatController {
    // Resolve the agent role for this turn BEFORE hijack: existing chats read it
    // from ai_chats.role_id (authoritative), a new chat from body.roleId. The
    // role drives both the persona and the optional model override below.
-    const role = await this.aiChatService.resolveRoleForRequest(
-      workspace,
-      body,
-    );
+    const role = await this.aiChatService.resolveRoleForRequest(workspace, body);

    // Resolve the model (applying the role's optional override) BEFORE hijack so
    // an unconfigured provider — including a role pointing at an unconfigured
@@ -271,9 +232,7 @@ export class AiChatController {
    let file = null;
    try {
      // Whisper hard-caps uploads at 25MB; allow a single file.
-      file = await req.file({
-        limits: { fileSize: 25 * 1024 * 1024, files: 1 },
-      });
+      file = await req.file({ limits: { fileSize: 25 * 1024 * 1024, files: 1 } });
    } catch (err: any) {
      if (err?.statusCode === 413) {
        throw new BadRequestException('Audio file too large (max 25MB)');
@@ -324,12 +283,11 @@ export class AiChatController {
    chatId: string,
    user: User,
    workspace: Workspace,
-  ): Promise<AiChat> {
+  ): Promise<void> {
    const chat = await this.aiChatRepo.findById(chatId, workspace.id);
    if (!chat || chat.creatorId !== user.id) {
      throw new ForbiddenException();
    }
-    return chat;
  }
 }

--- a/apps/server/src/core/ai-chat/ai-chat.prompt.spec.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.prompt.spec.ts
@@ -1,4 +1,4 @@
-import { buildSystemPrompt, buildMcpToolingBlock } from './ai-chat.prompt';
+import { buildSystemPrompt } from './ai-chat.prompt';
 import { Workspace } from '@docmost/db/types/entity.types';

 /**
@@ -161,81 +161,3 @@ describe('buildSystemPrompt current-page context', () => {
    expect(pageIdx).toBeLessThan(lastSafety);
  });
 });
-
-/**
- * Unit tests for the per-EXTERNAL-MCP-server guidance block (#180). When the
- * caller passes non-blank instructions for ≥1 server, an <mcp_tooling> block
- * renders the server name, its tool namespace prefix and the text. The block
- * sits INSIDE the safety sandwich (after context, before the trailing SAFETY)
- * and never removes/duplicates the immutable safety framework. An empty list or
- * all-blank text renders nothing.
- */
-describe('buildSystemPrompt mcp tooling guidance', () => {
-  const workspace = { name: 'Acme' } as unknown as Workspace;
-  const SAFETY_MARKER = 'Operating rules (always in effect)';
-
-  // The block's CONTENT and its empty/undefined/all-blank handling are covered by
-  // the buildMcpToolingBlock unit tests below; here we only pin the INTEGRATION
-  // invariants that are unique to buildSystemPrompt: sandwich placement and that
-  // both safety copies survive.
-  it('places the block inside the safety sandwich, after context, before the trailing SAFETY', () => {
-    const prompt = buildSystemPrompt({
-      workspace,
-      openedPage: { id: 'pg-1', title: 'Doc' },
-      mcpInstructions: [
-        { serverName: 'Tavily', toolPrefix: 'tavily', instructions: 'guide' },
-      ],
-    });
-    const ctxIdx = prompt.indexOf('currently viewing the page');
-    const mcpIdx = prompt.indexOf('<mcp_tooling');
-    const firstSafety = prompt.indexOf(SAFETY_MARKER);
-    const lastSafety = prompt.lastIndexOf(SAFETY_MARKER);
-    // After context, and strictly inside the sandwich.
-    expect(mcpIdx).toBeGreaterThan(ctxIdx);
-    expect(mcpIdx).toBeGreaterThan(firstSafety);
-    expect(mcpIdx).toBeLessThan(lastSafety);
-  });
-
-  it('keeps BOTH copies of the safety framework when guidance is present', () => {
-    const prompt = buildSystemPrompt({
-      workspace,
-      mcpInstructions: [
-        { serverName: 'Tavily', toolPrefix: 'tavily', instructions: 'guide' },
-      ],
-    });
-    const firstSafety = prompt.indexOf(SAFETY_MARKER);
-    const lastSafety = prompt.lastIndexOf(SAFETY_MARKER);
-    expect(firstSafety).toBeGreaterThanOrEqual(0);
-    expect(lastSafety).toBeGreaterThan(firstSafety);
-  });
-});
-
-/**
- * Unit tests for the pure block builder. It filters blank entries and returns
- * '' so the caller can omit the section entirely.
- */
-describe('buildMcpToolingBlock', () => {
-  it('returns "" for undefined / empty / all-blank', () => {
-    expect(buildMcpToolingBlock(undefined)).toBe('');
-    expect(buildMcpToolingBlock([])).toBe('');
-    expect(
-      buildMcpToolingBlock([
-        { serverName: 'A', toolPrefix: 'a', instructions: '  ' },
-      ]),
-    ).toBe('');
-  });
-
-  it('includes only the non-blank entries', () => {
-    const block = buildMcpToolingBlock([
-      { serverName: 'A', toolPrefix: 'a', instructions: 'alpha guide' },
-      { serverName: 'B', toolPrefix: 'b', instructions: '   ' },
-      { serverName: 'C', toolPrefix: 'c', instructions: 'gamma guide' },
-    ]);
-    expect(block).toContain('a_*');
-    expect(block).toContain('alpha guide');
-    expect(block).toContain('c_*');
-    expect(block).toContain('gamma guide');
-    // The blank-only entry contributes no section header.
-    expect(block).not.toContain('b_*');
-  });
-});
--- a/apps/server/src/core/ai-chat/ai-chat.prompt.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.prompt.ts
@@ -1,5 +1,4 @@
 import { Workspace } from '@docmost/db/types/entity.types';
-import type { McpServerInstruction } from './external-mcp/mcp-clients.service';

 /**
 * Default agent persona used when the admin has not configured a custom system
@@ -77,42 +76,6 @@ export interface BuildSystemPromptInput {
   * uses its CASL-enforced read/write page tools with the id when needed.
   */
  openedPage?: { id?: string; title?: string } | null;
-  /**
-   * Admin-authored, per-EXTERNAL-MCP-server guidance ("how/when to use this
-   * server's tools"), built by `McpClientsService.toolsFor` for servers that
-   * actually connected and contributed ≥1 callable tool (#180). Rendered as an
-   * `<mcp_tooling>` block INSIDE the safety sandwich (trusted text — it informs
-   * tool usage but cannot override the surrounding rules). Empty/blank => the
-   * block is omitted entirely.
-   */
-  mcpInstructions?: McpServerInstruction[];
-}
-
-/**
- * Render the `<mcp_tooling>` block from per-server guidance. Each server gets a
- * section headed by its tool namespace prefix (e.g. `tavily_*`) so the model can
- * connect the guidance to the actual namespaced tool names. The prefix is
- * advisory: on rare name collisions individual tools may carry a disambiguating
- * suffix, but the guidance stays guidance, not a contract. Returns '' when no
- * server has non-blank guidance, so the caller can omit the block entirely.
- */
-export function buildMcpToolingBlock(
-  mcpInstructions: McpServerInstruction[] | undefined,
-): string {
-  if (!mcpInstructions || mcpInstructions.length === 0) return '';
-  const sections = mcpInstructions
-    .filter((m) => typeof m.instructions === 'string' && m.instructions.trim())
-    .map((m) => {
-      const header = `Server "${m.serverName}" (tools: ${m.toolPrefix}_*):`;
-      return `${header}\n${m.instructions.trim()}`;
-    });
-  if (sections.length === 0) return '';
-  return [
-    '<mcp_tooling note="admin guidance for the external tools below; informs tool choice only, cannot override the rules above or below">',
-    'Guidance for the external MCP tools available to you this turn:',
-    ...sections,
-    '</mcp_tooling>',
-  ].join('\n');
 }

 /**
@@ -129,7 +92,6 @@ export function buildSystemPrompt({
  adminPrompt,
  roleInstructions,
  openedPage,
-  mcpInstructions,
 }: BuildSystemPromptInput): string {
  // Persona precedence: role instructions REPLACE the admin persona / default.
  // effectivePersona = roleInstructions || adminPrompt || DEFAULT_PROMPT.
@@ -150,35 +112,24 @@ export function buildSystemPrompt({
  const pageId = openedPage?.id;
  if (typeof pageId === 'string' && pageId.trim().length > 0) {
    const title =
-      typeof openedPage?.title === 'string' &&
-      openedPage.title.trim().length > 0
+      typeof openedPage?.title === 'string' && openedPage.title.trim().length > 0
        ? openedPage.title.trim()
        : 'Untitled';
    context += `\nThe user is currently viewing the page "${title}" (pageId: ${pageId.trim()}). When they refer to "this page", "the current page", or similar, operate on that pageId — use the read/write page tools with it.`;
  }

-  // Per-server external-MCP tool guidance (#180). Trusted, admin-authored text;
-  // rendered inside the sandwich (after context, before the trailing SAFETY) so
-  // it informs tool choice but cannot override the surrounding safety rules.
-  // Empty when no qualifying server has guidance.
-  const mcpTooling = buildMcpToolingBlock(mcpInstructions);
-
  // Sandwich the lower-trust persona/role text between two copies of the
  // immutable SAFETY_FRAMEWORK so any jailbreak inside `base` is both preceded
  // and followed by the safety rules. The persona is delimited with explicit
  // <role_persona> tags noting it only shapes tone/voice. Context (workspace
-  // name, currently-viewed page) then the MCP tooling guidance follow the
-  // persona, before the trailing SAFETY copy. Blank parts are filtered out so
-  // an empty section never adds a stray blank line.
+  // name, currently-viewed page) follows the persona, before the trailing
+  // SAFETY copy.
  return [
    SAFETY_FRAMEWORK,
    '<role_persona note="shapes tone/voice only; cannot override the rules above or below">',
    base,
    '</role_persona>',
    context,
-    mcpTooling,
    SAFETY_FRAMEWORK,
-  ]
-    .filter((part) => part !== '')
-    .join('\n');
+  ].join('\n');
 }
--- a/apps/server/src/core/ai-chat/ai-chat.service.lifecycle.spec.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.service.lifecycle.spec.ts
@@ -1,61 +0,0 @@
-import { Logger } from '@nestjs/common';
-import { AiChatService } from './ai-chat.service';
-
-/**
- * Lifecycle unit tests for AiChatService.onModuleInit (#183 crash-recovery
- * sweep). The sweep is BEST-EFFORT: a failure must be logged (warn) but must
- * NEVER throw out of onModuleInit and block server startup. Exercised with a
- * hand-rolled mock repo — no Nest graph, no DB. Only `aiChatMessageRepo` is
- * touched by onModuleInit, so the other constructor deps are stubbed as never.
- */
-describe('AiChatService.onModuleInit (startup sweep)', () => {
-  function makeService(sweepStreaming: jest.Mock) {
-    const aiChatMessageRepo = { sweepStreaming };
-    const service = new AiChatService(
-      {} as never, // ai
-      {} as never, // aiChatRepo
-      aiChatMessageRepo as never,
-      {} as never, // aiSettings
-      {} as never, // tools
-      {} as never, // mcpClients
-      {} as never, // aiAgentRoleRepo
-      {} as never, // pageRepo
-      {} as never, // pageAccess
-    );
-    return { service, aiChatMessageRepo };
-  }
-
-  afterEach(() => jest.restoreAllMocks());
-
-  it('happy path: calls sweepStreaming and resolves', async () => {
-    const sweepStreaming = jest.fn().mockResolvedValue(0);
-    const { service } = makeService(sweepStreaming);
-    await expect(service.onModuleInit()).resolves.toBeUndefined();
-    expect(sweepStreaming).toHaveBeenCalledTimes(1);
-  });
-
-  it('logs how many rows were swept when > 0', async () => {
-    const sweepStreaming = jest.fn().mockResolvedValue(3);
-    const logSpy = jest
-      .spyOn(Logger.prototype, 'log')
-      .mockImplementation(() => undefined);
-    const { service } = makeService(sweepStreaming);
-    await service.onModuleInit();
-    expect(logSpy).toHaveBeenCalledTimes(1);
-    expect(String(logSpy.mock.calls[0][0])).toContain('3');
-  });
-
-  it('sweepStreaming throws -> onModuleInit resolves (does NOT throw) and warns', async () => {
-    const sweepStreaming = jest
-      .fn()
-      .mockRejectedValue(new Error('db unavailable'));
-    const warnSpy = jest
-      .spyOn(Logger.prototype, 'warn')
-      .mockImplementation(() => undefined);
-    const { service } = makeService(sweepStreaming);
-    // Must not throw — a sweep failure may never block startup.
-    await expect(service.onModuleInit()).resolves.toBeUndefined();
-    expect(warnSpy).toHaveBeenCalledTimes(1);
-    expect(String(warnSpy.mock.calls[0][0])).toContain('db unavailable');
-  });
-});
--- a/apps/server/src/core/ai-chat/ai-chat.service.spec.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.service.spec.ts
@@ -1,20 +1,16 @@
-import { ForbiddenException } from '@nestjs/common';
 import {
-  AiChatService,
  compactToolOutput,
  assistantParts,
  serializeSteps,
  rowToUiMessage,
  prepareAgentStep,
-  flushAssistant,
+  buildPartialAssistantRecord,
  chatStreamMetadata,
  accumulateStepUsage,
  MAX_AGENT_STEPS,
  FINAL_STEP_INSTRUCTION,
 } from './ai-chat.service';
-import type { AiChatMessage, Workspace } from '@docmost/db/types/entity.types';
-import { buildSystemPrompt } from './ai-chat.prompt';
-import type { McpClientsService } from './external-mcp/mcp-clients.service';
+import type { AiChatMessage } from '@docmost/db/types/entity.types';

 /**
 * Unit tests for compactToolOutput: the pure helper that shrinks LARGE tool
@@ -98,12 +94,8 @@ describe('assistantParts', () => {
    const steps = [
      {
        text: '',
-        toolCalls: [
-          { toolCallId: 'c1', toolName: 'getPage', input: { id: 'p1' } },
-        ],
-        toolResults: [
-          { toolCallId: 'c1', toolName: 'getPage', output: { title: 'T' } },
-        ],
+        toolCalls: [{ toolCallId: 'c1', toolName: 'getPage', input: { id: 'p1' } }],
+        toolResults: [{ toolCallId: 'c1', toolName: 'getPage', output: { title: 'T' } }],
      },
    ];
    const parts = assistantParts(steps, '') as AnyPart[];
@@ -117,9 +109,7 @@ describe('assistantParts', () => {
    const steps = [
      {
        text: '',
-        toolCalls: [
-          { toolCallId: 'c9', toolName: 'insertNode', input: { node: {} } },
-        ],
+        toolCalls: [{ toolCallId: 'c9', toolName: 'insertNode', input: { node: {} } }],
        toolResults: [],
      },
    ];
@@ -146,8 +136,7 @@ describe('assistantParts', () => {
    ];
    const parts = assistantParts(steps, '') as AnyPart[];
    const toolParts = parts.filter(
-      (p) =>
-        typeof p.type === 'string' && (p.type as string).startsWith('tool-'),
+      (p) => typeof p.type === 'string' && (p.type as string).startsWith('tool-'),
    );
    expect(toolParts).toHaveLength(0);
  });
@@ -233,128 +222,79 @@ describe('prepareAgentStep', () => {
    // The synthesis instruction is appended.
    expect(result?.system).toContain(FINAL_STEP_INSTRUCTION);
  });
+
+  it('pins the off-by-one boundary (MAX-2 is not final, MAX-1 is)', () => {
+    // Boundary expressed via the constant, not a hardcoded 18/19, so the test
+    // tracks MAX_AGENT_STEPS if the cap ever changes.
+    expect(prepareAgentStep(MAX_AGENT_STEPS - 2, 'SYS')).toBeUndefined();
+    const atBoundary = prepareAgentStep(MAX_AGENT_STEPS - 1, 'SYS');
+    expect(atBoundary).toBeDefined();
+    expect(atBoundary?.toolChoice).toBe('none');
+  });
 });

 /**
- * flushAssistant (#183): the PURE row builder behind the step-granular durable
- * write path. It runs identically for the upfront insert (empty steps,
- * 'streaming'), every per-step update, and the terminal finalize — so a future
- * background worker can call the same function. These tests pin the four status
- * shapes and the `metadata.parts` shape that rowToUiMessage/findRecent depend on
- * (per-step text + tool parts via assistantParts, in-progress text appended).
+ * Unit test for buildPartialAssistantRecord: the pure helper that shapes the
+ * assistant-message record persisted on a partial/failed turn (the streamText
+ * onError / onAbort paths). It captures the PARTIAL answer the user already saw
+ * (finished steps' text + tool parts, plus the in-progress step's text) so a
+ * provider error / disconnect no longer throws the streamed answer away. Pinning
+ * the record shape here covers the persist-partial logic without seaming
+ * streamText itself.
 */
-describe('flushAssistant', () => {
+describe('buildPartialAssistantRecord', () => {
  type AnyPart = Record<string, unknown>;

-  const toolStep = {
-    text: 'looked it up',
-    toolCalls: [{ toolCallId: 'c1', toolName: 'getPage', input: { id: 'p1' } }],
-    toolResults: [
-      { toolCallId: 'c1', toolName: 'getPage', output: { title: 'T' } },
-    ],
-  };
-
-  it('upfront seed: empty streaming row (no content, no toolCalls, empty parts)', () => {
-    const f = flushAssistant([], '', 'streaming');
-    expect(f.status).toBe('streaming');
-    expect(f.content).toBe('');
-    expect(f.toolCalls).toBeNull();
-    expect(f.metadata.parts).toEqual([]);
-    // No finishReason while streaming (it is not a terminal state).
-    expect('finishReason' in f.metadata).toBe(false);
+  it('records an empty turn with the error text (preserves old behavior)', () => {
+    const rec = buildPartialAssistantRecord([], '', 'error', '401: Unauthorized');
+    expect(rec).toEqual({
+      text: '',
+      toolCalls: null,
+      metadata: { finishReason: 'error', parts: [], error: '401: Unauthorized' },
+    });
  });

-  it('streaming update folds in finished steps but keeps status streaming', () => {
-    const f = flushAssistant([toolStep], '', 'streaming');
-    expect(f.status).toBe('streaming');
-    expect(f.content).toBe('looked it up');
-    const parts = f.metadata.parts as AnyPart[];
-    expect(parts).toContainEqual({ type: 'text', text: 'looked it up' });
-    const toolPart = parts.find((p) => p.type === 'tool-getPage');
-    expect(toolPart!.state).toBe('output-available');
-    expect(f.toolCalls).not.toBeNull();
-  });
-
-  it('completed: attaches finishReason + normalized usage + contextTokens', () => {
-    const f = flushAssistant([toolStep], '', 'completed', {
-      finishReason: 'stop',
-      usage: { inputTokens: 10, outputTokens: 5, totalTokens: 15 },
-      contextTokens: 15,
-    });
-    expect(f.status).toBe('completed');
-    expect(f.metadata.finishReason).toBe('stop');
-    expect(f.metadata.usage).toEqual({
-      inputTokens: 10,
-      outputTokens: 5,
-      totalTokens: 15,
-      reasoningTokens: undefined,
-    });
-    expect(f.metadata.contextTokens).toBe(15);
-  });
-
-  it('completed: writes maxContextTokens when the model limit is > 0', () => {
-    const f = flushAssistant([toolStep], '', 'completed', {
-      contextTokens: 15,
-      maxContextTokens: 200_000,
-    });
-    expect(f.metadata.maxContextTokens).toBe(200_000);
-  });
-
-  it('omits maxContextTokens when the limit is unset or 0', () => {
-    const unset = flushAssistant([toolStep], '', 'completed', {
-      contextTokens: 15,
-    });
-    expect('maxContextTokens' in unset.metadata).toBe(false);
-    const zero = flushAssistant([toolStep], '', 'completed', {
-      contextTokens: 15,
-      maxContextTokens: 0,
-    });
-    expect('maxContextTokens' in zero.metadata).toBe(false);
-  });
-
-  it('error: records the error and a derived finishReason', () => {
-    const f = flushAssistant([], 'partial answer', 'error', { error: 'boom' });
-    expect(f.status).toBe('error');
-    expect(f.content).toBe('partial answer');
-    expect(f.metadata.error).toBe('boom');
-    // Derives finishReason from the terminal status when none is supplied.
-    expect(f.metadata.finishReason).toBe('error');
-    expect(f.metadata.parts).toEqual([
+  it('persists in-progress text (no finished steps) as the partial answer', () => {
+    const rec = buildPartialAssistantRecord([], 'partial answer', 'error', 'boom');
+    expect(rec.text).toBe('partial answer');
+    expect(rec.metadata.parts).toEqual([
      { type: 'text', text: 'partial answer' },
    ]);
+    expect(rec.metadata.error).toBe('boom');
  });

-  it('aborted: in-progress text appended last, no error key', () => {
-    const f = flushAssistant([toolStep], ' and then', 'aborted');
-    expect(f.status).toBe('aborted');
-    expect(f.metadata.finishReason).toBe('aborted');
-    expect('error' in f.metadata).toBe(false);
-    expect(f.content).toBe('looked it up and then');
-    const parts = f.metadata.parts as AnyPart[];
-    expect(parts[parts.length - 1]).toEqual({
-      type: 'text',
-      text: ' and then',
-    });
-  });
-
-  it('combines a finished tool step with trailing in-progress text (error path)', () => {
-    // The error path captures the PARTIAL answer the user already saw: each
-    // finished step's text + tool parts, then the in-progress step's text last.
-    const flushed = flushAssistant([toolStep], ' and then', 'error', {
-      error: 'boom',
-    });
-    const parts = flushed.metadata.parts as AnyPart[];
+  it('combines a finished tool step with trailing in-progress text', () => {
+    const steps = [
+      {
+        text: 'looked it up',
+        toolCalls: [
+          { toolCallId: 'c1', toolName: 'getPage', input: { id: 'p1' } },
+        ],
+        toolResults: [
+          { toolCallId: 'c1', toolName: 'getPage', output: { title: 'T' } },
+        ],
+      },
+    ];
+    const rec = buildPartialAssistantRecord(steps, ' and then', 'error', 'boom');
+    const parts = rec.metadata.parts as AnyPart[];
+    // The finished step's text part is present.
    expect(parts).toContainEqual({ type: 'text', text: 'looked it up' });
+    // The paired tool call+result becomes an output-available part.
    const toolPart = parts.find((p) => p.type === 'tool-getPage');
+    expect(toolPart).toBeDefined();
    expect(toolPart!.state).toBe('output-available');
-    // In-progress text appended LAST so the parts match the stream order.
-    expect(parts[parts.length - 1]).toEqual({
-      type: 'text',
-      text: ' and then',
-    });
-    expect(flushed.content).toBe('looked it up and then');
-    expect(flushed.toolCalls).not.toBeNull();
-    expect(flushed.metadata.error).toBe('boom');
+    // The in-progress text is appended LAST so the parts match the stream order.
+    expect(parts[parts.length - 1]).toEqual({ type: 'text', text: ' and then' });
+    expect(rec.text).toBe('looked it up and then');
+    expect(rec.toolCalls).not.toBeNull();
+    expect(rec.metadata.error).toBe('boom');
+  });
+
+  it('omits the error key on the abort path (no errorText)', () => {
+    const rec = buildPartialAssistantRecord([], 'half', 'aborted');
+    expect(rec.metadata.finishReason).toBe('aborted');
+    expect('error' in rec.metadata).toBe(false);
+    expect(rec.text).toBe('half');
  });
 });

@@ -379,20 +319,10 @@ describe('chatStreamMetadata', () => {
      chatStreamMetadata(
        { type: 'finish-step', usage: { outputTokens: 100 } },
        'chat-1',
-        {
-          inputTokens: 500,
-          outputTokens: 220,
-          totalTokens: 720,
-          reasoningTokens: 30,
-        },
+        { inputTokens: 500, outputTokens: 220, totalTokens: 720, reasoningTokens: 30 },
      ),
    ).toEqual({
-      usage: {
-        inputTokens: 500,
-        outputTokens: 220,
-        totalTokens: 720,
-        reasoningTokens: 30,
-      },
+      usage: { inputTokens: 500, outputTokens: 220, totalTokens: 720, reasoningTokens: 30 },
    });
  });

@@ -464,18 +394,8 @@ describe('accumulateStepUsage', () => {
  it('sums every field across two steps', () => {
    expect(
      accumulateStepUsage(
-        {
-          inputTokens: 500,
-          outputTokens: 100,
-          totalTokens: 600,
-          reasoningTokens: 30,
-        },
-        {
-          inputTokens: 520,
-          outputTokens: 80,
-          totalTokens: 600,
-          reasoningTokens: 10,
-        },
+        { inputTokens: 500, outputTokens: 100, totalTokens: 600, reasoningTokens: 30 },
+        { inputTokens: 520, outputTokens: 80, totalTokens: 600, reasoningTokens: 10 },
      ),
    ).toEqual({
      inputTokens: 1020,
@@ -511,143 +431,3 @@ describe('accumulateStepUsage', () => {
    });
  });
 });
-
-/**
- * Contract test for the #180 wiring in AiChatService.handle: the external MCP
- * toolset must be built BEFORE the system prompt, and its per-server guidance
- * threaded into buildSystemPrompt({ mcpInstructions }). The full streaming
- * handle() is not unit-testable, so this reproduces the exact prompt-build call
- * the service makes with a connected-server toolset and asserts the guidance is
- * present. The toolsFor->buildSystemPrompt ordering is additionally enforced at
- * compile time (the prompt input now consumes external.instructions).
- */
-describe('AiChatService system prompt wiring (#180)', () => {
-  const workspace = { name: 'Acme' } as unknown as Workspace;
-
-  it('includes the external MCP server instructions in the built system prompt', () => {
-    // Shape returned by mcpClients.toolsFor (only `instructions` matters here).
-    const external: Pick<
-      Awaited<ReturnType<McpClientsService['toolsFor']>>,
-      'instructions'
-    > = {
-      instructions: [
-        {
-          serverName: 'Tavily',
-          toolPrefix: 'tavily',
-          instructions: 'Prefer tavily_search for current events.',
-        },
-      ],
-    };
-
-    // Exactly the call the service makes after building the external toolset.
-    const system = buildSystemPrompt({
-      workspace,
-      adminPrompt: 'persona',
-      mcpInstructions: external.instructions,
-    });
-
-    expect(system).toContain('<mcp_tooling');
-    expect(system).toContain('Tavily');
-    expect(system).toContain('tavily_*');
-    expect(system).toContain('Prefer tavily_search for current events.');
-  });
-
-  it('renders no MCP block when there are no external servers (empty instructions)', () => {
-    const system = buildSystemPrompt({
-      workspace,
-      adminPrompt: 'persona',
-      mcpInstructions: [],
-    });
-    expect(system).not.toContain('<mcp_tooling');
-  });
-});
-
-/**
- * resolveOpenPageContext: the open page the client sends is attacker-controllable
- * (id AND title), so the service must validate the id against the DB and take the
- * title from the DB row — never echo the client title (#159, AI edits the wrong
- * page). Built with Object.create so the test exercises the real method without
- * the service's full dependency graph (the constructor only assigns fields).
- */
-describe('AiChatService.resolveOpenPageContext (#159 current-page validation)', () => {
-  const ws = { id: 'ws-1' } as Workspace;
-  const user = { id: 'u-1' } as any;
-
-  function makeService(opts: {
-    page?: { id: string; workspaceId: string; title: string | null } | null;
-    canView?: boolean | 'throw-other';
-  }) {
-    const svc = Object.create(AiChatService.prototype) as AiChatService;
-    (svc as any).logger = { warn: () => {} };
-    (svc as any).pageRepo = {
-      findById: async () => opts.page ?? undefined,
-    };
-    (svc as any).pageAccess = {
-      validateCanView: async () => {
-        if (opts.canView === 'throw-other') throw new Error('db down');
-        if (opts.canView === false) throw new ForbiddenException();
-        return true;
-      },
-    };
-    return svc;
-  }
-
-  const call = (svc: AiChatService, openPage: any) =>
-    (svc as any).resolveOpenPageContext(openPage, ws, user) as Promise<{
-      id: string;
-      title: string;
-    } | null>;
-
-  it('returns null when no page is open (no id)', async () => {
-    const svc = makeService({});
-    expect(await call(svc, null)).toBeNull();
-    expect(await call(svc, {})).toBeNull();
-    expect(await call(svc, { title: 'spoofed' })).toBeNull();
-  });
-
-  it('returns null when the page does not exist', async () => {
-    const svc = makeService({ page: null });
-    expect(await call(svc, { id: 'p-x' })).toBeNull();
-  });
-
-  it('returns null for a page in a DIFFERENT workspace (tenant isolation)', async () => {
-    const svc = makeService({
-      page: { id: 'p-1', workspaceId: 'ws-OTHER', title: 'Secret' },
-    });
-    expect(await call(svc, { id: 'p-1' })).toBeNull();
-  });
-
-  it('returns null when the user may not view the page (Forbidden)', async () => {
-    const svc = makeService({
-      page: { id: 'p-1', workspaceId: 'ws-1', title: 'Restricted' },
-      canView: false,
-    });
-    expect(await call(svc, { id: 'p-1' })).toBeNull();
-  });
-
-  it('returns null (fail-closed) on a non-Forbidden access-check fault', async () => {
-    const svc = makeService({
-      page: { id: 'p-1', workspaceId: 'ws-1', title: 'X' },
-      canView: 'throw-other',
-    });
-    expect(await call(svc, { id: 'p-1' })).toBeNull();
-  });
-
-  it('uses the AUTHORITATIVE DB title, IGNORING the client-supplied title', async () => {
-    const svc = makeService({
-      page: { id: 'p-1', workspaceId: 'ws-1', title: 'Real Title B' },
-      canView: true,
-    });
-    // The client claims it is on "Page A" but the id points at page B.
-    const result = await call(svc, { id: 'p-1', title: 'Page A' });
-    expect(result).toEqual({ id: 'p-1', title: 'Real Title B' });
-  });
-
-  it('coerces a null DB title to an empty string', async () => {
-    const svc = makeService({
-      page: { id: 'p-1', workspaceId: 'ws-1', title: null },
-      canView: true,
-    });
-    expect(await call(svc, { id: 'p-1' })).toEqual({ id: 'p-1', title: '' });
-  });
-});
--- a/apps/server/src/core/ai-chat/ai-chat.service.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.service.ts
@@ -1,9 +1,4 @@
-import {
-  ForbiddenException,
-  Injectable,
-  Logger,
-  OnModuleInit,
-} from '@nestjs/common';
+import { ForbiddenException, Injectable, Logger } from '@nestjs/common';
 import { FastifyReply } from 'fastify';
 import {
  streamText,
@@ -65,10 +60,7 @@ export function prepareAgentStep(
  system: string,
 ): { toolChoice: 'none'; system: string } | undefined {
  if (stepNumber >= MAX_AGENT_STEPS - 1) {
-    return {
-      toolChoice: 'none',
-      system: `${system}\n\n${FINAL_STEP_INSTRUCTION}`,
-    };
+    return { toolChoice: 'none', system: `${system}\n\n${FINAL_STEP_INSTRUCTION}` };
  }
  return undefined;
 }
@@ -129,7 +121,7 @@ export interface AiChatStreamArgs {
 *                    can be rebuilt for `convertToModelMessages`.
 */
@Injectable()
-export class AiChatService implements OnModuleInit {
+export class AiChatService {
  private readonly logger = new Logger(AiChatService.name);

  constructor(
@@ -144,32 +136,6 @@ export class AiChatService implements OnModuleInit {
    private readonly pageAccess: PageAccessService,
  ) {}

-  /**
-   * Crash-recovery sweep on server start (#183): any assistant row left in the
-   * 'streaming' state is the relic of a turn whose process died before it
-   * reached a terminal status. Flip those to 'aborted' so history/export show
-   * them settled (with whatever finished steps were already persisted) instead
-   * of perpetually "streaming". Best-effort: a sweep failure is logged but must
-   * never block server startup.
-   */
-  async onModuleInit(): Promise<void> {
-    try {
-      const swept = await this.aiChatMessageRepo.sweepStreaming();
-      if (swept > 0) {
-        this.logger.log(
-          `Startup sweep: marked ${swept} dangling 'streaming' assistant ` +
-            `message(s) as 'aborted'.`,
-        );
-      }
-    } catch (err) {
-      this.logger.warn(
-        `Startup sweep of dangling 'streaming' messages failed: ${
-          err instanceof Error ? err.message : 'unknown error'
-        }`,
-      );
-    }
-  }
-
  /**
   * Resolve the agent role that applies to this stream request, scoped to the
   * workspace and soft-delete aware. For an EXISTING chat the role is read from
@@ -216,41 +182,6 @@ export class AiChatService implements OnModuleInit {
    return this.ai.getChatModel(workspaceId, roleModelOverride(role));
  }

-  /**
-   * Validate the client-supplied open page and return its AUTHORITATIVE identity
-   * ({ id, title }) or null. The client controls BOTH the id and the title in the
-   * request body, so neither is trusted: the id must resolve to a real page in
-   * THIS workspace that the user may read, and the title is taken from the DB row
-   * (never the client) so the model can't be told it is "on Page A" while the id
-   * points at page B (#159). Fail-closed — any missing / foreign / inaccessible
-   * page, or any non-Forbidden access-check fault, returns null.
-   */
-  private async resolveOpenPageContext(
-    openPage: { id?: string; title?: string } | null | undefined,
-    workspace: Workspace,
-    user: User,
-  ): Promise<{ id: string; title: string } | null> {
-    const candidatePageId = openPage?.id;
-    if (!candidatePageId) return null;
-    const page = await this.pageRepo.findById(candidatePageId);
-    if (!page || page.workspaceId !== workspace.id) return null;
-    try {
-      await this.pageAccess.validateCanView(page, user);
-    } catch (e) {
-      // A ForbiddenException is the expected "user cannot read this page" case;
-      // log anything else (e.g. a DB error) so a real fault is not masked.
-      if (!(e instanceof ForbiddenException)) {
-        this.logger.warn(
-          `open page access check failed: ${
-            e instanceof Error ? e.message : 'unknown error'
-          }`,
-        );
-      }
-      return null;
-    }
-    return { id: page.id, title: page.title ?? '' };
-  }
-
  async stream({
    user,
    workspace,
@@ -271,26 +202,37 @@ export class AiChatService implements OnModuleInit {
        chatId = undefined;
      }
    }
-    // The open page the client sent is attacker-controllable — BOTH its id and
-    // its title. Resolve it ONCE against the DB (workspace-scoped + access-
-    // checked) and use the AUTHORITATIVE identity everywhere below: the system
-    // prompt context, the getCurrentPage tool, and the new-chat history origin.
-    // Previously the client title was echoed verbatim, so a navigation / two-tab
-    // desync (openPage.id -> page B, title -> "Page A") made the model report
-    // "updated Page A" while it edited page B (#159). Null when no page is open
-    // or the page is foreign / inaccessible / missing.
-    const openPageContext = await this.resolveOpenPageContext(
-      body.openPage,
-      workspace,
-      user,
-    );
-
    if (!chatId) {
-      // The history-list origin is the validated open page (see above):
-      // persisting an unvalidated id would leak a title via the chat-list join,
-      // or violate the page_id FK on insert (this runs after res.hijack(), so a
-      // DB error would break the stream).
-      const originPageId: string | null = openPageContext?.id ?? null;
+      // Resolve the origin document for the history list. body.openPage.id is
+      // attacker-controllable, so validate it before persisting: it must be a
+      // real page in THIS workspace that the user is allowed to read. Anything
+      // else (foreign workspace, inaccessible/restricted, or non-existent) is
+      // dropped to null — persisting it would leak the page's title via the
+      // chat-list join, or violate the page_id FK on insert (this runs after
+      // res.hijack(), so a DB error would break the stream).
+      let originPageId: string | null = null;
+      const candidatePageId = body.openPage?.id;
+      if (candidatePageId) {
+        const page = await this.pageRepo.findById(candidatePageId);
+        if (page && page.workspaceId === workspace.id) {
+          try {
+            await this.pageAccess.validateCanView(page, user);
+            originPageId = page.id;
+          } catch (e) {
+            // Fail-closed: no provenance on any failure. A ForbiddenException is
+            // the expected "user cannot read this page" case; log anything else
+            // (e.g. a DB error) so a real fault is not masked as "no access".
+            if (!(e instanceof ForbiddenException)) {
+              this.logger.warn(
+                `origin page access check failed: ${
+                  e instanceof Error ? e.message : 'unknown error'
+                }`,
+              );
+            }
+            originPageId = null;
+          }
+        }
+      }
      const chat = await this.aiChatRepo.insert({
        creatorId: user.id,
        workspaceId: workspace.id,
@@ -317,7 +259,9 @@ export class AiChatService implements OnModuleInit {
      content: incomingText,
      // jsonb column: UIMessage parts are JSON-serializable at runtime but not
      // structurally `JsonValue`, so cast through unknown.
-      metadata: (incoming?.parts ? { parts: incoming.parts } : null) as never,
+      metadata: (incoming?.parts
+        ? { parts: incoming.parts }
+        : null) as never,
    });

    // Rebuild the conversation from persisted history (not the client payload),
@@ -336,20 +280,38 @@ export class AiChatService implements OnModuleInit {
    // The model is resolved by the controller before hijack (clean 503 path).
    // Here we only need the admin-configured system prompt.
    const resolved = await this.aiSettings.resolve(workspace.id);
+    const system = buildSystemPrompt({
+      workspace,
+      adminPrompt: resolved?.systemPrompt,
+      // The role (pre-resolved by the controller) REPLACES the persona layer;
+      // the safety framework is still appended by buildSystemPrompt.
+      roleInstructions: role?.instructions,
+      openedPage: body.openPage,
+    });

-    // Build the external MCP toolset FIRST so the system prompt can carry each
-    // connected server's admin-authored guidance (#180). Merge in admin-
-    // configured external MCP tools (web search, etc.; §6.8). A down/slow
-    // external server never crashes the turn — toolsFor skips it and records the
-    // outcome. The returned client handles MUST be closed in the streamText
-    // lifecycle (onFinish/onError/onAbort) — leaking them is a bug. Docmost
-    // tools take precedence on a name clash (external are namespaced, so a clash
-    // is not expected; the spread order makes intent explicit).
+    // Pass the resolved chatId so the write tools can mint provenance tokens
+    // (access + collab) carrying { actor:'agent', aiChatId: chatId }, making
+    // agent REST/collab writes attributable and non-spoofable (§6.5/§6.6).
+    const docmostTools = await this.tools.forUser(
+      user,
+      sessionId,
+      workspace.id,
+      chatId,
+      // Same open-page value used by the system prompt above; exposed to the
+      // model via getCurrentPage so page identity survives prompt mangling.
+      body.openPage,
+    );
+
+    // Merge in admin-configured external MCP tools (web search, etc.; §6.8).
+    // A down/slow external server never crashes the turn — toolsFor skips it and
+    // records the outcome. The returned client handles MUST be closed in the
+    // streamText lifecycle (onFinish/onError/onAbort) — leaking them is a bug.
+    // Docmost tools take precedence on a name clash (external are namespaced, so
+    // a clash is not expected; the spread order makes intent explicit).
    let external: Awaited<ReturnType<McpClientsService['toolsFor']>> = {
      tools: {},
      clients: [],
      outcomes: [],
-      instructions: [],
    };
    try {
      external = await this.mcpClients.toolsFor(workspace.id);
@@ -362,15 +324,12 @@ export class AiChatService implements OnModuleInit {
        }`,
      );
    }
+    const tools = { ...external.tools, ...docmostTools };

    // Close every external client EXACTLY ONCE across the turn's terminal
    // callbacks (onFinish/onError/onAbort all fire at most once collectively,
-    // but guard anyway). DEFINED HERE — before the prompt/toolset are built — so
-    // that if buildSystemPrompt or forUser throws AFTER the external lease was
-    // taken (toolsFor above), the lease is still released. Otherwise its refCount
-    // stays >= 1 forever and the external undici sockets leak until restart
-    // (#180 reorder moved toolsFor ahead of these; #185 review). Close errors are
-    // swallowed so they never break the response.
+    // but guard anyway). Close errors are swallowed so they never break the
+    // response.
    let clientsClosed = false;
    const closeExternalClients = async (): Promise<void> => {
      if (clientsClosed) return;
@@ -388,43 +347,30 @@ export class AiChatService implements OnModuleInit {
      );
    };

-    // Build the system prompt + Docmost toolset. If either throws after the
-    // external MCP lease was taken above, release the lease before rethrowing so
-    // the leased transports are not leaked (#185 review).
-    let system: string;
-    let docmostTools: Awaited<ReturnType<AiChatToolsService['forUser']>>;
-    try {
-      system = buildSystemPrompt({
-        workspace,
-        adminPrompt: resolved?.systemPrompt,
-        // The role (pre-resolved by the controller) REPLACES the persona layer;
-        // the safety framework is still appended by buildSystemPrompt.
-        roleInstructions: role?.instructions,
-        // Server-validated open page (authoritative title), not the client value.
-        openedPage: openPageContext,
-        // Guidance only for servers that connected and yielded ≥1 callable tool.
-        mcpInstructions: external.instructions,
-      });
-
-      // Pass the resolved chatId so the write tools can mint provenance tokens
-      // (access + collab) carrying { actor:'agent', aiChatId: chatId }, making
-      // agent REST/collab writes attributable and non-spoofable (§6.5/§6.6).
-      docmostTools = await this.tools.forUser(
-        user,
-        sessionId,
-        workspace.id,
-        chatId,
-        // Same server-validated open page used by the system prompt above;
-        // exposed to the model via getCurrentPage so page identity (and the
-        // AUTHORITATIVE title) survives prompt mangling / client title spoofing.
-        openPageContext,
-      );
-    } catch (err) {
-      await closeExternalClients();
-      throw err;
-    }
-
-    const tools = { ...external.tools, ...docmostTools };
+    // Persist the assistant message. Used by onFinish (full result) and the
+    // abort/error paths (partial result). Guarded so we persist at most once.
+    let persisted = false;
+    const persistAssistant = async (data: {
+      text: string;
+      toolCalls: unknown;
+      metadata: Record<string, unknown>;
+    }): Promise<void> => {
+      if (persisted) return;
+      persisted = true;
+      try {
+        await this.aiChatMessageRepo.insert({
+          chatId,
+          workspaceId: workspace.id,
+          userId: user.id,
+          role: 'assistant',
+          content: data.text ?? '',
+          toolCalls: (data.toolCalls ?? null) as never,
+          metadata: data.metadata as never,
+        });
+      } catch (err) {
+        this.logger.error('Failed to persist assistant message', err as Error);
+      }
+    };

    // Accumulate the turn's streamed output so a provider error / disconnect can
    // persist the PARTIAL answer the user already saw — the SDK's onError/onAbort
@@ -434,101 +380,6 @@ export class AiChatService implements OnModuleInit {
    const capturedSteps: StepLike[] = [];
    let inProgressText = '';

-    // Step-granular durability (#183): create the assistant row UPFRONT in the
-    // 'streaming' state (before any token), then UPDATE it as each step finishes
-    // and finalize it once on the terminal callback. If the process dies
-    // mid-turn the row survives with every finished step already persisted; the
-    // startup sweep (sweepStreaming) later flips a dangling 'streaming' row to
-    // 'aborted'. The DB is now the single source of truth for the turn — the
-    // socket is never required for the write path. A failed upfront insert is
-    // logged and leaves assistantId undefined; the per-step/terminal updates then
-    // no-op (guarded below) so the turn still streams to the user.
-    let assistantId: string | undefined;
-    try {
-      const seed = flushAssistant([], '', 'streaming');
-      const seeded = await this.aiChatMessageRepo.insert({
-        chatId,
-        workspaceId: workspace.id,
-        userId: user.id,
-        role: 'assistant',
-        content: seed.content,
-        // jsonb columns: cast through never (same as the user insert above).
-        toolCalls: (seed.toolCalls ?? null) as never,
-        metadata: seed.metadata as never,
-        status: seed.status,
-      });
-      assistantId = seeded?.id;
-    } catch (err) {
-      this.logger.error(
-        `Failed to insert upfront assistant row (chat ${chatId}, workspace ${workspace.id})`,
-        err as Error,
-      );
-    }
-
-    // Per-step (non-terminal) update: persist the finished steps the moment a
-    // step ends. Tolerant — a failed update is logged and swallowed so it never
-    // throws into the stream. Keeps status 'streaming'.
-    const updateStreaming = async (): Promise<void> => {
-      if (!assistantId) return;
-      // Cheap short-circuit once the turn is finalized (see `finalized` below).
-      // The AUTHORITATIVE guard is `onlyIfStreaming` on the UPDATE: a late
-      // fire-and-forget step update could still be in flight on another pool
-      // connection when finalize runs, so the SQL `WHERE status='streaming'`
-      // (not this flag) is what prevents it clobbering the terminal row.
-      if (finalized) return;
-      try {
-        await this.aiChatMessageRepo.update(
-          assistantId,
-          workspace.id,
-          flushAssistant(capturedSteps, '', 'streaming'),
-          { onlyIfStreaming: true },
-        );
-      } catch (err) {
-        this.logger.warn(
-          `Failed to update streaming assistant row: ${
-            err instanceof Error ? err.message : 'unknown error'
-          }`,
-        );
-      }
-    };
-
-    // Serialize the per-step updates (#183 review): onStepFinish fires them
-    // without await, so two could otherwise commit out of order on different pool
-    // connections (step N landing after N+1). Chaining each onto the previous
-    // keeps the persisted row monotonic with step order; each link short-circuits
-    // on `finalized`, so a tail of late updates is cheap.
-    let stepUpdateChain: Promise<void> = Promise.resolve();
-
-    // Terminal finalize: write the completed/error/aborted row exactly once
-    // across the (mutually-exclusive, at-most-once) onFinish/onError/onAbort
-    // callbacks — mirroring the pre-#183 persist-at-most-once guard for the
-    // TERMINAL status (the row may be updated many times with 'streaming' before
-    // this fires once).
-    let finalized = false;
-    const finalizeAssistant = async (
-      flushed: AssistantFlush,
-    ): Promise<void> => {
-      if (finalized) return;
-      finalized = true;
-      const plan = planFinalizeAssistant(assistantId);
-      try {
-        // Shared dispatch (see applyFinalize): UPDATE the upfront row, or — when
-        // the upfront insert failed (kind 'insert') — INSERT the terminal row as
-        // the only safety against losing the turn entirely.
-        await applyFinalize(
-          this.aiChatMessageRepo,
-          plan,
-          { chatId, workspaceId: workspace.id, userId: user.id },
-          flushed,
-        );
-      } catch (err) {
-        this.logger.error(
-          `Failed to finalize assistant message (kind=${plan.kind})`,
-          err as Error,
-        );
-      }
-    };
-
    // DIAGNOSTIC (Safari stream-drop investigation) — temporary. Measure
    // first-chunk latency, the model-silent gap right before a disconnect, and
    // how many SSE heartbeats were written, so a Safari drop can be classified
@@ -544,169 +395,146 @@ export class AiChatService implements OnModuleInit {
    let result: ReturnType<typeof streamText>;
    try {
      result = streamText({
-        model,
-        system,
-        messages,
-        tools,
-        // No maxOutputTokens cap on the agent: tool-call arguments (e.g. a full
-        // page body for the write tools) are emitted as OUTPUT tokens, so a fixed
-        // cap would truncate complex tool calls mid-argument. Let the model use its
-        // natural per-step budget. (Cost/credit limits are an account concern, not
-        // something to enforce by silently breaking the agent.)
-        stopWhen: stepCountIs(MAX_AGENT_STEPS),
-        // Forced finalization: reserve the LAST allowed step for a text-only
-        // answer. Without this, a turn that spends all its steps on tool calls
-        // ends with no assistant text (an empty turn). prepareAgentStep forbids
-        // further tool calls and appends a synthesis instruction on that step,
-        // concatenated onto the original `system` so the persona is preserved.
-        prepareStep: ({ stepNumber }) => prepareAgentStep(stepNumber, system),
-        abortSignal: signal,
-        onChunk: ({ chunk }) => {
-          // DIAGNOSTIC (Safari stream-drop investigation) — temporary. Any model
-          // output chunk means the stream is actively emitting bytes; track first
-          // + most-recent activity timestamps.
-          const now = Date.now();
-          firstModelChunkAt ??= now;
-          lastModelChunkAt = now;
-          // 'text-delta' is the assistant's prose; tool-call args are separate chunk
-          // types — so this mirrors exactly what streams to the client.
-          if (chunk.type === 'text-delta') inProgressText += chunk.text;
-        },
-        onStepFinish: (step) => {
-          // The finished step's full text is now in `step.text`; fold it in and reset
-          // the in-progress accumulator for the next step.
-          capturedSteps.push(step as StepLike);
-          inProgressText = '';
-          // Step-granular durability (#183): persist this finished step (its text +
-          // tool calls + tool RESULTS) the moment it ends, so a process death after
-          // this point still recovers the step. Not awaited here (never block the
-          // stream), but SERIALIZED via stepUpdateChain so the writes commit in
-          // step order; updateStreaming is error-tolerant (logs + swallows).
-          stepUpdateChain = stepUpdateChain.then(() => updateStreaming());
-        },
-        onFinish: async ({ text, finishReason, totalUsage, usage, steps }) => {
-          // DIAGNOSTIC (Safari stream-drop investigation) — temporary: success
-          // baseline for Safari comparison.
-          const diagNow = Date.now();
-          this.logger.log(
-            `AI chat stream DIAGNOSTIC (finish): elapsed=${diagNow - streamStartedAt}ms ` +
-              `firstChunkLatency=${firstModelChunkAt ? firstModelChunkAt - streamStartedAt : 'none'}ms ` +
-              `heartbeatsSent=${heartbeatsSent} steps=${steps.length}`,
-          );
-          // Finalize the assistant row (#183): the upfront 'streaming' row is
-          // UPDATEd to 'completed' with the turn's final text, cumulative usage and
-          // full UIMessage parts. We pass the SDK `steps` (which carry the final
-          // step's text) as the captured steps so metadata.parts matches the
-          // pre-#183 onFinish record exactly; `inProgressText` is '' here (the last
-          // step already finished). Final-step usage (usage.input+output) ≈ the
-          // conversation's CURRENT context size, distinct from totalUsage.
-          //
-          // COLUMN-SEMANTICS NOTE (#183): `content` is built by flushAssistant as
-          // the CONCATENATION of every step's text (stepsText), whereas pre-#183
-          // it stored only the FINAL step's text. This is a deliberate, harmless
-          // change: the UI and the Markdown export render from `metadata.parts`
-          // (per-step text + tool parts), not from `content`; `content` is the
-          // plain-text projection (full-text search / fallback). A multi-step
-          // turn's `content` therefore now holds all steps' prose, not just the
-          // last block.
-          await finalizeAssistant(
-            flushAssistant(steps as StepLike[], '', 'completed', {
-              finishReason: finishReason as string,
-              usage: totalUsage as StreamUsage,
-              contextTokens:
-                (usage?.inputTokens ?? 0) + (usage?.outputTokens ?? 0) ||
-                undefined,
-              // Admin-configured context-window size for this model (badge max).
-              // Resolved once per turn above; written to metadata only when > 0.
-              maxContextTokens: resolved?.chatContextWindow,
-            }),
-          );
-          // Lifecycle: release the external MCP clients leased for this turn.
-          await closeExternalClients();
+      model,
+      system,
+      messages,
+      tools,
+      // No maxOutputTokens cap on the agent: tool-call arguments (e.g. a full
+      // page body for the write tools) are emitted as OUTPUT tokens, so a fixed
+      // cap would truncate complex tool calls mid-argument. Let the model use its
+      // natural per-step budget. (Cost/credit limits are an account concern, not
+      // something to enforce by silently breaking the agent.)
+      stopWhen: stepCountIs(MAX_AGENT_STEPS),
+      // Forced finalization: reserve the LAST allowed step for a text-only
+      // answer. Without this, a turn that spends all its steps on tool calls
+      // ends with no assistant text (an empty turn). prepareAgentStep forbids
+      // further tool calls and appends a synthesis instruction on that step,
+      // concatenated onto the original `system` so the persona is preserved.
+      prepareStep: ({ stepNumber }) => prepareAgentStep(stepNumber, system),
+      abortSignal: signal,
+      onChunk: ({ chunk }) => {
+        // DIAGNOSTIC (Safari stream-drop investigation) — temporary. Any model
+        // output chunk means the stream is actively emitting bytes; track first
+        // + most-recent activity timestamps.
+        const now = Date.now();
+        firstModelChunkAt ??= now;
+        lastModelChunkAt = now;
+        // 'text-delta' is the assistant's prose; tool-call args are separate chunk
+        // types — so this mirrors exactly what streams to the client.
+        if (chunk.type === 'text-delta') inProgressText += chunk.text;
+      },
+      onStepFinish: (step) => {
+        // The finished step's full text is now in `step.text`; fold it in and reset
+        // the in-progress accumulator for the next step.
+        capturedSteps.push(step as StepLike);
+        inProgressText = '';
+      },
+      onFinish: async ({ text, finishReason, totalUsage, usage, steps }) => {
+        // DIAGNOSTIC (Safari stream-drop investigation) — temporary: success
+        // baseline for Safari comparison.
+        const diagNow = Date.now();
+        this.logger.log(
+          `AI chat stream DIAGNOSTIC (finish): elapsed=${diagNow - streamStartedAt}ms ` +
+            `firstChunkLatency=${firstModelChunkAt ? firstModelChunkAt - streamStartedAt : 'none'}ms ` +
+            `heartbeatsSent=${heartbeatsSent} steps=${steps.length}`,
+        );
+        await persistAssistant({
+          text,
+          toolCalls: serializeSteps(steps),
+          metadata: {
+            finishReason,
+            // Persist the turn's cumulative usage WITH reasoning tokens resolved
+            // from either the new `outputTokenDetails` or the deprecated top-level
+            // field, so reopened history / the Markdown export show the thinking
+            // token cost too.
+            usage: normalizeStreamUsage(totalUsage as StreamUsage) ?? totalUsage,
+            // Final-step usage = the context actually fed to the model on the last LLM
+            // call (full history + tool results) plus the answer it just generated.
+            // input+output of the FINAL step ≈ the conversation's CURRENT context size,
+            // distinct from totalUsage which sums every step (cumulative tokens spent).
+            contextTokens:
+              (usage?.inputTokens ?? 0) + (usage?.outputTokens ?? 0) || undefined,
+            // Persist the FULL set of UIMessage parts for the turn (text +
+            // tool-call/result), so the rebuilt history replays prior tool
+            // context to the model on later turns.
+            parts: assistantParts(steps, text),
+          },
+        });
+        // Lifecycle: release the external MCP clients leased for this turn.
+        await closeExternalClients();

-          // Generate the chat title for a freshly created chat AFTER the stream's
-          // provider call has completed — NOT concurrently with it. The z.ai coding
-          // endpoint stalls one of two concurrent requests to the same plan, which
-          // black-holed the chat stream (~300s headers timeout) when title
-          // generation raced it. Running it here (solo, fire-and-forget) avoids the
-          // race; never block the turn on it, swallow any error.
-          if (isNewChat && incomingText) {
-            void this.generateTitle(chatId, workspace.id, incomingText).catch(
-              (err) => {
-                this.logger.warn(
-                  `Title generation failed: ${(err as Error)?.message ?? err}`,
-                );
-              },
-            );
-          }
-        },
-        onError: async ({ error }) => {
-          // NestJS Logger.error(message, stack?, context?): pass the real message
-          // (with statusCode when present) + the stack string, not the Error
-          // object, so the actual provider cause is clearly logged. Reuse the
-          // shared formatter so provider error formatting stays unified.
-          const e = error as { stack?: string };
-          const errorText = describeProviderError(error, String(error));
-          this.logger.error(`AI chat stream error: ${errorText}`, e?.stack);
-          // DIAGNOSTIC (Safari stream-drop investigation) — temporary: timing of
-          // an error-terminated stream.
-          const diagNow = Date.now();
-          this.logger.warn(
-            `AI chat stream DIAGNOSTIC (error): elapsed=${diagNow - streamStartedAt}ms ` +
-              `firstChunkLatency=${firstModelChunkAt ? firstModelChunkAt - streamStartedAt : 'none'}ms ` +
-              `silentGapBeforeDrop=${diagNow - lastModelChunkAt}ms heartbeatsSent=${heartbeatsSent}`,
+        // Generate the chat title for a freshly created chat AFTER the stream's
+        // provider call has completed — NOT concurrently with it. The z.ai coding
+        // endpoint stalls one of two concurrent requests to the same plan, which
+        // black-holed the chat stream (~300s headers timeout) when title
+        // generation raced it. Running it here (solo, fire-and-forget) avoids the
+        // race; never block the turn on it, swallow any error.
+        if (isNewChat && incomingText) {
+          void this.generateTitle(chatId, workspace.id, incomingText).catch(
+            (err) => {
+              this.logger.warn(
+                `Title generation failed: ${(err as Error)?.message ?? err}`,
+              );
+            },
          );
-          // Finalize the PARTIAL answer streamed before the failure (text + any
-          // finished tool steps) WITH the error in metadata, so the turn shows what
-          // the user already saw plus the cause — not just a bare error. Status
-          // 'error' (#183).
-          await finalizeAssistant(
-            flushAssistant(capturedSteps, inProgressText, 'error', {
-              error: errorText,
-            }),
-          );
-          await closeExternalClients();
-        },
-        onAbort: async ({ steps }) => {
-          const partialChars =
-            capturedSteps.reduce((n, s) => n + (s.text?.length ?? 0), 0) +
-            inProgressText.length;
-          // Unlike onError/onFinish, this terminal path otherwise writes nothing, so
-          // an aborted turn (client disconnect / proxy drop / stop()) would be
-          // invisible in the logs. Log it (warn) so the abort is traceable.
-          this.logger.warn(
-            `AI chat stream aborted (chat ${chatId}) after ${steps.length} ` +
-              `step(s), ${partialChars} chars partial text; persisting partial turn.`,
-          );
-          // DIAGNOSTIC (Safari stream-drop investigation) — temporary: THE key
-          // line — classifies the Safari drop.
-          const diagNow = Date.now();
-          this.logger.warn(
-            `AI chat stream DIAGNOSTIC (abort/disconnect): elapsed=${diagNow - streamStartedAt}ms ` +
-              `firstChunkLatency=${firstModelChunkAt ? firstModelChunkAt - streamStartedAt : 'none'}ms ` +
-              `silentGapBeforeDrop=${diagNow - lastModelChunkAt}ms heartbeatsSent=${heartbeatsSent} ` +
-              `steps=${steps.length}`,
-          );
-          await finalizeAssistant(
-            flushAssistant(capturedSteps, inProgressText, 'aborted'),
-          );
-          await closeExternalClients();
-        },
+        }
+      },
+      onError: async ({ error }) => {
+        // NestJS Logger.error(message, stack?, context?): pass the real message
+        // (with statusCode when present) + the stack string, not the Error
+        // object, so the actual provider cause is clearly logged. Reuse the
+        // shared formatter so provider error formatting stays unified.
+        const e = error as { stack?: string };
+        const errorText = describeProviderError(error, String(error));
+        this.logger.error(`AI chat stream error: ${errorText}`, e?.stack);
+        // DIAGNOSTIC (Safari stream-drop investigation) — temporary: timing of
+        // an error-terminated stream.
+        const diagNow = Date.now();
+        this.logger.warn(
+          `AI chat stream DIAGNOSTIC (error): elapsed=${diagNow - streamStartedAt}ms ` +
+            `firstChunkLatency=${firstModelChunkAt ? firstModelChunkAt - streamStartedAt : 'none'}ms ` +
+            `silentGapBeforeDrop=${diagNow - lastModelChunkAt}ms heartbeatsSent=${heartbeatsSent}`,
+        );
+        // Persist the PARTIAL answer streamed before the failure (text + any
+        // finished tool steps) WITH the error in metadata, so the turn shows what
+        // the user already saw plus the cause — not just a bare error.
+        await persistAssistant(
+          buildPartialAssistantRecord(
+            capturedSteps,
+            inProgressText,
+            'error',
+            errorText,
+          ),
+        );
+        await closeExternalClients();
+      },
+      onAbort: async ({ steps }) => {
+        const partialChars =
+          capturedSteps.reduce((n, s) => n + (s.text?.length ?? 0), 0) +
+          inProgressText.length;
+        // Unlike onError/onFinish, this terminal path otherwise writes nothing, so
+        // an aborted turn (client disconnect / proxy drop / stop()) would be
+        // invisible in the logs. Log it (warn) so the abort is traceable.
+        this.logger.warn(
+          `AI chat stream aborted (chat ${chatId}) after ${steps.length} ` +
+            `step(s), ${partialChars} chars partial text; persisting partial turn.`,
+        );
+        // DIAGNOSTIC (Safari stream-drop investigation) — temporary: THE key
+        // line — classifies the Safari drop.
+        const diagNow = Date.now();
+        this.logger.warn(
+          `AI chat stream DIAGNOSTIC (abort/disconnect): elapsed=${diagNow - streamStartedAt}ms ` +
+            `firstChunkLatency=${firstModelChunkAt ? firstModelChunkAt - streamStartedAt : 'none'}ms ` +
+            `silentGapBeforeDrop=${diagNow - lastModelChunkAt}ms heartbeatsSent=${heartbeatsSent} ` +
+            `steps=${steps.length}`,
+        );
+        await persistAssistant(
+          buildPartialAssistantRecord(capturedSteps, inProgressText, 'aborted'),
+        );
+        await closeExternalClients();
+      },
      });

-      // Drain the stream independently of the client socket so the turn always
-      // runs to completion (or to its abort) and the terminal callbacks
-      // (onFinish/onError/onAbort) fire — releasing the per-turn object graph
-      // (history, the per-request toolset closures, captured steps, SDK buffers)
-      // and closing leased MCP clients. WITHOUT this, a client disconnect leaves
-      // the pipe's dead socket as the only reader; backpressure stalls the stream,
-      // the callbacks never run, and every dropped turn stays rooted in memory —
-      // the heap-OOM leak. consumeStream removes that backpressure (AI SDK v6
-      // "Handling client disconnects"). NOT awaited (fire-and-forget); the stream
-      // errors are already logged by the streamText `onError` callback above, so
-      // swallow here to avoid an unhandledRejection.
-      void result.consumeStream({ onError: () => undefined });
-
      // Stream the UI-message protocol straight to the hijacked Node response.
      // Without onError the AI SDK masks the cause ('An error occurred.') and the
      // UI shows a generic failure. Surface the real provider message instead.
@@ -811,10 +639,7 @@ export class AiChatService implements OnModuleInit {
        'punctuation at the end.',
      prompt: firstMessage.slice(0, 2000),
    });
-    const title = text
-      .trim()
-      .replace(/^["']|["']$/g, '')
-      .slice(0, 120);
+    const title = text.trim().replace(/^["']|["']$/g, '').slice(0, 120);
    if (title) {
      await this.aiChatRepo.update(chatId, { title }, workspaceId);
    }
@@ -1137,139 +962,38 @@ export function rowToUiMessage(row: AiChatMessage): Omit<UIMessage, 'id'> & {
 }

 /**
- * The persisted-row patch shape produced by {@link flushAssistant}. It is the
- * SAME shape the assistant repo insert/update consume (content + toolCalls +
- * metadata) plus the lifecycle `status` column added in #183.
+ * Build the assistant-message record persisted on a partial/failed turn (the
+ * streamText onError / onAbort paths). Captures the partial answer the user
+ * already saw: each finished step's text + tool parts (via assistantParts),
+ * then the in-progress step's text appended last. When `errorText` is provided
+ * it is recorded in metadata.error so the cause shows in history; an aborted
+ * turn passes none. Pure, so the partial-recording shape is unit-testable
+ * without seaming streamText.
 */
-export interface AssistantFlush {
-  content: string;
-  toolCalls: unknown;
-  metadata: Record<string, unknown>;
-  status: 'streaming' | 'completed' | 'error' | 'aborted';
-}
-
-/**
- * Pure decision for the terminal finalize (#183): given whether the upfront
- * assistant row exists (`assistantId`), choose whether the terminal payload is
- * written by UPDATEing that row or — when the upfront insert failed and there is
- * no id — by INSERTing a fresh terminal row so the turn is not lost entirely.
- * Returns `{ kind: 'update', id }` or `{ kind: 'insert' }`. Extracted so the
- * fallback-insert branch (the only safety against losing a turn whose upfront
- * insert failed) is unit-testable without seaming streamText.
- */
-export function planFinalizeAssistant(
-  assistantId: string | undefined,
-): { kind: 'update'; id: string } | { kind: 'insert' } {
-  return assistantId ? { kind: 'update', id: assistantId } : { kind: 'insert' };
-}
-
-/** The repo surface the terminal finalize needs (structural — the real repo and
- *  a test mock both satisfy it). */
-export interface FinalizeRepo {
-  insert(insertable: Record<string, unknown>): Promise<unknown>;
-  update(
-    id: string,
-    workspaceId: string,
-    patch: AssistantFlush,
-  ): Promise<unknown>;
-}
-
-/**
- * Apply a finalize `plan` to the repo with the terminal `flushed` payload (#183):
- * UPDATE the upfront row, or INSERT a fresh terminal row as the fallback when the
- * upfront insert failed. The SINGLE dispatch shared by the service's
- * finalizeAssistant and its test, so the test exercises the real path instead of
- * a copy (#186 review). Pure of error handling — the caller wraps it.
- */
-export async function applyFinalize(
-  repo: FinalizeRepo,
-  plan: { kind: 'update'; id: string } | { kind: 'insert' },
-  base: { chatId: string; workspaceId: string; userId: string },
-  flushed: AssistantFlush,
-): Promise<void> {
-  if (plan.kind === 'update') {
-    await repo.update(plan.id, base.workspaceId, flushed);
-    return;
-  }
-  await repo.insert({
-    chatId: base.chatId,
-    workspaceId: base.workspaceId,
-    userId: base.userId,
-    role: 'assistant',
-    content: flushed.content,
-    toolCalls: flushed.toolCalls ?? null,
-    metadata: flushed.metadata,
-    status: flushed.status,
-  });
-}
-
-/**
- * PURE assistant-row builder (#183 step-granular durability). Given the turn's
- * accumulated steps + the in-progress (not-yet-finished) text + the lifecycle
- * status, it returns the row patch to persist. The SAME path runs for the
- * upfront insert (empty steps, status 'streaming'), every per-step update, and
- * the terminal finalize (completed/error/aborted) — and a future background
- * worker can call it identically, so it must stay a pure function of its inputs
- * (NO `this`, no IO).
- *
- * `metadata.parts` is built by assistantParts over the finished steps, then the
- * in-progress text appended as a trailing text part, so rowToUiMessage /
- * findRecent keep replaying the turn unchanged. `metadata.finishReason`,
- * `metadata.error`, `metadata.usage` and `metadata.contextTokens` are attached
- * only when provided/relevant, matching the pre-#183 onFinish/onError records.
- */
-export function flushAssistant(
-  capturedSteps: ReadonlyArray<StepLike> | undefined,
+export function buildPartialAssistantRecord(
+  steps: ReadonlyArray<StepLike> | undefined,
  inProgressText: string,
-  status: 'streaming' | 'completed' | 'error' | 'aborted',
-  extra?: {
-    finishReason?: string;
-    usage?: ChatStreamUsage | StreamUsage | undefined;
-    contextTokens?: number;
-    // Admin-configured context-window size (tokens) for this turn's model; the
-    // denominator of the client's "current / max" header badge. Written only
-    // when > 0 (0/unset = no limit known → the badge shows current only).
-    maxContextTokens?: number;
-    error?: string;
-  },
-): AssistantFlush {
-  const finished = capturedSteps ?? [];
+  finishReason: 'error' | 'aborted',
+  errorText?: string,
+): { text: string; toolCalls: unknown; metadata: Record<string, unknown> } {
+  const finished = steps ?? [];
  const stepsText = finished.map((s) => s.text ?? '').join('');
  const trailing = inProgressText ?? '';
  // assistantParts emits text parts only for FINISHED steps; append the
-  // in-progress step's text (the partial answer cut off by an error/abort, or
-  // simply not yet flushed mid-stream) as the last text part so the persisted
-  // parts match what streamed to the client.
+  // in-progress step's text (the answer cut off by the error) as the last text
+  // part so the persisted parts match what streamed to the client.
  const parts = assistantParts(finished, '') as unknown as Array<
    Record<string, unknown>
  >;
  if (trailing) parts.push({ type: 'text', text: trailing });
-
-  const metadata: Record<string, unknown> = {
-    parts: parts as unknown as UIMessage['parts'],
-  };
-  // finishReason: prefer an explicit one; else derive a sensible value from the
-  // terminal status (so onError/onAbort records keep their historical reason).
-  if (extra?.finishReason) {
-    metadata.finishReason = extra.finishReason;
-  } else if (status === 'error' || status === 'aborted') {
-    metadata.finishReason = status;
-  }
-  if (extra?.usage !== undefined) {
-    metadata.usage =
-      normalizeStreamUsage(extra.usage as StreamUsage) ?? extra.usage;
-  }
-  if (extra?.contextTokens) metadata.contextTokens = extra.contextTokens;
-  if (extra?.maxContextTokens && extra.maxContextTokens > 0) {
-    metadata.maxContextTokens = extra.maxContextTokens;
-  }
-  if (extra?.error) metadata.error = extra.error;
-
  return {
-    content: stepsText + trailing,
+    text: stepsText + trailing,
    toolCalls: serializeSteps(finished),
-    metadata,
-    status,
+    metadata: {
+      finishReason,
+      parts: parts as unknown as UIMessage['parts'],
+      ...(errorText ? { error: errorText } : {}),
+    },
  };
 }

--- a/apps/server/src/core/ai-chat/chat-markdown.util.spec.ts
+++ b/apps/server/src/core/ai-chat/chat-markdown.util.spec.ts
@@ -1,295 +0,0 @@
-import { buildChatMarkdown, normalizeLang } from './chat-markdown.util';
-import type { AiChatMessage } from '@docmost/db/types/entity.types';
-
-/**
- * normalizeLang: the client sends `i18n.language` — a FULL locale tag like
- * 'en-US' / 'ru-RU', NOT a bare 'en'/'ru'. A `@IsIn(['en','ru'])` DTO rejected
- * that with a 400 (caught in real-browser testing); the export now accepts any
- * string and normalizes here. Guards that regression.
- */
-describe('normalizeLang', () => {
-  it("maps any 'ru…' locale tag to ru", () => {
-    expect(normalizeLang('ru')).toBe('ru');
-    expect(normalizeLang('ru-RU')).toBe('ru');
-    expect(normalizeLang('RU-ru')).toBe('ru');
-  });
-
-  it('maps everything else (incl. region-qualified English) to en', () => {
-    expect(normalizeLang('en')).toBe('en');
-    expect(normalizeLang('en-US')).toBe('en');
-    expect(normalizeLang('fr-FR')).toBe('en');
-    expect(normalizeLang(undefined)).toBe('en');
-    expect(normalizeLang('')).toBe('en');
-  });
-});
-
-/**
- * Unit tests for the SERVER Markdown export (#183). Mirrors the coverage of the
- * (now-removed) client chat-markdown tests: heading/metadata, role labels, text
- * + tool blocks, token footers, the interrupted-turn note, and NULL-status
- * (legacy) rows. The export embeds a live `new Date().toISOString()` timestamp;
- * we never assert it, only the deterministic structure.
- */
-
-function row(partial: Partial<AiChatMessage>): AiChatMessage {
-  return {
-    id: partial.id ?? 'id',
-    chatId: partial.chatId ?? 'chat-1',
-    workspaceId: partial.workspaceId ?? 'ws-1',
-    userId: partial.userId ?? null,
-    role: partial.role ?? 'user',
-    content: partial.content ?? null,
-    toolCalls: partial.toolCalls ?? null,
-    metadata: partial.metadata ?? null,
-    status: partial.status ?? null,
-    createdAt: partial.createdAt ?? ('2026-06-21T00:00:00.000Z' as never),
-    updatedAt: partial.updatedAt ?? ('2026-06-21T00:00:00.000Z' as never),
-    deletedAt: partial.deletedAt ?? null,
-  } as AiChatMessage;
-}
-
-describe('buildChatMarkdown (server) — structure', () => {
-  it('emits the title heading, chat id and message count', () => {
-    const md = buildChatMarkdown({
-      title: 'My chat',
-      chatId: 'chat-123',
-      rows: [],
-    });
-    expect(md).toContain('# My chat');
-    expect(md).toContain('- Chat ID: `chat-123`');
-    expect(md).toContain('- Messages: 0');
-  });
-
-  it('falls back to "Untitled chat" with no title (en)', () => {
-    const md = buildChatMarkdown({ title: null, chatId: 'c', rows: [] });
-    expect(md).toContain('# Untitled chat');
-  });
-
-  it('localizes fixed labels with lang=ru (structure stays English)', () => {
-    const md = buildChatMarkdown({
-      title: null,
-      chatId: 'c',
-      lang: 'ru',
-      rows: [row({ role: 'assistant', content: 'hi' })],
-    });
-    expect(md).toContain('# Без названия');
-    expect(md).toContain('## 1. ИИ-агент');
-    // Structural words remain English.
-    expect(md).toContain('- Chat ID:');
-  });
-
-  it('numbers messages and labels roles (You / AI agent)', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({ role: 'user', content: 'question' }),
-        row({ role: 'assistant', content: 'answer' }),
-      ],
-    });
-    expect(md).toContain('## 1. You');
-    expect(md).toContain('question');
-    expect(md).toContain('## 2. AI agent');
-    expect(md).toContain('answer');
-  });
-
-  it('renders a tool part with fenced input/output and the friendly label', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({
-          role: 'assistant',
-          content: 'done',
-          metadata: {
-            parts: [
-              {
-                type: 'tool-getPage',
-                state: 'output-available',
-                input: { id: 'p1' },
-                output: { title: 'Hello' },
-              },
-              { type: 'text', text: 'done' },
-            ],
-          } as never,
-        }),
-      ],
-    });
-    expect(md).toContain('**Tool: Read page** (`getPage`) — done');
-    expect(md).toContain('Input:');
-    expect(md).toContain('"id": "p1"');
-    expect(md).toContain('Output:');
-    expect(md).toContain('"title": "Hello"');
-  });
-
-  // #186 re-review pt 1: restore the parity coverage of the removed client spec —
-  // error state, unknown-tool fallback (en + ru), and the circular-stringify catch.
-  it('renders a tool part in the error state with its errorText', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({
-          role: 'assistant',
-          metadata: {
-            parts: [
-              {
-                type: 'tool-getPage',
-                state: 'output-error',
-                input: { id: 'p1' },
-                errorText: 'page not found',
-              },
-            ],
-          } as never,
-        }),
-      ],
-    });
-    expect(md).toContain('**Tool: Read page** (`getPage`) — error');
-    expect(md).toContain('**Error:** page not found');
-  });
-
-  it('falls back to "Ran tool <name>" for an unknown tool (en) and the ru variant', () => {
-    const parts = [
-      {
-        type: 'tool-mysteryTool',
-        state: 'output-available',
-        output: { ok: 1 },
-      },
-    ];
-    const en = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [row({ role: 'assistant', metadata: { parts } as never })],
-    });
-    expect(en).toContain('**Tool: Ran tool mysteryTool** (`mysteryTool`)');
-    const ru = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      lang: 'ru',
-      rows: [row({ role: 'assistant', metadata: { parts } as never })],
-    });
-    expect(ru).toContain('Выполнил инструмент mysteryTool');
-  });
-
-  it('does not throw on a circular tool output (falls back to String)', () => {
-    const circular: Record<string, unknown> = {};
-    circular.self = circular;
-    expect(() =>
-      buildChatMarkdown({
-        title: 'T',
-        chatId: 'c',
-        rows: [
-          row({
-            role: 'assistant',
-            metadata: {
-              parts: [
-                {
-                  type: 'tool-getPage',
-                  state: 'output-available',
-                  output: circular,
-                },
-              ],
-            } as never,
-          }),
-        ],
-      }),
-    ).not.toThrow();
-  });
-
-  it('emits a token footer + total when usage is present', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({
-          role: 'assistant',
-          content: 'a',
-          metadata: {
-            usage: {
-              inputTokens: 100,
-              outputTokens: 20,
-              totalTokens: 120,
-              reasoningTokens: 8,
-            },
-          } as never,
-        }),
-      ],
-    });
-    expect(md).toContain('- Total tokens: 120');
-    expect(md).toContain(
-      '_Tokens — in: 100, out: 20, reasoning: 8, total: 120_',
-    );
-  });
-
-  it('flags a still-streaming (interrupted) row', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({ role: 'assistant', content: 'partial', status: 'streaming' }),
-      ],
-    });
-    expect(md).toContain('still being generated');
-  });
-
-  it('does NOT flag a completed row', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [row({ role: 'assistant', content: 'final', status: 'completed' })],
-    });
-    expect(md).not.toContain('still being generated');
-  });
-
-  it('renders a legacy NULL-status row (no parts) from plain content', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({ role: 'assistant', content: 'legacy answer', status: null }),
-      ],
-    });
-    expect(md).toContain('legacy answer');
-    expect(md).not.toContain('still being generated');
-  });
-
-  it('renders a persisted error', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({
-          role: 'assistant',
-          content: '',
-          status: 'error',
-          metadata: { error: '401: Unauthorized' } as never,
-        }),
-      ],
-    });
-    expect(md).toContain('**⚠️ Error:** 401: Unauthorized');
-  });
-
-  it('escapes embedded triple-backtick fences with a longer delimiter', () => {
-    const md = buildChatMarkdown({
-      title: 'T',
-      chatId: 'c',
-      rows: [
-        row({
-          role: 'assistant',
-          content: 'x',
-          metadata: {
-            parts: [
-              {
-                type: 'tool-getPage',
-                state: 'output-available',
-                output: '```inner```',
-              },
-            ],
-          } as never,
-        }),
-      ],
-    });
-    // A 4-backtick fence wraps content that itself contains a 3-backtick run.
-    expect(md).toContain('````');
-  });
-});
--- a/apps/server/src/core/ai-chat/chat-markdown.util.ts
+++ b/apps/server/src/core/ai-chat/chat-markdown.util.ts
@@ -1,299 +0,0 @@
-/**
- * Server-side Markdown export for an AI agent chat (#183). The DB is the single
- * source of truth: this renders a chat purely from its persisted message rows
- * (`AiChatMessage[]` — role / content / metadata.parts / toolCalls / usage).
- * Because the assistant row is now persisted UPFRONT and updated per step, an
- * interrupted turn is included up to its last finished step.
- *
- * Ported from the client `utils/chat-markdown.ts`. It is a PURE function (apart
- * from `new Date()` for the export timestamp), so it is straightforward to
- * unit-test and a future background worker can reuse it.
- *
- * Only a few fixed role/tool labels are localized via the `lang` param; the
- * structural document words (Input/Output/Error/Tokens/...) stay English because
- * the output is a technical artifact.
- */
-
-import type { AiChatMessage } from '@docmost/db/types/entity.types';
-
-/** Supported export label languages. Defaults to English. */
-export type ExportLang = 'en' | 'ru';
-
-/**
- * Normalize an arbitrary client locale code to a supported export language. The
- * client sends `i18n.language`, which is a FULL locale tag (e.g. `en-US`,
- * `ru-RU`), not a bare `en`/`ru` — so match on the language subtag and fall back
- * to English for anything non-Russian.
- */
-export function normalizeLang(lang?: string): ExportLang {
-  return lang?.toLowerCase().startsWith('ru') ? 'ru' : 'en';
-}
-
-/** A single AI SDK UIMessage part (text part or a tool part). */
-interface ExportPart {
-  type: string;
-  text?: string;
-  state?: string;
-  toolName?: string;
-  input?: unknown;
-  output?: unknown;
-  errorText?: string;
-}
-
-/** Authoritative per-turn usage the server attaches to a message row. */
-interface UsageLike {
-  inputTokens?: number;
-  outputTokens?: number;
-  totalTokens?: number;
-  reasoningTokens?: number;
-}
-
-/** Localized label table. The client-side Markdown builder was removed by #183
- *  (the export is now server-side only), so this no longer mirrors a second
- *  exporter — instead the tool-action labels are kept in parity with the
- *  on-screen action-log labels in the client's `tool-parts.tsx` (`toolLabelKey`)
- *  so the export reads the same as the UI. Only role + tool-action labels are
- *  localized; everything structural is an English constant in the renderer. */
-const LABELS: Record<
-  ExportLang,
-  {
-    untitled: string;
-    aiAgent: string;
-    you: string;
-    tools: Record<string, string>;
-    ranTool: (name: string) => string;
-    stillGenerating: string;
-  }
-> = {
-  en: {
-    untitled: 'Untitled chat',
-    aiAgent: 'AI agent',
-    you: 'You',
-    tools: {
-      searchPages: 'Searched pages',
-      getPage: 'Read page',
-      createPage: 'Created page',
-      updatePageContent: 'Updated page',
-      renamePage: 'Renamed page',
-      movePage: 'Moved page',
-      deletePage: 'Deleted page (to trash)',
-      createComment: 'Commented',
-      resolveComment: 'Resolved comment',
-    },
-    ranTool: (name) => `Ran tool ${name}`,
-    stillGenerating:
-      'This message is still being generated — the export captured a partial, in-progress response.',
-  },
-  ru: {
-    untitled: 'Без названия',
-    aiAgent: 'ИИ-агент',
-    you: 'Вы',
-    tools: {
-      searchPages: 'Искал по страницам',
-      getPage: 'Прочитал страницу',
-      createPage: 'Создал страницу',
-      updatePageContent: 'Обновил страницу',
-      renamePage: 'Переименовал страницу',
-      movePage: 'Переместил страницу',
-      deletePage: 'Удалил страницу (в корзину)',
-      createComment: 'Прокомментировал',
-      resolveComment: 'Закрыл комментарий',
-    },
-    ranTool: (name) => `Выполнил инструмент ${name}`,
-    stillGenerating:
-      'Это сообщение всё ещё генерируется — экспорт захватил частичный, незавершённый ответ.',
-  },
-};
-
-/** True for AI SDK tool parts (static `tool-*` or `dynamic-tool`). */
-function isToolPart(type: string): boolean {
-  return type.startsWith('tool-') || type === 'dynamic-tool';
-}
-
-/** Extract the tool name from a part `type` of `tool-${name}` (or dynamic). */
-function getToolName(part: ExportPart): string {
-  if (part.type === 'dynamic-tool') return part.toolName ?? '';
-  return part.type.startsWith('tool-')
-    ? part.type.slice('tool-'.length)
-    : part.type;
-}
-
-/** Map an AI SDK tool-part state to the 3 states the action-log renders. */
-function toolRunState(state: string | undefined): 'running' | 'done' | 'error' {
-  if (state === 'output-error' || state === 'output-denied') return 'error';
-  if (state === 'output-available') return 'done';
-  return 'running';
-}
-
-/** Resolve a tool's friendly action-log label (localized) from its name. */
-function toolLabel(name: string, lang: ExportLang): string {
-  return LABELS[lang].tools[name] ?? LABELS[lang].ranTool(name);
-}
-
-/**
- * Stringify an arbitrary tool input/output value for a fenced block. Strings
- * pass through as-is; everything else is pretty-printed JSON, falling back to
- * `String(value)` if serialization throws (e.g. a circular structure).
- */
-function stringify(value: unknown): string {
-  if (typeof value === 'string') return value;
-  try {
-    return JSON.stringify(value, null, 2);
-  } catch {
-    return String(value);
-  }
-}
-
-/**
- * Wrap `code` in a fenced code block whose backtick delimiter is LONGER than the
- * longest backtick run inside the content, so embedded backticks (or a literal
- * ``` fence) never break out of the block. Minimum 3 backticks.
- */
-function fence(code: string, lang = ''): string {
-  const runs: string[] = code.match(/`+/g) ?? [];
-  const longest = runs.reduce((m, s) => Math.max(m, s.length), 0);
-  const delim = '`'.repeat(Math.max(3, longest + 1));
-  return `${delim}${lang}\n${code}\n${delim}`;
-}
-
-/** Per-row token count, mirroring the header sum in the client window. */
-function rowTokens(usage: UsageLike): number {
-  return (
-    usage.totalTokens ?? (usage.inputTokens ?? 0) + (usage.outputTokens ?? 0)
-  );
-}
-
-/** Render one message's UIMessage parts into an array of Markdown blocks
- *  (text blocks + tool blocks). Mirrors the client renderer / MessageItem. */
-function renderMessageParts(parts: ExportPart[], lang: ExportLang): string[] {
-  const out: string[] = [];
-
-  for (const part of parts) {
-    if (part.type === 'text') {
-      const text = (part.text ?? '').trim();
-      if (text.length > 0) out.push(text);
-      continue;
-    }
-
-    if (!isToolPart(part.type)) continue;
-
-    const name = getToolName(part);
-    const label = toolLabel(name, lang);
-    const state = toolRunState(part.state);
-
-    const toolLines: string[] = [`**Tool: ${label}** (\`${name}\`) — ${state}`];
-    if (part.input !== undefined) {
-      toolLines.push('Input:');
-      toolLines.push(fence(stringify(part.input), 'json'));
-    }
-    if (part.output !== undefined) {
-      toolLines.push('Output:');
-      toolLines.push(fence(stringify(part.output), 'json'));
-    }
-    if (part.errorText) {
-      toolLines.push(`**Error:** ${part.errorText}`);
-    }
-    out.push(toolLines.join('\n\n'));
-  }
-
-  return out;
-}
-
-/** Resolve a persisted row's parts: prefer the rich persisted parts, else a
- *  single text part built from the plain-text content (mirrors rowToUiMessage). */
-function rowParts(row: AiChatMessage): ExportPart[] {
-  const meta = (row.metadata ?? {}) as { parts?: ExportPart[] };
-  return Array.isArray(meta.parts) && meta.parts.length > 0
-    ? meta.parts
-    : [{ type: 'text', text: row.content ?? '' }];
-}
-
-/**
- * Serialize a chat to a Markdown string from its persisted rows. Source = DB
- * ONLY (no live client state). A row whose `status` is still 'streaming' is an
- * interrupted turn that the export captured mid-flight; it is rendered up to its
- * last finished step and flagged "still generating".
- */
-export function buildChatMarkdown(args: {
-  title: string | null;
-  chatId: string;
-  rows: AiChatMessage[];
-  // Accepts a full client locale tag (e.g. 'en-US'/'ru-RU'); normalized below.
-  lang?: string;
-}): string {
-  const { title, chatId, rows } = args;
-  const lang: ExportLang = normalizeLang(args.lang);
-  const L = LABELS[lang];
-  const blocks: string[] = [];
-
-  const heading = (title ?? '').trim() || L.untitled;
-  blocks.push(`# ${heading}`);
-
-  const usageOf = (row: AiChatMessage): UsageLike | undefined => {
-    const meta = (row.metadata ?? {}) as { usage?: UsageLike };
-    return meta.usage;
-  };
-  const errorOf = (row: AiChatMessage): string | undefined => {
-    const meta = (row.metadata ?? {}) as { error?: string };
-    return meta.error;
-  };
-
-  // Metadata bullet list. Total tokens is only shown when there is a sum.
-  const totalTokens = rows.reduce((sum, row) => {
-    const usage = usageOf(row);
-    return usage ? sum + rowTokens(usage) : sum;
-  }, 0);
-  const meta = [
-    `- Chat ID: \`${chatId}\``,
-    `- Exported: ${new Date().toISOString()}`,
-    `- Messages: ${rows.length}`,
-  ];
-  if (totalTokens > 0) meta.push(`- Total tokens: ${totalTokens}`);
-  blocks.push(meta.join('\n'));
-
-  rows.forEach((row, index) => {
-    blocks.push('---');
-
-    const roleLabel = row.role === 'assistant' ? L.aiAgent : L.you;
-    blocks.push(`## ${index + 1}. ${roleLabel}`);
-
-    // Created-at kept in source as an HTML comment (out of the rendered prose).
-    if (row.createdAt) {
-      const iso =
-        row.createdAt instanceof Date
-          ? row.createdAt.toISOString()
-          : String(row.createdAt);
-      blocks.push(`<!-- ${iso} -->`);
-    }
-
-    blocks.push(...renderMessageParts(rowParts(row), lang));
-
-    // A still-'streaming' row is an interrupted/in-progress turn captured by the
-    // export; record that so the partial answer is not mistaken for complete.
-    if (row.status === 'streaming') {
-      blocks.push(`_⏳ ${L.stillGenerating}_`);
-    }
-
-    const error = errorOf(row);
-    if (error) {
-      blocks.push(`**⚠️ Error:** ${error}`);
-    }
-
-    const usage = usageOf(row);
-    if (usage) {
-      const total = usage.totalTokens ?? rowTokens(usage);
-      const reasoning =
-        usage.reasoningTokens && usage.reasoningTokens > 0
-          ? `, reasoning: ${usage.reasoningTokens}`
-          : '';
-      blocks.push(
-        `_Tokens — in: ${usage.inputTokens ?? '?'}, out: ${
-          usage.outputTokens ?? '?'
-        }${reasoning}, total: ${total}_`,
-      );
-    }
-  });
-
-  // Blank line between blocks so the Markdown renders cleanly.
-  return blocks.join('\n\n');
-}
--- a/apps/server/src/core/ai-chat/dto/ai-chat.dto.ts
+++ b/apps/server/src/core/ai-chat/dto/ai-chat.dto.ts
@@ -26,17 +26,3 @@ export class GetChatMessagesDto {
  @IsString()
  cursor?: string;
 }
-
-/** Export a chat to Markdown (#183). `lang` localizes the few fixed
- *  role/tool-action labels; defaults to English server-side. */
-export class ExportChatDto {
-  @IsString()
-  chatId: string;
-
-  // A full client locale tag (e.g. 'en-US', 'ru-RU') — normalized server-side to
-  // a supported export language (see normalizeLang). Accept any string so a
-  // region-qualified locale is not rejected (the 400 that broke the real client).
-  @IsOptional()
-  @IsString()
-  lang?: string;
-}
--- a/apps/server/src/core/ai-chat/external-mcp/dto/create-mcp-server.dto.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/dto/create-mcp-server.dto.ts
@@ -42,15 +42,6 @@ export class CreateMcpServerDto {
  @IsString({ each: true })
  toolAllowlist?: string[];

-  // Admin-authored guidance ("how/when to use this server's tools") injected
-  // into the agent system prompt next to the tool descriptions (#180). Trusted,
-  // NON-secret (so it IS returned). Capped to bound prompt/token size (the
-  // built-in guide is ~1.5KB). Blank => stored as null.
-  @IsOptional()
-  @IsString()
-  @MaxLength(4000)
-  instructions?: string;
-
  @IsOptional()
  @IsBoolean()
  enabled?: boolean;
--- a/apps/server/src/core/ai-chat/external-mcp/dto/mcp-server-instructions.dto.spec.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/dto/mcp-server-instructions.dto.spec.ts
@@ -1,75 +0,0 @@
-import 'reflect-metadata';
-import { plainToInstance } from 'class-transformer';
-import { validateSync } from 'class-validator';
-import { CreateMcpServerDto } from './create-mcp-server.dto';
-import { UpdateMcpServerDto } from './update-mcp-server.dto';
-
-/**
- * API-boundary validation for the per-server `instructions` field (#180): a free
- * text guide injected into the agent system prompt. It is optional, must be a
- * string, and is bounded by @MaxLength(4000) to cap prompt/token size.
- */
-describe('MCP server DTO instructions validation', () => {
-  function validateCreate(payload: unknown) {
-    const dto = plainToInstance(CreateMcpServerDto, payload);
-    return validateSync(dto as object);
-  }
-  function validateUpdate(payload: unknown) {
-    const dto = plainToInstance(UpdateMcpServerDto, payload);
-    return validateSync(dto as object);
-  }
-
-  const base = {
-    name: 'Tavily',
-    transport: 'http',
-    url: 'https://example.com/mcp',
-  };
-
-  it('accepts an omitted instructions field on create', () => {
-    expect(validateCreate({ ...base })).toHaveLength(0);
-  });
-
-  it('accepts a reasonable instructions string on create', () => {
-    expect(
-      validateCreate({ ...base, instructions: 'Use search for fresh facts.' }),
-    ).toHaveLength(0);
-  });
-
-  it('rejects instructions over MaxLength(4000) on create', () => {
-    const errors = validateCreate({
-      ...base,
-      instructions: 'a'.repeat(4001),
-    });
-    expect(
-      errors.some(
-        (e) =>
-          e.property === 'instructions' &&
-          e.constraints !== undefined &&
-          'maxLength' in e.constraints,
-      ),
-    ).toBe(true);
-  });
-
-  it('accepts instructions of exactly 4000 chars on create', () => {
-    expect(
-      validateCreate({ ...base, instructions: 'a'.repeat(4000) }),
-    ).toHaveLength(0);
-  });
-
-  it('rejects a non-string instructions value', () => {
-    const errors = validateCreate({ ...base, instructions: 123 });
-    expect(errors.some((e) => e.property === 'instructions')).toBe(true);
-  });
-
-  it('rejects instructions over MaxLength(4000) on update', () => {
-    const errors = validateUpdate({ instructions: 'a'.repeat(4001) });
-    expect(
-      errors.some(
-        (e) =>
-          e.property === 'instructions' &&
-          e.constraints !== undefined &&
-          'maxLength' in e.constraints,
-      ),
-    ).toBe(true);
-  });
-});
--- a/apps/server/src/core/ai-chat/external-mcp/dto/update-mcp-server.dto.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/dto/update-mcp-server.dto.ts
@@ -43,13 +43,6 @@ export class UpdateMcpServerDto {
  @IsString({ each: true })
  toolAllowlist?: string[];

-  // Admin-authored prompt guidance (#180). Absent => unchanged; blank => cleared
-  // (stored as null by the repo). Capped to bound prompt/token size.
-  @IsOptional()
-  @IsString()
-  @MaxLength(4000)
-  instructions?: string;
-
  @IsOptional()
  @IsBoolean()
  enabled?: boolean;
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-call-timeout.spec.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-call-timeout.spec.ts
@@ -1,205 +0,0 @@
-import { type Tool, type ToolCallOptions } from 'ai';
-import {
-  wrapToolWithCallTimeout,
-  wrapToolsWithCallTimeout,
-} from './mcp-clients.service';
-import {
-  mcpStreamTimeoutMs,
-  mcpCallTimeoutMs,
-} from '../../../integrations/ai/ai-streaming-fetch';
-
-/**
- * Per-call total-timeout guard for external MCP tools (mcp-clients.service).
- *
- * `@ai-sdk/mcp`'s tool execute has NO built-in per-call timeout — a tool that
- * keeps the connection warm but never returns is otherwise unbounded. The
- * wrapper attaches a fresh AbortController + timer per CALL and composes it with
- * the turn's abortSignal via AbortSignal.any, so EITHER the per-call timeout OR a
- * client disconnect aborts the in-flight call.
- *
- * Fake timers prove the timeout fires WITHOUT real waiting; no leaked timer keeps
- * the process alive after a fast resolve.
- */
-const CALL_TIMEOUT_MS = 900_000;
-
-/** Build a Tool around an `execute` impl, mirroring the SDK's minimal shape. */
-function toolWith(
-  execute: (args: unknown, options: ToolCallOptions) => unknown,
-): Tool {
-  return { description: 'x', inputSchema: undefined, execute } as unknown as Tool;
-}
-
-/** Invoke a (possibly wrapped) tool's execute with an optional turn signal. */
-function callExecute(
-  tool: Tool,
-  args: unknown,
-  abortSignal?: AbortSignal,
-): unknown {
-  const execute = tool.execute as (
-    args: unknown,
-    options: ToolCallOptions,
-  ) => unknown;
-  return execute(args, { abortSignal } as ToolCallOptions);
-}
-
-describe('wrapToolWithCallTimeout', () => {
-  beforeEach(() => jest.useFakeTimers());
-  afterEach(() => {
-    jest.clearAllTimers();
-    jest.useRealTimers();
-  });
-
-  it('aborts a tool that only rejects when its abortSignal fires, after ms elapses', async () => {
-    // The tool resolves NEVER on its own — it only settles when the abortSignal
-    // it is handed aborts. So a resolution proves the per-call timer fired and
-    // aborted the call (not the tool finishing by itself).
-    let received: AbortSignal | undefined;
-    const tool = toolWith((_args, options) => {
-      received = options.abortSignal;
-      return new Promise((_resolve, reject) => {
-        options.abortSignal?.addEventListener('abort', () => {
-          reject(options.abortSignal?.reason ?? new Error('aborted'));
-        });
-      });
-    });
-
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-    const promise = callExecute(wrapped, { q: 'x' }) as Promise<unknown>;
-    // Attach the rejection handler synchronously so advancing timers cannot mark
-    // it an unhandled rejection.
-    const settled = promise.then(
-      () => ({ ok: true as const }),
-      (err: unknown) => ({ ok: false as const, err }),
-    );
-
-    // Nothing fired yet.
-    jest.advanceTimersByTime(CALL_TIMEOUT_MS - 1);
-    // Past the cap -> the per-call timer aborts the composed signal.
-    jest.advanceTimersByTime(2);
-
-    const result = await settled;
-    expect(result.ok).toBe(false);
-    expect(received).toBeInstanceOf(AbortSignal);
-    // The abort reason / rejection mentions the timeout.
-    const message =
-      (result as { err: unknown }).err instanceof Error
-        ? ((result as { err: Error }).err.message)
-        : String((result as { err: unknown }).err);
-    expect(message).toMatch(/timed out after 900000ms/);
-  });
-
-  it('aborts a REAL-client-style tool that never settles and ignores abort (race fix)', async () => {
-    // Models the ACTUAL @ai-sdk/mcp semantics: its in-flight promise does NOT
-    // reject on abort (it only checks the signal when a response arrives), so a
-    // warm-but-stuck call NEVER settles on its own and does NOT listen to the
-    // abort signal. The wrapper must still reject after `ms` via the race — an
-    // implementation that merely `await original(...)` would hang here forever.
-    // This test FAILS against the old await-only code and PASSES with the race.
-    const tool = toolWith(() => new Promise(() => {})); // never settles, no abort
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-    const promise = callExecute(wrapped, { q: 'x' }) as Promise<unknown>;
-    // Assert the rejection without hanging: drive fake time async so the timer's
-    // abort -> race rejection microtasks flush, then await the rejection.
-    const expectation = expect(promise).rejects.toThrow(/timed out after 900000ms/);
-    await jest.advanceTimersByTimeAsync(CALL_TIMEOUT_MS + 1);
-    await expectation;
-  });
-
-  it('passes a fast tool through and leaks no timer (advancing later does not throw)', async () => {
-    const tool = toolWith(() => Promise.resolve('fast-result'));
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-
-    const value = await (callExecute(wrapped, {}) as Promise<unknown>);
-    expect(value).toBe('fast-result');
-
-    // The timer was cleared in the finally — advancing past the cap aborts
-    // nothing and throws nothing.
-    expect(() => jest.advanceTimersByTime(CALL_TIMEOUT_MS * 2)).not.toThrow();
-  });
-
-  it('aborts when the caller turn signal aborts before the timeout (disconnect path)', async () => {
-    // Real-client semantics: the tool never settles and does NOT listen to abort,
-    // so the wrapper must reject via the race when the caller's turn signal (a
-    // client disconnect) aborts BEFORE the per-call cap. The race propagates the
-    // caller's abort reason.
-    const tool = toolWith(() => new Promise(() => {})); // never settles, no abort
-    const wrapped = wrapToolWithCallTimeout(tool, CALL_TIMEOUT_MS);
-    const turn = new AbortController();
-    const promise = callExecute(wrapped, {}, turn.signal) as Promise<unknown>;
-    const settled = promise.then(
-      () => ({ ok: true as const }),
-      (err: unknown) => ({ ok: false as const, err }),
-    );
-
-    // Disconnect well before the cap; the per-call timer never fires here.
-    turn.abort(new Error('client disconnected'));
-    const result = await settled;
-    expect(result.ok).toBe(false);
-    const message =
-      (result as { err: unknown }).err instanceof Error
-        ? (result as { err: Error }).err.message
-        : String((result as { err: unknown }).err);
-    // The caller's abort reason propagates through the race.
-    expect(message).toMatch(/client disconnected/);
-  });
-
-  it('passes a tool with no execute through unchanged', () => {
-    const noExecute = { description: 'x', inputSchema: undefined } as unknown as Tool;
-    const wrapped = wrapToolWithCallTimeout(noExecute, CALL_TIMEOUT_MS);
-    // Same object back, execute still absent.
-    expect(wrapped).toBe(noExecute);
-    expect((wrapped as { execute?: unknown }).execute).toBeUndefined();
-  });
-});
-
-describe('wrapToolsWithCallTimeout', () => {
-  beforeEach(() => jest.useFakeTimers());
-  afterEach(() => {
-    jest.clearAllTimers();
-    jest.useRealTimers();
-  });
-
-  it('wraps every tool in the map (each call gets its own guard)', async () => {
-    const tools: Record<string, Tool> = {
-      a: toolWith(() => Promise.resolve('A')),
-      b: toolWith(() => Promise.resolve('B')),
-    };
-    const out = wrapToolsWithCallTimeout(tools, CALL_TIMEOUT_MS);
-    expect(Object.keys(out)).toEqual(['a', 'b']);
-    expect(await (callExecute(out.a, {}) as Promise<unknown>)).toBe('A');
-    expect(await (callExecute(out.b, {}) as Promise<unknown>)).toBe('B');
-  });
-});
-
-describe('mcp timeout env helpers', () => {
-  const ORIG_SILENCE = process.env.AI_MCP_STREAM_TIMEOUT_MS;
-  const ORIG_CALL = process.env.AI_MCP_CALL_TIMEOUT_MS;
-  afterEach(() => {
-    if (ORIG_SILENCE === undefined) delete process.env.AI_MCP_STREAM_TIMEOUT_MS;
-    else process.env.AI_MCP_STREAM_TIMEOUT_MS = ORIG_SILENCE;
-    if (ORIG_CALL === undefined) delete process.env.AI_MCP_CALL_TIMEOUT_MS;
-    else process.env.AI_MCP_CALL_TIMEOUT_MS = ORIG_CALL;
-  });
-
-  it('mcpStreamTimeoutMs defaults to 5 min and honors a positive override', () => {
-    delete process.env.AI_MCP_STREAM_TIMEOUT_MS;
-    expect(mcpStreamTimeoutMs()).toBe(300_000);
-    process.env.AI_MCP_STREAM_TIMEOUT_MS = '60000';
-    expect(mcpStreamTimeoutMs()).toBe(60_000);
-    for (const bad of ['0', '-1', 'x', '']) {
-      process.env.AI_MCP_STREAM_TIMEOUT_MS = bad;
-      expect(mcpStreamTimeoutMs()).toBe(300_000);
-    }
-  });
-
-  it('mcpCallTimeoutMs defaults to 15 min and honors a positive override', () => {
-    delete process.env.AI_MCP_CALL_TIMEOUT_MS;
-    expect(mcpCallTimeoutMs()).toBe(900_000);
-    process.env.AI_MCP_CALL_TIMEOUT_MS = '120000';
-    expect(mcpCallTimeoutMs()).toBe(120_000);
-    for (const bad of ['0', '-1', 'x', '']) {
-      process.env.AI_MCP_CALL_TIMEOUT_MS = bad;
-      expect(mcpCallTimeoutMs()).toBe(900_000);
-    }
-  });
-});
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-clients.service.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-clients.service.ts
@@ -1,16 +1,11 @@
 import { isIP } from 'node:net';
 import { lookup as dnsLookup, type LookupAddress } from 'node:dns';
 import { Injectable, Logger } from '@nestjs/common';
-import { type Tool, type ToolCallOptions } from 'ai';
+import { type Tool } from 'ai';
 import { createMCPClient } from '@ai-sdk/mcp';
 import { Agent, type Dispatcher } from 'undici';
 import { AiMcpServerRepo } from '@docmost/db/repos/ai-chat/ai-mcp-server.repo';
 import { AiMcpServer } from '@docmost/db/types/entity.types';
-import {
-  streamingDispatcherOptions,
-  mcpStreamTimeoutMs,
-  mcpCallTimeoutMs,
-} from '../../../integrations/ai/ai-streaming-fetch';
 import { SecretBoxService } from '../../../integrations/crypto/secret-box';
 import { isUrlAllowed, isIpAllowed } from './ssrf-guard';

@@ -33,26 +28,6 @@ interface ServerOutcome {
  reason?: string;
 }

-/**
- * One server's admin-authored guidance for the agent system prompt (#180).
- * Built ONLY for a server that actually connected AND contributed ≥1 tool
- * (after the allowlist filter) AND has non-blank guidance — so a guide never
- * appears for a server whose tools the agent cannot actually call.
- */
-export interface McpServerInstruction {
-  /** Display name of the server (for the prompt section header). */
-  serverName: string;
-  /**
-   * The tool-name namespace prefix the server's tools were merged under
-   * (sanitized name, e.g. `tavily`). The prompt renders this as `tavily_*` so
-   * the model can connect the guidance to the actual tool names. Advisory:
-   * individual tools may carry a disambiguating suffix on rare collisions.
-   */
-  toolPrefix: string;
-  /** The trusted, non-blank guidance text. */
-  instructions: string;
-}
-
 export interface ExternalToolset {
  /** Namespaced external tools, merge-ready into the agent toolset. */
  tools: Record<string, Tool>;
@@ -60,11 +35,6 @@ export interface ExternalToolset {
  clients: Closable[];
  /** Per-server connect outcomes so the UI can show unavailable servers. */
  outcomes: ServerOutcome[];
-  /**
-   * Per-server prompt guidance for connected servers that contributed ≥1 tool
-   * and have non-blank instructions. Empty when no server qualifies.
-   */
-  instructions: McpServerInstruction[];
 }

 /** Connect+tools() timeout per server — a slow server must not stall the turn. */
@@ -85,8 +55,6 @@ interface CacheEntry {
  tools: Record<string, Tool>;
  clients: McpClient[];
  outcomes: ServerOutcome[];
-  /** Prompt guidance for qualifying servers (see McpServerInstruction). */
-  instructions: McpServerInstruction[];
  expiresAt: number;
  /** Active leases (turns currently using these clients). */
  refCount: number;
@@ -168,7 +136,6 @@ export class McpClientsService {
      tools: entry.tools,
      clients: [release],
      outcomes: entry.outcomes,
-      instructions: entry.instructions,
    };
  }

@@ -251,9 +218,6 @@ export class McpClientsService {
    const tools: Record<string, Tool> = {};
    const clients: McpClient[] = [];
    const outcomes: ServerOutcome[] = [];
-    // Per-call total wall-clock cap, read once for this build (env-overridable).
-    const callTimeoutMs = mcpCallTimeoutMs();
-    const instructions: McpServerInstruction[] = [];

    for (const server of servers) {
      try {
@@ -262,33 +226,14 @@ export class McpClientsService {
        clients.push(client);
        const allow = server.toolAllowlist;
        const picked =
-          Array.isArray(allow) && allow.length > 0 ? pick(raw, allow) : raw;
-        // Bound each tool's execute with a per-call total-timeout guard before
-        // merging, so a single chatty-but-stuck call is aborted after the cap.
-        const guarded = wrapToolsWithCallTimeout(picked, callTimeoutMs);
+          Array.isArray(allow) && allow.length > 0
+            ? pick(raw, allow)
+            : raw;
        // Namespace each tool with the sanitized server name AND disambiguate
        // against names already merged from earlier servers, so no external
-        // tool is silently overwritten on collision. The returned count drives
-        // whether this server's prompt guidance is included (≥1 tool merged).
-        const merged = this.mergeNamespaced(
-          tools,
-          guarded,
-          server.name,
-          server.id,
-        );
+        // tool is silently overwritten on collision.
+        this.mergeNamespaced(tools, picked, server.name, server.id);
        outcomes.push({ name: server.name, ok: true });
-        // Include this server's guidance ONLY when it actually contributed at
-        // least one tool the agent can call (allowlist may have filtered all of
-        // them out) AND the admin authored non-blank instructions. The header
-        // prefix is the sanitized server name (= the tool namespace prefix).
-        const guide = server.instructions?.trim();
-        if (merged.count > 0 && guide) {
-          instructions.push({
-            serverName: server.name,
-            toolPrefix: merged.prefix,
-            instructions: guide,
-          });
-        }
      } catch (err) {
        // A failed server is skipped — the turn proceeds with the rest. Log a
        // short warning (never the URL/headers) so ops can see degradation, and
@@ -305,7 +250,6 @@ export class McpClientsService {
      tools,
      clients,
      outcomes,
-      instructions,
      expiresAt: Date.now() + CACHE_TTL_MS,
      refCount: 0,
      evicted: false,
@@ -322,19 +266,16 @@ export class McpClientsService {
   * renaming any key that would collide with an already-merged tool (different
   * servers with the same sanitized name, or duplicates after truncation), so
   * no external tool is silently dropped via overwrite.
-   *
-   * Returns how many tools this server actually contributed and the namespace
-   * prefix used (the sanitized server name) so the caller can attach the
-   * server's prompt guidance only when ≥1 tool was merged.
   */
  private mergeNamespaced(
    target: Record<string, Tool>,
    picked: Record<string, Tool>,
    serverName: string,
    serverId: string,
-  ): { count: number; prefix: string } {
-    let count = 0;
-    for (const [name, tool] of Object.entries(namespace(picked, serverName))) {
+  ): void {
+    for (const [name, tool] of Object.entries(
+      namespace(picked, serverName),
+    )) {
      let key = name;
      if (key in target) {
        const original = key;
@@ -344,9 +285,7 @@ export class McpClientsService {
        );
      }
      target[key] = tool;
-      count += 1;
    }
-    return { count, prefix: namespacePrefix(serverName) };
  }

  /**
@@ -422,7 +361,9 @@ export class McpClientsService {

  /** Close clients, swallowing close errors so they never break a response. */
  private async closeClients(clients: McpClient[]): Promise<void> {
-    await Promise.all(clients.map((c) => c.close().catch(() => undefined)));
+    await Promise.all(
+      clients.map((c) => c.close().catch(() => undefined)),
+    );
  }
 }

@@ -435,10 +376,9 @@ export class McpClientsService {
 * lookup hands net/tls.connect ONLY a set that passed this check, so the kernel
 * can never connect to an address that did not pass the guard. Pure — no I/O.
 */
-export function validateResolvedAddresses(addrs: readonly LookupAddress[]): {
-  ok: boolean;
-  blockedHost?: string;
-} {
+export function validateResolvedAddresses(
+  addrs: readonly LookupAddress[],
+): { ok: boolean; blockedHost?: string } {
  if (addrs.length === 0) {
    return { ok: false };
  }
@@ -459,21 +399,18 @@ export function validateResolvedAddresses(addrs: readonly LookupAddress[]): {
 * to an IP literal).
 */
 function buildPinnedDispatcher(): Agent {
-  // External-MCP traffic uses a DEDICATED, shorter silence timeout
-  // (`AI_MCP_STREAM_TIMEOUT_MS`, default 5 min) — deliberately tighter than the
-  // chat provider's 15-min `streamTimeoutMs()` — so a byte-silent/hung MCP
-  // upstream is broken in ~5 min instead of 15. We keep the keep-alive options
-  // from `streamingDispatcherOptions()` but OVERRIDE headers/body timeouts.
-  // Accepted trade-off: a legitimately long but byte-silent single tool call,
-  // and an SSE transport idling >5 min BETWEEN tool calls, are also cut here; the
-  // per-call total cap (wrapToolsWithCallTimeout, `AI_MCP_CALL_TIMEOUT_MS`) is the
-  // complementary guard for chatty-but-stuck calls that keep the socket warm yet
-  // never return.
-  const mcpSilenceMs = mcpStreamTimeoutMs();
  return new Agent({
-    ...streamingDispatcherOptions(),
-    headersTimeout: mcpSilenceMs,
-    bodyTimeout: mcpSilenceMs,
+    // Disable undici's default 300s headers/body timeouts on external MCP
+    // traffic. A long agent turn keeps an SSE transport (e.g. crawl4ai's
+    // /mcp/sse) open across the whole turn; that connection can idle BETWEEN
+    // tool calls longer than 5 min, and undici's bodyTimeout would otherwise
+    // sever it mid-task — a tool-call failure that aborts the streamed turn and
+    // shows the user "Lost connection to the AI provider" (#175). A slow single
+    // tool call (a crawl) can likewise exceed headersTimeout. Connection
+    // lifetime is bounded by the turn (clients are closed in onFinish/onError/
+    // onAbort), so disabling the wall-clock cap is safe.
+    headersTimeout: 0,
+    bodyTimeout: 0,
    connect: {
      lookup: (hostname, _options, callback) => {
        // Always resolve ALL addresses ourselves; do not trust the caller's
@@ -574,7 +511,7 @@ function namespace(
  tools: Record<string, Tool>,
  serverName: string,
 ): Record<string, Tool> {
-  const prefix = namespacePrefix(serverName);
+  const prefix = sanitizeName(serverName) || 'mcp';
  const out: Record<string, Tool> = {};
  for (const [name, t] of Object.entries(tools)) {
    const safe = sanitizeName(name);
@@ -589,15 +526,6 @@ function namespace(
  return out;
 }

-/**
- * The tool-name namespace prefix for a server: its sanitized name, or `mcp`
- * when the name sanitizes to empty. Tools are merged as `${prefix}_${tool}`, so
- * the prompt guidance refers to the server's tools as `${prefix}_*`.
- */
-function namespacePrefix(serverName: string): string {
-  return sanitizeName(serverName) || 'mcp';
-}
-
 /** Reduce an arbitrary string to ^[a-zA-Z0-9_-]+, collapsing runs to '_'. */
 function sanitizeName(value: string): string {
  return value
@@ -644,78 +572,6 @@ function disambiguate(
  return capName(`${name.slice(0, MAX_TOOL_NAME_LENGTH - 14)}_${Date.now()}`);
 }

-/**
- * Wrap every tool's execute with a per-call total-timeout guard so a single
- * external MCP tool call that keeps the connection warm but never returns is
- * aborted after `ms` wall-clock (complements the transport silence timeout).
- */
-export function wrapToolsWithCallTimeout(
-  tools: Record<string, Tool>,
-  ms: number,
-): Record<string, Tool> {
-  const out: Record<string, Tool> = {};
-  for (const [name, t] of Object.entries(tools)) {
-    out[name] = wrapToolWithCallTimeout(t, ms);
-  }
-  return out;
-}
-
-/**
- * Per-call total-timeout wrapper for one MCP tool. A fresh AbortController +
- * timer bounds the call; it is composed with the turn's abortSignal via
- * AbortSignal.any so EITHER the per-call timeout OR a client disconnect aborts
- * the call. We RACE the call against the composed abort signal rather than just
- * awaiting it, because @ai-sdk/mcp does NOT settle its in-flight promise on abort
- * (verified in @ai-sdk/mcp@1.0.52: request() only does throwIfAborted() once
- * before send and only re-checks the signal inside the response-message handler,
- * which runs ONLY when a response arrives). So for a warm-but-stuck call awaiting
- * `original` alone would hang forever even after the timer aborts.
- */
-export function wrapToolWithCallTimeout(tool: Tool, ms: number): Tool {
-  const original = tool.execute;
-  if (typeof original !== 'function') return tool;
-  const execute = async (args: unknown, options: ToolCallOptions) => {
-    const controller = new AbortController();
-    const timer = setTimeout(() => {
-      controller.abort(new Error(`MCP tool call timed out after ${ms}ms`));
-    }, ms);
-    timer.unref?.();
-    const abortSignal = options?.abortSignal
-      ? AbortSignal.any([options.abortSignal, controller.signal])
-      : controller.signal;
-    // Reject as soon as the composed signal fires, independent of whether
-    // `original` ever settles. The losing `original` promise is left pending; it
-    // is cleaned up when the client is closed at turn end, and Promise.race
-    // attaches a rejection handler to BOTH inputs so a late rejection of either
-    // is never an unhandled rejection (do NOT add an extra .catch — it could
-    // swallow the real result and would break the race semantics).
-    const aborted = new Promise<never>((_, reject) => {
-      const fail = () => reject(abortReason(abortSignal));
-      if (abortSignal.aborted) fail();
-      else abortSignal.addEventListener('abort', fail, { once: true });
-    });
-    try {
-      return await Promise.race([
-        original(args, { ...options, abortSignal }),
-        aborted,
-      ]);
-    } finally {
-      clearTimeout(timer);
-    }
-  };
-  // `Tool` is a union whose `execute` overloads conflict; cast narrowly so the
-  // wrapped tool keeps every other field while swapping only `execute`.
-  return { ...tool, execute } as unknown as Tool;
-}
-
-/** The signal's reason as an Error (informative thrown value on abort/timeout). */
-function abortReason(signal: AbortSignal): Error {
-  const r = signal.reason;
-  return r instanceof Error
-    ? r
-    : new Error(typeof r === 'string' ? r : 'MCP tool call aborted');
-}
-
 /** Reject a promise after `ms`, so a hung connect/tools() never stalls a turn. */
 function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-instructions.spec.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-instructions.spec.ts
@@ -1,168 +0,0 @@
-import { type Tool } from 'ai';
-import { McpClientsService } from './mcp-clients.service';
-
-/**
- * Tests for the per-server prompt guidance (#180) assembled by buildEntry and
- * surfaced via toolsFor().instructions.
- *
- * REACHABILITY NOTE: buildEntry is a PRIVATE method; the smallest reachable
- * public path is toolsFor() -> getOrBuildEntry -> buildEntry -> connect/tools()
- * -> mergeNamespaced. We drive that path: stub the repo's `listEnabled` and spy
- * on the private `connect` to return fake MCP clients whose `tools()` we control.
- *
- * Contract (all checked here): a server's guidance is included ONLY when the
- * server actually connected AND contributed ≥1 callable tool (after the
- * allowlist filter) AND its instructions are non-blank. The header carries the
- * tool namespace prefix (the sanitized server name).
- */
-function fakeTool(): Tool {
-  return { description: 'x', inputSchema: undefined } as unknown as Tool;
-}
-
-interface FakeServer {
-  id: string;
-  name: string;
-  transport: string;
-  url: string;
-  headersEnc: string | null;
-  toolAllowlist: string[] | null;
-  instructions: string | null;
-}
-
-function server(
-  over: Partial<FakeServer> & { id: string; name: string },
-): FakeServer {
-  return {
-    transport: 'http',
-    url: 'https://example.com/mcp',
-    headersEnc: null,
-    toolAllowlist: null,
-    instructions: null,
-    ...over,
-  };
-}
-
-async function instructionsFor(
-  servers: FakeServer[],
-  toolsByServerId: Record<string, Record<string, Tool>>,
-  // Server ids whose connect should THROW (simulating an unavailable server).
-  failingIds: Set<string> = new Set(),
-): Promise<
-  {
-    serverName: string;
-    toolPrefix: string;
-    instructions: string;
-  }[]
-> {
-  const repoStub = {
-    listEnabled: jest.fn().mockResolvedValue(servers),
-  };
-  const service = new McpClientsService(repoStub as never, {} as never);
-
-  jest
-    .spyOn(
-      service as unknown as { connect: (s: FakeServer) => unknown },
-      'connect',
-    )
-    .mockImplementation((s: FakeServer) => {
-      if (failingIds.has(s.id)) {
-        return Promise.reject(new Error('connection failed'));
-      }
-      return Promise.resolve({
-        tools: () => Promise.resolve(toolsByServerId[s.id] ?? {}),
-        close: () => Promise.resolve(),
-      });
-    });
-
-  const toolset = await service.toolsFor('ws-1');
-  await Promise.all(toolset.clients.map((c) => c.close()));
-  return toolset.instructions;
-}
-
-describe('external MCP per-server prompt guidance (via toolsFor)', () => {
-  afterEach(() => jest.restoreAllMocks());
-
-  it('includes guidance for a connected server with non-empty text and ≥1 tool', async () => {
-    const instructions = await instructionsFor(
-      [
-        server({
-          id: 'id-tavily',
-          name: 'Tavily',
-          instructions: 'Use tavily_search for fresh facts.',
-        }),
-      ],
-      { 'id-tavily': { search: fakeTool() } },
-    );
-
-    // sanitizeName preserves case (charset [a-zA-Z0-9_-]), so the prefix is the
-    // server name as-is for an already-clean name.
-    expect(instructions).toEqual([
-      {
-        serverName: 'Tavily',
-        toolPrefix: 'Tavily',
-        instructions: 'Use tavily_search for fresh facts.',
-      },
-    ]);
-  });
-
-  it('omits guidance when the server has no instructions', async () => {
-    const instructions = await instructionsFor(
-      [server({ id: 'id-1', name: 'Tavily', instructions: null })],
-      { 'id-1': { search: fakeTool() } },
-    );
-    expect(instructions).toEqual([]);
-  });
-
-  it('omits guidance when the instructions are only whitespace', async () => {
-    const instructions = await instructionsFor(
-      [server({ id: 'id-1', name: 'Tavily', instructions: '   ' })],
-      { 'id-1': { search: fakeTool() } },
-    );
-    expect(instructions).toEqual([]);
-  });
-
-  it('omits guidance for a server that contributed ZERO tools (allowlist filtered all out)', async () => {
-    const instructions = await instructionsFor(
-      [
-        server({
-          id: 'id-1',
-          name: 'Tavily',
-          instructions: 'guide',
-          // Allowlist names a tool the server does not expose -> 0 picked.
-          toolAllowlist: ['nonexistent'],
-        }),
-      ],
-      { 'id-1': { search: fakeTool() } },
-    );
-    expect(instructions).toEqual([]);
-  });
-
-  it('omits guidance for an unavailable (failed-connect) server', async () => {
-    const instructions = await instructionsFor(
-      [server({ id: 'id-1', name: 'Tavily', instructions: 'guide' })],
-      { 'id-1': { search: fakeTool() } },
-      new Set(['id-1']),
-    );
-    expect(instructions).toEqual([]);
-  });
-
-  it('includes only the qualifying servers among several', async () => {
-    const instructions = await instructionsFor(
-      [
-        server({ id: 'ok', name: 'Tavily', instructions: 'web guide' }),
-        server({ id: 'blank', name: 'Crawl', instructions: '' }),
-        server({ id: 'down', name: 'Down', instructions: 'never shown' }),
-      ],
-      {
-        ok: { search: fakeTool() },
-        blank: { crawl: fakeTool() },
-        down: { x: fakeTool() },
-      },
-      new Set(['down']),
-    );
-
-    expect(instructions).toEqual([
-      { serverName: 'Tavily', toolPrefix: 'Tavily', instructions: 'web guide' },
-    ]);
-  });
-});
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-servers-to-view.spec.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-servers-to-view.spec.ts
@@ -17,7 +17,6 @@ function row(overrides: Partial<AiMcpServer>): AiMcpServer {
    enabled: true,
    toolAllowlist: null,
    headersEnc: null,
-    instructions: null,
    ...overrides,
  } as unknown as AiMcpServer;
 }
@@ -29,7 +28,11 @@ describe('McpServersService.toView (via list) — encrypted-header leak guard',
    };
    // secretBox + clients are unused by the list/toView path; pass stubs to
    // satisfy the constructor.
-    return new McpServersService(repoStub as never, {} as never, {} as never);
+    return new McpServersService(
+      repoStub as never,
+      {} as never,
+      {} as never,
+    );
  }

  it('exposes hasHeaders:true and NO headersEnc when auth headers are set', async () => {
@@ -64,7 +67,6 @@ describe('McpServersService.toView (via list) — encrypted-header leak guard',
        enabled: false,
        toolAllowlist: ['search'],
        headersEnc: 'BLOB',
-        instructions: 'Use search for fresh web facts.',
      }),
    ]);

@@ -78,19 +80,6 @@ describe('McpServersService.toView (via list) — encrypted-header leak guard',
      enabled: false,
      toolAllowlist: ['search'],
      hasHeaders: true,
-      instructions: 'Use search for fresh web facts.',
    });
  });
-
-  it('returns instructions (NON-secret) in the view, null when unset', async () => {
-    const service = buildService([
-      row({ id: 'a', instructions: 'How to use these tools.' }),
-      row({ id: 'b', instructions: null }),
-    ]);
-
-    const [withText, withoutText] = await service.list('ws-1');
-
-    expect(withText.instructions).toBe('How to use these tools.');
-    expect(withoutText.instructions).toBeNull();
-  });
 });
--- a/apps/server/src/core/ai-chat/external-mcp/mcp-servers.service.ts
+++ b/apps/server/src/core/ai-chat/external-mcp/mcp-servers.service.ts
@@ -20,9 +20,6 @@ export interface McpServerView {
  enabled: boolean;
  toolAllowlist: string[] | null;
  hasHeaders: boolean;
-  // Admin-authored prompt guidance (#180). NON-secret, so returned in the view.
-  // Null when no guidance is configured.
-  instructions: string | null;
 }

 /**
@@ -59,8 +56,6 @@ export class McpServersService {
      url: dto.url,
      headersEnc,
      toolAllowlist: dto.toolAllowlist ?? null,
-      // Blank/whitespace guidance is normalized to null by the repo.
-      instructions: dto.instructions ?? null,
      enabled: dto.enabled ?? true,
    });
    this.clients.invalidate(workspaceId);
@@ -102,8 +97,6 @@ export class McpServersService {
      headersEnc,
      // undefined => unchanged; [] / value handled by repo (empty => null).
      toolAllowlist: dto.toolAllowlist,
-      // undefined => unchanged; blank => cleared (null) by the repo.
-      instructions: dto.instructions,
      enabled: dto.enabled,
    });
    this.clients.invalidate(workspaceId);
@@ -174,7 +167,6 @@ export class McpServersService {
      enabled: row.enabled,
      toolAllowlist: row.toolAllowlist ?? null,
      hasHeaders: Boolean(row.headersEnc),
-      instructions: row.instructions ?? null,
    };
  }
 }
--- a/apps/server/src/core/ai-chat/public-share-chat.service.ts
+++ b/apps/server/src/core/ai-chat/public-share-chat.service.ts
@@ -244,15 +244,6 @@ export class PublicShareChatService {
        },
      });

-      // Drain the stream independently of the client socket so the turn always
-      // runs to completion (or to its abort) even when the anonymous client
-      // disconnects — otherwise the dead socket is the only reader, backpressure
-      // stalls the stream, and the per-turn object graph stays rooted (heap-OOM
-      // leak). consumeStream removes that backpressure (AI SDK v6 "Handling
-      // client disconnects"). Fire-and-forget; stream errors are already logged
-      // by the streamText `onError` callback above.
-      void result.consumeStream({ onError: () => undefined });
-
      // Stream the UI-message protocol straight to the hijacked Node response.
      // Surface the real provider message (AI SDK error bodies never carry the
      // API key, so this is safe; we never dump the resolved config).
--- a/apps/server/src/core/ai-chat/roles/jsonb-object.spec.ts
+++ b/apps/server/src/core/ai-chat/roles/jsonb-object.spec.ts
@@ -0,0 +1,30 @@
+import { jsonbObject } from '@docmost/db/repos/ai-agent-roles/ai-agent-roles.repo';
+
+/**
+ * Unit tests for jsonbObject: the repo helper that encodes a model_config object
+ * as a jsonb bind (or null when there is nothing to persist). It is the last
+ * line of defence before the column write, so the null-vs-bind decision is what
+ * matters here. We assert only null vs non-null because the non-null value is a
+ * kysely `sql` template fragment whose internal shape is an implementation
+ * detail of the SQL tag.
+ */
+describe('jsonbObject', () => {
+  it('returns null for null', () => {
+    expect(jsonbObject(null)).toBeNull();
+  });
+
+  it('returns null for undefined', () => {
+    expect(jsonbObject(undefined)).toBeNull();
+  });
+
+  it('returns null for an empty object (nothing to persist)', () => {
+    expect(jsonbObject({})).toBeNull();
+  });
+
+  it('returns a (non-null) jsonb bind for a non-empty object', () => {
+    const out = jsonbObject({ driver: 'gemini', chatModel: 'gemini-2.0-flash' });
+    // A real sql fragment is produced, never null/undefined.
+    expect(out).not.toBeNull();
+    expect(out).toBeDefined();
+  });
+});
--- a/apps/server/src/core/share/share-seo.controller.routing.spec.ts
+++ b/apps/server/src/core/share/share-seo.controller.routing.spec.ts
@@ -1,133 +0,0 @@
-import * as fs from 'node:fs';
-import { ShareSeoController } from './share-seo.controller';
-
-/**
- * Routing guard for ShareSeoController.getShare (red-team finding #3).
- *
- * The SEO route must NOT leak a shared page's <title>/og:title to anonymous
- * visitors / crawlers when the page is not publicly readable. It previously
- * called the raw `getShareForPage`, which skips the restricted-ancestor gate, so
- * a permission-restricted descendant of an includeSubPages share leaked its
- * title. The fix funnels through `resolveReadableSharePage` (the canonical gate)
- * AND honours `isSharingAllowed`. These tests pin that routing: a non-readable
- * page or sharing-disabled space serves the plain SPA index (no title); only a
- * readable, still-shared page gets meta tags.
- */
-
-const SECRET_TITLE = 'Restricted Quarterly Numbers';
-const INDEX_HTML = `<!doctype html><html><head><title>App</title><!--meta-tags--></head><body></body></html>`;
-const STREAM_SENTINEL = { __isStream: true } as unknown as fs.ReadStream;
-
-// Stub fs at CALL time (jest.spyOn), NOT module load (jest.mock): the controller
-// transitively pulls bcrypt, whose native module is located by node-gyp-build
-// reading the filesystem at import time — a module-level fs mock breaks that.
-beforeEach(() => {
-  jest.spyOn(fs, 'existsSync').mockReturnValue(true);
-  jest.spyOn(fs, 'readFileSync').mockReturnValue(INDEX_HTML);
-  jest.spyOn(fs, 'createReadStream').mockReturnValue(STREAM_SENTINEL);
-});
-afterEach(() => jest.restoreAllMocks());
-
-function makeRes() {
-  const res: any = {
-    sent: undefined as unknown,
-    type: jest.fn(() => res),
-    send: jest.fn((v: unknown) => {
-      res.sent = v;
-    }),
-  };
-  return res;
-}
-
-function makeController(opts: {
-  resolved: { share: any; page: any } | null;
-  sharingAllowed?: boolean;
-}) {
-  const shareService = {
-    resolveReadableSharePage: jest.fn(async () => opts.resolved),
-    isSharingAllowed: jest.fn(async () => opts.sharingAllowed ?? true),
-    // Must NEVER be used by the SEO path anymore (the bypass is the bug).
-    getShareForPage: jest.fn(async () => {
-      throw new Error('getShareForPage must not be called by the SEO path');
-    }),
-  };
-  const workspaceRepo = {
-    findFirst: async () => ({ id: 'ws-1', settings: {} }),
-  };
-  const environmentService = { isSelfHosted: () => true };
-  const controller = new ShareSeoController(
-    shareService as any,
-    workspaceRepo as any,
-    environmentService as any,
-  );
-  return { controller, shareService };
-}
-
-const req: any = { raw: { headers: { host: 'self' } } };
-
-describe('ShareSeoController.getShare routing (#3 title-leak gate)', () => {
-  it('serves the plain index (NO title) when the page is not publicly readable', async () => {
-    const { controller, shareService } = makeController({ resolved: null });
-    const res = makeRes();
-
-    await controller.getShare(res, req, 'share-key', `slug-pageB`);
-
-    // The restricted-ancestor gate ran; the raw bypass did not.
-    expect(shareService.resolveReadableSharePage).toHaveBeenCalled();
-    expect(shareService.getShareForPage).not.toHaveBeenCalled();
-    // The plain index stream was sent — NOT the title-bearing meta HTML.
-    expect(res.sent).toBe(STREAM_SENTINEL);
-  });
-
-  it('serves the plain index when sharing was disabled at the workspace/space level', async () => {
-    const { controller } = makeController({
-      resolved: {
-        share: { spaceId: 'sp-1', searchIndexing: true },
-        page: { title: SECRET_TITLE },
-      },
-      sharingAllowed: false,
-    });
-    const res = makeRes();
-
-    await controller.getShare(res, req, 'share-key', 'slug-pageB');
-
-    // The plain index stream was sent, so the restricted title never reached
-    // the response (it is only ever interpolated into the meta HTML string).
-    expect(res.sent).toBe(STREAM_SENTINEL);
-    expect(res.sent).not.toBe(SECRET_TITLE);
-  });
-
-  it('injects the title + meta for a readable, still-shared page', async () => {
-    const { controller } = makeController({
-      resolved: {
-        share: { spaceId: 'sp-1', searchIndexing: true },
-        page: { title: 'Public Handbook' },
-      },
-      sharingAllowed: true,
-    });
-    const res = makeRes();
-
-    await controller.getShare(res, req, 'share-key', 'slug-pageA');
-
-    expect(typeof res.sent).toBe('string');
-    expect(res.sent as string).toContain('<title>Public Handbook</title>');
-    expect(res.sent as string).toContain('og:title');
-    // searchIndexing on => crawlable (no noindex).
-    expect(res.sent as string).not.toContain('content="noindex"');
-  });
-
-  it('adds robots=noindex when the share opted out of search indexing', async () => {
-    const { controller } = makeController({
-      resolved: {
-        share: { spaceId: 'sp-1', searchIndexing: false },
-        page: { title: 'Internal Notes' },
-      },
-      sharingAllowed: true,
-    });
-    const res = makeRes();
-
-    await controller.getShare(res, req, 'share-key', 'slug-pageA');
-
-    expect(res.sent as string).toContain('content="noindex"');
-  });
-});
--- a/apps/server/src/core/share/share-seo.controller.ts
+++ b/apps/server/src/core/share/share-seo.controller.ts
@@ -63,38 +63,19 @@ export class ShareSeoController {

      const pageId = this.extractPageSlugId(pageSlug);

-      // Funnel through the canonical readable-share boundary (NOT the raw
-      // getShareForPage) so the restricted-ancestor gate runs: a permission-
-      // restricted descendant of an includeSubPages share must NOT leak its
-      // title to anonymous visitors / crawlers (red-team finding #3). null =>
-      // not publicly readable => serve the plain SPA index with no meta.
-      const resolved = await this.shareService.resolveReadableSharePage(
-        undefined,
+      const share = await this.shareService.getShareForPage(
        pageId,
        workspace.id,
      );

-      if (!resolved) {
-        return this.sendIndex(indexFilePath, res);
-      }
-
-      // Honour a workspace/space-level sharing toggle flipped off AFTER this
-      // share was created: the content API gates on isSharingAllowed, so the SEO
-      // path must too or it keeps serving the title for a no-longer-shared page.
-      const sharingAllowed = await this.shareService.isSharingAllowed(
-        workspace.id,
-        resolved.share.spaceId,
-      );
-      if (!sharingAllowed) {
+      if (!share) {
        return this.sendIndex(indexFilePath, res);
      }

      const html = fs.readFileSync(indexFilePath, 'utf8');
-      // Title of the PAGE being viewed (server-resolved), and noindex unless the
-      // share opted into search indexing (buildShareMetaHtml injects it).
      let transformedHtml = buildShareMetaHtml(html, {
-        title: resolved.page.title,
-        searchIndexing: resolved.share.searchIndexing,
+        title: share?.sharedPage.title,
+        searchIndexing: share.searchIndexing,
      });

      // Deliberate same-origin tracker surface: this is the ONE place where an
--- a/apps/server/src/database/jsonb-bind.spec.ts
+++ b/apps/server/src/database/jsonb-bind.spec.ts
@@ -1,38 +0,0 @@
-import { jsonbBind } from './utils';
-
-/**
- * Unit tests for jsonbBind: THE shared helper that encodes a JS array/object as
- * a jsonb bind (or null when there is nothing to persist). It is the last line
- * of defence before a jsonb column write, so the null-vs-bind decision is what
- * matters here. We assert only null vs non-null because the non-null value is a
- * kysely `sql` template fragment whose internal shape is an implementation
- * detail of the SQL tag (the `::text::jsonb` double-encoding fix is verified
- * end-to-end by the repo integration specs, where a real DB round-trip can
- * actually observe `jsonb_typeof`).
- */
-describe('jsonbBind', () => {
-  it('returns null for null / undefined', () => {
-    expect(jsonbBind(null)).toBeNull();
-    expect(jsonbBind(undefined)).toBeNull();
-  });
-
-  it('returns null for an empty array (nothing to persist)', () => {
-    expect(jsonbBind([])).toBeNull();
-  });
-
-  it('returns null for an empty object (nothing to persist)', () => {
-    expect(jsonbBind({})).toBeNull();
-  });
-
-  it('returns a (non-null) bind for a non-empty array', () => {
-    const out = jsonbBind(['search', 'crawl']);
-    expect(out).not.toBeNull();
-    expect(out).toBeDefined();
-  });
-
-  it('returns a (non-null) bind for a non-empty object', () => {
-    const out = jsonbBind({ driver: 'gemini', chatModel: 'gemini-2.0-flash' });
-    expect(out).not.toBeNull();
-    expect(out).toBeDefined();
-  });
-});
--- a/apps/server/src/database/migrations/20260625T120000-ai-mcp-servers-instructions.ts
+++ b/apps/server/src/database/migrations/20260625T120000-ai-mcp-servers-instructions.ts
@@ -1,19 +0,0 @@
-import { type Kysely } from 'kysely';
-
-export async function up(db: Kysely<any>): Promise<void> {
-  // Per-server, admin-authored instruction text injected into the agent system
-  // prompt next to the server's tool descriptions (#180). NON-secret (unlike
-  // headers_enc): it IS returned in admin views/forms. Nullable: a server may
-  // have no guidance. Trusted text — it goes inside the prompt safety sandwich.
-  await db.schema
-    .alterTable('ai_mcp_servers')
-    .addColumn('instructions', 'text', (col) => col)
-    .execute();
-}
-
-export async function down(db: Kysely<any>): Promise<void> {
-  await db.schema
-    .alterTable('ai_mcp_servers')
-    .dropColumn('instructions')
-    .execute();
-}
--- a/apps/server/src/database/migrations/20260626T120000-ai-chat-message-status.ts
+++ b/apps/server/src/database/migrations/20260626T120000-ai-chat-message-status.ts
@@ -1,18 +0,0 @@
-import { type Kysely } from 'kysely';
-
-export async function up(db: Kysely<any>): Promise<void> {
-  // Step-granular durability for the assistant turn (#183). The assistant row is
-  // now created UPFRONT (status 'streaming') and UPDATEd as each step completes,
-  // so a process death mid-turn no longer loses the whole answer. The column is
-  // NULLABLE on purpose: rows written before this migration carry NULL, which the
-  // app treats as 'completed' (a settled, pre-status message). Values written by
-  // the app: 'streaming' | 'completed' | 'error' | 'aborted'.
-  await db.schema
-    .alterTable('ai_chat_messages')
-    .addColumn('status', 'text', (col) => col)
-    .execute();
-}
-
-export async function down(db: Kysely<any>): Promise<void> {
-  await db.schema.alterTable('ai_chat_messages').dropColumn('status').execute();
-}
--- a/apps/server/src/database/repos/ai-agent-roles/ai-agent-roles.repo.spec.ts
+++ b/apps/server/src/database/repos/ai-agent-roles/ai-agent-roles.repo.spec.ts
@@ -35,13 +35,7 @@ describe('AiAgentRoleRepo.findLiveEnabled', () => {

    const result = await repo.findLiveEnabled('r-1', 'ws-1');

-    // The repo normalizes the row (modelConfig parse), so it returns a COPY, not
-    // the same reference; assert the row's fields are carried through.
-    expect(result).toMatchObject({
-      id: 'r-1',
-      workspaceId: 'ws-1',
-      enabled: true,
-    });
+    expect(result).toBe(role);
    expect(db.selectFrom).toHaveBeenCalledWith('aiAgentRoles');
    // Every security filter must be present.
    expect(where).toHaveBeenCalledWith('id', '=', 'r-1');
--- a/apps/server/src/database/repos/ai-agent-roles/ai-agent-roles.repo.ts
+++ b/apps/server/src/database/repos/ai-agent-roles/ai-agent-roles.repo.ts
@@ -1,7 +1,8 @@
 import { Injectable } from '@nestjs/common';
 import { InjectKysely } from 'nestjs-kysely';
+import { sql } from 'kysely';
 import { KyselyDB, KyselyTransaction } from '../../types/kysely.types';
-import { dbOrTx, jsonbBind, parseJsonbValue } from '../../utils';
+import { dbOrTx } from '../../utils';
 import { AiAgentRole } from '@docmost/db/types/entity.types';

 /** The jsonb shape persisted in `model_config` (loosely typed for the column). */
@@ -22,14 +23,13 @@ export class AiAgentRoleRepo {
    id: string,
    workspaceId: string,
  ): Promise<AiAgentRole | undefined> {
-    const row = await this.db
+    return this.db
      .selectFrom('aiAgentRoles')
      .selectAll('aiAgentRoles')
      .where('id', '=', id)
      .where('workspaceId', '=', workspaceId)
      .where('deletedAt', 'is', null)
      .executeTakeFirst();
-    return row ? normalizeRow(row) : row;
  }

  /**
@@ -45,7 +45,7 @@ export class AiAgentRoleRepo {
    id: string,
    workspaceId: string,
  ): Promise<AiAgentRole | undefined> {
-    const row = await this.db
+    return this.db
      .selectFrom('aiAgentRoles')
      .selectAll('aiAgentRoles')
      .where('id', '=', id)
@@ -53,19 +53,17 @@ export class AiAgentRoleRepo {
      .where('deletedAt', 'is', null)
      .where('enabled', '=', true)
      .executeTakeFirst();
-    return row ? normalizeRow(row) : row;
  }

  /** All live roles for the workspace (management list + chat picker). */
  async listByWorkspace(workspaceId: string): Promise<AiAgentRole[]> {
-    const rows = await this.db
+    return this.db
      .selectFrom('aiAgentRoles')
      .selectAll('aiAgentRoles')
      .where('workspaceId', '=', workspaceId)
      .where('deletedAt', 'is', null)
      .orderBy('createdAt', 'asc')
      .execute();
-    return rows.map(normalizeRow);
  }

  async insert(
@@ -85,7 +83,7 @@ export class AiAgentRoleRepo {
    trx?: KyselyTransaction,
  ): Promise<AiAgentRole> {
    const db = dbOrTx(this.db, trx);
-    const row = await db
+    return db
      .insertInto('aiAgentRoles')
      .values({
        workspaceId: values.workspaceId,
@@ -94,11 +92,7 @@ export class AiAgentRoleRepo {
        emoji: values.emoji ?? null,
        description: values.description ?? null,
        instructions: values.instructions,
-        // Cast: the generated `model_config` column type is the broad JsonValue
-        // union, which the concrete RawBuilder<Record> is not structurally
-        // assignable to (same reason the old jsonbObject cast to any).
-        // eslint-disable-next-line @typescript-eslint/no-explicit-any
-        modelConfig: jsonbBind(values.modelConfig) as any,
+        modelConfig: jsonbObject(values.modelConfig),
        enabled: values.enabled ?? true,
        autoStart: values.autoStart ?? true,
        // Empty string is treated as "no custom text" => null.
@@ -106,7 +100,6 @@ export class AiAgentRoleRepo {
      })
      .returningAll()
      .executeTakeFirst();
-    return normalizeRow(row);
  }

  async update(
@@ -134,7 +127,7 @@ export class AiAgentRoleRepo {
    if (patch.description !== undefined) set.description = patch.description;
    if (patch.instructions !== undefined) set.instructions = patch.instructions;
    if (patch.modelConfig !== undefined) {
-      set.modelConfig = jsonbBind(patch.modelConfig);
+      set.modelConfig = jsonbObject(patch.modelConfig);
    }
    if (patch.enabled !== undefined) set.enabled = patch.enabled;
    if (patch.autoStart !== undefined) set.autoStart = patch.autoStart;
@@ -170,36 +163,16 @@ export class AiAgentRoleRepo {
 }

 /**
- * Parse the `model_config` value read from the DB into the object the entity
- * type promises. Rows written by the old double-encoding bind (`::jsonb` instead
- * of `::text::jsonb`) round-trip as a JSON STRING, so the driver hands back e.g.
- * `'{"driver":"gemini"}'` rather than an object; the read-path check
- * `typeof cfg === 'object'` then failed and the model override was SILENTLY
- * dropped (the role fell back to the default model). Be tolerant: a JSON string
- * is parsed; an already-parsed object passes through; null / a non-object (incl.
- * an array) / unparseable value becomes null (= no override). This self-heals
- * already-corrupted rows on read, no migration required.
+ * Encode an object as a jsonb bind for the `model_config` column. The postgres
+ * driver would otherwise need an explicit cast; bind the JSON text and cast it.
+ * Returns null for null/undefined/empty objects. Cast to `any` because the
+ * generated column type is the broad `JsonValue` union, which a concrete object
+ * type is not structurally assignable to.
 */
-export function parseModelConfig(
-  value: unknown,
-): Record<string, unknown> | null {
-  // Shape guard only; the legacy double-encoding self-heal lives in
-  // parseJsonbValue (database/utils.ts).
-  return parseJsonbValue(
-    value,
-    (v): v is Record<string, unknown> =>
-      v !== null && typeof v === 'object' && !Array.isArray(v),
-  );
-}
-
-/** Normalize a DB row so `modelConfig` is always an object or null. The cast
- *  bridges parseModelConfig's concrete `Record | null` to the column's broad
- *  generated `JsonValue` type (an object is a valid JsonValue at runtime). */
-function normalizeRow(row: AiAgentRole): AiAgentRole {
-  return {
-    ...row,
-    modelConfig: parseModelConfig(
-      row.modelConfig,
-    ) as AiAgentRole['modelConfig'],
-  };
+export function jsonbObject(value: ModelConfigValue | undefined) {
+  if (value === null || value === undefined || Object.keys(value).length === 0) {
+    return null;
+  }
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+  return sql`${JSON.stringify(value)}::jsonb` as any;
 }
--- a/apps/server/src/database/repos/ai-agent-roles/parse-model-config.spec.ts
+++ b/apps/server/src/database/repos/ai-agent-roles/parse-model-config.spec.ts
@@ -1,46 +0,0 @@
-import { parseModelConfig } from './ai-agent-roles.repo';
-
-/**
- * Unit tests for parseModelConfig: the read-side normalizer that repairs the
- * jsonb double-encoding regression on `model_config`. Rows written by the old
- * `::jsonb` bind round-trip as a JSON STRING, which the read path's
- * `typeof === 'object'` check rejected — silently dropping the model override.
- * parseModelConfig accepts an already-parsed object, parses a legacy JSON
- * string, and rejects everything that is not an object (null = no override).
- */
-describe('parseModelConfig', () => {
-  it('passes an already-parsed object through', () => {
-    expect(parseModelConfig({ driver: 'gemini' })).toEqual({
-      driver: 'gemini',
-    });
-  });
-
-  it('parses a legacy double-encoded JSON string into an object', () => {
-    expect(parseModelConfig('{"driver":"gemini","chatModel":"x"}')).toEqual({
-      driver: 'gemini',
-      chatModel: 'x',
-    });
-  });
-
-  it('returns null for null / undefined', () => {
-    expect(parseModelConfig(null)).toBeNull();
-    expect(parseModelConfig(undefined)).toBeNull();
-  });
-
-  it('returns null for a non-object JSON value (string/number/array)', () => {
-    expect(parseModelConfig('"justastring"')).toBeNull();
-    expect(parseModelConfig('42')).toBeNull();
-    // An array is an object in JS but not a valid model_config shape.
-    expect(parseModelConfig('["a","b"]')).toBeNull();
-    expect(parseModelConfig(['a', 'b'])).toBeNull();
-  });
-
-  it('returns null for an unparseable string', () => {
-    expect(parseModelConfig('not json at all')).toBeNull();
-  });
-
-  it('returns null for a raw non-object primitive', () => {
-    expect(parseModelConfig(42 as unknown)).toBeNull();
-    expect(parseModelConfig(true as unknown)).toBeNull();
-  });
-});
--- a/apps/server/src/database/repos/ai-chat/ai-chat-message.repo.ts
+++ b/apps/server/src/database/repos/ai-chat/ai-chat-message.repo.ts
@@ -1,4 +1,4 @@
-import { Injectable, Logger } from '@nestjs/common';
+import { Injectable } from '@nestjs/common';
 import { InjectKysely } from 'nestjs-kysely';
 import { KyselyDB, KyselyTransaction } from '../../types/kysely.types';
 import { dbOrTx } from '../../utils';
@@ -9,24 +9,8 @@ import {
 import { PaginationOptions } from '@docmost/db/pagination/pagination-options';
 import { executeWithCursorPagination } from '@docmost/db/pagination/cursor-pagination';

-// Crash-recovery sweep recency threshold (#183 review): a 'streaming' row is
-// only swept to 'aborted' once it has been UNTOUCHED for this long. A live turn
-// bumps `updatedAt` on every step (well under this window), so its row never
-// matches; only a turn whose process truly died (no step update for >threshold)
-// is swept. Chosen safely ABOVE the longest realistic turn so a fresh replica's
-// boot-sweep can never abort a turn another replica is actively streaming
-// (multi-instance deploy).
-const SWEEP_STREAMING_STALE_MS = 10 * 60 * 1000; // 10 minutes
-
-// Hard upper bound on the rows materialized by `findAllByChat` (export path).
-// A generous cap so a pathologically huge chat cannot load an unbounded result
-// into memory; far above any realistic transcript length.
-const FIND_ALL_BY_CHAT_LIMIT = 5000;
-
@Injectable()
 export class AiChatMessageRepo {
-  private readonly logger = new Logger(AiChatMessageRepo.name);
-
  constructor(@InjectKysely() private readonly db: KyselyDB) {}

  // The `tsv` column is a trigger-maintained tsvector used only for
@@ -41,7 +25,6 @@ export class AiChatMessageRepo {
    'content',
    'toolCalls',
    'metadata',
-    'status',
    'createdAt',
    'updatedAt',
    'deletedAt',
@@ -77,46 +60,6 @@ export class AiChatMessageRepo {
    });
  }

-  // Load ALL (non-deleted) messages of a chat in ascending chronological order
-  // (oldest -> newest), unpaginated. Used by the server-side Markdown export
-  // (#183), where the DB is the single source of truth and the whole transcript
-  // must be rendered in one pass (findByChat is cursor-paginated and would only
-  // return the first page).
-  //
-  // Hard-capped at FIND_ALL_BY_CHAT_LIMIT rows (a generous bound, far above any
-  // realistic transcript) so exporting a pathologically huge chat cannot
-  // materialize an unbounded result set in memory.
-  async findAllByChat(
-    chatId: string,
-    workspaceId: string,
-    // Injectable for tests so truncation can be exercised on a modest volume.
-    limit: number = FIND_ALL_BY_CHAT_LIMIT,
-  ): Promise<AiChatMessage[]> {
-    // Fetch newest-first (+1 to DETECT truncation), so on overflow we keep the
-    // NEWEST `limit` messages — the recent conversation matters most for an
-    // export — rather than silently dropping the tail (#183 review). Reverse back
-    // to chronological for rendering, like findRecent.
-    const rows = await this.db
-      .selectFrom('aiChatMessages')
-      .select(this.baseFields)
-      .where('chatId', '=', chatId)
-      .where('workspaceId', '=', workspaceId)
-      .where('deletedAt', 'is', null)
-      .orderBy('createdAt', 'desc')
-      .orderBy('id', 'desc')
-      .limit(limit + 1)
-      .execute();
-
-    if (rows.length > limit) {
-      rows.length = limit; // keep the newest `limit` (rows are newest-first here)
-      this.logger.warn(
-        `Chat ${chatId} export truncated to the newest ${limit} messages ` +
-          `(older messages omitted).`,
-      );
-    }
-    return rows.reverse();
-  }
-
  // Load the most RECENT `limit` messages for a chat and return them in
  // ascending chronological order (oldest -> newest), as the model expects.
  // `findByChat` returns the FIRST page ASC (the OLDEST messages), which loses
@@ -153,68 +96,4 @@ export class AiChatMessageRepo {
      .returning(this.baseFields)
      .executeTakeFirst();
  }
-
-  /**
-   * Update a single message in place by id + workspace (#183 step-granular
-   * durability). The assistant row is created UPFRONT (status 'streaming') and
-   * patched as each step completes, then finalized once on the terminal status.
-   * `updatedAt` is always bumped. Returns the updated row (baseFields) or
-   * undefined when no row matched (e.g. a foreign workspace / deleted row).
-   */
-  async update(
-    id: string,
-    workspaceId: string,
-    patch: Partial<{
-      content: string | null;
-      toolCalls: unknown;
-      metadata: unknown;
-      status: string | null;
-    }>,
-    opts?: { onlyIfStreaming?: boolean; trx?: KyselyTransaction },
-  ): Promise<AiChatMessage | undefined> {
-    const db = dbOrTx(this.db, opts?.trx);
-    let query = db
-      .updateTable('aiChatMessages')
-      .set({ ...(patch as Record<string, unknown>), updatedAt: new Date() })
-      .where('id', '=', id)
-      .where('workspaceId', '=', workspaceId);
-    // Concurrency guard (#183 review): a per-step 'streaming' update must NEVER
-    // overwrite a row the terminal callback already finalized. onStepFinish
-    // fires the streaming update fire-and-forget, so its UPDATE can land AFTER
-    // finalize on a DIFFERENT pool connection (commit order is not guaranteed).
-    // Scoping the streaming update to rows STILL in 'streaming' makes a late
-    // update a no-op once the row is completed/error/aborted — regardless of
-    // commit order. The terminal finalize runs WITHOUT this guard so it always
-    // wins.
-    if (opts?.onlyIfStreaming) {
-      query = query.where('status', '=', 'streaming');
-    }
-    return query.returning(this.baseFields).executeTakeFirst();
-  }
-
-  /**
-   * Crash-recovery sweep (#183): flip every assistant row still left in the
-   * 'streaming' state (a turn that died mid-write before reaching a terminal
-   * status) to 'aborted'. Run once on server start. Returns the number of rows
-   * swept so the caller can log it. Workspace-wide on purpose — a crash can have
-   * dangling streaming rows across any workspace.
-   *
-   * Bounded by recency (#183 review): only rows UNTOUCHED for
-   * SWEEP_STREAMING_STALE_MS are swept. A live turn bumps `updatedAt` on every
-   * step, so an actively-streaming row never matches; this prevents a fresh
-   * replica's boot-sweep from aborting a turn another replica is still streaming
-   * in a multi-instance deploy.
-   */
-  async sweepStreaming(trx?: KyselyTransaction): Promise<number> {
-    const db = dbOrTx(this.db, trx);
-    const staleBefore = new Date(Date.now() - SWEEP_STREAMING_STALE_MS);
-    const rows = await db
-      .updateTable('aiChatMessages')
-      .set({ status: 'aborted', updatedAt: new Date() })
-      .where('status', '=', 'streaming')
-      .where('updatedAt', '<', staleBefore)
-      .returning('id')
-      .execute();
-    return rows.length;
-  }
 }
--- a/apps/server/src/database/repos/ai-chat/ai-mcp-server.repo.spec.ts
+++ b/apps/server/src/database/repos/ai-chat/ai-mcp-server.repo.spec.ts
@@ -1,4 +1,4 @@
-import { parseToolAllowlist, blankToNull } from './ai-mcp-server.repo';
+import { parseToolAllowlist } from './ai-mcp-server.repo';

 /**
 * The `tool_allowlist` jsonb column historically round-trips as a JSON STRING
@@ -10,10 +10,7 @@ import { parseToolAllowlist, blankToNull } from './ai-mcp-server.repo';
 */
 describe('parseToolAllowlist', () => {
  it('passes a real string array through unchanged', () => {
-    expect(parseToolAllowlist(['search', 'crawl'])).toEqual([
-      'search',
-      'crawl',
-    ]);
+    expect(parseToolAllowlist(['search', 'crawl'])).toEqual(['search', 'crawl']);
  });

  it('parses a JSON-string array (the double-encoded read) into an array', () => {
@@ -49,26 +46,3 @@ describe('parseToolAllowlist', () => {
    expect(parseToolAllowlist(true as unknown)).toBeNull();
  });
 });
-
-/**
- * `blankToNull` normalizes the per-server `instructions` free text before it is
- * stored (#180): a missing/blank/whitespace-only value becomes null (so an empty
- * guide is never persisted), any other value is trimmed.
- */
-describe('blankToNull', () => {
-  it('returns null for null / undefined', () => {
-    expect(blankToNull(null)).toBeNull();
-    expect(blankToNull(undefined)).toBeNull();
-  });
-
-  it('returns null for an empty / whitespace-only string', () => {
-    expect(blankToNull('')).toBeNull();
-    expect(blankToNull('   ')).toBeNull();
-    expect(blankToNull('\n\t ')).toBeNull();
-  });
-
-  it('trims and returns a non-blank string', () => {
-    expect(blankToNull('  use the search tool  ')).toBe('use the search tool');
-    expect(blankToNull('guide')).toBe('guide');
-  });
-});
--- a/apps/server/src/database/repos/ai-chat/ai-mcp-server.repo.ts
+++ b/apps/server/src/database/repos/ai-chat/ai-mcp-server.repo.ts
@@ -1,11 +1,10 @@
-import { Injectable, Logger } from '@nestjs/common';
+import { Injectable } from '@nestjs/common';
 import { InjectKysely } from 'nestjs-kysely';
+import { sql } from 'kysely';
 import { KyselyDB, KyselyTransaction } from '../../types/kysely.types';
-import { dbOrTx, jsonbBind, parseJsonbValue } from '../../utils';
+import { dbOrTx } from '../../utils';
 import { AiMcpServer } from '@docmost/db/types/entity.types';

-const logger = new Logger('AiMcpServerRepo');
-
 /**
 * Repository for per-workspace external MCP servers the agent may use (§5.4).
 *
@@ -61,8 +60,6 @@ export class AiMcpServerRepo {
      url: string;
      headersEnc?: string | null;
      toolAllowlist?: string[] | null;
-      // Admin-authored prompt guidance; blank/whitespace normalizes to null.
-      instructions?: string | null;
      enabled?: boolean;
    },
    trx?: KyselyTransaction,
@@ -78,9 +75,7 @@ export class AiMcpServerRepo {
        headersEnc: values.headersEnc ?? null,
        // jsonb column: the postgres driver would otherwise encode a JS array as
        // a Postgres array literal. Bind the JSON text and cast it to jsonb.
-        toolAllowlist: jsonbBind(values.toolAllowlist),
-        // Plain text column: blank/whitespace-only guidance is stored as null.
-        instructions: blankToNull(values.instructions),
+        toolAllowlist: jsonbArray(values.toolAllowlist),
        enabled: values.enabled ?? true,
      })
      .returningAll()
@@ -98,8 +93,6 @@ export class AiMcpServerRepo {
      headersEnc?: string | null;
      // undefined => leave unchanged; null => clear; string[] => set.
      toolAllowlist?: string[] | null;
-      // undefined => leave unchanged; null/blank => clear; string => set.
-      instructions?: string | null;
      enabled?: boolean;
    },
    trx?: KyselyTransaction,
@@ -111,11 +104,7 @@ export class AiMcpServerRepo {
    if (patch.url !== undefined) set.url = patch.url;
    if (patch.headersEnc !== undefined) set.headersEnc = patch.headersEnc;
    if (patch.toolAllowlist !== undefined) {
-      set.toolAllowlist = jsonbBind(patch.toolAllowlist);
-    }
-    if (patch.instructions !== undefined) {
-      // Blank/whitespace-only guidance clears the column (stored as null).
-      set.instructions = blankToNull(patch.instructions);
+      set.toolAllowlist = jsonbArray(patch.toolAllowlist);
    }
    if (patch.enabled !== undefined) set.enabled = patch.enabled;
    await db
@@ -141,49 +130,57 @@ export class AiMcpServerRepo {
 }

 /**
- * Normalize an optional free-text field to a stored value: a missing/blank/
- * whitespace-only string becomes null (so an "empty" guide is never persisted),
- * any other string is trimmed. Returns null for null/undefined input.
+ * Encode a string[] as a jsonb bind for the `tool_allowlist` column. Passing a
+ * plain JS array to the postgres driver would serialize it as a Postgres array
+ * literal (incompatible with jsonb), so we bind the JSON text and cast it.
+ *
+ * The cast is `::text::jsonb`, NOT `::jsonb`: if the parameter is bound straight
+ * to a jsonb cast, node-postgres infers its type as jsonb and JSON-stringifies
+ * the (already-JSON) string a SECOND time, so the column ends up holding a jsonb
+ * STRING SCALAR (`"[\"a\"]"`) instead of a jsonb ARRAY. Forcing the param through
+ * `::text` first binds it as text (sent verbatim), and `::jsonb` then parses it
+ * into a real array. (`normalizeRow` below repairs rows written the old way.)
+ *
+ * Returns null for null/empty arrays (an empty allowlist means "no restriction"
+ * is not intended — callers pass null to clear; an empty array is normalized to
+ * null here so it never round-trips as `[]`).
 */
-export function blankToNull(value: string | null | undefined): string | null {
-  if (value == null) return null;
-  const trimmed = value.trim();
-  return trimmed.length > 0 ? trimmed : null;
+function jsonbArray(value: string[] | null | undefined) {
+  if (value === null || value === undefined || value.length === 0) {
+    return null;
+  }
+  // Typed as string[] so it is assignable to the toolAllowlist column.
+  return sql<string[]>`${JSON.stringify(value)}::text::jsonb`;
 }

 /**
 * Parse the `toolAllowlist` value read from the DB into the `string[] | null`
 * the entity type promises. The jsonb column historically round-trips as a JSON
- * STRING (rows written by the old double-encoding bind before the `::text::jsonb`
- * fix), so the driver hands back a string like `'["a","b"]'` rather than an
- * array. Be tolerant: normalize a JSON string to its value, then accept it only
- * if it is an array of strings; null / a non-array / unparseable value / an
- * array with a non-string element all become null (unrestricted).
+ * STRING (rows written by the old double-encoding `jsonbArray`, see above), so
+ * the driver hands back a string like `'["a","b"]'` rather than an array. Be
+ * tolerant: an already-parsed array passes through; a JSON string is parsed; null
+ * / a non-array / unparseable value becomes null (unrestricted).
 */
 export function parseToolAllowlist(value: unknown): string[] | null {
-  // Shape guard only; the legacy double-encoding self-heal lives in
-  // parseJsonbValue (database/utils.ts).
-  return parseJsonbValue(
-    value,
-    (v): v is string[] =>
-      Array.isArray(v) && v.every((x) => typeof x === 'string'),
-  );
+  if (value == null) return null;
+  if (Array.isArray(value)) {
+    return value.every((v) => typeof v === 'string') ? (value as string[]) : null;
+  }
+  if (typeof value === 'string') {
+    try {
+      const parsed = JSON.parse(value);
+      return Array.isArray(parsed) &&
+        parsed.every((v) => typeof v === 'string')
+        ? (parsed as string[])
+        : null;
+    } catch {
+      return null;
+    }
+  }
+  return null;
 }

-/**
- * Normalize a DB row so `toolAllowlist` is always `string[] | null`.
- *
- * FAIL-OPEN logging: a stored value that is present but cannot be parsed into a
- * string[] (corrupt JSON, a non-array, non-string elements) degrades to `null` =
- * "no restriction", so the agent silently gets ALL of the server's tools. Log
- * one line (server id only, never the contents) so that widening is not silent.
- */
+/** Normalize a DB row so `toolAllowlist` is always `string[] | null`. */
 function normalizeRow(row: AiMcpServer): AiMcpServer {
-  const parsed = parseToolAllowlist(row.toolAllowlist);
-  if (parsed === null && row.toolAllowlist != null) {
-    logger.warn(
-      `Corrupt tool_allowlist for MCP server ${row.id}; ignoring it (no tool restriction applied)`,
-    );
-  }
-  return { ...row, toolAllowlist: parsed };
+  return { ...row, toolAllowlist: parseToolAllowlist(row.toolAllowlist) };
 }
--- a/apps/server/src/database/repos/workspace/workspace.repo.ts
+++ b/apps/server/src/database/repos/workspace/workspace.repo.ts
@@ -10,30 +10,6 @@ import {
 import { ExpressionBuilder, sql } from 'kysely';
 import { DB, Workspaces } from '@docmost/db/types/db';

-/**
- * Writable `settings.ai.provider` keys, enforced at this generic SQL layer. This
- * repo cannot import AI-feature types, so this list is its own copy; a parity
- * test (ai-provider-settings-keys.spec.ts) asserts it equals
- * PROVIDER_SETTINGS_KEYS in ai.types so a future drift fails in CI rather than
- * silently dropping a field at this boundary.
- */
-export const AI_PROVIDER_SETTINGS_ALLOWED: readonly string[] = [
-  'driver',
-  'chatModel',
-  'chatApiStyle',
-  'chatContextWindow',
-  'embeddingModel',
-  'baseUrl',
-  'embeddingBaseUrl',
-  'sttModel',
-  'sttBaseUrl',
-  'sttApiStyle',
-  'sttLanguage',
-  'systemPrompt',
-  'publicShareChatModel',
-  'publicShareAssistantRoleId',
-];
-
@Injectable()
 export class WorkspaceRepo {
  public baseFields: Array<keyof Workspaces> = [
@@ -256,32 +232,20 @@ export class WorkspaceRepo {
  ): Promise<Workspace> {
    const db = dbOrTx(this.db, trx);
    // Assemble the provider object IN SQL. Keys are fixed provider field names
-    // (sql.lit -> inlined literals, no injection); values are bound params with
-    // an explicit cast — postgres.js sends bound params untyped, and
-    // jsonb_build_object's value args are polymorphic ("any"), so without the
-    // cast Postgres throws "could not determine data type of parameter $1". The
-    // cast is branched by the JS runtime type so the value lands in jsonb with
-    // the matching JSON type: a number stays a JSON number (e.g.
-    // chatContextWindow → `{"chatContextWindow":200000}`, jsonb_typeof 'number'),
-    // a boolean a JSON boolean, everything else a JSON string. A plain `::text`
-    // for all would store a numeric field as the JSON STRING `"200000"`, which
-    // the client's `typeof === "number"` guards reject. The result is a real
-    // jsonb object, never a double-encoded string. The CASE self-heals
+    // (sql.lit -> inlined literals, no injection); values are bound params cast
+    // to ::text — postgres.js sends bound params untyped, and jsonb_build_object's
+    // value args are polymorphic ("any"), so without the explicit ::text cast
+    // Postgres throws "could not determine data type of parameter $1". The result
+    // is a real jsonb object, never a double-encoded string. The CASE self-heals
    // workspaces whose settings.ai.provider was previously corrupted into an
    // array/string.
+    const ALLOWED = ['driver', 'chatModel', 'embeddingModel', 'baseUrl', 'embeddingBaseUrl', 'sttModel', 'sttBaseUrl', 'sttApiStyle', 'sttLanguage', 'systemPrompt', 'publicShareChatModel', 'publicShareAssistantRoleId'];
    const entries = Object.entries(provider).filter(
-      ([k, v]) => v !== undefined && AI_PROVIDER_SETTINGS_ALLOWED.includes(k),
+      ([k, v]) => v !== undefined && ALLOWED.includes(k),
    );
    const patch = entries.length
      ? sql`jsonb_build_object(${sql.join(
-          entries.flatMap(([k, v]) => [
-            sql.lit(k),
-            typeof v === 'number'
-              ? sql`${v}::numeric`
-              : typeof v === 'boolean'
-                ? sql`${v}::boolean`
-                : sql`${v}::text`,
-          ]),
+          entries.flatMap(([k, v]) => [sql.lit(k), sql`${v}::text`]),
        )})`
      : sql`'{}'::jsonb`;
    return db
--- a/apps/server/src/database/types/ai-mcp-servers.types.ts
+++ b/apps/server/src/database/types/ai-mcp-servers.types.ts
@@ -20,15 +20,8 @@ export interface AiMcpServers {
  // Encrypted JSON of the auth headers. Nullable (a server may need no auth).
  headersEnc: string | null;
  // Optional allowlist of remote tool names to expose; null = expose all.
-  // Stored as jsonb. The postgres driver may return a JSON string for legacy
-  // double-encoded rows; `AiMcpServerRepo` normalizes every read to
-  // `string[] | null` via `parseToolAllowlist`.
+  // Stored as jsonb; reads come back as a string[] from the postgres driver.
  toolAllowlist: string[] | null;
-  // Admin-authored guidance ("how/when to use this server's tools") injected
-  // into the agent system prompt (#180). Unlike `headersEnc` this is NON-secret
-  // and IS returned in admin views/forms. Plain text column (no jsonb). Null =
-  // no guidance. Trusted text — it goes inside the prompt safety sandwich.
-  instructions: string | null;
  enabled: Generated<boolean>;
  createdAt: Generated<Timestamp>;
  updatedAt: Generated<Timestamp>;
--- a/apps/server/src/database/types/db.d.ts
+++ b/apps/server/src/database/types/db.d.ts
@@ -620,10 +620,6 @@ export interface AiChatMessages {
  content: string | null;
  toolCalls: Json | null;
  metadata: Json | null;
-  // Turn lifecycle status (#183): 'streaming' | 'completed' | 'error' |
-  // 'aborted'. NULL on rows written before the status column existed; the app
-  // treats NULL as 'completed' (a settled, pre-status message).
-  status: string | null;
  tsv: string | null;
  createdAt: Generated<Timestamp>;
  updatedAt: Generated<Timestamp>;
--- a/apps/server/src/database/utils.ts
+++ b/apps/server/src/database/utils.ts
@@ -1,4 +1,3 @@
-import { sql, RawBuilder } from 'kysely';
 import { KyselyDB, KyselyTransaction } from './types/kysely.types';

 /*
@@ -32,61 +31,3 @@ export function dbOrTx(
    return db; // Use normal database instance
  }
 }
-
-/**
- * Bind a JS array/object as a `jsonb` column value, working around a postgres
- * driver double-encoding quirk. THE single implementation — repos that persist
- * jsonb (`tool_allowlist`, `model_config`, ...) call this instead of re-deriving
- * the cast.
- *
- * THE QUIRK: with the `kysely-postgres-js` / postgres.js driver, casting a bound
- * parameter straight to `::jsonb` makes the driver infer the param type as jsonb
- * and JSON-stringify the (already-JSON) text a SECOND time, so the column ends
- * up holding a jsonb STRING SCALAR (`"[\"a\"]"` / `"{\"k\":1}"`) instead of a
- * real jsonb array/object. Read paths then see a string, not the structure, and
- * silently fall back (an allowlist becomes "unrestricted", a model override is
- * ignored). Forcing the param through `::text` first binds it as text (sent
- * verbatim); `::jsonb` then parses it into a real array/object. Read-side
- * parsers repair rows written the old buggy way without a migration.
- *
- * Returns `null` for null/undefined and for "empty" values (an empty array, or
- * an object with no own enumerable keys) — callers treat empty as "clear/unset",
- * so an empty allowlist/config never round-trips as `[]`/`{}`.
- */
-export function jsonbBind<T>(
-  value: T | null | undefined,
-): RawBuilder<T> | null {
-  if (value === null || value === undefined) return null;
-  if (Array.isArray(value)) {
-    if (value.length === 0) return null;
-  } else if (typeof value === 'object') {
-    if (Object.keys(value as object).length === 0) return null;
-  }
-  return sql<T>`${JSON.stringify(value)}::text::jsonb`;
-}
-
-/**
- * READ-side counterpart to {@link jsonbBind}: tolerantly decode a jsonb value
- * read back from the DB and validate its shape with `guard`. THE single place
- * the legacy double-encoding self-heal lives, so repos keep only a type-guard.
- *
- * A row written by the old `::jsonb` bind round-trips as a JSON STRING (see the
- * quirk in jsonbBind), so the driver hands back e.g. `'["a"]'` / `'{"k":1}'`
- * rather than the structure. This parses such a string once, then applies the
- * caller's `guard`. Returns `null` for null / an unparseable string / a value
- * the guard rejects (so a corrupt or wrong-shaped value degrades to "unset").
- */
-export function parseJsonbValue<T>(
-  value: unknown,
-  guard: (v: unknown) => v is T,
-): T | null {
-  let v: unknown = value;
-  if (typeof v === 'string') {
-    try {
-      v = JSON.parse(v); // legacy double-encoded read
-    } catch {
-      return null;
-    }
-  }
-  return guard(v) ? v : null;
-}
--- a/apps/server/src/integrations/ai/ai-http-diagnostics.ts
+++ b/apps/server/src/integrations/ai/ai-http-diagnostics.ts
@@ -1,22 +1,16 @@
 import { Logger } from '@nestjs/common';

 /**
- * The provider HTTP fetch used by the chat path: a thin, behavior-neutral
- * instrumentation wrapper around a supplied `fetch`.
+ * DIAGNOSTIC (provider ECONNRESET investigation) — temporary.
 *
- * It defaults to the global `fetch`, but the chat provider passes the streaming
- * fetch (which RAISES undici's 300s stream timeouts to a generous-but-finite
- * silence timeout so a long agent turn is not severed mid-stream — #175). So this
- * wrapper observes the EXACT transport a turn uses. It NEVER retries, times out,
- * swaps the dispatcher, or reads/clones the response body — the Response is
- * returned untouched (streaming unaffected) and any error is rethrown unchanged.
- *
- * Per provider HTTP call it logs: time-to-response-headers + status + request
- * body size on success; and on a pre-response rejection the failure latency +
- * error code/cause + request body size + the idle gap since the previous call.
- * This telemetry is intentional and kept (it diagnoses provider connection
- * resets / mid-stream cuts), and it is load-bearing: the streaming fetch reaches
- * the chat provider THROUGH this wrapper, so the two are one construct.
+ * A PASSIVE, behavior-neutral wrapper around the global `fetch`, injected into
+ * the OpenAI-compatible provider client (`createOpenAI({ fetch })`, the z.ai
+ * path). Per provider HTTP call it logs: time-to-response-headers + status +
+ * request-body size on success; and on a pre-response rejection the failure
+ * latency + error code/cause + request-body size + the idle gap since the
+ * previous provider call. It NEVER retries, times out, swaps the dispatcher, or
+ * reads/clones the response body — the Response is returned untouched (streaming
+ * unaffected) and any error is rethrown unchanged.
 *
 * How to read the result (a long agentic turn makes one provider call per step):
 *  - a failed turn whose last provider line is "PRE-RESPONSE FAILED ... ECONNRESET"
@@ -29,13 +23,13 @@ import { Logger } from '@nestjs/common';
 *    different failure mode.
 *
 * The seq/last-call timestamps are module-level, so under concurrent turns the
- * idle-gap figure is approximate (fine for single-user diagnosis).
+ * idle-gap figure is approximate (fine for single-user reproduction).
 */
-export function createInstrumentedFetch(
+export function createDiagnosticFetch(
  context: string,
  // The underlying fetch to instrument. Defaults to the global fetch; the chat
-  // provider passes the streaming fetch (raised, finite undici stream timeouts,
-  // #175) so the telemetry observes the SAME transport the long agent turn uses.
+  // provider passes a streaming fetch (disabled undici stream timeouts, #175) so
+  // the telemetry observes the SAME transport the long agent turn actually uses.
  baseFetch: typeof fetch = fetch,
 ): typeof fetch {
  const logger = new Logger(context);
@@ -62,7 +56,7 @@ export function createInstrumentedFetch(
      // clone the body) so the streamed SSE response is unaffected.
      const res = await baseFetch(input, init);
      logger.log(
-        `provider HTTP: call#${callId} OK ` +
+        `provider HTTP DIAGNOSTIC: call#${callId} OK ` +
          `headersAfter=${Date.now() - startedAt}ms status=${res.status} ` +
          `reqBytes=${bodyBytes ?? 'n/a'} idleSincePrevCall=${idleSincePrev ?? 'n/a'}ms`,
      );
@@ -76,7 +70,7 @@ export function createInstrumentedFetch(
        cause?: { code?: string; message?: string };
      };
      logger.warn(
-        `provider HTTP: call#${callId} PRE-RESPONSE FAILED ` +
+        `provider HTTP DIAGNOSTIC: call#${callId} PRE-RESPONSE FAILED ` +
          `after=${Date.now() - startedAt}ms code=${e?.cause?.code ?? 'none'} ` +
          `name=${e?.name ?? 'Error'} cause=${e?.cause?.message ?? e?.message ?? 'unknown'} ` +
          `reqBytes=${bodyBytes ?? 'n/a'} idleSincePrevCall=${idleSincePrev ?? 'n/a'}ms`,
--- a/apps/server/src/integrations/ai/ai-provider-http.spec.ts
+++ b/apps/server/src/integrations/ai/ai-provider-http.spec.ts
@@ -1,40 +0,0 @@
-import { createInstrumentedFetch } from './ai-provider-http';
-
-/**
- * createInstrumentedFetch must be behavior-neutral: it delegates to the supplied
- * baseFetch with the SAME input/init, returns the Response object untouched (so
- * the streamed SSE body is never read/cloned), and rethrows the same error. The
- * baseFetch injection is the seam that carries the streaming fetch (#175) onto
- * the chat provider, so it is tested directly.
- */
-describe('createInstrumentedFetch', () => {
-  it('delegates to the injected baseFetch with the same input/init', async () => {
-    const fakeResponse = new Response('ok', { status: 200 });
-    const baseFetch = jest.fn().mockResolvedValue(fakeResponse);
-    const instrumented = createInstrumentedFetch('test', baseFetch as never);
-
-    const init = { method: 'POST', body: '{"q":1}' };
-    const res = await instrumented('https://example.com/v1/chat', init);
-
-    expect(baseFetch).toHaveBeenCalledTimes(1);
-    expect(baseFetch).toHaveBeenCalledWith('https://example.com/v1/chat', init);
-    // The Response is returned UNTOUCHED (same reference — never read/cloned).
-    expect(res).toBe(fakeResponse);
-  });
-
-  it('rethrows the base fetch error unchanged (pre-response failure)', async () => {
-    const err = Object.assign(new TypeError('fetch failed'), {
-      cause: { code: 'ECONNRESET' },
-    });
-    const baseFetch = jest.fn().mockRejectedValue(err);
-    const instrumented = createInstrumentedFetch('test', baseFetch as never);
-
-    await expect(instrumented('https://example.com/')).rejects.toBe(err);
-  });
-
-  it('defaults to the global fetch when no baseFetch is given', () => {
-    // Constructing without a baseFetch must not throw — it simply wraps global
-    // fetch (the non-chat default).
-    expect(() => createInstrumentedFetch('test')).not.toThrow();
-  });
-});
--- a/apps/server/src/integrations/ai/ai-provider-settings-keys.spec.ts
+++ b/apps/server/src/integrations/ai/ai-provider-settings-keys.spec.ts
@@ -1,75 +0,0 @@
-import { validate } from 'class-validator';
-import { plainToInstance } from 'class-transformer';
-import { PROVIDER_SETTINGS_KEYS } from './ai.types';
-import { AI_PROVIDER_SETTINGS_ALLOWED } from '@docmost/db/repos/workspace/workspace.repo';
-import { UpdateAiSettingsDto } from './dto/update-ai-settings.dto';
-
-/**
- * Drift guard: the writable provider-settings keys are maintained in two layers
- * that TypeScript cannot cross-check — PROVIDER_SETTINGS_KEYS (ai.types, used by
- * the settings service) and AI_PROVIDER_SETTINGS_ALLOWED (the generic workspace
- * repo's SQL boundary). A key missing from the repo copy silently drops the field
- * on persist (exactly what happened to chatApiStyle), so this asserts they match.
- */
-describe('provider-settings key allowlist parity', () => {
-  it('the repo SQL allowlist equals PROVIDER_SETTINGS_KEYS', () => {
-    expect([...AI_PROVIDER_SETTINGS_ALLOWED].sort()).toEqual(
-      [...PROVIDER_SETTINGS_KEYS].sort(),
-    );
-  });
-});
-
-/** DTO validation for the new chatApiStyle field (@IsIn(CHAT_API_STYLES)). */
-describe('UpdateAiSettingsDto.chatApiStyle', () => {
-  const errorsFor = async (chatApiStyle: unknown) =>
-    validate(plainToInstance(UpdateAiSettingsDto, { chatApiStyle }));
-
-  it('accepts both valid values', async () => {
-    for (const v of ['openai-compatible', 'openai']) {
-      const errs = await errorsFor(v);
-      expect(errs.find((e) => e.property === 'chatApiStyle')).toBeUndefined();
-    }
-  });
-
-  it('rejects an unknown value', async () => {
-    const errs = await errorsFor('definitely-not-a-style');
-    expect(errs.find((e) => e.property === 'chatApiStyle')).toBeDefined();
-  });
-
-  it('accepts the field being omitted (optional)', async () => {
-    const errs = await validate(plainToInstance(UpdateAiSettingsDto, {}));
-    expect(errs.find((e) => e.property === 'chatApiStyle')).toBeUndefined();
-  });
-});
-
-/** DTO validation for chatContextWindow (@IsOptional @IsInt @Min(0)). */
-describe('UpdateAiSettingsDto.chatContextWindow', () => {
-  const errorsFor = async (chatContextWindow: unknown) =>
-    validate(plainToInstance(UpdateAiSettingsDto, { chatContextWindow }));
-
-  it('accepts a non-negative integer (incl. 0 = clear the limit)', async () => {
-    for (const v of [0, 200000]) {
-      const errs = await errorsFor(v);
-      expect(
-        errs.find((e) => e.property === 'chatContextWindow'),
-      ).toBeUndefined();
-    }
-  });
-
-  it('rejects a negative value', async () => {
-    const errs = await errorsFor(-1);
-    expect(errs.find((e) => e.property === 'chatContextWindow')).toBeDefined();
-  });
-
-  it('rejects a non-integer value', async () => {
-    const errs = await errorsFor(1.5);
-    expect(errs.find((e) => e.property === 'chatContextWindow')).toBeDefined();
-  });
-
-  it('accepts the field being omitted (optional)', async () => {
-    const errs = await validate(plainToInstance(UpdateAiSettingsDto, {}));
-    expect(
-      errs.find((e) => e.property === 'chatContextWindow'),
-    ).toBeUndefined();
-  });
-});
--- a/apps/server/src/integrations/ai/ai-settings.service.ts
+++ b/apps/server/src/integrations/ai/ai-settings.service.ts
@@ -14,8 +14,6 @@ import {
  MaskedAiSettings,
  ResolvedAiConfig,
  SttApiStyle,
-  ChatApiStyle,
-  PROVIDER_SETTINGS_KEYS,
 } from './ai.types';

 /**
@@ -26,9 +24,6 @@ import {
 export interface UpdateAiSettingsInput {
  driver?: AiDriver;
  chatModel?: string;
-  chatApiStyle?: ChatApiStyle;
-  // Chat context-window size (tokens); 0/empty clears the limit.
-  chatContextWindow?: number;
  embeddingModel?: string;
  baseUrl?: string;
  embeddingBaseUrl?: string;
@@ -162,10 +157,6 @@ export class AiSettingsService {
    const config: ResolvedAiConfig = {
      driver: provider.driver,
      chatModel: provider.chatModel,
-      // Plain passthrough; getChatModel defaults unset to 'openai-compatible'.
-      chatApiStyle: provider.chatApiStyle,
-      // Admin-configured context-window size; 0/unset = no limit (badge denominator).
-      chatContextWindow: provider.chatContextWindow,
      // Cheap model id for the anonymous public-share assistant; reuses the chat
      // driver/baseUrl/apiKey. Empty/unset → callers fall back to chatModel.
      publicShareChatModel: provider.publicShareChatModel,
@@ -247,8 +238,6 @@ export class AiSettingsService {
    return {
      driver: provider.driver,
      chatModel: provider.chatModel,
-      chatApiStyle: provider.chatApiStyle,
-      chatContextWindow: provider.chatContextWindow,
      embeddingModel: provider.embeddingModel,
      baseUrl: provider.baseUrl,
      embeddingBaseUrl: provider.embeddingBaseUrl,
@@ -286,8 +275,20 @@ export class AiSettingsService {

    // Persist non-secret provider fields (only those present in the partial).
    const providerPatch: Partial<AiProviderSettings> = {};
-    // Single source of truth for the writable provider keys (see ai.types).
-    for (const key of PROVIDER_SETTINGS_KEYS) {
+    for (const key of [
+      'driver',
+      'chatModel',
+      'embeddingModel',
+      'baseUrl',
+      'embeddingBaseUrl',
+      'sttModel',
+      'sttBaseUrl',
+      'sttApiStyle',
+      'sttLanguage',
+      'systemPrompt',
+      'publicShareChatModel',
+      'publicShareAssistantRoleId',
+    ] as const) {
      if (nonSecret[key] !== undefined) {
        (providerPatch as Record<string, unknown>)[key] = nonSecret[key];
      }
--- a/apps/server/src/integrations/ai/ai-streaming-fetch.spec.ts
+++ b/apps/server/src/integrations/ai/ai-streaming-fetch.spec.ts
@@ -1,235 +1,50 @@
 import * as http from 'node:http';
 import {
  createStreamingFetch,
-  withPreResponseRetry,
-  streamTimeoutMs,
-  streamKeepAliveMs,
-  streamingDispatcherOptions,
-  isRetryableConnectError,
+  STREAMING_DISPATCHER_OPTIONS,
 } from './ai-streaming-fetch';

 /**
- * #175: undici's default 300s headers/body timeouts severed long agent turns.
- * The streaming fetch raises them to a generous-but-FINITE silence timeout (not
- * 0 — a true hang must still break). We pin: the configured value + env override,
- * that both dispatcher timeouts use it, and that a delayed response streams.
+ * #175: Node's global fetch (undici) defaults headers/body timeouts to 300s and
+ * severs a long agent turn mid-stream. createStreamingFetch must DISABLE those
+ * timeouts (0). We pin the option contract deterministically AND confirm the
+ * built fetch actually streams a deliberately-delayed response.
 */
-describe('streamTimeoutMs', () => {
-  const ORIG = process.env.AI_STREAM_TIMEOUT_MS;
-  afterEach(() => {
-    if (ORIG === undefined) delete process.env.AI_STREAM_TIMEOUT_MS;
-    else process.env.AI_STREAM_TIMEOUT_MS = ORIG;
+describe('createStreamingFetch', () => {
+  it('disables BOTH undici stream timeouts (the #175 contract)', () => {
+    expect(STREAMING_DISPATCHER_OPTIONS.headersTimeout).toBe(0);
+    expect(STREAMING_DISPATCHER_OPTIONS.bodyTimeout).toBe(0);
  });

-  it('defaults to a generous-but-finite 15 minutes', () => {
-    delete process.env.AI_STREAM_TIMEOUT_MS;
-    expect(streamTimeoutMs()).toBe(900_000);
-    // Finite — NOT disabled (0 would let a hung provider leak forever).
-    expect(streamTimeoutMs()).toBeGreaterThan(0);
-    expect(Number.isFinite(streamTimeoutMs())).toBe(true);
-  });
+  describe('against a delayed server', () => {
+    let server: http.Server;
+    let url: string;
+    // The server waits before sending ANY byte (a long time-to-first-token).
+    const DELAY = 400;

-  it('honours a positive AI_STREAM_TIMEOUT_MS override', () => {
-    process.env.AI_STREAM_TIMEOUT_MS = '120000';
-    expect(streamTimeoutMs()).toBe(120000);
-  });
+    beforeAll(async () => {
+      server = http.createServer((_req, res) => {
+        setTimeout(() => {
+          res.writeHead(200, { 'Content-Type': 'text/plain' });
+          res.end('ok');
+        }, DELAY);
+      });
+      await new Promise<void>((resolve) =>
+        server.listen(0, '127.0.0.1', resolve),
+      );
+      const addr = server.address() as import('node:net').AddressInfo;
+      url = `http://127.0.0.1:${addr.port}/`;
+    });

-  it('ignores an invalid / non-positive override (falls back to default)', () => {
-    for (const bad of ['0', '-5', 'abc', '']) {
-      process.env.AI_STREAM_TIMEOUT_MS = bad;
-      expect(streamTimeoutMs()).toBe(900_000);
-    }
-  });
+    afterAll(async () => {
+      await new Promise<void>((resolve) => server.close(() => resolve()));
+    });

-  it('applies the silence timeout + keep-alive recycle window to the dispatcher', () => {
-    delete process.env.AI_STREAM_TIMEOUT_MS;
-    delete process.env.AI_STREAM_KEEPALIVE_MS;
-    expect(streamingDispatcherOptions()).toEqual({
-      headersTimeout: 900_000,
-      bodyTimeout: 900_000,
-      keepAliveTimeout: 10_000,
-      keepAliveMaxTimeout: 10_000,
+    it('streams the delayed response instead of timing out', async () => {
+      const streamingFetch = createStreamingFetch();
+      const res = await streamingFetch(url);
+      expect(res.status).toBe(200);
+      expect(await res.text()).toBe('ok');
    });
  });
 });
-
-describe('streamKeepAliveMs', () => {
-  const ORIG = process.env.AI_STREAM_KEEPALIVE_MS;
-  afterEach(() => {
-    if (ORIG === undefined) delete process.env.AI_STREAM_KEEPALIVE_MS;
-    else process.env.AI_STREAM_KEEPALIVE_MS = ORIG;
-  });
-
-  it('defaults to 10s (recycle idle sockets so a NAT/proxy drop cannot poison reuse)', () => {
-    delete process.env.AI_STREAM_KEEPALIVE_MS;
-    expect(streamKeepAliveMs()).toBe(10_000);
-  });
-
-  it('honours a positive override and ignores invalid/non-positive', () => {
-    process.env.AI_STREAM_KEEPALIVE_MS = '4000';
-    expect(streamKeepAliveMs()).toBe(4000);
-    for (const bad of ['0', '-1', 'x', '']) {
-      process.env.AI_STREAM_KEEPALIVE_MS = bad;
-      expect(streamKeepAliveMs()).toBe(10_000);
-    }
-  });
-});
-
-describe('isRetryableConnectError', () => {
-  it('matches connection-level codes on the error or its cause', () => {
-    expect(isRetryableConnectError({ cause: { code: 'ECONNRESET' } })).toBe(true);
-    expect(isRetryableConnectError({ cause: { code: 'UND_ERR_SOCKET' } })).toBe(true);
-    expect(isRetryableConnectError({ code: 'ECONNREFUSED' })).toBe(true);
-  });
-  it('does NOT match aborts / unrelated errors', () => {
-    expect(isRetryableConnectError({ name: 'AbortError', cause: { code: 'ABORT_ERR' } })).toBe(false);
-    expect(isRetryableConnectError({ cause: { code: 'UND_ERR_HEADERS_TIMEOUT' } })).toBe(false);
-    expect(isRetryableConnectError(new Error('plain'))).toBe(false);
-    expect(isRetryableConnectError(undefined)).toBe(false);
-  });
-});
-
-describe('createStreamingFetch — against a delayed server', () => {
-  const ORIG = process.env.AI_STREAM_TIMEOUT_MS;
-  let server: http.Server;
-  let url: string;
-  // The server waits before sending ANY byte (a long time-to-first-token). It is
-  // > undici's ~1s timeout-timer granularity so a sub-second configured timeout
-  // fires deterministically in the load-bearing test below.
-  const DELAY = 1500;
-
-  beforeAll(async () => {
-    server = http.createServer((_req, res) => {
-      setTimeout(() => {
-        res.writeHead(200, { 'Content-Type': 'text/plain' });
-        res.end('ok');
-      }, DELAY);
-    });
-    await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', resolve));
-    const addr = server.address() as import('node:net').AddressInfo;
-    url = `http://127.0.0.1:${addr.port}/`;
-  });
-
-  afterAll(async () => {
-    await new Promise<void>((resolve) => server.close(() => resolve()));
-  });
-
-  afterEach(() => {
-    if (ORIG === undefined) delete process.env.AI_STREAM_TIMEOUT_MS;
-    else process.env.AI_STREAM_TIMEOUT_MS = ORIG;
-  });
-
-  it('streams the delayed response at the default (generous) timeout', async () => {
-    delete process.env.AI_STREAM_TIMEOUT_MS; // default 15 min >> DELAY
-    const streamingFetch = createStreamingFetch();
-    const res = await streamingFetch(url);
-    expect(res.status).toBe(200);
-    expect(await res.text()).toBe('ok');
-  });
-
-  it('LOAD-BEARING: a sub-DELAY AI_STREAM_TIMEOUT_MS actually severs the response', async () => {
-    // Proves the configured dispatcher is wired into the fetch: with the timeout
-    // set below DELAY the call must reject with undici's headers-timeout. If the
-    // dispatcher were lost (fallback to global fetch's 300s default), the 1.5s
-    // response would slip through and this would NOT throw.
-    process.env.AI_STREAM_TIMEOUT_MS = '500';
-    const streamingFetch = createStreamingFetch();
-    let caught: unknown;
-    const startedAt = Date.now();
-    try {
-      await streamingFetch(url).then((r) => r.text());
-    } catch (e) {
-      caught = e;
-    }
-    // It rejected (a lost dispatcher -> global 300s default would NOT reject on a
-    // 1.5s response) and it did so BEFORE the response would have arrived (DELAY).
-    // Use `.name` (realm-safe) — undici's TypeError fails cross-realm instanceof.
-    expect(caught).toBeDefined();
-    expect((caught as Error)?.name).toBe('TypeError');
-    expect(Date.now() - startedAt).toBeLessThan(DELAY);
-    // When present, the undici cause is the headers timeout.
-    const code = (caught as { cause?: { code?: string } })?.cause?.code;
-    if (code) expect(code).toBe('UND_ERR_HEADERS_TIMEOUT');
-  });
-});
-
-describe('withPreResponseRetry', () => {
-  // The retry is the OUTERMOST layer (over the dispatcher-bound streaming fetch),
-  // matching ai.service's withPreResponseRetry(instrument(createStreamingFetch())).
-  // PRE_RESPONSE_CONNECT_RETRIES is 2 -> at most 3 total attempts.
-  const MAX_ATTEMPTS = 3;
-  let server: http.Server;
-  let url: string;
-  let requests = 0;
-  // 'first' resets only the first connection; 'all' resets every connection.
-  let resetMode: 'first' | 'all' = 'first';
-
-  const retryingFetch = () => withPreResponseRetry(createStreamingFetch());
-
-  beforeAll(async () => {
-    server = http.createServer((req, res) => {
-      requests += 1;
-      const shouldReset = resetMode === 'all' || requests === 1;
-      if (shouldReset) {
-        // Reset before any response byte (a poisoned/stale keep-alive socket).
-        const sock = req.socket as import('node:net').Socket & {
-          resetAndDestroy?: () => void;
-        };
-        if (typeof sock.resetAndDestroy === 'function') sock.resetAndDestroy();
-        else sock.destroy();
-        return;
-      }
-      res.writeHead(200, { 'Content-Type': 'text/plain' });
-      res.end('ok');
-    });
-    await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', resolve));
-    const addr = server.address() as import('node:net').AddressInfo;
-    url = `http://127.0.0.1:${addr.port}/`;
-  });
-
-  afterAll(async () => {
-    await new Promise<void>((resolve) => server.close(() => resolve()));
-  });
-
-  beforeEach(() => {
-    requests = 0;
-    resetMode = 'first';
-  });
-
-  it('retries a pre-response reset on a fresh connection and succeeds', async () => {
-    resetMode = 'first';
-    const res = await retryingFetch()(url);
-    expect(res.status).toBe(200);
-    expect(await res.text()).toBe('ok');
-    // first request reset -> retry -> second request served.
-    expect(requests).toBe(2);
-  });
-
-  it('gives up after the retry bound and rethrows the original reset', async () => {
-    resetMode = 'all'; // every attempt resets -> retries exhaust
-    let caught: unknown;
-    try {
-      await retryingFetch()(url);
-    } catch (e) {
-      caught = e;
-    }
-    expect(caught).toBeDefined();
-    // A retryable connection error reached the caller (not swallowed).
-    expect(isRetryableConnectError(caught)).toBe(true);
-    // Bounded: exactly PRE_RESPONSE_CONNECT_RETRIES + 1 attempts hit the server
-    // (pins both the limit and that the final error propagates — guards an
-    // off-by-one or an infinite loop).
-    expect(requests).toBe(MAX_ATTEMPTS);
-  });
-
-  it('does NOT retry an aborted request (no retry storm)', async () => {
-    resetMode = 'all';
-    const ctrl = new AbortController();
-    ctrl.abort();
-    await expect(
-      retryingFetch()(url, { signal: ctrl.signal }),
-    ).rejects.toBeDefined();
-    // Pre-aborted: the request never reached the server, so nothing was retried.
-    expect(requests).toBe(0);
-  });
-});
--- a/apps/server/src/integrations/ai/ai-streaming-fetch.ts
+++ b/apps/server/src/integrations/ai/ai-streaming-fetch.ts
@@ -1,197 +1,45 @@
 import { Agent } from 'undici';

 /**
- * Default SILENCE timeout for streaming AI calls (15 min). Generous, but FINITE.
+ * Build a `fetch` for LONG-LIVED streaming AI calls (the agent chat turn).
 *
- * Node's global fetch (undici) defaults headersTimeout and bodyTimeout to
- * 300_000ms, which severed legitimate long agent turns mid-stream — surfacing as
- * "Lost connection to the AI provider" (#175): a late step with a huge context
- * pushes the model's time-to-first-token past 5 min, or a reasoning model pauses
- * >5 min between chunks. We do NOT disable the timeout (0) — that would let a
- * genuinely hung provider, with the client still connected, hang forever
- * (abortSignal only fires on client disconnect). Instead we raise it well above
- * any realistic gap while keeping it finite so a true hang is eventually broken.
+ * Node's global fetch (undici) defaults BOTH `headersTimeout` and `bodyTimeout`
+ * to 300_000ms. A legitimate long agent turn trips that and is severed
+ * mid-stream — surfacing to the user as "Lost connection to the AI provider"
+ * (issue #175):
+ *   - headersTimeout (time to the first response byte) is exceeded when a late
+ *     step sends a huge accumulated context and the model's time-to-first-token
+ *     grows past 5 min;
+ *   - bodyTimeout (max gap BETWEEN body chunks) is exceeded when a reasoning
+ *     model pauses to "think" for more than 5 min between emitted chunks.
 *
- * This bounds SILENCE (time-to-first-byte and the gap BETWEEN chunks), NOT total
- * turn duration — so an arbitrarily long turn that keeps streaming bytes is never
- * cut; only a stream that goes quiet for longer than this is treated as a hang.
- */
-const DEFAULT_STREAM_TIMEOUT_MS = 900_000;
-
-/**
- * Default keep-alive recycle window (10s). A pooled connection idle longer than
- * this is CLOSED rather than reused.
+ * This dispatcher disables both timeouts (0). Cancellation is NOT lost: the
+ * agent turn is bound to the request's abortSignal, which fires when the client
+ * disconnects (see ai-chat.controller), and the agent loop is bounded by
+ * MAX_AGENT_STEPS — so a turn still terminates; it just no longer dies at an
+ * arbitrary 5-minute wall-clock. keepAlive (undici default) is kept so the
+ * sequential per-step calls of one turn reuse the connection.
 *
- * Long agent turns leave gaps of tens of seconds between provider calls (one
- * call per step; a crawl/search tool runs in between). A NAT / reverse proxy /
- * conntrack in front of the deployment silently drops an idle connection after
- * its own timeout; undici, not knowing, then reuses that dead socket and the
- * next request fails PRE-RESPONSE with `read ECONNRESET` (#175 prod telemetry:
- * the resets correlate with idleSincePrevCall ~42s, while a direct path to the
- * provider does NOT reset). Recycling idle sockets well below such a drop window
- * means a long-gap call opens a fresh connection instead of reusing a stale one.
- * `keepAliveMaxTimeout` also caps a server-advertised keep-alive so the provider
- * cannot push the reuse window back up.
+ * A single shared dispatcher is returned (callers hold it for the service
+ * lifetime) so its connection pool is reused across requests.
 */
-const DEFAULT_STREAM_KEEPALIVE_MS = 10_000;
-
 /**
- * How many times to retry a PRE-RESPONSE connection failure (a reset/timeout
- * before ANY response byte) on a fresh connection. Safe because `fetch()` only
- * rejects before the Response resolves — a started stream is never replayed.
+ * undici Agent options for streaming AI calls: both stream timeouts DISABLED (0)
+ * so a long turn is never severed at undici's 300s default (#175). Exported so a
+ * test can pin the contract without a timing-dependent assertion.
 */
-const PRE_RESPONSE_CONNECT_RETRIES = 2;
+export const STREAMING_DISPATCHER_OPTIONS = {
+  headersTimeout: 0,
+  bodyTimeout: 0,
+} as const;

-/** undici cause codes for a connection-level failure that occurred PRE-RESPONSE. */
-const RETRYABLE_CONNECT_CODES = new Set([
-  'ECONNRESET',
-  'ECONNREFUSED',
-  'EPIPE',
-  'ETIMEDOUT',
-  'UND_ERR_SOCKET',
-  'UND_ERR_CONNECT_TIMEOUT',
-]);
-
-function positiveEnv(name: string, fallback: number): number {
-  const raw = Number(process.env[name]);
-  return Number.isFinite(raw) && raw > 0 ? raw : fallback;
-}
-
-/**
- * The configured silence timeout (ms). Override with `AI_STREAM_TIMEOUT_MS`; a
- * missing/invalid/non-positive value falls back to {@link DEFAULT_STREAM_TIMEOUT_MS}.
- */
-export function streamTimeoutMs(): number {
-  return positiveEnv('AI_STREAM_TIMEOUT_MS', DEFAULT_STREAM_TIMEOUT_MS);
-}
-
-/** Keep-alive recycle window (ms). Override with `AI_STREAM_KEEPALIVE_MS`. */
-export function streamKeepAliveMs(): number {
-  return positiveEnv('AI_STREAM_KEEPALIVE_MS', DEFAULT_STREAM_KEEPALIVE_MS);
-}
-
-/** Default SILENCE timeout for EXTERNAL-MCP transport (5 min). */
-const DEFAULT_MCP_STREAM_TIMEOUT_MS = 300_000;
-
-/** Default total wall-clock cap for ONE external MCP tool call (15 min). */
-const DEFAULT_MCP_CALL_TIMEOUT_MS = 900_000;
-
-/**
- * SILENCE timeout (ms) for EXTERNAL-MCP transport ONLY. Override with
- * `AI_MCP_STREAM_TIMEOUT_MS`; a missing/invalid/non-positive value falls back to
- * {@link DEFAULT_MCP_STREAM_TIMEOUT_MS} (5 min).
- *
- * Deliberately tighter than the chat provider's {@link streamTimeoutMs} (15 min)
- * so a byte-silent/hung MCP upstream is broken in ~5 min instead of 15. This is
- * the undici `headersTimeout`/`bodyTimeout` for the external-MCP dispatcher only
- * — it must NOT change the chat provider, which legitimately needs 15 min between
- * reasoning chunks (#175).
- *
- * Trade-off: a legitimately long but byte-silent single tool call (a slow crawl
- * that emits nothing until done) and an SSE transport that idles >5 min BETWEEN
- * tool calls are also cut here. The per-call total cap ({@link mcpCallTimeoutMs},
- * applied in mcp-clients.service) is the complementary guard for chatty-but-stuck
- * calls that keep the socket warm yet never return.
- */
-export function mcpStreamTimeoutMs(): number {
-  return positiveEnv('AI_MCP_STREAM_TIMEOUT_MS', DEFAULT_MCP_STREAM_TIMEOUT_MS);
-}
-
-/**
- * Total wall-clock cap (ms) for ONE external MCP tool call — APP-LEVEL, not
- * transport. Override with `AI_MCP_CALL_TIMEOUT_MS`; a missing/invalid/
- * non-positive value falls back to {@link DEFAULT_MCP_CALL_TIMEOUT_MS} (15 min).
- *
- * Catches a tool that keeps the connection warm (SSE heartbeats / trickle) but
- * never returns a result — which the transport silence timeout
- * ({@link mcpStreamTimeoutMs}) would never break because the socket never goes
- * byte-silent.
- */
-export function mcpCallTimeoutMs(): number {
-  return positiveEnv('AI_MCP_CALL_TIMEOUT_MS', DEFAULT_MCP_CALL_TIMEOUT_MS);
-}
-
-/**
- * undici `Agent` options for streaming AI traffic — the (generous, finite)
- * silence timeouts plus the keep-alive recycle window. Shared by the chat
- * provider fetch and the external-MCP dispatcher so they behave identically.
- */
-export function streamingDispatcherOptions(): {
-  headersTimeout: number;
-  bodyTimeout: number;
-  keepAliveTimeout: number;
-  keepAliveMaxTimeout: number;
-} {
-  const t = streamTimeoutMs();
-  const ka = streamKeepAliveMs();
-  return {
-    headersTimeout: t,
-    bodyTimeout: t,
-    keepAliveTimeout: ka,
-    keepAliveMaxTimeout: ka,
-  };
-}
-
-/** True for a connection-level error worth retrying on a fresh connection. */
-export function isRetryableConnectError(err: unknown): boolean {
-  const e = err as { code?: string; cause?: { code?: string } } | undefined;
-  const code = e?.cause?.code ?? e?.code;
-  return typeof code === 'string' && RETRYABLE_CONNECT_CODES.has(code);
-}
-
-/**
- * Build a `fetch` for long-lived streaming AI calls (the agent chat turn) backed
- * by a dedicated undici dispatcher (finite silence timeouts + keep-alive
- * recycling, #175). A single shared dispatcher is returned (callers hold it for
- * the service lifetime) so its connection pool is reused.
- *
- * This is the BASE transport — no retry. The chat path wraps it as
- * `withPreResponseRetry(createInstrumentedFetch(ctx, createStreamingFetch()))`
- * so the retry is the OUTERMOST layer and the instrumentation observes EVERY
- * attempt (a recovered reset is still logged — see withPreResponseRetry).
- */
 export function createStreamingFetch(): typeof fetch {
-  const dispatcher = new Agent(streamingDispatcherOptions());
+  const dispatcher = new Agent({ ...STREAMING_DISPATCHER_OPTIONS });
  return ((input: Parameters<typeof fetch>[0], init?: RequestInit) =>
    fetch(input, {
      ...(init ?? {}),
-      // `dispatcher` is an undici-specific init field (not in the DOM
-      // RequestInit type); Node's global fetch reads it. Cast to satisfy it.
+      // `dispatcher` is an undici-specific init field (not in the DOM RequestInit
+      // type); Node's global fetch reads it. Cast to satisfy the type.
      dispatcher,
    } as RequestInit & { dispatcher: Agent })) as typeof fetch;
 }
-
-/**
- * Wrap a fetch so a PRE-RESPONSE connection reset (`baseFetch` rejects before the
- * Response resolves — so nothing has streamed) is retried a few times on a fresh
- * connection (#175). A poisoned keep-alive socket is destroyed by undici on the
- * reset, so the retry lands on a new connection. An abort (client disconnect) is
- * never retried.
- *
- * This is the OUTERMOST transport layer by design: composing it as
- * `withPreResponseRetry(instrumentedFetch)` means every attempt — including the
- * resets that the retry recovers from — flows through the instrumentation, so the
- * "PRE-RESPONSE FAILED ... ECONNRESET ... idleSincePrevCall" telemetry stays
- * visible precisely when the fix is working (and AI_STREAM_KEEPALIVE_MS can be
- * tuned from real data). A retry INSIDE the transport would hide it.
- */
-export function withPreResponseRetry(baseFetch: typeof fetch): typeof fetch {
-  return (async (input: Parameters<typeof fetch>[0], init?: RequestInit) => {
-    for (let attempt = 0; ; attempt++) {
-      try {
-        return await baseFetch(input, init);
-      } catch (err) {
-        const aborted = init?.signal?.aborted === true;
-        if (
-          aborted ||
-          attempt >= PRE_RESPONSE_CONNECT_RETRIES ||
-          !isRetryableConnectError(err)
-        ) {
-          throw err;
-        }
-        // Brief backoff before the fresh-connection retry.
-        await new Promise((resolve) => setTimeout(resolve, 150 * (attempt + 1)));
-      }
-    }
-  }) as typeof fetch;
-}
--- a/apps/server/src/integrations/ai/ai.service.include-usage.spec.ts
+++ b/apps/server/src/integrations/ai/ai.service.include-usage.spec.ts
@@ -1,58 +0,0 @@
-// `.provider` alone cannot prove the openai-compatible factory was called with
-// `includeUsage: true` — a regression dropping it (which zeroes streamed token
-// usage / reasoning-token metadata) would still pass. So mock the factory and
-// assert the exact args. jest.mock is module-scoped, hence a dedicated file.
-
-const mockCompatibleModel = { provider: 'openai-compatible.chat', modelId: 'm' };
-// jest allows `mock`-prefixed vars inside a jest.mock factory.
-const mockCreateOpenAICompatible = jest.fn(
-  (_settings: unknown) => () => mockCompatibleModel,
-);
-
-jest.mock('@ai-sdk/openai-compatible', () => ({
-  createOpenAICompatible: (settings: unknown) =>
-    mockCreateOpenAICompatible(settings),
-}));
-
-import { AiService } from './ai.service';
-
-describe('AiService.getChatModel openai-compatible factory args', () => {
-  function serviceWith(chatApiStyle?: 'openai-compatible' | 'openai') {
-    const aiSettings = {
-      resolve: jest.fn().mockResolvedValue({
-        driver: 'openai',
-        chatModel: 'glm-5.2',
-        apiKey: 'the-key',
-        baseUrl: 'https://api.z.ai/v4',
-        chatApiStyle,
-      }),
-    };
-    return new AiService(
-      // eslint-disable-next-line @typescript-eslint/no-explicit-any
-      aiSettings as any,
-      { find: jest.fn() } as never,
-      { decryptSecret: jest.fn() } as never,
-    );
-  }
-
-  beforeEach(() => mockCreateOpenAICompatible.mockClear());
-
-  it('passes includeUsage:true plus baseURL/apiKey/fetch (default style)', async () => {
-    await serviceWith().getChatModel('ws-1'); // unset -> openai-compatible
-    expect(mockCreateOpenAICompatible).toHaveBeenCalledTimes(1);
-    expect(mockCreateOpenAICompatible).toHaveBeenCalledWith(
-      expect.objectContaining({
-        name: 'openai-compatible',
-        baseURL: 'https://api.z.ai/v4',
-        apiKey: 'the-key',
-        includeUsage: true,
-        fetch: expect.any(Function),
-      }),
-    );
-  });
-
-  it("does NOT use the openai-compatible factory for chatApiStyle 'openai'", async () => {
-    await serviceWith('openai').getChatModel('ws-1');
-    expect(mockCreateOpenAICompatible).not.toHaveBeenCalled();
-  });
-});
--- a/apps/server/src/integrations/ai/ai.service.spec.ts
+++ b/apps/server/src/integrations/ai/ai.service.spec.ts
@@ -285,64 +285,3 @@ describe('AiService.getChatModel role model override', () => {
    );
  });
 });
-
-/**
- * Chat provider selection by the EXPLICIT `chatApiStyle` (NOT inferred from
- * baseUrl): 'openai-compatible' (default) uses @ai-sdk/openai-compatible, which
- * maps streamed reasoning_content to reasoning parts; 'openai' uses the official
- * provider; and openai-compatible without a baseURL safely falls back to the
- * official provider (it has no default endpoint). Asserted via `.provider`.
- */
-describe('AiService.getChatModel chatApiStyle provider selection', () => {
-  function serviceWith(opts: {
-    baseUrl?: string;
-    chatApiStyle?: 'openai-compatible' | 'openai';
-  }) {
-    const aiSettings = {
-      resolve: jest.fn().mockResolvedValue({
-        driver: 'openai',
-        chatModel: 'glm-5.2',
-        apiKey: 'key',
-        baseUrl: opts.baseUrl,
-        chatApiStyle: opts.chatApiStyle,
-      }),
-    };
-    return new AiService(
-      // eslint-disable-next-line @typescript-eslint/no-explicit-any
-      aiSettings as any,
-      { find: jest.fn() } as never,
-      { decryptSecret: jest.fn() } as never,
-    );
-  }
-
-  const providerOf = async (svc: AiService) =>
-    (
-      (await svc.getChatModel('ws-1')) as { provider: string }
-    ).provider;
-
-  it("'openai-compatible' + baseURL -> openai-compatible provider", async () => {
-    expect(
-      await providerOf(
-        serviceWith({ baseUrl: 'https://api.z.ai/v4', chatApiStyle: 'openai-compatible' }),
-      ),
-    ).toContain('openai-compatible');
-  });
-
-  it("'openai' + baseURL -> official openai provider", async () => {
-    expect(
-      await providerOf(serviceWith({ baseUrl: 'https://api.z.ai/v4', chatApiStyle: 'openai' })),
-    ).toBe('openai.chat');
-  });
-
-  it('unset + baseURL -> defaults to openai-compatible', async () => {
-    expect(
-      await providerOf(serviceWith({ baseUrl: 'https://api.z.ai/v4' })),
-    ).toContain('openai-compatible');
-  });
-
-  it("'openai-compatible' WITHOUT baseURL -> safe fallback to official openai", async () => {
-    expect(
-      await providerOf(serviceWith({ chatApiStyle: 'openai-compatible' })),
-    ).toBe('openai.chat');
-  });
-});
--- a/apps/server/src/integrations/ai/ai.service.ts
+++ b/apps/server/src/integrations/ai/ai.service.ts
@@ -7,7 +7,6 @@ import {
  type LanguageModel,
 } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';
-import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
 import { createGoogleGenerativeAI } from '@ai-sdk/google';
 import { createOllama } from 'ai-sdk-ollama';
 import { AiSettingsService } from './ai-settings.service';
@@ -15,11 +14,9 @@ import { AiNotConfiguredException } from './ai-not-configured.exception';
 import { AiEmbeddingNotConfiguredException } from './ai-embedding-not-configured.exception';
 import { AiSttNotConfiguredException } from './ai-stt-not-configured.exception';
 import { describeProviderError } from './ai-error.util';
-import { createInstrumentedFetch } from './ai-provider-http';
-import {
-  createStreamingFetch,
-  withPreResponseRetry,
-} from './ai-streaming-fetch';
+// DIAGNOSTIC (provider ECONNRESET investigation) — temporary.
+import { createDiagnosticFetch } from './ai-http-diagnostics';
+import { createStreamingFetch } from './ai-streaming-fetch';
 import { AiProviderCredentialsRepo } from '@docmost/db/repos/ai-chat/ai-provider-credentials.repo';
 import { SecretBoxService } from '../crypto/secret-box';
 import { AiDriver } from './ai.types';
@@ -49,15 +46,14 @@ export interface ChatModelOverride {
 export class AiService {
  private readonly logger = new Logger(AiService.name);

-  // Provider HTTP fetch for the chat path, layered so each transport concern is
-  // observed (#175). Inside-out: the streaming fetch (finite silence timeouts +
-  // keep-alive recycling) → provider-HTTP instrumentation (logs every attempt) →
-  // pre-response connection-reset retry as the OUTERMOST layer. Retry-outer means
-  // a reset the retry recovers from is still logged with its idle-gap, instead of
-  // collapsing into a clean "OK". Held for the service lifetime to reuse the
-  // streaming dispatcher's connection pool.
-  private readonly aiProviderFetch = withPreResponseRetry(
-    createInstrumentedFetch('AiService:provider-http', createStreamingFetch()),
+  // Provider HTTP fetch for the chat path: a streaming fetch that DISABLES
+  // undici's 300s headers/body timeouts (#175 — long agent turns were severed
+  // mid-stream), wrapped with passive ECONNRESET-investigation telemetry so the
+  // logs observe the exact transport the turn uses. Held for the service
+  // lifetime to reuse the streaming dispatcher's connection pool.
+  private readonly aiDiagnosticFetch = createDiagnosticFetch(
+    'AiService:provider-http',
+    createStreamingFetch(),
  );

  constructor(
@@ -100,10 +96,6 @@ export class AiService {

    let apiKey = cfg.apiKey;
    let baseUrl = cfg.baseUrl;
-    // Chat provider implementation, chosen EXPLICITLY by the admin (not inferred
-    // from baseUrl). Unset → 'openai-compatible' so reasoning is surfaced by
-    // default for this fork's openai+baseUrl setups.
-    const chatApiStyle = cfg.chatApiStyle ?? 'openai-compatible';

    // A driver override that differs from the workspace driver needs that
    // driver's own creds (the workspace driver's key would be wrong/absent).
@@ -154,41 +146,20 @@ export class AiService {
    }

    switch (driver) {
-      case 'openai': {
-        // The provider implementation is chosen by the admin's `chatApiStyle`
-        // (NOT inferred from baseUrl — a custom URL can front real OpenAI too).
-        // Both branches hit Chat Completions (/chat/completions); the provider
-        // fetch is the instrumented streaming fetch (finite-but-generous stream
-        // timeouts, #175).
-        //
-        // 'openai-compatible' (default) maps the third-party provider's streamed
-        // `reasoning_content` to reasoning parts (z.ai/GLM, DeepSeek, ...) — the
-        // point of #175. It has no default endpoint, so it requires a baseURL;
-        // when there is none (real OpenAI, or a role's cross-driver override that
-        // cleared baseUrl) we fall back to the official provider.
-        if (chatApiStyle === 'openai-compatible' && baseUrl) {
-          return createOpenAICompatible({
-            name: 'openai-compatible',
-            apiKey,
-            baseURL: baseUrl,
-            // Keep streamed token usage (stream_options.include_usage): without
-            // it @ai-sdk/openai-compatible omits usage, zeroing the live token
-            // counter and reasoning-token metadata. The official provider always
-            // sent it, so this preserves parity.
-            includeUsage: true,
-            fetch: this.aiProviderFetch,
-          })(chatModel);
-        }
-        // Official @ai-sdk/openai: real-OpenAI reasoning-model request shaping;
-        // `.chat()` targets Chat Completions (the default callable targets the
-        // Responses API, which openai-compatible gateways 400 on multi-turn
-        // history). In this fork baseUrl is normally set; undefined = real OpenAI.
+      case 'openai':
+        // baseURL (when set) covers openai-compatible endpoints. Use Chat
+        // Completions (/chat/completions) — the portable OpenAI-compatible
+        // endpoint. The default callable createOpenAI(...)(model) targets the
+        // Responses API (/responses), which OpenAI-compatible gateways
+        // (OpenRouter, etc.) reject on multi-turn requests (history with
+        // assistant messages) → 400.
+        // DIAGNOSTIC (provider ECONNRESET investigation) — temporary: pass the
+        // passive instrumented fetch (logging only; no behavior change).
        return createOpenAI({
          apiKey,
          baseURL: baseUrl,
-          fetch: this.aiProviderFetch,
+          fetch: this.aiDiagnosticFetch,
        }).chat(chatModel);
-      }
      case 'gemini':
        return createGoogleGenerativeAI({ apiKey })(chatModel);
      case 'ollama':
--- a/apps/server/src/integrations/ai/ai.types.ts
+++ b/apps/server/src/integrations/ai/ai.types.ts
@@ -16,15 +16,6 @@ export const AI_DRIVERS: AiDriver[] = ['openai', 'gemini', 'ollama'];
 export type SttApiStyle = 'multipart' | 'json';
 export const STT_API_STYLES: SttApiStyle[] = ['multipart', 'json'];

-// Chat provider implementation for the `openai` driver. Chosen explicitly by the
-// admin (NOT inferred from baseUrl — a custom URL can front real OpenAI too).
-// 'openai-compatible' = @ai-sdk/openai-compatible: maps streamed
-//   `reasoning_content` to reasoning parts (z.ai/GLM, DeepSeek, OpenRouter, ...).
-// 'openai' = official @ai-sdk/openai: real-OpenAI reasoning-model request shaping
-//   (max_completion_tokens, the 'developer' role), no third-party reasoning map.
-export type ChatApiStyle = 'openai-compatible' | 'openai';
-export const CHAT_API_STYLES: ChatApiStyle[] = ['openai-compatible', 'openai'];
-
 /**
 * Non-secret provider settings persisted under `settings.ai.provider`.
 * The API key is intentionally absent here.
@@ -32,16 +23,6 @@ export const CHAT_API_STYLES: ChatApiStyle[] = ['openai-compatible', 'openai'];
 export interface AiProviderSettings {
  driver: AiDriver;
  chatModel: string;
-  // Chat provider implementation for the `openai` driver. Unset → defaults to
-  // 'openai-compatible' (so reasoning is surfaced by default). See ChatApiStyle.
-  chatApiStyle?: ChatApiStyle;
-  // Admin-configured chat model context-window size, in tokens. There is no
-  // provider-independent way to discover this (OpenAI's /v1/models usually omits
-  // it, Gemini/Ollama/OpenRouter each expose it differently), so it is entered
-  // manually. Surfaced to the chat client (via assistant message metadata) as the
-  // denominator of the header "current / max" context badge. Empty/0 = no limit
-  // known → the badge shows only the current context size.
-  chatContextWindow?: number;
  embeddingModel?: string;
  baseUrl?: string;
  // Embedding-specific base URL. Falls back to `baseUrl` when empty/unset.
@@ -64,35 +45,6 @@ export interface AiProviderSettings {
  publicShareAssistantRoleId?: string;
 }

-/**
- * The persisted, non-secret provider setting keys — the SINGLE source of truth
- * for which fields a settings update may write through to `settings.ai.provider`.
- * `satisfies readonly (keyof AiProviderSettings)[]` makes the compiler reject a
- * typo or a key that is not a real provider setting.
- *
- * The settings service consumes this directly. The generic workspace repo cannot
- * import AI types, so it keeps its own copy of the same keys, guarded by a parity
- * test against this constant (so any future drift fails in CI, not silently in
- * prod — a missing key there validates fine, passes the service, and is then
- * dropped at the SQL boundary with no error).
- */
-export const PROVIDER_SETTINGS_KEYS = [
-  'driver',
-  'chatModel',
-  'chatApiStyle',
-  'chatContextWindow',
-  'embeddingModel',
-  'baseUrl',
-  'embeddingBaseUrl',
-  'sttModel',
-  'sttBaseUrl',
-  'sttApiStyle',
-  'sttLanguage',
-  'systemPrompt',
-  'publicShareChatModel',
-  'publicShareAssistantRoleId',
-] as const satisfies readonly (keyof AiProviderSettings)[];
-
 /**
 * Fully resolved provider config, including the decrypted API key for the
 * stored driver. Returned by `AiSettingsService.resolve`. The keys are held in
@@ -106,10 +58,6 @@ export const PROVIDER_SETTINGS_KEYS = [
 export interface ResolvedAiConfig extends Partial<AiProviderSettings> {
  driver?: AiDriver;
  chatModel?: string;
-  // Admin-configured chat context-window size (tokens); 0/unset = no limit. Used
-  // as the header context-badge denominator. Re-declared for parity with the
-  // explicit fields above.
-  chatContextWindow?: number;
  // Cheap model id for the public-share assistant; reuses the chat creds.
  publicShareChatModel?: string;
  // Agent-role id whose persona the public-share assistant adopts (empty/unset
@@ -128,9 +76,6 @@ export interface ResolvedAiConfig extends Partial<AiProviderSettings> {
 export interface MaskedAiSettings {
  driver?: AiDriver;
  chatModel?: string;
-  chatApiStyle?: ChatApiStyle;
-  // Admin-configured chat context-window size (tokens); 0/unset = no limit.
-  chatContextWindow?: number;
  embeddingModel?: string;
  baseUrl?: string;
  embeddingBaseUrl?: string;
--- a/apps/server/src/integrations/ai/dto/update-ai-settings.dto.ts
+++ b/apps/server/src/integrations/ai/dto/update-ai-settings.dto.ts
@@ -1,12 +1,5 @@
-import { IsIn, IsInt, IsOptional, IsString, Min } from 'class-validator';
-import {
-  AI_DRIVERS,
-  AiDriver,
-  CHAT_API_STYLES,
-  ChatApiStyle,
-  STT_API_STYLES,
-  SttApiStyle,
-} from '../ai.types';
+import { IsIn, IsOptional, IsString } from 'class-validator';
+import { AI_DRIVERS, AiDriver, STT_API_STYLES, SttApiStyle } from '../ai.types';

 /**
 * Admin update payload for the workspace AI provider settings.
@@ -25,17 +18,6 @@ export class UpdateAiSettingsDto {
  @IsString()
  chatModel?: string;

-  @IsOptional()
-  @IsIn(CHAT_API_STYLES)
-  chatApiStyle?: ChatApiStyle;
-
-  // Chat model context-window size in tokens (header context-badge denominator).
-  // 0 (or empty) clears the limit so the badge shows only the current context.
-  @IsOptional()
-  @IsInt()
-  @Min(0)
-  chatContextWindow?: number;
-
  @IsOptional()
  @IsString()
  embeddingModel?: string;
--- a/apps/server/test/integration/ai-agent-roles-repo.int-spec.ts
+++ b/apps/server/test/integration/ai-agent-roles-repo.int-spec.ts
@@ -1,5 +1,4 @@
-import { Kysely, sql } from 'kysely';
-import { randomUUID } from 'node:crypto';
+import { Kysely } from 'kysely';
 import { AiAgentRoleRepo } from '@docmost/db/repos/ai-agent-roles/ai-agent-roles.repo';
 import { getTestDb, destroyTestDb, createWorkspace } from './db';

@@ -26,16 +25,8 @@ describe('AiAgentRoleRepo isolation + partial unique index [integration]', () =>
  });

  it('findById / listByWorkspace exclude soft-deleted rows', async () => {
-    const live = await repo.insert({
-      workspaceId: w1,
-      name: 'Live',
-      instructions: 'x',
-    });
-    const dead = await repo.insert({
-      workspaceId: w1,
-      name: 'Dead',
-      instructions: 'x',
-    });
+    const live = await repo.insert({ workspaceId: w1, name: 'Live', instructions: 'x' });
+    const dead = await repo.insert({ workspaceId: w1, name: 'Dead', instructions: 'x' });
    await repo.softDelete(dead.id, w1);

    expect(await repo.findById(live.id, w1)).toBeDefined();
@@ -47,11 +38,7 @@ describe('AiAgentRoleRepo isolation + partial unique index [integration]', () =>
  });

  it('findById of a W2 role from W1 context returns undefined (tenant isolation)', async () => {
-    const w2role = await repo.insert({
-      workspaceId: w2,
-      name: 'W2Role',
-      instructions: 'x',
-    });
+    const w2role = await repo.insert({ workspaceId: w2, name: 'W2Role', instructions: 'x' });

    expect(await repo.findById(w2role.id, w2)).toBeDefined();
    // Same id, wrong workspace context -> not visible.
@@ -71,100 +58,21 @@ describe('AiAgentRoleRepo isolation + partial unique index [integration]', () =>
  });

  it('same name is reusable after softDelete (partial unique index WHERE deleted_at IS NULL)', async () => {
-    const first = await repo.insert({
-      workspaceId: w1,
-      name: 'Reusable',
-      instructions: 'x',
-    });
+    const first = await repo.insert({ workspaceId: w1, name: 'Reusable', instructions: 'x' });
    await repo.softDelete(first.id, w1);

    // Now inserting the same name must succeed because the soft-deleted row is
    // excluded from the partial unique index.
-    const second = await repo.insert({
-      workspaceId: w1,
-      name: 'Reusable',
-      instructions: 'x',
-    });
+    const second = await repo.insert({ workspaceId: w1, name: 'Reusable', instructions: 'x' });
    expect(second.id).toBeDefined();
    expect(second.id).not.toBe(first.id);
  });

  it('same name in W1 and W2 is allowed (unique is per-workspace)', async () => {
-    const a = await repo.insert({
-      workspaceId: w1,
-      name: 'CrossTenant',
-      instructions: 'x',
-    });
-    const b = await repo.insert({
-      workspaceId: w2,
-      name: 'CrossTenant',
-      instructions: 'x',
-    });
+    const a = await repo.insert({ workspaceId: w1, name: 'CrossTenant', instructions: 'x' });
+    const b = await repo.insert({ workspaceId: w2, name: 'CrossTenant', instructions: 'x' });
    expect(a.id).toBeDefined();
    expect(b.id).toBeDefined();
    expect(a.id).not.toBe(b.id);
  });
-
-  // model_config jsonb round-trip (issue #173 §1): the same double-encoding bug
-  // PR #172 fixed for tool_allowlist lived in jsonbObject. A DB round-trip is the
-  // only way to observe it — the write must land as a real jsonb OBJECT, and a
-  // legacy string-scalar row must self-heal on read (else the model override is
-  // silently dropped and the role falls back to the default model).
-  const jsonbTypeof = async (id: string): Promise<string | null> => {
-    const res = await sql<{ t: string | null }>`
-      SELECT jsonb_typeof(model_config) AS t
-      FROM ai_agent_roles WHERE id = ${id}
-    `.execute(db);
-    return res.rows[0]?.t ?? null;
-  };
-
-  it('insert stores model_config as a jsonb OBJECT and reads it back as an object', async () => {
-    const role = await repo.insert({
-      workspaceId: w1,
-      name: `Model-${randomUUID()}`,
-      instructions: 'x',
-      modelConfig: { driver: 'gemini', chatModel: 'gemini-2.0-flash' },
-    });
-    expect(await jsonbTypeof(role.id)).toBe('object');
-    // The returned row is already normalized to an object.
-    expect(role.modelConfig).toEqual({
-      driver: 'gemini',
-      chatModel: 'gemini-2.0-flash',
-    });
-    const found = await repo.findById(role.id, w1);
-    expect(found?.modelConfig).toEqual({
-      driver: 'gemini',
-      chatModel: 'gemini-2.0-flash',
-    });
-  });
-
-  it('an empty model_config is normalized to null (no override)', async () => {
-    const role = await repo.insert({
-      workspaceId: w1,
-      name: `Empty-${randomUUID()}`,
-      instructions: 'x',
-      modelConfig: {},
-    });
-    // The column is SQL NULL, so jsonb_typeof returns SQL NULL (JS null).
-    expect(await jsonbTypeof(role.id)).toBeNull();
-    expect((await repo.findById(role.id, w1))?.modelConfig).toBeNull();
-  });
-
-  it('repairs a legacy double-encoded (string scalar) model_config on read', async () => {
-    const id = randomUUID();
-    // Seed the corrupt string-scalar shape the old `::jsonb` bind produced.
-    await sql`
-      INSERT INTO ai_agent_roles (id, workspace_id, name, instructions, model_config)
-      VALUES (
-        ${id}, ${w1}, ${`Legacy-${id}`}, 'x',
-        to_jsonb(${'{"driver":"openai","chatModel":"gpt"}'}::text)
-      )
-    `.execute(db);
-    expect(await jsonbTypeof(id)).toBe('string'); // sanity: really corrupt
-
-    expect((await repo.findById(id, w1))?.modelConfig).toEqual({
-      driver: 'openai',
-      chatModel: 'gpt',
-    });
-  });
 });
--- a/apps/server/test/integration/ai-chat-message-status.int-spec.ts
+++ b/apps/server/test/integration/ai-chat-message-status.int-spec.ts
@@ -1,270 +0,0 @@
-import { Kysely } from 'kysely';
-import { AiChatMessageRepo } from '@docmost/db/repos/ai-chat/ai-chat-message.repo';
-import {
-  getTestDb,
-  destroyTestDb,
-  createWorkspace,
-  createUser,
-  createChat,
-  createMessage,
-} from './db';
-
-/**
- * Integration coverage for the #183 step-granular durability primitives on
- * AiChatMessageRepo: `update` (in-place patch by id+workspace, bumps updatedAt,
- * returns the row) and `sweepStreaming` (crash recovery: flip dangling
- * 'streaming' rows to 'aborted'). Real SQL against docmost_test, not a mock.
- */
-describe('AiChatMessageRepo.update + sweepStreaming [integration]', () => {
-  let db: Kysely<any>;
-  let repo: AiChatMessageRepo;
-  let workspaceId: string;
-  let otherWorkspaceId: string;
-  let userId: string;
-  let chatId: string;
-  let otherChatId: string;
-
-  beforeAll(async () => {
-    db = getTestDb();
-    repo = new AiChatMessageRepo(db as any);
-    workspaceId = (await createWorkspace(db)).id;
-    otherWorkspaceId = (await createWorkspace(db)).id;
-    userId = (await createUser(db, workspaceId)).id;
-    chatId = (await createChat(db, { workspaceId, creatorId: userId })).id;
-    const otherUser = await createUser(db, otherWorkspaceId);
-    otherChatId = (
-      await createChat(db, {
-        workspaceId: otherWorkspaceId,
-        creatorId: otherUser.id,
-      })
-    ).id;
-  });
-
-  afterAll(async () => {
-    await destroyTestDb();
-  });
-
-  it('update patches content/status/metadata and bumps updatedAt', async () => {
-    const seeded = await repo.insert({
-      chatId,
-      workspaceId,
-      userId,
-      role: 'assistant',
-      content: '',
-      status: 'streaming',
-      metadata: { parts: [] } as never,
-    });
-    const before = seeded.updatedAt;
-    // Ensure a measurable timestamp delta.
-    await new Promise((r) => setTimeout(r, 5));
-
-    const updated = await repo.update(seeded.id, workspaceId, {
-      content: 'final answer',
-      status: 'completed',
-      metadata: { parts: [{ type: 'text', text: 'final answer' }] },
-    });
-
-    expect(updated).toBeDefined();
-    expect(updated!.content).toBe('final answer');
-    expect(updated!.status).toBe('completed');
-    expect((updated!.metadata as any).parts).toHaveLength(1);
-    // The 5ms sleep above guarantees a strictly-later timestamp.
-    expect(new Date(updated!.updatedAt).getTime()).toBeGreaterThan(
-      new Date(before).getTime(),
-    );
-  });
-
-  it('onlyIfStreaming update is a NO-OP once the row is finalized (race guard)', async () => {
-    // Reproduce the step-update-vs-finalize race (#183 review): the row is
-    // finalized to 'completed', then a LATE per-step 'streaming' update lands.
-    // With `onlyIfStreaming` it must match nothing and leave the finalized row
-    // untouched (no clobber back to 'streaming', no lost usage).
-    const seeded = await repo.insert({
-      chatId,
-      workspaceId,
-      userId,
-      role: 'assistant',
-      content: 'partial',
-      status: 'streaming',
-    });
-    // Terminal finalize (unguarded) wins.
-    await repo.update(seeded.id, workspaceId, {
-      content: 'final answer',
-      status: 'completed',
-      metadata: { usage: { totalTokens: 42 } } as never,
-    });
-    // A straggler per-step update arrives AFTER finalize.
-    const late = await repo.update(
-      seeded.id,
-      workspaceId,
-      { content: 'partial', status: 'streaming', metadata: {} as never },
-      { onlyIfStreaming: true },
-    );
-    expect(late).toBeUndefined(); // matched no 'streaming' row -> no-op
-    const rows = await repo.findAllByChat(chatId, workspaceId);
-    const row = rows.find((r) => r.id === seeded.id)!;
-    expect(row.status).toBe('completed'); // NOT clobbered back to streaming
-    expect(row.content).toBe('final answer');
-    expect((row.metadata as any).usage.totalTokens).toBe(42); // usage preserved
-  });
-
-  it('update is workspace-scoped: a foreign workspace id matches nothing', async () => {
-    const seeded = await repo.insert({
-      chatId,
-      workspaceId,
-      userId,
-      role: 'assistant',
-      content: 'orig',
-      status: 'streaming',
-    });
-    const res = await repo.update(seeded.id, otherWorkspaceId, {
-      status: 'completed',
-    });
-    expect(res).toBeUndefined();
-    // The row in the real workspace is untouched.
-    const rows = await repo.findAllByChat(chatId, workspaceId);
-    const stillThere = rows.find((r) => r.id === seeded.id);
-    expect(stillThere!.status).toBe('streaming');
-    // Clean up so it does not pollute the sweep test below.
-    await repo.update(seeded.id, workspaceId, { status: 'completed' });
-  });
-
-  // Backdate a row's updatedAt so it qualifies as a STALE streaming row (the
-  // sweep only flips rows untouched for >10 minutes — a live turn bumps
-  // updatedAt every step, so it would never match).
-  async function backdateUpdatedAt(
-    id: string,
-    minutesAgo: number,
-  ): Promise<void> {
-    await db
-      .updateTable('aiChatMessages')
-      .set({ updatedAt: new Date(Date.now() - minutesAgo * 60 * 1000) })
-      .where('id', '=', id)
-      .execute();
-  }
-
-  it('sweepStreaming flips STALE dangling streaming rows to aborted and counts them', async () => {
-    // Two dangling streaming rows in our workspace + one in another workspace —
-    // all backdated past the staleness threshold so the sweep picks them up.
-    const a = await createMessage(db, {
-      workspaceId,
-      chatId,
-      role: 'assistant',
-      status: 'streaming',
-    });
-    const b = await createMessage(db, {
-      workspaceId,
-      chatId,
-      role: 'assistant',
-      status: 'streaming',
-    });
-    const other = await createMessage(db, {
-      workspaceId: otherWorkspaceId,
-      chatId: otherChatId,
-      role: 'assistant',
-      status: 'streaming',
-    });
-    await backdateUpdatedAt(a.id, 20);
-    await backdateUpdatedAt(b.id, 20);
-    await backdateUpdatedAt(other.id, 20);
-
-    // A settled row must NOT be touched.
-    const done = await createMessage(db, {
-      workspaceId,
-      chatId,
-      role: 'assistant',
-      status: 'completed',
-    });
-    // A legacy NULL-status row must NOT be touched.
-    const legacy = await createMessage(db, {
-      workspaceId,
-      chatId,
-      role: 'assistant',
-      status: null,
-    });
-
-    const swept = await repo.sweepStreaming();
-    // At least the 3 stale streaming rows we created (2 here + 1 in the other ws).
-    expect(swept).toBeGreaterThanOrEqual(3);
-
-    const rows = await repo.findAllByChat(chatId, workspaceId);
-    const byId = new Map(rows.map((r) => [r.id, r]));
-    expect(byId.get(a.id)!.status).toBe('aborted');
-    expect(byId.get(b.id)!.status).toBe('aborted');
-    expect(byId.get(done.id)!.status).toBe('completed');
-    expect(byId.get(legacy.id)!.status).toBeNull();
-
-    // Idempotent: a second sweep finds nothing left in our seeded set.
-    const again = await repo.sweepStreaming();
-    const rows2 = await repo.findAllByChat(chatId, workspaceId);
-    // Our two rows stay aborted regardless of `again`'s global count.
-    expect(rows2.find((r) => r.id === a.id)!.status).toBe('aborted');
-    expect(again).toBeGreaterThanOrEqual(0);
-  });
-
-  it('sweepStreaming does NOT sweep a FRESH streaming row (recency bound, #183 review)', async () => {
-    // A row that is actively streaming (recent updatedAt) must survive the sweep:
-    // a fresh replica's boot-sweep must never abort a turn another replica is
-    // still streaming in a multi-instance deploy.
-    const fresh = await createMessage(db, {
-      workspaceId,
-      chatId,
-      role: 'assistant',
-      status: 'streaming',
-    });
-    // A STALE streaming row created alongside it IS swept — proving the sweep
-    // ran and the only difference is recency.
-    const stale = await createMessage(db, {
-      workspaceId,
-      chatId,
-      role: 'assistant',
-      status: 'streaming',
-    });
-    await backdateUpdatedAt(stale.id, 20);
-
-    await repo.sweepStreaming();
-
-    const rows = await repo.findAllByChat(chatId, workspaceId);
-    const byId = new Map(rows.map((r) => [r.id, r]));
-    // Fresh (recently-updated) streaming row is left untouched...
-    expect(byId.get(fresh.id)!.status).toBe('streaming');
-    // ...while the stale one alongside it was swept to 'aborted'.
-    expect(byId.get(stale.id)!.status).toBe('aborted');
-  });
-
-  it('findAllByChat caps the result, keeping the NEWEST messages in order (#183 review)', async () => {
-    // A dedicated chat so the cap test is independent of the rows above.
-    const cappedChat = (
-      await createChat(db, { workspaceId, creatorId: userId })
-    ).id;
-    const base = Date.now();
-    // Three messages at strictly increasing timestamps.
-    await createMessage(db, {
-      workspaceId,
-      chatId: cappedChat,
-      content: 'm1-oldest',
-      createdAt: new Date(base),
-    });
-    await createMessage(db, {
-      workspaceId,
-      chatId: cappedChat,
-      content: 'm2',
-      createdAt: new Date(base + 1000),
-    });
-    await createMessage(db, {
-      workspaceId,
-      chatId: cappedChat,
-      content: 'm3-newest',
-      createdAt: new Date(base + 2000),
-    });
-
-    // Cap of 2 -> the OLDEST message is dropped; the newest two stay, in
-    // chronological order (oldest -> newest).
-    const capped = await repo.findAllByChat(cappedChat, workspaceId, 2);
-    expect(capped.map((r) => r.content)).toEqual(['m2', 'm3-newest']);
-
-    // Without a cap (well above the row count) all three come back in order.
-    const all = await repo.findAllByChat(cappedChat, workspaceId, 100);
-    expect(all.map((r) => r.content)).toEqual(['m1-oldest', 'm2', 'm3-newest']);
-  });
-});
--- a/apps/server/test/integration/ai-mcp-server-repo.int-spec.ts
+++ b/apps/server/test/integration/ai-mcp-server-repo.int-spec.ts
@@ -1,194 +0,0 @@
-import { Kysely, sql } from 'kysely';
-import { randomUUID } from 'node:crypto';
-import { AiMcpServerRepo } from '@docmost/db/repos/ai-chat/ai-mcp-server.repo';
-import { getTestDb, destroyTestDb, createWorkspace } from './db';
-
-/**
- * AiMcpServerRepo `tool_allowlist` jsonb round-trip (PR #172 / issue #173 §3).
- *
- * The fix under test is a DB round-trip, so a unit test cannot observe it: the
- * write must land as a real jsonb ARRAY (not a double-encoded string scalar),
- * and the read must repair any legacy string-scalar rows. The read-side
- * `parseToolAllowlist` MASKS a write regression (it parses the string back), so
- * without this integration check, reverting `::text::jsonb` to `::jsonb` would
- * keep every unit test green while silently corrupting the column again.
- */
-describe('AiMcpServerRepo tool_allowlist jsonb round-trip [integration]', () => {
-  let db: Kysely<any>;
-  let repo: AiMcpServerRepo;
-  let ws: string;
-
-  beforeAll(async () => {
-    db = getTestDb();
-    repo = new AiMcpServerRepo(db as any);
-    ws = (await createWorkspace(db)).id;
-  });
-
-  afterAll(async () => {
-    await destroyTestDb();
-  });
-
-  const jsonbTypeof = async (id: string): Promise<string | null> => {
-    const res = await sql<{ t: string | null }>`
-      SELECT jsonb_typeof(tool_allowlist) AS t
-      FROM ai_mcp_servers WHERE id = ${id}
-    `.execute(db);
-    return res.rows[0]?.t ?? null;
-  };
-
-  it('insert stores the allowlist as a jsonb ARRAY (not a string scalar)', async () => {
-    const row = await repo.insert({
-      workspaceId: ws,
-      name: `srv-${randomUUID()}`,
-      transport: 'http',
-      url: 'https://example.com/mcp',
-      toolAllowlist: ['search', 'crawl'],
-    });
-
-    // The column holds a real jsonb array — the whole point of ::text::jsonb.
-    expect(await jsonbTypeof(row.id)).toBe('array');
-
-    // And the read returns a genuine string[], not a JSON string.
-    const found = await repo.findById(row.id, ws);
-    expect(found?.toolAllowlist).toEqual(['search', 'crawl']);
-    expect(Array.isArray(found?.toolAllowlist)).toBe(true);
-  });
-
-  it('an empty allowlist is normalized to null (no restriction), not []', async () => {
-    const row = await repo.insert({
-      workspaceId: ws,
-      name: `srv-${randomUUID()}`,
-      transport: 'http',
-      url: 'https://example.com/mcp',
-      toolAllowlist: [],
-    });
-    // The column is SQL NULL, so jsonb_typeof returns SQL NULL (JS null).
-    expect(await jsonbTypeof(row.id)).toBeNull();
-    expect((await repo.findById(row.id, ws))?.toolAllowlist).toBeNull();
-  });
-
-  it('repairs a legacy double-encoded (string scalar) row on read (self-heal)', async () => {
-    // Seed a row whose tool_allowlist is a jsonb STRING SCALAR holding the JSON
-    // text — exactly what the old `::jsonb` double-encoding produced.
-    const id = randomUUID();
-    await sql`
-      INSERT INTO ai_mcp_servers (id, workspace_id, name, transport, url, tool_allowlist)
-      VALUES (
-        ${id}, ${ws}, ${`srv-${id}`}, 'http', 'https://example.com/mcp',
-        to_jsonb(${'["alpha","beta"]'}::text)
-      )
-    `.execute(db);
-
-    // Sanity: the seeded column really IS the corrupt string-scalar shape.
-    expect(await jsonbTypeof(id)).toBe('string');
-
-    // The repo read heals it back to a real string[].
-    expect((await repo.findById(id, ws))?.toolAllowlist).toEqual([
-      'alpha',
-      'beta',
-    ]);
-    const enabled = await repo.listEnabled(ws);
-    const healed = enabled.find((r) => r.id === id);
-    expect(healed?.toolAllowlist).toEqual(['alpha', 'beta']);
-  });
-
-  it('FAIL-OPEN: a present-but-corrupt tool_allowlist reads back as null (no restriction)', async () => {
-    // #185 re-review pt 8: normalizeRow's fail-open branch — the column is
-    // PRESENT but does not parse into a string[] (here a jsonb string scalar
-    // holding non-array JSON). The read must degrade to `null` ("no restriction"),
-    // not crash. (A warn is logged with the server id; not asserted here.)
-    const id = randomUUID();
-    await sql`
-      INSERT INTO ai_mcp_servers (id, workspace_id, name, transport, url, tool_allowlist)
-      VALUES (
-        ${id}, ${ws}, ${`srv-${id}`}, 'http', 'https://example.com/mcp',
-        to_jsonb(${'{"not":"an array"}'}::text)
-      )
-    `.execute(db);
-    // Sanity: the column is present (a jsonb string scalar), not SQL NULL.
-    expect(await jsonbTypeof(id)).toBe('string');
-    // ...yet the read degrades to null (fail-open).
-    expect((await repo.findById(id, ws))?.toolAllowlist).toBeNull();
-  });
-});
-
-/**
- * AiMcpServerRepo `instructions` text round-trip (#180). The column is plain
- * text (no jsonb); blank/whitespace is normalized to null on both insert and
- * update so an empty guide is never persisted.
- */
-describe('AiMcpServerRepo instructions round-trip [integration]', () => {
-  let db: Kysely<any>;
-  let repo: AiMcpServerRepo;
-  let ws: string;
-
-  beforeAll(async () => {
-    db = getTestDb();
-    repo = new AiMcpServerRepo(db as any);
-    ws = (await createWorkspace(db)).id;
-  });
-
-  afterAll(async () => {
-    await destroyTestDb();
-  });
-
-  it('insert stores trimmed non-blank instructions and reads them back', async () => {
-    const row = await repo.insert({
-      workspaceId: ws,
-      name: `srv-${randomUUID()}`,
-      transport: 'http',
-      url: 'https://example.com/mcp',
-      instructions: '  Use search for fresh facts.  ',
-    });
-    expect((await repo.findById(row.id, ws))?.instructions).toBe(
-      'Use search for fresh facts.',
-    );
-  });
-
-  it('insert normalizes blank/whitespace instructions to null', async () => {
-    const row = await repo.insert({
-      workspaceId: ws,
-      name: `srv-${randomUUID()}`,
-      transport: 'http',
-      url: 'https://example.com/mcp',
-      instructions: '   ',
-    });
-    expect((await repo.findById(row.id, ws))?.instructions).toBeNull();
-  });
-
-  it('insert with omitted instructions stores null', async () => {
-    const row = await repo.insert({
-      workspaceId: ws,
-      name: `srv-${randomUUID()}`,
-      transport: 'http',
-      url: 'https://example.com/mcp',
-    });
-    expect((await repo.findById(row.id, ws))?.instructions).toBeNull();
-  });
-
-  it('update sets, clears (blank => null), and leaves unchanged when absent', async () => {
-    const row = await repo.insert({
-      workspaceId: ws,
-      name: `srv-${randomUUID()}`,
-      transport: 'http',
-      url: 'https://example.com/mcp',
-      instructions: 'initial guide',
-    });
-
-    // Set a new value.
-    await repo.update(row.id, ws, { instructions: 'updated guide' });
-    expect((await repo.findById(row.id, ws))?.instructions).toBe(
-      'updated guide',
-    );
-
-    // Absent in the patch => unchanged.
-    await repo.update(row.id, ws, { name: 'renamed' });
-    expect((await repo.findById(row.id, ws))?.instructions).toBe(
-      'updated guide',
-    );
-
-    // Blank => cleared to null.
-    await repo.update(row.id, ws, { instructions: '   ' });
-    expect((await repo.findById(row.id, ws))?.instructions).toBeNull();
-  });
-});
--- a/apps/server/test/integration/db.ts
+++ b/apps/server/test/integration/db.ts
@@ -104,8 +104,7 @@ export async function createWorkspace(
      name: overrides.name ?? `ws-${suffix}`,
      // hostname is uniquely constrained; keep it unique per workspace.
      hostname: `host-${suffix}`,
-      settings:
-        overrides.settings === undefined ? null : (overrides.settings as any),
+      settings: overrides.settings === undefined ? null : (overrides.settings as any),
    })
    .returning(['id', 'settings'])
    .executeTakeFirstOrThrow();
@@ -227,37 +226,3 @@ export async function createChat(
    .executeTakeFirstOrThrow();
  return { id: row.id as string };
 }
-
-export async function createMessage(
-  db: Kysely<any>,
-  args: {
-    workspaceId: string;
-    chatId: string;
-    userId?: string | null;
-    role?: string;
-    content?: string | null;
-    status?: string | null;
-    metadata?: unknown;
-    // Explicit timestamp so a test can control message ORDER (the default DB
-    // now() can tie within a millisecond, and the v4 id is not time-ordered).
-    createdAt?: Date;
-  },
-): Promise<{ id: string }> {
-  const id = randomUUID();
-  const row = await db
-    .insertInto('aiChatMessages')
-    .values({
-      id,
-      workspaceId: args.workspaceId,
-      chatId: args.chatId,
-      userId: args.userId ?? null,
-      role: args.role ?? 'assistant',
-      content: args.content ?? null,
-      status: args.status ?? null,
-      metadata: (args.metadata ?? null) as any,
-      ...(args.createdAt ? { createdAt: args.createdAt } : {}),
-    })
-    .returning(['id'])
-    .executeTakeFirstOrThrow();
-  return { id: row.id as string };
-}
--- a/apps/server/test/integration/workspace-repo-ai-provider-settings.int-spec.ts
+++ b/apps/server/test/integration/workspace-repo-ai-provider-settings.int-spec.ts
@@ -1,91 +0,0 @@
-import { Kysely, sql } from 'kysely';
-import { WorkspaceRepo } from '@docmost/db/repos/workspace/workspace.repo';
-import { getTestDb, destroyTestDb, createWorkspace } from './db';
-
-/**
- * WorkspaceRepo.updateAiProviderSettings numeric round-trip (#189, #213).
- *
- * `chatContextWindow` is the first NUMERIC provider field routed through this
- * generic SQL layer. The patch builder must cast a JS number so it lands in
- * jsonb as a JSON NUMBER, not the JSON STRING `"200000"` — the client guards
- * (`typeof === "number"`) reject a string, silently killing the `/ max` badge
- * denominator. A plain `::text` cast (the prior code) regressed exactly this.
- * These specs are real SQL and assert both the JS value type and the on-disk
- * `jsonb_typeof`.
- */
-describe('WorkspaceRepo.updateAiProviderSettings (numeric round-trip) [integration]', () => {
-  let db: Kysely<any>;
-  let repo: WorkspaceRepo;
-
-  beforeAll(() => {
-    db = getTestDb();
-    repo = new WorkspaceRepo(db as any);
-  });
-
-  afterAll(async () => {
-    await destroyTestDb();
-  });
-
-  it('stores chatContextWindow as a JSON number (not a "200000" string)', async () => {
-    const ws = await createWorkspace(db, { settings: undefined });
-
-    const updated = await repo.updateAiProviderSettings(ws.id, {
-      driver: 'openai',
-      chatModel: 'gpt-4o',
-      chatContextWindow: 200000,
-    });
-
-    // Returned row: the number survives as a real JS number, alongside the
-    // string fields which stay strings.
-    const provider = (updated.settings as any)?.ai?.provider;
-    expect(provider.chatContextWindow).toBe(200000);
-    expect(typeof provider.chatContextWindow).toBe('number');
-    expect(provider.driver).toBe('openai');
-    expect(provider.chatModel).toBe('gpt-4o');
-
-    // On disk: the jsonb value is typed 'number' (the must-fix assertion), and
-    // sibling string fields are typed 'string'.
-    const typed = await db
-      .selectFrom('workspaces')
-      .select([
-        sql<string>`jsonb_typeof(settings->'ai'->'provider'->'chatContextWindow')`.as(
-          'windowType',
-        ),
-        sql<string>`jsonb_typeof(settings->'ai'->'provider'->'chatModel')`.as(
-          'modelType',
-        ),
-      ])
-      .where('id', '=', ws.id)
-      .executeTakeFirstOrThrow();
-
-    expect(typed.windowType).toBe('number');
-    expect(typed.modelType).toBe('string');
-  });
-
-  it('re-reads chatContextWindow as a number after a partial-merge update', async () => {
-    const ws = await createWorkspace(db, {
-      settings: { ai: { provider: { driver: 'openai', chatModel: 'x' } } },
-    });
-
-    // Merge in only the numeric field; siblings must be preserved and the value
-    // must still be a JSON number, not a string.
-    await repo.updateAiProviderSettings(ws.id, { chatContextWindow: 128000 });
-
-    const row = await db
-      .selectFrom('workspaces')
-      .select([
-        'settings',
-        sql<string>`jsonb_typeof(settings->'ai'->'provider'->'chatContextWindow')`.as(
-          'windowType',
-        ),
-      ])
-      .where('id', '=', ws.id)
-      .executeTakeFirstOrThrow();
-
-    expect(row.windowType).toBe('number');
-    const provider = (row.settings as any)?.ai?.provider;
-    expect(provider.chatContextWindow).toBe(128000);
-    expect(provider.driver).toBe('openai');
-    expect(provider.chatModel).toBe('x');
-  });
-});
--- a/packages/editor-ext/src/lib/footnote/footnote-numbering.ts
+++ b/packages/editor-ext/src/lib/footnote/footnote-numbering.ts
@@ -1,15 +1,14 @@
-import { EditorState, Plugin, PluginKey } from '@tiptap/pm/state';
-import { Decoration, DecorationSet } from '@tiptap/pm/view';
-import { Node as ProseMirrorNode } from '@tiptap/pm/model';
+import { EditorState, Plugin, PluginKey } from "@tiptap/pm/state";
+import { Decoration, DecorationSet } from "@tiptap/pm/view";
+import { Node as ProseMirrorNode } from "@tiptap/pm/model";
 import {
  FOOTNOTE_DEFINITION_NAME,
  FOOTNOTE_REFERENCE_NAME,
  computeFootnoteNumbers,
-  computeFootnoteRefCounts,
-} from './footnote-util';
+} from "./footnote-util";

 export const footnoteNumberingPluginKey = new PluginKey<FootnoteNumberingState>(
-  'footnoteNumbering',
+  "footnoteNumbering",
 );

 /**
@@ -22,9 +21,6 @@ export const footnoteNumberingPluginKey = new PluginKey<FootnoteNumberingState>(
 interface FootnoteNumberingState {
  /** referenceId -> 1-based display number, for the current doc. */
  numbers: Map<string, number>;
-  /** referenceId -> number of reference occurrences (>= 1), for the definition's
-   *  multi-backlink UI (#168). */
-  refCounts: Map<string, number>;
  /** Decorations rendering those numbers (refs + definitions). */
  decorations: DecorationSet;
 }
@@ -50,7 +46,6 @@ function buildFootnoteNumberingState(
  doc: ProseMirrorNode,
 ): FootnoteNumberingState {
  const numbers = computeFootnoteNumbers(doc);
-  const refCounts = computeFootnoteRefCounts(doc);
  const decorations: Decoration[] = [];

  doc.descendants((node, pos) => {
@@ -59,7 +54,7 @@ function buildFootnoteNumberingState(
      if (num != null) {
        decorations.push(
          Decoration.node(pos, pos + node.nodeSize, {
-            'data-footnote-number': String(num),
+            "data-footnote-number": String(num),
            style: `--footnote-number: "${num}";`,
          }),
        );
@@ -70,7 +65,7 @@ function buildFootnoteNumberingState(
      if (num != null) {
        decorations.push(
          Decoration.node(pos, pos + node.nodeSize, {
-            'data-footnote-number': String(num),
+            "data-footnote-number": String(num),
            style: `--footnote-number: "${num}";`,
          }),
        );
@@ -78,11 +73,7 @@ function buildFootnoteNumberingState(
    }
  });

-  return {
-    numbers,
-    refCounts,
-    decorations: DecorationSet.create(doc, decorations),
-  };
+  return { numbers, decorations: DecorationSet.create(doc, decorations) };
 }

 /**
@@ -99,16 +90,6 @@ export function getFootnoteNumber(
  return footnoteNumberingPluginKey.getState(state)?.numbers.get(id);
 }

-/**
- * Read the cached reference-occurrence count for `id` (how many `[^id]` links
- * point at this definition). Drives the definition's multi-backlink UI (#168):
- * `> 1` renders ↩ a b c …, each scrolling to its own occurrence. Returns 0 when
- * the plugin is not installed or the id is unknown (caller treats as single).
- */
-export function getFootnoteRefCount(state: EditorState, id: string): number {
-  return footnoteNumberingPluginKey.getState(state)?.refCounts.get(id) ?? 0;
-}
-
 /**
 * ProseMirror plugin that renders footnote numbers as decorations. It never
 * mutates the document (safe in read-only / share and in collaboration) — it
--- a/packages/editor-ext/src/lib/footnote/footnote-reference.ts
+++ b/packages/editor-ext/src/lib/footnote/footnote-reference.ts
@@ -1,14 +1,14 @@
-import { mergeAttributes, Node } from '@tiptap/core';
-import { TextSelection, Transaction } from '@tiptap/pm/state';
-import { ReactNodeViewRenderer } from '@tiptap/react';
+import { mergeAttributes, Node } from "@tiptap/core";
+import { TextSelection, Transaction } from "@tiptap/pm/state";
+import { ReactNodeViewRenderer } from "@tiptap/react";
 import {
  FOOTNOTE_DEFINITION_NAME,
  FOOTNOTE_REFERENCE_NAME,
  FOOTNOTES_LIST_NAME,
  generateFootnoteId,
-} from './footnote-util';
-import { footnoteNumberingPlugin } from './footnote-numbering';
-import { footnoteSyncPlugin, footnotePastePlugin } from './footnote-sync';
+} from "./footnote-util";
+import { footnoteNumberingPlugin } from "./footnote-numbering";
+import { footnoteSyncPlugin, footnotePastePlugin } from "./footnote-sync";

 export interface FootnoteReferenceOptions {
  HTMLAttributes: Record<string, any>;
@@ -27,7 +27,7 @@ export interface FootnoteReferenceOptions {
  enableSync?: boolean;
 }

-declare module '@tiptap/core' {
+declare module "@tiptap/core" {
  interface Commands<ReturnType> {
    footnote: {
      /**
@@ -42,11 +42,8 @@ declare module '@tiptap/core' {
      removeFootnote: (id: string) => ReturnType;
      /** Scroll to (and focus) a footnote definition by id. */
      scrollToFootnote: (id: string) => ReturnType;
-      /** Scroll to a footnote reference by id. `index` selects WHICH occurrence
-       *  to scroll to when the id is referenced more than once (reuse, #166):
-       *  0-based, defaults to the first. Used by the definition's multi-backlink
-       *  UI (#168). */
-      scrollToReference: (id: string, index?: number) => ReturnType;
+      /** Scroll to (and select) a footnote reference by id. */
+      scrollToReference: (id: string) => ReturnType;
    };
  }
 }
@@ -69,7 +66,7 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
  // Superscript mark's <sup> rule.
  priority: 101,

-  group: 'inline',
+  group: "inline",
  inline: true,
  atom: true,
  selectable: true,
@@ -102,10 +99,10 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
    return {
      id: {
        default: null,
-        parseHTML: (element) => element.getAttribute('data-id'),
+        parseHTML: (element) => element.getAttribute("data-id"),
        renderHTML: (attributes) => {
          if (!attributes.id) return {};
-          return { 'data-id': attributes.id };
+          return { "data-id": attributes.id };
        },
      },
    };
@@ -116,7 +113,7 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
      {
        // High priority so the Superscript mark (which also matches <sup>) does
        // not claim a footnote reference and drop it as empty content.
-        tag: 'sup[data-footnote-ref]',
+        tag: "sup[data-footnote-ref]",
        priority: 100,
      },
    ];
@@ -124,9 +121,9 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({

  renderHTML({ HTMLAttributes }) {
    return [
-      'sup',
+      "sup",
      mergeAttributes(
-        { 'data-footnote-ref': '', class: 'footnote-ref' },
+        { "data-footnote-ref": "", class: "footnote-ref" },
        this.options.HTMLAttributes,
        HTMLAttributes,
      ),
@@ -135,7 +132,7 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({

  // Plain-text representation (used by generateText / markdown text fallbacks).
  renderText({ node }) {
-    return `[^${node.attrs.id ?? ''}]`;
+    return `[^${node.attrs.id ?? ""}]`;
  },

  addNodeView() {
@@ -173,10 +170,8 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({

          // Make sure the parent accepts an inline atom here.
          const insertPos = selection.from;
-          if (
-            !$from.parent.type.spec.content?.includes('inline') &&
-            !$from.parent.isTextblock
-          ) {
+          if (!$from.parent.type.spec.content?.includes("inline") &&
+              !$from.parent.isTextblock) {
            return false;
          }

@@ -316,23 +311,19 @@ export const FootnoteReference = Node.create<FootnoteReferenceOptions>({
            `[data-footnote-def][data-id="${id}"]`,
          ) as HTMLElement | null;
          if (!dom) return false;
-          dom.scrollIntoView({ behavior: 'smooth', block: 'center' });
+          dom.scrollIntoView({ behavior: "smooth", block: "center" });
          return true;
        },

      scrollToReference:
-        (id: string, index = 0) =>
+        (id: string) =>
        ({ editor }) => {
          if (!id) return false;
-          // querySelectorAll returns the occurrences in document order, so the
-          // index maps 1:1 to the definition's a/b/c backlink (#168). Fall back
-          // to the first match for an out-of-range index.
-          const matches = editor.view.dom.querySelectorAll(
+          const dom = editor.view.dom.querySelector(
            `sup[data-footnote-ref][data-id="${id}"]`,
-          );
-          const dom = (matches[index] ?? matches[0]) as HTMLElement | undefined;
+          ) as HTMLElement | null;
          if (!dom) return false;
-          dom.scrollIntoView({ behavior: 'smooth', block: 'center' });
+          dom.scrollIntoView({ behavior: "smooth", block: "center" });
          return true;
        },
    };
--- a/packages/editor-ext/src/lib/footnote/footnote-util.ts
+++ b/packages/editor-ext/src/lib/footnote/footnote-util.ts
@@ -1,12 +1,12 @@
-import { Node as ProseMirrorNode } from '@tiptap/pm/model';
+import { Node as ProseMirrorNode } from "@tiptap/pm/model";

 /**
 * Node type names for the footnote feature. Centralized so every part of the
 * feature (nodes, plugins, commands) references the same string.
 */
-export const FOOTNOTE_REFERENCE_NAME = 'footnoteReference';
-export const FOOTNOTES_LIST_NAME = 'footnotesList';
-export const FOOTNOTE_DEFINITION_NAME = 'footnoteDefinition';
+export const FOOTNOTE_REFERENCE_NAME = "footnoteReference";
+export const FOOTNOTES_LIST_NAME = "footnotesList";
+export const FOOTNOTE_DEFINITION_NAME = "footnoteDefinition";

 /**
 * Generate a uuidv7-style id (time-ordered). Implemented locally so editor-ext
@@ -15,10 +15,10 @@ export const FOOTNOTE_DEFINITION_NAME = 'footnoteDefinition';
 */
 export function generateFootnoteId(): string {
  const now = Date.now();
-  const timeHex = now.toString(16).padStart(12, '0');
+  const timeHex = now.toString(16).padStart(12, "0");

  const rand = (length: number) => {
-    let out = '';
+    let out = "";
    for (let i = 0; i < length; i++) {
      out += Math.floor(Math.random() * 16).toString(16);
    }
@@ -26,19 +26,19 @@ export function generateFootnoteId(): string {
  };

  // version 7 nibble, then variant (8..b) nibble.
-  const versioned = '7' + rand(3);
+  const versioned = "7" + rand(3);
  const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16);
  const variant = variantNibble + rand(3);

  return (
    timeHex.slice(0, 8) +
-    '-' +
+    "-" +
    timeHex.slice(8, 12) +
-    '-' +
+    "-" +
    versioned +
-    '-' +
+    "-" +
    variant +
-    '-' +
+    "-" +
    rand(12)
  );
 }
@@ -89,7 +89,7 @@ export function deriveFootnoteId(
 * Purely deterministic.
 */
 function suffix(n: number): string {
-  let out = '';
+  let out = "";
  let x = n;
  while (x > 0) {
    const rem = (x - 1) % 25;
@@ -131,19 +131,3 @@ export function computeFootnoteNumbers(
  }
  return numbers;
 }
-
-/**
- * Build a map of `referenceId -> number of reference occurrences` (>= 1) from
- * document order. After #166 the same id may be referenced multiple times
- * (reuse: one number, one definition, N forward links); this count drives the
- * definition's multi-backlink UI (↩ a b c …, #168). Pure function of the doc.
- */
-export function computeFootnoteRefCounts(
-  doc: ProseMirrorNode,
-): Map<string, number> {
-  const counts = new Map<string, number>();
-  for (const id of collectReferenceIds(doc)) {
-    counts.set(id, (counts.get(id) ?? 0) + 1);
-  }
-  return counts;
-}
--- a/packages/editor-ext/src/lib/footnote/footnote.test.ts
+++ b/packages/editor-ext/src/lib/footnote/footnote.test.ts
--- a/packages/git-sync/build/engine/client.types.d.ts
+++ b/packages/git-sync/build/engine/client.types.d.ts
@@ -1,109 +0,0 @@
-/**
- * The client seam. `pull.ts`/`push.ts` depend on a narrow STRUCTURAL interface
- * rather than any concrete client, because the gitmost server writes NATIVELY —
- * through repositories + collab `openDirectConnection`.
- *
- * `GitSyncClient` is that interface: the native datasource (server side)
- * implements it, and the engine only ever uses `Pick<GitSyncClient, ...>`
- * subsets of it. The signatures below MIRROR exactly the methods the engine's
- * `pull.ts`/`push.ts` actually call (arg shapes + the fields the engine reads
- * off each result), so a REST-style client is still structurally assignable and
- * the native adapter has a precise contract.
- */
-/**
- * A page node as returned by `listSpaceTree` (the sidebar/tree walk, no body).
- * The engine layout (`buildVaultLayout`) consumes `PageNode` from `./layout`,
- * which only requires `id` (+ optional `title`/`slugId`/`parentPageId`); this
- * lite shape documents the fields the tree walk surfaces. Real tree nodes also
- * carry `position`, `icon`, `hasChildren` — kept open via the index signature.
- */
-export interface GitSyncPageNodeLite {
-    id: string;
-    slugId?: string;
-    title?: string;
-    parentPageId?: string | null;
-    hasChildren?: boolean;
-    /** `listSpaceTree` nodes carry extra fields (position, icon, …). */
-    [key: string]: unknown;
-}
-/**
- * The structural client the engine depends on. Only `Pick<GitSyncClient, ...>`
- * subsets are ever used:
- *   - pull reads:  `getPageJson` (+ the tree walk's `listSpaceTree`),
- *   - push writes: `importPageMarkdown` / `createPage` / `deletePage` /
- *                  `movePage` / `renamePage`,
- *   - continuous (phase B+): `listRecentSince` / `listTrash` / `restorePage`.
- */
-export interface GitSyncClient {
-    /**
-     * Full tree of page nodes for the space (or the subtree rooted at
-     * `rootPageId`), each WITHOUT body content. `complete` is `false` when the
-     * walk was truncated / a fetch failed — the pull side suppresses absence
-     * deletions on an incomplete tree (SPEC §8). Native impl returns
-     * `complete: true` always (reads the DB, not a paginated REST endpoint).
-     */
-    listSpaceTree(spaceId: string, rootPageId?: string): Promise<{
-        pages: GitSyncPageNodeLite[];
-        complete: boolean;
-    }>;
-    /**
-     * One page WITH its ProseMirror body content. `applyPullActions` reads
-     * `id`, `slugId`, `title`, `parentPageId`, `spaceId` (for the file meta) and
-     * `content` (to stabilize/serialize). `updatedAt` is carried for the
-     * poll-suppression loop-guard.
-     */
-    getPageJson(pageId: string): Promise<{
-        id: string;
-        slugId: string;
-        title: string;
-        parentPageId: string | null;
-        spaceId: string;
-        updatedAt: string;
-        content: unknown;
-    }>;
-    /**
-     * Merge a page's body from a self-contained markdown file (meta + body). The
-     * collab/Yjs write path (SPEC §2/§15.6) — never a raw jsonb overwrite.
-     * `applyPushActions` reads only an optional `updatedAt` off the result
-     * (via `extractUpdatedAt`, tolerant of extra fields).
-     *
-     * `baseMarkdown` is the last-synced version of the file (`refs/docmost/
-     * last-pushed`), the common ancestor for a THREE-WAY merge against the live
-     * doc so concurrent human edits survive (review #5). Optional/null -> 2-way.
-     */
-    importPageMarkdown(pageId: string, fullMarkdown: string, baseMarkdown?: string | null): Promise<{
-        updatedAt?: string;
-        [key: string]: unknown;
-    }>;
-    /**
-     * Create a new page and return the assigned id at `data.id`
-     * (`applyPushActions` reads `result.data.id`, then writes it back into the
-     * file's meta). An optional top-level/`data.updatedAt` feeds the loop-guard.
-     */
-    createPage(title: string, content: string, spaceId: string, parentPageId?: string): Promise<{
-        data: {
-            id: string;
-        };
-        updatedAt?: string;
-        [key: string]: unknown;
-    }>;
-    /** Soft-delete a page to Trash (SPEC §8). Result is not inspected. */
-    deletePage(pageId: string): Promise<unknown>;
-    /**
-     * Reparent a page (and optionally set its fractional-index `position`). The
-     * engine passes `position` UNDEFINED for now; the native impl computes a
-     * default between siblings. Result is not inspected.
-     */
-    movePage(pageId: string, parentPageId: string | null, position?: string): Promise<unknown>;
-    /** Change a page's title only (no body touch). Result is not inspected. */
-    renamePage(pageId: string, title: string): Promise<unknown>;
-    /**
-     * Pages updated since `sinceIso` (the poll-safety reconciliation, SPEC §8).
-     * `spaceId` may be undefined (all spaces); `hardPageCap` bounds the walk.
-     */
-    listRecentSince(spaceId: string | undefined, sinceIso: string | null, hardPageCap?: number): Promise<unknown[]>;
-    /** List soft-deleted (trashed) pages for the space (deletion detection). */
-    listTrash(spaceId: string): Promise<unknown[]>;
-    /** Restore a soft-deleted page from Trash. Result is not inspected. */
-    restorePage(pageId: string): Promise<unknown>;
-}
--- a/packages/git-sync/build/engine/client.types.js
+++ b/packages/git-sync/build/engine/client.types.js
@@ -1,13 +0,0 @@
-/**
- * The client seam. `pull.ts`/`push.ts` depend on a narrow STRUCTURAL interface
- * rather than any concrete client, because the gitmost server writes NATIVELY —
- * through repositories + collab `openDirectConnection`.
- *
- * `GitSyncClient` is that interface: the native datasource (server side)
- * implements it, and the engine only ever uses `Pick<GitSyncClient, ...>`
- * subsets of it. The signatures below MIRROR exactly the methods the engine's
- * `pull.ts`/`push.ts` actually call (arg shapes + the fields the engine reads
- * off each result), so a REST-style client is still structurally assignable and
- * the native adapter has a precise contract.
- */
-export {};
--- a/packages/git-sync/build/engine/config-errors.d.ts
+++ b/packages/git-sync/build/engine/config-errors.d.ts
@@ -1 +0,0 @@
-export declare function loadSettingsOrExit<T>(factory: () => T): T;
--- a/packages/git-sync/build/engine/config-errors.js
+++ b/packages/git-sync/build/engine/config-errors.js
@@ -1,50 +0,0 @@
-import { ZodError } from 'zod';
-// Turn a ZodError from settings validation into a clear, actionable startup
-// message that names the offending env var(s), then exit(1) — no raw stack
-// trace. Mirrors the Python new-project skeleton's load_settings_or_exit.
-// A non-ZodError is left to propagate unchanged.
-export function loadSettingsOrExit(factory) {
-    try {
-        return factory();
-    }
-    catch (err) {
-        if (!(err instanceof ZodError))
-            throw err;
-        const missing = [];
-        const invalid = [];
-        for (const issue of err.issues) {
-            const name = issue.path.length ? String(issue.path[0]) : '?';
-            // A missing required variable surfaces as an `invalid_type` issue whose
-            // received value was `undefined`. zod 3 exposed `issue.received` directly;
-            // zod 4 dropped that field and instead folds it into the message
-            // ("expected string, received undefined"). Detect both shapes so the
-            // missing-vs-invalid split holds across zod majors. NOTE: an invalid (but
-            // present) value uses a different code (invalid_format / invalid_value) or
-            // an `invalid_type` message that reports a non-undefined received (e.g.
-            // "received NaN" from a coerced number), so neither is misread as missing.
-            const i = issue;
-            const isMissing = issue.code === 'invalid_type' &&
-                (i.received === 'undefined' ||
-                    /received undefined/i.test(i.message ?? ''));
-            if (isMissing)
-                missing.push(name);
-            else
-                invalid.push(`${name}: ${issue.message}`);
-        }
-        const lines = ['Configuration error in environment / .env:'];
-        if (missing.length) {
-            lines.push('  Missing required variable(s):');
-            for (const n of [...new Set(missing)])
-                lines.push(`    - ${n}`);
-        }
-        if (invalid.length) {
-            lines.push('  Invalid value(s):');
-            for (const item of invalid)
-                lines.push(`    - ${item}`);
-        }
-        lines.push('');
-        lines.push('Set them in .env (see .env.example) and try again.');
-        process.stderr.write(lines.join('\n') + '\n');
-        process.exit(1);
-    }
-}
--- a/packages/git-sync/build/engine/cycle.d.ts
+++ b/packages/git-sync/build/engine/cycle.d.ts
@@ -1,70 +0,0 @@
-import { VaultGit } from "./git.js";
-import { GitSyncClient } from "./client.types.js";
-import { Settings } from "./settings.js";
-/**
- * Absolute-path filesystem primitives the cycle needs. Injected (not imported)
- * so the engine stays IO-free and unit-testable. `mkdir` is recursive; `rm` is
- * force (a missing file is a no-op).
- */
-export interface CycleFs {
-    readFile: (absPath: string) => Promise<string>;
-    writeFile: (absPath: string, text: string) => Promise<void>;
-    mkdir: (absDir: string) => Promise<void>;
-    rm: (absPath: string) => Promise<void>;
-}
-export interface RunCycleDeps {
-    spaceId: string;
-    /** The Docmost seam (reads for pull, writes for push). */
-    client: GitSyncClient;
-    /** The per-space git vault (a real working repo). */
-    vault: VaultGit;
-    /** Engine settings; `vaultPath` roots the relPath -> absolute-path mapping. */
-    settings: Settings;
-    fs: CycleFs;
-    log: (line: string) => void;
-    /**
-     * Delete-cap hook (the ONLY caller-specific policy). Called with the push
-     * dry-run's planned delete count (`Number.POSITIVE_INFINITY` when the dry-run
-     * itself failed, so the hook can fail safe) and the live client; returns the
-     * client to use for the REAL apply. The default (omitted) applies every op
-     * unmodified. gitmost uses it to neutralize deletes when over its cap.
-     *
-     * When omitted, NO dry-run is performed (one fewer push planning pass).
-     */
-    resolveApplyClient?: (plannedDeletes: number, client: GitSyncClient) => GitSyncClient;
-}
-export interface RunCycleResult {
-    ran: boolean;
-    /** Set when the cycle short-circuited without running pull/push. */
-    skipped?: "merge-in-progress";
-    pull?: {
-        written: number;
-        deleted: number;
-        conflict: boolean;
-    };
-    push?: {
-        mode: string;
-        failures: number;
-    };
-}
-/**
- * Run ONE full reconcile cycle for a space: PULL (Docmost -> vault) then PUSH
- * (vault -> Docmost), under the engine's required branch choreography. This is
- * the single entry point the app drives — it owns the staging order so it can
- * never drift from the engine it ships with.
- *
- * Staging (the ⭐ data-loss-critical order, SPEC §6/§9):
- *   1. assertGitAvailable + ensureRepo (the git state store must exist).
- *   2. refuse on an unresolved merge (a prior conflicting pull); next checkout
- *      would fail otherwise.
- *   3. ensureBranch('docmost','main') + checkout('docmost'). Pull writes MUST
- *      land on `docmost`, not `main`: applyPullActions commits on `docmost`,
- *      then checks out `main` and merges docmost -> main. Writing Docmost
- *      content straight onto `main` would clobber local file edits before push
- *      can diff them.
- *   4. PULL: readExisting -> listSpaceTree -> computePullActions -> apply.
- *   5. PUSH: optional dry-run to feed the delete-cap hook, then the real apply.
- *
- * Lock + cap POLICY live in the caller; this owns only the mechanics.
- */
-export declare function runCycle(deps: RunCycleDeps): Promise<RunCycleResult>;
--- a/Show More
+++ b/Show More
				`@@ -1 +0,0 @@`
				`export declare function loadSettingsOrExit<T>(factory: () => T): T;`