fix(ai): store chatContextWindow as a JSON number, not a ::text string

chatContextWindow (#189) is the first numeric provider field routed through WorkspaceRepo.updateAiProviderSettings, whose patch builder cast every value as `${v}::text`. The DTO validates it as @IsInt(), so a JS number 200000 was stored as the JSON STRING "200000". The client guards require `typeof === "number"` (ai-chat-window.tsx, context-badge.tsx), so the `/ max` badge denominator never rendered and the whole feature silently no-opped.

Branch the jsonb_build_object value cast by JS runtime type: numbers -> ::numeric (real JSON number), booleans -> ::boolean, everything else -> ::text (unchanged for the existing string fields). This is the root fix (store as a real number) rather than coercing on read, so every reader sees the correct type.

Add a DB round-trip int-spec asserting jsonb_typeof(settings->'ai'->'provider'->'chatContextWindow') = 'number' and that the value re-reads as the number 200000, including the partial-merge path.

CHANGELOG: Added entry for the chatContextWindow setting and a Changed entry for the badge's new "used / max" meaning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
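
The cast branching described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the repository's patch builder: `pgCastFor`, `buildJsonbPatch`, the positional `$n` placeholders, and the `chatModel` field name are all hypothetical, and the real code may build its SQL through a query-builder template instead of plain strings.

```typescript
// Values a provider-settings patch may carry (assumed shape, for illustration).
type ProviderValue = string | number | boolean;

// Pick the Postgres cast from the JS runtime type, so that
// jsonb_build_object emits a JSON number/boolean instead of a JSON string.
function pgCastFor(value: ProviderValue): "numeric" | "boolean" | "text" {
  if (typeof value === "number") return "numeric";
  if (typeof value === "boolean") return "boolean";
  return "text"; // unchanged behaviour for the existing string fields
}

interface JsonbPatch {
  sql: string;
  params: ProviderValue[];
}

// Hypothetical helper: builds a parameterised jsonb_build_object(...) call.
// Keys are always bound as text; each value gets a type-dependent cast.
function buildJsonbPatch(
  patch: Record<string, ProviderValue | undefined>,
): JsonbPatch {
  const params: ProviderValue[] = [];
  const pairs: string[] = [];
  for (const [key, value] of Object.entries(patch)) {
    if (value === undefined) continue; // partial update: skip absent fields
    params.push(key);
    const keyIdx = params.length;
    params.push(value);
    const valIdx = params.length;
    pairs.push(`$${keyIdx}::text, $${valIdx}::${pgCastFor(value)}`);
  }
  return { sql: `jsonb_build_object(${pairs.join(", ")})`, params };
}

const example = buildJsonbPatch({
  chatModel: "some-model", // hypothetical string field
  chatContextWindow: 200000,
});
// example.sql is:
// jsonb_build_object($1::text, $2::text, $3::text, $4::numeric)
```

With the numeric cast, `jsonb_typeof` on the stored value reports `number`, which is what the client's `typeof === "number"` guards rely on after the JSON is parsed.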
feat(ai-chat): context badge shows current/max (#189)
2026-06-26 17:19:34 +03:00 · 2026-06-26 06:27:45 +03:00
23 changed files with 446 additions and 613 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -43,6 +43,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  OpenRouter, etc.; `openai` uses the official provider (real-OpenAI
  reasoning-model request shaping). Chosen explicitly rather than inferred from
  the base URL, since a custom URL can front real OpenAI too. (#175, #177)
+- **AI chat "Context window (tokens)" setting (`chatContextWindow`).** A new
+  admin field in AI settings that records the chat model's context-window size.
+  When set (> 0) it becomes the denominator of the header context-badge, which
+  now reads "used / max"; `0`/empty clears the limit and the badge shows only
+  the current context as before. There is no provider-independent way to read a
+  model's window automatically, so it is an explicit workspace-level value.
+  (#189)
 - **Per-MCP-server instructions in the agent prompt.** Each external MCP server
  now has an admin-authored `instructions` field ("how/when to use this server's
  tools") that is injected into the agent's system prompt next to that server's
@@ -61,6 +68,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  model's reasoning out of the box. An endpoint that is real OpenAI behind a
  custom base URL should set the new `chatApiStyle` "Protocol" to `openai`. (#177)

+- **AI chat header context-badge now shows "used / max".** When an admin sets
+  the new `chatContextWindow`, the badge displays the current context size over
+  the configured window (e.g. `120k / 200k`) instead of switching to a live
+  per-turn token counter during streaming. With no window configured the badge
+  keeps showing just the current context. (#189)
+
 - **Footnotes now reuse (Pandoc semantics).** Multiple `[^a]` references to the
  same id are ONE footnote — one number, one definition, several back-references
  — instead of being renamed to `a__2`, `a__3`. Duplicate `[^a]:` definitions are
@@ -92,16 +105,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  no longer froze on the previous step's authoritative usage; the current step's
  estimate is combined per-component with `max`, so the count rises smoothly and
  never jumps backwards. (#163)
-- **AI chat: "New chat" pressed during the first turn's stream now resets the
-  thread instead of leaving the old turn streaming.** While a brand-new,
-  not-yet-adopted chat streamed its first turn, hitting "New chat" left
-  `activeChatId === null` (a no-op for the atom), so the reconciler never
-  remounted and the in-flight thread kept streaming behind the fresh one — and a
-  late refetch / late `onFinish` from that abandoned thread could yank the user
-  back into the chat they just left. "New chat" now forces a fresh empty thread
-  unconditionally and the finished thread's mount key is checked so a late
-  callback from an abandoned thread no longer adopts or re-arms the fallback.
-  (#161)

 ## [0.93.0] - 2026-06-21

--- a/apps/client/public/locales/en-US/translation.json
+++ b/apps/client/public/locales/en-US/translation.json
@@ -1168,7 +1168,10 @@
  "Built-in assistant persona": "Built-in assistant persona",
  "Minimize": "Minimize",
  "Current context size": "Current context size",
-  "Tokens generated this turn": "Tokens generated this turn",
+  "Context size / model limit": "Context size / model limit",
+  "Context window (tokens)": "Context window (tokens)",
+  "Shows used / total in the chat header badge; empty hides the total.": "Shows used / total in the chat header badge; empty hides the total.",
+  "e.g. 200000": "e.g. 200000",
  "AI agent": "AI agent",
  "Take a look at the current document": "Take a look at the current document",
  "AI agent is typing…": "AI agent is typing…",
--- a/apps/client/public/locales/ru-RU/translation.json
+++ b/apps/client/public/locales/ru-RU/translation.json
@@ -705,7 +705,10 @@
  "Copy chat": "Копировать чат",
  "Created successfully": "Успешно создано",
  "Current context size": "Текущий размер контекста",
-  "Tokens generated this turn": "Токенов сгенерировано за ход",
+  "Context size / model limit": "Размер контекста / лимит модели",
+  "Context window (tokens)": "Размер окна контекста (токены)",
+  "Shows used / total in the chat header badge; empty hides the total.": "Показывает использовано/всего в шапке чата; пусто — скрыть лимит.",
+  "e.g. 200000": "напр. 200000",
  "Delete this chat?": "Удалить этот чат?",
  "Deleted successfully": "Успешно удалено",
  "Edited by AI agent on behalf of {{name}}": "Отредактировано AI-агентом от имени {{name}}",
--- a/apps/client/src/features/ai-chat/components/ai-chat-window.tsx
+++ b/apps/client/src/features/ai-chat/components/ai-chat-window.tsx
@@ -6,7 +6,7 @@ import {
  useRef,
  useState,
 } from "react";
-import { Group, Loader, Tooltip } from "@mantine/core";
+import { Group, Loader } from "@mantine/core";
 import {
  IconArrowsDiagonal,
  IconCheck,
@@ -39,6 +39,7 @@ import {
 } from "@/features/ai-chat/queries/ai-chat-query.ts";
 import ConversationList from "@/features/ai-chat/components/conversation-list.tsx";
 import ChatThread from "@/features/ai-chat/components/chat-thread.tsx";
+import { ContextBadge } from "@/features/ai-chat/components/context-badge.tsx";
 import { exportAiChat } from "@/features/ai-chat/services/ai-chat-service.ts";
 import { useChatSession } from "@/features/ai-chat/hooks/use-chat-session.ts";
 import {
@@ -60,13 +61,6 @@ const MIN_HEIGHT = 400;
 // Margin kept between the window and the viewport edges while dragging.
 const EDGE_MARGIN = 8;

-/** Compact token formatter: 1.2M / 3.4k / 950. */
-function formatTokens(n: number): string {
-  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
-  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}k`;
-  return String(n);
-}
-
 // Compute the initial top-right placement at the default size, fitted to the
 // current viewport. Reads `window` only when called (inside an effect).
 function computeInitialGeom() {
@@ -161,12 +155,6 @@ export default function AiChatWindow() {
  const { data: messageRows, isLoading: messagesLoading } =
    useAiChatMessagesQuery(activeChatId ?? undefined);

-  // Live turn-token total (reasoning + output) for the in-flight turn, pushed up
-  // (THROTTLED to ~8 Hz inside ChatThread) so the header badge ticks mid-stream.
-  // `null` means no turn is in flight -> the badge falls back to the persisted
-  // context size below.
-  const [liveTurnTokens, setLiveTurnTokens] = useState<number | null>(null);
-
  // The page the user is currently viewing. AiChatWindow lives in a pathless
  // parent layout route, so useParams() can't see :pageSlug. Match the full
  // pathname against the authenticated page route instead so "the current page"
@@ -195,7 +183,6 @@ export default function AiChatWindow() {
    waitingForHistory,
    onTurnFinished,
    onServerChatId,
-    startFreshThread,
    cancelPendingAdoption,
  } = useChatSession({
    activeChatId,
@@ -210,26 +197,18 @@ export default function AiChatWindow() {

  // startNewChat/selectChat set the public atom; the hook's render-phase
  // reconciler handles the remount when activeChatId actually CHANGES. But
-  // pressing "New chat" while already in a not-yet-adopted new chat leaves
-  // activeChatId === null (a no-op for the atom), so the reconciler never fires —
-  // startFreshThread forces the remount unconditionally (and disarms any armed
-  // error-path fallback) so a late refetch can't yank the user into a just-failed
-  // chat after they chose a fresh one (#161).
+  // pressing "New chat" while already in a new chat leaves activeChatId === null
+  // (a no-op for the atom), so the reconciler never fires — explicitly disarm any
+  // armed error-path fallback here so a late refetch can't yank the user into a
+  // just-failed chat after they chose a fresh one.
  const startNewChat = useCallback((): void => {
-    // Force a fresh thread UNCONDITIONALLY. On a brand-new, not-yet-adopted chat
-    // (activeChatId === null) whose first turn is still streaming, setActiveChatId
-    // (null) is a no-op and the render-phase reconciler would not remount — leaving
-    // the streaming thread in place (#161). startFreshThread guarantees a clean
-    // remount and already disarms any armed error-path fallback (so a separate
-    // cancelPendingAdoption() call here is redundant); the abandoned thread's late
-    // finish is rejected by the threadKey guard in the session hook.
-    startFreshThread();
+    cancelPendingAdoption();
    setActiveChatId(null);
    setHistoryOpen(false);
    setDraft("");
    // Default the picker back to "Universal assistant" for the fresh chat.
    setSelectedRoleId(null);
-  }, [startFreshThread, setActiveChatId, setDraft, setSelectedRoleId]);
+  }, [cancelPendingAdoption, setActiveChatId, setDraft, setSelectedRoleId]);

  const selectChat = useCallback(
    (chatId: string): void => {
@@ -315,6 +294,21 @@ export default function AiChatWindow() {
    return 0;
  }, [activeChatId, messageRows]);

+  // The model's context-window size (badge denominator), read from the most
+  // recent assistant row that carries it. Admin-configured in AI settings and
+  // stamped onto the turn server-side, so it travels with the message metadata —
+  // no client-side model resolution, and it survives public shares / per-role
+  // models automatically. 0 (no limit configured, or older rows) → the badge
+  // hides the denominator and shows only the current context size.
+  const maxContextTokens = useMemo(() => {
+    if (!activeChatId || !messageRows) return 0;
+    for (let i = messageRows.length - 1; i >= 0; i--) {
+      const max = messageRows[i].metadata?.maxContextTokens;
+      if (typeof max === "number" && max > 0) return max;
+    }
+    return 0;
+  }, [activeChatId, messageRows]);
+
  // On (re)open, settle the geometry before paint (useLayoutEffect → no
  // first-frame jump): compute an initial top-right placement the first time,
  // and re-clamp an existing geometry to the current viewport on later opens
@@ -504,23 +498,14 @@ export default function AiChatWindow() {
        )}

        <div style={{ flex: 1, display: "flex", justifyContent: "center" }}>
-          {/* While a turn streams, show the LIVE turn-token count (ticks ~8 Hz);
-              once it finishes, fall back to the persisted context size. Require
-              > 0 so the very first emit (an empty tail message, count 0) does not
-              flash a "0" badge before any token streams in (#151 review). */}
-          {liveTurnTokens !== null && liveTurnTokens > 0 ? (
-            <Tooltip label={t("Tokens generated this turn")} withArrow>
-              <span className={classes.badge}>
-                {formatTokens(liveTurnTokens)}
-              </span>
-            </Tooltip>
-          ) : contextTokens > 0 ? (
-            <Tooltip label={t("Current context size")} withArrow>
-              <span className={classes.badge}>
-                {formatTokens(contextTokens)}
-              </span>
-            </Tooltip>
-          ) : null}
+          {/* Context badge: always "current / max" context size (or just current
+              when no model limit is configured). It no longer flips to a live
+              per-turn generation counter mid-stream — that live feedback lives in
+              the chat body's "Thinking · N tokens" block. */}
+          <ContextBadge
+            contextTokens={contextTokens}
+            maxContextTokens={maxContextTokens}
+          />
        </div>

        <div style={{ display: "flex", alignItems: "center", gap: 1 }}>
@@ -642,9 +627,7 @@ export default function AiChatWindow() {
              onRolePicked={(role) => setSelectedRoleId(role.id)}
              assistantName={currentRole?.name}
              onTurnFinished={onTurnFinished}
-              threadKey={threadKey}
              onServerChatId={onServerChatId}
-              onLiveTurnTokens={setLiveTurnTokens}
            />
          )}
        </div>
--- a/apps/client/src/features/ai-chat/components/chat-thread.test.tsx
+++ b/apps/client/src/features/ai-chat/components/chat-thread.test.tsx
@@ -1,94 +0,0 @@
-import { describe, it, expect, vi, beforeEach } from "vitest";
-import { render, act } from "@testing-library/react";
-import { MantineProvider } from "@mantine/core";
-
-// Capture the options ChatThread passes to useChat so the test can drive the
-// hook's terminal callbacks (here: onError) directly, without a real stream. The
-// box is created via vi.hoisted so the hoisted vi.mock factory below can close
-// over it.
-const { useChatBox } = vi.hoisted(() => ({
-  useChatBox: { options: null as unknown as Record<string, unknown> | null },
-}));
-
-// Mock the AI SDK hook: record the options and return an inert, ready store so
-// ChatThread renders without any network/streaming machinery.
-vi.mock("@ai-sdk/react", () => ({
-  useChat: (options: Record<string, unknown>) => {
-    useChatBox.options = options;
-    return {
-      messages: [],
-      sendMessage: vi.fn(),
-      status: "ready",
-      stop: vi.fn(),
-      error: null,
-    };
-  },
-}));
-
-// Stub react-i18next so `t` returns the key (other component tests use the same
-// pattern); ChatThread's rendered chrome is irrelevant to this wiring test.
-vi.mock("react-i18next", () => ({
-  useTranslation: () => ({ t: (key: string) => key }),
-}));
-
-// Mock the heavy presentational children to trivial stubs — this test only
-// exercises the onError → onTurnFinished wiring, not their rendering.
-vi.mock("@/features/ai-chat/components/message-list.tsx", () => ({
-  default: () => null,
-}));
-vi.mock("@/features/ai-chat/components/chat-input.tsx", () => ({
-  default: () => null,
-}));
-vi.mock("@/features/ai-chat/components/role-cards.tsx", () => ({
-  default: () => null,
-}));
-vi.mock("@/features/ai-chat/components/chat-error-alert.tsx", () => ({
-  default: () => null,
-}));
-vi.mock("@/features/ai-chat/components/chat-stopped-notice.tsx", () => ({
-  default: () => null,
-}));
-
-import ChatThread from "./chat-thread";
-
-// matchMedia (read by MantineProvider) is stubbed globally in vitest.setup.ts.
-
-describe("ChatThread onError wiring (#161)", () => {
-  beforeEach(() => {
-    useChatBox.options = null;
-    vi.clearAllMocks();
-  });
-
-  it("onError calls onTurnFinished with (undefined, threadKey) so a late error from an abandoned thread is rejected", () => {
-    const onTurnFinished = vi.fn();
-    // Silence the deliberate console.error ChatThread logs for devtools.
-    const consoleError = vi
-      .spyOn(console, "error")
-      .mockImplementation(() => {});
-
-    render(
-      <MantineProvider>
-        <ChatThread
-          chatId="c1"
-          onTurnFinished={onTurnFinished}
-          threadKey="thread-key-1"
-        />
-      </MantineProvider>,
-    );
-
-    const options = useChatBox.options;
-    expect(options).not.toBeNull();
-    expect(typeof options?.onError).toBe("function");
-
-    // Drive the captured onError exactly as the AI SDK would on a stream error.
-    act(() => {
-      (options!.onError as (e: Error) => void)(new Error("stream blew up"));
-    });
-
-    // The thread's own mount key must be forwarded with NO server id, so the
-    // session hook can reject this finish if the thread has been abandoned.
-    expect(onTurnFinished).toHaveBeenCalledWith(undefined, "thread-key-1");
-
-    consoleError.mockRestore();
-  });
-});
--- a/apps/client/src/features/ai-chat/components/chat-thread.tsx
+++ b/apps/client/src/features/ai-chat/components/chat-thread.tsx
@@ -20,7 +20,6 @@ import {
 } from "@/features/ai-chat/utils/role-launch.ts";
 import { describeChatError } from "@/features/ai-chat/utils/error-message.ts";
 import { extractServerChatId } from "@/features/ai-chat/utils/adopt-chat-id.ts";
-import { liveTurnTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
 import {
  dequeue,
  enqueueMessage,
@@ -60,24 +59,13 @@ interface ChatThreadProps {
   *  new chat, adopts the freshly created chat id. `serverChatId` is the
   *  authoritative id the server streamed on the assistant message metadata, or
   *  undefined on a failed turn — see adopt-chat-id.ts for the full #137 design. */
-  onTurnFinished: (serverChatId?: string, finishingThreadKey?: string) => void;
-  /** This thread's mount key (the same value the parent uses as React `key`).
-   *  Forwarded back through onTurnFinished so the session hook can tell a finish
-   *  from THIS still-mounted thread from a late finish of an abandoned thread the
-   *  user already left via New chat / switch (#161). */
-  threadKey?: string;
+  onTurnFinished: (serverChatId?: string) => void;
  /** Called EARLY (at the stream's `start` chunk) with the authoritative server
   *  chat id streamed on the assistant message metadata, so a brand-new chat
   *  adopts its real id WHILE the first turn is still streaming (#174 — makes the
   *  Copy/export button available mid-stream). Distinct from onTurnFinished,
   *  which fires only at the terminal outcome. */
  onServerChatId?: (serverChatId?: string) => void;
-  /** Reports the live turn-token total (reasoning + output) for the in-flight
-   *  turn so the parent can show a header badge that ticks mid-stream. THROTTLED
-   *  here (~8 Hz) so the parent re-renders a handful of times a second, not on
-   *  every streamed delta. Called with `null` when no turn is in flight (the
-   *  parent then reverts the badge to the persisted context size). */
-  onLiveTurnTokens?: (tokens: number | null) => void;
 }

 /**
@@ -121,9 +109,7 @@ export default function ChatThread({
  onRolePicked,
  assistantName,
  onTurnFinished,
-  threadKey,
  onServerChatId,
-  onLiveTurnTokens,
 }: ChatThreadProps) {
  const { t } = useTranslation();

@@ -264,7 +250,7 @@ export default function ChatThread({
      // Forward the authoritative server chatId (streamed on the assistant
      // message metadata) so the parent adopts the REAL created chat id for a new
      // chat — see adopt-chat-id.ts for the full #137 design.
-      onTurnFinished(extractServerChatId(message), threadKey);
+      onTurnFinished(extractServerChatId(message));
      // Show a neutral "stopped" marker for an aborted turn; the red error banner
      // (via `error`) already covers isError, and a clean finish clears any marker.
      if (isError) setStopNotice(null);
@@ -285,7 +271,7 @@ export default function ChatThread({
      // Surface the raw failure in the browser console (devtools) for debugging;
      // the UI separately shows a friendly classified banner (see errorView).
      console.error("AI chat stream error:", streamError);
-      onTurnFinished(undefined, threadKey);
+      onTurnFinished();
    },
  });

@@ -334,53 +320,6 @@ export default function ChatThread({
  // the SAME on-screen banner text can be mirrored into the export (issue #160).
  const errorView = error ? describeChatError(error.message ?? "", t) : null;

-  // Report the live turn-token total to the parent header badge, THROTTLED to
-  // ~8 Hz so the parent re-renders a few times a second instead of on every
-  // streamed delta. The tail assistant message's reasoning+output (estimate while
-  // streaming, authoritative once a step reports usage) is the live figure. When
-  // the turn ends we emit a final exact value, then `null` so the parent reverts
-  // the badge to the persisted context size.
-  const lastEmitRef = useRef(0);
-  const emitTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
-  useEffect(() => {
-    if (!onLiveTurnTokens) return;
-    if (!isStreaming) {
-      // Turn ended (or never started): clear any pending throttle and revert.
-      if (emitTimerRef.current) {
-        clearTimeout(emitTimerRef.current);
-        emitTimerRef.current = null;
-      }
-      lastEmitRef.current = 0;
-      onLiveTurnTokens(null);
-      return;
-    }
-    const tail = messages[messages.length - 1];
-    const live = tail?.role === "assistant" ? liveTurnTokens(tail) : null;
-    const total = live ? live.reasoning + live.output : 0;
-    const now = Date.now();
-    const MIN_INTERVAL = 120; // ms (~8 Hz)
-    const elapsed = now - lastEmitRef.current;
-    if (elapsed >= MIN_INTERVAL) {
-      lastEmitRef.current = now;
-      onLiveTurnTokens(total);
-    } else if (!emitTimerRef.current) {
-      // Schedule a trailing emit so the FINAL value of a burst is not dropped.
-      emitTimerRef.current = setTimeout(() => {
-        emitTimerRef.current = null;
-        lastEmitRef.current = Date.now();
-        onLiveTurnTokens(total);
-      }, MIN_INTERVAL - elapsed);
-    }
-  }, [messages, isStreaming, onLiveTurnTokens]);
-
-  // Clear any pending throttle timer on unmount (chat switch via `key`) so a
-  // trailing emit can't fire into a torn-down thread's parent.
-  useEffect(() => {
-    return () => {
-      if (emitTimerRef.current) clearTimeout(emitTimerRef.current);
-    };
-  }, []);
-
  // A role was picked with autoStart=false: the role is bound but NOTHING was
  // sent, so chatId stays null and the empty state would keep showing the cards.
  // This flag hides the cards and reveals the composer (with the role indicated)
--- a/apps/client/src/features/ai-chat/components/context-badge.test.tsx
+++ b/apps/client/src/features/ai-chat/components/context-badge.test.tsx
@@ -0,0 +1,69 @@
+import { describe, it, expect } from "vitest";
+import { render, screen, fireEvent } from "@testing-library/react";
+import { MantineProvider } from "@mantine/core";
+import { ContextBadge, formatTokens } from "./context-badge";
+
+// matchMedia (read by MantineProvider) is stubbed globally in vitest.setup.ts.
+// Without an I18nextProvider, `t(key)` returns the key verbatim, so tooltip
+// labels assert against their English source strings.
+
+function renderBadge(props: {
+  contextTokens: number;
+  maxContextTokens?: number;
+}) {
+  return render(
+    <MantineProvider>
+      <ContextBadge {...props} />
+    </MantineProvider>,
+  );
+}
+
+describe("formatTokens", () => {
+  it("formats with k / M suffixes", () => {
+    expect(formatTokens(572)).toBe("572");
+    expect(formatTokens(200_000)).toBe("200.0k");
+    expect(formatTokens(1_500_000)).toBe("1.5M");
+  });
+});
+
+describe("ContextBadge", () => {
+  it("shows `current / max` when a limit is configured", () => {
+    renderBadge({ contextTokens: 572, maxContextTokens: 200_000 });
+    expect(screen.getByText("572 / 200.0k")).toBeDefined();
+  });
+
+  it("shows only the current size when no limit is configured", () => {
+    renderBadge({ contextTokens: 572, maxContextTokens: 0 });
+    expect(screen.getByText("572")).toBeDefined();
+    // No denominator rendered.
+    expect(screen.queryByText(/\//)).toBeNull();
+  });
+
+  it("treats an undefined limit as no limit", () => {
+    renderBadge({ contextTokens: 1234 });
+    expect(screen.getByText("1.2k")).toBeDefined();
+    expect(screen.queryByText(/\//)).toBeNull();
+  });
+
+  it("renders nothing until there is a current context size", () => {
+    const { container } = renderBadge({
+      contextTokens: 0,
+      maxContextTokens: 200_000,
+    });
+    expect(container.querySelector("span")).toBeNull();
+  });
+
+  it("never flips to a live per-turn counter (no live mode); shows context as-is even above max", () => {
+    // `current > max` (estimate drift / smaller-model role) is shown unclamped.
+    renderBadge({ contextTokens: 210_000, maxContextTokens: 200_000 });
+    expect(screen.getByText("210.0k / 200.0k")).toBeDefined();
+  });
+
+  it("exposes the limit tooltip label on hover", async () => {
+    renderBadge({ contextTokens: 572, maxContextTokens: 200_000 });
+    fireEvent.mouseEnter(screen.getByText("572 / 200.0k"));
+    expect(
+      await screen.findByText("Context size / model limit"),
+    ).toBeDefined();
+  });
+});
--- a/apps/client/src/features/ai-chat/components/context-badge.tsx
+++ b/apps/client/src/features/ai-chat/components/context-badge.tsx
@@ -0,0 +1,61 @@
+import { Tooltip } from "@mantine/core";
+import { useTranslation } from "react-i18next";
+import classes from "@/features/ai-chat/components/ai-chat-window.module.css";
+
+/** Compact token formatter: 1.2M / 3.4k / 950. */
+export function formatTokens(n: number): string {
+  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
+  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}k`;
+  return String(n);
+}
+
+interface ContextBadgeProps {
+  // Current context size for the active chat (tokens occupied in the model's
+  // window). 0 = unknown → nothing is rendered.
+  contextTokens: number;
+  // The model's context-window size (tokens), from AI settings. 0/undefined =
+  // no limit known → only the current size is shown (no denominator).
+  maxContextTokens?: number;
+}
+
+/**
+ * Header badge that ALWAYS shows the current context size, and — when the model's
+ * context-window size is configured — appends "/ max" so the badge reads
+ * "current / max" (e.g. `572 / 200k`). This is a single, stable meaning: unlike
+ * the previous design it never flips to a live per-turn generation counter while
+ * streaming (that live feedback lives in the chat body's "Thinking · N tokens").
+ *
+ * No limit configured (or older history rows without it) → the denominator is
+ * hidden and the badge shows the current size only, matching the prior at-rest
+ * behaviour. `context > max` (estimate drift, or a role on a smaller model) is
+ * shown as-is, without clamping.
+ */
+export function ContextBadge({
+  contextTokens,
+  maxContextTokens,
+}: ContextBadgeProps) {
+  const { t } = useTranslation();
+
+  // Nothing to show until the first persisted context figure exists.
+  if (!(contextTokens > 0)) return null;
+
+  const hasMax = typeof maxContextTokens === "number" && maxContextTokens > 0;
+  const label = hasMax
+    ? `${formatTokens(contextTokens)} / ${formatTokens(maxContextTokens)}`
+    : formatTokens(contextTokens);
+
+  return (
+    <Tooltip
+      label={
+        hasMax
+          ? t("Context size / model limit")
+          : t("Current context size")
+      }
+      withArrow
+    >
+      <span className={classes.badge}>{label}</span>
+    </Tooltip>
+  );
+}
+
+export default ContextBadge;
--- a/apps/client/src/features/ai-chat/hooks/use-chat-session.test.tsx
+++ b/apps/client/src/features/ai-chat/hooks/use-chat-session.test.tsx
@@ -1,5 +1,5 @@
 import { describe, it, expect, vi, beforeEach } from "vitest";
-import { renderHook, act } from "@testing-library/react";
+import { renderHook } from "@testing-library/react";
 import { useChatSession } from "./use-chat-session";
 import type { UseChatSessionOptions } from "./use-chat-session";

@@ -120,40 +120,16 @@ describe("useChatSession", () => {
    expect(setActiveChatId).not.toHaveBeenCalledWith("new");
  });

-  it("cancelPendingAdoption (selectChat) disarms a late refetch from adopting the just-failed chat", () => {
-    // cancelPendingAdoption is the explicit disarm the window calls from
-    // selectChat: switching to a chat whose id == null is a no-op for the atom, so
-    // the render-phase reconciler never fires and only this call disarms an armed
-    // error-path fallback. (startNewChat no longer routes through here — it calls
-    // startFreshThread, covered by the next test — but cancelPendingAdoption still
-    // backs selectChat, so this guard must hold.)
+  it("startNewChat while already in a new chat: cancelPendingAdoption stops a late refetch adopting the failed chat", () => {
+    // The Warning path the render-phase reconciler can't catch: pressing "New
+    // chat" while already in a new chat keeps activeChatId === null (a no-op for
+    // the atom), so only the explicit cancelPendingAdoption() disarms.
    const { result, rerender, setActiveChatId } = setup({
      activeChatId: null,
      chats: { items: [{ id: "x" }] },
    });
    result.current.onTurnFinished(undefined); // first turn failed → arm (before=["x"])
-    result.current.cancelPendingAdoption(); // window calls this from selectChat
-    // The just-failed row lands in a late refetch; it must NOT be adopted.
-    rerender({
-      activeChatId: null,
-      chats: { items: [{ id: "x" }, { id: "failed" }] },
-    });
-    expect(setActiveChatId).not.toHaveBeenCalledWith("failed");
-  });
-
-  it("#161: startFreshThread disarms the armed error-path fallback (New chat during the first turn)", () => {
-    // Pressing "New chat" while already in a not-yet-adopted new chat keeps
-    // activeChatId === null, so the render-phase reconciler never fires. The
-    // window now calls startFreshThread() (NOT cancelPendingAdoption) to force a
-    // fresh thread; this test pins the load-bearing fact that startFreshThread
-    // ALSO nulls pendingNewChatRef, so a late refetch of the just-failed row can't
-    // yank the user back into the abandoned chat.
-    const { result, rerender, setActiveChatId } = setup({
-      activeChatId: null,
-      chats: { items: [{ id: "x" }] },
-    });
-    result.current.onTurnFinished(undefined); // first turn failed → arm (before=["x"])
-    act(() => result.current.startFreshThread()); // "New chat" → fresh thread + disarm
+    result.current.cancelPendingAdoption(); // window calls this from startNewChat
    // The just-failed row lands in a late refetch; it must NOT be adopted.
    rerender({
      activeChatId: null,
@@ -251,48 +227,6 @@ describe("useChatSession", () => {
    expect(result.current.threadKey).toBe("C");
  });

-  it("#161: startFreshThread remounts even when activeChatId stays null (New chat mid-stream)", () => {
-    // The bug: pressing New chat while still on a brand-new, not-yet-adopted chat
-    // leaves activeChatId === null, so the render-phase reconciler never fires.
-    // startFreshThread must remount UNCONDITIONALLY (a new mount key).
-    const { result } = setup({ activeChatId: null, chats: { items: [] } });
-    const keyBefore = result.current.threadKey;
-    act(() => result.current.startFreshThread());
-    expect(result.current.threadKey).not.toBe(keyBefore);
-  });
-
-  it("#161: a late finish from an ABANDONED thread does not adopt or invalidate messages", () => {
-    const {
-      result,
-      setActiveChatId,
-      onInvalidateChatList,
-      onInvalidateChatMessages,
-    } = setup({ activeChatId: null, chats: { items: [{ id: "x" }] } });
-    const abandonedKey = result.current.threadKey;
-    // User pressed New chat mid-stream → fresh thread (new mount key).
-    act(() => result.current.startFreshThread());
-    expect(result.current.threadKey).not.toBe(abandonedKey);
-    // The left-behind thread's onFinish fires late, carrying ITS (now stale) key
-    // and the server id of the chat the user just left. It must NOT be adopted.
-    act(() => result.current.onTurnFinished("A", abandonedKey));
-    expect(setActiveChatId).not.toHaveBeenCalled();
-    expect(onInvalidateChatMessages).not.toHaveBeenCalled();
-    // The abandoned chat should still surface in the history list.
-    expect(onInvalidateChatList).toHaveBeenCalled();
-  });
-
-  it("#161: a finish from the CURRENT thread (matching key) still adopts", () => {
-    const { result, setActiveChatId } = setup({
-      activeChatId: null,
-      chats: { items: [{ id: "x" }] },
-    });
-    // Same thread that is mounted reports its finish with the matching key.
-    act(() =>
-      result.current.onTurnFinished("A", result.current.threadKey),
-    );
-    expect(setActiveChatId).toHaveBeenCalledWith("A");
-  });
-
  it("waitingForHistory gates the loader only while opening an unloaded existing chat", () => {
    // Open an existing chat whose history is still loading => loader on.
    const { result, rerender } = setup({
--- a/apps/client/src/features/ai-chat/hooks/use-chat-session.ts
+++ b/apps/client/src/features/ai-chat/hooks/use-chat-session.ts
@@ -32,13 +32,8 @@ export interface UseChatSessionResult {
  /** Show the history loader instead of the live thread. */
  waitingForHistory: boolean;
  /** Call when a turn finishes; `serverChatId` is the authoritative streamed id
-   *  (undefined on a failed turn). `finishingThreadKey` is the mount key of the
-   *  thread that produced this callback — when it no longer matches the mounted
-   *  thread (the user pressed New chat / switched mid-stream), the call is from an
-   *  abandoned thread and must NOT adopt or re-arm the fallback. Omitting it (old
-   *  callers / tests) treats the call as belonging to the current thread. Handles
-   *  new-chat id adoption + invalidations. */
-  onTurnFinished: (serverChatId?: string, finishingThreadKey?: string) => void;
+   *  (undefined on a failed turn). Handles new-chat id adoption + invalidations. */
+  onTurnFinished: (serverChatId?: string) => void;
  /** Call EARLY (at the stream's `start` chunk) with the authoritative streamed
   *  chat id so a brand-new chat adopts its real id WHILE its first turn is still
   *  streaming — making `activeChatId`-gated affordances (e.g. the Copy/export
@@ -46,13 +41,6 @@ export interface UseChatSessionResult {
   *  no list/messages invalidation — that is left to onTurnFinished at the end).
   *  Idempotent and a no-op once the chat already has an id. */
  onServerChatId: (serverChatId?: string) => void;
-  /** Force a brand-new, empty thread (new mount key, no chat id) UNCONDITIONALLY.
-   *  The render-phase reconciler only remounts when `activeChatId` actually
-   *  changes; pressing "New chat" while already in a not-yet-adopted new chat
-   *  leaves `activeChatId === null` (a no-op for the atom), so the reconciler
-   *  never fires and the stale streaming thread stays mounted (#161). The window
-   *  calls this from startNewChat to guarantee a fresh thread regardless. */
-  startFreshThread: () => void;
  /** Disarm any pending error-path new-chat fallback. The window calls this from
   *  startNewChat/selectChat so a late refetch can't yank the user back into a
   *  just-failed chat after they explicitly moved on. */
@@ -97,14 +85,6 @@ export function useChatSession(
  const activeChatIdRef = useRef(activeChatId);
  activeChatIdRef.current = activeChatId;

-  // Live mirror of the mounted thread's key, read by onTurnFinished to tell a
-  // current-thread finish from an ABANDONED one. ai@6's useChat does not abort
-  // its request on unmount, and its callbacks are proxied so onFinish/onError of
-  // a thread the user already left (via New chat / switch) still fire AFTER that
-  // thread unmounts. By then this ref holds the NEW thread's key, so comparing it
-  // to the key the finishing thread reports rejects the abandoned turn (#161).
-  const threadKeyRef = useRef<string>("");
-
  // The mounted thread's identity: ONE atomic value tying ChatThread's mount key
  // (`thread.key`) to the chat id that mounted thread holds (`thread.chatId`).
  // Consolidating these makes the "key vs chat id diverged" state unrepresentable
@@ -118,10 +98,6 @@ export function useChatSession(
      : switchThread(activeChatId),
  );

-  // Keep the live mirror pointed at the currently-mounted thread's key so a late
-  // onTurnFinished can be matched against it (see threadKeyRef above).
-  threadKeyRef.current = thread.key;
-
  // Error-path fallback for new-chat id adoption. When a brand-new chat's first
  // turn errors BEFORE the server's `start` chunk, no authoritative chatId ever
  // reaches the client, so the primary metadata adoption cannot run. We then ARM
@@ -139,21 +115,7 @@ export function useChatSession(
  // yet) we adopt the server's AUTHORITATIVE streamed id (never the newest in the
  // list, which races a second tab — #137; see adopt-chat-id.ts).
  const onTurnFinished = useCallback(
-    (serverChatId?: string, finishingThreadKey?: string) => {
-      // Reject a finish from an ABANDONED thread. After the user pressed New chat
-      // (or switched chats) mid-stream, the left-behind thread's onFinish/onError
-      // still fire (ai@6 does not abort on unmount). Adopting/arming off that late
-      // callback would yank the user back into the chat they just left (#161).
-      // `undefined` (legacy callers/tests) is treated as the current thread.
-      const isCurrentThread =
-        finishingThreadKey === undefined ||
-        finishingThreadKey === threadKeyRef.current;
-      if (!isCurrentThread) {
-        // Still surface the abandoned chat in the history list, but do NOT adopt,
-        // arm the fallback, or invalidate per-chat messages (no thread shows it).
-        onInvalidateChatList();
-        return;
-      }
+    (serverChatId?: string) => {
      // Read the live id from the ref, not the closure: on a failed turn this can
      // run twice in one turn (onFinish + onError) before any re-render, and the
      // primary branch below updates the ref so the second call sees the adopted id.
@@ -296,29 +258,11 @@ export function useChatSession(
    pendingNewChatRef.current = null;
  }, []);

-  // Force a fresh, empty thread regardless of the current `activeChatId`. The
-  // render-phase reconciler only remounts on an activeChatId CHANGE, so "New chat"
-  // pressed while already in a not-yet-adopted new chat (activeChatId stays null)
-  // would otherwise leave the in-flight streaming thread mounted (#161). Dispatch
-  // `reconcile` to chatId:null with a brand-new key so React remounts ChatThread
-  // (a fresh useChat store). Disarm any armed fallback too. After this dispatch
-  // thread.chatId is null; the window also sets activeChatId to null, so the
-  // render-phase reconciler then finds them equal and does not double-remount.
-  const startFreshThread = useCallback(() => {
-    pendingNewChatRef.current = null;
-    dispatch({
-      type: "reconcile",
-      chatId: null,
-      newKey: `new-${generateId()}`,
-    });
-  }, []);
-
  return {
    threadKey: thread.key,
    waitingForHistory,
    onTurnFinished,
    onServerChatId,
-    startFreshThread,
    cancelPendingAdoption,
  };
 }
--- a/apps/client/src/features/ai-chat/types/ai-chat.types.ts
+++ b/apps/client/src/features/ai-chat/types/ai-chat.types.ts
@@ -113,9 +113,14 @@ export interface IAiChatMessageRow {
    };
    // Current context size for the turn = final-step (input+output) tokens, i.e.
    // how much the conversation occupies in the model's context window after this
-    // turn. Distinct from `usage` (legacy cumulative totalUsage). Shown in the
-    // floating window's header badge.
+    // turn. Distinct from `usage` (legacy cumulative totalUsage). Shown as the
+    // numerator of the floating window's "current / max" header badge.
    contextTokens?: number;
+    // The model's context-window size (tokens), admin-configured in AI settings
+    // and stamped onto the turn server-side. The denominator of the header badge.
+    // Absent/0 (older rows, or no limit configured) → the badge hides the
+    // denominator and shows only the current context size (`contextTokens`).
+    maxContextTokens?: number;
    // Set on an assistant row whose turn ended in a provider/stream error; the
    // raw provider error text (e.g. "402: ...") for inline display in the thread.
    error?: string;
--- a/apps/client/src/features/ai-chat/utils/count-stream-tokens.test.ts
+++ b/apps/client/src/features/ai-chat/utils/count-stream-tokens.test.ts
@@ -1,17 +1,5 @@
 import { describe, expect, it } from "vitest";
-import type { UIMessage } from "@ai-sdk/react";
-import {
-  estimateTokens,
-  liveTurnTokens,
-} from "@/features/ai-chat/utils/count-stream-tokens.ts";
-
-const msg = (parts: unknown[], metadata?: unknown): UIMessage =>
-  ({
-    id: Math.random().toString(),
-    role: "assistant",
-    parts,
-    metadata,
-  }) as UIMessage;
+import { estimateTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";

 describe("estimateTokens", () => {
  it("returns 0 for the empty string", () => {
@@ -25,147 +13,3 @@ describe("estimateTokens", () => {
    expect(estimateTokens("12345678")).toBe(2);
  });
 });
-
-describe("liveTurnTokens — estimate path", () => {
-  it("is all zeros for an undefined message", () => {
-    expect(liveTurnTokens(undefined)).toEqual({
-      reasoning: 0,
-      output: 0,
-      authoritative: false,
-    });
-  });
-
-  it("is all zeros for a parts-less message", () => {
-    expect(liveTurnTokens({ id: "x", role: "assistant" } as UIMessage)).toEqual({
-      reasoning: 0,
-      output: 0,
-      authoritative: false,
-    });
-  });
-
-  it("estimates output from text parts", () => {
-    // 8 chars -> 2 tokens.
-    const r = liveTurnTokens(msg([{ type: "text", text: "12345678" }]));
-    expect(r).toEqual({ reasoning: 0, output: 2, authoritative: false });
-  });
-
-  it("estimates reasoning from reasoning parts (kept separate from output)", () => {
-    const r = liveTurnTokens(
-      msg([
-        { type: "reasoning", text: "12345678" },
-        { type: "text", text: "abcd" },
-      ]),
-    );
-    expect(r).toEqual({ reasoning: 2, output: 1, authoritative: false });
-  });
-
-  it("accumulates across multiple text + reasoning parts (multi-step)", () => {
-    const r = liveTurnTokens(
-      msg([
-        { type: "reasoning", text: "abcd" }, // 1
-        { type: "text", text: "abcd" }, // 1
-        { type: "tool-getPage", state: "output-available" }, // ignored
-        { type: "reasoning", text: "abcd" }, // 1
-        { type: "text", text: "abcdefgh" }, // 2
-      ]),
-    );
-    expect(r).toEqual({ reasoning: 2, output: 3, authoritative: false });
-  });
-
-  it("ignores non text/reasoning parts (tools, step-start)", () => {
-    const r = liveTurnTokens(
-      msg([
-        { type: "step-start" },
-        { type: "tool-getPage", state: "input-available" },
-      ]),
-    );
-    expect(r).toEqual({ reasoning: 0, output: 0, authoritative: false });
-  });
-});
-
-describe("liveTurnTokens — authoritative path", () => {
-  it("returns authoritative usage verbatim, splitting reasoning out of output", () => {
-    // outputTokens INCLUDES reasoning in the AI SDK shape -> answer = 100 - 30.
-    const r = liveTurnTokens(
-      msg([{ type: "text", text: "estimate would be tiny" }], {
-        usage: { inputTokens: 500, outputTokens: 100, reasoningTokens: 30 },
-      }),
-    );
-    expect(r).toEqual({ reasoning: 30, output: 70, authoritative: true });
-  });
-
-  it("treats missing reasoningTokens as 0 and keeps full output", () => {
-    const r = liveTurnTokens(
-      msg([{ type: "text", text: "x" }], {
-        usage: { inputTokens: 10, outputTokens: 42 },
-      }),
-    );
-    expect(r).toEqual({ reasoning: 0, output: 42, authoritative: true });
-  });
-
-  it("never returns a negative output when reasoning exceeds reported output", () => {
-    const r = liveTurnTokens(
-      msg([], { usage: { outputTokens: 10, reasoningTokens: 40 } }),
-    );
-    expect(r).toEqual({ reasoning: 40, output: 0, authoritative: true });
-  });
-
-  it("falls back to the estimate when metadata has no usage object", () => {
-    const r = liveTurnTokens(
-      msg([{ type: "text", text: "abcd" }], { chatId: "c1" }),
-    );
-    expect(r).toEqual({ reasoning: 0, output: 1, authoritative: false });
-  });
-});
-
-describe("liveTurnTokens — combined authoritative + estimate (#163)", () => {
-  it("ticks the in-flight step above the completed-steps authoritative base", () => {
-    // The authoritative usage is the sum over COMPLETED steps (step 1). The
-    // CURRENT step is streaming and its text is NOT in `usage` yet, but it IS in
-    // the parts -> the running estimate must push the live figure above the base
-    // so the badge keeps growing between step boundaries.
-    const longText = "x".repeat(800); // 800 chars -> 200 est output tokens
-    const r = liveTurnTokens(
-      msg([{ type: "text", text: longText }], {
-        usage: { inputTokens: 500, outputTokens: 40 }, // step-1 base: 40 output
-      }),
-    );
-    // max(authOutput=40, estOutput=200) = 200 -> the counter ticks, not frozen.
-    expect(r.output).toBe(200);
-    expect(r.authoritative).toBe(true);
-  });
-
-  it("ticks reasoning of the in-flight step above the authoritative reasoning base", () => {
-    const longReasoning = "r".repeat(400); // 400 chars -> 100 est reasoning
-    const r = liveTurnTokens(
-      msg([{ type: "reasoning", text: longReasoning }], {
-        usage: { inputTokens: 100, outputTokens: 20, reasoningTokens: 20 },
-      }),
-    );
-    // reasoning: max(20, 100) = 100 ; output: max(max(0,20-20)=0, 0) = 0.
-    expect(r.reasoning).toBe(100);
-    expect(r.output).toBe(0);
-    expect(r.authoritative).toBe(true);
-  });
-
-  it("snaps to the authoritative figure once it exceeds the rough estimate", () => {
-    // Short on-screen text (estimate tiny) but a large authoritative output:
-    // the exact figure wins at the boundary (the counter never under-reports).
-    const r = liveTurnTokens(
-      msg([{ type: "text", text: "abcd" }], {
-        usage: { inputTokens: 10, outputTokens: 5000 },
-      }),
-    );
-    expect(r.output).toBe(5000);
-  });
-
-  it("is monotonic: max never drops below the authoritative base when the estimate is smaller", () => {
-    // Mirrors the legacy 'verbatim' tests: estimate < authoritative -> unchanged.
-    const r = liveTurnTokens(
-      msg([{ type: "text", text: "tiny" }], {
-        usage: { inputTokens: 500, outputTokens: 100, reasoningTokens: 30 },
-      }),
-    );
-    expect(r).toEqual({ reasoning: 30, output: 70, authoritative: true });
-  });
-});
--- a/apps/client/src/features/ai-chat/utils/count-stream-tokens.ts
+++ b/apps/client/src/features/ai-chat/utils/count-stream-tokens.ts
@@ -1,18 +1,16 @@
-import type { UIMessage } from "@ai-sdk/react";
-
 /**
- * Live token counting for a streaming AI-chat turn — split into REASONING
- * (thinking) and OUTPUT (answer) tokens, mirroring how Claude Code shows
- * `Thinking… · 60 tokens` next to its thinking indicator.
+ * Live token ESTIMATION for a streaming AI-chat turn.
 *
 * No provider streams exact per-token usage mid-stream, so the live number is a
- * CLIENT ESTIMATE (chars/≈4 heuristic) that is reconciled to AUTHORITATIVE usage
- * once the server attaches it on a step/turn boundary (see the server's
- * `chatStreamMetadata` + the client's read of `message.metadata.usage`). When
- * authoritative usage is present we return it verbatim (the number "jumps to
- * exact"); otherwise we return the running estimate. Pure + unit-testable: it
- * never runs a real BPE tokenizer (that would be O(n²) on the hot path, bloat the
+ * CLIENT ESTIMATE (chars/≈4 heuristic). It powers the chat body's
+ * `Thinking… · N tokens` indicator (see `ReasoningBlock`), which reconciles to
+ * the authoritative server usage once it lands. Pure + unit-testable: it never
+ * runs a real BPE tokenizer (that would be O(n²) on the hot path, bloat the
 * bundle, and be wrong for Gemini/Ollama anyway).
+ *
+ * The former header-badge `liveTurnTokens()` split was removed with #189 (the
+ * header badge now shows the stable "current / max" context size, not a live
+ * per-turn counter); the live feedback remains in `ReasoningBlock`.
 */

 /**
@@ -24,90 +22,3 @@ export function estimateTokens(text: string): number {
  if (!text) return 0;
  return Math.ceil(text.length / 4);
 }
-
-/** Authoritative per-step/turn usage the server attaches to message metadata. */
-export interface AuthoritativeUsage {
-  inputTokens?: number;
-  outputTokens?: number;
-  totalTokens?: number;
-  reasoningTokens?: number;
-}
-
-/** Live token split for a turn's tail (streaming) assistant message. */
-export interface LiveTurnTokens {
-  /** Thinking/reasoning tokens (estimate, or authoritative when available). */
-  reasoning: number;
-  /** Answer/output tokens (estimate, or authoritative when available). */
-  output: number;
-  /** True when the numbers come from authoritative server usage, not estimate. */
-  authoritative: boolean;
-}
-
-/** Read the authoritative usage off a UIMessage's metadata, if the server set it. */
-function metadataUsage(message: UIMessage): AuthoritativeUsage | undefined {
-  const meta = message?.metadata as
-    | { usage?: AuthoritativeUsage }
-    | undefined;
-  const usage = meta?.usage;
-  if (!usage || typeof usage !== "object") return undefined;
-  return usage;
-}
-
-/**
- * Token split for the given (streaming) assistant message.
- *
- * COMBINES the authoritative server usage with the running text estimate so the
- * counter ticks in real time AND lands exact. The server only attaches
- * `metadata.usage` at a step/turn boundary (`finish-step`/`finish`) and it is
- * CUMULATIVE over COMPLETED steps — it does NOT yet include the in-flight step.
- * So a multi-step turn that returned the authoritative figure verbatim would
- * FREEZE between boundaries and jump in steps (issue #163).
- *
- * Instead we always compute the running ESTIMATE (chars/≈4 over the message's
- * `reasoning`/`text` parts, which grows on every streamed delta) and take the
- * per-component MAX of the authoritative base and the estimate:
- *   - between boundaries the estimate of the in-flight step ticks the number up;
- *   - at a boundary the authoritative figure snaps it to exact;
- *   - because the server's usage is cumulative and we only ever take the max, the
- *     number is MONOTONIC — it never drops.
- *
- * Providers that don't stream reasoning text still surface a reasoning count once
- * the authoritative usage arrives (`max(reasoningTokens, 0)`); on the pure
- * estimate path (no usage yet) such a turn shows `reasoning: 0` until then.
- */
-export function liveTurnTokens(message: UIMessage | undefined): LiveTurnTokens {
-  if (!message) return { reasoning: 0, output: 0, authoritative: false };
-
-  // Running ESTIMATE over every reasoning/text part — grows on each delta. This
-  // includes the IN-FLIGHT step, which the authoritative usage does not cover yet.
-  let estReasoning = 0;
-  let estOutput = 0;
-  for (const part of message.parts ?? []) {
-    if (part.type === "reasoning") {
-      estReasoning += estimateTokens((part as { text?: string }).text ?? "");
-    } else if (part.type === "text") {
-      estOutput += estimateTokens((part as { text?: string }).text ?? "");
-    }
-  }
-
-  const usage = metadataUsage(message);
-  if (!usage) {
-    // No authoritative usage streamed yet: the estimate IS the live figure.
-    return { reasoning: estReasoning, output: estOutput, authoritative: false };
-  }
-
-  // Authoritative sum over COMPLETED steps. `outputTokens` already INCLUDES
-  // reasoning in the AI SDK usage shape, so subtract it out for the "answer"
-  // figure (never go negative if a provider reports them inconsistently).
-  const authReasoning = usage.reasoningTokens ?? 0;
-  const authOutput = Math.max(0, (usage.outputTokens ?? 0) - authReasoning);
-
-  // Per-component max: the in-flight step's estimate ticks above the completed-
-  // steps base between boundaries, and the authoritative figure wins once it
-  // exceeds the (rough) estimate at the next boundary. Monotonic by construction.
-  return {
-    reasoning: Math.max(authReasoning, estReasoning),
-    output: Math.max(authOutput, estOutput),
-    authoritative: true,
-  };
-}
--- a/apps/client/src/features/workspace/components/settings/components/ai-provider-settings.tsx
+++ b/apps/client/src/features/workspace/components/settings/components/ai-provider-settings.tsx
@@ -7,6 +7,7 @@ import {
  Button,
  Group,
  Modal,
+  NumberInput,
  Paper,
  PasswordInput,
  Select,
@@ -85,6 +86,9 @@ const formSchema = z.object({
  chatModel: z.string(),
  // Chat provider implementation (reasoning surfacing). Default openai-compatible.
  chatApiStyle: z.enum(["openai-compatible", "openai"]),
+  // Model context-window size (tokens) shown as the chat header badge's "max".
+  // Empty string = no limit (NumberInput emits "" when cleared).
+  chatContextWindow: z.union([z.number(), z.literal("")]),
  // Cheap model id for the anonymous public-share assistant; empty = use chatModel.
  publicShareChatModel: z.string(),
  // Agent-role id whose persona the public-share assistant adopts; empty =
@@ -312,6 +316,7 @@ export default function AiProviderSettings() {
    initialValues: {
      chatModel: "",
      chatApiStyle: "openai-compatible" as ChatApiStyle,
+      chatContextWindow: "" as number | "",
      publicShareChatModel: "",
      publicShareAssistantRoleId: "",
      embeddingModel: "",
@@ -335,6 +340,10 @@ export default function AiProviderSettings() {
    form.setValues({
      chatModel: settings.chatModel ?? "",
      chatApiStyle: settings.chatApiStyle ?? "openai-compatible",
+      // 0/unset = no limit → show an empty field (not a literal "0").
+      chatContextWindow: settings.chatContextWindow
+        ? settings.chatContextWindow
+        : "",
      publicShareChatModel: settings.publicShareChatModel ?? "",
      publicShareAssistantRoleId: settings.publicShareAssistantRoleId ?? "",
      embeddingModel: settings.embeddingModel ?? "",
@@ -365,6 +374,11 @@ export default function AiProviderSettings() {
      driver: "openai",
      chatModel: values.chatModel,
      chatApiStyle: values.chatApiStyle,
+      // Empty → 0, which clears the limit server-side (badge shows current only).
+      chatContextWindow:
+        typeof values.chatContextWindow === "number"
+          ? values.chatContextWindow
+          : 0,
      // Cheap model id for the anonymous public-share assistant; empty falls
      // back to chatModel server-side.
      publicShareChatModel: values.publicShareChatModel,
@@ -785,6 +799,22 @@ export default function AiProviderSettings() {
          {...form.getInputProps("chatApiStyle")}
        />

+        <NumberInput
+          mt="sm"
+          label={t("Context window (tokens)")}
+          description={t(
+            "Shows used / total in the chat header badge; empty hides the total.",
+          )}
+          placeholder={t("e.g. 200000")}
+          min={0}
+          step={1000}
+          allowDecimal={false}
+          allowNegative={false}
+          thousandSeparator=" "
+          disabled={isLoading}
+          {...form.getInputProps("chatContextWindow")}
+        />
+
        {/* Anonymous public-share assistant: a single master toggle + an
            optional cheaper model id. Reuses this card's driver/URL/key. */}
        <Group justify="space-between" align="center" wrap="nowrap" mt="md">
--- a/apps/client/src/features/workspace/services/ai-settings-service.ts
+++ b/apps/client/src/features/workspace/services/ai-settings-service.ts
@@ -23,6 +23,9 @@ export interface IAiSettings {
  driver?: AiDriver;
  chatModel?: string;
  chatApiStyle?: ChatApiStyle;
+  // Chat model context-window size (tokens); shown as the "max" in the chat
+  // header context badge. 0/unset = no limit (badge shows the current size only).
+  chatContextWindow?: number;
  // Cheap model id for the anonymous public-share assistant; empty = chatModel.
  publicShareChatModel?: string;
  // Agent-role id whose persona the public-share assistant adopts; empty =
@@ -57,6 +60,8 @@ export interface IAiSettingsUpdate {
  driver?: AiDriver;
  chatModel?: string;
  chatApiStyle?: ChatApiStyle;
+  // Chat model context-window size (tokens); 0 clears the limit.
+  chatContextWindow?: number;
  publicShareChatModel?: string;
  // Agent-role id whose persona the public-share assistant adopts; empty =
  // built-in locked persona.
--- a/apps/server/src/core/ai-chat/ai-chat.service.spec.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.service.spec.ts
@@ -292,6 +292,26 @@ describe('flushAssistant', () => {
    expect(f.metadata.contextTokens).toBe(15);
  });

+  it('completed: writes maxContextTokens when the model limit is > 0', () => {
+    const f = flushAssistant([toolStep], '', 'completed', {
+      contextTokens: 15,
+      maxContextTokens: 200_000,
+    });
+    expect(f.metadata.maxContextTokens).toBe(200_000);
+  });
+
+  it('omits maxContextTokens when the limit is unset or 0', () => {
+    const unset = flushAssistant([toolStep], '', 'completed', {
+      contextTokens: 15,
+    });
+    expect('maxContextTokens' in unset.metadata).toBe(false);
+    const zero = flushAssistant([toolStep], '', 'completed', {
+      contextTokens: 15,
+      maxContextTokens: 0,
+    });
+    expect('maxContextTokens' in zero.metadata).toBe(false);
+  });
+
  it('error: records the error and a derived finishReason', () => {
    const f = flushAssistant([], 'partial answer', 'error', { error: 'boom' });
    expect(f.status).toBe('error');
--- a/apps/server/src/core/ai-chat/ai-chat.service.ts
+++ b/apps/server/src/core/ai-chat/ai-chat.service.ts
@@ -616,6 +616,9 @@ export class AiChatService implements OnModuleInit {
              contextTokens:
                (usage?.inputTokens ?? 0) + (usage?.outputTokens ?? 0) ||
                undefined,
+              // Admin-configured context-window size for this model (badge max).
+              // Resolved once per turn above; written to metadata only when > 0.
+              maxContextTokens: resolved?.chatContextWindow,
            }),
          );
          // Lifecycle: release the external MCP clients leased for this turn.
@@ -1223,6 +1226,10 @@ export function flushAssistant(
    finishReason?: string;
    usage?: ChatStreamUsage | StreamUsage | undefined;
    contextTokens?: number;
+    // Admin-configured context-window size (tokens) for this turn's model; the
+    // denominator of the client's "current / max" header badge. Written only
+    // when > 0 (0/unset = no limit known → the badge shows current only).
+    maxContextTokens?: number;
    error?: string;
  },
 ): AssistantFlush {
@@ -1253,6 +1260,9 @@ export function flushAssistant(
      normalizeStreamUsage(extra.usage as StreamUsage) ?? extra.usage;
  }
  if (extra?.contextTokens) metadata.contextTokens = extra.contextTokens;
+  if (extra?.maxContextTokens && extra.maxContextTokens > 0) {
+    metadata.maxContextTokens = extra.maxContextTokens;
+  }
  if (extra?.error) metadata.error = extra.error;

  return {
--- a/apps/server/src/database/repos/workspace/workspace.repo.ts
+++ b/apps/server/src/database/repos/workspace/workspace.repo.ts
@@ -21,6 +21,7 @@ export const AI_PROVIDER_SETTINGS_ALLOWED: readonly string[] = [
  'driver',
  'chatModel',
  'chatApiStyle',
+  'chatContextWindow',
  'embeddingModel',
  'baseUrl',
  'embeddingBaseUrl',
@@ -255,11 +256,17 @@ export class WorkspaceRepo {
  ): Promise<Workspace> {
    const db = dbOrTx(this.db, trx);
    // Assemble the provider object IN SQL. Keys are fixed provider field names
-    // (sql.lit -> inlined literals, no injection); values are bound params cast
-    // to ::text — postgres.js sends bound params untyped, and jsonb_build_object's
-    // value args are polymorphic ("any"), so without the explicit ::text cast
-    // Postgres throws "could not determine data type of parameter $1". The result
-    // is a real jsonb object, never a double-encoded string. The CASE self-heals
+    // (sql.lit -> inlined literals, no injection); values are bound params with
+    // an explicit cast — postgres.js sends bound params untyped, and
+    // jsonb_build_object's value args are polymorphic ("any"), so without the
+    // cast Postgres throws "could not determine data type of parameter $1". The
+    // cast is branched by the JS runtime type so the value lands in jsonb with
+    // the matching JSON type: a number stays a JSON number (e.g.
+    // chatContextWindow → `{"chatContextWindow":200000}`, jsonb_typeof 'number'),
+    // a boolean a JSON boolean, everything else a JSON string. A plain `::text`
+    // for all would store a numeric field as the JSON STRING `"200000"`, which
+    // the client's `typeof === "number"` guards reject. The result is a real
+    // jsonb object, never a double-encoded string. The CASE self-heals
    // workspaces whose settings.ai.provider was previously corrupted into an
    // array/string.
    const entries = Object.entries(provider).filter(
@@ -267,7 +274,14 @@ export class WorkspaceRepo {
    );
    const patch = entries.length
      ? sql`jsonb_build_object(${sql.join(
-          entries.flatMap(([k, v]) => [sql.lit(k), sql`${v}::text`]),
+          entries.flatMap(([k, v]) => [
+            sql.lit(k),
+            typeof v === 'number'
+              ? sql`${v}::numeric`
+              : typeof v === 'boolean'
+                ? sql`${v}::boolean`
+                : sql`${v}::text`,
+          ]),
        )})`
      : sql`'{}'::jsonb`;
    return db
--- a/apps/server/src/integrations/ai/ai-provider-settings-keys.spec.ts
+++ b/apps/server/src/integrations/ai/ai-provider-settings-keys.spec.ts
@@ -41,3 +41,35 @@ describe('UpdateAiSettingsDto.chatApiStyle', () => {
    expect(errs.find((e) => e.property === 'chatApiStyle')).toBeUndefined();
  });
 });
+
+/** DTO validation for chatContextWindow (@IsOptional @IsInt @Min(0)). */
+describe('UpdateAiSettingsDto.chatContextWindow', () => {
+  const errorsFor = async (chatContextWindow: unknown) =>
+    validate(plainToInstance(UpdateAiSettingsDto, { chatContextWindow }));
+
+  it('accepts a non-negative integer (incl. 0 = clear the limit)', async () => {
+    for (const v of [0, 200000]) {
+      const errs = await errorsFor(v);
+      expect(
+        errs.find((e) => e.property === 'chatContextWindow'),
+      ).toBeUndefined();
+    }
+  });
+
+  it('rejects a negative value', async () => {
+    const errs = await errorsFor(-1);
+    expect(errs.find((e) => e.property === 'chatContextWindow')).toBeDefined();
+  });
+
+  it('rejects a non-integer value', async () => {
+    const errs = await errorsFor(1.5);
+    expect(errs.find((e) => e.property === 'chatContextWindow')).toBeDefined();
+  });
+
+  it('accepts the field being omitted (optional)', async () => {
+    const errs = await validate(plainToInstance(UpdateAiSettingsDto, {}));
+    expect(
+      errs.find((e) => e.property === 'chatContextWindow'),
+    ).toBeUndefined();
+  });
+});
--- a/apps/server/src/integrations/ai/ai-settings.service.ts
+++ b/apps/server/src/integrations/ai/ai-settings.service.ts
@@ -27,6 +27,8 @@ export interface UpdateAiSettingsInput {
  driver?: AiDriver;
  chatModel?: string;
  chatApiStyle?: ChatApiStyle;
+  // Chat context-window size (tokens); 0/empty clears the limit.
+  chatContextWindow?: number;
  embeddingModel?: string;
  baseUrl?: string;
  embeddingBaseUrl?: string;
@@ -162,6 +164,8 @@ export class AiSettingsService {
      chatModel: provider.chatModel,
      // Plain passthrough; getChatModel defaults unset to 'openai-compatible'.
      chatApiStyle: provider.chatApiStyle,
+      // Admin-configured context-window size; 0/unset = no limit (badge denominator).
+      chatContextWindow: provider.chatContextWindow,
      // Cheap model id for the anonymous public-share assistant; reuses the chat
      // driver/baseUrl/apiKey. Empty/unset → callers fall back to chatModel.
      publicShareChatModel: provider.publicShareChatModel,
@@ -244,6 +248,7 @@ export class AiSettingsService {
      driver: provider.driver,
      chatModel: provider.chatModel,
      chatApiStyle: provider.chatApiStyle,
+      chatContextWindow: provider.chatContextWindow,
      embeddingModel: provider.embeddingModel,
      baseUrl: provider.baseUrl,
      embeddingBaseUrl: provider.embeddingBaseUrl,
--- a/apps/server/src/integrations/ai/ai.types.ts
+++ b/apps/server/src/integrations/ai/ai.types.ts
@@ -35,6 +35,13 @@ export interface AiProviderSettings {
  // Chat provider implementation for the `openai` driver. Unset → defaults to
  // 'openai-compatible' (so reasoning is surfaced by default). See ChatApiStyle.
  chatApiStyle?: ChatApiStyle;
+  // Admin-configured chat model context-window size, in tokens. There is no
+  // provider-independent way to discover this (OpenAI's /v1/models usually omits
+  // it, Gemini/Ollama/OpenRouter each expose it differently), so it is entered
+  // manually. Surfaced to the chat client (via assistant message metadata) as the
+  // denominator of the header "current / max" context badge. Empty/0 = no limit
+  // known → the badge shows only the current context size.
+  chatContextWindow?: number;
  embeddingModel?: string;
  baseUrl?: string;
  // Embedding-specific base URL. Falls back to `baseUrl` when empty/unset.
@@ -73,6 +80,7 @@ export const PROVIDER_SETTINGS_KEYS = [
  'driver',
  'chatModel',
  'chatApiStyle',
+  'chatContextWindow',
  'embeddingModel',
  'baseUrl',
  'embeddingBaseUrl',
@@ -98,6 +106,10 @@ export const PROVIDER_SETTINGS_KEYS = [
 export interface ResolvedAiConfig extends Partial<AiProviderSettings> {
  driver?: AiDriver;
  chatModel?: string;
+  // Admin-configured chat context-window size (tokens); 0/unset = no limit. Used
+  // as the header context-badge denominator. Re-declared for parity with the
+  // explicit fields above.
+  chatContextWindow?: number;
  // Cheap model id for the public-share assistant; reuses the chat creds.
  publicShareChatModel?: string;
  // Agent-role id whose persona the public-share assistant adopts (empty/unset
@@ -117,6 +129,8 @@ export interface MaskedAiSettings {
  driver?: AiDriver;
  chatModel?: string;
  chatApiStyle?: ChatApiStyle;
+  // Admin-configured chat context-window size (tokens); 0/unset = no limit.
+  chatContextWindow?: number;
  embeddingModel?: string;
  baseUrl?: string;
  embeddingBaseUrl?: string;
--- a/apps/server/src/integrations/ai/dto/update-ai-settings.dto.ts
+++ b/apps/server/src/integrations/ai/dto/update-ai-settings.dto.ts
@@ -1,4 +1,4 @@
-import { IsIn, IsOptional, IsString } from 'class-validator';
+import { IsIn, IsInt, IsOptional, IsString, Min } from 'class-validator';
 import {
  AI_DRIVERS,
  AiDriver,
@@ -29,6 +29,13 @@ export class UpdateAiSettingsDto {
  @IsIn(CHAT_API_STYLES)
  chatApiStyle?: ChatApiStyle;

+  // Chat model context-window size in tokens (header context-badge denominator).
+  // 0 (or empty) clears the limit so the badge shows only the current context.
+  @IsOptional()
+  @IsInt()
+  @Min(0)
+  chatContextWindow?: number;
+
  @IsOptional()
  @IsString()
  embeddingModel?: string;
--- a/apps/server/test/integration/workspace-repo-ai-provider-settings.int-spec.ts
+++ b/apps/server/test/integration/workspace-repo-ai-provider-settings.int-spec.ts
@@ -0,0 +1,91 @@
+import { Kysely, sql } from 'kysely';
+import { WorkspaceRepo } from '@docmost/db/repos/workspace/workspace.repo';
+import { getTestDb, destroyTestDb, createWorkspace } from './db';
+
+/**
+ * WorkspaceRepo.updateAiProviderSettings numeric round-trip (#189, #213).
+ *
+ * `chatContextWindow` is the first NUMERIC provider field routed through this
+ * generic SQL layer. The patch builder must cast a JS number so it lands in
+ * jsonb as a JSON NUMBER, not the JSON STRING `"200000"` — the client guards
+ * (`typeof === "number"`) reject a string, silently killing the `/ max` badge
+ * denominator. A plain `::text` cast (the prior code) caused exactly this.
+ * These specs run real SQL and assert both the JS value type and the on-disk
+ * `jsonb_typeof`.
+ */
+describe('WorkspaceRepo.updateAiProviderSettings (numeric round-trip) [integration]', () => {
+  let db: Kysely<any>;
+  let repo: WorkspaceRepo;
+
+  beforeAll(() => {
+    db = getTestDb();
+    repo = new WorkspaceRepo(db as any);
+  });
+
+  afterAll(async () => {
+    await destroyTestDb();
+  });
+
+  it('stores chatContextWindow as a JSON number (not a "200000" string)', async () => {
+    const ws = await createWorkspace(db, { settings: undefined });
+
+    const updated = await repo.updateAiProviderSettings(ws.id, {
+      driver: 'openai',
+      chatModel: 'gpt-4o',
+      chatContextWindow: 200000,
+    });
+
+    // Returned row: the number survives as a real JS number, alongside the
+    // string fields which stay strings.
+    const provider = (updated.settings as any)?.ai?.provider;
+    expect(provider.chatContextWindow).toBe(200000);
+    expect(typeof provider.chatContextWindow).toBe('number');
+    expect(provider.driver).toBe('openai');
+    expect(provider.chatModel).toBe('gpt-4o');
+
+    // On disk: the jsonb value is typed 'number' (the must-fix assertion), and
+    // sibling string fields are typed 'string'.
+    const typed = await db
+      .selectFrom('workspaces')
+      .select([
+        sql<string>`jsonb_typeof(settings->'ai'->'provider'->'chatContextWindow')`.as(
+          'windowType',
+        ),
+        sql<string>`jsonb_typeof(settings->'ai'->'provider'->'chatModel')`.as(
+          'modelType',
+        ),
+      ])
+      .where('id', '=', ws.id)
+      .executeTakeFirstOrThrow();
+
+    expect(typed.windowType).toBe('number');
+    expect(typed.modelType).toBe('string');
+  });
+
+  it('re-reads chatContextWindow as a number after a partial-merge update', async () => {
+    const ws = await createWorkspace(db, {
+      settings: { ai: { provider: { driver: 'openai', chatModel: 'x' } } },
+    });
+
+    // Merge in only the numeric field; siblings must be preserved and the value
+    // must still be a JSON number, not a string.
+    await repo.updateAiProviderSettings(ws.id, { chatContextWindow: 128000 });
+
+    const row = await db
+      .selectFrom('workspaces')
+      .select([
+        'settings',
+        sql<string>`jsonb_typeof(settings->'ai'->'provider'->'chatContextWindow')`.as(
+          'windowType',
+        ),
+      ])
+      .where('id', '=', ws.id)
+      .executeTakeFirstOrThrow();
+
+    expect(row.windowType).toBe('number');
+    const provider = (row.settings as any)?.ai?.provider;
+    expect(provider.chatContextWindow).toBe(128000);
+    expect(provider.driver).toBe('openai');
+    expect(provider.chatModel).toBe('x');
+  });
+});