feat(ai-chat): realtime token counter + reasoning tokens, Claude-Code style (#151)

Tokens were only counted post-hoc (onFinish), and the header badge updated only on
chat open/switch; reasoning was neither requested nor shown. Now a counter ticks LIVE
during generation and surfaces reasoning ("thinking") tokens separately, like
Claude Code's `Thinking… · N tokens`.

Architecture (AI SDK v6): no provider gives exact per-token usage mid-stream, so
the live number is a cheap client estimate (≈ chars/4) reconciled to AUTHORITATIVE
provider usage at step boundaries and turn end. The existing useChat per-delta
re-render serves as the realtime engine.
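
As a rough illustration of the estimate-then-reconcile idea described above, the sketch below counts about four characters per token while text streams and switches to provider usage once it is attached to the message. It is not the commit's `count-stream-tokens` source: the names `sketchEstimateTokens` and `sketchLiveTurnTokens`, the simplified message types, the per-part rounding, and the rule that visible output equals `outputTokens` minus `reasoningTokens` are all assumptions made for illustration.

```typescript
// Simplified stand-ins for the SDK message shape (hypothetical types).
interface SketchUsage {
  outputTokens?: number;
  reasoningTokens?: number;
}
interface SketchPart {
  type: string;
  text?: string;
}
interface SketchMessage {
  parts?: SketchPart[];
  metadata?: { usage?: SketchUsage };
}
interface SketchLiveTokens {
  reasoning: number;
  output: number;
  authoritative: boolean;
}

// Cheap client-side estimate: roughly 4 characters per token, rounded up,
// so any non-empty text counts as at least 1 token and "" counts as 0.
function sketchEstimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Prefer authoritative usage when the server has attached it; otherwise
// estimate from the streamed reasoning and text parts.
function sketchLiveTurnTokens(message: SketchMessage | undefined): SketchLiveTokens {
  const usage = message?.metadata?.usage;
  if (usage && typeof usage.outputTokens === "number") {
    const reasoning = usage.reasoningTokens ?? 0;
    // ASSUMPTION: reasoning tokens are included in outputTokens, so the
    // visible output is the remainder (never below zero).
    const output = Math.max(0, usage.outputTokens - reasoning);
    return { reasoning, output, authoritative: true };
  }
  let reasoning = 0;
  let output = 0;
  for (const part of message?.parts ?? []) {
    // ASSUMPTION: each part is rounded up on its own before summing.
    if (part.type === "reasoning") reasoning += sketchEstimateTokens(part.text ?? "");
    else if (part.type === "text") output += sketchEstimateTokens(part.text ?? "");
  }
  return { reasoning, output, authoritative: false };
}
```

Under these assumptions the header total is simply `reasoning + output`, and the value shown moves from the estimate to the exact figure whenever a step or the turn reports usage.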

- server: `chatStreamMetadata` now also forwards usage on `finish-step` + `finish`;
  `sendReasoning: true`; persisted `metadata.usage` carries `reasoningTokens`
  (normalized from `outputTokenDetails` or the deprecated field).
- client: pure `count-stream-tokens` (estimateTokens / liveTurnTokens; prefers
  authoritative usage, else the estimate); `Thinking… · N tokens` in the typing
  indicator; collapsible "Thinking" reasoning block; throttled (~8 Hz) live
  turn-token header badge; `reasoningTokens` in types + Markdown export.
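
The throttled (~8 Hz) badge update mentioned in the client bullet can be pictured with the small sketch below. It is an illustration only, not the hook code from this commit: `createSketchThrottle`, the `SketchScheduler` interface, and the injected clock are hypothetical names chosen so the behaviour can be exercised without real timers. A value is emitted immediately when at least `minIntervalMs` has passed since the last emit; otherwise a single trailing timer is scheduled that emits the most recent value seen.

```typescript
type SketchEmit = (value: number) => void;

// Injected clock + timer so the throttle can be driven deterministically
// (hypothetical interface; a real caller could back it with Date.now and
// setTimeout).
interface SketchScheduler {
  now(): number;
  setTimer(fn: () => void, delayMs: number): void;
}

function createSketchThrottle(
  emit: SketchEmit,
  minIntervalMs: number,
  scheduler: SketchScheduler,
): (value: number) => void {
  let lastEmit = Number.NEGATIVE_INFINITY; // nothing emitted yet
  let pending = false; // is a trailing timer scheduled?
  let latest = 0; // most recent value pushed by the caller

  return (value: number): void => {
    latest = value;
    const elapsed = scheduler.now() - lastEmit;
    if (elapsed >= minIntervalMs) {
      // Leading edge: enough time has passed, emit right away.
      lastEmit = scheduler.now();
      emit(value);
    } else if (!pending) {
      // Trailing edge: schedule one timer for the rest of the window and
      // emit whatever the latest value is when it fires.
      pending = true;
      scheduler.setTimer(() => {
        pending = false;
        lastEmit = scheduler.now();
        emit(latest);
      }, minIntervalMs - elapsed);
    }
  };
}
```

With `minIntervalMs` set to 120, a burst of per-delta updates collapses to at most one leading and one trailing emit per window, which is roughly the "handful of times a second" re-render budget the commit describes. Reading `latest` inside the timer is a design choice of this sketch so that the trailing emit carries the newest value of the burst.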

Review fixes folded in:
- v6 `finish-step.usage` is PER-STEP, not cumulative: the server now ACCUMULATES
  a running sum (new pure `accumulateStepUsage`) and sends the cumulative total, which
  converges to `finish.totalUsage`, so the live counter never jumps DOWN on a
  multi-step agent turn.
- reasoning double-count: the authoritative turn-total is attributed to a block
  ONLY for a single-reasoning-part (one-step) turn; multi-step blocks each show
  their own estimate (the authoritative total stays in the header).
- no "0" badge flash at turn start (require live > 0, else show context size).
- comment refreshed (finish-step trigger).
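
To make the per-step versus cumulative point in the first review fix concrete, here is a minimal sketch of a running sum over step usage. The real `accumulateStepUsage` is not shown in this part of the diff, so the function name `sketchAccumulateStepUsage`, its signature, and the `SketchStepUsage` shape are assumptions; only the idea (add each step's usage to a running total and send the total) comes from the commit message.

```typescript
// Per-step usage as reported at a step boundary (hypothetical shape).
interface SketchStepUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  reasoningTokens?: number;
}

// Running cumulative usage: every field is always a number.
type SketchRunningUsage = Required<SketchStepUsage>;

const SKETCH_ZERO_USAGE: SketchRunningUsage = {
  inputTokens: 0,
  outputTokens: 0,
  totalTokens: 0,
  reasoningTokens: 0,
};

// Pure function: returns a new running total, treating missing fields in
// the step as 0 so the total can only stay flat or grow.
function sketchAccumulateStepUsage(
  running: SketchRunningUsage,
  step: SketchStepUsage,
): SketchRunningUsage {
  return {
    inputTokens: running.inputTokens + (step.inputTokens ?? 0),
    outputTokens: running.outputTokens + (step.outputTokens ?? 0),
    totalTokens: running.totalTokens + (step.totalTokens ?? 0),
    reasoningTokens: running.reasoningTokens + (step.reasoningTokens ?? 0),
  };
}
```

Because the function is pure, a caller can fold it over the steps of a turn (for example with `Array.prototype.reduce`, starting from `SKETCH_ZERO_USAGE`) and forward each intermediate total, so a later step with a smaller per-step figure cannot make the displayed count drop.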

Tests: server `accumulateStepUsage` + updated `chatStreamMetadata` (34 in the
suite); client pure-fn tests. tsc is clean for both client and server; 162 client
ai-chat tests and the ai-chat server suite pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -1147,6 +1147,9 @@
   "Ask a question about this documentation.": "Ask a question about this documentation.",
   "Ask a question…": "Ask a question…",
   "Thinking…": "Thinking…",
+  "Thinking… · {{count}} tokens": "Thinking… · {{count}} tokens",
+  "Thinking": "Thinking",
+  "Thinking · {{count}} tokens": "Thinking · {{count}} tokens",
   "The assistant is unavailable right now. Please try again.": "The assistant is unavailable right now. Please try again.",
   "Public share assistant": "Public share assistant",
   "Let anonymous visitors of public shares ask an AI assistant scoped to that share's pages. You pay for the tokens.": "Let anonymous visitors of public shares ask an AI assistant scoped to that share's pages. You pay for the tokens.",
@@ -1158,6 +1161,7 @@
   "Built-in assistant persona": "Built-in assistant persona",
   "Minimize": "Minimize",
   "Current context size": "Current context size",
+  "Tokens generated this turn": "Tokens generated this turn",
   "AI agent": "AI agent",
   "Take a look at the current document": "Take a look at the current document",
   "AI agent is typing…": "AI agent is typing…",
@@ -680,6 +680,9 @@
   "AI agent is typing…": "AI-агент печатает…",
   "{{name}} is typing…": "{{name}} печатает…",
   "Thinking…": "Думаю…",
+  "Thinking… · {{count}} tokens": "Думаю… · {{count}} токенов",
+  "Thinking": "Размышления",
+  "Thinking · {{count}} tokens": "Размышления · {{count}} токенов",
   "Agent role": "Роль агента",
   "AI chat": "AI-чат",
   "AI chat is disabled for this workspace.": "AI-чат отключён для этого рабочего пространства.",
@@ -690,6 +693,7 @@
   "Copy chat": "Копировать чат",
   "Created successfully": "Успешно создано",
   "Current context size": "Текущий размер контекста",
+  "Tokens generated this turn": "Токенов сгенерировано за ход",
   "Delete this chat?": "Удалить этот чат?",
   "Deleted successfully": "Успешно удалено",
   "Edited by AI agent on behalf of {{name}}": "Отредактировано AI-агентом от имени {{name}}",
@@ -156,6 +156,12 @@ export default function AiChatWindow() {
     isStreaming: false,
   });
 
+  // Live turn-token total (reasoning + output) for the in-flight turn, pushed up
+  // (THROTTLED to ~8 Hz inside ChatThread) so the header badge ticks mid-stream.
+  // `null` means no turn is in flight -> the badge falls back to the persisted
+  // context size below.
+  const [liveTurnTokens, setLiveTurnTokens] = useState<number | null>(null);
+
   // The page the user is currently viewing. AiChatWindow lives in a pathless
   // parent layout route, so useParams() can't see :pageSlug. Match the full
   // pathname against the authenticated page route instead so "the current page"
@@ -485,11 +491,19 @@ export default function AiChatWindow() {
         )}
 
         <div style={{ flex: 1, display: "flex", justifyContent: "center" }}>
-          {contextTokens > 0 && (
+          {/* While a turn streams, show the LIVE turn-token count (ticks ~8 Hz);
+              once it finishes, fall back to the persisted context size. Require
+              > 0 so the very first emit (an empty tail message, count 0) does not
+              flash a "0" badge before any token streams in (#151 review). */}
+          {liveTurnTokens !== null && liveTurnTokens > 0 ? (
+            <Tooltip label={t("Tokens generated this turn")} withArrow>
+              <span className={classes.badge}>{formatTokens(liveTurnTokens)}</span>
+            </Tooltip>
+          ) : contextTokens > 0 ? (
             <Tooltip label={t("Current context size")} withArrow>
               <span className={classes.badge}>{formatTokens(contextTokens)}</span>
             </Tooltip>
-          )}
+          ) : null}
         </div>
 
         <div style={{ display: "flex", alignItems: "center", gap: 1 }}>
@@ -608,6 +622,7 @@ export default function AiChatWindow() {
             assistantName={currentRole?.name}
             onTurnFinished={onTurnFinished}
             liveStateRef={liveThreadRef}
+            onLiveTurnTokens={setLiveTurnTokens}
           />
         )}
       </div>
@@ -111,6 +111,24 @@
   background: light-dark(var(--mantine-color-gray-0), var(--mantine-color-dark-6));
 }
 
+/* Collapsible "Thinking" (reasoning) block: a subtle left rule, dimmer than the
+   answer so it reads as secondary thinking context above the real answer. */
+.reasoningBlock {
+  border-left: 2px solid light-dark(var(--mantine-color-gray-3), var(--mantine-color-dark-4));
+  padding-left: 8px;
+}
+
+.reasoningText {
+  margin-top: 4px;
+  font-size: var(--mantine-font-size-xs);
+  color: light-dark(var(--mantine-color-gray-7), var(--mantine-color-dark-1));
+  white-space: pre-wrap;
+}
+
+.reasoningText p {
+  margin: 0 0 4px;
+}
+
 .inputWrapper {
   flex: 0 0 auto;
   padding-top: var(--mantine-spacing-xs);
@@ -23,6 +23,7 @@ import {
 } from "@/features/ai-chat/types/ai-chat.types.ts";
 import { describeChatError } from "@/features/ai-chat/utils/error-message.ts";
 import { extractServerChatId } from "@/features/ai-chat/utils/adopt-chat-id.ts";
+import { liveTurnTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
 import {
   dequeue,
   enqueueMessage,
@@ -69,6 +70,12 @@ interface ChatThreadProps {
    * assistant message. A ref (not state) avoids re-rendering the parent on
    * every streamed delta. */
   liveStateRef?: MutableRefObject<{ messages: UIMessage[]; isStreaming: boolean }>;
+  /** Reports the live turn-token total (reasoning + output) for the in-flight
+   * turn so the parent can show a header badge that ticks mid-stream. THROTTLED
+   * here (~8 Hz) so the parent re-renders a handful of times a second, not on
+   * every streamed delta. Called with `null` when no turn is in flight (the
+   * parent then reverts the badge to the persisted context size). */
+  onLiveTurnTokens?: (tokens: number | null) => void;
 }
 
 /**
@@ -113,6 +120,7 @@ export default function ChatThread({
   assistantName,
   onTurnFinished,
   liveStateRef,
+  onLiveTurnTokens,
 }: ChatThreadProps) {
   const { t } = useTranslation();
 
@@ -310,6 +318,54 @@ export default function ChatThread({
     };
   }, [liveStateRef, messages, isStreaming]);
 
+  // Report the live turn-token total to the parent header badge, THROTTLED to
+  // ~8 Hz so the parent re-renders a few times a second instead of on every
+  // streamed delta. The tail assistant message's reasoning+output (estimate while
+  // streaming, authoritative once a step reports usage) is the live figure. When
+  // the turn ends we emit a final exact value, then `null` so the parent reverts
+  // the badge to the persisted context size.
+  const lastEmitRef = useRef(0);
+  const emitTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+  useEffect(() => {
+    if (!onLiveTurnTokens) return;
+    if (!isStreaming) {
+      // Turn ended (or never started): clear any pending throttle and revert.
+      if (emitTimerRef.current) {
+        clearTimeout(emitTimerRef.current);
+        emitTimerRef.current = null;
+      }
+      lastEmitRef.current = 0;
+      onLiveTurnTokens(null);
+      return;
+    }
+    const tail = messages[messages.length - 1];
+    const live =
+      tail?.role === "assistant" ? liveTurnTokens(tail) : null;
+    const total = live ? live.reasoning + live.output : 0;
+    const now = Date.now();
+    const MIN_INTERVAL = 120; // ms (~8 Hz)
+    const elapsed = now - lastEmitRef.current;
+    if (elapsed >= MIN_INTERVAL) {
+      lastEmitRef.current = now;
+      onLiveTurnTokens(total);
+    } else if (!emitTimerRef.current) {
+      // Schedule a trailing emit so the FINAL value of a burst is not dropped.
+      emitTimerRef.current = setTimeout(() => {
+        emitTimerRef.current = null;
+        lastEmitRef.current = Date.now();
+        onLiveTurnTokens(total);
+      }, MIN_INTERVAL - elapsed);
+    }
+  }, [messages, isStreaming, onLiveTurnTokens]);
+
+  // Clear any pending throttle timer on unmount (chat switch via `key`) so a
+  // trailing emit can't fire into a torn-down thread's parent.
+  useEffect(() => {
+    return () => {
+      if (emitTimerRef.current) clearTimeout(emitTimerRef.current);
+    };
+  }, []);
+
   // Classify the turn error into a heading + detail so the banner names the cause
   // (connection reset, timeout, rate limit, context overflow, quota, ...) instead
   // of a generic "Something went wrong".
@@ -2,6 +2,7 @@ import { Box, Text } from "@mantine/core";
 import { useTranslation } from "react-i18next";
 import type { UIMessage } from "@ai-sdk/react";
 import ToolCallCard from "@/features/ai-chat/components/tool-call-card.tsx";
+import ReasoningBlock from "@/features/ai-chat/components/reasoning-block.tsx";
 import ChatErrorAlert from "@/features/ai-chat/components/chat-error-alert.tsx";
 import ChatStoppedNotice from "@/features/ai-chat/components/chat-stopped-notice.tsx";
 import { ToolUiPart, isToolPart } from "@/features/ai-chat/utils/tool-parts.tsx";
@@ -77,12 +78,45 @@ export default function MessageItem({
   // return won't fire for them.
   if (!assistantMessageHasVisibleContent(message)) return null;
 
+  // Authoritative reasoning token count for the turn, if the server attached it
+  // (incl. providers that report a reasoning COUNT without streaming the text).
+  // It is the TURN TOTAL, so it may only be attributed to a block when there is a
+  // SINGLE reasoning part (the common one-step turn) — then that block shows the
+  // exact figure. With multiple reasoning parts (multi-step agent turn) every
+  // block falls back to its own per-part estimate; attributing the turn total to
+  // one of them would double-count against the others' estimates (#151 review).
+  // The authoritative turn total is still surfaced live in the header badge.
+  const reasoningTokens = (
+    message.metadata as { usage?: { reasoningTokens?: number } } | undefined
+  )?.usage?.reasoningTokens;
+  const reasoningPartCount = message.parts.reduce(
+    (acc, p) => (p.type === "reasoning" ? acc + 1 : acc),
+    0,
+  );
+  const lastReasoningIndex = message.parts.reduce(
+    (acc, p, i) => (p.type === "reasoning" ? i : acc),
+    -1,
+  );
+
   return (
     <Box className={classes.messageRow}>
       <Text size="xs" c="dimmed" mb={4}>
         {resolveAssistantName(assistantName) ?? t("AI agent")}
       </Text>
       {message.parts.map((part, index) => {
+        if (part.type === "reasoning") {
+          // Reasoning ("thinking") -> a collapsible block with its own token
+          // count. Empty/whitespace reasoning with no authoritative count carries
+          // nothing to show, so skip it (avoids an empty 0-token block).
+          const text = (part as { text?: string }).text ?? "";
+          const tokens =
+            reasoningPartCount === 1 && index === lastReasoningIndex
+              ? reasoningTokens
+              : undefined;
+          if (!text.trim() && !(tokens && tokens > 0)) return null;
+          return <ReasoningBlock key={index} text={text} tokens={tokens} />;
+        }
+
         if (part.type === "text") {
           // Skip empty/whitespace-only text parts (a streaming message often
           // starts with an empty text part before the first token arrives); the
@@ -6,6 +6,7 @@ import MessageItem from "@/features/ai-chat/components/message-item.tsx";
 import TypingIndicator from "@/features/ai-chat/components/typing-indicator.tsx";
 import { isToolPart, toolRunState, ToolUiPart } from "@/features/ai-chat/utils/tool-parts.tsx";
 import { assistantMessageHasVisibleContent } from "@/features/ai-chat/utils/message-content.ts";
+import { liveTurnTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
 import classes from "@/features/ai-chat/components/ai-chat.module.css";
 
 interface MessageListProps {
@@ -94,6 +95,19 @@ export function typingIndicatorShowsName(messages: UIMessage[]): boolean {
   return !assistantMessageHasVisibleContent(last);
 }
 
+/**
+ * The live thinking-token count to show on the standalone typing indicator. It
+ * is the reasoning split of the tail assistant message (estimate while streaming,
+ * authoritative once the server attaches usage at a step/turn boundary). Returns
+ * 0 when the turn has produced no reasoning yet — the indicator then shows the
+ * plain "Thinking…" line.
+ */
+export function tailThinkingTokens(messages: UIMessage[]): number {
+  const last = messages[messages.length - 1];
+  if (!last || last.role !== "assistant") return 0;
+  return liveTurnTokens(last).reasoning;
+}
+
 /**
  * Scrollable transcript. Auto-scrolls to the newest message as it streams in,
  * but only while the user is pinned to the bottom — if they scrolled up to read
@@ -190,7 +204,13 @@ export default function MessageList({
             assistantName={assistantName}
           />
         ))}
-        {typing && <TypingIndicator assistantName={assistantName} showName={typingIndicatorShowsName(messages)} />}
+        {typing && (
+          <TypingIndicator
+            assistantName={assistantName}
+            showName={typingIndicatorShowsName(messages)}
+            thinkingTokens={tailThinkingTokens(messages)}
+          />
+        )}
       </Stack>
     </ScrollArea>
   );
@@ -0,0 +1,83 @@
+import { useState } from "react";
+import { Box, Collapse, Group, Text, UnstyledButton } from "@mantine/core";
+import { IconChevronDown } from "@tabler/icons-react";
+import { useTranslation } from "react-i18next";
+import { estimateTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
+import { renderChatMarkdown } from "@/features/ai-chat/utils/markdown.ts";
+import classes from "@/features/ai-chat/components/ai-chat.module.css";
+
+interface ReasoningBlockProps {
+  /** The streamed/persisted reasoning (thinking) text. May be empty when the
+   * provider reports only a reasoning token COUNT without the text. */
+  text: string;
+  /** Authoritative reasoning token count from `usage.reasoningTokens`, when the
+   * step/turn has finished. When absent (or 0) the count is estimated from the
+   * text length so it ticks live as the reasoning streams in. */
+  tokens?: number;
+}
+
+/**
+ * Collapsible "Thinking" block for an assistant `reasoning` part. Mirrors Claude
+ * Code's surfacing of the model's thinking: a header that shows the thinking
+ * token count (authoritative when the step has reported usage, else a live
+ * estimate from the streamed text) and an expandable body with the reasoning
+ * prose. Collapsed by default so it never crowds out the answer.
+ *
+ * Providers that don't stream reasoning TEXT still render this block from the
+ * authoritative count alone (header only, empty body) so the cost is visible.
+ */
+export default function ReasoningBlock({ text, tokens }: ReasoningBlockProps) {
+  const { t } = useTranslation();
+  const [open, setOpen] = useState(false);
+
+  // Authoritative count wins; otherwise estimate live from the streamed text.
+  const count = tokens && tokens > 0 ? tokens : estimateTokens(text);
+  const trimmed = text.trim();
+  const html = trimmed ? renderChatMarkdown(trimmed, {}) : "";
+
+  return (
+    <Box className={classes.reasoningBlock} mb={6}>
+      <UnstyledButton
+        onClick={() => setOpen((o) => !o)}
+        // No body to expand when the provider reported only a token count.
+        disabled={!trimmed}
+        aria-expanded={open}
+      >
+        <Group gap={6} wrap="nowrap" align="center">
+          <IconChevronDown
+            size={12}
+            style={{
+              transform: open ? "none" : "rotate(-90deg)",
+              transition: "transform 150ms ease",
+              opacity: trimmed ? 1 : 0.4,
+            }}
+          />
+          <Text size="xs" c="dimmed">
+            {count > 0
+              ? t("Thinking · {{count}} tokens", { count })
+              : t("Thinking")}
+          </Text>
+        </Group>
+      </UnstyledButton>
+
+      {trimmed && (
+        <Collapse in={open}>
+          {html ? (
+            <div
+              className={classes.reasoningText}
+              // Sanitized by renderChatMarkdown (DOMPurify) before insertion.
+              dangerouslySetInnerHTML={{ __html: html }}
+            />
+          ) : (
+            <Text
+              className={classes.reasoningText}
+              style={{ whiteSpace: "pre-wrap" }}
+            >
+              {trimmed}
+            </Text>
+          )}
+        </Collapse>
+      )}
+    </Box>
+  );
+}
@@ -0,0 +1,50 @@
+import { describe, expect, it } from "vitest";
+import type { UIMessage } from "@ai-sdk/react";
+import { tailThinkingTokens } from "@/features/ai-chat/components/message-list.tsx";
+
+/**
+ * Pure-helper tests for `tailThinkingTokens`: the live thinking-token count the
+ * standalone typing indicator shows. It is the reasoning split of the tail
+ * assistant message (estimate while streaming, authoritative once usage arrives).
+ */
+const msg = (
+  role: "user" | "assistant",
+  parts: unknown[],
+  metadata?: unknown,
+): UIMessage =>
+  ({ id: Math.random().toString(), role, parts, metadata }) as UIMessage;
+
+describe("tailThinkingTokens", () => {
+  it("is 0 when there are no messages", () => {
+    expect(tailThinkingTokens([])).toBe(0);
+  });
+
+  it("is 0 when the tail message is the user's", () => {
+    expect(tailThinkingTokens([msg("user", [{ type: "text", text: "q" }])])).toBe(0);
+  });
+
+  it("is 0 when the assistant has produced no reasoning yet", () => {
+    expect(
+      tailThinkingTokens([msg("assistant", [{ type: "text", text: "answer" }])]),
+    ).toBe(0);
+  });
+
+  it("estimates reasoning tokens from streamed reasoning text", () => {
+    // 8 chars -> 2 tokens.
+    expect(
+      tailThinkingTokens([
+        msg("assistant", [{ type: "reasoning", text: "12345678" }]),
+      ]),
+    ).toBe(2);
+  });
+
+  it("uses authoritative usage.reasoningTokens once the server attaches it", () => {
+    expect(
+      tailThinkingTokens([
+        msg("assistant", [{ type: "reasoning", text: "x" }], {
+          usage: { outputTokens: 100, reasoningTokens: 42 },
+        }),
+      ]),
+    ).toBe(42);
+  });
+});
@@ -16,6 +16,12 @@ interface TypingIndicatorProps {
    * assistant row above already shows the same name, to avoid a duplicate label.
    */
   showName?: boolean;
+  /**
+   * Live thinking/reasoning token count for the in-flight turn. When > 0 the
+   * typing line becomes `Thinking… · {count} tokens` (like Claude Code). Omitted
+   * / 0 keeps the plain `Thinking…` line.
+   */
+  thinkingTokens?: number;
 }
 
 /**
@@ -30,9 +36,14 @@ interface TypingIndicatorProps {
  * typing line is always the generic "Thinking…" (it never includes the
  * role/identity name).
  */
-export default function TypingIndicator({ assistantName, showName = true }: TypingIndicatorProps) {
+export default function TypingIndicator({ assistantName, showName = true, thinkingTokens }: TypingIndicatorProps) {
   const { t } = useTranslation();
   const name = resolveAssistantName(assistantName);
+  // Show the running thinking-token count only once there is something to count.
+  const thinkingLine =
+    thinkingTokens && thinkingTokens > 0
+      ? t("Thinking… · {{count}} tokens", { count: thinkingTokens })
+      : t("Thinking…");
 
   return (
     <Box className={classes.messageRow}>
@@ -48,7 +59,7 @@ export default function TypingIndicator({ assistantName, showName = true }: Typi
           <span />
         </span>
         <Text size="sm" c="dimmed">
-          {t("Thinking…")}
+          {thinkingLine}
         </Text>
       </Group>
     </Box>
@@ -98,6 +98,10 @@ export interface IAiChatMessageRow {
     inputTokens?: number;
     outputTokens?: number;
     totalTokens?: number;
+    // Reasoning (thinking) tokens, when the provider reports them. Optional so
+    // old history rows (recorded before this shipped) stay valid. Included in
+    // `outputTokens` per the AI SDK usage shape.
+    reasoningTokens?: number;
   };
   // Current context size for the turn = final-step (input+output) tokens, i.e.
   // how much the conversation occupies in the model's context window after this
@@ -77,6 +77,7 @@ function rowTokens(usage: {
   inputTokens?: number;
   outputTokens?: number;
   totalTokens?: number;
+  reasoningTokens?: number;
 }): number {
   return (
     usage.totalTokens ?? (usage.inputTokens ?? 0) + (usage.outputTokens ?? 0)
@@ -175,8 +176,14 @@ export function buildChatMarkdown(args: BuildChatMarkdownArgs): string {
     const usage = row.metadata?.usage;
     if (usage) {
       const total = usage.totalTokens ?? rowTokens(usage);
+      // Reasoning (thinking) tokens are shown only when the provider reported a
+      // positive count; old rows / non-reasoning providers omit it.
+      const reasoning =
+        usage.reasoningTokens && usage.reasoningTokens > 0
+          ? `, reasoning: ${usage.reasoningTokens}`
+          : "";
       blocks.push(
-        `_Tokens — in: ${usage.inputTokens ?? "?"}, out: ${usage.outputTokens ?? "?"}, total: ${total}_`,
+        `_Tokens — in: ${usage.inputTokens ?? "?"}, out: ${usage.outputTokens ?? "?"}${reasoning}, total: ${total}_`,
       );
     }
   });
@@ -0,0 +1,119 @@
+import { describe, expect, it } from "vitest";
+import type { UIMessage } from "@ai-sdk/react";
+import {
+  estimateTokens,
+  liveTurnTokens,
+} from "@/features/ai-chat/utils/count-stream-tokens.ts";
+
+const msg = (parts: unknown[], metadata?: unknown): UIMessage =>
+  ({
+    id: Math.random().toString(),
+    role: "assistant",
+    parts,
+    metadata,
+  }) as UIMessage;
+
+describe("estimateTokens", () => {
+  it("returns 0 for the empty string", () => {
+    expect(estimateTokens("")).toBe(0);
+  });
+
+  it("ceils chars/4 so any non-empty text is at least 1 token", () => {
+    expect(estimateTokens("a")).toBe(1);
+    expect(estimateTokens("abcd")).toBe(1);
+    expect(estimateTokens("abcde")).toBe(2);
+    expect(estimateTokens("12345678")).toBe(2);
+  });
+});
+
+describe("liveTurnTokens — estimate path", () => {
+  it("is all zeros for an undefined message", () => {
+    expect(liveTurnTokens(undefined)).toEqual({
+      reasoning: 0,
+      output: 0,
+      authoritative: false,
+    });
+  });
+
+  it("is all zeros for a parts-less message", () => {
+    expect(liveTurnTokens({ id: "x", role: "assistant" } as UIMessage)).toEqual({
+      reasoning: 0,
+      output: 0,
+      authoritative: false,
+    });
+  });
+
+  it("estimates output from text parts", () => {
+    // 8 chars -> 2 tokens.
+    const r = liveTurnTokens(msg([{ type: "text", text: "12345678" }]));
+    expect(r).toEqual({ reasoning: 0, output: 2, authoritative: false });
+  });
+
+  it("estimates reasoning from reasoning parts (kept separate from output)", () => {
+    const r = liveTurnTokens(
+      msg([
+        { type: "reasoning", text: "12345678" },
+        { type: "text", text: "abcd" },
+      ]),
+    );
+    expect(r).toEqual({ reasoning: 2, output: 1, authoritative: false });
+  });
+
+  it("accumulates across multiple text + reasoning parts (multi-step)", () => {
+    const r = liveTurnTokens(
+      msg([
+        { type: "reasoning", text: "abcd" }, // 1
+        { type: "text", text: "abcd" }, // 1
+        { type: "tool-getPage", state: "output-available" }, // ignored
+        { type: "reasoning", text: "abcd" }, // 1
+        { type: "text", text: "abcdefgh" }, // 2
+      ]),
+    );
+    expect(r).toEqual({ reasoning: 2, output: 3, authoritative: false });
+  });
+
+  it("ignores non text/reasoning parts (tools, step-start)", () => {
+    const r = liveTurnTokens(
+      msg([
+        { type: "step-start" },
+        { type: "tool-getPage", state: "input-available" },
+      ]),
+    );
+    expect(r).toEqual({ reasoning: 0, output: 0, authoritative: false });
+  });
+});
+
+describe("liveTurnTokens — authoritative path", () => {
+  it("returns authoritative usage verbatim, splitting reasoning out of output", () => {
+    // outputTokens INCLUDES reasoning in the AI SDK shape -> answer = 100 - 30.
+    const r = liveTurnTokens(
+      msg([{ type: "text", text: "estimate would be tiny" }], {
+        usage: { inputTokens: 500, outputTokens: 100, reasoningTokens: 30 },
+      }),
+    );
+    expect(r).toEqual({ reasoning: 30, output: 70, authoritative: true });
+  });
+
+  it("treats missing reasoningTokens as 0 and keeps full output", () => {
+    const r = liveTurnTokens(
+      msg([{ type: "text", text: "x" }], {
+        usage: { inputTokens: 10, outputTokens: 42 },
+      }),
+    );
+    expect(r).toEqual({ reasoning: 0, output: 42, authoritative: true });
+  });
+
+  it("never returns a negative output when reasoning exceeds reported output", () => {
+    const r = liveTurnTokens(
+      msg([], { usage: { outputTokens: 10, reasoningTokens: 40 } }),
+    );
+    expect(r).toEqual({ reasoning: 40, output: 0, authoritative: true });
+  });
+
+  it("falls back to the estimate when metadata has no usage object", () => {
+    const r = liveTurnTokens(
+      msg([{ type: "text", text: "abcd" }], { chatId: "c1" }),
+    );
+    expect(r).toEqual({ reasoning: 0, output: 1, authoritative: false });
+  });
+});
@@ -0,0 +1,94 @@
+import type { UIMessage } from "@ai-sdk/react";
+
+/**
+ * Live token counting for a streaming AI-chat turn — split into REASONING
+ * (thinking) and OUTPUT (answer) tokens, mirroring how Claude Code shows
+ * `Thinking… · 60 tokens` next to its thinking indicator.
+ *
+ * No provider streams exact per-token usage mid-stream, so the live number is a
+ * CLIENT ESTIMATE (chars/≈4 heuristic) that is reconciled to AUTHORITATIVE usage
+ * once the server attaches it on a step/turn boundary (see the server's
+ * `chatStreamMetadata` + the client's read of `message.metadata.usage`). When
+ * authoritative usage is present we return it verbatim (the number "jumps to
+ * exact"); otherwise we return the running estimate. Pure + unit-testable: it
+ * never runs a real BPE tokenizer (that would be O(n²) on the hot path, bloat the
+ * bundle, and be wrong for Gemini/Ollama anyway).
+ */
+
+/**
+ * Rough token estimate for a piece of text using the standard chars/≈4 heuristic.
+ * Returns 0 for the empty string, and ceils so any non-empty text (including
+ * whitespace-only text) counts as at least one token.
+ */
+export function estimateTokens(text: string): number {
+  if (!text) return 0;
+  return Math.ceil(text.length / 4);
+}
+
+/** Authoritative per-step/turn usage the server attaches to message metadata. */
+export interface AuthoritativeUsage {
+  inputTokens?: number;
+  outputTokens?: number;
+  totalTokens?: number;
+  reasoningTokens?: number;
+}
+
+/** Live token split for a turn's tail (streaming) assistant message. */
+export interface LiveTurnTokens {
+  /** Thinking/reasoning tokens (estimate, or authoritative when available). */
+  reasoning: number;
+  /** Answer/output tokens (estimate, or authoritative when available). */
+  output: number;
+  /** True when the numbers come from authoritative server usage, not estimate. */
+  authoritative: boolean;
+}
+
+/** Read the authoritative usage off a UIMessage's metadata, if the server set it. */
+function metadataUsage(message: UIMessage): AuthoritativeUsage | undefined {
+  const meta = message?.metadata as
+    | { usage?: AuthoritativeUsage }
+    | undefined;
+  const usage = meta?.usage;
+  if (!usage || typeof usage !== "object") return undefined;
+  return usage;
+}
+
+/**
+ * Token split for the given (streaming) assistant message.
+ *
+ * Prefers AUTHORITATIVE `metadata.usage` when the server has attached it (at a
+ * step/turn boundary, incl. `reasoningTokens`) — so the live counter snaps to the
+ * provider's exact figures. Until then it returns a running ESTIMATE summed over
+ * the message parts: `reasoning` parts feed the reasoning estimate, `text` parts
+ * feed the output estimate. Multi-part / multi-step turns accumulate naturally
+ * because every part of the turn is summed.
+ *
+ * Providers that don't stream reasoning text still surface a reasoning count once
+ * the authoritative usage arrives (`usage.reasoningTokens`); on the pure estimate
+ * path such a turn simply shows `reasoning: 0` until then.
+ */
+export function liveTurnTokens(message: UIMessage | undefined): LiveTurnTokens {
+  if (!message) return { reasoning: 0, output: 0, authoritative: false };
+
+  const usage = metadataUsage(message);
+  if (usage) {
+    // Authoritative branch: outputTokens already INCLUDES reasoning tokens in the
+    // AI SDK usage shape, so subtract reasoning out for the "answer" figure (never
+    // go negative if a provider reports them inconsistently).
+    const reasoning = usage.reasoningTokens ?? 0;
+    const totalOutput = usage.outputTokens ?? 0;
+    const output = Math.max(0, totalOutput - reasoning);
+    return { reasoning, output, authoritative: true };
+  }
+
+  let reasoning = 0;
+  let output = 0;
+  for (const part of message.parts ?? []) {
+    if (part.type === "reasoning") {
+      reasoning += estimateTokens((part as { text?: string }).text ?? "");
+    } else if (part.type === "text") {
+      output += estimateTokens((part as { text?: string }).text ?? "");
+    }
+  }
+  return { reasoning, output, authoritative: false };
+}
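Reviewer note: the estimate-then-reconcile behaviour described in the file above can be traced on one message as it streams. This is a sketch under stated assumptions, not the shipped module: `Part`, `Msg`, `Split`, `estimate` and `split` are hypothetical simplifications of `UIMessage`, `estimateTokens` and `liveTurnTokens`, and the usage figures are invented.

```typescript
// Hypothetical minimal shapes; the real code reads UIMessage from @ai-sdk/react.
type Part = { type: string; text?: string };
interface Usage {
  outputTokens?: number;
  reasoningTokens?: number;
}
interface Msg {
  parts?: Part[];
  metadata?: { usage?: Usage };
}
interface Split {
  reasoning: number;
  output: number;
  authoritative: boolean;
}

// chars/4 heuristic, rounded up; 0 only for the empty string.
const estimate = (text: string): number =>
  text ? Math.ceil(text.length / 4) : 0;

function split(message: Msg): Split {
  const usage = message.metadata?.usage;
  if (usage) {
    // Authoritative path: outputTokens includes reasoning, so subtract it out.
    const reasoning = usage.reasoningTokens ?? 0;
    const output = Math.max(0, (usage.outputTokens ?? 0) - reasoning);
    return { reasoning, output, authoritative: true };
  }
  // Estimate path: sum the heuristic over reasoning and text parts only.
  let reasoning = 0;
  let output = 0;
  for (const part of message.parts ?? []) {
    if (part.type === "reasoning") reasoning += estimate(part.text ?? "");
    else if (part.type === "text") output += estimate(part.text ?? "");
  }
  return { reasoning, output, authoritative: false };
}

// Three snapshots of the same message, as successive re-renders would see it.
const thinking: Part = { type: "reasoning", text: "Let me check the page first." }; // 28 chars
const answer: Part = { type: "text", text: "The page says hello." }; // 20 chars

const midThinking: Msg = { parts: [thinking] };
const midAnswer: Msg = { parts: [thinking, answer] };
const afterFinishStep: Msg = {
  parts: [thinking, answer],
  metadata: { usage: { outputTokens: 19, reasoningTokens: 9 } }, // invented figures
};

const snapshots: Split[] = [midThinking, midAnswer, afterFinishStep].map(split);
console.log(snapshots);
```

With these inputs the first two snapshots are estimates (7 reasoning, then 7 reasoning plus 5 output) and the third switches to the provider figures (9 reasoning, 10 output) with `authoritative: true`. The snap can move a figure in either direction, since the heuristic is only an approximation of the provider's tokenizer.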
@@ -5,7 +5,8 @@ import {
   rowToUiMessage,
   prepareAgentStep,
   buildPartialAssistantRecord,
-  chatStreamStartMetadata,
+  chatStreamMetadata,
+  accumulateStepUsage,
   MAX_AGENT_STEPS,
   FINAL_STEP_INSTRUCTION,
 } from './ai-chat.service';
@@ -298,18 +299,135 @@ describe('buildPartialAssistantRecord', () => {
 });
 
 /**
- * chatStreamStartMetadata: attach the authoritative chatId to the streamed
- * assistant UI message ONLY on the `start` part (so the client adopts the real
- * created chat id at the first chunk — see #137). Any non-start part adds none.
+ * chatStreamMetadata: attach metadata to the streamed assistant UI message per
+ * part type — `chatId` on `start` (so the client adopts the real created chat id
+ * at the first chunk — see #137), and AUTHORITATIVE usage (incl. reasoning
+ * tokens) on `finish-step` and `finish` so the client's live token counter snaps
+ * to exact at each step/turn boundary.
  */
-describe('chatStreamStartMetadata', () => {
+describe('chatStreamMetadata', () => {
   it('returns { chatId } for the start part', () => {
-    expect(chatStreamStartMetadata({ type: 'start' }, 'chat-1')).toEqual({
+    expect(chatStreamMetadata({ type: 'start' }, 'chat-1')).toEqual({
       chatId: 'chat-1',
     });
   });
 
-  it('returns undefined for a finish part (any non-start part)', () => {
-    expect(chatStreamStartMetadata({ type: 'finish' }, 'chat-1')).toBeUndefined();
+  it('returns the CUMULATIVE step usage passed in for the finish-step part', () => {
+    // finish-step usage is per-step in v6; the caller accumulates and passes the
+    // running sum, which this just wraps.
+    expect(
+      chatStreamMetadata(
+        { type: 'finish-step', usage: { outputTokens: 100 } },
+        'chat-1',
+        { inputTokens: 500, outputTokens: 220, totalTokens: 720, reasoningTokens: 30 },
+      ),
+    ).toEqual({
+      usage: { inputTokens: 500, outputTokens: 220, totalTokens: 720, reasoningTokens: 30 },
+    });
+  });
+
+  it('returns turn usage for the finish part (reasoning from deprecated top-level field)', () => {
+    expect(
+      chatStreamMetadata(
+        {
+          type: 'finish',
+          totalUsage: {
+            inputTokens: 1000,
+            outputTokens: 250,
+            totalTokens: 1250,
+            reasoningTokens: 50,
+          },
+        },
+        'chat-1',
+      ),
+    ).toEqual({
+      usage: {
+        inputTokens: 1000,
+        outputTokens: 250,
+        totalTokens: 1250,
+        reasoningTokens: 50,
+      },
+    });
+  });
+
+  it('prefers outputTokenDetails.reasoningTokens over the deprecated field (finish)', () => {
+    expect(
+      chatStreamMetadata(
+        {
+          type: 'finish',
+          totalUsage: {
+            outputTokens: 100,
+            reasoningTokens: 5,
+            outputTokenDetails: { reasoningTokens: 30 },
+          },
+        },
+        'chat-1',
+      ),
+    ).toEqual({
+      usage: {
+        inputTokens: undefined,
+        outputTokens: 100,
+        totalTokens: undefined,
+        reasoningTokens: 30,
+      },
+    });
+  });
+
+  it('returns undefined for a finish-step with no accumulated usage', () => {
+    expect(
+      chatStreamMetadata({ type: 'finish-step' }, 'chat-1'),
+    ).toBeUndefined();
+  });
+
+  it('returns undefined for an unrelated part (e.g. text-delta)', () => {
+    expect(
+      chatStreamMetadata({ type: 'text-delta' }, 'chat-1'),
+    ).toBeUndefined();
+  });
+});
+
+/**
+ * accumulateStepUsage: sums per-step usage into a running cumulative total so the
+ * client never sees the live counter jump DOWN on a multi-step agent turn (#151).
+ */
+describe('accumulateStepUsage', () => {
+  it('sums every field across two steps', () => {
+    expect(
+      accumulateStepUsage(
+        { inputTokens: 500, outputTokens: 100, totalTokens: 600, reasoningTokens: 30 },
+        { inputTokens: 520, outputTokens: 80, totalTokens: 600, reasoningTokens: 10 },
+      ),
+    ).toEqual({
+      inputTokens: 1020,
+      outputTokens: 180,
+      totalTokens: 1200,
+      reasoningTokens: 40,
+    });
+  });
+
+  it('returns the step as-is when there is no accumulator yet', () => {
+    expect(accumulateStepUsage(undefined, { outputTokens: 10 })).toEqual({
+      outputTokens: 10,
+    });
+  });
+
+  it('returns the accumulator unchanged when the step usage is absent', () => {
+    const acc = { outputTokens: 10 };
+    expect(accumulateStepUsage(acc, undefined)).toBe(acc);
+  });
+
+  it('returns undefined when both sides are absent', () => {
+    expect(accumulateStepUsage(undefined, undefined)).toBeUndefined();
+  });
+
+  it('keeps a field undefined only when neither side has it', () => {
+    expect(
+      accumulateStepUsage({ outputTokens: 5 }, { outputTokens: 7 }),
+    ).toEqual({
+      inputTokens: undefined,
+      outputTokens: 12,
+      totalTokens: undefined,
+      reasoningTokens: undefined,
+    });
   });
 });
@@ -420,7 +420,11 @@ export class AiChatService {
         toolCalls: serializeSteps(steps),
         metadata: {
           finishReason,
-          usage: totalUsage,
+          // Persist the turn's cumulative usage WITH reasoning tokens resolved
+          // from either the new `outputTokenDetails` or the deprecated top-level
+          // field, so reopened history / the Markdown export show the thinking
+          // token cost too.
+          usage: normalizeStreamUsage(totalUsage as StreamUsage) ?? totalUsage,
           // Final-step usage = the context actually fed to the model on the last LLM
           // call (full history + tool results) plus the answer it just generated.
           // input+output of the FINAL step ≈ the conversation's CURRENT context size,
@@ -512,17 +516,42 @@ export class AiChatService {
     // does not buffer responses by default.
     // Scrub the SDK's hop-by-hop Connection header before it writes the head (Safari/HTTP2).
     stripStreamingHopByHopHeaders(res.raw);
+    // Running sum of per-step usage (v6 `finish-step.usage` is per-step). Sent
+    // as the cumulative authoritative usage so the client never jumps DOWN.
+    let cumulativeStepUsage: ChatStreamUsage | undefined;
     result.pipeUIMessageStreamToResponse(res.raw, {
       headers: { 'X-Accel-Buffering': 'no' },
       // Surface the authoritative chatId on the streamed assistant UI message so
       // the client adopts the REAL id of the row we created, instead of guessing
       // the newest chat in its list. `messageMetadata` is invoked by the AI SDK
-      // on the `start` and `finish` stream parts (ai@6); we attach `chatId` on the
-      // `start` part so it reaches the client (as message.metadata.chatId) at the
-      // very first chunk — before any second tab can race a newer chat into the
-      // list. This fixes the two-tab "adoption race" (#137) where a new chat in
-      // tab A could adopt tab B's id and leak its turns into the wrong row.
-      messageMetadata: ({ part }) => chatStreamStartMetadata(part, chatId),
+      // on the `start`, `finish-step` and `finish` stream parts (ai@6 — note the
+      // `finish-step` trigger relies on it being delivered as its own
+      // message-metadata chunk); we attach `chatId` on the `start` part so it
+      // reaches the client (as message.metadata.chatId) at the very first chunk —
+      // before any second tab can race a newer chat into the list. This fixes the
+      // two-tab "adoption race" (#137).
+      //
+      // `finish-step.usage` is PER-STEP (not cumulative) in v6, and the client
+      // merges each metadata.usage by replacement — so on a multi-step agent turn
+      // (up to MAX_AGENT_STEPS) the naive per-step value would make the live
+      // counter jump DOWN at each boundary. We keep a running sum here and send
+      // the CUMULATIVE usage, which converges to `finish.totalUsage` (#151).
+      messageMetadata: ({ part }) => {
+        const p = part as StreamMetadataPart;
+        if (p.type === 'finish-step') {
+          cumulativeStepUsage = accumulateStepUsage(
+            cumulativeStepUsage,
+            normalizeStreamUsage(p.usage),
+          );
+        }
+        return chatStreamMetadata(p, chatId, cumulativeStepUsage);
+      },
+      // Stream reasoning (thinking) parts to the client so the live counter can
+      // estimate reasoning tokens from streamed text. v6 default is already
+      // true; set explicitly so the intent survives any future SDK default
+      // change. Providers that don't emit reasoning text still surface the
+      // count via the authoritative `usage.reasoningTokens` on finish-step.
+      sendReasoning: true,
       onError: (error: unknown) => {
         // Reuse the shared formatter so provider error formatting stays
         // unified between the log line and the streamed error message.
@@ -573,16 +602,97 @@ export class AiChatService {
   }
 }
 
+/** Shape of the AI SDK v6 LanguageModelUsage we forward to the client. The SDK
+ * exposes `reasoningTokens` both as a (deprecated) top-level field and under
+ * `outputTokenDetails.reasoningTokens`; we normalize to a single field so the
+ * client gets one stable usage shape regardless of provider/SDK version. */
+interface StreamUsage {
+  inputTokens?: number;
+  outputTokens?: number;
+  totalTokens?: number;
+  reasoningTokens?: number;
+  outputTokenDetails?: { reasoningTokens?: number };
+}
+
+/** A streamed part the messageMetadata callback can receive (only the fields we read). */
+interface StreamMetadataPart {
+  type: string;
+  usage?: StreamUsage;
+  totalUsage?: StreamUsage;
+}
+
+/** Authoritative usage we attach to a streamed assistant message's metadata. */
+export interface ChatStreamUsage {
+  inputTokens?: number;
+  outputTokens?: number;
+  totalTokens?: number;
+  reasoningTokens?: number;
+}
+
+/** Normalize an AI SDK usage object to our flat client-facing shape, resolving
+ * reasoning tokens from either the new `outputTokenDetails` or the deprecated
+ * top-level field. Returns undefined for a missing usage object. */
+function normalizeStreamUsage(
+  usage: StreamUsage | undefined,
+): ChatStreamUsage | undefined {
+  if (!usage) return undefined;
+  const reasoningTokens =
+    usage.outputTokenDetails?.reasoningTokens ?? usage.reasoningTokens;
+  return {
+    inputTokens: usage.inputTokens,
+    outputTokens: usage.outputTokens,
+    totalTokens: usage.totalTokens,
+    reasoningTokens,
+  };
+}
+
+/** Sum a (normalized) per-step usage into a running cumulative usage. v6's
+ * `finish-step.usage` is PER-STEP, so the caller accumulates across steps; the
+ * cumulative sum converges to the turn's `totalUsage` (no down-jump on the
+ * client). Returns undefined only when both sides are absent. Pure. */
+export function accumulateStepUsage(
+  acc: ChatStreamUsage | undefined,
+  step: ChatStreamUsage | undefined,
+): ChatStreamUsage | undefined {
+  if (!acc) return step;
+  if (!step) return acc;
+  const add = (a?: number, b?: number): number | undefined =>
+    a == null && b == null ? undefined : (a ?? 0) + (b ?? 0);
+  return {
+    inputTokens: add(acc.inputTokens, step.inputTokens),
+    outputTokens: add(acc.outputTokens, step.outputTokens),
+    totalTokens: add(acc.totalTokens, step.totalTokens),
+    reasoningTokens: add(acc.reasoningTokens, step.reasoningTokens),
+  };
+}
+
 /**
- * Attach the authoritative `chatId` to the streamed assistant message's `start`
- * part (as `message.metadata.chatId`) so the client can adopt the real id for a
- * new chat. See the client's adopt-chat-id.ts for the full #137 design.
+ * Pure metadata builder for the streamed assistant UI message. The AI SDK calls
+ * `messageMetadata` on the `start`, `finish-step` and `finish` stream parts; we
+ * attach (as `message.metadata`):
+ *   - `start`       -> `{ chatId }` so the client adopts the real created chat id
+ *                      at the first chunk (see adopt-chat-id.ts / #137).
+ *   - `finish-step` -> `{ usage }` the CUMULATIVE authoritative usage so far
+ *                      (incl. reasoning tokens) — the caller passes the running
+ *                      sum (`cumulativeStepUsage`), since v6 per-step usage is not
+ *                      cumulative; the client snaps to exact without jumping down.
+ *   - `finish`      -> `{ usage }` from the turn's `totalUsage` (final reconcile).
+ * Any other part type contributes no metadata. Pure + unit-testable.
  */
-export function chatStreamStartMetadata(
-  part: { type: string },
+export function chatStreamMetadata(
+  part: StreamMetadataPart,
   chatId: string,
-): { chatId: string } | undefined {
-  return part.type === 'start' ? { chatId } : undefined;
+  cumulativeStepUsage?: ChatStreamUsage,
+): { chatId: string } | { usage: ChatStreamUsage } | undefined {
+  if (part.type === 'start') return { chatId };
+  if (part.type === 'finish-step') {
+    return cumulativeStepUsage ? { usage: cumulativeStepUsage } : undefined;
+  }
+  if (part.type === 'finish') {
+    const usage = normalizeStreamUsage(part.totalUsage);
+    return usage ? { usage } : undefined;
+  }
+  return undefined;
 }
 
 /** The last message with role 'user' from a useChat payload, if any. */
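Reviewer note: the "never jumps DOWN" claim for multi-step turns can be checked with a small simulation. The sketch below is an assumption-laden restatement, not the service code: `Usage`, `add` and `accumulate` are hypothetical local copies of the shapes in the hunk above, and the three per-step usage records are invented.

```typescript
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  reasoningTokens?: number;
}

// Adds two optional counts; stays undefined only when neither side reported one.
const add = (a?: number, b?: number): number | undefined =>
  a == null && b == null ? undefined : (a ?? 0) + (b ?? 0);

// Folds one per-step usage record into the running cumulative usage.
function accumulate(acc: Usage | undefined, step: Usage | undefined): Usage | undefined {
  if (!acc) return step;
  if (!step) return acc;
  return {
    inputTokens: add(acc.inputTokens, step.inputTokens),
    outputTokens: add(acc.outputTokens, step.outputTokens),
    totalTokens: add(acc.totalTokens, step.totalTokens),
    reasoningTokens: add(acc.reasoningTokens, step.reasoningTokens),
  };
}

// Invented per-step usage for a three-step agent turn; the last step reports
// no reasoning tokens at all.
const steps: Usage[] = [
  { inputTokens: 500, outputTokens: 100, totalTokens: 600, reasoningTokens: 30 },
  { inputTokens: 520, outputTokens: 80, totalTokens: 600, reasoningTokens: 10 },
  { inputTokens: 640, outputTokens: 40, totalTokens: 680 },
];

let running: Usage | undefined;
const naive: number[] = []; // output count if each step's usage replaced the last
const cumulative: number[] = []; // output count when the running sum is sent

for (const step of steps) {
  running = accumulate(running, step);
  naive.push(step.outputTokens ?? 0);
  cumulative.push(running?.outputTokens ?? 0);
}

console.log({ naive, cumulative, running });
```

For these inputs the per-step series is 100, 80, 40 (falling at every boundary), while the cumulative series is 100, 180, 220 and the final running usage sums to 1660 input, 220 output, 1880 total and 40 reasoning tokens. Whether a provider's `totalUsage` matches that sum exactly is taken from the commit's description rather than verified here.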