Compare commits

..

19 Commits

Author SHA1 Message Date
claude code agent 227
df81851eb3 fix(ai-chat): export the first unsaved turn (#174)
The "Copy chat" button was hidden during a brand-new chat's very first
turn: both the `canExport` gate and the `handleCopy` early-return required
an `activeChatId` AND persisted `messageRows`, neither of which exists yet
while the first turn is streaming or after it was interrupted before any
row was persisted.

Decouple the export gate from persisted state. ChatThread now reports a
reactive `onLiveContentChange(messages.length > 0)` signal (the live
snapshot lives in a non-reactive ref, so a separate reactive flag is
needed to re-render the button); the parent keeps it in `hasLiveContent`
and exports whenever there is anything on screen OR persisted. `handleCopy`
passes a `"unsaved"` placeholder chat id when none exists yet, and the
live-first builder serializes the on-screen thread WYSIWYG.

Builds on #160 (WYSIWYG export); covers the first-turn edge case that was
explicitly out of scope there.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 03:52:03 +03:00
claude code agent 227
4597183a1e fix(ai-chat): WYSIWYG Copy chat export keeps the on-screen partial reply (#160)
"Copy chat" built the Markdown from persisted rows plus a live tail that was
only included while isStreaming. When a turn was interrupted (dropped stream /
"Lost connection" banner) isStreaming flipped false, the live tail was dropped,
and the partial assistant reply visible on screen — whose row often never
persisted — vanished from the export, leaving only the user messages.

- buildChatMarkdown is now live-first: the on-screen `live` messages ARE the
  document. Each is matched to a persisted row by id to enrich it with token
  usage / error / timestamp; authoritative usage/error already on the live
  message win over the row. When `live` is empty it falls back to the persisted
  rows (old format preserved). Only the tail assistant is flagged "still
  generating", and only when it is genuinely the streaming tail — so the
  status==="submitted" window (tail is the user message) never mislabels the
  previous, completed answer.
- The on-screen banner (classified error / dropped connection / manual stop) is
  flattened to a string in ChatThread, mirrored into liveStateRef alongside the
  messages/isStreaming snapshot, and appended at the end of the export.
- handleCopy maps the live messages and passes live/rows/isStreaming/banner.

Tests: chat-markdown rewritten for the live/enrichment/fallback/banner paths and
the submitted-window regression (26); full ai-chat suite green (186). tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 03:42:43 +03:00
claude_code
5aa199660d fix(ai-chat): keep thinking dots visible between streamed steps
showTypingIndicator hid the standalone thinking dots for any non-empty
trailing text part, so during the pause after the model finished an
intermediate narration and before its next step (e.g. a tool call) the
UI looked frozen. Suppress the dots only while the text part is still
streaming: a finalized ("done") trailing text part on an in-flight turn
now shows the dots again, matching the function's documented intent.

- message-list: guard the text branch with state !== "done" (AI SDK v6
  TextUIPart.state); stateless parts keep their previous behavior
- show-typing-indicator.test: add done -> shown and streaming -> hidden cases

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 00:34:22 +03:00
claude_code
bf2ebb9d47 fix(ai-chat): increase bottom margin for typing indicator name
The name label was crowding the bouncing dots when displayed. Adding extra bottom margin (mb={8}) gives the dots room and improves readability. The change only applies when the name is shown.
2026-06-25 00:21:53 +03:00
claude_code
ad90e2290e Merge branch 'develop' of https://gitea.vvzvlad.xyz/vvzvlad/gitmost into develop 2026-06-25 00:11:52 +03:00
e262f1695c Merge pull request 'fix(ai-chat): recycle keep-alive sockets + retry pre-response resets (#175)' (#179) from fix/ai-stream-reset-resilience into develop
Reviewed-on: #179
2026-06-25 00:11:50 +03:00
claude code agent 227
c065e26d14 refactor(ai): retry outside instrumentation + retry-exhaustion test (#179 review)
- Invert the transport layers so the pre-response retry is OUTERMOST and the
  provider-HTTP instrumentation is INNER. Before, the retry lived inside
  createStreamingFetch (under the instrumentation), so a reset the retry
  recovered from logged only a clean "OK status=200" — the
  "PRE-RESPONSE FAILED ... ECONNRESET ... idleSincePrevCall" signal went blind
  exactly when the fix works, and AI_STREAM_KEEPALIVE_MS couldn't be tuned from
  prod data. Now createStreamingFetch is the dispatcher-bound BASE (no retry) and
  a new withPreResponseRetry() wraps it; ai.service composes
  withPreResponseRetry(createInstrumentedFetch('AiService:provider-http',
  createStreamingFetch())), so every attempt — including recovered resets — flows
  through the instrumentation. (Also expresses the keepAlive-config vs retry-
  behavior boundary structurally, per review #3.)
- Add the retry-exhaustion test: a server that resets EVERY connection, asserting
  the call rejects with a retryable connection error AND exactly
  PRE_RESPONSE_CONNECT_RETRIES + 1 (= 3) requests reached the server — pinning the
  bound and that the final error propagates (guards an off-by-one / infinite loop
  / swallowed error). Existing happy-retry + abort tests moved onto
  withPreResponseRetry.

Verified on the stand: a normal turn still streams (reasoning + finish) and the
provider-HTTP telemetry still logs. server tsc + ai/mcp specs green (30).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 00:10:40 +03:00
claude_code
91e7335d54 refactor(ai-chat): drop thinking-token text from typing indicator
The live typing placeholder now shows only the bouncing dots; the
"Thinking… · N tokens" line is removed. Clean up the dead plumbing:

- typing-indicator: remove thinkingTokens prop, thinkingLine and the
  <Text> line; keep the animated dots and the dimmed name label
- message-list: remove tailThinkingTokens helper, the thinkingTokens
  prop pass-through, and the now-unused liveTurnTokens import
- delete tail-thinking-tokens.test.ts (tested the removed helper)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 00:02:44 +03:00
claude code agent 227
b0faa2fe32 fix(ai-chat): recycle keep-alive sockets + retry pre-response resets (#175)
The real cause of the long-task "Lost connection to the AI provider" — the
earlier 300s-timeout fix (#176) was the wrong layer. The provider-HTTP telemetry
on the user's deploy shows the failures are PRE-RESPONSE `read ECONNRESET` ~500ms
in (not a 300s/15min timeout), correlated with idleSincePrevCall ~42s and large
bodies; and crucially a retry of the SAME request often succeeds. A direct probe
to the real z.ai endpoint does NOT reset (113KB bodies and a 45s-idle keep-alive
reuse both succeed), and another agent (opencode) runs fine from the same infra —
so the provider is healthy and the egress network is usable. The difference is
the transport: undici's keep-alive pool REUSES a socket that the deployment's
egress (NAT / firewall / conntrack) silently dropped during a long idle gap, so
the next request resets pre-response.

Fix (brings gitmost in line with clients that don't reuse stale sockets):
- Keep-alive recycling: the streaming dispatcher (chat fetch AND the external-MCP
  dispatcher, via the shared streamingDispatcherOptions) now sets
  keepAliveTimeout + keepAliveMaxTimeout to a 10s recycle window
  (AI_STREAM_KEEPALIVE_MS), so a connection idle longer than that is closed
  instead of reused — a long-gap step opens a fresh connection. keepAliveMaxTimeout
  also caps a server-advertised keep-alive so the provider can't widen the window.
- Pre-response connection retry: createStreamingFetch retries a connection-level
  reset (ECONNRESET / UND_ERR_SOCKET / ECONNREFUSED / EPIPE / *_TIMEOUT) on a
  fresh connection up to 2 times. This is SAFE because fetch() only rejects before
  the Response resolves — a started stream is never replayed; an abort (client
  disconnect) is never retried.

Tests: ai-streaming-fetch.spec — keep-alive options, streamKeepAliveMs env,
isRetryableConnectError, and a server that resets the first connection so the
retry must land on a fresh one (+ aborted requests are not retried). Verified on
the stand that a normal turn still streams (reasoning + text + finish) through the
new transport. server tsc + ai/mcp specs green.

Note: root cause is the deployment's egress dropping idle connections (Traefik is
inbound-only); this makes the app resilient to it. AI_STREAM_KEEPALIVE_MS can be
lowered if the egress drops faster than ~10s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 23:51:17 +03:00
claude_code
d1fbcc1bfa Merge pull request 'feat(ai-chat): surface reasoning from openai-compatible providers (z.ai/GLM) (#175)' (#177) from feat/reasoning-openai-compatible into develop 2026-06-24 23:19:15 +03:00
claude code agent 227
6edbbab43b refactor(ai): unify provider-settings allowlist + stronger chatApiStyle tests (#177 review)
Addresses the second #177 review:

- Architecture (the silent allowlist drift): the writable provider-setting keys
  were maintained by hand in two TS-uncheckable places — the key-loop in
  ai-settings.service and the SQL ALLOWED list in the generic workspace repo (a
  miss there silently dropped a field on persist, exactly what bit chatApiStyle).
  Introduce one typed source of truth PROVIDER_SETTINGS_KEYS in ai.types
  (`satisfies readonly (keyof AiProviderSettings)[]`), have the service consume
  it, and keep the repo's own copy (it can't import AI types) guarded by a parity
  test so any future drift fails in CI.
- Tests:
  - ai.service.include-usage.spec: mocks @ai-sdk/openai-compatible and asserts the
    factory is called with { includeUsage: true, baseURL, apiKey, fetch, name } —
    `.provider` alone could not catch a dropped includeUsage (the token-usage
    zeroing regression); also asserts the 'openai' style does NOT use it.
  - ai-provider-settings-keys.spec: the allowlist parity check + DTO validation
    for chatApiStyle (@IsIn accepts both values, rejects garbage, optional).
- CHANGELOG: [Unreleased] entries for the new "Protocol" / chatApiStyle setting
  and the default provider change (openai -> openai-compatible). (#175, #177)

server + client tsc clean; 42 ai/settings specs green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 23:18:31 +03:00
claude code agent 227
59190148db feat(ai-chat): explicit chatApiStyle selector to surface reasoning (#175)
Rebuilt on develop (after #176) and reworked per review: instead of inferring the
provider from baseUrl (`if (baseUrl)`), the admin picks the chat provider
EXPLICITLY via a new `chatApiStyle` ('openai-compatible' | 'openai'), mirroring
the existing sttApiStyle. A custom baseURL can front real OpenAI too, so the
heuristic was fragile.

Why reasoning was missing: glm-5.2 (and DeepSeek etc.) stream their thinking as
`reasoning_content`, but the official @ai-sdk/openai provider does not map that
field. 'openai-compatible' uses @ai-sdk/openai-compatible, which does — so
reasoning parts now stream (verified live: reasoning-start/delta/end appear, and
disappear when set to 'openai').

- Default (unset) = 'openai-compatible', so existing openai+baseUrl workspaces
  surface reasoning with no admin action. No DB migration (field lives in the
  settings.ai.provider JSON blob).
- includeUsage: true on the openai-compatible model — without it the provider
  omits streamed usage, zeroing the live token counter / reasoning-token
  metadata. The official provider always sent it; this keeps parity. (Confirmed
  live: usage.totalTokens present.)
- openai-compatible has no default endpoint, so with no baseURL (real OpenAI, or
  a role's cross-driver override that cleared it) it falls back to the official
  provider.

Plumbing: ai.types (ChatApiStyle / CHAT_API_STYLES + AiProviderSettings /
MaskedAiSettings), update DTO (@IsIn), ai-settings.service (resolve / getMasked /
update allowlist), workspace.repo updateAiProviderSettings ALLOWED (the second,
SQL-level allowlist the review missed — without it the field never persisted),
ai.service selector. Client: ai-settings-service types + a Protocol <Select> in
the chat section + i18n (en/ru). Scope is chat-only (embeddings don't stream
reasoning; STT already has sttApiStyle).

Tests: ai.service.spec — 4 cases (openai-compatible+baseURL, openai+baseURL,
default-unset, openai-compatible-without-baseURL fallback). Verified on the stand:
default streams reasoning + usage; 'openai' drops reasoning; the setting
round-trips. server + client tsc clean; 36 ai/settings specs green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 22:58:15 +03:00
80a4b5a1b0 Merge pull request 'fix(ai-chat): don't sever long agent turns at undici's 300s stream timeout (#175)' (#176) from fix/ai-stream-undici-timeout into develop
Reviewed-on: #176
2026-06-24 22:34:18 +03:00
claude code agent 227
da15b55786 refactor(ai): address PR #176 review — finite-timeout wording, env doc, tests, permanent provider-http module
- Wording: every comment now says the stream timeouts are RAISED to a
  generous-but-finite ~15-min silence timeout, not "disabled (0)" (the stale
  comments contradicted the code, which uses AI_STREAM_TIMEOUT_MS, default
  900000ms).
- Architecture (the load-bearing-temporary trap): the streaming fetch reached
  the chat provider only by riding the "temporary DIAGNOSTIC" telemetry, so
  deleting the telemetry by its own label would silently revert the timeout fix.
  Legitimize it: rename ai-http-diagnostics.ts -> ai-provider-http.ts,
  createDiagnosticFetch -> createInstrumentedFetch, field aiDiagnosticFetch ->
  aiProviderFetch, drop the "temporary" labels, and document the chat transport
  (streaming fetch + instrumentation) as one intentional construct.
- Docs: AI_STREAM_TIMEOUT_MS added to .env.example next to AI_EMBEDDING_TIMEOUT_MS.
- Tests:
  - ai-provider-http.spec: createInstrumentedFetch delegates to the injected
    baseFetch with the same input/init, returns the Response untouched, rethrows
    the error, and defaults to global fetch — covering the baseFetch seam.
  - ai-streaming-fetch.spec: the delayed-server test is now LOAD-BEARING — with
    AI_STREAM_TIMEOUT_MS set below the 1.5s server delay the call actually rejects
    (a lost dispatcher -> global 300s default would NOT), proving the configured
    dispatcher is wired; plus the default-timeout happy path.

server tsc clean; ai-streaming-fetch / ai-provider-http / ai.service / mcp-servers
/ ai-error specs green (41).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 22:31:58 +03:00
claude code agent 227
a14560c7c9 fix(ai-chat): raise undici's 300s stream timeout for long agent turns (#175)
Long research turns failed mid-task with "Lost connection to the AI provider".
Node's global fetch (undici) defaults BOTH headersTimeout and bodyTimeout to
300_000ms, and the chat provider + the external-MCP dispatcher both ran on it
with no override, so:
  - the z.ai chat stream dropped when a late step's huge accumulated context
    pushed the model's time-to-first-token past 5 min (the model reasons
    server-side with NO streamed reasoning, so the connection is silent until the
    first answer token — reproduced: even a trivial glm-5.2 query has a ~4-8s
    first-chunk gap; a long run reaches 400k+-token steps), or a reasoning model
    paused >5 min between chunks (bodyTimeout);
  - the crawl4ai SSE transport, held open across the whole turn, dropped when it
    idled >5 min between tool calls.

Fix: a dedicated undici dispatcher whose stream timeouts are raised to a
generous-but-FINITE silence timeout (default 15 min, AI_STREAM_TIMEOUT_MS) on
each path. NOT disabled (0): that would let a genuinely hung provider — with the
client still connected — hang forever, since the turn's abortSignal only fires on
client disconnect. The timeout bounds SILENCE (time-to-first-byte and the gap
BETWEEN chunks), NOT total turn duration, so an arbitrarily long turn that keeps
streaming is never cut; only a stream quiet for >15 min is treated as a hang.
  - ai-streaming-fetch.ts: createStreamingFetch() + streamTimeoutMs() /
    streamingDispatcherOptions() (the shared, configurable timeout).
  - ai.service: the chat provider fetch is createStreamingFetch(), wrapped by the
    existing passive ECONNRESET telemetry (createDiagnosticFetch gained an
    optional baseFetch) so the telemetry observes the SAME transport.
  - mcp-clients: the SSRF-pinned Agent uses streamingDispatcherOptions().

Investigation: reproduced the transport mechanism against the real z.ai endpoint
(a 1ms headersTimeout throws UND_ERR_HEADERS_TIMEOUT — the exact drop) and ran
the actual research agent to a ~428k-token context. Verified the fixed path
streams cleanly live (glm-5.2 turns finish; telemetry confirms the streaming
fetch is in use).

Tests: ai-streaming-fetch.spec (default 15m + env override + invalid fallback +
both-timeouts + streams a delayed response); ai-http-diagnostics + ai/mcp specs
green. server tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 22:09:10 +03:00
claude_code
4cc8df836f chore(ai): passive z.ai provider HTTP telemetry (#175)
Investigate the intermittent (~20-30%) long-turn failure
"Lost connection to the AI provider" = AI_RetryError / read ECONNRESET
on the gitmost->z.ai link (browser-agnostic, mid-turn). Pure
instrumentation, no behavior change:

- ai-http-diagnostics.ts: a passive fetch wrapper injected into the
  OpenAI-compatible (z.ai) client. Per provider HTTP call it logs
  time-to-headers/status on success, and on a pre-response rejection the
  latency, error code/cause, request-body size and idle-gap since the
  previous call. The Response is returned untouched (streaming intact),
  errors rethrown unchanged; no retry/timeout/dispatcher.
- ai.service.ts: wire the instrumented fetch into the openai case only.

Lets us classify the reset as connection-phase vs mid-stream before
choosing a fix, without repeating the reverted RetryAgent (#140).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 21:24:05 +03:00
claude_code
04a418e1a6 Merge pull request 'fix(mcp): tool allowlist stored/read as jsonb string, not array (edit-page crash + allowlist not enforced)' (#172) from fix/mcp-tool-allowlist-jsonb-shape into develop 2026-06-24 17:14:56 +03:00
claude code agent 227
255bc06883 fix(mcp): tool allowlist stored/read as jsonb string, not array
Opening the edit form for an MCP server that has a saved tool allowlist crashed
the whole settings page (`TypeError: Ke.map is not a function` in Mantine) — and,
worse, the allowlist was silently NOT enforced. Both stem from one root cause:
the `tool_allowlist` jsonb column round-trips as a JSON STRING, not an array.

Root cause: `jsonbArray` bound `JSON.stringify(value)` (already a JSON string)
straight to a `::jsonb` cast. node-postgres infers the param type as jsonb and
JSON-stringifies it a SECOND time, so the column stored a jsonb STRING SCALAR
(`"[\"a\"]"`, jsonb_typeof = string) instead of an array. On read the driver
hands back the JS string `'["a"]'`. Then:
  - the edit form's TagsInput called `.map` on a string -> page crash;
  - mcp-clients did `Array.isArray(allow)` -> false for a string -> fell through
    to "no restriction" and exposed ALL of the server's tools.

Fix (both verified on the stand):
- Write: `jsonbArray` casts `::text::jsonb` so the param is bound as text (sent
  verbatim) and parsed into a real jsonb array. New rows now store
  jsonb_typeof=array.
- Read: `normalizeRow` runs every fetched row through `parseToolAllowlist`, which
  returns `string[] | null` for both shapes (already-array passes through; a JSON
  string is parsed; null/invalid -> null). This REPAIRS existing double-encoded
  rows on read, so the UI and the allowlist enforcement work without a data
  migration. Applied in findById / listByWorkspace / listEnabled.
- Client: defensive `Array.isArray(...) ? ... : []` guard in the form so a bad
  shape can never take the settings page down again.

Tests: ai-mcp-server.repo.spec (8 cases for parseToolAllowlist — array, the
JSON-string read, null, empty, non-array json, unparseable, non-string elements,
non-string primitive). mcp-servers-to-view + mcp-namespacing still green.
Verified live: an old double-encoded row now reads as an array; a newly created
server stores jsonb_typeof=array.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 17:11:50 +03:00
8c06553b49 Merge pull request 'test(footnotes): cover footnoteWarnings import plumbing + doc fixes (#169 second review)' (#171) from fix/footnote-review-1227-followup into develop
Reviewed-on: #171
2026-06-24 16:46:23 +03:00
30 changed files with 1672 additions and 303 deletions

View File

@@ -136,6 +136,19 @@ MCP_DOCMOST_PASSWORD=
# A slow/hung embeddings endpoint fails after this and the batch continues.
# AI_EMBEDDING_TIMEOUT_MS=120000
# Silence timeout (ms) for streaming chat/agent AI calls AND external-MCP traffic.
# Bounds time-to-first-byte and the gap BETWEEN chunks (NOT the total turn length),
# so an arbitrarily long turn that keeps streaming is never cut. Finite so a hung
# provider is eventually broken instead of leaking forever. Default 900000 (15 min).
# AI_STREAM_TIMEOUT_MS=900000
# Keep-alive recycle window (ms) for streaming chat/agent AI + external-MCP calls.
# A pooled connection idle longer than this is closed instead of reused, so a
# NAT / egress firewall / reverse proxy that silently drops idle connections
# cannot poison a reused socket into a PRE-RESPONSE `read ECONNRESET`. Lower it if
# your egress drops idle connections faster than ~10s. Default 10000 (10 s).
# AI_STREAM_KEEPALIVE_MS=10000
# --- Anonymous public-share AI assistant ---
# Opt-in per workspace (AI settings -> "public share assistant"; off by default).
# When enabled, anonymous visitors of a published share can ask an AI about that

View File

@@ -25,9 +25,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
flagging dangling references, empty or duplicate definitions, and `[^id]`
markers inside table rows, so an agent can fix its own markup. The page is
still created; the field is omitted when there are no problems. (#166)
- **AI chat "Protocol" setting (`chatApiStyle`).** A new admin choice in AI
settings for the `openai` driver: `openai-compatible` (default) routes chat
through `@ai-sdk/openai-compatible`, which surfaces a provider's streamed
reasoning (`reasoning_content` → reasoning parts) for z.ai/GLM, DeepSeek,
OpenRouter, etc.; `openai` uses the official provider (real-OpenAI
reasoning-model request shaping). Chosen explicitly rather than inferred from
the base URL, since a custom URL can front real OpenAI too. (#175, #177)
### Changed
- **AI chat default provider is now `openai-compatible` (reasoning surfaced).**
For the `openai` driver the chat provider defaults to the openai-compatible
implementation, so a workspace pointing at z.ai/GLM/DeepSeek now streams the
model's reasoning out of the box. An endpoint that is real OpenAI behind a
custom base URL should set the new `chatApiStyle` "Protocol" to `openai`. (#177)
- **Footnotes now reuse (Pandoc semantics).** Multiple `[^a]` references to the
same id are ONE footnote — one number, one definition, several back-references
— instead of being renamed to `a__2`, `a__3`. Duplicate `[^a]:` definitions are

View File

@@ -1307,5 +1307,9 @@
"Page tree (child pages, recursive)": "Page tree (child pages, recursive)",
"Render the full nested tree of all descendant pages": "Render the full nested tree of all descendant pages",
"Showing {{count}} subpages_one": "Showing {{count}} subpage",
"Showing {{count}} subpages_other": "Showing {{count}} subpages"
"Showing {{count}} subpages_other": "Showing {{count}} subpages",
"Protocol": "Protocol",
"How chat requests are sent and how reasoning is surfaced": "How chat requests are sent and how reasoning is surfaced",
"OpenAI-compatible (surfaces reasoning)": "OpenAI-compatible (surfaces reasoning)",
"OpenAI (official)": "OpenAI (official)"
}

View File

@@ -1160,5 +1160,9 @@
"Render the full nested tree of all descendant pages": "Показать полное вложенное дерево всех дочерних страниц",
"Showing {{count}} subpages_one": "Показано {{count}} подстраница",
"Showing {{count}} subpages_few": "Показано {{count}} подстраницы",
"Showing {{count}} subpages_many": "Показано {{count}} подстраниц"
"Showing {{count}} subpages_many": "Показано {{count}} подстраниц",
"Protocol": "Протокол",
"How chat requests are sent and how reasoning is surfaced": "Как отправляются запросы чата и как показывается reasoning",
"OpenAI-compatible (surfaces reasoning)": "OpenAI-совместимый (показывает reasoning)",
"OpenAI (official)": "OpenAI (официальный)"
}

View File

@@ -80,17 +80,31 @@ function computeInitialGeom() {
Math.min(DEFAULT_HEIGHT, window.innerHeight - 2 * EDGE_MARGIN),
);
const left = Math.max(EDGE_MARGIN, window.innerWidth - width - 24);
const maxTop = Math.max(EDGE_MARGIN, window.innerHeight - height - EDGE_MARGIN);
const maxTop = Math.max(
EDGE_MARGIN,
window.innerHeight - height - EDGE_MARGIN,
);
const top = Math.min(60, maxTop);
return { left, top, width, height };
}
// Clamp a geometry so the window stays within the current viewport.
function clampGeom(g: { left: number; top: number; width: number; height: number }) {
function clampGeom(g: {
left: number;
top: number;
width: number;
height: number;
}) {
const effWidth = Math.max(g.width, MIN_WIDTH);
const effHeight = Math.max(g.height, MIN_HEIGHT);
const maxLeft = Math.max(EDGE_MARGIN, window.innerWidth - effWidth - EDGE_MARGIN);
const maxTop = Math.max(EDGE_MARGIN, window.innerHeight - effHeight - EDGE_MARGIN);
const maxLeft = Math.max(
EDGE_MARGIN,
window.innerWidth - effWidth - EDGE_MARGIN,
);
const maxTop = Math.max(
EDGE_MARGIN,
window.innerHeight - effHeight - EDGE_MARGIN,
);
return {
...g,
left: Math.min(Math.max(EDGE_MARGIN, g.left), maxLeft),
@@ -151,9 +165,14 @@ export default function AiChatWindow() {
// Live snapshot of the active thread's useChat state, kept up to date by
// ChatThread. Lets the export include the in-progress (not-yet-persisted)
// streaming turn. A ref avoids re-rendering this window on every token.
const liveThreadRef = useRef<{ messages: UIMessage[]; isStreaming: boolean }>({
const liveThreadRef = useRef<{
messages: UIMessage[];
isStreaming: boolean;
banner: string | null;
}>({
messages: [],
isStreaming: false,
banner: null,
});
// Live turn-token total (reasoning + output) for the in-flight turn, pushed up
@@ -161,6 +180,12 @@ export default function AiChatWindow() {
// `null` means no turn is in flight -> the badge falls back to the persisted
// context size below.
const [liveTurnTokens, setLiveTurnTokens] = useState<number | null>(null);
// Whether the on-screen thread currently holds at least one message. Reported
// reactively by ChatThread (the live snapshot lives in a non-reactive ref). This
// lets the "Copy chat" button stay available for a brand-new, not-yet-persisted
// chat whose first turn is in flight or was interrupted — that case has no
// persisted rows yet, so a persisted-rows-only gate would hide the button (#174).
const [hasLiveContent, setHasLiveContent] = useState(false);
// The page the user is currently viewing. AiChatWindow lives in a pathless
// parent layout route, so useParams() can't see :pageSlug. Match the full
@@ -185,17 +210,21 @@ export default function AiChatWindow() {
// The invalidate closures are passed inline: `onTurnFinished` is read live by
// useChat's onFinish (never in an effect dep array), so their identity does not
// matter — no memoization ceremony needed.
const { threadKey, waitingForHistory, onTurnFinished, cancelPendingAdoption } =
useChatSession({
activeChatId,
setActiveChatId,
chats,
messagesLoading,
onInvalidateChatList: () =>
queryClient.invalidateQueries({ queryKey: AI_CHATS_RQ_KEY }),
onInvalidateChatMessages: (id) =>
queryClient.invalidateQueries({ queryKey: AI_CHAT_MESSAGES_RQ_KEY(id) }),
});
const {
threadKey,
waitingForHistory,
onTurnFinished,
cancelPendingAdoption,
} = useChatSession({
activeChatId,
setActiveChatId,
chats,
messagesLoading,
onInvalidateChatList: () =>
queryClient.invalidateQueries({ queryKey: AI_CHATS_RQ_KEY }),
onInvalidateChatMessages: (id) =>
queryClient.invalidateQueries({ queryKey: AI_CHAT_MESSAGES_RQ_KEY(id) }),
});
// startNewChat/selectChat set the public atom; the hook's render-phase
// reconciler handles the remount when activeChatId actually CHANGES. But
@@ -231,13 +260,23 @@ export default function AiChatWindow() {
() => chats?.items?.find((c) => c.id === activeChatId) ?? null,
[chats, activeChatId],
);
const canExport = !!activeChatId && !!messageRows && messageRows.length > 0;
// Export is available when there is anything to export: either persisted rows
// for the active chat, OR a live on-screen thread with at least one message.
// The live arm covers a brand-new chat whose first turn is streaming or was
// interrupted before the server persisted any row (#174); the persisted arm is
// the steady-state path for an already-saved chat (#160).
const canExport =
hasLiveContent ||
(!!activeChatId && !!messageRows && messageRows.length > 0);
// The role to display in the header and as the assistant's name. Prefer the
// persisted role of an existing chat (chat-list JOIN); fall back to the role
// picked via a card click for a brand-new or just-adopted chat. selectChat
// resets selectedRoleId, so this fallback never leaks into an unrelated chat.
const currentRole = useMemo<{ name: string; emoji: string | null } | null>(() => {
const currentRole = useMemo<{
name: string;
emoji: string | null;
} | null>(() => {
if (activeChat?.roleName) {
return { name: activeChat.roleName, emoji: activeChat.roleEmoji ?? null };
}
@@ -249,28 +288,44 @@ export default function AiChatWindow() {
// call) and copy it to the clipboard. The "Copied" notification is the
// feedback.
const handleCopy = useCallback(() => {
if (!activeChatId || !messageRows || messageRows.length === 0) return;
// While the active thread is streaming, the current user message and the
// in-progress assistant reply are NOT yet in messageRows (the persisted
// query is only refetched after the turn finishes). Pull the live tail —
// messages whose id is not among the persisted rows — and append them,
// flagging the streaming assistant message as still generating.
// Export gate. There must be SOMETHING to export — either a live on-screen
// message or a persisted row. A brand-new chat whose first turn is streaming
// or was interrupted has live messages but no persisted rows yet; it still
// exports the on-screen thread WYSIWYG (#174). Only a truly empty chat (no
// live messages and no rows) is non-exportable (the button is hidden too —
// see `canExport`).
const live = liveThreadRef.current;
const rowIds = new Set(messageRows.map((r) => r.id));
const pending = live.isStreaming
? live.messages
.filter((m) => !rowIds.has(m.id))
.map((m) => ({
role: m.role,
parts: (m.parts ?? []) as { type: string; text?: string }[],
generating: m.role === "assistant",
}))
: [];
const hasRows = !!messageRows && messageRows.length > 0;
if (live.messages.length === 0 && !hasRows) return;
// WYSIWYG export: the live on-screen messages ARE the document (so a partial
// reply from an interrupted turn — which never reached the persisted rows —
// is exported just as it appears). The persisted rows enrich each live
// message (token usage / error / timestamp) by id and serve as the fallback
// when the live mirror is empty. The on-screen banner is appended too. See
// issues #160 and #174. `chatId` may be null for a not-yet-saved chat — use a
// placeholder so the header line still renders.
const markdown = buildChatMarkdown({
title: activeChat?.title ?? null,
chatId: activeChatId,
chatId: activeChatId ?? "unsaved",
live: live.messages.map((m) => ({
id: m.id,
role: m.role,
parts: (m.parts ?? []) as { type: string; text?: string }[],
metadata: m.metadata as
| {
usage?: {
inputTokens?: number;
outputTokens?: number;
totalTokens?: number;
reasoningTokens?: number;
};
error?: string;
}
| undefined,
})),
rows: messageRows,
pending,
isStreaming: live.isStreaming,
banner: live.banner,
t,
});
clipboard.copy(markdown);
@@ -351,7 +406,8 @@ export default function AiChatWindow() {
const width = el.offsetWidth;
const height = el.offsetHeight;
setGeom((prev) => {
if (!prev || (prev.width === width && prev.height === height)) return prev;
if (!prev || (prev.width === width && prev.height === height))
return prev;
return { ...prev, width, height };
});
});
@@ -497,11 +553,15 @@ export default function AiChatWindow() {
flash a "0" badge before any token streams in (#151 review). */}
{liveTurnTokens !== null && liveTurnTokens > 0 ? (
<Tooltip label={t("Tokens generated this turn")} withArrow>
<span className={classes.badge}>{formatTokens(liveTurnTokens)}</span>
<span className={classes.badge}>
{formatTokens(liveTurnTokens)}
</span>
</Tooltip>
) : contextTokens > 0 ? (
<Tooltip label={t("Current context size")} withArrow>
<span className={classes.badge}>{formatTokens(contextTokens)}</span>
<span className={classes.badge}>
{formatTokens(contextTokens)}
</span>
</Tooltip>
) : null}
</div>
@@ -515,7 +575,11 @@ export default function AiChatWindow() {
aria-label={t("Copy chat")}
onClick={handleCopy}
>
{clipboard.copied ? <IconCheck size={14} /> : <IconCopy size={14} />}
{clipboard.copied ? (
<IconCheck size={14} />
) : (
<IconCopy size={14} />
)}
</button>
)}
<button
@@ -623,6 +687,7 @@ export default function AiChatWindow() {
onTurnFinished={onTurnFinished}
liveStateRef={liveThreadRef}
onLiveTurnTokens={setLiveTurnTokens}
onLiveContentChange={setHasLiveContent}
/>
)}
</div>

View File

@@ -73,13 +73,25 @@ interface ChatThreadProps {
* "Copy chat" export can include the in-progress, not-yet-persisted
* assistant message. A ref (not state) avoids re-rendering the parent on
* every streamed delta. */
liveStateRef?: MutableRefObject<{ messages: UIMessage[]; isStreaming: boolean }>;
liveStateRef?: MutableRefObject<{
messages: UIMessage[];
isStreaming: boolean;
banner: string | null;
}>;
/** Reports the live turn-token total (reasoning + output) for the in-flight
* turn so the parent can show a header badge that ticks mid-stream. THROTTLED
* here (~8 Hz) so the parent re-renders a handful of times a second, not on
* every streamed delta. Called with `null` when no turn is in flight (the
* parent then reverts the badge to the persisted context size). */
onLiveTurnTokens?: (tokens: number | null) => void;
/** Reports whether the live thread currently holds at least one message, so the
* parent can gate the "Copy chat" button on the on-screen thread rather than on
* the persisted rows alone. This stays truthy for a brand-new, not-yet-saved
* chat the moment its first user message appears — so an interrupted very first
* turn (no persisted rows yet) is still exportable (#174). Called with `false`
* on unmount so a thread torn down by `key` on chat switch can't leave the
* button enabled for the next, possibly empty, chat. */
onLiveContentChange?: (hasContent: boolean) => void;
}
/**
@@ -125,6 +137,7 @@ export default function ChatThread({
onTurnFinished,
liveStateRef,
onLiveTurnTokens,
onLiveContentChange,
}: ChatThreadProps) {
const { t } = useTranslation();
@@ -309,18 +322,49 @@ export default function ChatThread({
if (isStreaming) setStopNotice(null);
}, [isStreaming]);
// Classify the turn error into a heading + detail so the banner names the cause
// (connection reset, timeout, rate limit, context overflow, quota, ...) instead
// of a generic "Something went wrong". Computed here (not only in the JSX) so
// the SAME on-screen banner text can be mirrored into the export (issue #160).
const errorView = error ? describeChatError(error.message ?? "", t) : null;
// The exact banner the user sees under the message list, flattened to a single
// string for the "Copy chat" export so the artifact records the interruption
// WYSIWYG. Mirrors the JSX precedence below: error first, else the stop notice.
const banner = errorView
? errorView.detail
? `${errorView.title}${errorView.detail}`
: errorView.title
: stopNotice === "manual"
? t("Response stopped.")
: stopNotice === "disconnect"
? t("Connection lost — the answer was interrupted.")
: null;
// Mirror the live useChat snapshot into the parent-owned ref so the export
// (handled in AiChatWindow) can include the in-progress streaming turn. The
// cleanup clears the ref on unmount so a thread torn down by `key` on chat
// switch can't leak its (possibly still-streaming) tail into the next chat's
// export before the new thread's effect repopulates the ref.
// (handled in AiChatWindow) can include the in-progress streaming turn AND the
// on-screen banner. The cleanup clears the ref on unmount so a thread torn down
// by `key` on chat switch can't leak its (possibly still-streaming) tail into
// the next chat's export before the new thread's effect repopulates the ref.
useEffect(() => {
if (!liveStateRef) return;
liveStateRef.current = { messages, isStreaming };
liveStateRef.current = { messages, isStreaming, banner };
return () => {
liveStateRef.current = { messages: [], isStreaming: false };
liveStateRef.current = { messages: [], isStreaming: false, banner: null };
};
}, [liveStateRef, messages, isStreaming]);
}, [liveStateRef, messages, isStreaming, banner]);
// Reactively report "the live thread has content" to the parent. `liveStateRef`
// above is a ref (deliberately non-reactive so streaming deltas don't re-render
// the parent), so the export button needs a SEPARATE reactive signal to flip on
// for a not-yet-persisted chat. Keyed on the boolean only — identical values are
// a no-op setState in the parent, so this does not add per-delta re-renders.
const hasLiveContent = messages.length > 0;
useEffect(() => {
if (!onLiveContentChange) return;
onLiveContentChange(hasLiveContent);
return () => onLiveContentChange(false);
}, [onLiveContentChange, hasLiveContent]);
// Report the live turn-token total to the parent header badge, THROTTLED to
// ~8 Hz so the parent re-renders a few times a second instead of on every
@@ -343,8 +387,7 @@ export default function ChatThread({
return;
}
const tail = messages[messages.length - 1];
const live =
tail?.role === "assistant" ? liveTurnTokens(tail) : null;
const live = tail?.role === "assistant" ? liveTurnTokens(tail) : null;
const total = live ? live.reasoning + live.output : 0;
const now = Date.now();
const MIN_INTERVAL = 120; // ms (~8 Hz)
@@ -370,11 +413,6 @@ export default function ChatThread({
};
}, []);
// Classify the turn error into a heading + detail so the banner names the cause
// (connection reset, timeout, rate limit, context overflow, quota, ...) instead
// of a generic "Something went wrong".
const errorView = error ? describeChatError(error.message ?? "", t) : null;
// A role was picked with autoStart=false: the role is bound but NOTHING was
// sent, so chatId stays null and the empty state would keep showing the cards.
// This flag hides the cards and reveals the composer (with the role indicated)

View File

@@ -6,7 +6,6 @@ import MessageItem from "@/features/ai-chat/components/message-item.tsx";
import TypingIndicator from "@/features/ai-chat/components/typing-indicator.tsx";
import { isToolPart, toolRunState, ToolUiPart } from "@/features/ai-chat/utils/tool-parts.tsx";
import { assistantMessageHasVisibleContent } from "@/features/ai-chat/utils/message-content.ts";
import { liveTurnTokens } from "@/features/ai-chat/utils/count-stream-tokens.ts";
import classes from "@/features/ai-chat/components/ai-chat.module.css";
interface MessageListProps {
@@ -51,7 +50,9 @@ const BOTTOM_THRESHOLD = 40;
* assistant message's LAST part is not live output:
* - the last message is still the user's (assistant hasn't started a row), or
* - the assistant row has no parts yet, or
* - its last part is an empty/whitespace text part, or
* - its last part is an empty/whitespace text part, or a finished ("done")
* text part while the turn continues (the model paused after some narration
* and is thinking about its next step), or
* - its last part is a finished/errored tool (the model is thinking about the
* next step between tool calls).
* It hides only while output is actively rendering: a non-empty streaming text
@@ -65,7 +66,19 @@ export function showTypingIndicator(messages: UIMessage[], isStreaming: boolean)
const lastPart = last.parts[last.parts.length - 1];
if (!lastPart) return true; // assistant row exists but has no parts yet.
// The answer text is actively streaming in -> MessageItem renders it; no dots.
if (lastPart.type === "text" && lastPart.text.trim().length > 0) return false;
// Only while it is STILL streaming, though: once a non-empty text part is
// finalized ("done") but the turn is still in flight, the model has paused
// after some narration and is working on its next step (e.g. about to call a
// tool) — nothing is visibly progressing, so the dots must show. A text part
// without a `state` is treated as still-rendering (kept suppressed); this
// branch only runs while streaming, where live parts always carry a state.
if (
lastPart.type === "text" &&
lastPart.text.trim().length > 0 &&
(lastPart as { state?: "streaming" | "done" }).state !== "done"
) {
return false;
}
// A tool still in flight shows its own Loader in ToolCallCard -> no dots.
if (
isToolPart(lastPart.type) &&
@@ -95,19 +108,6 @@ export function typingIndicatorShowsName(messages: UIMessage[]): boolean {
return !assistantMessageHasVisibleContent(last);
}
/**
* The live thinking-token count to show on the standalone typing indicator. It
* is the reasoning split of the tail assistant message (estimate while streaming,
* authoritative once the server attaches usage at a step/turn boundary). Returns
* 0 when the turn has produced no reasoning yet — the indicator then shows the
* plain "Thinking…" line.
*/
export function tailThinkingTokens(messages: UIMessage[]): number {
const last = messages[messages.length - 1];
if (!last || last.role !== "assistant") return 0;
return liveTurnTokens(last).reasoning;
}
/**
* Scrollable transcript. Auto-scrolls to the newest message as it streams in,
* but only while the user is pinned to the bottom — if they scrolled up to read
@@ -208,7 +208,6 @@ export default function MessageList({
<TypingIndicator
assistantName={assistantName}
showName={typingIndicatorShowsName(messages)}
thinkingTokens={tailThinkingTokens(messages)}
/>
)}
</Stack>

View File

@@ -82,4 +82,14 @@ describe("showTypingIndicator", () => {
showTypingIndicator([msg("assistant", [doneTool, text])], true),
).toBe(false);
});
it("shows while streaming after a text part is finalized (paused before the next step)", () => {
const doneText = { type: "text", text: "Now creating the page in", state: "done" } as unknown as UIMessage["parts"][number];
expect(showTypingIndicator([msg("assistant", [doneText])], true)).toBe(true);
});
it("hides while a text part is actively streaming (state: streaming)", () => {
const streamingText = { type: "text", text: "Now writ", state: "streaming" } as unknown as UIMessage["parts"][number];
expect(showTypingIndicator([msg("assistant", [streamingText])], true)).toBe(false);
});
});

View File

@@ -1,50 +0,0 @@
import { describe, expect, it } from "vitest";
import type { UIMessage } from "@ai-sdk/react";
import { tailThinkingTokens } from "@/features/ai-chat/components/message-list.tsx";
/**
* Pure-helper tests for `tailThinkingTokens`: the live thinking-token count the
* standalone typing indicator shows. It is the reasoning split of the tail
* assistant message (estimate while streaming, authoritative once usage arrives).
*/
const msg = (
role: "user" | "assistant",
parts: unknown[],
metadata?: unknown,
): UIMessage =>
({ id: Math.random().toString(), role, parts, metadata }) as UIMessage;
describe("tailThinkingTokens", () => {
it("is 0 when there are no messages", () => {
expect(tailThinkingTokens([])).toBe(0);
});
it("is 0 when the tail message is the user's", () => {
expect(tailThinkingTokens([msg("user", [{ type: "text", text: "q" }])])).toBe(0);
});
it("is 0 when the assistant has produced no reasoning yet", () => {
expect(
tailThinkingTokens([msg("assistant", [{ type: "text", text: "answer" }])]),
).toBe(0);
});
it("estimates reasoning tokens from streamed reasoning text", () => {
// 8 chars -> 2 tokens.
expect(
tailThinkingTokens([
msg("assistant", [{ type: "reasoning", text: "12345678" }]),
]),
).toBe(2);
});
it("uses authoritative usage.reasoningTokens once the server attaches it", () => {
expect(
tailThinkingTokens([
msg("assistant", [{ type: "reasoning", text: "x" }], {
usage: { outputTokens: 100, reasoningTokens: 42 },
}),
]),
).toBe(42);
});
});

View File

@@ -16,12 +16,6 @@ interface TypingIndicatorProps {
* assistant row above already shows the same name, to avoid a duplicate label.
*/
showName?: boolean;
/**
* Live thinking/reasoning token count for the in-flight turn. When > 0 the
* typing line becomes `Thinking… · {count} tokens` (like Claude Code). Omitted
* / 0 keeps the plain `Thinking…` line.
*/
thinkingTokens?: number;
}
/**
@@ -32,23 +26,20 @@ interface TypingIndicatorProps {
*
* Mirrors the assistant row layout in MessageItem (the dimmed label), so it reads
* as the assistant's bubble taking shape. The dimmed label uses the configured
* identity name when provided (otherwise the generic "AI agent"), while the
* typing line is always the generic "Thinking…" (it never includes the
* role/identity name).
* identity name when provided (otherwise the generic "AI agent"); below it the
* animated dots stand in for the nascent bubble until content arrives.
*/
export default function TypingIndicator({ assistantName, showName = true, thinkingTokens }: TypingIndicatorProps) {
export default function TypingIndicator({ assistantName, showName = true }: TypingIndicatorProps) {
const { t } = useTranslation();
const name = resolveAssistantName(assistantName);
// Show the running thinking-token count only once there is something to count.
const thinkingLine =
thinkingTokens && thinkingTokens > 0
? t("Thinking… · {{count}} tokens", { count: thinkingTokens })
: t("Thinking…");
return (
<Box className={classes.messageRow}>
{showName !== false && (
<Text size="xs" c="dimmed" mb={4}>
// Extra bottom gap (vs MessageItem's mb={4}) gives the small bouncing
// dots room below the name label; without it they crowd the label. Only
// applies when the name is shown — the nameless case spaces fine on its own.
<Text size="xs" c="dimmed" mb={8}>
{name ?? t("AI agent")}
</Text>
)}
@@ -58,9 +49,6 @@ export default function TypingIndicator({ assistantName, showName = true, thinki
<span />
<span />
</span>
<Text size="sm" c="dimmed">
{thinkingLine}
</Text>
</Group>
</Box>
);

View File

@@ -165,7 +165,9 @@ describe("buildChatMarkdown — tool parts", () => {
],
t,
});
expect(md).toContain("**Tool: Ran tool mysteryTool** (`mysteryTool`) — error");
expect(md).toContain(
"**Tool: Ran tool mysteryTool** (`mysteryTool`) — error",
);
expect(md).toContain("**Error:** boom");
});
@@ -307,7 +309,9 @@ describe("buildChatMarkdown — token totals", () => {
row({
role: "assistant",
content: "x",
metadata: { usage: { inputTokens: 3, outputTokens: 4, totalTokens: 99 } },
metadata: {
usage: { inputTokens: 3, outputTokens: 4, totalTokens: 99 },
},
}),
],
t,
@@ -367,125 +371,377 @@ describe("buildChatMarkdown — token totals", () => {
});
});
describe("buildChatMarkdown — pending / in-progress messages", () => {
it("continues the heading numbering after the persisted rows", () => {
// A minimal on-screen (live) message, matching the subset buildChatMarkdown reads.
function live(partial: {
id?: string;
role?: string;
parts?: { type: string; text?: string }[];
metadata?: { usage?: Record<string, number>; error?: string };
}) {
return {
id: partial.id ?? "live-id",
role: partial.role ?? "assistant",
parts: partial.parts ?? [],
metadata: partial.metadata,
};
}
describe("buildChatMarkdown — live (WYSIWYG) source", () => {
it("uses the live messages as the document (what's on screen), numbered from 1", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "persisted" })],
pending: [
{
// Persisted rows hold only the user turn; the assistant reply is live-only.
rows: [row({ id: "u1", role: "user", content: "persisted user" })],
live: [
live({
id: "u1",
role: "user",
parts: [{ type: "text", text: "live question" }],
generating: false,
},
{
parts: [{ type: "text", text: "on-screen user" }],
}),
live({
id: "a1",
role: "assistant",
parts: [{ type: "text", text: "live answer" }],
generating: true,
},
parts: [{ type: "text", text: "on-screen reply" }],
}),
],
isStreaming: false,
t,
});
expect(md).toContain("## 1. You");
expect(md).toContain("## 2. You");
expect(md).toContain("## 3. AI agent");
expect(md).toContain("live question");
expect(md).toContain("live answer");
expect(md).toContain("## 2. AI agent");
expect(md).toContain("on-screen user");
expect(md).toContain("on-screen reply");
// Message count reflects the LIVE document, not rows + live.
expect(md).toContain("- Messages: 2");
});
it("flags a generating assistant pending message as still being generated", () => {
it("captures a partial reply from an interrupted (non-streaming) turn — no 'generating' note", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "persisted" })],
pending: [
{
rows: [row({ id: "u1", role: "user", content: "q" })],
live: [
live({ id: "u1", role: "user", parts: [{ type: "text", text: "q" }] }),
live({
id: "a-live",
role: "assistant",
parts: [{ type: "text", text: "partial reply" }],
generating: true,
},
parts: [{ type: "text", text: "partial plan before the drop" }],
}),
],
isStreaming: false, // the stream dropped — not streaming anymore
banner: "Connection lost — the answer was interrupted.",
t,
});
expect(md).toContain("partial reply");
expect(md).toContain("still being generated");
// The partial assistant answer that was on screen IS in the export.
expect(md).toContain("partial plan before the drop");
// It is NOT flagged still-generating (the turn is over, just interrupted).
expect(md).not.toContain("still being generated");
// The on-screen banner is recorded at the end.
expect(md).toContain("Connection lost — the answer was interrupted.");
});
it("renders a non-generating user pending message without the note", () => {
it("flags ONLY the tail assistant as still generating, and only while streaming", () => {
const streaming = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [],
live: [
live({
id: "a",
role: "assistant",
parts: [{ type: "text", text: "done earlier" }],
}),
live({
id: "u",
role: "user",
parts: [{ type: "text", text: "next q" }],
}),
live({
id: "b",
role: "assistant",
parts: [{ type: "text", text: "streaming now" }],
}),
],
isStreaming: true,
t,
});
// Exactly one "still being generated" note (the tail assistant).
expect(streaming.match(/still being generated/g)?.length).toBe(1);
const idle = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [],
live: [
live({
id: "b",
role: "assistant",
parts: [{ type: "text", text: "final" }],
}),
],
isStreaming: false,
t,
});
expect(idle).not.toContain("still being generated");
});
it("does NOT flag a completed assistant as generating when the streaming tail is a user message", () => {
// The `status === "submitted"` window: the user just sent, isStreaming is
// already true, but the new assistant turn has no message yet so the tail is
// the USER message. The previous assistant answer is complete on screen and
// must not be marked still-generating (WYSIWYG; regression for #160 review).
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "persisted" })],
pending: [
{
rows: [],
live: [
live({
id: "a",
role: "assistant",
parts: [{ type: "text", text: "completed answer" }],
}),
live({
id: "u",
role: "user",
parts: [{ type: "text", text: "my live message" }],
generating: false,
},
parts: [{ type: "text", text: "the new question" }],
}),
],
isStreaming: true,
t,
});
expect(md).toContain("my live message");
expect(md).toContain("completed answer");
expect(md).not.toContain("still being generated");
});
it("includes the pending messages in the metadata message count", () => {
it("emits the heading + note for a streaming tail assistant with empty parts", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [
row({ role: "user", content: "a" }),
row({ role: "assistant", content: "b" }),
],
pending: [
{
role: "user",
parts: [{ type: "text", text: "c" }],
generating: false,
},
{
role: "assistant",
parts: [{ type: "text", text: "d" }],
generating: true,
},
],
t,
});
// 2 persisted rows + 2 pending = 4.
expect(md).toContain("- Messages: 4");
});
it("emits the heading and note for a generating assistant with empty parts", () => {
expect(() =>
buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "persisted" })],
pending: [
{
role: "assistant",
parts: [],
generating: true,
},
],
t,
}),
).not.toThrow();
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "persisted" })],
pending: [
{
role: "assistant",
parts: [],
generating: true,
},
rows: [row({ id: "u1", role: "user", content: "q" })],
live: [
live({ id: "u1", role: "user", parts: [{ type: "text", text: "q" }] }),
live({ id: "a-live", role: "assistant", parts: [] }),
],
isStreaming: true,
t,
});
expect(md).toContain("## 2. AI agent");
expect(md).toContain("still being generated");
});
});
describe("buildChatMarkdown — live enrichment from persisted rows", () => {
it("pulls usage / error / timestamp from the persisted row matched by id", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [
row({
id: "a1",
role: "assistant",
content: "x",
createdAt: "2026-06-22T10:00:00.000Z",
metadata: {
usage: { inputTokens: 10, outputTokens: 5 },
error: "rate limited",
},
}),
],
live: [
// Same id as the persisted row, but no usage/error/timestamp on the live msg.
live({
id: "a1",
role: "assistant",
parts: [{ type: "text", text: "reply" }],
}),
],
isStreaming: false,
t,
});
expect(md).toContain("reply");
// Token footer + total come from the enriched row.
expect(md).toContain("_Tokens — in: 10, out: 5, total: 15_");
expect(md).toContain("- Total tokens: 15");
expect(md).toContain("**⚠️ Error:** rate limited");
// The persisted timestamp is carried into the export.
expect(md).toContain("<!-- 2026-06-22T10:00:00.000Z -->");
});
it("prefers authoritative usage already on the live message over the row's", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [
row({
id: "a1",
role: "assistant",
content: "x",
metadata: {
usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
},
}),
],
live: [
live({
id: "a1",
role: "assistant",
parts: [{ type: "text", text: "reply" }],
metadata: {
usage: { inputTokens: 100, outputTokens: 50, totalTokens: 150 },
},
}),
],
isStreaming: false,
t,
});
// The live (authoritative, freshest) usage wins, not the stale row usage.
expect(md).toContain("- Total tokens: 150");
expect(md).not.toContain("- Total tokens: 2");
});
it("a current-turn live message with no matching row renders without a footer", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ id: "u1", role: "user", content: "q" })],
live: [
live({ id: "u1", role: "user", parts: [{ type: "text", text: "q" }] }),
live({
id: "a-live",
role: "assistant",
parts: [{ type: "text", text: "fresh reply" }],
}),
],
isStreaming: false,
t,
});
expect(md).toContain("fresh reply");
// No persisted row for the live assistant -> no token footer, no timestamp.
expect(md).not.toContain("_Tokens —");
expect(md).not.toContain("<!-- undefined -->");
});
});
describe("buildChatMarkdown — fallback + banner", () => {
it("falls back to the persisted rows when there are no live messages", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [
row({ role: "user", content: "from rows" }),
row({
role: "assistant",
content: "answer",
metadata: { usage: { inputTokens: 4, outputTokens: 6 } },
}),
],
live: [], // empty live mirror -> fallback path
isStreaming: false,
t,
});
expect(md).toContain("## 1. You");
expect(md).toContain("## 2. AI agent");
expect(md).toContain("from rows");
expect(md).toContain("- Messages: 2");
expect(md).toContain("- Total tokens: 10");
});
it("appends the on-screen banner once, after the messages", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "q" })],
live: [
live({ id: "u", role: "user", parts: [{ type: "text", text: "q" }] }),
],
isStreaming: false,
banner: "Rate limit reached — try again shortly.",
t,
});
expect(md).toContain("_⚠️ Rate limit reached — try again shortly._");
// Banner comes after the (only) message block.
expect(md.indexOf("Rate limit reached")).toBeGreaterThan(
md.indexOf("## 1."),
);
});
it("omits the banner block when there is no banner", () => {
const md = buildChatMarkdown({
title: "t",
chatId: "c",
rows: [row({ role: "user", content: "q" })],
live: [
live({ id: "u", role: "user", parts: [{ type: "text", text: "q" }] }),
],
isStreaming: false,
banner: null,
t,
});
expect(md).not.toContain("_⚠️");
});
});
// #174: a brand-new, not-yet-persisted chat whose first turn is streaming (or was
// interrupted) has live messages but NO persisted rows yet, and its chat id is not
// known (the caller passes a placeholder). The export must still capture the
// on-screen thread WYSIWYG from the live messages alone.
describe("buildChatMarkdown — first-turn export with no persisted base (#174)", () => {
it("builds the document from live messages alone when rows are empty", () => {
const md = buildChatMarkdown({
title: null,
chatId: "unsaved",
rows: [],
live: [
live({
id: "u1",
role: "user",
parts: [{ type: "text", text: "hello" }],
}),
live({
id: "a1",
role: "assistant",
parts: [{ type: "text", text: "partial reply" }],
}),
],
isStreaming: true,
t,
});
// Both on-screen messages are serialized, numbered from 1.
expect(md).toContain("## 1. You");
expect(md).toContain("hello");
expect(md).toContain("## 2. AI agent");
expect(md).toContain("partial reply");
// The streaming tail assistant is flagged as in-progress.
expect(md).toContain("still being generated");
// The placeholder chat id and the live message count are recorded.
expect(md).toContain("- Chat ID: `unsaved`");
expect(md).toContain("- Messages: 2");
// No persisted timestamp exists for a current-turn live message.
expect(md).not.toContain("<!--");
});
it("captures an interrupted first turn (no rows, not streaming) without a generating note", () => {
const md = buildChatMarkdown({
title: null,
chatId: "unsaved",
rows: [],
live: [
live({ id: "u1", role: "user", parts: [{ type: "text", text: "q" }] }),
live({
id: "a1",
role: "assistant",
parts: [{ type: "text", text: "half an answer" }],
}),
],
isStreaming: false,
banner: "Connection dropped — the response was cut off.",
t,
});
expect(md).toContain("half an answer");
// An interrupted (non-streaming) partial is exported as-is, no generating note.
expect(md).not.toContain("still being generated");
// The on-screen banner records the interruption.
expect(md).toContain("_⚠️ Connection dropped — the response was cut off._");
});
});

View File

@@ -25,11 +25,23 @@ type Translate = (key: string, values?: Record<string, unknown>) => string;
interface BuildChatMarkdownArgs {
title: string | null;
chatId: string;
/** The live, on-screen messages — the WYSIWYG source of the export. When
* present and non-empty these DRIVE the document (so it mirrors exactly what
* the user sees, including a partial reply from an interrupted turn). Each is
* matched to a persisted row by `id` to enrich it with token usage / error /
* timestamp. When absent or empty the builder falls back to `rows`. */
live?: LiveMessage[];
/** Persisted message rows. Enrichment source (matched to `live` by id) AND the
* fallback document source when `live` is empty. */
rows: IAiChatMessageRow[];
/** In-progress, not-yet-persisted live messages (the current streaming
* turn) to append after the persisted rows. `generating: true` adds a
* note that the message is still being produced. */
pending?: PendingMessage[];
/** Whether the live thread is still streaming. Only then is the tail assistant
* message flagged "still generating"; an interrupted (non-streaming) partial
* reply is exported as-is and the `banner` explains the interruption. */
isStreaming?: boolean;
/** The on-screen banner text (error / dropped connection / manual stop),
* appended at the end of the export so the artifact records the interruption
* the user saw. */
banner?: string | null;
t: Translate;
}
@@ -39,10 +51,31 @@ interface TextLikePart {
text?: string;
}
/** A live, not-yet-persisted message (current streaming turn) to append. */
interface PendingMessage {
/** Authoritative per-turn usage the server attaches to a message / row. */
interface UsageLike {
inputTokens?: number;
outputTokens?: number;
totalTokens?: number;
reasoningTokens?: number;
}
/** A live, on-screen message (subset of the AI SDK UIMessage we consume). */
interface LiveMessage {
id: string;
role: "user" | "assistant" | string;
parts: TextLikePart[];
metadata?: { usage?: UsageLike; error?: string };
}
/** One message normalized for rendering, regardless of live/persisted origin. */
interface ExportItem {
role: string;
parts: TextLikePart[];
usage?: UsageLike;
error?: string;
/** ISO timestamp from the persisted row, when one is known. */
createdAt?: string;
/** True only for the tail assistant message while the thread is streaming. */
generating: boolean;
}
@@ -127,53 +160,128 @@ function renderMessageParts(parts: TextLikePart[], t: Translate): string[] {
return out;
}
/** Resolve a persisted row's parts: prefer the rich persisted parts, else a
* single text part built from the plain-text content (mirrors `rowToUiMessage`). */
function rowParts(row: IAiChatMessageRow): TextLikePart[] {
return Array.isArray(row.metadata?.parts) && row.metadata.parts.length > 0
? (row.metadata.parts as TextLikePart[])
: [{ type: "text", text: row.content ?? "" }];
}
/**
* Normalize the export to one ordered list of {@link ExportItem}, WYSIWYG-first:
*
* - When `live` messages are present, THEY are the document (what the user sees,
* incl. an interrupted turn's partial reply). Each is matched to a persisted
* row by `id` to pull token usage / error / timestamp — a live message of the
* CURRENT turn has no matching row yet, so it simply renders without a footer.
* Authoritative `usage`/`error` already on the live message metadata win over
* the row (the server attaches usage to the streamed message at a step
* boundary before the row is refetched). Only the tail assistant message is
* flagged `generating`, and only while `isStreaming`.
* - When `live` is empty (e.g. the export runs before the live mirror is
* populated), fall back to the persisted `rows` so the format never regresses.
*/
function resolveItems(
live: LiveMessage[] | undefined,
rows: IAiChatMessageRow[],
isStreaming: boolean,
): ExportItem[] {
if (live && live.length > 0) {
const rowsById = new Map(rows.map((r) => [r.id, r]));
// The "still generating" note may apply ONLY to an assistant message that is
// the actual TAIL of the list — that is where the on-screen typing indicator
// sits. While `status === "submitted"` (isStreaming true) right after the
// user hit send, the tail is the USER message and the new assistant turn has
// no message yet; the previous assistant answer is shown complete on screen,
// so it must NOT be flagged (the indicator renders as a separate bottom
// block, not on that answer).
const lastIndex = live.length - 1;
const tailIsStreamingAssistant =
isStreaming && live[lastIndex]?.role === "assistant";
return live.map((m, i) => {
const row = rowsById.get(m.id);
return {
role: m.role,
parts: m.parts ?? [],
// Authoritative usage/error already on the live message (the server
// attaches usage to the streamed message at a step boundary) wins over
// the persisted row; a current-turn live message has no matching row yet
// and simply renders without a token footer (the accepted WYSIWYG
// tradeoff — an interrupted turn loses only its token footer, not text).
usage: m.metadata?.usage ?? row?.metadata?.usage,
error: m.metadata?.error ?? row?.metadata?.error ?? undefined,
createdAt: row?.createdAt,
generating: tailIsStreamingAssistant && i === lastIndex,
};
});
}
return rows.map((row) => ({
role: row.role,
parts: rowParts(row),
usage: row.metadata?.usage,
error: row.metadata?.error ?? undefined,
createdAt: row.createdAt,
generating: false,
}));
}
/**
* Serialize a chat to a Markdown string. Pure (apart from `new Date()` for the
* export timestamp), so it is straightforward to unit-test.
*/
export function buildChatMarkdown(args: BuildChatMarkdownArgs): string {
const { title, chatId, rows, pending, t } = args;
const { title, chatId, live, rows, isStreaming, banner, t } = args;
const blocks: string[] = [];
const items = resolveItems(live, rows, isStreaming === true);
const heading = (title ?? "").trim() || t("Untitled chat");
blocks.push(`# ${heading}`);
// Metadata bullet list. Total tokens is only shown when there is a sum.
const totalTokens = rows.reduce((sum, row) => {
const usage = row.metadata?.usage;
return usage ? sum + rowTokens(usage) : sum;
}, 0);
const totalTokens = items.reduce(
(sum, item) => (item.usage ? sum + rowTokens(item.usage) : sum),
0,
);
const meta = [
`- Chat ID: \`${chatId}\``,
`- Exported: ${new Date().toISOString()}`,
`- Messages: ${rows.length + (pending?.length ?? 0)}`,
`- Messages: ${items.length}`,
];
if (totalTokens > 0) meta.push(`- Total tokens: ${totalTokens}`);
blocks.push(meta.join("\n"));
rows.forEach((row, index) => {
items.forEach((item, index) => {
blocks.push("---");
const roleLabel = row.role === "assistant" ? t("AI agent") : t("You");
const roleLabel = item.role === "assistant" ? t("AI agent") : t("You");
blocks.push(`## ${index + 1}. ${roleLabel}`);
// Created-at kept in source as an HTML comment (out of the rendered prose).
blocks.push(`<!-- ${row.createdAt} -->`);
// A live message of the current turn has no persisted row yet — omit it.
if (item.createdAt) blocks.push(`<!-- ${item.createdAt} -->`);
// Resolve parts: prefer the rich persisted parts, else a single text part
// built from the plain-text content (mirrors `rowToUiMessage`).
const parts: TextLikePart[] =
Array.isArray(row.metadata?.parts) && row.metadata.parts.length > 0
? (row.metadata.parts as TextLikePart[])
: [{ type: "text", text: row.content ?? "" }];
blocks.push(...renderMessageParts(item.parts, t));
blocks.push(...renderMessageParts(parts, t));
if (row.metadata?.error) {
blocks.push(`**⚠️ Error:** ${row.metadata.error}`);
// A generating assistant may have empty/no parts yet — the heading (above)
// and this note still record the in-progress turn.
if (item.generating) {
blocks.push(
"_⏳ This message is still being generated — the export captured a partial, in-progress response._",
);
}
const usage = row.metadata?.usage;
// A persisted per-message error (the raw provider text) may coexist with the
// trailing `banner` (the classified on-screen alert) when the failed turn's
// row has already been refetched by export time. They describe the same
// failure at different fidelity; showing both is an accepted, minor redundancy.
if (item.error) {
blocks.push(`**⚠️ Error:** ${item.error}`);
}
const usage = item.usage;
if (usage) {
const total = usage.totalTokens ?? rowTokens(usage);
// Reasoning (thinking) tokens are shown only when the provider reported a
@@ -188,27 +296,12 @@ export function buildChatMarkdown(args: BuildChatMarkdownArgs): string {
}
});
// Append the in-progress, not-yet-persisted live messages (the current
// streaming turn) after the persisted rows. Heading numbering CONTINUES from
// the persisted rows. A `generating` assistant gets a note that the captured
// response is partial; pending messages carry no usage/token footer yet.
(pending ?? []).forEach((message, p) => {
// Record the on-screen banner (error / dropped connection / manual stop) so
// the export reflects exactly what the user saw, including an interruption.
if (banner && banner.trim().length > 0) {
blocks.push("---");
const num = rows.length + p + 1;
const roleLabel = message.role === "assistant" ? t("AI agent") : t("You");
blocks.push(`## ${num}. ${roleLabel}`);
blocks.push(...renderMessageParts(message.parts, t));
// A generating assistant may have empty/no parts yet — still emit the
// heading (above) and this note so the export shows the in-progress turn.
if (message.generating === true) {
blocks.push(
"_⏳ This message is still being generated — the export captured a partial, in-progress response._",
);
}
});
blocks.push(`_⚠️ ${banner.trim()}_`);
}
// Blank line between blocks so the Markdown renders cleanly.
return blocks.join("\n\n");

View File

@@ -56,7 +56,13 @@ function buildInitialValues(server?: IAiMcpServer): FormValues {
transport: server?.transport ?? "http",
url: server?.url ?? "",
authHeader: "",
toolAllowlist: server?.toolAllowlist ?? [],
// Defensive: TagsInput calls `.map`, so a non-array here (e.g. an API that
// returns the jsonb column as a JSON string) would crash the whole page. The
// server normalizes this now, but guard anyway so a bad shape can never take
// the settings UI down.
toolAllowlist: Array.isArray(server?.toolAllowlist)
? server.toolAllowlist
: [],
enabled: server?.enabled ?? true,
};
}

View File

@@ -38,6 +38,7 @@ import {
AiTestCapability,
IAiSettingsUpdate,
SttApiStyle,
ChatApiStyle,
} from "@/features/workspace/services/ai-settings-service.ts";
import { useAiRolesQuery } from "@/features/ai-chat/queries/ai-chat-query.ts";
import { IAiRole } from "@/features/ai-chat/types/ai-chat.types.ts";
@@ -82,6 +83,8 @@ const STT_LANGUAGE_OPTIONS: { value: string; label: string }[] = [
// (empty means "leave unchanged" unless explicitly cleared).
const formSchema = z.object({
chatModel: z.string(),
// Chat provider implementation (reasoning surfacing). Default openai-compatible.
chatApiStyle: z.enum(["openai-compatible", "openai"]),
// Cheap model id for the anonymous public-share assistant; empty = use chatModel.
publicShareChatModel: z.string(),
// Agent-role id whose persona the public-share assistant adopts; empty =
@@ -308,6 +311,7 @@ export default function AiProviderSettings() {
validate: zod4Resolver(formSchema),
initialValues: {
chatModel: "",
chatApiStyle: "openai-compatible" as ChatApiStyle,
publicShareChatModel: "",
publicShareAssistantRoleId: "",
embeddingModel: "",
@@ -330,6 +334,7 @@ export default function AiProviderSettings() {
if (!settings) return;
form.setValues({
chatModel: settings.chatModel ?? "",
chatApiStyle: settings.chatApiStyle ?? "openai-compatible",
publicShareChatModel: settings.publicShareChatModel ?? "",
publicShareAssistantRoleId: settings.publicShareAssistantRoleId ?? "",
embeddingModel: settings.embeddingModel ?? "",
@@ -359,6 +364,7 @@ export default function AiProviderSettings() {
// Everything is OpenAI-compatible.
driver: "openai",
chatModel: values.chatModel,
chatApiStyle: values.chatApiStyle,
// Cheap model id for the anonymous public-share assistant; empty falls
// back to chatModel server-side.
publicShareChatModel: values.publicShareChatModel,
@@ -761,6 +767,24 @@ export default function AiProviderSettings() {
{t("Resolves to {{url}}", { url: chatResolved })}
</Text>
<Select
mt="sm"
label={t("Protocol")}
description={t(
"How chat requests are sent and how reasoning is surfaced",
)}
data={[
{
value: "openai-compatible",
label: t("OpenAI-compatible (surfaces reasoning)"),
},
{ value: "openai", label: t("OpenAI (official)") },
]}
allowDeselect={false}
disabled={isLoading}
{...form.getInputProps("chatApiStyle")}
/>
{/* Anonymous public-share assistant: a single master toggle + an
optional cheaper model id. Reuses this card's driver/URL/key. */}
<Group justify="space-between" align="center" wrap="nowrap" mt="md">

View File

@@ -9,6 +9,12 @@ export type AiDriver = "openai" | "gemini" | "ollama";
// - 'json' -> JSON body with base64-encoded audio (OpenRouter)
export type SttApiStyle = "multipart" | "json";
// Chat provider implementation for the `openai` driver (chosen explicitly):
// - 'openai-compatible' -> maps streamed reasoning_content to reasoning parts
// (z.ai/GLM, DeepSeek, OpenRouter, ...). Default.
// - 'openai' -> official provider; real-OpenAI reasoning-model shaping.
export type ChatApiStyle = "openai-compatible" | "openai";
// Masked AI provider settings returned by the server.
// No API key is ever returned; only `hasApiKey` / `hasEmbeddingApiKey` indicate
// whether one is stored. `embeddingBaseUrl` is the RAW stored value (empty means
@@ -16,6 +22,7 @@ export type SttApiStyle = "multipart" | "json";
export interface IAiSettings {
driver?: AiDriver;
chatModel?: string;
chatApiStyle?: ChatApiStyle;
// Cheap model id for the anonymous public-share assistant; empty = chatModel.
publicShareChatModel?: string;
// Agent-role id whose persona the public-share assistant adopts; empty =
@@ -49,6 +56,7 @@ export interface IAiSettings {
export interface IAiSettingsUpdate {
driver?: AiDriver;
chatModel?: string;
chatApiStyle?: ChatApiStyle;
publicShareChatModel?: string;
// Agent-role id whose persona the public-share assistant adopts; empty =
// built-in locked persona.

View File

@@ -6,6 +6,7 @@ import { createMCPClient } from '@ai-sdk/mcp';
import { Agent, type Dispatcher } from 'undici';
import { AiMcpServerRepo } from '@docmost/db/repos/ai-chat/ai-mcp-server.repo';
import { AiMcpServer } from '@docmost/db/types/entity.types';
import { streamingDispatcherOptions } from '../../../integrations/ai/ai-streaming-fetch';
import { SecretBoxService } from '../../../integrations/crypto/secret-box';
import { isUrlAllowed, isIpAllowed } from './ssrf-guard';
@@ -400,6 +401,16 @@ export function validateResolvedAddresses(
*/
function buildPinnedDispatcher(): Agent {
return new Agent({
// Raise undici's default 300s headers/body timeouts on external MCP traffic
// to the same generous-but-finite silence timeout the chat fetch uses (#175).
// A long agent turn keeps an SSE transport (e.g. crawl4ai's /mcp/sse) open
// across the whole turn; that connection can idle BETWEEN tool calls longer
// than 5 min, and undici's bodyTimeout would otherwise sever it mid-task — a
// tool-call failure that aborts the streamed turn and shows the user "Lost
// connection to the AI provider". A slow single tool call (a crawl) can
// likewise exceed headersTimeout. The timeout stays FINITE so a genuinely
// hung server is still broken eventually.
...streamingDispatcherOptions(),
connect: {
lookup: (hostname, _options, callback) => {
// Always resolve ALL addresses ourselves; do not trust the caller's

View File

@@ -0,0 +1,48 @@
import { parseToolAllowlist } from './ai-mcp-server.repo';
/**
* The `tool_allowlist` jsonb column historically round-trips as a JSON STRING
* (rows written by the old double-encoding `jsonbArray`), so the driver hands
* back `'["a","b"]'` instead of an array. `parseToolAllowlist` normalizes both
* shapes to the `string[] | null` the entity type promises — fixing the settings
* UI crash (TagsInput `.map` on a string) and the tool-allowlist enforcement
* (which did `Array.isArray(allow)` and silently allowed ALL tools for a string).
*/
describe('parseToolAllowlist', () => {
it('passes a real string array through unchanged', () => {
expect(parseToolAllowlist(['search', 'crawl'])).toEqual(['search', 'crawl']);
});
it('parses a JSON-string array (the double-encoded read) into an array', () => {
// This is exactly what the DB returns for an old row: a jsonb string scalar.
expect(parseToolAllowlist('["alpha","beta"]')).toEqual(['alpha', 'beta']);
});
it('returns null for null / undefined (unrestricted)', () => {
expect(parseToolAllowlist(null)).toBeNull();
expect(parseToolAllowlist(undefined)).toBeNull();
});
it('returns [] for an empty array (no items, but a present allowlist)', () => {
expect(parseToolAllowlist([])).toEqual([]);
});
it('returns null for a JSON string that is not an array', () => {
expect(parseToolAllowlist('"justastring"')).toBeNull();
expect(parseToolAllowlist('{"a":1}')).toBeNull();
});
it('returns null for an unparseable string', () => {
expect(parseToolAllowlist('not json at all')).toBeNull();
});
it('returns null when elements are not all strings (defensive)', () => {
expect(parseToolAllowlist([1, 2, 3] as unknown)).toBeNull();
expect(parseToolAllowlist('[1,2,3]')).toBeNull();
});
it('returns null for a non-string, non-array primitive', () => {
expect(parseToolAllowlist(42 as unknown)).toBeNull();
expect(parseToolAllowlist(true as unknown)).toBeNull();
});
});

View File

@@ -21,32 +21,35 @@ export class AiMcpServerRepo {
id: string,
workspaceId: string,
): Promise<AiMcpServer | undefined> {
return this.db
const row = await this.db
.selectFrom('aiMcpServers')
.selectAll('aiMcpServers')
.where('id', '=', id)
.where('workspaceId', '=', workspaceId)
.executeTakeFirst();
return row ? normalizeRow(row) : row;
}
async listByWorkspace(workspaceId: string): Promise<AiMcpServer[]> {
return this.db
const rows = await this.db
.selectFrom('aiMcpServers')
.selectAll('aiMcpServers')
.where('workspaceId', '=', workspaceId)
.orderBy('createdAt', 'asc')
.execute();
return rows.map(normalizeRow);
}
/** Enabled servers only — used by the agent loop to build the toolset. */
async listEnabled(workspaceId: string): Promise<AiMcpServer[]> {
return this.db
const rows = await this.db
.selectFrom('aiMcpServers')
.selectAll('aiMcpServers')
.where('workspaceId', '=', workspaceId)
.where('enabled', '=', true)
.orderBy('createdAt', 'asc')
.execute();
return rows.map(normalizeRow);
}
async insert(
@@ -130,6 +133,14 @@ export class AiMcpServerRepo {
* Encode a string[] as a jsonb bind for the `tool_allowlist` column. Passing a
* plain JS array to the postgres driver would serialize it as a Postgres array
* literal (incompatible with jsonb), so we bind the JSON text and cast it.
*
* The cast is `::text::jsonb`, NOT `::jsonb`: if the parameter is bound straight
* to a jsonb cast, node-postgres infers its type as jsonb and JSON-stringifies
* the (already-JSON) string a SECOND time, so the column ends up holding a jsonb
* STRING SCALAR (`"[\"a\"]"`) instead of a jsonb ARRAY. Forcing the param through
* `::text` first binds it as text (sent verbatim), and `::jsonb` then parses it
* into a real array. (`normalizeRow` below repairs rows written the old way.)
*
* Returns null for null/empty arrays (an empty allowlist means "no restriction"
* is not intended — callers pass null to clear; an empty array is normalized to
* null here so it never round-trips as `[]`).
@@ -139,5 +150,37 @@ function jsonbArray(value: string[] | null | undefined) {
return null;
}
// Typed as string[] so it is assignable to the toolAllowlist column.
return sql<string[]>`${JSON.stringify(value)}::jsonb`;
return sql<string[]>`${JSON.stringify(value)}::text::jsonb`;
}
/**
* Parse the `toolAllowlist` value read from the DB into the `string[] | null`
* the entity type promises. The jsonb column historically round-trips as a JSON
* STRING (rows written by the old double-encoding `jsonbArray`, see above), so
* the driver hands back a string like `'["a","b"]'` rather than an array. Be
* tolerant: an already-parsed array passes through; a JSON string is parsed; null
* / a non-array / unparseable value becomes null (unrestricted).
*/
export function parseToolAllowlist(value: unknown): string[] | null {
if (value == null) return null;
if (Array.isArray(value)) {
return value.every((v) => typeof v === 'string') ? (value as string[]) : null;
}
if (typeof value === 'string') {
try {
const parsed = JSON.parse(value);
return Array.isArray(parsed) &&
parsed.every((v) => typeof v === 'string')
? (parsed as string[])
: null;
} catch {
return null;
}
}
return null;
}
/** Normalize a DB row so `toolAllowlist` is always `string[] | null`. */
function normalizeRow(row: AiMcpServer): AiMcpServer {
return { ...row, toolAllowlist: parseToolAllowlist(row.toolAllowlist) };
}

View File

@@ -10,6 +10,29 @@ import {
import { ExpressionBuilder, sql } from 'kysely';
import { DB, Workspaces } from '@docmost/db/types/db';
/**
* Writable `settings.ai.provider` keys, enforced at this generic SQL layer. This
* repo cannot import AI-feature types, so this list is its own copy; a parity
* test (ai-provider-settings-keys.spec.ts) asserts it equals
* PROVIDER_SETTINGS_KEYS in ai.types so a future drift fails in CI rather than
* silently dropping a field at this boundary.
*/
export const AI_PROVIDER_SETTINGS_ALLOWED: readonly string[] = [
'driver',
'chatModel',
'chatApiStyle',
'embeddingModel',
'baseUrl',
'embeddingBaseUrl',
'sttModel',
'sttBaseUrl',
'sttApiStyle',
'sttLanguage',
'systemPrompt',
'publicShareChatModel',
'publicShareAssistantRoleId',
];
@Injectable()
export class WorkspaceRepo {
public baseFields: Array<keyof Workspaces> = [
@@ -239,9 +262,8 @@ export class WorkspaceRepo {
// is a real jsonb object, never a double-encoded string. The CASE self-heals
// workspaces whose settings.ai.provider was previously corrupted into an
// array/string.
const ALLOWED = ['driver', 'chatModel', 'embeddingModel', 'baseUrl', 'embeddingBaseUrl', 'sttModel', 'sttBaseUrl', 'sttApiStyle', 'sttLanguage', 'systemPrompt', 'publicShareChatModel', 'publicShareAssistantRoleId'];
const entries = Object.entries(provider).filter(
([k, v]) => v !== undefined && ALLOWED.includes(k),
([k, v]) => v !== undefined && AI_PROVIDER_SETTINGS_ALLOWED.includes(k),
);
const patch = entries.length
? sql`jsonb_build_object(${sql.join(

View File

@@ -0,0 +1,40 @@
import { createInstrumentedFetch } from './ai-provider-http';
/**
* createInstrumentedFetch must be behavior-neutral: it delegates to the supplied
* baseFetch with the SAME input/init, returns the Response object untouched (so
* the streamed SSE body is never read/cloned), and rethrows the same error. The
* baseFetch injection is the seam that carries the streaming fetch (#175) onto
* the chat provider, so it is tested directly.
*/
describe('createInstrumentedFetch', () => {
it('delegates to the injected baseFetch with the same input/init', async () => {
const fakeResponse = new Response('ok', { status: 200 });
const baseFetch = jest.fn().mockResolvedValue(fakeResponse);
const instrumented = createInstrumentedFetch('test', baseFetch as never);
const init = { method: 'POST', body: '{"q":1}' };
const res = await instrumented('https://example.com/v1/chat', init);
expect(baseFetch).toHaveBeenCalledTimes(1);
expect(baseFetch).toHaveBeenCalledWith('https://example.com/v1/chat', init);
// The Response is returned UNTOUCHED (same reference — never read/cloned).
expect(res).toBe(fakeResponse);
});
it('rethrows the base fetch error unchanged (pre-response failure)', async () => {
const err = Object.assign(new TypeError('fetch failed'), {
cause: { code: 'ECONNRESET' },
});
const baseFetch = jest.fn().mockRejectedValue(err);
const instrumented = createInstrumentedFetch('test', baseFetch as never);
await expect(instrumented('https://example.com/')).rejects.toBe(err);
});
it('defaults to the global fetch when no baseFetch is given', () => {
// Constructing without a baseFetch must not throw — it simply wraps global
// fetch (the non-chat default).
expect(() => createInstrumentedFetch('test')).not.toThrow();
});
});

View File

@@ -0,0 +1,87 @@
import { Logger } from '@nestjs/common';
/**
* The provider HTTP fetch used by the chat path: a thin, behavior-neutral
* instrumentation wrapper around a supplied `fetch`.
*
* It defaults to the global `fetch`, but the chat provider passes the streaming
* fetch (which RAISES undici's 300s stream timeouts to a generous-but-finite
* silence timeout so a long agent turn is not severed mid-stream — #175). So this
* wrapper observes the EXACT transport a turn uses. It NEVER retries, times out,
* swaps the dispatcher, or reads/clones the response body — the Response is
* returned untouched (streaming unaffected) and any error is rethrown unchanged.
*
* Per provider HTTP call it logs: time-to-response-headers + status + request
* body size on success; and on a pre-response rejection the failure latency +
* error code/cause + request body size + the idle gap since the previous call.
* This telemetry is intentional and kept (it diagnoses provider connection
* resets / mid-stream cuts), and it is load-bearing: the streaming fetch reaches
* the chat provider THROUGH this wrapper, so the two are one construct.
*
* How to read the result (a long agentic turn makes one provider call per step):
* - a failed turn whose last provider line is "PRE-RESPONSE FAILED ... ECONNRESET"
* => the reset is in the CONNECTION phase of a step's request (the provider
* never replied) — usually a poisoned keep-alive socket or the provider/middle
* box resetting that request (large body / idle gap are the suspects, hence
* reqBytes + idleSincePrevCall below).
* - the last line is "OK status=200" and the turn still errors with NO
* "PRE-RESPONSE FAILED" => the cut happened MID-STREAM (after headers), a
* different failure mode.
*
* The seq/last-call timestamps are module-level, so under concurrent turns the
* idle-gap figure is approximate (fine for single-user diagnosis).
*/
export function createInstrumentedFetch(
context: string,
// The underlying fetch to instrument. Defaults to the global fetch; the chat
// provider passes the streaming fetch (raised, finite undici stream timeouts,
// #175) so the telemetry observes the SAME transport the long agent turn uses.
baseFetch: typeof fetch = fetch,
): typeof fetch {
const logger = new Logger(context);
let callSeq = 0;
let lastCallStartedAt: number | undefined;
return async (input: Parameters<typeof fetch>[0], init?: Parameters<typeof fetch>[1]): Promise<Response> => {
const callId = ++callSeq;
const startedAt = Date.now();
const idleSincePrev =
lastCallStartedAt === undefined ? undefined : startedAt - lastCallStartedAt;
lastCallStartedAt = startedAt;
// Request body size: the chat payload is a JSON string. Used to test whether
// failures correlate with the large accumulated context on later agent steps.
const body = init?.body as unknown;
const bodyBytes =
typeof body === 'string'
? body.length
: body instanceof Uint8Array
? body.byteLength
: undefined;
try {
// Delegate to the base fetch; return the Response UNTOUCHED (never read/
// clone the body) so the streamed SSE response is unaffected.
const res = await baseFetch(input, init);
logger.log(
`provider HTTP: call#${callId} OK ` +
`headersAfter=${Date.now() - startedAt}ms status=${res.status} ` +
`reqBytes=${bodyBytes ?? 'n/a'} idleSincePrevCall=${idleSincePrev ?? 'n/a'}ms`,
);
return res;
} catch (err) {
// fetch() rejected => PRE-RESPONSE failure (no headers/body received yet):
// the connection/request phase. Log it and rethrow the SAME error.
const e = err as {
name?: string;
message?: string;
cause?: { code?: string; message?: string };
};
logger.warn(
`provider HTTP: call#${callId} PRE-RESPONSE FAILED ` +
`after=${Date.now() - startedAt}ms code=${e?.cause?.code ?? 'none'} ` +
`name=${e?.name ?? 'Error'} cause=${e?.cause?.message ?? e?.message ?? 'unknown'} ` +
`reqBytes=${bodyBytes ?? 'n/a'} idleSincePrevCall=${idleSincePrev ?? 'n/a'}ms`,
);
throw err;
}
};
}

View File

@@ -0,0 +1,43 @@
import { validate } from 'class-validator';
import { plainToInstance } from 'class-transformer';
import { PROVIDER_SETTINGS_KEYS } from './ai.types';
import { AI_PROVIDER_SETTINGS_ALLOWED } from '@docmost/db/repos/workspace/workspace.repo';
import { UpdateAiSettingsDto } from './dto/update-ai-settings.dto';
/**
* Drift guard: the writable provider-settings keys are maintained in two layers
* that TypeScript cannot cross-check — PROVIDER_SETTINGS_KEYS (ai.types, used by
* the settings service) and AI_PROVIDER_SETTINGS_ALLOWED (the generic workspace
* repo's SQL boundary). A key missing from the repo copy silently drops the field
* on persist (exactly what happened to chatApiStyle), so this asserts they match.
*/
describe('provider-settings key allowlist parity', () => {
it('the repo SQL allowlist equals PROVIDER_SETTINGS_KEYS', () => {
expect([...AI_PROVIDER_SETTINGS_ALLOWED].sort()).toEqual(
[...PROVIDER_SETTINGS_KEYS].sort(),
);
});
});
/** DTO validation for the new chatApiStyle field (@IsIn(CHAT_API_STYLES)). */
describe('UpdateAiSettingsDto.chatApiStyle', () => {
const errorsFor = async (chatApiStyle: unknown) =>
validate(plainToInstance(UpdateAiSettingsDto, { chatApiStyle }));
it('accepts both valid values', async () => {
for (const v of ['openai-compatible', 'openai']) {
const errs = await errorsFor(v);
expect(errs.find((e) => e.property === 'chatApiStyle')).toBeUndefined();
}
});
it('rejects an unknown value', async () => {
const errs = await errorsFor('definitely-not-a-style');
expect(errs.find((e) => e.property === 'chatApiStyle')).toBeDefined();
});
it('accepts the field being omitted (optional)', async () => {
const errs = await validate(plainToInstance(UpdateAiSettingsDto, {}));
expect(errs.find((e) => e.property === 'chatApiStyle')).toBeUndefined();
});
});

View File

@@ -14,6 +14,8 @@ import {
MaskedAiSettings,
ResolvedAiConfig,
SttApiStyle,
ChatApiStyle,
PROVIDER_SETTINGS_KEYS,
} from './ai.types';
/**
@@ -24,6 +26,7 @@ import {
export interface UpdateAiSettingsInput {
driver?: AiDriver;
chatModel?: string;
chatApiStyle?: ChatApiStyle;
embeddingModel?: string;
baseUrl?: string;
embeddingBaseUrl?: string;
@@ -157,6 +160,8 @@ export class AiSettingsService {
const config: ResolvedAiConfig = {
driver: provider.driver,
chatModel: provider.chatModel,
// Plain passthrough; getChatModel defaults unset to 'openai-compatible'.
chatApiStyle: provider.chatApiStyle,
// Cheap model id for the anonymous public-share assistant; reuses the chat
// driver/baseUrl/apiKey. Empty/unset → callers fall back to chatModel.
publicShareChatModel: provider.publicShareChatModel,
@@ -238,6 +243,7 @@ export class AiSettingsService {
return {
driver: provider.driver,
chatModel: provider.chatModel,
chatApiStyle: provider.chatApiStyle,
embeddingModel: provider.embeddingModel,
baseUrl: provider.baseUrl,
embeddingBaseUrl: provider.embeddingBaseUrl,
@@ -275,20 +281,8 @@ export class AiSettingsService {
// Persist non-secret provider fields (only those present in the partial).
const providerPatch: Partial<AiProviderSettings> = {};
for (const key of [
'driver',
'chatModel',
'embeddingModel',
'baseUrl',
'embeddingBaseUrl',
'sttModel',
'sttBaseUrl',
'sttApiStyle',
'sttLanguage',
'systemPrompt',
'publicShareChatModel',
'publicShareAssistantRoleId',
] as const) {
// Single source of truth for the writable provider keys (see ai.types).
for (const key of PROVIDER_SETTINGS_KEYS) {
if (nonSecret[key] !== undefined) {
(providerPatch as Record<string, unknown>)[key] = nonSecret[key];
}

View File

@@ -0,0 +1,235 @@
import * as http from 'node:http';
import {
createStreamingFetch,
withPreResponseRetry,
streamTimeoutMs,
streamKeepAliveMs,
streamingDispatcherOptions,
isRetryableConnectError,
} from './ai-streaming-fetch';
/**
* #175: undici's default 300s headers/body timeouts severed long agent turns.
* The streaming fetch raises them to a generous-but-FINITE silence timeout (not
* 0 — a true hang must still break). We pin: the configured value + env override,
* that both dispatcher timeouts use it, and that a delayed response streams.
*/
describe('streamTimeoutMs', () => {
const ORIG = process.env.AI_STREAM_TIMEOUT_MS;
afterEach(() => {
if (ORIG === undefined) delete process.env.AI_STREAM_TIMEOUT_MS;
else process.env.AI_STREAM_TIMEOUT_MS = ORIG;
});
it('defaults to a generous-but-finite 15 minutes', () => {
delete process.env.AI_STREAM_TIMEOUT_MS;
expect(streamTimeoutMs()).toBe(900_000);
// Finite — NOT disabled (0 would let a hung provider leak forever).
expect(streamTimeoutMs()).toBeGreaterThan(0);
expect(Number.isFinite(streamTimeoutMs())).toBe(true);
});
it('honours a positive AI_STREAM_TIMEOUT_MS override', () => {
process.env.AI_STREAM_TIMEOUT_MS = '120000';
expect(streamTimeoutMs()).toBe(120000);
});
it('ignores an invalid / non-positive override (falls back to default)', () => {
for (const bad of ['0', '-5', 'abc', '']) {
process.env.AI_STREAM_TIMEOUT_MS = bad;
expect(streamTimeoutMs()).toBe(900_000);
}
});
it('applies the silence timeout + keep-alive recycle window to the dispatcher', () => {
delete process.env.AI_STREAM_TIMEOUT_MS;
delete process.env.AI_STREAM_KEEPALIVE_MS;
expect(streamingDispatcherOptions()).toEqual({
headersTimeout: 900_000,
bodyTimeout: 900_000,
keepAliveTimeout: 10_000,
keepAliveMaxTimeout: 10_000,
});
});
});
describe('streamKeepAliveMs', () => {
const ORIG = process.env.AI_STREAM_KEEPALIVE_MS;
afterEach(() => {
if (ORIG === undefined) delete process.env.AI_STREAM_KEEPALIVE_MS;
else process.env.AI_STREAM_KEEPALIVE_MS = ORIG;
});
it('defaults to 10s (recycle idle sockets so a NAT/proxy drop cannot poison reuse)', () => {
delete process.env.AI_STREAM_KEEPALIVE_MS;
expect(streamKeepAliveMs()).toBe(10_000);
});
it('honours a positive override and ignores invalid/non-positive', () => {
process.env.AI_STREAM_KEEPALIVE_MS = '4000';
expect(streamKeepAliveMs()).toBe(4000);
for (const bad of ['0', '-1', 'x', '']) {
process.env.AI_STREAM_KEEPALIVE_MS = bad;
expect(streamKeepAliveMs()).toBe(10_000);
}
});
});
describe('isRetryableConnectError', () => {
it('matches connection-level codes on the error or its cause', () => {
expect(isRetryableConnectError({ cause: { code: 'ECONNRESET' } })).toBe(true);
expect(isRetryableConnectError({ cause: { code: 'UND_ERR_SOCKET' } })).toBe(true);
expect(isRetryableConnectError({ code: 'ECONNREFUSED' })).toBe(true);
});
it('does NOT match aborts / unrelated errors', () => {
expect(isRetryableConnectError({ name: 'AbortError', cause: { code: 'ABORT_ERR' } })).toBe(false);
expect(isRetryableConnectError({ cause: { code: 'UND_ERR_HEADERS_TIMEOUT' } })).toBe(false);
expect(isRetryableConnectError(new Error('plain'))).toBe(false);
expect(isRetryableConnectError(undefined)).toBe(false);
});
});
describe('createStreamingFetch — against a delayed server', () => {
const ORIG = process.env.AI_STREAM_TIMEOUT_MS;
let server: http.Server;
let url: string;
// The server waits before sending ANY byte (a long time-to-first-token). It is
// > undici's ~1s timeout-timer granularity so a sub-second configured timeout
// fires deterministically in the load-bearing test below.
const DELAY = 1500;
beforeAll(async () => {
server = http.createServer((_req, res) => {
setTimeout(() => {
res.writeHead(200, { 'Content-Type': 'text/plain' });
res.end('ok');
}, DELAY);
});
await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', resolve));
const addr = server.address() as import('node:net').AddressInfo;
url = `http://127.0.0.1:${addr.port}/`;
});
afterAll(async () => {
await new Promise<void>((resolve) => server.close(() => resolve()));
});
afterEach(() => {
if (ORIG === undefined) delete process.env.AI_STREAM_TIMEOUT_MS;
else process.env.AI_STREAM_TIMEOUT_MS = ORIG;
});
it('streams the delayed response at the default (generous) timeout', async () => {
delete process.env.AI_STREAM_TIMEOUT_MS; // default 15 min >> DELAY
const streamingFetch = createStreamingFetch();
const res = await streamingFetch(url);
expect(res.status).toBe(200);
expect(await res.text()).toBe('ok');
});
it('LOAD-BEARING: a sub-DELAY AI_STREAM_TIMEOUT_MS actually severs the response', async () => {
// Proves the configured dispatcher is wired into the fetch: with the timeout
// set below DELAY the call must reject with undici's headers-timeout. If the
// dispatcher were lost (fallback to global fetch's 300s default), the 1.5s
// response would slip through and this would NOT throw.
process.env.AI_STREAM_TIMEOUT_MS = '500';
const streamingFetch = createStreamingFetch();
let caught: unknown;
const startedAt = Date.now();
try {
await streamingFetch(url).then((r) => r.text());
} catch (e) {
caught = e;
}
// It rejected (a lost dispatcher -> global 300s default would NOT reject on a
// 1.5s response) and it did so BEFORE the response would have arrived (DELAY).
// Use `.name` (realm-safe) — undici's TypeError fails cross-realm instanceof.
expect(caught).toBeDefined();
expect((caught as Error)?.name).toBe('TypeError');
expect(Date.now() - startedAt).toBeLessThan(DELAY);
// When present, the undici cause is the headers timeout.
const code = (caught as { cause?: { code?: string } })?.cause?.code;
if (code) expect(code).toBe('UND_ERR_HEADERS_TIMEOUT');
});
});
describe('withPreResponseRetry', () => {
// The retry is the OUTERMOST layer (over the dispatcher-bound streaming fetch),
// matching ai.service's withPreResponseRetry(instrument(createStreamingFetch())).
// PRE_RESPONSE_CONNECT_RETRIES is 2 -> at most 3 total attempts.
const MAX_ATTEMPTS = 3;
let server: http.Server;
let url: string;
let requests = 0;
// 'first' resets only the first connection; 'all' resets every connection.
let resetMode: 'first' | 'all' = 'first';
const retryingFetch = () => withPreResponseRetry(createStreamingFetch());
beforeAll(async () => {
server = http.createServer((req, res) => {
requests += 1;
const shouldReset = resetMode === 'all' || requests === 1;
if (shouldReset) {
// Reset before any response byte (a poisoned/stale keep-alive socket).
const sock = req.socket as import('node:net').Socket & {
resetAndDestroy?: () => void;
};
if (typeof sock.resetAndDestroy === 'function') sock.resetAndDestroy();
else sock.destroy();
return;
}
res.writeHead(200, { 'Content-Type': 'text/plain' });
res.end('ok');
});
await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', resolve));
const addr = server.address() as import('node:net').AddressInfo;
url = `http://127.0.0.1:${addr.port}/`;
});
afterAll(async () => {
await new Promise<void>((resolve) => server.close(() => resolve()));
});
beforeEach(() => {
requests = 0;
resetMode = 'first';
});
it('retries a pre-response reset on a fresh connection and succeeds', async () => {
resetMode = 'first';
const res = await retryingFetch()(url);
expect(res.status).toBe(200);
expect(await res.text()).toBe('ok');
// first request reset -> retry -> second request served.
expect(requests).toBe(2);
});
it('gives up after the retry bound and rethrows the original reset', async () => {
resetMode = 'all'; // every attempt resets -> retries exhaust
let caught: unknown;
try {
await retryingFetch()(url);
} catch (e) {
caught = e;
}
expect(caught).toBeDefined();
// A retryable connection error reached the caller (not swallowed).
expect(isRetryableConnectError(caught)).toBe(true);
// Bounded: exactly PRE_RESPONSE_CONNECT_RETRIES + 1 attempts hit the server
// (pins both the limit and that the final error propagates — guards an
// off-by-one or an infinite loop).
expect(requests).toBe(MAX_ATTEMPTS);
});
it('does NOT retry an aborted request (no retry storm)', async () => {
resetMode = 'all';
const ctrl = new AbortController();
ctrl.abort();
await expect(
retryingFetch()(url, { signal: ctrl.signal }),
).rejects.toBeDefined();
// Pre-aborted: the request never reached the server, so nothing was retried.
expect(requests).toBe(0);
});
});

View File

@@ -0,0 +1,156 @@
import { Agent } from 'undici';
/**
* Default SILENCE timeout for streaming AI calls (15 min). Generous, but FINITE.
*
* Node's global fetch (undici) defaults headersTimeout and bodyTimeout to
* 300_000ms, which severed legitimate long agent turns mid-stream — surfacing as
* "Lost connection to the AI provider" (#175): a late step with a huge context
* pushes the model's time-to-first-token past 5 min, or a reasoning model pauses
* >5 min between chunks. We do NOT disable the timeout (0) — that would let a
* genuinely hung provider, with the client still connected, hang forever
* (abortSignal only fires on client disconnect). Instead we raise it well above
* any realistic gap while keeping it finite so a true hang is eventually broken.
*
* This bounds SILENCE (time-to-first-byte and the gap BETWEEN chunks), NOT total
* turn duration — so an arbitrarily long turn that keeps streaming bytes is never
* cut; only a stream that goes quiet for longer than this is treated as a hang.
*/
const DEFAULT_STREAM_TIMEOUT_MS = 900_000;
/**
* Default keep-alive recycle window (10s). A pooled connection idle longer than
* this is CLOSED rather than reused.
*
* Long agent turns leave gaps of tens of seconds between provider calls (one
* call per step; a crawl/search tool runs in between). A NAT / reverse proxy /
* conntrack in front of the deployment silently drops an idle connection after
* its own timeout; undici, not knowing, then reuses that dead socket and the
* next request fails PRE-RESPONSE with `read ECONNRESET` (#175 prod telemetry:
* the resets correlate with idleSincePrevCall ~42s, while a direct path to the
* provider does NOT reset). Recycling idle sockets well below such a drop window
* means a long-gap call opens a fresh connection instead of reusing a stale one.
* `keepAliveMaxTimeout` also caps a server-advertised keep-alive so the provider
* cannot push the reuse window back up.
*/
const DEFAULT_STREAM_KEEPALIVE_MS = 10_000;
/**
* How many times to retry a PRE-RESPONSE connection failure (a reset/timeout
* before ANY response byte) on a fresh connection. Safe because `fetch()` only
* rejects before the Response resolves — a started stream is never replayed.
*/
const PRE_RESPONSE_CONNECT_RETRIES = 2;
/** undici cause codes for a connection-level failure that occurred PRE-RESPONSE. */
const RETRYABLE_CONNECT_CODES = new Set([
'ECONNRESET',
'ECONNREFUSED',
'EPIPE',
'ETIMEDOUT',
'UND_ERR_SOCKET',
'UND_ERR_CONNECT_TIMEOUT',
]);
function positiveEnv(name: string, fallback: number): number {
const raw = Number(process.env[name]);
return Number.isFinite(raw) && raw > 0 ? raw : fallback;
}
/**
* The configured silence timeout (ms). Override with `AI_STREAM_TIMEOUT_MS`; a
* missing/invalid/non-positive value falls back to {@link DEFAULT_STREAM_TIMEOUT_MS}.
*/
export function streamTimeoutMs(): number {
return positiveEnv('AI_STREAM_TIMEOUT_MS', DEFAULT_STREAM_TIMEOUT_MS);
}
/** Keep-alive recycle window (ms). Override with `AI_STREAM_KEEPALIVE_MS`. */
export function streamKeepAliveMs(): number {
return positiveEnv('AI_STREAM_KEEPALIVE_MS', DEFAULT_STREAM_KEEPALIVE_MS);
}
/**
* undici `Agent` options for streaming AI traffic — the (generous, finite)
* silence timeouts plus the keep-alive recycle window. Shared by the chat
* provider fetch and the external-MCP dispatcher so they behave identically.
*/
export function streamingDispatcherOptions(): {
headersTimeout: number;
bodyTimeout: number;
keepAliveTimeout: number;
keepAliveMaxTimeout: number;
} {
const t = streamTimeoutMs();
const ka = streamKeepAliveMs();
return {
headersTimeout: t,
bodyTimeout: t,
keepAliveTimeout: ka,
keepAliveMaxTimeout: ka,
};
}
/** True for a connection-level error worth retrying on a fresh connection. */
export function isRetryableConnectError(err: unknown): boolean {
const e = err as { code?: string; cause?: { code?: string } } | undefined;
const code = e?.cause?.code ?? e?.code;
return typeof code === 'string' && RETRYABLE_CONNECT_CODES.has(code);
}
/**
* Build a `fetch` for long-lived streaming AI calls (the agent chat turn) backed
* by a dedicated undici dispatcher (finite silence timeouts + keep-alive
* recycling, #175). A single shared dispatcher is returned (callers hold it for
* the service lifetime) so its connection pool is reused.
*
* This is the BASE transport — no retry. The chat path wraps it as
* `withPreResponseRetry(createInstrumentedFetch(ctx, createStreamingFetch()))`
* so the retry is the OUTERMOST layer and the instrumentation observes EVERY
* attempt (a recovered reset is still logged — see withPreResponseRetry).
*/
export function createStreamingFetch(): typeof fetch {
const dispatcher = new Agent(streamingDispatcherOptions());
return ((input: Parameters<typeof fetch>[0], init?: RequestInit) =>
fetch(input, {
...(init ?? {}),
// `dispatcher` is an undici-specific init field (not in the DOM
// RequestInit type); Node's global fetch reads it. Cast to satisfy it.
dispatcher,
} as RequestInit & { dispatcher: Agent })) as typeof fetch;
}
/**
* Wrap a fetch so a PRE-RESPONSE connection reset (`baseFetch` rejects before the
* Response resolves — so nothing has streamed) is retried a few times on a fresh
* connection (#175). A poisoned keep-alive socket is destroyed by undici on the
* reset, so the retry lands on a new connection. An abort (client disconnect) is
* never retried.
*
* This is the OUTERMOST transport layer by design: composing it as
* `withPreResponseRetry(instrumentedFetch)` means every attempt — including the
* resets that the retry recovers from — flows through the instrumentation, so the
* "PRE-RESPONSE FAILED ... ECONNRESET ... idleSincePrevCall" telemetry stays
* visible precisely when the fix is working (and AI_STREAM_KEEPALIVE_MS can be
* tuned from real data). A retry INSIDE the transport would hide it.
*/
export function withPreResponseRetry(baseFetch: typeof fetch): typeof fetch {
return (async (input: Parameters<typeof fetch>[0], init?: RequestInit) => {
for (let attempt = 0; ; attempt++) {
try {
return await baseFetch(input, init);
} catch (err) {
const aborted = init?.signal?.aborted === true;
if (
aborted ||
attempt >= PRE_RESPONSE_CONNECT_RETRIES ||
!isRetryableConnectError(err)
) {
throw err;
}
// Brief backoff before the fresh-connection retry.
await new Promise((resolve) => setTimeout(resolve, 150 * (attempt + 1)));
}
}
}) as typeof fetch;
}

View File

@@ -0,0 +1,58 @@
// `.provider` alone cannot prove the openai-compatible factory was called with
// `includeUsage: true` — a regression dropping it (which zeroes streamed token
// usage / reasoning-token metadata) would still pass. So mock the factory and
// assert the exact args. jest.mock is module-scoped, hence a dedicated file.
const mockCompatibleModel = { provider: 'openai-compatible.chat', modelId: 'm' };
// jest allows `mock`-prefixed vars inside a jest.mock factory.
const mockCreateOpenAICompatible = jest.fn(
(_settings: unknown) => () => mockCompatibleModel,
);
jest.mock('@ai-sdk/openai-compatible', () => ({
createOpenAICompatible: (settings: unknown) =>
mockCreateOpenAICompatible(settings),
}));
import { AiService } from './ai.service';
describe('AiService.getChatModel openai-compatible factory args', () => {
function serviceWith(chatApiStyle?: 'openai-compatible' | 'openai') {
const aiSettings = {
resolve: jest.fn().mockResolvedValue({
driver: 'openai',
chatModel: 'glm-5.2',
apiKey: 'the-key',
baseUrl: 'https://api.z.ai/v4',
chatApiStyle,
}),
};
return new AiService(
// eslint-disable-next-line @typescript-eslint/no-explicit-any
aiSettings as any,
{ find: jest.fn() } as never,
{ decryptSecret: jest.fn() } as never,
);
}
beforeEach(() => mockCreateOpenAICompatible.mockClear());
it('passes includeUsage:true plus baseURL/apiKey/fetch (default style)', async () => {
await serviceWith().getChatModel('ws-1'); // unset -> openai-compatible
expect(mockCreateOpenAICompatible).toHaveBeenCalledTimes(1);
expect(mockCreateOpenAICompatible).toHaveBeenCalledWith(
expect.objectContaining({
name: 'openai-compatible',
baseURL: 'https://api.z.ai/v4',
apiKey: 'the-key',
includeUsage: true,
fetch: expect.any(Function),
}),
);
});
it("does NOT use the openai-compatible factory for chatApiStyle 'openai'", async () => {
await serviceWith('openai').getChatModel('ws-1');
expect(mockCreateOpenAICompatible).not.toHaveBeenCalled();
});
});

View File

@@ -285,3 +285,64 @@ describe('AiService.getChatModel role model override', () => {
);
});
});
/**
* Chat provider selection by the EXPLICIT `chatApiStyle` (NOT inferred from
* baseUrl): 'openai-compatible' (default) uses @ai-sdk/openai-compatible, which
* maps streamed reasoning_content to reasoning parts; 'openai' uses the official
* provider; and openai-compatible without a baseURL safely falls back to the
* official provider (it has no default endpoint). Asserted via `.provider`.
*/
describe('AiService.getChatModel chatApiStyle provider selection', () => {
function serviceWith(opts: {
baseUrl?: string;
chatApiStyle?: 'openai-compatible' | 'openai';
}) {
const aiSettings = {
resolve: jest.fn().mockResolvedValue({
driver: 'openai',
chatModel: 'glm-5.2',
apiKey: 'key',
baseUrl: opts.baseUrl,
chatApiStyle: opts.chatApiStyle,
}),
};
return new AiService(
// eslint-disable-next-line @typescript-eslint/no-explicit-any
aiSettings as any,
{ find: jest.fn() } as never,
{ decryptSecret: jest.fn() } as never,
);
}
const providerOf = async (svc: AiService) =>
(
(await svc.getChatModel('ws-1')) as { provider: string }
).provider;
it("'openai-compatible' + baseURL -> openai-compatible provider", async () => {
expect(
await providerOf(
serviceWith({ baseUrl: 'https://api.z.ai/v4', chatApiStyle: 'openai-compatible' }),
),
).toContain('openai-compatible');
});
it("'openai' + baseURL -> official openai provider", async () => {
expect(
await providerOf(serviceWith({ baseUrl: 'https://api.z.ai/v4', chatApiStyle: 'openai' })),
).toBe('openai.chat');
});
it('unset + baseURL -> defaults to openai-compatible', async () => {
expect(
await providerOf(serviceWith({ baseUrl: 'https://api.z.ai/v4' })),
).toContain('openai-compatible');
});
it("'openai-compatible' WITHOUT baseURL -> safe fallback to official openai", async () => {
expect(
await providerOf(serviceWith({ chatApiStyle: 'openai-compatible' })),
).toBe('openai.chat');
});
});

View File

@@ -7,6 +7,7 @@ import {
type LanguageModel,
} from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { createOllama } from 'ai-sdk-ollama';
import { AiSettingsService } from './ai-settings.service';
@@ -14,6 +15,11 @@ import { AiNotConfiguredException } from './ai-not-configured.exception';
import { AiEmbeddingNotConfiguredException } from './ai-embedding-not-configured.exception';
import { AiSttNotConfiguredException } from './ai-stt-not-configured.exception';
import { describeProviderError } from './ai-error.util';
import { createInstrumentedFetch } from './ai-provider-http';
import {
createStreamingFetch,
withPreResponseRetry,
} from './ai-streaming-fetch';
import { AiProviderCredentialsRepo } from '@docmost/db/repos/ai-chat/ai-provider-credentials.repo';
import { SecretBoxService } from '../crypto/secret-box';
import { AiDriver } from './ai.types';
@@ -43,6 +49,17 @@ export interface ChatModelOverride {
export class AiService {
private readonly logger = new Logger(AiService.name);
// Provider HTTP fetch for the chat path, layered so each transport concern is
// observed (#175). Inside-out: the streaming fetch (finite silence timeouts +
// keep-alive recycling) → provider-HTTP instrumentation (logs every attempt) →
// pre-response connection-reset retry as the OUTERMOST layer. Retry-outer means
// a reset the retry recovers from is still logged with its idle-gap, instead of
// collapsing into a clean "OK". Held for the service lifetime to reuse the
// streaming dispatcher's connection pool.
private readonly aiProviderFetch = withPreResponseRetry(
createInstrumentedFetch('AiService:provider-http', createStreamingFetch()),
);
constructor(
private readonly aiSettings: AiSettingsService,
private readonly aiProviderCredentialsRepo: AiProviderCredentialsRepo,
@@ -83,6 +100,10 @@ export class AiService {
let apiKey = cfg.apiKey;
let baseUrl = cfg.baseUrl;
// Chat provider implementation, chosen EXPLICITLY by the admin (not inferred
// from baseUrl). Unset → 'openai-compatible' so reasoning is surfaced by
// default for this fork's openai+baseUrl setups.
const chatApiStyle = cfg.chatApiStyle ?? 'openai-compatible';
// A driver override that differs from the workspace driver needs that
// driver's own creds (the workspace driver's key would be wrong/absent).
@@ -133,14 +154,41 @@ export class AiService {
}
switch (driver) {
case 'openai':
// baseURL (when set) covers openai-compatible endpoints. Use Chat
// Completions (/chat/completions) — the portable OpenAI-compatible
// endpoint. The default callable createOpenAI(...)(model) targets the
// Responses API (/responses), which OpenAI-compatible gateways
// (OpenRouter, etc.) reject on multi-turn requests (history with
// assistant messages) → 400.
return createOpenAI({ apiKey, baseURL: baseUrl }).chat(chatModel);
case 'openai': {
// The provider implementation is chosen by the admin's `chatApiStyle`
// (NOT inferred from baseUrl — a custom URL can front real OpenAI too).
// Both branches hit Chat Completions (/chat/completions); the provider
// fetch is the instrumented streaming fetch (finite-but-generous stream
// timeouts, #175).
//
// 'openai-compatible' (default) maps the third-party provider's streamed
// `reasoning_content` to reasoning parts (z.ai/GLM, DeepSeek, ...) — the
// point of #175. It has no default endpoint, so it requires a baseURL;
// when there is none (real OpenAI, or a role's cross-driver override that
// cleared baseUrl) we fall back to the official provider.
if (chatApiStyle === 'openai-compatible' && baseUrl) {
return createOpenAICompatible({
name: 'openai-compatible',
apiKey,
baseURL: baseUrl,
// Keep streamed token usage (stream_options.include_usage): without
// it @ai-sdk/openai-compatible omits usage, zeroing the live token
// counter and reasoning-token metadata. The official provider always
// sent it, so this preserves parity.
includeUsage: true,
fetch: this.aiProviderFetch,
})(chatModel);
}
// Official @ai-sdk/openai: real-OpenAI reasoning-model request shaping;
// `.chat()` targets Chat Completions (the default callable targets the
// Responses API, which openai-compatible gateways 400 on multi-turn
// history). In this fork baseUrl is normally set; undefined = real OpenAI.
return createOpenAI({
apiKey,
baseURL: baseUrl,
fetch: this.aiProviderFetch,
}).chat(chatModel);
}
case 'gemini':
return createGoogleGenerativeAI({ apiKey })(chatModel);
case 'ollama':

View File

@@ -16,6 +16,15 @@ export const AI_DRIVERS: AiDriver[] = ['openai', 'gemini', 'ollama'];
export type SttApiStyle = 'multipart' | 'json';
export const STT_API_STYLES: SttApiStyle[] = ['multipart', 'json'];
// Chat provider implementation for the `openai` driver. Chosen explicitly by the
// admin (NOT inferred from baseUrl — a custom URL can front real OpenAI too).
// 'openai-compatible' = @ai-sdk/openai-compatible: maps streamed
// `reasoning_content` to reasoning parts (z.ai/GLM, DeepSeek, OpenRouter, ...).
// 'openai' = official @ai-sdk/openai: real-OpenAI reasoning-model request shaping
// (max_completion_tokens, the 'developer' role), no third-party reasoning map.
export type ChatApiStyle = 'openai-compatible' | 'openai';
export const CHAT_API_STYLES: ChatApiStyle[] = ['openai-compatible', 'openai'];
/**
* Non-secret provider settings persisted under `settings.ai.provider`.
* The API key is intentionally absent here.
@@ -23,6 +32,9 @@ export const STT_API_STYLES: SttApiStyle[] = ['multipart', 'json'];
export interface AiProviderSettings {
driver: AiDriver;
chatModel: string;
// Chat provider implementation for the `openai` driver. Unset → defaults to
// 'openai-compatible' (so reasoning is surfaced by default). See ChatApiStyle.
chatApiStyle?: ChatApiStyle;
embeddingModel?: string;
baseUrl?: string;
// Embedding-specific base URL. Falls back to `baseUrl` when empty/unset.
@@ -45,6 +57,34 @@ export interface AiProviderSettings {
publicShareAssistantRoleId?: string;
}
/**
* The persisted, non-secret provider setting keys — the SINGLE source of truth
* for which fields a settings update may write through to `settings.ai.provider`.
* `satisfies readonly (keyof AiProviderSettings)[]` makes the compiler reject a
* typo or a key that is not a real provider setting.
*
* The settings service consumes this directly. The generic workspace repo cannot
* import AI types, so it keeps its own copy of the same keys, guarded by a parity
* test against this constant (so any future drift fails in CI, not silently in
* prod — a missing key there validates fine, passes the service, and is then
* dropped at the SQL boundary with no error).
*/
export const PROVIDER_SETTINGS_KEYS = [
'driver',
'chatModel',
'chatApiStyle',
'embeddingModel',
'baseUrl',
'embeddingBaseUrl',
'sttModel',
'sttBaseUrl',
'sttApiStyle',
'sttLanguage',
'systemPrompt',
'publicShareChatModel',
'publicShareAssistantRoleId',
] as const satisfies readonly (keyof AiProviderSettings)[];
/**
* Fully resolved provider config, including the decrypted API key for the
* stored driver. Returned by `AiSettingsService.resolve`. The keys are held in
@@ -76,6 +116,7 @@ export interface ResolvedAiConfig extends Partial<AiProviderSettings> {
export interface MaskedAiSettings {
driver?: AiDriver;
chatModel?: string;
chatApiStyle?: ChatApiStyle;
embeddingModel?: string;
baseUrl?: string;
embeddingBaseUrl?: string;

View File

@@ -1,5 +1,12 @@
import { IsIn, IsOptional, IsString } from 'class-validator';
import { AI_DRIVERS, AiDriver, STT_API_STYLES, SttApiStyle } from '../ai.types';
import {
AI_DRIVERS,
AiDriver,
CHAT_API_STYLES,
ChatApiStyle,
STT_API_STYLES,
SttApiStyle,
} from '../ai.types';
/**
* Admin update payload for the workspace AI provider settings.
@@ -18,6 +25,10 @@ export class UpdateAiSettingsDto {
@IsString()
chatModel?: string;
@IsOptional()
@IsIn(CHAT_API_STYLES)
chatApiStyle?: ChatApiStyle;
@IsOptional()
@IsString()
embeddingModel?: string;