Closed
Ghost
wants to merge 3 commits from
fix/ai-zai-stream-rootcause into develop
pull from: fix/ai-zai-stream-rootcause
merge into: vvzvlad:develop
vvzvlad:main
vvzvlad:feat/184-autonomous-agent-runs
vvzvlad:feat/git-sync
vvzvlad:fix/embeddings-reindex-progress
vvzvlad:feature/offline-sync
vvzvlad:refactor/193-tool-spec-registry
vvzvlad:test/244-part-b
vvzvlad:fix/255-ws-redis-adapter-leak
vvzvlad:feat/251-intentional-clear
vvzvlad:fix/252-e2e-open-handles
vvzvlad:feat/221-image-captions
vvzvlad:fix/244-dataloss-bugs
vvzvlad:develop
vvzvlad:feat/229-catalog-yaml
vvzvlad:feat/243-blob-sandbox
vvzvlad:feat/228-inline-footnotes
vvzvlad:fix/qa-ui-bugs-216-218
vvzvlad:feature/agent-roles-catalog
vvzvlad:fix/share-alias-rename
vvzvlad:fix/ai-chat-empty-render
vvzvlad:feat/191-chat-doc-binding
vvzvlad:feat/201-temporary-notes
vvzvlad:feat/198-interrupt-agent
vvzvlad:feat/ai-chat-full-history
vvzvlad:feat/199-ai-generate-title
vvzvlad:feat/205-share-aliases
vvzvlad:batch/issues-189-187-170
vvzvlad:feat/170-mcp-test-button
vvzvlad:feat/189-context-badge
vvzvlad:feat/198-interrupt-agent-send-now
vvzvlad:fix/issues-190-159
vvzvlad:fix/ai-chat-new-chat-during-stream
vvzvlad:fix/ai-chat-stream-perf
vvzvlad:batch/issues-2026-06-25
vvzvlad:feat/ai-chat-persistent-history
vvzvlad:fix/ai-chat-copy-chat-wysiwyg
vvzvlad:fix/ai-stream-reset-resilience
vvzvlad:fix/ai-stream-undici-timeout
vvzvlad:fix/footnote-review-1227-followup
vvzvlad:fix/ai-chat-token-counter-realtime
vvzvlad:docs/manual-qa-test-plan
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e5effa13e1 |
fix(ai-http): generous-finite AI timeouts (120s) instead of disabled
Refines the #144 timeout decision with measured data. A 30-min probe of paced single z.ai requests: 22/22 succeeded, TTFB 1.6–9.9s, zero timeouts/429s, no multi-minute hang. So z.ai answers fast when NOT bursted; the reported "hangs tens of minutes" is the burst path (20-step agent + stacked retries), addressed by the per-host concurrency gate + 429 backoff. Therefore headersTimeout/bodyTimeout default to 120s (env-overridable; 0 to disable) rather than 0/infinite: 120s is ~12× the worst observed paced TTFB, so it tolerates real slow turns but cuts a genuinely-stuck request with a clear error instead of hanging for minutes (curl-style "wait forever" was too loose; #141's 60s was too tight). Sanitizer now falls back to the default on a bad env value; an explicit env 0 still disables. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
8fd818c279 |
fix(ai-http): serialize per-host AI requests + back off on 429 (z.ai #140)
z.ai's GLM Coding Plan throttles hard (429s) and stalls under bursts/overlap; two mitigations in the shared AI transport: - Per-host concurrency gate (default 1, env AI_HTTP_MAX_CONCURRENCY): outbound AI requests to a given host are serialized, and the slot is held until the (streamed) response body is fully consumed — so a chat stream blocks an overlapping title-gen / second-tab / RAG-embedding request instead of tripping z.ai's ~1-concurrent limit. A defensive max-hold (AI_HTTP_MAX_HOLD_MS, 10 min) prevents a hung stream from deadlocking all AI traffic (headers/body timeouts are disabled, see #144). - 429 backoff (AI_HTTP_MAX_429_RETRIES, default 3): respect Retry-After (or exponential backoff) and retry, so a rate-limited agent step waits the throttle out instead of failing the whole turn. NOTE: these address the burst/overlap dimension. The dominant symptom — z.ai's erratic time-to-first-byte (measured 2s..56s, endpoint/UA/tool-count-independent) — is mitigated by #144 (wait like curl); it is a z.ai-side capacity issue, not something client code can speed up. Tests: ai-http.spec.ts gains a concurrency-gate test (a 2nd request to the same host does not hit the server until the 1st body is consumed) and a 429-backoff test (Retry-After honored, eventual 200). 8/8 pass; typecheck clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
d7454c887d |
fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte
The AI chat stream to z.ai (GLM-5.2, api/coding/paas/v4) broke in production on every heavy turn while `curl` to the same endpoint worked. ROOT CAUSE (reproduced in ai-http.spec.ts): z.ai's coding endpoint is a reasoning model with a long, variable TIME-TO-FIRST-RESPONSE-HEADER on a heavy chat request (tools + system prompt + document + history) — it emits nothing for tens of seconds before the first SSE byte. A trivial ping returns <2s, which is why "test connection" always passed. `curl` succeeds because it imposes no time-to-first-header limit. The prior attempt (#141) made it STRICTLY worse: it set undici `headersTimeout: 60_000` (aborting every heavy turn at ~60s — the prod logs show ~61-62s failures) AND added `UND_ERR_HEADERS_TIMEOUT` to the RetryAgent retry codes. Retrying a POST-with-body after a headers-timeout abort re-sends the body against a torn-down request and throws `UND_ERR_REQ_CONTENT_LENGTH_MISMATCH` — the exact production error. Fix — behave like curl: - Disable headersTimeout/bodyTimeout by default (0), env-overridable via AI_HTTP_HEADERS_TIMEOUT_MS / AI_HTTP_BODY_TIMEOUT_MS (sanitized so a typo can't crash the AI layer at import). The transport now waits for z.ai's slow first byte instead of killing the stream. - Keep the RetryAgent reconnecting ONLY genuine connection resets on a fresh socket; never retry a header/body timeout (it corrupts the POST body). - STT (transcribeJsonBase64) gains an explicit AbortSignal.timeout, since it shared aiFetch and previously relied on undici's default transport timeout. Tests: loopback reproduction proving the #141 retry config yields ContentLengthMismatch while the corrected set surfaces an honest HeadersTimeout, plus a curl-parity test (a finite headersTimeout aborts a slow first byte; aiFetch delivers the 200). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |