fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte (supersedes #141) #144

Closed

Ghost wants to merge 3 commits from fix/ai-zai-stream-rootcause into develop

Author	SHA1	Message	Date
claude code agent 227	e5effa13e1	fix(ai-http): generous-finite AI timeouts (120s) instead of disabled Refines the #144 timeout decision with measured data. A 30-min probe of paced single z.ai requests: 22/22 succeeded, TTFB 1.6–9.9s, zero timeouts/429s, no multi-minute hang. So z.ai answers fast when NOT bursted; the reported "hangs tens of minutes" is the burst path (20-step agent + stacked retries), addressed by the per-host concurrency gate + 429 backoff. Therefore headersTimeout/bodyTimeout default to 120s (env-overridable; 0 to disable) rather than 0/infinite: 120s is ~12× the worst observed paced TTFB, so it tolerates real slow turns but cuts a genuinely-stuck request with a clear error instead of hanging for minutes (curl-style "wait forever" was too loose; #141's 60s was too tight). Sanitizer now falls back to the default on a bad env value; an explicit env 0 still disables. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 18:49:04 +03:00
claude code agent 227	8fd818c279	fix(ai-http): serialize per-host AI requests + back off on 429 (z.ai #140 ) z.ai's GLM Coding Plan throttles hard (429s) and stalls under bursts/overlap; two mitigations in the shared AI transport: - Per-host concurrency gate (default 1, env AI_HTTP_MAX_CONCURRENCY): outbound AI requests to a given host are serialized, and the slot is held until the (streamed) response body is fully consumed — so a chat stream blocks an overlapping title-gen / second-tab / RAG-embedding request instead of tripping z.ai's ~1-concurrent limit. A defensive max-hold (AI_HTTP_MAX_HOLD_MS, 10 min) prevents a hung stream from deadlocking all AI traffic (headers/body timeouts are disabled, see #144). - 429 backoff (AI_HTTP_MAX_429_RETRIES, default 3): respect Retry-After (or exponential backoff) and retry, so a rate-limited agent step waits the throttle out instead of failing the whole turn. NOTE: these address the burst/overlap dimension. The dominant symptom — z.ai's erratic time-to-first-byte (measured 2s..56s, endpoint/UA/tool-count-independent) — is mitigated by #144 (wait like curl); it is a z.ai-side capacity issue, not something client code can speed up. Tests: ai-http.spec.ts gains a concurrency-gate test (a 2nd request to the same host does not hit the server until the 1st body is consumed) and a 429-backoff test (Retry-After honored, eventual 200). 8/8 pass; typecheck clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 16:50:41 +03:00
claude code agent 227	d7454c887d	fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte The AI chat stream to z.ai (GLM-5.2, api/coding/paas/v4) broke in production on every heavy turn while `curl` to the same endpoint worked. ROOT CAUSE (reproduced in ai-http.spec.ts): z.ai's coding endpoint is a reasoning model with a long, variable TIME-TO-FIRST-RESPONSE-HEADER on a heavy chat request (tools + system prompt + document + history) — it emits nothing for tens of seconds before the first SSE byte. A trivial ping returns <2s, which is why "test connection" always passed. `curl` succeeds because it imposes no time-to-first-header limit. The prior attempt (#141) made it STRICTLY worse: it set undici `headersTimeout: 60_000` (aborting every heavy turn at ~60s — the prod logs show ~61-62s failures) AND added `UND_ERR_HEADERS_TIMEOUT` to the RetryAgent retry codes. Retrying a POST-with-body after a headers-timeout abort re-sends the body against a torn-down request and throws `UND_ERR_REQ_CONTENT_LENGTH_MISMATCH` — the exact production error. Fix — behave like curl: - Disable headersTimeout/bodyTimeout by default (0), env-overridable via AI_HTTP_HEADERS_TIMEOUT_MS / AI_HTTP_BODY_TIMEOUT_MS (sanitized so a typo can't crash the AI layer at import). The transport now waits for z.ai's slow first byte instead of killing the stream. - Keep the RetryAgent reconnecting ONLY genuine connection resets on a fresh socket; never retry a header/body timeout (it corrupts the POST body). - STT (transcribeJsonBase64) gains an explicit AbortSignal.timeout, since it shared aiFetch and previously relied on undici's default transport timeout. Tests: loopback reproduction proving the #141 retry config yields ContentLengthMismatch while the corrected set surfaces an honest HeadersTimeout, plus a curl-parity test (a finite headersTimeout aborts a slow first byte; aiFetch delivers the 200). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 05:18:08 +03:00

Author

SHA1

Message

Date

claude code agent 227

e5effa13e1

fix(ai-http): generous-finite AI timeouts (120s) instead of disabled

Refines the #144 timeout decision with measured data. A 30-min probe of paced
single z.ai requests: 22/22 succeeded, TTFB 1.6–9.9s, zero timeouts/429s, no
multi-minute hang. So z.ai answers fast when NOT bursted; the reported
"hangs tens of minutes" is the burst path (20-step agent + stacked retries),
addressed by the per-host concurrency gate + 429 backoff.

Therefore headersTimeout/bodyTimeout default to 120s (env-overridable; 0 to
disable) rather than 0/infinite: 120s is ~12× the worst observed paced TTFB, so
it tolerates real slow turns but cuts a genuinely-stuck request with a clear
error instead of hanging for minutes (curl-style "wait forever" was too loose;
#141's 60s was too tight). Sanitizer now falls back to the default on a bad env
value; an explicit env 0 still disables.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-23 18:49:04 +03:00

claude code agent 227

8fd818c279

fix(ai-http): serialize per-host AI requests + back off on 429 (z.ai #140 )

z.ai's GLM Coding Plan throttles hard (429s) and stalls under bursts/overlap;
two mitigations in the shared AI transport:

- Per-host concurrency gate (default 1, env AI_HTTP_MAX_CONCURRENCY): outbound AI
  requests to a given host are serialized, and the slot is held until the
  (streamed) response body is fully consumed — so a chat stream blocks an
  overlapping title-gen / second-tab / RAG-embedding request instead of tripping
  z.ai's ~1-concurrent limit. A defensive max-hold (AI_HTTP_MAX_HOLD_MS, 10 min)
  prevents a hung stream from deadlocking all AI traffic (headers/body timeouts
  are disabled, see #144).
- 429 backoff (AI_HTTP_MAX_429_RETRIES, default 3): respect Retry-After (or
  exponential backoff) and retry, so a rate-limited agent step waits the throttle
  out instead of failing the whole turn.

NOTE: these address the burst/overlap dimension. The dominant symptom — z.ai's
erratic time-to-first-byte (measured 2s..56s, endpoint/UA/tool-count-independent)
— is mitigated by #144 (wait like curl); it is a z.ai-side capacity issue, not
something client code can speed up.

Tests: ai-http.spec.ts gains a concurrency-gate test (a 2nd request to the same
host does not hit the server until the 1st body is consumed) and a 429-backoff
test (Retry-After honored, eventual 200). 8/8 pass; typecheck clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-23 16:50:41 +03:00

claude code agent 227

d7454c887d

fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte

The AI chat stream to z.ai (GLM-5.2, api/coding/paas/v4) broke in
production on every heavy turn while `curl` to the same endpoint worked.

ROOT CAUSE (reproduced in ai-http.spec.ts): z.ai's coding endpoint is a
reasoning model with a long, variable TIME-TO-FIRST-RESPONSE-HEADER on a
heavy chat request (tools + system prompt + document + history) — it emits
nothing for tens of seconds before the first SSE byte. A trivial ping
returns <2s, which is why "test connection" always passed. `curl` succeeds
because it imposes no time-to-first-header limit.

The prior attempt (#141) made it STRICTLY worse: it set undici
`headersTimeout: 60_000` (aborting every heavy turn at ~60s — the prod logs
show ~61-62s failures) AND added `UND_ERR_HEADERS_TIMEOUT` to the RetryAgent
retry codes. Retrying a POST-with-body after a headers-timeout abort re-sends
the body against a torn-down request and throws
`UND_ERR_REQ_CONTENT_LENGTH_MISMATCH` — the exact production error.

Fix — behave like curl:
- Disable headersTimeout/bodyTimeout by default (0), env-overridable via
  AI_HTTP_HEADERS_TIMEOUT_MS / AI_HTTP_BODY_TIMEOUT_MS (sanitized so a typo
  can't crash the AI layer at import). The transport now waits for z.ai's
  slow first byte instead of killing the stream.
- Keep the RetryAgent reconnecting ONLY genuine connection resets on a fresh
  socket; never retry a header/body timeout (it corrupts the POST body).
- STT (transcribeJsonBase64) gains an explicit AbortSignal.timeout, since it
  shared aiFetch and previously relied on undici's default transport timeout.

Tests: loopback reproduction proving the #141 retry config yields
ContentLengthMismatch while the corrected set surfaces an honest
HeadersTimeout, plus a curl-parity test (a finite headersTimeout aborts a
slow first byte; aiFetch delivers the 200).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-23 05:18:08 +03:00

fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte (supersedes #141) #144

3 Commits