claude code agent 227 d7454c887d fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte
The AI chat stream to z.ai (GLM-5.2, api/coding/paas/v4) broke in
production on every heavy turn while `curl` to the same endpoint worked.

ROOT CAUSE (reproduced in ai-http.spec.ts): z.ai's coding endpoint serves a
reasoning model with a long, variable TIME-TO-FIRST-RESPONSE-HEADER on a
heavy chat request (tools + system prompt + document + history). It emits
nothing for tens of seconds before the first SSE byte. A trivial ping
returns in <2s, which is why "test connection" always passed. `curl`
succeeds because it imposes no time-to-first-header limit.

The prior attempt (#141) made it STRICTLY worse: it set undici
`headersTimeout: 60_000` (aborting every heavy turn at ~60s — the prod logs
show ~61-62s failures) AND added `UND_ERR_HEADERS_TIMEOUT` to the RetryAgent
retry codes. Retrying a POST-with-body after a headers-timeout abort re-sends
the body against a torn-down request and throws
`UND_ERR_REQ_CONTENT_LENGTH_MISMATCH` — the exact production error.
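
The retry rule implied by this failure can be sketched as a small
classifier. This is a minimal illustration, not the repository's code: the
names `RECONNECTABLE_CODES`, `NEVER_RETRY_CODES` and `shouldRetry` are
hypothetical, and treating `ECONNRESET` as the only reconnectable code is
an assumption made for the example.

```typescript
// Hypothetical sketch: names and code lists are assumptions for illustration.

// Errors where the connection itself died, so a fresh socket may succeed.
const RECONNECTABLE_CODES: ReadonlySet<string> = new Set([
  "ECONNRESET", // peer reset the connection
]);

// Timeout aborts: the POST body may already be partly consumed, so a retry
// would re-send it against a torn-down request.
const NEVER_RETRY_CODES: ReadonlySet<string> = new Set([
  "UND_ERR_HEADERS_TIMEOUT",
  "UND_ERR_BODY_TIMEOUT",
]);

// Returns true only for errors that are safe to retry on a new socket.
function shouldRetry(code: string | undefined): boolean {
  if (code === undefined) return false;
  if (NEVER_RETRY_CODES.has(code)) return false;
  return RECONNECTABLE_CODES.has(code);
}
```

Listing the timeout codes explicitly is redundant with the allow-list, but
it documents why they must never be added back.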

Fix — behave like curl:
- Disable headersTimeout/bodyTimeout by default (0), env-overridable via
  AI_HTTP_HEADERS_TIMEOUT_MS / AI_HTTP_BODY_TIMEOUT_MS (sanitized so a typo
  can't crash the AI layer at import). The transport now waits for z.ai's
  slow first byte instead of killing the stream.
- Keep the RetryAgent reconnecting ONLY genuine connection resets on a fresh
  socket; never retry a header/body timeout (it corrupts the POST body).
- STT (transcribeJsonBase64) gains an explicit AbortSignal.timeout: it shares
  aiFetch and previously relied on undici's default transport timeout, which
  is now disabled.
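
The env sanitization described in the first bullet might look roughly like
the sketch below. It assumes that any missing, empty, non-numeric, or
negative value falls back to the default of 0 (timeout disabled). The
helper name `readTimeoutMs` and the `Env` type are hypothetical; the real
implementation may differ.

```typescript
// Hypothetical sketch: readTimeoutMs and Env are illustrative names.
type Env = Record<string, string | undefined>;

// Parse a millisecond timeout from an env-like object without ever throwing.
// 0 means "no timeout", which matches the curl-like default.
function readTimeoutMs(env: Env, name: string, fallbackMs: number = 0): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallbackMs;

  const parsed = Number(raw); // "60_000" or "abc" become NaN
  if (!Number.isFinite(parsed) || parsed < 0) return fallbackMs;

  return Math.floor(parsed); // drop any fractional milliseconds
}

// Example: a typo falls back to 0 instead of crashing at import time.
const exampleEnv: Env = {
  AI_HTTP_HEADERS_TIMEOUT_MS: "60_000", // not a valid number literal here
  AI_HTTP_BODY_TIMEOUT_MS: "120000",
};
const headersTimeout = readTimeoutMs(exampleEnv, "AI_HTTP_HEADERS_TIMEOUT_MS");
const bodyTimeout = readTimeoutMs(exampleEnv, "AI_HTTP_BODY_TIMEOUT_MS");
```

Taking the environment as a parameter rather than reading `process.env`
directly keeps the helper easy to test.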

Tests: loopback reproduction proving the #141 retry config yields
ContentLengthMismatch while the corrected set surfaces an honest
HeadersTimeout, plus a curl-parity test (a finite headersTimeout aborts a
slow first byte; aiFetch delivers the 200).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 05:18:08 +03:00