The AI chat stream to z.ai (GLM-5.2, api/coding/paas/v4) broke in
production on every heavy turn while `curl` to the same endpoint worked.
ROOT CAUSE (reproduced in ai-http.spec.ts): z.ai's coding endpoint serves a
reasoning model with a long, variable TIME-TO-FIRST-RESPONSE-HEADER on a
heavy chat request (tools + system prompt + document + history). It emits
nothing for tens of seconds before the first SSE byte. A trivial ping
returns in under 2s, which is why "test connection" always passed. `curl`
succeeds because it imposes no time-to-first-header limit by default.
The prior attempt (#141) made it STRICTLY worse: it set undici
`headersTimeout: 60_000` (aborting every heavy turn at ~60s — the prod logs
show ~61-62s failures) AND added `UND_ERR_HEADERS_TIMEOUT` to the RetryAgent
retry codes. Retrying a POST-with-body after a headers-timeout abort re-sends
the body against a torn-down request and throws
`UND_ERR_REQ_CONTENT_LENGTH_MISMATCH` — the exact production error.
Fix — behave like curl:
- Disable headersTimeout/bodyTimeout by default (0), env-overridable via
AI_HTTP_HEADERS_TIMEOUT_MS / AI_HTTP_BODY_TIMEOUT_MS (sanitized so a typo
can't crash the AI layer at import). The transport now waits for z.ai's
slow first byte instead of killing the stream.
- Keep the RetryAgent, but retry ONLY genuine connection resets, each on a
fresh socket; never retry a header/body timeout (it corrupts the POST body).
- STT (transcribeJsonBase64) gains an explicit AbortSignal.timeout, since it
shares aiFetch and previously relied on undici's default transport timeout,
which is now disabled.
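
A minimal sketch of the env sanitization described in the first bullet,
assuming a hypothetical helper named `readTimeoutMs` (the real helper in the
AI layer may be named and shaped differently). It shows one way a malformed
value can fall back to 0 instead of throwing at import time:

```typescript
// Hypothetical helper names; the actual sanitizer may differ.
type Env = Record<string, string | undefined>;

// Parse a millisecond timeout from an env value. Anything that is not a
// finite, non-negative number falls back instead of throwing.
function readTimeoutMs(raw: string | undefined, fallback: number = 0): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw.trim());
  if (!isFinite(parsed) || parsed < 0) return fallback;
  return Math.floor(parsed);
}

// Build the two transport timeouts from an env-like map.
function transportTimeouts(env: Env): { headersTimeout: number; bodyTimeout: number } {
  return {
    headersTimeout: readTimeoutMs(env["AI_HTTP_HEADERS_TIMEOUT_MS"]),
    bodyTimeout: readTimeoutMs(env["AI_HTTP_BODY_TIMEOUT_MS"]),
  };
}
```

The resulting object is shaped to be passed as undici `Agent` options, where
a value of 0 for `headersTimeout` or `bodyTimeout` disables that timeout.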
Tests: loopback reproduction proving the #141 retry config yields
ContentLengthMismatch while the corrected set surfaces an honest
HeadersTimeout, plus a curl-parity test (a finite headersTimeout aborts a
slow first byte; aiFetch delivers the 200).
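
The two retry sets the reproduction compares can be sketched as plain data.
Only `UND_ERR_HEADERS_TIMEOUT` is taken from this message; the baseline
connection-reset codes and all identifier names below are assumptions for
illustration, not the actual lists in the code:

```typescript
// Assumed baseline of connection-level codes; the real list may differ.
const CONNECTION_RESET_CODES: readonly string[] = ["ECONNRESET", "EPIPE", "UND_ERR_SOCKET"];

// #141, as described in this message: the baseline plus the headers timeout.
const PR_141_CODES: readonly string[] = CONNECTION_RESET_CODES.concat(["UND_ERR_HEADERS_TIMEOUT"]);

// Corrected set: connection resets only, no header/body timeouts.
const CORRECTED_CODES: readonly string[] = CONNECTION_RESET_CODES;

// True when an error carrying `code` would be retried under `retryCodes`.
function wouldRetry(retryCodes: readonly string[], code: string | undefined): boolean {
  return code !== undefined && retryCodes.indexOf(code) !== -1;
}
```

In undici, a list like this would typically be supplied through the
`errorCodes` field of the retry options given to `RetryAgent`. Under the
corrected set a headers timeout is not retried, so it reaches the caller as
the original error rather than as a body-length mismatch from a replayed POST.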
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>