Files
gitmost/apps
claude code agent 227 8fd818c279 fix(ai-http): serialize per-host AI requests + back off on 429 (z.ai #140)
z.ai's GLM Coding Plan throttles hard (429s) and stalls under bursts/overlap;
two mitigations in the shared AI transport:

- Per-host concurrency gate (default 1, env AI_HTTP_MAX_CONCURRENCY): outbound AI
  requests to a given host are serialized, and the slot is held until the
  (streamed) response body is fully consumed — so a chat stream blocks an
  overlapping title-gen / second-tab / RAG-embedding request instead of tripping
  z.ai's ~1-concurrent limit. A defensive max-hold (AI_HTTP_MAX_HOLD_MS, 10 min)
  prevents a hung stream from deadlocking all AI traffic (headers/body timeouts
  are disabled, see #144).
- 429 backoff (AI_HTTP_MAX_429_RETRIES, default 3): respect Retry-After (or
  exponential backoff) and retry, so a rate-limited agent step waits the throttle
  out instead of failing the whole turn.

NOTE: these address the burst/overlap dimension. The dominant symptom — z.ai's
erratic time-to-first-byte (measured 2s..56s, endpoint/UA/tool-count-independent)
— is mitigated by #144 (wait like curl); it is a z.ai-side capacity issue, not
something client code can speed up.

Tests: ai-http.spec.ts gains a concurrency-gate test (a 2nd request to the same
host does not hit the server until the 1st body is consumed) and a 429-backoff
test (Retry-After honored, eventual 200). 8/8 pass; typecheck clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 16:50:41 +03:00
..