fix(ai-http): fail fast + retry on provider header stall (#140) #141

Ghost · 2026-06-23T04:15:38+03:00

Ghost commented

2026-06-23 04:15:38 +03:00

Fixes #140 — the z.ai GLM coding endpoint intermittently accepts the chat request but never sends response headers; undici's default headersTimeout is 300s, so the user hung for five minutes before failing, and UND_ERR_HEADERS_TIMEOUT was not in the RetryAgent's retried error set → no recovery.

Root cause (confirmed on the stand)

headersTimeout bounds time-to-FIRST-response-headers (before any body) — it is not the streaming budget; once headers arrive the SSE body streams freely, unaffected by it. The previous ai-http.ts comment conflated it with bodyTimeout and left it at undici's 300s default, so a header stall = a 5-minute hang with no retry. A tiny ping (test endpoint) responded in <2s so it never tripped; the heavy streaming chat occasionally stalled.

Reproduced live with the provided z.ai creds: normal streams return headers in ~1–9s; the stall is the rare path where headers never come.

Fix

Cap headersTimeout (env AI_HTTP_HEADERS_TIMEOUT_MS, default 60s) so a header stall fails fast instead of hanging 300s. Safe for streaming — lowering it does not truncate live SSE.
Add UND_ERR_HEADERS_TIMEOUT to the retried error codes so the stalled request is retried on a fresh connection (which usually responds in seconds). A header timeout fires before any body, so the retry is clean (no partial SSE / Range-resume problem).
bodyTimeout kept generous (env AI_HTTP_BODY_TIMEOUT_MS, default 300s) so slow/thinking models with sparse chunks survive. UND_ERR_BODY_TIMEOUT is deliberately not retried (mid-body, partial SSE already delivered).

Verification

New ai-http.spec.ts regression tests (loopback server, no external calls): a header stall is retried on a fresh connection and recovers; a healthy fast response passes through in one attempt. 5/5 spec tests pass, server tsc clean.
Live on the stand: z.ai (GLM-5.2) chat streams normally with the fix (no regression). OpenRouter (anthropic/claude-sonnet-4.6) also streams cleanly — consistent with the stall being specific to z.ai's coding endpoint.

Note: this is the opposite direction from the previously-reverted experiment (which raised the timeout to 600s — that can't help a stuck request). Lowering + retrying is what converts a 5-minute hang into a fast retry that usually succeeds.

🤖 Generated with Claude Code

Fixes #140 — the z.ai GLM coding endpoint intermittently accepts the chat request but **never sends response headers**; undici's default `headersTimeout` is 300s, so the user hung for **five minutes** before failing, and `UND_ERR_HEADERS_TIMEOUT` was **not** in the RetryAgent's retried error set → no recovery. ### Root cause (confirmed on the stand) `headersTimeout` bounds **time-to-FIRST-response-headers** (before any body) — it is **not** the streaming budget; once headers arrive the SSE body streams freely, unaffected by it. The previous `ai-http.ts` comment conflated it with `bodyTimeout` and left it at undici's 300s default, so a header stall = a 5-minute hang with no retry. A tiny `ping` (test endpoint) responded in <2s so it never tripped; the heavy streaming chat occasionally stalled. Reproduced live with the provided z.ai creds: normal streams return headers in ~1–9s; the stall is the rare path where headers never come. ### Fix - **Cap `headersTimeout`** (env `AI_HTTP_HEADERS_TIMEOUT_MS`, default **60s**) so a header stall **fails fast** instead of hanging 300s. Safe for streaming — lowering it does not truncate live SSE. - **Add `UND_ERR_HEADERS_TIMEOUT` to the retried error codes** so the stalled request is **retried on a fresh connection** (which usually responds in seconds). A header timeout fires *before* any body, so the retry is clean (no partial SSE / Range-resume problem). - **`bodyTimeout`** kept generous (env `AI_HTTP_BODY_TIMEOUT_MS`, default 300s) so slow/thinking models with sparse chunks survive. `UND_ERR_BODY_TIMEOUT` is deliberately **not** retried (mid-body, partial SSE already delivered). ### Verification - New `ai-http.spec.ts` regression tests (loopback server, no external calls): a header stall is retried on a fresh connection and recovers; a healthy fast response passes through in one attempt. **5/5 spec tests pass**, server `tsc` clean. - Live on the stand: z.ai (GLM-5.2) chat streams normally with the fix (no regression). OpenRouter (`anthropic/claude-sonnet-4.6`) also streams cleanly — consistent with the stall being specific to z.ai's coding endpoint. Note: this is the **opposite direction** from the previously-reverted experiment (which *raised* the timeout to 600s — that can't help a stuck request). Lowering + retrying is what converts a 5-minute hang into a fast retry that usually succeeds. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

Ghost added 2 commits 2026-06-23 04:15:39 +03:00

fix(ai-http): fail fast + retry on provider header stall (#140 ) 35fc58eaaa

The z.ai GLM coding endpoint intermittently accepts the chat request but never
sends response headers; undici's default 300s headersTimeout then hung the user
for five minutes before failing, and UND_ERR_HEADERS_TIMEOUT was not in the
RetryAgent's retried error set, so there was no recovery.

headersTimeout only bounds time-to-FIRST-headers (before any body) — it is NOT
the streaming budget, so lowering it does not truncate live SSE streams. Cap it
(env AI_HTTP_HEADERS_TIMEOUT_MS, default 60s) so a header stall fails fast, and
add UND_ERR_HEADERS_TIMEOUT to the retried error codes so the stalled request is
retried on a fresh connection (which usually responds in seconds). bodyTimeout
kept generous (env AI_HTTP_BODY_TIMEOUT_MS, default 300s) so slow streams with
sparse chunks survive. UND_ERR_BODY_TIMEOUT is deliberately NOT retried (mid-body,
partial SSE already delivered).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

test(ai-http): cover header-stall fail-fast + retry (#140 ) d6cd275469

Extend ai-http.spec with two loopback-server tests: a provider that stalls
without sending headers triggers the (lowered) headersTimeout and is retried on a
fresh connection, recovering; a healthy fast response passes through in one
attempt. No external network calls.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vvzvlad merged commit 6efb865625 into develop

2026-06-23 04:16:38 +03:00

vvzvlad referenced this issue from a commit

2026-06-23 04:16:38 +03:00

Merge pull request 'fix(ai-http): fail fast + retry on provider header stall (#140)' (#141) from fix/ai-stream-headers-timeout into develop

vvzvlad commented

2026-06-23 04:26:22 +03:00

docmost@0.93.0 start /app
pnpm --filter ./apps/server run start:prod

server@0.93.0 start:prod /app/apps/server
cross-env NODE_ENV=production node dist/main

2026-06-23T01:23:10.413Z INF | pid=45 hostname=d39ceaa4ecbe context=DatabaseModule msg=Establishing database connection
2026-06-23T01:23:10.439Z INF | pid=45 hostname=d39ceaa4ecbe context=RedisModule msg=default: the connection was successfully established
2026-06-23T01:23:10.507Z INF | pid=45 hostname=d39ceaa4ecbe context=DatabaseModule msg=Database connection successful
2026-06-23T01:23:10.984Z INF | pid=45 hostname=d39ceaa4ecbe context=DatabaseMigrationService msg=No pending database migrations
2026-06-23T01:23:10.998Z INF | pid=45 hostname=d39ceaa4ecbe context=NestApplication msg=Nest application successfully started
2026-06-23T01:23:11.012Z INF | pid=45 hostname=d39ceaa4ecbe context=NestApplication msg=Listening on http://127.0.0.1:3000 / https://docs.vvzvlad.xyz
2026-06-23T01:23:22.009Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiChatController msg=AI chat stream START chat=new ua="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"
2026-06-23T01:23:23.724Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #3 -> POST api.z.ai/api/coding/paas/v4/chat/completions
2026-06-23T01:24:02.863Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiChatController msg=AI chat stream START chat=new ua="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
2026-06-23T01:24:03.078Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiHttp msg=provider request #4 -> POST api.z.ai/api/coding/paas/v4/chat/completions
2026-06-23T01:24:25.348Z WRN | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #3 x after 61624ms: fetch failed <- [UND_ERR_REQ_CONTENT_LENGTH_MISMATCH] Request body length does not match content-length header
2026-06-23T01:24:27.358Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #5 -> POST api.z.ai/api/coding/paas/v4/chat/completions
2026-06-23T01:25:05.723Z WRN | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiHttp msg=provider request #4 x after 62645ms: fetch failed <- [ECONNRESET] read ECONNRESET
2026-06-23T01:25:07.725Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiHttp msg=provider request #6 -> POST api.z.ai/api/coding/paas/v4/chat/completions
2026-06-23T01:25:29.207Z WRN | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #5 x after 61849ms: fetch failed <- [ECONNRESET] read ECONNRESET
2026-06-23T01:25:33.211Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #7 -> POST api.z.ai/api/coding/paas/v4/chat/completions

нихуя не помогло же

> docmost@0.93.0 start /app > pnpm --filter ./apps/server run start:prod > server@0.93.0 start:prod /app/apps/server > cross-env NODE_ENV=production node dist/main 2026-06-23T01:23:10.413Z INF | pid=45 hostname=d39ceaa4ecbe context=DatabaseModule msg=Establishing database connection 2026-06-23T01:23:10.439Z INF | pid=45 hostname=d39ceaa4ecbe context=RedisModule msg=default: the connection was successfully established 2026-06-23T01:23:10.507Z INF | pid=45 hostname=d39ceaa4ecbe context=DatabaseModule msg=Database connection successful 2026-06-23T01:23:10.984Z INF | pid=45 hostname=d39ceaa4ecbe context=DatabaseMigrationService msg=No pending database migrations 2026-06-23T01:23:10.998Z INF | pid=45 hostname=d39ceaa4ecbe context=NestApplication msg=Nest application successfully started 2026-06-23T01:23:11.012Z INF | pid=45 hostname=d39ceaa4ecbe context=NestApplication msg=Listening on http://127.0.0.1:3000 / https://docs.vvzvlad.xyz 2026-06-23T01:23:22.009Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiChatController msg=AI chat stream START chat=new ua="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15" 2026-06-23T01:23:23.724Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #3 -> POST api.z.ai/api/coding/paas/v4/chat/completions 2026-06-23T01:24:02.863Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiChatController msg=AI chat stream START chat=new ua="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36" 2026-06-23T01:24:03.078Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiHttp msg=provider request #4 -> POST api.z.ai/api/coding/paas/v4/chat/completions 2026-06-23T01:24:25.348Z WRN | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #3 x after 61624ms: fetch failed <- [UND_ERR_REQ_CONTENT_LENGTH_MISMATCH] Request body length does not match content-length header 2026-06-23T01:24:27.358Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #5 -> POST api.z.ai/api/coding/paas/v4/chat/completions 2026-06-23T01:25:05.723Z WRN | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiHttp msg=provider request #4 x after 62645ms: fetch failed <- [ECONNRESET] read ECONNRESET 2026-06-23T01:25:07.725Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"} context=AiHttp msg=provider request #6 -> POST api.z.ai/api/coding/paas/v4/chat/completions 2026-06-23T01:25:29.207Z WRN | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #5 x after 61849ms: fetch failed <- [ECONNRESET] read ECONNRESET 2026-06-23T01:25:33.211Z INF | pid=45 hostname=d39ceaa4ecbe req={"method":"POST","url":"/api/ai-chat/stream","ip":"172.18.0.15","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"} context=AiHttp msg=provider request #7 -> POST api.z.ai/api/coding/paas/v4/chat/completions нихуя не помогло же

Ghost referenced this issue from a commit

2026-06-23 05:16:31 +03:00

fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte

Ghost referenced this pull request

2026-06-23 05:17:05 +03:00

fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte (supersedes #141) #144

Ghost referenced this issue from a commit

2026-06-23 05:18:47 +03:00

fix(ai-chat): root-cause #140 — stop aborting z.ai's slow first byte

~~Ghost referenced this pull request 2026-06-23 05:26:41 +03:00~~

[bug][ai-chat] z.ai (GLM-5.2 coding plan) intermittently stalls/RSTs the chat stream (UND_ERR_HEADERS_TIMEOUT / ECONNRESET) #140

~~Ghost referenced this pull request 2026-06-23 16:22:19 +03:00~~