- Invert the transport layers so the pre-response retry is OUTERMOST and the
provider-HTTP instrumentation is INNER. Before, the retry lived inside
createStreamingFetch (under the instrumentation), so a reset the retry
recovered from logged only a clean "OK status=200" — the
"PRE-RESPONSE FAILED ... ECONNRESET ... idleSincePrevCall" signal went blind
exactly when the fix works, and AI_STREAM_KEEPALIVE_MS couldn't be tuned from
prod data. Now createStreamingFetch is the dispatcher-bound BASE (no retry) and
a new withPreResponseRetry() wraps it; ai.service composes
withPreResponseRetry(createInstrumentedFetch('AiService:provider-http',
createStreamingFetch())), so every attempt — including recovered resets — flows
through the instrumentation. (Also expresses the keepAlive-config vs retry-
behavior boundary structurally, per review #3.)
- Add the retry-exhaustion test: a server that resets EVERY connection, asserting
the call rejects with a retryable connection error AND exactly
PRE_RESPONSE_CONNECT_RETRIES + 1 (= 3) requests reached the server — pinning the
bound and that the final error propagates (guards an off-by-one / infinite loop
/ swallowed error). Existing happy-retry + abort tests moved onto
withPreResponseRetry.
Verified on the stand: a normal turn still streams (reasoning + finish) and the
provider-HTTP telemetry still logs. server tsc + ai/mcp specs green (30).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>