feat(ai-chat): bound external MCP tool calls with per-call timeouts

External MCP tools (web search, crawl) had no per-call timeout: a hung
tool call was only broken by the 15-min transport silence timeout shared
with the chat provider, and a server that kept the socket warm but never
returned could spin until the user cancelled.

Add two independent, composing bounds for external MCP traffic (the chat
provider path is unchanged):

- Silence 5 min: buildPinnedDispatcher now overrides
  headersTimeout/bodyTimeout with mcpStreamTimeoutMs()
  (AI_MCP_STREAM_TIMEOUT_MS, default 300000) on the external-MCP
  dispatcher only, so a byte-silent upstream is severed in ~5 min
  instead of 15.
- Total per-call 15 min: wrapToolWithCallTimeout wraps each external
  tool's execute with a fresh AbortController + timer composed with the
  turn signal via AbortSignal.any (AI_MCP_CALL_TIMEOUT_MS, default
  900000). It RACES the call against the abort signal because
  @ai-sdk/mcp does not settle its in-flight promise on abort, so a
  warm-but-stuck call would otherwise hang forever. On timeout the call
  surfaces as a tool-error and the agent loop recovers.

Add tests (incl. a never-settling real-client-style stub) and document
both env vars in .env.example.
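
For readers without the full diff, the per-call bound described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the name `wrapWithCallTimeout`, the `Execute` type, and the error message are hypothetical, and the real wrapToolWithCallTimeout may differ. It assumes a runtime that provides `AbortSignal.any`.

```typescript
// Hypothetical shape of a tool's execute function (assumption, not the SDK's type).
type Execute<A, R> = (args: A, opts: { abortSignal?: AbortSignal }) => Promise<R>;

function wrapWithCallTimeout<A, R>(execute: Execute<A, R>, timeoutMs: number): Execute<A, R> {
  return async (args, opts) => {
    // Fresh controller + timer for this one call.
    const timeoutController = new AbortController();
    const timer = setTimeout(
      () => timeoutController.abort(new Error(`MCP tool call exceeded ${timeoutMs} ms`)),
      timeoutMs,
    );

    // Compose the per-call timeout with the turn signal, if one was passed.
    const signals: AbortSignal[] = [timeoutController.signal];
    if (opts.abortSignal) signals.push(opts.abortSignal);
    const signal = AbortSignal.any(signals);

    // Rejects as soon as the composed signal aborts.
    const aborted = new Promise<never>((_, reject) => {
      if (signal.aborted) reject(signal.reason);
      else signal.addEventListener('abort', () => reject(signal.reason), { once: true });
    });

    try {
      // Race: even if execute() never settles after abort, the caller is released.
      return await Promise.race([execute(args, { ...opts, abortSignal: signal }), aborted]);
    } finally {
      clearTimeout(timer);
    }
  };
}
```

The race is the important part: passing the signal alone would not help if the underlying client ignores it, which is the situation the commit message describes for warm-but-stuck calls.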
@@ -70,6 +70,47 @@ export function streamKeepAliveMs(): number {
  return positiveEnv('AI_STREAM_KEEPALIVE_MS', DEFAULT_STREAM_KEEPALIVE_MS);
}

/** Default SILENCE timeout for EXTERNAL-MCP transport (5 min). */
const DEFAULT_MCP_STREAM_TIMEOUT_MS = 300_000;

/** Default total wall-clock cap for ONE external MCP tool call (15 min). */
const DEFAULT_MCP_CALL_TIMEOUT_MS = 900_000;

/**
 * SILENCE timeout (ms) for EXTERNAL-MCP transport ONLY. Override with
 * `AI_MCP_STREAM_TIMEOUT_MS`; a missing/invalid/non-positive value falls back to
 * {@link DEFAULT_MCP_STREAM_TIMEOUT_MS} (5 min).
 *
 * Deliberately tighter than the chat provider's {@link streamTimeoutMs} (15 min)
 * so a byte-silent/hung MCP upstream is broken in ~5 min instead of 15. This is
 * the undici `headersTimeout`/`bodyTimeout` for the external-MCP dispatcher only
 * — it must NOT change the chat provider, which legitimately needs 15 min between
 * reasoning chunks (#175).
 *
 * Trade-off: a legitimately long but byte-silent single tool call (a slow crawl
 * that emits nothing until done) and an SSE transport that idles >5 min BETWEEN
 * tool calls are also cut here. The per-call total cap ({@link mcpCallTimeoutMs},
 * applied in mcp-clients.service) is the complementary guard for chatty-but-stuck
 * calls that keep the socket warm yet never return.
 */
export function mcpStreamTimeoutMs(): number {
  return positiveEnv('AI_MCP_STREAM_TIMEOUT_MS', DEFAULT_MCP_STREAM_TIMEOUT_MS);
}

/**
 * Total wall-clock cap (ms) for ONE external MCP tool call — APP-LEVEL, not
 * transport. Override with `AI_MCP_CALL_TIMEOUT_MS`; a missing/invalid/
 * non-positive value falls back to {@link DEFAULT_MCP_CALL_TIMEOUT_MS} (15 min).
 *
 * Catches a tool that keeps the connection warm (SSE heartbeats / trickle) but
 * never returns a result — which the transport silence timeout
 * ({@link mcpStreamTimeoutMs}) would never break because the socket never goes
 * byte-silent.
 */
export function mcpCallTimeoutMs(): number {
  return positiveEnv('AI_MCP_CALL_TIMEOUT_MS', DEFAULT_MCP_CALL_TIMEOUT_MS);
}

/**
 * undici `Agent` options for streaming AI traffic — the (generous, finite)
 * silence timeouts plus the keep-alive recycle window. Shared by the chat
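
The hunk above relies on `positiveEnv`, whose body is not shown in this diff. As a rough sketch of the documented fallback behaviour (missing, invalid, or non-positive values fall back to the default), something like the following would do. The names `positiveEnvFrom`, `Env`, and `mcpTimeouts` are hypothetical, and the environment is passed in explicitly here only to keep the example self-contained; the real helper takes just a name and a default.

```typescript
// Hypothetical env shape; the real helper presumably reads process.env directly.
type Env = Record<string, string | undefined>;

// Assumed behaviour: return the parsed value only if it is a finite number > 0.
function positiveEnvFrom(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Resolve both MCP bounds and show where each one would apply.
function mcpTimeouts(env: Env) {
  const streamMs = positiveEnvFrom(env, 'AI_MCP_STREAM_TIMEOUT_MS', 300_000);
  const callMs = positiveEnvFrom(env, 'AI_MCP_CALL_TIMEOUT_MS', 900_000);
  return {
    // undici Agent silence timeouts, for the external-MCP dispatcher only.
    dispatcherOptions: { headersTimeout: streamMs, bodyTimeout: streamMs },
    // App-level wall-clock cap for one tool call.
    callTimeoutMs: callMs,
  };
}
```

With an empty environment this resolves to the 5 min silence and 15 min per-call defaults; setting `AI_MCP_STREAM_TIMEOUT_MS=60000` would tighten only the transport silence bound.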