chore(ai): passive z.ai provider HTTP telemetry (#175)

Investigate the intermittent (~20-30%) long-turn failure
"Lost connection to the AI provider", which surfaces as AI_RetryError /
read ECONNRESET on the gitmost->z.ai link (browser-agnostic, mid-turn).
This commit is pure instrumentation with no behavior change:

- ai-http-diagnostics.ts: a passive fetch wrapper injected into the
  OpenAI-compatible (z.ai) client. For each provider HTTP call it logs
  time-to-headers and status on success; on a pre-response rejection it
  logs the latency, error code/cause, request-body size, and idle gap
  since the previous call. The Response is returned untouched (streaming
  intact) and errors are rethrown unchanged; the wrapper adds no retry,
  timeout, or dispatcher.
- ai.service.ts: wire the instrumented fetch into the openai case only.
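
A minimal sketch of the kind of wrapper described above. This is not the
contents of ai-http-diagnostics.ts; the name createDiagnosticFetchSketch,
the injectable log and baseFetch parameters, and the log line format are
assumptions made for illustration only.

```typescript
// Hypothetical sketch of a passive fetch wrapper (names and log format are assumed).
type FetchLike = typeof fetch;
type LogFn = (message: string) => void;

function createDiagnosticFetchSketch(
  label: string,
  log: LogFn = console.log,
  baseFetch: FetchLike = fetch,
): FetchLike {
  // Start time of the previous call, used to report the idle gap between calls.
  let lastCallAt: number | undefined;

  return async (input, init) => {
    const startedAt = Date.now();
    const idleGapMs = lastCallAt === undefined ? null : startedAt - lastCallAt;
    lastCallAt = startedAt;

    // Only string bodies are measured here; other body types are reported as null.
    const bodyBytes =
      typeof init?.body === 'string'
        ? new TextEncoder().encode(init.body).length
        : null;

    try {
      const response = await baseFetch(input, init);
      // The promise resolves once headers arrive; the body stream is never read here.
      log(`${label} ok status=${response.status} headersMs=${Date.now() - startedAt}`);
      return response;
    } catch (err) {
      // Node's fetch usually reports socket errors as TypeError('fetch failed')
      // with the underlying error (e.g. code ECONNRESET) attached as `cause`.
      const cause = (err as { cause?: { code?: string } } | null)?.cause;
      const code = (err as { code?: string } | null)?.code ?? cause?.code ?? 'unknown';
      log(
        `${label} fail code=${code} latencyMs=${Date.now() - startedAt} ` +
          `bodyBytes=${bodyBytes} idleGapMs=${idleGapMs}`,
      );
      throw err; // rethrown unchanged: no retry, no timeout
    }
  };
}
```

A wrapper of this shape would be passed as the fetch option, as in
createOpenAI({ apiKey, baseURL: baseUrl, fetch }). Because it only awaits the
underlying call and returns the same Response object, streaming consumers see
no difference.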

Lets us classify the reset as connection-phase vs mid-stream before
choosing a fix, without repeating the reverted RetryAgent (#140).
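
One way the resulting logs could be read, sketched under the assumption that
a failed turn is labelled by whether its final provider call was rejected
before any response headers arrived. The TurnEvidence shape and classifyReset
are hypothetical and are not part of this commit.

```typescript
// Hypothetical classification of a failed turn from the diagnostic logs.
interface TurnEvidence {
  // The turn ended with a provider ECONNRESET (e.g. wrapped in AI_RetryError).
  turnFailedWithReset: boolean;
  // The diagnostics logged a pre-response rejection for the turn's last HTTP call.
  lastCallRejectedBeforeHeaders: boolean;
}

type ResetPhase = 'none' | 'connection-phase' | 'mid-stream';

function classifyReset(evidence: TurnEvidence): ResetPhase {
  if (!evidence.turnFailedWithReset) return 'none';
  // A rejection before headers means fetch itself failed: the connection
  // (or the request send) was reset before the provider answered.
  if (evidence.lastCallRejectedBeforeHeaders) return 'connection-phase';
  // Headers were received and the turn still failed, so the reset
  // happened while the response body was streaming.
  return 'mid-stream';
}
```
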

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude_code
2026-06-24 21:24:05 +03:00
parent 04a418e1a6
commit 4cc8df836f
2 changed files with 91 additions and 1 deletions


@@ -14,6 +14,8 @@ import { AiNotConfiguredException } from './ai-not-configured.exception';
import { AiEmbeddingNotConfiguredException } from './ai-embedding-not-configured.exception';
import { AiSttNotConfiguredException } from './ai-stt-not-configured.exception';
import { describeProviderError } from './ai-error.util';
+// DIAGNOSTIC (provider ECONNRESET investigation) - temporary.
+import { createDiagnosticFetch } from './ai-http-diagnostics';
import { AiProviderCredentialsRepo } from '@docmost/db/repos/ai-chat/ai-provider-credentials.repo';
import { SecretBoxService } from '../crypto/secret-box';
import { AiDriver } from './ai.types';
@@ -43,6 +45,13 @@ export interface ChatModelOverride {
export class AiService {
private readonly logger = new Logger(AiService.name);
+// DIAGNOSTIC (provider ECONNRESET investigation) - temporary: passive
+// instrumentation of the OpenAI-compatible provider HTTP calls (z.ai).
+// Logs call timing/outcome only; no behavior change.
+private readonly aiDiagnosticFetch = createDiagnosticFetch(
+  'AiService:provider-http',
+);
constructor(
private readonly aiSettings: AiSettingsService,
private readonly aiProviderCredentialsRepo: AiProviderCredentialsRepo,
@@ -140,7 +149,13 @@ export class AiService {
// Responses API (/responses), which OpenAI-compatible gateways
// (OpenRouter, etc.) reject on multi-turn requests (history with
// assistant messages) → 400.
-return createOpenAI({ apiKey, baseURL: baseUrl }).chat(chatModel);
+// DIAGNOSTIC (provider ECONNRESET investigation) - temporary: pass the
+// passive instrumented fetch (logging only; no behavior change).
+return createOpenAI({
+  apiKey,
+  baseURL: baseUrl,
+  fetch: this.aiDiagnosticFetch,
+}).chat(chatModel);
case 'gemini':
return createGoogleGenerativeAI({ apiKey })(chatModel);
case 'ollama':