fix(ai-chat): settle detached runs on pre-stream failures + review fixes (#184)

CRITICAL: any failure between a successful beginRun and streamText's terminal callbacks taking ownership left ai_chat_runs stuck 'running' forever (sweepRunning only runs at startup), which then 409'd every future turn in the chat and made the observer tab poll forever. The unguarded window covers:
- the bare awaits: user-message insert, history load, convertToModelMessages, settings resolve;
- the buildSystemPrompt/forUser block;
- the synchronous streamText wiring.

Wrap the body of stream() after beginRun in a safety-net try/catch that settles the run to 'error' (via onSettled) before rethrowing, and make finalizeRun idempotent (active.delete is the once-guard) so a settle here and a settle from a streamText callback collapse to a single terminal write.

Also from review comment 2519:
- correct three client comments that falsely claimed /ai-chat/run is "flag-gated server-side and would 403". It is owner-gated only; with the feature off the chat simply has no runs, so the endpoint returns { run: null } (ai-chat-window.tsx, ai-chat-service.ts, ai-chat-query.ts).
- remove the dead UpdatableAiChatRun type (zero usages; the repo update uses an inline Partial<...>).
- add controller specs for POST /ai-chat/run and /ai-chat/stop (owner-gating, run:null when no run, run+message, stop by runId and by chatId).
- add tests: an exception after beginRun settles the run to 'error' and drops the in-memory entry (next turn is not 409'd); finalizeRun is idempotent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
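
Illustration (not part of the change): a minimal sketch of the two mechanisms described above, a safety-net catch that settles the run before rethrowing and a finalize step guarded by the boolean that Map.prototype.delete returns. RunRegistry, runTurn, isActive and the in-memory `writes` array are hypothetical stand-ins for this sketch only; the real service writes through its repository and is wired to streamText callbacks, which are not modelled here.

```typescript
// Hypothetical stand-ins for illustration; not the real service API.
type TerminalStatus = "succeeded" | "error" | "aborted";

class RunRegistry {
  private readonly active = new Map<string, { startedAt: number }>();
  // Stands in for the database: one entry per terminal write.
  readonly writes: Array<{ runId: string; status: TerminalStatus }> = [];

  beginRun(runId: string): void {
    this.active.set(runId, { startedAt: Date.now() });
  }

  // Map.prototype.delete returns false when the key was already absent,
  // so it doubles as a once-guard: only the first settle writes.
  async finalizeRun(runId: string, status: TerminalStatus): Promise<void> {
    if (!this.active.delete(runId)) return;
    this.writes.push({ runId, status });
  }

  isActive(runId: string): boolean {
    return this.active.has(runId);
  }
}

// Safety net: any failure after beginRun settles the run to 'error'
// and then rethrows so the caller still sees the original exception.
// On the success path the body is assumed to settle the run itself.
async function runTurn(
  registry: RunRegistry,
  runId: string,
  body: () => Promise<void>,
): Promise<void> {
  registry.beginRun(runId);
  try {
    await body();
  } catch (err) {
    await registry.finalizeRun(runId, "error");
    throw err;
  }
}
```

If a later callback also tries to settle the same run, the second finalizeRun call finds no in-memory entry and returns without writing, so the first terminal status is the one that sticks.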
2026-06-28 14:54:19 +03:00
parent 1abf9356a9
commit 4c0a4eb9cc
9 changed files with 876 additions and 528 deletions
--- a/apps/server/src/core/ai-chat/ai-chat-run.service.ts
+++ b/apps/server/src/core/ai-chat/ai-chat-run.service.ts
@@ -201,10 +201,19 @@ export class AiChatRunService implements OnModuleInit {

  /**
   * Finalize a run to its terminal status (succeeded / failed / aborted),
-   * stamping finishedAt + any error, and DROP its in-memory entry. Idempotent
-   * and best-effort: the at-most-once turn terminal callbacks call it, but a
-   * double call (or a call after the row was swept) merely re-writes the same
-   * terminal row.
+   * stamping finishedAt + any error, and DROP its in-memory entry. Best-effort.
+   *
+   * IDEMPOTENT (#184 review): the terminal write happens AT MOST ONCE per run.
+   * `this.active.delete(runId)` returns false when the run was already settled
+   * (its in-memory entry already dropped); in that case we no-op. This collapses
+   * a legitimate double-settle to a single write: AiChatService.stream wraps the
+   * turn in a safety-net catch that settles the run to 'error' on any failure
+   * BEFORE streamText's terminal callbacks own the lifecycle — and on the rare
+   * path where streamText DID attach (so a callback also settles) the two would
+   * otherwise both call onSettled. The first caller wins and writes the terminal
+   * row; the second returns early, so a late settle can never clobber the real
+   * terminal status or double-write. beginRun always registers the entry before
+   * the turn can settle, so a legitimate first finalize always finds it.
   */
  async finalizeRun(
    runId: string,
@@ -212,7 +221,7 @@ export class AiChatRunService implements OnModuleInit {
    turnStatus: TurnTerminalStatus,
    error?: string,
  ): Promise<void> {
-    this.active.delete(runId);
+    if (!this.active.delete(runId)) return;
    try {
      await this.runRepo.update(runId, workspaceId, {
        status: mapTurnStatusToRun(turnStatus),
@@ -258,10 +267,7 @@ export class AiChatRunService implements OnModuleInit {

  /** Fetch a run by id (workspace-scoped). Used to resolve + ownership-check an
   *  explicit stop targeting a runId. */
-  getRun(
-    runId: string,
-    workspaceId: string,
-  ): Promise<AiChatRun | undefined> {
+  getRun(runId: string, workspaceId: string): Promise<AiChatRun | undefined> {
    return this.runRepo.findById(runId, workspaceId);
  }