The "one active run per chat" guard could be bypassed by a race. Two
simultaneous POST /ai-chat/stream requests for the same chat both passed
the controller's pre-hijack 409 check (a check-then-act TOCTOU), then the
loser's INSERT into ai_chat_runs hit the partial unique index
ai_chat_runs_one_active_per_chat (SQLSTATE 23505). That error was
SWALLOWED, so the second turn streamed UNTRACKED: no runId, not targetable
by /stop, and (with autonomousRuns on) not aborted by onClose -> an
orphaned, unstoppable run that still spent provider tokens.

Make the unique-index INSERT the authoritative gate:
- AiChatRunService.beginRun: when the run-row INSERT fails with a 23505 on
  ONE_ACTIVE_RUN_PER_CHAT_INDEX (detected via isUniqueViolation /
  violatedConstraint), throw a distinct RunAlreadyActiveError instead of
  swallowing it. Any other error (incl. a 23505 on a different
  constraint) propagates unchanged.
- AiChatService.stream: when beginRun throws RunAlreadyActiveError, reject
  the turn with a 409 ConflictException (code A_RUN_ALREADY_ACTIVE) BEFORE
  any AI/provider call -> no tokens spent, no untracked turn. Other
  beginRun failures keep the legacy best-effort fallback (the stream stays
  socket-bound).
- ai-chat.controller: the post-hijack catch honors an HttpException's real
  status/body (a clean 409) instead of returning a blanket 500, since the
  race 409 is raised before a byte is written. The pre-check 409 now
  carries the same code.

The controller's cheap pre-check stays as a fast path for the common
sequential double-submit; the INSERT violation is the race-safe backstop.
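
For reviewers, a minimal sketch of the gate described above. It is not the
code in this commit: the identifiers RunAlreadyActiveError,
isUniqueViolation, violatedConstraint, ONE_ACTIVE_RUN_PER_CHAT_INDEX and
A_RUN_ALREADY_ACTIVE are taken from this message, but their signatures,
the error shape (`code` / `constraint` fields) and the helpers
translateInsertError, InMemoryRunStore and rejectionFor are assumptions
made for illustration. The real beginRun is presumably async and talks to
Postgres; the sketch is synchronous and in-memory.

```typescript
// Sketch only: signatures and the in-memory store are assumptions.
const UNIQUE_VIOLATION = "23505"; // Postgres SQLSTATE for unique_violation
const ONE_ACTIVE_RUN_PER_CHAT_INDEX = "ai_chat_runs_one_active_per_chat";

class RunAlreadyActiveError extends Error {
  chatId: string;
  constructor(chatId: string) {
    super(`chat ${chatId} already has an active run`);
    this.name = "RunAlreadyActiveError";
    this.chatId = chatId;
  }
}

// Assumed error shape: a driver error exposing `code` and `constraint`.
function isUniqueViolation(err: unknown): boolean {
  return typeof err === "object" && err !== null
    && (err as { code?: unknown }).code === UNIQUE_VIOLATION;
}

function violatedConstraint(err: unknown): string | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const c = (err as { constraint?: unknown }).constraint;
  return typeof c === "string" ? c : undefined;
}

// Only a 23505 on the active-run index becomes RunAlreadyActiveError;
// every other error is returned unchanged so the caller rethrows it as-is.
function translateInsertError(chatId: string, err: unknown): unknown {
  if (isUniqueViolation(err)
      && violatedConstraint(err) === ONE_ACTIVE_RUN_PER_CHAT_INDEX) {
    return new RunAlreadyActiveError(chatId);
  }
  return err;
}

// Hypothetical stand-in for ai_chat_runs plus its partial unique index.
class InMemoryRunStore {
  private active: Record<string, boolean> = {};
  private nextId = 1;

  insertActiveRun(chatId: string): string {
    if (this.active[chatId]) {
      const e = new Error("duplicate key value") as Error & {
        code?: string;
        constraint?: string;
      };
      e.code = UNIQUE_VIOLATION;
      e.constraint = ONE_ACTIVE_RUN_PER_CHAT_INDEX;
      throw e;
    }
    this.active[chatId] = true;
    return `run-${this.nextId++}`;
  }
}

// The INSERT is the gate: no separate "is a run active?" read beforehand.
function beginRun(store: InMemoryRunStore, chatId: string): string {
  try {
    return store.insertActiveRun(chatId);
  } catch (err) {
    throw translateInsertError(chatId, err);
  }
}

// The stream-side decision, reduced to a status/code pair.
function rejectionFor(err: unknown): { status: number; code: string } | null {
  return err instanceof RunAlreadyActiveError
    ? { status: 409, code: "A_RUN_ALREADY_ACTIVE" }
    : null; // other failures keep the legacy best-effort fallback
}
```

The point of the shape is that the loser of the race learns it lost from
the INSERT itself, so there is no window between a check and the write.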

Tests: ai-chat-run.service.spec proves that beginRun throws
RunAlreadyActiveError on the active-index 23505 (and only on that
constraint) and leaks no controller, and adds an integration-style test of
two concurrent begins in which exactly one wins. The new
ai-chat.service.run-race.spec proves that stream rejects with a 409
ConflictException BEFORE any streamText/generateText call and never
persists an untracked turn. The latter spec fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>