fix(ai-chat): adopt the server-returned chat id (two-tab adoption race #137) #138
Reference in New Issue
Block a user
Delete Branch "fix/ai-chat-chatid-adoption"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes the AI-chat adoption race (#137), reproduced live with DB proof, plus a docs clarification.
The bug
A brand-new chat (
activeChatId === null) had no way to learn the id of the row the server created — the SSE stream never returned it — so the client adopted the newest chat in the per-user list (chats.items[0]). With two tabs open, a second tab creating a chat at ~the same time made its row the newest, so the first tab adopted the wrong id: its later turns persisted into the other chat and the agent rebuilt history from it (commands "leaked" between chats), while the live UI still showed the original conversation.The fix
ai-chat.service.ts): attach the authoritativechatIdto the streamed assistant message via the AI SDKmessageMetadataon thestartpart, so it reaches the client on the very first chunk (before any racing tab).chat-thread.tsx,ai-chat-window.tsx): readmessage.metadata.chatIdinuseChat'sonFinishand adopt that id in place (no remount → the live turn and the thread'schatIdReffollow the real id, so the next turn targets the right chat). Removed thechats.items[0]guess and theadoptNewChatref.Verified live (the two-tab race that confirmed the bug)
Tab A new chat "codeword FIG" → while A streams, Tab B new chat "codeword CIRCUS" → A asks again. Run 3×: A's both turns now stay in A's own chat row (the FIG one); A's 2nd turn no longer lands in B's chat. Single-tab new-chat (2 turns + reload → full history in the correct chat) and existing-chat flows unaffected.
tscclean (client + server).Also: AGENTS.md migration note
Clarifies that DB migrations auto-apply only in production (built image /
start:prod); the local dev stand (pnpm dev/nest start --watch) does not auto-run them — you mustpnpm --filter server migration:latestafter pulling/switching branches. (During this investigation, the unapplied20260622…-ai-chat-page-originmigration blanket-500'd all of AI chat on the dev stand — an env gotcha, not a code bug.)🤖 Generated with Claude Code
Code review — adoption race fix (#137)
Verdict: Request changes. The core fix is correct and genuinely closes #137 — streaming the authoritative
chatIdin message metadata and adopting it in place (instead of guessingchats.items[0]) is the right approach. Security / conventions / documentation / regressions / simplification all came back clean. Two things to address before merge, plus one non-blocking architecture note.⚠️ Warning — a new-chat first turn that fails before the
startchunk can create a duplicate chat rowAdoption is now gated on a truthy
serverChatId, which only exists once the AI SDK has processed the stream'sstartpart. But the server creates the chat row and persists the user message before the stream is wired (apps/server/src/core/ai-chat/ai-chat.service.ts~232–261). If anything fails in that window — a history DB read,convertToModelMessages, or any synchronous throw before the first chunk (all re-thrown by thetry/catchat ~520 → the controller writes a bare 500 with no SSE) — the client never receives achatId.On that path both
onFinish({ isError })andonErrorcallonTurnFinished(undefined), so adoption is skipped andchatIdRef.currentstaysnull. The composer is still enabled, so the user retries → the next send carrieschatId: null→ the server inserts a second chat row, orphaning the first turn in the first row.The old code adopted unconditionally (via the
adoptNewChatref + the chat-list refetch), so it didn't have this gap — the baseonErrorcomment even states the intent: "so a brand-new chat that fails its first turn is adopted." The PR silently drops that (and that comment is now stale).Fix: for the
activeChatId === null && !serverChatIdbranch, don't drop adoption silently — e.g. gate the next send so a failed first turn can't create a second row, or fall back to the list refetch only when there is exactly one unambiguous candidate. Also update the now-staleonErrorcomment.⚠️ Test coverage — the fix ships with no tests
This is a concurrency bug fix with no regression test; nothing locks #137 in.
onTurnFinished) has three branches — adopt / don't-adopt / skip — and none is tested, including the don't-adopt branch where the warning above lives. The project already extracts orchestration logic out ofchat-threadintoutils/queue-helpers.tswith aqueue-helpers.test.ts; the same pattern fits a pureadoptedChatId(activeChatId, serverChatId)helper. A test that returnsserverChatId(not "newest in list") is the regression guard for #137.messageMetadata(attachingchatIdonly on thestartpart) has no test —ai-chat.service.spec.tsis untouched. If it ever regresses (wrong part / wrong field), #137 quietly returns. Extract the callback to an exported helper and assertstart → { chatId },finish → undefined.💡 Architecture (non-blocking, forward-looking)
This is the 2nd id-handling fix in this exact spot. The new→persisted chat identity is spread across ~5 cooperating state slots (
activeChatIdatom,liveThreadChatId,threadKey,chatIdRef,chatStoreId) plus a render-phase guard and a load-bearing "both updates batch in one render" assumption. This PR improves the trigger (drops the heuristicuseEffect+ theadoptNewChatref) but leaves the scattering in place. When this area is next touched, worth considering either lettingChatThreadown its own id transition, or modeling the lifecycle as a small{ new | adopting | persisted }state machine so the "staleliveThreadChatId" state becomes unrepresentable. Not a merge blocker.Multi-aspect review: security · stability · conventions · documentation · regressions · test-coverage · simplification · architecture. The stability finding was verified against the controller/service error path.