feat(ai-chat): load full transcript for model history (drop 50-msg window)

The per-turn model conversation was rebuilt via findRecent(chatId, ws, 50),
a sliding window that dropped the beginning of any chat longer than ~50 stored
rows. Switch streamChat to the existing findAllByChat, which loads the full
non-deleted transcript chronologically with a 5000-row memory-safety backstop
(keeps the newest rows + logs a warning on overflow) — a safety net, not a
conversational limit. Remove the now-unused findRecent method and update the
comments/log text that referenced it (findAllByChat now feeds both the Markdown
export and the model history).
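
For illustration only, the behavioural difference described above can be sketched in memory. This is not the repository code: `Row`, `byAge`, `newestTail`, `slidingWindow` and `fullWithBackstop` are hypothetical names, and the real methods query the database rather than an array.

```typescript
// Hypothetical in-memory stand-in for a stored chat row (not the real schema).
interface Row {
  id: number;
  createdAt: number; // epoch millis
}

// Oldest first, with `id` as a tie-break for equal timestamps.
function byAge(a: Row, b: Row): number {
  return a.createdAt - b.createdAt || a.id - b.id;
}

// Last `n` rows of an already sorted array (empty when n <= 0).
function newestTail(sorted: Row[], n: number): Row[] {
  return n > 0 ? sorted.slice(-n) : [];
}

// Old behaviour (sketch): only the newest `windowSize` rows reach the model.
function slidingWindow(rows: Row[], windowSize: number): Row[] {
  return newestTail([...rows].sort(byAge), windowSize);
}

// New behaviour (sketch): every row reaches the model unless the chat exceeds
// a large safety cap, in which case only the newest `cap` rows are kept.
function fullWithBackstop(
  rows: Row[],
  cap: number,
): { rows: Row[]; truncated: boolean } {
  const sorted = [...rows].sort(byAge);
  const truncated = sorted.length > cap;
  return { rows: truncated ? newestTail(sorted, cap) : sorted, truncated };
}

// A 60-message chat: the 50-row window loses the first 10 messages,
// while the 5000-row backstop leaves the transcript intact.
const chat: Row[] = Array.from({ length: 60 }, (_, i) => ({
  id: i + 1,
  createdAt: 1000 + i,
}));
const windowed = slidingWindow(chat, 50);
const full = fullWithBackstop(chat, 5000);
```

In this sketch `windowed` starts at the 11th message, whereas `full.rows` starts at the 1st and `full.truncated` is `false`.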

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude_code
2026-06-25 23:51:41 +03:00
committed by claude code agent 227
parent fad1aa0501
commit 1f459d8d26
3 changed files with 23 additions and 42 deletions

@@ -240,7 +240,7 @@ describe('prepareAgentStep', () => {
  * write path. It runs identically for the upfront insert (empty steps,
  * 'streaming'), every per-step update, and the terminal finalize — so a future
  * background worker can call the same function. These tests pin the four status
- * shapes and the `metadata.parts` shape that rowToUiMessage/findRecent depend on
+ * shapes and the `metadata.parts` shape that rowToUiMessage/findAllByChat depend on
  * (per-step text + tool parts via assistantParts, in-progress text appended).
  */
 describe('flushAssistant', () => {

@@ -322,12 +322,14 @@ export class AiChatService implements OnModuleInit {
     // Rebuild the conversation from persisted history (not the client payload),
     // so the model always sees the authoritative server-side transcript. Load
-    // the most RECENT tail (oldest -> newest) so chats longer than one page do
-    // not drop recent turns (incl. the user message just inserted above).
-    const history = await this.aiChatMessageRepo.findRecent(
+    // the FULL history in chronological order (oldest -> newest, incl. the user
+    // message just inserted above) so NO turns are dropped — there is no
+    // recent-tail window anymore. `findAllByChat` keeps a 5000-row memory-safety
+    // backstop (on overflow it keeps the NEWEST rows and logs a warning); that
+    // is a safety net far above any realistic chat, not a conversational limit.
+    const history = await this.aiChatMessageRepo.findAllByChat(
       chatId,
       workspace.id,
-      50,
     );
     const uiMessages = history.map(rowToUiMessage);
     // convertToModelMessages is async in ai@6.0.134 (returns Promise<ModelMessage[]>).
@@ -1215,7 +1217,7 @@ export async function applyFinalize(
  *
  * `metadata.parts` is built by assistantParts over the finished steps, then the
  * in-progress text appended as a trailing text part, so rowToUiMessage /
- * findRecent keep replaying the turn unchanged. `metadata.finishReason`,
+ * findAllByChat keep replaying the turn unchanged. `metadata.finishReason`,
  * `metadata.error`, `metadata.usage`, `metadata.contextTokens` and
  * `metadata.maxContextTokens` are attached only when provided/relevant, matching
  * the pre-#183 onFinish/onError records.

@@ -18,7 +18,8 @@ import { executeWithCursorPagination } from '@docmost/db/pagination/cursor-pagin
 // (multi-instance deploy).
 const SWEEP_STREAMING_STALE_MS = 10 * 60 * 1000; // 10 minutes
-// Hard upper bound on the rows materialized by `findAllByChat` (export path).
+// Hard upper bound on the rows materialized by `findAllByChat`, which now feeds
+// BOTH the Markdown export and the per-turn model history.
 // A generous cap so a pathologically huge chat cannot load an unbounded result
 // into memory; far above any realistic transcript length.
 const FIND_ALL_BY_CHAT_LIMIT = 5000;
@@ -78,14 +79,17 @@ export class AiChatMessageRepo {
   }
   // Load ALL (non-deleted) messages of a chat in ascending chronological order
-  // (oldest -> newest), unpaginated. Used by the server-side Markdown export
-  // (#183), where the DB is the single source of truth and the whole transcript
-  // must be rendered in one pass (findByChat is cursor-paginated and would only
-  // return the first page).
+  // (oldest -> newest), unpaginated. Two callers, both treating the DB as the
+  // single source of truth and needing the whole transcript in one pass
+  // (findByChat is cursor-paginated and would only return the first page):
+  // - the server-side Markdown export (#183);
+  // - the per-turn model history, rebuilt fresh on every turn so the model
+  // sees the full authoritative transcript.
   //
   // Hard-capped at FIND_ALL_BY_CHAT_LIMIT rows (a generous bound, far above any
-  // realistic transcript) so exporting a pathologically huge chat cannot
-  // materialize an unbounded result set in memory.
+  // realistic transcript) — a shared memory-safety backstop for BOTH paths so a
+  // pathologically huge chat cannot materialize an unbounded result set in
+  // memory. On overflow the NEWEST rows are kept and a warning is logged.
   async findAllByChat(
     chatId: string,
     workspaceId: string,
@@ -93,9 +97,9 @@ export class AiChatMessageRepo {
     limit: number = FIND_ALL_BY_CHAT_LIMIT,
   ): Promise<AiChatMessage[]> {
     // Fetch newest-first (+1 to DETECT truncation), so on overflow we keep the
-    // NEWEST `limit` messages — the recent conversation matters most for an
-    // export — rather than silently dropping the tail (#183 review). Reverse back
-    // to chronological for rendering, like findRecent.
+    // NEWEST `limit` messages — the recent conversation matters most — rather
+    // than silently dropping the tail (#183 review). Then reverse back to
+    // chronological order (oldest -> newest) for rendering / model replay.
     const rows = await this.db
       .selectFrom('aiChatMessages')
       .select(this.baseFields)
@@ -110,38 +114,13 @@ export class AiChatMessageRepo {
     if (rows.length > limit) {
       rows.length = limit; // keep the newest `limit` (rows are newest-first here)
       this.logger.warn(
-        `Chat ${chatId} export truncated to the newest ${limit} messages ` +
+        `Chat ${chatId} truncated to the newest ${limit} messages ` +
           `(older messages omitted).`,
       );
     }
     return rows.reverse();
   }
-  // Load the most RECENT `limit` messages for a chat and return them in
-  // ascending chronological order (oldest -> newest), as the model expects.
-  // `findByChat` returns the FIRST page ASC (the OLDEST messages), which loses
-  // recent turns once a chat grows beyond a page; this rebuilds the model
-  // history from the tail instead. Plain query (no cursor pagination).
-  async findRecent(
-    chatId: string,
-    workspaceId: string,
-    limit: number,
-  ): Promise<AiChatMessage[]> {
-    const rows = await this.db
-      .selectFrom('aiChatMessages')
-      .select(this.baseFields)
-      .where('chatId', '=', chatId)
-      .where('workspaceId', '=', workspaceId)
-      .where('deletedAt', 'is', null)
-      .orderBy('createdAt', 'desc')
-      .orderBy('id', 'desc')
-      .limit(limit)
-      .execute();
-    // Selected newest-first for the limit; reverse to oldest-first for the model.
-    return rows.reverse();
-  }
   async insert(
     insertable: InsertableAiChatMessage,
     trx?: KyselyTransaction,