The header badge in the floating AI-chat window flipped meaning between
states (a live per-turn token counter while streaming vs. the context
size at rest), which made it "reset to 1" on each prompt and confused
users. Make it consistently show the current context size, with the
model's context window as an optional "/ max" denominator.
The max comes from a new admin-set AI setting (chatContextWindow, in
tokens) — provider-independent and always exact. The server stamps it
onto the assistant message metadata (maxContextTokens) next to
contextTokens, so the client reads both from the last row with no
client-side model resolution (survives shares / future per-role models).
- server: chatContextWindow in AiProviderSettings/keys/masked/resolved,
DTO (@IsInt @Min(0)), settings-service resolve/getMasked, repo parity
allowlist; flushAssistant writes metadata.maxContextTokens when > 0.
- client: ContextBadge component (extracted, shows "current [/ max]",
no live mode); removed the liveTurnTokens header path + dead util fn;
Context-window NumberInput in AI settings; i18n strings.
- live "Thinking · N tokens" feedback in the chat body is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Approve-with-comments re-review; no blockers. All 7 actionable points (8 is a
forward-looking architecture note — recommendation A, keep as-is):
1. chat-markdown.util spec: restore parity coverage of the removed client spec —
tool error state (+ errorText), unknown-tool fallback (`Ran tool <name>` en /
`Выполнил инструмент <name>` ru), and the circular-output stringify catch.
2. findAllByChat row cap is now testable (injectable limit) + an int-spec proves
truncation on a modest volume.
3. Stability: the per-step durability updates are SERIALIZED via a promise chain
(stepUpdateChain) so they commit in step order — onlyIfStreaming already
closed the finalize race, this closes inter-step ordering.
4. findAllByChat keeps the NEWEST messages on truncation (order DESC + reverse,
like findRecent) and logs a warning with chatId, instead of silently dropping
the newest tail.
5. The LABELS parity comment already references the real path (tool-parts.tsx /
toolLabelKey) — confirmed accurate.
6. Removed the redundant 'off-by-one boundary' test (strict subset of the two
adjacent prepareAgentStep cases).
7. Extracted the terminal-finalize dispatch into a shared `applyFinalize`, used
by BOTH the service's finalizeAssistant and its test — the test now exercises
the real path, not a copy, so a production drift fails it.
Verified: server build + 325 ai-chat unit + 6 integration; prettier clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
15-point review of the persistent-history PR. Architecture decisions: crash
recovery = recency threshold; tool-label duplication = leave as-is.
Must-fix:
1. Boot-sweep bounded by recency. sweepStreaming now also requires
`updatedAt < now() - SWEEP_STREAMING_STALE_MS` (10 min), so a fresh replica's
startup sweep can't abort a turn another replica is actively streaming
(multi-instance deploy). Int-spec: a FRESH 'streaming' row is NOT swept, a
STALE one IS.
2. Restore export during the FIRST streaming turn of a new chat (#174). The
server chatId is now adopted EARLY (in-place, on the start-chunk metadata) via
a new `onServerChatId` callback wired through use-chat-session → chat-thread,
so `activeChatId` is set at turn start and the Copy button is live mid-first-
turn (canExport = !!activeChatId). Hook tests for early/in-place/no-op adopt.
3. Cover finalizeAssistant's fallback-insert branch: extracted pure
`planFinalizeAssistant(assistantId)` (update when id present, insert when the
upfront insert failed) + a dispatch harness test for both arms.
Tests: onModuleInit lifecycle spec (sweep called; throw → resolves + warns);
int-spec updatedAt assertion → toBeGreaterThan.
Cleanups: cap findAllByChat at 5000 rows; upfront-insert-failure log carries
chatId+workspaceId; removed the now-dead buildPartialAssistantRecord (only the
spec consumed it; shapes still pinned by the flushAssistant suite); controller
passes `lang: dto.lang` (normalizeLang handles undefined); dropped a no-op
`?? undefined` in errorOf; documented the content-column semantics change
(concatenated step text, UI renders from metadata.parts); CHANGELOG [Unreleased]
entry (#183, #174); reworded the stale LABELS parity comment.
Verified: server build + 323 ai-chat unit + 5 integration; client tsc + 160
ai-chat unit; prettier clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The client sends the "current page" as { id, title } in the request body and the
server echoed BOTH verbatim into the system prompt context and the
getCurrentPage tool. id and title are independently attacker/desync-controllable
(two tabs, stale navigation), so openPage.id could point at page B while
openPage.title said "Page A" — the model then reported "updated Page A" while it
actually edited page B (CASL still allowed it; the user has access). Red-team
finding #4.
Resolve the open page ONCE against the DB via a new `resolveOpenPageContext`:
workspace-scoped lookup + access check, returning the AUTHORITATIVE { id, title }
(title from the DB row, never the client) or null (fail-closed) for a missing /
foreign / inaccessible page. That validated value now feeds the system prompt,
the getCurrentPage tool, AND the new-chat history origin (which previously did
this validation inline, for the id only — now shared, and the title is fixed
too).
Tests: resolveOpenPageContext covers no-id, not-found, foreign-workspace,
Forbidden, non-Forbidden-fault (fail-closed), the DB-title-wins-over-client case,
and null-title coercion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Admins can now give each EXTERNAL MCP server a free-text instruction ("how/
when to use this server's tools") that the agent receives in its SYSTEM
PROMPT next to the tool descriptions — porting the built-in SERVER_INSTRUCTIONS
idea to admin-configured servers. Trusted, admin-authored text (like a system
prompt); NON-secret, so unlike headersEnc it IS returned in views/forms.
- Migration: nullable `instructions text` on ai_mcp_servers (old rows = null =
no guidance). Table type + repo insert/update (blank/whitespace -> null via
blankToNull). DTO `@MaxLength(4000)`. Service threads it through
McpServerView/toView.
- mcp-clients: `McpServerInstruction { serverName, toolPrefix, instructions }`
threaded through the toolset/cache/lease. Guidance is built ONLY for a server
that actually connected AND contributed >=1 callable tool (the allowlist may
filter all of them out) AND has non-blank text — so a guide never appears for
tools the agent cannot call. Cached with the toolset, so an edit is picked up
next turn via the existing CRUD cache invalidation.
- System prompt: `buildMcpToolingBlock` renders an <mcp_tooling> block INSIDE
the safety sandwich (after context, before the trailing SAFETY_FRAMEWORK) so
it informs tool choice but cannot override the rules; each section is headed
by the server's `prefix_*` namespace. Empty/blank -> block omitted. The
caller (ai-chat.service) now builds the external toolset BEFORE the prompt and
passes external.instructions; client-handle lifecycle (close-once) unchanged.
- Client: instructions field in types + a Textarea (autosize, maxLength 4000)
in the MCP-server form with a namespace-prefix hint; i18n (en/ru).
Tests across every layer (prompt block placement + both SAFETY copies; view
blank->null; buildEntry includes guidance only for connected+>=1-tool+non-blank;
DTO MaxLength; repo + integration round-trip; service wiring). Delegated impl
reviewed (APPROVE); applied the import-type follow-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chat lived in inconsistent paradigms (in-memory stream + client export vs.
DB-as-context), which made export flaky and lost the assistant answer if the
process died mid-turn. Make the DB the single source of truth.
A. STEP-GRANULAR DURABILITY (server)
- ai_chat_messages gains a nullable `status` column (migration; NULL = legacy =
completed). The assistant row is now INSERTED UPFRONT as `status:'streaming'`
and UPDATEd on every onStepFinish with all finished steps (text + tool calls +
tool RESULTS), then finalized once to completed/error/aborted on the terminal
callback. So a process death mid-turn keeps every finished step; a startup
sweep (OnModuleInit → sweepStreaming) flips any dangling 'streaming' row to
'aborted'. The write path no longer depends on a live socket.
- Pure exported `flushAssistant(steps, inProgressText, status, extra?)` builds
the persist payload (metadata.parts byte-identical to the old builder), so a
future background worker can call the same path. AiChatMessageRepo gains
`update`, `sweepStreaming`, and `findAllByChat`.
- consumeStream drain, external-MCP client close-once, SSE heartbeat preserved.
B. SERVER-SIDE EXPORT
- New pure `chat-markdown.util.ts` renders Markdown from DB rows ONLY (server
port of the client builder). Because A persists the in-progress row, the
export now includes an interrupted turn up to its last finished step (flagged
"still generating"). `POST /ai-chat/export` (owner-gated via assertOwnedChat,
workspace-scoped) returns it; `lang` accepts a full client locale tag
('en-US'/'ru-RU') and is normalized server-side (normalizeLang) — a strict
@IsIn(['en','ru']) DTO rejected the real client's i18n.language with a 400,
caught in real-browser testing.
- Client: handleCopy calls the endpoint; `canExport = !!activeChatId`. The whole
liveThreadRef/liveStateRef/onLiveContentChange/hasLiveContent hybrid (and the
client chat-markdown util + test) is removed — the server is now authoritative.
Tests: flushAssistant unit (status shapes + parts parity), chat-markdown.util
unit (incl. legacy NULL-status + interrupted note + ru + normalizeLang locale
tags), controller export wiring + owner-gate, integration update/sweepStreaming.
Verified: server build + 318 ai-chat unit + 3 integration; client tsc + 157
ai-chat unit; and END-TO-END in a real browser — a chat turn persists mid-stream
and the Copy button exports the DB-sourced markdown (showing the in-progress
row), HTTP 200 after the locale fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tokens were only counted post-hoc (onFinish) and the header badge updated only on
chat open/switch; reasoning wasn't requested or shown. Now a counter ticks LIVE
during generation and surfaces reasoning ("thinking") tokens separately, like
Claude Code's `Thinking… · N tokens`.
Architecture (AI SDK v6): no provider gives exact per-token usage mid-stream, so
the live number is a cheap client estimate (chars/≈4) reconciled to AUTHORITATIVE
provider usage at step boundaries and turn end. The useChat per-delta re-render is
the existing realtime engine.
- server: `chatStreamMetadata` now also forwards usage on `finish-step` + `finish`;
`sendReasoning: true`; persisted `metadata.usage` carries `reasoningTokens`
(normalized from `outputTokenDetails` or the deprecated field).
- client: pure `count-stream-tokens` (estimateTokens / liveTurnTokens, prefers
authoritative usage else estimate); `Thinking… · N tokens` in the typing
indicator; collapsible "Thinking" reasoning block; throttled (~8 Hz) live
turn-token header badge; `reasoningTokens` in types + Markdown export.
Review fixes folded in:
- v6 `finish-step.usage` is PER-STEP, not cumulative — the server now ACCUMULATES
a running sum (new pure `accumulateStepUsage`) and sends the cumulative, which
converges to `finish.totalUsage`, so the live counter never jumps DOWN on a
multi-step agent turn.
- reasoning double-count: the authoritative turn-total is attributed to a block
ONLY for a single-reasoning-part (one-step) turn; multi-step blocks each show
their own estimate (the authoritative total stays in the header).
- no "0" badge flash at turn start (require live > 0, else show context size).
- comment refreshed (finish-step trigger).
Tests: server `accumulateStepUsage` + updated `chatStreamMetadata` (34 in the
suite); client pure-fn tests. Both tsc clean; 162 client ai-chat + the ai-chat
server suite pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the PR #138 review.
Blocker 1 — duplicate chat row: a brand-new chat whose first turn errors BEFORE
the SSE 'start' chunk never receives the authoritative chatId, so metadata
adoption can't run; a retry then sent chatId:null and the server inserted a
SECOND chat row, orphaning the first turn. Keep metadata adoption as the primary
path (resolveAdoptedChatId) and add a bounded, unambiguous fallback: on a
new-chat finish with no server id, snapshot the known chat ids and, once the
list refetch lands, adopt the SINGLE newly-appeared id (pickNewlyCreatedChatId).
Zero or >1 new ids (e.g. two tabs racing) → no adoption — no items[0] guessing,
so #137 stays fixed. The wait-for-refetch guard compares set membership (robust
to a concurrent delete), and the diff dedupes so a repeated id from a paginated
list never reads as ambiguous.
Blocker 2 — tests: new adopt-chat-id.test.ts covers both pure helpers (adopt
decision + newly-created-id diff incl. dedupe/reorder); the server
messageMetadata callback is extracted to chatStreamStartMetadata and unit-tested
(start -> {chatId}, otherwise undefined).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A provider error (e.g. read ECONNRESET) routed the turn through the
streamText onError callback, which persisted an EMPTY assistant record
(buildErrorAssistantRecord -> text:'', parts:[]). The answer text already
streamed to and shown by the client was therefore lost from the persisted
row, the chat export, and reopened history — leaving only the error line.
The AI SDK v6 onError callback receives only { error } (no steps/text),
and the visible final answer streams in the last, not-yet-finished step,
so it is absent from every finished step.text. Accumulate it ourselves:
onChunk folds each 'text-delta' into inProgressText; onStepFinish moves a
finished step into capturedSteps and resets inProgressText. onError and
onAbort now persist the partial answer (finished steps' text + tool parts
via assistantParts, then the in-progress text appended last) through a new
shared pure helper buildPartialAssistantRecord, recording the cause in
metadata.error on the error path. Replaces buildErrorAssistantRecord; its
empty-turn shape is preserved when nothing streamed.
Complementary to the resilient-fetch reconnect: that reduces how often a
turn dies; this preserves what was produced when it dies anyway.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add ~330 tests across server (Jest), client (Vitest), editor-ext (Vitest)
and packages/mcp (node:test) for the gitmost features added since
053a9c0d: AI chat, AI agent roles, public-share assistant, MCP per-user
auth, HTML embed, page templates/embed, realtime tree, tree
expand/collapse, and the AI-settings UI.
Test-tooling fixes (prerequisite, were silently hiding coverage):
- Repair 3 page-template specs broken by the 11-arg TransclusionService
constructor; they never compiled, so template access-control / content
-leak / unsync-strip coverage was fictitious.
- Build @docmost/editor-ext before server tests via a `pretest` hook;
the stale dist omitted the new HtmlEmbed/PageEmbed exports (TS2305).
- Let jest resolve the .tsx email templates: add `tsx` to
moduleFileExtensions and widen the ts-jest transform to (t|j)sx?.
Behaviour-preserving "extract pure core" refactors that the tests drive:
- server: resolveShareAssistantRequest + uiMessageTextLength
(public-share controller), decideBasicGate + mapAuthResultToResponse
(mcp), buildErrorAssistantRecord (ai-chat), jsonbObject export (roles).
- client: render-raw-html + shouldExecute/canEdit, decide-embed-state,
page-embed picker utils, tree-socket reducers, open/close branch maps,
isEndpointConfigured/resolveKeyField; buildTreeWithChildren now treats
a permission-trimmed orphan as a root instead of crashing.
Deferred (need a test DB or HTTP harness, documented in the specs):
repo-level Postgres integration tests and the public-share XFF E2E.
Pre-existing DI/lib0-ESM suite failures are untouched and out of scope.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Integrate the already-merged step-limit work from develop. Only conflict was
ai-chat.service.spec.ts: both sides appended a describe block and edited the
import line. Resolved as a union — keep compactToolOutput + the assistantParts/
serializeSteps/rowToUiMessage suites (this branch) AND the prepareAgentStep
suite (develop), importing all symbols from ai-chat.service.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Unit tests for the safety-critical paths: crypto secret-box (round-trip,
tamper detection, wrong key), the SSRF guard (blocked ranges + DNS-rebinding),
the ai-chat tools service, the page-embedding repo, and the
assistant-parts/serialization helpers. Those server helpers (assistantParts,
rowToUiMessage, serializeSteps) are exported ONLY for the tests — no runtime
change.
Also: keyboard a11y on the chat history header and conversation rows
(role/tabIndex/Enter+Space), and DRY refactors that move shared logic into one
place (isToolPart -> tool-parts util; buildInitialValues in the MCP form).
The behaviour-changing edits that previously rode along in this commit are
split out into the following two commits, per review.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port two refinements from the GLM variant onto the Claude base:
- prepareAgentStep: add a comment note that AI SDK v7 renames the per-step
`system` field to `instructions` (v6 ^6.0.134 still uses `system`), so it
gets updated correctly on the next SDK bump.
- ai-chat.service.spec: add an explicit off-by-one boundary test for
prepareAgentStep, expressed via MAX_AGENT_STEPS instead of a hardcoded 18/19
so it tracks the constant if the cap changes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A narrow research question could burn all 8 steps on tool calls and end the
turn with no assistant text (empty turn). Two changes:
- MAX_AGENT_STEPS = 20 (was a magic stepCountIs(8)) so multi-search turns
aren't cut off mid-investigation.
- prepareStep reserves the LAST allowed step for a text-only synthesis:
toolChoice 'none' + a FINAL_STEP_INSTRUCTION appended to (not replacing)
the system prompt, so a tool-heavy turn always ends with a real answer.
Logic extracted into the pure, exported prepareAgentStep(stepNumber, system)
for unit testing; earlier steps return undefined (default behavior).
Implements docs/backlog/ai-chat-step-limit-and-forced-final-answer.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Read tools (getPage, getPageJson, getNode, diffPageVersions,
exportPageMarkdown) return whole pages with no size cap. Their outputs
were stored verbatim in metadata.parts and the tool_calls column, and
metadata.parts is replayed to the provider on every later turn via
convertToModelMessages. After reading a couple of large pages the prompt
grew by full page bodies each turn — rising token cost, latency and DB
row size.
Add compactToolOutput(): a pure, recursive, size-bounded compactor used
in assistantParts() and serializeSteps(). It preserves the value's kind
and small scalar fields (id/title/pageId, which the client reads to build
citations on reload) while truncating long strings, capping long arrays
with a marker, and collapsing subtrees past a depth limit. Small outputs
are returned unchanged by identity. Tool inputs are left intact so
replayed tool_use arguments keep their object shape.
Compaction runs only at persistence time (onFinish/onAbort), so the live
stream and the current turn's multi-step reasoning still see full bodies.
Add unit tests for compactToolOutput.