Commit Graph

84 Commits

Author SHA1 Message Date
claude_code
b6787cc542 fix(ai-chat): drain stream on client disconnect to stop heap-OOM leak
The /api/ai-chat/stream and public-share streaming paths piped streamText
output to the client socket via pipeUIMessageStreamToResponse, whose only
reader is that socket. On a client disconnect (pervasive Safari/proxy
ECONNRESET), backpressure stalled the stream: the controller aborted the
turn but nothing drained it, so streamText's onFinish/onError/onAbort never
fired. Cleanup (close leased MCP clients, persist partial) never ran and the
whole per-turn object graph (history, per-request toolset closures, captured
steps, SDK buffers) stayed rooted — accumulating across turns until the
default ~2GB heap saturated and the process crashed with
"Ineffective mark-compacts near heap limit - JavaScript heap out of memory".

Add the AI SDK v6 documented remedy: fire-and-forget
`result.consumeStream({ onError })` right after streamText(), which removes
backpressure and drains the stream independently of the client socket so the
terminal callbacks always fire and the turn's memory is released even when the
client has gone away. Applied to both the authenticated and public-share
stream services.

Also add `--heapsnapshot-near-heap-limit=2` to the prod start script so any
residual leak dumps a heap snapshot near OOM for diagnosis (no effect on
normal operation). Heap size stays ops-tunable via NODE_OPTIONS.

- apps/server/src/core/ai-chat/ai-chat.service.ts
- apps/server/src/core/ai-chat/public-share-chat.service.ts
- apps/server/package.json

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 03:59:32 +03:00
claude code agent 227
a14560c7c9 fix(ai-chat): raise undici's 300s stream timeout for long agent turns (#175)
Long research turns failed mid-task with "Lost connection to the AI provider".
Node's global fetch (undici) defaults BOTH headersTimeout and bodyTimeout to
300_000ms, and the chat provider + the external-MCP dispatcher both ran on it
with no override, so:
  - the z.ai chat stream dropped when a late step's huge accumulated context
    pushed the model's time-to-first-token past 5 min (the model reasons
    server-side with NO streamed reasoning, so the connection is silent until the
    first answer token — reproduced: even a trivial glm-5.2 query has a ~4-8s
    first-chunk gap; a long run reaches 400k+-token steps), or a reasoning model
    paused >5 min between chunks (bodyTimeout);
  - the crawl4ai SSE transport, held open across the whole turn, dropped when it
    idled >5 min between tool calls.

Fix: a dedicated undici dispatcher whose stream timeouts are raised to a
generous-but-FINITE silence timeout (default 15 min, AI_STREAM_TIMEOUT_MS) on
each path. NOT disabled (0): that would let a genuinely hung provider — with the
client still connected — hang forever, since the turn's abortSignal only fires on
client disconnect. The timeout bounds SILENCE (time-to-first-byte and the gap
BETWEEN chunks), NOT total turn duration, so an arbitrarily long turn that keeps
streaming is never cut; only a stream quiet for >15 min is treated as a hang.
  - ai-streaming-fetch.ts: createStreamingFetch() + streamTimeoutMs() /
    streamingDispatcherOptions() (the shared, configurable timeout).
  - ai.service: the chat provider fetch is createStreamingFetch(), wrapped by the
    existing passive ECONNRESET telemetry (createDiagnosticFetch gained an
    optional baseFetch) so the telemetry observes the SAME transport.
  - mcp-clients: the SSRF-pinned Agent uses streamingDispatcherOptions().

Investigation: reproduced the transport mechanism against the real z.ai endpoint
(a 1ms headersTimeout throws UND_ERR_HEADERS_TIMEOUT — the exact drop) and ran
the actual research agent to a ~428k-token context. Verified the fixed path
streams cleanly live (glm-5.2 turns finish; telemetry confirms the streaming
fetch is in use).

Tests: ai-streaming-fetch.spec (default 15m + env override + invalid fallback +
both-timeouts + streams a delayed response); ai-http-diagnostics + ai/mcp specs
green. server tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 22:09:10 +03:00
claude_code
13cac155c1 chore(ai-chat): add temporary Safari stream-drop diagnostics
Investigate the Safari-only "Lost connection to the AI provider" mid-stream
disconnect (Chrome unaffected). Pure instrumentation, no behavior change:
the 15s heartbeat interval and all stream callbacks are unchanged.

- sse-resilience.ts: startSseHeartbeat() gains an optional onBeat hook fired
  after each successfully written ping (beat counter).
- ai-chat.service.ts: track stream start, first-chunk latency, model-silent
  gap and heartbeat count; log them on finish/error/abort to classify the
  drop (idle-gap vs hard wall-clock cap vs slow first chunk).
- ai-chat.controller.ts: append elapsed-since-request to the disconnect warn.

All blocks tagged "DIAGNOSTIC ... temporary" for easy removal once the Safari
failure mode is identified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:14:29 +03:00
9225eeeeed Merge pull request 'feat(ai-chat): realtime token counter + reasoning tokens (#151)' (#158) from feat/ai-chat-realtime-tokens into develop
Reviewed-on: #158
2026-06-24 13:07:51 +03:00
claude code agent 227
0ebb1adce8 feat(ai-chat): realtime token counter + reasoning tokens, Claude-Code style (#151)
Tokens were only counted post-hoc (onFinish) and the header badge updated only on
chat open/switch; reasoning wasn't requested or shown. Now a counter ticks LIVE
during generation and surfaces reasoning ("thinking") tokens separately, like
Claude Code's `Thinking… · N tokens`.

Architecture (AI SDK v6): no provider gives exact per-token usage mid-stream, so
the live number is a cheap client estimate (chars/≈4) reconciled to AUTHORITATIVE
provider usage at step boundaries and turn end. The useChat per-delta re-render is
the existing realtime engine.

- server: `chatStreamMetadata` now also forwards usage on `finish-step` + `finish`;
  `sendReasoning: true`; persisted `metadata.usage` carries `reasoningTokens`
  (normalized from `outputTokenDetails` or the deprecated field).
- client: pure `count-stream-tokens` (estimateTokens / liveTurnTokens, prefers
  authoritative usage else estimate); `Thinking… · N tokens` in the typing
  indicator; collapsible "Thinking" reasoning block; throttled (~8 Hz) live
  turn-token header badge; `reasoningTokens` in types + Markdown export.

Review fixes folded in:
- v6 `finish-step.usage` is PER-STEP, not cumulative — the server now ACCUMULATES
  a running sum (new pure `accumulateStepUsage`) and sends the cumulative, which
  converges to `finish.totalUsage`, so the live counter never jumps DOWN on a
  multi-step agent turn.
- reasoning double-count: the authoritative turn-total is attributed to a block
  ONLY for a single-reasoning-part (one-step) turn; multi-step blocks each show
  their own estimate (the authoritative total stays in the header).
- no "0" badge flash at turn start (require live > 0, else show context size).
- comment refreshed (finish-step trigger).

Tests: server `accumulateStepUsage` + updated `chatStreamMetadata` (34 in the
suite); client pure-fn tests. Both tsc clean; 162 client ai-chat + the ai-chat
server suite pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 06:56:14 +03:00
claude code agent 227
0ec0af405a feat(ai-chat): per-role autoStart toggle + custom launchMessage (#149)
Agent role cards always auto-sent a hardcoded "Take a look at the current
document" on pick. Make it configurable per role:
- autoStart (bool, default true): whether picking the role auto-sends a message.
- launchMessage (nullable text): the text sent on auto-start; empty -> the
  built-in default. autoStart=false -> bind the role and send nothing (the user
  types the first message, which still carries the roleId).
Existing roles default to autoStart=true / launchMessage=null => identical old
behavior.

Full-stack:
- migration 20260624T120000 adds `auto_start boolean NOT NULL DEFAULT true` +
  `launch_message text` (additive; down drops both); db.d.ts updated by hand.
- DTO: autoStart (@IsBoolean) + launchMessage (trim @Transform, @MaxLength 2000).
- repo/service: thread + normalize (undefined=unchanged, ""=>null, autoStart??true).
  Both fields exposed in the picker-view for ordinary members (they decide
  whether/what to auto-send); instructions/modelConfig stay ADMIN-ONLY.
- client: IAiRole types, role form (Switch + Textarea, re-hydrated on edit),
  handleRolePick branches on autoStart; i18n en-US + ru-RU.

Review follow-ups folded in: reset the `rolePickedNoSend` flag when the thread
returns to an empty role-less state (the "New chat after autoStart=false pick"
stuck-UI bug — render-phase one-shot reset); made create/update launchMessage
normalization symmetric (raw value, server normalizes ""→null).

Server: 68 role tests pass, tsc clean. Client: tsc clean, role tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 06:28:41 +03:00
claude_code
5161de8ba9 revert(ai-http): drop resilient fetch/RetryAgent layer (#140)
The custom undici RetryAgent + aiFetch transport added for issue #140
did not actually heal mid-stream provider drops: undici's retry path is
a Range-based download-resume that SSE/chat-completions endpoints cannot
satisfy, so a reset after the first byte only swapped ECONNRESET for a
"server does not support the range header" error. Its only real effect
was reconnecting a poisoned keep-alive socket before the first byte, and
PR #141 on top of it turned the 60s headers timeout into deterministic
~61s failures (plus CONTENT_LENGTH_MISMATCH from retrying a POST body
after a timeout abort). The root cause is the z.ai coding endpoint, not
our transport.

Remove the whole layer and return all AI provider calls to Node's
default global fetch.

- delete integrations/ai/ai-http.ts and its spec
- ai.service.ts: drop the aiFetch import, the AI_BYPASS_RESILIENT_FETCH
  diagnostic toggle, and fetch:aiFetch from every chat/embedding/STT
  factory; raw STT call back to global fetch
- ai-chat.controller.ts: drop the stream-timing START log + startedAt
- ai-chat.service.ts: drop the first-chunk/FINISHED/ERROR timing logs
- .env.example: drop AI_BYPASS_RESILIENT_FETCH

Reverts: 1af5d34a, 7c308728, b7abb7ea, 35fc58ea, d6cd2754, 6efb8656.
Preserved (not part of the rollback): client-disconnect abort, title
generation in onFinish, partial-answer persistence, Safari SSE heartbeat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 18:48:33 +03:00
97002f318a Merge pull request 'fix(ai-chat): adopt the server-returned chat id (two-tab adoption race #137)' (#138) from fix/ai-chat-chatid-adoption into develop
Reviewed-on: #138
2026-06-23 03:35:03 +03:00
claude_code
fd66ee6cce fix(ai-chat): stop title generation racing the chat stream (provider stall)
A new-chat turn fired the chat stream (streamText) and title generation
(generateText) concurrently to the same z.ai coding endpoint. That plan
stalls one of two concurrent requests, so the chat stream black-holed for
~300s (undici headers timeout) and the turn hung forever in every browser;
the AI SDK then retried 3x. Server logs showed two concurrent POSTs to
/chat/completions per turn — one 200 in ~8s, the other "fetch failed after
301209ms". Bypassing the custom undici transport did not help, confirming
the cause is the concurrency, not the transport.

Move generateTitle from before the response pipe into onFinish, so it runs
solo AFTER the stream's provider call completes. A first turn that errors or
aborts no longer auto-titles (fallback "Untitled chat" already handles a
null title) — acceptable, and it removes the request that was stalling.
2026-06-23 02:41:14 +03:00
claude code agent 227
f59ca3cb0d refactor(ai-chat): extract useChatSession hook + lock the id lifecycle with tests
Addresses the 2nd PR #138 review (test debt + the Variant-B architecture ask).

The new→persisted chat id lifecycle (mount key, both adoption paths, the
history-load latch, the render-phase reconciler, onTurnFinished) is moved out of
the 768-line window into a new useChatSession hook driven by a pure
threadSessionReducer (reconcile/adopt), so adopt-vs-switch is one explicit
dispatch point and the scattering the review flagged is gone (window: 768→~620).

Tests (the blockers):
- use-chat-session.test.tsx — hook-level locks incl. the #137 regression
  (adopts the authoritative streamed id 'A', NOT chats.items[0]='B' — fails on
  the old heuristic), the error-path fallback (arm/adopt/ambiguous/add+delete),
  the disarm-on-reconcile lock (a fallback armed then switched away must not be
  adopted by a late refetch), in-place-adopt-keeps-key vs external-switch-remount,
  and the waitingForHistory latch.
- extractServerChatId (reading message.metadata.chatId) and newlyAddedChatIds
  extracted as pure helpers with unit tests; threadSessionReducer tested.

Cleanups: single canonical #137 explanation in adopt-chat-id.ts (other sites
reference it); fallback effect computes the set diff once; invalidate callbacks
memoized; redundant invariant tests folded.

Behavior preserved — re-verified live (z.ai glm-5.2): new-chat adopt + 2nd turn
in the same row, no mid-conversation remount, two-tab race leak-free, switch to
an existing chat reseeds full history, reload restores history.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 02:25:52 +03:00
claude_code
7c308728de chore(ai-chat): add stream timing logs + env-gated aiFetch bypass (diagnostics)
The streaming chat turn hangs in all browsers while the non-streaming test
endpoint works — both use the same model/transport (createOpenAI + aiFetch),
so the suspect is the streaming path / custom undici RetryAgent transport.

- ai-http.ts: wrap aiFetch with per-request timing logs (start, ms-to-headers
  on success, elapsed ms + cause on failure). Chat at info, embeddings at
  debug. Only host+path logged.
- ai-chat.controller.ts / ai-chat.service.ts: log turn START, first-chunk
  latency, FINISHED duration, and elapsed ms on disconnect/error/abort.
- ai.service.ts: AI_BYPASS_RESILIENT_FETCH=true makes the CHAT model omit
  fetch:aiFetch and use the default global fetch — isolates transport vs
  request-shape. Chat-only; embeddings/STT untouched; reversible via env.
- .env.example: document the flag.

No timeout/retry change. tsc clean; ai-chat + ai suites pass (292).
2026-06-23 02:13:54 +03:00
claude code agent 227
580f3442b8 fix(ai-chat): prevent duplicate chat row on first-turn error; add adoption tests
Addresses the PR #138 review.

Blocker 1 — duplicate chat row: a brand-new chat whose first turn errors BEFORE
the SSE 'start' chunk never receives the authoritative chatId, so metadata
adoption can't run; a retry then sent chatId:null and the server inserted a
SECOND chat row, orphaning the first turn. Keep metadata adoption as the primary
path (resolveAdoptedChatId) and add a bounded, unambiguous fallback: on a
new-chat finish with no server id, snapshot the known chat ids and, once the
list refetch lands, adopt the SINGLE newly-appeared id (pickNewlyCreatedChatId).
Zero or >1 new ids (e.g. two tabs racing) → no adoption — no items[0] guessing,
so #137 stays fixed. The wait-for-refetch guard compares set membership (robust
to a concurrent delete), and the diff dedupes so a repeated id from a paginated
list never reads as ambiguous.

Blocker 2 — tests: new adopt-chat-id.test.ts covers both pure helpers (adopt
decision + newly-created-id diff incl. dedupe/reorder); the server
messageMetadata callback is extracted to chatStreamStartMetadata and unit-tested
(start -> {chatId}, otherwise undefined).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 01:17:30 +03:00
claude_code
1b4de2b420 fix(ai-chat): keep SSE stream alive in Safari (heartbeat + strip hop-by-hop headers)
Safari/WebKit dropped the AI chat answer stream mid-turn ("Load failed",
shown as "Lost connection to the server") while Chrome/Firefox were fine.
Two Safari-specific causes: (1) during model think/tool gaps the UI-message
SSE stream emits no bytes and WebKit aborts a non-progressing fetch far more
aggressively than Chrome; (2) the AI SDK sets a hop-by-hop `Connection:
keep-alive` header which is illegal on HTTP/2 — Chrome/Firefox ignore it,
Safari rejects the whole response. Earlier commits only improved the error
text, never the drop itself.

Add apps/server/src/core/ai-chat/sse-resilience.ts with two helpers wired into
both stream paths (authenticated + public share):
- startSseHeartbeat: writes a `: ping` SSE comment every 15s (ignored by the
  client's EventSourceParserStream) so bytes keep flowing; unref'd timer,
  guarded writes, auto-clear on finish/close.
- stripStreamingHopByHopHeaders: wraps writeHead once to drop Connection/
  Keep-Alive before the head is sent, so they can never leak into an HTTP/2
  response.
Add sse-resilience.spec.ts (7 tests). tsc + eslint clean.
2026-06-23 01:02:55 +03:00
claude code agent 227
1858a5800d fix(ai-chat): adopt the server-returned chat id, not the newest in the list
A brand-new chat (activeChatId === null) had no way to learn the id of the row
the server created: the SSE stream never returned it, so the client adopted the
NEWEST chat in the per-user list (chats.items[0]). With two tabs open, a second
tab creating a chat at ~the same time made its row the newest, so the first tab
adopted the wrong id — its later turns persisted into the other chat and the
agent rebuilt history from it (commands leaked between chats), while the live UI
still showed the original conversation. (#137)

The server now attaches the authoritative chatId to the streamed assistant
message via the AI SDK messageMetadata on the 'start' part, so it reaches the
client on the first chunk. The client reads message.metadata.chatId in useChat's
onFinish and adopts that id in place (no remount, so the live turn and the
thread's chatIdRef follow the real id and the next turn targets the right chat).
The chats.items[0] guess and the adoptNewChat ref are removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 23:46:50 +03:00
claude_code
fc262636ab fix(ai-chat): persist partial answer when a turn errors mid-stream
A provider error (e.g. read ECONNRESET) routed the turn through the
streamText onError callback, which persisted an EMPTY assistant record
(buildErrorAssistantRecord -> text:'', parts:[]). The answer text already
streamed to and shown by the client was therefore lost from the persisted
row, the chat export, and reopened history — leaving only the error line.

The AI SDK v6 onError callback receives only { error } (no steps/text),
and the visible final answer streams in the last, not-yet-finished step,
so it is absent from every finished step.text. Accumulate it ourselves:
onChunk folds each 'text-delta' into inProgressText; onStepFinish moves a
finished step into capturedSteps and resets inProgressText. onError and
onAbort now persist the partial answer (finished steps' text + tool parts
via assistantParts, then the in-progress text appended last) through a new
shared pure helper buildPartialAssistantRecord, recording the cause in
metadata.error on the error path. Replaces buildErrorAssistantRecord; its
empty-turn shape is preserved when nothing streamed.

Complementary to the resilient-fetch reconnect: that reduces how often a
turn dies; this preserves what was produced when it dies anyway.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 20:30:59 +03:00
claude_code
7ce1a24f82 feat(ai-chat): show creation time and origin document in chat list
Each chat row in the AI-chat history now shows a dimmed second line with
how long ago the chat was created and the document it was created in
("N ago / <document>", or "No document" when started outside a page).

Server:
- New migration: nullable ai_chats.page_id (FK pages.id, ON DELETE SET NULL).
- Capture the origin page at chat creation from the client-supplied openPage,
  but validate it first: it must be a real page in the same workspace that the
  user may read (PageAccessService.validateCanView), else null. This keeps the
  "openPage.id is attacker-controllable but harmless" invariant - preventing a
  cross-workspace/cross-space page-title leak and a post-hijack FK crash.
- findByCreator left-joins pages (scoped by workspace, defense-in-depth) and
  returns pageTitle.

Client:
- IAiChat gains pageId/pageTitle; ConversationList renders a ChatMetaLine
  (useTimeAgo + origin document) as a dimmed second line.
- Add i18n key "No document" (en-US, ru-RU).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 16:16:26 +03:00
claude_code
3c3fb0816a Merge fix/embed-indexer-failfast-auth: fail-fast embeddings reindex on fatal provider errors 2026-06-22 03:46:24 +03:00
claude_code
f543e79c3e fix(ai-embedding): abort bulk reindex on fatal provider errors
reindexWorkspace isolated every per-page failure, so an invalid/missing
API key (401 "User not found") made all pages fail identically while the
batch kept issuing hundreds of doomed requests against the provider.

Add isFatalProviderError() (401/403 auth, 402 billing) and abort the
whole batch on such errors; 429 rate-limit and embedding timeouts stay
per-page isolated. Adds unit tests for the predicate and a regression
test for the abort/iterate control flow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 03:46:17 +03:00
claude_code
1c9785997a fix(ai-chat): surface dropped-stream errors clearly + log client disconnects
A mid-stream connection drop showed a generic "Something went wrong / Load
failed" banner and left no server-side trace.

- error-message: classify the browsers' own fetch-failure strings ("Load
  failed" on WebKit, "Failed to fetch" on Chrome, "NetworkError" on Firefox)
  as a lost connection, so the banner names the cause instead of the generic
  heading.
- ai-chat.controller: log a warning in the request close handler when the
  client disconnects before completion, so a drop that reaches the app (e.g. a
  reverse proxy cutting the SSE) is visible in the server logs before the abort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 03:44:25 +03:00
claude_code
4201f0a313 feat(comments): make AI comments inline-only with robust anchoring
The in-app AI chat hardcoded type='page' and the shared createComment
swallowed anchoring failures silently, so agent comments never got a
text anchor/highlight.

- Forbid page-type comments for the agent: top-level comments are always
  inline and require an exact `selection`; replies inherit the parent
  anchor (stored as the historical `page` type).
- Throw and roll back the just-created comment when the selection cannot
  be anchored, instead of leaving an orphan unanchored comment.
- Add comment-anchor module: text normalization (smart quotes, dashes,
  nbsp, collapsed whitespace) and matching across adjacent text nodes
  within a block, so selections crossing inline-code/bold/link anchor.
- Update create_comment (MCP) and createComment (ai-chat) tool schemas
  and descriptions; add unit + mock-HTTP orchestration tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 23:06:49 +03:00
claude_code
9a9b61b9a3 feat(ai-chat): log aborted stream turns in onAbort
The onAbort terminal path persisted the partial turn but wrote nothing
to the log, so a turn killed by a client disconnect / proxy drop / stop()
was invisible in the logs (unlike onError and the controller catch, which
both log). Add a logger.warn with the chat id, completed step count and
partial-text length so an aborted turn is traceable.
2026-06-21 21:21:48 +03:00
claude_code
4f8015b342 Merge branch 'develop' into test/coverage-refactor 2026-06-21 19:12:13 +03:00
claude_code
3d4ad664b3 test(refactor-tail): extract pure cores + cover collab/share/ai-chat/client gate
Batches 6-9: behaviour-preserving extractions of testable pure cores plus the
tests they unblock, and a fix for the broken client test environment.
Full suites green: server 113 suites / 1117 + 1 todo, client 30 files / 338.

client (R0 infra):
- vitest.setup.ts: in-memory localStorage/sessionStorage Storage stub wired via
  setupFiles. Unblocks menu-items.gating.test.ts (was 9 failing) -> client suite
  fully green. + menu-items.suggestions.test.ts (getSuggestionItems filter/sort).

share:
- extract buildShareMetaHtml (share-seo.util.ts) from the SEO controller; tests
  for reflected-XSS escaping in <title>/og/twitter meta, noindex, truncation;
  extractPageSlugId; updateAttachmentAttr; prepareContentForShare comment-strip
  (anonymous-viewer metadata-leak guard).

ai-chat (security extractions):
- selectAccessibleHits: CASL post-filter for semantic search (restricted page in
  an accessible space must NOT leak to the agent).
- validateResolvedAddresses: SSRF connect-time guard (block if ANY resolved
  address is private).
- resolveAudioFormat: mime whitelist (dead `?? 'webm'` fallback dropped, set
  unchanged). + mcp-servers toView header-leak guard, MCP tool namespacing.

collaboration (data-loss area):
- extract computeHistoryJob (pins the "agent delay MUST stay 0" invariant) and
  resolveSource. Integration: onAuthenticate read-only matrix (collab auth
  bypass), HistoryProcessor (contributor restore on save failure), onStoreDocument
  Approach-A boundary snapshot (human revision pinned before agent overwrite).

Reviewed (APPROVE WITH SUGGESTIONS): extractions behaviour-preserving, security
tests mutation-resistant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:10:27 +03:00
claude_code
f3fa15e746 refactor(ai-chat): shared tool-spec registry for identical tools; formalize integration db factory
Implements two architecture follow-ups from the multi-aspect review.

1. Shared, zod-agnostic tool-spec registry (packages/mcp/src/tool-specs.ts)
   for the 14 AI tools whose name + schema + model-facing description are
   genuinely identical across the standalone MCP server and the in-app
   AI-SDK chat. Both layers consume it (registerShared in index.ts;
   sharedTool in ai-chat-tools.service.ts) and keep their own execute/auth.
   - Zod-agnostic builders (z) => ZodRawShape bridge the zod v3 (mcp) vs
     zod v4 (server) split; the registry imports no zod.
   - Folds in the documented edit_page_text drift-bug fix: the stale
     "strip-and-retry tolerated" claim is gone; canonical wording states a
     formatting-only change is refused into failed[].
   - Sibling-tool references in shared descriptions are transport-neutral so
     one description is correct for both snake_case (MCP) and camelCase
     (in-app) tool names.
   - Loader fail-fast guard for a stale @docmost/mcp build.
   - The ~17 intentionally-divergent tools (security guardrails, tuned UX)
     stay per-layer, untouched.
   - Rebuilt committed mcp artifacts (also regenerates a previously stale
     build/lib/docmost-schema.js to match its already-committed source).

2. Formalize apps/server/test/integration/db.ts as the canonical
   integration-test seed factory (module doc + a shortId helper); the
   hand-written minimal seeders are kept on purpose, decoupled from the
   app service-layer side effects.

Verified: server tsc + lint clean, mcp build clean; mcp unit tests 261 pass,
ai-chat-tools.service 16 pass, public-share-chat-tools 8 pass, ai-chat suite
224 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 18:57:00 +03:00
claude_code
d4658d4cb3 Merge pull request '#114 refactor(ai-chat): shared parseNodeArg helper; keep duplication backlog doc' (#114) from refactor/ai-chat-tool-spec-registry into develop
# Conflicts:
#	apps/server/src/core/ai-chat/tools/ai-chat-tools.service.ts
2026-06-21 14:45:20 +03:00
claude_code
4105836a2d Merge pull request '#112 test(ai-chat): current-page coverage + getCurrentPage helper' (#112) from feat/ai-chat-current-page-robustness into develop 2026-06-21 14:31:12 +03:00
claude_code
c7f0b51389 fix(ai-chat): keep tool-duplication backlog doc; fix parseNodeArg comment
Pre-merge review follow-up for the parseNodeArg dedupe (PR #114):
- Restore docs/backlog/ai-chat-tool-definitions-duplicated.md instead of
  deleting it: it still tracks open debt (unified spec registry + ProseMirror
  <-> Markdown converter unification) that this branch defers, and
  docs/git-sync-plan.md links to its converter section. Mark the node-arg
  quirk as done and add a Progress section.
- Reword the in-app helper header from "byte-for-byte" to "behaviorally
  identical": the two copies differ in comments/quote style; only the logic,
  throw messages and branch order match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 14:30:37 +03:00
claude_code
6397b500ba fix(share-ai): lower default per-workspace cap to 100 (#62)
The fail-closed limiter behavior (#62 primary item) already shipped; this
finishes the issue by lowering the default hourly per-workspace cap from 300
to 100 to better fit real anonymous-assistant load. Still overridable via
SHARE_AI_WORKSPACE_MAX_PER_HOUR.

- public-share-workspace-limiter.ts: SHARE_AI_WORKSPACE_MAX_PER_WINDOW 300 -> 100.
- .env.example: documented default + example value 300 -> 100.
- public-share-chat.spec.ts: update the default-cap assertion to 100.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 14:24:18 +03:00
claude code agent 227
f9757fda12 refactor(ai-chat): dedupe node-arg JSON normalization into a shared helper
First, safe step of docs/backlog/ai-chat-tool-definitions-duplicated.md: the
"node may be a JSON object OR a JSON string" quirk was hand-copied at 6 tool
sites. Extract it into a single parseNodeArg() helper per package and call it at
every site. Behavior-preserving — each site's throw message is byte-identical
(patch/insert: 'node was a string but not valid JSON'; update_page_json: 'content
was a string but not valid JSON'); no tool name/description/schema changed.

Two helper copies (packages/mcp/src/lib/parse-node-arg.ts and
apps/server/src/core/ai-chat/tools/parse-node-arg.ts) are intentional: the
ESM-only @docmost/mcp cannot be imported by the CommonJS server (it is loaded at
runtime via the Function('import()') trick), so runtime code cannot cross that
boundary by a normal import. Each copy is now the single source within its
package (6 inline copies -> 2 helpers). packages/mcp/build rebuilt in sync.

Tests: parse-node-arg.spec.ts (server, Jest) + parse-node-arg.test.mjs (mcp,
node:test) — object passthrough, valid-string parse, invalid-string throw with
the right message. Server tsc clean; mcp suite 254 pass; agent structural-edit
path verified live in-browser (agent inserted a node, persisted to the doc).

Deferred (documented for the record, since the backlog doc is removed with this
commit): the FULL transport-agnostic tool-spec registry (one name+schema+
description per tool shared by both transports) and deriving DocmostClientLike
from the real client type. Both are blocked by the current architecture, not by
effort: (1) @docmost/mcp ships no type declarations and is ESM-only, so a
type-only derivation needs declaration emission + tsconfig path wiring, and the
real client's precise return types break the in-app tool test stubs (attempted,
reverted to keep tsc green); (2) the two transports intentionally DIVERGE in tool
NAMES (snake_case x38 vs camelCase x41), membership (in-app adds getCurrentPage/
listSidebarPages, omits delete_comment/image tools) and model-facing
DESCRIPTIONS, so a unified registry would change behavior on BOTH the agent and
external MCP clients and needs its own verification pass. This is forward-looking
debt (the code is correct today), to be done incrementally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 06:51:09 +03:00
claude code agent 227
e6b1170553 test(ai-chat): cover current-page injection; extract resolveCurrentPageResult
The 'current page' feature (client useMatch openPage + server getCurrentPage
tool + system-prompt injection) was already implemented & merged; this backfills
its missing test coverage and removes the completed backlog doc.

- extract pure resolveCurrentPageResult(openedPage) into current-page.util.ts
  (byte-identical to the prior inline getCurrentPage tool body) so it is
  unit-testable without the dynamically-imported ESM Docmost client; the tool
  now delegates to it.
- current-page.util.spec.ts: 7 cases (null/undefined/no-id/empty-id/full/no-title).
- ai-chat.prompt.spec.ts: +8 cases for the openedPage context line (title+pageId
  present, Untitled fallback for blank/whitespace title, no line when absent/blank
  id, and sandwich ordering before the trailing safety block).

Verified live in-browser: client sends openPage{id,title} on a page and null
off-page; the agent invokes getCurrentPage and answers with the real title+id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 06:21:38 +03:00
claude code agent 227
33c52045a2 test(share-ai): drive the non-text message-part 400 path (#103)
Covers the #63 guard: a message with a non-text part -> 400 'Unsupported message
content'; a message mixing text + a non-text part still 400s (before the 413
size check).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 05:52:15 +03:00
claude code agent 227
455a554054 Merge remote-tracking branch 'gitea/fix/review-batch-2' into fix/review-batch-2
# Conflicts:
#	.env.example
#	README.ru.md
2026-06-21 05:34:17 +03:00
claude code agent 227
7e26239c3f Merge remote-tracking branch 'gitea/develop' into fix/review-batch-2
# Conflicts:
#	AGENTS.md
#	CHANGELOG.md
#	README.md
#	apps/server/src/collaboration/collaboration.handler.ts
#	apps/server/src/common/helpers/prosemirror/html-embed.spec.ts
#	apps/server/src/common/helpers/prosemirror/html-embed.util.ts
#	apps/server/src/core/ai-chat/public-share-chat.service.ts
#	apps/server/src/core/ai-chat/public-share-chat.spec.ts
#	apps/server/src/core/ai-chat/public-share-workspace-limiter.ts
#	apps/server/src/core/page/services/page.service.ts
#	apps/server/src/core/page/transclusion/transclusion.service.ts
#	apps/server/src/integrations/import/services/file-import-task.service.ts
#	apps/server/src/integrations/import/services/import.service.ts
2026-06-21 05:32:44 +03:00
claude_code
bc0c49db05 fix(review): address PR #101 review findings (dead DI, docs)
Some checks failed
Test / test (pull_request) Has been cancelled
- ai-chat: drop the unused pagePermissionRepo injection from
  PublicShareChatToolsService (its only use moved into
  ShareService.resolveReadableSharePage); update all 5 positional
  test construction sites to match the 3-arg constructor.
- env: correct the anonymous share-AI per-workspace cap comment —
  the limiter FAILS CLOSED on Redis failure (#62), not open.
- docs: sync README.ru.md with README.md — move "Page templates"
  from Planned to Done and drop the dead plan-doc link.

Remaining test-coverage gaps tracked as #102, #103, #104, #105, #106.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 05:24:13 +03:00
claude code agent 227
3f46496192 refactor(share): single resolveReadableSharePage for the share access boundary (#92)
The '(shareId,pageId) -> usable non-restricted page in THIS share' boundary was
written as 3 must-be-identical async sequences. They weren't: the chat funnel
omitted an explicit page.deletedAt check (latently safe via getShareForPage's
CTE) and layered isSharingAllowed separately. Add ShareService.resolveReadable-
SharePage(shareId,pageId,workspaceId) running the single canonical sequence
(getShareForPage -> id match (skipped when null) -> findById -> !deletedAt ->
!hasRestrictedAncestor) returning {share,page}|null; getSharedPage, the funnel,
and the getSharePage tool all use it. hasRestrictedAncestor now lives in the one
method no caller can skip; the funnel still returns uniform 404s and keeps
isSharingAllowed. Adds a direct security-invariant test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 04:04:09 +03:00
claude code agent 227
3953ecdb17 refactor(ai-chat): single live+enabled role resolve in the repo (#95)
resolveRoleForRequest and resolveShareRole duplicated the security invariant
'role exists, not soft-deleted, enabled, workspace-scoped, else null'. Move it to
AiAgentRoleRepo.findLiveEnabled(id, workspaceId) (deletedAt IS NULL + enabled +
workspace scope) and have both services call it, preserving each one's roleId
derivation + null handling. (describeProviderError half of #95 was done earlier.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:49:52 +03:00
claude code agent 227
c486750b2a test-infra: re-enable 16 disabled server suites (jest DI + lib0 ESM) (#56)
16 suites were disabled via testPathIgnorePatterns due to two root causes: lib0
ESM not transformed (the @hocuspocus/server -> lib0/decoding.js chain) and stock
'should be defined' specs built via Test.createTestingModule without providers.
Add lib0 to transformIgnorePatterns; convert the 14 DI placeholders to direct
new X(...) instantiation with stub deps (keeping a real construct smoke test);
re-enable the suites. Also updates the public-share limiter test to assert the
fail-closed behavior from #62 (surfaced now that the suite runs). Full server
suite: 67 passed, 689 tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:40:40 +03:00
claude code agent 227
b597841cf0 test(ai-roles): cover update() happy-path return shape (#88)
The concurrent-soft-delete guard was already covered; add the missing assertion
that update() returns toView(updated) from the post-update re-fetch (full
AgentRoleView shape, distinct second findById row), so a regression returning the
stale pre-update view or leaking columns is caught.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:28:58 +03:00
claude code agent 227
317fdb9424 test(public-share): cover getSharePage positive + soft-deleted branches (#85)
The anonymous share page-fetch tool's positive branch (sanitize via
updatePublicAttachments then jsonToMarkdown before returning to the model) was
untested, so a dropped/reordered sanitizer would ship a comment-mark/raw-
attachment leak with green tests. Add a positive-branch test pinning the
sanitizer call + that markdown derives from sanitized content, and a soft-deleted
test asserting a generic error with no content fetch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:28:58 +03:00
claude code agent 227
342bb47b30 fix(ai-roles): validate chatModel + guard driver-enum drift (#52)
chatModel was a free string accepted with empty/garbage values, failing only at
runtime as a provider 503; tighten it (trim + non-empty + max 200). Driver was
already @IsIn(AI_DRIVERS). Collapse the client driver list to one AI_DRIVER_VALUES
source and add a contract test that reads the server AI_DRIVERS and fails on
client/server drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:28:58 +03:00
claude code agent 227
099d31f594 fix(ai): sandwich SAFETY_FRAMEWORK around the role persona (#68)
A custom AI-role's text preceded the only SAFETY_FRAMEWORK block and replaced
the persona, so a jailbreak in the role text sat before the safety rules.
buildSystemPrompt now emits SAFETY both before AND after the persona, with the
role/persona delimited as lower-trust (<role_persona note=...>); the default
persona is sandwiched too. Context (currently-viewing-page) preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:17:37 +03:00
claude code agent 227
05a7a4001f fix(share-ai): cap per-request output + unify provider errors (#60, #95)
#60: streamText had no maxOutputTokens, so one anonymous request could run up
the provider bill. Add maxOutputTokens (env SHARE_AI_MAX_OUTPUT_TOKENS, default
512) via resolveShareAiMaxOutputTokens().
#95: the anonymous path hand-built error strings, diverging from the unified
describeProviderError format used on the authenticated path; both onError blocks
now call describeProviderError so a share reader sees 402/429/503 causes in the
same form (and the stack is still logged). Both changes are in this one file and
share hunks, hence one commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:08:34 +03:00
claude code agent 227
2b4ec0bfcc fix(share-ai): reject non-text message parts to close size-cap bypass (#63)
MAX_SHARE_MESSAGE_CHARS only counted text parts, so a forged non-text part
(tool-result/file/data) bypassed the cap and bloated the model input
(token-DoS); convertToModelMessages would also expand a forged tool-result. The
anonymous path runs no tools, so a client non-text part is never legitimate —
reject any message with a non-text part (isTextUIPart) before the size check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:07:53 +03:00
claude code agent 227
e19849d980 fix(share-ai): fail-closed workspace limiter on Redis failure (#62)
The per-workspace anonymous share-AI cost cap failed OPEN on a Redis error
(return true => admit), so a Redis outage removed the cap entirely (unmetered
billable anonymous calls). The feature is optional, so unavailability is
harmless: fail CLOSED (return false => controller 429s) instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 03:07:53 +03:00
claude_code
64818cf9df Merge branch 'feat/share-ai-cost-guards' into develop 2026-06-21 02:21:04 +03:00
claude_code
262a0707d9 feat(share-ai): cap per-request output tokens and fail closed on Redis loss
Harden the anonymous public-share AI assistant against token-cost abuse
before exposing it to the internet:

- Add an env-tunable per-request output ceiling (maxOutputTokens) to the
  public-share streamText call so one anonymous request cannot run up the
  provider bill even if the per-IP throttle is evaded. New
  resolveShareAiMaxOutputTokens() / SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT
  (env SHARE_AI_MAX_OUTPUT_TOKENS, default 512), mirroring
  resolveShareAiWorkspaceMax().
- Flip the per-workspace cost limiter to FAIL CLOSED on Redis failure
  (was fail-open): if Redis is unavailable we cannot prove the workspace is
  under its cap, so deny rather than admit an unmetered, billable call.
- Update the limiter spec (fail-open -> fail-closed) and add resolver tests;
  document both knobs in .env.example.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 02:15:54 +03:00
claude_code
3695dbdf7f Merge remote-tracking branch 'gitea/develop' into fix/ai-chat-current-page 2026-06-21 01:29:37 +03:00
claude_code
90d3fab483 test: cover features since 053a9c0d + repair test tooling
Add ~330 tests across server (Jest), client (Vitest), editor-ext (Vitest)
and packages/mcp (node:test) for the gitmost features added since
053a9c0d: AI chat, AI agent roles, public-share assistant, MCP per-user
auth, HTML embed, page templates/embed, realtime tree, tree
expand/collapse, and the AI-settings UI.

Test-tooling fixes (prerequisite, were silently hiding coverage):
- Repair 3 page-template specs broken by the 11-arg TransclusionService
  constructor; they never compiled, so template access-control / content
  -leak / unsync-strip coverage was fictitious.
- Build @docmost/editor-ext before server tests via a `pretest` hook;
  the stale dist omitted the new HtmlEmbed/PageEmbed exports (TS2305).
- Let jest resolve the .tsx email templates: add `tsx` to
  moduleFileExtensions and widen the ts-jest transform to (t|j)sx?.

Behaviour-preserving "extract pure core" refactors that the tests drive:
- server: resolveShareAssistantRequest + uiMessageTextLength
  (public-share controller), decideBasicGate + mapAuthResultToResponse
  (mcp), buildErrorAssistantRecord (ai-chat), jsonbObject export (roles).
- client: render-raw-html + shouldExecute/canEdit, decide-embed-state,
  page-embed picker utils, tree-socket reducers, open/close branch maps,
  isEndpointConfigured/resolveKeyField; buildTreeWithChildren now treats
  a permission-trimmed orphan as a root instead of crashing.

Deferred (need a test DB or HTTP harness, documented in the specs):
repo-level Postgres integration tests and the public-share XFF E2E.
Pre-existing DI/lib0-ESM suite failures are untouched and out of scope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 23:40:40 +03:00
claude code agent 227
a6ba19f0dc feat(ai-chat): add get_current_page tool for proxy-robust page context (#43, hardness #2)
The current page id was only injected as text in the system prompt, which a
proxy (CLIProxyAPI) can rewrite/truncate, so the agent could lose track of 'this
page'. Add a getCurrentPage tool the model can call to read the open page (id +
title) from the server-side request context (forUser now takes openedPage,
threaded from body.openPage — the same value used for the system prompt). The
inline system-prompt line is kept as belt-and-suspenders. Reads/writes still go
through the CASL-enforced page tools by id, so this is strictly not worse than
the existing prompt hint — just delivered over a channel the proxy can't mangle.

User-approved on the issue. Completes #43 together with the hardness-1 fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 22:19:40 +03:00
claude_code
4fe42ead56 feat(public-share): selectable agent-role identity + fix floating-icon overlap
Anonymous public-share AI assistant:
- Add a workspace setting `publicShareAssistantRoleId` so an admin can pick which
  agent role (identity/persona) the anonymous assistant adopts. The role's
  instructions REPLACE the built-in persona while the immutable safety framework
  is still always appended; the role's optional model override takes precedence
  over the cheap publicShareChatModel. Resolved server-authoritatively
  (workspace-scoped, soft-delete aware; disabled/missing roles fall back to the
  built-in persona, so the tool scope remains the real security boundary).
- Plumb the field through the update DTO, ai-settings service, the workspace.repo
  ALLOWED whitelist, resolve()/getMasked(), stream-time role resolution and the
  prompt/model, plus the settings UI: a new "Assistant identity" Select listing
  enabled roles (and surfacing a saved-but-disabled role explicitly).

Public-share branding / floating icon:
- Fix the AI assistant FAB overlapping the "Powered by ..." button (both were
  Affixed bottom-right): stack the FAB above the bottom-right branding.
- Rename "Powered by Docmost" -> "Powered by Gitmost" and point the link at the
  gitmost repo.

Tests: extend public-share-chat.spec (role persona replacement still appends the
safety framework, resolveShareRole edge cases, model-override precedence).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 19:54:45 +03:00