Implements all reviewer comments (code-review, red-team, and test-strategy
audit), accepting the recommended variants.
Server — realtime service (ai-realtime.service.ts):
- SSRF: pin the validated IP via a WebSocket `lookup` hook that re-checks every
resolved address with isIpAllowed (mirrors external-mcp buildPinnedDispatcher),
closing the TOCTOU/DNS-rebinding window; fix the misleading comment.
- no-silent-loss: on Stop, drain the in-flight segment (bounded 2.5s) and deliver
the final via onFinal before closing instead of dropping the tail.
- fail-closed deriveRealtimeUrl: a non-empty unparseable base now THROWS (no
silent api.openai.com fallback that would leak a self-hosted key); http:// and
ws:// bases are rejected (plaintext key). Path normalization preserved.
- parseUpstreamEvent keys the accumulator by item_id+content_index so GA segments
don't concatenate.
- inject a wsFactory seam for testing; also fix a latent bug — `import WebSocket
from 'ws'` resolved to undefined at runtime (no esModuleInterop) -> import=require.
- unref idle/max/drain timers.
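
A minimal sketch of the accumulator keying described above, not the actual parseUpstreamEvent code. The event shapes, the `TranscriptAccumulator` class, and the `interim`/`final` result names are assumptions made for illustration; only the idea of keying partial text by item_id plus content_index comes from this change.

```typescript
// Hypothetical, simplified upstream events: only the fields used here are modeled.
type UpstreamEvent =
  | { type: "delta"; item_id: string; content_index: number; delta: string }
  | { type: "completed"; item_id: string; content_index: number; transcript?: string };

type Normalized = { kind: "interim" | "final"; text: string };

class TranscriptAccumulator {
  // One buffer per (item_id, content_index) pair, so segments never share text.
  private readonly buffers = new Map<string, string>();

  private static key(itemId: string, contentIndex: number): string {
    return `${itemId}:${contentIndex}`;
  }

  handle(ev: UpstreamEvent): Normalized {
    const k = TranscriptAccumulator.key(ev.item_id, ev.content_index);
    if (ev.type === "delta") {
      const text = (this.buffers.get(k) ?? "") + ev.delta;
      this.buffers.set(k, text);
      return { kind: "interim", text };
    }
    // Completed: prefer the transcript carried by the event, else the buffer.
    const text = ev.transcript ?? this.buffers.get(k) ?? "";
    this.buffers.delete(k); // release the finished segment's buffer
    return { kind: "final", text };
  }
}
```

With a single shared buffer, a delta for segment "b" arriving while segment "a" is still open would be appended to "a"; keying by the pair keeps them apart.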
Server — realtime gateway (ai-realtime.gateway.ts, session-limits.ts):
- reject revoked/disabled users and inactive sessions (mirror jwt.strategy:
findById+isUserDisabled + findActiveById) with NO counter increment.
- CSWSH: Origin allowlist (matching APP_URL, or no Origin for native clients)
before auth, no increment.
- extract SessionCounters (delete-at-zero, never negative) + pure canConnect
(both caps checked with >= before any increment); document the per-process/
in-memory cap caveat (single-replica only).
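
The counter and cap rules above could look roughly like the following sketch. The method names, the `Caps` shape, the key prefixes, and the `tryAcquire` helper are assumptions for illustration, not the contents of session-limits.ts.

```typescript
class SessionCounters {
  private readonly counts = new Map<string, number>();

  get(key: string): number {
    return this.counts.get(key) ?? 0;
  }

  increment(key: string): number {
    const next = this.get(key) + 1;
    this.counts.set(key, next);
    return next;
  }

  // Never goes negative; the entry is deleted once it reaches zero.
  decrement(key: string): number {
    const next = this.get(key) - 1;
    if (next <= 0) {
      this.counts.delete(key);
      return 0;
    }
    this.counts.set(key, next);
    return next;
  }

  get size(): number {
    return this.counts.size;
  }
}

interface Caps {
  perUser: number;
  perWorkspace: number;
}

// Pure check: a connection is refused when either count is already >= its cap.
function canConnect(userCount: number, workspaceCount: number, caps: Caps): boolean {
  return userCount < caps.perUser && workspaceCount < caps.perWorkspace;
}

// Hypothetical helper: both caps are checked before any counter is touched.
function tryAcquire(c: SessionCounters, userId: string, workspaceId: string, caps: Caps): boolean {
  const userKey = `u:${userId}`;
  const wsKey = `w:${workspaceId}`;
  if (!canConnect(c.get(userKey), c.get(wsKey), caps)) return false;
  c.increment(userKey);
  c.increment(wsKey);
  return true;
}
```

Because the map lives in process memory, two replicas would each enforce the caps separately, which is the single-replica caveat noted above.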
Client:
- dictation-group: realtime final now inserts at the captured rangeRef SNAPSHOT
(not the live caret) and guards editor.isEditable; single-space separator.
- use-realtime-dictation/realtime-dictation-client: stop-during-acquisition tears
down the mic (no leak / button reset); reconnect re-emits start (double-start
guarded); interim ghost cleared on teardown; io() options de-duplicated.
- pcm16-worklet: flush the partial sub-frame tail on stop; one-pole anti-aliasing
low-pass before 48k->24k.
- extract shared mic-capture (acquireMicStream/mapGetUserMediaError, used by batch
+ realtime), pure DSP (pcm16-dsp.ts), and the session reducer/baseLanguageSubtag;
extract applyInterimMeta/clampRange/resolveUrl/appendFinalToDraft.
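
As an illustration of the worklet DSP items above, here is a sketch of a one-pole low-pass followed by 2:1 decimation (48 kHz to 24 kHz) and PCM16 conversion. The function names and the `alpha` coefficient are assumptions, not the code in pcm16-dsp.ts; a one-pole filter only attenuates content above the new Nyquist frequency, it does not remove it.

```typescript
// Returns a stateful function so filter state and decimation phase carry
// across chunks (no discontinuity at chunk boundaries).
function makeDownsampler48to24(alpha = 0.5): (input: Float32Array) => Float32Array {
  let y = 0; // one-pole filter state
  let phase = 0; // 0 = keep this sample, 1 = drop it

  return (input: Float32Array): Float32Array => {
    const out = new Float32Array(Math.floor((input.length + 1 - phase) / 2));
    let o = 0;
    for (let i = 0; i < input.length; i++) {
      y += alpha * (input[i] - y); // y[n] = y[n-1] + alpha * (x[n] - y[n-1])
      if (phase === 0) out[o++] = y;
      phase ^= 1;
    }
    return out;
  };
}

// Clamp to [-1, 1] and scale to signed 16-bit integers.
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}
```

Feeding the same samples in one chunk or split across several chunks yields the same output, which is the property a resampler-continuity test would check.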
Tests + infra: +~150 server tests (deriveRealtimeUrl, parseUpstreamEvent branches,
openSession/lifecycle/timers/testConnection via fake ws, gateway auth/caps/no-leak,
realtime-test admin contract, AiSettings update/resolve, DTO boolean, SSRF deny)
and +~140 client tests (DSP property/edge, resampler continuity, framing, reducer,
mic-capture, RealtimeDictationClient/MicButton, ProseMirror interim regression +
history guards, appendFinalToDraft, resolveKeyField, route contract). Added
@vitest/coverage-v8. CHANGELOG [Unreleased] entry incl. the single-replica caveat.
Review: APPROVE WITH SUGGESTIONS (no critical/regression); applied the drain-timer
unref. Server tsc clean + 358 tests; client tsc clean + 201 tests; vite build ok.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Layer an optional realtime speech-to-text path on top of the existing
batch dictation, so transcribed text appears as the user speaks.
Transport A2: browser <-> our server (Socket.IO `/ai-realtime`) <->
OpenAI Realtime (raw ws). The provider API key never leaves the server;
the upstream URL is SSRF-checked before connecting; the gateway enforces
the dictation+dictationRealtime gate, cookie-JWT auth and per-user/
per-workspace concurrency caps. Implemented against the GA (2026) OpenAI
Realtime transcription contract (session.update / audio.input.format /
server_vad), not the now-removed beta shape.
Editor UI B2: interim text is shown as a meta-only ProseMirror ghost
decoration (no Yjs/history noise); only completed segments are committed.
Chat shows interim as a dimmed tail. The mic button switches realtime vs
batch by the workspace flag; batch remains the default and fallback.
Server:
- AiRealtimeService (upstream ws proxy, normalized events, idle/max-
duration timeouts, idempotent teardown) + parseUpstreamEvent unit tests
- AiRealtimeGateway (Socket.IO `/ai-realtime`) wired into AiChatModule
- admin-gated POST /ai-chat/realtime/test connectivity probe
- config: settings.ai.dictationRealtime + provider sttRealtimeModel/
sttRealtimeBaseUrl (realtime key reuses sttApiKey; no new secret)
Client:
- pcm16 AudioWorklet (24kHz mono PCM16), RealtimeDictationClient,
use-realtime-dictation hook (status/start/stop/cancel + onInterim/onFinal)
- RealtimeMicButton + dictation-interim ProseMirror decoration
- editor/chat integration + AI settings UI (toggle, model, test endpoint)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Merge the comments side-panel header into the Open/Resolved tab row to
save vertical space: title on the left, tabs centered, close button on
the right.
- comment-list-with-tabs: add optional `title`/`onClose` props; render
the title and close button as absolutely-positioned overlays around a
full-width centered Tabs.List. Keeping them outside Tabs.List preserves
the tablist ARIA contract (only role="tab" children) while the tab
list's full-width bottom border line is retained.
- aside: pass `title`/`onClose` to CommentListWithTabs for the comments
tab and drop the shared header for that tab; the toc/details tabs keep
their existing shared header and scroll area unchanged.
- Remove the large active-space name header in the space sidebar;
the active space stays highlighted in the spaces grid below.
- Move "Space settings" into the user avatar (top) menu next to
"Workspace settings"; it shows only while viewing a space and is
detected via useMatch("/s/:spaceSlug/*").
- Make the brand logo non-selectable/non-draggable (user-select:none
on .brand, draggable=false on the img).
- Remove the redundant "Home" button next to the logo (the logo
already links to /home).
- Remove the version label under the Settings sidebar menu.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The live mic-level halo around the stop button was frozen at a constant
scale (1.15) whenever the OS "Reduce motion" setting was on, so it never
reacted to the voice while dictating. Make haloScale unconditional so it
always follows audioLevel (amplitude 0.9), and drop the now-unused
useReducedMotion import and reduceMotion local.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Widen the comments/aside panel from 350 to 420 (~20% wider)
- Remove double padding around the panel: AppShell.Aside p="md"->"sm"
and inner Box p="md"->p={0}; reduce header-to-tabs gap mb="md"->"sm"
- Reduce empty space below the add-comment input (paddingBottom 25->10),
align the avatar with the input box (marginTop 10->2) and re-anchor the
send button (bottom 30->15)
- Pull the timestamp closer to the nickname via tighter line-height
(lh 1.2 on the name, 1.1 on the "… ago" text)
Make AI-created comments inline-only and reliably anchored: forbid
page-type comments for the agent, throw + roll back when a selection
cannot be anchored, and add robust text matching (normalization +
cross-text-node anchoring within a block).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "Thinking…" indicator's bounce was fully disabled by the
prefers-reduced-motion rule (animation: none), leaving the dots
frozen for users with "Reduce motion" enabled. Drive the bounce
height with a --bounce custom property: -6px by default and a
smaller -3px under reduced-motion, so the indicator stays visibly
active everywhere instead of freezing.
The in-app AI chat hardcoded type='page' and the shared createComment
swallowed anchoring failures silently, so agent comments never got a
text anchor/highlight.
- Forbid page-type comments for the agent: top-level comments are always
inline and require an exact `selection`; replies inherit the parent
anchor (stored as the historical `page` type).
- Throw and roll back the just-created comment when the selection cannot
be anchored, instead of leaving an orphan unanchored comment.
- Add comment-anchor module: text normalization (smart quotes, dashes,
nbsp, collapsed whitespace) and matching across adjacent text nodes
within a block, so selections crossing inline-code/bold/link anchor.
- Update create_comment (MCP) and createComment (ai-chat) tool schemas
and descriptions; add unit + mock-HTTP orchestration tests.
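
A rough sketch of the normalization and cross-node matching idea, not the comment-anchor module itself. It assumes the block's adjacent text nodes are passed as plain strings; `normalizeWithMap` and `locateSelection` are hypothetical names. Normalized text keeps an index map back to the original so the match can be reported in original offsets.

```typescript
// Normalize smart quotes, dashes, nbsp and runs of whitespace, recording for
// each normalized character the index it came from in the original text.
function normalizeWithMap(text: string): { norm: string; map: number[] } {
  let norm = "";
  const map: number[] = [];
  let prevSpace = false;
  for (let i = 0; i < text.length; i++) {
    let ch = text[i];
    if (/[\u2018\u2019]/.test(ch)) ch = "'";
    else if (/[\u201C\u201D]/.test(ch)) ch = '"';
    else if (/[\u2013\u2014]/.test(ch)) ch = "-";
    else if (/\s/.test(ch)) ch = " "; // \s also matches U+00A0 (nbsp)
    if (ch === " ") {
      if (prevSpace) continue; // collapse whitespace runs
      prevSpace = true;
    } else {
      prevSpace = false;
    }
    norm += ch;
    map.push(i);
  }
  return { norm, map };
}

// Find `selection` inside the concatenation of a block's text nodes and
// return [start, end) offsets into that concatenated original text.
function locateSelection(
  nodeTexts: string[],
  selection: string,
): { start: number; end: number } | null {
  const block = normalizeWithMap(nodeTexts.join(""));
  const needle = normalizeWithMap(selection).norm.trim();
  if (needle.length === 0) return null;
  const idx = block.norm.indexOf(needle);
  if (idx === -1) return null; // caller would throw and roll back
  return {
    start: block.map[idx],
    end: block.map[idx + needle.length - 1] + 1,
  };
}
```

Since the node texts are joined before matching, a selection that spans an inline-code or bold boundary is found as long as it stays within one block.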
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a rule to the "Реализация" ("Implementation") section of AGENTS.md stating that git
worktrees may only be created inside the .claude directory
(e.g. .claude/worktrees/<name>); creating them anywhere else is forbidden.
A brand-new chat's first turn streamed and finished successfully, but the
whole assistant response vanished from the UI. On finish the window adopts
the server-created chat id, which changed the <ChatThread> key and remounted
it — discarding the live useChat store (the full answer) and re-seeding from
not-yet-persisted history, so only the user message remained.
- chat-thread: pin the useChat store id to a per-mount value so adopting the
chatId prop no longer recreates the store and wipes the live turn.
- ai-chat-window: derive the thread mount key via setState-during-render and
move the live-thread marker in lockstep with the adopted id, so in-place
adoption keeps the same mounted thread while real chat switches still
remount and re-seed; gate the history loader to a freshly opened chat.
- cancel a pending adoption on New chat / explicit chat selection.
- log the raw stream error to the browser console for debugging.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The back-merge alone does not fix the develop version: git describe names
a tag ref, and the :develop image is built on GitHub Actions, so the tag
must exist on the `github` remote. git push of a branch does not push
tags. Document the multi-remote (gitea + github) tag-push requirement and
a recovery checklist when develop still shows the previous version.
Replace the AI chat typing indicator text "AI is typing…" with
"Thinking…".
- typing-indicator.tsx: use t("Thinking…") instead of t("AI is typing…")
- en-US: drop the now-redundant "AI is typing…" key (the "Thinking…"
key already existed and was unused)
- ru-RU: rename the key to "Thinking…" with value "Думаю…"
- update related comments in message-list.tsx and the test file
The UI version comes from `git describe --tags`, which resolves the nearest
tag in the current commit's ancestry. Release tags are created on main's
merge commit, which is not in develop's history, so develop builds keep
reporting the previous tag (e.g. v0.91.0-NNN) until main is merged back.
Add step 7 (back-merge main -> develop) to the "Cutting a release"
checklist and a subsection explaining why develop lags and how to fix it.
The onAbort terminal path persisted the partial turn but wrote nothing
to the log, so a turn killed by a client disconnect / proxy drop / stop()
was invisible in the logs (unlike onError and the controller catch, which
both log). Add a logger.warn with the chat id, completed step count and
partial-text length so an aborted turn is traceable.
showTypingIndicator treated any tool part in the latest assistant message
as visible content, so the "AI is typing…" dots were suppressed for the
rest of the turn once the first tool call appeared. During the model's
"thinking" pauses after a completed tool call, the chat showed only static
tool cards and no activity.
Inspect the last part of the assistant message instead of any part: hide
the dots only while output is actively rendering (a non-empty streaming
text part, or a tool still in the "running" state — which shows its own
Loader). Finished/errored tools and empty trailing text now keep the dots
visible, so the indicator reappears while the model thinks between steps.
Add tests covering the post-tool thinking gap and the running-tool case.
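
The last-part rule can be sketched as below. The `Part` shapes and the `isStreaming` parameter are simplified assumptions, not the real message-part types used by the chat.

```typescript
// Hypothetical, simplified message parts.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; state: "running" | "done" | "error" };

function showTypingIndicator(isStreaming: boolean, parts: Part[]): boolean {
  if (!isStreaming) return false;
  const last = parts[parts.length - 1];
  if (last === undefined) return true; // nothing rendered yet
  if (last.type === "text") {
    // Hide the dots only while non-empty text is actively rendering.
    return last.text.trim().length === 0;
  }
  // A running tool shows its own loader; finished or errored tools do not.
  return last.state !== "running";
}
```

Checking only the last part is what lets the dots reappear after a completed tool call, whereas an "any tool part" check would keep them hidden for the rest of the turn.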
Add a pulsing halo behind the stop button that scales with the
microphone input level, giving real-time feedback that recording is
active and the mic is picking up sound.
- use-dictation: meter the captured MediaStream via AudioContext +
AnalyserNode (analyser only, never connected to destination), compute
a smoothed RMS audioLevel (0..1) in a requestAnimationFrame loop, and
tear the meter down on every recording-end path (stop/cancel/auto-stop/
unmount); meter failure is non-fatal to recording
- mic-button: render a translucent red halo whose scale follows
audioLevel; honor prefers-reduced-motion with a static halo
- stop(): recover and release resources when no live recorder remains
- fix unhandled rejection from AudioContext.resume()
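
For illustration, the level computation could be split into two pure functions like these. The names and the `smoothing` default are assumptions; the actual hook may weight or scale the level differently.

```typescript
// Root-mean-square of one frame of time-domain samples (values in -1..1).
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Exponential smoothing toward the current frame's level, clamped to 0..1.
function smoothLevel(previous: number, current: number, smoothing = 0.8): number {
  const next = smoothing * previous + (1 - smoothing) * current;
  return Math.min(1, Math.max(0, next));
}
```

In the browser the frame would typically come from `AnalyserNode.getFloatTimeDomainData` inside the requestAnimationFrame loop, with `smoothLevel` applied once per frame.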
Delete the backlog markdown file that outlined additional STT providers and the future async transcription architecture, as the content is now superseded by newer implementation plans.
Document Variant B for showing MCP-created comments (and pages) as AI
rather than as the service-account user, reusing the existing agent
provenance infrastructure (§15 C3).
- Root cause: MCP logs in via a plain service-account token, so
provenance.actor stays 'user' and created_source defaults to 'user';
the comment sidebar also renders no AI badge.
- B1 (backend): mark the MCP identity as agent via a new users.is_agent
flag; jwt.strategy derives req.raw.actor from it (non-spoofable).
Relax the provenance aiChatId type to string | null for external MCP.
- B2 (frontend): extend IComment with createdSource/aiChatId, extract a
shared AiAgentBadge, render it in comment-list-item.
- Includes edge cases, tests, scope decisions, and acceptance criteria.
Role cards in the new-chat empty state were capped at max-width 200px and
never grew, leaving large side gaps in a wide window. Make the cards flex
to fill each row (flex: 1 1 240px) and raise min/max width so they get
wider and use the available window width while still wrapping to ~2 columns
at the default window size.
Provider auth failures were logged with the provider's opaque message only
(e.g. OpenRouter returns "401: User not found." for a bad/missing API key),
which reads like a missing wiki user rather than a credentials problem.
describeProviderError now prepends a clear, human-readable English label for
a small set of well-known HTTP statuses while keeping the original detail
(status + provider message + truncated response-body snippet):
- 401/403 -> authentication failed (invalid or missing API key)
- 402 -> insufficient credits or quota
- 429 -> rate limit exceeded
Other statuses and status-less errors are formatted exactly as before. The
label is a static string and never contains the API key. Benefits every
caller (embedding processor, indexer, AI "Test endpoint" UI) at once.
Tests: switch the plain status+message case to a non-classified status (500);
add 401/403/402/429 cases; keep 502/503 as regression guards for the
unchanged path.
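
A sketch of the status-to-label mapping described above. The labels are taken from this message; the input shape, the separator, and the omission of the response-body snippet are assumptions, so the exact output format of the real describeProviderError may differ.

```typescript
interface ProviderErrorLike {
  status?: number;
  message: string;
}

// Static labels only: nothing from the request (such as the API key) is used.
function statusLabel(status: number | undefined): string | undefined {
  switch (status) {
    case 401:
    case 403:
      return "authentication failed (invalid or missing API key)";
    case 402:
      return "insufficient credits or quota";
    case 429:
      return "rate limit exceeded";
    default:
      return undefined; // other statuses keep the previous formatting
  }
}

function describeProviderError(err: ProviderErrorLike): string {
  const detail = err.status !== undefined ? `${err.status}: ${err.message}` : err.message;
  const label = statusLabel(err.status);
  return label !== undefined ? `${label} - ${detail}` : detail;
}
```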
Batches 6-9: behaviour-preserving extractions of testable pure cores plus the
tests they unblock, and a fix for the broken client test environment.
Full suites green: server 113 suites / 1117 + 1 todo, client 30 files / 338.
client (R0 infra):
- vitest.setup.ts: in-memory localStorage/sessionStorage Storage stub wired via
setupFiles. Unblocks menu-items.gating.test.ts (was 9 failing) -> client suite
fully green. + menu-items.suggestions.test.ts (getSuggestionItems filter/sort).
share:
- extract buildShareMetaHtml (share-seo.util.ts) from the SEO controller; tests
for reflected-XSS escaping in <title>/og/twitter meta, noindex, truncation;
extractPageSlugId; updateAttachmentAttr; prepareContentForShare comment-strip
(anonymous-viewer metadata-leak guard).
ai-chat (security extractions):
- selectAccessibleHits: CASL post-filter for semantic search (restricted page in
an accessible space must NOT leak to the agent).
- validateResolvedAddresses: SSRF connect-time guard (block if ANY resolved
address is private).
- resolveAudioFormat: mime whitelist (dead `?? 'webm'` fallback dropped, set
unchanged). + mcp-servers toView header-leak guard, MCP tool namespacing.
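
To make the "block if ANY resolved address is private" rule concrete, here is an IPv4-only sketch. The range list is a common subset and the fail-closed handling of unparseable input is an assumption; the real validateResolvedAddresses may cover IPv6 and more ranges.

```typescript
function parseIPv4(addr: string): number[] | null {
  const parts = addr.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

function isPrivateIPv4(octets: number[]): boolean {
  const [a, b] = octets;
  return (
    a === 0 || // "this network"
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

// Throws unless every resolved address is a public IPv4 address.
// One private address among several public ones is enough to block.
function validateResolvedAddresses(addresses: string[]): void {
  if (addresses.length === 0) throw new Error("no addresses resolved");
  for (const addr of addresses) {
    const octets = parseIPv4(addr);
    if (octets === null) throw new Error(`unparseable address: ${addr}`);
    if (isPrivateIPv4(octets)) throw new Error(`private address blocked: ${addr}`);
  }
}
```

Checking every address rather than only the first matters because a hostname can resolve to a mix of public and private records.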
collaboration (data-loss area):
- extract computeHistoryJob (pins the "agent delay MUST stay 0" invariant) and
resolveSource. Integration: onAuthenticate read-only matrix (collab auth
bypass), HistoryProcessor (contributor restore on save failure), onStoreDocument
Approach-A boundary snapshot (human revision pinned before agent overwrite).
Reviewed (APPROVE WITH SUGGESTIONS): extractions behaviour-preserving, security
tests mutation-resistant.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Shared zod-agnostic tool-spec registry for the 14 identical AI tools across
the standalone MCP server and the in-app AI-SDK chat (keeps execute/auth and
the ~17 intentionally-divergent guardrail tools per-layer), folds in the
edit_page_text drift-bug fix, and formalizes the integration-test db factory.
Implements two architecture follow-ups from the multi-aspect review.
1. Shared, zod-agnostic tool-spec registry (packages/mcp/src/tool-specs.ts)
for the 14 AI tools whose name + schema + model-facing description are
genuinely identical across the standalone MCP server and the in-app
AI-SDK chat. Both layers consume it (registerShared in index.ts;
sharedTool in ai-chat-tools.service.ts) and keep their own execute/auth.
- Zod-agnostic builders (z) => ZodRawShape bridge the zod v3 (mcp) vs
zod v4 (server) split; the registry imports no zod.
- Folds in the documented edit_page_text drift-bug fix: the stale
"strip-and-retry tolerated" claim is gone; canonical wording states a
formatting-only change is refused into failed[].
- Sibling-tool references in shared descriptions are transport-neutral so
one description is correct for both snake_case (MCP) and camelCase
(in-app) tool names.
- Loader fail-fast guard for a stale @docmost/mcp build.
- The ~17 intentionally-divergent tools (security guardrails, tuned UX)
stay per-layer, untouched.
- Rebuilt committed mcp artifacts (also regenerates a previously stale
build/lib/docmost-schema.js to match its already-committed source).
2. Formalize apps/server/test/integration/db.ts as the canonical
integration-test seed factory (module doc + a shortId helper); the
hand-written minimal seeders are kept on purpose, decoupled from the
app service-layer side effects.
Verified: server tsc + lint clean, mcp build clean; mcp unit tests 261 pass,
ai-chat-tools.service 16 pass, public-share-chat-tools 8 pass, ai-chat suite
224 pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The AI chat error banner always showed a generic "Something went wrong"
with no reason. The server already forwards the provider cause into the
stream (e.g. "Cannot connect to API: read ECONNRESET"), but the client
hid it behind a static heading.
- describeChatError now returns { title, detail }: a short heading naming
the cause category plus a one-line explanation.
- Add classifyProviderError: maps connection reset, timeout, rate limit,
context-window overflow, quota and auth failures to clear categories;
the 403/503 gating responses are preserved; unknown errors fall back to
the verbatim provider text.
- Match HTTP status codes only as the leading token and textual signatures
only against the message head (before "| response body:"), so a number
or phrase in the response-body snippet never mislabels the cause.
- Use the new {title, detail} in all three banners: chat-thread,
share-ai-widget and the persisted-error banner in message-item.
- Cover the classifier with 20 unit tests (categories + regressions).
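
The leading-token and message-head rules could be sketched like this. The category names, the status mapping, and the text patterns are assumptions for illustration; only the matching strategy (status as the leading token, text signatures before "| response body:") comes from this change.

```typescript
type ErrorCategory =
  | "connection_reset"
  | "timeout"
  | "rate_limit"
  | "context_overflow"
  | "quota"
  | "auth"
  | "unknown";

function classifyProviderError(message: string): ErrorCategory {
  // Only the head (before the response-body snippet) is inspected.
  const marker = "| response body:";
  const cut = message.indexOf(marker);
  const head = cut === -1 ? message : message.slice(0, cut);

  // A status code counts only when it is the leading token.
  const m = /^\s*(\d{3})\b/.exec(head);
  if (m !== null) {
    const status = Number(m[1]);
    if (status === 401) return "auth";
    if (status === 402) return "quota";
    if (status === 429) return "rate_limit";
  }

  if (/ECONNRESET/i.test(head)) return "connection_reset";
  if (/ETIMEDOUT|timed out|timeout/i.test(head)) return "timeout";
  if (/rate limit/i.test(head)) return "rate_limit";
  if (/context (length|window)/i.test(head)) return "context_overflow";
  return "unknown"; // caller falls back to the verbatim provider text
}
```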
Fit full role-card description text in the AI chat empty state and show a
generic "AI is typing…" indicator (role name kept only as the dimmed
interlocutor label).
The typing indicator rendered "<role name> is typing…". Show a generic
"AI is typing…" instead and keep the role/identity name only in the
dimmed interlocutor label above the typing dots.
- typing line now always renders t("AI is typing…")
- add the "AI is typing…" key to en-US and ru-RU locales
- sync stale doc comments that referenced the old text
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The colored role cards in the AI chat empty state truncated their
admin-configured description with an ellipsis and could clip the top row
when the cards overflowed. Make the full text fit:
- drop the description lineClamp so the whole text renders
- add overflow-wrap: anywhere so long unbreakable tokens (URLs) wrap
- switch the cards container to align-content: flex-start so an
overflowing top row stays reachable while scrolling (the parent
Mantine Center still vertically centers the block when it fits)
- widen the card max-width 180px -> 200px for more text room
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-ups from the multi-aspect review of the e5bc82c7..d4658d4c range.
- CHANGELOG: document under [Unreleased] that the default per-workspace
hourly public-share assistant cap was lowered 300 -> 100 after the
v0.93.0 tag (#62). v0.93.0 shipped 300, so existing deployments that
never set SHARE_AI_WORKSPACE_MAX_PER_HOUR drop to 100 on upgrade.
- Recreate the still-open Section 3 (AiChatService.stream integration
coverage) of the deleted feature-test-coverage-deferred.md as a focused
backlog doc so the test debt stays tracked; Sections 1-2 are already
closed by the integration harness (PR #115).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rework the new-chat role-card empty state:
- Remove the "Universal assistant" card; universal assistant is now the
implicit default the user gets by typing without picking a card.
- Show each role's description on its card (under the emoji and name).
- Clicking a card immediately starts the chat: it binds the role to the
new chat and sends the default opening prompt "Take a look at the
current document" (one click, no separate select step). roleIdRef is
set synchronously before sendMessage so the create request carries the
role.
- Show the current role's name in the window header badge and as the
assistant's display name (transcript label + "… is typing…"), falling
back to "AI agent" for a role-less chat. selectChat resets the picked
role so it cannot leak into an unrelated existing chat.
- Add the "Take a look at the current document" i18n key (en-US, ru-RU).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-merge review follow-up for the parseNodeArg dedupe (PR #114):
- Restore docs/backlog/ai-chat-tool-definitions-duplicated.md instead of
deleting it: it still tracks open debt (unified spec registry + ProseMirror
<-> Markdown converter unification) that this branch defers, and
docs/git-sync-plan.md links to its converter section. Mark the node-arg
quirk as done and add a Progress section.
- Reword the in-app helper header from "byte-for-byte" to "behaviorally
identical": the two copies differ in comments/quote style; only the logic,
throw messages and branch order match.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ru-RU was missing most AI-chat keys, so the chat/typing widgets rendered
mixed-language (some keys fell back to en-US). Fill the full AI-chat string
set in ru-RU and document the maintenance policy.
- ru-RU/translation.json: add the 24 missing AI-chat keys (labels, typing
indicator, Ask-AI widget, public-share, error messages); keep the typing
keys grouped; existing translations untouched.
- i18n.ts: add a policy comment near fallbackLng — en-US is the source of
truth; en-US + ru-RU are fully maintained; the other 10 locales
intentionally rely on the en-US fallback until contributed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The fail-closed limiter behavior (#62 primary item) already shipped; this
finishes the issue by lowering the default hourly per-workspace cap from 300
to 100 to better fit real anonymous-assistant load. Still overridable via
SHARE_AI_WORKSPACE_MAX_PER_HOUR.
- public-share-workspace-limiter.ts: SHARE_AI_WORKSPACE_MAX_PER_WINDOW 300 -> 100.
- .env.example: documented default + example value 300 -> 100.
- public-share-chat.spec.ts: update the default-cap assertion to 100.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements Option 2 of #93. The restricted branch of broadcastPageMoved
previously resolved its audience twice — emitToAuthorizedUsers and
emitDeleteToUnauthorized each ran an independent fetchSockets +
getUserIdsWithPageAccess — leaving a race window between the two snapshots
where a socket could receive both the move and the delete (leak) or neither
(lost compensating delete).
- ws.service.ts: add emitMoveWithRestrictionSplit() that takes ONE socket
snapshot and ONE authorization resolution, then partitions the room:
authorized users get the moveTreeNode, everyone else (unauthorized +
anonymous) gets the compensating deleteTreeNode. Disjoint + complete by
construction. Remove the now-unused emitToAuthorizedUsers /
emitDeleteToUnauthorized; keep private broadcastToAuthorizedUsers (still
used by emitRestrictedAwareToSpace).
- ws-tree.service.ts: broadcastPageMoved restricted branch now drives move +
delete from the single method.
- specs: assert the single method is used and that fetchSockets /
getUserIdsWithPageAccess are each called exactly once (single snapshot);
re-route ws-service.spec to emitTreeEvent after the method removal.
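
The single-snapshot partition can be illustrated with a pure function. `SocketLike` and `splitMoveAudience` are hypothetical names; in the service, the socket list and the authorized-user set would each be fetched once and then passed to a step like this.

```typescript
interface SocketLike {
  id: string;
  userId?: string; // undefined for anonymous sockets
}

interface AudienceSplit<S> {
  move: S[]; // receive moveTreeNode
  remove: S[]; // receive the compensating deleteTreeNode
}

// Every socket lands in exactly one bucket, so the two audiences are
// disjoint and together cover the whole snapshot.
function splitMoveAudience<S extends SocketLike>(
  sockets: readonly S[],
  authorizedUserIds: ReadonlySet<string>,
): AudienceSplit<S> {
  const move: S[] = [];
  const remove: S[] = [];
  for (const socket of sockets) {
    if (socket.userId !== undefined && authorizedUserIds.has(socket.userId)) {
      move.push(socket);
    } else {
      remove.push(socket);
    }
  }
  return { move, remove };
}
```

Because both buckets derive from the same snapshot, a socket cannot receive both events or neither, which is the race the two-call version left open.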
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>