
claude code agent 227 1095c5679f fix(dictation): address PR #118 review feedback (security, stability, tests)

Implements all reviewer comments (code-review, red-team, and test-strategy
audit), accepting the recommended variants.

Server — realtime service (ai-realtime.service.ts):
- SSRF: pin the validated IP via a WebSocket `lookup` hook that re-checks every
  resolved address with isIpAllowed (mirrors external-mcp buildPinnedDispatcher),
  closing the TOCTOU/DNS-rebinding window; fix the misleading comment.
- no-silent-loss: on Stop, drain the in-flight segment (bounded 2.5s) and deliver
  the final via onFinal before closing instead of dropping the tail.
- fail-closed deriveRealtimeUrl: a non-empty unparseable base now THROWS (no
  silent api.openai.com fallback that would leak a self-hosted key); http:// and
  ws:// bases are rejected (the key would travel in plaintext). Path
  normalization preserved.
- parseUpstreamEvent keys the accumulator by item_id+content_index so GA segments
  don't concatenate.
- inject a wsFactory seam for testing; also fix a latent bug: `import WebSocket
  from 'ws'` resolved to undefined at runtime (no esModuleInterop), so switch to
  `import WebSocket = require('ws')`.
- unref idle/max/drain timers.
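A minimal sketch of the fail-closed URL derivation described above. The `(base, model)` signature, the default base constant, the trailing-slash trimming, the `/realtime` path suffix and the `model` query parameter are all assumptions made for illustration. They are not taken from ai-realtime.service.ts:

```typescript
// Hypothetical default. In the described setup an empty realtime base first
// falls back to the STT base URL server-side, before anything like this applies.
const DEFAULT_REALTIME_BASE = "https://api.openai.com/v1";

// Fail closed: a non-empty but unparseable base is an error, never a silent
// switch to the default host (which would send a self-hosted key elsewhere).
function parseBase(source: string): URL {
  try {
    return new URL(source);
  } catch {
    throw new Error(`Invalid realtime base URL: ${JSON.stringify(source)}`);
  }
}

// Signature (base, model) is assumed for this sketch.
function deriveRealtimeUrl(base: string, model: string): string {
  const trimmed = base.trim();
  const url = parseBase(trimmed === "" ? DEFAULT_REALTIME_BASE : trimmed);

  // Only TLS schemes are allowed, because the API key travels with the handshake.
  if (url.protocol === "https:") {
    url.protocol = "wss:";
  } else if (url.protocol !== "wss:") {
    throw new Error(`Refusing non-TLS realtime base URL (${url.protocol})`);
  }

  // Assumed normalization: trim trailing slashes, then append /realtime once.
  const path = url.pathname.replace(/\/+$/, "");
  url.pathname = path.endsWith("/realtime") ? path : `${path}/realtime`;
  url.searchParams.set("model", model);
  url.hash = "";
  return url.toString();
}
```

Throwing on a bad base means a misconfigured self-hosted endpoint fails loudly instead of quietly sending its key to a third party. The caller decides how to surface the error.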

Server — realtime gateway (ai-realtime.gateway.ts, session-limits.ts):
- reject revoked/disabled users and inactive sessions (mirror jwt.strategy:
  findById+isUserDisabled + findActiveById) with NO counter increment.
- CSWSH: Origin allowlist (matching APP_URL, or no Origin for native clients)
  before auth, no increment.
- extract SessionCounters (delete-at-zero, never negative) + pure canConnect
  (both caps checked with >= before any increment); document the
  per-process/in-memory cap caveat (single-replica only).
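A sketch of the counter/cap split, assuming string user and workspace ids and the caps from the changelog entry (1 concurrent session per user, 5 per workspace). The class shape is illustrative, and `tryAcquire` is a hypothetical helper. As with the real counters, everything here lives in one process's memory:

```typescript
// Illustrative shape; not the actual session-limits.ts. Per-process only.
class SessionCounters {
  private readonly counts = new Map<string, number>();

  get(key: string): number {
    return this.counts.get(key) ?? 0;
  }

  increment(key: string): void {
    this.counts.set(key, this.get(key) + 1);
  }

  // Delete the entry at zero (so idle keys do not accumulate) and never go
  // negative, even if a disconnect is processed twice.
  decrement(key: string): void {
    const next = this.get(key) - 1;
    if (next > 0) {
      this.counts.set(key, next);
    } else {
      this.counts.delete(key);
    }
  }

  get size(): number {
    return this.counts.size;
  }
}

interface SessionCaps {
  perUser: number;
  perWorkspace: number;
}

// Pure: decides from the current counts only, so both caps are checked
// (count >= cap rejects) before anything is incremented.
function canConnect(userCount: number, workspaceCount: number, caps: SessionCaps): boolean {
  return userCount < caps.perUser && workspaceCount < caps.perWorkspace;
}

// Hypothetical helper: increment both counters only after canConnect passes,
// so a rejected connection leaves no partial increment behind.
function tryAcquire(
  users: SessionCounters,
  workspaces: SessionCounters,
  userId: string,
  workspaceId: string,
  caps: SessionCaps,
): boolean {
  if (!canConnect(users.get(userId), workspaces.get(workspaceId), caps)) return false;
  users.increment(userId);
  workspaces.increment(workspaceId);
  return true;
}
```

Keeping `canConnect` pure makes the cap logic testable without sockets, and the check-then-increment order is what keeps a rejected handshake from leaking a count.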

Client:
- dictation-group: realtime final now inserts at the captured rangeRef SNAPSHOT
  (not the live caret) and guards editor.isEditable; single-space separator.
- use-realtime-dictation/realtime-dictation-client: stop-during-acquisition tears
  down the mic (no leak / button reset); reconnect re-emits start (double-start
  guarded); interim ghost cleared on teardown; io() options de-duplicated.
- pcm16-worklet: flush the partial sub-frame tail on stop; add a one-pole
  anti-aliasing low-pass before the 48k->24k downsample.
- extract shared mic-capture (acquireMicStream/mapGetUserMediaError, used by batch
  + realtime), pure DSP (pcm16-dsp.ts), and the session reducer/baseLanguageSubtag;
  extract applyInterimMeta/clampRange/resolveUrl/appendFinalToDraft.
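A sketch of the client-side DSP and framing steps, under stated assumptions. The class and function names are hypothetical, the 10 kHz cutoff is an assumed default, and the frame size is left to the caller. It shows three pieces. First, a one-pole low-pass feeds a 2:1 decimator whose state survives chunk boundaries. Second, samples are converted from float to PCM16. Third, a framer's `flush()` returns the partial tail on stop:

```typescript
// Hypothetical 48 kHz -> 24 kHz decimator: one-pole IIR low-pass, then keep
// every second sample. State carries across chunks, so split input produces
// the same output as one contiguous buffer.
class Downsampler48kTo24k {
  private readonly alpha: number;
  private y = 0; // last filter output
  private keep = true; // whether the next input sample is kept

  // Assumed cutoff; a one-pole filter rolls off only about 6 dB/octave.
  constructor(cutoffHz = 10_000, inputRate = 48_000) {
    this.alpha = 1 - Math.exp((-2 * Math.PI * cutoffHz) / inputRate);
  }

  process(input: Float32Array): Float32Array {
    const out: number[] = [];
    for (let i = 0; i < input.length; i++) {
      this.y += this.alpha * (input[i] - this.y);
      if (this.keep) out.push(this.y);
      this.keep = !this.keep;
    }
    return Float32Array.from(out);
  }
}

// Clamp to [-1, 1] and scale to signed 16-bit.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

// Hypothetical fixed-size framer: emits full frames as they fill and, on
// stop, flushes the partial tail instead of dropping it.
class Pcm16Framer {
  private readonly frameSamples: number;
  private pending: number[] = [];

  constructor(frameSamples: number) {
    this.frameSamples = frameSamples;
  }

  push(samples: Int16Array): Int16Array[] {
    for (let i = 0; i < samples.length; i++) this.pending.push(samples[i]);
    const frames: Int16Array[] = [];
    while (this.pending.length >= this.frameSamples) {
      frames.push(Int16Array.from(this.pending.splice(0, this.frameSamples)));
    }
    return frames;
  }

  flush(): Int16Array | null {
    if (this.pending.length === 0) return null;
    const tail = Int16Array.from(this.pending);
    this.pending = [];
    return tail;
  }
}
```

Carrying the filter output and the keep/skip phase across calls is what makes the resampler continuous. Without it, every odd-length chunk would shift the decimation phase.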

Tests + infra: +~150 server tests (deriveRealtimeUrl, parseUpstreamEvent branches,
openSession/lifecycle/timers/testConnection via fake ws, gateway auth/caps/no-leak,
realtime-test admin contract, AiSettings update/resolve, DTO boolean, SSRF deny)
and +~140 client tests (DSP property/edge, resampler continuity, framing, reducer,
mic-capture, RealtimeDictationClient/MicButton, ProseMirror interim regression +
history guards, appendFinalToDraft, resolveKeyField, route contract). Added
@vitest/coverage-v8. CHANGELOG [Unreleased] entry incl. the single-replica caveat.

Review: APPROVE WITH SUGGESTIONS (no critical or regression findings); applied the
drain-timer unref. Server tsc clean + 358 tests; client tsc clean + 201 tests;
vite build ok.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-21 17:15:33 +03:00

Changelog

All notable changes to this project are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Releases prior to 0.91.0 predate this changelog; see the git tags for earlier history.

Unreleased

Added

Realtime streaming dictation: a new live-dictation mic mode layered on top of the existing batch STT. Audio streams over a dedicated /ai-realtime Socket.IO namespace and text is inserted as you speak (interim partials shown as a ghost decoration, only finals committed to the document). Gated by a new dictationRealtime workspace toggle, with sttRealtimeModel and sttRealtimeBaseUrl settings (empty model falls back to sttModel; empty base URL falls back to the STT base URL server-side).
- Ops caveat (single-process assumption): the realtime concurrency caps (1 concurrent session per user, 5 per workspace) are enforced in-memory, per API process. They are therefore authoritative only on a single API replica — running multiple API instances (horizontal scale / load balancing) lets a user or workspace exceed these caps, since each process counts only its own sessions. Treat the limits as per-process until the counters are moved to a shared store.
Admin-only "Analytics / tracker" workspace setting: a raw HTML/JS snippet injected into the <head> of public share pages only (for analytics such as Google Analytics or Yandex.Metrika).
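A sketch of how the realtime dictation settings could resolve against the batch STT settings. The names dictationRealtime, sttModel, sttRealtimeModel and sttRealtimeBaseUrl come from the entry above. sttBaseUrl is a hypothetical name for "the STT base URL", and the function itself is illustrative only:

```typescript
// Field names from the changelog, except sttBaseUrl (hypothetical stand-in).
interface AiSttSettings {
  dictationRealtime: boolean;
  sttModel: string;
  sttBaseUrl: string;
  sttRealtimeModel?: string;
  sttRealtimeBaseUrl?: string;
}

interface ResolvedRealtimeConfig {
  model: string;
  baseUrl: string;
}

// Returns null when the workspace toggle is off; otherwise empty (or
// whitespace-only) realtime fields fall back to their batch STT counterparts.
function resolveRealtimeConfig(s: AiSttSettings): ResolvedRealtimeConfig | null {
  if (!s.dictationRealtime) return null;
  return {
    model: s.sttRealtimeModel?.trim() || s.sttModel,
    baseUrl: s.sttRealtimeBaseUrl?.trim() || s.sttBaseUrl,
  };
}
```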

Changed

HTML embed blocks now render inside a sandboxed iframe (separate origin) and, when the workspace HTML-embed toggle is on, can be inserted by any member (previously admin-only). Turning the toggle off hides existing embeds and stops serving them on public share pages.
Remove the server-side role-based stripping of HTML-embed blocks from the write paths (collab/REST/MCP, page create/duplicate, import, transclusion unsync); sandboxing makes per-write gating unnecessary. The only remaining server-side strip is the public-share read path, which still honors the workspace HTML-embed toggle.

Breaking Changes

MCP shared-token auth moved to its own header. The /mcp shared guard no longer reads Authorization: Bearer <MCP_TOKEN>; it now reads only the X-MCP-Token header. Existing MCP clients (e.g. Claude Desktop) configured with Authorization: Bearer <MCP_TOKEN> must be reconfigured to send X-MCP-Token: <MCP_TOKEN> instead. The Authorization header is now reserved for per-user HTTP Basic / Bearer access JWT credentials. See MCP_TOKEN in .env.example. As a one-time aid, the server logs a single migration warning when it sees the old-style header.
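For illustration, a sketch of a shared-token check that accepts only X-MCP-Token and warns once when it sees the legacy header. The function name, the warning text, and the rule that a Bearer value must equal MCP_TOKEN to count as "old-style" are assumptions. This is not the actual /mcp guard:

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// One warning per process, mirroring the "single migration warning" above.
let legacyHeaderWarned = false;

// Constant-time for equal-length inputs; timingSafeEqual throws on unequal
// lengths, so the length check comes first.
function sameSecret(a: string, b: string): boolean {
  const x = Buffer.from(a, "utf8");
  const y = Buffer.from(b, "utf8");
  return x.length === y.length && timingSafeEqual(x, y);
}

// Sketch only. Header names are expected in lower case, as Node delivers them.
function mcpSharedTokenOk(
  headers: Record<string, string | undefined>,
  mcpToken: string,
  warn: (msg: string) => void = console.warn,
): boolean {
  const auth = headers["authorization"];
  if (!legacyHeaderWarned && auth?.startsWith("Bearer ") && sameSecret(auth.slice(7), mcpToken)) {
    legacyHeaderWarned = true;
    warn("MCP_TOKEN sent as Authorization: Bearer is no longer accepted; send X-MCP-Token instead.");
  }
  // Authorization stays reserved for per-user Basic/Bearer JWT credentials;
  // only X-MCP-Token is considered by the shared guard.
  const token = headers["x-mcp-token"];
  return token !== undefined && sameSecret(token, mcpToken);
}
```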

0.91.0 - 2026-06-18

Gitmost is a community-focused fork of Docmost. This release drops the Enterprise-Edition code paths and introduces the in-app AI agent chat, a RAG knowledge layer, an embedded MCP server, and the Gitmost rebrand.

Breaking Changes

Remove all frontend Enterprise-Edition code — the project now builds as a pure community edition.
AI agent: drop the updateComment tool from the agent toolset.

Added

AI agent chat: per-user in-app AI agent with a floating chat window. Includes live streaming responses, open-page context awareness, a typing indicator, a Stop control, and copy/export of a conversation as Markdown.
AI agent write tools & provenance: reversible write tools (page create/update/move/soft-delete, comment reply/resolve) enforced by Docmost CASL, plus non-spoofable agent provenance signed into access/collab tokens and recorded on pages and comments. No permanent/force delete.
RAG knowledge retrieval: workspace bulk reindex with a manual "Reindex now" action, hybrid RRF retrieval with heading-breadcrumb chunks and a merged search tool, dimension-agnostic embeddings, and RAG indexing coverage shown in AI settings.
MCP: embedded community MCP server served at /mcp; an admin UI to list/add/edit/delete external MCP servers with per-server enable toggle, Test, write-only auth headers, a tool allowlist, and a Tavily preset; insert_image/replace_image can now fetch sources from web URLs.
AI configuration: dedicated AI provider settings with separate base URL and API key for the chat vs. embedding model, and per-endpoint test buttons.
Branding: Gitmost logo, favicon, and app name.
Collaboration: comment resolution for the community build; agent edits are separated from human edits in page history.
Editor / client: page-tree open/closed state is persisted per workspace+user; the brand logo shows the current git describe version.

Changed

Move AI settings to a dedicated /settings/ai page and redesign it with per-endpoint test buttons.
edit_page_text now returns verifiable mutation results and refuses formatting-only edits; the agent tolerates Markdown in edit_page_text/insert_node locators.
Compact large tool outputs before persisting them.
Reduce the chat window corner radius, shrink the chat message font size, and shrink the default page-tree indentation from 16px to 8px.

Fixed

AI chat: stable streaming store id so optimistic and streamed messages render immediately; provider errors stay visible and surface the real provider status/message; the composer draft survives the new-chat id-adoption remount; the workspace AI-chat enable toggle is restored for self-hosted.
AI providers: use OpenAI Chat Completions for multi-turn requests; self-heal the stored provider settings JSON; drop the hard output-token cap that truncated complex tool calls.
RAG: make the indexer observable and bound hung embedding calls; stop the coverage bar from sticking below 100% on empty pages.
Collaboration: use - instead of : in the agent page-history job id.
Accessibility fixes (#2275) and a fix for table jitter on the edit/read toggle (#2252).

Removed

Non-functional DOCX / PDF / Confluence import buttons.

Documentation

README: rebrand to the Gitmost fork with EE-free positioning, an MCP comparison, a grouped roadmap, a Russian translation, a "Migration from Docmost" section, and AI agent chat documentation.
Add plans for mobile app, voice dictation, arbitrary HTML/CSS/JS embeds, and offline sync & PWA.

Internal

Add .claude/worktrees/ to .gitignore.
CI: add a develop workflow with workflow_dispatch; ignore cache errors in the develop and release builds.
Build: drop the private EE submodule, retarget CI to GHCR, and update the Docker image to the GHCR registry.
