Changelog
All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Releases prior to 0.91.0 predate this changelog; see the git tags for earlier history.
Unreleased
Changed
- Public share AI: default per-workspace hourly assistant cap lowered 300 → 100. The limiter falls back to this default whenever SHARE_AI_WORKSPACE_MAX_PER_HOUR is unset, so a 0.93.0 deployment that never set the env var has its anonymous public-share assistant hourly cap cut from 300 to 100 on upgrade. Set SHARE_AI_WORKSPACE_MAX_PER_HOUR to keep the previous limit. (#62)
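The fallback described above can be pictured with a minimal sketch. The helper name `resolveWorkspaceMaxPerHour` and its parsing rules are assumptions made for illustration; only the env var name and the 300 → 100 default come from this changelog.

```typescript
// Hypothetical helper, not the project's actual limiter code.
const DEFAULT_WORKSPACE_MAX_PER_HOUR = 100; // the default was 300 before this change

function resolveWorkspaceMaxPerHour(
  env: Record<string, string | undefined>,
): number {
  const raw = env["SHARE_AI_WORKSPACE_MAX_PER_HOUR"];
  // Unset or blank: fall back to the built-in default.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_WORKSPACE_MAX_PER_HOUR;
  }
  const parsed = Number.parseInt(raw, 10);
  // Assumption: invalid or non-positive values also fall back to the default.
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_WORKSPACE_MAX_PER_HOUR;
}
```

Under these assumptions, a deployment that passes `SHARE_AI_WORKSPACE_MAX_PER_HOUR=300` keeps the old cap, while one that leaves the variable unset gets 100.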
0.93.0 - 2026-06-21
This release builds on the 0.91.0 AI foundation: admin-defined AI agent roles, an anonymous AI assistant on public shares, server-side voice dictation, an editor footnotes model, live page-template embeds, and sandboxed arbitrary-HTML embeds — plus a large batch of security hardening and test coverage.
Breaking Changes
- MCP shared-token auth moved to its own header. The /mcp shared guard no longer reads Authorization: Bearer <MCP_TOKEN>; it now reads only the X-MCP-Token header. The Authorization header is now reserved for per-user HTTP Basic / Bearer access-JWT credentials, so each /mcp request authenticates as a specific user (the MCP_DOCMOST_* service account is only a fallback). Existing MCP clients (e.g. Claude Desktop) configured with Authorization: Bearer <MCP_TOKEN> must be reconfigured to send X-MCP-Token: <MCP_TOKEN> instead. See MCP_TOKEN in .env.example. As a one-time aid, the server logs a single migration warning when it sees the old-style header.
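For client authors, the header split above might look like the following sketch. `buildMcpHeaders` is a hypothetical helper written for this note; only the header names X-MCP-Token and Authorization are taken from the entry above.

```typescript
// Hypothetical client-side helper; not part of the server or any MCP SDK.
type McpHeaders = Record<string, string>;

function buildMcpHeaders(mcpToken: string, userAccessJwt?: string): McpHeaders {
  // The shared token now travels in its own header.
  const headers: McpHeaders = { "X-MCP-Token": mcpToken };
  // Authorization is reserved for per-user credentials (a Bearer access JWT here).
  if (userAccessJwt !== undefined) {
    headers["Authorization"] = `Bearer ${userAccessJwt}`;
  }
  return headers;
}
```

A client that previously sent the shared token as `Authorization: Bearer <MCP_TOKEN>` would call `buildMcpHeaders(token)` and attach the result to each /mcp request.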
Added
- AI agent roles: admin-defined assistant personas with an optional per-role model override, selectable in chat.
- Anonymous AI assistant on public shares: public-share visitors can chat with a selectable agent-role identity that reuses the internal chat presentation, with per-request output-token caps and a fail-closed Redis limiter.
- Voice dictation (STT): server-side speech-to-text with a mic button in the chat and the editor, OpenRouter STT support, an endpoint test, and real provider-error surfacing.
- Realtime streaming dictation: a new live-dictation mic mode layered on top of the existing batch STT. Audio streams over a dedicated /ai-realtime Socket.IO namespace and text is inserted as you speak (interim partials shown as a ghost decoration, only finals committed to the document). Gated by a new dictationRealtime workspace toggle; it reuses the existing STT model and endpoint (the realtime WS endpoint is derived from the STT base URL), so no separate model/endpoint settings are needed. While the toggle is on, the batch "Request format" selector is disabled, since realtime always uses the OpenAI Realtime protocol.
  - Ops caveat (single-process assumption): the realtime concurrency caps (1 concurrent session per user, 5 per workspace) are enforced in-memory, per API process. They are therefore authoritative only on a single API replica: running multiple API instances (horizontal scale / load balancing) lets a user or workspace exceed these caps, since each process counts only its own sessions. Treat the limits as per-process until the counters are moved to a shared store.
- Footnotes: an editor footnotes model (inline references + a definitions list).
- Page templates: live whole-page embed (MVP) with a template-marker icon in the page tree and a working Refresh action.
- Arbitrary HTML/CSS/JS embeds: a sandboxed-iframe embed block gated by a per-workspace toggle (default OFF); insertable by any member when the toggle is on.
- Admin-only "Analytics / tracker" workspace setting: a raw HTML/JS snippet injected into the <head> of public share pages only (for analytics such as Google Analytics or Yandex.Metrika), kept separate from the member-facing HTML-embed feature.
- MCP: a hierarchical tree mode for list_pages, and per-user auth for the embedded /mcp endpoint.
- Page tree: Expand all / Collapse all for the space tree, and server-authoritative realtime tree updates.
- AI chat UX: a get_current_page tool for proxy-robust page context, a current-context-size readout, an agent step cap raised 8→20 with a forced final text answer, and auto-collapse of the chat window on page focus.
- AI settings: a Clear control inside the API-key field and an endpoint status dot bound to "configured × enabled".
- Client: an always-visible space grid replacing the space-switcher popover, removal of the sidebar Overview item, tighter comments-panel density, and no auto-open of the comments panel when adding a comment.
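The ops caveat on realtime dictation in the list above is easier to see with a small model. The class below is an illustrative sketch, not the gateway's code: the name `InMemorySessionCaps`, its methods, and the choice to let a user's new session replace the previous one are all assumptions. It only shows why counters held in process memory cannot enforce a cap across replicas.

```typescript
// Hypothetical in-memory limiter: 1 session per user, N per workspace, per process.
class InMemorySessionCaps {
  private readonly maxPerWorkspace: number;
  private readonly workspaceOfUser = new Map<string, string>();
  private readonly activePerWorkspace = new Map<string, number>();

  constructor(maxPerWorkspace: number = 5) {
    this.maxPerWorkspace = maxPerWorkspace;
  }

  tryAcquire(workspaceId: string, userId: string): boolean {
    // Assumption: a new start replaces the user's previous session, if any.
    this.release(userId);
    const active = this.activePerWorkspace.get(workspaceId) ?? 0;
    if (active >= this.maxPerWorkspace) return false;
    this.workspaceOfUser.set(userId, workspaceId);
    this.activePerWorkspace.set(workspaceId, active + 1);
    return true;
  }

  release(userId: string): void {
    const workspaceId = this.workspaceOfUser.get(userId);
    if (workspaceId === undefined) return;
    this.workspaceOfUser.delete(userId);
    const active = this.activePerWorkspace.get(workspaceId) ?? 0;
    if (active <= 1) this.activePerWorkspace.delete(workspaceId);
    else this.activePerWorkspace.set(workspaceId, active - 1);
  }

  activeInWorkspace(workspaceId: string): number {
    return this.activePerWorkspace.get(workspaceId) ?? 0;
  }
}
```

With two instances standing in for two API replicas, each one admits up to five sessions for the same workspace, so the combined total can reach ten. A shared store would be needed to make the cap global.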
Changed
- HTML embed blocks now render inside a sandboxed iframe (separate origin) and, when the workspace HTML-embed toggle is on, can be inserted by any member (previously admin-only). Turning the toggle off hides existing embeds and stops serving them on public share pages.
- Remove the server-side role-based stripping of HTML-embed blocks from the write paths (collab/REST/MCP, page create/duplicate, import, transclusion unsync); sandboxing makes per-write gating unnecessary. The only remaining server-side strip is the public-share read path, which still honors the workspace HTML-embed toggle.
Fixed
- AI chat: preserve scroll position during streaming, record chats that fail on their first turn, and resolve the current page for agent context behind proxies.
- AI roles: guard update() against concurrent soft-delete; harden the model override, role-name uniqueness, and id validation; sandwich the safety framework around the role persona.
- Auth: handle null-password (SSO/LDAP-only) accounts without a bcrypt throw.
- Footnotes: survive duplicate-id definitions without collab divergence.
- HTML embed: fix stale iframe height and damp the resize loop; strip embeds at serve time on authenticated read paths and the plain page-create path.
- Page templates: import ThrottleModule so collab boots, never strand an in-flight page-embed id, and add defense-in-depth workspace checks.
- Pages: movePage cycle guard with no phantom PAGE_MOVED event.
- Import: surface the real error cause from /pages/import instead of a generic 400.
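As a rough illustration of what a move cycle guard checks, the sketch below walks up from the proposed parent and rejects the move if it reaches the page being moved. The function name `wouldCreateCycle` and the `parentOf` map are hypothetical; the actual movePage implementation is not shown in this changelog.

```typescript
// Hypothetical guard: returns true when moving pageId under newParentId
// would make the page its own ancestor.
function wouldCreateCycle(
  parentOf: Map<string, string | null>,
  pageId: string,
  newParentId: string | null,
): boolean {
  const seen = new Set<string>();
  let current: string | null = newParentId;
  while (current !== null) {
    if (current === pageId) return true;
    // Defensive: an already-corrupt parent chain is treated as unsafe.
    if (seen.has(current)) return true;
    seen.add(current);
    current = parentOf.get(current) ?? null;
  }
  return false;
}
```

A caller following this pattern would run the guard before persisting the move and would skip emitting any move event when the guard rejects, which is one way to avoid a phantom PAGE_MOVED.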
Security
- MCP: close an SSO/MFA bypass on Basic auth and stop minting non-init sessions; close a brute-force limiter check-then-act race.
- Public share: block restricted descendants in the anonymous assistant, cap per-request output, fail closed when Redis is unavailable, and reject non-text message parts to close a size-cap bypass.
- Make trustProxy env-configurable with a safe default.
Internal
- CI: gate the develop and release image builds on the test suite, run the suites on push/PR, and build the :develop image on push to develop.
- Docs: replace CLAUDE.md with AGENTS.md codifying the agent workflow and the release procedure, add migration-ordering guidance, and prune implemented plans.
- A large batch of new server/client test coverage.
0.91.0 - 2026-06-18
Gitmost is a community-focused fork of Docmost. This release drops the Enterprise-Edition code paths and introduces the in-app AI agent chat, a RAG knowledge layer, an embedded MCP server, and the Gitmost rebrand.
Breaking Changes
- Remove all frontend Enterprise-Edition code — the project now builds as a pure community edition.
- AI agent: drop the updateComment tool from the agent toolset.
Added
- AI agent chat: per-user in-app AI agent with a floating chat window. Includes live streaming responses, open-page context awareness, a typing indicator, a Stop control, and copy/export of a conversation as Markdown.
- AI agent write tools & provenance: reversible write tools (page create/update/move/soft-delete, comment reply/resolve) enforced by Docmost CASL, plus non-spoofable agent provenance signed into access/collab tokens and recorded on pages and comments. No permanent/force delete.
- RAG knowledge retrieval: workspace bulk reindex with a manual "Reindex now" action, hybrid RRF retrieval with heading-breadcrumb chunks and a merged search tool, dimension-agnostic embeddings, and RAG indexing coverage shown in AI settings.
- MCP: embedded community MCP server served at /mcp; an admin UI to list/add/edit/delete external MCP servers with per-server enable toggle, Test, write-only auth headers, a tool allowlist, and a Tavily preset; insert_image / replace_image can now fetch sources from web URLs.
- AI configuration: dedicated AI provider settings with separate base URL and API key for the chat vs. embedding model, and per-endpoint test buttons.
- Branding: Gitmost logo, favicon, and app name.
- Collaboration: comment resolution for the community build; agent edits are separated from human edits in page history.
- Editor / client: page-tree open/closed state is persisted per workspace+user; the brand logo shows the current git describe version.
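The "hybrid RRF retrieval" mentioned in the list above refers to reciprocal rank fusion, which merges several ranked result lists by summing 1 / (k + rank) per document. The sketch below is a generic version of that formula, not the project's retrieval code; the function name, the string ids, and k = 60 (a commonly used constant) are assumptions.

```typescript
// Generic reciprocal rank fusion over lists of document ids (best first).
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: a keyword ranking fused with a vector ranking.
const fused = reciprocalRankFusion([
  ["a", "b", "c"],
  ["b", "c", "d"],
]);
console.log(fused.join(","));
```

Documents that appear in both lists accumulate two terms and so tend to rise above documents that rank highly in only one list.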
Changed
- Move AI settings to a dedicated /settings/ai page and redesign it with per-endpoint test buttons.
- edit_page_text now returns verifiable mutation results and refuses formatting-only edits; the agent tolerates Markdown in edit_page_text / insert_node locators.
- Compact large tool outputs before persisting them.
- Reduce the chat window corner radius, shrink the chat message font size, and shrink the default page-tree indentation from 16px to 8px.
Fixed
- AI chat: stable streaming store id so optimistic and streamed messages render immediately; provider errors stay visible and surface the real provider status/message; the composer draft survives the new-chat id-adoption remount; the workspace AI-chat enable toggle is restored for self-hosted.
- AI providers: use OpenAI Chat Completions for multi-turn requests; self-heal the stored provider settings JSON; drop the hard output-token cap that truncated complex tool calls.
- RAG: make the indexer observable and bound hung embedding calls; stop the coverage bar from sticking below 100% on empty pages.
- Collaboration: use - instead of : in the agent page-history job id.
- Accessibility fixes (#2275) and table jitter on the edit/read toggle (#2252).
Removed
- Non-functional DOCX / PDF / Confluence import buttons.
Documentation
- README: rebrand to the Gitmost fork with EE-free positioning, an MCP comparison, a grouped roadmap, a Russian translation, a "Migration from Docmost" section, and AI agent chat documentation.
- Add plans for mobile app, voice dictation, arbitrary HTML/CSS/JS embeds, and offline sync & PWA.
Internal
- Add .claude/worktrees/ to .gitignore.
- CI: add a develop workflow with workflow_dispatch; ignore cache errors in the develop and release builds.
- Build: drop the private EE submodule, retarget CI to GHCR, and update the Docker image to the GHCR registry.