- F10 [stability]: closeMetricsServer() now calls server.closeIdleConnections()
+ server.unref() after server.close(). server.close()'s callback doesn't fire
until keep-alive sockets drain, and the scraper (VictoriaMetrics/vmagent) holds
an idle keep-alive socket — so onModuleDestroy's awaited close would hang until
the scraper disconnects or the orchestrator SIGKILLs on the kill-grace window.
closeIdleConnections() drops idle keep-alive sockets so shutdown completes
immediately (Node 22, per the Dockerfile base).
- F9 [test]: client-telemetry.module.spec.ts pins the E1=B register() gate — the
core of the "public endpoint OFF by default" decision: flag unset / any non-
"true" value ("false"/""/"0"/…) → empty controllers+providers (route absent);
"true"/"TRUE" → registers VitalsController + VitalsService. A flag-inversion or
truthiness regression that reopened the anonymous disk-fill surface now fails.
- F11 [regression/perf]: the db_query_duration_seconds token work (firstSqlToken
regex + Set lookup) is now gated on isMetricsEnabled() in database.module.ts, so
a non-metrics deployment pays NOTHING per query (previously observeDbQuery
no-op'd but the token was still computed on every query). Also hoisted the
13-element known-token Set to a module const (KNOWN_SQL_TOKENS) so it's built
once, not per query.
Gate: server tsc 0; metrics + vitals + client-telemetry suites pass (incl. the
new register-gate test).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Squashed for a clean rebase onto develop (was 19 commits; the reviewer approved
the net diff at fb246080). Detaches an agent run from the HTTP request/browser
window: a run is a first-class lifecycle object (ai_chat_runs), a browser
disconnect no longer kills it, a concurrent-run insert-gate prevents double runs,
and a reopened chat live-follows a still-running run via a polled observer merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The metrics INFRA is already deployed (VictoriaMetrics scraping docmost:9464,
Grafana dashboards, alerts) with a target `gitmost-app` that is red because the
app half didn't exist. This is that half. The contract (metric names, port,
table, endpoint) is FIXED by the deployed infra and matched exactly.
Server (prom-client):
- A bare node:http `/metrics` server on METRICS_PORT (default 9464), SEPARATE
from the Fastify :3000 listener so /metrics never exists publicly; the whole
subsystem is OFF when METRICS_PORT is unset.
- collectDefaultMetrics() + http_request_duration_seconds{method,route,status}
via a Fastify onResponse hook using the ROUTE TEMPLATE (req.routeOptions.url,
never the raw URL — bounded cardinality; 404 -> "unknown"), EXCLUDING SSE/
streaming responses (would record the connection lifetime and poison p95).
- db_query_duration_seconds (Kysely log callback, labelled by the leading SQL
token), bullmq_queue_depth{queue} (getJobCounts every 15s) +
bullmq_job_duration_seconds{queue} (worker completed/failed),
collab_store_duration_seconds (around onStoreDocument).
- POST /api/telemetry/vitals — PUBLIC (sendBeacon) but IP-throttled; ~16KB body
cap, <=50 events/batch, metric-name + rating whitelist, attr truncated to 120
chars, batch insert; malformed/foreign/oversized silently dropped and 200'd (no
browser retry). New migration `client_metrics` (schema byte-identical to the
contract, both indexes, conditional grafana_ro GRANT; no app-side retention —
the maintenance container prunes >90d).
Client (web-vitals):
- initVitals() decides sampling ONCE per session (25%, sessionStorage) BEFORE
subscribing; onINP/onLCP/onCLS/onTTFB (attribution) buffered + flushed via
navigator.sendBeacon on visibilitychange:hidden and a timer (not fetch-per-
metric). Custom: editor_tx_ms (dispatchTransaction sync-part timer, >8ms, with
doc_size), page_open_ms, longtask_ms. Route labels are templates only; no
titles/slugs/text.
Gate: server + client tsc 0, frozen install 0 (added prom-client + web-vitals +
regenerated the lock), server metrics/vitals tests 11, client route-template 5,
and the migration verified valid against real Postgres.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The agent rebuilds context from DB each turn and didn't know the user manually
edited the open page since its last response, so it could overwrite those edits.
Add a per-turn ephemeral <page_changed> note in the system prompt (twin of
INTERRUPT_NOTE, self-clearing) carrying a unified Markdown diff of what changed
since the END of the agent's previous turn.
- New ai_chat_page_snapshots table (migration + hand-declared db.d.ts/entity
types) storing the page Markdown per (chat,page) at each turn's end.
- Pure computePageChange util (whitespace-normalized unified diff via the
existing jsdiff dep, 6KB cap + getPage hint).
- Turn start: if the open page's updatedAt moved past the snapshot, diff current
vs snapshot; non-empty -> PAGE_CHANGED_NOTE in the safety sandwich.
- Turn end: upsert the snapshot on EVERY terminal path (onFinish/onError/onAbort,
once) so the agent's own edits are excluded by construction even on aborted
turns.
All best-effort (never breaks/latency-regresses a turn); fast path when updatedAt
is unchanged. Server-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a retargetable, human-readable vanity link namespace /l/<alias> that
sits alongside the untouched /share/... routes.
- New share_aliases table (workspace-scoped, UNIQUE(workspace_id, alias),
page_id nullable ON DELETE SET NULL so the address outlives its target).
- ShareAliasRepo + ShareAliasService (create / no-op / 409 reassign guard /
availability / request-time readable-target resolution through the single
existing share boundary).
- Public ShareAliasRedirectController (GET /l/:alias) issues a 302 (never 301,
the target is mutable) to the canonical /share/:key/p/:slug page; unknown /
dangling / no-longer-readable aliases serve the SPA index with no leak.
'l/:alias' excluded from the global /api prefix.
- Authenticated ShareAliasController (set/remove/availability/for-page).
- Shared ASCII-only normalize/validate util (server + client copies).
- Client: Custom address block in the share modal (live normalize + debounced
availability + copy + reassign confirmation dialog).
- Unit tests: util, repo SQL-shape, service semantics, migration/entity sanity
(server jest) + client alias util (vitest).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Embed another page's LIVE content into a host page (it updates when the source
changes, not a static copy). A page can be flagged a template for discovery in
the picker; any accessible page can be embedded.
Server:
- migrations: pages.is_template (+ partial index) and page_template_references
(whole-page back-refs); db.d.ts/entity types hand-merged (db.d.ts is curated).
- POST /pages/toggle-template (CASL Edit) flips is_template; is_template is
returned by findById + the sidebar tree select so the tree menu label
reflects state. Search suggestions gain an onlyTemplates filter for the picker.
- POST /pages/template/lookup ({sourcePageIds[]}, <=50): returns each accessible
source's {title, icon, slugId, content, sourceUpdatedAt} with comment marks
stripped (same access path as transclusion: filterViewerAccessiblePageIds;
inaccessible -> no_access, missing -> not_found; error path -> not_found, never
raw content).
- reference sync (collectPageEmbedsFromPmJson + syncPageTemplateReferences) on
the Yjs save hook; duplicatePage remaps pageEmbed.sourcePageId + inserts refs.
Known MVP gap: REST content updates don't resync refs (lookup uses in-doc ids).
Client:
- pageEmbed node (editor-ext, registered in BOTH client + server schemas);
read-only NodeView with a batching lookup; '/Embed page' slash + template
picker (self-embed prevented); 'Make/Unset template' in the tree node menu.
- Cycle guard: an ancestry-chain context + depth cap (5) render a 'circular
embed' placeholder instead of recursing.
- Public shares show a placeholder (no public lookup in MVP).
MVP excludes (follow-ups): public-share lookup, unsync->static copy, server-side
expansion for export/RAG, MCP schema mirror, point-in-time snapshots.
Implements docs/page-templates-plan.md (MVP, variant A).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reusable, workspace-shared agent roles for the built-in AI chat. A role is
a named persona (system-prompt instructions) + optional model override; a
chat is bound to a role at creation and applies it every turn.
Backend:
- migration 20260620T120000: ai_agent_roles table + ai_chats.role_id
(FK ON DELETE SET NULL); hand-merged types into db.d.ts/entity.types.ts
(db.d.ts is hand-curated here, full codegen would clobber it).
- core/ai-chat/roles: CRUD module. list = any workspace member; create/
update/delete = admin (Manage Settings ability, like ai-settings/mcp).
All repo queries scoped by workspace_id; soft-delete (deleted_at).
- buildSystemPrompt gains roleInstructions: role REPLACES the persona base
(admin prompt / DEFAULT_PROMPT) but SAFETY_FRAMEWORK + context are always
still appended.
- stream(): role resolved from ai_chats.role_id for existing chats (never
the request body -> no per-turn role swap); body.roleId only on creation.
Disabled (enabled=false) and soft-deleted roles fall back to universal.
- getChatModel(workspaceId, override): role model_config can swap model id /
driver; a driver without configured creds throws 503 with a clear message
naming the driver+role, resolved BEFORE response hijack.
Client:
- new-chat role picker (enabled roles only, default Universal assistant),
roleId sent only on the first message; role badge (emoji+name) in the chat
header and conversation list; admin Agent-roles management section in
Settings -> AI (add/edit/delete, MCP-form pattern).
Tests: ai-chat.prompt.spec (role layering + safety always present, incl.
jailbreak); ai.service.spec (override on unconfigured driver -> 503).
Implements docs/ai-agent-roles-plan.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- openai provider: use .chat() (Chat Completions) instead of the default callable
(Responses API), which gateways reject on multi-turn -> 400.
- updateAiProviderSettings: assemble settings.ai.provider via jsonb_build_object
with ::text-cast bound params + jsonb_typeof self-heal (postgres.js was
double-encoding it into an array; the ::text cast avoids 'could not determine
data type of parameter').
- chat agent: drop the hard maxOutputTokens cap (truncated complex tool calls);
keep a tiny cap only on the test-connection ping.
- testConnection + chat stream: surface the real provider error (statusCode+message)
to logs and the UI instead of generic masks; never log the API key.
- chat UI: typing indicator, incremental streaming render, tool 'running' status, Stop.
Also bundled (prior uncommitted ai-chat work):
- history 'AI agent' provenance badge; vector RAG (pgvector image + page_embeddings
+ AI_QUEUE indexer + space-scoped semanticSearch); external MCP servers backend
(@ai-sdk/mcp client, SSRF IP-pinning, encrypted headers, admin CRUD/Test);
yjs duplicate-instance fix via pnpm patch (single CJS instance server-side).
WIP checkpoint of the gitmost AI-chat backend (plan stages A + B1 + B3a).
The agent acts under the requesting user's JWT (Docmost CASL enforces page
access); the external service-account /mcp endpoint is untouched.
LLM provider config (A2-A4):
- integrations/crypto: AES-256-GCM SecretBoxService (key derived from APP_SECRET,
per-record salt/iv; clear error on rotation instead of crashing).
- ai_provider_credentials table/repo/types: encrypted API key stored outside
workspace settings/baseFields, write-only (never returned by any endpoint).
- integrations/ai: per-workspace AI SDK v6 provider driver (openai/gemini/ollama),
admin-gated GET(masked)/PATCH(write-only key)/Test endpoints; settings.ai.provider
holds non-secret config incl. systemPrompt. Removed unused AI_* env getters (DB is
the single source of truth).
Chat module (A1, A5-A8):
- ai_chats/ai_chat_messages repos (workspace-scoped, soft-delete, tsv never selected).
- core/ai-chat: CRUD + POST /ai-chat/stream (Fastify hijack + AI SDK v6
pipeUIMessageStreamToResponse, abort on disconnect, persist user/assistant msgs).
- Agent loop: streamText + stepCountIs(8); read tools searchPages/getPage via a
per-request DocmostClient over loopback REST under the user's minted access token.
- Gate settings.ai.chat (+ 503 when provider unconfigured); buildSystemPrompt with a
non-removable safety/anti-prompt-injection framework. Per-user rate limit.
Per-user auth (B1):
- @docmost/mcp DocmostClient gains an additive getToken variant (carry a user JWT,
re-fetch on 401) and exports DocmostClient; the email/password service-account path
(external /mcp, stdio) is unchanged.
Agent-edit provenance backbone (B3a):
- Migration: pages/page_history (last_updated_source, last_updated_ai_chat_id) and
comments (created_source, ai_chat_id, resolved_source).
- Signed actor/aiChatId claim in the collab token; onAuthenticate propagates it,
onStoreDocument writes it with a sticky agent marker, saveHistory copies it.
Migrations auto-run on boot (additive). Write tools, frontend, RAG and external MCP
servers are not in this checkpoint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* stripe init
git submodules for enterprise modules
* * Cloud billing UI - WIP
* Proxy websockets in dev mode
* Separate workspace login and creation for cloud
* Other fixes
* feat: billing (cloud)
* * add domain service
* prepare links from workspace hostname
* WIP
* Add exchange token generation
* Validate JWT token type during verification
* domain service
* add SkipTransform decorator
* * updates (server)
* add new packages
* new sso migration file
* WIP
* Fix hostname generation
* WIP
* WIP
* Reduce input error font-size
* set max password length
* jwt package
* license page - WIP
* * License management UI
* Move license key store to db
* add reflector
* SSO enforcement
* * Add default plan
* Add usePlan hook
* * Fix auth container margin in mobile
* Redirect login and home to select page in cloud
* update .gitignore
* Default to yearly
* * Trial messaging
* Handle ended trials
* Don't set to readonly on collab disconnect (Cloud)
* Refine trial (UI)
* Fix bug caused by using jotai optics atom in AppHeader component
* configurable database maximum pool
* Close SSO form on save
* wip
* sync
* Only show sign-in in cloud
* exclude base api part from workspaceId check
* close db connection beforeApplicationShutdown
* Add health/live endpoint
* clear cookie on hostname change
* reset currentUser atom
* Change text
* return 401 if workspace does not match
* feat: show user workspace list in cloud login page
* sync
* Add home path
* Prefetch to speed up queries
* * Add robots.txt
* Disallow login and forgot password routes
* wildcard user-agent
* Fix space query cache
* fix
* fix
* use space uuid for recent pages
* prefetch billing plans
* enhance license page
* sync
* Work on mentions
* fix: properly parse page slug
* fix editor suggestion bugs
* mentions must start with whitespace
* add icon to page mention render
* feat: backlinks - WIP
* UI - WIP
* permissions check
* use FTS for page suggestion
* cleanup
* WIP
* page title fallback
* feat: handle internal link paste
* link styling
* WIP
* Switch back to LIKE operator for search suggestion
* WIP
* scope to workspaceId
* still create link for pages not found
* select necessary columns
* cleanups
* integrate websocket redis adapter
* use APP_SECRET for jwt signing
* auto migrate database on startup in production
* add updatedAt to update db operations
* create enterprise ee package directory
* fix comment editor focus
* other fixes