gitmost

Author	SHA1	Message	Date
claude code agent 227	9732bc888c	chore(offline-sync): tighten SW denylist, drop dead /api cache + http localhost CORS - Service worker (vite-plugin-pwa/Workbox): add /share/, /mcp, and /robots.txt to navigateFallbackDenylist so the SPA app-shell never shadows those server-rendered routes (they mirror the server static-serve exclude list — the share SEO/OG HTML, the MCP endpoint, and robots.txt must come from the server). - Remove the dead /api GET NetworkFirst Workbox rule (api-get-cache): offline reads are served by the persisted TanStack Query cache (IndexedDB) + y-indexeddb, never by an SW HTTP cache, so caching GET /api only risked stale responses. All /api is now NetworkOnly. clearOfflineCache still deletes any legacy api-get-cache defensively (comment updated to note it is no longer created). - CORS: drop the cleartext 'http://localhost' native-WebView origin. The Capacitor shell uses the secure scheme (capacitor.config cleartext:false, default Android scheme https, iOS hosted via CAP_SERVER_URL), so no native client uses it; allowing it only widened the credentialed-CORS surface. Keeps capacitor://, ionic://, and https://localhost. - docs/mobile-bootstrap.md: replace the inaccurate 'hand-rolled service worker' description with the real Workbox generateSW setup (prompt registration via virtual:pwa-register, production-only, denylist, NetworkOnly, RQ/y-indexeddb offline reads) and drop http://localhost from the CORS origins list. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:53:50 +03:00
claude code agent 227	98dac998d2	fix(offline-sync): bridge collaborative tree updates across processes via Redis In 2-process deployments (COLLAB_URL set) the standalone collab process runs Hocuspocus onStoreDocument, which emits PAGE_UPDATED with a treeUpdate snapshot on a collaborative rename. But CollabAppModule has no WsModule, so PageWsListener (the broadcaster) only exists in the API process — the collab-originated tree update never reached clients, and other users' sidebars/breadcrumbs went stale. Bridge it over Redis pub/sub with the API process as the single broadcast authority: - PageTreeBridgePublisher (registered ONLY in CollabAppModule) listens for PAGE_UPDATED and, when a treeUpdate snapshot is present, publishes it to the collab:tree-update channel. Gated exactly like PageWsListener so content-only saves never publish noise. - PageTreeBridgeSubscriber (registered in WsModule, API process) subscribes on a dedicated duplicated connection and re-broadcasts each snapshot through WsTreeService.broadcastPageUpdated — the same restriction-aware emitTreeEvent path, so authorization is preserved. Double-broadcast is prevented by module placement: the publisher lives only in the standalone collab process's root module, so in single-process mode it is never loaded and the local PageWsListener stays the sole broadcaster. The bridge is optional and fail-safe: publish errors, malformed payloads, broadcast rejections, an unlistened 'error' on the subscriber connection, and a subscribe() failure at boot are all caught and logged, never crashing or blocking the process. NOTE: assumes a single API broadcaster; horizontal API scaling would need a consumer-group/leader-election instead of fan-out pub/sub. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:53:50 +03:00
claude code agent 227	91d551fd4c	fix(offline-sync): make legacy ydoc self-heal atomic and crash-safe onLoadDocument rebuilds a legacy page (page.content, no page.ydoc) into a Yjs doc and seeds its 'title' fragment from the page.title column. Both TiptapTransformer.toYdoc and buildTitleSeedYdoc mint fresh Yjs client-ids on every call, so the heal must run exactly once per page. Three holes let it run twice (or lose a write): - Duplication trap: the initial page read took no row lock, so two processes (the API process via openDirectConnection and the standalone collab process) could both observe ydoc IS NULL and each rebuild with different client-ids; a long-offline client merging an earlier rebuild then duplicates all content. - Lost-update: persistYdoc wrote updatePage({ydoc}) outside any transaction, so it could clobber a concurrent onStoreDocument write (which does take a lock). - Swallowed write errors: a failed heal-persist was logged but the unpersisted fresh-client-id doc was returned anyway, silently re-arming the trap. Fix: the heal now runs in healUnderLock, which re-reads the row FOR UPDATE inside one transaction and re-validates under the lock — if ydoc is now present it adopts it (no rebuild, no write), otherwise it rebuilds, seeds, and persists the ydoc in the SAME transaction. The healthy hot path still loads with no lock and no write. Failure handling surfaces instead of hiding: a rebuild-persist failure refuses the load (re-throw + error log) so an unpersisted rebuild is never handed out, while a seed-only persist failure serves the existing healthy ydoc without the unpersisted seed (non-fatal). Removed the non-transactional persistYdoc. Deliberately does NOT use a fixed clientID: identical client-ids across docs built from differing content violate Yjs per-actor uniqueness and corrupt worse than the trap; serialization under the row lock is the correct fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:53:50 +03:00
claude code agent 227	baeaf328c7	fix(offline-sync): keep page titles in sync between REST and Yjs Title now lives in the page's Yjs 'title' fragment, but two paths corrupted it: - Rename-revert: a REST/MCP title change wrote only the page.title column, never the Yjs fragment, so the next editor open replayed the stale Yjs title and reverted the rename. PageService.update now mirrors the new title into the Yjs 'title' fragment via CollaborationGateway.writePageTitle, which goes through openDirectConnection directly (Redis-independent: works with COLLAB_DISABLE_REDIS and in single-process deployments, unlike the Redis-routed handleYjsEvent path). The write is best-effort: a Yjs failure is logged and never rolls back the committed column write. Agent provenance (actor/aiChatId) is threaded into the store context. - Untitled-on-open: an empty/just-initialized 'title' fragment clobbered a non-empty page.title to '' on open. onStoreDocument now treats the title as changed only when the extracted text is non-empty, covering both the title-only and body+title save branches. Empty-retitling via collab is intentionally impossible; the REST DTO is the place to enforce non-empty. writeTitleFragment does a full clear+seed of the 'title' fragment (no duplication/concatenation) and leaves the body fragment intact. Removed the dead useTreeMutation.handleRename path. Adds unit tests for writeTitleFragment, the gateway write, the anti-empty-clobber guard, and agent provenance. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:53:50 +03:00
claude_code	2786ee4699	fix(offline,server,docs): apply PR #116 review findings to offline-sync Carries the still-applicable findings from the PR #116 review into PR #120, since #120 includes the mobile-bootstrap commit. CORS hardening (removing the unconditional localhost/capacitor origins) is intentionally left out of scope. Service worker routing (latent bug fix + testability): - vite.config.ts: anchor Workbox path matching to a segment boundary (^/<seg>(/\|$)) instead of startsWith, so siblings like /apidocs, /collaborators, /socket.iox are no longer mis-routed as API/realtime and forced NetworkOnly; align navigateFallbackDenylist with the same anchors. - new apps/client/src/pwa/sw-strategy.ts holds the canonical predicates (isApiPath, isCollabOrSocketPath) + unit tests; the vite.config regexes mirror it inline (Workbox generateSW serializes urlPattern fns standalone, so they cannot import the module). Server CORS (R1 extraction + coverage): - extract buildCorsAllowlist / isOriginAllowed into cors.util.ts with unit tests (evil-origin rejected, WebView/no-Origin allowed); main.ts rewired to use them with byte-for-byte identical behavior. Privacy — clear offline cache on logout: - new clear-offline-cache.ts purges the persisted query cache (idb-keyval gitmost-rq-cache), the Yjs page.* IndexedDB databases, and the service-worker api-get-cache; wired into handleLogout (best-effort, before the redirect) so a previous user's private data does not linger locally. Conventions & docs: - prettier fixes on main.ts and login.dto.ts. - CHANGELOG: document offline reading, returnToken opt-in, optional Swagger, new env vars, logout cache-clear, and the CORS open->allowlist breaking change. - docs/mobile-app-plan.md: correct the now-false §2.4 claims and update the §12 checklist (native cap add ios left unchecked — generated locally, gitignored). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:48:35 +03:00
claude_code	abd7fcc5b8	test(server): port missing returnToken/env edge cases from #116 PR #120 rewrote auth.controller.spec.ts and environment.service.spec.ts in a leaner style but dropped several edge cases that PR #116 covered. Port the gaps so the server coverage matches the original review intent: - auth.controller: returnToken=false must behave like the omitted case (no token in the response body, cookie still set) — guards an `!== undefined`-style regression. - environment.getCorsAllowedOrigins: empty string -> [], single origin, and leading/trailing/duplicate commas with spaces -> trimmed list. - environment.isSwaggerEnabled: mixed-case "True" -> true; "false"/""/"1" -> false. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:48:08 +03:00
claude_code	a263748d02	test(offline): add reviewer-requested coverage for offline-sync core logic Adds the unit tests called out in the PR #120 review (test-coverage aspect). No production logic changes — the only non-test edit is exporting the already-injectable warmInfiniteAll helper so it can be unit tested. Server (Jest): - persistence.extension.spec.ts: onStoreDocument classification matrix (no-op / title-only / body+title / body-only), onLoadDocument seed + persist gating (early-return, page-null, ydoc seed, already-seeded no-persist, legacy content->ydoc), and seedTitleFragment 4-branch guard. - collaboration.util.spec.ts: buildTitleSeedYdoc round-trip. - environment.service.spec.ts: getCorsAllowedOrigins / isSwaggerEnabled. - auth.controller.spec.ts: login returnToken opt-in branch. Client (Vitest): - query-persister.test.ts: shouldDehydrateOfflineQuery status + allowlist gates and OFFLINE_PERSIST_ROOTS membership. - is-capacitor.test.ts: isCapacitorNativePlatform platform detection. - make-offline.test.ts: warmInfiniteAll cursor walk / maxPages / error swallow, and warmPageYdoc settle-once + timeout + teardown. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:48:08 +03:00
claude_code	b6d6de265b	feat(mobile): bootstrap mobile app (PWA + Capacitor + backend auth/CORS) Implements the §12 bootstrap from docs/mobile-app-plan.md. Backend (§6): - auth: optional returnToken flag on login returns the JWT in the body (data.authToken) for native Keychain/Keystore + Bearer; web cookie flow unchanged. - main.ts: explicit CORS allowlist (APP_URL + CORS_ALLOWED_ORIGINS env + Capacitor WebView origins), credentials enabled, replaces open enableCors(). - optional OpenAPI/Swagger at /api/docs behind SWAGGER_ENABLED. - env: CORS_ALLOWED_ORIGINS, SWAGGER_ENABLED, CAP_SERVER_URL. PWA: - manifest metadata, hand-rolled service worker (network-first nav, SWR assets, never intercepts /api,/socket.io,/collab), prod-only registration, apple-touch-icon. Capacitor: - capacitor.config.ts (webDir apps/client/dist; iOS via CAP_SERVER_URL to avoid bundling the AGPL client in the .ipa, see plan §9), cap:* scripts, deps, .gitignore for native dirs. - docs/mobile-bootstrap.md documenting what is done and the remaining manual steps (cap add ios/android, APNs/FCM, stores). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:47:38 +03:00
claude_code	a1fb828907	feat(offline): PWA shell, Yjs-backed titles, and offline read cache (M0–M2) Implements docs/offline-sync-plan.md milestones M0–M2. M0 (PWA shell): - Add vite-plugin-pwa (generateSW, registerType: 'prompt', manifest:false); NetworkOnly for /api,/collab,/socket.io, NetworkFirst for GET /api, navigateFallback to index.html. - Register SW via useRegisterSW with a Mantine update prompt; skip registration inside Capacitor native WebView (is-capacitor guard). M1 (harden CRDT body + title into Yjs): - Lift the per-page Y.Doc/Hocuspocus providers into a shared hook+context so body and title editors share one doc. - Move the page title into a dedicated 'title' Yjs fragment (CRDT, offline- tolerant); drop the REST title save. Server persists the title fragment to page.title and seeds it for legacy pages (empty-fragment guard); a collab rename emits a treeUpdate so other users' tree/breadcrumbs refresh. - Persist the rebuilt ydoc on the content->ydoc path to neutralize the Yjs duplication trap. Add a 3-state sync indicator. M2 (offline read/navigation): - Persist React Query to IndexedDB (idb-keyval persister, version buster, selected roots only). - "Make available offline" action warms page, space, tree (root+ancestors+ children) and comments under exact hook keys, plus the page ydoc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:47:09 +03:00
claude_code	fad1aa0501	fix(db): move share-aliases migration spec out of migrations/ The #205 share-aliases feature placed share-aliases.migration.spec.ts inside src/database/migrations/. Kysely's FileMigrationProvider loads EVERY file in that folder as a migration, so `migration:latest` imported the test file and crashed with "ReferenceError: describe is not defined" (no Jest globals under tsx). That broke the migration step shared by the e2e-server, e2e-mcp and integration-test (test/test) jobs. Move the spec one level up to src/database/ (matching the existing src/database/jsonb-bind.spec.ts convention) so the migration runner no longer sees it, and fix its relative imports (./migrations/... and ./types/...). Jest still picks it up via the src/*/.spec.ts test glob. Verified locally: 3 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 20:14:31 +03:00
vvzvlad	13589b3973	Merge pull request 'feat(share): custom /l/:alias pretty links (share_aliases table) (#205 )' (#214 ) from feat/205-share-aliases into develop Reviewed-on: #214	2026-06-26 20:00:50 +03:00
claude_code	2e72a24d13	test(e2e): silence ts-jest allowJs warnings for editor-ext .js The e2e transform matches .js (required so ESM-only node_modules like nanoid/@sindresorhus get transpiled), which also sweeps in editor-ext's prebuilt CommonJS dist/*.js. ts-jest then warns "Got a .js file to compile while allowJs is not set to true" for each footnote file. The .js match cannot be dropped without reintroducing the ESM load errors, so enable allowJs for ts-jest via an inline tsconfig override (merged with apps/server/tsconfig.json — decorators/paths/module stay intact). Verified locally: 0 allowJs warnings, app still compiles and boots to the Redis connection (no DI/metadata regressions). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 19:45:37 +03:00
claude_code	aad0a37cfd	0.94.1 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 19:33:57 +03:00
claude_code	e4b46ddbfc	test(e2e): make server e2e actually boot (ESM chain + Fastify adapter) The previous jest-config fix let the module graph load further and exposed two more reasons the server e2e never passed since it was added: 1. ESM transform chain: AppModule pulls in editor-ext -> @tiptap -> @sindresorhus/slugify -> @sindresorhus/transliterate / escape-string-regexp, plus p-limit -> yocto-queue — all ESM-only. Extend the e2e transformIgnorePatterns whitelist to transform them (scoped packages need both the pnpm `@scope+name` and nested `@scope/name` path forms, hence `@sindresorhus[+/][a-z0-9-]+`). Verified locally: the graph now fully transforms and resolves. 2. Wrong HTTP adapter: Docmost runs on Fastify (main.ts uses FastifyAdapter) and does not depend on @nestjs/platform-express, but the scaffold test used the default createNestApplication() (Express) and died with "@nestjs/platform-express package is missing". Switch the test to FastifyAdapter + getInstance().ready(), close in afterEach. Verified locally: createNestApplication + app.init() now proceed to the live Redis/Postgres connection (the infra CI provides via services + migrations). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 19:13:22 +03:00
claude_code	deeec50b5f	test(e2e): fix remaining server config and mcp image failures Follow-up to the first e2e fix: with nanoid/editRes.edits resolved, the suites failed one layer deeper. Both layers were never green since the e2e jobs were added (non-blocking in CI), so the failures had stacked up. server e2e (jest-e2e.json) — align module resolution/transform with the working unit/integration jest configs so AppModule's full import graph loads: - moduleFileExtensions: add "tsx" (React-Email .tsx templates are pulled in via the auth controller chain). - transform: ^.+\.(t\|j)s$ -> ^.+\.(t\|j)sx?$ so .tsx is transformed. - moduleNameMapper: add ^src/(.*)$ -> <rootDir>/../src/$1 (code imports via the absolute 'src/...' alias). Verified locally: the module graph now fully resolves (only env vars, supplied by CI, remain). mcp e2e (test-e2e.mjs) — insert_image/replace_image accept only http(s) URLs the server fetches; the test passed local file paths and died with "Invalid image URL". Serve the PNG bytes over a throwaway 127.0.0.1 HTTP server (the Docmost server runs on the same CI host) and pass URLs. The featPng negative test is untouched: replaceImage checks the attachmentId and throws before fetching, so its local path is never validated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 18:54:42 +03:00
claude_code	7eefdad512	test(e2e): fix failing server and mcp e2e suites Two unrelated CI failures on the 0.94.0 release PR: - server e2e: jest-e2e.json lacked transformIgnorePatterns, so the ESM-only nanoid@5 package was loaded as CommonJS and crashed with "Cannot use import statement outside a module". Add the same node_modules whitelist already present in the unit and integration jest configs (nanoid\|uuid\|image-dimensions\|marked\|happy-dom\|lib0). - mcp e2e: test-e2e.mjs read editRes.edits, but editPageText() returns the per-edit results under `applied` (not `edits`), so editRes.edits was undefined and .every() threw. Read editRes.applied instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 18:34:56 +03:00
claude_code	378d8b676b	0.94.0 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 18:15:24 +03:00
vvzvlad	580f7bd5bb	Merge pull request 'Батч: бейдж контекста (#189 ) + e2e в CI (#187 ) + inline-тест MCP (#170 )' (#197 ) from batch/issues-189-187-170 into develop Reviewed-on: #197	2026-06-26 18:09:47 +03:00
claude code agent 227	0643cd1d82	test(share): exercise 70-char title-slug clamp in alias redirect The controller's buildPageSlug truncates the page title via `title?.substring(0, 70)` before slugifying, but no test drove that branch (the only titled case was 16 chars). Add a resolvable-alias case with a 119-char title whose 70-char boundary falls mid-word and assert the 302 target's slug reflects only the first 70 characters. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 18:05:21 +03:00
claude code agent 227	ba5cd02439	Address PR #197 review: test coverage + dedup + CI log capture Code-review follow-ups (Approve-with-comments) for batch #197 (context badge #189 / e2e in CI #187 / inline MCP test #170): - server: extract the duplicated chatContextWindow ::text->positive-int coercion (resolve() + getMasked()) into an exported parsePositiveInt helper and unit-test its branches (200000/1.9/0/-5/""/abc/undefined), closing the untested read-path gap. - client: merge the two backward scans over messageRows into one pure, exported selectContextBadge helper (numerator and denominator still taken from the most recent row carrying EACH value) and unit-test the different-rows and fresh-zero-doesn't-shadow cases. - client: extract the MCP "Test" button tristate presentation into a pure mcpTestButtonView helper (collapses the two parallel if/else chains) and unit-test idle/ok-with-tools/ok-no-tools/failed label+tooltip branches. - ci: redirect the backgrounded prod server's stdout/stderr to a log file in e2e-mcp and cat it on failure, so a start-up crash is diagnosable instead of surfacing only as the generic health timeout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 17:24:29 +03:00
claude code agent 227	1043fe3b51	test(share): cover alias controllers; address PR #214 review Add the two blocking test-coverage specs requested in the PR #214 review and clear the cheap non-blocking items. Must-fix: - share-alias-redirect.controller.spec.ts: routing/leak guard for the public GET /l/:alias resolver (modeled on share-seo.controller.routing.spec). Pins 302-to-canonical on a hit; SPA index without a 302 for unknown/dangling/ unreadable aliases and a null workspace (no name-existence leak); defensive percent-decoding treated as unknown; self-hosted findFirst vs subdomain findByHostname workspace resolution; 404 when no built client index exists. - share-alias.controller.spec.ts: authz gates with mocked PageRepo/ShareService/ ShareAliasService/PageAccessService. Covers cross-workspace/nonexistent page -> NotFoundException, validateCanEdit, resolveReadableSharePage null -> BadRequestException, isSharingAllowed false -> ForbiddenException, set happy path delegation, remove() of a dangling alias (pageId null) skipping validateCanEdit but still deleting, and for-page validateCanView. Cheap review items: - Remove dead Logger import/field from ShareAliasRedirectController. - Remove dead PagePermissionRepo import/dependency from ShareAliasController. - Register the new share-alias UI strings in en-US and ru-RU catalogs. - Add an [Unreleased]/Added CHANGELOG entry for /l/:alias (#205). - Drop the tautological boilerplate assertions from the migration spec (exports up/down; runtime checks of typed entity literals). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 17:22:29 +03:00
claude code agent 227	eb5c8e6611	refactor(ai-chat): simplify share onFinish token extraction and cover the fallback (#159 ) onFinish always receives a totalUsage object, so the `?? {}` guard and optional chaining were dead. Extract the field-level extraction into a recordTurnUsage method (totalTokens, else input+output) and unit-test that recordShareTokens receives the summed value when totalTokens is absent and the authoritative total when present. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 17:01:02 +03:00
claude code agent 227	fdeede003b	feat(share): custom /l/:alias pretty links (share_aliases table) (#205 ) Add a retargetable, human-readable vanity link namespace /l/<alias> that sits alongside the untouched /share/... routes. - New share_aliases table (workspace-scoped, UNIQUE(workspace_id, alias), page_id nullable ON DELETE SET NULL so the address outlives its target). - ShareAliasRepo + ShareAliasService (create / no-op / 409 reassign guard / availability / request-time readable-target resolution through the single existing share boundary). - Public ShareAliasRedirectController (GET /l/:alias) issues a 302 (never 301, the target is mutable) to the canonical /share/:key/p/:slug page; unknown / dangling / no-longer-readable aliases serve the SPA index with no leak. 'l/:alias' excluded from the global /api prefix. - Authenticated ShareAliasController (set/remove/availability/for-page). - Shared ASCII-only normalize/validate util (server + client copies). - Client: Custom address block in the share modal (live normalize + debounced availability + copy + reassign confirmation dialog). - Unit tests: util, repo SQL-shape, service semantics, migration/entity sanity (server jest) + client alias util (vitest). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 06:28:26 +03:00
claude code agent 227	1d610b3a62	fix(ai-chat): add per-workspace rolling-day token budget for anonymous share assistant (#159 ) The anonymous public-share assistant only capped the COUNT of requests (100/hour/workspace), not their cost. One accepted turn runs the agent loop up to stepCountIs(5), re-sending the whole client-held transcript as input on every step, while maxOutputTokens caps only the output; the request window is hourly with no daily ceiling, so a steady stream at the cap sustains ~24x its count per day. Counting requests therefore does not bound the owner's LLM bill (red-team finding #5). Add a second cost contour: a cluster-wide, sliding-window per-workspace TOKEN budget over a rolling day. It is checked read-only BEFORE a turn streams (429, no request slot consumed, nothing spent) and the turn's real usage (totalUsage: input re-sent per step + output, summed across all steps) is recorded once it finishes via streamText onFinish. Fails closed on the check (deny when Redis can't prove we're under budget); best-effort on the record. Env-overridable via SHARE_AI_WORKSPACE_TOKEN_BUDGET_PER_DAY (default 1M/day). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 06:23:48 +03:00
claude code agent 227	770ba70541	fix(collab): retry transient store failures so autosave edits aren't lost (#206 ) persist-1: onStoreDocument wrapped the page write in a try/catch that only logged and swallowed the error, then resolved "successfully". hocuspocus destroys/unloads the in-memory Y.Doc right after the hook resolves (the only copy of the latest edit), so a transient DB error (deadlock, serialization failure, dropped connection) silently lost the edit. Worse, the post-store branch ran on the partially-assigned `page`, broadcasting a phantom "page.updated" and enqueueing a history snapshot for content never written. Wrap the write in a small bounded retry (3 attempts) so the save is re-attempted while we still hold the doc, and clear `page` on failure so the success-only side effects never report a save that didn't happen. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 06:05:28 +03:00
claude code agent 227	c919d4f636	fix(page): copy shared attachments for every referencing page on duplicate (#206 ) attach-1: when the same attachmentId was referenced by more than one page in a duplicated subtree, the per-attachmentId map held only a single copy entry, so the last page processed clobbered the others. The downstream ownership guard (`attachment.pageId !== oldPageId`) then matched at most one page and skipped the lone DB row entirely: no blob copied, no new row, every copy's image 404'd. Key the map to a list of entries and copy one blob/row per referencing page; drop the now-incorrect ownership guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 05:57:46 +03:00
claude code agent 227	c4807022f2	fix(page): add cycle/depth guard to recursive tree-traversal CTEs (#207 ) getPageBreadCrumbs (ancestor CTE) and forceDelete (descendant CTE) used withRecursive + unionAll with no CYCLE clause or depth cap. If a parent/child cycle already exists in the data (e.g. one slipped in via the #7 TOCTOU race), both queries loop forever — hang / statement timeout. Worse, the move guard itself runs the ancestor CTE, so a cycle would disable the very guard meant to prevent it (#207 #8). Add a depth counter bounded by MAX_PAGE_TREE_DEPTH to both recursive CTEs; the walk stops at the cap, so a cycle yields a bounded result instead of hanging. Real page trees are only a few levels deep, so the cap never truncates a legitimate result. getPageBreadCrumbs selects an explicit column list (not selectAll) so the internal depth counter never leaks into the breadcrumb shape. Adds an integration test that seeds an A<->B cycle directly and asserts both getPageBreadCrumbs and forceDelete return bounded / complete under a short connection-level statement_timeout instead of hanging. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 05:48:01 +03:00
claude code agent 227	00ca4ff3d6	fix(page): make movePage cycle-check + update atomic to prevent concurrent A<->B cycles (#207 ) The server-side move cycle guard (getPageBreadCrumbs) and the move UPDATE ran as two separate, unlocked statements. Two concurrent moves ("A under B" and "B under A") could each read the same pre-write acyclic snapshot, both pass the guard, then persist A.parentPageId=B AND B.parentPageId=A — a parent/child cycle (TOCTOU, #207 #7). Run the cycle check and the UPDATE inside one transaction (executeTx) guarded by a per-space advisory lock (pg_advisory_xact_lock, held until COMMIT) so all moves within a space serialize: the second mover blocks until the first commits and then sees the freshly written parent, so its guard rejects the cycle. getPageBreadCrumbs gains an optional trx so the check runs on the locked snapshot. Adds an integration test driving two opposing concurrent movePage calls and asserting no cycle ever persists and exactly one move is rejected. Updates the movePage unit-test stubs for the new transactional path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 05:43:57 +03:00
claude code agent 227	ef7d04d1e7	fix(ai-chat): clearer tool-call validation error for dropped pageId in parallel batches (#190 ) In-app AI-chat tools used bare zod schemas, so when the model dropped a required arg (typically pageId) in a parallel/batch tool call, the AI SDK forwarded zod's raw "expected string, received undefined" text to the model — not actionable. Add a centralized modelFriendlyInput(shape) wrapper that keeps the exact JSON Schema contract (required/description/constraints via z.toJSONSchema draft-7) but replaces the raw zod text with a message naming each missing/invalid parameter and reminding the model not to drop ids like pageId in parallel batches. No value is guessed/backfilled (cf. #159). Applied to every in-app tool: the sharedTool() builder and all inline inputSchema in ai-chat-tools.service.ts, plus public-share-chat-tools.service.ts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 05:30:07 +03:00
claude_code	9b61024b95	feat(ai-chat): header badge shows current/max context, max from AI settings (#189 ) The floating chat window's header badge flipped meaning — a live per-turn token counter while streaming, the persisted context size at rest — so it "reset to 1" on each prompt and conflated two different numbers. Replace it with a stable "current / max" context badge (e.g. `572 / 200k`). The live "Thinking · N tokens" inside the chat body stays; only the duplicate live counter is removed from the header. Max comes from a new admin setting "Context window (tokens)". The server resolves it and attaches `maxContextTokens` to the completed assistant turn's metadata (next to contextTokens), so the badge needs no client-side model resolution and this survives public shares / per-role models. Server: - ai.types: chatContextWindow on AiProviderSettings + PROVIDER_SETTINGS_KEYS + ResolvedAiConfig + MaskedAiSettings. - workspace.repo: chatContextWindow in AI_PROVIDER_SETTINGS_ALLOWED (parity). - update-ai-settings.dto: @IsInt @Min(0) chatContextWindow. - ai-settings.service: coerce the ::text-stored value to a positive int in resolve()/getMasked(). - ai-chat.service: flushAssistant writes metadata.maxContextTokens (>0); the completed turn passes resolved.chatContextWindow. Client: - ai-chat.types: maxContextTokens on the message-row metadata. - ai-chat-window: read maxContextTokens; render "current [/ max]"; drop the liveTurnTokens state/branch and the onLiveTurnTokens prop; new tooltip. - chat-thread: remove the live-turn-token throttle effect and plumbing. - count-stream-tokens: drop the now-dead liveTurnTokens()/types; keep estimateTokens. - settings: chatContextWindow on IAiSettings(+Update) + a NumberInput in the AI provider settings form. i18n: add the badge/settings keys (en, ru); remove the two now-unused keys. Tests: flushAssistant maxContextTokens, DTO validation, trim token tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 22:39:09 +03:00
claude code agent 227	ed3b65c36b	Merge remote-tracking branch 'gitea/develop' into batch/issues-2026-06-25 # Conflicts: # apps/server/src/core/ai-chat/ai-chat.service.spec.ts # apps/server/src/core/ai-chat/ai-chat.service.ts	2026-06-25 12:48:47 +03:00
claude code agent 227	aa7a115f66	refactor(review): address PR #186 re-review (approve-with-comments) Approve-with-comments re-review; no blockers. All 7 actionable points (8 is a forward-looking architecture note — recommendation A, keep as-is): 1. chat-markdown.util spec: restore parity coverage of the removed client spec — tool error state (+ errorText), unknown-tool fallback (`Ran tool <name>` en / `Выполнил инструмент <name>` ru), and the circular-output stringify catch. 2. findAllByChat row cap is now testable (injectable limit) + an int-spec proves truncation on a modest volume. 3. Stability: the per-step durability updates are SERIALIZED via a promise chain (stepUpdateChain) so they commit in step order — onlyIfStreaming already closed the finalize race, this closes inter-step ordering. 4. findAllByChat keeps the NEWEST messages on truncation (order DESC + reverse, like findRecent) and logs a warning with chatId, instead of silently dropping the newest tail. 5. The LABELS parity comment already references the real path (tool-parts.tsx / toolLabelKey) — confirmed accurate. 6. Removed the redundant 'off-by-one boundary' test (strict subset of the two adjacent prepareAgentStep cases). 7. Extracted the terminal-finalize dispatch into a shared `applyFinalize`, used by BOTH the service's finalizeAssistant and its test — the test now exercises the real path, not a copy, so a production drift fails it. Verified: server build + 325 ai-chat unit + 6 integration; prettier clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 12:28:35 +03:00
claude code agent 227	30c358a2f8	test(review): add the 4 new test-coverage points from PR #185 re-review The re-review's blocking/structural points (lease leak, dup-id guard test, body-before-title test, CHANGELOG, pg18, shared jsonb decoder) were already addressed in commit 24264ef; this adds the 4 genuinely-new coverage requests: - pt 6: `scrollToReference(id, index?)` exercised against a live editor DOM — selects the index-th `sup[data-footnote-ref][data-id]` occurrence, falls back to the first for out-of-range, returns false for an empty id (scrollIntoView stubbed). (#168) - pt 7: export `backlinkLabel` and pin the base-26 carry boundary (25->z, 26->aa, 27->ab, 51->az, 52->ba). (#168) - pt 8: integration fail-open — a PRESENT-but-corrupt tool_allowlist (jsonb string scalar holding non-array JSON) reads back as null ("no restriction"), covering normalizeRow's degrade branch. (#159 #172/#173) - pt 9: getFootnoteRefCount cache invalidation — adding a `[^a]` reference bumps the cached count 2 -> 3. (#168) Verified: editor-ext footnote 23; client structure 7 + tsc; server int 8. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 12:08:21 +03:00
claude code agent 227	ea61c96a7c	refactor(review): address PR #186 review (#183 — recency sweep, #174 export, tests, cleanups) 15-point review of the persistent-history PR. Architecture decisions: crash recovery = recency threshold; tool-label duplication = leave as-is. Must-fix: 1. Boot-sweep bounded by recency. sweepStreaming now also requires `updatedAt < now() - SWEEP_STREAMING_STALE_MS` (10 min), so a fresh replica's startup sweep can't abort a turn another replica is actively streaming (multi-instance deploy). Int-spec: a FRESH 'streaming' row is NOT swept, a STALE one IS. 2. Restore export during the FIRST streaming turn of a new chat (#174). The server chatId is now adopted EARLY (in-place, on the start-chunk metadata) via a new `onServerChatId` callback wired through use-chat-session → chat-thread, so `activeChatId` is set at turn start and the Copy button is live mid-first- turn (canExport = !!activeChatId). Hook tests for early/in-place/no-op adopt. 3. Cover finalizeAssistant's fallback-insert branch: extracted pure `planFinalizeAssistant(assistantId)` (update when id present, insert when the upfront insert failed) + a dispatch harness test for both arms. Tests: onModuleInit lifecycle spec (sweep called; throw → resolves + warns); int-spec updatedAt assertion → toBeGreaterThan. Cleanups: cap findAllByChat at 5000 rows; upfront-insert-failure log carries chatId+workspaceId; removed the now-dead buildPartialAssistantRecord (only the spec consumed it; shapes still pinned by the flushAssistant suite); controller passes `lang: dto.lang` (normalizeLang handles undefined); dropped a no-op `?? undefined` in errorOf; documented the content-column semantics change (concatenated step text, UI renders from metadata.parts); CHANGELOG [Unreleased] entry (#183, #174); reworded the stale LABELS parity comment. Verified: server build + 323 ai-chat unit + 5 integration; client tsc + 160 ai-chat unit; prettier clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:53:25 +03:00
claude code agent 227	f80276d41a	refactor(review): address PR #185 review (lease leak, tests, changelog, jsonb seam) 8-point multi-aspect review of the batch PR; security/regressions were clean. 1. Lease leak: the #180 reorder moved `toolsFor` (which leases external MCP clients, refCount+1) ahead of buildSystemPrompt + forUser, but the only release (closeExternalClients) was bound to the streamText callbacks. A throw in between leaked the lease (refCount stuck, undici sockets held until restart). Define closeExternalClients right after the lease and wrap buildSystemPrompt+forUser in try/catch that closes-then-rethrows. 2. Cover the patch_node/delete_node dup-id refusal (#159 #6): extract the guard into a pure `assertUnambiguousMatch` (node-ops) and unit-test 0/1/>1. 3. Regress the body-before-title order (#159 #10): mock-HTTP test (collab fails fast against a server with no WS upgrade) asserts /pages/update (title) is NEVER posted when the body write fails — for updatePage AND updatePageJson. 4. CHANGELOG [Unreleased]: #180, #168 (Added); #163 (Fixed). 5. Add the missing en-US i18n keys (Back to references / {{label}}). 6. Drop the duplicate content/empty/blank cases in ai-chat.prompt.spec.ts (they repeat the buildMcpToolingBlock unit tests); keep only sandwich placement + both-safety-copies. 7. CI Postgres pg16 -> pg18 (match docker-compose). 8. jsonb decode seam: shared `parseJsonbValue(value, guard)` in database/utils.ts holds the legacy double-encoding self-heal in one place; parseToolAllowlist / parseModelConfig keep only a type-guard. Verified: server build + 124 unit + 15 integration; mcp 311; prettier clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:36:01 +03:00
claude code agent 227	34c5b557ef	fix(share): SEO route must not leak a restricted page's title (#159 ) `ShareSeoController.getShare` resolved the inherited share with the RAW `getShareForPage`, which does NOT run the restricted-ancestor gate. So for a page shared with includeSubPages whose descendant is permission-restricted, the SEO route served that descendant's real title in <title>/og:title/twitter:title to anonymous visitors and crawlers — even though the content API returns 404 for it (red-team finding #3). Funnel the SEO path through the canonical `resolveReadableSharePage` boundary (the single place that checks `hasRestrictedAncestor`): a non-readable page now serves the plain SPA index with no meta. Also honour `isSharingAllowed` — a share whose workspace/space sharing toggle was flipped off after creation no longer leaks its title via SEO. Title comes from the server-resolved page; `buildShareMetaHtml` already emits robots=noindex when the share opted out of indexing. Tests (controller routing, fs spied at call time so bcrypt's native loader is untouched): non-readable page => plain index, no title; sharing-disabled => plain index; readable+indexing => title + og:title, no noindex; readable+no- indexing => noindex. Asserts getShareForPage is never called by the SEO path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:36:01 +03:00
claude code agent 227	59f0c8b22d	fix(ai-chat): validate the open page server-side so the agent edits the right one (#159 ) The client sends the "current page" as { id, title } in the request body and the server echoed BOTH verbatim into the system prompt context and the getCurrentPage tool. id and title are independently attacker/desync-controllable (two tabs, stale navigation), so openPage.id could point at page B while openPage.title said "Page A" — the model then reported "updated Page A" while it actually edited page B (CASL still allowed it; the user has access). Red-team finding #4. Resolve the open page ONCE against the DB via a new `resolveOpenPageContext`: workspace-scoped lookup + access check, returning the AUTHORITATIVE { id, title } (title from the DB row, never the client) or null (fail-closed) for a missing / foreign / inaccessible page. That validated value now feeds the system prompt, the getCurrentPage tool, AND the new-chat history origin (which previously did this validation inline, for the id only — now shared, and the title is fixed too). Tests: resolveOpenPageContext covers no-id, not-found, foreign-workspace, Forbidden, non-Forbidden-fault (fail-closed), the DB-title-wins-over-client case, and null-title coercion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:36:01 +03:00
claude code agent 227	77ccc596ea	feat(ai-chat): per-MCP-server instructions in the agent system prompt (#180 ) Admins can now give each EXTERNAL MCP server a free-text instruction ("how/ when to use this server's tools") that the agent receives in its SYSTEM PROMPT next to the tool descriptions — porting the built-in SERVER_INSTRUCTIONS idea to admin-configured servers. Trusted, admin-authored text (like a system prompt); NON-secret, so unlike headersEnc it IS returned in views/forms. - Migration: nullable `instructions text` on ai_mcp_servers (old rows = null = no guidance). Table type + repo insert/update (blank/whitespace -> null via blankToNull). DTO `@MaxLength(4000)`. Service threads it through McpServerView/toView. - mcp-clients: `McpServerInstruction { serverName, toolPrefix, instructions }` threaded through the toolset/cache/lease. Guidance is built ONLY for a server that actually connected AND contributed >=1 callable tool (the allowlist may filter all of them out) AND has non-blank text — so a guide never appears for tools the agent cannot call. Cached with the toolset, so an edit is picked up next turn via the existing CRUD cache invalidation. - System prompt: `buildMcpToolingBlock` renders an <mcp_tooling> block INSIDE the safety sandwich (after context, before the trailing SAFETY_FRAMEWORK) so it informs tool choice but cannot override the rules; each section is headed by the server's `prefix_*` namespace. Empty/blank -> block omitted. The caller (ai-chat.service) now builds the external toolset BEFORE the prompt and passes external.instructions; client-handle lifecycle (close-once) unchanged. - Client: instructions field in types + a Textarea (autosize, maxLength 4000) in the MCP-server form with a namespace-prefix hint; i18n (en/ru). Tests across every layer (prompt block placement + both SAFETY copies; view blank->null; buildEntry includes guidance only for connected+>=1-tool+non-blank; DTO MaxLength; repo + integration round-trip; service wiring). Delegated impl reviewed (APPROVE); applied the import-type follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:36:01 +03:00
claude code agent 227	1cfad1f6fb	fix(db): jsonb double-encoding follow-ups from PR #172 review (#173 ) PR #172 fixed the jsonb double-encoding for `tool_allowlist` but the same class of bug, and the same re-derived workaround, remained elsewhere. 1. model_config (agent roles): jsonbObject still used the buggy `::jsonb` bind, so `ai_agent_roles.model_config` round-tripped as a jsonb STRING SCALAR. The read-path `typeof === 'object'` check then failed and the model override was SILENTLY dropped (role fell back to the default model). Fixed to `::text::jsonb` and added `parseModelConfig` + `normalizeRow` so every read self-heals already-corrupted rows (no migration). 2. Centralized the write workaround as `jsonbBind()` in database/utils.ts — one implementation with one explanation of the quirk — replacing the per-repo `jsonbArray` (mcp) and `jsonbObject` (roles). 3. Integration coverage (the fix is a DB round-trip a unit test cannot see; the read-side parser MASKS a write regression): new ai-mcp-server-repo.int-spec asserts `jsonb_typeof(tool_allowlist)='array'` after insert + heals a seeded string-scalar row; ai-agent-roles-repo int-spec gains the same for `model_config` (`'object'` + heal). 4. Updated the stale `ai-mcp-servers.types.ts` comment (the driver returns a JSON string for legacy rows; the repo normalizes every read). 5. Fail-open logging: a corrupt tool_allowlist degrades to "no restriction" (agent gets ALL tools) — normalizeRow now warns (server id only, never contents) so the silent widening leaves a trace. 6. Simplified parseToolAllowlist (normalize the string once, then a single array-of-strings check) — identical behaviour, all 12 cases still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:36:01 +03:00
claude code agent 227	ae6faf3abc	fix(ai-chat): guard step-update vs finalize race with WHERE status='streaming' (#183 review) Review caught a real race: onStepFinish fires `updateStreaming()` fire-and- forget (not awaited), so the FINAL step's streaming UPDATE and the terminal `finalizeAssistant` UPDATE run as two concurrent statements on different pool connections — commit order is not guaranteed. If the late streaming update lands AFTER finalize, the completed row is clobbered back to status='streaming' with no usage/finishReason, and the next startup sweep then mis-marks the finished turn 'aborted'. Green unit/integration tests don't reproduce a cross-connection race. Fix: scope the per-step update with `onlyIfStreaming` → SQL `WHERE status='streaming'`. Once finalize has set a terminal status the late update matches zero rows and no-ops, regardless of commit order; finalize runs unguarded so it always wins. A cheap `if (finalized) return` short-circuit avoids most wasted queries, but the SQL guard is the authoritative fix (the flag can be set after a query is already in flight). Integration test: finalize to 'completed', then a late onlyIfStreaming update is a no-op — status/content/usage preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 06:14:02 +03:00
claude code agent 227	e7b719bbb8	feat(ai-chat): persistent history as source of truth — step durability + server export (#183 ) The chat lived in inconsistent paradigms (in-memory stream + client export vs. DB-as-context), which made export flaky and lost the assistant answer if the process died mid-turn. Make the DB the single source of truth. A. STEP-GRANULAR DURABILITY (server) - ai_chat_messages gains a nullable `status` column (migration; NULL = legacy = completed). The assistant row is now INSERTED UPFRONT as `status:'streaming'` and UPDATEd on every onStepFinish with all finished steps (text + tool calls + tool RESULTS), then finalized once to completed/error/aborted on the terminal callback. So a process death mid-turn keeps every finished step; a startup sweep (OnModuleInit → sweepStreaming) flips any dangling 'streaming' row to 'aborted'. The write path no longer depends on a live socket. - Pure exported `flushAssistant(steps, inProgressText, status, extra?)` builds the persist payload (metadata.parts byte-identical to the old builder), so a future background worker can call the same path. AiChatMessageRepo gains `update`, `sweepStreaming`, and `findAllByChat`. - consumeStream drain, external-MCP client close-once, SSE heartbeat preserved. B. SERVER-SIDE EXPORT - New pure `chat-markdown.util.ts` renders Markdown from DB rows ONLY (server port of the client builder). Because A persists the in-progress row, the export now includes an interrupted turn up to its last finished step (flagged "still generating"). `POST /ai-chat/export` (owner-gated via assertOwnedChat, workspace-scoped) returns it; `lang` accepts a full client locale tag ('en-US'/'ru-RU') and is normalized server-side (normalizeLang) — a strict @IsIn(['en','ru']) DTO rejected the real client's i18n.language with a 400, caught in real-browser testing. - Client: handleCopy calls the endpoint; `canExport = !!activeChatId`. The whole liveThreadRef/liveStateRef/onLiveContentChange/hasLiveContent hybrid (and the client chat-markdown util + test) is removed — the server is now authoritative. Tests: flushAssistant unit (status shapes + parts parity), chat-markdown.util unit (incl. legacy NULL-status + interrupted note + ru + normalizeLang locale tags), controller export wiring + owner-gate, integration update/sweepStreaming. Verified: server build + 318 ai-chat unit + 3 integration; client tsc + 157 ai-chat unit; and END-TO-END in a real browser — a chat turn persists mid-stream and the Copy button exports the DB-sourced markdown (showing the in-progress row), HTTP 200 after the locale fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 06:05:26 +03:00
claude_code	27c91e4a69	feat(ai-chat): bound external MCP tool calls with per-call timeouts External MCP tools (web search, crawl) had no per-call timeout: a hung tool call was only broken by the 15-min transport silence timeout shared with the chat provider, and a server that kept the socket warm but never returned could spin until the user cancelled. Add two independent, composing bounds for external MCP traffic (the chat provider path is unchanged): - Silence 5 min: buildPinnedDispatcher now overrides headersTimeout/ bodyTimeout with mcpStreamTimeoutMs() (AI_MCP_STREAM_TIMEOUT_MS, default 300000) on the external-MCP dispatcher only, so a byte-silent upstream is severed in ~5 min instead of 15. - Total per-call 15 min: wrapToolWithCallTimeout wraps each external tool's execute with a fresh AbortController + timer composed with the turn signal via AbortSignal.any (AI_MCP_CALL_TIMEOUT_MS, default 900000). It RACES the call against the abort signal because @ai-sdk/mcp does not settle its in-flight promise on abort, so a warm-but-stuck call would otherwise hang forever. On timeout the call surfaces as a tool-error and the agent loop recovers. Add tests (incl. a never-settling real-client-style stub) and document both env vars in .env.example.	2026-06-25 04:43:49 +03:00
claude_code	b6787cc542	fix(ai-chat): drain stream on client disconnect to stop heap-OOM leak The /api/ai-chat/stream and public-share streaming paths piped streamText output to the client socket via pipeUIMessageStreamToResponse, whose only reader is that socket. On a client disconnect (pervasive Safari/proxy ECONNRESET), backpressure stalled the stream: the controller aborted the turn but nothing drained it, so streamText's onFinish/onError/onAbort never fired. Cleanup (close leased MCP clients, persist partial) never ran and the whole per-turn object graph (history, per-request toolset closures, captured steps, SDK buffers) stayed rooted — accumulating across turns until the default ~2GB heap saturated and the process crashed with "Ineffective mark-compacts near heap limit - JavaScript heap out of memory". Add the AI SDK v6 documented remedy: fire-and-forget `result.consumeStream({ onError })` right after streamText(), which removes backpressure and drains the stream independently of the client socket so the terminal callbacks always fire and the turn's memory is released even when the client has gone away. Applied to both the authenticated and public-share stream services. Also add `--heapsnapshot-near-heap-limit=2` to the prod start script so any residual leak dumps a heap snapshot near OOM for diagnosis (no effect on normal operation). Heap size stays ops-tunable via NODE_OPTIONS. - apps/server/src/core/ai-chat/ai-chat.service.ts - apps/server/src/core/ai-chat/public-share-chat.service.ts - apps/server/package.json Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 03:59:32 +03:00
claude code agent 227	c065e26d14	refactor(ai): retry outside instrumentation + retry-exhaustion test (#179 review) - Invert the transport layers so the pre-response retry is OUTERMOST and the provider-HTTP instrumentation is INNER. Before, the retry lived inside createStreamingFetch (under the instrumentation), so a reset the retry recovered from logged only a clean "OK status=200" — the "PRE-RESPONSE FAILED ... ECONNRESET ... idleSincePrevCall" signal went blind exactly when the fix works, and AI_STREAM_KEEPALIVE_MS couldn't be tuned from prod data. Now createStreamingFetch is the dispatcher-bound BASE (no retry) and a new withPreResponseRetry() wraps it; ai.service composes withPreResponseRetry(createInstrumentedFetch('AiService:provider-http', createStreamingFetch())), so every attempt — including recovered resets — flows through the instrumentation. (Also expresses the keepAlive-config vs retry- behavior boundary structurally, per review #3.) - Add the retry-exhaustion test: a server that resets EVERY connection, asserting the call rejects with a retryable connection error AND exactly PRE_RESPONSE_CONNECT_RETRIES + 1 (= 3) requests reached the server — pinning the bound and that the final error propagates (guards an off-by-one / infinite loop / swallowed error). Existing happy-retry + abort tests moved onto withPreResponseRetry. Verified on the stand: a normal turn still streams (reasoning + finish) and the provider-HTTP telemetry still logs. server tsc + ai/mcp specs green (30). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 00:10:40 +03:00
claude code agent 227	b0faa2fe32	fix(ai-chat): recycle keep-alive sockets + retry pre-response resets (#175 ) The real cause of the long-task "Lost connection to the AI provider" — the earlier 300s-timeout fix (#176) was the wrong layer. The provider-HTTP telemetry on the user's deploy shows the failures are PRE-RESPONSE `read ECONNRESET` ~500ms in (not a 300s/15min timeout), correlated with idleSincePrevCall ~42s and large bodies; and crucially a retry of the SAME request often succeeds. A direct probe to the real z.ai endpoint does NOT reset (113KB bodies and a 45s-idle keep-alive reuse both succeed), and another agent (opencode) runs fine from the same infra — so the provider is healthy and the egress network is usable. The difference is the transport: undici's keep-alive pool REUSES a socket that the deployment's egress (NAT / firewall / conntrack) silently dropped during a long idle gap, so the next request resets pre-response. Fix (brings gitmost in line with clients that don't reuse stale sockets): - Keep-alive recycling: the streaming dispatcher (chat fetch AND the external-MCP dispatcher, via the shared streamingDispatcherOptions) now sets keepAliveTimeout + keepAliveMaxTimeout to a 10s recycle window (AI_STREAM_KEEPALIVE_MS), so a connection idle longer than that is closed instead of reused — a long-gap step opens a fresh connection. keepAliveMaxTimeout also caps a server-advertised keep-alive so the provider can't widen the window. - Pre-response connection retry: createStreamingFetch retries a connection-level reset (ECONNRESET / UND_ERR_SOCKET / ECONNREFUSED / EPIPE / *_TIMEOUT) on a fresh connection up to 2 times. This is SAFE because fetch() only rejects before the Response resolves — a started stream is never replayed; an abort (client disconnect) is never retried. Tests: ai-streaming-fetch.spec — keep-alive options, streamKeepAliveMs env, isRetryableConnectError, and a server that resets the first connection so the retry must land on a fresh one (+ aborted requests are not retried). Verified on the stand that a normal turn still streams (reasoning + text + finish) through the new transport. server tsc + ai/mcp specs green. Note: root cause is the deployment's egress dropping idle connections (Traefik is inbound-only); this makes the app resilient to it. AI_STREAM_KEEPALIVE_MS can be lowered if the egress drops faster than ~10s. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 23:51:17 +03:00
claude code agent 227	6edbbab43b	refactor(ai): unify provider-settings allowlist + stronger chatApiStyle tests (#177 review) Addresses the second #177 review: - Architecture (the silent allowlist drift): the writable provider-setting keys were maintained by hand in two TS-uncheckable places — the key-loop in ai-settings.service and the SQL ALLOWED list in the generic workspace repo (a miss there silently dropped a field on persist, exactly what bit chatApiStyle). Introduce one typed source of truth PROVIDER_SETTINGS_KEYS in ai.types (`satisfies readonly (keyof AiProviderSettings)[]`), have the service consume it, and keep the repo's own copy (it can't import AI types) guarded by a parity test so any future drift fails in CI. - Tests: - ai.service.include-usage.spec: mocks @ai-sdk/openai-compatible and asserts the factory is called with { includeUsage: true, baseURL, apiKey, fetch, name } — `.provider` alone could not catch a dropped includeUsage (the token-usage zeroing regression); also asserts the 'openai' style does NOT use it. - ai-provider-settings-keys.spec: the allowlist parity check + DTO validation for chatApiStyle (@IsIn accepts both values, rejects garbage, optional). - CHANGELOG: [Unreleased] entries for the new "Protocol" / chatApiStyle setting and the default provider change (openai -> openai-compatible). (#175, #177) server + client tsc clean; 42 ai/settings specs green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 23:18:31 +03:00
claude code agent 227	59190148db	feat(ai-chat): explicit chatApiStyle selector to surface reasoning (#175 ) Rebuilt on develop (after #176) and reworked per review: instead of inferring the provider from baseUrl (`if (baseUrl)`), the admin picks the chat provider EXPLICITLY via a new `chatApiStyle` ('openai-compatible' \| 'openai'), mirroring the existing sttApiStyle. A custom baseURL can front real OpenAI too, so the heuristic was fragile. Why reasoning was missing: glm-5.2 (and DeepSeek etc.) stream their thinking as `reasoning_content`, but the official @ai-sdk/openai provider does not map that field. 'openai-compatible' uses @ai-sdk/openai-compatible, which does — so reasoning parts now stream (verified live: reasoning-start/delta/end appear, and disappear when set to 'openai'). - Default (unset) = 'openai-compatible', so existing openai+baseUrl workspaces surface reasoning with no admin action. No DB migration (field lives in the settings.ai.provider JSON blob). - includeUsage: true on the openai-compatible model — without it the provider omits streamed usage, zeroing the live token counter / reasoning-token metadata. The official provider always sent it; this keeps parity. (Confirmed live: usage.totalTokens present.) - openai-compatible has no default endpoint, so with no baseURL (real OpenAI, or a role's cross-driver override that cleared it) it falls back to the official provider. Plumbing: ai.types (ChatApiStyle / CHAT_API_STYLES + AiProviderSettings / MaskedAiSettings), update DTO (@IsIn), ai-settings.service (resolve / getMasked / update allowlist), workspace.repo updateAiProviderSettings ALLOWED (the second, SQL-level allowlist the review missed — without it the field never persisted), ai.service selector. Client: ai-settings-service types + a Protocol <Select> in the chat section + i18n (en/ru). Scope is chat-only (embeddings don't stream reasoning; STT already has sttApiStyle). Tests: ai.service.spec — 4 cases (openai-compatible+baseURL, openai+baseURL, default-unset, openai-compatible-without-baseURL fallback). Verified on the stand: default streams reasoning + usage; 'openai' drops reasoning; the setting round-trips. server + client tsc clean; 36 ai/settings specs green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 22:58:15 +03:00
claude code agent 227	da15b55786	refactor(ai): address PR #176 review — finite-timeout wording, env doc, tests, permanent provider-http module - Wording: every comment now says the stream timeouts are RAISED to a generous-but-finite ~15-min silence timeout, not "disabled (0)" (the stale comments contradicted the code, which uses AI_STREAM_TIMEOUT_MS, default 900000ms). - Architecture (the load-bearing-temporary trap): the streaming fetch reached the chat provider only by riding the "temporary DIAGNOSTIC" telemetry, so deleting the telemetry by its own label would silently revert the timeout fix. Legitimize it: rename ai-http-diagnostics.ts -> ai-provider-http.ts, createDiagnosticFetch -> createInstrumentedFetch, field aiDiagnosticFetch -> aiProviderFetch, drop the "temporary" labels, and document the chat transport (streaming fetch + instrumentation) as one intentional construct. - Docs: AI_STREAM_TIMEOUT_MS added to .env.example next to AI_EMBEDDING_TIMEOUT_MS. - Tests: - ai-provider-http.spec: createInstrumentedFetch delegates to the injected baseFetch with the same input/init, returns the Response untouched, rethrows the error, and defaults to global fetch — covering the baseFetch seam. - ai-streaming-fetch.spec: the delayed-server test is now LOAD-BEARING — with AI_STREAM_TIMEOUT_MS set below the 1.5s server delay the call actually rejects (a lost dispatcher -> global 300s default would NOT), proving the configured dispatcher is wired; plus the default-timeout happy path. server tsc clean; ai-streaming-fetch / ai-provider-http / ai.service / mcp-servers / ai-error specs green (41). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 22:31:58 +03:00
claude code agent 227	a14560c7c9	fix(ai-chat): raise undici's 300s stream timeout for long agent turns (#175 ) Long research turns failed mid-task with "Lost connection to the AI provider". Node's global fetch (undici) defaults BOTH headersTimeout and bodyTimeout to 300_000ms, and the chat provider + the external-MCP dispatcher both ran on it with no override, so: - the z.ai chat stream dropped when a late step's huge accumulated context pushed the model's time-to-first-token past 5 min (the model reasons server-side with NO streamed reasoning, so the connection is silent until the first answer token — reproduced: even a trivial glm-5.2 query has a ~4-8s first-chunk gap; a long run reaches 400k+-token steps), or a reasoning model paused >5 min between chunks (bodyTimeout); - the crawl4ai SSE transport, held open across the whole turn, dropped when it idled >5 min between tool calls. Fix: a dedicated undici dispatcher whose stream timeouts are raised to a generous-but-FINITE silence timeout (default 15 min, AI_STREAM_TIMEOUT_MS) on each path. NOT disabled (0): that would let a genuinely hung provider — with the client still connected — hang forever, since the turn's abortSignal only fires on client disconnect. The timeout bounds SILENCE (time-to-first-byte and the gap BETWEEN chunks), NOT total turn duration, so an arbitrarily long turn that keeps streaming is never cut; only a stream quiet for >15 min is treated as a hang. - ai-streaming-fetch.ts: createStreamingFetch() + streamTimeoutMs() / streamingDispatcherOptions() (the shared, configurable timeout). - ai.service: the chat provider fetch is createStreamingFetch(), wrapped by the existing passive ECONNRESET telemetry (createDiagnosticFetch gained an optional baseFetch) so the telemetry observes the SAME transport. - mcp-clients: the SSRF-pinned Agent uses streamingDispatcherOptions(). Investigation: reproduced the transport mechanism against the real z.ai endpoint (a 1ms headersTimeout throws UND_ERR_HEADERS_TIMEOUT — the exact drop) and ran the actual research agent to a ~428k-token context. Verified the fixed path streams cleanly live (glm-5.2 turns finish; telemetry confirms the streaming fetch is in use). Tests: ai-streaming-fetch.spec (default 15m + env override + invalid fallback + both-timeouts + streams a delayed response); ai-http-diagnostics + ai/mcp specs green. server tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 22:09:10 +03:00
claude_code	4cc8df836f	chore(ai): passive z.ai provider HTTP telemetry (#175 ) Investigate the intermittent (~20-30%) long-turn failure "Lost connection to the AI provider" = AI_RetryError / read ECONNRESET on the gitmost->z.ai link (browser-agnostic, mid-turn). Pure instrumentation, no behavior change: - ai-http-diagnostics.ts: a passive fetch wrapper injected into the OpenAI-compatible (z.ai) client. Per provider HTTP call it logs time-to-headers/status on success, and on a pre-response rejection the latency, error code/cause, request-body size and idle-gap since the previous call. The Response is returned untouched (streaming intact), errors rethrown unchanged; no retry/timeout/dispatcher. - ai.service.ts: wire the instrumented fetch into the openai case only. Lets us classify the reset as connection-phase vs mid-stream before choosing a fix, without repeating the reverted RetryAgent (#140). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 21:24:05 +03:00

1 2 3 4 5 ...

780 Commits