Commit Graph

862 Commits

Author SHA1 Message Date
agent_coder f815e61a8d fix(#234 review): wire Stop to server run + real autonomousRuns toggle (F1/F2/F3)
F1: the Stop button was wired only to useChat.stop() (a local SSE abort the server
ignores for a detached run), and dropping isStreaming re-armed observer-polling so the
still-running run streamed back into view. Add a client stopRun(chatId) -> POST
/ai-chat/stop; in autonomous mode Stop now also calls it (the authoritative stop). A
stoppingRun latch suppresses the observer MERGE (observedRow requires !stoppingRun) so
the stopped run's newly-persisted partial steps don't re-stream between Stop-press and
terminal settle. The latch clears only once this tab is no longer the streamer
(!localStreaming) — while streaming, the disabled run query holds the PREVIOUS turn's
terminal run in cache, which would otherwise clear the latch early and re-open the
flash on turn 2+. Latch releases on stopRun failure (view resumes) and on chat switch.
F2: give autonomousRuns a real enable path through the standard settings.ai.* toggle
(update-workspace.dto + workspace.service persist/audit + ai-provider-settings switch +
client workspace type), mirroring aiDictation. Persists to settings.ai.autonomousRuns
— the exact key the controller and ai-chat-window read.
F3: correct the pending->running comments (beginRun inserts 'running' directly;
'pending' is a reserved default never written in phase 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 22:26:15 +03:00
agent_coder 4efd80fc49 Merge branch 'develop' into feat/184-autonomous-agent-runs
# Conflicts:
#	apps/client/src/features/ai-chat/components/ai-chat-window.tsx
#	apps/server/src/core/ai-chat/ai-chat.service.ts
#	apps/server/src/database/database.module.ts
#	apps/server/src/database/types/db.d.ts
#	apps/server/src/database/types/entity.types.ts
2026-07-02 15:03:38 +03:00
agent_coder 2f3d5d3783 docs: fix escapeAttr comment count (three, not four) (#274 review)
The regex strips three attribute-breaking chars (" < >); the JSDoc said four.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 06:19:26 +03:00
agent_coder 6e681a9c66 fix(#274): escape page_changed injection surface, drop dead content_hash (review F1-F5)
F1: escape the collaborative page title before interpolating into
    <page_changed page="..."> (and the pre-existing openedPage attr) — strip
    <>" and collapse whitespace, so a crafted title can't break out of the
    attribute into the system prompt (cross-user injection).
F2: neutralize <page_changed>/</page_changed> occurrences inside the diff body
    so a crafted line can't close the block early.
F3: remove the dead content_hash column (written every turn, never read) —
    migration, repo, service hashing + crypto import, db.d.ts, spec asserts.
F4: test the best-effort catch branches (detectPageChange / snapshotOpenPage
    swallow errors and don't break the turn).
F5: soften the overstated 'diff cannot smuggle instructions' comment to
    defense-in-depth framing referencing the F1/F2 mitigations + safety sandwich.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 05:43:46 +03:00
agent_coder 8c5b57ebfa feat(ai-chat): notify the agent of user page edits between turns (closes #274)
The agent rebuilds context from DB each turn and didn't know the user manually
edited the open page since its last response, so it could overwrite those edits.
Add a per-turn ephemeral <page_changed> note in the system prompt (twin of
INTERRUPT_NOTE, self-clearing) carrying a unified Markdown diff of what changed
since the END of the agent's previous turn.

- New ai_chat_page_snapshots table (migration + hand-declared db.d.ts/entity
  types) storing the page Markdown per (chat,page) at each turn's end.
- Pure computePageChange util (whitespace-normalized unified diff via the
  existing jsdiff dep, 6KB cap + getPage hint).
- Turn start: if the open page's updatedAt moved past the snapshot, diff current
  vs snapshot; non-empty -> PAGE_CHANGED_NOTE in the safety sandwich.
- Turn end: upsert the snapshot on EVERY terminal path (onFinish/onError/onAbort,
  once) so the agent's own edits are excluded by construction even on aborted
  turns.
All best-effort (never breaks/latency-regresses a turn); fast path when updatedAt
is unchanged. Server-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 01:54:00 +03:00
claude code agent 227 3b80285d57 fix(#260): open MCP collab docs by canonical UUID (slugId doc-name split)
Real root cause of the silent MCP edit loss: the web editor always opens the
collaboration document by the page UUID (`page.${page.id}`), but the MCP
opened it by the agent-supplied id — usually a slugId — so `page.${pageId}`
became `page.<slugId>`. For one DB page that is TWO independent Yjs documents;
both persist to the same `pages` row (findById/updatePage resolve id or
slugId), so the human tab's debounced store overwrites the agent edit
(last-store-wins) — gone after reload, never shown live. The slugId doc also
made the server's transclusion sync + embedding reindex throw Postgres 22P02.

Fix:
- MCP (primary): resolvePageId(pageId) returns the canonical UUID — a UUID
  short-circuits with no network call, a slugId resolves once via getPageRaw
  and is cached both ways. Every collab-write path (mutatePageContent /
  updatePageContentRealtime / replacePageContent and the mutate/replace/
  unlocked seams) now opens by the resolved UUID, so the MCP and the editor
  share ONE Yjs doc. replaceImage's whole-operation page lock also keys on the
  UUID so it serializes against the other (now-UUID-keyed) writes.
- Server (defense + kills the 22P02 noise): onStoreDocument passes the resolved
  page.id — not the raw doc-name id — to syncTransclusion, the embedding queue,
  the mention-notification job, addContributors, and the in-tx history read.
  Content store and the empty-guard are untouched.

Tests: a new MCP test stands up a real Hocuspocus server and asserts a slugId
input opens `page.<uuid>` (never `page.<slugId>`), with UUID short-circuit and
single-resolve caching; the server spec asserts the side-effects receive the
UUID for a `page.<slugId>` doc. closes #260

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:04:49 +03:00
claude code agent 227 3123552944 Merge remote-tracking branch 'gitea/develop' into HEAD
# Conflicts:
#	CHANGELOG.md
2026-06-30 02:28:31 +03:00
vvzvlad 22ea387495 Merge pull request 'feat(#246): inline spoiler mark (blur + click-reveal, lossless Markdown)' (#259) from feat/246-spoiler into develop
Reviewed-on: #259
2026-06-30 01:47:46 +03:00
vvzvlad 7e6dd457a4 Merge pull request 'refactor(#193): tool-host drift-guard + staged plan (shared spec registry already merged)' (#249) from refactor/193-tool-spec-registry into develop
Reviewed-on: #249
2026-06-30 01:47:13 +03:00
vvzvlad 42f3a328c2 Merge pull request 'feat(#251): intentional-clear signal editor→store (persist deliberate clear, keep #248 guard)' (#253) from feat/251-intentional-clear into develop
Reviewed-on: #253
2026-06-30 01:36:46 +03:00
vvzvlad a8a7fad850 Merge pull request 'test(#244): Part B backlog — editor-ext/mcp/client/server unit+contract tests + findBreadcrumbPath mutation fix' (#257) from test/244-part-b into develop
Reviewed-on: #257
2026-06-30 01:36:00 +03:00
vvzvlad d38a39e3e5 Merge pull request 'fix(ai): show live reindex progress so the embeddings counter resets to 0 and climbs' (#242) from fix/embeddings-reindex-progress into develop
Reviewed-on: #242
2026-06-29 23:44:13 +03:00
vvzvlad 116a231691 Merge pull request 'fix(#255): disconnect socket.io redis-adapter pub/sub clients on shutdown' (#256) from fix/255-ws-redis-adapter-leak into develop
Reviewed-on: #256
2026-06-29 23:23:47 +03:00
claude code agent 227 188c5f506c feat(editor): inline spoiler mark (blur + click-reveal, lossless Markdown) (#246)
Add an inline spoiler (Telegram/Discord-style hidden text): a TipTap mark
`spoiler` rendered as <span data-spoiler="true" class="spoiler">, blurred via
CSS and revealed on click (UI-only is-revealed class, never persisted).

- packages/editor-ext: the Spoiler mark (inclusive:false, set/toggle/unset
  commands, ||text|| input rule), exported; a lossless turndown rule emitting
  raw inline HTML; round-trip test.
- apps/client: SpoilerView mark-view (ReactMarkViewRenderer, Link pattern),
  registration in extensions, bubble-menu toggle button (editable only), CSS
  (blur + @media print reveal), en/ru i18n.
- apps/server: register Spoiler in collaboration.util tiptapExtensions so the
  mark survives HTML<->JSON export/index/import/Yjs; a test proving the public
  share keeps the spoiler (it isn't stripped with comments).

No keyboard shortcut: the proposed Mod-Shift-s collides with Strike (and
Mod-Shift-h with Highlight); the ||text|| input rule + the bubble-menu button
cover ergonomics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 23:22:30 +03:00
claude code agent 227 aa14ad6698 docs(ai): quote the content predicate verbatim; drop twin tautological assert (F16,F17)
F17: the header's content-clause literal omitted the [[:space:]]* tolerance;
     copy page.repo.ts's exact '"type"[[:space:]]*:[[:space:]]*"text"' (jsonb::text
     renders a space after the colon, which is why the tolerance exists).
F16: remove expect(ttl).toBeGreaterThan(0) — the twin of the F15 removal;
     expect(ttl).toBe(120) strictly subsumes it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 17:30:52 +03:00
claude code agent 227 1e5994573f docs(ai): list all three embeddable clauses in int-spec header; drop tautological assert (F14,F15)
F14: the lockstep int-spec header still described the pre-F6 two-clause set with
     'iff' — add the content-JSON text-node clause so it matches embeddablePredicate.
F15: remove the redundant expect(ttl).toBeLessThanOrEqual(120) that followed
     expect(ttl).toBe(120).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 17:02:32 +03:00
claude code agent 227 d0eae69086 fix(ai): raise reindex pre-seed TTL to the client poll cap; cover predicate clause; align docs (F11-F13)
F11: PRE_SEED_TTL_SECONDS 45->120 (= client REINDEX_POLL_CAP_MS). At concurrency
     1 a queued reindex can wait past the old 45s; if the pre-seed expired while
     pending, getMasked fell back to the COUNT and reported done, so the client
     stopped polling and missed the climb. Tie the pre-seed TTL to the client cap.
F12: extend the lockstep integration spec — insertPage takes content; a
     text_content=null + text-node-content page is IN and a math-only page is OUT,
     pinning the structural "type":"text" clause (and the jsonb space-after-colon).
F13: list all three embeddable clauses in the reindex JSDoc/inline comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 16:12:36 +03:00
claude code agent 227 7043e08353 docs(ai-chat): align requestStop JSDoc with abort-first order (F17)
The method docstring still described the old write-then-abort order and
presented the stop_requested_at stamp as guaranteed. Reword to: abort first
(the only thing that actually stops the run), then best-effort stamp that may
be skipped on a DB error or lost to the finalize race — acceptable since the
row still settles as 'aborted'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 15:38:55 +03:00
claude code agent 227 2916c13591 fix(ai-chat): abort the run before the stop audit-write; drop dead spec helper (F15,F16)
F15: requestStop awaited markStopRequested (a DB UPDATE) before aborting the
     in-process controller, so a transient DB error (pool exhaustion, deadlock,
     dropped connection) threw and skipped abort() — leaving the run executing
     despite an explicit Stop. Abort first, then record stop_requested_at
     best-effort in its own try/catch (logged, treated as marked=false, never
     rethrown). Return value preserved: Boolean(marked) || Boolean(entry).
F16: remove the dead chain helper + its void-suppressor from the repo spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 15:14:01 +03:00
claude code agent 227 91f24fc062 fix(ai): include content-bearing pages in reindex coverage; correct progress race & hot path (F6-F10)
F6: extend embeddablePredicate to pages with body content but null text_content,
    keyed on the text-node marker "type":"text" (not a bare "text": key, which
    also matched math nodes' attrs.text and would leave math-only pages stuck
    below 100%). Numerator and denominator share the predicate; tests assert the
    compiled WHERE is byte-identical and a math-only doc is excluded.
F7: correct the start() JSDoc (both totals are the real page count).
F8: nextReindexPollInterval reuses isReindexComplete.
F9: getMasked reads progress first and skips the two COUNTs while a reindex is active.
F10: pre-seed the progress entry with a short 45s TTL so a deduped enqueue's
     phantom "0 of N" expires quickly instead of sticking for the 1h TTL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:37:26 +03:00
claude code agent 227 c0ff480898 test(#184): pin begin-failure resilience (swallow-and-continue) branch in stream() (F14)
Add a run-race spec case where runHooks.begin rejects with a plain Error
(not RunAlreadyActiveError): assert stream() does not 409, logs the legacy
fallback, persists the user message, and streams untracked on the socket
signal (effectiveSignal = signal, runId undefined).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:18:35 +03:00
claude code agent 227 f9b58a0e3d test(server): SSRF guardedFetch, decryptHeaders fail-open, yjs.util, tool-spec parity, storage delegation
guardedFetch blocks loopback/private/link-local/metadata IPs and never calls
fetch; decryptHeaders fails open (returns undefined, warns once, no blob leak).
yjs.util setYjsMark/removeYjsMarkByAttribute/updateYjsMarkAttribute on real
Y.Docs. SHARED_TOOL_SPECS<->in-app parity (name/desc/input-schema; a dropped or
renamed wiring fails). Replace the tautological storage.service spec with
driver-delegation checks across every public method.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:49:56 +03:00
claude code agent 227 82b042209e fix(ws): make redis adapter error handlers actually log (were noop)
The pub/sub error handlers were `(err) => () => {}` — a noop returning an
inner arrow that never runs, so socket.io redis client errors were silently
swallowed. Log them via Nest Logger. Adjacent pre-existing bug surfaced in
review of #255.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:32:34 +03:00
claude code agent 227 a0f4c86a74 fix(ws): disconnect socket.io redis adapter pub/sub clients on shutdown
The WsRedisIoAdapter creates two ioredis clients (pubClient/subClient) for
@socket.io/redis-adapter but never closed them, leaking their TCP handles on
application shutdown (#255). The redis-adapter does not own these clients'
lifecycle, and the adapter is instantiated from main.ts (not a DI provider),
so no Nest lifecycle hook applied to it.

Keep references to both clients and override dispose(), which Nest's
SocketModule.close() invokes exactly once during shutdown after all socket.io
servers are closed. Use disconnect(false) to mirror the sibling pub/sub pair
in collaboration/extensions/redis-sync (onDestroy): immediate close, no QUIT
round-trip, no auto-reconnect. Refs are nulled to guard against double-close.
Runtime behavior is unchanged; only the shutdown path is added.

Verified with a script that boots connectToRedis() against a real Redis:
2 sockets to :6379 open after connect, 0 remain after dispose().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:28:56 +03:00
claude code agent 227 cce539e8e2 fix(collab): hoist intentional-clear consume out of the store retry loop (#251)
The store-side empty-guard consumed the per-document intentional-clear flag
INSIDE the bounded retry loop. consumeIntentionalClear always deletes the
in-memory Map entry, but a tx rollback cannot un-delete it: attempt 1
consumed the flag then updatePage threw a transient error and rolled back;
attempt 2 re-read the page non-empty, saw the flag gone, and the empty-guard
silently BLOCKED the write — dropping the user's deliberate clear and
defeating the retry guarantee for clears.

Hoist the decision out of the loop (like consumeContributors /
consumeAgentTouched): consume once into `allowIntentionalClear` before the
`for`, and only read that boolean on the empty-over-non-empty branch. The
single hoisted consume still drops a pending flag for a non-empty store
(the "cleared then retyped" case), since every store consumes regardless of
incoming emptiness.

Add a regression test: arm via the real onStateless transport, updatePage
throws once then succeeds, assert it is called twice and the retry writes the
empty doc (the clear survives). It fails on the old consume-in-loop ordering
(updatePage called once) and passes after the hoist.

Document the known fail-safe limitation near the TTL constant: if document
ownership transfers / a node crashes between the stateless signal and the
debounced store, the in-memory flag is lost and the clear is silently not
applied (the doc reloads non-empty) — fail-safe, content is never destroyed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:17:41 +03:00
claude code agent 227 8274720281 fix(server): close leaked redis sockets so e2e jest exits (#252)
The full-AppModule e2e (apps/server/test/app.e2e-spec.ts) passed but jest
never exited, burning CI to its timeout. Diagnosis (process._getActiveHandles
after app.close()) showed exactly two ioredis sockets to :6379 still open after
shutdown; everything else (BullMQ queues/workers, @nestjs/schedule intervals,
nestjs-ioredis, nestjs-kysely pg pool, @nestjs/cache-manager Keyv store,
hocuspocus pub/sub) already closes on app.close().

The two leaks were owned-but-never-closed clients:

1. ThrottleModule passed a pre-built `new Redis(...)` instance to
   ThrottlerStorageRedisService. With an instance, the lib sets
   disconnectRequired=false, so its onModuleDestroy never disconnects.
   Pass ioredis options instead so the service owns + disconnects the client.

2. CollaborationGateway created a source `new RedisClient(...)` that
   RedisSyncExtension only duplicates into pub/sub; the extension's onDestroy
   disconnects those duplicates but not the source. Keep a reference and
   disconnect it after the hocuspocus onDestroy hook in destroy().

Both are real lifecycle fixes (production shutdown is now clean too), so no
--forceExit is needed. Verified against real Postgres+Redis:
  - test:e2e (no forceExit, --runInBand) exits 0 in ~18s (was: hung forever)
  - --detectOpenHandles exits 0 with no open-handle report
  - active handles after app.close(): none
CI timeout-minutes safety nets left untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:11:51 +03:00
claude code agent 227 3fdb1e05a4 feat(collab): persist a deliberate page clear via an intentional-clear signal (#251)
The #248 store-side empty-guard (onStoreDocument) unconditionally refuses to
overwrite non-empty persisted content with an empty document, because a
momentarily-empty live Y.Doc is indistinguishable from a real clear at the
store layer. That correctly blocks glitches/bad-merges, but also blocks a user
who genuinely wants to empty a page. This re-introduces a WORKING, narrow,
non-spoofable exception (the dead context.intentionalClear hatch #248 removed
never had a real channel).

Definition of an intentional clear (client, IntentionalClear editor extension):
a LOCAL user transaction (docChanged, NOT a remote y-sync change — filtered via
isChangeOrigin) that reduces a non-empty doc to the empty single-paragraph
shape. This is exactly the select-all + Delete/Backspace keystroke path.

Transport (option b — hocuspocus stateless message): on that transition the
client sends a `{type:'intentional-clear'}` stateless message. The server
(PersistenceExtension.onStateless) records a short-lived (TTL 60s > 45s
maxDebounce), single-use "pending clear" flag keyed by the connection's
document. The next debounced onStoreDocument consumes it on the empty-guard
branch to let that one empty write through.

Why this is the right channel and non-spoofable:
- Yjs transaction origin/metadata does not survive to the server store; awareness
  is per-connection and racy. A stateless message ties the signal to a specific
  clear, survives the debounce, and rides the authenticated connection.
- The document is taken from the connection, never the payload, so a client
  cannot target another page.
- The flag is read ONLY on the empty-over-non-empty branch, so the worst a forged
  signal can do is clear a page the connection may already edit; it can never
  force or alter a non-empty write. Read-only connections cannot arm it. Every
  non-empty store drops a pending flag, so "cleared then retyped" leaves nothing
  usable; the flag is single-use and TTL-bounded.

NOTE: #248 is not yet on develop, so the empty-guard block is included here as
the foundation this exception extends. If #248 lands first this rebases cleanly
(the guard logic is identical; the #251-unique additions are the exception,
onStateless, the pending-flag state, and the client extension).

Tests:
- Server (real transport path, not a hand-poke): onStateless sets the flag with
  the exact client payload, then the debounced onStoreDocument persists the empty
  doc; plus single-use consumption, read-only rejection, non-empty-store drops
  the flag, and the unchanged #248 guard tests (empty-over-non-empty blocked,
  empty-over-empty allowed).
- Client: a real Editor + the actual selectAll+deleteSelection command emits the
  signal; typing / non-emptying edits / already-empty docs do not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:06:39 +03:00
claude code agent 227 0ecddce748 fix(ai-chat): explicit give-up ERROR + accurate retry-window comment (#184 round-4)
F12 [suggestion]: finalizeRun's "all retries exhausted" path only logged
per-attempt warns ("attempt 3/3") then silently restored the in-memory
entry, giving no clear signal that the run row was left non-terminal
('running') pending recovery. Emit ONE greppable ERROR with context
(runId, chatId, final error) on give-up, matching the import-attachment
retry-loop pattern, so an operator can tell a survived blip from a give-up.

F13 [suggestion]: the "ORDER MATTERS (F6)" doc overclaimed that a later
settle "can retry" the terminal write as an in-process retrier. Correct it:
in-process retry is only POSSIBLE (not guaranteed) and only once the entry
is restored AND a fresh settler arrives afterwards; a concurrent settler in
the retry window is consumed at the synchronous active.delete claim, and the
no-streamText path has no second settler at all. The UNCONDITIONAL backstop
in every case is the boot sweep on the next restart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 02:13:29 +03:00
claude code agent 227 9ad3931a1c fix(ai-chat): make finalizeRun once-gate atomic against concurrent settle (#184 round-3)
The F6 once-gate was non-atomic: `settled.has` was read BEFORE the awaited
terminal UPDATE and `settled.add` only after, so two concurrent finalizeRun
calls for the same run (the documented safety-net catch vs a streamText
terminal callback) both passed the check and both wrote the terminal row —
double-write + last-write-wins status clobber, a window the bounded retry only
widened.

Restore a SYNCHRONOUS atomic claim before any await: capture the entry, then
`active.delete` as a check-and-clear in one tick. The first caller claims and
proceeds; a concurrent second caller finds the entry gone and returns at the
claim, before any UPDATE. On a successful write we arm `settled` (post-write
idempotency gate) and do not restore; on total bounded-retry failure we restore
the claimed entry so a retrier can complete it — never both write and restore.

Also fix the F6(b) JSDoc/comment to not overclaim an in-process retrier on the
no-streamText path: there the only settler is the safety-net, so recovery on
total UPDATE failure is the unconditional boot sweep on the next restart.

Adds a concurrency test firing two simultaneous finalizeRun on one run (update
held on a pending promise) asserting update is called EXACTLY ONCE; existing F6
retry-rides-transient + retain-on-total-failure tests stay green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:34:43 +03:00
claude code agent 227 97250ac1d1 fix(ai-chat): harden run finalize + restore int-spec, cover terminal callbacks (#184 round-2)
Round-2 review fixes for PR #234 (#184 autonomous agent runs).

F6 (stability): finalizeRun no longer drops the in-memory entry before the
terminal write. It now UPDATEs first with a bounded retry; only on success does
it arm the idempotency once-gate (a new `settled` set keyed on "row already
terminal", not "entry deleted") and free the chat's active slot. If every
attempt fails the entry is RETAINED and the run left unsettled so a later
finalize / requestStop->onAbort / sweep can retry — a transient blip can no
longer strand a run 'running' and 409 every future turn in the chat. Idempotency
preserved (double-settle still collapses to a single write).

F7 (regression from F2): int-spec constructs AiChatRunService with the 2nd
EnvironmentService arg ({ isCloud: () => false }) so the file type-checks and all
integration tests compile+run again.

F8 (regression from F1): the windowed "stale but not fresh" case now calls
sweepRunning({ staleMs: SWEEP_RUN_STALE_MS }); added an int-level variant-C case
proving the no-arg boot sweep aborts even a FRESH running run.

F9 (coverage): run-race spec now captures streamText's options and invokes
onStepFinish/onFinish/onAbort/onError, asserting the #184 run hooks
(onStep / onSettled completed|aborted|error) fire with the right args.

F10 (docs): added an autonomousRuns single-instance-only note to .env.example so
the warnIfMultiInstance JSDoc reference is accurate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:23:46 +03:00
vvzvlad 4a72ee1681 Merge pull request 'refactor(agent-roles-catalog): YAML catalog with block-scalar instructions (#229)' (#231) from feat/229-catalog-yaml into develop
Reviewed-on: #231
2026-06-29 01:20:40 +03:00
claude code agent 227 82af0c5291 test(catalog): tighten + isolate real shipped catalog-file checks
Apply review suggestions to the real-files block in
ai-agent-roles-catalog.provider.spec.ts (test-only):

1. Fix inaccurate comment: there are 5 content YAML files (index +
   four per-bundle/lang files), not 6.
2. Improve isolation: read/parse the real index lazily inside tests
   (via loadRealIndex) instead of in the describe body, so a broken
   real file fails only these catalog tests, not collection of the
   whole spec (incl. the unrelated mocked-remote provider tests).
3. Add the symmetric slug check: each language file's slug set must
   equal the declared slug set (no undeclared/extra roles), matching
   scripts/check.mjs's exact two-way correspondence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:59:41 +03:00
claude code agent 227 5ac75a9688 refactor(ai-chat): type getRun with concrete AiChatRun/AiChatMessage (#184)
F4: getRun was typed Promise<{ run: unknown; message: unknown }> while its
siblings are concrete. Import AiChatRun + AiChatMessage and return
Promise<{ run: AiChatRun | null; message: AiChatMessage | null }>.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:52:43 +03:00
claude code agent 227 362136ead0 test(ai-chat): pin the run-detach abortSignal wiring (#184)
F3: the load-bearing `effectiveSignal = handle.signal` -> streamText
`abortSignal` had no test; a regression to the socket-bound signal would pass
green and silently break Stop + durability. Add a happy-path test (runHooks.begin
returns the run signal -> streamText is driven with abortSignal === handle.signal,
NOT the socket) and a legacy-path test (no runHooks -> the socket signal is used).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:52:38 +03:00
claude code agent 227 c0844d5431 fix(ai-chat): unconditional boot sweep + single-instance guard for autonomous runs (#184)
F1 (DECISION C): make the crash-recovery boot sweep UNCONDITIONAL. A fast
restart (deploy/OOM within the old 10-min window of the last step) left a run
stuck `running` forever, and the one-active-run gate then 409'd every future
turn in that chat. On a fresh single-process boot any pending|running run is
definitionally hung, so onModuleInit now settles ALL of them to `aborted` with
no staleness window. AiChatRunRepo.sweepRunning takes an optional { staleMs }
window, kept ONLY for the future phase-2 multi-instance timer sweep (the boot
path passes no window). Repo + service tests assert a fresh `running` run
(updatedAt = now) is settled, not skipped.

F2 (DECISION A): treat phase-1 autonomousRuns as SINGLE-INSTANCE-ONLY. Stop and
its AbortController are process-local, so cross-instance Stop is unreliable
(phase 2). AiChatRunService now logs a startup WARNING when a horizontally-scaled
deployment is detected — via EnvironmentService.isCloud() (CLOUD=true), the only
horizontal-scaling signal this codebase has (the socket.io Redis adapter is
always wired since REDIS_URL is mandatory, so it is not a discriminator). The
constraint is documented in AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:52:32 +03:00
claude_code 62eb7d082f test(ai-chat): stub sandboxStore.asSink in AiChatToolsService spec
The blob-sandbox feature (#243/#250) made AiChatToolsService.forUser()
eagerly call this.sandboxStore.asSink() while wiring the stash tool, but
the spec still passed an empty {} as the sandboxStore constructor arg.
That object has no asSink method, so all 19 tests in the suite failed in
CI with 'TypeError: this.sandboxStore.asSink is not a function'.

Replace the stale {} mock at all 4 constructor sites with a no-op sink
exposing asSink() -> { put, has, evict } (jest.fn()). These tests never
execute the stash tool, so a no-op sink is sufficient for forUser() to
wire successfully. Test-only change; production code is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 23:45:06 +03:00
claude code agent 227 997e4395c6 test(agent-roles-catalog): pin the real shipped YAML files (#231 F1)
Provider tests only exercised synthetic stringifyYaml fixtures, so a
hand-conversion error in one of the 6 real catalog files (index.yaml,
bundles/{editorial,research}/{en,ru}.yaml) — a stray quote/colon in a
description, a broken emoji/arrow, a block-scalar indent slip that
silently changes or drops instructions — was caught by no automated
test. scripts/check.mjs is the only other guard and is wired into no
CI/turbo/husky step.

Add a real-files test block that reads each shipped file off disk,
parses it with the SAME options the provider uses
(strict: true, maxAliasCount: 100), and validates it through the
provider's own exported type guards (isCatalogIndex / isCatalogBundleFile
/ isCatalogRole). It is driven from the real index so new bundles/langs
are auto-covered, asserts the editorial bundle still ships fact-checker,
and requires every declared role to be present with non-empty
instructions/name in each language file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:44:49 +03:00
claude code agent 227 85b38d6946 fix(ai): address reindex-progress review round 1 (PR #242)
F1: clear the "Reindex now" spinner once the poll cap fires. Gate the
reindexing part of the button's loading state on the active poll window
(reindexDeadline !== null) so a run that outlives the 120s cap no longer
leaves the button stuck-disabled with a stale `reindexing: true`; the
admin can restart.

F2: rewrite reindexWorkspace JSDoc to describe the EMBEDDABLE page set
(text OR existing embeddings), matching getEmbeddablePageIds /
countEmbeddablePages instead of the old "every non-deleted page".

F3: extract the shared embeddable-content predicate into a private
PageRepo.embeddablePredicate helper, called by both countEmbeddablePages
and getEmbeddablePageIds, removing the verbatim duplication. Behavior is
identical (lockstep int-spec stays green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:39:20 +03:00
claude_code 204cf9dfe7 test(sandbox): address PR #250 round-4 review — SSRF accept-path tests, MCP structuredContent (#243)
Mandatory (test-coverage):
- internal-file-urls.test: pin the SSRF/traversal ACCEPT path of
  resolveInternalFilePath (the sole guard for content-controlled `src`): an
  absolute/protocol-relative URL has its foreign host dropped and only an
  /api/files/ pathname survives (http://evil.com/api/files/x/y.png -> /files/x/y.png),
  while a host-dropped path that escapes /api/files/ (https://evil.com/api/auth/whoami)
  or a backslash-traversal (/api/files\..\auth\whoami) is rejected. Locks the
  behavior so a future prefix-only refactor cannot silently open a bypass.

Suggestions:
- index.ts: the stash_page MCP tool now returns structuredContent
  { uri, sha256, size, images } alongside the resource_link, so the MCP output
  matches the documented shape (clients get the blob's sha256/ETag and the
  mirror counts, not just the link). No outputSchema registered. Rebuilt build/.
- new stash-page-mcp-result.test: server round-trip via InMemoryTransport asserts
  both the resource_link and the structuredContent mirror.
- internal-file-urls.test: cover the new URL parse-failure catch branch
  (http://[ -> "Invalid internal file src").
- environment.service.spec: assert getPositiveIntEnv warns once per key and
  independently across keys (the invalidPositiveIntWarned dedup).

Tests: packages/mcp 383 pass; apps/server sandbox/environment/mcp 235 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 20:58:36 +03:00
claude_code aff58646d1 refactor(sandbox): address PR #250 round-3 review — dead import, env validation, uuid validator, docs (#243)
Must-fix:
- mcp.module: drop the now-dead EnvironmentModule import (and its stale
  comment). McpService no longer injects EnvironmentService; EnvironmentModule
  is @Global and imported at the app root, so DI still resolves.

Stability:
- environment.service: route getSandboxTtlMs + the three SANDBOX_MAX_*_BYTES
  caps through a shared getPositiveIntEnv() helper that warns once per key and
  falls back to the default on a non-integer or <= 0 value (previously the byte
  caps did a bare parseInt, so SANDBOX_MAX_TOTAL_BYTES=0 made every stash_page
  fail against a 0-byte cap). TTL behavior is unchanged.

Simplification:
- sandbox.controller: replace the homemade UUID_RE with the project's shared
  `uuid` validator (import { validate as isValidUUID } from 'uuid'), matching
  the attachment routes; update the spec fixtures to valid v4 UUIDs.
- mcp.service: inline the single-caller one-liner buildSandboxConfig() to
  this.sandboxStore.asSink() at the wiring site.

Docs:
- CHANGELOG: add an [Unreleased] > Added entry for #243 (stash_page tool,
  anonymous GET /api/sb/:id, five SANDBOX_* env vars).
- AGENTS.md: note that GET /api/sb/:id is in the workspace-gate preHandler's
  excludedPaths and is fully tokenless, unlike /api/files/public/... which
  still resolves a workspace and needs an attachment JWT.

Tests: cap-getter validation (0/-5/abc -> default, valid -> parsed), updated
UUID fixtures. apps/server jest sandbox/environment/mcp: 233 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 20:21:31 +03:00
claude_code 8842bc8bf3 fix(sandbox): address PR #250 follow-up review — XSS hardening, eviction reconcile, doc sync (#243)
Security (must-fix):
- sandbox.controller: the anonymous GET /api/sb/:id response now sets
  X-Content-Type-Options: nosniff, a restrictive CSP, and Content-Disposition=
  attachment for any mime outside a raster-image allowlist (png/jpeg/gif/webp/
  avif). entry.mime is attacker-controlled, so an evil.svg/evil.html could
  otherwise execute script inline on the Docmost origin (stored XSS). Mirrors
  the public attachment route's hardening.

Stability:
- client.stashPage: reconcile mirrors AFTER the final document put, not only
  before it. The doc blob is the newest entry and FIFO eviction drops the
  oldest = this stash's own images, so the stored doc could reference an
  evicted blob (consumer 404) and over-report images.mirrored. A bounded loop
  now reverts doc-put-evicted mirrors, drops the stale doc blob, and re-puts
  until stable. Regenerated packages/mcp/build/.
- sandbox.controller: emit Cache-Control on the 304 branch too (ttlSeconds is
  computed before the conditional check).

Docs:
- Bump the MCP tool count 39 -> 40 across all READMEs and AGENTS.md (the
  registry now exposes exactly 40 tools).

Refactor:
- SandboxStore.asSink() centralizes the {put,has,evict} sink + uri<->id
  mapping; the embedded-MCP and in-app agent-tools wiring sites share it.

Tests:
- security headers (inline vs attachment, nosniff, CSP), 304 Cache-Control,
  putAndLink URL form, has()/remove(), asSink() round-trip, getSandboxPublicUrl
  (trailing-slash trim + APP_URL fallback), and a stash test where the doc put
  itself evicts a mirrored image.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 19:08:06 +03:00
claude_code 6eb335d5e3 fix(sandbox): address PR #250 review — SSRF guard, eviction safety, cleanup (#243)
Security:
- stash_page: reject path-traversal / percent-encoded srcs before the authed
  loopback fetch (resolveInternalFilePath), closing an SSRF/exfiltration hole
  where a crafted node.attrs.src could read an arbitrary internal GET endpoint
  into the anonymous sandbox.

Stability:
- stash_page: revert + recount mirrors FIFO-evicted by a later put in the same
  stash (no dangling sandbox refs, honest images.mirrored/failed); free image
  blobs if the final document put throws.
- Reject/clamp non-positive SANDBOX_TTL_MS to the 1h default (warn once).
- Log mirror failures unconditionally (console.warn, no blob bodies).

Cleanup / architecture:
- Remove dead expiresAt from SandboxPutResult.
- Centralize the /api/sb route in SANDBOX_ROUTE_SEGMENT/SANDBOX_API_PATH and
  move URL composition into SandboxStore.putAndLink; drop the duplicated sink
  closures and the now-unused EnvironmentService injection from McpService and
  AiChatToolsService.
- Un-export isInternalFileUrl; document the process-local (instance-bound)
  sandbox limitation in the tool description and .env.example.

Docs/tests:
- README/README.ru: 38 -> 39 tools + stash_page entry.
- Add traversal/normalize/recursion unit tests, stash self-eviction +
  doc-put-throw + empty/octet-stream mock tests, controller If-None-Match
  (wildcard/weak/list) + Cache-Control tests, and SANDBOX_TTL_MS validation
  tests. Regenerate packages/mcp/build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 18:02:46 +03:00
claude code agent 227 2fe4ca8537 feat(sandbox): in-RAM blob sandbox for out-of-band page transfer (#243)
Add an ephemeral, process-local blob store so the in-app agent (and the
embedded MCP) can hand a large page document and its images to an external
consumer WITHOUT routing the bytes through the model context or Docmost auth.

- SandboxStore (@Injectable singleton): Map<uuid,{buf,mime,sha256,expiresAt}>
  in RAM only. put() picks a per-blob cap by mime (image vs doc), enforces a
  total-bytes RAM guard with oldest-first eviction, and stamps a TTL; get()
  lazily expires. sha256 computed at put() doubles as the strong ETag. An
  unref'd sweep interval clears expired entries and is cleared on destroy.
- GET /api/sb/:uuid anonymous controller: serves raw bytes with Content-Type,
  Content-Length and ETag=sha256; 404 on missing/expired/non-UUID (anti-
  traversal), 304 on a matching If-None-Match. No tokens, no 401 — the
  capability is the unguessable UUID + short TTL + TLS. Auth-exempt the same
  way as /api/files/public (no JwtAuthGuard) plus an /api/sb entry in main.ts's
  workspace-resolution preHandler so a remote consumer with no workspace host
  is not rejected.
- stash_page tool in both layers (MCP resource_link + in-app {uri,size,sha256,
  images}). client.stashPage serializes the get_page_json shape, mirrors every
  INTERNAL file/image src (type-agnostic, covers drawio/excalidraw/video/file)
  into the sandbox under Docmost auth and rewrites src to the sandbox URL;
  external http(s) srcs are left untouched; dedup by src; a failed image fetch
  is counted, never aborts the doc.
- SANDBOX_PUBLIC_URL / SANDBOX_TTL_MS / SANDBOX_MAX_BYTES /
  SANDBOX_MAX_IMAGE_BYTES / SANDBOX_MAX_TOTAL_BYTES wired through the
  environment service + validation + .env.example.
- SandboxModule (@Global) provides the shared store to the controller,
  McpService and AiChatToolsService (same instance for put and get).

Tests: SandboxStore (round-trip, sha256, TTL lazy + sweep, caps, eviction),
SandboxController (200+ETag+CT+CL, 404 missing/expired/non-UUID, 304), and a
mock-HTTP stashPage test (mirror+rewrite internal, keep external, dedup, failed
image counted, returns only a link). Interoperates with the vvzvlad/habr-mcp
consumer's anonymous-GET + sha256-ETag + resource_link contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:13:11 +03:00
claude code agent 227 d0ca127d83 refactor(ai-chat): drift-guard the DocmostClientLike hand-mirror (#193)
Issue #193's tool-half has two open items. The shared, zod-agnostic tool-spec
registry (SHARED_TOOL_SPECS) for the identical tools is already merged
(f3fa15e7) and consumed by both layers, so that subset is done. The remaining
items are: (a) deriving the layer-3 hand-mirror `DocmostClientLike` from the
real client type, and (b) folding more tools into the registry. Both were
deferred as risky, and that deferral still holds (verified, see below) — so
this change ships the safest concrete increment instead of forcing the risk.

What this adds (behaviour-neutral, test-only + a doc comment):

- packages/mcp/test/unit/client-host-contract.test.mjs: pins the layer-3
  contract from the ESM side, where the real DocmostClient is importable. It
  asserts every method the in-app `DocmostClientLike` mirror declares exists as
  a function on a real DocmostClient instance (constructor is side-effect-free).
  A rename/removal in client.ts now fails this test instead of silently shipping
  a runtime "x is not a function" into an agent tool call. Negative-case
  verified (a bogus method name is detected).

- docmost-client.loader.ts: replaces the vague mirror comment with a pointer to
  the guard test and a concrete, empirically-grounded staged plan for the full
  type-derivation. Verified blockers kept it deferred: @docmost/mcp emits no
  .d.ts (no `declaration`, no `types` export) and the server has no path mapping
  for it, so there is no type to import today; and the real methods' inferred
  CONCRETE return types conflict with the in-app adapter's loose
  Record<string,unknown> + `as`-cast result handling (deriving the exact type
  breaks the build / forces pervasive double-casts and full-surface test stubs).

Out of scope (noted in the issue): the PM<->Markdown converter unification.

Verified: server tsc clean; mcp tsc clean; mcp tests 369 pass (367 + 2 new);
ai-chat tools specs 51 pass. No behaviour change; committed mcp build untouched
(no mcp src changed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:07:43 +03:00
claude code agent 227 4c0a4eb9cc fix(ai-chat): settle detached runs on pre-stream failures + review fixes (#184)
CRITICAL: any failure between a successful beginRun and streamText's terminal
callbacks taking ownership (the bare awaits: user-message insert, history load,
convertToModelMessages, settings resolve; the buildSystemPrompt/forUser block;
and synchronous streamText wiring) left ai_chat_runs stuck 'running' forever
(sweepRunning only runs at startup), which then 409'd every future turn in the
chat and made the observer tab poll forever. Wrap the body of stream() after
beginRun in a safety-net try/catch that settles the run to 'error' (via
onSettled) before rethrowing, and make finalizeRun idempotent (active.delete is
the once-guard) so a settle here and a settle from a streamText callback collapse
to a single terminal write.

Also from review comment 2519:
- correct three client comments that falsely claimed /ai-chat/run is "flag-gated
  server-side and would 403" — it is owner-gated only; with the feature off the
  chat simply has no runs so the endpoint returns { run: null }
  (ai-chat-window.tsx, ai-chat-service.ts, ai-chat-query.ts).
- remove the dead UpdatableAiChatRun type (zero usages; the repo update uses an
  inline Partial<...>).
- add controller specs for POST /ai-chat/run and /ai-chat/stop (owner-gating,
  run:null when no run, run+message, stop by runId and by chatId).
- add tests: an exception after beginRun settles the run to 'error' and drops the
  in-memory entry (next turn is not 409'd); finalizeRun is idempotent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:54:19 +03:00
a 6390c45658 fix(ai-chat): close the concurrent-run race in #184 (insert is the gate)
The "one active run per chat" guard was bypassable under a race. Two
simultaneous POST /ai-chat/stream on the same chat both passed the
controller's pre-hijack 409 check (a check-then-act TOCTOU), then the
loser's INSERT into ai_chat_runs hit the partial unique index
(ai_chat_runs_one_active_per_chat, 23505). That error was SWALLOWED, so
the second turn streamed UNTRACKED: no runId, not targetable by /stop,
and (autonomousRuns on) onClose won't abort it -> an orphan unstoppable
run that also spends provider tokens.

Make the unique-index INSERT the authoritative gate:

- AiChatRunService.beginRun: when the run-row INSERT fails with a 23505 on
  ONE_ACTIVE_RUN_PER_CHAT_INDEX (via isUniqueViolation/violatedConstraint),
  no longer swallow it -> throw a distinct RunAlreadyActiveError. Any other
  error (incl. a 23505 on a different constraint) propagates unchanged.
- AiChatService.stream: when begin throws RunAlreadyActiveError, reject the
  turn with a 409 ConflictException (code A_RUN_ALREADY_ACTIVE) BEFORE any
  AI/provider call -> no tokens spent, no untracked turn. Other begin
  failures keep the legacy best-effort fallback (stream socket-bound).
- ai-chat.controller: post-hijack catch honors an HttpException's real
  status/body (clean 409) instead of a blanket 500, since the race 409 is
  raised before a byte is written. Pre-check 409 now carries the same code.

The controller's cheap pre-check stays as a fast-path for the common
sequential double-submit; the INSERT violation is the race-safe backstop.

Tests: ai-chat-run.service.spec proves beginRun throws RunAlreadyActiveError
on the active-index 23505 (and only that constraint), leaks no controller,
and an integration-style two-concurrent-begins test where exactly one wins;
new ai-chat.service.run-race.spec proves stream rejects with a 409
ConflictException BEFORE any streamText/generateText and never persists an
untracked turn. The latter fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:37:07 +03:00
claude code agent 227 95781d80e1 feat(ai-chat): durable detached agent runs (#184 phase 1)
Make an agent turn a first-class, server-side RUN that keeps executing and
persisting its steps after the browser window closes, and that a later client
can reconnect to — the core invariant of #184. Phase 1 only; the full proposal
(cross-process BullMQ runner, resumable live-tail transport, autonomy triggers,
budgets, history compaction) is explicitly deferred.

What lands:
- `ai_chat_runs` lifecycle table + repo: the run as a persistent object
  (status pending->running->succeeded|failed|aborted, trigger, createdBy,
  assistantMessageId projection link, error, step_count, timings). A partial
  unique index enforces ONE ACTIVE run per chat; a startup sweep recovers
  dangling runs (mirrors #183's sweepStreaming).
- AiChatRunService: owns the run lifecycle + an in-memory abort registry. The
  abort is governed by the RUN (an explicit user stop), NOT the HTTP socket —
  so a browser disconnect no longer ends the turn. Reuses #183's socket-
  independent durable write path (consumeStream + flushAssistant) unchanged.
- Controller, behind `settings.ai.autonomousRuns`: /stream wraps the turn in a
  run and does NOT abort on disconnect (logs only); a clean 409 rejects a
  concurrent run on the same chat; new POST /ai-chat/stop (explicit stop) and
  POST /ai-chat/run (reconnect -> latest persisted run + its projection). The
  runId is surfaced on the streamed start metadata. Flag OFF = byte-for-byte
  legacy behavior.

Tests: AiChatRunService unit spec (lifecycle, disconnect != stop, explicit
stop aborts the signal, best-effort sweeps); ai_chat_runs integration spec
(one-active-run index, detached persist+reconnect with no subscriber, explicit
stop, stale-run sweep). Server tsc + build clean; touched jest green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:37:07 +03:00
claude_code bf09eec4e1 fix(ai): address reindex-progress review (PR #242)
- Delete the now-orphaned PageRepo.getIdsByWorkspace (its only caller,
  reindexWorkspace, switched to getEmbeddablePageIds). Its docstring still
  claimed "Used by the RAG bulk reindex"; re-grep confirmed zero callers.
- ai-settings.service.reindex(): if aiQueue.add() throws (Redis hiccup/
  shutdown) the worker never runs so its finally->clear() never fires,
  leaving the seeded progress record stuck for the full 1h TTL (button
  stuck "reindexing: 0 of N"). Roll back the seed THIS call wrote
  (seeded flag, only when get() was null) before re-throwing, so a
  concurrent active run's record is never wiped. Add tests for both the
  clear-on-throw and the don't-clear-a-concurrent-run paths.
- Add an integration spec (real Postgres) proving getEmbeddablePageIds'
  WHERE stays in lockstep with countEmbeddablePages: seeds every boundary
  case and asserts the returned id set equals the count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:39:18 +03:00
claude code agent 227 38a863e5f7 refactor(agent-roles-catalog): store catalog as YAML with block-scalar instructions (#229)
The agent-roles catalog content files move from JSON to YAML so each role's long
`instructions` system prompt is stored as a literal block scalar (`|-`): editing
one sentence now produces a line-by-line diff and the prompt is editable as plain
multi-line text instead of a single escaped JSON string.

Data:
- `index.json` -> `index.yaml`, `bundles/<id>/<lang>.json` -> `<lang>.yaml`
  (old `.json` deleted). Converted programmatically via the `yaml` library with
  `lineWidth: 0`; round-trip verified deepEqual against the old JSON, so the
  resolved role content is byte-for-byte identical (the only `version` bump is
  fact-checker v2->3, carried over from develop during the rebase; see below).

Server (`AiAgentRolesCatalogProvider`):
- parse with `yaml`'s safe default (JSON-compatible) schema instead of
  `JSON.parse` — `strict: true` (rejects duplicate keys) and `maxAliasCount: 100`
  (billion-laughs guard); no custom `!!` tags / no code execution. Fetched paths
  become `index.yaml` / `<lang>.yaml`. The streaming 1 MB size cap,
  `redirect: 'error'`, 10s timeout and `^[a-z0-9-]+$` path-traversal/SSRF guard
  are unchanged; the hand-written type guards are untouched (`instructions` is
  still a string after parsing).
- add `yaml` as a direct server dependency (already in the lockfile as a
  transitive dep).

Catalog tooling:
- `scripts/check.mjs` parses the catalog as YAML (lockfile stays JSON); pin
  `yaml` as a devDependency of the catalog package.

Tests:
- provider spec fixtures serialized with `yaml`; new tests for the block-scalar
  `instructions` round-trip (exact multi-line string), malformed YAML and
  strict duplicate-key rejection -> BadGateway; size-cap and path-traversal
  cases retargeted to the `.yaml` paths.

Docs: README, `.env.example`, `catalog-types.ts` comments and CHANGELOG updated
to the YAML layout. `AI_AGENT_ROLES_CATALOG_URL` base-URL contract unchanged.

Rebase onto develop + review (PR #231, comment 2509):
- semantic conflict: develop's 89edddc5 bumped fact-checker v2->3 (flags errors
  instead of confirming facts) in the now-deleted `.json`. Resolved the
  modify/delete by taking the deletion and porting develop's v3 `description` +
  `instructions` (en + ru) into the YAML and setting `version: 3` in index.yaml.
  Verified by `node scripts/check.mjs` going green against develop's unchanged
  content-hash lock (the ported YAML hashes byte-identically to the v3 JSON).
- doc fix: ai-agent-roles.service.ts catalog comment "untrusted JSON" -> YAML.
- doc fix: parseYaml docstring no longer claims `strict: true` rejects unknown
  custom tags (yaml@2.8.x warns + resolves to a plain scalar, then the type
  guard rejects it); the duplicate-key claim is kept.
- doc: note in check.mjs that `yaml` resolves from the repo-ROOT node_modules
  (via shamefully-hoist), not the catalog package's own pinned devDependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:38:50 +03:00
a 95d07d8d6f fix(ai): align reindex live denominator with the steady-state count
Review fixes for the reindex-progress counter (#242):

1. Denominator jump (478 -> 500 -> 478): reindexWorkspace iterated
   getIdsByWorkspace() (ALL non-deleted pages) but the seed/status use
   countEmbeddablePages (text OR existing-embedding), so the live total exceeded
   the steady-state total whenever empty/text-less pages existed. Add
   PageRepo.getEmbeddablePageIds() that selects the IDs of the EXACT same set
   countEmbeddablePages counts (deletedAt IS NULL AND (text_content matches a
   non-whitespace char OR an EXISTS non-deleted pageEmbeddings row)), and have
   reindexWorkspace iterate THAT set with total = its length. Iteration set and
   count source change together, so done reaches exactly total == the
   steady-state denominator. Dropping text-less pages is correct (reindexPage
   no-ops on them; a page that lost its text but still has stale embeddings is in
   the set via the EXISTS clause and still gets its stale rows cleared). Removed
   the contradictory "worker overwrites with the real page count" / "denominator
   matches" comment.

2. Mid-run re-trigger reset: reindex() unconditionally re-seeded done=0 before an
   enqueue that de-dupes a running job, so a second click/admin/tab reset the
   visible counter while the worker kept incrementing. Now seed only when
   get(workspaceId) === null; the worker's own start() remains the single
   authoritative reset.

3. TTL: documented that it is intentionally tied to write progress
   (start/increment) and never refreshed on get(), so a dead worker's record
   can't be kept alive forever by client polling.

Tests: new embedding-reindex-progress.service.spec.ts (fake ioredis: hash ->
ReindexProgress, malformed/missing/non-numeric -> null, non-finite startedAt ->
0, hgetall throws -> null, start/increment issue hset/hincrby+expire and swallow
Redis errors); reindex() seed order + no-reseed-when-active guard; getMasked
live test now uses progress.total=500 vs DB 478 to pin the progress branch;
indexer specs updated to mock getEmbeddablePageIds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00