vvzvlad/gitmost

fix(ai): show live reindex progress so the embeddings counter resets to 0 and climbs #242

Open

Ghost wants to merge 6 commits from fix/embeddings-reindex-progress into develop

Author	SHA1	Message	Date
claude code agent 227	bdc033e689	fix(ai): extract reindex-button loading predicate + correct poll comment (PR #242) F4: extract the reindex button `loading` predicate into a pure, unit-tested `isReindexButtonLoading({ mutationPending, deadline, status })` next to the other reindex helpers, replacing the inline JSX expression. Covers the load-bearing post-cap case (deadline nulled, reindexing stale-true -> not loading) plus mutationPending, active-run, and finished cases. F5: rewrite the `useAiSettingsQuery` poll comment to match the actual `nextReindexPollInterval` stop condition (continues while reindexing===true OR within deadline and not fully indexed; stops only when reindexing===false && indexed>=total, or the deadline cap) instead of the stale "until indexed===total". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 01:49:55 +03:00
claude code agent 227	85b38d6946	fix(ai): address reindex-progress review round 1 (PR #242) F1: clear the "Reindex now" spinner once the poll cap fires. Gate the reindexing part of the button's loading state on the active poll window (reindexDeadline !== null) so a run that outlives the 120s cap no longer leaves the button stuck-disabled with a stale `reindexing: true`; the admin can restart. F2: rewrite reindexWorkspace JSDoc to describe the EMBEDDABLE page set (text OR existing embeddings), matching getEmbeddablePageIds / countEmbeddablePages instead of the old "every non-deleted page". F3: extract the shared embeddable-content predicate into a private PageRepo.embeddablePredicate helper, called by both countEmbeddablePages and getEmbeddablePageIds, removing the verbatim duplication. Behavior is identical (lockstep int-spec stays green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 23:39:20 +03:00
claude_code	bf09eec4e1	fix(ai): address reindex-progress review (PR #242) - Delete the now-orphaned PageRepo.getIdsByWorkspace (its only caller, reindexWorkspace, switched to getEmbeddablePageIds). Its docstring still claimed "Used by the RAG bulk reindex"; re-grep confirmed zero callers. - ai-settings.service.reindex(): if aiQueue.add() throws (Redis hiccup/shutdown) the worker never runs so its finally->clear() never fires, leaving the seeded progress record stuck for the full 1h TTL (button stuck "reindexing: 0 of N"). Roll back the seed THIS call wrote (seeded flag, only when get() was null) before re-throwing, so a concurrent active run's record is never wiped. Add tests for both the clear-on-throw and the don't-clear-a-concurrent-run paths. - Add an integration spec (real Postgres) proving getEmbeddablePageIds' WHERE stays in lockstep with countEmbeddablePages: seeds every boundary case and asserts the returned id set equals the count. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 04:39:18 +03:00
a	95d07d8d6f	fix(ai): align reindex live denominator with the steady-state count Review fixes for the reindex-progress counter (#242): 1. Denominator jump (478 -> 500 -> 478): reindexWorkspace iterated getIdsByWorkspace() (ALL non-deleted pages) but the seed/status use countEmbeddablePages (text OR existing-embedding), so the live total exceeded the steady-state total whenever empty/text-less pages existed. Add PageRepo.getEmbeddablePageIds() that selects the IDs of the EXACT same set countEmbeddablePages counts (deletedAt IS NULL AND (text_content matches a non-whitespace char OR an EXISTS non-deleted pageEmbeddings row)), and have reindexWorkspace iterate THAT set with total = its length. Iteration set and count source change together, so done reaches exactly total == the steady-state denominator. Dropping text-less pages is correct (reindexPage no-ops on them; a page that lost its text but still has stale embeddings is in the set via the EXISTS clause and still gets its stale rows cleared). Removed the contradictory "worker overwrites with the real page count" / "denominator matches" comment. 2. Mid-run re-trigger reset: reindex() unconditionally re-seeded done=0 before an enqueue that de-dupes a running job, so a second click/admin/tab reset the visible counter while the worker kept incrementing. Now seed only when get(workspaceId) === null; the worker's own start() remains the single authoritative reset. 3. TTL: documented that it is intentionally tied to write progress (start/increment) and never refreshed on get(), so a dead worker's record can't be kept alive forever by client polling. 
Tests: new embedding-reindex-progress.service.spec.ts (fake ioredis: hash -> ReindexProgress, malformed/missing/non-numeric -> null, non-finite startedAt -> 0, hgetall throws -> null, start/increment issue hset/hincrby+expire and swallow Redis errors); reindex() seed order + no-reseed-when-active guard; getMasked live test now uses progress.total=500 vs DB 478 to pin the progress branch; indexer specs updated to mock getEmbeddablePageIds. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 04:32:36 +03:00
a	630939e8f3	feat(ai): tighten reindex-progress polling on the reindexing flag Make the "Indexed N of N" counter update near-realtime during a reindex by tracking the server's active-run state instead of a pure time window: - Set REINDEX_POLL_INTERVAL to 5000ms (kept bounded by the cap). - Extract two pure, exported, unit-tested helpers: - nextReindexPollInterval: keep polling while the server reports an ACTIVE run (reindexing===true) OR within the deadline and not yet done; stop once the run is finished AND fully indexed (reindexing===false && indexed>=total) or the deadline cap is hit (the cap always wins, so a stuck/never-clearing progress record can't poll forever). - isReindexComplete: deadline-clear predicate mirroring that stop condition. - Wire the refetchInterval and the deadline-clearing effect to those helpers. - Keep the Reindex button spinner active for the whole run (loading also while settings.reindexing), reusing the existing loading prop; also blocks a redundant mid-run re-trigger (server de-dupes regardless). No SSE/websockets: polling keyed on the reindexing flag is the intended scope. The counter now tracks the actual active-reindex state and stops promptly when the server reports the run is done. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 04:32:36 +03:00
a	72bb03918d	fix(ai): show live reindex progress in semantic-search settings The "Indexed X of Y pages" counter stayed stuck at "478 of 478" during a manual "Reindex now" run instead of resetting to 0 and climbing. The status reports indexedPages = countIndexedPages (DISTINCT pages with >=1 embedding row), but reindex hard-replaces each page in its OWN small transaction, so nearly all pages always have rows -> the count never drops. Add a per-workspace live reindex-progress record in Redis (reusing the existing global ioredis client via RedisService, no new Redis config): - EmbeddingReindexProgressService: start/increment/clear/get over a Redis hash with a 1h TTL self-clean; all best-effort/cosmetic so a Redis failure degrades to the existing DB-count behavior. - AiSettingsService.reindex seeds {total, done:0, startedAt} at enqueue time so the very first poll already reports done=0. - EmbeddingIndexerService.reindexWorkspace overwrites total with the real page count at start, increments done per processed page (success or handled failure), and clears the record in a finally (covers success, fatal abort, and the unconfigured early-return) so a failed run never sticks. - AiSettingsService.getMasked returns the live run numbers when a progress record is active (plus an optional reindexing flag), else falls back to countIndexedPages/countEmbeddablePages. Per-page edits (reindexPage) never touch the workspace progress record, and no mass up-front delete is introduced (search availability preserved). Tests: indexer sets/increments/clears progress (incl. fatal abort and unconfigured early-return); status reports run progress when active and falls back when not. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 04:32:36 +03:00
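
The polling rules described in commits 630939e8f3, 85b38d6946 and bdc033e689 can be sketched as three pure helpers. This is a minimal illustration, not the PR's code: the helper names and the `{ mutationPending, deadline, status }` argument come from the commit messages, but the `ReindexStatus` shape, the positional `(status, deadline, now)` signature, and the handling of a missing `reindexing` flag are assumptions.

```typescript
// Assumed status shape; the commit log only names the fields reindexing, indexed and total.
interface ReindexStatus {
  reindexing?: boolean; // optional flag, present while the server reports an active run
  indexed: number;
  total: number;
}

// 5000 ms per commit 630939e8f3.
const REINDEX_POLL_INTERVAL = 5000;

// Stop condition from the commits: run finished AND fully indexed.
// Assumption: a missing flag is treated the same as reindexing === false.
function isReindexComplete(status: ReindexStatus | undefined): boolean {
  if (!status) return false; // no data yet, so nothing can be called complete
  return status.reindexing !== true && status.indexed >= status.total;
}

// Returns the next poll delay in ms, or false to stop polling.
// The deadline cap always wins, so a stale `reindexing: true` cannot poll forever.
function nextReindexPollInterval(
  status: ReindexStatus | undefined,
  deadline: number | null, // epoch ms; null means no active poll window
  now: number = Date.now(),
): number | false {
  if (deadline === null || now >= deadline) return false; // cap hit or never started
  if (isReindexComplete(status)) return false; // finished and fully indexed
  return REINDEX_POLL_INTERVAL; // active run, or still catching up within the window
}

// Button spinner: the reindexing flag only counts inside the active poll window,
// so a run that outlives the cap does not leave the button stuck.
function isReindexButtonLoading(args: {
  mutationPending: boolean;
  deadline: number | null;
  status: ReindexStatus | undefined;
}): boolean {
  const { mutationPending, deadline, status } = args;
  return mutationPending || (deadline !== null && status?.reindexing === true);
}
```

A `number | false` return value fits a query library's `refetchInterval` option, which is how the commits describe wiring it. Keeping the helpers pure means the post-cap case (deadline nulled, `reindexing` stale-true) can be unit-tested without rendering the settings page.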
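
The test list in commit 95d07d8d6f describes how the Redis hash is read back: a well-formed hash becomes a `ReindexProgress`, a malformed, missing or non-numeric hash becomes `null`, and a non-finite `startedAt` becomes `0`. The sketch below shows one way to write that mapping as a pure function. The function name `parseReindexProgress`, the helper `toFiniteNumber`, and the exact field names are hypothetical; only `total`, `done` and `startedAt` appear in the commit messages.

```typescript
// Field names taken from the seed described in the commits: {total, done, startedAt}.
interface ReindexProgress {
  total: number;
  done: number;
  startedAt: number;
}

// Hypothetical helper: Redis hash values are strings, so reject anything
// that is missing, blank, or does not parse to a finite number.
function toFiniteNumber(value: string | undefined): number | null {
  if (typeof value !== "string" || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Hypothetical parser for the object an HGETALL-style read returns.
// An empty object stands for a missing key.
function parseReindexProgress(
  hash: Record<string, string> | null | undefined,
): ReindexProgress | null {
  if (!hash || Object.keys(hash).length === 0) return null; // no record
  const total = toFiniteNumber(hash.total);
  const done = toFiniteNumber(hash.done);
  if (total === null || done === null) return null; // malformed counters
  // startedAt is informational, so a bad value degrades to 0 instead of null.
  const startedAt = toFiniteNumber(hash.startedAt) ?? 0;
  return { total, done, startedAt };
}
```

Returning `null` for any unusable record matches the "best-effort/cosmetic" stance in commit 72bb03918d: the caller can then fall back to the database counts.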