fix(ai): show live reindex progress so the embeddings counter resets to 0 and climbs #242

Open
Ghost wants to merge 6 commits from fix/embeddings-reindex-progress into develop

6 Commits

Author SHA1 Message Date
claude code agent 227
bdc033e689 fix(ai): extract reindex-button loading predicate + correct poll comment (PR #242)
F4: extract the reindex button `loading` predicate into a pure, unit-tested
`isReindexButtonLoading({ mutationPending, deadline, status })` next to the
other reindex helpers, replacing the inline JSX expression. Covers the
load-bearing post-cap case (deadline nulled, reindexing stale-true -> not
loading) plus mutationPending, active-run, and finished cases.

F5: rewrite the `useAiSettingsQuery` poll comment to match the actual
`nextReindexPollInterval` stop condition (continues while reindexing===true OR
within deadline and not fully indexed; stops only when reindexing===false &&
indexed>=total, or the deadline cap) instead of the stale "until indexed===total".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 01:49:55 +03:00
claude code agent 227
85b38d6946 fix(ai): address reindex-progress review round 1 (PR #242)
F1: clear the "Reindex now" spinner once the poll cap fires. Gate the
reindexing part of the button's loading state on the active poll window
(reindexDeadline !== null) so a run that outlives the 120s cap no longer
leaves the button stuck-disabled with a stale `reindexing: true`; the
admin can restart.

F2: rewrite reindexWorkspace JSDoc to describe the EMBEDDABLE page set
(text OR existing embeddings), matching getEmbeddablePageIds /
countEmbeddablePages instead of the old "every non-deleted page".

F3: extract the shared embeddable-content predicate into a private
PageRepo.embeddablePredicate helper, called by both countEmbeddablePages
and getEmbeddablePageIds, removing the verbatim duplication. Behavior is
identical (lockstep int-spec stays green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:39:20 +03:00
claude_code
bf09eec4e1 fix(ai): address reindex-progress review (PR #242)
- Delete the now-orphaned PageRepo.getIdsByWorkspace (its only caller,
  reindexWorkspace, switched to getEmbeddablePageIds). Its docstring still
  claimed "Used by the RAG bulk reindex"; re-grep confirmed zero callers.
- ai-settings.service.reindex(): if aiQueue.add() throws (Redis hiccup/
  shutdown) the worker never runs so its finally->clear() never fires,
  leaving the seeded progress record stuck for the full 1h TTL (button
  stuck "reindexing: 0 of N"). Roll back the seed THIS call wrote
  (seeded flag, only when get() was null) before re-throwing, so a
  concurrent active run's record is never wiped. Add tests for both the
  clear-on-throw and the don't-clear-a-concurrent-run paths.
- Add an integration spec (real Postgres) proving getEmbeddablePageIds'
  WHERE stays in lockstep with countEmbeddablePages: seeds every boundary
  case and asserts the returned id set equals the count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:39:18 +03:00
a
95d07d8d6f fix(ai): align reindex live denominator with the steady-state count
Review fixes for the reindex-progress counter (#242):

1. Denominator jump (478 -> 500 -> 478): reindexWorkspace iterated
   getIdsByWorkspace() (ALL non-deleted pages) but the seed/status use
   countEmbeddablePages (text OR existing-embedding), so the live total exceeded
   the steady-state total whenever empty/text-less pages existed. Add
   PageRepo.getEmbeddablePageIds() that selects the IDs of the EXACT same set
   countEmbeddablePages counts (deletedAt IS NULL AND (text_content matches a
   non-whitespace char OR an EXISTS non-deleted pageEmbeddings row)), and have
   reindexWorkspace iterate THAT set with total = its length. Iteration set and
   count source change together, so done reaches exactly total == the
   steady-state denominator. Dropping text-less pages is correct (reindexPage
   no-ops on them; a page that lost its text but still has stale embeddings is in
   the set via the EXISTS clause and still gets its stale rows cleared). Removed
   the contradictory "worker overwrites with the real page count" / "denominator
   matches" comment.

2. Mid-run re-trigger reset: reindex() unconditionally re-seeded done=0 before an
   enqueue that de-dupes a running job, so a second click/admin/tab reset the
   visible counter while the worker kept incrementing. Now seed only when
   get(workspaceId) === null; the worker's own start() remains the single
   authoritative reset.

3. TTL: documented that it is intentionally tied to write progress
   (start/increment) and never refreshed on get(), so a dead worker's record
   can't be kept alive forever by client polling.

Tests: new embedding-reindex-progress.service.spec.ts (fake ioredis: hash ->
ReindexProgress, malformed/missing/non-numeric -> null, non-finite startedAt ->
0, hgetall throws -> null, start/increment issue hset/hincrby+expire and swallow
Redis errors); reindex() seed order + no-reseed-when-active guard; getMasked
live test now uses progress.total=500 vs DB 478 to pin the progress branch;
indexer specs updated to mock getEmbeddablePageIds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00
a
630939e8f3 feat(ai): tighten reindex-progress polling on the reindexing flag
Make the "Indexed N of N" counter update near-realtime during a reindex by
tracking the server's active-run state instead of a pure time window:

- Set REINDEX_POLL_INTERVAL to 5000ms (kept bounded by the cap).
- Extract two pure, exported, unit-tested helpers:
  - nextReindexPollInterval: keep polling while the server reports an ACTIVE run
    (reindexing===true) OR within the deadline and not yet done; stop once the
    run is finished AND fully indexed (reindexing===false && indexed>=total) or
    the deadline cap is hit (the cap always wins, so a stuck/never-clearing
    progress record can't poll forever).
  - isReindexComplete: deadline-clear predicate mirroring that stop condition.
- Wire the refetchInterval and the deadline-clearing effect to those helpers.
- Keep the Reindex button spinner active for the whole run (loading also while
  settings.reindexing), reusing the existing loading prop; also blocks a
  redundant mid-run re-trigger (server de-dupes regardless).

No SSE/websockets: polling keyed on the reindexing flag is the intended scope.
The counter now tracks the actual active-reindex state and stops promptly when
the server reports the run is done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00
a
72bb03918d fix(ai): show live reindex progress in semantic-search settings
The "Indexed X of Y pages" counter stayed stuck at "478 of 478" during a
manual "Reindex now" run instead of resetting to 0 and climbing. The status
reports indexedPages = countIndexedPages (DISTINCT pages with >=1 embedding
row), but reindex hard-replaces each page in its OWN small transaction, so
nearly all pages always have rows -> the count never drops.

Add a per-workspace live reindex-progress record in Redis (reusing the
existing global ioredis client via RedisService, no new Redis config):
- EmbeddingReindexProgressService: start/increment/clear/get over a Redis hash
  with a 1h TTL self-clean; all best-effort/cosmetic so a Redis failure degrades
  to the existing DB-count behavior.
- AiSettingsService.reindex seeds {total, done:0, startedAt} at enqueue time so
  the very first poll already reports done=0.
- EmbeddingIndexerService.reindexWorkspace overwrites total with the real page
  count at start, increments done per processed page (success or handled
  failure), and clears the record in a finally (covers success, fatal abort,
  and the unconfigured early-return) so a failed run never sticks.
- AiSettingsService.getMasked returns the live run numbers when a progress
  record is active (plus an optional reindexing flag), else falls back to
  countIndexedPages/countEmbeddablePages.

Per-page edits (reindexPage) never touch the workspace progress record, and no
mass up-front delete is introduced (search availability preserved).

Tests: indexer sets/increments/clears progress (incl. fatal abort and
unconfigured early-return); status reports run progress when active and falls
back when not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 04:32:36 +03:00