F4: extract the reindex button `loading` predicate into a pure, unit-tested
`isReindexButtonLoading({ mutationPending, deadline, status })` next to the
other reindex helpers, replacing the inline JSX expression. Covers the
load-bearing post-cap case (deadline nulled, reindexing stale-true -> not
loading) plus mutationPending, active-run, and finished cases.
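
A minimal sketch of the predicate as described above, not the committed
code. The `ReindexStatus` and `ReindexButtonLoadingInput` type names and
the millisecond-timestamp `deadline` are assumptions made for illustration;
only the function name and its three inputs come from this message.

```typescript
// Hypothetical status shape: only the `reindexing` flag matters here.
interface ReindexStatus {
  reindexing?: boolean;
}

// Hypothetical input type; the three field names come from the commit message.
interface ReindexButtonLoadingInput {
  mutationPending: boolean;
  deadline: number | null; // assumed: epoch ms, null once the poll window closed
  status: ReindexStatus | undefined;
}

function isReindexButtonLoading({
  mutationPending,
  deadline,
  status,
}: ReindexButtonLoadingInput): boolean {
  // The request that triggers the reindex is still in flight.
  if (mutationPending) return true;
  // Trust `reindexing` only while the poll window is open; after the cap
  // nulls the deadline, a stale `reindexing: true` must not keep the spinner.
  return deadline !== null && status?.reindexing === true;
}
```

The post-cap case is the second branch: with `deadline === null` the stale
flag is ignored, so the button becomes clickable again.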
F5: rewrite the `useAiSettingsQuery` poll comment to match the actual
`nextReindexPollInterval` stop condition (continues while reindexing===true OR
within deadline and not fully indexed; stops only when reindexing===false &&
indexed>=total, or the deadline cap) instead of the stale "until indexed===total".
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
F1: clear the "Reindex now" spinner once the poll cap fires. Gate the
reindexing part of the button's loading state on the active poll window
(reindexDeadline !== null) so a run that outlives the 120s cap no longer
leaves the button stuck-disabled with a stale `reindexing: true`; the
admin can restart.
F2: rewrite reindexWorkspace JSDoc to describe the EMBEDDABLE page set
(text OR existing embeddings), matching getEmbeddablePageIds /
countEmbeddablePages instead of the old "every non-deleted page".
F3: extract the shared embeddable-content predicate into a private
PageRepo.embeddablePredicate helper, called by both countEmbeddablePages
and getEmbeddablePageIds, removing the verbatim duplication. Behavior is
identical (lockstep int-spec stays green).
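
To illustrate why one shared predicate keeps the two methods in lockstep,
here is an in-memory model of the embeddable-page rule. It is only a sketch:
the real helper builds a SQL WHERE clause, and the `PageRow` / `EmbeddingRow`
shapes and `makeEmbeddablePredicate` name are hypothetical.

```typescript
// Hypothetical row shapes standing in for the pages / pageEmbeddings tables.
interface PageRow {
  id: string;
  deletedAt: Date | null;
  textContent: string | null;
}

interface EmbeddingRow {
  pageId: string;
  deletedAt: Date | null;
}

// Shared rule: not deleted AND (has a non-whitespace char OR has at least
// one non-deleted embedding row).
function makeEmbeddablePredicate(
  embeddings: EmbeddingRow[],
): (page: PageRow) => boolean {
  const pagesWithLiveEmbeddings = new Set(
    embeddings.filter((e) => e.deletedAt === null).map((e) => e.pageId),
  );
  return (page) =>
    page.deletedAt === null &&
    (/\S/.test(page.textContent ?? "") || pagesWithLiveEmbeddings.has(page.id));
}

// Both callers go through the same predicate, so they cannot drift apart.
function countEmbeddablePages(pages: PageRow[], embeddings: EmbeddingRow[]): number {
  return pages.filter(makeEmbeddablePredicate(embeddings)).length;
}

function getEmbeddablePageIds(pages: PageRow[], embeddings: EmbeddingRow[]): string[] {
  return pages.filter(makeEmbeddablePredicate(embeddings)).map((p) => p.id);
}
```

A text-less page with a live embedding row stays in the set, so a reindex
still visits it and can clear its stale rows.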
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Delete the now-orphaned PageRepo.getIdsByWorkspace (its only caller,
reindexWorkspace, switched to getEmbeddablePageIds). Its docstring still
claimed "Used by the RAG bulk reindex"; re-grep confirmed zero callers.
- ai-settings.service.reindex(): if aiQueue.add() throws (Redis hiccup or
shutdown), the worker never runs, so its finally->clear() never fires and
the seeded progress record stays stuck for the full 1h TTL (button stuck
at "reindexing: 0 of N"). Roll back only the seed THIS call wrote (tracked
by a seeded flag, set only when get() returned null) before re-throwing,
so a concurrent active run's record is never wiped. Add tests for both the
clear-on-throw and the don't-clear-a-concurrent-run paths.
- Add an integration spec (real Postgres) proving getEmbeddablePageIds'
WHERE stays in lockstep with countEmbeddablePages: seeds every boundary
case and asserts the returned id set equals the count.
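
The seed-and-rollback flow in the second bullet can be sketched roughly as
follows. The `ProgressStore` and `ReindexQueue` interfaces and the
`seedAndEnqueueReindex` name are hypothetical stand-ins for the progress
service and aiQueue; the real method signatures may differ.

```typescript
interface ReindexProgress {
  total: number;
  done: number;
  startedAt: number;
}

// Hypothetical minimal view of the progress service.
interface ProgressStore {
  get(workspaceId: string): Promise<ReindexProgress | null>;
  start(workspaceId: string, total: number): Promise<void>;
  clear(workspaceId: string): Promise<void>;
}

// Hypothetical minimal view of the job queue.
interface ReindexQueue {
  add(workspaceId: string): Promise<void>;
}

async function seedAndEnqueueReindex(
  workspaceId: string,
  total: number,
  progress: ProgressStore,
  queue: ReindexQueue,
): Promise<void> {
  // Seed only when no record exists, so an active run's counter is not reset.
  const seeded = (await progress.get(workspaceId)) === null;
  if (seeded) await progress.start(workspaceId, total);

  try {
    await queue.add(workspaceId);
  } catch (err) {
    // The worker will never run, so nothing else would clear the record.
    // Undo only the seed written by this call; a record that already
    // existed belongs to a concurrent run and is left alone.
    if (seeded) await progress.clear(workspaceId);
    throw err;
  }
}
```

The sketch assumes `clear()` swallows its own Redis errors, as the progress
service is described as best-effort; otherwise the rollback would need its
own guard so the original enqueue error is still the one re-thrown.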
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review fixes for the reindex-progress counter (#242):
1. Denominator jump (478 -> 500 -> 478): reindexWorkspace iterated
getIdsByWorkspace() (ALL non-deleted pages) but the seed/status use
countEmbeddablePages (text OR existing-embedding), so the live total exceeded
the steady-state total whenever empty/text-less pages existed. Add
PageRepo.getEmbeddablePageIds() that selects the IDs of the EXACT same set
countEmbeddablePages counts (deletedAt IS NULL AND (text_content matches a
non-whitespace char OR an EXISTS non-deleted pageEmbeddings row)), and have
reindexWorkspace iterate THAT set with total = its length. Iteration set and
count source change together, so done reaches exactly total == the
steady-state denominator. Dropping text-less pages is correct (reindexPage
no-ops on them; a page that lost its text but still has stale embeddings is in
the set via the EXISTS clause and still gets its stale rows cleared). Removed
the contradictory "worker overwrites with the real page count" / "denominator
matches" comment.
2. Mid-run re-trigger reset: reindex() unconditionally re-seeded done=0 before an
enqueue that de-dupes a running job, so a second click/admin/tab reset the
visible counter while the worker kept incrementing. Now seed only when
get(workspaceId) === null; the worker's own start() remains the single
authoritative reset.
3. TTL: document that it is intentionally tied to progress writes
(start/increment) and never refreshed on get(), so client polling can't
keep a dead worker's record alive forever.
Tests: new embedding-reindex-progress.service.spec.ts (fake ioredis: hash ->
ReindexProgress, malformed/missing/non-numeric -> null, non-finite startedAt ->
0, hgetall throws -> null, start/increment issue hset/hincrby+expire and swallow
Redis errors); reindex() seed order + no-reseed-when-active guard; getMasked
live test now uses progress.total=500 vs DB 478 to pin the progress branch;
indexer specs updated to mock getEmbeddablePageIds.
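
For reference, the hash-parsing behavior those specs pin down could look
roughly like this. The `parseReindexProgress` and `toCount` names are
hypothetical, and treating only non-negative integers as valid counts is an
assumption of this sketch.

```typescript
interface ReindexProgress {
  total: number;
  done: number;
  startedAt: number;
}

// Assumed validation rule: a count must be a non-negative integer string.
function toCount(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : null;
}

// Turns the string fields of a Redis hash into a ReindexProgress.
// A missing key reads back as an empty hash, which lands in the null
// branch because `total` and `done` are absent.
function parseReindexProgress(
  hash: Record<string, string | undefined>,
): ReindexProgress | null {
  const total = toCount(hash["total"]);
  const done = toCount(hash["done"]);
  if (total === null || done === null) return null;

  // startedAt is informational: fall back to 0 instead of rejecting the record.
  const startedAtRaw = Number(hash["startedAt"]);
  const startedAt = Number.isFinite(startedAtRaw) ? startedAtRaw : 0;

  return { total, done, startedAt };
}
```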
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the "Indexed X of Y pages" counter update in near real time during a
reindex by tracking the server's active-run state instead of a pure time
window:
- Set REINDEX_POLL_INTERVAL to 5000ms (kept bounded by the cap).
- Extract two pure, exported, unit-tested helpers:
- nextReindexPollInterval: keep polling while the server reports an ACTIVE run
(reindexing===true) OR within the deadline and not yet done; stop once the
run is finished AND fully indexed (reindexing===false && indexed>=total) or
the deadline cap is hit (the cap always wins, so a stuck/never-clearing
progress record can't poll forever).
- isReindexComplete: deadline-clear predicate mirroring that stop condition.
- Wire the refetchInterval and the deadline-clearing effect to those helpers.
- Keep the Reindex button spinner active for the whole run (loading also while
settings.reindexing), reusing the existing loading prop; also blocks a
redundant mid-run re-trigger (server de-dupes regardless).
No SSE/websockets: polling keyed on the reindexing flag is the intended scope.
The counter now tracks the actual active-reindex state and stops promptly when
the server reports the run is done.
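
A minimal sketch of the two helpers under the stop condition described
above. The `AiIndexStatus` shape, the `indexedPages` / `totalPages` field
names, the argument order, the explicit `now` parameter, and treating an
absent `reindexing` flag as "not running" are all assumptions; the
`number | false` return mirrors the usual refetchInterval convention.

```typescript
const REINDEX_POLL_INTERVAL = 5000; // ms

// Hypothetical status shape returned by the settings query.
interface AiIndexStatus {
  indexedPages: number;
  totalPages: number;
  reindexing?: boolean;
}

// True once the server reports no active run AND everything is indexed.
function isReindexComplete(status: AiIndexStatus | undefined): boolean {
  if (status === undefined) return false;
  return status.reindexing !== true && status.indexedPages >= status.totalPages;
}

// Returns the poll delay in ms, or false to stop polling.
function nextReindexPollInterval(
  status: AiIndexStatus | undefined,
  deadline: number | null,
  now: number,
): number | false {
  // The cap always wins: no open window or an expired one means no polling,
  // even if a stuck progress record still says reindexing === true.
  if (deadline === null || now >= deadline) return false;
  // Inside the window, stop as soon as the run is finished and fully indexed.
  if (isReindexComplete(status)) return false;
  return REINDEX_POLL_INTERVAL;
}
```

Inside the window, an active run and a not-yet-complete count both reduce
to "not complete", which is why a single `isReindexComplete` check covers
the two "keep polling" conditions.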
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "Indexed X of Y pages" counter stayed stuck at "478 of 478" during a
manual "Reindex now" run instead of resetting to 0 and climbing. The status
reports indexedPages = countIndexedPages (DISTINCT pages with >=1 embedding
row), but reindex hard-replaces each page in its OWN small transaction, so
nearly all pages always have rows -> the count never drops.
Add a per-workspace live reindex-progress record in Redis (reusing the
existing global ioredis client via RedisService, no new Redis config):
- EmbeddingReindexProgressService: start/increment/clear/get over a Redis hash
with a self-cleaning 1h TTL; every operation is best-effort and cosmetic, so
a Redis failure degrades to the existing DB-count behavior.
- AiSettingsService.reindex seeds {total, done:0, startedAt} at enqueue time so
the very first poll already reports done=0.
- EmbeddingIndexerService.reindexWorkspace overwrites total with the real page
count at start, increments done per processed page (success or handled
failure), and clears the record in a finally (covers success, fatal abort,
and the unconfigured early-return) so a failed run never sticks.
- AiSettingsService.getMasked returns the live run numbers when a progress
record is active (plus an optional reindexing flag), else falls back to
countIndexedPages/countEmbeddablePages.
Per-page edits (reindexPage) never touch the workspace progress record, and no
mass up-front delete is introduced (search availability preserved).
Tests: indexer sets/increments/clears progress (incl. fatal abort and
unconfigured early-return); status reports run progress when active and falls
back when not.
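
The status fallback described for getMasked can be sketched as a small pure
function. The `resolveIndexCounts` name and the `IndexCounts` shape are
hypothetical, and always emitting `reindexing` (rather than omitting it when
idle) is an assumption of this sketch.

```typescript
interface ReindexProgress {
  total: number;
  done: number;
  startedAt: number;
}

// Hypothetical result shape for the counter shown in the settings UI.
interface IndexCounts {
  indexedPages: number;
  totalPages: number;
  reindexing: boolean;
}

function resolveIndexCounts(
  progress: ReindexProgress | null,
  dbIndexedPages: number, // stands in for countIndexedPages
  dbEmbeddablePages: number, // stands in for countEmbeddablePages
): IndexCounts {
  if (progress !== null) {
    // An active run: report its live numbers so the counter starts at 0
    // and climbs, instead of showing the near-constant DB count.
    return { indexedPages: progress.done, totalPages: progress.total, reindexing: true };
  }
  // No active record (or Redis unavailable): fall back to the DB counts.
  return { indexedPages: dbIndexedPages, totalPages: dbEmbeddablePages, reindexing: false };
}
```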
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>