Editorial roles (Corrector/Factchecker) brute-forced `get_node` block-by-block to
find occurrences (unquoted «ё», straight quotes, «т.е.»), burning tokens. New
`search_in_page(pageId, query, {regex?, caseSensitive?, limit?})` reads the page's
ProseMirror JSON via the existing getPageRaw and searches it IN MEMORY — no server
endpoint, no DB/schema change, no touch to the packages/mcp/src/lib schema mirror.
New pure `searchInDoc(doc, query, opts)` (packages/mcp/src/lib/page-search.ts):
recursive descent to each TEXT CONTAINER (paragraph/heading/table-cell paragraph),
glues its inline text via `blockPlainText` (a match survives inline-mark
boundaries — e.g. «т.е.» split across bold/italic), searches literal (indexOf) or
regex, and returns `{ total, truncated, matches:[{ nodeId, blockIndex, type,
before, match, after }] }`. `nodeId` is the container's attrs.id or the
`#<topLevelIndex>` of the enclosing top-level block — the SAME ref format
get_node/patch_node/comment-anchoring accept (verified identical to getNodeByRef),
so the agent goes straight from a hit to a targeted comment; `before`/`after` are
~40-char windows for a unique selection. `total`/`truncated` always reported (never
silent truncation). Lives in the SHARED_TOOL_SPECS registry → exposed in BOTH
transports (external /mcp + in-app AI-chat), with a SERVER_INSTRUCTIONS line and a
DocmostClientLike signature + contract-test entry. Corrector/Factchecker prompts
get a one-line "use search_in_page first" hint (versions bumped, catalog hash lock
refreshed).
Guards: empty/whitespace query → clear error; invalid regex → clear error (not a
generic 500); zero-length regex matches (`\b`, `a*`) skipped with lastIndex
advanced (no loop/flood); MAX_PATTERN_LENGTH=1000, MAX_CONTAINER_TEXT=100k bound
each exec; limit clamped [1,200] (default 50).
Tests: new page-search.test.mjs (17) — literal+regex, case-sensitivity,
mark-boundary glue, nodeId for paragraph/heading (attrs.id) and table-cell
(#<index> fallback), context bounds, limit/total/truncated + clamp, invalid
regex/empty/over-long errors, zero-length skip, empty-doc null-safety.
mcp: tsc clean; node --test 467 passed (+17). apps/server: tsc --noEmit clean
(DocmostClientLike + wiring). catalog check.mjs OK.
Known limitations (from internal review, non-blocking):
- Residual ReDoS: a crafted catastrophic-backtracking pattern (e.g. `(a+)+$`)
against a large single container can hang the event loop — JS regex is not
interruptible, so the length caps bound the base but not the backtracking.
Realistic exposure is low (containers are small; the pattern is supplied by the
authenticated model). Candidate for a follow-up hardening (safe-regex validation
or a worker+timeout) if it matters.
- Case-insensitive LITERAL search folds via toLowerCase; a char whose lowercase
differs in length (e.g. Turkish İ) BEFORE a match could shift the context
window — negligible for the RU/EN editorial scenario.
- On a `#<index>` table-cell fallback, `type` is the inline container ("paragraph")
while nodeId addresses the top-level block — addressing is correct; the field is
documented as the container's type.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chat now shows the agent's name in the comment header, so the '[Copyedit]' /
'[Structure]' / '[Style]' / '[Facts]' prefix each role prepended just
duplicated the visible author.
- Remove the 'open the comment with the label [Role]' instruction from all four
labelled roles (structural-editor, line-editor, fact-checker, proofreader),
ru + en; the narrator was already label-free.
- Severity tags ([Critical]/[Major]/[Minor]) and the fact-checker's verdicts
([Incorrect]/[Unverified]/…) are kept — they carry meaning, not the role name.
- Versions bumped: structural-editor 3->4, line-editor 3->4, fact-checker 4->5,
proofreader 6->7; content-hash lock refreshed.
Check: agent-roles-catalog check.mjs OK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The copyeditor had to be re-run several times to surface all issues: it has no
'work the whole document' instruction (unlike the developmental editor and the
narrator), and the severity labels nudge it toward reporting only the salient
few.
- Add a HOW TO WORK section (ru + en): one pass over the whole text start to
finish; flag EVERY violation including all repeat occurrences and [Minor]
items; don't summarize instead of marking up; one run covers the whole text,
not just 'the most important'.
- proofreader version 5 -> 6, content-hash lock refreshed.
Check: agent-roles-catalog check.mjs OK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The copyeditor was emitting summary comments ('throughout, unify KB'; 'switch
the decimal separator everywhere') that carry no applicable suggestedText — a
human can't one-click any of them. This was actively instructed: the TONE
section told it to 'group repeated fixes so you don't spawn dozens of identical
comments'.
- Invert that TONE guidance: spread repeated fixes across the specific spots;
ten targeted comments each with a ready replacement beat one blanket note;
'spawning' comments is normal for a copyeditor.
- HOW-TO: explicitly forbid summary/consistency notes and require a separate
targeted comment with its own suggestedText on EVERY occurrence; the only
exception is a note that genuinely can't be a fragment replacement.
- ru + en mirrored; proofreader version 4 -> 5, content-hash lock refreshed.
Check: agent-roles-catalog check.mjs OK.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- editorial roles (ru/en): proofreader and line editor attach suggestedText
replacements to targeted fixes; fact-checker ALWAYS attaches the ready
correction for [Incorrect] verdicts; structural editor and narrator get a
light-touch rule for in-place rewordings; role versions bumped and the
content-hash lock refreshed
- MCP SERVER_INSTRUCTIONS: route 'propose a concrete text fix for one-click
human approval' to create_comment with suggestedText (unique-selection
reminder); build/ artifacts rebuilt
- AI-chat SAFETY_FRAMEWORK: mention the comment-suggestion capability so the
default assistant offers ready fixes instead of only describing changes
Checks: catalog check.mjs OK; @docmost/mcp tests 448/448; server
ai-chat.prompt spec 28/28.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rework the fact-checker editorial role prompt so it stops commenting on
correct facts and only flags problems (errors, doubtful, unverifiable).
- Add the directive "don't write/comment that a fact is right or confirmed:
your job is to find errors, not confirm facts" to both RU and EN bundles.
- Remove the [Подтверждено]/[Verified] verdict; reframe the verdict list as
"for problem claims only".
- Reword the role description (no longer "confirms") and the
comment-on-every-claim rule to "problem claims only".
- Bump fact-checker role version 2 -> 3 and refresh the content-hash lock.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The proofreader role content was changed (STYLE SHEET block removed) without
bumping its catalog version, so clients never saw an update. Bump proofreader
2 -> 3, and add a content-hash guard so this can't happen silently again.
- index.json: proofreader version 2 -> 3
- scripts/check.mjs: new content-hash guard. A scripts/content-hashes.json lock
maps slug -> { version, hash } (sha256 over emoji/autoStart/name/description/
instructions/launchMessage across all languages). check.mjs now fails when a
role's content changed without bumping its version; the new --update-hashes
(alias --fix) refreshes the lock but refuses to write when a bump is missing.
- check.mjs: also require every index.json role to carry a finite numeric
version (matches the server's catalog validation), with defense-in-depth so a
missing version can't bypass the bump guard.
- scripts/content-hashes.json: new lock artifact (not part of the served catalog).
- README.md: document the guard, the lockfile, --update-hashes, and the
prune-then-readd limitation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>