Editorial roles (Corrector/Factchecker) brute-forced `get_node` block-by-block to
find occurrences (unquoted «ё», straight quotes, «т.е.»), burning tokens. New
`search_in_page(pageId, query, {regex?, caseSensitive?, limit?})` reads the page's
ProseMirror JSON via the existing getPageRaw and searches it IN MEMORY — no server
endpoint, no DB/schema change, no touch to the packages/mcp/src/lib schema mirror.
New pure `searchInDoc(doc, query, opts)` (packages/mcp/src/lib/page-search.ts):
recursive descent to each TEXT CONTAINER (paragraph/heading/table-cell paragraph),
glues its inline text via `blockPlainText` (a match survives inline-mark
boundaries — e.g. «т.е.» split across bold/italic), searches literal (indexOf) or
regex, and returns `{ total, truncated, matches:[{ nodeId, blockIndex, type,
before, match, after }] }`. `nodeId` is the container's attrs.id or the
`#<topLevelIndex>` of the enclosing top-level block — the SAME ref format
get_node/patch_node/comment-anchoring accept (verified identical to getNodeByRef),
so the agent goes straight from a hit to a targeted comment; `before`/`after` are
~40-char windows for a unique selection. `total`/`truncated` always reported (never
silent truncation). Lives in the SHARED_TOOL_SPECS registry → exposed in BOTH
transports (external /mcp + in-app AI-chat), with a SERVER_INSTRUCTIONS line and a
DocmostClientLike signature + contract-test entry. Corrector/Factchecker prompts
get a one-line "use search_in_page first" hint (versions bumped, catalog hash lock
refreshed).
Guards: empty/whitespace query → clear error; invalid regex → clear error (not a
generic 500); zero-length regex matches (`\b`, `a*`) skipped with lastIndex
advanced (no loop/flood); MAX_PATTERN_LENGTH=1000, MAX_CONTAINER_TEXT=100k bound
each exec; limit clamped [1,200] (default 50).
Tests: new page-search.test.mjs (17) — literal+regex, case-sensitivity,
mark-boundary glue, nodeId for paragraph/heading (attrs.id) and table-cell
(#<index> fallback), context bounds, limit/total/truncated + clamp, invalid
regex/empty/over-long errors, zero-length skip, empty-doc null-safety.
mcp: tsc clean; node --test 467 passed (+17). apps/server: tsc --noEmit clean
(DocmostClientLike + wiring). catalog check.mjs OK.
Known limitations (from internal review, non-blocking):
- Residual ReDoS: a crafted catastrophic-backtracking pattern (e.g. `(a+)+$`)
against a large single container can hang the event loop — JS regex is not
interruptible, so the length caps bound the base but not the backtracking.
Realistic exposure is low (containers are small; the pattern is supplied by the
authenticated model). Candidate for a follow-up hardening (safe-regex validation
or a worker+timeout) if it matters.
- Case-insensitive LITERAL search folds via toLowerCase; a char whose lowercase
differs in length (e.g. Turkish İ) BEFORE a match could shift the context
window — negligible for the RU/EN editorial scenario.
- On a `#<index>` table-cell fallback, `type` is the inline container ("paragraph")
while nodeId addresses the top-level block — addressing is correct; the field is
documented as the container's type.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chat now shows the agent's name in the comment header, so the '[Copyedit]' /
'[Structure]' / '[Style]' / '[Facts]' prefix each role prepended just
duplicated the visible author.
- Remove the 'open the comment with the label [Role]' instruction from all four
labelled roles (structural-editor, line-editor, fact-checker, proofreader),
ru + en; the narrator was already label-free.
- Severity tags ([Critical]/[Major]/[Minor]) and the fact-checker's verdicts
([Incorrect]/[Unverified]/…) are kept — they carry meaning, not the role name.
- Versions bumped: structural-editor 3->4, line-editor 3->4, fact-checker 4->5,
proofreader 6->7; content-hash lock refreshed.
Check: agent-roles-catalog check.mjs OK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The copyeditor had to be re-run several times to surface all issues: it has no
'work the whole document' instruction (unlike the developmental editor and the
narrator), and the severity labels nudge it toward reporting only the salient
few.
- Add a HOW TO WORK section (ru + en): one pass over the whole text start to
finish; flag EVERY violation including all repeat occurrences and [Minor]
items; don't summarize instead of marking up; one run covers the whole text,
not just 'the most important'.
- proofreader version 5 -> 6, content-hash lock refreshed.
Check: agent-roles-catalog check.mjs OK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The copyeditor was emitting summary comments ('throughout, unify KB'; 'switch
the decimal separator everywhere') that carry no applicable suggestedText — a
human can't one-click any of them. This was actively instructed: the TONE
section told it to 'group repeated fixes so you don't spawn dozens of identical
comments'.
- Invert that TONE guidance: spread repeated fixes across the specific spots;
ten targeted comments each with a ready replacement beat one blanket note;
'spawning' comments is normal for a copyeditor.
- HOW-TO: explicitly forbid summary/consistency notes and require a separate
targeted comment with its own suggestedText on EVERY occurrence; the only
exception is a note that genuinely can't be a fragment replacement.
- ru + en mirrored; proofreader version 4 -> 5, content-hash lock refreshed.
Check: agent-roles-catalog check.mjs OK.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- editorial roles (ru/en): proofreader and line editor attach suggestedText
replacements to targeted fixes; fact-checker ALWAYS attaches the ready
correction for [Incorrect] verdicts; structural editor and narrator get a
light-touch rule for in-place rewordings; role versions bumped and the
content-hash lock refreshed
- MCP SERVER_INSTRUCTIONS: route 'propose a concrete text fix for one-click
human approval' to create_comment with suggestedText (unique-selection
reminder); build/ artifacts rebuilt
- AI-chat SAFETY_FRAMEWORK: mention the comment-suggestion capability so the
default assistant offers ready fixes instead of only describing changes
Checks: catalog check.mjs OK; @docmost/mcp tests 448/448; server
ai-chat.prompt spec 28/28.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The agent-roles catalog content files move from JSON to YAML so each role's long
`instructions` system prompt is stored as a literal block scalar (`|-`): editing
one sentence now produces a line-by-line diff and the prompt is editable as plain
multi-line text instead of a single escaped JSON string.
Data:
- `index.json` -> `index.yaml`, `bundles/<id>/<lang>.json` -> `<lang>.yaml`
(old `.json` deleted). Converted programmatically via the `yaml` library with
`lineWidth: 0`; round-trip verified deepEqual against the old JSON, so the
resolved role content is byte-for-byte identical (the only `version` bump is
fact-checker v2->3, carried over from develop during the rebase; see below).
Server (`AiAgentRolesCatalogProvider`):
- parse with `yaml`'s safe default (JSON-compatible) schema instead of
`JSON.parse` — `strict: true` (rejects duplicate keys) and `maxAliasCount: 100`
(billion-laughs guard); no custom `!!` tags / no code execution. Fetched paths
become `index.yaml` / `<lang>.yaml`. The streaming 1 MB size cap,
`redirect: 'error'`, 10s timeout and `^[a-z0-9-]+$` path-traversal/SSRF guard
are unchanged; the hand-written type guards are untouched (`instructions` is
still a string after parsing).
- add `yaml` as a direct server dependency (already in the lockfile as a
transitive dep).
Catalog tooling:
- `scripts/check.mjs` parses the catalog as YAML (lockfile stays JSON); pin
`yaml` as a devDependency of the catalog package.
Tests:
- provider spec fixtures serialized with `yaml`; new tests for the block-scalar
`instructions` round-trip (exact multi-line string), malformed YAML and
strict duplicate-key rejection -> BadGateway; size-cap and path-traversal
cases retargeted to the `.yaml` paths.
Docs: README, `.env.example`, `catalog-types.ts` comments and CHANGELOG updated
to the YAML layout. `AI_AGENT_ROLES_CATALOG_URL` base-URL contract unchanged.
Rebase onto develop + review (PR #231, comment 2509):
- semantic conflict: develop's 89edddc5 bumped fact-checker v2->3 (flags errors
instead of confirming facts) in the now-deleted `.json`. Resolved the
modify/delete by taking the deletion and porting develop's v3 `description` +
`instructions` (en + ru) into the YAML and setting `version: 3` in index.yaml.
Verified by `node scripts/check.mjs` going green against develop's unchanged
content-hash lock (the ported YAML hashes byte-identically to the v3 JSON).
- doc fix: ai-agent-roles.service.ts catalog comment "untrusted JSON" -> YAML.
- doc fix: parseYaml docstring no longer claims `strict: true` rejects unknown
custom tags (yaml@2.8.x warns + resolves to a plain scalar, then the type
guard rejects it); the duplicate-key claim is kept.
- doc: note in check.mjs that `yaml` resolves from the repo-ROOT node_modules
(via shamefully-hoist), not the catalog package's own pinned devDependency.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rework the fact-checker editorial role prompt so it stops commenting on
correct facts and only flags problems (errors, doubtful, unverifiable).
- Add the directive "don't write/comment that a fact is right or confirmed:
your job is to find errors, not confirm facts" to both RU and EN bundles.
- Remove the [Подтверждено]/[Verified] verdict; reframe the verdict list as
"for problem claims only".
- Reword the role description (no longer "confirms") and the
comment-on-every-claim rule to "problem claims only".
- Bump fact-checker role version 2 -> 3 and refresh the content-hash lock.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The proofreader role content was changed (STYLE SHEET block removed) without
bumping its catalog version, so clients never saw an update. Bump proofreader
2 -> 3, and add a content-hash guard so this can't happen silently again.
- index.json: proofreader version 2 -> 3
- scripts/check.mjs: new content-hash guard. A scripts/content-hashes.json lock
maps slug -> { version, hash } (sha256 over emoji/autoStart/name/description/
instructions/launchMessage across all languages). check.mjs now fails when a
role's content changed without bumping its version; the new --update-hashes
(alias --fix) refreshes the lock but refuses to write when a bump is missing.
- check.mjs: also require every index.json role to carry a finite numeric
version (matches the server's catalog validation), with defense-in-depth so a
missing version can't bypass the bump guard.
- scripts/content-hashes.json: new lock artifact (not part of the served catalog).
- README.md: document the guard, the lockfile, --update-hashes, and the
prune-then-readd limitation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MUST-FIX
- isSourceUniqueViolation read the wrong error field: kysely-postgres-js
(postgres@3.4.8) puts the violated constraint on `constraint_name`, not
node-postgres' `.constraint`, so a concurrent same-slug+language import's
23505 was never recognized as a source-collision and surfaced a false
"name already exists" error. Now read `constraint_name` (with `.constraint`
as a fallback for other drivers). Fix the faked test fixture (it built the
error with the same wrong `.constraint` field, masking the bug): it now
uses `constraint_name`, so the test genuinely exercises the skip path and
FAILS against the unfixed code.
- Extract the catalog modal's role-state computation into a pure
`catalogRoleInstallState(role, workspaceRoles, language)` helper (mirrors
role-launch.ts) and cover it with vitest: import / installed / update /
same-slug-different-language.
SUGGESTIONS
- Restore IAiRoleUpdateFromCatalogResult as a discriminated union mirroring
the server; narrow the consumer via `"reason" in result` (the boolean
discriminant does not narrow under strictNullChecks:false).
- README: add a "How it's served" section documenting AI_AGENT_ROLES_CATALOG_URL
(remote http(s) base / local path / empty => in-repo folder).
- check.mjs: drop the redundant `const key = slug` alias.
- Cover the reason->message mapping in useUpdateAiRoleFromCatalogMutation
(4 branches) via renderHook with a mocked service.
- Cover importFromCatalog "bundle not in index" => BadGateway.
- Cover updateFromCatalog "slug in index but missing in bundle file" =>
not-in-catalog.
ARCHITECTURE
- Extract the shared catalog read prefix: a private `loadBundleById`
(fetchIndex -> meta -> fetchBundle -> versionMap) reused by getCatalogBundle
and importFromCatalog, and a `catalogRoleContentFields` mapper shared by the
import insert and update patch. The three orchestrations and their distinct
write paths stay separate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Admins can browse a curated catalog of agent roles, import roles/bundles
into a workspace, and update an imported role when the catalog ships a
newer version.
Catalog: a set of JSON files (index.json manifest + bundles/<id>/<lang>.json)
served from a local folder (dev) or a remote http(s) base URL via
AI_AGENT_ROLES_CATALOG_URL. Seeded with the existing 7 RU roles (editorial +
research bundles) plus EN translations.
Server:
- migration: nullable jsonb `source` column on ai_agent_roles
({ slug, language, version }; null => manually created)
- catalog provider: remote fetch with timeout + streaming size cap, or local
read; ^[a-z0-9-]+$ segment guard against path-traversal/SSRF
- admin endpoints: catalog, catalog/bundle, import, update-from-catalog
- import/update match by slug+language; update preserves `enabled`
Client:
- catalog modal with language selector and Import/Installed/Update states
- "Import from catalog" button + empty-state CTA in the roles settings panel
- en-US/ru-RU strings
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>