Editorial roles (Corrector/Factchecker) brute-forced `get_node` block-by-block to
find occurrences (unquoted «ё», straight quotes, «т.е.»), burning tokens. New
`search_in_page(pageId, query, {regex?, caseSensitive?, limit?})` reads the page's
ProseMirror JSON via the existing getPageRaw and searches it IN MEMORY — no server
endpoint, no DB/schema change, no touch to the packages/mcp/src/lib schema mirror.
New pure `searchInDoc(doc, query, opts)` (packages/mcp/src/lib/page-search.ts):
recursive descent to each TEXT CONTAINER (paragraph/heading/table-cell paragraph),
glues its inline text via `blockPlainText` (a match survives inline-mark
boundaries — e.g. «т.е.» split across bold/italic), searches literal (indexOf) or
regex, and returns `{ total, truncated, matches:[{ nodeId, blockIndex, type,
before, match, after }] }`. `nodeId` is the container's attrs.id or the
`#<topLevelIndex>` of the enclosing top-level block — the SAME ref format
get_node/patch_node/comment-anchoring accept (verified identical to getNodeByRef),
so the agent goes straight from a hit to a targeted comment; `before`/`after` are
~40-char windows for a unique selection. `total`/`truncated` always reported (never
silent truncation). Lives in the SHARED_TOOL_SPECS registry → exposed in BOTH
transports (external /mcp + in-app AI-chat), with a SERVER_INSTRUCTIONS line and a
DocmostClientLike signature + contract-test entry. Corrector/Factchecker prompts
get a one-line "use search_in_page first" hint (versions bumped, catalog hash lock
refreshed).
Guards: empty/whitespace query → clear error; invalid regex → clear error (not a
generic 500); zero-length regex matches (`\b`, `a*`) skipped with lastIndex
advanced (no loop/flood); MAX_PATTERN_LENGTH=1000, MAX_CONTAINER_TEXT=100k bound
each exec; limit clamped [1,200] (default 50).
Tests: new page-search.test.mjs (17) — literal+regex, case-sensitivity,
mark-boundary glue, nodeId for paragraph/heading (attrs.id) and table-cell
(#<index> fallback), context bounds, limit/total/truncated + clamp, invalid
regex/empty/over-long errors, zero-length skip, empty-doc null-safety.
mcp: tsc clean; node --test 467 passed (+17). apps/server: tsc --noEmit clean
(DocmostClientLike + wiring). catalog check.mjs OK.
Known limitations (from internal review, non-blocking):
- Residual ReDoS: a crafted catastrophic-backtracking pattern (e.g. `(a+)+$`)
against a large single container can hang the event loop — JS regex is not
interruptible, so the length caps bound the base but not the backtracking.
Realistic exposure is low (containers are small; the pattern is supplied by the
authenticated model). Candidate for a follow-up hardening (safe-regex validation
or a worker+timeout) if it matters.
- Case-insensitive LITERAL search folds via toLowerCase; a char whose lowercase
differs in length (e.g. Turkish İ) BEFORE a match could shift the context
window — negligible for the RU/EN editorial scenario.
- On a `#<index>` table-cell fallback, `type` is the inline container ("paragraph")
while nodeId addresses the top-level block — addressing is correct; the field is
documented as the container's type.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
stashPage is declared in the server's DocmostClientLike interface and
shipped as the stash_page MCP tool (client.ts, tool-specs.ts, index.ts),
but the hand-maintained HOST_CONTRACT_METHODS mirror in the contract test
was never updated — so the drift-guard test failed and broke CI's
unit-test job. Add the missing name; both directions now agree.
uploadImage is internal to client.ts (called by insertImage/replaceImage);
the MCP transport (index.ts) does not call it directly. Remove it from the
comment's list of transport-called methods.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DocmostClientLike mirror covers only methods the in-app adapter consumes;
the standalone MCP transport calls additional client methods not tracked here
(covered by its own typecheck). Fixes the misleading 'superset' wording (F2).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Architect-review hardening of the bidirectional DocmostClientLike <->
HOST_CONTRACT_METHODS guard (test-only, no production change):
- Interface method-name regex now accepts full TS identifiers
(digits/_/$) and generic signatures (method<T>(), avoiding a future
benign false-FAIL.
- Skip /* ... */ block comments in the interface body so a `name(` line
inside one is not falsely parsed as a method.
- Wrap the cross-package readFileSync with a clear "expected monorepo
layout" error instead of a bare ENOENT when run outside the monorepo.
- Narrow the guard's comments/error to state plainly it checks the
method-NAME set only; signature parity remains the deferred staged-plan
item.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The contract test only checked one direction (each name in
HOST_CONTRACT_METHODS exists on the real DocmostClient). But
HOST_CONTRACT_METHODS is itself a hand-copy of the server's
DocmostClientLike interface (docmost-client.loader.ts), and that
list<->interface link was untested: a method added to the interface +
consumed by the adapter but forgotten in the list (or removed from the
interface but left in the list) would escape both the server typecheck
(the pkg emits no .d.ts) and the existing test (name not in the list) ->
a runtime "x is not a function" in a tool call.
Parse the method names from the DocmostClientLike interface body (read
the .ts source via import.meta.url, scan member-signature lines) and
assert.deepEqual them against HOST_CONTRACT_METHODS BOTH ways. Lists are
currently identical (39=39), so this is a coverage hole closed, not a
live bug.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Issue #193's tool-half has two open items. The shared, zod-agnostic tool-spec
registry (SHARED_TOOL_SPECS) for the identical tools is already merged
(f3fa15e7) and consumed by both layers, so that subset is done. The remaining
items are: (a) deriving the layer-3 hand-mirror `DocmostClientLike` from the
real client type, and (b) folding more tools into the registry. Both were
deferred as risky, and that deferral still holds (verified, see below) — so
this change ships the safest concrete increment instead of forcing the risk.
What this adds (behaviour-neutral, test-only + a doc comment):
- packages/mcp/test/unit/client-host-contract.test.mjs: pins the layer-3
contract from the ESM side, where the real DocmostClient is importable. It
asserts every method the in-app `DocmostClientLike` mirror declares exists as
a function on a real DocmostClient instance (constructor is side-effect-free).
A rename/removal in client.ts now fails this test instead of silently shipping
a runtime "x is not a function" into an agent tool call. Negative-case
verified (a bogus method name is detected).
- docmost-client.loader.ts: replaces the vague mirror comment with a pointer to
the guard test and a concrete, empirically-grounded staged plan for the full
type-derivation. Verified blockers kept it deferred: @docmost/mcp emits no
.d.ts (no `declaration`, no `types` export) and the server has no path mapping
for it, so there is no type to import today; and the real methods' inferred
CONCRETE return types conflict with the in-app adapter's loose
Record<string,unknown> + `as`-cast result handling (deriving the exact type
breaks the build / forces pervasive double-casts and full-surface test stubs).
Out of scope (noted in the issue): the PM<->Markdown converter unification.
Verified: server tsc clean; mcp tsc clean; mcp tests 369 pass (367 + 2 new);
ai-chat tools specs 51 pass. No behaviour change; committed mcp build untouched
(no mcp src changed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>