The agent-roles catalog content files move from JSON to YAML so each role's long
`instructions` system prompt is stored as a literal block scalar (`|-`): editing
one sentence now produces a line-by-line diff and the prompt is editable as plain
multi-line text instead of a single escaped JSON string.
Data:
- `index.json` -> `index.yaml`, `bundles/<id>/<lang>.json` -> `<lang>.yaml`
(old `.json` deleted). Converted programmatically via the `yaml` library with
`lineWidth: 0`; round-trip verified deepEqual against the old JSON, so the
resolved role content is byte-for-byte identical (the only `version` bump is
fact-checker v2->3, carried over from develop during the rebase; see below).
Server (`AiAgentRolesCatalogProvider`):
- parse with `yaml`'s safe default (JSON-compatible) schema instead of
`JSON.parse` — `strict: true` (rejects duplicate keys) and `maxAliasCount: 100`
(billion-laughs guard); no custom `!!` tags / no code execution. Fetched paths
become `index.yaml` / `<lang>.yaml`. The streaming 1 MB size cap,
`redirect: 'error'`, 10s timeout and `^[a-z0-9-]+$` path-traversal/SSRF guard
are unchanged; the hand-written type guards are untouched (`instructions` is
still a string after parsing).
- add `yaml` as a direct server dependency (already in the lockfile as a
transitive dep).
Catalog tooling:
- `scripts/check.mjs` parses the catalog as YAML (lockfile stays JSON); pin
`yaml` as a devDependency of the catalog package.
Tests:
- provider spec fixtures serialized with `yaml`; new tests for the block-scalar
`instructions` round-trip (exact multi-line string), malformed YAML and
strict duplicate-key rejection -> BadGateway; size-cap and path-traversal
cases retargeted to the `.yaml` paths.
Docs: README, `.env.example`, `catalog-types.ts` comments and CHANGELOG updated
to the YAML layout. `AI_AGENT_ROLES_CATALOG_URL` base-URL contract unchanged.
Rebase onto develop + review (PR #231, comment 2509):
- semantic conflict: develop's 89edddc5 bumped fact-checker v2->3 (flags errors
instead of confirming facts) in the now-deleted `.json`. Resolved the
modify/delete by taking the deletion and porting develop's v3 `description` +
`instructions` (en + ru) into the YAML and setting `version: 3` in index.yaml.
Verified by `node scripts/check.mjs` going green against develop's unchanged
content-hash lock (the ported YAML hashes byte-identically to the v3 JSON).
- doc fix: ai-agent-roles.service.ts catalog comment "untrusted JSON" -> YAML.
- doc fix: parseYaml docstring no longer claims `strict: true` rejects unknown
custom tags (yaml@2.8.x warns + resolves to a plain scalar, then the type
guard rejects it); the duplicate-key claim is kept.
- doc: note in check.mjs that `yaml` resolves from the repo-ROOT node_modules
(via shamefully-hoist), not the catalog package's own pinned devDependency.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent roles catalog
This directory is data, not application code. It holds the content of an "agent roles catalog": reusable agent role definitions (system prompts plus a little metadata), grouped into bundles and translated into one or more languages. A separate server reads these files and serves them; nothing here is executable application logic except the validation script.
File layout
    agent-roles-catalog/
      index.yaml                 # the catalog manifest: bundles, languages, role versions
      bundles/
        <bundle-id>/
          <lang>.yaml            # one file per declared language (e.g. ru.yaml, en.yaml)
      scripts/
        check.mjs                # validates the catalog (uses the `yaml` parser)
        content-hashes.json      # check artifact: per-role content-hash lock (NOT served)
      package.json               # defines the `check` script
      README.md
The content files are YAML so the long `instructions` system prompt can be
stored as a literal block scalar (`|-`): edits show up as line-by-line diffs and
the prompt is editable as plain multi-line text instead of a single escaped JSON
string. The `content-hashes.json` lockfile under `scripts/` stays JSON because it
is a check artifact and is never served.
Currently shipped bundles:
- `editorial`: the editorial suite (`structural-editor`, `line-editor`, `fact-checker`, `proofreader`, `narrator`), languages `ru`, `en`.
- `research`: a single `researcher` role, languages `ru`, `en`.
How it's served
The server does not bundle this data; it reads it at request time from a single
configured location, the `AI_AGENT_ROLES_CATALOG_URL` env var
(`EnvironmentService.getAiAgentRolesCatalogSource()`), an `http(s)://` base URL
to the catalog's raw files. The server fetches `<base>/index.yaml` for the
manifest and `<base>/bundles/<bundle-id>/<lang>.yaml` for each opened bundle
file. Only remote sources are supported.
That base URL is provided as a per-branch default in the Docker image (set in
CI: a develop build points at the develop raw URL, a release build at the
main raw URL) and can be overridden at runtime via the
AI_AGENT_ROLES_CATALOG_URL env var. Local-filesystem sources are no longer
supported; if the value is unset the catalog is unavailable.
The fetched YAML is parsed with a safe, JSON-compatible schema and re-validated
server-side (the catalog is treated as untrusted input). See .env.example for
the variable and the CHANGELOG for the rollout.
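The URL layout described above can be sketched as follows. This is a minimal illustration, not the server's code: `catalogUrls` and `assertSafeSegment` are hypothetical names, and applying the `^[a-z0-9-]+$` pattern to both the bundle id and the language code is an assumption made for the example.

```javascript
// Hypothetical helpers; only the URL layout and the segment pattern come from the text.
const SAFE_SEGMENT = /^[a-z0-9-]+$/;

// Reject anything that could escape the intended path (slashes, dots, etc.).
function assertSafeSegment(kind, value) {
  if (typeof value !== "string" || !SAFE_SEGMENT.test(value)) {
    throw new Error(`unsafe ${kind}: ${JSON.stringify(value)}`);
  }
  return value;
}

// Build the two kinds of URL the server fetches from an http(s) base URL.
function catalogUrls(baseUrl) {
  const parsed = new URL(baseUrl); // throws on a malformed URL
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported scheme: ${parsed.protocol}`);
  }
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    index: () => `${base}/index.yaml`,
    bundle: (bundleId, lang) =>
      `${base}/bundles/${assertSafeSegment("bundle id", bundleId)}` +
      `/${assertSafeSegment("language", lang)}.yaml`,
  };
}

const urls = catalogUrls("https://example.com/catalog/"); // placeholder base URL
console.log(urls.index());
console.log(urls.bundle("editorial", "ru"));
```

Validating each path segment before interpolation means a value such as `../secrets` is rejected before any request is made.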
index.yaml schema
    schemaVersion: 1
    bundles:
      - id: editorial            # unique bundle id; matches bundles/<id>/
        name:                    # localized display name
          ru: "..."
          en: "..."
        description:
          ru: "..."
          en: "..."
        languages:               # which <lang>.yaml files must exist
          - ru
          - en
        roles:
          - slug: structural-editor
            version: 1
          # ...
`version` lives here, in `index.yaml`, per role. Bump it whenever a role's
content (`instructions`, `name`, `description`, etc.) changes, so consumers can
detect updates.
Bundle (<lang>.yaml) schema
    schemaVersion: 1
    language: ru
    roles:
      - slug: structural-editor     # REQUIRED, unique across the whole catalog
        emoji: "🧱"
        name: "..."                 # REQUIRED, localized
        description: "..."          # localized
        instructions: |-            # REQUIRED, the system prompt, localized (literal block scalar)
          First line of the prompt.
          Second line.
        autoStart: true             # whether the role starts working immediately
        launchMessage: "..."        # first message sent on launch (or null)
Keep `instructions` as a literal block scalar (`|-`; the `-` chomping indicator
strips the trailing newline) so the resolved prompt is byte-for-byte what you
typed and diffs stay line-by-line.
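To make the difference from the old JSON form concrete, the sketch below takes the two-line prompt from the bundle schema example and shows how it serializes as JSON. It uses only built-in JSON functions and does not invoke a YAML parser; the `resolved` string is written by hand to match what `|-` yields for that example.

```javascript
// The value a YAML parser yields for the two-line `|-` example: the lines are
// joined by "\n", and the "-" indicator leaves no trailing newline.
const resolved = "First line of the prompt.\nSecond line.";

// The old JSON form: one physical line with the newline escaped.
const asJson = JSON.stringify({ instructions: resolved });
console.log(asJson);
// {"instructions":"First line of the prompt.\nSecond line."}

// Parsing the JSON form gives back the identical string.
console.log(JSON.parse(asJson).instructions === resolved); // true

// No trailing newline in the resolved prompt.
console.log(resolved.endsWith("\n")); // false
```

Editing one sentence in the JSON form changes that single long line, whereas in the block scalar only the edited line shows up in the diff.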
Notes:
- `modelConfig` is intentionally absent; the server treats an absent `modelConfig` as `null`.
- A role's `slug`, `emoji`, and `autoStart` are identical across all language files of the same bundle. Only `name`, `description`, `instructions`, and `launchMessage` are translated.
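The second note can be expressed as a small consistency check. This is a sketch only: the text does not say whether `check.mjs` enforces this rule, and `findLanguageMismatches` plus the `filesByLang` shape are hypothetical.

```javascript
// Fields that must match across languages, per the note above (roles are
// paired by `slug`, so `slug` itself is the join key).
const SHARED_FIELDS = ["emoji", "autoStart"];

// `filesByLang` maps a language code to its parsed bundle file ({ roles: [...] }).
// Returns human-readable problems; the first language acts as the reference.
// Roles present only in the reference language are not reported by this sketch.
function findLanguageMismatches(filesByLang) {
  const problems = [];
  const langs = Object.keys(filesByLang);
  if (langs.length === 0) return problems;

  const [refLang, ...otherLangs] = langs;
  const refBySlug = new Map(filesByLang[refLang].roles.map((r) => [r.slug, r]));

  for (const lang of otherLangs) {
    for (const role of filesByLang[lang].roles) {
      const ref = refBySlug.get(role.slug);
      if (!ref) {
        problems.push(`${lang}: slug "${role.slug}" not present in ${refLang}`);
        continue;
      }
      for (const field of SHARED_FIELDS) {
        if (role[field] !== ref[field]) {
          problems.push(`${lang}: "${role.slug}".${field} differs from ${refLang}`);
        }
      }
    }
  }
  return problems;
}

console.log(
  findLanguageMismatches({
    ru: { roles: [{ slug: "researcher", emoji: "🔎", autoStart: true }] },
    en: { roles: [{ slug: "researcher", emoji: "🔎", autoStart: false }] },
  })
);
```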
Slug uniqueness
Every slug must be UNIQUE ACROSS THE WHOLE CATALOG, not just within a
bundle. A slug appears once per language file of its bundle (same slug in
ru.yaml and en.yaml), but no two different bundles may share a slug.
scripts/check.mjs enforces this.
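A catalog-wide uniqueness check over the parsed `index.yaml` might look like the sketch below. It is an illustration under the schema shown earlier, not the actual `check.mjs` logic; `findDuplicateSlugs` is a hypothetical name.

```javascript
// `index` is the parsed index.yaml: { bundles: [{ id, roles: [{ slug, version }] }] }.
// Returns every slug declared more than once, with the bundles that declare it.
function findDuplicateSlugs(index) {
  const owners = new Map(); // slug -> bundle ids that declare it
  for (const bundle of index.bundles) {
    for (const role of bundle.roles) {
      const list = owners.get(role.slug) ?? [];
      list.push(bundle.id);
      owners.set(role.slug, list);
    }
  }
  return [...owners]
    .filter(([, bundleIds]) => bundleIds.length > 1)
    .map(([slug, bundleIds]) => ({ slug, bundleIds }));
}

const duplicates = findDuplicateSlugs({
  bundles: [
    { id: "editorial", roles: [{ slug: "fact-checker", version: 3 }] },
    { id: "research", roles: [{ slug: "fact-checker", version: 1 }] },
  ],
});
console.log(duplicates);
```

Because the check runs against the index rather than the language files, the same slug appearing in both `ru.yaml` and `en.yaml` of one bundle is not counted as a duplicate.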
How to add things
Add a role to an existing bundle
- Add an entry to that bundle's `roles[]` in `index.yaml` with a new unique `slug` and `version: 1`.
- Add a role object with the same `slug` to every `<lang>.yaml` of the bundle, translating `name`, `description`, `instructions`, and `launchMessage`.
- Run the check (see below).
Add a bundle
- Add a bundle object to `index.yaml` (`id`, `name`, `description`, `languages`, `roles`).
- Create `bundles/<id>/<lang>.yaml` for each declared language, with one role object per `roles[]` entry.
- Run the check.
Add a language to a bundle
- Add the language code to that bundle's `languages[]` in `index.yaml`.
- Create `bundles/<id>/<lang>.yaml` containing every role of the bundle, translated.
- Run the check.
Change a role's content
Edit the role in the relevant `<lang>.yaml` file(s) and bump that role's
`version` in `index.yaml`. Then run `node scripts/check.mjs --update-hashes`
to refresh the content-hash lock (`scripts/content-hashes.json`). `check.mjs`
fails if a role's content changed but its `version` was not bumped, so the bump
is mandatory: the lock can only be refreshed after it.
Validating
From this directory:
node scripts/check.mjs # or: npm run check
It fails (exit code 1) if any slug is duplicated across the catalog, if a
bundle's index roles[] don't match the slugs present in each language file, if
a declared language file is missing, or if any role is missing a required field
(slug, name, instructions). It prints OK on success.
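The per-bundle structural rules listed above can be sketched as a single function. This is an assumption-laden illustration rather than the real `check.mjs`: `validateBundle` and the `filesByLang` shape are hypothetical, and treating any non-string value as a missing field is a choice made for the example.

```javascript
const REQUIRED_FIELDS = ["slug", "name", "instructions"];

// `bundle` is one entry of index.yaml's bundles[]; `filesByLang` maps a language
// code to the parsed <lang>.yaml and has no entry when that file is missing.
function validateBundle(bundle, filesByLang) {
  const errors = [];
  const expected = bundle.roles.map((r) => r.slug).sort();

  for (const lang of bundle.languages) {
    const file = filesByLang[lang];
    if (!file) {
      errors.push(`${bundle.id}: missing language file ${lang}.yaml`);
      continue;
    }
    // Required-field check (assumption: a non-string value counts as missing).
    for (const role of file.roles) {
      for (const field of REQUIRED_FIELDS) {
        if (typeof role[field] !== "string") {
          errors.push(`${bundle.id}/${lang}: role "${role.slug}" is missing ${field}`);
        }
      }
    }
    // The set of slugs in the file must equal the set declared in the index.
    const actual = file.roles.map((r) => r.slug).sort();
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      errors.push(`${bundle.id}/${lang}: slugs do not match index roles[]`);
    }
  }
  return errors;
}

const errors = validateBundle(
  { id: "research", languages: ["ru", "en"], roles: [{ slug: "researcher", version: 1 }] },
  { ru: { roles: [{ slug: "researcher", name: "...", instructions: "..." }] } }
);
console.log(errors.length === 0 ? "OK" : errors);
```

A caller would exit with code 1 when the returned list is non-empty, matching the behaviour described above.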
Content-hash guard
`check.mjs` also guards against changing a role's content without bumping its
`version`. It keeps a lockfile, `scripts/content-hashes.json`, mapping each role
slug to `{ version, hash }`, where `hash` is a SHA-256 over the role's
content fields (`emoji`, `autoStart`, `name`, `description`, `instructions`,
`launchMessage`) across all of its language files, in a deterministic canonical
form. This lockfile is a check artifact only: the server fetches only
`index.yaml` and the bundle `<lang>.yaml` files, never this file, so it has no
effect on the served catalog or its schema.
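One way such a hash could be computed is sketched below. The canonical form used here (languages sorted, fields in a fixed order, JSON-serialized) is an assumption: the text only says the form is deterministic, so hashes from this sketch will not necessarily match those in `content-hashes.json`. `contentHash` is a hypothetical name.

```javascript
const { createHash } = require("node:crypto");

const CONTENT_FIELDS = [
  "emoji", "autoStart", "name", "description", "instructions", "launchMessage",
];

// `roleByLang` maps a language code to that language's role object.
function contentHash(roleByLang) {
  const canonical = Object.keys(roleByLang)
    .sort() // language order must not affect the hash
    .map((lang) => [
      lang,
      // Fixed field order; absent fields become null so the shape stays stable.
      CONTENT_FIELDS.map((field) => roleByLang[lang][field] ?? null),
    ]);
  return createHash("sha256").update(JSON.stringify(canonical), "utf8").digest("hex");
}

const hash = contentHash({
  ru: { emoji: "🔎", autoStart: true, name: "...", instructions: "..." },
  en: { emoji: "🔎", autoStart: true, name: "...", instructions: "..." },
});
console.log(hash.length); // 64 hex characters for SHA-256
```

Sorting the languages and fixing the field order is what makes the result independent of key order in the source files, so only a real content change alters the hash.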
On a normal run, for every role the check recomputes the hash and compares it against the lock:
- content unchanged and versions agree → OK;
- content changed but `version` not bumped above the lock → error asking you to bump and refresh;
- content changed and `version` bumped → error asking you to record it by refreshing the lock;
- role missing from the lock, or a lock entry for a role that no longer exists → error asking you to refresh.
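The outcomes listed above amount to a small decision function. The sketch below uses hypothetical names (`compareToLock`, `staleLockSlugs`) and invented status strings; the `version-mismatch` branch covers a case the list does not mention (same content, different version) and is an assumption.

```javascript
// `current` is { version, hash } recomputed from the catalog; `locked` is the
// lockfile entry for the same slug, or undefined when the slug is not in the lock.
function compareToLock(current, locked) {
  if (!locked) return "missing-from-lock"; // refresh needed
  const sameHash = current.hash === locked.hash;
  if (sameHash && current.version === locked.version) return "ok";
  if (!sameHash && current.version <= locked.version) return "bump-required";
  if (!sameHash) return "refresh-required"; // content changed and version bumped
  return "version-mismatch"; // assumption: not covered by the list above
}

// Lock entries whose role no longer exists in the catalog (refresh needed).
function staleLockSlugs(catalogSlugs, lock) {
  const known = new Set(catalogSlugs);
  return Object.keys(lock).filter((slug) => !known.has(slug));
}

console.log(compareToLock({ version: 3, hash: "bbb" }, { version: 2, hash: "aaa" }));
// refresh-required
```

Every status other than `ok` corresponds to a failing check.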
Refresh the lock with:
node scripts/check.mjs --update-hashes # alias: --fix
This recomputes the lock from the current catalog, prunes entries for removed
roles, and prints what changed — but it refuses to write (exit 1) if any
role's content changed while its index.yaml version was not bumped, so the
version bump is always enforced first. The check also requires every
index.yaml role to carry a finite numeric version (the server requires the
same).
Known, accepted limitation: a deliberate prune-then-readd of a slug (remove the
role and run --update-hashes, then re-add it with changed content at the same
version) is not caught, because a brand-new slug has no lock baseline to
enforce a bump against.
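The refresh rules and the accepted limitation above can be illustrated together. This is a sketch under assumed data shapes, not the real `--update-hashes` implementation; `refreshLock` is a hypothetical name and it returns the new lock instead of writing a file.

```javascript
// `current` maps slug -> { version, hash } recomputed from the catalog;
// `lock` is the existing lockfile content in the same shape.
function refreshLock(current, lock) {
  const blocked = [];
  for (const [slug, entry] of Object.entries(current)) {
    if (!Number.isFinite(entry.version)) {
      blocked.push(`${slug}: version must be a finite number`);
      continue;
    }
    const locked = lock[slug];
    // A slug with no lock entry has no baseline, so nothing can be enforced here.
    if (locked && locked.hash !== entry.hash && entry.version <= locked.version) {
      blocked.push(`${slug}: content changed but version was not bumped`);
    }
  }
  if (blocked.length > 0) return { written: false, blocked }; // caller would exit 1

  // Rebuilding from `current` alone prunes entries for removed roles.
  const next = {};
  for (const slug of Object.keys(current).sort()) {
    next[slug] = { version: current[slug].version, hash: current[slug].hash };
  }
  return { written: true, lock: next };
}

// Prune-then-readd: after the slug was pruned, re-adding it with changed
// content at the same version passes, because the lock holds no baseline.
const afterPrune = {};
console.log(refreshLock({ narrator: { version: 1, hash: "changed" } }, afterPrune).written);
// true
```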