docs: add how-to-test.md (browser E2E + out-of-band) and link from AGENTS.md

Adds a testing guide covering how to verify features against a running stand:
drive the behaviour under test through the browser (not the API), verify
out-of-band in the DB/git, and the non-obvious traps. Notably the page has two
ProseMirror editors — [aria-label='Page title'] (non-collab) and
[aria-label='Page content'] (the collab body); querySelector('.ProseMirror')
returns the title, so tests must target the body editor and wait out the ~10s
hocuspocus store debounce. Links the new doc from AGENTS.md next to dev-stand.md
and adds a matching gotcha #8 to dev-stand.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
C9 Tester
2026-07-05 22:39:01 +03:00
parent 3267512ed9
commit 134b627806
3 changed files with 123 additions and 0 deletions
+6
@@ -214,6 +214,12 @@ Run from the repo root unless noted. The dev workflow needs **Postgres (with the
> server, `APP_SECRET` mismatch between processes, a stale `editor-ext` white-
> screening the client, LAN exposure. See **[docs/dev-stand.md](docs/dev-stand.md)**
> for the step-by-step and the traps.
>
> **Testing the app against a stand** (browser E2E + out-of-band verification) has
> its own non-obvious traps — the page has two ProseMirror editors (only the body is
> collab-bound), a ~10s store debounce, and API-seeding the thing under test is a
> silent no-test. See **[docs/how-to-test.md](docs/how-to-test.md)** before writing
> UI tests.
```bash
pnpm install # install all workspaces (uses pnpm patches; see package.json `pnpm.patchedDependencies`)
+9
@@ -131,5 +131,14 @@ const { Client } = require("pg");
7. **Migrations don't auto-run in dev** — run `migration:latest` after every pull
or branch switch.
8. **Automation (Playwright): type into the BODY editor, not the title.** A page has
two `.ProseMirror` editors — `[aria-label='Page title']` (non-collab) and
`[aria-label='Page content']` (the collab body). `document.querySelector('.ProseMirror')`
returns the TITLE editor, so typing there never changes body content and `mod+S`
versions nothing. Target `[aria-label='Page content']`, confirm it's collab-bound
(`el.editor.extensionManager.extensions.some(e=>e.name==='collaboration')`), and
wait ~10-12s for the store debounce before asserting `pages.content` changed. Full
testing methodology + traps: **[how-to-test.md](how-to-test.md)**.
See also the **Commands** and **Architecture → Two server processes** sections in
[`AGENTS.md`](../AGENTS.md).
+108
@@ -0,0 +1,108 @@
# How to test the application (browser E2E + out-of-band)
How to actually verify a feature end-to-end against a running stand — driving the
**real app in a browser** and confirming results **out-of-band** in the DB/git, not
through the same API you're supposed to be testing. Written from real false-positives
that wasted hours (see **Traps** — read them before you write a test).
Prereq: a running stand — see **[dev-stand.md](dev-stand.md)**. Automation uses
Playwright (`pip install playwright && python -m playwright install chromium`).
## Principles
1. **Drive the behaviour under test through the browser.** The stand exists so you
exercise the real UI + realtime-collab + server path. Performing the action you're
validating via `POST /api/pages/*` tests the API, not the app, and an e2e suite can
already do that. API calls are fine ONLY for one-time setup/fixtures, never for the
interaction you're asserting on.
2. **Evidence before claim.** Nothing "passes" without an artifact: a DB row, a git
diff, a screenshot looked at as an image. If you can't show it, you didn't verify it.
3. **Verify out-of-band.** Judge results from a source independent of the UI: `psql`
against the DB, a fresh `git clone` of a synced repo, a hard reload. Optimistic UI
lies about persistence.
4. **Disconfirm by default.** For each feature, actively try to prove it's broken
before concluding it works. Reload after every create/edit/save.
5. **Recon actuatability FIRST.** Before building editor tests, confirm the
interaction even works in your harness (does a typed edit reach the DB?). Skipping
this is how you ship a pile of tests that all silently exercised the wrong thing.
## The editor: two ProseMirror instances (READ THIS)
A page has **two** `.ProseMirror` editors:
| index | selector | role | collab? |
|---|---|---|---|
| 0 | `[aria-label='Page title']` | title field | **NO** (16 exts, no `collaboration`) |
| 1 | `[aria-label='Page content']` | body | **YES** (95 exts, has `collaboration`) |
`document.querySelector('.ProseMirror')` returns the **title** editor (first match).
Type there and you edit the title only: the page body never changes, so `mod+S`
"versions" unchanged content and every content test silently no-ops.
**Always target the body editor** and confirm it's collab-bound before typing:
```js
const el = document.querySelector("[aria-label='Page content']");
el.editor.extensionManager.extensions.some(e => e.name === 'collaboration'); // must be true
```
Body edits emit ~20 `/collab` websocket frames while typing and land in
`pages.content` only after the **hocuspocus store debounce (~10s)**, so **wait ~12s**
before asserting persistence (a check at 6–8s reads stale content and reports a false
negative). `mod+S` (the `save-version` stateless message) flushes immediately, so a
version created right after a settled body edit holds the typed text.
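The targeting rule and the wait can be wrapped in one harness helper. The following is a minimal sketch, not the project's own harness: `type_into_body` and its parameters are hypothetical names, and `page` is only assumed to behave like a Playwright sync `Page` (`evaluate`, `locator(...).click()`, `keyboard.type`). The selector, the `collaboration` check, and the ~12s wait come from the text above.

```python
import time

BODY_SELECTOR = "[aria-label='Page content']"

# JS run in the page: true only if the body editor exists and carries the
# `collaboration` extension (same check as the snippet above).
IS_COLLAB_BOUND_JS = """() => {
  const el = document.querySelector("[aria-label='Page content']");
  return !!(el && el.editor &&
    el.editor.extensionManager.extensions.some(e => e.name === 'collaboration'));
}"""


def type_into_body(page, text, settle_seconds=12.0, sleep=time.sleep):
    """Type `text` into the collab body editor, then wait out the store debounce.

    `page` is assumed to behave like a Playwright sync Page (hypothetical helper).
    """
    if not page.evaluate(IS_COLLAB_BOUND_JS):
        # Refuse to type rather than silently edit the wrong editor.
        raise RuntimeError("body editor missing or not collab-bound")
    page.locator(BODY_SELECTOR).click()
    page.keyboard.type(text)
    sleep(settle_seconds)  # ~10s debounce plus margin before any DB assertion
```

Failing loudly when the collab check is false turns the "wrong editor" mistake into an immediate error instead of a test that passes while exercising nothing.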
## A known-good browser flow
```
1. goto /s/<space-slug> # the "Create page" button lives in the space sidebar, not /home
2. click button[aria-label='Create page'] # fully UI-driven page creation
3. type into [aria-label='Page title'] # optional title
4. click [aria-label='Page content'] → type body text
5. wait ~12s (store debounce)
6. assert pages.content changed (psql) # out-of-band
7. mod+S / menu Save → assert page_history row (psql)
8. reload / fresh context → re-assert (persistence round-trip)
```
Auth: log in ONCE, save `storage_state.json`, and reuse it across pages/agents
(logging in again on every run trips shared rate limits). The cookie-based session
authorizes both REST and the collab websocket.
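One way to get the log-in-once behaviour is sketched below. It assumes `browser` behaves like a Playwright sync `Browser` (`new_context(storage_state=...)`) and that contexts expose `storage_state(path=...)`; the `login` callback and the helper name are hypothetical, since the actual login steps depend on the stand.

```python
import os


def context_with_saved_login(browser, login, state_path="storage_state.json"):
    """Return a browser context, reusing a saved session when one exists.

    `login(context)` is a hypothetical callback that performs the UI login.
    """
    if os.path.exists(state_path):
        # Reuse saved cookies: no second login, no rate-limit hit.
        return browser.new_context(storage_state=state_path)
    context = browser.new_context()
    login(context)
    context.storage_state(path=state_path)  # persist the session for later runs
    return context
```

If the saved session expires, deleting `storage_state.json` forces a fresh login on the next call.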
## Judging out-of-band
```bash
# page content / history
docker exec <db> psql -U docmost -d docmost -tAc \
"select coalesce(kind,'null'), content::text from page_history where page_id='<id>' order by created_at;"
# git-sync round-trip: clone the space repo and diff against what you pushed
git clone http://<user>:<pass>@127.0.0.1:3000/git/<spaceId>.git /tmp/x
```
`page_history.content` is full JSON: parse it rather than truncating the snippet, or
a marker check can miss text that sits past the cut. For sync/async features
(autosave, git-sync, idle-flush) use an active probe: write a unique marker, wait past
the debounce/poll window, re-read out-of-band, and repeat for ≥2 iterations. Never
conclude "broken" from a single snapshot.
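The parse-then-probe advice can be sketched as two small helpers. This is an illustration under assumptions: `read_content` is a hypothetical callable that returns the raw JSON text of one `content` value (for example from a `psql` call), and `json_contains` and `probe` are names made up here. The 12s default wait and the two iterations follow the text above.

```python
import json
import time


def json_contains(doc, marker):
    """Recursively search every string in a parsed JSON value for `marker`."""
    if isinstance(doc, str):
        return marker in doc
    if isinstance(doc, dict):
        return any(json_contains(v, marker) for v in doc.values())
    if isinstance(doc, list):
        return any(json_contains(v, marker) for v in doc)
    return False


def probe(read_content, marker, wait_seconds=12.0, iterations=2, sleep=time.sleep):
    """Wait, re-read out-of-band, and repeat; True as soon as the marker appears.

    `read_content()` is a hypothetical callable returning raw JSON text (or "").
    """
    for _ in range(iterations):
        sleep(wait_seconds)  # let the debounce/poll window pass before reading
        raw = read_content()
        if raw and json_contains(json.loads(raw), marker):
            return True
    return False
```

Searching the parsed document avoids the truncation problem, though a marker that the editor splits across several text nodes would still be missed, so a short marker typed in one go is the safer choice.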
## Traps (each of these produced a false result in a real run)
- **Wrong editor.** Typed into `.ProseMirror` (= title). Edits never touched body
content. → target `[aria-label='Page content']`.
- **Checked persistence too early.** Store debounce ~10s; a 6–8s check reads stale.
- **Truncated the DB snapshot** below where the test marker sits → false "content
missing".
- **API-seeded the content under test**, then "verified" the feature — that validated
the API, not the app.
- **Reused a fixed marker on a non-rebooted stand** → title/row collisions inflate
counts (`count==2`). Use a unique per-run marker (timestamp).
- **Idle/async read once** and called it "permanently broken" — it was mid-debounce.
- **Concluded env-limitation without a cross-build control.** If unsure whether a
failure is your harness or the product, run the SAME harness against a known-good
build; a divergence localizes it.
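For the fixed-marker trap, a per-run marker can be generated as in the sketch below. The helper name and the `e2e` prefix are hypothetical, and the random suffix is an addition beyond the timestamp suggested above, included so two runs started within the same second still differ.

```python
import time
import uuid


def new_marker(prefix="e2e"):
    """Return a per-call unique marker: prefix, UTC timestamp, random suffix."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    return f"{prefix}-{stamp}-{uuid.uuid4().hex[:8]}"
```

Using the same marker in the page title and the body text keeps the out-of-band queries simple: one string to search for, and a count above 1 points at a real duplicate rather than leftovers from an earlier run.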
## Scope note
Some paths genuinely need a human in a real browser (rich drag-drop, native file
pickers, clipboard, and anything the harness can't actuate). Label those UNTESTED in
the report — "handled gracefully" is not "works". Keep four states distinct:
verified-working, defect, untested, env-limitation.