feat(editor): admin-only raw HTML/CSS/JS embed (variant C) #16

Merged
Ghost merged 7 commits from feat/html-embed-admin into develop 2026-06-20 20:19:06 +03:00

Implements docs/arbitrary-html-embed-plan.md — the owner-chosen Variant C: admin-only raw injection.

What

A new htmlEmbed block node renders and executes raw HTML/CSS/JS in the wiki origin (the original use-case: an analytics tracker that can read cookies / the page URL — which a sandboxed iframe can't). Because this is stored-XSS by design, the safety model is: only workspace admins/owners can get such a node persisted; everyone executes it when reading.

How

Node (editor-ext): htmlEmbed (atom, isolating, block). source is stored base64 in data-source for lossless HTML↔JSON round-trip. renderHTML emits only the encoded marker — it never inlines the raw markup, so generateHTML / export / search-index are not themselves injection vectors (only the client NodeView expands+executes). Registered in both the client extensions and the server tiptapExtensions (or the server would strip it). Markdown round-trips via an <!--html-embed:base64--> comment (turndown) + a marked rule.
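
A minimal sketch of the base64 UTF-8 codec and the Markdown comment marker described above. The helper names (`encodeSource`, `decodeSource`, `toMarkdownMarker`, `fromMarkdownMarker`) are hypothetical and the real editor-ext code may be organised differently; only the `<!--html-embed:...-->` comment shape is taken from the description.

```typescript
// Hypothetical helpers: names and signatures are illustrative, not the PR's API.

// Encode arbitrary (possibly non-ASCII) source as base64 over its UTF-8 bytes.
export function encodeSource(source: string): string {
  const bytes = new TextEncoder().encode(source);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Reverse of encodeSource: base64 -> bytes -> UTF-8 string.
export function decodeSource(encoded: string): string {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);
}

// Markdown form: an HTML comment carrying the encoded payload.
export function toMarkdownMarker(source: string): string {
  return `<!--html-embed:${encodeSource(source)}-->`;
}

// Returns the decoded source, or null when the text is not a marker.
export function fromMarkdownMarker(text: string): string | null {
  const match = /^<!--html-embed:([A-Za-z0-9+/=]*)-->$/.exec(text.trim());
  return match ? decodeSource(match[1]) : null;
}
```

Because the standard base64 alphabet contains neither `-` nor `>`, the payload cannot close the comment early, which is one reason an encoded marker is safer to pass through HTML and Markdown pipelines than raw markup.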

Client NodeView: sets the source as HTML and re-creates <script> elements so they actually run (innerHTML-injected scripts don't auto-run); edit modal with a code textarea; renders in read-only/share too. The slash-menu "HTML embed" item is admin-gated (filtered by the user's workspace role).
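
The script re-creation step can be sketched roughly as follows. This is an illustration under assumptions, not the PR's NodeView: `renderEmbed` and the `*Like` interfaces are hypothetical, introduced so the sketch type-checks without the DOM lib. In a browser, a real container element and `document` can be passed in their place.

```typescript
// Minimal structural types (hypothetical); real DOM objects satisfy them.
export interface AttrLike {
  name: string;
  value: string;
}
export interface ScriptLike {
  attributes: ArrayLike<AttrLike>;
  textContent: string | null;
  setAttribute(name: string, value: string): void;
  replaceWith(...nodes: ScriptLike[]): void;
}
export interface ContainerLike {
  innerHTML: string;
  querySelectorAll(selector: string): ArrayLike<ScriptLike>;
}
export interface DocLike {
  createElement(tag: string): ScriptLike;
}

// Injects the source, then swaps every inert <script> for a fresh element.
// Returns the number of scripts that were re-created.
export function renderEmbed(
  container: ContainerLike,
  source: string,
  doc: DocLike,
): number {
  // Scripts parsed via innerHTML are present in the tree but do not run.
  container.innerHTML = source;
  const inertScripts = Array.from(container.querySelectorAll("script"));
  for (const inert of inertScripts) {
    const live = doc.createElement("script");
    // Copy attributes (src, type, async, ...) and inline code to the new element.
    for (const attr of Array.from(inert.attributes)) {
      live.setAttribute(attr.name, attr.value);
    }
    live.textContent = inert.textContent;
    // Replacing the inert node with a newly created one lets the browser execute it.
    inert.replaceWith(live);
  }
  return inertScripts.length;
}
```

In a browser this would be called as `renderEmbed(containerElement, decodedSource, document)`.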

Server enforcement (the real control): a client-only gate is insufficient — a non-admin could inject the node via the collab socket, REST/MCP/AI, paste, import, or duplication. stripHtmlEmbedNodes() removes htmlEmbed from any document persisted by a non-admin, applied at every write path that introduces content from an untrusted author.
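
A rough sketch of what the server-side strip could look like on Tiptap/ProseMirror JSON. The function names `stripHtmlEmbedNodes`, `hasHtmlEmbedNode`, and `canAuthorHtmlEmbed` appear in this PR, but the signatures, the `DocNode` shape, the role strings, and the `sanitizeForAuthor` wrapper are assumptions made for illustration.

```typescript
// ProseMirror/Tiptap JSON, reduced to the fields this sketch needs (assumed shape).
export interface DocNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: DocNode[];
  text?: string;
}

// Assumed role strings; an unknown or missing role is treated as non-admin.
export function canAuthorHtmlEmbed(role: string | null | undefined): boolean {
  return role === "owner" || role === "admin";
}

// True when an htmlEmbed node exists anywhere in the subtree.
export function hasHtmlEmbedNode(node: DocNode): boolean {
  if (node.type === "htmlEmbed") return true;
  return (node.content ?? []).some((child) => hasHtmlEmbedNode(child));
}

// Returns a new tree with htmlEmbed removed at any depth.
// All other nodes are preserved; the input is not mutated.
export function stripHtmlEmbedNodes(node: DocNode): DocNode {
  if (!node.content) return node;
  const content = node.content
    .filter((child) => child.type !== "htmlEmbed")
    .map((child) => stripHtmlEmbedNodes(child));
  return { ...node, content };
}

// Hypothetical wrapper: what a write path would call just before persisting.
export function sanitizeForAuthor(
  doc: DocNode,
  role: string | null | undefined,
): DocNode {
  return canAuthorHtmlEmbed(role) ? doc : stripHtmlEmbedNodes(doc);
}
```

Keeping the predicate and the strip as pure functions makes it straightforward to call the same gate from every write path and to test it without loading the services.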

Reasoning / decisions

  • Variant C (not the sandboxed iframe A) because the goal is page-analytics, which needs same-origin script execution; risk is contained by admin-gating + server enforcement.
  • renderHTML deliberately stores an encoded marker, not live markup, so no server-side HTML generation path becomes an injection vector.

Review findings & fixes (adversarial security review)

The review confirmed the guarded paths and renderHTML are safe, but found 2 BLOCKERS — write paths that wrote content directly, bypassing the strip, reachable by a non-admin with space-Edit:

  1. Zip/multi-file import (file-import-task.service) wrote content/ydoc with no strip → fixed: resolve the importer's role once, strip for non-admins.
  2. Page duplication (duplicatePage) copied content directly (explicitly bypassing the collab strip) → fixed: strip per duplicated page when the duplicating user isn't an admin.

Also fixed transclusion unsync (returned a source snapshot retaining the embed → strip for non-admin callers).

For the pre-persist broadcast window (WARNING): I verified anonymous public-share viewers do NOT open a collab socket (they render fetched, already-stripped content via ReadonlyPageEditor, with no HocuspocusProvider), so the only residual is a transient execution among concurrent authenticated editors (semi-trusted space members) before the debounced strip. This is documented as accepted.

Final guard table — every non-admin write path now strips: collab persist ✓, REST/MCP/AI update ✓, single import ✓, zip import ✓ (new), duplication ✓ (new), transclusion unsync ✓ (new); restore introduces no new content. No remaining path lets a non-admin persist an executing htmlEmbed.

Verification

  • pnpm --filter @docmost/editor-ext build + pnpm --filter server build + pnpm --filter client build — clean.
  • html-embed.spec.ts: 15 pass (strip incl. nested subtrees; canAuthorHtmlEmbed role matrix; base64 UTF-8 codec; HTML↔JSON node round-trip keeps source; admin-gate write-path decision).
  • Browser (headless Chromium, collab on the branch schema): as admin, the gated "HTML embed" slash item is shown; inserting a script (#emb-probe/window.__embedProbe/console) executed (all three signals); after a save + full reload the node is still present and re-executes (proves it survives the collab save and isn't stripped for an admin). No real app errors. Screenshots captured.

🤖 Generated with Claude Code

Ghost added 2 commits 2026-06-20 08:55:30 +03:00
Adds an htmlEmbed block node that renders and executes raw HTML/CSS/JS in the
wiki origin (e.g. an analytics tracker) — the owner-chosen variant C. Because
this is stored-XSS by design, only workspace admins/owners may get such a node
persisted; everyone executes it when reading.

- Node (editor-ext): htmlEmbed atom/isolating block; source stored base64 in
  data-source for lossless HTML<->JSON round-trip. renderHTML emits only the
  encoded marker (never inlines raw markup), so generateHTML/export/search are
  not themselves injection vectors. Registered in BOTH client extensions and
  server tiptapExtensions. Markdown round-trip via an <!--html-embed:b64-->
  comment (turndown) + a marked rule.
- Client NodeView: injects source and re-creates <script> elements so they
  actually run; edit modal; renders in read-only/share too. Slash item is
  admin-gated (adminOnly filtered by the user's workspace role).
- SERVER ENFORCEMENT (the real control — UI gating alone is insufficient):
  stripHtmlEmbedNodes() removes htmlEmbed from any document persisted by a
  non-admin, applied at every write path that introduces content from an
  untrusted author: collab onStoreDocument, REST/MCP/AI updatePageContent,
  single-file import, zip/multi-file import, page duplication, and transclusion
  unsync. Page restore introduces no new content. Public share/readonly viewers
  render fetched (already-stripped) content and do NOT open a collab socket, so
  the only residual is a transient broadcast window to concurrent authenticated
  editors (documented).

Implements docs/arbitrary-html-embed-plan.md (variant C).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ghost added 1 commit 2026-06-20 13:09:22 +03:00
Release-cycle red-team found the admin-only gate missed PageService.create():
content/textContent/ydoc were derived and persisted without the strip, so any
space member could POST /pages/create with an htmlEmbed node (incl. the
markdown/html <!--html-embed:BASE64--> form) and store executing JS for every
reader. Add the same gate used by duplicatePage: strip htmlEmbed when the
caller is not a workspace admin/owner. Role is plumbed from the controller
(user.role); unknown role => non-admin (strip). All four create paths (create,
duplicate, single import, zip import) plus the update paths are now guarded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ghost added 1 commit 2026-06-20 14:52:41 +03:00
Release-cycle test audit: the strip boundary was tested only via a stand-in
helper re-implemented in the spec, so a deleted/misplaced guard kept CI green
(the missing create() guard was proof). Replace it with tests against real code:
- persistence.extension.onStoreDocument: real ydoc from a rich doc (columns/
  table/mention/htmlEmbed) -> non-admin strip removes only htmlEmbed, every other
  node preserved (data-loss guard); admin keeps; empty fragment no-throw.
- collaboration.handler.updatePageContent: real path, user?.role gate, decoded
  ydoc embed-free for non-admin, kept for admin.
- transclusion unsync: member stripped, admin preserved.
- editor-ext gains a vitest setup (was zero tests) + a markdown round-trip:
  the <!--html-embed:BASE64--> marker -> htmlEmbed node with decoded source, and
  hasHtmlEmbedNode matches it — pinning the marked/turndown shape the import
  strip relies on. tsconfig now excludes specs from the shipped dist.
- Fail-closed identity: source-pinned contracts that the gate keys on
  fileTask.creatorId (zip) / request userId (single) / callerRole (create) /
  authUser.role (duplicate), and missing-user -> strip (services can't load under
  jest's ESM graph; helpers replay the exact predicate).
Adds the verified-safe ^src/ jest moduleNameMapper (identical fail set).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ghost added 1 commit 2026-06-20 19:28:50 +03:00
The admin-only raw HTML/JS embed is a deliberate stored-XSS surface, so gate the
whole feature behind a workspace toggle that is OFF by default; it only works
when a workspace admin explicitly enables it.

- settings.htmlEmbed (boolean, default false) + workspace-update field htmlEmbed,
  persisted via WorkspaceRepo.updateSetting with an audit diff. Flipping it is
  admin-only (same Manage Settings CASL as other workspace toggles).
- New gate htmlEmbedAllowed(featureEnabled, role) = featureEnabled && admin/owner.
  All 7 server write paths (create, duplicate, collab onStoreDocument, REST/MCP/AI
  updatePageContent, single + zip import, transclusion unsync) now read the
  workspace's settings.htmlEmbed and strip unless (toggle ON AND admin). OFF
  (default, or a failed/empty workspace lookup) strips htmlEmbed for EVERYONE
  including admins -> existing embeds are cleaned up on next save, none persist.
- Client (defense-in-depth): the /html slash item is hidden unless toggle ON +
  admin; the NodeView executes nothing and shows a 'disabled in this workspace'
  placeholder when OFF; an admin Switch in Workspace Settings -> General with a
  description of the behavior.
- docs/html-embed-admin.md documents the toggle + admin-only + fail-closed
  coedit (a non-admin save strips an admin's embed) + execution semantics.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
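
The gate from the commit above, `htmlEmbedAllowed(featureEnabled, role) = featureEnabled && admin/owner`, can be sketched with fail-closed handling as follows. The role strings, the `WorkspaceLike` shape, and the `mustStrip` helper are assumptions for illustration; the commit only states the formula and that a failed or empty workspace lookup counts as OFF.

```typescript
// Assumed role strings; the commit only says "admin/owner".
const ADMIN_ROLES = new Set<string>(["owner", "admin"]);

// featureEnabled && admin/owner, where anything unknown evaluates to false.
export function htmlEmbedAllowed(
  featureEnabled: boolean | null | undefined,
  role: string | null | undefined,
): boolean {
  return (
    featureEnabled === true &&
    typeof role === "string" &&
    ADMIN_ROLES.has(role)
  );
}

// Hypothetical workspace shape, used to illustrate a failed or empty lookup.
export interface WorkspaceLike {
  settings?: { htmlEmbed?: boolean } | null;
}

// True when a write path should strip htmlEmbed before persisting.
export function mustStrip(
  workspace: WorkspaceLike | null | undefined,
  role?: string | null,
): boolean {
  return !htmlEmbedAllowed(workspace?.settings?.htmlEmbed, role);
}
```

Comparing with `=== true` rather than relying on truthiness means a missing setting, a `null` workspace, or an unexpected value all land on the strip side.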
Ghost added 1 commit 2026-06-20 19:50:10 +03:00
The html-embed feature toggle was enforced CLIENT-side in the NodeView (reads
settings.htmlEmbed from the logged-in workspace), so an anonymous public-share
viewer — who has no workspace context — always saw it as OFF and got a
placeholder instead of the executing embed. That broke the whole point (a
tracker must run for anonymous visitors).

Make it server-authoritative:
- share.service prepareContentForShare (the single path both share-content
  flows use) strips htmlEmbed from served content when the workspace toggle is
  OFF; both callers (updatePublicAttachments host page + lookupTransclusionForShare)
  resolve the toggle once and pass it. Fail-closed: missing workspace -> OFF ->
  stripped.
- NodeView executes whatever it was served in read-only/share mode
  (shouldExecute = !editor.isEditable || htmlEmbedEnabled); the disabled
  placeholder now only shows in the editable editor when OFF.

Net: anonymous share + toggle ON -> server serves the (admin-authored) embed ->
it executes for everyone; toggle OFF -> stripped server-side from every
share-content path (true kill switch); a non-admin embed can never be served
(save-path strip). No XSS regression in the editable editor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
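
The split described in the commit above (server decides what is served, client executes what it receives in read-only mode) might look roughly like this. `prepareForShare` is a hypothetical stand-in for the share-content path and `ShareNode` is an assumed shape; only the `shouldExecute` expression is taken from the commit message.

```typescript
// Assumed JSON node shape for served share content.
export interface ShareNode {
  type: string;
  content?: ShareNode[];
}

// Server side (sketch): when the workspace toggle is not ON, remove every
// htmlEmbed before the content is served to share viewers.
export function prepareForShare(
  doc: ShareNode,
  htmlEmbedEnabled: boolean | null | undefined,
): ShareNode {
  if (htmlEmbedEnabled === true) return doc;
  const strip = (node: ShareNode): ShareNode =>
    node.content
      ? {
          ...node,
          content: node.content
            .filter((child) => child.type !== "htmlEmbed")
            .map((child) => strip(child)),
        }
      : node;
  return strip(doc);
}

// Client side: read-only/share views execute whatever the server served;
// the editable editor executes embeds only when the toggle is ON.
export function shouldExecute(
  editorIsEditable: boolean,
  htmlEmbedEnabled: boolean,
): boolean {
  return !editorIsEditable || htmlEmbedEnabled;
}
```

With this arrangement the client never needs workspace context in share mode, so an anonymous viewer sees the embed only if the server chose to serve it.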
vvzvlad added 1 commit 2026-06-20 20:18:55 +03:00
# Conflicts:
#	apps/server/src/core/workspace/services/workspace.service.ts
Ghost merged commit 7a03321d43 into develop 2026-06-20 20:19:06 +03:00

Reference: vvzvlad/gitmost#16