Commit Graph

56 Commits

Author SHA1 Message Date
claude_code
0b2af34029 test(integrations/client/packages): batch 2-4 unit coverage + zip-slip guard extraction
Batch 2-4 of the test-strategy rollout. Test-only except one minimal,
behaviour-preserving extraction in file.utils.ts. All suites green:
server 82 suites/836+1todo, editor-ext 86, mcp 270, client (new files) 86.

integrations (server):
- file.utils.ts: extract pure `isEntryPathSafe(entryName, targetDir)` from
  extractZipInternal so the zip-slip/path-traversal guard is unit-testable;
  call site rerouted, behaviour identical (only a warn-message string merged).
- file.utils.zip-safety.spec.ts: traversal/strip/__MACOSX/prefix-confusion
  cases (mutation-resistant: fails if containment loses the path.sep).
- import-formatter / import.utils / table-utils / export utils / import.service
  extractTitleAndRemoveHeading: pure import/export transforms, Notion/XWiki
  formatting, table colspan widths (idempotent), slug/link rewriting.

client:
- safeRedirectPath: open-redirect guard, every reject branch independently.
- buildChatMarkdown (fence anti-breakout), label-colors, normalize-label,
  share tree build, page URL builders, notification time-grouping (fake clock).

packages:
- editor-ext: deriveFootnoteId golden table, parseHtmlEmbedHeight crafted
  values, orphan footnote extraction.
- mcp: deriveFootnoteId parity (drift guard vs editor-ext), applyTextEdits
  idempotency + cross-block replaceAll, diffDocs/summarizeChange on reorder.

Reviewed (APPROVE): extraction behaviour-preserving, assertions mutation-resistant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 18:22:15 +03:00
claude_code
81823fce1e feat(html-embed): sandbox the embed block; split trusted trackers into an admin field
Convert the htmlEmbed node from same-origin raw-HTML execution to a sandboxed
iframe (sandbox="allow-scripts allow-popups allow-forms", no allow-same-origin,
srcdoc) with postMessage auto-resize (validated by event.source) and an optional
manual height attr. The block now runs in an opaque origin and cannot reach the
viewer's cookies/session/API, so it is safe for any member.

Because the block is now harmless, remove the entire admin/role gating apparatus:
drop htmlEmbedAllowed/canAuthorHtmlEmbed/stripDisallowedHtmlEmbedNodes/
collectHtmlEmbedSources and every role-based strip on the write paths (collab
REST/MCP + socket, page create/duplicate, import x2, transclusion unsync), along
with the now-unused WorkspaceRepo/UserRepo injections and the PageService.create
callerRole param. Keep one strip: prepareContentForShare still removes htmlEmbed
on the anonymous public-share read path when the workspace master toggle is OFF.

The workspace settings.htmlEmbed toggle is now a plain feature switch (gates the
slash-menu and share rendering); when ON the block is available to all members.

Add settings.trackerHead: an admin-only raw HTML/JS analytics snippet injected
verbatim into the <head> of public share pages only (ShareSeoController), for
trackers that genuinely need same-origin. Admin-gated via the existing CASL
Manage/Settings ability; never injected into the authenticated app shell.

Closes security-review findings #1, #2, #4, #5, #10 (and #3 as a security issue).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 02:48:41 +03:00
claude_code
2b3fc926cc Merge remote-tracking branch 'gitea/develop' into feat/html-embed-admin
# Conflicts:
#	apps/server/src/core/workspace/services/workspace.service.ts
2026-06-20 20:18:44 +03:00
vvzvlad
f650d2591b fix(tree): address realtime-tree-server review findings
- make addTreeNode receivers idempotent (invalidateOnCreatePage guard +
  buildTree dedup) so the author's self-echo no longer duplicates the node
- broadcast realtime tree updates for bulk copy/duplicate and import via a
  root refetch: PAGE_CREATED now carries spaceId and the WS listener falls
  back to refetchRootTreeNodeEvent when no per-node snapshot is present
- remove the now-dead client-relay inbound path (isTreeEvent/handleTreeEvent)
  that remained a stale-restriction-cache attack surface
- honest string|null cast for a root move's parent id
- add tests: buildTree dedup; onPageCreated per-node vs refetch branching

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 19:48:06 +03:00
claude code agent 227
8fcce6a674 feat(html-embed): per-workspace feature toggle, default OFF
The admin-only raw HTML/JS embed is a deliberate stored-XSS surface, so gate the
whole feature behind a workspace toggle that is OFF by default; it only works
when a workspace admin explicitly enables it.

- settings.htmlEmbed (boolean, default false) + workspace-update field htmlEmbed,
  persisted via WorkspaceRepo.updateSetting with an audit diff. Flipping it is
  admin-only (same Manage Settings CASL as other workspace toggles).
- New gate htmlEmbedAllowed(featureEnabled, role) = featureEnabled && admin/owner.
  All 7 server write paths (create, duplicate, collab onStoreDocument, REST/MCP/AI
  updatePageContent, single + zip import, transclusion unsync) now read the
  workspace's settings.htmlEmbed and strip unless (toggle ON AND admin). OFF
  (default, or a failed/empty workspace lookup) strips htmlEmbed for EVERYONE
  including admins -> existing embeds are cleaned up on next save, none persist.
- Client (defense-in-depth): the /html slash item is hidden unless toggle ON +
  admin; the NodeView executes nothing and shows a 'disabled in this workspace'
  placeholder when OFF; an admin Switch in Workspace Settings -> General with a
  description of the behavior.
- docs/html-embed-admin.md documents the toggle + admin-only + fail-closed
  coedit (a non-admin save strips an admin's embed) + execution semantics.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 19:28:39 +03:00
claude code agent 227
caac5c7f36 test(html-embed): exercise the REAL admin-gate write paths + import round-trip
Release-cycle test audit: the strip boundary was tested only via a stand-in
helper re-implemented in the spec, so a deleted/misplaced guard kept CI green
(the missing create() guard was proof). Replace it with tests against real code:
- persistence.extension.onStoreDocument: real ydoc from a rich doc (columns/
  table/mention/htmlEmbed) -> non-admin strip removes only htmlEmbed, every other
  node preserved (data-loss guard); admin keeps; empty fragment no-throw.
- collaboration.handler.updatePageContent: real path, user?.role gate, decoded
  ydoc embed-free for non-admin, kept for admin.
- transclusion unsync: member stripped, admin preserved.
- editor-ext gains a vitest setup (was zero tests) + a markdown round-trip:
  the <!--html-embed:BASE64--> marker -> htmlEmbed node with decoded source, and
  hasHtmlEmbedNode matches it — pinning the marked/turndown shape the import
  strip relies on. tsconfig now excludes specs from the shipped dist.
- Fail-closed identity: source-pinned contracts that the gate keys on
  fileTask.creatorId (zip) / request userId (single) / callerRole (create) /
  authUser.role (duplicate), and missing-user -> strip (services can't load under
  jest's ESM graph; helpers replay the exact predicate).
Adds the verified-safe ^src/ jest moduleNameMapper (identical fail set).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 14:52:29 +03:00
claude code agent 227
bd28dbfe2b feat(editor): admin-only raw HTML/CSS/JS embed node
Adds an htmlEmbed block node that renders and executes raw HTML/CSS/JS in the
wiki origin (e.g. an analytics tracker) — the owner-chosen variant C. Because
this is stored-XSS by design, only workspace admins/owners may get such a node
persisted; everyone executes it when reading.

- Node (editor-ext): htmlEmbed atom/isolating block; source stored base64 in
  data-source for lossless HTML<->JSON round-trip. renderHTML emits only the
  encoded marker (never inlines raw markup), so generateHTML/export/search are
  not themselves injection vectors. Registered in BOTH client extensions and
  server tiptapExtensions. Markdown round-trip via an <!--html-embed:b64-->
  comment (turndown) + a marked rule.
- Client NodeView: injects source and re-creates <script> elements so they
  actually run; edit modal; renders in read-only/share too. Slash item is
  admin-gated (adminOnly filtered by the user's workspace role).
- SERVER ENFORCEMENT (the real control — UI gating alone is insufficient):
  stripHtmlEmbedNodes() removes htmlEmbed from any document persisted by a
  non-admin, applied at every write path that introduces content from an
  untrusted author: collab onStoreDocument, REST/MCP/AI updatePageContent,
  single-file import, zip/multi-file import, page duplication, and transclusion
  unsync. Page restore introduces no new content. Public share/readonly viewers
  render fetched (already-stripped) content and do NOT open a collab socket, so
  the only residual is a transient broadcast window to concurrent authenticated
  editors (documented).

Implements docs/arbitrary-html-embed-plan.md (variant C).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 08:54:54 +03:00
claude_code
732aaf54f8 refactor(import): remove non-functional DOCX/PDF/Confluence import stubs
These import paths relied on the private EE module that was deleted from
the repo. In the community build they either threw 'enterprise license'
(DOCX/PDF) or silently no-op'd (Confluence). The frontend buttons were
already removed in 38064064; this cleans up the dead backend stubs.

- import.service.ts: drop processDocx/processPdf methods, their dispatcher
  branches, the pageId computation + insertPage spread, and the now-unused
  moduleRef param/ModuleRef import
- file-import-task.service.ts: drop the Confluence branch and the now-unused
  moduleRef param/ModuleRef import
- import.controller.ts: restrict file extensions to .md/.html and zip
  sources to generic/notion; update the error message accordingly
- file.utils.ts: remove Confluence from the FileImportSource enum
- features.ts: remove the unused CONFLUENCE_IMPORT/DOCX_IMPORT/PDF_IMPORT
  feature keys

The isConfluenceImport logic in import-attachment.service.ts is intentionally
left in place (real shared attachment-parsing code, not a stub); its removal
is a separate, riskier refactor.
2026-06-20 04:05:29 +03:00
vvzvlad
3d03417c73 fix(import): surface real error cause in /pages/import instead of generic 400
The two catch blocks in importPage() threw an opaque "Error processing file
content" / "Failed to create imported page" BadRequest, hiding the real cause
from the HTTP response. This made a production 400 regression impossible to
diagnose without server log access, and violated the project convention that
errors must never be swallowed.

Extract `${err.name}: ${err.message}` into both the log (full err object kept
for the stack) and the thrown BadRequestException. Inner processMarkdown/
processHTML rethrowing catches and the EE processDocx/processPdf license
catches are left unchanged.

Local reproduction of the happy-dom 14->20 theory failed (full import chain
+ 22 edge cases pass on happy-dom@20.8.9), so the root cause is still pending
the now-visible reason from a recurring 400. Diagnostic script test-import.tsx
added; backlog doc updated with findings.
2026-06-19 16:25:12 +03:00
Philipinho
ed0501a864 fix passing wrong object 2026-05-20 19:09:22 +01:00
Philip Okugbe
c247d4c1e3 feat(ee): PDF import (#2142)
* feat: replace pdfjs-dist with firecrawl-pdf-inspector

* use modified firecrawl-pdf-inspector

* feat: pdf import

* increase single file upload size limit

* use npm package

* sync

* update package
2026-05-01 14:56:39 +01:00
Philip Okugbe
09c69d7a0f feat: properly preserve table width (#2143) 2026-05-01 00:49:31 +01:00
Philipinho
ec83fc82d5 fix: refactor sanitize 2026-04-27 15:16:26 +01:00
Philip Okugbe
a6a7e4370a feat(ee): PDF export api (#2112)
* feat(ee): server side PDF export

* feat: pdf export queue

* sync

* sync
2026-04-14 16:26:54 +01:00
Philip Okugbe
a062f7a165 fix: enhance confluence importer (#2072)
* fix placeholder

* min resize dimensions

* fix media links

* fix
2026-03-30 13:16:40 +01:00
Philip Okugbe
7981ef462e feat(editor): audio and PDF nodes (#2064)
* use local resizable

* feat: aduio

* support audio imports

* feat: use confluence real file names

* cleanup

* error handling

* hide notice

* add audio

* fix pulse

* Fix import and export

* unify pulse

* hide in readonly mode

* keywords

* keyword

* translations

* better sort

* feat: PDF embed

* cleanup

* remove audio menu

* open active

* hide focus on readonly mode

* increase iframe default dimension
2026-03-28 17:33:29 +00:00
Philip Okugbe
fa4872e89e fix(deps): package updates (#2041)
* update
* overrides
* override
* fix page update mutation
* fix
* cleanup
* loader
* fix excalidraw package
* override
* fix regex
2026-03-25 10:07:01 +00:00
Philip Okugbe
7520c329d0 fix notion importer (#2027)
* fix notion importer

* fix link selector on mobile
2026-03-15 22:06:40 +00:00
Philip Okugbe
89b94e5d19 feat: refactor link menu (#2025)
* link markview - WIP

* WIP

* feat: refactor links

* cleanup
2026-03-15 17:08:59 +00:00
Philipinho
057360c6be fix: validate import size 2026-03-03 20:00:05 +00:00
Philipinho
b6478fee84 fix imports 2026-03-03 13:57:10 +00:00
Philipinho
9f4728e279 fix 2026-03-03 00:08:20 +00:00
Philip Okugbe
69d7532c6c feat(ee): audit logs (#1977)
feat: clickhouse driver
* sync
* updates
2026-03-01 01:29:03 +00:00
Philip Okugbe
b5803f42da xwiki html import cleanup (#1969) 2026-02-24 15:53:38 +00:00
Philip Okugbe
2f97a3debc feat: DOCX import (#1913) 2026-02-06 10:34:51 -08:00
Philip Okugbe
78b1c1a453 feat: switch to cursor pagination (#1884)
* add cursor pagination function

* support custom order modifier
* refactor returned object

* feat(db): migrate paginated endpoints to cursor-based pagination

* sync

* support hasPrevPage boolean

* feat(client): migrate pagination from offset to cursor-based

* support beforeCursor/prevCursor

* wrap search results in items array for API consistency
2026-01-30 19:28:54 +00:00
Philip Okugbe
6ccb2bb872 feat(export): add metadata file to preserve page icons and ordering on import (#1877)
* feat(export): add metadata file to preserve page icons and ordering on import
- Export includes `docmost-metadata.json`
- Import reads metadata to restore icons and sort siblings by original position

* cleanup

* bonus fixes

* handle unknown prosemirror nodes

* add docmost app  version
2026-01-27 16:39:39 +00:00
Philip Okugbe
54775f537d fix: handle malformed URLs gracefully during import/export (#1868)
* Handling malformed URLs gracefully

* Allow import of invalid URLs, but adding logging.

---------

Co-authored-by: gpapp <gergely.papp@itworks.hu>
2026-01-25 00:48:43 +00:00
Philip Okugbe
efb0a9317b feat: allow upload of large files (#1862)
* Allow upload of large files

* feat: createByteCountingStream utility function.

---------

Co-authored-by: gpapp <gergely.papp@itworks.hu>
2026-01-22 20:00:58 +00:00
Philip Okugbe
47097969a0 fix: use subquery (#1833)
- enhance file tasks list endpoint
2026-01-13 15:58:26 +00:00
Philipinho
ab96672ecd fix 2025-12-02 13:14:03 +00:00
Philip Okugbe
9fb16bc842 feat(EE): AI vector search (#1691)
* WIP

* AI module - init

* WIP

* sync

* WIP

* refactor naming

* new columns

* sync

* sync

* fix search bug

* stream response

* WIP

* feat embeddings sync

* refine

* Add workspaceId to page events

* refine

* WIP

* add translation string

* sync

* reset ai answer on query change

* hide AI search in cloud

* capture streaming error

* sync
2025-12-01 11:50:25 +00:00
Philip Okugbe
c3b350d943 fix: zip extraction validation (#1753)
* fix: zip extraction validation

* fix
2025-12-01 11:37:59 +00:00
Philip Okugbe
520c07a0bc fix: generic page import hierarchy (#1747)
* fix page hierarchy

* fix
2025-11-29 11:50:02 +00:00
Philip Okugbe
bf8cf6254f feat: Typesense search driver (EE) (#1664)
* feat: typesense driver (EE) - WIP

* feat: typesense driver (EE) - WIP

* feat: typesense

* sync

* fix
2025-10-07 17:34:32 +01:00
Philipinho
cf5bbb10df fix import html processing 2025-09-18 15:34:13 +01:00
Philip Okugbe
9ac180f719 fix: enhance page import (#1570)
* change import process

* fix processor

* fix page name in notion import

* preserve confluence table bg color

* sync
2025-09-17 23:50:27 +01:00
Philipinho
f413720e15 - sync
- reinstantiate S3 client to fix file upload errors during import
- delete import zip file after use
2025-09-14 03:00:23 +01:00
Philip Okugbe
7ada3cb1f9 fix: page import task (#1551)
* fix import

* - fix notion importer
- support notion page icon import
- fix horizontal rule css
- rename service file

* sync

* 3 mins delay
2025-09-13 03:14:59 +01:00
Philipinho
dc0650289d sync 2025-09-04 15:07:01 -07:00
Philipinho
d43ee77617 remove debug log 2025-09-04 09:40:17 -07:00
Philipinho
5d91eb4f5f feat: queue imported attachments for indexing 2025-09-04 09:38:30 -07:00
Philip Okugbe
1f797c3d27 fix: confluence drawio import (#1518)
* POC

* WIP - working

* WIP

* WIP

* sync

* fix drawio preview image
2025-09-03 05:19:09 +01:00
Philip Okugbe
3b85f4b616 fix: enforce C collation for page position ordering to ensure consistent behavior in Postgres 17+ (#1446)
- Add explicit C collation to position ordering queries to fix incorrect page placement in PostgreSQL 17+
- Ensures consistent ASCII-based ordering regardless of database locale settings
- Fixes issue where new pages were incorrectly placed at random positions instead of bottom
2025-08-04 09:49:29 +01:00
Philip Okugbe
4dfed2b2af queue import attachments upload (#1353) 2025-07-19 18:00:06 +01:00
Philip Okugbe
6d024fc3de feat: bulk page imports (#1219)
* refactor imports - WIP

* Add readstream

* WIP

* fix attachmentId render

* fix attachmentId render

* turndown video tag

* feat: add stream upload support and improve file handling

- Add stream upload functionality to storage drivers\n- Improve ZIP file extraction with better encoding handling\n- Fix attachment ID rendering issues\n- Add AWS S3 upload stream support\n- Update dependencies for better compatibility

* WIP

* notion formatter

* move embed parser to editor-ext package

* import embeds

* utility files

* cleanup

* Switch from happy-dom to cheerio
* Refine code

* WIP

* bug fixes and UI

* sync

* WIP

* sync

* keep import modal mounted

* Show modal during upload

* WIP

* WIP
2025-06-09 04:29:27 +01:00
Philip Okugbe
287b833838 feat: support pasting markdown (#606) 2025-01-04 16:57:36 +00:00
Philip Okugbe
e48b1c0dae fix: markdown math import (#529)
* fix: markdown math block import

* fix: block and inline math import

* cleanup
2024-12-09 15:08:25 +00:00
Philip Okugbe
dd0319a14d fix: index imported content (#495) 2024-11-20 13:36:36 +00:00
Philip Okugbe
384f11f2b7 make file upload size limit configurable (#386) 2024-10-10 21:28:28 +01:00