diff --git a/AGENTS.md b/AGENTS.md index 1ab3c40a..2823eb86 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -264,7 +264,7 @@ Two routes are mounted **outside** the `/api` prefix at the root, as raw Fastify - `core/ai-chat/external-mcp/` — admins can attach external MCP servers (e.g. Tavily) to give the agent web access. **`ssrf-guard.ts` validates outbound MCP URLs against SSRF** — keep that guard in the path when touching external-MCP connection logic. ### Git-sync (native two-way Docmost ↔ git Markdown sync) -`integrations/git-sync/` (`GitSyncModule`) + the vendored pure engine in `packages/git-sync`. Off by default; gated by the `GIT_SYNC_ENABLED` master switch (and `GIT_SYNC_SERVICE_USER_ID`, the account git-originated writes are attributed to). Per-space opt-in via `space.settings.gitSync.enabled`. Each enabled space gets an on-disk working "vault" repo; the `GitSyncOrchestrator` runs a debounced + poll-backstop reconcile cycle (PULL Docmost→vault, PUSH vault→Docmost) under a per-space Redis leader lock + in-process mutex (`SpaceLockService`). Writes go through the collaboration layer (so concurrent human edits aren't clobbered) and are stamped `lastUpdatedSource = 'git-sync'` for the listener loop-guard. The in-process `setInterval` orchestration + best-effort lock (no fencing tokens) is a known multi-replica limitation — BullMQ + fencing is the documented future direction. +`integrations/git-sync/` (`GitSyncModule`) + the vendored pure engine in `packages/git-sync`. Off by default; gated by the `GIT_SYNC_ENABLED` master switch (and `GIT_SYNC_SERVICE_USER_ID`, the account git-originated writes are attributed to). Per-space opt-in via `space.settings.gitSync.enabled`, with a second per-space toggle `space.settings.gitSync.autoMergeConflicts` that changes PUSH behavior for a still-conflicted page (one carrying `<<<<<<<`/`>>>>>>>` markers): **off (the safe default)** records a per-page failure and holds the refs so the user resolves the git conflict first (markers never reach Docmost); **on** strips the marker lines and pushes both sides' content. Each enabled space gets an on-disk working "vault" repo; the `GitSyncOrchestrator` runs a debounced + poll-backstop reconcile cycle (PULL Docmost→vault, PUSH vault→Docmost) under a per-space Redis leader lock + in-process mutex (`SpaceLockService`). Writes go through the collaboration layer (so concurrent human edits aren't clobbered) and are stamped `lastUpdatedSource = 'git-sync'` for the listener loop-guard. The in-process `setInterval` orchestration + best-effort lock (no fencing tokens) is a known multi-replica limitation — BullMQ + fencing is the documented future direction. - **`/git` smart-HTTP host** (`integrations/git-sync/http/`, gated additionally by `GIT_SYNC_HTTP_ENABLED`, which defaults to `GIT_SYNC_ENABLED`): a raw root-mounted Fastify route `/git/.git/...` (registered in `main.ts`, NOT under `/api`) that bridges `git clone`/`fetch`/`push` to `git http-backend`. It authenticates HTTP Basic against `AuthService` (throttled by a `FailedLoginLimiter` mirroring the `/mcp` path), authorizes via `SpaceAbilityFactory` (read = fetch, Manage = push), and gates existence so a non-member gets the SAME 404 as a missing/sync-disabled space (never 403 — that would leak space existence). A push runs the receive-pack under the space lock, then a reconcile cycle. - **Schema mirror:** `packages/git-sync/src/lib/docmost-schema.ts` is one of the **three** hand-synced copies of the Tiptap document schema (see Client structure) — keep it in lockstep with `editor-ext` (canonical) and `packages/mcp`. diff --git a/CHANGELOG.md b/CHANGELOG.md index 832615d6..e7981294 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added +- **Native two-way Docmost ↔ git Markdown sync.** Opt-in per space (Space + settings → a git-sync toggle, plus an `autoMergeConflicts` toggle that controls + whether a still-conflicted page is held back or pushed with its conflict + markers stripped): each enabled space is mirrored to an on-disk git "vault" of + Markdown files and reconciled in both directions (Docmost → vault and vault → + Docmost) on a debounced + poll-backstop cycle, under a per-space lock, writing + through the collaboration layer so concurrent human edits aren't clobbered. + Git-originated changes are attributed to a configurable service account and + carry a "git-sync" provenance badge in page history. Optionally exposes a `/git` + smart-HTTP host so you can `git clone`/`fetch`/`push` a space directly (HTTP + Basic auth, space-permission authorized). Off by default and configured via the + `GIT_SYNC_*` environment variables, including `GIT_SYNC_ENABLED`, + `GIT_SYNC_SERVICE_USER_ID`, and `GIT_SYNC_HTTP_ENABLED` (see `.env.example`). + (#119) - **Quick-create regular and temporary notes from the Home and Space screens.** The Home screen now shows a second action next to "New note" that creates a *temporary* note (one that auto-moves to Trash after the workspace lifetime), diff --git a/apps/server/src/collaboration/collaboration.handler.ts b/apps/server/src/collaboration/collaboration.handler.ts index d6074e8b..8490c2aa 100644 --- a/apps/server/src/collaboration/collaboration.handler.ts +++ b/apps/server/src/collaboration/collaboration.handler.ts @@ -11,7 +11,7 @@ import { User } from '@docmost/db/types/entity.types'; import { mergeXmlFragments, mergeXmlFragments3Way, -} from '../integrations/git-sync/services/yjs-body-merge'; +} from './merge/yjs-body-merge'; export type CollabEventHandlers = ReturnType< CollaborationHandler['getHandlers'] diff --git a/apps/server/src/integrations/git-sync/services/lcs.ts b/apps/server/src/collaboration/merge/lcs.ts similarity index 100% rename from apps/server/src/integrations/git-sync/services/lcs.ts rename to apps/server/src/collaboration/merge/lcs.ts diff --git a/apps/server/src/integrations/git-sync/services/redteam-three-way.spec.ts b/apps/server/src/collaboration/merge/redteam-three-way.spec.ts similarity index 100% rename from apps/server/src/integrations/git-sync/services/redteam-three-way.spec.ts rename to apps/server/src/collaboration/merge/redteam-three-way.spec.ts diff --git a/apps/server/src/integrations/git-sync/services/three-way-merge.spec.ts b/apps/server/src/collaboration/merge/three-way-merge.spec.ts similarity index 100% rename from apps/server/src/integrations/git-sync/services/three-way-merge.spec.ts rename to apps/server/src/collaboration/merge/three-way-merge.spec.ts diff --git a/apps/server/src/integrations/git-sync/services/three-way-merge.ts b/apps/server/src/collaboration/merge/three-way-merge.ts similarity index 100% rename from apps/server/src/integrations/git-sync/services/three-way-merge.ts rename to apps/server/src/collaboration/merge/three-way-merge.ts diff --git a/apps/server/src/integrations/git-sync/services/yjs-body-merge.callout.spec.ts b/apps/server/src/collaboration/merge/yjs-body-merge.callout.spec.ts similarity index 98% rename from apps/server/src/integrations/git-sync/services/yjs-body-merge.callout.spec.ts rename to apps/server/src/collaboration/merge/yjs-body-merge.callout.spec.ts index 16a9ad14..4204cf7e 100644 --- a/apps/server/src/integrations/git-sync/services/yjs-body-merge.callout.spec.ts +++ b/apps/server/src/collaboration/merge/yjs-body-merge.callout.spec.ts @@ -5,7 +5,7 @@ import { convertProseMirrorToMarkdown, } from '@docmost/git-sync'; -import { tiptapExtensions } from '../../../collaboration/collaboration.util'; +import { tiptapExtensions } from '../collaboration.util'; import { mergeXmlFragments, mergeXmlFragments3Way } from './yjs-body-merge'; /** diff --git a/apps/server/src/integrations/git-sync/services/yjs-body-merge.idempotency.spec.ts b/apps/server/src/collaboration/merge/yjs-body-merge.idempotency.spec.ts similarity index 100% rename from apps/server/src/integrations/git-sync/services/yjs-body-merge.idempotency.spec.ts rename to apps/server/src/collaboration/merge/yjs-body-merge.idempotency.spec.ts diff --git a/apps/server/src/integrations/git-sync/services/yjs-body-merge.schema-defaults.spec.ts b/apps/server/src/collaboration/merge/yjs-body-merge.schema-defaults.spec.ts similarity index 99% rename from apps/server/src/integrations/git-sync/services/yjs-body-merge.schema-defaults.spec.ts rename to apps/server/src/collaboration/merge/yjs-body-merge.schema-defaults.spec.ts index a9e5dbe9..1cc597cc 100644 --- a/apps/server/src/integrations/git-sync/services/yjs-body-merge.schema-defaults.spec.ts +++ b/apps/server/src/collaboration/merge/yjs-body-merge.schema-defaults.spec.ts @@ -1,7 +1,7 @@ import { TiptapTransformer } from '@hocuspocus/transformer'; import * as Y from 'yjs'; -import { tiptapExtensions } from '../../../collaboration/collaboration.util'; +import { tiptapExtensions } from '../collaboration.util'; import { mergeXmlFragments, mergeXmlFragments3Way } from './yjs-body-merge'; /** diff --git a/apps/server/src/integrations/git-sync/services/yjs-body-merge.spec.ts b/apps/server/src/collaboration/merge/yjs-body-merge.spec.ts similarity index 100% rename from apps/server/src/integrations/git-sync/services/yjs-body-merge.spec.ts rename to apps/server/src/collaboration/merge/yjs-body-merge.spec.ts diff --git a/apps/server/src/integrations/git-sync/services/yjs-body-merge.ts b/apps/server/src/collaboration/merge/yjs-body-merge.ts similarity index 99% rename from apps/server/src/integrations/git-sync/services/yjs-body-merge.ts rename to apps/server/src/collaboration/merge/yjs-body-merge.ts index 7652e595..e21ccdae 100644 --- a/apps/server/src/integrations/git-sync/services/yjs-body-merge.ts +++ b/apps/server/src/collaboration/merge/yjs-body-merge.ts @@ -2,7 +2,7 @@ import * as Y from 'yjs'; import { getSchema } from '@tiptap/core'; import type { Schema } from '@tiptap/pm/model'; -import { tiptapExtensions } from '../../../collaboration/collaboration.util'; +import { tiptapExtensions } from '../collaboration.util'; import { diff3Plan } from './three-way-merge'; import { buildLcsTable } from './lcs'; diff --git a/apps/server/src/core/page/services/page.service.ts b/apps/server/src/core/page/services/page.service.ts index cfea8eae..604ab54f 100644 --- a/apps/server/src/core/page/services/page.service.ts +++ b/apps/server/src/core/page/services/page.service.ts @@ -948,6 +948,12 @@ export class PageService { // Optional agent-edit provenance (from the signed access claim). Stamps the // source marker when the agent moves a page via REST (§6.6 REST path). provenance?: AuthProvenanceData, + // Optional responsible author. When set (git-sync), the move is ATTRIBUTED + // to that account via `lastUpdatedById` — parity with create/delete/rename, + // which all stamp the service user. A normal user move omits it, leaving + // `lastUpdatedById` untouched (a reparent is not a content edit, so the + // existing author is preserved — unchanged behavior). + actorUserId?: string, ) { // validate position value by attempting to generate a key try { @@ -1017,6 +1023,9 @@ export class PageService { { position: dto.position, parentPageId: parentPageId, + // Attribute a git-initiated move to the service account (parity with + // create/delete/rename). Omitted for normal user moves -> unchanged. + ...(actorUserId ? { lastUpdatedById: actorUserId } : {}), // Agent-edit provenance: annotate the source on an agent move. A // normal user request leaves the existing source value unchanged. ...agentSourceFields( diff --git a/apps/server/src/integrations/git-sync/git-sync.controller.spec.ts b/apps/server/src/integrations/git-sync/git-sync.controller.spec.ts index 5a04b51a..66574624 100644 --- a/apps/server/src/integrations/git-sync/git-sync.controller.spec.ts +++ b/apps/server/src/integrations/git-sync/git-sync.controller.spec.ts @@ -3,7 +3,7 @@ // guard (non-admin -> ForbiddenException, no orchestrator call), that trigger // uses the workspace from request context (never the body), and that status // returns the env-derived object. -import { ForbiddenException } from '@nestjs/common'; +import { ForbiddenException, NotFoundException } from '@nestjs/common'; import { WorkspaceCaslAction, WorkspaceCaslSubject, @@ -18,10 +18,11 @@ interface Built { env: Record; workspaceAbility: { createForUser: AnyMock }; ability: { cannot: AnyMock }; + spaceRepo: { findById: AnyMock }; } -function build(opts: { cannot?: boolean } = {}): Built { - const { cannot = false } = opts; +function build(opts: { cannot?: boolean; spaceFound?: boolean } = {}): Built { + const { cannot = false, spaceFound = true } = opts; const ability = { cannot: jest.fn(() => cannot) }; const workspaceAbility = { createForUser: jest.fn(() => ability) }; @@ -35,13 +36,17 @@ function build(opts: { cannot?: boolean } = {}): Built { getGitSyncDebounceMs: jest.fn(() => 2000), getGitSyncServiceUserId: jest.fn(() => 'svc-user'), }; + const spaceRepo = { + findById: jest.fn(async () => (spaceFound ? { id: 'space-1' } : undefined)), + }; const controller = new GitSyncController( orchestrator as any, env as any, workspaceAbility as any, + spaceRepo as any, ); - return { controller, orchestrator, env, workspaceAbility, ability }; + return { controller, orchestrator, env, workspaceAbility, ability, spaceRepo }; } const USER = { id: 'user-1' } as any; @@ -68,7 +73,7 @@ describe('GitSyncController', () => { }); it('admin: calls runOnce(dto.spaceId, workspace.id) using the workspace from context', async () => { - const { controller, orchestrator } = build({ cannot: false }); + const { controller, orchestrator, spaceRepo } = build({ cannot: false }); // The body carries an attacker-controlled workspaceId that must be ignored. const res = await controller.trigger( @@ -77,9 +82,27 @@ describe('GitSyncController', () => { WORKSPACE, ); + // The space is resolved workspace-scoped (context workspace, not the body). + expect(spaceRepo.findById).toHaveBeenCalledWith('space-1', 'ctx-ws'); expect(orchestrator.runOnce).toHaveBeenCalledWith('space-1', 'ctx-ws'); expect(res).toEqual({ spaceId: 'space-1', ran: true }); }); + + it('admin: 404s a spaceId that is not in the workspace and never calls runOnce', async () => { + // A foreign/non-existent space must be rejected BEFORE buildSettings runs + // (which would otherwise create an empty per-space vault directory). + const { controller, orchestrator, spaceRepo } = build({ + cannot: false, + spaceFound: false, + }); + + await expect( + controller.trigger({ spaceId: 'foreign' } as any, USER, WORKSPACE), + ).rejects.toBeInstanceOf(NotFoundException); + + expect(spaceRepo.findById).toHaveBeenCalledWith('foreign', 'ctx-ws'); + expect(orchestrator.runOnce).not.toHaveBeenCalled(); + }); }); describe('status', () => { diff --git a/apps/server/src/integrations/git-sync/git-sync.controller.ts b/apps/server/src/integrations/git-sync/git-sync.controller.ts index da64588e..5803ca6b 100644 --- a/apps/server/src/integrations/git-sync/git-sync.controller.ts +++ b/apps/server/src/integrations/git-sync/git-sync.controller.ts @@ -4,6 +4,7 @@ import { ForbiddenException, HttpCode, HttpStatus, + NotFoundException, Post, Get, UseGuards, @@ -12,6 +13,7 @@ import { JwtAuthGuard } from '../../common/guards/jwt-auth.guard'; import { AuthUser } from '../../common/decorators/auth-user.decorator'; import { AuthWorkspace } from '../../common/decorators/auth-workspace.decorator'; import { User, Workspace } from '@docmost/db/types/entity.types'; +import { SpaceRepo } from '@docmost/db/repos/space/space.repo'; import WorkspaceAbilityFactory from '../../core/casl/abilities/workspace-ability.factory'; import { WorkspaceCaslAction, @@ -47,6 +49,7 @@ export class GitSyncController { private readonly orchestrator: GitSyncOrchestrator, private readonly environmentService: EnvironmentService, private readonly workspaceAbility: WorkspaceAbilityFactory, + private readonly spaceRepo: SpaceRepo, ) {} /** Throw unless the caller is a workspace admin (Manage Settings). */ @@ -67,6 +70,15 @@ export class GitSyncController { @AuthWorkspace() workspace: Workspace, ): Promise { this.assertAdmin(user, workspace); + // Verify the client-supplied spaceId BELONGS to this workspace before doing + // any work (review): without this, `runOnce` -> `buildSettings` reads the + // raw `spaces` row and creates an empty per-space vault directory for a + // foreign/non-existent space before the content read finally 404s. Resolve + // it workspace-scoped and 404 early. + const space = await this.spaceRepo.findById(dto.spaceId, workspace.id); + if (!space) { + throw new NotFoundException('Space not found'); + } // Use the workspace from the request context (never client-supplied). return this.orchestrator.runOnce(dto.spaceId, workspace.id); } diff --git a/apps/server/src/integrations/git-sync/services/git-ingest-convergence.spec.ts b/apps/server/src/integrations/git-sync/services/git-ingest-convergence.spec.ts index 39bb17cd..8e857670 100644 --- a/apps/server/src/integrations/git-sync/services/git-ingest-convergence.spec.ts +++ b/apps/server/src/integrations/git-sync/services/git-ingest-convergence.spec.ts @@ -1,6 +1,6 @@ import * as Y from 'yjs'; -import { mergeXmlFragments3Way } from './yjs-body-merge'; +import { mergeXmlFragments3Way } from '../../../collaboration/merge/yjs-body-merge'; /** * Convergence repro for the git-ingest "silent revert" data-loss bug. diff --git a/apps/server/src/integrations/git-sync/services/git-sync.orchestrator.ts b/apps/server/src/integrations/git-sync/services/git-sync.orchestrator.ts index 2974233b..db95cc54 100644 --- a/apps/server/src/integrations/git-sync/services/git-sync.orchestrator.ts +++ b/apps/server/src/integrations/git-sync/services/git-sync.orchestrator.ts @@ -5,7 +5,14 @@ import { OnModuleInit, } from '@nestjs/common'; import { SchedulerRegistry } from '@nestjs/schedule'; -import { mkdir, readFile, rm, writeFile } from 'node:fs/promises'; +import { + lstat, + mkdir, + readFile, + realpath, + rm, + writeFile, +} from 'node:fs/promises'; import { InjectKysely } from 'nestjs-kysely'; import { KyselyDB } from '@docmost/db/types/kysely.types'; import { sql } from 'kysely'; @@ -303,11 +310,31 @@ export class GitSyncOrchestrator implements OnModuleInit, OnModuleDestroy { vault, settings, // ABSOLUTE-path fs primitives the engine cycle injects (it stays IO-free). + // `lstat`/`realpath` back the engine's symlink guard: both MUST yield + // `null` on ENOENT (a not-yet-created file is the normal write case) so the + // guard can tell "absent" (safe to create) from "is a symlink" (refuse). + // `lstat` does NOT follow the final link; `realpath` resolves it. fs: { readFile: (absPath) => readFile(absPath, 'utf8'), writeFile: (absPath, text) => writeFile(absPath, text, 'utf8'), mkdir: (absDir) => mkdir(absDir, { recursive: true }).then(() => undefined), rm: (absPath) => rm(absPath, { force: true }), + lstat: (absPath) => + lstat(absPath).then( + (st) => ({ isSymbolicLink: st.isSymbolicLink() }), + (err: NodeJS.ErrnoException) => { + if (err && err.code === 'ENOENT') return null; + throw err; + }, + ), + realpath: (absPath) => + realpath(absPath).then( + (p) => p, + (err: NodeJS.ErrnoException) => { + if (err && err.code === 'ENOENT') return null; + throw err; + }, + ), }, // Every cycle logs its full push plan + per-action lines + completion // counts (created/updated/deleted/skipped/failures) through this `log`, so diff --git a/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.spec.ts b/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.spec.ts index 8717cdc3..ceb35d34 100644 --- a/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.spec.ts +++ b/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.spec.ts @@ -362,13 +362,17 @@ describe('GitmostDataSourceService', () => { await service.bind(CTX).movePage('p1', 'parent-1'); expect(mocks.pageService.movePage).toHaveBeenCalledTimes(1); - const [dto, page, provenance] = mocks.pageService.movePage.mock.calls[0]; + const [dto, page, provenance, actorUserId] = + mocks.pageService.movePage.mock.calls[0]; expect(dto.pageId).toBe('p1'); expect(dto.parentPageId).toBe('parent-1'); expect(typeof dto.position).toBe('string'); expect(dto.position.length).toBeGreaterThan(0); expect(page).toEqual({ id: 'p1', spaceId: 'space-1' }); expect(provenance).toEqual({ actor: 'git-sync', aiChatId: null }); + // The git-initiated move is attributed to the service user (lastUpdatedById + // parity with create/delete/rename). + expect(actorUserId).toBe('svc-user'); }); it('passes through an explicit position unchanged', async () => { diff --git a/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.ts b/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.ts index 0dc14f13..4dd707cf 100644 --- a/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.ts +++ b/apps/server/src/integrations/git-sync/services/gitmost-datasource.service.ts @@ -76,7 +76,7 @@ export class GitmostDataSourceService { this.createPage(ctx, title, content, spaceId, parentPageId), deletePage: (pageId) => this.deletePage(ctx, pageId), movePage: (pageId, parentPageId, position) => - this.movePage(pageId, parentPageId, position), + this.movePage(ctx, pageId, parentPageId, position), renamePage: (pageId, title) => this.renamePage(ctx, pageId, title), listRecentSince: (spaceId, sinceIso, hardPageCap) => this.listRecentSince(spaceId, sinceIso, hardPageCap), @@ -252,6 +252,7 @@ export class GitmostDataSourceService { * §3.2 / §14.4). */ private async movePage( + ctx: GitSyncBindContext, pageId: string, parentPageId: string | null, position?: string, @@ -268,6 +269,10 @@ export class GitmostDataSourceService { { pageId, parentPageId: parentPageId ?? null, position: resolvedPosition }, page, GIT_SYNC_PROVENANCE, + // Attribute the git-initiated move to the service user (lastUpdatedById), + // matching create/delete/rename — the contract is "git-operations are + // attributed to the service account". + ctx.userId, ); return { id: pageId }; } diff --git a/apps/server/src/integrations/git-sync/services/vault-registry.service.spec.ts b/apps/server/src/integrations/git-sync/services/vault-registry.service.spec.ts index 08658575..8a99da06 100644 --- a/apps/server/src/integrations/git-sync/services/vault-registry.service.spec.ts +++ b/apps/server/src/integrations/git-sync/services/vault-registry.service.spec.ts @@ -16,8 +16,8 @@ jest.mock('node:fs/promises', () => ({ // ensureServable shells out via `promisify(execFile)`; mock execFile with a // callback-style fn so promisify resolves. Each `git config ` call -// is recorded so the four config writes (incl. the security-critical -// receive.denyNonFastForwards=true) can be asserted. +// is recorded so the config writes (incl. the security-critical +// receive.denyNonFastForwards=true and core.symlinks=false) can be asserted. jest.mock('node:child_process', () => ({ execFile: jest.fn((_cmd: string, _args: string[], _opts: any, cb: any) => cb(null, { stdout: '', stderr: '' }), @@ -54,6 +54,7 @@ void loadGitSync; function build(dataDir: string): { service: VaultRegistryService } { const env = { getGitSyncDataDir: jest.fn(() => dataDir), + getGitSyncBackendTimeoutMs: jest.fn(() => 120000), }; const service = new VaultRegistryService(env as any); return { service }; @@ -96,7 +97,7 @@ describe('VaultRegistryService', () => { }); describe('ensureServable', () => { - it('ensures the repo then writes the four force-push-protection git configs', async () => { + it('ensures the repo then writes the force-push-protection + symlink-guard git configs', async () => { const { service } = build('/vaults'); const path = await service.ensureServable('space-1'); @@ -117,12 +118,18 @@ describe('VaultRegistryService', () => { ['receive.denyNonFastForwards', 'true'], ['http.receivepack', 'true'], ['http.uploadpack', 'true'], + // Security-critical (PR #119 review): a pushed symlink is checked out as + // a plain file, never a real link, so it cannot be followed to leak/ + // overwrite a file outside the vault. + ['core.symlinks', 'false'], ]); - // Every config write targets THIS vault's cwd. + // Every config write targets THIS vault's cwd and is time-bounded so a + // wedged git cannot hang the request path. for (const [cmd, args, opts] of execFileMock.mock.calls) { if (cmd === 'git' && args[0] === 'config') { expect(opts.cwd).toBe('/vaults/space-1'); + expect(opts.timeout).toBe(120000); } } }); diff --git a/apps/server/src/integrations/git-sync/services/vault-registry.service.ts b/apps/server/src/integrations/git-sync/services/vault-registry.service.ts index fc8834d0..b46ef356 100644 --- a/apps/server/src/integrations/git-sync/services/vault-registry.service.ts +++ b/apps/server/src/integrations/git-sync/services/vault-registry.service.ts @@ -62,10 +62,20 @@ export class VaultRegistryService { * rewrite the engine's history on `main`. * - http.receivepack=true / http.uploadpack=true — explicitly allow the * receive/upload services over HTTP. + * - core.symlinks=false — SECURITY (PR #119 review). A writer could push a + * `.md` entry that is a SYMLINK (e.g. `leak.md -> /etc/passwd` or + * `-> .env`); with symlinks enabled `updateInstead` would materialize a + * real link in the working tree, and the next push cycle would follow it + * and PUBLISH the target's contents as a Docmost page (server-file + * disclosure), or use a symlinked directory to write OUTSIDE the vault on + * pull. With `core.symlinks=false` git checks out such a blob as a PLAIN + * FILE containing the link text, never a real link, defusing the primitive + * at the git layer. (The engine's per-access lstat/realpath guard is the + * second layer — see path-guard.ts.) * - * All four are set idempotently (plain `git config` overwrites the local - * value). Returns the absolute vault path. Idempotent and safe to call before - * every request. + * All are set idempotently (plain `git config` overwrites the local value). + * Returns the absolute vault path. Idempotent and safe to call before every + * request. */ async ensureServable(spaceId: string): Promise { const { vaultGitEnv } = await loadGitSync(); @@ -81,13 +91,21 @@ export class VaultRegistryService { ['receive.denyNonFastForwards', 'true'], ['http.receivepack', 'true'], ['http.uploadpack', 'true'], + ['core.symlinks', 'false'], ]; + // Bound each `git config` (review suggestion): this runs in the request path + // BEFORE the watchdog, so a wedged git (a stale `.git/config.lock`) would + // otherwise hang the request indefinitely. Mirror the engine's GIT_EXEC + // bound via the configured backend timeout. + const timeout = this.environmentService.getGitSyncBackendTimeoutMs(); for (const [key, value] of configs) { await execFileAsync('git', ['config', key, value], { cwd: path, // Use the engine's cwd-isolated env (strips GIT_DIR / GIT_WORK_TREE) so // the config is written to THIS vault's local config, nothing else. env: vaultGitEnv(), + timeout, + maxBuffer: 10 * 1024 * 1024, }); } diff --git a/packages/git-sync/package.json b/packages/git-sync/package.json index 65340bf2..b43beb01 100644 --- a/packages/git-sync/package.json +++ b/packages/git-sync/package.json @@ -20,7 +20,6 @@ }, "license": "MIT", "dependencies": { - "@fellow/prosemirror-recreate-transform": "^1.2.3", "@tiptap/core": "3.20.4", "@tiptap/extension-highlight": "3.20.4", "@tiptap/extension-image": "3.20.4", diff --git a/packages/git-sync/src/engine/cycle.ts b/packages/git-sync/src/engine/cycle.ts index 985e9645..9f01ed5b 100644 --- a/packages/git-sync/src/engine/cycle.ts +++ b/packages/git-sync/src/engine/cycle.ts @@ -3,13 +3,19 @@ import { GitSyncClient } from "./client.types.js"; import { Settings } from "./settings.js"; import { readExisting, computePullActions, applyPullActions } from "./pull.js"; import { runPush } from "./push.js"; +import { assertVaultPathSafe, type PathGuardIo } from "./path-guard.js"; /** * Absolute-path filesystem primitives the cycle needs. Injected (not imported) * so the engine stays IO-free and unit-testable. `mkdir` is recursive; `rm` is * force (a missing file is a no-op). + * + * `lstat`/`realpath` back the SYMLINK GUARD (see ./path-guard.ts): every + * read/write/mkdir is screened so a pushed symlink (e.g. `leak.md -> /etc/passwd` + * or `-> .env`) cannot be followed to publish or overwrite a file outside the + * vault. Both MUST resolve to `null` on ENOENT and reject on any other error. */ -export interface CycleFs { +export interface CycleFs extends PathGuardIo { readFile: (absPath: string) => Promise; writeFile: (absPath: string, text: string) => Promise; mkdir: (absDir: string) => Promise; @@ -80,6 +86,31 @@ export async function runCycle(deps: RunCycleDeps): Promise { const vaultRoot = settings.vaultPath; const abs = (relPath: string) => `${vaultRoot}/${relPath}`; + // SYMLINK GUARD (defense-in-depth, see ./path-guard.ts). Wrap the injected + // read/write/mkdir primitives so EVERY engine file access is screened: a path + // that is — or traverses — a symlink, or whose realpath escapes the vault, is + // refused. `rm` is deliberately NOT wrapped: removing a path only deletes the + // link itself (force, non-recursive), never the target, and we WANT to be able + // to clean up a stray pushed symlink. A refusal THROWS; the pull/push loops + // already isolate per-file errors (skip + log), so a single poisoned entry is + // skipped while the rest of the space keeps syncing. + const guard = (p: string) => assertVaultPathSafe(fs, vaultRoot, p); + const safeFs = { + readFile: async (p: string): Promise => { + await guard(p); + return fs.readFile(p); + }, + writeFile: async (p: string, text: string): Promise => { + await guard(p); + return fs.writeFile(p, text); + }, + mkdir: async (p: string): Promise => { + await guard(p); + return fs.mkdir(p); + }, + rm: (p: string): Promise => fs.rm(p), + }; + // 1. The engine state store is git: make sure the repo + branches exist // before any tracked-file listing or diff. await vault.assertGitAvailable(); @@ -118,7 +149,7 @@ export async function runCycle(deps: RunCycleDeps): Promise { // 4. PULL -------------------------------------------------------------------- const existing = await readExisting({ listTracked: () => vault.listTrackedFiles("*.md"), - readFile: (relPath) => fs.readFile(abs(relPath)), + readFile: (relPath) => safeFs.readFile(abs(relPath)), }); const tree = await client.listSpaceTree(spaceId); @@ -135,9 +166,9 @@ export async function runCycle(deps: RunCycleDeps): Promise { { client, git: vault, - writeFile: (absPath, text) => fs.writeFile(absPath, text), - mkdir: (absDir) => fs.mkdir(absDir), - rm: (absPath) => fs.rm(absPath), + writeFile: (absPath, text) => safeFs.writeFile(absPath, text), + mkdir: (absDir) => safeFs.mkdir(absDir), + rm: (absPath) => safeFs.rm(absPath), log, }, pullActions, @@ -149,8 +180,8 @@ export async function runCycle(deps: RunCycleDeps): Promise { settings, git: vault, makeClient: () => client, - readFile: (relPath: string) => fs.readFile(abs(relPath)), - writeFile: (relPath: string, text: string) => fs.writeFile(abs(relPath), text), + readFile: (relPath: string) => safeFs.readFile(abs(relPath)), + writeFile: (relPath: string, text: string) => safeFs.writeFile(abs(relPath), text), log, }; diff --git a/packages/git-sync/src/engine/path-guard.ts b/packages/git-sync/src/engine/path-guard.ts new file mode 100644 index 00000000..548b6ac2 --- /dev/null +++ b/packages/git-sync/src/engine/path-guard.ts @@ -0,0 +1,132 @@ +/** + * Vault path guard (security, defense-in-depth). + * + * A user with push access to a git-sync space could commit a `.md` entry that is + * a SYMLINK (e.g. `leak.md -> /etc/passwd` or `-> /.env`). On the next + * cycle a naive `fs.readFile` would follow the link and PUBLISH the target's + * contents as a Docmost page (a read primitive that escalates a writer to + * arbitrary server-file disclosure — including the JWT secret / DB creds in + * `.env`); a symlinked DIRECTORY gives the inverse write-outside-the-vault + * primitive on pull. The primary defense is `core.symlinks=false` in each + * vault's git config (git then materializes a pushed symlink as a PLAIN FILE + * holding the link text, never a real link). This module is the second layer: + * before every engine read/write/mkdir we reject a path that IS — or traverses — + * a symlink, or whose real location escapes the vault root. + * + * IO-free by construction: the `lstat`/`realpath` primitives are injected + * (mirroring the rest of the engine) so the rules are unit-testable with fakes + * and the engine never imports `node:fs`. Path math uses `node:path`, which is + * pure. + */ +import { isAbsolute, relative, resolve, sep } from "node:path"; + +/** Why a path was refused. */ +export type VaultPathUnsafeReason = "symlink" | "escape"; + +/** + * Thrown when a path is refused by the guard. Engine read/write loops already + * isolate per-file errors (skip + log), so throwing here yields the review's + * required "skip+log" behavior without a separate control channel. + */ +export class VaultPathUnsafeError extends Error { + constructor( + readonly absPath: string, + readonly reason: VaultPathUnsafeReason, + readonly vaultRoot: string, + ) { + super( + reason === "symlink" + ? `git-sync: refusing to access '${absPath}' — it is (or traverses) a ` + + `symlink under vault '${vaultRoot}' (symlink guard)` + : `git-sync: refusing to access '${absPath}' — it resolves outside ` + + `vault '${vaultRoot}' (symlink guard)`, + ); + this.name = "VaultPathUnsafeError"; + } +} + +/** + * The injected IO the guard needs. Both MUST resolve to `null` on ENOENT (the + * normal case for a not-yet-created file on a write/mkdir) and reject on any + * other error. + */ +export interface PathGuardIo { + /** lstat WITHOUT following the final symlink. `null` when the path is absent. */ + lstat: (absPath: string) => Promise<{ isSymbolicLink: boolean } | null>; + /** realpath (follows symlinks). `null` when the path is absent. */ + realpath: (absPath: string) => Promise; +} + +/** + * Lexical containment: is `target` EQUAL to, or NESTED under, `root`? Catches a + * `..` traversal baked into a relPath before any IO. Both operands are resolved + * first so `.`/`..` segments are normalized. + */ +export function isWithinRoot(root: string, target: string): boolean { + const r = resolve(root); + const t = resolve(target); + if (t === r) return true; + const rel = relative(r, t); + return rel.length > 0 && !rel.startsWith(`..${sep}`) && rel !== ".." && !isAbsolute(rel); +} + +/** + * Reject `absPath` (resolving silently when it is safe) if it: + * - escapes `vaultRoot` lexically (a `..` traversal), OR + * - IS, or traverses, a symlink at any EXISTING segment from the root down + * (a symlinked ancestor dir, or the target file/dir itself), OR + * - resolves (realpath of its deepest existing ancestor) outside the vault. + * + * Absent leaf segments — the normal case when writing/mkdir'ing a NEW file — are + * safe: the walk stops at the first non-existent segment (nothing to follow). + */ +export async function assertVaultPathSafe( + io: PathGuardIo, + vaultRoot: string, + absPath: string, +): Promise { + const root = resolve(vaultRoot); + const target = resolve(absPath); + + // 1. Lexical containment — a `..` in a relPath never even reaches an lstat. + if (!isWithinRoot(root, target)) { + throw new VaultPathUnsafeError(absPath, "escape", vaultRoot); + } + + // 2. lstat-walk: reject a symlink at ANY existing level between the root and + // the target (inclusive). A symlinked ancestor or a symlinked target both + // let a follow-the-link read/write escape; rejecting the link itself is the + // surgical guard. + if (target !== root) { + const segments = relative(root, target) + .split(sep) + .filter((s) => s.length > 0); + let cur = root; + for (const segment of segments) { + cur = resolve(cur, segment); + const st = await io.lstat(cur); + if (st === null) break; // absent from here down — nothing left to follow + if (st.isSymbolicLink) { + throw new VaultPathUnsafeError(cur, "symlink", vaultRoot); + } + } + } + + // 3. realpath belt-and-suspenders: the deepest EXISTING ancestor must resolve + // inside the vault root's realpath. Catches an ancestor relocated via a + // symlink the lexical check would miss (e.g. the data dir itself being a + // link farm) and bounds the lstat→use TOCTOU window. + const realRoot = await io.realpath(root); + if (realRoot === null) return; // root absent — ensureRepo creates it first + let probe = target; + let realProbe = await io.realpath(probe); + while (realProbe === null && probe !== root) { + const parent = resolve(probe, ".."); + if (parent === probe) break; // reached the filesystem root + probe = parent; + realProbe = await io.realpath(probe); + } + if (realProbe !== null && !isWithinRoot(realRoot, realProbe)) { + throw new VaultPathUnsafeError(absPath, "escape", vaultRoot); + } +} diff --git a/packages/git-sync/src/index.ts b/packages/git-sync/src/index.ts index fc4e5c84..a52ca8d3 100644 --- a/packages/git-sync/src/index.ts +++ b/packages/git-sync/src/index.ts @@ -116,4 +116,11 @@ export type { CycleFs, } from "./engine/cycle.js"; +export { + assertVaultPathSafe, + isWithinRoot, + VaultPathUnsafeError, +} from "./engine/path-guard.js"; +export type { PathGuardIo, VaultPathUnsafeReason } from "./engine/path-guard.js"; + export { parsePageFile, serializePageFile } from "./lib/page-file.js"; diff --git a/packages/git-sync/src/lib/diff.ts b/packages/git-sync/src/lib/diff.ts deleted file mode 100644 index 94f7d447..00000000 --- a/packages/git-sync/src/lib/diff.ts +++ /dev/null @@ -1,319 +0,0 @@ -/** - * Headless, Docmost-equivalent document diff. - * - * Docmost's history editor computes a change set with the exact pipeline below - * (recreateTransform -> ChangeSet.addSteps -> simplifyChanges) and renders it as - * editor decorations. This module runs the SAME computation but serializes the - * result to text + integrity counts instead of decorations, so a diff can be - * previewed without a browser. - * - * recreateTransform here comes from @fellow/prosemirror-recreate-transform, the - * maintained published fork of the MIT prosemirror-recreate-steps source that - * Docmost vendors in @docmost/editor-ext; it exposes the identical - * recreateTransform(fromDoc, toDoc, { complexSteps, wordDiffs, simplifyDiff }) - * signature. - * - * If recreateTransform / the changeset throws on a pathological document pair, - * we fall back to a coarse block-level text diff so the tool never hard-fails. - */ - -import { getSchema } from "@tiptap/core"; -import { Node } from "@tiptap/pm/model"; -import { ChangeSet, simplifyChanges } from "@tiptap/pm/changeset"; -import { recreateTransform } from "@fellow/prosemirror-recreate-transform"; -import { docmostExtensions } from "./docmost-schema.js"; - -/** A single inserted/deleted change with its containing-block context. */ -export interface DiffChange { - op: "insert" | "delete"; - /** Lead (plain) text of the block that contains the change, for context. */ - block: string; - /** The inserted or deleted text. */ - text: string; -} - -/** Integrity counts as [old, new] tuples; footnoteMarkers as [oldList, newList]. */ -export interface DiffIntegrity { - images: [number, number]; - links: [number, number]; - tables: [number, number]; - callouts: [number, number]; - footnoteMarkers: [number[], number[]]; -} - -export interface DiffResult { - summary: { inserted: number; deleted: number; blocksChanged: number }; - integrity: DiffIntegrity; - changes: DiffChange[]; - /** Human-readable unified-ish summary. */ - markdown: string; -} - -/** Build the schema once; it is pure and reused across calls. */ -const schema = getSchema(docmostExtensions); - -/** Recursively concatenate the plain text of a JSON node. */ -function plainText(node: any): string { - if (!node || typeof node !== "object") return ""; - let out = ""; - if (typeof node.text === "string") out += node.text; - if (Array.isArray(node.content)) { - for (const child of node.content) out += plainText(child); - } - return out; -} - -/** Count nodes in a JSON doc that satisfy `pred` (recursive). */ -function countNodes(doc: any, pred: (node: any) => boolean): number { - let n = 0; - const visit = (node: any): void => { - if (!node || typeof node !== "object") return; - if (pred(node)) n++; - if (Array.isArray(node.content)) for (const c of node.content) visit(c); - }; - visit(doc); - return n; -} - -/** - * Count UNIQUE links in a JSON doc by their `href`. A single link can be split - * across several adjacent text runs (e.g. a "link+bold" run followed by a "link" - * run); counting link-bearing runs would over-count it. Walking the tree and - * collecting hrefs into a Set keys each distinct link once. Link marks with a - * missing/empty href are bucketed under a single "" key so a malformed link is - * still counted as one. - */ -function countUniqueLinks(doc: any): number { - const hrefs = new Set(); - const visit = (node: any): void => { - if (!node || typeof node !== "object") return; - if (node.type === "text" && Array.isArray(node.marks)) { - for (const m of node.marks) { - if (m && m.type === "link") { - const href = m.attrs && typeof m.attrs.href === "string" ? m.attrs.href : ""; - hrefs.add(href); - } - } - } - if (Array.isArray(node.content)) for (const c of node.content) visit(c); - }; - visit(doc); - return hrefs.size; -} - -/** - * Parse the ordered list of integers from `[N]` footnote markers found in the - * BODY only (every top-level block before the first notes heading; if no such - * heading, the whole doc). Returned in reading order. - */ -function footnoteMarkers(doc: any, notesHeading: string): number[] { - const top: any[] = Array.isArray(doc?.content) ? doc.content : []; - const notesIdx = top.findIndex( - (n) => - n && - n.type === "heading" && - plainText(n).trim() === notesHeading, - ); - const bodyBlocks = notesIdx >= 0 ? top.slice(0, notesIdx) : top; - const markers: number[] = []; - const re = /\[(\d+)\]/g; - for (const block of bodyBlocks) { - const text = plainText(block); - let m: RegExpExecArray | null; - re.lastIndex = 0; - while ((m = re.exec(text)) !== null) { - markers.push(Number(m[1])); - } - } - return markers; -} - -/** Compute the [old,new] integrity tuples for two JSON docs. */ -function computeIntegrity( - oldDoc: any, - newDoc: any, - notesHeading: string, -): DiffIntegrity { - const images: [number, number] = [ - countNodes(oldDoc, (n) => n.type === "image"), - countNodes(newDoc, (n) => n.type === "image"), - ]; - const links: [number, number] = [ - countUniqueLinks(oldDoc), - countUniqueLinks(newDoc), - ]; - const tables: [number, number] = [ - countNodes(oldDoc, (n) => n.type === "table"), - countNodes(newDoc, (n) => n.type === "table"), - ]; - const callouts: [number, number] = [ - countNodes(oldDoc, (n) => n.type === "callout"), - countNodes(newDoc, (n) => n.type === "callout"), - ]; - const fns: [number[], number[]] = [ - footnoteMarkers(oldDoc, notesHeading), - footnoteMarkers(newDoc, notesHeading), - ]; - return { images, links, tables, callouts, footnoteMarkers: fns }; -} - -/** - * Resolve the lead text of the top-level block in a ProseMirror Node that - * contains the given document position. Returns "" when out of range. - */ -function blockContextAt(node: Node, pos: number): string { - try { - const clamped = Math.max(0, Math.min(pos, node.content.size)); - const $pos = node.resolve(clamped); - // depth 1 is the top-level block in a doc node. - const block = $pos.depth >= 1 ? $pos.node(1) : $pos.node(0); - const text = block.textContent || ""; - return text.length > 80 ? text.slice(0, 77) + "..." : text; - } catch { - return ""; - } -} - -/** Truncate a string for the markdown summary. */ -function truncate(s: string, n = 120): string { - return s.length > n ? s.slice(0, n - 3) + "..." : s; -} - -/** - * Coarse fallback: a block-by-block plain-text diff. Used only when the precise - * changeset pipeline throws, so the tool degrades gracefully instead of failing. - */ -function coarseDiff(oldDoc: any, newDoc: any): DiffChange[] { - const oldBlocks: any[] = Array.isArray(oldDoc?.content) ? oldDoc.content : []; - const newBlocks: any[] = Array.isArray(newDoc?.content) ? newDoc.content : []; - const oldTexts = oldBlocks.map(plainText); - const newTexts = newBlocks.map(plainText); - const oldSet = new Set(oldTexts); - const newSet = new Set(newTexts); - const changes: DiffChange[] = []; - for (const t of oldTexts) { - if (!newSet.has(t) && t.trim() !== "") { - changes.push({ op: "delete", block: truncate(t, 80), text: t }); - } - } - for (const t of newTexts) { - if (!oldSet.has(t) && t.trim() !== "") { - changes.push({ op: "insert", block: truncate(t, 80), text: t }); - } - } - return changes; -} - -/** Build the human-readable unified-ish markdown summary. */ -function renderMarkdown( - result: Omit, - fellBack: boolean, -): string { - const lines: string[] = []; - const { summary, integrity, changes } = result; - lines.push( - `# Diff: ${summary.inserted} inserted / ${summary.deleted} deleted (${summary.blocksChanged} blocks changed)`, - ); - if (fellBack) { - lines.push(""); - lines.push("> note: precise diff failed; coarse block-level diff shown."); - } - lines.push(""); - lines.push("## Integrity (old -> new)"); - lines.push(`- images: ${integrity.images[0]} -> ${integrity.images[1]}`); - lines.push(`- links: ${integrity.links[0]} -> ${integrity.links[1]}`); - lines.push(`- tables: ${integrity.tables[0]} -> ${integrity.tables[1]}`); - lines.push(`- callouts: ${integrity.callouts[0]} -> ${integrity.callouts[1]}`); - lines.push( - `- footnoteMarkers: [${integrity.footnoteMarkers[0].join(", ")}] -> [${integrity.footnoteMarkers[1].join(", ")}]`, - ); - lines.push(""); - lines.push("## Changes"); - if (changes.length === 0) { - lines.push("(no textual changes)"); - } else { - for (const c of changes) { - const sign = c.op === "insert" ? "+" : "-"; - const ctx = c.block ? ` @ ${truncate(c.block, 60)}` : ""; - lines.push(`${sign} ${truncate(c.text)}${ctx}`); - } - } - return lines.join("\n"); -} - -/** - * Diff two ProseMirror JSON documents the way Docmost's history editor does and - * serialize the result to text + integrity counts. - * - * @param oldDocJson the earlier document - * @param newDocJson the later document - * @param notesHeading heading delimiting body from notes for footnote counting - */ -export function diffDocs( - oldDocJson: any, - newDocJson: any, - notesHeading: string = "Примечания переводчика", -): DiffResult { - const integrity = computeIntegrity(oldDocJson, newDocJson, notesHeading); - - let changes: DiffChange[] = []; - let inserted = 0; - let deleted = 0; - let fellBack = false; - const changedBlocks = new Set(); - - try { - const oldNode = Node.fromJSON(schema, oldDocJson); - const newNode = Node.fromJSON(schema, newDocJson); - const tr = recreateTransform(oldNode, newNode, { - complexSteps: false, - wordDiffs: true, - simplifyDiff: true, - }); - const changeSet = ChangeSet.create(oldNode).addSteps( - tr.doc, - tr.mapping.maps, - [], - ); - const simplified = simplifyChanges(changeSet.changes, newNode); - - for (const change of simplified) { - // Deleted text lives in the OLD doc coordinate range [fromA, toA). - if (change.toA > change.fromA) { - const text = oldNode.textBetween(change.fromA, change.toA, "\n", " "); - if (text.length > 0) { - deleted += text.length; - const block = blockContextAt(oldNode, change.fromA); - changes.push({ op: "delete", block, text }); - if (block) changedBlocks.add("d:" + block); - } - } - // Inserted text lives in the NEW doc coordinate range [fromB, toB). - if (change.toB > change.fromB) { - const text = newNode.textBetween(change.fromB, change.toB, "\n", " "); - if (text.length > 0) { - inserted += text.length; - const block = blockContextAt(newNode, change.fromB); - changes.push({ op: "insert", block, text }); - if (block) changedBlocks.add("i:" + block); - } - } - } - } catch { - // Pathological pair: degrade to a coarse block-level diff so we never throw. - fellBack = true; - changes = coarseDiff(oldDocJson, newDocJson); - for (const c of changes) { - if (c.op === "insert") inserted += c.text.length; - else deleted += c.text.length; - if (c.block) changedBlocks.add(c.op[0] + ":" + c.block); - } - } - - const partial: Omit = { - summary: { inserted, deleted, blocksChanged: changedBlocks.size }, - integrity, - changes, - }; - return { ...partial, markdown: renderMarkdown(partial, fellBack) }; -} diff --git a/packages/git-sync/test/cycle-roundtrip.test.ts b/packages/git-sync/test/cycle-roundtrip.test.ts index eb6253b6..a4c54807 100644 --- a/packages/git-sync/test/cycle-roundtrip.test.ts +++ b/packages/git-sync/test/cycle-roundtrip.test.ts @@ -54,6 +54,26 @@ const nodeFs: CycleFs = { const fs = await import("node:fs/promises"); await fs.rm(absPath, { force: true }); }, + // Real symlink-guard primitives (ENOENT -> null), mirroring the server wiring. + lstat: async (absPath) => { + const fs = await import("node:fs/promises"); + try { + const st = await fs.lstat(absPath); + return { isSymbolicLink: st.isSymbolicLink() }; + } catch (err) { + if ((err as NodeJS.ErrnoException)?.code === "ENOENT") return null; + throw err; + } + }, + realpath: async (absPath) => { + const fs = await import("node:fs/promises"); + try { + return await fs.realpath(absPath); + } catch (err) { + if ((err as NodeJS.ErrnoException)?.code === "ENOENT") return null; + throw err; + } + }, }; /** A minimal recording client; empty Docmost so the pull is a no-op. */ diff --git a/packages/git-sync/test/cycle.test.ts b/packages/git-sync/test/cycle.test.ts index 276ed940..6347aa2d 100644 --- a/packages/git-sync/test/cycle.test.ts +++ b/packages/git-sync/test/cycle.test.ts @@ -61,6 +61,10 @@ function baseDeps(vault: any, over: Partial = {}): RunCycleDeps { writeFile: vi.fn(async () => undefined), mkdir: vi.fn(async () => undefined), rm: vi.fn(async () => undefined), + // Default: nothing is a symlink and everything resolves in place (no + // escape). The symlink-guard tests below override these. + lstat: vi.fn(async () => ({ isSymbolicLink: false })), + realpath: vi.fn(async (p: string) => p), }, log: vi.fn(), ...over, @@ -154,6 +158,32 @@ describe("runCycle (composition)", () => { expect(vault.diffNameStatus).not.toHaveBeenCalled(); }); + it("SYMLINK GUARD: never reads a tracked .md that is a symlink (no .env/passwd disclosure)", async () => { + // Security regression (PR #119 review): a writer who pushes `leak.md` as a + // SYMLINK to a server file (e.g. `.env`) must NOT have its target read and + // published. readExisting reads each tracked .md to recover its gitmost_id; + // the guard refuses the symlink BEFORE the raw read, so the target's bytes + // are never touched and the cycle keeps running for the rest of the space. + const vault = fakeVault({ + listTrackedFiles: vi.fn(async () => ["leak.md"]), + }); + const deps = baseDeps(vault); + const rawReadFile = vi.fn(async () => "GIT_SYNC_SECRET=topsecret"); + deps.fs.readFile = rawReadFile as any; + // `/vault/leak.md` is reported as a symlink by lstat. + deps.fs.lstat = vi.fn(async (p: string) => + p === "/vault/leak.md" + ? { isSymbolicLink: true } + : { isSymbolicLink: false }, + ) as any; + + const res = await runCycle(deps); + + expect(res.ran).toBe(true); + // The poisoned symlink's target was NEVER read (the guard short-circuited). + expect(rawReadFile).not.toHaveBeenCalled(); + }); + it("throws BEFORE the push apply when the signal aborts during the pull phase", async () => { // Abort mid-cycle: the signal fires while listSpaceTree (the pull read) // runs, so the SECOND checkpoint (before runPush) trips and the push apply diff --git a/packages/git-sync/test/diff.test.ts b/packages/git-sync/test/diff.test.ts deleted file mode 100644 index 7d075b48..00000000 --- a/packages/git-sync/test/diff.test.ts +++ /dev/null @@ -1,377 +0,0 @@ -import { describe, expect, it } from 'vitest'; -import { diffDocs } from '../src/lib/diff.js'; - -// --------------------------------------------------------------------------- -// ProseMirror JSON builders. diffDocs accepts plain JSON docs (it parses them -// through the Docmost schema internally), so we only need minimal node shapes. -// --------------------------------------------------------------------------- - -/** A paragraph; omit `text` for an empty paragraph (no content array entries). */ -const para = (text?: string) => ({ - type: 'paragraph', - content: text ? [{ type: 'text', text }] : [], -}); - -/** A heading (level 2 by default) carrying a single text run. */ -const heading = (text: string, level = 2) => ({ - type: 'heading', - attrs: { level }, - content: [{ type: 'text', text }], -}); - -/** A top-level doc node wrapping the given blocks. */ -const doc = (...content: any[]) => ({ type: 'doc', content }); - -/** An image node (atom). */ -const image = () => ({ type: 'image', attrs: {} }); - -/** A callout node wrapping one paragraph. */ -const callout = (text = 'note') => ({ - type: 'callout', - attrs: { type: 'info' }, - content: [para(text)], -}); - -/** A 1x1 table. */ -const table = (cell = 'c') => ({ - type: 'table', - content: [ - { type: 'tableRow', content: [{ type: 'tableCell', content: [para(cell)] }] }, - ], -}); - -/** A paragraph carrying a text run that bears a link mark with the given href. */ -const linkPara = (text: string, href: string | undefined, extraMarks: any[] = []) => ({ - type: 'paragraph', - content: [ - { - type: 'text', - text, - marks: [{ type: 'link', attrs: href === undefined ? {} : { href } }, ...extraMarks], - }, - ], -}); - -/** The diff.ts default for the notes-heading argument. */ -const DEFAULT_NOTES_HEADING = 'Примечания переводчика'; - -describe('diffDocs', () => { - describe('textual changes (precise path)', () => { - it('reports no changes for two identical docs', () => { - const d = doc(para('hello world')); - const result = diffDocs(d, d); - - expect(result.changes).toHaveLength(0); - expect(result.summary).toEqual({ inserted: 0, deleted: 0, blocksChanged: 0 }); - // The Changes section renders the sentinel line for an empty change list. - expect(result.markdown).toContain('(no textual changes)'); - }); - - it('counts a pure insertion ("abc" -> "abcXY") and captures the inserted substring', () => { - const result = diffDocs(doc(para('abc')), doc(para('abcXY'))); - - expect(result.summary.inserted).toBe(2); - expect(result.summary.deleted).toBe(0); - // Exactly one insert change whose text equals the inserted substring. - const inserts = result.changes.filter((c) => c.op === 'insert'); - expect(inserts).toHaveLength(1); - expect(inserts[0].text).toBe('XY'); - // No deletions on a pure insertion. - expect(result.changes.filter((c) => c.op === 'delete')).toHaveLength(0); - }); - - it('counts a pure deletion ("abcXY" -> "abc") and captures the deleted substring', () => { - const result = diffDocs(doc(para('abcXY')), doc(para('abc'))); - - expect(result.summary.deleted).toBe(2); - expect(result.summary.inserted).toBe(0); - const deletes = result.changes.filter((c) => c.op === 'delete'); - expect(deletes).toHaveLength(1); - expect(deletes[0].text).toBe('XY'); - expect(result.changes.filter((c) => c.op === 'insert')).toHaveLength(0); - }); - - it('reports a word modification as a matched delete + insert with exact substrings', () => { - const result = diffDocs(doc(para('hello world')), doc(para('hello there'))); - - // "world" (5) removed, "there" (5) added. - expect(result.summary.inserted).toBe(5); - expect(result.summary.deleted).toBe(5); - - const deletes = result.changes.filter((c) => c.op === 'delete'); - const inserts = result.changes.filter((c) => c.op === 'insert'); - expect(deletes.map((c) => c.text)).toContain('world'); - expect(inserts.map((c) => c.text)).toContain('there'); - }); - - it('handles two empty docs without error', () => { - const result = diffDocs({ type: 'doc', content: [] }, { type: 'doc', content: [] }); - - expect(result.changes).toHaveLength(0); - expect(result.summary).toEqual({ inserted: 0, deleted: 0, blocksChanged: 0 }); - expect(result.markdown).toContain('(no textual changes)'); - }); - - it('reports an insertion into an empty doc', () => { - const result = diffDocs({ type: 'doc', content: [] }, doc(para('brand new'))); - - expect(result.summary.inserted).toBeGreaterThan(0); - const inserts = result.changes.filter((c) => c.op === 'insert'); - expect(inserts.length).toBeGreaterThan(0); - // The inserted text is the new paragraph's content. - expect(inserts.map((c) => c.text).join('')).toContain('brand new'); - }); - }); - - describe('integrity counting', () => { - it('counts images, tables and callouts as old -> new tuples', () => { - // old: 1 image, 1 callout, 1 table new: 2 images, 0 callouts, 1 table - const oldDoc = doc(image(), callout(), table()); - const newDoc = doc(image(), image(), table()); - const { integrity } = diffDocs(oldDoc, newDoc); - - expect(integrity.images).toEqual([1, 2]); - expect(integrity.callouts).toEqual([1, 0]); - expect(integrity.tables).toEqual([1, 1]); - }); - - it('renders the integrity section verbatim in the markdown', () => { - const oldDoc = doc(image(), callout(), table()); - const newDoc = doc(image(), image(), table()); - const { markdown } = diffDocs(oldDoc, newDoc); - - // The integrity block is our own formatting, so exact lines are asserted. - expect(markdown).toContain('## Integrity (old -> new)'); - expect(markdown).toContain('- images: 1 -> 2'); - expect(markdown).toContain('- callouts: 1 -> 0'); - expect(markdown).toContain('- tables: 1 -> 1'); - }); - - it('counts a single link split across two adjacent runs (shared href) as one link', () => { - // Two text runs, both bearing a link to the SAME href; one also bold. - const d = doc({ - type: 'paragraph', - content: [ - { type: 'text', text: 'foo', marks: [{ type: 'link', attrs: { href: 'http://x' } }, { type: 'bold' }] }, - { type: 'text', text: 'bar', marks: [{ type: 'link', attrs: { href: 'http://x' } }] }, - ], - }); - const { integrity } = diffDocs(d, d); - - // Counting by unique href collapses the two runs into one link. - expect(integrity.links).toEqual([1, 1]); - }); - - it('counts distinct hrefs separately', () => { - const d = doc({ - type: 'paragraph', - content: [ - { type: 'text', text: 'one', marks: [{ type: 'link', attrs: { href: 'http://a' } }] }, - { type: 'text', text: 'two', marks: [{ type: 'link', attrs: { href: 'http://b' } }] }, - ], - }); - const { integrity } = diffDocs(d, d); - expect(integrity.links).toEqual([2, 2]); - }); - - it('counts a link mark with a missing href once (bucketed under "")', () => { - // Per source: a missing/empty href is collected under a single "" key, so a - // malformed link is still counted exactly once. - const d = linkPara('orphan', undefined); - const { integrity } = diffDocs(d, d); - expect(integrity.links).toEqual([1, 1]); - }); - }); - - describe('footnoteMarkers', () => { - it('excludes markers after the default notes heading and preserves reading order', () => { - // Body has [1] then [2]; the [99] sits AFTER the notes heading and must be - // excluded from both old and new marker lists. - const d = doc( - para('intro [1] middle [2]'), - heading(DEFAULT_NOTES_HEADING), - para('[99] footnote body'), - ); - const { integrity } = diffDocs(d, d); - - expect(integrity.footnoteMarkers).toEqual([ - [1, 2], - [1, 2], - ]); - // Reading order: [1] precedes [2]. - expect(integrity.footnoteMarkers[1]).toEqual([1, 2]); - }); - - it('honors a custom notesHeading argument', () => { - const d = doc(para('a [1]'), heading('Notes'), para('[5] excluded')); - const { integrity } = diffDocs(d, d, 'Notes'); - - // With the matching custom heading, [5] is excluded. - expect(integrity.footnoteMarkers).toEqual([[1], [1]]); - }); - - it('includes every marker when no notes heading is present', () => { - // No heading equals the notesHeading -> the whole doc is the body. - const d = doc(para('a [1] b [2]'), para('[3]')); - const { integrity } = diffDocs(d, d); - - expect(integrity.footnoteMarkers).toEqual([ - [1, 2, 3], - [1, 2, 3], - ]); - }); - - it('renders the footnoteMarkers integrity line verbatim', () => { - const d = doc(para('x [1] y [2]'), heading(DEFAULT_NOTES_HEADING), para('[9]')); - const { markdown } = diffDocs(d, d); - expect(markdown).toContain('- footnoteMarkers: [1, 2] -> [1, 2]'); - }); - }); - - describe('coarse fallback', () => { - // An unknown node type makes Node.fromJSON reject the doc, which throws - // inside the precise pipeline and triggers the coarse block-level fallback. - // (Confirmed by running the module: `{ type: '___nope' }` is not in the - // schema, so parsing throws and `fellBack` becomes true.) - it('degrades to a coarse block-level diff instead of throwing', () => { - const oldDoc = doc(para('keep this'), { type: '___nope' }); - const newDoc = doc(para('keep this'), para('new block')); - - // Must not throw. - const result = diffDocs(oldDoc, newDoc); - - // The fallback note appears in the markdown header area. - expect(result.markdown).toContain('precise diff failed; coarse block-level diff shown.'); - // Only the genuinely new block is reported; the unchanged "keep this" - // block is not. - const inserts = result.changes.filter((c) => c.op === 'insert'); - expect(inserts).toHaveLength(1); - expect(inserts[0].text).toBe('new block'); - }); - - it('does not report whitespace-only blocks in the fallback path', () => { - // New doc adds a block whose plain text is only whitespace; coarseDiff - // skips blocks whose trimmed text is empty. - const oldDoc = doc({ type: '___nope' }, para('kept')); - const newDoc = doc(para('kept'), para(' ')); - - const result = diffDocs(oldDoc, newDoc); - - // Fallback was taken (precise path threw on the unknown node). - expect(result.markdown).toContain('coarse block-level diff shown.'); - // No change is reported: "kept" is unchanged and " " is whitespace-only. - expect(result.changes).toHaveLength(0); - expect(result.summary).toEqual({ inserted: 0, deleted: 0, blocksChanged: 0 }); - }); - - it('still computes integrity (images/tables/callouts/footnotes) in the coarse-fallback branch', () => { - // Regression guard: integrity is computed BEFORE the try/catch, so a - // pathological pair that forces the fallback must NOT zero the integrity - // counts. The unknown node forces the precise path to throw (fellBack). - const oldDoc = doc(image(), callout(), table(), para('a [1]'), { type: '___nope' }); - const newDoc = doc(image(), image(), table(), para('b [2] [3]')); - const result = diffDocs(oldDoc, newDoc); - - // The fallback was taken... - expect(result.markdown).toContain('coarse block-level diff shown.'); - // ...yet every integrity tuple is the real count, not [0,0]. - expect(result.integrity.images).toEqual([1, 2]); - expect(result.integrity.callouts).toEqual([1, 0]); - expect(result.integrity.tables).toEqual([1, 1]); - // Footnote markers are counted from both docs even under the fallback. - expect(result.integrity.footnoteMarkers).toEqual([[1], [2, 3]]); - }); - - it('reports both a deletion and an insertion in the fallback path', () => { - const oldDoc = doc(para('old paragraph'), { type: '___nope' }); - const newDoc = doc(para('new paragraph')); - - const result = diffDocs(oldDoc, newDoc); - - expect(result.markdown).toContain('coarse block-level diff shown.'); - const deletes = result.changes.filter((c) => c.op === 'delete'); - const inserts = result.changes.filter((c) => c.op === 'insert'); - // "old paragraph" no longer present -> deletion; "new paragraph" -> insertion. - expect(deletes.map((c) => c.text)).toContain('old paragraph'); - expect(inserts.map((c) => c.text)).toContain('new paragraph'); - // Character counts accumulate from the reported texts. - expect(result.summary.deleted).toBe('old paragraph'.length); - expect(result.summary.inserted).toBe('new paragraph'.length); - }); - }); - - describe('blockContextAt (DiffChange.block)', () => { - it('truncates a >80-char block context with an ellipsis and keeps it non-empty', () => { - // A 100-char paragraph with a one-char edit; the block context guards a - // swallowed catch and must produce a truncated, non-empty string. - const longText = 'X'.repeat(100); - const result = diffDocs(doc(para(longText)), doc(para(longText + 'Z'))); - - const inserts = result.changes.filter((c) => c.op === 'insert'); - expect(inserts).toHaveLength(1); - const block = inserts[0].block; - expect(block.length).toBeGreaterThan(0); - // Truncation rule: 77 chars + "..." = length 80, ending with "...". - expect(block.endsWith('...')).toBe(true); - expect(block).toHaveLength(80); - }); - - it('keeps a short block context untruncated', () => { - const result = diffDocs(doc(para('abc')), doc(para('abcXY'))); - const inserts = result.changes.filter((c) => c.op === 'insert'); - expect(inserts[0].block).toBe('abcXY'); - expect(inserts[0].block.endsWith('...')).toBe(false); - }); - - it('dedups blocksChanged by op + block context (multiple edits in one block count once per op)', () => { - // Two separate word edits inside a single paragraph produce 4 changes - // (2 deletes + 2 inserts) but only 2 distinct block keys: - // "d:the quick brown fox" and "i:the slow brown wolf". - const result = diffDocs( - doc(para('the quick brown fox')), - doc(para('the slow brown wolf')), - ); - - expect(result.changes.length).toBe(4); - expect(result.summary.blocksChanged).toBe(2); - }); - - it('counts one block key per op for edits spread across two blocks', () => { - // Edits in two different paragraphs -> 4 distinct block keys. - const result = diffDocs( - doc(para('first line here'), para('second line here')), - doc(para('first line HERE'), para('second line HERE')), - ); - - expect(result.summary.blocksChanged).toBe(4); - }); - }); - - describe('markdown rendering', () => { - it('puts the summary counts in the markdown header', () => { - const result = diffDocs(doc(para('abc')), doc(para('abcXY'))); - expect(result.markdown).toContain( - '# Diff: 2 inserted / 0 deleted (1 blocks changed)', - ); - }); - - it('renders each change with its op sign (loose membership, library-controlled order)', () => { - const result = diffDocs(doc(para('hello world')), doc(para('hello there'))); - - // The Changes section is ordered by the diff library; assert membership, - // not an exact ordered string. Scope to lines AFTER the "## Changes" - // heading, since integrity lines also begin with "- ". - const lines = result.markdown.split('\n'); - const changesIdx = lines.indexOf('## Changes'); - expect(changesIdx).toBeGreaterThanOrEqual(0); - const changeLines = lines - .slice(changesIdx + 1) - .filter((l) => l.startsWith('+ ') || l.startsWith('- ')); - expect(changeLines.some((l) => l.startsWith('- ') && l.includes('world'))).toBe(true); - expect(changeLines.some((l) => l.startsWith('+ ') && l.includes('there'))).toBe(true); - // One delete line and one insert line. - expect(changeLines.filter((l) => l.startsWith('- '))).toHaveLength(1); - expect(changeLines.filter((l) => l.startsWith('+ '))).toHaveLength(1); - }); - }); -}); diff --git a/packages/git-sync/test/engine-gaps.test.ts b/packages/git-sync/test/engine-gaps.test.ts index ca99dfc0..3bb209eb 100644 --- a/packages/git-sync/test/engine-gaps.test.ts +++ b/packages/git-sync/test/engine-gaps.test.ts @@ -4,7 +4,7 @@ import type { ApplyPushDeps, PushActions } from '../src/engine/push'; import { planReconciliation } from '../src/engine/reconcile'; import { buildVaultLayout, type PageNode } from '../src/engine/layout'; import { sanitizeTitle } from '../src/engine/sanitize'; -import { firstDivergence } from '../src/engine/roundtrip-helpers'; +import { firstDivergence } from './roundtrip-helpers'; import { applyPullActions } from '../src/engine/pull'; import type { PullActions, ApplyPullActionsDeps } from '../src/engine/pull'; import type { DeletionDecision } from '../src/engine/reconcile'; diff --git a/packages/git-sync/test/markdown-roundtrip.property.test.ts b/packages/git-sync/test/markdown-roundtrip.property.test.ts index 35bf5716..50f83d73 100644 --- a/packages/git-sync/test/markdown-roundtrip.property.test.ts +++ b/packages/git-sync/test/markdown-roundtrip.property.test.ts @@ -14,7 +14,7 @@ import { convertProseMirrorToMarkdown } from '../src/lib/markdown-converter.js'; // global DOM via jsdom at module load time — this is expected and required for // @tiptap/html's generateJSON to run under Node. import { markdownToProseMirror } from '../src/lib/markdown-to-prosemirror.js'; -import { stripBlockIds } from '../src/engine/roundtrip-helpers.js'; +import { stripBlockIds } from './roundtrip-helpers.js'; // --------------------------------------------------------------------------- // WHY THIS TEST EXISTS (SPEC §11 / "Задача №0") diff --git a/packages/git-sync/test/markdown-to-prosemirror-gaps.test.ts b/packages/git-sync/test/markdown-to-prosemirror-gaps.test.ts index b758dfd5..ebec6e9d 100644 --- a/packages/git-sync/test/markdown-to-prosemirror-gaps.test.ts +++ b/packages/git-sync/test/markdown-to-prosemirror-gaps.test.ts @@ -157,6 +157,25 @@ describe('preprocessCallouts: nested callouts + code-fenced ":::"', () => { expect(allText(callout)).toContain('before code'); expect(allText(callout)).toContain('after code'); }); + + it('(c) an UNCLOSED ":::" opener is treated as a literal line, not a callout', async () => { + // Realistic input: a hand-edited vault file with a `:::info` opener and no + // matching closing `:::`. The fallback emits the opener as a LITERAL line + // rather than swallowing the rest of the document into a phantom callout — + // previously uncovered (markdown-to-prosemirror.ts). + const md = [':::info', 'orphan body line', 'another line'].join('\n'); + + const docNode = await markdownToProseMirror(md); + + // No callout node was created (the opener never closed). + expect(findAll(docNode, 'callout')).toHaveLength(0); + // The opener survives as literal text and the body lines are preserved (the + // rest of the document was NOT eaten by an unterminated callout). + const text = allText(docNode); + expect(text).toContain(':::info'); + expect(text).toContain('orphan body line'); + expect(text).toContain('another line'); + }); }); // --------------------------------------------------------------------------- diff --git a/packages/git-sync/test/path-guard.test.ts b/packages/git-sync/test/path-guard.test.ts new file mode 100644 index 00000000..0aa7bb7f --- /dev/null +++ b/packages/git-sync/test/path-guard.test.ts @@ -0,0 +1,110 @@ +import { describe, it, expect, vi } from "vitest"; +import { + assertVaultPathSafe, + isWithinRoot, + VaultPathUnsafeError, + type PathGuardIo, +} from "../src/engine/path-guard"; + +const VAULT = "/srv/git-sync/space-1"; + +/** + * Build a fake PathGuardIo from a model of the filesystem: + * - `symlinks`: absolute paths that ARE symlinks (lstat -> isSymbolicLink). + * - `existing`: absolute paths that EXIST (anything not listed is ENOENT/null). + * The vault root is always treated as existing. + * - `realpaths`: optional realpath overrides (default: identity for existing). + */ +function fakeIo(model: { + symlinks?: string[]; + existing?: string[]; + realpaths?: Record; +}): PathGuardIo { + const symlinks = new Set(model.symlinks ?? []); + const existing = new Set([VAULT, ...(model.existing ?? []), ...symlinks]); + return { + lstat: vi.fn(async (p: string) => + existing.has(p) ? { isSymbolicLink: symlinks.has(p) } : null, + ), + realpath: vi.fn(async (p: string) => + existing.has(p) ? (model.realpaths?.[p] ?? p) : null, + ), + }; +} + +describe("isWithinRoot", () => { + it("accepts the root itself and nested paths", () => { + expect(isWithinRoot(VAULT, VAULT)).toBe(true); + expect(isWithinRoot(VAULT, `${VAULT}/a/b.md`)).toBe(true); + }); + it("rejects siblings, ancestors and `..` traversal", () => { + expect(isWithinRoot(VAULT, "/srv/git-sync/space-2/x.md")).toBe(false); + expect(isWithinRoot(VAULT, "/srv/git-sync")).toBe(false); + expect(isWithinRoot(VAULT, `${VAULT}/../space-2/x.md`)).toBe(false); + expect(isWithinRoot(VAULT, "/etc/passwd")).toBe(false); + }); +}); + +describe("assertVaultPathSafe", () => { + it("allows a normal nested file with no symlinks on its chain", async () => { + const io = fakeIo({ existing: [`${VAULT}/Folder`, `${VAULT}/Folder/Page.md`] }); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/Folder/Page.md`), + ).resolves.toBeUndefined(); + }); + + it("allows a NOT-YET-EXISTING leaf (the normal write/mkdir case)", async () => { + // Folder exists, the .md does not yet — the walk stops at the absent leaf. + const io = fakeIo({ existing: [`${VAULT}/Folder`] }); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/Folder/New.md`), + ).resolves.toBeUndefined(); + }); + + it("rejects a TARGET that is itself a symlink (the leak.md -> /etc/passwd attack)", async () => { + const io = fakeIo({ symlinks: [`${VAULT}/leak.md`] }); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/leak.md`), + ).rejects.toBeInstanceOf(VaultPathUnsafeError); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/leak.md`), + ).rejects.toMatchObject({ reason: "symlink" }); + }); + + it("rejects a path whose ANCESTOR directory is a symlink (write-outside-vault primitive)", async () => { + // `escape` is a symlinked dir; writing `escape/x.md` would land outside. + const io = fakeIo({ + symlinks: [`${VAULT}/escape`], + existing: [`${VAULT}/escape/x.md`], + }); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/escape/x.md`), + ).rejects.toMatchObject({ reason: "symlink" }); + }); + + it("rejects a `..` traversal lexically, before any IO", async () => { + const io = fakeIo({}); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/../space-2/steal.md`), + ).rejects.toMatchObject({ reason: "escape" }); + expect(io.lstat).not.toHaveBeenCalled(); + }); + + it("rejects when the deepest existing ancestor's realpath escapes the vault", async () => { + // No symlink flagged by lstat (e.g. the data dir was relocated under a link + // the lexical/lstat checks below the root cannot see), but realpath resolves + // the existing ancestor outside the vault's realpath. + const io = fakeIo({ + existing: [`${VAULT}/sub`], + realpaths: { [VAULT]: VAULT, [`${VAULT}/sub`]: "/elsewhere/sub" }, + }); + await expect( + assertVaultPathSafe(io, VAULT, `${VAULT}/sub/page.md`), + ).rejects.toMatchObject({ reason: "escape" }); + }); + + it("allows the vault root path itself", async () => { + const io = fakeIo({}); + await expect(assertVaultPathSafe(io, VAULT, VAULT)).resolves.toBeUndefined(); + }); +}); diff --git a/packages/git-sync/test/redteam-push-cycle.test.ts b/packages/git-sync/test/redteam-push-cycle.test.ts index b7dc3424..d0880c80 100644 --- a/packages/git-sync/test/redteam-push-cycle.test.ts +++ b/packages/git-sync/test/redteam-push-cycle.test.ts @@ -1,5 +1,10 @@ import { describe, expect, it, vi } from 'vitest'; -import { runPush, LAST_PUSHED_REF, DOCMOST_BRANCH } from '../src/engine/push'; +import { + runPush, + LAST_PUSHED_REF, + DOCMOST_BRANCH, + CONFLICT_MARKERS_FAILURE_REASON, +} from '../src/engine/push'; import type { PushDeps } from '../src/engine/push'; import type { Settings } from '../src/engine/settings'; import { runCycle, type RunCycleDeps } from '../src/engine/cycle'; @@ -139,6 +144,59 @@ describe('#13 conflict markers reach Docmost', () => { expect(pushedBody).toContain('my line'); expect(pushedBody).toContain('their line'); }); + + it('CREATE branch (autoMergeConflicts off): does NOT create a page from a conflicted NEW file; records a create failure', async () => { + // The conflict-markers guard is DUPLICATED on the CREATE path (a brand-new + // .md with NO gitmost_id, status 'A') and was previously untested — only the + // UPDATE branch had coverage. Without this, a regression would SILENTLY push + // `<<<<<<<`/`>>>>>>>` into a freshly-created page. Assert the create path + // isolates it exactly like update: no createPage, a kind:'create' failure + // with the conflict reason, and the refs held. + const { git, calls } = makePushGit({ + changes: [{ status: 'A', path: 'New.md' }], + }); + const createPage = vi.fn(async () => ({ data: { id: 'new-1' } })); + const client = { + listSpaceTree: vi.fn(async () => ({ pages: [], complete: true })), + importPageMarkdown: vi.fn(), + createPage, + deletePage: vi.fn(), + movePage: vi.fn(), + renamePage: vi.fn(), + }; + const deps: PushDeps = { + // makeSettings() leaves autoMergeConflicts undefined -> the SAFE default. + settings: makeSettings(), + git, + makeClient: () => client as any, + // Raw conflict body with NO gitmost_id frontmatter -> classified as CREATE. + readFile: vi.fn(async (path: string) => { + if (path === 'New.md') return conflictBody; + throw new Error(`no such file: ${path}`); + }), + writeFile: vi.fn(async () => {}), + log: () => {}, + }; + + const res = await runPush(deps, { dryRun: false }); + expect(res.mode).toBe('apply'); + + // No page was created from the conflicted content. + expect(createPage).not.toHaveBeenCalled(); + + // Recorded as a CREATE failure with the conflict-markers reason. + expect(res.applied?.failures).toEqual([ + expect.objectContaining({ + kind: 'create', + path: 'New.md', + error: CONFLICT_MARKERS_FAILURE_REASON, + }), + ]); + + // A failure prevents advancing the last-pushed ref. + expect(res.applied?.lastPushedAdvanced).toBe(false); + expect(calls.updateRef).toHaveLength(0); + }); }); // --------------------------------------------------------------------------- diff --git a/packages/git-sync/src/engine/roundtrip-helpers.ts b/packages/git-sync/test/roundtrip-helpers.ts similarity index 100% rename from packages/git-sync/src/engine/roundtrip-helpers.ts rename to packages/git-sync/test/roundtrip-helpers.ts diff --git a/packages/mcp/test/unit/schema-surface.test.mjs b/packages/mcp/test/unit/schema-surface.test.mjs new file mode 100644 index 00000000..bb813304 --- /dev/null +++ b/packages/mcp/test/unit/schema-surface.test.mjs @@ -0,0 +1,118 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { getSchema } from "@tiptap/core"; + +import { docmostExtensions } from "../../build/lib/docmost-schema.js"; + +// SCHEMA-DRIFT GUARD (must-review gate). +// +// `src/lib/docmost-schema.ts` is a VENDORED MIRROR of the canonical Docmost +// document schema defined in `@docmost/editor-ext`. The MCP server uses it to +// convert pages to/from ProseMirror JSON (and through Yjs); any node, mark, or +// attribute that exists in the canonical schema but is missing here is silently +// dropped on a round-trip (data loss). The reverse — a node/mark/attr here that +// no longer exists in the canonical schema — is dead surface that can mask drift. +// +// This test derives a stable, sorted "schema surface" (every node/mark name and +// its sorted attribute keys) and pins it against an INLINE expected constant. +// It is intentionally a LOUD must-review gate rather than an automatic +// editor-ext diff: editor-ext's Tiptap representation differs from this +// vendored copy, so a cross-representation compare would be fragile. The +// reference lives in this file so it is reviewed in the diff of every change. +// +// This is the MCP twin of git-sync's +// `packages/git-sync/test/schema-surface-snapshot.test.ts`. The two vendored +// copies are NOT identical (see PROVENANCE in docmost-schema.ts): the MCP copy +// does not vendor every node git-sync does, so the surfaces legitimately differ. +// Keep both gates honest against `@docmost/editor-ext` independently. +// +// WHEN THIS TEST FAILS: do NOT blindly update `expectedSurface`. First confirm +// the change matches `@docmost/editor-ext` (the canonical schema) so the +// markdown <-> ProseMirror round-trip stays lossless, THEN copy the new surface +// into the expected constant below. + +/** Derive the deterministic schema surface from the vendored extension set. */ +function deriveSurface() { + const schema = getSchema(docmostExtensions); + const surface = []; + for (const [name, type] of Object.entries(schema.nodes)) { + surface.push({ + name, + kind: "node", + attrs: Object.keys(type.spec?.attrs ?? {}).sort(), + }); + } + for (const [name, type] of Object.entries(schema.marks)) { + surface.push({ + name, + kind: "mark", + attrs: Object.keys(type.spec?.attrs ?? {}).sort(), + }); + } + // Sort by name, then by kind, for a representation-independent ordering. + surface.sort((a, b) => + a.name === b.name ? a.kind.localeCompare(b.kind) : a.name.localeCompare(b.name), + ); + return surface; +} + +// The committed reference surface. Built from the ACTUAL current schema; review +// every change to this constant against `@docmost/editor-ext`. +const expectedSurface = [ + { name: "attachment", kind: "node", attrs: ["attachmentId", "mime", "name", "placeholder", "size", "url"] }, + { name: "audio", kind: "node", attrs: ["attachmentId", "placeholder", "size", "src"] }, + { name: "blockquote", kind: "node", attrs: [] }, + { name: "bold", kind: "mark", attrs: [] }, + { name: "bulletList", kind: "node", attrs: [] }, + { name: "callout", kind: "node", attrs: ["icon", "type"] }, + { name: "code", kind: "mark", attrs: [] }, + { name: "codeBlock", kind: "node", attrs: ["language"] }, + { name: "column", kind: "node", attrs: ["width"] }, + { name: "columns", kind: "node", attrs: ["layout", "widthMode"] }, + { name: "comment", kind: "mark", attrs: ["commentId", "resolved"] }, + { name: "details", kind: "node", attrs: ["open"] }, + { name: "detailsContent", kind: "node", attrs: [] }, + { name: "detailsSummary", kind: "node", attrs: [] }, + { name: "doc", kind: "node", attrs: [] }, + { name: "drawio", kind: "node", attrs: ["align", "alt", "aspectRatio", "attachmentId", "height", "size", "src", "title", "width"] }, + { name: "embed", kind: "node", attrs: ["align", "height", "provider", "src", "width"] }, + { name: "excalidraw", kind: "node", attrs: ["align", "alt", "aspectRatio", "attachmentId", "height", "size", "src", "title", "width"] }, + { name: "footnoteDefinition", kind: "node", attrs: ["id"] }, + { name: "footnoteReference", kind: "node", attrs: ["id"] }, + { name: "footnotesList", kind: "node", attrs: [] }, + { name: "hardBreak", kind: "node", attrs: [] }, + { name: "heading", kind: "node", attrs: ["id", "indent", "level", "textAlign"] }, + { name: "highlight", kind: "mark", attrs: ["color"] }, + { name: "horizontalRule", kind: "node", attrs: [] }, + { name: "htmlEmbed", kind: "node", attrs: ["height", "source"] }, + { name: "image", kind: "node", attrs: ["align", "alt", "aspectRatio", "attachmentId", "height", "placeholder", "size", "src", "title", "width"] }, + { name: "italic", kind: "mark", attrs: [] }, + { name: "link", kind: "mark", attrs: ["class", "href", "internal", "rel", "target", "title"] }, + { name: "listItem", kind: "node", attrs: [] }, + { name: "mathBlock", kind: "node", attrs: ["text"] }, + { name: "mathInline", kind: "node", attrs: ["text"] }, + { name: "mention", kind: "node", attrs: ["anchorId", "creatorId", "entityId", "entityType", "id", "label", "slugId"] }, + { name: "orderedList", kind: "node", attrs: ["start", "type"] }, + { name: "pageBreak", kind: "node", attrs: [] }, + { name: "paragraph", kind: "node", attrs: ["id", "indent", "textAlign"] }, + { name: "pdf", kind: "node", attrs: ["attachmentId", "height", "name", "placeholder", "size", "src", "width"] }, + { name: "strike", kind: "mark", attrs: [] }, + { name: "subpages", kind: "node", attrs: [] }, + { name: "subscript", kind: "mark", attrs: [] }, + { name: "superscript", kind: "mark", attrs: [] }, + { name: "table", kind: "node", attrs: [] }, + { name: "tableCell", kind: "node", attrs: ["align", "backgroundColor", "backgroundColorName", "colspan", "colwidth", "rowspan"] }, + { name: "tableHeader", kind: "node", attrs: ["align", "backgroundColor", "backgroundColorName", "colspan", "colwidth", "rowspan"] }, + { name: "tableRow", kind: "node", attrs: [] }, + { name: "taskItem", kind: "node", attrs: ["checked"] }, + { name: "taskList", kind: "node", attrs: [] }, + { name: "text", kind: "node", attrs: [] }, + { name: "textStyle", kind: "mark", attrs: ["color"] }, + { name: "underline", kind: "mark", attrs: [] }, + { name: "video", kind: "node", attrs: ["align", "alt", "aspectRatio", "attachmentId", "height", "placeholder", "size", "src", "width"] }, + { name: "youtube", kind: "node", attrs: ["align", "height", "src", "width"] }, +]; + +test("docmost schema surface matches the committed reference (re-verify against @docmost/editor-ext on change)", () => { + assert.deepEqual(deriveSurface(), expectedSurface); +}); diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 441ec519..e7f22bd4 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -889,9 +889,6 @@ importers: packages/git-sync: dependencies: - '@fellow/prosemirror-recreate-transform': - specifier: ^1.2.3 - version: 1.2.3 '@tiptap/core': specifier: 3.20.4 version: 3.20.4(@tiptap/pm@3.20.4)