fix(git-sync): address PR #119 review #2 — throttle /git Basic auth, fix mcp schema drift + warnings/tests

Must-fix:
- Throttle the raw /git HTTP-Basic path: it bypasses Nest/ThrottlerGuard, so
  verifyUserCredentials (bcrypt) ran unthrottled. Wrap it in the SAME
  FailedLoginLimiter the /mcp path uses (5/60s; per-IP, per-IP+email, global
  per-email keys; atomic tryReserve BEFORE bcrypt; success resets, non-credential
  errors release). The (threshold+1)-th attempt now gets 429 pre-bcrypt. Sweep
  timer + onModuleDestroy mirror McpService.
- Fix the mcp schema mirror drift: packages/mcp details `open` attr now reads via
  hasAttribute (matches editor-ext canon + git-sync copy); getAttribute dropped a
  bare `<details open>` state. (build/ is gitignored — rebuilt locally.)

Tests added:
- /git brute-force throttle: pre-bcrypt 429 on the 6th failure; success resets;
  non-credential error releases the budget.
- git-http-backend lost-lock AbortSignal: already-aborted -> no spawn + 500;
  live abort mid-request -> SIGTERM + response closed.
- orchestrator divergentDocmost -> WARN + flag surfaced in status (+ clean case).
- pollTick re-entrancy guard skips an overlapping tick.
- datasource NotFound early-throws (getPageJson/move/rename) + updatedAt:undefined
  stale-read branch (importPageMarkdown/createPage).

Suggestions:
- space.repo updateGitSyncSettings: parameterize the jsonb key (`${prefKey}::text`)
  instead of sql.raw (latent-injection footgun); value stays sql.lit. Spec updated.
- pollTick re-entrancy guard (private `polling` flag).
- page-change.listener docstring: honest about the move/rename/delete over-skip
  (loop-guard keys only on lastUpdatedSource) -> ~poll-interval latency, not loss.
- AGENTS.md: document the root /git smart-HTTP route + GitSyncModule.
- Remove redundant redteam-provenance.spec.ts (covered e2e in
  persistence.extension.spec.ts:145).
- Extract the duplicated SIGTERM->SIGKILL+finish block (watchdog + abort) into
  terminateChild; centralize watchdog-timer teardown in done().

Architecture (deferred, documented): mcp schema header now carries the three-copy
keep-in-sync + schema-core note; the editor-ext contract test documents that the
mcp copy and attribute-behaviour drift (details `open`) are not mechanically
covered yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
a
2026-06-27 23:49:36 +03:00
committed by claude code agent 227
parent 7179f8a5b2
commit 22e3fcdeba
14 changed files with 544 additions and 115 deletions

View File

@@ -244,8 +244,10 @@ Migration files live in `apps/server/src/database/migrations/` and are named `YY
The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes `robots.txt`, public share pages, and `mcp` from the prefix). A `preHandler` hook enforces that a resolved `workspaceId` exists for most `/api` routes (multi-tenant by hostname/subdomain via `DomainMiddleware`). Auth is JWT (cookie + bearer); authorization is **CASL** (`core/casl`) — every data access is scoped to the user's abilities.
Two routes are mounted **outside** the `/api` prefix at the root, as raw Fastify routes that bypass the Nest pipeline (so neither `DomainMiddleware` nor `ThrottlerGuard` runs for them — each resolves the workspace and throttles itself): `/mcp` (the embedded MCP server, see below) and `/git/<spaceId>.git/...` (the git-sync smart-HTTP host, see below). Both share `mcp-auth.helpers.ts` (HTTP-Basic parsing, `FailedLoginLimiter`, `clientIp`) and the common `resolveRequestWorkspace` helper.
### Module structure (server)
`AppModule` wires integration modules (`integrations/*`: storage [local/S3/Azure], mail, queue [BullMQ on Redis], security, telemetry, throttle, `mcp`, `ai`) plus `CoreModule`, `DatabaseModule`, and `CollaborationModule`. `CoreModule` (`core/*`) holds the domain modules: `page`, `space`, `comment`, `workspace`, `user`, `auth`, `group`, `attachment`, `search`, `share`, `ai-chat`, etc. Each domain module follows NestJS controller → service → repo layering; DB repos live under `database/repos` and are injected app-wide from the global `DatabaseModule`.
`AppModule` wires integration modules (`integrations/*`: storage [local/S3/Azure], mail, queue [BullMQ on Redis], security, telemetry, throttle, `mcp`, `ai`, `git-sync`) plus `CoreModule`, `DatabaseModule`, and `CollaborationModule`. `CoreModule` (`core/*`) holds the domain modules: `page`, `space`, `comment`, `workspace`, `user`, `auth`, `group`, `attachment`, `search`, `share`, `ai-chat`, etc. Each domain module follows NestJS controller → service → repo layering; DB repos live under `database/repos` and are injected app-wide from the global `DatabaseModule`.
**EE removal artifact:** `app.module.ts` still contains a `try/require('./ee/ee.module')` stub. That path no longer exists, so the require fails and is swallowed (it only hard-exits when `CLOUD === 'true'`). Treat EE as gone — do not add code that depends on it.
@@ -261,6 +263,12 @@ The API server is a Fastify app with a global `/api` prefix (`main.ts` excludes
- `core/ai-chat/embedding/` — RAG indexer + a BullMQ consumer on `AI_QUEUE` that embeds pages into `page_embeddings` (vector search), complementing Postgres full-text search. Pages are (re)indexed on edit; `AI_EMBEDDING_TIMEOUT_MS` bounds a hung embeddings endpoint.
- `core/ai-chat/external-mcp/` — admins can attach external MCP servers (e.g. Tavily) to give the agent web access. **`ssrf-guard.ts` validates outbound MCP URLs against SSRF** — keep that guard in the path when touching external-MCP connection logic.
### Git-sync (native two-way Docmost ↔ git Markdown sync)
`integrations/git-sync/` (`GitSyncModule`) + the vendored pure engine in `packages/git-sync`. Off by default; gated by the `GIT_SYNC_ENABLED` master switch (and `GIT_SYNC_SERVICE_USER_ID`, the account git-originated writes are attributed to). Per-space opt-in via `space.settings.gitSync.enabled`. Each enabled space gets an on-disk working "vault" repo; the `GitSyncOrchestrator` runs a debounced + poll-backstop reconcile cycle (PULL Docmost→vault, PUSH vault→Docmost) under a per-space Redis leader lock + in-process mutex (`SpaceLockService`). Writes go through the collaboration layer (so concurrent human edits aren't clobbered) and are stamped `lastUpdatedSource = 'git-sync'` for the listener loop-guard. The in-process `setInterval` orchestration + best-effort lock (no fencing tokens) is a known multi-replica limitation — BullMQ + fencing is the documented future direction.
- **`/git` smart-HTTP host** (`integrations/git-sync/http/`, gated additionally by `GIT_SYNC_HTTP_ENABLED`, which defaults to `GIT_SYNC_ENABLED`): a raw root-mounted Fastify route `/git/<spaceId>.git/...` (registered in `main.ts`, NOT under `/api`) that bridges `git clone`/`fetch`/`push` to `git http-backend`. It authenticates HTTP Basic against `AuthService` (throttled by a `FailedLoginLimiter` mirroring the `/mcp` path), authorizes via `SpaceAbilityFactory` (read = fetch, Manage = push), and gates existence so a non-member gets the SAME 404 as a missing/sync-disabled space (never 403 — that would leak space existence). A push runs the receive-pack under the space lock, then a reconcile cycle.
- **Schema mirror:** `packages/git-sync/src/lib/docmost-schema.ts` is one of the **three** hand-synced copies of the Tiptap document schema (see Client structure) — keep it in lockstep with `editor-ext` (canonical) and `packages/mcp`.
### Client structure
Vite SPA. Code is organized by feature under `apps/client/src/features/*` (mirrors the server domains: `page`, `space`, `comment`, `ai-chat`, `editor`, …). Conventions:
- **TanStack Query** for server state (one `queries/` file per feature), **Jotai** atoms for local/shared UI state, **Mantine 8** + CSS modules (`*.module.css`) + `postcss-preset-mantine` for UI.

View File

@@ -1,10 +0,0 @@
import { resolveSource } from './persistence.extension';
// Red-team finding #14: an explicit git-sync write (no agent edit in the
// coalescing window) must keep the 'git-sync' source so the git-sync
// listener's loop-guard can recognize its own writes and not re-export them.
describe('resolveSource — #14 git-sync provenance loop-guard', () => {
it('keeps git-sync source for an explicit git-sync write (stickyTouched=true, actor=git-sync)', () => {
expect(resolveSource(true, 'git-sync')).toBe('git-sync');
});
});

View File

@@ -111,8 +111,9 @@ describe('SpaceRepo.updateGitSyncSettings — jsonb merge SQL', () => {
expect(sql).toContain(
`jsonb_build_object('gitSync', COALESCE(settings->'gitSync', '{}'::jsonb) ||`,
);
// The pref key is set via jsonb_build_object on the inner object.
expect(sql).toContain(`jsonb_build_object('enabled',`);
// The pref key is set via jsonb_build_object on the inner object, with the
// key as a BOUND, ::text-cast PARAMETER (not sql.raw) — security fix #5.
expect(sql).toMatch(/jsonb_build_object\(\$\d+::text,/);
// Scoped to the row + workspace.
expect(sql).toContain(`where "id" =`);
expect(sql).toContain(`and "workspaceId" =`);
@@ -121,21 +122,25 @@ describe('SpaceRepo.updateGitSyncSettings — jsonb merge SQL', () => {
// `set "settings" = jsonb_build_object(` without the COALESCE/merge).
expect(sql).not.toContain(`set "settings" = jsonb_build_object(`);
// The pref VALUE is inlined via sql.lit (matches the repo's sql.lit usage);
// updatedAt + id + workspaceId are the only bound parameters (the jsonb
// merge text is all literal). updatedAt is a Date, so assert id/workspaceId.
// The pref VALUE stays inlined via sql.lit, but the KEY is now a bound
// parameter, so id + workspaceId + the key are all bound (updatedAt is a Date).
expect(compiled!.parameters).toContain('space-1');
expect(compiled!.parameters).toContain('ws-1');
expect(compiled!.parameters).toContain('enabled');
});
it('inlines the prefKey/prefValue literally (sql.raw key, sql.lit value)', async () => {
it('binds the prefKey as a ::text parameter (no sql.raw splice) and inlines prefValue via sql.lit', async () => {
const { repo, getCaptured } = makeRepoCapturingSql();
await repo.updateGitSyncSettings('space-1', 'ws-1', 'enabled', false);
const sql = getCaptured()!.sql.replace(/\s+/g, ' ');
// key via sql.raw + value via sql.lit -> both appear literally in the
// inner build object (no bound parameter for either).
expect(sql).toContain(`jsonb_build_object('enabled', false)`);
const compiled = getCaptured()!;
const sql = compiled.sql.replace(/\s+/g, ' ');
// The key is a bound `$N::text` parameter; the value is the sql.lit literal.
expect(sql).toMatch(/jsonb_build_object\(\$\d+::text, false\)/);
// The literal key must NOT be spliced into the statement text (the footgun).
expect(sql).not.toContain(`'enabled'`);
// The key rides as a bound parameter instead.
expect(compiled.parameters).toContain('enabled');
});
});

View File

@@ -122,9 +122,15 @@ export class SpaceRepo {
return db
.updateTable('spaces')
.set({
// The jsonb key is a BOUND PARAMETER (`${prefKey}::text`), not
// `sql.raw(prefKey)`. The callers here only ever pass the literals
// 'enabled' / 'autoMergeConflicts', but sql.raw would splice the string
// straight into the statement — a latent SQL-injection footgun the moment
// a future caller passes a request-derived key. Parameterizing closes it
// with no behaviour change for the current literal callers.
settings: sql`COALESCE(settings, '{}'::jsonb)
|| jsonb_build_object('gitSync', COALESCE(settings->'gitSync', '{}'::jsonb)
|| jsonb_build_object('${sql.raw(prefKey)}', ${sql.lit(prefValue)}))`,
|| jsonb_build_object(${prefKey}::text, ${sql.lit(prefValue)}))`,
updatedAt: new Date(),
})
.where('id', '=', spaceId)

View File

@@ -251,6 +251,45 @@ describe('GitHttpBackendService.run', () => {
expect(res.statusCode).toBe(500);
expect(res.end).toHaveBeenCalledWith('Internal server error');
});
it('(abort) an ALREADY-aborted signal -> no spawn, 500 lock-lost', async () => {
// The per-space lock was already lost before run() reached the spawn: we must
// NOT start writing the working tree after a possible lock takeover.
const child = fakeChild();
spawnMock.mockReturnValue(child);
const service = buildService();
const res = fakeRes();
const controller = new AbortController();
controller.abort();
await service.run(baseRequest, fakeReq(), res, controller.signal);
expect(spawnMock).not.toHaveBeenCalled();
expect(res.statusCode).toBe(500);
expect(res.end).toHaveBeenCalledWith('Internal server error');
});
it('(abort) a live signal aborted mid-request -> child SIGTERM + response closed', async () => {
// The lock lapses mid-push: the abort fires, the child is killed (SIGTERM,
// then SIGKILL on escalation), and the response is finished.
const child = fakeChild();
spawnMock.mockReturnValue(child);
const service = buildService();
const res = fakeRes();
const warnSpy = jest.spyOn(Logger.prototype, 'warn');
const controller = new AbortController();
const p = service.run(baseRequest, fakeReq(), res, controller.signal);
await flush(); // let run() reach the spawn + wire the abort listener
controller.abort();
await p;
expect(child.kill).toHaveBeenCalledWith('SIGTERM');
expect(warnSpy).toHaveBeenCalled();
// No headers were sent before the abort -> a clean 500 is sent and ended.
expect(res.statusCode).toBe(500);
expect(res.writableEnded).toBe(true);
});
});
describe('buildGitBackendCgiEnv', () => {

View File

@@ -171,9 +171,13 @@ export class GitHttpBackendService {
let settled = false;
// Set once the child exists so the abort handler can target it.
let onAbort: (() => void) | null = null;
// The watchdog timer; cleared centrally in done() so EVERY settle path
// (close, error, timeout, abort) tears it down exactly once.
let watchdogTimer: ReturnType<typeof setTimeout> | undefined;
const done = () => {
if (settled) return;
settled = true;
if (watchdogTimer) clearTimeout(watchdogTimer);
// Detach the abort listener so a later lock loss does not fire into a
// request that already finished.
if (onAbort) {
@@ -206,34 +210,17 @@ export class GitHttpBackendService {
// Lost-lock abort: the per-space lock lapsed mid-request. Kill the child so
// a receive-pack stops writing `main`'s working tree before another replica
// (which may now hold the lock) starts a cycle. Mirrors the watchdog kill.
// (which may now hold the lock) starts a cycle. Same kill+finish path the
// watchdog uses (extracted into terminateChild).
onAbort = () => {
this.logger.warn(
this.terminateChild(
child,
rawRes,
headerParsed,
'lock-lost',
'git http-backend aborted (git-sync lock lost mid-request); killing child',
done,
);
try {
child.kill('SIGTERM');
const sigkill = setTimeout(() => {
try {
child.kill('SIGKILL');
} catch {
/* ignore */
}
}, 2000);
sigkill.unref?.();
} catch {
/* ignore */
}
if (!headerParsed && !rawRes.headersSent) {
this.send500(rawRes, 'lock-lost');
} else {
try {
rawRes.end();
} catch {
/* ignore */
}
}
done();
};
signal?.addEventListener('abort', onAbort);
@@ -241,40 +228,20 @@ export class GitHttpBackendService {
// child alive forever, so run() never resolves and (because this runs
// inside withSpaceLock) the per-space lock is held + heartbeat-refreshed
// indefinitely. Bound the request: on expiry kill the child, send a clean
// 500 if nothing was sent yet, and settle the promise. The log carries no
// client echo / credentials / body. `.unref()` so the timer never keeps the
// event loop alive; ALWAYS cleared in the close/error handlers below.
const timer = setTimeout(() => {
this.logger.warn(
// 500 if nothing was sent yet, and settle the promise. `.unref()` so the
// timer never keeps the event loop alive; ALWAYS cleared in done().
watchdogTimer = setTimeout(() => {
this.terminateChild(
child,
rawRes,
headerParsed,
'timeout',
`git http-backend timed out after ` +
`${this.environmentService.getGitSyncBackendTimeoutMs()}ms; killing child`,
done,
);
try {
child.kill('SIGTERM');
// Escalate to SIGKILL shortly after in case SIGTERM is ignored.
const sigkill = setTimeout(() => {
try {
child.kill('SIGKILL');
} catch {
/* ignore */
}
}, 2000);
sigkill.unref?.();
} catch {
/* ignore */
}
if (!headerParsed && !rawRes.headersSent) {
this.send500(rawRes, 'timeout');
} else {
try {
rawRes.end();
} catch {
/* ignore */
}
}
done();
}, this.environmentService.getGitSyncBackendTimeoutMs());
timer.unref?.();
watchdogTimer.unref?.();
// Accumulate stdout until we have the full CGI header block, then write the
// parsed status/headers and start streaming the remaining body bytes.
@@ -321,7 +288,7 @@ export class GitHttpBackendService {
});
child.on('error', (err) => {
clearTimeout(timer);
// The watchdog timer is cleared centrally in done().
if (!headerParsed && !rawRes.headersSent) {
this.send500(rawRes, 'child-error', err);
} else {
@@ -336,7 +303,7 @@ export class GitHttpBackendService {
});
child.on('close', (code) => {
clearTimeout(timer);
// The watchdog timer is cleared centrally in done().
if (!headerParsed && !rawRes.headersSent) {
// The child exited before emitting a complete CGI header block.
this.logger.error(
@@ -377,6 +344,49 @@ export class GitHttpBackendService {
});
}
/**
* Kill the child (SIGTERM, then SIGKILL after a grace period if it ignores the
* term) and finish the HTTP response cleanly, then settle. Shared by the two
* forced-termination paths — the watchdog timeout and the lost-lock abort —
* which differ ONLY by the log line and the send500 `reason`. If no response
* has started a clean 500 is sent; otherwise the in-flight stream is just
* ended. Never throws (a thrown kill/end would crash the request).
*/
private terminateChild(
child: ReturnType<typeof spawn>,
rawRes: ServerResponse,
responseStarted: boolean,
send500Reason: string,
logMessage: string,
done: () => void,
): void {
this.logger.warn(logMessage);
try {
child.kill('SIGTERM');
// Escalate to SIGKILL shortly after in case SIGTERM is ignored.
const sigkill = setTimeout(() => {
try {
child.kill('SIGKILL');
} catch {
/* ignore */
}
}, 2000);
sigkill.unref?.();
} catch {
/* ignore */
}
if (!responseStarted && !rawRes.headersSent) {
this.send500(rawRes, send500Reason);
} else {
try {
rawRes.end();
} catch {
/* ignore */
}
}
done();
}
/** Send a clean 500 without leaking credentials or the request body. */
private send500(rawRes: ServerResponse, reason: string, err?: unknown): void {
const message = err instanceof Error ? err.message : undefined;

View File

@@ -13,6 +13,7 @@ import {
NotFoundException,
UnauthorizedException,
} from '@nestjs/common';
import { CREDENTIALS_MISMATCH_MESSAGE } from '../../../core/auth/auth.constants';
import {
SpaceCaslAction,
SpaceCaslSubject,
@@ -488,4 +489,103 @@ describe('GitHttpService.handle', () => {
expect(state.headers['WWW-Authenticate']).toBe('Basic realm="gitmost"');
expect(built.backend.run).not.toHaveBeenCalled();
});
// --- brute-force throttle (must-fix #1, mirrors the /mcp Basic limiter) -----
describe('HTTP-Basic brute-force throttle', () => {
/** A request with wrong credentials for the given email. */
const wrongCredReq = (email = 'dev@example.com') =>
fakeRequest({
url: '/git/space-1.git/info/refs?service=git-upload-pack',
method: 'GET',
authorization: basic(email, 'wrong'),
});
it('rejects the (threshold+1)-th failed attempt with 429 BEFORE bcrypt', async () => {
const built = build();
// Realistic credential failure: verifyUserCredentials throws the SAME
// UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE) production throws, so
// isCredentialsFailure matches and the reservation is KEPT (counted).
built.authService.verifyUserCredentials.mockRejectedValue(
new UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE),
);
// 5 failed attempts (threshold = 5): each runs the credential check -> 401.
for (let i = 0; i < 5; i++) {
const { reply, state } = fakeReply();
await built.service.handle(wrongCredReq(), reply);
expect(state.statusCode).toBe(401);
}
expect(built.authService.verifyUserCredentials).toHaveBeenCalledTimes(5);
// The 6th attempt is throttled: 429, Retry-After, and bcrypt is NOT run.
const { reply, state } = fakeReply();
await built.service.handle(wrongCredReq(), reply);
expect(state.statusCode).toBe(429);
expect(state.headers['Retry-After']).toBe('60');
expect(state.headers['WWW-Authenticate']).toBe('Basic realm="gitmost"');
// Still 5 — the 6th never reached verifyUserCredentials (pre-bcrypt reject).
expect(built.authService.verifyUserCredentials).toHaveBeenCalledTimes(5);
expect(built.backend.run).not.toHaveBeenCalled();
built.service.onModuleDestroy();
});
it('a successful auth resets the limiter so later attempts are not throttled', async () => {
const built = build();
const verify = built.authService.verifyUserCredentials;
// First 4 attempts fail (credential mismatch), then one SUCCEEDS.
verify
.mockRejectedValueOnce(new UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE))
.mockRejectedValueOnce(new UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE))
.mockRejectedValueOnce(new UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE))
.mockRejectedValueOnce(new UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE))
.mockResolvedValueOnce({ id: 'user-1', email: 'dev@example.com' });
for (let i = 0; i < 4; i++) {
const { reply } = fakeReply();
await built.service.handle(wrongCredReq(), reply);
}
// 5th attempt succeeds -> proceeds (not throttled) and clears the budget.
const okReply = fakeReply();
await built.service.handle(
fakeRequest({
url: '/git/space-1.git/info/refs?service=git-upload-pack',
method: 'GET',
authorization: basic('dev@example.com', 'right'),
}),
okReply.reply,
);
expect(okReply.state.hijacked).toBe(true); // proceeded to the backend
// After the reset, a fresh wrong attempt is evaluated (401), NOT a 429 —
// proving the per-IP/per-IP+email budget was cleared by the success.
verify.mockRejectedValueOnce(
new UnauthorizedException(CREDENTIALS_MISMATCH_MESSAGE),
);
const { reply, state } = fakeReply();
await built.service.handle(wrongCredReq(), reply);
expect(state.statusCode).toBe(401);
built.service.onModuleDestroy();
});
it('a non-credential error releases the reservation (does not burn the budget)', async () => {
const built = build();
// A DB error (not a credentials mismatch) must NOT count toward the limiter.
built.authService.verifyUserCredentials.mockRejectedValue(
new Error('db down'),
);
// 10 such failures — far beyond the threshold — must all be 401, never 429,
// because each releases its reservation.
for (let i = 0; i < 10; i++) {
const { reply, state } = fakeReply();
await built.service.handle(wrongCredReq(), reply);
expect(state.statusCode).toBe(401);
}
expect(built.authService.verifyUserCredentials).toHaveBeenCalledTimes(10);
built.service.onModuleDestroy();
});
});
});

View File

@@ -1,4 +1,9 @@
import { Injectable, Logger, UnauthorizedException } from '@nestjs/common';
import {
Injectable,
Logger,
OnModuleDestroy,
UnauthorizedException,
} from '@nestjs/common';
import type { FastifyReply, FastifyRequest } from 'fastify';
import { AuthService } from '../../../core/auth/services/auth.service';
import SpaceAbilityFactory from '../../../core/casl/abilities/space-ability.factory';
@@ -9,7 +14,12 @@ import {
import { SpaceRepo } from '@docmost/db/repos/space/space.repo';
import { WorkspaceRepo } from '@docmost/db/repos/workspace/workspace.repo';
import { User } from '@docmost/db/types/entity.types';
import { parseBasicAuth } from '../../mcp/mcp-auth.helpers';
import {
parseBasicAuth,
FailedLoginLimiter,
clientIp,
isCredentialsFailure,
} from '../../mcp/mcp-auth.helpers';
import { resolveRequestWorkspace } from '../../../common/helpers/resolve-request-workspace';
import { EnvironmentService } from '../../environment/environment.service';
import { VaultRegistryService } from '../services/vault-registry.service';
@@ -40,9 +50,24 @@ const WWW_AUTHENTICATE = 'Basic realm="gitmost"';
* `/api` prefix does not apply). Never logs the password or Authorization header.
*/
@Injectable()
export class GitHttpService {
export class GitHttpService implements OnModuleDestroy {
private readonly logger = new Logger(GitHttpService.name);
/**
* In-process brute-force speed bump for the /git HTTP-Basic path. The raw
* `/git/*` Fastify route bypasses the Nest pipeline (so ThrottlerGuard, which is
* only on controllers, never runs) and there is no fastify rate-limit plugin, so
* without this `verifyUserCredentials` (bcrypt) would run unthrottled on every
* request once GIT_SYNC_HTTP_ENABLED is on. Mirrors the /mcp Basic path EXACTLY
* (FailedLoginLimiter, same 5/60s thresholds, the same per-IP / per-IP+email /
* global-per-email keys) so the two auth seams cannot diverge. A speed bump, not
* a hard boundary (in-process, per replica).
*/
private readonly failedLogins = new FailedLoginLimiter(5, 60_000);
/** Periodic sweep to bound limiter memory (mirrors McpService / mcp http.ts). */
private readonly sweepIntervalMs = 60_000;
private readonly sweepTimer: NodeJS.Timeout;
constructor(
private readonly environmentService: EnvironmentService,
private readonly authService: AuthService,
@@ -52,7 +77,21 @@ export class GitHttpService {
private readonly vaultRegistry: VaultRegistryService,
private readonly orchestrator: GitSyncOrchestrator,
private readonly backend: GitHttpBackendService,
) {}
) {
this.sweepTimer = setInterval(() => {
try {
this.failedLogins.sweep();
} catch (err) {
this.logger.error('git-http failed-login limiter sweep failed', err as Error);
}
}, this.sweepIntervalMs);
// Never keep the event loop alive solely for the sweep timer.
this.sweepTimer.unref?.();
}
onModuleDestroy(): void {
clearInterval(this.sweepTimer);
}
/**
* Resolve the workspace for a /git request the SAME way DomainMiddleware does,
@@ -124,27 +163,86 @@ export class GitHttpService {
let user: User | undefined;
let credentialsValid = false;
let throttled = false;
if (basic && workspaceId) {
try {
user = await this.authService.verifyUserCredentials(
{ email: basic.email, password: basic.password },
workspaceId,
);
credentialsValid = true;
} catch (err) {
if (!(err instanceof UnauthorizedException)) {
// A non-credential failure (e.g. DB error): treat as invalid creds for
// the gate (a 401), and log without leaking the password/header.
this.logger.warn(
`git-http: credential check error: ${
err instanceof Error ? err.message : String(err)
}`,
// Brute-force speed bump, mirroring the /mcp Basic path EXACTLY. Reserve
// ALL three keys ATOMICALLY and BEFORE bcrypt (tryReserve folds the check
// and the increment into one synchronous step), so the (threshold+1)-th
// attempt is rejected before verifyUserCredentials/bcrypt ever runs and
// concurrent attempts for one email cannot all observe count=0. The
// reservation IS the recorded failure: a genuine credential failure leaves
// it in place, a SUCCESS clears it (reset), a non-credential error releases
// it (so it cannot burn a victim's budget).
const emailLc = basic.email.toLowerCase();
const ip = clientIp(req);
const ipKey = `ip:${ip}`;
const ipEmailKey = `ip-email:${ip}:${emailLc}`;
// GLOBAL per-email backstop (no IP): the only key that survives IP / XFF
// rotation, so it is the real account-brute defense (see mcp-auth.helpers).
const emailKey = `email:${emailLc}`;
const ipOk = this.failedLogins.tryReserve(ipKey);
const ipEmailOk = this.failedLogins.tryReserve(ipEmailKey);
const emailOk = this.failedLogins.tryReserve(emailKey);
if (!ipOk || !ipEmailOk || !emailOk) {
// Blocked: release only the keys we actually reserved this call so an
// already-throttled request does not over-charge keys still under budget
// (matches the /mcp reserve model). Do NOT run bcrypt.
if (ipOk) this.failedLogins.release(ipKey);
if (ipEmailOk) this.failedLogins.release(ipEmailKey);
if (emailOk) this.failedLogins.release(emailKey);
throttled = true;
} else {
try {
user = await this.authService.verifyUserCredentials(
{ email: basic.email, password: basic.password },
workspaceId,
);
credentialsValid = true;
// Success: clear the per-IP and per-IP+email budgets fully; for the
// GLOBAL per-email key only release the one increment THIS request took
// (do not reset() it, or a victim's own success would wipe a parallel
// attacker's accumulated failures for that email — same rule as /mcp).
this.failedLogins.reset(ipKey);
this.failedLogins.reset(ipEmailKey);
this.failedLogins.release(emailKey);
} catch (err) {
// Only a genuine credentials failure (wrong email/password) keeps the
// reservation (it IS the recorded failure). Any other error — DB error,
// etc. — is NOT a password-guess signal, so release the reservation so
// it cannot burn a victim's limiter budget. credentialsValid stays
// false either way (the gate then 401s).
if (!isCredentialsFailure(err)) {
this.failedLogins.release(ipKey);
this.failedLogins.release(ipEmailKey);
this.failedLogins.release(emailKey);
}
if (!(err instanceof UnauthorizedException)) {
// A non-credential failure (e.g. DB error): treat as invalid creds
// for the gate (a 401), and log without leaking the password/header.
this.logger.warn(
`git-http: credential check error: ${
err instanceof Error ? err.message : String(err)
}`,
);
}
credentialsValid = false;
}
credentialsValid = false;
}
}
// Brute-force throttle tripped: reject BEFORE the gate (and before any space
// lookup), so a throttled attacker gets a uniform 429 with no bcrypt and no
// existence signal. WWW-Authenticate is still sent so a legitimate client
// re-prompts after the window.
if (throttled) {
reply
.header('WWW-Authenticate', WWW_AUTHENTICATE)
.header('Retry-After', '60')
.status(429)
.send('Too many failed authentication attempts. Try again later.');
return;
}
// --- resolve the space + per-space gating + CASL ------------------------
let spaceExists = false;
let spaceGitSyncEnabled = false;

View File

@@ -32,8 +32,17 @@ interface PageEventLike {
* it to avoid a write -> event -> sync echo. The guard ALWAYS runs (the page row
* is fetched for every event, structural ones included). This is the cheap first
* guard; the full bodyHash + updatedAt loop-guard (consuming the push side's
* `PushedPageRecord`) is a later hardening step — noted, not built
* here. The poll-safety interval still converges anything this guard drops.
* `PushedPageRecord`) is a later hardening step — noted, not built here.
*
* KNOWN OVER-SKIP (latency, NOT data loss): the guard keys ONLY on
* `lastUpdatedSource`, and a user MOVE / RENAME / DELETE does NOT change that
* column (only body writes stamp it). So a genuine user move/rename/delete of a
* page whose BODY was last written by git-sync still reads
* `lastUpdatedSource === 'git-sync'` and is dropped on this fast debounced path.
* No change is lost: the poll-safety interval (~GIT_SYNC_POLL_INTERVAL_MS, default
* 15s) re-enumerates the space and reconciles it — the only cost is up to one poll
* interval of extra latency before that structural change reaches git. The
* bodyHash+updatedAt loop-guard above would close this gap precisely.
*/
@Injectable()
export class PageChangeListener implements OnModuleDestroy {
@@ -73,7 +82,10 @@ export class PageChangeListener implements OnModuleDestroy {
if (!page) return;
// Loop-guard: skip our own writes to avoid a write -> event -> sync echo
// (best-effort). Applies unconditionally now.
// (best-effort). Applies unconditionally now. NOTE this also over-skips a
// user move/rename/delete of a page whose BODY was last written by git-sync
// (those structural ops don't touch lastUpdatedSource) — that change is not
// lost, just deferred to the ~15s poll backstop (see class docstring).
if (page.lastUpdatedSource === 'git-sync') return;
// Prefer ids carried on the event; fall back to the row we already fetched.

View File

@@ -314,6 +314,40 @@ describe('GitSyncOrchestrator', () => {
expect(deps.settings.autoMergeConflicts).toBe(false);
});
it("escalates a divergent-`docmost` push refusal to WARN and surfaces the flag in the status", async () => {
const built = build();
const warnSpy = jest
.spyOn(Logger.prototype, 'warn')
.mockImplementation(() => undefined);
// The engine refused to fast-forward a divergent `docmost` mirror (§5).
runCycleMock.mockResolvedValue({ ...OK_CYCLE, divergentDocmost: true });
const res = await built.orchestrator.runOnce('space-1', 'ws-1');
// The flag is surfaced in the returned status (consumable by /status).
expect(res.divergentDocmost).toBe(true);
// And escalated from the engine's info `log` to a WARN naming the space.
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining('DIVERGENT'),
);
expect(warnSpy).toHaveBeenCalledWith(expect.stringContaining('space-1'));
});
it("does NOT warn when the cycle is clean (divergentDocmost falsy)", async () => {
const built = build();
const warnSpy = jest
.spyOn(Logger.prototype, 'warn')
.mockImplementation(() => undefined);
runCycleMock.mockResolvedValue(OK_CYCLE);
const res = await built.orchestrator.runOnce('space-1', 'ws-1');
expect(res.divergentDocmost).toBeUndefined();
expect(warnSpy).not.toHaveBeenCalledWith(
expect.stringContaining('DIVERGENT'),
);
});
it("surfaces the engine's skipped status (e.g. merge-in-progress) verbatim", async () => {
const built = build();
runCycleMock.mockResolvedValue({ ran: false, skipped: 'merge-in-progress' });
@@ -461,6 +495,40 @@ describe('GitSyncOrchestrator', () => {
expect(runOnce).toHaveBeenNthCalledWith(2, 'space-2', 'ws-2');
});
it('skips an overlapping tick while a previous pass is still in flight (re-entrancy guard)', async () => {
const built = build();
let release!: () => void;
const gate = new Promise<void>((resolve) => {
release = resolve;
});
// Stall the first pass inside enabledSpaces so a second tick fires while it
// is still running.
const enabledSpy = jest
.spyOn(built.orchestrator as any, 'enabledSpaces')
.mockImplementation(async () => {
await gate;
return [{ spaceId: 'space-1', workspaceId: 'ws-1' }];
});
const runOnce = jest
.spyOn(built.orchestrator, 'runOnce')
.mockResolvedValue({ spaceId: 'space-1', ran: true });
const first = (built.orchestrator as any).pollTick();
await Promise.resolve(); // let the first pass set polling=true + await gate
// A second tick during the first must be skipped: it never even enumerates.
await (built.orchestrator as any).pollTick();
expect(enabledSpy).toHaveBeenCalledTimes(1);
release();
await first;
expect(runOnce).toHaveBeenCalledTimes(1);
// After the first pass cleared the flag, a fresh tick runs normally.
await (built.orchestrator as any).pollTick();
expect(enabledSpy).toHaveBeenCalledTimes(2);
});
it('does NOT throw and runs nothing when the enabled-spaces query throws (try/catch backstop)', async () => {
jest.spyOn(Logger.prototype, 'error').mockImplementation(() => undefined);
const built = build();

View File

@@ -384,28 +384,45 @@ export class GitSyncOrchestrator implements OnModuleInit, OnModuleDestroy {
}
}
/** True while a pollTick pass is in flight (re-entrancy guard). */
private polling = false;
/**
* One poll tick: catches events missed by the listener and reconciles after
* downtime. Gated on GIT_SYNC_ENABLED (defensive — the interval is only
* registered when enabled). Each enabled space runs under its own lock
* (overlaps skipped). Never throws (runOnce swallows per-space errors).
*
* Re-entrancy guard: a batch of cycles can take LONGER than the poll interval
* (many spaces, slow pushes), so the next interval tick could fire while this
* pass is still running. The per-space lock already prevents overlapping cycles
* for one space, but an overlapping tick still re-runs enabledSpaces() and
* redundant per-space lock attempts for every space. The `polling` flag skips a
* tick while one is already in flight; it is in-process only (each replica
* guards its own ticks — cross-replica overlap is handled by the Redis lock).
*/
private async pollTick(): Promise<void> {
if (!this.environmentService.isGitSyncEnabled()) return;
let spaces: EnabledSpace[];
if (this.polling) return;
this.polling = true;
try {
spaces = await this.enabledSpaces();
} catch (err) {
this.logger.error(
`git-sync: failed to enumerate enabled spaces: ${
err instanceof Error ? err.message : String(err)
}`,
);
return;
}
for (const { spaceId, workspaceId } of spaces) {
// runOnce never throws; a per-space error is logged and returned in status.
await this.runOnce(spaceId, workspaceId);
let spaces: EnabledSpace[];
try {
spaces = await this.enabledSpaces();
} catch (err) {
this.logger.error(
`git-sync: failed to enumerate enabled spaces: ${
err instanceof Error ? err.message : String(err)
}`,
);
return;
}
for (const { spaceId, workspaceId } of spaces) {
// runOnce never throws; a per-space error is logged and returned in status.
await this.runOnce(spaceId, workspaceId);
}
} finally {
this.polling = false;
}
}
}

View File

@@ -205,6 +205,14 @@ describe('GitmostDataSourceService', () => {
content: { type: 'doc', content: [] },
});
});
it('throws NotFound when the page does not exist', async () => {
const { service, mocks } = build();
mocks.pageRepo.findById.mockResolvedValue(undefined);
await expect(service.bind(CTX).getPageJson('gone')).rejects.toThrow(
/not found/i,
);
});
});
describe('importPageMarkdown', () => {
@@ -236,6 +244,20 @@ describe('GitmostDataSourceService', () => {
expect(res.updatedAt).toBe('2026-06-20T11:00:00.000Z');
});
it('returns updatedAt:undefined when the page row is gone after the write (stale-read branch)', async () => {
// writeBody succeeds, but the post-write findById returns nothing (e.g. the
// page was concurrently hard-deleted) -> the optional updatedAt is omitted.
const { service, mocks } = build();
mocks.pageRepo.findById.mockResolvedValue(undefined);
const res = await service
.bind(CTX)
.importPageMarkdown('p1', '# Hello\n\nworld');
expect(mocks.collabGateway.writePageBody).toHaveBeenCalledTimes(1);
expect(res.updatedAt).toBeUndefined();
});
// The 2-way path (no base) is covered above; this exercises the THREE-WAY
// branch that only fires when a `baseMarkdown` is supplied (review #5). The
// merge dispatch itself now lives in the collab handler (gitSyncWriteBody);
@@ -295,6 +317,20 @@ describe('GitmostDataSourceService', () => {
updatedAt: '2026-06-20T12:00:00.000Z',
});
});
it('returns updatedAt:undefined when the fresh page row is missing after create', async () => {
const { service, mocks } = build();
mocks.pageService.create.mockResolvedValue({ id: 'new-id' });
// The post-create findById returns nothing -> the optional updatedAt is
// omitted (the id is still returned from create()).
mocks.pageRepo.findById.mockResolvedValue(undefined);
const res = await service
.bind(CTX)
.createPage('Title', 'body md', 'space-1');
expect(res).toEqual({ data: { id: 'new-id' }, updatedAt: undefined });
});
});
describe('deletePage', () => {
@@ -348,6 +384,15 @@ describe('GitmostDataSourceService', () => {
// db not consulted for a supplied position.
expect(mocks.db.selectFrom).not.toHaveBeenCalled();
});
it('throws NotFound and moves nothing when the page does not exist', async () => {
const { service, mocks } = build();
mocks.pageRepo.findById.mockResolvedValue(undefined);
await expect(
service.bind(CTX).movePage('gone', 'parent-1'),
).rejects.toThrow(/not found/i);
expect(mocks.pageService.movePage).not.toHaveBeenCalled();
});
});
describe('renamePage', () => {
@@ -364,6 +409,15 @@ describe('GitmostDataSourceService', () => {
expect(user).toEqual({ id: 'svc-user' });
expect(provenance).toEqual({ actor: 'git-sync', aiChatId: null });
});
it('throws NotFound and renames nothing when the page does not exist', async () => {
const { service, mocks } = build();
mocks.pageRepo.findById.mockResolvedValue(undefined);
await expect(
service.bind(CTX).renamePage('gone', 'whatever'),
).rejects.toThrow(/not found/i);
expect(mocks.pageService.update).not.toHaveBeenCalled();
});
});
describe('restorePage', () => {

View File

@@ -26,6 +26,14 @@ import * as editorExt from "@docmost/editor-ext";
// node/mark TYPE goes unmirrored. StarterKit-provided types (paragraph, bold,
// heading, …) are contributed by @tiptap/starter-kit in the mirror rather than
// by editor-ext, so they are naturally covered by the mirror's superset.
//
// NOT COVERED here (deferred): (1) the THIRD copy in `packages/mcp` — a separate
// package guarded by its own surface snapshot; (2) attribute *behaviour* drift,
// e.g. the details `open` attr read via getAttribute vs hasAttribute (PR #119
// review #2) — a name-level compare cannot see parseHTML/renderHTML differences.
// Mechanically guarding behavioural parity across all THREE copies needs the
// single framework-free "schema core" refactor (deferred — see AGENTS.md); until
// then each copy's header carries the manual keep-in-sync requirement.
/** Tiptap Node/Mark instances expose a `.name` and a `.type` of 'node'|'mark'. */
function isTiptapNodeOrMark(

View File

@@ -6,6 +6,16 @@
* (node ids, image sizing, link targets). Every code path that converts
* to or from ProseMirror JSON must use THIS set, otherwise a round-trip
* loses content.
*
* PROVENANCE / KEEP IN SYNC: this is ONE of THREE hand-synced copies of the
* canonical Docmost document schema — `@docmost/editor-ext` is canonical, plus
* this `packages/mcp` mirror and the `packages/git-sync` mirror. The node / mark /
* attribute surface (AND the attribute parseHTML/renderHTML behaviour, e.g. the
* details `open` boolean read via hasAttribute, not getAttribute) MUST be kept in
* lockstep across all three: a divergence silently degrades a round-trip (data
* loss). There is no mechanical cross-copy behavioural guard yet — the long-term
* fix is a single framework-free "schema core" both mirrors import (deferred,
* see the PR #119 review / AGENTS.md). Until then, sync by hand on every change.
*/
import StarterKit from "@tiptap/starter-kit";
import Image from "@tiptap/extension-image";
@@ -512,7 +522,11 @@ const Details = Node.create({
return {
open: {
default: false,
parseHTML: (el: HTMLElement) => el.getAttribute("open"),
// Mirror the canon (@docmost/editor-ext details.ts:42) + the git-sync
// copy: a bare `<details open>` has an empty-string `open` attribute, so
// getAttribute("open") returns "" (falsy) and dropped the open state;
// hasAttribute is the correct boolean read. Keep the three copies in sync.
parseHTML: (el: HTMLElement) => el.hasAttribute("open"),
renderHTML: (attrs: Record<string, any>) =>
attrs.open ? { open: "" } : {},
},