fix(git-sync): address PR #119 review (#1571)

Resolve the code-review findings from comment #1571 on PR #119.

Engine (packages/git-sync):
- Idempotent CREATE on retry: before createPage, look the page up in the
  live Docmost tree by (parentPageId, title) and ADOPT it instead of
  duplicating when a prior cycle created it but failed to persist the
  pageId back to disk. Only trust a COMPLETE tree for the lookup; fall
  back to createPage otherwise. Covered by new tests incl. a complete=false
  regression-lock.
- Route applyPullActions diagnostics through an injected logger instead of
  bare console; runCycle threads its log through to the pull deps.
- Add a timeout to the git execFile chokepoint (runRaw) so a hung git
  subprocess cannot wedge a sync cycle.
- Translate remaining Russian code comments to English.
- Remove dead standalone-CLI code (parseArgs/PushParsedArgs,
  parseSettings/envSchema, loadSettingsOrExit + config-errors.ts) and the
  matching index exports/specs; keep the Settings type.
- Fix the dangling docs link in package.json.
- Add a schema-surface snapshot guard, plus a provenance header on the
  vendored document schema, so any drift in that schema is a loud CI failure
  that must be reviewed.
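The retry-adopt rule in the first Engine bullet can be sketched as a pure index over the live tree. This is a minimal sketch, not the engine's code: `TreeNode`, `SpaceTree`, `buildAdoptIndex`, and `resolveCreate` are hypothetical names, and the NUL-based key separator and root marker are assumptions chosen so that a real parent id can never collide with the root marker.

```typescript
// Hypothetical shapes; the real GitSyncClient tree nodes may carry more fields.
interface TreeNode {
  id: string;
  parentPageId: string | null;
  title: string | null;
}

interface SpaceTree {
  complete: boolean;
  pages: TreeNode[];
}

// Assumed separator/root marker: page ids never contain NUL, so the root
// marker cannot be mistaken for an actual parent id.
const SEP = "\u0000";
const ROOT = "\u0000root";

function adoptKey(parentPageId: string | null | undefined, title: string): string {
  return `${parentPageId ?? ROOT}${SEP}${title}`;
}

// Build (parent, title) -> pageId from the LIVE tree, or return null when the
// tree is incomplete so the caller falls back to a plain create.
function buildAdoptIndex(tree: SpaceTree): Map<string, string> | null {
  if (!tree.complete) return null;
  const index = new Map<string, string>();
  for (const node of tree.pages) {
    const key = adoptKey(node.parentPageId, node.title ?? "");
    if (!index.has(key)) index.set(key, node.id); // keep the first match
  }
  return index;
}

type CreateDecision = { kind: "adopt"; pageId: string } | { kind: "create" };

// Adopt an already-created page when one matches; otherwise create a new one.
function resolveCreate(
  index: Map<string, string> | null,
  parentPageId: string | null | undefined,
  title: string,
): CreateDecision {
  const existing = index?.get(adoptKey(parentPageId, title));
  return existing !== undefined ? { kind: "adopt", pageId: existing } : { kind: "create" };
}
```

Returning `null` for an incomplete tree makes the fallback explicit: a tree that might be missing the page cannot justify an adopt, so the caller goes straight to `createPage`.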

Server (apps/server):
- Add a configurable watchdog timeout to the spawned git http-backend so a
  stalled push cannot hold the per-space lock forever
  (GIT_SYNC_BACKEND_TIMEOUT_MS).
- Close the in-process TOCTOU window in SpaceLockService.withSpaceLock by
  reserving the slot synchronously, before awaiting the acquire.
- Add tests: removePage git-sync provenance (both branches), ensureServable
  force-push-protection git configs, and the phase-B+ datasource methods.
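The TOCTOU fix in `withSpaceLock` hinges on doing the check and the reservation in the same synchronous tick. A minimal in-process sketch, assuming a promise-chain mutex keyed by space id; `SpaceLockSketch` is hypothetical and omits anything the real `SpaceLockService` does beyond a single process.

```typescript
// Hypothetical in-process per-space mutex built from a promise chain. It
// illustrates the synchronous-reservation idea only, not SpaceLockService.
class SpaceLockSketch {
  private readonly tails = new Map<string, Promise<void>>();

  async withSpaceLock<T>(spaceId: string, fn: () => Promise<T>): Promise<T> {
    // Check and reserve in the SAME synchronous tick: read the current tail,
    // then install our own before any `await`, so no concurrent caller can
    // observe the slot as free in between (the TOCTOU window).
    const prev = this.tails.get(spaceId) ?? Promise.resolve();
    let release: () => void = () => {};
    const mine = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    const tail = prev.then(() => mine);
    this.tails.set(spaceId, tail);

    await prev; // wait for the previous holder; these promises never reject
    try {
      return await fn();
    } finally {
      release();
      // Clean up only if nobody queued behind us.
      if (this.tails.get(spaceId) === tail) this.tails.delete(spaceId);
    }
  }
}
```

Had the reservation happened after an `await` (for example, after checking whether the slot was free), two callers could both see it as free and enter together.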

Docs / build:
- AGENTS.md: list git-sync as the fifth workspace package and note the
  three schema mirrors; fix the dangling git-sync-plan.md backlog link.
- pnpm-lock.yaml: add the missing @docmost/git-sync workspace link so
  pnpm install --frozen-lockfile (CI default) succeeds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude_code
2026-06-26 00:06:44 +03:00
committed by claude code agent 227
parent ad0933ecc9
commit 445363b07b
31 changed files with 767 additions and 462 deletions

@@ -1,46 +0,0 @@
- import { ZodError } from 'zod';
- // Turn a ZodError from settings validation into a clear, actionable startup
- // message that names the offending env var(s), then exit(1) — no raw stack
- // trace. Mirrors the Python new-project skeleton's load_settings_or_exit.
- // A non-ZodError is left to propagate unchanged.
- export function loadSettingsOrExit<T>(factory: () => T): T {
- try {
- return factory();
- } catch (err) {
- if (!(err instanceof ZodError)) throw err;
- const missing: string[] = [];
- const invalid: string[] = [];
- for (const issue of err.issues) {
- const name = issue.path.length ? String(issue.path[0]) : '?';
- // A missing required variable surfaces as an `invalid_type` issue whose
- // received value was `undefined`. zod 3 exposed `issue.received` directly;
- // zod 4 dropped that field and instead folds it into the message
- // ("expected string, received undefined"). Detect both shapes so the
- // missing-vs-invalid split holds across zod majors. NOTE: an invalid (but
- // present) value uses a different code (invalid_format / invalid_value) or
- // an `invalid_type` message that reports a non-undefined received (e.g.
- // "received NaN" from a coerced number), so neither is misread as missing.
- const i = issue as { received?: unknown; message?: string };
- const isMissing =
- issue.code === 'invalid_type' &&
- (i.received === 'undefined' ||
- /received undefined/i.test(i.message ?? ''));
- if (isMissing) missing.push(name);
- else invalid.push(`${name}: ${issue.message}`);
- }
- const lines = ['Configuration error in environment / .env:'];
- if (missing.length) {
- lines.push(' Missing required variable(s):');
- for (const n of [...new Set(missing)]) lines.push(` - ${n}`);
- }
- if (invalid.length) {
- lines.push(' Invalid value(s):');
- for (const item of invalid) lines.push(` - ${item}`);
- }
- lines.push('');
- lines.push('Set them in .env (see .env.example) and try again.');
- process.stderr.write(lines.join('\n') + '\n');
- process.exit(1);
- }
- }

@@ -114,6 +114,7 @@ export async function runCycle(deps: RunCycleDeps): Promise<RunCycleResult> {
writeFile: (absPath, text) => fs.writeFile(absPath, text),
mkdir: (absDir) => fs.mkdir(absDir),
rm: (absPath) => fs.rm(absPath),
+ log,
},
pullActions,
vaultRoot,

@@ -24,6 +24,12 @@ import { promisify } from "node:util";
const execFileAsync = promisify(execFile);
+ // Safety net: kill a hung git subprocess. This engine performs only LOCAL git
+ // operations (no network pushes), so a legitimate call never approaches this
+ // bound; it only prevents an indefinitely-stuck subprocess from wedging a sync
+ // cycle (the same risk the http-backend watchdog guards on the server side).
+ const GIT_EXEC_TIMEOUT_MS = 120_000;
/** Bot identity used for engine-authored vault commits (SPEC §7.3). */
export const BOT_AUTHOR_NAME = "Docmost Sync";
export const BOT_AUTHOR_EMAIL = "docmost-sync@local";
@@ -32,7 +38,7 @@ export const BOT_AUTHOR_EMAIL = "docmost-sync@local";
export const DEFAULT_BRANCH = "main";
/**
- * One row of `git diff --name-status` (SPEC §6 "ФС → Docmost"). `status` is the
+ * One row of `git diff --name-status` (SPEC §6 "FS -> Docmost"). `status` is the
* single-letter change code (`-M` rename detection on), `path` is the (new) file
* path; for a rename/copy (`R`/`C`) `oldPath` is the source and `path` is the
* destination, with `score` carrying git's similarity index (0–100).
@@ -146,6 +152,7 @@ export class VaultGit {
// can be sizable.
...(cwd !== undefined ? { cwd } : {}),
maxBuffer: 64 * 1024 * 1024,
+ timeout: GIT_EXEC_TIMEOUT_MS,
env: vaultGitEnv(opts?.env),
},
);
@@ -413,7 +420,7 @@ export class VaultGit {
* the listing, e.g. `"*.md"`.
*
* The target wiki is RUSSIAN, so vault file names routinely contain Cyrillic
- * (e.g. `Колонка.md`). With git's DEFAULT `core.quotepath=true`, `ls-files`
+ * (e.g. `Column.md` in Cyrillic). With git's DEFAULT `core.quotepath=true`, `ls-files`
* returns non-ASCII paths octal-escaped and double-quoted (`"\320\232..."`),
* which `src/pull.ts` `readExisting` would then parse as garbage paths,
* breaking move/duplicate detection. We defeat that two ways at once:
@@ -519,7 +526,7 @@ export class VaultGit {
/**
* Read a ref to its SHA, or `null` if unset. Thin alias over `revParse`,
* named for the push direction's marker `refs/docmost/last-pushed` (SPEC §5:
- * "что из `main` уже отражено в Docmost").
+ * "what of `main` is already reflected in Docmost").
*/
async readRef(ref: string): Promise<string | null> {
return this.revParse(ref);
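The `timeout` option added to `runRaw` is Node's built-in `execFile` timeout: when it expires, Node sends `killSignal` (SIGTERM by default) to the child, and the call rejects with an error whose `killed` flag is set. The wrapper below is a hypothetical sketch of that behavior (`runWithTimeout` is an illustrative name), not the engine's `runRaw`.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical wrapper in the spirit of `runRaw`: every subprocess gets a hard
// timeout so a hung child is killed instead of wedging the caller forever.
async function runWithTimeout(
  file: string,
  args: string[],
  timeoutMs: number,
): Promise<{ stdout: string; timedOut: boolean }> {
  try {
    const { stdout } = await execFileAsync(file, args, {
      encoding: "utf8",
      timeout: timeoutMs, // on expiry Node sends killSignal (SIGTERM by default)
      maxBuffer: 64 * 1024 * 1024,
    });
    return { stdout, timedOut: false };
  } catch (err) {
    // A child killed by the timeout surfaces as a rejection with `killed: true`.
    if ((err as { killed?: boolean }).killed) return { stdout: "", timedOut: true };
    throw err;
  }
}
```

Putting the bound at the single `execFile` chokepoint means no individual git call site has to remember to add it.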

@@ -13,7 +13,7 @@
import { createHash } from "node:crypto";
/**
- * Stable hash of a page's markdown BODY (SPEC §10 "хэш тела"). Deterministic:
+ * Stable hash of a page's markdown BODY (SPEC §10 "body hash"). Deterministic:
* the same input string always yields the same digest, a different input a
* different one. Used to recognize our own write later (loop suppression).
*
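For loop suppression, a body hash only needs to be deterministic and collision-resistant. A minimal sketch, assuming SHA-256 with hex output: the diff shows only that `createHash` from `node:crypto` is imported, so the algorithm, `bodyHashSketch`, and the `isOwnWrite` helper are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Deterministic digest of a markdown body. SHA-256/hex is an assumption.
function bodyHashSketch(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Hypothetical loop-suppression check: an incoming body whose hash matches
// the hash recorded at push time is our own write echoing back.
function isOwnWrite(pushedHash: string | undefined, incomingBody: string): boolean {
  return pushedHash !== undefined && pushedHash === bodyHashSketch(incomingBody);
}
```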

@@ -1,5 +1,5 @@
/**
- * Pull cycle — Docmost -> vault (SPEC §6 "Docmost -> ФС").
+ * Pull cycle — Docmost -> vault (SPEC §6 "Docmost -> FS").
*
* This increment turns the read-only mirror into the git-backed pull cycle:
*
@@ -225,6 +225,11 @@ export interface ApplyPullActionsDeps {
mkdir: (absDir: string) => Promise<void>;
/** Remove a file by ABSOLUTE path (force: a missing file is a no-op). */
rm: (absPath: string) => Promise<void>;
+ /**
+ * Injected logger for cycle diagnostics (mirrors the push side). Optional —
+ * falls back to `console.log` so existing callers stay green.
+ */
+ log?: (line: string) => void;
}
/** Outcome counters from `applyPullActions` (for the summary + tests). */
@@ -259,22 +264,25 @@ export async function applyPullActions(
vaultRoot: string,
): Promise<ApplyResult> {
const { client, git } = deps;
+ // One channel, mirroring the push side: route every cycle diagnostic through
+ // the injected logger; fall back to `console.log` when none is supplied.
+ const log = deps.log ?? ((line: string) => console.log(line));
// Emit the SPEC §8 suppression warnings (preserved from the original `main`).
const decision = actions.deletionDecision;
if (!decision.apply) {
if (decision.reason === "incomplete-fetch") {
- console.warn(
+ log(
"pull: tree fetch incomplete — deletions suppressed this cycle (SPEC §8)",
);
} else if (decision.reason === "empty-live") {
- console.warn(
+ log(
`pull: live fetch returned 0 pages but ${actions.existingCount} file(s) are ` +
`tracked — deletions suppressed this cycle (SPEC §8). Re-run when ` +
`Docmost is reachable.`,
);
} else {
- console.warn(
+ log(
`pull: plan would delete ${actions.plannedDeleteCount} of ${actions.existingCount} ` +
`tracked file(s) (mass-delete guard) — deletions suppressed this ` +
`cycle (SPEC §8). Verify the live Docmost tree, then re-run.`,
@@ -311,14 +319,14 @@ export async function applyPullActions(
} catch (err) {
failed++;
failedPageIds.add(w.pageId);
- console.error(
- `pull: failed page ${w.pageId}:`,
- err instanceof Error ? err.message : String(err),
+ log(
+ `pull: failed page ${w.pageId}: ` +
+ (err instanceof Error ? err.message : String(err)),
);
} finally {
completed++;
if (completed % PROGRESS_EVERY === 0) {
- console.log(`pulled ${completed}/${actions.toWrite.length}`);
+ log(`pulled ${completed}/${actions.toWrite.length}`);
}
}
};
@@ -346,9 +354,9 @@ export async function applyPullActions(
await deps.rm(relToAbs(vaultRoot, rel));
return true;
} catch (err) {
- console.error(
- `pull: failed to ${what} ${rel}:`,
- err instanceof Error ? err.message : String(err),
+ log(
+ `pull: failed to ${what} ${rel}: ` +
+ (err instanceof Error ? err.message : String(err)),
);
return false;
}
@@ -364,7 +372,7 @@ export async function applyPullActions(
for (const m of actions.moved) {
if (!m.removeOldPath) continue;
if (failedPageIds.has(m.pageId)) {
- console.warn(
+ log(
`pull: move write for ${m.pageId} failed — keeping old path ` +
`${m.fromRelPath} (SPEC §8)`,
);
@@ -401,15 +409,15 @@ export async function applyPullActions(
await git.checkout(DEFAULT_BRANCH);
const merge = await git.merge(DOCMOST_BRANCH);
if (merge.conflict) {
- console.error(
+ log(
"pull: merge of docmost -> main CONFLICTED. Conflict markers were left " +
"in the vault for manual resolution (SPEC §9). Nothing is pushed to " +
"Docmost (read-only). Resolve locally, then re-run.",
);
} else if (!merge.ok) {
- console.error(`pull: merge of docmost -> main failed: ${merge.output}`);
+ log(`pull: merge of docmost -> main failed: ${merge.output}`);
}
- console.log("pull: git push to remote is DEFERRED in this increment (SPEC §7).");
+ log("pull: git push to remote is DEFERRED in this increment (SPEC §7).");
return { written, movedApplied, deleted, failed, committed, merge };
}
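The pattern applied throughout `applyPullActions` is to resolve one logging channel up front and route every diagnostic through it. Because the injected signature takes a single string, the old multi-argument `console.error(prefix, message)` calls are folded into one concatenated line. A minimal sketch with hypothetical names (`resolveLog`, `reportFailure`, `DiagDeps`):

```typescript
type Log = (line: string) => void;

interface DiagDeps {
  // Optional: callers that inject nothing keep plain console output.
  log?: Log;
}

// Resolve the single diagnostics channel once, falling back to console.log.
function resolveLog(deps: DiagDeps): Log {
  return deps.log ?? ((line: string) => console.log(line));
}

// Hypothetical caller: the message is pre-joined into ONE string because the
// injected signature takes a single line (unlike console.error's varargs).
function reportFailure(deps: DiagDeps, pageId: string, err: unknown): void {
  const log = resolveLog(deps);
  log(`pull: failed page ${pageId}: ` + (err instanceof Error ? err.message : String(err)));
}
```

A test or the server can then pass an array-pushing `log` and assert on exact lines instead of spying on `console`.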

@@ -1,5 +1,5 @@
/**
- * Push cycle — vault -> Docmost (SPEC §6 "ФС → Docmost"), FIRST increment.
+ * Push cycle — vault -> Docmost (SPEC §6 "FS -> Docmost"), FIRST increment.
*
* This module mirrors the structure of `./pull.ts`: a set of VaultGit diff/ref
* primitives (in `./git.ts`), a PURE planner (`computePushActions`) that turns
@@ -65,7 +65,7 @@ export interface RenameMoveAction {
/**
* A CLASSIFIED rename/move (push #3): a `RenameMoveAction` resolved into the
* Docmost op(s) it actually needs. The file PATH is the source of truth for tree
- * position (SPEC §5: "истина связи — pageId, не путь" — the path is COSMETIC and
+ * position (SPEC §5: "the identity is the pageId, not the path" — the path is COSMETIC and
* LOCAL, the page identity is its pageId), so we compare the RESOLVED parent of
* the new path against the resolved parent of the old path, and the title in the
* current meta against the title in the previous meta. Each sub-op is emitted
@@ -235,7 +235,7 @@ export interface PushActionsInput {
*/
export function computePushActions(input: PushActionsInput): PushActions {
const { metaAt, currentPageIds } = input;
- // PAGE-FILE FILTER (design §"Адопция"): only `.md` files OUTSIDE any dot-folder
+ // PAGE-FILE FILTER (design §"Adoption"): only `.md` files OUTSIDE any dot-folder
// are Docmost pages. `.obsidian/*`, attachments, and other non-page files are
// committed to the vault (no `.gitignore`) and so appear in the diff, but they
// are NEVER pages — Obsidian owns them. Without this filter every ADDED such
@@ -445,6 +445,7 @@ export const DOCMOST_BRANCH = "docmost";
export interface ApplyPushDeps {
client: Pick<
GitSyncClient,
+ | "listSpaceTree"
| "importPageMarkdown"
| "createPage"
| "deletePage"
@@ -489,7 +490,7 @@ export interface PushedPageRecord {
* exposed one. Absent when the (fake) client did not return it.
*/
updatedAt?: string;
- /** Stable hash of the markdown BODY that was pushed (SPEC §10 "хэш тела"). */
+ /** Stable hash of the markdown BODY that was pushed (SPEC §10 "body hash"). */
bodyHash: string;
}
@@ -675,8 +676,34 @@ export async function applyPushActions(
}
// 2. CREATES — create the page, then write the assigned pageId back to meta so
- // the file becomes tracked (SPEC §4 "записать присвоенный pageId обратно").
+ // the file becomes tracked (SPEC §4 "write the assigned pageId back").
// Isolated per page like updates.
+ //
+ // RETRY-ADOPT (#1 idempotency): create is NOT atomic with the pageId write-back
+ // (createPage runs, then writeFile, then the write-back commit at runPush 7a). If
+ // the write-back dies in between, the file on disk still has no pageId and the
+ // next cycle re-classifies it as a CREATE -> a DUPLICATE page would be created.
+ // To guard against this, build a (parentPageId|root, title) -> existing pageId map
+ // ONCE from the LIVE Docmost tree (only when there is at least one create). The
+ // native-Obsidian layout makes filenames — and therefore titles — unique within a
+ // folder, so (parentPageId, title) identifies the page; a match means a prior
+ // cycle already created it, so we ADOPT instead of duplicating.
+ let liveByParentTitle: Map<string, string> | null = null;
+ if (actions.creates.length > 0) {
+ const live = await client.listSpaceTree(deps.spaceId);
+ // Only trust a COMPLETE tree for retry-adopt: a truncated tree could miss an
+ // already-created page and let us create a DUPLICATE (the very thing adopt
+ // prevents). The native client always returns complete:true (reads the DB);
+ // on an incomplete tree we leave the map null -> fall back to plain createPage.
+ if (live.complete) {
+ liveByParentTitle = new Map();
+ for (const n of live.pages) {
+ const key = `${n.parentPageId ?? " root"} ${n.title ?? ""}`;
+ // Keep the FIRST node for a key (the layout makes this unique in practice).
+ if (!liveByParentTitle.has(key)) liveByParentTitle.set(key, n.id);
+ }
+ }
+ }
for (const c of actions.creates) {
try {
const text = await deps.readFile(c.path);
@@ -687,6 +714,26 @@ export async function applyPushActions(
const title = titleFromPath(c.path);
const parentPageId =
(await resolveParentPageIdViaTree(deps, c.path, "current")) ?? undefined;
+ // Retry-adopt (#1 idempotency): a prior cycle already created this page in
+ // Docmost but failed to persist the pageId back to the file, so it was
+ // re-seen as a create. Adopt the existing page instead of duplicating it:
+ // write the id back (file becomes tracked) and push the body as an UPDATE
+ // (idempotent — targets by pageId). Do NOT call createPage again.
+ const adoptKey = `${parentPageId ?? " root"} ${title}`;
+ const existingId = liveByParentTitle?.get(adoptKey);
+ if (existingId) {
+ const rewritten = serializePageFile(existingId, body);
+ await deps.writeFile(c.path, rewritten);
+ writtenBack.push({ path: c.path, pageId: existingId });
+ const adopted = await client.importPageMarkdown(existingId, body, null);
+ pushed.push({
+ pageId: existingId,
+ ...extractUpdatedAt(adopted),
+ bodyHash: bodyHash(body),
+ });
+ created++;
+ continue;
+ }
const result = await client.createPage(
title,
body,
@@ -896,7 +943,7 @@ export function parentFolderFile(path: string): string | null {
}
/**
- * Whether a vault path is a Docmost PAGE file (design §"Адопция"): a `.md` file
+ * Whether a vault path is a Docmost PAGE file (design §"Adoption"): a `.md` file
* with NO dot-segment anywhere in its path. This excludes `.obsidian/` config,
* `.trash/`, dotfiles (`.foo.md`), and every non-`.md` file (attachments, JSON,
* …) — Obsidian owns those; they live in the vault but are never pages. Used to
@@ -954,7 +1001,7 @@ function nativeMeta(
* then read its `gitmost_id` frontmatter and return that page's pageId. A root-level path
* (no enclosing folder), a missing/unreadable parent file, or a parent file with
* no parseable pageId all resolve to `null` (parent is ROOT / unknown ->
- * `parentPageId: null`, SPEC §16 "parentPageId: null -> в корень").
+ * `parentPageId: null`, SPEC §16 "parentPageId: null -> to root").
*
* The IO is async, so this returns an ASYNC resolver; the call sites prefetch the
* parent pageIds (the classifier itself stays pure/sync over a plain table).
@@ -1112,7 +1159,7 @@ export interface PushRunResult {
}
/**
- * Run one FS->Docmost push cycle (SPEC §6 "ФС → Docmost"), DRY-RUN BY DEFAULT.
+ * Run one FS->Docmost push cycle (SPEC §6 "FS -> Docmost"), DRY-RUN BY DEFAULT.
*
* Steps (mirrors `pull.ts`):
* 1. Preflight git: `assertGitAvailable` + `ensureRepo`; ABORT (clear message +
@@ -1426,17 +1473,3 @@ function logPlan(
for (const s of actions.skipped)
log(` skipped [${s.status}] ${s.path}: ${s.reason}`);
}
- /** Parsed `push` CLI flags. DRY-RUN is the default; `--apply` opts into writes. */
- export interface PushParsedArgs {
- /** True when `--apply` was passed (the ONLY path that writes to Docmost). */
- apply: boolean;
- }
- /**
- * Parse the `push` CLI flags. SAFE BY DEFAULT: without `--apply` the run is a
- * DRY-RUN (plan only). Exported so the flag handling is unit-testable.
- */
- export function parseArgs(argv: string[]): PushParsedArgs {
- return { apply: argv.includes("--apply") };
- }
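The page-file rule described in the `design §"Adoption"` comments above (a `.md` file with no dot-segment anywhere in its path) reduces to a short predicate. This is a hypothetical sketch (`isPageFileSketch` is an illustrative name), assuming vault-relative paths with POSIX `/` separators and a case-sensitive `.md` suffix.

```typescript
// Hypothetical predicate for the page-file rule: a `.md` file with no
// dot-segment anywhere in its vault-relative, `/`-separated path.
function isPageFileSketch(path: string): boolean {
  if (!path.endsWith(".md")) return false; // case-sensitive suffix (assumption)
  // Any segment starting with "." (".obsidian", ".trash", ".foo.md") disqualifies.
  return path.split("/").every((segment) => !segment.startsWith("."));
}
```

Checking every segment, not just the basename, is what excludes whole dot-folders such as `.trash/Old.md` even when the file itself looks like an ordinary page.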

@@ -2,41 +2,10 @@
* Engine settings.
*
* The engine is driven IN-PROCESS by the NestJS server, which builds the
- * `Settings` object from `EnvironmentService` — so this module must NOT reach
- * into `process.env`. It exposes only:
- * - the `Settings` type the engine consumes, and
- * - `parseSettings(env)` as a PURE function (validate a raw env object -> typed
- * `Settings`), kept for unit tests and for the server to reuse if it wants
- * to validate an env-shaped object.
- * There is no `.env`-loading side-effecting entry point.
+ * `Settings` object from `EnvironmentService`. This module therefore exposes
+ * ONLY the `Settings` type the engine consumes — there is no `.env`-loading
+ * side-effecting entry point and no env-validation here (the server owns that).
*/
- import { z } from 'zod';
- // Schema keyed by the real ENV variable names so validation errors name the
- // exact variable. Credentials and the address of our OWN Docmost instance have
- // NO default — a missing value must fail at startup, never silently fall back.
- export const envSchema = z.object({
- // Docmost connection — address of our own instance, no default.
- DOCMOST_API_URL: z.string().url(),
- // Credentials for /auth/login — no default, never hardcoded.
- DOCMOST_EMAIL: z.string().min(1),
- DOCMOST_PASSWORD: z.string().min(1),
- // Which Docmost space to mirror.
- DOCMOST_SPACE_ID: z.string().min(1),
- // Local git vault (state store) — kept under data/ so the volume persists it.
- VAULT_PATH: z.string().min(1).default('data/vault'),
- // Optional git remote the vault pushes to. Empty string is treated as unset.
- GIT_REMOTE: z.preprocess(
- (v) => (v === '' ? undefined : v),
- z.string().min(1).optional(),
- ),
- // Non-secret tunables — sensible defaults are fine.
- POLL_INTERVAL_MS: z.coerce.number().int().positive().default(15000),
- DEBOUNCE_MS: z.coerce.number().int().positive().default(2000),
- LOG_LEVEL: z.enum(['debug', 'info', 'warn', 'error']).default('info'),
- });
export type Settings = {
docmostApiUrl: string;
@@ -49,20 +18,3 @@ export type Settings = {
debounceMs: number;
logLevel: 'debug' | 'info' | 'warn' | 'error';
};
- // Pure: validate a raw environment object and map it to a typed Settings.
- // Throws ZodError on bad config. No side effects — safe to import in tests.
- export function parseSettings(env: NodeJS.ProcessEnv): Settings {
- const e = envSchema.parse(env);
- return {
- docmostApiUrl: e.DOCMOST_API_URL,
- docmostEmail: e.DOCMOST_EMAIL,
- docmostPassword: e.DOCMOST_PASSWORD,
- docmostSpaceId: e.DOCMOST_SPACE_ID,
- vaultPath: e.VAULT_PATH,
- gitRemote: e.GIT_REMOTE,
- pollIntervalMs: e.POLL_INTERVAL_MS,
- debounceMs: e.DEBOUNCE_MS,
- logLevel: e.LOG_LEVEL,
- };
- }

@@ -1,5 +1,5 @@
/**
- * Normalize-on-write helper (SPEC §11 "Резолюция").
+ * Normalize-on-write helper (SPEC §11 "Resolution").
*
* git diffs byte-for-byte, so writing a page in a NON-fixpoint markdown form
* would make the next pull re-export it to a slightly different (but stable)
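Normalize-on-write rests on a fixpoint idea: write the form that a further round-trip leaves byte-identical, so git sees no diff on the next pull. A generic sketch, where `roundTrip` is a hypothetical stand-in for the markdown to ProseMirror to markdown conversion and the iteration cap is an arbitrary safety bound; the actual helper may need only a single pass.

```typescript
// Apply `roundTrip` until its output stops changing, so the text written to
// disk is already what a later export would produce (a fixpoint).
// `roundTrip` is a hypothetical stand-in for markdown -> ProseMirror -> markdown.
function normalizeToFixpoint(
  text: string,
  roundTrip: (s: string) => string,
  maxIterations = 5,
): string {
  let current = text;
  for (let i = 0; i < maxIterations; i++) {
    const next = roundTrip(current);
    if (next === current) return current; // byte-identical: fixpoint reached
    current = next;
  }
  throw new Error(`no fixpoint within ${maxIterations} round-trips`);
}
```

Throwing on non-convergence surfaces a conversion that keeps drifting instead of silently writing an unstable form.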

@@ -81,7 +81,6 @@ export {
applyPushActions,
runPush,
parentFolderFile,
- parseArgs,
LAST_PUSHED_REF,
DOCMOST_BRANCH,
LOCAL_AUTHOR_NAME,
@@ -106,14 +105,10 @@ export type {
ApplyPushResult,
PushDeps,
PushRunResult,
- PushParsedArgs,
} from "./engine/push.js";
- export { parseSettings, envSchema } from "./engine/settings.js";
export type { Settings } from "./engine/settings.js";
- export { loadSettingsOrExit } from "./engine/config-errors.js";
export { runCycle } from "./engine/cycle.js";
export type {
RunCycleDeps,

@@ -1,6 +1,6 @@
/**
* Semantic canonicalization of ProseMirror/TipTap documents for the round-trip
- * idempotency check (SPEC §11, "Задача №0", option (б): compare a CANONICALIZED
+ * idempotency check (SPEC §11, "Task #0", option (b): compare a CANONICALIZED
* form rather than raw bytes).
*
* `markdownToProseMirror` reconstructs schema DEFAULT attributes (e.g.
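Comparing a canonicalized form instead of raw JSON means dropping whatever a round-trip may legitimately re-add. A minimal sketch, assuming that null attributes and attributes equal to a caller-supplied per-node-type defaults table are equivalent to absent ones; the `PMJson` shape, the `Defaults` table, and `sameDocument` are hypothetical, not the engine's implementation.

```typescript
// Hypothetical, simplified ProseMirror JSON node.
interface PMJson {
  type: string;
  text?: string;
  attrs?: Record<string, unknown>;
  marks?: PMJson[];
  content?: PMJson[];
}

type Defaults = Record<string, Record<string, unknown>>;

// Canonical form: attrs that are null/undefined or equal to the node type's
// default are dropped (assumption: they mean "absent"), remaining attrs are
// inserted in sorted key order, and children are canonicalized recursively.
function canonicalize(node: PMJson, defaults: Defaults): PMJson {
  const out: PMJson = { type: node.type };
  if (node.text !== undefined) out.text = node.text;
  const typeDefaults: Record<string, unknown> = defaults[node.type] ?? {};
  const attrsIn: Record<string, unknown> = node.attrs ?? {};
  const attrs: Record<string, unknown> = {};
  for (const key of Object.keys(attrsIn).sort()) {
    const value = attrsIn[key];
    if (value === null || value === undefined || value === typeDefaults[key]) continue;
    attrs[key] = value;
  }
  if (Object.keys(attrs).length > 0) out.attrs = attrs;
  if (node.marks?.length) out.marks = node.marks.map((m) => canonicalize(m, defaults));
  if (node.content?.length) out.content = node.content.map((c) => canonicalize(c, defaults));
  return out;
}

// Two documents are "the same" when their canonical forms serialize equally.
function sameDocument(a: PMJson, b: PMJson, defaults: Defaults): boolean {
  return JSON.stringify(canonicalize(a, defaults)) === JSON.stringify(canonicalize(b, defaults));
}
```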

@@ -103,8 +103,8 @@ function countUniqueLinks(doc: any): number {
/**
* Parse the ordered list of integers from `[N]` footnote markers found in the
- * BODY only (every top-level block before the first "Примечания..." notes
- * heading; if no such heading, the whole doc). Returned in reading order.
+ * BODY only (every top-level block before the first notes heading; if no such
+ * heading, the whole doc). Returned in reading order.
*/
function footnoteMarkers(doc: any, notesHeading: string): number[] {
const top: any[] = Array.isArray(doc?.content) ? doc.content : [];
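The body-only footnote scan can be sketched over a simplified ProseMirror-like tree. Everything here is hypothetical: the `PMNode` shape, the `footnoteMarkersSketch` name, and the assumption that the notes heading is recognized by a prefix match on its text, since the diff does not show how the real function matches it.

```typescript
// Hypothetical, simplified ProseMirror-like node.
interface PMNode {
  type: string;
  text?: string;
  content?: PMNode[];
}

// Concatenate all text under a node, in document order.
function textOf(node: PMNode): string {
  return (node.text ?? "") + (node.content ?? []).map(textOf).join("");
}

// Collect `[N]` markers from top-level blocks before the first heading whose
// text starts with `notesHeading` (prefix match is an assumption).
function footnoteMarkersSketch(doc: PMNode, notesHeading: string): number[] {
  const top = Array.isArray(doc.content) ? doc.content : [];
  const markers: number[] = [];
  for (const block of top) {
    const text = textOf(block);
    if (block.type === "heading" && text.trim().startsWith(notesHeading)) break;
    const re = /\[(\d+)\]/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) markers.push(Number(m[1]));
  }
  return markers;
}
```

When no heading matches, the loop simply runs to the end, which gives the "whole doc" fallback described in the comment.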

@@ -6,6 +6,14 @@
* (node ids, image sizing, link targets). Every code path that converts
* to or from ProseMirror JSON must use THIS set, otherwise a round-trip
* loses content.
+ *
+ * PROVENANCE / KEEP IN SYNC: this file is a VENDORED MIRROR of the canonical
+ * Docmost document schema in `@docmost/editor-ext`. The node/mark/attribute
+ * surface MUST be kept in sync with editor-ext — anything present there but
+ * missing here is silently dropped on a round-trip (data loss). The exported
+ * `docmostExtensions` surface is guarded by `test/schema-surface-snapshot.test.ts`,
+ * which fails loudly on any drift; when it does, re-verify parity against
+ * `@docmost/editor-ext` before updating the snapshot.
*/
import StarterKit from "@tiptap/starter-kit";
import Image from "@tiptap/extension-image";
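The schema-surface guard referenced in the provenance header can be sketched like this: reduce the extension list to a sorted set of `type:name` entries, and fail with an explicit added/removed list when it departs from a checked-in snapshot. This is a hypothetical sketch (`ExtLike`, `schemaSurface`, and `assertNoDrift` are illustrative names). It assumes each extension exposes `name` and `type`, as TipTap extension objects generally do, and it ignores attributes, which the real snapshot test may also cover.

```typescript
// Hypothetical minimal shape: `name` plus `type` ("node" | "mark" | "extension").
// Attributes are ignored in this sketch.
interface ExtLike {
  name: string;
  type: string;
}

// Order-insensitive surface: sorted "type:name" entries.
function schemaSurface(exts: ExtLike[]): string[] {
  return exts.map((e) => `${e.type}:${e.name}`).sort();
}

// Throw with an explicit added/removed list when the surface drifts.
function assertNoDrift(exts: ExtLike[], snapshot: string[]): void {
  const now = schemaSurface(exts);
  const added = now.filter((s) => !snapshot.includes(s));
  const removed = snapshot.filter((s) => !now.includes(s));
  if (added.length > 0 || removed.length > 0) {
    throw new Error(
      `schema surface drift: added [${added.join(", ")}], removed [${removed.join(", ")}]. ` +
        "Re-verify parity with @docmost/editor-ext before updating the snapshot.",
    );
  }
}
```

Sorting makes reordering the extension array a non-event, while any added or removed node or mark fails the check by name.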