fix(git-sync): self-heal a stale .git lock that wedged a space forever (D3-N3)

An interrupted git operation (a hard crash, OOM-kill, or abrupt container stop mid
`git add`/`commit`/`checkout`) leaves a `.git/index.lock` (or a ref `*.lock`) behind.
Git then refuses EVERY subsequent operation ("Unable to create '…/index.lock':
File exists"), so every poll cycle failed and the space's sync wedged INDEFINITELY
with no self-heal: the whole space stopped syncing until a human removed the lock
with `rm`. Found via the web-test restart/corruption charter and reproduced
deterministically.
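
The failure mode above follows from how lock files work: git takes a lock by creating the file exclusively, so a leftover file from a dead process makes every later attempt fail with "File exists". The sketch below imitates that with Node's `"wx"` open flag in a temp directory. It does not run git, and `tryAcquireLock` and `demoStaleLock` are hypothetical names used only for illustration.

```typescript
import { mkdtemp, open, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: take a lock by exclusive file creation.
// Returns false when the lock file already exists.
async function tryAcquireLock(lockPath: string): Promise<boolean> {
  try {
    const handle = await open(lockPath, "wx"); // "wx": create, fail if the path exists
    await handle.close();
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Simulates: a writer "crashes" holding the lock, the next attempt is blocked,
// the stale file is removed, and the following attempt succeeds.
async function demoStaleLock(): Promise<boolean[]> {
  const dir = await mkdtemp(join(tmpdir(), "stale-lock-"));
  try {
    const lock = join(dir, "index.lock");
    const first = await tryAcquireLock(lock); // lock taken, never released
    const second = await tryAcquireLock(lock); // blocked by the leftover file
    await rm(lock, { force: true }); // manual `rm`, or the self-heal
    const third = await tryAcquireLock(lock); // works again
    return [first, second, third];
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Nothing in this protocol records whether the lock's owner is still alive, which is why the staleness argument has to come from outside git (here, the single-writer guarantee).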

The daemon holds the per-space Redis lock and is the vault's ONLY writer, so any
`*.lock` reaching a fresh cycle is necessarily stale (no live git process holds it).

Add `VaultGit.clearStaleGitLocks()` and call it in the cycle preflight, right after
`ensureRepo` and before the mid-merge recovery. It clears the index, HEAD, config,
packed-refs, MERGE_HEAD, and ORIG_HEAD locks plus the engine's ref locks
(`refs/heads/main`, `refs/heads/docmost`, `refs/docmost/last-pushed`). Removal is
best-effort, and a missing file is a no-op.

Verified on the stand: a planted stale `index.lock` is now cleared and the space
recovers (the edit reaches the vault, 0 "File exists" errors). Before this fix the
space stayed wedged forever.

Unit test on a real temp repo: a planted `index.lock` blocks `git add`, clearing the
lock removes it, and `git add` then works. The git-sync suite is green (707).
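
The plant-then-clear flow of that test can be sketched roughly as follows. This is not the commit's actual unit test: it uses a plain temp directory in place of a real repo, so it checks only the file-removal behavior and never runs `git add`. `clearLocks`, `exists`, and `plantAndClear` are hypothetical stand-ins, with `clearLocks` mimicking the best-effort removal the commit message describes.

```typescript
import { mkdir, mkdtemp, rm, stat, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the method under test: best-effort removal of lock paths
// relative to gitDir. `force: true` makes a missing file a no-op, and the
// catch swallows any other removal error.
async function clearLocks(gitDir: string, relPaths: string[]): Promise<void> {
  await Promise.all(
    relPaths.map((rel) =>
      rm(join(gitDir, rel), { force: true }).catch(() => undefined),
    ),
  );
}

// True if the path can be stat'ed, false otherwise.
async function exists(path: string): Promise<boolean> {
  return stat(path).then(
    () => true,
    () => false,
  );
}

async function plantAndClear(): Promise<{ before: boolean; after: boolean }> {
  const gitDir = await mkdtemp(join(tmpdir(), "fake-git-"));
  try {
    await mkdir(join(gitDir, "refs", "heads"), { recursive: true });
    const lock = join(gitDir, "index.lock");
    await writeFile(lock, ""); // plant the stale lock
    const before = await exists(lock);
    // "refs/heads/main.lock" is deliberately absent to exercise the no-op path.
    await clearLocks(gitDir, ["index.lock", "refs/heads/main.lock"]);
    const after = await exists(lock);
    return { before, after };
  } finally {
    await rm(gitDir, { recursive: true, force: true });
  }
}
```

A test against a real repo would additionally assert that `git add` fails while the lock is planted and succeeds after clearing.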
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -116,6 +116,15 @@ export async function runCycle(deps: RunCycleDeps): Promise<RunCycleResult> {
   await vault.assertGitAvailable();
   await vault.ensureRepo();
 
+  // 1b. CLEAR stale git lock files left by an interrupted git op (bug D3-N3). A
+  // hard crash / OOM-kill / abrupt container stop mid `git add`/`commit`/
+  // `checkout` leaves a `.git/index.lock` (or a ref `*.lock`); git then refuses
+  // every later op ("Unable to create '…/index.lock': File exists"), wedging the
+  // space forever with no self-heal. The daemon holds the per-space Redis lock
+  // and is the vault's only writer, so any leftover lock here is stale; remove
+  // it before the merge check + any checkout/diff below.
+  await vault.clearStaleGitLocks();
+
   // 2. RECOVER from a vault left mid-merge by a PRIOR cycle (SPEC §9 wedge fix).
   // A leftover merge used to WEDGE THE WHOLE SPACE: this check returned
   // `skipped: "merge-in-progress"` so EVERY later cycle skipped the entire
@@ -19,7 +19,7 @@
  * - "nothing to commit" is treated as a graceful no-op, not an error.
  */
 import { execFile } from "node:child_process";
-import { mkdir } from "node:fs/promises";
+import { mkdir, rm } from "node:fs/promises";
 import { promisify } from "node:util";
 
 const execFileAsync = promisify(execFile);
@@ -322,6 +322,38 @@ export class VaultGit {
    * failure deep inside `checkout`. This is what makes re-runs converge
    * (resumability, SPEC §12).
    */
+  /**
+   * Remove STALE git lock files left by an INTERRUPTED git operation (a hard
+   * crash / OOM-kill / abrupt container stop mid `git add`/`commit`/`checkout`
+   * leaves `.git/index.lock`; interrupted ref updates leave `*.lock` files). Git
+   * then refuses EVERY subsequent operation ("Unable to create '…/index.lock':
+   * File exists"), which WEDGES the space's sync loop indefinitely with no
+   * self-heal (bug D3-N3). The daemon holds the per-space Redis lock and is the
+   * vault's ONLY writer, so any leftover `*.lock` reaching a fresh cycle is
+   * necessarily stale (no live git process holds it); clear them best-effort in
+   * the cycle preflight, alongside the mid-merge recovery. Missing files are a
+   * no-op (`force: true`).
+   */
+  async clearStaleGitLocks(): Promise<void> {
+    const gitDir = `${this.vaultPath}/.git`;
+    const locks = [
+      "index.lock",
+      "HEAD.lock",
+      "config.lock",
+      "packed-refs.lock",
+      "MERGE_HEAD.lock",
+      "ORIG_HEAD.lock",
+      "refs/heads/main.lock",
+      "refs/heads/docmost.lock",
+      "refs/docmost/last-pushed.lock",
+    ];
+    await Promise.all(
+      locks.map((rel) =>
+        rm(`${gitDir}/${rel}`, { force: true }).catch(() => undefined),
+      ),
+    );
+  }
+
   async isMergeInProgress(): Promise<boolean> {
     // MERGE_HEAD exists exactly while a merge is in progress.
     const mergeHead = await this.runRaw([