fix(git-sync): push 503 starvation + concurrent-edit marker leak/silent loss

Bug #1 (push 503 starvation): an external receive-pack that briefly overlapped
a poll cycle immediately 503'd because the per-space single-writer lock was
held. Add a BOUNDED retry-acquire on the PUSH path only
(SpaceLockService.withSpaceLock acquireRetry: capped exponential backoff, ~5s
total). A transient overlap now waits and succeeds; a genuinely stuck cycle
still 503s once the bound is exhausted. The poll cycle passes no retry, so it
still skips immediately. Pushes stay all-or-nothing: the receive-pack only runs
once the lock is held, so a 503 never leaves a half-applied ref.
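The bounded retry-acquire described above can be sketched roughly as follows. This is a minimal TypeScript sketch, not the actual SpaceLockService code: `LockOps`, `AcquireRetry`, the injectable `sleep`, and this free-standing `withSpaceLock` are hypothetical names, and the sketch assumes the lock exposes one non-blocking try-acquire. Only `GitSyncLockHeldError` and the ~5s / 100 ms / 500 ms budget come from this commit.

```typescript
// Hypothetical sketch: the real SpaceLockService signatures are not shown in
// this commit, so LockOps, AcquireRetry and this withSpaceLock are assumptions.
export class GitSyncLockHeldError extends Error {
  constructor(message = "space lock held") {
    super(message);
    this.name = "GitSyncLockHeldError";
  }
}

export interface AcquireRetry {
  totalMs: number; // overall wait budget before giving up (push path: ~5s)
  baseMs: number;  // first backoff; doubles after every failed attempt
  maxMs: number;   // cap on any single backoff
}

export interface LockOps {
  tryAcquire: () => Promise<boolean>;    // one non-blocking attempt (assumed)
  release: () => Promise<void>;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const realSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withSpaceLock<T>(
  ops: LockOps,
  fn: () => Promise<T>,
  acquireRetry?: AcquireRetry, // omitted on the poll path: a single attempt
): Promise<T> {
  const sleep = ops.sleep ?? realSleep;
  let acquired = await ops.tryAcquire();
  if (!acquired && acquireRetry) {
    let waited = 0; // sum of backoffs slept so far (not wall-clock time)
    let delay = acquireRetry.baseMs;
    while (!acquired && waited < acquireRetry.totalMs) {
      // Capped exponential backoff, trimmed so the total never exceeds the budget.
      const step = Math.min(delay, acquireRetry.maxMs, acquireRetry.totalMs - waited);
      await sleep(step);
      waited += step;
      delay *= 2;
      acquired = await ops.tryAcquire();
    }
  }
  // Still held after the budget (or at once, without retry): surface as a 503.
  if (!acquired) throw new GitSyncLockHeldError();
  try {
    // The guarded work (e.g. the receive-pack) only starts once the lock is held.
    return await fn();
  } finally {
    await ops.release();
  }
}
```

Calling it without `acquireRetry` mirrors the poll path as the commit describes it: one attempt, then an immediate `GitSyncLockHeldError` that the caller can treat as "skip this tick".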

Bug #2 (concurrent-edit marker leak + silent same-block loss):
- Marker leak (a): the push UPDATE path stripped markers from the body sent to
  Docmost but left the raw <<<<<<< / >>>>>>> markers committed in the published
  `main` vault indefinitely (with autoMergeConflicts ON). The cleaned body is now
  written back to the vault file and recorded in writtenBack, so runPush commits
  it on `main` and the vault converges to clean bytes.
- Marker leak (b): pin merge.conflictStyle=merge in ensureRepo and teach
  stripConflictMarkers/hasConflictMarkers about the diff3 `|||||||` base section
  (drop the marker AND the stale base region) so diff3/zdiff3 conflicts can
  never leak `|||||||` + base content into a page. Also scrub the 3-way merge
  BASE markdown.
- Silent same-block loss: the block 3-way merge still resolves same-block
  conflicts deterministically in favor of the git side, but no longer silently:
  diff3Plan now reports a conflict count (mergeXmlFragments3WayWithStats),
  gitSyncWriteBody logs it, and the persistence boundary snapshot now also fires
  for git-sync writes over a non-git-sync baseline, so the human's pre-merge
  content is preserved (and recoverable) in page history. A full
  persisted-conflict UI that keeps both sides remains deferred to the redesign.
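A minimal sketch of diff3-aware marker stripping, assuming Git's default 7-character markers and "\n" line endings. The exported names mirror stripConflictMarkers/hasConflictMarkers from this commit, but the bodies are hypothetical: the commit does not say which side the real helper keeps, so this sketch keeps both the "ours" and "theirs" lines in order and drops only the markers and the `|||||||` base region. One design choice worth noting: a bare `=======` is treated as a marker only inside an open hunk, so a Markdown setext heading underline is left alone.

```typescript
// Hypothetical stand-ins for stripConflictMarkers/hasConflictMarkers. Assumed
// policy: keep ours + theirs content, drop markers and the diff3 base section.
const OPEN = /^<{7}(?: .*)?$/;   // <<<<<<< ours-label
const BASE = /^\|{7}(?: .*)?$/;  // ||||||| base-label (diff3 / zdiff3 only)
const SEP = /^={7}$/;            // =======
const CLOSE = /^>{7}(?: .*)?$/;  // >>>>>>> theirs-label

type State = "outside" | "ours" | "base" | "theirs";

/** Walks the text once; returns the kept lines and how many hunks closed. */
function scan(text: string): { kept: string[]; hunks: number } {
  const kept: string[] = [];
  let state = "outside" as State;
  let hunks = 0;
  for (const line of text.split("\n")) {
    switch (state) {
      case "outside":
        if (OPEN.test(line)) state = "ours";
        else kept.push(line); // a lone ======= here is ordinary Markdown
        break;
      case "ours":
        if (BASE.test(line)) state = "base"; // diff3 base section begins
        else if (SEP.test(line)) state = "theirs";
        else kept.push(line);
        break;
      case "base":
        if (SEP.test(line)) state = "theirs"; // stale base lines are dropped
        break;
      case "theirs":
        if (CLOSE.test(line)) {
          state = "outside";
          hunks++;
        } else kept.push(line);
        break;
    }
  }
  return { kept, hunks };
}

export const hasConflictMarkers = (text: string): boolean => scan(text).hunks > 0;
export const stripConflictMarkers = (text: string): string => scan(text).kept.join("\n");
```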

Tests: space-lock bounded-retry (success/stuck/poll-immediate); push vault-clean
+ diff3 ||||||| strip; ensureRepo conflictStyle pin; diff3Plan/3-way conflict
counts; persistence git-sync boundary snapshot. Server tsc clean; git-sync
vitest + server collaboration/git-sync jest all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude code agent 227
2026-06-28 20:03:21 +03:00
parent 906733b5c8
commit b7e5cb6970
15 changed files with 567 additions and 77 deletions

@@ -39,3 +39,24 @@ export const GIT_SYNC_LOCK_PREFIX = 'git-sync:lock:';
 * and the Redis lock prevents two instances racing the same space.
 */
export const GIT_SYNC_LOCK_TTL_MS = 5 * 60 * 1000;
/**
 * Bounded retry budget for ACQUIRING the per-space lock on the PUSH (external
 * receive-pack) path. The poll cycle holds the single-writer lock while it
 * processes a whole space, so a legitimate `git push` that arrives during a
 * cycle would otherwise IMMEDIATELY 503 (GitSyncLockHeldError) even though the
 * cycle is about to release the lock in well under a second for most spaces.
 * Under continuous polling that made most pushes 503 nondeterministically. So
 * the push path retries the acquire with a small capped backoff for up to
 * ~`TOTAL_MS` BEFORE giving up: a transient overlap with a cycle no longer
 * fails the push, while a genuinely stuck/long cycle still surfaces a 503 after
 * the bound (the client may then retry the whole push, which is safe: the
 * receive-pack only runs ONCE the lock is held, so a 503 never leaves a
 * half-applied ref). The POLL cycle itself does NOT retry (it just skips and
 * the next tick reconciles), so the retry is push-only (smaller blast radius).
 */
export const GIT_SYNC_PUSH_LOCK_RETRY_TOTAL_MS = 5_000;
/** First backoff between push lock-acquire attempts (ms); doubles, capped. */
export const GIT_SYNC_PUSH_LOCK_RETRY_BASE_MS = 100;
/** Cap on the per-attempt push lock-acquire backoff (ms). */
export const GIT_SYNC_PUSH_LOCK_RETRY_MAX_MS = 500;
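As a worked example of the budget these three constants define, the sketch below lists the sleeps a push would go through if the lock never frees. It assumes the backoff doubles from BASE_MS, is capped at MAX_MS, and that the final sleep is trimmed so the total lands exactly on TOTAL_MS; the commit does not say whether the real loop trims the last step or checks wall-clock time instead, and `pushBackoffSchedule` is a hypothetical helper.

```typescript
// Constants copied from the hunk above.
export const GIT_SYNC_PUSH_LOCK_RETRY_TOTAL_MS = 5_000;
export const GIT_SYNC_PUSH_LOCK_RETRY_BASE_MS = 100;
export const GIT_SYNC_PUSH_LOCK_RETRY_MAX_MS = 500;

/**
 * Hypothetical helper: the backoff sleeps (ms) between acquire attempts when
 * the lock never frees, under the doubling/cap/trim policy assumed above.
 */
export function pushBackoffSchedule(
  totalMs = GIT_SYNC_PUSH_LOCK_RETRY_TOTAL_MS,
  baseMs = GIT_SYNC_PUSH_LOCK_RETRY_BASE_MS,
  maxMs = GIT_SYNC_PUSH_LOCK_RETRY_MAX_MS,
): number[] {
  const delays: number[] = [];
  let waited = 0;
  let delay = baseMs;
  while (waited < totalMs) {
    // Never sleep past the overall budget.
    const step = Math.min(delay, maxMs, totalMs - waited);
    delays.push(step);
    waited += step;
    delay = Math.min(delay * 2, maxMs);
  }
  return delays;
}
```

Under those assumptions the worst case is 12 sleeps (100, 200, 400, eight of 500, then 300 ms), i.e. 13 acquire attempts in total. Ignoring the time each attempt itself takes, a cycle that releases within one second is picked up by the fifth attempt at the latest, after 1,200 ms of waiting.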