fix(db): back-dated migrations from long-lived branches no longer crash startup (CI gate + allowUnorderedMigrations, #363, incident #361) #365

Open
agent_coder wants to merge 3 commits from fix/363-migration-order into develop

3 Commits

Author SHA1 Message Date
agent_coder 0050ad7ebb docs(#363 review F2): update AGENTS.md migration-ordering to the new tolerant behavior
The "Migration ordering" section still described the OLD crash-loop-at-boot
behavior this PR removes ("Kysely refuses to start … rejected at boot"). Rewrote
it to the new two-layer model: the CI migration-order gate is the primary defense
(rename to a current timestamp), and the runtime now sets allowUnorderedMigrations
so the app applies a back-dated migration instead of crash-looping (with the note
that #ensureNoMissingMigrations still guards a removed applied migration, and that
migrations must stay independent since apply order can differ across instances).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 02:57:11 +03:00
agent_coder 7b4617db70 fix(#363 review F1): make the migration-order gate fail CLOSED (not open)
The CI gate, whose whole job is to BLOCK a back-dated migration, could fail open
in exactly the scenario it guards against (a long branch vs a moving base, i.e. #361):

- Dropped the redundant `git fetch --depth=1`: the checkout already did
  `fetch-depth: 0` (full history), and the shallow graft truncated the BASE history,
  so `merge-base` (and thus the three-dot `origin/base...HEAD` diff) failed when the
  base had moved ahead of the PR merge commit.
- Removed `|| true` from the diff: it swallowed that failure → `added` empty → loop
  skipped → bad=0 → gate PASS. Now `set -e` aborts the job (fail CLOSED) on any
  diff error; a gate must never pass on error.
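A minimal TypeScript sketch of the fail-closed rule, assuming a hypothetical `migrationOrderGate` function and a `migrations/` directory. The real gate is a shell step in test.yml and is not reproduced here. The diff is injected as a function, so a git failure propagates as an exception. The sketch intentionally has no try/catch, which is the TypeScript counterpart of dropping `|| true`.

```typescript
// Hypothetical rendering of the migration-order gate's control flow.
// Names and the directory layout are illustrative, not the real test.yml step.

/** Returns file paths added on the PR branch (e.g. from a three-dot git diff). */
type ListAddedFiles = () => string[];

/**
 * Fails CLOSED: if listAdded() throws (e.g. merge-base cannot be computed),
 * the error propagates and the gate fails. There is deliberately no try/catch.
 */
function migrationOrderGate(
  listAdded: ListAddedFiles,
  newestBaseMigration: string,
  migrationDir = "migrations/",
): void {
  const added = listAdded()
    .filter((p) => p.startsWith(migrationDir))
    .map((p) => p.slice(migrationDir.length));

  // Back-dated: sorts at or before the newest migration on the base branch
  // (plain string comparison, matching name-ordered migrations).
  const bad = added.filter((name) => name <= newestBaseMigration);
  if (bad.length > 0) {
    throw new Error(
      `back-dated migration(s): ${bad.join(", ")}; rename to a current timestamp`,
    );
  }
}
```

If you wrapped `listAdded()` in a try/catch that returned `[]`, you would recreate the fail-open bug. An empty list contains no violations, so the gate would pass.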

Verified: yaml parses (jobs migration-order, test); a broken-ref diff with set -e
and no `|| true` aborts before bad=0 (fail-closed) instead of passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 02:27:16 +03:00
agent_coder 459d636ffb fix(db): prevent the migration-order crash-loop from long-lived branches (#363, incident #361)
A long-lived branch can add a migration whose timestamped filename sorts BEFORE
migrations already applied in prod (#234's 20260627T130000-ai-chat-runs merged
after 20260704T120000-client-metrics was live). Kysely's Migrator, in its default
ordered mode, then rejects the applied set as "corrupted migrations" (it is no longer
a prefix of the sorted list) and throws, so the app crash-loops on boot. That is
exactly incident #361 (502s for ~11 min after a develop deploy). #119 and #120
(June branches) are the next such threats.
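The prefix condition can be modeled in a few lines. This is a simplified, hypothetical model (`assertAppliedIsPrefix` is not Kysely's source). It assumes migrations are ordered by a plain string sort of their names, and the error text only approximates Kysely's "corrupted migrations" message. It reproduces the #361 situation with the two filenames above.

```typescript
// Simplified model (not Kysely's code) of the default ordered-mode check:
// the already-applied migrations must be a prefix of the name-sorted list.
function assertAppliedIsPrefix(allMigrations: string[], applied: string[]): void {
  // Name order, modeled with a plain string sort.
  const sorted = [...allMigrations].sort();
  applied.forEach((name, i) => {
    if (sorted[i] !== name) {
      // A back-dated file now occupies a slot that an applied migration held.
      throw new Error(
        `corrupted migrations: expected previously executed migration ${name} ` +
          `to be at index ${i} but ${sorted[i]} was found in its place`,
      );
    }
  });
}
```

With `client-metrics` applied and `ai-chat-runs` newly merged, `ai-chat-runs` sorts to index 0. The check throws on every boot.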

Two layers of defense, both in this PR:
1. CI migration-order gate (a new `migration-order` job in test.yml, PR-only):
   fails the PR when an added migration sorts at/before the newest migration on
   the base branch, with an actionable message to rename it to a current
   timestamp before merge. This is the primary defense — makes back-dating
   impossible to merge accidentally.
2. `allowUnorderedMigrations: true` on BOTH Migrators (migration.service.ts
   startup auto-migrate + migrate.ts CLI): the runtime safety net. Kysely applies
   a not-yet-applied older migration instead of bricking startup, so a back-dated
   migration that bypasses the gate (manual push / hotfix branch) still boots.
   Trade-off documented inline: apply order across instances may differ from
   lexicographic order, so migrations must stay independent (ours each create their
   own objects); the CI gate remains the primary line of defense.
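A hedged model of the unordered mode follows. The option appears only in a comment, in the shape `new Migrator({ db, provider, allowUnorderedMigrations: true })`; the real Migrators may pass other options. `planUnordered` is a hypothetical, simplified model of what the Migrator plans at startup. Every not-yet-applied migration is pending and runs in name order. A previously applied migration that has disappeared still throws, which corresponds to the missing-migration guard noted in AGENTS.md.

```typescript
// Runtime safety net, as configured on both Migrators (other options omitted):
//   new Migrator({ db, provider, allowUnorderedMigrations: true })
//
// Simplified model (not Kysely's code) of startup planning in unordered mode.
function planUnordered(allMigrations: string[], applied: string[]): string[] {
  const known = new Set(allMigrations);
  // Removing an already-applied migration is still a hard error.
  for (const name of applied) {
    if (!known.has(name)) {
      throw new Error(`corrupted migrations: previously executed migration ${name} is missing`);
    }
  }
  const done = new Set(applied);
  // A back-dated migration is just pending; pending ones run in name order.
  return [...allMigrations].sort().filter((name) => !done.has(name));
}
```

With the #361 filenames, the plan is `["20260627T130000-ai-chat-runs"]`, and it runs after `client-metrics`. This is why migrations must not depend on each other's apply order.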

Verified: allowUnorderedMigrations is a valid Kysely 0.28.17 Migrator option;
server tsc clean; the gate script rejects a back-dated filename and passes a
current one. No new deps, no migration, no runtime behavior change beyond the
migrator resilience.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 01:36:57 +03:00