fix(db): миграции «задним числом» из долгоживущих веток не роняют старт — CI-гейт + allowUnorderedMigrations (#363, инцидент #361) #365

Open
agent_coder wants to merge 3 commits from fix/363-migration-order into develop
4 changed files with 61 additions and 1 deletions
+43
View File
@@ -13,6 +13,49 @@ permissions:
contents: read
jobs:
# Guard against a long-lived branch adding a migration whose timestamped
# filename sorts BEFORE migrations already applied on the target branch (and
# thus in prod). The Kysely startup migrator rejects that as "corrupted
# migrations" and crash-loops the app on boot (incident #361). This gate fails
# the PR so the migration is renamed to a current timestamp before merge. Only
# runs for pull_request events (needs a base branch to diff against).
migration-order:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout (full history for the base-branch diff)
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Added migrations must sort after the newest on the base branch
env:
TARGET_BRANCH: ${{ github.base_ref }}
run: |
set -euo pipefail
MIG_DIR="apps/server/src/database/migrations"
# checkout above already did fetch-depth:0 (full history). Fetch the base
# WITHOUT --depth (a shallow graft would truncate the base history and
# break the merge-base when the base has moved ahead of the PR merge —
# exactly the long-branch-vs-moving-base case this gate guards, #361).
git fetch --no-tags origin "$TARGET_BRANCH"
newest_on_target=$(git ls-tree -r --name-only "origin/${TARGET_BRANCH}" "$MIG_DIR" | sort | tail -1)
# NO `|| true`: a diff failure (e.g. an unresolved merge-base) must fail
# the job CLOSED — a gate whose job is to BLOCK must never pass on error.
# `set -e` above already aborts on a non-zero diff exit.
added=$(git diff --diff-filter=A --name-only "origin/${TARGET_BRANCH}...HEAD" -- "$MIG_DIR")
bad=0
for f in $added; do
if [[ "$f" < "$newest_on_target" || "$f" == "$newest_on_target" ]]; then
echo "::error::Migration $f sorts at or before the newest on ${TARGET_BRANCH} ($newest_on_target) — rename it with a CURRENT timestamp before merge (do not change its contents). See incident #361."
bad=1
fi
done
if [ "$bad" -eq 0 ]; then
echo "Migration order OK (added migrations all sort after $newest_on_target)."
fi
exit $bad
test:
runs-on: ubuntu-latest
timeout-minutes: 20
+4 -1
View File
@@ -250,7 +250,10 @@ pnpm --filter server migration:codegen # regenerate src/databa
```
Migration files live in `apps/server/src/database/migrations/` and are named `YYYYMMDDThhmmss-description.ts`. Fork-specific migrations only **add** tables (`page_embeddings`, `ai_chats`, `ai_chat_messages`, `ai_provider_credentials`, `ai_mcp_servers`, `page_template_references`) and columns (e.g. `pages.is_template`, a `NOT NULL DEFAULT false` boolean) — never drop/rewrite Docmost data.
**Migration ordering — always check when merging branches/features.** Kysely runs migrations in **alphabetical (= timestamp) order** and refuses to start if a *new* migration sorts **before** one already applied to the DB (`corrupted migrations: ... must always have a name that comes alphabetically after the last executed migration`). When you merge a branch or land a feature, verify your migration's timestamp still sorts **after every migration that may already be applied on the target** (`/bin/ls -1 apps/server/src/database/migrations | sort | tail`). Branches developed in parallel routinely break this: a feature branch adds `…T130000-…`, `main` meanwhile ships and deploys `…T150000-…`, and after the merge the older-timestamped file is rejected at boot. **Fix = rename your migration to a timestamp after the latest one already in the target** (content unchanged — the filename is the ordering key), then rebuild so the compiled `dist/database/migrations/` picks up the new name.
**Migration ordering — always check when merging branches/features.** Kysely runs migrations in **alphabetical (= timestamp) order**. A *new* migration that sorts **before** one already applied to the DB is a "back-dated" migration, which branches developed in parallel routinely produce: a feature branch adds `…T130000-…`, `develop` meanwhile ships and deploys `…T150000-…`, and after the merge the older-timestamped file has been skipped. Two layers guard this (both added for incident #361, where a back-dated migration crash-looped prod for ~11 min):
- **CI gate (primary):** the `migration-order` job in `.github/workflows/test.yml` fails a PR whose added migration sorts at/before the newest on the base branch. **So the fix is to rename your migration to a timestamp after the latest one already in the target** (`/bin/ls -1 apps/server/src/database/migrations | sort | tail`; content unchanged — the filename is the ordering key), then rebuild so the compiled `dist/database/migrations/` picks up the new name.
- **Runtime safety net:** both Migrators (`migration.service.ts` startup auto-migrate + `migrate.ts` CLI) set `allowUnorderedMigrations: true`, so the app does **not** refuse to start on an out-of-order migration — it applies the skipped older one instead of crash-looping. Kysely's `#ensureNoMissingMigrations` guard is still on (a *removed* applied migration is still an error). Because apply order can then differ from lexicographic across instances, migrations must stay **independent** (each creates its own objects) — the CI gate remains the primary line; this net only covers a gate bypass (manual push / hotfix branch).
## Architecture — the big picture
+4
View File
@@ -24,6 +24,10 @@ const migrator = new Migrator({
path,
migrationFolder,
}),
// Match the startup auto-migrator (migration.service.ts): a back-dated
// migration from a long-lived branch must be applied, not rejected as
// "corrupted migrations" (incident #361). See that file for the full rationale.
allowUnorderedMigrations: true,
});
run(db, migrator, migrationFolder);
@@ -19,6 +19,16 @@ export class MigrationService {
path,
migrationFolder: path.join(__dirname, '..', 'migrations'),
}),
// A long-lived branch can add a migration whose timestamped filename sorts
// BEFORE migrations already applied in prod (e.g. #234's 20260627 landing
// after 20260704 was live). With the default (ordered) setting the startup
// migrator then sees "corrupted migrations" — the applied set is no longer a
// prefix of the sorted list — throws, and the app crash-loops on boot
// (incident #361: 502s for ~11 min). allowUnorderedMigrations runs any
// not-yet-applied migration regardless of filename order, so a back-dated
// migration is applied instead of bricking startup. A CI order-gate still
// discourages back-dating; this is the runtime safety net.
allowUnorderedMigrations: true,
});
const { error, results } = await migrator.migrateToLatest();