diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 7ea47b94..a0943391 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -13,6 +13,49 @@ permissions: contents: read jobs: + # Guard against a long-lived branch adding a migration whose timestamped + # filename sorts BEFORE migrations already applied on the target branch (and + # thus in prod). The Kysely startup migrator rejects that as "corrupted + # migrations" and crash-loops the app on boot (incident #361). This gate fails + # the PR so the migration is renamed to a current timestamp before merge. Only + # runs for pull_request events (needs a base branch to diff against). + migration-order: + if: github.event_name == 'pull_request' + runs-on: ubuntu-latest + timeout-minutes: 5 + steps: + - name: Checkout (full history for the base-branch diff) + uses: actions/checkout@v4 + with: + fetch-depth: 0 + - name: Added migrations must sort after the newest on the base branch + env: + TARGET_BRANCH: ${{ github.base_ref }} + run: | + set -euo pipefail + MIG_DIR="apps/server/src/database/migrations" + # checkout above already did fetch-depth:0 (full history). Fetch the base + # WITHOUT --depth (a shallow graft would truncate the base history and + # break the merge-base when the base has moved ahead of the PR merge — + # exactly the long-branch-vs-moving-base case this gate guards, #361). + git fetch --no-tags origin "$TARGET_BRANCH" + newest_on_target=$(git ls-tree -r --name-only "origin/${TARGET_BRANCH}" "$MIG_DIR" | sort | tail -1) + # NO `|| true`: a diff failure (e.g. an unresolved merge-base) must fail + # the job CLOSED — a gate whose job is to BLOCK must never pass on error. + # `set -e` above already aborts on a non-zero diff exit. + added=$(git diff --diff-filter=A --name-only "origin/${TARGET_BRANCH}...HEAD" -- "$MIG_DIR") + bad=0 + for f in $added; do + if [[ "$f" < "$newest_on_target" || "$f" == "$newest_on_target" ]]; then + echo "::error::Migration $f sorts at or before the newest on ${TARGET_BRANCH} ($newest_on_target) — rename it with a CURRENT timestamp before merge (do not change its contents). See incident #361." + bad=1 + fi + done + if [ "$bad" -eq 0 ]; then + echo "Migration order OK (added migrations all sort after $newest_on_target)." + fi + exit $bad + test: runs-on: ubuntu-latest timeout-minutes: 20 diff --git a/AGENTS.md b/AGENTS.md index 9bedbc39..f541ebb8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -250,7 +250,10 @@ pnpm --filter server migration:codegen # regenerate src/databa ``` Migration files live in `apps/server/src/database/migrations/` and are named `YYYYMMDDThhmmss-description.ts`. Fork-specific migrations only **add** tables (`page_embeddings`, `ai_chats`, `ai_chat_messages`, `ai_provider_credentials`, `ai_mcp_servers`, `page_template_references`) and columns (e.g. `pages.is_template`, a `NOT NULL DEFAULT false` boolean) — never drop/rewrite Docmost data. -**Migration ordering — always check when merging branches/features.** Kysely runs migrations in **alphabetical (= timestamp) order** and refuses to start if a *new* migration sorts **before** one already applied to the DB (`corrupted migrations: ... must always have a name that comes alphabetically after the last executed migration`). When you merge a branch or land a feature, verify your migration's timestamp still sorts **after every migration that may already be applied on the target** (`/bin/ls -1 apps/server/src/database/migrations | sort | tail`). Branches developed in parallel routinely break this: a feature branch adds `…T130000-…`, `main` meanwhile ships and deploys `…T150000-…`, and after the merge the older-timestamped file is rejected at boot. **Fix = rename your migration to a timestamp after the latest one already in the target** (content unchanged — the filename is the ordering key), then rebuild so the compiled `dist/database/migrations/` picks up the new name. +**Migration ordering — always check when merging branches/features.** Kysely runs migrations in **alphabetical (= timestamp) order**. A *new* migration that sorts **before** one already applied to the DB is a "back-dated" migration, which branches developed in parallel routinely produce: a feature branch adds `…T130000-…`, `develop` meanwhile ships and deploys `…T150000-…`, and after the merge the older-timestamped file has been skipped. Two layers guard this (both added for incident #361, where a back-dated migration crash-looped prod for ~11 min): + +- **CI gate (primary):** the `migration-order` job in `.github/workflows/test.yml` fails a PR whose added migration sorts at/before the newest on the base branch. **So the fix is to rename your migration to a timestamp after the latest one already in the target** (`/bin/ls -1 apps/server/src/database/migrations | sort | tail`; content unchanged — the filename is the ordering key), then rebuild so the compiled `dist/database/migrations/` picks up the new name. +- **Runtime safety net:** both Migrators (`migration.service.ts` startup auto-migrate + `migrate.ts` CLI) set `allowUnorderedMigrations: true`, so the app does **not** refuse to start on an out-of-order migration — it applies the skipped older one instead of crash-looping. Kysely's `#ensureNoMissingMigrations` guard is still on (a *removed* applied migration is still an error). Because apply order can then differ from lexicographic across instances, migrations must stay **independent** (each creates its own objects) — the CI gate remains the primary line; this net only covers a gate bypass (manual push / hotfix branch). ## Architecture — the big picture diff --git a/apps/server/src/database/migrate.ts b/apps/server/src/database/migrate.ts index a5d58766..bc2db37e 100644 --- a/apps/server/src/database/migrate.ts +++ b/apps/server/src/database/migrate.ts @@ -24,6 +24,10 @@ const migrator = new Migrator({ path, migrationFolder, }), + // Match the startup auto-migrator (migration.service.ts): a back-dated + // migration from a long-lived branch must be applied, not rejected as + // "corrupted migrations" (incident #361). See that file for the full rationale. + allowUnorderedMigrations: true, }); run(db, migrator, migrationFolder); diff --git a/apps/server/src/database/services/migration.service.ts b/apps/server/src/database/services/migration.service.ts index 3247e5f8..2b63f728 100644 --- a/apps/server/src/database/services/migration.service.ts +++ b/apps/server/src/database/services/migration.service.ts @@ -19,6 +19,16 @@ export class MigrationService { path, migrationFolder: path.join(__dirname, '..', 'migrations'), }), + // A long-lived branch can add a migration whose timestamped filename sorts + // BEFORE migrations already applied in prod (e.g. #234's 20260627 landing + // after 20260704 was live). With the default (ordered) setting the startup + // migrator then sees "corrupted migrations" — the applied set is no longer a + // prefix of the sorted list — throws, and the app crash-loops on boot + // (incident #361: 502s for ~11 min). allowUnorderedMigrations runs any + // not-yet-applied migration regardless of filename order, so a back-dated + // migration is applied instead of bricking startup. A CI order-gate still + // discourages back-dating; this is the runtime safety net. + allowUnorderedMigrations: true, }); const { error, results } = await migrator.migrateToLatest();