Compare commits


9 Commits

Author SHA1 Message Date
agent_coder 0050ad7ebb docs(#363 review F2): update AGENTS.md migration-ordering to the new tolerant behavior
The "Migration ordering" section still described the OLD crash-loop-at-boot
behavior this PR removes ("Kysely refuses to start … rejected at boot"). Rewrote
it to the new two-layer model: the CI migration-order gate is the primary defense
(rename to a current timestamp), and the runtime now sets allowUnorderedMigrations
so the app applies a back-dated migration instead of crash-looping (with the note
that #ensureNoMissingMigrations still guards a removed applied migration, and that
migrations must stay independent since apply order can differ across instances).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 02:57:11 +03:00
agent_coder 7b4617db70 fix(#363 review F1): make the migration-order gate fail CLOSED (not open)
The CI gate — whose whole job is to BLOCK a back-dated migration — could pass
open in exactly the scenario it guards (a long branch vs a moving base, i.e. #361):

- Dropped the redundant `git fetch --depth=1`: the checkout already did
  fetch-depth:0 (full history), and the shallow graft truncated the BASE history,
  so `merge-base` (thus the three-dot `origin/base...HEAD` diff) failed when the
  base had moved ahead of the PR merge commit.
- Removed `|| true` on the diff: it swallowed that failure → `added` empty → loop
  skipped → bad=0 → gate PASS. Now `set -e` aborts the job (fail CLOSED) on any
  diff error — a gate must never pass on error.

Verified: yaml parses (jobs migration-order, test); a broken-ref diff with set -e
and no `|| true` aborts before bad=0 (fail-closed) instead of passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 02:27:16 +03:00
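The fail-open versus fail-closed difference described in this commit can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the workflow's actual code: `DiffFn`, `gateFailOpen`, `gateFailClosed` and `brokenDiff` are hypothetical names, and `DiffFn` stands in for the `git diff` call.

```typescript
// Hypothetical stand-in for the `git diff --diff-filter=A` call; it may throw.
type DiffFn = () => string[];

// Fail-open variant (old behavior): a diff error is swallowed, like `|| true`.
function gateFailOpen(diff: DiffFn, isBackDated: (f: string) => boolean): boolean {
  let added: string[] = [];
  try {
    added = diff();
  } catch {
    // Error swallowed: `added` stays empty, so no file is ever checked.
  }
  // true means the gate passes.
  return !added.some(isBackDated);
}

// Fail-closed variant (the fix): a diff error propagates, like `set -e`.
function gateFailClosed(diff: DiffFn, isBackDated: (f: string) => boolean): boolean {
  const added = diff(); // throws on error, so the caller sees a failure, not a pass
  return !added.some(isBackDated);
}

// Simulates a diff that cannot resolve the merge base.
const brokenDiff: DiffFn = () => {
  throw new Error("no merge base");
};
```

With `brokenDiff`, the fail-open variant reports a pass even though nothing was checked, while the fail-closed variant throws and would abort the job.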
agent_coder 459d636ffb fix(db): prevent the migration-order crash-loop from long-lived branches (#363, incident #361)
A long-lived branch can add a migration whose timestamped filename sorts BEFORE
migrations already applied in prod (#234's 20260627T130000-ai-chat-runs merged
after 20260704T120000-client-metrics was live). Kysely's migrator with the
default ordered setting then rejects the applied set as "corrupted migrations"
(no longer a prefix of the sorted list), throws, and the app crash-loops on boot
— exactly incident #361 (502s for ~11 min after a develop deploy). #119 and #120
(June branches) are the next such threats.

Two layers of defense, both applied:
1. CI migration-order gate (a new `migration-order` job in test.yml, PR-only):
   fails the PR when an added migration sorts at/before the newest migration on
   the base branch, with an actionable message to rename it to a current
   timestamp before merge. This is the primary defense — makes back-dating
   impossible to merge accidentally.
2. `allowUnorderedMigrations: true` on BOTH Migrators (migration.service.ts
   startup auto-migrate + migrate.ts CLI): the runtime safety net — Kysely applies
   a not-yet-applied older migration instead of bricking startup, so a back-dated
   migration that bypasses the gate (manual push / hotfix branch) still boots.
   Trade-off documented inline: apply order across instances may differ from
   lexicographic, so migrations must stay independent (ours each create their own
   objects); the CI gate remains the primary line.

Verified: allowUnorderedMigrations is a valid Kysely 0.28.17 Migrator option;
server tsc clean; the gate script rejects a back-dated filename and passes a
current one. No new deps, no migration, no runtime behavior change beyond the
migrator resilience.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 01:36:57 +03:00
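The "no longer a prefix of the sorted list" condition from this commit message can be illustrated with a simplified model. This is a sketch only and not Kysely's implementation: `isAppliedSetPrefix` and `pendingMigrations` are hypothetical helpers, and the `allowUnordered` flag merely mimics the effect the commit attributes to `allowUnorderedMigrations`.

```typescript
// Simplified model: is the applied list a prefix of the sorted file list?
function isAppliedSetPrefix(allFiles: string[], applied: string[]): boolean {
  const sorted = [...allFiles].sort();
  return applied.every((name, i) => sorted[i] === name);
}

// Returns the migrations still to run, in sorted order.
// In ordered mode a back-dated file makes the applied set a non-prefix, which throws.
function pendingMigrations(
  allFiles: string[],
  applied: string[],
  allowUnordered: boolean,
): string[] {
  if (!allowUnordered && !isAppliedSetPrefix(allFiles, applied)) {
    throw new Error("corrupted migrations (model)");
  }
  return [...allFiles].sort().filter((name) => applied.indexOf(name) === -1);
}

// The two migration names from incident #361, as given in the commit message.
const files = ["20260704T120000-client-metrics", "20260627T130000-ai-chat-runs"];
const applied = ["20260704T120000-client-metrics"];
```

In this model, `pendingMigrations(files, applied, false)` throws, while `pendingMigrations(files, applied, true)` returns the back-dated `20260627T130000-ai-chat-runs` as the only pending migration.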
agent_vscode 5336f06d10 Merge pull request 'fix(e2e)+ci: canonical '> [!info]' callout in e2e-mcp + parallel build with a gate on publish' (#356) from fix/e2e-callout-and-gate-build into develop 2026-07-04 22:42:11 +03:00
agent_vscode 4bd579f7f6 ci(develop): build image in parallel with tests, gate only the publish
Two-phase scheme instead of the sequential gate: the build job runs in
parallel with test/e2e jobs and only warms the buildx GHA cache
(push:false, cache-to mode=max); a new publish job (needs: test,
e2e-server, e2e-mcp, build) rebuilds from the warm cache (near-instant
on hit, full rebuild on eviction — same as the old sequential timing)
and pushes :develop. GHCR login moved to publish; build-args blocks are
kept textually identical between the two jobs so the cache hits.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 22:41:25 +03:00
agent_vscode 7bf1c91a95 ci(develop): gate the :develop image build on e2e suites
Reverse the previous policy where e2e jobs only turned the run red
without blocking the image publish: build.needs now lists test,
e2e-server and e2e-mcp, so a failing test of any kind stops the
:develop image from being built and pushed. Stale policy comments
updated accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 22:33:06 +03:00
agent_vscode 6c82c54470 test(mcp): expect Obsidian '> [!info]' callout export in e2e (#333 canon)
PR #333 deliberately changed the canonical markdown export of callout
nodes to the Obsidian-native format ('> [!type]' + blockquote body,
pinned by packages/prosemirror-markdown unit tests); the importer still
parses both ':::type' fences and '> [!type]'. The get_page e2e assertion
was missed in that switch and still expected ':::info', failing the
e2e-mcp job on develop since 4369bbc5.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 22:33:06 +03:00
agent_vscode 382e5196da Merge pull request 'fix(docker): python3/make/g++ toolchain for the native re2 build' (#353) from fix/docker-re2-toolchain into develop 2026-07-04 22:11:49 +03:00
agent_vscode 76e0c08cec fix(docker): install python3/make/g++ toolchain for re2 native build
The develop image build broke at `pnpm install --frozen-lockfile`: the new
native dependency re2@1.25.0 (packages/mcp, search_in_page #330) always
compiles from source under pnpm — its prebuilt-binary downloader
(install-artifact-from-github) cannot identify the GitHub repo because pnpm
does not populate npm_package_repository_*/npm_package_json env vars ("No
github repository was identified. Building locally ..."), and node:22-slim
ships no python3/make/g++ for the node-gyp fallback.

- builder stage: add a cache-friendly apt layer with python3 make g++
  before COPY; the stage is discarded so the toolchain may stay.
- installer stage: install the toolchain, run the prod install as the node
  user via `su node -c`, and purge the toolchain — all in one RUN layer so
  the final image stays slim and node_modules ownership needs no extra
  chown layer; USER node is restored right after.

Fixes the failed run 28715009124 (develop docker build); release.yml uses
the same Dockerfile and is covered too.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 22:09:40 +03:00
7 changed files with 120 additions and 15 deletions
+42 -11
@@ -18,12 +18,48 @@ env:
   IMAGE: ghcr.io/vvzvlad/gitmost
 jobs:
-  # Run the reusable test suite first so a failing test blocks the image build.
+  # Run the reusable test suite. Together with the e2e jobs below it gates the
+  # publish job (the image push), not the build itself — build runs in parallel.
   test:
     uses: ./.github/workflows/test.yml
+  # Runs in parallel with the test/e2e jobs and only warms the buildx cache
+  # (GHA cache, scope develop-amd64). No push happens here — the publish job
+  # below is the only one that pushes the image.
   build:
-    needs: test
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Resolve version
+        id: version
+        run: echo "value=$(git describe --tags --always)" >> "$GITHUB_OUTPUT"
+      - name: Build develop image (warm cache, no push)
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          platforms: linux/amd64
+          build-args: |
+            APP_VERSION=${{ steps.version.outputs.value }}
+            AI_AGENT_ROLES_CATALOG_URL=https://raw.githubusercontent.com/vvzvlad/gitmost/develop/agent-roles-catalog
+          push: false
+          cache-from: type=gha,scope=develop-amd64
+          cache-to: type=gha,scope=develop-amd64,mode=max,ignore-error=true
+  # The gate: rebuilds from the cache the build job just wrote (near-instant on
+  # a cache hit; worst case — cache eviction — a full rebuild, which matches the
+  # old sequential timing) and pushes :develop only when unit tests AND both
+  # e2e suites AND the build are green.
+  publish:
+    needs: [test, e2e-server, e2e-mcp, build]
     runs-on: ubuntu-latest
     timeout-minutes: 30
     steps:
@@ -57,13 +93,10 @@ jobs:
           push: true
           tags: ${{ env.IMAGE }}:develop
           cache-from: type=gha,scope=develop-amd64
-          cache-to: type=gha,scope=develop-amd64,mode=max,ignore-error=true
-  # e2e jobs run on every develop push but DO NOT gate the build/publish above:
-  # `build` stays `needs: test` only, so the :develop image still ships even if
-  # e2e fails. A failing e2e job turns the run red and triggers GitHub's email
-  # to the pusher — that red run + email is the intended notification, not a
-  # deploy block.
+  # e2e jobs gate the publish (image push), not the build: the :develop image
+  # is pushed only when unit tests AND both e2e suites pass (publish.needs
+  # lists them all).
   e2e-server:
     runs-on: ubuntu-latest
     # Hard cap: the full-AppModule e2e leaks open handles and hung jest to the 6h max.
@@ -124,9 +157,7 @@ jobs:
       - name: Run server e2e
         run: pnpm --filter ./apps/server test:e2e
-  # Same rationale as e2e-server: this job is intentionally NOT in
-  # `build.needs`. Deploy of the :develop image must not be blocked by e2e;
-  # a red run plus GitHub's email to the pusher is the notification mechanism.
+  # Gates the publish too — see the comment above e2e-server.
   e2e-mcp:
     runs-on: ubuntu-latest
     timeout-minutes: 20
+43
@@ -13,6 +13,49 @@ permissions:
   contents: read
 jobs:
+  # Guard against a long-lived branch adding a migration whose timestamped
+  # filename sorts BEFORE migrations already applied on the target branch (and
+  # thus in prod). The Kysely startup migrator rejects that as "corrupted
+  # migrations" and crash-loops the app on boot (incident #361). This gate fails
+  # the PR so the migration is renamed to a current timestamp before merge. Only
+  # runs for pull_request events (needs a base branch to diff against).
+  migration-order:
+    if: github.event_name == 'pull_request'
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - name: Checkout (full history for the base-branch diff)
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Added migrations must sort after the newest on the base branch
+        env:
+          TARGET_BRANCH: ${{ github.base_ref }}
+        run: |
+          set -euo pipefail
+          MIG_DIR="apps/server/src/database/migrations"
+          # checkout above already did fetch-depth:0 (full history). Fetch the base
+          # WITHOUT --depth (a shallow graft would truncate the base history and
+          # break the merge-base when the base has moved ahead of the PR merge —
+          # exactly the long-branch-vs-moving-base case this gate guards, #361).
+          git fetch --no-tags origin "$TARGET_BRANCH"
+          newest_on_target=$(git ls-tree -r --name-only "origin/${TARGET_BRANCH}" "$MIG_DIR" | sort | tail -1)
+          # NO `|| true`: a diff failure (e.g. an unresolved merge-base) must fail
+          # the job CLOSED — a gate whose job is to BLOCK must never pass on error.
+          # `set -e` above already aborts on a non-zero diff exit.
+          added=$(git diff --diff-filter=A --name-only "origin/${TARGET_BRANCH}...HEAD" -- "$MIG_DIR")
+          bad=0
+          for f in $added; do
+            if [[ "$f" < "$newest_on_target" || "$f" == "$newest_on_target" ]]; then
+              echo "::error::Migration $f sorts at or before the newest on ${TARGET_BRANCH} ($newest_on_target) — rename it with a CURRENT timestamp before merge (do not change its contents). See incident #361."
+              bad=1
+            fi
+          done
+          if [ "$bad" -eq 0 ]; then
+            echo "Migration order OK (added migrations all sort after $newest_on_target)."
+          fi
+          exit $bad
   test:
     runs-on: ubuntu-latest
     timeout-minutes: 20
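For readers less familiar with bash string comparison, the ordering rule the `migration-order` step applies can be restated as a pure function. This is an illustrative sketch: `findBackDated` is a hypothetical name, and plain JavaScript string comparison is assumed to match the job's `sort` and `[[ ... < ... ]]` ordering for these ASCII filenames.

```typescript
// Returns the added migrations that sort at or before the newest one on the base.
// An empty result corresponds to `bad=0` in the workflow step.
function findBackDated(added: string[], onBase: string[]): string[] {
  const newest = [...onBase].sort().pop();
  if (newest === undefined) {
    return []; // no migrations on the base, so nothing can be back-dated
  }
  return added.filter((f) => f <= newest);
}
```

For example, with `20260704T120000-client-metrics.ts` already on the base, an added `20260627T130000-ai-chat-runs.ts` is reported, and so is a file with exactly the same name as the newest one.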
+4 -1
@@ -250,7 +250,10 @@ pnpm --filter server migration:codegen # regenerate src/databa
 ```
 Migration files live in `apps/server/src/database/migrations/` and are named `YYYYMMDDThhmmss-description.ts`. Fork-specific migrations only **add** tables (`page_embeddings`, `ai_chats`, `ai_chat_messages`, `ai_provider_credentials`, `ai_mcp_servers`, `page_template_references`) and columns (e.g. `pages.is_template`, a `NOT NULL DEFAULT false` boolean) — never drop/rewrite Docmost data.
-**Migration ordering — always check when merging branches/features.** Kysely runs migrations in **alphabetical (= timestamp) order** and refuses to start if a *new* migration sorts **before** one already applied to the DB (`corrupted migrations: ... must always have a name that comes alphabetically after the last executed migration`). When you merge a branch or land a feature, verify your migration's timestamp still sorts **after every migration that may already be applied on the target** (`/bin/ls -1 apps/server/src/database/migrations | sort | tail`). Branches developed in parallel routinely break this: a feature branch adds `…T130000-…`, `main` meanwhile ships and deploys `…T150000-…`, and after the merge the older-timestamped file is rejected at boot. **Fix = rename your migration to a timestamp after the latest one already in the target** (content unchanged — the filename is the ordering key), then rebuild so the compiled `dist/database/migrations/` picks up the new name.
+**Migration ordering — always check when merging branches/features.** Kysely runs migrations in **alphabetical (= timestamp) order**. A *new* migration that sorts **before** one already applied to the DB is a "back-dated" migration, which branches developed in parallel routinely produce: a feature branch adds `…T130000-…`, `develop` meanwhile ships and deploys `…T150000-…`, and after the merge the older-timestamped file has been skipped. Two layers guard this (both added for incident #361, where a back-dated migration crash-looped prod for ~11 min):
+- **CI gate (primary):** the `migration-order` job in `.github/workflows/test.yml` fails a PR whose added migration sorts at/before the newest on the base branch. **So the fix is to rename your migration to a timestamp after the latest one already in the target** (`/bin/ls -1 apps/server/src/database/migrations | sort | tail`; content unchanged — the filename is the ordering key), then rebuild so the compiled `dist/database/migrations/` picks up the new name.
+- **Runtime safety net:** both Migrators (`migration.service.ts` startup auto-migrate + `migrate.ts` CLI) set `allowUnorderedMigrations: true`, so the app does **not** refuse to start on an out-of-order migration — it applies the skipped older one instead of crash-looping. Kysely's `#ensureNoMissingMigrations` guard is still on (a *removed* applied migration is still an error). Because apply order can then differ from lexicographic across instances, migrations must stay **independent** (each creates its own objects) — the CI gate remains the primary line; this net only covers a gate bypass (manual push / hotfix branch).
 ## Architecture — the big picture
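The rename fix described in the AGENTS.md text above (same content, new timestamp prefix) could be scripted along these lines. This is a hedged sketch: `pad`, `migrationFileName` and `renameToCurrent` are hypothetical helpers that are not part of the repository, and the use of UTC for the timestamp is an assumption.

```typescript
// Left-pads a number with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Builds a `YYYYMMDDThhmmss-description.ts` name from a Date (UTC assumed).
function migrationFileName(now: Date, description: string): string {
  const stamp =
    pad(now.getUTCFullYear(), 4) +
    pad(now.getUTCMonth() + 1, 2) +
    pad(now.getUTCDate(), 2) +
    "T" +
    pad(now.getUTCHours(), 2) +
    pad(now.getUTCMinutes(), 2) +
    pad(now.getUTCSeconds(), 2);
  return `${stamp}-${description}.ts`;
}

// Keeps the description of an existing migration and swaps in a current timestamp.
function renameToCurrent(oldName: string, now: Date): string {
  const description = oldName.replace(/^\d{8}T\d{6}-/, "").replace(/\.ts$/, "");
  return migrationFileName(now, description);
}
```

Only the filename changes; the migration body stays untouched, which matches the note that the filename is the ordering key.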
+16 -2
@@ -5,6 +5,13 @@ RUN npm install -g pnpm@10.4.0
 FROM base AS builder
+# re2 (packages/mcp) always compiles from source under pnpm (the prebuilt-binary
+# download cannot identify the GitHub repo), so node-gyp needs python3/make/g++.
+# This stage is discarded, so the toolchain can stay installed.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends python3 make g++ \
+    && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 COPY . .
@@ -57,9 +64,16 @@ COPY --from=builder /app/patches /app/patches
 RUN chown -R node:node /app
-USER node
+# Toolchain is needed transiently to compile re2 during the prod install; install
+# and purge it in one layer to keep the final image slim. The install itself runs
+# as the node user via su to keep node_modules ownership without a costly chown layer.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends python3 make g++ \
+    && su node -c "pnpm install --frozen-lockfile --prod" \
+    && apt-get purge -y --auto-remove python3 make g++ \
+    && rm -rf /var/lib/apt/lists/*
-RUN pnpm install --frozen-lockfile --prod
+USER node
 RUN mkdir -p /app/data/storage
+4
@@ -24,6 +24,10 @@ const migrator = new Migrator({
     path,
     migrationFolder,
   }),
+  // Match the startup auto-migrator (migration.service.ts): a back-dated
+  // migration from a long-lived branch must be applied, not rejected as
+  // "corrupted migrations" (incident #361). See that file for the full rationale.
+  allowUnorderedMigrations: true,
 });
 run(db, migrator, migrationFolder);
@@ -19,6 +19,16 @@ export class MigrationService {
         path,
         migrationFolder: path.join(__dirname, '..', 'migrations'),
       }),
+      // A long-lived branch can add a migration whose timestamped filename sorts
+      // BEFORE migrations already applied in prod (e.g. #234's 20260627 landing
+      // after 20260704 was live). With the default (ordered) setting the startup
+      // migrator then sees "corrupted migrations" — the applied set is no longer a
+      // prefix of the sorted list — throws, and the app crash-loops on boot
+      // (incident #361: 502s for ~11 min). allowUnorderedMigrations runs any
+      // not-yet-applied migration regardless of filename order, so a back-dated
+      // migration is applied instead of bricking startup. A CI order-gate still
+      // discourages back-dating; this is the runtime safety net.
+      allowUnorderedMigrations: true,
     });
     const { error, results } = await migrator.migrateToLatest();
+1 -1
@@ -450,7 +450,7 @@ async function main() {
   // 8. get_page markdown round-trip sanity (table separator present)
   const md = await client.getPage(pageId);
   check("get_page md: table separator emitted", md.data.content.includes("| --- |"), "");
-  check("get_page md: callout exported as :::", md.data.content.includes(":::info"));
+  check("get_page md: callout exported as Obsidian '> [!info]'", md.data.content.includes("> [!info]"));
   // 9. comments: create / list / reply / update / check_new / delete
   const beforeComments = new Date(Date.now() - 1000).toISOString();