F2 (security): the base-file removal in materializeStackVersion built its path
with the unclamped filepath.Join, while the read (GetFileContent) and write
(StoreStackFileFromBytesByVersion) in the same function are clamped. An
attacker-influenced fileName (stack.EntryPoint/AdditionalFiles via DB
restore/import) could therefore escape basePath in this file-deleting path. Use the
clamped filesystem.JoinPaths, consistent with the read/write.
F1 (test): the adopt branch (complete v{N} on disk but stale DB metadata -> repoint
ProjectPath + set StackFileVersion + seed Versions) is exactly the post-crash state
the crash-safety guarantee relies on (crash between materialize and DB persist), but
no test exercised it: existing complete-v-dir tests have consistent metadata and take
the no-op short-circuit. Add TestMigrateStackFileVersions_2_44_0_AdoptCompleteVDirStaleMetadata:
seed a complete v1 on disk, stack with stale flat ProjectPath + empty Versions, no
base copies; assert the migration repoints ProjectPath to v1, sets StackFileVersion=1,
seeds Versions, and leaves the v1 files intact.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 2.44.0 stack-file-version migration skipped any stack with StackFileVersion>0,
assuming the versioned on-disk layout (compose/{id}/v{N}) already existed. That is
false for stacks whose files still live only at the flat base path (after EE<->CE
edition switches, DB restores, or older builds): they were skipped and left broken.
GET /stacks/{id}/file?version=N returns 500 (the v{N} dir doesn't exist) while the
stack keeps running.
Replace the blanket StackFileVersion>0 skip with a check on the ACTUAL on-disk state
of the current version: complete v-dir -> adopt (idempotent no-op when already
repointed); missing/incomplete -> heal by materializing v{N} from the flat base
files. Factor the fresh (v==0->v1) and heal (v>0-but-missing) paths into one shared
materializeStackVersion helper, and generalize seedStackVersionMetadata to an
arbitrary version v (heals a StackFileVersion=3 stack into v3, not v1; seeds Versions
only when empty so a real history is never clobbered).
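The on-disk-state check described above can be sketched as a small pure function. Everything here is an assumption made for illustration: the decideMigration name, the action type, and the completeness callback are hypothetical, and treating StackFileVersion==0 as targeting v1 is inferred from the "fresh (v==0->v1)" wording.

```go
package main

import "fmt"

type action string

const (
	actionFresh action = "fresh" // version 0, no complete v1: materialize v1
	actionAdopt action = "adopt" // complete v{N} on disk: fix metadata only
	actionHeal  action = "heal"  // version > 0 but v{N} missing or incomplete
)

// decideMigration is a hypothetical helper. isComplete reports whether the
// v{N} directory for the given version holds the stack's full file set.
func decideMigration(stackFileVersion int, isComplete func(v int) bool) (action, int) {
	target := stackFileVersion
	if target == 0 {
		target = 1 // unversioned stacks are migrated into v1
	}
	if isComplete(target) {
		return actionAdopt, target
	}
	if stackFileVersion == 0 {
		return actionFresh, target
	}
	return actionHeal, target
}

func main() {
	missing := func(int) bool { return false }
	act, v := decideMigration(3, missing)
	fmt.Println(act, v) // heal 3
}
```

Keying the decision on the directory state rather than on StackFileVersion alone is what lets a re-run after a crash fall into the adopt branch.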
Preserves the all-or-nothing guarantee (read every file before writing any; never
write a partial v{N}) and the base-copy-removal ordering (base files are removed only
after ProjectPath is repointed to the v-dir, so a deploy mid-migration or a crash
never finds a missing file — a re-run completes the repoint from the full v{N}).
Closes #30
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds append-only version history on disk (compose/{id}/v{N}/<files>) for
file-based (WorkflowID==0) Compose/Swarm stacks, with rollback to any past
version. Git stacks (versioned by commit) and Kubernetes are untouched.
Backend:
- Stack model: StackFileVersion, PreviousDeploymentInfo, Versions[]; new
StackFileVersionInfo type. APIVersion 2.43.0 -> 2.44.0.
- Versioned multi-file snapshot (entrypoint + AdditionalFiles) into v{N}/;
ProjectPath repointed via GetStackProjectPathByVersion each deploy. Retention
cap (20): Versions[] trimmed in-tx, old dirs deleted only AFTER the tx commits.
- Update handlers: RollbackTo (content read server-side from the target version,
never trusted from the client; validated 1..current & present in Versions).
- Create paths seed v1. stackFile reads ?version= (validated; negative -> 400).
- New GET /stacks/{id}/versions endpoint.
- Migration 2.44.0: move existing file-based stacks' files into v1/ (idempotent,
atomic pre-read of the full file set, skips git/kube/orphans).
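The retention cap in the list above can be illustrated roughly as follows. trimVersions is a hypothetical helper, and representing Versions[] as an ascending slice of version numbers is an assumption made to keep the sketch small.

```go
package main

import "fmt"

const retentionCap = 20 // cap named in the list above

// trimVersions keeps the newest `limit` entries of an ascending version list
// and returns the version numbers that fell off. A limit <= 0 means "no cap".
// Both results are subslices of the input, not copies.
func trimVersions(versions []int, limit int) (kept, dropped []int) {
	if limit <= 0 || len(versions) <= limit {
		return versions, nil
	}
	cut := len(versions) - limit
	return versions[cut:], versions[:cut]
}

func main() {
	versions := make([]int, 0, 23)
	for v := 1; v <= 23; v++ {
		versions = append(versions, v)
	}

	// Inside the transaction: persist only `kept`.
	kept, dropped := trimVersions(versions, retentionCap)

	// After the transaction commits: delete the v{N} directories in `dropped`.
	fmt.Println(len(kept), dropped) // 20 [1 2 3]
}
```

Returning the dropped versions instead of deleting inside the helper keeps the directory cleanup on the caller's side of the commit, so a rolled-back transaction never loses files.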
Frontend:
- useStackVersions query + stackVersions key; StackEditorTab builds the full
history list; StackVersionSelector shows 'v{N} · date · author'; file/versions
caches invalidated (by prefix) after deploy/rollback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P0 Health-gated rollback (standalone auto-update path): capture the previous
image id + reference + healthcheck before the recreate, then poll the new
container's health over a configurable window. On healthy, proceed (and only
then clean up the old image); on unhealthy/exit/timeout, re-tag the old image
back onto the original reference and Recreate (no pull) to restore it, reusing
Recreate's config preservation. The decision is a pure decideRollback() helper.
P1 Per-endpoint enable: ContainerAutomationDisabled flag on Endpoint (zero value
participates, no migration churn), checked by both daemons; settable via the
endpoint update API. UI control deferred (see report).
P2 Notifier seam: minimal Notifier interface + logNotifier, emitting structured
updated/rollback/update-failed/heal-restarted events from the daemon.
Settings: RollbackOnFailure + RollbackTimeout (default 120s) added to
ContainerAutomation.AutoUpdate, wired through defaults/migration/golden,
settings_update validation, the AutoUpdatePanel and the TS types.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional periodic auto-update daemon that detects outdated container
images and applies updates, replacing the containrrr/watchtower sidecar. It
extends M1's containerautomation service/scheduler/labels infrastructure and
reuses the existing zlib image-detection engine, the standalone Recreate path
and the stack deployer.
Backend:
- api/containerautomation/autoupdate.go: scheduler job iterating Docker
(non-edge) endpoints -> in-scope running containers -> ContainerImageStatus;
for Outdated: standalone -> ContainerService.Recreate(pull); stack-managed ->
one stack redeploy-with-pull per stack per tick (git via RedeployWhenChanged,
file via the deployer directly); external compose -> detect only. Monitor-only
containers are status-checked (warms the badge cache) but never applied.
Overlap guard (atomic), pull/registry-auth failure -> leave running container
untouched, conservative cleanup of the dangling old image on the Cleanup flag
(non-forced ImageRemove only succeeds when truly unused).
- labels.go: update enable / monitor-only labels with watchtower aliases,
InUpdateScope, IsMonitorOnly, and pure resolveContainerUpdateRouting /
groupContainersForUpdate (Go analogue of M3's TS routing + grouping).
- service.go: run both jobs, Reload restarts/stops each per settings; NewService
also takes ContainerService, StackDeployer and GitService.
- Settings.ContainerAutomation.AutoUpdate {Enabled, PollInterval, Scope,
Cleanup} with fresh-install defaults and a 2.43.0 backfill (extends M1's
migration; golden test data updated). settings handler validates + reloads.
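The one-redeploy-per-stack grouping from the list above can be sketched as below. The container struct and groupForUpdate are hypothetical simplifications, not the actual resolveContainerUpdateRouting / groupContainersForUpdate code; StackID 0 standing for "standalone" is an assumption.

```go
package main

import "fmt"

// container is a hypothetical, trimmed-down view of an in-scope container.
type container struct {
	Name            string
	StackID         int  // 0 means standalone (assumption for this sketch)
	Outdated        bool // image status reported as Outdated
	MonitorOnly     bool // status-checked but never applied
	ExternalCompose bool // detect only
}

// groupForUpdate returns the standalone containers to recreate and the
// stacks to redeploy, listing each stack once no matter how many of its
// containers are outdated.
func groupForUpdate(cs []container) (standalone []string, stacks []int) {
	seen := make(map[int]bool)
	for _, c := range cs {
		if !c.Outdated || c.MonitorOnly || c.ExternalCompose {
			continue
		}
		if c.StackID == 0 {
			standalone = append(standalone, c.Name)
			continue
		}
		if !seen[c.StackID] {
			seen[c.StackID] = true
			stacks = append(stacks, c.StackID)
		}
	}
	return standalone, stacks
}

func main() {
	cs := []container{
		{Name: "web", StackID: 4, Outdated: true},
		{Name: "db", StackID: 4, Outdated: true},
		{Name: "cache", Outdated: true},
		{Name: "probe", Outdated: true, MonitorOnly: true},
		{Name: "ext", Outdated: true, ExternalCompose: true},
	}
	standalone, stacks := groupForUpdate(cs)
	fmt.Println(standalone, stacks) // [cache] [4]
}
```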
Frontend:
- Global AutoUpdatePanel in SettingsView (enable / poll interval / scope /
cleanup) via useUpdateSettingsMutation, plus settings TS types.
- Read-only per-container Auto-update row in the container details view
(Docker labels are immutable at runtime), surfacing monitor-only.
Tests: Go unit tests for the update label aliases, scope, monitor-only, the
routing decision and the one-redeploy-per-stack grouping; vitest for the panel
and the per-container row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a native, CE-only auto-heal daemon that restarts Docker containers whose
healthcheck reports "unhealthy", replacing the willfarrell/autoheal sidecar.
Backend:
- New package api/containerautomation (service lifecycle + scheduler job,
per-endpoint heal pass, label/scope parsing, in-memory cooldown/retry state).
- Settings.ContainerAutomation.AutoHeal {Enabled, CheckInterval, Scope} with
fresh-install defaults and a 2.43.0 migration backfilling existing installs.
- Settings update handler reloads/stops the job via a small Reloader interface
(no import cycle); service bootstrapped from main.go after stack schedules.
Frontend:
- Global AutoHealPanel in SettingsView (enable / interval / scope) via
useUpdateSettingsMutation, plus settings TS types.
- Read-only per-container Auto-heal row in the container details view (Docker
labels are immutable at runtime; opt-in is set via Create/Edit form labels).
Tests: Go unit tests for label/scope resolution and the cooldown/retry decision;
vitest for the panel and the per-container row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>