Split the single container-automation webhook URL into two independently
optional URLs — UpdateWebhookURL (fired on update/rollback/update-failed) and
HealWebhookURL (fired on auto-heal restart). The notifier routes each event to
its mechanism's URL by kind; an empty URL silences only that mechanism, so a
user can enable notifications for updates without heal (or vice-versa).
Settings gain both fields (each validated http/https, {{message}} allowed), the
NotificationPanel exposes two labeled inputs, and the golden migration output is
updated. Delivery path (goroutine/recover/timeout, {{message}} GET vs POST,
per-container stack message format) is unchanged.
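A minimal sketch of the routing by kind, for illustration only. The event names
come from the notifier seam; the Go identifiers (EventKind, NotificationSettings,
webhookURLFor) are hypothetical and may not match the real code.

```go
package main

import "fmt"

// EventKind values mirror the event names used in this log; the Go
// identifiers themselves are hypothetical.
type EventKind string

const (
	KindUpdated       EventKind = "updated"
	KindRollback      EventKind = "rollback"
	KindUpdateFailed  EventKind = "update-failed"
	KindHealRestarted EventKind = "heal-restarted"
)

// NotificationSettings holds the two independently optional URLs.
type NotificationSettings struct {
	UpdateWebhookURL string
	HealWebhookURL   string
}

// webhookURLFor returns the URL of the mechanism that produced the event.
// An empty result means the event is not delivered.
func webhookURLFor(s NotificationSettings, kind EventKind) string {
	switch kind {
	case KindUpdated, KindRollback, KindUpdateFailed:
		return s.UpdateWebhookURL
	case KindHealRestarted:
		return s.HealWebhookURL
	default:
		return ""
	}
}

func main() {
	// Only the update URL is set, so heal events resolve to an empty URL.
	s := NotificationSettings{UpdateWebhookURL: "https://example.test/update"}
	for _, k := range []EventKind{KindUpdated, KindHealRestarted} {
		fmt.Printf("%s -> %q\n", k, webhookURLFor(s, k))
	}
}
```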
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an opt-in webhook notification for container-automation events (image
update, rollback, update-failed, auto-heal restart), plugging into the existing
Notifier seam in notify.go.
- Settings: new ContainerAutomation.Notification.WebhookURL (shared across
update + heal), persisted and validated in the settings update handler
(optional; http/https only; accepts the {{message}} placeholder).
- webhookNotifier reads the current URL from the datastore per event (UI changes
take effect without a restart). If the URL contains {{message}} it substitutes
the URL-encoded message and issues a GET; otherwise it POSTs the message as the
body. Delivery and the env/stack name lookups run in a goroutine with a 10s
timeout, and recover() catches any panic, so delivery is strictly best-effort
and never blocks or crashes the automation daemon. multiNotifier fans events
out to logNotifier + webhook and isolates a panic in any one notifier.
- Message format (maintainer's spec):
    Environment | <env>
    Stack [<name>] (Container [<name>] for non-stack events)
    Update [<name>]: <old> -> <new>
    Auto-heal: 'Auto-heal: restarted unhealthy container'.
- New NotificationPanel in settings to configure the URL.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P0 Health-gated rollback (standalone auto-update path): capture the previous
image id + reference + healthcheck before the recreate, then poll the new
container's health over a configurable window. If it becomes healthy, proceed
(and only then clean up the old image); if it turns unhealthy, exits, or the
window times out, re-tag the old image back onto the original reference and
Recreate (no pull) to restore it, reusing Recreate's config preservation. The
decision is a pure decideRollback() helper.
P1 Per-endpoint enable: ContainerAutomationDisabled flag on Endpoint (the zero
value keeps the endpoint participating, so no migration is needed), checked by
both daemons and settable via the endpoint update API. UI control deferred (see
report).
P2 Notifier seam: minimal Notifier interface + logNotifier, emitting structured
updated/rollback/update-failed/heal-restarted events from the daemon.
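A rough sketch of what such a seam could look like, including a fan-out that
isolates a panic in any one notifier (as multiNotifier is described elsewhere
in this log). The Event fields, the log line format and notifierFunc are
assumptions made for the example.

```go
package main

import (
	"fmt"
	"log"
)

// Event is a hypothetical structured event; the real fields may differ.
type Event struct {
	Kind        string // "updated", "rollback", "update-failed" or "heal-restarted"
	Environment string
	Container   string
	Detail      string
}

// Notifier is the minimal seam: one method, no return value.
type Notifier interface {
	Notify(Event)
}

// logNotifier writes each event as a key=value log line.
type logNotifier struct{}

func (logNotifier) Notify(e Event) {
	log.Printf("container-automation kind=%s env=%q container=%q detail=%q",
		e.Kind, e.Environment, e.Container, e.Detail)
}

// multiNotifier calls every notifier and recovers per notifier, so one
// panicking implementation cannot stop the others or the caller.
type multiNotifier []Notifier

func (m multiNotifier) Notify(e Event) {
	for _, n := range m {
		func() {
			defer func() {
				if r := recover(); r != nil {
					log.Printf("notifier panicked: %v", r)
				}
			}()
			n.Notify(e)
		}()
	}
}

// notifierFunc adapts a plain function to the Notifier interface.
type notifierFunc func(Event)

func (f notifierFunc) Notify(e Event) { f(e) }

func main() {
	n := multiNotifier{
		notifierFunc(func(Event) { panic("boom") }),
		logNotifier{},
	}
	n.Notify(Event{Kind: "heal-restarted", Environment: "local", Container: "web"})
	fmt.Println("caller continues after a notifier panic")
}
```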
Settings: RollbackOnFailure + RollbackTimeout (default 120s) added to
ContainerAutomation.AutoUpdate, wired through defaults/migration/golden,
settings_update validation, the AutoUpdatePanel and the TS types.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional periodic auto-update daemon that detects outdated container
images and applies updates, replacing the containrrr/watchtower sidecar. It
extends M1's containerautomation service/scheduler/labels infrastructure and
reuses the existing zlib image-detection engine, the standalone Recreate path
and the stack deployer.
Backend:
- api/containerautomation/autoupdate.go: scheduler job iterating Docker
(non-edge) endpoints -> in-scope running containers -> ContainerImageStatus;
for Outdated: standalone -> ContainerService.Recreate(pull); stack-managed ->
one stack redeploy-with-pull per stack per tick (git via RedeployWhenChanged,
file via the deployer directly); external compose -> detect only. Monitor-only
containers are status-checked (warms the badge cache) but never applied.
An atomic overlap guard prevents overlapping runs; on pull/registry-auth
failure the running container is left untouched; with the Cleanup flag the
dangling old image is removed conservatively (a non-forced ImageRemove only
succeeds when the image is truly unused).
- labels.go: update enable / monitor-only labels with watchtower aliases,
InUpdateScope, IsMonitorOnly, and pure resolveContainerUpdateRouting /
groupContainersForUpdate (Go analogue of M3's TS routing + grouping).
- service.go: run both jobs, Reload restarts/stops each per settings; NewService
also takes ContainerService, StackDeployer and GitService.
- Settings.ContainerAutomation.AutoUpdate {Enabled, PollInterval, Scope,
Cleanup} with fresh-install defaults and a 2.43.0 backfill (extends M1's
migration; golden test data updated). settings handler validates + reloads.
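To illustrate the routing and the one-redeploy-per-stack grouping, here is a
small sketch. It is not the real resolveContainerUpdateRouting /
groupContainersForUpdate: the types, the planUpdates name and the way a
stack-managed or external-compose container is recognized (StackID,
ComposeProject) are assumptions.

```go
package main

import "fmt"

// outdatedContainer is a hypothetical, already-resolved view of a container
// whose image status is Outdated.
type outdatedContainer struct {
	Name           string
	StackID        int    // non-zero when the container belongs to a managed stack
	ComposeProject string // non-empty for compose-created containers
}

// updatePlan is what one tick would act on.
type updatePlan struct {
	Recreate       []string // standalone containers: recreate with pull
	StackRedeploys []int    // managed stacks: one redeploy-with-pull each
	DetectOnly     []string // external compose: report only, never apply
}

// planUpdates routes each container and deduplicates stacks so that a stack
// with several outdated containers is redeployed once per tick.
func planUpdates(containers []outdatedContainer) updatePlan {
	var plan updatePlan
	seen := map[int]bool{}
	for _, c := range containers {
		switch {
		case c.StackID != 0:
			if !seen[c.StackID] {
				seen[c.StackID] = true
				plan.StackRedeploys = append(plan.StackRedeploys, c.StackID)
			}
		case c.ComposeProject != "":
			plan.DetectOnly = append(plan.DetectOnly, c.Name)
		default:
			plan.Recreate = append(plan.Recreate, c.Name)
		}
	}
	return plan
}

func main() {
	plan := planUpdates([]outdatedContainer{
		{Name: "web", StackID: 7, ComposeProject: "shop"},
		{Name: "worker", StackID: 7, ComposeProject: "shop"},
		{Name: "cache"},
		{Name: "legacy", ComposeProject: "external"},
	})
	fmt.Printf("%+v\n", plan)
}
```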
Frontend:
- Global AutoUpdatePanel in SettingsView (enable / poll interval / scope /
cleanup) via useUpdateSettingsMutation, plus settings TS types.
- Read-only per-container Auto-update row in the container details view
(Docker labels are immutable at runtime), surfacing monitor-only.
Tests: Go unit tests for the update label aliases, scope, monitor-only, the
routing decision and the one-redeploy-per-stack grouping; vitest for the panel
and the per-container row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a native, CE-only auto-heal daemon that restarts Docker containers whose
healthcheck reports "unhealthy", replacing the willfarrell/autoheal sidecar.
Backend:
- New package api/containerautomation (service lifecycle + scheduler job,
per-endpoint heal pass, label/scope parsing, in-memory cooldown/retry state).
- Settings.ContainerAutomation.AutoHeal {Enabled, CheckInterval, Scope} with
fresh-install defaults and a 2.43.0 migration backfilling existing installs.
- Settings update handler reloads/stops the job via a small Reloader interface
(no import cycle); service bootstrapped from main.go after stack schedules.
Frontend:
- Global AutoHealPanel in SettingsView (enable / interval / scope) via
useUpdateSettingsMutation, plus settings TS types.
- Read-only per-container Auto-heal row in the container details view (Docker
labels are immutable at runtime; opt-in is set via Create/Edit form labels).
Tests: Go unit tests for label/scope resolution and the cooldown/retry decision;
vitest for the panel and the per-container row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>