1181 Commits

Author SHA1 Message Date
594312a777 Merge pull request 'feat(automation): native container auto-update (Watchtower-style) + auto-heal (#3)' (#19) from feat/3-auto-update into develop
Reviewed-on: #19
2026-07-01 23:25:24 +03:00
agent_coder
492d3d01b0 feat(#19): separate webhook per automation mechanism (update vs heal)
Split the single container-automation webhook URL into two independently
optional URLs — UpdateWebhookURL (fired on update/rollback/update-failed) and
HealWebhookURL (fired on auto-heal restart). The notifier routes each event to
its mechanism's URL by kind; an empty URL silences only that mechanism, so a
user can enable notifications for updates without heal (or vice-versa).

Settings gain both fields (each validated http/https, {{message}} allowed), the
NotificationPanel exposes two labeled inputs, and the golden migration output is
updated. Delivery path (goroutine/recover/timeout, {{message}} GET vs POST,
per-container stack message format) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 22:47:25 +03:00
agent_coder
eb35e9c47f feat(automation): configurable webhook notifier for automation events
Add an opt-in webhook notification for container-automation events (image
update, rollback, update-failed, auto-heal restart), plugging into the existing
Notifier seam in notify.go.

- Settings: new ContainerAutomation.Notification.WebhookURL (shared across
  update + heal), persisted and validated in the settings update handler
  (optional; http/https only; accepts the {{message}} placeholder).
- webhookNotifier reads the current URL from the datastore per event (UI changes
  take effect without a restart). If the URL contains {{message}} it substitutes
  the URL-encoded message and issues a GET; otherwise it POSTs the message as the
  body. Delivery and the env/stack name lookups run in a goroutine under
  recover() with a 10s timeout, so the notifier is strictly best-effort: a slow
  endpoint or a panic never blocks or crashes the automation daemon.
  multiNotifier fans events out to logNotifier + webhook and isolates a panic in
  any one notifier.
- Message format (maintainer's spec):
    Environment | <env>
    Stack [<name>]            (Container [<name>] for non-stack events)
    Update [<name>]: <old> -> <new>
  Auto-heal events use the message 'Auto-heal: restarted unhealthy container'.
- New NotificationPanel in settings to configure the URL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 19:31:18 +03:00
agent_coder
6aecdfbe46 feat(containers): interactive image-status badge (click to update / re-check)
Make the container image-status badge actionable, matching native Portainer:
- Clicking "Update available" opens the update confirm dialog and runs the
  existing update flow (standalone recreate-with-pull / stack redeploy), gated
  and disabled while in flight to avoid a double submit. The confirm+apply logic
  is extracted from UpdateNowButton into a shared useApplyContainerImageUpdate
  hook so the details button and the list badge share one implementation.
- Clicking "Up to date" re-queries the registry. Because the server caches image
  status (statusCache 5m + remoteDigestCache 5s), a plain refetch was a no-op, so
  the endpoint gains an optional ?force=true that bypasses BOTH caches for a
  manual re-check while still repopulating them; the default (auto badges + the
  auto-update daemon) keeps using the caches unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 19:04:49 +03:00
claude code agent
7257ae52d8 test(logs): cover the docker proxy stream/flush loop (F1)
Extract the manual stream-and-flush loop from dockerLocalProxy.ServeHTTP
into a behaviour-preserving package-private streamResponse(w, body) helper,
and add docker_test.go regression tests for the riskiest path (it runs on
every Docker API response):

- DeliversFullBodyAndFlushesPerChunk: a >32KB body delivered as several
  chunks (boundaries not aligned to the 32KB buffer), with the final Read
  returning (n>0, io.EOF) simultaneously, asserts the streamed body equals
  the input exactly (no loss/duplication) and that Flush ran more than once
  (the per-chunk flush is the whole point of the change).
- StopsOnWriteErrorWithoutPanic: a writer that errors on first Write (and
  does not implement http.Flusher, exercising the nil-flusher fallback)
  breaks the loop after one write without panicking.

No production behaviour change — the loop body is identical, only moved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 02:58:48 +03:00
claude code agent
637e96f236 fix(logs): flush docker proxy stream per chunk; trim log-viewer settings UI
Backend (the "logs arrive every ~5s / pipe clogged" bug):
- dockerLocalProxy.ServeHTTP streamed the docker socket response via
  io.Copy into the ResponseWriter, which buffers ~2KB and only flushes
  when full or on handler return. Low-throughput streaming endpoints
  (container logs follow=1, events, stats, attach) therefore arrived in
  multi-second batches. Stream manually and Flush() after each chunk so
  they are delivered live. Behaviour is otherwise identical to io.Copy
  (full-write contract, EOF handling, Debug error logging); hijacked
  attach/exec go through a separate websocket handler, unaffected.
- NewSingleHostReverseProxyWithHostHeader: set FlushInterval = -1 so the
  remote-endpoint path streams live too.

Frontend (maintainer UI asks):
- Remove the line-selection mechanic entirely (Copy-selected-lines and
  Unselect buttons, selectLine/copySelection/clearSelection, selectedLines
  state, line_selected highlight): selecting/copying is mouse-native. Copy
  (all visible) and Download stay.
- Rename the unclear "Fetch" since-selector label to "Since".
- Move the settings controls into the widget header (rd-widget-header
  default transclude slot) so they share one row with the "Log viewer
  settings" title, reclaiming vertical space for the log pane.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 02:05:02 +03:00
claude code agent
be3bfd0513 fix(automation): maintainer pre-merge review — stale detection, daemon edge cases, parity (F1-F9)
F1: cap the image-status cache TTL at 5m (was 24h) — the cache is keyed by the
    LOCAL imageID, which doesn't change when upstream pushes a new image under the
    same tag, so the 24h TTL hid new images from both the badge and the auto-update
    daemon; a short TTL re-resolves the remote digest within the poll window.
F2: document that the update->rollback guard map is in-memory (restart implication).
F3: skip auto-update for an unnamed container when rollback is on (the endpoint+name
    keyed guard can't record it, so it would loop) — pure skipUnnamedForRollback + test.
F4: wrap the pre-update ContainerInspect in context.WithTimeout(endpointTimeout).
F5: document Reload() does not interrupt an in-flight tick.
F6: floor auto-heal CheckInterval at 1s (mirrors auto-update) + test.
F7: wontfix — migration is currently correct; namespace rework is out of scope.
F8: correct the misleading SSRF/AllowList comment (no filter is applied).
F9: front auto-heal interval floor + test; dedup STALE_TIME; fix invalidation comment.
Also refresh three stale '24h/long-lived cache' comments to match the 5m TTL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 19:51:15 +03:00
claude code agent
922f506fe5 feat(automation): guard update→rollback loop; name Settings types; tests & doc fixes (F1-F7)
F1: record rolled-back targets per service (endpointID/containerName + remote
    digest) and skip auto-update during a 24h cooldown unless the remote digest
    changes — breaks the infinite update→rollback loop on a persistently
    unhealthy image, without blocking a genuinely new image.
F2: unit-test applyContainerUpdate dispatch/payload mapping.
F3: settings_update.go comments mention auto-heal AND auto-update.
F4: drop stale '(future M4)' TS docs; primitives are frontend-only.
F5: replace the anonymous ContainerAutomation settings struct with named
    types (identical JSON tags).
F6: drop parseEnable (duplicate of boolLabel).
F7: remove the unused gitService dependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:29:57 +03:00
claude code agent
70f7fe5e84 Merge remote-tracking branch 'origin/feat/10-update-now' into feat/3-auto-update 2026-06-29 12:48:24 +03:00
claude code agent
cdf17d904d fix(automation): rollback robustness — transient inspect, start_period, digest images, shutdown, event order (#12 review)
F1: tolerate up to 3 consecutive health-gate inspect failures (reset on
success) before declaring an update failed, so a transient Docker API blip no
longer triggers a false rollback.

F2: detect baseCtx cancellation during the gate and abort without rolling back
or emitting update-failed (debug log only), instead of a misleading
"rollback failed" event on every shutdown mid-gate.

F3: derive the gate deadline as start + max(RollbackTimeout, StartPeriod+buffer)
via effectiveRollbackDeadline, reading the container's healthcheck StartPeriod
so a legitimately slow-starting container is not rolled back while starting.

F4: only enable the gate when the original reference is a proper tag (new
isTagReference helper); skip with a log line for digest-pinned / bare-image-id
containers that cannot be re-tagged.

F5: document the sequential-tick delay limitation of the gate poll.

F6: emit EventUpdated only after the gate confirms healthy (or immediately when
no gate is active); the rollback path emits only EventRollback, so the event
sequence is truthful.

F7: floor RollbackTimeout at 10s in backend and frontend validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 10:57:54 +03:00
claude code agent
32a2b7a9ae feat(automation): health-gated rollback + per-endpoint + notify hook (#12, epic #3 M5)
P0 Health-gated rollback (standalone auto-update path): capture the previous
image id + reference + healthcheck before the recreate, then poll the new
container's health over a configurable window. On healthy proceed (and only
then clean up the old image); on unhealthy/exit/timeout re-tag the old image
back onto the original reference and Recreate (no pull) to restore it, reusing
Recreate's config preservation. The decision is a pure decideRollback() helper.

P1 Per-endpoint enable: ContainerAutomationDisabled flag on Endpoint (zero value
participates, no migration churn), checked by both daemons; settable via the
endpoint update API. UI control deferred (see report).

P2 Notifier seam: minimal Notifier interface + logNotifier, emitting structured
updated/rollback/update-failed/heal-restarted events from the daemon.

Settings: RollbackOnFailure + RollbackTimeout (default 120s) added to
ContainerAutomation.AutoUpdate, wired through defaults/migration/golden,
settings_update validation, the AutoUpdatePanel and the TS types.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 10:41:55 +03:00
claude code agent
21b5ec3e05 fix(automation): git-stack honesty + ECR registry refresh + interval floor (#11 review)
F1: Stop routing git-backed stacks through a per-tick RedeployWhenChanged for
image-only updates. The git redeploy path short-circuits when the commit is
unchanged (so an upstream-digest update never applies) yet still git-fetches
every tick. Git stacks are now detect-only in the auto-apply path; their image
update lands on the next git change or via manual "Update now". File (non-git)
stacks still force-pull-redeploy immediately. The AutoUpdatePanel text no longer
promises daemon auto-update for git/externally-managed containers.

F2: Resolve registries for the file-stack redeploy the same way the established
userless/system path (RedeployWhenChanged) does, via the new
deployments.ResolveStackRegistries: scope to the stack author's endpoint access
and RefreshAndPersistECRTokens, instead of hand-passing Registry().ReadAll().
ECR-backed stacks now auto-update with fresh tokens.

F3: Add a 1m floor for the auto-update poll interval, enforced in the settings
Validate and mirrored in the frontend validation.

F4: Thread the application shutdownCtx into NewService and use it as the base
for the heal/update job operation contexts, so shutdown cancels in-flight work.

F5: Correct the updateEndpoint comment about monitor-only badge-cache warming
(only in-scope monitor-only containers are status-checked).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 10:24:58 +03:00
claude code agent
b3ae5f3659 feat(automation): native auto-update daemon (#11, epic #3 M4)
Add an optional periodic auto-update daemon that detects outdated container
images and applies updates, replacing the containrrr/watchtower sidecar. It
extends M1's containerautomation service/scheduler/labels infrastructure and
reuses the existing zlib image-detection engine, the standalone Recreate path
and the stack deployer.

Backend:
- api/containerautomation/autoupdate.go: scheduler job iterating Docker
  (non-edge) endpoints -> in-scope running containers -> ContainerImageStatus;
  for Outdated: standalone -> ContainerService.Recreate(pull); stack-managed ->
  one stack redeploy-with-pull per stack per tick (git via RedeployWhenChanged,
  file via the deployer directly); external compose -> detect only. Monitor-only
  containers are status-checked (warms the badge cache) but never applied.
  Overlap guard (atomic), pull/registry-auth failure -> leave running container
  untouched, conservative cleanup of the dangling old image on the Cleanup flag
  (non-forced ImageRemove only succeeds when truly unused).
- labels.go: update enable / monitor-only labels with watchtower aliases,
  InUpdateScope, IsMonitorOnly, and pure resolveContainerUpdateRouting /
  groupContainersForUpdate (Go analogue of M3's TS routing + grouping).
- service.go: run both jobs, Reload restarts/stops each per settings; NewService
  also takes ContainerService, StackDeployer and GitService.
- Settings.ContainerAutomation.AutoUpdate {Enabled, PollInterval, Scope,
  Cleanup} with fresh-install defaults and a 2.43.0 backfill (extends M1's
  migration; golden test data updated). settings handler validates + reloads.
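
A sketch of the routing decision and the one-redeploy-per-stack grouping. The containerInfo type and its fields are assumptions; the real code routes on labels and stack metadata. The later git-stack detect-only change (21b5ec3e05) is not modeled here.

```go
package main

import "fmt"

type updateRoute int

const (
	routeStandaloneRecreate updateRoute = iota // ContainerService.Recreate(pull)
	routeStackRedeploy                         // one redeploy-with-pull per stack
	routeDetectOnly                            // external compose or monitor-only
)

// containerInfo is a hypothetical view of an outdated, in-scope container.
type containerInfo struct {
	ID             string
	StackID        int    // 0 = not managed by a Portainer stack
	ComposeProject string // non-empty for externally managed compose
	MonitorOnly    bool
}

func resolveContainerUpdateRouting(c containerInfo) updateRoute {
	switch {
	case c.MonitorOnly:
		return routeDetectOnly
	case c.StackID != 0:
		return routeStackRedeploy
	case c.ComposeProject != "":
		return routeDetectOnly
	default:
		return routeStandaloneRecreate
	}
}

// groupContainersForUpdate returns the standalone containers to recreate and
// the stacks to redeploy, each stack at most once per tick (in first-seen order).
func groupContainersForUpdate(outdated []containerInfo) (standalone []string, stacks []int) {
	seen := make(map[int]bool)
	for _, c := range outdated {
		switch resolveContainerUpdateRouting(c) {
		case routeStandaloneRecreate:
			standalone = append(standalone, c.ID)
		case routeStackRedeploy:
			if !seen[c.StackID] {
				seen[c.StackID] = true
				stacks = append(stacks, c.StackID)
			}
		}
	}
	return standalone, stacks
}

func main() {
	standalone, stacks := groupContainersForUpdate([]containerInfo{
		{ID: "a"}, {ID: "b", StackID: 7}, {ID: "c", StackID: 7},
	})
	fmt.Println(standalone, stacks)
}
```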

Frontend:
- Global AutoUpdatePanel in SettingsView (enable / poll interval / scope /
  cleanup) via useUpdateSettingsMutation, plus settings TS types.
- Read-only per-container Auto-update row in the container details view
  (Docker labels are immutable at runtime), surfacing monitor-only.

Tests: Go unit tests for the update label aliases, scope, monitor-only, the
routing decision and the one-redeploy-per-stack grouping; vitest for the panel
and the per-container row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 10:04:09 +03:00
claude code agent
7eaff4dab0 fix(automation): real status cache read + nodeName key + honest errors (#9 review)
F1: ContainerImageStatus now reads the 24h statusCache (keyed by imageID)
before the remote registry digest lookup, so the cache is effective on the
input side for all callers instead of being write-only. This avoids the
rate-limited registry HEAD on repeat loads.

F2: add nodeName to the imageStatus query key so cached results cannot be
reused across nodes.

F3: correct the swagger annotations to reflect that engine-level issues
degrade to a 200 skipped/error status rather than 400/404.

F4: return a generic error message to the client instead of the raw
registry/engine error; the raw error is still logged server-side.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 09:09:18 +03:00
claude code agent
f69eb3f9eb feat(automation): CE container image update detection endpoint + badge (#9, epic #3 M2)
Add native CE detection of "a newer image is available" for running
containers, surfaced as a read-only HTTP endpoint and a containers-list
badge/column. No applying of updates (M3/M4), no auto-heal (M1).

Backend:
- New CE handler GET /docker/{id}/containers/{containerId}/image_status
  backed by the existing zlib/CE digest engine
  (images.NewClientWithRegistry + ContainerImageStatus). Honors nodeName,
  authz, and routes registry calls through the credential store / SSRF
  AllowList. Engine failures degrade to a 200 {Status:"error"} so the UI
  stays graceful. Response shape: {Status, Message?}.

Frontend (CE-only, no isBE gating; the EE ImageStatus component is left
untouched):
- useContainerImageStatus TanStack Query hook (5min staleTime, no
  refetch-on-focus; backend caches 24h) calling the non-proxied endpoint.
- UpdateStatusBadge component (own assets, neutral on skipped/error).
- "Update available" column in the containers datatable; one cached,
  non-blocking query per visible row.

Tests: Go response-shape unit test; vitest for the badge (all statuses)
and the hook (url + nodeName query param via msw).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 08:59:54 +03:00
claude code agent
51957d2f98 feat(automation): native auto-heal daemon (#8, epic #3 M1)
Add a native, CE-only auto-heal daemon that restarts Docker containers whose
healthcheck reports "unhealthy", replacing the willfarrell/autoheal sidecar.

Backend:
- New package api/containerautomation (service lifecycle + scheduler job,
  per-endpoint heal pass, label/scope parsing, in-memory cooldown/retry state).
- Settings.ContainerAutomation.AutoHeal {Enabled, CheckInterval, Scope} with
  fresh-install defaults and a 2.43.0 migration backfilling existing installs.
- Settings update handler reloads/stops the job via a small Reloader interface
  (no import cycle); service bootstrapped from main.go after stack schedules.

Frontend:
- Global AutoHealPanel in SettingsView (enable / interval / scope) via
  useUpdateSettingsMutation, plus settings TS types.
- Read-only per-container Auto-heal row in the container details view (Docker
  labels are immutable at runtime; opt-in is set via Create/Edit form labels).

Tests: Go unit tests for label/scope resolution and the cooldown/retry decision;
vitest for the panel and the per-container row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 08:22:46 +03:00
andres-portainer
e664bf0e19 fix(helm): add missing SSRF protections BE-13136 (#3001) 2026-06-22 20:25:10 -03:00
andres-portainer
a6370808ae fix(ssrf): disable HTTP/2 for some specific cases BE-13121 (#2996) 2026-06-22 16:13:43 -03:00
Phil Calder
f596c862b3 fix(websocket): enforce environment authorization on kubernetes-shell [BE-13027] (#2774)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: oscarzhou <oscar.zhou@portainer.io>
2026-06-22 15:09:41 +12:00
bernard-portainer
5395dee4c6 feat(gpu-stats): add gpu stats to environments [C9S-200] (#2735) 2026-06-22 09:21:43 +12:00
andres-portainer
26334e9088 feat(ssrf): add missing transport wrappings and more checks BE-13021 (#2968) 2026-06-19 20:26:03 -03:00
RHCowan
37bd8c06b5 fix(security): gate docker dashboard and edge async command routes [R8S-1057] (#2953) 2026-06-19 11:08:01 +12:00
Chaim Lev-Ari
4d539a691d feat(custom-templates): reuse existing git sources in create/update [BE-13053] (#2925)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 21:45:35 +03:00
Chaim Lev-Ari
ee8e73d7f9 feat(edge/stacks): use source ID for edge stack git creation [BE-13044] (#2926)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-16 17:33:19 +03:00
Chaim Lev-Ari
d9673e33ec feat(helm): reuse existing git sources in Kubernetes Helm-from-git install [BE-13046] (#2900)
Co-authored-by: Claude <noreply@anthropic.com>
2026-06-15 22:01:31 +03:00
andres-portainer
16b5554f66 fix(customtemplates): add resource controls BE-13019 (#2897) 2026-06-15 14:59:07 -03:00
Chaim Lev-Ari
fcdd6b4510 feat(stacks): use source id to create git stacks [BE-13043] (#2870)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 18:49:26 +03:00
Devon Steenberg
8b21dfc318 feat(ssrf): add ssrf allow list to settings [BE-13021] (#2858) 2026-06-12 15:16:06 +12:00
andres-portainer
0da42c01b6 feat(gitcredential): remove GitCredential BE-12919 (#2838) 2026-06-11 18:53:24 -03:00
Steven Kang
1cd6017df6 fix(api): add endpoint authorization check to /api/kubernetes/{id}/* route - develop [R8S-1056] (#2829) 2026-06-11 09:49:50 +12:00
andres-portainer
babb4ffb37 fix(nolint): remove unnecessary nolint directives BE-13074 (#2852) 2026-06-10 15:35:08 -03:00
LP B
0c2f07988a feat(app/sources): source create view (#2680)
Co-authored-by: Chaim Lev-Ari <chaim.lev-ari@portainer.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 21:34:46 +03:00
andres-portainer
1765e41fd4 feat(ssrf): implement an SSRF protection mechanism BE-13021 (#2818) 2026-06-09 00:41:42 -03:00
andres-portainer
df7a4b5d6f feat(gitops): improve the data model BE-12919 (#2819) 2026-06-08 15:01:55 -03:00
Josiah Clumont
e3e2a3b782 fix(environments): Environment Groups detail view environment breakdown regression [BE-13051] (#2828) 2026-06-08 16:03:32 +12:00
andres-portainer
8daf0bb2a9 feat(customtemplates): use Sources for CustomTemplates BE-12919 (#2759) 2026-06-05 01:51:18 -03:00
Chaim Lev-Ari
d2b56efcb4 feat(security): require setup token for admin init and restore [BE-13029] (#2770) 2026-06-04 09:15:23 +03:00
Hannah Cooper
916367dccb fix(api-docs): time.Duration bounds fix + linting fixes [C9S-223] (#2762) 2026-06-04 15:14:07 +12:00
Chaim Lev-Ari
2ba8b582e2 feat(api): use generated api client [BE-12901] (#2727) 2026-06-03 14:37:39 +03:00
Chaim Lev-Ari
bc81eb7a22 feat(sources): allow user to edit source [BE-12956] (#2748) 2026-06-03 12:52:41 +03:00
Steven Kang
b233453cf7 feat(kubernetes): display cached images per node [R8S-898] (#2068) 2026-06-03 10:40:14 +12:00
Steven Kang
eb5ee3bfdb fix(kubernetes): improve PVC deletion UX based on workload usage [R8S-1046] (#2766) 2026-06-03 09:43:07 +12:00
Steven Kang
86a84c3c6a fix(kubernetes): updated wrong tooltip for container restart feature-gate [R8S-1037] (#2721) 2026-06-03 09:26:04 +12:00
andres-portainer
1fa756372e feat(gitops): general improvements BE-12919 (#2780) 2026-06-02 09:44:57 -03:00
Josiah Clumont
484af3c2c8 feat(environment group) detail view update v1 [c9s-206] (#2722)
Last system-test failure is also on dev
2026-06-02 16:59:18 +12:00
Devon Steenberg
742551e592 fix(registries): make gitlab proxy endpoint admin only [BE-13018] (#2764) 2026-06-02 15:45:57 +12:00
Chaim Lev-Ari
67590aa27d feat(api): auto generate typescript definition from api docs [BE-9222] (#2468) 2026-05-31 14:51:52 +03:00
Ali
6c059c41f9 chore: bump version to 2.43.0 (#2760) 2026-05-30 16:56:17 +12:00
andres-portainer
f1db82934d fix(security): fix a short-circuit condition that can lead to improper access control BE-13020 (#2756) 2026-05-29 20:47:59 -03:00
Hannah Cooper
28dd6b767f fix(api-docs): API docs fixes / improvements [C9S-208] (#2717) 2026-05-29 11:33:06 +12:00