Compare commits

...

3 Commits

Author SHA1 Message Date
agent_coder 600cec6ccc fix(docker): make the API-version floor undefeatable (Ping-based) + pin 1.24 + honor caller timeout (#35 review round 1)
Round-1 review found the floor was DEFEATED for the main #34 target (daemon unreachable
at construction): the SDK keeps ClientVersion() == "1.51" on a ping failure (not empty),
so the old empty-string guard passed, the floor branch was skipped, lazy negotiation
stayed armed, and the client could downgrade below 1.24 on the first real request.

- Detect unreachable EXPLICITLY via cli.Ping(ctx) instead of inspecting ClientVersion():
  ping error -> rebuildAtFloor; otherwise NegotiateAPIVersionPing(ping) applies the
  advertised version AND marks the client negotiated (disarms the lazy re-ping); if the
  result is empty or < 1.24 -> rebuildAtFloor. rebuildAtFloor closes and rebuilds with
  WithVersion(minSupportedAPIVersion) — which sets manualOverride, making both the lazy
  checkVersion and NegotiateAPIVersion no-ops, so the pinned client can NEVER re-negotiate
  below the floor. The rebuild reuses the same opts, so agent-signature headers / TLS /
  custom transport are preserved.
- Pin the floor at minSupportedAPIVersion "1.24", not "1.44": 1.44 is "too new" for a
  genuinely old daemon (400) and would make container.go's `< 1.44` MAC-cleanup wrongly
  skip. 1.24 satisfies the invariant and every supported daemon accepts it. Dropped the
  1.44 const.
- resolveNegotiateTimeout(timeout): use the caller's timeout when set (deployer's 3600s,
  autoupdate/autoheal), else a modest 10s default. The proxy hotpath and snapshot loop
  pass nil -> bounded 10s, so a daemon that accepts TCP but hangs on /_ping can't stall
  the whole snapshot loop for a huge fixed time.
- Test TestNegotiateWithFloor_UnreachableThenBelowFloor: unreachable at construction, then
  the daemon advertises 1.20; asserts the client pins to 1.24, ContainerStart goes /v1.24/
  (not /v1.20/), and /_ping isn't hit again after construction (no lazy re-negotiation).
  Mutation-verified: reverting to the old logic reds it.

go build / vet / gofmt clean; go test ./docker/client/ -race 4/4 pass. Still framed as a
hardening (part of #34) — the vendored SDK's ContainerStart sends no body regardless, so
confirm the failing deployment's build before treating this as the complete #34 fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 23:36:14 +03:00
agent_coder dbd2d06659 fix(docker): harden client API-version resolution — eager negotiate + >=1.24 floor (part of #34)
Issue #34 reports container Recreate / native auto-update (#28, #10) failing on modern
Docker Engines via the agent with "starting container with non-empty request body ...
removed in v1.24". The proposed fix: make API-version resolution robust in all three
docker client constructors.

- api/docker/client/client.go: add a shared negotiateWithFloor() used by createLocalClient,
  createTCPClient, createAgentClient. It eagerly NegotiateAPIVersion (10s bound) instead of
  relying on lazy first-call negotiation, and enforces a modern floor: if the negotiated
  version is empty or < 1.24 (versions.LessThan), rebuild pinned at 1.44 (WithVersion, which
  the SDK requires INSTEAD of negotiation — WithVersion sets manualOverride, disabling
  NegotiateAPIVersion). Negotiation stays primary; the floor is only a safety net. Effective
  API version is now guaranteed >= 1.24. Custom transport / TLS / agent headers untouched.
- Tests (api/docker/client/client_test.go): modern daemon -> empty start body; below-floor
  daemon -> pinned to floor; unreachable daemon -> stays >= 1.24.

IMPORTANT FINDING (needs maintainer confirmation before this is called a complete fix for
#34): on the pinned SDK (github.com/docker/docker v28.5.2), ContainerStart posts a nil body
UNCONDITIONALLY (client/container_start.go: cli.post(ctx, ".../start", query, nil, nil) — no
version branch), and NewClientWithOpts defaults to version 1.51 (negotiation only downgrades).
So on develop, the server-side path the issue names (autoupdate.go -> Recreate ->
cli.ContainerStart) cannot emit a request body regardless of API version — this change hardens
the client path but is unlikely to be the actual source of the reported symptom. The
"non-empty body" more likely originates from a DIFFERENT layer (an OLDER docker SDK in the
failing deployment, where ContainerStart did attach HostConfig at API < 1.24; or the docker
PROXY forwarding a raw browser request). Recommend confirming the failing environment's build
before merging as the #34 fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 23:05:26 +03:00
vvzvlad 68592337f5 Merge pull request 'feat(automation): входящий registry-push вебхук для мгновенного автообновления (#32)' (#33) from feat/32-registry-webhook into develop
Reviewed-on: #33
2026-07-05 16:51:48 +03:00
2 changed files with 272 additions and 4 deletions
+101 -4
View File
@@ -2,6 +2,7 @@ package client
import (
"bytes"
"context"
"errors"
"fmt"
"io"
@@ -16,6 +17,7 @@ import (
"github.com/rs/zerolog/log"
"github.com/docker/docker/api/types/image"
"github.com/docker/docker/api/types/versions"
"github.com/docker/docker/client"
"github.com/segmentio/encoding/json"
)
@@ -25,6 +27,24 @@ var errUnsupportedEnvironmentType = errors.New("environment not supported")
const (
defaultDockerRequestTimeout = 60 * time.Second
dockerClientVersion = "1.37"
// defaultNegotiateTimeout bounds eager API-version negotiation at client
// construction when the caller does not supply an explicit timeout, so an
// unreachable or hung daemon cannot block construction for long. Callers
// that pass a timeout (e.g. remote stack deploys, container automation)
// override this with their own deliberate value.
defaultNegotiateTimeout = 10 * time.Second
// minSupportedAPIVersion is both the hard invariant every Docker client
// must satisfy and the version the client is pinned to when negotiation
// cannot determine a usable daemon version. Below API v1.24 the daemon
// rejects requests the SDK still shapes with legacy semantics (e.g. a
// non-empty body on "POST /containers/{id}/start"), which breaks container
// recreate (issue #34). 1.24 is accepted by every supported daemon, so it
// is safe to pin even against a genuinely old engine (a newer floor such as
// 1.44 would be rejected as "too new" and would also skip the
// pre-1.44 MAC-address cleanup in container.go).
minSupportedAPIVersion = "1.24"
)
type NodeNamesCtxKey struct{}
@@ -71,11 +91,88 @@ func (factory *ClientFactory) CreateClient(endpoint *portainer.Endpoint, nodeNam
return createTCPClient(endpoint, timeout)
}
// resolveNegotiateTimeout derives the eager-negotiation timeout from the
// caller's requested client timeout, respecting deliberate per-endpoint
// timeouts (e.g. remote stack deploys, container automation) and falling back
// to a modest default when none is provided.
func resolveNegotiateTimeout(timeout *time.Duration) time.Duration {
if timeout != nil {
return *timeout
}
return defaultNegotiateTimeout
}
// negotiateWithFloor builds a Docker client from opts (which must include
// client.WithAPIVersionNegotiation()) and resolves its effective API version at
// construction time, guaranteeing it never drops below minSupportedAPIVersion —
// not even lazily on a later request. The custom agent/TCP transports do not
// reliably resolve a modern API version through the SDK's lazy first-call
// negotiation, so we resolve it up-front here.
//
// We ping the daemon explicitly (bounded by timeout) rather than relying on
// cli.NegotiateAPIVersion + inspecting ClientVersion(), because that cannot
// tell "reached the daemon and negotiated" apart from "never reached it": the
// SDK seeds every client with api.DefaultVersion and NegotiateAPIVersion
// swallows a ping error without clearing the version or marking the client as
// negotiated (docker@v28.5.2 client.go NegotiateAPIVersion). A non-empty
// ClientVersion() would therefore let an unreachable daemon pass, leaving lazy
// negotiation armed to downgrade below the floor on the first real request.
//
// - Ping error (unreachable at construction): pin at the floor.
// - Ping ok but advertised version < floor (or empty): pin at the floor.
// - Ping ok and version >= floor: apply it via NegotiateAPIVersionPing, which
// also marks the client negotiated so it will not re-ping lazily.
//
// The floor is applied by rebuilding with client.WithVersion, which sets
// manualOverride=true and thereby disables all further (lazy) negotiation, so
// the effective version is fixed and can never re-negotiate below the floor.
func negotiateWithFloor(opts []client.Opt, timeout time.Duration) (*client.Client, error) {
cli, err := client.NewClientWithOpts(opts...)
if err != nil {
return nil, err
}
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
ping, err := cli.Ping(ctx)
if err != nil {
// Daemon unreachable at construction: pin at the floor so the client
// cannot lazily negotiate below it on the first real request.
return rebuildAtFloor(cli, opts)
}
// Apply the version advertised by the daemon; this also disarms lazy
// re-negotiation on subsequent requests.
cli.NegotiateAPIVersionPing(ping)
if v := cli.ClientVersion(); v == "" || versions.LessThan(v, minSupportedAPIVersion) {
return rebuildAtFloor(cli, opts)
}
return cli, nil
}
// rebuildAtFloor closes cli and returns a new client from the same opts pinned
// at minSupportedAPIVersion. client.WithVersion sets manualOverride=true, which
// makes both the lazy checkVersion and NegotiateAPIVersion no-ops, so the
// effective API version stays fixed at the floor.
func rebuildAtFloor(cli *client.Client, opts []client.Opt) (*client.Client, error) {
if err := cli.Close(); err != nil {
log.Warn().Err(err).Msg("failed to close docker client before applying API version floor")
}
return client.NewClientWithOpts(append(opts, client.WithVersion(minSupportedAPIVersion))...)
}
func createLocalClient(endpoint *portainer.Endpoint) (*client.Client, error) {
return client.NewClientWithOpts(
opts := []client.Opt{
client.WithHost(endpoint.URL),
client.WithAPIVersionNegotiation(),
)
}
return negotiateWithFloor(opts, defaultNegotiateTimeout)
}
func createTCPClient(endpoint *portainer.Endpoint, timeout *time.Duration) (*client.Client, error) {
@@ -94,7 +191,7 @@ func createTCPClient(endpoint *portainer.Endpoint, timeout *time.Duration) (*cli
opts = append(opts, client.WithScheme("https"))
}
return client.NewClientWithOpts(opts...)
return negotiateWithFloor(opts, resolveNegotiateTimeout(timeout))
}
func createAgentClient(endpoint *portainer.Endpoint, endpointURL string, signatureService portainer.DigitalSignatureService, nodeName string, timeout *time.Duration) (*client.Client, error) {
@@ -128,7 +225,7 @@ func createAgentClient(endpoint *portainer.Endpoint, endpointURL string, signatu
opts = append(opts, client.WithScheme("https"))
}
return client.NewClientWithOpts(opts...)
return negotiateWithFloor(opts, resolveNegotiateTimeout(timeout))
}
type NodeNameTransport struct {
+171
View File
@@ -1,11 +1,20 @@
package client
import (
"context"
"io"
"net/http"
"net/http/httptest"
"strings"
"sync/atomic"
"testing"
portainer "github.com/portainer/portainer/api"
"github.com/portainer/portainer/pkg/fips"
dockercontainer "github.com/docker/docker/api/types/container"
"github.com/docker/docker/api/types/versions"
"github.com/docker/docker/client"
"github.com/stretchr/testify/require"
)
@@ -29,3 +38,165 @@ func TestHttpClient(t *testing.T) {
require.Error(t, err)
require.Nil(t, cli)
}
// startRecord captures the request made to POST /containers/{id}/start.
type startRecord struct {
path string
body []byte
}
// newDockerStub returns an httptest server that mimics a Docker daemon
// advertising apiVersion, and records the request made to the container start
// endpoint into rec.
func newDockerStub(t *testing.T, apiVersion string, rec *startRecord) *httptest.Server {
t.Helper()
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch {
case strings.HasSuffix(r.URL.Path, "/_ping"):
w.Header().Set("Api-Version", apiVersion)
w.Header().Set("Ostype", "linux")
w.WriteHeader(http.StatusOK)
case strings.HasSuffix(r.URL.Path, "/start"):
body, _ := io.ReadAll(r.Body)
rec.path = r.URL.Path
rec.body = body
w.WriteHeader(http.StatusNoContent)
default:
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("{}"))
}
}))
t.Cleanup(srv.Close)
return srv
}
// TestNegotiateWithFloor_ModernDaemon verifies that against a daemon
// advertising a modern API version (>= 1.24) the client negotiates a modern
// version and issues POST /containers/{id}/start with an EMPTY body — the
// behavior required by API >= 1.24 (regression test for issue #34).
func TestNegotiateWithFloor_ModernDaemon(t *testing.T) {
t.Parallel()
var rec startRecord
srv := newDockerStub(t, "1.54", &rec)
opts := []client.Opt{
client.WithHost(srv.URL),
client.WithAPIVersionNegotiation(),
}
cli, err := negotiateWithFloor(opts, defaultNegotiateTimeout)
require.NoError(t, err)
t.Cleanup(func() { _ = cli.Close() })
// Negotiation resolved a concrete, modern version (capped at the client's
// maximum supported version).
require.NotEmpty(t, cli.ClientVersion())
require.False(t, versions.LessThan(cli.ClientVersion(), minSupportedAPIVersion),
"effective API version %q must be >= %q", cli.ClientVersion(), minSupportedAPIVersion)
err = cli.ContainerStart(context.Background(), "abc123", dockercontainer.StartOptions{})
require.NoError(t, err)
// The request must be versioned and carry no HostConfig body.
require.Contains(t, rec.path, "/v"+cli.ClientVersion()+"/")
require.Empty(t, rec.body, "start request body must be empty on API >= 1.24")
}
// TestNegotiateWithFloor_BelowFloorDaemon verifies the floor branch: a daemon
// that advertises an API version below the supported floor causes negotiation
// to resolve a legacy version, and the client must then be pinned at
// minSupportedAPIVersion instead of issuing requests below the supported floor.
func TestNegotiateWithFloor_BelowFloorDaemon(t *testing.T) {
t.Parallel()
var rec startRecord
// 1.20 is below minSupportedAPIVersion (1.24); negotiation would downgrade
// the client to it, which is exactly what must NOT be used for requests.
srv := newDockerStub(t, "1.20", &rec)
opts := []client.Opt{
client.WithHost(srv.URL),
client.WithAPIVersionNegotiation(),
}
cli, err := negotiateWithFloor(opts, defaultNegotiateTimeout)
require.NoError(t, err)
t.Cleanup(func() { _ = cli.Close() })
// The client is pinned at the supported floor, not the legacy 1.20.
require.Equal(t, minSupportedAPIVersion, cli.ClientVersion())
}
// TestNegotiateWithFloor_UnreachableThenBelowFloor covers the exact issue #34
// scenario the reviewer reproduced: the daemon is UNREACHABLE at construction
// time, then comes back up advertising a legacy version (1.20). The client must
// pin at the floor during construction AND must not lazily re-negotiate down to
// the legacy version on the first real request.
func TestNegotiateWithFloor_UnreachableThenBelowFloor(t *testing.T) {
t.Parallel()
var (
reachable atomic.Bool // false during construction, true afterwards
pingHits atomic.Int32
rec startRecord
)
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch {
case strings.HasSuffix(r.URL.Path, "/_ping"):
pingHits.Add(1)
if !reachable.Load() {
// Unreachable: make Ping fail (both HEAD and its GET fallback).
w.WriteHeader(http.StatusServiceUnavailable)
return
}
// Daemon is now up, but advertises a legacy, below-floor version.
w.Header().Set("Api-Version", "1.20")
w.Header().Set("Ostype", "linux")
w.WriteHeader(http.StatusOK)
case strings.HasSuffix(r.URL.Path, "/start"):
body, _ := io.ReadAll(r.Body)
rec.path = r.URL.Path
rec.body = body
w.WriteHeader(http.StatusNoContent)
default:
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("{}"))
}
}))
t.Cleanup(srv.Close)
opts := []client.Opt{
client.WithHost(srv.URL),
client.WithAPIVersionNegotiation(),
}
// Construction happens while the daemon is unreachable.
cli, err := negotiateWithFloor(opts, defaultNegotiateTimeout)
require.NoError(t, err)
t.Cleanup(func() { _ = cli.Close() })
require.Equal(t, minSupportedAPIVersion, cli.ClientVersion(),
"unreachable daemon must pin the client at the floor")
pingsAfterConstruction := pingHits.Load()
// The daemon comes back up advertising a legacy version.
reachable.Store(true)
err = cli.ContainerStart(context.Background(), "abc123", dockercontainer.StartOptions{})
require.NoError(t, err)
// The floor must hold: the request is issued at the pinned version, not the
// daemon's legacy 1.20, and the client did not re-ping to re-negotiate.
require.Contains(t, rec.path, "/v"+minSupportedAPIVersion+"/",
"start must be issued at the pinned floor version, not the legacy 1.20")
require.NotContains(t, rec.path, "/v1.20/")
require.Equal(t, pingsAfterConstruction, pingHits.Load(),
"client must not lazily re-negotiate after being pinned at the floor")
}