Hi,
I have the following problem:
I set up a Docker-based Rancher 2.5.5 installation in an air-gapped customer environment.
All required images are in a local registry, and I start Rancher as follows:
—snip—
RANCHER_PRIVATE_DOCKER_REGISTRY="myregistry.mydomain.de/lcm_local_rancher_img"
RANCHER_SERVER_NAME="myserver.mydomain.de"
docker run -d --restart=unless-stopped \
  --name rancher_server \
  -p 80:80 -p 443:443 \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  -e CATTLE_API_HOST="https://${RANCHER_SERVER_NAME}:443" \
  -v /etc/rancher/ssl/${RANCHER_SERVER_NAME}.crt:/etc/rancher/ssl/cert.pem \
  -v /etc/rancher/ssl/${RANCHER_SERVER_NAME}.key:/etc/rancher/ssl/key.pem \
  -v /var/lib/rancher:/var/lib/rancher \
  -e AUDIT_LEVEL=1 -v /var/log/rancher/auditlog:/var/log/auditlog \
  --privileged \
  "${RANCHER_PRIVATE_DOCKER_REGISTRY}/rancher/rancher:${RANCHER_SERVER_VERSION}" --no-cacerts
—snap—
Rancher starts and can be used to create new clusters on other nodes in the same datacenter.
The RANCHER_PRIVATE_DOCKER_REGISTRY is used for those clusters. Everything works fine up to this point.
But when I stop and remove the container and start it again (using the command above), I see
messages like the following in the logs:
…
Feb 4 20:36:01 RTST-A0101 journal: E0204 19:36:01.531114 40 kuberuntime_manager.go:727] createPodSandbox for pod “helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)” failed: rpc error: code = Unknown desc = failed to get sandbox image “docker.io/rancher/pause:3.1”: failed to pull image “docker.io/rancher/pause:3.1”: failed to pull and unpack image “docker.io/rancher/pause:3.1”: failed to resolve reference “docker.io/rancher/pause:3.1”: failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: dial tcp 52.72.232.213:443: i/o timeout
Feb 4 20:36:01 RTST-A0101 journal: E0204 19:36:01.531168 40 pod_workers.go:191] Error syncing pod b32591c0-b958-4f1d-93b9-dd2c7c419cf7 (“helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)”), skipping: failed to “CreatePodSandbox” for “helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)” with CreatePodSandboxError: “CreatePodSandbox for pod “helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)” failed: rpc error: code = Unknown desc = failed to get sandbox image “docker.io/rancher/pause:3.1”: failed to pull image “docker.io/rancher/pause:3.1”: failed to pull and unpack image “docker.io/rancher/pause:3.1”: failed to resolve reference “docker.io/rancher/pause:3.1”: failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: dial tcp 52.72.232.213:443: i/o timeout”
Feb 4 20:36:02 RTST-A0101 journal: E0204 19:36:02.530509 40 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to get sandbox image “docker.io/rancher/pause:3.1”: failed to pull image “docker.io/rancher/pause:3.1”: failed to pull and unpack image “docker.io/rancher/pause:3.1”: failed to resolve reference “docker.io/rancher/pause:3.1”: failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: dial tcp 54.236.131.166:443: i/o timeout
…
Although it is an air-gapped install, the container-internal K3s-based cluster tries to pull images from the outside.
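To see which images the embedded cluster is trying to fetch, the log lines can be reduced to the distinct image references. This is only a rough sketch: `failed_images` is a helper name I made up, and the pattern assumes the "failed to pull image" wording from the messages above.

```shell
#!/usr/bin/env bash
# failed_images: read log lines on stdin and print each distinct image
# reference that appears after the text "failed to pull image".
# The helper name is made up; the pattern is based on the log excerpt above.
failed_images() {
  # Match the phrase, skip any quoting characters, then capture the reference.
  grep -oE 'failed to pull image [^[:alnum:]]*[[:alnum:]./_:-]+' \
    | sed -E 's/^failed to pull image [^[:alnum:]]*//' \
    | sort -u
}
```

With the container name from my command, it could be fed like this: `docker logs rancher_server 2>&1 | failed_images`. In my case I would expect it to list only the pause image.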
If I open Cluster Explorer for the local cluster, I see some small red skulls in the Cluster Dashboard indicating problems, and
FailedCreatePodSandBox/Failed-to-pull events are displayed (every 3 seconds or so).
In Cluster Manager’s global view the cluster “local” is green / active.
After some tests, the only way to eliminate the problems seems to be starting the container with the additional environment variables https_proxy and
no_proxy in the docker command. But then it's not really air-gapped…
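For reference, the workaround amounts to roughly the following. This is a sketch only: the proxy URL and the no_proxy list are placeholders I made up for this post, not the real values from the customer site.

```shell
#!/usr/bin/env bash
# Sketch of the proxy workaround. PROXY_URL and NO_PROXY_LIST are
# placeholders; the real values depend on the site.
PROXY_URL="http://proxy.mydomain.de:3128"
NO_PROXY_LIST="localhost,127.0.0.1,myregistry.mydomain.de"

# Extra arguments to add to the docker run command shown above.
proxy_args=(
  -e "https_proxy=${PROXY_URL}"
  -e "no_proxy=${NO_PROXY_LIST}"
)

# Print the extra arguments so they can be checked before use.
printf '%s ' "${proxy_args[@]}"
echo
```

The array is then expanded as `"${proxy_args[@]}"` somewhere before the image name in the docker run command, since everything after the image name is passed to Rancher itself.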
Is this a bug, or is it working as designed? I don't think it is by design…
Are there any hints on how to prevent this situation?
How can the internal cluster use the air-gapped private registry, too?
Geralf