Air-gapped Docker-based Rancher 2.5.5 install still uses docker.io

Hi,

I have the following problem:

I set up a Docker-based Rancher 2.5.5 installation in an air-gapped customer environment.
All needed images are in a local registry, and I start Rancher as follows:
—snip—
RANCHER_PRIVATE_DOCKER_REGISTRY="myregistry.mydomain.de/lcm_local_rancher_img"
RANCHER_SERVER_NAME="myserver.mydomain.de"
docker run -d --restart=unless-stopped \
  --name rancher_server \
  -p 80:80 -p 443:443 \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  -e CATTLE_API_HOST="https://${RANCHER_SERVER_NAME}:443" \
  -v /etc/rancher/ssl/${RANCHER_SERVER_NAME}.crt:/etc/rancher/ssl/cert.pem \
  -v /etc/rancher/ssl/${RANCHER_SERVER_NAME}.key:/etc/rancher/ssl/key.pem \
  -v /var/lib/rancher:/var/lib/rancher \
  -e AUDIT_LEVEL=1 \
  -v /var/log/rancher/auditlog:/var/log/auditlog \
  --privileged \
  "${RANCHER_PRIVATE_DOCKER_REGISTRY}/rancher/rancher:${RANCHER_SERVER_VERSION}" --no-cacerts
—snap—

Rancher starts and can be used to create new clusters on other nodes in the same datacenter.
The RANCHER_PRIVATE_DOCKER_REGISTRY is used for those. Everything works fine up to this point.

But when I stop and remove the container and start it again (using the command from above), I see
messages like the following in the logs:

Feb 4 20:36:01 RTST-A0101 journal: E0204 19:36:01.531114 40 kuberuntime_manager.go:727] createPodSandbox for pod "helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)" failed: rpc error: code = Unknown desc = failed to get sandbox image "docker.io/rancher/pause:3.1": failed to pull image "docker.io/rancher/pause:3.1": failed to pull and unpack image "docker.io/rancher/pause:3.1": failed to resolve reference "docker.io/rancher/pause:3.1": failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: dial tcp 52.72.232.213:443: i/o timeout
Feb 4 20:36:01 RTST-A0101 journal: E0204 19:36:01.531168 40 pod_workers.go:191] Error syncing pod b32591c0-b958-4f1d-93b9-dd2c7c419cf7 ("helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)"), skipping: failed to "CreatePodSandbox" for "helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"helm-operation-p7d4h_cattle-system(b32591c0-b958-4f1d-93b9-dd2c7c419cf7)\" failed: rpc error: code = Unknown desc = failed to get sandbox image \"docker.io/rancher/pause:3.1\": failed to pull image \"docker.io/rancher/pause:3.1\": failed to pull and unpack image \"docker.io/rancher/pause:3.1\": failed to resolve reference \"docker.io/rancher/pause:3.1\": failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: dial tcp 52.72.232.213:443: i/o timeout"
Feb 4 20:36:02 RTST-A0101 journal: E0204 19:36:02.530509 40 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to get sandbox image "docker.io/rancher/pause:3.1": failed to pull image "docker.io/rancher/pause:3.1": failed to pull and unpack image "docker.io/rancher/pause:3.1": failed to resolve reference "docker.io/rancher/pause:3.1": failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: dial tcp 54.236.131.166:443: i/o timeout

Although it is an air-gapped install, the container-internal K3s-based cluster tries to pull images from the outside.
If I use Cluster Explorer for the local cluster, I see small red skulls in the Cluster Dashboard indicating problems, and
FailedCreatePodSandBox/Failed-to-pull events are displayed (roughly every 3 seconds).
In Cluster Manager's global view, the cluster "local" is green/active.

After some tests, the only way to eliminate the problems seems to be starting the container with the additional environment
variables https_proxy and no_proxy in the Docker command. But then it's not really air-gapped…
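For reference, the workaround amounts to adding flags like these to the docker run command above (the proxy host is just a placeholder, adjust to your environment):

```shell
  # Workaround, not a real fix: route the embedded K3s's image pulls
  # through a proxy. proxy.mydomain.de:3128 is a placeholder.
  -e HTTPS_PROXY=http://proxy.mydomain.de:3128 \
  -e NO_PROXY=localhost,127.0.0.1,.mydomain.de \
```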

Is this a bug, or does it work as designed? I don't think so…
Are there any hints to prevent this situation?
How can the internal cluster use the air-gapped private registry, too?

Geralf


We are having the same problem with an air-gapped installation on 2.5.x; we have tried several versions.
It would be very important to get this solved.
We will test more with 2.4.x, but it would be great to be able to upgrade to newer versions.

/Christopher

I had this problem too. It turns out K3s needs to be configured independently for a local registry. You need to create a registries.yaml file and mount it into your Rancher container (/etc/rancher/k3s/) as explained here: Rancher Docs: Private Registry Configuration

There doesn’t seem to be anything in the Rancher installation docs about this, but there is in the release notes on GitHub.
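To illustrate, this is roughly what the missing piece looks like (a sketch, not tested against 2.5.5; the registry name is taken from the original post, adjust it to yours):

```shell
# Create a K3s registry configuration on the Rancher host. K3s reads
# /etc/rancher/k3s/registries.yaml inside the Rancher container and uses the
# mirror entries to redirect pulls for docker.io to the private registry.
mkdir -p /etc/rancher/k3s
cat > /etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://myregistry.mydomain.de"
EOF
# Then add this mount to the docker run command for the Rancher container:
#   -v /etc/rancher/k3s/registries.yaml:/etc/rancher/k3s/registries.yaml \
```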
