Unable to run Rancher 2 on Ubuntu 18.04

I’ve just about given up trying to troubleshoot something that shouldn’t be an issue in the first place. I have two PowerEdge servers with fresh Ubuntu 18.04 installs. Firewalls are completely disabled. I followed the install instructions to the letter, and got as far as here, with everything before it working fine:

kubectl -n cattle-sys rollout status deploy/rancher
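(For reference, the steps before that point were roughly the documented Helm install — this is a sketch from memory, with a placeholder hostname; the namespace in my cluster is cattle-sys:)

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
kubectl create namespace cattle-sys
helm install rancher rancher-stable/rancher --namespace cattle-sys --set hostname=rancher.example.com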

The deployment times out. The three containers try to run, but they are repeatedly terminated and restarted with a connection-refused error to an internal IP in the cluster. I am using Docker CE as the container engine. I have looked EVERYWHERE for possible causes.

rancher-556d94d669-bkbp4   0/1     Error              19         15h
rancher-556d94d669-s9vpz   0/1     Unknown            20         15h
rancher-556d94d669-5gmmb   0/1     Unknown            19         15h
rancher-7f9585b56d-s8lqj   0/1     CrashLoopBackOff   5          8m16s
rancher-7f9585b56d-76v6b   0/1     Running            6          8m16s
rancher-7f9585b56d-wk9l5   0/1     Running            4          8m16s
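(The list above and the events below come from commands along these lines — the pod name is one of the failing ones:)

kubectl -n cattle-sys get pods
kubectl -n cattle-sys describe pod rancher-556d94d669-bkbp4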

Several of the failed containers show this:

Warning  FailedMount  8m41s (x71 over 152m)  kubelet, zork-poweredge-1950  Unable to attach or mount volumes: unmounted volumes=[rancher-token-dtt4v], unattached volumes=[rancher-token-dtt4v]: timed out waiting for the condition
Warning  FailedMount  3m50s (x82 over 154m)  kubelet, zork-poweredge-1950  MountVolume.SetUp failed for volume "rancher-token-dtt4v" : secret "rancher-token-dtt4v" not found
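One check I can think of (a sketch — the service account name is assumed from the token prefix, and the namespace matches my install) is whether the service-account token secret the kubelet is waiting for actually exists:

kubectl -n cattle-sys get serviceaccount rancher -o yaml
kubectl -n cattle-sys get secrets | grep rancher-token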

The only container that is still running shows this:

Events:
  Type     Reason     Age                     From                          Message
  ----     ------     ----                    ----                          -------
  Normal   Scheduled  <unknown>               default-scheduler             Successfully assigned cattle-sys/rancher-7f9585b56d-76v6b to zork-poweredge-r410
  Normal   Killing    10m                     kubelet, zork-poweredge-r410  Container rancher failed liveness probe, will be restarted
  Warning  Unhealthy  9m20s (x4 over 11m)     kubelet, zork-poweredge-r410  Liveness probe failed: Get http://10.42.0.56:80/healthz: dial tcp 10.42.0.56:80: connect: connection refused
  Normal   Started    8m59s (x3 over 12m)     kubelet, zork-poweredge-r410  Started container rancher
  Warning  Unhealthy  8m52s (x8 over 12m)     kubelet, zork-poweredge-r410  Readiness probe failed: Get http://10.42.0.56:80/healthz: dial tcp 10.42.0.56:80: connect: connection refused
  Normal   Pulled     8m24s (x4 over 12m)     kubelet, zork-poweredge-r410  Container image "rancher/rancher:v2.4.8" already present on machine
  Normal   Created    8m24s (x4 over 12m)     kubelet, zork-poweredge-r410  Created container rancher
  Warning  BackOff    2m38s (x20 over 8m37s)  kubelet, zork-poweredge-r410  Back-off restarting failed container

Help!

Have you checked SELinux and perhaps a proxy?
Ralf

SELinux is not installed on Ubuntu 18.04 by default. There is also no proxy on a fresh install.

Sorry,
I meant a network proxy, not a local proxy.
Ralf

There is no proxy anywhere. These machines are sitting in my home NAT network. No firewalls, nothing.

By NAT, do you mean a virtual host with a NAT virtual network interface?
Ralf

It’s a garden-variety home network with a router, DHCP and network address translation (NAT). It’s just my home office lab.

OK,
I know of several problems that occur when proxies or NAT devices are active.
Did you try to analyse the log files of all the containers for hints?

Ralf

How does that work with Rancher?

docker logs -f <container ID>
Start with the main rancher container.
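For example (a sketch — the generated container name suffix will differ on your node):

# find the kubelet-managed rancher container on the node, then follow its logs
docker ps | grep k8s_rancher
docker logs -f <container ID>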

You don’t have 3 servers, so you don’t want 3 instances of the server… pass --set replicas=2 in the helm install.
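For example (a sketch — the chart repo name and hostname are placeholders; the namespace matches the one in your output):

helm upgrade --install rancher rancher-stable/rancher --namespace cattle-sys --set hostname=rancher.example.com --set replicas=2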

What kind of CPUs are in those and what kind of k8s cluster are you installing on? Is the ingress controller running? 1950 and r410 are both names I recognize from 10+ years ago, and some of the images require CPU features you may not have (https://github.com/rancher/rancher/issues/23307).
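Two quick checks (a sketch — RKE normally runs the ingress controller in the ingress-nginx namespace, and the CPU flags listed are only examples of extensions newer images may expect):

kubectl -n ingress-nginx get pods
grep -oE 'sse4_2|ssse3|avx' /proc/cpuinfo | sort -u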

The Xeon CPUs are fine; there are no illegal-instruction errors. I’ll reset the replicas to two, but that still does not explain the problem. This will have to wait until the weekend; I don’t have time to waste right now troubleshooting something that should just work.

Please share the exact commands you used plus their output, and any other check commands plus output you ran. The pod output shows at least two installs: there are multiple ReplicaSets (rancher-$IDENTIFIER-$RANDOM), which does not happen on a default install; there should just be one ReplicaSet with 3 pods (or 2 if you used the flag to modify it). If you are trying to redo the install, please use rke remove to clean the nodes properly (a reboot doesn’t hurt either) and start over. If you can share any other kubectl output (like get nodes and get pods --all-namespaces), there will be more info to determine what is failing.
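For example (a sketch — cluster.yml is whatever cluster config file you used with RKE):

# clean the nodes before reinstalling
rke remove --config cluster.yml

# cluster-level output worth sharing
kubectl get nodes -o wide
kubectl get pods --all-namespaces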