Hi all,
I’ve set up a Rancher cluster according to the High Availability (HA) Install guide. My rancher-cluster.yml looks like this (actual FQDNs substituted with example.com):
nodes:
  - address: rancher1.example.com
    internal_address: 10.0.0.181
    user: rancher
    role: [controlplane,etcd,worker]
    ssh_key_path: ~/.ssh/id_ed25519
  - address: rancher2.example.com
    internal_address: 10.0.0.182
    user: rancher
    role: [controlplane,etcd,worker]
    ssh_key_path: ~/.ssh/id_ed25519
  - address: rancher3.example.com
    internal_address: 10.0.0.183
    user: rancher
    role: [controlplane,etcd,worker]
    ssh_key_path: ~/.ssh/id_ed25519

services:
  etcd:
    snapshot: true
    creation: 6h
The rke up --config ./rancher-cluster.yml command finishes successfully, and kubectl get nodes reports the following (again, FQDNs substituted):
$ kubectl get nodes
NAME                   STATUS    ROLES                      AGE       VERSION
rancher1.example.com   Ready     controlplane,etcd,worker   1h        v1.11.6
rancher2.example.com   Ready     controlplane,etcd,worker   1h        v1.11.6
rancher3.example.com   Ready     controlplane,etcd,worker   1h        v1.11.6
However, kubectl get pods --all-namespaces reports that the nginx-ingress-controller pods are in CrashLoopBackOff:
$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                      READY     STATUS             RESTARTS   AGE
ingress-nginx   default-http-backend-797c5bc547-hbnnt     1/1       Running            0          1h
ingress-nginx   nginx-ingress-controller-7n5kn            0/1       CrashLoopBackOff   20         1h
ingress-nginx   nginx-ingress-controller-jpzg7            0/1       CrashLoopBackOff   20         1h
ingress-nginx   nginx-ingress-controller-wxtp2            0/1       CrashLoopBackOff   10         28m
kube-system     canal-4fhwf                               3/3       Running            0          1h
kube-system     canal-8mgbp                               3/3       Running            0          1h
kube-system     canal-97j6n                               3/3       Running            0          1h
kube-system     kube-dns-7588d5b5f5-62lvd                 3/3       Running            0          1h
kube-system     kube-dns-autoscaler-5db9bbb766-xdd4v      1/1       Running            0          1h
kube-system     metrics-server-97bc649d5-6pbfk            1/1       Running            0          1h
kube-system     rke-ingress-controller-deploy-job-tn7ml   0/1       Completed          0          1h
kube-system     rke-kubedns-addon-deploy-job-pt7r6        0/1       Completed          0          1h
kube-system     rke-metrics-addon-deploy-job-whsq7        0/1       Completed          0          1h
kube-system     rke-network-plugin-deploy-job-jnjsn       0/1       Completed          0          1h
Fetching the logs of one of these pods shows the following:
$ kubectl -n ingress-nginx logs nginx-ingress-controller-wxtp2
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.16.2-rancher1
Build: d36d4cc
Repository: git@github.com:rancher/ingress-nginx.git
-------------------------------------------------------------------------------
F0201 12:08:23.061615 7 main.go:72] Port 80 is already in use. Please check the flag --http-port
Searches turn up reports of port 80 already being in use by a proxy container, but I haven’t found a confirmed way to change which port the ingress controller deployment binds to.
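The only lead I have is that RKE seems to accept extra arguments for the nginx ingress controller in the cluster YAML, so something along these lines (untested on my side, the port numbers below are just placeholders) might move it off 80/443, though I’m not sure it’s the right fix here:

ingress:
  provider: nginx
  extra_args:
    http-port: 8080
    https-port: 8443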
Extra host(s) information:
RAM: 16GB
CPU: 4
OS: Debian 9
Does anyone have a solution or pointer as to how to resolve this CrashLoopBackOff? Thanks in advance.
EDIT:
Extra information: searching GitHub leads to this report.
Resolving this issue:

1. Patch the DaemonSet so the controller runs as user 0 (root) instead of 33 (www-data). As far as I can tell, the controller cannot bind to the privileged ports 80/443 as www-data, and that failure is what gets reported as “Port 80 is already in use”:

kubectl patch ds nginx-ingress-controller -n ingress-nginx -p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx-ingress-controller","securityContext":{"runAsUser":0}}]}}}}'

2. Delete all pods in the ingress-nginx namespace so they get respawned with the patched spec:

kubectl delete pods -n ingress-nginx --all
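After that, the new pods should come up cleanly. Something like the following (just a watch on the namespace, nothing specific to my setup) lets you confirm they reach Running instead of CrashLoopBackOff:

kubectl -n ingress-nginx get pods -w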