I’m at my wits’ end with Rancher, and am seven attempts into trying to get a functional install just for a basic proof of concept. I am attempting to set up a three-node RKE2 cluster in a VMware environment using RHEL boxes, and then install Rancher. The intention is to use our own certificates from a private CA. I can get the RKE2 cluster built every time, and it results in a functional Kubernetes cluster.
Every time I install Rancher it fails, and not always the same way. Most of the time the installation “fails” with the following message:
Error: INSTALLATION FAILED: 1 error occurred:
* Internal error occurred: failed calling webhook “validate.nginx.ingress.kubernetes.io”: failed to call webhook: Post “https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s”: context deadline exceeded
However, the install continues anyway and results in a non-functional Rancher environment. I had thought I had found the solution for the error above when I configured RHEL’s NetworkManager to ignore the cali and flannel interfaces, but on that attempt while the Rancher install “succeeded” the interface was incredibly slow and basically didn’t work. On this latest attempt, with the same steps as before, I’m back to the ingress error above with a failed installation.
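For reference, the NetworkManager change described above is usually made with a drop-in configuration file. The following is only a minimal sketch: the directory /etc/NetworkManager/conf.d, the file name rke2-canal.conf, and the function name write_nm_canal_conf are assumptions for illustration, not details taken from this post.

```shell
# Write a NetworkManager drop-in that marks cali* and flannel* interfaces
# as unmanaged. The target directory can be overridden for a dry run;
# the default path and the file name are assumptions.
write_nm_canal_conf() {
    conf_dir="${1:-/etc/NetworkManager/conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/rke2-canal.conf" <<'EOF'
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
}
```

Running write_nm_canal_conf as root and then systemctl reload NetworkManager on each node should apply it; passing a scratch directory as the first argument lets you inspect the file before touching the real configuration.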
I’ve spent many, many hours Googling these issues and either find unrelated issues, or people having very similar issues that get no replies. Is there better documentation somewhere that actually results in a functional implementation of Rancher? If it is this difficult to spin up a simple demo environment, I’m really struggling to entertain the thought of ever running this in production.
Please share your configuration file for master.
Can you please share the output of the following commands on one of your RKE2 control plane nodes?
kubectl get po -n kube-system
kubectl get po -n calico-system
More particularly, I am looking at the health of ingress-nginx, kube-proxy and all calico pods (calico-node and the controller).
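If the pod listing is long, a small filter can help spot the unhealthy ones. This is just a sketch: unhealthy_pods is a hypothetical helper name, and it assumes the default kubectl get po column layout (NAME, READY, STATUS as the first three columns).

```shell
# Read `kubectl get po` output on stdin and print pods that look unhealthy:
# any pod that is not Completed and whose STATUS is not Running or whose
# READY count is short (for example 1/2).
unhealthy_pods() {
    awk '$1 != "NAME" && $3 != "Completed" {
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}
```

It can be used as kubectl get po -n kube-system | unhealthy_pods; no output means every pod is either Running with all containers ready, or Completed.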
My apologies as I’m relatively new to k8s and Rancher; which file would this be? The yml file used during the installation of RKE2 to create/join the cluster, or something else?
Certainly.
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
cloud-controller-manager-xvmexrnch0001.localdomain.local 1/1 Running 2 (4d22h ago) 5d
cloud-controller-manager-xvmexrnch0002.localdomain.local 1/1 Running 3 (4d22h ago) 5d
cloud-controller-manager-xvmexrnch0003.localdomain.local 1/1 Running 2 (4d22h ago) 5d
etcd-xvmexrnch0001.localdomain.local 1/1 Running 1 5d
etcd-xvmexrnch0002.localdomain.local 1/1 Running 2 5d
etcd-xvmexrnch0003.localdomain.local 1/1 Running 2 5d
helm-install-rke2-canal-wff85 0/1 Completed 0 5d
helm-install-rke2-coredns-8sqjn 0/1 Completed 0 5d
helm-install-rke2-ingress-nginx-2dcgl 0/1 Completed 0 5d
helm-install-rke2-metrics-server-2nv8f 0/1 Completed 0 5d
helm-install-rke2-snapshot-controller-65vgr 0/1 Completed 1 5d
helm-install-rke2-snapshot-controller-crd-2rx27 0/1 Completed 0 5d
helm-install-rke2-snapshot-validation-webhook-qgfl8 0/1 Completed 0 5d
kube-apiserver-xvmexrnch0001.localdomain.local 1/1 Running 1 5d
kube-apiserver-xvmexrnch0002.localdomain.local 1/1 Running 1 5d
kube-apiserver-xvmexrnch0003.localdomain.local 1/1 Running 1 5d
kube-controller-manager-xvmexrnch0001.localdomain.local 1/1 Running 2 (4d22h ago) 5d
kube-controller-manager-xvmexrnch0002.localdomain.local 1/1 Running 3 (4d22h ago) 5d
kube-controller-manager-xvmexrnch0003.localdomain.local 1/1 Running 2 (4d22h ago) 5d
kube-proxy-xvmexrnch0001.localdomain.local 1/1 Running 2 (4d22h ago) 4d22h
kube-proxy-xvmexrnch0002.localdomain.local 1/1 Running 2 (4d22h ago) 4d22h
kube-proxy-xvmexrnch0003.localdomain.local 1/1 Running 2 (4d22h ago) 4d22h
kube-scheduler-xvmexrnch0001.localdomain.local 1/1 Running 1 (4d22h ago) 5d
kube-scheduler-xvmexrnch0002.localdomain.local 1/1 Running 1 (4d22h ago) 5d
kube-scheduler-xvmexrnch0003.localdomain.local 1/1 Running 1 (4d22h ago) 5d
rke2-canal-96v99 2/2 Running 2 (4d22h ago) 5d
rke2-canal-cgv67 2/2 Running 2 (4d22h ago) 5d
rke2-canal-klh9q 2/2 Running 2 (4d22h ago) 5d
rke2-coredns-rke2-coredns-565dfc7d75-mnq9l 1/1 Running 1 (4d22h ago) 5d
rke2-coredns-rke2-coredns-565dfc7d75-rs5tx 1/1 Running 1 (4d22h ago) 5d
rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-lxznf 1/1 Running 1 (4d22h ago) 5d
rke2-ingress-nginx-controller-24hcv 1/1 Running 1 (4d22h ago) 5d
rke2-ingress-nginx-controller-rmsmj 1/1 Running 1 (4d22h ago) 5d
rke2-ingress-nginx-controller-w5l8l 1/1 Running 1 (4d22h ago) 5d
rke2-metrics-server-c9c78bd66-zf69j 1/1 Running 2 (4d22h ago) 5d
rke2-snapshot-controller-6f7bbb497d-vdxs7 1/1 Running 1 (4d22h ago) 5d
rke2-snapshot-validation-webhook-65b5675d5c-dkvn6 1/1 Running 2 (4d22h ago) 5d
kubectl get po -n calico-system
No resources found in calico-system namespace.
kubectl get ns
NAME STATUS AGE
default Active 5d
kube-node-lease Active 5d
kube-public Active 5d
kube-system Active 5d
I’ve never seen that calico-system namespace, even on the Rancher install that technically reached the “success screen” (but didn’t end up working well). If it’s related at all, I did not install cert-manager, as we are using our own certs from our on-prem CA and the documentation indicated it isn’t needed.
Yes, the YAML file you are using for the install.
@OUberLord Thank you for the previous output.
Here are some additional questions:
- Can you show me the helm install command you used to install Rancher?
- Can you show the output of kubectl get secret -n cattle-system, taken 3-4 minutes after running the helm install command?
- Please also share the output of kubectl get ing rancher -n cattle-system -o yaml
Some observations:
- Don’t worry about calico, because you are deploying canal instead and I do see the pods are active.
- kube-proxy, coreDNS and Ingress Nginx seem to be healthy. So, it should work.
- The fact that cert-manager is not installed is an important piece of information, because it means you have to do stuff in a very specific way to make Rancher work. However, your error message does not make much sense to me in that context.
The goal will be to:
- Check that you are using the right options to install Rancher without cert-manager.
- Verify that the Rancher ingress object is well configured
- Check that there is a certificate secret in the right namespace.
@belgaied2 Absolutely, and thank you for taking the time to help me out.
One thing I did notice: when I showed you the namespace listing earlier, I had forgotten that I had reverted the servers, installed RKE2 and gotten the cluster working, but had NOT yet proceeded to the Rancher install in this iteration of the environment. Thus, cattle-system was “missing” from that output.
The helm install command I’ve been using is:
helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=rancher-demo.localdomain.local --set bootstrapPassword=admin --set ingress.tls.source=secret --set privateCA=true --set tls=external
Regarding the secrets, this is where I’m wondering if I could be doing something incorrectly. During one of my earliest attempts at the install, after the helm install I was seeing errors about missing secrets (the ones that you later manually create). I began to create the namespace as well as the secrets first, and then run the helm install. The commands look like the following:
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=tls.crt --key=tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=cacerts.pem
For clarity’s sake, should I be creating these secrets first and then running the helm install, or is that incorrect? If I create them first then run the install, here is the output after 3-4 minutes:
kubectl get secret -n cattle-system
NAME TYPE DATA AGE
bootstrap-secret Opaque 1 3m59s
git-webhook-api-service-token-qvpcj kubernetes.io/service-account-token 3 2m17s
helm-operation-7t9lp Opaque 3 21s
helm-operation-dhl29 Opaque 3 83s
pod-impersonation-helm-op-nss67-token kubernetes.io/service-account-token 3 86s
pod-impersonation-helm-op-zfqxn-token kubernetes.io/service-account-token 3 23s
rancher-token-lwzm4 kubernetes.io/service-account-token 3 2m59s
serving-cert kubernetes.io/tls 2 2m8s
sh.helm.release.v1.rancher.v1 helm.sh/release.v1 1 3m59s
tls-ca Opaque 1 4m47s
tls-rancher kubernetes.io/tls 2 2m8s
tls-rancher-ingress kubernetes.io/tls 2 4m52s
tls-rancher-internal kubernetes.io/tls 2 2m7s
tls-rancher-internal-ca kubernetes.io/tls 2 2m8s
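A listing like the one above can be checked mechanically for the two manually created secrets. This is a rough sketch only: check_rancher_secrets is a hypothetical helper, and it checks nothing beyond the secret names and the TYPE column, so it says nothing about whether the certificate, key, and CA actually match.

```shell
# Read `kubectl get secret -n cattle-system` output on stdin and report
# whether tls-rancher-ingress (type kubernetes.io/tls) and tls-ca exist.
# Prints "ok" and returns 0 when both are found, otherwise returns 1.
check_rancher_secrets() {
    awk '
        $1 == "tls-rancher-ingress" { ingress = $2 }
        $1 == "tls-ca"              { ca = $2 }
        END {
            rc = 0
            if (ingress != "kubernetes.io/tls") {
                print "tls-rancher-ingress: missing or not kubernetes.io/tls"
                rc = 1
            }
            if (ca == "") {
                print "tls-ca: missing"
                rc = 1
            }
            if (rc == 0) print "ok"
            exit rc
        }'
}
```

Usage would be kubectl get secret -n cattle-system | check_rancher_secrets, run either before or after the helm install.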
As for the ingress output:
kubectl get ing rancher -n cattle-system -o yaml
Error from server (NotFound): ingresses.networking.k8s.io "rancher" not found
Also, at the same point in time after the install attempt, this is what the namespaces look like:
kubectl get ns
NAME STATUS AGE
cattle-fleet-system Active 3m12s
cattle-global-data Active 3m55s
cattle-global-nt Active 3m54s
cattle-impersonation-system Active 3m40s
cattle-system Active 6m43s
default Active 6d6h
fleet-default Active 4m3s
fleet-local Active 4m44s
kube-node-lease Active 6d6h
kube-public Active 6d6h
kube-system Active 6d6h
local Active 4m8s
p-6tkpk Active 3m49s
p-g2z6x Active 3m49s
It may be noteworthy that, approximately 15 minutes after the install attempt, the secrets list looks like this, with a lot of what seem to be duplicates being created:
kubectl get secret -n cattle-system
NAME TYPE DATA AGE
bootstrap-secret Opaque 1 13m
cattle-webhook-ca kubernetes.io/tls 2 3m30s
cattle-webhook-tls kubernetes.io/tls 2 3m30s
git-webhook-api-service-token-qvpcj kubernetes.io/service-account-token 3 12m
helm-operation-7t9lp Opaque 3 10m
helm-operation-dhl29 Opaque 3 11m
helm-operation-l968s Opaque 3 3m56s
helm-operation-ljs8h Opaque 3 7m3s
helm-operation-m4w2h Opaque 3 8m5s
helm-operation-p67qj Opaque 3 9m8s
helm-operation-tdqtr Opaque 3 6m1s
helm-operation-wd7hz Opaque 3 4m58s
pod-impersonation-helm-op-4mhj6-token kubernetes.io/service-account-token 3 8m7s
pod-impersonation-helm-op-hz4kd-token kubernetes.io/service-account-token 3 9m10s
pod-impersonation-helm-op-k42x8-token kubernetes.io/service-account-token 3 7m5s
pod-impersonation-helm-op-nj8cs-token kubernetes.io/service-account-token 3 3m58s
pod-impersonation-helm-op-nss67-token kubernetes.io/service-account-token 3 11m
pod-impersonation-helm-op-rvtxb-token kubernetes.io/service-account-token 3 6m3s
pod-impersonation-helm-op-w6nww-token kubernetes.io/service-account-token 3 5m
pod-impersonation-helm-op-zfqxn-token kubernetes.io/service-account-token 3 10m
rancher-token-lwzm4 kubernetes.io/service-account-token 3 12m
serving-cert kubernetes.io/tls 2 11m
sh.helm.release.v1.rancher-webhook.v1 helm.sh/release.v1 1 3m35s
sh.helm.release.v1.rancher.v1 helm.sh/release.v1 1 13m
tls-ca Opaque 1 14m
tls-rancher kubernetes.io/tls 2 11m
tls-rancher-ingress kubernetes.io/tls 2 14m
tls-rancher-internal kubernetes.io/tls 2 11m
tls-rancher-internal-ca kubernetes.io/tls 2 11m
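To put a number on the apparent duplicates, the helm-operation secrets can be counted from each listing. A small sketch, with count_helm_operations as a hypothetical helper name; it only counts names that start with helm-operation- and makes no claim about why they are being created.

```shell
# Read `kubectl get secret` output on stdin and print how many secret
# names start with "helm-operation-".
count_helm_operations() {
    awk 'index($1, "helm-operation-") == 1 { n++ } END { print n + 0 }'
}
```

Applied to the two listings in this post, that gives 2 at roughly four minutes and 8 at roughly fifteen minutes after the install attempt.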