We have set up a production HA Rancher cluster on K3s v1.20.6+k3s1. It's a 2-node setup with certs from GeoTrust. It sits behind a Citrix LB, and the certs and setup seem fine (at least to browsers).
We have not been able to import a new RKE cluster into Rancher. It stays stuck at:
This cluster is currently Provisioning; areas that interact directly with it will not be available until the API is ready.
Waiting for etcd, controlplane and worker nodes to be registered
I have gone over the TLS setup and verified the certs we used to create the tls-rancher-ingress secret:
[root@rancher-pgh02 ]# openssl verify -verbose -CAfile <(cat digicertglobalroot.pem DigiCertIntCA.crt) rancher_somewhere_com.crt
rancher_somewhere_com.crt: OK
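Note that the check above passes because the intermediate is handed to openssl via -CAfile. It does not show which certificates the LB actually sends to a client. A minimal sketch for counting the certificates presented on the wire, assuming the LB answers on port 443 with SNI for the hostname (the function names are made up for illustration):

```shell
# Count the PEM certificates in whatever is piped in on stdin.
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----'
}

# Ask the server for its chain and count what it presents.
# Assumption: HTTPS is served on port 443 and SNI matches the hostname.
served_chain_count() {
  host="$1"
  openssl s_client -connect "${host}:443" -servername "${host}" -showcerts \
    </dev/null 2>/dev/null | count_pem_certs
}
```

Running `served_chain_count rancher.somewhere.com` from the node being registered and getting 1 would suggest that only the leaf certificate is being served, without the intermediate.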
We created the setup with the hostname set to the LB DNS name.
helm install rancher rancher-stable/rancher \
--namespace cattle-system \
--set hostname=rancher.somewhere.com \
--set ingress.tls.source=secret
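For reference, with ingress.tls.source=secret the tls.crt in the tls-rancher-ingress secret is expected to hold the server certificate followed by the intermediate. A minimal sketch of assembling such a bundle, assuming both input files are PEM encoded (build_fullchain is a hypothetical helper name):

```shell
# Concatenate leaf first, then intermediate, into one PEM bundle.
# "awk 1" reprints every line with a trailing newline, so a leaf file
# that lacks a final newline does not run into the next certificate.
build_fullchain() {
  leaf="$1"
  intermediate="$2"
  out="$3"
  awk 1 "$leaf" "$intermediate" > "$out"
}
```

Usage would look like `build_fullchain rancher_somewhere_com.crt DigiCertIntCA.crt tls.crt`, followed by `kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=tls.crt --key=tls.key`, where tls.key is an assumed name for the private key file.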
The process that adds the node to Rancher keeps cycling, with this in the docker logs:
time="2021-05-25T12:29:54Z" level=info msg="node kubelet-cp-pgh01 is not registered, restarting kubelet now"
time="2021-05-25T12:29:54Z" level=info msg="Listening on /tmp/log.sock"
time="2021-05-25T12:29:54Z" level=info msg="Rancher agent version v2.5.7 is starting"
time="2021-05-25T12:29:54Z" level=info msg="Option customConfig=map[address:10.70.12.196 internalAddress: label:map[] roles:[controlplane] taints:[]]"
time="2021-05-25T12:29:54Z" level=info msg="Option etcd=false"
time="2021-05-25T12:29:54Z" level=info msg="Option controlPlane=true"
time="2021-05-25T12:29:54Z" level=info msg="Option worker=false"
time="2021-05-25T12:29:54Z" level=info msg="Option requestedHostname=kubelet-cp-pgh01"
time="2021-05-25T12:29:54Z" level=info msg="Certificate details from https://rancher.somewhere.com"
time="2021-05-25T12:29:54Z" level=info msg="Certificate #0 (https://rancher.somewhere.com)"
time="2021-05-25T12:29:54Z" level=info msg="Subject: CN=rancher.somewhere.com,O=Some Where Systems LLC,L=Wellesley,ST=Massachusetts,C=US"
time="2021-05-25T12:29:54Z" level=info msg="Issuer: CN=DigiCert TLS RSA SHA256 2020 CA1,O=DigiCert Inc,C=US"
time="2021-05-25T12:29:54Z" level=info msg="IsCA: false"
time="2021-05-25T12:29:54Z" level=info msg="DNS Names: [rancher.somewhere.com www.rancher.somewhere.com]"
time="2021-05-25T12:29:54Z" level=info msg="IPAddresses: <none>"
time="2021-05-25T12:29:54Z" level=info msg="NotBefore: 2021-05-04 00:00:00 +0000 UTC"
time="2021-05-25T12:29:54Z" level=info msg="NotAfter: 2022-05-09 23:59:59 +0000 UTC"
time="2021-05-25T12:29:54Z" level=info msg="SignatureAlgorithm: SHA256-RSA"
time="2021-05-25T12:29:54Z" level=info msg="PublicKeyAlgorithm: RSA"
time="2021-05-25T12:29:54Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://rancher.somewhere.com\": x509: certificate signed by unknown authority"
I have restarted the Rancher pods, tried variations of the cert chain in the tls.crt file, and added both the intermediate and root certs to the LB setup.
Any ideas on how to get the RKE cluster to register? We are willing to start over and redo the whole Rancher setup if needed.
TIA.