Unready status on cattle-cluster-agent

Hello.
I have cattle-cluster-agent in unready status. Restarting the container does not help. Here is the log; it looks like a problem with the cert chain:

INFO: Environment: CATTLE_ADDRESS=10.42.1.171 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-5dcd9c744d-d5t4z CATTLE_SERVER=https://192.168.100.15
INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local openstacklocal options ndots:5
INFO: https://192.168.100.15/ping is accessible
time="2019-11-21T07:39:28Z" level=info msg="Rancher agent version v2.3.2 is starting"
time="2019-11-21T07:39:28Z" level=info msg="Listening on /tmp/log.sock"
time="2019-11-21T07:39:28Z" level=info msg="Certificate details from https://192.168.100.15"
time="2019-11-21T07:39:28Z" level=info msg="Certificate #0 (https://192.168.100.15)"
time="2019-11-21T07:39:28Z" level=info msg="Subject: CN=cattle,O=the-ranch"
time="2019-11-21T07:39:28Z" level=info msg="Issuer: CN=cattle-ca,O=the-ranch"
time="2019-11-21T07:39:28Z" level=info msg="IsCA: false"
time="2019-11-21T07:39:28Z" level=info msg="DNS Names: []"
time="2019-11-21T07:39:28Z" level=info msg="IPAddresses: [168.70.37.116 188.68.33.224 192.168.100.15 185.252.90.14 127.0.0.1 74.125.193.138 173.194.73.113 112.35.66.7 89.208.85.139 47.92.7.135 83.143.86.62 64.233.161.100 122.3.89.189 112.35.88.28 5.188.210.101 82.146.38.46 36.77.209.10 123.125.114.144 172.217.7.14 36.85.208.230]"
time="2019-11-21T07:39:28Z" level=info msg="NotBefore: 2019-04-09 10:17:44 +0000 UTC"
time="2019-11-21T07:39:28Z" level=info msg="NotAfter: 2020-11-18 03:59:00 +0000 UTC"
time="2019-11-21T07:39:28Z" level=info msg="SignatureAlgorithm: SHA256-RSA"
time="2019-11-21T07:39:28Z" level=info msg="PublicKeyAlgorithm: RSA"
time="2019-11-21T07:39:28Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get https://192.168.100.15: x509: certificate signed by unknown authority"

How can I fix it?

When you deployed the Rancher cluster with RKE, did you specify the cert to use and save that cert into a secret? Not doing that can cause this.

Also, you will probably want to set hostname.

Install like this (if using a private CA):
sudo helm install rancher-latest/rancher \
  --name rancher \
  --namespace cattle-system \
  --set hostname=<your-loadbalanced-hostname> \
  --set ingress.tls.source=secret \
  --set privateCA=true \
  --set additionalTrustedCAs=true \
  --set addLocal=false

To add the cert that matches your loadbalanced hostname (saved as tls.crt and tls.key):
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=tls.crt --key=tls.key

To import your CA cert:
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem

To import additional trusted CAs (optional):
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem
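
Before creating these secrets, it can help to confirm that cacerts.pem really is the CA that signed tls.crt, because a mismatch produces the same "certificate signed by unknown authority" error. Below is a minimal sketch, assuming openssl and sha256sum are installed. The function names chain_ok and ca_checksum are made up for illustration. As far as I know, the agent's CATTLE_CA_CHECKSUM is the SHA-256 of the cacerts value, so the second helper gives something to compare against, but treat that as an assumption and check it for your version.

```shell
# Succeeds only if the certificate verifies against the given CA bundle.
# Usage: chain_ok cacerts.pem tls.crt
chain_ok() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Prints the SHA-256 hex digest of a CA file, for comparison with the
# CATTLE_CA_CHECKSUM value shown in the agent's environment line.
# Usage: ca_checksum cacerts.pem
ca_checksum() {
  sha256sum "$1" | awk '{print $1}'
}

# Example (run in the directory that holds the files):
#   chain_ok cacerts.pem tls.crt && echo "chain verifies"
#   ca_checksum cacerts.pem
```

If chain_ok fails, the usual suspects are a missing intermediate certificate in tls.crt or a cacerts.pem that belongs to a different CA.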

Initially, I didn’t change anything in the certs. I just deployed single-node Rancher without a DNS name, then created a new custom cluster using the docker run commands from the GUI. I have an nginx reverse proxy for accessing the Rancher GUI over https.

Some time ago, I tried to change the Rancher server name to a public DNS name, using Let’s Encrypt certs for https. It did not work: my cluster lost its connection to the Rancher server, and I restored from backup. That was a few months ago, and I have done a few Rancher and Kubernetes upgrades since then.

This is probably the issue.

Here is my old topic, which got no replies: Switch to Let's Encrypt

In the Rancher settings, cacerts is empty.

So I generated new self-signed certificates as described here (https://gist.github.com/superseb/f129ad4204ca119249db00965acf657a) and mounted them into the container at /etc/rancher/ssl.
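
A sketch of what such a mount can look like for a single-node install. The container paths under /etc/rancher/ssl are the ones the Rancher single-node documentation describes for bringing your own certificate. The host directory, the file names (cert.pem, key.pem, ca.pem), the published ports and the image tag are assumptions to adjust. The helper rancher_run_cmd is hypothetical and only prints the command, so it can be reviewed before running it:

```shell
# Print (not run) a docker command that mounts a cert, key and CA
# into the paths the Rancher container reads at startup.
# Usage: rancher_run_cmd /host/cert/dir [image]
rancher_run_cmd() {
  dir="$1"
  image="${2:-rancher/rancher:latest}"   # image tag is an assumption
  printf 'docker run -d --restart=unless-stopped -p 80:80 -p 443:443'
  printf ' -v %s/cert.pem:/etc/rancher/ssl/cert.pem' "$dir"
  printf ' -v %s/key.pem:/etc/rancher/ssl/key.pem' "$dir"
  printf ' -v %s/ca.pem:/etc/rancher/ssl/cacerts.pem' "$dir"
  printf ' %s\n' "$image"
}

# Show the command for certificates generated into ./certs
rancher_run_cmd "$PWD/certs"
```

The cacerts.pem mount is the part that matters for the agent: it is what tells Rancher which CA the agents should trust.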

So far, there have been no agent restarts.

@sergey

I am running a K8s cluster (kubeadm) on top of EC2.

I followed this gist to generate self-signed certificates (Generate self signed certificates for Rancher 2.x · GitHub):

PUBIP=`curl -s http://169.254.169.254/latest/meta-data/public-ipv4`
HIP=`ip -o -4 addr list ens224 | awk '{print $4}' | cut -d/ -f1`

docker run -v $PWD/certs:/certs \
  -e SSL_SUBJECT=rancher.$PUBIP.nip.io \
  -e SSL_DNS=rancher.$PUBIP.nip.io,localhost,ingress.local,rancher.$HIP.nip.io \
  -e SSL_IP=127.0.0.1,10.0.0.1 \
  superseb/omgwtfssl

kubectl create namespace cattle-system

kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=$PWD/certs/cert.pem --key=$PWD/certs/key.pem

helm repo add rancher-latest https://releases.rancher.com/server-charts/latest

helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.$PUBIP.nip.io \
  --set replicas=1 \
  --set ingress.tls.source=secret \
  --version 2.6.3

kubectl get po

NAME                                    READY   STATUS             RESTARTS   AGE
cattle-cluster-agent-765756ff9c-2hth4   0/1     CrashLoopBackOff   6          10m
helm-operation-lzv8z                    0/2     Completed          0          12m
helm-operation-w8886                    0/2     Completed          0          11m
rancher-54d5457f76-dz946                1/1     Running            0          13m
rancher-webhook-5d4f5b7f6d-6zpmg        1/1     Running            0          11m
Cluster agent error:
oc logs -f pod/cattle-cluster-agent-765756ff9c-2hth4
INFO: Environment: CATTLE_ADDRESS=10.244.0.41 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://10.102.241.35:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.102.241.35:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.102.241.35 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.102.241.35:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.102.241.35 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.102.241.35 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=6ff81677-bbe6-4fbb-a05b-d80e75b68df2 CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-765756ff9c-2hth4 CATTLE_SERVER=https://rancher.18.188.45.142.nip.io CATTLE_SERVER_VERSION=v2.6.3
INFO: Using resolv.conf: nameserver 10.96.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal options ndots:5
INFO: https://rancher.18.188.45.142.nip.io/ping is accessible
INFO: rancher.18.188.45.142.nip.io resolves to 18.188.45.142
time="2022-03-21T02:57:23Z" level=info msg="Listening on /tmp/log.sock"
time="2022-03-21T02:57:23Z" level=info msg="Rancher agent version v2.6.3 is starting"
time="2022-03-21T02:57:23Z" level=info msg="Certificate details from https://rancher.18.188.45.142.nip.io"
time="2022-03-21T02:57:23Z" level=info msg="Certificate #0 (https://rancher.18.188.45.142.nip.io)"
time="2022-03-21T02:57:23Z" level=info msg="Subject: CN=rancher.18.188.45.142.nip.io"
time="2022-03-21T02:57:23Z" level=info msg="Issuer: CN=test-ca"
time="2022-03-21T02:57:23Z" level=info msg="IsCA: false"
time="2022-03-21T02:57:23Z" level=info msg="DNS Names: [rancher.18.188.45.142.nip.io localhost ingress.local rancher.172.31.5.145.nip.io rancher.18.188.45.142.nip.io]"
time="2022-03-21T02:57:23Z" level=info msg="IPAddresses: [127.0.0.1 10.0.0.1]"
time="2022-03-21T02:57:23Z" level=info msg="NotBefore: 2022-03-21 02:53:24 +0000 UTC"
time="2022-03-21T02:57:23Z" level=info msg="NotAfter: 2022-05-20 02:53:24 +0000 UTC"
time="2022-03-21T02:57:23Z" level=info msg="SignatureAlgorithm: SHA256-RSA"
time="2022-03-21T02:57:23Z" level=info msg="PublicKeyAlgorithm: RSA"
time="2022-03-21T02:57:23Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://rancher.18.188.45.142.nip.io\": x509: certificate signed by unknown authority"
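
One thing stands out when comparing these commands with the earlier reply in this thread: the certificate is issued by a private CA (Issuer: CN=test-ca), but no tls-ca secret is created, privateCA=true is not set, and CATTLE_CA_CHECKSUM is empty in the agent environment. A sketch of the possibly missing steps follows. It assumes the generator wrote the CA certificate to certs/ca.pem (check the directory, the file name is an assumption). The run and add_private_ca helpers are hypothetical, and commands are only printed unless DRY_RUN=0 is set:

```shell
# Print each command by default; set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: add_private_ca /path/to/ca.pem rancher-hostname
add_private_ca() {
  ca_file="$1"   # CA certificate file (assumed name: certs/ca.pem)
  host="$2"      # same hostname as in the original install
  # Store the CA under the key cacerts.pem, matching the earlier reply.
  run kubectl -n cattle-system create secret generic tls-ca \
    --from-file=cacerts.pem="$ca_file"
  # Repeat the original install values and add privateCA=true.
  run helm upgrade rancher rancher-latest/rancher \
    --namespace cattle-system \
    --set hostname="$host" \
    --set replicas=1 \
    --set ingress.tls.source=secret \
    --set privateCA=true \
    --version 2.6.3
}

add_private_ca "$PWD/certs/ca.pem" "rancher.${PUBIP:-PUBLIC_IP}.nip.io"
```

If the printed commands look right, rerun with DRY_RUN=0 and then check whether the agent log still reports an empty CATTLE_CA_CHECKSUM.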