Cattle node agent - Certificate chain is not complete

Hi,

I have a cluster with 4 nodes. I’ve been using it for several months, but today I noticed that one cattle-node-agent was Unavailable (it may have been unavailable from the start, as I didn’t check it after installing Rancher). I tried a redeploy of the DaemonSet, and now 3 out of 4 agents are unavailable.

These 3 agents keep restarting in a loop with the following error message:

level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get https://rancher.adibox.be: x509: certificate signed by unknown authority"

And the agent that is still running is also giving a warning about that certificate:

level=error msg="Failed to connect to proxy. Empty dialer response" error="x509: certificate signed by unknown authority"
level=error msg="Remotedialer proxy error" error="x509: certificate signed by unknown authority"

But my certificate is signed by Let’s Encrypt, so it is not a self-signed one. I may have tried a self-signed one when I was first installing Rancher, but I’ve since updated the LoadBalancer ingress to use the signed one.

When I go to rancher → settings → cacerts, it shows me an empty certificate.
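
To see what the Rancher URL actually serves, here is a minimal Python sketch (standard library only) that tries a verified TLS handshake. The function names build_context and check_server are hypothetical, and the optional cafile argument is an assumption standing in for a CA file such as the one that would go into cacerts. With cafile left unset, the system trust store is used. A failure in that mode would be consistent with the "unknown authority" error above, though it does not prove the agents see the same thing.

```python
import socket
import ssl
from typing import Optional, Tuple


def build_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    # cafile=None: trust the system CA store.
    # cafile set: trust only the CA certificates in that file.
    return ssl.create_default_context(cafile=cafile)


def check_server(host: str, port: int = 443,
                 cafile: Optional[str] = None,
                 timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a verified TLS handshake and report the outcome."""
    ctx = build_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                # 'issuer' is a tuple of RDNs, each a tuple of (key, value) pairs.
                issuer = dict(rdn[0] for rdn in tls.getpeercert()["issuer"])
                org = issuer.get("organizationName", "unknown")
                return True, f"verified, issuer organization: {org}"
    except ssl.SSLCertVerificationError as exc:
        return False, f"verification failed: {exc.verify_message}"
    except OSError as exc:
        return False, f"connection failed: {exc}"


# Example call (needs network access to the Rancher host):
# print(check_server("rancher.adibox.be"))
```

If the handshake verifies and the issuer is Let’s Encrypt, the ingress is probably serving the right certificate, and the agents’ own trust settings would be the next thing to look at.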

I tried to update the tls-rancher-ingress secret, first through the Rancher UI and then by deleting and recreating the secret, but it still fails. I’ve also tried creating a tls-ca secret, but my pods are still in an Unavailable state.
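
Since the error asks whether all intermediate certificates are included, one quick check is to count the PEM blocks in the certificate stored in the secret (the tls.crt key of a Kubernetes TLS secret). The sketch below is only that, a sketch: the function names are hypothetical, it assumes the certificate text has already been exported into a string, and it only counts blocks without validating their order or signatures.

```python
import re
from typing import List

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_chain(pem_text: str) -> List[str]:
    """Return every PEM certificate block in pem_text, in file order."""
    return PEM_CERT.findall(pem_text)


def describe_chain(pem_text: str) -> str:
    """Summarize how many certificates the bundle contains."""
    count = len(split_pem_chain(pem_text))
    if count == 0:
        return "no certificate found"
    if count == 1:
        return "leaf only, no intermediate certificate included"
    return f"{count} certificates (leaf plus {count - 1} more)"
```

A result of "leaf only" would point to a secret built from the certificate alone rather than from the full chain file.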

My questions are:

  1. What is the purpose of the cattle-node-agents? What will happen if they are all down?
  2. Is it possible to check which certificate the node agents are using? If yes, how?
  3. Is it possible to update the certificate they are using? If yes, how?

Thanks for reading.

Any help is welcome.

I’m getting this too with a single node. What’s your Rancher deployment script?

The one I’m testing with is here: https://github.com/TheShellLand/antsable/blob/master/playbooks/roles/rancher/tasks/main.yaml

I’m trying to deploy:

  • a single-node Rancher
  • no external TLS
  • install cert-manager with helm
  • install rancher with helm

I think my problem is that I need to set Rancher to a different port, because the Kubernetes ingress ports conflict with Rancher’s 80 and 443.
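
To confirm a port conflict like that before changing anything, a small Python sketch can check whether something is already listening on 80 and 443. The function name port_in_use is hypothetical, and it assumes the check is run on the node itself against 127.0.0.1.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Example: report the two ports Rancher and the ingress both want.
# for p in (80, 443):
#     print(p, "in use" if port_in_use(p) else "free")
```

This only shows that a listener exists, not which process owns it.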