vSphere cluster creation gets stuck in the provisioning state

Creating a VMware cluster gets stuck in the provisioning state, and the node logs say: level=fatal msg="Certificate chain is not complete". Any idea what causes this error?
Rancher version 2.5.x.

The error is self-explanatory: the certificate chain is incomplete. The rest of the log contains details on the certificates in use, which help diagnose what is going on. Please share the complete log and how you set up Rancher (type of install, type of certificates, and the commands you used).

Rancher is running on an RKE cluster set up on a VMware cluster.
The Rancher server uses nginx with HTTPS set up. Rancher was installed via the Helm chart using a certificate from a secret, with no private CA:
helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname= --set ingress.tls.source=secret

The TLS secret was created with the command below:

kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=wildcard.crt --key=wildcard.key

What CA is the certificate from? Posting the full log will also provide these details.

How many certificates are in wildcard.crt?

It's DigiCert.
There are 2 certificates in wildcard.crt.
Which logs do you need? Would the kubelet logs from the Docker machine work in this case?

The agent logs the full certificate details, which should be enough to diagnose the issue: it lists each certificate it finds along with its details, so we can see which certificate and intermediate certificate are configured. Without that log I have to ask for the information manually: what is the order of the certificates in wildcard.crt, what is the CN of the first certificate, and what is the CN of the second?
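
The questions about certificate count and order can usually be answered from the file itself. A minimal sketch, assuming wildcard.crt is a PEM bundle and that openssl is installed; count_certs and list_chain are hypothetical helper names, not part of Rancher:

```shell
# Count the PEM certificate blocks in a bundle file.
count_certs() {
  grep -c -e 'BEGIN CERTIFICATE' "$1"
}

# Print the subject and issuer of every certificate, in file order.
# Assumes the openssl CLI is available on the machine.
list_chain() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Usage (not run here):
#   count_certs wildcard.crt
#   list_chain wildcard.crt
```

In a correctly ordered bundle, the first subject is the server certificate and the second subject matches the first certificate's issuer.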

The same check is also available in the Docker image superseb/ranchercheck, so running docker run --net=host superseb/ranchercheck https://rancher.yourdomain.com should also give enough details to diagnose.

docker run --net=host superseb/ranchercheck https://rancher.test.com
No value configured at https://rancher.test.com/v3/settings/cacerts, assuming certificate signed by trusted CA
Certificate details from https://rancher.test.com
Certificate #0 (https://rancher.test.com)
Subject: CN=.test.com,O=test Company L.P.,L=test,ST=Virginia,C=US
Issuer: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
IsCA: false
DNS Names: [
.test.com test.com]
IPAddresses:
NotBefore: 2020-04-28 00:00:00 +0000 UTC
NotAfter: 2022-07-14 12:00:00 +0000 UTC
SignatureAlgorithm: SHA256-RSA
PublicKeyAlgorithm: RSA
Certificate chain is not complete, error: Get https://rancher.test.com: x509: certificate signed by unknown authority

Only one certificate is presented. So although the file contains two certificates, they are either in the wrong order or wrongly formatted, because the server is not presenting them correctly.
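
For a bundle like this, the usual fix is to put the server certificate first and the intermediate second, each as its own PEM block, and then recreate the secret. A sketch under assumptions: server.crt and intermediate.crt are hypothetical file names for the two separate certificates, and build_chain is a hypothetical helper. The thread does not say which change was actually made to the certificate.

```shell
# Concatenate leaf and intermediate into one bundle, leaf first.
# Adds a newline between them if the leaf file does not end with one,
# so the END and BEGIN markers never end up on the same line.
build_chain() {
  leaf=$1 intermediate=$2 out=$3
  {
    cat "$leaf"
    if [ -n "$(tail -c 1 "$leaf")" ]; then echo; fi
    cat "$intermediate"
  } > "$out"
}

# Usage (not run here; file names are assumptions):
#   build_chain server.crt intermediate.crt wildcard.crt
#   kubectl -n cattle-system create secret tls tls-rancher-ingress \
#     --cert=wildcard.crt --key=wildcard.key \
#     --dry-run=client -o yaml | kubectl apply -f -
```

Re-running the superseb/ranchercheck command afterwards should then list both certificates.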

I just made some changes to the cert, and now I see the output below:

docker run --net=host superseb/ranchercheck https://rancher.test.com
No value configured at https://rancher.test.com/v3/settings/cacerts, assuming certificate signed by trusted CA
Certificate details from https://rancher.test.com
Certificate #0 (https://rancher.test.com)
Subject: CN=.test.com,O=test Company L.P.,L=Reston,ST=Virginia,C=US
Issuer: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
IsCA: false
DNS Names: [
.test.com test.com]
IPAddresses:
NotBefore: 2020-04-28 00:00:00 +0000 UTC
NotAfter: 2022-07-14 12:00:00 +0000 UTC
SignatureAlgorithm: SHA256-RSA
PublicKeyAlgorithm: RSA
Certificate #1 (https://rancher.test.com)
Subject: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
Issuer: CN=DigiCert High Assurance EV Root CA,OU=www.digicert.com,O=DigiCert Inc,C=US
IsCA: true
DNS Names:
IPAddresses:
NotBefore: 2013-10-22 12:00:00 +0000 UTC
NotAfter: 2028-10-22 12:00:00 +0000 UTC
SignatureAlgorithm: SHA256-RSA
PublicKeyAlgorithm: RSA
Certificate chain is complete, connection to https://rancher.test.com established successfully.

I am still getting the error while provisioning the vmware cluster:

Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host “10.0.xx.xx:2376”: dial tcp 10.0.xx.xxx:2376: connect: no route to host

This is a different error and is not related to the certificate configured to access Rancher. Can Rancher access the specified IP on port 2376? Is Docker running on the machine with that IP? What OS is the machine running? What OS image was used?

See Rancher Docs: Technical for how to access the machine to diagnose.
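
Whether the Docker port is reachable can be checked directly from the node where Rancher runs. A minimal sketch, assuming bash and the timeout command are available there; port_open is a hypothetical helper, and the address is the redacted one from the error message. On Linux, "no route to host" can also result from a firewall rule that rejects the connection, so the check is best run from the Rancher side rather than from a workstation.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Uses bash's /dev/tcp redirection, so bash must be installed.
port_open() {
  host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage (not run here):
#   port_open 10.0.xx.xx 2376 && echo reachable || echo unreachable
```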

CentOS 7.
firewalld is disabled, and I am able to telnet to the machine as well.
Docker is running on that node, and the service is running fine.

You can turn on trace logging to get all the logging available to diagnose the issue, as described in Rancher Docs: Logging.

Keep in mind that this log contains sensitive info so you can’t share the log without redacting that info.
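
For a Helm install like the one in this thread, the log level is typically changed inside each Rancher pod. A sketch, assuming the pods carry the label app=rancher in the cattle-system namespace and that the container is named rancher, as in the chart defaults; set_rancher_loglevel is a hypothetical wrapper name:

```shell
# Set the log level (info, debug, or trace) on every Rancher server pod.
# Assumes kubectl is configured for the cluster that runs Rancher.
set_rancher_loglevel() {
  level=${1:-debug}
  kubectl -n cattle-system get pods -l app=rancher --no-headers \
    -o custom-columns=name:.metadata.name |
  while read -r pod; do
    # </dev/null keeps kubectl from consuming the pod list on stdin
    kubectl -n cattle-system exec "$pod" -c rancher -- \
      loglevel --set "$level" </dev/null
  done
}

# Usage (not run here):
#   set_rancher_loglevel trace   # while reproducing the problem
#   set_rancher_loglevel info    # restore the default afterwards
```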

As it turns out, I was able to get through the provisioning of the VMware cluster, and the nodes have registered as well.

Thanks for all your input, I really appreciate it.