We have a Rancher cluster that was lost in a DR test and that we are trying to restore. After several days we still cannot get it running again with an identical config: 3 masters and 3 nodes.
Running RKE v0.1.9 on an Azure stack, we have tried the following:
- Cleaning out all remaining certs and pruning the Docker system.
- Redeploying the entire cluster.
- Recreating the config file.
However, the issue persists: the API server is not available.
We have found some errors, but nothing obvious. How would we begin to troubleshoot?
The main one we have found in the kube-apiserver logs is:
http: TLS handshake error from abc:13758
However, we can't get a clear idea of why the server is unavailable if the certs are generated cleanly.
What causes this? We're kind of in a bind, as we need this environment for other operations.
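One way to narrow this down might be to repeat the handshake from a client machine and look at the certificate the API server actually presents, since the log line above is written on the server side and only names the client that failed. Below is a minimal Python sketch of that idea, not anything RKE ships. The function names `probe` and `cert_problems` are made up for illustration, and the address `10.0.0.10`, the port `6443`, and the CA path `/etc/kubernetes/ssl/kube-ca.pem` in the usage note are assumptions that would need to be replaced with the real values for this cluster.

```python
import socket
import ssl
from datetime import datetime, timezone


def cert_problems(cert, expected_names, now=None):
    """Return a list of problems found in a peer certificate.

    `cert` is a dict in the shape returned by ssl.SSLSocket.getpeercert().
    `expected_names` are the hostnames/IPs clients use to reach the API server.
    """
    now = now or datetime.now(timezone.utc)
    problems = []

    # Validity window: a skewed clock on a rebuilt node can make a fresh cert look invalid.
    not_before = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notBefore"]), timezone.utc)
    not_after = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), timezone.utc)
    if now < not_before:
        problems.append("certificate is not valid yet")
    if now > not_after:
        problems.append("certificate has expired")

    # Subject alternative names: every address clients dial should be listed.
    sans = {value for _kind, value in cert.get("subjectAltName", ())}
    for name in expected_names:
        if name not in sans:
            problems.append(f"{name} is missing from subjectAltName")
    return problems


def probe(host, port, ca_file):
    """Perform a TLS handshake against host:port and return the peer certificate.

    Raises ssl.SSLError if the certificate is not signed by the CA in `ca_file`,
    which would suggest the server cert and the CA on disk do not match.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False  # name mismatches are reported by cert_problems
    context.load_verify_locations(cafile=ca_file)
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()
```

Under those assumptions it could be called as `cert_problems(probe("10.0.0.10", 6443, "/etc/kubernetes/ssl/kube-ca.pem"), ["10.0.0.10"])`. An empty list would mean the certificate chains to that CA, is inside its validity window, and lists the address. An `ssl.SSLError` from `probe` would point at a CA mismatch, for example an old CA file left over from before the rebuild.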