We have a Rancher cluster that was lost during a DR test, and we are trying to restore it. After several days we still cannot get it running again with an identical config (3 masters and 3 nodes).
We are running RKE v0.1.9 on an Azure stack and have tried the following:
Cleaning out all remaining certs and pruning the Docker system.
Redeploying the entire cluster.
Recreating the config file.
However, the issue persists: the API server is not available.
We have found some errors but nothing obvious. How would we begin to troubleshoot?
The main one we have found in the kube-apiserver logs is:
http: TLS handshake error from abc:13758
But we can't get a clear idea of why the server is unavailable if the certs are generated cleanly.
What causes this? We're kind of in a bind, as we need this environment for other operations.
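
In case it helps narrow this down, here is a minimal sketch of how the handshake errors could be grouped by client and reason. It assumes the log lines follow the format Go's net/http server uses, "http: TLS handshake error from <addr>: <reason>"; the line quoted above is cut off before the reason, so that part is treated as optional. The function name summarize_handshake_errors is just made up for this example.

```python
import re
from collections import Counter

# Assumed line format (Go net/http): "http: TLS handshake error from <host>:<port>: <reason>".
# The ": <reason>" part is optional because a copied line may be truncated.
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from "
    r"(?P<host>\[[^\]]+\]|[^\s:]+):(?P<port>\d+)"
    r"(?::\s*(?P<reason>.*))?$"
)


def summarize_handshake_errors(lines):
    """Count TLS handshake errors per (client host, reason) pair."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line.strip())
        if match is None:
            continue  # not a handshake error line
        reason = match.group("reason") or "(no reason logged)"
        counts[(match.group("host"), reason)] += 1
    return counts
```

Feeding it the kube-apiserver container logs line by line and printing `counts.most_common()` should show whether the errors come from one client or many, and whether the reason (if one is logged) is the same each time.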