Cluster unavailable - Failed to communicate with API server - waiting for cluster agent to connect

etlweather · February 28, 2019, 7:06pm

I have a small cluster (1 node). It worked perfectly for a while, but one day I started getting errors that Rancher couldn’t connect to the API server.

Rebooting the node seems to have kept it alive for a few more hours, then it failed again. Killing the agent container and letting it restart also reconnects the node. If I immediately go to see the workload in the Rancher 2 UI, it times out and shows an error. If I wait a while before pulling up the UI, and if I use kubectl, I can manage the node. But after a little while, it goes back to the error state.

The Rancher server is in the US, the node in South Africa. Ping time is about 220ms. I can pull up the web applications from the US that are running on the node. It’s not very fast but it works.

Also, from the Rancher container, I can curl the API calls it claims are timing out, e.g. https://x.x.x.x:6443/version. It only takes a second to pull up that URL.
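
A single curl only shows the endpoint is reachable at that moment. To see whether it stays reachable, the same request could be repeated on a schedule and timed. The following is a minimal sketch, not part of the original setup: the names `probe`, `watch`, and `summarize` are hypothetical, the URL is the placeholder from above, and certificate verification is turned off on the assumption that the API server presents a self-signed certificate (similar to `curl -k`).

```python
import ssl
import time
import urllib.error
import urllib.request


def _fetch(url, timeout):
    """GET url and return the HTTP status code."""
    # Assumption: self-signed certificate, so verification is skipped.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # An HTTP error status (e.g. 401) still proves the server answered.
        return exc.code


def probe(url, timeout=10.0, fetch=_fetch):
    """Run one request and return (ok, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        detail = fetch(url, timeout)
        ok = True
    except Exception as exc:  # timeouts, resets, DNS failures, ...
        detail = repr(exc)
        ok = False
    return ok, time.monotonic() - start, detail


def watch(url, count=60, interval=30.0, fetch=_fetch, sleep=time.sleep):
    """Probe url `count` times, `interval` seconds apart; return all results."""
    results = []
    for i in range(count):
        result = probe(url, fetch=fetch)
        results.append(result)
        print(time.strftime("%H:%M:%S"), result)
        if i < count - 1:
            sleep(interval)
    return results


def summarize(results):
    """Return (number_of_failures, slowest_elapsed_seconds)."""
    failures = sum(1 for ok, _, _ in results if not ok)
    slowest = max((secs for _, secs, _ in results), default=0.0)
    return failures, slowest
```

Running `summarize(watch("https://x.x.x.x:6443/version"))` from inside the Rancher container for a few hours would show whether the failures line up with the times the UI reports the cluster as unavailable.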

So I really don’t see why it disconnects and has such a hard time keeping the connection working.

That same Rancher server has a different cluster on it with about 20 nodes which are all on the same LAN and that works fine.

Any pointers on what I should look into?
