Cluster unavailable - Failed to communicate with API server - waiting for cluster agent to connect

I have a small cluster (1 node). It worked perfectly for a while, but one day I started getting errors that Rancher couldn’t connect to the API server.

Rebooting the node seemed to keep it alive for a few more hours, then it failed again. Killing the agent container and letting it restart also reconnects the node. If I immediately go to view the workloads in the Rancher 2 UI, it times out and shows an error. If I wait a while before pulling up the UI, it works, and I can also manage the node with kubectl. But after a little while, it goes back to the error state.

The Rancher server is in the US, the node in South Africa. Ping time is about 220 ms. From the US, I can pull up the web applications running on the node. It’s not very fast, but it works.

Also, from the Rancher container I can curl the API endpoints it claims are timing out, e.g. https://x.x.x.x:6443/version. It only takes a second to pull up that URL.
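For reference, this is roughly how I’m testing from inside the Rancher container (a sketch; x.x.x.x is the node’s API server address, -k skips certificate verification since I’m only measuring reachability, and the -w timing fields are standard curl write-out variables):

```shell
# Break down where the time goes on the API call Rancher says is timing out:
# DNS lookup, TCP connect, TLS handshake, and total round trip.
curl -k -s -o /dev/null \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' \
  https://x.x.x.x:6443/version
```

The total consistently comes back in about a second, which is what I’d expect over a 220 ms link.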

So I really don’t see why it disconnects, or why it has such a hard time keeping the connection alive.

That same Rancher server manages a different cluster of about 20 nodes, all on the same LAN as the server, and that one works fine.

Any pointers on what I should look into?