Cluster agent in endless crash loop

seimic · March 6, 2020, 3:07pm

Hi,

Any idea how to investigate the root cause of the following error?
It's a user cluster registered as a "Custom" RKE cluster in Rancher HA 2.3.5.
The nodes are created externally; no infrastructure provider (AWS etc.) is used.

After this message nothing else is logged, and the cluster agent crashes.

Setup:

Rancher HA Cluster: v2.3.5
Nginx as external load balancer (in the same network)
User cluster: rancher/rancher-agent:v2.3.5
Both clusters: Kubernetes v1.16.6-rancher1-2
Everything in the same network

Any help appreciated.

Kind regards,
Michael

seimic · March 7, 2020, 11:06pm

I installed the user cluster again, this time without changing the CIDRs, so all pods are now in the 10.42.0.0/16 subnet and services in 10.43.0.0/16.
The error remains the same, just with a different IP:

level=fatal msg="Get https://10.43.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: dial tcp 10.43.0.1:443: i/o timeout"
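For reference, a reinstall without CIDR changes falls back to the RKE defaults below (the `services` section of cluster.yml). The built-in `kubernetes` Service gets the first address of `service_cluster_ip_range`, which is why the agent now dials 10.43.0.1; a timeout there suggests that traffic from the agent's pod to that Service IP is not getting through. The values shown are the documented RKE defaults, not settings taken from this cluster.

```yaml
# RKE default network ranges (shown for reference; omitting them has the same effect)
services:
  kube-api:
    service_cluster_ip_range: 10.43.0.0/16   # Service IPs; 10.43.0.1 = kubernetes API Service
  kube-controller:
    cluster_cidr: 10.42.0.0/16               # pod IPs handed out via the CNI plugin
    service_cluster_ip_range: 10.43.0.0/16   # must match kube-api
  kubelet:
    cluster_dns_server: 10.43.0.10           # CoreDNS Service IP inside the service range
```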

I ran the DNS checks from here: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/
There are no errors in the CoreDNS logs, and the upstream DNS is reachable from all nodes (in this example just one control plane node and one worker).

The cattle-node-agent pods work fine.
I can't find the reason.
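One way to narrow this down is to repeat the failing request from two throwaway pods on the worker: one on the pod network (like cattle-cluster-agent) and one with `hostNetwork: true` (which, as far as I know, is how cattle-node-agent runs). A minimal sketch, assuming the pod names, the `curlimages/curl` image and the node name `worker-1` are placeholders to adapt:

```yaml
# Hypothetical debug pods: both call the Service IP the cluster agent times out on.
# Any HTTP reply (even 401/403) means 10.43.0.1:443 is reachable; a timeout reproduces the problem.
apiVersion: v1
kind: Pod
metadata:
  name: api-test-podnet            # hypothetical name; traffic goes through the CNI (Weave/Canal)
  namespace: default
spec:
  nodeName: worker-1               # assumption: replace with the worker's node name
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl       # assumption: any image that ships curl works
      command: ["curl", "-sSk", "--max-time", "10", "https://10.43.0.1:443/version"]
---
apiVersion: v1
kind: Pod
metadata:
  name: api-test-hostnet           # hypothetical name; bypasses the pod network
  namespace: default
spec:
  nodeName: worker-1               # assumption: same node as above
  hostNetwork: true                # uses the node's own network stack
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl
      command: ["curl", "-sSk", "--max-time", "10", "https://10.43.0.1:443/version"]
```

Apply it with `kubectl apply -f` and compare `kubectl logs api-test-podnet` with `kubectl logs api-test-hostnet`. If only the pod-network test times out, the CNI plugin is the likely culprit rather than DNS or the load balancer.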

seimic · March 9, 2020, 12:34pm

OK, I replaced the Weave CNI with Canal and it works, but I would prefer Weave because of its encryption.

This config fails as described above:

network:
  options:
    flannel_backend_type: vxlan
  plugin: weave
  weave_network_provider:
    password: ...

With Canal it works right away:

network:
  options:
    flannel_backend_type: vxlan
  plugin: canal

It's a very basic setup with one control plane node and one worker node, used to evaluate the whole automated setup.
(See https://rancher.com/docs/rke/latest/en/config-options/add-ons/network-plugins/)

Kind regards,
Michael
