Cluster agent in endless crash loop

Hi,

Any idea how to investigate the root cause of the following error?
It’s a user cluster registered as a “Custom” RKE cluster in Rancher HA 2.3.5.
The nodes are created externally; no infrastructure provider such as AWS is used.

After this message, nothing else is logged and the cluster agent crashes.

Setup:

Rancher HA Cluster: v2.3.5
Nginx as external load balancer (in the same network)
User cluster: rancher/rancher-agent:v2.3.5
Both clusters: Kubernetes v1.16.6-rancher1-2
Everything is in the same network

Any help appreciated.

Kind regards,
Michael

I installed the user cluster again, this time without changing the CIDRs, so all pods are now in the 10.42.0.0/16 subnet and services in 10.43.0.0/16.
The error remains the same, just with a different IP:
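For reference, in an RKE cluster configuration these two ranges are set through the service options below. This is a sketch based on the RKE service options documentation; the values shown are also RKE's defaults, so the block is only needed when changing them. In Rancher's "Edit as YAML" view it sits under `rancher_kubernetes_engine_config`.

```yaml
services:
  kube-api:
    # Range used for Service ClusterIPs; must match kube-controller below
    service_cluster_ip_range: 10.43.0.0/16
  kube-controller:
    # Pod network range (cluster CIDR)
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
```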

level=fatal msg="Get https://10.43.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: dial tcp 10.43.0.1:443: i/o timeout"
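10.43.0.1 is the first address of the 10.43.0.0/16 service range, which Kubernetes assigns to the built-in `kubernetes` Service in the `default` namespace; in-cluster clients such as the cluster agent reach the API server through it. One way to check whether the pod network itself can open that connection is a throwaway test pod. The manifest below is a minimal sketch: the pod name, the `curlimages/curl` image and the node name `worker-1` are assumptions to adjust for your cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apiserver-connectivity-test   # hypothetical name
  namespace: default
spec:
  restartPolicy: Never
  # Hypothetical node name: pin the pod to the worker node you want to test
  nodeName: worker-1
  containers:
    - name: curl
      image: curlimages/curl
      # -k skips certificate verification, -v logs the connection attempt,
      # --max-time gives up after 5 seconds instead of hanging
      command: ["curl", "-kv", "--max-time", "5", "https://10.43.0.1:443/version"]
```

After `kubectl apply -f` on the file, `kubectl logs apiserver-connectivity-test` shows the result. Any HTTP response, even `401 Unauthorized`, means pods can reach the API server through the service IP; a timeout reproduces the agent's `i/o timeout` and points at the CNI plugin rather than at DNS.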

I ran the DNS checks from https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/
There are no errors in CoreDNS, and the upstream DNS server is reachable from all nodes (in this example, just one control plane node and one worker).

The cattle-node-agents work fine.
I can’t find the reason. :grimacing:

OK, I replaced the Weave CNI with Canal and it works, but I would prefer Weave because it supports encryption.

This fails as described above:

network:
    options:
      flannel_backend_type: vxlan
    plugin: weave
    weave_network_provider:
      password: ...

Canal works right away:

network:
    options:
      flannel_backend_type: vxlan
    plugin: canal

It’s a very basic setup, with one control plane node and one worker, to evaluate the fully automated setup.
(https://rancher.com/docs/rke/latest/en/config-options/add-ons/network-plugins/)

Kind regards,
Michael