Hello,
Following the Rancher docs on k3s HA, I have created a 3-node k3s cluster. This is going to be a management cluster that will manage several Kubernetes clusters. But already with the second cluster I created, a 3-node HA k3s control plane with 5 worker nodes added, importing fails or does not complete.
I have this layout in rancher:
Whenever I try to access the imported user cluster I get the error below. Much of the functionality is available through the new cluster dashboard, and I can actually deploy workloads, e.g. logging, monitoring, Longhorn storage… but when I then try to access the deployed Longhorn storage page, I again get:
error trying to reach service: waiting for cluster [c-pbvxc] agent to connect
If I take a look at the Rancher agent pod, no significant errors are visible:
k get all -n cattle-system
NAME READY STATUS RESTARTS AGE
pod/cattle-cluster-agent-6c7b995bf8-n759r 1/1 Running 0 84m
pod/helm-operation-knhcr 0/2 Completed 0 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cattle-cluster-agent 1/1 1 1 25h
NAME DESIRED CURRENT READY AGE
replicaset.apps/cattle-cluster-agent-6c7b995bf8 1 1 1 6h23m
k logs cattle-cluster-agent-6c7b995bf8-n759r -n cattle-system
INFO: Environment: CATTLE_ADDRESS=172.16.2.6 CATTLE_CA_CHECKSUM=84395dbd1258fb10e3f80acfafa49fada160175ec155237fb9a02fd23f7efdde CATTLE_CLUSTER=true CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-6c7b995bf8-n759r CATTLE_SERVER=https://rancher.lan.omitted-address
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local sidk.lan nameserver 172.17.0.10 options ndots:5
INFO: https://rancher.lan.sidk.systems/ping is accessible
INFO: rancher.lan.omitted-address resolves to 10.0.0.45
INFO: Value from https://rancher.lan.omitted-address/v3/settings/cacerts is an x509 certificate
time="2021-07-18T13:40:21Z" level=info msg="Listening on /tmp/log.sock"
time="2021-07-18T13:40:21Z" level=info msg="Rancher agent version v2.5.9 is starting"
time="2021-07-18T13:40:21Z" level=info msg="Connecting to wss://rancher.lan.omitted-address/v3/connect/register with token z2p7266fxb9vbw2pdtk5ptgj8wjqfkbvzkf4t78mx4x5d6nch9575w"
time="2021-07-18T13:40:21Z" level=info msg="Connecting to proxy" url="wss://rancher.lan.omitted-address/v3/connect/register"
time="2021-07-18T13:40:21Z" level=info msg="Starting user controllers"
W0718 13:40:21.448769 56 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0718 13:40:21.467856 56 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
time="2021-07-18T13:40:21Z" level=info msg="Registering namespaceHandler for adding labels "
time="2021-07-18T13:40:21Z" level=info msg="Starting workload controllers"
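For completeness, this is roughly how I compare the two sides of the tunnel (a sketch; it assumes kubectl access to both clusters, and the label selectors assume a default Rancher install — "c-pbvxc" is the cluster ID from the error above):

```shell
#!/bin/sh
# Sketch: compare the downstream agent side with the Rancher server side.

# Downstream cluster: is the agent pod healthy and what is it logging?
check_downstream() {
  kubectl -n cattle-system get pods -l app=cattle-cluster-agent
  kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=50
}

# Management cluster: does Rancher itself log a tunnel for c-pbvxc?
check_rancher() {
  kubectl -n cattle-system logs -l app=rancher --tail=500 | grep c-pbvxc
}

# Run only where kubectl is actually available.
if command -v kubectl >/dev/null; then
  check_downstream
  check_rancher
fi
```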
rancher.lan is exposed via the MetalLB LB address seen above, 10.0.0.45, and is accessible normally. I have also added port 80 to that LB, and it resolves correctly as well.
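For reference, this is roughly how I verify the resolution and the /ping endpoint against the LB from a node (a sketch; the hostname is the redacted one from the logs above):

```shell
#!/bin/sh
# Sketch: verify the MetalLB-exposed Rancher endpoint from a node.
host=rancher.lan.omitted-address   # redacted hostname from the logs above

# Does the name resolve to the MetalLB address (10.0.0.45)?
resolve() { getent hosts "$1" | awk '{print $1}'; }
echo "resolves to: $(resolve "$host")"

# /ping should answer "pong" over the LB (443, and the added port 80).
if command -v curl >/dev/null; then
  curl -sk "https://$host/ping"; echo
  curl -s  "http://$host/ping"; echo
fi
```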
The nodes are all Ubuntu 20.04.2 LTS with these settings applied:
systemctl stop ufw
iptables -P FORWARD ACCEPT
modprobe br_netfilter
sysctl -w net.ipv4.ip_forward=1
sysctl net.bridge.bridge-nf-call-iptables=1
sysctl --system
swapoff -a
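Since `modprobe` and ad-hoc `sysctl` calls don't survive a reboot, the persistent equivalent of the above would be something like (paths/filenames are just examples):

```
# /etc/modules-load.d/k3s.conf (example path)
br_netfilter

# /etc/sysctl.d/90-k3s.conf (example path)
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
```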
What am I doing wrong? Most of the time, managing the imported cluster fails, while managing the local cluster is OK. Sometimes the imported cluster fails with "API not ready" and similar errors; basically, it "works" only sporadically.
Any input/help/guidance is appreciated, as I am unable to proceed with importing the other clusters and am forced to stick with multiple Rancher deployments, which is highly undesirable.
Thank you
rgD