I'm new to Rancher. I deployed the Docker image, and when I opened the UI I saw a cluster named local already created.
Ignoring that, I created a new cluster using the option "Use existing nodes and create a cluster using RKE", and then added my Ubuntu VM with all three roles (etcd, control plane, worker).
After that, the cluster shows as Active, but unlike the local cluster, the Explore option is disabled for the cluster I created. As a result, there is not much I can do, such as looking at Nodes or Pods, or deploying anything via the UI.
This is the view (see the cluster named k81-lemieux):
There is no active cluster agent connection. You can check the logs of the cluster agent pod on the node(s) to see why it cannot connect; once that is fixed, you should be able to explore the cluster.
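If you have kubectl access to the downstream cluster, something like the following should surface the agent's errors (a sketch assuming the default cattle-system namespace, deployment name, and app label):

# Check whether the cluster agent pod is running at all
kubectl -n cattle-system get pods -l app=cattle-cluster-agent
# Tail its logs to see why it cannot reach the Rancher server
kubectl -n cattle-system logs deploy/cattle-cluster-agent --tail=100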
I'm having the same issue. It makes the Explorer unavailable for the cluster on the Rancher server.
Rancher version: 2.6.0 (RKE1)
Kubernetes version: v1.20.11-rancher1-2
Downstream cluster: Custom (via docker) on new Ubuntu 20.04 QEMU/KVM VM nodes.
All 3 of my masters and all 3 of my workers can resolve the domain just fine. The CoreDNS pods just show some i/o timeout errors:
linux/amd64, go1.15.3, 054c9ae
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0924 16:47:56.859902 1 trace.go:205] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (24-Sep-2021 16:47:26.859) (total time: 30000ms):
Trace[1427131847]: [30.000324176s] [30.000324176s] END
I0924 16:47:56.859923 1 trace.go:205] Trace[911902081]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (24-Sep-2021 16:47:26.859) (total time: 30000ms):
Trace[911902081]: [30.000158123s] [30.000158123s] END
E0924 16:47:56.859925 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
E0924 16:47:56.859933 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
I0924 16:47:56.859937 1 trace.go:205] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (24-Sep-2021 16:47:26.859) (total time: 30000ms):
Trace[939984059]: [30.000156648s] [30.000156648s] END
E0924 16:47:56.859941 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get "https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
Does CoreDNS forward its DNS requests to the DNS servers that the host has configured? It would appear so judging by the log output, but it's not behaving that way.
I don't have any wildcards involved in my DNS setup yet; however, I am doing split-horizon DNS, where rancher.mydomain.com resolves to the public- or private-facing IP depending on which side of the network you're on. It resolves correctly when tested with nslookup and curl from the host.
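One way to confirm what CoreDNS actually forwards to (a sketch that assumes the stock coredns configmap RKE deploys in kube-system) is to dump the Corefile and check the forward line:

kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
# The default Corefile contains "forward . /etc/resolv.conf", meaning CoreDNS
# forwards non-cluster names to whatever resolv.conf the CoreDNS pod sees,
# which normally points at the node's upstream DNS servers.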
I'm new to Rancher, so please excuse me if I'm asking obvious questions; I have tried searching both these forums and Google.
What's the name of the image the cluster agent is spawned from? Same question for the DNS service. The only name that makes me think of either of these is rancher-agent:v2.6.0, a container that exits gracefully (exit code 0) and whose log doesn't indicate any problems. It is the one used for originally registering my Kubernetes nodes with Rancher, but was it supposed to keep running?
Edit:
I think I found it: the container was spawned from the same image but is called k8s_cluster-register_cattle-cluster-agent-something, and it did indeed show signs of not being able to resolve the name of the Rancher server. So I suspect I can use the same workaround as @TheRealAlexV while I try to figure out the root cause of this failure.
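For anyone else hunting for it, this is roughly how I located it on the node (the container name suffix will differ on your node):

# List the cluster-register container(s) the kubelet started on this node
docker ps -a --filter name=cluster-register --format '{{.ID}} {{.Names}} {{.Status}}'
# Read its logs to see the DNS resolution errors
docker logs --tail 100 <container-id>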
Same issue here. After replacing the control plane nodes of an existing cluster, it became unavailable in Cluster Explorer.
cattle-system/cattle-cluster-agent cannot resolve the name of the Rancher host anymore.
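You can see the failure from inside the agent pod with something like this (a sketch; it assumes curl is present in the agent image, which its own /ping startup check suggests, and <rancher-hostname> is a placeholder):

# Run the same /ping check the agent entrypoint does, from inside the pod
kubectl -n cattle-system exec deploy/cattle-cluster-agent -- curl -sv https://<rancher-hostname>/ping
# A "Could not resolve host" error here confirms the problem is pod-level DNS,
# even when the node itself resolves the name fine.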
Same problem for me. Just took down my working 2.5 cluster and created a new cluster with 2.6.2. The system says all the nodes are active, but I am unable to use the explore function.
Is this a bug in Rancher 2.6, or is everyone suddenly doing something wrong? Is anyone from Rancher investigating?
Same issue; this happened after upgrading to the new Helm chart version. We deleted and re-imported the clusters, and it worked for a few days before it happened again. We don't see any errors anywhere. Is anyone investigating this?
INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local redacted.net options ndots:5
INFO: https://rancher.redacted.com/ping is accessible
INFO: rancher.redacted.com resolves to 10.50.94.128
INFO: Value from https://rancher.redacted.com/v3/settings/cacerts is an x509 certificate
time="2021-11-10T15:56:25Z" level=info msg="Listening on /tmp/log.sock"
time="2021-11-10T15:56:25Z" level=info msg="Rancher agent version v2.6.2 is starting"
time="2021-11-10T15:56:25Z" level=info msg="Connecting to wss://rancher.redacted.com/v3/connect/register with token starting with q76dtp85nzjpc6grqvhj5kfmp2k"
time="2021-11-10T15:56:25Z" level=info msg="Connecting to proxy" url="wss://rancher.redacted.com/v3/connect/register"
These are the logs from the cattle-cluster-agent pod in the cluster, and yet I can't explore it in the UI.
I can see the machines that are part of the cluster, but the Explore button appears greyed out. This started happening only after the recent upgrade to 2.6.
To dive deeper into our infrastructure and the issue at hand: we are importing RKE clusters into Rancher, not provisioning them with Rancher. As soon as I delete the cluster and re-register it, the cluster is fine and I can explore it; it is only after some time (no pattern here) that I can no longer explore it. The cattle pods never fail or hint at any errors.
I did notice that deleting the cluster in Rancher does not correctly schedule the Rancher resources for deletion. I have no idea what is going on, but I am intrigued that not many people seem to have this issue.
I've imported the clusters again and they are working for the moment.
The issue is that after some time they just stop working. They don't appear unhealthy in any way; the Explore button just goes gray.
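One thing worth checking on the Rancher (local) cluster side when this happens is the downstream cluster object itself, since the UI shows nothing unhealthy (a sketch; it assumes kubectl access to the local cluster where Rancher runs and the management.cattle.io CRDs):

# List the downstream clusters Rancher knows about (IDs look like c-xxxxx)
kubectl get clusters.management.cattle.io
# Inspect the affected cluster and look for any status condition that has gone False
kubectl get clusters.management.cattle.io <cluster-id> -o yaml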
EDIT:
Like this actually:
Same issue here, since I upgraded Rancher to v2.6. It was working fine on Rancher v2.5.
I don't provision clusters with Rancher, I only import them, and I noticed that when Rancher generates the k8s manifest to import the desired cluster and we run that manifest against the cluster more than once, it causes this issue. Although we can continue accessing the cluster through kubectl, I cannot explore it.
This is not good when we want to put those manifests in a pipeline or into ArgoCD, for example, because they will apply the manifests over and over; since they are declarative, that shouldn't cause any issue.
Of course, recreating the cluster in Rancher fixes the issue, but I cannot keep recreating it all the time.
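For reference, what our pipeline re-runs is essentially the registration command Rancher shows when you import a cluster (a sketch; the server URL below is a placeholder and the exact manifest path comes from the registration screen in the Rancher UI):

# Apply the import/registration manifest Rancher generated for this cluster
curl -sfL https://new-rancher.example.com/v3/import/<registration-token>.yaml | kubectl apply -f -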
Logs coming from the cattle-agent on the imported cluster:
INFO: Environment: CATTLE_ADDRESS=10.27.155.90 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://172.20.34.246:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://172.20.34.246:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=172.20.34.246 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://172.20.34.246:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=172.20.34.246 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=172.20.34.246 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-667bb4f9fd-r9pdr CATTLE_SERVER=https://new-rancher.redacted CATTLE_SERVER_VERSION=v2.6.2
INFO: Using resolv.conf: nameserver 172.20.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local ec2.internal options ndots:5
INFO: https://new-rancher.redacted/ping is accessible
INFO: new-rancher.redacted resolves to redacted
time="2021-12-10T22:03:46Z" level=info msg="Listening on /tmp/log.sock"
time="2021-12-10T22:03:46Z" level=info msg="Rancher agent version v2.6.2 is starting"
time="2021-12-10T22:03:46Z" level=info msg="Connecting to wss://new-rancher.redacted/v3/connect/register with token starting with redacted"
time="2021-12-10T22:03:46Z" level=info msg="Connecting to proxy" url="wss://new-rancher.redacted/v3/connect/register"
I have the same problem because fleet-agent doesn't recognize the certificate signed by our corporate CA. When I created the cluster, I started rancher-agent with SSL_CERT_DIR pointing at the corporate CA certificate, and everything works fine except fleet-agent. It seems like a bug.
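For context, the node registration looked roughly like this (a sketch based on the standard custom-cluster command; the CA path, server URL, token, and checksum are placeholders for our real values):

# Mount the corporate CA into the agent and point SSL_CERT_DIR at it so the agent trusts it
sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  -v /opt/corp-ca:/etc/corp-ca:ro \
  -e SSL_CERT_DIR=/etc/corp-ca \
  rancher/rancher-agent:v2.6.2 \
  --server https://rancher.example.com --token <token> --ca-checksum <checksum> \
  --etcd --controlplane --worker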