While giving Rancher 2.5.5 a shot, I installed the new monitoring solution (9.4.202) on a fresh RKE v1.19.7 Kubernetes cluster. The underlying virtual nodes run on OpenStack, but they were provisioned manually; the cloud operator wasn’t used to create those machines.
All worker and master nodes have an internal IPv4 address as well as an external IPv6 address. During the creation of the cluster, I enforced the use of the internal v4 addresses. Internal v6 communication is not required, because the cluster itself is reachable via an IPv4/6 load balancer in front of it.
The problem is that after the installation, Prometheus shows many targets as down, which results in incomplete dashboards in Grafana (e.g. cattle-monitoring-system/rancher-monitoring-kube-etcd/0 (0/3 up)).
The error is: server returned HTTP status 500 Internal Server Error. My best guess is that this is an issue with IPv6. The scraped URL is: http://[IPv6]:2379/metrics
Another error I see is at “cattle-monitoring-system/rancher-monitoring-node-exporter”: Get “http://[IPv6]:9796/metrics”: dial tcp [IPv6]:9796: connect: cannot assign requested address.
The problem could be that when I describe a node, the IPv6 address is listed before the IPv4 one. Maybe the Prometheus discovery receives the first address of the node, which in this case would be a v6 address:
Addresses:
  InternalIP:  IPv6
  InternalIP:  10.200.200.27
  Hostname:    k8s-master-1
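To make that guess concrete, here is a minimal Python sketch of a "take the first InternalIP" selection. It is only an illustration of my assumption, not confirmed behaviour of Prometheus or the kubelet. The list mirrors the shape of a Node's .status.addresses, and 2001:db8::27 is a hypothetical placeholder for the redacted IPv6 address.

```python
import ipaddress


def first_internal_ip(addresses):
    """Return the first entry of type InternalIP, or None if there is none."""
    for entry in addresses:
        if entry["type"] == "InternalIP":
            return entry["address"]
    return None


# Same order as in the node description above.
# 2001:db8::27 is a placeholder (documentation range), not the real address.
node_addresses = [
    {"type": "InternalIP", "address": "2001:db8::27"},
    {"type": "InternalIP", "address": "10.200.200.27"},
    {"type": "Hostname", "address": "k8s-master-1"},
]

picked = first_internal_ip(node_addresses)
version = ipaddress.ip_address(picked).version
print(f"{picked} is IPv{version}")
```

If whatever builds the scrape targets works like this, the v6 address wins simply because of its position in the list.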
The issue is nicely visible when exec’ing into the Prometheus pod and trying to wget one of the failing endpoints:
$ kubectl -n cattle-monitoring-system exec -it prometheus-rancher-monitoring-prometheus-0 -- sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-rancher-monitoring-prometheus-0 -n cattle-monitoring-system' to see all of the containers in this pod.
/prometheus $ wget -O. http://[IPv6]:2379/metrics
Connecting to [IPv6]:2379 ([IPv6]:2379)
wget: can't connect to remote host: Cannot assign requested address
/prometheus $ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 8e:f4:56:71:68:2d brd ff:ff:ff:ff:ff:ff
    inet 10.42.3.9/32 brd 10.42.3.9 scope global eth0
       valid_lft forever preferred_lft forever
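The ip a output lists only inet (IPv4) addresses, which would explain the "Cannot assign requested address" error for a v6 destination. As a rough sketch of that check, the Python snippet below flags scrape URLs whose address family has no matching address inside the pod. The function names are made up for illustration, 2001:db8::27 again stands in for the redacted IPv6 address, and the sketch assumes the URL host is a literal IP rather than a DNS name.

```python
import ipaddress
from urllib.parse import urlsplit


def address_families(addresses):
    """Return the set of IP versions (4 and/or 6) present in the list."""
    return {ipaddress.ip_address(a).version for a in addresses}


def targets_without_matching_family(pod_addresses, target_urls):
    """Return the URLs whose host uses an IP version the pod has no address for."""
    available = address_families(pod_addresses)
    flagged = []
    for url in target_urls:
        host = urlsplit(url).hostname  # brackets around IPv6 hosts are stripped
        if ipaddress.ip_address(host).version not in available:
            flagged.append(url)
    return flagged


# Addresses taken from the ip a output of the Prometheus pod.
pod_addresses = ["127.0.0.1", "10.42.3.9"]

# First URL uses a placeholder IPv6 address; the second is the node's IPv4 address.
target_urls = [
    "http://[2001:db8::27]:2379/metrics",
    "http://10.200.200.27:2379/metrics",
]

flagged = targets_without_matching_family(pod_addresses, target_urls)
print(flagged)
```

Only the v6 URL is flagged here, which matches what wget reports from inside the pod.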
So obviously the pod can’t reach that node’s v6 endpoint because the CNI is v4-only. However, I never configured the nodes to use v6 in any way. They just grabbed their IPv6 address from the interface list, and due to the way OpenStack networking is set up in my specific cluster, I can’t disable it. I tried to deconfigure v6 on the node’s network interface, but the Kubernetes node still has the v6 address assigned.
Maybe somebody has had the same issue and has already figured out how to fix it?