Hi
I just installed a Rancher 2.0.3 HA cluster, and deployed a “custom” cluster through the UI. Everything seemed to have deployed smoothly.
However, after deploying a Docker image (jonlangemak/net_tools, useful for testing) to verify everything, I noticed DNS is not working. I checked, and kube-dns seems to be running fine:
➜ cluster (rancher) ✗ kubectl get pods -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
calico-node-748w7 2/2 Running 0 5h 10.165.214.62 829k8almdev2
calico-node-c9trb 2/2 Running 0 5h 10.165.210.206 829k8apddev3
calico-node-thf4n 2/2 Running 0 5h 10.165.210.204 829k8apddev1
calico-node-vhwjm 2/2 Running 0 5h 10.165.210.205 829k8apddev2
calico-node-vs6dr 2/2 Running 0 5h 10.165.214.64 829k8almdev3
calico-node-xw5g5 2/2 Running 0 5h 10.165.214.63 829k8almdev1
kube-dns-5ccb66df65-4h7gk 3/3 Running 0 5h 10.42.2.2 829k8almdev3
kube-dns-5ccb66df65-dhqf2 3/3 Running 0 4h 10.42.5.2 829k8almdev2
kube-dns-autoscaler-6c4b786f5-dk2rk 1/1 Running 0 5h 10.42.3.2 829k8apddev3
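For reference, the failing check from inside the test pod is just a basic lookup, something along these lines (pod name illustrative, assuming nslookup is available in the net_tools image):

kubectl exec -it net-test-bd89876f8-2hp5l -n infraplatform -- nslookup kubernetes.default

and that never resolves.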
Interestingly, from inside the pod I cannot ping the kube-dns pod IPs. Looking at the kube-dns service, whose ClusterIP is the same address that gets populated in the pod's /etc/resolv.conf, I cannot ping that either:
➜ cluster (rancher) ✗ kubectl get services -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP 5h
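That ClusterIP matches what ends up as the nameserver inside the pod; a quick check looks roughly like this (pod name illustrative):

kubectl exec -it net-test-bd89876f8-2hp5l -n infraplatform -- cat /etc/resolv.conf | grep nameserver
nameserver 10.43.0.10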
I am using Calico for overlay networking.
I'm not sure what I missed or did wrong, but does anyone have an idea what the problem could be?
Out of interest, I also tested on the cluster that runs Rancher itself (I followed this guide: https://rancher.com/docs/rancher/v2.x/en/installation/ha-server-install/) and I have exactly the same problem on that cluster.
I am guessing this is because kube-proxy does not seem to be working properly. Here is the kube-proxy log from one of the nodes:
+ echo kube-proxy --v=2 --healthz-bind-address=0.0.0.0 --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml
+ grep -q cloud-provider=azure
+ '[' kube-proxy = kubelet ']'
+ exec kube-proxy --v=2 --healthz-bind-address=0.0.0.0 --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml
I0626 08:34:13.950128 5858 flags.go:27] FLAG: --allow-verification-with-non-compliant-keys="false"
I0626 08:34:13.950187 5858 flags.go:27] FLAG: --alsologtostderr="false"
I0626 08:34:13.950193 5858 flags.go:27] FLAG: --application-metrics-count-limit="100"
I0626 08:34:13.950199 5858 flags.go:27] FLAG: --azure-container-registry-config=""
I0626 08:34:13.950206 5858 flags.go:27] FLAG: --bind-address="0.0.0.0"
I0626 08:34:13.950211 5858 flags.go:27] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
I0626 08:34:13.950217 5858 flags.go:27] FLAG: --cleanup="false"
I0626 08:34:13.950227 5858 flags.go:27] FLAG: --cleanup-iptables="false"
I0626 08:34:13.950231 5858 flags.go:27] FLAG: --cleanup-ipvs="true"
I0626 08:34:13.950236 5858 flags.go:27] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16"
I0626 08:34:13.950245 5858 flags.go:27] FLAG: --cluster-cidr=""
I0626 08:34:13.950248 5858 flags.go:27] FLAG: --config=""
I0626 08:34:13.950252 5858 flags.go:27] FLAG: --config-sync-period="15m0s"
I0626 08:34:13.950260 5858 flags.go:27] FLAG: --conntrack-max="0"
I0626 08:34:13.950267 5858 flags.go:27] FLAG: --conntrack-max-per-core="32768"
I0626 08:34:13.950274 5858 flags.go:27] FLAG: --conntrack-min="131072"
I0626 08:34:13.950279 5858 flags.go:27] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0626 08:34:13.950284 5858 flags.go:27] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0626 08:34:13.950287 5858 flags.go:27] FLAG: --container-hints="/etc/cadvisor/container_hints.json"
I0626 08:34:13.950293 5858 flags.go:27] FLAG: --containerd="unix:///var/run/containerd.sock"
I0626 08:34:13.950297 5858 flags.go:27] FLAG: --default-not-ready-toleration-seconds="300"
I0626 08:34:13.950303 5858 flags.go:27] FLAG: --default-unreachable-toleration-seconds="300"
I0626 08:34:13.950307 5858 flags.go:27] FLAG: --docker="unix:///var/run/docker.sock"
I0626 08:34:13.950312 5858 flags.go:27] FLAG: --docker-env-metadata-whitelist=""
I0626 08:34:13.950315 5858 flags.go:27] FLAG: --docker-only="false"
I0626 08:34:13.950319 5858 flags.go:27] FLAG: --docker-root="/var/lib/docker"
I0626 08:34:13.950323 5858 flags.go:27] FLAG: --docker-tls="false"
I0626 08:34:13.950326 5858 flags.go:27] FLAG: --docker-tls-ca="ca.pem"
I0626 08:34:13.950331 5858 flags.go:27] FLAG: --docker-tls-cert="cert.pem"
I0626 08:34:13.950336 5858 flags.go:27] FLAG: --docker-tls-key="key.pem"
I0626 08:34:13.950342 5858 flags.go:27] FLAG: --enable-load-reader="false"
I0626 08:34:13.950345 5858 flags.go:27] FLAG: --event-storage-age-limit="default=0"
I0626 08:34:13.950349 5858 flags.go:27] FLAG: --event-storage-event-limit="default=0"
I0626 08:34:13.950355 5858 flags.go:27] FLAG: --feature-gates=""
I0626 08:34:13.950364 5858 flags.go:27] FLAG: --global-housekeeping-interval="1m0s"
I0626 08:34:13.950367 5858 flags.go:27] FLAG: --google-json-key=""
I0626 08:34:13.950370 5858 flags.go:27] FLAG: --healthz-bind-address="0.0.0.0"
I0626 08:34:13.950374 5858 flags.go:27] FLAG: --healthz-port="10256"
I0626 08:34:13.950378 5858 flags.go:27] FLAG: --help="false"
I0626 08:34:13.950381 5858 flags.go:27] FLAG: --hostname-override=""
I0626 08:34:13.950384 5858 flags.go:27] FLAG: --housekeeping-interval="10s"
I0626 08:34:13.950388 5858 flags.go:27] FLAG: --iptables-masquerade-bit="14"
I0626 08:34:13.950391 5858 flags.go:27] FLAG: --iptables-min-sync-period="0s"
I0626 08:34:13.950394 5858 flags.go:27] FLAG: --iptables-sync-period="30s"
I0626 08:34:13.950398 5858 flags.go:27] FLAG: --ipvs-min-sync-period="0s"
I0626 08:34:13.950402 5858 flags.go:27] FLAG: --ipvs-scheduler=""
I0626 08:34:13.950407 5858 flags.go:27] FLAG: --ipvs-sync-period="30s"
I0626 08:34:13.950412 5858 flags.go:27] FLAG: --ir-data-source="influxdb"
I0626 08:34:13.950416 5858 flags.go:27] FLAG: --ir-dbname="k8s"
I0626 08:34:13.950419 5858 flags.go:27] FLAG: --ir-hawkular=""
I0626 08:34:13.950423 5858 flags.go:27] FLAG: --ir-influxdb-host="localhost:8080/api/v1/namespaces/kube-system/services/monitoring-influxdb:api/proxy"
I0626 08:34:13.950428 5858 flags.go:27] FLAG: --ir-namespace-only="false"
I0626 08:34:13.950433 5858 flags.go:27] FLAG: --ir-password="root"
I0626 08:34:13.950436 5858 flags.go:27] FLAG: --ir-percentile="90"
I0626 08:34:13.950443 5858 flags.go:27] FLAG: --ir-user="root"
I0626 08:34:13.950447 5858 flags.go:27] FLAG: --kube-api-burst="10"
I0626 08:34:13.950450 5858 flags.go:27] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0626 08:34:13.950454 5858 flags.go:27] FLAG: --kube-api-qps="5"
I0626 08:34:13.950459 5858 flags.go:27] FLAG: --kubeconfig="/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml"
I0626 08:34:13.950466 5858 flags.go:27] FLAG: --log-backtrace-at=":0"
I0626 08:34:13.950473 5858 flags.go:27] FLAG: --log-cadvisor-usage="false"
I0626 08:34:13.950476 5858 flags.go:27] FLAG: --log-dir=""
I0626 08:34:13.950480 5858 flags.go:27] FLAG: --log-flush-frequency="5s"
I0626 08:34:13.950484 5858 flags.go:27] FLAG: --loglevel="1"
I0626 08:34:13.950488 5858 flags.go:27] FLAG: --logtostderr="true"
I0626 08:34:13.950492 5858 flags.go:27] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
I0626 08:34:13.950496 5858 flags.go:27] FLAG: --masquerade-all="false"
I0626 08:34:13.950500 5858 flags.go:27] FLAG: --master=""
I0626 08:34:13.950503 5858 flags.go:27] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0626 08:34:13.950506 5858 flags.go:27] FLAG: --nodeport-addresses="[]"
I0626 08:34:13.950513 5858 flags.go:27] FLAG: --oom-score-adj="-999"
I0626 08:34:13.950518 5858 flags.go:27] FLAG: --profiling="false"
I0626 08:34:13.950522 5858 flags.go:27] FLAG: --proxy-mode=""
I0626 08:34:13.950527 5858 flags.go:27] FLAG: --proxy-port-range=""
I0626 08:34:13.950533 5858 flags.go:27] FLAG: --resource-container="/kube-proxy"
I0626 08:34:13.950537 5858 flags.go:27] FLAG: --stderrthreshold="2"
I0626 08:34:13.950542 5858 flags.go:27] FLAG: --storage-driver-buffer-duration="1m0s"
I0626 08:34:13.950546 5858 flags.go:27] FLAG: --storage-driver-db="cadvisor"
I0626 08:34:13.950551 5858 flags.go:27] FLAG: --storage-driver-host="localhost:8086"
I0626 08:34:13.950554 5858 flags.go:27] FLAG: --storage-driver-password="root"
I0626 08:34:13.950557 5858 flags.go:27] FLAG: --storage-driver-secure="false"
I0626 08:34:13.950561 5858 flags.go:27] FLAG: --storage-driver-table="stats"
I0626 08:34:13.950565 5858 flags.go:27] FLAG: --storage-driver-user="root"
I0626 08:34:13.950568 5858 flags.go:27] FLAG: --udp-timeout="250ms"
I0626 08:34:13.950571 5858 flags.go:27] FLAG: --v="2"
I0626 08:34:13.950574 5858 flags.go:27] FLAG: --version="false"
I0626 08:34:13.950583 5858 flags.go:27] FLAG: --vmodule=""
I0626 08:34:13.950587 5858 flags.go:27] FLAG: --write-config-to=""
W0626 08:34:13.950598 5858 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0626 08:34:13.950625 5858 feature_gate.go:226] feature gates: &{{} map[]}
I0626 08:34:13.952538 5858 iptables.go:198] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
time="2018-06-26T08:34:13Z" level=warning msg="Running modprobe ip_vs failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/3.10.0-862.3.3.el7.x86_64/modules.dep.bin'\nmodprobe: WARNING: Module ip_vs not found in directory /lib/modules/3.10.0-862.3.3.el7.x86_64`, error: exit status 1"
time="2018-06-26T08:34:13Z" level=error msg="Could not get ipvs family information from the kernel. It is possible that ipvs is not enabled in your kernel. Native loadbalancing will not work until this is fixed."
W0626 08:34:13.958268 5858 server_others.go:290] Can't use ipvs proxier, trying iptables proxier
I0626 08:34:13.959113 5858 server_others.go:140] Using iptables Proxier.
W0626 08:34:13.968233 5858 server.go:601] Failed to retrieve node info: nodes "829k8almdev1.cloudnl.digital.kpn.org" not found
W0626 08:34:13.968309 5858 proxier.go:298] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended
W0626 08:34:13.968321 5858 proxier.go:306] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0626 08:34:13.968325 5858 proxier.go:311] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0626 08:34:13.968382 5858 server_others.go:174] Tearing down inactive rules.
I0626 08:34:14.000286 5858 server.go:444] Version: v1.10.3
I0626 08:34:14.003968 5858 server.go:470] Running in resource-only container "/kube-proxy"
I0626 08:34:14.004438 5858 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 327680
I0626 08:34:14.004482 5858 conntrack.go:52] Setting nf_conntrack_max to 327680
I0626 08:34:14.004632 5858 mount_linux.go:196] Detected OS without systemd
I0626 08:34:14.004776 5858 conntrack.go:83] Setting conntrack hashsize to 81920
I0626 08:34:14.010875 5858 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0626 08:34:14.010908 5858 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0626 08:34:14.011065 5858 config.go:102] Starting endpoints config controller
I0626 08:34:14.011085 5858 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
I0626 08:34:14.011113 5858 config.go:202] Starting service config controller
I0626 08:34:14.011120 5858 controller_utils.go:1019] Waiting for caches to sync for service config controller
I0626 08:34:14.111205 5858 controller_utils.go:1026] Caches are synced for service config controller
I0626 08:34:14.111219 5858 controller_utils.go:1026] Caches are synced for endpoints config controller
I0626 08:34:14.111273 5858 service.go:310] Adding new service port "default/kubernetes:https" at 10.43.0.1:443/TCP
I0626 08:34:33.269146 5858 service.go:310] Adding new service port "kube-system/kube-dns:dns" at 10.43.0.10:53/UDP
I0626 08:34:33.269172 5858 service.go:310] Adding new service port "kube-system/kube-dns:dns-tcp" at 10.43.0.10:53/TCP
I0626 08:34:38.381285 5858 service.go:310] Adding new service port "ingress-nginx/default-http-backend:" at 10.43.84.217:80/TCP
E0626 08:36:19.096983 5858 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=11, ErrCode=NO_ERROR, debug=""
E0626 08:36:19.097186 5858 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=11, ErrCode=NO_ERROR, debug=""
E0626 08:36:19.097425 5858 reflector.go:322] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to watch *core.Endpoints: Get https://127.0.0.1:6443/api/v1/endpoints?resourceVersion=796&timeoutSeconds=414&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
E0626 08:36:19.097460 5858 reflector.go:322] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to watch *core.Service: Get https://127.0.0.1:6443/api/v1/services?resourceVersion=516&timeoutSeconds=360&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
E0626 08:36:23.921010 5858 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: endpoints is forbidden: User "system:kube-proxy" cannot list endpoints at the cluster scope
I0626 09:18:53.277917 5858 proxier.go:637] Stale udp service kube-system/kube-dns:dns -> 10.43.0.10
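Two things stand out to me in that log: kube-proxy falls back to the iptables proxier (which should be fine), but it also warns about a missing br-netfilter module / unset bridge-nf-call-iptables sysctl. As a sanity check I'm going to verify that on each CentOS 7 host, roughly like this (just a sketch, I haven't confirmed this is the actual cause):

modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1
lsmod | grep br_netfilter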
Fair enough on the ClusterIP, but I am not pinging an arbitrary cluster IP, I am pinging a service IP. On my non-Rancher k8s clusters, which also use Calico, I can ping service IPs perfectly fine. And it's not just ping: the IP is not reachable at all, so DNS does not work.
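To see whether kube-proxy has actually programmed anything for that VIP on the nodes, dumping the nat table and looking for the kube-dns entries should do it, roughly:

iptables-save -t nat | grep -E 'kube-dns|10.43.0.10'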
CentOS 7.3
Digging a little more into this, it seems that kube-proxy does not update the routing table on the host VM correctly. For example, I scaled up my pod replicas:
➜ .kube kubectl get pods -o wide -n infraplatform
NAME READY STATUS RESTARTS AGE IP NODE
net-test-bd89876f8-2hp5l 1/1 Running 0 23s 10.42.3.43 829k8apddev3
➜ .kube kubectl get pods -o wide -n infraplatform
NAME READY STATUS RESTARTS AGE IP NODE
net-test-bd89876f8-2hp5l 1/1 Running 0 50s 10.42.3.43 829k8apddev3
➜ .kube kubectl scale deployment net-test --replicas=2 -n infraplatform
deployment "net-test" scaled
➜ .kube kubectl get pods -o wide -n infraplatform
NAME READY STATUS RESTARTS AGE IP NODE
net-test-bd89876f8-2hp5l 1/1 Running 0 1m 10.42.3.43 829k8apddev3
net-test-bd89876f8-99jwv 1/1 Running 0 8s 10.42.5.41 829k8almdev2
➜ .kube kubectl scale deployment net-test --replicas=3 -n infraplatform
deployment "net-test" scaled
➜ .kube kubectl get pods -o wide -n infraplatform
NAME READY STATUS RESTARTS AGE IP NODE
net-test-bd89876f8-2hp5l 1/1 Running 0 1m 10.42.3.43 829k8apddev3
net-test-bd89876f8-99jwv 1/1 Running 0 40s 10.42.5.41 829k8almdev2
net-test-bd89876f8-dj985 1/1 Running 0 14s 10.42.4.44 829k8apddev2
➜ .kube kubectl get pods -o wide -n infraplatform
NAME READY STATUS RESTARTS AGE IP NODE
net-test-bd89876f8-2hp5l 1/1 Running 0 1m 10.42.3.43 829k8apddev3
net-test-bd89876f8-99jwv 1/1 Running 0 56s 10.42.5.41 829k8almdev2
net-test-bd89876f8-dj985 1/1 Running 0 30s 10.42.4.44 829k8apddev2
while keeping a watch on the routing table running on the VM:
Every 2.0s: route -n Wed Jun 27 14:10:58 2018
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.165.210.1 0.0.0.0 UG 100 0 0 ens160
10.8.0.0 10.165.210.1 255.255.0.0 UG 100 0 0 ens160
10.17.0.0 10.165.210.1 255.255.0.0 UG 100 0 0 ens160
10.42.3.0 0.0.0.0 255.255.255.0 U 0 0 0 *
10.42.3.2 0.0.0.0 255.255.255.255 UH 0 0 0 calib092412e835
10.42.3.3 0.0.0.0 255.255.255.255 UH 0 0 0 cali92020d7c118
10.42.3.43 0.0.0.0 255.255.255.255 UH 0 0 0 calie4d23383568
10.165.210.0 0.0.0.0 255.255.254.0 U 100 0 0 ens160
10.165.214.0 10.165.210.1 255.255.254.0 UG 100 0 0 ens160
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.2.0 0.0.0.0 255.255.254.0 U 101 0 0 ens192
192.168.4.0 192.168.2.1 255.255.254.0 UG 101 0 0 ens192
If a replica lands on the same host VM, a route gets added, as we can see from the
10.42.3.43 0.0.0.0 255.255.255.255 UH 0 0 0 calie4d23383568
entry. However, for pod replicas that land on other VMs there are no routes in the routing table, which we can confirm with a traceroute as well:
[root@829k8apddev3 ~]# traceroute 10.42.4.44
traceroute to 10.42.4.44 (10.42.4.44), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
[root@829k8apddev3 ~]#
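Since the missing entries are exactly the routes to pods on other VMs, I also want to rule out a Calico BGP peering problem between the nodes. Something along these lines should show peer status (assuming calicoctl is available on the host; the exact container name may differ in Rancher's deployment):

calicoctl node status
kubectl logs -n kube-system calico-node-thf4n -c calico-node | grep -iE 'bgp|bird'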
Why would kube-proxy not update the routing table? It's really a stock cluster created by Rancher; the only thing I did was remove the NoExecute taint (node-role.kubernetes.io/etcd-) from the nodes so that pods would schedule to all nodes (it's a development cluster, so I don't mind pods being scheduled onto the controlplane or etcd host VMs).
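For reference, the taint removal was just the standard kubectl syntax per node, something like the following, where the trailing '-' removes the taint:

kubectl taint nodes 829k8apddev1 node-role.kubernetes.io/etcd-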