Kubernetes service IPs not working?

Hi

I just installed a Rancher 2.0.3 HA cluster and deployed a “custom” cluster through the UI. Everything seemed to deploy smoothly.

However, after deploying a Docker image (jonlangemak/net_tools – useful for testing) to test everything, I noticed DNS is not working from inside the pod, even though kube-dns seems to be running perfectly (pod listing below).
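
For reference, the DNS check I did from inside the test pod was roughly the following (the pod name is just a placeholder, and I am assuming nslookup is present in the net_tools image):

kubectl exec -it <test-pod> -- nslookup kubernetes.default
kubectl exec -it <test-pod> -- nslookup rancher.com

Neither lookup gets an answer.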

➜  cluster (rancher) ✗ kubectl get pods -o wide -n kube-system
NAME                                  READY     STATUS    RESTARTS   AGE       IP               NODE
calico-node-748w7                     2/2       Running   0          5h        10.165.214.62    829k8almdev2
calico-node-c9trb                     2/2       Running   0          5h        10.165.210.206   829k8apddev3
calico-node-thf4n                     2/2       Running   0          5h        10.165.210.204   829k8apddev1
calico-node-vhwjm                     2/2       Running   0          5h        10.165.210.205   829k8apddev2
calico-node-vs6dr                     2/2       Running   0          5h        10.165.214.64    829k8almdev3
calico-node-xw5g5                     2/2       Running   0          5h        10.165.214.63    829k8almdev1
kube-dns-5ccb66df65-4h7gk             3/3       Running   0          5h        10.42.2.2        829k8almdev3
kube-dns-5ccb66df65-dhqf2             3/3       Running   0          4h        10.42.5.2        829k8almdev2
kube-dns-autoscaler-6c4b786f5-dk2rk   1/1       Running   0          5h        10.42.3.2        829k8apddev3

The interesting thing is that from inside the pod I cannot ping the kube-dns pod IPs. Then, looking at the kube-dns service, whose IP is the same one that gets populated in the pod's /etc/resolv.conf, I cannot ping that either:

➜  cluster (rancher) ✗ kubectl get services -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP   5h
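
It is not just ICMP either: querying DNS on that ClusterIP directly from inside the test pod gets no response (pod name is a placeholder again, and this assumes dig is available in the image, which net_tools should have):

kubectl exec -it <test-pod> -- dig @10.43.0.10 kubernetes.default.svc.cluster.local
kubectl exec -it <test-pod> -- dig @10.42.2.2 kubernetes.default.svc.cluster.local

The second command queries one of the kube-dns pod IPs from the listing above directly, and that does not answer either.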

I am using Calico for overlay networking.

I am not sure what I missed or did wrong. Does anyone have an idea what could be going on?

Out of interest, I also tested on the cluster that runs Rancher itself (I followed this guide: https://rancher.com/docs/rancher/v2.x/en/installation/ha-server-install/) and I have the exact same problem on that cluster.

I am guessing this is because kube-proxy does not seem to be working properly. Here is the kube-proxy log from one of the nodes:

+ echo kube-proxy --v=2 --healthz-bind-address=0.0.0.0 --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml
+ grep -q cloud-provider=azure
+ '[' kube-proxy = kubelet ']'
+ exec kube-proxy --v=2 --healthz-bind-address=0.0.0.0 --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml
I0626 08:34:13.950128    5858 flags.go:27] FLAG: --allow-verification-with-non-compliant-keys="false"
I0626 08:34:13.950187    5858 flags.go:27] FLAG: --alsologtostderr="false"
I0626 08:34:13.950193    5858 flags.go:27] FLAG: --application-metrics-count-limit="100"
I0626 08:34:13.950199    5858 flags.go:27] FLAG: --azure-container-registry-config=""
I0626 08:34:13.950206    5858 flags.go:27] FLAG: --bind-address="0.0.0.0"
I0626 08:34:13.950211    5858 flags.go:27] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
I0626 08:34:13.950217    5858 flags.go:27] FLAG: --cleanup="false"
I0626 08:34:13.950227    5858 flags.go:27] FLAG: --cleanup-iptables="false"
I0626 08:34:13.950231    5858 flags.go:27] FLAG: --cleanup-ipvs="true"
I0626 08:34:13.950236    5858 flags.go:27] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16"
I0626 08:34:13.950245    5858 flags.go:27] FLAG: --cluster-cidr=""
I0626 08:34:13.950248    5858 flags.go:27] FLAG: --config=""
I0626 08:34:13.950252    5858 flags.go:27] FLAG: --config-sync-period="15m0s"
I0626 08:34:13.950260    5858 flags.go:27] FLAG: --conntrack-max="0"
I0626 08:34:13.950267    5858 flags.go:27] FLAG: --conntrack-max-per-core="32768"
I0626 08:34:13.950274    5858 flags.go:27] FLAG: --conntrack-min="131072"
I0626 08:34:13.950279    5858 flags.go:27] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0626 08:34:13.950284    5858 flags.go:27] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0626 08:34:13.950287    5858 flags.go:27] FLAG: --container-hints="/etc/cadvisor/container_hints.json"
I0626 08:34:13.950293    5858 flags.go:27] FLAG: --containerd="unix:///var/run/containerd.sock"
I0626 08:34:13.950297    5858 flags.go:27] FLAG: --default-not-ready-toleration-seconds="300"
I0626 08:34:13.950303    5858 flags.go:27] FLAG: --default-unreachable-toleration-seconds="300"
I0626 08:34:13.950307    5858 flags.go:27] FLAG: --docker="unix:///var/run/docker.sock"
I0626 08:34:13.950312    5858 flags.go:27] FLAG: --docker-env-metadata-whitelist=""
I0626 08:34:13.950315    5858 flags.go:27] FLAG: --docker-only="false"
I0626 08:34:13.950319    5858 flags.go:27] FLAG: --docker-root="/var/lib/docker"
I0626 08:34:13.950323    5858 flags.go:27] FLAG: --docker-tls="false"
I0626 08:34:13.950326    5858 flags.go:27] FLAG: --docker-tls-ca="ca.pem"
I0626 08:34:13.950331    5858 flags.go:27] FLAG: --docker-tls-cert="cert.pem"
I0626 08:34:13.950336    5858 flags.go:27] FLAG: --docker-tls-key="key.pem"
I0626 08:34:13.950342    5858 flags.go:27] FLAG: --enable-load-reader="false"
I0626 08:34:13.950345    5858 flags.go:27] FLAG: --event-storage-age-limit="default=0"
I0626 08:34:13.950349    5858 flags.go:27] FLAG: --event-storage-event-limit="default=0"
I0626 08:34:13.950355    5858 flags.go:27] FLAG: --feature-gates=""
I0626 08:34:13.950364    5858 flags.go:27] FLAG: --global-housekeeping-interval="1m0s"
I0626 08:34:13.950367    5858 flags.go:27] FLAG: --google-json-key=""
I0626 08:34:13.950370    5858 flags.go:27] FLAG: --healthz-bind-address="0.0.0.0"
I0626 08:34:13.950374    5858 flags.go:27] FLAG: --healthz-port="10256"
I0626 08:34:13.950378    5858 flags.go:27] FLAG: --help="false"
I0626 08:34:13.950381    5858 flags.go:27] FLAG: --hostname-override=""
I0626 08:34:13.950384    5858 flags.go:27] FLAG: --housekeeping-interval="10s"
I0626 08:34:13.950388    5858 flags.go:27] FLAG: --iptables-masquerade-bit="14"
I0626 08:34:13.950391    5858 flags.go:27] FLAG: --iptables-min-sync-period="0s"
I0626 08:34:13.950394    5858 flags.go:27] FLAG: --iptables-sync-period="30s"
I0626 08:34:13.950398    5858 flags.go:27] FLAG: --ipvs-min-sync-period="0s"
I0626 08:34:13.950402    5858 flags.go:27] FLAG: --ipvs-scheduler=""
I0626 08:34:13.950407    5858 flags.go:27] FLAG: --ipvs-sync-period="30s"
I0626 08:34:13.950412    5858 flags.go:27] FLAG: --ir-data-source="influxdb"
I0626 08:34:13.950416    5858 flags.go:27] FLAG: --ir-dbname="k8s"
I0626 08:34:13.950419    5858 flags.go:27] FLAG: --ir-hawkular=""
I0626 08:34:13.950423    5858 flags.go:27] FLAG: --ir-influxdb-host="localhost:8080/api/v1/namespaces/kube-system/services/monitoring-influxdb:api/proxy"
I0626 08:34:13.950428    5858 flags.go:27] FLAG: --ir-namespace-only="false"
I0626 08:34:13.950433    5858 flags.go:27] FLAG: --ir-password="root"
I0626 08:34:13.950436    5858 flags.go:27] FLAG: --ir-percentile="90"
I0626 08:34:13.950443    5858 flags.go:27] FLAG: --ir-user="root"
I0626 08:34:13.950447    5858 flags.go:27] FLAG: --kube-api-burst="10"
I0626 08:34:13.950450    5858 flags.go:27] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0626 08:34:13.950454    5858 flags.go:27] FLAG: --kube-api-qps="5"
I0626 08:34:13.950459    5858 flags.go:27] FLAG: --kubeconfig="/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml"
I0626 08:34:13.950466    5858 flags.go:27] FLAG: --log-backtrace-at=":0"
I0626 08:34:13.950473    5858 flags.go:27] FLAG: --log-cadvisor-usage="false"
I0626 08:34:13.950476    5858 flags.go:27] FLAG: --log-dir=""
I0626 08:34:13.950480    5858 flags.go:27] FLAG: --log-flush-frequency="5s"
I0626 08:34:13.950484    5858 flags.go:27] FLAG: --loglevel="1"
I0626 08:34:13.950488    5858 flags.go:27] FLAG: --logtostderr="true"
I0626 08:34:13.950492    5858 flags.go:27] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
I0626 08:34:13.950496    5858 flags.go:27] FLAG: --masquerade-all="false"
I0626 08:34:13.950500    5858 flags.go:27] FLAG: --master=""
I0626 08:34:13.950503    5858 flags.go:27] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0626 08:34:13.950506    5858 flags.go:27] FLAG: --nodeport-addresses="[]"
I0626 08:34:13.950513    5858 flags.go:27] FLAG: --oom-score-adj="-999"
I0626 08:34:13.950518    5858 flags.go:27] FLAG: --profiling="false"
I0626 08:34:13.950522    5858 flags.go:27] FLAG: --proxy-mode=""
I0626 08:34:13.950527    5858 flags.go:27] FLAG: --proxy-port-range=""
I0626 08:34:13.950533    5858 flags.go:27] FLAG: --resource-container="/kube-proxy"
I0626 08:34:13.950537    5858 flags.go:27] FLAG: --stderrthreshold="2"
I0626 08:34:13.950542    5858 flags.go:27] FLAG: --storage-driver-buffer-duration="1m0s"
I0626 08:34:13.950546    5858 flags.go:27] FLAG: --storage-driver-db="cadvisor"
I0626 08:34:13.950551    5858 flags.go:27] FLAG: --storage-driver-host="localhost:8086"
I0626 08:34:13.950554    5858 flags.go:27] FLAG: --storage-driver-password="root"
I0626 08:34:13.950557    5858 flags.go:27] FLAG: --storage-driver-secure="false"
I0626 08:34:13.950561    5858 flags.go:27] FLAG: --storage-driver-table="stats"
I0626 08:34:13.950565    5858 flags.go:27] FLAG: --storage-driver-user="root"
I0626 08:34:13.950568    5858 flags.go:27] FLAG: --udp-timeout="250ms"
I0626 08:34:13.950571    5858 flags.go:27] FLAG: --v="2"
I0626 08:34:13.950574    5858 flags.go:27] FLAG: --version="false"
I0626 08:34:13.950583    5858 flags.go:27] FLAG: --vmodule=""
I0626 08:34:13.950587    5858 flags.go:27] FLAG: --write-config-to=""
W0626 08:34:13.950598    5858 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0626 08:34:13.950625    5858 feature_gate.go:226] feature gates: &{{} map[]}
I0626 08:34:13.952538    5858 iptables.go:198] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
time="2018-06-26T08:34:13Z" level=warning msg="Running modprobe ip_vs failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/3.10.0-862.3.3.el7.x86_64/modules.dep.bin'\nmodprobe: WARNING: Module ip_vs not found in directory /lib/modules/3.10.0-862.3.3.el7.x86_64`, error: exit status 1"
time="2018-06-26T08:34:13Z" level=error msg="Could not get ipvs family information from the kernel. It is possible that ipvs is not enabled in your kernel. Native loadbalancing will not work until this is fixed."
W0626 08:34:13.958268    5858 server_others.go:290] Can't use ipvs proxier, trying iptables proxier
I0626 08:34:13.959113    5858 server_others.go:140] Using iptables Proxier.
W0626 08:34:13.968233    5858 server.go:601] Failed to retrieve node info: nodes "829k8almdev1.cloudnl.digital.kpn.org" not found
W0626 08:34:13.968309    5858 proxier.go:298] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended
W0626 08:34:13.968321    5858 proxier.go:306] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0626 08:34:13.968325    5858 proxier.go:311] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0626 08:34:13.968382    5858 server_others.go:174] Tearing down inactive rules.
I0626 08:34:14.000286    5858 server.go:444] Version: v1.10.3
I0626 08:34:14.003968    5858 server.go:470] Running in resource-only container "/kube-proxy"
I0626 08:34:14.004438    5858 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 327680
I0626 08:34:14.004482    5858 conntrack.go:52] Setting nf_conntrack_max to 327680
I0626 08:34:14.004632    5858 mount_linux.go:196] Detected OS without systemd
I0626 08:34:14.004776    5858 conntrack.go:83] Setting conntrack hashsize to 81920
I0626 08:34:14.010875    5858 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0626 08:34:14.010908    5858 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0626 08:34:14.011065    5858 config.go:102] Starting endpoints config controller
I0626 08:34:14.011085    5858 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
I0626 08:34:14.011113    5858 config.go:202] Starting service config controller
I0626 08:34:14.011120    5858 controller_utils.go:1019] Waiting for caches to sync for service config controller
I0626 08:34:14.111205    5858 controller_utils.go:1026] Caches are synced for service config controller
I0626 08:34:14.111219    5858 controller_utils.go:1026] Caches are synced for endpoints config controller
I0626 08:34:14.111273    5858 service.go:310] Adding new service port "default/kubernetes:https" at 10.43.0.1:443/TCP
I0626 08:34:33.269146    5858 service.go:310] Adding new service port "kube-system/kube-dns:dns" at 10.43.0.10:53/UDP
I0626 08:34:33.269172    5858 service.go:310] Adding new service port "kube-system/kube-dns:dns-tcp" at 10.43.0.10:53/TCP
I0626 08:34:38.381285    5858 service.go:310] Adding new service port "ingress-nginx/default-http-backend:" at 10.43.84.217:80/TCP
E0626 08:36:19.096983    5858 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=11, ErrCode=NO_ERROR, debug=""
E0626 08:36:19.097186    5858 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=11, ErrCode=NO_ERROR, debug=""
E0626 08:36:19.097425    5858 reflector.go:322] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to watch *core.Endpoints: Get https://127.0.0.1:6443/api/v1/endpoints?resourceVersion=796&timeoutSeconds=414&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
E0626 08:36:19.097460    5858 reflector.go:322] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to watch *core.Service: Get https://127.0.0.1:6443/api/v1/services?resourceVersion=516&timeoutSeconds=360&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
E0626 08:36:23.921010    5858 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: endpoints is forbidden: User "system:kube-proxy" cannot list endpoints at the cluster scope
I0626 09:18:53.277917    5858 proxier.go:637] Stale udp service kube-system/kube-dns:dns -> 10.43.0.10
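
The warnings about the missing br-netfilter module / unset br-nf-call-iptables and the failed ip_vs modprobe stand out to me. What I plan to check on the hosts is along these lines (a sketch only; the module and sysctl names are the ones the kube-proxy warning refers to):

# is the bridge netfilter module loaded, and is bridged traffic passed to iptables?
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables

# if not, load the module and enable the sysctl (the usual Kubernetes prerequisite)
modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1

# did kube-proxy actually program iptables rules for the kube-dns service VIP?
iptables-save | grep 10.43.0.10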

Regarding ClusterIP: https://rancher.com/docs/rancher/v2.x/en/faq/technical/#my-clusterip-does-not-respond-to-ping

What host OS are you running?

Fair enough on the ClusterIP, but I am not pinging a cluster IP, I am pinging a service IP. On my non-Rancher k8s clusters, which also use Calico, I can ping service IPs perfectly fine. And it's not just ping: the IP is not reachable at all, so DNS does not work.
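
To show it is reachability rather than just ICMP being dropped, this is the kind of check I ran against port 53 on the service IP from inside the pod (placeholder pod name; assumes nc exists in the image):

kubectl exec -it <test-pod> -- nc -vz -w 3 10.43.0.10 53

It just times out.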

CentOS 7.3

Digging a little more into this, it seems that the pod routes in the routing table on the host VMs are not being updated correctly.

For example, I scaled up my pod replicas:

➜  .kube kubectl get pods -o wide -n infraplatform
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
net-test-bd89876f8-2hp5l   1/1       Running   0          23s       10.42.3.43   829k8apddev3
➜  .kube kubectl get pods -o wide -n infraplatform
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
net-test-bd89876f8-2hp5l   1/1       Running   0          50s       10.42.3.43   829k8apddev3
➜  .kube kubectl scale deployment net-test --replicas=2 -n infraplatform
deployment "net-test" scaled
➜  .kube kubectl get pods -o wide -n infraplatform
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
net-test-bd89876f8-2hp5l   1/1       Running   0          1m        10.42.3.43   829k8apddev3
net-test-bd89876f8-99jwv   1/1       Running   0          8s        10.42.5.41   829k8almdev2
➜  .kube kubectl scale deployment net-test --replicas=3 -n infraplatform
deployment "net-test" scaled
➜  .kube kubectl get pods -o wide -n infraplatform
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
net-test-bd89876f8-2hp5l   1/1       Running   0          1m        10.42.3.43   829k8apddev3
net-test-bd89876f8-99jwv   1/1       Running   0          40s       10.42.5.41   829k8almdev2
net-test-bd89876f8-dj985   1/1       Running   0          14s       10.42.4.44   829k8apddev2
➜  .kube kubectl get pods -o wide -n infraplatform
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
net-test-bd89876f8-2hp5l   1/1       Running   0          1m        10.42.3.43   829k8apddev3
net-test-bd89876f8-99jwv   1/1       Running   0          56s       10.42.5.41   829k8almdev2
net-test-bd89876f8-dj985   1/1       Running   0          30s       10.42.4.44   829k8apddev2

and kept a watch on the routing table running on one of the VMs:

Every 2.0s: route -n                                                                                           Wed Jun 27 14:10:58 2018

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.165.210.1    0.0.0.0         UG    100    0        0 ens160
10.8.0.0        10.165.210.1    255.255.0.0     UG    100    0        0 ens160
10.17.0.0	10.165.210.1    255.255.0.0     UG    100    0        0 ens160
10.42.3.0	0.0.0.0         255.255.255.0   U     0      0        0 *
10.42.3.2	0.0.0.0         255.255.255.255 UH    0      0        0 calib092412e835
10.42.3.3	0.0.0.0         255.255.255.255 UH    0      0        0 cali92020d7c118
10.42.3.43	0.0.0.0         255.255.255.255 UH    0      0        0 calie4d23383568
10.165.210.0    0.0.0.0         255.255.254.0   U     100    0        0 ens160
10.165.214.0    10.165.210.1    255.255.254.0   UG    100    0        0 ens160
172.17.0.0	0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.2.0     0.0.0.0         255.255.254.0   U     101    0        0 ens192
192.168.4.0     192.168.2.1     255.255.254.0   UG    101    0        0 ens192

If a replica lands on the same host VM, a route gets added, as we can see from the

10.42.3.43	0.0.0.0         255.255.255.255 UH    0      0        0 calie4d23383568

entry. However, for any pod replicas that land on other VMs, no routes show up in the routing table, which we can confirm with a traceroute as well:

[root@829k8apddev3 ~]# traceroute 10.42.4.44
traceroute to 10.42.4.44 (10.42.4.44), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
[root@829k8apddev3 ~]#
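
For comparison, on my working (non-Rancher) Calico clusters there is one route per remote node's pod subnet, roughly of this form (gateway and interface here are illustrative; with IPIP enabled it would point at tunl0 instead):

10.42.4.0       10.165.210.205  255.255.255.0   UG    0      0        0 ens160

As far as I understand, those per-node pod routes are programmed by Calico (BIRD/Felix) rather than kube-proxy, so the next thing I want to check is whether the BGP peering between the calico-node pods is actually established, e.g. with:

calicoctl node status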

Why would these routes not be created? It's really a stock cluster created by Rancher; the only thing I did was remove the NoExecute taint (node-role.kubernetes.io/etcd-) from the nodes so that pods get scheduled to all nodes (it's a development cluster, so I don't mind pods landing on the controlplane or etcd host VMs).
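
For completeness, the taint removal was just the standard kubectl invocation, run once per node (node name is a placeholder):

kubectl taint nodes <node-name> node-role.kubernetes.io/etcd-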