Access to pod https interface fails

On a bare-metal cluster running Rancher 2.2.2 on RancherOS 1.5.1 nodes, I’ve deployed a pod that exposes an HTTP interface on port 8080 and an HTTPS interface on port 8443, plus a NodePort service to provide external access to both ports.
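
For reference, the service is defined roughly as follows (the name and selector below are placeholders rather than the exact manifest):

apiVersion: v1
kind: Service
metadata:
  name: ecp-cd
spec:
  type: NodePort
  selector:
    app: ecp-cd
  ports:
    - name: http
      port: 8080
      targetPort: 8080
    - name: https
      port: 8443
      targetPort: 8443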

While I can successfully access the HTTP interface, requests to the HTTPS interface time out.
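
Concretely (with <node-ip> and the two allocated node ports standing in for the real values):

curl -v  http://<node-ip>:<http-nodeport>/    # responds normally
curl -vk https://<node-ip>:<https-nodeport>/  # hangs until the connection times out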

What I’ve tried so far:

  • changing port 8443 to 9443
  • changing the NodePort service to a LoadBalancer (using MetalLB); see the sketch just below
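
For the LoadBalancer attempt, the change was essentially just switching the service type, e.g. (namespace and service name are placeholders):

kubectl -n default patch svc ecp-cd -p '{"spec":{"type":"LoadBalancer"}}'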

To no avail; the issue persists.

From the pod console I ran ‘tcpdump -i eth0 -vv’, which turned up some interesting details.
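
If it helps, the same capture can be narrowed to just the two container ports:

tcpdump -i eth0 -vv 'port 8080 or port 8443'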

When accessing the HTTP interface, tcpdump shows the following (blank lines inserted for clarity):

09:54:21.393503 IP (tos 0x0, ttl 63, id 64391, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.42.1.65084 > ecp-cd-8754f49b-hcs5r.webcache: Flags [.], cksum 0xaefc (correct), seq 93, ack 112, win 229, options [nop,nop,TS val 103132153 ecr 3386833339], length 0
09:54:21.393637 IP (tos 0x0, ttl 63, id 64392, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.42.1.65084 > ecp-cd-8754f49b-hcs5r.webcache: Flags [F.], cksum 0xaefb (correct), seq 93, ack 112, win 229, options [nop,nop,TS val 103132153 ecr 3386833339], length 0

09:54:21.394037 IP (tos 0x0, ttl 64, id 60423, offset 0, flags [DF], proto TCP (6), length 52)
    ecp-cd-8754f49b-hcs5r.webcache > 172.18.42.1.65084: Flags [F.], cksum 0xe51e (incorrect -> 0xaefb), seq 112, ack 94, win 227, options [nop,nop,TS val 3386833340 ecr 103132153], length 0

09:54:21.394207 IP (tos 0x0, ttl 63, id 64393, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.42.1.65084 > ecp-cd-8754f49b-hcs5r.webcache: Flags [.], cksum 0xaef9 (correct), seq 94, ack 113, win 229, options [nop,nop,TS val 103132153 ecr 3386833340], length 0

09:54:21.395545 IP (tos 0x0, ttl 64, id 34044, offset 0, flags [DF], proto UDP (17), length 69)
    ecp-cd-8754f49b-hcs5r.50252 > kube-dns.kube-system.svc.cluster.local.domain: [bad udp cksum 0x195c -> 0x7d52!] 36593+ PTR? 10.0.43.10.in-addr.arpa. (41)

09:54:21.395611 IP (tos 0x0, ttl 63, id 36096, offset 0, flags [DF], proto UDP (17), length 121)
    kube-dns.kube-system.svc.cluster.local.domain > ecp-cd-8754f49b-hcs5r.50252: [bad udp cksum 0x1990 -> 0x7cc6!] 36593 q: PTR? 10.0.43.10.in-addr.arpa. 1/0/0 10.0.43.10.in-addr.arpa. PTR kube-dns.kube-system.svc.cluster.local. (93)

When accessing the HTTPS interface, tcpdump shows the following (blank lines inserted):

09:47:06.801956 IP (tos 0x0, ttl 63, id 52409, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.42.1.hotu-chat > ecp-cd-8754f49b-hcs5r.pcsync-https: Flags [S], cksum 0x25d5 (correct), seq 3285241640, win 29200, options [mss 1460,sackOK,TS val 103023505 ecr 0,nop,wscale 7], length 0

09:47:06.801995 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    ecp-cd-8754f49b-hcs5r.pcsync-https > 172.18.42.1.hotu-chat: Flags [S.], cksum 0xe526 (incorrect -> 0x0a0c), seq 1304538340, ack 3285241641, win 28960, options [mss 1460,sackOK,TS val 3386398761 ecr 103023505,nop,wscale 7], length 0

09:47:06.802212 IP (tos 0x0, ttl 63, id 52410, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.42.1.hotu-chat > ecp-cd-8754f49b-hcs5r.pcsync-https: Flags [.], cksum 0xa913 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 103023505 ecr 3386398761], length 0

09:47:06.802728 IP (tos 0x0, ttl 64, id 17181, offset 0, flags [DF], proto UDP (17), length 70)
    ecp-cd-8754f49b-hcs5r.55931 > kube-dns.kube-system.svc.cluster.local.domain: [bad udp cksum 0x195d -> 0x6cc1!] 59822+ PTR? 1.42.18.172.in-addr.arpa. (42)

09:47:06.802847 IP (tos 0x0, ttl 63, id 7094, offset 0, flags [DF], proto UDP (17), length 70)
    kube-dns.kube-system.svc.cluster.local.domain > ecp-cd-8754f49b-hcs5r.55931: [bad udp cksum 0x195d -> 0xec3d!] 59822 NXDomain q: PTR? 1.42.18.172.in-addr.arpa. 0/0/0 (42)

In the HTTPS case, the reverse lookup sent to kube-dns is for the wrong address (172.18.42.1 instead of 10.43.0.10), and it comes back NXDomain.
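
To double-check, the lookups can be reproduced by hand (assuming nslookup is available in the pod image):

# from inside the pod
nslookup 172.18.42.1 10.43.0.10   # the failing PTR lookup seen in the https trace
nslookup 10.43.0.10 10.43.0.10    # resolves to kube-dns.kube-system.svc.cluster.local

# from a machine with kubectl access, to confirm the kube-dns ClusterIP
kubectl -n kube-system get svc kube-dns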

On one of the worker nodes, I dumped the IP address information (ip addr | grep 172):

inet 172.18.42.1/16 scope global docker-sys
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0

As can be seen, 172.18.42.1 is the address of the system Docker bridge (docker-sys). Can someone please shed some light on what is going on here and how I can fix it (assuming it isn’t a bug in Rancher or RancherOS)?
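
If it helps, I can also dump the NAT rules on the node to see whether the connection is being masqueraded to the docker-sys address (assuming iptables is reachable from the RancherOS console):

sudo iptables -t nat -L POSTROUTING -n -v | grep -i masq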

Thanks