Canal liveness probe fails with v1.20.8-rancher1-1

The Canal pods fail their liveness probe and are therefore restarted continuously.

Here are the logs:
I0727 12:45:34.458678 81914 prober.go:117] Readiness probe for "canal-xvxd9_kube-system(b8f2e18a-2eac-4169-a978-e58df94e9a3b):calico-node" failed (failure): Get "http://localhost:9099/readiness": dial tcp 127.0.0.1:9099: connect: connection refused
E0727 12:46:02.407758 81914 remote_runtime.go:392] ExecSync 66f081022bb6d2e66caed3a2e38c992d350d0ab2ea391e40166a145804b2f70c '/bin/calico-node -felix-live' from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

I checked whether port 9099 is listening, and it is:
netstat -an | grep 9099
tcp 0 0 127.0.0.1:9099 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9099 127.0.0.1:42660 TIME_WAIT
tcp 0 0 127.0.0.1:42646 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:42644 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:42648 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:9099 127.0.0.1:42650 TIME_WAIT
tcp 0 0 127.0.0.1:9099 127.0.0.1:42668 TIME_WAIT
tcp 0 0 127.0.0.1:9099 127.0.0.1:42664 TIME_WAIT
tcp 0 0 127.0.0.1:42664 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:42666 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:42650 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:42654 127.0.0.1:9099 TIME_WAIT
tcp 0 0 127.0.0.1:42662 127.0.0.1:9099 TIME_WAIT

I also tried calling the Calico probe endpoint directly from the node, and that works too:
curl -vvvv http://localhost:9099/readiness

* About to connect() to localhost port 9099 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9099 (#0)
> GET /readiness HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9099
> Accept: */*
>
< HTTP/1.1 204 No Content
< Date: Tue, 27 Jul 2021 12:48:23 GMT
<
* Connection #0 to host localhost left intact
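
The error line in the logs comes from an exec probe (`/bin/calico-node -felix-live`) hitting a deadline, not from an HTTP GET, so the curl check may not exercise the same path. A rough way to approximate it might be to run that command inside the container with a time limit and see how it ends. This is only a sketch: `probe_with_deadline` is a hypothetical helper name, it relies on the coreutils `timeout` command, the 10-second limit is a placeholder rather than the probe's real `timeoutSeconds`, and `kubectl exec` does not go through exactly the same path as the kubelet's ExecSync call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a deadline and report how it ended.
# Usage: probe_with_deadline <seconds> <command> [args...]
probe_with_deadline() {
  local deadline="$1"
  shift
  # coreutils `timeout` exits with status 124 when the deadline is hit.
  timeout "$deadline" "$@" >/dev/null 2>&1
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "deadline exceeded after ${deadline}s"
  elif [ "$rc" -eq 0 ]; then
    echo "ok"
  else
    echo "failed (exit $rc)"
  fi
  return "$rc"
}

# Example invocation (assumes kubectl access; pod and container names are the
# ones from the log above, and 10 is a placeholder deadline):
# probe_with_deadline 10 kubectl -n kube-system exec canal-xvxd9 -c calico-node -- /bin/calico-node -felix-live
```

If this prints "deadline exceeded" while the curl call returns immediately, that would suggest the exec itself is slow rather than the health endpoint.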

Any recommendations?
