Canal daemonset permanently restarts


We have an issue: the canal containers are restarting continuously.

Here is a container log:

2020-01-10 09:06:28.923 [INFO][8] startup.go 256: Early log level set to info
2020-01-10 09:06:28.924 [INFO][8] startup.go 272: Using NODENAME environment for node name
2020-01-10 09:06:28.924 [INFO][8] startup.go 284: Determined node name: k8s-node-4
2020-01-10 09:06:28.926 [INFO][8] k8s.go 228: Using Calico IPAM
2020-01-10 09:06:28.926 [INFO][8] startup.go 316: Checking datastore connection
2020-01-10 09:06:28.947 [INFO][8] startup.go 340: Datastore connection verified
2020-01-10 09:06:28.947 [INFO][8] startup.go 95: Datastore is ready
2020-01-10 09:06:29.007 [INFO][8] startup.go 530: FELIX_IPV6SUPPORT is false through environment variable
2020-01-10 09:06:29.028 [INFO][8] startup.go 181: Using node name: k8s-node-4
2020-01-10 09:06:29.080 [INFO][16] k8s.go 228: Using Calico IPAM
CALICO_NETWORKING_BACKEND is none - no BGP daemon running
Calico node started successfully
2020-01-10 09:06:30.332 [WARNING][34] int_dataplane.go 354: Failed to query VXLAN device error=Link not found
2020-01-10 09:06:30.375 [WARNING][34] int_dataplane.go 384: Failed to cleanup preexisting XDP state error=failed to load XDP program (/tmp/felix-xdp-074424772): stat /sys/fs/bpf/calico/xdp/prefilter_v1_calico_tmp_A: no such file or directory
libbpf: failed to get EHDR from /tmp/felix-xdp-074424772
Error: failed to open object file

And here are the Kubernetes cluster options.

I think this issue does not affect my cluster at the moment. But anyway, does anybody know how to fix it?

Does anybody have any ideas?

These errors look normal. Try editing the probes for the workload to the following.

livenessProbe:
  failureThreshold: 6
  httpGet:
    host: localhost
    path: /liveness
    port: 9099
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
name: calico-node
readinessProbe:
  failureThreshold: 3
  httpGet:
    host: localhost
    path: /readiness
    port: 9099
    scheme: HTTP
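To see what those probes would get back, you can query the same endpoints by hand. This is only a rough sketch: it assumes the health endpoint is listening on localhost:9099, as in the probe settings above, and that you run it somewhere that can reach it (for example on the node itself, if the pod uses host networking). The helper name check_endpoint is made up for this example.

```python
import urllib.error
import urllib.request
from typing import Tuple


def check_endpoint(url: str, timeout: float = 1.0) -> Tuple[bool, str]:
    """Issue one HTTP GET and report (ok, detail), roughly like an httpGet probe."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Kubernetes counts any status from 200 up to (not including) 400 as success.
            return 200 <= resp.status < 400, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive here; the probe would count them as failures.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, and similar.
        return False, f"error: {exc}"


if __name__ == "__main__":
    # Assumption: port and paths are taken from the probe config above.
    for path in ("/liveness", "/readiness"):
        ok, detail = check_endpoint(f"http://localhost:9099{path}")
        print(f"{path}: {'ok' if ok else 'FAILED'} ({detail})")
```

If /liveness answers but /readiness does not, the container is running and only the readiness side is unhappy, which helps narrow down which probe is triggering the restarts.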

If you're still having issues, please reach out to me on Slack.

It works, in a way :wink: Rancher no longer restarts the containers; it believes the check has passed.
But in the logs I see:

2020-01-13 12:35:52.042 [WARNING][35] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf7f36e9fa78c1cb, ext:6362133775414, loc:(*time.Location)(0x2b08080)}}
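That line is dense, so here is a small sketch that pulls out the parts that matter. It rests on a few assumptions: the field layout is exactly as in the line above, the quotes are plain ASCII quotes, and timeout is a Go duration printed in nanoseconds. The function name parse_health_warning is made up for this example.

```python
import re

# Matches the reporter name, the timeout, and the most recent ("latest") report.
LINE_RE = re.compile(
    r'name="(?P<name>[^"]+)"'
    r'.*?timeout:(?P<timeout_ns>\d+)'
    r'.*?latest:health\.HealthReport\{Live:(?P<live>true|false), '
    r'Ready:(?P<ready>true|false)\}'
)


def parse_health_warning(line):
    """Return a dict summarising a 'Reporter failed ... checks' line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "reporter": m.group("name"),
        # Assumption: the value is nanoseconds, so divide to get seconds.
        "timeout_s": int(m.group("timeout_ns")) / 1e9,
        "live": m.group("live") == "true",
        "ready": m.group("ready") == "true",
    }


if __name__ == "__main__":
    sample = (
        '2020-01-13 12:35:52.042 [WARNING][35] health.go 190: Reporter failed '
        'readiness checks name="async_calc_graph" '
        'reporter-state=&health.reporterState{name:"async_calc_graph", '
        'reports:health.HealthReport{Live:true, Ready:true}, '
        'timeout:20000000000, '
        'latest:health.HealthReport{Live:true, Ready:false}}'
    )
    print(parse_health_warning(sample))
```

For the line above this gives reporter async_calc_graph, live true, ready false, with a 20 second timeout. In other words, the component is alive but its latest report says it is not ready yet.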

I found related issues on GitHub:

I have Rancher 2.3 and Kubernetes v1.16.3. Should I apply the CRDs from these topics?