Canal daemonset permanently restarts

sergey · January 10, 2020, 9:15am

Hello.

We have an issue: the canal containers restart continuously.

Here is a container log:

2020-01-10 09:06:28.923 [INFO][8] startup.go 256: Early log level set to info
2020-01-10 09:06:28.924 [INFO][8] startup.go 272: Using NODENAME environment for node name
2020-01-10 09:06:28.924 [INFO][8] startup.go 284: Determined node name: k8s-node-4
2020-01-10 09:06:28.926 [INFO][8] k8s.go 228: Using Calico IPAM
2020-01-10 09:06:28.926 [INFO][8] startup.go 316: Checking datastore connection
2020-01-10 09:06:28.947 [INFO][8] startup.go 340: Datastore connection verified
2020-01-10 09:06:28.947 [INFO][8] startup.go 95: Datastore is ready
2020-01-10 09:06:29.007 [INFO][8] startup.go 530: FELIX_IPV6SUPPORT is false through environment variable
2020-01-10 09:06:29.028 [INFO][8] startup.go 181: Using node name: k8s-node-4
2020-01-10 09:06:29.080 [INFO][16] k8s.go 228: Using Calico IPAM
CALICO_NETWORKING_BACKEND is none - no BGP daemon running
Calico node started successfully
2020-01-10 09:06:30.332 [WARNING][34] int_dataplane.go 354: Failed to query VXLAN device error=Link not found
2020-01-10 09:06:30.375 [WARNING][34] int_dataplane.go 384: Failed to cleanup preexisting XDP state error=failed to load XDP program (/tmp/felix-xdp-074424772): stat /sys/fs/bpf/calico/xdp/prefilter_v1_calico_tmp_A: no such file or directory
libbpf: failed to get EHDR from /tmp/felix-xdp-074424772
Error: failed to open object file

And here are the Kubernetes cluster options:

I don't think this issue is affecting my cluster at the moment, but does anybody know how to fix it?

sergey · January 13, 2020, 9:25am

Does anybody have any ideas?

mattmattox · January 13, 2020, 10:45am

These errors look normal. Try editing the probes for the workload to match the following.

livenessProbe:
  failureThreshold: 6
  httpGet:
    host: localhost
    path: /liveness
    port: 9099
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
name: calico-node
readinessProbe:
  failureThreshold: 3
  httpGet:
    host: localhost
    path: /readiness
    port: 9099
    scheme: HTTP

If you're still having issues, please reach out to me on Slack.
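
To see what the probes above would get back, the two endpoints can be queried by hand. The following is only a minimal sketch: it assumes it is run on the node itself (calico-node is assumed to use host networking, so `localhost:9099` is reachable there), and the helper names `probe_url` and `probe` are made up for illustration. It treats any status from 200 up to, but not including, 400 as success, which is how the kubelet judges an `httpGet` probe.

```python
import urllib.error
import urllib.request


def probe_url(path, host="localhost", port=9099):
    """Build the URL for a health path; host and port mirror the probe config."""
    return f"http://{host}:{port}{path}"


def probe(url, timeout=1.0):
    """Return True if the endpoint answers with a status in [200, 400).

    Connection errors, timeouts and 4xx/5xx responses all count as failures.
    The 1 second default mirrors timeoutSeconds: 1 in the probe definition.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False


if __name__ == "__main__":
    for path in ("/liveness", "/readiness"):
        result = "ok" if probe(probe_url(path)) else "FAILED"
        print(f"{path}: {result}")
```

If `/liveness` reports ok while `/readiness` fails, the container should stop being restarted but the pod will still show as not ready.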

sergey · January 13, 2020, 12:39pm

It works, in a way: Rancher no longer restarts the containers, since it believes the check has passed.
But in the logs I see:

2020-01-13 12:35:52.042 [WARNING][35] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf7f36e9fa78c1cb, ext:6362133775414, loc:(*time.Location)(0x2b08080)}}
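
The warning above is dense, so here is a small sketch that pulls out the reporter name and its most recent state. It assumes the `latest:` field holds the reporter's last report and `reports:` lists what it is expected to report on; that reading comes from the log text itself, not from the Felix source. The function name `parse_reporter` is hypothetical, and the pattern accepts both straight and curly quotes because forum software tends to rewrite them.

```python
import re

# Matches the reporter name and the Live/Ready flags of the "latest" report.
REPORTER_RE = re.compile(
    r'name=["“](?P<name>[^"”]+)["”].*?'
    r'latest:health\.HealthReport\{'
    r'Live:(?P<live>true|false), Ready:(?P<ready>true|false)\}'
)


def parse_reporter(line):
    """Return the reporter name and latest live/ready flags, or None if absent."""
    match = REPORTER_RE.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "live": match["live"] == "true",
        "ready": match["ready"] == "true",
    }


if __name__ == "__main__":
    sample = (
        'Reporter failed readiness checks name="async_calc_graph" '
        'reporter-state=&health.reporterState{name:"async_calc_graph", '
        'reports:health.HealthReport{Live:true, Ready:true}, '
        'timeout:20000000000, '
        'latest:health.HealthReport{Live:true, Ready:false}}'
    )
    print(parse_reporter(sample))
    # {'name': 'async_calc_graph', 'live': True, 'ready': False}
```

Read this way, the line says that `async_calc_graph` is live but has not yet reported ready, which would explain why the liveness check passes while readiness still fails.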

sergey · January 13, 2020, 12:54pm

I found related issues on GitHub:

I have Rancher 2.3 and Kubernetes v1.16.3. Should I apply the CRD from these issues?
