Hi,
I’m prototyping a system where devices (k3s nodes managed by Rancher) sit in locations with unreliable network access. I deploy pods to these nodes, and they keep running during network outages.
However, if a node reboots while it has no network access, k3s-agent doesn’t seem to start containerd or try to start the pods until the connection is restored.
With `journalctl -eu k3s-agent` I could see the following error repeat until the connection was restored:
`failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF`
`127.0.0.1:6444/cacerts` seems to be forwarded to the k3s API server, which isn’t reachable until the network connection is back.
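One way to confirm that the agent is stuck on this request is to probe the same endpoint by hand. This is a minimal, hypothetical diagnostic (`probe_cacerts` is not a k3s tool). It assumes, as suspected above, that 127.0.0.1:6444 forwards to the server, and it skips TLS verification because the cluster CA is the very thing being fetched.

```python
import http.client
import ssl
import urllib.error
import urllib.request

# Endpoint taken from the k3s-agent error; assumed to be the agent's local
# proxy that forwards to the k3s server.
DEFAULT_URL = "https://127.0.0.1:6444/cacerts"


def probe_cacerts(url: str = DEFAULT_URL, timeout: float = 5.0) -> str:
    """Request /cacerts once and return a short status string."""
    # The cluster CA cannot be used to verify the request that fetches it,
    # so verification is disabled for this local diagnostic only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            body = resp.read()
            return f"ok: HTTP {resp.status}, {len(body)} bytes"
    except urllib.error.HTTPError as exc:
        # Something answered, but with an error status (e.g. 503).
        return f"http-error: {exc.code}"
    except urllib.error.URLError as exc:
        # Connection refused, timed out, or closed during the TLS handshake (EOF).
        return f"unreachable: {exc.reason}"
    except (http.client.HTTPException, OSError) as exc:
        # Connection dropped after the request was sent, before any response.
        return f"unreachable: {exc}"
```

On an offline node one would expect `probe_cacerts()` to return an `unreachable: ...` status, matching the EOF in the journal, and an `ok: ...` status once the network is back.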
Once the connection is restored, containerd starts and the node begins working, as shown in the log below.
Is there a way to configure rancher-system-agent and k3s-agent so that, after a reboot without network access, they still start containerd and the pods that were previously running on the node?
```
Nov 07 16:10:12 node_1 k3s[765]: time="2024-11-07T16:10:12+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Nov 07 16:10:12 node_1 k3s[765]: time="2024-11-07T16:10:12+02:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: /var/lib/rancher/k3s/agent/serving-kubelet.crt: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Using private registry config file at /etc/rancher/k3s/registries.yaml"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Module overlay was already loaded"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Module br_netfilter was already loaded"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Set sysctl 'net/ipv4/conf/all/forwarding' to 1"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_max' to 131072"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Using containerd template at /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Nov 07 16:10:23 node_1 k3s[765]: time="2024-11-07T16:10:23+02:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Nov 07 16:10:24 node_1 k3s[765]: time="2024-11-07T16:10:24+02:00" level=info msg="containerd is now running"
```
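To measure how long a reboot leaves the agent blocked, the journal can be scanned for the gap between the first `failed to get CA certs` message and `containerd is now running`. This is a hypothetical helper (`blocked_seconds` is not part of k3s), assuming every relevant line carries the `time="..." level=... msg="..."` fields seen in the log above.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches the fields k3s writes in each journal line, e.g.
# time="2024-11-07T16:10:24+02:00" level=info msg="containerd is now running"
LINE_RE = re.compile(r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"\s*$')


def blocked_seconds(
    lines: Iterable[str],
    start_marker: str = "failed to get CA certs",
    end_marker: str = "containerd is now running",
) -> Optional[float]:
    """Seconds from the first start_marker message to the next end_marker message."""
    start: Optional[datetime] = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a k3s log line (e.g. journal headers)
        ts = datetime.fromisoformat(match.group("time"))
        msg = match.group("msg")
        if start is None:
            if start_marker in msg:
                start = ts
        elif end_marker in msg:
            return (ts - start).total_seconds()
    return None  # one of the markers never appeared
```

For example, the output of `journalctl -u k3s-agent --no-pager`, split into lines, can be passed straight to `blocked_seconds`.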