Hello, I went ahead and started my first master node, and it failed to start the k3s service. After some digging I found THIS to fix it. I went through the steps and rebooted. This time the service started, but I still had the same errors about the kernel modules. Is it safe to proceed, or does anyone know how to fix this? It seems like this issue has been around for a while.
You should try running modprobe br_netfilter and modprobe overlay manually yourself and see what you get. Run lsmod and look for them in the output too.
I don’t know about the error message, but if you don’t have those kernel modules available I doubt Kubernetes can function.
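For example (run as root; exact output will vary):

modprobe br_netfilter
modprobe overlay
# check whether the kernel actually has them loaded
lsmod | grep -E 'br_netfilter|overlay'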
I will try that now… I got my cluster built with 2 masters and 4 workers, created my cert-manager pods, and then went to install Rancher with Helm. It gets stuck at rolling out 2/3 and then fails with CrashLoopBackOff. Currently trying to figure out why… will test your suggestion now.
root@k3s-master-1:~# modprobe br_netfilter
modprobe: FATAL: Module br_netfilter not found in directory /lib/modules/5.13.19-2-pve
root@k3s-master-1:~# modprobe overlay
modprobe: FATAL: Module overlay not found in directory /lib/modules/5.13.19-2-pve
root@k3s-master-1:~#
If that’s the case then you have some problem with the modules themselves (not installed, permission issue, malformed modules, etc.). Things likely won’t work until you get that fixed, but at least it’s now a simpler, clearer problem.
Linux containers use the host’s running kernel, so if the modules are loaded as part of the Proxmox host’s boot they should be available (lsmod on the Proxmox host should tell you). If whatever you’re running inside the container is calling modprobe itself and erroring out, you might need the module files in the container’s file system, but given the mismatch in kernel-version expectations between different distro base containers and the host, that seems a bit odd too; either way I’d expect the error to still be there.
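If they simply aren’t loaded on the host, something along these lines on the Proxmox host should cover it (standard modprobe/systemd mechanics, nothing k3s-specific; the modules-load.d file name is only an example):

# on the Proxmox host, not inside the container
modprobe br_netfilter
modprobe overlay
# persist across reboots
printf 'br_netfilter\noverlay\n' > /etc/modules-load.d/k3s.conf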
I use RKE2 and haven’t touched k3s, so I’m not sure whether you can install it in a container or would need to deploy it as a VM. I can tell you I’m running RKE2 on VMs hosted by KVM (which is what Proxmox uses for VMs), so if you deploy to a VM rather than a container it should be workable.
Searching https://packages.ubuntu.com/ for packages containing files named anything like br_netfilter didn’t come up with anything, so I think you may have a bit of a hunt on your hands.
If you aren’t familiar: netfilter is the part of the kernel that does all the firewall, NAT, and general playing around with IPs and networks that tools like iptables, ufw, and firewalld use on the backend. The br part would be for network bridges. So I hope that helps, but that’s all the info I have; I’m in RHEL-land at the moment at work.
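One way to narrow the hunt, on the Proxmox host (inside the container /lib/modules/5.13.19-2-pve is apparently empty, which is exactly what the modprobe error is complaining about):

# is each module a loadable file, built into the kernel, or missing entirely?
find /lib/modules/$(uname -r) \( -name 'br_netfilter.ko*' -o -name 'overlay.ko*' \)
grep -E 'CONFIG_BRIDGE_NETFILTER|CONFIG_OVERLAY_FS' /boot/config-$(uname -r)   # =m loadable module, =y built in
# once br_netfilter is in place these sysctls exist; Kubernetes generally wants them set to 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables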
Well, that’s not too good… I got some of this if it helps… at the bottom it can’t connect to the internal container networks, so I’m wondering if that’s br_netfilter related.
kubectl describe pod rancher-996475d64-xf4ng --namespace cattle-system
Name: rancher-996475d64-xf4ng
Namespace: cattle-system
Priority: 0
Node: k3s-worker-1/192.168.30.8
Start Time: Wed, 22 Dec 2021 13:05:36 -0600
Labels: app=rancher
pod-template-hash=996475d64
release=rancher
Annotations: <none>
Status: Running
IP: 10.42.3.7
IPs:
IP: 10.42.3.7
Controlled By: ReplicaSet/rancher-996475d64
Containers:
rancher:
Container ID: containerd://e214f31b71be5734a1aedb4ba526686d04ad1707f1f8f6a446e4d0c0ebea3dd6
Image: rancher/rancher:v2.6.2
Image ID: docker.io/rancher/rancher@sha256:51658639bf2b28f4796e7daa04b0434e29fcf69ce062bf649462be77ae76cd2c
Port: 80/TCP
Host Port: 0/TCP
Args:
--http-listen-port=80
--https-listen-port=443
--add-local=true
State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 22 Dec 2021 13:17:57 -0600
Finished: Wed, 22 Dec 2021 13:18:06 -0600
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 22 Dec 2021 13:12:37 -0600
Finished: Wed, 22 Dec 2021 13:12:47 -0600
Ready: False
Restart Count: 7
Liveness: http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
Readiness: http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
Environment:
CATTLE_NAMESPACE: cattle-system
CATTLE_PEER_SERVICE: rancher
CATTLE_BOOTSTRAP_PASSWORD: <set to the key 'bootstrapPassword' in secret 'bootstrap-secret'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4jm46 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-4jm46:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: cattle.io/os=linux:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned cattle-system/rancher-996475d64-xf4ng to k3s-worker-1
Warning Unhealthy 11m kubelet Readiness probe failed: Get "http://10.42.3.7:80/healthz": dial tcp 10.42.3.7:80: connect: connection refused
Normal Pulled 10m (x5 over 12m) kubelet Container image "rancher/rancher:v2.6.2" already present on machine
Normal Created 10m (x5 over 12m) kubelet Created container rancher
Normal Started 10m (x5 over 12m) kubelet Started container rancher
Warning BackOff 2m32s (x44 over 12m) kubelet Back-off restarting failed container
But only one of the pods has that message… the other two don’t, but all of them fail.
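The describe output only shows the restarts and the failed probes; the actual exit reason should be in the pod’s previous log, e.g. (pod name taken from the output above):

kubectl logs rancher-996475d64-xf4ng --namespace cattle-system --previous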
Following the advice here and in other forums, I cooked up a set of documented Ansible playbooks that set up your Proxmox server and an LXC container that does run k3s and does have running containers. This lets me start from scratch over and over again and tinker with one parameter at a time, without copy-pasting from this forum or from various GitHub gists and Stack Overflow posts.
The bridge module doesn’t appear to be present, but it seems to be either built in or not needed.
On the guest you will still see warnings about the modules.
I now host the container on my only disk, which is ext4. The trick of mounting ext4-formatted volumes on top of ZFS did not (yet) work for me; still under investigation.
The playbooks are not yet Ansible roles and need changes to adapt them to your situation.
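For anyone doing this by hand instead, this is roughly the kind of host-side container tweak such playbooks automate (my own sketch, not lifted from the actual playbooks; the container ID 200 is made up):

# on the Proxmox host: k3s inside LXC generally needs nesting (and usually keyctl) enabled
pct set 200 --features nesting=1,keyctl=1
# restart the container so the feature change takes effect
pct shutdown 200 && pct start 200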