First k3s cluster attempt on proxmox failed

Hello, I went ahead and started my first master node, and it failed to start the k3s service. After some digging I found THIS to fix it. I went through the steps and rebooted. This time the service started, but I still had the same errors for the kernel modules. Is it safe to proceed, or does anyone know how to fix this? It seems like this issue has been around for a while.

k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-12-20 19:28:25 UTC; 7s ago
       Docs: https://k3s.io
    Process: 3112 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 3114 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=1/FAILURE)
    Process: 3115 ExecStartPre=/sbin/modprobe overlay (code=exited, status=1/FAILURE)
   Main PID: 3116 (k3s-server)
      Tasks: 94
     Memory: 1.1G
        CPU: 5.861s
     CGroup: /system.slice/k3s.service
             |- 937 /var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c/bin/containerd-shim-runc->
             |- 975 /var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c/bin/containerd-shim-runc->
             |-1012 /var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c/bin/containerd-shim-runc->
             |-2145 /var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c/bin/containerd-shim-runc->
             |-2349 /var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c/bin/containerd-shim-runc->
             |-3116 /usr/local/bin/k3s server
             `-3129 containerd

You should try running `modprobe br_netfilter` and `modprobe overlay` manually yourself and see what you get. Try `lsmod` and look for both modules in the output too.
I don’t know about the error message, but if you don’t have those kernel modules available, I doubt Kubernetes can function.

I will try that now… I got my cluster built with 2 masters and 4 workers, created my cert-manager pods, and then went to install Rancher with Helm. It gets stuck at rolling out 2/3 and then fails with CrashLoopBackOff. Currently trying to figure out why… will test your suggestion now.

root@k3s-master-1:~# modprobe br_netfilter
modprobe: FATAL: Module br_netfilter not found in directory /lib/modules/5.13.19-2-pve
root@k3s-master-1:~# modprobe overlay
modprobe: FATAL: Module overlay not found in directory /lib/modules/5.13.19-2-pve
root@k3s-master-1:~# 

If that’s the case then you have some problem with getting the modules in place (not installed, permission issue, malformed modules, etc.). Things likely won’t work until you get that fixed, but it’s now a simpler, clearer problem.

They are found on the Proxmox host… so I just need to get them into the LXC maybe? But I thought that was done by default. I don’t know.

Linux containers use the host’s running kernel, so if the modprobe happens as part of the Proxmox host’s boot, the modules should already be there (lsmod on the Proxmox host will tell you). If whatever you’re running inside the container is itself calling modprobe and erroring out, you might need the module files in the container’s file system; given the mismatch in kernel versions between distros and different distro base images that seems a bit odd too, though, and I’d expect the error to show up either way.
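If they turn out not to be loaded on the host, a minimal sketch of the fix, assuming root on the Proxmox host (the file name under /etc/modules-load.d/ is my choice, nothing standard):

```
# On the Proxmox host: load the modules into the shared kernel.
modprobe br_netfilter
modprobe overlay

# Confirm they are loaded; containers see the same kernel state.
lsmod | grep -E 'br_netfilter|overlay'

# Persist across host reboots (file name is arbitrary).
cat <<'EOF' > /etc/modules-load.d/k3s.conf
br_netfilter
overlay
EOF
```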

I use RKE2 and haven’t touched k3s, so I’m not sure whether you can install it in a container or would need to deploy it as a VM. I can tell you I’m running RKE2 on VMs hosted by KVM (which is what Proxmox uses for VMs), so if you deploy to a VM rather than a container, it should be workable.

So it seems overlay is loaded but br_netfilter is not. I’m not sure how I could add it; I’m on Ubuntu 20.04 and the most recent PVE version.

I found this on the PVE host:

root@zfs1:~# lsmod | grep br_netfilter
root@zfs1:~# lsmod | grep overlay
overlay               131072  19
root@zfs1:~#

LXC:

root@k3s-master-1:/lib# lsmod | grep overlay
overlay               131072  19
root@k3s-master-1:/lib# lsmod | grep br_netfilter
root@k3s-master-1:/lib#

Searching for packages with files with names similar to br_netfilter on https://packages.ubuntu.com/ didn’t come up with anything, so I think you may have a bit of a hunt on your hands.

If you aren’t familiar, netfilter is the part of the kernel that does all the firewalling, NAT, and general manipulation of IPs and networks that tools like iptables, ufw, and firewalld use on the backend. The br part is for network bridges. I hope that helps, but that’s all the info I have; I’m in RHEL-land at work at the moment.
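For completeness, the usual Kubernetes prerequisite once br_netfilter loads (this is the standard sysctl setup from the upstream Kubernetes container-runtime docs; run it on the Proxmox host since the kernel is shared) is to make bridged pod traffic visible to iptables:

```
# Make bridged traffic traverse iptables and enable IP forwarding.
cat <<'EOF' > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system   # reload all sysctl configuration files
```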

Well, that’s not too good… I got some of this, if it helps… at the bottom it can’t connect to the internal container networks, so I’m wondering if that’s br_netfilter related.

 kubectl describe pod rancher-996475d64-xf4ng --namespace cattle-system
Name:         rancher-996475d64-xf4ng
Namespace:    cattle-system
Priority:     0
Node:         k3s-worker-1/192.168.30.8
Start Time:   Wed, 22 Dec 2021 13:05:36 -0600
Labels:       app=rancher
              pod-template-hash=996475d64
              release=rancher
Annotations:  <none>
Status:       Running
IP:           10.42.3.7
IPs:
  IP:           10.42.3.7
Controlled By:  ReplicaSet/rancher-996475d64
Containers:
  rancher:
    Container ID:  containerd://e214f31b71be5734a1aedb4ba526686d04ad1707f1f8f6a446e4d0c0ebea3dd6
    Image:         rancher/rancher:v2.6.2
    Image ID:      docker.io/rancher/rancher@sha256:51658639bf2b28f4796e7daa04b0434e29fcf69ce062bf649462be77ae76cd2c
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --http-listen-port=80
      --https-listen-port=443
      --add-local=true
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 22 Dec 2021 13:17:57 -0600
      Finished:     Wed, 22 Dec 2021 13:18:06 -0600
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 22 Dec 2021 13:12:37 -0600
      Finished:     Wed, 22 Dec 2021 13:12:47 -0600
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
    Environment:
      CATTLE_NAMESPACE:           cattle-system
      CATTLE_PEER_SERVICE:        rancher
      CATTLE_BOOTSTRAP_PASSWORD:  <set to the key 'bootstrapPassword' in secret 'bootstrap-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4jm46 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-4jm46:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned cattle-system/rancher-996475d64-xf4ng to k3s-worker-1
  Warning  Unhealthy  11m                   kubelet            Readiness probe failed: Get "http://10.42.3.7:80/healthz": dial tcp 10.42.3.7:80: connect: connection refused
  Normal   Pulled     10m (x5 over 12m)     kubelet            Container image "rancher/rancher:v2.6.2" already present on machine
  Normal   Created    10m (x5 over 12m)     kubelet            Created container rancher
  Normal   Started    10m (x5 over 12m)     kubelet            Started container rancher
  Warning  BackOff    2m32s (x44 over 12m)  kubelet            Back-off restarting failed container

But it only has that in one of the pods… the other two don’t have that message, but they all fail.
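For reference, a standard way to pull logs from the crashed container itself is kubectl’s `--previous` flag, which shows the last terminated instance (pod name here taken from the describe output above):

```
kubectl -n cattle-system logs rancher-996475d64-xf4ng --previous
```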

ubuntu@Ubuntu-pve:~$ kubectl -n cattle-system logs deploy/rancher
Found 4 pods, using pod/rancher-996475d64-n4pxc
2021/12/25 03:34:34 [INFO] Rancher version v2.6.2 (64c748d16) is starting
2021/12/25 03:34:34 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:true Embedded:false BindHost: HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Features: ClusterRegistry:}
2021/12/25 03:34:34 [INFO] Listening on /tmp/log.sock
2021/12/25 03:34:34 [INFO] Running in clustered mode with ID 10.42.5.4, monitoring endpoint cattle-system/rancher
2021/12/25 03:34:34 [INFO] Applying CRD features.management.cattle.io
2021/12/25 03:34:35 [INFO] Applying CRD navlinks.ui.cattle.io
2021/12/25 03:34:35 [INFO] Applying CRD clusters.management.cattle.io
2021/12/25 03:34:35 [INFO] Applying CRD apiservices.management.cattle.io
2021/12/25 03:34:35 [INFO] Applying CRD clusterregistrationtokens.management.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD settings.management.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD preferences.management.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD features.management.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD clusterrepos.catalog.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD operations.catalog.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD apps.catalog.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD fleetworkspaces.management.cattle.io
2021/12/25 03:34:36 [INFO] Applying CRD managedcharts.management.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD clusters.provisioning.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD clusters.provisioning.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD rkeclusters.rke.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD rkecontrolplanes.rke.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD rkebootstraps.rke.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD rkebootstraptemplates.rke.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD rkecontrolplanes.rke.cattle.io
2021/12/25 03:34:37 [INFO] Applying CRD custommachines.rke.cattle.io
2021/12/25 03:34:38 [INFO] Applying CRD clusters.cluster.x-k8s.io
2021/12/25 03:34:38 [INFO] Applying CRD machinedeployments.cluster.x-k8s.io
2021/12/25 03:34:38 [INFO] Applying CRD machinehealthchecks.cluster.x-k8s.io
2021/12/25 03:34:38 [INFO] Applying CRD machines.cluster.x-k8s.io
2021/12/25 03:34:38 [INFO] Applying CRD machinesets.cluster.x-k8s.io
W1225 03:34:38.952421      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:39.160893      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:39.824956      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:40.014287      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:40.023021      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:40.062211      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:40.290885      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1225 03:34:40.303971      35 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
2021/12/25 03:34:45 [FATAL] error running the jail command: exit status 2
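On the jail error specifically: a workaround that has circulated in Rancher’s GitHub issues for restricted/LXC environments is the CATTLE_DEV_MODE environment variable, which is said to make the Rancher entrypoint skip its chroot jail setup. It is a development switch rather than a proper fix, so treat this as a hedged suggestion:

```
# Assumption: a non-empty CATTLE_DEV_MODE makes Rancher skip creating
# its chroot jails (dev-mode behaviour), sidestepping the jail error.
kubectl -n cattle-system set env deploy/rancher CATTLE_DEV_MODE=true
```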

Following the advice here and in other forums, I cooked up a set of documented Ansible Playbooks that set up your Proxmox server and an LXC container that runs k3s and has containers starting successfully. This lets me start from scratch over and over again and tinker with one parameter at a time, without copy-pasting from this forum or from several GitHub gists or Stack Overflow posts.

My Proxmox K3S Ansible Playbooks: useful?

  • The bridge module seems not to be present, but it is either built in or not needed.
  • On the guest you will still see warnings about the modules.
  • I now host the container on my only disk with ext4. The trick with mounting ext4-formatted volumes on a ZFS zone did not (yet) work for me. Under investigation.
  • The Playbooks are not yet Ansible roles and need changes to adapt them to your situation (see the sketch of the container settings involved after this list).
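For reference, a sketch of the LXC options that commonly show up in k3s-on-Proxmox setups. These come from widely shared community configs rather than from my Playbooks verbatim; edit /etc/pve/lxc/<vmid>.conf on the host, restart the container, and only relax things as far as your security posture allows:

```
# /etc/pve/lxc/<vmid>.conf -- <vmid> is your container's numeric ID
features: nesting=1                # let containerd run inside the container
lxc.apparmor.profile: unconfined   # drop AppArmor confinement
lxc.cgroup2.devices.allow: a       # allow access to all devices
lxc.cap.drop:                      # keep all capabilities
lxc.mount.auto: proc:rw sys:rw     # writable /proc and /sys for Kubernetes
```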
kubectl get pods --all-namespaces 

NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-64ffb68fd-czzdd   1/1     Running     0          45m
kube-system   metrics-server-9cf544f65-lv9cn           1/1     Running     0          45m
kube-system   coredns-85cb69466-mrtch                  1/1     Running     0          45m
kube-system   helm-install-traefik-crd--1-2mtwk        0/1     Completed   0          45m
kube-system   helm-install-traefik--1-9z49d            0/1     Completed   1          45m
kube-system   svclb-traefik-zs6nr                      2/2     Running     0          44m
kube-system   traefik-786ff64748-zddpc                 1/1     Running     0          44m