I am trying to run Longhorn v1.4.0 on k3s v1.25.4+k3s1. I have used kubectl apply against the manifest at https://github.com/longhorn/longhorn/blob/v1.4.0/deploy/longhorn.yaml.
This is the output of kubectl get pods -n longhorn-system:
$ k get pods -n longhorn-system
NAME READY STATUS RESTARTS AGE
longhorn-driver-deployer-7fdddb9f99-ghr7n 0/1 Init:0/1 0 40m
longhorn-ui-6768fbbc6c-b7zxh 1/1 Running 0 40m
longhorn-conversion-webhook-58b5f48bbd-q6pmc 1/1 Running 0 40m
longhorn-ui-6768fbbc6c-27d5j 1/1 Running 0 40m
longhorn-conversion-webhook-58b5f48bbd-frd7m 1/1 Running 0 40m
longhorn-recovery-backend-d67444cf5-t9nfn 1/1 Running 0 40m
longhorn-recovery-backend-d67444cf5-w2n6s 1/1 Running 0 40m
longhorn-admission-webhook-6489cc5747-vk6fs 1/1 Running 0 40m
longhorn-admission-webhook-6489cc5747-fnstk 1/1 Running 0 40m
instance-manager-r-f191ceaab349c268c5d274049a5a0d00 1/1 Running 0 39m
instance-manager-e-805c799cf7f78da658c8d2c5ff964b2e 1/1 Running 0 39m
instance-manager-r-805c799cf7f78da658c8d2c5ff964b2e 1/1 Running 0 39m
instance-manager-e-f191ceaab349c268c5d274049a5a0d00 1/1 Running 0 39m
engine-image-ei-fc06c6fb-7p498 1/1 Running 0 39m
engine-image-ei-fc06c6fb-g5977 1/1 Running 0 39m
longhorn-manager-v8wwr 1/1 Running 0 21m
longhorn-manager-6qjh6 1/1 Running 0 21m
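The READY and STATUS columns are what single out the stuck pod. Below is a small Python sketch that filters such a listing. It assumes the default `kubectl get pods` column layout shown above. The helper `not_ready_pods` is hypothetical and is not part of kubectl or Longhorn.

```python
from __future__ import annotations


def not_ready_pods(table: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for every pod that is not fully up.

    Assumes the default `kubectl get pods` columns
    NAME READY STATUS RESTARTS AGE, separated by whitespace.
    """
    stuck = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a pod row
        name, ready, status = fields[0], fields[1], fields[2]
        up, total = (int(n) for n in ready.split("/"))
        if up < total or status != "Running":
            stuck.append((name, ready, status))
    return stuck


# A few rows copied from the listing above.
listing = """\
NAME READY STATUS RESTARTS AGE
longhorn-driver-deployer-7fdddb9f99-ghr7n 0/1 Init:0/1 0 40m
longhorn-ui-6768fbbc6c-b7zxh 1/1 Running 0 40m
longhorn-manager-v8wwr 1/1 Running 0 21m
"""
print(not_ready_pods(listing))
# [('longhorn-driver-deployer-7fdddb9f99-ghr7n', '0/1', 'Init:0/1')]
```

The sketch also flags pods whose status is Completed. That is intentional for a quick scan, but it means finished Jobs would show up too.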
Everything appears to be working except longhorn-driver-deployer. I cannot see anything out of order in the manager logs (kubectl logs longhorn-manager-6qjh6 -n longhorn-system):
Defaulted container "longhorn-manager" out of: longhorn-manager, wait-longhorn-admission-webhook (init)
W0102 20:28:28.230861 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-01-02T20:28:28Z" level=info msg="cannot list the content of the src directory /var/lib/rancher/longhorn/engine-binaries for the copy, will do nothing: failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net bash -c ls /var/lib/rancher/longhorn/engine-binaries/*], output , stderr ls: cannot access '/var/lib/rancher/longhorn/engine-binaries/*': No such file or directory\n: exit status 2"
time="2023-01-02T20:28:28Z" level=info msg="Skip the leader election for the upgrade since the current Longhorn system is already up to date"
W0102 20:28:28.496157 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0102 20:28:28.530198 1 shared_informer.go:240] Waiting for caches to sync for longhorn datastore
I0102 20:28:28.732273 1 shared_informer.go:247] Caches are synced for longhorn datastore
time="2023-01-02T20:28:28Z" level=info msg="Initializing metrics collector system" node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Start longhorn-kubernetes-pod-controller" controller=longhorn-kubernetes-pod node=k8s0322.fjla.net
I0102 20:28:28.733290 1 shared_informer.go:240] Waiting for caches to sync for longhorn-kubernetes-pod-controller
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Kubernetes node controller"
I0102 20:28:28.733341 1 shared_informer.go:247] Caches are synced for longhorn-kubernetes-pod-controller
time="2023-01-02T20:28:28Z" level=info msg=Start controller=longhorn-kubernetes-configmap-controller node=k8s0322.fjla.net
I0102 20:28:28.733456 1 shared_informer.go:240] Waiting for caches to sync for longhorn-kubernetes-configmap-controller
I0102 20:28:28.733489 1 shared_informer.go:247] Caches are synced for longhorn-kubernetes-configmap-controller
time="2023-01-02T20:28:28Z" level=info msg="Start kubernetes controller"
I0102 20:28:28.733634 1 shared_informer.go:240] Waiting for caches to sync for kubernetes
I0102 20:28:28.733680 1 shared_informer.go:247] Caches are synced for kubernetes
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn share manager controller" controller=longhorn-share-manager node=k8s0322.fjla.net
I0102 20:28:28.733875 1 shared_informer.go:240] Waiting for caches to sync for longhorn-share-manager-controller
I0102 20:28:28.733913 1 shared_informer.go:247] Caches are synced for longhorn-share-manager-controller
I0102 20:28:28.733370 1 shared_informer.go:240] Waiting for caches to sync for longhorn kubernetes node
I0102 20:28:28.734001 1 shared_informer.go:247] Caches are synced for longhorn kubernetes node
W0102 20:28:28.734561 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Backing Image controller"
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Engine Image controller" controller=longhorn-engine-image node=k8s0322.fjla.net
I0102 20:28:28.735363 1 shared_informer.go:240] Waiting for caches to sync for longhorn engine images
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn SystemBackup controller" controller=longhorn-system-backup node=k8s0322.fjla.net
I0102 20:28:28.735403 1 shared_informer.go:247] Caches are synced for longhorn engine images
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn backing image manager controller"
I0102 20:28:28.735485 1 shared_informer.go:240] Waiting for caches to sync for longhorn backing image manager
I0102 20:28:28.735627 1 shared_informer.go:247] Caches are synced for longhorn backing image manager
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn replica controller" controller=longhorn-replica node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Snapshot Controller" controller=longhorn-snapshot node=k8s0322.fjla.net
I0102 20:28:28.735421 1 shared_informer.go:240] Waiting for caches to sync for longhorn backing images
I0102 20:28:28.736202 1 shared_informer.go:247] Caches are synced for longhorn backing images
I0102 20:28:28.735425 1 shared_informer.go:240] Waiting for caches to sync for longhorn-system-backup
I0102 20:28:28.736349 1 shared_informer.go:247] Caches are synced for longhorn-system-backup
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Kubernetes secret controller" controller=longhorn-kubernetes-secret-controller node=k8s0322.fjla.net
I0102 20:28:28.736416 1 shared_informer.go:240] Waiting for caches to sync for longhorn-kubernetes-secret-controller
I0102 20:28:28.735943 1 shared_informer.go:240] Waiting for caches to sync for longhorn replicas
I0102 20:28:28.736499 1 shared_informer.go:247] Caches are synced for longhorn-kubernetes-secret-controller
I0102 20:28:28.736521 1 shared_informer.go:247] Caches are synced for longhorn replicas
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn node controller"
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn instance manager controller"
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Setting controller" controller=longhorn-setting node=k8s0322.fjla.net
I0102 20:28:28.736114 1 shared_informer.go:240] Waiting for caches to sync for longhorn-snapshot
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn backing image data source controller"
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn websocket controller" controller=longhorn-websocket node=k8s0322.fjla.net
I0102 20:28:28.737696 1 shared_informer.go:240] Waiting for caches to sync for longhorn websocket
I0102 20:28:28.737722 1 shared_informer.go:247] Caches are synced for longhorn-snapshot
I0102 20:28:28.737764 1 shared_informer.go:240] Waiting for caches to sync for longhorn backing image data source
I0102 20:28:28.737797 1 shared_informer.go:247] Caches are synced for longhorn backing image data source
I0102 20:28:28.737229 1 shared_informer.go:240] Waiting for caches to sync for longhorn node
I0102 20:28:28.737973 1 shared_informer.go:247] Caches are synced for longhorn node
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn engine controller" controller=longhorn-engine node=k8s0322.fjla.net
I0102 20:28:28.738206 1 shared_informer.go:240] Waiting for caches to sync for longhorn engines
I0102 20:28:28.738249 1 shared_informer.go:247] Caches are synced for longhorn engines
I0102 20:28:28.737741 1 shared_informer.go:247] Caches are synced for longhorn websocket
I0102 20:28:28.737543 1 shared_informer.go:240] Waiting for caches to sync for longhorn settings
I0102 20:28:28.738659 1 shared_informer.go:247] Caches are synced for longhorn settings
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Backup Target controller" controller=longhorn-backup-target node=k8s0322.fjla.net
I0102 20:28:28.739271 1 shared_informer.go:240] Waiting for caches to sync for longhorn-backup-target
I0102 20:28:28.739588 1 shared_informer.go:247] Caches are synced for longhorn-backup-target
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn SystemRestore controller" controller=longhorn-system-restore node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Recurring Job controller"
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Backup Volume controller" controller=longhorn-backup-volume node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn volume controller" controller=longhorn-volume node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Backup controller" controller=longhorn-backup node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Orphan controller" controller=longhorn-orphan node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Starting Longhorn Support Bundle controller" controller=longhorn-support-bundle node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=warning msg="number of node disks 1 and collected disk info 0 are not equal"
I0102 20:28:28.737242 1 shared_informer.go:240] Waiting for caches to sync for longhorn instance manager
time="2023-01-02T20:28:28Z" level=info msg="mismatching disks in node resource object and monitor collected data"
I0102 20:28:28.743036 1 shared_informer.go:247] Caches are synced for longhorn instance manager
time="2023-01-02T20:28:28Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"k8s0222.fjla.net\", UID:\"3ac64634-14e7-4e00-b780-80cb0347d4c5\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"141089\", FieldPath:\"\"}): type: 'Warning' reason: 'Ready' Node k8s0222.fjla.net is down: the manager pod longhorn-manager-v8wwr is not running"
I0102 20:28:28.740906 1 shared_informer.go:240] Waiting for caches to sync for longhorn-system-restore
I0102 20:28:28.756271 1 shared_informer.go:247] Caches are synced for longhorn-system-restore
I0102 20:28:28.741270 1 shared_informer.go:240] Waiting for caches to sync for longhorn recurring jobs
I0102 20:28:28.756403 1 shared_informer.go:247] Caches are synced for longhorn recurring jobs
I0102 20:28:28.741616 1 shared_informer.go:240] Waiting for caches to sync for longhorn-backup-volume
I0102 20:28:28.756997 1 shared_informer.go:247] Caches are synced for longhorn-backup-volume
I0102 20:28:28.741929 1 shared_informer.go:240] Waiting for caches to sync for longhorn engines
I0102 20:28:28.757284 1 shared_informer.go:247] Caches are synced for longhorn engines
I0102 20:28:28.742244 1 shared_informer.go:240] Waiting for caches to sync for longhorn-backup
I0102 20:28:28.757906 1 shared_informer.go:247] Caches are synced for longhorn-backup
I0102 20:28:28.742599 1 shared_informer.go:240] Waiting for caches to sync for longhorn-orphan
I0102 20:28:28.761578 1 shared_informer.go:247] Caches are synced for longhorn-orphan
I0102 20:28:28.742845 1 shared_informer.go:240] Waiting for caches to sync for longhorn-support-bundle
I0102 20:28:28.761913 1 shared_informer.go:247] Caches are synced for longhorn-support-bundle
time="2023-01-02T20:28:28Z" level=debug msg="Start monitoring instance manager instance-manager-r-805c799cf7f78da658c8d2c5ff964b2e" controller=longhorn-instance-manager instance manager=instance-manager-r-805c799cf7f78da658c8d2c5ff964b2e node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=debug msg="Start monitoring instance manager instance-manager-e-805c799cf7f78da658c8d2c5ff964b2e" controller=longhorn-instance-manager instance manager=instance-manager-e-805c799cf7f78da658c8d2c5ff964b2e node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Cron is changed from to 0 0 */7 * *. Next snapshot check job will be executed at 2023-01-08 00:00:00 +0000 UTC" controller=longhorn-node monitor="snapshot monitor" node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Listening on [fd46::8f6d:f9b6:182d:bf57:b228]:9500" node=k8s0322.fjla.net
time="2023-01-02T20:28:28Z" level=info msg="Debug Server listening on 127.0.0.1:6060" node=k8s0322.fjla.net
time="2023-01-02T20:28:29Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"k8s0222.fjla.net\", UID:\"3ac64634-14e7-4e00-b780-80cb0347d4c5\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"141774\", FieldPath:\"\"}): type: 'Normal' reason: 'Ready' Node k8s0222.fjla.net is ready"
time="2023-01-02T20:28:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"k8s0322.fjla.net\", UID:\"66218d18-8856-4fc1-9923-ab409f9b02c4\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"141777\", FieldPath:\"\"}): type: 'Normal' reason: 'Ready' Node k8s0322.fjla.net is ready"
Output of kubectl describe svc -n longhorn-system:
Name: longhorn-backend
Namespace: longhorn-system
Labels: app=longhorn-manager
app.kubernetes.io/instance=longhorn
app.kubernetes.io/name=longhorn
app.kubernetes.io/version=v1.4.0
Annotations: <none>
Selector: app=longhorn-manager
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.47.177.152
IPs: 10.47.177.152
Port: manager 9500/TCP
TargetPort: manager/TCP
Endpoints: 10.46.134.93:9500,10.46.143.105:9500
Session Affinity: ClientIP
Events: <none>
Running nslookup against longhorn-backend inside a manager pod returns the correct address of the service.
However, when running k exec -it longhorn-manager-v8wwr -n longhorn-system -- curl -m 1 -s -o /dev/null -w "%{http_code}" http://longhorn-backend:9500/v1, the output is:
Defaulted container "longhorn-manager" out of: longhorn-manager, wait-longhorn-admission-webhook (init)
000command terminated with exit code 7
Having looked through the manifest, I believe this failing check is what keeps longhorn-driver-deployer stuck in Init:0/1.
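The failing curl check can be reproduced outside curl to separate DNS problems from connection problems. curl's documented exit codes include 6 (could not resolve host), 7 (failed to connect to host) and 28 (operation timed out). In addition, `%{http_code}` prints `000` when no HTTP response was received. Exit code 7 together with `000` therefore suggests the name resolved but the TCP connection to port 9500 was refused or unreachable, rather than timing out.

Below is a minimal Python sketch of the same probe. It also includes a polling loop of the kind the driver-deployer's init container is assumed to run. The helpers `probe`, `classify` and `wait_for_backend` are hypothetical. This is not the manifest's actual script.

```python
import http.client
import socket
import time
import urllib.error
import urllib.request

# Ignore http_proxy/https_proxy environment variables so the probe talks to
# the target directly (a simplification; curl itself honours those variables).
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 1.0) -> str:
    """Roughly mimic: curl -m 1 -s -o /dev/null -w "%{http_code}" URL.

    Returns the HTTP status code as a string, or "000" when no HTTP
    response was received at all. Unlike curl without -L, urllib follows
    redirects, so a 3xx may be reported as its final status instead.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as err:
        # The server answered with a non-2xx status; curl would print it too.
        return str(err.code)
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, reset: no HTTP response.
        return "000"


def classify(host: str, port: int, timeout: float = 1.0) -> str:
    """Separate name-resolution failures from TCP connection failures."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"  # roughly curl exit code 6
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"  # roughly curl exit code 28 when -m is hit
    except OSError:
        return "connect-failure"  # roughly curl exit code 7


def wait_for_backend(url: str, attempts: int = 30, interval: float = 2.0) -> bool:
    """Poll url until it answers 200; give up after `attempts` tries."""
    for _ in range(attempts):
        if probe(url) == "200":
            return True
        time.sleep(interval)
    return False
```

Inside a pod in longhorn-system, `classify("longhorn-backend", 9500)` returning `"connect-failure"` would match the curl result. In that case the name resolves, as nslookup showed, but nothing accepts the connection on the Service endpoints. The sketch bounds the number of attempts. A loop without such a limit would stay in the Init state for as long as the check keeps failing.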
I would welcome any suggestions to resolve this issue.
Thanks
Fred