I have a couple of volumes on v0.8.1. I now want to upgrade the engine to 1.0.0 using the UI. On some of the volumes it worked well; some others do not upgrade. Where can I see the root cause of this? How can I upgrade?
Please follow the instructions at https://longhorn.io/docs/1.0.0/deploy/upgrade/longhorn-manager/ to upgrade to 1.0.0. Only offline upgrade is supported for this release, so please scale down the workload before upgrading. See the instructions for details.
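For reference, the scale-down step might look like this. The workload names (my-app, my-db) and namespace are placeholders; substitute whatever deployments or statefulsets actually mount your Longhorn PVCs:

```shell
# Scale down every workload that uses a Longhorn volume
# (names below are hypothetical examples).
kubectl -n default scale deployment my-app --replicas=0
kubectl -n default scale statefulset my-db --replicas=0

# Verify no pods from those workloads are still running.
kubectl -n default get pods
```

Once no pods reference the volumes, they should show as Detached in the Longhorn UI, and the upgrade can proceed.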
I followed the instructions. The PVs are detached (as seen in the Longhorn UI), but still no success.
Since you mentioned offline upgrade, I am a bit puzzled reading the passage below, which states that live upgrade is supported, right?
Live upgrade is available since v0.3.3, with the exception of upgrade from v0.5.0 to v0.6.0 and v0.7.0 to v0.8.0.
Nevertheless I am a bit lost, especially since a few PVs did upgrade well.
Sorry, that part of the document wasn't up to date. I've updated it and added v1.0.0 to the list.
O.k., fair enough. But the offline upgrade still does not work.
The deployment has been scaled to 0, the ReplicaSet also shows 0, and no pods are running anymore. The PVs in the Longhorn UI are detached. So all preconditions are met, I assume.
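One additional way to double-check the detached state, assuming kubectl access to the cluster, is to query the Longhorn volume CRDs directly:

```shell
# List Longhorn volumes and their state/robustness as the manager sees them.
# All volumes should report a Detached state before an offline upgrade.
kubectl -n longhorn-system get volumes.longhorn.io
```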
I just see the following in the logs:
Jun 1 16:40:58 Ubuntu-1804-bionic-64-minimal k3s: E0601 16:40:58.508170 1076 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins_registry/io.rancher.longhorn-reg.sock" failed. No retries permitted until 2020-06-01 16:43:00.508102165 +0200 CEST m=+2334.636733513 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- dial failed at socket /var/lib/kubelet/plugins_registry/io.rancher.longhorn-reg.sock, err: failed to dial socket /var/lib/kubelet/plugins_registry/io.rancher.longhorn-reg.sock, err: context deadline exceeded"
Could this be the reason the upgrade fails?
Yes, this is not right. Can you check the pod status, e.g. with kubectl -n longhorn-system get pod? I suspect the driver pod failed to deploy for some reason.
Here we are:
NAME                                       READY   STATUS    RESTARTS   AGE
engine-image-ei-62c02f63-jbj9x 1/1 Running 17 17d
longhorn-ui-6dcc77fb58-cnkzr 1/1 Running 2 22h
longhorn-manager-ndwz5 1/1 Running 2 22h
engine-image-ei-eee5f438-cfsqb 1/1 Running 2 22h
longhorn-driver-deployer-8848f7c7d-njdcv 1/1 Running 2 22h
instance-manager-r-a2716920 1/1 Running 0 41m
instance-manager-e-eef54c26 1/1 Running 0 41m
csi-resizer-686bd4b6d7-ltm6b 1/1 Running 0 39m
csi-resizer-686bd4b6d7-8tk2p 1/1 Running 0 39m
csi-resizer-686bd4b6d7-6kjrk 1/1 Running 0 39m
csi-attacher-5cc849c8dd-nr4dd 1/1 Running 1 40m
csi-provisioner-74557755-q7jvp 1/1 Running 1 40m
csi-provisioner-74557755-mn4vl 1/1 Running 1 40m
csi-provisioner-74557755-sk98z 1/1 Running 1 40m
csi-attacher-5cc849c8dd-6j6bp 1/1 Running 1 40m
csi-attacher-5cc849c8dd-jczvm 1/1 Running 1 40m
longhorn-csi-plugin-5pfnl 2/2 Running 0 39m
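Since all pods report Running, the next place to look is the pod logs. A sketch of the commands, using the pod names from the listing above (the container name in the CSI plugin pod is an assumption and may differ in your deployment):

```shell
# Check whether the driver deployer hit an error during deployment.
kubectl -n longhorn-system logs longhorn-driver-deployer-8848f7c7d-njdcv

# Check the CSI plugin itself; this pod has multiple containers,
# so a container name (assumed here) must be specified with -c.
kubectl -n longhorn-system logs longhorn-csi-plugin-5pfnl -c longhorn-csi-plugin
```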
Can you generate a support bundle? Click the Generate Support Bundle link at the bottom of the UI to download a zip file that contains Longhorn-related configuration and logs, and send it to us at email@example.com. We will take a look.
And can you file an issue too? Thanks.
Support bundle sent and bug issue opened.
By the way, this error itself is not a concern. We've deprecated io.rancher.longhorn-reg.sock, so it's safe to remove it from the node. See https://longhorn.io/docs/0.8.1/deploy/upgrade/longhorn-manager/#migration-failure-handling, step 4.
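Based on the linked doc, removing the stale socket might look like this; the path is taken from the error message above, and this must be run on each affected node (as root):

```shell
# Remove the deprecated registration socket left over from the old driver.
# -f makes this a no-op on nodes where the file is already gone.
rm -f /var/lib/kubelet/plugins_registry/io.rancher.longhorn-reg.sock
```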