Failed upgrade from v0.8.1 to v1.0.0 caused by a PV created before v0.6.2


I have a scenario I want to share. I’m sorry, I mirrored this thread to my blog because I can’t post more than 2 links here. You can see the full post there.

I scaled down all pods that have PVCs to 0
I upgraded the Longhorn chart from 0.8.1 to 1.0.0
The pods in longhorn-system were upgraded successfully
I upgraded every volume’s engine image successfully, and deleted the old engine image (0.8.1)
I scaled the pods up again. Most volumes mounted successfully, but 2 pods failed to mount
I checked the longhorn-manager logs and found this entry when I scaled up the pods:

Skipping VolumeAttachment csi-5a855e5c301b037f5ca2d9661295f07e0aef8fd94bdd8bbc65e608d910d66759 for attacher io.rancher.longhorn
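If you hit the same message, one way to see which attachments still reference the old driver name is to list the VolumeAttachment objects together with their attacher (a sketch; the attacher value io.rancher.longhorn comes from the log line above):

```shell
# List all VolumeAttachments with the CSI driver (attacher) that owns them
# and the PV they point at. Attachments still owned by io.rancher.longhorn
# are the ones the new driver.longhorn.io attacher will skip.
kubectl get volumeattachments \
  --output=custom-columns=NAME:.metadata.name,ATTACHER:.spec.attacher,PV:.spec.source.persistentVolumeName
```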

I remembered io.rancher.longhorn from the release notes, so I checked again: “link redacted”

I went to “link redacted”

I executed “link redacted”
Unfortunately, the migration failed (because I executed it after upgrading to v1.0.0)

bash pvc-f476893d-0309-11ea-9a5c-ce8d7549db3b
FATA[2020-06-01T11:53:41Z] Error migrate PVs and PVCs for the volumes: Failed to migrate PV and PVC for the volume pvc-f476893d-0309-11ea-9a5c-ce8d7549db3b: failed to delete then recreate PV/PVC, users need to manually check the current PVC/PV then recreate them if needed: failed to wait for the old PV deletion complete
command terminated with exit code 1

When I checked the PV, it was stuck in the Terminating state:

pvc-f476893d-0309-11ea-9a5c-ce8d7549db3b 5Gi RWO Retain Terminating mine/data-mariadb-0 longhorn 204d
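To see why a PV hangs in Terminating, you can print its finalizers; a PV stuck like this usually still carries a finalizer that no running controller is willing to clear (a sketch; substitute your own PV name):

```shell
# Print the finalizers of the stuck PV. An entry owned by a controller that
# no longer exists (e.g. the old external-attacher) blocks deletion forever.
kubectl get pv pvc-f476893d-0309-11ea-9a5c-ce8d7549db3b \
  --output=jsonpath='{.metadata.finalizers}'
```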

I remembered an issue like this happening to me when I tried to delete a PV a long time ago. I solved it back then by removing the finalizer while editing the PV (PLEASE DON’T DO THIS IF YOU STILL WANT YOUR PV DATA), and yes, my PV was gone and my data was gone. OK, no problem.

I checked another PV with a more recent creation time and compared it with my oldest one. There is a difference in the finalizers section: external-attacher/io-rancher-longhorn vs external-attacher/driver-longhorn-io.

I figured that external-attacher/io-rancher-longhorn doesn’t exist in v1.0.0 and that this was keeping my PV in the Terminating state. So I changed external-attacher/io-rancher-longhorn to external-attacher/driver-longhorn-io, re-ran the migration script in “link redacted”, and the PV was upgraded successfully. When I scaled the pods up, they could also mount the migrated volume.
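Instead of hand-editing the PV, the same finalizer swap can be done with a JSON patch (a sketch, assuming the stale finalizer is the first entry in metadata.finalizers; check its index first with the jsonpath query for .metadata.finalizers):

```shell
# Replace the stale pre-v1.0.0 finalizer with the v1.0.0 driver's finalizer.
# Index 0 is assumed here; verify the finalizer's position before patching.
kubectl patch pv <pv-name> --type=json \
  -p='[{"op":"replace","path":"/metadata/finalizers/0","value":"external-attacher/driver-longhorn-io"}]'
```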

Judging by this result, I think the migration went well, but please do this at your own risk and always back up your data before upgrading. I was lucky I still had my data backed up.

In short, if you have already upgraded from 0.8.1 to 1.0.0 but still have PVs/PVCs that need to be migrated, you can do this (at your own risk):

  1. Execute this to see the list of PVs that still need to be migrated

    kubectl get pv --output=jsonpath='{.items[?(@.spec.csi.driver=="io.rancher.longhorn")].spec.csi.volumeHandle}'

  2. Edit the finalizer in each PV from external-attacher/io-rancher-longhorn to external-attacher/driver-longhorn-io
  3. Run the migration script for each PV (“link redacted”, item 3)

    curl -s "link redacted" | bash -s -- <volume name>  (<---- beware: it is a double dash, not an en-dash)

Please do this at your own risk. I don’t know if these are the right steps. If you can still do “link redacted”, please do it that way.

I will try to answer any questions in this thread as far as I can, but I’m not a Longhorn expert.

I hope this can help someone.

If any of you can reproduce or validate this, and explain exactly, technically, why this scenario happened, it would be a big help.
Thank you.

The link above is the one we used to migrate volumes created before v0.6.2.

@akhfa The upgrade instructions were supposed to be executed before the upgrade; that might be the reason it failed. We will look into it.

@shuo-wu Can you check whether the upgrade steps are still valid for v1.0.0? Since we’ve removed the compatible CSI driver in v1.0.0.

As you said, the unexpected finalizer external-attacher/io-rancher-longhorn in the PV blocks/fails the migration. Removing the finalizer would fix this issue. We have updated the doc; you can check whether it works for you:

BTW, deleting a PV doesn’t mean the related Longhorn volume and data are removed. As long as the PV field spec.persistentVolumeReclaimPolicy is Retain, the related volume/data will be retained after the PV deletion. Then you can recreate the PV/PVC/workload for the volume.
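Before deleting a PV you intend to recreate, it is worth confirming (and, if necessary, changing) the reclaim policy; this is standard Kubernetes behavior, not Longhorn-specific:

```shell
# Confirm the reclaim policy; only Retain keeps the backing volume/data
# after the PV object is deleted.
kubectl get pv <pv-name> --output=jsonpath='{.spec.persistentVolumeReclaimPolicy}'

# Switch a Delete-policy PV to Retain before touching it.
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```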

Hi all, thanks for your responses

BTW, deleting PV doesn’t mean the related Longhorn volume and data are removed. As long as the PV field spec.persistentVolumeReclaimPolicy is Retain, the related volume/data will be retained after the PV deletion. Then you can recreate PV/PVC/Workload for the volume.

This fact is new to me. How does Longhorn know, when we create the PV, that it is linked with the old volume/data? Is it by the PV name?

I confirmed my data is retained when I recreate the PV through the UI

But I’m pretty sure that in my previous investigation, in one step, I also tried to recreate it the same way (via the UI), but it failed with an error like “there is no PV”, or “PVC not linked with any PV”, or something like that. Maybe that was while my PV was stuck in the Terminating state, but I’m not sure though :thinking:

The Longhorn volume is identified by the volumeHandle field in the PV. volumeHandle is essentially the Longhorn volume name.

The default StorageClass reclaim policy is Delete instead of Retain, so in that case, when you delete a PV that was dynamically provisioned by the StorageClass, the data will be deleted. A PV/PVC created using the Longhorn UI will have the policy Retain by default, and since it’s not dynamically created by the StorageClass, it won’t be deleted when you delete the PV.
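If you want dynamically provisioned volumes to survive PV deletion as well, the reclaim policy can be set on the StorageClass itself. A minimal sketch; the class name longhorn-retain is made up for this example, the provisioner name driver.longhorn.io is the v1.0.0 driver mentioned earlier in this thread, and the parameters shown are common Longhorn StorageClass parameters:

```yaml
# Hypothetical StorageClass whose dynamically provisioned PVs keep their data
# on deletion, because reclaimPolicy is Retain instead of the default Delete.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-retain        # made-up name for this example
provisioner: driver.longhorn.io
reclaimPolicy: Retain
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
```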


You can check this example to see how Kubernetes links a PV with a CSI volume.
The field volumeHandle is the CSI volume name.

And here is the description for persistentVolumeReclaimPolicy: Retain


Thank you for summarizing this topic. What I don’t quite understand yet: do I always have to manually create the volume via the Longhorn UI beforehand, or is there a way to describe this in the YAML configuration?

If you mean creating a Longhorn volume, you can create it via YAML files with a StorageClass configured, or via the Longhorn UI.
An example YAML file:

If you mean creating PV/PVC for a Longhorn volume, you can use Longhorn UI or a YAML file as mentioned above.

What I mean is the definition of durable volumes by using a storage class with reclaimPolicy: Retain

I am doing this as in the following example, where I have the ‘csi’ section defining a ‘volumeHandle’. As far as I understand from the documentation, for durable volumes I have to first create the volume in the longhorn-ui and then define the static reference in my PersistentVolume definition using the ‘csi/volumeHandle’ attribute. Is this right?

kind: PersistentVolume
apiVersion: v1
metadata:
  name: dbdata-pv
  namespace: office-demo
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: office-demo
    name: dbdata-pvc
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: office-demo-dbdata
  storageClassName: longhorn-durable
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbdata-pvc
  namespace: office-demo
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-durable
  resources:
    requests:
      storage: 2Gi
  volumeName: "dbdata-pv"

The reclaimPolicy is not the key point of your question. I think you may be confused about how to provision and use a Longhorn volume in Kubernetes.

As I mentioned above, there are 2 ways to create and then use a Longhorn volume. I will explain more about the 2 use cases:

  1. You can create a Longhorn volume via the Longhorn UI. Then you can use the YAML file/example you just mentioned, or the Longhorn UI, to create a PV and PVC for this Longhorn volume.
  2. You can directly create a PVC with a storageClass specified. Then Kubernetes will automatically create a PV and request Longhorn to create a Longhorn volume for the PV/PVC based on the parameters of the storageClass. In this case, you don’t need to use the Longhorn UI at all.
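Case 2 above (dynamic provisioning) needs nothing but a PVC referencing a Longhorn StorageClass. A minimal sketch, assuming a class named longhorn exists (the default Longhorn install creates one) and using a made-up claim name:

```yaml
# PVC that triggers dynamic provisioning: Kubernetes creates the PV and asks
# Longhorn to create the backing volume, driven entirely by storageClassName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data            # hypothetical name for this example
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```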

After you get a PVC/PV backed by the Longhorn volume, the workload can use the Longhorn volume by specifying the PVC.
BTW, a Longhorn volume is represented as a CRD object in Kubernetes. You can check this CRD object’s status with:
kubectl -n longhorn-system get volumes.longhorn.io <volume name>

If you still have questions about persistent storage in Kubernetes, you can read the following articles for a more comprehensive understanding: