Can't use PVC from Longhorn 0.1.0 after upgrade


#1

Hi, I have just upgraded from Longhorn 0.1.0 to 0.3.0 by following the guide in the GitHub repo. After the upgrade, my pod reports this message when I try to use volumes created on 0.1.0:

MountVolume.SetUp failed for volume "pvc-2a2ae8c8-94ab-11e8-bca9-86231f9033d5" : unexpected end of JSON input

Creating a new PVC and mounting that volume works flawlessly. Is there something I am missing?

Best,
Adrian


#2

Hi @Adrian_Brennig

Can you check the log of longhorn manager? Also, are you using CSI or Flexvolume? If you’re using the Flexvolume driver, maybe you can enable debug mode of the Flexvolume driver to see what’s wrong: https://github.com/rancher/longhorn/blob/master/docs/troubleshooting.md#flexvolume-driver


#3

Thank you @yasker for your response!
I use CSI installed via the Rancher catalog. The longhorn-manager puts out the following logs:

time="2018-09-12T18:07:20Z" level=debug msg="Engine image rancher/longhorn-engine:v0.3.0 is ready"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Start Longhorn engine controller"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Start Longhorn volume controller"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Start Longhorn replica controller"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Start Longhorn Engine Image controller"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Start Longhorn node controller"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Start Longhorn websocket controller"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Listening on 10.42.5.183:9500"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"worker1\", UID:\"79cf68e9-b6b0-11e8-ac84-8e7ee66bf8b9\", APIVersion:\"longhorn.rancher.io/v1alpha1\", ResourceVersion:\"33113580\", FieldPath:\"\"}): type: 'Warning' reason: 'ManagerPodDown' Node worker1 is down: the manager pod longhorn-manager-sqp6v is not running"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"worker2\", UID:\"7968d111-b6b0-11e8-ac84-8e7ee66bf8b9\", APIVersion:\"longhorn.rancher.io/v1alpha1\", ResourceVersion:\"33113598\", FieldPath:\"\"}): type: 'Warning' reason: 'ManagerPodDown' Node worker2 is down: the manager pod longhorn-manager-vb5pb is not running"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"worker3\", UID:\"79c3c1c7-b6b0-11e8-ac84-8e7ee66bf8b9\", APIVersion:\"longhorn.rancher.io/v1alpha1\", ResourceVersion:\"33113588\", FieldPath:\"\"}): type: 'Warning' reason: 'ManagerPodDown' Node worker3 is down: the manager pod longhorn-manager-9wxbc is not running"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to"

9/12/2018 8:07:20 PM E0912 18:07:20.629452 1 replica_controller.go:206] fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Dropping Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 out of the queue: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to"

9/12/2018 8:07:20 PM E0912 18:07:20.629765 1 replica_controller.go:206] fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to

9/12/2018 8:07:20 PM E0912 18:07:20.629838 1 replica_controller.go:206] fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Dropping Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 out of the queue: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to"

9/12/2018 8:07:20 PM time="2018-09-12T18:07:20Z" level=warning msg="Dropping Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 out of the queue: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to"

9/12/2018 8:07:24 PM time="2018-09-12T18:07:24Z" level=info msg="Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"longhorn-system\", Name:\"worker4\", UID:\"7b1cbb34-b6b0-11e8-ac84-8e7ee66bf8b9\", APIVersion:\"longhorn.rancher.io\", ResourceVersion:\"33113730\", FieldPath:\"\"}): type: 'Normal' reason: 'Ready' Node worker4 is ready"

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to"

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to"

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to"

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to"

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to"

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Error syncing Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to"

9/12/2018 8:07:50 PM E0912 18:07:50.418932 1 replica_controller.go:206] fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Dropping Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 out of the queue: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-a0266446 belong to"

9/12/2018 8:07:50 PM E0912 18:07:50.420395 1 replica_controller.go:206] fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to

9/12/2018 8:07:50 PM time="2018-09-12T18:07:50Z" level=warning msg="Dropping Longhorn replica longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 out of the queue: fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-ea1c6751 belong to"

9/12/2018 8:07:50 PM E0912 18:07:50.420693 1 replica_controller.go:206] fail to sync replica for longhorn-system/pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045: cannot find the default engine the replica pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-r-bbfaa045 belong to

The csi-provisioner is logging:
I0912 18:20:43.430595 1 reflector.go:286] github.com/kubernetes-csi/external-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:498: forcing resync

The longhorn-ui has also stopped working; I am getting a 504 Gateway time-out.
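For readers facing a similarly long manager log, the repeated warnings above can be reduced to the set of affected volumes. The following is a minimal sketch, not part of Longhorn: the function name is hypothetical, and it assumes replica names follow the `<volume>-r-<id>` pattern seen in these log lines.

```python
import re
from collections import defaultdict

# Matches the warning text quoted in the log above.
MISSING_ENGINE = re.compile(
    r"cannot find the default engine the replica (?P<replica>\S+?) belong to"
)


def volumes_missing_engine(log_text):
    """Map each volume name to the replicas that reported a missing engine.

    Assumption: a replica is named <volume>-r-<id>, as in the log above.
    """
    affected = defaultdict(set)
    for match in MISSING_ENGINE.finditer(log_text):
        replica = match.group("replica")
        volume, sep, _ = replica.rpartition("-r-")
        if sep:  # ignore names that do not follow the assumed pattern
            affected[volume].add(replica)
    return dict(affected)
```

Applied to the log in this post, the result would contain a single volume, pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5, with its three replicas.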


#4

It seems there is no engine for volume pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5. Can you run kubectl -n longhorn-system get lhe and check whether pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5-e is in the output?


#5

kubectl -n longhorn-system get lhe
only returns these:
NAME AGE
pvc-06af3cf6-b205-11e8-9a0f-8e7ee66bf8b9-e 5m
pvc-089f9950-94b1-11e8-bca9-86231f9033d5-e 5m
pvc-1dcd587b-956e-11e8-bca9-86231f9033d5-e 5m
pvc-2a2ae8c8-94ab-11e8-bca9-86231f9033d5-e 5m
pvc-3e4d978b-b211-11e8-9a0f-8e7ee66bf8b9-e 5m
pvc-54725668-94f8-11e8-bca9-86231f9033d5-e 5m
pvc-58179b6b-aa44-11e8-8b76-8e7ee66bf8b9-e 5m
pvc-86ffa42c-a6ec-11e8-b758-8e7ee66bf8b9-e 5m
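The check requested above (is there an engine named `<volume>-e` in the list?) can also be done mechanically when the list is long. This is a rough sketch with hypothetical helper names; it assumes the plain-text table format shown in this post and the `-e` engine suffix mentioned in the previous reply.

```python
def engine_names(kubectl_output):
    """Collect the NAME column from `kubectl -n longhorn-system get lhe` output."""
    names = set()
    for line in kubectl_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":  # skip blank lines and the header row
            continue
        names.add(fields[0])
    return names


def has_engine(volume, kubectl_output):
    """Return True if an engine named <volume>-e appears in the listing."""
    return f"{volume}-e" in engine_names(kubectl_output)
```

With the listing above, `has_engine("pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5", listing)` would be false, since no entry with that name appears.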


#6

OK, the missing engine is the reason for the error.

I don’t know how the engine can be missing. It’s created along with the volume and won’t be deleted until the volume is deleted.

To recover from this error, I think you need to delete the volume pvc-ef9cd392-9276-11e8-a7ab-86231f9033d5 (delete the related workload first). I hope you’ve made a backup of the volume.


#7

Thank you! The error logs are now gone. I think they were volumes which were not deleted properly on 0.1.0.

Sadly, I still can’t connect to the UI. I just get a timeout. Recreating the UI pod doesn’t help either. Is there something else I could try?


#8

I got the UI working again, but the initial error is still present: every pod with a volume from 0.1.0 is stuck at ContainerCreating with the error

MountVolume.SetUp failed for volume "pvc-***" : unexpected end of JSON input

The longhorn-manager logs nothing, but the csi-provisioner is still logging:

I0913 08:10:00.271184 1 reflector.go:286] github.com/kubernetes-csi/external-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:496: forcing resync

I can attach the volume manually. It seems to me that Longhorn somehow does not receive the command to attach the volume when I start the pod.


#9

Oh, the volume was created using the Flexvolume driver in v0.1.0, but Longhorn is now running on CSI. Yeah, that can cause issues in Kubernetes. Can you try redeploying Longhorn with the Flexvolume driver? Remember to set FLEXVOLUME_DIR correctly.
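One way to confirm which driver an existing volume was provisioned with is to look at its PersistentVolume object, which records the volume source under spec.csi or spec.flexVolume. The sketch below is illustrative only: the function name is hypothetical, and it expects the parsed JSON from something like `kubectl get pv <name> -o json`.

```python
import json


def pv_driver(pv):
    """Return (kind, driver) for a PersistentVolume parsed from JSON.

    kind is "csi", "flexVolume", or "other" when neither source is set.
    """
    spec = pv.get("spec", {})
    if "csi" in spec:
        return "csi", spec["csi"].get("driver")
    if "flexVolume" in spec:
        return "flexVolume", spec["flexVolume"].get("driver")
    return "other", None


# Example with a made-up PV; the driver name is a placeholder, not Longhorn's.
example = json.loads(
    '{"spec": {"flexVolume": {"driver": "example.com/flex-driver"}}}'
)
print(pv_driver(example))
```

A volume that reports flexVolume here while the cluster is deployed with the CSI driver would match the situation described in this post.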


#10

Btw, @LLParse is currently investigating the impact of switching drivers on an existing deployment.


#11

Thank you! That did it! Did I miss that in the docs? If not, it would be helpful to mention that volumes created with Flexvolume need the Flexvolume driver again after the upgrade. Now everything works. Thank you for your great help!!


#12

That’s great! We don’t have documentation yet on the impact of switching from Flexvolume to CSI. We will add that part soon.