Lh 0.7.0 + Rancher 2.2.8 + CoreOs hangs on attaching volumes

Hi All,

K8s v1.15.3

I tried to use the longhorn storage class to provision some storage for a POD.
Seem to be hanging on attaching the volume. Status flips between attaching detaching.


time="2019-11-26T10:03:14Z" level=warning msg="pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: time=\"2019-11-26T10:02:58Z\" level=info msg=\"Get backend tcp://10.42.3.134:10000 revision counter 0\""

11/26/2019 10:03:14 AM time="2019-11-26T10:03:14Z" level=warning msg="pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: time=\"2019-11-26T10:02:59Z\" level=info msg=\"New data socket connnection established\""

11/26/2019 10:03:14 AM time="2019-11-26T10:03:14Z" level=warning msg="pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: time=\"2019-11-26T10:03:14Z\" level=error msg=\"Shutting down frontend due to failed to callback launcher: failed to start frontend: failed to call gRPC FrontendStartCallback for engine pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: rpc error: code = Unknown desc = failed to callback for engine pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5 frontend start: Failed to execute: nsenter [--mount=/host/proc/744/ns/mnt --net=/host/proc/744/ns/net iscsiadm -m node -T iqn.2014-09.com.rancher:pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da -p 10.42.4.200 --login], output iscsiadm: No records found\\n, error exit status 21\""

11/26/2019 10:03:14 AM time="2019-11-26T10:03:14Z" level=warning msg="pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: 2019/11/26 10:03:14 failed to start up launcher: failed to start frontend: failed to call gRPC FrontendStartCallback for engine pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: rpc error: code = Unknown desc = failed to callback for engine pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5 frontend start: Failed to execute: nsenter [--mount=/host/proc/744/ns/mnt --net=/host/proc/744/ns/net iscsiadm -m node -T iqn.2014-09.com.rancher:pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da -p 10.42.4.200 --login], output iscsiadm: No records found"

11/26/2019 10:03:14 AM time="2019-11-26T10:03:14Z" level=warning msg="pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da-e-c25c8fb5: , error exit status 21"

11/26/2019 10:03:14 AM time="2019-11-26T10:03:14Z" level=warning msg="Engine of volume pvc-05d5c4fb-cf3b-4192-a1a6-207d8b6943da dead unexpectedly, reattach the volume"

Might be related to:

Could you provide the log of csi-attacher/csi-provisioner/longhorn-csi-plugin to see if there are any similar logs to that of the issue you mentioned?

No, csi logs seem ok
Workload: longhorn-csi-plugin

  I1125 16:40:16.222124 1 main.go:110] Version: v1.2.0-0-g6ef000ae

11/25/2019 4:40:16 PM I1125 16:40:16.222184 1 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"

11/25/2019 4:40:16 PM I1125 16:40:16.222202 1 connection.go:151] Connecting to unix:///csi/csi.sock

11/25/2019 4:40:17 PM I1125 16:40:17.223102 1 main.go:127] Calling CSI driver to discover driver name

11/25/2019 4:40:17 PM I1125 16:40:17.223134 1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo

11/25/2019 4:40:17 PM I1125 16:40:17.223146 1 connection.go:181] GRPC request: {}

11/25/2019 4:40:17 PM I1125 16:40:17.226989 1 connection.go:183] GRPC response: {"name":"driver.longhorn.io","vendor_version":"v0.7.0"}

11/25/2019 4:40:17 PM I1125 16:40:17.227434 1 connection.go:184] GRPC error: <nil>

11/25/2019 4:40:17 PM I1125 16:40:17.227442 1 main.go:137] CSI driver name: "driver.longhorn.io"

11/25/2019 4:40:17 PM I1125 16:40:17.227483 1 node_register.go:58] Starting Registration Server at: /registration/driver.longhorn.io-reg.sock

11/25/2019 4:40:17 PM I1125 16:40:17.227684 1 node_register.go:67] Registration Server started at: /registration/driver.longhorn.io-reg.sock

11/25/2019 4:40:17 PM I1125 16:40:17.377049 1 main.go:77] Received GetInfo call: &InfoRequest{}

11/25/2019 4:40:17 PM I1125 16:40:17.441531 1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

How about the logs of csi-attacher and csi-provionser?

If possible, you can directly click Generate Support Bundle link at the bottom of the UI and send the bundle to longhorn-support-bundle@rancher.com.

Done. Did it like 10 days ago or so.

I cannot find the support bundle you send… Could you check and resend it?

It seems that there is something wrong with your iscsiadm configuration on the host nodes. Then the Longhorn engine cannot be launched.

The log of instance-manager-e-267e7e8b:

2019-12-13T10:18:44.421776548Z [pvc-1207e229-7320-4363-9bbd-a77fa03b568a-e-d7aef2b8-8e0c9333] time="2019-12-13T10:18:44Z" level=info msg="New data socket connnection established"
2019-12-13T10:18:45.47934259Z [longhorn-instance-manager] time="2019-12-13T10:18:45Z" level=info msg="wait for gRPC service of process pvc-1207e229-7320-4363-9bbd-a77fa03b568a-e-d7aef2b8-8e0c9333 to start"
2019-12-13T10:18:45.616282235Z [longhorn-instance-manager] time="2019-12-13T10:18:45Z" level=warning msg="FAIL to discover due to Failed to execute: nsenter [--mount=/host/proc/1535/ns/mnt --net=/host/proc/1535/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.3.77], output iscsiadm: Cannot perform discovery. Invalid Initiatorname.\niscsiadm: Could not perform SendTargets discovery: invalid parameter\n, error exit status 7"

You can refer to this post to solve this issue. If it doesn’t work, feel free to ask for help.