Failed to use Longhorn


#1

Hello!

I am trying to use Rancher 2 with Longhorn, but it doesn't work.
This is my configuration:

  • RancherOS
    • version v1.4.1 from os image rancher/os:v1.4.1
  • Kubernetes:
  • Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T17:05:32Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
    • Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Rancher:
    • v2.1.1
  • Longhorn:
    • v0.3.1

I installed my Kubernetes cluster on two nodes with RKE and installed Rancher with Helm.

Before installing Longhorn, I ran this script to check the environment:

curl -sSfL https://raw.githubusercontent.com/rancher/longhorn/master/scripts/environment_check.sh | bash

The result:

> curl -sSfL https://raw.githubusercontent.com/rancher/longhorn/master/scripts/environment_check.sh | bash
daemonset.apps "longhorn-environment-check" created
waiting for pods to become ready (0/2)
waiting for pods to become ready (0/2)
all pods ready (2/2)

  MountPropagation is enabled!

cleaning up...
daemonset.apps "l

Then I installed Longhorn from the Rancher catalog in the “longhorn-system” namespace with the “csi” driver.

Everything seems to have installed correctly:

> kubectl -n longhorn-system get all
NAME                                            READY     STATUS    RESTARTS   AGE
pod/csi-attacher-0                              1/1       Running   0          5m
pod/csi-provisioner-0                           1/1       Running   0          5m
pod/engine-image-ei-3bda103d-2d52c              1/1       Running   0          5m
pod/engine-image-ei-3bda103d-vm6rg              1/1       Running   0          5m
pod/longhorn-csi-plugin-2xjdx                   2/2       Running   0          5m
pod/longhorn-csi-plugin-cf7g4                   2/2       Running   0          5m
pod/longhorn-driver-deployer-7ff445576d-gcnc6   1/1       Running   0          5m
pod/longhorn-manager-6bf2r                      1/1       Running   0          5m
pod/longhorn-manager-cbbgr                      1/1       Running   0          5m
pod/longhorn-ui-5f599b67fd-82jr8                1/1       Running   0          5m

NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/csi-attacher        ClusterIP   10.43.1.40      <none>        12345/TCP      5m
service/csi-provisioner     ClusterIP   10.43.102.137   <none>        12345/TCP      5m
service/longhorn-backend    ClusterIP   10.43.49.130    <none>        9500/TCP       6m
service/longhorn-frontend   NodePort    10.43.177.26    <none>        80:30031/TCP   5m

NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/engine-image-ei-3bda103d   2         2         2         2            2           <none>          5m
daemonset.apps/longhorn-csi-plugin        2         2         2         2            2           <none>          5m
daemonset.apps/longhorn-manager           2         2         2         2            2           <none>          5m

NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/longhorn-driver-deployer   1         1         1            1           5m
deployment.apps/longhorn-ui                1         1         1            1           5m

NAME                                                  DESIRED   CURRENT   READY     AGE
replicaset.apps/longhorn-driver-deployer-7ff445576d   1         1         1         5m
replicaset.apps/longhorn-ui-5f599b67fd                1         1         1         5m

NAME                               DESIRED   CURRENT   AGE
statefulset.apps/csi-attacher      1         1         5m
statefulset.apps/csi-provisioner   1         1         5m

Then I tried the example provided in the Longhorn project's GitHub repository:

> kubectl create -f https://raw.githubusercontent.com/rancher/longhorn/master/examples/pvc.yaml
persistentvolumeclaim "longhorn-volv-pvc" created
pod "volume-test" created

And now it fails:

The volume is stuck in the “Attaching” state in the Longhorn UI.

If I describe the pod:

> kubectl describe pod/volume-test
Name:         volume-test
Namespace:    default
Node:         rancher.[...].com/[...]
Start Time:   Fri, 02 Nov 2018 13:16:28 +0000
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
Containers:
  volume-test:
    Container ID:
    Image:          nginx:stable-alpine
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from volv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vg7mt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  volv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  longhorn-volv-pvc
    ReadOnly:   false
  default-token-vg7mt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vg7mt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age               From                                    Message
  ----     ------              ----              ----                                    -------
  Warning  FailedScheduling    7m (x6 over 7m)   default-scheduler                       pod has unbound PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled           7m                default-scheduler                       Successfully assigned default/volume-test to rancher-[...].com
  Warning  FailedAttachVolume  6m (x8 over 7m)   attachdetach-controller                 AttachVolume.Attach failed for volume "pvc-7bace724-dea1-11e8-8625-0cc47a9fd474" : rpc error: code = Internal desc = Action [attach] not available on [&{pvc-7bace724-dea1-11e8-8625-0cc47a9fd474 volume map[self:http://longhorn-backend:9500/v1/volumes/pvc-7bace724-dea1-11e8-8625-0cc47a9fd474] map[]}]
  Warning  FailedAttachVolume  1m (x3 over 5m)   attachdetach-controller                 AttachVolume.Attach failed for volume "pvc-7bace724-dea1-11e8-8625-0cc47a9fd474" : rpc error: code = Aborted desc = The volume pvc-7bace724-dea1-11e8-8625-0cc47a9fd474 is attaching
  Warning  FailedMount         41s (x3 over 5m)  kubelet, rancher-***.***l.com  Unable to mount volumes for pod "volume-test_default(7bd8d921-dea1-11e8-8625-0cc47a9fd474)": timeout expired waiting for volumes to attach or mount for pod "default"/"volume-test". list of unmounted volumes=[volv]. list of unattached volumes=[volv default-token-vg7mt]
>

Can you help me?

Thanks.
Benjamin.


#2

I tried to set the extra_binds configuration in my RKE cluster.yml file:

  kubelet:
    image: ""
    extra_args: {}
    extra_binds:
      - "/var/lib/kubelet/plugins:/var/lib/kubelet/plugins"
      - "/var/lib/kubelet/pods:/var/lib/kubelet/pods:shared,z"

But the problem persists.
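For what it's worth, the `shared` propagation flag in the bind mount above only helps if mount propagation actually works on the node. As a rough standalone check (my own sketch, not part of the Longhorn tooling), shared mounts can be spotted by the `shared:N` peer-group tag they carry in `/proc/self/mountinfo`:

```shell
# Shared mounts are listed with a "shared:N" optional field in
# /proc/self/mountinfo. Run this on the node itself: if the
# kubelet-related mounts carry no "shared:" tag, mount propagation
# is not enabled for them and CSI cannot propagate the mounted
# volume back to the host.
if grep -qs 'shared:' /proc/self/mountinfo; then
    echo "shared mounts present"
else
    echo "no shared mounts found"
fi
```

This is only a coarse check; the `environment_check.sh` script above tests the same thing end to end through a DaemonSet.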


#3

Hi @Benjamin_Carriou

We haven’t added full support for RancherOS yet, since we know there is an issue with RancherOS + RKE + CSI: https://github.com/rancher/longhorn/issues/198

However, some users report that the Flexvolume driver works well. You can try reinstalling Longhorn with the Flexvolume driver and let us know.


#4

Hi @yasker

Thanks for your response.

I tried the Flexvolume driver, but it fails too.

I installed Longhorn from the Rancher catalog and set the Longhorn Flexvolume Path to

/opt/rke/var/lib/kubelet/volumeplugins

In my RKE cluster.yml file, I configured this:

  kubelet:
    image: ""
    extra_args:
      volume-plugin-dir: /opt/rke/var/lib/kubelet/volumeplugins
    extra_binds: []

The kubelet is configured correctly:

root      9200  6.3  0.4 1478540 141780 ?      [...] --volume-plugin-dir=/opt/rke/var/lib/kubelet/volumeplugins [...]

Then I tried to install WordPress from the catalog with its data persisted to Longhorn, but it doesn't work.

Some logs from when I instantiate WordPress from the catalog:

kubetail -n longhorn-system

longhorn-driver-deployer-75bffb8c-xl8dk longhorn-driver-deployer] time="2018-11-05T15:57:09Z" level=info msg="provisioner: created volume pvc-72af745f-e113-11e8-a199-0cc47a9fd474, size: 10Gi"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:09Z" level=debug msg="Created volume pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:09Z" level=debug msg="Volume Controller rancher-ns3116855.kereval.com picked up pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:10Z" level=debug msg="Requeue longhorn-system/pvc-72af745f-e113-11e8-a199-0cc47a9fd474 due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:10Z" level=info msg="Event(v1.ObjectReference{Kind:"Volume", Namespace:"longhorn-system", Name:"pvc-72af745f-e113-11e8-a199-0cc47a9fd474", UID:"7370a938-e113-11e8-a199-0cc47a9fd474", APIVersion:"longhorn.rancher.io", ResourceVersion:"36156", FieldPath:""}): type: 'Normal' reason: 'Detached' volume pvc-72af745f-e113-11e8-a199-0cc47a9fd474 has been detached"
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:11 +0000] "GET /v1/nodes? HTTP/1.1" 200 2873 "http://longhorn-ui.rancher.kereval.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 10.42.1.0 - - [05/Nov/2018:15:57:11 +0000] "GET /v1/engineimages? HTTP/1.1" 200 652 "http://longhorn-ui.rancher.kereval.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 10.42.1.0 - - [05/Nov/2018:15:57:11 +0000] "GET /v1/volumes? HTTP/1.1" 200 5213 "http://longhorn-ui.rancher.kereval.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 10.42.1.0 - - [05/Nov/2018:15:57:11 +0000] "GET /v1/events? HTTP/1.1" 200 18925 "http://longhorn-ui.rancher.kereval.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-manager-fc4nd] time="2018-11-05T15:57:12Z" level=debug msg="Attaching volume pvc-72af745f-e113-11e8-a199-0cc47a9fd474 to rancher-ns3116855.kereval.com"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:12Z" level=info msg="Event(v1.ObjectReference{Kind:"Replica", Namespace:"longhorn-system", Name:"pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-1e407e21", UID:"73c9fbd2-e113-11e8-a199-0cc47a9fd474", APIVersion:"longhorn.rancher.io", ResourceVersion:"36255", FieldPath:""}): type: 'Normal' reason: 'Start' Starts pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-1e407e21"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:12Z" level=info msg="Event(v1.ObjectReference{Kind:"Replica", Namespace:"longhorn-system", Name:"pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-8f56b8b2", UID:"73d3abe0-e113-11e8-a199-0cc47a9fd474", APIVersion:"longhorn.rancher.io", ResourceVersion:"36248", FieldPath:""}): type: 'Normal' reason: 'Start' Starts pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-8f56b8b2"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:12Z" level=info msg="Event(v1.ObjectReference{Kind:"Replica", Namespace:"longhorn-system", Name:"pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-ff75e8dd", UID:"73b383a6-e113-11e8-a199-0cc47a9fd474", APIVersion:"longhorn.rancher.io", ResourceVersion:"36254", FieldPath:""}): type: 'Normal' reason: 'Start' Starts pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-ff75e8dd"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:13Z" level=debug msg="Requeue longhorn-system/pvc-72af745f-e113-11e8-a199-0cc47a9fd474 due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:13Z" level=debug msg="Requeue longhorn-system/pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-1e407e21 due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:14Z" level=debug msg="Requeue longhorn-system/pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-1e407e21 due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:15Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:16 +0000] "GET /v1/engineimages?field=status&stateValue=inProgress HTTP/1.1" 200 652 "http://longhorn-ui.rancher.kereval.com/volume?field=status&stateValue=inProgress" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:16 +0000] "GET /v1/volumes?field=status&stateValue=inProgress HTTP/1.1" 200 4820 "http://longhorn-ui.rancher.kereval.com/volume?field=status&stateValue=inProgress" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 10.42.1.0 - - [05/Nov/2018:15:57:16 +0000] "GET /v1/nodes?field=status&stateValue=inProgress HTTP/1.1" 200 2873 "http://longhorn-ui.rancher.kereval.com/volume?field=status&stateValue=inProgress" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:17 +0000] "GET /v1/events?field=status&stateValue=inProgress HTTP/1.1" 200 21046 "http://longhorn-ui.rancher.kereval.com/volume?field=status&stateValue=inProgress" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:20Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:20Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:21Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:21Z" level=debug msg="Instance pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-ff75e8dd starts running, IP 10.42.0.28"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:21Z" level=debug msg="Instance pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-8f56b8b2 starts running, IP 10.42.1.49"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:21Z" level=debug msg="Instance pvc-72af745f-e113-11e8-a199-0cc47a9fd474-r-1e407e21 starts running, IP 10.42.0.27"
[longhorn-manager-fc4nd] time="2018-11-05T15:57:22Z" level=debug msg="websocket: close 1006 (abnormal closure): unexpected EOF" id=54304525855160007 type=engineimages
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:22 +0000] "GET /v1/ws/1s/engineimages HTTP/1.1" 101 3322 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:22 +0000] "GET /v1/ws/1s/nodes HTTP/1.1" 101 146139 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-manager-fc4nd] time="2018-11-05T15:57:22Z" level=debug msg="websocket: close 1006 (abnormal closure): unexpected EOF" id=3877003785578227882 type=nodes
[longhorn-manager-fc4nd] time="2018-11-05T15:57:22Z" level=debug msg="websocket: close 1006 (abnormal closure): unexpected EOF" id=1444887997823667411 type=events
[longhorn-ui-5f599b67fd-k58sj] 54.38.179.67 - - [05/Nov/2018:15:57:22 +0000] "GET /v1/ws/1s/events HTTP/1.1" 101 70030 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-ui-5f599b67fd-k58sj] 10.42.1.0 - - [05/Nov/2018:15:57:22 +0000] "GET /v1/ws/1s/volumes HTTP/1.1" 101 54878 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-manager-fc4nd] time="2018-11-05T15:57:22Z" level=debug msg="websocket: close 1006 (abnormal closure): unexpected EOF" id=973309006821215145 type=volumes
[longhorn-ui-5f599b67fd-k58sj] 10.42.1.0 - - [05/Nov/2018:15:57:22 +0000] "GET /static/ws-closed.d12dd327.svg HTTP/1.1" 304 0 "http://longhorn-ui.rancher.kereval.com/volume?field=status&stateValue=inProgress" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:22Z" level=info msg="Event(v1.ObjectReference{Kind:"Engine", Namespace:"longhorn-system", Name:"pvc-72af745f-e113-11e8-a199-0cc47a9fd474-e-edd316d0", UID:"73a5b8c5-e113-11e8-a199-0cc47a9fd474", APIVersion:"longhorn.rancher.io", ResourceVersion:"36409", FieldPath:""}): type: 'Normal' reason: 'Start' Starts pvc-72af745f-e113-11e8-a199-0cc47a9fd474-e-edd316d0"
[longhorn-manager-fc4nd] time="2018-11-05T15:57:24Z" level=debug msg="websocket: open" id=7070590508081648998 type=volumes
[longhorn-manager-fc4nd] time="2018-11-05T15:57:25Z" level=debug msg="websocket: open" id=3680301079392325632 type=nodes
[longhorn-manager-fc4nd] time="2018-11-05T15:57:25Z" level=debug msg="websocket: open" id=5431886313952997977 type=engineimages
[longhorn-manager-f2pbr] time="2018-11-05T15:57:25Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-fc4nd] time="2018-11-05T15:57:25Z" level=debug msg="websocket: open" id=8334384289869306570 type=events
[longhorn-manager-f2pbr] time="2018-11-05T15:57:26Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:27Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:29Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"
[longhorn-manager-f2pbr] time="2018-11-05T15:57:31Z" level=debug msg="Requeue longhorn-system/rancher-ns3116855.kereval.com due to conflict"

If I describe the engine pod:

kubectl -n longhorn-system describe pod/pvc-72af745f-e113-11e8-a199-0cc47a9fd474-e-edd316d0
Name:         pvc-72af745f-e113-11e8-a199-0cc47a9fd474-e-edd316d0
Namespace:    longhorn-system
Node:         rancher-ns3116855.kereval.com/51.38.52.98
Start Time:   Mon, 05 Nov 2018 16:02:07 +0000
Labels:       <none>
Annotations:  cni.projectcalico.org/podIP: 10.42.1.54/32
Status:       Running
IP:           10.42.1.54
Containers:
  pvc-72af745f-e113-11e8-a199-0cc47a9fd474-e-edd316d0:
    Container ID:  docker://10102fc211fcad7bf66aab14a0b77501f3d456161f2cd61181e3e1879ca5ba52
    Image:         rancher/longhorn-engine:v0.3.0
    Image ID:      docker-pullable://rancher/longhorn-engine@sha256:4748e6601502c2f765cf1dbbaaff76f43f536b5d35622338b54f2413533f89d9
    Port:          <none>
    Host Port:     <none>
    Command:
      engine-launcher
      start
      --launcher-listen
      0.0.0.0:9510
      --longhorn-binary
      /usr/local/bin/longhorn
      --listen
      0.0.0.0:9501
      --frontend
      tgt-blockdev
      --size
      10737418240
      --replica
      tcp://10.42.0.32:9502
      --replica
      tcp://10.42.1.53:9502
      --replica
      tcp://10.42.0.31:9502
      pvc-72af745f-e113-11e8-a199-0cc47a9fd474
    State:          Running
      Started:      Mon, 05 Nov 2018 16:02:12 +0000
    Ready:          False
    Restart Count:  0
    Readiness:      exec [ls /dev/longhorn/pvc-72af745f-e113-11e8-a199-0cc47a9fd474] delay=1s timeout=1s period=1s #success=1 #failure=15
    Environment:    <none>
    Mounts:
      /engine-binaries/ from engine-binaries (rw)
      /host/dev from dev (rw)
      /host/proc from proc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l6cjc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  engine-binaries:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rancher/longhorn/engine-binaries/
    HostPathType:
  default-token-l6cjc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l6cjc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From                                    Message
  ----     ------     ----               ----                                    -------
  Normal   Pulled     29s                kubelet, rancher-ns3116855.kereval.com  Container image "rancher/longhorn-engine:v0.3.0" already present on machine
  Normal   Created    28s                kubelet, rancher-ns3116855.kereval.com  Created container
  Normal   Started    27s                kubelet, rancher-ns3116855.kereval.com  Started container
  Warning  Unhealthy  5s (x22 over 26s)  kubelet, rancher-ns3116855.kereval.com  Readiness probe failed: ls: cannot access '/dev/longhorn/pvc-72af745f-e113-11e8-a199-0cc47a9fd474': No such file or directory

The source of the problem seems to be a missing directory:

cannot access '/dev/longhorn/pvc-72af745f-e113-11e8-a199-0cc47a9fd474': No such file or directory

How can I resolve this?
Should I create the directory manually?

Thanks for your help !


#5

Can you check the log of the engine using kubectl -n longhorn-system logs pod/pvc-72af745f-e113-11e8-a199-0cc47a9fd474-e-edd316d0

Also, you can try https://github.com/rancher/longhorn/blob/master/docs/troubleshooting.md to see if it helps.

And just in case: all the Kubernetes cluster nodes are in the same datacenter, right? Otherwise the network latency will likely be too high for Longhorn to work.


#6

The log of the engine:

kubectl -n longhorn-system describe pods/engine-image-ei-3bda103d-9lzqp

Name:           engine-image-ei-3bda103d-9lzqp
Namespace:      longhorn-system
Node:           rancher-ns3115830.kereval.com/54.38.179.67
Start Time:     Mon, 05 Nov 2018 15:36:52 +0000
Labels:         controller-revision-hash=3124322360
                longhorn=engine-image
                pod-template-generation=1
Annotations:    cni.projectcalico.org/podIP: 10.42.0.12/32
Status:         Running
IP:             10.42.0.12
Controlled By:  DaemonSet/engine-image-ei-3bda103d
Containers:
  engine-image-ei-3bda103d:
    Container ID:  docker://b6afd7100463e40b7afe8ed0bd4cde7b80053b2aeadbe8493cdc71e86ece472d
    Image:         rancher/longhorn-engine:v0.3.0
    Image ID:      docker-pullable://rancher/longhorn-engine@sha256:4748e6601502c2f765cf1dbbaaff76f43f536b5d35622338b54f2413533f89d9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
    Args:
      -c
      cp /usr/local/bin/longhorn* /data/ && echo installed && trap 'rm /data/longhorn* && echo cleaned up' EXIT && sleep infinity
    State:          Running
      Started:      Mon, 05 Nov 2018 15:37:00 +0000
    Ready:          True
    Restart Count:  0
    Readiness:      exec [ls /data/longhorn] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /data/ from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l6cjc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rancher/longhorn/engine-binaries/rancher-longhorn-engine-v0.3.0
    HostPathType:
  default-token-l6cjc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l6cjc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:          <none>

Yes, all the Kubernetes cluster nodes are in the same datacenter.


#7

@Benjamin_Carriou

This is not the log…


#8

@yasker Oops, sorry!

time="2018-11-06T08:21:33Z" level=info msg="launcher: controller 40d1e1cc-f9c5-492c-884e-e04a42c013ef started"
time="2018-11-06T08:21:33Z" level=info msg="Starting with replicas [\"tcp://10.42.0.231:9502\" \"tcp://10.42.1.249:9502\" \"tcp://10.42.0.230:9502\"]"
time="2018-11-06T08:21:33Z" level=info msg="Connecting to remote: 10.42.0.231:9502"
time="2018-11-06T08:21:33Z" level=info msg="Opening: 10.42.0.231:9502"
time="2018-11-06T08:21:33Z" level=info msg="Adding backend: tcp://10.42.0.231:9502"
time="2018-11-06T08:21:33Z" level=info msg="Set replica tcp://10.42.0.231:9502 to mode RW"
time="2018-11-06T08:21:33Z" level=info msg="Connecting to remote: 10.42.1.249:9502"
time="2018-11-06T08:21:33Z" level=info msg="Start monitoring tcp://10.42.0.231:9502"
time="2018-11-06T08:21:33Z" level=info msg="Opening: 10.42.1.249:9502"
time="2018-11-06T08:21:33Z" level=info msg="Adding backend: tcp://10.42.1.249:9502"
time="2018-11-06T08:21:33Z" level=info msg="Set replica tcp://10.42.1.249:9502 to mode RW"
time="2018-11-06T08:21:33Z" level=info msg="Connecting to remote: 10.42.0.230:9502"
time="2018-11-06T08:21:33Z" level=info msg="Start monitoring tcp://10.42.1.249:9502"
time="2018-11-06T08:21:33Z" level=info msg="Opening: 10.42.0.230:9502"
time="2018-11-06T08:21:33Z" level=info msg="Adding backend: tcp://10.42.0.230:9502"
time="2018-11-06T08:21:33Z" level=info msg="Set replica tcp://10.42.0.230:9502 to mode RW"
time="2018-11-06T08:21:33Z" level=info msg="Start monitoring tcp://10.42.0.230:9502"
time="2018-11-06T08:21:33Z" level=info msg="Get backend tcp://10.42.0.231:9502 revision counter 0"
time="2018-11-06T08:21:33Z" level=info msg="Get backend tcp://10.42.1.249:9502 revision counter 0"
time="2018-11-06T08:21:33Z" level=info msg="Get backend tcp://10.42.0.230:9502 revision counter 0"
time="2018-11-06T08:21:35Z" level=info msg="New data socket connnection established"
time="2018-11-06T08:21:36Z" level=warning msg="FAIL to discover due to Failed to execute: nsenter [--mount=/host/proc/2012/ns/mnt --net=/host/proc/2012/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.0.232], output iscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not add new discovery record.\n, error exit status 6"
time="2018-11-06T08:21:36Z" level=warning msg="Nodes cleaned up for iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
time="2018-11-06T08:21:39Z" level=warning msg="FAIL to discover due to Failed to execute: nsenter [--mount=/host/proc/2012/ns/mnt --net=/host/proc/2012/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.0.232], output iscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not add new discovery record.\n, error exit status 6"
time="2018-11-06T08:21:39Z" level=warning msg="Nodes cleaned up for iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
time="2018-11-06T08:21:42Z" level=warning msg="FAIL to discover due to Failed to execute: nsenter [--mount=/host/proc/2012/ns/mnt --net=/host/proc/2012/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.0.232], output iscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not add new discovery record.\n, error exit status 6"
time="2018-11-06T08:21:42Z" level=warning msg="Nodes cleaned up for iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
time="2018-11-06T08:21:45Z" level=warning msg="FAIL to discover due to Failed to execute: nsenter [--mount=/host/proc/2012/ns/mnt --net=/host/proc/2012/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.0.232], output iscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not add new discovery record.\n, error exit status 6"
time="2018-11-06T08:21:45Z" level=warning msg="Nodes cleaned up for iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
time="2018-11-06T08:21:48Z" level=warning msg="FAIL to discover due to Failed to execute: nsenter [--mount=/host/proc/2012/ns/mnt --net=/host/proc/2012/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.0.232], output iscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not open /run/lock/iscsi: No such file or directory\niscsiadm: Could not add new discovery record.\n, error exit status 6"
time="2018-11-06T08:21:48Z" level=warning msg="Nodes cleaned up for iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
time="2018-11-06T08:21:51Z" level=fatal msg="Failed to start up frontend: failed to start frontend: Failed to execute: longhorn-engine-launcher [--url 0.0.0.0:9510 frontend-start --id 40d1e1cc-f9c5-492c-884e-e04a42c013ef], output time=\"2018-11-06T08:21:51Z\" level=fatal msg=\"Error running frontend-start command: failed to start frontend: rpc error: code = Unknown desc = Failed to execute: nsenter [--mount=/host/proc/2012/ns/mnt --net=/host/proc/2012/ns/net iscsiadm -m node -T iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474 -p 10.42.0.232 --login], output iscsiadm: No records found\\n, error exit status 21.\"\n, error exit status 1"
time="2018-11-06T08:23:24Z" level=info msg="Receive terminated to exit"
time="2018-11-06T08:23:24Z" level=info msg="Shutdown SCSI target iqn.2014-09.com.rancher:pvc-72af745f-e113-11e8-a199-0cc47a9fd474"
time="2018-11-06T08:23:24Z" level=info msg="launcher: SCSI device /dev/longhorn/pvc-72af745f-e113-11e8-a199-0cc47a9fd474 shutdown"
time="2018-11-06T08:23:24Z" level=info msg="launcher: frontend has been shutdown"
time="2018-11-06T08:23:24Z" level=info msg="launcher: controller 40d1e1cc-f9c5-492c-884e-e04a42c013ef stopping"
time="2018-11-06T08:23:24Z" level=info msg="launcher: Longhorn Engine has been shutdown"
time="2018-11-06T08:23:24Z" level=warning msg="launcher: Engine returns exit status 1"
time="2018-11-06T08:23:24Z" level=info msg="launcher: Longhorn Engine Launcher has been shutdown"
time="2018-11-06T08:23:24Z" level=warning msg="launcher: Receive rpc shutdown: <nil>"

For information, open-iscsi is already installed:

sudo apt install --no-install-recommends open-iscsi
Reading package lists... Done
Building dependency tree
Reading state information... Done
open-iscsi is already the newest version (2.0.874-3~deb9u1).
0 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.
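A side note on the iscsiadm errors above: they complain about `/run/lock/iscsi`, which suggests the lock directory is missing on the node rather than open-iscsi itself being broken. A quick check (my own sketch):

```shell
# The engine log shows:
#   iscsiadm: Could not open /run/lock/iscsi: No such file or directory
# so verify the lock directory exists on the node; on some RancherOS
# builds it is absent by default.
if [ -d /run/lock ]; then
    echo "/run/lock exists"
else
    echo "/run/lock missing"
fi
```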

#9

I created the following directory and now it works!

mkdir /run/lock

This issue discusses the problem: https://github.com/rancher/os/issues/2435
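One caveat (my own note, not from the linked issue): `/run` is typically a tmpfs, so a manually created `/run/lock` disappears on reboot and the fix has to be reapplied at boot, e.g. from a boot-time script. An idempotent version of the command:

```shell
# Recreate the lock directory iscsiadm needs. The -p flag makes this
# safe to re-run (no error if the directory already exists), so it can
# go in a boot-time script. Run as root on the node.
mkdir -p /run/lock
ls -ld /run/lock
```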

The logs of the pod:

time="2018-11-06T08:26:20Z" level=info msg="launcher: controller 0d1d07e7-15fc-4eb8-b050-c793dbe10ab3 started"
time="2018-11-06T08:26:20Z" level=info msg="Starting with replicas [\"tcp://10.42.0.236:9502\" \"tcp://10.42.1.251:9502\" \"tcp://10.42.0.237:9502\"]"
time="2018-11-06T08:26:20Z" level=info msg="Connecting to remote: 10.42.0.236:9502"
time="2018-11-06T08:26:20Z" level=info msg="Opening: 10.42.0.236:9502"
time="2018-11-06T08:26:20Z" level=info msg="Adding backend: tcp://10.42.0.236:9502"
time="2018-11-06T08:26:20Z" level=info msg="Set replica tcp://10.42.0.236:9502 to mode RW"
time="2018-11-06T08:26:20Z" level=info msg="Connecting to remote: 10.42.1.251:9502"
time="2018-11-06T08:26:20Z" level=info msg="Start monitoring tcp://10.42.0.236:9502"
time="2018-11-06T08:26:20Z" level=info msg="Opening: 10.42.1.251:9502"
time="2018-11-06T08:26:20Z" level=info msg="Adding backend: tcp://10.42.1.251:9502"
time="2018-11-06T08:26:20Z" level=info msg="Set replica tcp://10.42.1.251:9502 to mode RW"
time="2018-11-06T08:26:20Z" level=info msg="Connecting to remote: 10.42.0.237:9502"
time="2018-11-06T08:26:20Z" level=info msg="Start monitoring tcp://10.42.1.251:9502"
time="2018-11-06T08:26:20Z" level=info msg="Opening: 10.42.0.237:9502"
time="2018-11-06T08:26:20Z" level=info msg="Adding backend: tcp://10.42.0.237:9502"
time="2018-11-06T08:26:20Z" level=info msg="Set replica tcp://10.42.0.237:9502 to mode RW"
time="2018-11-06T08:26:20Z" level=info msg="Start monitoring tcp://10.42.0.237:9502"
time="2018-11-06T08:26:20Z" level=info msg="Get backend tcp://10.42.0.236:9502 revision counter 0"
time="2018-11-06T08:26:20Z" level=info msg="Get backend tcp://10.42.1.251:9502 revision counter 0"
time="2018-11-06T08:26:20Z" level=info msg="Get backend tcp://10.42.0.237:9502 revision counter 0"
time="2018-11-06T08:26:23Z" level=info msg="New data socket connnection established"
time="2018-11-06T08:26:24Z" level=info msg="Creating device /dev/longhorn/pvc-72af745f-e113-11e8-a199-0cc47a9fd474 8:32"
time="2018-11-06T08:26:24Z" level=info msg="launcher: Device /dev/longhorn/pvc-72af745f-e113-11e8-a199-0cc47a9fd474 is ready"
time="2018-11-06T08:26:24Z" level=info msg="launcher: SCSI device /dev/sdc created"
time="2018-11-06T08:26:24Z" level=info msg="Listening on 0.0.0.0:9501"

Thanks for your help, @yasker!