Unable to attach Longhorn storage to single-node Rancher k8s cluster

Hi. I’ve just installed a single-node Rancher Kubernetes cluster to experiment with Longhorn. The k8s cluster was easy enough to set up, but I cannot seem to get Longhorn to cooperate, so I thought I’d reach out here to see if I’m doing something obviously wrong.

I first went ahead and created a new storage class named longhorn and a PV named longhorn-volv-pv:
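Roughly, the definitions were along these lines (a sketch from memory; the exact provisioner and parameters may have differed, and the driver name below is just the standard Longhorn CSI one, not necessarily what I used):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io   # assumption: the standard Longhorn CSI driver name
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: longhorn-volv-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn
  csi:
    driver: driver.longhorn.io    # assumption: points at the Longhorn CSI driver
    volumeHandle: longhorn-volv-pv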

And when I create a PVC with the following definition, it successfully binds to the PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
  volumeName: longhorn-volv-pv
$ kubectl apply -f pvc.yaml 
persistentvolumeclaim/longhorn-volv-pvc created
$ kubectl get pvc,pv
NAME                                      STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/longhorn-volv-pvc   Bound    longhorn-volv-pv   10Gi       RWO            longhorn       27s

NAME                                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
persistentvolume/longhorn-volv-pv   10Gi       RWO            Retain           Bound    default/longhorn-volv-pvc   longhorn                100s

Then I create a pod with the following definition, expecting it to mount my volume. It seems to get scheduled properly, but I guess my node doesn’t know how to attach the storage…

apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
$ kubectl apply -f pod.yaml 
pod/volume-test created
$ kubectl describe pod volume-test 
Name:         volume-test
Namespace:    default
Priority:     0
Node:         server/192.168.3.1
Start Time:   Sat, 26 Sep 2020 21:26:50 +0200
Labels:       <none>
Annotations:  kubernetes.io/psp: default-psp
Status:       Pending
IP:           
IPs:          <none>
Containers:
  volume-test:
    Container ID:   
    Image:          nginx:stable-alpine
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from volv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sx6pr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  volv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  longhorn-volv-pvc
    ReadOnly:   false
  default-token-sx6pr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sx6pr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/volume-test to server

After a short while it times out, giving me the following events:

Events:
  Type     Reason              Age        From                     Message
  ----     ------              ----       ----                     -------
  Normal   Scheduled           <unknown>  default-scheduler        Successfully assigned default/volume-test to server
  Warning  FailedAttachVolume  15s        attachdetach-controller  AttachVolume.Attach failed for volume "longhorn-volv-pv" : attachdetachment timeout for volume longhorn-volv-pv
  Warning  FailedMount         12s        kubelet, server          Unable to attach or mount volumes: unmounted volumes=[volv], unattached volumes=[default-token-sx6pr volv]: timed out waiting for the condition

Hi @lindhe

Since you’ve already set the replica count to 1, the reason it cannot get attached must be something else. My guess is that iscsid on the node wasn’t configured correctly. Can you take a look at the longhorn-manager log to see if there is any error?
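To fetch that log, something along these lines should work (the namespace is an assumption; Rancher may have put Longhorn in a project-suffixed one):

$ kubectl get pods --all-namespaces | grep longhorn-manager
$ kubectl logs -n longhorn-system -l app=longhorn-manager --tail=200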

Thanks a lot for the suggestion! I cannot find any pod called longhorn-manager. I assumed Rancher came with Longhorn support out-of-the-box, but perhaps that’s where I got it wrong?

I followed the Rancher Quick Start Guide to set up my Rancher instance and created the cluster from the web GUI. I’m getting the feeling that’s not enough. Do you think I also have to do one or both of these?

EDIT: I just verified that iscsid is indeed running on my host machine, so it shouldn’t be the first guide, I suppose. But perhaps I need to install Longhorn.
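For reference, this is roughly how I checked (assuming a systemd-based host):

$ systemctl status iscsid              # shows "active (running)" on my machine
$ sudo systemctl enable --now iscsid   # what I would run if it weren't running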

@lindhe Longhorn is not installed with Rancher by default. You need to go to the Apps and install it from there.

Also, you need to delete the storage class longhorn and the PV you created before installing Longhorn, since they will cause a conflict during installation. You can create storage classes with other names for different parameters, though.
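Something like this should clean them up (make sure nothing is still using the PVC first):

$ kubectl delete pod volume-test
$ kubectl delete pvc longhorn-volv-pvc
$ kubectl delete pv longhorn-volv-pv
$ kubectl delete storageclass longhorn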

Alright, cool. I assume I should set “System” as the “Target Project” when installing, rather than “Default”?

Now I’ve got the Longhorn app installed, and it did indeed create a new StorageClass named longhorn. I see that it has set numberOfReplicas=3, which I guess could be problematic in a single-node setup like mine. But perhaps that’s just the default value, and it can/should be set to 1 on the volume once provisioned?
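If I’m reading the docs right, I could presumably also create a second StorageClass that pins a single replica, roughly like this (a sketch, untested on my side; the name is made up and the provisioner is the standard Longhorn CSI one):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-single-replica   # hypothetical name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"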

Anyhow, now I’m trying to create and attach a volume. And this time I do have a longhorn-manager pod!!

I started off trying a PVC + pod similar to my first example, but had no immediate success. So I ventured into the Longhorn app’s dashboard and tried to configure nodes, disks, and volumes appropriately. I set up the node and disk like this (note that I removed the default disk that was created):

It seems promising, because the node/disk has its status and readiness reported as “Schedulable” and “Ready”, respectively. However, I’m still stuck. I head into the “Volume” tab in the Longhorn dashboard and create a volume there. I would attach a screenshot, but since I’m using a new account here I’m limited to one per post. This is not a PV, right, but some sort of “Longhorn volume” that sits alongside the Kubernetes concepts?
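Poking around, these do seem to be Longhorn’s own CRD objects rather than PVs; something like this lists them (the namespace is taken from my logs below):

$ kubectl get volumes.longhorn.io -n longhorn-system-p-gvcqq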

I now try to attach the volume directly via the Longhorn dashboard, but it just gets stuck trying and failing to attach to my host.

I’m including here what I believe to be the relevant parts of the longhorn-manager logs, but they don’t tell me much more than that it fails to attach the volume.

time="2020-10-03T10:04:52Z" level=debug msg="Created volume test-volume: {Size:21474836480 Frontend:blockdev FromBackup: NumberOfReplicas:1 StaleReplicaTimeout:20 NodeID: EngineImage:longhornio/longhorn-engine:v1.0.2 RecurringJobs:[] BaseImage: Standby:false DiskSelector:[] NodeSelector:[] DisableFrontend:false}"
172.17.0.1 - - [03/Oct/2020:10:04:52 +0000] "POST /v1/volumes HTTP/1.1" 200 883 "https://server.lindhe.io:8443/k8s/clusters/c-tflxd/api/v1/namespaces/longhorn-system-p-gvcqq/services/http:longhorn-frontend:80/proxy/volume" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0"
time="2020-10-03T10:04:52Z" level=debug msg="Volume controller server picked up test-volume"
time="2020-10-03T10:04:52Z" level=debug msg="Schedule replica test-volume-r-b80320a8 to node server, disk disk-1, datapath /storage/share/rancher/longhorn/replicas/test-volume-0b03010b"
time="2020-10-03T10:04:52Z" level=debug msg="Engine controller server picked up test-volume-e-226a3fcf"
time="2020-10-03T10:04:52Z" level=debug msg="Instance handler updated instance test-volume-e-226a3fcf state, old state , new state stopped"
time="2020-10-03T10:04:52Z" level=debug msg="Replica controller server picked up test-volume-r-b80320a8"
time="2020-10-03T10:04:52Z" level=debug msg="Instance handler updated instance test-volume-r-b80320a8 state, old state , new state stopped"
time="2020-10-03T10:04:52Z" level=debug msg="Requeue longhorn-system-p-gvcqq/test-volume due to error <nil> or Operation cannot be fulfilled on engines.longhorn.io \"test-volume-e-226a3fcf\": the object has been modified; please apply your changes to the latest version and try again"
time="2020-10-03T10:04:52Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system-p-gvcqq\", Name:\"test-volume\", UID:\"831d4458-629d-4229-8bcd-1117221789a0\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"4613386\", FieldPath:\"\"}): type: 'Normal' reason: 'Detached' volume test-volume has been detached"
time="2020-10-03T10:06:27Z" level=debug msg="Prepare to create instance test-volume-r-b80320a8"
time="2020-10-03T10:06:27Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system-p-gvcqq\", Name:\"test-volume-r-b80320a8\", UID:\"4ff6d5af-2c86-4fa6-a471-0689e8c57ee2\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"4613676\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts test-volume-r-b80320a8"
time="2020-10-03T10:06:27Z" level=debug msg="Attaching volume test-volume to server with disableFrontend set false"
172.17.0.1 - - [03/Oct/2020:10:06:27 +0000] "POST /v1/volumes/test-volume?action=attach HTTP/1.1" 200 1977 "https://server.lindhe.io:8443/k8s/clusters/c-tflxd/api/v1/namespaces/longhorn-system-p-gvcqq/services/http:longhorn-frontend:80/proxy/volume" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0"
time="2020-10-03T10:06:28Z" level=debug msg="Instance process test-volume-r-b80320a8 had been created, need to wait for instance manager update"
time="2020-10-03T10:06:28Z" level=warning msg="Instance test-volume-r-b80320a8 is state error, error message: exit status 1"
time="2020-10-03T10:06:28Z" level=debug msg="Instance handler updated instance test-volume-r-b80320a8 state, old state stopped, new state error"
time="2020-10-03T10:06:28Z" level=warning msg="Instance test-volume-r-b80320a8 crashed on Instance Manager instance-manager-r-6707467b at server, try to get log"
time="2020-10-03T10:06:28Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=info msg=\"Creating volume /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b, size 21474836480/512\""
time="2020-10-03T10:06:28Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=fatal msg=\"Error running start replica command: open /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b/revision.counter: invalid argument\""
time="2020-10-03T10:06:28Z" level=warning msg="Instance test-volume-r-b80320a8 is state error, error message: exit status 1"
time="2020-10-03T10:06:28Z" level=warning msg="Instance test-volume-r-b80320a8 crashed on Instance Manager instance-manager-r-6707467b at server, try to get log"
time="2020-10-03T10:06:28Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=info msg=\"Creating volume /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b, size 21474836480/512\""
time="2020-10-03T10:06:28Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=fatal msg=\"Error running start replica command: open /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b/revision.counter: invalid argument\""
time="2020-10-03T10:06:54Z" level=warning msg="Instance test-volume-r-b80320a8 is state error, error message: exit status 1"
time="2020-10-03T10:06:54Z" level=warning msg="Instance test-volume-r-b80320a8 crashed on Instance Manager instance-manager-r-6707467b at server, try to get log"
time="2020-10-03T10:06:54Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=info msg=\"Creating volume /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b, size 21474836480/512\""
time="2020-10-03T10:06:54Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=fatal msg=\"Error running start replica command: open /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b/revision.counter: invalid argument\""
time="2020-10-03T10:06:54Z" level=warning msg="Instance test-volume-r-b80320a8 is state error, error message: exit status 1"
time="2020-10-03T10:06:54Z" level=warning msg="Instance test-volume-r-b80320a8 crashed on Instance Manager instance-manager-r-6707467b at server, try to get log"
time="2020-10-03T10:06:54Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=info msg=\"Creating volume /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b, size 21474836480/512\""
time="2020-10-03T10:06:54Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=fatal msg=\"Error running start replica command: open /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b/revision.counter: invalid argument\""

@lindhe The error message is

time="2020-10-03T10:06:54Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=info msg=\"Creating volume /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b, size 21474836480/512\""
time="2020-10-03T10:06:54Z" level=warning msg="test-volume-r-b80320a8: time=\"2020-10-03T10:06:27Z\" level=fatal msg=\"Error running start replica command: open /host/storage/share/rancher/longhorn/replicas/test-volume-0b03010b/revision.counter: invalid argument\""

It seems similar to https://github.com/longhorn/longhorn/issues/533. Longhorn doesn’t support filesystems that don’t implement sparse files, e.g. ZFS. Which filesystem are you using on the node? We recommend ext4; xfs is also fine.
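You can check what the replica path is sitting on with something like:

$ df -T /storage/share/rancher/longhorn
$ stat -f -c %T /storage/share/rancher/longhorn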

Oh, well, there’s the snag! I run ZFS, so that’s that as far as Longhorn goes for me this time. I’ll perhaps plug in an external drive to use instead, but more likely I’ll just go with a local volume driver.

Thank you so much for the help; I would not have figured it out by myself. Or at least not as quickly.