Longhorn dropping pods

I’m testing Longhorn on a cluster of 3 nodes (3 masters) installed on AWS (m5.4xlarge instances).
Currently, I am encountering 2 types of errors.

  1. Application pods restarting: Longhorn keeps dropping pods, so Kubernetes has to restart them.

  2. Scheduling failure:

2021-11-03 15:46:51 time="2021-11-03T15:46:51Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=ip-172-16-61-98 owner=ip-172-16-61-98 replica=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc shareEndpoint="nfs://10.43.81.243/pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa" shareState=running state=attached volume=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa
2021-11-03 15:46:51 time="2021-11-03T15:46:51Z" level=error msg="There's no available disk for replica pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc, size 96636764160"
2021-11-03 15:46:51 time="2021-11-03T15:46:51Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=ip-172-16-61-98 owner=ip-172-16-61-98 replica=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc shareEndpoint="nfs://10.43.81.243/pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa" shareState=running state=attached volume=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa
2021-11-03 15:46:51 time="2021-11-03T15:46:51Z" level=error msg="There's no available disk for replica pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc, size 96636764160"
2021-11-03 15:46:46 time="2021-11-03T15:46:46Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=ip-172-16-61-98 owner=ip-172-16-61-98 replica=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc shareEndpoint="nfs://10.43.81.243/pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa" shareState=running state=attached volume=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa
2021-11-03 15:46:46 time="2021-11-03T15:46:46Z" level=error msg="There's no available disk for replica pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc, size 96636764160"
2021-11-03 15:46:46 time="2021-11-03T15:46:46Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume

Any suggestions?

Thanks

The error message "There's no available disk for replica" means the scheduler could not find a disk to place the replica on, which usually means that no disk has enough schedulable space for a replica of this size.
Could you please check the disk usage in the Longhorn UI?
If the disks are full, try expanding the host-level storage.

Thanks for the reply.
I use LVM on an entire EBS volume.
The LVM volume is 400 GB and the PVC is 90 GB.

You can go to the Node page, which shows the Used field for each node.
From there, click Operation → Edit node and disks to see why the replica could not be scheduled.
The documentation explains how to tune the Storage Over Provisioning Percentage (per-system) and Storage Reserved (per-node) settings.
The GitHub discussion may help as well.
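
To make the effect of those two settings more concrete, here is a rough Python sketch of the kind of check the scheduler performs before placing a replica on a disk. It is based on my reading of the Longhorn documentation, not on the actual source code, so treat the formula as an approximation. The function name, the parameter names, and all disk figures in the example (reserved space, already scheduled space, percentages) are hypothetical; only the replica size is taken from the log above, where 96636764160 bytes is exactly 90 GiB.

```python
GIB = 2**30  # bytes in one GiB


def can_schedule_replica(replica_size, storage_maximum, storage_reserved,
                         storage_scheduled, storage_available,
                         over_provisioning_pct, minimal_available_pct):
    """Approximate per-disk scheduling check (assumed formula). Sizes in bytes."""
    # Space that may be promised to replicas on this disk:
    # (maximum - reserved) scaled by the over-provisioning percentage.
    schedulable_limit = (storage_maximum - storage_reserved) * over_provisioning_pct // 100
    fits_provisioning = storage_scheduled + replica_size <= schedulable_limit
    # The disk must also keep a minimal share of its capacity actually free.
    has_free_space = storage_available * 100 > storage_maximum * minimal_available_pct
    return fits_provisioning and has_free_space


replica_size = 96636764160  # size reported in the error log
print(replica_size / GIB)   # 90.0

# Hypothetical disk: 400 GiB total, 120 GiB reserved, 300 GiB currently free.
disk = dict(storage_maximum=400 * GIB, storage_reserved=120 * GIB,
            storage_available=300 * GIB,
            over_provisioning_pct=200, minimal_available_pct=25)

# With 100 GiB already scheduled, 100 + 90 <= 560 GiB, so the replica fits.
print(can_schedule_replica(replica_size, storage_scheduled=100 * GIB, **disk))

# With 500 GiB already scheduled, 500 + 90 > 560 GiB, so scheduling is refused.
print(can_schedule_replica(replica_size, storage_scheduled=500 * GIB, **disk))
```

The point of the sketch is that a 90 GiB replica can be refused on a 400 GB disk even when the disk looks mostly empty, if the reserved space is large or other replicas have already been scheduled onto it. Comparing the real values shown under Edit node and disks against the two settings should show which condition is failing.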