Longhorn dropping pods

Kimba · November 3, 2021, 3:58pm

I’m testing Longhorn on a cluster of 3 nodes (3 masters) installed on AWS (m5.4xlarge instances).
Currently, I am encountering 2 types of errors.

1. Pods restarting: Longhorn keeps dropping pods, so Kubernetes restarts them.
2. Scheduling failure:

		2021-11-03 15:46:51	time="2021-11-03T15:46:51Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=ip-172-16-61-98 owner=ip-172-16-61-98 replica=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc shareEndpoint="nfs://10.43.81.243/pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa" shareState=running state=attached volume=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa
		2021-11-03 15:46:51	time="2021-11-03T15:46:51Z" level=error msg="There's no available disk for replica pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc, size 96636764160"
		2021-11-03 15:46:51	time="2021-11-03T15:46:51Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=ip-172-16-61-98 owner=ip-172-16-61-98 replica=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc shareEndpoint="nfs://10.43.81.243/pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa" shareState=running state=attached volume=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa
		2021-11-03 15:46:51	time="2021-11-03T15:46:51Z" level=error msg="There's no available disk for replica pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc, size 96636764160"
		2021-11-03 15:46:46	time="2021-11-03T15:46:46Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=ip-172-16-61-98 owner=ip-172-16-61-98 replica=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc shareEndpoint="nfs://10.43.81.243/pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa" shareState=running state=attached volume=pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa
		2021-11-03 15:46:46	time="2021-11-03T15:46:46Z" level=error msg="There's no available disk for replica pvc-440dd3ae-553d-4bd6-91b7-cf4497ab40fa-r-a28b83cc, size 96636764160"
		2021-11-03 15:46:46	time="2021-11-03T15:46:46Z" level=error msg="unable to schedule replica" accessMode=rwx controller=longhorn-volume

Any suggestions?

Thanks

JenTing · November 3, 2021, 4:18pm

The error log says "There's no available disk for replica", which means Longhorn could not find a disk to schedule the replica onto. In other words, no disk has enough schedulable space for this replica.
Could you please check the disk usage in the Longhorn UI?
Then try expanding the host-level storage.

Kimba · November 3, 2021, 5:17pm

Thanks for the reply.
I use an LVM volume that spans an entire EBS volume.
The LVM volume is 400 GB and the PVC is 90 GB.
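
As a quick sanity check, the size reported in the scheduling error matches the 90 GB PVC when the byte count is read in binary units (GiB). A minimal sketch in Python; the helper name `bytes_to_gib` is only for illustration:

```python
GIB = 1024 ** 3  # bytes per gibibyte


def bytes_to_gib(size_bytes: int) -> float:
    """Convert a raw byte count to GiB."""
    return size_bytes / GIB


# "size" value copied from the "no available disk for replica" error above
replica_size = 96636764160
print(bytes_to_gib(replica_size))  # 90.0
```

So each replica of this volume needs 90 GiB of schedulable space on whichever disk it lands on.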

JenTing · November 4, 2021, 12:51am

You could go to the Node page, which shows the Used field.
After that, click Operation → Edit node and disks to see why the replica failed to schedule.
You could check the doc on how to tune the Storage Over Provisioning Percentage (per-system) or Storage Reserved (per-node).
You could check the GitHub discussion as well.
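
To illustrate how those two settings interact, here is a rough sketch of a disk capacity check. It is a simplified model, not Longhorn's actual code: the `Disk` class, the `can_schedule` function, and every number except the 400 GB disk and the 90 GB replica are assumptions made up for this example, and the exact rule may differ between Longhorn versions.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Per-disk figures, all in GiB (hypothetical model for illustration)."""
    maximum: int    # total disk size
    reserved: int   # space kept back from Longhorn (Storage Reserved)
    scheduled: int  # sum of replica sizes already placed on this disk
    available: int  # free space actually left on the disk


def can_schedule(disk: Disk, replica_size: int,
                 over_provisioning_pct: int,
                 minimal_available_pct: int) -> bool:
    """Simplified check: is there room to place one more replica?"""
    # The disk must still have more free space than the minimal threshold.
    enough_free = disk.available > disk.maximum * minimal_available_pct / 100
    # Scheduled replicas plus the new one must fit within the
    # over-provisioned, non-reserved capacity.
    limit = (disk.maximum - disk.reserved) * over_provisioning_pct / 100
    fits = disk.scheduled + replica_size <= limit
    return enough_free and fits


# Illustrative values only: 400 GiB disk, 120 GiB reserved.
empty = Disk(maximum=400, reserved=120, scheduled=0, available=380)
busy = Disk(maximum=400, reserved=120, scheduled=270, available=380)

print(can_schedule(empty, 90, over_provisioning_pct=100, minimal_available_pct=25))  # True
print(can_schedule(busy, 90, over_provisioning_pct=100, minimal_available_pct=25))   # False
print(can_schedule(busy, 90, over_provisioning_pct=200, minimal_available_pct=25))   # True
```

Under this model, a 400 GiB disk can still reject a 90 GiB replica if other replicas are already scheduled on it or if a large share of it is reserved. Raising the over-provisioning percentage or lowering the reserved space both increase the limit.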
