No way to recover from EBS-related errors

trondhindenes · March 2, 2017, 8:45am

We have had some problems with containers using EBS-backed volumes. It seems to me that there’s no way to force Rancher to “stop trying” in case of volume mount errors or similar: scaling the service down to 0 has no effect, and everything stays stuck in the “starting” state, even if I go to the AWS console and force-detach the volume.

For EBS-backed volumes to be production ready, we need more break-glass options in Rancher, such as:

Force stop container
Force check if the volume is actually mounted
Force detach volume
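
For reference, the last two items can at least be done by hand from the AWS side. Below is a minimal sketch, assuming Python with boto3 and an EC2 client passed in by the caller. The function names are hypothetical, and this only changes what AWS reports; it does not touch whatever Rancher believes about the volume, so it is not a fix for the stuck “starting” state.

```python
def attachment_states(ec2, volume_id):
    """Return (instance_id, state) pairs for a volume, as AWS reports them."""
    resp = ec2.describe_volumes(VolumeIds=[volume_id])
    volume = resp["Volumes"][0]
    return [(a["InstanceId"], a["State"]) for a in volume.get("Attachments", [])]


def force_detach(ec2, volume_id):
    """Force-detach a volume if AWS reports any attachment.

    Returns True if a detach was requested, False if the volume
    already had no attachments.
    """
    if not attachment_states(ec2, volume_id):
        return False  # nothing attached according to AWS
    # Force=True can lose data if the instance is still writing to the volume.
    ec2.detach_volume(VolumeId=volume_id, Force=True)
    return True
```

With a real client this would be called as `force_detach(boto3.client("ec2"), "vol-...")`, where the volume ID is whatever the stuck container depends on. Detaching is asynchronous, so check `attachment_states` again afterwards before assuming the volume is free.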

We’ve had multiple situations where the only thing we could do was to bring down the entire Rancher environment, detach all volumes in AWS, and then start it up again. That is “not fun” in dev/test, and completely unacceptable in prod.

trondhindenes · March 2, 2017, 6:21pm

Just to add to this: it may also happen that there’s a mismatch between the volumes Rancher thinks are mapped and the volumes that actually are mapped. In these cases, there’s no way to “force disconnect” a volume or “force check” its mapping status. It all seems extremely brittle to me.
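
To illustrate what a “force check” could look like, here is a rough sketch that compares an expected mapping against what AWS reports. It assumes Python with boto3 and an EC2 client passed in by the caller. The `expected` dictionary (volume ID to the instance ID Rancher believes it is attached to, or `None` for detached) is an assumption and has to be filled in by hand from the Rancher side, and `find_mismatches` is a hypothetical name.

```python
def find_mismatches(ec2, expected):
    """Compare expected volume attachments with what AWS reports.

    expected: dict of volume ID -> instance ID (or None if believed detached).
    Returns a dict of volume ID -> (expected_instance, actual_instance)
    for every volume where the two disagree.
    """
    resp = ec2.describe_volumes(VolumeIds=list(expected))
    mismatches = {}
    for volume in resp["Volumes"]:
        volume_id = volume["VolumeId"]
        # Treat anything other than a fully detached attachment as "in use".
        active = [
            a["InstanceId"]
            for a in volume.get("Attachments", [])
            if a["State"] in ("attaching", "attached", "detaching")
        ]
        actual = active[0] if active else None
        if actual != expected[volume_id]:
            mismatches[volume_id] = (expected[volume_id], actual)
    return mismatches
```

An empty result means AWS agrees with the expected mapping. Anything else lists the volumes where the two views have drifted apart.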

I’m also seeing situations where Rancher “knows” the volume is detached, but it still doesn’t seem to make any attempt to mount it before starting the container that depends on it.
