Scheduler losing track of memory reservations?

I think I’ve found a bug; it would be good if someone else could confirm.

A bit of detail on my setup:
2 x machines running Rancher 1.4.1 on Ubuntu 16.04 with Docker 1.12.6 in a MySQL HA setup
2 x hosts running RancherOS 0.8.0 with 4GB (3.85GiB) memory

If I create a stack and add a service with a memory reservation of 1000MB, I can scale that service to 6 containers, which are created successfully (3 per host). If I scale it to 7 I get the “Allocation failed: Error reserving resources” error, as expected. But if I then scale the service down to 1 and try to scale it back up, I can’t get beyond 1 without hitting the same “Allocation failed: Error reserving resources” error. Deleting the scheduler container (so that it gets recreated) seems to resolve it and the containers are then created. That creates problems of its own though, because I can then scale above 6 (I’ve managed 11!).
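For reference, the service is defined with roughly the following compose file (a sketch: the image and service name are made up, and I’m assuming the reservation maps to the `mem_reservation` key, in bytes, in docker-compose.yml):

```yaml
# docker-compose.yml (sketch)
web:
  image: nginx:alpine
  mem_reservation: 1048576000   # ~1000MB; the scheduler counts this against each host's memory
```

With two 4GB hosts that gives the 6-container ceiling described above.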

I can’t see a way of finding out how much memory Rancher thinks has been reserved on a host (other than looking at the scheduler logs, which don’t log it often enough to be helpful). Has anyone else seen this? Am I missing something simple? I stumbled upon some comments on GitHub that mentioned releasing resources, but I assume that’s something Rancher takes care of itself?

Thanks for the feedback on the feature, @lukehtims!

I believe the problem is that resources are not being released when you expect them to be.
In short, we do a two-phase removal: the container gets removed, and then some number of seconds later it gets purged. The resources are only released on purge.

We want to fix this by making the purge happen immediately after the remove. I’m not sure how soon we’ll get that in, but I’ll update you when we do.

I’ll open a github issue and link to it from here.

I assumed that was the case, and leaving some time between scaling down and scaling back up does seem to help. The problem was that I was creating a stack from compose files with memory reservations and a scale that put me right at the limit. Setting the scale lower initially and then scaling up after a short delay seems to work.
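For anyone else who runs into this, the workaround looks roughly like the following (a sketch: the service name is made up, and I’m assuming the `scale` key in rancher-compose.yml; the reservation itself stays in docker-compose.yml as above):

```yaml
# rancher-compose.yml (sketch): start below the scheduler's ceiling...
web:
  scale: 4   # comfortably under the 6 that fit on two 4GB hosts with 1000MB reservations
# ...then bump the scale in small steps, leaving a gap between steps so the removed
# containers get purged and their reservations released before the next increase
```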

It would be nice if I could just set a maximum number of containers per host for a given service :wink: