How to schedule containers with failure domains [answered]

Is there a way to schedule containers so that no two containers in a service are within the same failure domain?

For example, I have 3 physical hosts: A, B, and C. Each host has 2 VMs (A1, A2, B1, etc.) that are added as Rancher hosts.

Can I create a service with 3 containers and a schedule that guarantees that no two containers are located on VMs running on the same host (i.e. A1, B2, C1)?

What do you mean by the same failure domain?

If the stacks that you are calling A1 and A2 provide the same service, you can check the cluster options offered by Rancher:

Another solution is to use the Rancher API to activate or deactivate stacks according to your own scheduler application and the operations you want to perform.

What I mean is that if I create a service with 3 containers, and two of them happen to be on VMs that reside on the same physical host (A1 & A2), that host is a single point of failure that can take down 2/3 of the containers. The same can be said of network switches, server chassis where multiple servers share the same power and control plane, or server racks that share the same power circuit. It would be great if there were a way to spread the containers in a service across multiple failure domains, so that no single hardware failure can take down a large portion of the containers.
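To put a number on that, here is a small Python sketch (not anything Rancher provides) that reports which failure domain would take out the largest share of a service. The function name `worst_case_loss` and the VM-to-host mapping are made up for illustration, using the A/B/C example above.

```python
from collections import Counter


def worst_case_loss(placement, domain_of):
    """Return (domain, fraction) for the failure domain whose loss
    removes the largest share of the placed containers.

    placement: list of VM names, one entry per container.
    domain_of: dict mapping VM name -> failure domain (here, the physical host).
    """
    counts = Counter(domain_of[vm] for vm in placement)
    domain, lost = counts.most_common(1)[0]
    return domain, lost / len(placement)


# Hypothetical mapping for the example: 3 physical hosts, 2 VMs each.
domain_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C", "C2": "C"}

bad = worst_case_loss(["A1", "A2", "B1"], domain_of)   # two containers share host A
good = worst_case_loss(["A1", "B2", "C1"], domain_of)  # one container per host
print(bad, good)
```

With the first placement, losing host A removes two of the three containers; with the second, no single host removes more than one.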

I think that io.rancher.scheduler.affinity:host_label or io.rancher.scheduler.affinity:host_label_soft is the thing you want.

On your VMs A1 & A2, define the same host label and value, then set the scheduler to run 1 instance on each host with label XY, so you can guarantee that only 1 container spins up per physical host.

That's my understanding of the scheduler affinity. I haven't tried it myself, but I think it is worth trying out.

I think the host affinity is the area that is most relevant, but that still wouldn’t solve the problem. My example was a very basic one.

Let's say I have 5 server chassis, and each one has 4 physical servers in it, for 20 hosts total. I only want 3 containers in the service, but I want each container to be located on a server in a different chassis. Using your method, I would get 5 containers.

I think what is needed is a “unique” condition. On each of the physical servers in a chassis, I can add the same label for chassis=number. Then in the scheduling, I say that the values of the server label “chassis” should/must be unique for all containers in the service.

This way you can have as many failure domains as you want (server, switch, rack, room, datacenter) and you add the appropriate tags to each server, and you can spread out the service containers into many different failure domains.
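To make the "unique" idea concrete, here is a rough Python sketch of the selection logic I have in mind. This is not a Rancher feature or API; the function `spread`, the host names, and the label layout are all hypothetical, and a real scheduler would also need to weigh load, soft vs. hard rules, and so on.

```python
def spread(hosts, label, scale):
    """Pick `scale` hosts so that no two share the same value of `label`.

    hosts: dict mapping host name -> dict of host labels.
    Hosts without the label are skipped.
    Raises ValueError if fewer than `scale` distinct label values exist.
    """
    chosen = {}  # label value -> host picked for that value
    for name in sorted(hosts):
        value = hosts[name].get(label)
        if value is None or value in chosen:
            continue  # unlabeled host, or this failure domain is already used
        chosen[value] = name
        if len(chosen) == scale:
            return sorted(chosen.values())
    raise ValueError(f"fewer than {scale} distinct values for label {label!r}")


# Hypothetical inventory: 5 chassis with 4 servers each, 20 hosts total.
hosts = {
    f"chassis{c}-server{s}": {"chassis": str(c)}
    for c in range(1, 6)
    for s in range(1, 5)
}

picked = spread(hosts, "chassis", 3)
print(picked)
```

The same function works for any other failure domain (switch, rack, room, datacenter) as long as every host carries the corresponding label.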

AFAIK you can use the scale parameter in the docker-compose file and set it to 3, so Rancher will bring up only 3 containers on hosts that meet your scheduler affinity rules.

And because of scale plus scheduler affinity, even if 2 containers end up on the same host, as soon as that host is no longer reachable Rancher should automatically spin up the containers on available hosts that meet the criteria. That's my understanding.

You could try the following:

scale: 3
io.rancher.scheduler.affinity:host_label_soft: rack=rack_1,rack=rack_2,rack=rack_3,rack=rack_4,rack=rack_5

I believe this issue covers what you’re looking for:

Not implemented yet.

Thank you. That is exactly what I am looking for. I guess I need to wait for Rancher 2.0 to get it.