(This is a discussion started offline with Will @ Rancher, moved to the forum at his suggestion.)
It is fairly common in larger deployments to use none of the “standard” networking options, whether Docker bridge or Rancher’s network. Instead, many of us have our own networking designs that fit our use cases.
jpetazzo’s pipework was built to handle these cases.
In our case, we do the following upon launching a container (a rough sketch with raw ip commands follows this list):
Create a macvlan for a particular physical interface
Assign the macvlan to the container via netns
Find a fixed IP from a service (DNS, consul, etcd, zookeeper, etc.)
Assign the IP and routes to the container (using nsenter or ip netns exec)
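A rough sketch of those four steps with raw ip and nsenter commands (hedged: eth0, the container name, and the 10.0.0.0/24 addresses are placeholders; the real values come from the IP service):
pid=$(docker inspect -f '{{.State.Pid}}' mycontainer)   # PID gives us the container netns
ip link add mv0 link eth0 type macvlan mode bridge      # 1. macvlan on the physical interface
ip link set mv0 netns "$pid"                            # 2. move it into the container netns
nsenter -t "$pid" -n ip addr add 10.0.0.25/24 dev mv0   # 4. assign the IP fetched from DNS/consul/etcd
nsenter -t "$pid" -n ip link set mv0 up
nsenter -t "$pid" -n ip route replace default via 10.0.0.1 dev mv0   # ...and the routes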
How do I launch this configuration from within Rancher? Sure, if I am launching a container directly, I can write a script that will launch the container, create the macvlan, get the IP, hostname and routes, and run the commands… but the idea is to manage them all from Rancher.
Separately, how does libnetwork in 1.7 or 1.8 help? I know that libnetwork is structured to run ENTRYPOINT only after the network is set up, which is great, but I still don’t grasp how I will run my network setup even in a libnetwork structure.
@deitch The answer is a very well thought out “I don’t know.” There are a lot of moving parts here with Rancher and libnetwork, so I can’t say exactly how you should do this. I can give you plenty of solutions, but they may be short term. The correct answer depends a lot on your time frame and willingness to change after you’ve implemented something. While libnetwork is in 1.7, it is experimental and the user-facing design is rapidly changing. But rest assured what you have described will absolutely be supported in Rancher, and I hope that will be in the Docker 1.8 time frame.
The way Rancher works is that we basically have two networking modes: “managed” and “unmanaged.” Unmanaged networking means we just let Docker do what Docker does. In this mode everything supported by Docker works in Rancher, but not vice versa. Right now our service discovery and load balancing only work with our “managed” network.
Managed networking is where we create an overlay ipsec network. Because Docker doesn’t truly have pluggable networking yet, the way we do this is to work in tandem with the native Docker networking. Docker will assign a 172.17.0.0/16 address, and then we assign an additional 10.42.0.0/16 address on eth0. The default route of the container stays as 172.17.42.1, and only IPs on 10.42.0.0/16 get routed over our network.
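To make that concrete, from inside a container on the managed network you would expect to see something like the following (hedged: the container name is a placeholder and the exact addresses will differ):
docker exec mycontainer ip -4 addr show eth0   # one 172.17.0.0/16 address from Docker plus one 10.42.0.0/16 address from Rancher
docker exec mycontainer ip route show          # default via 172.17.42.1; the 10.42.0.0/16 subnet is what rides the ipsec overlay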
In the end you should be able to do a lot of creative things with Rancher and networking. Our managed networking mode only has the requirement that the container has an eth0 and that we own the 10.42.0.0/16 subnet (that should be configurable in the future). You are free to add more interfaces, change the routing rules, etc. This means container traffic can go over your macvlan (have you tried ipvlan?), but container-to-container traffic from our service discovery mechanism will go over ipsec.
In the future we can make 100% of your traffic go over a configured network, as long as you can guarantee that the IPs of the containers can somehow talk to each other. But this is where we get into the libnetwork work, which is currently very dynamic.
“a very well thought out ‘I don’t know.’” Brilliant! (And you put the period in the right place before the closing quote, which unfortunately is quite rare.)
1.8 is due out first week of August IIRC. That opens up networking plugins.
Rancher-managed networking doesn’t work for us (neither does the Linux bridge/docker0, or Kubernetes networking, or pretty much any overlay). We have very tight network performance requirements, especially around latency (I am working to publish a paper on my testing results within a few weeks, and may offer it up at a conference). So we cannot afford the additional layer of a bridge, a tunnel, or anything similar.
I am fine with macvlan direct to the network card, but I need a way to control it.
It looks like my only choices are:
Wait for 1.8 and use a plugin (either one that exists or write one)
Stick with 1.7 and use the events API (weave does that in certain implementations)
Wrap Docker calls and accept that orchestrators like Rancher and restart policies won’t work
I think I will wait for 1.8, and if I am under pressure, put together an API events listener to carry us through.
OK @vincent, just cool. I was wondering why it was "mac"vlan and not done at L3. In truth, I think many of us would have preferred to do the equivalent of
ifconfig eth0:25 up
and assign an IP to it, then throw that into the netns, rather than go the macvlan route. But the above just adds an alias IP to the same physical interface, and moving it into a netns would mean taking the whole interface out of the default namespace. Hence the widespread usage of macvlan.
If we could do it at L3, that would be really nice.
It doesn’t look like ipvlan has widespread support yet, though; it is still fairly new. Do you have any sense of where it stands on mainlining into the Linux kernel and the iproute2 tools?
Got it, so it is only on some mainstream distributions. CentOS 7 is on 3.10, Ubuntu got a 3.19 kernel with 15.04 back in April, and CoreOS seems to have it even on the stable channel (DigitalOcean is having a hard time booting stable, so I cannot test), although that only has Docker 1.6, and I won’t run production on alpha.
Back to the core issue, though. I joined the discussion on the Docker GitHub repo about IP management and network driver plugins, and might just build that plugin plus an events API listener for now.
In the meantime, I need to get my head around ipvlan. I get that macvlan adds another interface, connected to the physical interface, and gives it its own IP that I assign and a random (or assigned) MAC address. I don’t actually see how ipvlan properly routes packets between interfaces if they all share the same MAC address… oh, wait, it inspects the destination IP when a packet comes in?
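For reference, and as far as I can tell, the creation step for ipvlan differs from the macvlan sketch earlier only in the link type (kernel 3.19+ needed; names are placeholders):
ip link add ipvl0 link eth0 type ipvlan mode l3   # slaves share the parent MAC; demux happens on the destination IP
The netns, address, and route steps are the same as in the macvlan sketch.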
@vincent I have been digging into multiple orchestration engines over the last few weeks. Playing with fleet gave me some ideas. Fleet essentially uses systemd service files - with all of their features - to define, start and stop the container. One of the advantages is that you can always have an ExecStartPost command, which will run something after the primary command.
Is there any reason we cannot have that in Rancher as well? That would be much broader than just networking, essentially allowing any sort of post-run configuration.
@deitch It would be simple to support such a feature, but just a couple of questions first. One thing that is different about Fleet is that it is a systemd scheduler, not a Docker orchestration tool. What I mean by that is that it is designed to run arbitrary systemd units/commands at a host level, which might happen to be a “docker run” command. With Rancher we primarily focus on running containers and things inside of containers. So my question is: in what context should ExecStartPost run? Are you looking to execute scripts on the host, and not just launch a container?
With fleet, you can only do ExecStartPost if that command is calling something that already exists on the server. So you have a distribution problem. Would you be assuming that ExecStartPost would call things already installed on the box and Rancher wouldn’t do anything to get those assets to the node?
As a final note, Docker 1.8 will not include networking drivers. They are scheduled for 1.9 but the UX may be completely different. In short, it might not be what you are thinking of.
I’m confident that Rancher can address whatever networking concerns you have. We provide simple overlay networking that works anywhere, but we know it will not satisfy all requirements, and we welcome the challenge of different configurations. Our higher-level components, such as load balancing, service discovery, health checks, and DNS, make no real assumptions that you are using our networking model. We just need a simple L3 network.
Yes, definitely, fleet is broader, it just happens to run docker run ... as its ExecStart. I was just thinking about it as a model.
Most of what people do before and after launching a container that might be special is setting up certain filesystems or networking (which is why, I presume, most of the initial Docker plugin excitement is focused on those).
The distribution problem is an issue, both with fleet and Rancher, but aren’t Docker images all about distribution? I am wondering if limiting it to something that either already exists on the Rancher-managed host or, even better, something that can be run in a container would make it easier. The challenge would be having that added container (not quite a sidekick, more like an appetizer or dessert) be able to manage the first ones. It probably would need certain elevated privileges.
As a model, I like it a lot, because the distribution problem is solved already. If the elevation issue could be solved too - a container managing another container’s network and/or volumes - that would be very enticing.
As for the overlay, yeah, I get it. But if you are working in an extremely high-performance environment, every bit of latency matters, so we cannot use overlays. I like Calico’s approach of straight, routable L3, but again, you still need to do the networking.
I would be really interested in adding pre/post hooks that execute containers, because the distribution problem is solved. For example, I haven’t tried this, but my assumption is that you could run pipework from a container if you did
docker run --net=host --pid=host --privileged -v /var/run/docker.sock:/var/run/docker.sock myfancyimage
Inside myfancyimage you would then just execute pipework the same way you did on the host before.
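For example (hedged: the host interface, target container name, and address are placeholders, and this assumes pipework’s usual argument order), the call inside myfancyimage might be:
pipework eth0 mytargetcontainer 10.0.0.25/24@10.0.0.1   # adds an interface (eth1 by default) in the target container, backed by a macvlan on the host eth0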
This is pretty darn close to just running it in a chroot. Do you think you could work with pre/post hooks that are more container focused? I’m fine with arbitrary commands as well, but I just feel like I’d be giving an incomplete user experience.
Yeah, that is exactly where I am going. I could then have my main container run as
docker run --net=none myregularimage
And then run pipework afterwards.
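For example (hedged: names and addresses are placeholders; -i eth0 asks pipework to name the container-side interface eth0, since the container starts with no network at all):
docker run -d --net=none --name myapp myregularimage
pipework eth0 -i eth0 myapp 10.0.0.30/24@10.0.0.1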
I was already working on an event listener that would wrap pipework, and was planning on putting it in a privileged container. This fits right in with that.
Of course, if I have that structured correctly, I can use it even without the extension. It just becomes a container that I run on every host, which listens for start events and then checks labels on containers… which, I think, is how Rancher networking works?
Hello, what is the status of this topic? Has Rancher implemented ipvlan? We need to be able to bind several network interfaces from the host machine to a container.
As far as I understand it, it is not a question of macvlan vs ipvlan for Rancher, per se, as much as a question of the kernel.
Rancher itself just does its own ipsec networking (i.e. an overlay network). Whether you use that, or another overlay, or the Docker bridge, or ipvlan, or Calico, or something else, is handled entirely at the Docker level, which in turn depends on what your host supports. If you have a 3.19+ kernel you can use ipvlan; otherwise use macvlan.
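A quick way to check whether a given host can do ipvlan at all (hedged; availability varies by distribution and kernel build):
uname -r                  # need 3.19 or later
modprobe -n -v ipvlan     # dry run; it will complain if the kernel does not ship the ipvlan module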
FWIW, macvlan (which I use) creates issues when running in a VM, as VMware will not let packets through if the MAC address is not a VMware-assigned one.
In the end, what I do is the following (a minimal sketch of the listener loop follows this list):
Start a pipework container with --net=host on every host, with access to the Docker socket.
Any container that should have a specific IP (or 2, or 10) or route, etc., has labels describing those.
When a container starts, the pipework container catches the start event, looks for labels, and then installs the correct IPs, routes, etc.
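A minimal sketch of that loop, assuming a hypothetical label such as pipework.ip carrying the address/gateway in pipework syntax (the label name is made up, and the --format flag on docker events needs a newer CLI than the 1.7/1.8 discussed above):
# runs inside the --net=host pipework container, with /var/run/docker.sock mounted
docker events --filter event=start --format '{{.ID}}' | while read -r id; do
  ipconf=$(docker inspect -f '{{index .Config.Labels "pipework.ip"}}' "$id")   # e.g. 10.0.0.40/24@10.0.0.1
  [ -n "$ipconf" ] && pipework eth0 "$id" "$ipconf"
done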
Is this the best way? No. Far better would be using libnetwork. I intend to look into how to convert the pipework container from one that monitors events to one that implements libnetwork.
Mine is based upon https://github.com/dreamcat4/docker-images/tree/master/pipework although I cleaned and changed it quite a bit. I also had to get pipework to support dummy interfaces, routes, etc.; jpetazzo kindly accepted the pull requests back into mainline pipework.
It doesn’t seem too hard to integrate with Rancher, the way storage drivers are.
I am actually trying to find out whether it is in the plans, or whether there is an alternative (i.e. if the host is running a supported kernel).
Just realized there was another comment on this discussion.
I am still running the pipework container as described above… but I don’t love it. I do it for four activities (sketched below):
A custom IP that is visible on the fabric, using macvlan (in our case a 3.10 kernel, so macvlan is OK and ipvlan is not)
A dummy interface for some in-container multicast
A route command for multicast to particular addresses via the dummy interface
A route command for default routing via our custom IP (necessary because of all of the other changes)
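At the ip(8) level, items 2 through 4 amount to roughly the following, run against the container netns as in the macvlan sketch earlier (hedged: the addresses are placeholders; in practice our labels drive the pipework equivalent):
pid=$(docker inspect -f '{{.State.Pid}}' mycontainer)
nsenter -t "$pid" -n ip link add dummy0 type dummy                       # 2. dummy interface for in-container multicast
nsenter -t "$pid" -n ip link set dummy0 up
nsenter -t "$pid" -n ip route add 239.1.1.0/24 dev dummy0                # 3. multicast range via the dummy interface
nsenter -t "$pid" -n ip route replace default via 192.168.10.1 dev mv0   # 4. default route via the custom IP; mv0 is the macvlan from item 1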
I really would like to retire all of this in favour of a single network plugin (or a set of them). Unfortunately, in some cases the custom IP comes from DHCP, which Docker network plugins do not (yet) support. Similarly, dummy interfaces and routes are not supported yet.