I've been experimenting with integrating Rancher into the new Azure Virtual Machine Scale Sets (https://azure.microsoft.com/en-us/blog/azure-vm-scale-sets-public-preview/).
I was looking to achieve this by extending my previous ARM template: https://github.com/Azure/azure-quickstart-templates/blob/master/docker-rancher/nodes.json
For those not accustomed to ARM (Azure Resource Manager), I’m basically using Docker Compose to deploy the Rancher agent:
    "compose": {
      "rancheragent": {
        "image": "rancher/agent:v0.8.2",
        "restart": "always",
        "privileged": true,
        "volumes": [
          "/var/run/docker.sock:/var/run/docker.sock"
        ],
        "command": "[parameters('rancherApi')]"
      }
    }
The downside of this approach is that the agent always uses the external / public IP address for communication with the server. If it didn’t, inter-host networking would fail because of the IPsec VPN setup underneath.
Because the client IP is dynamic (assigned at boot), I’m having trouble using the CATTLE_AGENT_IP environment variable, so I had to fall back on the public IP. That has a huge downside: the number of public IP addresses in Azure is limited, and they are charged for. With scale sets, you would typically scale beyond those limits.
Any suggestions on how to tackle this? The paths I’ve considered:
- using Docker Compose variable interpolation in combination with CATTLE_AGENT_IP => though I don’t think this would prove to be stable
- deploying the server in the same subnet & using the internal address as the host IP => not tested, not sure if this would fix it
- extending the Docker images with a bash script that fills in the IP dynamically => though this is very work-intensive when it comes to upgrades
- extending the ARM template with a shell script as a wrapper => at the moment this seems to be the best way, though it is far more complex than a “simple” Docker Compose setup
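To make the wrapper-script option a bit more concrete, here is a minimal sketch of what such a script might look like. It is untested against a real scale set and rests on a few assumptions: that `hostname -I` lists the VM’s private address first (which may not hold once docker0 or other interfaces are up), that the Rancher registration URL is passed as the first argument, and that the `DOCKER` override is a hypothetical hook I added purely for dry runs.

```shell
#!/bin/sh
# Sketch only: start the Rancher agent with CATTLE_AGENT_IP set to the
# VM's private address. Assumptions are flagged inline.

# Assumption: the first address printed by `hostname -I` is the private IP.
detect_private_ip() {
  hostname -I | awk '{print $1}'
}

# Mirrors the compose definition (image, restart policy, privileged mode,
# docker.sock volume) and adds CATTLE_AGENT_IP.
# $1 = agent IP, $2 = Rancher registration URL.
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run.
start_agent() {
  "${DOCKER:-docker}" run -d --restart=always --privileged \
    -e "CATTLE_AGENT_IP=$1" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    rancher/agent:v0.8.2 "$2"
}

# Only act when a registration URL is supplied as the first argument.
if [ -n "${1:-}" ]; then
  agent_ip="$(detect_private_ip)"
  if [ -z "$agent_ip" ]; then
    echo "could not detect a private IP" >&2
    exit 1
  fi
  start_agent "$agent_ip" "$1"
fi
```

The script would be called with the same value that currently goes into the compose "command" field (the rancherApi parameter). Whether the private address is actually enough for inter-host networking is exactly the open question above, so this only covers the "set it dynamically at boot" part.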
Anyhow, am I the only one experiencing this kind of deployment issue? Or am I pushing it too far in terms of automation? Any suggestions on the best course of action for “easy” scalability in terms of hosts would be welcome.
TL;DR
- CATTLE_AGENT_IP is needed as # of public IPs is limited
- setting the CATTLE_AGENT_IP dynamically / automated is not without implementation risks
- asking for suggestions / take me to school!