Why isn’t rancher/agent:latest pointing to the latest release the way rancher/server:latest is? Right now, if I want to run the latest Rancher Server I pull rancher/server:latest, but if I want to run the latest Rancher Agent I need to pull rancher/agent:v1.2.0, because rancher/agent:latest is older.
@huggla I’m not sure why it isn’t kept in sync. It may be that we never use the latest tag and thus stopped updating it. But that aside, we try to control which version of the agent you are running by specifying it in the docker run command the UI produces on the Add Host page.
Is there more to your use case that we aren’t covering?
We are using Ansible to add hosts to Rancher Server. I was using latest, but now I’ve changed it to v1.2.0, which means I will have to keep updating it as new versions come out. For production this might be OK, but for the dev environment I would rather not have to change it all the time.
We start the agent from a cloud-config located on an OpenStack config-drive. I too would appreciate not having to keep track of version numbers and manually update the cloud-config whenever a new version of the agent is released.
The actual version you deploy is not really all that relevant because all the software within it is updated to match the version expected by the rancher/server it connects to. This is why you never have to manually update the agent container between releases.
I don’t want to waste disk space with more than one agent image. Can I have the old one automatically removed when it’s updated?
It also doesn’t work when you set the container’s restart_policy to unless-stopped. It keeps recreating the agent container over and over.
@huggla That’s standard image management that has nothing specific to do with Rancher. Also, it’s ~230MB, most of the layers are common between versions, and as I said we don’t actually upgrade the container anyway; the appropriate versions of the software for the server version are downloaded into it.
@RobertKozak When you run the registration command it creates a volume to remember various things, starts the actual agent container, and exits. Don’t add a restart policy to the command; the actual agent container that is supposed to stay running already has one.
There probably should be a note in the docs about this, since it breaks the usual docker run behavior and the convention of the restart_policy.
There are infinite ways you can alter it and make it not work, but the UI and API give you a specific command (down to including sudo) because that’s what we expect users to run. It’s a setup container that runs once (the equivalent of start-once once you’re in Rancher).
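If you want to see that behavior on a host, something like this shows both containers (plain docker commands, nothing Rancher-specific; the agent tag is just the one from this thread):

```bash
# The setup/registration container shows as Exited, while the agent container
# it started keeps running (it has a restart policy of always).
docker ps -a --filter "ancestor=rancher/agent:v1.2.0"
```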
This seems unnecessarily complex and confusing for the user. Wouldn’t it be better if the setup container image were separate and named differently from the actual agent?
What if there was a change in the setup part of the agent image? We need a tag that is guaranteed to always work. Should we use “latest” then? And, I still think that having a tag called “latest” that really isn’t the latest version is confusing.
The issue comes from the fact that I am not using the UI or the API to run the rancher agent on the host. I am using Ansible, which has its own docker_container task, and I assume Puppet and Chef do something similar. These tools expect docker to run like docker, and the way you are starting the rancher agent is not how docker normally works.
```yaml
- name: Register the Agent Host machine with the Rancher server
  docker_container:
    name: "{{ rancher_agent_name }}"
    image: rancher/agent:{{ rancher_agent_version }}
    privileged: yes
    detach: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/rancher:/var/lib/rancher
    command: "{{ rancher_token.json['registrationUrl'] }}"
    state: started
  when: "{{ 'rancher-agent' not in containers.stdout }}"
```
In Ansible the docker_container task should be idempotent: state: started means if the container is not running, run it, and if it’s already running, leave it alone. But since this container spawns off another container, this task can never be idempotent. So I have to check whether it’s running and add the when: clause (see the guard task sketched below).
Since the rancher_agent container spawns off another and kills itself, the restart_policy is ineffective, so I can no longer control the lifetime of the agent container. If it stops for some reason I would like it to start back up unless I specifically stop it manually, i.e. --restart unless-stopped.
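For completeness, here is roughly the guard task that feeds that when: clause (a sketch; the task name and changed_when handling are my own, not anything Rancher provides):

```yaml
# Sketch: record the output of `docker ps` so the registration task can be
# skipped when a rancher-agent container is already running on the host.
- name: List running containers
  shell: docker ps
  register: containers
  changed_when: false
```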
Maybe you need to think about having two different types of rancher_agent containers:
- rancher_agent
- rancher_agent_launcher
The public interface to getting the “correct” rancher/agent tag for the version of the server that is running is the registration command shown in the UI, or the registrationToken resource in the API (where the UI gets it). The full command, image, and registrationUrl are all there separately, so you can use whichever you want. Whether or not we update latest with every release, it can only be the right version for a subset of server versions.
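For example, something like this should give you the current values (a sketch against the v2-beta API; the project ID, API keys, and exact endpoint path are placeholders for your environment):

```bash
# Sketch: read the image and registrationUrl for the running server version
# from the registrationtokens collection. <PROJECT_ID> and the keys are
# placeholders; adjust the path to match your setup.
curl -s -u "$ACCESS_KEY:$SECRET_KEY" \
  "http://<rancher-server>:8080/v2-beta/projects/<PROJECT_ID>/registrationtokens" \
  | jq -r '.data[0].image, .data[0].registrationUrl'
```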
You’re overcomplicating this by defining it as a docker_container task in Ansible and trying to make that container auto-restart and assert that the state is running. The registration command as given is idempotent; you can run it on every boot if you want. And the container that is ultimately started already has --restart=always.
It is already two containers, they just use the same image so an additional pull isn’t needed and one starts the other. Making them separate images solves nothing. The important bits are it needs to:
- Define a local volume to persist the identity of the host across agent container lifetimes
- Provide one “docker run …” command for normal users to run that is short and has a chance of being copy/pasted correctly
- Not require you to see and maintain all the detail of what is really required to run the agent.
Take a look at the actual running agent container’s inspect (or that link above). There are 8 different volume mounts, network options, oomkiller options, and a variety of other things. You could define the agent container yourself with all of that in Ansible if you really want to, but it would be 2 screens of YAML long and break as soon as we change the agent to expect some new option because that configuration now arbitrarily lives in your Ansible file instead of our definition of how the agent should be run.
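If you want to see that for yourself, inspecting the spawned rancher-agent container shows the mounts and restart policy (standard docker inspect templates, nothing Rancher-specific):

```bash
# Show the bind mounts and restart policy of the agent container the
# launcher created (it is named rancher-agent).
docker inspect --format '{{ json .Mounts }}' rancher-agent
docker inspect --format '{{ .HostConfig.RestartPolicy.Name }}' rancher-agent
```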
OK, based on the above I made two changes:
- I am now getting the image name from the JSON returned from the self link in the registration tokens, rather than from a variable I had defined in Ansible (see the sketch after this list).
- I renamed the container to rancher_agent_launcher. This is better because now I have rancher_agent_launcher and rancher_agent containers. Not knowing that it spawns a new container, I had no idea whether it based the new name on the old one or whether it is hard-coded, which is why I always had two containers named rancher-agent: one stopped, and the new one it spawned. Now that I know this I can make sure the launching container is named properly.
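Roughly, the task now looks like this (a sketch; rancher_token is still registered from the registration tokens endpoint the same way as before, and image is the field name from that JSON):

```yaml
# Sketch of the updated task: the image now comes from the same registration
# token JSON that already supplies the registrationUrl, so no agent version
# variable has to be maintained by hand.
- name: Register the Agent Host machine with the Rancher server
  docker_container:
    name: rancher_agent_launcher
    image: "{{ rancher_token.json['image'] }}"
    privileged: yes
    detach: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/rancher:/var/lib/rancher
    command: "{{ rancher_token.json['registrationUrl'] }}"
    state: started
  when: "'rancher-agent' not in containers.stdout"
```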
But I disagree with the statement “You’re overcomplicating this by defining it as a docker_container task in ansible and trying to make that container auto-restart and assert that the state is running.” I don’t agree it’s “overcomplicating” to use the tool as it’s intended to be used.
- Rancher uses docker containers, and the docker_container task in Ansible is the proper way to run a docker container. Yes, I could use the shell or command task, but that defeats the purpose of having a properly defined way of running docker containers via a tool like Ansible, and it would not have the proper tests and metadata to make sure the task is idempotent.
- What is overcomplicated is a docker container starting another docker container, which is not standard practice. I won’t debate the design decision because it may be the right choice, but because it is not standard practice it should be well documented, which I mentioned further up in this thread.
- “The registration command as given is idempotent, you can run it on every boot if you want. And the container that is ultimately started already has --restart=always.” This is not true, at least not for a proper definition of idempotent. I just ran the registration command 4 times:
```bash
sudo docker run -d --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.0 http://rancher1.nowcom.io:8080/v2-beta/scripts/FC46915F487660891F28:1483142400000:RvhwtYadiYEljoMUqgNDJS0tQ
```
and now I have 4 stopped containers just sitting around doing nothing. Idempotent means that if you run it over and over you won’t get a different result. I am getting a different result, because now I have 4 extra stopped containers. And each run deleted and recreated the actual spawned agent container, which doesn’t meet the definition of idempotent (see the last two lines of the log below).
```
3/8/2017 12:50:21 PM
3/8/2017 12:50:21 PM INFO: Running Agent Registration Process, CATTLE_URL=http://rancher1.nowcom.io:8080/v2-beta/scripts/FC46915F487660891F82:1483142400000:RvhwTYadiYEljoMuqgNDJS0tQ
3/8/2017 12:50:21 PM INFO: Attempting to connect to: http://rancher1.nowcom.io:8080/v2-beta/scripts/FC46915F487660891F82:1483142400000:RvhwTYadiYEljoMuqgNDJS0tQ
3/8/2017 12:50:22 PM INFO: http://rancher1.nowcom.io:8080/v2-beta/scripts/FC46915F487660891F82:1483142400000:RvhwTYadiYEljoMuqgNDJS0tQ is accessible
3/8/2017 12:50:22 PM INFO: Inspecting host capabilities
3/8/2017 12:50:22 PM INFO: Boot2Docker: false
3/8/2017 12:50:22 PM INFO: Host writable: true
3/8/2017 12:50:22 PM INFO: Token: xxxxxxxx
3/8/2017 12:50:22 PM INFO: Running registration
3/8/2017 12:50:23 PM INFO: Printing Environment
3/8/2017 12:50:23 PM INFO: ENV: CATTLE_ACCESS_KEY=6857BBE910887009529C
3/8/2017 12:50:23 PM INFO: ENV: CATTLE_HOME=/var/lib/cattle
3/8/2017 12:50:23 PM INFO: ENV: CATTLE_REGISTRATION_ACCESS_KEY=registrationToken
3/8/2017 12:50:23 PM INFO: ENV: CATTLE_REGISTRATION_SECRET_KEY=xxxxxxx
3/8/2017 12:50:23 PM INFO: ENV: CATTLE_SECRET_KEY=xxxxxxx
3/8/2017 12:50:23 PM INFO: ENV: CATTLE_URL=http://rancher1.nowcom.io:8080/v2-beta
3/8/2017 12:50:23 PM INFO: ENV: DETECTED_CATTLE_AGENT_IP=192.168.100.241
3/8/2017 12:50:23 PM INFO: ENV: RANCHER_AGENT_IMAGE=rancher/agent:v1.2.0
3/8/2017 12:50:23 PM INFO: Deleting container rancher-agent
3/8/2017 12:50:23 PM INFO: Launched Rancher Agent: 7fb343884935125acf6d088e0de4db6fd1be156c413ee439bc8273980a375933
```
Which brings me to my last point. Ignoring the idempotency issue for a moment: the container I start, rancher-agent-launcher, sticks around after launching (spawning) the actual rancher agent container. Yes, it’s stopped, but it’s just hanging there, so my hosts always show at least two containers in the Standalone Containers section of the UI, the launcher with a red circle showing it’s stopped.
This has nothing to do with the fact that I am using a tool like Ansible, because I get the same results just running it on the command line of the host: I can still end up with multiple launcher containers created, run, and stopped.
It is already two containers, they just use the same image so an additional pull isn’t needed and one starts the other. Making them separate images solves nothing.
Except it would have prevented this confusion. What’s least complicated from a developer’s standpoint is, in this case, not very transparent to the user. If you stay with the one-image approach, then you should write some documentation for the users.
An HTTP GET writes a new line into the web server’s log each time you call it too, but that doesn’t mean anybody would claim it’s not idempotent. The functionality is idempotent: the host is registered once in Rancher and one rancher/agent container is running on the host no matter how many times you run the registration command. But yes, if you go digging there’s an extra line in ps -a. You can add --rm to the command if you want to have it delete itself on exit.
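For example (the same registration command with --rm added; note that combining --rm with -d requires Docker 1.13 or newer, so on older versions you would drop -d):

```bash
sudo docker run --rm -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.0 \
  http://rancher1.nowcom.io:8080/v2-beta/scripts/FC46915F487660891F28:1483142400000:RvhwtYadiYEljoMUqgNDJS0tQ
```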
The functionality is idempotent: the host is registered once in Rancher and one rancher/agent container is running on the host no matter how many times you run the registration command
I’m sorry, but that is not true. There is not one rancher/agent container left running on the host no matter how many times the registration command is run; it is deleting and restarting an already existing container, which is directly opposite to the idea of idempotency. The true part is that at the end of this process one remains, but the existing one was killed to make way for a new one. Again, that is not idempotent behavior. There should at the very least be a test in there to determine whether or not a new agent container is actually needed, and it should only destroy and recreate if absolutely required. Currently, it does it regardless of whether it is required.
But yes if you go digging there’s an extra line in ps -a. You can add --rm to the command if you want to have it delete itself on exit.
It’s not only an extra line in ps -a; it is also shown in the Rancher UI and very visible. I have added --rm now that I know there are two containers for rancher/agent. But that raises the question: why is this not part of the original command? What is the value of keeping it around? I really would like an answer to this question, because if it should be kept around then I am now doing something wrong by removing it.
Honestly, it really looks like there used to be one rancher/agent container and then the Rancher team decided on a design that spawns another rancher/agent as a new design paradigm. And because of this decision:
- Documentation is not kept up to date
- The command to register an agent forgot to include the --rm argument
- Someone forgot to add in a proper test to make sure the original running agent didn’t need to be destroyed and recreated.
If these things are in process and just not available yet I can understand that because not everything can be rolled out all at once. But saying the feature as-is is idempotent is wrong.
The nice thing about open source is you can easily run a 2-year-old rancher/server tag and see that it’s always been this way, and will not be changing now.
The only thing that’s changed somewhat recently is:
- It’s now two containers and a volume instead of three containers (the third was a data-only container), because we dropped compatibility with old docker versions (even though they have long since been “unsupported”, they still mostly worked)
- We stopped excluding the agent from backpopulating (importing a natively started container into the DB) copies of itself. This was changed so you can look at logs from the agent in the UI/API.
--rm is more trouble than it’s worth, because when something is set up wrong the logs can contain useful information.
What should change, but has been buried in all this, is that the initial container should be marked start-once in the DB, and therefore show green instead of red.
The public interface to getting the “correct” rancher/agent tag for the version of the server that is running is the registration command shown in the UI, or the registrationToken resource in the API (where the UI gets it). The full command, image, and registrationUrl are all there separately, so you can use whichever you want.
Can I somehow use wget on our hosts to get the right agent version from our server? Our hosts are running RancherOS.