I’ve tried editing the YAML live in the web UI for the cluster itself, but I can’t seem to find a syntax that works in the kubelet stanza. The cluster either flails for a while trying and failing to apply the change and then rolls back, or just ignores the changes and they disappear.
I’m guessing I want to be using an "extra_args" flag in the YAML and then the kubelet’s native "--cpu-manager-policy=static"? Or is there some Rancher YAML syntax along the lines of cpu_manager_policy: static? I’ve tried a slew of variations on a test cluster with no luck, and I confess this feels like a really sloppy way to figure this out, but I can’t seem to find any examples online.
Some Progress:
By experimenting with small changes to the values of existing extra_args entries in the etcd service and checking the results of "ps aux" inside the container (first I had to install ps), I can see how Rancher parses its YAML syntax and presents each entry to the binary as a command-line argument.
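For anyone else poking at this, the experiment looked roughly like the following; the container name etcd is what RKE uses, and the apk call is a guess at the image’s package manager:

    # shell into the etcd container on a node
    docker exec -it etcd sh
    # inside the container: ps isn't there by default, so install it first
    apk add procps
    # each key under extra_args shows up as a --key=value flag on the etcd command line
    ps aux | grep etcd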
I also found information on the actual thing I’m trying to change (the CPU manager policy) saying that if I set it to static I also need to reserve at least one CPU for Kubernetes’ own use (via kube-reserved or system-reserved).
So I’m fairly sure I’ve got the correct syntax for the kubelet stanza in the "live" config.yaml.
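It’s roughly this (the two pre-existing lines are shown as the usual RKE defaults, and cpu=1 is just the one-CPU reservation mentioned above):

    services:
      kubelet:
        fail_swap_on: false                  # already present (RKE default)
        generate_serving_certificate: false  # already present (RKE default)
        extra_args:
          cpu-manager-policy: static         # rendered as --cpu-manager-policy=static
          kube-reserved: cpu=1               # static policy needs a nonzero CPU reservation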
Note those first two lines in the stanza were there already.
'docker ps' on a controller node that’s failing to accept the changed config shows that the kubelet container is in a restart loop, and 'docker logs' shows kubelet complaining that both of my added arguments are deprecated and should be passed in a config file instead. But the usage statement lists a slew of CLI arguments as similarly deprecated, so presumably at some point Rancher is going to need to re-engineer their kubelet container to inject a config file and pass it with --config.
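For reference, the checks on the node were along these lines (RKE just names the container kubelet):

    docker ps -a | grep kubelet      # the restart loop shows up as a constantly-cycling uptime
    docker logs --tail 50 kubelet    # deprecation warnings for the added args, then the fatal error below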
The current meaningful error is the last line of the attempted start:
F0518 16:36:35.790721 6150 server.go:273] failed to run Kubelet: could not initialize checkpoint manager: could not restore state from checkpoint: configured policy "static" differs from state checkpoint policy "none"
I suspect this is a Kubernetes issue, not a Rancher one, but I’d still be grateful for any input.
The output also says "Please drain this node and delete the CPU manager checkpoint file '/var/lib/kubelet/cpu_manager_state' before restarting Kubelet." But how do I do that in a Docker container that I can’t shell into while it’s in a restart loop?
So this does still appear to be a Rancher issue: if this container is fungible, wouldn’t the previous checkpoint come along with the container image pulled from Rancher’s repo? Even if I could delete that file, wouldn’t the checkpoint state just revert on the next restart? And it seems like this would be a problem for many other settings that might change.
SOLVED. Unable to shell into the rebooting container, I never got far enough to figure out that /var/lib/kubelet/cpu_manager_state is not inside the container at all; it’s mounted in from the node’s filesystem. The fix is to apply the extra_args settings in the cluster’s YAML config and then delete /var/lib/kubelet/cpu_manager_state from every controller and worker node.
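For anyone hitting the same thing, the per-node cleanup amounts to something like this (the node name is a placeholder, and the drain is just the precaution the kubelet message asks for):

    # from somewhere with kubectl access: drain the node first
    kubectl drain <node-name> --ignore-daemonsets
    # on the node itself: delete the stale checkpoint from the host filesystem;
    # the restart-looping kubelet container will recreate it with the new (static) policy
    sudo rm /var/lib/kubelet/cpu_manager_state
    # once the kubelet container is stable again
    kubectl uncordon <node-name>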