Adding cpu-manager-policy argument to kubelet?

Hi All,

I'm looking for a way to set the CPU management policy to 'static', as outlined in the Kubernetes docs under "Control CPU Management Policies on the Node", for a Rancher-built/managed on-premises cluster.

I’ve tried editing the yaml live in the webUI for the cluster itself but can’t seem to find a syntax that works in the kubelet stanza. The cluster either flails for a while trying and failing to apply the change and then rolls back, or just ignores the changes and they disappear.

I'm guessing I want to be using an "extra_args" entry in the yaml and then kubelet's native "--cpu-manager-policy=static"? Or is there some Rancher/yaml syntax along the lines of cpu_manager_policy: static? I've tried a slew of variations on a test cluster with no luck, and I confess this feels like a really sloppy way to figure this out. But I can't seem to find any examples online.

Any help would be hugely appreciated.

Some Progress:
By experimenting with small changes to the values of existing extra_args statements in the etcd service and checking the results of "ps aux" inside the container (first I had to install ps), I can see how Rancher parses its yaml syntax and presents each entry to the binary call as an argument.
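
Roughly what that checking looked like, with the caveat that the container names ("etcd", "kubelet") are just the RKE defaults on my nodes, and ps has to be installed inside the container first as noted:

# show how an extra_args value ends up on the etcd command line
docker exec etcd ps aux | grep etcd
# same idea for the kubelet container once I started changing its args
docker exec kubelet ps aux | grep kubelet
# "docker top <container>" from the host is an alternative that doesn't need ps installed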

I also found information on the actual thing I'm trying to change (the CPU manager policy) suggesting that if I set this, I also need to reserve at least one CPU for Kubernetes' own use.

So I’m about sure the correct syntax for the kubelet stanza in the “live” config.yaml is:

kubelet:
  fail_swap_on: false
  generate_serving_certificate: false
  extra_args:
    cpu-manager-policy: 'static'
    reserved-cpus: '0'

Note those first two lines in the stanza were there already.
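
If the pattern from the etcd experiment holds, Rancher should translate each extra_args key/value pair one-for-one into a CLI flag, so the kubelet invocation ought to pick up something like this (my reading of the ps output, not copied verbatim from a node):

# expected additions to the kubelet command line
kubelet ... --cpu-manager-policy=static --reserved-cpus=0 ...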

'docker ps' on a controller node that's failing to accept the changed config shows that the kubelet container is in a restart loop, and 'docker logs' shows kubelet complaining that both of my added arguments are deprecated and should be passed in a config file instead. But the usage statement lists a slew of CLI arguments as similarly deprecated, so presumably at some point Rancher is going to need to reengineer its kubelet container to inject a config file and point kubelet at it.
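
For the record, this is roughly how I was watching the failure (again, "kubelet" is the default RKE container name):

# confirm the kubelet container is stuck in a restart loop
docker ps -a --filter name=kubelet
# pull the kubelet logs, including the deprecation warnings and the fatal error below
docker logs --tail 50 kubelet 2>&1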

The current meaningful error is the last line of the attempted start:
F0518 16:36:35.790721 6150 server.go:273] failed to run Kubelet: could not initialize checkpoint manager: could not restore state from checkpoint: configured policy "static" differs from state checkpoint policy "none"

I suspect this is a kubernetes issue, not a rancher one. But would still be grateful for any input.

The output also says "Please drain this node and delete the CPU manager checkpoint file '/var/lib/kubelet/cpu_manager_state' before restarting Kubelet." But how do I do that in a docker container that I can't shell into while it's in a restart loop?

So this does still appear to be a Rancher issue: if this container is disposable, wouldn't the previous checkpoint come with the container image pulled from Rancher's repo? Even if I could delete that file, wouldn't the checkpoint state just revert on the next restart? And it seems like this would be a problem for many other settings that might change?

I've also posted this as an issue in the rancher/rancher GitHub repo:

Five months later, I've had no luck solving this problem. My issue on GitHub got no reply and was eventually auto-closed. Anyone? Bueller? Bueller?

SOLVED. Unable to shell into the rebooting container, I never got far enough to figure out that /var/lib/kubelet/cpu_manager_state is not inside the container; it's mounted from the node's filesystem. The fix is to apply the extra_args setting in the cluster's yaml config and then delete /var/lib/kubelet/cpu_manager_state from every controller and worker node.
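
For anyone who lands here later, the recovery sequence the kubelet error message asks for looks roughly like this on each affected node; the node name is a placeholder and your drain may need extra flags depending on what's running:

# drain the node so nothing is scheduled on it (kubelet's own error message asks for this)
kubectl drain <node-name> --ignore-daemonsets
# on the node itself: the checkpoint lives on the host filesystem,
# bind-mounted into the kubelet container, so it can be deleted from the host
sudo rm /var/lib/kubelet/cpu_manager_state
# restart the kubelet container so it writes a fresh checkpoint with the new policy
docker restart kubelet
# let the node take workloads again
kubectl uncordon <node-name>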