Use PodNodeSelector Admission Controller to restrict namespaces to specific nodes

I’m trying to build a “multi-tenancy” RKE cluster with Existing Custom Nodes from Rancher 2.5.3. In the cluster I want to dedicate specific worker nodes to different customers, using a dedicated namespace for each customer and node selectors to define which worker nodes each namespace should use.

To prevent users from violating the policy, I managed to enable the PodNodeSelector admission plugin by adding the following lines to the cluster YAML config:

kube-api:
  extra_args:
    admission-control-config-file: /etc/kubernetes/adm-control/admission-control.yaml
    enable-admission-plugins: NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize,PodNodeSelector
  extra_binds:
    - '/etc/kubernetes/adm-control:/etc/kubernetes/adm-control'

I then created the following files on each control-plane node:

$ sudo cat /etc/kubernetes/adm-control/admission-control.yaml 
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodNodeSelector
  path: podnodeselector.yaml

$ sudo cat /etc/kubernetes/adm-control/podnodeselector.yaml 
podNodeSelectorPluginConfig:
  clusterDefaultNodeSelector: "cust=none"
  ns-cust1: "cust=cust1"
  ns-cust2: "cust=cust2"

I could then label my hosts and annotate my namespaces accordingly to get each customer’s pods scheduled onto the correct hosts.
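The labeling and annotation step can be sketched with kubectl. The node names (worker-cust1, worker-cust2) are hypothetical examples; the labels and namespaces match the podnodeselector.yaml above:

```shell
# Label the worker nodes dedicated to each customer so they match the
# selectors configured in podnodeselector.yaml (node names are examples)
kubectl label node worker-cust1 cust=cust1
kubectl label node worker-cust2 cust=cust2

# Annotate each customer namespace; the PodNodeSelector admission plugin
# merges this selector into every pod created in the namespace
kubectl annotate namespace ns-cust1 scheduler.alpha.kubernetes.io/node-selector=cust=cust1
kubectl annotate namespace ns-cust2 scheduler.alpha.kubernetes.io/node-selector=cust=cust2
```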

Is there a better way to accomplish the same thing, or an easier way to configure it with Rancher? In particular, having to create local config files on each master node and bind-mount them into the kube-apiserver service is really ugly to maintain. Is there any way to “inject” those config files through the UI or the cluster YAML config instead?

Quick update:

I found that the whole /etc/kubernetes folder is already mounted in the kube-apiserver container, so there is no need for the extra_binds, and the additions in cluster.yaml are now only:

kube-api:
  extra_args:
    admission-control-config-file: /etc/kubernetes/adm-control/admission-control.yaml
    enable-admission-plugins: NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize,PodNodeSelector

Also found that I could consolidate the two admission-control-config files to one:

$ cat /etc/kubernetes/adm-control/admission-control.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodNodeSelector
  configuration:
    podNodeSelectorPluginConfig:
      clusterDefaultNodeSelector: "cust=none"
      ns-cust1: "cust=cust1"
      ns-cust2: "cust=cust2"

However, I also found that there is built-in support for some admission control plugins, like EventRateLimit, and if I enable that with:

services:
  kube-api:
    event_rate_limit:
      enabled: true

It will create its own /etc/kubernetes/admission.yaml file and override my --admission-control-config-file directive. I suppose what we need is a way to configure the PodNodeSelector plugin the same way as the EventRateLimit plugin is configured, so Rancher can combine the admission-control-config-file for us:

services:
  kube-api:
    event_rate_limit:
      enabled: true
      configuration:
        apiVersion: eventratelimit.admission.k8s.io/v1alpha1
        kind: Configuration
        limits:
        - type: Server
          qps: 6000
          burst: 30000

Is such a configuration perhaps already possible?

After some source-code diving on GitHub, I found the admission_configuration directive, which almost solved the problem. With the following kube-api config I was able to accomplish what I wanted:

kube-api:
  admission_configuration:
    api_version: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
      - configuration:
          apiVersion: eventratelimit.admission.k8s.io/v1alpha1
          kind: Configuration
          limits:
            - burst: 20000
              qps: 5000
              type: Server
        name: EventRateLimit
        path: ''
      - configuration:
          podNodeSelectorPluginConfig:
            clusterDefaultNodeSelector: cust=none
            ns-cust1: cust=cust1
            ns-cust2: cust=cust2
        name: PodNodeSelector
        path: ''
  event_rate_limit:
    enabled: true
  extra_args:
    enable-admission-plugins: NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize,PodNodeSelector

However, to make this work I had to enable the event_rate_limit plug-in. Otherwise the admission.yaml file is not created and the --admission-control-config-file command-line option is not added.
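One way to check which admission config the apiserver actually picked up (a sketch; RKE runs the apiserver as a plain Docker container, assumed here to be named kube-apiserver):

```shell
# On a control-plane node: print the kube-apiserver container's arguments,
# one per line, and filter for the admission-related flags
docker inspect kube-apiserver \
  --format '{{range .Args}}{{println .}}{{end}}' | grep admission
```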

I also had to include the EventRateLimit config under admission_configuration (as well as my PodNodeSelector config), otherwise a default admission.yaml file for EventRateLimit is created, ignoring the admission_configuration directive.

To conclude, it seems to me that the problem of multiple admission control plugins requiring the same configuration file has been identified, and a new common place to configure them (admission_configuration) has been implemented, but it is currently only used by the EventRateLimit plugin, hence the hard-coded limitations described above.

It would be nice to see something like

pod_node_selector:
  enabled: true

that would enable the PodNodeSelector plugin and take its config from admission_configuration as well. Or why not make it a bit more generic and always create the admission.yaml file and add the command-line directive whenever any configuration exists under admission_configuration?

PS: The kube-apiserver container is not restarted if the configuration under admission_configuration is changed. Only the admission.yaml file is updated, so you need to restart the container manually or trigger a restart with another configuration change.
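The manual restart can be done directly with Docker on each control-plane node (RKE runs the control-plane components as plain containers; the container name kube-apiserver is assumed):

```shell
# Restart the apiserver container so it re-reads admission.yaml
docker restart kube-apiserver
```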


Hi 🙂 Unless you need the clusterDefaultNodeSelector, the admission configuration file or object is NOT required, because selectors can be assigned dynamically via the scheduler.alpha.kubernetes.io/node-selector annotation directly on each namespace!
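For example, the annotation can be set declaratively in the Namespace manifest (a sketch using one of the namespace names from the post above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ns-cust1
  annotations:
    # The PodNodeSelector admission plugin merges this selector
    # into every pod created in the namespace
    scheduler.alpha.kubernetes.io/node-selector: cust=cust1
```

Note that the PodNodeSelector plugin itself still needs to be listed in enable-admission-plugins for the annotation to be enforced.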