Pods.metrics.k8s.io is forbidden: User "system:kube-proxy"

Trying to create an autoscaling config based on CPU metrics, and seeing the following output:

Events:
  Type     Reason                   Age                   From                       Message
  ----     ------                   ----                  ----                       -------
  Warning  FailedGetResourceMetric  91s (x2280 over 19h)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: pods.metrics.k8s.io is forbidden: User "system:kube-proxy" cannot list pods.metrics.k8s.io in the namespace "dashboard-test"

This is a Rancher-deployed Kubernetes cluster on-prem, built with RKE. Checking the ClusterRoles and ClusterRoleBindings, I can't find any reference to that user. Is some config missing?

Using Rancher 2.0.8.
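
For reference, this is roughly the kind of check I mean (nothing referencing that user turned up), run with cluster-admin kubectl access:

kubectl get clusterrolebindings -o yaml | grep "system:kube-proxy"
kubectl get rolebindings --all-namespaces -o yaml | grep "system:kube-proxy"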

So I've managed to hack together a fix. I'll set out the details below:

  1. It appears to me that the job which runs to configure metrics-server doesn't create an appropriate RBAC policy for projects/namespaces.
  2. I upgraded to Rancher 2.1.1, which pushed the cluster to Kubernetes 1.12. This appears to update the log message to include a vital piece at the end of the line: "no RBAC policy matched".
  3. After a lot of poking around, it seems that no policy is created automatically in the namespace for the given deployment, which explains why the system:kube-proxy user cannot access the relevant APIs. I'm still confused as to why system:kube-proxy is in use, though, as it doesn't appear to be declared anywhere in the metrics-server configs, or anywhere else that I could see, other than in the Rancher Go code, in a function which returns system:kube-proxy if the username passed in is kube-proxy.

In order to make this work, I performed the following two steps:

  1. Created a RoleBinding in the namespace of the pod in question. For the sake of testing, I used the admin ClusterRole. I did this with the kubectl command below:

kubectl create rolebinding hpa-test --clusterrole=admin --user=system:kube-proxy --namespace=dashboard-test
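
To sanity-check that the binding has the intended effect, kubectl's impersonation flag can be used (namespace as above):

kubectl auth can-i list pods.metrics.k8s.io --as=system:kube-proxy --namespace=dashboard-test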

  2. Configured the workload to declare CPU resource requests. This was highlighted by the error:

failed to get cpu utilization: missing request for cpu

The resolution was based on this answer: https://stackoverflow.com/questions/34486213/kubernetes-autoscaling-not-working (thanks to George on that post). I made the edit in the Rancher GUI using the YAML editor; a sketch of what the change looks like is shown below.
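
As a rough illustration of that resources edit (the container name, image, and values here are placeholders rather than my actual workload), the relevant part of the workload YAML, under spec.template.spec, ends up something like:

containers:
  - name: example-app          # placeholder container name
    image: example-app:latest  # placeholder image
    resources:
      requests:
        cpu: 100m              # the HPA calculates utilisation against this request
      limits:
        cpu: 500m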

Once those two things were done, I deleted and re-created the autoscaling config and ran a describe on it a number of times to watch the updates come in; everything now appears to be working.
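
For reference, the equivalent from the command line looks roughly like this (the deployment name and thresholds are examples, not my actual values):

kubectl -n dashboard-test delete hpa example-app
kubectl -n dashboard-test autoscale deployment example-app --cpu-percent=80 --min=1 --max=5
kubectl -n dashboard-test describe hpa example-app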

For the Rancher team, it'd be really helpful if:
a) some documentation was added around autoscaling, which might have helped in this situation; even a note saying it's not supported would've saved me a week
b) someone took a look at this, as I'm not knowledgeable enough yet to say whether it's a bug or a feature
c) autoscaling functionality was added to the GUI (if it isn't already, in which case, where is it?)

Finally, thanks for making Rancher; it's a really useful tool and has allowed us to rapidly accelerate our learning and deployment of containers.

For completeness, here is the locked-down RBAC config I applied, which works:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: view-metrics
rules:
- apiGroups:
    - metrics.k8s.io
  resources:
    - pods
    - nodes
  verbs:
    - get
    - list
    - watch

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: view-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view-metrics
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: system:kube-proxy
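
Assuming the above is saved as view-metrics.yaml (the filename is just an example), it can be applied with:

kubectl apply -f view-metrics.yaml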