Trying to create an autoscaling config based on CPU metrics, I see the following output:
Events:
Type Reason Age From Message
Warning FailedGetResourceMetric 91s (x2280 over 19h) horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: pods.metrics.k8s.io is forbidden: User "system:kube-proxy" cannot list pods.metrics.k8s.io in the namespace "dashboard-test".
Using a Rancher-deployed k8s cluster on-prem with RKE. Checking clusterroles and clusterrolebindings, I can't find any reference to that user. Is some config missing?
Using Rancher 2.0.8.
So I've managed to hack together a fix. I'll set out the details below:
- It appears to me that the job which runs to configure metrics-server doesn't create an appropriate RBAC policy for projects/namespaces.
- I upgraded to Rancher 2.1.1, which ended up pushing the cluster to version 1.12 of Rancher's k8s. This appears to update the logging message to include a vital piece at the end of the line: "no RBAC policy matched".
- After a lot of poking around, it seems that no policy is created automatically in the namespace for the given deployment, which explains why the system:kube-proxy user cannot access the relevant APIs (the check below reproduces this). I'm still confused as to why system:kube-proxy is in use, though, as it doesn't appear to be declared anywhere in the metrics-server configs, or anywhere else that I could see, other than in the Rancher golang code in a function which returns system:kube-proxy if the username passed in is kube-proxy.
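For anyone verifying this on their own cluster, the failing permission can be reproduced directly with kubectl's impersonation support (the namespace here matches my setup; adjust to yours). If the RBAC policy is missing, this returns no:

kubectl auth can-i list pods.metrics.k8s.io --as=system:kube-proxy --namespace=dashboard-test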
In order to make this work, I performed the following two steps:
- Created a RoleBinding in the namespace of the pod in question. For the sake of testing, I used the admin ClusterRole. I did this with the below kubectl command:
kubectl create rolebinding hpa-test --clusterrole=admin --user=system:kube-proxy --namespace=dashboard-test
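After creating the binding, the same kubectl auth can-i check from above should flip to yes for that namespace.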
I then found that I needed to configure my workload with resource requests. This was highlighted by the error:
failed to get cpu utilization: missing request for cpu
The resolution was based on this answer: https://stackoverflow.com/questions/34486213/kubernetes-autoscaling-not-working (thanks to George on that post).
I made this edit in the Rancher GUI using the YAML editor; see the sketch below.
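For reference, the relevant part of the workload spec ends up looking something like this (the deployment name, image, and request values here are illustrative placeholders, not my exact config):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard-test-app   # placeholder name
  namespace: dashboard-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dashboard-test-app
  template:
    metadata:
      labels:
        app: dashboard-test-app
    spec:
      containers:
      - name: app
        image: nginx:1.15    # placeholder image
        resources:
          requests:
            cpu: 100m        # required so the HPA can compute CPU utilization
          limits:
            cpu: 500m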
Once those two things were done, I deleted and re-created the autoscaling config and ran a describe on it a number of times to see the updates as they came in, and everything now appears to be working.
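If you're doing the same from the command line, re-creating the autoscaler looks roughly like this (the deployment name and thresholds are placeholders, not my exact values):

kubectl delete hpa dashboard-test-app --namespace=dashboard-test
kubectl autoscale deployment dashboard-test-app --cpu-percent=50 --min=1 --max=5 --namespace=dashboard-test
kubectl describe hpa dashboard-test-app --namespace=dashboard-test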
For the Rancher team, it'd be really helpful if:
a) some documentation was added around autoscaling, which might have helped in this situation; even a note saying it's not supported would've saved me a week
b) someone took a look at this; I'm not knowledgeable enough yet to say whether it's a bug or a feature
c) autoscaling functionality was added to the GUI (if it isn't already there, in which case: where is it?)
Finally, thanks for making Rancher; it's a really useful tool and has allowed us to rapidly accelerate our learning and deployment of containers.
For completeness, here is the locked-down RBAC config I applied that works (in place of the admin ClusterRole binding used for testing):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: view-metrics
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: view-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view-metrics
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:kube-proxy
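To apply it, save both documents to a single file (the filename is arbitrary) and run:

kubectl apply -f view-metrics-rbac.yaml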