Enabled cluster-level monitoring BUT pod errors in operator-init* pods

We just enabled monitoring on our Rancher cluster by logging in to the Rancher interface, going to Tools -> Monitoring, and clicking the "Enable" button at the bottom.

The monitoring resources are placed in the namespace: cattle-prometheus

When I run: kubectl get pod -n cattle-prometheus -o wide

I am seeing 7 pods that have a STATUS of "Error":
NAME READY STATUS
alertmanager-cluster-alerting-0 2/2 Running
exporter-kube-state-cluster-monitoring-7c8bf7c7bb-g54ff 1/1 Running
exporter-node-cluster-monitoring-595qr 1/1 Running
exporter-node-cluster-monitoring-7pm4v 1/1 Running
exporter-node-cluster-monitoring-gz4w8 1/1 Running
exporter-node-cluster-monitoring-jg67g 1/1 Running
exporter-node-cluster-monitoring-vqfps 1/1 Running
exporter-node-cluster-monitoring-zzc6r 1/1 Running
grafana-cluster-monitoring-dcc77fbbb-tfxrb 2/2 Running
operator-init-cluster-monitoring-g9prd 0/1 Error
operator-init-cluster-monitoring-lx6kh 0/1 Error
operator-init-cluster-monitoring-wqgrn 0/1 Error
operator-init-monitoring-operator-8zcfx 0/1 Error
operator-init-monitoring-operator-bnt8h 0/1 Error
operator-init-monitoring-operator-cg4c7 0/1 Error
operator-init-monitoring-operator-hzrhg 0/1 Error
prometheus-cluster-monitoring-0 5/5 Running
prometheus-operator-monitoring-operator-5bb7ddcd65-bwxgm 1/1 Running

The logs in operator-init-cluster-monitoring contain the following:

customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured
The CustomResourceDefinition "podmonitors.monitoring.coreos.com" is invalid: 
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[0].type: Forbidden: must be empty to be structural
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[1].type: Forbidden: must be empty to be structural
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com configured

The logs in operator-init-monitoring-operator contain the following:

Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com configured
The CustomResourceDefinition "podmonitors.monitoring.coreos.com" is invalid: 
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[0].type: Forbidden: must be empty to be structural
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[1].type: Forbidden: must be empty to be structural
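For context on what the error means: since Kubernetes 1.16, CRD schemas are validated as "structural schemas," and one of the rules is that `type` must not be set inside `anyOf` branches. Int-or-string fields such as `targetPort` are instead supposed to carry the `x-kubernetes-int-or-string: true` marker. A simplified sketch of the two patterns (not the exact manifest Rancher ships, just an illustration of the rule the validator is enforcing):

```yaml
# Rejected by structural schema validation: `type` set inside anyOf branches
targetPort:
  anyOf:
  - type: integer
  - type: string

# Accepted: mark the field as int-or-string instead
targetPort:
  x-kubernetes-int-or-string: true
```

Newer prometheus-operator releases ship PodMonitor CRDs using the second form, which is presumably why upgrading resolves it.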

Since monitoring seems to be deployed by Rancher when you click the "Enable" button, I'm not sure why this isn't working. Has anyone seen this error, or can you advise?

Please add the Rancher version and the cluster's k8s version you are enabling monitoring on.

Rancher version: 2.4.6

k8s version (from kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Our Rancher version is 2.4.6 and Kubernetes version is v1.18.6

My co-worker just stood up a second RKE/Rancher cluster at a newer version, and when he enabled monitoring we are NOT seeing the errors any more. We are going to upgrade the cluster that has this problem. I'm not sure what version the newest cluster is at, but I've reached out to him to ask and will update this post once I know.

So the cluster running:

Rancher 2.4.8
RKE Kubernetes 1.18.8

does NOT have any pod errors in cattle-prometheus.

We will upgrade the 2.4.6/1.18.6 cluster to this version to overcome the errors in that cluster.

Hi! I have the same problem with Kubernetes v1.19.4 and Rancher 2.5.2. Any leads?