Enabled cluster-level monitoring BUT pod errors in operator-init* pods

We just enabled monitoring on our Rancher cluster by logging in to the Rancher interface, going to Tools -> Monitoring, and clicking the "Enable" button at the bottom.

The monitoring resources are placed in the namespace: cattle-prometheus

When I run: kubectl get pod -n cattle-prometheus -o wide

I am seeing 7 pods that have a STATUS of "Error":
NAME READY STATUS
alertmanager-cluster-alerting-0 2/2 Running
exporter-kube-state-cluster-monitoring-7c8bf7c7bb-g54ff 1/1 Running
exporter-node-cluster-monitoring-595qr 1/1 Running
exporter-node-cluster-monitoring-7pm4v 1/1 Running
exporter-node-cluster-monitoring-gz4w8 1/1 Running
exporter-node-cluster-monitoring-jg67g 1/1 Running
exporter-node-cluster-monitoring-vqfps 1/1 Running
exporter-node-cluster-monitoring-zzc6r 1/1 Running
grafana-cluster-monitoring-dcc77fbbb-tfxrb 2/2 Running
operator-init-cluster-monitoring-g9prd 0/1 Error
operator-init-cluster-monitoring-lx6kh 0/1 Error
operator-init-cluster-monitoring-wqgrn 0/1 Error
operator-init-monitoring-operator-8zcfx 0/1 Error
operator-init-monitoring-operator-bnt8h 0/1 Error
operator-init-monitoring-operator-cg4c7 0/1 Error
operator-init-monitoring-operator-hzrhg 0/1 Error
prometheus-cluster-monitoring-0 5/5 Running
prometheus-operator-monitoring-operator-5bb7ddcd65-bwxgm 1/1 Running

The logs in operator-init-cluster-monitoring contain the following:

customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured
The CustomResourceDefinition "podmonitors.monitoring.coreos.com" is invalid: 
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[0].type: Forbidden: must be empty to be structural
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[1].type: Forbidden: must be empty to be structural
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com configured

The logs in operator-init-monitoring-operator contain the following:

Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com configured
The CustomResourceDefinition "podmonitors.monitoring.coreos.com" is invalid: 
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[0].type: Forbidden: must be empty to be structural
* spec.validation.openAPIV3Schema.properties[spec].properties[podMetricsEndpoints].items.properties[targetPort].anyOf[1].type: Forbidden: must be empty to be structural
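For context on what the error means: since Kubernetes 1.16, CRD schemas are validated as "structural schemas," and one of the rules is that `type` must not be set inside `anyOf` branches. Int-or-string fields such as `targetPort` are instead supposed to carry the `x-kubernetes-int-or-string: true` marker. A simplified sketch of the two patterns (not the exact manifest Rancher ships, just an illustration of the rule the validator is enforcing):

```yaml
# Rejected by structural schema validation: `type` set inside anyOf branches
targetPort:
  anyOf:
  - type: integer
  - type: string

# Accepted: mark the field as int-or-string instead
targetPort:
  x-kubernetes-int-or-string: true
```

Newer prometheus-operator releases ship PodMonitor CRDs using the second form, which is presumably why upgrading resolves it.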

Since monitoring seems to be deployed by Rancher when you click the "Enable" button, I'm not sure why this isn't working. Has anyone seen this error, or can you advise?

Please add the Rancher version and the cluster's k8s version you are enabling monitoring on.

Rancher version: 2.4.6

k8s version (from kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Our Rancher version is 2.4.6 and Kubernetes version is v1.18.6

My co-worker just stood up a second RKE/Rancher cluster at a newer version, and when he enabled monitoring we are NOT seeing the errors any more. We are going to upgrade the cluster that has this problem. I'm not sure what version the newest cluster is at, but I've reached out to him to ask and will update this post once I know.

So the cluster running:

Rancher 2.4.8
RKE Kubernetes 1.18.8

does NOT have any pod errors in cattle-prometheus.

We will upgrade the 2.4.6/1.18.6 cluster to this version to overcome the errors in that cluster.

Hi! I have the same problem with Kubernetes v1.19.4 and Rancher 2.5.2. Any leads?