Alert manager and system-library-rancher-monitoring-0.0.2 not found

Hi

I’ve been running Rancher 2.2.3 for some time, and have now gotten around to setting up notifiers and assigning them to alerts. The cluster monitoring feature with Prometheus is NOT enabled.

However, we are now seeing this alert on the two clusters where I have enabled notifiers:

Failed to ensure catalog “catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2”: failed to find catalog by ID “catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2”: catalogtemplateversions.management.cattle.io “system-library-rancher-monitoring-0.0.2” not found

I would like to get input on what is happening within Rancher and what I can do to resolve this. It is worth noting that our setup does not have internet access, so we are running an air-gapped environment.

//Marcus

Here is a log snippet as well:

2019/06/13 11:07:52 [ERROR] ClusterAlertGroupController c-4twb9/node-alert [cluster-alert-group-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ClusterAlertGroupController c-4twb9/etcd-alert [cluster-alert-group-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ProjectController c-4twb9/p-hcpph [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found
2019/06/13 11:07:52 [ERROR] ProjectController c-4twb9/p-f7w2j [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found
2019/06/13 11:07:52 [ERROR] ClusterAlertRuleController c-4twb9/scheduler-system-service [cluster-alert-rule-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ClusterAlertRuleController c-4twb9/node-disk-running-full [cluster-alert-rule-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ClusterAlertRuleController c-4twb9/migrate-clusteralert-controllermanager [cluster-alert-rule-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ClusterAlertRuleController c-4twb9/deployment-event-alert [cluster-alert-rule-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ClusterAlertRuleController c-4twb9/high-cpu-load [cluster-alert-rule-deployer] failed with : deploy alertmanager failed, failed to find catalog by ID "catalog://?catalog=system-library&template=rancher-monitoring&version=0.0.2", catalogtemplateversions.management.cattle.io "system-library-rancher-monitoring-0.0.2" not found
2019/06/13 11:07:52 [ERROR] ProjectController c-jbmwt/p-pq8zh [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found
2019/06/13 11:07:52 [ERROR] ProjectController c-jbmwt/p-kwkgs [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found
2019/06/13 11:07:52 [ERROR] ProjectController c-jbmwt/p-vhrjp [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found
2019/06/13 11:09:55 [ERROR] CatalogController library [catalog] failed with : Timeout in HTTP GET to [https://git.rancher.io/charts/index.yaml], did not respond in 30s
2019/06/13 11:09:55 [ERROR] CatalogController system-library [catalog] failed with : Timeout in HTTP GET to [https://git.rancher.io/system-charts/index.yaml], did not respond in 30s

I removed all alerts from each of the two clusters and then restarted Docker and the Rancher container. Now I’m only seeing this in the log:

2019/06/13 11:40:53 [ERROR] ProjectController c-4twb9/p-f7w2j [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found
2019/06/13 11:40:53 [ERROR] ProjectController c-jbmwt/p-pq8zh [system-image-upgrade-controller] failed with : get template system-library-rancher-logging failed: catalogTemplate.management.cattle.io "cattle-global-data/system-library-rancher-logging" not found

However, the original message from the first post is still visible on the main cluster page, at /g/clusters. :(

I am having the same issue with rancher-v2.3.2.
Did you find a fix for the issue?


Yes, this was actually resolved.

We have an air-gapped environment, so this was the result of Rancher not being able to reach the repositories for the required Helm charts.

This call was made even though the monitoring feature was not enabled. I guess it is some kind of background job.

Once internet access was restored through a proxy, the error went away.
We then also realized that provisioning clusters on vSphere does not work with proxy settings, but that is another topic. Rancher now has direct internet access.
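
For anyone hitting the same error, one way to confirm whether the chart repositories are reachable is a small check like the sketch below. It is not part of Rancher; the function names are made up for illustration. The two URLs and the 30 second timeout are taken from the CatalogController lines in the log above. It relies on Python's default urllib opener, which picks up proxy settings from the HTTP_PROXY / HTTPS_PROXY environment variables, so it only tells you what is reachable from the shell where you run it, which may differ from what the Rancher container itself sees.

```python
import urllib.error
import urllib.request

# URLs copied from the CatalogController timeout lines in the log above.
CATALOG_INDEX_URLS = [
    "https://git.rancher.io/charts/index.yaml",
    "https://git.rancher.io/system-charts/index.yaml",
]


def check_url(url, timeout=30, opener=None):
    """Try one HTTP GET and return (ok, detail).

    `opener` is a hypothetical injection point so the check can be
    exercised without network access; by default the standard urllib
    opener is used, which honours proxy environment variables.
    """
    opener = opener or urllib.request.build_opener()
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers DNS failures, refused connections, timeouts, HTTP errors.
        return False, f"{type(exc).__name__}: {exc}"


def check_catalogs(urls=CATALOG_INDEX_URLS, timeout=30, opener=None):
    """Return a dict mapping each URL to its (ok, detail) result."""
    return {url: check_url(url, timeout, opener) for url in urls}
```

Calling `check_catalogs()` and printing the result should show, per URL, whether the index file could be fetched. If both fail from the Rancher host, the "not found" catalog errors are most likely a symptom of the same connectivity problem rather than a separate issue.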

I had the same problem. I configured the proxy, but that did not resolve it.