jmlp1
February 3, 2024, 8:01pm
1
Rancher v2.7.5 and v2.7.6 both have the same issue: notifications via Slack or Email receivers are not sent.
Followed the instructions here: How to set up Alertmanager configs in Monitoring V2 in Rancher | Support | SUSE
There are also open issues here:
GitHub issue (opened 04:49PM - 17 May 23 UTC; labels: kind/bug, area/monitoring, team/observability&backup):
**Rancher Server Setup**
- Rancher version: 2.7.3 - GKE: v1.24.10-gke.2300
**Information about the Cluster**
- Kubernetes version: GKE: v1.24.10-gke.2300
**Describe the bug**
After the creation of the AlertmanagerConfig nothing happens and the alerts/notifications are not sent to the Slack channel.
**To Reproduce**
- Create a secret containing the webhook URL for the Slack channel:
```bash
kubectl create secret generic my-secret-slack -n cattle-monitoring-system --from-literal=key="https://hooks.slack.com/services/T9VQ2SN9K/B0542USLL8J/XXXXXXXXXXXXXXXXXXXX"
```
- PrometheusRule:
<img width="1266" alt="Screen Shot 2023-05-17 at 14 03 38" src="https://github.com/rancher/rancher/assets/26934246/33f59259-9ebc-419c-8e2b-4f5d99fc5659">
<img width="1266" alt="Screen Shot 2023-05-17 at 14 02 52" src="https://github.com/rancher/rancher/assets/26934246/aef76c7d-c031-4438-9745-ae32c936b785">
- Create an AlertmanagerConfig in the UI, setting Slack as a receiver (a YAML sketch of this step is shown right after this list):
<img width="1266" alt="Screen Shot 2023-05-17 at 13 37 55" src="https://github.com/rancher/rancher/assets/26934246/00a6ff7b-e650-4164-97af-ea2baa40c53a">
<img width="1266" alt="Screen Shot 2023-05-17 at 13 38 46" src="https://github.com/rancher/rancher/assets/26934246/f5ff4760-3abd-4b76-8bc4-1ce3bec9b4c3">
<img width="1266" alt="Screen Shot 2023-05-17 at 13 39 30" src="https://github.com/rancher/rancher/assets/26934246/62a0e98d-6510-4538-9918-b7621032382d">
<img width="1266" alt="Screen Shot 2023-05-17 at 13 46 24" src="https://github.com/rancher/rancher/assets/26934246/f5d5d955-6252-44eb-8c71-ebf5b55f5c87">
The alert is in the UI, but no messages are arriving in the Slack channel:
<img width="1266" alt="Screen Shot 2023-05-17 at 14 02 24" src="https://github.com/rancher/rancher/assets/26934246/b945ab8b-7496-4a52-8c7b-ef3197b48cc6">
<img width="1266" alt="Screen Shot 2023-05-17 at 14 02 52" src="https://github.com/rancher/rancher/assets/26934246/66612b99-144f-4d20-9d18-47552b3f8e63">
Using `curl` on the command line, the message arrives normally in the Slack channel:
[Screenshot: a manual `curl` POST to the Slack webhook delivering a message successfully]
**Result**
No alerts are sent to the Slack channel.
**Expected Result**
Slack channel receives the alerts.
**Logs**
I believe there is nothing unusual in the logs either:
```bash
kubectl logs pod/alertmanager-rancher-monitoring-alertmanager-0 -n cattle-monitoring-system -c alertmanager
...
...
...
ts=2023-05-17T15:43:24.425Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T15:43:24.426Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T15:49:24.624Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T15:49:24.624Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T16:37:25.641Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T16:37:25.641Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
```
```bash
kubectl logs -f prometheus-rancher-monitoring-prometheus-0 -n cattle-monitoring-system -c prometheus
...
...
...
ts=2023-05-17T14:27:03.394Z caller=main.go:1181 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2023-05-17T14:27:03.401Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.401Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.401Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.402Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.402Z caller=kubernetes.go:326 level=info component="discovery manager notify" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.518Z caller=main.go:1218 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=124.398203ms db_storage=1.668µs remote_storage=3.097µs web_handler=1.809µs query_engine=1.742µs scrape=2.261779ms scrape_sd=1.441285ms notify=29.135µs notify_sd=337.088µs rules=115.691019ms tracing=8.936µs
ts=2023-05-17T15:00:02.173Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1684324800180 maxt=1684332000000 ulid=01H0N3K9FG9W8715R904QF5W71 duration=1.036452537s
ts=2023-05-17T15:00:02.213Z caller=head.go:844 level=info component=tsdb msg="Head GC completed" duration=37.948353ms
ts=2023-05-17T15:00:02.215Z caller=checkpoint.go:100 level=info component=tsdb msg="Creating checkpoint" from_segment=6 to_segment=7 mint=1684332000000
ts=2023-05-17T15:00:02.768Z caller=head.go:1013 level=info component=tsdb msg="WAL checkpoint complete" first=6 last=7 duration=553.32545ms
ts=2023-05-17T17:00:02.170Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1684332000023 maxt=1684339200000 ulid=01H0NAF0QGSQKQMCKTCG7WK47X duration=1.03365409s
ts=2023-05-17T17:00:02.214Z caller=head.go:844 level=info component=tsdb msg="Head GC completed" duration=41.518447ms
ts=2023-05-17T17:00:04.062Z caller=compact.go:460 level=info component=tsdb msg="compact blocks" count=3 mint=1684303200173 maxt=1684324800000 ulid=01H0NAF1S8GA622JRW4F53383X sources="[01H0MF03VY371H8TW0K9TF1DYC 01H0MNVV3Y9HX4BXAPF7CPRDZX 01H0MWQJBZTY1ES1GJ4ZDBDB3G]" duration=1.84544325s
ts=2023-05-17T17:00:04.070Z caller=db.go:1294 level=info component=tsdb msg="Deleting obsolete block" block=01H0MF03VY371H8TW0K9TF1DYC
ts=2023-05-17T17:00:04.076Z caller=db.go:1294 level=info component=tsdb msg="Deleting obsolete block" block=01H0MWQJBZTY1ES1GJ4ZDBDB3G
ts=2023-05-17T17:00:04.082Z caller=db.go:1294 level=info component=tsdb msg="Deleting obsolete block" block=01H0MNVV3Y9HX4BXAPF7CPRDZX
```
**Additional context**
I've double-checked this procedure before opening this GitHub issue: https://www.suse.com/support/kb/doc/?id=000020737
Any help would be appreciated!
Thank you
GitHub issue (opened 06:16AM - 25 Apr 23 UTC; labels: kind/bug, area/alerting, team/infracloud, team/opni):
I use Rancher 2.7.1. I have set an alert rule; I can see the rule is active and the alert is fired, as shown in the following images:
[Screenshot: the alert rule active and firing in the Rancher UI]
I also set the AlertmanagerConfig as follows:
[Screenshots: the AlertmanagerConfig email receiver and route settings]
I set an email receiver and a route for the new alert rule, but I do not receive any alert emails. Can someone help me? Thank you!
I tried Robusta in an AKS environment and it just worked; its Alertmanager sent notifications to Slack and email based on Prometheus alerts.
At the moment we are not receiving alerts from production Rancher clusters or even from test ones; clusters on different clouds all behave the same.
It would be great if someone could have a look into this.
Hi, in the Alertmanager GUI → Status section, can you see your Alertmanager config present there? We need to check whether the config is being reflected in Alertmanager. If the Alertmanager configuration is not getting updated, then look for the alertmanager-monitoring-rancher-monitor-alertmanager secret in the cattle-monitoring-system namespace. You need to delete this secret, since it contains the Alertmanager data; after deleting it, the new config will be applied. Wait for some time and, if the config is correct, it will be updated.
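Roughly the same checks from the command line would look like the sketch below. The pod name and config path are taken from the logs earlier in the thread; the secret name varies, so list the secrets first and double-check before deleting anything.
```bash
# Show the rendered configuration that Alertmanager actually loaded
# (pod name and file path taken from the logs above).
kubectl exec -n cattle-monitoring-system alertmanager-rancher-monitoring-alertmanager-0 \
  -c alertmanager -- cat /etc/alertmanager/config_out/alertmanager.env.yaml

# Find the secret holding the Alertmanager configuration, then delete it so the
# operator regenerates it; verify the exact name in your cluster first.
kubectl get secrets -n cattle-monitoring-system | grep -i alertmanager
kubectl delete secret <alertmanager-config-secret-name> -n cattle-monitoring-system
```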
jmlp1
February 17, 2024, 10:15am
3
Hi @vaishnav,
Tried your steps, easy to follow, and indeed the config was wrong. However, after deleting the secret it still showed the wrong config: it should contain Slack and email receivers instead of PagerDuty, which we don't use.
In addition, the Status appears disabled. I have tried reading a few internet articles and the SUSE documentation, but it is not really clear how to proceed to enable it.
This is the confusing part:
Routes and Receivers appear as deprecated; the receivers are empty, but the routes have a default route that matches the above, and this “default null” apparently cannot be deleted.
My configuration is set in the AlertmanagerConfigs section but does not appear to have been applied.
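(For what it's worth, one way to check whether the operator is selecting the AlertmanagerConfig at all is to compare the object's namespace and labels against the selectors on the Alertmanager resource. A sketch is below; the resource name is assumed from the pod name in the logs above and may differ.)
```bash
# List AlertmanagerConfig objects across namespaces.
kubectl get alertmanagerconfigs -A

# Show which AlertmanagerConfigs the Alertmanager resource is configured to pick up
# (resource name assumed from the pod name alertmanager-rancher-monitoring-alertmanager-0).
kubectl get alertmanager rancher-monitoring-alertmanager -n cattle-monitoring-system \
  -o jsonpath='{.spec.alertmanagerConfigSelector}{"\n"}{.spec.alertmanagerConfigNamespaceSelector}{"\n"}'
```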
If it's showing disabled, it's okay. Can you create an AlertmanagerConfig instead of using Routes and Receivers? You can configure the routes and receivers in the AlertmanagerConfig as well.
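As a sketch of that suggestion, an AlertmanagerConfig carrying both a route and an email receiver could look roughly like the following. Every name, address, and SMTP setting here is a placeholder, not a value from this thread.
```bash
# Illustrative sketch only -- every name, address and SMTP setting is a placeholder.
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: email-alerts                  # hypothetical name
  namespace: cattle-monitoring-system
spec:
  route:
    receiver: email-receiver
    groupBy: ['alertname']
    matchers:
      - name: severity
        value: critical
  receivers:
    - name: email-receiver
      emailConfigs:
        - to: alerts@example.com            # placeholder address
          from: alertmanager@example.com    # placeholder sender
          smarthost: smtp.example.com:587   # placeholder SMTP server
          authUsername: alertmanager@example.com
          authPassword:
            name: smtp-credentials          # hypothetical secret
            key: password
EOF
```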