Rancher 2.5.2 - One cluster's System project not viewable

On one of my RKE clusters (running v1.18.3, fwiw), I cannot view the System project - the UI shows System in grey with “Cluster not ready”, and the cursor turns into a red circle with a slash, indicating that I can’t click it.

From the kubectl command line, I can see the system services are running fine, so I don’t know what’s going on here. I have another RKE cluster in the same Rancher instance, also running v1.18.3, without this problem, as well as several clusters on v1.17 and an EKS cluster, none of which have the issue.
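
For reference, checks along these lines (run directly against the downstream cluster's own kubeconfig, not through Rancher) all come back clean:

```
# Run against the affected downstream cluster directly
kubectl get nodes
kubectl get pods -n kube-system
kubectl get pods -n cattle-system
kubectl get pods -n cattle-logging
```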

Where should I start looking?

If this helps, I found something odd when viewing the project in the API view (go into the cluster -> Projects/Namespaces -> click the three dots beside the System project -> View in API, then click Edit).

Two fields:

Field Name      Type    Value
state           string  error
transitioning   enum    error
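
If it matters, I believe the same state should be visible on the Project custom resource in the Rancher local cluster. The exact field paths are from memory, and c-xxxxx / p-xxxxx below are placeholders for this cluster's and project's IDs:

```
# Run with a kubeconfig for the Rancher local (management) cluster;
# Project objects live in a namespace named after the downstream cluster's ID
kubectl get projects.management.cattle.io -n c-xxxxx
kubectl get projects.management.cattle.io p-xxxxx -n c-xxxxx -o yaml   # look at status.conditions
```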

@spirilis Do you see anything in the rancher pod logs on the local cluster?
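
For example, something along these lines (assuming a Helm-installed Rancher running in the cattle-system namespace of the local cluster):

```
# Against the local (Rancher management) cluster
kubectl -n cattle-system logs -l app=rancher --tail=500 | grep -i error
```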

There is some interesting stuff, but it’s across multiple clusters. For example:

{"log":"2020/11/12 03:00:37 [ERROR] error syncing 'p-r45zp/projectalert-workload-alert': handler project-alert-group-controller: Failed
 to get service for alertmanager, services \"cattle-prometheus/alertmanager-operated\" not found, requeuing\n","stream":"stdout","time"
:"2020-11-12T03:00:37.49213735Z"}
{"log":"2020/11/12 03:00:37 [ERROR] error syncing 'p-42ls8/projectalert-workload-alert': handler project-alert-group-controller: Failed
 to get service for alertmanager, services \"cattle-prometheus/alertmanager-operated\" not found, requeuing\n","stream":"stdout","time"
:"2020-11-12T03:00:37.496114896Z"}
{"log":"2020/11/12 03:00:37 [ERROR] error syncing 'p-nq4ss/projectalert-workload-alert': handler project-alert-group-controller: Failed
 to get service for alertmanager, services \"cattle-prometheus/alertmanager-operated\" not found, requeuing\n","stream":"stdout","time"
:"2020-11-12T03:00:37.49684387Z"}
{"log":"2020/11/12 03:00:38 [ERROR] error syncing 'local/n-hwj5w': handler notifier-config-syncer: Update Webhook Receiver Config: Get
secret: secrets \"webhook-receiver\" not found, requeuing\n","stream":"stdout","time":"2020-11-12T03:00:38.115606262Z"}
{"log":"2020/11/12 03:00:38 [ERROR] error syncing 'c-7jf7k/node-alert': handler cluster-alert-group-controller: Update Webhook Receiver
 Config: Get secret: secrets \"webhook-receiver\" not found, requeuing\n","stream":"stdout","time":"2020-11-12T03:00:38.218844926Z"}
{"log":"2020/11/12 03:00:38 [ERROR] error syncing 'c-7jf7k/kube-components-alert': handler cluster-alert-group-controller: Update Webho
ok Receiver Config: Get secret: secrets \"webhook-receiver\" not found, requeuing\n","stream":"stdout","time":"2020-11-12T03:00:38.2639
0932Z"}
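
Out of curiosity, I can at least check whether the objects those errors reference actually exist. The service check runs against the affected downstream cluster; I'm not sure which namespace the webhook-receiver secret is supposed to live in, hence the grep:

```
kubectl -n cattle-prometheus get svc alertmanager-operated
kubectl get secrets -A | grep webhook-receiver
```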

That said, I just did an interesting experiment.

I created a new project “SystemZ” and tried moving all the system namespaces over to it. Initially the “Move” button was greyed out, but once I unchecked “cattle-logging”, the Move button turned blue and I was able to move them to SystemZ.
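
If I understand correctly, project membership is recorded on each namespace via a field.cattle.io/projectId annotation, so the result of the move can be double-checked with kubectl against the downstream cluster:

```
# Shows <cluster-id>:<project-id> for the namespace, e.g. c-xxxxx:p-xxxxx
kubectl get ns cattle-logging -o jsonpath='{.metadata.annotations.field\.cattle\.io/projectId}'
# Or just eyeball the full metadata
kubectl get ns cattle-logging -o yaml | grep projectId
```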

I renamed “System”, now containing only cattle-logging, to “SystemL” and renamed “SystemZ” to “System”.

Now I can manage the System project just fine - but SystemL now shows as greyed out with “Cluster not ready”.
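
Next I'll probably compare the two project objects on the local cluster to see which condition the old one (now SystemL) is stuck on - something like this, with c-xxxxx again being a placeholder for this cluster's ID:

```
# Map project display names back to their p-xxxxx IDs for this cluster
kubectl -n c-xxxxx get projects.management.cattle.io \
  -o custom-columns=ID:.metadata.name,NAME:.spec.displayName
```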