We are currently using Rancher version 2.6.9. While debugging some OIDC errors we are having trouble with our error logs: they are being flooded by error messages whose source we are unable to identify.
You may find the logs of the last 5 minutes in the gist.
We were able to identify several types of errors:
error syncing 'p-n7k7r/creator-project-owner': handler mgmt-auth-prtb-controller: clusters.management.cattle.io "c-fvg4w" not found, requeuing
error syncing 'p-pqzmm/creator-project-owner': handler auth-prov-v2-prtb: failed to update fleet-local/r-cluster-local-view-p-pqzmm-creator-project-owner-nk3rmcfzaj rbac.authorization.k8s.io/v1, Kind=RoleBinding for auth-prov-v2-prtb-rolebinding p-pqzmm/creator-project-owner: RoleBinding.rbac.authorization.k8s.io "r-cluster-local-view-p-pqzmm-creator-project-owner-nk3rmcfzaj" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty, metadata.ownerReferences.name: Invalid value: "": name must not be empty], requeuing
error syncing 'grb-ftw5p': handler grb-cluster-sync: Index with name by-cluster does not exist, requeuing
error syncing 'c-p6msc/p-jq749': handler system-image-upgrade-controller: upgrade cluster c-p6msc system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-p6msc] kubernetes version, requeuing
From our understanding, there could be some jobs running in the background that reference already deleted objects. Any suggestions on how we could clean this up?
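In case it helps, this is roughly how we have been trying to spot orphaned bindings ourselves (an untested sketch against the Rancher management/local cluster; the resource names are the standard management.cattle.io types, adjust if your setup differs):

```shell
# Clusters and projects Rancher still knows about
kubectl get clusters.management.cattle.io
kubectl get projects.management.cattle.io -A

# Every prtb with its projectName, so we can spot entries whose
# cluster/project ID no longer appears in the lists above
kubectl get projectroletemplatebindings.management.cattle.io -A \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PROJECT:.projectName
```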
Fwiw - it looks like these prtbs are from a project that was deleted, but the project namespace (in the Rancher-hosting RKE cluster) is still in a "Terminating" state. Guess these are remnants of an old bug. I am going to try deleting the remaining resources in that namespace.
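For reference, this is roughly how I am enumerating what is still left in the stuck namespace (p-pqzmm here is just one of the project namespaces from the logs above; substitute the actual Terminating namespace):

```shell
# `kubectl get all` misses most Rancher CRDs, so walk every listable
# namespaced resource type instead
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n p-pqzmm
```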
Well, if anyone else encounters the specific errors I found: it turns out someone had deleted some projects in the cluster, and the project namespaces (IN THE CLUSTER HOSTING RANCHER) were hanging around because the mgmt-auth-prtb-controller finalizer couldn't complete, since the "projectName:" field in the prtb object contained the project ID but not the cluster ID.
Editing the YAML for the prtb and prepending c-(clusterID): to the project ID in the projectName: field cleared the backlog; all those project namespaces have now terminated successfully, and the logs quit spamming Rancher's pods.
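For anyone who wants the concrete edit, here is a rough sketch of the equivalent kubectl patch (c-xxxxx is a placeholder for whichever cluster the project actually belonged to; the prtb namespace and name are taken from the log lines above, substitute your own):

```shell
# Rewrite projectName from "p-pqzmm" to "c-xxxxx:p-pqzmm" so the
# mgmt-auth-prtb-controller finalizer can resolve the project
kubectl patch projectroletemplatebindings.management.cattle.io \
  creator-project-owner -n p-pqzmm --type=merge \
  -p '{"projectName":"c-xxxxx:p-pqzmm"}'
```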