Rancher flooding logs with errors

We are currently using Rancher version 2.6.9. While debugging some OIDC errors, we noticed that our error logs are being flooded with error messages whose source we are unable to identify.

You may find the logs of the last 5 minutes in the gist.

We identified several types of errors:

error syncing 'p-n7k7r/creator-project-owner': handler mgmt-auth-prtb-controller: clusters.management.cattle.io "c-fvg4w" not found, requeuing

error syncing 'p-pqzmm/creator-project-owner': handler auth-prov-v2-prtb: failed to update fleet-local/r-cluster-local-view-p-pqzmm-creator-project-owner-nk3rmcfzaj rbac.authorization.k8s.io/v1, Kind=RoleBinding for auth-prov-v2-prtb-rolebinding p-pqzmm/creator-project-owner: RoleBinding.rbac.authorization.k8s.io "r-cluster-local-view-p-pqzmm-creator-project-owner-nk3rmcfzaj" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty, metadata.ownerReferences.name: Invalid value: "": name must not be empty], requeuing

error syncing 'grb-ftw5p': handler grb-cluster-sync: Index with name by-cluster does not exist, requeuing

error syncing 'c-p6msc/p-jq749': handler system-image-upgrade-controller: upgrade cluster c-p6msc system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-p6msc] kubernetes version, requeuing
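
For reference, checking whether the objects named in these errors still exist looks roughly like this (the IDs are taken from the errors above; this assumes kubectl access to the Rancher management/local cluster):

# Does the cluster referenced in the first error still exist?
kubectl get clusters.management.cattle.io c-fvg4w

# Are there leftover ProjectRoleTemplateBindings in the affected project namespace?
kubectl -n p-n7k7r get projectroletemplatebindings.v3.management.cattle.io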

From our understanding, there could be some jobs running in the background that reference already-deleted objects. Any suggestions on how we could clean this up?

Kind regards

Moritz

We have the same issue. Any suggestions will be appreciated.

I have something vaguely similar spamming my logs with 2.6.11, prtb-related:

2023/03/28 18:35:21 [ERROR] error syncing 'p-89qlv/prtb-z88r4': handler mgmt-auth-prtb-controller: cannot determine project and cluster from p-89qlv, requeuing
2023/03/28 18:35:21 [ERROR] error syncing 'p-89qlv/prtb-l4lfp': handler mgmt-auth-prtb-controller: cannot determine project and cluster from p-89qlv, requeuing
2023/03/28 18:35:21 [ERROR] error syncing 'p-89qlv/prtb-2rqq7': handler mgmt-auth-prtb-controller: cannot determine project and cluster from p-89qlv, requeuing
2023/03/28 18:35:21 [ERROR] error syncing 'p-89qlv/prtb-plm5c': handler mgmt-auth-prtb-controller: cannot determine project and cluster from p-89qlv, requeuing

Fwiw, it looks like these prtbs are from a project that was deleted, but the project namespace (in the Rancher-hosting RKE cluster) is still in a "Terminating" state. Guess this is a remnant of an old bug. I am going to try deleting the remaining resources in that namespace.
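
Listing what is actually left behind in a namespace like that looks something like this (namespace ID from my case, adjust as needed; the api-resources/xargs combination is just a generic way to sweep every namespaced type):

NS="p-89qlv"
# See what the namespace is still waiting on (status.conditions usually names the leftover types)
kubectl get namespace "$NS" -o yaml

# List every namespaced object remaining in the stuck namespace
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n1 kubectl get -n "$NS" --ignore-not-found --show-kind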

Well, if anyone else encounters the specific errors I found: it turns out someone deleted some projects in the cluster, and the project namespaces (IN THE CLUSTER HOSTING RANCHER) were hanging around because the mgmt-auth-prtb-controller finalizer couldn't complete. The "projectName:" field in the prtb objects had the project ID, but not the cluster ID.

Editing the YAML for each prtb and prepending c-(clusterID): to the project ID in the projectName: field cleared the backlog; all those project namespaces have now terminated successfully, and the logs quit spamming Rancher's pods.

E.g.:

apiVersion: management.cattle.io/v3
kind: ProjectRoleTemplateBinding
metadata:
  annotations:
    field.cattle.io/creatorId: user-hkvlr
  creationTimestamp: "2020-01-30T15:58:43Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-03-16T13:51:30Z"
  finalizers:
  - controller.cattle.io/mgmt-auth-prtb-controller
  generation: 3
  labels:
    cattle.io/creator: norman
  name: prtb-z8vjh
  namespace: p-r7hr6
  resourceVersion: "50750630"
  uid: a18bd8e2-f936-468a-ab63-e4d04e757472
projectName: p-r7hr6
roleTemplateName: project-owner
userName: u-x9pbj
userPrincipalName: local://u-x9pbj

Changing that "projectName" field to:

projectName: c-c4hlm:p-r7hr6

did the trick.
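
If you'd rather not hand-edit the YAML, the same change can be applied as a merge patch; a sketch using the prtb name, namespace and cluster ID from the example above (adjust all three for your own objects):

kubectl -n p-r7hr6 patch projectroletemplatebindings.v3.management.cattle.io prtb-z8vjh \
  --type merge -p '{"projectName":"c-c4hlm:p-r7hr6"}'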

Quick for/awk/sed script to rip through a terminating namespace's PRTBs to fix:

K8SNS="p-r7hr6"   # the terminating project namespace
# Re-point every PRTB in that namespace by prefixing its projectName with the cluster ID
for prtb in $(kubectl -n "$K8SNS" get projectroletemplatebindings.v3.management.cattle.io | grep '^prtb-' | awk '{print $1}'); do
    kubectl -n "$K8SNS" get projectroletemplatebindings.v3.management.cattle.io "$prtb" -o yaml > tmp.yml
    # Only rewrite bare project IDs, so re-running the loop doesn't double-prefix anything
    sed -i 's/^projectName: \(p-.*\)$/projectName: c-c4hlm:\1/' tmp.yml
    kubectl -n "$K8SNS" apply -f tmp.yml
done

Replace the "c-c4hlm" with the correct cluster ID as needed.
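
If you're not sure which cluster ID a project belonged to, the management cluster objects carry both the ID and the display name; something like this should list them (assuming spec.displayName is populated, which it normally is on management.cattle.io/v3 clusters):

kubectl get clusters.management.cattle.io -o custom-columns=ID:.metadata.name,NAME:.spec.displayName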