Upgrade from Rancher 2.8.3 to Rancher 2.9.1

dbrancher · September 10, 2024, 10:56pm

After I upgraded from Rancher 2.8.3 to 2.9.1, the gitjob pod went into CrashLoopBackOff. Here is the pod description:
kubectl describe pod gitjob-7d55659c45-4ksnb -n cattle-fleet-system
Name:                 gitjob-7d55659c45-4ksnb
Namespace:            cattle-fleet-system
Priority:             1000000000
Priority Class Name:  rancher-critical
Node:                 ip-10-98-4-221.ec2.internal/10.98.4.221
Start Time:           Tue, 10 Sep 2024 15:10:29 -0700
Labels:               app=gitjob
                      fleet.cattle.io/shard-default=true
                      fleet.cattle.io/shard-id=
                      pod-template-hash=7d55659c45
Annotations:          kubectl.kubernetes.io/restartedAt: 2024-09-10T15:10:29-07:00
Status:               Running
IP:                   10.98.5.196
IPs:
  IP:           10.98.5.196
Controlled By:  ReplicaSet/gitjob-7d55659c45
Containers:
  gitjob:
    Container ID:  containerd://a1646204257c3b3976ef6e1e2f9358e658b850b106dcf6ebbb05731260166a03
    Image:         rancher/fleet:v0.10.1
    Image ID:      docker.io/rancher/fleet@sha256:51452abec82be84dce773f04e8dcad6a0ec1acfc2a47ff257fac51a82b4b4c6f
    Port:          8081/TCP
    Host Port:     0/TCP
    Command:
      gitjob
      --gitjob-image
      rancher/tekton-utils:v0.1.11
    Args:
      fleetcontroller
      gitjob
      --gitjob-image
      rancher/fleet:v0.10.1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "gitjob": executable file not found in $PATH: unknown
      Exit Code:    128
      Started:      Wed, 31 Dec 1969 16:00:00 -0800
      Finished:     Tue, 10 Sep 2024 15:11:14 -0700
    Ready:          False
    Restart Count:  3
    Environment:
      NAMESPACE:                       cattle-fleet-system (v1:metadata.namespace)
      CATTLE_ELECTION_LEASE_DURATION:  30s
      CATTLE_ELECTION_RETRY_PERIOD:    10s
      CATTLE_ELECTION_RENEW_DEADLINE:  25s
      GITREPO_RECONCILER_WORKERS:      1
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tjhbq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-tjhbq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

It looks like either the rancher/fleet image does not contain the gitjob executable, or the command and args are wrong.
How can I fix this?

scheibling · September 12, 2024, 10:30pm

TL;DR: Delete the gitjob deployment and re-apply the Helm chart.

I just ran into the same issue. It seems the Helm chart isn’t actually removing the old gitjob deployment; instead, it merges the new deployment spec into the existing one.
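One way to check for this kind of merge is to print the `command` and `args` fields stored in the live deployment. The sketch below assumes the deployment is named gitjob (inferred from the ReplicaSet name in the pod description) and that its first container is the gitjob container; the function name is made up for illustration.

```shell
# Hypothetical helper: print the container's `command` and `args` as stored
# in the live Deployment. Assumes the Deployment is named "gitjob" and that
# its first container is the gitjob container.
show_gitjob_entrypoint() {
  kubectl get deployment gitjob --namespace cattle-fleet-system \
    -o jsonpath='command={.spec.template.spec.containers[0].command}{"\n"}args={.spec.template.spec.containers[0].args}{"\n"}'
}
```

Run `show_gitjob_entrypoint` against the affected cluster. If both fields come back non-empty, that matches the merged state seen in the pod description.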

Additionally, it looks like at some point the start command for that pod was moved from the “command” field (entrypoint fleetcontroller and command gitjob …) to the “args” field (entrypoint empty, args fleetcontroller gitjob …).

With both the old “command” and the new “args” present, the pod tries to start “gitjob … fleetcontroller gitjob …”. That fails because gitjob is a subcommand of fleetcontroller, not a standalone executable in the image.
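To make the failure concrete: Kubernetes starts a container by running the `command` list followed by the `args` list. A minimal sketch using the values from the pod description in the first post (the variable names are illustrative only):

```shell
# Values copied from the `kubectl describe pod` output in the first post.
pod_command='gitjob --gitjob-image rancher/tekton-utils:v0.1.11'
pod_args='fleetcontroller gitjob --gitjob-image rancher/fleet:v0.10.1'

# The container runtime receives `command` followed by `args` as one argv.
effective="$pod_command $pod_args"

# The first word is the binary that must exist in the image's $PATH.
binary=${effective%% *}

echo "argv:   $effective"
echo "binary: $binary"
```

The first word is gitjob, which is the executable the runtime reports as missing from $PATH in the rancher/fleet:v0.10.1 image.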

dbrancher · September 18, 2024, 10:44pm

Hi, thanks.

I deleted the gitjob deployment and then ran this:
helm upgrade rancher rancher-stable/rancher --namespace cattle-system -f rancher-sb-283-values.yaml --version=2.9.1

However, the gitjob deployment did not come back.
Do you mean I should re-apply a separate gitjob Helm chart, or something like that?

scheibling · September 23, 2024, 7:15am

Correct, you need to re-run the Rancher upgrade command once you’ve deleted the gitjob deployment.
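Put together, the steps in this thread could be scripted roughly as follows. This is a sketch, not an official procedure. The release name, chart reference, and values file come from the command posted earlier in the thread, and the deployment name gitjob is inferred from the pod description. The `run` wrapper and its DRY_RUN variable are a made-up convenience: they only print the commands unless DRY_RUN=0 is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 in the environment to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Remove the stale Deployment that still carries the old `command`.
run kubectl delete deployment gitjob --namespace cattle-fleet-system

# 2. Re-run the same Rancher upgrade command used earlier in the thread.
run helm upgrade rancher rancher-stable/rancher --namespace cattle-system \
  -f rancher-sb-283-values.yaml --version=2.9.1

# 3. Check whether the Deployment has been recreated.
run kubectl get deployment gitjob --namespace cattle-fleet-system
```

If the last command reports that the deployment is not found, it may simply not have been recreated yet; the thread does not say how long that takes.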
