Upgrade from Rancher 2.8.3 to Rancher 2.9.1

After upgrading Rancher from 2.8.3 to 2.9.1, the gitjob pod is stuck in CrashLoopBackOff. Here is the pod description:
kubectl describe pod gitjob-7d55659c45-4ksnb -n cattle-fleet-system
Name: gitjob-7d55659c45-4ksnb
Namespace: cattle-fleet-system
Priority: 1000000000
Priority Class Name: rancher-critical
Node: ip-10-98-4-221.ec2.internal/10.98.4.221
Start Time: Tue, 10 Sep 2024 15:10:29 -0700
Labels: app=gitjob
fleet.cattle.io/shard-default=true
fleet.cattle.io/shard-id=
pod-template-hash=7d55659c45
Annotations: kubectl.kubernetes.io/restartedAt: 2024-09-10T15:10:29-07:00
Status: Running
IP: 10.98.5.196
IPs:
IP: 10.98.5.196
Controlled By: ReplicaSet/gitjob-7d55659c45
Containers:
gitjob:
Container ID: containerd://a1646204257c3b3976ef6e1e2f9358e658b850b106dcf6ebbb05731260166a03
Image: rancher/fleet:v0.10.1
Image ID: docker.io/rancher/fleet@sha256:51452abec82be84dce773f04e8dcad6a0ec1acfc2a47ff257fac51a82b4b4c6f
Port: 8081/TCP
Host Port: 0/TCP
Command:
gitjob
--gitjob-image
rancher/tekton-utils:v0.1.11
Args:
fleetcontroller
gitjob
--gitjob-image
rancher/fleet:v0.10.1
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: StartError
Message: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "gitjob": executable file not found in $PATH: unknown
Exit Code: 128
Started: Wed, 31 Dec 1969 16:00:00 -0800
Finished: Tue, 10 Sep 2024 15:11:14 -0700
Ready: False
Restart Count: 3
Environment:
NAMESPACE: cattle-fleet-system (v1:metadata.namespace)
CATTLE_ELECTION_LEASE_DURATION: 30s
CATTLE_ELECTION_RETRY_PERIOD: 10s
CATTLE_ELECTION_RENEW_DEADLINE: 25s
GITREPO_RECONCILER_WORKERS: 1
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tjhbq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-tjhbq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: cattle.io/os=linux:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

It looks like the rancher/fleet image, or the command and args passed to it, cannot find the gitjob executable.
How can I fix this?

TL;DR: Delete the gitjob deployment and re-apply the Helm chart.

I just ran into the same issue. It seems that the Helm chart doesn’t actually remove the old gitjob deployment and instead merges the new deployment into it.

In addition, it looks like at some point the definition of the start command for that pod was switched from the “command” parameter (entrypoint fleetcontroller and command gitjob …) to the “args” parameter (entrypoint empty, args fleetcontroller gitjob …).

This combination results in the pod trying to start the command “gitjob … fleetcontroller gitjob …”, which fails because gitjob is a subcommand of fleetcontroller, not a standalone executable.
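
The failure mode described above can be sketched in a few lines of shell. Kubernetes starts a container by running the `command` field followed by the `args` field, so a `command` left over from the old deployment ends up in front of the new `args`. The two values below are copied from the `kubectl describe` output in the question; `effective_argv` is a hypothetical helper for illustration only, not part of kubectl or Fleet.

```shell
#!/bin/sh
# Values copied from the `kubectl describe` output in the question.
stale_command='gitjob --gitjob-image rancher/tekton-utils:v0.1.11'     # left over from the old deployment
new_args='fleetcontroller gitjob --gitjob-image rancher/fleet:v0.10.1' # set by the new chart

# effective_argv COMMAND ARGS
# Hypothetical helper: joins the two fields in the order the runtime uses
# (command first, then args). An empty command is skipped; in a real pod
# the image's own entrypoint would apply in that case.
effective_argv() {
  printf '%s\n' "${1:+$1 }$2"
}

# Full command line the container would be started with.
effective_argv "$stale_command" "$new_args"

# The first word is the binary the runtime tries to exec.
effective_argv "$stale_command" "$new_args" | cut -d' ' -f1
```

With the stale command in place the first word is gitjob, which matches the "executable file not found in $PATH" error. To compare against the live object, `kubectl get deployment gitjob -n cattle-fleet-system -o yaml` should show whether both `command` and `args` are set on the container.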

Hi, thanks.

I deleted the gitjob deployment and then ran this:
helm upgrade rancher rancher-stable/rancher --namespace cattle-system -f rancher-sb-283-values.yaml --version=2.9.1

but I didn’t see the gitjob deployment come back.
Do you mean I should re-apply the gitjob Helm chart, or something like that?

Correct, you need to re-run the upgrade command for Rancher once you’ve deleted the gitjob deployment
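
Putting the thread's advice together, the sequence might look like the sketch below. The deployment name `gitjob` is inferred from the ReplicaSet name in the describe output, the Helm command is the one quoted earlier in the thread, and `run` and `fix_gitjob` are hypothetical helper names. With DRY_RUN=1 (the default here) the commands are only printed, so nothing changes until it is set to 0.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echoes the command in dry-run mode, runs it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

fix_gitjob() {
  # 1. Remove the deployment that still carries the stale `command` field.
  run kubectl delete deployment gitjob -n cattle-fleet-system &&
  # 2. Re-run the same Rancher upgrade that was used in the thread.
  run helm upgrade rancher rancher-stable/rancher --namespace cattle-system \
    -f rancher-sb-283-values.yaml --version=2.9.1 &&
  # 3. Check whether the gitjob deployment has been recreated.
  run kubectl get deployment gitjob -n cattle-fleet-system
}

fix_gitjob
```

The final check may need to be repeated, since the thread does not say how long the deployment takes to reappear after the upgrade command is re-run.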