Longhorn helm upgrade to 1.7.1 stuck

My upgrade job in the kube-system namespace is stuck with the log messages below. The pods in the longhorn-system namespace still look intact. However, the failed upgrade has somehow degraded Longhorn, and many PVCs no longer work.

Maybe someone knows a proper way out of this situation? This is not a production k3s installation, but I invested some time in getting it to where it was and would like not to lose the data stored in Longhorn. It’s a 3-server, 3-agent cluster and I am using kube-hetzner.

if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
    echo "KUBERNETES_SERVICE_HOST is using IPv6"
    CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
    CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
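For reference, the snippet above is plain bash pattern substitution (`${var//pattern/replacement}`) that injects the API server endpoint into the chart URL. A standalone sketch with hypothetical values (the host, port, and chart URL here are made up for illustration; in the real job they come from the pod's environment):

```shell
#!/usr/bin/env bash
# Hypothetical inputs; the real job gets these from the pod environment.
KUBERNETES_SERVICE_HOST="fd00::1"
KUBERNETES_SERVICE_PORT="6443"
CHART="https://%{KUBERNETES_API}%/static/charts/longhorn.tgz"

if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
    # IPv6 addresses must be bracketed inside URLs
    CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
    CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
echo "${CHART}"
```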

set +v -x
+ [[ '' == \v\2 ]]
+ shopt -s nullglob
+ [[ -f /config/ca-file.pem ]]
+ [[ -f /tmp/ca-file.pem ]]
+ [[ false == \t\r\u\e ]]
+ [[ false == \t\r\u\e ]]
+ [[ -n '' ]]
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/longhorn.tgz.base64
+ CHART_PATH=/tmp/longhorn.tgz
+ [[ ! -f /chart/longhorn.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ [[ longhorn/longhorn == stable/* ]]
+ [[ -n https://charts.longhorn.io ]]
+ [[ -f /auth/username ]]
+ [[ -f /auth/tls.crt ]]
+ helm repo add longhorn https://charts.longhorn.io
"longhorn" already exists with the same configuration, skipping
+ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "longhorn" chart repository
Update Complete. ⎈Happy Helming!⎈
+ helm_update install --namespace longhorn-system --version '*'
++ ++ jq -r '"\(.[0].chart),\(.[0].status)"'
helm ls --all -f '^longhorn$' --namespace longhorn-system --output json
++ tr '' ''
+ LINE=longhorn-1.7.1,uninstalling
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ longhorn-1.7.1 =~ ^(|null)$ ]]
+ [[ uninstalling =~ ^(pending-install|pending-upgrade|pending-rollback|uninstalling)$ ]]
Previous helm job was interrupted, updating status from uninstalling to failed
+ echo Previous helm job was interrupted, updating status from uninstalling to failed
+ echo 'Resetting helm release status from '\''uninstalling'\'' to '\''failed'\'''
+ helm set-status longhorn failed --namespace longhorn-system
2024/09/24 05:12:14 release longhorn status updated
+ [[ uninstalling == \p\e\n\d\i\n\g\-\u\p\g\r\a\d\e ]]
+ STATUS=failed
+ [[ failed =~ ^deployed$ ]]
+ [[ failed =~ ^(deleted|failed|null|unknown)$ ]]
+ [[ reinstall == \r\e\i\n\s\t\a\l\l ]]
+ echo 'Uninstalling failed helm chart'
+ helm uninstall longhorn --namespace longhorn-system --wait
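The decisive part of the trace is where the job parses the `helm ls` output and finds the release stuck in `uninstalling`. A minimal reproduction of that parsing and status reset (simplified from the trace, not the actual job script):

```shell
#!/usr/bin/env bash
# The job joins chart name and status with a comma, then splits them apart.
LINE="longhorn-1.7.1,uninstalling"   # the value seen in the trace above

IFS=, read -r INSTALLED_VERSION STATUS _ <<< "${LINE}"

# A release stuck in a transitional state means the previous job was
# interrupted; it is reset to "failed" so the failure path can run.
if [[ ${STATUS} =~ ^(pending-install|pending-upgrade|pending-rollback|uninstalling)$ ]]; then
    echo "Previous helm job was interrupted, updating status from ${STATUS} to failed"
    STATUS=failed
fi
echo "${INSTALLED_VERSION},${STATUS}"
```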

This log looks weird… Had you tried to uninstall Longhorn before this upgrade? Or was the upgrade somehow trying to uninstall and then re-install with the new version? Could you describe in more detail how you performed the upgrade?

Thanks, I was finally able to get 1.7.1 to install, but only by using

kubectl edit settings.longhorn.io deleting-confirmation-flag -n longhorn-system

That, as I suspected, removed all volumes, but at least it produced a working Longhorn system.

The upgrade was initiated by k3s, or maybe by something that the kube-hetzner/terraform-hcloud-kube-hetzner project installs - it is quite the black box for me, and I only activated Longhorn in the project’s kube.tf file.

I still have replicas in /var/longhorn but cannot mount them; they show numerous file system errors. I was able to recover some of my data, and since this is a hobby project, not much was lost. It seems like good advice, however, to back up Longhorn data before installing a new release - which I did not do.

We encountered a similar issue before, caused by the helm controller’s FAILURE_POLICY: reinstall. I suspect this is also the culprit in your case, since kube-hetzner uses the Helm controller built into k3s as well. We are investigating how we can improve this.
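For context, the behavior I mean looks roughly like this (a simplified sketch of the failure-policy decision, not the actual klipper-helm code; the variable names mirror the trace above): with FAILURE_POLICY set to reinstall, a release in a failed state is uninstalled and installed fresh instead of being upgraded in place - which is what deletes the Longhorn resources.

```shell
#!/usr/bin/env bash
# Simplified illustration of the failure handling visible in the trace;
# assumes FAILURE_POLICY=reinstall, as in the affected setups.
FAILURE_POLICY=reinstall
STATUS=failed                 # the status after the reset in the trace
ACTION=upgrade

if [[ ${STATUS} =~ ^(deleted|failed|null|unknown)$ ]]; then
    if [[ ${FAILURE_POLICY} == reinstall ]]; then
        # Instead of upgrading in place, the chart is removed and reinstalled.
        ACTION="uninstall-then-install"
    fi
fi
echo "${ACTION}"
```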

Ref the Longhorn ticket: “[FEATURE] Support/verify installing Longhorn Helm chart using helm-controller which is built in k3s and rke2” (longhorn/longhorn#9506 on GitHub)


I’m guessing this crappy forum website software did something to mangle what you posted.
When you ran the kubectl edit command, what did you change? Did you delete the block of YAML that contained “deleting-confirmation-flag”?