Longhorn helm upgrade to 1.7.1 stuck

My upgrade job in the kube-system namespace is stuck with the log messages below. The pods in the longhorn-system namespace still look intact. However, the failed upgrade has degraded Longhorn somehow, and many PVCs no longer work.

Does anyone know a proper way out of this situation? This is not a production k3s installation, but I invested some time in getting it to where it was and would like not to lose the data stored in Longhorn. It's a 3-server, 3-agent cluster and I am using kube-hetzner.

if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
    echo "KUBERNETES_SERVICE_HOST is using IPv6"
    CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
    CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi

set +v -x
+ [[ '' == \v\2 ]]
+ shopt -s nullglob
+ [[ -f /config/ca-file.pem ]]
+ [[ -f /tmp/ca-file.pem ]]
+ [[ false == \t\r\u\e ]]
+ [[ false == \t\r\u\e ]]
+ [[ -n '' ]]
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/longhorn.tgz.base64
+ CHART_PATH=/tmp/longhorn.tgz
+ [[ ! -f /chart/longhorn.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ [[ longhorn/longhorn == stable/* ]]
+ [[ -n https://charts.longhorn.io ]]
+ [[ -f /auth/username ]]
+ [[ -f /auth/tls.crt ]]
+ helm repo add longhorn https://charts.longhorn.io
"longhorn" already exists with the same configuration, skipping
+ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "longhorn" chart repository
Update Complete. ⎈Happy Helming!⎈
+ helm_update install --namespace longhorn-system --version '*'
++ ++ jq -r '"\(.[0].chart),\(.[0].status)"'
helm ls --all -f '^longhorn$' --namespace longhorn-system --output json
++ tr '' ''
+ LINE=longhorn-1.7.1,uninstalling
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ longhorn-1.7.1 =~ ^(|null)$ ]]
+ [[ uninstalling =~ ^(pending-install|pending-upgrade|pending-rollback|uninstalling)$ ]]
Previous helm job was interrupted, updating status from uninstalling to failed
+ echo Previous helm job was interrupted, updating status from uninstalling to failed
+ echo 'Resetting helm release status from '\''uninstalling'\'' to '\''failed'\'''
+ helm set-status longhorn failed --namespace longhorn-system
2024/09/24 05:12:14 release longhorn status updated
+ [[ uninstalling == \p\e\n\d\i\n\g\-\u\p\g\r\a\d\e ]]
+ STATUS=failed
+ [[ failed =~ ^deployed$ ]]
+ [[ failed =~ ^(deleted|failed|null|unknown)$ ]]
+ [[ reinstall == \r\e\i\n\s\t\a\l\l ]]
+ echo 'Uninstalling failed helm chart'
+ helm uninstall longhorn --namespace longhorn-system --wait

This log looks odd. Did you try to uninstall Longhorn before this upgrade? Or was the upgrade somehow trying to uninstall and then reinstall with the new version? Could you describe in more detail how you performed the upgrade?

Thanks, I was finally able to get 1.7.1 to install, but only by using

kubectl edit settings.longhorn.io deleting-confirmation-flag -n longhorn-system

That, as I suspected, removed all volumes, but at least it produced a working Longhorn system.
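For anyone who wants to script the step above instead of using an interactive `kubectl edit`, here is a minimal sketch. It assumes the setting stores its state in a top-level `value` field and that `"true"` is what was set in the editor; neither is stated in this thread. The function name and the `DRY_RUN` variable are made up for illustration. It only prints the command by default, because setting this flag allows an uninstall to delete all volumes, as happened here.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive variant of the `kubectl edit` step above.
# WARNING: with this flag set, uninstalling Longhorn removes its volumes.

NAMESPACE="longhorn-system"           # namespace used in this thread
SETTING="deleting-confirmation-flag"  # setting name used in this thread

# Hypothetical helper; prints the command unless DRY_RUN=0 is given.
set_deleting_confirmation_flag() {
    # Assumption: the Setting object keeps its state in a top-level "value" field.
    local cmd=(kubectl patch settings.longhorn.io "$SETTING"
               -n "$NAMESPACE" --type=merge -p '{"value":"true"}')
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        # Default: show what would be run, change nothing.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Calling `set_deleting_confirmation_flag` prints the command for review; `DRY_RUN=0 set_deleting_confirmation_flag` would actually apply it. Back up any data you still need first.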

The upgrade was initiated by k3s, or maybe by something that the kube-hetzner/terraform-hcloud-kube-hetzner project (on GitHub) installs. It is quite the black box for me; I only activated Longhorn in the kube.tf file of this project.

I still have replicas in /var/longhorn but cannot mount them; they show numerous file system errors. I was able to recover some of my data, and since this is a hobby project, not much was lost. Still, it seems good advice to back up Longhorn data before installing a new release, which I did not do.

We encountered a similar issue before, caused by the helm controller's FAILURE_POLICY: reinstall. I suspect this is also the culprit in your case, since kube-hetzner uses the Helm Controller inside k3s as well. We are investigating how we can improve it.
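To make the sequence easier to follow, the path the job took in the log from the first post can be sketched roughly as below. This is a simplified reconstruction from the trace, not the actual job script: the function name `next_action` is hypothetical, and branches the log does not show are reported as such rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the branches visible in the job trace.
# It only prints the action taken in the log; it never calls helm.

next_action() {
    local status="$1" failure_policy="$2"

    # An interrupted job leaves a transitional status; the trace shows it
    # being reset to "failed" via `helm set-status`.
    if [[ "$status" =~ ^(pending-install|pending-upgrade|pending-rollback|uninstalling)$ ]]; then
        status="failed"
    fi

    if [[ "$status" =~ ^(deleted|failed|null|unknown)$ ]] \
        && [[ "$failure_policy" == "reinstall" ]]; then
        # This is the step that ran `helm uninstall longhorn --wait`.
        echo "uninstall"
    else
        # Any other combination does not appear in the posted log.
        echo "not-shown-in-log"
    fi
}

next_action uninstalling reinstall   # the case from this thread
```

With the status `uninstalling` and the policy `reinstall`, as in this thread, the sketch ends at the uninstall step, which is where the posted log stops.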

Ref: Longhorn issue #9506 (longhorn/longhorn on GitHub), "[FEATURE] Support/verify installing Longhorn Helm chart using helm-controller which is built in k3s and rke2".
