Node does not exist

I created a new PVC today, the first one since upgrading to 1.1.1. I'm not sure whether the upgrade is related (I lean toward it NOT being related, but figured I would mention it).

Warning  FailedAttachVolume  <invalid> (x11 over 6m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-22cf1f37-f0f2-4203-9918-904816a80841" : rpc error: code = NotFound desc = ControllerPublishVolume: the node kube-1 does not exist

I have tried deleting the pod so that it gets scheduled somewhere else, but I get the same error for kube-2 and kube-5.

The Longhorn UI shows all the nodes, even the ones the error says do not exist.

Has anyone seen this?

Hmm, the error doesn’t look familiar. The attach could just be temporarily blocked. If you can provide us with your support bundle, we can get a better idea of what is happening.

You can attach the support bundle here or send it to longhorn-support-bundle@Suse.com with the issue number - 20666.

I have sent over the support bundle. I saw this happen with a couple of nodes, so I'm not sure if I have something SERIOUSLY wrong in the underlying cluster or what. It seems to only be happening with this new storage request. All my existing ones seem to be fine.

Thank you @CrankyCoder, we’ve received it. Could you also create a bug report with the reproduction steps and environment info?

I can, but I don’t know how it manifested to begin with. So I am unsure if it’s a bug or something wrong with my cluster or what.

Is it reproducible on a new cluster? Are you able to add a new node and attach a volume to that node to see whether it’s schedulable?

We also saw quite a lot of logs coming from the metrics server. Are you able to disable it and see if that makes a difference?

2021-07-07T15:02:08.240143213Z time="2021-07-07T15:02:08Z" level=warning msg="error during scrape" collector=node error="the server could not find the requested resource (get nodes.metrics.k8s.io)" node=kube-1

I am building a new cluster starting today. Longhorn will be the first “workload” to go into it as soon as it’s online.

Hello there!
I encountered the same issue, and it was solved by redeploying a bit later. So I cannot provide more information on the topic, but I wanted to check whether you were eventually able to solve this on your side as well?
Cheers

What version of Longhorn are you using?

ControllerPublishVolume is the CSI call that sends the actual attach request to the Longhorn API.

Could you run kubectl get csinodes to see if the target node is in the list?
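
If it helps, the check above can be scripted. This is only a rough sketch, not an official Longhorn tool: it assumes the Longhorn CSI driver registers under the name driver.longhorn.io, and the helper names nodes_missing_driver and fetch_csinodes are made up for illustration. It reads the JSON form of kubectl get csinodes and reports which of the nodes you expect have no CSINode entry listing that driver.

```python
import json
import subprocess

# Assumption: name under which the Longhorn CSI driver registers itself.
LONGHORN_DRIVER = "driver.longhorn.io"


def nodes_missing_driver(csinode_list, expected_nodes, driver=LONGHORN_DRIVER):
    """Return the expected nodes that have no CSINode entry listing `driver`."""
    registered = set()
    for item in csinode_list.get("items", []):
        # spec.drivers can be absent or null when no CSI driver has registered.
        drivers = item.get("spec", {}).get("drivers") or []
        if any(d.get("name") == driver for d in drivers):
            registered.add(item["metadata"]["name"])
    return sorted(set(expected_nodes) - registered)


def fetch_csinodes():
    """Run `kubectl get csinodes -o json` and parse its output."""
    result = subprocess.run(
        ["kubectl", "get", "csinodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For the nodes mentioned in this thread, calling nodes_missing_driver(fetch_csinodes(), ["kube-1", "kube-2", "kube-5"]) would list any of them that are missing the driver registration. An empty list would suggest the CSINode objects look fine and the problem is elsewhere.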

I'm having the same issue.

I created a new bug report here: [BUG] LongHorn is not detecting any nodes after 16 (Issue #7329, longhorn/longhorn on GitHub)

I cannot afford to recreate the cluster, as most of my pods in it are in production. But I really need answers, as I cannot complete the setup.

We are following up on your report in the GitHub ticket.