Hi! I am pretty new to Kubernetes in general, and I've been tinkering with a three-worker-node Rancher cluster for the last two months. Lab use only, just to learn. I've had a lot of trouble using NFS for PVCs (specifically permission issues once the image creates directories a few levels down and sets its own ownership, regardless of the uid/gid I set), and bind mounting from the node isn't really scalable, which led me to Longhorn.
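For context, this is roughly the securityContext I was setting on the NFS-backed pods (the names, image, and claim are placeholders, not my actual manifests); the container's entrypoint would still chown subdirectories it created and break permissions anyway:

```yaml
# Minimal sketch of what I was trying; names and uid/gid are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000        # hoping this would fix group ownership on the NFS mount
  containers:
    - name: app
      image: my-image:latest   # placeholder image
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: app-data-nfs   # hypothetical NFS-backed PVC
```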
I've read the documentation that's available and watched the couple of related Rancher Labs videos, but I still have a number of questions based on my uncertainty about what the expected behavior is.
In the Cloud Native 2019 video, it's noted that a StatefulSet is preferable to a Deployment for use with Longhorn, because an RWO volume can only be mounted on one node at a time. When deployed as a StatefulSet, can it be scaled beyond one pod/node by using the block-level replication to keep the volumes lined up?
*Edit:* To clarify, in testing I have been unable to scale beyond one live pod; every additional pod, regardless of node, gets stuck at "unable to attach volumes". (A sketch of the kind of StatefulSet I've been testing with is below.)
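For reference, this is roughly the StatefulSet I've been testing with (the app name, image, and sizes are placeholders). My understanding is that volumeClaimTemplates should give each replica its own Longhorn PVC, which is how a StatefulSet is supposed to scale with RWO storage:

```yaml
# Minimal sketch, assuming a "longhorn" StorageClass is installed;
# the app name, image, and storage size are placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: my-image:latest   # placeholder
          volumeMounts:
            - name: data
              mountPath: /data
  # Each replica should get its own PVC (data-app-0, data-app-1, ...),
  # so no two pods share one RWO volume.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 2Gi
```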
Is it designed to operate that way? If not:
I: How can you scale to multiple pods or nodes?

II: What is the point of defining a StatefulSet vs. just a single-pod Deployment?

III: Is there any way to switch nodes in this scenario other than detaching the volume in the Longhorn UI and then reattaching it to the node with the desired pod?

IV: What is the point of the replicas? Just data resiliency? *Edit:* Or HA in the scenario described below under #V?

V: In all of my tests of a failed-node scenario, I've never been able to get a volume to re-attach to a new pod without forcefully deleting the pod and detaching the volume. I see the expected timings (5 minutes, some time, 6 minutes, etc.) documented under the node-failure section, but unless I manually intervene, nothing changes. I assume this is because, as a StatefulSet, the pods come up one by one, and since the state of the last pod is unknown, the replacement won't finish creating. Is that correct/expected? *Edit:* Re-reading the document, I see that it is, although I've never seen that final six-minute step happen; I've had to detach the volume and re-attach it to another node myself. Is there a better approach I can take to automate this?

VI: Despite this, I've noticed many examples of Longhorn using Deployments. I see that as long as you lock your pods to the one node that has the volume mounted (see the sketch after this list), this works at scale. I assume these examples are just for show, and that this is not desirable since it could lead to data corruption (1 node, 3x pods, 1x volume). Is that correct?
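Regarding VI, this is the pattern I mean: a Deployment pinned to a single node via nodeAffinity, with all replicas sharing one RWO PVC (the node name, labels, image, and claim are placeholders, not from any specific example I saw):

```yaml
# Sketch of the "pin everything to one node" pattern I'm asking about;
# node name, labels, image, and PVC are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - worker-1   # placeholder node name
      containers:
        - name: app
          image: my-image:latest   # placeholder
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data   # one RWO PVC shared by all three pods
```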