My organization has built an on-site data-analytics platform that uses Rancher to manage the worker instances. We use Minio to store the Parquet files we keep for analysis and archiving.
We started off running Minio inside Kubernetes, but we ran into the same kind of challenges you describe with maintaining the integrity of persistent volumes. We've gotten much better at wrangling those in our vSphere cloud provider since then, but while we were developing the platform, it gave me pause about what I would have to do to ensure all my long-term storage stayed accessible in disaster or time-sensitive scenarios.
Ultimately I decided NOT to use Kubernetes for this, and instead to run Minio on a VM in the same network segment as Rancher/k8s but not managed by it. This has worked out very well. I would encourage you to consider that while k8s is an awesome, super powerful tool, it is not the right tool for every job. Sometimes simplicity in one area gives you the ability to manage complexity in another.
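
To give a rough idea of how simple it can be to confirm that a standalone Minio VM is reachable from the cluster side, here is a minimal sketch using only the Python standard library. It polls Minio's liveness endpoint (`/minio/health/live`). The function names and the example hostname are hypothetical placeholders, not something taken from our setup.

```python
import urllib.error
import urllib.request


def liveness_url(endpoint: str) -> str:
    """Build the liveness-probe URL for a Minio endpoint such as 'http://host:9000'."""
    return endpoint.rstrip("/") + "/minio/health/live"


def is_alive(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers its liveness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(liveness_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False
```

Calling something like `is_alive("http://minio.example.internal:9000")` from a pod or a worker node would then tell you whether the VM is answering, assuming the address is swapped for your own. It only checks that the server process is up, not that any particular bucket or object is readable.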