Hello,
When will Rancher support the vSphere CSI and CPI drivers? With the release of ESXi 6.7 U3, installing these drivers allows vSphere to map containers and their PVCs in the vSphere UI. This is the preferred method within Kubernetes for integrating with providers.
If someone has already managed to get this working, I would like to know how.
Here are the details:
https://github.com/kubernetes/cloud-provider-vsphere/blob/master/docs/book/tutorials/kubernetes-on-vsphere-with-kubeadm.md
Thanks in advance.
I’ve tried following the steps outlined at https://cloud-provider-vsphere.sigs.k8s.io/tutorials/kubernetes-on-vsphere-with-kubeadm.html, but obviously skipping the creation of the cluster, since Rancher takes care of that.
When applying the CPI manifests, the vsphere-cloud-controller-manager pod is not deployed, because of scheduling rules in the manifest.
I’ll see if I can work around this and report back.
@djpbessems any luck? I was having challenges myself with it back in September. With the VCP essentially end of life, I am concerned about the lack of support/integration of the CSI & CPI with Rancher. I'm going to start looking at this again myself; just wondering if anyone else has had any success.
The CPI/CSI providers are generic. There are a few differences I have found, due to the additional taints RKE applies and the fact that all components in RKE need to run in a container.
To start off, when you create a cluster, edit the cluster.yaml in Rancher/RKE with the following tweaks to the kubelet:
kubelet:
  fail_swap_on: false
  generate_serving_certificate: false
  extra_binds:
    - /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com:/var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com:rshared
    - /csi:/csi:rshared
  extra_args:
    cloud-provider: external
Now when nodes are provisioned via Rancher, you will see the additional taints before installing the CPI.
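For example, you can list the taints with kubectl; each freshly provisioned node should carry the node.cloudprovider.kubernetes.io/uninitialized taint in addition to the RKE role taints:
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'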
Now you can use the CPI install instructions.
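The daemonset below reads its vsphere.conf from a ConfigMap named cloud-config in kube-system, which the CPI instructions cover creating. A minimal sketch, where the vCenter address, datacenter, and credentials are placeholder values you must substitute:
tee $HOME/vsphere.conf > /dev/null << 'EOF'
[Global]
user = "administrator@vsphere.local"
password = "REPLACE_ME"
port = "443"
insecure-flag = "true"

[VirtualCenter "vcenter.example.com"]
datacenters = "DC1"
EOF
kubectl create configmap cloud-config --from-file=$HOME/vsphere.conf --namespace=kube-system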
A minor tweak is needed to the CPI daemonset manifest to allow it to tolerate the RKE taints:
tee $HOME/cloud-provider.yaml > /dev/null << EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vsphere-cloud-controller-manager
  namespace: kube-system
  labels:
    k8s-app: vsphere-cloud-controller-manager
spec:
  selector:
    matchLabels:
      k8s-app: vsphere-cloud-controller-manager
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: vsphere-cloud-controller-manager
    spec:
      nodeSelector:
        node-role.kubernetes.io/controlplane: "true"
      securityContext:
        runAsUser: 0
      tolerations:
        - key: node.cloudprovider.kubernetes.io/uninitialized
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/controlplane
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/etcd
          value: "true"
          effect: NoExecute
      serviceAccountName: cloud-controller-manager
      containers:
        - name: vsphere-cloud-controller-manager
          image: gcr.io/cloud-provider-vsphere/cpi/release/manager:latest
          args:
            - --v=2
            - --cloud-provider=vsphere
            - --cloud-config=/etc/cloud/vsphere.conf
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
          resources:
            requests:
              cpu: 200m
      hostNetwork: true
      volumes:
        - name: vsphere-config-volume
          configMap:
            name: cloud-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    component: cloud-controller-manager
  name: vsphere-cloud-controller-manager
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - port: 43001
      protocol: TCP
      targetPort: 43001
  selector:
    component: cloud-controller-manager
---
EOF
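The tee command above only writes the file; apply it to the cluster with:
kubectl apply -f $HOME/cloud-provider.yaml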
Once the cloud controller is installed, you will see that the taints are removed.
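A quick way to check (the uninitialized taint should be gone from every node):
kubectl describe nodes | grep Taints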
A similar tweak is needed for the CSI controller manifest, to allow it to tolerate the RKE taints:
tee csi-controller.yaml >/dev/null <<'EOF'
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: vsphere-csi-controller
  namespace: kube-system
spec:
  serviceName: vsphere-csi-controller
  replicas: 1
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      app: vsphere-csi-controller
  template:
    metadata:
      labels:
        app: vsphere-csi-controller
        role: vsphere-csi
    spec:
      serviceAccountName: vsphere-csi-controller
      nodeSelector:
        node-role.kubernetes.io/controlplane: "true"
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/etcd
          value: "true"
          effect: NoExecute
      dnsPolicy: "Default"
      containers:
        - name: csi-attacher
          image: quay.io/k8scsi/csi-attacher:v1.1.1
          args:
            - "--v=4"
            - "--timeout=300s"
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
        - name: vsphere-csi-controller
          image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com"]
          args:
            - "--v=4"
          imagePullPolicy: "Always"
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
            - name: X_CSI_MODE
              value: "controller"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
          ports:
            - name: healthz
              containerPort: 9808
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 5
            failureThreshold: 3
        - name: liveness-probe
          image: quay.io/k8scsi/livenessprobe:v1.1.0
          args:
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          volumeMounts:
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
        - name: vsphere-syncer
          image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
          args:
            - "--v=2"
          imagePullPolicy: "Always"
          env:
            - name: FULL_SYNC_INTERVAL_MINUTES
              value: "30"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
        - name: csi-provisioner
          image: quay.io/k8scsi/csi-provisioner:v1.2.2
          args:
            - "--v=4"
            - "--timeout=300s"
            - "--csi-address=$(ADDRESS)"
            - "--feature-gates=Topology=true"
            - "--strict-topology"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
      volumes:
        - name: vsphere-config-volume
          secret:
            secretName: vsphere-config-secret
        - name: socket-dir
          hostPath:
            path: /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com
            type: DirectoryOrCreate
---
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  attachRequired: true
  podInfoOnMount: false
EOF
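Note that the StatefulSet above mounts a Secret named vsphere-config-secret containing a csi-vsphere.conf, so that has to exist before the manifest is applied. A minimal sketch, where the cluster ID, vCenter address, datacenter, and credentials are placeholder values to substitute:
tee $HOME/csi-vsphere.conf > /dev/null << 'EOF'
[Global]
cluster-id = "my-rke-cluster"

[VirtualCenter "vcenter.example.com"]
insecure-flag = "true"
user = "administrator@vsphere.local"
password = "REPLACE_ME"
port = "443"
datacenters = "DC1"
EOF
kubectl create secret generic vsphere-config-secret --from-file=$HOME/csi-vsphere.conf --namespace=kube-system
kubectl apply -f csi-controller.yaml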
The node drivers don't need any tweaks, as they are a standard daemonset.
After this, you should be able to configure a storage class and consume it in your workloads; a sketch follows below.
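A minimal sketch of such a storage class plus a claim that consumes it; the names here are placeholders, but the provisioner must match the CSIDriver name csi.vsphere.vmware.com registered above:
tee vsphere-sc.yaml > /dev/null << 'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-sc
provisioner: csi.vsphere.vmware.com
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: vsphere-csi-sc
EOF
kubectl apply -f vsphere-sc.yaml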
I am also trying to get this working. I’m starting with a single “master” node running etcd and control plane and a single worker node.
Everything appears to deploy properly, but the driver is not showing up in the CSINode object:
Name:         myworker
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  storage.k8s.io/v1
Kind:         CSINode
Metadata:
  Creation Timestamp:  2020-01-22T14:41:32Z
  Owner References:
    API Version:  v1
    Kind:         Node
    Name:         myworker
    UID:          c3dbd1aa-e3f2-4655-8273-aaaada208a5e
  Resource Version:  1175261
  Self Link:         /apis/storage.k8s.io/v1/csinodes/myworker
  UID:               a401f00d-9ea2-48ba-8065-247ed25d021e
Spec:
  Drivers:  <nil>
Events:     <none>
Any suggestions?
savage
February 28, 2020, 4:53pm
Hi, newbie here. Do we use the Rancher cloud provider for vSphere when using CSI, or do we not need it? When I try to create a cluster without the cloud provider, the nodes never connect to the kubelet.
savage
February 28, 2020, 5:48pm
Not Rancher, I meant the Kubernetes cloud provider ::facepalm::
Bo-Zi
February 15, 2022, 4:18pm
The problem still exists in Rancher 2.6.3.
Thank you for the workaround!
Reported to Kubernetes SIGs on GitHub, as they might improve their part too.
Issue opened 04:11PM, 15 Feb 22 UTC (kind/bug):
**Is this a BUG REPORT or FEATURE REQUEST?**:
> Uncomment only one, leave it on its own line:
>
/kind bug
> /kind feature
**What happened**:
Can't install the vSphere CSI (100.1.0+up2.3.0) chart in Rancher 2.6.3 RKE.
The vsphere-csi-controller Pod is stuck in the Pending state with the error:
`0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node-role.kubernetes.io/etcd: true}, that the pod didn't tolerate.`
The chart install fails while waiting for the pod to start.
**What you expected to happen**:
The vSphere CSI (100.1.0+up2.3.0) chart is expected to install normally.
**How to reproduce it (as minimally and precisely as possible)**:
Deploy RKE on the vSphere platform with the help of Rancher 2.6.3.
**Anything else we need to know?**:
Looks like a taint/toleration mismatch.
The Deployment has the following tolerations
(pay attention to the etcd toleration "effect: **NoSchedule**"):
```
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: vsphere-csi-controller
  namespace: kube-system
spec:
  ...
  template:
    ...
    spec:
      ...
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/controlplane
        value: "true"
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        value: "true"
      - effect: NoSchedule
        key: node-role.kubernetes.io/etcd
        value: "true"
```
But the master node taints are the following
("effect: **NoExecute**", not **NoSchedule** as in the tolerations):
```
apiVersion: v1
kind: Node
metadata:
  ...
  labels:
    ...
    node-role.kubernetes.io/controlplane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: ""
spec:
  ...
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/controlplane
    value: "true"
  - effect: NoExecute
    key: node-role.kubernetes.io/etcd
    value: "true"
```
**Possible solution:**
Adding a toleration "key: node-role.kubernetes.io/etcd; effect: NoExecute" to the vsphere-csi-controller Deployment solved the problem, as sketched below.
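A sketch of the fix, merged into the Deployment's existing tolerations list shown above:
```
tolerations:
  ...
  - effect: NoExecute
    key: node-role.kubernetes.io/etcd
    value: "true"
```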
_Thanks to_ https://forums.rancher.com/t/kubernetes-cluster-on-vsphere-with-csi-and-cpi/15648/4
**Environment**:
- csi-vsphere version: 100.1.0+up2.3.0
- vsphere-cloud-controller-manager version: 100.1.0+up1.0.100
- Kubernetes version: v1.21.9
- vSphere version: 7.0.1 Build: 17005016
- OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS
- Kernel (e.g. `uname -a`): 5.4.0-99-generic
- Install tools: Rancher 2.6.3 + Ubuntu 20.04 cloud image VMDK
- Others: