Kubernetes Cluster on vSphere with CSI and CPI

Hello,
When will Rancher support the vSphere CSI and CPI drivers? With the release of ESXi 6.7 U3, installing these drivers allows vSphere to map containers and their PVCs in the vSphere UI. This is the preferred method within Kubernetes for integrating with providers.

If someone has already managed to get this working, I would like to know how.

Here are the details:
https://github.com/kubernetes/cloud-provider-vsphere/blob/master/docs/book/tutorials/kubernetes-on-vsphere-with-kubeadm.md

Thanks in advance.

I’ve tried following the steps outlined at https://cloud-provider-vsphere.sigs.k8s.io/tutorials/kubernetes-on-vsphere-with-kubeadm.html, but skipping the creation of the cluster itself, since Rancher takes care of that.
When applying the CPI manifests, the vsphere-cloud-controller-manager pod is not deployed because of the scheduling rules in the manifest.
I’ll see if I can work around this and report back.

@djpbessems any luck? I was having challenges with it myself back in September. With the in-tree VCP essentially at end of life, I am concerned about the lack of support/integration of the CSI and CPI with Rancher. I’m going to start looking at this again myself; just wondering if anyone else has had any success.

The CPI/CSI providers are generic. There are a few differences I have found, due to the additional taints RKE applies and the fact that all components in RKE need to run in containers.

To start off, when you create a cluster, edit the cluster.yaml in Rancher/RKE with the following tweaks to the kubelet:

    kubelet:
      fail_swap_on: false
      generate_serving_certificate: false
      extra_binds:
        - /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com:/var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com:rshared
        - /csi:/csi:rshared
      extra_args:
        cloud-provider: external

Now, when nodes are provisioned via Rancher, you will see the additional taints before the CPI is installed.
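As a quick check (not part of the upstream instructions), you can list the taints on the nodes; with cloud-provider: external set, new nodes carry the uninitialized taint in addition to the usual RKE control plane/etcd taints:

    # expect node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
    # alongside the RKE controlplane/etcd taints
    kubectl describe nodes | grep -A3 "Taints:"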

Now you can use the CPI install instructions.
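Part of those instructions is creating the cloud-config ConfigMap that the DaemonSet below mounts at /etc/cloud. A minimal sketch, assuming you have written /etc/kubernetes/vsphere.conf as described in the linked tutorial:

    kubectl create configmap cloud-config \
      --from-file=vsphere.conf=/etc/kubernetes/vsphere.conf \
      --namespace=kube-system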

A minor tweak is needed to the CPI DaemonSet manifest to allow it to tolerate the RKE taints:

tee $HOME/cloud-provider.yaml > /dev/null << EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vsphere-cloud-controller-manager
  namespace: kube-system
  labels:
    k8s-app: vsphere-cloud-controller-manager
spec:
  selector:
    matchLabels:
      k8s-app: vsphere-cloud-controller-manager
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: vsphere-cloud-controller-manager
    spec:
      nodeSelector:
        node-role.kubernetes.io/controlplane: "true"
      securityContext:
        runAsUser: 0
      tolerations:
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/controlplane
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/etcd
        value: "true"
        effect: NoExecute
      serviceAccountName: cloud-controller-manager
      containers:
        - name: vsphere-cloud-controller-manager
          image: gcr.io/cloud-provider-vsphere/cpi/release/manager:latest
          args:
            - --v=2
            - --cloud-provider=vsphere
            - --cloud-config=/etc/cloud/vsphere.conf
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
          resources:
            requests:
              cpu: 200m
      hostNetwork: true
      volumes:
      - name: vsphere-config-volume
        configMap:
          name: cloud-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    component: cloud-controller-manager
  name: vsphere-cloud-controller-manager
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - port: 43001
      protocol: TCP
      targetPort: 43001
  selector:
    component: cloud-controller-manager
---
EOF
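
Assuming the RBAC roles and role bindings from the upstream CPI instructions have already been applied, the DaemonSet above can then be applied with:

    kubectl apply -f $HOME/cloud-provider.yaml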

Once the cloud controller manager is installed, you will see that the uninitialized taint has been removed from the nodes.
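You can verify this with something like the check above; the node.cloudprovider.kubernetes.io/uninitialized taint should be gone (the RKE taints remain), and each node should now have a vSphere ProviderID:

    kubectl describe nodes | grep -E "Taints:|ProviderID:"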

A similar tweak is needed for the CSI controller manifest, to allow it to handle the RKE taints:

tee csi-controller.yaml >/dev/null <<'EOF'
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: vsphere-csi-controller
  namespace: kube-system
spec:
  serviceName: vsphere-csi-controller
  replicas: 1
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      app: vsphere-csi-controller
  template:
    metadata:
      labels:
        app: vsphere-csi-controller
        role: vsphere-csi
    spec:
      serviceAccountName: vsphere-csi-controller
      nodeSelector:
        node-role.kubernetes.io/controlplane: "true"
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/etcd
          value: "true"
          effect: NoExecute
      dnsPolicy: "Default"
      containers:
        - name: csi-attacher
          image: quay.io/k8scsi/csi-attacher:v1.1.1
          args:
            - "--v=4"
            - "--timeout=300s"
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
        - name: vsphere-csi-controller
          image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com"]
          args:
            - "--v=4"
          imagePullPolicy: "Always"
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
            - name: X_CSI_MODE
              value: "controller"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
          ports:
            - name: healthz
              containerPort: 9808
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 5
            failureThreshold: 3
        - name: liveness-probe
          image: quay.io/k8scsi/livenessprobe:v1.1.0
          args:
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          volumeMounts:
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
        - name: vsphere-syncer
          image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
          args:
            - "--v=2"
          imagePullPolicy: "Always"
          env:
            - name: FULL_SYNC_INTERVAL_MINUTES
              value: "30"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
        - name: csi-provisioner
          image: quay.io/k8scsi/csi-provisioner:v1.2.2
          args:
            - "--v=4"
            - "--timeout=300s"
            - "--csi-address=$(ADDRESS)"
            - "--feature-gates=Topology=true"
            - "--strict-topology"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
      volumes:
        - name: vsphere-config-volume
          secret:
            secretName: vsphere-config-secret
        - name: socket-dir
          hostPath:
            path: /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com
            type: DirectoryOrCreate
---
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  attachRequired: true
  podInfoOnMount: false
EOF
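
The controller references a Secret named vsphere-config-secret containing csi-vsphere.conf, as in the upstream CSI instructions. A sketch of creating it and applying the manifest, assuming you have written csi-vsphere.conf per the tutorial and already applied the upstream CSI RBAC manifests:

    kubectl create secret generic vsphere-config-secret \
      --from-file=csi-vsphere.conf=/etc/kubernetes/csi-vsphere.conf \
      --namespace=kube-system
    kubectl apply -f csi-controller.yaml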

The node drivers don’t need any tweaks, as they are a standard DaemonSet.
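Once the node DaemonSet from the upstream manifests is running, a quick (generic, not RKE-specific) way to confirm the driver has registered on each node:

    kubectl get csidrivers
    kubectl get csinodes -o yaml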

After this you should be able to configure a storage class and consume it in your workloads.
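For illustration only, here is a minimal sketch of a StorageClass and PVC; the class name, storage policy name, and size are placeholders you would adjust for your environment (the upstream tutorial also shows a datastore-URL-based variant):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: vsphere-csi-sc
    provisioner: csi.vsphere.vmware.com
    parameters:
      # placeholder SPBM policy name from your vCenter
      storagepolicyname: "vSAN Default Storage Policy"
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: vsphere-csi-sc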

I am also trying to get this working. I’m starting with a single “master” node running etcd and control plane and a single worker node.
Everything appears to deploy properly, but the driver is not showing in the CSINode object.

    Name:         myworker
    Namespace:    
    Labels:       <none>
    Annotations:  <none>
    API Version:  storage.k8s.io/v1
    Kind:         CSINode
    Metadata:
      Creation Timestamp:  2020-01-22T14:41:32Z
      Owner References:
        API Version:     v1
        Kind:            Node
        Name:            myworker
        UID:             c3dbd1aa-e3f2-4655-8273-aaaada208a5e
      Resource Version:  1175261
      Self Link:         /apis/storage.k8s.io/v1/csinodes/myworker
      UID:               a401f00d-9ea2-48ba-8065-247ed25d021e
    Spec:
      Drivers:  <nil>
    Events:     <none>

Any suggestions?

Hi, newbie here: do we use the Rancher cloud provider for vSphere when using CSI, or do we not need it? When I try to create a cluster without the cloud provider, the nodes never connect to the kubelet.

Not Rancher, I meant the Kubernetes cloud provider ::facepalm::

The problem still exists in Rancher 2.6.3.
Thank you for the workaround!
I reported it to the Kubernetes SIGs on GitHub, as they might improve their part too.