Rancher 2.0 and Apache Spark 2.3

Has anyone had any luck using Apache Spark 2.3 (v2.3.1) and Rancher 2.0 (v2.0.2)? I am trying to use the new functionality added in Apache Spark 2.3 to launch Spark jobs in Kubernetes without creating a standalone Spark server. This means that on each spark-submit a new spark-master pod and spark-executor pods will be created when a job is submitted (and then disappear when the job completes). So far, I have been using this command when attempting to connect to the K8s cluster that Rancher is deploying.

bin/spark-submit
–master k8s://https://{rancher_host}/k8s/clusters/c-vpvlm
–deploy-mode cluster
–name spark-pi
–class org.apache.spark.examples.SparkPi
–conf spark.kubernetes.namespace=spark
–conf spark.executor.instances=5
–conf spark.kubernetes.container.image={local repo path}/spark:v2.3.1
–conf spark.kubernetes.authenticate.submission.caCertFile={ca_file_path}.pem
–conf spark.kubernetes.authenticate.submission.oauthToken={oauth_token}
local:///tmp/spark-2.3.1/examples/jars/spark-examples_2.11-2.3.1.jar

However, this has been resulting in a Timeout Exception when attempting to connect to the Kubernetes cluster. Admittedly, myself (and my group) are still in the Kubernetes/Rancher PoC process, so I may have something mis-configured here and/or it may be something weird with Spark since this is currently an “experimental” feature (but I have seen it work with base K8s). Regardless, it would be great to get this working with Rancher since Spark is such a large part of the current Big Data stack.

Edit: It’s also worth noting that I also tried the host listed at https://{rancher_host}/k8s/clusters/c-vpvlm/api with the tcp/6443 port and got a 403 Forbidden in response.

Reference: https://spark.apache.org/docs/2.3.1/running-on-kubernetes.html

I’m having the exact same issue.

2018-07-03 18:12:56 WARN WatchConnectionManager:192 - Exec Failure
java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:230)
at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at okio.Okio$2.read(Okio.java:139)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
… 26 more
Exception in thread “main” io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:194)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:230)
at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
… 4 more
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at okio.Okio$2.read(Okio.java:139)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
… 26 more
2018-07-03 18:12:56 INFO ShutdownHookManager:54 - Shutdown hook called
2018-07-03 18:12:56 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-dc9b119a-ccc3-44e5-b9ec-a59e6b79fce1

Hi @robatts, it works fine for us using Spark v2.3.1 and Rancher v2.0.8, probably your certFile and oatuhToken are not correct, instead you can set you kubeconfig file for your rancher k8s cluster and the kubeconfig file can be retrieved through the rancher UI, you can verify with $kubectl cluster-info to make sure the kubelet can access the k8s cluster correctly, thanks.

Also, you will need to make the following change to resolve some additional run-time issues.

  1. the examples/jars/spark-examples_2.11-2.3.1.jar from the spark will encounter a runtime issue, instead we change it to https://github.com/JWebDev/spark/raw/master/spark-examples_2.11-2.3.1.jar
  2. you will need add clusterrolebinding for the default user with $kubectl apply -f spark-clusterrole.yaml as below.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: spark-cluster-rolebinding
subjects:
  - kind: ServiceAccount
    # Reference to upper's `metadata.name`
    name: default
    # Reference to upper's `metadata.namespace`
    namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io