Re: Spark 3 pod template for the driver


Re: Spark 3 pod template for the driver

Michel Sumbul
Hello,

Adding the dev mailing list; maybe there is someone here who can help by showing a valid/accepted pod template for Spark 3?

Thanks in advance,
Michel


On Fri, Jun 26, 2020 at 2:03 PM, Michel Sumbul <[hidden email]> wrote:
Hi Jorge,
If I set that in the spark-submit command it works, but I want it only in the pod template file.

Best regards,
Michel

On Fri, Jun 26, 2020 at 2:01 PM, Jorge Machado <[hidden email]> wrote:
Try to set spark.kubernetes.container.image

On 26. Jun 2020, at 14:58, Michel Sumbul <[hidden email]> wrote:

Hi guys,

I'm trying to use Spark 3 on top of Kubernetes and to specify a pod template for the driver.

Here is my pod manifest for the driver. When I do a spark-submit with the option:
--conf spark.kubernetes.driver.podTemplateFile=/data/k8s/podtemplate_driver3.yaml

I get an error message saying that I need to specify an image, even though it is in the manifest.
Is my manifest file wrong? What should it look like?
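For context, the full submit command looks roughly like this (the master URL, class name, and jar path below are placeholders, not the real ones):

spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name mySpark3App \
  --class org.example.MySpark3App \
  --conf spark.kubernetes.driver.podTemplateFile=/data/k8s/podtemplate_driver3.yaml \
  local:///opt/spark/jars/my-spark3-app.jar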

Thanks for your help,
Michel

--------
The pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: mySpark3App
  labels:
    app: mySpark3App
    customlabel/app-id: "1"
spec:
  securityContext:
    runAsUser: 1000
  volumes:
    - name: "test-volume"
      emptyDir: {}
  containers:
    - name: spark3driver
      image: mydockerregistry.example.com/images/dev/spark3:latest
      resources:
        requests:
          cpu: "1000m"
          memory: "512Mi"
        limits:
          cpu: "1000m"
          memory: "512Mi"
      volumeMounts:
        - name: "test-volume"
          mountPath: "/tmp"


Re: Spark 3 pod template for the driver

edeesis
If I had to hazard a guess, you still need to specify the executor image. As is, this will only specify the driver image.

You can specify it as --conf spark.kubernetes.container.image or --conf spark.kubernetes.executor.container.image.
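Concretely, that would mean adding something like this to the spark-submit (the image value is just taken from your manifest for illustration):

--conf spark.kubernetes.executor.container.image=mydockerregistry.example.com/images/dev/spark3:latest

or simply:

--conf spark.kubernetes.container.image=mydockerregistry.example.com/images/dev/spark3:latest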





Re: Spark 3 pod template for the driver

Michel Sumbul
Hi Edeesis,

The goal is to not have these settings in the spark-submit command. If I specify the same things in a pod template for the executor, I still get the message:
"Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image"

It doesn't even try to start an executor container, as the driver hasn't started yet.
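(For reference, the executor template is passed in the same way, roughly like this; the path is just an illustration:
--conf spark.kubernetes.executor.podTemplateFile=/data/k8s/podtemplate_executor3.yaml)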
Any idea?

Thanks,
Michel



Re: Spark 3 pod template for the driver

edeesis
Okay, I see what's going on here.

Looks like, the way Spark is coded, the driver container image (specified by --conf spark.kubernetes.driver.container.image) and the executor container image (specified by --conf spark.kubernetes.executor.container.image) are required.

If they're not specified, it falls back to --conf spark.kubernetes.container.image.

The way the "pod template" feature was coded, even if the image is specified in the YAML, those conf properties take priority and override the value set in the YAML file.

So basically, although you have it in the YAML file, you still need to specify those conf properties on the command line.
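In other words, keeping your template but repeating the image on the command line, something along these lines should get past that error (master, class, and jar are placeholders):

spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=mydockerregistry.example.com/images/dev/spark3:latest \
  --conf spark.kubernetes.driver.podTemplateFile=/data/k8s/podtemplate_driver3.yaml \
  --class org.example.MySpark3App \
  local:///opt/spark/jars/my-spark3-app.jar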

If, like you said, the goal is not to specify those in the spark-submit, you'll likely need to file an Improvement ticket in JIRA.



Re: Spark 3 pod template for the driver

edeesis
If I had to guess, it's likely because the Spark code would have to read the YAML to make sure the required parameters are set, and the current approach was just easier to build on without a lot of refactoring.

On Mon, Jul 6, 2020 at 5:06 PM Michel Sumbul <[hidden email]> wrote:
Thanks Edward for the reply!

I have the impression that's also the case for other settings, like the requested memory. Is that right?
I think I will create a ticket to allow the user to specify any configuration in a YAML file instead of having a long list of --conf parameters when submitting the job.
Unless there have been reasons for not doing it that way from the beginning?
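In the meantime, I guess a spark-defaults-style properties file passed with --properties-file can at least shorten the command line; something like this (file name and contents are purely illustrative):

# my-spark3-app.conf
spark.kubernetes.container.image         mydockerregistry.example.com/images/dev/spark3:latest
spark.kubernetes.driver.podTemplateFile  /data/k8s/podtemplate_driver3.yaml

spark-submit --properties-file my-spark3-app.conf (plus the usual --master/--class/jar arguments)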

thanks,
Michel
