[DISCUSS] [Spark confs] Making spark.jars conf take precedence over spark default classpath

[DISCUSS] [Spark confs] Making spark.jars conf take precedence over spark default classpath

nupurshukla
Hello,

I am prototyping a change in the behavior of the spark.jars conf for my
use case. The spark.jars conf is used to specify a list of jars to include on
the driver and executor classpaths.

Current behavior: The spark.jars conf value is not read until after the JVM
has already started and the system classloader has already loaded, and hence
the jars added using this conf get "appended" to the Spark classpath. This
means that Spark looks for a jar's classes on its default classpath first,
and only then at the paths specified in the spark.jars conf.

Proposed prototype: I am proposing a new behavior where spark.jars takes
precedence over the Spark default classpath in terms of how jars are
discovered. This can be achieved by using the
spark.{driver,executor}.extraClassPath conf. That conf modifies the actual
launch command of the driver (or executors), and hence its path is
"prepended" to the classpath and thus takes precedence over the default
classpath. Could the behavior of spark.jars be modified by adding its conf
value to the conf value of spark.{driver,executor}.extraClassPath during
argument parsing in SparkSubmitArguments.scala
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>
, so that we achieve the precedence order spark.jars >
spark.{driver,executor}.extraClassPath > Spark default classpath
(left-to-right precedence order)?

Pseudo sample code:
In loadEnvironmentArguments()
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>
:

if (jars != null) {
  if (driverExtraClassPath != null) {
    // Prepend the spark.jars value so its entries are resolved before
    // existing extraClassPath entries (spark.jars > extraClassPath).
    // Note: extraClassPath is normally separated with File.pathSeparator,
    // while spark.jars is comma-separated; a real patch would translate.
    driverExtraClassPath = jars + "," + driverExtraClassPath
  } else {
    driverExtraClassPath = jars
  }
}
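To achieve the stated precedence (spark.jars entries resolved first), the merge can be packaged as a small standalone helper. This is only a sketch of the idea, not Spark code: `ClasspathMerge` and `mergeClasspath` are hypothetical names, and the nullable `String` inputs mirror the vars in SparkSubmitArguments.

```scala
// Sketch: merge the spark.jars value into an extraClassPath value with
// spark.jars first, so its entries shadow later classpath entries.
// Either input may be null, mirroring SparkSubmitArguments fields.
object ClasspathMerge {
  def mergeClasspath(jars: String, extraClassPath: String): String =
    (Option(jars), Option(extraClassPath)) match {
      case (Some(j), Some(e)) => j + "," + e // spark.jars entries come first
      case (Some(j), None)    => j
      case (None, e)          => e.orNull
    }
}
```

The same helper would apply to executorExtraClassPath, since the proposal covers both spark.driver.extraClassPath and spark.executor.extraClassPath.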


What are your thoughts on this? Could this have any undesired side-effects?
Or has this already been explored and there are some known issues with this
approach?

As an example, consider these jars:
sample-jar-1.0.0.jar, present in Spark's default classpath
sample-jar-2.0.0.jar, present on all nodes of the cluster at path
/<somepath>/
new-jar-1.0.0.jar, present on all nodes of the cluster at path /<somepath>/
(and not in the Spark default classpath)

And two scenarios, in which 2 Spark jobs are submitted with the following
spark.jars conf values:

<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3705/Capture.png>
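To make the expected difference concrete, here is a toy model of first-match-wins class resolution using the example jars above. This is a sketch, not Spark code: the default-classpath location `spark/jars/` and the class name `com.example.Sample` (assumed to exist in both sample-jar versions) are hypothetical.

```scala
// Toy model of JVM class resolution: the first classpath entry that
// provides a class wins; the same class in later entries is shadowed.
object ResolutionOrder {
  def resolve(classpath: Seq[String],
              provides: Map[String, Set[String]],
              className: String): Option[String] =
    classpath.find(entry => provides.getOrElse(entry, Set.empty).contains(className))

  // Assume both sample-jar versions contain the hypothetical class com.example.Sample.
  val provides: Map[String, Set[String]] = Map(
    "spark/jars/sample-jar-1.0.0.jar"  -> Set("com.example.Sample"),
    "/<somepath>/sample-jar-2.0.0.jar" -> Set("com.example.Sample")
  )

  // Current behavior: spark.jars is appended, so the default-classpath jar wins.
  val appended  = Seq("spark/jars/sample-jar-1.0.0.jar", "/<somepath>/sample-jar-2.0.0.jar")
  // Proposed behavior: spark.jars is prepended, so sample-jar-2.0.0.jar wins.
  val prepended = Seq("/<somepath>/sample-jar-2.0.0.jar", "spark/jars/sample-jar-1.0.0.jar")
}
```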




Thanks,
Nupur



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: [DISCUSS] [Spark confs] Making spark.jars conf take precedence over spark default classpath

Imran Rashid
Hi Nupur,

Is what you're trying to do already possible via the spark.{driver,executor}.userClassPathFirst options?
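For context, spark.{driver,executor}.userClassPathFirst are real, existing Spark confs (marked experimental) that give user-added jars precedence via child-first classloading rather than by reordering the launch classpath. A sketch of how they might be used with the example jar above; the main class and application jar names are placeholders:

```shell
# Existing (experimental) confs: user-added jars take precedence over
# Spark's own classes via child-first classloading.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars /<somepath>/sample-jar-2.0.0.jar \
  --class com.example.Main \
  app.jar
```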


On Wed, Jul 22, 2020 at 5:50 PM nupurshukla <[hidden email]> wrote: