testing 0.9.0-incubating and maven

testing 0.9.0-incubating and maven

Alex Cozzi
Hi Patrick,
thank you for testing. I think I found out what is wrong: I am trying to build my own examples, which also depend on another library that in turn depends on Hadoop 2.2.
What was happening is that my library brings in Hadoop 2.2, while Spark depends on Hadoop 1.0.4, so I think I end up with conflicting versions of the classes.
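
(One way to confirm this kind of conflict, I think, is to dump the dependency tree filtered to the Hadoop artifacts:

    mvn dependency:tree -Dincludes=org.apache.hadoop

which shows which Hadoop version each dependency pulls in.)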

A couple of things are not clear to me:

1: do the published artifacts support YARN and Hadoop 2.2, or will I need to make my own build?
2: if they do, how do I activate the profiles in my Maven config? I tried mvn -Pyarn compile, but it does not work (Maven says “[WARNING] The requested profile "yarn" could not be activated because it does not exist.”)


Essentially, I would like to specify the Spark dependencies as:

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>0.9.0-incubating</version>
    </dependency>
</dependencies>

and tell Maven to use the “yarn” profile for this dependency, but I cannot seem to make it work.
Does anybody have a suggestion?

Alex

Re: testing 0.9.0-incubating and maven

Patrick Wendell
Hey Alex,

Maven profiles only affect the Spark build itself. They do not
transitively affect your own build.

Check out the docs for how to deploy applications on YARN:
http://spark.incubator.apache.org/docs/latest/running-on-yarn.html

When compiling your application, you should just explicitly add the
Hadoop version you depend on to your own build (e.g. a hadoop-client
dependency). Take a look at the example here where we show adding
hadoop-client:

http://spark.incubator.apache.org/docs/latest/quick-start.html
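
As a sketch, the extra dependency would look something like this (the
2.2.0 version string is illustrative; use whatever your cluster
actually runs):

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <!-- illustrative version; match your cluster -->
        <version>2.2.0</version>
    </dependency>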

When deploying Spark applications on YARN, you actually want to mark
Spark as a provided dependency in your application's Maven build,
bundle your application as an assembly jar, and then submit it with a
Spark YARN bundle to a YARN cluster. The instructions are the same as
they were in 0.8.1.
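
A rough sketch of the provided-scope part, plus one way to build the
assembly jar with the shade plugin (the plugin choice and version are
illustrative, not the only option):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.0-incubating</version>
        <!-- provided: supplied by the Spark YARN bundle at runtime -->
        <scope>provided</scope>
    </dependency>

    <!-- under <build><plugins> -->
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <executions>
            <execution>
                <!-- build the fat jar as part of package -->
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
            </execution>
        </executions>
    </plugin>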

For the Spark jar you want to submit to YARN, you can download the
precompiled one.

It might make sense to try this pipeline with 0.8.1 and get it
working there first. It sounds more like you are dealing with getting
the build set up than with a particular issue in the 0.9.0 RC.

- Patrick

Re: testing 0.9.0-incubating and maven

Alex Cozzi
Thanks for the help. I am making progress, but I found that I need to do a bit of fiddling with excluding dependencies from Spark in order to have mine take effect. As soon as I have a working pom I will post it here as an example.
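
In the meantime, the kind of exclusion I mean looks roughly like this
(keeping Spark's transitive hadoop-client from shadowing my own Hadoop
2.2 dependency):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>0.9.0-incubating</version>
        <exclusions>
            <!-- let my own hadoop-client 2.2 win -->
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
            </exclusion>
        </exclusions>
    </dependency>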

Alex Cozzi
[hidden email]