Add spark dependency on on org.opencypher:okapi-shade.okapi

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Add spark dependency on on org.opencypher:okapi-shade.okapi

Weichen Xu
Hi everyone,

I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:

Spark is going to add a big feature "Spark Graph", the prototypical implementation is here
which will introduce dependency org.opencypher:okapi-shade.okapi

Xiangrui already mentioned 2 concerns on this dependency change:
On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.

So let's discuss and vote whether this is a good choice.
Before this spark graph feature to get into spark ASAP, this issue should be resolved first.

This vote is open until next Tuseday (Oct. 22).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!

Weichen

Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

Mats Rydberg
Hello Weichen, community

I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?

Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.

The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.

Thank you
Mats


On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:
Hi everyone,

I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:

Spark is going to add a big feature "Spark Graph", the prototypical implementation is here
which will introduce dependency org.opencypher:okapi-shade.okapi

Xiangrui already mentioned 2 concerns on this dependency change:
On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.

So let's discuss and vote whether this is a good choice.
Before this spark graph feature to get into spark ASAP, this issue should be resolved first.

This vote is open until next Tuseday (Oct. 22).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!

Weichen

Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

Weichen Xu
Hi Mats Rydberg,

Although this dependency "org.opencypher:okapi-shade.okapi" was added into spark, but Xiangrui raised two concerns (see above mail) about it, so we'd better rethink on this and consider whether this is a good choice, so I call this vote.

Thanks!

On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg <[hidden email]> wrote:
Hello Weichen, community

I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?

Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.

The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.

Thank you
Mats


On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:
Hi everyone,

I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:

Spark is going to add a big feature "Spark Graph", the prototypical implementation is here
which will introduce dependency org.opencypher:okapi-shade.okapi

Xiangrui already mentioned 2 concerns on this dependency change:
On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.

So let's discuss and vote whether this is a good choice.
Before this spark graph feature to get into spark ASAP, this issue should be resolved first.

This vote is open until next Tuseday (Oct. 22).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!

Weichen

Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

Sean Owen-2
I do not have a very informed opinion here, so take this with a grain of salt.

I'd say that we need to either commit a coherent version of this for
Spark 3, or not at all. If it doesn't have support, I'd back out the
existing changes.
I was initially skeptical about how much this needs to be in Spark vs
a third-party package, and that still stands.

The addition of another dependency isn't that big a deal IMHO, but,
yes, it does add something to the maintenance overhead. But that's all
the more true of a new module.

I don't feel strongly about it, but if this isn't obviously getting
support from any committers, can we keep it as a third party library
for now?


On Tue, Oct 15, 2019 at 8:53 PM Weichen Xu <[hidden email]> wrote:

>
> Hi Mats Rydberg,
>
> Although this dependency "org.opencypher:okapi-shade.okapi" was added into spark, but Xiangrui raised two concerns (see above mail) about it, so we'd better rethink on this and consider whether this is a good choice, so I call this vote.
>
> Thanks!
>
> On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg <[hidden email]> wrote:
>>
>> Hello Weichen, community
>>
>> I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?
>>
>> Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.
>>
>> The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.
>>
>> Thank you
>> Mats
>>
>>
>> On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:
>>>
>>> Hi everyone,
>>>
>>> I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:
>>>
>>> Spark is going to add a big feature "Spark Graph", the prototypical implementation is here
>>> https://github.com/apache/spark/pull/24297
>>> which will introduce dependency org.opencypher:okapi-shade.okapi
>>>
>>> Xiangrui already mentioned 2 concerns on this dependency change:
>>>>
>>>> On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.
>>>
>>>
>>> So let's discuss and vote whether this is a good choice.
>>> Before this spark graph feature to get into spark ASAP, this issue should be resolved first.
>>>
>>> This vote is open until next Tuseday (Oct. 22).
>>>
>>> [ ] +1: Accept the proposal
>>> [ ] +0
>>> [ ] -1: I don't think this is a good idea because ...
>>>
>>> Thank you!
>>>
>>> Weichen
>>>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

Martin Junghanns-2
I'm slightly confused about this discussion. I worked on all of the aforementioned PRs: the module PR that has been merged, the current PR that introduces the Graph API and the PoC PR that contains the full implementation. The issues around shading were addressed and the module PR eventually got merged. Two PMC members including the SPIP shepherd are working with me (and others) on the current API PR. The SPIP to bring Spark Graph into Apache Spark itself has been successfully voted on earlier this year. I presented this work at Spark Summit in San Fransisco in May and was asked by the organizers to present the topic at the European Spark Summit. I'm currently sitting in the speakers room of that conference preparing for the talk and reading this thread. I hope you understand my confusion.

I admit - and Xiangrui pointed this out in the other thread, too - that we made the early mistake of not bringing more Spark committers on board which lead to a stagnation period during summer when Xiangrui wasn't around to help review and bring progress to the project.

Sean, if your concern is the lack of maintainers of that module, I personally would like to volunteer to maintain Spark Graph. I'm also a contributor to the Okapi stack and am able to work on whatever issue might come up on that end including updating dependencies etc. FWIW, Okapi is actively maintained by a team at Neo4j.

Best, Martin 

On Wed, 16 Oct 2019, 4:35 AM Sean Owen <[hidden email]> wrote:
I do not have a very informed opinion here, so take this with a grain of salt.

I'd say that we need to either commit a coherent version of this for
Spark 3, or not at all. If it doesn't have support, I'd back out the
existing changes.
I was initially skeptical about how much this needs to be in Spark vs
a third-party package, and that still stands.

The addition of another dependency isn't that big a deal IMHO, but,
yes, it does add something to the maintenance overhead. But that's all
the more true of a new module.

I don't feel strongly about it, but if this isn't obviously getting
support from any committers, can we keep it as a third party library
for now?


On Tue, Oct 15, 2019 at 8:53 PM Weichen Xu <[hidden email]> wrote:
>
> Hi Mats Rydberg,
>
> Although this dependency "org.opencypher:okapi-shade.okapi" was added into spark, but Xiangrui raised two concerns (see above mail) about it, so we'd better rethink on this and consider whether this is a good choice, so I call this vote.
>
> Thanks!
>
> On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg <[hidden email]> wrote:
>>
>> Hello Weichen, community
>>
>> I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?
>>
>> Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.
>>
>> The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.
>>
>> Thank you
>> Mats
>>
>>
>> On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:
>>>
>>> Hi everyone,
>>>
>>> I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:
>>>
>>> Spark is going to add a big feature "Spark Graph", the prototypical implementation is here
>>> https://github.com/apache/spark/pull/24297
>>> which will introduce dependency org.opencypher:okapi-shade.okapi
>>>
>>> Xiangrui already mentioned 2 concerns on this dependency change:
>>>>
>>>> On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.
>>>
>>>
>>> So let's discuss and vote whether this is a good choice.
>>> Before this spark graph feature to get into spark ASAP, this issue should be resolved first.
>>>
>>> This vote is open until next Tuseday (Oct. 22).
>>>
>>> [ ] +1: Accept the proposal
>>> [ ] +0
>>> [ ] -1: I don't think this is a good idea because ...
>>>
>>> Thank you!
>>>
>>> Weichen
>>>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

Sean Owen-2
We don't all have to agree on whether to add this -- there are like 10
people with an opinion -- and I certainly would not veto it. In
practice a medium-sized changes needs someone to review/merge it all
the way through, and nobody strongly objecting. I too don't know what
to make of the situation; what happened to the supporters here?

I am concerned about maintenance, as inevitably any new module falls
on everyone to maintain to some degree, and people come and go despite
their intentions. But that isn't the substance of why I personally
wouldn't merge it. Just doesn't seem like it must live in Spark. But
again this is my opinion; you don't need to convince me, just need to
(re?)-convince a shepherd, sponsor for this change.

Voting on the dependency part or whatever is also not important. It's
a detail, and already merged even.

The issue to hand is: if nobody supports reviewing and merging the
rest of the change, what then? we can't leave it half implemented. The
fallback plan is just to back it out and reconsider later. This would
be a poor outcome process-wise, but better than leaving it incomplete.

On Wed, Oct 16, 2019 at 3:15 AM Martin Junghanns
<[hidden email]> wrote:

>
> I'm slightly confused about this discussion. I worked on all of the aforementioned PRs: the module PR that has been merged, the current PR that introduces the Graph API and the PoC PR that contains the full implementation. The issues around shading were addressed and the module PR eventually got merged. Two PMC members including the SPIP shepherd are working with me (and others) on the current API PR. The SPIP to bring Spark Graph into Apache Spark itself has been successfully voted on earlier this year. I presented this work at Spark Summit in San Fransisco in May and was asked by the organizers to present the topic at the European Spark Summit. I'm currently sitting in the speakers room of that conference preparing for the talk and reading this thread. I hope you understand my confusion.
>
> I admit - and Xiangrui pointed this out in the other thread, too - that we made the early mistake of not bringing more Spark committers on board which lead to a stagnation period during summer when Xiangrui wasn't around to help review and bring progress to the project.
>
> Sean, if your concern is the lack of maintainers of that module, I personally would like to volunteer to maintain Spark Graph. I'm also a contributor to the Okapi stack and am able to work on whatever issue might come up on that end including updating dependencies etc. FWIW, Okapi is actively maintained by a team at Neo4j.
>
> Best, Martin
>
> On Wed, 16 Oct 2019, 4:35 AM Sean Owen <[hidden email]> wrote:
>>
>> I do not have a very informed opinion here, so take this with a grain of salt.
>>
>> I'd say that we need to either commit a coherent version of this for
>> Spark 3, or not at all. If it doesn't have support, I'd back out the
>> existing changes.
>> I was initially skeptical about how much this needs to be in Spark vs
>> a third-party package, and that still stands.
>>
>> The addition of another dependency isn't that big a deal IMHO, but,
>> yes, it does add something to the maintenance overhead. But that's all
>> the more true of a new module.
>>
>> I don't feel strongly about it, but if this isn't obviously getting
>> support from any committers, can we keep it as a third party library
>> for now?
>>
>>
>> On Tue, Oct 15, 2019 at 8:53 PM Weichen Xu <[hidden email]> wrote:
>> >
>> > Hi Mats Rydberg,
>> >
>> > Although this dependency "org.opencypher:okapi-shade.okapi" was added into spark, but Xiangrui raised two concerns (see above mail) about it, so we'd better rethink on this and consider whether this is a good choice, so I call this vote.
>> >
>> > Thanks!
>> >
>> > On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg <[hidden email]> wrote:
>> >>
>> >> Hello Weichen, community
>> >>
>> >> I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?
>> >>
>> >> Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.
>> >>
>> >> The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.
>> >>
>> >> Thank you
>> >> Mats
>> >>
>> >>
>> >> On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:
>> >>>
>> >>> Hi everyone,
>> >>>
>> >>> I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:
>> >>>
>> >>> Spark is going to add a big feature "Spark Graph", the prototypical implementation is here
>> >>> https://github.com/apache/spark/pull/24297
>> >>> which will introduce dependency org.opencypher:okapi-shade.okapi
>> >>>
>> >>> Xiangrui already mentioned 2 concerns on this dependency change:
>> >>>>
>> >>>> On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.
>> >>>
>> >>>
>> >>> So let's discuss and vote whether this is a good choice.
>> >>> Before this spark graph feature to get into spark ASAP, this issue should be resolved first.
>> >>>
>> >>> This vote is open until next Tuseday (Oct. 22).
>> >>>
>> >>> [ ] +1: Accept the proposal
>> >>> [ ] +0
>> >>> [ ] -1: I don't think this is a good idea because ...
>> >>>
>> >>> Thank you!
>> >>>
>> >>> Weichen
>> >>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: [hidden email]
>>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

rxin
Just curious - did we discuss why this shouldn't be another Apache sister project?


On Wed, Oct 16, 2019 at 10:21 AM, Sean Owen <[hidden email]> wrote:

We don't all have to agree on whether to add this -- there are like 10 people with an opinion -- and I certainly would not veto it. In practice a medium-sized changes needs someone to review/merge it all the way through, and nobody strongly objecting. I too don't know what to make of the situation; what happened to the supporters here?

I am concerned about maintenance, as inevitably any new module falls on everyone to maintain to some degree, and people come and go despite their intentions. But that isn't the substance of why I personally wouldn't merge it. Just doesn't seem like it must live in Spark. But again this is my opinion; you don't need to convince me, just need to
(re?)-convince a shepherd, sponsor for this change.

Voting on the dependency part or whatever is also not important. It's a detail, and already merged even.

The issue to hand is: if nobody supports reviewing and merging the rest of the change, what then? we can't leave it half implemented. The fallback plan is just to back it out and reconsider later. This would be a poor outcome process-wise, but better than leaving it incomplete.

On Wed, Oct 16, 2019 at 3:15 AM Martin Junghanns
<[hidden email]> wrote:

I'm slightly confused about this discussion. I worked on all of the aforementioned PRs: the module PR that has been merged, the current PR that introduces the Graph API and the PoC PR that contains the full implementation. The issues around shading were addressed and the module PR eventually got merged. Two PMC members including the SPIP shepherd are working with me (and others) on the current API PR. The SPIP to bring Spark Graph into Apache Spark itself has been successfully voted on earlier this year. I presented this work at Spark Summit in San Fransisco in May and was asked by the organizers to present the topic at the European Spark Summit. I'm currently sitting in the speakers room of that conference preparing for the talk and reading this thread. I hope you understand my confusion.

I admit - and Xiangrui pointed this out in the other thread, too - that we made the early mistake of not bringing more Spark committers on board which lead to a stagnation period during summer when Xiangrui wasn't around to help review and bring progress to the project.

Sean, if your concern is the lack of maintainers of that module, I personally would like to volunteer to maintain Spark Graph. I'm also a contributor to the Okapi stack and am able to work on whatever issue might come up on that end including updating dependencies etc. FWIW, Okapi is actively maintained by a team at Neo4j.

Best, Martin

On Wed, 16 Oct 2019, 4:35 AM Sean Owen <[hidden email]> wrote:

I do not have a very informed opinion here, so take this with a grain of salt.

I'd say that we need to either commit a coherent version of this for Spark 3, or not at all. If it doesn't have support, I'd back out the existing changes.
I was initially skeptical about how much this needs to be in Spark vs a third-party package, and that still stands.

The addition of another dependency isn't that big a deal IMHO, but, yes, it does add something to the maintenance overhead. But that's all the more true of a new module.

I don't feel strongly about it, but if this isn't obviously getting support from any committers, can we keep it as a third party library for now?

On Tue, Oct 15, 2019 at 8:53 PM Weichen Xu <[hidden email]> wrote:

Hi Mats Rydberg,

Although this dependency "org.opencypher:okapi-shade.okapi" was added into spark, but Xiangrui raised two concerns (see above mail) about it, so we'd better rethink on this and consider whether this is a good choice, so I call this vote.

Thanks!

On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg <[hidden email]> wrote:

Hello Weichen, community

I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?

Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.

The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.

Thank you
Mats

On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:

Hi everyone,

I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:

Spark is going to add a big feature "Spark Graph", the prototypical implementation is here https://github.com/apache/spark/pull/24297
which will introduce dependency org.opencypher:okapi-shade.okapi

Xiangrui already mentioned 2 concerns on this dependency change:

On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.

So let's discuss and vote whether this is a good choice. Before this spark graph feature to get into spark ASAP, this issue should be resolved first.

This vote is open until next Tuseday (Oct. 22).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!

Weichen

--------------------------------------------------------------------- To unsubscribe e-mail: [hidden email]

--------------------------------------------------------------------- To unsubscribe e-mail: [hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

Weichen Xu
Attach the design doc here
https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI/edit#

I think the initial design intention is to replace graphx in spark. At first, we plan to merge graphframe project into spark, now this design is improved to not only include graphframe, but also include a graph query (CypherQL) engine.

On Thu, Oct 17, 2019 at 1:41 AM Reynold Xin <[hidden email]> wrote:
Just curious - did we discuss why this shouldn't be another Apache sister project?


On Wed, Oct 16, 2019 at 10:21 AM, Sean Owen <[hidden email]> wrote:

We don't all have to agree on whether to add this -- there are like 10 people with an opinion -- and I certainly would not veto it. In practice a medium-sized changes needs someone to review/merge it all the way through, and nobody strongly objecting. I too don't know what to make of the situation; what happened to the supporters here?

I am concerned about maintenance, as inevitably any new module falls on everyone to maintain to some degree, and people come and go despite their intentions. But that isn't the substance of why I personally wouldn't merge it. Just doesn't seem like it must live in Spark. But again this is my opinion; you don't need to convince me, just need to
(re?)-convince a shepherd, sponsor for this change.

Voting on the dependency part or whatever is also not important. It's a detail, and already merged even.

The issue to hand is: if nobody supports reviewing and merging the rest of the change, what then? we can't leave it half implemented. The fallback plan is just to back it out and reconsider later. This would be a poor outcome process-wise, but better than leaving it incomplete.

On Wed, Oct 16, 2019 at 3:15 AM Martin Junghanns
<[hidden email]> wrote:

I'm slightly confused about this discussion. I worked on all of the aforementioned PRs: the module PR that has been merged, the current PR that introduces the Graph API and the PoC PR that contains the full implementation. The issues around shading were addressed and the module PR eventually got merged. Two PMC members including the SPIP shepherd are working with me (and others) on the current API PR. The SPIP to bring Spark Graph into Apache Spark itself has been successfully voted on earlier this year. I presented this work at Spark Summit in San Fransisco in May and was asked by the organizers to present the topic at the European Spark Summit. I'm currently sitting in the speakers room of that conference preparing for the talk and reading this thread. I hope you understand my confusion.

I admit - and Xiangrui pointed this out in the other thread, too - that we made the early mistake of not bringing more Spark committers on board which lead to a stagnation period during summer when Xiangrui wasn't around to help review and bring progress to the project.

Sean, if your concern is the lack of maintainers of that module, I personally would like to volunteer to maintain Spark Graph. I'm also a contributor to the Okapi stack and am able to work on whatever issue might come up on that end including updating dependencies etc. FWIW, Okapi is actively maintained by a team at Neo4j.

Best, Martin

On Wed, 16 Oct 2019, 4:35 AM Sean Owen <[hidden email]> wrote:

I do not have a very informed opinion here, so take this with a grain of salt.

I'd say that we need to either commit a coherent version of this for Spark 3, or not at all. If it doesn't have support, I'd back out the existing changes.
I was initially skeptical about how much this needs to be in Spark vs a third-party package, and that still stands.

The addition of another dependency isn't that big a deal IMHO, but, yes, it does add something to the maintenance overhead. But that's all the more true of a new module.

I don't feel strongly about it, but if this isn't obviously getting support from any committers, can we keep it as a third party library for now?

On Tue, Oct 15, 2019 at 8:53 PM Weichen Xu <[hidden email]> wrote:

Hi Mats Rydberg,

Although this dependency "org.opencypher:okapi-shade.okapi" was added into spark, but Xiangrui raised two concerns (see above mail) about it, so we'd better rethink on this and consider whether this is a good choice, so I call this vote.

Thanks!

On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg <[hidden email]> wrote:

Hello Weichen, community

I'm sorry, I'm feeling a little bit confused about this vote. Is this about the PR (https://github.com/apache/spark/pull/24490) that was merged in early June and introduced the spark-graph module including the okapi-shade dependency?

Regarding the okapi-shade dependency which was developed as part of the above PR work, some advice was offered by Scala experts at TripleQuote which helped find a satisfactory solution. The shading mechanism used is standard and very comparable to a Java library shading solution.

The PR you link (https://github.com/apache/spark/pull/24297) is not meant for merging. It is just a proof-of-concept branch containing a full implementation of the system, which is kept up-to-date with the API discussion on the currently proposed PR: https://github.com/apache/spark/pull/24851.

Thank you
Mats

On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu <[hidden email]> wrote:

Hi everyone,

I'd like to call a new vote on the issue: should we add dependency "org.opencypher:okapi-shade.okapi" into spark ? The issue background is:

Spark is going to add a big feature "Spark Graph", the prototypical implementation is here https://github.com/apache/spark/pull/24297
which will introduce dependency org.opencypher:okapi-shade.okapi

Xiangrui already mentioned 2 concerns on this dependency change:

On the technical side, my main concern is the runtime dependency on org.opencypher:okapi-shade.okapi depends on several Scala libraries. We came out with the solution to shade a few Scala libraries to avoid pollution. However, I'm not super confident that the approach is sustainable for two reasons: a) there exists no proper shading libraries for Scala, 2) We will have to wait for upgrades from those Scala libraries before we can upgrade Spark to use a newer Scala version. So it would be great if some Scala experts can help review the current implementation and help assess the risk.

So let's discuss and vote whether this is a good choice. Before this spark graph feature to get into spark ASAP, this issue should be resolved first.

This vote is open until next Tuseday (Oct. 22).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!

Weichen

--------------------------------------------------------------------- To unsubscribe e-mail: [hidden email]

--------------------------------------------------------------------- To unsubscribe e-mail: [hidden email]