A common naming policy for third-party packages/modules under org.apache.spark?

A common naming policy for third-party packages/modules under org.apache.spark?

Steve Loughran-2

I've just been stack-trace-chasing the 404-in-task-commit code:

https://issues.apache.org/jira/browse/HADOOP-17216

And although it's got an org.apache.spark. prefix, it's actually org.apache.spark.sql.delta, which lives on GitHub, so the code/issue tracker lives elsewhere.

I understand why they've done this (I've done it myself): it's to get at classes package-scoped to Spark (https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala).

However, it can be confusing and time-wasting.

Can I suggest some common prefix for third-party classes put into the Spark package tree, just to make clear that they are external contributions? It would set expectations up all round.

-Steve

(*) Side note: could whoever maintains that code add retries, with sleeps of at least 10-15s? We ended up having to do exponential backoff of over 90s to make sure the load balancers were clean. The time for a 404 to clear is not "time since the file was added", it is "time since the last HEAD/GET/COPY request". Thx
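
For what it's worth, a minimal sketch in Scala of the kind of retry loop being asked for; the object and method names, the use of FileNotFoundException as the 404 signal, and the exact delays are illustrative assumptions, not any Spark or Delta API:

    import java.io.FileNotFoundException

    object Retry404 {
      // Hypothetical helper: retry an operation that can hit a transient
      // 404, sleeping at least 10s between attempts and backing off
      // exponentially until roughly 90s of wait budget is exhausted.
      def retryOn404[T](op: () => T,
                        sleepMs: Long = 10000L,
                        budgetMs: Long = 90000L): T =
        try op() catch {
          case _: FileNotFoundException if budgetMs > 0 =>
            // Wait *before* probing again: every HEAD/GET/COPY resets the
            // load balancer's 404 cache, so short, eager retries never
            // let it clear.
            Thread.sleep(sleepMs)
            retryOn404(op, sleepMs * 2, budgetMs - sleepMs)
        }
    }

Something like Retry404.retryOn404(() => fs.getFileStatus(path)) would then wrap a status probe that can 404 just after a commit, assuming fs is a Hadoop FileSystem and path a freshly written file.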

Re: A common naming policy for third-party packages/modules under org.apache.spark?

Dongjoon Hyun-2
Hi, Steve.

Sure, you can suggest one, but I'm wondering how the suggested namespaces would satisfy the existing visibility rules. Could you give us some specific examples?

> Can I suggest some common prefix for third-party classes put into the Spark package tree, just to make clear that they are external contributions?

Bests,
Dongjoon.




Re: A common naming policy for third-party packages/modules under org.apache.spark?

Steve Loughran-2
The issue is that sometimes people explicitly want to put stuff into the Spark package tree precisely to get at things which Spark has scoped as private to org.apache.spark. Unless/until the relevant APIs/classes are rescoped to be public, putting your classes under that package hierarchy lets your own code at them. It just confuses stack-trace analysis, as it's not immediately obvious whose code is playing up.
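
For anyone unfamiliar with the mechanism: Scala's package-qualified private[spark] modifier makes a member visible to anything declared under org.apache.spark, including third-party code that plants itself there. A contrived sketch (both objects below are made up, not real Spark classes):

    // Spark-side (hypothetical): visible only inside org.apache.spark.*
    package org.apache.spark {
      private[spark] object InternalHelper {
        def internalThing(): String = "not part of the public API"
      }
    }

    // Third-party side: merely declaring the code under the Spark package
    // tree grants access to the package-private member above, even though
    // it ships in a different jar and lives in a different repository.
    package org.apache.spark.thirdparty {
      object VendorShim {
        def reachIn(): String = org.apache.spark.InternalHelper.internalThing()
      }
    }

A common, clearly-marked prefix would not change that access; it would just make the provenance of such classes obvious in a stack trace.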



Re: A common naming policy for third-party packages/modules under org.apache.spark?

Sean Owen-2
Sure, it is a good idea, but I'm not sure Spark can enforce it; even a documented suggestion probably isn't going to be noticed. Ideally, FooBar would put its code under org.apache.spark.foobar, I guess, as sketched below.
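
As a hedged illustration of that convention (the class name is invented), the vendor keeps its own name in the package, so a stack trace immediately points at the right issue tracker while the class still sits inside the Spark tree for visibility reasons:

    // Hypothetical vendor class following the suggested convention.
    // A failing frame would read
    //   at org.apache.spark.foobar.FooBarCommitProtocol.commitTask(...)
    // making it obvious the report belongs in FooBar's tracker, not Spark's.
    package org.apache.spark.foobar

    class FooBarCommitProtocol {
      def commitTask(taskId: String): Unit = {
        // vendor-maintained logic, tracked in FooBar's own repository
      }
    }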
