which classes/methods are considered as private in Spark?

cloud0fan
Hi all,

I recently updated the MiMa exclusion rules and found that MiMa unexpectedly tracks some private classes/methods.

Note that "private" here means that we make no compatibility guarantees: we don't provide documentation, and users take on the risk when using them.

The API documentation also includes some obviously private classes, e.g. https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.serializer.DummySerializerInstance , which is not expected either.

I looked around and couldn't find a clear definition of "private" in Spark.

AFAIK, we have several rules:
1. Everything that is truly private and that end users can't access, e.g. package-private classes, private methods, etc.
2. Classes under certain packages. I don't know if we have a list, but the catalyst package is considered a private package.
3. Everything that has a @Private annotation.
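
To make the three rules concrete, here is a minimal Java sketch of how they could be combined into one check. It is only an illustration, not code from Spark: the class PrivacyRules, its methods, and the contents of the prefix list are hypothetical, and the only private package listed is catalyst because that is the only one named here.

```java
import java.util.Arrays;
import java.util.List;

public class PrivacyRules {
    // Assumption: packages treated as private by convention (rule 2).
    // Only catalyst is named in this thread; a real list would be longer.
    public static final List<String> PRIVATE_PACKAGE_PREFIXES =
        Arrays.asList("org.apache.spark.sql.catalyst");

    // Rule 2: the class lives in (or under) a package that is private by convention.
    public static boolean inPrivatePackage(String className) {
        for (String prefix : PRIVATE_PACKAGE_PREFIXES) {
            if (className.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }

    // A class counts as private if any one of the three rules applies.
    //   languagePrivate: rule 1, not accessible to end users at the language level
    //   markedPrivate:   rule 3, carries a marker annotation such as @Private
    public static boolean isPrivate(String className,
                                    boolean languagePrivate,
                                    boolean markedPrivate) {
        return languagePrivate || inPrivatePackage(className) || markedPrivate;
    }

    public static void main(String[] args) {
        // A catalyst class is private by rule 2 alone.
        System.out.println(isPrivate(
            "org.apache.spark.sql.catalyst.expressions.Expression", false, false)); // true
        // A class outside the listed packages with no other marker is public.
        System.out.println(isPrivate("org.apache.spark.SparkContext", false, false)); // false
    }
}
```

Keeping the prefix list in one place is the point of the sketch: whatever tools consume the definition would read the same list.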

I'm sending this email to collect more feedback, and I hope we can come up with a clear definition of what is "private".

Thanks,
Wenchen

Re: which classes/methods are considered as private in Spark?

Sean Owen-2
You should find that 'surprisingly public' classes are there because
of language technicalities. For example DummySerializerInstance is
public because it's a Java class, and can't be used outside its
package otherwise.

Likewise, I think MiMa just looks at bytecode, and private[spark]
classes are public in the bytecode for similar reasons (although Scala
enforces the access within Scala as expected). Hence it will flag
changes to "nonpublic" private[spark] classes.
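
As a rough illustration of the point about access flags, the Java sketch below shows what a check that only reads a class's access modifiers can see. It uses reflection rather than MiMa, the class names are hypothetical, and it only covers the Java side; it does not demonstrate how Scala compiles private[spark].

```java
import java.lang.reflect.Modifier;

public class BytecodeVisibility {
    // Intended as internal, but declared public so other packages can use it.
    // A modifier-only check has no way to know about the intent.
    public static class InternalButPublic {}

    // Package-private: not flagged as public.
    static class PackagePrivate {}

    // All that a modifier-based check can decide: is the public flag set?
    public static boolean looksPublic(Class<?> cls) {
        return Modifier.isPublic(cls.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(looksPublic(InternalButPublic.class)); // true
        System.out.println(looksPublic(PackagePrivate.class));    // false
    }
}
```

Any "internal" intent therefore has to be carried by something other than the access flag, such as an annotation or an explicit exclusion list.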

I think things that are meant to be marked private are, well, marked
private, or else as private as possible and flagged with annotations
like @Private. (It does sound like DummySerializerInstance should be
so annotated?) Yes, the catalyst package in its entirety is one big
exception - private by fiat, not by painstaking flagging of every
class.

The issue to me is really docs. If we have java/scaladoc of private
classes, and there's a way to avoid that like with annotations, that
should be fixed.

Re: which classes/methods are considered as private in Spark?

Marcelo Vanzin-2
In reply to this post by cloud0fan
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan <[hidden email]> wrote:
> Recently I updated the MiMa exclusion rules, and found MiMa tracks some private classes/methods unexpectedly.

Could you clarify what you mean here? MiMa has some known limitations,
such as not handling "private[blah]" very well (because that means
public in Java). Spark has (had?) a tool to generate an exclusions
file for MiMa, but I'm not sure how up to date it is.

> AFAIK, we have several rules:
> 1. everything which is really private that end users can't access, e.g. package private classes, private methods, etc.
> 2. classes under certain packages. I don't know if we have a list, the catalyst package is considered as a private package.
> 3. everything which has a @Private annotation.

That's my understanding of the scope of the rules.

(2) to me means "things that show up in the public API docs". That's,
AFAIK, tracked in SparkBuild.scala; seems like it's tracked by a bunch
of exclusions in the Unidoc object (I remember that being different in
the past).

(3) might be a limitation of the doc generation tool? Not sure if it's
easy to say "do not document classes that have @Private". At the very
least, that annotation seems to be missing the "@Documented"
annotation, which would make that info present in the javadoc. I do
not know if the scala doc tool handles that.
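
For reference, this is roughly what a marker annotation that carries @Documented looks like in Java. The names InternalApi and Helper are hypothetical stand-ins, not Spark's actual @Private annotation, and the sketch says nothing about how the Scala doc tool treats it.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DocumentedMarker {
    // Hypothetical marker. @Documented asks javadoc to show the annotation
    // on the pages of annotated classes; RUNTIME retention keeps it
    // visible to reflection as well.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface InternalApi {}

    // Example class flagged as internal.
    @InternalApi
    public static class Helper {}

    // True if the class carries the marker.
    public static boolean isMarkedInternal(Class<?> cls) {
        return cls.isAnnotationPresent(InternalApi.class);
    }

    // True if the annotation type itself is meta-annotated with @Documented.
    public static boolean appearsInJavadoc(Class<? extends Annotation> annotationType) {
        return annotationType.isAnnotationPresent(Documented.class);
    }

    public static void main(String[] args) {
        System.out.println(isMarkedInternal(Helper.class));      // true
        System.out.println(appearsInJavadoc(InternalApi.class)); // true
    }
}
```

Note that @Documented only makes the marker visible in the generated pages; hiding annotated classes from the docs altogether would still need support from the doc tool or the build.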

--
Marcelo

Re: which classes/methods are considered as private in Spark?

cloud0fan
> Could you clarify what you mean here? Mima has some known limitations such as not handling "private[blah]" very well

Yes that's what I mean.

What I want to know here is which classes/methods we expect to be private. I think things marked as "private[blabla]" are certainly expected to be private; it's just that MiMa and the doc generator can't handle them well. We can fix that later, probably by using the @Private annotation.

> seems like it's tracked by a bunch of exclusions in the Unidoc object

That's good. At least we have a clear definition of which packages are meant to be private. We should make it consistent between MiMa and the doc generator, though.

Re: which classes/methods are considered as private in Spark?

rxin
Before each release, during the RC phase, I used to go through every single doc page to make sure we don’t unintentionally leave things public. Unfortunately, I no longer have time to do that. I found it very useful because I always caught some mistakes that had crept in through organic development.
