2.4.0 Blockers, Critical, etc

Sean Owen-3
Because we're into 2.4 release candidates, I thought I'd look at
what's still open and targeted at 2.4.0. I presume the Blockers are
the usual umbrellas that don't themselves block anything, but,
confirming, there is nothing left to do there?

I think that's mostly a question for Joseph and Weichen.

As ever, anyone who knows these items are a) done or b) not going to
be in 2.4, go ahead and update them.


Blocker:

SPARK-25321 ML, Graph 2.4 QA: API: New Scala APIs, docs
SPARK-25324 ML 2.4 QA: API: Java compatibility, docs
SPARK-25323 ML 2.4 QA: API: Python API coverage
SPARK-25320 ML, Graph 2.4 QA: API: Binary incompatible changes

Critical:

SPARK-25319 Spark MLlib, GraphX 2.4 QA umbrella
SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4
SPARK-25327 Update MLlib, GraphX websites for 2.4
SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs
SPARK-25326 ML, Graph 2.4 QA: Programming guide update and migration guide

Other:

SPARK-25346 Document Spark builtin data sources
SPARK-25347 Document image data source in doc site
SPARK-12978 Skip unnecessary final group-by when input data already clustered with group-by keys
SPARK-20184 performance regression for complex/long sql when enable whole stage codegen
SPARK-16196 Optimize in-memory scan performance using ColumnarBatches
SPARK-15693 Write schema definition out for file-based data sources to avoid schema inference
SPARK-23597 Audit Spark SQL code base for non-interpreted expressions
SPARK-25179 Document the features that require Pyarrow 0.10
SPARK-25110 make sure Flume streaming connector works with Spark 2.4
SPARK-21318 The exception message thrown by `lookupFunction` is ambiguous.
SPARK-24464 Unit tests for MLlib's Instrumentation
SPARK-23197 Flaky test: spark.streaming.ReceiverSuite."receiver_life_cycle"
SPARK-22809 pyspark is sensitive to imports with dots
SPARK-22739 Additional Expression Support for Objects
SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
SPARK-21030 extend hint syntax to support any expression for Python and R
SPARK-22386 Data Source V2 improvements
SPARK-15117 Generate code that get a value in each compressed column from CachedBatch when DataFrame.cache() is called

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: 2.4.0 Blockers, Critical, etc

cloud0fan
Sean, thanks for checking them!

I made one pass and re-targeted/closed some of them. Most of them are documentation and auditing; do we need to block the release for them?

On Fri, Sep 21, 2018 at 6:01 AM Sean Owen <[hidden email]> wrote:
> Because we're into 2.4 release candidates, I thought I'd look at
> what's still open and targeted at 2.4.0. I presume the Blockers are
> the usual umbrellas that don't themselves block anything, but,
> confirming, there is nothing left to do there?


Re: 2.4.0 Blockers, Critical, etc

Felix Cheung
I think the point is we actually need to do these validations before completing the release...

 

From: Wenchen Fan <[hidden email]>
Sent: Friday, September 21, 2018 12:02 AM
To: Sean Owen
Cc: Spark dev list
Subject: Re: 2.4.0 Blockers, Critical, etc
 
Sean, thanks for checking them!

I made one pass and re-targeted/closed some of them. Most of them are documentation and auditing; do we need to block the release for them?


Re: 2.4.0 Blockers, Critical, etc

Sean Owen-3
Yes, documentation for 2.4 has to be done before the 2.4 release. Or
else it's not for 2.4. Likewise auditing that must happen before 2.4,
must happen before 2.4 is released.
"Foo for 2.4" as Blocker for 2.4 needs to be resolved for 2.4, by
definition. Or else it's not a Blocker, not for 2.4.

I know we've had this discussion before and agree to disagree about
the semantics. But we won't, say, release 2.4.0 and then go
retroactively patch the 2.4.0 released docs with docs for 2.4.

Really, I'm just asking if all the things those items mean to cover
are done, even if for whatever reason the JIRA is not resolved.

We have a new blocker though, FWIW:
https://issues.apache.org/jira/browse/SPARK-25495

On Fri, Sep 21, 2018 at 3:02 AM Felix Cheung <[hidden email]> wrote:
> I think the point is we actually need to do these validations before completing the release...

Re: 2.4.0 Blockers, Critical, etc

Xiangrui Meng-2
Sean, thanks for checking! The MLlib blockers were resolved today by reverting breaking API changes. We still have some documentation work to wrap up. -Xiangrui


On Fri, Sep 21, 2018 at 6:54 AM Sean Owen <[hidden email]> wrote:
> Yes, documentation for 2.4 has to be done before the 2.4 release. Or
> else it's not for 2.4. Likewise auditing that must happen before 2.4,
> must happen before 2.4 is released.
> "Foo for 2.4" as Blocker for 2.4 needs to be resolved for 2.4, by
> definition. Or else it's not a Blocker, not for 2.4.


--

Xiangrui Meng

Software Engineer

Databricks Inc. http://databricks.com