Branch 2.4 is cut

cloud0fan
Hi all,

I've cut branch-2.4 now that all the major blockers are resolved. If there are no objections, I'll follow up shortly with an RC so that QA can start in parallel.

Committers, please only merge PRs to branch-2.4 that are bug fixes, performance regression fixes, documentation changes, or test suite changes.

Thanks,
Wenchen

Re: Branch 2.4 is cut

Hyukjin Kwon
Thanks, Wenchen.


Re: Branch 2.4 is cut

Sean Owen-2
In reply to this post by cloud0fan
BTW it does appear the Scala 2.12 build works now:

Let's try also producing a 2.12 build with this release. The machinery should be there in the release scripts, but let me know if something fails while running the release for 2.12.


Re: Branch 2.4 is cut

cloud0fan
Good news! I'll try and update you later. Thanks!


Re: Branch 2.4 is cut

Dongjoon Hyun-2
Great news on the branch cut and the Scala 2.12 build.

We also need to add `branch-2.4` to our Jenkins dashboard to prevent any regressions.


Bests,
Dongjoon.



Re: Branch 2.4 is cut

Sean Owen-2
CC'ing Shane, who might have permission to do so.

Note that if master is going to be 3.0, we can soon remove the Hadoop 2.6 builds for master. Honestly, we could remove them now.


Re: Branch 2.4 is cut

cloud0fan
I've reached out to Shane, but he has been busy recently. I'll figure it out with Josh soon and will post an update to this thread later.

Thanks,
Wenchen


Re: Branch 2.4 is cut

Holden Karau
I was doing my weekly code review and went to close an issue. Since it wasn't in one of the categories listed, I wasn't going to merge it into the 2.4 branch, but we need a new version in JIRA so that we can close issues that are going to merge into master but not branch-2.4. Do we know what the next version is going to be yet, or should we make a placeholder tag that we can change later?



--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: Branch 2.4 is cut

Sean Owen-2
I'm just using 3.0, but it would not hurt to create 2.5.0. If 2.5 doesn't happen, we can just move those issues to 3.0.0 later.


Re: Branch 2.4 is cut

shane knapp
In reply to this post by cloud0fan
i'll try and get to the 2.4 branch stuff today...  




--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead

Re: Branch 2.4 is cut

Dongjoon Hyun-2
Thank you, Shane! :D

Bests,
Dongjoon.


Re: Branch 2.4 is cut

shane knapp
++joshrosen  (thanks for the help w/deploying the jenkins configs)

the basic 2.4 builds are deployed and building!

i haven't created (a) build(s) yet for scala 2.12...  i'll be coordinating this w/the databricks folks next week.






Re: Branch 2.4 is cut

Ryan Blue
Wenchen, can you hold off on the first RC?

The half-finished changes from the redesign of the DataSourceV2 API are in master, added in SPARK-24882, and are now in the 2.4 branch. We've had a lot of good discussion since that PR was merged to update and fix the design, plus only one of the follow-ups on SPARK-25186 is done. Clearly, the redesign was too large to get into 2.4 in so little time -- it was proposed about 10 days before the original branch date -- and I don't think it is a good idea to release half-finished major changes.

The easiest solution is to revert SPARK-24882 in the release branch. That way we have minor changes in 2.4 and major changes in the next release, instead of major changes in both. What does everyone think?

rb



--
Ryan Blue
Software Engineer
Netflix

Re: Branch 2.4 is cut

cloud0fan
Strictly speaking, data source v2 is always half-finished until we mark it as stable. We need some small milestones to move forward step by step.

The redesign is also happening incrementally. SPARK-24882 mostly focuses on the "RDD" part of the API: the separation of reader factory and input partitions, the introduction of ScanConfig, etc. Next, we will focus on the high-level abstraction and want to change the "table" part of the API.

In my understanding, each PR should be self-contained. If we are OK with having SPARK-24882 in master as an individual commit, I think it's also OK to have it in branch 2.4.

I've created https://issues.apache.org/jira/browse/SPARK-25390 to track the new abstraction. It doesn't change the API a lot, but it updates the streaming execution engine quite a bit.

Thanks,
Wenchen


Re: Branch 2.4 is cut

Arun Mahadevan
Ryan's proposal makes a lot of sense. It's better not to release half-baked changes in 2.4: they not only break a lot of the APIs released in 2.3 but are also expected to change further due to redesigns before 3.0, so I don't see much value in releasing them in 2.4.


Re: Branch 2.4 is cut

cloud0fan
We made a lot of "breaking" changes in 2.4 for data source v2, though I agree SPARK-24882 is the most "breaking".

I don't agree that SPARK-24882 is half-baked. But I'm willing to revert it if we have a bunch of data source v2 users and they are not willing to update their implementations extensively before the data source v2 API is stabilized.


Re: Branch 2.4 is cut

Felix Cheung
I’m a bit concerned about what Arun is summarizing.

We are building on DSv2 and have already had to rewrite for a bunch of changes in master/2.4, which increases the cost of dev work and release management.

If we are saying more changes are coming in 3.0, do we have more info on what value the current changes in 2.4 are adding now?


 


Re: Branch 2.4 is cut

cloud0fan
In reply to this post by cloud0fan
Since it's not a clean revert, I've sent a PR to revert it from 2.4. Please take a look. Thanks!

On Tue, Sep 11, 2018 at 1:16 AM Ryan Blue <[hidden email]> wrote:
SPARK-24882 was committed in order to make some progress, with a note about following up with separate PRs. But the reason why all of the open discussions were happening on the same PR is that this was so close to the 2.4 branching. I wanted to make sure that either the redesign was finished or it didn't go into 2.4.

There are major changes that need to happen for the next release, like updating the write path. I think it would be better not to change this only to include another major change in the next release.
