[discuss] Spark 2.x release cadence


[discuss] Spark 2.x release cadence

rxin
We are 2 months past releasing Spark 2.0.0, an important milestone for the project. Spark 2.0.0 deviated from the regular release cadence we had for the 1.x line (it took 6 months), and we never explicitly discussed what the release cadence should look like for 2.x. Thus this email.

During Spark 1.x, roughly every three months we made a new 1.x feature release (e.g. 1.5.0 came out three months after 1.4.0). Development happened primarily in the first two months, then a release branch was cut at the end of month 2, and the last month was reserved for QA and release preparation.

During 2.0.0 development, I really enjoyed the longer release cycle because there were a lot of major changes happening, and the longer time was critical for thinking through architectural changes as well as API design. While I don't expect the same degree of drastic changes in a 2.x feature release, I do think it'd make sense to increase the length of the release cycle so we can make better designs.

My strawman proposal is to maintain a regular release cadence, as we did in Spark 1.x, but increase the cycle from 3 months to 4 months. This effectively gives us ~50% more time to develop, since the development window grows from roughly two months to three (in reality it'd be slightly less than 50%, since longer dev time also means longer QA time). As for maintenance releases, I think those should still be cut on-demand, similar to Spark 1.x, but more aggressively.

To put this into perspective, a 4-month cycle means we would release Spark 2.1.0 at the end of Nov or early Dec (with branch cut / code freeze at the end of Oct).

I am curious what others think.



Re: [discuss] Spark 2.x release cadence

Shivaram Venkataraman
+1. I think having a 4-month window instead of a 3-month window sounds good.

However, I think figuring out a timeline for maintenance releases would also be good. This is a common concern that comes up in many user threads, and it'd be better to have some structure around it. It doesn't need to be strict, but something like cutting the first maintenance release for the latest 2.x.0 release within 2 months, and then a second maintenance release within 6 months, or something like that.

Thanks
Shivaram




Re: [discuss] Spark 2.x release cadence

Sean Owen
In reply to this post by rxin
+1 -- I think the minor releases were taking more like 4 months than 3 months anyway, and it was good for the reasons you give. This reflects reality and is a good thing. All the better if we can then follow the timeline more comfortably.




Re: [discuss] Spark 2.x release cadence

Mark Hamstra
In reply to this post by rxin
+1

And I'll dare say that for those running Spark in production, it matters more that maintenance releases come out in a timely fashion than that new features arrive a month sooner or later.





Re: [discuss] Spark 2.x release cadence

Felix Cheung
+1 on a longer, regularly scheduled release cycle and more maintenance releases.





Re: [discuss] Spark 2.x release cadence

Tom Graves-2
In reply to this post by rxin
+1 to 4 months.

Tom





Re: [discuss] Spark 2.x release cadence

Joseph Bradley
+1 for 4 months.  With QA taking about a month, that's very reasonable.

My main ask (especially for MLlib) is for contributors and committers to take extra care not to delay updating the Programming Guide for new APIs. Documentation debt often accumulates and has to be paid off during QA, and a longer cycle will exacerbate this problem.







Re: [discuss] Spark 2.x release cadence

Cody Koeninger-2
Regarding documentation debt, is there a reason not to deploy documentation updates more frequently than releases? I recall this used to be the case.




Re: [discuss] Spark 2.x release cadence

Weiqing Yang

+1 (non-binding)

RC4 was compiled and tested on the following system: CentOS Linux release 7.0.1406 / openjdk 1.8.0_102 / R 3.3.1. All tests passed.

./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr -DskipTests clean package

./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr test

Best,
Weiqing






Re: [discuss] Spark 2.x release cadence

Weiqing Yang
Sorry. I think I just replied to the wrong thread. :(


WQ

