Should we consider a Spark 2.1.1 release?


Should we consider a Spark 2.1.1 release?

Holden Karau
Hi Spark Devs,

Spark 2.1 has been out since the end of December, and we've got quite a few fixes merged for 2.1.1.

On the Python side, one of the things I'd like to see us get out in a patch release is a packaging fix (now merged) before we upload to PyPI & Conda, and we also have the normal batch of fixes, like toLocalIterator for large DataFrames in PySpark.
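
For context, a minimal PySpark sketch of the kind of usage that fix targets (the local SparkSession and the range DataFrame below are just illustrative, not taken from the JIRA itself):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("toLocalIterator-demo").getOrCreate()

# Stand-in for a "large" DataFrame. toLocalIterator() streams one partition
# at a time to the driver, so the result never has to fit in driver memory
# the way collect() would require.
df = spark.range(0, 1000000)

total = 0
for row in df.toLocalIterator():
    total += row.id

print(total)
spark.stop()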

I've chatted with Felix & Shivaram, who seem to think the R side is looking close to being in good shape for a 2.1.1 release to submit to CRAN (if I've misspoken, my apologies). The two outstanding issues being tracked for R are SPARK-18817 and SPARK-19237.

Looking quickly at the other components, it seems like Structured Streaming could also benefit from a patch release.

What do others think - are there any issues people are actively targeting for 2.1.1? Is this too early to be considering a patch release?

Cheers,

Holden
--
Cell : 425-233-8271

Re: Should we consider a Spark 2.1.1 release?

Felix Cheung
+1
There are a lot of good fixes overall, and we need a release for the Python and R packages.




Re: Should we consider a Spark 2.1.1 release?

Sean Owen
In reply to this post by Holden Karau
It seems reasonable to me, in that other x.y.1 releases have followed ~2 months after the x.y.0 release and it's been about 3 months since 2.1.0.

Related: creating releases is tough work, so I feel kind of bad voting for someone else to do that much work. Would it make sense to deputize another release manager to help get out just the maintenance releases? This may in turn mean maintenance branches last longer. Experienced hands can continue to manage new minor and major releases, as they require more coordination.

I know most of the release process is written down; it's also still going to be work to get it 100% documented. Eventually it'll be necessary to make sure it's entirely codified anyway.

Not pushing for it myself, just noting I had heard this brought up in side conversations before.


Re: Should we consider a Spark 2.1.1 release?

Holden Karau
I'd be happy to do the work of coordinating a 2.1.1 release if that's a thing a committer can do (I think the release coordinator for the most recent Arrow release was a committer, and the final publish step took a PMC member to upload, but other than that I don't remember any issues).

--
Cell : 425-233-8271

Re: Should we consider a Spark 2.1.1 release?

Michael Armbrust
Hey Holden,

Thanks for bringing this up! I think we usually cut patch releases when there are enough fixes to justify it, sometimes just a few weeks after the release. I guess if we are at 3 months, Spark 2.1.0 was a pretty good release :)

That said, it is probably time. I was about to start thinking about 2.2 as well (we are a little past the posted code-freeze deadline), so I'm happy to push the buttons etc. (this is a very good description if you are curious). I would love help watching JIRA, posting the burn-down on issues, and shepherding in any critical patches. Feel free to ping me off-line if you'd like to coordinate.

Unless there are any objections, how about we aim for an RC of 2.1.1 on Monday and I'll also plan to cut branch-2.2 then?  (I'll send a separate email on this as well).

Michael



Re: Should we consider a Spark 2.1.1 release?

Nick Pentreath
Spark 1.5.1 had 87 issues with fix version 1.5.1, one month after 1.5.0.

Spark 1.6.1 had 123 issues, two months after 1.6.0.

2.0.1 was larger (317 issues) at three months after 2.0.0, which makes sense given how large a release 2.0.0 was.

We are at 185 for 2.1.1, three months after 2.1.0 (and not released yet, so it could slip further). That's not totally unusual, as the release interval has certainly increased, but in fairness probably a bit later than usual. I'd say it definitely makes sense to cut the RC!
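
For anyone who wants to reproduce counts like these, here's a rough sketch against the public JIRA REST API (the endpoint, JQL, and status filter below are my assumptions about how to count issues by fix version, not necessarily how the numbers above were pulled):

import requests

def count_fixed_issues(fix_version):
    # maxResults=0 asks JIRA for only the total count, not the issue bodies.
    resp = requests.get(
        "https://issues.apache.org/jira/rest/api/2/search",
        params={
            "jql": ('project = SPARK AND fixVersion = "%s" '
                    'AND status in (Resolved, Closed)') % fix_version,
            "maxResults": 0,
        },
    )
    resp.raise_for_status()
    return resp.json()["total"]

for version in ["1.5.1", "1.6.1", "2.0.1", "2.1.1"]:
    print(version, count_fixed_issues(version))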





Re: Should we consider a Spark 2.1.1 release?

Jacek Laskowski
In reply to this post by Holden Karau
+10000

Smaller and more frequent releases (so major releases get even higher quality).

Jacek


Re: Should we consider a Spark 2.1.1 release?

Mark Hamstra
That doesn't necessarily follow, Jacek. There is a point where too frequent releases decrease quality. That is because releases don't come for free -- each one demands a considerable amount of time from release managers, testers, etc. -- time that would otherwise typically be devoted to improving (or at least adding to) the code. And that doesn't even begin to consider the time that needs to be spent putting a new version into a larger software distribution or that users need to put in to deploy and use a new version. If you have an extremely lightweight deployment cycle, then small, quick releases can make sense; but "lightweight" doesn't really describe a Spark release. The concern for excessive overhead is a large part of the thinking behind why we stretched out the roadmap to allow longer intervals between scheduled releases. A similar concern does come into play for unscheduled maintenance releases -- but I don't think that that is the forcing function at this point: A 2.1.1 release is a good idea.



Re: Should we consider a Spark 2.1.1 release?

Jacek Laskowski
Hi Mark,

I appreciate your comment.

My thinking is that the more frequent the minor and patch releases, the more often end users can give them a shot and be part of the bigger release cycle for major releases. Spark's an OSS project and we all can make mistakes, and my thinking is that the more eyeballs, the fewer the mistakes. If we make very fine/minor releases often, we should be able to attract more people who spend their time on testing/verification, which eventually contributes to a higher quality of Spark.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: Should we consider a Spark 2.1.1 release?

Holden Karau
This discussion seems like it might benefit from its own thread; we've previously decided to lengthen release cycles, but if there are different opinions about that, it seems unrelated to the specific 2.1.1 release.

--
Cell : 425-233-8271

Re: Should we consider a Spark 2.1.1 release?

tsh
Hello guys,

Spark benefits from stable versions, not frequent ones.
A lot of people still have 1.6.x in production. Those who want the freshest (like me) can always deploy nightly builds.
My question is: how long will version 1.6 be supported?


Re: Should we consider a Spark 2.1.1 release?

Holden Karau
I think questions around how long the 1.6 series will be supported are really important, but probably belong in a different thread than the 2.1.1 release discussion.

--
Cell : 425-233-8271

Re: Should we consider a Spark 2.1.1 release?

Ted Yu
In reply to this post by tsh
Timur:
Mind starting a new thread?

I have the same question as you do.
