Spark Improvement Proposals


Re: Spark Improvement Proposals

Matei Zaharia
Administrator
Hi Cody,

I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.

In particular, here are some things that you may or may not consider in scope for SIPs:

- Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.

- Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").

- Use cases: I usually find this very useful in PRDs to better communicate the goals.

- Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).

- Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?

At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs. This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.

Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?

BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.

Matei

On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:

Here's my specific proposal (meta-proposal?)

Spark Improvement Proposals (SIP)


Background:

The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.

When feedback is solicited, it is often as to detailed design specifics, not focused on goals.

When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.

This results in commits that don't fully meet user needs.


Goals:

- Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.

- Ensure that a technically feasible strategy is chosen that is likely to meet the goals.


Rejected Goals:

- SIPs are not for detailed design.  Design by committee doesn't work.

- SIPs are not for every change.  We don't need that much process.


Strategy:

My suggestion is outlined as a Spark Improvement Proposal process documented at

https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md

Specifics of Jira manipulation are an implementation detail we can figure out.

I'm suggesting voting; the need here is for a _clear_ outcome.


Rejected Strategies:

Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.

Historically this has been problematic due to pressure to limit public API changes.


On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
Alright, looks like there is quite a bit of support. We should wait to hear from more people too.

To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.


On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.

On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
+1 to the SIP label as long as it does not slow things down and it targets optimizing efforts, coordination, etc. For example, really small features (assuming they don't touch public interfaces) or refactorings should not need to go through this process, and I hope it will be kept this way. So a guideline doc should be provided, as in the KIP case.

IMHO, aside from tagging things and linking them elsewhere, simply having design docs and prototype implementations in PRs is not something that has failed so far. What is really a pain in many projects out there is discontinuity in the progress of PRs, missing features, and slow reviews, which is understandable to some extent. This is not only about Spark, but things can surely be improved for this project in particular, as already stated.

On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
+1 to adding an SIP label and linking it from the website.  I think it needs

- template that focuses it towards soliciting user goals / non goals
- clear resolution as to which strategy was chosen to pursue.  I'd
recommend a vote.

Matei asked me to clarify what I meant by changing interfaces, I think
it's directly relevant to the SIP idea so I'll clarify here, and split
a thread for the other discussion per Nicholas' request.

I meant changing public user interfaces.  I think the first design is
unlikely to be right, because it's done at a time when you have the
least information.  As a user, I find it considerably more frustrating
to be unable to use a tool to get my job done, than I do having to
make minor changes to my code in order to take advantage of features.
I've seen committers be seriously reluctant to allow changes to
@Experimental code that are needed in order for it to really work
right.  You need to be able to iterate, and if people on both sides of
the fence aren't going to respect that some newer APIs are subject to
change, then why even mark them as such?

Ideally a finished SIP should give me a checklist of things that an
implementation must do, and things that it doesn't need to do.
Contributors/committers should be seriously discouraged from putting
out a version 0.1 that doesn't have at least a prototype
implementation of all those things, especially if they're then going
to argue against interface changes necessary to get the rest of
the things done in the 0.2 version.


On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
> I like the lightweight proposal to add a SIP label.
>
> During Spark 2.0 development, Tom (Graves) and I suggested using the wiki to
> track the list of major changes, but that never really materialized due to
> the overhead. Adding a SIP label on major JIRAs and then linking to them
> prominently on the Spark website makes a lot of sense.
>
>
> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]>
> wrote:
>>
>> For the improvement proposals, I think one major point was to make them
>> really visible to users who are not contributors, so we should do more than
>> sending stuff to dev@. One very lightweight idea is to have a new type of
>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
>> http://spark.apache.org. I also like the idea of SIP and design doc
>> templates (in fact many projects have them).
>>
>> Matei
>>
>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
>>
>> I called Cody last night and talked about some of the topics in his email.
>> It became clear to me Cody genuinely cares about the project.
>>
>> Some of the frustrations come from the success of the project itself
>> becoming very "hot", and it is difficult to get clarity from people who
>> don't dedicate all their time to Spark. In fact, it is in some ways similar
>> to scaling an engineering team in a successful startup: old processes that
>> worked well might not work so well when it gets to a certain size, cultures
>> can get diluted, building culture vs building process, etc.
>>
>> I would also really like to have a more visible process for larger changes,
>> especially major user-facing API changes. Historically we have uploaded design
>> docs for major changes, but not always consistently, and it is difficult to
>> ensure the quality of the docs, due to the volunteer nature of the organization.
>>
>> Some of the more concrete ideas we discussed focus on building a culture
>> to improve clarity:
>>
>> - Process: Large changes should have design docs posted on JIRA. One thing
>> Cody and I didn't discuss, but an idea that just came to me, is that we should
>> create a design doc template for the project and ask everybody to follow it.
>> The design doc template should also explicitly list goals and non-goals, to
>> make design docs more consistent.
>>
>> - Process: Email dev@ to solicit feedback. We have done this with some
>> changes, but again very inconsistently. Just posting something on JIRA isn't
>> sufficient, because there are simply too many JIRAs and the signal gets lost
>> in the noise. While this is generally impossible to enforce because we can't
>> force all volunteers to conform to a process (or they might not even be
>> aware of it), those who are more familiar with the project can help by
>> emailing dev@ when they see something that hasn't been sent there.
>>
>> - Culture: The design doc author(s) should be open to feedback. A design
>> doc should serve as the base for discussion and is by no means the final
>> design. Of course, this does not mean the author has to accept every
>> piece of feedback. They should also be comfortable accepting / rejecting
>> ideas on technical grounds.
>>
>> - Process / Culture: For major ongoing projects, it can be useful to have
>> some monthly Google hangouts that are open to the world. I am actually not
>> sure how well this will work, because of the volunteer nature of the project
>> and the need to adjust for timezones for people across the globe, but it
>> seems worth trying.
>>
>> - Culture: Contributors (including committers) should be more direct in
>> setting expectations, including whether they are working on a specific
>> issue, whether they will be working on a specific issue, and whether an
>> issue, PR, or JIRA should be rejected. Most people I know in this community
>> are nice and don't enjoy telling other people no, but it is often more
>> annoying to a contributor to not know anything than to get a no.
>>
>>
>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]>
>> wrote:
>>>
>>>
>>> Love the idea of a more visible "Spark Improvement Proposal" process that
>>> solicits user input on new APIs. For what it's worth, I don't think
>>> committers are trying to minimize their own work -- every committer cares
>>> about making the software useful for users. However, it is always hard to
>>> get user input and so it helps to have this kind of process. I've certainly
>>> looked at the *IPs a lot in other software I use just to see the biggest
>>> things on the roadmap.
>>>
>>> When you're talking about "changing interfaces", are you talking about
>>> public or internal APIs? I do think many people hate changing public APIs
>>> and I actually think that's for the best of the project. That's a technical
>>> debate, but basically, the worst thing when you're using a piece of software
>>> is that the developers constantly ask you to rewrite your app to update to a
>>> new version (and thus benefit from bug fixes, etc). Cue anyone who's used
>>> Protobuf, or Guava. The "let's get everyone to change their code this
>>> release" model works well within a single large company, but doesn't work
>>> well for a community, which is why nearly all *very* widely used programming
>>> interfaces (I'm talking things like Java standard library, Windows API, etc)
>>> almost *never* break backwards compatibility. All this is done within reason
>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>
>>
>>
>>
>





--
Stavros Kontopoulos
Senior Software Engineer
Lightbend, Inc.
p: +30 6977967274





Re: Spark Improvement Proposals

Nicholas Chammas
  • Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?

I would guess the point is to document alternatives that were discussed and rejected, so that later on people can be pointed to that discussion and the devs don’t have to repeat themselves unnecessarily every time someone comes along and asks “Why didn’t you do this other thing?” That doesn’t mean a rejected proposal can’t later be revisited and the SIP can’t be updated.

For reference from the Python community, PEP 492, a Python Enhancement Proposal for adding async and await syntax and “first-class” coroutines to Python, has a section on rejected ideas for the new syntax. It captures a summary of what the devs discussed, but it doesn’t mean the PEP can’t be updated and a previously rejected proposal can’t be revived.

At least in the Python community, a PEP serves not just as a formal starting point for a proposal (the “real” starting point is usually a discussion on python-ideas or python-dev), but also as documentation of what was agreed on and a living “spec” of sorts. So PEPs sometimes get updated years after they are approved when revisions are agreed upon. PEPs are also intended for wide consumption, vs. bug tracker issues, which the broader Python dev community is not expected to follow closely.

Dunno if we want to follow a similar pattern for Spark, since the project’s needs are different. But the Python community has used PEPs to help organize and steer development since 2000; there are plenty of examples there we can probably take inspiration from.

By the way, can we call these things something other than Spark Improvement Proposals? The acronym, SIP, conflicts with Scala SIPs. Since the Scala and Spark communities have a lot of overlap, we don’t want, for example, names like “SIP-10” to have an ambiguous meaning.

Nick


On Sun, Oct 9, 2016 at 3:34 PM Matei Zaharia <[hidden email]> wrote:
Hi Cody,

I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.


Re: Spark Improvement Proposals

Matei Zaharia
Administrator
Yup, but the example you gave is for alternatives about *user-facing behavior*, not implementation. The current SIP doc describes "strategy" more as implementation strategy. I'm just saying there are different possible goals for these types of docs.

BTW, PEPs and Scala SIPs focus primarily on user-facing behavior, but also require a reference implementation. This is a bit different from what Cody had in mind, I think.

Matei

On Oct 9, 2016, at 1:25 PM, Nicholas Chammas <[hidden email]> wrote:

  • Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?

I would guess the point is to document alternatives that were discussed and rejected, so that later on people can be pointed to that discussion and the devs don’t have to repeat themselves unnecessarily every time someone comes along and asks “Why didn’t you do this other thing?” That doesn’t mean a rejected proposal can’t later be revisited and the SIP can’t be updated.

On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:

Here's my specific proposal (meta-proposal?)

Spark Improvement Proposals (SIP)


Background:

The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.

When feedback is solicited, it is often as to detailed design specifics, not focused on goals.

When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.

This results in commits that don't fully meet user needs.


Goals:

- Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.

- Ensure that a technically feasible strategy is chosen that is likely to meet the goals.


Rejected Goals:

- SIPs are not for detailed design.  Design by committee doesn't work.

- SIPs are not for every change.  We don't need that much process.


Strategy:

My suggestion is outlined as a Spark Improvement Proposal process documented at

https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md

Specifics of Jira manipulation are an implementation detail we can figure out.

I'm suggesting voting; the need here is for a _clear_ outcome.


Rejected Strategies:

Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.

Historically this has been problematic due to pressure to limit public api changes.


On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
Alright, looks like there is quite a bit of support. We should wait to hear from more people too.

To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.


On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.

On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
+1 to the SIP label as long as it does not slow things down and it targets optimizing efforts, coordination etc. For example, really small features (assuming they don't touch public interfaces) or refactorings should not need to go through this process, and I hope it will be kept this way. So a guideline doc should be provided, like in the KIP case.

IMHO, aside from tagging things and linking them elsewhere, simply having design docs and prototype implementations in PRs is not something that has not worked so far. What is really a pain in many projects out there is discontinuity in the progress of PRs, missing features, and slow reviews, which is understandable to some extent. It is not only about Spark, but things can surely be improved for this project in particular, as already stated.

On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
+1 to adding an SIP label and linking it from the website.  I think it needs

- template that focuses it towards soliciting user goals / non goals
- clear resolution as to which strategy was chosen to pursue.  I'd
recommend a vote.

Matei asked me to clarify what I meant by changing interfaces, I think
it's directly relevant to the SIP idea so I'll clarify here, and split
a thread for the other discussion per Nicholas' request.

I meant changing public user interfaces.  I think the first design is
unlikely to be right, because it's done at a time when you have the
least information.  As a user, I find it considerably more frustrating
to be unable to use a tool to get my job done, than I do having to
make minor changes to my code in order to take advantage of features.
I've seen committers be seriously reluctant to allow changes to
@experimental code that are needed in order for it to really work
right.  You need to be able to iterate, and if people on both sides of
the fence aren't going to respect that some newer apis are subject to
change, then why even mark them as such?

Ideally a finished SIP should give me a checklist of things that an
implementation must do, and things that it doesn't need to do.
Contributors/committers should be seriously discouraged from putting
out a version 0.1 that doesn't have at least a prototype
implementation of all those things, especially if they're then going
to argue against interface changes necessary to get the rest of
the things done in the 0.2 version.


On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
> I like the lightweight proposal to add a SIP label.
>
> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to
> track the list of major changes, but that never really materialized due to
> the overhead. Adding a SIP label on major JIRAs and then linking to them
> prominently on the Spark website makes a lot of sense.
>
>
> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]>
> wrote:
>>
>> For the improvement proposals, I think one major point was to make them
>> really visible to users who are not contributors, so we should do more than
>> sending stuff to dev@. One very lightweight idea is to have a new type of
>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
>> http://spark.apache.org. I also like the idea of SIP and design doc
>> templates (in fact many projects have them).
>>
>> Matei
>>
>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
>>
>> I called Cody last night and talked about some of the topics in his email.
>> It became clear to me Cody genuinely cares about the project.
>>
>> Some of the frustrations come from the success of the project itself
>> becoming very "hot", and it is difficult to get clarity from people who
>> don't dedicate all their time to Spark. In fact, it is in some ways similar
>> to scaling an engineering team in a successful startup: old processes that
>> worked well might not work so well when it gets to a certain size, cultures
>> can get diluted, building culture vs building process, etc.
>>
>> I also really like to have a more visible process for larger changes,
>> especially major user facing API changes. Historically we upload design docs
>> for major changes, but it is not always consistent and difficult to quality
>> of the docs, due to the volunteering nature of the organization.
>>
>> Some of the more concrete ideas we discussed focus on building a culture
>> to improve clarity:
>>
>> - Process: Large changes should have design docs posted on JIRA. One thing
>> Cody and I didn't discuss but an idea that just came to me is we should
>> create a design doc template for the project and ask everybody to follow.
>> The design doc template should also explicitly list goals and non-goals, to
>> make design doc more consistent.
>>
>> - Process: Email dev@ to solicit feedback. We have done this with some
>> changes, but again very inconsistently. Just posting something on JIRA isn't
>> sufficient, because there are simply too many JIRAs and the signal gets lost
>> in the noise. While this is generally impossible to enforce because we can't
>> force all volunteers to conform to a process (or they might not even be
>> aware of this),  those who are more familiar with the project can help by
>> emailing the dev@ when they see something that hasn't been.
>>
>> - Culture: The design doc author(s) should be open to feedback. A design
>> doc should serve as the base for discussion and is by no means the final
>> design. Of course, this does not mean the author has to accept every
>> feedback. They should also be comfortable accepting / rejecting ideas on
>> technical grounds.
>>
>> - Process / Culture: For major ongoing projects, it can be useful to have
>> some monthly Google hangouts that are open to the world. I am actually not
>> sure how well this will work, because of the volunteering nature and we need
>> to adjust for timezones for people across the globe, but it seems worth
>> trying.
>>
>> - Culture: Contributors (including committers) should be more direct in
>> setting expectations, including whether they are working on a specific
>> issue, whether they will be working on a specific issue, and whether an
>> issue or pr or jira should be rejected. Most people I know in this community
>> are nice and don't enjoy telling other people no, but it is often more
>> annoying to a contributor to not know anything than getting a no.
>>
>>
>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]>
>> wrote:
>>>
>>>
>>> Love the idea of a more visible "Spark Improvement Proposal" process that
>>> solicits user input on new APIs. For what it's worth, I don't think
>>> committers are trying to minimize their own work -- every committer cares
>>> about making the software useful for users. However, it is always hard to
>>> get user input and so it helps to have this kind of process. I've certainly
>>> looked at the *IPs a lot in other software I use just to see the biggest
>>> things on the roadmap.
>>>
>>> When you're talking about "changing interfaces", are you talking about
>>> public or internal APIs? I do think many people hate changing public APIs
>>> and I actually think that's for the best of the project. That's a technical
>>> debate, but basically, the worst thing when you're using a piece of software
>>> is that the developers constantly ask you to rewrite your app to update to a
>>> new version (and thus benefit from bug fixes, etc). Cue anyone who's used
>>> Protobuf, or Guava. The "let's get everyone to change their code this
>>> release" model works well within a single large company, but doesn't work
>>> well for a community, which is why nearly all *very* widely used programming
>>> interfaces (I'm talking things like Java standard library, Windows API, etc)
>>> almost *never* break backwards compatibility. All this is done within reason
>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>
>>
>>
>>
>




--
Stavros Kontopoulos
Senior Software Engineer
Lightbend, Inc.
p:  +30 6977967274






Re: Spark Improvement Proposals

Nicholas Chammas

Oh, hmm… I guess I’m a little confused on the relation between Cody’s email and the document he linked to, which says:

https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md#when

SIPs should be used for significant user-facing or cross-cutting changes, not day-to-day improvements. When in doubt, if a committer thinks a change needs an SIP, it does.

Nick


On Sun, Oct 9, 2016 at 4:40 PM Matei Zaharia <[hidden email]> wrote:
Yup, but the example you gave is for alternatives about *user-facing behavior*, not implementation. The current SIP doc describes "strategy" more as implementation strategy. I'm just saying there are different possible goals for these types of docs.

BTW, PEPs and Scala SIPs focus primarily on user-facing behavior, but also require a reference implementation. This is a bit different from what Cody had in mind, I think.


Matei

On Oct 9, 2016, at 1:25 PM, Nicholas Chammas <[hidden email]> wrote:

  • Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?

I would guess the point is to document alternatives that were discussed and rejected, so that later on people can be pointed to that discussion and the devs don’t have to repeat themselves unnecessarily every time someone comes along and asks “Why didn’t you do this other thing?” That doesn’t mean a rejected proposal can’t later be revisited and the SIP can’t be updated.

For reference from the Python community, PEP 492, a Python Enhancement Proposal for adding async and await syntax and “first-class” coroutines to Python, has a section on rejected ideas for the new syntax. It captures a summary of what the devs discussed, but it doesn’t mean the PEP can’t be updated and a previously rejected proposal can’t be revived.

At least in the Python community, a PEP serves not just as a formal starting point for a proposal (the “real” starting point is usually a discussion on python-ideas or python-dev), but also as documentation of what was agreed on and a living “spec” of sorts. So PEPs sometimes get updated years after they are approved when revisions are agreed upon. PEPs are also intended for wide consumption, vs. bug tracker issues, which the broader Python dev community is not expected to follow closely.

Dunno if we want to follow a similar pattern for Spark, since the project’s needs are different. But the Python community has used PEPs to help organize and steer development since 2000; there are plenty of examples there we can probably take inspiration from.

By the way, can we call these things something other than Spark Improvement Proposals? The acronym, SIP, conflicts with Scala SIPs. Since the Scala and Spark communities have a lot of overlap, we don’t want, for example, names like “SIP-10” to have an ambiguous meaning.

Nick









Re: Spark Improvement Proposals

Cody Koeninger-2
If there's confusion there, the document is specifically what I'm proposing.  The email is just by way of introduction.

On Sun, Oct 9, 2016 at 3:47 PM, Nicholas Chammas <[hidden email]> wrote:

Oh, hmm… I guess I’m a little confused on the relation between Cody’s email and the document he linked to, which says:

https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md#when

SIPs should be used for significant user-facing or cross-cutting changes, not day-to-day improvements. When in doubt, if a committer thinks a change needs an SIP, it does.

Nick


On Sun, Oct 9, 2016 at 4:40 PM Matei Zaharia <[hidden email]> wrote:
Yup, but the example you gave is for alternatives about *user-facing behavior*, not implementation. The current SIP doc describes "strategy" more as implementation strategy. I'm just saying there are different possible goals for these types of docs.

BTW, PEPs and Scala SIPs focus primarily on user-facing behavior, but also require a reference implementation. This is a bit different from what Cody had in mind, I think.


Matei

On Oct 9, 2016, at 1:25 PM, Nicholas Chammas <[hidden email]> wrote:

  • Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?

I would guess the point is to document alternatives that were discussed and rejected, so that later on people can be pointed to that discussion and the devs don’t have to repeat themselves unnecessarily every time someone comes along and asks “Why didn’t you do this other thing?” That doesn’t mean a rejected proposal can’t later be revisited and the SIP can’t be updated.

For reference from the Python community, PEP 492, a Python Enhancement Proposal for adding async and await syntax and “first-class” coroutines to Python, has a section on rejected ideas for the new syntax. It captures a summary of what the devs discussed, but it doesn’t mean the PEP can’t be updated and a previously rejected proposal can’t be revived.

At least in the Python community, a PEP serves not just as a formal starting point for a proposal (the “real” starting point is usually a discussion on python-ideas or python-dev), but also as documentation of what was agreed on and a living “spec” of sorts. So PEPs sometimes get updated years after they are approved when revisions are agreed upon. PEPs are also intended for wide consumption, vs. bug tracker issues, which the broader Python dev community is not expected to follow closely.

Dunno if we want to follow a similar pattern for Spark, since the project’s needs are different. But the Python community has used PEPs to help organize and steer development since 2000; there are plenty of examples there we can probably take inspiration from.

By the way, can we call these things something other than Spark Improvement Proposals? The acronym, SIP, conflicts with Scala SIPs. Since the Scala and Spark communities have a lot of overlap, we don’t want, for example, names like “SIP-10” to have an ambiguous meaning.

Nick


On Sun, Oct 9, 2016 at 3:34 PM Matei Zaharia <[hidden email]> wrote:
Hi Cody,

I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are  they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.

In particular, here are some things that you may or may not consider in scope for SIPs:

- Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.

- Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").

- Use cases: I usually find this very useful in PRDs to better communicate the goals.

- Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).

- Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?

At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs. This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.

Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?

BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.

Matei

On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:

Here's my specific proposal (meta-proposal?)

Spark Improvement Proposals (SIP)


Background:

The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.

When feedback is solicited, it is often as to detailed design specifics, not focused on goals.

When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.

This results in commits that don't fully meet user needs.


Goals:

- Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.

- Ensure that a technically feasible strategy is chosen that is likely to meet the goals.


Rejected Goals:

- SIPs are not for detailed design.  Design by committee doesn't work.

- SIPs are not for every change.  We don't need that much process.


Strategy:

My suggestion is outlined as a Spark Improvement Proposal process documented at

https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md

Specifics of Jira manipulation are an implementation detail we can figure out.

I'm suggesting voting; the need here is for a _clear_ outcome.


Rejected Strategies:

Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.

Historically this has been problematic due to pressure to limit public api changes.


On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
Alright, looks like there is quite a bit of support. We should wait to hear from more people too.

To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.


On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.

On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
+1 to the SIP label as long as it does not slow things down and it targets optimizing efforts, coordination, etc. For example, really small features (assuming they don't touch public interfaces) or refactorings should not need to go through this process, and I hope it will be kept this way. So a guideline doc should be provided, as in the KIP case.

IMHO, aside from tagging things and linking them elsewhere, simply having design docs and prototype implementations in PRs is not something that has failed so far. What is really a pain in many projects out there is discontinuity in the progress of PRs, missing features, and slow reviews, which is understandable to some extent... It is not only about Spark, but things can surely be improved for this project in particular, as already stated.

On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
+1 to adding an SIP label and linking it from the website.  I think it needs

- template that focuses it towards soliciting user goals / non goals
- clear resolution as to which strategy was chosen to pursue.  I'd
recommend a vote.

Matei asked me to clarify what I meant by changing interfaces. I think
it's directly relevant to the SIP idea, so I'll clarify here and split
a thread for the other discussion per Nicholas' request.

I meant changing public user interfaces.  I think the first design is
unlikely to be right, because it's done at a time when you have the
least information.  As a user, I find it considerably more frustrating
to be unable to use a tool to get my job done, than I do having to
make minor changes to my code in order to take advantage of features.
I've seen committers be seriously reluctant to allow changes to
@experimental code that are needed in order for it to really work
right.  You need to be able to iterate, and if people on both sides of
the fence aren't going to respect that some newer apis are subject to
change, then why even mark them as such?

Ideally a finished SIP should give me a checklist of things that an
implementation must do, and things that it doesn't need to do.
Contributors/committers should be seriously discouraged from putting
out a version 0.1 that doesn't have at least a prototype
implementation of all those things, especially if they're then going
to argue against interface changes necessary to get the rest of
the things done in the 0.2 version.


On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
> I like the lightweight proposal to add a SIP label.
>
> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to
> track the list of major changes, but that never really materialized due to
> the overhead. Adding a SIP label on major JIRAs and then linking to them
> prominently on the Spark website makes a lot of sense.
>
>
> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]>
> wrote:
>>
>> For the improvement proposals, I think one major point was to make them
>> really visible to users who are not contributors, so we should do more than
>> sending stuff to dev@. One very lightweight idea is to have a new type of
>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
>> http://spark.apache.org. I also like the idea of SIP and design doc
>> templates (in fact many projects have them).
>>
>> Matei
>>
>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
>>
>> I called Cody last night and talked about some of the topics in his email.
>> It became clear to me Cody genuinely cares about the project.
>>
>> Some of the frustrations come from the success of the project itself
>> becoming very "hot", and it is difficult to get clarity from people who
>> don't dedicate all their time to Spark. In fact, it is in some ways similar
>> to scaling an engineering team in a successful startup: old processes that
>> worked well might not work so well when it gets to a certain size, cultures
>> can get diluted, building culture vs building process, etc.
>>
>> I also really like to have a more visible process for larger changes,
>> especially major user facing API changes. Historically we upload design docs
>> for major changes, but this is not always consistent, and it is difficult to
>> ensure the quality of the docs, due to the volunteering nature of the organization.
>>
>> Some of the more concrete ideas we discussed focus on building a culture
>> to improve clarity:
>>
>> - Process: Large changes should have design docs posted on JIRA. One thing
>> Cody and I didn't discuss but an idea that just came to me is we should
>> create a design doc template for the project and ask everybody to follow.
>> The design doc template should also explicitly list goals and non-goals, to
>> make design doc more consistent.
>>
>> - Process: Email dev@ to solicit feedback. We have done this with some
>> changes, but again very inconsistently. Just posting something on JIRA isn't
>> sufficient, because there are simply too many JIRAs and the signal gets lost
>> in the noise. While this is generally impossible to enforce because we can't
>> force all volunteers to conform to a process (or they might not even be
>> aware of this),  those who are more familiar with the project can help by
>> emailing dev@ when they see something that hasn't been sent there.
>>
>> - Culture: The design doc author(s) should be open to feedback. A design
>> doc should serve as the base for discussion and is by no means the final
>> design. Of course, this does not mean the author has to accept every piece of
>> feedback. They should also be comfortable accepting / rejecting ideas on
>> technical grounds.
>>
>> - Process / Culture: For major ongoing projects, it can be useful to have
>> some monthly Google hangouts that are open to the world. I am actually not
>> sure how well this will work, because of the volunteering nature and we need
>> to adjust for timezones for people across the globe, but it seems worth
>> trying.
>>
>> - Culture: Contributors (including committers) should be more direct in
>> setting expectations, including whether they are working on a specific
>> issue, whether they will be working on a specific issue, and whether an
>> issue or pr or jira should be rejected. Most people I know in this community
>> are nice and don't enjoy telling other people no, but it is often more
>> annoying to a contributor to not know anything than getting a no.
>>
>>
>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]>
>> wrote:
>>>
>>>
>>> Love the idea of a more visible "Spark Improvement Proposal" process that
>>> solicits user input on new APIs. For what it's worth, I don't think
>>> committers are trying to minimize their own work -- every committer cares
>>> about making the software useful for users. However, it is always hard to
>>> get user input and so it helps to have this kind of process. I've certainly
>>> looked at the *IPs a lot in other software I use just to see the biggest
>>> things on the roadmap.
>>>
>>> When you're talking about "changing interfaces", are you talking about
>>> public or internal APIs? I do think many people hate changing public APIs
>>> and I actually think that's for the best of the project. That's a technical
>>> debate, but basically, the worst thing when you're using a piece of software
>>> is that the developers constantly ask you to rewrite your app to update to a
>>> new version (and thus benefit from bug fixes, etc). Cue anyone who's used
>>> Protobuf, or Guava. The "let's get everyone to change their code this
>>> release" model works well within a single large company, but doesn't work
>>> well for a community, which is why nearly all *very* widely used programming
>>> interfaces (I'm talking things like Java standard library, Windows API, etc)
>>> almost *never* break backwards compatibility. All this is done within reason
>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>
>>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]




--
Stavros Kontopoulos
Senior Software Engineer
Lightbend, Inc.
p:  +30 6977967274







Re: Spark Improvement Proposals

Cody Koeninger-2
In reply to this post by Matei Zaharia
So to focus the discussion on the specific strategy I'm suggesting,
documented at

https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md

"Goals: What must this allow people to do, that they can't currently?"

Is it unclear that this is focusing specifically on people-visible behavior?

Rejected goals -  are important because otherwise people keep trying
to argue about scope.  Of course you can change things later with a
different SIP and different vote, the point is to focus.

Use cases - are something that people are going to bring up in
discussion.  If they aren't clearly documented as a goal ("This must
allow me to connect using SSL"), they should be added.

Internal architecture - if the people who need specific behavior are
implementers of other parts of the system, that's fine.

Rejected strategies - If you have none of these, you have no evidence
that the proponent didn't just go with the first thing they had in
mind (or have already implemented), which is a big problem currently.
Approval isn't binding as to specifics of implementation, so these
aren't handcuffs.  The goals are the contract, the strategy is
evidence that contract can actually be met.

Design docs - I'm not touching design docs.  The markdown file I
linked specifically says of the strategy section "This is not a full
design document."  Is this unclear?  Design docs can be worked on
obviously, but that's not what I'm concerned with here.




On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:

> Hi Cody,
>
> I think this would be a lot more concrete if we had a more detailed template
> for SIPs. Right now, it's not super clear what's in scope -- e.g. are  they
> a way to solicit feedback on the user-facing behavior or on the internals?
> "Goals" can cover both things. I've been thinking of SIPs more as Product
> Requirements Docs (PRDs), which focus on *what* a code change should do as
> opposed to how.
>
> In particular, here are some things that you may or may not consider in
> scope for SIPs:
>
> - Goals and non-goals: This is definitely in scope, and IMO should focus on
> user-visible behavior (e.g. "system supports SQL window functions" or
> "system continues working if one node fails"). BTW I wouldn't say "rejected
> goals" because some of them might become goals later, so we're not
> definitively rejecting them.
>
> - Public API: Probably should be included in most SIPs unless it's too large
> to fully specify then (e.g. "let's add an ML library").
>
> - Use cases: I usually find this very useful in PRDs to better communicate
> the goals.
>
> - Internal architecture: This is usually *not* a thing users can easily
> comment on and it sounds more like a design doc item. Of course it's
> important to show that the SIP is feasible to implement. One exception,
> however, is that I think we'll have some SIPs primarily on internals (e.g.
> if somebody wants to refactor Spark's query optimizer or something).
>
> - Rejected strategies: I personally wouldn't put this, because what's the
> point of voting to reject a strategy before you've really begun designing
> and implementing something? What if you discover that the strategy is
> actually better when you start doing stuff?
>
> At a super high level, it depends on whether you want the SIPs to be PRDs
> for getting some quick feedback on the goals of a feature before it is
> designed, or something more like full-fledged design docs (just a more
> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
> actually seem to be more like design docs. This can work too but it does
> require more work from the proposer and it can lead to the same problems you
> mentioned with people already having a design and implementation in mind.
>
> Basically, the question is, are you trying to iterate faster on design by
> adding a step for user feedback earlier? Or are you just trying to make
> design docs for key features more visible (and their approval more formal)?
>
> BTW note that in either case, I'd like to have a template for design docs
> too, which should also include goals. I think that would've avoided some of
> the issues you brought up.
>
> Matei


Re: Spark Improvement Proposals

Cody Koeninger-2
Regarding name, if the SIP overlap is a concern, we can pick a different name.
My tongue-in-cheek suggestion would be
Spark Lightweight Improvement process (SPARKLI)

On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger <[hidden email]> wrote:

> So to focus the discussion on the specific strategy I'm suggesting,
> documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> "Goals: What must this allow people to do, that they can't currently?"
>
> Is it unclear that this is focusing specifically on people-visible behavior?
>
> Rejected goals -  are important because otherwise people keep trying
> to argue about scope.  Of course you can change things later with a
> different SIP and different vote, the point is to focus.
>
> Use cases - are something that people are going to bring up in
> discussion.  If they aren't clearly documented as a goal ("This must
> allow me to connect using SSL"), they should be added.
>
> Internal architecture - if the people who need specific behavior are
> implementers of other parts of the system, that's fine.
>
> Rejected strategies - If you have none of these, you have no evidence
> that the proponent didn't just go with the first thing they had in
> mind (or have already implemented), which is a big problem currently.
> Approval isn't binding as to specifics of implementation, so these
> aren't handcuffs.  The goals are the contract, the strategy is
> evidence that contract can actually be met.
>
> Design docs - I'm not touching design docs.  The markdown file I
> linked specifically says of the strategy section "This is not a full
> design document."  Is this unclear?  Design docs can be worked on
> obviously, but that's not what I'm concerned with here.
>
>
>
>
> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
>> Hi Cody,
>>
>> I think this would be a lot more concrete if we had a more detailed template
>> for SIPs. Right now, it's not super clear what's in scope -- e.g. are  they
>> a way to solicit feedback on the user-facing behavior or on the internals?
>> "Goals" can cover both things. I've been thinking of SIPs more as Product
>> Requirements Docs (PRDs), which focus on *what* a code change should do as
>> opposed to how.
>>
>> In particular, here are some things that you may or may not consider in
>> scope for SIPs:
>>
>> - Goals and non-goals: This is definitely in scope, and IMO should focus on
>> user-visible behavior (e.g. "system supports SQL window functions" or
>> "system continues working if one node fails"). BTW I wouldn't say "rejected
>> goals" because some of them might become goals later, so we're not
>> definitively rejecting them.
>>
>> - Public API: Probably should be included in most SIPs unless it's too large
>> to fully specify then (e.g. "let's add an ML library").
>>
>> - Use cases: I usually find this very useful in PRDs to better communicate
>> the goals.
>>
>> - Internal architecture: This is usually *not* a thing users can easily
>> comment on and it sounds more like a design doc item. Of course it's
>> important to show that the SIP is feasible to implement. One exception,
>> however, is that I think we'll have some SIPs primarily on internals (e.g.
>> if somebody wants to refactor Spark's query optimizer or something).
>>
>> - Rejected strategies: I personally wouldn't put this, because what's the
>> point of voting to reject a strategy before you've really begun designing
>> and implementing something? What if you discover that the strategy is
>> actually better when you start doing stuff?
>>
>> At a super high level, it depends on whether you want the SIPs to be PRDs
>> for getting some quick feedback on the goals of a feature before it is
>> designed, or something more like full-fledged design docs (just a more
>> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
>> actually seem to be more like design docs. This can work too but it does
>> require more work from the proposer and it can lead to the same problems you
>> mentioned with people already having a design and implementation in mind.
>>
>> Basically, the question is, are you trying to iterate faster on design by
>> adding a step for user feedback earlier? Or are you just trying to make
>> design docs for key features more visible (and their approval more formal)?
>>
>> BTW note that in either case, I'd like to have a template for design docs
>> too, which should also include goals. I think that would've avoided some of
>> the issues you brought up.
>>
>> Matei
>>
>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>
>> Here's my specific proposal (meta-proposal?)
>>
>> Spark Improvement Proposals (SIP)
>>
>>
>> Background:
>>
>> The current problem is that design and implementation of large features are
>> often done in private, before soliciting user feedback.
>>
>> When feedback is solicited, it is often as to detailed design specifics, not
>> focused on goals.
>>
>> When implementation does take place after design, there is often
>> disagreement as to what goals are or are not in scope.
>>
>> This results in commits that don't fully meet user needs.
>>
>>
>> Goals:
>>
>> - Ensure user, contributor, and committer goals are clearly identified and
>> agreed upon, before implementation takes place.
>>
>> - Ensure that a technically feasible strategy is chosen that is likely to
>> meet the goals.
>>
>>
>> Rejected Goals:
>>
>> - SIPs are not for detailed design.  Design by committee doesn't work.
>>
>> - SIPs are not for every change.  We don't need that much process.
>>
>>
>> Strategy:
>>
>> My suggestion is outlined as a Spark Improvement Proposal process documented
>> at
>>
>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>
>> Specifics of Jira manipulation are an implementation detail we can figure
>> out.
>>
>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>
>>
>> Rejected Strategies:
>>
>> Having someone who understands the problem implement it first works, but
>> only if significant iteration after user feedback is allowed.
>>
>> Historically this has been problematic due to pressure to limit public api
>> changes.
>>
>>
>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
>>>
>>> Alright, looks like there is quite a bit of support. We should wait to
>>> hear from more people too.
>>>
>>> To push this forward, Cody and I will be working together in the next
>>> couple of weeks to come up with a concrete, detailed proposal on what this
>>> entails, and then we can discuss the specific proposal as well.
>>>
>>>
>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
>>>>
>>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>>> user-facing or cross-cutting changes, not minor feature adds.
>>>>
>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>>> <[hidden email]> wrote:
>>>>>
>>>>> +1 to the SIP label as long as it does not slow things down and it
>>>>> targets optimizing efforts, coordination, etc. For example, really small
>>>>> features (assuming they don't touch public interfaces) or refactorings
>>>>> should not need to go through this process, and I hope it will be kept
>>>>> this way. So a guideline doc should be provided, like in the KIP case.
>>>>>
>>>>> IMHO, aside from tagging things and linking them elsewhere, simply
>>>>> having design docs and prototype implementations in PRs is not something
>>>>> that has failed so far. What is really a pain in many projects out there
>>>>> is discontinuity in the progress of PRs, missing features, and slow reviews,
>>>>> which is understandable to some extent. It is not only about Spark, but
>>>>> things can surely be improved for this project in particular, as already stated.
>>>>>
>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>> +1 to adding an SIP label and linking it from the website.  I think it
>>>>>> needs
>>>>>>
>>>>>> - template that focuses it towards soliciting user goals / non goals
>>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>>>>> recommend a vote.
>>>>>>
>>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
>>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
>>>>>> a thread for the other discussion per Nicholas' request.
>>>>>>
>>>>>> I meant changing public user interfaces.  I think the first design is
>>>>>> unlikely to be right, because it's done at a time when you have the
>>>>>> least information.  As a user, I find it considerably more frustrating
>>>>>> to be unable to use a tool to get my job done, than I do having to
>>>>>> make minor changes to my code in order to take advantage of features.
>>>>>> I've seen committers be seriously reluctant to allow changes to
>>>>>> @experimental code that are needed in order for it to really work
>>>>>> right.  You need to be able to iterate, and if people on both sides of
>>>>>> the fence aren't going to respect that some newer apis are subject to
>>>>>> change, then why even mark them as such?
>>>>>>
>>>>>> Ideally a finished SIP should give me a checklist of things that an
>>>>>> implementation must do, and things that it doesn't need to do.
>>>>>> Contributors/committers should be seriously discouraged from putting
>>>>>> out a version 0.1 that doesn't have at least a prototype
>>>>>> implementation of all those things, especially if they're then going
>>>>>> to argue against interface changes necessary to get the rest of
>>>>>> the things done in the 0.2 version.
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>>>>> wrote:
>>>>>> > I like the lightweight proposal to add a SIP label.
>>>>>> >
>>>>>> > During Spark 2.0 development, Tom (Graves) and I suggested using wiki
>>>>>> > to
>>>>>> > track the list of major changes, but that never really materialized
>>>>>> > due to
>>>>>> > the overhead. Adding a SIP label on major JIRAs and then linking to them
>>>>>> > prominently on the Spark website makes a lot of sense.
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>>>>> > <[hidden email]>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> For the improvement proposals, I think one major point was to make
>>>>>> >> them
>>>>>> >> really visible to users who are not contributors, so we should do
>>>>>> >> more than
>>>>>> >> sending stuff to dev@. One very lightweight idea is to have a new
>>>>>> >> type of
>>>>>> >> JIRA called a SIP and have a link to a filter that shows all such
>>>>>> >> JIRAs from
>>>>>> >> http://spark.apache.org. I also like the idea of SIP and design doc
>>>>>> >> templates (in fact many projects have them).
>>>>>> >>
>>>>>> >> Matei
>>>>>> >>
>>>>>> >> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >> I called Cody last night and talked about some of the topics in his
>>>>>> >> email.
>>>>>> >> It became clear to me Cody genuinely cares about the project.
>>>>>> >>
>>>>>> >> Some of the frustrations come from the success of the project itself
>>>>>> >> becoming very "hot", and it is difficult to get clarity from people
>>>>>> >> who
>>>>>> >> don't dedicate all their time to Spark. In fact, it is in some ways
>>>>>> >> similar
>>>>>> >> to scaling an engineering team in a successful startup: old
>>>>>> >> processes that
>>>>>> >> worked well might not work so well when it gets to a certain size,
>>>>>> >> cultures
>>>>>> >> can get diluted, building culture vs building process, etc.
>>>>>> >>
>>>>>> >> I also really like to have a more visible process for larger
>>>>>> >> changes,
>>>>>> >> especially major user facing API changes. Historically we upload
>>>>>> >> design docs
>>>>>> >> for major changes, but this is not always done consistently, and it is difficult to
>>>>>> >> ensure the quality
>>>>>> >> of the docs, due to the volunteer nature of the organization.
>>>>>> >>
>>>>>> >> Some of the more concrete ideas we discussed focus on building a
>>>>>> >> culture
>>>>>> >> to improve clarity:
>>>>>> >>
>>>>>> >> - Process: Large changes should have design docs posted on JIRA. One
>>>>>> >> thing
>>>>>> >> Cody and I didn't discuss but an idea that just came to me is we
>>>>>> >> should
>>>>>> >> create a design doc template for the project and ask everybody to
>>>>>> >> follow.
>>>>>> >> The design doc template should also explicitly list goals and
>>>>>> >> non-goals, to
>>>>>> >> make design doc more consistent.
>>>>>> >>
>>>>>> >> - Process: Email dev@ to solicit feedback. We have done this with
>>>>>> >> some
>>>>>> >> changes, but again very inconsistent. Just posting something on JIRA
>>>>>> >> isn't
>>>>>> >> sufficient, because there are simply too many JIRAs and the signal
>>>>>> >> gets lost
>>>>>> >> in the noise. While this is generally impossible to enforce because
>>>>>> >> we can't
>>>>>> >> force all volunteers to conform to a process (or they might not even
>>>>>> >> be
>>>>>> >> aware of this),  those who are more familiar with the project can
>>>>>> >> help by
>>>>>> >> emailing dev@ when they see something that hasn't been sent there.
>>>>>> >>
>>>>>> >> - Culture: The design doc author(s) should be open to feedback. A
>>>>>> >> design
>>>>>> >> doc should serve as the base for discussion and is by no means the
>>>>>> >> final
>>>>>> >> design. Of course, this does not mean the author has to accept every
>>>>>> >> piece of feedback. They should also be comfortable accepting / rejecting
>>>>>> >> ideas on
>>>>>> >> technical grounds.
>>>>>> >>
>>>>>> >> - Process / Culture: For major ongoing projects, it can be useful to
>>>>>> >> have
>>>>>> >> some monthly Google hangouts that are open to the world. I am
>>>>>> >> actually not
>>>>>> >> sure how well this will work, because of the volunteering nature and
>>>>>> >> we need
>>>>>> >> to adjust for timezones for people across the globe, but it seems
>>>>>> >> worth
>>>>>> >> trying.
>>>>>> >>
>>>>>> >> - Culture: Contributors (including committers) should be more direct
>>>>>> >> in
>>>>>> >> setting expectations, including whether they are working on a
>>>>>> >> specific
>>>>>> >> issue, whether they will be working on a specific issue, and whether
>>>>>> >> an
>>>>>> >> issue or pr or jira should be rejected. Most people I know in this
>>>>>> >> community
>>>>>> >> are nice and don't enjoy telling other people no, but it is often
>>>>>> >> more
>>>>>> >> annoying to a contributor to not know anything than getting a no.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>>>>> >> <[hidden email]>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Love the idea of a more visible "Spark Improvement Proposal"
>>>>>> >>> process that
>>>>>> >>> solicits user input on new APIs. For what it's worth, I don't think
>>>>>> >>> committers are trying to minimize their own work -- every committer
>>>>>> >>> cares
>>>>>> >>> about making the software useful for users. However, it is always
>>>>>> >>> hard to
>>>>>> >>> get user input and so it helps to have this kind of process. I've
>>>>>> >>> certainly
>>>>>> >>> looked at the *IPs a lot in other software I use just to see the
>>>>>> >>> biggest
>>>>>> >>> things on the roadmap.
>>>>>> >>>
>>>>>> >>> When you're talking about "changing interfaces", are you talking
>>>>>> >>> about
>>>>>> >>> public or internal APIs? I do think many people hate changing
>>>>>> >>> public APIs
>>>>>> >>> and I actually think that's for the best of the project. That's a
>>>>>> >>> technical
>>>>>> >>> debate, but basically, the worst thing when you're using a piece of
>>>>>> >>> software
>>>>>> >>> is that the developers constantly ask you to rewrite your app to
>>>>>> >>> update to a
>>>>>> >>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>>>>> >>> who's used
>>>>>> >>> Protobuf, or Guava. The "let's get everyone to change their code
>>>>>> >>> this
>>>>>> >>> release" model works well within a single large company, but
>>>>>> >>> doesn't work
>>>>>> >>> well for a community, which is why nearly all *very* widely used
>>>>>> >>> programming
>>>>>> >>> interfaces (I'm talking things like Java standard library, Windows
>>>>>> >>> API, etc)
>>>>>> >>> almost *never* break backwards compatibility. All this is done
>>>>>> >>> within reason
>>>>>> >>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: [hidden email]
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stavros Kontopoulos
>>>>> Senior Software Engineer
>>>>> Lightbend, Inc.
>>>>> p:  +30 6977967274
>>>>> e: [hidden email]
>>>>>
>>>>>
>>>>
>>>
>>
>>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Spark Improvement Proposals

Matei Zaharia
Administrator
In reply to this post by Cody Koeninger-2
Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:

- Goals need to be about user-facing behavior ("people" is broad)

- I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".

- For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.

- I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).

Matei

> On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>
> So to focus the discussion on the specific strategy I'm suggesting,
> documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> "Goals: What must this allow people to do, that they can't currently?"
>
> Is it unclear that this is focusing specifically on people-visible behavior?
>
> Rejected goals -  are important because otherwise people keep trying
> to argue about scope.  Of course you can change things later with a
> different SIP and different vote, the point is to focus.
>
> Use cases - are something that people are going to bring up in
> discussion.  If they aren't clearly documented as a goal ("This must
> allow me to connect using SSL"), they should be added.
>
> Internal architecture - if the people who need specific behavior are
> implementers of other parts of the system, that's fine.
>
> Rejected strategies - If you have none of these, you have no evidence
> that the proponent didn't just go with the first thing they had in
> mind (or have already implemented), which is a big problem currently.
> Approval isn't binding as to specifics of implementation, so these
> aren't handcuffs.  The goals are the contract, the strategy is
> evidence that contract can actually be met.
>
> Design docs - I'm not touching design docs.  The markdown file I
> linked specifically says of the strategy section "This is not a full
> design document."  Is this unclear?  Design docs can be worked on
> obviously, but that's not what I'm concerned with here.




Re: Spark Improvement Proposals

Ofir Manor
This is a great discussion!
Maybe you could have a look at Kafka's process - it also uses Rejected Alternatives and I personally find it very clear actually (the link also leads to all KIPs):
Cody - maybe you could take one of the open issues and write a sample proposal? A concrete example might make it clearer for those who see this for the first time. Maybe the Kafka offset discussion or some other Kafka/Structured Streaming open issue? Would that be helpful?

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: [hidden email]


On Mon, Oct 10, 2016 at 12:36 AM, Matei Zaharia <[hidden email]> wrote:
Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:

- Goals needs to be about user-facing behavior ("people" is broad)

- I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".

- For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.

- I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).

Matei






Re: Spark Improvement Proposals

Cody Koeninger-2
In reply to this post by Matei Zaharia

Users instead of people, sure.  Committers and contributors are (or at least should be) a subset of users.

Non goals, sure. I don't care what the name is, but we need to clearly say e.g. 'no we are not maintaining compatibility with XYZ right now'.

API, what I care most about is whether it allows me to accomplish the goals. Arguing about how ugly or pretty it is can be saved for design/implementation imho.

Strategy, this is necessary because otherwise goals can be out of line with reality.  Don't propose goals you don't have at least some idea of how to implement.

Rejected strategies, given that committers are the only ones I'm saying should formally submit SPARKLIs or SIPs, if they put junk in a required section then slap them down for it and tell them to fix it.


On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:

- Goals needs to be about user-facing behavior ("people" is broad)

- I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".

- For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.

- I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).

Matei

> On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>
> So to focus the discussion on the specific strategy I'm suggesting,
> documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> "Goals: What must this allow people to do, that they can't currently?"
>
> Is it unclear that this is focusing specifically on people-visible behavior?
>
> Rejected goals -  are important because otherwise people keep trying
> to argue about scope.  Of course you can change things later with a
> different SIP and different vote, the point is to focus.
>
> Use cases - are something that people are going to bring up in
> discussion.  If they aren't clearly documented as a goal ("This must
> allow me to connect using SSL"), they should be added.
>
> Internal architecture - if the people who need specific behavior are
> implementers of other parts of the system, that's fine.
>
> Rejected strategies - If you have none of these, you have no evidence
> that the proponent didn't just go with the first thing they had in
> mind (or have already implemented), which is a big problem currently.
> Approval isn't binding as to specifics of implementation, so these
> aren't handcuffs.  The goals are the contract, the strategy is
> evidence that contract can actually be met.
>
> Design docs - I'm not touching design docs.  The markdown file I
> linked specifically says of the strategy section "This is not a full
> design document."  Is this unclear?  Design docs can be worked on
> obviously, but that's not what I'm concerned with here.
>
>
>
>
> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
>> Hi Cody,
>>
>> I think this would be a lot more concrete if we had a more detailed template
>> for SIPs. Right now, it's not super clear what's in scope -- e.g. are  they
>> a way to solicit feedback on the user-facing behavior or on the internals?
>> "Goals" can cover both things. I've been thinking of SIPs more as Product
>> Requirements Docs (PRDs), which focus on *what* a code change should do as
>> opposed to how.
>>
>> In particular, here are some things that you may or may not consider in
>> scope for SIPs:
>>
>> - Goals and non-goals: This is definitely in scope, and IMO should focus on
>> user-visible behavior (e.g. "system supports SQL window functions" or
>> "system continues working if one node fails"). BTW I wouldn't say "rejected
>> goals" because some of them might become goals later, so we're not
>> definitively rejecting them.
>>
>> - Public API: Probably should be included in most SIPs unless it's too large
>> to fully specify then (e.g. "let's add an ML library").
>>
>> - Use cases: I usually find this very useful in PRDs to better communicate
>> the goals.
>>
>> - Internal architecture: This is usually *not* a thing users can easily
>> comment on and it sounds more like a design doc item. Of course it's
>> important to show that the SIP is feasible to implement. One exception,
>> however, is that I think we'll have some SIPs primarily on internals (e.g.
>> if somebody wants to refactor Spark's query optimizer or something).
>>
>> - Rejected strategies: I personally wouldn't put this, because what's the
>> point of voting to reject a strategy before you've really begun designing
>> and implementing something? What if you discover that the strategy is
>> actually better when you start doing stuff?
>>
>> At a super high level, it depends on whether you want the SIPs to be PRDs
>> for getting some quick feedback on the goals of a feature before it is
>> designed, or something more like full-fledged design docs (just a more
>> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
>> actually seem to be more like design docs. This can work too but it does
>> require more work from the proposer and it can lead to the same problems you
>> mentioned with people already having a design and implementation in mind.
>>
>> Basically, the question is, are you trying to iterate faster on design by
>> adding a step for user feedback earlier? Or are you just trying to make
>> design docs for key features more visible (and their approval more formal)?
>>
>> BTW note that in either case, I'd like to have a template for design docs
>> too, which should also include goals. I think that would've avoided some of
>> the issues you brought up.
>>
>> Matei
>>
>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>
>> Here's my specific proposal (meta-proposal?)
>>
>> Spark Improvement Proposals (SIP)
>>
>>
>> Background:
>>
>> The current problem is that design and implementation of large features are
>> often done in private, before soliciting user feedback.
>>
>> When feedback is solicited, it is often as to detailed design specifics, not
>> focused on goals.
>>
>> When implementation does take place after design, there is often
>> disagreement as to what goals are or are not in scope.
>>
>> This results in commits that don't fully meet user needs.
>>
>>
>> Goals:
>>
>> - Ensure user, contributor, and committer goals are clearly identified and
>> agreed upon, before implementation takes place.
>>
>> - Ensure that a technically feasible strategy is chosen that is likely to
>> meet the goals.
>>
>>
>> Rejected Goals:
>>
>> - SIPs are not for detailed design.  Design by committee doesn't work.
>>
>> - SIPs are not for every change.  We don't need that much process.
>>
>>
>> Strategy:
>>
>> My suggestion is outlined as a Spark Improvement Proposal process documented
>> at
>>
>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>
>> Specifics of Jira manipulation are an implementation detail we can figure
>> out.
>>
>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>
>>
>> Rejected Strategies:
>>
>> Having someone who understands the problem implement it first works, but
>> only if significant iteration after user feedback is allowed.
>>
>> Historically this has been problematic due to pressure to limit public api
>> changes.
>>
>>
>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
>>>
>>> Alright, looks like there is quite a bit of support. We should wait to
>>> hear from more people too.
>>>
>>> To push this forward, Cody and I will be working together in the next
>>> couple of weeks to come up with a concrete, detailed proposal on what this
>>> entails, and then we can discuss the specific proposal as well.
>>>
>>>
>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
>>>>
>>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>>> user-facing or cross-cutting changes, not minor feature adds.
>>>>
>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>>> <[hidden email]> wrote:
>>>>>
>>>>> +1 to the SIP label as long as it does not slow down things and it
>>>>> targets optimizing efforts, coordination etc. For example, really small
>>>>> features (assuming they don't touch public interfaces) or refactorings
>>>>> should not need to go through this process, and I hope it will be kept
>>>>> this way. So a guideline doc should be provided, like in the KIP case.
>>>>>
>>>>> IMHO so far aside from tagging things and linking them elsewhere simply
>>>>> having design docs and prototypes implementations in PRs is not something
>>>>> that has not worked so far. What is really a pain in many projects out there
>>>>> is discontinuity in progress of PRs, missing features, slow reviews which is
>>>>> understandable to some extent... it is not only about Spark but things can
>>>>> be improved for sure for this project in particular as already stated.
>>>>>
>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>> +1 to adding an SIP label and linking it from the website.  I think it
>>>>>> needs
>>>>>>
>>>>>> - template that focuses it towards soliciting user goals / non goals
>>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>>>>> recommend a vote.
>>>>>>
>>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
>>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
>>>>>> a thread for the other discussion per Nicholas' request.
>>>>>>
>>>>>> I meant changing public user interfaces.  I think the first design is
>>>>>> unlikely to be right, because it's done at a time when you have the
>>>>>> least information.  As a user, I find it considerably more frustrating
>>>>>> to be unable to use a tool to get my job done, than I do having to
>>>>>> make minor changes to my code in order to take advantage of features.
>>>>>> I've seen committers be seriously reluctant to allow changes to
>>>>>> @experimental code that are needed in order for it to really work
>>>>>> right.  You need to be able to iterate, and if people on both sides of
>>>>>> the fence aren't going to respect that some newer apis are subject to
>>>>>> change, then why even mark them as such?
>>>>>>
>>>>>> Ideally a finished SIP should give me a checklist of things that an
>>>>>> implementation must do, and things that it doesn't need to do.
>>>>>> Contributors/committers should be seriously discouraged from putting
>>>>>> out a version 0.1 that doesn't have at least a prototype
>>>>>> implementation of all those things, especially if they're then going
>>>>>> to argue against interface changes necessary to get the rest of
>>>>>> the things done in the 0.2 version.
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>>>>> wrote:
>>>>>>> I like the lightweight proposal to add a SIP label.
>>>>>>>
>>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki
>>>>>>> to
>>>>>>> track the list of major changes, but that never really materialized
>>>>>>> due to
>>>>>>> the overhead. Adding a SIP label on major JIRAs and then link to them
>>>>>>> prominently on the Spark website makes a lot of sense.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> For the improvement proposals, I think one major point was to make
>>>>>>>> them
>>>>>>>> really visible to users who are not contributors, so we should do
>>>>>>>> more than
>>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new
>>>>>>>> type of
>>>>>>>> JIRA called a SIP and have a link to a filter that shows all such
>>>>>>>> JIRAs from
>>>>>>>> http://spark.apache.org. I also like the idea of SIP and design doc
>>>>>>>> templates (in fact many projects have them).
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I called Cody last night and talked about some of the topics in his
>>>>>>>> email.
>>>>>>>> It became clear to me Cody genuinely cares about the project.
>>>>>>>>
>>>>>>>> Some of the frustrations come from the success of the project itself
>>>>>>>> becoming very "hot", and it is difficult to get clarity from people
>>>>>>>> who
>>>>>>>> don't dedicate all their time to Spark. In fact, it is in some ways
>>>>>>>> similar
>>>>>>>> to scaling an engineering team in a successful startup: old
>>>>>>>> processes that
>>>>>>>> worked well might not work so well when it gets to a certain size,
>>>>>>>> cultures
>>>>>>>> can get diluted, building culture vs building process, etc.
>>>>>>>>
>>>>>>>> I also really like to have a more visible process for larger
>>>>>>>> changes,
>>>>>>>> especially major user facing API changes. Historically we upload
>>>>>>>> design docs
>>>>>>>> for major changes, but it is not always consistent, and it is difficult to
>>>>>>>> ensure the quality
>>>>>>>> of the docs, due to the volunteering nature of the organization.
>>>>>>>>
>>>>>>>> Some of the more concrete ideas we discussed focus on building a
>>>>>>>> culture
>>>>>>>> to improve clarity:
>>>>>>>>
>>>>>>>> - Process: Large changes should have design docs posted on JIRA. One
>>>>>>>> thing
>>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>>>>>>>> should
>>>>>>>> create a design doc template for the project and ask everybody to
>>>>>>>> follow.
>>>>>>>> The design doc template should also explicitly list goals and
>>>>>>>> non-goals, to
>>>>>>>> make design doc more consistent.
>>>>>>>>
>>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>>>>>>>> some
>>>>>>>> changes, but again very inconsistent. Just posting something on JIRA
>>>>>>>> isn't
>>>>>>>> sufficient, because there are simply too many JIRAs and the signal
>>>>>>>> gets lost
>>>>>>>> in the noise. While this is generally impossible to enforce because
>>>>>>>> we can't
>>>>>>>> force all volunteers to conform to a process (or they might not even
>>>>>>>> be
>>>>>>>> aware of this),  those who are more familiar with the project can
>>>>>>>> help by
>>>>>>>> emailing the dev@ when they see something that hasn't been.
>>>>>>>>
>>>>>>>> - Culture: The design doc author(s) should be open to feedback. A
>>>>>>>> design
>>>>>>>> doc should serve as the base for discussion and is by no means the
>>>>>>>> final
>>>>>>>> design. Of course, this does not mean the author has to accept every
>>>>>>>> feedback. They should also be comfortable accepting / rejecting
>>>>>>>> ideas on
>>>>>>>> technical grounds.
>>>>>>>>
>>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to
>>>>>>>> have
>>>>>>>> some monthly Google hangouts that are open to the world. I am
>>>>>>>> actually not
>>>>>>>> sure how well this will work, because of the volunteering nature and
>>>>>>>> we need
>>>>>>>> to adjust for timezones for people across the globe, but it seems
>>>>>>>> worth
>>>>>>>> trying.
>>>>>>>>
>>>>>>>> - Culture: Contributors (including committers) should be more direct
>>>>>>>> in
>>>>>>>> setting expectations, including whether they are working on a
>>>>>>>> specific
>>>>>>>> issue, whether they will be working on a specific issue, and whether
>>>>>>>> an
>>>>>>>> issue or pr or jira should be rejected. Most people I know in this
>>>>>>>> community
>>>>>>>> are nice and don't enjoy telling other people no, but it is often
>>>>>>>> more
>>>>>>>> annoying to a contributor to not know anything than getting a no.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>>>>>>>>> process that
>>>>>>>>> solicits user input on new APIs. For what it's worth, I don't think
>>>>>>>>> committers are trying to minimize their own work -- every committer
>>>>>>>>> cares
>>>>>>>>> about making the software useful for users. However, it is always
>>>>>>>>> hard to
>>>>>>>>> get user input and so it helps to have this kind of process. I've
>>>>>>>>> certainly
>>>>>>>>> looked at the *IPs a lot in other software I use just to see the
>>>>>>>>> biggest
>>>>>>>>> things on the roadmap.
>>>>>>>>>
>>>>>>>>> When you're talking about "changing interfaces", are you talking
>>>>>>>>> about
>>>>>>>>> public or internal APIs? I do think many people hate changing
>>>>>>>>> public APIs
>>>>>>>>> and I actually think that's for the best of the project. That's a
>>>>>>>>> technical
>>>>>>>>> debate, but basically, the worst thing when you're using a piece of
>>>>>>>>> software
>>>>>>>>> is that the developers constantly ask you to rewrite your app to
>>>>>>>>> update to a
>>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>>>>>>>> who's used
>>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code
>>>>>>>>> this
>>>>>>>>> release" model works well within a single large company, but
>>>>>>>>> doesn't work
>>>>>>>>> well for a community, which is why nearly all *very* widely used
>>>>>>>>> programming
>>>>>>>>> interfaces (I'm talking things like Java standard library, Windows
>>>>>>>>> API, etc)
>>>>>>>>> almost *never* break backwards compatibility. All this is done
>>>>>>>>> within reason
>>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: [hidden email]
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stavros Kontopoulos
>>>>> Senior Software Engineer
>>>>> Lightbend, Inc.
>>>>> p:  +30 6977967274
>>>>> e: [hidden email]
>>>>>
>>>>>
>>>>
>>>
>>
>>

Re: Spark Improvement Proposals

Cody Koeninger-2
In reply to this post by Ofir Manor
Yeah, I've looked at KIPs and Scala SIPs.

I'm reluctant to use the Kafka structured streaming as an example
because of the pre-existing conflict around it.  If Michael or another
committer wanted to put it forth as an example, I'd participate in
good faith though.

On Sun, Oct 9, 2016 at 5:07 PM, Ofir Manor <[hidden email]> wrote:

> This is a great discussion!
> Maybe you could have a look at Kafka's process - it also uses Rejected
> Alternatives and I personally find it very clear actually (the link also
> leads to all KIPs):
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> Cody - maybe you could take one of the open issues and write a sample
> proposal? A concrete example might make it clearer for those who see this
> for the first time. Maybe the Kafka offset discussion or some other
> Kafka/Structured Streaming open issue? Will that be helpful?
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: [hidden email]
>
>
> On Mon, Oct 10, 2016 at 12:36 AM, Matei Zaharia <[hidden email]>
> wrote:
>>
>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>> but we should also clarify it in the writeup. In particular:
>>
>> - Goals needs to be about user-facing behavior ("people" is broad)
>>
>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>> one of these and say "Spark's developers have officially rejected X, which
>> our awesome system has".
>>
>> - For user-facing stuff, I think you need a section on API. Virtually all
>> other *IPs I've seen have that.
>>
>> - I'm still not sure why the strategy section is needed if the purpose is
>> to define user-facing behavior -- unless this is the strategy for setting
>> the goals or for defining the API. That sounds squarely like a design doc
>> issue. In some sense, who cares whether the proposal is technically feasible
>> right now? If it's infeasible, that will be discovered later during design
>> and implementation. Same thing with rejected strategies -- listing some of
>> those is definitely useful sometimes, but if you make this a *required*
>> section, people are just going to fill it in with bogus stuff (I've seen
>> this happen before).
>>
>> Matei
>>
>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >
>> > So to focus the discussion on the specific strategy I'm suggesting,
>> > documented at
>> >
>> >
>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >
>> > "Goals: What must this allow people to do, that they can't currently?"
>> >
>> > Is it unclear that this is focusing specifically on people-visible
>> > behavior?
>> >
>> > Rejected goals -  are important because otherwise people keep trying
>> > to argue about scope.  Of course you can change things later with a
>> > different SIP and different vote, the point is to focus.
>> >
>> > Use cases - are something that people are going to bring up in
>> > discussion.  If they aren't clearly documented as a goal ("This must
>> > allow me to connect using SSL"), they should be added.
>> >
>> > Internal architecture - if the people who need specific behavior are
>> > implementers of other parts of the system, that's fine.
>> >
>> > Rejected strategies - If you have none of these, you have no evidence
>> > that the proponent didn't just go with the first thing they had in
>> > mind (or have already implemented), which is a big problem currently.
>> > Approval isn't binding as to specifics of implementation, so these
>> > aren't handcuffs.  The goals are the contract, the strategy is
>> > evidence that contract can actually be met.
>> >
>> > Design docs - I'm not touching design docs.  The markdown file I
>> > linked specifically says of the strategy section "This is not a full
>> > design document."  Is this unclear?  Design docs can be worked on
>> > obviously, but that's not what I'm concerned with here.
>> >
>> >
>> >
>> >
>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>> > wrote:
>> >> Hi Cody,
>> >>
>> >> I think this would be a lot more concrete if we had a more detailed
>> >> template
>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are
>> >> they
>> >> a way to solicit feedback on the user-facing behavior or on the
>> >> internals?
>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>> >> Product
>> >> Requirements Docs (PRDs), which focus on *what* a code change should do
>> >> as
>> >> opposed to how.
>> >>
>> >> In particular, here are some things that you may or may not consider in
>> >> scope for SIPs:
>> >>
>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>> >> focus on
>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>> >> "system continues working if one node fails"). BTW I wouldn't say
>> >> "rejected
>> >> goals" because some of them might become goals later, so we're not
>> >> definitively rejecting them.
>> >>
>> >> - Public API: Probably should be included in most SIPs unless it's too
>> >> large
>> >> to fully specify then (e.g. "let's add an ML library").
>> >>
>> >> - Use cases: I usually find this very useful in PRDs to better
>> >> communicate
>> >> the goals.
>> >>
>> >> - Internal architecture: This is usually *not* a thing users can easily
>> >> comment on and it sounds more like a design doc item. Of course it's
>> >> important to show that the SIP is feasible to implement. One exception,
>> >> however, is that I think we'll have some SIPs primarily on internals
>> >> (e.g.
>> >> if somebody wants to refactor Spark's query optimizer or something).
>> >>
>> >> - Rejected strategies: I personally wouldn't put this, because what's
>> >> the
>> >> point of voting to reject a strategy before you've really begun
>> >> designing
>> >> and implementing something? What if you discover that the strategy is
>> >> actually better when you start doing stuff?
>> >>
>> >> At a super high level, it depends on whether you want the SIPs to be
>> >> PRDs
>> >> for getting some quick feedback on the goals of a feature before it is
>> >> designed, or something more like full-fledged design docs (just a more
>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and
>> >> they
>> >> actually seem to be more like design docs. This can work too but it
>> >> does
>> >> require more work from the proposer and it can lead to the same
>> >> problems you
>> >> mentioned with people already having a design and implementation in
>> >> mind.
>> >>
>> >> Basically, the question is, are you trying to iterate faster on design
>> >> by
>> >> adding a step for user feedback earlier? Or are you just trying to make
>> >> design docs for key features more visible (and their approval more
>> >> formal)?
>> >>
>> >> BTW note that in either case, I'd like to have a template for design
>> >> docs
>> >> too, which should also include goals. I think that would've avoided
>> >> some of
>> >> the issues you brought up.
>> >>
>> >> Matei
>> >>
>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>
>> >> Here's my specific proposal (meta-proposal?)
>> >>
>> >> Spark Improvement Proposals (SIP)
>> >>
>> >>
>> >> Background:
>> >>
>> >> The current problem is that design and implementation of large features
>> >> are
>> >> often done in private, before soliciting user feedback.
>> >>
>> >> When feedback is solicited, it is often as to detailed design
>> >> specifics, not
>> >> focused on goals.
>> >>
>> >> When implementation does take place after design, there is often
>> >> disagreement as to what goals are or are not in scope.
>> >>
>> >> This results in commits that don't fully meet user needs.
>> >>
>> >>
>> >> Goals:
>> >>
>> >> - Ensure user, contributor, and committer goals are clearly identified
>> >> and
>> >> agreed upon, before implementation takes place.
>> >>
>> >> - Ensure that a technically feasible strategy is chosen that is likely
>> >> to
>> >> meet the goals.
>> >>
>> >>
>> >> Rejected Goals:
>> >>
>> >> - SIPs are not for detailed design.  Design by committee doesn't work.
>> >>
>> >> - SIPs are not for every change.  We don't need that much process.
>> >>
>> >>
>> >> Strategy:
>> >>
>> >> My suggestion is outlined as a Spark Improvement Proposal process
>> >> documented
>> >> at
>> >>
>> >>
>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>
>> >> Specifics of Jira manipulation are an implementation detail we can
>> >> figure
>> >> out.
>> >>
>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>
>> >>
>> >> Rejected Strategies:
>> >>
>> >> Having someone who understands the problem implement it first works,
>> >> but
>> >> only if significant iteration after user feedback is allowed.
>> >>
>> >> Historically this has been problematic due to pressure to limit public
>> >> api
>> >> changes.
>> >>
>> >>
>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>> >> wrote:
>> >>>
>> >>> Alright, looks like there is quite a bit of support. We should wait to
>> >>> hear from more people too.
>> >>>
>> >>> To push this forward, Cody and I will be working together in the next
>> >>> couple of weeks to come up with a concrete, detailed proposal on what
>> >>> this
>> >>> entails, and then we can discuss the specific proposal as well.
>> >>>
>> >>>
>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>> >>> wrote:
>> >>>>
>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>> >>>>
>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>>> <[hidden email]> wrote:
>> >>>>>
>> >>>>> +1 to the SIP label as long as it does not slow down things and it
>> >>>>> targets optimizing efforts, coordination etc. For example really
>> >>>>> small
>> >>>>> features should not need to go through this process (assuming they
>> >>>>> don't
>> >>>>> touch public interfaces)  or re-factorings and hope it will be kept
>> >>>>> this
>> >>>>> way. So as a guideline doc should be provided, like in the KIP case.
>> >>>>>
>> >>>>> IMHO so far aside from tagging things and linking them elsewhere
>> >>>>> simply
>> >>>>> having design docs and prototype implementations in PRs is not
>> >>>>> something
>> >>>>> that has not worked so far. What is really a pain in many projects
>> >>>>> out there
>> >>>>> is discontinuity in progress of PRs, missing features, slow reviews
>> >>>>> which is
>> >>>>> understandable to some extent... it is not only about Spark but
>> >>>>> things can
>> >>>>> be improved for sure for this project in particular as already
>> >>>>> stated.
>> >>>>>
>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> +1 to adding an SIP label and linking it from the website.  I think
>> >>>>>> it
>> >>>>>> needs
>> >>>>>>
>> >>>>>> - template that focuses it towards soliciting user goals / non
>> >>>>>> goals
>> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>> >>>>>> recommend a vote.
>> >>>>>>
>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>> >>>>>> think
>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and
>> >>>>>> split
>> >>>>>> a thread for the other discussion per Nicholas' request.
>> >>>>>>
>> >>>>>> I meant changing public user interfaces.  I think the first design
>> >>>>>> is
>> >>>>>> unlikely to be right, because it's done at a time when you have the
>> >>>>>> least information.  As a user, I find it considerably more
>> >>>>>> frustrating
>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>> >>>>>> make minor changes to my code in order to take advantage of
>> >>>>>> features.
>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>> >>>>>> @experimental code that are needed in order for it to really work
>> >>>>>> right.  You need to be able to iterate, and if people on both sides
>> >>>>>> of
>> >>>>>> the fence aren't going to respect that some newer apis are subject
>> >>>>>> to
>> >>>>>> change, then why even mark them as such?
>> >>>>>>
>> >>>>>> Ideally a finished SIP should give me a checklist of things that an
>> >>>>>> implementation must do, and things that it doesn't need to do.
>> >>>>>> Contributors/committers should be seriously discouraged from
>> >>>>>> putting
>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>> >>>>>> implementation of all those things, especially if they're then
>> >>>>>> going
>> >>>>>> to argue against interface changes necessary to get the rest of
>> >>>>>> the things done in the 0.2 version.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>> >>>>>> wrote:
>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>>>>>>
>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using
>> >>>>>>> wiki
>> >>>>>>> to
>> >>>>>>> track the list of major changes, but that never really
>> >>>>>>> materialized
>> >>>>>>> due to
>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to
>> >>>>>>> them
>> >>>>>>> prominently on the Spark website makes a lot of sense.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>> >>>>>>> <[hidden email]>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> For the improvement proposals, I think one major point was to
>> >>>>>>>> make
>> >>>>>>>> them
>> >>>>>>>> really visible to users who are not contributors, so we should do
>> >>>>>>>> more than
>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new
>> >>>>>>>> type of
>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all such
>> >>>>>>>> JIRAs from
>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>> >>>>>>>> doc
>> >>>>>>>> templates (in fact many projects have them).
>> >>>>>>>>
>> >>>>>>>> Matei
>> >>>>>>>>
>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> I called Cody last night and talked about some of the topics in
>> >>>>>>>> his
>> >>>>>>>> email.
>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>> >>>>>>>>
>> >>>>>>>> Some of the frustrations come from the success of the project
>> >>>>>>>> itself
>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>> >>>>>>>> people
>> >>>>>>>> who
>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some
>> >>>>>>>> ways
>> >>>>>>>> similar
>> >>>>>>>> to scaling an engineering team in a successful startup: old
>> >>>>>>>> processes that
>> >>>>>>>> worked well might not work so well when it gets to a certain
>> >>>>>>>> size,
>> >>>>>>>> cultures
>> >>>>>>>> can get diluted, building culture vs building process, etc.
>> >>>>>>>>
>> >>>>>>>> I also really like to have a more visible process for larger
>> >>>>>>>> changes,
>> >>>>>>>> especially major user facing API changes. Historically we upload
>> >>>>>>>> design docs
>> >>>>>>>> for major changes, but it is not always consistent and it is difficult
>> >>>>>>>> to
>> >>>>>>>> ensure the quality
>> >>>>>>>> of the docs, due to the volunteering nature of the organization.
>> >>>>>>>>
>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>> >>>>>>>> culture
>> >>>>>>>> to improve clarity:
>> >>>>>>>>
>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>> >>>>>>>> One
>> >>>>>>>> thing
>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>> >>>>>>>> should
>> >>>>>>>> create a design doc template for the project and ask everybody to
>> >>>>>>>> follow.
>> >>>>>>>> The design doc template should also explicitly list goals and
>> >>>>>>>> non-goals, to
>> >>>>>>>> make design doc more consistent.
>> >>>>>>>>
>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>> >>>>>>>> some
>> >>>>>>>> changes, but again very inconsistent. Just posting something on
>> >>>>>>>> JIRA
>> >>>>>>>> isn't
>> >>>>>>>> sufficient, because there are simply too many JIRAs and the
>> >>>>>>>> signal
>> >>>>>>>> gets lost
>> >>>>>>>> in the noise. While this is generally impossible to enforce
>> >>>>>>>> because
>> >>>>>>>> we can't
>> >>>>>>>> force all volunteers to conform to a process (or they might not
>> >>>>>>>> even
>> >>>>>>>> be
>> >>>>>>>> aware of this),  those who are more familiar with the project can
>> >>>>>>>> help by
>> >>>>>>>> emailing the dev@ when they see something that hasn't been.
>> >>>>>>>>
>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A
>> >>>>>>>> design
>> >>>>>>>> doc should serve as the base for discussion and is by no means
>> >>>>>>>> the
>> >>>>>>>> final
>> >>>>>>>> design. Of course, this does not mean the author has to accept
>> >>>>>>>> every
>> >>>>>>>> feedback. They should also be comfortable accepting / rejecting
>> >>>>>>>> ideas on
>> >>>>>>>> technical grounds.
>> >>>>>>>>
>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful
>> >>>>>>>> to
>> >>>>>>>> have
>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>> >>>>>>>> actually not
>> >>>>>>>> sure how well this will work, because of the volunteering nature
>> >>>>>>>> and
>> >>>>>>>> we need
>> >>>>>>>> to adjust for timezones for people across the globe, but it seems
>> >>>>>>>> worth
>> >>>>>>>> trying.
>> >>>>>>>>
>> >>>>>>>> - Culture: Contributors (including committers) should be more
>> >>>>>>>> direct
>> >>>>>>>> in
>> >>>>>>>> setting expectations, including whether they are working on a
>> >>>>>>>> specific
>> >>>>>>>> issue, whether they will be working on a specific issue, and
>> >>>>>>>> whether
>> >>>>>>>> an
>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>> >>>>>>>> this
>> >>>>>>>> community
>> >>>>>>>> are nice and don't enjoy telling other people no, but it is often
>> >>>>>>>> more
>> >>>>>>>> annoying to a contributor to not know anything than getting a no.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>> >>>>>>>> <[hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>> >>>>>>>>> process that
>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't
>> >>>>>>>>> think
>> >>>>>>>>> committers are trying to minimize their own work -- every
>> >>>>>>>>> committer
>> >>>>>>>>> cares
>> >>>>>>>>> about making the software useful for users. However, it is
>> >>>>>>>>> always
>> >>>>>>>>> hard to
>> >>>>>>>>> get user input and so it helps to have this kind of process.
>> >>>>>>>>> I've
>> >>>>>>>>> certainly
>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see the
>> >>>>>>>>> biggest
>> >>>>>>>>> things on the roadmap.
>> >>>>>>>>>
>> >>>>>>>>> When you're talking about "changing interfaces", are you talking
>> >>>>>>>>> about
>> >>>>>>>>> public or internal APIs? I do think many people hate changing
>> >>>>>>>>> public APIs
>> >>>>>>>>> and I actually think that's for the best of the project. That's
>> >>>>>>>>> a
>> >>>>>>>>> technical
>> >>>>>>>>> debate, but basically, the worst thing when you're using a piece
>> >>>>>>>>> of
>> >>>>>>>>> software
>> >>>>>>>>> is that the developers constantly ask you to rewrite your app to
>> >>>>>>>>> update to a
>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>> >>>>>>>>> who's used
>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code
>> >>>>>>>>> this
>> >>>>>>>>> release" model works well within a single large company, but
>> >>>>>>>>> doesn't work
>> >>>>>>>>> well for a community, which is why nearly all *very* widely used
>> >>>>>>>>> programming
>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>> >>>>>>>>> Windows
>> >>>>>>>>> API, etc)
>> >>>>>>>>> almost *never* break backwards compatibility. All this is done
>> >>>>>>>>> within reason
>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
>> >>>>>>>>> etc).
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Stavros Kontopoulos
>> >>>>> Senior Software Engineer
>> >>>>> Lightbend, Inc.
>> >>>>> p:  +30 6977967274
>> >>>>> e: [hidden email]
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: [hidden email]
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Spark Improvement Proposals

Matei Zaharia
Administrator
In reply to this post by Cody Koeninger-2
Well, I think there are a few things here that don't make sense. First, why should only committers submit SIPs? Development in the project should be open to all contributors, whether they're committers or not. Second, I think unrealistic goals can be found just by inspecting the goals, and I'm not super worried that we'll accept a lot of SIPs that are then infeasible -- we can then submit new ones. But this depends on whether you want this process to be a "design doc lite", where people also agree on implementation strategy, or just a way to agree on goals. This is what I asked earlier about PRDs vs design docs (and I'm open to either one but I'd just like clarity). Finally, both as a user and designer of software, I always want to give feedback on APIs, so I'd really like a culture of having those early. People don't argue about prettiness when they discuss APIs, they argue about the core concepts to expose in order to meet various goals, and then they're stuck maintaining those for a long time.

Matei

On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:

Users instead of people, sure.  Committers and contributors are (or at least should be) a subset of users.

Non goals, sure. I don't care what the name is, but we need to clearly say e.g. 'no we are not maintaining compatibility with XYZ right now'.

API, what I care most about is whether it allows me to accomplish the goals. Arguing about how ugly or pretty it is can be saved for design/ implementation imho.

Strategy, this is necessary because otherwise goals can be out of line with reality.  Don't propose goals you don't have at least some idea of how to implement.

Rejected strategies, given that committers are the only ones I'm saying should formally submit SPARKLIs or SIPs, if they put junk in a required section then slap them down for it and tell them to fix it.


On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:

- Goals needs to be about user-facing behavior ("people" is broad)

- I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".

- For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.

- I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).

Matei

> On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>
> So to focus the discussion on the specific strategy I'm suggesting,
> documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> "Goals: What must this allow people to do, that they can't currently?"
>
> Is it unclear that this is focusing specifically on people-visible behavior?
>
> Rejected goals -  are important because otherwise people keep trying
> to argue about scope.  Of course you can change things later with a
> different SIP and different vote, the point is to focus.
>
> Use cases - are something that people are going to bring up in
> discussion.  If they aren't clearly documented as a goal ("This must
> allow me to connect using SSL"), they should be added.
>
> Internal architecture - if the people who need specific behavior are
> implementers of other parts of the system, that's fine.
>
> Rejected strategies - If you have none of these, you have no evidence
> that the proponent didn't just go with the first thing they had in
> mind (or have already implemented), which is a big problem currently.
> Approval isn't binding as to specifics of implementation, so these
> aren't handcuffs.  The goals are the contract, the strategy is
> evidence that contract can actually be met.
>
> Design docs - I'm not touching design docs.  The markdown file I
> linked specifically says of the strategy section "This is not a full
> design document."  Is this unclear?  Design docs can be worked on
> obviously, but that's not what I'm concerned with here.
>
>
>
>
> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
>> Hi Cody,
>>
>> I think this would be a lot more concrete if we had a more detailed template
>> for SIPs. Right now, it's not super clear what's in scope -- e.g. are  they
>> a way to solicit feedback on the user-facing behavior or on the internals?
>> "Goals" can cover both things. I've been thinking of SIPs more as Product
>> Requirements Docs (PRDs), which focus on *what* a code change should do as
>> opposed to how.
>>
>> In particular, here are some things that you may or may not consider in
>> scope for SIPs:
>>
>> - Goals and non-goals: This is definitely in scope, and IMO should focus on
>> user-visible behavior (e.g. "system supports SQL window functions" or
>> "system continues working if one node fails"). BTW I wouldn't say "rejected
>> goals" because some of them might become goals later, so we're not
>> definitively rejecting them.
>>
>> - Public API: Probably should be included in most SIPs unless it's too large
>> to fully specify then (e.g. "let's add an ML library").
>>
>> - Use cases: I usually find this very useful in PRDs to better communicate
>> the goals.
>>
>> - Internal architecture: This is usually *not* a thing users can easily
>> comment on and it sounds more like a design doc item. Of course it's
>> important to show that the SIP is feasible to implement. One exception,
>> however, is that I think we'll have some SIPs primarily on internals (e.g.
>> if somebody wants to refactor Spark's query optimizer or something).
>>
>> - Rejected strategies: I personally wouldn't put this, because what's the
>> point of voting to reject a strategy before you've really begun designing
>> and implementing something? What if you discover that the strategy is
>> actually better when you start doing stuff?
>>
>> At a super high level, it depends on whether you want the SIPs to be PRDs
>> for getting some quick feedback on the goals of a feature before it is
>> designed, or something more like full-fledged design docs (just a more
>> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
>> actually seem to be more like design docs. This can work too but it does
>> require more work from the proposer and it can lead to the same problems you
>> mentioned with people already having a design and implementation in mind.
>>
>> Basically, the question is, are you trying to iterate faster on design by
>> adding a step for user feedback earlier? Or are you just trying to make
>> design docs for key features more visible (and their approval more formal)?
>>
>> BTW note that in either case, I'd like to have a template for design docs
>> too, which should also include goals. I think that would've avoided some of
>> the issues you brought up.
>>
>> Matei
>>
>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>
>> Here's my specific proposal (meta-proposal?)
>>
>> Spark Improvement Proposals (SIP)
>>
>>
>> Background:
>>
>> The current problem is that design and implementation of large features are
>> often done in private, before soliciting user feedback.
>>
>> When feedback is solicited, it is often as to detailed design specifics, not
>> focused on goals.
>>
>> When implementation does take place after design, there is often
>> disagreement as to what goals are or are not in scope.
>>
>> This results in commits that don't fully meet user needs.
>>
>>
>> Goals:
>>
>> - Ensure user, contributor, and committer goals are clearly identified and
>> agreed upon, before implementation takes place.
>>
>> - Ensure that a technically feasible strategy is chosen that is likely to
>> meet the goals.
>>
>>
>> Rejected Goals:
>>
>> - SIPs are not for detailed design.  Design by committee doesn't work.
>>
>> - SIPs are not for every change.  We don't need that much process.
>>
>>
>> Strategy:
>>
>> My suggestion is outlined as a Spark Improvement Proposal process documented
>> at
>>
>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>
>> Specifics of Jira manipulation are an implementation detail we can figure
>> out.
>>
>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>
>>
>> Rejected Strategies:
>>
>> Having someone who understands the problem implement it first works, but
>> only if significant iteration after user feedback is allowed.
>>
>> Historically this has been problematic due to pressure to limit public api
>> changes.
>>
>>
>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
>>>
>>> Alright, looks like there is quite a bit of support. We should wait to
>>> hear from more people too.
>>>
>>> To push this forward, Cody and I will be working together in the next
>>> couple of weeks to come up with a concrete, detailed proposal on what this
>>> entails, and then we can discuss the specific proposal as well.
>>>
>>>
>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
>>>>
>>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>>> user-facing or cross-cutting changes, not minor feature adds.
>>>>
>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>>> <[hidden email]> wrote:
>>>>>
>>>>> +1 to the SIP label as long as it does not slow things down and it
>>>>> targets optimizing efforts, coordination, etc. For example, really small
>>>>> features (assuming they don't touch public interfaces) or refactorings
>>>>> should not need to go through this process, and I hope it will be kept
>>>>> this way. So a guideline doc should be provided, like in the KIP case.
>>>>>
>>>>> IMHO so far aside from tagging things and linking them elsewhere simply
>>>>> having design docs and prototype implementations in PRs is not something
>>>>> that has not worked so far. What is really a pain in many projects out there
>>>>> is discontinuity in progress of PRs, missing features, slow reviews which is
>>>>> understandable to some extent... it is not only about Spark but things can
>>>>> be improved for sure for this project in particular as already stated.
>>>>>
>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>> +1 to adding an SIP label and linking it from the website.  I think it
>>>>>> needs
>>>>>>
>>>>>> - template that focuses it towards soliciting user goals / non goals
>>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>>>>> recommend a vote.
>>>>>>
>>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
>>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
>>>>>> a thread for the other discussion per Nicholas' request.
>>>>>>
>>>>>> I meant changing public user interfaces.  I think the first design is
>>>>>> unlikely to be right, because it's done at a time when you have the
>>>>>> least information.  As a user, I find it considerably more frustrating
>>>>>> to be unable to use a tool to get my job done, than I do having to
>>>>>> make minor changes to my code in order to take advantage of features.
>>>>>> I've seen committers be seriously reluctant to allow changes to
>>>>>> @experimental code that are needed in order for it to really work
>>>>>> right.  You need to be able to iterate, and if people on both sides of
>>>>>> the fence aren't going to respect that some newer apis are subject to
>>>>>> change, then why even mark them as such?
>>>>>>
>>>>>> Ideally a finished SIP should give me a checklist of things that an
>>>>>> implementation must do, and things that it doesn't need to do.
>>>>>> Contributors/committers should be seriously discouraged from putting
>>>>>> out a version 0.1 that doesn't have at least a prototype
>>>>>> implementation of all those things, especially if they're then going
>>>>>> to argue against interface changes necessary to get the rest of
>>>>>> the things done in the 0.2 version.
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>>>>> wrote:
>>>>>>> I like the lightweight proposal to add a SIP label.
>>>>>>>
>>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki
>>>>>>> to
>>>>>>> track the list of major changes, but that never really materialized
>>>>>>> due to
>>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to them
>>>>>>> prominently on the Spark website makes a lot of sense.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> For the improvement proposals, I think one major point was to make
>>>>>>>> them
>>>>>>>> really visible to users who are not contributors, so we should do
>>>>>>>> more than
>>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new
>>>>>>>> type of
>>>>>>>> JIRA called a SIP and have a link to a filter that shows all such
>>>>>>>> JIRAs from
>>>>>>>> http://spark.apache.org. I also like the idea of SIP and design doc
>>>>>>>> templates (in fact many projects have them).
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I called Cody last night and talked about some of the topics in his
>>>>>>>> email.
>>>>>>>> It became clear to me Cody genuinely cares about the project.
>>>>>>>>
>>>>>>>> Some of the frustrations come from the success of the project itself
>>>>>>>> becoming very "hot", and it is difficult to get clarity from people
>>>>>>>> who
>>>>>>>> don't dedicate all their time to Spark. In fact, it is in some ways
>>>>>>>> similar
>>>>>>>> to scaling an engineering team in a successful startup: old
>>>>>>>> processes that
>>>>>>>> worked well might not work so well when it gets to a certain size,
>>>>>>>> cultures
>>>>>>>> can get diluted, building culture vs building process, etc.
>>>>>>>>
>>>>>>>> I also really like to have a more visible process for larger
>>>>>>>> changes,
>>>>>>>> especially major user facing API changes. Historically we upload
>>>>>>>> design docs
>>>>>>>> for major changes, but it is not always consistent and it is difficult
>>>>>>>> to ensure the quality
>>>>>>>> of the docs, due to the volunteer nature of the organization.
>>>>>>>>
>>>>>>>> Some of the more concrete ideas we discussed focus on building a
>>>>>>>> culture
>>>>>>>> to improve clarity:
>>>>>>>>
>>>>>>>> - Process: Large changes should have design docs posted on JIRA. One
>>>>>>>> thing
>>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>>>>>>>> should
>>>>>>>> create a design doc template for the project and ask everybody to
>>>>>>>> follow.
>>>>>>>> The design doc template should also explicitly list goals and
>>>>>>>> non-goals, to
>>>>>>>> make design doc more consistent.
>>>>>>>>
>>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>>>>>>>> some
>>>>>>>> changes, but again very inconsistently. Just posting something on JIRA
>>>>>>>> isn't
>>>>>>>> sufficient, because there are simply too many JIRAs and the signal
>>>>>>>> gets lost
>>>>>>>> in the noise. While this is generally impossible to enforce because
>>>>>>>> we can't
>>>>>>>> force all volunteers to conform to a process (or they might not even
>>>>>>>> be
>>>>>>>> aware of this),  those who are more familiar with the project can
>>>>>>>> help by
>>>>>>>> emailing the dev@ when they see something that hasn't been.
>>>>>>>>
>>>>>>>> - Culture: The design doc author(s) should be open to feedback. A
>>>>>>>> design
>>>>>>>> doc should serve as the base for discussion and is by no means the
>>>>>>>> final
>>>>>>>> design. Of course, this does not mean the author has to accept every
>>>>>>>> feedback. They should also be comfortable accepting / rejecting
>>>>>>>> ideas on
>>>>>>>> technical grounds.
>>>>>>>>
>>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to
>>>>>>>> have
>>>>>>>> some monthly Google hangouts that are open to the world. I am
>>>>>>>> actually not
>>>>>>>> sure how well this will work, because of the volunteering nature and
>>>>>>>> we need
>>>>>>>> to adjust for timezones for people across the globe, but it seems
>>>>>>>> worth
>>>>>>>> trying.
>>>>>>>>
>>>>>>>> - Culture: Contributors (including committers) should be more direct
>>>>>>>> in
>>>>>>>> setting expectations, including whether they are working on a
>>>>>>>> specific
>>>>>>>> issue, whether they will be working on a specific issue, and whether
>>>>>>>> an
>>>>>>>> issue or pr or jira should be rejected. Most people I know in this
>>>>>>>> community
>>>>>>>> are nice and don't enjoy telling other people no, but it is often
>>>>>>>> more
>>>>>>>> annoying to a contributor to not know anything than getting a no.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>>>>>>>>> process that
>>>>>>>>> solicits user input on new APIs. For what it's worth, I don't think
>>>>>>>>> committers are trying to minimize their own work -- every committer
>>>>>>>>> cares
>>>>>>>>> about making the software useful for users. However, it is always
>>>>>>>>> hard to
>>>>>>>>> get user input and so it helps to have this kind of process. I've
>>>>>>>>> certainly
>>>>>>>>> looked at the *IPs a lot in other software I use just to see the
>>>>>>>>> biggest
>>>>>>>>> things on the roadmap.
>>>>>>>>>
>>>>>>>>> When you're talking about "changing interfaces", are you talking
>>>>>>>>> about
>>>>>>>>> public or internal APIs? I do think many people hate changing
>>>>>>>>> public APIs
>>>>>>>>> and I actually think that's for the best of the project. That's a
>>>>>>>>> technical
>>>>>>>>> debate, but basically, the worst thing when you're using a piece of
>>>>>>>>> software
>>>>>>>>> is that the developers constantly ask you to rewrite your app to
>>>>>>>>> update to a
>>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>>>>>>>> who's used
>>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code
>>>>>>>>> this
>>>>>>>>> release" model works well within a single large company, but
>>>>>>>>> doesn't work
>>>>>>>>> well for a community, which is why nearly all *very* widely used
>>>>>>>>> programming
>>>>>>>>> interfaces (I'm talking things like Java standard library, Windows
>>>>>>>>> API, etc)
>>>>>>>>> almost *never* break backwards compatibility. All this is done
>>>>>>>>> within reason
>>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: [hidden email]
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stavros Kontopoulos
>>>>> Senior Software Engineer
>>>>> Lightbend, Inc.
>>>>> p:  +30 6977967274
>>>>> e: [hidden email]
>>>>>
>>>>>
>>>>
>>>
>>
>>



Re: Spark Improvement Proposals

Nicholas Chammas
In reply to this post by Cody Koeninger-2
On Sun, Oct 9, 2016 at 5:19 PM Cody Koeninger <[hidden email]> wrote:
> Regarding name, if the SIP overlap is a concern, we can pick a different name.
>
> My tongue in cheek suggestion would be
>
> Spark Lightweight Improvement process (SPARKLI)

If others share my minor concern about the SIP name, I propose Spark Enhancement Proposal (SEP), taking inspiration from the Python Enhancement Proposal name.

So if we're going to number proposals like other projects do, they'd be numbered SEP-1, SEP-2, etc. This avoids the naming conflict with Scala SIPs.

Another way to avoid a conflict is to stick with "Spark Improvement Proposal" but use SPIP as the acronym. So SPIP-1, SPIP-2, etc.

Anyway, it's not a big deal. I just wanted to raise this point.

Nick

Re: Spark Improvement Proposals

Cody Koeninger-2
In reply to this post by Matei Zaharia
Only committers should formally submit SIPs because in an Apache
project only committers have explicit political power.  If a user can't
find a committer willing to sponsor an SIP idea, they have no way to
get the idea passed in any case.  If I can't find a committer to
sponsor this meta-SIP idea, I'm out of luck.

I do not believe unrealistic goals can be found solely by inspection.
We've managed to ignore unrealistic goals even after implementation!
Focusing on APIs can allow people to think they've solved something,
when there's really no way of implementing that API while meeting the
goals.  Rapid iteration is clearly the best way to address this, but
we've already talked about why that hasn't really worked.  If adding a
non-binding API section to the template is important to you, I'm not
against it, but I don't think it's sufficient.

On your PRD vs design doc spectrum, I'm saying this is closer to a
PRD.  Clear agreement on goals is the most important thing and that's
why it's the thing I want binding agreement on.  But I cannot agree to
goals unless I have enough minimal technical info to judge whether the
goals are likely to actually be accomplished.



On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:

> Well, I think there are a few things here that don't make sense. First, why
> should only committers submit SIPs? Development in the project should be
> open to all contributors, whether they're committers or not. Second, I think
> unrealistic goals can be found just by inspecting the goals, and I'm not
> super worried that we'll accept a lot of SIPs that are then infeasible -- we
> can then submit new ones. But this depends on whether you want this process
> to be a "design doc lite", where people also agree on implementation
> strategy, or just a way to agree on goals. This is what I asked earlier
> about PRDs vs design docs (and I'm open to either one but I'd just like
> clarity). Finally, both as a user and designer of software, I always want to
> give feedback on APIs, so I'd really like a culture of having those early.
> People don't argue about prettiness when they discuss APIs, they argue about
> the core concepts to expose in order to meet various goals, and then they're
> stuck maintaining those for a long time.
>
> Matei
>
> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>
> Users instead of people, sure.  Committers and contributors are (or at least
> should be) a subset of users.
>
> Non goals, sure. I don't care what the name is, but we need to clearly say
> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>
> API, what I care most about is whether it allows me to accomplish the goals.
> Arguing about how ugly or pretty it is can be saved for design/
> implementation imho.
>
> Strategy, this is necessary because otherwise goals can be out of line with
> reality.  Don't propose goals you don't have at least some idea of how to
> implement.
>
> Rejected strategies, given that committers are the only ones I'm saying
> should formally submit SPARKLIs or SIPs, if they put junk in a required
> section then slap them down for it and tell them to fix it.
>
>
> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>
>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>> but we should also clarify it in the writeup. In particular:
>>
>> - Goals needs to be about user-facing behavior ("people" is broad)
>>
>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>> one of these and say "Spark's developers have officially rejected X, which
>> our awesome system has".
>>
>> - For user-facing stuff, I think you need a section on API. Virtually all
>> other *IPs I've seen have that.
>>
>> - I'm still not sure why the strategy section is needed if the purpose is
>> to define user-facing behavior -- unless this is the strategy for setting
>> the goals or for defining the API. That sounds squarely like a design doc
>> issue. In some sense, who cares whether the proposal is technically feasible
>> right now? If it's infeasible, that will be discovered later during design
>> and implementation. Same thing with rejected strategies -- listing some of
>> those is definitely useful sometimes, but if you make this a *required*
>> section, people are just going to fill it in with bogus stuff (I've seen
>> this happen before).
>>
>> Matei
>>
>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >
>> > So to focus the discussion on the specific strategy I'm suggesting,
>> > documented at
>> >
>> >
>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >
>> > "Goals: What must this allow people to do, that they can't currently?"
>> >
>> > Is it unclear that this is focusing specifically on people-visible
>> > behavior?
>> >
>> > Rejected goals -  are important because otherwise people keep trying
>> > to argue about scope.  Of course you can change things later with a
>> > different SIP and different vote, the point is to focus.
>> >
>> > Use cases - are something that people are going to bring up in
>> > discussion.  If they aren't clearly documented as a goal ("This must
>> > allow me to connect using SSL"), they should be added.
>> >
>> > Internal architecture - if the people who need specific behavior are
>> > implementers of other parts of the system, that's fine.
>> >
>> > Rejected strategies - If you have none of these, you have no evidence
>> > that the proponent didn't just go with the first thing they had in
>> > mind (or have already implemented), which is a big problem currently.
>> > Approval isn't binding as to specifics of implementation, so these
>> > aren't handcuffs.  The goals are the contract, the strategy is
>> > evidence that contract can actually be met.
>> >
>> > Design docs - I'm not touching design docs.  The markdown file I
>> > linked specifically says of the strategy section "This is not a full
>> > design document."  Is this unclear?  Design docs can be worked on
>> > obviously, but that's not what I'm concerned with here.
>> >
>> >
>> >
>> >
>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>> > wrote:
>> >> Hi Cody,
>> >>
>> >> I think this would be a lot more concrete if we had a more detailed
>> >> template
>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are
>> >> they
>> >> a way to solicit feedback on the user-facing behavior or on the
>> >> internals?
>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>> >> Product
>> >> Requirements Docs (PRDs), which focus on *what* a code change should do
>> >> as
>> >> opposed to how.
>> >>
>> >> In particular, here are some things that you may or may not consider in
>> >> scope for SIPs:
>> >>
>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>> >> focus on
>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>> >> "system continues working if one node fails"). BTW I wouldn't say
>> >> "rejected
>> >> goals" because some of them might become goals later, so we're not
>> >> definitively rejecting them.
>> >>
>> >> - Public API: Probably should be included in most SIPs unless it's too
>> >> large
>> >> to fully specify then (e.g. "let's add an ML library").
>> >>
>> >> - Use cases: I usually find this very useful in PRDs to better
>> >> communicate
>> >> the goals.
>> >>
>> >> - Internal architecture: This is usually *not* a thing users can easily
>> >> comment on and it sounds more like a design doc item. Of course it's
>> >> important to show that the SIP is feasible to implement. One exception,
>> >> however, is that I think we'll have some SIPs primarily on internals
>> >> (e.g.
>> >> if somebody wants to refactor Spark's query optimizer or something).
>> >>
>> >> - Rejected strategies: I personally wouldn't put this, because what's
>> >> the
>> >> point of voting to reject a strategy before you've really begun
>> >> designing
>> >> and implementing something? What if you discover that the strategy is
>> >> actually better when you start doing stuff?
>> >>
>> >> At a super high level, it depends on whether you want the SIPs to be
>> >> PRDs
>> >> for getting some quick feedback on the goals of a feature before it is
>> >> designed, or something more like full-fledged design docs (just a more
>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and
>> >> they
>> >> actually seem to be more like design docs. This can work too but it
>> >> does
>> >> require more work from the proposer and it can lead to the same
>> >> problems you
>> >> mentioned with people already having a design and implementation in
>> >> mind.
>> >>
>> >> Basically, the question is, are you trying to iterate faster on design
>> >> by
>> >> adding a step for user feedback earlier? Or are you just trying to make
>> >> design docs for key features more visible (and their approval more
>> >> formal)?
>> >>
>> >> BTW note that in either case, I'd like to have a template for design
>> >> docs
>> >> too, which should also include goals. I think that would've avoided
>> >> some of
>> >> the issues you brought up.
>> >>
>> >> Matei
>> >>
>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>
>> >> Here's my specific proposal (meta-proposal?)
>> >>
>> >> Spark Improvement Proposals (SIP)
>> >>
>> >>
>> >> Background:
>> >>
>> >> The current problem is that design and implementation of large features
>> >> are
>> >> often done in private, before soliciting user feedback.
>> >>
>> >> When feedback is solicited, it is often as to detailed design
>> >> specifics, not
>> >> focused on goals.
>> >>
>> >> When implementation does take place after design, there is often
>> >> disagreement as to what goals are or are not in scope.
>> >>
>> >> This results in commits that don't fully meet user needs.
>> >>
>> >>
>> >> Goals:
>> >>
>> >> - Ensure user, contributor, and committer goals are clearly identified
>> >> and
>> >> agreed upon, before implementation takes place.
>> >>
>> >> - Ensure that a technically feasible strategy is chosen that is likely
>> >> to
>> >> meet the goals.
>> >>
>> >>
>> >> Rejected Goals:
>> >>
>> >> - SIPs are not for detailed design.  Design by committee doesn't work.
>> >>
>> >> - SIPs are not for every change.  We don't need that much process.
>> >>
>> >>
>> >> Strategy:
>> >>
>> >> My suggestion is outlined as a Spark Improvement Proposal process
>> >> documented
>> >> at
>> >>
>> >>
>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>
>> >> Specifics of Jira manipulation are an implementation detail we can
>> >> figure
>> >> out.
>> >>
>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>
>> >>
>> >> Rejected Strategies:
>> >>
>> >> Having someone who understands the problem implement it first works,
>> >> but
>> >> only if significant iteration after user feedback is allowed.
>> >>
>> >> Historically this has been problematic due to pressure to limit public
>> >> api
>> >> changes.
>> >>
>> >>
>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>> >> wrote:
>> >>>
>> >>> Alright, looks like there is quite a bit of support. We should wait to
>> >>> hear from more people too.
>> >>>
>> >>> To push this forward, Cody and I will be working together in the next
>> >>> couple of weeks to come up with a concrete, detailed proposal on what
>> >>> this
>> >>> entails, and then we can discuss the specific proposal as well.
>> >>>
>> >>>
>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>> >>> wrote:
>> >>>>
>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>> >>>>
>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>>> <[hidden email]> wrote:
>> >>>>>
>> >>>>> +1 to the SIP label as long as it does not slow things down and it
>> >>>>> targets optimizing efforts, coordination, etc. For example, really
>> >>>>> small
>> >>>>> features (assuming they don't touch public interfaces) or
>> >>>>> refactorings
>> >>>>> should not need to go through this process, and I hope it will be
>> >>>>> kept
>> >>>>> this way. So a guideline doc should be provided, like in the KIP case.
>> >>>>>
>> >>>>> IMHO so far aside from tagging things and linking them elsewhere
>> >>>>> simply
>> >>>>> having design docs and prototype implementations in PRs is not
>> >>>>> something
>> >>>>> that has not worked so far. What is really a pain in many projects
>> >>>>> out there
>> >>>>> is discontinuity in progress of PRs, missing features, slow reviews
>> >>>>> which is
>> >>>>> understandable to some extent... it is not only about Spark but
>> >>>>> things can
>> >>>>> be improved for sure for this project in particular as already
>> >>>>> stated.
>> >>>>>
>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> +1 to adding an SIP label and linking it from the website.  I think
>> >>>>>> it
>> >>>>>> needs
>> >>>>>>
>> >>>>>> - template that focuses it towards soliciting user goals / non
>> >>>>>> goals
>> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>> >>>>>> recommend a vote.
>> >>>>>>
>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>> >>>>>> think
>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and
>> >>>>>> split
>> >>>>>> a thread for the other discussion per Nicholas' request.
>> >>>>>>
>> >>>>>> I meant changing public user interfaces.  I think the first design
>> >>>>>> is
>> >>>>>> unlikely to be right, because it's done at a time when you have the
>> >>>>>> least information.  As a user, I find it considerably more
>> >>>>>> frustrating
>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>> >>>>>> make minor changes to my code in order to take advantage of
>> >>>>>> features.
>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>> >>>>>> @experimental code that are needed in order for it to really work
>> >>>>>> right.  You need to be able to iterate, and if people on both sides
>> >>>>>> of
>> >>>>>> the fence aren't going to respect that some newer apis are subject
>> >>>>>> to
>> >>>>>> change, then why even mark them as such?
>> >>>>>>
>> >>>>>> Ideally a finished SIP should give me a checklist of things that an
>> >>>>>> implementation must do, and things that it doesn't need to do.
>> >>>>>> Contributors/committers should be seriously discouraged from
>> >>>>>> putting
>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>> >>>>>> implementation of all those things, especially if they're then
>> >>>>>> going
>> >>>>>> to argue against interface changes necessary to get the rest of
>> >>>>>> the things done in the 0.2 version.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>> >>>>>> wrote:
>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>>>>>>
>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using
>> >>>>>>> wiki
>> >>>>>>> to
>> >>>>>>> track the list of major changes, but that never really
>> >>>>>>> materialized
>> >>>>>>> due to
>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to
>> >>>>>>> them
>> >>>>>>> prominently on the Spark website makes a lot of sense.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>> >>>>>>> <[hidden email]>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> For the improvement proposals, I think one major point was to
>> >>>>>>>> make
>> >>>>>>>> them
>> >>>>>>>> really visible to users who are not contributors, so we should do
>> >>>>>>>> more than
>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new
>> >>>>>>>> type of
>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all such
>> >>>>>>>> JIRAs from
>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>> >>>>>>>> doc
>> >>>>>>>> templates (in fact many projects have them).
>> >>>>>>>>
>> >>>>>>>> Matei
>> >>>>>>>>
>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> I called Cody last night and talked about some of the topics in
>> >>>>>>>> his
>> >>>>>>>> email.
>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>> >>>>>>>>
>> >>>>>>>> Some of the frustrations come from the success of the project
>> >>>>>>>> itself
>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>> >>>>>>>> people
>> >>>>>>>> who
>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some
>> >>>>>>>> ways
>> >>>>>>>> similar
>> >>>>>>>> to scaling an engineering team in a successful startup: old
>> >>>>>>>> processes that
>> >>>>>>>> worked well might not work so well when it gets to a certain
>> >>>>>>>> size,
>> >>>>>>>> cultures
>> >>>>>>>> can get diluted, building culture vs building process, etc.
>> >>>>>>>>
>> >>>>>>>> I also really like to have a more visible process for larger
>> >>>>>>>> changes,
>> >>>>>>>> especially major user facing API changes. Historically we upload
>> >>>>>>>> design docs
>> >>>>>>>> for major changes, but this is not always done consistently, and it
>> >>>>>>>> is difficult to ensure the quality of the docs, due to the volunteer
>> >>>>>>>> nature of the organization.
>> >>>>>>>>
>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>> >>>>>>>> culture
>> >>>>>>>> to improve clarity:
>> >>>>>>>>
>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>> >>>>>>>> One
>> >>>>>>>> thing
>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>> >>>>>>>> should
>> >>>>>>>> create a design doc template for the project and ask everybody to
>> >>>>>>>> follow.
>> >>>>>>>> The design doc template should also explicitly list goals and
>> >>>>>>>> non-goals, to make design docs more consistent.
>> >>>>>>>>
>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>> >>>>>>>> some changes, but again very inconsistently. Just posting something
>> >>>>>>>> on JIRA isn't sufficient, because there are simply too many JIRAs
>> >>>>>>>> and the signal gets lost
>> >>>>>>>> in the noise. While this is generally impossible to enforce
>> >>>>>>>> because
>> >>>>>>>> we can't
>> >>>>>>>> force all volunteers to conform to a process (or they might not
>> >>>>>>>> even
>> >>>>>>>> be
>> >>>>>>>> aware of this),  those who are more familiar with the project can
>> >>>>>>>> help by
>> >>>>>>>> emailing the dev@ when they see something that hasn't been.
>> >>>>>>>>
>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A
>> >>>>>>>> design
>> >>>>>>>> doc should serve as the base for discussion and is by no means
>> >>>>>>>> the
>> >>>>>>>> final
>> >>>>>>>> design. Of course, this does not mean the author has to accept
>> >>>>>>>> every piece of feedback. They should also be comfortable accepting /
>> >>>>>>>> rejecting
>> >>>>>>>> ideas on
>> >>>>>>>> technical grounds.
>> >>>>>>>>
>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful
>> >>>>>>>> to
>> >>>>>>>> have
>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>> >>>>>>>> actually not
>> >>>>>>>> sure how well this will work, because of the volunteering nature
>> >>>>>>>> and
>> >>>>>>>> we need
>> >>>>>>>> to adjust for timezones for people across the globe, but it seems
>> >>>>>>>> worth
>> >>>>>>>> trying.
>> >>>>>>>>
>> >>>>>>>> - Culture: Contributors (including committers) should be more
>> >>>>>>>> direct
>> >>>>>>>> in
>> >>>>>>>> setting expectations, including whether they are working on a
>> >>>>>>>> specific
>> >>>>>>>> issue, whether they will be working on a specific issue, and
>> >>>>>>>> whether
>> >>>>>>>> an
>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>> >>>>>>>> this
>> >>>>>>>> community
>> >>>>>>>> are nice and don't enjoy telling other people no, but it is often
>> >>>>>>>> more
>> >>>>>>>> annoying to a contributor to not know anything than getting a no.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>> >>>>>>>> <[hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>> >>>>>>>>> process that
>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't
>> >>>>>>>>> think
>> >>>>>>>>> committers are trying to minimize their own work -- every
>> >>>>>>>>> committer
>> >>>>>>>>> cares
>> >>>>>>>>> about making the software useful for users. However, it is
>> >>>>>>>>> always
>> >>>>>>>>> hard to
>> >>>>>>>>> get user input and so it helps to have this kind of process.
>> >>>>>>>>> I've
>> >>>>>>>>> certainly
>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see the
>> >>>>>>>>> biggest
>> >>>>>>>>> things on the roadmap.
>> >>>>>>>>>
>> >>>>>>>>> When you're talking about "changing interfaces", are you talking
>> >>>>>>>>> about
>> >>>>>>>>> public or internal APIs? I do think many people hate changing
>> >>>>>>>>> public APIs
>> >>>>>>>>> and I actually think that's for the best of the project. That's
>> >>>>>>>>> a
>> >>>>>>>>> technical
>> >>>>>>>>> debate, but basically, the worst thing when you're using a piece
>> >>>>>>>>> of
>> >>>>>>>>> software
>> >>>>>>>>> is that the developers constantly ask you to rewrite your app to
>> >>>>>>>>> update to a
>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>> >>>>>>>>> who's used
>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code
>> >>>>>>>>> this
>> >>>>>>>>> release" model works well within a single large company, but
>> >>>>>>>>> doesn't work
>> >>>>>>>>> well for a community, which is why nearly all *very* widely used
>> >>>>>>>>> programming
>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>> >>>>>>>>> Windows
>> >>>>>>>>> API, etc)
>> >>>>>>>>> almost *never* break backwards compatibility. All this is done
>> >>>>>>>>> within reason
>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
>> >>>>>>>>> etc).
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Stavros Kontopoulos
>> >>>>> Senior Software Engineer
>> >>>>> Lightbend, Inc.
>> >>>>> p:  +30 6977967274
>> >>>>> e: [hidden email]
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>


RE: Spark Improvement Proposals

assaf.mendelson

I agree with most of what Cody said.

Two things:

First, we can always have other people suggest SIPs but mark them as “unreviewed” and have committers basically move them forward. The problem is that writing a good document takes time. This way we can leverage non-committers to do some of this work (it is just another way to contribute).

 

As for strategy, in many cases the implementation strategy can affect the goals. I will give a small example: in the current structured streaming strategy, we group by the time to achieve a sliding window. This is definitely an implementation decision and not a goal. However, I can think of several aggregation functions which have the time inside their calculation buffer. For example, let’s say we want to return a set of all distinct values. One way to implement this would be to make the set into a map and have the value contain the last time seen. Multiplying it across the groupby would cost a lot in performance. So adding such a strategy would have a great effect on the type of aggregations and their performance, which does affect the goal. Without adding the strategy, it is easy for whoever reads the design document to not think about these cases. Furthermore, it might be decided that these cases are rare enough that the strategy is still good enough, but how would we know that without user feedback?
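
One way to read the distinct-values example is sketched below in plain Python (not Spark code). The function names, the window length of 10, the slide of 5, and the sample events are all assumptions made for illustration. The sketch compares keeping one independent set per sliding window, as a group-by on the window would, with keeping a single map from value to last-seen time.

```python
from collections import defaultdict


def windows_for(ts, length, slide):
    """Start times of every sliding window [start, start + length) that contains ts."""
    start = (ts // slide) * slide
    starts = []
    while start > ts - length:
        starts.append(start)
        start -= slide
    return starts


def distinct_by_window(events, length, slide):
    """Group-by-window strategy: every window keeps its own set of values."""
    buffers = defaultdict(set)
    for ts, value in events:
        for start in windows_for(ts, length, slide):
            buffers[start].add(value)
    return buffers


def last_seen_map(events):
    """Single-buffer strategy: map each value to the last time it was seen."""
    seen = {}
    for ts, value in events:
        seen[value] = max(ts, seen.get(value, ts))
    return seen


def distinct_since(seen, window_start):
    """Distinct values for a window that starts at window_start and runs up to now."""
    return {value for value, ts in seen.items() if ts >= window_start}


# Hypothetical (timestamp, value) events.
events = [(1, "a"), (2, "b"), (7, "a"), (12, "c")]
per_window = distinct_by_window(events, length=10, slide=5)
seen = last_seen_map(events)

# Entries stored across all per-window sets versus entries in the single map.
stored_per_window = sum(len(values) for values in per_window.values())
stored_single_map = len(seen)
```

With these events the per-window sets hold more entries in total than the single map, because every value is copied into each window it falls in. The single map only answers queries for windows that extend to the latest event, so it is a different trade-off and not a drop-in replacement; that kind of difference is what a strategy section would surface.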

I believe this example is exactly what Cody was talking about. Since implementation strategies often have a large effect on the goals, we should discuss them together with the goals. In addition, while it is often easy to throw out completely infeasible goals, it is often much harder to figure out that goals are infeasible without fine tuning.

 

 

Assaf.

 

From: Cody Koeninger-2 [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
Sent: Monday, October 10, 2016 2:25 AM
To: Mendelson, Assaf
Subject: Re: Spark Improvement Proposals

 

Only committers should formally submit SIPs because in an Apache
project only committers have explicit political power.  If a user can't
find a committer willing to sponsor an SIP idea, they have no way to
get the idea passed in any case.  If I can't find a committer to
sponsor this meta-SIP idea, I'm out of luck.

I do not believe unrealistic goals can be found solely by inspection.
We've managed to ignore unrealistic goals even after implementation!
Focusing on APIs can allow people to think they've solved something,
when there's really no way of implementing that API while meeting the
goals.  Rapid iteration is clearly the best way to address this, but
we've already talked about why that hasn't really worked.  If adding a
non-binding API section to the template is important to you, I'm not
against it, but I don't think it's sufficient.

On your PRD vs design doc spectrum, I'm saying this is closer to a
PRD.  Clear agreement on goals is the most important thing and that's
why it's the thing I want binding agreement on.  But I cannot agree to
goals unless I have enough minimal technical info to judge whether the
goals are likely to actually be accomplished.



On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:


> Well, I think there are a few things here that don't make sense. First, why
> should only committers submit SIPs? Development in the project should be
> open to all contributors, whether they're committers or not. Second, I think
> unrealistic goals can be found just by inspecting the goals, and I'm not
> super worried that we'll accept a lot of SIPs that are then infeasible -- we
> can then submit new ones. But this depends on whether you want this process
> to be a "design doc lite", where people also agree on implementation
> strategy, or just a way to agree on goals. This is what I asked earlier
> about PRDs vs design docs (and I'm open to either one but I'd just like
> clarity). Finally, both as a user and designer of software, I always want to
> give feedback on APIs, so I'd really like a culture of having those early.
> People don't argue about prettiness when they discuss APIs, they argue about
> the core concepts to expose in order to meet various goals, and then they're
> stuck maintaining those for a long time.
>
> Matei
>
> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>
> Users instead of people, sure.  Committers and contributors are (or at least
> should be) a subset of users.
>
> Non goals, sure. I don't care what the name is, but we need to clearly say
> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>
> API, what I care most about is whether it allows me to accomplish the goals.
> Arguing about how ugly or pretty it is can be saved for design/
> implementation imho.
>
> Strategy, this is necessary because otherwise goals can be out of line with
> reality.  Don't propose goals you don't have at least some idea of how to
> implement.
>
> Rejected strategies, given that committers are the only ones I'm saying
> should formally submit SPARKLIs or SIPs, if they put junk in a required
> section then slap them down for it and tell them to fix it.
>
>
> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>
>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>> but we should also clarify it in the writeup. In particular:
>>
>> - Goals needs to be about user-facing behavior ("people" is broad)
>>
>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>> one of these and say "Spark's developers have officially rejected X, which
>> our awesome system has".
>>
>> - For user-facing stuff, I think you need a section on API. Virtually all
>> other *IPs I've seen have that.
>>
>> - I'm still not sure why the strategy section is needed if the purpose is
>> to define user-facing behavior -- unless this is the strategy for setting
>> the goals or for defining the API. That sounds squarely like a design doc
>> issue. In some sense, who cares whether the proposal is technically feasible
>> right now? If it's infeasible, that will be discovered later during design
>> and implementation. Same thing with rejected strategies -- listing some of
>> those is definitely useful sometimes, but if you make this a *required*
>> section, people are just going to fill it in with bogus stuff (I've seen
>> this happen before).
>>
>> Matei
>>
>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >
>> > So to focus the discussion on the specific strategy I'm suggesting,
>> > documented at
>> >
>> >
>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >
>> > "Goals: What must this allow people to do, that they can't currently?"
>> >
>> > Is it unclear that this is focusing specifically on people-visible
>> > behavior?
>> >
>> > Rejected goals -  are important because otherwise people keep trying
>> > to argue about scope.  Of course you can change things later with a
>> > different SIP and different vote, the point is to focus.
>> >
>> > Use cases - are something that people are going to bring up in
>> > discussion.  If they aren't clearly documented as a goal ("This must
>> > allow me to connect using SSL"), they should be added.
>> >
>> > Internal architecture - if the people who need specific behavior are
>> > implementers of other parts of the system, that's fine.
>> >
>> > Rejected strategies - If you have none of these, you have no evidence
>> > that the proponent didn't just go with the first thing they had in
>> > mind (or have already implemented), which is a big problem currently.
>> > Approval isn't binding as to specifics of implementation, so these
>> > aren't handcuffs.  The goals are the contract, the strategy is
>> > evidence that contract can actually be met.
>> >
>> > Design docs - I'm not touching design docs.  The markdown file I
>> > linked specifically says of the strategy section "This is not a full
>> > design document."  Is this unclear?  Design docs can be worked on
>> > obviously, but that's not what I'm concerned with here.
>> >
>> >
>> >
>> >
>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>> > wrote:
>> >> Hi Cody,
>> >>
>> >> I think this would be a lot more concrete if we had a more detailed
>> >> template
>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are
>> >> they
>> >> a way to solicit feedback on the user-facing behavior or on the
>> >> internals?
>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>> >> Product
>> >> Requirements Docs (PRDs), which focus on *what* a code change should do
>> >> as
>> >> opposed to how.
>> >>
>> >> In particular, here are some things that you may or may not consider in
>> >> scope for SIPs:
>> >>
>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>> >> focus on
>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>> >> "system continues working if one node fails"). BTW I wouldn't say
>> >> "rejected
>> >> goals" because some of them might become goals later, so we're not
>> >> definitively rejecting them.
>> >>
>> >> - Public API: Probably should be included in most SIPs unless it's too
>> >> large
>> >> to fully specify then (e.g. "let's add an ML library").
>> >>
>> >> - Use cases: I usually find this very useful in PRDs to better
>> >> communicate
>> >> the goals.
>> >>
>> >> - Internal architecture: This is usually *not* a thing users can easily
>> >> comment on and it sounds more like a design doc item. Of course it's
>> >> important to show that the SIP is feasible to implement. One exception,
>> >> however, is that I think we'll have some SIPs primarily on internals
>> >> (e.g.
>> >> if somebody wants to refactor Spark's query optimizer or something).
>> >>
>> >> - Rejected strategies: I personally wouldn't put this, because what's
>> >> the
>> >> point of voting to reject a strategy before you've really begun
>> >> designing
>> >> and implementing something? What if you discover that the strategy is
>> >> actually better when you start doing stuff?
>> >>
>> >> At a super high level, it depends on whether you want the SIPs to be
>> >> PRDs
>> >> for getting some quick feedback on the goals of a feature before it is
>> >> designed, or something more like full-fledged design docs (just a more
>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and
>> >> they
>> >> actually seem to be more like design docs. This can work too but it
>> >> does
>> >> require more work from the proposer and it can lead to the same
>> >> problems you
>> >> mentioned with people already having a design and implementation in
>> >> mind.
>> >>
>> >> Basically, the question is, are you trying to iterate faster on design
>> >> by
>> >> adding a step for user feedback earlier? Or are you just trying to make
>> >> design docs for key features more visible (and their approval more
>> >> formal)?
>> >>
>> >> BTW note that in either case, I'd like to have a template for design
>> >> docs
>> >> too, which should also include goals. I think that would've avoided
>> >> some of
>> >> the issues you brought up.
>> >>
>> >> Matei
>> >>
>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>
>> >> Here's my specific proposal (meta-proposal?)
>> >>
>> >> Spark Improvement Proposals (SIP)
>> >>
>> >>
>> >> Background:
>> >>
>> >> The current problem is that design and implementation of large features
>> >> are
>> >> often done in private, before soliciting user feedback.
>> >>
>> >> When feedback is solicited, it is often as to detailed design
>> >> specifics, not
>> >> focused on goals.
>> >>
>> >> When implementation does take place after design, there is often
>> >> disagreement as to what goals are or are not in scope.
>> >>
>> >> This results in commits that don't fully meet user needs.
>> >>
>> >>
>> >> Goals:
>> >>
>> >> - Ensure user, contributor, and committer goals are clearly identified
>> >> and
>> >> agreed upon, before implementation takes place.
>> >>
>> >> - Ensure that a technically feasible strategy is chosen that is likely
>> >> to
>> >> meet the goals.
>> >>
>> >>
>> >> Rejected Goals:
>> >>
>> >> - SIPs are not for detailed design.  Design by committee doesn't work.
>> >>
>> >> - SIPs are not for every change.  We don't need that much process.
>> >>
>> >>
>> >> Strategy:
>> >>
>> >> My suggestion is outlined as a Spark Improvement Proposal process
>> >> documented
>> >> at
>> >>
>> >>
>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>
>> >> Specifics of Jira manipulation are an implementation detail we can
>> >> figure
>> >> out.
>> >>
>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>
>> >>
>> >> Rejected Strategies:
>> >>
>> >> Having someone who understands the problem implement it first works,
>> >> but
>> >> only if significant iteration after user feedback is allowed.
>> >>
>> >> Historically this has been problematic due to pressure to limit public
>> >> api
>> >> changes.
>> >>
>> >>
>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>> >> wrote:
>> >>>
>> >>> Alright, looks like there is quite a bit of support. We should wait to
>> >>> hear from more people too.
>> >>>
>> >>> To push this forward, Cody and I will be working together in the next
>> >>> couple of weeks to come up with a concrete, detailed proposal on what
>> >>> this entails, and then we can discuss the specific proposal as well.
>> >>>
>> >>>
>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>> >>> wrote:
>> >>>>
>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>> >>>>
>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>>> <[hidden email]> wrote:
>> >>>>>
>> >>>>> +1 to the SIP label as long as it does not slow things down and it
>> >>>>> targets optimizing efforts, coordination, etc. For example, really
>> >>>>> small features (assuming they don't touch public interfaces) or
>> >>>>> refactorings should not need to go through this process, and I hope
>> >>>>> it will be kept this way. So a guideline doc should be provided, like
>> >>>>> in the KIP case.
>> >>>>>
>> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply
>> >>>>> having design docs and prototype implementations in PRs is not
>> >>>>> something that has failed so far. What is really a pain in many
>> >>>>> projects out there is discontinuity in the progress of PRs, missing
>> >>>>> features, and slow reviews, which is understandable to some extent...
>> >>>>> it is not only about Spark, but things can surely be improved for
>> >>>>> this project in particular, as already stated.
>> >>>>>
>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> +1 to adding an SIP label and linking it from the website.  I think
>> >>>>>> it
>> >>>>>> needs
>> >>>>>>
>> >>>>>> - template that focuses it towards soliciting user goals / non
>> >>>>>> goals
>> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>> >>>>>> recommend a vote.
>> >>>>>>
>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>> >>>>>> think
>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and
>> >>>>>> split
>> >>>>>> a thread for the other discussion per Nicholas' request.
>> >>>>>>
>> >>>>>> I meant changing public user interfaces.  I think the first design
>> >>>>>> is
>> >>>>>> unlikely to be right, because it's done at a time when you have the
>> >>>>>> least information.  As a user, I find it considerably more
>> >>>>>> frustrating
>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>> >>>>>> make minor changes to my code in order to take advantage of
>> >>>>>> features.
>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>> >>>>>> @experimental code that are needed in order for it to really work
>> >>>>>> right.  You need to be able to iterate, and if people on both sides
>> >>>>>> of
>> >>>>>> the fence aren't going to respect that some newer apis are subject
>> >>>>>> to
>> >>>>>> change, then why even mark them as such?
>> >>>>>>
>> >>>>>> Ideally a finished SIP should give me a checklist of things that an
>> >>>>>> implementation must do, and things that it doesn't need to do.
>> >>>>>> Contributors/committers should be seriously discouraged from
>> >>>>>> putting
>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>> >>>>>> implementation of all those things, especially if they're then
>> >>>>>> going
>> >>>>>> to argue against interface changes necessary to get the rest of
>> >>>>>> the things done in the 0.2 version.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>> >>>>>> wrote:
>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>>>>>>
>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using
>> >>>>>>> wiki
>> >>>>>>> to
>> >>>>>>> track the list of major changes, but that never really
>> >>>>>>> materialized
>> >>>>>>> due to
>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to
>> >>>>>>> them prominently on the Spark website makes a lot of sense.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>> >>>>>>> <[hidden email]>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> For the improvement proposals, I think one major point was to
>> >>>>>>>> make
>> >>>>>>>> them
>> >>>>>>>> really visible to users who are not contributors, so we should do
>> >>>>>>>> more than
>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new
>> >>>>>>>> type of
>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all such
>> >>>>>>>> JIRAs from
>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>> >>>>>>>> doc
>> >>>>>>>> templates (in fact many projects have them).
>> >>>>>>>>
>> >>>>>>>> Matei
>> >>>>>>>>
>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> I called Cody last night and talked about some of the topics in
>> >>>>>>>> his
>> >>>>>>>> email.
>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>> >>>>>>>>
>> >>>>>>>> Some of the frustrations come from the success of the project
>> >>>>>>>> itself
>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>> >>>>>>>> people
>> >>>>>>>> who
>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some
>> >>>>>>>> ways
>> >>>>>>>> similar
>> >>>>>>>> to scaling an engineering team in a successful startup: old
>> >>>>>>>> processes that
>> >>>>>>>> worked well might not work so well when it gets to a certain
>> >>>>>>>> size,
>> >>>>>>>> cultures
>> >>>>>>>> can get diluted, building culture vs building process, etc.
>> >>>>>>>>
>> >>>>>>>> I also really like to have a more visible process for larger
>> >>>>>>>> changes,
>> >>>>>>>> especially major user facing API changes. Historically we upload
>> >>>>>>>> design docs
>> >>>>>>>> for major changes, but this is not always done consistently, and it
>> >>>>>>>> is difficult to ensure the quality of the docs, due to the volunteer
>> >>>>>>>> nature of the organization.
>> >>>>>>>>
>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>> >>>>>>>> culture
>> >>>>>>>> to improve clarity:
>> >>>>>>>>
>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>> >>>>>>>> One
>> >>>>>>>> thing
>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>> >>>>>>>> should
>> >>>>>>>> create a design doc template for the project and ask everybody to
>> >>>>>>>> follow.
>> >>>>>>>> The design doc template should also explicitly list goals and
>> >>>>>>>> non-goals, to make design docs more consistent.
>> >>>>>>>>
>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>> >>>>>>>> some changes, but again very inconsistently. Just posting something
>> >>>>>>>> on JIRA isn't sufficient, because there are simply too many JIRAs
>> >>>>>>>> and the signal gets lost
>> >>>>>>>> in the noise. While this is generally impossible to enforce
>> >>>>>>>> because
>> >>>>>>>> we can't
>> >>>>>>>> force all volunteers to conform to a process (or they might not
>> >>>>>>>> even
>> >>>>>>>> be
>> >>>>>>>> aware of this),  those who are more familiar with the project can
>> >>>>>>>> help by
>> >>>>>>>> emailing the dev@ when they see something that hasn't been.
>> >>>>>>>>
>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A
>> >>>>>>>> design
>> >>>>>>>> doc should serve as the base for discussion and is by no means
>> >>>>>>>> the
>> >>>>>>>> final
>> >>>>>>>> design. Of course, this does not mean the author has to accept
>> >>>>>>>> every piece of feedback. They should also be comfortable accepting /
>> >>>>>>>> rejecting
>> >>>>>>>> ideas on
>> >>>>>>>> technical grounds.
>> >>>>>>>>
>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful
>> >>>>>>>> to
>> >>>>>>>> have
>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>> >>>>>>>> actually not
>> >>>>>>>> sure how well this will work, because of the volunteering nature
>> >>>>>>>> and
>> >>>>>>>> we need
>> >>>>>>>> to adjust for timezones for people across the globe, but it seems
>> >>>>>>>> worth
>> >>>>>>>> trying.
>> >>>>>>>>
>> >>>>>>>> - Culture: Contributors (including committers) should be more
>> >>>>>>>> direct
>> >>>>>>>> in
>> >>>>>>>> setting expectations, including whether they are working on a
>> >>>>>>>> specific
>> >>>>>>>> issue, whether they will be working on a specific issue, and
>> >>>>>>>> whether
>> >>>>>>>> an
>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>> >>>>>>>> this
>> >>>>>>>> community
>> >>>>>>>> are nice and don't enjoy telling other people no, but it is often
>> >>>>>>>> more
>> >>>>>>>> annoying to a contributor to not know anything than getting a no.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>> >>>>>>>> <[hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>> >>>>>>>>> process that
>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't
>> >>>>>>>>> think
>> >>>>>>>>> committers are trying to minimize their own work -- every
>> >>>>>>>>> committer
>> >>>>>>>>> cares
>> >>>>>>>>> about making the software useful for users. However, it is
>> >>>>>>>>> always
>> >>>>>>>>> hard to
>> >>>>>>>>> get user input and so it helps to have this kind of process.
>> >>>>>>>>> I've
>> >>>>>>>>> certainly
>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see the
>> >>>>>>>>> biggest
>> >>>>>>>>> things on the roadmap.
>> >>>>>>>>>
>> >>>>>>>>> When you're talking about "changing interfaces", are you talking
>> >>>>>>>>> about
>> >>>>>>>>> public or internal APIs? I do think many people hate changing
>> >>>>>>>>> public APIs
>> >>>>>>>>> and I actually think that's for the best of the project. That's
>> >>>>>>>>> a
>> >>>>>>>>> technical
>> >>>>>>>>> debate, but basically, the worst thing when you're using a piece
>> >>>>>>>>> of
>> >>>>>>>>> software
>> >>>>>>>>> is that the developers constantly ask you to rewrite your app to
>> >>>>>>>>> update to a
>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>> >>>>>>>>> who's used
>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code
>> >>>>>>>>> this
>> >>>>>>>>> release" model works well within a single large company, but
>> >>>>>>>>> doesn't work
>> >>>>>>>>> well for a community, which is why nearly all *very* widely used
>> >>>>>>>>> programming
>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>> >>>>>>>>> Windows
>> >>>>>>>>> API, etc)
>> >>>>>>>>> almost *never* break backwards compatibility. All this is done
>> >>>>>>>>> within reason
>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
>> >>>>>>>>> etc).
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Stavros Kontopoulos
>> >>>>> Senior Software Engineer
>> >>>>> Lightbend, Inc.
>> >>>>> p:  +30 6977967274
>> >>>>> e: [hidden email]
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>



Re: Spark Improvement Proposals

Cody Koeninger-2
Yes, users suggesting SIPs is a good thing and is explicitly called
out in the linked document under the Who? section.  Formally proposing
them, not so much, because of the political realities.

Yes, implementation strategy definitely affects goals.  There are all
kinds of examples of this; I'll pick one that's my fault so as to
avoid sounding like I'm blaming anyone:

When I implemented the Kafka DStream, one of my goals (not explicitly
agreed upon by the community) was to make sure people could use the
DStream with however they were already using Kafka at work.  The lack
of explicit agreement on that goal led to all kinds of fighting with
committers that could have been avoided.  The lack of explicit
up-front strategy discussion led to the DStream not really working
with compacted topics.  I knew about compacted topics, but don't have
a use for them, so I had a blind spot there.  If there had been explicit
up-front discussion that my strategy was "assume that batches can be
defined on the driver solely by beginning and ending offsets", there's
a greater chance that a user would have seen that and said, "hey, what
about non-contiguous offsets in a compacted topic?"

This kind of thing is only going to happen smoothly if we have a
lightweight user-visible process with clear outcomes.
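
The compacted-topic blind spot described above can be illustrated with a minimal sketch. This is not the actual Kafka DStream code: the `OffsetRange` class, `records_in_range` helper, topic name, and sample offsets below are all hypothetical, chosen only to show why "a batch is fully defined by its beginning and ending offsets" stops holding once offsets are non-contiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical batch definition: one partition plus a [from, until) offset span."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

    def assumed_count(self) -> int:
        # The strategy's implicit assumption: every offset in the span exists,
        # so the record count is simply the width of the span.
        return self.until_offset - self.from_offset


def records_in_range(log_offsets, rng):
    """Return the offsets actually present in the log that fall inside the range."""
    return [o for o in log_offsets if rng.from_offset <= o < rng.until_offset]


# A regular topic keeps every offset; a compacted topic may have dropped
# superseded records, leaving gaps (sample data, made up for illustration).
contiguous_log = list(range(0, 10))
compacted_log = [0, 3, 4, 7, 9]

rng = OffsetRange("events", 0, 0, 10)

contiguous_actual = len(records_in_range(contiguous_log, rng))
compacted_actual = len(records_in_range(compacted_log, rng))

print(rng.assumed_count(), contiguous_actual, compacted_actual)
```

With the contiguous log the assumed and actual counts agree; with the compacted log the range still spans ten offsets but only five records exist. Any logic that relies on the span width runs into this mismatch, which is the kind of question a user might have raised in an up-front strategy discussion.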

On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
<[hidden email]> wrote:

> I agree with most of what Cody said.
>
> Two things:
>
> First we can always have other people suggest SIPs but mark them as
> “unreviewed” and have committers basically move them forward. The problem is
> that writing a good document takes time. This way we can leverage non
> committers to do some of this work (it is just another way to contribute).
>
>
>
> As for strategy, in many cases implementation strategy can affect the goals.
> I will give  a small example: In the current structured streaming strategy,
> we group by the time to achieve a sliding window. This is definitely an
> implementation decision and not a goal. However, I can think of several
> aggregation functions which have the time inside their calculation buffer.
> For example, let’s say we want to return a set of all distinct values. One
> way to implement this would be to make the set into a map and have the value
> contain the last time seen. Multiplying it across the groupby would cost a
> lot in performance. So adding such a strategy would have a great effect on
> the type of aggregations and their performance which does affect the goal.
> Without adding the strategy, it is easy for whoever goes to the design
> document to not think about these cases. Furthermore, it might be decided
> that these cases are rare enough so that the strategy is still good enough
> but how would we know it without user feedback?
>
> I believe this example is exactly what Cody was talking about. Since many
> times implementation strategies have a large effect on the goal, we should
> have it discussed when discussing the goals. In addition, while it is often
> easy to throw out completely infeasible goals, it is often much harder to
> figure out that the goals are unfeasible without fine tuning.
>
>
>
>
>
> Assaf.
>
>
>
> From: Cody Koeninger-2 [via Apache Spark Developers List]
> [mailto:ml-node+[hidden email]]
> Sent: Monday, October 10, 2016 2:25 AM
> To: Mendelson, Assaf
> Subject: Re: Spark Improvement Proposals
>
>
>
> Only committers should formally submit SIPs because in an apache
> project only committers have explicit political power.  If a user can't
> find a committer willing to sponsor an SIP idea, they have no way to
> get the idea passed in any case.  If I can't find a committer to
> sponsor this meta-SIP idea, I'm out of luck.
>
> I do not believe unrealistic goals can be found solely by inspection.
> We've managed to ignore unrealistic goals even after implementation!
> Focusing on APIs can allow people to think they've solved something,
> when there's really no way of implementing that API while meeting the
> goals.  Rapid iteration is clearly the best way to address this, but
> we've already talked about why that hasn't really worked.  If adding a
> non-binding API section to the template is important to you, I'm not
> against it, but I don't think it's sufficient.
>
> On your PRD vs design doc spectrum, I'm saying this is closer to a
> PRD.  Clear agreement on goals is the most important thing and that's
> why it's the thing I want binding agreement on.  But I cannot agree to
> goals unless I have enough minimal technical info to judge whether the
> goals are likely to actually be accomplished.
>
>
>
> On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>
>
>> Well, I think there are a few things here that don't make sense. First, why
>> should only committers submit SIPs? Development in the project should be
>> open to all contributors, whether they're committers or not. Second, I think
>> unrealistic goals can be found just by inspecting the goals, and I'm not
>> super worried that we'll accept a lot of SIPs that are then infeasible -- we
>> can then submit new ones. But this depends on whether you want this process
>> to be a "design doc lite", where people also agree on implementation
>> strategy, or just a way to agree on goals. This is what I asked earlier
>> about PRDs vs design docs (and I'm open to either one but I'd just like
>> clarity). Finally, both as a user and designer of software, I always want to
>> give feedback on APIs, so I'd really like a culture of having those early.
>> People don't argue about prettiness when they discuss APIs, they argue about
>> the core concepts to expose in order to meet various goals, and then they're
>> stuck maintaining those for a long time.
>>
>> Matei
>>
>> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>>
>> Users instead of people, sure.  Committers and contributors are (or at least
>> should be) a subset of users.
>>
>> Non goals, sure. I don't care what the name is, but we need to clearly say
>> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>>
>> API, what I care most about is whether it allows me to accomplish the goals.
>> Arguing about how ugly or pretty it is can be saved for design/implementation
>> imho.
>>
>> Strategy, this is necessary because otherwise goals can be out of line with
>> reality.  Don't propose goals you don't have at least some idea of how to
>> implement.
>>
>> Rejected strategies, given that committers are the only ones I'm saying
>> should formally submit SPARKLIs or SIPs, if they put junk in a required
>> section then slap them down for it and tell them to fix it.
>>
>>
>> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>>
>>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>>> but we should also clarify it in the writeup. In particular:
>>>
>>> - Goals needs to be about user-facing behavior ("people" is broad)
>>>
>>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>>> one of these and say "Spark's developers have officially rejected X, which
>>> our awesome system has".
>>>
>>> - For user-facing stuff, I think you need a section on API. Virtually all
>>> other *IPs I've seen have that.
>>>
>>> - I'm still not sure why the strategy section is needed if the purpose is
>>> to define user-facing behavior -- unless this is the strategy for setting
>>> the goals or for defining the API. That sounds squarely like a design doc
>>> issue. In some sense, who cares whether the proposal is technically feasible
>>> right now? If it's infeasible, that will be discovered later during design
>>> and implementation. Same thing with rejected strategies -- listing some of
>>> those is definitely useful sometimes, but if you make this a *required*
>>> section, people are just going to fill it in with bogus stuff (I've seen
>>> this happen before).
>>>
>>> Matei
>>>
>
>>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>>> >
>>> > So to focus the discussion on the specific strategy I'm suggesting,
>>> > documented at
>>> >
>>> >
>>> >
>>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >
>>> > "Goals: What must this allow people to do, that they can't currently?"
>>> >
>>> > Is it unclear that this is focusing specifically on people-visible
>>> > behavior?
>>> >
>>> > Rejected goals -  are important because otherwise people keep trying
>>> > to argue about scope.  Of course you can change things later with a
>>> > different SIP and different vote, the point is to focus.
>>> >
>>> > Use cases - are something that people are going to bring up in
>>> > discussion.  If they aren't clearly documented as a goal ("This must
>>> > allow me to connect using SSL"), they should be added.
>>> >
>>> > Internal architecture - if the people who need specific behavior are
>>> > implementers of other parts of the system, that's fine.
>>> >
>>> > Rejected strategies - If you have none of these, you have no evidence
>>> > that the proponent didn't just go with the first thing they had in
>>> > mind (or have already implemented), which is a big problem currently.
>>> > Approval isn't binding as to specifics of implementation, so these
>>> > aren't handcuffs.  The goals are the contract, the strategy is
>>> > evidence that contract can actually be met.
>>> >
>>> > Design docs - I'm not touching design docs.  The markdown file I
>>> > linked specifically says of the strategy section "This is not a full
>>> > design document."  Is this unclear?  Design docs can be worked on
>>> > obviously, but that's not what I'm concerned with here.
>>> >
>>> >
>>> >
>>> >
>>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>>> > wrote:
>>> >> Hi Cody,
>>> >>
>>> >> I think this would be a lot more concrete if we had a more detailed
>>> >> template
>>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are
>>> >> they
>>> >> a way to solicit feedback on the user-facing behavior or on the
>>> >> internals?
>>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>>> >> Product
>>> >> Requirements Docs (PRDs), which focus on *what* a code change should
>>> >> do
>>> >> as
>>> >> opposed to how.
>>> >>
>>> >> In particular, here are some things that you may or may not consider
>>> >> in
>>> >> scope for SIPs:
>>> >>
>>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>>> >> focus on
>>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>>> >> "system continues working if one node fails"). BTW I wouldn't say
>>> >> "rejected
>>> >> goals" because some of them might become goals later, so we're not
>>> >> definitively rejecting them.
>>> >>
>>> >> - Public API: Probably should be included in most SIPs unless it's too
>>> >> large
>>> >> to fully specify then (e.g. "let's add an ML library").
>>> >>
>>> >> - Use cases: I usually find this very useful in PRDs to better
>>> >> communicate
>>> >> the goals.
>>> >>
>>> >> - Internal architecture: This is usually *not* a thing users can
>>> >> easily
>>> >> comment on and it sounds more like a design doc item. Of course it's
>>> >> important to show that the SIP is feasible to implement. One
>>> >> exception,
>>> >> however, is that I think we'll have some SIPs primarily on internals
>>> >> (e.g.
>>> >> if somebody wants to refactor Spark's query optimizer or something).
>>> >>
>>> >> - Rejected strategies: I personally wouldn't put this, because what's
>>> >> the
>>> >> point of voting to reject a strategy before you've really begun
>>> >> designing
>>> >> and implementing something? What if you discover that the strategy is
>>> >> actually better when you start doing stuff?
>>> >>
>>> >> At a super high level, it depends on whether you want the SIPs to be
>>> >> PRDs
>>> >> for getting some quick feedback on the goals of a feature before it is
>>> >> designed, or something more like full-fledged design docs (just a more
>>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and
>>> >> they
>>> >> actually seem to be more like design docs. This can work too but it
>>> >> does
>>> >> require more work from the proposer and it can lead to the same
>>> >> problems you
>>> >> mentioned with people already having a design and implementation in
>>> >> mind.
>>> >>
>>> >> Basically, the question is, are you trying to iterate faster on design
>>> >> by
>>> >> adding a step for user feedback earlier? Or are you just trying to
>>> >> make
>>> >> design docs for key features more visible (and their approval more
>>> >> formal)?
>>> >>
>>> >> BTW note that in either case, I'd like to have a template for design
>>> >> docs
>>> >> too, which should also include goals. I think that would've avoided
>>> >> some of
>>> >> the issues you brought up.
>>> >>
>>> >> Matei
>>> >>
>>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>> >>
>>> >> Here's my specific proposal (meta-proposal?)
>>> >>
>>> >> Spark Improvement Proposals (SIP)
>>> >>
>>> >>
>>> >> Background:
>>> >>
>>> >> The current problem is that design and implementation of large
>>> >> features
>>> >> are
>>> >> often done in private, before soliciting user feedback.
>>> >>
>>> >> When feedback is solicited, it is often as to detailed design
>>> >> specifics, not
>>> >> focused on goals.
>>> >>
>>> >> When implementation does take place after design, there is often
>>> >> disagreement as to what goals are or are not in scope.
>>> >>
>>> >> This results in commits that don't fully meet user needs.
>>> >>
>>> >>
>>> >> Goals:
>>> >>
>>> >> - Ensure user, contributor, and committer goals are clearly identified
>>> >> and
>>> >> agreed upon, before implementation takes place.
>>> >>
>>> >> - Ensure that a technically feasible strategy is chosen that is likely
>>> >> to
>>> >> meet the goals.
>>> >>
>>> >>
>>> >> Rejected Goals:
>>> >>
>>> >> - SIPs are not for detailed design.  Design by committee doesn't work.
>>> >>
>>> >> - SIPs are not for every change.  We don't need that much process.
>>> >>
>>> >>
>>> >> Strategy:
>>> >>
>>> >> My suggestion is outlined as a Spark Improvement Proposal process
>>> >> documented
>>> >> at
>>> >>
>>> >>
>>> >>
>>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>
>>> >> Specifics of Jira manipulation are an implementation detail we can
>>> >> figure
>>> >> out.
>>> >>
>>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>>> >>
>>> >>
>>> >> Rejected Strategies:
>>> >>
>>> >> Having someone who understands the problem implement it first works,
>>> >> but
>>> >> only if significant iteration after user feedback is allowed.
>>> >>
>>> >> Historically this has been problematic due to pressure to limit public
>>> >> api
>>> >> changes.
>>> >>
>>> >>
>>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>>> >> wrote:
>>> >>>
>>> >>> Alright, looks like there is quite a bit of support. We should wait
>>> >>> to
>>> >>> hear from more people too.
>>> >>>
>>> >>> To push this forward, Cody and I will be working together in the next
>>> >>> couple of weeks to come up with a concrete, detailed proposal on what
>>> >>> this
>>> >>> entails, and then we can discuss the specific proposal as well.
>>> >>>
>>> >>>
>>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>>> >>> wrote:
>>> >>>>
>>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>>> >>>>
>>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>> >>>> <[hidden email]> wrote:
>>> >>>>>
>>> >>>>> +1 to the SIP label as long as it does not slow down things and it
>>> >>>>> targets optimizing efforts, coordination etc. For example really
>>> >>>>> small
>>> >>>>> features should not need to go through this process (assuming they
>>> >>>>> don't
>>> >>>>> touch public interfaces)  or re-factorings and hope it will be kept
>>> >>>>> this
>>> >>>>> way. So as a guideline doc should be provided, like in the KIP
>>> >>>>> case.
>>> >>>>>
>>> >>>>> IMHO so far aside from tagging things and linking them elsewhere
>>> >>>>> simply
>>> >>>>> having design docs and prototypes implementations in PRs is not
>>> >>>>> something
>>> >>>>> that has not worked so far. What is really a pain in many projects
>>> >>>>> out there
>>> >>>>> is discontinuity in progress of PRs, missing features, slow reviews
>>> >>>>> which is
>>> >>>>> understandable to some extent... it is not only about Spark but
>>> >>>>> things can
>>> >>>>> be improved for sure for this project in particular as already
>>> >>>>> stated.
>>> >>>>>
>>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
>>> >>>>>> think
>>> >>>>>> it
>>> >>>>>> needs
>>> >>>>>>
>>> >>>>>> - template that focuses it towards soliciting user goals / non
>>> >>>>>> goals
>>> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>> >>>>>> recommend a vote.
>>> >>>>>>
>>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>>> >>>>>> think
>>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and
>>> >>>>>> split
>>> >>>>>> a thread for the other discussion per Nicholas' request.
>>> >>>>>>
>>> >>>>>> I meant changing public user interfaces.  I think the first design
>>> >>>>>> is
>>> >>>>>> unlikely to be right, because it's done at a time when you have
>>> >>>>>> the
>>> >>>>>> least information.  As a user, I find it considerably more
>>> >>>>>> frustrating
>>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>>> >>>>>> make minor changes to my code in order to take advantage of
>>> >>>>>> features.
>>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>>> >>>>>> @experimental code that are needed in order for it to really work
>>> >>>>>> right.  You need to be able to iterate, and if people on both
>>> >>>>>> sides
>>> >>>>>> of
>>> >>>>>> the fence aren't going to respect that some newer apis are subject
>>> >>>>>> to
>>> >>>>>> change, then why even mark them as such?
>>> >>>>>>
>>> >>>>>> Ideally a finished SIP should give me a checklist of things that
>>> >>>>>> an
>>> >>>>>> implementation must do, and things that it doesn't need to do.
>>> >>>>>> Contributors/committers should be seriously discouraged from
>>> >>>>>> putting
>>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>>> >>>>>> implementation of all those things, especially if they're then
>>> >>>>>> going
>>> >>>>>> to argue against interface changes necessary to get the rest
>>> >>>>>> of
>>> >>>>>> the things done in the 0.2 version.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>> >>>>>> wrote:
>>> >>>>>>> I like the lightweight proposal to add a SIP label.
>>> >>>>>>>
>>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using
>>> >>>>>>> wiki
>>> >>>>>>> to
>>> >>>>>>> track the list of major changes, but that never really
>>> >>>>>>> materialized
>>> >>>>>>> due to
>>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to
>>> >>>>>>> them
>>> >>>>>>> prominently on the Spark website makes a lot of sense.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>> >>>>>>> <[hidden email]>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> For the improvement proposals, I think one major point was to
>>> >>>>>>>> make
>>> >>>>>>>> them
>>> >>>>>>>> really visible to users who are not contributors, so we should
>>> >>>>>>>> do
>>> >>>>>>>> more than
>>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a
>>> >>>>>>>> new
>>> >>>>>>>> type of
>>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all
>>> >>>>>>>> such
>>> >>>>>>>> JIRAs from
>>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>>> >>>>>>>> doc
>>> >>>>>>>> templates (in fact many projects have them).
>>> >>>>>>>>
>>> >>>>>>>> Matei
>>> >>>>>>>>
>>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> I called Cody last night and talked about some of the topics in
>>> >>>>>>>> his
>>> >>>>>>>> email.
>>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>>> >>>>>>>>
>>> >>>>>>>> Some of the frustrations come from the success of the project
>>> >>>>>>>> itself
>>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>>> >>>>>>>> people
>>> >>>>>>>> who
>>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some
>>> >>>>>>>> ways
>>> >>>>>>>> similar
>>> >>>>>>>> to scaling an engineering team in a successful startup: old
>>> >>>>>>>> processes that
>>> >>>>>>>> worked well might not work so well when it gets to a certain
>>> >>>>>>>> size,
>>> >>>>>>>> cultures
>>> >>>>>>>> can get diluted, building culture vs building process, etc.
>>> >>>>>>>>
>>> >>>>>>>> I also really like to have a more visible process for larger
>>> >>>>>>>> changes,
>>> >>>>>>>> especially major user facing API changes. Historically we upload
>>> >>>>>>>> design docs
>>> >>>>>>>> for major changes, but it is not always consistent and it is
>>> >>>>>>>> difficult to ensure the quality of the docs, due to the
>>> >>>>>>>> volunteering nature of the organization.
>>> >>>>>>>>
>>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>>> >>>>>>>> culture
>>> >>>>>>>> to improve clarity:
>>> >>>>>>>>
>>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>>> >>>>>>>> One
>>> >>>>>>>> thing
>>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>>> >>>>>>>> should
>>> >>>>>>>> create a design doc template for the project and ask everybody
>>> >>>>>>>> to
>>> >>>>>>>> follow.
>>> >>>>>>>> The design doc template should also explicitly list goals and
>>> >>>>>>>> non-goals, to
>>> >>>>>>>> make design doc more consistent.
>>> >>>>>>>>
>>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this
>>> >>>>>>>> with
>>> >>>>>>>> some
>>> >>>>>>>> changes, but again very inconsistent. Just posting something on
>>> >>>>>>>> JIRA
>>> >>>>>>>> isn't
>>> >>>>>>>> sufficient, because there are simply too many JIRAs and the
>>> >>>>>>>> signal
>>> >>>>>>>> gets lost
>>> >>>>>>>> in the noise. While this is generally impossible to enforce
>>> >>>>>>>> because
>>> >>>>>>>> we can't
>>> >>>>>>>> force all volunteers to conform to a process (or they might not
>>> >>>>>>>> even
>>> >>>>>>>> be
>>> >>>>>>>> aware of this),  those who are more familiar with the project
>>> >>>>>>>> can
>>> >>>>>>>> help by
>>> >>>>>>>> emailing the dev@ when they see something that hasn't been.
>>> >>>>>>>>
>>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback.
>>> >>>>>>>> A
>>> >>>>>>>> design
>>> >>>>>>>> doc should serve as the base for discussion and is by no means
>>> >>>>>>>> the
>>> >>>>>>>> final
>>> >>>>>>>> design. Of course, this does not mean the author has to accept
>>> >>>>>>>> every
>>> >>>>>>>> feedback. They should also be comfortable accepting / rejecting
>>> >>>>>>>> ideas on
>>> >>>>>>>> technical grounds.
>>> >>>>>>>>
>>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
>>> >>>>>>>> useful
>>> >>>>>>>> to
>>> >>>>>>>> have
>>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>>> >>>>>>>> actually not
>>> >>>>>>>> sure how well this will work, because of the volunteering nature
>>> >>>>>>>> and
>>> >>>>>>>> we need
>>> >>>>>>>> to adjust for timezones for people across the globe, but it
>>> >>>>>>>> seems
>>> >>>>>>>> worth
>>> >>>>>>>> trying.
>>> >>>>>>>>
>>> >>>>>>>> - Culture: Contributors (including committers) should be more
>>> >>>>>>>> direct
>>> >>>>>>>> in
>>> >>>>>>>> setting expectations, including whether they are working on a
>>> >>>>>>>> specific
>>> >>>>>>>> issue, whether they will be working on a specific issue, and
>>> >>>>>>>> whether
>>> >>>>>>>> an
>>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>>> >>>>>>>> this
>>> >>>>>>>> community
>>> >>>>>>>> are nice and don't enjoy telling other people no, but it is
>>> >>>>>>>> often
>>> >>>>>>>> more
>>> >>>>>>>> annoying to a contributor to not know anything than getting a
>>> >>>>>>>> no.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>> >>>>>>>> <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>>> >>>>>>>>> process that
>>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't
>>> >>>>>>>>> think
>>> >>>>>>>>> committers are trying to minimize their own work -- every
>>> >>>>>>>>> committer
>>> >>>>>>>>> cares
>>> >>>>>>>>> about making the software useful for users. However, it is
>>> >>>>>>>>> always
>>> >>>>>>>>> hard to
>>> >>>>>>>>> get user input and so it helps to have this kind of process.
>>> >>>>>>>>> I've
>>> >>>>>>>>> certainly
>>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see
>>> >>>>>>>>> the
>>> >>>>>>>>> biggest
>>> >>>>>>>>> things on the roadmap.
>>> >>>>>>>>>
>>> >>>>>>>>> When you're talking about "changing interfaces", are you
>>> >>>>>>>>> talking
>>> >>>>>>>>> about
>>> >>>>>>>>> public or internal APIs? I do think many people hate changing
>>> >>>>>>>>> public APIs
>>> >>>>>>>>> and I actually think that's for the best of the project. That's
>>> >>>>>>>>> a
>>> >>>>>>>>> technical
>>> >>>>>>>>> debate, but basically, the worst thing when you're using a
>>> >>>>>>>>> piece
>>> >>>>>>>>> of
>>> >>>>>>>>> software
>>> >>>>>>>>> is that the developers constantly ask you to rewrite your app
>>> >>>>>>>>> to
>>> >>>>>>>>> update to a
>>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>> >>>>>>>>> who's used
>>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their
>>> >>>>>>>>> code
>>> >>>>>>>>> this
>>> >>>>>>>>> release" model works well within a single large company, but
>>> >>>>>>>>> doesn't work
>>> >>>>>>>>> well for a community, which is why nearly all *very* widely
>>> >>>>>>>>> used
>>> >>>>>>>>> programming
>>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>>> >>>>>>>>> Windows
>>> >>>>>>>>> API, etc)
>>> >>>>>>>>> almost *never* break backwards compatibility. All this is done
>>> >>>>>>>>> within reason
>>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
>>> >>>>>>>>> etc).
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Stavros Kontopoulos
>>> >>>>> Senior Software Engineer
>>> >>>>> Lightbend, Inc.
>>> >>>>> p:  +30 6977967274
>>> >>>>> e: [hidden email]
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>>
>>
>
>

Re: Spark Improvement Proposals

Ryan Blue
+1 to votes to approve proposals. I agree that proposals should have an official mechanism to be accepted, and a vote is an established means of doing that well. I like that it includes a period to review the proposal and I think proposals should have been discussed enough ahead of a vote to survive the possibility of a veto.

I also like the names that are short and (mostly) unique, like SEP.

Where I disagree is with the requirement that a committer must formally propose an enhancement. I don't see the value of restricting this: if someone has the will to write up a proposal then they should be encouraged to do so and start a discussion about it. Even if there is a political reality as Cody says, what is the value of codifying that in our process? I think restricting who can submit proposals would only undermine them by pushing contributors out. Maybe I'm missing something here?

rb



On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <[hidden email]> wrote:
Yes, users suggesting SIPs is a good thing and is explicitly called
out in the linked document under the Who? section.  Formally proposing
them, not so much, because of the political realities.

Yes, implementation strategy definitely affects goals.  There are all
kinds of examples of this, I'll pick one that's my fault so as to
avoid sounding like I'm blaming:

When I implemented the Kafka DStream, one of my (not explicitly agreed
upon by the community) goals was to make sure people could use the
DStream however they were already using Kafka at work.  The lack
of explicit agreement on that goal led to all kinds of fighting with
committers, that could have been avoided.  The lack of explicit
up-front strategy discussion led to the DStream not really working
with compacted topics.  I knew about compacted topics, but don't have
a use for them, so had a blind spot there.  If there was explicit
up-front discussion that my strategy was "assume that batches can be
defined on the driver solely by beginning and ending offsets", there's
a greater chance that a user would have seen that and said, "hey, what
about non-contiguous offsets in a compacted topic".

This kind of thing is only going to happen smoothly if we have a
lightweight user-visible process with clear outcomes.

On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
<[hidden email]> wrote:
> I agree with most of what Cody said.
>
> Two things:
>
> First we can always have other people suggest SIPs but mark them as
> “unreviewed” and have committers basically move them forward. The problem is
> that writing a good document takes time. This way we can leverage non
> committers to do some of this work (it is just another way to contribute).
>
>
>
> As for strategy, in many cases implementation strategy can affect the goals.
> I will give  a small example: In the current structured streaming strategy,
> we group by the time to achieve a sliding window. This is definitely an
> implementation decision and not a goal. However, I can think of several
> aggregation functions which have the time inside their calculation buffer.
> For example, let’s say we want to return a set of all distinct values. One
> way to implement this would be to make the set into a map and have the value
> contain the last time seen. Multiplying it across the groupby would cost a
> lot in performance. So adding such a strategy would have a great effect on
> the type of aggregations and their performance which does affect the goal.
> Without adding the strategy, it is easy for whoever goes to the design
> document to not think about these cases. Furthermore, it might be decided
> that these cases are rare enough so that the strategy is still good enough
> but how would we know it without user feedback?
>
> I believe this example is exactly what Cody was talking about. Since many
> times implementation strategies have a large effect on the goal, we should
> have it discussed when discussing the goals. In addition, while it is often
> easy to throw out completely infeasible goals, it is often much harder to
> figure out that the goals are unfeasible without fine tuning.
>
>
>
>
>
> Assaf.
>
>
>
> From: Cody Koeninger-2 [via Apache Spark Developers List]
> [mailto:[hidden email][hidden email]]
> Sent: Monday, October 10, 2016 2:25 AM
> To: Mendelson, Assaf
> Subject: Re: Spark Improvement Proposals
>
>
>
> Only committers should formally submit SIPs because in an apache
> project only committers have explicit political power.  If a user can't
> find a committer willing to sponsor an SIP idea, they have no way to
> get the idea passed in any case.  If I can't find a committer to
> sponsor this meta-SIP idea, I'm out of luck.
>
> I do not believe unrealistic goals can be found solely by inspection.
> We've managed to ignore unrealistic goals even after implementation!
> Focusing on APIs can allow people to think they've solved something,
> when there's really no way of implementing that API while meeting the
> goals.  Rapid iteration is clearly the best way to address this, but
> we've already talked about why that hasn't really worked.  If adding a
> non-binding API section to the template is important to you, I'm not
> against it, but I don't think it's sufficient.
>
> On your PRD vs design doc spectrum, I'm saying this is closer to a
> PRD.  Clear agreement on goals is the most important thing and that's
> why it's the thing I want binding agreement on.  But I cannot agree to
> goals unless I have enough minimal technical info to judge whether the
> goals are likely to actually be accomplished.
>
>
>
> On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>
>
>> Well, I think there are a few things here that don't make sense. First,
>> why
>> should only committers submit SIPs? Development in the project should be
>> open to all contributors, whether they're committers or not. Second, I
>> think
>> unrealistic goals can be found just by inspecting the goals, and I'm not
>> super worried that we'll accept a lot of SIPs that are then infeasible --
>> we
>> can then submit new ones. But this depends on whether you want this
>> process
>> to be a "design doc lite", where people also agree on implementation
>> strategy, or just a way to agree on goals. This is what I asked earlier
>> about PRDs vs design docs (and I'm open to either one but I'd just like
>> clarity). Finally, both as a user and designer of software, I always want
>> to
>> give feedback on APIs, so I'd really like a culture of having those early.
>> People don't argue about prettiness when they discuss APIs, they argue
>> about
>> the core concepts to expose in order to meet various goals, and then
>> they're
>> stuck maintaining those for a long time.
>>
>> Matei
>>
>> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>>
>> Users instead of people, sure.  Committers and contributors are (or at
>> least
>> should be) a subset of users.
>>
>> Non goals, sure. I don't care what the name is, but we need to clearly say
>> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>>
>> API, what I care most about is whether it allows me to accomplish the
>> goals.
>> Arguing about how ugly or pretty it is can be saved for design/
>> implementation imho.
>>
>> Strategy, this is necessary because otherwise goals can be out of line
>> with
>> reality.  Don't propose goals you don't have at least some idea of how to
>> implement.
>>
>> Rejected strategies, given that committers are the only ones I'm saying
>> should formally submit SPARKLIs or SIPs, if they put junk in a required
>> section then slap them down for it and tell them to fix it.
>>
>>
>> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>>
>>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>>> but we should also clarify it in the writeup. In particular:
>>>
>>> - Goals needs to be about user-facing behavior ("people" is broad)
>>>
>>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>>> one of these and say "Spark's developers have officially rejected X,
>>> which
>>> our awesome system has".
>>>
>>> - For user-facing stuff, I think you need a section on API. Virtually all
>>> other *IPs I've seen have that.
>>>
>>> - I'm still not sure why the strategy section is needed if the purpose is
>>> to define user-facing behavior -- unless this is the strategy for setting
>>> the goals or for defining the API. That sounds squarely like a design doc
>>> issue. In some sense, who cares whether the proposal is technically
>>> feasible
>>> right now? If it's infeasible, that will be discovered later during
>>> design
>>> and implementation. Same thing with rejected strategies -- listing some
>>> of
>>> those is definitely useful sometimes, but if you make this a *required*
>>> section, people are just going to fill it in with bogus stuff (I've seen
>>> this happen before).
>>>
>>> Matei
>>>
>
>>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>>> >
>>> > So to focus the discussion on the specific strategy I'm suggesting,
>>> > documented at
>>> >
>>> >
>>> >
>>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >
>>> > "Goals: What must this allow people to do, that they can't currently?"
>>> >
>>> > Is it unclear that this is focusing specifically on people-visible
>>> > behavior?
>>> >
>>> > Rejected goals -  are important because otherwise people keep trying
>>> > to argue about scope.  Of course you can change things later with a
>>> > different SIP and different vote, the point is to focus.
>>> >
>>> > Use cases - are something that people are going to bring up in
>>> > discussion.  If they aren't clearly documented as a goal ("This must
>>> > allow me to connect using SSL"), they should be added.
>>> >
>>> > Internal architecture - if the people who need specific behavior are
>>> > implementers of other parts of the system, that's fine.
>>> >
>>> > Rejected strategies - If you have none of these, you have no evidence
>>> > that the proponent didn't just go with the first thing they had in
>>> > mind (or have already implemented), which is a big problem currently.
>>> > Approval isn't binding as to specifics of implementation, so these
>>> > aren't handcuffs.  The goals are the contract, the strategy is
>>> > evidence that contract can actually be met.
>>> >
>>> > Design docs - I'm not touching design docs.  The markdown file I
>>> > linked specifically says of the strategy section "This is not a full
>>> > design document."  Is this unclear?  Design docs can be worked on
>>> > obviously, but that's not what I'm concerned with here.
>>> >
>>> >
>>> >
>>> >
>>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>>> > wrote:
>>> >> Hi Cody,
>>> >>
>>> >> I think this would be a lot more concrete if we had a more detailed
>>> >> template
>>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are
>>> >> they
>>> >> a way to solicit feedback on the user-facing behavior or on the
>>> >> internals?
>>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>>> >> Product
>>> >> Requirements Docs (PRDs), which focus on *what* a code change should
>>> >> do
>>> >> as
>>> >> opposed to how.
>>> >>
>>> >> In particular, here are some things that you may or may not consider
>>> >> in
>>> >> scope for SIPs:
>>> >>
>>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>>> >> focus on
>>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>>> >> "system continues working if one node fails"). BTW I wouldn't say
>>> >> "rejected
>>> >> goals" because some of them might become goals later, so we're not
>>> >> definitively rejecting them.
>>> >>
>>> >> - Public API: Probably should be included in most SIPs unless it's too
>>> >> large
>>> >> to fully specify then (e.g. "let's add an ML library").
>>> >>
>>> >> - Use cases: I usually find this very useful in PRDs to better
>>> >> communicate
>>> >> the goals.
>>> >>
>>> >> - Internal architecture: This is usually *not* a thing users can
>>> >> easily
>>> >> comment on and it sounds more like a design doc item. Of course it's
>>> >> important to show that the SIP is feasible to implement. One
>>> >> exception,
>>> >> however, is that I think we'll have some SIPs primarily on internals
>>> >> (e.g.
>>> >> if somebody wants to refactor Spark's query optimizer or something).
>>> >>
>>> >> - Rejected strategies: I personally wouldn't put this, because what's
>>> >> the
>>> >> point of voting to reject a strategy before you've really begun
>>> >> designing
>>> >> and implementing something? What if you discover that the strategy is
>>> >> actually better when you start doing stuff?
>>> >>
>>> >> At a super high level, it depends on whether you want the SIPs to be
>>> >> PRDs
>>> >> for getting some quick feedback on the goals of a feature before it is
>>> >> designed, or something more like full-fledged design docs (just a more
>>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and
>>> >> they
>>> >> actually seem to be more like design docs. This can work too but it
>>> >> does
>>> >> require more work from the proposer and it can lead to the same
>>> >> problems you
>>> >> mentioned with people already having a design and implementation in
>>> >> mind.
>>> >>
>>> >> Basically, the question is, are you trying to iterate faster on design
>>> >> by
>>> >> adding a step for user feedback earlier? Or are you just trying to
>>> >> make
>>> >> design docs for key features more visible (and their approval more
>>> >> formal)?
>>> >>
>>> >> BTW note that in either case, I'd like to have a template for design
>>> >> docs
>>> >> too, which should also include goals. I think that would've avoided
>>> >> some of
>>> >> the issues you brought up.
>>> >>
>>> >> Matei
>>> >>
>>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>> >>
>>> >> Here's my specific proposal (meta-proposal?)
>>> >>
>>> >> Spark Improvement Proposals (SIP)
>>> >>
>>> >>
>>> >> Background:
>>> >>
>>> >> The current problem is that design and implementation of large
>>> >> features
>>> >> are
>>> >> often done in private, before soliciting user feedback.
>>> >>
>>> >> When feedback is solicited, it is often as to detailed design
>>> >> specifics, not
>>> >> focused on goals.
>>> >>
>>> >> When implementation does take place after design, there is often
>>> >> disagreement as to what goals are or are not in scope.
>>> >>
>>> >> This results in commits that don't fully meet user needs.
>>> >>
>>> >>
>>> >> Goals:
>>> >>
>>> >> - Ensure user, contributor, and committer goals are clearly identified
>>> >> and
>>> >> agreed upon, before implementation takes place.
>>> >>
>>> >> - Ensure that a technically feasible strategy is chosen that is likely
>>> >> to
>>> >> meet the goals.
>>> >>
>>> >>
>>> >> Rejected Goals:
>>> >>
>>> >> - SIPs are not for detailed design.  Design by committee doesn't work.
>>> >>
>>> >> - SIPs are not for every change.  We don't need that much process.
>>> >>
>>> >>
>>> >> Strategy:
>>> >>
>>> >> My suggestion is outlined as a Spark Improvement Proposal process
>>> >> documented
>>> >> at
>>> >>
>>> >>
>>> >>
>>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>
>>> >> Specifics of Jira manipulation are an implementation detail we can
>>> >> figure
>>> >> out.
>>> >>
>>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>>> >>
>>> >>
>>> >> Rejected Strategies:
>>> >>
>>> >> Having someone who understands the problem implement it first works,
>>> >> but
>>> >> only if significant iteration after user feedback is allowed.
>>> >>
>>> >> Historically this has been problematic due to pressure to limit public
>>> >> api
>>> >> changes.
>>> >>
>>> >>
>>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>>> >> wrote:
>>> >>>
>>> >>> Alright, looks like there is quite a bit of support. We should wait
>>> >>> to hear from more people too.
>>> >>>
>>> >>> To push this forward, Cody and I will be working together in the next
>>> >>> couple of weeks to come up with a concrete, detailed proposal on what
>>> >>> this entails, and then we can discuss the specific proposal as well.
>>> >>>
>>> >>>
>>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>>> >>> wrote:
>>> >>>>
>>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>>> >>>>
>>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>> >>>> <[hidden email]> wrote:
>>> >>>>>
>>> >>>>> +1 to the SIP label as long as it does not slow things down and it
>>> >>>>> targets optimizing efforts, coordination etc. For example, really
>>> >>>>> small features (assuming they don't touch public interfaces) or
>>> >>>>> refactorings should not need to go through this process, and I hope
>>> >>>>> it will be kept this way. So a guideline doc should be provided,
>>> >>>>> like in the KIP case.
>>> >>>>>
>>> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply
>>> >>>>> having design docs and prototype implementations in PRs has not
>>> >>>>> been the thing that failed so far. What is really a pain in many
>>> >>>>> projects out there is discontinuity in the progress of PRs, missing
>>> >>>>> features, and slow reviews, which is understandable to some
>>> >>>>> extent... it is not only about Spark, but things can certainly be
>>> >>>>> improved for this project in particular, as already stated.
>>> >>>>>
>>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
>>> >>>>>> think
>>> >>>>>> it
>>> >>>>>> needs
>>> >>>>>>
>>> >>>>>> - template that focuses it towards soliciting user goals / non
>>> >>>>>> goals
>>> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>> >>>>>> recommend a vote.
>>> >>>>>>
>>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>>> >>>>>> think
>>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and
>>> >>>>>> split
>>> >>>>>> a thread for the other discussion per Nicholas' request.
>>> >>>>>>
>>> >>>>>> I meant changing public user interfaces.  I think the first design
>>> >>>>>> is
>>> >>>>>> unlikely to be right, because it's done at a time when you have
>>> >>>>>> the
>>> >>>>>> least information.  As a user, I find it considerably more
>>> >>>>>> frustrating
>>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>>> >>>>>> make minor changes to my code in order to take advantage of
>>> >>>>>> features.
>>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>>> >>>>>> @experimental code that are needed in order for it to really work
>>> >>>>>> right.  You need to be able to iterate, and if people on both
>>> >>>>>> sides
>>> >>>>>> of
>>> >>>>>> the fence aren't going to respect that some newer apis are subject
>>> >>>>>> to
>>> >>>>>> change, then why even mark them as such?
>>> >>>>>>
>>> >>>>>> Ideally a finished SIP should give me a checklist of things that
>>> >>>>>> an
>>> >>>>>> implementation must do, and things that it doesn't need to do.
>>> >>>>>> Contributors/committers should be seriously discouraged from
>>> >>>>>> putting
>>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>>> >>>>>> implementation of all those things, especially if they're then
>>> >>>>>> going
>>> >>>>>> to argue against interface changes necessary to get the rest
>>> >>>>>> of
>>> >>>>>> the things done in the 0.2 version.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>> >>>>>> wrote:
>>> >>>>>>> I like the lightweight proposal to add a SIP label.
>>> >>>>>>>
>>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using
>>> >>>>>>> wiki
>>> >>>>>>> to
>>> >>>>>>> track the list of major changes, but that never really
>>> >>>>>>> materialized
>>> >>>>>>> due to
>>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to
>>> >>>>>>> them
>>> >>>>>>> prominently on the Spark website makes a lot of sense.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>> >>>>>>> <[hidden email]>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> For the improvement proposals, I think one major point was to
>>> >>>>>>>> make
>>> >>>>>>>> them
>>> >>>>>>>> really visible to users who are not contributors, so we should
>>> >>>>>>>> do
>>> >>>>>>>> more than
>>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a
>>> >>>>>>>> new
>>> >>>>>>>> type of
>>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all
>>> >>>>>>>> such
>>> >>>>>>>> JIRAs from
>>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>>> >>>>>>>> doc
>>> >>>>>>>> templates (in fact many projects have them).
>>> >>>>>>>>
>>> >>>>>>>> Matei
>>> >>>>>>>>
>>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> I called Cody last night and talked about some of the topics in
>>> >>>>>>>> his
>>> >>>>>>>> email.
>>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>>> >>>>>>>>
>>> >>>>>>>> Some of the frustrations come from the success of the project
>>> >>>>>>>> itself
>>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>>> >>>>>>>> people
>>> >>>>>>>> who
>>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some
>>> >>>>>>>> ways
>>> >>>>>>>> similar
>>> >>>>>>>> to scaling an engineering team in a successful startup: old
>>> >>>>>>>> processes that
>>> >>>>>>>> worked well might not work so well when it gets to a certain
>>> >>>>>>>> size,
>>> >>>>>>>> cultures
>>> >>>>>>>> can get diluted, building culture vs building process, etc.
>>> >>>>>>>>
>>> >>>>>>>> I would also really like to have a more visible process for larger
>>> >>>>>>>> changes, especially major user-facing API changes. Historically we
>>> >>>>>>>> upload design docs for major changes, but this is not always done
>>> >>>>>>>> consistently, and it is difficult to ensure the quality of the docs
>>> >>>>>>>> due to the volunteer nature of the organization.
>>> >>>>>>>>
>>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>>> >>>>>>>> culture
>>> >>>>>>>> to improve clarity:
>>> >>>>>>>>
>>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>>> >>>>>>>> One
>>> >>>>>>>> thing
>>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>>> >>>>>>>> should
>>> >>>>>>>> create a design doc template for the project and ask everybody
>>> >>>>>>>> to
>>> >>>>>>>> follow.
>>> >>>>>>>> The design doc template should also explicitly list goals and
>>> >>>>>>>> non-goals, to
>>> >>>>>>>> make design doc more consistent.
>>> >>>>>>>>
>>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>>> >>>>>>>> some changes, but again very inconsistently. Just posting something
>>> >>>>>>>> on JIRA isn't sufficient, because there are simply too many JIRAs
>>> >>>>>>>> and the signal gets lost in the noise. While this is generally
>>> >>>>>>>> impossible to enforce because we can't force all volunteers to
>>> >>>>>>>> conform to a process (or they might not even be aware of this),
>>> >>>>>>>> those who are more familiar with the project can help by emailing
>>> >>>>>>>> dev@ when they see something that hasn't been.
>>> >>>>>>>>
>>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback.
>>> >>>>>>>> A
>>> >>>>>>>> design
>>> >>>>>>>> doc should serve as the base for discussion and is by no means
>>> >>>>>>>> the
>>> >>>>>>>> final
>>> >>>>>>>> design. Of course, this does not mean the author has to accept
>>> >>>>>>>> every piece of feedback. They should also be comfortable
>>> >>>>>>>> accepting / rejecting ideas on technical grounds.
>>> >>>>>>>>
>>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
>>> >>>>>>>> useful
>>> >>>>>>>> to
>>> >>>>>>>> have
>>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>>> >>>>>>>> actually not
>>> >>>>>>>> sure how well this will work, because of the volunteering nature
>>> >>>>>>>> and
>>> >>>>>>>> we need
>>> >>>>>>>> to adjust for timezones for people across the globe, but it
>>> >>>>>>>> seems
>>> >>>>>>>> worth
>>> >>>>>>>> trying.
>>> >>>>>>>>
>>> >>>>>>>> - Culture: Contributors (including committers) should be more
>>> >>>>>>>> direct
>>> >>>>>>>> in
>>> >>>>>>>> setting expectations, including whether they are working on a
>>> >>>>>>>> specific
>>> >>>>>>>> issue, whether they will be working on a specific issue, and
>>> >>>>>>>> whether
>>> >>>>>>>> an
>>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>>> >>>>>>>> this
>>> >>>>>>>> community
>>> >>>>>>>> are nice and don't enjoy telling other people no, but it is
>>> >>>>>>>> often
>>> >>>>>>>> more
>>> >>>>>>>> annoying to a contributor to not know anything than getting a
>>> >>>>>>>> no.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>> >>>>>>>> <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>>> >>>>>>>>> process that
>>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't
>>> >>>>>>>>> think
>>> >>>>>>>>> committers are trying to minimize their own work -- every
>>> >>>>>>>>> committer
>>> >>>>>>>>> cares
>>> >>>>>>>>> about making the software useful for users. However, it is
>>> >>>>>>>>> always
>>> >>>>>>>>> hard to
>>> >>>>>>>>> get user input and so it helps to have this kind of process.
>>> >>>>>>>>> I've
>>> >>>>>>>>> certainly
>>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see
>>> >>>>>>>>> the
>>> >>>>>>>>> biggest
>>> >>>>>>>>> things on the roadmap.
>>> >>>>>>>>>
>>> >>>>>>>>> When you're talking about "changing interfaces", are you
>>> >>>>>>>>> talking
>>> >>>>>>>>> about
>>> >>>>>>>>> public or internal APIs? I do think many people hate changing
>>> >>>>>>>>> public APIs
>>> >>>>>>>>> and I actually think that's for the best of the project. That's
>>> >>>>>>>>> a
>>> >>>>>>>>> technical
>>> >>>>>>>>> debate, but basically, the worst thing when you're using a
>>> >>>>>>>>> piece
>>> >>>>>>>>> of
>>> >>>>>>>>> software
>>> >>>>>>>>> is that the developers constantly ask you to rewrite your app
>>> >>>>>>>>> to
>>> >>>>>>>>> update to a
>>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>> >>>>>>>>> who's used
>>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their
>>> >>>>>>>>> code
>>> >>>>>>>>> this
>>> >>>>>>>>> release" model works well within a single large company, but
>>> >>>>>>>>> doesn't work
>>> >>>>>>>>> well for a community, which is why nearly all *very* widely
>>> >>>>>>>>> used
>>> >>>>>>>>> programming
>>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>>> >>>>>>>>> Windows
>>> >>>>>>>>> API, etc)
>>> >>>>>>>>> almost *never* break backwards compatibility. All this is done
>>> >>>>>>>>> within reason
>>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
>>> >>>>>>>>> etc).
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> ---------------------------------------------------------------------
>>> >>>>>> To unsubscribe e-mail: [hidden email]
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Stavros Kontopoulos
>>> >>>>> Senior Software Engineer
>>> >>>>> Lightbend, Inc.
>>> >>>>> p: +30 6977967274
>>> >>>>> e: [hidden email]
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [hidden email]
>
>
> ________________________________
>
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Improvement-Proposals-tp19268p19359.html
>
>
>
> ________________________________
> View this message in context: RE: Spark Improvement Proposals
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]




--
Ryan Blue
Software Engineer
Netflix

Re: Spark Improvement Proposals

Cody Koeninger-2
I think the main value is in being honest about what's going on.  No
one other than committers can cast a meaningful vote, that's the
reality.  Beyond that, if people think it's more open to allow formal
proposals from anyone, I'm not necessarily against it, but my main
question would be this:

If anyone can submit a proposal, are committers actually going to
clearly reject and close proposals that don't meet the requirements?

Right now we have a serious problem with lack of clarity regarding
contributions, and that cannot spill over into goal-setting.

On Mon, Oct 10, 2016 at 1:54 PM, Ryan Blue <[hidden email]> wrote:

> +1 to votes to approve proposals. I agree that proposals should have an
> official mechanism to be accepted, and a vote is an established means of
> doing that well. I like that it includes a period to review the proposal and
> I think proposals should have been discussed enough ahead of a vote to
> survive the possibility of a veto.
>
> I also like the names that are short and (mostly) unique, like SEP.
>
> Where I disagree is with the requirement that a committer must formally
> propose an enhancement. I don't see the value of restricting this: if
> someone has the will to write up a proposal then they should be encouraged
> to do so and start a discussion about it. Even if there is a political
> reality as Cody says, what is the value of codifying that in our process? I
> think restricting who can submit proposals would only undermine them by
> pushing contributors out. Maybe I'm missing something here?
>
> rb
>
>
>
> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <[hidden email]> wrote:
>>
>> Yes, users suggesting SIPs is a good thing and is explicitly called
>> out in the linked document under the Who? section.  Formally proposing
>> them, not so much, because of the political realities.
>>
>> Yes, implementation strategy definitely affects goals.  There are all
>> kinds of examples of this, I'll pick one that's my fault so as to
>> avoid sounding like I'm blaming:
>>
>> When I implemented the Kafka DStream, one of my (not explicitly agreed
>> upon by the community) goals was to make sure people could use the
>> DStream with however they were already using Kafka at work.  The lack
>> of explicit agreement on that goal led to all kinds of fighting with
>> committers, that could have been avoided.  The lack of explicit
>> up-front strategy discussion led to the DStream not really working
>> with compacted topics.  I knew about compacted topics, but don't have
>> a use for them, so had a blind spot there.  If there was explicit
>> up-front discussion that my strategy was "assume that batches can be
>> defined on the driver solely by beginning and ending offsets", there's
>> a greater chance that a user would have seen that and said, "hey, what
>> about non-contiguous offsets in a compacted topic".
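The blind spot described above can be illustrated with a small sketch in plain Python. This is not the Kafka DStream implementation or any Kafka client API: OffsetRange, read_batch and has_gaps are hypothetical names, and the log is modeled as a simple dict from offset to value. It assumes a batch is described only by a beginning (inclusive) and ending (exclusive) offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    from_offset: int   # inclusive
    until_offset: int  # exclusive

    def expected_count(self):
        # Only valid if every offset in the range still exists in the log.
        return self.until_offset - self.from_offset


def read_batch(log, rng):
    """Return the (offset, value) pairs of `log` that fall inside `rng`, in offset order."""
    return [(o, log[o]) for o in sorted(log)
            if rng.from_offset <= o < rng.until_offset]


def has_gaps(records, rng):
    """True when the range promised more records than the log actually holds."""
    return len(records) != rng.expected_count()


plain_log = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}
compacted_log = {0: "a", 3: "d", 4: "e"}   # offsets 1 and 2 were compacted away
batch = OffsetRange(0, 5)
```

With the plain log the range and the record count agree. With the compacted log the same range yields fewer records than until_offset - from_offset, so any code that counts or iterates by assuming consecutive offsets would be wrong.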
>>
>> This kind of thing is only going to happen smoothly if we have a
>> lightweight user-visible process with clear outcomes.
>>
>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
>> <[hidden email]> wrote:
>> > I agree with most of what Cody said.
>> >
>> > Two things:
>> >
>> > First we can always have other people suggest SIPs but mark them as
>> > “unreviewed” and have committers basically move them forward. The
>> > problem is
>> > that writing a good document takes time. This way we can leverage non
>> > committers to do some of this work (it is just another way to
>> > contribute).
>> >
>> >
>> >
>> > As for strategy, in many cases implementation strategy can affect the
>> > goals.
>> > I will give  a small example: In the current structured streaming
>> > strategy,
>> > we group by the time to achieve a sliding window. This is definitely an
>> > implementation decision and not a goal. However, I can think of several
>> > aggregation functions which have the time inside their calculation
>> > buffer.
>> > For example, let’s say we want to return a set of all distinct values.
>> > One
>> > way to implement this would be to make the set into a map and have the
>> > value
>> > contain the last time seen. Multiplying it across the groupby would cost
>> > a
>> > lot in performance. So adding such a strategy would have a great effect
>> > on
>> > the type of aggregations and their performance which does affect the
>> > goal.
>> > Without adding the strategy, it is easy for whoever goes to the design
>> > document to not think about these cases. Furthermore, it might be
>> > decided
>> > that these cases are rare enough so that the strategy is still good
>> > enough
>> > but how would we know it without user feedback?
>> >
>> > I believe this example is exactly what Cody was talking about. Since
>> > many
>> > times implementation strategies have a large effect on the goal, we
>> > should
>> > have it discussed when discussing the goals. In addition, while it is
>> > often
>> > easy to throw out completely infeasible goals, it is often much harder
>> > to
>> > figure out that the goals are unfeasible without fine tuning.
>> >
>> >
>> >
>> >
>> >
>> > Assaf.
>> >
>> >
>> >
>> > From: Cody Koeninger-2 [via Apache Spark Developers List]
>> > [mailto:ml-node+[hidden email]]
>> > Sent: Monday, October 10, 2016 2:25 AM
>> > To: Mendelson, Assaf
>> > Subject: Re: Spark Improvement Proposals
>> >
>> >
>> >
>> > Only committers should formally submit SIPs because in an Apache
>> > project only committers have explicit political power.  If a user can't
>> > find a committer willing to sponsor an SIP idea, they have no way to
>> > get the idea passed in any case.  If I can't find a committer to
>> > sponsor this meta-SIP idea, I'm out of luck.
>> >
>> > I do not believe unrealistic goals can be found solely by inspection.
>> > We've managed to ignore unrealistic goals even after implementation!
>> > Focusing on APIs can allow people to think they've solved something,
>> > when there's really no way of implementing that API while meeting the
>> > goals.  Rapid iteration is clearly the best way to address this, but
>> > we've already talked about why that hasn't really worked.  If adding a
>> > non-binding API section to the template is important to you, I'm not
>> > against it, but I don't think it's sufficient.
>> >
>> > On your PRD vs design doc spectrum, I'm saying this is closer to a
>> > PRD.  Clear agreement on goals is the most important thing and that's
>> > why it's the thing I want binding agreement on.  But I cannot agree to
>> > goals unless I have enough minimal technical info to judge whether the
>> > goals are likely to actually be accomplished.
>> >
>> >
>> >
>> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>> >
>> >
>> >> Well, I think there are a few things here that don't make sense. First,
>> >> why should only committers submit SIPs? Development in the project
>> >> should be open to all contributors, whether they're committers or not.
>> >> Second, I think unrealistic goals can be found just by inspecting the
>> >> goals, and I'm not super worried that we'll accept a lot of SIPs that
>> >> are then infeasible -- we can then submit new ones. But this depends on
>> >> whether you want this process to be a "design doc lite", where people
>> >> also agree on implementation strategy, or just a way to agree on goals.
>> >> This is what I asked earlier about PRDs vs design docs (and I'm open to
>> >> either one but I'd just like clarity). Finally, both as a user and
>> >> designer of software, I always want to give feedback on APIs, so I'd
>> >> really like a culture of having those early. People don't argue about
>> >> prettiness when they discuss APIs, they argue about the core concepts
>> >> to expose in order to meet various goals, and then they're stuck
>> >> maintaining those for a long time.
>> >>
>> >> Matei
>> >>
>> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>> >>
>> >> Users instead of people, sure.  Committers and contributors are (or at
>> >> least should be) a subset of users.
>> >>
>> >> Non goals, sure. I don't care what the name is, but we need to clearly
>> >> say e.g. 'no we are not maintaining compatibility with XYZ right now'.
>> >>
>> >> API, what I care most about is whether it allows me to accomplish the
>> >> goals. Arguing about how ugly or pretty it is can be saved for design/
>> >> implementation imho.
>> >>
>> >> Strategy, this is necessary because otherwise goals can be out of line
>> >> with reality.  Don't propose goals you don't have at least some idea of
>> >> how to implement.
>> >>
>> >> Rejected strategies, given that committers are the only ones I'm saying
>> >> should formally submit SPARKLIs or SIPs, if they put junk in a required
>> >> section then slap them down for it and tell them to fix it.
>> >>
>> >>
>> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>> >>>
>> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying
>> >>> here, but we should also clarify it in the writeup. In particular:
>> >>>
>> >>> - Goals needs to be about user-facing behavior ("people" is broad)
>> >>>
>> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig
>> >>> up one of these and say "Spark's developers have officially rejected
>> >>> X, which our awesome system has".
>> >>>
>> >>> - For user-facing stuff, I think you need a section on API. Virtually
>> >>> all other *IPs I've seen have that.
>> >>>
>> >>> - I'm still not sure why the strategy section is needed if the purpose
>> >>> is to define user-facing behavior -- unless this is the strategy for
>> >>> setting the goals or for defining the API. That sounds squarely like a
>> >>> design doc issue. In some sense, who cares whether the proposal is
>> >>> technically feasible right now? If it's infeasible, that will be
>> >>> discovered later during design and implementation. Same thing with
>> >>> rejected strategies -- listing some of those is definitely useful
>> >>> sometimes, but if you make this a *required* section, people are just
>> >>> going to fill it in with bogus stuff (I've seen this happen before).
>> >>>
>> >>> Matei
>> >>>
>> >
>> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >>> >
>> >>> > So to focus the discussion on the specific strategy I'm suggesting,
>> >>> > documented at
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>> >
>> >>> > "Goals: What must this allow people to do, that they can't
>> >>> > currently?"
>> >>> >
>> >>> > Is it unclear that this is focusing specifically on people-visible
>> >>> > behavior?
>> >>> >
>> >>> > Rejected goals -  are important because otherwise people keep trying
>> >>> > to argue about scope.  Of course you can change things later with a
>> >>> > different SIP and different vote, the point is to focus.
>> >>> >
>> >>> > Use cases - are something that people are going to bring up in
>> >>> > discussion.  If they aren't clearly documented as a goal ("This must
>> >>> > allow me to connect using SSL"), they should be added.
>> >>> >
>> >>> > Internal architecture - if the people who need specific behavior are
>> >>> > implementers of other parts of the system, that's fine.
>> >>> >
>> >>> > Rejected strategies - If you have none of these, you have no
>> >>> > evidence
>> >>> > that the proponent didn't just go with the first thing they had in
>> >>> > mind (or have already implemented), which is a big problem
>> >>> > currently.
>> >>> > Approval isn't binding as to specifics of implementation, so these
>> >>> > aren't handcuffs.  The goals are the contract, the strategy is
>> >>> > evidence that contract can actually be met.
>> >>> >
>> >>> > Design docs - I'm not touching design docs.  The markdown file I
>> >>> > linked specifically says of the strategy section "This is not a full
>> >>> > design document."  Is this unclear?  Design docs can be worked on
>> >>> > obviously, but that's not what I'm concerned with here.
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>> >>> > wrote:
>> >>> >> Hi Cody,
>> >>> >>
>> >>> >> I think this would be a lot more concrete if we had a more detailed
>> >>> >> template for SIPs. Right now, it's not super clear what's in scope
>> >>> >> -- e.g. are they a way to solicit feedback on the user-facing
>> >>> >> behavior or on the internals? "Goals" can cover both things. I've
>> >>> >> been thinking of SIPs more as Product Requirements Docs (PRDs),
>> >>> >> which focus on *what* a code change should do as opposed to how.
>> >>> >>
>> >>> >> In particular, here are some things that you may or may not
>> >>> >> consider in scope for SIPs:
>> >>> >>
>> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>> >>> >> focus on user-visible behavior (e.g. "system supports SQL window
>> >>> >> functions" or "system continues working if one node fails"). BTW I
>> >>> >> wouldn't say "rejected goals" because some of them might become
>> >>> >> goals later, so we're not definitively rejecting them.
>> >>> >>
>> >>> >> - Public API: Probably should be included in most SIPs unless it's
>> >>> >> too large to fully specify then (e.g. "let's add an ML library").
>> >>> >>
>> >>> >> - Use cases: I usually find this very useful in PRDs to better
>> >>> >> communicate the goals.
>> >>> >>
>> >>> >> - Internal architecture: This is usually *not* a thing users can
>> >>> >> easily comment on and it sounds more like a design doc item. Of
>> >>> >> course it's important to show that the SIP is feasible to
>> >>> >> implement. One exception, however, is that I think we'll have some
>> >>> >> SIPs primarily on internals (e.g. if somebody wants to refactor
>> >>> >> Spark's query optimizer or something).
>> >>> >>
>> >>> >> - Rejected strategies: I personally wouldn't put this, because
>> >>> >> what's the point of voting to reject a strategy before you've
>> >>> >> really begun designing and implementing something? What if you
>> >>> >> discover that the strategy is actually better when you start doing
>> >>> >> stuff?
>> >>> >>
>> >>> >> At a super high level, it depends on whether you want the SIPs to
>> >>> >> be PRDs for getting some quick feedback on the goals of a feature
>> >>> >> before it is designed, or something more like full-fledged design
>> >>> >> docs (just a more visible design doc for bigger changes). I looked
>> >>> >> at Kafka's KIPs, and they actually seem to be more like design
>> >>> >> docs. This can work too but it does require more work from the
>> >>> >> proposer and it can lead to the same problems you mentioned with
>> >>> >> people already having a design and implementation in mind.
>> >>> >>
>> >>> >> Basically, the question is, are you trying to iterate faster on
>> >>> >> design by adding a step for user feedback earlier? Or are you just
>> >>> >> trying to make design docs for key features more visible (and their
>> >>> >> approval more formal)?
>> >>> >>
>> >>> >> BTW note that in either case, I'd like to have a template for
>> >>> >> design docs too, which should also include goals. I think that
>> >>> >> would've avoided some of the issues you brought up.
>> >>> >>
>> >>> >> Matei
>> >>> >>
>> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>
>> >>> >> Here's my specific proposal (meta-proposal?)
>> >>> >>
>> >>> >> Spark Improvement Proposals (SIP)
>> >>> >>
>> >>> >>
>> >>> >> Background:
>> >>> >>
>> >>> >> The current problem is that design and implementation of large
>> >>> >> features are often done in private, before soliciting user
>> >>> >> feedback.
>> >>> >>
>> >>> >> When feedback is solicited, it is often as to detailed design
>> >>> >> specifics, not focused on goals.
>> >>> >>
>> >>> >> When implementation does take place after design, there is often
>> >>> >> disagreement as to what goals are or are not in scope.
>> >>> >>
>> >>> >> This results in commits that don't fully meet user needs.
>> >>> >>
>> >>> >>
>> >>> >> Goals:
>> >>> >>
>> >>> >> - Ensure user, contributor, and committer goals are clearly
>> >>> >> identified and agreed upon, before implementation takes place.
>> >>> >>
>> >>> >> - Ensure that a technically feasible strategy is chosen that is
>> >>> >> likely to meet the goals.
>> >>> >>
>> >>> >>
>> >>> >> Rejected Goals:
>> >>> >>
>> >>> >> - SIPs are not for detailed design.  Design by committee doesn't
>> >>> >> work.
>> >>> >>
>> >>> >> - SIPs are not for every change.  We don't need that much process.
>> >>> >>
>> >>> >>
>> >>> >> Strategy:
>> >>> >>
>> >>> >> My suggestion is outlined as a Spark Improvement Proposal process
>> >>> >> documented at
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>> >>
>> >>> >> Specifics of Jira manipulation are an implementation detail we can
>> >>> >> figure out.
>> >>> >>
>> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>> >>
>> >>> >>
>> >>> >> Rejected Strategies:
>> >>> >>
>> >>> >> Having someone who understands the problem implement it first
>> >>> >> works, but only if significant iteration after user feedback is
>> >>> >> allowed.
>> >>> >>
>> >>> >> Historically this has been problematic due to pressure to limit
>> >>> >> public api changes.
>> >>> >>
>> >>> >>
>> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Alright, looks like there is quite a bit of support. We should
>> >>> >>> wait to hear from more people too.
>> >>> >>>
>> >>> >>> To push this forward, Cody and I will be working together in the
>> >>> >>> next couple of weeks to come up with a concrete, detailed proposal
>> >>> >>> on what this entails, and then we can discuss the specific
>> >>> >>> proposal as well.
>> >>> >>>
>> >>> >>>
>> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> >>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>> >>> >>>>
>> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>> >>>> <[hidden email]> wrote:
>> >>> >>>>>
>> >>> >>>>> +1 to the SIP label as long as it does not slow things down and
>> >>> >>>>> it targets optimizing efforts, coordination etc. For example,
>> >>> >>>>> really small features (assuming they don't touch public
>> >>> >>>>> interfaces) or refactorings should not need to go through this
>> >>> >>>>> process, and I hope it will be kept this way. So a guideline
>> >>> >>>>> doc should be provided, like in the KIP case.
>> >>> >>>>>
>> >>> >>>>> IMHO, aside from tagging things and linking them elsewhere,
>> >>> >>>>> simply having design docs and prototype implementations in PRs
>> >>> >>>>> is not something that has failed to work so far. What is really
>> >>> >>>>> a pain in many projects out there is discontinuity in progress
>> >>> >>>>> of PRs, missing features, and slow reviews, which is
>> >>> >>>>> understandable to some extent... it is not only about Spark, but
>> >>> >>>>> things can be improved for sure for this project in particular,
>> >>> >>>>> as already stated.
>> >>> >>>>>
>> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>> >>> >>>>> wrote:
>> >>> >>>>>>
>> >>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
>> >>> >>>>>> think it needs
>> >>> >>>>>>
>> >>> >>>>>> - template that focuses it towards soliciting user goals / non
>> >>> >>>>>> goals
>> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue.
>> >>> >>>>>> I'd recommend a vote.
>> >>> >>>>>>
>> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces.
>> >>> >>>>>> I think it's directly relevant to the SIP idea so I'll clarify
>> >>> >>>>>> here, and split a thread for the other discussion per
>> >>> >>>>>> Nicholas' request.
>> >>> >>>>>>
>> >>> >>>>>> I meant changing public user interfaces.  I think the first
>> >>> >>>>>> design is unlikely to be right, because it's done at a time
>> >>> >>>>>> when you have the least information.  As a user, I find it
>> >>> >>>>>> considerably more frustrating to be unable to use a tool to get
>> >>> >>>>>> my job done, than I do having to make minor changes to my code
>> >>> >>>>>> in order to take advantage of features.  I've seen committers
>> >>> >>>>>> be seriously reluctant to allow changes to @experimental code
>> >>> >>>>>> that are needed in order for it to really work right.  You need
>> >>> >>>>>> to be able to iterate, and if people on both sides of the fence
>> >>> >>>>>> aren't going to respect that some newer apis are subject to
>> >>> >>>>>> change, then why even mark them as such?
>> >>> >>>>>>
>> >>> >>>>>> Ideally a finished SIP should give me a checklist of things
>> >>> >>>>>> that an implementation must do, and things that it doesn't need
>> >>> >>>>>> to do.  Contributors/committers should be seriously discouraged
>> >>> >>>>>> from putting out a version 0.1 that doesn't have at least a
>> >>> >>>>>> prototype implementation of all those things, especially if
>> >>> >>>>>> they're then going to argue against interface changes necessary
>> >>> >>>>>> to get the rest of the things done in the 0.2 version.
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>> >>> >>>>>> wrote:
>> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>> >>>>>>>
>> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
>> >>> >>>>>>> using a wiki to track the list of major changes, but that
>> >>> >>>>>>> never really materialized due to the overhead. Adding a SIP
>> >>> >>>>>>> label on major JIRAs and then linking to them prominently on
>> >>> >>>>>>> the Spark website makes a lot of sense.
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>> >>> >>>>>>> <[hidden email]>
>> >>> >>>>>>> wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>> For the improvement proposals, I think one major point was to
>> >>> >>>>>>>> make them really visible to users who are not contributors,
>> >>> >>>>>>>> so we should do more than sending stuff to dev@. One very
>> >>> >>>>>>>> lightweight idea is to have a new type of JIRA called a SIP
>> >>> >>>>>>>> and have a link to a filter that shows all such JIRAs from
>> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and
>> >>> >>>>>>>> design doc templates (in fact many projects have them).
>> >>> >>>>>>>>
>> >>> >>>>>>>> Matei
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>> >>> >>>>>>>> wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>> I called Cody last night and talked about some of the topics
>> >>> >>>>>>>> in his email. It became clear to me Cody genuinely cares
>> >>> >>>>>>>> about the project.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Some of the frustrations come from the success of the project
>> >>> >>>>>>>> itself becoming very "hot", and it is difficult to get
>> >>> >>>>>>>> clarity from people who don't dedicate all their time to
>> >>> >>>>>>>> Spark. In fact, it is in some ways similar to scaling an
>> >>> >>>>>>>> engineering team in a successful startup: old processes that
>> >>> >>>>>>>> worked well might not work so well when it gets to a certain
>> >>> >>>>>>>> size, cultures can get diluted, building culture vs building
>> >>> >>>>>>>> process, etc.
>> >>> >>>>>>>>
>> >>> >>>>>>>> I also really like to have a more visible process for larger
>> >>> >>>>>>>> changes, especially major user-facing API changes.
>> >>> >>>>>>>> Historically we upload design docs for major changes, but it
>> >>> >>>>>>>> is not always consistent, and it is difficult to control the
>> >>> >>>>>>>> quality of the docs, due to the volunteer nature of the
>> >>> >>>>>>>> organization.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on
>> >>> >>>>>>>> building a culture to improve clarity:
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Process: Large changes should have design docs posted on
>> >>> >>>>>>>> JIRA. One thing Cody and I didn't discuss, but an idea that
>> >>> >>>>>>>> just came to me, is that we should create a design doc
>> >>> >>>>>>>> template for the project and ask everybody to follow it. The
>> >>> >>>>>>>> design doc template should also explicitly list goals and
>> >>> >>>>>>>> non-goals, to make design docs more consistent.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this
>> >>> >>>>>>>> with some changes, but again very inconsistently. Just
>> >>> >>>>>>>> posting something on JIRA isn't sufficient, because there are
>> >>> >>>>>>>> simply too many JIRAs and the signal gets lost in the noise.
>> >>> >>>>>>>> While this is generally impossible to enforce because we
>> >>> >>>>>>>> can't force all volunteers to conform to a process (or they
>> >>> >>>>>>>> might not even be aware of this), those who are more familiar
>> >>> >>>>>>>> with the project can help by emailing dev@ when they see
>> >>> >>>>>>>> something that hasn't been sent there.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Culture: The design doc author(s) should be open to
>> >>> >>>>>>>> feedback. A design doc should serve as the base for
>> >>> >>>>>>>> discussion and is by no means the final design. Of course,
>> >>> >>>>>>>> this does not mean the author has to accept every piece of
>> >>> >>>>>>>> feedback. They should also be comfortable accepting /
>> >>> >>>>>>>> rejecting ideas on technical grounds.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
>> >>> >>>>>>>> useful to have some monthly Google hangouts that are open to
>> >>> >>>>>>>> the world. I am actually not sure how well this will work,
>> >>> >>>>>>>> because of the volunteer nature of the project and because we
>> >>> >>>>>>>> need to adjust for timezones for people across the globe, but
>> >>> >>>>>>>> it seems worth trying.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Culture: Contributors (including committers) should be more
>> >>> >>>>>>>> direct in setting expectations, including whether they are
>> >>> >>>>>>>> working on a specific issue, whether they will be working on
>> >>> >>>>>>>> a specific issue, and whether an issue or pr or jira should
>> >>> >>>>>>>> be rejected. Most people I know in this community are nice
>> >>> >>>>>>>> and don't enjoy telling other people no, but it is often more
>> >>> >>>>>>>> annoying to a contributor to not know anything than to get a
>> >>> >>>>>>>> no.
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>> >>> >>>>>>>> <[hidden email]>
>> >>> >>>>>>>> wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>> >>> >>>>>>>>> process that solicits user input on new APIs. For what it's
>> >>> >>>>>>>>> worth, I don't think committers are trying to minimize their
>> >>> >>>>>>>>> own work -- every committer cares about making the software
>> >>> >>>>>>>>> useful for users. However, it is always hard to get user
>> >>> >>>>>>>>> input and so it helps to have this kind of process. I've
>> >>> >>>>>>>>> certainly looked at the *IPs a lot in other software I use
>> >>> >>>>>>>>> just to see the biggest things on the roadmap.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> When you're talking about "changing interfaces", are you
>> >>> >>>>>>>>> talking about public or internal APIs? I do think many
>> >>> >>>>>>>>> people hate changing public APIs and I actually think that's
>> >>> >>>>>>>>> for the best of the project. That's a technical debate, but
>> >>> >>>>>>>>> basically, the worst thing when you're using a piece of
>> >>> >>>>>>>>> software is that the developers constantly ask you to
>> >>> >>>>>>>>> rewrite your app to update to a new version (and thus
>> >>> >>>>>>>>> benefit from bug fixes, etc). Cue anyone who's used
>> >>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their
>> >>> >>>>>>>>> code this release" model works well within a single large
>> >>> >>>>>>>>> company, but doesn't work well for a community, which is why
>> >>> >>>>>>>>> nearly all *very* widely used programming interfaces (I'm
>> >>> >>>>>>>>> talking things like Java standard library, Windows API, etc)
>> >>> >>>>>>>>> almost *never* break backwards compatibility. All this is
>> >>> >>>>>>>>> done within reason though, e.g. we do change things in major
>> >>> >>>>>>>>> releases (2.x, 3.x, etc).
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> --
>> >>> >>>>> Stavros Kontopoulos
>> >>> >>>>> Senior Software Engineer
>> >>> >>>>> Lightbend, Inc.
>> >>> >>>>> p:  +30 6977967274
>> >>> >>>>> e: [hidden email]
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>>
>> >>
>> >
>> >
>>
>>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix



Re: Spark Improvement Proposals

Ryan Blue
In reply to this post by Ryan Blue
Sorry, I missed that the proposal includes majority approval. Why majority instead of consensus? I think we want to build consensus around these proposals and it makes sense to discuss until no one would veto.

rb

On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <[hidden email]> wrote:
+1 to votes to approve proposals. I agree that proposals should have an official mechanism to be accepted, and a vote is an established means of doing that well. I like that it includes a period to review the proposal and I think proposals should have been discussed enough ahead of a vote to survive the possibility of a veto.

I also like the names that are short and (mostly) unique, like SEP.

Where I disagree is with the requirement that a committer must formally propose an enhancement. I don't see the value of restricting this: if someone has the will to write up a proposal then they should be encouraged to do so and start a discussion about it. Even if there is a political reality as Cody says, what is the value of codifying that in our process? I think restricting who can submit proposals would only undermine them by pushing contributors out. Maybe I'm missing something here?

rb



On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <[hidden email]> wrote:
Yes, users suggesting SIPs is a good thing and is explicitly called
out in the linked document under the Who? section.  Formally proposing
them, not so much, because of the political realities.

Yes, implementation strategy definitely affects goals.  There are all
kinds of examples of this, I'll pick one that's my fault so as to
avoid sounding like I'm blaming:

When I implemented the Kafka DStream, one of my (not explicitly agreed
upon by the community) goals was to make sure people could use the
DStream however they were already using Kafka at work.  The lack
of explicit agreement on that goal led to all kinds of fighting with
committers that could have been avoided.  The lack of explicit
up-front strategy discussion led to the DStream not really working
with compacted topics.  I knew about compacted topics, but don't have
a use for them, so had a blind spot there.  If there had been explicit
up-front discussion that my strategy was "assume that batches can be
defined on the driver solely by beginning and ending offsets", there's
a greater chance that a user would have seen that and said, "hey, what
about non-contiguous offsets in a compacted topic".
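
To make that blind spot concrete, here is a minimal Python sketch of the
assumption and how a compacted topic breaks it. It is only an illustration:
OffsetRange, assumed_count and records_in_range are hypothetical names, not
the Kafka or Spark API, and each "log" is just a plain list of the offsets
that still exist.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """A batch described only by its boundaries: [from_offset, until_offset)."""
    from_offset: int
    until_offset: int

    def assumed_count(self) -> int:
        # Only correct if every offset in the range still exists in the log.
        return self.until_offset - self.from_offset


def records_in_range(log_offsets: List[int], r: OffsetRange) -> List[int]:
    """Offsets that are actually present in the log inside the range."""
    return [o for o in log_offsets if r.from_offset <= o < r.until_offset]


batch = OffsetRange(from_offset=0, until_offset=10)

contiguous_log = list(range(10))  # ordinary topic: offsets 0..9 all present
compacted_log = [0, 1, 4, 7, 9]   # compacted topic: some offsets were removed

# Difference between what the driver assumes and what a consumer would read.
contiguous_gap = batch.assumed_count() - len(records_in_range(contiguous_log, batch))
compacted_gap = batch.assumed_count() - len(records_in_range(compacted_log, batch))
```

With the contiguous log the assumed and actual record counts agree; with the
compacted log the driver expects more records than exist in the range, which
is the kind of mismatch a one-line strategy statement would have let a user
spot early.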

This kind of thing is only going to happen smoothly if we have a
lightweight user-visible process with clear outcomes.

On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
<[hidden email]> wrote:
> I agree with most of what Cody said.
>
> Two things:
>
> First we can always have other people suggest SIPs but mark them as
> “unreviewed” and have committers basically move them forward. The problem is
> that writing a good document takes time. This way we can leverage non
> committers to do some of this work (it is just another way to contribute).
>
>
>
> As for strategy, in many cases implementation strategy can affect the goals.
> I will give a small example: In the current structured streaming strategy,
> we group by the time to achieve a sliding window. This is definitely an
> implementation decision and not a goal. However, I can think of several
> aggregation functions which have the time inside their calculation buffer.
> For example, let’s say we want to return a set of all distinct values. One
> way to implement this would be to make the set into a map and have the value
> contain the last time seen. Multiplying it across the groupby would cost a
> lot in performance. So adding such a strategy would have a great effect on
> the type of aggregations and their performance, which does affect the goal.
> Without adding the strategy, it is easy for whoever goes to the design
> document to not think about these cases. Furthermore, it might be decided
> that these cases are rare enough so that the strategy is still good enough,
> but how would we know it without user feedback?
>
> I believe this example is exactly what Cody was talking about. Since many
> times implementation strategies have a large effect on the goal, we should
> have it discussed when discussing the goals. In addition, while it is often
> easy to throw out completely infeasible goals, it is often much harder to
> figure out that the goals are infeasible without fine tuning.
>
>
>
>
>
> Assaf.
>
>
>
> From: Cody Koeninger-2 [via Apache Spark Developers List]
> [mailto:[hidden email]]
> Sent: Monday, October 10, 2016 2:25 AM
> To: Mendelson, Assaf
> Subject: Re: Spark Improvement Proposals
>
>
>
> Only committers should formally submit SIPs because in an Apache
> project only committers have explicit political power.  If a user can't
> find a committer willing to sponsor an SIP idea, they have no way to
> get the idea passed in any case.  If I can't find a committer to
> sponsor this meta-SIP idea, I'm out of luck.
>
> I do not believe unrealistic goals can be found solely by inspection.
> We've managed to ignore unrealistic goals even after implementation!
> Focusing on APIs can allow people to think they've solved something,
> when there's really no way of implementing that API while meeting the
> goals.  Rapid iteration is clearly the best way to address this, but
> we've already talked about why that hasn't really worked.  If adding a
> non-binding API section to the template is important to you, I'm not
> against it, but I don't think it's sufficient.
>
> On your PRD vs design doc spectrum, I'm saying this is closer to a
> PRD.  Clear agreement on goals is the most important thing and that's
> why it's the thing I want binding agreement on.  But I cannot agree to
> goals unless I have enough minimal technical info to judge whether the
> goals are likely to actually be accomplished.
>
>
>
> On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>
>
>> Well, I think there are a few things here that don't make sense. First,
>> why should only committers submit SIPs? Development in the project should
>> be open to all contributors, whether they're committers or not. Second, I
>> think unrealistic goals can be found just by inspecting the goals, and I'm
>> not super worried that we'll accept a lot of SIPs that are then infeasible
>> -- we can then submit new ones. But this depends on whether you want this
>> process to be a "design doc lite", where people also agree on
>> implementation strategy, or just a way to agree on goals. This is what I
>> asked earlier about PRDs vs design docs (and I'm open to either one but I'd
>> just like clarity). Finally, both as a user and designer of software, I
>> always want to give feedback on APIs, so I'd really like a culture of
>> having those early. People don't argue about prettiness when they discuss
>> APIs, they argue about the core concepts to expose in order to meet various
>> goals, and then they're stuck maintaining those for a long time.
>>
>> Matei
>>
>> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>>
>> Users instead of people, sure.  Committers and contributors are (or at
>> least
>> should be) a subset of users.
>>
>> Non goals, sure. I don't care what the name is, but we need to clearly say
>> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>>
>> API, what I care most about is whether it allows me to accomplish the
>> goals.
>> Arguing about how ugly or pretty it is can be saved for design/
>> implementation imho.
>>
>> Strategy, this is necessary because otherwise goals can be out of line
>> with
>> reality.  Don't propose goals you don't have at least some idea of how to
>> implement.
>>
>> Rejected strategies, given that committers are the only ones I'm saying
>> should formally submit SPARKLIs or SIPs, if they put junk in a required
>> section then slap them down for it and tell them to fix it.
>>
>>
>> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>>
>>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>>> but we should also clarify it in the writeup. In particular:
>>>
>>> - Goals needs to be about user-facing behavior ("people" is broad)
>>>
>>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>>> one of these and say "Spark's developers have officially rejected X,
>>> which
>>> our awesome system has".
>>>
>>> - For user-facing stuff, I think you need a section on API. Virtually all
>>> other *IPs I've seen have that.
>>>
>>> - I'm still not sure why the strategy section is needed if the purpose is
>>> to define user-facing behavior -- unless this is the strategy for setting
>>> the goals or for defining the API. That sounds squarely like a design doc
>>> issue. In some sense, who cares whether the proposal is technically
>>> feasible
>>> right now? If it's infeasible, that will be discovered later during
>>> design
>>> and implementation. Same thing with rejected strategies -- listing some
>>> of
>>> those is definitely useful sometimes, but if you make this a *required*
>>> section, people are just going to fill it in with bogus stuff (I've seen
>>> this happen before).
>>>
>>> Matei
>>>
>
>>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>>> >
>>> > So to focus the discussion on the specific strategy I'm suggesting,
>>> > documented at
>>> >
>>> >
>>> >
>>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >
>>> > "Goals: What must this allow people to do, that they can't currently?"
>>> >
>>> > Is it unclear that this is focusing specifically on people-visible
>>> > behavior?
>>> >
>>> > Rejected goals -  are important because otherwise people keep trying
>>> > to argue about scope.  Of course you can change things later with a
>>> > different SIP and different vote, the point is to focus.
>>> >
>>> > Use cases - are something that people are going to bring up in
>>> > discussion.  If they aren't clearly documented as a goal ("This must
>>> > allow me to connect using SSL"), they should be added.
>>> >
>>> > Internal architecture - if the people who need specific behavior are
>>> > implementers of other parts of the system, that's fine.
>>> >
>>> > Rejected strategies - If you have none of these, you have no evidence
>>> > that the proponent didn't just go with the first thing they had in
>>> > mind (or have already implemented), which is a big problem currently.
>>> > Approval isn't binding as to specifics of implementation, so these
>>> > aren't handcuffs.  The goals are the contract, the strategy is
>>> > evidence that contract can actually be met.
>>> >
>>> > Design docs - I'm not touching design docs.  The markdown file I
>>> > linked specifically says of the strategy section "This is not a full
>>> > design document."  Is this unclear?  Design docs can be worked on
>>> > obviously, but that's not what I'm concerned with here.
>>> >
>>> >
>>> >
>>> >
>>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>>> > wrote:
>>> >> Hi Cody,
>>> >>
>>> >> I think this would be a lot more concrete if we had a more detailed
>>> >> template
>>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are
>>> >> they
>>> >> a way to solicit feedback on the user-facing behavior or on the
>>> >> internals?
>>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>>> >> Product
>>> >> Requirements Docs (PRDs), which focus on *what* a code change should
>>> >> do
>>> >> as
>>> >> opposed to how.
>>> >>
>>> >> In particular, here are some things that you may or may not consider
>>> >> in
>>> >> scope for SIPs:
>>> >>
>>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>>> >> focus on
>>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>>> >> "system continues working if one node fails"). BTW I wouldn't say
>>> >> "rejected
>>> >> goals" because some of them might become goals later, so we're not
>>> >> definitively rejecting them.
>>> >>
>>> >> - Public API: Probably should be included in most SIPs unless it's too
>>> >> large
>>> >> to fully specify then (e.g. "let's add an ML library").
>>> >>
>>> >> - Use cases: I usually find this very useful in PRDs to better
>>> >> communicate
>>> >> the goals.
>>> >>
>>> >> - Internal architecture: This is usually *not* a thing users can
>>> >> easily
>>> >> comment on and it sounds more like a design doc item. Of course it's
>>> >> important to show that the SIP is feasible to implement. One
>>> >> exception,
>>> >> however, is that I think we'll have some SIPs primarily on internals
>>> >> (e.g.
>>> >> if somebody wants to refactor Spark's query optimizer or something).
>>> >>
>>> >> - Rejected strategies: I personally wouldn't put this, because what's
>>> >> the
>>> >> point of voting to reject a strategy before you've really begun
>>> >> designing
>>> >> and implementing something? What if you discover that the strategy is
>>> >> actually better when you start doing stuff?
>>> >>
>>> >> At a super high level, it depends on whether you want the SIPs to be
>>> >> PRDs
>>> >> for getting some quick feedback on the goals of a feature before it is
>>> >> designed, or something more like full-fledged design docs (just a more
>>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and
>>> >> they
>>> >> actually seem to be more like design docs. This can work too but it
>>> >> does
>>> >> require more work from the proposer and it can lead to the same
>>> >> problems you
>>> >> mentioned with people already having a design and implementation in
>>> >> mind.
>>> >>
>>> >> Basically, the question is, are you trying to iterate faster on design
>>> >> by
>>> >> adding a step for user feedback earlier? Or are you just trying to
>>> >> make
>>> >> design docs for key features more visible (and their approval more
>>> >> formal)?
>>> >>
>>> >> BTW note that in either case, I'd like to have a template for design
>>> >> docs
>>> >> too, which should also include goals. I think that would've avoided
>>> >> some of
>>> >> the issues you brought up.
>>> >>
>>> >> Matei
>>> >>
>>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>> >>
>>> >> Here's my specific proposal (meta-proposal?)
>>> >>
>>> >> Spark Improvement Proposals (SIP)
>>> >>
>>> >>
>>> >> Background:
>>> >>
>>> >> The current problem is that design and implementation of large
>>> >> features
>>> >> are
>>> >> often done in private, before soliciting user feedback.
>>> >>
>>> >> When feedback is solicited, it is often as to detailed design
>>> >> specifics, not
>>> >> focused on goals.
>>> >>
>>> >> When implementation does take place after design, there is often
>>> >> disagreement as to what goals are or are not in scope.
>>> >>
>>> >> This results in commits that don't fully meet user needs.
>>> >>
>>> >>
>>> >> Goals:
>>> >>
>>> >> - Ensure user, contributor, and committer goals are clearly identified
>>> >> and
>>> >> agreed upon, before implementation takes place.
>>> >>
>>> >> - Ensure that a technically feasible strategy is chosen that is likely
>>> >> to
>>> >> meet the goals.
>>> >>
>>> >>
>>> >> Rejected Goals:
>>> >>
>>> >> - SIPs are not for detailed design.  Design by committee doesn't work.
>>> >>
>>> >> - SIPs are not for every change.  We don't need that much process.
>>> >>
>>> >>
>>> >> Strategy:
>>> >>
>>> >> My suggestion is outlined as a Spark Improvement Proposal process
>>> >> documented
>>> >> at
>>> >>
>>> >>
>>> >>
>>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>
>>> >> Specifics of Jira manipulation are an implementation detail we can
>>> >> figure
>>> >> out.
>>> >>
>>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>>> >>
>>> >>
>>> >> Rejected Strategies:
>>> >>
>>> >> Having someone who understands the problem implement it first works,
>>> >> but
>>> >> only if significant iteration after user feedback is allowed.
>>> >>
>>> >> Historically this has been problematic due to pressure to limit public
>>> >> api
>>> >> changes.
>>> >>
>>> >>
>>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>>> >> wrote:
>>> >>>
>>> >>> Alright, looks like there is quite a bit of support. We should wait
>>> >>> to
>>> >>> hear from more people too.
>>> >>>
>>> >>> To push this forward, Cody and I will be working together in the next
>>> >>> couple of weeks to come up with a concrete, detailed proposal on what
>>> >>> this
>>> >>> entails, and then we can discuss the specific proposal as well.
>>> >>>
>>> >>>
>>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>>> >>> wrote:
>>> >>>>
>>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>>> >>>>
>>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>> >>>> <[hidden email]> wrote:
>>> >>>>>
>>> >>>>> +1 to the SIP label as long as it does not slow things down and it
>>> >>>>> targets optimizing efforts, coordination etc. For example, really
>>> >>>>> small features (assuming they don't touch public interfaces) or
>>> >>>>> refactorings should not need to go through this process, and I hope
>>> >>>>> it will be kept this way. So a guideline doc should be provided,
>>> >>>>> like in the KIP case.
>>> >>>>>
>>> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply
>>> >>>>> having design docs and prototype implementations in PRs is not
>>> >>>>> something that has failed so far. What is really a pain in many
>>> >>>>> projects out there is discontinuity in the progress of PRs, missing
>>> >>>>> features, and slow reviews, which is understandable to some
>>> >>>>> extent... it is not only about Spark, but things can be improved for
>>> >>>>> sure for this project in particular, as already stated.
>>> >>>>>
>>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
>>> >>>>>> think
>>> >>>>>> it
>>> >>>>>> needs
>>> >>>>>>
>>> >>>>>> - template that focuses it towards soliciting user goals / non
>>> >>>>>> goals
>>> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>> >>>>>> recommend a vote.
>>> >>>>>>
>>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>>> >>>>>> think
>>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and
>>> >>>>>> split
>>> >>>>>> a thread for the other discussion per Nicholas' request.
>>> >>>>>>
>>> >>>>>> I meant changing public user interfaces.  I think the first design
>>> >>>>>> is
>>> >>>>>> unlikely to be right, because it's done at a time when you have
>>> >>>>>> the
>>> >>>>>> least information.  As a user, I find it considerably more
>>> >>>>>> frustrating
>>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>>> >>>>>> make minor changes to my code in order to take advantage of
>>> >>>>>> features.
>>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>>> >>>>>> @experimental code that are needed in order for it to really work
>>> >>>>>> right.  You need to be able to iterate, and if people on both
>>> >>>>>> sides
>>> >>>>>> of
>>> >>>>>> the fence aren't going to respect that some newer apis are subject
>>> >>>>>> to
>>> >>>>>> change, then why even mark them as such?
>>> >>>>>>
>>> >>>>>> Ideally a finished SIP should give me a checklist of things that
>>> >>>>>> an
>>> >>>>>> implementation must do, and things that it doesn't need to do.
>>> >>>>>> Contributors/committers should be seriously discouraged from
>>> >>>>>> putting
>>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>>> >>>>>> implementation of all those things, especially if they're then
>>> >>>>>> going
>>> >>>>>> to argue against interface changes necessary to get the rest
>>> >>>>>> of
>>> >>>>>> the things done in the 0.2 version.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>> >>>>>> wrote:
>>> >>>>>>> I like the lightweight proposal to add a SIP label.
>>> >>>>>>>
>>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using
>>> >>>>>>> wiki
>>> >>>>>>> to
>>> >>>>>>> track the list of major changes, but that never really
>>> >>>>>>> materialized
>>> >>>>>>> due to
>>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to
>>> >>>>>>> them
>>> >>>>>>> prominently on the Spark website makes a lot of sense.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>> >>>>>>> <[hidden email]>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> For the improvement proposals, I think one major point was to
>>> >>>>>>>> make
>>> >>>>>>>> them
>>> >>>>>>>> really visible to users who are not contributors, so we should
>>> >>>>>>>> do
>>> >>>>>>>> more than
>>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a
>>> >>>>>>>> new
>>> >>>>>>>> type of
>>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all
>>> >>>>>>>> such
>>> >>>>>>>> JIRAs from
>>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>>> >>>>>>>> doc
>>> >>>>>>>> templates (in fact many projects have them).
>>> >>>>>>>>
>>> >>>>>>>> Matei
>>> >>>>>>>>
>>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> I called Cody last night and talked about some of the topics in
>>> >>>>>>>> his
>>> >>>>>>>> email.
>>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>>> >>>>>>>>
>>> >>>>>>>> Some of the frustrations come from the success of the project
>>> >>>>>>>> itself
>>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>>> >>>>>>>> people
>>> >>>>>>>> who
>>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some
>>> >>>>>>>> ways
>>> >>>>>>>> similar
>>> >>>>>>>> to scaling an engineering team in a successful startup: old
>>> >>>>>>>> processes that
>>> >>>>>>>> worked well might not work so well when it gets to a certain
>>> >>>>>>>> size,
>>> >>>>>>>> cultures
>>> >>>>>>>> can get diluted, building culture vs building process, etc.
>>> >>>>>>>>
>>> >>>>>>>> I also really like to have a more visible process for larger
>>> >>>>>>>> changes,
>>> >>>>>>>> especially major user facing API changes. Historically we upload
>>> >>>>>>>> design docs
>>> >>>>>>>> for major changes, but it is not always consistent, and it is
>>> >>>>>>>> difficult to ensure the quality of the docs, due to the
>>> >>>>>>>> volunteering nature of the organization.
>>> >>>>>>>>
>>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>>> >>>>>>>> culture
>>> >>>>>>>> to improve clarity:
>>> >>>>>>>>
>>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>>> >>>>>>>> One
>>> >>>>>>>> thing
>>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we
>>> >>>>>>>> should
>>> >>>>>>>> create a design doc template for the project and ask everybody
>>> >>>>>>>> to
>>> >>>>>>>> follow.
>>> >>>>>>>> The design doc template should also explicitly list goals and
>>> >>>>>>>> non-goals, to
>>> >>>>>>>> make design doc more consistent.
>>> >>>>>>>>
>>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>>> >>>>>>>> some changes, but again very inconsistently. Just posting
>>> >>>>>>>> something on JIRA isn't sufficient, because there are simply too
>>> >>>>>>>> many JIRAs and the signal gets lost in the noise. While this is
>>> >>>>>>>> generally impossible to enforce because we can't force all
>>> >>>>>>>> volunteers to conform to a process (or they might not even be
>>> >>>>>>>> aware of this), those who are more familiar with the project can
>>> >>>>>>>> help by emailing dev@ when they see something that hasn't been
>>> >>>>>>>> posted there.
>>> >>>>>>>>
>>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback.
>>> >>>>>>>> A
>>> >>>>>>>> design
>>> >>>>>>>> doc should serve as the base for discussion and is by no means
>>> >>>>>>>> the
>>> >>>>>>>> final
>>> >>>>>>>> design. Of course, this does not mean the author has to accept
>>> >>>>>>>> every
>>> >>>>>>>> feedback. They should also be comfortable accepting / rejecting
>>> >>>>>>>> ideas on
>>> >>>>>>>> technical grounds.
>>> >>>>>>>>
>>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
>>> >>>>>>>> useful
>>> >>>>>>>> to
>>> >>>>>>>> have
>>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>>> >>>>>>>> actually not
>>> >>>>>>>> sure how well this will work, because of the volunteering nature
>>> >>>>>>>> and
>>> >>>>>>>> we need
>>> >>>>>>>> to adjust for timezones for people across the globe, but it
>>> >>>>>>>> seems
>>> >>>>>>>> worth
>>> >>>>>>>> trying.
>>> >>>>>>>>
>>> >>>>>>>> - Culture: Contributors (including committers) should be more
>>> >>>>>>>> direct
>>> >>>>>>>> in
>>> >>>>>>>> setting expectations, including whether they are working on a
>>> >>>>>>>> specific
>>> >>>>>>>> issue, whether they will be working on a specific issue, and
>>> >>>>>>>> whether
>>> >>>>>>>> an
>>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>>> >>>>>>>> this
>>> >>>>>>>> community
>>> >>>>>>>> are nice and don't enjoy telling other people no, but it is
>>> >>>>>>>> often
>>> >>>>>>>> more
>>> >>>>>>>> annoying to a contributor to not know anything than getting a
>>> >>>>>>>> no.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>> >>>>>>>> <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>>> >>>>>>>>> process that
>>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't
>>> >>>>>>>>> think
>>> >>>>>>>>> committers are trying to minimize their own work -- every
>>> >>>>>>>>> committer
>>> >>>>>>>>> cares
>>> >>>>>>>>> about making the software useful for users. However, it is
>>> >>>>>>>>> always
>>> >>>>>>>>> hard to
>>> >>>>>>>>> get user input and so it helps to have this kind of process.
>>> >>>>>>>>> I've
>>> >>>>>>>>> certainly
>>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see
>>> >>>>>>>>> the
>>> >>>>>>>>> biggest
>>> >>>>>>>>> things on the roadmap.
>>> >>>>>>>>>
>>> >>>>>>>>> When you're talking about "changing interfaces", are you
>>> >>>>>>>>> talking
>>> >>>>>>>>> about
>>> >>>>>>>>> public or internal APIs? I do think many people hate changing
>>> >>>>>>>>> public APIs
>>> >>>>>>>>> and I actually think that's for the best of the project. That's
>>> >>>>>>>>> a
>>> >>>>>>>>> technical
>>> >>>>>>>>> debate, but basically, the worst thing when you're using a
>>> >>>>>>>>> piece
>>> >>>>>>>>> of
>>> >>>>>>>>> software
>>> >>>>>>>>> is that the developers constantly ask you to rewrite your app
>>> >>>>>>>>> to
>>> >>>>>>>>> update to a
>>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>> >>>>>>>>> who's used
>>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their
>>> >>>>>>>>> code
>>> >>>>>>>>> this
>>> >>>>>>>>> release" model works well within a single large company, but
>>> >>>>>>>>> doesn't work
>>> >>>>>>>>> well for a community, which is why nearly all *very* widely
>>> >>>>>>>>> used
>>> >>>>>>>>> programming
>>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>>> >>>>>>>>> Windows
>>> >>>>>>>>> API, etc)
>>> >>>>>>>>> almost *never* break backwards compatibility. All this is done
>>> >>>>>>>>> within reason
>>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
>>> >>>>>>>>> etc).
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> ---------------------------------------------------------------------
>>> >>>>>> To unsubscribe e-mail: [hidden email]
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Stavros Kontopoulos
>>> >>>>> Senior Software Engineer
>>> >>>>> Lightbend, Inc.
>>> >>>>> p: +30 6977967274
>>> >>>>> e: [hidden email]
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>>
>>
>
>
> ________________________________
> View this message in context: RE: Spark Improvement Proposals
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.





--
Ryan Blue
Software Engineer
Netflix



Re: Spark Improvement Proposals

Cody Koeninger-2
I think this is closer to a procedural issue than a code modification
issue, which is why I proposed majority approval.  If everyone thinks
consensus is better, I don't care.  Again, I don't feel strongly about
the way we achieve clarity, just that we achieve clarity.

On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue <[hidden email]> wrote:

> Sorry, I missed that the proposal includes majority approval. Why majority
> instead of consensus? I think we want to build consensus around these
> proposals and it makes sense to discuss until no one would veto.
>
> rb
>
> On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <[hidden email]> wrote:
>>
>> +1 to votes to approve proposals. I agree that proposals should have an
>> official mechanism to be accepted, and a vote is an established means of
>> doing that well. I like that it includes a period to review the proposal and
>> I think proposals should have been discussed enough ahead of a vote to
>> survive the possibility of a veto.
>>
>> I also like the names that are short and (mostly) unique, like SEP.
>>
>> Where I disagree is with the requirement that a committer must formally
>> propose an enhancement. I don't see the value of restricting this: if
>> someone has the will to write up a proposal then they should be encouraged
>> to do so and start a discussion about it. Even if there is a political
>> reality as Cody says, what is the value of codifying that in our process? I
>> think restricting who can submit proposals would only undermine them by
>> pushing contributors out. Maybe I'm missing something here?
>>
>> rb
>>
>>
>>
>> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <[hidden email]>
>> wrote:
>>>
>>> Yes, users suggesting SIPs is a good thing and is explicitly called
>>> out in the linked document under the Who? section.  Formally proposing
>>> them, not so much, because of the political realities.
>>>
>>> Yes, implementation strategy definitely affects goals.  There are all
>>> kinds of examples of this, I'll pick one that's my fault so as to
>>> avoid sounding like I'm blaming:
>>>
>>> When I implemented the Kafka DStream, one of my (not explicitly agreed
>>> upon by the community) goals was to make sure people could use the
>>> DStream however they were already using Kafka at work.  The lack
>>> of explicit agreement on that goal led to all kinds of fighting with
>>> committers, that could have been avoided.  The lack of explicit
>>> up-front strategy discussion led to the DStream not really working
>>> with compacted topics.  I knew about compacted topics, but don't have
>>> a use for them, so had a blind spot there.  If there was explicit
>>> up-front discussion that my strategy was "assume that batches can be
>>> defined on the driver solely by beginning and ending offsets", there's
>>> a greater chance that a user would have seen that and said, "hey, what
>>> about non-contiguous offsets in a compacted topic".
>>>
>>> This kind of thing is only going to happen smoothly if we have a
>>> lightweight user-visible process with clear outcomes.
>>>
>>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
>>> <[hidden email]> wrote:
>>> > I agree with most of what Cody said.
>>> >
>>> > Two things:
>>> >
>>> > First we can always have other people suggest SIPs but mark them as
>>> > “unreviewed” and have committers basically move them forward. The
>>> > problem is
>>> > that writing a good document takes time. This way we can leverage non
>>> > committers to do some of this work (it is just another way to
>>> > contribute).
>>> >
>>> >
>>> >
>>> > As for strategy, in many cases implementation strategy can affect the
>>> > goals.
>>> > I will give  a small example: In the current structured streaming
>>> > strategy,
>>> > we group by the time to achieve a sliding window. This is definitely an
>>> > implementation decision and not a goal. However, I can think of several
>>> > aggregation functions which have the time inside their calculation
>>> > buffer.
>>> > For example, let’s say we want to return a set of all distinct values.
>>> > One
>>> > way to implement this would be to make the set into a map and have the
>>> > value
>>> > contain the last time seen. Multiplying it across the groupby would
>>> > cost a
>>> > lot in performance. So adding such a strategy would have a great effect
>>> > on
>>> > the type of aggregations and their performance which does affect the
>>> > goal.
>>> > Without adding the strategy, it is easy for whoever goes to the design
>>> > document to not think about these cases. Furthermore, it might be
>>> > decided
>>> > that these cases are rare enough so that the strategy is still good
>>> > enough
>>> > but how would we know it without user feedback?
>>> >
>>> > I believe this example is exactly what Cody was talking about. Since
>>> > many
>>> > times implementation strategies have a large effect on the goal, we
>>> > should
>>> > have it discussed when discussing the goals. In addition, while it is
>>> > often
>>> > easy to throw out completely infeasible goals, it is often much harder
>>> > to
>>> > figure out that the goals are unfeasible without fine tuning.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Assaf.
>>> >
>>> >
>>> >
>>> > From: Cody Koeninger-2 [via Apache Spark Developers List]
>>> > [mailto:ml-node+[hidden email]]
>>> > Sent: Monday, October 10, 2016 2:25 AM
>>> > To: Mendelson, Assaf
>>> > Subject: Re: Spark Improvement Proposals
>>> >
>>> >
>>> >
>>> > Only committers should formally submit SIPs because in an Apache
>>> > project only committers have explicit political power.  If a user can't
>>> > find a committer willing to sponsor an SIP idea, they have no way to
>>> > get the idea passed in any case.  If I can't find a committer to
>>> > sponsor this meta-SIP idea, I'm out of luck.
>>> >
>>> > I do not believe unrealistic goals can be found solely by inspection.
>>> > We've managed to ignore unrealistic goals even after implementation!
>>> > Focusing on APIs can allow people to think they've solved something,
>>> > when there's really no way of implementing that API while meeting the
>>> > goals.  Rapid iteration is clearly the best way to address this, but
>>> > we've already talked about why that hasn't really worked.  If adding a
>>> > non-binding API section to the template is important to you, I'm not
>>> > against it, but I don't think it's sufficient.
>>> >
>>> > On your PRD vs design doc spectrum, I'm saying this is closer to a
>>> > PRD.  Clear agreement on goals is the most important thing and that's
>>> > why it's the thing I want binding agreement on.  But I cannot agree to
>>> > goals unless I have enough minimal technical info to judge whether the
>>> > goals are likely to actually be accomplished.
>>> >
>>> >
>>> >
>>> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>>> >
>>> >
>>> >> Well, I think there are a few things here that don't make sense.
>>> >> First,
>>> >> why
>>> >> should only committers submit SIPs? Development in the project should
>>> >> be
>>> >> open to all contributors, whether they're committers or not. Second, I
>>> >> think
>>> >> unrealistic goals can be found just by inspecting the goals, and I'm
>>> >> not
>>> >> super worried that we'll accept a lot of SIPs that are then infeasible
>>> >> --
>>> >> we
>>> >> can then submit new ones. But this depends on whether you want this
>>> >> process
>>> >> to be a "design doc lite", where people also agree on implementation
>>> >> strategy, or just a way to agree on goals. This is what I asked
>>> >> earlier
>>> >> about PRDs vs design docs (and I'm open to either one but I'd just
>>> >> like
>>> >> clarity). Finally, both as a user and designer of software, I always
>>> >> want
>>> >> to
>>> >> give feedback on APIs, so I'd really like a culture of having those
>>> >> early.
>>> >> People don't argue about prettiness when they discuss APIs, they argue
>>> >> about
>>> >> the core concepts to expose in order to meet various goals, and then
>>> >> they're
>>> >> stuck maintaining those for a long time.
>>> >>
>>> >> Matei
>>> >>
>>> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>>> >>
>>> >> Users instead of people, sure.  Committers and contributors are (or at
>>> >> least
>>> >> should be) a subset of users.
>>> >>
>>> >> Non goals, sure. I don't care what the name is, but we need to clearly
>>> >> say
>>> >> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>>> >>
>>> >> API, what I care most about is whether it allows me to accomplish the
>>> >> goals.
>>> >> Arguing about how ugly or pretty it is can be saved for design/
>>> >> implementation imho.
>>> >>
>>> >> Strategy, this is necessary because otherwise goals can be out of line
>>> >> with
>>> >> reality.  Don't propose goals you don't have at least some idea of how
>>> >> to
>>> >> implement.
>>> >>
>>> >> Rejected strategies, given that committers are the only ones I'm saying
>>> >> should formally submit SPARKLIs or SIPs, if they put junk in a
>>> >> required
>>> >> section then slap them down for it and tell them to fix it.
>>> >>
>>> >>
>>> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>> >>>
>>> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying
>>> >>> here,
>>> >>> but we should also clarify it in the writeup. In particular:
>>> >>>
>>> >>> - Goals needs to be about user-facing behavior ("people" is broad)
>>> >>>
>>> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig
>>> >>> up
>>> >>> one of these and say "Spark's developers have officially rejected X,
>>> >>> which
>>> >>> our awesome system has".
>>> >>>
>>> >>> - For user-facing stuff, I think you need a section on API. Virtually
>>> >>> all
>>> >>> other *IPs I've seen have that.
>>> >>>
>>> >>> - I'm still not sure why the strategy section is needed if the
>>> >>> purpose is
>>> >>> to define user-facing behavior -- unless this is the strategy for
>>> >>> setting
>>> >>> the goals or for defining the API. That sounds squarely like a design
>>> >>> doc
>>> >>> issue. In some sense, who cares whether the proposal is technically
>>> >>> feasible
>>> >>> right now? If it's infeasible, that will be discovered later during
>>> >>> design
>>> >>> and implementation. Same thing with rejected strategies -- listing
>>> >>> some
>>> >>> of
>>> >>> those is definitely useful sometimes, but if you make this a
>>> >>> *required*
>>> >>> section, people are just going to fill it in with bogus stuff (I've
>>> >>> seen
>>> >>> this happen before).
>>> >>>
>>> >>> Matei
>>> >>>
>>> >
>>> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>>> >>> >
>>> >>> > So to focus the discussion on the specific strategy I'm suggesting,
>>> >>> > documented at
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>> >
>>> >>> > "Goals: What must this allow people to do, that they can't
>>> >>> > currently?"
>>> >>> >
>>> >>> > Is it unclear that this is focusing specifically on people-visible
>>> >>> > behavior?
>>> >>> >
>>> >>> > Rejected goals -  are important because otherwise people keep
>>> >>> > trying
>>> >>> > to argue about scope.  Of course you can change things later with a
>>> >>> > different SIP and different vote, the point is to focus.
>>> >>> >
>>> >>> > Use cases - are something that people are going to bring up in
>>> >>> > discussion.  If they aren't clearly documented as a goal ("This
>>> >>> > must
>>> >>> > allow me to connect using SSL"), they should be added.
>>> >>> >
>>> >>> > Internal architecture - if the people who need specific behavior
>>> >>> > are
>>> >>> > implementers of other parts of the system, that's fine.
>>> >>> >
>>> >>> > Rejected strategies - If you have none of these, you have no
>>> >>> > evidence
>>> >>> > that the proponent didn't just go with the first thing they had in
>>> >>> > mind (or have already implemented), which is a big problem
>>> >>> > currently.
>>> >>> > Approval isn't binding as to specifics of implementation, so these
>>> >>> > aren't handcuffs.  The goals are the contract, the strategy is
>>> >>> > evidence that contract can actually be met.
>>> >>> >
>>> >>> > Design docs - I'm not touching design docs.  The markdown file I
>>> >>> > linked specifically says of the strategy section "This is not a
>>> >>> > full
>>> >>> > design document."  Is this unclear?  Design docs can be worked on
>>> >>> > obviously, but that's not what I'm concerned with here.
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>>> >>> > wrote:
>>> >>> >> Hi Cody,
>>> >>> >>
>>> >>> >> I think this would be a lot more concrete if we had a more
>>> >>> >> detailed
>>> >>> >> template
>>> >>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g.
>>> >>> >> are
>>> >>> >> they
>>> >>> >> a way to solicit feedback on the user-facing behavior or on the
>>> >>> >> internals?
>>> >>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>>> >>> >> Product
>>> >>> >> Requirements Docs (PRDs), which focus on *what* a code change
>>> >>> >> should
>>> >>> >> do
>>> >>> >> as
>>> >>> >> opposed to how.
>>> >>> >>
>>> >>> >> In particular, here are some things that you may or may not
>>> >>> >> consider
>>> >>> >> in
>>> >>> >> scope for SIPs:
>>> >>> >>
>>> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>>> >>> >> focus on
>>> >>> >> user-visible behavior (e.g. "system supports SQL window functions"
>>> >>> >> or
>>> >>> >> "system continues working if one node fails"). BTW I wouldn't say
>>> >>> >> "rejected
>>> >>> >> goals" because some of them might become goals later, so we're not
>>> >>> >> definitively rejecting them.
>>> >>> >>
>>> >>> >> - Public API: Probably should be included in most SIPs unless it's
>>> >>> >> too
>>> >>> >> large
>>> >>> >> to fully specify then (e.g. "let's add an ML library").
>>> >>> >>
>>> >>> >> - Use cases: I usually find this very useful in PRDs to better
>>> >>> >> communicate
>>> >>> >> the goals.
>>> >>> >>
>>> >>> >> - Internal architecture: This is usually *not* a thing users can
>>> >>> >> easily
>>> >>> >> comment on and it sounds more like a design doc item. Of course
>>> >>> >> it's
>>> >>> >> important to show that the SIP is feasible to implement. One
>>> >>> >> exception,
>>> >>> >> however, is that I think we'll have some SIPs primarily on
>>> >>> >> internals
>>> >>> >> (e.g.
>>> >>> >> if somebody wants to refactor Spark's query optimizer or
>>> >>> >> something).
>>> >>> >>
>>> >>> >> - Rejected strategies: I personally wouldn't put this, because
>>> >>> >> what's
>>> >>> >> the
>>> >>> >> point of voting to reject a strategy before you've really begun
>>> >>> >> designing
>>> >>> >> and implementing something? What if you discover that the strategy
>>> >>> >> is
>>> >>> >> actually better when you start doing stuff?
>>> >>> >>
>>> >>> >> At a super high level, it depends on whether you want the SIPs to
>>> >>> >> be
>>> >>> >> PRDs
>>> >>> >> for getting some quick feedback on the goals of a feature before
>>> >>> >> it is
>>> >>> >> designed, or something more like full-fledged design docs (just a
>>> >>> >> more
>>> >>> >> visible design doc for bigger changes). I looked at Kafka's KIPs,
>>> >>> >> and
>>> >>> >> they
>>> >>> >> actually seem to be more like design docs. This can work too but
>>> >>> >> it
>>> >>> >> does
>>> >>> >> require more work from the proposer and it can lead to the same
>>> >>> >> problems you
>>> >>> >> mentioned with people already having a design and implementation
>>> >>> >> in
>>> >>> >> mind.
>>> >>> >>
>>> >>> >> Basically, the question is, are you trying to iterate faster on
>>> >>> >> design
>>> >>> >> by
>>> >>> >> adding a step for user feedback earlier? Or are you just trying to
>>> >>> >> make
>>> >>> >> design docs for key features more visible (and their approval more
>>> >>> >> formal)?
>>> >>> >>
>>> >>> >> BTW note that in either case, I'd like to have a template for
>>> >>> >> design
>>> >>> >> docs
>>> >>> >> too, which should also include goals. I think that would've
>>> >>> >> avoided
>>> >>> >> some of
>>> >>> >> the issues you brought up.
>>> >>> >>
>>> >>> >> Matei
>>> >>> >>
>>> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >> Here's my specific proposal (meta-proposal?)
>>> >>> >>
>>> >>> >> Spark Improvement Proposals (SIP)
>>> >>> >>
>>> >>> >>
>>> >>> >> Background:
>>> >>> >>
>>> >>> >> The current problem is that design and implementation of large
>>> >>> >> features
>>> >>> >> are
>>> >>> >> often done in private, before soliciting user feedback.
>>> >>> >>
>>> >>> >> When feedback is solicited, it is often as to detailed design
>>> >>> >> specifics, not
>>> >>> >> focused on goals.
>>> >>> >>
>>> >>> >> When implementation does take place after design, there is often
>>> >>> >> disagreement as to what goals are or are not in scope.
>>> >>> >>
>>> >>> >> This results in commits that don't fully meet user needs.
>>> >>> >>
>>> >>> >>
>>> >>> >> Goals:
>>> >>> >>
>>> >>> >> - Ensure user, contributor, and committer goals are clearly
>>> >>> >> identified
>>> >>> >> and
>>> >>> >> agreed upon, before implementation takes place.
>>> >>> >>
>>> >>> >> - Ensure that a technically feasible strategy is chosen that is
>>> >>> >> likely
>>> >>> >> to
>>> >>> >> meet the goals.
>>> >>> >>
>>> >>> >>
>>> >>> >> Rejected Goals:
>>> >>> >>
>>> >>> >> - SIPs are not for detailed design.  Design by committee doesn't
>>> >>> >> work.
>>> >>> >>
>>> >>> >> - SIPs are not for every change.  We don't need that much process.
>>> >>> >>
>>> >>> >>
>>> >>> >> Strategy:
>>> >>> >>
>>> >>> >> My suggestion is outlined as a Spark Improvement Proposal process
>>> >>> >> documented
>>> >>> >> at
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>> >>
>>> >>> >> Specifics of Jira manipulation are an implementation detail we can
>>> >>> >> figure
>>> >>> >> out.
>>> >>> >>
>>> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>>> >>> >>
>>> >>> >>
>>> >>> >> Rejected Strategies:
>>> >>> >>
>>> >>> >> Having someone who understands the problem implement it first
>>> >>> >> works,
>>> >>> >> but
>>> >>> >> only if significant iteration after user feedback is allowed.
>>> >>> >>
>>> >>> >> Historically this has been problematic due to pressure to limit
>>> >>> >> public
>>> >>> >> api
>>> >>> >> changes.
>>> >>> >>
>>> >>> >>
>>> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> Alright, looks like there is quite a bit of support. We should
>>> >>> >>> wait
>>> >>> >>> to
>>> >>> >>> hear from more people too.
>>> >>> >>>
>>> >>> >>> To push this forward, Cody and I will be working together in the
>>> >>> >>> next
>>> >>> >>> couple of weeks to come up with a concrete, detailed proposal on
>>> >>> >>> what
>>> >>> >>> this
>>> >>> >>> entails, and then we can discuss the specific proposal as
>>> >>> >>> well.
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>>> >>> >>> wrote:
>>> >>> >>>>
>>> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for
>>> >>> >>>> major
>>> >>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>>> >>> >>>>
>>> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>> >>> >>>> <[hidden email]> wrote:
>>> >>> >>>>>
>>> >>> >>>>> +1 to the SIP label as long as it does not slow down things and
>>> >>> >>>>> it
>>> >>> >>>>> targets optimizing efforts, coordination etc. For example
>>> >>> >>>>> really
>>> >>> >>>>> small
>>> >>> >>>>> features should not need to go through this process (assuming
>>> >>> >>>>> they
>>> >>> >>>>> don't
>>> >>> >>>>> touch public interfaces)  or re-factorings and hope it will be
>>> >>> >>>>> kept
>>> >>> >>>>> this
>>> >>> >>>>> way. So as a guideline doc should be provided, like in the KIP
>>> >>> >>>>> case.
>>> >>> >>>>>
>>> >>> >>>>> IMHO so far aside from tagging things and linking them
>>> >>> >>>>> elsewhere
>>> >>> >>>>> simply
>>> >>> >>>>> having design docs and prototype implementations in PRs is not
>>> >>> >>>>> something
>>> >>> >>>>> that has failed so far. What is really a pain in many
>>> >>> >>>>> projects
>>> >>> >>>>> out there
>>> >>> >>>>> is discontinuity in progress of PRs, missing features, slow
>>> >>> >>>>> reviews
>>> >>> >>>>> which is
>>> >>> >>>>> understandable to some extent... it is not only about Spark but
>>> >>> >>>>> things can
>>> >>> >>>>> be improved for sure for this project in particular as already
>>> >>> >>>>> stated.
>>> >>> >>>>>
>>> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden
>>> >>> >>>>> email]>
>>> >>> >>>>> wrote:
>>> >>> >>>>>>
>>> >>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
>>> >>> >>>>>> think
>>> >>> >>>>>> it
>>> >>> >>>>>> needs
>>> >>> >>>>>>
>>> >>> >>>>>> - template that focuses it towards soliciting user goals / non
>>> >>> >>>>>> goals
>>> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue.
>>> >>> >>>>>> I'd
>>> >>> >>>>>> recommend a vote.
>>> >>> >>>>>>
>>> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces,
>>> >>> >>>>>> I
>>> >>> >>>>>> think
>>> >>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here,
>>> >>> >>>>>> and
>>> >>> >>>>>> split
>>> >>> >>>>>> a thread for the other discussion per Nicholas' request.
>>> >>> >>>>>>
>>> >>> >>>>>> I meant changing public user interfaces.  I think the first
>>> >>> >>>>>> design
>>> >>> >>>>>> is
>>> >>> >>>>>> unlikely to be right, because it's done at a time when you
>>> >>> >>>>>> have
>>> >>> >>>>>> the
>>> >>> >>>>>> least information.  As a user, I find it considerably more
>>> >>> >>>>>> frustrating
>>> >>> >>>>>> to be unable to use a tool to get my job done, than I do
>>> >>> >>>>>> having to
>>> >>> >>>>>> make minor changes to my code in order to take advantage of
>>> >>> >>>>>> features.
>>> >>> >>>>>> I've seen committers be seriously reluctant to allow changes
>>> >>> >>>>>> to
>>> >>> >>>>>> @experimental code that are needed in order for it to really
>>> >>> >>>>>> work
>>> >>> >>>>>> right.  You need to be able to iterate, and if people on both
>>> >>> >>>>>> sides
>>> >>> >>>>>> of
>>> >>> >>>>>> the fence aren't going to respect that some newer apis are
>>> >>> >>>>>> subject
>>> >>> >>>>>> to
>>> >>> >>>>>> change, then why even mark them as such?
>>> >>> >>>>>>
>>> >>> >>>>>> Ideally a finished SIP should give me a checklist of things
>>> >>> >>>>>> that
>>> >>> >>>>>> an
>>> >>> >>>>>> implementation must do, and things that it doesn't need to do.
>>> >>> >>>>>> Contributors/committers should be seriously discouraged from
>>> >>> >>>>>> putting
>>> >>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>>> >>> >>>>>> implementation of all those things, especially if they're then
>>> >>> >>>>>> going
>>> >>> >>>>>> to argue against interface changes necessary to get the
>>> >>> >>>>>> rest
>>> >>> >>>>>> of
>>> >>> >>>>>> the things done in the 0.2 version.
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>>> >>> >>>>>> wrote:
>>> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>>> >>> >>>>>>>
>>> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
>>> >>> >>>>>>> using
>>> >>> >>>>>>> wiki
>>> >>> >>>>>>> to
>>> >>> >>>>>>> track the list of major changes, but that never really
>>> >>> >>>>>>> materialized
>>> >>> >>>>>>> due to
>>> >>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking
>>> >>> >>>>>>> to
>>> >>> >>>>>>> them
>>> >>> >>>>>>> prominently on the Spark website makes a lot of sense.
>>> >>> >>>>>>>
>>> >>> >>>>>>>
>>> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>> >>> >>>>>>> <[hidden email]>
>>> >>> >>>>>>> wrote:
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> For the improvement proposals, I think one major point was
>>> >>> >>>>>>>> to
>>> >>> >>>>>>>> make
>>> >>> >>>>>>>> them
>>> >>> >>>>>>>> really visible to users who are not contributors, so we
>>> >>> >>>>>>>> should
>>> >>> >>>>>>>> do
>>> >>> >>>>>>>> more than
>>> >>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have
>>> >>> >>>>>>>> a
>>> >>> >>>>>>>> new
>>> >>> >>>>>>>> type of
>>> >>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all
>>> >>> >>>>>>>> such
>>> >>> >>>>>>>> JIRAs from
>>> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and
>>> >>> >>>>>>>> design
>>> >>> >>>>>>>> doc
>>> >>> >>>>>>>> templates (in fact many projects have them).
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Matei
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>> >>> >>>>>>>> wrote:
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> I called Cody last night and talked about some of the topics
>>> >>> >>>>>>>> in
>>> >>> >>>>>>>> his
>>> >>> >>>>>>>> email.
>>> >>> >>>>>>>> It became clear to me Cody genuinely cares about the
>>> >>> >>>>>>>> project.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Some of the frustrations come from the success of the
>>> >>> >>>>>>>> project
>>> >>> >>>>>>>> itself
>>> >>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>>> >>> >>>>>>>> people
>>> >>> >>>>>>>> who
>>> >>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in
>>> >>> >>>>>>>> some
>>> >>> >>>>>>>> ways
>>> >>> >>>>>>>> similar
>>> >>> >>>>>>>> to scaling an engineering team in a successful startup: old
>>> >>> >>>>>>>> processes that
>>> >>> >>>>>>>> worked well might not work so well when it gets to a certain
>>> >>> >>>>>>>> size,
>>> >>> >>>>>>>> cultures
>>> >>> >>>>>>>> can get diluted, building culture vs building process, etc.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> I also really like to have a more visible process for larger
>>> >>> >>>>>>>> changes,
>>> >>> >>>>>>>> especially major user facing API changes. Historically we
>>> >>> >>>>>>>> upload
>>> >>> >>>>>>>> design docs
>>> >>> >>>>>>>> for major changes, but it is not always consistent, and it is
>>> >>> >>>>>>>> difficult to ensure the quality
>>> >>> >>>>>>>> of the docs, due to the volunteering nature of the
>>> >>> >>>>>>>> organization.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on
>>> >>> >>>>>>>> building a
>>> >>> >>>>>>>> culture
>>> >>> >>>>>>>> to improve clarity:
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Process: Large changes should have design docs posted on
>>> >>> >>>>>>>> JIRA.
>>> >>> >>>>>>>> One
>>> >>> >>>>>>>> thing
>>> >>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me
>>> >>> >>>>>>>> is we
>>> >>> >>>>>>>> should
>>> >>> >>>>>>>> create a design doc template for the project and ask
>>> >>> >>>>>>>> everybody
>>> >>> >>>>>>>> to
>>> >>> >>>>>>>> follow.
>>> >>> >>>>>>>> The design doc template should also explicitly list goals
>>> >>> >>>>>>>> and
>>> >>> >>>>>>>> non-goals, to
>>> >>> >>>>>>>> make design doc more consistent.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this
>>> >>> >>>>>>>> with
>>> >>> >>>>>>>> some
>>> >>> >>>>>>>> changes, but again very inconsistent. Just posting something
>>> >>> >>>>>>>> on
>>> >>> >>>>>>>> JIRA
>>> >>> >>>>>>>> isn't
>>> >>> >>>>>>>> sufficient, because there are simply too many JIRAs and the
>>> >>> >>>>>>>> signal
>>> >>> >>>>>>>> gets lost
>>> >>> >>>>>>>> in the noise. While this is generally impossible to enforce
>>> >>> >>>>>>>> because
>>> >>> >>>>>>>> we can't
>>> >>> >>>>>>>> force all volunteers to conform to a process (or they might
>>> >>> >>>>>>>> not
>>> >>> >>>>>>>> even
>>> >>> >>>>>>>> be
>>> >>> >>>>>>>> aware of this),  those who are more familiar with the
>>> >>> >>>>>>>> project
>>> >>> >>>>>>>> can
>>> >>> >>>>>>>> help by
>>> >>> >>>>>>>> emailing dev@ when they see something that hasn't been posted there.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Culture: The design doc author(s) should be open to
>>> >>> >>>>>>>> feedback.
>>> >>> >>>>>>>> A
>>> >>> >>>>>>>> design
>>> >>> >>>>>>>> doc should serve as the base for discussion and is by no
>>> >>> >>>>>>>> means
>>> >>> >>>>>>>> the
>>> >>> >>>>>>>> final
>>> >>> >>>>>>>> design. Of course, this does not mean the author has to
>>> >>> >>>>>>>> accept
>>> >>> >>>>>>>> every
>>> >>> >>>>>>>> feedback. They should also be comfortable accepting /
>>> >>> >>>>>>>> rejecting
>>> >>> >>>>>>>> ideas on
>>> >>> >>>>>>>> technical grounds.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
>>> >>> >>>>>>>> useful
>>> >>> >>>>>>>> to
>>> >>> >>>>>>>> have
>>> >>> >>>>>>>> some monthly Google hangouts that are open to the world. I
>>> >>> >>>>>>>> am
>>> >>> >>>>>>>> actually not
>>> >>> >>>>>>>> sure how well this will work, because of the volunteering
>>> >>> >>>>>>>> nature
>>> >>> >>>>>>>> and
>>> >>> >>>>>>>> we need
>>> >>> >>>>>>>> to adjust for timezones for people across the globe, but it
>>> >>> >>>>>>>> seems
>>> >>> >>>>>>>> worth
>>> >>> >>>>>>>> trying.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Culture: Contributors (including committers) should be
>>> >>> >>>>>>>> more
>>> >>> >>>>>>>> direct
>>> >>> >>>>>>>> in
>>> >>> >>>>>>>> setting expectations, including whether they are working on
>>> >>> >>>>>>>> a
>>> >>> >>>>>>>> specific
>>> >>> >>>>>>>> issue, whether they will be working on a specific issue, and
>>> >>> >>>>>>>> whether
>>> >>> >>>>>>>> an
>>> >>> >>>>>>>> issue or pr or jira should be rejected. Most people I know
>>> >>> >>>>>>>> in
>>> >>> >>>>>>>> this
>>> >>> >>>>>>>> community
>>> >>> >>>>>>>> are nice and don't enjoy telling other people no, but it is
>>> >>> >>>>>>>> often
>>> >>> >>>>>>>> more
>>> >>> >>>>>>>> annoying to a contributor to not know anything than getting
>>> >>> >>>>>>>> a
>>> >>> >>>>>>>> no.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>> >>> >>>>>>>> <[hidden email]>
>>> >>> >>>>>>>> wrote:
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement
>>> >>> >>>>>>>>> Proposal"
>>> >>> >>>>>>>>> process that
>>> >>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I
>>> >>> >>>>>>>>> don't
>>> >>> >>>>>>>>> think
>>> >>> >>>>>>>>> committers are trying to minimize their own work -- every
>>> >>> >>>>>>>>> committer
>>> >>> >>>>>>>>> cares
>>> >>> >>>>>>>>> about making the software useful for users. However, it is
>>> >>> >>>>>>>>> always
>>> >>> >>>>>>>>> hard to
>>> >>> >>>>>>>>> get user input and so it helps to have this kind of
>>> >>> >>>>>>>>> process.
>>> >>> >>>>>>>>> I've
>>> >>> >>>>>>>>> certainly
>>> >>> >>>>>>>>> looked at the *IPs a lot in other software I use just to
>>> >>> >>>>>>>>> see
>>> >>> >>>>>>>>> the
>>> >>> >>>>>>>>> biggest
>>> >>> >>>>>>>>> things on the roadmap.
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> When you're talking about "changing interfaces", are you
>>> >>> >>>>>>>>> talking
>>> >>> >>>>>>>>> about
>>> >>> >>>>>>>>> public or internal APIs? I do think many people hate
>>> >>> >>>>>>>>> changing
>>> >>> >>>>>>>>> public APIs
>>> >>> >>>>>>>>> and I actually think that's for the best of the project.
>>> >>> >>>>>>>>> That's
>>> >>> >>>>>>>>> a
>>> >>> >>>>>>>>> technical
>>> >>> >>>>>>>>> debate, but basically, the worst thing when you're using a
>>> >>> >>>>>>>>> piece
>>> >>> >>>>>>>>> of
>>> >>> >>>>>>>>> software
>>> >>> >>>>>>>>> is that the developers constantly ask you to rewrite your
>>> >>> >>>>>>>>> app
>>> >>> >>>>>>>>> to
>>> >>> >>>>>>>>> update to a
>>> >>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue
>>> >>> >>>>>>>>> anyone
>>> >>> >>>>>>>>> who's used
>>> >>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their
>>> >>> >>>>>>>>> code
>>> >>> >>>>>>>>> this
>>> >>> >>>>>>>>> release" model works well within a single large company,
>>> >>> >>>>>>>>> but
>>> >>> >>>>>>>>> doesn't work
>>> >>> >>>>>>>>> well for a community, which is why nearly all *very* widely
>>> >>> >>>>>>>>> used
>>> >>> >>>>>>>>> programming
>>> >>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>>> >>> >>>>>>>>> Windows
>>> >>> >>>>>>>>> API, etc)
>>> >>> >>>>>>>>> almost *never* break backwards compatibility. All this is
>>> >>> >>>>>>>>> done
>>> >>> >>>>>>>>> within reason
>>> >>> >>>>>>>>> though, e.g. we do change things in major releases (2.x,
>>> >>> >>>>>>>>> 3.x,
>>> >>> >>>>>>>>> etc).
>>> >>> >>>>>>>>
>>> >>> >>>>>>>>
>>> >>> >>>>>>>>
>>> >>> >>>>>>>>
>>> >>> >>>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>> ---------------------------------------------------------------------
>>> >>> >>>>>> To unsubscribe e-mail: [hidden email]
>>> >>> >>>>>>
>>> >>> >>>>>
>>> >>> >>>>>
>>> >>> >>>>>
>>> >>> >>>>> --
>>> >>> >>>>> Stavros Kontopoulos
>>> >>> >>>>> Senior Software Engineer
>>> >>> >>>>> Lightbend, Inc.
>>> >>> >>>>> p:  +30 6977967274
>>> >>> >>>>> e: [hidden email]
>>> >>> >>>>>
>>> >>> >>>>>
>>> >>> >>>>
>>> >>> >>>
>>> >>> >>
>>> >>> >>
>>> >>>
>>> >>
>>> >
>>> >
>>>
>>>
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix


