Feedback on MLlib roadmap process proposal


Feedback on MLlib roadmap process proposal

Joseph Bradley
Hi all,

This is a general call for thoughts about the process for the MLlib roadmap proposed in SPARK-18813.  See the section called "Roadmap process."

Summary:
* This process is about committers indicating intention to shepherd and review.
* The goal is to improve visibility and communication.
* This is fairly orthogonal to the SIP discussion since this proposal is more about setting release targets than about proposing future plans.

Thanks!
Joseph

--

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

http://databricks.com


Re: Feedback on MLlib roadmap process proposal

Seth Hendrickson
I think the proposal laid out in SPARK-18813 is well done, and I do think it is going to improve the process going forward. I also really like the idea of getting the community to vote on JIRAs to give some of them priority - provided that we listen to those votes, of course. The biggest problem I see is that we have several active contributors who want to help implement these changes, but PRs are reviewed rather sporadically, and I imagine it is very difficult for contributors to understand why some get reviewed and some do not. The most important thing we can do, given that MLlib currently has very limited committer review bandwidth, is to make clear which issues, if worked on, will definitely get reviewed. That is a hard thing to do in open source, no doubt, but even if we have to limit the scope of such issues to a very small subset, I think it's a gain for all.

On a related note, I would love to hear some discussion on the higher-level goal of Spark MLlib (if this derails the original discussion, please let me know and we can discuss in another thread). The roadmap does contain specific items that help to convey some of this (ML parity with MLlib, model persistence, etc.), but I'm interested in what the "mission" of Spark MLlib is. We often see PRs for brand new algorithms which are sometimes rejected and sometimes not. Do we aim to keep implementing more and more algorithms? Or is our focus really, now that we have a reasonable library of algorithms, to simply make the existing ones faster/better/more robust? Should we aim to make interfaces that are easily extended, so that developers can implement their own custom code (e.g. custom optimization libraries), or do we want to restrict things to out-of-the-box algorithms? Should we focus on more flexible, general abstractions like distributed linear algebra?
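For reference, the kind of distributed linear algebra abstraction mentioned above already exists in spark.mllib. Below is a minimal sketch of that API; the object name, app name, and toy entries are illustrative assumptions, not anything from the roadmap.

```scala
// Minimal sketch of spark.mllib's distributed linear algebra API (BlockMatrix).
// The toy entries and app name are illustrative only.
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

object DistributedLinalgSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("distributed-linalg-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Sparse entries of a small matrix, stored as (row, col, value) triples.
    val entries = sc.parallelize(Seq(
      MatrixEntry(0L, 0L, 1.0), MatrixEntry(1L, 1L, 2.0), MatrixEntry(2L, 0L, 3.0)))

    // Convert to a BlockMatrix so that multiplication is distributed across the cluster.
    val a = new CoordinateMatrix(entries).toBlockMatrix().cache()
    val gram = a.transpose.multiply(a) // distributed A^T * A

    println(s"Gram matrix dimensions: ${gram.numRows()} x ${gram.numCols()}")
    spark.stop()
  }
}
```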

I was not involved in the project in the early days of MLlib when this discussion may have happened, but I think it would be useful to either revisit it or restate it here for some of the newer developers. 


Re: Feedback on MLlib roadmap process proposal

Mingjie Tang
+1 for general abstractions like distributed linear algebra.


Re: Feedback on MLlib roadmap process proposal

Felix Cheung
Hi Seth

Re: "The most important thing we can do, given that MLlib currently has a very limited committer review bandwidth, is to make clear issues that, if worked on, will definitely get reviewed. "

We are adopting a Shepherd model, as described in Joseph's JIRA: once assigned, the Shepherd will see the work through with the contributor to make sure it lands in the target release.

I'm sure Joseph can explain it better than I do ;)



Re: Feedback on MLlib roadmap process proposal

Joseph Bradley
Hi Seth,

The proposal is geared towards exactly the issue you're describing: providing more visibility into the capacity and intentions of committers.  If there are things you'd add or change to improve it further, it would be great to hear your ideas!  The past roadmap JIRA has some more background discussion which is worth looking at too.

Let's break off the MLlib mission discussion into another thread.  I'll start one now.

Thanks,
Joseph

--

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

http://databricks.com


RE: Feedback on MLlib roadmap process proposal

Ilya Matiach

Just a few questions with regards to the MLlib process:

  1. Is there a list of committers who can be/are shepherds and what code they own?  I've seen this page: http://spark.apache.org/committers.html but I'm not sure whether it is up to date, and it doesn't mention what code the committers own.  It would be useful to know who owns ML or MLlib.  From my limited personal experience this seems to be Joseph K. Bradley, Yanbo Liang and Sean Owen.
  2. Based on both user votes and watchers, the top issue currently is “SPARK-5575: Artificial neural networks for MLlib deep learning”.  However, it has been open for almost 2 years and not a lot of progress is being made.  There seem to be other top issues which aren't getting addressed either on the pages mentioned in the roadmap: MLlib, sorted by: Votes or Watchers.  Is my perception incorrect, or is there a very good reason for not addressing the top issues voted for by the community?  If there is a good reason, is there a way to filter such JIRAs out of the sorted lists, so we know which JIRAs really should be taken up/worked on?
  3. Also, this might be a newbie question, but for new contributors to Spark, is there a process for convincing a committer to be assigned to a JIRA that we are working on?  It would be useful if there were a clear threshold for whether a committer will decline a JIRA ahead of time, so contributors won't waste time working on issues that aren't important to Spark and can focus instead on the issues that the Spark committers would like us to fix.

Thank you, Ilya


Re: Feedback on MLlib roadmap process proposal

Sean Owen
On Tue, Jan 24, 2017 at 3:58 PM Ilya Matiach <[hidden email]> wrote:

Just a few questions with regards to the MLlib process:

  1. Is there a list of committers who can be/are shepherds and what code they own?  I've seen this page: http://spark.apache.org/committers.html but I'm not sure whether it is up to date, and it doesn't mention what code the committers own.  It would be useful to know who owns ML or MLlib.  From my limited personal experience this seems to be Joseph K. Bradley, Yanbo Liang and Sean Owen.
There is no such list because there's no formal notion of ownership or access to subsets of the project. Tracking an informal notion would be process mostly for its own sake, and probably just go out of date. We sort of tried this with 'maintainers' and it didn't actually do anything.

I am not active much in ML, but will occasionally help commit simple changes. What you see organically is pretty much what is, at any given time. People you see responding are the active ones, and influencers, commit bit or no.

 
  2. Based on both user votes and watchers, the top issue currently is “SPARK-5575: Artificial neural networks for MLlib deep learning”.  However, it has been open for almost 2 years and not a lot of progress is being made.  There seem to be other top issues which aren't getting addressed either on the pages mentioned in the roadmap: MLlib, sorted by: Votes or Watchers.  Is my perception incorrect, or is there a very good reason for not addressing the top issues voted for by the community?  If there is a good reason, is there a way to filter such JIRAs out of the sorted lists, so we know which JIRAs really should be taken up/worked on?
JIRA votes and watchers don't mean anything, formally. This isn't a product company where one group might give another group a list of top priorities to work on. There's a general statement about this at http://spark.apache.org/contributing.html under "Code Review Criteria". In practice, it's a soft process of convincing other people that change X does more good than harm, is worth the burden of supporting, matters to users, etc. I ignore the 80% of issues that don't seem to fit these criteria, and choose to help with the 20% that do, which are usually simple and/or important bug fixes.

ANNs? That's a tangent, but my snap reactions are:
- It's something Everybody wants Somebody Else to create, which may explain the votes vs. activity?
- There is actually one basic ANN implementation in Spark (see the sketch after this list).
- There are others outside Spark, so it may be something people get elsewhere, like dl4j or BigDL, or by strapping TF to Spark in various ways.
- DL is also not an obviously great fit for the data-parallel computation model here.
- It's not a goal to implement everything in Spark. It could be a good idea, but there's no need to tether it to the core project, to the exclusion of "unblessed" third-party packages.
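The existing implementation referred to above is presumably spark.ml's multilayer perceptron classifier. A minimal usage sketch follows, assuming Spark 2.x; the input path, layer sizes, and object name are illustrative assumptions rather than anything from this thread.

```scala
// Hypothetical usage sketch of spark.ml's MultilayerPerceptronClassifier.
// The LIBSVM input path and layer sizes below are illustrative assumptions.
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.sql.SparkSession

object MlpSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("mlp-sketch").getOrCreate()

    // Expects a DataFrame with "label" and "features" columns.
    val data = spark.read.format("libsvm").load("path/to/multiclass_data.txt")

    val mlp = new MultilayerPerceptronClassifier()
      .setLayers(Array(4, 8, 3)) // input features, one hidden layer, output classes
      .setMaxIter(100)

    val model = mlp.fit(data)
    model.transform(data).select("features", "prediction").show(5)

    spark.stop()
  }
}
```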

 
  3. Also, this might be a newbie question, but for new contributors to Spark, is there a process for convincing a committer to be assigned to a JIRA that we are working on?  It would be useful if there were a clear threshold for whether a committer will decline a JIRA ahead of time, so contributors won't waste time working on issues that aren't important to Spark and can focus instead on the issues that the Spark committers would like us to fix.

No, there's no concept of being tasked to work on something by someone else here. I can't imagine we could establish a clear objective threshold for such a subjective thing. 

It's not a satisfying answer but it is the most realistic one. All of these OSS projects work on soft power, persuasion and cooperation. I think the good news is that all the intuitive ways to gain soft power do work: give time to others' problems if you want time on your own, help review, make thoughtful careful changes, etc.

My general guidance is: don't bother doing significant feature work unless you have some clear buy-in from someone who can commit.

I completely agree that issues should be closed more aggressively, for the reason you give. On the flip side, this often ruffles feathers. We are still overrun with issues, but the culture has gotten a lot better about honestly rejecting lots of inbound stuff quickly.
 

Re: Feedback on MLlib roadmap process proposal

Cody Koeninger-2
Totally agree with most of what Sean said, just wanted to give an alternate take on the "maintainers" thing.

On Tue, Jan 24, 2017 at 10:23 AM, Sean Owen <[hidden email]> wrote:
> There is no such list because there's no formal notion of ownership or
> access to subsets of the project. Tracking an informal notion would be
> process mostly for its own sake, and probably just go out of date. We sort
> of tried this with 'maintainers' and it didn't actually do anything.
>

My perception of that situation is that the Apache process is actively antagonistic towards factoring out responsibility for particular parts of the code into a hierarchy.  I think if Spark were under a different open source model, with otherwise exactly the same committers, that attempt at identifying maintainers would have worked out differently.



RE: Feedback on MLlib roadmap process proposal

Ilya Matiach

Thanks Sean, this is a really helpful overview, and it contains good guidance for new contributors to ML/MLlib.

My confusion was that the ML 2.2 roadmap critical features (https://issues.apache.org/jira/browse/SPARK-18813) did not line up with the top ML/MLlib JIRAs by Votes or Watchers.

Your explanation that they do not have to, and that there is a more complex process for choosing the changes that will make it into the next release, makes sense to me.

My only humble recommendation would be to clean up the top JIRAs by closing the ones which already have Spark packages for them (e.g. the NN one, which already has several packages as you explained), noting or somehow marking the ones that will not be resolved, and changing the component on the ones not related to ML/MLlib (e.g. https://issues.apache.org/jira/browse/SPARK-12965).

Also, I would love to do this if I had the permissions, but it would be great to update the JIRAs that are marked as “in progress” but where the corresponding pull request was closed/cancelled, for example https://issues.apache.org/jira/browse/SPARK-4638.  That JIRA is actually one of the top ones by number of watchers (adding kernels like the radial basis function to SVM, and I can imagine why it's one of the top ones), and seeing it marked as in progress with a pull request is somewhat confusing.  I've seen several other JIRAs similar to this one, where the pull request was closed but the JIRA status was not updated; if the pull request was closed for a good reason, the corresponding JIRA should probably be closed as well.

Thank you, Ilya

 

 

 


Re: Feedback on MLlib roadmap process proposal

Sean Owen
On Wed, Jan 25, 2017 at 6:01 AM Ilya Matiach <[hidden email]> wrote:

My confusion was that the ML 2.2 roadmap critical features (https://issues.apache.org/jira/browse/SPARK-18813) did not line up with the top ML/MLlib JIRAs by Votes or Watchers.

Your explanation that they do not have to and there is a more complex process to choosing the changes that will make it into the next release makes sense to me.


For Spark ML, Joseph is the de facto leader and does publish a tentative roadmap. (We could also use JIRA mechanisms for this, but any scheme is better than none.) Yes, not based on Votes -- nothing here is. Votes are a noisy signal because they usually measure: what would you like done if you didn't have to do it and there were no downsides for you?

 

My only humble recommendation would be to clean up the top JIRAs by closing the ones which already have Spark packages for them, noting or somehow marking the ones that will not be resolved, and changing the component on the ones not related to ML/MLlib.

We do that. It occasionally generates protests, so I find myself erring on the side of ignoring them. You can comment on any JIRA you think should be closed. That's helpful.

That particular JIRA seems potentially legitimate. I wouldn't close it. It also won't get fixed until someone proposes a resolution. I'd strongly encourage people saying "I have this problem too" to try to fix it. I tend to ignore these otherwise, myself, in favor of reviewing ones where someone has gone to the trouble of proposing a working fix.

 

Also, I would love to do this if I had the permissions, but it would be great to change the JIRAs that are marked as “in progress” but where the corresponding pull request was closed/cancelled, for example https://issues.apache.org/jira/browse/SPARK-4638.  That JIRA is


Yes, flag these. I or others can close them if appropriate. Anyone who consistently does this well, we could give JIRA permissions to.

Opening a PR automatically makes it "In Progress" but there's no complementary process to un-mark it. You can ignore the Open / In Progress distinction really.

This one is interesting because it does seem like a plausible feature to add. The original PR was abandoned by the author and nobody else submitted one -- despite the Votes. I hesitate to signal that no PRs would be considered, but it doesn't seem to be in demand enough for someone to work on it?


I think one of my messages is that here, de facto, as in many Apache projects, committers do not take requests. They pursue the work they believe needs doing, and shepherd work initiated by others (a clear bug report, a PR) to a resolution. Things get done by doing them, or by building influence by doing other things the project needs doing. It isn't a mechanical, objective process, and can't be. But it does work in a recognizable way.

Re: Feedback on MLlib roadmap process proposal

Joseph Bradley
Sean has given a great explanation.  A few more comments:

Roadmap: I have been creating roadmap JIRAs, but the goal really is to have all committers working on MLlib help to set that roadmap, based either on their knowledge of the current maintenance/internal needs of the project or on feedback from the rest of the community.
@Committers - I see people actively shepherding PRs for MLlib, but I don't see many major initiatives linked to the roadmap.  If there are ones large enough to merit adding to the roadmap, please do.

In general, there are many process improvements we could make.  A few in my mind are:
* Visibility: Let the community know what committers are focusing on.  This was the primary purpose of the "MLlib roadmap proposal."
* Community initiatives: This is currently very organic.  Some of the organic process could be improved, such as encouraging Votes/Watchers (though I agree with Sean about these being one-sided metrics).  Cody's SIP work is a great step towards adding more clarity and structure for major initiatives.
* JIRA hygiene: Always a challenge, and always requires some manual prodding.  But it's great to push for efforts on this.


--

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

http://databricks.com


Re: Feedback on MLlib roadmap process proposal

Nick Pentreath
Sorry for being late to the discussion. I think Joseph, Sean and others have covered the issues well. 

Overall I like the proposed cleaned up roadmap & process (thanks Joseph!). As for the actual critical roadmap items mentioned on SPARK-18813, I think it makes sense and will comment a bit further on that JIRA.

I would like to encourage votes & watching for issues to get a sense of what the community wants (I guess a Vote is more explicit yet passive, while actually Watching an issue is more informative, as it may indicate a real use case that depends on the issue).

I think if used well this is valuable information for contributors. Of course not everything on that list can get done. But if I look through the top votes or watch list, while not all of those are likely to go in, a great many of the issues are fairly non-contentious in terms of being good additions to the project.

Things like these are good examples IMO (I just sample a few of them, not exhaustive):
- sample weights for RF / DT
- multi-model and/or parallel model selection
- make sharedParams public?
- multi-column support for various transformers
- incremental model training
- tree algorithm enhancements

Now, whether these can be prioritised given the bandwidth available to reviewers and committers is a totally different thing. But as Sean mentions, there is some process for balancing whether the issue is a "good thing to add", whether there is a shepherd with the bandwidth & interest to review it, and the maintenance burden imposed.

Let's take Deep Learning / NN as an example. Here's a good example of something that has a lot of votes/watchers and, as Sean mentions, is something that "everyone wants someone else to implement". In this case, much of the interest may in fact be "stale" - 2 years ago it would have been very interesting to have a strong DL impl in Spark. Now that there is a plethora of very good DL libraries out there, how many of those Votes would be "deleted"? Granted, few are well integrated with Spark, but that can change and is changing (DL4J, BigDL, the "XonSpark" flavours etc).

So this is something that I dare say will not be in Spark any time in the foreseeable future, or perhaps ever given the current status. Perhaps it's worth seriously thinking about just closing these kinds of issues?





Re: Feedback on MLlib roadmap process proposal

Tim Hunter
As Sean wrote very nicely above, the changes made to Spark are decided in an organic fashion based on the interests and motivations of the committers and contributors. The case of deep learning is a good example. There is a lot of interest, and the core algorithms could be implemented without too much trouble in a few thousand lines of Scala code. However, the performance of such a simple implementation would be one to two orders of magnitude slower than what you would get from the popular frameworks out there.

At this point, there are probably more man-hours invested in TensorFlow (as an example) than in MLlib, so I think we need to be realistic about what we can expect to achieve inside Spark. Unlike BLAS for linear algebra, there is no agreed-upon interface for deep learning, and each of the XOnSpark flavors explores a slightly different design. It will be interesting to see what works well in practice. In the meantime, though, there are plenty of things we could do to help developers of other libraries have a great experience with Spark. Matei alluded to that in his Spark Summit keynote when he mentioned better integration with low-level libraries.

Tim





Re: Feedback on MLlib roadmap process proposal

Nick Pentreath
FYI I've started going through a few of the top Watched JIRAs and tried to identify those that are obviously stale and can probably be closed, to try to clean things up a bit.
