New to dev community | Contribution to Mlib

Venali Sonone

Hello,

I am new to the Spark dev community, and to open source in general, but I have used Spark extensively.
I want to build a complete anomaly detection component for Spark MLlib.
To that end, could someone guide me so that I can start development and contribute to Spark MLlib?

Sorry if this sounds naive; any help is appreciated.

Cheers!
-venna

Re: New to dev community | Contribution to Mlib

Seth Hendrickson
I'm not exactly clear on what you're proposing, but this sounds like something that would live as a Spark package: a framework for anomaly detection built on Spark. If you have a specific algorithm in mind, it would be good to propose it on JIRA and discuss why you think it needs to be included in Spark rather than live as a Spark package.

In general, there will probably be resistance to including new algorithms in Spark ML, especially until the ML package has reached full parity with MLlib. Still, if you can provide more details, that will help us understand what is best here.

Re: New to dev community | Contribution to Mlib

Venali Sonone
Thank you for your response.

The algorithm I am proposing is Isolation Forest.
Link to paper: paper. I believe it should be included in Spark ML because many industry applications that use Spark as part of a real-time streaming engine need anomaly detection, and Spark ML currently supports it to some extent by means of clustering. I will probably start on the implementation and prepare a proposal as you suggested.

It is interesting to learn that Spark ML is still being extended to reach full parity with MLlib. Could I be put in touch with the people working on that? I am interested in contributing. I have been a heavy user of Spark since summer 2015.

Cheers!
-Venali

Re: New to dev community | Contribution to Mlib

Driesprong, Fokko
Hi Venna,

Sounds like a very interesting algorithm. I have to agree with Seth: you don't want to add a lot of algorithms to Spark itself, because that blows up the codebase and the tests end up running forever. You can also consider publishing it on the Spark Packages website. I've also published an outlier detection package over there:

Cheers, Fokko
