RDD MLLib Deprecation Question

John Compitello
Hey all,

I see on the MLlib website that there are plans to deprecate the RDD-based API for MLlib once the new ML API reaches feature parity with the RDD-based one. Are there currently plans to reimplement all the distributed linear algebra and matrix operations as part of this new API, or are these things just going away? For example, will there still be a BlockMatrix class for distributed multiplies?

Best,

John



---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: RDD MLLib Deprecation Question

Nick Pentreath
The short answer is those distributed linalg parts will not go away.

In the medium term, it's much less likely that the distributed matrix classes will be ported over to DataFrames, given the time and effort it has taken just to port the various ML models and feature transformers over to ML (though the ideal would be to have DataFrame-backed distributed matrix classes).

The current distributed matrices use the old mllib linear algebra primitives for their backing data structures and ops, so those will have to be ported at some point to the ml package vectors and matrices, though I would expect overall functionality to remain the same initially.

https://issues.apache.org/jira/browse/SPARK-15882 discusses some of the ideas. A decision would still need to be made on the higher-level API: whether it stays the same as the current one, becomes DF-based, and/or changes in other ways.
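
For readers less familiar with what a "distributed multiply" over a block-partitioned matrix involves, here is a minimal single-machine sketch in plain Python. It is not Spark's implementation and uses no Spark APIs. It only mirrors the general idea of storing a matrix as sub-blocks keyed by (block row, block column), multiplying matching blocks, and summing the partial products. All function names (`to_blocks`, `block_multiply`, `from_blocks`, and the local helpers) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def to_blocks(matrix, block_size):
    """Split a dense matrix (list of rows) into blocks keyed by (block_row, block_col)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    blocks = {}
    for i in range(0, n_rows, block_size):
        for j in range(0, n_cols, block_size):
            blocks[(i // block_size, j // block_size)] = [
                row[j:j + block_size] for row in matrix[i:i + block_size]
            ]
    return blocks


def local_multiply(a, b):
    """Plain dense multiply of two small local blocks."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def local_add(a, b):
    """Element-wise sum of two blocks of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(left, right):
    """Multiply two block-partitioned matrices.

    Left block (i, k) pairs with every right block (k, j); the partial
    products for the same (i, j) are summed. In a distributed setting the
    pairing would be a join on k and the summation a reduce by key.
    """
    right_by_row = defaultdict(list)
    for (k, j), block in right.items():
        right_by_row[k].append((j, block))

    result = {}
    for (i, k), a_block in left.items():
        for j, b_block in right_by_row.get(k, []):
            partial = local_multiply(a_block, b_block)
            if (i, j) in result:
                result[(i, j)] = local_add(result[(i, j)], partial)
            else:
                result[(i, j)] = partial
    return result


def from_blocks(blocks, block_size, n_rows, n_cols):
    """Reassemble a dense matrix from its blocks."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for (bi, bj), block in blocks.items():
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                out[bi * block_size + r][bj * block_size + c] = value
    return out


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
product_blocks = block_multiply(to_blocks(A, 2), to_blocks(B, 2))
product = from_blocks(product_blocks, 2, 3, 3)
```

In Spark itself, the corresponding operation is `BlockMatrix.multiply` in the RDD-based `mllib.linalg.distributed` package, where the blocks are local mllib matrices keyed by block coordinates.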

On Tue, 30 May 2017 at 15:33 John Compitello <[hidden email]> wrote:
> Hey all,
>
> I see on the MLlib website that there are plans to deprecate the RDD-based API for MLlib once the new ML API reaches feature parity with the RDD-based one. Are there currently plans to reimplement all the distributed linear algebra and matrix operations as part of this new API, or are these things just going away? For example, will there still be a BlockMatrix class for distributed multiplies?
>
> Best,
>
> John


