[ML] Migrating transformers from mllib to ml

Marco Gaido
Hello,

I saw that there are several TODOs to migrate some transformers (like HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of converting them to the mllib ones and back.
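For illustration, the round trip in question can be sketched as below. This is a minimal sketch, not the code of any particular transformer: the object name `VectorRoundTrip` and the helper `roundTrip` are hypothetical, and it assumes Spark 2.x with `spark-mllib` on the classpath. Only `Vectors.fromML` and `asML` are existing Spark calls.

```scala
import org.apache.spark.ml.linalg.{Vector => NewVector, Vectors => NewVectors}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

// Hypothetical helper object, for illustration only.
object VectorRoundTrip {

  // Convert an ml.Vector to its mllib counterpart and back again.
  // A transformer that delegates to an mllib implementation pays
  // this kind of conversion for every row it processes.
  def roundTrip(v: NewVector): NewVector = {
    val old: OldVector = OldVectors.fromML(v) // ml -> mllib
    // ... an mllib-based implementation would operate on `old` here ...
    old.asML                                  // mllib -> ml
  }

  def main(args: Array[String]): Unit = {
    val v = NewVectors.sparse(4, Array(0, 3), Array(1.0, 2.0))
    // The round trip preserves the vector's contents.
    assert(roundTrip(v) == v)
  }
}
```

A transformer written directly against `ml.Vector` would skip both conversion calls.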

Is there any reason why this has not been done so far? Is it to avoid code duplication? If so, is it still an issue, given that we are going to deprecate mllib starting with 2.3 (at least this is what I read in the Spark docs)? If not, I can work on this.

Thanks,
Marco


Re: [ML] Migrating transformers from mllib to ml

颜发才(Yan Facai)
Hi, I have migrated HashingTF from mllib to ml, and the change is waiting for review.

see:
[SPARK-21748][ML] Migrate the implementation of HashingTF from MLlib to ML #18998
https://github.com/apache/spark/pull/18998



On Mon, Nov 6, 2017 at 10:58 PM, Marco Gaido <[hidden email]> wrote:
Hello,

I saw that there are several TODOs to migrate some transformers (like HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of converting them to the mllib ones and back.

Is there any reason why this has not been done so far? Is it to avoid code duplication? If so, is it still an issue, given that we are going to deprecate mllib starting with 2.3 (at least this is what I read in the Spark docs)? If not, I can work on this.

Thanks,
Marco



Re: [ML] Migrating transformers from mllib to ml

Joseph Bradley
Hi, we do still want to do this migration; it has just been stalled by limited bandwidth. A few feature-parity items still need to be completed, so the deprecation will likely not happen until after 2.3.
Joseph

On Tue, Nov 7, 2017 at 12:38 AM, 颜发才(Yan Facai) <[hidden email]> wrote:
Hi, I have migrated HashingTF from mllib to ml, and the change is waiting for review.

see:
[SPARK-21748][ML] Migrate the implementation of HashingTF from MLlib to ML #18998
https://github.com/apache/spark/pull/18998



--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.
http://databricks.com