MatrixUDT and VectorUDT in Spark ML


MatrixUDT and VectorUDT in Spark ML

Li Jin
Hi All,

I came across two types, MatrixUDT and VectorUDT, in Spark ML while doing feature extraction and preprocessing with PySpark. However, when I tried some basic operations, such as vector multiplication and matrix multiplication, I had to fall back to a Python UDF.

It seems to me it would be very useful to have built-in operators on these types, just like first-class Spark SQL types, e.g.,

df.withColumn('v', df.matrix_column * df.vector_column)

I wonder what other people's thoughts are on this?

Li

RE: MatrixUDT and VectorUDT in Spark ML

Himanshu Mohan

I agree

Thanks

Himanshu

From: Li Jin [mailto:[hidden email]]
Sent: Friday, March 23, 2018 8:24 PM
To: dev <[hidden email]>
Subject: MatrixUDT and VectorUDT in Spark ML

 


Re: MatrixUDT and VectorUDT in Spark ML

Dongjin Lee
How is this issue going? Is there any Jira ticket about this?

Thanks,
Dongjin

On Sat, Mar 24, 2018 at 1:39 PM, Himanshu Mohan <[hidden email]> wrote:

I agree

--
Dongjin Lee

A hitchhiker in the mathematical world.

Re: MatrixUDT and VectorUDT in Spark ML

Li Jin
Please see https://issues.apache.org/jira/browse/SPARK-24258
On Wed, May 30, 2018 at 10:40 PM Dongjin Lee <[hidden email]> wrote:
How is this issue going? Is there any Jira ticket about this?