Spark DataFrame UNPIVOT feature

Ivan Gozali
Hi there,

I was looking into why the UNPIVOT feature isn't implemented, given that Spark already has PIVOT implemented natively in the DataFrame/Dataset API.

I came across this JIRA, which talks about implementing PIVOT in Spark 1.6 but makes no mention whatsoever of UNPIVOT, even though the JIRA curiously references a blog post that talks about both PIVOT and UNPIVOT :)

Is this because UNPIVOT simply amounts to generating multiple slim tables, one per selected column, and taking the union of all of them?
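
To make the "slim tables plus union" idea concrete, here is a minimal plain-Python sketch rather than Spark code. Rows are modelled as dicts, and every name in it (`unpivot_by_union`, `wide`, the `variable`/`value` output columns) is made up for illustration. On a Spark DataFrame the analogous building blocks would presumably be `select` and `union`, one `select` per value column.

```python
# Plain-Python sketch of "unpivot as a union of slim tables".
# All names (unpivot_by_union, wide, "variable", "value") are illustrative.

def unpivot_by_union(rows, id_cols, value_cols):
    """Build one slim table per value column, then concatenate them."""
    result = []
    for col in value_cols:
        # Slim table: the id columns plus a single value column,
        # tagged with the name of the column it came from.
        slim = [
            {**{k: row[k] for k in id_cols}, "variable": col, "value": row[col]}
            for row in rows
        ]
        # The "union" step: plain concatenation, duplicates are kept.
        result.extend(slim)
    return result


# Toy input: one id column and two value columns.
wide = [
    {"id": 1, "q1": 10, "q2": 20},
    {"id": 2, "q1": 30, "q2": 40},
]

long_rows = unpivot_by_union(wide, id_cols=["id"], value_cols=["q1", "q2"])
for row in long_rows:
    print(row)
```

Note that this formulation walks the input once per value column, which is one reason a dedicated operator might be preferable to spelling out the union by hand.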

Thank you!

--
Regards,


Ivan Gozali
Lecida
Re: Spark DataFrame UNPIVOT feature

rxin
Probably just because it is not used that often and nobody has submitted a patch for it. I've used pivot probably on average once a week (primarily in spreadsheets), but I've never used unpivot ...


On Tue, Aug 21, 2018 at 3:06 PM Ivan Gozali <[hidden email]> wrote:

Re: Spark DataFrame UNPIVOT feature

Mike Hynes
Hi Reynold/Ivan,

People familiar with pandas and R dataframes will likely have used the dataframe "melt" idiom, which I believe is the functionality you are referring to.

I have had to write this function myself in my own work in Spark SQL, as it is a common step in data wrangling when you do not control the structure of the input dataframes you are working with in your pipelines.
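
For readers who have not met "melt" before, the following is a rough plain-Python sketch of its semantics. It is not the function described above and not Spark code. The parameter names (`id_vars`, `value_vars`, `var_name`, `value_name`) are borrowed from the pandas `melt` signature; the data and everything else are assumptions made for illustration. Unlike a union of slim tables, it emits all output rows for an input row in a single pass.

```python
# Plain-Python sketch of "melt" semantics; rows are modelled as dicts.
# Parameter names follow pandas.melt; the data below is invented.

def melt(rows, id_vars, value_vars, var_name="variable", value_name="value"):
    """Yield one long-format row per (input row, value column) pair."""
    for row in rows:
        # Columns that identify the record are copied to every output row.
        ids = {k: row[k] for k in id_vars}
        for col in value_vars:
            # Each value column becomes its own row: (ids, column name, value).
            yield {**ids, var_name: col, value_name: row[col]}


wide = [
    {"key": "a", "x": 1, "y": 2},
    {"key": "b", "x": 3, "y": 4},
]

long_rows = list(
    melt(wide, id_vars=["key"], value_vars=["x", "y"],
         var_name="field", value_name="val")
)
for row in long_rows:
    print(row)
```

The output keeps the rows for each input record together, so the order differs from the union-based formulation even though the set of rows is the same.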

I would hence second Ivan that adding it as a native dataframe method would no doubt be helpful (and for what it's worth, so would other concepts from the pandas API, such as named indexing & multilevel indexing).

Cheers,
Mike 




On Tue, Aug 21, 2018, 5:07 PM Reynold Xin, <[hidden email]> wrote:

Re: Spark DataFrame UNPIVOT feature

zero323
In reply to this post by rxin
Given the popularity of related SO questions:


On Wed, 22 Aug 2018 at 00:07, Reynold Xin <[hidden email]> wrote: