Will higher order functions in spark SQL be pushed upstream?

5 messages

Will higher order functions in spark SQL be pushed upstream?

Antoine HOM
Hey guys,

Databricks released higher order functions as part of their runtime
3.0 beta (https://databricks.com/blog/2017/05/24/working-with-nested-data-using-higher-order-functions-in-sql-on-databricks.html),
which help with working with arrays within SQL statements.

* As a heavy user of complex data types I was wondering if there was
any plan to push those changes upstream?
* In addition, I was wondering if as part of this change it also tries
to solve the column pruning / filter pushdown issues with complex
datatypes?

Thanks!
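For context, the array higher-order functions described in the linked blog post (e.g. `transform(values, v -> v + 1)`) can be sketched in plain Python. This is only a semantic sketch, not Spark code; the function names mirror the SQL builtins shown in the post, and the real feature evaluates the lambdas inside Spark's SQL engine with codegen:

```python
# Semantic sketch of SQL array higher-order functions (not Spark code).

def transform(arr, f):
    """SQL: transform(arr, x -> f(x)) -- apply f to every array element."""
    return [f(x) for x in arr]

def filter_(arr, pred):
    """SQL: filter(arr, x -> pred(x)) -- keep elements matching pred."""
    return [x for x in arr if pred(x)]

# e.g. SELECT transform(values, v -> v + 1) FROM nested_data
row = {"key": "a", "values": [1, 2, 3]}
print(transform(row["values"], lambda v: v + 1))    # [2, 3, 4]
print(filter_(row["values"], lambda v: v % 2 == 1)) # [1, 3]
```

The point of the feature is that the lambda runs per array element without having to explode the array, group back, and re-aggregate.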

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Will higher order functions in spark SQL be pushed upstream?

Olivier Girardot
+1 for the question





--
Olivier Girardot | Partner

Re: Will higher order functions in spark SQL be pushed upstream?

Sameer Agarwal
In reply to this post by Antoine HOM
* As a heavy user of complex data types I was wondering if there was
any plan to push those changes upstream?

Yes, we intend to contribute this to open source.
 
* In addition, I was wondering if as part of this change it also tries
to solve the column pruning / filter pushdown issues with complex
datatypes?

For Parquet, this effort is primarily tracked via SPARK-4502 (see https://github.com/apache/spark/pull/16578) and is currently targeted for 2.3.

Re: Will higher order functions in spark SQL be pushed upstream?

Antoine HOM
Good news :) Thx Sameer.



Re: Will higher order functions in spark SQL be pushed upstream?

DB Tsai
Hello,

On Netflix's algorithms team, we work a lot on ranking problems, where
we naturally deal with datasets containing nested lists of structs. We
built Scala APIs like map, filter, drop, and withColumn that can work
on nested lists of structs efficiently, using SQL expressions with
codegen.
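To illustrate the kind of operation being described, here is a small sketch (in Python for brevity; the actual proposal is a set of Scala APIs, see SPARK-22231 below). The row shape and field names here are hypothetical; the idea is mapping/filtering over a nested list of structs in place, without exploding and re-grouping:

```python
# Hypothetical sketch of map/filter over a nested list of structs.
# A "struct" is modeled as a dict; a row holds a nested list of structs.

def map_nested(row, field, f):
    """Apply f to each struct in row[field], returning a new row."""
    return {**row, field: [f(s) for s in row[field]]}

def filter_nested(row, field, pred):
    """Keep only the structs in row[field] that match pred."""
    return {**row, field: [s for s in row[field] if pred(s)]}

row = {
    "user": "u1",
    "impressions": [
        {"title": "t1", "rank": 3},
        {"title": "t2", "rank": 1},
    ],
}

# Bump every rank by one, then keep only the top-ranked impressions.
boosted = map_nested(row, "impressions",
                     lambda s: {**s, "rank": s["rank"] + 1})
top = filter_nested(row, "impressions", lambda s: s["rank"] <= 1)
```

In the proposed Scala APIs, such transformations compile down to SQL expressions with codegen rather than running as opaque user functions.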

Here is our proposal for what the APIs would look like; we would like
to socialize it with the community and get more feedback!

https://issues.apache.org/jira/browse/SPARK-22231

It would be great to share some building blocks with Databricks'
higher-order function feature.

Thanks.


--
Sincerely,

DB Tsai
----------------------------------------------------------
PGP Key ID: 0x5CED8B896A6BDFA0
