[DISCUSS] Support year-month and day-time Intervals

[DISCUSS] Support year-month and day-time Intervals

Dr. Kent Yao
Hi, Devs

I’d like to propose adding two new interval types, year-month and day-time
intervals, for better ANSI support and future improvements. We will keep the
current CalendarIntervalType but mark it as deprecated until we find the
right time to remove it completely. Backward compatibility for usages of the
old interval type in 2.4 will be guaranteed.
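
To make the difference between the types concrete, here is a minimal Python
sketch, not Spark code. It assumes the legacy value is a (months, days,
microseconds) triple, as in the fields of Spark 3.0's CalendarInterval, and
models the proposed year-month and day-time types as a single month count and
a single microsecond count. The names `YearMonthInterval`, `DayTimeInterval`
and `split` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


@dataclass(frozen=True)
class CalendarInterval:
    """Legacy-style interval: three independent fields."""
    months: int
    days: int
    microseconds: int


@dataclass(frozen=True)
class YearMonthInterval:
    """Hypothetical ANSI year-month interval: one count of months."""
    months: int


@dataclass(frozen=True)
class DayTimeInterval:
    """Hypothetical ANSI day-time interval: one count of microseconds."""
    microseconds: int


def split(ci: CalendarInterval) -> Tuple[YearMonthInterval, DayTimeInterval]:
    """Split a legacy interval into its year-month and day-time parts.

    A day is treated as exactly 24 hours, as in a day-time interval.
    """
    return (
        YearMonthInterval(ci.months),
        DayTimeInterval(ci.days * MICROS_PER_DAY + ci.microseconds),
    )


# 14 months, 2 days and 1 hour.
ym, dt = split(CalendarInterval(14, 2, 3_600_000_000))
```

Note that the split treats a day as exactly 24 hours, whereas the legacy days
field keeps a calendar day separate from 24 hours, which matters for
timestamp arithmetic across daylight-saving changes.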

Here is the design doc:

[SPIP] Support Year-Month and Day-Time Intervals -
https://docs.google.com/document/d/1JNRzcBk4hcm7k2cOXSG1A9U9QM2iNGQzBSXZzScUwAU/edit?usp=sharing

All comments are welcome!

Thanks,

Kent Yao




--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [DISCUSS] Support year-month and day-time Intervals

Dongjoon Hyun-2
Hi, Kent. 

Thank you for the proposal.

Does your proposal need to revert something from the master branch?
I'm just asking because it's not clear in the proposal document.

Bests,
Dongjoon.

On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao <[hidden email]> wrote:

Re: [DISCUSS] Support year-month and day-time Intervals

Dr. Kent Yao
Hi Dongjoon,

Yes. Since we want to make CalendarIntervalType deprecated, we would need to
revert the following; so far we have found:
1. The make_interval function, which produces legacy CalendarIntervalType values
2. The `interval` -> CalendarIntervalType support in the parser
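
For illustration, the following Python sketch models how a make_interval call
can fold its arguments into the three legacy fields. The argument list
(years, months, weeks, days, hours, mins, secs) is assumed to follow Spark
3.0's make_interval; the names `make_interval_model` and `LegacyInterval` are
hypothetical, and this is not Spark's implementation.

```python
from decimal import Decimal
from typing import NamedTuple

MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE


class LegacyInterval(NamedTuple):
    months: int
    days: int
    microseconds: int


def make_interval_model(years=0, months=0, weeks=0, days=0,
                        hours=0, mins=0, secs=Decimal(0)) -> LegacyInterval:
    # Years and months collapse into a single month count.
    total_months = years * 12 + months
    # Weeks and days collapse into a single (calendar) day count.
    total_days = weeks * 7 + days
    # Everything below a day becomes microseconds; Decimal keeps
    # fractional seconds exact.
    total_micros = (hours * MICROS_PER_HOUR
                    + mins * MICROS_PER_MINUTE
                    + int(Decimal(secs) * MICROS_PER_SECOND))
    return LegacyInterval(total_months, total_days, total_micros)


iv = make_interval_model(1, 2, 1, 3, 4, 5, Decimal("6.5"))
```

Because the result mixes a month count, a day count and a microsecond count
in one value, it maps onto the legacy type rather than onto either of the
proposed ANSI types.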

Thanks

Kent Yao
Data Science Center, Hangzhou Research Institute, Netease Corp.
PHONE: (86) 186-5715-3499

On 01/11/2020 01:57, [hidden email] wrote:

Re: [DISCUSS] Support year-month and day-time Intervals

Dongjoon Hyun-2
Thank you for the clarification.

Bests,
Dongjoon.

On Fri, Jan 10, 2020 at 10:07 AM Kent Yao <[hidden email]> wrote:

Re: [DISCUSS] Support year-month and day-time Intervals

rxin
Introducing a new data type has high overhead, both in terms of internal complexity and users' cognitive load. Introducing two data types would have even higher overhead.

I took a quick look, and it seems that both Redshift and Snowflake, two of the most successful recent SQL analytics systems, have only one interval type and don't support storing it. That makes me think that, in practice, storing the interval type is not that useful.

Do we really need to do this? One of the worst things we can do as a community is to introduce features that are almost never used but carry high internal maintenance complexity.



On Fri, Jan 10, 2020 at 10:45 AM, Dongjoon Hyun <[hidden email]> wrote:

Re: [DISCUSS] Support year-month and day-time Intervals

Dr. Kent Yao
Following ANSI might be a good option, but introducing two different interval
types would also be a serious user behavior change, so I agree with Reynold
that we should keep what we have done since version 1.5.0, just like
Snowflake and Redshift.

Perhaps we can make some efforts to make the current interval type more
future-proof, e.g.:
1. Add an unstable annotation to the CalendarInterval class. People already
use it as a UDF input, so it’s better to make it clear that it’s unstable.
2. Add a schema checker to prohibit creating v2 custom catalog tables with
intervals, the same as what we do for the built-in catalog.
3. Add a schema checker for DataFrameWriterV2 too.
4. Make the interval type incomparable, as in version 2.4, to avoid ambiguity
when comparing year-month and day-time fields.
5. The to_csv function newly added in 3.0 should not support outputting
intervals, the same as the CSV file format.
6. The to_json function should not allow an interval as a key field, the same
as for the value field and the JSON data source, with a legacy config to
restore the old behavior.
7. Revert the ISO/ANSI SQL standard interval output, since we have decided not
to follow ANSI and there would be no round trip.
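
Item 4 can be illustrated with a small Python sketch (hypothetical names, not
Spark code). A legacy interval with both a months field and a days field can
only be ordered by assuming a month length, and different assumptions give
different answers for the same pair of values:

```python
from typing import NamedTuple

MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


class LegacyInterval(NamedTuple):
    months: int
    days: int
    microseconds: int


def to_micros(iv: LegacyInterval, days_per_month: int) -> int:
    """Flatten an interval to microseconds under an assumed month length."""
    return ((iv.months * days_per_month + iv.days) * MICROS_PER_DAY
            + iv.microseconds)


def compare(a: LegacyInterval, b: LegacyInterval, days_per_month: int) -> int:
    """Return -1, 0 or 1; the answer depends on days_per_month."""
    x, y = to_micros(a, days_per_month), to_micros(b, days_per_month)
    return (x > y) - (x < y)


one_month = LegacyInterval(1, 0, 0)
thirty_days = LegacyInterval(0, 30, 0)

# The same pair of values, three different orderings.
for dpm in (28, 30, 31):
    print(dpm, compare(one_month, thirty_days, dpm))  # -1, then 0, then 1
```

Within a year-month type (one month count) or a day-time type (one
microsecond count), ordering is plain integer comparison, so the ambiguity
only arises when both kinds of fields live in one value.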

Bests,

Kent





Re: [DISCUSS] Support year-month and day-time Intervals

cloud0fan
The proposal makes sense to me. If we are not going to make the interval type ANSI-compliant in this release, we should not expose it widely.

Thanks for driving it, Kent!

On Fri, Jan 17, 2020 at 10:52 AM Dr. Kent Yao <[hidden email]> wrote: