Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

Gabor Kaszab
Hey Hive and Spark communities,
[dev@impala in cc]

I'm working on an Impala improvement to introduce the FORMAT clause within CAST() operator and to implement ISO SQL:2016 datetime pattern support for this new FORMAT clause:

One example of the new format:
SELECT(CAST("2018-01-02 09:15" as timestamp FORMAT "YYYY-MM-DD HH12:MI"));

I have put together a document for my proposal of how to do this in Impala and what patterns we plan to support to cover the SQL standard and what additional patterns we propose to support on top of the standard's recommendation.

The reason I share this with the Hive and Spark communities because I feel it would be nice that these systems were in line with the Impala implementation. So I'd like to involve these communities to the planning phase of this task so that everyone can share their opinion about whether this make sense in the proposed form.
Eventually I feel that each of these systems should have the SQL:2016 datetime format and I think it would be nice to have it with a newly introduced CAST(..FORMAT..) clause.

I would like to ask members from both Hive and Spark to take a look at my proposal and share their opinion from their own component's perspective. If we get on the same page I'll eventually open Jiras to cover this improvement for each mentioned systems.

Cheers,
Gabor



Reply | Threaded
Open this post in threaded view
|

Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

zero323
One concern here is introduction of second formatting convention.

This can not only cause confusion among users, but also result in some hard to spot bugs, when wrong format, with different meaning, is used. This is already a problem for Python and R users, with week year and months / minutes mixups popping out from time to time.

On Wed, 20 Mar 2019 at 10:53, Gabor Kaszab <[hidden email]> wrote:
Hey Hive and Spark communities,
[dev@impala in cc]

I'm working on an Impala improvement to introduce the FORMAT clause within CAST() operator and to implement ISO SQL:2016 datetime pattern support for this new FORMAT clause:

One example of the new format:
SELECT(CAST("2018-01-02 09:15" as timestamp FORMAT "YYYY-MM-DD HH12:MI"));

I have put together a document for my proposal of how to do this in Impala and what patterns we plan to support to cover the SQL standard and what additional patterns we propose to support on top of the standard's recommendation.

The reason I share this with the Hive and Spark communities because I feel it would be nice that these systems were in line with the Impala implementation. So I'd like to involve these communities to the planning phase of this task so that everyone can share their opinion about whether this make sense in the proposed form.
Eventually I feel that each of these systems should have the SQL:2016 datetime format and I think it would be nice to have it with a newly introduced CAST(..FORMAT..) clause.

I would like to ask members from both Hive and Spark to take a look at my proposal and share their opinion from their own component's perspective. If we get on the same page I'll eventually open Jiras to cover this improvement for each mentioned systems.

Cheers,
Gabor





--
Regards,
Maciej
Reply | Threaded
Open this post in threaded view
|

Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

Gabor Kaszab
In reply to this post by Gabor Kaszab
Thanks for the quick feedbacks, Maciej and Shawn!

Maciej:
The concern about confusing users with supporting multiple datetime patterns is a valid one. The cleanest way to introduce SQL:2016 patterns would be to drop the existing pattern support (SimpleDateFormat in case of Impala) and replace it with the new approach. This however, would break backwards compatibility and would break existing user workflows that use the old pattern. So in order to introduce the patterns from the standard (to be in sync with RDBMS like Oracle, Postgre and so on) I see the only way is to have both approaches next to each other. To reduce user confusion I think we should put emphasis on the docs to have a good coverage on this topic and clarify in which scenario which pattern is used.

Cheers,
Gabor




On Wed, Mar 20, 2019 at 9:37 PM Shawn Weeks <[hidden email]> wrote:
I’ve done some work on a to timestamp function for hive and one of the things I keep running into is most date time libraries don’t support fractional seconds for their format patterns yet most rdbms do support fractional seconds. It tends to trip things up when your porting sql over. If we’re going the cast with format way everywhere I’d like it to support that

Thanks
Shawn Weeks

Sent from my iPhone

> On Mar 20, 2019, at 4:53 AM, Gabor Kaszab <[hidden email]> wrote:
>
> Hey Hive and Spark communities,
> [dev@impala in cc]
>
> I'm working on an Impala improvement to introduce the FORMAT clause within
> CAST() operator and to implement ISO SQL:2016 datetime pattern support for
> this new FORMAT clause:
> https://issues.apache.org/jira/browse/IMPALA-4018
>
> One example of the new format:
> SELECT(CAST("2018-01-02 09:15" as timestamp FORMAT "YYYY-MM-DD HH12:MI"));
>
> I have put together a document for my proposal of how to do this in Impala
> and what patterns we plan to support to cover the SQL standard and what
> additional patterns we propose to support on top of the standard's
> recommendation.
> https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/
>
> The reason I share this with the Hive and Spark communities because I feel
> it would be nice that these systems were in line with the Impala
> implementation. So I'd like to involve these communities to the planning
> phase of this task so that everyone can share their opinion about whether
> this make sense in the proposed form.
> Eventually I feel that each of these systems should have the SQL:2016
> datetime format and I think it would be nice to have it with a newly
> introduced CAST(..FORMAT..) clause.
>
> I would like to ask members from both Hive and Spark to take a look at my
> proposal and share their opinion from their own component's perspective. If
> we get on the same page I'll eventually open Jiras to cover this
> improvement for each mentioned systems.
>
> Cheers,
> Gabor
Reply | Threaded
Open this post in threaded view
|

Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

Gabor Kaszab
Thanks for the feedback!
As I haven't received any comments recently and I hope I have addresses the previous ones I'll advance to the next step and open the related jiras for both Spark and Hive.

Cheers,
Gabor


On Thu, Mar 21, 2019 at 12:00 PM Gabor Kaszab <[hidden email]> wrote:
Thanks for the quick feedbacks, Maciej and Shawn!

Maciej:
The concern about confusing users with supporting multiple datetime patterns is a valid one. The cleanest way to introduce SQL:2016 patterns would be to drop the existing pattern support (SimpleDateFormat in case of Impala) and replace it with the new approach. This however, would break backwards compatibility and would break existing user workflows that use the old pattern. So in order to introduce the patterns from the standard (to be in sync with RDBMS like Oracle, Postgre and so on) I see the only way is to have both approaches next to each other. To reduce user confusion I think we should put emphasis on the docs to have a good coverage on this topic and clarify in which scenario which pattern is used.

Cheers,
Gabor




On Wed, Mar 20, 2019 at 9:37 PM Shawn Weeks <[hidden email]> wrote:
I’ve done some work on a to timestamp function for hive and one of the things I keep running into is most date time libraries don’t support fractional seconds for their format patterns yet most rdbms do support fractional seconds. It tends to trip things up when your porting sql over. If we’re going the cast with format way everywhere I’d like it to support that

Thanks
Shawn Weeks

Sent from my iPhone

> On Mar 20, 2019, at 4:53 AM, Gabor Kaszab <[hidden email]> wrote:
>
> Hey Hive and Spark communities,
> [dev@impala in cc]
>
> I'm working on an Impala improvement to introduce the FORMAT clause within
> CAST() operator and to implement ISO SQL:2016 datetime pattern support for
> this new FORMAT clause:
> https://issues.apache.org/jira/browse/IMPALA-4018
>
> One example of the new format:
> SELECT(CAST("2018-01-02 09:15" as timestamp FORMAT "YYYY-MM-DD HH12:MI"));
>
> I have put together a document for my proposal of how to do this in Impala
> and what patterns we plan to support to cover the SQL standard and what
> additional patterns we propose to support on top of the standard's
> recommendation.
> https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/
>
> The reason I share this with the Hive and Spark communities because I feel
> it would be nice that these systems were in line with the Impala
> implementation. So I'd like to involve these communities to the planning
> phase of this task so that everyone can share their opinion about whether
> this make sense in the proposed form.
> Eventually I feel that each of these systems should have the SQL:2016
> datetime format and I think it would be nice to have it with a newly
> introduced CAST(..FORMAT..) clause.
>
> I would like to ask members from both Hive and Spark to take a look at my
> proposal and share their opinion from their own component's perspective. If
> we get on the same page I'll eventually open Jiras to cover this
> improvement for each mentioned systems.
>
> Cheers,
> Gabor