[Spark SQL] Nanoseconds in Timestamps are set as Microseconds

Anton Okolnychyi
Hi all, 

I would like to ask what the community thinks about how Spark handles nanoseconds in the Timestamp type. 

As far as I can see in the code, Spark assumes microsecond precision. Therefore, I would expect either a timestamp truncated to microseconds or an exception when I specify a timestamp with nanoseconds. However, the current implementation silently sets the nanoseconds as microseconds in [1], which results in a wrong timestamp. Consider the example below:

spark.sql("SELECT cast('2015-01-02 00:00:00.000000001' as TIMESTAMP)").show(false)
+------------------------------------------------+
|CAST(2015-01-02 00:00:00.000000001 AS TIMESTAMP)|
+------------------------------------------------+
|2015-01-02 00:00:00.000001                      |
+------------------------------------------------+
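For comparison, here is a minimal sketch with plain java.time (outside Spark, only to illustrate the expected semantics) of what truncation to microsecond precision would yield for the same literal:

import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

val t = LocalDateTime.parse("2015-01-02T00:00:00.000000001")

// Truncating to microsecond precision drops the lone nanosecond, leaving .000000 ...
val truncated = t.truncatedTo(ChronoUnit.MICROS)

// ... whereas the value Spark currently produces corresponds to one full microsecond.
val observed = LocalDateTime.parse("2015-01-02T00:00:00.000001")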

This issue was already raised in SPARK-17914, but I do not see any decision there.

[1] - org.apache.spark.sql.catalyst.util.DateTimeUtils, toJavaTimestamp, line 204
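
As a rough sketch of what a truncation-based fix might look like (fractionToMicros is a hypothetical helper for illustration, not the actual code in DateTimeUtils), the fractional part of a timestamp literal would be capped at six digits before being interpreted as microseconds:

// Hypothetical helper, for illustration only: keep at most six fractional digits,
// so anything finer than microseconds is truncated instead of being reinterpreted.
def fractionToMicros(fraction: String): Long = {
  val truncated = fraction.take(6)           // drop digits beyond microsecond precision
  truncated.padTo(6, '0').mkString.toLong    // right-pad so "1" means 100000 us (0.1 s)
}

fractionToMicros("000000001") // 0      -> the lone nanosecond is truncated away
fractionToMicros("000001")    // 1      -> one microsecond, unchanged
fractionToMicros("1")         // 100000 -> 0.1 seconds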

Best regards,
Anton

Re: [Spark SQL] Nanoseconds in Timestamps are set as Microseconds

Reynold Xin (rxin)
Seems like a bug we should fix? I agree some form of truncation makes more sense.


Re: [Spark SQL] Nanoseconds in Timestamps are set as Microseconds

Anton Okolnychyi
Then let me provide a PR so that we can discuss an alternative approach.
