SPIP: support decimals with negative scale in decimal operation

SPIP: support decimals with negative scale in decimal operation

Marco Gaido
Hi all,

I am writing this e-mail to discuss the issue reported in SPARK-25454; following Wenchen's suggestion, I have prepared a design doc for it.

The problem we are facing is that our rules for decimal operations are taken from Hive and MS SQL Server, which explicitly don't support decimals with negative scales, so our current rules are not meant to deal with them. Spark, however, doesn't forbid negative scales, and there are indeed cases in which we produce them (e.g. a SQL constant like 1e8 is turned into a decimal(1, -8)).
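To make the mapping concrete, here is a small Python sketch (not Spark's actual code): Spark derives a literal's DecimalType from the underlying java.math.BigDecimal, whose precision is the number of significant digits and whose scale may be negative; Python's decimal module exposes the same two quantities, so a constant like 1e8 really does come out as decimal(1, -8):

```python
from decimal import Decimal

def precision_and_scale(literal: str):
    """Sketch of how a decimal literal maps to (precision, scale).

    Mirrors java.math.BigDecimal semantics: precision is the number of
    significant digits, scale is the negated exponent, so a value like
    1e8 (one digit, exponent 8) ends up with a negative scale.
    """
    sign, digits, exponent = Decimal(literal).as_tuple()
    return len(digits), -exponent

# 1e8 is stored as the single digit 1 with exponent 8,
# i.e. decimal(1, -8) in Spark's notation.
print(precision_and_scale("1e8"))     # (1, -8)
print(precision_and_scale("123.45"))  # (5, 2)
```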

Having negative scales most likely was never intended. Unfortunately, getting rid of them would be a breaking change: many operations that work fine today would no longer be allowed and would overflow (e.g. select 1e36 * 10000). This is something I'd definitely agree on doing, but I think we can target it only for 3.0.
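To see why that example overflows, here is a sketch of the arithmetic (assuming, for simplicity, that 10000 is treated as a decimal(5, 0)): the multiplication rule Spark inherited from Hive/MS SQL Server gives the result precision p1 + p2 + 1 and scale s1 + s2, capped at 38 digits. With negative scales allowed, 1e36 is the tiny type decimal(1, -36) and the product fits easily; forced to a non-negative scale, 1e36 would need decimal(37, 0) and the same product would exceed the cap:

```python
MAX_PRECISION = 38  # Spark's DecimalType.MAX_PRECISION

def multiply_result_type(p1, s1, p2, s2):
    """Result type of decimal multiplication per the rule Spark inherited
    from Hive / MS SQL Server: precision = p1 + p2 + 1, scale = s1 + s2.
    Returns (precision, scale, overflows)."""
    p, s = p1 + p2 + 1, s1 + s2
    return p, s, p > MAX_PRECISION

# With negative scales, 1e36 is decimal(1, -36) and 10000 is
# decimal(5, 0): the product comfortably fits in 38 digits.
print(multiply_result_type(1, -36, 5, 0))  # (7, -36, False)

# If negative scales were forbidden, 1e36 would need decimal(37, 0),
# and the same product would overflow the 38-digit limit.
print(multiply_result_type(37, 0, 5, 0))   # (43, 0, True)
```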

What we can start doing now, instead, is updating our rules to properly handle the case when decimal scales are negative. From my investigation, it turns out that the only operation that has problems with them is Divide.
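For reference, the current Divide rule (as documented for Spark's DecimalPrecision, and the one the design doc proposes to adjust) can be sketched as follows; its formulas were written assuming non-negative scales, which is exactly why negative-scale inputs are problematic:

```python
def divide_result_type(p1, s1, p2, s2):
    """Result type of decimal division per the rule Spark inherited
    from Hive / MS SQL Server:
        scale     = max(6, s1 + p2 + 1)
        precision = p1 - s1 + s2 + scale
    These formulas assume s1, s2 >= 0."""
    scale = max(6, s1 + p2 + 1)
    precision = p1 - s1 + s2 + scale
    return precision, scale

# Ordinary operands: decimal(10, 2) / decimal(5, 0) -> decimal(16, 8).
print(divide_result_type(10, 2, 5, 0))   # (16, 8)

# Negative scale on the dividend: decimal(1, -8) / decimal(5, 0),
# i.e. 1e8 divided by a 5-digit integer. The formula mechanically
# yields decimal(15, 6), but it was never designed for this input.
print(divide_result_type(1, -8, 5, 0))   # (15, 6)
```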

Here you can find the design doc with all the details: https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing. The document is also linked in SPARK-25454. There is also already a PR with the change: https://github.com/apache/spark/pull/22450.

Looking forward to hearing your feedback,
Thanks.
Marco

Re: SPIP: support decimals with negative scale in decimal operation

cloud0fan
Hi Marco,

Thanks for sending it! The problem is clearly explained in this email, but I would not treat it as a SPIP: it proposes a fix for a very tricky bug, and SPIPs are usually for new features. Others, please correct me if I am wrong.

Thanks,
Wenchen


Re: SPIP: support decimals with negative scale in decimal operation

Marco Gaido
Hi Wenchen,
Thank you for the clarification. I agree that this is more a bug fix than an improvement; I apologize for the mislabeling. Please consider this a design doc instead.

Thanks,
Marco


Re: SPIP: support decimals with negative scale in decimal operation

Felix Cheung
A DISCUSS thread is good to have...

From: Marco Gaido <[hidden email]>
Sent: Friday, September 21, 2018 3:31 AM
To: Wenchen Fan
Cc: dev
Subject: Re: SPIP: support decimals with negative scale in decimal operation