Decimals with negative scale

Marco Gaido
Hi all,

As you may remember, there was a design doc on supporting operations involving decimals with negative scales. Following the discussion on that doc, the related PR is now blocked, because for 3.0 we have another option to explore, i.e. forbidding negative scales. This is probably the cleaner solution, since most likely negative scales were never intended, but it is a breaking change, so we wanted to check the opinion of the community.

Getting to the topic, here are the 2 options:
 - Forbidding negative scales
  Pros: many data sources do not support negative scales (so writing to them can cause issues); negative scales were not considered possible in the initial implementation, so forbidding them gets us to a more stable situation.
  Cons: some operations which were supported earlier won't work anymore. E.g. since our max precision is 38, if the scale cannot be negative, 1e36 * 1e36 would cause an overflow, while it now works fine (producing a decimal with negative scale); it is basically impossible to create a config which controls the behavior.

 - Handling negative scales in operations
  Pros: no regressions; we support all the operations we supported on 2.x.
  Cons: negative scales can cause issues elsewhere, e.g. when saving to a data source which doesn't support them.
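
The 1e36 * 1e36 example under the first option can be sketched with Python's standard decimal module. This is only an illustration, not Spark code: the helper names are hypothetical, and the result-type rule used for multiplication (precision p1 + p2 + 1, scale s1 + s2) is an assumption about the rule in effect.

```python
from decimal import Decimal

MAX_PRECISION = 38  # max precision mentioned in the thread


def precision_and_scale(d, allow_negative_scale=True):
    """Return (precision, scale) of a decimal literal (hypothetical helper)."""
    _sign, digits, exponent = d.as_tuple()
    precision, scale = len(digits), -exponent
    if scale < 0 and not allow_negative_scale:
        # Pad with trailing zeros: 1E+36 becomes a 37-digit integer, scale 0.
        precision, scale = precision - scale, 0
    return precision, scale


def multiply_result_type(left, right):
    """Assumed rule for a product: precision p1 + p2 + 1, scale s1 + s2."""
    (p1, s1), (p2, s2) = left, right
    return p1 + p2 + 1, s1 + s2


def fits(result_type):
    """True if the result precision stays within MAX_PRECISION."""
    return result_type[0] <= MAX_PRECISION


lit = Decimal("1e36")
with_negative = multiply_result_type(
    precision_and_scale(lit), precision_and_scale(lit)
)
without_negative = multiply_result_type(
    precision_and_scale(lit, allow_negative_scale=False),
    precision_and_scale(lit, allow_negative_scale=False),
)
print(with_negative, fits(with_negative))        # (3, -72) True
print(without_negative, fits(without_negative))  # (75, 0) False
```

With a negative scale the literal needs a single digit, so the product stays far below 38; with the scale clamped to 0 each operand already needs 37 digits and the product cannot fit.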

Looking forward to hearing your thoughts,
Thanks.
Marco


Re: Decimals with negative scale

rxin
Is this an analysis time thing or a runtime thing?

On Tue, Dec 18, 2018 at 7:45 AM Marco Gaido <[hidden email]> wrote:
[...]


Re: Decimals with negative scale

Marco Gaido
This is at analysis time.

On Tue, 18 Dec 2018, 17:32 Reynold Xin <[hidden email]> wrote:
Is this an analysis time thing or a runtime thing?



Re: Decimals with negative scale

rxin
So why can't we just add validation that fails when a source doesn't support negative scales? This way, we don't need to break backward compatibility in any way, and it becomes a strict improvement.
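
A minimal sketch of this kind of validation, in plain Python rather than Spark code. The schema representation, the `supports_negative_scale` flag, and the function name are all hypothetical; the point is only that the check runs before the write and rejects just the unsupported combination.

```python
from typing import Dict, Tuple

# Hypothetical schema: column name -> (precision, scale) of a decimal column.
Schema = Dict[str, Tuple[int, int]]


def validate_for_sink(schema: Schema, supports_negative_scale: bool) -> None:
    """Raise if the sink cannot store a negative-scale decimal column."""
    if supports_negative_scale:
        return
    bad = [name for name, (_precision, scale) in schema.items() if scale < 0]
    if bad:
        raise ValueError(
            "sink does not support negative decimal scale in columns: "
            + ", ".join(sorted(bad))
        )


schema = {"amount": (10, 2), "big": (3, -72)}
validate_for_sink(schema, supports_negative_scale=True)  # passes silently
try:
    validate_for_sink(schema, supports_negative_scale=False)
except ValueError as err:
    print(err)
```

Sinks that accept negative scales and schemas without them are unaffected, which is what would make such a check a strict improvement.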


On Tue, Dec 18, 2018 at 8:43 AM, Marco Gaido <[hidden email]> wrote:
This is at analysis time.


Re: Decimals with negative scale

Marco Gaido
That is feasible. The main point is that negative scales were never really meant to be there in the first place: forbidding them was simply forgotten, and the DBs we draw our inspiration from for decimals (mainly SQL Server) do not support them.
Honestly, my opinion on this topic is:
 - let's add support for negative scales in the operations (I already have a PR out for that, https://github.com/apache/spark/pull/22450);
 - let's reduce our usage of DECIMAL in favor of DOUBLE when parsing literals, as done by Hive, Presto, DB2, ..., so that the number of cases in which we deal with negative scales is small anyway (and we do not run into issues with data sources which don't support them).
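
To illustrate the second point: a literal such as 1e36 parsed as a decimal carries a negative scale, while the same literal parsed as a double has no scale at all. A small Python sketch, with `float` standing in for DOUBLE and `decimal.Decimal` for DECIMAL (an analogy, not Spark's parser):

```python
from decimal import Decimal

literal = "1e36"

as_decimal = Decimal(literal)
as_double = float(literal)

# The decimal keeps one digit and an exponent of 36, i.e. a scale of -36.
decimal_scale = -as_decimal.as_tuple().exponent
print(decimal_scale)  # -36

# The double is a binary floating-point value with no precision/scale pair.
print(isinstance(as_double, float), as_double == 1e36)  # True True
```

The trade-off is that a double is an approximate type, so this only makes sense for literals where exact decimal semantics are not required.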

Thanks,
Marco


On Tue, 18 Dec 2018 at 19:08, Reynold Xin <[hidden email]> wrote:
So why can't we just do validation to fail sources that don't support negative scale, if it is not supported? This way, we don't need to break backward compatibility in any way and it becomes a strict improvement.

