Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe


Meethu Mathew-2
Hi all,

I tried the following code and the output was not as expected.

from pyspark.sql.types import StructType, StructField, StringType, FloatType

schema = StructType([StructField('Id', StringType(), False),
                     StructField('Value', FloatType(), False)])
df_test = spark.createDataFrame([('a', 5.0), ('b', 1.236), ('c', -0.31)], schema)
df_test

Output: DataFrame[Id: string, Value: float]
But when the value is given as a string, the filter worked.

I tried again with a floating-point number with one decimal place, and it worked.

And when the equality comparison is changed to greater-than or less-than, it works even with numbers that have more than one decimal place.
Is this a bug?

Regards, 
Meethu Mathew



Re: Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

sandeep_katta
I think this is similar to SPARK-25452.

Regards
Sandeep Katta

On Wed, 26 Sep 2018 at 11:16 AM, Meethu Mathew <[hidden email]> wrote:



Re: Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

Sean Owen-2
In reply to this post by Meethu Mathew-2
Is this not just a case of floating-point literals not being exact? This is expressed in Python, not SQL.
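
The effect can be shown without Spark at all: FloatType stores 32-bit IEEE 754 values, while a Python literal like 1.236 is a 64-bit double, so the stored value widens back to a slightly different double. A minimal sketch using the standard `struct` module to emulate the 32-bit round-trip:

```python
import struct

def as_float32(x):
    """Round-trip x through IEEE 754 single precision,
    mimicking how a FloatType column stores the value."""
    return struct.unpack('f', struct.pack('f', x))[0]

stored = as_float32(1.236)           # what the Value column actually holds
print(stored)                        # ~1.2359999..., not 1.236
print(stored == 1.236)               # False: an equality filter matches nothing
print(as_float32(5.0) == 5.0)        # True: 5.0 is exactly representable
print(1.0 < stored < 2.0)            # True: range comparisons still behave
```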

On Wed, Sep 26, 2018 at 12:46 AM Meethu Mathew <[hidden email]> wrote: