Re: Support for decimal separator (comma or period) in spark 2.1

Re: Support for decimal separator (comma or period) in spark 2.1

Arkadiusz Bicz
Hi Team,

I would like to know if it is possible to specify a decimal locale for the DataFrameReader CSV source.

I have CSV files from a locale where the decimal separator is a comma, e.g. 0,32 instead of the US-style 0.32.

Is there a way, in the current version of Spark, to specify the locale? For example:

spark.read.option("sep",";").option("header", "true").option("inferSchema", "true").format("csv").load("nonuslocalized.csv")

If not, should I create a JIRA ticket for this? I can work on a solution if one is not available.

Best Regards,

Arkadiusz Bicz 

Re: Support for decimal separator (comma or period) in spark 2.1

Sam Elamin
Hi Arkadiusz,

I'm not sure if there is a localisation option, but I'm sure others will correct me if I'm wrong.


What you could do is write a UDF that replaces the commas with a period, assuming you know the column in question; something along the lines of the sketch below.
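
A minimal sketch of that idea (the column name "amount" and the reader options are assumptions for illustration, not from the original message):

import org.apache.spark.sql.functions.{col, udf}

// Parse a comma-decimal string such as "0,32" into a Double;
// Option handles null cells gracefully.
val commaToDouble = udf { s: String =>
  Option(s).map(_.replace(',', '.').toDouble)
}

val df = spark.read
  .option("sep", ";")
  .option("header", "true")
  .format("csv")                 // no inferSchema: the column arrives as a string
  .load("nonuslocalized.csv")
  .withColumn("amount", commaToDouble(col("amount")))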


Regards
Sam

Re: Support for decimal separator (comma or period) in spark 2.1

Arkadiusz Bicz
Thank you Sam for the answer. I have solved the problem by loading all decimal columns as strings and replacing the commas with dots, but that workaround loses automatic schema inference, which is quite a nice feature.
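
Roughly, the workaround looks like this (column names and separator are illustrative; regexp_replace is Spark's built-in alternative to a hand-written UDF):

import org.apache.spark.sql.functions.{col, regexp_replace}

// Without inferSchema every column is read as a string; the decimal
// columns then have to be fixed and cast by hand, one by one.
val raw = spark.read
  .option("sep", ";")
  .option("header", "true")
  .format("csv")
  .load("filefromeurope.csv")

val decimalColumns = Seq("price", "amount")   // illustrative names
val fixed = decimalColumns.foldLeft(raw) { (df, c) =>
  df.withColumn(c, regexp_replace(col(c), ",", ".").cast("double"))
}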

I can work on adding a new option to DataFrameReader for localization, something like:

spark.read.option("NumberLocale", "German").csv("filefromeurope.csv")

I just wonder whether it would be accepted.

Best Regards,

Arkadiusz Bicz 

Re: Support for decimal separator (comma or period) in spark 2.1

Hyukjin Kwon

2017-02-23 21:53 GMT+09:00 Arkadiusz Bicz <[hidden email]>: