\r\n in csv output

\r\n in csv output

Steven Parkes
SPARK-26108 (PR #23080) added a require that CSVOptions#lineSeparator be a single character.

AFAICT, this keeps us from writing CSV files with \r\n line terminators.

Wondering if this was intended or a bug? Is there an alternative mechanism or something else I'm missing?
Re: \r\n in csv output

Steven Parkes
Hrm ... it looks like we were setting this in the past, although apparently it was being ignored ...

On Mon, Mar 23, 2020 at 12:53 PM Steven Parkes <[hidden email]> wrote:
SPARK-26108 (PR #23080) added a require that CSVOptions#lineSeparator be a single character.

AFAICT, this keeps us from writing CSV files with \r\n line terminators.

Wondering if this was intended or a bug? Is there an alternative mechanism or something else I'm missing?
Re: \r\n in csv output

Vipul Rajan
In reply to this post by Steven Parkes
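You can use newAPIHadoopFile with a custom record delimiter to read files whose lines end in \r\n:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import spark.implicits._ // needed for toDF(); assumes a SparkSession named spark, as in spark-shell

// Treat \r\n as the record delimiter.
val conf = new Configuration
conf.set("textinputformat.record.delimiter", "\r\n")

// Each record is a (byte offset, line) pair; keep only the line text.
val lines = sc.newAPIHadoopFile("path/to/file", classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
val df = lines.map(_._2.toString).toDF()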

You would get a DataFrame with a single string column. You would then have to split that column into separate columns, which can be done. If you need help, feel free to ping back.
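As a rough sketch of that splitting step, something like the following might work. The object name SplitLines, the helper splitColumns, and the column names id, name, and age are made up for illustration, and it assumes a plain comma delimiter with no quoted or escaped fields.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object SplitLines {
  // Hypothetical helper: split the single string column "value" on commas
  // and give each piece one of the supplied column names.
  // Assumes no field contains a quoted or escaped comma.
  def splitColumns(lines: DataFrame, names: Seq[String]): DataFrame = {
    val parts = split(col("value"), ",")
    val columns = names.zipWithIndex.map { case (name, i) => parts.getItem(i).as(name) }
    lines.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-lines").getOrCreate()
    import spark.implicits._

    // Stand-in for the single-column DataFrame produced by newAPIHadoopFile.
    val lines = Seq("1,alice,30", "2,bob,25").toDF("value")

    // Illustrative column names; a real file would bring its own header or schema.
    splitColumns(lines, Seq("id", "name", "age")).show()

    spark.stop()
  }
}
```

All resulting columns are strings, so numeric fields would still need a cast. If the data contains quoted fields, passing the lines as a Dataset[String] to spark.read.csv is probably a safer route than splitting by hand.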

Regards
