
Spark madness


Spark madness

Saikat Kanjilal

Hi Devs,

I need to read a JSON file from HDFS and turn it into a Scala string. I've dug around for documentation on how to do this and found the following:


http://stackoverflow.com/questions/30445263/how-to-read-whole-file-in-one-string


The following two lines of code don't seem to do the job:


val rdd = sc.wholeTextFiles("hdfs://nameservice1/user/me/test.txt")  // RDD of (path, contents) pairs
rdd.collect.foreach(t => println(t._2))                              // prints the contents; returns Unit


I have tried assigning the result of the second line to a Scala string, but that doesn't seem to work. I would really appreciate some insight into how to do this.
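
For context, here is roughly what those two lines evaluate to in spark-shell (a sketch, assuming Spark's standard RDD API; the val names and type annotations are just for illustration): wholeTextFiles gives back (path, contents) pairs, and foreach returns Unit, so there is nothing string-like to capture.

// wholeTextFiles returns an RDD of (file path, file contents) pairs
val pairs: org.apache.spark.rdd.RDD[(String, String)] =
  sc.wholeTextFiles("hdfs://nameservice1/user/me/test.txt")

// collect brings those pairs to the driver as an Array[(String, String)]
val local: Array[(String, String)] = pairs.collect

// foreach only prints; its result type is Unit, so assigning it captures no data
val printed: Unit = local.foreach(t => println(t._2))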


Thanks in advance.






Re: Spark madness

Saikat Kanjilal

One additional point: the following line

rdd.collect.foreach(t => println(t._2))

prints nothing when I assign it to a Scala string, even when I add toString at the end. This doesn't seem like it should be anything out of the ordinary, but I could be wrong.
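
For reference, a minimal sketch of the assignment in question (the val name is illustrative, assuming the same rdd as in the first message): foreach's result type is Unit, which would explain why nothing from the file gets printed.

// The assignment compiles, but the value is Unit, not the file contents
val result: Unit = rdd.collect.foreach(t => println(t._2))

// Unit's toString is just "()", so there is nothing from the file to print
println(result.toString)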






Re: Spark madness

Jonathan Winandy

Hi Saikat,

You may be using the wrong mailing list for your question (it belongs on the Spark user list).

If you want to make a single string, it's:
rdd.collect.mkString("\n")

Be careful of driver explosion, though: collect pulls everything into driver memory.
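
Putting that together for the wholeTextFiles RDD from the original message, a minimal sketch (assuming a single small JSON file, so pulling it to the driver is safe; map(_._2) keeps only the file contents and drops the HDFS paths):

// Read the file as (path, contents) pairs, keep only the contents,
// and join everything into one Scala string on the driver
val json: String = sc
  .wholeTextFiles("hdfs://nameservice1/user/me/test.txt")
  .map(_._2)        // drop the HDFS path, keep the contents
  .collect          // Array[String] on the driver -- this is where memory can blow up
  .mkString("\n")   // one string, files separated by newlines

If it's guaranteed to be exactly one file, .first()._2 on the pair RDD works just as well.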

Cheers,
Jonathan



