Spark: convert an RDD to a DataFrame from a class



zouba123
I have a large file. After filtering out the first lines, which contain a copyright notice and other metadata, I want to take the header line (140 fields) and convert the data to a DataFrame.
I was thinking of using a class that extends the Product trait to define the schema of the DataFrame, and then converting the RDD to a DataFrame according to that schema.
What is the best way to do this?
The file looks like this:
    text  (06.07.03.216)  COPYRIGHT © skdjh 2000-2016
    text  160614_54554.vf Database    53643_csc   Interface   574 zn  65
    Start   Date    14/06/2016  00:00:00:000
    End Date    14/06/2016  00:14:59:999
    State   "s23"
   
        Con. Nb    Start Time  End Time    First Event Con. Duration   IMSI
        32055680    16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
        32055680    16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
        32055680    16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
        32055680    16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
        32055680    16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
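For illustration, the split between the preamble, the header line, and the data rows in a file like this could be sketched in plain Scala as below. This is only a sketch under assumptions that are not confirmed above: that the fields are tab-separated, and that the header is the first line starting with `Con. Nb`. The names `HeaderSplit`, `splitFields`, and `parse` are hypothetical.

```scala
// Hypothetical helper; assumes tab-separated fields (not confirmed by the sample).
object HeaderSplit {
  // Assumption: the header is the first line whose text starts with "Con. Nb".
  val headerMarker = "Con. Nb"

  // Split one line on tabs and trim each field; -1 keeps empty fields.
  def splitFields(line: String): Array[String] =
    line.trim.split("\t", -1).map(_.trim)

  // Returns (column names, data rows) with the preamble lines dropped.
  def parse(lines: Seq[String]): (Array[String], Seq[Array[String]]) = {
    val fromHeader = lines.dropWhile(l => !l.trim.startsWith(headerMarker))
    require(fromHeader.nonEmpty, "header line not found")
    val header = splitFields(fromHeader.head)
    val rows = fromHeader.tail.filter(_.trim.nonEmpty).map(splitFields)
    (header, rows)
  }
}
```

If the real file separates fields with runs of spaces instead of tabs, the split pattern would need to change, since values such as `16/09/2010 16:59:59:245` contain single spaces themselves.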

I want to load it into Spark SQL with a schema like this:

    | Con. Nb  | Start Time | End Time     | First Event | Con. Duration | IMSI  |
    |----------|------------|--------------|-------------|---------------|-------|
    | 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016  | 17:00:00:000  | xxxxx |
    | 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016  | 17:00:00:000  | xxxxx |
    | 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016  | 17:00:00:000  | xxxxx |
    | 20556800 | 16/09/2010 | 16:59:59:245 | 16/09/2016  | 17:00:00:000  | xxxxx |
    | 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016  | 17:00:00:000  | xxxxx |
I'm using Spark 1.6. Thank you.
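One possible alternative to a hand-written Product class with 140 fields is to build the schema programmatically from the header line with `StructType` and pass it to `SQLContext.createDataFrame`. The sketch below targets the Spark 1.6 API and rests on several assumptions: the fields are tab-separated, the header is the first line starting with `Con. Nb`, and every column is read as a string. The object name `RddToDataFrame`, the `sanitize` helper, and the table name `connections` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RddToDataFrame {
  // Hypothetical helper: "Con. Nb" -> "Con_Nb", so the column names
  // do not need backticks in SQL queries.
  def sanitize(name: String): String =
    name.trim.replaceAll("[^A-Za-z0-9]+", "_").stripSuffix("_")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-to-df").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // args(0) is the input path. Pair each line with its position in the file.
    val indexed = sc.textFile(args(0)).zipWithIndex()

    // Assumption: the header is the first line starting with "Con. Nb".
    val (headerLine, headerIndex) =
      indexed.filter { case (line, _) => line.trim.startsWith("Con. Nb") }.first()

    // Assumption: tab-separated fields; every column is typed as a string.
    val columns = headerLine.trim.split("\t", -1).map(sanitize)
    val schema = StructType(columns.map(c => StructField(c, StringType, nullable = true)))
    val n = columns.length

    // Keep only non-empty lines after the header; pad or truncate each
    // row so that it has exactly one value per schema field.
    val rows = indexed
      .filter { case (line, i) => i > headerIndex && line.trim.nonEmpty }
      .map { case (line, _) =>
        val fields = line.trim.split("\t", -1).map(_.trim).toSeq
        Row.fromSeq(fields.padTo(n, null: String).take(n))
      }

    val df = sqlContext.createDataFrame(rows, schema)
    df.printSchema()
    df.registerTempTable("connections")
    sqlContext.sql("SELECT * FROM connections").show(5)

    sc.stop()
  }
}
```

Because the schema is derived from the header at run time, nothing has to be declared for each of the 140 fields. Columns that need a date or numeric type could be cast afterwards, once the exact formats in the file are known.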