Can the maximum limit for akka.frame.size be greater than 500 MB?


Can the maximum limit for akka.frame.size be greater than 500 MB?

aravasai
I have a Spark job running on 2 terabytes of data, which creates more than 30,000 partitions. As a result, the job fails with the error
"Map output statuses were 170415722 bytes which exceeds spark.akka.frameSize 52428800 bytes" (that particular error is from a 1 TB run).
However, when I increase akka.frame.size to around 500 MB, the job hangs and makes no further progress.

So what is the ideal, or maximum, value I can assign to akka.frame.size so that I do not hit the "map output statuses exceed frame size" error on large volumes of data?

Is coalescing the data into a smaller number of partitions the only solution to this problem? Is there a better way than coalescing many intermediate RDDs in the program?

My driver memory: 10G
Executor memory: 36G
Executor memory overhead: 3G
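
For reference, the relevant part of the setup looks roughly like the sketch below (illustrative only; the app name, the frame size value, and the coalesce target are placeholders rather than the real job, and the property name is the Spark 1.x one):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch of the configuration described above (Spark 1.x).
// spark.akka.frameSize is expressed in MB.
val conf = new SparkConf()
  .setAppName("user-stats-job")          // placeholder name
  .set("spark.akka.frameSize", "500")    // raised from the default; value in MB

val sc = new SparkContext(conf)

// The workaround being asked about: shrink the number of partitions so the
// map output status message sent back to the driver stays under the frame size.
// val fewerPartitions = inputRdd.coalesce(5000)  // inputRdd and 5000 are placeholders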




Re: Can the maximum limit for akka.frame.size be greater than 500 MB?

Jörn Franke
Which Spark version are you using? What exactly are you trying to do, and what is the input data? As far as I know, Akka has been dropped in recent Spark versions.
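
For what it's worth, in Spark 2.x the Akka-based transport was replaced and the rough equivalent of spark.akka.frameSize is spark.rpc.message.maxSize, also expressed in MB. A minimal sketch, assuming an upgrade to 2.x; the app name and the value here are only examples, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch for Spark 2.x, where spark.akka.frameSize no longer exists;
// spark.rpc.message.maxSize plays the equivalent role (value in MB).
val conf = new SparkConf()
  .setAppName("user-stats-job")               // placeholder
  .set("spark.rpc.message.maxSize", "512")    // example value only

val sc = new SparkContext(conf)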



Re: Can the maximum limit for akka.frame.size be greater than 500 MB?

aravasai
I am currently using Spark 1.6.1. I continue to use it because my code relies heavily on RDDs rather than DataFrames, and because 1.6.1 has been more stable for us than newer versions.

The input is user behavior data with 20 fields and about 1 billion records (~1.5 TB). I am trying to group by user id and compute per-user statistics, but I suspect the number of map tasks is too high, which leads to the frame size error.

1) Does akka.frame.size have to be increased in proportion to the size of the data, which indirectly determines the number of partitions?
2) Or is it the sheer number of map tasks (which may be unavoidable) that causes the frame size error?
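
The aggregation is roughly of the shape sketched below (a simplified, spark-shell-style sketch; the case class, the field names, and the partition count of 2000 are placeholders, not the real job):

import org.apache.spark.rdd.RDD

// Simplified sketch of the per-user aggregation described above.
case class Event(userId: Long, clicks: Long, dwellMillis: Long) // plus ~17 more fields

type Stats = (Long, Long, Long) // (total clicks, total dwell time, event count)

def userStats(events: RDD[Event]): RDD[(Long, Stats)] = {
  val perUser = events.map(e => (e.userId, (e.clicks, e.dwellMillis, 1L)))
  // reduceByKey combines values map-side before the shuffle, and the explicit
  // partition count bounds the number of reduce tasks (and with it the size of
  // the map output status message sent back to the driver).
  perUser.reduceByKey((a: Stats, b: Stats) => (a._1 + b._1, a._2 + b._2, a._3 + b._3), 2000)
}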
