Expanded docs for the various storage levels

Expanded docs for the various storage levels

Nicholas Chammas

I’m looking at the docs here:

http://spark.apache.org/docs/1.6.2/api/python/pyspark.html#pyspark.StorageLevel

A newcomer to Spark won’t understand the meaning of the _2 suffix, or of _SER (or its value), and won’t understand how exactly memory and disk play together when something like MEMORY_AND_DISK is selected.

Is there a place in the docs that expands on the storage levels a bit? If not, shall we create a JIRA and expand this documentation? I don’t mind taking on this task, though frankly I’m interested in this because I don’t fully understand the differences myself. :)

Nick
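For readers following along, the level names do decompose into a few flags, mirroring the Scala-side definitions in the programming guide: no suffix means deserialized objects in memory with one replica, _SER means the data is stored as serialized bytes (more compact, but costs CPU to deserialize), and _2 means each partition is replicated on two nodes. A minimal pure-Python sketch of that decomposition (this is an illustration, not the real pyspark.StorageLevel class, whose constructor also takes a useOffHeap flag):

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.StorageLevel, used only to decode the
# naming convention; the real class also carries a useOffHeap flag.
Level = namedtuple("Level", ["use_disk", "use_memory", "deserialized", "replication"])

# No suffix: deserialized objects in memory, one replica.
# _SER:      serialized bytes (smaller footprint, CPU cost to read back).
# _2:        each partition replicated on two cluster nodes.
MEMORY_ONLY       = Level(use_disk=False, use_memory=True,  deserialized=True,  replication=1)
MEMORY_ONLY_SER   = Level(use_disk=False, use_memory=True,  deserialized=False, replication=1)
MEMORY_ONLY_2     = Level(use_disk=False, use_memory=True,  deserialized=True,  replication=2)
MEMORY_AND_DISK   = Level(use_disk=True,  use_memory=True,  deserialized=True,  replication=1)
MEMORY_AND_DISK_2 = Level(use_disk=True,  use_memory=True,  deserialized=True,  replication=2)
DISK_ONLY         = Level(use_disk=True,  use_memory=False, deserialized=False, replication=1)
```

(Data on disk is always stored serialized, which is why DISK_ONLY has deserialized=False.)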


Re: Expanded docs for the various storage levels

rxin
Please create a patch. Thanks!


On Thu, Jul 7, 2016 at 12:07 PM, Nicholas Chammas <[hidden email]> wrote:

> Is there a place in the docs that expands on the storage levels a bit? If not, shall we create a JIRA and expand this documentation?



Re: Expanded docs for the various storage levels

nsalian
In reply to this post by Nicholas Chammas
That would be good to have.
You could look at http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence and mimic what the Scala docs cover to help.
Neelesh S. Salian  
Cloudera
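To make the memory/disk interplay concrete for newcomers: with MEMORY_AND_DISK, partitions are cached in memory while they fit, and partitions that don’t fit are spilled to disk rather than dropped (whereas MEMORY_ONLY would skip caching them and recompute on demand). A toy Python sketch of that fallback behavior (an illustration of the idea only, not Spark’s actual BlockManager):

```python
import os
import pickle
import tempfile

class SpillingCache:
    """Toy illustration of MEMORY_AND_DISK semantics: keep partitions in
    memory while they fit a fixed budget, spill whole partitions to disk
    once it is exhausted. Not Spark's actual implementation."""

    def __init__(self, memory_budget_bytes):
        self.budget = memory_budget_bytes
        self.used = 0
        self.in_memory = {}  # partition_id -> Python object
        self.on_disk = {}    # partition_id -> temp file path

    def put(self, pid, partition):
        size = len(pickle.dumps(partition))
        if self.used + size <= self.budget:
            self.in_memory[pid] = partition
            self.used += size
        else:
            # Memory is full: fall back to disk instead of dropping the
            # partition (MEMORY_ONLY would not cache it at all and Spark
            # would recompute it from lineage when next needed).
            f = tempfile.NamedTemporaryFile(delete=False)
            f.write(pickle.dumps(partition))
            f.close()
            self.on_disk[pid] = f.name

    def get(self, pid):
        if pid in self.in_memory:
            return self.in_memory[pid]
        with open(self.on_disk[pid], "rb") as f:
            return pickle.load(f)
```

Reads check memory first and only hit disk for spilled partitions, which is the key property the MEMORY_AND_DISK name is meant to convey.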

Re: Expanded docs for the various storage levels

Nicholas Chammas
In reply to this post by rxin

On Thu, Jul 7, 2016 at 3:18 PM Reynold Xin <[hidden email]> wrote:
> Please create a patch. Thanks!

