Unpersist return type

Unpersist return type

German Schiavon
Hello!

I'd like to ask whether there is any reason for dataframe.unpersist to have a return type (this.type rather than Unit):

def unpersist(blocking: Boolean): this.type = {
  sparkSession.sharedState.cacheManager.uncacheQuery(
    sparkSession, logicalPlan, cascade = false, blocking)
  this
}

I'm just pointing it out because this example from the docs doesn't compile, since unpersist() does not return Unit:

streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.persist()
  batchDF.write.format(...).save(...)  // location 1
  batchDF.write.format(...).save(...)  // location 2
  batchDF.unpersist()
}

Thanks!
Re: Unpersist return type

Sean Owen-2
Probably for purposes of chaining, though it won't be very useful here; e.g. df.unpersist().cache(... some other settings ...).

foreachBatch wants a function that evaluates to Unit, but this still qualifies: it doesn't matter what the value of the block is if that value is ignored.
This does seem to compile for me; are you sure? What error do you get? It may not be related to the return type.
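Sean's two points (a this.type return enables chaining, and a block whose last expression is not Unit still satisfies an expected Unit-returning function type via Scala's value discarding) can be sketched without Spark; Cacheable below is a hypothetical stand-in for Dataset, not the real API:

```scala
// Spark-free sketch; Cacheable is a hypothetical stand-in for Dataset.
class Cacheable {
  private var cached = false
  // Returning this.type makes calls chainable, e.g. c.unpersist().persist(),
  // because each call evaluates to the receiver itself.
  def persist(): this.type = { cached = true; this }
  def unpersist(): this.type = { cached = false; this }
  def isCached: Boolean = cached
}

val c = new Cacheable
c.persist().unpersist().persist() // chaining works thanks to this.type
println(c.isCached)               // true

// Value discarding: the lambda's body evaluates to a Cacheable (not Unit),
// yet it still satisfies the expected Cacheable => Unit type; the result is
// simply dropped, which is why a foreachBatch-style block still type-checks.
val release: Cacheable => Unit = c2 => c2.unpersist()
release(c)
println(c.isCached)               // false
```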


On Thu, Oct 22, 2020 at 5:40 AM German Schiavon <[hidden email]> wrote:
Re: Unpersist return type

German Schiavon

On Thu, 22 Oct 2020 at 15:53, Sean Owen <[hidden email]> wrote:
Re: Unpersist return type

Sean Owen-2
That's a compile error. If it said the call were ambiguous, I'd say this is probably another instance where the legacy overloads for Java become ambiguous in Scala 2.12 / Spark 3.0, so you have to cast your function to the specific Scala overload. That's not quite the error here, but it might be worth trying.
As you say, you can also put Unit at the end of the block to 'fix' it as a workaround, but that shouldn't be necessary.
I'm not sure why the second overload doesn't apply, since DataFrame is a Dataset[Row].
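The kind of overload ambiguity Sean suspects can be reproduced without Spark: in Scala 2.12 a function literal can undergo SAM conversion, so it may match both a Scala-function overload and a Java-style interface overload, and ascribing the function an explicit type picks one. Sink and VoidFunction2 below are hypothetical stand-ins, not the actual Spark API:

```scala
// Hypothetical stand-ins; not the real Spark classes.
trait VoidFunction2[A, B] { def call(a: A, b: B): Unit }

object Sink {
  // Two overloads in the spirit of foreachBatch: Scala function vs Java SAM.
  def foreachBatch(f: (String, Long) => Unit): String = "scala overload"
  def foreachBatch(f: VoidFunction2[String, java.lang.Long]): String = "java overload"
}

// In Scala 2.12 a bare lambda could also SAM-convert to VoidFunction2, so a
// call like Sink.foreachBatch((s, n) => println(s)) can be reported ambiguous.

// Workaround: give the function an explicit type so only one overload applies
// (an already-typed Function2 value does not SAM-convert to the trait).
val handler: (String, Long) => Unit = (s, n) => ()
println(Sink.foreachBatch(handler)) // resolves to the Scala overload
```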

On Thu, Oct 22, 2020 at 9:02 AM German Schiavon <[hidden email]> wrote:
