RDD object Out of scope.

RDD object Out of scope.

Nasrulla Khan Haris

Hi Spark developers,

Can someone point out the code where RDD objects go out of scope? I found the ContextCleaner code, in which only persisted RDDs are cleaned up at regular intervals, and only if the RDD is registered for cleanup. I have not found where the destructor for an RDD object is invoked. I am trying to understand when cleanup happens for an RDD that is not persisted.

Thanks in advance; I appreciate your help.

Nasrulla

Re: RDD object Out of scope.

cloud0fan
An RDD is kind of a pointer to the actual data. Unless it's cached, we don't need to clean up the RDD.
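
To make the "pointer" idea concrete, here is a toy model in plain Java. It is not Spark code: `LazyDataset`, `range`, `map`, `cache`, `collect`, and `BLOCK_STORE` are hypothetical names that only loosely echo the RDD API. The dataset object holds a recipe for computing its data; nothing is materialized until an action runs, and only a cached dataset leaves data behind, in a separate store that (by assumption) stands in for storage living outside the driver-side object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model, not Spark's API: a dataset object holds only a recipe (its lineage).
// Materialized data lives in a separate store, and only if cache() was requested.
public class LazyDataset<T> {
    // Hypothetical stand-in for storage that lives outside the dataset object.
    static final Map<Integer, List<?>> BLOCK_STORE = new HashMap<>();
    static int sourceComputations = 0;  // how many times a source recipe has run
    private static int nextId = 0;

    final int id = nextId++;
    private final Supplier<List<T>> recipe;
    private boolean cacheRequested = false;

    LazyDataset(Supplier<List<T>> recipe) { this.recipe = recipe; }

    // Source dataset: the numbers 0..n-1, produced only when an action runs.
    static LazyDataset<Integer> range(int n) {
        return new LazyDataset<>(() -> {
            sourceComputations++;
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < n; i++) out.add(i);
            return out;
        });
    }

    // A transformation only wraps the parent's recipe; no data is produced here.
    <R> LazyDataset<R> map(Function<T, R> f) {
        return new LazyDataset<>(() -> {
            List<R> out = new ArrayList<>();
            for (T t : collect()) out.add(f.apply(t));  // collect() of the parent
            return out;
        });
    }

    LazyDataset<T> cache() {
        cacheRequested = true;
        return this;
    }

    // An "action": reuse stored data if cached, otherwise recompute from the recipe.
    @SuppressWarnings("unchecked")
    List<T> collect() {
        if (cacheRequested && BLOCK_STORE.containsKey(id)) {
            return (List<T>) BLOCK_STORE.get(id);
        }
        List<T> result = recipe.get();
        if (cacheRequested) BLOCK_STORE.put(id, result);  // the only state needing cleanup
        return result;
    }

    public static void main(String[] args) {
        LazyDataset<Integer> squares = range(5).map(x -> x * x);
        System.out.println(squares.collect());      // [0, 1, 4, 9, 16]
        squares.collect();
        System.out.println(sourceComputations);     // 2: recomputed on every action
        System.out.println(BLOCK_STORE.isEmpty());  // true: nothing was stored

        LazyDataset<Integer> kept = range(3).cache();
        kept.collect();
        kept.collect();
        System.out.println(sourceComputations);     // 3: computed once, then reused
        System.out.println(BLOCK_STORE.size());     // 1: stays until something removes it
    }
}
```

The uncached dataset is recomputed on every action and leaves nothing behind, so dropping the object is enough. The cached one leaves an entry in `BLOCK_STORE` that outlives the object unless something removes it, which is the kind of state a cleaner has to track.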

(In reply to Nasrulla Khan Haris, Tue, May 21, 2019 at 1:48 PM.)

RE: RDD object Out of scope.

Nasrulla Khan Haris

Thanks for the reply, Wenchen. I am curious about what happens when an RDD that is not cached goes out of scope.

Nasrulla

(In reply to Wenchen Fan, Tuesday, May 21, 2019 6:28 AM.)

Re: RDD object Out of scope.

Charoes
If you cached an RDD and hold a reference to that RDD in your code, then your RDD will NOT be cleaned up.
There is a ReferenceQueue in ContextCleaner, which is used to keep track of references to RDDs, Broadcasts, Accumulators, etc.
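
The weak-reference pattern described above can be sketched with the JDK's own `java.lang.ref` classes. This is a simplified model, not Spark's implementation: `MiniCleaner`, `CleanupRef`, `register`, and `processQueue` are hypothetical names. Registered objects are tracked through `WeakReference`s tied to a `ReferenceQueue`; once the GC finds a referent only weakly reachable, it clears the reference and enqueues it, and draining the queue tells the cleaner which id to clean up. A real cleaner would presumably drain the queue on a background thread; here it is a plain method call.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch of a weak-reference cleaner (hypothetical, not Spark's code).
public class MiniCleaner {
    // A weak reference that remembers which id to clean up after its referent is collected.
    static final class CleanupRef extends WeakReference<Object> {
        final int id;

        CleanupRef(Object referent, int id, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.id = id;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Hold the reference objects strongly; an unreachable Reference is never enqueued.
    private final Set<CleanupRef> buffer = Collections.synchronizedSet(new HashSet<>());
    final List<Integer> cleanedIds = new ArrayList<>();

    // Only objects that own external state (e.g. cached data) need to be registered.
    CleanupRef register(Object tracked, int id) {
        CleanupRef ref = new CleanupRef(tracked, id, queue);
        buffer.add(ref);
        return ref;
    }

    // Waits up to timeoutMs (must be > 0; 0 would block forever) for one enqueued
    // reference, then drains any others that are already waiting.
    int processQueue(long timeoutMs) {
        int cleaned = 0;
        try {
            Reference<?> r = queue.remove(timeoutMs);
            while (r != null) {
                CleanupRef ref = (CleanupRef) r;
                buffer.remove(ref);
                cleanedIds.add(ref.id);  // a real cleaner would release the id's resources here
                cleaned++;
                r = queue.poll();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
        }
        return cleaned;
    }

    public static void main(String[] args) {
        MiniCleaner cleaner = new MiniCleaner();
        Object stillUsed = new Object();
        cleaner.register(stillUsed, 1);
        CleanupRef dropped = cleaner.register(new Object(), 2);
        dropped.enqueue();  // simulate the GC enqueuing an unreachable referent
        System.out.println(cleaner.processQueue(100) + " " + cleaner.cleanedIds);  // 1 [2]
        Reference.reachabilityFence(stillUsed);  // keep id 1's referent strongly reachable
    }
}
```

Calling `enqueue()` by hand simulates what the GC would do, which keeps the demo deterministic because `System.gc()` is only a request. Keeping the reference objects in `buffer` matters: if a `WeakReference` itself became unreachable, it could be collected without ever being enqueued.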

(In reply to Nasrulla Khan Haris, Wed, May 22, 2019 at 1:07 AM.)

RE: RDD object Out of scope.

Nasrulla Khan Haris

I am trying to find the code that cleans up an uncached RDD.

Thanks,

Nasrulla

(In reply to Charoes, Tuesday, May 21, 2019 5:10 PM.)

Re: RDD object Out of scope.

Sean Owen-2
I'm not clear on what you're asking. An RDD itself is just an object in
the JVM. It will be garbage collected if there are no references. What
else would there be to clean up in your case? ContextCleaner handles
cleanup of persisted RDDs, etc.
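
The point that an RDD is just a JVM object can be checked with a small plain-Java experiment. `GcDemo`, `Handle`, and `waitForCollection` are hypothetical names; a `WeakReference` serves only as a probe to observe collection, and since `System.gc()` is merely a request, the final result is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;

// Shows that an ordinary JVM object needs no explicit "destructor": once nothing
// strongly references it, the garbage collector is free to reclaim it.
public class GcDemo {
    // Hypothetical stand-in for a driver-side object holding only metadata.
    static final class Handle {
        final int id;

        Handle(int id) { this.id = id; }
    }

    // Requests GC a few times and reports whether the probe's referent was cleared.
    static boolean waitForCollection(WeakReference<?> probe, int attempts) {
        for (int i = 0; i < attempts && probe.get() != null; i++) {
            System.gc();  // only a request; the JVM is free to ignore it
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return probe.get() == null;
    }

    public static void main(String[] args) {
        Handle handle = new Handle(42);
        WeakReference<Handle> probe = new WeakReference<>(handle);
        // While a strong reference is still in use, the object cannot be collected.
        System.out.println("collected while referenced: " + waitForCollection(probe, 3));
        System.out.println("id: " + handle.id);
        handle = null;  // drop the only strong reference; no cleanup call is made
        // Usually true on common JVM configurations, but not guaranteed.
        System.out.println("collected after release: " + waitForCollection(probe, 50));
    }
}
```

Nothing in the program frees `Handle` explicitly; dropping the last reference is all it takes, which is the situation of an uncached RDD, as opposed to a persisted one whose cached data is handled by ContextCleaner.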

(In reply to Nasrulla Khan Haris, Tue, May 21, 2019 at 7:39 PM.)

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

RE: RDD object Out of scope.

Nasrulla Khan Haris
Thanks, Sean. That makes sense.

Regards,
Nasrulla

(In reply to Sean Owen, Tuesday, May 21, 2019 6:24 PM.)