RandomForest caching


madhu phatak
Hi,

I am testing RandomForestClassification with 50 GB of data, which is cached in memory. The machine has 64 GB of RAM, of which 28 GB is used to cache the original dataset.

When I run the random forest, it caches around 300 GB of intermediate data, which evicts the original dataset from the cache. This caching is triggered by the following code in RandomForest.scala:

```
val baggedInput = BaggedPoint
  .convertToBaggedRDD(treeInput, strategy.subsamplingRate, numTrees, withReplacement, seed)
  .persist(StorageLevel.MEMORY_AND_DISK)
```

Since I have no control over this storage level, I cannot ensure that the original dataset stays in memory for other interactive tasks while the random forest is running.

Would it be a good idea to make this storage level a user parameter? If so, I can open a JIRA issue and submit a PR for it.
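To illustrate the idea, here is a minimal sketch of what a configurable storage level could look like. The object `ConfigurablePersist`, the method `persistWith`, and the `storageLevelName` parameter are hypothetical names used only for illustration; they are not part of Spark, and this is not how RandomForest.scala is written today. The only Spark calls used are `StorageLevel.fromString` and `RDD.persist`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper (not part of Spark): persists an RDD at a
// storage level chosen by the caller instead of a hard-coded one.
object ConfigurablePersist {

  // storageLevelName is an assumed user-facing parameter, e.g. "DISK_ONLY".
  // StorageLevel.fromString throws IllegalArgumentException for unknown names.
  def persistWith[T](rdd: RDD[T], storageLevelName: String): RDD[T] = {
    val level = StorageLevel.fromString(storageLevelName)
    rdd.persist(level)
  }
}
```

With something like this, the bagged input could be persisted with `"DISK_ONLY"` so that it does not compete for memory with the cached original dataset, while `"MEMORY_AND_DISK"` would keep the current behaviour.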

--
Regards,
Madhukara Phatak
http://datamantra.io/
Re: RandomForest caching

madhu phatak
Hi,
I have opened a JIRA issue for this.


Could someone take a look?

(In reply to my original post of Fri, Apr 28, 2017, 1:34 PM, above.)

--
Regards,
Madhukara Phatak
http://datamantra.io/