Remove non-Tungsten mode in Spark 3?

Sean Owen
Just wondering if there is a good reason to keep the pre-Tungsten
on-heap memory mode around for Spark 3, rather than making
spark.memory.offHeap.enabled always true? It would simplify the code
somewhat, but I'm not sure I fully understand the tradeoffs.

I know we didn't deprecate it, but it's been off by default for a long
time. It could be deprecated, too.

Same question for spark.memory.useLegacyMode and all of its
associated settings. It seems like these should go away at some point,
and Spark 3 is a good time to do it. The same deprecation issue
applies, though.
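
For readers unfamiliar with the settings in question, here is a minimal sketch of how a job's configuration could be checked for them. The key names are the legacy memory settings as documented for Spark 1.6 through 2.x; the function name and the example configuration are hypothetical and only serve as illustration.

```python
# Legacy memory-management settings as documented for Spark 1.6-2.x.
# The fraction settings are only read when spark.memory.useLegacyMode is true.
LEGACY_MEMORY_KEYS = (
    "spark.memory.useLegacyMode",
    "spark.shuffle.memoryFraction",
    "spark.storage.memoryFraction",
    "spark.storage.unrollFraction",
)


def find_legacy_memory_settings(conf):
    """Return the legacy memory keys present in a config dict, sorted by name."""
    return sorted(key for key in conf if key in LEGACY_MEMORY_KEYS)


if __name__ == "__main__":
    # Hypothetical job configuration, for illustration only.
    example_conf = {
        "spark.executor.memory": "4g",
        "spark.memory.useLegacyMode": "true",
        "spark.storage.memoryFraction": "0.5",
    }
    for key in find_legacy_memory_settings(example_conf):
        print("legacy memory setting in use:", key)
```

A check like this would only flag settings that are set explicitly; it says nothing about whether a job actually depends on the legacy behavior.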

Re: Remove non-Tungsten mode in Spark 3?

rxin
The issue with off-heap mode is that it is a pretty big behavior change and requires additional setup. It also might not work as well for users whose UDFs allocate a lot of heap memory.

I can see us removing the legacy mode since it's been legacy for a long time and perhaps very few users need it. How much code does it remove though?
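
As a rough illustration of the "additional setup" mentioned above: per the Spark configuration documentation, spark.memory.offHeap.size must be set to a positive value when spark.memory.offHeap.enabled is true, and off-heap memory is not part of the JVM heap, so it has to be budgeted separately. The sketch below assumes that is the setup being referred to; the helper names are hypothetical.

```python
def offheap_settings(size_bytes):
    """Build the two settings needed to turn on off-heap memory.

    Off-heap memory is allocated outside the JVM heap, so the size has
    to be chosen explicitly and must be positive.
    """
    if size_bytes <= 0:
        raise ValueError(
            "spark.memory.offHeap.size must be positive when off-heap is enabled"
        )
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": str(size_bytes),
    }


def to_submit_args(settings):
    """Render a settings dict as spark-submit style --conf arguments."""
    args = []
    for key in sorted(settings):
        args += ["--conf", f"{key}={settings[key]}"]
    return args


if __name__ == "__main__":
    # 2 GiB of off-heap memory, chosen arbitrarily for illustration.
    print(" ".join(to_submit_args(offheap_settings(2 * 1024**3))))
```

Making the setting always true would mean every deployment needs a sensible size, which is part of why it would be a visible behavior change.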


On Thu, Jan 03, 2019 at 2:55 PM, Sean Owen <[hidden email]> wrote:

Just wondering if there is a good reason to keep the pre-Tungsten on-heap memory mode around for Spark 3, rather than making spark.memory.offHeap.enabled always true? It would simplify the code somewhat, but I'm not sure I fully understand the tradeoffs.

I know we didn't deprecate it, but it's been off by default for a long time. It could be deprecated, too.

Same question for spark.memory.useLegacyMode and all of its associated settings. It seems like these should go away at some point, and Spark 3 is a good time to do it. The same deprecation issue applies, though.

Re: Remove non-Tungsten mode in Spark 3?

Sean Owen
OK, maybe leave Tungsten as it is for 3.0.
I did a quick check, and removing StaticMemoryManager saves a few hundred lines. It's used internally in MemoryStore tests though, so removing it is not a trivial change. It's also used directly in HashedRelation. It could still be worth removing it as a user-facing option to reduce confusion about memory tuning, but it wouldn't take out much code. What do you all think?
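
To show why two memory models can confuse tuning, here is a small arithmetic sketch comparing them. It is a simplified model and not Spark's actual code: the default fractions (0.6/0.9 and 0.2/0.8 for the static manager; 0.6, 0.5 and a 300 MiB reserve for the unified one) are assumptions recalled from the Spark 2.x documentation and source, and the function names are made up for this example.

```python
def static_regions_mib(heap_mib, storage_fraction=0.6, storage_safety=0.9,
                       shuffle_fraction=0.2, shuffle_safety=0.8):
    """Legacy model: storage and execution get fixed, separate regions."""
    return {
        "storage": heap_mib * storage_fraction * storage_safety,
        "execution": heap_mib * shuffle_fraction * shuffle_safety,
    }


def unified_regions_mib(heap_mib, memory_fraction=0.6, storage_fraction=0.5,
                        reserved_mib=300):
    """Unified model: one shared region; part of it protects cached blocks."""
    shared = (heap_mib - reserved_mib) * memory_fraction
    return {
        "shared": shared,
        "storage_protected": shared * storage_fraction,
    }


if __name__ == "__main__":
    heap_mib = 4096  # arbitrary example heap size
    print("static: ", static_regions_mib(heap_mib))
    print("unified:", unified_regions_mib(heap_mib))
```

In the static model the two regions cannot borrow from each other, while in the unified model execution and storage share one pool. The same heap size therefore gives quite different effective limits depending on which mode is active.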

On Thu, Jan 3, 2019 at 9:41 PM Reynold Xin <[hidden email]> wrote:
The issue with off-heap mode is that it is a pretty big behavior change and requires additional setup. It also might not work as well for users whose UDFs allocate a lot of heap memory.

I can see us removing the legacy mode since it's been legacy for a long time and perhaps very few users need it. How much code does it remove though?


Re: Remove non-Tungsten mode in Spark 3?

Erik Erlandson
Removing the user-facing config seems like a good idea from the standpoint of reducing both cognitive load and documentation.

On Fri, Jan 4, 2019 at 7:03 AM Sean Owen <[hidden email]> wrote:
OK, maybe leave Tungsten as it is for 3.0.
I did a quick check, and removing StaticMemoryManager saves a few hundred lines. It's used internally in MemoryStore tests though, so removing it is not a trivial change. It's also used directly in HashedRelation. It could still be worth removing it as a user-facing option to reduce confusion about memory tuning, but it wouldn't take out much code. What do you all think?

Re: Remove non-Tungsten mode in Spark 3?

Sean Owen
I haven't touched Tungsten, but have proposed removing the deprecated old memory manager and settings -- yes I think that's the primary argument for it.

On Wed, Jan 9, 2019 at 6:06 PM Erik Erlandson <[hidden email]> wrote:
Removing the user-facing config seems like a good idea from the standpoint of reducing both cognitive load and documentation.
