Object serialization for workers

R. Tyler Croy

Greetings! I am looking into the possibility of JRuby support for Spark, and
could use some pointers (references?) to orient myself a bit better within the
codebase.

JRuby fat jars load just fine in Spark, but things get predictably dicey with
object serialization for RDDs being sent to the workers.

Having worked on something similar for Apache Storm
(https://github.com/jruby-gradle/redstorm), what we ended up doing was shimming
some classes to handle Ruby object/class serialization properly.

I'm expecting to do something similar in Spark but I'm not entirely sure which
interfaces/classes describe the serialization of RDDs. I'm figuring that I'll
need to implement a Ruby equivalent of the org.apache.spark.api.java.function
namespaces, but am not entirely sure where the pieces come together to turn those
into serialized objects.
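
To make the question concrete, here is a minimal sketch of the round trip a
function object would presumably have to survive on its way to a worker. It
assumes that function objects are shipped using plain Java serialization, which
is my understanding of the default behaviour rather than something confirmed
in this thread. SerializableFunction and AddOffset are hypothetical stand-ins
for illustration, not Spark classes; a JRuby shim would need to produce
objects that pass the same kind of round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureRoundTrip {

    // Hypothetical stand-in for an interface such as those in
    // org.apache.spark.api.java.function (assumption: they are Serializable).
    public interface SerializableFunction<T, R> extends Serializable {
        R call(T input) throws Exception;
    }

    // A function object carrying captured state, as a user closure would.
    public static final class AddOffset implements SerializableFunction<Integer, Integer> {
        private static final long serialVersionUID = 1L;
        private final int offset;

        public AddOffset(int offset) {
            this.offset = offset;
        }

        @Override
        public Integer call(Integer input) {
            return input + offset;
        }
    }

    // Turn an object into bytes with standard Java serialization.
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from bytes, as the receiving side would.
    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableFunction<Integer, Integer> fn = new AddOffset(10);
        byte[] wire = serialize(fn);
        SerializableFunction<Integer, Integer> copy = deserialize(wire);
        // The deserialized copy should behave like the original.
        System.out.println(copy.call(32));
    }
}
```

A Ruby proc wrapped in a Java object would fail at the serialize step unless
everything it references is itself serializable, which is the part the shim
classes would have to take care of.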


Appreciate any direction you all might be able to share. In the meantime, I've
got my miner's cap on and am presently digging through core/ :)



Cheers

--
GitHub:  https://github.com/rtyler

GPG Key ID: 0F2298A980EE31ACCA0A7825E5C92681BEF6CEA2
