Presently, Apache Spark supports Python and R via a tightly integrated interop layer. It would seem that much of that existing interop layer could be refactored into a clean surface for general (third-party) language bindings, such as those mentioned above. More specifically, could we generalize the following modules:
Deploy runners (e.g., PythonRunner and RRunner)
Generalizing at the RDD level is questionable: integrating third-party language extensions there may be too heavyweight and unnecessary, given the preference for the DataFrame abstraction.
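To make the deploy-runner idea concrete, here is a minimal sketch of what a language-neutral runner description might look like. It is not taken from Spark: the names LanguageRunnerSpec, build_launch, the "mylang" binding, and the MYLANG_BACKEND_PORT variable are all hypothetical. It assumes only that a runner's language-specific part is which executable to launch and how the guest process learns the port of the JVM-side backend.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LanguageRunnerSpec:
    """Hypothetical per-language settings a generic deploy runner would need."""
    name: str                          # label for the language binding
    executable: str                    # interpreter or launcher to invoke
    port_env_var: str                  # env var carrying the JVM backend port
    extra_args: Tuple[str, ...] = ()   # fixed arguments placed before the script


def build_launch(
    spec: LanguageRunnerSpec,
    script: str,
    script_args: Sequence[str],
    backend_port: int,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the command line and environment for the guest-language process.

    Nothing is launched here; the caller decides how to spawn the process.
    """
    env = dict(base_env or {})                  # copy so the caller's dict is untouched
    env[spec.port_env_var] = str(backend_port)  # tell the guest where to connect
    command = [spec.executable, *spec.extra_args, script, *script_args]
    return command, env


# Hypothetical third-party binding, for illustration only.
mylang = LanguageRunnerSpec(
    name="mylang",
    executable="mylang-run",
    port_env_var="MYLANG_BACKEND_PORT",
)
command, env = build_launch(mylang, "job.ml", ["--input", "data.csv"], 25333)
```

Under this assumption, the language-specific runners would reduce to data (one spec per language), and the launching logic could live in a shared library.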
The main goals of this effort would be:
Provide a clean abstraction for third-party language extensions, making each extension easier to maintain as Apache Spark evolves
Provide guidance to third-party language authors on how a language extension should be implemented
Provide general, reusable libraries that are not specific to any language extension
Open the door to developers who prefer alternative languages