Re: Why Spark generates Java code and not Scala?


Re: Why Spark generates Java code and not Scala?

Holden Karau

Switching this from user to dev

On Sat, Nov 9, 2019 at 9:47 AM Bartosz Konieczny <[hidden email]> wrote:
> Hi there,
>
> A few days ago I got an intriguing but hard-to-answer question:
> "Why does Spark generate Java code and not Scala code?"
>
> Since I'm not sure about the exact answer, I'd like to ask you to confirm or refute my thinking. I looked for the reasons in JIRA and in the research paper "Spark SQL: Relational Data Processing in Spark" (http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf), but found nothing explaining why Java over Scala. The only ticket I found was about why Scala and not Java, but it concerned data types (https://issues.apache.org/jira/browse/SPARK-5193). That's why I'm writing here.
>
> My guesses about choosing Java code are:
> - Java runtime compiler libs are more mature and prod-ready than Scala's - or at least, they were at the time of implementation

From the discussions when I was doing some code gen (in MLlib, not SQL), I think this is the primary reason why.

> - the Scala compiler seems to be more complex, so debugging & maintaining it would be harder

This was also given as a secondary reason.

> - it was easier to represent a pure Java OO design than a mixed FP/OO design in Scala

No one brought up this point. Maybe it was a consideration and it just wasn’t raised.

> ?
>
> Thank you for your help.
--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
Re: Why Spark generates Java code and not Scala?

rxin
It’s mainly due to compilation speed. The Scala compiler is known to be slow, and even javac is quite slow. We use Janino, which is a simpler compiler, to get faster compilation at runtime.
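
For illustration, here is a minimal sketch (in Scala, since that is what Spark itself is written in) of compiling generated Java source at runtime with Janino: the Java class is kept as a plain string, cooked in-process by Janino's SimpleCompiler, and then loaded and invoked reflectively. The class name GeneratedAdder and the surrounding driver are invented for this example; Spark's real codegen pipeline wraps much more machinery around the same idea.

    // Requires the org.codehaus.janino:janino dependency.
    import org.codehaus.janino.SimpleCompiler

    object JaninoSketch {
      def main(args: Array[String]): Unit = {
        // Generated "Java" source kept as a string, the way codegen builds
        // its classes before handing them to the runtime compiler.
        val javaSource =
          """
            |public class GeneratedAdder {
            |  public int add(int a, int b) {
            |    return a + b;
            |  }
            |}
          """.stripMargin

        val compiler = new SimpleCompiler()
        compiler.cook(javaSource)  // compiles in-process, no external javac

        val clazz = compiler.getClassLoader.loadClass("GeneratedAdder")
        val adder = clazz.getDeclaredConstructor().newInstance()
        val add = clazz.getMethod("add", classOf[Int], classOf[Int])

        println(add.invoke(adder, Int.box(20), Int.box(22)))  // prints 42
      }
    }

Because Janino runs entirely inside the JVM and does much less work than a full javac invocation, the per-query compilation cost stays small, which is the speed win described above.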

Also, for low-level code we can’t use (due to perf concerns) any of the edges Scala has over Java, e.g. we can’t use the Scala collection library, functional programming, map/flatMap. So using Scala doesn’t really buy us anything even if there were no compilation speed concerns.
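
To make that concrete, here is a toy hand-written Scala contrast (not actual generated code, and the function names are made up): the imperative while loop mirrors the shape that generated per-row code takes, with primitive locals, no closures and no intermediate collections, while the idiomatic collection version builds intermediate results on what would be a hot path.

    object LoopStyles {
      // Imperative style, similar in spirit to what codegen emits:
      // a tight loop over primitives with no allocation per element.
      def sumOfDoubledEvens(xs: Array[Int]): Long = {
        var i = 0
        var sum = 0L
        while (i < xs.length) {
          val x = xs(i)
          if (x % 2 == 0) sum += x * 2L
          i += 1
        }
        sum
      }

      // Idiomatic Scala collections: clearer, but filter and map each
      // build an intermediate array before the final sum.
      def sumOfDoubledEvensFP(xs: Array[Int]): Long =
        xs.filter(_ % 2 == 0).map(_ * 2L).sum

      def main(args: Array[String]): Unit = {
        val data = Array(1, 2, 3, 4, 5)
        println(sumOfDoubledEvens(data))    // 12
        println(sumOfDoubledEvensFP(data))  // 12
      }
    }

Both return the same result; the point is only that the generated per-row code is written in the first style on purpose.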
