Whole-stage codegen and SparkPlan.newPredicate


Whole-stage codegen and SparkPlan.newPredicate

Jacek Laskowski
Hi,

While working on an issue with whole-stage codegen, as reported at https://stackoverflow.com/q/48026060/1305344, I found out that spark.sql.codegen.wholeStage=false does *not* turn whole-stage codegen off completely.

It looks like SparkPlan.newPredicate [1] gets called regardless of the value of the spark.sql.codegen.wholeStage property.

$ ./bin/spark-shell --conf spark.sql.codegen.wholeStage=false
...
scala> spark.sessionState.conf.wholeStageEnabled
res7: Boolean = false

That leads to the issue from the SO question, which shows up regardless of the property's value:

...
  at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:385)
  at org.apache.spark.sql.execution.FilterExec$$anonfun$18.apply(basicPhysicalOperators.scala:214)
  at org.apache.spark.sql.execution.FilterExec$$anonfun$18.apply(basicPhysicalOperators.scala:213)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:816) 
...

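Is this a bug or does it work as intended? Why?

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala?utf8=%E2%9C%93#L386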


Pozdrawiam,
Jacek Laskowski
----
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark

Re: Whole-stage codegen and SparkPlan.newPredicate

Herman van Hövell tot Westerflier-2
Hi Jacek,

In this case, whole-stage code generation is turned off. However, we still use code generation for a lot of other things: projections, predicates, orderings, and encoders. You are currently seeing a compile-time failure while generating a predicate. There is currently no easy way to turn code generation off entirely.
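To make the distinction above concrete, here is a minimal sketch, assuming a Spark 2.x build with spark-sql on the classpath (it can also be pasted into spark-shell). The app name and the query are placeholders chosen for illustration, not taken from the Stack Overflow question. It only shows that the flag is off while a Filter still executes; per the stack trace in the original message, that execution goes through SparkPlan.newPredicate.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode and a placeholder app name, for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("wholestage-off-sketch")
  .config("spark.sql.codegen.wholeStage", "false")
  .getOrCreate()

// The property only controls whole-stage codegen.
val wholeStage = spark.conf.get("spark.sql.codegen.wholeStage")
println(s"spark.sql.codegen.wholeStage = $wholeStage")

// A placeholder query with a predicate. Inspect the printed physical plan
// to see whether it is wrapped in a whole-stage codegen node.
val evens = spark.range(10).filter("id % 2 = 0")
evens.explain()

// Executing the query runs FilterExec, which (per the stack trace above)
// calls SparkPlan.newPredicate, so code is still generated for the predicate.
val result = evens.collect()
println(result.mkString(", "))
```

If a predicate cannot be compiled, the failure would surface at the collect() step, not when the flag is read.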

The error itself is not great, but it still captures the problem in a relatively timely fashion. We should have caught this during analysis though. Can you file a ticket?

- Herman

In reply to Jacek Laskowski's message of Sat, Dec 30, 2017 at 9:16 AM.


Re: Whole-stage codegen and SparkPlan.newPredicate

Kazuaki Ishizaki
I ran the program from the Stack Overflow URL with Spark 2.2.1 and master. I do not see the exception, even with whole-stage codegen disabled. Am I wrong?
We would appreciate it if you could create a JIRA entry with a simple standalone repro.

In addition to this report, I realized that this program produces incorrect results. I created a JIRA entry for that: https://issues.apache.org/jira/browse/SPARK-22934

Best Regards,
Kazuaki Ishizaki



In reply to Herman van Hövell tot Westerflier's message of 2017/12/31 21:44.


Re: Whole-stage codegen and SparkPlan.newPredicate

Herman van Hövell tot Westerflier-2
Wrong ticket: https://issues.apache.org/jira/browse/SPARK-22935

Thanks for working on this :)

In reply to Kazuaki Ishizaki's message of Mon, Jan 1, 2018 at 2:22 PM.


Re: Whole-stage codegen and SparkPlan.newPredicate

Kazuaki Ishizaki
Thank you for your correction :)

I also made a mistake in my report. What I reported at first never occurs with the correct Java bean class.
I can now reproduce the problem that Jacek reported, even on master. In my environment, this problem occurs with or without whole-stage codegen. I updated the JIRA ticket.
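
One way to run such a "with or without whole-stage codegen" comparison is sketched below, assuming Spark 2.x or later with spark-sql on the classpath. The helper name runWith, the app name, and the query are hypothetical placeholders; this is not the program from the Stack Overflow question or the JIRA ticket. It runs the same query under both settings and counts the WholeStageCodegenExec nodes in the executed plan.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.WholeStageCodegenExec

// Assumption: local mode and a placeholder app name, for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("wholestage-toggle-sketch")
  .getOrCreate()

// Hypothetical helper: runs a placeholder query under the given setting and
// returns (number of whole-stage codegen nodes in the plan, row count).
def runWith(wholeStage: Boolean): (Int, Long) = {
  spark.conf.set("spark.sql.codegen.wholeStage", wholeStage.toString)
  // Build the Dataset after setting the flag so planning sees the new value.
  val df = spark.range(100).filter("id > 50")
  val stages = df.queryExecution.executedPlan.collect {
    case w: WholeStageCodegenExec => w
  }.size
  (stages, df.count())
}

val off = runWith(wholeStage = false)
val on = runWith(wholeStage = true)
println(s"wholeStage=false: stages=${off._1}, rows=${off._2}")
println(s"wholeStage=true:  stages=${on._1}, rows=${on._2}")
```

Swapping the placeholder query for a real repro would show whether a failure depends on the flag at all.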

I am still working on this.

Kazuaki Ishizaki



In reply to Herman van Hövell tot Westerflier's message of 2018/01/02 04:12.


Re: Whole-stage codegen and SparkPlan.newPredicate

Jacek Laskowski
Thanks for looking into it, Kazuaki!


In reply to Kazuaki Ishizaki's message of Tue, Jan 2, 2018 at 4:27 AM.