Nested "struct" function call creates a compilation error in Spark SQL

Olivier Girardot-2
Hi everyone, 
when we nest calls to "struct" (up to 5 levels deep) to extend a complex data structure, we end up with the following compilation error:

org.codehaus.janino.JaninoRuntimeException: Code of method "(I[Lscala/collection/Iterator;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB

The CreateStruct code itself properly uses the ctx.splitExpression command, but the code generated for the df.select( struct(struct(struct(....) ))) as a whole still ends up being too large.
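To make the nesting pattern concrete, here is a minimal sketch in Python that builds a Spark SQL expression with an arbitrary number of nested struct levels. It is not the code from this thread and may well be too small to hit the 64 KB limit. The column names `a` and `b`, the field name `inner`, and both helper functions are hypothetical. It uses the built-in SQL function `named_struct` and assumes an existing SparkSession is passed in.

```python
def nested_struct_expr(leaf_columns, depth):
    """Return a Spark SQL expression nesting named_struct() `depth` levels deep.

    Each level carries the leaf columns plus the previous level under the
    hypothetical field name 'inner'.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # "'a', a, 'b', b": alternating field-name literals and column references
    fields = ", ".join("'{0}', {0}".format(c) for c in leaf_columns)
    expr = "named_struct({})".format(fields)
    for _ in range(depth - 1):
        # Wrap the previous level inside a new outer struct
        expr = "named_struct({}, 'inner', {})".format(fields, expr)
    return expr


def explain_nested(spark, depth):
    """Print the extended plan for a `depth`-level struct (needs a live SparkSession)."""
    df = spark.range(1).selectExpr("id AS a", "id * 2 AS b")
    df.selectExpr(nested_struct_expr(["a", "b"], depth) + " AS nested").explain(True)
```

Calling `explain_nested(spark, 2)` and then `explain_nested(spark, 5)` prints the two plans so they can be compared side by side.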

Should I open a JIRA, or is there a workaround?

Regards,

--
Olivier Girardot | Associé

Re: Nested "struct" function call creates a compilation error in Spark SQL

Michael Armbrust
Which version of Spark? If it's recent, I'd open a JIRA.

On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot <[hidden email]> wrote:
[...]


Re: Nested "struct" function call creates a compilation error in Spark SQL

Olivier Girardot-2
Hi Michael, 
Spark 2.0.2, but I actually have a very interesting test case.
The optimiser seems to be at fault in a way: I've attached to this email the explain output for when I limit myself to 2 levels of struct mutation and for when it goes to 5.
As you can see, the optimiser seems to be doing a lot more in the latter case.
After further investigation, the code is not "failing" per se: Spark tries whole-stage codegen, the compilation of the generated code fails with the error quoted earlier, and I think it then falls back to the "non codegen" path.
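One way to check whether whole-stage codegen is involved is to switch it off for the session and re-run the query. A minimal sketch, assuming a PySpark session object is passed in: `spark.sql.codegen.wholeStage` is an existing Spark SQL setting, while the helper name is hypothetical. This is only a diagnostic aid and may or may not avoid the error described here.

```python
# Spark SQL setting that controls whole-stage code generation.
WHOLE_STAGE_CODEGEN = "spark.sql.codegen.wholeStage"


def set_whole_stage_codegen(spark, enabled):
    """Enable or disable whole-stage codegen on a SparkSession-like object.

    Hypothetical helper: it only relies on `spark.conf.set(key, value)`.
    """
    spark.conf.set(WHOLE_STAGE_CODEGEN, "true" if enabled else "false")
```

With a live session, `set_whole_stage_codegen(spark, False)` followed by the failing query shows whether the exception still appears in the logs.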

I'll try to create a simpler test case to reproduce this if I can. What do you think?

Regards, 

Olivier.


2017-06-15 21:08 GMT+02:00 Michael Armbrust <[hidden email]>:
[...]




--
Olivier Girardot | Associé


Attachment: nested_2_levels (4K)
Attachment: nested_5_levels (354K)

Re: Nested "struct" function call creates a compilation error in Spark SQL

Michael Armbrust
You might also try with a newer version. Several instances of code generation failures have been fixed since 2.0.

On Thu, Jun 15, 2017 at 1:15 PM, Olivier Girardot <[hidden email]> wrote:
[...]