[SPARK-29176][DISCUSS] Optimization should change join type to CROSS


Enrico Minack

Hi,

I would like to discuss issue SPARK-29176 to see if this is considered a bug and if so, to sketch out a fix.

In short, the issue is that a valid inner join with a condition gets optimized so that no condition is left, but the join type is still INNER. CheckCartesianProducts then throws an exception. The type should change to CROSS when the join is optimized in that way.

I understand that with spark.sql.crossJoin.enabled you can stop Spark from throwing this exception, but I think you should not need this workaround for a valid query.
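
To make the behavior concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's code: the `Join` class, both rule functions, and the `cross_join_enabled` parameter are hypothetical stand-ins for the logical plan node, the optimizer rules, the CheckCartesianProducts step, and the spark.sql.crossJoin.enabled setting. It also assumes the simplest possible case, where the join condition folds to an always-true value.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Join:
    """Toy stand-in for a logical join node (not Spark's real class)."""
    left: str
    right: str
    join_type: str            # "INNER" or "CROSS"
    condition: Optional[str]  # None means "no join condition"


def simplify_condition(plan: Join) -> Join:
    """Hypothetical optimizer rule: drop a condition that is always true."""
    if plan.condition == "true":
        return replace(plan, condition=None)
    return plan


def simplify_condition_fixed(plan: Join) -> Join:
    """Same rule plus the proposed change: mark the result as CROSS."""
    optimized = simplify_condition(plan)
    if optimized.join_type == "INNER" and optimized.condition is None:
        return replace(optimized, join_type="CROSS")
    return optimized


def check_cartesian_products(plan: Join, cross_join_enabled: bool = False) -> Join:
    """Toy check: reject an implicit cartesian product unless it is allowed."""
    implicit = plan.join_type == "INNER" and plan.condition is None
    if implicit and not cross_join_enabled:
        raise ValueError(
            f"implicit cartesian product between {plan.left} and {plan.right}"
        )
    return plan


# The user wrote an inner join with a condition.
query = Join("a", "b", "INNER", "true")

# Current behavior in this model: the condition is removed, the type stays
# INNER, and the check rejects the plan.
try:
    check_cartesian_products(simplify_condition(query))
    outcome = "accepted"
except ValueError:
    outcome = "rejected"

# Proposed behavior: the rule also switches the type to CROSS, so the check passes.
fixed = check_cartesian_products(simplify_condition_fixed(query))

# Workaround: leave the rule alone and disable the check instead.
allowed = check_cartesian_products(simplify_condition(query), cross_join_enabled=True)
```

In this model the unmodified rule leaves an INNER join without a condition, which is exactly what the check rejects. Either changing the join type in the rule or disabling the check lets the same query through.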

Please let me know what you think about this issue and how I could fix it. It might affect more rules than the two given in the Jira ticket.

Thanks,
Enrico
Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

Sean Owen-2
You asked for an inner join but it turned into a cross join. This
might be surprising, hence the error, which you can disable.
The query is not invalid in any case. The check is just stopping you from
doing something you may not have meant to do, and which may be expensive.
However, I think we've already changed the default to enable it in
Spark 3 anyway.

(In reply to Enrico Minack's message of Wed, Nov 6, 2019 at 8:50 AM.)

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

Enrico Minack
So you are saying that the optimized inner join with no condition is also
a valid query?

Then I agree the optimizer is not breaking the query, hence it is not a bug.

Enrico

(In reply to Sean Owen's message of 06.11.19 at 15:53.)

