Spark 2.4.5 release for Parquet and Avro dependency updates?

Michael Heuer
Hello,

Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet).

Might there be any desire to cut a Spark 2.4.5 release so that users can pick up these changes independently of all the other changes in Spark 3.0?

Thank you in advance,

   michael

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Nan Zhu
I am not sure it is good practice to include breaking dependency changes in maintenance releases.

On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <[hidden email]> wrote:
> Hello,
>
> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet).
>
> Might there be any desire to cut a Spark 2.4.5 release so that users can pick up these changes independently of all the other changes in Spark 3.0?
>
> Thank you in advance,
>
>    michael

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Ryan Blue
In reply to this post by Michael Heuer
Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the same execution.

Michael, please be more careful about announcing compatibility problems in other communities. If you've observed problems, let's find out the root cause first.

rb

--
Ryan Blue
Software Engineer
Netflix
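
The situation Ryan describes, two Parquet versions in one execution, can be checked mechanically. The following is a minimal sketch and not something posted in this thread: it assumes jar files follow the usual `artifact-version.jar` naming, and the function name `mixed_versions` and the sample classpath are hypothetical, invented for illustration.

```python
import re
from collections import defaultdict

# Assumes the common "artifact-version.jar" naming, where the version starts
# with a digit, e.g. "parquet-column-1.10.1.jar" -> ("parquet-column", "1.10.1").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def mixed_versions(jar_names, family_prefix):
    """Return {version: [artifacts]} when jars of one family disagree on version.

    Returns an empty dict when the family is absent or uses a single version.
    """
    by_version = defaultdict(list)
    for name in jar_names:
        match = JAR_NAME.match(name)
        if match and match.group("artifact").startswith(family_prefix):
            by_version[match.group("version")].append(match.group("artifact"))
    return dict(by_version) if len(by_version) > 1 else {}


# Hypothetical classpath listing, not taken from any real Spark distribution.
classpath = [
    "parquet-column-1.10.1.jar",
    "parquet-hadoop-1.11.0.jar",
    "parquet-avro-1.11.0.jar",
    "avro-1.8.2.jar",
]

# A non-empty result means modules of the same family are on different versions.
print(mixed_versions(classpath, "parquet-"))
```

A non-empty result only indicates that the classpath itself is inconsistent; it says nothing about whether either version is incompatible with Spark when used alone.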

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Dongjoon Hyun
Hi, Michael.

I'm not sure Apache Spark is close to the state you want.

First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 use Avro 1.8.2, as do the `master` and `branch-2.4` branches. Cutting a new release would not give you what you want.

Do we have a PR on the master branch? If not, could you open a PR against the master branch before we start discussing releases? The same applies to Parquet.

Second, we want Apache Spark 3.0.0 to be as compatible as possible. An incompatible change could be a reason for rejection even on the `master` branch for Apache Spark 3.0.0.

Lastly, we may consider backporting if it lands on the `master` branch for 3.0.
However, as Nan Zhu said, a PR that backports a dependency upgrade is -1 by default. Usually, it's allowed only for serious cases such as security issues or production outages.

Bests,
Dongjoon.


On Fri, Nov 22, 2019 at 9:00 AM Ryan Blue <[hidden email]> wrote:
> Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the same execution.
>
> Michael, please be more careful about announcing compatibility problems in other communities. If you've observed problems, let's find out the root cause first.
>
> rb


Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Michael Heuer
Hello,

I am sorry for asking a somewhat inappropriate question.

For context, our projects depend on a fix that is in Parquet master but not yet released. Parquet 1.11.0 is in the release-candidate phase. It looks like we can't build against the Parquet 1.11.0 RC to include the fix and run successfully on Spark 2.4.x, which includes Parquet 1.10.1, without various classpath workarounds.

I see now that Spark policy requires the Avro upgrade to wait until Spark 3.0, and since the Parquet 1.11.0 RC currently depends on Avro 1.9.1, it may also have to wait. I'll continue to think about this within the Parquet community.

Thank you for the clarification,

   michael


On Nov 22, 2019, at 12:07 PM, Dongjoon Hyun <[hidden email]> wrote:

> Hi, Michael.
>
> I'm not sure Apache Spark is close to the state you want.
>
> First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 use Avro 1.8.2, as do the `master` and `branch-2.4` branches. Cutting a new release would not give you what you want.
>
> Do we have a PR on the master branch? If not, could you open a PR against the master branch before we start discussing releases? The same applies to Parquet.
>
> Second, we want Apache Spark 3.0.0 to be as compatible as possible. An incompatible change could be a reason for rejection even on the `master` branch for Apache Spark 3.0.0.
>
> Lastly, we may consider backporting if it lands on the `master` branch for 3.0.
> However, as Nan Zhu said, a PR that backports a dependency upgrade is -1 by default. Usually, it's allowed only for serious cases such as security issues or production outages.
>
> Bests,
> Dongjoon.



Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Sean Owen
I haven't been following this closely, but I'm aware that there are
some tricky compatibility problems between Avro and Parquet, both of
which are used in Spark. That has made it pretty hard to update in 2.x.
master/3.0 is on Parquet 1.10.1 and Avro 1.8.2. Just a general
question: is that the best combination going forward? The time to
update would be right about now for Spark 3. Backporting to 2.x is
pretty unlikely, though.

On Fri, Nov 22, 2019 at 12:45 PM Michael Heuer <[hidden email]> wrote:

>
> Hello,
>
> I am sorry for asking a somewhat inappropriate question.
>
> For context, our projects depend on a fix that is in Parquet master but not yet released. Parquet 1.11.0 is in the release-candidate phase. It looks like we can't build against the Parquet 1.11.0 RC to include the fix and run successfully on Spark 2.4.x, which includes Parquet 1.10.1, without various classpath workarounds.
>
> I see now that Spark policy requires the Avro upgrade to wait until Spark 3.0, and since the Parquet 1.11.0 RC currently depends on Avro 1.9.1, it may also have to wait. I'll continue to think about this within the Parquet community.
>
> Thank you for the clarification,
>
>    michael
