Re: Ask for ARM CI for spark


Re: Ask for ARM CI for spark

Holden Karau
Moving to dev@ for increased visibility among the developers.

On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <[hidden email]> wrote:
Thanks for your reply.

As I mentioned before, I ran into some build and test problems for Spark on an aarch64 server, so it would be better to have ARM CI to make sure Spark stays compatible with AArch64 platforms.

I'm from the OpenLab team (https://openlabtesting.org/), a community that does open source project testing. We can contribute some Arm virtual machines to AMPLab Jenkins, and we also have a developer team willing to work on this; we are willing to maintain the CI build jobs and address CI issues. What do you think?


Thanks for your attention.


On Wed, Jun 19, 2019 at 6:39 AM shane knapp <[hidden email]> wrote:
yeah, we don't have any aarch64 systems for testing...  this has been asked before but is currently pretty low on our priority list as we don't have the hardware.

sorry,

shane

On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <[hidden email]> wrote:
Hi, sorry to disturb you. 
The CI testing for Apache Spark is supported by AMPLab Jenkins, and I see there are a number of machines (most of them Linux amd64) for CI, but there seems to be no AArch64 machine for Spark CI testing. Recently I built and ran the tests for Spark (master and branch-2.4) on my Arm server, and unfortunately there are some problems; for example, a unit test fails because of the LEVELDBJNI native package. For details, see http://paste.openstack.org/show/752063/ for the Java tests and http://paste.openstack.org/show/752709/ for the Python tests.
So my question about ARM CI testing for Spark is: is there any plan to support it? Thank you very much; I look forward to your reply!
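[Editor's note: not from the thread, but for anyone reproducing these failures, a test or build script can detect whether it is running on an Arm machine with a check like the following generic sketch.]

```python
import platform

def on_aarch64() -> bool:
    """True when the interpreter runs on 64-bit Arm (aarch64/arm64)."""
    return platform.machine().lower() in ("aarch64", "arm64")

print(f"machine={platform.machine()} aarch64={on_aarch64()}")
```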


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead


--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: Ask for ARM CI for spark

Sean Owen-2
I'd begin by reporting and fixing ARM-related issues in the build. If
they're small, of course we should do them. If it requires significant
modifications, we can discuss how much Spark can support ARM. I don't
think it's yet necessary for the Spark project to run these CI builds
until that point, but it's always welcome if people are testing that
separately.


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Ask for ARM CI for spark

Tianhua huang
Thanks Sean.

I'm very happy to hear that the community will put effort into fixing the ARM-related issues, and I'd be happy to help. Could you give me the tracking link for this issue, so I can check whether it has been fixed? Thank you.
As far as I know, old versions of Spark supported ARM and the new versions don't, which shows that we need CI to check whether Spark supports ARM and whether some modification breaks it.
I will add a demo job in OpenLab to build Spark on ARM and run a simple unit test, and I will share the job link later.

Let me know what you think.

Thank you all!



Re: Ask for ARM CI for spark

Tianhua huang
I forked the apache/spark project and proposed a job (https://github.com/theopenlab/spark/pull/1) to build Spark on an OpenLab ARM instance. This is the first step toward building Spark on ARM; I can enable a periodic ARM build job for apache/spark master if you like. Later I will run the tests for Spark. I am also willing to be the maintainer of the ARM CI for Spark.

Thanks for your attention.
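[Editor's note: OpenLab jobs are defined in Zuul; a periodic ARM build job of the kind described above might look roughly like the sketch below. The job name, playbook path, and node label are illustrative assumptions, not taken from the actual theopenlab/spark PR.]

```yaml
# Hypothetical Zuul configuration -- names and labels are illustrative.
- job:
    name: spark-build-arm64
    description: Periodic Maven build of apache/spark master on aarch64.
    run: playbooks/spark-build/run.yaml
    nodeset:
      nodes:
        - name: builder
          label: arm64-ubuntu-bionic

- project:
    periodic:
      jobs:
        - spark-build-arm64
```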


Re: Ask for ARM CI for spark

shane knapp
i'd much prefer that we keep the test/build infrastructure in one place. 

we don't have ARM hardware, but there's a slim possibility i can scare something up in our older research stock...

another option would be to run the build in an ARM-based docker container, which (according to the intarwebs) is possible.

shane
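[Editor's note: the container approach can be sketched as a Dockerfile targeting an Arm base image, run under emulation on an amd64 host via Docker's multi-platform support. The base image, tag, and commands are assumptions for illustration, not something specified in the thread.]

```dockerfile
# Hypothetical sketch -- build with: docker buildx build --platform linux/arm64 .
FROM arm64v8/openjdk:8-jdk
RUN apt-get update && apt-get install -y git
WORKDIR /workspace
COPY . .
# Spark ships a Maven wrapper; MAVEN_OPTS sized per Spark's build docs
ENV MAVEN_OPTS="-Xmx2g"
CMD ["./build/mvn", "-DskipTests", "package"]
```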



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead

Re: Ask for ARM CI for spark

shane knapp
...or via VM as you mentioned earlier.  :)

shane (who will file a JIRA tomorrow)



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead

Re: Ask for ARM CI for spark

Tianhua huang
Thanks Shane :)

This sounds good, and yes, I agree that it's best to keep the test/build infrastructure in one place. If you can't find ARM resources, we are willing to provide the ARM instance :) Our goal is to make more open source software compatible with the aarch64 platform, so let's do it. I'll be happy to help toward that goal.

Looking forward to your good news :)


Re: Ask for ARM CI for spark

Sean Owen-2
Can you begin by testing yourself? I think the first step is to make
sure the build and tests work on ARM. If you find problems you can
isolate them and try to fix them, or at least report them. It's only
worth getting CI in place when we think builds will work.




Re: Ask for ARM CI for spark

Tianhua huang
I took the ut tests on my arm instance before and reported an issue in https://issues.apache.org/jira/browse/SPARK-27721,  and seems there was no leveldbjni native package for aarch64 in leveldbjni-all.jar(or 1.8) https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8we can find https://github.com/fusesource/leveldbjni/pull/82 this pr added the aarch64 support and merged on 2 Nov 2017, but the latest release of the repo is  on 17 Oct 2013, unfortunately it didn't include the aarch64 supporting.
 
I will run the tests in the job mentioned above and will try to fix the issue. If anyone has any ideas about it, please reply. Thank you.
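For reference, the failure mode above can be sketched in miniature: leveldbjni-all 1.8 bundles native libraries only for a fixed set of platform classifiers, and aarch64 is not among them. The classifier list and mapping below are assumptions for illustration only; the actual jar contents should be verified with `jar tf leveldbjni-all-1.8.jar`.

```python
import platform

# Illustrative list of native classifiers bundled in leveldbjni-all 1.8
# (assumption; check the real jar with `jar tf leveldbjni-all-1.8.jar`).
BUNDLED_NATIVES = {"linux32", "linux64", "osx", "win32", "win64"}

def native_classifier(machine: str) -> str:
    """Map a platform.machine() value to a leveldbjni-style classifier
    (hypothetical mapping, for illustration)."""
    if machine in ("x86_64", "amd64"):
        return "linux64"
    if machine in ("aarch64", "arm64"):
        return "linux-aarch64"  # hypothetical name; 1.8 ships no aarch64 lib
    return machine

def has_native_support(machine: str) -> bool:
    """True if the jar would carry a native library for this architecture."""
    return native_classifier(machine) in BUNDLED_NATIVES

print(has_native_support("x86_64"))   # True
print(has_native_support("aarch64"))  # False: hence the UT failures above
```

A CI job could run a check like this up front and skip or flag the leveldbjni-dependent suites on aarch64 instead of letting them fail mid-run.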


On Wed, Jun 26, 2019 at 8:11 PM Sean Owen <[hidden email]> wrote:
Can you begin by testing yourself? I think the first step is to make
sure the build and tests work on ARM. If you find problems you can
isolate them and try to fix them, or at least report them. It's only
worth getting CI in place when we think builds will work.

On Tue, Jun 25, 2019 at 9:26 PM Tianhua huang <[hidden email]> wrote:
>
> Thanks Shane :)
>
> This sounds good, and yes, I agree that it's best to keep the test/build infrastructure in one place. If you can't find ARM resources, we are willing to provide the ARM instance :)  Our goal is to make more open source software compatible with the aarch64 platform, so let's do it. I will be happy to help with that goal.
>
> Waiting for your good news :)
>
> On Wed, Jun 26, 2019 at 9:47 AM shane knapp <[hidden email]> wrote:
>>
>> ...or via VM as you mentioned earlier.  :)
>>
>> shane (who will file a JIRA tomorrow)
>>
>> On Tue, Jun 25, 2019 at 6:44 PM shane knapp <[hidden email]> wrote:
>>>
>>> i'd much prefer that we keep the test/build infrastructure in one place.
>>>
>>> we don't have ARM hardware, but there's a slim possibility i can scare something up in our older research stock...
>>>
>>> another option would be to run the build in an arm-based docker container, which (according to the intarwebs) is possible.
>>>
>>> shane
>>>
>>> On Tue, Jun 25, 2019 at 6:35 PM Tianhua huang <[hidden email]> wrote:
>>>>
>>>> I forked the apache/spark project and proposed a job (https://github.com/theopenlab/spark/pull/1) for building Spark on an OpenLab ARM instance; this is the first step to building Spark on ARM. I can enable a periodic ARM build job for apache/spark master if you like. Later I will run the tests for Spark. I am also willing to be the maintainer of the ARM CI for Spark.
>>>>
>>>> Thanks for your attention.
>>>>
>>>> On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <[hidden email]> wrote:
>>>>>
>>>>> Thanks Sean.
>>>>>
>>>>> I am very happy to hear that the community will put effort into fixing the ARM-related issues, and I'd be happy to help. Could you give the tracking link for this issue, so that I can check whether it is fixed? Thank you.
>>>>> As far as I know, old versions of Spark supported ARM and the new versions don't, which shows that we need a CI to check whether Spark supports ARM and whether some modification breaks it.
>>>>> I will add a demo job in OpenLab to build Spark on ARM and run a simple UT test. Later I will share the job link.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thank you all!
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <[hidden email]> wrote:
>>>>>>
>>>>>> I'd begin by reporting and fixing ARM-related issues in the build. If
>>>>>> they're small, of course we should do them. If it requires significant
>>>>>> modifications, we can discuss how much Spark can support ARM. I don't
>>>>>> think it's yet necessary for the Spark project to run these CI builds
>>>>>> until that point, but it's always welcome if people are testing that
>>>>>> separately.

Re: Ask for ARM CI for spark

Steve Loughran-2
LevelDB and native codecs are invariably a problem here, as is anything else doing misaligned IO. Protobuf has also had "issues" in the past.


I think any AArch64 work is going to have to define very clearly what "works" means; Spark standalone with a specific set of codecs is probably the first thing to aim for, with no Snappy or lz4.

Anything which goes near protobuf, checksums, native code, etc. is in trouble. Don't try to deploy with HDFS as the cluster FS, would be my recommendation.

If you want a cluster, use NFS or one of Google GCS or Azure WASB for the cluster FS. And before trying either of those cloud stores, run the filesystem connector test suites (hadoop-azure; Google GCS on GitHub) to see that they work. If the foundational FS test suites fail, nothing else will.
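Following that advice, a first ARM CI pass might pin Spark to local mode and a codec that avoids native code paths. The sketch below is an assumption to verify on real aarch64 hardware, not a recommendation from the thread: `spark.io.compression.codec` is a real Spark setting, and `lzf` is chosen here on the assumption that its pure-Java implementation sidesteps the Snappy/lz4 native libraries.

```python
# Conservative Spark settings for a first ARM CI pass: local master, no HDFS,
# no Snappy/lz4. Assumption: lzf avoids JNI codec paths; verify on aarch64.
ARM_FIRST_PASS_CONF = {
    "spark.master": "local[*]",           # local mode first, no cluster FS
    "spark.io.compression.codec": "lzf",  # avoid snappy/lz4 native libraries
}

def to_spark_submit_args(conf: dict) -> list:
    """Render the settings as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args

print(to_spark_submit_args(ARM_FIRST_PASS_CONF))
# ['--conf', 'spark.io.compression.codec=lzf', '--conf', 'spark.master=local[*]']
```

Once a build with this narrow scope passes reliably, the codec set and cluster FS options can be widened one at a time, running the relevant connector test suites first as suggested above.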




Re: Ask for ARM CI for spark

Rui Chen
>  I think any AArch64 work is going to have to define very clearly what "works" means

+1
It's very valuable to define a clear scope of these projects' functionality for the ARM platform in the upstream community; it brings confidence to end users and customers when they plan to deploy these projects on ARM.

This is absolutely long-term work; let's do it step by step: CI, testing, issue reporting, and resolving.


Re: Ask for ARM CI for spark

Steve Loughran-2

It'd be interesting to see how well a Pi 4 works; with only 4GB of RAM you wouldn't compile on it, but you could try installing the Spark jar bundle and then running against some NFS-mounted disks: https://www.raspberrypi.org/magpi/raspberry-pi-4-specs-benchmarks/ ; unlikely to be fast, but it'd be an efficient kind of slow.


Re: Ask for ARM CI for spark

Tianhua huang
We are focused on ARM instances in the cloud. Right now I use an ARM instance in the vexxhost cloud to run the build job mentioned above; the instance has 8 VCPUs and 8GB of RAM,
and we can use a bigger flavor to create the ARM instance to run the job, if need be.

On Fri, Jun 28, 2019 at 6:55 PM Steve Loughran <[hidden email]> wrote:

Be interesting to see how well a Pi4 works; with only 4GB of RAM you wouldn't compile with it, but you could try installing the spark jar bundle and then run against some NFS mounted disks: https://www.raspberrypi.org/magpi/raspberry-pi-4-specs-benchmarks/ ; unlikely to be fast, but it'd be an efficient kind of slow

On Fri, Jun 28, 2019 at 3:08 AM Rui Chen <[hidden email]> wrote:
>  I think any AA64 work is going to have to define very clearly what "works" is defined as

+1
It's very valuable to build a clear scope of these projects functionality for ARM platform in upstream community, it bring confidence to end user and customers when they plan to deploy these projects on ARM.

This is absolute long term work, let's to make it step by step, CI, testing, issue and resolving.

Steve Loughran <[hidden email]> 于2019年6月27日周四 下午9:22写道:
level db and native codecs are invariably a problem here, as is anything else doing misaligned IO. Protobuf has also had "issues" in the past


I think any AA64 work is going to have to define very clearly what "works" is defined as; spark standalone with a specific set of codecs is probably the first thing to aim for -no Snappy or lz4. 

Anything which goes near: protobuf, checksums, native code, etc is in trouble. Don't try and deploy with HDFS as the cluster FS, would be my recommendation. 

If you want a cluster use NFS or one of google GCS, Azure WASB for the cluster FS. And before trying either of those cloud store, run the filesystem connector test suites (hadoop-azure; google gcs github) to see that they work. If the foundational FS test suites fail, nothing else will work



On Thu, Jun 27, 2019 at 3:09 AM Tianhua huang <[hidden email]> wrote:
I ran the ut tests on my arm instance before and reported an issue in https://issues.apache.org/jira/browse/SPARK-27721, and it seems there is no leveldbjni native package for aarch64 in leveldbjni-all.jar (1.8), see https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8. We can see that https://github.com/fusesource/leveldbjni/pull/82 added aarch64 support and was merged on 2 Nov 2017, but the latest release of the repo is from 17 Oct 2013, so unfortunately it doesn't include the aarch64 support.

I will run the tests on the job mentioned above and try to fix the issue; if anyone has any ideas about it, please reply to me, thank you.


On Wed, Jun 26, 2019 at 8:11 PM Sean Owen <[hidden email]> wrote:
Can you begin by testing yourself? I think the first step is to make
sure the build and tests work on ARM. If you find problems you can
isolate them and try to fix them, or at least report them. It's only
worth getting CI in place when we think builds will work.

On Tue, Jun 25, 2019 at 9:26 PM Tianhua huang <[hidden email]> wrote:
>
> Thanks Shane :)
>
> This sounds good, and yes I agree that it's best to keep the test/build infrastructure in one place. If you can't find the ARM resources, we are willing to provide the ARM instances :)  Our goal is to make more open source software compatible with the aarch64 platform, so let's do it. I will be happy to give some help toward that goal.
>
> Waiting for your good news :)
>
> On Wed, Jun 26, 2019 at 9:47 AM shane knapp <[hidden email]> wrote:
>>
>> ...or via VM as you mentioned earlier.  :)
>>
>> shane (who will file a JIRA tomorrow)
>>
>> On Tue, Jun 25, 2019 at 6:44 PM shane knapp <[hidden email]> wrote:
>>>
>>> i'd much prefer that we keep the test/build infrastructure in one place.
>>>
>>> we don't have ARM hardware, but there's a slim possibility i can scare something up in our older research stock...
>>>
>>> another option would be to run the build in an arm-based docker container, which (according to the intarwebs) is possible.
>>>
>>> shane
>>>
>>> On Tue, Jun 25, 2019 at 6:35 PM Tianhua huang <[hidden email]> wrote:
>>>>
>>>> I forked the apache/spark project and proposed a job (https://github.com/theopenlab/spark/pull/1) for building spark on an OpenLab ARM instance; this is the first step to build spark on ARM. I can enable a periodic arm build job for apache/spark master if you like. Later I will run the tests for spark. I am also willing to be the maintainer of the ARM CI for spark.
>>>>
>>>> Thanks for your attention.
>>>>
>>>> On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <[hidden email]> wrote:
>>>>>
>>>>> Thanks Sean.
>>>>>
>>>>> I am very happy to hear that the community will put effort into fixing the ARM-related issues. I'd be happy to help if you like. And could you give the tracking link for this issue, so I can check whether it is fixed? Thank you.
>>>>> As far as I know, old versions of spark support ARM and the new versions don't; this just shows that we need a CI to check whether spark supports ARM and whether some modification breaks it.
>>>>> I will add a demo job in OpenLab to build spark on ARM and run a simple UT test. Later I will give the job link.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thank you all!
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <[hidden email]> wrote:
>>>>>>
>>>>>> I'd begin by reporting and fixing ARM-related issues in the build. If
>>>>>> they're small, of course we should do them. If it requires significant
>>>>>> modifications, we can discuss how much Spark can support ARM. I don't
>>>>>> think it's yet necessary for the Spark project to run these CI builds
>>>>>> until that point, but it's always welcome if people are testing that
>>>>>> separately.
>>>>>>
>>>>>> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <[hidden email]> wrote:
>>>>>> >
>>>>>> > Moving to dev@ for increased visibility among the developers.
>>>>>> >
>>>>>> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <[hidden email]> wrote:
>>>>>> >>
>>>>>> >> Thanks for your reply.
>>>>>> >>
>>>>>> >> As I said before, I met some problems building and testing spark on an aarch64 server, so it would be better to have ARM CI to make sure spark is compatible with AArch64 platforms.
>>>>>> >>
>>>>>> >> I'm from the OpenLab team (https://openlabtesting.org/), a community that does open source project testing. We can contribute some Arm virtual machines to AMPLab Jenkins, and we also have a developer team willing to work on this; we are willing to maintain the build CI jobs and address CI issues. What do you think?
>>>>>> >>
>>>>>> >>
>>>>>> >> Thanks for your attention.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <[hidden email]> wrote:
>>>>>> >>>
>>>>>> >>> yeah, we don't have any aarch64 systems for testing...  this has been asked before but is currently pretty low on our priority list as we don't have the hardware.
>>>>>> >>>
>>>>>> >>> sorry,
>>>>>> >>>
>>>>>> >>> shane
>>>>>> >>>
>>>>>> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <[hidden email]> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi, sorry to disturb you.
>>>>>> >>>> The CI testing for apache spark is supported by AMPLab Jenkins, and I see there are some machines (most of them Linux amd64) for CI, but it seems there is no Aarch64 machine for spark CI testing. Recently I built and ran the tests for spark (master and branch-2.4) on my arm server, and unfortunately there are some problems; for example, a ut test fails due to the LEVELDBJNI native package. For details, see the java test at http://paste.openstack.org/show/752063/ and the python test at http://paste.openstack.org/show/752709/
>>>>>> >>>> So I have a question about ARM CI testing for spark: is there any plan to support it? Thank you very much; I look forward to your reply!
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> Shane Knapp
>>>>>> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>> >>> https://rise.cs.berkeley.edu
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Twitter: https://twitter.com/holdenkarau
>>>>>> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu

Re: Ask for ARM CI for spark

Tianhua huang
Hi all,

I am glad to tell you there is new progress on building/testing spark on an aarch64 server: the tests are now running. See the build/test detail log at https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/job-output.txt.gz and the aarch64 instance info at https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/zuul-info/zuul-info.ubuntu-xenial-arm64.txt. To enable the tests I made some modifications; the major one was building a local leveldbjni package: I forked the fusesource/leveldbjni and chirino/leveldb repos and made some changes so the native package builds locally (see https://github.com/huangtianhua/leveldbjni/pull/1 and https://github.com/huangtianhua/leveldbjni/pull/2), then used it in spark; the details are in https://github.com/theopenlab/spark/pull/1

Not all tests pass yet; I will try to fix them, and any suggestions are welcome. Thank you all.

On Mon, Jul 1, 2019 at 5:25 PM Tianhua huang <[hidden email]> wrote:
We are focusing on cloud ARM instances, and currently I use an ARM instance from the vexxhost cloud to run the build job mentioned above. The instance has 8 vCPUs and 8 GB of RAM, and we can use a bigger flavor to create the ARM instance that runs the job, if need be.


Re: Ask for ARM CI for spark

Tianhua huang
Hi all,

We ran all the unit tests for spark on the arm64 platform, and after some effort only four tests still FAILED, see https://logs.openlabtesting.org/logs/4/4/ae5ebaddd6ba6eba5a525b2bf757043ebbe78432/check/spark-build-arm64/9ecccad/job-output.txt.gz

Two failed because 'Can't find 1 executors before 10000 milliseconds elapsed' (see below); when we increased the timeout, the tests passed, so we wonder whether we can increase the timeout. I also have another question about https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285: why is the comparison not >=? Judging by the function's comment, it should be >=.
- test driver discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before 10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78(SparkContextSuite.scala:753)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78$adapted(SparkContextSuite.scala:741)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$77(SparkContextSuite.scala:741)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
 
- test gpu driver resource files and discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before 10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80(SparkContextSuite.scala:781)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80$adapted(SparkContextSuite.scala:761)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$79(SparkContextSuite.scala:761)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
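For reference, the check behind that timeout can be sketched roughly as follows. This is a hypothetical plain-Java rendering, not Spark's actual code: `executorInfos` stands in for `sc.statusTracker.getExecutorInfos`, whose result also includes an entry for the driver, which would explain the strict ">" rather than ">=".

```java
import java.util.function.Supplier;

// Hypothetical sketch of the polling loop in TestUtils.waitUntilExecutorsUp.
// executorInfos stands in for sc.statusTracker.getExecutorInfos; its result
// includes an entry for the driver, hence the strict ">" comparison.
public class WaitForExecutors {
    static boolean waitUntilExecutorsUp(Supplier<Integer> executorInfos,
                                        int numExecutors, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            // strictly greater: driver entry + numExecutors executors must be present
            if (executorInfos.get() > numExecutors) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 2 infos reported (driver + 1 executor) satisfies "1 executor up"
        System.out.println(waitUntilExecutorsUp(() -> 2, 1, 500));
        // only 1 info (just the driver) keeps polling until the timeout
        System.out.println(waitUntilExecutorsUp(() -> 1, 1, 100));
    }
}
```

Under this reading, raising the timeout only papers over a slow executor launch; the comparison itself would still be correct.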
The other two failed with '2143289344 equaled 2143289344'. This is because floatToRawIntBits(0.0f/0.0f) on the aarch64 platform is 2143289344, which equals floatToRawIntBits(Float.NaN). I emailed jdk-dev about this and opened topics with the scala community (https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845 and https://github.com/scala/bug/issues/11632). I first thought it was a jdk or scala issue, but after discussion it appears to be platform-related, so the following asserts seem inappropriate: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705 and https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733 
 - SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
   2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
 - NaN and -0.0 in window partition keys *** FAILED ***
   2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)
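A minimal Java sketch of the bit-pattern behaviour described above: the raw bits of a runtime-computed NaN are platform-dependent (which is the point of the failure), while Float.floatToIntBits is specified to canonicalize every NaN to 0x7FC00000, i.e. 2143289344.

```java
// Minimal sketch of the NaN bit-pattern issue described above.
// floatToRawIntBits preserves whatever NaN payload the hardware produced
// (platform-dependent), while floatToIntBits always canonicalizes NaN
// to 0x7FC00000 (2143289344) per the java.lang.Float spec.
public class NanBits {
    public static void main(String[] args) {
        float zero = 0.0f;        // non-constant, so the division happens at run time
        float nan = zero / zero;  // hardware-generated NaN

        int raw = Float.floatToRawIntBits(nan);    // e.g. 2143289344 on aarch64
        int canonical = Float.floatToIntBits(nan); // always 2143289344 (0x7FC00000)

        System.out.println("raw bits:       0x" + Integer.toHexString(raw));
        System.out.println("canonical bits: 0x" + Integer.toHexString(canonical));
        System.out.println("isNaN: " + Float.isNaN(nan));
    }
}
```

If the tests only need "both values are NaN" rather than identical payloads, comparing via Float.isNaN (or normalizing with floatToIntBits first) would sidestep the platform difference.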
We are waiting for your suggestions on fixing these failed tests, thank you very much.


Re: Ask for ARM CI for spark

Sean Owen-2
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <[hidden email]> wrote:
> Two failed and the reason is 'Can't find 1 executors before 10000 milliseconds elapsed', see below, then we try increase timeout the tests passed, so wonder if we can increase the timeout? and here I have another question about https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285, why is not >=? see the comment of the function, it should be >=?
>

I think it's ">" because the driver is also an executor, but not 100%
sure. In any event it passes in general.
These errors typically mean "I didn't start successfully" for some
other reason that may be in the logs.

> The other two failed and the reason is '2143289344 equaled 2143289344', this because the value of floatToRawIntBits(0.0f/0.0f) on aarch64 platform is 2143289344 and equals to floatToRawIntBits(Float.NaN). About this I send email to jdk-dev and proposed a topic on scala community https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845 and https://github.com/scala/bug/issues/11632, I thought it's something about jdk or scala, but after discuss, it should related with platform, so seems the following asserts is not appropriate? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705 and https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

These tests could special-case execution on ARM, like you'll see some
tests handle big-endian architectures.


Re: Ask for ARM CI for spark

Tianhua huang
Thanks for your reply.

About the first problem, we didn't find any other cause in the logs, just the timeout waiting for the executors to come up. After increasing the timeout from 10000 ms to 30000 (or even 20000) ms in https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764 and https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792 the tests passed, and more than one executor came up. We are not sure whether this is related to the flavor of our aarch64 instance; the current flavor is 8C8G. Maybe we will try a bigger flavor later. If anyone has other suggestions, please contact me, thank you. 

About the second problem, I proposed a pull request to apache/spark, https://github.com/apache/spark/pull/25186; if you have time, would you please help review it? Thank you very much.

On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <[hidden email]> wrote:
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <[hidden email]> wrote:
> Two tests failed with 'Can't find 1 executors before 10000 milliseconds elapsed' (see below); after we increased the timeout the tests passed, so we wonder whether the timeout can be increased. I also have a question about https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285: why is the comparison not >=? According to the function's comment, it should be >=.
>

I think it's ">" because the driver is also an executor, but not 100%
sure. In any event it passes in general.
These errors typically mean "I didn't start successfully" for some
other reason that may be in the logs.


Re: Ask for ARM CI for spark

Tianhua huang

Hi, all


Sorry to disturb again; several sql tests failed on an arm64 instance:

  • pgSQL/float8.sql *** FAILED ***
    Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result did not match for query #56
    SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
  • pgSQL/numeric.sql *** FAILED ***
    Expected "2 2247902679199174[72 224790267919917955.1326161858
    4 7405685069595001 7405685069594999.0773399947
    5 5068226527.321263 5068226527.3212726541
    6 281839893606.99365 281839893606.9937234336
    7 1716699575118595840 1716699575118597095.4233081991
    8 167361463828.0749 167361463828.0749132007
    9 107511333880051856] 107511333880052007....", but got "2 2247902679199174[40224790267919917955.1326161858
    4 7405685069595001 7405685069594999.0773399947
    5 5068226527.321263 5068226527.3212726541
    6 281839893606.99365 281839893606.9937234336
    7 1716699575118595580 1716699575118597095.4233081991
    8 167361463828.0749 167361463828.0749132007
    9 107511333880051872] 107511333880052007...." Result did not match for query #496
    SELECT t1.id1, t1.result, t2.expected
    FROM num_result t1, num_exp_power_10_ln t2
    WHERE t1.id1 = t2.id
    AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)

The first test failed because the value of math.log(3.0) is different on aarch64:

# on x86_64:

scala> val a = 0.5
a: Double = 0.5

scala> a * math.log((1.0 + a) / (1.0 - a))
res1: Double = 0.5493061443340549

scala> math.log((1.0 + a) / (1.0 - a))
res2: Double = 1.0986122886681098

# on aarch64:

scala> val a = 0.5
a: Double = 0.5

scala> a * math.log((1.0 + a) / (1.0 - a))
res20: Double = 0.5493061443340548

scala> math.log((1.0 + a) / (1.0 - a))
res21: Double = 1.0986122886681096

I also tried several other numbers like math.log(4.0) and math.log(5.0), and those agree on both platforms; I don't know why math.log(3.0) is special, but the result is indeed different on aarch64. If you are interested, please try it.
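One way to narrow down whether this is the JVM or the platform, using nothing beyond java.lang.Math and java.lang.StrictMath: StrictMath uses the fdlibm algorithms and returns the same bits on every architecture, while Math is allowed to use faster platform intrinsics that may differ in the last bit.

```java
public class LogCompare {
    public static void main(String[] args) {
        double x = 3.0;
        double viaMath = Math.log(x);          // may differ across platforms
        double viaStrict = StrictMath.log(x);  // fdlibm: identical everywhere
        System.out.println("Math.log(3.0)       = " + viaMath);
        System.out.println("StrictMath.log(3.0) = " + viaStrict);
        // Both are required to be within 1 ulp of the exact ln 3, so they
        // can only disagree in the last bit.
        System.out.println("ulp-level gap: "
            + (Math.abs(viaMath - viaStrict) <= Math.ulp(viaStrict)));
    }
}
```

If Math.log and StrictMath.log disagree on one platform but StrictMath agrees everywhere, the difference comes from the platform's optimized log, not from Spark or Scala.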

The second test failed because some values of pow(10, x) differ on aarch64. Following Spark's sql tests, I ran similar checks on aarch64 and x86_64; take '-83028485' as an example:

# on x86_64:
scala> import java.lang.Math._
import java.lang.Math._
scala> var a = -83028485
a: Int = -83028485
scala> abs(a)
res4: Int = 83028485
scala> math.log(abs(a))
res5: Double = 18.234694299654787
scala> pow(10, math.log(abs(a)))
res6: Double = 1.71669957511859584E18

# on aarch64:

scala> var a = -83028485
a: Int = -83028485
scala> abs(a)
res38: Int = 83028485
scala> math.log(abs(a))
res39: Double = 18.234694299654787
scala> pow(10, math.log(abs(a)))
res40: Double = 1.71669957511859558E18

I sent an email to jdk-dev hoping someone can help, and I also filed this in JIRA: https://issues.apache.org/jira/browse/SPARK-28519. If you are interested, you are welcome to join the discussion, thank you very much.
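The pow case can be probed the same way in plain Java; the extra wrinkle is that pow(10, log(x)) magnifies a last-bit difference in the exponent into a much larger absolute difference at 1e18 scale. This is only a sketch of the probe, not the Spark test itself:

```java
public class PowProbe {
    public static void main(String[] args) {
        int a = -83028485;
        double e = Math.log(Math.abs(a));        // ~18.234694299654787
        double viaMath = Math.pow(10, e);        // platform-dependent last bits
        double viaStrict = StrictMath.pow(10, StrictMath.log(Math.abs(a)));
        System.out.println("Math:       " + viaMath);
        System.out.println("StrictMath: " + viaStrict);
        // A 1-ulp wobble in the ~18.23 exponent moves the ~1.7e18 result by
        // roughly result * ln(10) * ulp, i.e. on the order of 10^4 units,
        // which is exactly the scale of difference seen between platforms.
        System.out.println("close: " + (Math.abs(viaMath - viaStrict) < 1e7));
    }
}
```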


On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <[hidden email]> wrote:

Re: Ask for ARM CI for spark

Sean Owen-2
Interesting. I don't think log(3) is special, it's just that some
differences in how it's implemented and floating-point values on
aarch64 vs x86, or in the JVM, manifest at some values like this. It's
still a little surprising! BTW Wolfram Alpha suggests that the correct
value is more like ...810969..., right between the two. java.lang.Math
doesn't guarantee strict IEEE floating-point behavior, but
java.lang.StrictMath is supposed to, at the potential cost of speed,
and it gives ...81096, in agreement with aarch64.

@Yuming Wang the results in float8.sql are from PostgreSQL directly?
Interesting if it also returns the same less accurate result, which
might suggest it's more to do with underlying OS math libraries. You
noted that these tests sometimes gave platform-dependent differences
in the last digit, so wondering if the test value directly reflects
PostgreSQL or just what we happen to return now.

One option is to use StrictMath in special cases like computing atanh.
That gives a value that agrees with aarch64.
I also note that 0.5 * (math.log(1 + x) - math.log(1 - x)) gives the
more accurate answer too, and makes the result agree with, say,
Wolfram Alpha for atanh(0.5).
(Actually if we do that, better still is 0.5 * (math.log1p(x) -
math.log1p(-x)) for best accuracy near 0.)
Commons Math also has implementations of sinh, cosh, atanh that we
could call. It claims it's possibly more accurate and faster. I
haven't tested its result here.

FWIW the "log1p" version appears, from some informal testing, to be
most accurate (in agreement with Wolfram) and using StrictMath doesn't
matter. If we change something, I'd use that version above.
The only issue is if this causes the result to disagree with
PostgreSQL, but then again it's more correct and maybe the DB is
wrong.
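The three formulations can be compared side by side; this is a sketch using only java.lang.Math and java.lang.StrictMath, not Spark's actual atanh expression code:

```java
public class AtanhForms {
    // Formulation currently under test: 0.5 * log((1+x)/(1-x)).
    static double viaLog(double x)    { return 0.5 * Math.log((1.0 + x) / (1.0 - x)); }
    // Same formula with the strict fdlibm log.
    static double viaStrict(double x) { return 0.5 * StrictMath.log((1.0 + x) / (1.0 - x)); }
    // log1p variant: best accuracy near 0.
    static double viaLog1p(double x)  { return 0.5 * (Math.log1p(x) - Math.log1p(-x)); }

    public static void main(String[] args) {
        double x = 0.5;  // atanh(0.5) = 0.54930614433405484569...
        System.out.println("log:    " + viaLog(x));
        System.out.println("strict: " + viaStrict(x));
        System.out.println("log1p:  " + viaLog1p(x));
        // All three agree to within a few ulps; only the last printed
        // digit can differ between them or across platforms.
        System.out.println("agree: " + (Math.abs(viaLog(x) - viaLog1p(x)) < 1e-15));
    }
}
```

Whichever variant is chosen, pinning the test expectation to a tolerance rather than an exact decimal string would also make it architecture-independent.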


The rest may be a test vs PostgreSQL issue; see
https://issues.apache.org/jira/browse/SPARK-28316


On Fri, Jul 26, 2019 at 2:32 AM Tianhua huang <[hidden email]> wrote:




Re: Ask for ARM CI for spark

bo zhaobo
Hi all,

Thanks for your concern. Yes, that's worth testing against a backend database too, but note that this issue was hit in Spark SQL itself: we tested only with Spark, without integrating other databases.

Best Regards,

ZhaoBo




Sean Owen <[hidden email]> wrote on Fri, Jul 26, 2019 at 5:46 PM:

