Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Hyukjin Kwon
Hi all,

I have been seeing flaky Python tests from time to time, mostly on amp-jenkins-worker-05 if I am not mistaken:


======================================================================
ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 25, in <module>
    from pandas import hashtable, tslib, lib
ImportError: cannot import name 'hashtable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests.py", line 3057, in test_filtered_frame
    pdf = df.filter("i < 0").toPandas()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 1727, in toPandas
    import pandas as pd
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 31, in <module>
    "the C extensions first.".format(module))
ImportError: C extension: 'hashtable' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

======================================================================
ERROR: test_null_conversion (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...


It sounds like an environment problem, apparently caused by the missing 'hashtable' extension (which I believe should have been compiled and importable).
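For context on the two-part traceback: pandas/__init__.py imports its compiled modules (line 25 above) inside a try/except and raises a friendlier ImportError from the except block (line 31). That is why Python prints both errors joined by "During handling of the above exception...". Below is a minimal sketch of that pattern. The helper import_c_extension is hypothetical, and the standard library's json package stands in for pandas. This is not pandas' actual code.

```python
import importlib


def import_c_extension(package, name):
    """Import the compiled submodule `package.name`, or fail with a hint.

    Hypothetical helper that mirrors the shape of the guard seen in the
    traceback; it is not pandas' actual implementation.
    """
    try:
        return importlib.import_module(package + "." + name)
    except ImportError:
        # Raising inside the except block (without `from`) stores the original
        # error on __context__, so Python prints both tracebacks joined by
        # "During handling of the above exception, another exception occurred".
        raise ImportError(
            "C extension: {!r} not built. Rebuild the C extensions first.".format(name)
        )
```

The first error ("cannot import name 'hashtable'") is the real symptom. The rebuild hint in the second one is aimed at people importing pandas from a source checkout, not from an installed package.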

I suspect a few possibilities, such as a bug somewhere or an unsuccessful manual build of pandas from source, but I have not been able to reproduce the failure to check either one. So this is just my guess.
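One way to narrow this down without importing pandas (the step that fails) is to locate the installed package directory and list the compiled hashtable modules in it. Here is a rough sketch. It assumes that pandas 0.19.x ships hashtable as a top-level extension module and that 0.20.x moved it under pandas/_libs, which is worth confirming against the 0.20.0 release notes. The helper names are hypothetical.

```python
import importlib.util
from pathlib import Path


def locate_package_dir(package):
    """Return the directory of a top-level `package` without running its __init__.py."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def find_extension_files(pkg_dir, stem="hashtable"):
    """List compiled extension files named `stem*` anywhere under pkg_dir."""
    suffixes = {".so", ".pyd"}  # Linux/macOS and Windows extension modules
    return sorted(
        p.relative_to(pkg_dir).as_posix()
        for p in pkg_dir.rglob(stem + "*")
        if p.suffix in suffixes
    )


if __name__ == "__main__":
    pkg_dir = locate_package_dir("pandas")
    if pkg_dir is None:
        print("pandas not found on sys.path")
    else:
        print("pandas directory:", pkg_dir)
        # Assumed layouts: 0.19.x keeps hashtable*.so at the top level,
        # 0.20.x keeps it under _libs/.
        print("hashtable extensions:", find_extension_files(pkg_dir))
```

If an __init__.py that imports the top-level layout sits next to only a _libs/ extension (or the reverse), that might suggest a partially upgraded install rather than a failed build.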


Does anyone know whether this is an environment problem, and how to fix it?

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Liang-Chi Hsieh

This might be a possible fix: https://stackoverflow.com/questions/31495657/development-build-of-pandas-giving-importerror-c-extension-hashtable-not-bui

Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

shane knapp
amp-jenkins-worker-05 had pandas 0.20.3 installed for some reason. it's now
been downgraded to 0.19.2 to match the other workers.
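Since the fix was to bring the worker back in line with the others, a small check like the one below could flag that kind of version drift before the tests run. This is a sketch that assumes Python 3.8+ for importlib.metadata. The EXPECTED pin and the helper names are hypothetical, and 0.19.2 comes from the message above.

```python
from importlib import metadata

# Hypothetical pin list; 0.19.2 is the pandas version the other workers run.
EXPECTED = {"pandas": "0.19.2"}


def installed_version(dist):
    """Return the installed version of distribution `dist`, or None if missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def find_drift(expected, lookup=installed_version):
    """Return {dist: (found, wanted)} for every pin the environment does not match."""
    drift = {}
    for dist, wanted in expected.items():
        found = lookup(dist)
        if found != wanted:  # exact string comparison against the pin
            drift[dist] = (found, wanted)
    return drift


if __name__ == "__main__":
    drift = find_drift(EXPECTED)
    if not drift:
        print("all pinned packages match")
    for dist, (found, wanted) in sorted(drift.items()):
        print("{}: found {}, expected {}".format(dist, found, wanted))
```

Passing lookup in as a parameter keeps the comparison testable without pandas installed.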

shane

On Sat, Aug 5, 2017 at 2:01 AM, Liang-Chi Hsieh <[hidden email]> wrote:

>
> Maybe a possible fix:
> https://stackoverflow.com/questions/31495657/development-build-of-pandas-giving-importerror-c-extension-hashtable-not-bui
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Question-Flaky-tests-pyspark-sql-tests-ArrowTests-tests-in-Jenkins-worker-5-tp22085p22086.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

shane knapp
ok, first test to run post-fix is green:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80289/

i'll keep an eye on this worker over the next few days.

shane

On Sat, Aug 5, 2017 at 11:06 AM, shane knapp <[hidden email]> wrote:

> amp-jenkins-worker-05 had 0.20.3 installed for some reason.  it's now
> been downgraded to 0.19.2 and matches the other workers.
>
> shane

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Hyukjin Kwon
Thank you, Shane.

2017-08-06 8:30 GMT+09:00 shane knapp <[hidden email]>:
> ok, first test to run post-fix is green:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80289/
>
> i'll keep an eye on this worker over the next few days.
>
> shane
