Hi, I am getting an exception in Spark 2.1 reading parquet files where some columns are DELTA_BYTE_ARRAY encoded.

java.lang.UnsupportedOperationException: Unsupported encoding: DELTA_BYTE_ARRAY

Is this exception by design, or am I missing something?

If I turn off the vectorized reader, reading these files works fine.

AndreiL
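For readers who want to reproduce the workaround mentioned above, here is a minimal sketch of how the setting could be passed on the command line. It assumes the Spark SQL option `spark.sql.parquet.enableVectorizedReader` controls the vectorized Parquet reader; the helper names `reader_conf` and `to_submit_flags` are hypothetical and only for illustration.

```python
# Sketch only: builds spark-submit flags for toggling the vectorized Parquet reader.
# reader_conf and to_submit_flags are hypothetical helpers, not part of Spark.

VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"


def reader_conf(vectorized: bool) -> dict:
    """Return the Spark SQL setting that enables or disables the vectorized reader."""
    return {VECTORIZED_READER_KEY: "true" if vectorized else "false"}


def to_submit_flags(conf: dict) -> list:
    """Turn a config dict into a flat list of `--conf key=value` arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Prints: --conf spark.sql.parquet.enableVectorizedReader=false
    print(" ".join(to_submit_flags(reader_conf(vectorized=False))))
```

In a running session the same option can usually be changed with `spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")`. Disabling the vectorized reader falls back to the slower non-vectorized path, so it is a workaround rather than a fix.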
Hi AndreiL,
Were these files written with the Parquet V2 writer? The Spark 2.1 vectorized reader does not appear to support that format.

Michael

> On May 9, 2017, at 11:04 AM, andreiL <[hidden email]> wrote:
>
> Hi, I am getting an exception in Spark 2.1 reading parquet files where some
> columns are DELTA_BYTE_ARRAY encoded.
>
> java.lang.UnsupportedOperationException: Unsupported encoding:
> DELTA_BYTE_ARRAY
>
> Is this exception by design, or am I missing something?
>
> If I turn off the vectorized reader, reading these files works fine.
>
> AndreiL
Michael is right: delta byte array encoding is a Parquet v2 feature. Parquet v2 isn't finished yet, though some of its features have already shipped in releases, and those features will continue to be supported in future releases. In other words, Parquet will maintain backward compatibility for any released v2 features.

I don't recommend using Parquet v2 yet, because Parquet doesn't guarantee forward compatibility for those features. For v1, old readers should be able to read data written by newer versions, but we won't make that guarantee for v2 until the spec is considered finished.

rb

On Mon, May 22, 2017 at 10:16 AM, Michael Allman <[hidden email]> wrote:
> Hi AndreiL,

Ryan Blue
Software Engineer
Netflix
I took a closer look and, yes, the files were written with Parquet v2. For some reason Parquet v2 was set as the default; I set it back to Parquet v1.

Thanks, Michael and Ryan, for the info.

Andrei.
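The thread does not say how the writer version was changed back. As a hedged sketch, one common way is the Parquet Hadoop property `parquet.writer.version`, which Spark can forward through its `spark.hadoop.` prefix. The helper `writer_version_conf` is hypothetical, and the accepted values `v1` and `v2` are an assumption about the Parquet writer in use.

```python
# Sketch only: builds the Spark setting that pins the Parquet writer version.
# writer_version_conf is a hypothetical helper; the property names are assumptions
# about the Parquet/Spark versions in use, not something stated in this thread.

WRITER_VERSION_KEY = "parquet.writer.version"  # Parquet Hadoop property
SPARK_HADOOP_PREFIX = "spark.hadoop."          # Spark forwards spark.hadoop.* to Hadoop conf


def writer_version_conf(version: str = "v1") -> dict:
    """Return a Spark config entry selecting the Parquet writer version."""
    if version not in ("v1", "v2"):
        raise ValueError(f"unknown Parquet writer version: {version!r}")
    return {SPARK_HADOOP_PREFIX + WRITER_VERSION_KEY: version}


if __name__ == "__main__":
    for key, value in writer_version_conf("v1").items():
        # Prints: --conf spark.hadoop.parquet.writer.version=v1
        print(f"--conf {key}={value}")
```

Files already written with the v2 writer are not affected by this setting; they would need to be rewritten, or read with the vectorized reader disabled.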