Parquet vectorized reader DELTA_BYTE_ARRAY

Parquet vectorized reader DELTA_BYTE_ARRAY

andreiL
Hi, I am getting an exception in Spark 2.1 when reading Parquet files in which some columns are DELTA_BYTE_ARRAY encoded.

java.lang.UnsupportedOperationException: Unsupported encoding: DELTA_BYTE_ARRAY

Is this exception by design, or am I missing something?

If I turn off the vectorized reader, reading these files works fine.
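For reference, here is how I turned it off (a sketch; the config key is Spark's `spark.sql.parquet.enableVectorizedReader`, and the file path is a placeholder):

```scala
// Disable the vectorized Parquet reader so Spark falls back to the
// row-at-a-time parquet-mr reader, which handles DELTA_BYTE_ARRAY.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

// Equivalently, per session via SQL:
// spark.sql("SET spark.sql.parquet.enableVectorizedReader=false")

// Placeholder path to the DELTA_BYTE_ARRAY-encoded files.
val df = spark.read.parquet("/path/to/v2-encoded-files")
```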

AndreiL

Re: Parquet vectorized reader DELTA_BYTE_ARRAY

Michael Allman-2
Hi AndreiL,

Were these files written with the Parquet V2 writer? The Spark 2.1 vectorized reader does not appear to support that format.

Michael


> On May 9, 2017, at 11:04 AM, andreiL <[hidden email]> wrote:


Re: Parquet vectorized reader DELTA_BYTE_ARRAY

Ryan Blue
Michael is right: the DELTA_BYTE_ARRAY encoding is a Parquet v2 feature. Parquet v2 isn't finished yet, though some of its features have shipped in releases, and those features will continue to be supported in future releases. In other words, Parquet will maintain backward compatibility for any released v2 features.

I don't recommend using Parquet v2 yet because Parquet doesn't guarantee forward-compatibility for those features. For v1, old readers should be able to read the data written by newer versions, but we won't make that guarantee for v2 until the spec is considered finished.

rb

On Mon, May 22, 2017 at 10:16 AM, Michael Allman <[hidden email]> wrote:

--
Ryan Blue
Software Engineer
Netflix

Re: Parquet vectorized reader DELTA_BYTE_ARRAY

andreiL
I took a closer look, and yes, the files were written with Parquet v2.

For some reason Parquet v2 was set as the default; I set it back to Parquet v1.
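In case it helps others, this is roughly how the writer version is forced back to v1 from Spark (a sketch; the key `parquet.writer.version` is a parquet-mr Hadoop configuration property, and the output path is a placeholder):

```scala
// Force parquet-mr's v1 writer so v2-only encodings such as
// DELTA_BYTE_ARRAY are never emitted; the Spark 2.1 vectorized
// reader (and older readers) can then read the files.
spark.sparkContext.hadoopConfiguration.set("parquet.writer.version", "v1")

// Placeholder output path.
df.write.parquet("/path/to/output")
```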

Thanks Michael and Ryan for the info.

Andrei.