Re: java.lang.ClassNotFoundException for s3a committer

Re: java.lang.ClassNotFoundException for s3a committer

murat migdisoglu
Hi all 
I've upgraded my test cluster to Spark 3 and changed my committer to "directory", and I still get this error. The documentation is somewhat obscure on this point.
Do I need to add a third-party JAR to support the new committers?

java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
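For reference, that class lives in Spark's optional `spark-hadoop-cloud` module, which is not on the classpath of a default Spark distribution. A minimal sketch of the Spark-side binding, assuming the `spark-hadoop-cloud` JAR (built with the `-Phadoop-cloud` profile) has been added to the driver and executor classpath:

```properties
# spark-defaults.conf — requires the spark-hadoop-cloud JAR on the classpath,
# otherwise these classes cannot be loaded and you get this ClassNotFoundException
spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```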


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <[hidden email]> wrote:
Hello all, 
We have a Hadoop cluster (using YARN) with S3 as the filesystem and S3Guard enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.

When I try to save a DataFrame in Parquet format, I get the following exception:
java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant Spark configurations are as follows:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
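Note that when options like these are set through `SparkConf` or `spark-defaults.conf` rather than in a Hadoop XML file, the Hadoop-level keys need the `spark.hadoop.` prefix so Spark propagates them into the Hadoop configuration. A hypothetical `spark-defaults.conf` fragment for the same settings:

```properties
# Hadoop-level options must carry the spark.hadoop. prefix when set via Spark
spark.hadoop.fs.s3a.committer.name                          magic
spark.hadoop.fs.s3a.committer.magic.enabled                 true
spark.hadoop.fs.s3a.impl                                    org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a   org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
```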

While Spark Structured Streaming fails with the exception above, Apache Beam succeeds in writing Parquet files.
What might be the problem?

Thanks in advance 


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our hands, not our tongues."
W. Shakespeare


Re: java.lang.ClassNotFoundException for s3a committer

Steve Loughran-2
You are going to need Hadoop 3.1+ on your classpath, with hadoop-aws and the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs from different releases is doomed. Using a different AWS SDK JAR is a bit risky, though the more recent upgrades have all been fairly low-stress.
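One way to keep those JARs consistent is to ship hadoop-aws explicitly at submit time and let its declared dependency pull in the matching AWS SDK bundle. A sketch assuming Hadoop 3.2.1 artifacts (check the hadoop-aws POM for the exact `aws-java-sdk-bundle` version that release was built against):

```shell
# hadoop-aws must match the cluster's Hadoop version exactly;
# --packages resolves the aws-java-sdk-bundle that hadoop-aws 3.2.1 declares,
# so the SDK version stays in sync with what hadoop-aws was built with
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.2.1 \
  --conf spark.hadoop.fs.s3a.committer.name=magic \
  your-app.jar
```

(`your-app.jar` is a placeholder for the application being submitted.)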
