Package option gets outdated jar when running with "latest"


Alessandro Liparoti
Hi everyone,

I am encountering an annoying issue when running Spark with an external jar dependency downloaded from Maven. This is how we run it:

spark-shell --repositories <our-own-maven-release-repo> --packages <our-package:latest.release>
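
For context, latest.release is an Ivy dynamic revision: which concrete version it maps to is resolved through Spark's local Ivy cache as well as the remote repository, which is one place staleness could creep in. For comparison, a pinned invocation would look like this (0.45 is just an example version):

spark-shell --repositories <our-own-maven-release-repo> --packages <our-package:0.45>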

When we release a new version with a big change in the API, things start to randomly break for some users. For example, version 0.44 had a class DateUtils (used by the class Utils) that was dropped in version 0.45. Running right after 0.45 was released (Spark shows it is correctly downloading the new jar from Maven) and using the class Utils, some users got

NoClassDefFoundError for class DateUtils

To me this looks like a caching problem. Probably the ClassLoader on some node (master or an executor) is still pointing to v0.44, and when it loads Utils it tries to find the DateUtils class, which has disappeared from the newer jar. I am not sure how this can happen; it is only an intuition.
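
If the cache theory is right, the stale copy should be visible in Spark's local Ivy directories on an affected machine. A sketch, assuming the defaults (no custom spark.jars.ivy location); <our-group-id> and <our-artifact-id> stand for our real coordinates:

ls -l ~/.ivy2/jars | grep <our-artifact-id>             # jars Spark puts on the classpath
ls -l ~/.ivy2/cache/<our-group-id>/<our-artifact-id>    # Ivy's cached metadata and jars
rm -rf ~/.ivy2/cache/<our-group-id>/<our-artifact-id>   # force a fresh resolve on next launch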

Does anyone have any idea how to solve this? It is also very hard to debug, since I couldn't find a pattern to reproduce it. It happens on every release that removes or renames a class, but not for everyone running the job (that's why caching looked like a good hint to me).
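
In case someone wants to dig in: one way to check whether an affected node kept an old copy would be to compare checksums of the resolved jar across machines, e.g. (again assuming the default Ivy home):

md5sum ~/.ivy2/jars/<our-group-id>_<our-artifact-id>-*.jar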

Thanks,
Alessandro Liparoti