Behroz Sikander
I already posted this question on the users mailing list, but it was
suggested that I post it here.

Note: Problems in the applications running on top of the cluster are
expected and understandable; this question is not about application code.

The gist of the problem is that I am trying to upgrade my cluster from 2.2.2
to 2.4.4. I am using Spark standalone in HA mode (ZooKeeper). The upgrade
behaves strangely and runs into a lot of problems:

- The Spark master on version 2.4.4 tries to recover its state from
ZooKeeper, fails to deserialize the app/worker objects, and throws
InvalidClassException. It successfully deserializes the driverInfo objects.
The deserialization fails because of "RpcEndpointRef".
- After failing to deserialize, the Spark master (2.4.4) deletes all the
information about apps/workers from ZooKeeper and loses all contact with
the running JVMs.
- Old Spark workers (2.2) fail to communicate with the new Spark master
(2.4.4).
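
To illustrate the kind of failure in the first bullet: in plain Java
serialization, an InvalidClassException is raised when the serialVersionUID
recorded in the stored bytes differs from that of the class loaded locally.
The sketch below reproduces that general mechanism outside of Spark. It is an
assumption that this is the same mechanism at work in the master recovery;
PersistedInfo and all method names are hypothetical and are not Spark classes
or APIs. The "other version" is simulated by flipping one bit of the
serialVersionUID stored in the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerialVersionMismatchDemo {

    // Hypothetical stand-in for a record persisted for recovery; not a Spark class.
    static class PersistedInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        PersistedInfo(String id) { this.id = id; }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Simulates bytes written by a different class version: the stream stores the
    // 8-byte serialVersionUID right after the class name, so flip its lowest bit.
    static byte[] withOtherSerialVersionUID(byte[] bytes, Class<?> type) {
        byte[] name = type.getName().getBytes(StandardCharsets.UTF_8);
        int start = indexOf(bytes, name);
        if (start < 0) {
            throw new IllegalArgumentException("class descriptor not found");
        }
        byte[] copy = bytes.clone();
        copy[start + name.length + 7] ^= 0x01;
        return copy;
    }

    // Returns the index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // True only if reading the bytes fails with InvalidClassException.
    static boolean failsWithInvalidClass(byte[] bytes) {
        try {
            deserialize(bytes);
            return false;
        } catch (InvalidClassException e) {
            System.out.println("Recovery failed: " + e.getMessage());
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = serialize(new PersistedInfo("app-1"));
        System.out.println("same version fails: " + failsWithInvalidClass(stored));
        byte[] fromOtherVersion = withOtherSerialVersionUID(stored, PersistedInfo.class);
        System.out.println("other version fails: " + failsWithInvalidClass(fromOtherVersion));
    }
}
```

Classes that do not declare an explicit serialVersionUID get one computed
from the class structure, so any change to such a class between releases can
make previously stored bytes unreadable in this way.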

I was expecting a smooth upgrade of Spark itself, and I thought that
Spark would recover gracefully.

So, my questions are:
- Is an upgrade of Spark from version 2.2.2 to 2.4.4 supposed to go smoothly?
- If not, is this documented somewhere?
- If yes, why is deserialization failing, and how can I fix it?

Any help would be much appreciated.

