[SPARK-25299] A Discussion About Shuffle Metadata Tracking
A working group in the community have been having ongoing discussions regarding how we can allow for flexible storage solutions for shuffle data that is compatible with containerized systems, more resilient
to node failures, and can support disaggregated storage architectures.
One of the core challenges we have been trying to overcome is navigating the space of shuffle metadata tracking, and reasoning about how we approach recomputing lost shuffle blocks in the case when the shuffle
file storage system is not resilient.
I have written a design document on the subject, and a proposed set of APIs to fix it. These should be considered as part of the APIs for
SPARK-25299. Once we have reached some common conclusion on the proper APIs to build, I can modify the original SPARK-25299 SPIP to reflect the choices we’ve made. But I wanted to write more extensively
on this topic separately to encourage focused discussion on this subset of the problem space.
If you would like to catch up on the discussions we have had so far, I give some background to the subject matter and have linked to other relevant design documents and discussion threads in this document.
Feedback is definitely appreciated – I acknowledge that this is a fairly complex space with lots of potential viable options, so I’m looking forward to engaging with dialogue moving forward.