[SS] Why does StreamingQueryManager.notifyQueryTermination use id and runId (not just id)?

Jacek Laskowski
Hi,

I'm wondering why StreamingQueryManager.notifyQueryTermination [1] uses a query's id to remove it from the activeQueries internal registry [2], while it notifies stateStoreCoordinator using runId [3].

My understanding is that id stays the same across different runs of a query. So once StreamingQueryManager removes the query (by its id), it effectively knows nothing about the query, yet stateStoreCoordinator may still have other instances running (since we only deactivated a single run).

Why the "inconsistency"?



Re: [SS] Why does StreamingQueryManager.notifyQueryTermination use id and runId (not just id)?

Shixiong(Ryan) Zhu
stateStoreCoordinator uses runId to handle the small chance that Spark cannot shut down a bad task. Please see https://github.com/apache/spark/pull/18355
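
To make the id/runId distinction concrete, here is a toy model in Scala. It is only a sketch and not Spark's code: ToyQuery, ToyCoordinator, ToyManager and RunIdDemo are hypothetical names, and the only assumption carried over from the thread is that id is shared by all runs of a query while runId is new for every run.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical stand-in for a streaming query: id survives restarts, runId does not.
final case class ToyQuery(id: UUID, runId: UUID)

// Hypothetical coordinator that tracks state store instances per run.
final class ToyCoordinator {
  // (runId, partition) -> executor that reported the instance
  private val instances = mutable.Map.empty[(UUID, Int), String]

  def reportActive(runId: UUID, partition: Int, executor: String): Unit = {
    instances((runId, partition)) = executor
  }

  // Drops only the instances that belong to the given run.
  def deactivateInstances(runId: UUID): Unit = {
    val stale = instances.keys.filter(_._1 == runId).toList
    instances --= stale
  }

  def activeRunIds: Set[UUID] = instances.keys.map(_._1).toSet
}

// Hypothetical manager that tracks active queries by id only.
final class ToyManager(coordinator: ToyCoordinator) {
  private val activeQueries = mutable.Map.empty[UUID, ToyQuery]

  def start(query: ToyQuery): Unit = {
    activeQueries(query.id) = query
  }

  def notifyQueryTermination(query: ToyQuery): Unit = {
    activeQueries -= query.id                    // registry is keyed by id
    coordinator.deactivateInstances(query.runId) // coordinator is keyed by runId
  }

  def isActive(id: UUID): Boolean = activeQueries.contains(id)
}

object RunIdDemo {
  def main(args: Array[String]): Unit = {
    val coordinator = new ToyCoordinator
    val manager = new ToyManager(coordinator)

    val id = UUID.randomUUID()
    val run1 = ToyQuery(id, UUID.randomUUID())
    val run2 = ToyQuery(id, UUID.randomUUID()) // same query, new run

    coordinator.reportActive(run1.runId, 0, "exec-1")
    coordinator.reportActive(run2.runId, 0, "exec-2")

    manager.start(run1)
    manager.notifyQueryTermination(run1)

    println(manager.isActive(id))
    println(coordinator.activeRunIds == Set(run2.runId))
  }
}
```

In this model, terminating run1 clears the manager's entry for the id but removes only run1's instances from the coordinator, so whatever was reported under another runId is left alone.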

On Fri, Oct 27, 2017 at 3:40 AM, Jacek Laskowski <[hidden email]> wrote:
Hi,

I'm wondering why StreamingQueryManager.notifyQueryTermination [1] uses a query's id to remove it from the activeQueries internal registry [2], while it notifies stateStoreCoordinator using runId [3].

My understanding is that id stays the same across different runs of a query. So once StreamingQueryManager removes the query (by its id), it effectively knows nothing about the query, yet stateStoreCoordinator may still have other instances running (since we only deactivated a single run).

Why the "inconsistency"?