SQLListener concurrency bug?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

SQLListener concurrency bug?

Oleksandr Vayda
Hi all,

Reading the source code of the org.apache.spark.sql.execution.ui.SQLListener, specifically this place - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L328

def getFailedExecutions: Seq[SQLExecutionUIData] = synchronized {
failedExecutions
}
def getCompletedExecutions: Seq[SQLExecutionUIData] = synchronized {
completedExecutions
}
I believe the synchronized block is used here incorrectly. If I get it right the main purpose here is to synchronize access to the mutable collections from the UI (read) and the event bus (read/write) threads. But in the current implementation the "synchronized" blocks return bare references to mutable collections and in fact nothing gets synchronized.
Is it a bug?

Sincerely yours,
Oleksandr Vayda

mobile: +420 604 113 056
Reply | Threaded
Open this post in threaded view
|

Re: SQLListener concurrency bug?

Shixiong(Ryan) Zhu
Right now they are safe because the caller also calls synchronized when using them. This is to avoid copying objects. It's probably a bad design. If you want to refactor them, PR is welcome.

On Mon, Jun 26, 2017 at 2:27 AM, Oleksandr Vayda <[hidden email]> wrote:
Hi all,

Reading the source code of the org.apache.spark.sql.execution.ui.SQLListener, specifically this place - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L328

def getFailedExecutions: Seq[SQLExecutionUIData] = synchronized {
failedExecutions
}
def getCompletedExecutions: Seq[SQLExecutionUIData] = synchronized {
completedExecutions
}
I believe the synchronized block is used here incorrectly. If I get it right the main purpose here is to synchronize access to the mutable collections from the UI (read) and the event bus (read/write) threads. But in the current implementation the "synchronized" blocks return bare references to mutable collections and in fact nothing gets synchronized.
Is it a bug?

Sincerely yours,
Oleksandr Vayda

mobile: <a href="tel:+420%20604%20113%20056" value="+420604113056" target="_blank">+420 604 113 056

Reply | Threaded
Open this post in threaded view
|

Re: SQLListener concurrency bug?

Oleksandr Vayda

Cool. I will be happy to create a PR. The simplest and most obvious solution that came to my mind was using Java concurrent collections instead of Scala mutable. Don't you mind to have this bit of Java inside Spark? :) Or perhaps we could use Scala concurrent collections, but they are only available in Scala 2.12 AFAIK.

Alex


On Tue, Jun 27, 2017, 05:42 Shixiong(Ryan) Zhu <[hidden email]> wrote:
Right now they are safe because the caller also calls synchronized when using them. This is to avoid copying objects. It's probably a bad design. If you want to refactor them, PR is welcome.

On Mon, Jun 26, 2017 at 2:27 AM, Oleksandr Vayda <[hidden email]> wrote:
Hi all,

Reading the source code of the org.apache.spark.sql.execution.ui.SQLListener, specifically this place - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L328

def getFailedExecutions: Seq[SQLExecutionUIData] = synchronized {
failedExecutions
}
def getCompletedExecutions: Seq[SQLExecutionUIData] = synchronized {
completedExecutions
}
I believe the synchronized block is used here incorrectly. If I get it right the main purpose here is to synchronize access to the mutable collections from the UI (read) and the event bus (read/write) threads. But in the current implementation the "synchronized" blocks return bare references to mutable collections and in fact nothing gets synchronized.
Is it a bug?

Sincerely yours,
Oleksandr Vayda

mobile: <a href="tel:+420%20604%20113%20056" value="+420604113056" target="_blank">+420 604 113 056

Reply | Threaded
Open this post in threaded view
|

Re: SQLListener concurrency bug?

Oleksandr Vayda

Oh, sorry, I was wrong. Concurrent collections in Scala are available since 2.8. Any objections against replacing mutable list and synchronized with a concurrent collection like or based on TrieMap for instance?


On Wed, Jun 28, 2017, 16:05 Oleksandr Vayda <[hidden email]> wrote:

Cool. I will be happy to create a PR. The simplest and most obvious solution that came to my mind was using Java concurrent collections instead of Scala mutable. Don't you mind to have this bit of Java inside Spark? :) Or perhaps we could use Scala concurrent collections, but they are only available in Scala 2.12 AFAIK.

Alex


On Tue, Jun 27, 2017, 05:42 Shixiong(Ryan) Zhu <[hidden email]> wrote:
Right now they are safe because the caller also calls synchronized when using them. This is to avoid copying objects. It's probably a bad design. If you want to refactor them, PR is welcome.

On Mon, Jun 26, 2017 at 2:27 AM, Oleksandr Vayda <[hidden email]> wrote:
Hi all,

Reading the source code of the org.apache.spark.sql.execution.ui.SQLListener, specifically this place - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L328

def getFailedExecutions: Seq[SQLExecutionUIData] = synchronized {
failedExecutions
}
def getCompletedExecutions: Seq[SQLExecutionUIData] = synchronized {
completedExecutions
}
I believe the synchronized block is used here incorrectly. If I get it right the main purpose here is to synchronize access to the mutable collections from the UI (read) and the event bus (read/write) threads. But in the current implementation the "synchronized" blocks return bare references to mutable collections and in fact nothing gets synchronized.
Is it a bug?

Sincerely yours,
Oleksandr Vayda

mobile: <a href="tel:+420%20604%20113%20056" value="+420604113056" target="_blank">+420 604 113 056