SessionCatalog lock issue

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

SessionCatalog lock issue

Chang Chen
hi  all

We met an issue which is related with SessionCatalog synchronized, for example

def tableExists(name: TableIdentifier): Boolean = synchronized {
val db = formatDatabaseName(name.database.getOrElse(currentDb))
val table = formatTableName(name.table)
externalCatalog.tableExists(db, table)
}
We have modified the underlying hive meta store which a different hive database is placed in its own shard for performance. However, we found that the synchronized limits the concurrency, we would like 
1. replace  synchronized with a read-write lock
2. remove synchronized in hive withClient function 

Let me know what you guys think of it?

If it is ok, I will create an issue and contribute a PR

Thanks
Chang 
Reply | Threaded
Open this post in threaded view
|

Re: SessionCatalog lock issue

cloud0fan
The `synchronized` is needed for getting `currentDb` IIUC. So a small change is to only wrap `formatDatabaseName(name.database.getOrElse(currentDb))` with `synchronized`.

On Thu, Mar 18, 2021 at 3:38 PM Chang Chen <[hidden email]> wrote:
hi  all

We met an issue which is related with SessionCatalog synchronized, for example

def tableExists(name: TableIdentifier): Boolean = synchronized {
val db = formatDatabaseName(name.database.getOrElse(currentDb))
val table = formatTableName(name.table)
externalCatalog.tableExists(db, table)
}
We have modified the underlying hive meta store which a different hive database is placed in its own shard for performance. However, we found that the synchronized limits the concurrency, we would like 
1. replace  synchronized with a read-write lock
2. remove synchronized in hive withClient function 

Let me know what you guys think of it?

If it is ok, I will create an issue and contribute a PR

Thanks
Chang 
Reply | Threaded
Open this post in threaded view
|

Re: SessionCatalog lock issue

Chang Chen
Oh, yeah, moving the call to externalCatalog out of `synchronized` would be better.

But we can not do such a simple change in all cases, anyway, we will try.

Thanks


Wenchen Fan <[hidden email]> 于2021年3月18日周四 下午9:47写道:
The `synchronized` is needed for getting `currentDb` IIUC. So a small change is to only wrap `formatDatabaseName(name.database.getOrElse(currentDb))` with `synchronized`.

On Thu, Mar 18, 2021 at 3:38 PM Chang Chen <[hidden email]> wrote:
hi  all

We met an issue which is related with SessionCatalog synchronized, for example

def tableExists(name: TableIdentifier): Boolean = synchronized {
val db = formatDatabaseName(name.database.getOrElse(currentDb))
val table = formatTableName(name.table)
externalCatalog.tableExists(db, table)
}
We have modified the underlying hive meta store which a different hive database is placed in its own shard for performance. However, we found that the synchronized limits the concurrency, we would like 
1. replace  synchronized with a read-write lock
2. remove synchronized in hive withClient function 

Let me know what you guys think of it?

If it is ok, I will create an issue and contribute a PR

Thanks
Chang 
Reply | Threaded
Open this post in threaded view
|

Re: SessionCatalog lock issue

Chang Chen
In reply to this post by cloud0fan
Hi Wenchen

Thanks for your suggestion, would you please review this PR(https://github.com/apache/spark/pull/31891)

Thanks
Chang

Wenchen Fan <[hidden email]> 于2021年3月18日周四 下午9:47写道:
The `synchronized` is needed for getting `currentDb` IIUC. So a small change is to only wrap `formatDatabaseName(name.database.getOrElse(currentDb))` with `synchronized`.

On Thu, Mar 18, 2021 at 3:38 PM Chang Chen <[hidden email]> wrote:
hi  all

We met an issue which is related with SessionCatalog synchronized, for example

def tableExists(name: TableIdentifier): Boolean = synchronized {
val db = formatDatabaseName(name.database.getOrElse(currentDb))
val table = formatTableName(name.table)
externalCatalog.tableExists(db, table)
}
We have modified the underlying hive meta store which a different hive database is placed in its own shard for performance. However, we found that the synchronized limits the concurrency, we would like 
1. replace  synchronized with a read-write lock
2. remove synchronized in hive withClient function 

Let me know what you guys think of it?

If it is ok, I will create an issue and contribute a PR

Thanks
Chang