For the same data source in two SQLs, how to read it once?

For the same data source in two SQLs, how to read it once?

Gang Li
Hi all,

I ran two Spark SQL statements that read the same table and partition but
write to different tables. Is there any way to merge them into one job so
that the read operation runs only once?

Suppose there are two SQL statements:
-----------------------------------------------------------------------------------------------------------------
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
SELECT name, number, age
FROM spark_input_test  WHERE dt='20200908'
-----------------------------------------------------------------------------------------------------------------
INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
SELECT name, number, sex
FROM spark_input_test  WHERE dt='20200908'
-----------------------------------------------------------------------------------------------------------------

Running these two SQL statements generates two physical plans, so the data
source "spark_input_test" is read twice. If spark_input_test were read only
once, it would save a redundant scan and the associated resources.


Cheers,
Gang Li





--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: For the same data source in two SQLs, how to read it once?

Yang Shun
You can do this by creating a temporary table:
1. When the data is first read, include all required fields (age, sex, and so on) and cache the result as a dataset.
2. When writing to the different output tables, select the relevant fields from the cached dataset.
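As a sketch of these steps in Spark SQL, using the table and column names from the original question (the name spark_input_cached is made up for illustration). CACHE TABLE ... AS SELECT is eager by default, so the source partition should be scanned once and both inserts reuse the in-memory copy:
-----------------------------------------------------------------------------------------------------------------
-- Read the source partition once, with all columns both outputs need
CACHE TABLE spark_input_cached AS
SELECT name, number, age, sex
FROM spark_input_test WHERE dt='20200908';

-- Each insert selects only its own columns from the cached data
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
SELECT name, number, age FROM spark_input_cached;

INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
SELECT name, number, sex FROM spark_input_cached;

-- Free the cached data when done
UNCACHE TABLE spark_input_cached;
-----------------------------------------------------------------------------------------------------------------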

Re: For the same data source in two SQLs, how to read it once?

Gang Li
Writing to a temporary table does allow the data source to be read only
once, but it incurs disk I/O and does not take advantage of Spark's
in-memory RDD operations.





Re: For the same data source in two SQLs, how to read it once?

Gang Li
If I use a temporary table, the execution process is shown in the following
figure:
<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3738/tmp1.png>

Is there any way to achieve the process shown in the following figure
instead?
<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3738/tmp2.png>

Thanks!





Re: For the same data source in two SQLs, how to read it once?

Liu Genie
I think Yang Shun means createTempView; combined with CACHE TABLE, the data
is kept in memory instead of being written to disk.

And I suggest this post be discussed on the [hidden email] mailing list.
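A sketch of the temporary-view variant (the view name input_view is made up for illustration). Note that a temporary view by itself is lazy and would still be re-evaluated per query, so an explicit CACHE TABLE is what keeps the data in memory:
-----------------------------------------------------------------------------------------------------------------
-- Define a view over the source partition with all needed columns
CREATE OR REPLACE TEMPORARY VIEW input_view AS
SELECT name, number, age, sex
FROM spark_input_test WHERE dt='20200908';

-- Materialize the view into memory so it is scanned only once
CACHE TABLE input_view;

INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
SELECT name, number, age FROM input_view;

INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
SELECT name, number, sex FROM input_view;
-----------------------------------------------------------------------------------------------------------------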

From: Gang Li <[hidden email]>
Sent: September 9, 2020 17:06
To: [hidden email] <[hidden email]>
Subject: Re: For the same data source in two SQLs, how to read it once?

If I use a temporary table, the execution process is shown in the following
figure:
<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3738/tmp1.png>

Is there any way to achieve the process shown in the following figure
instead?
<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3738/tmp2.png>

Thanks!





Re: For the same data source in two SQLs, how to read it once?

Gang Li
I will pay attention to that in the future. Thank you very much for your suggestions.


