Support for Hive buckets

Support for Hive buckets

Cody Koeninger-2
I noticed that the release notes for 1.1.0 said that Spark doesn't support
Hive buckets "yet".  I didn't notice any JIRA issues related to adding
support.

Broadly speaking, what would be involved in supporting buckets, especially
the bucketmapjoin and sortedmerge optimizations?
Source code for mining big data with Spark

David Tung-2
Hi all,

I watched an impressive Spark demo video by Reynold Xin and Aaron Davidson
on YouTube ( https://www.youtube.com/watch?v=FjhRkfAuU7I ). Can someone
let me know where I can find the source code for the demo? I can't see
the source code clearly in the video.

Thanks in advance


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Support for Hive buckets

Michael Armbrust
In reply to this post by Cody Koeninger-2
Hi Cody,

There are currently no concrete plans for adding buckets to Spark SQL, but
that's mostly due to lack of resources / demand for this feature.  Adding
full support is probably a fair amount of work, since you'd have to make
changes throughout parsing/optimization/execution.  That said, there are
probably some smaller tasks that would be easier (for example, you might be
able to avoid a shuffle when doing joins on tables that are already
bucketed by exposing more metastore information to the planner).

Michael

On Sun, Sep 14, 2014 at 3:10 PM, Cody Koeninger <[hidden email]> wrote:

> I noticed that the release notes for 1.1.0 said that spark doesn't support
> Hive buckets "yet".  I didn't notice any jira issues related to adding
> support.
>
> Broadly speaking, what would be involved in supporting buckets, especially
> the bucketmapjoin and sortedmerge optimizations?
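To make the shuffle Michael describes concrete, here is a rough sketch using the modern SparkSession API (the table names t1/t2 and the columns "a"/"b" are placeholders, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup: a Hive-enabled session. In a real deployment this
// would point at an existing Hive metastore.
val spark = SparkSession.builder()
  .appName("bucket-join-plan")
  .enableHiveSupport()
  .getOrCreate()

// Equi-join two Hive tables that the planner does NOT know are bucketed.
val joined = spark.table("t1").join(spark.table("t2"), Seq("a", "b"))

// The printed physical plan shows an Exchange (shuffle) and a Sort on both
// sides feeding the join; skipping that Exchange for pre-bucketed tables is
// the optimization sketched above.
joined.explain()
```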
>
Re: Support for Hive buckets

tanejagagan
> (for example, you might be able to avoid a shuffle when doing joins on
> tables that are already bucketed by exposing more metastore information
> to the planner).

Can you provide more input on how to implement this functionality, so that I can speed up a join between two Hive tables, each with a few billion rows?
Re: Support for Hive buckets

welder404
https://issues.apache.org/jira/browse/SPARK-19256 is an active umbrella
issue for this feature.

But as of Spark 2.2, you can use the DataFrame writer APIs today to bucket
tables when saving them through Hive.

If you invoke

import org.apache.spark.sql.functions.col

val bucketCount = 100

df1
  .repartition(bucketCount, col("a"), col("b"))
  .write  // bucketBy/sortBy/saveAsTable live on DataFrameWriter
  .bucketBy(bucketCount, "a", "b")
  .sortBy("a", "b")
  .saveAsTable("default.table_1")

df2
  .repartition(bucketCount, col("a"), col("b"))
  .write
  .bucketBy(bucketCount, "a", "b")
  .sortBy("a", "b")
  .saveAsTable("default.table_2")

Then, if you join table_1 with table_2 on "a" and "b", you'll find that the
query plan involves no Sort or Exchange, only a SortMergeJoin.
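You can verify that claim yourself; a sketch, assuming the two bucketed tables created above and an active SparkSession named spark:

```scala
// Join the two bucketed, sorted tables on their bucketing columns.
val joined = spark.table("default.table_1")
  .join(spark.table("default.table_2"), Seq("a", "b"))

// With matching bucket counts and sort orders on both sides, the printed
// physical plan should show a SortMergeJoin with no Exchange or Sort
// operators beneath it.
joined.explain()
```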

