Re: Need help for Delta.io


Chetan Khatri
Any thoughts, please?

On Fri, May 10, 2019 at 2:22 AM Chetan Khatri <[hidden email]> wrote:
Hello All,

I need your help / suggestions.

I am using Spark 2.3.1 with the HDP 2.6.1 distribution. Let me describe my use case so you can see how people are trying to use Delta.
My source is a MSSQL Server (OLTP) database, and the data currently lands on HDFS in Parquet and Avro formats. I would now like to do an incremental (delta) load, so I am using CT (Change Tracking, ref. https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/enable-and-disable-change-tracking-sql-server?view=sql-server-2017) to get the primary keys of updated and deleted records, and with those keys I pull only the records that were updated or deleted. I would then like to update / delete the corresponding data in Parquet. Currently I do a full load, which I would like to avoid.

Could you please suggest the best approach?

Since HDP doesn't have Spark 2.4.2 available, I can't change the infrastructure. Is there any way to use Delta.io on Spark 2.3.1? I have an existing codebase written over the last year and a half in Scala 2.11, which I don't want to break by moving to Scala 2.12.

I don't need versioning or a transaction log on Parquet, so if anything else fits my use case, please do advise.

Thank you.
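
For the case described above (Spark 2.3.1, Scala 2.11, no need for versioning or a transaction log), one possible approach that does not involve Delta is to rebuild the Parquet dataset from an anti-join plus a union. The following is only a sketch under assumptions: the paths, the `id` key column, and the names `ParquetIncrementalMerge` and `applyChanges` are hypothetical; the changed rows and deleted keys are assumed to be already staged on HDFS as Parquet; and the upserted rows are assumed to have the same columns as the base dataset.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParquetIncrementalMerge {

  /** Drops stale rows from `base` and appends the current versions from `upserts`.
    *
    * base        - existing Parquet data
    * upserts     - full current rows for inserted/updated keys (same columns as base)
    * deletedKeys - keys of deleted rows (must contain the `key` column)
    * key         - name of the primary-key column (hypothetical, e.g. "id")
    */
  def applyChanges(base: DataFrame,
                   upserts: DataFrame,
                   deletedKeys: DataFrame,
                   key: String): DataFrame = {
    // Every key whose existing row must leave the base: updated or deleted.
    val staleKeys = upserts.select(key).union(deletedKeys.select(key)).distinct()

    // left_anti keeps only the base rows that have no match in staleKeys.
    val untouched = base.join(staleKeys, Seq(key), "left_anti")

    // unionByName (available since Spark 2.3.0) matches columns by name, not position.
    untouched.unionByName(upserts)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-incremental-merge").getOrCreate()

    // All paths below are placeholders for illustration.
    val base        = spark.read.parquet("/data/customers/current")
    val upserts     = spark.read.parquet("/staging/customers/upserts")
    val deletedKeys = spark.read.parquet("/staging/customers/deleted_keys")

    val merged = applyChanges(base, upserts, deletedKeys, "id")

    // Write to a new directory rather than the one being read,
    // then switch readers over to it once the job succeeds.
    merged.write.mode("overwrite").parquet("/data/customers/next")

    spark.stop()
  }
}
```

Note that this still rewrites the whole Parquet dataset on HDFS; what it avoids is pulling the full table from MSSQL again. The output goes to a separate directory because Spark generally refuses to overwrite a path that the same job is reading from.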