Data ingestion from Oracle Cloud into S3


A customer wants to build a data lake on AWS, and one of the sources will be a Unified Model EDW in Oracle Cloud. What are the options for extracting data from the EDW and loading it into S3 as flat text files (CSV)? Can Glue do the job?

Many thanks

AWS
Antonio
Asked 6 years ago, 981 views
1 answer
Accepted answer

Assuming Oracle is going to stay in Oracle Cloud and this will be an ongoing extract process, you could go with either Glue ETL with a JDBC connection, or DMS with Oracle as the source and S3 as the target (a replication use case). My customer is doing the latter.
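For the DMS route, here is a rough sketch of what the S3 target endpoint settings for CSV output might look like. The endpoint identifier, bucket, folder and role ARN are made-up placeholders, and the function only builds the arguments; it does not call AWS.

```python
def s3_target_endpoint_args(bucket, folder, role_arn):
    """Build keyword arguments for a DMS create_endpoint call (S3 target, CSV files)."""
    return {
        "EndpointIdentifier": "edw-to-s3-target",  # hypothetical name
        "EndpointType": "target",
        "EngineName": "s3",
        "S3Settings": {
            "BucketName": bucket,
            "BucketFolder": folder,
            "ServiceAccessRoleArn": role_arn,
            "DataFormat": "csv",
            "CsvDelimiter": ",",
            "CsvRowDelimiter": "\n",
            # Also write the operation column in full-load files, so that
            # full-load and CDC files share the same column layout.
            "IncludeOpForFullLoad": True,
        },
    }


# All values below are placeholders for illustration only.
args = s3_target_endpoint_args(
    "my-data-lake",
    "oracle/edw",
    "arn:aws:iam::123456789012:role/dms-s3-access",
)

# With AWS credentials configured, the endpoint could then be created with:
#   import boto3
#   boto3.client("dms").create_endpoint(**args)
```

A matching Oracle source endpoint and a replication task would still be needed on top of this.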

The part you need to clarify with the customer is whether they want to pull CDC (Change Data Capture) data on an ongoing basis, which may very well be the case. If so, with S3 as the target, what will the CDC merge process look like to keep S3 in sync with Oracle, so that S3 holds a current snapshot of the Oracle data? Basically, I would find out more about the consumption patterns once the data is in S3.

The reason I ask is that DMS can do CDC out of the box, but it creates separate files in S3 for each CDC run, and some ETL process then needs to merge those CDC files to maintain a golden copy of the current snapshot on S3. The other option is a full table download every time, using Glue ETL or DMS. That has to be looked at closely with a long-term view of how many tables and how much data are involved, since reloading the full volume into S3 on every run can make it hard to meet SLAs.
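To make the CDC merge step a bit more concrete, here is a minimal sketch in plain Python. It assumes the CDC files are CSV with the operation (I, U or D) in the first column followed by the table columns, which is how DMS lays out CDC files on S3, and it assumes the primary key is the first table column. The sample rows are invented. In practice this logic would more likely live in a Spark job, but the idea is the same.

```python
import csv
import io


def apply_cdc(snapshot, cdc_csv_text, key_index=0):
    """Apply CDC rows (op, col1, col2, ...) to a snapshot keyed by primary key.

    snapshot maps primary key -> list of column values.
    key_index is the position of the key among the table columns (assumed 0).
    """
    merged = dict(snapshot)  # leave the caller's snapshot untouched
    for row in csv.reader(io.StringIO(cdc_csv_text)):
        if not row:
            continue  # skip blank lines
        op, values = row[0], row[1:]
        key = values[key_index]
        if op in ("I", "U"):
            merged[key] = values   # insert, or overwrite with the latest image
        elif op == "D":
            merged.pop(key, None)  # delete; ignore keys that are already gone
        else:
            raise ValueError(f"unexpected operation {op!r}")
    return merged


# Invented sample data: the current snapshot and one CDC file.
snapshot = {"1": ["1", "alice"], "2": ["2", "bob"]}
cdc_file = "U,1,alicia\nD,2,bob\nI,3,carol\n"

current = apply_cdc(snapshot, cdc_file)
```

The CDC files have to be applied in the order they were produced, otherwise an older update can overwrite a newer one.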

Using Glue ETL with a JDBC connection is really a Spark job, which in my mind is a better fit if you want to do the following:

(1) Merge the CDC data coming from Oracle to create the current snapshot copy on S3

(2) Apply any other transformations to the data, either while it is being brought from Oracle to S3 or afterwards

Having said that, you do need to write the CDC queries by hand to pull the delta from Oracle on ongoing runs, and the CDC merge process also needs to be coded.
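As an illustration of a hand-written delta query, here is one possible sketch based on a timestamp watermark. The table name, the LAST_UPDATED column, the JDBC URL and the fetch size are all assumptions for the example; the real EDW may track changes differently.

```python
from datetime import datetime


def delta_query(table, watermark_column, last_run):
    """Build an Oracle query for rows changed since the previous run.

    table and watermark_column are expected to come from trusted job
    configuration, not from user input.
    """
    ts = last_run.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > TO_TIMESTAMP('{ts}', 'YYYY-MM-DD HH24:MI:SS')"
    )


def jdbc_read_options(jdbc_url, query):
    """Options for Spark's JDBC data source (credentials are added separately)."""
    return {
        "url": jdbc_url,
        "query": query,
        "driver": "oracle.jdbc.OracleDriver",
        "fetchsize": "10000",  # assumed value, tune for the workload
    }


# Hypothetical table, column, host and service name.
query = delta_query("EDW.SALES", "LAST_UPDATED", datetime(2024, 1, 31, 23, 0, 0))
options = jdbc_read_options("jdbc:oracle:thin:@//edw.example.com:1521/EDWPDB", query)

# Inside a Glue (Spark) job the delta could then be read with something like:
#   df = (spark.read.format("jdbc").options(**options)
#         .option("user", user).option("password", password).load())
```

Keep in mind that a timestamp watermark like this only sees inserts and updates. Rows that were physically deleted in Oracle will not show up in the delta, so deletes need separate handling.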

Hope this helps!

AWS
Answered 6 years ago
