Data ingestion from Oracle Cloud into S3


A customer wants to build a data lake on AWS, and one of the sources will be a Unified Model EDW in Oracle Cloud. What are the options for extracting data from the EDW and loading it into S3 as flat text files (CSV)? Can Glue do the job?

Many thanks

1 Answer
Accepted Answer

Assuming Oracle is going to stay in Oracle Cloud and this will be an ongoing extract process, you could go with Glue ETL over a JDBC connection, or with DMS using Oracle as the source and S3 as the target, as a replication use case. My customer is doing the latter.

What you need to clarify with the customer is whether they want to pull CDC (Change Data Capture) data on an ongoing basis, which might very well be the case. If so, with S3 as the target, what is the CDC merge process going to look like to keep S3 in sync with Oracle, so that S3 always holds a current snapshot of Oracle? Basically, I would find out more about the consumption patterns once the data is in S3. The reason I ask is that DMS can do CDC out of the box, but it creates separate files in S3 for each CDC run, and some ETL process then needs to merge those CDC files to maintain a golden copy of the current snapshot on S3. The other option is to use Glue ETL or DMS for a full table download every time, but this has to be looked at closely, with a long-term focus on how many tables and how much data are involved, since reloading the full volume into S3 on every run can make it hard to meet SLAs.
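To make the merge step concrete, here is a minimal sketch in plain Python of applying one CDC file to an existing snapshot. It assumes the CDC file is CSV with an operation flag (I, U or D) in the first column, which is how DMS typically writes CDC records to an S3 target, and that the first data column is the primary key. The function name `apply_cdc` and the sample rows are hypothetical; a real job would more likely do this in Spark over many files, applied in the order they were produced.

```python
import csv
import io


def apply_cdc(snapshot, cdc_csv_text, key_index=0):
    """Apply one CDC file to a snapshot keyed by primary key.

    snapshot:     dict mapping primary key -> row (list of column values)
    cdc_csv_text: CSV text whose first column is the operation flag
                  (I = insert, U = update, D = delete), followed by the row.
                  This layout is an assumption about the CDC file format.
    key_index:    position of the primary key within the row (after the flag)
    """
    merged = dict(snapshot)  # copy so the input snapshot is left untouched
    for record in csv.reader(io.StringIO(cdc_csv_text)):
        if not record:
            continue  # skip blank lines
        op, row = record[0], record[1:]
        key = row[key_index]
        if op in ("I", "U"):
            merged[key] = row      # insert, or overwrite with the latest values
        elif op == "D":
            merged.pop(key, None)  # delete; ignore keys that are already gone
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return merged


# Hypothetical sample data: a two-row snapshot and one CDC file.
snapshot = {"1": ["1", "alice"], "2": ["2", "bob"]}
cdc_file = "U,1,alicia\nD,2,bob\nI,3,carol\n"
current = apply_cdc(snapshot, cdc_file)
```

After this run, `current` holds rows 1 (updated) and 3 (inserted), and row 2 is gone. The order in which CDC files are applied matters, because a later update must win over an earlier one.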

Using Glue ETL with a JDBC connection is really a Spark job, which in my mind fits better if you want to do the following:

(1) Merge the CDC data coming from Oracle to create the current snapshot copy on S3

(2) Apply any other transformations to the data, either while it is being brought from Oracle to S3 or afterwards

Having said that, you do need to write the CDC queries by hand to pull the delta from Oracle for ongoing runs, and the CDC merge process also needs to be coded.
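As a rough illustration of what a hand-written delta pull can look like, the sketch below uses a watermark column: each run selects only the rows modified since the last run and writes them out as CSV. Everything here is an assumption for illustration: the `orders` table, its `updated_at` column and the `extract_delta` helper are hypothetical, and an in-memory SQLite database stands in for the Oracle JDBC connection so the example runs on its own. Note that a watermark query like this does not see rows deleted at the source.

```python
import csv
import io
import sqlite3


def extract_delta(conn, table, watermark_column, last_watermark):
    """Return (csv_text, new_watermark) for rows changed after last_watermark."""
    # Table and column names are interpolated into the SQL text, so they
    # must come from trusted configuration, never from user input.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > ? "
        f"ORDER BY {watermark_column}"
    )
    cursor = conn.execute(query, (last_watermark,))
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()

    # Write a header plus the changed rows as CSV text.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)

    # The highest watermark seen becomes the starting point of the next run.
    wm_index = columns.index(watermark_column)
    new_watermark = rows[-1][wm_index] if rows else last_watermark
    return out.getvalue(), new_watermark


# Hypothetical source table; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01 00:00:00"),
        (2, 20.0, "2024-01-02 00:00:00"),
        (3, 30.0, "2024-01-03 00:00:00"),
    ],
)
delta_csv, watermark = extract_delta(
    conn, "orders", "updated_at", "2024-01-01 00:00:00"
)
```

Here the run picks up orders 2 and 3 only, and `watermark` advances to the latest `updated_at` value. In a real pipeline the watermark would be persisted between runs (Glue job bookmarks or a small control table are common choices) and the CSV would be written to S3 rather than kept in memory.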

Hope this helps!

answered 4 years ago
