Assuming the Oracle database is going to stay in Oracle Cloud, this will be an ongoing extract process. You could use either AWS Glue ETL with a JDBC connection, or AWS DMS with Oracle as the source and S3 as the target in an ongoing replication setup. My customer is doing the latter.
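For the DMS route, the replication task needs a table-mapping document that selects which Oracle tables to replicate. The sketch below only builds that JSON; the schema and table names (`SALES`, `ORDERS`, `CUSTOMERS`) are hypothetical. The resulting string is what you would supply as the task's table mappings (console, CLI, or boto3 `create_replication_task`), with a migration type such as `full-load-and-cdc` for an initial load followed by ongoing changes.

```python
import json
from typing import Iterable


def selection_rule(rule_id: int, schema: str, table: str = "%") -> dict:
    """One DMS 'selection' rule; '%' as the table name is a wildcard for all tables."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


def table_mappings(schema: str, tables: Iterable[str]) -> str:
    """Build the table-mapping JSON for the given (hypothetical) schema and tables."""
    rules = [selection_rule(i, schema, t) for i, t in enumerate(tables, start=1)]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    print(table_mappings("SALES", ["ORDERS", "CUSTOMERS"]))
```

Listing tables explicitly rather than using `%` keeps the scope of the task reviewable as the number of tables grows.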
The part you need to clarify further with the customer is whether they want to pull CDC (change data capture) data on an ongoing basis, which may very well be the case. If so, with S3 as the target, what will the CDC merge process look like to keep Oracle and S3 in sync, with S3 holding a current snapshot of Oracle? Basically, I would find out more about the consumption patterns once the data is in S3. The reason I ask is that DMS can do CDC out of the box, but it writes the captured changes to S3 as separate files, so some ETL process then needs to merge those CDC files to maintain a golden copy of the current snapshot on S3. The other option is to use Glue ETL or DMS to do a full table download every time. That needs a close look with a long-term view of how many tables and how much data are involved, since downloading the full volume to S3 on every run can make it hard to meet the SLA.
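To make the merge step concrete, here is a minimal in-memory sketch of applying change records to a current snapshot keyed by primary key. It assumes each change record carries an operation indicator in an `Op` field (`I` insert, `U` update, `D` delete), which is how DMS typically marks CDC rows written to S3, and that changes are applied in commit order. The key column `id` in the usage note is hypothetical; a real pipeline would do the same thing in Spark or with a table format that supports upserts.

```python
from typing import Dict, Hashable, Iterable

Row = Dict[str, object]


def apply_cdc(snapshot: Dict[Hashable, Row], changes: Iterable[Row], key: str) -> Dict[Hashable, Row]:
    """Return a new snapshot with the change records applied in order.

    Assumption: every change has an 'Op' field ('I', 'U' or 'D') plus the full row.
    The input snapshot is not modified.
    """
    result = dict(snapshot)
    for change in changes:
        op = change["Op"]
        # Strip the operation indicator so only table columns are stored.
        row = {k: v for k, v in change.items() if k != "Op"}
        pk = row[key]
        if op in ("I", "U"):
            # Insert and update both become "latest version wins" (an upsert).
            result[pk] = row
        elif op == "D":
            result.pop(pk, None)
        else:
            raise ValueError(f"unknown Op value: {op!r}")
    return result
```

For example, `apply_cdc({1: {"id": 1, "name": "a"}}, [{"Op": "U", "id": 1, "name": "a2"}], "id")` returns `{1: {"id": 1, "name": "a2"}}`. Treating inserts and updates as upserts makes the merge tolerant of a change file being replayed, which is one reason to prefer it over strict insert-only logic.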
A Glue ETL job with a JDBC connection is really a Spark job, which in my mind is a better fit if you want to do the following:
(1) Merge the CDC data coming from Oracle to create the current snapshot copy on S3
(2) Apply any other transformations to the data, either in flight or after it has been brought from Oracle to S3
Having said that, with Glue you need to write the CDC queries by hand to pull the delta from Oracle on each ongoing run, and the CDC merge process also has to be coded.
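One common way to hand-write those delta queries is a watermark on a last-modified column. This sketch assumes the source tables have such a column (the name `LAST_UPDATED` is hypothetical) and uses an Oracle named bind variable (`:last_watermark`), the form accepted by drivers such as python-oracledb. Note that a timestamp watermark does not see hard deletes, which is part of why the CDC approach above needs more thought.

```python
import re
from datetime import datetime
from typing import Iterable, Mapping

# Unquoted Oracle identifiers: a letter followed by letters, digits, _, $ or #.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def _check(name: str) -> str:
    """Reject anything that is not a plain identifier (avoids SQL injection via names)."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_delta_query(schema: str, table: str, ts_column: str) -> str:
    """Select rows changed since the last run; the watermark is passed as a bind value."""
    s, t, c = _check(schema), _check(table), _check(ts_column)
    return f"SELECT * FROM {s}.{t} WHERE {c} > :last_watermark ORDER BY {c}"


def next_watermark(rows: Iterable[Mapping[str, datetime]], ts_column: str, current: datetime) -> datetime:
    """Advance the watermark to the newest timestamp seen, or keep it if no rows came back."""
    return max((r[ts_column] for r in rows), default=current)
```

Persisting the watermark only after the extracted batch has been written to S3 means a failed run re-reads the same delta instead of skipping it.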
Hope this helps!