Netezza -> S3 copy


A customer is running IBM Netezza:

  • They want to keep a copy of the data stored in Netezza in AWS.
  • Because Netezza will remain in use for some time, the copy needs to stay in sync.
  • Over time, Netezza will gradually be replaced by functionality in SAP running on EC2.
  • Multiple solutions will use the copy in AWS as the single source of truth.

So I was thinking of having them use the SCT Data Extractors to store the copy in S3: https://aws.amazon.com/blogs/database/how-to-migrate-your-data-warehouse-to-amazon-redshift-using-the-aws-schema-conversion-tool-data-extractors/

While Redshift is an option, it won't be the only solution that needs to access this data. I understand that SCT prepares the data for Redshift, so does it make sense to use the copy in S3 as a source? And is this SCT process a reliable way to keep the copy in sync on a daily basis over a relatively long period?

asked 6 years ago, 347 views
1 Answer
Accepted Answer

SCT can use Netezza as a source for the schema only, not for the actual data. DMS uses Change Data Capture (CDC) to keep a source and a target synchronized, but Netezza is not a DMS source: CDC relies on transaction logs, which Netezza does not use for transaction control. So Netezza cannot be kept in sync with a target using DMS.

The workaround is to have the ETL systems that load Netezza also write to a second target, in this case SAP, so that the data changes are applied to each system independently. There is a lot of complexity in making sure there is no split-brain, where the two systems drift out of sync. This is mitigated by using Audit/Balance/Control (ABC) mechanisms in the ETL. The ABC layer will likely need to be built net-new for this migration, because most Netezza customers do not have ABC built into their ETL architecture.
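To make the "Balance" part of Audit/Balance/Control concrete, here is a minimal sketch of a check that compares the same table as loaded into both targets. It assumes you can read the rows of each copy as tuples; the function names `table_fingerprint` and `balance_check` are hypothetical, not part of any AWS or Netezza tool, and a real implementation would more likely push the count and checksum down into each database.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    The checksum is a sum of per-row hashes modulo 2**64, so it does not
    depend on the order in which the rows are read.
    """
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        row_hash = hashlib.sha256(repr(row).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(row_hash[:8], "big")) % (2 ** 64)
    return count, checksum


def balance_check(primary_rows, secondary_rows):
    """Compare two copies of a table and report whether they are in balance."""
    primary = table_fingerprint(primary_rows)
    secondary = table_fingerprint(secondary_rows)
    return {
        "in_balance": primary == secondary,
        "primary": primary,
        "secondary": secondary,
    }


# Illustrative data only: the same rows read in a different order still balance.
netezza_rows = [(1, "widget", 5), (2, "gadget", 7)]
second_target_rows = [(2, "gadget", 7), (1, "widget", 5)]
report = balance_check(netezza_rows, second_target_rows)
```

Running a check like this after every load gives an early signal that the two targets have diverged, before downstream consumers read the stale copy.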

Sending the changes to S3 is not recommended, because they usually contain UPDATE and DELETE operations, which S3 cannot apply: an S3 object can only be written or replaced as a whole, not modified row by row. The target needs to be able to perform these operations (in addition to INSERT) in a preserved order, to ensure the two databases remain equivalent and synchronized.

Some customers perform this synchronization with low latency; others prefer to update the second target in batches. Either way, the order of the transactions matters, so the changes must be applied in an order consistent with the chosen strategy.
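The ordering requirement can be illustrated with a small sketch. It assumes each change carries a sequence number assigned by the ETL, and it models the target table as a dict keyed by primary key. The change format `(seq, op, key, values)` and the function names are hypothetical, chosen only for this example.

```python
def apply_changes(table, changes):
    """Apply (seq, op, key, values) changes to a dict in sequence order.

    An UPDATE or DELETE for a key that does not exist raises KeyError,
    which surfaces changes that arrived out of order or went missing.
    """
    for seq, op, key, values in sorted(changes, key=lambda c: c[0]):
        if op == "INSERT":
            table[key] = dict(values)
        elif op == "UPDATE":
            table[key].update(values)
        elif op == "DELETE":
            del table[key]
        else:
            raise ValueError(f"unknown operation {op!r} at seq {seq}")
    return table


def apply_in_batches(table, changes, batch_size):
    """Apply the same changes in fixed-size batches, still in sequence order."""
    ordered = sorted(changes, key=lambda c: c[0])
    for start in range(0, len(ordered), batch_size):
        apply_changes(table, ordered[start:start + batch_size])
    return table


# Illustrative change log, deliberately listed out of order.
changes = [
    (3, "DELETE", 1, None),
    (1, "INSERT", 1, {"qty": 5}),
    (2, "UPDATE", 1, {"qty": 7}),
    (4, "INSERT", 2, {"qty": 1}),
]
low_latency_result = apply_changes({}, changes)
batched_result = apply_in_batches({}, changes, batch_size=2)
```

Both strategies end in the same state because both respect the sequence numbers. Applying the list as given, without sorting, would fail on the DELETE of a row that has not been inserted yet.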

answered 6 years ago
