AWS DMS - CDC - Unique Constraint Error


How do I resolve "ORA-00001: unique constraint" errors during AWS DMS CDC?

Source: on-premises Oracle || Target: Amazon RDS for Oracle

I exported the source DB with Data Pump at a specific SCN, imported it on the target, and started the migration task from the same SCN, but I am getting the error "ORA-00001: unique constraint". Because of this, the migration task is slow and switches from "Bulk Apply" to "one-by-one" mode.

I am getting the above error for a table that has more than 100 million records. How can I get rid of this error? Do I need to change anything before I take the export?

1 Answer
Accepted Answer

You can divide the tables into multiple tasks to improve CDC performance. You can review the "awsdms_apply_exceptions" table in your target database schema for more information on the violations. Increasing the logging level will give you more information on why the violations are occurring (duplicated data in batch transactions before the commit is applied vs. real violations on the keys).

https://aws.amazon.com/premiumsupport/knowledge-center/dms-enable-debug-logging/
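For example, a query along these lines, run on the target, surfaces the failing statements and the exact Oracle errors. The control schema name is an assumption; substitute the schema where your task creates its control tables:

-- Inspect recent apply exceptions on the target (Oracle).
-- AWSDMS_CONTROL is an assumed control schema name; use your task's.
SELECT task_name,
       table_owner,
       table_name,
       error_time,
       statement,
       error
FROM   awsdms_control.awsdms_apply_exceptions
ORDER  BY error_time DESC
FETCH  FIRST 50 ROWS ONLY;

The STATEMENT and ERROR columns show whether the ORA-00001 comes from rows that were already present from the Data Pump import (an SCN overlap) or from genuinely conflicting keys.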

Further, you can enable BatchApplyEnabled and disable BatchApplyPreserveTransaction. Please refer to the link below for more information:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Oracle.html#CHAP_Target.Oracle.ConnectionAttrib
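As a minimal sketch, the relevant portion of the task settings JSON looks like the following. BatchApplyEnabled lives under TargetMetadata and BatchApplyPreserveTransaction under ChangeProcessingTuning; merge these keys into your task's existing settings rather than replacing them:

{
  "TargetMetadata": {
    "BatchApplyEnabled": true
  },
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": false
  }
}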

AWS DMS uses the following methods to replicate data in the change data capture (CDC) phase:

Transactional apply
Batch apply

The AWS DMS CDC process is single threaded by default (transactional apply). This is the same method used for SQL replication by other online transaction processing (OLTP) database engines. DMS CDC replication depends on the source database transaction logs. During the ongoing replication phase, DMS applies changes using the transactional apply method, as follows:

DMS reads changes from the transaction log on the source into replication DB instance memory.
DMS translates the changes and passes them on to a sorter component.
The sorter component sorts the transactions in commit order, and then forwards them to the target sequentially.

If the rate of change on the source DB is high, this process can take time. You might observe a spike in the CDC target latency metric when DMS receives a high incoming workload from the source DB.

DMS uses a single-threaded replication method to process CDC changes, so it provides the task-level setting BatchApplyEnabled to quickly process changes on a target using batches. BatchApplyEnabled is useful if you have a high workload on the source DB and a task with high target CDC latency. By default, BatchApplyEnabled is disabled. You can enable it using the AWS Command Line Interface (AWS CLI), as shown after the link below.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TaskSettings.ChangeProcessingTuning.html
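As a sketch, assuming the snippet above is saved as batch-apply-settings.json and the task ARN is a placeholder, the CLI sequence is stop, modify, then resume:

# Task settings can only be modified while the task is stopped.
aws dms stop-replication-task \
    --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:EXAMPLE

aws dms modify-replication-task \
    --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:EXAMPLE \
    --replication-task-settings file://batch-apply-settings.json

# Resume from where the task left off instead of restarting from scratch.
aws dms start-replication-task \
    --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:EXAMPLE \
    --start-replication-task-type resume-processing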

If you run a task with BatchApplyEnabled, DMS processes changes in the following way:

DMS collects the changes in a batch from the source DB transaction logs.
DMS creates a table, called the net changes table, with all of the changes from the batch.
This table resides in the memory of the replication DB instance, and is passed on to the target DB instance.
DMS applies a net changes algorithm that nets out all of the changes from the net changes table to the actual target table.

For example, if you run a DMS task with BatchApplyEnabled, and a single batch contains a new row insert, ten updates to that row, and a delete of that row, then DMS nets out all of these transactions and doesn't carry them over, because the row is eventually deleted and no longer exists (sketched below). This process reduces the number of actual transactions that are applied on the target.
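As an illustrative sketch of that example (the table and values are hypothetical), here is the batch DMS collects versus what actually reaches the target:

-- Changes collected in one batch from the source transaction log:
INSERT INTO orders (order_id, status) VALUES (42, 'NEW');
UPDATE orders SET status = 'PAID'    WHERE order_id = 42;  -- first of ten updates
-- ... eight more updates to the same row ...
UPDATE orders SET status = 'SHIPPED' WHERE order_id = 42;  -- tenth update
DELETE FROM orders WHERE order_id = 42;

-- Net effect after the net changes algorithm: the row is inserted and
-- deleted within the same batch, so DMS applies no statement at all to
-- the target for this row.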

BatchApplyEnabled applies the net changes algorithm at the row level of a table within a batch of a particular task. So, if the source database has frequent changes (updates, deletes, and inserts), or a combination of those workloads on the same rows, you get optimal use out of BatchApplyEnabled, because it minimizes the changes that have to be applied to the target. If the collected batch contains only unique changes (update/delete/insert changes that each touch different rows), then the net changes algorithm can't filter out any events, and all of the batch events are applied to the target in batch mode. Tables must have either a primary key or a unique key for batch apply to work.

DMS also provides the BatchApplyPreserveTransaction setting for change-processing tuning. If you enable BatchApplyEnabled, then BatchApplyPreserveTransaction is turned on by default. If it is set to true, transactional integrity is preserved, and a batch is guaranteed to contain all of the changes within a transaction from the source. This setting applies only to Oracle target endpoints.

When the BatchApplyPreserveTransaction setting is true, DMS captures the entire long-running transaction in the memory of the replication DB instance, according to the task settings MemoryLimitTotal and MemoryKeepTime, swapping to disk as needed, before it sends the changes to the net changes table. When the setting is false, changes from a single transaction can span multiple batches, which can lead to data loss if a batch is only partially applied, for example, due to target database unavailability.
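For reference, those memory knobs also live under ChangeProcessingTuning. The values below are the documented defaults (MemoryLimitTotal in MB, MemoryKeepTime in seconds), shown only as a sketch to size against your replication instance:

{
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": true,
    "MemoryLimitTotal": 1024,
    "MemoryKeepTime": 60
  }
}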

[+] https://aws.amazon.com/blogs/database/debugging-your-aws-dms-migrations-what-to-do-when-things-go-wrong-part-2/

[+] https://aws.amazon.com/blogs/database/debugging-your-aws-dms-migrations-what-to-do-when-things-go-wrong-part-3/

You can use batch apply in the following circumstances:

The task has a high number of transactions captured from the source, and this is causing target latency.
The task has a source workload that is a combination of inserts, updates, and deletes on the same rows.
There is no requirement to keep strict referential integrity on the target (foreign keys disabled).

When BatchApplyEnabled is set to true, AWS DMS generates an error message if a target table has a unique constraint.
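Before enabling it, you can check which of the migrated tables carry unique constraints on the target. A query like the following lists them; 'APP_OWNER' is a hypothetical schema name, so substitute your own:

-- List unique constraints on the target tables (Oracle).
SELECT owner,
       table_name,
       constraint_name
FROM   all_constraints
WHERE  constraint_type = 'U'
AND    owner = 'APP_OWNER'
ORDER  BY table_name;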

As such, I would recommend setting BatchApplyEnabled to true, increasing BatchApplyTimeoutMin to ~300 seconds (5 minutes) and BatchApplyTimeoutMax to 600 seconds (10 minutes), and increasing BatchApplyMemoryLimit (depending on the current free memory on your replication instance). A combined settings sketch follows below.

This means that DMS will only send a batch to the target every 10 minutes, or once BatchApplyMemoryLimit worth of transactions has been captured on the replication instance. This should positively influence the target latency by allowing larger, less frequent commits to take place. To alter these settings, stop the task, modify the settings, save the changes, and resume the task (test in a test environment first to monitor the change, in case you need to further increase BatchApplyMemoryLimit).

Only once the amount of time in BatchApplyTimeoutMin has passed will DMS assess BatchApplyTimeoutMax and BatchApplyMemoryLimit. Once one of those two values is reached, the batch is sent to the target. To put it differently: after BatchApplyTimeoutMin has passed, whichever comes first between BatchApplyTimeoutMax and BatchApplyMemoryLimit triggers DMS to send the batch to the target.
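Putting the recommendation together, here is a sketch of the combined settings. The 750 MB memory limit is an assumption; size BatchApplyMemoryLimit (in MB) to the free memory on your replication instance:

{
  "TargetMetadata": {
    "BatchApplyEnabled": true
  },
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": false,
    "BatchApplyTimeoutMin": 300,
    "BatchApplyTimeoutMax": 600,
    "BatchApplyMemoryLimit": 750
  }
}

Apply it with the same stop/modify/resume sequence shown earlier.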

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TaskSettings.ChangeProcessingTuning.html

AWS SUPPORT ENGINEER
answered a year ago
