DMS - Ongoing replication causing duplicate records due to redshift cluster maintenance

0

Hi,

I'm facing an issue with DMS ongoing replication causing a small number of duplicate records due to the target redshift cluster going down for maintenance every week. One of the error messages in the cloudwatch logs is Failed to start rollback apply transaction, so it looks like DMS fails to undo a batch, and with RecoverableErrorCount set to -1, the process ends up retrying the ongoing replication from the batch it already ran, resulting in the duplicates.

Is there a way via the DMS settings to handle this rollback failure issue caused by the cluster maintenance? Or maybe a way to work around the cluster going down for maintenance?

Thanks!

Below are my settings:

Source: Mysql Target: Redshift

{ "ErrorBehavior": { "FailOnNoTablesCaptured": true, "ApplyErrorUpdatePolicy": "SUSPEND_TABLE", "FailOnTransactionConsistencyBreached": true, "RecoverableErrorThrottlingMax": 1800, "DataErrorEscalationPolicy": "SUSPEND_TABLE", "ApplyErrorEscalationCount": 5, "RecoverableErrorStopRetryAfterThrottlingMax": true, "RecoverableErrorThrottling": true, "ApplyErrorFailOnTruncationDdl": false, "DataTruncationErrorPolicy": "SUSPEND_TABLE", "ApplyErrorInsertPolicy": "SUSPEND_TABLE", "EventErrorPolicy": "IGNORE", "ApplyErrorEscalationPolicy": "SUSPEND_TABLE", "RecoverableErrorCount": -1, "DataErrorEscalationCount": 5, "TableErrorEscalationPolicy": "SUSPEND_TABLE", "RecoverableErrorInterval": 5, "ApplyErrorDeletePolicy": "SUSPEND_TABLE", "TableErrorEscalationCount": 5, "FullLoadIgnoreConflicts": true, "DataErrorPolicy": "SUSPEND_TABLE", "TableErrorPolicy": "SUSPEND_TABLE" }, "TTSettings": { "TTS3Settings": null, "TTRecordSettings": null, "EnableTT": false }, "FullLoadSettings": { "CommitRate": 10000, "StopTaskCachedChangesApplied": false, "StopTaskCachedChangesNotApplied": false, "MaxFullLoadSubTasks": 8, "TransactionConsistencyTimeout": 1200, "CreatePkAfterFullLoad": false, "TargetTablePrepMode": "DROP_AND_CREATE" }, "TargetMetadata": { "ParallelApplyBufferSize": 10000, "ParallelApplyQueuesPerThread": 0, "ParallelApplyThreads": 32, "TargetSchema": "", "InlineLobMaxSize": 0, "ParallelLoadQueuesPerThread": 0, "SupportLobs": true, "LobChunkSize": 0, "TaskRecoveryTableEnabled": true, "ParallelLoadThreads": 32, "LobMaxSize": 63, "BatchApplyEnabled": true, "FullLobMode": false, "LimitedSizeLobMode": true, "LoadMaxFileSize": 0, "ParallelLoadBufferSize": 1000 }, "BeforeImageSettings": null, "ControlTablesSettings": { "historyTimeslotInMinutes": 5, "HistoryTimeslotInMinutes": 5, "StatusTableEnabled": true, "SuspendedTablesTableEnabled": true, "HistoryTableEnabled": true, "ControlSchema": "dms_control", "FullLoadExceptionTableEnabled": true }, "LoopbackPreventionSettings": null, "CharacterSetSettings": null, "FailTaskWhenCleanTaskResourceFailed": false, "ChangeProcessingTuning": { "StatementCacheSize": 50, "CommitTimeout": 1, "BatchApplyPreserveTransaction": true, "BatchApplyTimeoutMin": 1, "BatchSplitSize": 0, "BatchApplyTimeoutMax": 30, "MinTransactionSize": 1000, "MemoryKeepTime": 60, "BatchApplyMemoryLimit": 500, "MemoryLimitTotal": 1024 }, "ChangeProcessingDdlHandlingPolicy": { "HandleSourceTableDropped": true, "HandleSourceTableTruncated": true, "HandleSourceTableAltered": true }, "PostProcessingRules": null }

gefragt vor 2 Jahren1209 Aufrufe
1 Antwort
0
Akzeptierte Antwort

Dear Customer,

A very warm greeting from AWS! I hope that you are doing good!

After going through the post, I understand that you have started to witnessed the duplicate records in the ongoing replication post redshift cluster maintenance with the error "Failed to start rollback apply transaction" in dms logs. Hence you would like to know any ways/steps to rectify the error and also a workaround to handle dms task when the redshift cluster goes for a maintenance .

I would like to inform you that duplication of records into the target redshift has already been fixed in the replication instance version 3.4.5 as per link [1]. Hence, please make sure to use the replication instance version 3.4.5 and above version.

[+] AWS DMS release notes - AWS Database Migration Service 3.4.5 release notes - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html#CHAP_ReleaseNotes.DMS345

In case if you are already using the replication instance with 3.4.5 and above; then we require task information along with the logs to get more details on this error.

Therefore, I would kindly request you to raise a support case along with task details for further investigation using the below link.

[+] https://support.console.aws.amazon.com/support/home?#/case/create

In regards to redshift cluster maintenance, as an alternative approach ~ you may consider setting up a RedShift events notification for each maintenance start (REDSHIFT-EVENT-2003) and (REDSHIFT-EVENT-2004) to trigger a Lambda function which will automatically stop and resume the DMS Task based on the event. Please go through the links [1] and [2] for more details.

[1] Amazon Redshift event notifications - Amazon Redshift event categories and event messages - https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-event-notifications.html#redshift-event-messages

[2] Automating DMS Tasks - https://aws.amazon.com/blogs/database/automating-aws-dms-migration-tasks

Have a good day!

SUPPORT-TECHNIKER
beantwortet vor 2 Jahren
profile picture
EXPERTE
überprüft vor einem Monat

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen