DMS - Ongoing replication causing duplicate records due to redshift cluster maintenance

0

Hi,

I'm facing an issue with DMS ongoing replication causing a small number of duplicate records due to the target redshift cluster going down for maintenance every week. One of the error messages in the cloudwatch logs is Failed to start rollback apply transaction, so it looks like DMS fails to undo a batch, and with RecoverableErrorCount set to -1, the process ends up retrying the ongoing replication from the batch it already ran, resulting in the duplicates.

Is there a way via the DMS settings to handle this rollback failure issue caused by the cluster maintenance? Or maybe a way to work around the cluster going down for maintenance?

Thanks!

Below are my settings:

Source: Mysql Target: Redshift

{ "ErrorBehavior": { "FailOnNoTablesCaptured": true, "ApplyErrorUpdatePolicy": "SUSPEND_TABLE", "FailOnTransactionConsistencyBreached": true, "RecoverableErrorThrottlingMax": 1800, "DataErrorEscalationPolicy": "SUSPEND_TABLE", "ApplyErrorEscalationCount": 5, "RecoverableErrorStopRetryAfterThrottlingMax": true, "RecoverableErrorThrottling": true, "ApplyErrorFailOnTruncationDdl": false, "DataTruncationErrorPolicy": "SUSPEND_TABLE", "ApplyErrorInsertPolicy": "SUSPEND_TABLE", "EventErrorPolicy": "IGNORE", "ApplyErrorEscalationPolicy": "SUSPEND_TABLE", "RecoverableErrorCount": -1, "DataErrorEscalationCount": 5, "TableErrorEscalationPolicy": "SUSPEND_TABLE", "RecoverableErrorInterval": 5, "ApplyErrorDeletePolicy": "SUSPEND_TABLE", "TableErrorEscalationCount": 5, "FullLoadIgnoreConflicts": true, "DataErrorPolicy": "SUSPEND_TABLE", "TableErrorPolicy": "SUSPEND_TABLE" }, "TTSettings": { "TTS3Settings": null, "TTRecordSettings": null, "EnableTT": false }, "FullLoadSettings": { "CommitRate": 10000, "StopTaskCachedChangesApplied": false, "StopTaskCachedChangesNotApplied": false, "MaxFullLoadSubTasks": 8, "TransactionConsistencyTimeout": 1200, "CreatePkAfterFullLoad": false, "TargetTablePrepMode": "DROP_AND_CREATE" }, "TargetMetadata": { "ParallelApplyBufferSize": 10000, "ParallelApplyQueuesPerThread": 0, "ParallelApplyThreads": 32, "TargetSchema": "", "InlineLobMaxSize": 0, "ParallelLoadQueuesPerThread": 0, "SupportLobs": true, "LobChunkSize": 0, "TaskRecoveryTableEnabled": true, "ParallelLoadThreads": 32, "LobMaxSize": 63, "BatchApplyEnabled": true, "FullLobMode": false, "LimitedSizeLobMode": true, "LoadMaxFileSize": 0, "ParallelLoadBufferSize": 1000 }, "BeforeImageSettings": null, "ControlTablesSettings": { "historyTimeslotInMinutes": 5, "HistoryTimeslotInMinutes": 5, "StatusTableEnabled": true, "SuspendedTablesTableEnabled": true, "HistoryTableEnabled": true, "ControlSchema": "dms_control", "FullLoadExceptionTableEnabled": true }, "LoopbackPreventionSettings": null, "CharacterSetSettings": null, "FailTaskWhenCleanTaskResourceFailed": false, "ChangeProcessingTuning": { "StatementCacheSize": 50, "CommitTimeout": 1, "BatchApplyPreserveTransaction": true, "BatchApplyTimeoutMin": 1, "BatchSplitSize": 0, "BatchApplyTimeoutMax": 30, "MinTransactionSize": 1000, "MemoryKeepTime": 60, "BatchApplyMemoryLimit": 500, "MemoryLimitTotal": 1024 }, "ChangeProcessingDdlHandlingPolicy": { "HandleSourceTableDropped": true, "HandleSourceTableTruncated": true, "HandleSourceTableAltered": true }, "PostProcessingRules": null }

posta 2 anni fa1209 visualizzazioni
1 Risposta
0
Risposta accettata

Dear Customer,

A very warm greeting from AWS! I hope that you are doing good!

After going through the post, I understand that you have started to witnessed the duplicate records in the ongoing replication post redshift cluster maintenance with the error "Failed to start rollback apply transaction" in dms logs. Hence you would like to know any ways/steps to rectify the error and also a workaround to handle dms task when the redshift cluster goes for a maintenance .

I would like to inform you that duplication of records into the target redshift has already been fixed in the replication instance version 3.4.5 as per link [1]. Hence, please make sure to use the replication instance version 3.4.5 and above version.

[+] AWS DMS release notes - AWS Database Migration Service 3.4.5 release notes - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html#CHAP_ReleaseNotes.DMS345

In case if you are already using the replication instance with 3.4.5 and above; then we require task information along with the logs to get more details on this error.

Therefore, I would kindly request you to raise a support case along with task details for further investigation using the below link.

[+] https://support.console.aws.amazon.com/support/home?#/case/create

In regards to redshift cluster maintenance, as an alternative approach ~ you may consider setting up a RedShift events notification for each maintenance start (REDSHIFT-EVENT-2003) and (REDSHIFT-EVENT-2004) to trigger a Lambda function which will automatically stop and resume the DMS Task based on the event. Please go through the links [1] and [2] for more details.

[1] Amazon Redshift event notifications - Amazon Redshift event categories and event messages - https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-event-notifications.html#redshift-event-messages

[2] Automating DMS Tasks - https://aws.amazon.com/blogs/database/automating-aws-dms-migration-tasks

Have a good day!

TECNICO DI SUPPORTO
con risposta 2 anni fa
profile picture
ESPERTO
verificato un mese fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande