DMS - Ongoing replication causing duplicate records due to redshift cluster maintenance

0

Hi,

I'm facing an issue with DMS ongoing replication causing a small number of duplicate records due to the target redshift cluster going down for maintenance every week. One of the error messages in the cloudwatch logs is Failed to start rollback apply transaction, so it looks like DMS fails to undo a batch, and with RecoverableErrorCount set to -1, the process ends up retrying the ongoing replication from the batch it already ran, resulting in the duplicates.

Is there a way via the DMS settings to handle this rollback failure issue caused by the cluster maintenance? Or maybe a way to work around the cluster going down for maintenance?

Thanks!

Below are my settings:

Source: Mysql Target: Redshift

{ "ErrorBehavior": { "FailOnNoTablesCaptured": true, "ApplyErrorUpdatePolicy": "SUSPEND_TABLE", "FailOnTransactionConsistencyBreached": true, "RecoverableErrorThrottlingMax": 1800, "DataErrorEscalationPolicy": "SUSPEND_TABLE", "ApplyErrorEscalationCount": 5, "RecoverableErrorStopRetryAfterThrottlingMax": true, "RecoverableErrorThrottling": true, "ApplyErrorFailOnTruncationDdl": false, "DataTruncationErrorPolicy": "SUSPEND_TABLE", "ApplyErrorInsertPolicy": "SUSPEND_TABLE", "EventErrorPolicy": "IGNORE", "ApplyErrorEscalationPolicy": "SUSPEND_TABLE", "RecoverableErrorCount": -1, "DataErrorEscalationCount": 5, "TableErrorEscalationPolicy": "SUSPEND_TABLE", "RecoverableErrorInterval": 5, "ApplyErrorDeletePolicy": "SUSPEND_TABLE", "TableErrorEscalationCount": 5, "FullLoadIgnoreConflicts": true, "DataErrorPolicy": "SUSPEND_TABLE", "TableErrorPolicy": "SUSPEND_TABLE" }, "TTSettings": { "TTS3Settings": null, "TTRecordSettings": null, "EnableTT": false }, "FullLoadSettings": { "CommitRate": 10000, "StopTaskCachedChangesApplied": false, "StopTaskCachedChangesNotApplied": false, "MaxFullLoadSubTasks": 8, "TransactionConsistencyTimeout": 1200, "CreatePkAfterFullLoad": false, "TargetTablePrepMode": "DROP_AND_CREATE" }, "TargetMetadata": { "ParallelApplyBufferSize": 10000, "ParallelApplyQueuesPerThread": 0, "ParallelApplyThreads": 32, "TargetSchema": "", "InlineLobMaxSize": 0, "ParallelLoadQueuesPerThread": 0, "SupportLobs": true, "LobChunkSize": 0, "TaskRecoveryTableEnabled": true, "ParallelLoadThreads": 32, "LobMaxSize": 63, "BatchApplyEnabled": true, "FullLobMode": false, "LimitedSizeLobMode": true, "LoadMaxFileSize": 0, "ParallelLoadBufferSize": 1000 }, "BeforeImageSettings": null, "ControlTablesSettings": { "historyTimeslotInMinutes": 5, "HistoryTimeslotInMinutes": 5, "StatusTableEnabled": true, "SuspendedTablesTableEnabled": true, "HistoryTableEnabled": true, "ControlSchema": "dms_control", "FullLoadExceptionTableEnabled": true }, "LoopbackPreventionSettings": null, "CharacterSetSettings": null, "FailTaskWhenCleanTaskResourceFailed": false, "ChangeProcessingTuning": { "StatementCacheSize": 50, "CommitTimeout": 1, "BatchApplyPreserveTransaction": true, "BatchApplyTimeoutMin": 1, "BatchSplitSize": 0, "BatchApplyTimeoutMax": 30, "MinTransactionSize": 1000, "MemoryKeepTime": 60, "BatchApplyMemoryLimit": 500, "MemoryLimitTotal": 1024 }, "ChangeProcessingDdlHandlingPolicy": { "HandleSourceTableDropped": true, "HandleSourceTableTruncated": true, "HandleSourceTableAltered": true }, "PostProcessingRules": null }

preguntada hace 2 años1209 visualizaciones
1 Respuesta
0
Respuesta aceptada

Dear Customer,

A very warm greeting from AWS! I hope that you are doing good!

After going through the post, I understand that you have started to witnessed the duplicate records in the ongoing replication post redshift cluster maintenance with the error "Failed to start rollback apply transaction" in dms logs. Hence you would like to know any ways/steps to rectify the error and also a workaround to handle dms task when the redshift cluster goes for a maintenance .

I would like to inform you that duplication of records into the target redshift has already been fixed in the replication instance version 3.4.5 as per link [1]. Hence, please make sure to use the replication instance version 3.4.5 and above version.

[+] AWS DMS release notes - AWS Database Migration Service 3.4.5 release notes - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html#CHAP_ReleaseNotes.DMS345

In case if you are already using the replication instance with 3.4.5 and above; then we require task information along with the logs to get more details on this error.

Therefore, I would kindly request you to raise a support case along with task details for further investigation using the below link.

[+] https://support.console.aws.amazon.com/support/home?#/case/create

In regards to redshift cluster maintenance, as an alternative approach ~ you may consider setting up a RedShift events notification for each maintenance start (REDSHIFT-EVENT-2003) and (REDSHIFT-EVENT-2004) to trigger a Lambda function which will automatically stop and resume the DMS Task based on the event. Please go through the links [1] and [2] for more details.

[1] Amazon Redshift event notifications - Amazon Redshift event categories and event messages - https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-event-notifications.html#redshift-event-messages

[2] Automating DMS Tasks - https://aws.amazon.com/blogs/database/automating-aws-dms-migration-tasks

Have a good day!

INGENIERO DE SOPORTE
respondido hace 2 años
profile picture
EXPERTO
revisado hace un mes

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas