DMS task is hanging and not showing any progress or errors in the CloudWatch logs

I am reaching out to the AWS DMS community for assistance. I have a MySQL instance and am running ongoing replication (without a full load) of 800k tables from MySQL to an S3 bucket using DMS. I created a DMS task that specifies the table names, but after a few minutes no more logs appear in CloudWatch, and only 100 to 300 tables show up in the table statistics. The task remains in the running state. What steps can I take to troubleshoot this issue? How can I find out what DMS is doing?

Here are my task settings:

{
  "Logging": {
    "EnableLogging": true,
    "EnableLogContext": true,
    "LogComponents": [
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "TRANSFORMATION" },
      { "Severity": "LOGGER_SEVERITY_ERROR", "Id": "SOURCE_UNLOAD" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "IO" },
      { "Severity": "LOGGER_SEVERITY_ERROR", "Id": "TARGET_LOAD" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "PERFORMANCE" },
      { "Severity": "LOGGER_SEVERITY_ERROR", "Id": "SOURCE_CAPTURE" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "SORTER" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "REST_SERVER" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "VALIDATOR_EXT" },
      { "Severity": "LOGGER_SEVERITY_ERROR", "Id": "TARGET_APPLY" },
      { "Severity": "LOGGER_SEVERITY_ERROR", "Id": "TASK_MANAGER" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "TABLES_MANAGER" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "METADATA_MANAGER" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "FILE_FACTORY" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "COMMON" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "ADDONS" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "DATA_STRUCTURE" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "COMMUNICATION" },
      { "Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "FILE_TRANSFER" }
    ],
    "LogConfiguration": { "TraceOnErrorMb": 10, "EnableTraceOnError": false },
    "CloudWatchLogGroup": "dms-tasks-glue-mysql-to-sf-dms-replication-instance-prd",
    "CloudWatchLogStream": "dms-task-5ARBYB5IL5FSRCZY54UFPOGXCA"
  },
  "StreamBufferSettings": {
    "StreamBufferCount": 3,
    "CtrlStreamBufferSizeInMB": 5,
    "StreamBufferSizeInMB": 8
  },
  "ErrorBehavior": {
    "FailOnNoTablesCaptured": false,
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "FailOnTransactionConsistencyBreached": false,
    "RecoverableErrorThrottlingMax": 2,
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "ApplyErrorEscalationCount": 0,
    "RecoverableErrorStopRetryAfterThrottlingMax": true,
    "RecoverableErrorThrottling": true,
    "ApplyErrorFailOnTruncationDdl": false,
    "DataTruncationErrorPolicy": "LOG_ERROR",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "EventErrorPolicy": "IGNORE",
    "ApplyErrorEscalationPolicy": "LOG_ERROR",
    "RecoverableErrorCount": 3,
    "DataErrorEscalationCount": 0,
    "TableErrorEscalationPolicy": "STOP_TASK",
    "RecoverableErrorInterval": 1,
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "TableErrorEscalationCount": 0,
    "FullLoadIgnoreConflicts": true,
    "DataErrorPolicy": "LOG_ERROR",
    "TableErrorPolicy": "SUSPEND_TABLE"
  },
  "TTSettings": {
    "TTS3Settings": null,
    "TTRecordSettings": null,
    "EnableTT": false
  },
  "FullLoadSettings": {
    "CommitRate": 50000,
    "StopTaskCachedChangesApplied": false,
    "StopTaskCachedChangesNotApplied": false,
    "MaxFullLoadSubTasks": 40,
    "TransactionConsistencyTimeout": 600,
    "CreatePkAfterFullLoad": false,
    "TargetTablePrepMode": "DO_NOTHING"
  },
  "TargetMetadata": {
    "ParallelApplyBufferSize": 0,
    "ParallelApplyQueuesPerThread": 0,
    "ParallelApplyThreads": 0,
    "TargetSchema": "",
    "InlineLobMaxSize": 0,
    "ParallelLoadQueuesPerThread": 0,
    "SupportLobs": false,
    "LobChunkSize": 0,
    "TaskRecoveryTableEnabled": false,
    "ParallelLoadThreads": 0,
    "LobMaxSize": 0,
    "BatchApplyEnabled": false,
    "FullLobMode": false,
    "LimitedSizeLobMode": false,
    "LoadMaxFileSize": 0,
    "ParallelLoadBufferSize": 0
  },
  "BeforeImageSettings": null,
  "ControlTablesSettings": {
    "historyTimeslotInMinutes": 5,
    "HistoryTimeslotInMinutes": 5,
    "StatusTableEnabled": false,
    "SuspendedTablesTableEnabled": false,
    "HistoryTableEnabled": false,
    "ControlSchema": "",
    "FullLoadExceptionTableEnabled": false
  },
  "LoopbackPreventionSettings": null,
  "CharacterSetSettings": null,
  "FailTaskWhenCleanTaskResourceFailed": false,
  "ChangeProcessingTuning": {
    "StatementCacheSize": 50,
    "CommitTimeout": 1,
    "RecoveryTimeout": -1,
    "BatchApplyPreserveTransaction": true,
    "BatchApplyTimeoutMin": 1,
    "BatchSplitSize": 0,
    "BatchApplyTimeoutMax": 30,
    "MinTransactionSize": 1000,
    "MemoryKeepTime": 60,
    "BatchApplyMemoryLimit": 2000,
    "MemoryLimitTotal": 70000
  },
  "ChangeProcessingDdlHandlingPolicy": {
    "HandleSourceTableDropped": false,
    "HandleSourceTableTruncated": false,
    "HandleSourceTableAltered": false
  },
  "PostProcessingRules": null
}

Please see the attached screenshots.

asked a year ago, 653 views
2 Answers

Based on the information provided, here are some steps to troubleshoot the AWS DMS task issue and some considerations:

  1. Increase logging severity: In your current logging configuration, SOURCE_CAPTURE, TARGET_APPLY, and TASK_MANAGER are all set to "LOGGER_SEVERITY_ERROR", so they write nothing to CloudWatch unless an error occurs. Raise these components to "LOGGER_SEVERITY_DEFAULT" or "LOGGER_SEVERITY_DEBUG". Because the task only replicates ongoing changes, TARGET_APPLY is the relevant target component rather than TARGET_LOAD, which covers the full-load phase.

  2. Check CloudWatch Logs: Ensure you're looking at the correct CloudWatch Log Group and Stream. The log group should be "dms-tasks-glue-mysql-to-sf-dms-replication-instance-prd" and the stream "dms-task-5ARBYB5IL5FSRCZY54UFPOGXCA" based on your configuration.

  3. Verify source database connectivity: Ensure the DMS replication instance can connect to your MySQL source. Check network settings, security groups, and database credentials.

  4. Review task settings:

    • "MaxFullLoadSubTasks" (currently 40) applies only to the full-load phase, so raising it is unlikely to help a task that only replicates ongoing changes.
    • In "ChangeProcessingTuning", "BatchSplitSize" and "BatchApplyMemoryLimit" relate to batch apply, and "BatchApplyEnabled" is currently false, so check that setting before tuning them.

  5. Monitor task progress: Use the AWS DMS console or AWS CLI to check the task status and table statistics regularly.

  6. Check for resource constraints: Ensure your DMS replication instance has sufficient CPU and memory to handle 800k tables. Consider upgrading to a larger instance type if necessary.

  7. Validate table selection: Confirm that your task is correctly configured to capture all 800k tables. Check for any table filters or selection rules that might be limiting the number of tables processed.

  8. Run a premigration assessment: Run an AWS DMS premigration assessment on the task to identify potential issues with your migration setup.

  9. Break down the migration: Consider splitting the 800k tables into multiple DMS tasks to reduce the load on a single task.

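  10. Review CloudWatch metrics: DMS publishes task and replication-instance metrics to CloudWatch. For an ongoing-replication task, CDCLatencySource, CDCLatencyTarget, and CDCIncomingChanges indicate whether changes are still being read from the source and applied to the target.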
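
For the logging change in step 1, a minimal Python sketch of the edit to the task-settings JSON might look like the following. It assumes the task only does ongoing replication, so it targets SOURCE_CAPTURE, TARGET_APPLY, and TASK_MANAGER; the helper name `raise_log_severity` and the trimmed `settings` excerpt are illustrative, not part of DMS.

```python
import copy
import json

# Components assumed to matter for a CDC-only task (an assumption based on the question).
CDC_COMPONENTS = {"SOURCE_CAPTURE", "TARGET_APPLY", "TASK_MANAGER"}


def raise_log_severity(task_settings, components=CDC_COMPONENTS,
                       severity="LOGGER_SEVERITY_DEBUG"):
    """Return a copy of task_settings with the chosen components set to severity."""
    updated = copy.deepcopy(task_settings)
    for component in updated["Logging"]["LogComponents"]:
        if component["Id"] in components:
            component["Severity"] = severity
    return updated


# Trimmed excerpt of the settings posted in the question.
settings = {
    "Logging": {
        "EnableLogging": True,
        "LogComponents": [
            {"Severity": "LOGGER_SEVERITY_ERROR", "Id": "SOURCE_CAPTURE"},
            {"Severity": "LOGGER_SEVERITY_ERROR", "Id": "TARGET_APPLY"},
            {"Severity": "LOGGER_SEVERITY_ERROR", "Id": "TASK_MANAGER"},
            {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "SORTER"},
        ],
    }
}

patched = raise_log_severity(settings)
print(json.dumps(patched, indent=2))
```

The patched JSON could then be supplied as the task settings when you modify the task, for example through the `ReplicationTaskSettings` parameter of boto3's `modify_replication_task`. The task generally has to be stopped before it can be modified, and debug logging is verbose, so consider lowering the severity again once you have found the cause.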

Best Practices:

  • Start with a subset of tables to test and validate the migration process before attempting all 800k tables.
  • Use parallel load where possible to improve performance if you later add a full-load phase; it applies to full load rather than to ongoing replication.
  • Regularly monitor and tune your DMS task based on performance metrics.
  • Consider using AWS Schema Conversion Tool (SCT) to help with large-scale migrations.
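
Starting with a subset of tables, or splitting the full list across several tasks, comes down to generating one table-mapping document per group of tables. The sketch below shows one way to do that with explicit selection rules. The schema name `shop`, the `orders_NNNN` table names, and the chunk size are hypothetical; substitute your own list and pick a chunk size by experiment.

```python
import json


def build_table_mappings(schema, tables):
    """Build a DMS table-mapping document with one include rule per table."""
    rules = []
    for index, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(index),
            "rule-name": str(index),
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return {"rules": rules}


def split_into_tasks(schema, tables, tables_per_task):
    """Yield one table-mapping document per chunk of tables."""
    for start in range(0, len(tables), tables_per_task):
        yield build_table_mappings(schema, tables[start:start + tables_per_task])


# Hypothetical schema and table names, for illustration only.
all_tables = [f"orders_{n:04d}" for n in range(10)]
mappings = list(split_into_tasks("shop", all_tables, tables_per_task=4))

print(len(mappings))                      # number of tasks this split would need
print(json.dumps(mappings[0], indent=2))  # table mappings for the first task
```

Each resulting document can be used as the table mappings of a separate task. If your table names share prefixes, a single rule with a `%` wildcard in `table-name` may be a more compact alternative to one rule per table.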

Important Considerations:

  • Migrating 800k tables is a significant undertaking. Expect longer processing times and potential resource constraints.
  • Ensure your S3 bucket has appropriate permissions and capacity to handle the data volume.
  • Be prepared to iterate on your task configuration to find the optimal settings for your specific use case.
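
While iterating on the configuration, it can help to compare the tables you asked for with the tables DMS actually reports in table statistics. The following is a rough sketch: `fetch_table_statistics` uses boto3's `describe_table_statistics` paginator and needs AWS credentials, while `summarize` is a hypothetical helper that works on plain dictionaries. The sample data at the bottom is invented for illustration.

```python
from collections import Counter


def fetch_table_statistics(task_arn):
    """Collect all table statistics for a task via boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the comparison below runs without AWS access

    client = boto3.client("dms")
    paginator = client.get_paginator("describe_table_statistics")
    stats = []
    for page in paginator.paginate(ReplicationTaskArn=task_arn):
        stats.extend(page["TableStatistics"])
    return stats


def summarize(expected_tables, stats):
    """Return (tables missing from the statistics, count of tables per state)."""
    seen = {(s["SchemaName"], s["TableName"]) for s in stats}
    missing = sorted(set(expected_tables) - seen)
    by_state = Counter(s.get("TableState", "unknown") for s in stats)
    return missing, dict(by_state)


# Invented sample data; in practice use fetch_table_statistics(<your task ARN>).
expected = [("shop", "orders"), ("shop", "customers"), ("shop", "payments")]
sample_stats = [
    {"SchemaName": "shop", "TableName": "orders", "TableState": "Table completed"},
    {"SchemaName": "shop", "TableName": "customers", "TableState": "Table error"},
]

missing, by_state = summarize(expected, sample_stats)
print("missing:", missing)
print("states:", by_state)
```

Running this periodically shows whether the number of reported tables is still growing or has stalled, and which tables are in an error state.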

If these steps don't resolve the issue, consider opening a support ticket with AWS for more specialized assistance with your large-scale migration.

AWS
answered a year ago
  • Permissions are all working fine. I am only replicating ongoing changes, not doing a full load.

    I see the errors below. I have attached the screenshot.

    Error parsing DDL [1020454] (mysql_endpoint_util.c:955)

I see the errors below. I have attached the screenshot.

Error parsing DDL [1020454] (mysql_endpoint_util.c:955)
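
To see how often this error occurs and which log lines surround it, the task's log stream can be searched for the message. This is a sketch only: the regular expression is modelled on the single line quoted above, `find_ddl_errors` is a hypothetical helper, and `fetch_log_messages` uses the CloudWatch Logs `filter_log_events` paginator from boto3, which needs AWS credentials.

```python
import re

# Pattern modelled on the one error line quoted in this thread.
DDL_ERROR = re.compile(
    r"Error parsing DDL\s+\[(?P<code>\d+)\]\s+\((?P<source>[^)]+)\)"
)


def find_ddl_errors(messages):
    """Return the error code and source location for each matching log message."""
    hits = []
    for message in messages:
        match = DDL_ERROR.search(message)
        if match:
            hits.append({"code": match["code"], "source": match["source"]})
    return hits


def fetch_log_messages(log_group, log_stream):
    """Yield matching messages from CloudWatch Logs (needs AWS credentials)."""
    import boto3  # imported lazily so find_ddl_errors runs without AWS access

    client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=log_group,
        logStreamNames=[log_stream],
        filterPattern='"Error parsing DDL"',
    )
    for page in pages:
        for event in page["events"]:
            yield event["message"]


sample = [
    "Error parsing DDL [1020454] (mysql_endpoint_util.c:955)",
    "some unrelated log line",
]
errors = find_ddl_errors(sample)
print(errors)
```

With the log group and stream names from the task settings, `find_ddl_errors(fetch_log_messages(group, stream))` would list every occurrence. Raising the SOURCE_CAPTURE severity first may make the lines just before each error show which statement failed to parse.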


answered a year ago
