DMS COPY errors in Cloudwatch and task not failing, tables aren't suspended, but records are missing.

0

We have a task that migrates around 250 tables from an Aptible MySQL instance into Redshift that has been running fine for a month on the latest version (3.5.1.R705. It was running fine for a year on an earlier version). We were alerted by end users that data was missing from some tables. We successfully ran reloads of those tables, but as CDC went on for a couple hours, data started to go missing. Looking at the Redshift UI, we see some failed COPY commands which appear to occur in clusters, but not constantly (for instance, we haven't seen one for the past hour). Investigating Cloudwatch logs for the DMS task we see corresponding errors:

1698804134000,"2023-11-01T02:02:14:421823 [TARGET_APPLY ]D: RetCode: SQL_ERROR SqlState: XX000 NativeError: 30 Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR: exception name : , error type : 100, message: Failed to parse error payload, should retry : 0 DETAIL: ----------------------------------------------- error: exception name : , error type : 100, message: Failed to parse error payload, should retry : 0 code: 30000 context: query: 239776237 location: xen_aws_credentials_mgr.cpp:422 process: padbmaster [pid=1073930812] ----------------------------------------------- [1022500] (ar_odbc_stmt.c:5007)"

Per the logs, it looks like these errors fail for both the manifest and the data. That hints at a Redshift issue, but the cluster appears to be fine during that time with plenty of headroom on CPU and disk and no contention. However we also then see in the DMS task logs

The target state saved. The last confirmed rec. id = 24385

where the id is being advanced later in the log after failures whereas I would expect it to remain the same. It looks like multiple application threads are writing to the logs at the same time so this may just be coming from a different thread. We are not seeing any log messages indicating a RECOVERABLE_ERROR. The task error configuration is below:

"ErrorBehavior": { "FailOnNoTablesCaptured": false, "ApplyErrorUpdatePolicy": "LOG_ERROR", "FailOnTransactionConsistencyBreached": false, "RecoverableErrorThrottlingMax": 1800, "DataErrorEscalationPolicy": "SUSPEND_TABLE", "ApplyErrorEscalationCount": 0, "RecoverableErrorStopRetryAfterThrottlingMax": false, "RecoverableErrorThrottling": true, "ApplyErrorFailOnTruncationDdl": false, "DataTruncationErrorPolicy": "LOG_ERROR", "ApplyErrorInsertPolicy": "LOG_ERROR", "EventErrorPolicy": "IGNORE", "ApplyErrorEscalationPolicy": "LOG_ERROR", "RecoverableErrorCount": -1, "DataErrorEscalationCount": 0, "TableErrorEscalationPolicy": "STOP_TASK", "RecoverableErrorInterval": 5, "ApplyErrorDeletePolicy": "IGNORE_RECORD", "TableErrorEscalationCount": 0, "FullLoadIgnoreConflicts": true, "DataErrorPolicy": "LOG_ERROR", "TableErrorPolicy": "SUSPEND_TABLE" }

Any insight that can be lent here would be greatly appreciated.

mtaluc
asked 6 months ago214 views
2 Answers
0

Did your grants complete successfully? Did you run these as a superuser or as the owner? Try reapplying the grants and checking the privileges. If DMS is using a role for rights in Redshift you may need to grant to the role.

you can refer below link for more details. https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT-examples.html

you can also check awslabs' Redshift github repo - https://github.com/awslabs/amazon-redshift-utils - there are a number of views that explore permissions. Knowing which step in the process is not doing what you expect will narrow things down.

Sachin
answered 6 months ago
0

The issue turned out to be a backend problem with Redshift where it would sporadically lose authorization to S3 (odd that we didn't see any relevant messages in the logs). We were informed of the issue by AWS and rebooted the cluster which implemented their fix.

mtaluc
answered 6 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions