Skip to content

AWS DMS CDC Error - "Start Processing New RESETLOGS_ID '123' that does not still have archived Redo logs

0

I am trying to run a full-load-and-cdc from Oracle RDS to Aurora PostgreSQL. Last week I got the full load working for the tables, this week I am trying to get full-load-and-cdc to work. I have validated, based on the documentation, that all of the v$ grants have been applied to the DMS user, the ARCHIVELOG_DIR and ONLINELOG_DIR objects are existing and the DMS user has read access, supplemental logging is enabled at the DB and table level, and the DMS source extra connection attributes are set to useLogminerReader=N;useBfile=Y;.

I have also validated that the retention period is set to a value (it is set to 12 hours since this is a lower environment we are testing in).

I have deleted and recreated the task multiple times in the event that it is still referencing an old SCN that has been purged. Every time I recreate the task and reload-target, the tables are all stuck in a "Before load" status and never proceed.

In cloud watch, I also see a corrupted redo log data block.

Can someone help me understand how to further troubleshoot this?

asked 2 months ago113 views
2 Answers
0

The error you're encountering indicates that AWS DMS is attempting to process archived redo logs that are no longer available on your Oracle RDS instance. This typically occurs when the needed Oracle archive logs have been removed from your server before AWS DMS was able to use them to capture changes.

While you've set your log retention to 12 hours, this may not be sufficient for your use case. Consider increasing your log retention policy on your Oracle RDS database. You can do this by running:

exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);

This will increase the retention to 24 hours, which is the recommended minimum for AWS DMS operations.

Additionally, since you're seeing corrupted redo log data blocks in CloudWatch, here are some additional troubleshooting steps:

  1. Verify AWS DMS permissions: Ensure the AWS account that AWS DMS uses has all required permissions for both the source Oracle database and target Aurora PostgreSQL database.

  2. Check CDC metrics: Monitor the CDCLatencySource and CDCLatencyTarget CloudWatch metrics for your AWS DMS task to identify any latency issues that might be contributing to the problem.

  3. Review task logs thoroughly: Look for any specific error messages related to the redo log corruption or missing archived logs that might provide more context about the timing of when logs are being purged versus when DMS needs them.

  4. Consider the timing: If your full load takes a significant amount of time to complete, the archived logs from the start of the migration might be purged before CDC can begin. Increasing retention beyond 24 hours might be necessary depending on your full load duration.

The combination of the RESETLOGS_ID error and corrupted redo log blocks suggests that the archive logs DMS needs to establish CDC are being removed before they can be processed, even with your current 12-hour retention.
Sources
Troubleshooting migration tasks in AWS Database Migration Service - AWS Database Migration Service
Resolve data replication issues with AWS DMS migration to Amazon Redshift | AWS re:Post

answered 2 months ago
  • The full load last week took about 90 minutes to run, so I don't think the retention period being 24 hours is the solution.

0

The error you're getting implies that Oracle has performed a RESETLOGS operation while the full load was running. If so, the captured SCN that the CDC process intends to start from won't be valid any longer, and all previous archived logs become invalid for the new incarnation. You can check this:

-- Check if RESETLOGS has occurred
SELECT incarnation#, resetlogs_change#, resetlogs_time, status 
FROM V$DATABASE_INCARNATION 
ORDER BY incarnation#;

-- Check current database state
SELECT name, db_unique_name, resetlogs_change#, resetlogs_time, current_scn, log_mode 
FROM V$DATABASE;

If there are multiple incarnations, or the resetlogs_time is recent, this confirms a RESETLOGS event occurred. All archived logs from before the RESETLOGS are invalid for CDC purposes.

You may be able to sort this out by recreating the archive and online log directories (if so, validate afterwards that log switching works by forcing it with exec rdsadmin.rdsadmin_util.switch_logfile;), then delete and recreate the DMS task.

If the source database is idle from a DML perspective, you could also consider separating the full-load and CDC parts by doing a full load only first, then creating a CDC-only task with a Custom CDC start point set to the SCN from Oracle after the full load completed.

SELECT current_scn FROM V$DATABASE;
AWS
answered 2 months ago
  • Thanks for your response here, Johan. I ran both of the queries (which I previously did to try to troubleshoot this as well), and below are the results:

    Query 1: Incarnation# - 1, RESETLOGS_CHANGE#, 1, RESETLOGS_TIME, 26-Apr-23, STATUS - CURRENT

    Query 2: Name - (redacted), DB_UNIQUE_NAME - (redacted), RESETLOGS_CHANGE# - 1, RESETLOGS_TIME - 26-APR-23, CURRENT_SCN - (redacted), LOG_MODE - ACHIVELOG

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.