DMS Ignore Duplicate key errors while migrating data between DocumentDB instances
We need to replicate data between two collections in AWS DocumentDB in order to get rid of duplicate documents.
Both source and target are AWS DocumentDB instances running version 4.0.0.
I've created a unique index on the target collection so that only non-duplicate values are allowed. I had to create the index before migrating the data to the new target, because our data size is ~1TB and creating the index on the source collection is not feasible.
Full load fails with the following error. The task status becomes "table error" and no further data is migrated to that collection.
2022-03-23T03:13:57 [TARGET_LOAD ]E: Execute bulk failed with errors: 'Multiple write errors: "E11000 duplicate key error collection: reward_users_v4 index: lockId", "E11000 duplicate key error collection: reward_users_v4 index: lockId"' [1020403] (mongodb_apply.c:153)
2022-03-23T03:13:57 [TARGET_LOAD ]E: Failed to handle execute bulk when maximum events per bulk '1000' was reached [1020403] (mongodb_apply.c:433)
"ErrorBehavior": {
"FailOnNoTablesCaptured": false,
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"FailOnTransactionConsistencyBreached": false,
"RecoverableErrorThrottlingMax": 1800,
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"ApplyErrorEscalationCount": 1000000000,
"RecoverableErrorStopRetryAfterThrottlingMax": true,
"RecoverableErrorThrottling": true,
"ApplyErrorFailOnTruncationDdl": false,
"DataTruncationErrorPolicy": "LOG_ERROR",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"RecoverableErrorCount": 1000000000,
"DataErrorEscalationCount": 1000000000,
"TableErrorEscalationPolicy": "SUSPEND_TABLE",
"RecoverableErrorInterval": 10,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"TableErrorEscalationCount": 1000000000,
"FullLoadIgnoreConflicts": true,
"DataErrorPolicy": "LOG_ERROR",
"TableErrorPolicy": "SUSPEND_TABLE"
},
How can I configure AWS DMS to continue even if such duplicate key errors keep occurring? I tried modifying TableErrorEscalationCount and many other error counts, but loading always stops at the first duplicate key error.
I have 580k documents in the test workload for this task.
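For reference, a minimal sketch of how task settings like the ErrorBehavior block above can be applied with boto3. The task ARN is a placeholder, and the actual API call is left commented out since it needs live AWS credentials; note that FullLoadIgnoreConflicts by itself does not suppress target-side unique-index violations during full load.

```python
import json

# Subset of the ErrorBehavior settings from the question, as a Python dict.
task_settings = json.dumps({
    "ErrorBehavior": {
        "FullLoadIgnoreConflicts": True,
        "TableErrorPolicy": "SUSPEND_TABLE",
        "TableErrorEscalationCount": 1000000000,
        # ... remaining keys as shown in the question ...
    }
})

# Hypothetical application of the settings (requires AWS credentials):
# import boto3
# dms = boto3.client("dms")
# dms.modify_replication_task(
#     ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE",  # placeholder
#     ReplicationTaskSettings=task_settings,
# )
```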
@Mihai A Yeah, so ultimately we ended up using DMS only to stream real-time changes from one collection to the other, and used mongodump and mongorestore to perform the full load.
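A rough sketch of that dump-and-restore fallback. The command names and flags come from the MongoDB Database Tools; the URIs, database, and collection names are placeholders for illustration.

```python
def dump_restore_commands(source_uri, target_uri, db, coll, dump_dir="/tmp/dump"):
    """Build mongodump/mongorestore argument lists for one collection."""
    dump = [
        "mongodump",
        f"--uri={source_uri}",
        f"--db={db}",
        f"--collection={coll}",
        f"--out={dump_dir}",
    ]
    # No --stopOnError: mongorestore logs E11000 duplicate key errors
    # and keeps going, which is the behavior the full load needs here.
    restore = [
        "mongorestore",
        f"--uri={target_uri}",
        f"--nsInclude={db}.{coll}",
        dump_dir,
    ]
    return dump, restore
```

These could be executed with subprocess.run; for a ~1TB collection the restore side would likely also need tuning via mongorestore's --numInsertionWorkersPerCollection option.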
Great, glad you got it sorted in the end.
I don't think this is achievable with DMS, even with FullLoadIgnoreConflicts set to true in the migration task settings. The error comes from the target database because of the unique index. One option is to use mongodump/mongorestore: by default mongorestore reports duplicate key errors and continues (unless --stopOnError is used). Another option is to identify and eliminate the duplicates on the target, which can be done with an aggregation query.
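For the aggregation route, a sketch of a pipeline that groups on the unique-indexed field and flags groups holding more than one document. "lockId" is assumed from the error log; adjust to the actual index key.

```python
def duplicate_pipeline(field="lockId"):
    """Aggregation pipeline that reports duplicate values of `field`."""
    return [
        # Group all documents by the indexed field, collecting their _ids.
        {"$group": {
            "_id": f"${field}",
            "ids": {"$push": "$_id"},
            "count": {"$sum": 1},
        }},
        # Keep only groups containing more than one document.
        {"$match": {"count": {"$gt": 1}}},
    ]

# With pymongo (hypothetical connection), duplicates could then be removed
# by keeping ids[0] of each group and deleting the rest:
# for grp in db.reward_users_v4.aggregate(duplicate_pipeline(), allowDiskUse=True):
#     db.reward_users_v4.delete_many({"_id": {"$in": grp["ids"][1:]}})
```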