DMS Ignore Duplicate key errors while migrating data between DocumentDB instances


We need to replicate data between two collections in AWS DocumentDB to get rid of duplicate documents.

The source and target are both AWS DocumentDB instances, version 4.0.0.

I created a unique index on the target collection so that only non-duplicate values are allowed. The index had to be created before migrating the data to the new target, because our data size is ~1 TB and creating the index on the source collection is not possible.
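A minimal sketch of creating that unique index on the target before the load, written for pymongo. It assumes the indexed field is `lockId` (the error log in this question names an index `lockId`; the field name itself is an assumption), and `create_unique_lock_index` is a hypothetical helper name. The connection string in the usage comment is a placeholder.

```python
# Sketch: create the unique index on the *target* collection before loading.
# Assumption: the indexed field is "lockId" (the DMS error names an index "lockId").


def create_unique_lock_index(collection, field="lockId"):
    """Create a unique ascending index on `field` and return its name.

    `collection` is expected to be a pymongo Collection for the target
    DocumentDB collection (e.g. reward_users_v4). The index has to exist
    before the load: building a unique index fails if the collection
    already contains duplicate values for the key.
    """
    return collection.create_index([(field, 1)], unique=True, name=field)


# Usage (not executed here; the URI and database name are placeholders):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://user:pass@target-host:27017/"
#                        "?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false")
#   create_unique_lock_index(client["mydb"]["reward_users_v4"])
```

With this index in place, any insert that repeats an existing `lockId` is rejected by the target with an E11000 error, which is exactly what DMS runs into during the full load.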

The full load fails with the following error. The task status changes to "Table error" and no further data is migrated to that collection.

2022-03-23T03:13:57 [TARGET_LOAD     ]E:  Execute bulk failed with errors: 'Multiple write errors: "E11000 duplicate key error collection: reward_users_v4 index: lockId", "E11000 duplicate key error collection: reward_users_v4 index: lockId"' [1020403]  (mongodb_apply.c:153) 
2022-03-23T03:13:57 [TARGET_LOAD     ]E:  Failed to handle execute bulk when maximum events per bulk '1000' was reached [1020403]  (mongodb_apply.c:433)

Task settings (ErrorBehavior section):

{
  "ErrorBehavior": {
    "FailOnNoTablesCaptured": false,
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "FailOnTransactionConsistencyBreached": false,
    "RecoverableErrorThrottlingMax": 1800,
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "ApplyErrorEscalationCount": 1000000000,
    "RecoverableErrorStopRetryAfterThrottlingMax": true,
    "RecoverableErrorThrottling": true,
    "ApplyErrorFailOnTruncationDdl": false,
    "DataTruncationErrorPolicy": "LOG_ERROR",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "ApplyErrorEscalationPolicy": "LOG_ERROR",
    "RecoverableErrorCount": 1000000000,
    "DataErrorEscalationCount": 1000000000,
    "TableErrorEscalationPolicy": "SUSPEND_TABLE",
    "RecoverableErrorInterval": 10,
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "TableErrorEscalationCount": 1000000000,
    "FullLoadIgnoreConflicts": true,
    "DataErrorPolicy": "LOG_ERROR",
    "TableErrorPolicy": "SUSPEND_TABLE"
  }
}

How can I configure AWS DMS to continue even when such duplicate key errors keep occurring? I tried changing TableErrorEscalationCount and several other error counts, but the load always stops at the first duplicate key error.

The test workload for this task has 580k documents.

  • I don't think this is achievable with DMS, even with FullLoadIgnoreConflicts set to true in the migration task settings. The error comes from the target database because of the unique index. One option is to use mongodump/mongorestore: by default, mongorestore reports duplicates and continues (unless --stopOnError is used). Another option is to identify and eliminate the duplicates on the target, which can be done with an aggregation query.
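The second option in the comment above (deduplicating on the target) could look roughly like the sketch below. It assumes the data was loaded into the target *without* the unique index, and that duplicates are defined by the `lockId` field (an assumption). The pipeline groups documents by `lockId`, keeps only values that occur more than once, and the helpers keep one copy per value and delete the rest in batches. The function names are hypothetical; once the surplus copies are gone, the unique index can be built.

```python
# Sketch: find and remove duplicate documents on the target, assuming the data
# was loaded WITHOUT the unique index and duplicates are defined by "lockId".


def duplicate_groups_pipeline(field="lockId"):
    """Pipeline returning one result per duplicated value of `field`,
    with the _ids of all documents that share it."""
    return [
        {"$sort": {"_id": 1}},  # so each "ids" array is in ascending _id order
        {"$group": {"_id": f"${field}", "ids": {"$push": "$_id"}, "count": {"$sum": 1}}},
        {"$match": {"count": {"$gt": 1}}},
    ]


def ids_to_delete(groups):
    """Keep the first _id of each group and return the surplus copies."""
    surplus = []
    for group in groups:
        surplus.extend(group["ids"][1:])
    return surplus


def delete_surplus(collection, ids, batch_size=1000):
    """Delete the given _ids in batches and return how many were removed."""
    deleted = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        deleted += collection.delete_many({"_id": {"$in": batch}}).deleted_count
    return deleted


# Usage against a live pymongo collection (not executed here):
#   groups = coll.aggregate(duplicate_groups_pipeline())
#   delete_surplus(coll, ids_to_delete(groups))
```

Sorting on `_id` before `$group` should make the kept copy the one with the smallest `_id`. If a different copy should win (for example the most recently updated one), the `$sort` stage is the place to change.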

1 Answer

@Mihai A Yes. We ultimately only streamed ongoing changes in real time from one collection to the other, and used mongodump and mongorestore to perform the full load.
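A rough sketch of the full-load half of that approach, building the mongodump and mongorestore argument lists in Python. All URIs, the database name `mydb`, and the source collection name `reward_users` are hypothetical placeholders; only `reward_users_v4` comes from the error log. The restore deliberately omits `--drop` (which would drop the target collection along with its unique index) and `--stopOnError`, so duplicate key errors are reported and skipped. The ordering is an assumption not stated in the answer: starting the change stream before the dump begins should avoid losing writes made while the dump runs.

```python
import shlex

# Sketch of the full-load half of the approach: mongodump the source collection,
# then mongorestore it into the target that already has the unique index.
# All URIs, database and collection names are placeholders.


def dump_command(source_uri, db, collection, out_dir):
    """mongodump arguments for a single collection."""
    return ["mongodump", f"--uri={source_uri}", f"--db={db}",
            f"--collection={collection}", f"--out={out_dir}"]


def restore_command(target_uri, dump_dir, db, src_coll, dst_coll):
    """mongorestore arguments that load db.src_coll into db.dst_coll.

    No --drop (it would drop the target collection and its unique index) and
    no --stopOnError, so duplicate key errors are reported and skipped.
    """
    cmd = ["mongorestore", f"--uri={target_uri}", f"--nsInclude={db}.{src_coll}"]
    if src_coll != dst_coll:
        # Rename the namespace on the way in.
        cmd += [f"--nsFrom={db}.{src_coll}", f"--nsTo={db}.{dst_coll}"]
    cmd.append(dump_dir)
    return cmd


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)


# Usage (not executed here; SRC_URI/DST_URI are placeholders):
#   import subprocess
#   subprocess.run(dump_command(SRC_URI, "mydb", "reward_users", "dump"), check=True)
#   subprocess.run(restore_command(DST_URI, "dump", "mydb", "reward_users", "reward_users_v4"))
```

For DocumentDB, the URIs also need the TLS options described in the AWS connection documentation.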

Raj
answered 2 years ago
  • Great, glad you got it sorted in the end.
