DMS: ignore duplicate key errors while migrating data between DocumentDB instances


We need to replicate data between two collections in AWS DocumentDB to get rid of duplicate documents.

Source and target are AWS DocumentDB instances, version 4.0.0.

I've created a unique index on the target collection to only allow non-duplicate values. I needed to create the index before migrating the data to the new target, because our data size is ~1 TB and index creation on the source collection is impossible.
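For reference, the index-creation step looks roughly like this minimal pymongo sketch; the connection string and database name are placeholders, and the collection and field names (reward_users_v4, lockId) are taken from the error log below.

from pymongo import MongoClient, ASCENDING

# Placeholder connection string; tls/retryWrites options follow the usual
# DocumentDB client recommendations.
client = MongoClient("mongodb://user:pass@target-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017/?tls=true&retryWrites=false")
collection = client["rewards"]["reward_users_v4"]  # "rewards" is a hypothetical database name

# unique=True makes the target reject any document whose lockId already exists,
# which is what the DMS full load later trips over.
collection.create_index([("lockId", ASCENDING)], unique=True, name="lockId")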

The full load fails with the following error. The task status becomes "table error" and no further data is migrated to that collection.

2022-03-23T03:13:57 [TARGET_LOAD     ]E:  Execute bulk failed with errors: 'Multiple write errors: "E11000 duplicate key error collection: reward_users_v4 index: lockId", "E11000 duplicate key error collection: reward_users_v4 index: lockId"' [1020403]  (mongodb_apply.c:153) 
2022-03-23T03:13:57 [TARGET_LOAD     ]E:  Failed to handle execute bulk when maximum events per bulk '1000' was reached [1020403]  (mongodb_apply.c:433)
"ErrorBehavior": {
		"FailOnNoTablesCaptured": false,
		"ApplyErrorUpdatePolicy": "LOG_ERROR",
		"FailOnTransactionConsistencyBreached": false,
		"RecoverableErrorThrottlingMax": 1800,
		"DataErrorEscalationPolicy": "SUSPEND_TABLE",
		"ApplyErrorEscalationCount": 1000000000,
		"RecoverableErrorStopRetryAfterThrottlingMax": true,
		"RecoverableErrorThrottling": true,
		"ApplyErrorFailOnTruncationDdl": false,
		"DataTruncationErrorPolicy": "LOG_ERROR",
		"ApplyErrorInsertPolicy": "LOG_ERROR",
		"ApplyErrorEscalationPolicy": "LOG_ERROR",
		"RecoverableErrorCount": 1000000000,
		"DataErrorEscalationCount": 1000000000,
		"TableErrorEscalationPolicy": "SUSPEND_TABLE",
		"RecoverableErrorInterval": 10,
		"ApplyErrorDeletePolicy": "IGNORE_RECORD",
		"TableErrorEscalationCount": 1000000000,
		"FullLoadIgnoreConflicts": true,
		"DataErrorPolicy": "LOG_ERROR",
		"TableErrorPolicy": "SUSPEND_TABLE"
	},

How can I configure AWS DMS to continue even if such duplicate key errors keep happening? I tried modifying TableErrorEscalationCount and many of the other error counts, but the load always stops at the first duplicate key error.

I have 580k documents in the test workload for this task.
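For completeness, this is a sketch of how I have been pushing the ErrorBehavior settings above to the task with boto3 (modify_replication_task), assuming a partial settings JSON merges with the existing task settings; the task ARN is a placeholder and the task generally needs to be stopped before it can be modified. None of these values stopped the full load from failing.

import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Partial task settings; the assumption is that keys not listed here keep
# their current values.
settings = {
    "ErrorBehavior": {
        "FullLoadIgnoreConflicts": True,
        "TableErrorPolicy": "SUSPEND_TABLE",
        "TableErrorEscalationCount": 1000000000,
        "DataErrorPolicy": "LOG_ERROR",
    }
}

# Placeholder task ARN.
dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE",
    ReplicationTaskSettings=json.dumps(settings),
)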

  • I don't think this is achievable with DMS, even if FullLoadIgnoreConflicts is set to true in the migration task settings. The error comes from the target database because of the unique index. One option is to use mongodump/mongorestore, which by default reports duplicates and continues (unless --stopOnError is used). The other option is to identify and eliminate duplicates on the target, which can be done with an aggregation query (see the sketch below).
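To illustrate the aggregation approach from the comment above, here is a rough pymongo sketch that groups documents by lockId and deletes all but one per group. The connection string and database name are placeholders, and batching considerations for a ~1 TB collection are not addressed.

from pymongo import MongoClient

# Placeholder connection string and database name.
coll = MongoClient("mongodb://target-docdb:27017/")["rewards"]["reward_users_v4"]

# Group by lockId and collect the _ids of every document sharing that value.
pipeline = [
    {"$group": {"_id": "$lockId", "ids": {"$push": "$_id"}, "count": {"$sum": 1}}},
    {"$match": {"count": {"$gt": 1}}},
]

for group in coll.aggregate(pipeline, allowDiskUse=True):
    # Keep the first document in each group and delete the rest.
    coll.delete_many({"_id": {"$in": group["ids"][1:]}})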

1 Answer

@Mihai A Yeah, so ultimately we ended up streaming only the real-time changes from one collection to the other, and used mongodump/mongorestore to perform the full load.
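A minimal sketch of what that streaming piece could look like with pymongo change streams, assuming change streams are enabled on the source collection; connection strings and database names are placeholders, and mongodump/mongorestore handled the initial full load separately.

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

# Placeholder connection strings and database name.
source = MongoClient("mongodb://source-docdb:27017/")["rewards"]["reward_users_v4"]
target = MongoClient("mongodb://target-docdb:27017/")["rewards"]["reward_users_v4"]

# Tail the source change stream and apply each change to the target,
# silently skipping documents the unique lockId index rejects.
with source.watch(full_document="updateLookup") as stream:
    for change in stream:
        op = change["operationType"]
        if op in ("insert", "update", "replace"):
            doc = change["fullDocument"]
            try:
                target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
            except DuplicateKeyError:
                pass  # a document with the same lockId already exists on the target
        elif op == "delete":
            target.delete_one({"_id": change["documentKey"]["_id"]})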

Raj
answered 2 years ago
  • Great, glad you got it sorted in the end.
