I want to resolve inconsistent data when I use Amazon Simple Storage Service (Amazon S3) as the target for AWS Database Migration Service (AWS DMS) migration.
Short description
When you migrate data from a source database to an S3 bucket, you might see missing data, duplicate data, or missing columns after schema changes. This happens because Amazon S3 doesn't support primary keys, updates, or referential integrity.
To resolve inconsistent data, follow the resolution steps for your issue.
Resolution
Missing or removed data in the S3 target
If TRUNCATE TABLE or DROP TABLE operations occur on the source database, then AWS DMS replicates the operations to Amazon S3. This replication might result in missing data.
To prevent missing or removed data from TRUNCATE or DROP TABLE operations, review and adjust your data definition language (DDL) handling settings. Complete the following steps:
- Open the AWS DMS console.
- In the navigation pane, under Migrate or replicate, choose Tasks.
- Select your task.
- Check that the task status is Stopped.
- Choose Modify.
- Choose Task settings.
- Choose JSON editor.
- Find the ChangeProcessingDdlHandlingPolicy section. Then, set the HandleSourceTableTruncated and HandleSourceTableDropped parameters to false. Example configuration:
"ChangeProcessingDdlHandlingPolicy": {
  "HandleSourceTableDropped": false,
  "HandleSourceTableTruncated": false
}
- Choose Save.
- Choose Actions, and then choose Resume.
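The settings change above can also be scripted instead of made in the console's JSON editor. The following is a minimal sketch, assuming you retrieve the task's current settings JSON first (for example, with aws dms describe-replication-tasks) and pass the result back through aws dms modify-replication-task --replication-task-settings. The helper name is illustrative:

```python
import json

def disable_truncate_drop_replication(task_settings_json: str) -> str:
    """Return task settings JSON with TRUNCATE/DROP DDL replication turned off.

    All other settings in the document are preserved unchanged.
    """
    settings = json.loads(task_settings_json)
    # Create the policy section if the settings document doesn't have one yet
    policy = settings.setdefault("ChangeProcessingDdlHandlingPolicy", {})
    policy["HandleSourceTableTruncated"] = False
    policy["HandleSourceTableDropped"] = False
    return json.dumps(settings, indent=2)

# Illustrative input: an existing settings document with other keys present
current = json.dumps({
    "ChangeProcessingDdlHandlingPolicy": {"HandleSourceTableAltered": True},
    "Logging": {"EnableLogging": True},
})
updated = disable_truncate_drop_replication(current)
```

Because the helper only touches the two TRUNCATE/DROP parameters, unrelated task settings such as logging are carried through untouched.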
Amazon S3 doesn't reflect schema changes
If you add new columns to a table on the source but don't activate schema evolution, then AWS DMS might not capture the changes.
To resolve missing schema changes, complete the following steps:
- Open the AWS DMS console.
- In the navigation pane, under Migrate or replicate, choose Tasks.
- Select your task.
- Check that the task status is Stopped.
- Choose Modify.
- Choose Task settings.
- Choose JSON editor.
- Find the ChangeProcessingDdlHandlingPolicy section. Then, set the HandleSourceTableAltered parameter to true. Example configuration:
"ChangeProcessingDdlHandlingPolicy": {
  "HandleSourceTableAltered": true
}
- Choose Save.
- Choose Actions, and then choose Resume.
- To include the new columns in your Amazon S3 data, reload the table.
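The same pattern applies to this settings change. This is a sketch under the same assumptions as before (you fetch the current settings JSON and apply the result with aws dms modify-replication-task); the helper name is illustrative:

```python
import json

def enable_alter_replication(task_settings_json: str) -> str:
    """Return task settings JSON with ALTER TABLE replication enabled,
    so that columns added on the source are captured."""
    settings = json.loads(task_settings_json)
    policy = settings.setdefault("ChangeProcessingDdlHandlingPolicy", {})
    policy["HandleSourceTableAltered"] = True
    return json.dumps(settings, indent=2)

# After the task resumes, reload the table so the new columns reach Amazon S3,
# for example:
#   aws dms reload-tables --replication-task-arn <arn> \
#     --tables-to-reload SchemaName=<schema>,TableName=<table>
current = json.dumps({
    "ChangeProcessingDdlHandlingPolicy": {"HandleSourceTableAltered": False}
})
updated = enable_alter_replication(current)
```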
Multiple tasks write to the same Amazon S3 path and cause duplicate or inconsistent data
If multiple AWS DMS tasks write to the same S3 path, then duplicate data occurs because Amazon S3 stores all incoming objects.
To resolve this issue, complete the following steps:
- Open the AWS DMS console.
- In the navigation pane, under Migrate or replicate, choose Tasks.
- Select your task.
- Check that the task status is Stopped.
Note: Make sure to stop unused or test tasks that still write data.
- In the navigation pane, choose Endpoints.
- Select your target endpoint.
- Choose Modify.
- Assign a unique BucketFolder value.
- Choose Save.
Note: AWS DMS automatically tests the connection from the target to the replication instance. Make sure that the connection test successfully completes.
- In the navigation pane, under Migrate or replicate, choose Tasks.
- Select your task, and then choose Actions. Choose Restart.
- Repeat steps 2-11 for each task that replicates from the same source.
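Before you assign new BucketFolder values, it can help to check which target endpoints currently share an S3 path. The following sketch assumes you have already collected each target endpoint's bucket name and bucket folder (for example, from aws dms describe-endpoints); the function name and input shape are illustrative:

```python
from collections import defaultdict

def find_shared_s3_paths(endpoint_s3_settings):
    """Given {endpoint_id: (bucket_name, bucket_folder)}, return the
    bucket/folder paths that more than one endpoint writes to."""
    by_path = defaultdict(list)
    for endpoint_id, (bucket, folder) in endpoint_s3_settings.items():
        by_path[(bucket, folder)].append(endpoint_id)
    # Keep only paths with two or more writers: these cause duplicate data
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}

# Illustrative endpoints: two targets writing to the same prefix
settings = {
    "target-a": ("my-dms-bucket", "orders"),
    "target-b": ("my-dms-bucket", "orders"),
    "target-c": ("my-dms-bucket", "customers"),
}
collisions = find_shared_s3_paths(settings)
```

Each path that appears in the result needs a unique BucketFolder value assigned, per the steps above.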
TargetTablePrepMode is set to DO_NOTHING and causes duplicate records
If you set the TargetTablePrepMode to DO_NOTHING for your task and it stops during the full load phase, then the task writes the full load files again when you restart it. This causes duplicate full load data because Amazon S3 doesn't support automatic updates or cleanups.
To resolve this issue, complete the following steps:
- Open the AWS DMS console.
- In the navigation pane, under Migrate or replicate, choose Tasks.
- Select your task.
- Check that the task status is Stopped.
- Choose Modify.
- Choose Task settings.
- Choose JSON editor.
- Find the FullLoadSettings parameters section. Set the TargetTablePrepMode to TRUNCATE_BEFORE_LOAD. Example configuration:
"FullLoadSettings": {
"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD"
}
Note: To prevent duplicates with the TargetTablePrepMode set to DO_NOTHING, stop tasks only after the full load phase completes.
- Before you restart the task, manually delete the previously written full load files from your S3 bucket.
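The last step requires identifying the previously written full load files in the bucket. AWS DMS names full load objects with a LOAD prefix and a sequence number (for example, LOAD00000001.csv), while CDC files use timestamped names. The following sketch filters a list of object keys down to the full load files so you can delete only those; the helper name is illustrative, and you would list the keys first (for example, with aws s3 ls --recursive):

```python
import re

# Full load objects follow a LOAD<sequence>.<extension> naming pattern,
# for example LOAD00000001.csv; CDC files use timestamped names instead.
_FULL_LOAD_KEY = re.compile(r"(^|/)LOAD\d+\.(csv|parquet)$")

def full_load_keys(object_keys):
    """Return the S3 object keys that look like DMS full load files."""
    return [key for key in object_keys if _FULL_LOAD_KEY.search(key)]

keys = [
    "bucketfolder/myschema/orders/LOAD00000001.csv",
    "bucketfolder/myschema/orders/LOAD00000002.csv",
    "bucketfolder/myschema/orders/20240101-120000000.csv",  # CDC file, keep
]
to_delete = full_load_keys(keys)
```

Review the resulting list before deleting, because this naming-based filter is a heuristic and any objects you placed in the prefix yourself may not follow it.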
Related information
Endpoint settings when using Amazon S3 as a target for AWS DMS
Limitations to using Amazon S3 as a target