How can I efficiently migrate a large MongoDB collection (with documents exceeding 16MB) to Amazon S3 using AWS DMS without failures?


I am using AWS Database Migration Service (DMS) to migrate data from MongoDB (source) to Amazon S3 (target). The migration includes both Full Load (FL) and Change Data Capture (CDC). However, after some time, the process fails with the error:

"BSON Size exceeds 16 MB"

My source endpoint settings use "NestingLevel": "none" and "ExtractDocId": "true". The target endpoint writes to S3 in Parquet format.

Steps Taken So Far:

- Increased LobChunkSize to 10000 KB in the task settings.
- Changed the CdcMinFileSize and CdcMaxBatchInterval values.
- Considered switching to JSON instead of Parquet for the output.

What I Need Help With:

- Are there AWS-recommended best practices for handling large BSON documents in DMS?
- Would splitting large MongoDB documents before migration be a better approach?
- Are there alternative AWS services better suited for migrating large MongoDB datasets to S3?

1 Answer

Below are some practices you can use as a reference:

  1. Handling Large BSON Documents
     • Split Large Documents: if possible, split large MongoDB documents into smaller sub-documents before migration, either with MongoDB's aggregation framework or with custom scripts.
     • Use LOB Mode: AWS DMS can migrate large objects (LOBs) when LOB support is enabled. Adjust the LobChunkSize and LobMaxSize task settings to handle large BSON documents more effectively (see the first sketch after this list).
     • Enable Compression: compressing the data written to S3 (for example, gzip via the S3 endpoint's CompressionType setting, or Parquet's built-in compression) reduces storage and transfer size, although it does not change the size of the source BSON documents.
  2. Adjusting DMS Settings
     • NestingLevel: if your documents are deeply nested, consider "NestingLevel": "one" instead of "none". This flattens the document structure and can reduce per-record size issues.
     • Task Settings: fine-tune the CdcMinFileSize and CdcMaxBatchInterval values to optimize the CDC process for large documents (see the second sketch after this list).
     • Output Format: switching to JSON instead of Parquet might simplify the migration, as JSON is more flexible with varying document structures.
  3. Alternative AWS Services
     • AWS DataSync: for very large datasets, AWS DataSync can be a better alternative; it supports high-speed transfers and handles large files efficiently.
     • MongoDB Atlas Data Federation: if you are using MongoDB Atlas, its Data Federation feature can export data directly to S3 in Parquet format.
     • AWS Snowball: for extremely large datasets, AWS Snowball provides a physical device for secure data transfer to S3.
  4. Segmentation for Performance
     • Use segmentation to improve full-load performance. AWS DMS supports auto-segmentation and range segmentation for MongoDB collections, which distributes the load across parallel threads and helps avoid bottlenecks (see the third sketch after this list).
  5. Error Handling and Monitoring
     • Implement robust error handling so failed migrations can be retried.
     • Use Amazon CloudWatch to monitor DMS tasks and identify potential issues early (see the last sketch after this list).
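
As a rough illustration of the LOB-mode adjustment in point 1, here is a boto3 sketch that enables full LOB mode on the replication task. The region, task ARN, and sizes are placeholders, the task must be stopped before its settings can be changed, and you should verify the setting names against the current DMS task-settings documentation.

```python
import json

import boto3

dms = boto3.client("dms", region_name="us-east-1")  # region is an assumption

task_arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"  # placeholder ARN

# Only the LOB-related section of the task settings is shown here; depending on
# your workflow you may prefer to pass the full settings document instead.
lob_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": True,     # migrate LOBs in chunks instead of truncating them
        "LobChunkSize": 64,      # chunk size in KB; raise this for very large documents
        "LimitedSizeLobMode": False,
        "LobMaxSize": 0,         # only used when LimitedSizeLobMode is True
    }
}

# The task must be in a stopped state before its settings can be modified.
dms.modify_replication_task(
    ReplicationTaskArn=task_arn,
    ReplicationTaskSettings=json.dumps(lob_settings),
)
```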
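
For the endpoint adjustments in point 2, a minimal boto3 sketch is below, assuming placeholder endpoint ARNs. Modifying an endpoint may require re-supplying connection details (server name, credentials, and so on), so treat this as an outline rather than a drop-in script.

```python
import boto3

dms = boto3.client("dms", region_name="us-east-1")  # region is an assumption

# Flatten one level of nesting on the MongoDB source endpoint (table mode).
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",  # placeholder
    MongoDbSettings={
        "NestingLevel": "one",
        "ExtractDocId": "true",
    },
)

# Tune CDC file batching on the S3 target endpoint.
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",  # placeholder
    S3Settings={
        "CdcMinFileSize": 32000,     # minimum CDC file size in KB before a flush
        "CdcMaxBatchInterval": 120,  # maximum seconds between CDC file flushes
    },
)
```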
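
For the segmentation idea in point 4, the sketch below applies auto-segmentation through table mappings, assuming a hypothetical mydb.orders collection. The exact parallel-load option names and values should be checked against the DMS documentation for MongoDB/DocumentDB sources before use.

```python
import json

import boto3

dms = boto3.client("dms", region_name="us-east-1")  # region is an assumption

# Hypothetical database and collection names; adjust to your source.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-orders",
            "object-locator": {"schema-name": "mydb", "table-name": "orders"},
            "rule-action": "include",
        },
        {
            "rule-type": "table-settings",
            "rule-id": "2",
            "rule-name": "segment-orders",
            "object-locator": {"schema-name": "mydb", "table-name": "orders"},
            "parallel-load": {
                "type": "partitions-auto",          # auto-segmentation of the collection
                "number-of-partitions": 8,
                "collection-count-from-metadata": "true",
                "max-records-skip-per-page": 1000000,
                "batch-size": 50000,
            },
        },
    ]
}

dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE",  # placeholder
    TableMappings=json.dumps(table_mappings),
)
```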
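
For point 5, a minimal monitoring-and-retry loop might look like the following, again with a placeholder task ARN. CloudWatch metrics and alarms (for example, on CDC latency) can complement this kind of polling.

```python
import time

import boto3

dms = boto3.client("dms", region_name="us-east-1")  # region is an assumption
task_arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"  # placeholder ARN


def task_status(arn: str) -> str:
    """Return the current status of a single replication task."""
    resp = dms.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [arn]}]
    )
    return resp["ReplicationTasks"][0]["Status"]


# Simple poll-and-retry loop: if the task fails, reload the target and resume.
while True:
    status = task_status(task_arn)
    print("task status:", status)
    if status == "failed":
        dms.start_replication_task(
            ReplicationTaskArn=task_arn,
            StartReplicationTaskType="reload-target",
        )
    elif status == "stopped":
        break
    time.sleep(300)  # check every five minutes
```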
EXPERT
answered 2 months ago
