The issue you're experiencing with BSON size exceeding 16 MB during the Change Data Capture (CDC) phase of your AWS Database Migration Service (DMS) task from MongoDB to Amazon S3 is a known limitation. MongoDB has a maximum BSON document size of 16 MB, and when DMS encounters documents larger than this limit, it can cause the task to fail.
To address this issue and improve the reliability of your pipeline, consider the following approaches:
- Reduce document size: If possible, review your MongoDB data model and see if you can reduce the size of individual documents. This might involve splitting large documents into smaller, related documents.
- Use LOB settings: Adjust the LOB (Large Object) settings in your DMS task. You mentioned using a LOB size of 10000 KB (about 10 MB). Try increasing this to the maximum allowed value, 102400 KB (100 MB), which may help DMS handle larger documents more effectively.
- Modify target endpoint settings: Adjust your S3 target endpoint settings to optimize for larger files and less frequent writes. Consider increasing `CdcMinFileSize` to a larger value, such as 64000 KB (64 MB), and increasing `CdcMaxBatchInterval` to 3600 seconds (1 hour). This reduces the number of small files created in S3 and can relieve some pressure on the migration process.
- Use table mapping to filter: If the large documents are concentrated in specific collections, you can use table mapping rules in your DMS task to exclude those collections or filter out large documents.
- Split the migration: Consider splitting your migration into multiple tasks, separating collections with large documents from those with smaller ones. This helps isolate the problem and lets you migrate most of your data successfully.
- Increase replication instance size: If you haven't already, try using a larger replication instance. This provides more resources for the migration process and can alleviate some issues related to large document sizes.
- Monitor and adjust: Closely monitor your task's performance using CloudWatch metrics. Look for indicators of latency or resource constraints, and adjust your settings accordingly.
If these approaches don't resolve the issue, you may need to consider alternative migration strategies, such as using MongoDB's native tools to export data and then importing it into S3, or using a custom script to handle the migration of large documents separately.
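If you take the custom-script route, one simple pattern is to split each oversized document by slicing its largest embedded array into several smaller chunk documents before loading. A minimal sketch in plain Python (the field names and chunking scheme are illustrative, not part of DMS or MongoDB):

```python
def split_large_document(doc, array_field, chunk_size=1000):
    """Split one oversized document into smaller chunk documents by
    slicing a large embedded array into fixed-size pieces.

    Non-array fields are copied onto every chunk, and a chunk_index
    field records the original order so the pieces can be reassembled
    downstream.
    """
    items = doc.get(array_field, [])
    base = {k: v for k, v in doc.items() if k != array_field}
    chunks = []
    for i in range(0, len(items), chunk_size):
        chunk = dict(base)
        chunk["chunk_index"] = i // chunk_size
        chunk[array_field] = items[i:i + chunk_size]
        chunks.append(chunk)
    return chunks

# Example: a document with 2500 embedded events becomes 3 chunks.
doc = {"_id": "order-1", "status": "open", "events": list(range(2500))}
parts = split_large_document(doc, "events")
```

How the chunk documents get new `_id` values (and how they are merged back after migration) depends on your pipeline; the sketch deliberately leaves the original `_id` duplicated across chunks so the relationship stays visible.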
Remember to test any changes in a non-production environment before applying them to your main migration pipeline.
Sources
Step 7: Run the AWS DMS Task - Database Migration Guide
Step-By-Step Migration - Database Migration Guide
The "BSON Size exceeds 16 MB" error happens because MongoDB enforces a 16 MB document size limit, and AWS DMS fails when it hits records at or near that limit. A few adjustments can help:

- Enable document filtering in your source settings to skip documents larger than 16 MB.
- Lower LobChunkSize (for example, to 64 KB) so large objects are broken into smaller chunks.
- Set NestingLevel to "one" to reduce deep nesting.
- Tune CDC behavior by lowering CdcBatchSize and CdcMaxBatchInterval to avoid memory overload.
- If you are writing Parquet to S3, try switching the target format to JSON for more flexibility.
- Alternatively, split large MongoDB documents manually before migration, for example with the $unwind aggregation stage.

These changes will improve pipeline stability and prevent failures. If the issue persists, consider restructuring how data is stored in MongoDB to keep document sizes manageable. By making these adjustments, you can continue migrating MongoDB to S3 without interruptions. Let me know if you need further assistance!
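On the source side, the nesting behavior mentioned above is configured in the MongoDB endpoint settings. A minimal fragment, assuming table mode; `DocsToInvestigate` (the number of documents DMS samples to infer the table structure) is shown with its typical default, and any other source settings named in this answer should be checked against the DMS documentation before use:

```json
{
  "NestingLevel": "one",
  "DocsToInvestigate": "1000"
}
```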