Yes, the migration using the Amazon OpenSearch Ingestion pipeline can capture delete operations from the source domain when properly configured.
When using the AWSOpenSearchDataMigrationPipeline blueprint for migrating from an OpenSearch domain to an OpenSearch Serverless collection, the pipeline can be set up to handle not just new and updated data, but also deletions.
The key to capturing delete operations lies in how you configure the pipeline, particularly the use of DynamoDB Streams or a similar change data capture (CDC) mechanism. When properly set up, the pipeline can detect and propagate delete operations from the source to the destination.
For ongoing replication after the initial data migration, the pipeline can be configured to consume changes from a stream or CDC source, which would include create, update, and delete events. This ensures that deletions in the source domain are reflected in the target OpenSearch Serverless collection.
It's important to note that while multiple scans can help capture new or updated data, the ability to capture deletes in real-time typically relies on a continuous replication mechanism rather than periodic scans. To ensure that deletions are properly captured and reflected in your migrated data, you should configure the pipeline to use a streaming or CDC approach in addition to any full or incremental scans you might perform.
When setting up your migration pipeline, make sure to carefully review and configure the source and sink settings to ensure that all types of data changes, including deletions, are properly captured and replicated to your target OpenSearch Serverless collection.
Sources
dynamodb table clone into opensearch | AWS re:Post
Migrating existing data from DDB to ES Domain | AWS re:Post
Migrating data from self-managed OpenSearch clusters using Amazon OpenSearch Ingestion - Amazon OpenSearch Service
Hello Naga,
Greetings from AWS!
I can confirm that deletions in the source domain are not captured or replicated in the migrated data, even when multiple scans are configured.
The migration pipeline, implemented through the AWSOpenSearchDataMigrationPipeline pipeline blueprint, is specifically designed as a migration solution rather than a continuous replication tool. While the pipeline is capable of handling updates to existing documents through multiple scans, as mentioned in the documentation, it does not track or apply deletions from the source to the destination cluster.
Each new scan performs a complete reprocessing of the entire index, but documents that have been deleted in the source domain will continue to persist in the destination collection unless manually removed. This behavior is intentional and aligns with the tool's primary purpose of facilitating data transfer rather than maintaining a fully synchronized state between source and destination.
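One way to handle the manual removal mentioned above is to compare document IDs on both sides and delete whatever remains only in the destination. The sketch below is illustrative and not part of the blueprint: the helper names `find_stale_ids` and `build_bulk_delete_body` are hypothetical, and it assumes you have already collected the ID lists yourself (for example by scrolling each index) and that your collection accepts delete actions through the `_bulk` API, which you should verify for your collection type.

```python
import json
from typing import Iterable, List


def find_stale_ids(source_ids: Iterable[str], destination_ids: Iterable[str]) -> List[str]:
    """Return IDs that exist in the destination but no longer exist in the source."""
    source = set(source_ids)
    # Deduplicate the destination IDs and sort for a stable, reviewable result.
    return sorted(doc_id for doc_id in set(destination_ids) if doc_id not in source)


def build_bulk_delete_body(index: str, stale_ids: Iterable[str]) -> str:
    """Build an NDJSON request body with one bulk delete action per stale ID."""
    lines = [
        json.dumps({"delete": {"_index": index, "_id": doc_id}})
        for doc_id in stale_ids
    ]
    # The bulk format expects a trailing newline after the last action.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical ID lists; in practice these come from the source domain
    # and the destination collection.
    stale = find_stale_ids(["a", "b"], ["a", "b", "c", "d"])
    print(build_bulk_delete_body("movies", stale), end="")
```

The generated body would then be sent to the destination's `_bulk` endpoint with whatever signed HTTP client you already use. Reviewing the stale ID list before sending it is a sensible precaution, since a partial source scan would make live documents look deleted.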
When planning your migration strategy, it's important to account for this limitation and implement additional procedures if deletion synchronization is crucial for your use case. For comprehensive information about pipeline configuration and behavior, you can refer to the official OpenSearch documentation:
- Data Prepper Pipelines Overview
- OpenSearch Source Configuration
- OpenSearch Sink Configuration
I hope the above information is helpful. If you have any questions, feel free to write back.
Thank you and have a great day ahead!
Thanks for the response, Tanya_D. Is there any recommendation or tutorial for configuring a stream or CDC source in the pipeline? You mentioned:
For ongoing replication after the initial data migration, the pipeline can be configured to consume changes from a stream or CDC source, which would include create, update, and delete events. This ensures that deletions in the source domain are reflected in the target OpenSearch Serverless collection.
Can you share more information about the streaming or CDC approach? Is there a tutorial I can refer to?