Do source filters speed up DMS from AWS RDS to S3?

0

Hello,

I have a nightly DMS pipeline setup to move data from an AWS Aurora RDS MySql instance to S3. The DMS process has gotten longer and longer. I only use a fraction of the rows that are migrated, and I'm wondering if using a source filter would speed up the DMS process tremendously. Is the source filter applied on the MySql database (hence speed depend on MySql indexing), or is it applied after the data is already migrated?

Thanks for your help!

3 Answers
0

Yes, if you only want fraction of the data to target, you can let DMS know to extract the only data that you need at target. This may help in your use case. Depending upon what those data are, you can have filter in your DMS task. DMS will only read the data from source that is needed at target and in your DMS task setting to extract. Generally DMS read the data from source based on your primary key of the table in chunks. [+] https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.Filters.html

If you want only specific tables, you can mentioned that in DMS task instead of whole schema/database to migrate to s3. This will speed up overall task performance. If your task is FULL LOAD only task on daily basis, you can set it up as full_load+CDC task so that ongoing changes only get replicated to target instead of doing full load every day.

Bakul_R
answered 2 years ago
  • Do you have a suggestion for a guide to implementing CDC tasks with S3? I understand it can be done with Redshift, and I have seen external products handling CDC to S3, but nothing about setting it up just in AWS. Thanks again!

0

Sure. I hope below may help. Depending upon your source, you can refer that "source engine" as source document in AWS DMS repo.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html

Bakul_R
answered 2 years ago
0

Yes, adding a source filter will definitely speed up your pipeline because right now DMS does is performing a "SELECT * FROM TABLE" on your MySQL database and the data is getting pumped with a default batch size of 10,0000 records. Once you add a source/column filter, DMS will add a WHERE clause and if your criteria is selective to the point where an index can be used, then it could be significantly faster. You can validate all of this by enabling logging on your DMS task, and in the "Source Unload" logging section set the value to DEBUG. After you run the task, you can go to CloudWatch logs , search the logs for "SOURCE_UNLOAD" keyword and see the actual SQL statement that DMS ran on your MySQL database.

AWS
Peter_S
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions