We just created a matching job in AWS Entity Resolution. For our initial testing we used a sample dataset of 2781 records, and the job took about 807 seconds to finish.

Assuming we are satisfied with the output, we are going to run this on a dataset with 30M records. At this pace (about 0.29 seconds per record), it would take about 100 days if I am not mistaken. Do you think Entity Resolution will keep running at this pace, or are there any optimization steps we can take?

Are there any estimates of processing duration for machine learning matching versus rule-based matching?

Note: we are storing our data in S3 -> crawled by AWS Glue -> AWS Entity Resolution matching jobs -> output to S3

Matching time in AWS Entity Resolution does not scale linearly with dataset size. In testing, a dataset with <5K records took around 6-8 minutes and a dataset with 1M records took around 20-30 minutes. I expect a dataset with 30M records to take no longer than a few hours to complete.
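
To see why extrapolating from the small sample is misleading, here is a rough sketch of the arithmetic using only the figures quoted in this thread. The function name is just for illustration. A straight linear extrapolation of the 2781-record run works out to roughly 100 days for 30M records, and the same assumption would predict about 3.4 days for 1M records, which is far from the 20-30 minutes seen in testing.

```python
SECONDS_PER_DAY = 86_400


def linear_extrapolation_days(sample_records, sample_seconds, target_records):
    """Naive estimate: assume run time grows in proportion to record count."""
    seconds_per_record = sample_seconds / sample_records
    return seconds_per_record * target_records / SECONDS_PER_DAY


# Figures from the question: 2781 records took 807 seconds.
naive_30m_days = linear_extrapolation_days(2781, 807, 30_000_000)

# Same linear assumption applied to 1M records, for comparison with
# the 20-30 minutes observed in testing.
naive_1m_days = linear_extrapolation_days(2781, 807, 1_000_000)

print(f"Linear estimate for 30M records: {naive_30m_days:.1f} days")
print(f"Linear estimate for 1M records:  {naive_1m_days:.1f} days (observed: 20-30 minutes)")
```

The gap between the linear estimate and the observed 1M run suggests that most of the time on a small job is fixed overhead rather than per-record work, so the sample run is not a good basis for projecting the 30M job.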

