How long will a matching job take for 30M records?


We just created a matching job on AWS Entity Resolution. For our initial testing we used a sample dataset of 2,781 records. The job took about 807 seconds to finish.

Assuming we are satisfied with the output, we are going to run this on a dataset with 30M records. At this pace, it would take about 100 days if I am not mistaken (807 s / 2,781 records × 30M records ≈ 8.7M seconds). Do you think Entity Resolution will keep running at this pace, or are there optimization steps we can take?
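For reference, the linear extrapolation behind that kind of estimate can be sketched in Python as follows. The constant and function names are made up for illustration, and the calculation assumes run time grows in direct proportion to record count, which may well not hold for the service.

```python
# Figures from the test run described above.
SAMPLE_RECORDS = 2_781
SAMPLE_SECONDS = 807
TARGET_RECORDS = 30_000_000
SECONDS_PER_DAY = 86_400


def linear_estimate_seconds(sample_records, sample_seconds, target_records):
    """Scale the sample run time proportionally to the target record count.

    Assumption: strictly linear scaling, i.e. constant seconds per record.
    """
    seconds_per_record = sample_seconds / sample_records
    return seconds_per_record * target_records


estimate_seconds = linear_estimate_seconds(
    SAMPLE_RECORDS, SAMPLE_SECONDS, TARGET_RECORDS
)
estimate_days = estimate_seconds / SECONDS_PER_DAY
print(f"Linear estimate: {estimate_days:.1f} days")
```

Under that strictly linear assumption the estimate comes out at roughly 100 days, which is why it matters whether the service really scales this way.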

Are there any estimates of processing time for machine learning matching and rule-based matching?

Note: our pipeline is S3 (source data) -> AWS Glue crawler -> AWS Entity Resolution matching job -> S3 (output).

1 Answer
Accepted Answer


Matching time in AWS Entity Resolution does not scale linearly with dataset size. In testing, a dataset with fewer than 5K records took around 6-8 minutes, and a dataset with 1M records took around 20-30 minutes. I expect a dataset with 30M records to take no longer than a few hours to complete.
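To illustrate why those two test results point away from a linear estimate, here is a rough Python sketch that fits a power law, t = a * n^b, through them and extrapolates to 30M records. The power-law shape is purely an assumption for illustration and is not something the service documents. The inputs are midpoints of the ranges quoted above (5K records at 7 minutes, 1M records at 25 minutes), and the function names are hypothetical.

```python
import math


def fit_power_law(n1, t1, n2, t2):
    """Fit t = a * n**b exactly through two (records, minutes) points."""
    exponent = math.log(t2 / t1) / math.log(n2 / n1)
    coefficient = t1 / n1 ** exponent
    return coefficient, exponent


def predict_minutes(coefficient, exponent, records):
    """Evaluate the fitted curve at a given record count."""
    return coefficient * records ** exponent


# Assumed midpoints of the ranges quoted in the answer above.
coefficient, exponent = fit_power_law(5_000, 7.0, 1_000_000, 25.0)
estimate_minutes = predict_minutes(coefficient, exponent, 30_000_000)

print(f"Fitted exponent: {exponent:.2f} (1.0 would mean linear scaling)")
print(f"Extrapolated time for 30M records: {estimate_minutes:.0f} minutes")
```

With these inputs the fitted exponent is well below 1 and the extrapolation lands at around an hour, consistent with the "no longer than a few hours" expectation. Two data points cannot validate the curve, so treat this as a sanity check rather than a prediction, and time a mid-sized run on your own data before committing to the full dataset.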

AWS
answered 8 months ago
