How long will a matching job take for 30M records?


We just created a matching job in AWS Entity Resolution. For our initial testing we used a sample dataset of 2,781 records, and the job took about 807 seconds to finish.

Assuming we are satisfied with the output, we plan to run this job on a dataset with 30M records. At this pace it would take roughly 100 days, if I am not mistaken (807 s ÷ 2,781 records × 30M records ≈ 8.7M s). Do you think AWS Entity Resolution will keep running at this pace, or are there optimization steps we can take?
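For reference, a strictly proportional extrapolation from the single 2,781-record test run can be sketched in Python as below. It assumes run time grows in direct proportion to record count, which is exactly the assumption in question here; the constants come from the test run described above, and the function name is only illustrative.

```python
# Naive linear extrapolation from the single test run (assumes time ∝ record count).
SAMPLE_RECORDS = 2_781
SAMPLE_SECONDS = 807
TARGET_RECORDS = 30_000_000


def linear_estimate_seconds(sample_records: int, sample_seconds: float, target_records: int) -> float:
    """Scale the observed run time proportionally to the number of records."""
    seconds_per_record = sample_seconds / sample_records
    return seconds_per_record * target_records


if __name__ == "__main__":
    est = linear_estimate_seconds(SAMPLE_RECORDS, SAMPLE_SECONDS, TARGET_RECORDS)
    print(f"{est:,.0f} s ≈ {est / 86_400:.1f} days")
```

Run as-is, it reports roughly 8.7 million seconds, or about 101 days.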

Are there any estimates of processing time for machine learning-based matching versus rule-based matching?

Note: our pipeline is: data stored in S3 -> crawled by AWS Glue -> AWS Entity Resolution matching job -> output to S3.

1 Answer

Accepted Answer

Hello,

Matching run time in AWS Entity Resolution does not scale linearly with dataset size. In testing, a dataset with fewer than 5K records took around 6-8 minutes, and a dataset with 1M records took around 20-30 minutes. I would expect a dataset with 30M records to take no longer than a few hours to complete.
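The point that run time is not proportional to record count can be illustrated by fitting two simple models to the approximate timings quoted above, taking the midpoint of each range (7 minutes for about 5K records, 25 minutes for 1M records). This is a hypothetical Python sketch, not an AWS-published performance model: each curve is fitted to only two rough data points, and real durations will depend on the matching technique, rule complexity, and how the service parallelizes the job. The model names and constants are illustrative assumptions.

```python
import math

# Approximate timings from the answer, using range midpoints (assumption):
# ~5K records -> 6-8 min, 1M records -> 20-30 min.
SMALL_N, SMALL_MIN = 5_000, 7.0
LARGE_N, LARGE_MIN = 1_000_000, 25.0
TARGET_N = 30_000_000


def overhead_plus_linear(n: int) -> float:
    """Minutes under t = c + k*n: fixed startup cost c plus constant per-record cost k."""
    k = (LARGE_MIN - SMALL_MIN) / (LARGE_N - SMALL_N)
    c = SMALL_MIN - k * SMALL_N
    return c + k * n


def power_law(n: int) -> float:
    """Minutes under t = a * n**b, fitted through both points in log-log space."""
    b = math.log(LARGE_MIN / SMALL_MIN) / math.log(LARGE_N / SMALL_N)
    a = SMALL_MIN / SMALL_N ** b
    return a * n ** b


if __name__ == "__main__":
    for name, model in [("overhead + linear", overhead_plus_linear), ("power law", power_law)]:
        print(f"{name}: ~{model(TARGET_N) / 60:.1f} h for {TARGET_N:,} records")
```

The two extrapolations land at roughly 1 hour (power law) and roughly 9 hours (overhead plus linear), which shows how sensitive any 30M-record estimate is to the assumed scaling. Under the overhead-plus-linear model, almost all of a small job's time is fixed startup cost, so timing a few-thousand-record sample says little about the per-record cost of a large run.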

Chris_G (AWS Expert)
Answered 7 months ago
