How long will a matching job take for 30M records?


We just created a matching job in AWS Entity Resolution. For our initial testing we used a sample dataset of 2,781 records, and the job took about 807 seconds to finish.

Assuming we are satisfied with the output, we are going to run this on a dataset with 30M records. At this pace (807 s / 2,781 records ≈ 0.29 s per record), it would take about 100 days, if I am not mistaken. Do you think Entity Resolution will keep running at this pace, or are there any optimization steps we can take?
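Worked through explicitly, a strictly linear extrapolation from the sample run (807 seconds for 2,781 records) looks like the sketch below. It assumes run time grows in exact proportion to record count, which may not hold for a managed service; the variable names are only illustrative.

```python
# Figures taken from the sample run described above.
sample_records = 2_781
sample_seconds = 807
target_records = 30_000_000

# Assumption: cost per record stays constant at every dataset size.
seconds_per_record = sample_seconds / sample_records
estimated_seconds = seconds_per_record * target_records
estimated_days = estimated_seconds / 86_400  # 86,400 seconds per day

print(f"{seconds_per_record:.3f} s per record")
print(f"about {estimated_days:.1f} days for {target_records:,} records")
```

Under that assumption the estimate comes out at roughly 100 days, which is why the scaling behaviour matters so much here.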

Are there any estimates of processing duration for machine learning matching and rule-based matching?

Note: we are storing our data in S3 -> crawled by AWS Glue -> AWS Entity Resolution matching jobs -> output to S3

jp
asked 7 months ago, 249 views
1 Answer
Accepted Answer

Hello,

Matching time in AWS Entity Resolution does not scale linearly with dataset size. In testing, a dataset with fewer than 5K records took around 6-8 minutes, and a dataset with 1M records took around 20-30 minutes. I expect a dataset with 30M records to take no longer than a few hours to complete.
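To illustrate how far this is from linear scaling, here is a rough sketch that fits a power-law curve through the two test results above and extrapolates to 30M records. This is not an AWS-provided performance model: the power-law form, the use of range midpoints (7 and 25 minutes), and treating "fewer than 5K" as 5,000 records are all assumptions made for illustration, and two data points cannot validate the curve.

```python
import math

# Approximate data points from the tests above (midpoints of the stated ranges).
# Assumption: "fewer than 5K records" is treated as 5,000 records.
observed = [(5_000, 7.0), (1_000_000, 25.0)]  # (records, minutes)


def fit_power_law(p1, p2):
    """Fit minutes = a * records**b exactly through two points."""
    (n1, t1), (n2, t2) = p1, p2
    b = math.log(t2 / t1) / math.log(n2 / n1)
    a = t1 / n1 ** b
    return a, b


def estimate_minutes(records, a, b):
    """Evaluate the fitted curve at a given record count."""
    return a * records ** b


a, b = fit_power_law(*observed)
estimate = estimate_minutes(30_000_000, a, b)

# An exponent of 1.0 would mean linear scaling; well below 1.0 is sub-linear.
print(f"fitted exponent: {b:.2f}")
print(f"rough estimate for 30M records: {estimate:.0f} minutes")
```

With these assumed inputs the fitted exponent is well below 1 and the extrapolation lands at around an hour, which is in line with the "no longer than a few hours" expectation. Actual run time will depend on the workflow type and the data.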

AWS Expert
Chris_G
answered 7 months ago
