How long will a matching job take for 30M records?


We just created a matching job in AWS Entity Resolution. For initial testing we used a sample dataset of 2,781 records, and the job took about 807 seconds (roughly 13.5 minutes) to finish.

Assuming we are satisfied with the output, we plan to run this on a dataset of 30M records. At this pace (about 0.29 seconds per record), it would take roughly 100 days if the duration scaled linearly. Will Entity Resolution keep running at this pace, or are there optimization steps we can take?
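The linear estimate in the question can be checked with a few lines of arithmetic. This is a naive sketch that assumes duration grows in direct proportion to record count; it only reproduces the back-of-the-envelope figure and says nothing about how Entity Resolution actually scales.

```python
# Naive linear extrapolation: assumes duration is proportional to record count.
SAMPLE_RECORDS = 2_781
SAMPLE_SECONDS = 807
TARGET_RECORDS = 30_000_000


def linear_estimate_seconds(sample_records: int, sample_seconds: float, target_records: int) -> float:
    """Scale the observed duration by the ratio of target to sample record counts."""
    return sample_seconds * target_records / sample_records


seconds = linear_estimate_seconds(SAMPLE_RECORDS, SAMPLE_SECONDS, TARGET_RECORDS)
days = seconds / 86_400  # seconds per day
print(f"{SAMPLE_SECONDS / SAMPLE_RECORDS:.3f} s/record -> {seconds:,.0f} s ~ {days:.1f} days")
# prints: 0.290 s/record -> 8,705,502 s ~ 100.8 days
```

Under this assumption the total comes to roughly 100 days, which is why it matters whether per-record cost really stays constant as the dataset grows.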

Are there estimates of processing duration for machine-learning-based matching and rule-based matching?

Note: our pipeline is data stored in S3 -> crawled by AWS Glue -> AWS Entity Resolution matching job -> output to S3.
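Rather than extrapolating from a single small sample, one option is to time the same workflow on a few intermediate sizes (say 100K and 1M records) and compare. The sketch below assumes boto3's `entityresolution` client and its `start_matching_job` / `get_matching_job` calls; the response fields used (`jobId`, `status`, `startTime`, `endTime`, `metrics`) are taken from memory of the API reference and should be double-checked there. The client is passed in so the helpers can be exercised without AWS credentials.

```python
import time

# Statuses assumed (per the GetMatchingJob API) to mean the job is still in progress.
IN_PROGRESS = ("QUEUED", "RUNNING")


def wait_for_matching_job(client, workflow_name: str, job_id: str, poll_seconds: float = 30.0) -> dict:
    """Poll get_matching_job until the job leaves QUEUED/RUNNING; return the final response."""
    while True:
        job = client.get_matching_job(workflowName=workflow_name, jobId=job_id)
        if job["status"] not in IN_PROGRESS:
            return job
        time.sleep(poll_seconds)


def job_duration_seconds(job: dict) -> float:
    """Wall-clock duration computed from the job's startTime/endTime datetimes."""
    return (job["endTime"] - job["startTime"]).total_seconds()


def run_and_time(client, workflow_name: str, poll_seconds: float = 30.0) -> tuple[str, float, dict]:
    """Start a matching job, wait for it to finish, and return (status, seconds, metrics)."""
    job_id = client.start_matching_job(workflowName=workflow_name)["jobId"]
    job = wait_for_matching_job(client, workflow_name, job_id, poll_seconds)
    return job["status"], job_duration_seconds(job), job.get("metrics", {})
```

Usage would look like `run_and_time(boto3.client("entityresolution"), "<your-workflow-name>")`, where the workflow name is a placeholder. Recording the duration alongside the record counts in `metrics` for each test size gives real data points instead of a single-sample extrapolation.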

Accepted answer

Hello,

Matching time in AWS Entity Resolution does not scale linearly with dataset size. In testing, a dataset with fewer than 5K records took around 6-8 minutes, and a dataset with 1M records took around 20-30 minutes. I expect a dataset with 30M records to take no longer than a few hours.
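The figures above can be turned into throughput numbers to show why a linear extrapolation from a small sample overestimates. This is a minimal sketch that assumes the midpoints of the quoted ranges (7 and 25 minutes) and treats "<5K" as 5,000 records; two data points are too few to extrapolate reliably to 30M records, so no 30M estimate is computed here.

```python
import math

# (records, minutes): midpoints of the quoted ranges; 5,000 stands in for "<5K".
observations = [(5_000, 7.0), (1_000_000, 25.0)]


def records_per_second(records: int, minutes: float) -> float:
    """Average throughput over the whole job."""
    return records / (minutes * 60)


def scaling_exponent(p: tuple[int, float], q: tuple[int, float]) -> float:
    """Exponent k in duration ~ records**k between two points; k == 1 means linear scaling."""
    (n1, t1), (n2, t2) = p, q
    return math.log(t2 / t1) / math.log(n2 / n1)


for n, t in observations:
    print(f"{n:,} records: {records_per_second(n, t):.1f} records/s")
print(f"scaling exponent ~ {scaling_exponent(*observations):.2f}")
# prints:
# 5,000 records: 11.9 records/s
# 1,000,000 records: 666.7 records/s
# scaling exponent ~ 0.24
```

An exponent well below 1 suggests that fixed per-job overhead dominates small runs, so the pace measured on a few thousand records says little about the per-record cost of a much larger job.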

Answered by Chris_G (AWS Expert), 7 months ago
