Enforce FindIncrementalMatches in AWS Glue to use existing match_id


The use case is that we have an incremental source of data from which we need to identify the matching records. For this purpose, we are running Find Matches using AWS Glue 2.0.

Result after Find Matches is Run

When I run FindMatches on the initial source, the following result is generated. Note the match_id generated for each record:

Find Matches Result

Result after Find Incremental Matches is Run

When I run FindIncrementalMatches using the above FindMatches result as the existing source, the following result is generated with completely different match IDs:

Find Incremental Matches Result

Question

Is there a way to enforce FindIncrementalMatches in AWS Glue to use existing match_id while processing matches from an incremental source?

Important links

We've followed the steps described in the following link:

  1. AWS Blog - Incremental Data Matching
1 Answer
Accepted Answer

I understand that you want Glue to reuse the same match_id values in subsequent FindIncrementalMatches runs on incremental data.

Please note that match_id values are arbitrary identifiers. They act as labels for your data, and each one denotes a group of records that your ML transform predicts to be matches. With incremental data, the dataset changes over time as new or modified records are added, so the machine learning model has more candidate rows to consider when deciding the pairings. Because the labeling starts from scratch, new match_id values can be assigned, just as in the example you provided.

Unfortunately, Glue currently has no option to enforce reuse of previously generated match_id values in subsequent FindIncrementalMatches runs on incremental data.
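Because a match_id only labels a group, one possible workaround (outside of Glue, not a Glue feature) is to post-process each FindIncrementalMatches result and map its new groups back onto previously assigned IDs. The sketch below assumes every record has a stable primary key that appears in both the old and the new output. The function names `partition` and `relabel_match_ids` and the `new-` prefix for groups with no prior ID are hypothetical. It is plain Python to show the idea; the same logic could be expressed with DataFrame joins on the key column.

```python
from collections import Counter, defaultdict


def partition(labels):
    """Return the grouping induced by a record_id -> match_id mapping,
    ignoring the label values themselves."""
    groups = defaultdict(set)
    for record_id, match_id in labels.items():
        groups[match_id].add(record_id)
    return frozenset(frozenset(members) for members in groups.values())


def relabel_match_ids(previous, current):
    """Carry previously assigned match_ids over to a new run's groups.

    previous: record_id -> match_id from the earlier run
    current:  record_id -> match_id from the new incremental run
    Returns record_id -> match_id with the same grouping as `current`.
    """
    # Collect the members of each group produced by the new run.
    groups = defaultdict(list)
    for record_id, new_id in current.items():
        groups[new_id].append(record_id)

    result = {}
    used_old_ids = set()
    # Greedy pass in a deterministic order; each old ID is reused at most once.
    for new_id in sorted(groups, key=str):
        members = groups[new_id]
        # Count how many members of this new group carried each old ID.
        votes = Counter(previous[r] for r in members if r in previous)
        chosen = None
        # Prefer the old ID shared by the most members (ties broken by label).
        for old_id, _ in sorted(votes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
            if old_id not in used_old_ids:
                chosen = old_id
                break
        if chosen is None:
            # Hypothetical naming scheme for groups with no reusable old ID.
            chosen = f"new-{new_id}"
        used_old_ids.add(chosen)
        for record_id in members:
            result[record_id] = chosen
    return result


if __name__ == "__main__":
    previous = {1: 10, 2: 10, 3: 11}
    current = {1: 500, 2: 500, 3: 501, 4: 501, 5: 502}
    stable = relabel_match_ids(previous, current)
    print(stable)  # {1: 10, 2: 10, 3: 11, 4: 11, 5: 'new-502'}
    # The grouping is unchanged; only the labels differ.
    assert partition(stable) == partition(current)
```

The greedy pass is a simple choice, not an optimal assignment: if a previous group is split across several new groups, only the first new group (in sorted order) that claims it keeps the old ID, and the rest get fresh labels.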

AWS
Support Engineer
Shubh
answered 2 years ago
