Enforce FindIncrementalMatches in AWS Glue to use existing match_id

The use case is that we have an incremental source of data from which we need to identify the matching records. For this purpose, we are running Find Matches using AWS Glue 2.0.

Result after Find Matches is Run

When I run FindMatches on the initial source, the following result is generated for the source. Note the match_id generated for each record.

Find Matches Result

Result after Find Incremental Matches is Run

When I run FindIncrementalMatches using the above Find Matches result as the existing source, the following result is generated with completely different match IDs:

Find Incremental Matches Result

Question

Is there a way to enforce FindIncrementalMatches in AWS Glue to use existing match_id while processing matches from an incremental source?

Important links

We've followed the steps described in the following link:

  1. AWS Blog - Incremental Data Matching
1 Answer
Accepted Answer

I understand that you want Glue to reuse the same match_id values in subsequent FindIncrementalMatches runs on incremental data.

Please note that match_id values are arbitrary identifiers that act as labels for your data: they denote the records that your ML transform predicts to be matches. With incremental data, the dataset changes over time as new or modified records are added, so the machine learning model takes more candidate rows into consideration when deciding the pairing. It can therefore produce new match_id values, because it starts labeling from scratch, as in the example you provided. Unfortunately, as of now Glue has no option to enforce the previously generated match_id values in subsequent FindIncrementalMatches runs on incremental data.
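Since the match_id values are only labels, one possible workaround is to relabel the new output yourself after the run. The following is a minimal sketch, not a Glue feature. It assumes each record carries a stable key (for example a primary key column) and that you have both results as plain key-to-match_id mappings. The function name `remap_match_ids` and the `new-` prefix are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def remap_match_ids(old_ids, new_ids):
    """Relabel new match_ids with old ones where the clusters overlap.

    old_ids, new_ids: dicts mapping a stable record key to its match_id.
    Returns a dict with the same keys as new_ids.
    """
    # For each new match_id, count which old match_ids its records had.
    votes = defaultdict(Counter)
    for key, new_id in new_ids.items():
        if key in old_ids:
            votes[new_id][old_ids[key]] += 1

    # Consider the largest overlaps first, so the biggest shared
    # cluster keeps the old identifier.
    candidates = sorted(
        (
            (count, new_id, old_id)
            for new_id, counter in votes.items()
            for old_id, count in counter.items()
        ),
        key=lambda item: -item[0],
    )

    mapping = {}
    taken = set()
    for _count, new_id, old_id in candidates:
        # Each old match_id is handed to at most one new cluster.
        if new_id not in mapping and old_id not in taken:
            mapping[new_id] = old_id
            taken.add(old_id)

    # Clusters with no old counterpart get a prefixed label (an assumption
    # of this sketch) so they cannot collide with a reused old match_id.
    return {
        key: mapping.get(new_id, f"new-{new_id}")
        for key, new_id in new_ids.items()
    }


old = {"a": 1, "b": 1, "c": 2}
new = {"a": 10, "b": 10, "c": 11, "d": 11, "e": 12}
result = remap_match_ids(old, new)
```

Here records "a", "b" and "c" keep their earlier match_id, "d" joins the cluster of "c", and "e" gets a fresh label. If an old cluster is split or two old clusters are merged by the new run, only the largest overlap keeps the old identifier, so you would need to decide how to handle those cases for your data.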

SUPPORT ENGINEER
answered 15 days ago
