I understand that you want Glue to reuse the same match_id values across subsequent FindIncrementalMatches runs on incremental data.
Please note that match_id values are arbitrary identifiers. They act as labels on your data and indicate which records your ML transform predicts to be matches. With incremental data, the dataset changes over time as new or modified records are added, so the machine learning model has more candidate rows to consider when deciding on pairings. Because labeling starts from scratch on each run, you can get new match_id values, as in the example you provided. Unfortunately, Glue currently has no option to make subsequent FindIncrementalMatches runs on incremental data reuse previously generated match_id values.
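Because match_id values are only labels, two runs can agree on which records belong together while using different IDs. A minimal sketch in plain Python (not a Glue API) that compares runs by their record groupings instead of their label values. It assumes you have already pulled each run's output into a dict of record primary key to match_id; the helper names and record IDs are hypothetical. Only records present in both runs are compared, since incremental runs add new rows.

```python
from collections import defaultdict


def groups_by_match_id(assignments):
    """Collect record IDs into sets keyed by match_id.

    `assignments` is a hypothetical dict: record primary key -> match_id.
    """
    groups = defaultdict(set)
    for record_id, match_id in assignments.items():
        groups[match_id].add(record_id)
    return groups


def same_grouping(run_a, run_b):
    """True if both runs group their shared records identically.

    The match_id values themselves are ignored; only which records
    share a label matters.
    """
    common = run_a.keys() & run_b.keys()  # records seen by both runs
    restricted_a = {k: run_a[k] for k in common}
    restricted_b = {k: run_b[k] for k in common}
    partition_a = {frozenset(g) for g in groups_by_match_id(restricted_a).values()}
    partition_b = {frozenset(g) for g in groups_by_match_id(restricted_b).values()}
    return partition_a == partition_b


# Hypothetical output of two runs: record primary key -> match_id.
first_run = {"r1": 0, "r2": 0, "r3": 1, "r4": 2}
second_run = {"r1": 5, "r2": 5, "r3": 9, "r4": 4}
print(same_grouping(first_run, second_run))  # True: same groups, different labels
```

Comparing sets of frozensets makes the check independent of both label values and ordering, which matches the point that a match_id only denotes membership in a predicted match group.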
Thank you for the response. I have a follow-up question: FindIncrementalMatches has a field with the key "enforcedMatches". What is the purpose of this field?
It is mentioned in "https://docs.aws.amazon.com/glue/latest/dg/glue-etl-scala-apis-glue-ml-findincrementalmatches.html", but there is no implementation example for it.