Utilize Glue ETL for mapping misspelled information to existing data sets


I am looking to ingest upstream files via Glue ETL, and I need to match misspellings to existing, already standardized data based on rules that I can either keep adding to or use to train a model, and then add the results to my database. It is essentially a continuously growing set of reference tables. All of this is currently done by hand in Excel for multiple columns/fields, and I need to automate it.
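One way to picture the "continuously growing reference table(s)" is a single table with one row per known variant, keyed by field, that each run loads and writes back. A minimal sketch, assuming a CSV layout with columns `field`, `variant`, `canonical` (a hypothetical layout; the `color` field and its values are invented for illustration). In a Glue job the file might live in S3; here an in-memory string stands in for it.

```python
import csv
import io

# Hypothetical reference-table layout: one row per known variant, keyed by field.
# An in-memory string stands in for a file that a real job might keep in S3.
REFERENCE_CSV = """field,variant,canonical
animal,tigre,tiger
color,gren,green
"""


def load_reference(text: str) -> dict[str, dict[str, str]]:
    """Build {field: {variant: canonical}} from CSV text."""
    table: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        table.setdefault(row["field"], {})[row["variant"]] = row["canonical"]
    return table


def dump_reference(table: dict[str, dict[str, str]]) -> str:
    """Serialize the table back to CSV, sorted so diffs between runs stay readable."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["field", "variant", "canonical"])
    for field in sorted(table):
        for variant in sorted(table[field]):
            writer.writerow([field, variant, table[field][variant]])
    return out.getvalue()


reference = load_reference(REFERENCE_CSV)
reference.setdefault("color", {})["blu"] = "blue"  # a newly approved mapping
print(dump_reference(reference))
```

Keeping every field in one long table means a new column only needs new rows, not a new table or schema change.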

General Example: I already have a list of known matches (e.g., "tigre" = "tiger"), so any field that has "tigre" should map to the proper spelling without any additional steps.
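For the known-match case, a plain dictionary lookup is enough. A minimal sketch, assuming a light normalization step (trim and lowercase) before the lookup; `KNOWN_MATCHES`, `normalize`, and `standardize` are hypothetical names. Values that are not known variants pass through unchanged.

```python
# Hypothetical reference table of known variants for one field.
KNOWN_MATCHES = {"tigre": "tiger"}


def normalize(value: str) -> str:
    """Trim whitespace and lowercase so ' Tigre' and 'tigre' hit the same key."""
    return value.strip().lower()


def standardize(value: str, known: dict[str, str]) -> str:
    """Return the canonical spelling for a known variant; otherwise return the value unchanged."""
    return known.get(normalize(value), value)


column = ["tigre", " Tigre", "tiger", "tigerrr"]
print([standardize(v, KNOWN_MATCHES) for v in column])
# ['tiger', 'tiger', 'tiger', 'tigerrr']
```

"tigerrr" is left alone here because it is not yet in the table; that is exactly the gap the training step has to fill.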

I believe I need a training step for values that don't already have a match, so that when the spelling "tigerrr" comes along I can map it to "tiger", and the next time the process runs the mapping happens automatically.
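The training step could start out as plain string similarity rather than a learned model. A minimal sketch using Python's standard-library `difflib.get_close_matches` (a swapped-in technique, not something Glue provides): known variants are exact hits, new values get a suggested canonical value above a similarity cutoff, and once a suggestion is approved, `learn` records it so the next run is an exact hit. The canonical list, the 0.8 cutoff, and the function names are all assumptions.

```python
import difflib
from typing import Optional

CANONICAL = {"tiger", "lion", "zebra"}  # hypothetical standardized values
KNOWN = {"tigre": "tiger"}              # hypothetical reference table of known variants


def suggest(value: str, known: dict[str, str], canonical: set[str],
            cutoff: float = 0.8) -> tuple[Optional[str], str]:
    """Return (canonical value or None, how it was found)."""
    key = value.strip().lower()
    if key in known:
        return known[key], "known"          # exact hit in the reference table
    if key in canonical:
        return key, "canonical"             # already the standard spelling
    matches = difflib.get_close_matches(key, sorted(canonical), n=1, cutoff=cutoff)
    if matches:
        return matches[0], "suggested"      # candidate for review/approval
    return None, "unmatched"                # nothing close enough: manual mapping


def learn(value: str, canonical_value: str, known: dict[str, str]) -> None:
    """Store an approved mapping so the next run resolves it without fuzzy matching."""
    known[value.strip().lower()] = canonical_value


print(suggest("tigerrr", KNOWN, CANONICAL))  # ('tiger', 'suggested')
learn("tigerrr", "tiger", KNOWN)
print(suggest("tigerrr", KNOWN, CANONICAL))  # ('tiger', 'known')
```

Keeping `suggest` and `learn` separate leaves room for a human approval step in between, so a bad fuzzy match is not written back into the reference table automatically.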

I looked into DataBrew, but it does not appear to be able to handle large reference tables for mapping in recipes. FindMatches in Glue Studio didn't look like quite the right tool either, as it appears to focus on matching full records rather than individual fields.
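Field-level matching, as opposed to record-level matching, means each configured column is looked up in its own reference table independently. A minimal sketch in plain Python, assuming per-field reference tables that also map each canonical value to itself, and collecting every unknown value per field so a reviewer sees each new misspelling once instead of once per row. The field names and values are hypothetical. In a Glue (Spark) job the same lookup would more likely be written as a join of each column against its reference table, for example a broadcast join when the table fits in memory.

```python
from collections import defaultdict

# Hypothetical per-field reference tables: {field: {variant: canonical}}.
# Canonical values map to themselves so they count as matched.
REFERENCE = {
    "animal": {"tigre": "tiger", "tiger": "tiger"},
    "color": {"gren": "green", "green": "green"},
}


def standardize_batch(rows, reference):
    """Map each configured field independently; collect unknown values per field."""
    cleaned = []
    unmatched = defaultdict(set)
    for row in rows:
        out = dict(row)
        for field, mapping in reference.items():
            if field not in out:
                continue
            key = out[field].strip().lower()
            if key in mapping:
                out[field] = mapping[key]
            else:
                unmatched[field].add(key)  # leave the value as-is; queue it for review
        cleaned.append(out)
    return cleaned, dict(unmatched)


rows = [
    {"animal": "Tigre", "color": "gren"},
    {"animal": "tigerrr", "color": "green"},
    {"animal": "tigerrr", "color": "blu"},
]
cleaned, review = standardize_batch(rows, REFERENCE)
print(cleaned)
print(review)
```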

Any recommendations?

Asked 2 years ago, 60 views, no answers