Glue Crawler in State Machine Shows As Complete Before Glue Data Catalog is Updated

I have a state machine which includes a step to run a Glue Crawler which adds partitions to an existing table. The following step in the state machine is a Glue Job which reads from the new partition in the Glue Data Catalog and does some transformations on the data.

The Glue Job step always fails initially with an error message indicating that the specified partition doesn't exist. A subsequent retry always succeeds.

Looking at the timing of the state machine events and the Glue Data Catalog partition creation, the Glue Crawler step completes and then the initial run of the Glue Job step starts before the Glue Data Catalog partition is added. The retry of the Glue Job happens after the partition is added and so it succeeds.

Is there a configuration or setting that will make the Glue Crawler step wait for the partition to be added before it shows as complete?

Asked 8 months ago · 269 views
1 Answer

Accepted Answer

There are a couple of ways to handle this. The easiest is to keep your "StartCrawler" step as it is; that call only starts the crawler and returns before the crawl has finished, which is why the Glue Job runs too early. After it, add a Wait state followed by a "GetCrawler" call, which returns the crawler's status. Next, add a Choice state that proceeds to the Glue Job step if the crawler is ready, or returns to the Wait state if it is not. In effect you wait, get the status, and loop back to the Wait state, repeating until the status shows the crawler is ready for the next step.
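The loop described above could be sketched roughly as follows. This is a minimal illustration, not a tested production definition: it uses Python only to build the Amazon States Language document and print it as JSON. The state names, the `build_definition` helper, the 30-second wait, and the crawler and job names are all placeholders chosen for the example. It assumes the Step Functions AWS SDK integrations for Glue (`startCrawler`, `getCrawler`) and that a crawler state of `READY` means the crawl has finished.

```python
import json


def build_definition(crawler_name: str, job_name: str, wait_seconds: int = 30) -> dict:
    """Build a state machine definition that polls a crawler before running a job.

    All state names here are hypothetical; rename them to fit your workflow.
    """
    return {
        "Comment": "Start a Glue crawler, poll until it is READY, then run a Glue job",
        "StartAt": "StartCrawler",
        "States": {
            # StartCrawler only kicks off the crawl; it does not wait for it.
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "WaitForCrawler",
            },
            # Pause between status checks (interval is an assumption).
            "WaitForCrawler": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "GetCrawler",
            },
            # Fetch the crawler's current status.
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "IsCrawlerReady",
            },
            # Proceed only when the crawler is idle again; otherwise loop back.
            "IsCrawlerReady": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.Crawler.State",
                        "StringEquals": "READY",
                        "Next": "RunGlueJob",
                    }
                ],
                "Default": "WaitForCrawler",
            },
            # Run the Glue job and wait for it to complete.
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # "my-crawler" and "my-job" are placeholder names.
    print(json.dumps(build_definition("my-crawler", "my-job"), indent=2))
```

Note that `READY` only says the crawler is no longer running. If you also need to know that the last crawl succeeded, one option is a second check on the `LastCrawl.Status` field of the "GetCrawler" response before starting the job.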

Hope this helps. If it does, please accept this answer and give it a thumbs up.

Answered 8 months ago
Reviewed by an AWS Expert 8 months ago
