Glue Crawler in State Machine Shows As Complete Before Glue Data Catalog is Updated


I have a state machine which includes a step to run a Glue Crawler which adds partitions to an existing table. The following step in the state machine is a Glue Job which reads from the new partition in the Glue Data Catalog and does some transformations on the data.

The Glue Job step always fails initially with an error message indicating that the specified partition doesn't exist. A subsequent retry always succeeds.

Looking at the timing of the state machine events and the Glue Data Catalog partition creation, the Glue Crawler step completes and then the initial run of the Glue Job step starts before the Glue Data Catalog partition is added. The retry of the Glue Job happens after the partition is added and so it succeeds.

Is there a configuration or setting that will make the Glue Crawler step wait for the partition to be added before it shows as complete?

Asked 8 months ago, viewed 269 times

1 Answer

Accepted Answer

There are a couple of ways to handle this, but the easiest is to keep your "StartCrawler" step in the state machine and add a polling loop after it. StartCrawler only kicks off the crawler and returns right away, so the step shows as complete while the crawl is still running. After it, add a Wait state followed by a "GetCrawler" call, which returns the crawler's current state. Next, add a Choice state that proceeds to the Glue Job step if the crawler's state is READY, or returns to the Wait state if it is not. So basically you are waiting, getting the status, and looping back to the Wait state until the crawler is ready and you can proceed to the next step.
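The loop described above could be sketched roughly as follows. This is a minimal Python script that builds an Amazon States Language definition and prints it as JSON, not the asker's actual state machine. The crawler name, job name, state names, and 30-second poll interval are all hypothetical placeholders. The `aws-sdk:glue:startCrawler` and `aws-sdk:glue:getCrawler` resource ARNs assume the Step Functions AWS SDK service integration is used, and the sketch assumes the state machine input is a JSON object so the GetCrawler result can be attached under `$.crawler`.

```python
import json

# Hypothetical names for illustration; replace with your own resources.
CRAWLER_NAME = "my-partition-crawler"
JOB_NAME = "my-transform-job"
POLL_SECONDS = 30  # assumed poll interval, tune to your crawler's run time


def build_definition(crawler_name, job_name, poll_seconds=POLL_SECONDS):
    """Build a state machine definition: start crawler, poll until READY, run job."""
    return {
        "Comment": "Start a Glue crawler, wait until it is READY, then run a Glue job",
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                # StartCrawler returns immediately; it does not wait for the crawl.
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": None,  # discard the empty result, keep the input
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "GetCrawler",
            },
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": "$.crawler",  # assumes the input is a JSON object
                "Next": "IsCrawlerReady",
            },
            "IsCrawlerReady": {
                "Type": "Choice",
                "Choices": [
                    {
                        # The crawler reports READY once it is no longer running.
                        "Variable": "$.crawler.Crawler.State",
                        "StringEquals": "READY",
                        "Next": "RunGlueJob",
                    }
                ],
                # Any other state (RUNNING, STOPPING): loop back and wait again.
                "Default": "WaitForCrawler",
            },
            "RunGlueJob": {
                "Type": "Task",
                # The .sync form waits for the job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(CRAWLER_NAME, JOB_NAME), indent=2))
```

READY only means the crawler has stopped running, not that the crawl succeeded, so you may also want a branch that inspects `$.crawler.Crawler.LastCrawl.Status` and fails the execution when it is not `SUCCEEDED`. A cap on the number of loop iterations would also keep a stuck crawler from polling forever.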

Hope this helps. If it does, please accept this answer and give it a thumbs up.

Answered 8 months ago
AWS Expert, reviewed 8 months ago
