Glue Crawler in State Machine Shows As Complete Before Glue Data Catalog is Updated


I have a state machine which includes a step to run a Glue Crawler which adds partitions to an existing table. The following step in the state machine is a Glue Job which reads from the new partition in the Glue Data Catalog and does some transformations on the data.

The Glue Job step always fails initially with an error message indicating that the specified partition doesn't exist. A subsequent retry always succeeds.

Looking at the timing of the state machine events and the Glue Data Catalog partition creation, the Glue Crawler step completes and then the initial run of the Glue Job step starts before the Glue Data Catalog partition is added. The retry of the Glue Job happens after the partition is added and so it succeeds.

Is there a configuration or setting that will make the Glue Crawler step wait for the partition to be added before it shows as complete?

Asked 8 months ago · 270 views

1 Answer

Accepted Answer

There are a couple of ways to handle this, but the easiest is to keep your "StartCrawler" step and follow it with a wait state and a "GetCrawler" call that retrieves the crawler's status. Next, add a choice state that proceeds to the next step if the crawl has finished, or returns to the wait state if it is not ready yet. In other words, you wait, get the status, and loop back to the wait step, repeating until the crawler is ready before moving on to the next step.
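A minimal sketch of that wait/poll/choice loop, written as a Python dict that serializes to an Amazon States Language definition. It assumes the crawler is started and polled through the Step Functions AWS SDK integrations (`aws-sdk:glue:startCrawler` and `aws-sdk:glue:getCrawler`) and that the job runs through `glue:startJobRun.sync`. The crawler name, job name, state names and 30-second poll interval are hypothetical placeholders, not values from the original thread.

```python
import json


def build_definition(crawler_name: str, job_name: str, poll_seconds: int = 30) -> dict:
    """Build an ASL definition: start crawler, poll until READY, then run the job.

    Hypothetical sketch: crawler_name, job_name and all state names are placeholders.
    """
    return {
        "Comment": "Start a Glue crawler, wait for it to finish, then run a Glue job",
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                # SDK integration: returns as soon as the crawl has been started.
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "GetCrawler",
            },
            "GetCrawler": {
                "Type": "Task",
                # Output has the shape {"Crawler": {"State": ..., "LastCrawl": {...}}}.
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "IsCrawlerReady",
            },
            "IsCrawlerReady": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.Crawler.State", "StringEquals": "READY", "Next": "CheckLastCrawl"}
                ],
                # Still RUNNING or STOPPING: loop back and wait again.
                "Default": "WaitForCrawler",
            },
            "CheckLastCrawl": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.Crawler.LastCrawl.Status",
                        "StringEquals": "SUCCEEDED",
                        "Next": "RunGlueJob",
                    }
                ],
                # Crawl ended as FAILED or CANCELLED: stop instead of running the job.
                "Default": "CrawlerFailed",
            },
            "CrawlerFailed": {
                "Type": "Fail",
                "Error": "CrawlerFailed",
                "Cause": "Glue crawler finished without a SUCCEEDED status",
            },
            "RunGlueJob": {
                "Type": "Task",
                # .sync waits for the job run to complete before the state finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names; replace with your own crawler and job.
    print(json.dumps(build_definition("my-crawler", "my-glue-job"), indent=2))
```

The printed JSON can be pasted into the Step Functions console or passed as the `definition` argument of boto3's `create_state_machine`. The execution role needs permission for `glue:StartCrawler`, `glue:GetCrawler` and the Glue job actions. Splitting the check into two choice states means `LastCrawl.Status` is only read once the crawler is back in `READY`.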

Hope this helps.

Answered 8 months ago

Reviewed by an AWS expert 8 months ago
