Glue Crawler in State Machine Shows As Complete Before Glue Data Catalog is Updated


I have a state machine which includes a step to run a Glue Crawler which adds partitions to an existing table. The following step in the state machine is a Glue Job which reads from the new partition in the Glue Data Catalog and does some transformations on the data.

The Glue Job step always fails initially with an error message indicating that the specified partition doesn't exist. A subsequent retry always succeeds.

Looking at the timing of the state machine events and the Glue Data Catalog partition creation, the Glue Crawler step completes and then the initial run of the Glue Job step starts before the Glue Data Catalog partition is added. The retry of the Glue Job happens after the partition is added and so it succeeds.

Is there a configuration or setting that will make the Glue Crawler step wait for the partition to be added before it shows as complete?

asked 7 months ago, 259 views
1 Answer
Accepted Answer

There are a couple of ways to handle this. The easiest is to keep your "StartCrawler" step as it is, bearing in mind that StartCrawler only starts the crawl and returns immediately, so the step shows as complete while the crawler is still running. After it, add a Wait state followed by a "GetCrawler" call, which returns the crawler's current state. Next, add a Choice state that proceeds to the Glue Job step once the crawler's State is back to READY, or returns to the Wait state if it is still RUNNING or STOPPING. So basically you wait, get the status, and loop back to the Wait state until the crawler has finished and it is safe to proceed to the next step.
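As a rough illustration of that wait / get status / choice loop, here is a minimal sketch that builds the relevant states as a Python dict and prints them as Amazon States Language JSON. It assumes the Step Functions AWS SDK integrations for Glue (`aws-sdk:glue:startCrawler` and `aws-sdk:glue:getCrawler`) and that the state input is a JSON object. The state names, the `$.CrawlerStatus` result path, the 30-second wait, and the crawler and job-step names are all hypothetical placeholders, not taken from the original state machine.

```python
import json


def build_crawler_wait_loop(crawler_name, next_state, wait_seconds=30):
    """Return the states for a start -> wait -> get status -> choice loop.

    All state names and the "$.CrawlerStatus" path are illustrative only.
    """
    return {
        "StartCrawler": {
            "Type": "Task",
            # StartCrawler is asynchronous: it returns once the crawl has started.
            "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
            "Parameters": {"Name": crawler_name},
            "ResultPath": None,  # serialized as null: discard result, keep input
            "Next": "WaitForCrawler",
        },
        "WaitForCrawler": {
            "Type": "Wait",
            "Seconds": wait_seconds,
            "Next": "GetCrawler",
        },
        "GetCrawler": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
            "Parameters": {"Name": crawler_name},
            # Keep only the crawler state (READY, RUNNING or STOPPING).
            "ResultSelector": {"State.$": "$.Crawler.State"},
            "ResultPath": "$.CrawlerStatus",
            "Next": "IsCrawlerFinished",
        },
        "IsCrawlerFinished": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.CrawlerStatus.State",
                    "StringEquals": "READY",
                    "Next": next_state,
                }
            ],
            # Still RUNNING or STOPPING: go back and wait again.
            "Default": "WaitForCrawler",
        },
    }


if __name__ == "__main__":
    # "my-crawler" and "RunGlueJob" are placeholder names.
    print(json.dumps(build_crawler_wait_loop("my-crawler", "RunGlueJob"), indent=2))
```

The printed states could be merged into the "States" section of an existing definition, with "RunGlueJob" replaced by the name of the real Glue Job step. A READY state only says the crawler is idle again, so you may also want to inspect the LastCrawl status returned by GetCrawler to catch a failed crawl.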

Hope this helps. If it does, please accept this answer and give it a thumbs up.

answered 7 months ago
AWS EXPERT reviewed 7 months ago
