Glue Crawler in State Machine Shows as Complete Before Glue Data Catalog Is Updated

I have a state machine which includes a step to run a Glue Crawler which adds partitions to an existing table. The following step in the state machine is a Glue Job which reads from the new partition in the Glue Data Catalog and does some transformations on the data.

The Glue Job step always fails initially with an error message indicating that the specified partition doesn't exist. A subsequent retry always succeeds.

Looking at the timing of the state machine events and the Glue Data Catalog partition creation, the Glue Crawler step completes and then the initial run of the Glue Job step starts before the Glue Data Catalog partition is added. The retry of the Glue Job happens after the partition is added and so it succeeds.

Is there a configuration or setting that will make the Glue Crawler step wait for the partition to be added before it shows as complete?

Asked 8 months ago · 269 views
1 answer
Accepted answer

There are a couple of ways to handle this. The "StartCrawler" call only starts the crawl and returns right away, which is why the step shows as complete before the partition exists. The easiest fix is to keep your "StartCrawler" step in the state machine, then add a Wait state followed by a "GetCrawler" call, which returns the crawler's status. Next, add a Choice state that proceeds to the next step if the crawler has finished, or returns to the Wait state if it is not ready. In other words, you wait, get the status, and loop back to the Wait state, repeating until the status shows the crawler is ready, and only then proceed to the next step.
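The wait, check, and loop pattern described above could be sketched roughly as follows. This is only an illustration, not a tested production definition: it builds an Amazon States Language definition as a Python dictionary, and the state names (`WaitForCrawler`, `IsCrawlerReady`, `RunGlueJob`), the 30-second wait, and the `my-crawler` / `my-job` names are all assumptions made for the example. It also assumes the AWS SDK service integrations for Glue are used for the crawler calls and that the crawler reports a `State` of `READY` once it is idle again.

```python
import json


def build_crawler_wait_definition(crawler_name, job_name, wait_seconds=30):
    """Build a state machine definition that polls a crawler before running a job.

    All state names and the default wait are illustrative assumptions.
    """
    return {
        "Comment": "Start a Glue crawler, poll until it is idle, then run a Glue job",
        "StartAt": "StartCrawler",
        "States": {
            # Starts the crawl; this call returns immediately.
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": None,  # discard the empty result, keep the input
                "Next": "WaitForCrawler",
            },
            # Pause before each status check.
            "WaitForCrawler": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "GetCrawler",
            },
            # Fetch the crawler description, including its current State.
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "IsCrawlerReady",
            },
            # Proceed only when the crawler is idle; otherwise loop back.
            "IsCrawlerReady": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.Crawler.State",
                        "StringEquals": "READY",
                        "Next": "RunGlueJob",
                    }
                ],
                "Default": "WaitForCrawler",
            },
            # Run the downstream Glue job and wait for it to finish.
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical crawler and job names, for illustration only.
    definition = build_crawler_wait_definition("my-crawler", "my-job")
    print(json.dumps(definition, indent=2))
```

The printed JSON could be pasted into a state machine definition and adapted. Note that `READY` only indicates the crawler is idle again; if you also need to confirm that the crawl succeeded, an extra check on the last crawl's status in the "GetCrawler" output may be worth adding.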

Hope this helps. If it does, please accept this answer and give it a thumbs up.

Answered 8 months ago
Reviewed by an AWS Expert 8 months ago
