Synchronous Glue Job in Step Function is slow to recognize completion of Glue Job


I am using a Step Function to execute a Glue Job. The Step Function is set to run in synchronous mode. However, there is usually a 2-4 minute lag between the Glue Job completing and the Step Function considering the Glue Job complete and moving to the next step. For example, the Glue Job's last run took 15 minutes, but the Step Function spent 19 minutes on this step. Has anyone else experienced this? Is my only option to execute in async mode and poll more often for completion?

tjtoll
asked 2 years ago · 393 views
1 Answer
Accepted Answer

You are experiencing this delay because Glue does not support CloudWatch Events for notifying Step Functions of the latest job status. The same is true of EMR. Instead, Step Functions polls the job's status by making Describe* API calls to EMR/Glue. Currently, the default polling schedule is every 1 minute for the first 10 minutes, then every 5 minutes thereafter. Therefore, if the job takes more than 10 minutes to complete, you can expect an average delay of 2.5 minutes, with 5 minutes being the worst case. The Step Functions team knows about this issue and is trying to implement a solution.
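To make the arithmetic concrete, here is a minimal sketch of the polling schedule described above. It assumes polls land exactly at minutes 1 through 10 and then on multiples of 5, measured from the start of the task. That alignment is an assumption for illustration, not documented Step Functions behavior, and the function names are hypothetical.

```python
import math


def next_poll_minute(completion_min: float) -> float:
    """Minute of the first poll at or after the job completes.

    Assumed schedule: polls at minutes 1..10, then every 5 minutes
    (15, 20, ...), counted from the start of the task.
    """
    if completion_min <= 10:
        # During the first 10 minutes there is one poll per minute.
        return float(max(1, math.ceil(completion_min)))
    # After minute 10, polls land on multiples of 5.
    return float(math.ceil(completion_min / 5) * 5)


def detection_delay(completion_min: float) -> float:
    """Minutes between job completion and the poll that notices it."""
    return next_poll_minute(completion_min) - completion_min


if __name__ == "__main__":
    for minutes in (3.2, 10.5, 12.5, 15.0):
        print(minutes, round(detection_delay(minutes), 2))
```

Under these assumptions, a job that finishes just after a 5-minute poll waits almost the full 5 minutes, and completion times spread evenly across the interval average out to 2.5 minutes.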

The workaround you can implement on your end is to use a Lambda function that makes Describe API calls to check the EMR/Glue job status more often than Step Functions does.
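A minimal sketch of such a Lambda function for Glue is shown below. For Glue, the status call in boto3 is `get_job_run`. The event keys `JobName` and `JobRunId`, the output keys, and the helper name `summarize_run` are assumptions chosen for illustration. The sketch also assumes the state machine starts the job without waiting for completion and passes the returned run ID to this function.

```python
# Glue JobRunState values that mean the run has finished.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def summarize_run(job_run: dict) -> dict:
    """Reduce a Glue JobRun structure to the fields a Choice state needs."""
    state = job_run["JobRunState"]
    return {
        "state": state,
        "done": state in TERMINAL_STATES,
        "succeeded": state == "SUCCEEDED",
    }


def lambda_handler(event, context):
    """Check the status of one Glue job run.

    Expects (assumed shape): {"JobName": "...", "JobRunId": "..."}
    """
    import boto3  # imported here so summarize_run can be tested offline

    glue = boto3.client("glue")
    response = glue.get_job_run(JobName=event["JobName"], RunId=event["JobRunId"])
    # Pass the input through so the next loop iteration still has the IDs.
    return {**event, **summarize_run(response["JobRun"])}
```

In the state machine, this could sit in a loop of a short Wait state, this Lambda task, and a Choice state that exits when `done` is true. The Wait duration then sets the maximum lag, at the cost of more state transitions and Lambda invocations.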

If you require in-depth assistance with this issue, I would advise you to raise a support case with the Step Functions Technical Support team.

AWS
Support Engineer
Chaitu
answered 2 years ago
  • Thank you for responding. We'll implement a workaround for now.
