AWS Glue: retry a job after an execution error


Hi,

So far we have not found a way to make a job retry automatically after a failure. We need this because if there is a communication failure with our on-premises systems, the job must be re-executed automatically to guarantee data integrity.

Is there any way to do that, for example using Amazon EventBridge or a related workaround?

Thanks in advance for any help,

  • Hi, we have a Glue job that accepts the parameter "table_name", with the default value set to "dummy" in the Glue job parameters section. The Glue job configuration also allows a total of 4 retries.

    A Lambda function was created to invoke the Glue job with the parameter "table_name" set to "xxxx." During the initial run, the Glue job failed with the error message: "The specified subnet does not have enough free addresses to satisfy the request. Please provide a connection with a subnet with IPs available" and "exception: Number of IP addresses on subnet is 0."

    After this failure, the Glue job retried itself 4 times as configured. However, in each retry attempt the "table_name" parameter reverted to the default value "dummy" instead of the value passed by the Lambda function ("xxxx"). Because of this, every retry failed with the error "An error occurred while calling o93.getCatalogSource. : com.amazonaws.services.glue.model.EntityNotFoundException: Entity Not Found".

    Could you please explain why the Glue job parameters revert to their default values when the job is retried?

  • +1 to the above comment. In my case the job failed because of a connection issue with Redshift, which triggered the automatic retries, and the parameters that had been passed to the Glue job were missing from the job runs triggered by the original run's failure.
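For the EventBridge route asked about in the question, and given the comments' report that automatic retries fell back to default arguments, one possible workaround (not confirmed anywhere in this thread) is to set the job's MaxRetries to 0 and re-run failed runs from a Lambda function that an EventBridge rule triggers on "Glue Job State Change" events, re-sending the failed run's own arguments. This is a minimal sketch: the `--rerun_count` argument used as a loop guard and `MAX_MANUAL_RERUNS` are hypothetical, and it assumes the job script tolerates an extra argument it does not read. The Glue client is passed in so the logic can be exercised without AWS.

```python
from typing import Any, Dict, List, Optional

# Hypothetical settings for this sketch.
RERUN_FLAG = "--rerun_count"  # extra job argument used to count manual reruns
MAX_MANUAL_RERUNS = 3
RETRYABLE_STATES: List[str] = ["FAILED", "TIMEOUT"]


def build_event_pattern(job_name: str) -> Dict[str, Any]:
    """EventBridge event pattern matching failed or timed-out runs of one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"jobName": [job_name], "state": list(RETRYABLE_STATES)},
    }


def rerun_failed_job(glue_client: Any, event: Dict[str, Any]) -> Optional[str]:
    """Start a new run of the failed job with the failed run's own arguments.

    Returns the new JobRunId, or None if no rerun was started.
    """
    detail = event["detail"]
    if detail.get("state") not in RETRYABLE_STATES:
        return None
    job_name = detail["jobName"]

    # Look up the failed run to recover the arguments it was started with
    # (for example "--table_name": "xxxx" passed by the original caller).
    job_run = glue_client.get_job_run(JobName=job_name, RunId=detail["jobRunId"])["JobRun"]
    arguments = dict(job_run.get("Arguments", {}))

    # Guard against an endless failure/rerun loop.
    rerun_count = int(arguments.get(RERUN_FLAG, "0"))
    if rerun_count >= MAX_MANUAL_RERUNS:
        return None
    arguments[RERUN_FLAG] = str(rerun_count + 1)

    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]


def lambda_handler(event, context):
    """Lambda entry point, wired as the target of the EventBridge rule."""
    import boto3  # bundled with the AWS Lambda Python runtime

    return {"newJobRunId": rerun_failed_job(boto3.client("glue"), event)}
```

The pattern would be registered with something like boto3's EventBridge `put_rule(Name=..., EventPattern=json.dumps(build_event_pattern("my-etl-job")))`, with the Lambda function added as the rule's target; "my-etl-job" is a placeholder.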

2 answers
Accepted answer

Glue has native retry functionality through the MaxRetries parameter. You can set this parameter programmatically or, if you are using Glue Studio, in the "Job details" tab.

MaxRetries – Number (integer). The maximum number of times to retry this job after a JobRun fails.
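Setting MaxRetries programmatically could look like the following minimal sketch, which builds the keyword arguments for the boto3 Glue `create_job` call. The job name, role ARN, and script location are placeholders, Glue version and worker settings are left out, and the client is passed in rather than created so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def build_create_job_request(
    job_name: str,
    role_arn: str,
    script_location: str,
    max_retries: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateJob API with MaxRetries set."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Glue re-runs the job up to this many times after a JobRun fails.
        "MaxRetries": max_retries,
    }


def create_job_with_retries(glue_client: Any, **kwargs: Any) -> str:
    """Create the job through a Glue client and return the job name it reports."""
    response = glue_client.create_job(**build_create_job_request(**kwargs))
    return response["Name"]
```

With boto3 this would be called as `create_job_with_retries(boto3.client("glue"), job_name="my-etl-job", role_arn="arn:aws:iam::123456789012:role/MyGlueRole", script_location="s3://my-bucket/scripts/etl.py", max_retries=3)`, all values being placeholders.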

If this is not sufficient for your use case, consider wrapping your Glue job in an AWS Step Functions state machine, where you can implement a more sophisticated retry and control mechanism.
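A Step Functions version could look like this sketch, which builds an Amazon States Language definition using the `arn:aws:states:::glue:startJobRun.sync` integration with a `Retry` block. Because Step Functions re-runs the whole Task state, each retry re-sends the same `Arguments`, which may sidestep the default-argument problem described in the comments on the question. The job name, the `--table_name` value, and the retry timings are placeholders; you would probably also set the job's own MaxRetries to 0 so the two retry layers do not stack.

```python
import json
from typing import Any, Dict


def build_glue_retry_state_machine(
    job_name: str,
    arguments: Dict[str, str],
    max_attempts: int = 3,
    interval_seconds: int = 60,
    backoff_rate: float = 2.0,
) -> Dict[str, Any]:
    """Return an ASL definition that runs a Glue job and retries the Task on failure."""
    return {
        "Comment": "Run a Glue job and retry it on failure",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # .sync waits for the job run to finish, so a failed run fails the Task.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": job_name,
                    # Every retry of this Task re-sends these same arguments.
                    "Arguments": arguments,
                },
                "Retry": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": interval_seconds,
                        "MaxAttempts": max_attempts,  # retries after the first attempt
                        "BackoffRate": backoff_rate,
                    }
                ],
                "End": True,
            }
        },
    }


# Serialized definition, e.g. for the CreateStateMachine API.
definition = json.dumps(
    build_glue_retry_state_machine("my-etl-job", {"--table_name": "xxxx"}),
    indent=2,
)
```

The `definition` string is what boto3's Step Functions `create_state_machine(name=..., definition=..., roleArn=...)` expects; the state machine's role would also need permission to start and monitor the Glue job.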

AWS
answered 2 years ago
Expert, reviewed 6 months ago
AWS Expert, reviewed 2 years ago

Thanks for your answer, I'm going to test the suggested steps and let you know :)

Karlos
answered 2 years ago
