Cause and Solution for com.amazonaws.services.gluejobexecutor.model.InvalidInputException: Entity size has exceeded the maximum allowed size


We have a Glue job that uses workload partitioning with bounded execution. In a recent run the job failed during the Job.commit call.

Based on the message, I assume the job bookmark was too large to save.
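For reference, the job follows the standard Glue bookmark pattern sketched below (simplified, not our actual script; the database, table, and bounded-execution value are placeholders):

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Bookmark-enabled, bounded read: process at most 500 files per run.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="example_db",        # placeholder
        table_name="example_table",   # placeholder
        transformation_ctx="source",
        additional_options={"boundedFiles": "500"},
    )

    # ... transforms and writes ...

    # The failure surfaces here: Job.commit persists the bookmark state,
    # and the update is rejected because the state is too large.
    job.commit()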

  1. How would this occur?
  2. What options are available to prevent this from happening?
  3. How would we recover from this if this occurred in a production environment?

The error stack trace provides:

2022-10-20 17:05:44,413 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(91)): Exception in User Class
com.amazonaws.services.gluejobexecutor.model.InvalidInputException: Entity size has exceeded the maximum allowed size. (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: xxx; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.doInvoke(AWSGlueJobExecutorClient.java:6964)
    at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.invoke(AWSGlueJobExecutorClient.java:6931)
    at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.invoke(AWSGlueJobExecutorClient.java:6920)
    at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.executeUpdateJobBookmark(AWSGlueJobExecutorClient.java:6610)
    at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.updateJobBookmark(AWSGlueJobExecutorClient.java:6580)
    at com.amazonaws.services.glue.util.AWSGlueJobBookmarkService$$anonfun$commit$1.apply(AWSGlueJobBookmarkService.scala:184)
    at com.amazonaws.services.glue.util.AWSGlueJobBookmarkService$$anonfun$commit$1.apply(AWSGlueJobBookmarkService.scala:183)
    at scala.Option.foreach(Option.scala:257)
    at com.amazonaws.services.glue.util.AWSGlueJobBookmarkService.commit(AWSGlueJobBookmarkService.scala:183)
    at com.amazonaws.services.glue.util.JobBookmark$.commit(JobBookmarkUtils.scala:88)
    at com.amazonaws.services.glue.util.Job$.commit(Job.scala:121)
    at ...

jhenn
asked 2 years ago · 362 views

1 Answer

1) How this occurs:

--> A bookmark-enabled Glue job tracks the data it has processed in a state object. By design, there is a hard limit of 400 KB on the size of this state; if your bookmark state exceeds that limit, the error above is thrown. More about bookmarks: https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
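If you want to confirm that the stored bookmark state is what is hitting the limit, you can retrieve it with the GetJobBookmark API, for example via boto3 (a rough sketch; the job name is a placeholder):

    import boto3

    glue = boto3.client("glue")

    # Fetch the bookmark entry that Glue has stored for the job.
    entry = glue.get_job_bookmark(JobName="my-glue-job")["JobBookmarkEntry"]

    # JobBookmark holds the serialized state whose size is limited.
    state = entry["JobBookmark"]
    print("Bookmark state size (bytes):", len(state.encode("utf-8")))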

2) There are several options available to overcome the problem:

- Resetting the bookmark (see the sketch after this list).
- Splitting the job into separate jobs that process different data sources.
- Using bounded execution for S3 data sources to reduce the number of files to be processed.
- If there are many transformation_ctx parameters in your job, reducing them by using transformation_ctx only on your read operations/data sources.
- If bookmark support is needed for all of the data sources, splitting the job so that each job handles fewer data sources.
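For the first option, the bookmark can be cleared with the ResetJobBookmark API, for example via boto3 (a rough sketch; the job name is a placeholder). Note that the next bookmarked run then starts from scratch, so combine this with bounded execution or a push-down predicate to avoid reprocessing everything at once:

    import boto3

    glue = boto3.client("glue")

    # Clears the stored bookmark state; the next run treats all data as unprocessed.
    glue.reset_job_bookmark(JobName="my-glue-job")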

3) Regarding the question "How would we recover from this if this occurred in a production environment?"

--> Use "MAX_FILES_IN_BAND" to reduce the amount of file state kept in the bookmark. This option specifies the maximum number of files to save from the last maxBand seconds; if this number is exceeded, extra files are skipped and only processed in the next job run. It defaults to 1000, so with the default setting there can be as many as 1000 S3 prefixes stored in the bookmark state. Setting it to 0 mitigates the issue (a sketch follows below). More about connection types and options: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html
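In the ETL script this corresponds to the Amazon S3 connection option maxFilesInBand, which can be passed through additional_options alongside the bounded-execution setting. A minimal sketch follows (database, table, and values are placeholders; check the linked connection-options page for the exact option names and value types accepted by your Glue version):

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # S3 source read: cap the files per run (bounded execution) and set
    # maxFilesInBand to 0 so no "recently seen" files are carried over
    # into the bookmark state between runs.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="example_db",        # placeholder
        table_name="example_table",   # placeholder
        transformation_ctx="source",
        additional_options={
            "boundedFiles": "500",
            "maxFilesInBand": "0",
        },
    )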

Sahil_S (AWS)
answered 2 years ago
