Questions tagged with AWS Glue

Glue Start-Blueprint-Run running into timeout issues with increased number of jobs

**Describe the bug**

Using the AWS CLI (version: aws-cli/2.8.6 Python/3.9.11 Windows/10 exe/AMD64 prompt/off), starting a Glue blueprint run works fine when the workflow it generates contains fewer than about 30-40 objects in total (triggers and Glue jobs). When the workflow contains more objects than that, the blueprint run appears to time out and stays stuck in the RUNNING state.

**Expected Behavior**

We expect the blueprint run to create the workflow with all the jobs it needs. Below is a sample workflow: each row consists of two jobs with a trigger between them, one row per task, and there can be N such tasks:

![AWS Glue workflow from Blueprint run](/media/postImages/original/IMdi9yk9dcS7m5jVqlSi8u-A)

**Current Behavior**

With fewer than about 8-10 task rows, the blueprint run succeeds without timing out. With more rows, the blueprint run stays in the RUNNING state and the workflow is never generated.

**Reproduction Steps**

This is the AWS CLI command we're currently using:

`aws glue start-blueprint-run --blueprint-name BLUEPRINT_NAME --role-arn IAMRoleARN --parameters "file://FILE_PATH.json" --region us-east-1 --profile test-naga --cli-connect-timeout 900 --cli-read-timeout 900`

The JSON parameters file passes in a collection of table names; the layout file loops over them and creates the workflow shown in the image above.

The exact example is hard to share, but one of the samples at https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples/crawl_s3_locations could probably be used to reproduce it, configured to create a larger number of jobs/objects in a single blueprint run.

**Possible Solution**

I'm wondering whether this is related to `--cli-connect-timeout` and `--cli-read-timeout`, whose default value is 60 seconds. It seems the blueprint run tries to create all the resources within that time, and when there are more objects and creation takes longer, the whole process times out and stays in the RUNNING state without doing anything. We also tried setting both values to 0 (no timeout), with the same result. The number of objects created before the timeout varies from run to run.

**CLI version used**

2.8.6

**Environment details (OS name and version, etc.)**

Windows 10
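StartBlueprintRun returns a `RunId`, and the blueprint run then proceeds on the service side; as far as we can tell from the API shape, the CLI's `--cli-connect-timeout` and `--cli-read-timeout` bound only the HTTP request itself, not the run. A minimal sketch, assuming a boto3 Glue client, that polls `get_blueprint_run` until the run reaches a terminal state and surfaces `ErrorMessage` if it fails. The helper name `wait_for_blueprint_run` and its polling defaults are hypothetical, not part of boto3.

```python
"""Poll a Glue blueprint run until it reaches a terminal state (sketch)."""
import time
from typing import Any, Callable, Dict

# GetBlueprintRun reports RUNNING, SUCCEEDED, FAILED or ROLLING_BACK.
# RUNNING and ROLLING_BACK are still in progress; the other two are final.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_blueprint_run(
    glue_client: Any,
    blueprint_name: str,
    run_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Return the final BlueprintRun dict, or raise TimeoutError.

    glue_client is assumed to be boto3.client("glue") or any object with a
    compatible get_blueprint_run(BlueprintName=..., RunId=...) method.
    """
    waited = 0.0
    while True:
        run = glue_client.get_blueprint_run(
            BlueprintName=blueprint_name, RunId=run_id
        )["BlueprintRun"]
        state = run["State"]
        if state in TERMINAL_STATES:
            return run
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Blueprint run {run_id} still {state} after {waited:.0f}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Example (needs AWS credentials; names are the placeholders from the question):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   with open("FILE_PATH.json") as f:
#       run_id = glue.start_blueprint_run(
#           BlueprintName="BLUEPRINT_NAME", Parameters=f.read(), RoleArn="IAMRoleARN"
#       )["RunId"]
#   final = wait_for_blueprint_run(glue, "BLUEPRINT_NAME", run_id)
#   print(final["State"], final.get("ErrorMessage"))
```

The client is passed in rather than created inside the helper so the polling logic can be exercised without AWS access. If a run stays RUNNING past the chosen timeout, that points to the service-side run rather than the CLI connection.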
0 answers, 0 votes, 23 views, asked a month ago

Unable to Relaunch Elasticsearch Connector for AWS Glue from Marketplace

Previously, I was able to subscribe to [Relaunch Elasticsearch Connector](https://aws.amazon.com/marketplace/pp/prodview-v5ygernwn2gb6). After subscribing and following the instructions on that page, I reached "Configure this software" in the Elasticsearch Connector Marketplace subscription. After choosing "Glue version 3.0" and software version "7.13.4-2", I was shown a box labeled "Usage instructions", which I followed to set up Secrets Manager successfully. Below "Usage instructions" there was also a generated link reading "Deployment template: Activate connector in AWS Glue". Clicking this link took me to the "AWS Glue Studio > Connectors" page in my account, where a connector of type MARKETPLACE had been generated.

While troubleshooting the connector setup, I deleted this MARKETPLACE connector at some point, and there seems to be no obvious way to restore or retrieve it. I couldn't find any similar reports online, so I tried unsubscribing and re-subscribing to the Marketplace connector. However, when I reached "Usage instructions" again (on [Relaunch Elasticsearch Connector](https://aws.amazon.com/marketplace/pp/prodview-v5ygernwn2gb6)), the "Deployment template: Activate connector in AWS Glue" link was gone, with no apparent way to re-create the MARKETPLACE connector in Glue.

Is this expected behavior for Marketplace custom Glue connectors? If so, are there steps to properly re-create a MARKETPLACE custom connector in my account?
1 answer, 0 votes, 24 views, asked a month ago

Cause and Solution for com.amazonaws.services.gluejobexecutor.model.InvalidInputException: Entity size has exceeded the maximum allowed size

We have a Glue job that uses workload partitioning with bounded execution. In a recent run, the job failed during the `Job.commit` call. Based on the error message, I assume the job bookmark was too large to save.

1. How could this occur?
2. What options are available to prevent it from happening?
3. How would we recover if this occurred in a production environment?

The error stack trace shows:

    2022-10-20 17:05:44,413 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(91)): Exception in User Class
    com.amazonaws.services.gluejobexecutor.model.InvalidInputException: Entity size has exceeded the maximum allowed size. (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: xxx; Proxy: null)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
        at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.doInvoke(AWSGlueJobExecutorClient.java:6964)
        at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.invoke(AWSGlueJobExecutorClient.java:6931)
        at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.invoke(AWSGlueJobExecutorClient.java:6920)
        at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.executeUpdateJobBookmark(AWSGlueJobExecutorClient.java:6610)
        at com.amazonaws.services.gluejobexecutor.AWSGlueJobExecutorClient.updateJobBookmark(AWSGlueJobExecutorClient.java:6580)
        at com.amazonaws.services.glue.util.AWSGlueJobBookmarkService$$anonfun$commit$1.apply(AWSGlueJobBookmarkService.scala:184)
        at com.amazonaws.services.glue.util.AWSGlueJobBookmarkService$$anonfun$commit$1.apply(AWSGlueJobBookmarkService.scala:183)
        at scala.Option.foreach(Option.scala:257)
        at com.amazonaws.services.glue.util.AWSGlueJobBookmarkService.commit(AWSGlueJobBookmarkService.scala:183)
        at com.amazonaws.services.glue.util.JobBookmark$.commit(JobBookmarkUtils.scala:88)
        at com.amazonaws.services.glue.util.Job$.commit(Job.scala:121)
        at ...
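One way to watch how large the stored bookmark grows between runs is to fetch it with GetJobBookmark and measure the serialized string. A minimal sketch, assuming a boto3 Glue client; the helper name `bookmark_size_report` is hypothetical, and it deliberately does not hard-code any size limit because the error message does not state one. The breakdown by top-level JSON key is also an assumption: it only helps if the bookmark string parses as a JSON object.

```python
"""Report the size of a Glue job's stored bookmark (sketch)."""
import json
from typing import Any, Dict


def bookmark_size_report(glue_client: Any, job_name: str) -> Dict[str, Any]:
    """Fetch the job bookmark and return its size in bytes plus basic metadata.

    glue_client is assumed to be boto3.client("glue") or any object with a
    compatible get_job_bookmark(JobName=...) method.
    """
    entry = glue_client.get_job_bookmark(JobName=job_name)["JobBookmarkEntry"]
    raw = entry.get("JobBookmark") or ""
    report: Dict[str, Any] = {
        "job_name": job_name,
        "run_id": entry.get("RunId"),
        # Size of the serialized bookmark as UTF-8 bytes.
        "size_bytes": len(raw.encode("utf-8")),
    }
    # Assumption: the bookmark is a JSON object; if so, list its top-level
    # keys to see which parts of the job contribute state. Otherwise skip.
    try:
        parsed = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        parsed = {}
    report["top_level_keys"] = sorted(parsed) if isinstance(parsed, dict) else []
    return report


# Example (needs AWS credentials; "my-glue-job" is a placeholder):
#   import boto3
#   print(bookmark_size_report(boto3.client("glue"), "my-glue-job"))
```

Logging this report after each successful run would show whether the bookmark grows steadily with bounded execution. For recovery, `reset_job_bookmark(JobName=...)` (CLI: `aws glue reset-job-bookmark`) clears the stored bookmark, but the next run then reprocesses data from the beginning, so it is a last resort rather than a fix.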
1 answer, 0 votes, 30 views, jhenn asked a month ago