AWS Glue DataBrew Job crashing

Hello, we are receiving the below error on one of our jobs:

RecipeStepError: Failed at step 2. Parameters: {'operation': 'JSON_TO_STRUCTS', 'sourceColumns': '["data"]', 'unnestLevel': '120'}. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.192.20.37, executor 6): java.io.IOException: No space left on device

The step it is failing on changes the data type of a column to struct, and this dataset has a large number of rows. What can I do to fix this? I have tried increasing the number of nodes from 5 to 75, but it is still not working. This was previously working fine and only stopped last week.

asked a year ago, 393 views
1 Answer
Accepted Answer

Hello,

I understand that your DataBrew recipe job is failing with a 'No space left on device' error. From your description, it seems that an excessive amount of data is being loaded onto a single node, so that node runs out of local storage and the task fails. That would explain why increasing the number of nodes is not helping: adding nodes does not increase the capacity of any individual node.
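To illustrate the reasoning above, here is a toy model (not how Spark actually schedules work) showing why node count alone may not help when one unit of work is much larger than the rest. The partition sizes are hypothetical numbers chosen for illustration, not measurements from this job.

```python
def peak_load_per_node(partition_sizes_gb, num_nodes):
    """Assign partitions greedily (largest first, each to the least-loaded
    node) and return the load on the busiest node."""
    loads = [0.0] * num_nodes
    for size in sorted(partition_sizes_gb, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += size
    return max(loads)


# Hypothetical skewed input: one 200 GB partition plus forty 1 GB partitions.
skewed = [200.0] + [1.0] * 40
# Hypothetical evenly split input of the same small partitions only.
even = [1.0] * 40

for nodes in (5, 75):
    print(nodes, "nodes, skewed peak:", peak_load_per_node(skewed, nodes))
    print(nodes, "nodes, even peak:  ", peak_load_per_node(even, nodes))
```

In this sketch the skewed peak stays at 200 GB for both 5 and 75 nodes, because the single large partition cannot be split across nodes, while the even case does benefit from more nodes.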

A better solution, I would suggest, is to move to a Glue ETL job, which gives you more flexibility in the type of executors. That way, you could choose a node type with greater capacity.
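As a rough sketch of what choosing a larger node could look like, the snippet below builds the arguments for boto3's Glue `create_job` call with an explicit `WorkerType`. The job name, role ARN, script path, Glue version, and worker count are all placeholder assumptions, and `G.2X` is just one example of a larger worker type; check which types and versions are available in your account and region.

```python
def build_glue_job_args(name, role_arn, script_s3_path,
                        worker_type="G.2X", num_workers=10):
    """Build keyword arguments for the boto3 Glue create_job call.

    All values passed in are placeholders chosen by the caller; the
    defaults here are assumptions for illustration only.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",            # Spark ETL job
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",             # assumed version
        "WorkerType": worker_type,        # larger types have more memory and disk
        "NumberOfWorkers": num_workers,
    }


def create_job(args):
    """Create the job; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed
    return boto3.client("glue").create_job(**args)


# Hypothetical names, for illustration only.
job_args = build_glue_job_args(
    name="json-to-structs-etl",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_path="s3://example-bucket/scripts/job.py",
)
print(job_args["WorkerType"], job_args["NumberOfWorkers"])
```

Calling `create_job(job_args)` would then submit the definition. Separating the argument builder from the API call keeps the configuration easy to inspect before anything is created.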

Chaitu, AWS SUPPORT ENGINEER
answered a year ago
EXPERT reviewed 5 months ago
  • Thank you for your answer. I am looking at using Glue ETL now. Do you think avoiding the JSON to Structs step in the recipe would help? I think we could work around it.
