2 Answers
0
Hello,
This issue occurs in an edge case where a single block is larger than MAX_INT (2,147,483,647 bytes, roughly 2 GB), so the buffer array cannot hold it in memory while reading it from S3.
To mitigate the issue, you can consider increasing the number of partitions or the number of workers. [+] Configuring job properties for Spark jobs in AWS Glue - https://docs.aws.amazon.com/glue/latest/dg/add-job.html
Or, if the shuffle data is small, you can disable writing shuffle data to S3.
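If disabling the S3 shuffle is an option for you, it is controlled through Glue job parameters. A hedged sketch (parameter names taken from the AWS Glue documentation on the Spark shuffle plugin with Amazon S3; verify them against your Glue version, and the bucket name below is a placeholder):

```shell
# AWS Glue job parameters (set under the job's "Job parameters" /
# DefaultArguments). Setting these to "false" keeps shuffle and spill
# files on the workers' local disks instead of writing them to S3:
--write-shuffle-files-to-s3   false
--write-shuffle-spills-to-s3  false

# Conversely, when shuffle-to-S3 is enabled, the target bucket is
# supplied via a Spark conf, e.g.:
# --conf spark.shuffle.glue.s3ShuffleBucket=s3://your-shuffle-bucket/prefix
```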
answered a year ago
0
For each case, what numbers should I expect?
- You mentioned, "if the shuffle data is small, you can disable writing shuffle to S3". What size of shuffle data is considered small? In the metrics, the shuffle plot peaked at about 15 GB shortly before the job failed.
- I had enabled auto-scaling, and I did not see any workers added by it, so I assume that increasing the number of workers will not help. Is that a correct assumption?
- By "increasing the partitions", do you mean increasing the size of each partition (and reducing the number of partitions), or increasing the number of partitions (by reducing their size)? Either way, how can I determine the numbers to set? I would assume there is a simple calculation for it.
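For the partition count, one common rule of thumb (an assumption on my part, not a Glue-specific recommendation; the 128 MB target is a frequently cited default, not a hard rule) is to divide the observed shuffle size by a target partition size:

```python
# Rough partition-count sizing sketch. Inputs: the ~15 GB shuffle peak
# observed in the job metrics, and an assumed ~128 MB target size per
# shuffle partition (tune this for your workload).
shuffle_bytes = 15 * 1024**3             # observed shuffle peak: ~15 GB
target_partition_bytes = 128 * 1024**2   # assumed ~128 MB per partition

# Ceiling division: round up so no partition exceeds the target size.
num_partitions = -(-shuffle_bytes // target_partition_bytes)
print(num_partitions)  # → 120
```

The resulting value would then be applied via `spark.sql.shuffle.partitions` or an explicit `repartition()` call, whichever fits the job.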
Thanks for your input.
answered a year ago
