The "Unable to execute HTTP request: readHandshakeRecord" exception is likely caused by insufficient driver memory. When a job reads a large number of small files, the Spark driver keeps metadata for every file it reads in memory. This can put a lot of pressure on driver memory and cause HTTP/API calls to fail. I'd recommend checking the "glue.driver.jvm.heap.usage" and "glue.ALL.jvm.heap.usage" metrics in the CloudWatch console [1] to see what the memory usage looks like.
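If you would rather pull those two metrics from a script than from the console, something like the sketch below may help. It assumes boto3 is installed with credentials configured, and that the metrics are published under the "Glue" namespace with the JobName, JobRunId and Type dimensions described in [1]. The job name "my-glue-job" and the helper names are placeholders of mine, not anything from your job.

```python
from datetime import datetime, timedelta, timezone

HEAP_METRICS = ("glue.driver.jvm.heap.usage", "glue.ALL.jvm.heap.usage")


def heap_usage_query(job_name, metric_name, job_run_id="ALL", hours=3):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The dimension values are assumptions based on the Glue metrics
    documentation; adjust them if your job reports differently.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "Glue",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": job_run_id},
            {"Name": "Type", "Value": "gauge"},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average"],
    }


def print_heap_usage(job_name):
    """Print recent heap usage datapoints for the driver and for all nodes."""
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    for metric in HEAP_METRICS:
        response = cloudwatch.get_metric_statistics(
            **heap_usage_query(job_name, metric)
        )
        for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
            print(metric, point["Timestamp"], point["Average"])


if __name__ == "__main__":
    print_heap_usage("my-glue-job")  # placeholder job name
```

Job metrics need to be enabled on the job for any datapoints to show up. If the driver value stays close to its upper bound while the job is listing files, that would be consistent with the memory pressure described above.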
To fix this issue, I suggest considering the following optimizations:
- Use useS3ListImplementation. When AWS Glue lists files, it creates a file index in driver memory. If 'useS3ListImplementation' is set to True, AWS Glue doesn't cache the list of files in memory all at once. Instead, it caches the list in batches. This helps reduce out-of-memory errors in the Spark driver. [2]
- Change the worker configuration.
  a. Use the G.2X worker type. The G.2X worker type has more memory, which helps alleviate memory pressure on the driver.
  b. Increase the number of workers. You can refer to "glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors" [3] to determine the number of workers that maximizes parallelism and performance.
- Use bounded execution. If you still face the same issue, you can try bounded execution [4] to split the large number of files across multiple job runs. With this setting, you control how many files are processed in one Glue job run.
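To make the three suggestions above more concrete, here is a rough sketch of how they could look in a PySpark Glue script and in a boto3 call. The option keys 'useS3ListImplementation' and 'boundedFiles' are the ones described in [2] and [4]; the helper names, the limit of 500 files, the 10 workers, and the database, table and job names are illustrative assumptions of mine. As far as I know, bounded execution also relies on job bookmarks being enabled so that each run continues where the previous one stopped.

```python
def build_read_options(use_s3_list=True, bounded_files=None):
    """Build the additional_options dict for a Glue S3 read."""
    options = {}
    if use_s3_list:
        # List S3 objects in batches instead of caching the whole list.
        options["useS3ListImplementation"] = True
    if bounded_files is not None:
        # Bounded execution: cap the number of files read in one job run.
        # The Glue examples pass this value as a string.
        options["boundedFiles"] = str(bounded_files)
    return options


def build_run_overrides(job_name, worker_type="G.2X", number_of_workers=10):
    """Build keyword arguments for boto3 glue.start_job_run."""
    return {
        "JobName": job_name,
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


def read_small_files(glue_context, database, table_name):
    """Read a catalog table with batched listing and a 500-file cap.

    glue_context is an awsglue GlueContext created inside the Glue job.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options=build_read_options(bounded_files=500),
        transformation_ctx="datasource0",  # used by job bookmarks
    )


def start_run_with_bigger_workers(job_name):
    """Start a job run on G.2X workers; requires boto3 and credentials."""
    import boto3  # imported lazily so the builders work without boto3

    glue = boto3.client("glue")
    return glue.start_job_run(**build_run_overrides(job_name))["JobRunId"]
```

Inside the job script you would call read_small_files(glueContext, "my_database", "my_table") in place of the existing create_dynamic_frame call. The worker type and count can also be changed in the job configuration instead of per run.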
I hope this information helps you resolve the issue. If the issue persists after you have implemented the optimizations above, I'd recommend opening a support case with the Glue job details. Our support team will be happy to help you troubleshoot further and resolve the issue.
Ref:
[1] https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html#:~:text=all%20Spark%20executors.-,AWS%20Glue%20Metrics,-AWS%20Glue%20profiles
[2] https://aws.amazon.com/premiumsupport/knowledge-center/glue-oom-java-heap-space-error/
[3] https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html#glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors
[4] https://docs.aws.amazon.com/glue/latest/dg/bounded-execution.html