The "No space left on device" error in your AWS Glue ETL job usually means the local disk on one or more workers has filled up. Apache Spark, which runs on the AWS Glue workers, writes shuffle files to local disk and also spills data there when it does not fit in memory. With significant data skew, a few executors receive most of that data and run out of disk space.
Here are some potential causes and solutions:
- Data skew: Your job might be experiencing significant data skew during processing, causing some executors to handle disproportionately large amounts of data. This is common with wide transformations like groupByKey(), reduceByKey(), and join() operations that cause data shuffling.
- Monitoring and debugging: Use AWS Glue job metrics and Apache Spark UI to identify data skew in the executors. These tools can help you understand the behavior of each stage, task, job, and executor in detail.
- Solutions to consider:
  - Disaggregate compute and storage: Use dedicated serverless storage like Amazon S3 to store Spark shuffle and spill data instead of writing to local disk.
  - Scale out: Consider increasing the number of workers or upgrading worker types, though this might not always work for heavily skewed data.
  - Reduce and filter input data: Minimize data shuffling and network usage during wide operations by filtering data early in your processing pipeline.
  - Use broadcasting for small tables: If you're joining tables and one is small (tens of MBs), consider broadcasting it to minimize network overhead.
  - Implement Adaptive Query Execution (AQE): This optimization technique can help resolve data skew and dynamically manage shuffle partitions.
- Check your partitioning strategy: Using repartition(1) forces a single core to handle all writing, which can significantly slow down the process and cause disk space issues. Consider repartitioning to match your worker configuration.
- Review resource allocation: With g.1x worker type and 50 workers, verify that this is still appropriate for your data volume and processing needs.
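To make "repartitioning to match your worker configuration" more concrete, here is a minimal sketch of how a target partition count could be estimated. The helper `suggested_partitions` is hypothetical (it is not part of any AWS Glue or Spark API). It assumes 4 vCPUs per G.1X worker, one worker reserved for the Spark driver, and about two tasks per core as a starting point. Treat all three as assumptions to check against your own job.

```python
def suggested_partitions(num_workers: int,
                         cores_per_worker: int = 4,
                         tasks_per_core: int = 2) -> int:
    """Rough target partition count (illustrative helper, not a Glue API).

    Assumptions: each worker exposes `cores_per_worker` cores (4 is assumed
    for G.1X), one worker hosts the Spark driver, and a few tasks per core
    keeps all executors busy without creating tiny partitions.
    """
    if num_workers < 2:
        raise ValueError("need at least 2 workers (one is assumed to host the driver)")
    executors = num_workers - 1  # assumption: one worker is used by the driver
    return executors * cores_per_worker * tasks_per_core


# 50 workers -> 49 executors * 4 cores * 2 tasks per core
print(suggested_partitions(50))
```

In the job itself, the result could be passed to something like `df.repartition(suggested_partitions(50))` before the write instead of `repartition(1)`, where `df` stands in for your own DataFrame.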
By monitoring your job's execution and implementing these optimizations, you should be able to resolve the "No space left on device" error and improve your Glue ETL job's performance.
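As one possible way to apply the S3 shuffle and AQE suggestions, the sketch below builds a set of job parameters that could be supplied as the job's default arguments. The function name `build_glue_arguments` and the bucket path are hypothetical. The parameter names (`--write-shuffle-files-to-s3`, `spark.shuffle.glue.s3ShuffleBucket`, and the `spark.sql.adaptive.*` settings) are given as I understand them from the AWS Glue and Spark documentation. Please verify them for your Glue version, since the supported shuffle options differ between versions. Chaining several settings inside a single `--conf` value is a commonly used workaround and is an assumption here as well.

```python
def build_glue_arguments(shuffle_bucket: str) -> dict:
    """Build Glue job arguments for S3-backed shuffle plus AQE (sketch only)."""
    spark_confs = [
        # Assumed name of the S3 shuffle location setting; check your Glue version.
        f"spark.shuffle.glue.s3ShuffleBucket={shuffle_bucket}",
        # Standard Spark SQL settings for Adaptive Query Execution.
        "spark.sql.adaptive.enabled=true",
        "spark.sql.adaptive.skewJoin.enabled=true",
        "spark.sql.adaptive.coalescePartitions.enabled=true",
    ]
    return {
        "--write-shuffle-files-to-s3": "true",
        # Glue takes one --conf key, so extra settings are chained in its value.
        "--conf": " --conf ".join(spark_confs),
    }


# Hypothetical bucket path, for illustration only.
args = build_glue_arguments("s3://my-shuffle-bucket/prefix/")
for key, value in args.items():
    print(key, value)
```

The resulting dictionary could be merged into the job's existing default arguments. Note that AQE only affects DataFrame and Spark SQL operations, so RDD-level calls such as groupByKey() would not benefit from it.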
Sources
Resolve "No space left on device" error in AWS Glue ETL job | AWS re:Post
Glue performance issue in recent times | AWS re:Post
Debugging OOM exceptions and job abnormalities - AWS Glue
