In AWS Glue 5.0, there isn't a direct flag to extend the 10-minute timeout for Python module installation. However, there are a few approaches you can try to address this issue:
- Use a requirements.txt file: Instead of specifying the modules directly in the job parameters, you can create a requirements.txt file with your dependencies and upload it to S3. Then, use the following job parameters:
  --python-modules-installer-option -r --additional-python-modules s3://path_to_your_requirements.txt
  This method allows AWS Glue to manage dependencies more efficiently.
- Pre-install heavy dependencies: For large packages like torch, scipy, or scikit-image, you might want to include them in the --additional-python-modules parameter separately, rather than as part of your custom wheel file. This can help distribute the installation load.
- Optimize your package: Try to minimize the dependencies in your custom package. If possible, include only the necessary modules and consider using lighter alternatives for some libraries.
- Use a custom runtime: If the above methods don't work, you might need to create a custom runtime for your Glue job that includes these dependencies pre-installed.
- Split dependencies: If possible, try splitting your dependencies across multiple smaller packages. This might allow each to install within the time limit.
Remember, the 10-minute timeout is a system limitation and there's no direct way to extend it. If these approaches don't resolve the issue, you may need to contact AWS support for further assistance or consider restructuring your job to work within these constraints.
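As a rough illustration of the requirements.txt approach, the two job parameters can be treated as separate key/value pairs, which is also how they would be entered one per row under the job's parameters in the console, or passed as `DefaultArguments` when creating the job through an SDK. The sketch below only builds that mapping; the helper name `requirements_job_arguments` and the bucket path are hypothetical, not part of any AWS API.

```python
def requirements_job_arguments(requirements_s3_uri: str) -> dict[str, str]:
    """Build the job-parameter mapping for installing from a requirements.txt.

    The helper name and the S3 path used below are illustrative assumptions;
    only the two parameter keys come from the answer above.
    """
    if not requirements_s3_uri.startswith("s3://"):
        raise ValueError("requirements file must be an s3:// URI")
    return {
        # Tells the installer to treat the module argument as a requirements file.
        "--python-modules-installer-option": "-r",
        # Points at the requirements.txt uploaded to S3.
        "--additional-python-modules": requirements_s3_uri,
    }


if __name__ == "__main__":
    # Hypothetical bucket and key, replace with your own location.
    args = requirements_job_arguments("s3://my-bucket/glue/requirements.txt")
    for key, value in args.items():
        print(f"{key} = {value}")
```

Each printed line corresponds to one key/value row in the job parameters, so the flag and the S3 path are entered as two parameters rather than one combined string.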
Sources
Introducing AWS Glue 5.0 for Apache Spark | AWS Big Data Blog
Using Python libraries with AWS Glue - AWS Glue
It looks like the packages you are trying to install have a lot of transitive dependencies, which have to be downloaded from the internet. If your job has no Glue connection, attach a Network-type Glue connection and make sure the connection's subnet routes outbound traffic through a NAT gateway. Then confirm that the job can actually reach the internet.
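One way to check outbound connectivity is to run a small probe at the start of the job script. This is a minimal sketch, assuming the packages come from the public PyPI index (pypi.org and files.pythonhosted.org); the function name `can_reach` is made up for illustration, and a private index would need its own host name substituted.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Hosts assumed here: the public PyPI index and its file host.
    for host in ("pypi.org", "files.pythonhosted.org"):
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}: {status}")
```

If either host is reported as not reachable, the subnet's route to the NAT gateway and the connection's security group rules are the first things to check.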
How would I add this flag pointing to my req.txt in the AWS Glue web UI? Would I need to import the .whl and req.txt as two separate commands?