
AWS Glue 5 - Command failed with exit code 10 with --additional-python-modules


I'm trying to set up a Glue job that needs an additional package (PyPDF2), but I keep getting "Command failed with exit code 10" BEFORE anything in the script runs.

Since I'm still exploring the service, I'm doing everything from the console before moving to IaC (Terraform). Update 2024/12/11: I also set up the same job with Terraform and got the same result.

As stated in the documentation here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#addl-python-modules-requirements-txt

I've added two job parameters:

  • Key: --python-modules-installer-option, Value: -r
  • Key: --additional-python-modules, Value: s3://aws-glue-config-bucket/requirements.txt
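For the later Terraform/SDK step, the two parameters above map onto a job's `DefaultArguments`. A minimal sketch using boto3's Glue `create_job` call; the job name, role ARN, script path and worker sizing are hypothetical placeholders, and the client is passed in so the function can be exercised without AWS credentials. Note that this reproduces the documented (failing) setup, not a fix.

```python
from typing import Any

# The two job parameters from the question, exactly as the documentation describes them.
DOCUMENTED_ARGS: dict[str, str] = {
    "--python-modules-installer-option": "-r",
    "--additional-python-modules": "s3://aws-glue-config-bucket/requirements.txt",
}


def create_glue_job(glue_client: Any, name: str, role_arn: str,
                    script_location: str, default_args: dict[str, str]) -> dict:
    """Create a Glue 5.0 Spark job; glue_client is expected to be boto3.client("glue")."""
    return glue_client.create_job(
        Name=name,
        Role=role_arn,
        GlueVersion="5.0",
        Command={"Name": "glueetl", "ScriptLocation": script_location, "PythonVersion": "3"},
        DefaultArguments=default_args,
        WorkerType="G.1X",  # hypothetical sizing, small enough for a debug job
        NumberOfWorkers=2,
    )


# Usage (all names below are placeholders):
# import boto3
# create_glue_job(boto3.client("glue"), "pdf-debug-job",
#                 "arn:aws:iam::<account-id>:role/<glue-role>",
#                 "s3://<script-bucket>/debug.py", DOCUMENTED_ARGS)
```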

Note: Printing the content of the requirements.txt file from the script itself works.
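One way to do that kind of sanity check (confirming the job role can read the file) is boto3's S3 `get_object` call inside the script. A minimal sketch; `read_requirements` is a hypothetical helper, and the client is passed in (e.g. `boto3.client("s3")`) so it can be tested without AWS.

```python
from typing import Any


def read_requirements(s3_client: Any, s3_uri: str) -> list[str]:
    """Return the non-empty, non-comment lines of a requirements file stored in S3.

    s3_client is expected to be boto3.client("s3").
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    # "s3://bucket/key/parts" -> ("bucket", "key/parts")
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    lines = (line.strip() for line in body.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


# Usage inside the job script (bucket and key taken from the question):
# import boto3
# print(read_requirements(boto3.client("s3"), "s3://aws-glue-config-bucket/requirements.txt"))
```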

For a minimal reproduction case:

  • I have a debug script with only: print("DEBUG PRINT REACHED")
  • The job's execution role has administrator privileges (for debugging only) and a trust relationship with glue.amazonaws.com

Is there anything else I should check?

Update 2024/12/11: See my comment for a workaround, but this needs an AWS fix and/or a documentation update.

  • With Glue 4, I get a clearer log message, but I still can't see how to fix this since I'm already using the "-r" flag:

    • com.amazonaws.services.glue.PythonModuleInstaller [main] pip3 install -r --user /tmp/requirements.txt
    • com.amazonaws.services.glue.PythonModuleInstaller [main] Defaulting to user installation because normal site-packages is not writeable
    • com.amazonaws.services.glue.PythonModuleInstaller [main] ERROR: Invalid requirement: '/tmp/requirements.txt' Hint: It looks like a path. The path does exist. The argument you provided (/tmp/requirements.txt) appears to be a requirements file. If that is the case, use the '-r' flag to install the packages specified within it.
  • Inspecting the generated commands, it seems the --user flag is injected by AWS Glue in the wrong place (i.e., right after "-r"), which means pip takes "--user" as the requirements file path instead of "/tmp/requirements.txt".

    Here is the matching log output with Glue 5:

    • PythonModuleInstaller: pip3.11 install -r --user https://<fully-public-url>/requirements.txt
    • PythonModuleInstaller: ERROR: Could not open requirements file: [Errno 2] No such file or directory: '--user'

  • I confirmed the previous statement with this test:

    It now works with: Key: --python-modules-installer-option, Value: -r s3://<private-bucket>/requirements.txt

    With this, --user is no longer inserted between "-r" and the file path. This really needs a fix, since following the documentation is literally how to reproduce the problem (which is why I'm not posting this as an answer: it's just a workaround).
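The argument-order problem described above can be reproduced locally. pip's command-line parsing is built on Python's optparse, where an option that takes a value consumes the next argument even if it starts with `--`. The sketch below is a simplified model, not Glue's actual installer code: `build_pip_command` assumes Glue appends `--user` and the install target right after the installer option, as the logged commands suggest, and `parse_like_pip` is a stripped-down stand-in for pip's parser that only knows `-r` and `--user`.

```python
import optparse
import shlex


def build_pip_command(installer_option: str, target: str) -> list[str]:
    # HYPOTHETICAL model of the logged command:
    # "pip3.11 install <installer option> --user <target>"
    return ["pip3.11", "install", *shlex.split(installer_option), "--user", target]


def parse_like_pip(argv: list[str]) -> tuple[list[str], bool, list[str]]:
    # Simplified stand-in for pip's optparse-based parser:
    # -r takes exactly one value, --user is a flag, the rest are requirement specifiers.
    parser = optparse.OptionParser()
    parser.add_option("-r", "--requirement", dest="requirements", action="append", default=[])
    parser.add_option("--user", dest="user", action="store_true", default=False)
    opts, args = parser.parse_args(argv)
    return opts.requirements, opts.user, args


# Documented setup: installer option "-r", target is the downloaded file.
broken = build_pip_command("-r", "/tmp/requirements.txt")
print(parse_like_pip(broken[2:]))
# prints (['--user'], False, ['/tmp/requirements.txt'])

# Workaround: the path travels inside the installer option itself.
# "<additional-python-modules value>" is a placeholder; the post does not say what it held.
fixed = build_pip_command("-r s3://<private-bucket>/requirements.txt",
                          "<additional-python-modules value>")
print(parse_like_pip(fixed[2:]))
```

In the documented setup, `-r` receives `--user` as its file name and the real path is left over as a bare requirement, which matches both the Glue 4 ("Invalid requirement") and Glue 5 ("No such file or directory: '--user'") errors. With the workaround, `-r` gets the path and `--user` is parsed as a flag again.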

asked a year ago, 562 views
1 answer

I understand you're encountering issues with setting up an AWS Glue 5.0 job that requires an additional Python package (PyPDF2). The "Command failed with exit code 10" error occurring before your script runs suggests there might be a problem with the job configuration or the way the additional Python modules are being installed.

Let's go through a few things to check and try:

  1. Ensure your requirements.txt file is correctly formatted and contains only the package you need: PyPDF2==version_number

  2. Double-check that the S3 path to your requirements.txt file is correct and that the Glue job has permissions to access it.

  3. Instead of using the requirements.txt file, you could try specifying the package directly in the --additional-python-modules parameter: Key: --additional-python-modules, Value: PyPDF2==version_number

  4. Verify that you're using the correct job parameters for AWS Glue 5.0. The parameters you've provided look correct, but make sure they're exactly as follows: Key: --python-modules-installer-option, Value: -r; Key: --additional-python-modules, Value: s3://aws-glue-config-bucket/requirements.txt

  5. Check the CloudWatch logs for the Glue job. Even though your script isn't running, there might be error messages in the logs that could provide more insight into why the command is failing.

  6. Ensure you've selected the correct Glue version (5.0) when setting up your job.

  7. Try running a Glue job without any additional Python modules to ensure the basic configuration is working correctly.

  8. If possible, test the job using the AWS CLI or AWS SDK to rule out any potential issues with the console interface.
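Suggestions 3, 7 and 8 can be combined into one SDK run: start the job with a run-level `--additional-python-modules` override and poll until the run reaches a terminal state. A minimal sketch with boto3's Glue `start_job_run` and `get_job_run` calls; the job name is hypothetical, and it assumes `--python-modules-installer-option` has been removed from the job's default arguments, since run arguments only override the keys they set and a leftover `-r` would presumably still be placed before `--user`.

```python
import time
from typing import Any

# Run states after which a Glue job run will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "ERROR", "TIMEOUT", "STOPPED", "EXPIRED"}


def run_and_wait(glue_client: Any, job_name: str, arguments: dict[str, str],
                 poll_seconds: float = 30.0) -> dict:
    """Start a run with per-run argument overrides and return the final JobRun dict.

    glue_client is expected to be boto3.client("glue").
    """
    run_id = glue_client.start_job_run(JobName=job_name, Arguments=arguments)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] in TERMINAL_STATES:
            return run
        time.sleep(poll_seconds)


# Usage (hypothetical job name; replace version_number with a real PyPDF2 version):
# import boto3
# run = run_and_wait(boto3.client("glue"), "pdf-debug-job",
#                    {"--additional-python-modules": "PyPDF2==version_number"})
# print(run["JobRunState"], run.get("ErrorMessage"))
```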

If none of these suggestions resolve the issue, you may need to contact AWS support for further assistance, as there could be an underlying problem with the Glue environment or configuration that isn't immediately apparent.
Sources
Introducing AWS Glue 5.0 for Apache Spark | AWS Big Data Blog
Using Python libraries with AWS Glue - AWS Glue
Using job parameters in AWS Glue jobs - AWS Glue

answered a year ago
  • I've done all the checks from 1 to 7, and if the console doesn't work, I'll probably consider another service provider before checking 8.

