AWS Glue Python shell script unable to connect to Oracle DB


Hello All,

I am trying to create a small Python shell script to connect to an Oracle DB. In the code I have import oracledb.

Since this module is not available in Glue by default, I ran pip install oracledb on my local system, and an oracledb folder was generated in C:\Dev\Python3.11\Lib\site-packages. I then copied that folder to an S3 bucket and passed the bucket path like this: s3://XXXXXXXX/oracledb/. Now when I run the job I am getting the following error: CommandFailedException: Library file doesn't exist: /tmp/glue-python-libs-jmz3/

asked a year ago · 891 views
2 Answers

Hello,

I understand you wish to use python-oracledb in your Glue PySpark ETL job. It can be done with either of the following approaches:

  1. If your Glue job runs in a VPC subnet with public internet access (a NAT gateway is required, since Glue workers don't have public IP addresses [1]), you can specify the job parameter like this:
Key:  --additional-python-modules
Value:  oracledb
  2. If your Glue job runs in a VPC without internet access, you must create a Python repository on Amazon S3 by following this documentation [2] and include oracledb in your "modules_to_install.txt" file. Then you should be able to install the package from your own Python repository on S3 by using the following parameters (make sure to replace MY-BUCKET with the real bucket name for your use case):
"--additional-python-modules" : "oracledb",
"--python-modules-installer-option" : "--no-index --find-links=http://MY-BUCKET.s3-website-us-east-1.amazonaws.com/wheelhouse --trusted-host MY-BUCKET.s3-website-us-east-1.amazonaws.com"
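As a sketch, the wheelhouse from [2] can be built locally and synced to S3 roughly as follows. The bucket name MY-BUCKET is a placeholder; the pip and aws commands are commented out here because they need network access and AWS credentials:

```shell
# List the modules the Glue job needs (file name from the blog post in [2]).
echo "oracledb" > modules_to_install.txt

# Build wheels for every listed module into a local "wheelhouse" directory.
# pip wheel -r modules_to_install.txt -w wheelhouse/

# Upload the wheelhouse to the S3 bucket behind the static website endpoint,
# so --python-modules-installer-option can point pip at it with --find-links.
# aws s3 sync wheelhouse/ s3://MY-BUCKET/wheelhouse/
```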
  • As you are facing the "CommandFailedException: Library file doesn't exist:" error, please also check the IAM permissions for Glue and for the S3 object.

  • Unless a library is contained in a single .py file, it should be packaged in a .zip archive [3]. Please try creating zip files and use Python 3.9. To use extra Python files, set the job parameter as follows:

key: --extra-py-files 
value: s3://<bucket_name>/etl_jobs/my_etl_job.zip
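To illustrate the zip layout Glue expects with --extra-py-files, here is a minimal, self-contained sketch (the package name mylib is hypothetical): the package directory must sit at the root of the archive, because Glue effectively puts the zip itself on sys.path. Note that this only works for pure-Python code; a package that ships compiled .so files, like oracledb, cannot be imported from a zip this way.

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny pure-Python package, zipped the way --extra-py-files expects:
# the package directory ("mylib/") at the root of the archive.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "my_etl_job.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mylib/__init__.py", "VALUE = 42\n")

# Glue adds each --extra-py-files archive to sys.path, roughly like this:
sys.path.insert(0, zip_path)
import mylib

print(mylib.VALUE)  # 42
```

If the package directory is nested one level deeper inside the zip (for example build/mylib/), the import fails with ModuleNotFoundError, which matches the symptom described in the follow-up below.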

References:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/
  2. https://aws.amazon.com/blogs/big-data/building-python-modules-from-a-wheel-for-spark-etl-workloads-using-aws-glue-2-0/
  3. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-zipping
  4. https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/
  5. https://stackoverflow.com/questions/61217834/how-to-use-extra-files-for-aws-glue-job
  6. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html

In order for me to troubleshoot further by taking a look at the logs in the backend, please feel free to open a support case with AWS, including the sanitized script, the job run ID, and any additional dependencies you are trying to import, and we would be happy to help.

AWS
SUPPORT ENGINEER
answered a year ago

Regarding approach 1: I am not finding the --additional-python-modules key in the AWS console. Has its name been changed?

key: --extra-py-files value: s3://<bucket_name>/etl_jobs/my_etl_job.zip

I also tried adding a zip file for oracledb to the S3 bucket, but it gives ModuleNotFoundError: No module named 'oracledb'. After adding this file, do I need to change something in my script so that it reads from this file?

When adding a wheel file I get this error:

ImportError: cannot import name 'base_impl' from partially initialized module 'oracledb' (most likely due to a circular import) (/glue/lib/installation/oracledb/__init__.py)

answered a year ago
  • You can use --additional-python-modules even if it is not offered in the console. The reason you get that 'base_impl' import error is that Python cannot find the native .so file; the package needs to be precompiled for the worker platform, which is why it is better to just install it from pip.
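Even without a console dropdown, the parameter can be set programmatically. Here is a hedged boto3 sketch (the job name my-oracle-job is an assumption, and the import and API calls are commented out because they need the AWS SDK installed and valid credentials):

```python
# import boto3  # AWS SDK for Python; needed for the commented-out calls below

# Job parameter: install python-oracledb from PyPI when the job starts.
default_args = {
    "--additional-python-modules": "oracledb",
}

# glue = boto3.client("glue")
# job = glue.get_job(JobName="my-oracle-job")["Job"]
# job["DefaultArguments"].update(default_args)
# glue.update_job(JobName="my-oracle-job", JobUpdate={...})  # merge the args in

print(default_args["--additional-python-modules"])  # oracledb
```

The same key/value pair can also be typed in manually under "Job parameters" in Glue Studio's job details, even when it is not auto-suggested.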
