AWS Glue Python shell script unable to connect to Oracle DB


Hello All,

I am trying to create a small Python shell script to connect to an Oracle DB. In the code I have import oracledb.

Since this module is not available in Glue, I ran pip install oracledb on my local system, and an oracledb folder was generated in C:\Dev\Python3.11\Lib\site-packages. I then copied that folder to an S3 bucket and passed the bucket path like this: s3://XXXXXXXX/oracledb/. Now when I run the job I am getting the error below: CommandFailedException: Library file doesn't exist: /tmp/glue-python-libs-jmz3/
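For context, here is a minimal sketch of the kind of connection such a script would attempt. The host, port, service name, and credentials are made-up placeholders, and the oracledb call itself is shown commented out, since the module first has to be made available to the job:

```python
# Hedged sketch of the connection the script presumably attempts; the host,
# port, service name, and credentials are made-up placeholders.
def build_dsn(host: str, port: int, service: str) -> str:
    """EZConnect-style DSN string accepted by oracledb.connect(dsn=...)."""
    return f"{host}:{port}/{service}"

dsn = build_dsn("my-db-host.example.com", 1521, "ORCLPDB1")
print(dsn)  # my-db-host.example.com:1521/ORCLPDB1

# Once the module is actually installed on the workers:
# import oracledb
# with oracledb.connect(user="app_user", password="secret", dsn=dsn) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1 FROM dual")
#         print(cur.fetchone())
```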

Asked a year ago · 929 views
2 Answers

Hello,

I understand you wish to use python-oracledb in your Glue PySpark ETL job. This can be done with either of the following approaches:

  1. If your Glue job runs in a VPC subnet with public internet access (a NAT gateway is required, since Glue workers don't have public IP addresses [1]), you can specify the job parameter like this:
Key:  --additional-python-modules
Value:  oracledb
  2. If your Glue job runs in a VPC without internet access, you must create a Python repository on Amazon S3 by following this documentation [2] and include oracledb in your "modules_to_install.txt" file. Then you should be able to install the package from your own Python repository on S3 by using the following parameters (make sure to replace MY-BUCKET with the real bucket name for your use case):
"--additional-python-modules" : "oracledb",
"--python-modules-installer-option" : "--no-index --find-links=http://MY-BUCKET.s3-website-us-east-1.amazonaws.com/wheelhouse --trusted-host MY-BUCKET.s3-website-us-east-1.amazonaws.com"
  • Since you are seeing the "CommandFailedException: Library file doesn't exist:" error, please also check the IAM permissions for Glue and for the S3 object.

  • Unless a library is contained in a single .py file, it should be packaged in a .zip archive [3]. Please try creating a zip file and use Python 3.9. To use extra Python files, set the job parameters as follows:

key: --extra-py-files 
value: s3://<bucket_name>/etl_jobs/my_etl_job.zip
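As a sketch (not an official AWS recipe), the two parameters from approach 2 can be assembled into the job's DefaultArguments and applied via boto3. The bucket name, job name, and region are placeholders, and the actual API call is shown commented out:

```python
# Illustrative helper, not official AWS code: build the two job parameters for
# installing oracledb from an S3 wheelhouse (MY-BUCKET is a placeholder).
def oracledb_job_args(bucket: str, region: str = "us-east-1") -> dict:
    host = f"{bucket}.s3-website-{region}.amazonaws.com"
    return {
        "--additional-python-modules": "oracledb",
        "--python-modules-installer-option": (
            f"--no-index --find-links=http://{host}/wheelhouse --trusted-host {host}"
        ),
    }

args = oracledb_job_args("MY-BUCKET")

# Applying them (UpdateJob overwrites the whole job definition, so fields like
# Role and Command must be carried over from the existing job):
# import boto3
# glue = boto3.client("glue")
# job = glue.get_job(JobName="my-oracle-job")["Job"]  # placeholder job name
# glue.update_job(
#     JobName="my-oracle-job",
#     JobUpdate={
#         "Role": job["Role"],
#         "Command": job["Command"],
#         "DefaultArguments": {**job.get("DefaultArguments", {}), **args},
#     },
# )
```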

References:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/
  2. https://aws.amazon.com/blogs/big-data/building-python-modules-from-a-wheel-for-spark-etl-workloads-using-aws-glue-2-0/
  3. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-zipping
  4. https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/
  5. https://stackoverflow.com/questions/61217834/how-to-use-extra-files-for-aws-glue-job
  6. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html

To troubleshoot further by looking at the logs on the backend, please feel free to open a support case with AWS, including the sanitized script, the job run details, and any additional dependencies you are trying to import, and we would be happy to help.

AWS
Support Engineer
Answered a year ago

Regarding approach 1), I am not finding the --additional-python-modules key in the AWS console. Has its name changed?

key: --extra-py-files value: s3://<bucket_name>/etl_jobs/my_etl_job.zip

I also tried adding a zip file for oracledb to the S3 bucket, but it is giving ModuleNotFoundError: No module named 'oracledb'. After adding this file, do I need to change something in my script so that it reads from this file?

When adding a wheel file, I am getting this error:

ImportError: cannot import name 'base_impl' from partially initialized module 'oracledb' (most likely due to a circular import) (/glue/lib/installation/oracledb/__init__.py)

Answered a year ago
  • You can use --additional-python-modules even if it is not offered in the console. The reason you get that 'base_impl' import error is that the package cannot find its native .so file; it needs to be precompiled for the worker platform, which is why it's better to just install it from pip.
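To expand on the comment above: python-oracledb ships a compiled extension (the base_impl module), so a package copied from a Windows site-packages folder cannot load on Glue's Linux workers. Wheel filenames encode the interpreter and platform tags (PEP 427), so a quick check of which wheel was staged might look like the following sketch (the filenames and helper functions are illustrative, not part of any AWS tooling):

```python
# Illustrative check, not part of any AWS tooling: wheel filenames encode the
# interpreter and platform tags (PEP 427), so a Windows wheel is easy to spot.
def wheel_platform(filename: str) -> str:
    """Return the platform tag of a wheel filename, e.g. 'win_amd64'."""
    stem = filename[:-len(".whl")]
    return stem.split("-")[-1]

def runs_on_glue_linux(filename: str) -> bool:
    """Glue workers are Linux x86_64, so only manylinux (or pure-Python) wheels load."""
    tag = wheel_platform(filename)
    return "manylinux" in tag or tag == "any"

print(runs_on_glue_linux("oracledb-1.4.2-cp39-cp39-win_amd64.whl"))  # False
print(runs_on_glue_linux(
    "oracledb-1.4.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))  # True
```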
