AWS Glue Python shell script unable to connect to Oracle DB


Hello All,

I am trying to create a small Python shell script to connect to an Oracle DB. In the code I have import oracledb.

Since this module is not available in Glue, I ran pip install oracledb on my local system, and an oracledb folder was generated in C:\Dev\Python3.11\Lib\site-packages. I then copied that folder to an S3 bucket and passed the bucket path like this: s3://XXXXXXXX/oracledb/. Now when I run the job I am getting the error below: CommandFailedException: Library file doesn't exist: /tmp/glue-python-libs-jmz3/
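For context, here is a minimal sketch of the kind of connection such a script would attempt. The host, port, service name, and credentials are made-up placeholders, and the oracledb call itself is shown commented out, since the module first has to be made available to the job:

```python
# Hedged sketch of the connection the script presumably attempts; the host,
# port, service name, and credentials are made-up placeholders.
def build_dsn(host: str, port: int, service: str) -> str:
    """EZConnect-style DSN string accepted by oracledb.connect(dsn=...)."""
    return f"{host}:{port}/{service}"

dsn = build_dsn("my-db-host.example.com", 1521, "ORCLPDB1")
print(dsn)  # my-db-host.example.com:1521/ORCLPDB1

# Once the module is actually installed on the workers:
# import oracledb
# with oracledb.connect(user="app_user", password="secret", dsn=dsn) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1 FROM dual")
#         print(cur.fetchone())
```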

Asked a year ago · 929 views
2 Answers

Hello,

I understand you wish to use python-oracledb in your Glue PySpark ETL job. This can be done with either of the following approaches:

  1. If your Glue job runs in a VPC subnet with public internet access (a NAT gateway is required, since Glue workers don't have public IP addresses [1]), you can specify the job parameter like this:
Key:  --additional-python-modules
Value:  oracledb
  2. If your Glue job runs in a VPC without internet access, you must create a Python repository on Amazon S3 by following this documentation [2] and include oracledb in your "modules_to_install.txt" file. Then you should be able to install the package from your own Python repository on S3 by using the following parameters (make sure to replace MY-BUCKET with the real bucket name for your use case):
"--additional-python-modules" : "oracledb",
"--python-modules-installer-option" : "--no-index --find-links=http://MY-BUCKET.s3-website-us-east-1.amazonaws.com/wheelhouse --trusted-host MY-BUCKET.s3-website-us-east-1.amazonaws.com"
  • Since you are seeing the "CommandFailedException: Library file doesn't exist:" error, please also check the IAM permissions for Glue and for the S3 object.

  • Unless a library is contained in a single .py file, it should be packaged in a .zip archive [3]. Please try creating a zip file and use Python 3.9. To use extra Python files, set the job parameters as follows:

key: --extra-py-files 
value: s3://<bucket_name>/etl_jobs/my_etl_job.zip
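As a sketch (not an official AWS recipe), the two parameters from approach 2 can be assembled into the job's DefaultArguments and applied via boto3. The bucket name, job name, and region are placeholders, and the actual API call is shown commented out:

```python
# Illustrative helper, not official AWS code: build the two job parameters for
# installing oracledb from an S3 wheelhouse (MY-BUCKET is a placeholder).
def oracledb_job_args(bucket: str, region: str = "us-east-1") -> dict:
    host = f"{bucket}.s3-website-{region}.amazonaws.com"
    return {
        "--additional-python-modules": "oracledb",
        "--python-modules-installer-option": (
            f"--no-index --find-links=http://{host}/wheelhouse --trusted-host {host}"
        ),
    }

args = oracledb_job_args("MY-BUCKET")

# Applying them (UpdateJob overwrites the whole job definition, so fields like
# Role and Command must be carried over from the existing job):
# import boto3
# glue = boto3.client("glue")
# job = glue.get_job(JobName="my-oracle-job")["Job"]  # placeholder job name
# glue.update_job(
#     JobName="my-oracle-job",
#     JobUpdate={
#         "Role": job["Role"],
#         "Command": job["Command"],
#         "DefaultArguments": {**job.get("DefaultArguments", {}), **args},
#     },
# )
```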

References:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/
  2. https://aws.amazon.com/blogs/big-data/building-python-modules-from-a-wheel-for-spark-etl-workloads-using-aws-glue-2-0/
  3. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-zipping
  4. https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/
  5. https://stackoverflow.com/questions/61217834/how-to-use-extra-files-for-aws-glue-job
  6. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html

To troubleshoot further by looking at the logs on the backend, please feel free to open a support case with AWS, including the sanitized script, the job run details, and any additional dependencies you are trying to import, and we would be happy to help.

AWS
Support Engineer
Answered a year ago

Regarding approach 1), I am not finding the --additional-python-modules key in the AWS console. Has its name changed?

key: --extra-py-files value: s3://<bucket_name>/etl_jobs/my_etl_job.zip

I also tried adding a zip file for oracledb to the S3 bucket, but it is giving ModuleNotFoundError: No module named 'oracledb'. After adding this file, do I need to change something in my script so that it reads from this file?

When adding a wheel file, I am getting this error:

ImportError: cannot import name 'base_impl' from partially initialized module 'oracledb' (most likely due to a circular import) (/glue/lib/installation/oracledb/__init__.py)

Answered a year ago
  • You can use --additional-python-modules even if it is not offered in the console. The reason you get that 'base_impl' import error is that the package cannot find its native .so file; it needs to be precompiled for the worker platform, which is why it's better to just install it from pip.
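To expand on the comment above: python-oracledb ships a compiled extension (the base_impl module), so a package copied from a Windows site-packages folder cannot load on Glue's Linux workers. Wheel filenames encode the interpreter and platform tags (PEP 427), so a quick check of which wheel was staged might look like the following sketch (the filenames and helper functions are illustrative, not part of any AWS tooling):

```python
# Illustrative check, not part of any AWS tooling: wheel filenames encode the
# interpreter and platform tags (PEP 427), so a Windows wheel is easy to spot.
def wheel_platform(filename: str) -> str:
    """Return the platform tag of a wheel filename, e.g. 'win_amd64'."""
    stem = filename[:-len(".whl")]
    return stem.split("-")[-1]

def runs_on_glue_linux(filename: str) -> bool:
    """Glue workers are Linux x86_64, so only manylinux (or pure-Python) wheels load."""
    tag = wheel_platform(filename)
    return "manylinux" in tag or tag == "any"

print(runs_on_glue_linux("oracledb-1.4.2-cp39-cp39-win_amd64.whl"))  # False
print(runs_on_glue_linux(
    "oracledb-1.4.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))  # True
```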
