Hi,
I understand you wish to use python-oracledb in your Glue PySpark ETL job. I ran some tests in my test environment and can confirm this can be done with either of the following approaches:
- If your Glue job runs in a VPC subnet with public internet access (a NAT gateway is required, since Glue workers don't have public IP addresses [1]), you can specify the job parameter like this:
Key: --additional-python-modules
Value: oracledb
- If your Glue job runs in a VPC without internet access, you must create a Python repository on Amazon S3 by following this documentation [2] and include oracledb in your "modules_to_install.txt" file. You should then be able to install the package from your own Python repository on S3 by using the following parameters. (Make sure to replace MY-BUCKET with your real bucket name.)
"--additional-python-modules" : "oracledb",
"--python-modules-installer-option" : "--no-index --find-links=http://MY-BUCKET.s3-website-us-east-1.amazonaws.com/wheelhouse --trusted-host MY-BUCKET.s3-website-us-east-1.amazonaws.com"
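For illustration, the two parameter sets above could be assembled with a small helper like the one below. This is only a sketch: the function name `build_glue_arguments` and its arguments are hypothetical, and the region and `wheelhouse` path simply mirror the example values above.

```python
def build_glue_arguments(bucket=None, region="us-east-1", modules=("oracledb",)):
    """Build Glue job parameters for installing extra Python modules.

    Hypothetical helper, not part of any AWS SDK.
    With bucket=None the job is assumed to have internet access (via NAT).
    With a bucket name, pip is pointed at an S3 static-website repository.
    """
    args = {"--additional-python-modules": ",".join(modules)}
    if bucket:
        # Assumed layout: wheels are served from /wheelhouse on the S3 website endpoint.
        host = f"{bucket}.s3-website-{region}.amazonaws.com"
        args["--python-modules-installer-option"] = (
            f"--no-index --find-links=http://{host}/wheelhouse --trusted-host {host}"
        )
    return args


# Job in a subnet with internet access:
online_args = build_glue_arguments()

# Job in a VPC without internet access (replace MY-BUCKET with your bucket):
offline_args = build_glue_arguments(bucket="MY-BUCKET")
```

The resulting dictionary could then be supplied as the job's default arguments, for example through the `DefaultArguments` field when creating the job with boto3.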
Ref:
[1] https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/
Hello,
Thank you for your question. My name is Yvonne, from the RDS team.
From your question I understand that you encountered the error "ModuleNotFoundError: No module named 'oracledb'" and also noticed the message "it is not supported" in the log while trying to include python-oracledb in your Glue job, so you want to know when it will be supported.
Unfortunately, I am not able to provide a timeline, as our development team sets its own schedule. However, we announce all new features upon release in the blogs below [1] [2].
Please note that '--additional-python-modules' applies to Spark Glue jobs running Glue version 2.0 or 3.0. You can include external Python libraries as described in link [3].
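To confirm whether the module was actually installed on the workers, one option is a quick check at the top of the job script. The sketch below uses only the standard library; the helper names `module_available` and `require_module` are hypothetical, and the error text is just an example.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def require_module(name):
    """Fail early with a hint if the module was not installed for this job."""
    if not module_available(name):
        raise ModuleNotFoundError(
            f"No module named '{name}'. Check the --additional-python-modules "
            "job parameter and the Glue version of this job."
        )
```

Calling `require_module("oracledb")` before the rest of the script runs would surface a missing installation immediately rather than partway through the job.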
For supported versions please refer to the below documentation:
[+] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html
In case you require further assistance or have any queries, feel free to respond back to the case and I will be happy to assist you.
References:
[1] https://aws.amazon.com/new/
[2] https://aws.amazon.com/blogs/aws/
[3] https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html#reduced-start-times-limitations
Just found the error in the log: it is not supported. Any idea if it will ever be supported?