AWS Glue Python Shell – Oracle DB Connection Solution

2 minute read
Content level: Intermediate
0

This document lists the challenges and solution to connecting an Oracle DB from Glue Python shell.

Problem Statement: Glue Python shell has to use the ‘Oracle’ python package to connect to Oracle database from the python shell jobs and until the ‘cx_Oracle’ version by default Oracle client library had to be present on the server/node for the ‘cx_Oracle’ package to work. Unfortunately installing an Oracle client is not an option due to the serverless nature of Glue and hence the below discussed solution. Note: The latest version of python package ‘python-oracledb’ version provides a thin client but this version is not yet supported within Glue Python shell. Work Around for ‘cx_Oracle’ Package: Fundamental change to make ‘cx_Oracle’ work is to re-point the package to use Oracle client library files from a given set of files instead of the default client installation. Below mentioned activities are required for the workaround.

  1. Get Oracle client archive, patchelf and cx_Oracle packages.
  2. Remove symlinks and move the file they point to the one that is going to be looked up by ‘cx_oracle’.
  3. Patch the ‘rpath’ to point to a static directory
  4. Add libaio.so.1 in the archive
  5. Zip the archive
  6. Upload both client archive and cx-oracle to S3 bucket. a. Put the S3 URL to the archive in "Referenced files path" configuration parameter b. Put the S3 URL to the cx-Oracle wheel in "Python library path" configuration parameter
  7. Configure Glue Job by adding a bit of code to the glue job to set up the libraries. This code must be executed before any usage of cx-Oracle

Code Snippets:

Library Prep: wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/p/patchelf-0.12-1.el7.x86_64.rpm sudo yum install patchelf-0.12-1.el7.x86_64.rpm python3 -m venv cxora18 source cxora18/bin/activate python3 -m pip install --upgrade pip python3 -m pip install cx_Oracle cd ~/cxora18/lib64/python3.7/site-packages wget https://download.oracle.com/otn_software/linux/instantclient/185000/instantclient-basic-linux.x64-18.5.0.0.0dbru.zip unzip instantclient-basic-linux.x64-18.5.0.0.0dbru.zip mkdir oralib cp /lib64/libaio.so.1 ./oralib cp .so ./oralib cp -r ./instantclient_18_5 ./oralib cd oralib patchelf --set-rpath /tmp .so chmod 755 -R ./instantclient_18_5/.so patchelf --set-rpath /tmp ./instantclient_18_5/.so aws s3 cp . s3://YOUR S3 BUCKET*/lib-ora18/ --recursive

Glue Job Code:

import zipfile
import os
from pathlib import Path
import glob
import cx_Oracle

filename = 'instantclient-basic-linuxx64_patched.zip'
oracle_archive = next(Path('/tmp').glob(f'**/{filename}'))
with zipfile.ZipFile(oracle_archive, 'r') as f:
    f.extractall('/tmp/libs')
files = glob.glob('/tmp/**/*', recursive = True)
for file in files:
    if file.find('/tmp/libs/instantclient_18_5/') > -1 and file.find('network') < 0:
        os.rename(file, "/tmp/" + file.split("/")[-1])
cx_Oracle.init_oracle_client(lib_dir='/tmp')
AWS
EXPERT
AWSRB
published a year ago2574 views
1 Comment

What about using the python-oracledb instead ? It's the "evolution" of cx_oracle. It has a 'thin' mode which does not require the Oracle Client Library thus solves this problem.

https://oracle.github.io/python-oracledb/

replied 8 months ago