Is there a way to install unixodbc-dev and unixODBC-devel in AWS Glue Python Shell Jobs?


We have been able to connect to a Microsoft SQL Server DB using both Glue's DynamicFrame and Spark's own JDBC write option, thanks to the Glue connection option. However, we want to move this workload to AWS Glue Python Shell Jobs, where pyodbc seemed to be the way to connect to MS SQL. Whether we use pyodbc directly or AWS Wrangler (which, I believe, uses pyodbc), we get the following error:

pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")

I was referred to a document that specifically instructs installing the unixodbc-dev and unixODBC-devel packages before installing pyodbc: https://aws-sdk-pandas.readthedocs.io/en/stable/install.html#notes-for-microsoft-sql-server Is there detailed documentation on how to do this in a Python Shell job? I would be grateful for your guidance.

1 Answer

You cannot install system headers or packages in a Python Shell job, since that requires root access.
Instead, use a package that doesn't require the ODBC driver, such as pymssql.

To do so, in the job arguments (in the Job Details tab), add a parameter --additional-python-modules with the value pymssql.
Then you can use the library directly in your code. Reference:
https://learn.microsoft.com/en-us/sql/connect/python/pymssql/step-3-proof-of-concept-connecting-to-sql-using-pymssql?view=sql-server-ver16
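
As a rough illustration of what the job script could look like once pymssql is installed, here is a minimal sketch. The host name, credentials, database and query are placeholders I made up, and the helper names (connection_settings, run_query, main) are hypothetical; only pymssql.connect and the standard cursor calls come from the library itself. In a real job you would likely read the credentials from job arguments or a secrets store rather than hard-coding them.

```python
from typing import Any, Callable, Dict, List, Tuple


def connection_settings(server: str, user: str, password: str, database: str) -> Dict[str, Any]:
    """Collect the keyword arguments that will be passed to pymssql.connect()."""
    return {"server": server, "user": user, "password": password, "database": database}


def run_query(connect: Callable[..., Any], settings: Dict[str, Any], sql: str) -> List[Tuple]:
    """Open a connection, run one query, return all rows, and always close."""
    conn = connect(**settings)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


def main() -> None:
    # Imported here so the helpers above can be exercised without the driver.
    import pymssql

    settings = connection_settings(
        server="myserver.example.com",  # placeholder host (assumption)
        user="my_user",                 # placeholder credentials (assumption)
        password="my_password",
        database="my_database",
    )
    for row in run_query(pymssql.connect, settings, "SELECT @@VERSION"):
        print(row)
```

Calling main() at the end of the job script runs the query. Passing the connect function in as an argument is just a convenience so the query logic can be tried locally with a stand-in connection.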

Answered a year ago by an AWS EXPERT; reviewed a month ago by an EXPERT.
  • Hi Gonzalo, thanks so much for responding. I have tried pymssql. However, in my experience, it only installs on Python 3.6, using the whl file pymssql-2.2.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl. I need to use Python 3.9 because I am using the boto3 redshift-data API in the same script. When I try to install pymssql on Python 3.9 with the file pymssql-2.2.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, it fails to install in the AWS Glue Python Shell Job. Could you please guide me through the installation? I would be really grateful.

  • If your Shell job has internet connectivity, just specify "pymssql" as I showed above and pip will pick the right package. If you need to download the package and load it from S3, you need to know which wheel is right for that OS and Python version. To find out, run the job once with internet access, check the log to see which specific whl file pip installs, then download that same file.
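
For anyone comparing wheel file names by hand, the following sketch may help. It reads the Python, ABI and platform tags from a wheel file name and checks whether the Python tag matches a given CPython version. The function names are hypothetical, and the check is deliberately narrow: it only looks at the cpXY tag and does not verify that the platform tag (for example the manylinux/glibc level) suits the Glue runtime, so the job log remains the authoritative source.

```python
import sys
from typing import Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel file name: {filename}")
    # The last three dash-separated fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def wheel_fits(
    filename: str,
    major: int = sys.version_info.major,
    minor: int = sys.version_info.minor,
) -> bool:
    """Check only the CPython tag (e.g. cp39) against the given interpreter version."""
    python_tag, _, _ = wheel_tags(filename)
    return f"cp{major}{minor}" in python_tag.split(".")


cp36 = "pymssql-2.2.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
cp39 = "pymssql-2.2.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(wheel_fits(cp36, 3, 9))  # False: built for CPython 3.6
print(wheel_fits(cp39, 3, 9))  # True: built for CPython 3.9
```

Running wheel_fits(name) without a version inside the job itself compares the wheel against whichever interpreter the job is actually using.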
