AWS Docker container for Glue and Databricks JDBC connection


Hello, we are using the AWS Docker container for Glue (available here) and are trying to connect to Databricks over JDBC using DatabricksJDBC42.jar (available here). We placed the jar file in the same folder as the Jupyter notebook and also in the C:/.aws/ folder. When we try to connect, we get the error "java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver".

We have used the DB2 driver without issue using the same setup. Also, when we upload the jar to AWS and attach it to the Glue job via the --extra-jars parameter, it works fine.
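
For reference, here is roughly how we attempt the read from the notebook, assuming the spark session provided by the Glue container (the host, HTTP path, and token values are placeholders):

  # Sketch of the failing read; connection values are placeholders
  df = (spark.read.format("jdbc")
        .option("driver", "com.databricks.client.jdbc.Driver")
        .option("url",
                "jdbc:databricks://<workspace-host>:443;transportMode=http;"
                "ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<token>")
        .option("dbtable", "my_schema.my_table")
        .load())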

Has anyone gotten this to successfully work?

asked a year ago · 673 views
3 Answers

Hello,

I understand that you are receiving the following error while trying to connect to your Databricks cluster, following the blog post “Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container”:

java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver

Since you are using the updated DatabricksJDBC42.jar driver, please ensure that the JDBC URL follows the DatabricksJDBC42.jar naming convention rather than that of the legacy SparkJDBC42.jar.

Refer to: https://docs.databricks.com/integrations/jdbc-odbc-bi.html#building-the-connection-url-for-the-databricks-driver

Modified params (an example URL follows the list):

  • Use the jdbc:databricks:// URL scheme
  • Include the HttpPath parameter
  • Supply the driver class name as 'com.databricks.client.jdbc.Driver'
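
For example, a connection URL following the new convention would look something like this (host, HTTP path, and token values are placeholders):

  jdbc:databricks://<workspace-host>:443;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>

By contrast, the legacy SparkJDBC42.jar driver used the jdbc:spark:// scheme and the com.simba.spark.jdbc.Driver class name.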

If the issue still persists, then please open a support case with AWS providing the connection details and code snippet used - https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Thank you.

AWS
SUPPORT ENGINEER
answered a year ago

If it works with --extra-jars, it means that inside the Docker container Glue cannot find the jar; placing it in the notebook folder or in .aws won't help.
The safest approach is to open a shell in the container and put the jar under /home/glue_user/spark/jars.
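
A sketch of one way to do that, assuming the container is named glue_spark_submit as in the docker run example further down this thread:

  # Copy the driver from the host into the container's Spark jars directory
  docker cp DatabricksJDBC42.jar glue_spark_submit:/home/glue_user/spark/jars/

  # Then verify the jar is in place
  docker exec glue_spark_submit ls /home/glue_user/spark/jars | grep -i databricks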

AWS
EXPERT
answered a year ago

Gonzalo's answer worked, but I also found that adding the jar in the docker run command was the easiest option, since there is no need to commit a modified Docker container image. However, I am now facing a new error related to SSL (PKIX path building failed), which I will post as a separate question. Thanks for your attention, team! Appreciate the inputs. :)

docker run -it --rm \
  -v ~/.aws:/home/glue_user/.aws \
  -v $WORKSPACE_LOCATION:/home/glue_user/workspace/ \
  -e AWS_PROFILE=$PROFILE_NAME \
  -e DISABLE_SSL=true \
  -e PYSPARK_SUBMIT_ARGS="--jars /home/glue_user/.aws/db2jcc4.jar,/home/glue_user/.aws/DatabricksJDBC42.jar,/home/glue_user/.aws/AthenaJDBC42-2.0.35.1000.jar,/home/glue_user/.aws/presto-jdbc-0.225-SNAPSHOT.jar pyspark-shell" \
  -p 4040:4040 -p 18080:18080 \
  --name glue_spark_submit \
  amazon/aws-glue-libs:glue_libs_3.0.0_image_01 \
  spark-submit /home/glue_user/workspace/src/$SCRIPT_FILE_NAME
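
Note that the paths passed to --jars must be the paths as seen inside the container; with the mount above, the host's ~/.aws folder appears at /home/glue_user/.aws. Passing the jars this way puts the driver on the Spark classpath at launch without baking a new image.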

answered a year ago
