AWS Docker container for Glue and Databricks JDBC connection


Hello, we are using the AWS Docker container for Glue (available here) and are trying to connect to Databricks over JDBC using DatabricksJDBC42.jar (available here). We placed the jar file both in the same folder as the Jupyter notebook and in the C:/.aws/ folder. When we try to connect we get the error "java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver".
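For reference, the connection attempt looks roughly like this (a minimal sketch; the URL and table are placeholders, not our real values):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholders; the real Databricks JDBC URL and table are omitted.
df = (spark.read.format("jdbc")
      .option("driver", "com.databricks.client.jdbc.Driver")
      .option("url", "<databricks-jdbc-url>")
      .option("dbtable", "<schema>.<table>")
      .load())
# Fails with: java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver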

We have used the DB2 driver without issue using the same setup. Also, when we upload the jar to AWS and attach it to the Glue job via the --extra-jars parameter, it works fine.
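For comparison, on the AWS side the working job is created along these lines (a boto3 sketch; the job name, role, and S3 paths are placeholders):

import boto3

glue = boto3.client("glue")

# All names and S3 paths below are placeholders; --extra-jars is the relevant part.
glue.create_job(
    Name="databricks-jdbc-job",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/job.py"},
    GlueVersion="3.0",
    DefaultArguments={"--extra-jars": "s3://my-bucket/jars/DatabricksJDBC42.jar"},
)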

Has anyone gotten this to work successfully?

Asked 1 year ago · 691 views
3 Answers

Hello,

I understand that you are receiving the following error while trying to connect to your Databricks cluster while following the blog post “Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container”:

java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver

Since you are using the updated DatabricksJDBC42.jar driver, please ensure that the JDBC URL follows the naming convention for DatabricksJDBC42.jar rather than the legacy SparkJDBC42.jar.

Refer to: https://docs.databricks.com/integrations/jdbc-odbc-bi.html#building-the-connection-url-for-the-databricks-driver

Modified parameters (see the sketch after this list):

  • Use the jdbc:databricks:// URL prefix
  • Use HttpPath
  • Supply the driver class name as 'com.databricks.client.jdbc.Driver'
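As an illustration, a connection built for the new driver might look like the following sketch; every workspace value below is a placeholder, not a value from this thread:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholders: substitute your own server hostname, HTTP path, and token.
url = ("jdbc:databricks://<server-hostname>:443/default;"
       "transportMode=http;ssl=1;"
       "httpPath=<http-path>;"
       "AuthMech=3;UID=token;PWD=<personal-access-token>")

df = (spark.read.format("jdbc")
      .option("driver", "com.databricks.client.jdbc.Driver")
      .option("url", url)
      .option("dbtable", "<schema>.<table>")
      .load())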

If the issue persists, please open a support case with AWS, providing the connection details and the code snippet used: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Thank you.

AWS
Support Engineer
Answered 1 year ago

If it works with --extra-jars, it means that Glue inside the Docker container cannot find the jar; placing it in the notebook folder or in .aws won't help.
The safest approach is to exec into the container and put the jar under /home/glue_user/spark/jars.
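Once the jar is in place, a quick way to confirm from the notebook that the JVM actually sees the class is a sketch like this (note that _jvm is an internal PySpark handle, so treat it as a debugging aid only):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raises an error wrapping ClassNotFoundException if the jar is still not on the classpath.
spark.sparkContext._jvm.java.lang.Class.forName("com.databricks.client.jdbc.Driver")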

AWS
Expert
Answered 1 year ago

Gonzalo's answer worked, but I also found that adding the jars in the docker run command was the easiest approach; there was no need to commit a modified Docker container image. However, I am now facing a new error related to SSL PKIX path building failure, which I will post as a separate question. Thanks for your attention, team! Appreciate the inputs. :)

docker run -it \
  -v ~/.aws:/home/glue_user/.aws \
  -v $WORKSPACE_LOCATION:/home/glue_user/workspace/ \
  -e AWS_PROFILE=$PROFILE_NAME \
  -e DISABLE_SSL=true \
  -e PYSPARK_SUBMIT_ARGS="--jars /root/.aws/db2jcc4.jar,/root/.aws/DatabricksJDBC42.jar,/root/.aws/AthenaJDBC42-2.0.35.1000,/root/.aws/presto-jdbc-0.225-SNAPSHOT.jar pyspark-shell" \
  --rm -p 4040:4040 -p 18080:18080 \
  --name glue_spark_submit \
  amazon/aws-glue-libs:glue_libs_3.0.0_image_01 \
  spark-submit /home/glue_user/workspace/src/$SCRIPT_FILE_NAME

Answered 1 year ago
