AWS Docker container for Glue and Databricks JDBC connection


Hello, we are using the AWS Docker container for Glue (available here) and are trying to connect to Databricks over JDBC using DatabricksJDBC42.jar (available here). We placed the jar file in the same folder as the Jupyter notebook, and have also placed it in the C:/.aws/ folder. When we try to connect, we get the error "java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver".

We have used the DB2 driver without issue, using the same setup. Also, when we upload the jar to AWS and attach it to the Glue job via the --extra-jars parameter, it works fine.

Has anyone gotten this to successfully work?

Asked a year ago · 691 views
3 Answers

Hello,

I understand that you are receiving the following error while trying to connect to your Databricks cluster when following the blog post “Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container”:

java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver

Since you are using the updated DatabricksJDBC42.jar driver, please ensure that the JDBC URL follows the naming convention for DatabricksJDBC42.jar rather than that of the legacy SparkJDBC42.jar.

Refer to: https://docs.databricks.com/integrations/jdbc-odbc-bi.html#building-the-connection-url-for-the-databricks-driver

Modified parameters for the new driver (a minimal connection sketch follows this list):

  • Use the jdbc:databricks:// URL prefix (instead of the legacy jdbc:spark://)
  • Use the HttpPath value from your cluster's JDBC/ODBC settings
  • Supply the driver class name as 'com.databricks.client.jdbc.Driver'
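
For reference, a minimal PySpark sketch of the new-style connection (not from the original answer) might look like the following; it assumes the driver jar is already on the Spark classpath, and the <server-hostname>, <http-path>, and <access-token> placeholders are values from your cluster's JDBC/ODBC settings:

# Minimal sketch; placeholders in angle brackets are hypothetical and must
# be replaced with your cluster's JDBC/ODBC settings.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# New-style URL for DatabricksJDBC42.jar (jdbc:databricks://, not jdbc:spark://)
jdbc_url = (
    "jdbc:databricks://<server-hostname>:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=<http-path>;"
    "AuthMech=3;UID=token;PWD=<access-token>"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.databricks.client.jdbc.Driver")  # new driver class
    .option("dbtable", "<schema>.<table>")
    .load()
)
df.show()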

If the issue still persists, please open a support case with AWS, providing the connection details and the code snippet used: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Thank you.

AWS
Support Engineer
answered a year ago

If it works with --extra-jars, it means that Glue inside the Docker container cannot find the jar; placing it in the notebook folder or in .aws won't help.
The safest thing is to open a shell in the container (e.g. with docker exec) and put the jar under /home/glue_user/spark/jars. An alternative is sketched below.
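
As a sketch of that alternative (not part of the original answer): if the jar sits in the mounted workspace folder instead, you can point the Spark session at it via the spark.jars config rather than modifying the image; the jar path below is hypothetical.

# Sketch, assuming the jar was copied into the mounted workspace folder
# (hypothetical path). Note: spark.jars only takes effect if set BEFORE the
# session/JVM starts, so it won't help in a notebook whose Spark session
# already exists.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("databricks-jdbc-test")
    .config("spark.jars", "/home/glue_user/workspace/DatabricksJDBC42.jar")
    .getOrCreate()
)
# Confirm the jar was registered with the session
print(spark.sparkContext.getConf().get("spark.jars"))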

AWS
Expert
answered a year ago

Gonzalo's answer worked, but I also found that adding the jar in the docker run command was easiest. There was no need to commit a modified Docker container image. However, I am now facing a new error related to SSL: PKIX path building failed. I will post it as a separate question. Thanks for your attention, team! Appreciate the inputs. :)

docker run -it --rm \
  -v ~/.aws:/home/glue_user/.aws \
  -v $WORKSPACE_LOCATION:/home/glue_user/workspace/ \
  -e AWS_PROFILE=$PROFILE_NAME \
  -e DISABLE_SSL=true \
  -e PYSPARK_SUBMIT_ARGS="--jars /home/glue_user/.aws/db2jcc4.jar,/home/glue_user/.aws/DatabricksJDBC42.jar,/home/glue_user/.aws/AthenaJDBC42-2.0.35.1000.jar,/home/glue_user/.aws/presto-jdbc-0.225-SNAPSHOT.jar pyspark-shell" \
  -p 4040:4040 -p 18080:18080 \
  --name glue_spark_submit \
  amazon/aws-glue-libs:glue_libs_3.0.0_image_01 \
  spark-submit /home/glue_user/workspace/src/$SCRIPT_FILE_NAME

answered a year ago
