How to fix: AWS Docker container for Glue and Databricks JDBC connection - SSL PKIX path building failed?


I was able to include the DatabricksJDBC42.jar in my Glue Docker container used for local machine development.

I can reach the host from a Jupyter notebook, but I am getting an SSL error:

Py4JJavaError: An error occurred while calling o80.load. : java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.

My connection string looks like this:

    .option("url","jdbc:databricks://host.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/111111111111111/1111-111111-abcdefghi;AuthMech=3;UseNativeQuery=0;StripCatalogName=0;")\
    .option("dbtable","select 1")\
    .option("driver", "com.databricks.client.jdbc.Driver")\
    .load()

I used the same JDBC string in my code uploaded to our live account, and the AWS Glue job runs and executes the queries in dbtable just fine. It's only in the local Docker Glue development container that we get this SSL error.

I tried adding separate options for sslConnection and sslCertLocation, and tried placing the certificate files in /root/.aws as well as in the Jupyter notebook folder. The cert shows up in directory listings and is correctly assigned, but the JDBC connection still fails with the SSL error.

Anyone see this before or have a suggestion for next steps?
Thanks.

asked 2 years ago (603 views)
2 Answers

Hello,

The SSLHandshakeException you are seeing typically occurs when the client does not trust the SSL certificate presented by the server. Since the JDBC connection works in the AWS Glue cloud environment but not in your local development Docker container, the difference in behavior is probably due to differences in the networking and security setup between the two environments. Here are a few suggestions to help you troubleshoot this issue:

  1. Check if there are any firewall rules or network restrictions that could be blocking the connection from your local development environment. For example, if you are behind a corporate firewall, it may be necessary to configure the firewall to allow outbound connections to the Databricks host and port.

  2. Verify that the SSL certificate presented by the Databricks host is trusted by your local development environment. You can do this by checking the truststore used by the JVM running in your Docker container. You may also want to check if there are any differences in the SSL/TLS configuration between the Glue cloud environment and your local development environment.

  3. Check if the version of the JDBC driver used in your local development environment is the same as the one used in the Glue cloud environment. If there are any differences in the driver version, it's possible that there could be compatibility issues.
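For suggestion 2, one way to see which certificate the host actually presents to the container is to fetch it without validation and compare its fingerprint with what you see from a machine where the connection works. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the host is the placeholder from the question. Note that it inspects the server certificate only, and says nothing by itself about what the JVM truststore contains.

```python
import hashlib
import socket
import ssl


def fetch_server_cert_pem(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the leaf certificate presented by host:port, PEM-encoded.

    Validation is deliberately disabled so the certificate can be
    retrieved even when the client would normally reject it.
    """
    context = ssl.create_default_context()
    context.check_hostname = False      # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return ssl.DER_cert_to_PEM_cert(der)


def fingerprint_sha256(pem: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


if __name__ == "__main__":
    # Placeholder host taken from the question; replace with your workspace host.
    pem = fetch_server_cert_pem("host.cloud.databricks.com")
    print(fingerprint_sha256(pem))
    print(pem)
```

If the fingerprint seen inside the container differs from the one seen elsewhere, something on the network path (for example a TLS-inspecting proxy) may be presenting its own certificate, and that is the certificate the JVM would need to trust.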

Additionally, you could try setting the truststore options in your JDBC connection string ("SSLTrustStore" and "SSLTrustStorePwd" in the Databricks JDBC driver) to point to the location of the truststore and the password to access it, respectively.
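As a rough illustration of the truststore suggestion, the helper below appends truststore properties to an existing JDBC URL. It is a sketch under assumptions: to my knowledge the Databricks (Simba-based) JDBC driver spells these properties `SSLTrustStore` and `SSLTrustStorePwd`, but check the documentation for your driver version. The helper name, the truststore path, and the password are hypothetical.

```python
def with_truststore(jdbc_url: str, truststore_path: str, truststore_password: str) -> str:
    """Append truststore properties to a semicolon-delimited JDBC URL.

    Assumption: the driver reads SSLTrustStore / SSLTrustStorePwd;
    verify the property names against your driver's documentation.
    """
    base = jdbc_url if jdbc_url.endswith(";") else jdbc_url + ";"
    return (
        f"{base}SSLTrustStore={truststore_path};"
        f"SSLTrustStorePwd={truststore_password};"
    )


if __name__ == "__main__":
    url = with_truststore(
        "jdbc:databricks://host.cloud.databricks.com:443/default;ssl=1;",
        "/home/glue_user/databricks-truststore.jks",  # hypothetical path
        "changeit",                                   # hypothetical password
    )
    # The result would then be passed to Spark as .option("url", url).
    print(url)
```

Pointing the driver at a dedicated truststore avoids modifying the JVM's default cacerts file, which may be simpler in a container where you do not want to rebuild the image.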

AWS
SUPPORT ENGINEER
Nitin_S
answered 2 years ago
  • I think that specific error can only be caused by 2; it's possible that the CA certificates (cacerts) that the container ships with are not up to date. You may want to try the new Glue 4 Docker image; otherwise you would have to add a truststore with the Databricks cert, which is not easy if you have never done it.

  • Hi Nitin and Gonzalo, thanks for your responses. I thought the same, so I retrieved the certificate from the host and tried to add it to the Docker container, but the container seemed to be missing some tools and I haven't figured out how to install them yet. I have added certs to a truststore before, but not in a Docker container with a pared-down selection of tools and limited ability to install things or use root.

    I will def. try Glue 4 docker image, and cont. to see if I can get the cert added. Thanks for providing some avenues to move forward! :)

  • You can get a root shell in a container, e.g. "docker exec -u root -ti glue_pyspark /bin/bash", but to make the changes permanent you would need to update the image.
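Once you have a root shell, the JDK's own `keytool` can import a certificate into a truststore without `update-ca-certificates`. The sketch below only builds and runs that command from Python. It assumes `keytool` is on the PATH inside the container and that you already have the host's certificate as a PEM file; the function name, alias, and paths are hypothetical. The default password of a stock JVM cacerts file is `changeit`, and the file usually lives at `$JAVA_HOME/jre/lib/security/cacerts` on Java 8 or `$JAVA_HOME/lib/security/cacerts` on later versions.

```python
import subprocess
import sys
from typing import List


def keytool_import_command(
    cert_path: str,
    keystore_path: str,
    alias: str,
    storepass: str = "changeit",
) -> List[str]:
    """Build a keytool command that imports cert_path as a trusted entry."""
    return [
        "keytool", "-importcert",
        "-noprompt",            # do not ask for interactive confirmation
        "-trustcacerts",
        "-alias", alias,
        "-file", cert_path,
        "-keystore", keystore_path,
        "-storepass", storepass,
    ]


if __name__ == "__main__":
    # Usage (hypothetical paths): python import_cert.py /tmp/databricks.pem /path/to/cacerts
    cert, keystore = sys.argv[1], sys.argv[2]
    subprocess.run(keytool_import_command(cert, keystore, "databricks"), check=True)
```

Changes made this way live only in the running container; to keep them, the same step would have to be baked into the image.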


I tried glue 4.0 container image, but it didn't work out of the box. The updates to the container are very nice though--I haven't tried the debug options, but looks great!

I tried to install the certs but ran into trouble because the update-ca-certificates command is not available. I tried logging in as root and installing it, but hit different SSL errors. It might be an uphill battle for me, and it would be less painful to push changes to the live account and test there than to wrestle with this.

Thanks for the help, and I look forward to working in the new glue 4.0 docker container.

answered 2 years ago
