Connecting to Glue Hive Data Catalog from EC2 or Local Computer with Spark


Hi, I built an Iceberg table that uses AWS Glue as the Hive catalog. Team members I work with want to connect to it using Spark: they either run Spark locally on their laptops and want to read the table, or they have Spark running locally in an Airflow task on an EC2 instance and want to connect from there. Is it possible to configure Spark that is not running on Glue or EMR to connect to Glue as the Hive Metastore? If so, some examples would be appreciated.

We set this conf when running Iceberg: "spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory". Is this a JAR I can add to any Spark application to let it connect to AWS Glue as the Hive metastore, or does it only work on EMR?
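
For illustration, this is roughly how we pass that conf today (the database/table names below are placeholders, and the Iceberg extension setting is just our usual one):

```python
# Rough sketch of our current setup; names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-glue-read")
    # The conf mentioned above: use the Glue Data Catalog client factory
    # instead of a regular Hive metastore client
    .config(
        "spark.hadoop.hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    # Iceberg SQL extensions so Iceberg DDL/DML works in Spark SQL
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder database/table names
spark.sql("SELECT * FROM my_db.my_iceberg_table LIMIT 10").show()
```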

Thomas
Asked 1 year ago · 334 views
1 Answer

Unfortunately, it's not enough to configure that and add the library; the Spark distribution itself needs to be patched to work with the Glue catalog client.
You can build that distribution yourself following the instructions here: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore, but it's much easier to extract the patched jars from EMR.
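
As a rough, unverified sketch: assuming you have copied the patched jars from an EMR node into a local directory (the path below is only an example) and have AWS credentials with Glue permissions available, the local Spark session would be configured along these lines:

```python
# Rough sketch, not a verified recipe. Assumes the patched Glue Data Catalog
# client jars were extracted from an EMR node into /opt/glue-catalog-jars/
# (the directory name is an assumption; adjust to wherever you place them).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("local-spark-glue-catalog")
    # Put the patched jars on the driver and executor classpaths
    .config("spark.driver.extraClassPath", "/opt/glue-catalog-jars/*")
    .config("spark.executor.extraClassPath", "/opt/glue-catalog-jars/*")
    # Tell the Hive client layer to talk to the Glue Data Catalog
    .config(
        "spark.hadoop.hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .enableHiveSupport()
    .getOrCreate()
)

# If the setup works, the databases listed here come from the Glue Data Catalog.
spark.sql("SHOW DATABASES").show()
```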

AWS
Expert
Answered 1 year ago
  • I'm having a similar situation with Hudi tables. How do you extract the patched jars from EMR? Can you link to the documentation?
