Connecting to Glue Hive Data Catalog from EC2 or Local Computer with Spark

Hi, I built an Iceberg table that uses Glue as the Hive catalog. Team members I work with want to connect to it using Spark. Some run Spark locally on their laptops and want to read the table; others run Spark locally inside an Airflow task on an EC2 instance. Is it possible to configure Spark that is not running on Glue or EMR to use Glue as the Hive Metastore? If so, some examples would be appreciated.

We set this conf when running Iceberg: "spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory". Does this class come from a JAR that I can add to any Spark application so it can connect to AWS Glue as the Hive metastore, or does it only work on EMR?

Thomas
Asked 1 year ago, 335 views
1 Answer

Unfortunately, setting that conf and adding the library is not enough: the Spark distribution itself needs to be patched to work with the Glue catalog client.
You can build that distribution yourself by following the instructions here: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore, but it is much easier to extract the already patched jars from an EMR cluster.
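For illustration, here is a minimal sketch of how the settings could be assembled once the patched jars are available. It only builds the configuration; it does not make an unpatched Spark work. The factory class name is the one from the question. The helper names (`glue_hive_conf`, `spark_submit_args`), the jar path, and `job.py` are hypothetical placeholders, and `spark.sql.catalogImplementation=hive` is included on the assumption that Hive support is wanted for the session.

```python
# Sketch only: assumes the patched Glue client jars have already been built
# or copied from EMR. The jar path used below is a placeholder, not a real one.

GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_hive_conf(jar_paths):
    """Return Spark settings that point the Hive metastore client at Glue."""
    conf = {
        # Use the Hive catalog implementation for Spark SQL.
        "spark.sql.catalogImplementation": "hive",
        # The setting quoted in the question.
        "spark.hadoop.hive.metastore.client.factory.class": GLUE_FACTORY,
    }
    if jar_paths:
        # spark.jars takes a comma-separated list of jar locations.
        conf["spark.jars"] = ",".join(jar_paths)
    return conf


def spark_submit_args(conf):
    """Turn a conf dict into repeated --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = glue_hive_conf(["/path/to/patched-glue-client.jar"])  # hypothetical path
    print(" ".join(["spark-submit", *spark_submit_args(conf), "job.py"]))
```

The same dictionary could be passed entry by entry to `SparkSession.builder.config(key, value)` in a local PySpark session. AWS credentials and region still have to be available to the process through the usual AWS mechanisms.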

AWS (Expert), answered 1 year ago
  • I have a similar situation with Hudi tables. How do you extract the patched jars from EMR? Can you link to the documentation?
