Connecting to Glue Hive Data Catalog from EC2 or Local Computer with Spark


Hi, I built an Iceberg table that uses Glue as the Hive catalog. Team members I work with want to connect to it using Spark. Some run Spark locally on their laptops and want to read the table; others run Spark locally inside an Airflow task on an EC2 instance. Is it possible to configure a Spark installation that is not running on Glue or EMR to connect to Glue as the Hive metastore? If so, some examples would be appreciated.

We set this conf when running Iceberg: "spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory". Is this class in a JAR I can add to any Spark application so that it can connect to AWS Glue as the Hive metastore, or does it only work on EMR?

Thomas
asked a year ago, 335 views
1 Answer

Unfortunately, setting that configuration and adding the library is not enough: the Spark distribution itself needs to be patched to work with the Glue catalog client.
You can build that distribution yourself by following the instructions here: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore but it is much easier to extract the patched jars from EMR.
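
For illustration only, here is a minimal sketch of how the setting from the question could be passed to spark-submit once a patched Spark distribution is in place. It assumes the patched jars are already installed in that distribution; as noted above, the settings alone are not sufficient on stock Spark. The helper names and the script name read_table.py are hypothetical, and spark.sql.catalogImplementation=hive is the standard Spark setting for enabling the Hive catalog rather than something stated in this thread.

```python
import shlex

# Factory class quoted in the question; it is provided by the Glue catalog client.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_hive_conf():
    """Return Spark settings that point the Hive metastore client at Glue.

    Assumption: the Spark distribution has already been patched; these
    settings do nothing useful on an unpatched build.
    """
    return {
        # Standard Spark setting that selects the Hive-backed catalog.
        "spark.sql.catalogImplementation": "hive",
        # Setting taken verbatim from the question.
        "spark.hadoop.hive.metastore.client.factory.class": GLUE_FACTORY,
    }


def spark_submit_args(app_path, conf=None):
    """Build a spark-submit argument list (hypothetical helper)."""
    conf = glue_hive_conf() if conf is None else conf
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # read_table.py is a placeholder for the job that reads the table.
    print(shlex.join(spark_submit_args("read_table.py")))
```

The same key/value pairs could instead be set in code with SparkSession.builder.config(key, value) followed by enableHiveSupport(). AWS credentials and region still have to be available to the process, for example through an instance profile on EC2 or the usual environment variables on a laptop.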

AWS
EXPERT
answered a year ago
  • I have a similar situation with Hudi tables. How do you extract the patched jars from EMR? Can you link to the documentation?
