How to connect a Spark application on an EKS cluster to the Glue Data Catalog?

I'm trying to read Glue Data Catalog tables from Spark running on an EKS cluster. I added the following to the Spark configuration:

"spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
"spark.sql.catalogImplementation": "hive",

I also added .enableHiveSupport() when creating the session, but it looks like I'm still missing something:

Exception in thread "main" org.apache.spark.sql.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view zzz.yyy cannot be found. Verify the spelling and correctness of the schema and catalog.

Could you tell me what additional configuration I'm missing?

asked 4 months ago, 91 views
1 Answer

Alright, to link your Spark application with the Glue Catalog, here's what you do:

First, make sure Hive support is turned on in Spark. You do this on the session builder, before the session is created, by calling SparkSession.builder.enableHiveSupport().

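Then, configure Spark SQL to use Hive by setting spark.sql.catalogImplementation to "hive". This is a static setting, so it has to be supplied before the session starts, for example with .config("spark.sql.catalogImplementation", "hive") on the builder or as a --conf option to spark-submit. Calling spark.conf.set on a running session will not change it.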

Next up, set the Hive metastore client factory to the Glue Data Catalog client, again at session build time: .config("spark.hadoop.hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"). The class has to be available on the Spark classpath for this setting to take effect.

Once that's done, you're all set to access tables in the Glue Catalog using Spark SQL queries like spark.sql("SELECT * FROM database.table").
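The steps above can be put together roughly as follows. This is a minimal PySpark sketch, not a verified setup for your cluster: the names GLUE_CATALOG_CONF, build_session and spark_submit_flags are made up for illustration, and it assumes the Glue Data Catalog client for Hive is already on the Spark classpath and that the pod's IAM role is allowed to call Glue.

```python
# Sketch: apply the Glue Catalog settings before the session is created.
# Assumptions (not confirmed by the question): the Glue client factory class
# is on the driver/executor classpath, and the pod has IAM access to Glue.

GLUE_CATALOG_CONF = {
    "spark.sql.catalogImplementation": "hive",
    "spark.hadoop.hive.metastore.client.factory.class":
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
}


def build_session(app_name="glue-catalog-example"):
    """Create a SparkSession with the Glue settings applied at build time."""
    # Imported lazily so the settings can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in GLUE_CATALOG_CONF.items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()


def spark_submit_flags(conf=None):
    """Render the same settings as spark-submit --conf arguments."""
    conf = GLUE_CATALOG_CONF if conf is None else conf
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# Typical use inside the application:
#   spark = build_session()
#   spark.sql("SELECT * FROM zzz.yyy LIMIT 10").show()
```

Either route works, builder.config or spark-submit --conf. The point is that both settings are in place before getOrCreate() runs.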

And remember, the simplest setup is one where your Spark application and the Glue Catalog are in the same AWS account and region. If the catalog lives in a different account, cross-account access has to be configured separately.
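If the TABLE_OR_VIEW_NOT_FOUND error persists, it may help to check what the running session actually sees. The helper below is a hypothetical diagnostic (describe_catalog is not part of Spark) that reports the active catalog implementation and the database names visible to the session. If the Glue database (zzz in the question) is not listed, the session is probably not talking to Glue, or it is looking in a different account or region.

```python
def describe_catalog(spark):
    """Summarize which catalog a SparkSession uses and which databases it sees."""
    return {
        # "hive" is expected here; "in-memory" means Hive support is not active.
        "catalog_implementation": spark.conf.get("spark.sql.catalogImplementation"),
        # Databases visible through the session catalog, sorted by name.
        "databases": sorted(db.name for db in spark.catalog.listDatabases()),
    }


# Typical use with an existing session:
#   print(describe_catalog(spark))
```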

EXPERT
answered a month ago