Integrating Glue Catalog with EMR on EKS Managed Endpoint for EMR Studio

0

In EMR Studio, when attaching an EMR Virtual cluster to the notebook, Glue catalog is not accessible. Some common errors on trying to access Glue are:

  1. "Hive support is required to ..."
  2. "Table or view not found ..."

Adding enableHiveSupport() to the spark statement does not seem to work either. For example:

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .enableHiveSupport() \
    .getOrCreate()

What configuration is required when connecting from the notebook kernel to make Glue catalog accessible to the notebook?

AWS
asked 3 years ago994 views
1 Answer
0
Accepted Answer

For EMR Studio to connect to EMR on EKS, a managed endpoint needs to be created. This managed endpoint needs to be configured to use Hive as the catalog and to point to Glue. Use the following CLI to configure the managed endpoint that is able to connect to Glue as the catalog:

aws emr-containers create-managed-endpoint \
--type JUPYTER_ENTERPRISE_GATEWAY \
--virtual-cluster-id ${virtclusterid} \
--name virtual-emr-endpoint \
--execution-role-arn ${role_arn} \
--release-label ${emr_release_label} \
--certificate-arn ${certarn} \
--region ${region} \
--configuration-overrides '{
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
          "spark.sql.catalogImplementation": "hive"
        }
      }
    ]
  }'

See the following documentation for more information about the various flags and description: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-eks-cluster.html

AWS
answered 3 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions