Integrating Glue Catalog with EMR on EKS Managed Endpoint for EMR Studio

0

In EMR Studio, when attaching an EMR Virtual cluster to the notebook, Glue catalog is not accessible. Some common errors on trying to access Glue are:

  1. "Hive support is required to ..."
  2. "Table or view not found ..."

Adding enableHiveSupport() to the spark statement does not seem to work either. For example:

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .enableHiveSupport() \
    .getOrCreate()

What configuration is required when connecting from the notebook kernel to make Glue catalog accessible to the notebook?

AWS
posta 3 anni fa1017 visualizzazioni
1 Risposta
0
Risposta accettata

For EMR Studio to connect to EMR on EKS, a managed endpoint needs to be created. This managed endpoint needs to be configured to use Hive as the catalog and to point to Glue. Use the following CLI to configure the managed endpoint that is able to connect to Glue as the catalog:

aws emr-containers create-managed-endpoint \
--type JUPYTER_ENTERPRISE_GATEWAY \
--virtual-cluster-id ${virtclusterid} \
--name virtual-emr-endpoint \
--execution-role-arn ${role_arn} \
--release-label ${emr_release_label} \
--certificate-arn ${certarn} \
--region ${region} \
--configuration-overrides '{
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
          "spark.sql.catalogImplementation": "hive"
        }
      }
    ]
  }'

See the following documentation for more information about the various flags and description: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-eks-cluster.html

AWS
con risposta 3 anni fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande