Running notebook against EMR cluster trying to create delta table in S3

I'm trying to run an EMR notebook to create a delta table in S3.

EMR Cluster Version: emr-7.7.0
Installed Applications: Hadoop 3.4.0, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, Livy 0.8.0, Spark 3.5.3

Code:

%%configure -f
{
    "conf": {
        "spark.jars": "s3://<bucket>/jars/mssql-jdbc-12.8.1.jre8.jar",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension",
        "spark.sql.catalog.spark_catalog.lf.managed": "true"
    }
}
spark.sql("""CREATE TABLE IF NOT EXISTS ThisIsATable (
    ColumnName bigint not null
)
USING delta location
's3://<bucket>/ThisIsATable'""");

I can run notebooks on this cluster to query other data sources via JDBC, but when I run the spark.sql command above I get this error: "org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.delta.catalog.DeltaCatalog."

Thank you!

Asked 1 year ago · Viewed 82 times
1 Answer

Hi Armen,

The error "Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.delta.catalog.DeltaCatalog" means that Spark could not load the DeltaCatalog class, which indicates that Delta Lake is not installed on your EMR cluster.

Delta Lake is not automatically installed with EMR 7.7.0 when you select only Hadoop, Hive, JupyterEnterpriseGateway, Livy, and Spark. You must explicitly select Delta Lake as an application to install during cluster creation.

To fix the error you are getting, create a new EMR cluster with Delta Lake selected as an application.
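
Before recreating the cluster, it may help to confirm that the Delta Lake jars really are missing. The following is a minimal sketch, not an official check: the helper name find_delta_jars is hypothetical, and the directory /usr/share/aws/delta/lib is an assumption about where EMR places the Delta jars, so adjust it if your release differs. It only inspects the node where the code runs.

```python
import glob
import os

# Assumption: EMR installs the Delta Lake jars in this directory on cluster nodes.
DEFAULT_DELTA_DIR = "/usr/share/aws/delta/lib"


def find_delta_jars(jar_dir=DEFAULT_DELTA_DIR):
    """Return a sorted list of jar paths in jar_dir whose file names start with 'delta'."""
    pattern = os.path.join(jar_dir, "delta*.jar")
    # glob returns an empty list if the directory does not exist or nothing matches.
    return sorted(glob.glob(pattern))


jars = find_delta_jars()
if jars:
    print("Delta Lake jars found:", jars)
else:
    print("No Delta Lake jars found in", DEFAULT_DELTA_DIR)
```

If no jars are reported, that is consistent with the error above. If jars are present but the error persists, the cause is more likely in the session configuration than in the installation.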

References:

EMR Delta Lake

AWS
Expert
Answered 3 months ago