내용으로 건너뛰기

Running notebook against EMR cluster trying to create delta table in S3

0

I'm trying to run an EMR notebook to create a delta table in S3.

EMR Cluster Version: emr-7.7.0 Installed Applications: Hadoop 3.4.0, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, Livy 0.8.0, Spark 3.5.3

Code:

%%configure -f
{
    "conf": {
        "spark.jars": "s3://<bucket>/jars/mssql-jdbc-12.8.1.jre8.jar",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension",
        "spark.sql.catalog.spark_catalog.lf.managed": "true"
    }
}
spark.sql("""CREATE TABLE IF NOT EXISTS ThisIsATable (
    ColumnName bigint not null
)
USING delta location
's3://<bucket>/ThisIsATable'""");

I'm able to run the notebook on this cluster to query other data sources via JDBC but when I run the spark.sql command I get this error: "org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.delta.catalog.DeltaCatalog."

Thank you!

질문됨 일 년 전80회 조회
1개 답변
0

Hi Armen,

The error "Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.delta.catalog.DeltaCatalog" means that Delta Lake is not installed on your EMR cluster.

Delta Lake is not automatically installed with EMR 7.7.0 when you only select Hadoop, Hive, JupyterEnterpriseGateway, Livey, and Spark. You must explicitly select Delta Lake during cluster creation as an application to install.

To fix the error you are getting, create a new EMR cluster with Delta Lake selected as an application.

"Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.delta.catalog.DeltaCatalog"

References:

EMR Delta Lake

AWS
전문가
답변함 3달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

관련 콘텐츠