Glue script job error: "spark_catalog requires a single-part namespace, but got [glue_catalog, foo]"


I am trying to query an Iceberg table through the Glue Data Catalog. This works fine in my visual ETL job, but when I try the same thing in a script job it throws an error. It is most likely a configuration issue, but I haven't been able to determine which setting. The code is

glueContext.create_data_frame.from_catalog(
    database="foo",
    table_name="bar",
)

and the error is:

    AnalysisException: spark_catalog requires a single-part namespace, but got [glue_catalog, foo]

The error seems to come from the from_catalog method itself, and I can't figure out the root cause. Any suggestions?

  • Do you happen to be running this in a Glue ETL notebook? On Glue 4.0, my notebook runs perfectly fine through the Jupyter interface when executing cells from top to bottom, but when I "Run" the notebook as a job I get the same error you have. Has anyone else seen this inconsistency between notebook and job behavior?

    UPDATE: I figured this out, see the comment in the answers below.

asked 2 years ago · 11.4K views
3 Answers

I figured it out! After trying a few things for a while, it turns out you have to provide the Spark conf as follows:

    %spark_conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.warehouse=s3://YOUR-BUCKET/ --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.defaultCatalog=glue_catalog
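
A quick way to confirm the settings actually took effect in the running session (a sketch, assuming the spark session object that Glue notebooks expose once the session starts):

    print(spark.conf.get("spark.sql.defaultCatalog"))        # expect: glue_catalog
    print(spark.conf.get("spark.sql.catalog.glue_catalog"))  # expect: org.apache.iceberg.spark.SparkCatalog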
answered a year ago
  • I had to apply the configuration via SparkConf in my notebook for it to work as a job/script:

    from pyspark.conf import SparkConf
    from pyspark.context import SparkContext

    # Register the Iceberg/Glue catalog before the SparkContext is created;
    # these settings do not take effect on an already-running session.
    scf = SparkConf()
    scf.setAll([
        ('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions'),
        ('spark.sql.catalog.glue_catalog', 'org.apache.iceberg.spark.SparkCatalog'),
        ('spark.sql.catalog.glue_catalog.warehouse', 's3://<my-bucket>/<my-prefix>'),
        ('spark.sql.catalog.glue_catalog.catalog-impl', 'org.apache.iceberg.aws.glue.GlueCatalog'),
        ('spark.sql.catalog.glue_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO'),
        ('spark.sql.defaultCatalog', 'glue_catalog'),
    ])
    sc = SparkContext(conf=scf)
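
    From there, a minimal sketch of wiring it into the rest of the script (GlueContext comes from the awsglue library available in Glue jobs; the database and table names are the ones from the question):

    from awsglue.context import GlueContext

    # Build the Glue context on top of the configured SparkContext
    glueContext = GlueContext(sc)
    df = glueContext.create_data_frame.from_catalog(
        database="foo",
        table_name="bar",
    )
    df.show()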
    

In the visual job, the Iceberg catalog is configured for you. In a script job you have to do it yourself; otherwise Spark won't recognize "glue_catalog" (which is really the Iceberg catalog backed by Glue).
See https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-enable
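
Once the catalog is registered, the table should also resolve through plain Spark SQL under the glue_catalog prefix (a sketch, using the database and table names from the question and assuming a spark session with these settings applied):

    # List the Glue databases visible through the Iceberg catalog
    spark.sql("SHOW NAMESPACES IN glue_catalog").show()

    # Three-part name: <catalog>.<database>.<table>
    spark.sql("SELECT * FROM glue_catalog.foo.bar LIMIT 10").show()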

AWS
EXPERT
answered 2 years ago

Did you manage to resolve this?

I have the following magics:

%%configure
{
    "--datalake-formats": "iceberg",
    "--enable-glue-datacatalog": "true"
}

And the following Spark session config:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my_warehouse/")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

The examples to create a table work, but I'm getting the same error:

df = glue_context.create_data_frame.from_catalog(
    database='foo',
    table_name='table'
)

AnalysisException: spark_catalog requires a single-part namespace, but got [glue_catalog, foo]
answered a year ago
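  • Two things that might explain it: this session config omits spark.sql.defaultCatalog=glue_catalog, which the accepted answer sets, and spark.sql.extensions is a static setting, so if SparkSession.builder.getOrCreate() attaches to a session that is already running (as it can in a Glue notebook), that part of the config may never take effect. Setting everything before the session starts, via %spark_conf or SparkConf as shown above, avoids both issues. A sketch of the line missing from the builder:

        .config("spark.sql.defaultCatalog", "glue_catalog")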
