3 Answers
I figured it out! After trying a few things, you have to provide the conf as follows:
%spark_conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.warehouse=s3://YOUR-BUCKET/ --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.defaultCatalog=glue_catalog
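If the session starts with that conf in place, a quick sanity check is to create and query an Iceberg table through the catalog. A minimal sketch, assuming the spark session variable that Glue interactive sessions pre-create; the database, table, and column names are placeholders, not from the original post:

spark.sql("CREATE TABLE glue_catalog.my_db.my_table (id bigint) USING iceberg")
spark.sql("SELECT * FROM glue_catalog.my_db.my_table").show()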
answered a year ago
I had to apply the configuration via SparkConf in my notebook for it to work as a job/script.
from pyspark import SparkConf, SparkContext

scf = SparkConf()
scf.setAll([
    ('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions'),
    ('spark.sql.catalog.glue_catalog', 'org.apache.iceberg.spark.SparkCatalog'),
    ('spark.sql.catalog.glue_catalog.warehouse', 's3://<my-bucket>/<my-prefix>'),
    ('spark.sql.catalog.glue_catalog.catalog-impl', 'org.apache.iceberg.aws.glue.GlueCatalog'),
    ('spark.sql.catalog.glue_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO'),
    ('spark.sql.defaultCatalog', 'glue_catalog'),
])
sc = SparkContext(conf=scf)
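From there the rest of the script can hand the context to a GlueContext as usual. A minimal sketch, assuming the standard awsglue import is available and reusing the placeholder database/table names from the question below:

from awsglue.context import GlueContext

glue_context = GlueContext(sc)

# from_catalog now resolves through the Iceberg-backed glue_catalog
df = glue_context.create_data_frame.from_catalog(
    database='foo',
    table_name='table'
)
df.show()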
In a visual job the Iceberg catalog is configured for you; in a script job you have to do it yourself, otherwise Spark won't recognize "glue_catalog" (which is really the Iceberg catalog backed by Glue).
See https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-enable
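For a plain script job, that page boils down to two job parameters set in the job's configuration; roughly the following, where YOUR-BUCKET is a placeholder:

Key: --datalake-formats
Value: iceberg

Key: --conf
Value: spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.warehouse=s3://YOUR-BUCKET/ --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO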
Did you manage to resolve this?
I have the following magics:
%%configure
{
  "--datalake-formats": "iceberg",
  "--enable-glue-datacatalog": "true"
}
And the following Spark session config:
from pyspark.sql import SparkSession

sc = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my_warehouse/")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)
The examples that create a table work, but I'm getting the same error:
df = glue_context.create_data_frame.from_catalog(
database='foo',
table_name='table'
)
AnalysisException: spark_catalog requires a single-part namespace, but got [glue_catalog, foo]
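That error indicates the lookup still went through the default spark_catalog, a sign the .config() calls never took effect; getOrCreate() will happily reattach to a session Glue already started. The SparkConf approach in the answer above avoids this by setting everything before the context exists. If the settings really are active, a catalog-qualified query is a quick sanity check (same placeholder names):

df = sc.sql("SELECT * FROM glue_catalog.foo.table")
df.show()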
answered a year ago
Do you happen to be running this in a Glue ETL notebook? On Glue 4.0, my notebook runs perfectly fine through the Jupyter interface, executing cells from top to bottom, but when I "Run" the notebook as a job I get the same error you have. Has anyone else seen notebook and job behavior diverge like this?
UPDATE: I figured this out, see the comment in the answers below.