To query both Iceberg and non-Iceberg tables in the same AWS Glue Interactive Session, you need to configure your Spark session to support both table types. The following configuration should let you query both:
--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.glue_catalog.warehouse=<s3_warehouse_path>
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
--conf spark.sql.catalog.spark_catalog.type=hive
This configuration combines elements from both of your provided configurations. Here's what each line does:
1. Sets up the glue_catalog as an Iceberg catalog.
2. Specifies the S3 warehouse location for Iceberg tables.
3-4. Configures the Glue catalog implementation and S3 file I/O for Iceberg.
5. Adds Iceberg extensions to Spark.
6-7. Configures the default spark_catalog to use a SparkSessionCatalog that can handle both Iceberg and non-Iceberg tables.
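If you start the session from a notebook, the seven settings have to be passed as one value. The sketch below is a minimal illustration, not the official method: it assumes the session accepts a `%%configure` cell whose JSON body carries a single `"--conf"` key, with the remaining settings chained by ` --conf `. The helper names `join_conf` and `configure_payload` are made up for this example, and the warehouse path is left as the placeholder from the listing.

```python
import json

# Settings copied from the listing above. The warehouse value is still the
# original placeholder and must be replaced with a real S3 location.
ICEBERG_SETTINGS = {
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.warehouse": "<s3_warehouse_path>",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
    "spark.sql.catalog.spark_catalog.type": "hive",
}


def join_conf(settings):
    """Collapse the settings into one string: 'k1=v1 --conf k2=v2 ...'."""
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


def configure_payload(settings):
    """Build the JSON body for a %%configure cell (assumed key: "--conf")."""
    return json.dumps({"--conf": join_conf(settings)}, indent=2)


# Print a cell body that could be pasted into the notebook.
print("%%configure")
print(configure_payload(ICEBERG_SETTINGS))
```

Keeping the settings in a dictionary makes it easy to swap the warehouse path per environment without retyping the long class names.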
With this configuration, you should be able to query both Iceberg and non-Iceberg tables in the same Glue Interactive Session. For Iceberg tables, you can use the glue_catalog prefix, and for non-Iceberg tables, you can use the default catalog or specify the spark_catalog prefix.
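As a rough illustration of the prefixes, the sketch below builds fully qualified table names for each catalog. The database and table names (`analytics_db`, `orders_iceberg`, `orders_parquet`) and the `qualify` helper are hypothetical, chosen only for this example.

```python
def qualify(catalog, database, table):
    """Return a three-part Spark SQL identifier: catalog.database.table."""
    return ".".join([catalog, database, table])


# Hypothetical database and table names, used only for illustration.
iceberg_table = qualify("glue_catalog", "analytics_db", "orders_iceberg")
plain_table = qualify("spark_catalog", "analytics_db", "orders_parquet")

# The Iceberg table goes through the glue_catalog prefix, the
# non-Iceberg table through the default spark_catalog.
iceberg_query = f"SELECT * FROM {iceberg_table} LIMIT 10"
plain_query = f"SELECT * FROM {plain_table} LIMIT 10"

print(iceberg_query)
print(plain_query)
```

Inside the session, each string would be passed to `spark.sql(...)`, for example `spark.sql(iceberg_query).show()`.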
When using dbt with the Glue adapter, make sure to update your dbt profile to include these configurations. You may need to adjust your dbt models to use the appropriate catalog prefix when referencing tables.
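For the dbt profile, the sketch below assumes the Glue adapter takes the Spark settings as one string in a `conf` field, written as `key=value --conf key=value ...` with no leading flag. Check that against your adapter version before relying on it. The helper `to_dbt_conf` is a made-up name, and only two of the settings are shown to keep the example short.

```python
def to_dbt_conf(listing):
    """Turn '--conf key=value' lines into 'key=value --conf key=value ...'."""
    pairs = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the pasted listing
        # Drop the leading flag so the first pair starts without '--conf'.
        if line.startswith("--conf"):
            line = line[len("--conf"):].strip()
        pairs.append(line)
    return " --conf ".join(pairs)


# Two of the settings from the listing, shortened for brevity.
listing = """
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
--conf spark.sql.catalog.spark_catalog.type=hive
"""

dbt_conf = to_dbt_conf(listing)
print(f"conf: {dbt_conf}")
```

Pasting the full listing into `listing` would produce the complete value for the profile.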
If you're still experiencing issues, double-check that your IAM roles and permissions are correctly set up to access both Iceberg and non-Iceberg tables in your Glue Data Catalog and S3 locations.
Remember that AWS Glue supports Apache Iceberg as one of the open-source frameworks for managing data in a transactionally consistent manner in your Amazon S3-based data lake. This support allows you to work with both traditional and Iceberg tables within the same environment.
Sources
Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter | AWS Big Data Blog
Serverless Data Integration – AWS Glue Features – AWS
Optimizing query performance for Iceberg tables - AWS Glue