Thank you for reporting this bug in SageMaker Visual ETL flows. I can confirm this is indeed an error in the generated Python code: strictly speaking it is not a syntax error, since the unquoted name is valid Python, but it makes the script fail at runtime.
The issue occurs when working with catalog names that contain special characters like underscores. The Visual ETL flow is incorrectly generating Python code that treats the catalog name as a variable rather than a string when referencing the S3 Tables Catalog for Apache Iceberg.
In the generated code, the catalog name s3tablescatalog_iceberg is not quoted, so Python interprets it as a variable name rather than a string literal and raises a NameError when the script runs. The correct syntax should have the catalog name in quotes, something like:
all_columns = get_table_columns(spark, "s3tablescatalog_iceberg", "dwh", "test")
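To make the failure mode concrete, here is a minimal, self-contained sketch (no SageMaker or Spark needed) of why the unquoted name breaks. The function name below is hypothetical and only mimics the shape of the generated code:

```python
def broken_call():
    # Mimics the generated line, which omitted the quotes:
    #   all_columns = get_table_columns(spark, s3tablescatalog_iceberg, ...)
    # Python parses this as a variable reference; since no variable with
    # that name exists, the script fails at runtime with a NameError.
    return s3tablescatalog_iceberg

try:
    broken_call()
except NameError as e:
    print(f"Generated code fails with: {e}")
```

This also shows why the problem surfaces only at execution time rather than when the script is parsed.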
When working with Apache Iceberg tables in Spark SQL, catalog names, namespaces, and table names should be properly quoted. This is especially important when the names contain special characters like underscores.
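As a rough sketch of that quoting rule (the helper name qualify_identifier is hypothetical, not part of SageMaker's generated code), each part of the identifier can be wrapped in backticks before being interpolated into a Spark SQL statement:

```python
def qualify_identifier(catalog: str, namespace: str, table: str) -> str:
    """Build a fully qualified, backtick-quoted Iceberg table identifier.

    Backticks delimit each part of the name so Spark SQL does not
    misparse characters such as underscores or dots. A literal backtick
    inside a name is escaped by doubling it, per Spark SQL's rules for
    delimited identifiers.
    """
    parts = (catalog, namespace, table)
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)

# The qualified name can then be interpolated safely, e.g.:
#   spark.sql(f"DESCRIBE TABLE {qualify_identifier('s3tablescatalog_iceberg', 'dwh', 'test')}")
print(qualify_identifier("s3tablescatalog_iceberg", "dwh", "test"))
# → `s3tablescatalog_iceberg`.`dwh`.`test`
```

Quoting defensively like this keeps the generated SQL correct regardless of what characters appear in the catalog, namespace, or table names.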
As a temporary workaround, you could avoid underscores and other special characters in your catalog names until this bug is fixed. Alternatively, if you can edit the generated Python script before execution, you can manually add quotes around the catalog name.
I'll make sure this bug report is forwarded to the appropriate team for resolution.
Sources
Accessing Amazon S3 tables with the Amazon S3 Tables Catalog for Apache Iceberg - Amazon Simple Storage Service
Using both external data and fine-grained data in Amazon SageMaker Unified Studio visual ETL jobs - Amazon SageMaker Unified Studio
