
[Bug Report] Bug found on SageMaker Visual ETL flows


**Bug Description:** When creating the following flow, SageMaker produces a Python script with a syntax error.

Steps to reproduce:

  1. Create a source S3 bucket containing a CSV file and an Iceberg destination table in the Lakehouse: Catalog name: s3tablescatalog/iceberg, Database: dwh, Table: test
  2. Create a visual ETL flow in "spark.compatibility" mode (a rough PySpark sketch of this flow follows the list):
  • Read from a table in your Lakehouse catalogs (S3 Bucket with the raw file)
  • Filter data based on set conditions.
  • Choose specific columns to keep in dataset.
  • Change data types or add custom expressions to modify column values.
  • Write into a table in your Lakehouse catalogs (Iceberg table)
  3. Run the script
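For context, a rough PySpark equivalent of the flow described in step 2 might look like the sketch below. The bucket path, filter condition, and column names are placeholders, and it assumes a Spark session already configured with the S3 Tables catalog under the name that appears in the generated script:

```python
# Rough PySpark equivalent of the visual flow above (illustrative only).
# The bucket path, filter condition, and column names are placeholders;
# the catalog name matches the one in the generated script and is assumed
# to be configured on the Spark session as an Iceberg catalog.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("visual-etl-repro").getOrCreate()

# Read the raw CSV file from the source S3 bucket.
raw = (spark.read
       .option("header", "true")
       .csv("s3://my-source-bucket/raw/"))

# Filter rows on a condition.
filtered = raw.filter(F.col("status") == "active")

# Keep only the columns of interest.
selected = filtered.select("id", "name", "amount")

# Change a data type.
typed = selected.withColumn("amount", F.col("amount").cast("double"))

# Write into the Iceberg destination table in the Lakehouse catalog.
typed.writeTo("s3tablescatalog_iceberg.dwh.test").using("iceberg").createOrReplace()
```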

Output "SyntaxError: invalid syntax. Perhaps you forgot a comma? (filename.py, line 38)

Root cause analysis: The visual editor generates the following line of code, where the catalog name is not quoted:

all_columns = get_table_columns(spark, s3tablescatalog_iceberg, "dwh", "test") 

Best,

Wolfgang.

Asked 3 months ago · 42 views
1 Answer

Thank you for reporting this bug in SageMaker Visual ETL flows. I can confirm this is indeed a syntax error in the generated Python code.

The issue occurs when working with catalog names that contain special characters like underscores. The Visual ETL flow is incorrectly generating Python code that treats the catalog name as a variable rather than a string when referencing the S3 Tables Catalog for Apache Iceberg.

In the generated code, the catalog name s3tablescatalog_iceberg is not properly quoted, causing Python to interpret it as a variable name rather than a string literal. The correct syntax should have the catalog name in quotes, something like:

all_columns = get_table_columns(spark, "s3tablescatalog_iceberg", "dwh", "test")

When working with Apache Iceberg tables in Spark SQL, catalog names, namespaces, and table names should be properly quoted. This is especially important when the names contain special characters like underscores.
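For illustration only (this is not taken from the generated script), the two layers of quoting look like this: the value handled in Python is an ordinary string literal, while identifiers containing special characters inside the Spark SQL text are wrapped in backticks. The snippet assumes an existing SparkSession named `spark` with the catalog already configured:

```python
# Illustrative only: Python-level values are quoted strings, and the
# multipart identifier inside the SQL text is backtick-quoted per part.
# Assumes an existing SparkSession named `spark` with the catalog configured.
catalog, database, table = "s3tablescatalog_iceberg", "dwh", "test"
spark.sql(f"SELECT * FROM `{catalog}`.`{database}`.`{table}` LIMIT 10").show()
```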

As a temporary workaround, you might want to avoid using underscores or special characters in your catalog names until this bug is fixed. Alternatively, if you have access to edit the generated Python script before execution, you could manually add the necessary quotes around the catalog name.
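If editing the script by hand before every run is tedious, a small hypothetical patch script along these lines could add the quotes automatically before execution. The file name filename.py and the unquoted call are taken from the report above; adjust them to match your generated script:

```python
from pathlib import Path

# Hypothetical pre-run patch: quote the bare catalog identifier in the
# generated script. The file name and call shape come from the bug report.
script = Path("filename.py")
source = script.read_text()
patched = source.replace(
    "get_table_columns(spark, s3tablescatalog_iceberg,",
    'get_table_columns(spark, "s3tablescatalog_iceberg",',
)
script.write_text(patched)
```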

I'll make sure this bug report is forwarded to the appropriate team for resolution.
Sources
Accessing Amazon S3 tables with the Amazon S3 Tables Catalog for Apache Iceberg - Amazon Simple Storage Service
Using both external data and fine-grained data in Amazon SageMaker Unified Studio visual ETL jobs - Amazon SageMaker Unified Studio

Answered 3 months ago


