Glue delete not working on Iceberg

Hello!

I am using Glue 4.0 and would like to delete rows from an Iceberg table. To build the deletion condition, I need to fetch data from another table, which is a dblink table and not in Iceberg format.

Relevant Spark config info:

    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", f"s3://[REDACTED]/")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .enableHiveSupport()

Additional Job Parameters:

--datalake-formats: iceberg

When I issue this kind of DELETE command:

spark.sql("DELETE FROM `glue_catalog.ice_db`.`ice_table` AS T WHERE EXISTS (SELECT col1 FROM `another_db_link`.`db_link_table` AS R WHERE T.col1 = R.col1)")

I get this error message:

An error occurred while calling o97.showString. Cannot support vectorized reads for column [uuid] optional binary uuid (STRING) = 1 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file

A simple deletion works fine, like this:

spark.sql("DELETE FROM `glue_catalog`.`ice_db.ice_table` AS T WHERE col1 = 1")

Do you have any idea?

Asked 1 year ago, 641 views
1 Answer
Accepted Answer

Regarding the error "An error occurred while calling o97.showString. Cannot support vectorized reads for column [uuid] optional binary uuid (STRING) = 1 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file": Iceberg does not use Spark's vectorized reader but its own, and the message indicates that this reader cannot handle the DELTA_BYTE_ARRAY encoding of the uuid column. One solution is to set the property "read.parquet.vectorization.enabled" to false in the Glue table's Table properties, which disables vectorized reads for that table.

Could you please try this on your end? Navigate to the Glue Tables page and choose the Glue table that the job accesses. Then click Actions > Edit table and add a new Table property with key: read.parquet.vectorization.enabled and value: false
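If you would rather script the console steps, the following is a minimal sketch of the same change using boto3. It assumes the job or user has Glue permissions to read and update the table; the helper names build_table_input and disable_vectorized_reads are hypothetical, and the key filter reflects my understanding of which fields the UpdateTable API accepts in TableInput, so please verify it against your table before relying on it.

```python
# Fields that Glue's UpdateTable accepts in TableInput. get_table also returns
# read-only fields (for example DatabaseName, CreateTime) that must be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def build_table_input(table, key, value):
    """Copy the writable fields of `table` and set one table property."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    # Copy the existing parameters so Iceberg's own entries are preserved
    # and the dict returned by get_table is not mutated.
    params = dict(table_input.get("Parameters", {}))
    params[key] = value
    table_input["Parameters"] = params
    return table_input


def disable_vectorized_reads(database, table_name):
    """Set read.parquet.vectorization.enabled=false on a Glue table."""
    import boto3  # imported lazily so build_table_input works without AWS access

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_table_input(
            table, "read.parquet.vectorization.enabled", "false"
        ),
    )
```

With the names from the question, the call would be disable_vectorized_reads("ice_db", "ice_table"). The existing parameters are copied rather than replaced because the Glue entry of an Iceberg table carries Iceberg's own properties, which should stay untouched.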

Sahil_S (AWS)
Answered 1 year ago
Reviewed by an AWS Expert 1 year ago
  • Thanks! One remark: it works for me only if I add the table property via Glue.