ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split s3


Recurring Issue with AWS Athena When Running Queries on Iceberg Table.

Description

I am trying to run queries on an Iceberg table using AWS Athena. The data is stored in S3, and I am using EMR 6.12.0, Iceberg 1.3.0-amzn-0, and Spark 3.4.0. The data ingestion process runs on EMR, consumes data from a Kafka topic, and ingests it into my Iceberg table in S3. The failure is intermittent: sometimes the query runs successfully, but other times I encounter the following error in Athena:

ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet (offset=4, length=16038): Incorrect file size (16042) for file (end of stream not reached): my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet
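As a reading aid for the message above, the following minimal sketch pulls the numbers out of the error text and checks how they relate. The regular expression and the `parse_split_error` helper are illustrative names, not part of any Athena or Iceberg API. It only shows that the split's offset plus its length equals the file size quoted in the message (a Parquet file begins with the 4-byte magic `PAR1`, which is consistent with `offset=4`). It does not by itself explain why Athena rejects that size.

```python
import re

# Matches the split details in the Athena message quoted above.
SPLIT_RE = re.compile(
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\): "
    r"Incorrect file size \((?P<size>\d+)\)"
)


def parse_split_error(message: str) -> dict:
    """Extract offset, length and quoted file size (hypothetical helper)."""
    match = SPLIT_RE.search(message)
    if match is None:
        raise ValueError("message does not look like ICEBERG_CANNOT_OPEN_SPLIT")
    return {name: int(value) for name, value in match.groupdict().items()}


message = (
    "ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split "
    "my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet "
    "(offset=4, length=16038): Incorrect file size (16042) for file "
    "(end of stream not reached): "
    "my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet"
)

fields = parse_split_error(message)
# The split covers bytes [offset, offset + length); compare its end with the
# file size quoted in the message.
split_end = fields["offset"] + fields["length"]
print(fields, split_end, split_end == fields["size"])
```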

The error occurs only in Athena; running a query on the table using Spark works fine.

Steps to Reproduce

Configured EMR with version 6.12.0 and Spark 3.4.0.

Set up an ingestion process on EMR to consume data from a Kafka topic and insert it into an Iceberg table on S3.

Created an Iceberg table on S3 using Iceberg version 1.3.0-amzn-0 and the following properties:

    OPTIONS (
        'format-version'='2',
        'write.target-file-size-bytes'='124217728',
        'history.expire.max-snapshot-age-ms'='172800000'
    )
    PARTITIONED BY (bucket(10, my_pk), months(created_at))

Data write process executed in Spark:

    query = (
        df.writeStream.format("iceberg")
        .outputMode("append")
        .trigger(once=True)
        .option("path", iceberg_table)
        .option("fanout-enabled", "true")
        .option("checkpointLocation", checkpoint_location)
    )

    query.toTable(iceberg_table).awaitTermination()

Tried running a query in AWS Athena:

    SELECT * FROM "db"."table" limit 10;

Expected Result

I expected the query in AWS Athena to run without any issues.

Actual Result

I am receiving a recurring error, ICEBERG_CANNOT_OPEN_SPLIT, which appears to indicate there is an issue with the file size or with the data streaming from S3.
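Since the message points at a file size, one way to narrow this down is to compare the size Iceberg has recorded for each data file with the size of the object actually stored in S3. The sketch below is only an assumption about how such a check could look, not a confirmed diagnosis. The helper names are hypothetical, the table identifier passed to `recorded_sizes_from_iceberg` depends on how the Glue catalog is registered in the Spark session, and the S3 part assumes `boto3` is installed with credentials that can read the bucket. It relies on Iceberg's `files` metadata table (columns `file_path` and `file_size_in_bytes`) and on the `ContentLength` field returned by S3 `head_object`.

```python
from urllib.parse import urlparse


def find_size_mismatches(recorded: dict, actual: dict) -> list:
    """Return (path, recorded_size, actual_size) for files whose sizes differ.

    actual_size is None when the path is missing from `actual`.
    """
    mismatches = []
    for path, recorded_size in sorted(recorded.items()):
        actual_size = actual.get(path)
        if actual_size != recorded_size:
            mismatches.append((path, recorded_size, actual_size))
    return mismatches


def recorded_sizes_from_iceberg(spark, table: str) -> dict:
    """Read per-file sizes from the Iceberg `files` metadata table.

    `table` is an assumption, for example a catalog-qualified name such as
    "my_catalog.db.table"; adjust it to your Spark catalog configuration.
    """
    rows = spark.sql(
        f"SELECT file_path, file_size_in_bytes FROM {table}.files"
    ).collect()
    return {row["file_path"]: row["file_size_in_bytes"] for row in rows}


def actual_sizes_from_s3(paths) -> dict:
    """Look up the stored object size for each s3:// path.

    head_object raises an error if an object does not exist or is not readable.
    """
    import boto3  # imported lazily so the pure comparison above has no dependency

    s3 = boto3.client("s3")
    sizes = {}
    for path in paths:
        parsed = urlparse(path)  # s3://bucket/key -> netloc=bucket, path=/key
        response = s3.head_object(Bucket=parsed.netloc, Key=parsed.path.lstrip("/"))
        sizes[path] = response["ContentLength"]
    return sizes
```

In a Spark session on the cluster, this could be used as `recorded = recorded_sizes_from_iceberg(spark, table_name)` followed by `find_size_mismatches(recorded, actual_sizes_from_s3(recorded))`. Any rows returned would be the files worth inspecting first.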

Additional Information

EMR Version: 6.12.0
Iceberg Version: 1.3.0-amzn-0
Spark Version: 3.4.0
We are using Glue as the catalog.

I am open to providing more information as needed. Thank you!

  • Did you find the answer? Could you please let me know?

asked 8 months ago, 586 views
1 Answer

Thank you for your question. However, troubleshooting this further requires us to look at the cluster and its logs. Please open a case with premium support so that we can debug the issue and look into your resources accordingly. We are unable to share details on the cluster here due to privacy and security concerns.

AWS
SUPPORT ENGINEER
answered 8 months ago
