HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://bucket_1/parquet_sample/data_10720000_1 (offset=0, length=16981465): org.apache.parquet.io.GroupColumnIO cannot be cast


I converted a couple of compressed JSON files into Parquet files with the default Snappy compression. The resulting files total less than 50 MB, so I am not sure why Athena throws this error. I read the background on the error, but I don't understand how Athena could be querying the several thousand files that are said to trigger it. I can run a simple select count(*) from table, but a select * from table query fails.

The JSON file had to be read line by line because of a trailing error I encountered in Python.

Any idea why this occurs?

sl
asked 2 years ago, 1785 views
2 Answers
Accepted Answer

Hi There

I don't think this is related to the S3 rate-limiting issue that comes up when you search for this error. You would see something like "Slow Down" if that were the case. The key is in the last part of the error:

org.apache.parquet.io.GroupColumnIO cannot be cast

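This error is probably caused by a Parquet schema mismatch, meaning the column types declared for the table do not match what is actually stored in the files. That would also be consistent with count(*) succeeding while select * fails, since the mismatch likely only surfaces when the column values themselves are read. Check your table creation query and test with a smaller subset of the data. There may be an issue with the data format or the table configuration. See https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html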
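
One way to look for such a mismatch before touching the Parquet files is to scan a small sample of the source JSON and compare it with the column types in the table definition. The sketch below is only an illustration under a few assumptions: the JSON is one object per line, the converter writes JSON objects and arrays as nested Parquet groups, and a column that is nested in the file but declared as a primitive type (or the other way round) is a plausible trigger for the GroupColumnIO cast error. The function names, the sample records, and the `declared` mapping are all hypothetical and stand in for your own data and DDL.

```python
import io
import json

# Athena/Hive type prefixes that correspond to nested Parquet groups.
NESTED_PREFIXES = ("struct<", "array<", "map<")


def field_kinds(lines):
    """Map each top-level field to the set of kinds seen across all records."""
    kinds = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for name, value in record.items():
            if isinstance(value, (dict, list)):
                kind = "nested"
            elif value is None:
                kind = "null"
            else:
                kind = "primitive"
            kinds.setdefault(name, set()).add(kind)
    return kinds


def is_nested_type(athena_type):
    """True if the declared column type is a struct, array, or map."""
    return athena_type.strip().lower().startswith(NESTED_PREFIXES)


def find_mismatches(kinds, declared):
    """Return (field, declared_type) pairs where nesting disagrees."""
    problems = []
    for name, seen in sorted(kinds.items()):
        declared_type = declared.get(name)
        if declared_type is None:
            continue  # column not declared in the table; nothing to compare
        if is_nested_type(declared_type) and "primitive" in seen:
            problems.append((name, declared_type))
        elif not is_nested_type(declared_type) and "nested" in seen:
            problems.append((name, declared_type))
    return problems


# Hypothetical sample data: "user" is an object in every record.
sample = io.StringIO(
    '{"id": 1, "user": {"name": "a"}, "tags": ["x"]}\n'
    '{"id": 2, "user": {"name": "b"}, "tags": []}\n'
)

# Hypothetical column types copied from a CREATE TABLE statement.
declared = {"id": "bigint", "user": "string", "tags": "array<string>"}

kinds = field_kinds(sample)
mismatches = find_mismatches(kinds, declared)
print(mismatches)
```

Any field this reports is worth checking in the table creation query first. It only looks at top-level fields and at nested versus primitive, so it will not catch differences such as int versus string.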

Matt-B (AWS, Expert)
answered 2 years ago

Thanks for the response!

sl
answered 2 years ago
