HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://bucket_1/parquet_sample/data_10720000_1 (offset=0, length=16981465): org.apache.parquet.io.GroupColumnIO cannot be cast


I converted a couple of compressed JSON files into Parquet files with the default Snappy compression. The resulting files total less than 50 MB, so I am not sure why Athena throws this error. I read the background on the error, but it describes Athena querying several thousand files, and I don't understand how that applies here. A simple select count(*) from table runs, but select * from table fails.

I had to read the JSON files line by line because of a trailing error I encountered in Python.

Any idea why this occurs?

sl
asked 2 years ago, 1786 views
2 Answers
Accepted Answer

Hi There

I don't think this is related to the S3 rate-limiting issue that comes up when you search for this error. If it were, you would see something like "Slow Down" in the message. The key is the last part of the error:

org.apache.parquet.io.GroupColumnIO cannot be cast

This error is probably caused by a Parquet schema mismatch. Check your table creation query and test with a smaller subset of the data; there may be an issue with the data format or the table configuration. See https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html
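One way to narrow this down, as a rough sketch: "GroupColumnIO" refers to a nested (group) column in Parquet, so a plausible cause is a field that was a nested object or array in the source JSON but is declared as a primitive type such as string in the table DDL. That would also be consistent with count(*) succeeding, since it may not need to decode any column values. The helper below is hypothetical (the name find_nested_fields and the sample records are assumptions, not from the question) and uses only the standard library. It assumes each input line is a single JSON object and reports which top-level fields are nested, so they can be compared against the column types in the CREATE TABLE statement.

```python
import json


def find_nested_fields(lines):
    """Return top-level fields that hold nested values in any record.

    Assumes each non-empty line is one JSON object (JSON Lines style).
    The result maps field name -> set of kinds seen ("struct", "array",
    "primitive"); fields that are always primitive are left out.
    """
    kinds = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for name, value in record.items():
            if isinstance(value, dict):
                kind = "struct"
            elif isinstance(value, list):
                kind = "array"
            else:
                kind = "primitive"
            kinds.setdefault(name, set()).add(kind)
    # Keep only fields that were nested at least once.
    return {name: seen for name, seen in kinds.items() if seen != {"primitive"}}


# Hypothetical sample records, for illustration only.
sample = [
    '{"id": 1, "user": {"name": "a"}, "tags": ["x"]}',
    '{"id": 2, "user": {"name": "b"}, "tags": []}',
]
nested = find_nested_fields(sample)
print(nested)
```

Any field reported here would need a matching struct<...> or array<...> type in the Athena table definition, or would need to be flattened or serialized to a string before the Parquet file is written.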

AWS Expert
Matt-B
answered 2 years ago

Thanks for the response!

sl
answered 2 years ago
