HIVE_CURSOR_ERROR: Failed to read Parquet file:


I'm getting HIVE_CURSOR_ERROR: Failed to read Parquet file: s3://<my_bucket>/<my_directory>/output.parquet

I have created an external table on Parquet files generated by the pandas and PyArrow libraries in Python.

If I run SELECT count(*) FROM <table>, it returns the correct result, but if I run SELECT * FROM <table>, it throws the above exception.

What's going wrong?

The Athena table definition is below:

CREATE EXTERNAL TABLE mmx_india_parquet_edr_part(
  uid string COMMENT 'from deserializer',
  customerid string COMMENT 'from deserializer',
  productid string COMMENT 'from deserializer',
  edrmode string COMMENT 'from deserializer',
  destinationid string COMMENT 'from deserializer',
  protocolid string COMMENT 'from deserializer',
  host string COMMENT 'from deserializer',
  servicetype string COMMENT 'from deserializer',
  t1 bigint COMMENT 'from deserializer',
  t2 bigint COMMENT 'from deserializer',
  status string COMMENT 'from deserializer',
  ruleid string COMMENT 'from deserializer',
  supplierid string COMMENT 'from deserializer',
  commandstatus string COMMENT 'from deserializer',
  httpstatus string COMMENT 'from deserializer',
  messageid string COMMENT 'from deserializer',
  fragmented string COMMENT 'from deserializer',
  fragnumber string COMMENT 'from deserializer',
  fragtotal string COMMENT 'from deserializer',
  messagestate string COMMENT 'from deserializer')
PARTITIONED BY (
  traffic_date date)
STORED AS PARQUET
LOCATION 's3://<my_bucket>/<my_directory>/output.parquet'

Asked a year ago, 10568 views
1 answer

Hello,

This error usually indicates a mismatch between the table DDL and the schema of the underlying data files. You can use this Parquet tool to inspect the Parquet file's schema. If the file is large, you may also want to test on a small sample of the data first, in case a column contains mixed data types.

Apart from that, you can also test on different Athena engine versions to see whether the error is caused by engine-specific behavior (Athena engine version 2 uses Presto and version 3 uses Trino). This doc shows how to change the engine version for a workgroup.
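If you prefer to switch engines from code rather than the console, here is a sketch using the boto3 Athena client's `update_work_group` call, which sets `EngineVersion.SelectedEngineVersion` in the workgroup configuration. The helper name `set_engine_version` and the workgroup name `engine-test` are hypothetical. Because the setting applies to every query in the workgroup, a dedicated test workgroup is safer than changing `primary`, and which version strings are accepted depends on what your account currently offers.

```python
def set_engine_version(client, workgroup, version="Athena engine version 3"):
    """Select the Athena engine version used by one workgroup.

    `client` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    It is passed in rather than created here so the call is easy to test.
    """
    return client.update_work_group(
        WorkGroup=workgroup,
        ConfigurationUpdates={
            # Only the engine version is sent; no other configuration fields.
            "EngineVersion": {"SelectedEngineVersion": version},
        },
    )


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   athena = boto3.client("athena")
#   set_engine_version(athena, "engine-test", "Athena engine version 2")
#   # rerun SELECT * in that workgroup, then repeat with "Athena engine version 3"
```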

AWS
Support Engineer
Jann_P
Answered a year ago
