HIVE_CURSOR_ERROR: Failed to read Parquet file:


I am getting this error: HIVE_CURSOR_ERROR: Failed to read Parquet file: s3://<my_bucket>/<my_directory>/output.parquet

I have created an external table over Parquet files generated with the pandas and PyArrow libraries in Python.

If I execute SELECT count(*) FROM <table>, it returns the correct output. But if I execute SELECT * FROM <table>, it throws the above exception.

What's going wrong?

The Athena table structure is below.

CREATE EXTERNAL TABLE mmx_india_parquet_edr_part(
  uid string COMMENT 'from deserializer',
  customerid string COMMENT 'from deserializer',
  productid string COMMENT 'from deserializer',
  edrmode string COMMENT 'from deserializer',
  destinationid string COMMENT 'from deserializer',
  protocolid string COMMENT 'from deserializer',
  host string COMMENT 'from deserializer',
  servicetype string COMMENT 'from deserializer',
  t1 bigint COMMENT 'from deserializer',
  t2 bigint COMMENT 'from deserializer',
  status string COMMENT 'from deserializer',
  ruleid string COMMENT 'from deserializer',
  supplierid string COMMENT 'from deserializer',
  commandstatus string COMMENT 'from deserializer',
  httpstatus string COMMENT 'from deserializer',
  messageid string COMMENT 'from deserializer',
  fragmented string COMMENT 'from deserializer',
  fragnumber string COMMENT 'from deserializer',
  fragtotal string COMMENT 'from deserializer',
  messagestate string COMMENT 'from deserializer')
PARTITIONED BY (
  traffic_date date)
STORED AS PARQUET
LOCATION 's3://<my_bucket>/<my_directory>/output.parquet'

Asked 1 year ago, 10569 views
1 Answer

Hello,

This error normally indicates a mismatch between the table DDL and the actual underlying data. You can use a Parquet inspection tool to check the schema stored in the Parquet file and compare it with the column types in the DDL. If the file is quite large, you may also want to test on a small sample of the data first, in case one column contains mixed data types.
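Since the files were written with pandas and PyArrow, the same check can be done in Python. The following is a minimal sketch, not a definitive tool: the helper names `parquet_schema_as_dict` and `find_mismatches` are hypothetical, the Arrow-to-Athena type mapping is a partial assumption that covers only a few common types, and the file is assumed to have been downloaded locally first. Only three of the table's columns are shown, and the `double` type for `t1` is an invented illustration of what a mismatch could look like, not a finding about the actual file.

```python
# Partial, assumed mapping from PyArrow type names to Athena column types.
ARROW_TO_ATHENA = {
    "string": "string",
    "large_string": "string",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "bool": "boolean",
    "date32[day]": "date",
}


def parquet_schema_as_dict(path):
    """Return {lowercased column name: Arrow type name} for a local Parquet file."""
    import pyarrow.parquet as pq  # imported lazily; only needed when reading a file

    schema = pq.read_schema(path)
    return {field.name.lower(): str(field.type) for field in schema}


def find_mismatches(file_schema, table_schema):
    """List (column, athena_type, found) tuples where file and DDL disagree."""
    problems = []
    for column, athena_type in table_schema.items():
        arrow_type = file_schema.get(column)
        if arrow_type is None:
            problems.append((column, athena_type, "missing in file"))
            continue
        # Unknown Arrow types fall through unmapped and are reported as-is.
        mapped = ARROW_TO_ATHENA.get(arrow_type, arrow_type)
        if mapped != athena_type:
            problems.append((column, athena_type, arrow_type))
    return problems


# Illustrative input only: a subset of the DDL columns and a made-up file schema.
table_schema = {"uid": "string", "t1": "bigint", "t2": "bigint"}
example_file_schema = {"uid": "string", "t1": "double", "t2": "int64"}
print(find_mismatches(example_file_schema, table_schema))
```

With a real file, `parquet_schema_as_dict("output.parquet")` would replace the made-up dictionary. One case worth checking is an integer column that contains missing values: pandas stores such a column as float64 by default, so it may be written as `double` while the DDL declares `bigint`.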

Apart from that, you can also run the query on different Athena engine versions to see whether the error is caused by engine-specific behaviour (Athena engine V2 uses Presto and V3 uses Trino). The Athena documentation describes how to change the engine version of a workgroup.
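The engine version can also be switched from Python. This is a minimal sketch under stated assumptions: boto3 is installed, AWS credentials and region are already configured, and the helper names `engine_version_update` and `set_engine_version` as well as the workgroup name in the commented usage line are hypothetical.

```python
def engine_version_update(version_number):
    """Build the ConfigurationUpdates payload that selects an Athena engine version."""
    return {
        "EngineVersion": {
            "SelectedEngineVersion": f"Athena engine version {version_number}"
        }
    }


def set_engine_version(workgroup, version_number):
    """Apply the engine version to a workgroup (makes a real AWS API call)."""
    import boto3  # imported lazily; only needed when actually calling AWS

    athena = boto3.client("athena")
    athena.update_work_group(
        WorkGroup=workgroup,
        ConfigurationUpdates=engine_version_update(version_number),
    )


# Hypothetical usage; "my-test-workgroup" is a placeholder name:
# set_engine_version("my-test-workgroup", 3)
```

Trying this on a separate test workgroup rather than the one used for production queries keeps the experiment from affecting other users of the workgroup.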

AWS
Support Engineer
Jann_P
Answered 1 year ago
