HIVE_CURSOR_ERROR: Failed to read Parquet file:

Getting HIVE_CURSOR_ERROR: Failed to read Parquet file: s3://<my_bucket>/<my_directory>/output.parquet

I have created an external table over Parquet files generated with the pandas and PyArrow libraries in Python.

If I execute SELECT count(*) FROM <table>, it returns the correct result. But if I execute SELECT * FROM <table>, it throws the exception above.

What's going wrong?

The Athena table structure is below.

CREATE EXTERNAL TABLE mmx_india_parquet_edr_part(
  uid string COMMENT 'from deserializer',
  customerid string COMMENT 'from deserializer',
  productid string COMMENT 'from deserializer',
  edrmode string COMMENT 'from deserializer',
  destinationid string COMMENT 'from deserializer',
  protocolid string COMMENT 'from deserializer',
  host string COMMENT 'from deserializer',
  servicetype string COMMENT 'from deserializer',
  t1 bigint COMMENT 'from deserializer',
  t2 bigint COMMENT 'from deserializer',
  status string COMMENT 'from deserializer',
  ruleid string COMMENT 'from deserializer',
  supplierid string COMMENT 'from deserializer',
  commandstatus string COMMENT 'from deserializer',
  httpstatus string COMMENT 'from deserializer',
  messageid string COMMENT 'from deserializer',
  fragmented string COMMENT 'from deserializer',
  fragnumber string COMMENT 'from deserializer',
  fragtotal string COMMENT 'from deserializer',
  messagestate string COMMENT 'from deserializer')
PARTITIONED BY (
  traffic_date date)
STORED AS PARQUET
LOCATION 's3://<my_bucket>/<my_directory>/output.parquet'

asked 10 months ago, 10413 views
1 Answer
Hello,

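This error normally indicates a mismatch between the table DDL and the actual underlying data. That would also fit the symptoms: count(*) most likely succeeds because it does not need to decode the column values, whereas SELECT * reads every column. You can use a Parquet inspection tool to check the Parquet file schema. If the file is quite large, you may also want to test on a small sample of the data first, in case one column contains mixed data types.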
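Since the files were written with pandas and PyArrow, the schema check can also be done in Python. The following is only a minimal sketch: the helper names (read_parquet_schema, find_mismatches) and the ATHENA_TO_ARROW mapping are hypothetical, the mapping covers just a few types, and it assumes you have downloaded a copy of the file locally. A common cause with pandas is an integer column containing missing values, which gets written as double while the DDL says bigint.

```python
# Hypothetical mapping from Athena DDL types to the Arrow type names that
# pyarrow would report for a compatible column. Extend it for your table.
ATHENA_TO_ARROW = {
    "string": {"string", "large_string"},
    "bigint": {"int64"},
    "double": {"double"},
}


def read_parquet_schema(path):
    """Return {column_name: arrow_type_name} for a local Parquet file."""
    # Imported lazily so the comparison below works without pyarrow installed.
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)
    return {name.lower(): str(typ) for name, typ in zip(schema.names, schema.types)}


def find_mismatches(file_schema, table_schema):
    """Compare file column types with the DDL and return a list of problems."""
    problems = []
    for column, athena_type in table_schema.items():
        arrow_type = file_schema.get(column)
        if arrow_type is None:
            problems.append(f"{column}: missing from the Parquet file")
        elif arrow_type not in ATHENA_TO_ARROW.get(athena_type, set()):
            problems.append(f"{column}: DDL says {athena_type}, file has {arrow_type}")
    return problems


# Illustrative data only: a subset of the table's columns, with t1 stored
# as double in the file (as happens when pandas sees missing values).
table_schema = {"uid": "string", "t1": "bigint", "t2": "bigint"}
file_schema = {"uid": "string", "t1": "double", "t2": "int64"}
for problem in find_mismatches(file_schema, table_schema):
    print(problem)
```

To check a real file, replace the illustrative file_schema with read_parquet_schema("output.parquet"), where the path points at your local copy.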

Apart from that, you can also run the query on a different Athena engine version to see whether the error is caused by engine-specific behaviour (Athena engine version 2 is based on Presto and version 3 on Trino). The Athena documentation describes how to change the engine version for a workgroup.
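If you prefer to switch the engine version from code rather than the console, something like the sketch below may work. It assumes boto3 is installed and credentials are configured; the helper names and the workgroup name "primary" are placeholders, and the requested version has to be one that is still offered in your account and region.

```python
def engine_update_request(workgroup, version):
    """Build the arguments for the Athena UpdateWorkGroup call."""
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            "EngineVersion": {
                "SelectedEngineVersion": f"Athena engine version {version}"
            }
        },
    }


def set_engine_version(workgroup, version):
    """Apply the engine version to the workgroup (requires AWS credentials)."""
    # Imported lazily so building the request does not require boto3.
    import boto3

    athena = boto3.client("athena")
    athena.update_work_group(**engine_update_request(workgroup, version))


# Show the request that would be sent for a placeholder workgroup.
print(engine_update_request("primary", 3))
```

Calling set_engine_version("primary", 3) would then apply the change, after which you can rerun the failing SELECT * query in that workgroup.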

AWS
SUPPORT ENGINEER
Jann_P
answered 10 months ago
