Error reading data with Athena version 3.0 for some fields

Hi there,

We are having trouble with Athena engine version 3 when some files contain data types that differ from the table DDL. In our case, some timestamp fields are stored in numeric format and some as date strings.

The error is: HIVE_CURSOR_ERROR: Failed to read Parquet file.

When we had Athena engine version 2 active, we didn't see this kind of error. It seems that v2 ignores data that doesn't match the DDL: the query still works and shows those fields as blank instead of failing.

Are there any special conditions or properties we can set in the DDL to get the same effect? Reprocessing all the historical data is not an option right now, and the only fix we have is to ignore these columns, but we need them for the rows that were inserted correctly. We are currently unable to go back to version 2.

Another problem is that filtering on some integer fields returns no data in v3, whereas v2 does return data.

Do you know how to fix this?

Thanks!

rmis
asked 2 months ago · 172 views
2 Answers

Hello,

As you mentioned, the error HIVE_CURSOR_ERROR: Failed to read Parquet file usually occurs when there is a mismatch between the table DDL and the actual underlying data. You can use Parquet tools to inspect the schema of the Parquet files. One suggestion is to fix this at the metadata level in the Parquet files themselves so that they all share a single schema and the issue does not occur.
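To make the schema check concrete, here is a minimal Python sketch of one way to find files whose column types differ from what the table expects. It assumes the third-party pyarrow package is installed for reading the file schemas. The function names `read_parquet_schema` and `find_mismatches` and the column names in the usage note are hypothetical, and the expected types are written by hand as Arrow type names (for example `timestamp[ms]`), not as Athena DDL types.

```python
from typing import Dict, List, Tuple


def read_parquet_schema(path: str) -> Dict[str, str]:
    """Return {column name: Arrow type name} for one Parquet file.

    Assumption: pyarrow is installed. It is imported lazily so the
    comparison logic below can be used without it.
    """
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)
    return {name: str(dtype) for name, dtype in zip(schema.names, schema.types)}


def find_mismatches(
    expected: Dict[str, str],
    file_schemas: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str, str, str]]:
    """Compare each file's schema against the expected column types.

    Returns (file, column, expected type, actual type) tuples for every
    column that exists in a file with a different type. Columns missing
    from a file are not reported.
    """
    mismatches = []
    for path, schema in sorted(file_schemas.items()):
        for column, expected_type in expected.items():
            actual_type = schema.get(column)
            if actual_type is not None and actual_type != expected_type:
                mismatches.append((path, column, expected_type, actual_type))
    return mismatches
```

For example, with `expected = {"event_ts": "timestamp[ms]"}` and `file_schemas = {p: read_parquet_schema(p) for p in local_paths}`, every tuple returned by `find_mismatches` points at a file that would need to be rewritten to match the table.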

Also, there are some breaking changes in v3 related to timestamp fields that can cause this as well; see [1][2][3].

Other possible causes are listed in the Athena troubleshooting guide [4]: https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html#troubleshooting-athena-parquet-issues

Try setting the parquet.ignore.statistics table property to true for your tables and check whether it helps.

If it still doesn't work, please open a support case with AWS and provide specific details such as the QueryId and sample files so that the issue can be investigated and troubleshot.
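As a rough sketch of how that property could be applied, the Python below builds the `ALTER TABLE ... SET TBLPROPERTIES` statement and optionally submits it through boto3. The helper names, the database and table names, the workgroup, and the S3 output location are all placeholders and assumptions; adapt them to your environment. The builder does no quoting, so it assumes plain identifiers.

```python
def ignore_statistics_ddl(database: str, table: str) -> str:
    """Build the DDL that sets parquet.ignore.statistics to true.

    Assumption: database and table are plain identifiers that need no quoting.
    """
    return (
        f"ALTER TABLE {database}.{table} "
        "SET TBLPROPERTIES ('parquet.ignore.statistics'='true')"
    )


def run_in_athena(sql: str, workgroup: str, output_location: str) -> str:
    """Submit a statement to Athena and return its query execution ID.

    Assumption: boto3 is installed and AWS credentials are configured.
    boto3 is imported lazily so the DDL builder works without it.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

You can also paste the generated statement straight into the Athena query editor instead of calling `run_in_athena`.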

Thank you!

References:
[1] https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html#engine-versions-reference-0003-breaking-changes
[2] https://repost.aws/questions/QUYNLXewhyQwaOxR0-FdbQEQ/bug-athena-engine-version-3-cannot-handle-timestamps-before-epoch-time
[3] https://www.repost.aws/questions/QU-AZ_S-NrT8uMY-updOUFnA/hive-cursor-error-failed-to-read-parquet-file-when-i-using-athena-engine-version-3-in-my-workgroup
[4] https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html#troubleshooting-athena-parquet-issues

AWS
SUPPORT ENGINEER
answered 2 months ago
EXPERT
reviewed a month ago

Thank you for the information. However, I have a concern about how changes to field data types are handled. In the previous version, changing the data type of a field did not generate any errors: you could query the field with the new data type, and although the old values were not shown, the field was still usable. Does this mean that, in the current version, changing the data type of a field corrupts the table and a new table must be created to address it? So v2 supported schema evolution but v3 does not?

rmis
answered 2 months ago
