Error reading data with Athena version 3.0 for some fields

Hi there,

We are having trouble with Athena engine version 3 when some files contain data whose types differ from the table DDL. In our case, some timestamp fields are stored in numeric format and others as date strings.

The error is: HIVE_CURSOR_ERROR: Failed to read Parquet file.

When we were on Athena engine version 2, we didn't get this kind of error. It seems v2 ignores data that doesn't match the DDL: the query still runs, shows those fields as blank, and doesn't throw an error.

Is there a special condition or property we can set in the DDL to get the same behavior? Reprocessing all the historical data is not an option right now, and the only workaround we have is to ignore these columns, but we need them for the rows that were inserted correctly. At the moment we cannot go back to version 2.

Another problem is that filtering on some integer fields returns no data in v3, whereas the same query returns rows in v2.

Do you know how to fix this?

Thanks!

rmis
asked 3 months ago · 182 views
2 Answers

Hello,

As you mentioned, the error "HIVE_CURSOR_ERROR: Failed to read Parquet file" usually occurs when there is a mismatch between the table DDL and the actual underlying data. You can use parquet tools to inspect the Parquet file schema. One suggestion is to make the change in the Parquet files themselves, at the metadata level, so that they all share a single schema and the issue does not occur.
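If parquet tools is not at hand, the same check can be sketched in Python. This is only a sketch under assumptions: it needs the pyarrow package for the schema-reading step, the expected types are written as Arrow type strings rather than Athena DDL types, and the column name event_ts and the file names are made-up placeholders, not taken from your table.

```python
from typing import Dict, List, Tuple


def read_parquet_schema(path: str) -> Dict[str, str]:
    """Return {column name: Arrow type as text} for one Parquet file."""
    # Imported lazily so the comparison helper below works without pyarrow.
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)  # reads the file metadata, not the row data
    return {field.name: str(field.type) for field in schema}


def find_mismatches(
    expected: Dict[str, str],
    file_schemas: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str, str, str]]:
    """List (file, column, expected type, actual type) for every mismatch."""
    mismatches = []
    for file_name, schema in sorted(file_schemas.items()):
        for column, expected_type in expected.items():
            actual_type = schema.get(column, "<missing>")
            if actual_type != expected_type:
                mismatches.append((file_name, column, expected_type, actual_type))
    return mismatches


# Hypothetical example: the table expects a timestamp, one file stores a number.
expected = {"event_ts": "timestamp[ms]"}
file_schemas = {
    "part-0001.parquet": {"event_ts": "int64"},
    "part-0002.parquet": {"event_ts": "timestamp[ms]"},
}
for row in find_mismatches(expected, file_schemas):
    print(row)
```

In practice, file_schemas would be filled by calling read_parquet_schema on a sample of files copied down from the table's S3 location, which should point to the files that need to be rewritten.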

There are also some breaking changes in v3 related to timestamp fields that can cause this; see [1][2][3].

Other possible causes are described in the Parquet section of the Athena troubleshooting guide [4].

Try setting the parquet.ignore.statistics property to true for your tables and check if it helps.
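For reference, the troubleshooting page [4] describes setting this property with an ALTER TABLE ... SET TBLPROPERTIES statement. The sketch below only builds that statement as a string; the database and table names are hypothetical, the name check is a deliberately strict assumption, and actually running the statement (in the Athena console or through an API call) is left out.

```python
import re

# Assumption: names use only lowercase letters, digits and underscores.
_IDENTIFIER = re.compile(r"^[a-z0-9_]+$")


def ignore_statistics_ddl(database: str, table: str) -> str:
    """Build the DDL that sets parquet.ignore.statistics to true on a table."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    return (
        f"ALTER TABLE {database}.{table} "
        "SET TBLPROPERTIES ('parquet.ignore.statistics'='true')"
    )


# Hypothetical names, for illustration only.
print(ignore_statistics_ddl("my_database", "my_table"))
```

After the property is set, rerunning one of the failing queries is a quick way to see whether it helps.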

If it still doesn't work, please open a support case with AWS and provide specific details, such as the QueryId and sample files, so the issue can be investigated.

Thank you!

References:
[1] https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html#engine-versions-reference-0003-breaking-changes
[2] https://repost.aws/questions/QUYNLXewhyQwaOxR0-FdbQEQ/bug-athena-engine-version-3-cannot-handle-timestamps-before-epoch-time
[3] https://www.repost.aws/questions/QU-AZ_S-NrT8uMY-updOUFnA/hive-cursor-error-failed-to-read-parquet-file-when-i-using-athena-engine-version-3-in-my-workgroup
[4] https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html#troubleshooting-athena-parquet-issues

AWS
SUPPORT ENGINEER
answered 3 months ago
EXPERT
reviewed a month ago

Thank you for the information. However, I have a concern about how changes to field data types are handled. In the previous version, changing the data type of a field did not generate any errors: you could query the field using the new data type, and although the old values were not returned, the field was still usable. Does this mean that, in the current version, changing the data type of a field breaks the table and a new table has to be created to fix it? So v2 supported schema evolution but v3 does not?

rmis
answered 3 months ago
