Skip to content

ATHENA query redshift table with a timestamptz column FAIL

0

when using ATHENA to query a redshift table

it fail on column of type timestamptz

ICEBERG_BAD_DATA: Field created_at's type INT96 in parquet file s3://AAAAAAAAAAAAAAA-5b5dc388-103a-4130-bab4-c1508ede6f84/AROATTRJ76GEXBTJK47R6:b3932febca3c2cbc91ae4255d654c1917dacfb9780a2f2d80ec346c9293c02bc/write/19d1db55-6de8-4ca1-b985-7799a6335aa7/0000_part_00.parquet is incompatible with type timestamp(6) with time zone defined in table schema

Enter image description here

asked 10 months ago232 views
2 Answers
0

Hello,

From your case correspondence, I understand that you are encountering the following error in Amazon Athena:

Error:

ICEBERG_BAD_DATA: Field created_at's type INT96 in parquet file s3://AAAAAAAAAAAAAAA-5b5dc388-103a-4130-bab4-c1508ede6f84/AROATTRJ76GEXBTJK47R6:b3932febca3c2cbc91ae4255d654c1917dacfb9780a2f2d80ec346c9293c02bc/write/19d1db55-6de8-4ca1-b985-7799a6335aa7/0000_part_00.parquet is incompatible with type timestamp(6) with time zone defined in table schema

After reviewing the screenshot you shared, I can see that the column is defined as a timestamp. However, based on my research, I found that Athena currently cannot read timestamp columns represented as INT64 in Iceberg Parquet tables that were migrated from Hive tables written by Spark.

To mitigate this issue, I suggest adding the following Spark configurations:

'spark.sql.iceberg.use-timestamp-without-timezone-in-new-tables': 'true' 'spark.sql.iceberg.handle-timestamp-without-timezone': 'true' 'spark.sql.iceberg.read-timestamp-as-timestamp-without-timezone': 'true'

I would request that you add the above Spark configurations to your Spark environment for your migration to Iceberg tables.

Please try this approach and let me know how it works when querying the migrated Iceberg table in Athena.

If you have any other questions or concerns, feel free to reach out to me. I’ll be more than happy to assist further. Thank you for your understanding and cooperation.

Have a great day ahead!

AWS
answered 10 months ago
  • No, I'm using athena to read DIRECTLY the content of a REDSHIFT table ( i'm not using spark )

0

The error you're encountering is due to a mismatch between the data type in the Parquet file and the data type defined in the Athena table schema. Specifically, the Parquet file contains a column 'created_at' with the INT96 data type, which is commonly used to represent timestamps in Parquet files. However, the Athena table schema defines this column as 'timestamp(6) with time zone', which is equivalent to the timestamptz type in Redshift.

This mismatch is causing Athena to fail when trying to read the data. The INT96 type in Parquet is not directly compatible with Athena's timestamp with time zone type.

To resolve this issue, you have a few options:

  1. Modify the table definition in Athena to match the data type in the Parquet file. You could try using the 'timestamp' type without the time zone specification.

  2. If possible, adjust the data export process from Redshift to use a different timestamp format that's more compatible with Athena, such as using a string representation of the timestamp.

  3. Use a custom SerDe (Serializer/Deserializer) that can handle the conversion between INT96 and timestamptz. However, this might require additional setup and configuration.

  4. If you're using the Redshift Spectrum feature to query Redshift data through Athena, ensure that the external table definition in Redshift matches the actual data types in the underlying files.

Remember that Athena has some limitations when it comes to certain timestamp types, especially when dealing with time zones. It's often safer to use 'timestamp' without time zone information if possible, and handle time zone conversions in your queries or application logic if needed.
Sources
Troubleshoot Athena Apache Iceberg table errors | AWS re:Post
Supported data types for Iceberg tables in Athena - Amazon Athena

answered 10 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.