Problem with Timestamps since we changed our tables' classification to 'delta'

1

Previously we were using the Symlink Manifest format to query Delta tables, using Athena, stored in our S3 buckets; we recently noticed that Athena added support to query Delta format tables without using the Symlink Manifest, so we dropped our Glue tables and re-created them using Glue Crawlers with Delta source changing their classification to 'delta'. Some users are reporting troubles when querying views and tables that depended on these tables re-created as 'delta' , some examples are:

my_view is stale or in invalid state: column [my_column] of type timestamp(3) with time zone projected from query view at position 13 cannot be coerced to column [my_view_column] of type timestamp(3) stored in view definition
pyathena.error.OperationalError: NOT_SUPPORTED: Unsupported Hive type: timestamp(3) with time zone.

and similar issues related to timestamp(3) with timezone, we never had this type of issue with the Symlink Manifest and we think since the Metadata, now that is registered as 'delta', is more specific Athena detects this incompatibility.

We found that casting these timestamp(3) with timezone columns to timestamp when creating the views seems to sort the problem

Is there any specific reason for this sudden change in compatibility?

preguntada hace un año1438 visualizaciones
2 Respuestas
0

Hello,

Timestamp with Time Zone is not supported by the glue crawler or by Athena. [+] https://repost.aws/questions/QUiFhuBzMrR3-QQV1W4XgIgA/no-time-with-time-zone-data-type-in-aws-glue-crawler

If you have a table column of type TIMESTAMP, Athena expects the corresponding column or property of the data to be a string in the format YYYY-MM-DD HH:MM:SS.SSS (note the space between the date and time), or a built-in time type like those for Parquet, ORC, or Ion. Please refer: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Your finding about casting these timestamp(3) with time zone columns to timestamp is one of the workarounds suggested by us to fix this issue.

In your case, if you didn't face any issue with Timestamp with Time Zone columns when used with the Symlink Manifest, please consider to create a support case with AWS Technical support and provide some sample data for detailed investigation. Refer below for creating a support case with AWS: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html

AWS
Dimpy S
respondido hace un año
0

Exactly the same case here, we used Symlink Manifest format to query Delta tables and after changing the table classification to Delta the issue started.

I still have my Symlink Manifest table, and when I query using the Symlink Manifest it works, but the delta one doesn't.

Both tables point to the same parquet files, looks like an Athena issue.

respondido hace un año

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas