Problem with Timestamps since we changed our tables' classification to 'delta'

1

Previously we were using the Symlink Manifest format to query Delta tables, using Athena, stored in our S3 buckets; we recently noticed that Athena added support to query Delta format tables without using the Symlink Manifest, so we dropped our Glue tables and re-created them using Glue Crawlers with Delta source changing their classification to 'delta'. Some users are reporting troubles when querying views and tables that depended on these tables re-created as 'delta' , some examples are:

my_view is stale or in invalid state: column [my_column] of type timestamp(3) with time zone projected from query view at position 13 cannot be coerced to column [my_view_column] of type timestamp(3) stored in view definition
pyathena.error.OperationalError: NOT_SUPPORTED: Unsupported Hive type: timestamp(3) with time zone.

and similar issues related to timestamp(3) with timezone, we never had this type of issue with the Symlink Manifest and we think since the Metadata, now that is registered as 'delta', is more specific Athena detects this incompatibility.

We found that casting these timestamp(3) with timezone columns to timestamp when creating the views seems to sort the problem

Is there any specific reason for this sudden change in compatibility?

已提问 1 年前1438 查看次数
2 回答
0

Hello,

Timestamp with Time Zone is not supported by the glue crawler or by Athena. [+] https://repost.aws/questions/QUiFhuBzMrR3-QQV1W4XgIgA/no-time-with-time-zone-data-type-in-aws-glue-crawler

If you have a table column of type TIMESTAMP, Athena expects the corresponding column or property of the data to be a string in the format YYYY-MM-DD HH:MM:SS.SSS (note the space between the date and time), or a built-in time type like those for Parquet, ORC, or Ion. Please refer: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Your finding about casting these timestamp(3) with time zone columns to timestamp is one of the workarounds suggested by us to fix this issue.

In your case, if you didn't face any issue with Timestamp with Time Zone columns when used with the Symlink Manifest, please consider to create a support case with AWS Technical support and provide some sample data for detailed investigation. Refer below for creating a support case with AWS: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html

AWS
Dimpy S
已回答 1 年前
0

Exactly the same case here, we used Symlink Manifest format to query Delta tables and after changing the table classification to Delta the issue started.

I still have my Symlink Manifest table, and when I query using the Symlink Manifest it works, but the delta one doesn't.

Both tables point to the same parquet files, looks like an Athena issue.

已回答 1 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则