Athena unloads to parquet a Timestamp(3) type as Timestamp(9)

0

I have a datetime[ms, UTC] parquet column I write to s3. In Athena if I unload it, redundantly casting to timestamp(3) (Trino datatype) the VS Code extension "DataWrangler" confirms it is a datetime[ns].

UNLOAD( select cast(purchased as timestamp(3)) as purchased from store where year = 2013) TO 's3://buck/folder' WITH (format = 'PARQUET', compression = 'UNCOMPRESSED')

Does this mean Athena unload queries cannot represent anything other than timestamp(9)? Where is this limitation documented?

asked 13 days ago28 views
1 Answer
0

While Athena supports various timestamp precisions in queries, there appears to be a limitation when it comes to unloading data to Parquet format. Based on the behavior you've described, it seems that Athena is indeed converting timestamp(3) to timestamp(9) during the unload process, even when explicitly cast to timestamp(3).

This behavior is not explicitly documented in Athena's official documentation, which is an oversight. The conversion to nanosecond precision (timestamp(9)) during unload operations appears to be an implicit behavior of Athena when writing to Parquet format.

It's worth noting that Parquet as a file format supports timestamps with nanosecond precision, which might explain why Athena defaults to this higher precision when unloading data. However, this automatic conversion can lead to unexpected results, especially when working with tools that are sensitive to timestamp precision.

To work around this issue, you might consider:

  1. Using a different file format for unloading if maintaining millisecond precision is crucial.
  2. Post-processing the unloaded Parquet files to adjust the timestamp precision if necessary.
  3. Accepting the higher precision in the unloaded files, but being aware of it when using the data in subsequent processes.

If this behavior is causing significant issues in your workflow, it would be advisable to contact AWS support or file a feature request to either document this behavior clearly or provide an option to maintain the original timestamp precision during unload operations.
Sources
Work with timestamp data - Amazon Athena
Athena engine version 3 - Amazon Athena

profile picture
answered 13 days ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions