- Newest
- Most votes
- Most comments
While Athena supports various timestamp precisions in queries, there appears to be a limitation when it comes to unloading data to Parquet format. Based on the behavior you've described, it seems that Athena is indeed converting timestamp(3) to timestamp(9) during the unload process, even when explicitly cast to timestamp(3).
This behavior is not explicitly documented in Athena's official documentation, which is an oversight. The conversion to nanosecond precision (timestamp(9)) during unload operations appears to be an implicit behavior of Athena when writing to Parquet format.
It's worth noting that Parquet as a file format supports timestamps with nanosecond precision, which might explain why Athena defaults to this higher precision when unloading data. However, this automatic conversion can lead to unexpected results, especially when working with tools that are sensitive to timestamp precision.
To work around this issue, you might consider:
- Using a different file format for unloading if maintaining millisecond precision is crucial.
- Post-processing the unloaded Parquet files to adjust the timestamp precision if necessary.
- Accepting the higher precision in the unloaded files, but being aware of it when using the data in subsequent processes.
If this behavior is causing significant issues in your workflow, it would be advisable to contact AWS support or file a feature request to either document this behavior clearly or provide an option to maintain the original timestamp precision during unload operations.
Sources
Work with timestamp data - Amazon Athena
Athena engine version 3 - Amazon Athena
Relevant content
- asked a year ago
- asked 2 years ago
- asked 5 years ago