There is a known limitation where Athena may not correctly infer timestamp columns from Parquet files generated in certain ways.
When writing a DataFrame to Parquet, pandas uses nanosecond-resolution timestamps, which Parquet can store as INT96. However, some data catalogs and query engines expect microsecond resolution instead. Explicitly converting the timestamp column to microsecond resolution before writing ensures the data type is identified correctly:
```python
df['date'] = df['date'].astype('datetime64[us]')
```
Another option is to set the proper metadata in the Parquet file itself to specify nanosecond-resolution timestamps. Tools like Spark handle this automatically, but you may need to configure other systems, such as AWS DMS, to do the same.
I used Parquet format version 2.6, which stores nanosecond-resolution timestamps as INT64.
The previous code snippet returns the following values:
So it's strange to me that the column is being read as bigint.