There is a known limitation where Athena may not correctly infer timestamp columns from Parquet files generated in certain ways.
When writing a DataFrame to Parquet, pandas uses nanosecond-resolution timestamps, which Parquet historically stored as INT96. However, some data catalogs and query engines expect microsecond resolution instead. Explicitly converting the timestamp column to microsecond resolution before writing ensures the data type is identified correctly.
df['date'] = df['date'].astype('datetime64[us]')
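A minimal sketch of the conversion in context, assuming pandas 2.x (which supports non-nanosecond datetime units) and a hypothetical `date` column:

```python
import pandas as pd

# Hypothetical DataFrame; pandas defaults to nanosecond resolution
df = pd.DataFrame(
    {"date": pd.to_datetime(["2023-01-01 12:00:00", "2023-06-15 08:30:00"])}
)
print(df["date"].dtype)  # datetime64[ns]

# Downcast to microsecond resolution before writing to Parquet
df["date"] = df["date"].astype("datetime64[us]")
print(df["date"].dtype)  # datetime64[us]
```

After this cast, `df.to_parquet(...)` writes the column as a microsecond TIMESTAMP, which engines like Athena map cleanly to their timestamp type.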
Another option is to set the proper metadata in the Parquet file itself to declare nanosecond-resolution timestamps. Tools like Spark handle this automatically, but other systems, such as AWS DMS, may need to be configured explicitly.
I used Parquet format version 2.6, which stores nanosecond-resolution timestamps as INT64.
The previous code snippet returns the following values:
So it is unclear to me why I get this behavior, with the column reported as bigint.