- Newest
- Most votes
- Most comments
I have some updated information. I was able to get the schema of the DMS generated parquet and it has this for the column that will be a timestamp
optional int64 PROCESS_DATE (TIMESTAMP_MICROS);
Here is the schema for the pyspark 2.3.2 generated parquet file using defaults for this version
optional int96 PROCESS_DATE;
Also, it was brought to my attention that if you take the int64 value from the DMS parquet, eg PROCESS_DATE = 1493942400000000, and translate as a timestamp in nanoseconds it comes out to 2017-05-05. Plus it appears that is not associated to MSSQL Server, as I test extracting a MySQL table with a datetime column and it did the exact same thing.
After further research, I discovered that I was incorrect in assuming that the actual value was in nanoseconds, it is actually in microseconds just like the Parquet Schema states. I no longer think this is a DMS issue and believe it is an issue with Athena not being able to read the value correctly. Using https://www.epochconverter.com/ with the 1493942400000000 value it comes back with 2017-05-05 and identifies it as a microsecond time value. If you remove the last three zeros from it and run it again, you get the same result and it identifies it as milliseconds.
However if you run this in Athena
SELECT from_unixtime(1493942400000)
you get the following response
+49311-02-25 00:00:00.000
Oh, it looks like from_unixtime is expecting the value to be in seconds.
SELECT from_unixtime(1493942400)
2017-05-05 00:00:00.000
So I guess the hacky solution would be to set the column to BIGINT, then select timestamp column in Athena like so from_unixtime(col1/1000/1000).
For me adding parquetTimestampInMillisecond=true to the extra connection attributes of the target endpoint fixed this
equationcrunchor wrote:
For me adding parquetTimestampInMillisecond=true to the extra connection attributes of the target endpoint fixed this
Thank you @equationcrunchor - this worked for me extracting from SQL Server 2016
Relevant content
- asked a year ago
- asked 6 months ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated 2 years ago