Hello Michael,
I quickly created the data file and the Athena table using the Python script you provided, and I ran into the same error as you on Athena v3.
On further research, I checked the Athena documentation, which identifies the cause as a timestamp overflow for the int96 Parquet format. It is covered on documentation page [1]; searching that page for "timeOfDayNanos" is the quickest way to find it.
-The workaround suggested in the docs is to identify the specific files that have the issue and regenerate them with an up-to-date, well-known Parquet library, or to use Athena CTAS [2].
Looking further across various sources, I found the following information about this issue:
-Int96 timestamps are encoded as 12-byte arrays in Parquet. The first 8 bytes encode the nanoseconds elapsed within the day, and the remaining 4 bytes encode the Julian day number (a count of days, not seconds, which is converted to days since the Unix epoch by subtracting a fixed offset). Here's the Trino code for reference:
public static DecodedTimestamp decodeInt96Timestamp(Binary timestampBinary)
{
    if (timestampBinary.length() != 12) {
        throw new TrinoException(NOT_SUPPORTED, "Parquet timestamp must be 12 bytes, actual " + timestampBinary.length());
    }
    byte[] bytes = timestampBinary.getBytes();

    // little endian encoding - need to invert byte order
    long timeOfDayNanos = Longs.fromBytes(bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]);
    int julianDay = Ints.fromBytes(bytes[11], bytes[10], bytes[9], bytes[8]);

    return decodeInt96Timestamp(timeOfDayNanos, julianDay);
}
-This then calls the two-argument overload of decodeInt96Timestamp, which performs the following validation:
public static DecodedTimestamp decodeInt96Timestamp(long timeOfDayNanos, int julianDay)
{
    verify(timeOfDayNanos >= 0 && timeOfDayNanos < NANOSECONDS_PER_DAY, "Invalid timeOfDayNanos: %s", timeOfDayNanos);

    long epochSeconds = (julianDay - JULIAN_EPOCH_OFFSET_DAYS) * SECONDS_PER_DAY + timeOfDayNanos / NANOSECONDS_PER_SECOND;
    return new DecodedTimestamp(epochSeconds, (int) (timeOfDayNanos % NANOSECONDS_PER_SECOND));
}
-The verify statement in the snippet above is failing for your data because the timeOfDayNanos value is negative, which puts it outside the allowed range of 0 up to (but not including) NANOSECONDS_PER_DAY. I suspect this is the issue.
-If your actual data is going to have the same kind of columns, I suggest you check for the same problem on your end.
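If it helps with checking values on your end, here is a minimal Python sketch of the same decoding and range check, using only the standard library. It is my own illustration and not code from Athena or Trino: the function name decode_int96_timestamp is hypothetical, the value 2440588 for the Julian day of 1970-01-01 is my assumption for JULIAN_EPOCH_OFFSET_DAYS, and the result is truncated to microseconds because Python's datetime does not hold nanoseconds.

```python
import struct
from datetime import datetime, timedelta, timezone

NANOSECONDS_PER_DAY = 86_400 * 1_000_000_000
# Assumption: Julian day number of 1970-01-01, the Unix epoch.
JULIAN_EPOCH_OFFSET_DAYS = 2_440_588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte int96 Parquet timestamp into a UTC datetime.

    Hypothetical helper mirroring the checks in the Java snippets above.
    """
    if len(raw) != 12:
        raise ValueError(f"Parquet timestamp must be 12 bytes, actual {len(raw)}")

    # "<qi": little-endian signed 64-bit nanos-of-day, then signed 32-bit Julian day.
    time_of_day_nanos, julian_day = struct.unpack("<qi", raw)

    # Same range check as the verify(...) call: 0 <= nanos < one day.
    if not 0 <= time_of_day_nanos < NANOSECONDS_PER_DAY:
        raise ValueError(f"Invalid timeOfDayNanos: {time_of_day_nanos}")

    days_since_epoch = julian_day - JULIAN_EPOCH_OFFSET_DAYS
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Truncate nanoseconds to microseconds for datetime.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=time_of_day_nanos // 1_000)


if __name__ == "__main__":
    # One hour into the day after the epoch decodes normally.
    good = struct.pack("<qi", 3_600 * 1_000_000_000, 2_440_589)
    print(decode_int96_timestamp(good))

    # A negative nanos-of-day value is rejected, as in the failing case above.
    bad = struct.pack("<qi", -1, 2_440_588)
    try:
        decode_int96_timestamp(bad)
    except ValueError as exc:
        print(exc)
```

Feeding the raw 12-byte values from a suspect column through a check like this should show which rows carry a negative or oversized time-of-day component.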
Resources:
[1] Athena engine version 3 breaking changes - https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html#engine-versions-reference-0003-breaking-changes
[2] Creating a table from query results (CTAS) - https://docs.aws.amazon.com/athena/latest/ug/ctas.html