Hello,
The "integer overflow" error occurs when a numeric value exceeds the range of an integer; the maximum value allowed for an integer is 2147483647, as documented here: https://docs.aws.amazon.com/athena/latest/ug/data-types.html
As you may know, Athena uses Presto as its query engine in the backend. When Presto reads a Parquet file, it reads the chunk size as an integer. If the total chunk size in bytes is greater than the maximum integer value, Presto returns an integer overflow error.
There are a few options to narrow down which column could be causing the issue:
- Run a simple SELECT query on each individual column and determine which succeed and which fail with the "GENERIC_INTERNAL_ERROR: integer overflow" error.
- Inspect the metadata of the Parquet file using a tool or library such as PyArrow or parquet-tools.
As a workaround, it is suggested to use a smaller block size for Parquet, depending on how you are generating the Parquet data. In Spark you can try setting "parquet.block.size" and "dfs.blocksize". Please find a third-party guide here: http://what-when-how.com/Tutorial/topic-2059e313/Hadoop-The-Definitive-Guide-457.html
I hope the above information helps!
Thank you!