Athena JSON SerDe Error

0

I have successfully created a database using a Glue crawler, but ran into this error (or an extremely similar one) when trying to access it with even a basic Athena query.

GENERIC_INTERNAL_ERROR: IndexOutOfBoundsException thrown initializing deserializer org.openx.data.jsonserde.JsonSerDe. Cause: Index: 55, Size: 55

I have successfully run Athena queries on another database, which I believe is formatted the same way. I tried taking just a single file out of the S3 bucket and using a new crawler to make a new table, but it created the same error. My best guess is that there's a special character or something being used in the JSON that this particular deserializer can't handle. A JSON linter read the file as a valid JSON. Are there reserved characters for the AWS SerDe? Where can I look to find more information on this particular error?

The data I am trying to assess belongs to a vendor, so I don't think I can post it here, but here is an example line from the file:

{"AE Weather Station (Standard w/ GHI) (Estimated GHI) (Watt hours/meter2)_Irradiance": "216.806", "AE Weather Station (Standard w/ GHI) (GHI sensor) (Watts/meter2)_GHI comparison": "0", "AE Weather Station (Standard w/ GHI) (GHI) (Watt hours/meter2)_Irradiance": "0", "AE Weather Station (Standard w/ GHI) - Ambient Temperature (Degrees Fahrenheit)_Temperature": "75.028", "AE Weather Station (Standard w/ GHI) - Module temperature (est) (Degrees Fahrenheit)_Temperature": "75.028", "AE Weather Station (Standard w/ GHI), Sun (GHI) (Watts/meter2)_Irradiance Sensor Degrade": "0", "AE Weather Station (Standard w/ GHI), Sun (GHI) (Watts/meter2)_Irradiance sensor orientation": "0", "AE Weather Station (Standard w/ GHI), Sun (GHI) * (Watts/meter2)_POA comparison": "0", "AE Weather Station (Standard w/ GHI): Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "AE Weather Station (Standard w/ GHI): Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "Ambient temperature (Degrees Celsius)_Clean average weather data": "23.904", "Ambient temperature Quality (Quality (asdf))_Clean average weather data": "100", "Bin Size": "Bin1Hour", "Blue-sky ratio: AE Weather Station (Standard w/ GHI), Sun (GHI) (Percent)_Irradiance Sensor Degrade": "104.792", "Blue-sky ratio: IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "102.432", "Blue-sky ratio: IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "111.587", "Blue-sky ratio: IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "NaN", "GHI (Watts/meter2)_Clean average weather data": "0", "GHI Quality (Quality (asdf))_Clean average weather data": "100", "IMT 1 (1810) Ref Cell w/ Mod Temp (Estimated POA) (Watt hours/meter2)_Irradiance": "0", "IMT 1 (1810) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "71.581", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 1 (1810) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 2 (1550) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "71.331", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 2 (1550) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 3 (1220) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "70.246", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 3 (1220) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "Max wind speed - AE Weather Station (Standard w/ GHI) (Miles/hour)_Wind": "0", "Module temperature (Degrees Celsius)_Clean average weather data": "21.652", "Module temperature Quality (Quality (asdf))_Clean average weather data": "100", "POA (Watts/meter2)_Clean average weather data": "0", "POA Quality (Quality (asdf))_Clean average weather data": "100", "Rain (Rain (Inches))_Rain, Humidity, Barometric Pressure": "0", "Variance (Percent)_Irradiance": "NaN", "datetime": "2021-06-01 00:00:00", "time_retrieved": "20221007T10:56:1665165415"}
asked 2 years ago333 views
1 Answer
2

I copied the JSON sample that you had provided and tried to run the crawler. When I queried the table, I noticed that I had a similar error to the one you have pointed out. Enter image description here I also had the same problem when I tried running SHOW CREATE TABLE

I was able to resolve the issue when the columns names did not have special characters. '-', '*', '/', '(', and ')' are special characters in Athena when used as column names to define a table. To create tables that can be queries without issues in Athena, only alphanumeric characters, underscore (_), and period (.) are allowed in column names. Spaces would work, but would always need your columns named to be embedded in double quotes. To create a table and mappings for keys that have restricted characters:

  • you can use a custom DDL statement instead of using a crawler to build the table for you.
  • OR you can replace the special characters with underscores before you run the crawler References:
  1. https://aws.amazon.com/premiumsupport/knowledge-center/error-json-athena/
  2. https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
  3. https://aws.amazon.com/blogs/big-data/create-tables-in-amazon-athena-from-nested-json-and-mappings-using-jsonserde/ - see the section Walkthrough: Handling forbidden characters with mappings for how to map special characters into other characters.
profile pictureAWS
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions