Athena JSON SerDe Error

0

I have successfully created a database using a Glue crawler, but ran into this error (or an extremely similar one) when trying to access it with even a basic Athena query.

GENERIC_INTERNAL_ERROR: IndexOutOfBoundsException thrown initializing deserializer org.openx.data.jsonserde.JsonSerDe. Cause: Index: 55, Size: 55

I have successfully run Athena queries on another database, which I believe is formatted the same way. I tried taking just a single file out of the S3 bucket and using a new crawler to make a new table, but it created the same error. My best guess is that there's a special character or something being used in the JSON that this particular deserializer can't handle. A JSON linter read the file as a valid JSON. Are there reserved characters for the AWS SerDe? Where can I look to find more information on this particular error?

The data I am trying to assess belongs to a vendor, so I don't think I can post it here, but here is an example line from the file:

{"AE Weather Station (Standard w/ GHI) (Estimated GHI) (Watt hours/meter2)_Irradiance": "216.806", "AE Weather Station (Standard w/ GHI) (GHI sensor) (Watts/meter2)_GHI comparison": "0", "AE Weather Station (Standard w/ GHI) (GHI) (Watt hours/meter2)_Irradiance": "0", "AE Weather Station (Standard w/ GHI) - Ambient Temperature (Degrees Fahrenheit)_Temperature": "75.028", "AE Weather Station (Standard w/ GHI) - Module temperature (est) (Degrees Fahrenheit)_Temperature": "75.028", "AE Weather Station (Standard w/ GHI), Sun (GHI) (Watts/meter2)_Irradiance Sensor Degrade": "0", "AE Weather Station (Standard w/ GHI), Sun (GHI) (Watts/meter2)_Irradiance sensor orientation": "0", "AE Weather Station (Standard w/ GHI), Sun (GHI) * (Watts/meter2)_POA comparison": "0", "AE Weather Station (Standard w/ GHI): Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "AE Weather Station (Standard w/ GHI): Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "Ambient temperature (Degrees Celsius)_Clean average weather data": "23.904", "Ambient temperature Quality (Quality (asdf))_Clean average weather data": "100", "Bin Size": "Bin1Hour", "Blue-sky ratio: AE Weather Station (Standard w/ GHI), Sun (GHI) (Percent)_Irradiance Sensor Degrade": "104.792", "Blue-sky ratio: IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "102.432", "Blue-sky ratio: IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "111.587", "Blue-sky ratio: IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "NaN", "GHI (Watts/meter2)_Clean average weather data": "0", "GHI Quality (Quality (asdf))_Clean average weather data": "100", "IMT 1 (1810) Ref Cell w/ Mod Temp (Estimated POA) (Watt hours/meter2)_Irradiance": "0", "IMT 1 (1810) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "71.581", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 1 (1810) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 2 (1550) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "71.331", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 2 (1550) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 3 (1220) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "70.246", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 3 (1220) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "Max wind speed - AE Weather Station (Standard w/ GHI) (Miles/hour)_Wind": "0", "Module temperature (Degrees Celsius)_Clean average weather data": "21.652", "Module temperature Quality (Quality (asdf))_Clean average weather data": "100", "POA (Watts/meter2)_Clean average weather data": "0", "POA Quality (Quality (asdf))_Clean average weather data": "100", "Rain (Rain (Inches))_Rain, Humidity, Barometric Pressure": "0", "Variance (Percent)_Irradiance": "NaN", "datetime": "2021-06-01 00:00:00", "time_retrieved": "20221007T10:56:1665165415"}
gefragt vor 2 Jahren296 Aufrufe
1 Antwort
2

I copied the JSON sample that you had provided and tried to run the crawler. When I queried the table, I noticed that I had a similar error to the one you have pointed out. Enter image description here I also had the same problem when I tried running SHOW CREATE TABLE

I was able to resolve the issue when the columns names did not have special characters. '-', '*', '/', '(', and ')' are special characters in Athena when used as column names to define a table. To create tables that can be queries without issues in Athena, only alphanumeric characters, underscore (_), and period (.) are allowed in column names. Spaces would work, but would always need your columns named to be embedded in double quotes. To create a table and mappings for keys that have restricted characters:

  • you can use a custom DDL statement instead of using a crawler to build the table for you.
  • OR you can replace the special characters with underscores before you run the crawler References:
  1. https://aws.amazon.com/premiumsupport/knowledge-center/error-json-athena/
  2. https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
  3. https://aws.amazon.com/blogs/big-data/create-tables-in-amazon-athena-from-nested-json-and-mappings-using-jsonserde/ - see the section Walkthrough: Handling forbidden characters with mappings for how to map special characters into other characters.
profile pictureAWS
beantwortet vor 2 Jahren
AWS
EXPERTE
überprüft vor 2 Jahren

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen