When I run Amazon Athena queries on pretty-print JSON files with JSON SerDe, my queries fail with the following error message: "HIVE_CURSOR_ERROR: Row is not a valid JSON Object".
Resolution
Athena JSON SerDe doesn't support pretty-print JSON files, such as the following file:
{
"id": "b179db57-9d5b-40c8-8f07-f75bd1fa1b3c",
"random_number": 19,
"categories": [
"category 1",
"category 2"
],
"dict": {
"name": "category 1",
"comment": "category 2"
}
}
{
"id": "bc4624ce-d752-4e38-8a30-42de158dbc50",
"random_number": 95,
"categories": [
"category 1",
"category 2",
"category 3",
"category 4"
],
"dict": {
"name": "category 2",
"comment": "category 1"
}
}
{
"id": "1d3298e8-e3bb-41c0-950c-6a146555ce3c",
"random_number": 89,
"categories": [
"category 1"
],
"dict": {
"name": "category 5",
"comment": "category 3"
}
}
Because Amazon Ion is a superset of JSON, use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike other JSON SerDe libraries, the Amazon Ion SerDe doesn't expect each row of data to be on a single line. Use this feature to query JSON datasets that are in pretty print format, or break up the fields in a row with newline characters.
1. Create a table that's similar to the following one with Athena Ion Hive SerDe:
CREATE EXTERNAL TABLE table_name (
id string,
random_number int,
categories array<string>,
dict struct<name:string, comment:string>
)
STORED AS ION
LOCATION 's3://sample-bucket/prefix/'
2. Query the table:
SELECT * FROM table_name
The output looks similar to the following output:
# id random_number categories dict
1 b179db57-9d5b-40c8-8f07-f75bd1fa1b3c 19 [category 1, category 2] {name=category 1, comment=category 2}
2 bc4624ce-d752-4e38-8a30-42de158dbc50 95 [category 1, category 2, category 3, category 4] {name=category 2, comment=category 1}
3 1d3298e8-e3bb-41c0-950c-6a146555ce3c 89 [category 1] {name=category 5, comment=category 3}
Related information
Why do I get errors when I try to read JSON data in Amazon Athena?