By using AWS re:Post, you agree to the AWS re:Post Terms of Use

Why does the SELECT COUNT query in Amazon Athena return only one record even though the input JSON file has multiple records?

1 minute read
0

When I run a SELECT COUNT(*) FROM TABLE query, the output is "1", but the input file has multiple JSON records. I used org.openx.data.jsonserde.JsonSerDe to create the table.

Resolution

When records aren't separated by a newline character (\n), the SELECT COUNT(*) FROM TABLE query returns "1."

For example, the three records in the following JSON file are not separated by a newline character:

{"ID":"1","MYTIMESTAMP":"2020-06-09 17:50:46.267","Name":"John"}{"ID":"2","MYTIMESTAMP":"2020-06-10 17:50:46.267","Name":"Joel"}{"ID":"3","MYTIMESTAMP":"2020-06-10 17:50:46.267","Name":"Patrick"}

Add a newline character after each record in the previous JSON file to get the following JSON file:

{"ID":"1","MYTIMESTAMP":"2020-06-09 17:50:46.267","Name":"John"}  
{"ID":"2","MYTIMESTAMP":"2020-06-10 17:50:46.267","Name":"Joel"}  
{"ID":"3","MYTIMESTAMP":"2020-06-10 17:50:46.267","Name":"Patrick"}

If you run a SELECT COUNT(*) FROM TABLE query on the new JSON file, then the query returns "3."

Note: The SerDe expects each JSON document to be on a single line of text with no line termination characters separating the fields in the record. For more information, see Best practices for reading JSON data.

Related information

SELECT

CREATE TABLE

Run SQL queries in Amazon Athena

AWS OFFICIAL
AWS OFFICIALUpdated 4 months ago