Queries fail when schema has array type


I get the following error message:

Query dd3ba400-cda1-4b56-aa04-63ed487f8606 failed with error code HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/part-r-00040-4740245e-8b22-460d-bfdf-012dd1a52147.snappy.parquet (offset=0, length=57192693): Column emails.element type null not supported

This only happens when I define an "emails" column, which is an array of strings. If I leave the column out, I can query without issues. The schema information was obtained from our Parquet file using Apache Spark. The CREATE TABLE statement I'm using to reproduce the error follows:

  `emails` array<string>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';

From the error, my guess is that Athena is expecting an emails array in every record but not finding one. Can anyone help?

asked 7 years ago, 564 views
1 Answer

I hit my message quota when I tried to close this question, so I couldn't post the reason for closing it. The issue was on my side: I hadn't defined the array type correctly. The column was an array of structs, not an array of strings.
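For anyone who lands here with the same error, one way to avoid hand-typing the wrong type is to generate the column definitions from the schema Spark reports. The following is a minimal sketch, not an official tool: it assumes you have the JSON string produced by `df.schema.json()` in Spark, and the struct field names `address` and `verified` are hypothetical, since the real fields of the `emails` struct were never posted. The helper names `hive_type` and `hive_columns` are also made up for this example.

```python
import json

# Spark primitive type names whose Hive/Athena spelling differs.
_SPARK_TO_HIVE = {
    "integer": "int",
    "long": "bigint",
    "short": "smallint",
    "byte": "tinyint",
}


def hive_type(spark_type):
    """Translate one node of a parsed Spark schema into a Hive type string."""
    # Primitive types appear as plain strings in the schema JSON.
    if isinstance(spark_type, str):
        return _SPARK_TO_HIVE.get(spark_type, spark_type)
    kind = spark_type["type"]
    if kind == "array":
        return "array<" + hive_type(spark_type["elementType"]) + ">"
    if kind == "map":
        key = hive_type(spark_type["keyType"])
        value = hive_type(spark_type["valueType"])
        return f"map<{key},{value}>"
    if kind == "struct":
        fields = ",".join(
            f"{f['name']}:{hive_type(f['type'])}" for f in spark_type["fields"]
        )
        return f"struct<{fields}>"
    raise ValueError(f"unsupported Spark type: {kind!r}")


def hive_columns(schema_json):
    """Return one backquoted column definition per top-level field."""
    schema = json.loads(schema_json)
    return [f"`{f['name']}` {hive_type(f['type'])}" for f in schema["fields"]]


# Hypothetical schema: the inner fields of the emails struct are assumptions.
EXAMPLE_SCHEMA_JSON = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "emails", "nullable": True, "metadata": {}, "type": {
            "type": "array", "containsNull": True, "elementType": {
                "type": "struct", "fields": [
                    {"name": "address", "type": "string",
                     "nullable": True, "metadata": {}},
                    {"name": "verified", "type": "boolean",
                     "nullable": True, "metadata": {}},
                ]}}},
        {"name": "first_name", "type": "string", "nullable": True, "metadata": {}},
        {"name": "last_name", "type": "string", "nullable": True, "metadata": {}},
    ],
})

if __name__ == "__main__":
    print(",\n".join(hive_columns(EXAMPLE_SCHEMA_JSON)))
```

With a schema like the hypothetical one above, the `emails` column comes out as an `array<struct<...>>` rather than `array<string>`, which is the mismatch that caused my error. Check the generated definitions against Athena's supported types before using them, since this sketch only renames a handful of primitives.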

answered 7 years ago
