I get the following error message:
Query dd3ba400-cda1-4b56-aa04-63ed487f8606 failed with error code HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/part-r-00040-4740245e-8b22-460d-bfdf-012dd1a52147.snappy.parquet (offset=0, length=57192693): Column emails.element type null not supported
This only happens when I define an "emails" column, which is an array of strings. If I leave the column out of the table definition, I can query without issues. I obtained the schema information from our Parquet file using Apache Spark. The CREATE TABLE statement I'm using to reproduce the error follows:
CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<string>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
LOCATION 's3://path/to/files/';
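A minimal query of roughly the following shape would exercise the array column against the table above. This is only an illustrative sketch: the exact query that produced the error is not shown here, and the LIMIT value is an arbitrary assumption.

```sql
-- Illustrative only: read the array column from the table defined above.
-- The LIMIT value is arbitrary and just keeps the result small.
SELECT emails, first_name, last_name
FROM arraytest
LIMIT 10;
```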
From the error, my guess is that Athena is expecting an emails array in every record but not finding one. Can anyone help?