Queries fail when schema has array type


I get the following error message:

Query dd3ba400-cda1-4b56-aa04-63ed487f8606 failed with error code HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/part-r-00040-4740245e-8b22-460d-bfdf-012dd1a52147.snappy.parquet (offset=0, length=57192693): Column emails.element type null not supported

This only happens when I define an "emails" column, which is an array of strings. If I leave the column out, I can query without issues. I obtained the schema information from our Parquet file using Apache Spark. The CREATE TABLE statement I'm using to reproduce the error follows:

CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<string>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';

From the error, my guess is that Athena is expecting an emails array in every record but not finding one. Can anyone help?

philote
Asked 7 years ago, viewed 642 times
1 Answer

I hit my message quota when I tried to close this question, so I couldn't post the reason for closing it. The issue was on my side: I hadn't defined the array type correctly. The "emails" column is an array of structs, not an array of strings.
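For anyone who lands here with the same error, a corrected statement might look roughly like the sketch below. The struct fields `address` and `is_primary` are hypothetical placeholders, since the real field names were never posted; they would need to match whatever Spark's `printSchema()` reports for the `emails` column in the Parquet file.

```sql
-- Sketch only: the struct fields `address` and `is_primary` are hypothetical.
-- Replace them with the fields Spark's printSchema() shows for `emails`.
CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<struct<address:string,is_primary:boolean>>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';
```

The only change from the statement in the question is the element type of `emails`; everything else is copied as posted.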

philote
Answered 7 years ago
Reviewed by an expert 23 days ago
