Queries fail when schema has array type


I get the following error message:

Query dd3ba400-cda1-4b56-aa04-63ed487f8606 failed with error code HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/part-r-00040-4740245e-8b22-460d-bfdf-012dd1a52147.snappy.parquet (offset=0, length=57192693): Column emails.element type null not supported

This only happens when I define an "emails" column, which is an array of strings. If I leave the column out, I can query without issues. I obtained the schema information from our Parquet file using Apache Spark. The CREATE TABLE statement I'm using to reproduce the error follows:

CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<string>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';

From the error, my guess is that Athena is expecting an emails array in every record but not finding one. Can anyone help?

philote
asked 7 years ago, 652 views
1 Answer

I hit my message quota when I went to close this, so I couldn't post the reason for closing. The issue was on my side: I had not defined the array type correctly. The column is an array of structs, not an array of strings.
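For anyone hitting the same error, here is a rough sketch of what the corrected table definition might look like. The struct field names (`address`, `label`) and their types are hypothetical placeholders, since the real field names are not given in this thread; they would need to match whatever the Parquet schema actually reports (for example, from Spark's schema output).

```sql
-- Sketch only: `address` and `label` are assumed field names, not the real schema.
-- Replace them with the struct fields reported for the emails column in the Parquet file.
CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<struct<address:string,label:string>>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';
```

The only change from the statement in the question is the element type of `emails`. The "Column emails.element type null not supported" message appears to come from the declared element type (string) not matching the group type stored in the file.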

philote
answered 7 years ago
EXPERT
reviewed a month ago
