Queries fail when schema has array type


I get the following error message:

Query dd3ba400-cda1-4b56-aa04-63ed487f8606 failed with error code HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/part-r-00040-4740245e-8b22-460d-bfdf-012dd1a52147.snappy.parquet (offset=0, length=57192693): Column emails.element type null not supported

This only happens when I define an "emails" column, which is an array of strings. If I leave that column out, I can query without issues. I obtained the schema from our Parquet files using Apache Spark. This is the CREATE TABLE statement I'm using to reproduce the error:

CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<string>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';

From the error, my guess is that Athena is expecting an emails array in every record but not finding one. Can anyone help?

philote
asked 7 years ago · 657 views
1 Answer

I hit my message quota when I went to close this question, so I couldn't post why I closed it. The issue was on my side: I hadn't defined the array type correctly. The column was an array of structs, not an array of strings.
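For reference, the fix is to declare `emails` with its real element type instead of `array<string>`. The sketch below assumes the elements are structs with two hypothetical fields, `address` and `is_primary`. Those names and types are placeholders only; the real ones have to be copied from the Parquet schema (for example, the schema Spark reports for the files). Everything else is unchanged from the statement in the question.

```sql
-- Sketch only: `address` and `is_primary` are HYPOTHETICAL struct fields.
-- Replace them with the actual field names/types from the Parquet schema.
CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<struct<address:string,is_primary:boolean>>,  -- array of structs, not array<string>
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';
```

The key point is that the declared element type must match what is physically stored in the files; declaring a primitive element type for a nested group is what produces the "type null not supported" error.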

philote
answered 7 years ago
EXPERT
reviewed a month ago
