Queries fail when schema has array type


I get the following error message:

Query dd3ba400-cda1-4b56-aa04-63ed487f8606 failed with error code HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/part-r-00040-4740245e-8b22-460d-bfdf-012dd1a52147.snappy.parquet (offset=0, length=57192693): Column emails.element type null not supported

This only happens when I define an "emails" column, which is an array of strings. If I leave the column out, I can query without issues. I obtained the schema information from our Parquet file using Apache Spark. The CREATE TABLE statement I'm using to reproduce the error follows:

CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<string>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';

From the error, my guess is that Athena is expecting an emails array in every record but not finding one. Can anyone help?

philote
asked 7 years ago, 657 views
1 Answer

I hit my message quota when I went to close this, so I couldn't post why I closed it. The issue was on my side: I hadn't defined the array type properly (it was an array of structs, not an array of strings).
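For anyone who lands here with the same error, a corrected table definition might look roughly like the sketch below. The struct field names `address` and `type` are hypothetical placeholders, since the real fields of the emails struct are not given in this thread; replace them with whatever the Parquet schema actually contains (in Spark, `df.printSchema()` shows the nested element type).

```sql
-- Sketch only: `address` and `type` are assumed field names, not the real schema.
-- The point is that the element type is declared as a struct, not a string.
CREATE EXTERNAL TABLE `arraytest`(
  `emails` array<struct<address:string,type:string>>,
  `first_name` string,
  `last_name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://path/to/files/';
```

The struct fields declared in the table should match the names and types of the element type stored in the Parquet files.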

philote
answered 7 years ago
EXPERT
reviewed a month ago
