Redshift External Table - unable to query struct data types using Avro as a format

0

Hi, I have created an external table in redshift as below:

CREATE EXTERNAL TABLE "someschema"."avro_simple_nested" (
  "actor_id" varchar(200),
  "first_name" varchar(200),
  "last_name" varchar(200),
  "last_update" bigint,
  "version" int,
  "debut_film" struct<
  	"name":varchar(200),
  	"score":int
  >
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
	'avro.schema.literal'='{"namespace": "example.avro",
 "type": "record",
 "name": "actor",
 "fields": [
        {"name": "actor_id", "type": "string"},
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"default": 0, "name": "last_update", "type": "long"},
        {"name": "version", "type": "int"},
        {"name": "debut_film", "type": {"type":"record", "name":"debut_film_name",
            "fields": [
                {"name": "name", "type": "string"},
                {"name": "score", "type": "int"}
            ]}}
    ]
}'
)
STORED AS AVRO
LOCATION 's3://....../..../avro-files-simple-nested/'

If I try and query the table by using SELECT * from Table I get: ERROR: Cannot expand query to include unsupported column type for column "debut_film". If I try and query the non struct columns, I am able to see the data. If I check the same data in Athena, it shows fine. The table is shown correctly and the data is as expected. If I query the SVV_EXTERNAL_COLUMNS table, you can see the table definition is correct. debut_film struct<name:varchar(200),score:int> Has anyone else used an external table over avro data and managed to view nested data in redshift? Thanks!

gefragt vor 2 Jahren1289 Aufrufe
2 Antworten
1
Akzeptierte Antwort

Currently, Redshift Spectrum does not support AVRO nested types. The alternatives would be

-> to un-nest the columns and then query them as primitive types

-> use another format (for example Apache Parquet, Apache ORC, JSON ) that supports nested types with Redshift Spectrum https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-nested-data-with-redshift-spectrum/

->Try to ingest the data into Redshift using COPY command https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-avro

AWS
SUPPORT-TECHNIKER
beantwortet vor 2 Jahren
AWS
EXPERTE
überprüft vor 2 Jahren
0

Thank you for the answer. Do you know if there are any plans to include avro support in spectrum? Thanks!

beantwortet vor 2 Jahren

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen