By using AWS re:Post, you agree to the Terms of Use

Redshift External Table - unable to query struct data types using Avro as a format

0

Hi, I have created an external table in redshift as below:

CREATE EXTERNAL TABLE "someschema"."avro_simple_nested" (
  "actor_id" varchar(200),
  "first_name" varchar(200),
  "last_name" varchar(200),
  "last_update" bigint,
  "version" int,
  "debut_film" struct<
  	"name":varchar(200),
  	"score":int
  >
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
	'avro.schema.literal'='{"namespace": "example.avro",
 "type": "record",
 "name": "actor",
 "fields": [
        {"name": "actor_id", "type": "string"},
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"default": 0, "name": "last_update", "type": "long"},
        {"name": "version", "type": "int"},
        {"name": "debut_film", "type": {"type":"record", "name":"debut_film_name",
            "fields": [
                {"name": "name", "type": "string"},
                {"name": "score", "type": "int"}
            ]}}
    ]
}'
)
STORED AS AVRO
LOCATION 's3://....../..../avro-files-simple-nested/'

If I try and query the table by using SELECT * from Table I get: ERROR: Cannot expand query to include unsupported column type for column "debut_film". If I try and query the non struct columns, I am able to see the data. If I check the same data in Athena, it shows fine. The table is shown correctly and the data is as expected. If I query the SVV_EXTERNAL_COLUMNS table, you can see the table definition is correct. debut_film struct<name:varchar(200),score:int> Has anyone else used an external table over avro data and managed to view nested data in redshift? Thanks!

asked 7 months ago198 views
2 Answers
1
Accepted Answer

Currently, Redshift Spectrum does not support AVRO nested types. The alternatives would be

-> to un-nest the columns and then query them as primitive types

-> use another format (for example Apache Parquet, Apache ORC, JSON ) that supports nested types with Redshift Spectrum https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-nested-data-with-redshift-spectrum/

->Try to ingest the data into Redshift using COPY command https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-avro

SUPPORT ENGINEER
answered 7 months ago
EXPERT
reviewed 7 months ago
0

Thank you for the answer. Do you know if there are any plans to include avro support in spectrum? Thanks!

answered 7 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions