Redshift Spectrum and tables with nested columns in Hive Metastore

0

Hello,

I managed to connect Redshift to my Hive Metastore using an external schema. But it seems I can only query the tables that does not contain nested columns otherwise Redshift Spectrum doesn't recognize the columns.

Is there a way to fix this please? Or can I somehow create an external table in Redshift Spectrum and point it to the partitions from the Hive Metastore so that it will be in sync?

The table in Hive is like that:

CREATE TABLE `test`.`product_created` (`data` STRUCT<`id`: STRING, `type`: STRING>, `meta` STRUCT<`created_at`: BIGINT>, `year` INT, `month` INT, `day` INT, `hour` INT)
USING json
OPTIONS (
  `serialization.format` '1',
  path 's3a://my_bucket/product_created'
)
PARTITIONED BY (year, month, day, hour)

When I check in SVV_EXTERNAL_COLUMNS I see those columns:
col: array<string>
year: int
month: int
day: int
hour: int
And it gives an error if I try to query 'col' column.

I have the same issue with data in Parquet.
But it works for tables without nested columns like that:

CREATE TABLE `test`.`users` (`id` INT, `object_id` STRING, `username` STRING, `user_type` INT)
USING PARQUET
OPTIONS (
  `serialization.format` '1',
  path 's3a://my_bucket/users'
)

Thanks

Edited by: mpierre on Mar 13, 2020 6:44 AM

mpierre
질문됨 4년 전733회 조회
1개 답변
0

I managed to fix the issues.

The first one with JSON is because I was creating the table with Spark and it's using a different SerDe by default.
I tried using this one instead and it works:

CREATE EXTERNAL TABLE `test`.`product_created` (`data` STRUCT<`id`: STRING, `type`: STRING>, `meta` STRUCT<`created_at`: BIGINT>)
ROW FORMAT SERDE &#39;org.openx.data.jsonserde.JsonSerDe&#39; LOCATION &#39;s3a://my_bucket/product_created&#39;
PARTITIONED BY (`year` INT, `month` INT, `day` INT, `hour` INT)

More info: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-data-catalog-hive.html

For the second error with Parquet files, it's strange, I just recreated the table in Hive with the same query and gave it a different name and it works well now.

mpierre
답변함 4년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠