Redshift Spectrum and tables with nested columns in Hive Metastore

0

Hello,

I managed to connect Redshift to my Hive Metastore using an external schema. But it seems I can only query the tables that does not contain nested columns otherwise Redshift Spectrum doesn't recognize the columns.

Is there a way to fix this please? Or can I somehow create an external table in Redshift Spectrum and point it to the partitions from the Hive Metastore so that it will be in sync?

The table in Hive is like that:

CREATE TABLE `test`.`product_created` (`data` STRUCT<`id`: STRING, `type`: STRING>, `meta` STRUCT<`created_at`: BIGINT>, `year` INT, `month` INT, `day` INT, `hour` INT)
USING json
OPTIONS (
  `serialization.format` '1',
  path 's3a://my_bucket/product_created'
)
PARTITIONED BY (year, month, day, hour)

When I check in SVV_EXTERNAL_COLUMNS I see those columns:
col: array<string>
year: int
month: int
day: int
hour: int
And it gives an error if I try to query 'col' column.

I have the same issue with data in Parquet.
But it works for tables without nested columns like that:

CREATE TABLE `test`.`users` (`id` INT, `object_id` STRING, `username` STRING, `user_type` INT)
USING PARQUET
OPTIONS (
  `serialization.format` '1',
  path 's3a://my_bucket/users'
)

Thanks

Edited by: mpierre on Mar 13, 2020 6:44 AM

mpierre
demandé il y a 4 ans733 vues
1 réponse
0

I managed to fix the issues.

The first one with JSON is because I was creating the table with Spark and it's using a different SerDe by default.
I tried using this one instead and it works:

CREATE EXTERNAL TABLE `test`.`product_created` (`data` STRUCT<`id`: STRING, `type`: STRING>, `meta` STRUCT<`created_at`: BIGINT>)
ROW FORMAT SERDE &#39;org.openx.data.jsonserde.JsonSerDe&#39; LOCATION &#39;s3a://my_bucket/product_created&#39;
PARTITIONED BY (`year` INT, `month` INT, `day` INT, `hour` INT)

More info: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-data-catalog-hive.html

For the second error with Parquet files, it's strange, I just recreated the table in Hive with the same query and gave it a different name and it works well now.

mpierre
répondu il y a 4 ans

Vous n'êtes pas connecté. Se connecter pour publier une réponse.

Une bonne réponse répond clairement à la question, contient des commentaires constructifs et encourage le développement professionnel de la personne qui pose la question.

Instructions pour répondre aux questions