EMR - Hive 3.1.3 with external metastore on RDS Aurora MySQL 8


Hi team, we are trying to set up Hive with an external metastore running on Aurora MySQL 8. We are using EMR 6.15.0 and followed the instructions from the AWS documentation.
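For context, this is roughly the hive-site classification we apply at cluster launch, shown here as the Python dict we would pass through boto3's Configurations parameter. The RDS endpoint, user name, and password are placeholders, and the driver class is the EMR-shipped MariaDB connector that the AWS documentation references.

```python
# Sketch of the hive-site classification used for the external metastore;
# the RDS endpoint, user name, and password below are placeholders.
hive_site = {
    "Classification": "hive-site",
    "Properties": {
        # Aurora MySQL 8 endpoint hosting the Hive metastore schema
        "javax.jdo.option.ConnectionURL": (
            "jdbc:mysql://hive-metastore.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/"
            "hive?createDatabaseIfNotExist=true"
        ),
        # JDBC driver shipped with the EMR release (MariaDB connector)
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "hive",
        "javax.jdo.option.ConnectionPassword": "********",
    },
}

# Passed at cluster launch, e.g. via boto3:
#   boto3.client("emr").run_job_flow(..., ReleaseLabel="emr-6.15.0",
#                                    Configurations=[hive_site], ...)
```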

We were able to initialize the schema in RDS via the Hive schema tool, but as soon as we start creating tables with Hive/PySpark on EMR, we see the error below.

pyspark.errors.exceptions.captured.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:javax.jdo.JDOUserException: Table "partition_keys" has been specified with a primary-key to include column "TBL_ID" but this column is not found in the table. Please check your <primary-key> column specification. NestedThrowables: org.datanucleus.exceptions.NucleusUserException: Table "partition_keys" has been specified with a primary-key to include column "TBL_ID" but this column is not found in the table. Please check your <primary-key> column specification.)
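The failure is reproducible with even a minimal table creation. The snippet below is a sketch of the kind of PySpark call that triggers it; the database and table names are illustrative.

```python
# Minimal PySpark reproduction; database and table names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("metastore-smoke-test")
    .enableHiveSupport()  # route DDL through the Hive metastore on Aurora MySQL
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS smoke_test")
# The MetaException above is thrown as soon as the first table is created.
spark.sql("CREATE TABLE IF NOT EXISTS smoke_test.t1 (id INT, name STRING)")
```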

This is a big blocker for our migration. The same setup works for us with PostgreSQL as the external metastore, but we have to use MySQL because we need that RDS instance for other services such as Hue and Oozie as well.

Please help us with this.

Thanks Prabakaran

Asked 3 months ago · 208 views
1 answer

Found the problem: the MariaDB connector that ships with the EMR release was the root cause of the issue above. We switched to the https://github.com/awslabs/aws-mysql-jdbc driver and the issue is resolved. Hope this helps someone.
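For anyone applying the same fix, the change amounts to pointing hive-site at the AWS driver class and URL prefix documented in the aws-mysql-jdbc README, and making the driver jar available on the Hive/metastore classpath. The sketch below shows roughly how that looks; the jar location, endpoint, and credentials are placeholders.

```python
# Adjusted hive-site classification after swapping in aws-mysql-jdbc.
# Driver class and the "jdbc:mysql:aws://" prefix follow the driver's README;
# endpoint and credentials are placeholders.
hive_site_aws_driver = {
    "Classification": "hive-site",
    "Properties": {
        "javax.jdo.option.ConnectionURL": (
            "jdbc:mysql:aws://hive-metastore.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/hive"
        ),
        "javax.jdo.option.ConnectionDriverName": "software.aws.rds.jdbc.mysql.Driver",
        "javax.jdo.option.ConnectionUserName": "hive",
        "javax.jdo.option.ConnectionPassword": "********",
    },
}

# The aws-mysql-jdbc jar also needs to be on the metastore classpath, e.g.
# copied into /usr/lib/hive/lib/ on the primary node via a bootstrap action
# or step, so the driver class above can be resolved.
```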

Thanks Prabakaran

Answered 3 months ago
