EMR Hive Metastore Configuration Issue with Hudi

Hello Community,

I’m running into an issue with configuring Hive Metastore on an EMR Cluster using a CloudFormation template. Here's a brief overview of my setup:

EMR Cluster: Created with Hive Metastore configured.

CloudFormation Template: The relevant part of the template is as follows:

- Classification: hive-site
        #   ConfigurationProperties:
        #     javax.jdo.option.ConnectionURL: !Sub "jdbc:mysql://<Hostname>:3306/<dbname>?createDatabaseIfNotExist=true"
        #     javax.jdo.option.ConnectionDriverName: "org.mariadb.jdbc.Driver"
        #     javax.jdo.option.ConnectionUserName: <username>
        #     javax.jdo.option.ConnectionPassword: <password>

The template runs successfully and writes the RDS username and password to the hive-site.xml file. However, when I attempt to sync the Hive Metastore using Hudi, it seems to be interacting with a local Hive database instead of the RDS database configured in hive-site.xml.

Questions:

  1. Is there something I might be missing in my CloudFormation template that could cause Hive Metastore to default to a local database?
  2. Are there additional configuration steps required for Hudi to properly connect to the RDS Hive Metastore?
  3. Should I check any specific logs or configurations to troubleshoot why Hudi is not using the RDS database?

Any insights or suggestions would be greatly appreciated!

Thanks in advance!

Sachin
asked a month ago

1 Answer

Potential Issues in CloudFormation Template:

If Hive is defaulting to a local database, the hive-site.xml configuration is either not being applied properly or is being overridden by another configuration. Ensure that the hive-site classification in your CloudFormation template is correctly formatted and that no conflicting configuration is applied elsewhere in the EMR setup.
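One way to confirm what the cluster actually received is to look at the configuration EMR reports for it. The sketch below assumes you have saved the JSON output of `aws emr describe-cluster --cluster-id <cluster-id>` and that it lists the applied classifications under `Cluster.Configurations`, each with a `Classification` and a `Properties` map. The function name is hypothetical, for illustration only.

```python
import json


def hive_site_properties(describe_cluster_json):
    """Return the hive-site properties reported for the cluster.

    Returns None when no hive-site classification is present, and an
    empty dict when the classification exists but carries no properties.
    Assumes the JSON layout of `aws emr describe-cluster` output.
    """
    cluster = json.loads(describe_cluster_json).get("Cluster", {})
    for cfg in cluster.get("Configurations", []):
        if cfg.get("Classification") == "hive-site":
            return cfg.get("Properties") or {}
    return None
```

If this returns None or an empty dict, the connection settings from the template never reached the cluster, and the problem is in the template rather than in Hive or Hudi. Note that the returned map may include the connection password, so avoid printing it in full.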

Additional Configuration Steps for Hudi:

For Hudi to connect to the RDS-backed Hive Metastore, make sure Hudi's Hive sync settings point at the same metastore that hive-site.xml defines. Also confirm that the Hive service, and the engine that runs the Hudi job, are reading that hive-site.xml file rather than a different copy. Hudi generally relies on Hive's configuration, so if Hive is correctly configured, Hudi should follow suit.
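As an illustration of pointing Hudi at the metastore explicitly, here is a minimal sketch that builds Hive sync options for a Hudi write. The `hoodie.datasource.hive_sync.*` keys are Hudi write options; the helper name, the argument values, and the choice of `hms` sync mode are assumptions for this example and may not match your job.

```python
def hudi_hive_sync_options(database, table, metastore_uri):
    """Build Hudi write options that sync the table to a Hive Metastore.

    metastore_uri is the Thrift URI of the Hive Metastore service
    (hypothetical example: "thrift://<primary-node-dns>:9083").
    All values are strings, as Spark writer options expect.
    """
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.hive_sync.enable": "true",
        # "hms" talks to the metastore service directly instead of HiveServer2.
        "hoodie.datasource.hive_sync.mode": "hms",
        "hoodie.datasource.hive_sync.metastore.uris": metastore_uri,
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
    }


# Usage in a PySpark job (df and target_path are placeholders):
# opts = hudi_hive_sync_options("my_db", "my_table", "thrift://<primary-node-dns>:9083")
# df.write.format("hudi").options(**opts).mode("append").save(target_path)
```

The metastore service named in the URI is the one that reads hive-site.xml and connects to RDS, so setting it explicitly removes any doubt about which metastore the sync uses.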

Logs and Configurations to Check:

Check the following logs and configurations:

  1. Hive Logs: Located in /var/log/hive/ on the EMR cluster nodes. These logs will show whether Hive is connecting to the RDS instance or a local Derby database.
  2. EMR Step Logs: Review the step logs in the EMR console to see if there are any indications that the configuration was not applied correctly.
  3. hive-site.xml: Manually inspect the hive-site.xml file on the EMR cluster to ensure it has the correct RDS configuration after the cluster is launched.
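The manual inspection in step 3 can also be scripted. The following is a rough sketch that reads a hive-site.xml document and checks whether `javax.jdo.option.ConnectionURL` mentions the expected RDS host. The function names are hypothetical, and the file location on your cluster (commonly /etc/hive/conf/hive-site.xml) is an assumption to verify.

```python
import xml.etree.ElementTree as ET


def parse_hive_site(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def uses_expected_host(props, expected_host):
    """True if the metastore JDBC URL contains the expected database host."""
    url = props.get("javax.jdo.option.ConnectionURL", "")
    return expected_host in url


# Usage on a cluster node (path and host are placeholders):
# with open("/etc/hive/conf/hive-site.xml") as f:
#     props = parse_hive_site(f.read())
# print(uses_expected_host(props, "<Hostname>"))
```

Running the same check against any other copy of hive-site.xml that the Hudi job might read can show whether two components disagree about the metastore location.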

Deeksha (AWS, Expert)
answered a month ago