Glue catalog query from SageMaker EMR cluster fails


I am running catalog queries such as spark.catalog.tables, show databases, and show tables. All of these commands fail with the error: emr_instance_profile role is not authorized to perform: glue:CreateDatabase on resource arn:aws:glue:xxxxxxx:catalog because no identity based policy allows the glue:createDatabase action

This is very strange: I am not creating any database, and I have checked the permissions twice and tested them with the IAM policy simulator, so I am sure the role has the permissions needed to list databases and tables. I cannot work out why this is happening 😕 I would really appreciate any suggestions.

Vaas
asked 6 months ago, 235 views
2 Answers
Accepted Answer

Hello,

There could be a couple of reasons why your queries are blocked.

  1. Permission may be denied for the instance profile role used to query from SageMaker Studio. Check whether an SCP (an organization-level policy) is blocking the call.
  2. A database called "default" should exist in the catalog. In CloudTrail, check whether the API call made from SageMaker Studio to Glue is GetDatabases or CreateDatabase.
  3. If the attempt to create the default database fails because the instance profile role lacks the permission, you can get the permission denied error even when you only run a list databases command.
  4. Based on what you find in CloudTrail, either grant the CreateDatabase permission to the EC2 instance profile role, or have someone who is able to create databases create the "default" database in the catalog.
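
The CloudTrail check in the list above could be scripted roughly as follows. This is only a sketch: the helper name summarize_glue_database_calls is made up for illustration, and it assumes you pass in a boto3 CloudTrail client whose credentials are allowed to call lookup_events.

```python
import json


def summarize_glue_database_calls(
    cloudtrail_client,
    event_names=("GetDatabases", "CreateDatabase"),
    max_results=20,
):
    """Return recent Glue database API calls from CloudTrail, one dict per event.

    cloudtrail_client is assumed to be a boto3 CloudTrail client.
    """
    summaries = []
    # lookup_events accepts a single lookup attribute, so query each name in turn.
    for event_name in event_names:
        response = cloudtrail_client.lookup_events(
            LookupAttributes=[
                {"AttributeKey": "EventName", "AttributeValue": event_name}
            ],
            MaxResults=max_results,
        )
        for event in response.get("Events", []):
            # CloudTrailEvent is a JSON string containing the full event record.
            detail = json.loads(event.get("CloudTrailEvent", "{}"))
            summaries.append(
                {
                    "event_name": event.get("EventName"),
                    "event_time": event.get("EventTime"),
                    "username": event.get("Username"),
                    # Present only when the call failed, e.g. access denied.
                    "error_code": detail.get("errorCode"),
                    "error_message": detail.get("errorMessage"),
                }
            )
    return summaries
```

With boto3 installed, something like summarize_glue_database_calls(boto3.client("cloudtrail")) should list the recent calls. A CreateDatabase entry with an error code next to your list query would point to the missing default database.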

To grant the Create database permission to the relevant IAM role:

1. Open the AWS Lake Formation console.
2. In the navigation pane, under Permissions, choose Administrative roles and tasks.
3. Under Database creators, choose Grant.
4. For IAM users and roles, from the dropdown list, choose the IAM role that you want to grant access to.
5. Under Catalog permissions, choose Create database.
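
If you prefer to script the console steps above, a minimal sketch might look like this. It assumes a boto3 Lake Formation client created with data lake administrator credentials. The helper name grant_create_database is made up for illustration, and role_arn is a placeholder for your instance profile role's ARN. Since the error in the question mentions identity-based policies, the role's IAM policy may also need to allow glue:CreateDatabase.

```python
def grant_create_database(lakeformation_client, role_arn):
    """Grant the catalog-level CREATE_DATABASE permission to an IAM role.

    lakeformation_client is assumed to be a boto3 Lake Formation client;
    role_arn is the ARN of the role that should be able to create databases.
    """
    return lakeformation_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        # An empty Catalog resource targets the account's Data Catalog.
        Resource={"Catalog": {}},
        Permissions=["CREATE_DATABASE"],
    )
```

For example, grant_create_database(boto3.client("lakeformation"), role_arn), with role_arn set to the ARN of the EMR instance profile role.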

If the EC2 instance profile role does not have the CreateDatabase permission, create the default database manually:

1. Go to the Glue Data Catalog.
2. Choose Create database.
3. Specify the name as "default", optionally provide a location, and save it.
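
The same thing can be done through the Glue API by someone who does have permission to create databases. The sketch below makes some assumptions: the helper name ensure_default_database is made up for illustration, and the glue_client argument is expected to be a boto3 Glue client. It creates the database only if it is missing.

```python
def ensure_default_database(glue_client, name="default", location_uri=None):
    """Create the named Glue database if it is missing.

    Returns True if the database was created, False if it already existed.
    glue_client is assumed to be a boto3 Glue client.
    """
    try:
        glue_client.get_database(Name=name)
        return False  # Already present, nothing to do.
    except glue_client.exceptions.EntityNotFoundException:
        pass  # Not found, so fall through and create it.

    database_input = {"Name": name}
    if location_uri:
        # Optional storage location, e.g. an S3 path of your choosing.
        database_input["LocationUri"] = location_uri
    glue_client.create_database(DatabaseInput=database_input)
    return True
```

For example, ensure_default_database(boto3.client("glue")) run with a role that is allowed to call glue:GetDatabase and glue:CreateDatabase.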

You can leave the default database as it is without using it, but it is effectively required for querying the other databases. If you delete the default database, the role or user will try to create it again even when running, for instance, a list databases query. I hope this is clear.

AWS
SUPPORT ENGINEER
answered 6 months ago

Excellent. Thank you for this info. It worked fine after I created the default database!

Vaas
answered 6 months ago
