Glue Data Catalog query from SageMaker EMR cluster fails


I am running catalog queries such as `spark.catalog.listTables()`, `SHOW DATABASES`, and `SHOW TABLES`. All of these commands fail with this error: `emr_instance_profile role is not authorized to perform: glue:CreateDatabase on resource arn:aws:glue:xxxxxxx:catalog because no identity-based policy allows the glue:CreateDatabase action`

This is very strange: I am not creating any database. I have double-checked the permissions and tested them with the IAM policy simulator, so I am sure the role has the permissions it needs to list databases and tables. I can't figure out why this is happening 😕 Any suggestions would be really appreciated.

Vaas
Asked 6 months ago, 248 views
2 Answers
Accepted Answer

Hello,

There are a couple of things that could be blocking your queries:

  1. The permission may be denied for the instance profile role used to run queries from SageMaker Studio. Check whether a service control policy (SCP), that is, an organization-level policy, is blocking the call.
  2. A database named "default" should exist in the catalog. In CloudTrail, check which Glue API call SageMaker Studio actually makes: is the event `GetDatabases` or `CreateDatabase`?
  3. If the client tries to create the default database and that attempt fails because the instance profile role lacks permission, you can get the permission denied error even when you only run a command that lists databases.
  4. Based on what you find in CloudTrail, either grant the `glue:CreateDatabase` permission to the EC2 instance profile role, or have a user or role that is allowed to create databases create the "default" database in the catalog.
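The CloudTrail check and the IAM option can be scripted. Below is a minimal sketch, assuming boto3 clients that you create and pass in (for example `boto3.client("cloudtrail")`). `find_glue_calls` and `create_database_policy` are hypothetical helper names, not AWS SDK functions. The first lists recent Glue events with a given name (`CreateDatabase` or `GetDatabases`) along with the caller ARN and error code; the second builds an inline policy allowing `glue:CreateDatabase`, scoped to the catalog and the `default` database, which is an assumption about the narrowest scope that works for your setup.

```python
import json
from datetime import datetime, timedelta, timezone


def find_glue_calls(cloudtrail, event_name, hours=24):
    """Return (time, caller ARN, error code) for recent Glue events named event_name.

    `cloudtrail` is assumed to be boto3.client("cloudtrail") in the cluster's region;
    any object with a compatible lookup_events method works.
    """
    end = datetime.now(timezone.utc)
    kwargs = {
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }
    results = []
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        for event in page.get("Events", []):
            if event.get("EventSource") != "glue.amazonaws.com":
                continue  # other services can log events with the same name
            detail = json.loads(event["CloudTrailEvent"])
            results.append((
                event["EventTime"],
                detail.get("userIdentity", {}).get("arn"),
                detail.get("errorCode"),  # None when the call succeeded
            ))
        token = page.get("NextToken")
        if not token:
            return results
        kwargs["NextToken"] = token  # fetch the next page of results


def create_database_policy(region, account_id):
    """Inline IAM policy allowing glue:CreateDatabase for the "default" database only."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "glue:CreateDatabase",
            "Resource": [
                f"arn:aws:glue:{region}:{account_id}:catalog",
                f"arn:aws:glue:{region}:{account_id}:database/default",
            ],
        }],
    }
```

For example, `find_glue_calls(boto3.client("cloudtrail"), "CreateDatabase")` shows whether the instance profile role is the caller and which error it got. If you take the IAM route, the policy can be attached with `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(policy))`, with role and policy names of your choosing. An SCP that denies the action still overrides this policy.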

To grant the Create database permission to the relevant IAM role:

1. Open the AWS Lake Formation console.
2. In the navigation pane, under Permissions, choose Administrative roles and tasks.
3. Under Database creators, choose Grant.
4. For IAM users and roles, choose the IAM role that you want to grant access to from the dropdown list.
5. Under Catalog permissions, choose Create database.
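The console steps above correspond roughly to a single Lake Formation `GrantPermissions` call on the catalog resource. A minimal sketch, assuming `lakeformation` is `boto3.client("lakeformation")` called by a Lake Formation administrator; `grant_create_database` is a hypothetical helper name and the role ARN is a placeholder you supply.

```python
def grant_create_database(lakeformation, role_arn):
    """Grant the Lake Formation CREATE_DATABASE catalog permission to an IAM role.

    `lakeformation` is assumed to be boto3.client("lakeformation"); this is meant to
    mirror the "Database creators > Grant > Create database" console steps.
    """
    request = {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Catalog": {}},  # empty Catalog resource = the account's Data Catalog
        "Permissions": ["CREATE_DATABASE"],
    }
    lakeformation.grant_permissions(**request)
    return request  # returned so the caller can log what was granted
```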

If you would rather not grant the CreateDatabase permission to the EC2 instance profile role, create the database yourself instead:

1. Open the AWS Glue Data Catalog in the console.
2. Choose Create database.
3. Enter "default" as the name, optionally provide a location, and save it.
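The same fix can be scripted so it is safe to rerun. A minimal sketch, assuming `glue` is `boto3.client("glue")` used by a principal that is allowed to create databases (not the restricted instance profile role); `ensure_default_database` is a hypothetical helper name, and the optional location is whatever S3 URI you choose.

```python
def ensure_default_database(glue, location_uri=None):
    """Create the Glue database "default" if it is missing; return True if created.

    `glue` is assumed to be boto3.client("glue").
    """
    kwargs = {}
    while True:
        page = glue.get_databases(**kwargs)
        if any(db["Name"] == "default" for db in page.get("DatabaseList", [])):
            return False  # already exists, nothing to do
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # keep paging through the catalog

    database_input = {"Name": "default"}
    if location_uri:
        database_input["LocationUri"] = location_uri  # optional, as in step 3
    glue.create_database(DatabaseInput=database_input)
    return True
```

With Lake Formation enabled, `GetDatabases` may only return databases the caller can see, so run this as a principal with catalog-wide visibility.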

You don't need to put anything in the default database, but it must exist for you to query the other databases. If you delete the default database, the role or user will try to create it again, even when it only runs a query that lists databases. I hope this helps.

AWS
Support Engineer
Answered 6 months ago

Excellent, thank you for this info. It worked fine after I created the default database!

Vaas
Answered 6 months ago
