Glue catalog query from SageMaker EMR cluster fails


I am running catalog queries such as spark.catalog.tables, show databases, or show tables. All of these commands fail with: emr_instance_profile role is not authorized to perform: glue:CreateDatabase on resource arn:aws:glue:xxxxxxx:catalog because no identity based policy allows the glue:createDatabase action

This is very strange. I am not creating any database, and I have checked the permissions twice and tested them with the IAM policy simulator, so I am sure the role has the permissions needed to list databases and tables. I cannot work out why this is happening 😕 I would really appreciate any suggestions.

Vaas
asked 6 months ago, viewed 247 times
2 answers
Accepted answer

Hello,

There are a couple of possible reasons why your queries are being blocked.

  1. Permission may be denied for the instance profile role used to query from SageMaker Studio. Check whether an SCP (organization-level policy) is blocking the call.
  2. A database called "default" should exist in the catalog. In CloudTrail, check whether the API event for the call made from SageMaker Studio to Glue is GetDatabases or CreateDatabase.
  3. If the query tries to create the default database and fails because the instance profile role lacks permission, you can get the permission-denied error even when you only run a list-databases command.
  4. Based on what CloudTrail shows, either grant the CreateDatabase permission to the EC2 instance profile role, or have someone who is able to create databases create the "default" database in the catalog.
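The CloudTrail check in step 2 can also be done from a notebook instead of the console. Below is a minimal sketch, assuming boto3 is available and the caller is allowed to call cloudtrail:LookupEvents. The function name recent_glue_database_calls and the one-hour window are illustrative choices, not part of the original answer, and only the first page of results is read.

```python
import json
from datetime import datetime, timedelta, timezone


def recent_glue_database_calls(cloudtrail, hours=1):
    """Return (event name, error code) pairs for recent Glue database calls.

    `cloudtrail` is a boto3 CloudTrail client, e.g. boto3.client("cloudtrail").
    The helper name and the default time window are assumptions for illustration.
    """
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    response = cloudtrail.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "glue.amazonaws.com"}
        ],
        StartTime=start,
    )
    calls = []
    for event in response.get("Events", []):
        name = event.get("EventName")
        if name in ("GetDatabase", "GetDatabases", "CreateDatabase"):
            # CloudTrailEvent is a JSON string; errorCode is present only on failures.
            detail = json.loads(event.get("CloudTrailEvent", "{}"))
            calls.append((name, detail.get("errorCode")))
    return calls


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   print(recent_glue_database_calls(boto3.client("cloudtrail")))
```

A CreateDatabase entry with an access-denied error code right after running show databases would point to the missing "default" database described in step 3.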

To grant the Create database permission to the relevant IAM role:

1. Open the AWS Lake Formation console.
2. In the navigation pane, under Permissions, choose Administrative roles and tasks.
3. Under Database creators, choose Grant.
4. For IAM users and roles, from the dropdown list, choose the IAM role that you want to grant access to.
5. Under Catalog permissions, choose Create database.
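For reference, the same Lake Formation grant can be expressed with boto3. This is a sketch, assuming the caller is a Lake Formation administrator; grant_create_database and the role ARN in the usage comment are hypothetical names. Since the error in the question mentions an identity-based policy, the role will likely also need glue:CreateDatabase in its IAM policy.

```python
def grant_create_database(lakeformation, role_arn):
    """Grant CREATE_DATABASE on the account's Data Catalog to `role_arn`.

    `lakeformation` is a boto3 Lake Formation client,
    e.g. boto3.client("lakeformation"). The helper name is illustrative.
    """
    lakeformation.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        Resource={"Catalog": {}},  # empty dict targets the default catalog
        Permissions=["CREATE_DATABASE"],
    )


# Usage (needs boto3 and Lake Formation admin credentials; the ARN is a placeholder):
#   import boto3
#   grant_create_database(
#       boto3.client("lakeformation"),
#       "arn:aws:iam::123456789012:role/emr_instance_profile",
#   )
```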

If the EC2 instance profile role will not have the CreateDatabase permission, create the database manually:

1. Go to the Glue Data Catalog.
2. Choose Create database.
3. Specify the name as "default", optionally provide a location, and save it.
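The console steps above can also be scripted. Here is a minimal sketch, assuming it is run by a principal that is allowed to call glue:GetDatabase and glue:CreateDatabase (not the restricted instance profile role). The helper name ensure_default_database is illustrative.

```python
def ensure_default_database(glue, name="default"):
    """Create the Glue database `name` if it is missing.

    `glue` is a boto3 Glue client, e.g. boto3.client("glue").
    Returns True if the database was created, False if it already existed.
    """
    try:
        glue.get_database(Name=name)
        return False
    except glue.exceptions.EntityNotFoundException:
        # Only the name is required; a location can be added via "LocationUri".
        glue.create_database(DatabaseInput={"Name": name})
        return True


# Usage (needs boto3 and credentials that may create databases):
#   import boto3
#   ensure_default_database(boto3.client("glue"))
```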

You can leave the default database empty and never use it, but it needs to exist in order to query the other databases. If you delete the default database, the role or user will try to create it again, even when only running a list-databases query. I hope this information is clear.

AWS
Support Engineer
answered 6 months ago

Excellent, thank you for this info. It worked fine after I created the default database!

Vaas
answered 6 months ago
