How can I set up a cross-account AWS Glue ETL job for tables using catalog resource policies?
I want to use AWS Glue ETL jobs in account B to access tables in account A's AWS Glue Data Catalog, without using AWS Lake Formation.
Short description
You can set up cross-account access to the AWS Glue Data Catalog with or without AWS Lake Formation. The following sections outline how to access a catalog in another account using only AWS Glue.
If you're using Lake Formation, see Cross-account data sharing in Lake Formation for more information on setting up a cross-account resource share.
Note: These steps describe cross-account access within a single AWS Region. They don't address access to resources located in a different AWS Region.
Resolution
Set up access policies in source and target accounts
Use the following steps to grant resource-level permissions to account B from account A's AWS Glue Data Catalog.
Note: Account A holds the AWS Glue Data Catalog resources, and account B is the extract, transform, and load (ETL) account. You modify the resource-based policy in account A and the AWS Identity and Access Management (IAM) policies in account B.
Attach a catalog-resource policy in account A
1. Sign in to the AWS Management Console.
2. In the search bar, search for AWS Glue. Choose Get Started with AWS Glue.
3. From the left panel, choose Catalog settings.
4. Under Permissions, enter the following resource policy. This resource policy lets account B access the databases and tables in account A. You can also apply the policy with the AWS CLI, as shown in the sketch after these steps.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::1111222233334444:root" }, "Action": "glue:*", "Resource": [ "arn:aws:glue:us-east-1:5555666677778888:catalog", "arn:aws:glue:us-east-1:5555666677778888:database/doc_example_DB", "arn:aws:glue:us-east-1:5555666677778888:table/doc_example_DB/*" ] } ] }
Note: Replace the following values in the policy:
- 1111222233334444 with the account ID for account B
- 5555666677778888 with the account ID for account A
- us-east-1 with the Region of your choice
- doc_example_DB with the name of your database
5. (Optional) To limit access to a specific IAM role in account B, include the Amazon Resource Name (ARN) of that role as the principal in the policy. For example:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::1111222233334444:role/service-role/AWSGlueServiceRole_Glue_Test" }, "Action": "glue:*", "Resource": [ "arn:aws:glue:us-east-1:5555666677778888:catalog", "arn:aws:glue:us-east-1:5555666677778888:database/doc_example_DB", "arn:aws:glue:us-east-1:5555666677778888:table/doc_example_DB/*" ] } ] }
Note: Replace the following values in the policy:
- 1111222233334444 with the account ID for account B
- 5555666677778888 with the account ID for account A
- us-east-1 with the Region of your choice
- doc_example_DB with the name of your database
- AWSGlueServiceRole_Glue_Test with the name of the IAM role that runs the ETL job
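As an alternative to the console, you can attach the same catalog resource policy from account A with the AWS CLI. The following is a minimal sketch that assumes your CLI credentials belong to account A and that the policy is saved locally as glue-catalog-policy.json (a hypothetical file name):

# Attach (or overwrite) the Data Catalog resource policy in account A.
# glue-catalog-policy.json is a hypothetical local file that contains the policy shown above.
aws glue put-resource-policy \
    --policy-in-json file://glue-catalog-policy.json \
    --region us-east-1

# Confirm the policy that is now in effect.
aws glue get-resource-policy --region us-east-1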
Attach an IAM policy in account B
The IAM role in account B that runs the ETL job needs access to the databases and tables in account A.
Note: If you're using Amazon Athena with the Data Catalog, then include the default database in the policy. This inclusion makes sure that the GetDatabase and CreateDatabase actions succeed. For more information, see Default database and catalog per AWS Region.
1. Sign in to the IAM console with account B's credentials.
2. From the left panel, choose Roles.
3. Choose the name of the role that the ETL job uses.
4. Attach the following IAM policy to the AWS Glue ETL job's role in account B. This policy gives the role access to the database and tables in account A:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "glue:GetDatabase", "glue:GetConnection", "glue:GetTable", "glue:GetPartition" ], "Resource": [ "arn:aws:glue:us-east-1:5555666677778888:catalog", "arn:aws:glue:us-east-1:5555666677778888:database/default", "arn:aws:glue:us-east-1:5555666677778888:database/doc_example_DB", "arn:aws:glue:us-east-1:5555666677778888:table/doc_example_DB/*" ] } ] }
Note: Replace the following values in the policy:
- 5555666677778888 with the account ID for account A
- us-east-1 with the Region of your choice
- doc_example_DB with the name of your database
5. Verify that the policy you created is attached to the IAM role in account B. You can also attach and verify the policy with the AWS CLI, as shown in the sketch at the end of this section.
6. Test whether account B has access to the Data Catalog in account A. Create an ETL job with the following scripts (a complete job sketch follows these steps):
Dynamic frame script:
df = glueContext.create_dynamic_frame.from_catalog(
    database="doc_example_DB",
    table_name="doc_example_table",
    catalog_id="5555666677778888",
    region="us-east-1"
)
Data frame script:
"""Create Spark Session with cross-account AWS Glue Data Catalog""" from pyspark.sql import SparkSession spark_session = SparkSession.builder.appName("Spark Glue Example") \ .config("hive.metastore.client.factory.class", \ "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory") \ .config("hive.metastore.glue.catalogid", "5555666677778888") \ .enableHiveSupport() \ .getOrCreate() table_df = spark_session.sql("SELECT * FROM doc_example_DB.doc_example_table limit 10") table_df.show()
Note: Replace the following values in the scripts:
- 5555666677778888 with the account ID for account A
- doc_example_DB with the name of your database
- doc_example_table with the name of your table
- us-east-1 with the Region of your choice
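The dynamic frame script assumes that glueContext is already initialized. The following minimal, self-contained job script is a sketch that reuses the placeholder database, table, and account ID values from this article; adjust them to match your environment:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard AWS Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the cross-account table through account A's Data Catalog.
# doc_example_DB, doc_example_table, and 5555666677778888 are the
# placeholder values used throughout this article.
df = glueContext.create_dynamic_frame.from_catalog(
    database="doc_example_DB",
    table_name="doc_example_table",
    catalog_id="5555666677778888"
)

# Print the schema and a few rows to confirm that the cross-account read works.
df.printSchema()
df.toDF().show(10)

job.commit()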
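If you prefer to attach and verify the IAM policy from steps 4 and 5 with the AWS CLI instead of the console, the following sketch shows one way to do it. It assumes that the policy exists in account B as a customer managed policy named doc-example-cross-account-glue-policy (a hypothetical name) and that the job role is AWSGlueServiceRole_Glue_Test; replace both with your own values.

# Attach the customer managed policy to the ETL job's role in account B.
# doc-example-cross-account-glue-policy is a hypothetical policy name.
aws iam attach-role-policy \
    --role-name AWSGlueServiceRole_Glue_Test \
    --policy-arn arn:aws:iam::1111222233334444:policy/doc-example-cross-account-glue-policy

# Confirm that the policy is attached to the role.
aws iam list-attached-role-policies --role-name AWSGlueServiceRole_Glue_Test

# Optional smoke test: fetch the table definition from account A's Data Catalog.
# Run this with credentials that have the permissions granted in step 4.
aws glue get-table \
    --database-name doc_example_DB \
    --name doc_example_table \
    --catalog-id 5555666677778888 \
    --region us-east-1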
Related information
Specifying AWS Glue resource ARNs
About upgrading to the Lake Formation permissions model
Migration between the Hive metastore and the AWS Glue Data Catalog