S3 permissions for launching an EMR cluster


I am trying to launch an EMR cluster with the default settings from the new EMR console page.

I don't have any existing IAM roles for the EMR service or the EMR instance profile, so I am letting AWS create new ones for me.

I have also configured a bootstrap action, a shell script stored in S3 that installs some Python packages. However, the cluster launch fails with the error code BOOTSTRAP_FAILURE_BA_DOWNLOAD_FAILED_PRIMARY.

In the S3 bucket access settings, within the instance profile section, I chose the configuration shown below:

[Screenshot: S3 bucket access settings in the instance profile section]

The permissions for the instance role that AWS created automatically are below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketVersioning",
                "s3:GetObject",
                "s3:GetObjectTagging",
                "s3:GetObjectVersion",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::*.elasticmapreduce/*"
            ]
        }
    ]
}

The S3 bucket access permissions I set in the console are not reflected in the policy that was created.

After I updated the policy to allow full S3 access, the cluster launched successfully. However, I want to control access to my buckets.

  1. What permissions does EMR need to launch a cluster with a bootstrap action?

  2. If I need to configure access to other buckets or prefixes, do I need to modify the policy manually, or will it be created automatically during the cluster setup process?

  3. Why is the elasticmapreduce S3 bucket ARN used by default even though I did not create that bucket or specify it in the configuration?

Any help is much appreciated. Thanks.

asked 10 months ago · 529 views
1 Answer

Hello there,

Thank you for raising this question in re:Post.

I understand that you are looking for more granular information on customizing the IAM roles when launching an EMR cluster.

To answer your questions:

1 - For basic operations on the required S3 bucket, the instance profile should have the following permissions:

s3:GetObject: required to download any files from an S3 bucket that the bootstrap action needs.

s3:PutObject: required to upload any files that the bootstrap action generates to an S3 bucket.

If your application references data using s3://<bucket>, Amazon EMR uses the EC2 instance profile to make the request, and the corresponding permissions have to be provided as described in the documentation [1].
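As a rough illustration of the permissions above, here is a minimal Python sketch that builds a scoped policy document for the bucket holding the bootstrap script. The function name, the bucket name, and the prefix are hypothetical placeholders. Including s3:ListBucket on the bucket itself is an assumption borrowed from the auto-generated policy in the question, not something stated in this answer, and s3:PutObject is added only on request.

```python
import json


def bootstrap_bucket_policy(bucket, prefix="", allow_write=False):
    """Build an IAM policy dict scoped to one bucket (and optional prefix).

    `bucket` and `prefix` are placeholders for wherever the bootstrap
    script lives; nothing here is specific to a real account.
    """
    prefix = prefix.strip("/")
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"

    # s3:GetObject lets the instances download the bootstrap script.
    object_actions = ["s3:GetObject"]
    if allow_write:
        # Only needed if the bootstrap action uploads files back to S3.
        object_actions.append("s3:PutObject")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [object_arn],
            },
            {
                # Assumption: listing the bucket is also wanted, as in the
                # auto-generated policy shown in the question.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-emr-bootstrap-bucket" and "bootstrap" are made-up example values.
    policy = bootstrap_bucket_policy("my-emr-bootstrap-bucket", "bootstrap")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the instance profile role's policy in place of full S3 access, after replacing the placeholder bucket and prefix.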

2 - If you would like to configure access to other buckets or prefixes, you have to update your custom policy manually. Because the policy is not a default policy, the permissions will not be created automatically during the cluster setup process.
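One possible way to make that manual change from a script, sketched under assumptions: the helper below attaches a policy to a role as an inline policy through the IAM `put_role_policy` call. The helper name, role name, policy name, and bucket are all hypothetical, and the same change can equally be made in the IAM console.

```python
import json


def attach_inline_policy(iam_client, role_name, policy_name, policy):
    """Attach `policy` (a dict) to `role_name` as an inline policy.

    `iam_client` is expected to behave like boto3.client("iam").
    """
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


# Example policy for one additional bucket; the bucket name is made up.
EXTRA_BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::my-other-bucket/data/*"],
        }
    ],
}

# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   attach_inline_policy(
#       boto3.client("iam"),
#       "MyEmrInstanceProfileRole",
#       "extra-bucket-read",
#       EXTRA_BUCKET_POLICY,
#   )
```

Passing the client in as a parameter keeps the helper free of credentials handling and makes it easy to try against a stub before touching a real role.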

3 - The elasticmapreduce S3 bucket is a public repository that contains patches and fixes (for example, the recent log4j patches) as well as "bootstrap-actions/run-if", which is used to run scripts on the master node. To see what else the public repository contains, you can run the command below.

aws s3 ls s3://elasticmapreduce/bootstrap-actions --recursive

I hope the above information helps.

References: [1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-role-for-ec2.html

AWS
answered 10 months ago
