EMR on EKS Read From S3 Cross Account


I am trying to access S3 buckets in one account (DEV) from an EMR on EKS cluster running in a different account (INT). I created the IAM roles and configuration by following the AWS guide for setting up cross-account access for EMR on EKS. When I start my Spark job, the logs show Access Denied errors on the S3 bucket read operations.

While debugging further, I also get an Access Denied error when I run aws s3 ls manually from a shell inside the Spark job pod.
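When aws s3 ls is run by hand in the pod, the AWS CLI uses the pod's default credential chain (with IAM roles for service accounts, the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables), not any Spark/Hadoop credentials-provider setting, so it can help to confirm which role the pod actually runs as. A minimal sketch, assuming the AWS CLI is installed in the pod image; `show_pod_identity` is a hypothetical helper name:

```shell
# Print the IRSA variables injected into the pod and the identity the
# AWS CLI resolves from them. show_pod_identity is a hypothetical name.
show_pod_identity() {
  echo "AWS_ROLE_ARN=${AWS_ROLE_ARN:-<unset>}"
  echo "AWS_WEB_IDENTITY_TOKEN_FILE=${AWS_WEB_IDENTITY_TOKEN_FILE:-<unset>}"
  # Prints the ARN of the session the CLI is currently using.
  aws sts get-caller-identity --query Arn --output text
}
```

If the last line shows an `assumed-role/emr_on_eks/...` session, the CLI is running as the INT job execution role rather than as TestEMRCA.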

In the DEV account, the IAM role is TestEMRCA, with the following permission policy and trust relationship:

// permission policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3-object-lambda:Get*",
                "s3-object-lambda:List*"
            ],
            "Resource": "*"
        }
    ]
}

// trust policy (the principal is the INT job execution role)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AR",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::INT_ID:role/emr_on_eks"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
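One way to confirm that the DEV trust policy really names the INT job execution role is to read it back with the AWS CLI from the DEV account. A minimal sketch, assuming DEV credentials that allow `iam:GetRole`; `role_trusts_principal` is a hypothetical helper name:

```shell
# Succeeds if the trust policy of role $1 (in the current account) lists
# $2 as an AWS principal. role_trusts_principal is a hypothetical name.
role_trusts_principal() {
  local role_name="$1" principal_arn="$2"
  # Text output puts multiple principals on one tab-separated line,
  # so split them onto separate lines before an exact-match grep.
  aws iam get-role --role-name "$role_name" \
    --query 'Role.AssumeRolePolicyDocument.Statement[].Principal.AWS' \
    --output text | tr '\t' '\n' | grep -Fxq "$principal_arn"
}
```

For example: `role_trusts_principal TestEMRCA arn:aws:iam::INT_ID:role/emr_on_eks && echo trusted`.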

In the INT account, the IAM role is emr_on_eks, with the following permission policy and trust relationship. It is also the job execution role that EMR uses to run the jobs:

// permission policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "s3:ListBucket",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::DEV_ID:role/TestEMRCA" 
        }
    ]
}

// trust policy 
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "elasticmapreduce.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::DEV_ID:oidc-provider/<OIDC_URL>/id/<OIDC_ID>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "<OIDC_URL>/id/<OIDC_ID>:sub": "system:serviceaccount:<NAMESPACE>:<SA>"
                }
            }
        }
    ]
}
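To check the INT side of the chain, the IAM policy simulator can evaluate whether the job execution role's permission policy allows `sts:AssumeRole` on the DEV role. It evaluates only the INT role's identity policies, not the DEV trust policy, so it covers half of the cross-account handshake. A minimal sketch, assuming INT-account credentials that allow `iam:SimulatePrincipalPolicy`; `simulate_assume_role` is a hypothetical helper name:

```shell
# Print the simulator's decision ("allowed", "implicitDeny" or
# "explicitDeny") for sts:AssumeRole from role $1 onto role $2.
# simulate_assume_role is a hypothetical name.
simulate_assume_role() {
  local source_role_arn="$1" target_role_arn="$2"
  aws iam simulate-principal-policy \
    --policy-source-arn "$source_role_arn" \
    --action-names sts:AssumeRole \
    --resource-arns "$target_role_arn" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text
}
```

For example: `simulate_assume_role arn:aws:iam::INT_ID:role/emr_on_eks arn:aws:iam::DEV_ID:role/TestEMRCA`.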

To check whether the problem is in the DEV account's IAM role, I created a new role and associated it with a service account in the EKS cluster in the INT account. When I open a shell in a pod that uses that service account, I can access the buckets (using aws s3 ls).

I don't know what I'm missing for EMR on EKS. I found only one AWS tutorial on this and followed it, so I hope someone can help.

UPDATE: I tried manually assuming the DEV role (TestEMRCA) and then setting the AWS environment variables. With the following variables set, I can access the S3 buckets:

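# Assume the DEV role using the pod's own (INT) credentials
aws sts assume-role --role-arn arn:aws:iam::DEV_ID:role/TestEMRCA --role-session-name s3-access-example
# Export the temporary credentials returned in the "Credentials" block above
export AWS_ACCESS_KEY_ID=<Credentials.AccessKeyId>
export AWS_SECRET_ACCESS_KEY=<Credentials.SecretAccessKey>
export AWS_SESSION_TOKEN=<Credentials.SessionToken>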
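The manual assume-role and export steps can be wrapped in a small shell function so the three values do not have to be copied by hand. A minimal sketch, assuming the AWS CLI is available in the pod and the pod's default credentials may assume TestEMRCA; `assume_dev_role` is a hypothetical helper name. It only changes the credentials of the current shell (for example for aws s3 ls), not those of the Spark driver or executors.

```shell
# Assume a role and export its temporary credentials into the current shell.
# assume_dev_role is a hypothetical name, not an AWS tool.
assume_dev_role() {
  local role_arn="$1"
  local session_name="${2:-s3-access-example}"
  local creds
  # --query/--output text print AccessKeyId, SecretAccessKey and
  # SessionToken as one tab-separated line.
  creds=$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "$session_name" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "$creds"
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}
```

For example: `assume_dev_role arn:aws:iam::DEV_ID:role/TestEMRCA && aws s3 ls`.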

This works, but it is a manual step that I want to avoid.

As I understand it, EMR on EKS is supposed to assume the role automatically when the job is submitted with these configuration parameters:

--conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider
--conf spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA
--conf spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA

So either this is a bug, or I am still missing something.
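To rule out typos between the driver and executor settings, the three parameters can be generated from a single role ARN and passed as `sparkSubmitParameters` in the `--job-driver` of `aws emr-containers start-job-run`. A minimal sketch; the helper names are hypothetical and the `--conf` values mirror the parameters quoted in the question.

```shell
# Hypothetical helpers; the --conf values mirror the ones in the question.

# Print the cross-account sparkSubmitParameters for a role ARN, using the
# same ARN for the driver and the executors.
cross_account_spark_params() {
  local role_arn="$1"
  printf '%s' \
    "--conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider" \
    " --conf spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=${role_arn}" \
    " --conf spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=${role_arn}"
}

# Print a --job-driver JSON document for `aws emr-containers start-job-run`.
# Assumes the entry point and ARN need no JSON escaping.
cross_account_job_driver() {
  local entry_point="$1" role_arn="$2"
  printf '{"sparkSubmitJobDriver": {"entryPoint": "%s", "sparkSubmitParameters": "%s"}}\n' \
    "$entry_point" "$(cross_account_spark_params "$role_arn")"
}
```

For example, pass `--job-driver "$(cross_account_job_driver s3://<BUCKET>/<SCRIPT> arn:aws:iam::DEV_ID:role/TestEMRCA)"` to `aws emr-containers start-job-run`, together with its other options (`--virtual-cluster-id`, `--name`, `--execution-role-arn`, `--release-label`).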

1 Answer

Hi,

Thank you for reporting the issue! Would you mind sharing the stack trace of the Access Denied exception you got? Did it fail when accessing the entrypoint JAR/PySpark script from S3, or when accessing the data? Note that when you use cross-account IAM access, the entrypoint script also needs to be present in the account whose role you are assuming (i.e., your DEV account).
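One way to check the entrypoint point is to read the entrypoint object with the DEV role's temporary credentials, on the assumption that, once the assume-role credentials provider is configured, the job's `s3://` reads (including the entrypoint) go through the DEV role. A minimal sketch, assuming the AWS CLI is available and the current credentials may assume the DEV role; `split_s3_uri` and `entrypoint_readable_as_role` are hypothetical helper names:

```shell
# Split s3://bucket/path/to/key into "bucket path/to/key".
split_s3_uri() {
  local rest="${1#s3://}"
  printf '%s %s\n' "${rest%%/*}" "${rest#*/}"
}

# Succeeds if role $1 can read (HEAD) the object behind S3 URI $2.
entrypoint_readable_as_role() {
  local role_arn="$1" uri="$2" bucket key creds
  read -r bucket key <<< "$(split_s3_uri "$uri")"
  creds=$(aws sts assume-role --role-arn "$role_arn" \
    --role-session-name entrypoint-check \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  (
    # Subshell: the temporary credentials do not leak into the caller.
    read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "$creds"
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
    aws s3api head-object --bucket "$bucket" --key "$key" > /dev/null
  )
}
```

For example: `entrypoint_readable_as_role arn:aws:iam::DEV_ID:role/TestEMRCA s3://<BUCKET>/<SCRIPT> && echo readable`.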

AWS
Joe Yin
answered a year ago
