EMR on EKS Read From S3 Cross Account

I am trying to access S3 buckets in the DEV account from an EMR on EKS cluster running in a different account (INT). I created the IAM roles and configuration following the EMR on EKS cross-account access setup guide. When I start my Spark job, the S3 bucket read operation fails with an access denied error.

On further debugging, I also get an access denied error when I manually run aws s3 ls inside the Spark job pod shell.

For the DEV account, the IAM role is TestEMRCA, with the following permission policy and trust relationship:

// permission policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3-object-lambda:Get*",
                "s3-object-lambda:List*"
            ],
            "Resource": "*"
        }
    ]
}

// trust policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AR",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::INT_ID:role/emr_on_eks" //Job Execution Role
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

For the INT account, the IAM role is emr_on_eks, with the following permission policy and trust relationship. It is also the job execution role EMR uses to run jobs:

// permission policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "s3:ListBucket",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::DEV_ID:role/TestEMRCA" 
        }
    ]
}

// trust policy 
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "elasticmapreduce.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::DEV_ID:oidc-provider/<OIDC_URL>/id/<OIDC_ID>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "<OIDC_URL>/id/<OIDC_ID>:sub": "system:serviceaccount:<NAMESPACE>:<SA>"
                }
            }
        }
    ]
}

To test whether the problem is with the IAM role in the DEV account, I created a new role and associated it with a service account in the EKS cluster in the INT account. When I run a pod shell annotated with that service account, I can access the buckets (using aws s3 ls).
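For reference, the working IRSA setup from that test looks roughly like this. This is a sketch; the service account name, namespace, and role name are placeholders I made up, not values from the question:

```yaml
# Hypothetical service account with the IRSA role annotation;
# pods using this service account get credentials for the annotated role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-test-sa          # placeholder
  namespace: emr-ns         # placeholder
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::INT_ID:role/NewTestRole  # placeholder role
```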

I don't know what I'm missing in the EMR on EKS case, as the guide above is the only AWS tutorial I found and followed. I hope someone can help me.

UPDATE: I tried to manually assume the DEV role (TestEMRCA) and then set the AWS env vars. I can access the S3 buckets if I manually set the AWS env vars as follows:

aws sts assume-role --role-arn arn:aws:iam::DEV_ID:role/TestEMRCA --role-session-name s3-access-example
export AWS_ACCESS_KEY_ID=VAL_FROM_ABOVE_CMD
export AWS_SECRET_ACCESS_KEY=VAL2
export AWS_SESSION_TOKEN=VAL3

Doing this I can access the buckets, but it is a manual step that I don't want to perform.
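The manual export above could be scripted, for example like this. This is a sketch: the real assume-role call needs valid AWS credentials, so it is shown commented out, and a sample response of the same shape is parsed instead.

```shell
# Sketch: scripting the manual credential export from the question.
# The real call (needs AWS access, shown commented out):
#   CREDS=$(aws sts assume-role \
#     --role-arn arn:aws:iam::DEV_ID:role/TestEMRCA \
#     --role-session-name s3-access-example)
# For illustration, a response of the same shape assume-role returns:
CREDS='{"Credentials":{"AccessKeyId":"ASIA_EXAMPLE","SecretAccessKey":"SECRET_EXAMPLE","SessionToken":"TOKEN_EXAMPLE"}}'

# Extract one field from the Credentials object using python3 (no jq dependency).
cred() { echo "$CREDS" | python3 -c "import sys,json;print(json.load(sys.stdin)['Credentials']['$1'])"; }

export AWS_ACCESS_KEY_ID=$(cred AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(cred SecretAccessKey)
export AWS_SESSION_TOKEN=$(cred SessionToken)

echo "$AWS_ACCESS_KEY_ID"   # prints ASIA_EXAMPLE
```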

Since EMR on EKS has the following conf params to automatically assume the role (if I understand correctly), I think either this is a bug or I am still missing something:

--conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider
--conf spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA
--conf spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA
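For context, a minimal sketch of how those confs would be passed in a start-job-run request. The virtual cluster ID, entrypoint path, release label, and job name are placeholders I made up; only the role ARNs and conf keys come from the question:

```json
{
  "name": "cross-account-test",
  "virtualClusterId": "<VIRTUAL_CLUSTER_ID>",
  "executionRoleArn": "arn:aws:iam::INT_ID:role/emr_on_eks",
  "releaseLabel": "emr-6.2.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<DEV_BUCKET>/scripts/job.py",
      "sparkSubmitParameters": "--conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider --conf spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA --conf spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA"
    }
  }
}
```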

1 Answer

Hi,

Thank you for reporting the issue! Do you mind sharing the stack trace of the Access Denied exception you got? Was it failing when trying to access the entrypoint jar/PySpark script from S3, or when trying to access the data? Note that when you use cross-account IAM access, the entrypoint script also needs to be present in the account you are trying to assume (i.e. your DEV account).

AWS
Joe Yin
answered a year ago
