I am trying to access S3 buckets in a DEV account from an EMR on EKS cluster running in a different account (INT). I created the IAM roles and configuration following the AWS guide for EMR on EKS cross-account access. When I start my Spark job, the logs show an access denied error on the S3 bucket read operation.
On further debugging, I also get an access denied error when I manually run aws s3 ls inside the Spark job pod shell.
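For reference, this is roughly how I check it from inside the driver pod (pod, namespace, and bucket names here are placeholders):
# open a shell in the Spark driver pod
kubectl exec -it <spark-driver-pod> -n <NAMESPACE> -- /bin/bash

# which identity do the pod's credentials resolve to?
aws sts get-caller-identity

# this is the call that fails with AccessDenied
aws s3 ls s3://<dev-bucket>/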
For the DEV account, the IAM role is TestEMRCA with the following permission policy and trust relationship:
// permission policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3-object-lambda:Get*",
                "s3-object-lambda:List*"
            ],
            "Resource": "*"
        }
    ]
}
// trust policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AR",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::INT_ID:role/emr_on_eks" // Job Execution Role
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
For the INT account, the IAM role is emr_on_eks with the following permission policy and trust relationship. It is also the job execution role that EMR on EKS uses to run jobs.
// permission policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "s3:ListBucket",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::DEV_ID:role/TestEMRCA"
        }
    ]
}
// trust policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "elasticmapreduce.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::DEV_ID:oidc-provider/<OIDC_URL>/id/<OIDC_ID>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "<OIDC_URL>/id/<OIDC_ID>:sub": "system:serviceaccount:<NAMESPACE>:<SA>"
                }
            }
        }
    ]
}
To test whether the problem is with the IAM role in the DEV account, I created a new role and associated it with a service account in the EKS cluster in the INT account. When I run a pod shell annotated with that service account, I can access the buckets (using aws s3 ls).
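Roughly what that test looked like (role, namespace, and service account names are placeholders; eks.amazonaws.com/role-arn is the standard IRSA annotation):
# point the service account at the test role via IRSA
kubectl annotate serviceaccount <SA> -n <NAMESPACE> \
  eks.amazonaws.com/role-arn=arn:aws:iam::INT_ID:role/<TEST_ROLE>

# run a throwaway pod with that service account and list the buckets
kubectl run aws-cli-test -n <NAMESPACE> --rm -it \
  --overrides='{"spec": {"serviceAccountName": "<SA>"}}' \
  --image=amazon/aws-cli -- s3 ls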
I don't know what I'm missing in the EMR on EKS case, as there is only one tutorial from AWS that I found and followed. I hope someone can help me.
UPDATE: I tried to manually assume the DEV role (TestEMRCA) and then set the AWS environment variables.
I can access the S3 buckets if I manually set the AWS environment variables as follows:
# assume the DEV role; the output contains a Credentials block
aws sts assume-role --role-arn arn:aws:iam::DEV_ID:role/TestEMRCA --role-session-name s3-access-example

# export the temporary credentials returned by the command above
export AWS_ACCESS_KEY_ID=VAL_FROM_ABOVE_CMD
export AWS_SECRET_ACCESS_KEY=VAL2
export AWS_SESSION_TOKEN=VAL3
Doing this I can access the buckets, but it is a manual step that I don't want to do.
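For reference, those manual steps in one shot (just a sketch; it assumes jq is available in the pod image, and the bucket name is a placeholder):
# assume the DEV role and export the temporary credentials in one go
creds=$(aws sts assume-role \
  --role-arn arn:aws:iam::DEV_ID:role/TestEMRCA \
  --role-session-name s3-access-example \
  --query 'Credentials' --output json)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r '.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r '.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r '.SessionToken')
aws s3 ls s3://<dev-bucket>/   # works after the exports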
EMR on EKS has these conf params to automatically assume the role, if I understand correctly:
--conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider
--conf spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA
--conf spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA
I think either this is a bug or I am still missing something.
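For completeness, this is roughly how I submit the job with those confs (the virtual cluster ID, release label, and entry point below are placeholders, not my real values):
aws emr-containers start-job-run \
  --virtual-cluster-id <VIRTUAL_CLUSTER_ID> \
  --name cross-account-s3-test \
  --execution-role-arn arn:aws:iam::INT_ID:role/emr_on_eks \
  --release-label emr-6.2.0-latest \
  --job-driver '{
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<int-bucket>/jobs/my_job.py",
      "sparkSubmitParameters": "--conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider --conf spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA --conf spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN=arn:aws:iam::DEV_ID:role/TestEMRCA"
    }
  }'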