Glue job failing to write Parquet files to a KMS-enabled S3 bucket location

I have a Glue job that reads TXT files from an S3 location, performs some validations, and then writes the final output in Parquet format to S3. It fails at this last Spark DataFrame write to S3. Code:

# Set KMS encryption config
spark.conf.set("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
spark.conf.set("spark.hadoop.fs.s3a.server-side-encryption.key", kms_key_arn)

# Write Parquet file
cleaned_df.write.mode("overwrite").format("parquet").save(output_path)

Error:

Traceback (most recent call last):
  File "/tmp/s3_to_s3_custom_glue_job_inbound_script.py", line 636, in process_table
    cleaned_df.write.mode("overwrite").format("parquet").save(output_path)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 968, in save
    self._jwrite.save(path)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o189.save.
: java.nio.file.AccessDeniedException: INBOUND/PROCESSED_DATA/Full_Load/20250721_1405/Product_Details/_temporary/0: PUT 0-byte object on INBOUND/PROCESSED_DATA/Full_Load/20250721_1405/Product_Details/_temporary/0: com.amazonaws.services.s3.model.AmazonS3Exception: User: arn:aws:sts::297895608229:assumed-role/RRDRAR35DEV_AWS_APP01-CustomGlueRole02-dev/GlueJobRunnerSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3:::bucket/INBOUND/PROCESSED_DATA/Full_Load/20250721_1405/Product_Details/_temporary/0/" with an explicit deny in a resource-based policy (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;

The attached Glue role has all the necessary policies, with S3 and KMS access. The policy details are below.

custom_policy1

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:", "kms:", "glue:", "secretsmanager:", "rds:", "ses:", "ses:SendRawEmail", "rds-data:" ], "Resource": [ "" ], "Effect": "Allow", "Sid": "GlueCRPolicy" }, { "Action": "events:", "Resource": "", "Effect": "Allow" }, { "Action": [ "ec2:DescribeNetworkInterfaces", "ec2:CreateNetworkInterface", "ec2:DeleteNetworkInterface", "ec2:DescribeInstances", "ec2:AttachNetworkInterface", "ses:SendRawEmail" ], "Resource": "", "Effect": "Allow" }, { "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents" ], "Resource": [ "" ], "Effect": "Allow", "Sid": "CloudWatchLogsPolicy" } ] }

Custom_policy2

{ "Version": "2012-10-17", "Statement": [ { "Action": "iam:", "Effect": "Deny", "NotResource": "arn:aws:iam::role/RRDRAR35DEV_AWS_APP01-" }, { "Action": [ "s3:AbortMultipartUpload", "s3:AssociateAccessGrantsIdentityCenter", "s3:BypassGovernanceRetention", "s3:CreateAccessGrant", "s3:CreateAccessGrantsInstance", "s3:CreateAccessGrantsLocation", "s3:CreateAccessPoint", "s3:CreateAccessPointForObjectLambda", "s3:CreateBucket", "s3:CreateJob", "s3:CreateMultiRegionAccessPoint", "s3:CreateStorageLensGroup", "s3:DeleteAccessGrant", "s3:DeleteAccessGrantsInstance", "s3:DeleteAccessGrantsInstanceResourcePolicy", "s3:DeleteAccessGrantsLocation", "s3:DeleteAccessPoint", "s3:DeleteAccessPointForObjectLambda", "s3:DeleteAccessPointPolicy", "s3:DeleteAccessPointPolicyForObjectLambda", "s3:DeleteBucket", "s3:DeleteBucketPolicy", "s3:DeleteBucketWebsite", "s3:DeleteJobTagging", "s3:DeleteMultiRegionAccessPoint", "s3:DeleteObject", "s3:DeleteObjectTagging", "s3:DeleteObjectVersion", "s3:DeleteObjectVersionTagging", "s3:DeleteStorageLensConfiguration", "s3:DeleteStorageLensConfigurationTagging", "s3:DeleteStorageLensGroup", "s3:DescribeJob", "s3:DescribeMultiRegionAccessPointOperation",
"s3:DissociateAccessGrantsIdentityCenter", "s3:GetAccelerateConfiguration", "s3:GetAccessGrant", "s3:GetAccessGrantsInstance", "s3:GetAccessGrantsInstanceForPrefix", "s3:GetAccessGrantsInstanceResourcePolicy", "s3:GetAccessGrantsLocation", "s3:GetAccessPoint", "s3:GetAccessPointConfigurationForObjectLambda", "s3:GetAccessPointForObjectLambda", "s3:GetAccessPointPolicy", "s3:GetAccessPointPolicyForObjectLambda", "s3:GetAccessPointPolicyStatus", "s3:GetAccessPointPolicyStatusForObjectLambda", "s3:GetAccountPublicAccessBlock", "s3:GetAnalyticsConfiguration", "s3:GetBucketAcl", "s3:GetBucketCORS", "s3:GetBucketLocation", "s3:GetBucketLogging", "s3:GetBucketNotification", "s3:GetBucketObjectLockConfiguration", "s3:GetBucketOwnershipControls", "s3:GetBucketPolicy", "s3:GetBucketPolicyStatus", "s3:GetBucketPublicAccessBlock", "s3:GetBucketRequestPayment", "s3:GetBucketTagging", "s3:GetBucketVersioning", "s3:GetBucketWebsite", "s3:GetDataAccess", "s3:GetEncryptionConfiguration", "s3:GetIntelligentTieringConfiguration", "s3:GetInventoryConfiguration", "s3:GetJobTagging", "s3:GetLifecycleConfiguration", "s3:GetMetricsConfiguration", "s3:GetMultiRegionAccessPoint", "s3:GetMultiRegionAccessPointPolicy", "s3:GetMultiRegionAccessPointPolicyStatus", "s3:GetMultiRegionAccessPointRoutes", "s3:GetObject", "s3:GetObjectAcl", "s3:GetObjectAttributes", "s3:GetObjectLegalHold", "s3:GetObjectRetention", "s3:GetObjectTagging", "s3:GetObjectTorrent", "s3:GetObjectVersion", "s3:GetObjectVersionAcl", "s3:GetObjectVersionAttributes", "s3:GetObjectVersionForReplication", "s3:GetObjectVersionTagging", "s3:GetObjectVersionTorrent", "s3:GetReplicationConfiguration", "s3:GetStorageLensConfiguration", "s3:GetStorageLensConfigurationTagging", "s3:GetStorageLensDashboard", "s3:GetStorageLensGroup", "s3:InitiateReplication", "s3:ObjectOwnerOverrideToBucketOwner", "s3:PauseReplication", "s3:PutAccelerateConfiguration", "s3:PutAccessGrantsInstanceResourcePolicy",
"s3:PutAccessPointConfigurationForObjectLambda", "s3:PutAccessPointPolicy", "s3:PutAccessPointPolicyForObjectLambda", "s3:PutAccessPointPublicAccessBlock", "s3:PutAccountPublicAccessBlock", "s3:PutAnalyticsConfiguration", "s3:PutBucketAcl", "s3:PutBucketCORS", "s3:PutBucketLogging", "s3:PutBucketNotification", "s3:PutBucketObjectLockConfiguration", "s3:PutBucketOwnershipControls", "s3:PutBucketPolicy", "s3:PutBucketPublicAccessBlock", "s3:PutBucketRequestPayment", "s3:PutBucketTagging", "s3:PutBucketVersioning", "s3:PutBucketWebsite", "s3:PutEncryptionConfiguration", "s3:PutIntelligentTieringConfiguration", "s3:PutInventoryConfiguration", "s3:PutJobTagging", "s3:PutLifecycleConfiguration", "s3:PutMetricsConfiguration", "s3:PutMultiRegionAccessPointPolicy", "s3:PutObject", "s3:PutObjectAcl", "s3:PutObjectLegalHold", "s3:PutObjectRetention", "s3:PutObjectTagging", "s3:PutObjectVersionAcl", "s3:PutObjectVersionTagging", "s3:PutReplicationConfiguration", "s3:PutStorageLensConfiguration", "s3:PutStorageLensConfigurationTagging", "s3:ReplicateDelete", "s3:ReplicateObject", "s3:ReplicateTags", "s3:RestoreObject", "s3:SubmitMultiRegionAccessPointRoutes", "s3:TagResource", "s3:UntagResource", "s3:UpdateAccessGrantsLocation", "s3:UpdateJobPriority", "s3:UpdateJobStatus", "s3:UpdateStorageLensGroup" ], "Effect": "Deny", "NotResource": [ "arn:aws:s3:::bucket/INBOUND/" ] }, { "Action": "comprehend:", "Effect": "Deny", "NotResource": [ "arn:aws:Bucket1/", "arn:aws:Bucket1/" ] }, { "Condition": { "StringNotEquals": { "aws:ResourceTag/DataProductId": "project" } }, "Resource": "", "Effect": "Deny", "NotAction": [ "s3:", "kms:", "iam:", "comprehend:", "execute-api:", "logs:", "ses:", "secretsmanager:GetSecretValue" ] } ] }

How can I resolve this error and complete the write to S3?

asked 10 months ago, 270 views
2 Answers
Hello,

In this context, please note that the “s3:PutObject” API call is being denied by a resource-based policy, which is most likely the S3 bucket policy [1]. The IAM policy attached to the role seems fine. Please check the bucket policy of the bucket that contains "arn:aws:s3:::bucket/INBOUND/PROCESSED_DATA/Full_Load/20250721_1405/Product_Details/_temporary/0/" and remove or adjust any Deny statement that blocks the “s3:PutObject” API call. That should resolve the issue.

If you still see the same error, please check the Glue Data Catalog resource policy, if one is applied [2].

That being said, if you would like resource-specific information or troubleshooting, please raise a support case with AWS. If a support case has already been created, please be assured that we will get back to you and assist you in the best way possible.

References-

[1]- https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html

[2]- https://docs.aws.amazon.com/glue/latest/dg/security_iam_resource-based-policy-examples.html#security_iam_resource-based-policy-examples-considerations

AWS
answered 10 months ago
  • This is the bucket policy {"Sid":"CCEncryptionEnforcement","Effect":"Deny","Principal":"","Action":"s3:PutObject","Resource":"arn:aws:s3:::bucket/","Condition":{"Null":{"s3:x-amz-server-side-encryption":"true"}}},{"Sid":"CCDenyHttp","Effect":"Deny","Principal":"","Action":["s3:GetObject","s3:PutObject"],"Resource":"arn:aws:s3:::bucket/","Condition":{"Bool":{"aws:SecureTransport":"false"}}},{"Sid":"CCDenyUnEncryptedObjectUploads","Effect":"Deny","Principal":"","Action":"s3:PutObject","Resource":"arn:aws:s3:::bucket/","Condition":{"StringNotEquals":{"s3:x-amz-server-side-encryption":"aws:kms"}}},{"Sid":"AllowAccessFromAccount","Effect":"Allow","Principal":{"AWS":["arn:aws:iam::role/RRDRAR35DEV_AWS_APP01-CustomGlueRole02-dev"]},"Action":["s3:Get*","s3:Put*","s3:List*","s3:PutObject","s3:PutObjectAcl","s3:GetObject","s3:GetObjectAcl"],"Resource":["arn:aws:s3:::bucket","arn:aws:s3:::bucket/"]},{"Sid":"MustBeEncryptedInTransit","Effect":"Deny","Principal":"","Action":"s3:","Resource":["arn:aws:s3:::bucket","arn:aws:s3:::bucket/"],"Condition":{"Bool":{"aws:SecureTransport":"false"}}}

    Please advise.
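
Reading the two encryption-related Deny statements in the policy above ("CCEncryptionEnforcement" and "CCDenyUnEncryptedObjectUploads"): a PutObject request is denied whenever the "s3:x-amz-server-side-encryption" header is missing or carries a value other than "aws:kms". The zero-byte PUT to "_temporary/0" in the error would be denied this way if it was sent without that header, which is one possible explanation and not a confirmed diagnosis. The sketch below models only the "Null" and "StringNotEquals" operators from those two statements in plain Python. The function names are hypothetical and this is not an AWS API.

```python
# Sketch: how the two encryption-related Deny statements treat a PutObject
# request. Only the "Null" and "StringNotEquals" operators are modeled.

SSE_KEY = "s3:x-amz-server-side-encryption"

DENY_STATEMENTS = [
    {"Sid": "CCEncryptionEnforcement",
     "Condition": {"Null": {SSE_KEY: "true"}}},
    {"Sid": "CCDenyUnEncryptedObjectUploads",
     "Condition": {"StringNotEquals": {SSE_KEY: "aws:kms"}}},
]


def condition_matches(operator, key, expected, context):
    """Return True when one condition entry matches the request context."""
    actual = context.get(key)
    if operator == "Null":
        # "true" means: match when the key is absent from the request.
        return (actual is None) == (expected == "true")
    if operator == "StringNotEquals":
        # Negated string operators also match when the key is absent.
        return actual is None or actual != expected
    raise ValueError(f"operator not modeled: {operator}")


def triggered_denies(context):
    """Return the Sids of the Deny statements whose conditions all match."""
    hits = []
    for stmt in DENY_STATEMENTS:
        if all(
            condition_matches(op, key, expected, context)
            for op, entries in stmt["Condition"].items()
            for key, expected in entries.items()
        ):
            hits.append(stmt["Sid"])
    return hits


print(triggered_denies({}))                    # PUT without an SSE header
print(triggered_denies({SSE_KEY: "AES256"}))   # PUT with SSE-S3
print(triggered_denies({SSE_KEY: "aws:kms"}))  # PUT with SSE-KMS
```

In this model only the request that carries "aws:kms" passes both statements, so the thing to verify is whether the Glue job's writes, including the directory-marker PUT, actually send that header.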

The error you're encountering is related to permissions when trying to write Parquet files to a KMS-enabled S3 bucket. Despite having broad permissions in your policies, there's an explicit deny in a resource-based policy (likely a bucket policy) that's preventing the write operation.

The key issue is in the error message: "User: arn:aws:sts::297895608229:assumed-role/RRDRAR35DEV_AWS_APP01-CustomGlueRole02-dev/GlueJobRunnerSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3:::bucket/INBOUND/PROCESSED_DATA/Full_Load/20250721_1405/Product_Details/_temporary/0/" with an explicit deny in a resource-based policy."

Here are some steps to resolve this issue:

  1. Check the S3 bucket policy: There appears to be an explicit deny rule in the bucket policy that's preventing your Glue job from writing to the temporary directories that Spark creates during the write process. Look for any deny statements in the bucket policy that might be affecting the path pattern "INBOUND/PROCESSED_DATA/Full_Load/*/Product_Details/_temporary/".

  2. KMS configuration issue: Your current KMS configuration is using the s3a filesystem, but the error suggests you might be using the standard s3 filesystem. Try updating your KMS configuration to use the correct format:

spark.conf.set("spark.hadoop.fs.s3.enableServerSideEncryption", "true")
spark.conf.set("spark.hadoop.fs.s3.serverSideEncryption.kms.keyId", kms_key_arn)
  3. Check Custom_policy2: Your second policy has a Deny statement for S3 actions with a NotResource of "arn:aws:s3:::bucket/INBOUND/*", so the deny applies to everything outside that prefix. The failing write targets a temporary directory inside this prefix, and because "*" in an IAM resource ARN also matches "/", that pattern should already cover it. Still, confirm that the NotResource pattern includes every path the job writes to, including the temporary directories that Spark creates.

  4. Verify KMS key permissions: Ensure that the KMS key policy allows the Glue role to use the key for encryption operations.

  5. Consider using a different write approach: If the bucket policy cannot be modified, you might need to adjust your write strategy. For example, you could write to a temporary location first (where you have full permissions) and then use AWS SDK commands to copy the files to the final destination with the correct encryption settings.
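
For the KMS configuration step, it may also matter where the settings are applied. Whether a runtime spark.conf.set on a "spark.hadoop.*" key reaches the S3 connector can depend on the Spark version and on whether a filesystem instance has already been created and cached, so a commonly used alternative is to set the keys directly on the Hadoop configuration before the first S3 read or write. The sketch below is a minimal illustration under those assumptions. The helper names are hypothetical, the "s3" keys are the ones quoted in this answer, and the "s3a" keys are the ones from the question.

```python
def sse_kms_settings(kms_key_arn, scheme="s3"):
    """Return Hadoop keys (without the "spark.hadoop." prefix) for SSE-KMS."""
    if scheme == "s3":
        # Keys quoted in the KMS configuration step of this answer.
        return {
            "fs.s3.enableServerSideEncryption": "true",
            "fs.s3.serverSideEncryption.kms.keyId": kms_key_arn,
        }
    if scheme == "s3a":
        # Keys used in the question's own snippet.
        return {
            "fs.s3a.server-side-encryption-algorithm": "SSE-KMS",
            "fs.s3a.server-side-encryption.key": kms_key_arn,
        }
    raise ValueError(f"unsupported scheme: {scheme}")


def apply_settings(hadoop_conf, settings):
    """Set each key on any object that exposes set(key, value)."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)


# Possible use inside the Glue script, before any S3 access. This assumes an
# existing `spark` session; `_jsc` is a private PySpark attribute.
#   hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
#   apply_settings(hadoop_conf, sse_kms_settings(kms_key_arn, scheme="s3"))
```

Pick the scheme that matches the prefix of output_path. Attaching a Glue security configuration with S3 encryption set to SSE-KMS is another route that avoids script-level settings.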

The most likely issue is that the bucket policy has an explicit deny that's preventing writes to temporary directories, which Spark needs during the write process. Review and update the bucket policy to allow these operations for your Glue role.
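
As a starting point for that review, the sketch below pulls out the Deny statements of a bucket policy whose Action covers "s3:PutObject", so their Condition blocks can be read side by side. It is a minimal sketch: the helper names are hypothetical, statements that use NotAction are not handled, and fetch_bucket_policy assumes boto3 is installed and the caller is allowed to call s3:GetBucketPolicy.

```python
import json
from fnmatch import fnmatchcase


def as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def deny_statements_for(policy_json, action):
    """Return Deny statements in a policy document whose Action covers `action`."""
    wanted = action.lower()  # IAM action names are case-insensitive
    matches = []
    for stmt in as_list(json.loads(policy_json).get("Statement", [])):
        if stmt.get("Effect") != "Deny":
            continue
        patterns = [a.lower() for a in as_list(stmt.get("Action", []))]
        if any(fnmatchcase(wanted, pattern) for pattern in patterns):
            matches.append(stmt)
    return matches


def fetch_bucket_policy(bucket_name):
    """Fetch the bucket policy as a JSON string (needs boto3 and credentials)."""
    import boto3  # imported lazily so the filter above works without boto3

    return boto3.client("s3").get_bucket_policy(Bucket=bucket_name)["Policy"]


# Possible use:
#   for stmt in deny_statements_for(fetch_bucket_policy("bucket"), "s3:PutObject"):
#       print(stmt.get("Sid"), stmt.get("Condition"))
```

Each statement it returns is a candidate for the explicit deny in the error. Compare its Condition with what the job actually sends, for example the server-side encryption header and HTTPS.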
Sources
An error occurred while calling o1109.pyWriteDynamicFrame - AWS Glue Job | AWS re:Post
Data protection - Amazon EMR

answered 10 months ago
