You’re absolutely correct that when a bucket’s policy delegates access control to an S3 Access Point, you generally shouldn’t need to attach explicit bucket-level permissions for the consuming accounts. In theory, the access point policy should act as the single source of truth. However, in practice, AWS Glue’s runtime behavior for requester pays access via access points isn’t fully consistent with that model yet.
When Glue launches an ETL job, it internally initializes multiple Spark executors that interact directly with S3 using the Hadoop S3A client. The challenge is that Glue’s underlying S3 client doesn’t always resolve the delegated access point policy correctly during the requester pays handshake. Instead, it still attempts to verify permissions on the underlying bucket resource ARN, even when the bucket policy explicitly delegates control to the access point. This is why you’re seeing the 403 AccessDenied response despite correct access point configuration.
Here are a few approaches that have worked in similar cross-account setups:
Add explicit bucket-level read permissions to the Glue job role even if delegation is configured. This doesn't violate the access point model; it compensates for the current behavior of the Glue runtime's S3 client. Use a scoped statement limited to the specific bucket ARN and restrict it to s3:GetObject, s3:GetObjectVersion, and s3:ListBucket.
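As a sketch of what that compensating statement could look like (the bucket name example-data-bucket is a placeholder, not a value from this thread):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCompensatingBucketRead",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::example-data-bucket",
        "arn:aws:s3:::example-data-bucket/*"
      ]
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN itself, while the object-level actions need the /* resource.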
Confirm the Glue job role has permission to call s3:GetAccessPoint and s3:GetAccessPointPolicy. These permissions are sometimes overlooked but are required when Glue needs to resolve access point aliases during initialization.
Verify the URI structure. The Hadoop client expects the access point alias or full ARN in the format:
s3://<access-point-name>-<account-id>.s3-accesspoint.<region>.amazonaws.com/<prefix>/
Using only s3://accesspoint/[accessPointName]/ can sometimes fail resolution inside Glue depending on the SDK version.
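To make the expected dotted-host form concrete, here is a minimal helper that assembles it from its parts. All component values below are hypothetical placeholders, not values from this thread:

```python
# Sketch: assembling the fully-qualified access point URI in the dotted-host
# form the Hadoop client expects. All values here are placeholders.
def access_point_uri(name: str, account_id: str, region: str, prefix: str) -> str:
    """Build the s3:// URI for an S3 Access Point in the dotted-host form."""
    return f"s3://{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com/{prefix}/"

print(access_point_uri("my-ap", "111122223333", "us-east-1", "parquet/data"))
# → s3://my-ap-111122223333.s3-accesspoint.us-east-1.amazonaws.com/parquet/data/
```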
Ensure the requester pays header is applied globally. Set both:
spark._jsc.hadoopConfiguration().set("fs.s3.useRequesterPaysHeader", "true")
spark._jsc.hadoopConfiguration().set("fs.s3a.requester.pays", "true")
The second property is used by the newer S3A client, and both may be needed depending on the Glue version.
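To make the relationship between the two properties concrete, here is a minimal sketch. The dictionary is plain Python; in a real Glue script the commented loop at the bottom would run against the live SparkContext:

```python
# Both requester-pays properties, covering the legacy S3 client and the
# newer Hadoop S3A client; depending on the Glue version, either or both
# may be consulted at runtime.
requester_pays_conf = {
    "fs.s3.useRequesterPaysHeader": "true",  # legacy S3 client property
    "fs.s3a.requester.pays": "true",         # newer S3A client property
}

# In a Glue PySpark job these would be applied like so:
#   for key, value in requester_pays_conf.items():
#       spark._jsc.hadoopConfiguration().set(key, value)
```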
If you’re running Glue 4.0 or earlier, the internal SDK for S3 access points doesn’t fully handle cross-account requester pays buckets. Upgrading to Glue 5.0, which uses a newer AWS SDK for Java, improves consistency with access points and requester pays headers.
In the long term, AWS is expected to harmonize Glue’s S3 client behavior with the same model used in Athena and EMR, where the access point delegation is respected transparently. For now, the safest path is to explicitly grant read access on the bucket to the Glue job role, even if that seems redundant.
For reference, check:
Using S3 access points with AWS Glue : https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-s3.html
Requester Pays buckets in Amazon S3 : https://docs.aws.amazon.com/AmazonS3/latest/userguide/RequesterPaysBuckets.html
AWS Glue job properties for Spark and Hadoop : https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
It's a nuanced setup, and your observation about access point delegation is absolutely right. This looks more like a client behavior gap than a permissions misconfiguration.
To enable a Glue ETL job to access Parquet data in a requester pays S3 bucket through an access point, you need to configure specific Spark properties in your Glue job.
For requester pays buckets, you need to add the following Spark property to your Glue job configuration:
"spark.hadoop.fs.s3.useRequesterPaysHeader": "true"
This property instructs the Glue job to include the necessary x-amz-request-payer header in S3 requests, which is required when accessing requester pays buckets.
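One way to attach that property is through the job's --conf special parameter when creating or updating the job. The fragment below is a sketch of the DefaultArguments map you might pass; the exact mechanism can vary by Glue version, so treat it as an assumption to verify against your environment:

```json
{
  "--conf": "spark.hadoop.fs.s3.useRequesterPaysHeader=true"
}
```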
When using S3 access points, you also need to ensure your IAM role has the proper permissions to access both the S3 bucket and the access point. The role used by your Glue job should have permissions for GetObject, GetObjectVersion, and ListBucket operations on both the bucket and the access point.
For accessing data through an access point, you should use the access point ARN or alias in your Glue job configuration. The format for accessing data through an access point would be:
s3://accesspoint/[accesspointname]/[optional-prefix]
If you're still experiencing issues, verify that:
- The IAM role used by your Glue job has the necessary permissions
- The requester pays header is properly configured
- The access point resource policy correctly allows the consuming account's Glue service role
In Athena for Spark, requester pays buckets are enabled per session, not per workgroup, which is different from how regular Athena SQL queries work. This might explain why your Athena queries work but Glue ETL jobs don't.
Sources
- Configure DMS S3 Target Endpoint for Requester Pays Bucket | AWS re:Post
- Enable requester pays Amazon S3 buckets in Athena for Spark - Amazon Athena
The automated answer says this:
"When using S3 access points, you also need to ensure your IAM role has the proper permissions to access both the S3 bucket and the access point. The role used by your Glue job should have permissions for GetObject, GetObjectVersion, and ListBucket operations on both the bucket and the access point."
If the underlying S3 bucket delegates access control to the access point, which is my case, that should work, and I shouldn't have to grant access to the other accounts directly on the S3 bucket. Could someone help clarify this? I have already set spark._jsc.hadoopConfiguration().set("fs.s3.useRequesterPaysHeader", "true") in the Glue job, as described in this re:Post article: https://repost.aws/knowledge-center/requester-pays-buckets-glue-emr-athena
Thank you very much Hawke for the fast and thorough explanation. I understand now that the Glue client still has to evolve to support access points comprehensively. A quick test of granting access directly on the underlying bucket succeeded. I have accepted your answer. Thanks.