Hello.
Could you please attach the entire policy you are using?
Also, does the job succeed if you temporarily attach S3FullAccess?
If it succeeds with S3FullAccess, then the problem is with the access permissions to S3.
The following documents may be helpful.
https://repost.aws/knowledge-center/glue-403-access-denied-error
If you enable S3 server access logging, you will be able to see which IAM role was used for each access. You may be able to confirm this by checking the "Requester" field in the log records. https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html
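Once logging is on, a quick way to dig through the records is to parse out just the fields that matter in this thread (requester ARN, operation, key, status). A minimal sketch, assuming the standard space-delimited layout from the LogFormat page; the sample record is abridged from a real one later in this thread, with "EXAMPLEOWNER" as a placeholder owner ID:

```python
import re

# Leading fields of one S3 server access log record: owner, bucket,
# [time], remote IP, requester, request ID, operation, key,
# "request line", HTTP status, error code.
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_line>[^"]*)" (?P<status>\d{3}) (?P<error_code>\S+)'
)

def parse_record(line: str) -> dict:
    """Extract the leading fields of an S3 access-log record as a dict."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("unrecognized access-log record")
    return m.groupdict()

# Abridged record from this thread ("EXAMPLEOWNER" is a placeholder).
sample = (
    'EXAMPLEOWNER cognius-messaging-staging [03/Oct/2023:14:38:18 +0000] '
    '18.212.199.80 arn:aws:sts::438298068074:assumed-role/'
    'AWSGlueServiceRole-messaging/GlueJobRunnerSession KV8R86NG6X4AKA6Y '
    'REST.HEAD.OBJECT person/year%253D2023/month%253D09/day%253D28 '
    '"HEAD /person/year%3D2023/month%3D09/day%3D28 HTTP/1.1" 404 NoSuchKey'
)
rec = parse_record(sample)
print(rec["requester"], rec["operation"], rec["status"])
```

Filtering parsed records on `status == "403"` and grouping by `requester` shows at a glance which principal is being denied.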
This was a great idea, thank you!
I see. In that case, since we as users cannot control Glue's internal behavior, it would be difficult to deal with...
OK - so I tried adding S3FullAccess to the list of policies and get the same behavior: an error during writing rows, due to an S3 403 error. That suggests the piece performing the writing is NOT the Glue worker, which is supposed to run with the configured service role.
So if the piece writing the rows is NOT under the configured IAM role, it begs the question: what is writing the rows, if not a worker executing with the configured role? Given that S3FullAccess isn't working, attaching the full set of policies is unlikely to help much. My mistake was in thinking the writing of the dynamic frame would be executed under the role given to the job.
Any suggestions as to which user / role / set of permissions it is actually executing as? I will try to instrument my script with a set of boto3 calls that log the user / role(s) etc., but I'm pretty stumped if it is not running as the configured role per "IAM Role" in the "Job Details" tab :/
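On the boto3 idea: a minimal sketch of that probe, assuming boto3 is available on the Glue workers (it normally is). `sts:GetCallerIdentity` returns the ARN the code is actually executing as; the second helper is pure string handling that splits the role name out of an assumed-role ARN.

```python
def current_identity_arn() -> str:
    """Return the ARN the current credentials resolve to.
    Drop a call to this at the top of the Glue job script (and inside any
    mapped function) to log which principal the code actually runs as.
    GetCallerIdentity needs no permissions, so it cannot itself be denied."""
    import boto3  # imported lazily; preinstalled on Glue workers
    return boto3.client("sts").get_caller_identity()["Arn"]

def role_from_assumed_role_arn(arn: str) -> str:
    """Extract the role name from an STS assumed-role ARN, e.g.
    arn:aws:sts::123456789012:assumed-role/MyRole/MySession -> MyRole."""
    marker = ":assumed-role/"
    if marker not in arn:
        return arn  # not an assumed-role ARN; return it unchanged
    return arn.split(marker, 1)[1].split("/", 1)[0]

# The requester ARN seen in the access logs earlier in this thread:
print(role_from_assumed_role_arn(
    "arn:aws:sts::438298068074:assumed-role/"
    "AWSGlueServiceRole-messaging/GlueJobRunnerSession"
))
```

If `current_identity_arn()` reports the expected role on the driver but the 403s persist, that points away from "wrong principal" and toward a deny on a specific action or resource.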
@Riku_Kobayashi
Great idea re: S3 logs - by all accounts the process seems to have problems with the partitions that have been set up:
ed972da9ad376f018cd13ea47ea3527296a57d1a6e03d455c86cedc1ec558fa5 cognius-messaging-staging [03/Oct/2023:14:38:18 +0000] 18.212.199.80 arn:aws:sts::438298068074:assumed-role/AWSGlueServiceRole-messaging/GlueJobRunnerSession KV8R86NG6X4AKA6Y REST.HEAD.OBJECT person/year%253D2023/month%253D09/day%253D28 "HEAD /person/year%3D2023/month%3D09/day%3D28 HTTP/1.1" 404 NoSuchKey 303 - 8 - "-" "ElasticMapReduce/1.0.0 emrfs/s3n user:spark,groups:[root], aws-internal/3 aws-sdk-java/1.12.331 Linux/4.14.238-125.422.amzn1.x86_64 OpenJDK_64-Bit_Server_VM/25.382-b05 java/1.8.0_382 scala/2.12.15 groovy/2.4.4 vendor/Amazon.com_Inc. cfg/retry-mode/standard" - eHncuUZDJ4BBK49tqG4HMOPWiYJ8SpaFrQby3lWsY1+Y2EOtCweDooAqtdJyzOdpIKaQkgI+m14= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader cognius-messaging-staging.s3.amazonaws.com TLSv1.2 - -
ed972da9ad376f018cd13ea47ea3527296a57d1a6e03d455c86cedc1ec558fa5 cognius-messaging-staging [03/Oct/2023:14:27:19 +0000] 34.204.78.144 arn:aws:sts::438298068074:assumed-role/AWSGlueServiceRole-messaging/GlueJobRunnerSession VTATVJ3V07SHEVT2 REST.HEAD.OBJECT person/year%253D2023/month%253D09/day%253D28_%2524folder%2524 "HEAD /person/year%3D2023/month%3D09/day%3D28_%24folder%24 HTTP/1.1" 404 NoSuchKey 312 - 8 - "-" "ElasticMapReduce/1.0.0 emrfs/s3n user:spark,groups:[root], aws-internal/3 aws-sdk-java/1.12.331 Linux/4.14.238-125.422.amzn1.x86_64 OpenJDK_64-Bit_Server_VM/25.382-b05 java/1.8.0_382 scala/2.12.15 groovy/2.4.4 vendor/Amazon.com_Inc. cfg/retry-mode/standard" - G06qLT2FisbJTTIF+7iE640eJ4KnT9g+kxHbhkjzPZR+ZSBcejzfNXONO22T2Xgba8CQ8p5GjTQ= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader cognius-messaging-staging.s3.amazonaws.com TLSv1.2 - -
For further explanation: we use a Glue crawler to automatically create our tables, columns, etc. The downside to this process is that it automatically sets the partition names to "Partition 0", "Partition 1", etc. when using a yyyy/mm/dd structure such as:
"person/2023/10/04" - so to get around that, we name the directories "person/year=2023/month=10/day=04", which ensures the partitions are named correctly when the table and partitions are created. If the URL is completely unescaped, the path does exist - there are partitions for person/2023/09/28. So if it's coming up with a 404, I wonder whether the engine doesn't know how to work with URL-escaped paths.
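One note on the escaping in those log entries, which may be a red herring rather than the bug: `%3D` in the request line is just the standard URL encoding of `=`, so the HEAD request very likely targets the literal `year=2023/...` path; the log's key field is then percent-encoded once more by the log format, which is why it shows `%253D`. A small sketch to confirm the decoding, plus a hedged HeadObject probe (the bucket and key are the ones from this thread; adapt as needed):

```python
from urllib.parse import unquote

# The key as it appears in the access log, and its successive decodings.
logged_key = "person/year%253D2023/month%253D09/day%253D28"
once = unquote(logged_key)   # what the request line showed
twice = unquote(once)        # the literal directory name on disk
print(once)
print(twice)

def key_exists(bucket: str, key: str) -> bool:
    """HEAD the object; True if present, False on 404. A 403 is
    re-raised, since that is exactly the symptom being chased."""
    import boto3  # preinstalled on Glue workers
    from botocore.exceptions import ClientError
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise

# e.g. compare key_exists("cognius-messaging-staging", twice)
# against key_exists("cognius-messaging-staging", once)
```

If the literal (`=`) key exists and the encoded (`%3D`) key does not, that narrows the 404s to HEADs against keys that were never written, which EMRFS also does routinely when probing paths.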
Although irritatingly - from the glue logs:
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: W2S801ETG7WGY7SX; S3 Extended Request ID: ZDuhL26URJL9Lu+lcj+Pm1NrYOSAUpjTpgozDcHHUApnbJF64fWfsI6AnmfIhkQUlmOtpxgqvjtqaZaNqPwTmFikiAY/nlLkb4bjNykaGxM=; Proxy: null)
ls . | xargs grep W2S801ETG7WGY7SX
(none)
So it seems the requests that return a 403 are not being written to the logs. Presumably, then, it is more likely that these requests target a different bucket than that they simply aren't being logged?
Have you tried with an "s3://bucket/*" resource? I have always done it that way. I'm not sure if you can put the wildcard in the bucket name to include any prefix.
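To illustrate the usual wildcard placement (using a placeholder bucket name, not the poster's actual policy): the `/*` goes after the bucket ARN for object-level actions, while bucket-level actions such as `s3:ListBucket` must target the bucket ARN itself, with no wildcard. Both statements are typically needed for a job that reads and writes objects:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

A wildcard inside the bucket name portion (e.g. `arn:aws:s3:::my-bucket*`) is syntactically allowed but matches bucket names, not prefixes, so it is rarely what you want here.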