
AWS Glue: using S3 for data shuffling


Hi,

I am using S3 for data shuffling in my Glue job.

When I ran the notebook, the job failed with a FileNotFoundException for some objects, even though I can see those objects in the shuffle S3 bucket. Any thoughts on what might be missing?

I used the following magic: %%configure { "--write-shuffle-files-to-s3": "true", "--conf": "spark.shuffle.glue.s3ShuffleBucket=s3://x/tmp/" }
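For readability, the same magic can be laid out as a multi-line cell. The sketch below is only an illustration: it takes the two settings quoted above, serializes them as JSON, and prints a cell body that could be pasted into the notebook. The helper name `configure_cell` is hypothetical and not part of any Glue API.

```python
import json

# Settings copied verbatim from the magic in the question.
shuffle_settings = {
    "--write-shuffle-files-to-s3": "true",
    "--conf": "spark.shuffle.glue.s3ShuffleBucket=s3://x/tmp/",
}


def configure_cell(settings):
    """Return the text of a %%configure cell (hypothetical helper).

    json.dumps guarantees the body is valid JSON, which avoids quoting
    mistakes when the magic is typed by hand.
    """
    return "%%configure\n" + json.dumps(settings, indent=2)


print(configure_cell(shuffle_settings))
```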

Thanks! MC

asked a year ago, 265 views
2 Answers

Hello,

"FileNotFoundError: No such file or directory" is a generic error that can occur for several reasons, including the following: [1]

  • The specified S3 bucket or path does not exist. It is also possible that the underlying files were updated while the job was reading them.
  • The IAM role associated with the AWS Glue job does not have the required permissions on the S3 bucket.
  • A bucket policy does not allow access to the S3 bucket.
  • The path is not defined correctly in the script or code.
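One way to narrow down the permission-related causes above is to try each basic S3 operation under the shuffle prefix with the same role the Glue job uses. The following is a minimal sketch, not an official diagnostic: the function names and the probe object key `_permission_probe` are assumptions made for illustration, and the client is expected to expose the standard boto3 S3 methods `list_objects_v2`, `put_object`, `get_object`, and `delete_object`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def probe_shuffle_prefix(s3_client, uri):
    """Attempt list/put/get/delete under the prefix and report each result.

    Returns a dict mapping the operation name to "ok" or to the error text,
    so a missing permission shows up against the operation that needs it.
    """
    bucket, prefix = split_s3_uri(uri)
    # Hypothetical probe key; any throwaway key under the prefix would do.
    key = (prefix.rstrip("/") + "/" if prefix else "") + "_permission_probe"

    steps = [
        ("list", lambda: s3_client.list_objects_v2(
            Bucket=bucket, Prefix=prefix, MaxKeys=1)),
        ("put", lambda: s3_client.put_object(
            Bucket=bucket, Key=key, Body=b"probe")),
        ("get", lambda: s3_client.get_object(Bucket=bucket, Key=key)),
        ("delete", lambda: s3_client.delete_object(Bucket=bucket, Key=key)),
    ]

    results = {}
    for name, call in steps:
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # keep going so every operation is reported
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

With boto3 installed and credentials for the Glue job role, this could be called as `probe_shuffle_prefix(boto3.client("s3"), "s3://x/tmp/")`. Any entry that is not "ok" points at the operation the role or bucket policy is blocking.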

Please check the points above and see whether anything is preventing access to those S3 objects.

Thank you!

References: [1] https://stackoverflow.com/questions/22282760/filenotfounderror-errno-2-no-such-file-or-directory (third party)

AWS
SUPPORT ENGINEER
answered a year ago
  • Thanks for the hints. Yes, it turned out that some IAM permissions were missing. Out of curiosity, why does AWS report such a permission issue as a FileNotFoundError? IMHO, troubleshooting would be easier if it were reported as an Access Denied error.
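The thread does not say which permission was missing. As a rough sketch only, assuming the job role needs list access on the bucket plus read, write, and delete access on objects under the shuffle prefix, an IAM policy document for the shuffle location from the question could be generated like this. The function name `shuffle_bucket_policy` is hypothetical, and the exact set of actions is an assumption rather than the confirmed fix.

```python
import json
from urllib.parse import urlparse


def shuffle_bucket_policy(shuffle_uri):
    """Build an IAM policy document for a shuffle location (illustrative).

    The action list is an assumption: the thread does not state which
    permission was actually missing.
    """
    parsed = urlparse(shuffle_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {shuffle_uri!r}")
    bucket = parsed.netloc
    prefix = parsed.path.lstrip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions are granted on keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/{prefix}*"],
            },
        ],
    }


print(json.dumps(shuffle_bucket_policy("s3://x/tmp/"), indent=2))
```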

