Hello.
The answer at the URL below may not match your situation exactly, but it suggests the error can be avoided by writing the data to S3 and using an S3 event trigger to load it into Redshift with Lambda.
https://repost.aws/ja/questions/QU2cuK87vHSyCNOBE439xqDQ/aws-glue-error-file-already-exists
I contacted AWS Support and we worked on this issue together, but were not able to fix it. The query I am running through this logic is very large, and for some reason very large queries trigger this error. The support team told me that internally a node failure is happening when the data is COPIED from S3 to Redshift, and that failure surfaces as the error I mentioned. I still have no real solution, but I found a workaround: I write the output to an S3 file, and a Lambda function loads it into Redshift through an event trigger. This is working smoothly.
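The workaround above can be sketched as a small Lambda function. This is a minimal illustration, not the poster's actual code: all names (cluster ID, database, table, IAM role) are hypothetical placeholders, and it assumes the Glue job writes CSV files with a header row to the bucket that triggers the function.

```python
import os
import urllib.parse

# Hypothetical configuration -- replace with your own cluster, table, and role.
CLUSTER_ID = os.environ.get("REDSHIFT_CLUSTER", "my-cluster")
DATABASE = os.environ.get("REDSHIFT_DATABASE", "dev")
DB_USER = os.environ.get("REDSHIFT_DB_USER", "awsuser")
TARGET_TABLE = os.environ.get("TARGET_TABLE", "public.my_table")
IAM_ROLE = os.environ.get(
    "COPY_IAM_ROLE", "arn:aws:iam::123456789012:role/RedshiftCopyRole"
)


def build_copy_sql(bucket: str, key: str) -> str:
    """Build a COPY statement that loads one S3 object into the target table."""
    return (
        f"COPY {TARGET_TABLE} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{IAM_ROLE}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


def lambda_handler(event, context):
    """Triggered by s3:ObjectCreated:*; COPY the new file into Redshift."""
    import boto3  # deferred import so build_copy_sql is testable without AWS

    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 event keys are URL-encoded (e.g. spaces become '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # The Redshift Data API runs the statement asynchronously; no driver needed.
    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        ClusterIdentifier=CLUSTER_ID,
        Database=DATABASE,
        DbUser=DB_USER,
        Sql=build_copy_sql(bucket, key),
    )
    return {"statement_id": resp["Id"]}
```

Using the Redshift Data API avoids keeping a database connection open in the Lambda; the function's IAM role needs `redshift-data:ExecuteStatement`, and the `IAM_ROLE` passed to COPY needs read access to the bucket.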
I've read that, thanks. Mine is not a huge query; we're talking about 1.6M rows, which is not a big deal for Glue. I'll try to fix it within Glue, and if I don't succeed I'll try using S3 as a staging area.
Follow-up: I loaded the full table to S3 with no problem. Then I created a Glue job using the S3 bucket catalog table as the source, cleaned up the data, and tried to load into Redshift. Same error: An error occurred while calling o127.pyWriteDynamicFrame. File already exists: s3://aws-glue-assets-*******