Cleaning a bucket with purge_s3_path in the Glue console
Using the Glue console (Glue 3.0, Python and Spark), I need to overwrite the data in an S3 bucket as part of an automated daily process. I tried the glueContext.purge_s3_path( "s3://bucket-to-clean-path/", { "retentionPeriod": 1, "manifestFilePath": "s3://bucket-for-manifest-path/" } ) function, but it does not clean the bucket before the new data is sent.
The idea is to create a process that transforms some data, sends it to the bucket, loads it into QS through a manifest, and repeats daily. The whole script works as it should; the only problem is that the bucket keeps the data from previous runs.
Does anyone know what could be causing this problem?
Edit: I tried changing the retention period to one hour (several hours after the data was uploaded), but it still doesn't remove the files. If I remove the parts of the script not related to emptying the bucket, the job always takes one minute.
In the manifest there is one partition under success and another under failure, but neither corresponds to the files that should be removed. Example of a partition: run-1639727067782-part-r-00000
The options argument takes two parameters: the retention period and the manifest file path. The retention period is 7 days (168 hours) by default, and the manifest file path should show which files were deleted successfully and which ones failed. Hope that helps. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-glue-context.html
Thank you for your help, it gave me a hint of where I should look. Unfortunately, I haven't been able to fix it yet. It seems that none of the files appear in the manifest folder, just some partitions that I'm not sure what they represent.
The problem was caused by the role created for the jobs lacking permission to delete objects. Once I revised the role's permissions, the job worked correctly.
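For anyone hitting the same issue, here is a sketch of the kind of policy the job role might need. The exact statements I ended up with are not reproduced here, so treat the action list as an assumption: s3:DeleteObject is the standard S3 action for deleting objects, and the other actions are what a job that lists, reads and writes the bucket would typically use. The function name build_purge_policy is hypothetical, and the bucket names are the placeholders from the question.

```python
import json


def build_purge_policy(data_bucket, manifest_bucket):
    """Build an IAM policy document (as a dict) for a role that purges a bucket.

    Assumption: the job lists and deletes objects in data_bucket and writes
    the purge manifest to manifest_bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself, not to its objects.
                "Sid": "ListDataBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{data_bucket}"],
            },
            {
                # Object-level actions, including the delete that was missing.
                "Sid": "ReadWriteDeleteDataObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{data_bucket}/*"],
            },
            {
                # Lets the purge write its manifest files.
                "Sid": "WriteManifest",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{manifest_bucket}/*"],
            },
        ],
    }


policy = build_purge_policy("bucket-to-clean-path", "bucket-for-manifest-path")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's inline policy editor and narrowed to specific prefixes if the job only touches part of the bucket.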