How to delete old files deployed in S3 with CodePipeline
Hi, we have a CodePipeline that we use to deploy our static website. It is configured to trigger whenever a zip file is dropped in a specific S3 location, following this tutorial: https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-s3deploy.html
The problem is that we publish some documents to the website which, after some time, have to be deleted from it. With the current behaviour, we push a new zip file without the files that have to be deleted, but since the pipeline does not delete them from the bucket, they remain available. For example:
1. We deploy our website with a new job post. When a user goes to https://www.my-website.com/careers/senior-developer, they access the file careers/senior-developer.html in our bucket.
2. A few weeks later, we redeploy the website without that job post. Because the pipeline doesn't delete any files, a user can still go to https://www.my-website.com/careers/senior-developer and access the file.
What's the recommended way of dealing with this issue?
Thanks in advance
Yes, files are not deleted automatically from S3. As the note in the tutorial says: "Even if you delete files from the source repository, the S3 deploy action does not delete S3 objects corresponding to deleted files."
The right approach depends on the availability requirements of the website (I would need more details).
Assumption: you publish the whole website with all the documents every time.
Option 1: The best way to handle this situation is to clean up the bucket before each deploy. You could use the "build stage" that you skipped in the tutorial to issue an AWS CLI S3 command (https://docs.aws.amazon.com/cli/latest/reference/s3/):
aws s3 ... to clean up the bucket.
Tip: You could make the process smoother by putting CloudFront, with caching enabled, in front of the S3 origin that serves the website content. While the bucket is being cleaned up, CloudFront still holds the cached version and users can access it. Once the cache TTL expires, CloudFront routes requests to the S3 origin, where the deleted files are no longer available (which is what you expect).
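For illustration, here is a rough Python sketch of a cleanup step that could run before a deploy. Note that it is a variation on the suggestion above: instead of emptying the whole bucket, it removes only the keys that are missing from the new build. The function names are hypothetical, `s3_client` is assumed to be a boto3-style S3 client, and the build's file paths are assumed to match the S3 keys exactly. Listing the bucket contents and the build files is left out.

```python
from typing import Iterable, Iterator, List, Set


def stale_keys(bucket_keys: Iterable[str], build_files: Iterable[str]) -> List[str]:
    """Return keys that exist in the bucket but are absent from the new build."""
    keep: Set[str] = set(build_files)
    return sorted(k for k in bucket_keys if k not in keep)


def batches(keys: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Split keys into chunks; S3 DeleteObjects accepts at most 1000 keys per call."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def delete_keys(s3_client, bucket: str, keys: List[str]) -> int:
    """Delete keys through a boto3-style client; return how many were requested."""
    requested = 0
    for chunk in batches(keys):
        s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in chunk]},
        )
        requested += len(chunk)
    return requested
```

With the example from the question, `stale_keys` would be given the current bucket listing and the files in the new zip, and would return careers/senior-developer.html once the job post has been removed from the build.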
Option 2: You don't delete the whole website on S3. Instead, you compare the current build with the previous tag in Git, look for files and folders that were deleted, and have a script generate the list of files that need to be deleted from S3.
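A minimal Python sketch of the diff-based approach, assuming the script is fed the text output of something like `git diff --name-status <previous-tag> HEAD`. The function names and the `site_root` prefix are hypothetical. The mapping from repository paths to S3 keys depends on how your build lays out the site, so treat that part as an assumption.

```python
from typing import List


def deleted_paths(name_status_output: str) -> List[str]:
    """Collect paths that disappeared, based on `git diff --name-status` output."""
    removed = []
    for line in name_status_output.splitlines():
        # Fields are tab-separated: status, path, and for renames a second path.
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status = fields[0]
        # "D" marks a deletion; "R<score>" marks a rename, whose old path
        # (the first path field) no longer exists in the new build.
        if status == "D" or status.startswith("R"):
            removed.append(fields[1])
    return removed


def to_s3_keys(paths: List[str], site_root: str = "") -> List[str]:
    """Map repository paths to S3 keys by stripping a (hypothetical) site root.

    Paths outside the site root are ignored, since they were never deployed.
    """
    return [p[len(site_root):] for p in paths if p.startswith(site_root)]
```

The resulting keys could then be passed to whatever deletion step you use. Paths with unusual characters may be quoted by Git in this output, which this sketch does not handle.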
Thanks for the quick response and the options, Aleksandr.
The issue we will have with Option 1 is that our pipeline deploys the site to three different environments, so it looks like this: Source > Build > DeployToDev > DeployToQA > DeployToProd
In this case, we cannot do the cleanup of the three buckets in the Build stage, because it's possible that the transitions to prod have been disabled for QA to do some manual testing. I guess we can have a build stage before each deployment stage, and that can delete the files in the bucket just before the new deployment.
We already have CloudFront in place, so that will help us make the deployment smoother.
Option 2 feels a bit hacky in my opinion, so we will probably go for some variation of option 1.