How to delete old files deployed in S3 with CodePipeline
Hi, we have a CodePipeline that we use to deploy our static website. It is configured to be triggered whenever a zip file is dropped in a specific S3 location, following this tutorial: https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-s3deploy.html The problem is that we publish some documents to the website which, after some time, have to be removed from it. With the current behaviour, we push a new zip file without the files that should be deleted, but since the pipeline does not delete those objects from the bucket, they remain available.
e.g.
1. We deploy our website with a new job post. When a user goes to https://www.my-website.com/careers/senior-developer, they access the file careers/senior-developer.html in our bucket.
2. A few weeks later, we redeploy the website with that job post removed. Because the pipeline doesn't delete any files, a user can still go to https://www.my-website.com/careers/senior-developer and access the file.
What's the recommended way of dealing with this issue?
Thanks in advance
Yes - files are not deleted automatically from S3, as the tutorial notes: "Even if you delete files from the source repository, the S3 deploy action does not delete S3 objects corresponding to deleted files."
The best approach depends on the availability requirements of the website (more details would help), but here are two options.
Option 1:
Assumption: you publish the whole website with all the documents every time.
The best way to handle this situation is to clean up the bucket before each deploy. You could leverage the "build stage" that you skipped in the tutorial to run an AWS CLI S3 command (https://docs.aws.amazon.com/cli/latest/reference/s3/), aws s3 ..., to clean up the bucket.
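For illustration, a minimal sketch of such a cleanup step; my-website-bucket and ./build-output are placeholders, not names from your setup:

```bash
# Remove every object in the bucket before the deploy action runs.
# my-website-bucket is a placeholder; substitute your real bucket name.
aws s3 rm s3://my-website-bucket --recursive

# Alternatively, if the new site content is available locally in the build
# stage, a single sync with --delete uploads new/changed files and removes
# objects that no longer exist locally, avoiding a window where the bucket
# is empty:
aws s3 sync ./build-output s3://my-website-bucket --delete
```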
Tip: You could make the process smoother by putting CloudFront with caching in front of the S3 origin for the website content. In that case, during the "website" cleanup on S3, CloudFront still has the cached version and users can access it; once the cache TTL expires on CloudFront, requests route to the S3 origin and the deleted files are no longer available (which is what you expect).
Option 2
You don't delete the whole website on S3. Instead, you compare the diff between the current build and the previously deployed tag in Git, look for files/folders that were deleted, and via a script generate the list of objects to delete correspondingly from S3, as in the sketch below.
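A minimal sketch of that script, assuming the build runs in a full Git checkout; PREV_TAG and my-website-bucket are placeholders, and repository paths are assumed to map 1:1 to S3 object keys:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholders: the previously deployed Git tag and the target bucket.
PREV_TAG="v1.0.0"
BUCKET="my-website-bucket"

# List files deleted between the previous tag and the current commit;
# --diff-filter=D restricts the output to deletions only.
git diff --name-only --diff-filter=D "$PREV_TAG" HEAD |
while IFS= read -r path; do
  # Remove the matching object; assumes repo paths map 1:1 to S3 keys.
  aws s3 rm "s3://$BUCKET/$path"
done
```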
Thanks for the quick response and the options, Aleksandr.
The issue we will have with Option 1 is that our pipeline deploys the site to three different environments, so it looks like this: Source > Build > DeployToDev > DeployToQA > DeployToProd
In this case, we cannot do the cleanup of all three buckets in the Build stage, because the transition to prod may have been disabled so QA can do some manual testing. I guess we can have a cleanup stage before each deployment stage, and that can delete the files in the corresponding bucket just before the new deployment, something like the sketch below.
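For example, each per-environment cleanup step might run something like this, assuming each stage sets a BUCKET_NAME environment variable on its action (a hypothetical name, e.g. via the CodeBuild action's environment variables):

```bash
# BUCKET_NAME is assumed to be set per stage (dev/QA/prod), so one build
# project can clean whichever environment's bucket is about to be deployed to.
aws s3 rm "s3://${BUCKET_NAME}" --recursive
```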
We already have CloudFront in place, so that would help us make the deployment smoother.
Option 2 feels a bit hacky in my opinion, so we will probably go for some variation of option 1.
Thanks again.