How to delete old files deployed in S3 with CodePipeline

Hi, we have a CodePipeline that we use to deploy our static website. We have it configured to be triggered whenever a zip file is dropped in a specific S3 location (following https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-s3deploy.html). The problem is that we publish some documents to the website that will have to be deleted from it after some time. With the current behaviour, we push a new zip file without the files that have to be deleted, but since the pipeline does not delete them from the bucket, they remain available.

For example:

1. We deploy our website with a new job post. When a user goes to https://www.my-website.com/careers/senior-developer, they access the file careers/senior-developer.html in our bucket.
2. A few weeks later, we redeploy the website without that job post. Because the pipeline doesn't delete any files, a user can still go to https://www.my-website.com/careers/senior-developer and access the file.

What's the recommended way of dealing with this issue?

Thanks in advance

1 Answer

Yes - files are not deleted automatically from S3. As the tutorial notes: "Even if you delete files from the source repository, the S3 deploy action does not delete S3 objects corresponding to deleted files."

The best approach depends on the availability requirements of the website (more details would help), but here are two options.

Option 1:

Assumption: you publish the whole website with all the documents every time.
The best way to handle this situation is to clean up the bucket before each deploy. You could use the "build stage" that you skipped in the tutorial to run an AWS CLI S3 command (aws s3 ..., see https://docs.aws.amazon.com/cli/latest/reference/s3/) that cleans up the bucket.
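
As a minimal sketch of that cleanup step, assuming the build stage can run Python with boto3 installed and has permission to list and delete objects in the target bucket, something like the following could empty the bucket before the deploy action runs. The bucket name and the function names are hypothetical. The AWS CLI equivalent would be along the lines of aws s3 rm s3://my-website-bucket --recursive.

```python
from typing import Iterable, Iterator, List


def batches(keys: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of at most `size` keys.

    The S3 DeleteObjects call accepts up to 1000 keys per request.
    """
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def empty_bucket(bucket: str) -> int:
    """Delete every object in `bucket`; return how many keys were sent for deletion."""
    import boto3  # imported lazily so `batches` has no third-party dependency

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    # Collect all keys first, then delete them in batches.
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket)
        for obj in page.get("Contents", [])
    ]
    for batch in batches(keys):
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
    return len(keys)


# Hypothetical usage in the build stage:
# empty_bucket("my-website-bucket")
```

Note that on a bucket with versioning enabled, this only adds delete markers; earlier object versions would remain in the bucket.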

Tip: You could make the process smoother by putting CloudFront in front of the site, with the S3 bucket as the origin and caching enabled. During the cleanup on S3, CloudFront still has the cached version and users can access it. Once the cache TTL expires, CloudFront goes back to the S3 origin, where the deleted files are no longer available (which is what you expect).

Option 2:

You don't delete the whole website on S3. Instead, you diff the current build against the previous tag in Git, look for files and folders that were deleted, and have a script generate the list of objects that need to be deleted from S3 accordingly.
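
A rough sketch of such a script, assuming the site is built from a Git repository whose file paths map one-to-one to S3 object keys (an assumption, since the pipeline in the question starts from a zip file) and that the previous release is tagged. The function names and the tag are hypothetical.

```python
import subprocess
from typing import List


def deleted_paths(name_status: str) -> List[str]:
    """Return paths that `git diff --name-status` output marks as deleted.

    Renamed files (status R...) are included under their old path,
    since the old key would otherwise stay in the bucket.
    """
    paths: List[str] = []
    for line in name_status.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        status = fields[0]
        if status == "D" or status.startswith("R"):
            paths.append(fields[1])  # for renames, fields[1] is the old path
    return paths


def deleted_since(previous_tag: str, repo_dir: str = ".") -> List[str]:
    """Run git and list files removed between `previous_tag` and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-status", previous_tag, "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return deleted_paths(result.stdout)


# Hypothetical usage: keys to remove from the bucket after this release.
# stale_keys = deleted_since("release-41")
```

The resulting list could then be passed to an S3 delete call or to aws s3 rm, one key at a time. Paths containing unusual characters are quoted by Git in this output format, so a production script would likely need to handle that case.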

EXPERT
answered 5 days ago
  • Thanks for the quick response and the options, Aleksandr.

    The issue we will have with Option 1 is that our pipeline is deploying the site to three different environments, so it will look like this: Source > Build > DeployToDev > DeployToQA > DeployToProd

    In this case, we cannot do the cleanup of the three buckets in the Build stage, because it's possible that the transitions to prod have been disabled for QA to do some manual testing. I guess we can have a build stage before each deployment stage, and that can delete the files in the bucket just before the new deployment.

    We already have CloudFront in place, so that would help us make the deployment smoother.

    Option 2 feels a bit hacky in my opinion, so we will probably go for some variation of option 1.

    Thanks again.
