How to delete old files deployed in S3 with CodePipeline


Hi, we have a CodePipeline that we use to deploy our static website. It is configured to be triggered whenever a zip file is dropped in a specific S3 location, following this tutorial: https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-s3deploy.html

The problem is that we publish documents to the website which after some time have to be deleted from it. With the current behaviour we push a new zip file without the files that should be removed, but since the pipeline does not delete those files, they remain available.

e.g.

  1. We deploy our website with a new job post. When a user goes to https://www.my-website.com/careers/senior-developer, they access the file careers/senior-developer.html in our bucket.
  2. A few weeks later, we redeploy the website with that job post removed. Because the pipeline doesn't delete any files, a user can still go to https://www.my-website.com/careers/senior-developer and access the file.

What's the recommended way of dealing with this issue?

Thanks in advance

Asked 2 years ago · 2,933 views
3 Answers

Yes, files are not deleted automatically from S3. As the tutorial notes: "Even if you delete files from the source repository, the S3 deploy action does not delete S3 objects corresponding to deleted files."

The best approach depends on the availability requirements of the website (more details would help); here are two options.

Option 1:

Assumption: you publish the whole website, with all its documents, every time.

The best way to handle this situation is to clean up the bucket before each deploy. You could use the build stage that you skipped in the tutorial to issue an [AWS CLI S3 command](https://docs.aws.amazon.com/cli/latest/reference/s3/) (aws s3 ...) that cleans up the bucket, as sketched below.
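For example, a minimal sketch of a cleanup step that could run in that build stage (the bucket name my-website-bucket and the ./dist output directory are assumptions, not values from the thread):

```sh
#!/bin/sh
# Wipe the bucket, then let the pipeline's deploy action upload
# the new build as usual. The bucket name is a placeholder.
aws s3 rm "s3://my-website-bucket" --recursive

# Alternative: upload and clean up in one step. --delete removes
# any object in the bucket that is no longer in the build output.
# aws s3 sync ./dist "s3://my-website-bucket" --delete
```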

Tip: You can make the process smoother by putting CloudFront, with caching enabled, in front of S3 as the origin for the website content. During the cleanup on S3, CloudFront still serves the cached version and users can access that; once the cache TTL expires, CloudFront routes back to the S3 origin and the deleted files are no longer available, which is what you want.
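If you don't want to wait for the TTL to expire, invalidating the distribution's cache after the deploy has the same effect; a hedged sketch (the distribution ID is a placeholder):

```sh
#!/bin/sh
# Drop CloudFront's cached copies so the next request goes back
# to the cleaned-up S3 origin. The distribution ID is made up.
aws cloudfront create-invalidation \
  --distribution-id E1234567890ABC \
  --paths "/*"
```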

Option 2

You don't wipe the whole website on S3. Instead, you compare the current build against the previous tag in Git, look for files/folders that were deleted, and generate, via a script, the list of objects that need to be deleted from S3 accordingly.
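A minimal sketch of such a script, assuming the previous deploy is tagged previous-release and the bucket is named my-website-bucket (both names are hypothetical):

```sh
#!/bin/sh
# List files deleted between the previous release tag and HEAD
# (--diff-filter=D keeps only deletions), then remove the
# corresponding objects from the bucket.
git diff --name-only --diff-filter=D previous-release..HEAD |
while read -r path; do
    aws s3 rm "s3://my-website-bucket/${path}"
done
```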

AWS
EXPERT
Answered 2 years ago
  • Thanks for the quick response and the options, Aleksandr.

    The issue we will have with Option 1 is that our pipeline deploys the site to three different environments, so it looks like this: Source > Build > DeployToDev > DeployToQA > DeployToProd

    In this case, we cannot do the cleanup of the three buckets in the Build stage, because the transition to prod may have been disabled so that QA can do some manual testing. I guess we can have a build stage before each deployment stage that deletes the files in the bucket just before the new deployment (see the sketch after this comment).

    We already have CloudFront in place, so that will help us make the deployment smoother.

    Option 2 feels a bit hacky in my opinion, so we will probably go for some variation of option 1.

    Thanks again.
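A minimal sketch of that per-stage cleanup, run as a CodeBuild action placed immediately before each deploy stage (the TARGET_BUCKET environment variable is an assumption, set per environment on the action):

```sh
#!/bin/sh
# Hypothetical cleanup step for Dev, QA and Prod: each deploy
# stage is preceded by a CodeBuild action that runs this with
# TARGET_BUCKET set to that environment's bucket.
aws s3 rm "s3://${TARGET_BUCKET}" --recursive
```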


I had to create a custom AWS CDK construct that replaces S3DeployAction to deploy to S3 and delete old files. The construct diffs the changes using a Lambda function to delete old files and upload new files to S3. You can use the TypeScript construct from here.

Answered 1 year ago

I have been working with AWS Amplify as a self-contained solution and an alternative for a static website. Every commit/PR on the repository starts a pipeline. It is important to understand its storage and service costs in comparison with S3. There is a quick tutorial at https://aws.amazon.com/getting-started/hands-on/host-static-website/ if you are interested in trying it out. Hope that helps!

AWS
Answered 1 year ago
