Solution to delete new ECR images, from PutImage actions, that contain CRITICAL vulnerabilities

0

I'm trying to build a solution that captures each new ECR image push, checks the ECR scan result for CRITICAL findings, and deletes the image (based on image tag) if any are found.

There are existing 'bad' images that we can't clean up yet, which is why I need this solution to delete only newly created images.

Workflow will be like this:

Docker push -> EventBridge captures the PutImage API call and triggers a Lambda function -> the Lambda function parses the input data, gets the image ID, and calls the DescribeImageScanFindings API to get the CRITICAL finding count; if the count is > 0, it proceeds to delete that image.

Now here's the problem: when a new image is pushed and the Lambda checks for the scan result, the result is either not available yet or the scan is still in progress (the CRITICAL finding has not been found yet), and there is no way to tell whether the scan is still running. The response from the DescribeImageScanFindings API has a property called imageScanStatus, but it reflects the status of the scan configuration; the value is always 'ACTIVE' for me. If I keep calling DescribeImageScanFindings, the results change each time while imageScanStatus stays ACTIVE.

We're using continuous scanning and the StartImageScan API is disabled, so there is no way to scan on demand and check the status.

Is there a way to get the scan status after a new image is pushed?

2 Answers
0

Hi,

Since you have continuous scanning enabled, you can change the approach slightly. Rather than triggering EventBridge -> Lambda on image push, you can trigger the same logic on completion of the scan. Details about the EventBridge event emitted on scan completion can be found in the section "Event for an initial image scan (enhanced scanning)" in this document. The event includes a summary of findings by severity.
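As a rough sketch of that approach, the Lambda could decide what to delete from the scan-completion event alone. The field names below ("scan-status", "finding-severity-counts", "image-digest", and "repository-name" holding an ARN) follow the enhanced-scanning event as I understand it from the ECR documentation, so please verify them against a real event. The function names and SAMPLE_EVENT are hypothetical placeholders.

```python
from typing import Any, Dict, Optional


def repository_name(detail: Dict[str, Any]) -> str:
    """Return the bare repository name from the event detail.

    Assumption: "repository-name" holds the repository ARN, so everything
    after ":repository/" is the name. A plain name is returned unchanged.
    """
    value = detail["repository-name"]
    marker = ":repository/"
    return value.split(marker, 1)[1] if marker in value else value


def image_to_delete(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the repository and digest to delete, or None to keep the image."""
    if event.get("source") != "aws.inspector2":
        return None
    if event.get("detail-type") != "Inspector2 Scan":
        return None
    detail = event.get("detail", {})
    if detail.get("scan-status") != "INITIAL_SCAN_COMPLETE":
        return None
    # Any CRITICAL finding at all marks the image for deletion.
    critical = detail.get("finding-severity-counts", {}).get("CRITICAL", 0)
    if critical < 1:
        return None
    return {
        "repositoryName": repository_name(detail),
        "imageDigest": detail["image-digest"],
    }


# Placeholder event for local experiments; the account, repository, and digest are made up.
SAMPLE_EVENT = {
    "source": "aws.inspector2",
    "detail-type": "Inspector2 Scan",
    "detail": {
        "scan-status": "INITIAL_SCAN_COMPLETE",
        "repository-name": "arn:aws:ecr:us-east-1:123456789012:repository/amazon/amazon-ecs-sample",
        "finding-severity-counts": {"CRITICAL": 7, "HIGH": 61, "MEDIUM": 62, "TOTAL": 158},
        "image-digest": "sha256:" + "0" * 64,
        "image-tags": ["latest"],
    },
}

if __name__ == "__main__":
    print(image_to_delete(SAMPLE_EVENT))
```

Keeping the decision in a pure function makes it easy to test locally before wiring it to an actual delete call.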

--Syd

Syd
answered 2 years ago
  • Thanks for your suggestion. I tried the "Inspector2 Scan" event pattern from inspector2, and it looks usable with the scan-status "INITIAL_SCAN_COMPLETE".

    Do you also happen to know whether "INITIAL_SCAN_COMPLETE" is emitted only for newly pushed images? I'm worried that this event might also be triggered for existing images, which could cause the Lambda to delete them (old images). That is not ideal for me right now; I only want to delete new images with findings.

  • I'm not sure whether INITIAL_SCAN_COMPLETE applies only to new images or also to images already in the repo that are being scanned for the first time. But you can always look up the image creation date (the imagePushedAt value in describe images) and delete only images / tags pushed to the repo after a specific date. Not the perfect answer, but it should help achieve the desired result.

  • Update - I did some basic testing on an ECR repo that had two images, one manually scanned and the other not. After switching to continuous scanning, both images were scanned by Inspector and I received INITIAL_SCAN_COMPLETE events for both. So I infer that you can expect a one-time INITIAL_SCAN_COMPLETE event for existing images that had not yet been scanned by AWS Inspector.

  • Hi Syd,

    This 'almost' works for my use case, but there's still one problem: when I push a new image with 2 different tags (latest and a random name, usually the commit SHA), only one of them gets an INITIAL_SCAN_COMPLETE event, which means the other is left unchecked by my Lambda. The goal is to prevent images with CRITICAL findings from being stored in ECR.

  • To test, I pushed three images with different tags to the same repo and received INITIAL_SCAN_COMPLETE events for all three. I can only think of two reasons: 1) a delay in AWS initiating the scan, but that should happen eventually; 2) the image that was pushed to the repo is only a new tag on an existing image that already has some other tag, i.e. the SHA matches that of an existing image. Other than that, even I'm clueless ;)
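The imagePushedAt cutoff suggested in the comments above could be combined with a delete by digest, roughly as sketched below. The `ecr` argument is assumed to behave like `boto3.client("ecr")`; `CUTOFF` and `delete_if_new` are hypothetical names, and the cutoff date is only an example. As far as I know, deleting by digest removes the image together with all of its tags, which may also help when two tags point at the same digest and only one scan event arrives.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical cutoff: only images pushed at or after this moment may be deleted.
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)


def delete_if_new(ecr: Any, repository: str, digest: str, cutoff: datetime = CUTOFF) -> bool:
    """Delete the image only when it was pushed at or after the cutoff.

    `ecr` is expected to behave like boto3.client("ecr").
    Returns True when a delete request was sent.
    """
    image_id = {"imageDigest": digest}
    response = ecr.describe_images(repositoryName=repository, imageIds=[image_id])
    details = response["imageDetails"]
    if not details or details[0]["imagePushedAt"] < cutoff:
        return False  # unknown or pre-existing image: leave it alone
    # Deleting by digest removes the image along with every tag that points at it.
    ecr.batch_delete_image(repositoryName=repository, imageIds=[image_id])
    return True
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real repository.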

0

About 2-3 months before continuous scanning was a thing, I wrote ECR Scan Reporter | docs, which I use mostly with the ECS definitions scanning (so I didn't scan all the images, only the ones I used).

I specifically used SQS as the middleman here so that I can retry when there are too many API calls (which definitely happened in repositories with hundreds of images). It also means I am truly waiting on events: there are no direct / synchronous dependencies between the trigger of the scan, the execution of the scan, and the evaluation of the scan.
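To illustrate the SQS-in-the-middle idea (this is not the code from ECR Scan Reporter), here is a minimal sketch of a Lambda that consumes scan events from a queue and reports only the failed messages back for retry. It assumes the event source mapping has partial batch responses (ReportBatchItemFailures) enabled and that each message body is the JSON event; `handle_sqs_batch` and `process` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict, List


def handle_sqs_batch(
    event: Dict[str, Any],
    process: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Run `process` on every SQS record and collect the ones that failed.

    Messages listed under "batchItemFailures" go back to the queue and are
    retried later, for example after an API throttling error.
    """
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Any error (throttling included) marks just this message for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

With this shape, one throttled call does not force the whole batch to be reprocessed, and the queue's redrive policy decides how many retries a message gets.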

answered 2 years ago
