Running AWS Macie jobs based on S3 bucket events


Hi,

We want to run a Macie job on specific S3 put events: for example, when a user uploads a file to an S3 bucket, Macie should scan that file for sensitive information. Here is our plan:
-- Create a Macie ONE_TIME job every time a user uploads a new file to the S3 bucket, using tags to select the files the new job scans.
-- We expect 400-500 files per month, so 400-500 Macie jobs per month.
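The per-upload plan can be sketched as an S3-triggered handler that builds a ONE_TIME classification job request scoped to a single object tag. This is a minimal sketch, not the poster's tested code: the tag key `macie-scan`, the job-name format, and the assumption that each upload carries a unique tag value are all hypothetical. The dictionary follows the request shape of the boto3 `macie2` client's `create_classification_job` call; the call itself is left as a comment so the sketch runs without AWS credentials.

```python
from __future__ import annotations

import uuid
from urllib.parse import unquote_plus

# Hypothetical tag key: assumes each uploaded object is tagged with a
# unique value under this key (for example, an upload ID).
SCAN_TAG_KEY = "macie-scan"


def parse_s3_put_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys in S3 event notifications are URL-encoded ("+" for spaces).
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def build_one_time_job(account_id: str, bucket: str, tag_value: str) -> dict:
    """Keyword arguments for a ONE_TIME Macie job limited to one tag value."""
    return {
        # Deterministic token: retrying the same upload reuses the same token,
        # so Macie can treat the retried request as idempotent.
        "clientToken": str(uuid.uuid5(uuid.NAMESPACE_URL, f"s3://{bucket}/{tag_value}")),
        "jobType": "ONE_TIME",
        "name": f"scan-{bucket}-{tag_value}",
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": [bucket]}],
            "scoping": {
                "includes": {
                    "and": [
                        {
                            "tagScopeTerm": {
                                "comparator": "EQ",
                                "key": "TAG",
                                "tagValues": [{"key": SCAN_TAG_KEY, "value": tag_value}],
                                "target": "S3_OBJECT",
                            }
                        }
                    ]
                }
            },
        },
    }


# In a Lambda handler the request would be sent with boto3, for example:
#   boto3.client("macie2").create_classification_job(**request)
```

Note that the S3 event does not include object tags, so the handler would need to know the tag value from the upload flow or read it with S3's GetObjectTagging API.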

I have tested the above solution and it works, but I am not sure whether it is the best approach. I did not find any limit on the number of Macie jobs per account. Has anyone tried this approach? Any recommendations?

1 Answer

Hi there,

This approach may work for low volumes, with a few caveats.

  • Macie's create-classification-job API has a very low TPS limit (roughly one call per 10 seconds, with some bursting). If several files are uploaded within a short time, job creation will be throttled, and handling that adds code complexity.
  • S3 has no way of saying "get me all objects tagged X". To determine which objects to classify based on a tag, Macie has to iterate over every object in the bucket and call S3's GetObjectTagging API for each one. If your landing-zone bucket is large and you rely on tags, this can cause many extra S3 API calls and slow down the job.
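One common way to absorb the throttling described in the first point is to retry job creation with exponential backoff and full jitter. This is a generic sketch: `create_job` stands for whatever wraps the `create_classification_job` call, and `is_throttled` is a caller-supplied check; neither is part of any AWS SDK. The 10-second base delay is an assumption chosen to match the limit quoted above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    create_job: Callable[[], T],
    is_throttled: Callable[[Exception], bool],
    max_attempts: int = 6,
    base_delay: float = 10.0,  # assumption: roughly the "one call per 10 s" limit
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call create_job, retrying throttling errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return create_job()
        except Exception as exc:
            # Re-raise non-throttling errors immediately, and give up on the last attempt.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, base_delay * 2**attempt].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


# Example wiring with boto3 (not executed here); botocore's ClientError
# carries the error code in exc.response["Error"]["Code"]:
#   macie = boto3.client("macie2")
#   throttled = lambda e: getattr(e, "response", {}).get("Error", {}).get("Code") == "ThrottlingException"
#   call_with_backoff(lambda: macie.create_classification_job(**request), throttled)
```

botocore's built-in retry modes (`standard` and `adaptive`, set through `botocore.config.Config`) may be enough on their own; an explicit wrapper like this is mainly useful when you want a longer base delay than the SDK defaults.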

In general, Macie jobs are relatively heavyweight: they are optimized for running over large volumes of data, not for real-time or just-in-time data flows. Some suggestions:

  • Run jobs less frequently, based on your SLA. For instance, with a 4-hour processing SLA, you can create a Macie job every 4 hours that processes all objects that landed in that window. This takes advantage of Macie's batch efficiencies and avoids throttling.
  • Use prefixes or other filters instead of tags to identify which objects to classify. For instance, instead of tagging an object with "date:5/19/2023", which incurs the overhead described above, you can put objects under a 2023/05/19/ prefix and scan that prefix, since prefix filtering within S3 is extremely efficient.
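The two suggestions combine naturally: write objects under time-based prefixes and, on your own timer (an Amazon EventBridge schedule invoking a Lambda function is one option, not something this answer prescribes), create a single ONE_TIME job for the prefixes of the last window. The sketch assumes a hypothetical `YYYY/MM/DD/HH/` key layout in UTC and relies on Macie's `OBJECT_KEY` scope term accepting `STARTS_WITH` with several values joined by OR logic; check both against the current Macie API reference before relying on them.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def window_prefixes(window_end: datetime, hours: int) -> list[str]:
    """Hourly key prefixes (YYYY/MM/DD/HH/, UTC) for the `hours` full hours before window_end."""
    # Truncate to the start of the current hour; that partial hour goes to the next run.
    end = window_end.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return [
        (end - timedelta(hours=h)).strftime("%Y/%m/%d/%H/")
        for h in range(hours, 0, -1)
    ]


def build_window_job(account_id: str, bucket: str, prefixes: list[str], name: str) -> dict:
    """Keyword arguments for one ONE_TIME Macie job covering every given prefix."""
    return {
        "clientToken": name,  # same window name -> same token, so a retried call is idempotent
        "jobType": "ONE_TIME",
        "name": name,
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": [bucket]}],
            "scoping": {
                "includes": {
                    "and": [
                        {
                            "simpleScopeTerm": {
                                "comparator": "STARTS_WITH",
                                "key": "OBJECT_KEY",
                                "values": prefixes,  # assumption: OBJECT_KEY values are OR-ed
                            }
                        }
                    ]
                }
            },
        },
    }


# The request would then be sent with boto3, for example:
#   boto3.client("macie2").create_classification_job(**build_window_job(...))
```

For example, a run at 12:30 UTC on 2023-05-19 with `hours=4` covers the prefixes `2023/05/19/08/` through `2023/05/19/11/`; objects that land after 12:00 are picked up by the next run. One job per window also keeps the job-creation rate far below the throttling limit.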

Hope that helps!

AWS
answered a year ago
EXPERT
reviewed a year ago
  • Hi @alatech

    Thank you so much for the quick response. I understand Macie has a very low TPS limit, but we expect at most 40-50 file transfer requests per day. We have an S3 lifecycle rule that removes objects after 30 days, so at any given time the bucket will hold at most 1,500 files.

    Macie does not offer an hourly schedule option; the most frequent schedule is once a day, which does not meet our requirement. Because these files support critical use cases, we cannot tolerate processing delays. Given these Macie limitations, I think we will have to consider alternative options.

    Do you think it would be better to explore an alternative to the S3 bucket?

    Thanks, Kiran
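As a rough check on the volumes in the comment, using only figures quoted in this thread (at most 50 uploads per day, 30-day lifecycle expiry, and roughly one create-classification-job call per 10 seconds):

```latex
\begin{align*}
N_{\text{stored}} &\le 50\ \tfrac{\text{objects}}{\text{day}} \times 30\ \text{days} = 1500\ \text{objects} \\
R_{\text{create}} &\approx \frac{86\,400\ \text{s/day}}{10\ \text{s/call}} = 8640\ \text{calls/day} \gg 50\ \text{uploads/day}
\end{align*}
```

So the daily average is far below the quoted limit; throttling would come from uploads arriving within seconds of each other rather than from the daily total.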
