By default, Amazon CloudFront standard logs capture potentially sensitive data in some of their fields. Due to privacy concerns, I want to remove this data from my logs.
Note: This article uses the example of Client-IP (c-ip) field.
CloudFront logs capture c-ip as one of their fields by default. There are three ways to remove c-ip from your logs.
- Trigger an AWS Lambda function that removes the field on the log delivery into Amazon Simple Storage Service (Amazon S3).
- Have an Amazon Elastic Compute Cloud (Amazon EC2) process that runs at certain intervals to remove the field.
- Use CloudFront real-time logs, and remove the sensitive field before you send the log data to Amazon S3.
Trigger a Lambda function
One way to remove the c-ip field is to use Amazon S3 notification events. Configure your bucket so that when CloudFront delivers a log file to the Amazon S3 bucket, the bucket invokes a Lambda function.
Create a Lambda function
1. Open the AWS Lambda console.
2. Under Functions, create a new Lambda function that has the following configurations:
- Uses the object name from the Amazon S3 event.
- Gets the object from the S3 bucket.
3. Remove the c-ip column, or replace the values with anonymized data.
Note: Replace the values to keep the same format in case other applications process the logs further.
4. Save and upload the log back to Amazon S3.
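The steps above can be sketched as the following Lambda function (Python). This is a minimal example, not a definitive implementation: the "original" and "processed" prefix names are the ones suggested later in this article, and the function assumes gzipped standard log files. It parses the #Fields: directive to find the c-ip column and replaces each value with 0.0.0.0 so that the column layout stays intact:

```python
import gzip

ANONYMIZED_IP = "0.0.0.0"  # keep the column so downstream parsers still work


def anonymize_c_ip(log_text):
    """Replace the c-ip column in CloudFront standard log text with a fixed value."""
    out_lines = []
    c_ip_index = None
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # The directive lists the field names, for example:
            # "#Fields: date time x-edge-location sc-bytes c-ip ..."
            fields = line[len("#Fields:"):].split()
            c_ip_index = fields.index("c-ip")
            out_lines.append(line)
        elif line.startswith("#") or c_ip_index is None:
            out_lines.append(line)  # other directives, or no header seen yet
        else:
            cols = line.split("\t")  # data lines are tab-separated
            cols[c_ip_index] = ANONYMIZED_IP
            out_lines.append("\t".join(cols))
    return "\n".join(out_lines) + "\n"


def lambda_handler(event, context):
    import boto3  # provided by the Lambda runtime

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]  # object under the "original/" prefix
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        text = gzip.decompress(body).decode("utf-8")
        cleaned = gzip.compress(anonymize_c_ip(text).encode("utf-8"))
        # Deliver to a different prefix to avoid a recursive invocation
        s3.put_object(
            Bucket=bucket,
            Key=key.replace("original/", "processed/", 1),
            Body=cleaned,
        )
```

To drop the column entirely instead of anonymizing it, remove the field name from the #Fields: directive and pop the column from each data line.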
Create a new event
1. In the logs target bucket, go to Properties.
2. Under Event notifications, create a new event.
3. Select the event type Put, and the destination Lambda function.
4. Select the Lambda function that you created in the previous section, and then choose Save.
Important: To avoid a recursive invocation (infinite loop) with your Lambda function, perform the following actions:
- Have your CloudFront logs delivered to an initial staging prefix. For example, "original".
- Have the Amazon S3 event triggered on that prefix only.
- Have the Lambda function deliver the logs into a different prefix. For example, "processed".
If you deliver the logs into the same prefix, the Lambda function triggers again and creates a recursive invocation. For more information, see Avoiding recursive invocation with Amazon S3 and AWS Lambda.
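As a sketch, the notification configuration with the prefix filter might look like the following (Python; the bucket name and function ARN are placeholders). The filter limits invocations to objects under the "original" prefix:

```python
# Placeholder function ARN for illustration
notification_config = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:strip-c-ip",
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {
                "Key": {
                    # Invoke only for log files that CloudFront delivers under "original/"
                    "FilterRules": [{"Name": "prefix", "Value": "original/"}]
                }
            },
        }
    ]
}


def apply_notification(bucket_name):
    """Apply the configuration; requires AWS credentials and the boto3 SDK."""
    import boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=notification_config,
    )
```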
Note: To keep Amazon S3 costs low, set up an Amazon S3 Lifecycle policy to expire the original logs after a certain time period.
Have an Amazon EC2 process
Use Amazon EventBridge to create a scheduled rule (cron) that launches an EC2 instance and processes the log files at a scheduled recurrence. For example, one time per day. When the process is done, stop the EC2 instance until the next recurrence to save on costs.
1. Configure EventBridge and Lambda to start an EC2 instance at a given time. For more information, see How do I stop and start Amazon EC2 instances at regular intervals using Lambda?
2. On the EC2 instance, deploy code that downloads the logs for a certain time period, such as a full day. To process the logs, remove the c-ip column, or replace the column values with anonymized data. Then, upload the processed logs back to the S3 bucket.
Optional: Merge all the processed logs into a single file to save on Amazon S3 Lifecycle transition costs. This process is helpful if you intend to store the logs for long time periods.
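A minimal sketch of the daily batch job follows (Python with boto3). It assumes the "original" and "processed" prefixes from the previous section, and that CloudFront standard log file names begin with the distribution ID followed by the date. It lists one day's gzipped log files, drops the c-ip column, and uploads a single merged file:

```python
import gzip


def strip_c_ip(log_text):
    """Drop the c-ip column from CloudFront standard log text."""
    out, idx = [], None
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            idx = fields.index("c-ip")
            fields.pop(idx)  # remove c-ip from the header too
            out.append("#Fields: " + " ".join(fields))
        elif line.startswith("#") or idx is None:
            out.append(line)  # other directives, or no header seen yet
        else:
            cols = line.split("\t")
            cols.pop(idx)
            out.append("\t".join(cols))
    return "\n".join(out) + "\n"


def process_day(bucket, dist_id, day):
    """Download one day's logs, strip c-ip, and upload a single merged file.

    `day` is an ISO date string, for example "2024-01-01". Requires AWS
    credentials when run on the instance.
    """
    import boto3

    s3 = boto3.client("s3")
    # Standard log file names look like "<dist-id>.<YYYY-MM-DD-HH>.<hash>.gz"
    prefix = f"original/{dist_id}.{day}"
    merged = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            merged.append(strip_c_ip(gzip.decompress(body).decode("utf-8")))
    s3.put_object(
        Bucket=bucket,
        Key=f"processed/{dist_id}.{day}.gz",
        Body=gzip.compress("".join(merged).encode("utf-8")),
    )
```

Note that the merged file repeats the #Version: and #Fields: directives once per source file; strip the repeats if your log analysis tooling requires a single header.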
Use Kinesis Data Firehose
Use CloudFront real-time logs to select the fields that you want to save. Later, have Amazon Kinesis Data Firehose send the log data to Amazon S3.
When you configure CloudFront real-time logs, you choose from a list of fields to include in each real-time log record. Each log record contains up to 40 fields. You can receive all the available fields, or only the fields that you need to monitor and analyze performance. Deactivate the c-ip field to exclude it from your logs.
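As a sketch, creating a real-time log configuration that omits c-ip might look like the following (Python with boto3; the field subset, configuration name, sampling rate, and ARNs are example values, not prescribed ones):

```python
# Example subset of real-time log fields; c-ip is intentionally omitted
FIELDS = [
    "timestamp",
    "sc-status",
    "sc-bytes",
    "cs-method",
    "cs-uri-stem",
    "x-edge-location",
    "time-taken",
]


def create_log_config():
    """Create the configuration; requires AWS credentials and the boto3 SDK."""
    import boto3

    cloudfront = boto3.client("cloudfront")
    return cloudfront.create_realtime_log_config(
        Name="no-client-ip",  # hypothetical configuration name
        SamplingRate=100,     # log 100% of requests
        Fields=FIELDS,
        EndPoints=[
            {
                "StreamType": "Kinesis",
                "KinesisStreamConfig": {
                    # Placeholder ARNs for the Kinesis data stream and its role
                    "RoleARN": "arn:aws:iam::111122223333:role/cf-realtime-logs",
                    "StreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/cf-logs",
                },
            }
        ],
    )
```

The configuration sends records to a Kinesis data stream; configure Kinesis Data Firehose to read from that stream and deliver the data to Amazon S3.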
Note: Due to the use of Amazon Kinesis Data Streams, this option can get expensive. Consider the other two options (Trigger a Lambda function or Have an Amazon EC2 process) for a more cost-effective solution.