How to make AWS Lambda logs GDPR compliant?

0

If an AWS Lambda function performs logging in order to debug activities performed by a user, and that user later submits a Right to Erasure request, there does not seem to be a way to carve out the relevant log entries, even if each AWS Lambda function invocation only performs operations for a single user. This is because all AWS Lambda functions dump data into common log streams, which may only be deleted as a whole.

This effectively makes the logging functionality of AWS Lambda functions non-GDPR compliant. At most, it can be used to store logs that do not assist in tracking and debugging user-related activities, which rules out cases such as a user wishing to trace why certain changes took place for their account.

Is there any advice on an alternative way to perform user-related logging in AWS Lambda functions so that user-related logs may subsequently be deleted on demand?

Update - clarification on GDPR requirements: The Right to Erasure requirement of the GDPR mandates that compliant software must allow all traces of personally identifiable information related to a particular person to be removed. Assuming an AWS Lambda function performs user-driven operations such as data retrieval or modification, any related logs stored to assist with future debugging or to serve as an audit trail cannot be erased selectively, because of the hard limitation that AWS Lambda functions store logs in log streams shared with other requests handled by the same AWS Lambda virtual machine.

Potential Solution: As there is no mechanism to excise log entries related to a single AWS Lambda function invocation, seemingly the only alternative to remain compliant is to use pseudonymization and, on an erasure request, erase the mapping between the user identifier and the pseudonym used in the logs. The logs would stay, but the entries related to the erased user would no longer be traceable back to that user.
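A minimal sketch of that pseudonymization idea, in Python. The `PseudonymStore` class and its method names are hypothetical, and the in-memory dict only stands in for whatever durable table would really hold the mapping. Pseudonyms are random rather than derived from the user ID, so that they cannot be recomputed once the mapping is gone.

```python
import secrets


class PseudonymStore:
    """Hypothetical mapping store; a dict stands in for a durable table."""

    def __init__(self):
        self._by_user = {}

    def pseudonym_for(self, user_id):
        # Reuse the existing pseudonym so one user's log entries stay correlated.
        if user_id not in self._by_user:
            self._by_user[user_id] = "u-" + secrets.token_hex(8)
        return self._by_user[user_id]

    def lookup(self, pseudonym):
        # Reverse lookup for debugging; returns None once the mapping is erased.
        for user_id, known in self._by_user.items():
            if known == pseudonym:
                return user_id
        return None

    def erase(self, user_id):
        # Right to Erasure: drop the mapping, leave the log entries untouched.
        return self._by_user.pop(user_id, None) is not None
```

Only the pseudonym would be written to the logs. This only helps if the log messages themselves carry no other identifying data about the user.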

  • Hi NicM.

    What would be the requirements to make the log GDPR compliant? I'm not familiar with the details so if you could list what the expectation is it would be easier to formulate a possible solution.

  • @Jose Guay - please see update, thanks!

2 Answers
1
Accepted Answer

NicM,

Without knowing how your application works and how your Lambda function logs information, I would think the following might be worth looking into:

  • Check CloudWatch Log Streams and Log Groups and how to programmatically add logs to CloudWatch Logs so you have full control of what gets logged.
  • Try to tie some sort of identifier to the user and add the identifier to each and every log entry related to the user. It could very well be the username or another type of ID. If no such identifier exists, you could probably use a DynamoDB table to store one for quick reference.
  • In CloudWatch Logs you can set a retention period if you don't need the logs after some time so they get discarded automatically.
  • Identifying each log entry with a link to the user will greatly simplify the identification and removal of the log entries when needed.
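As a rough illustration of tagging every entry with an identifier, the sketch below uses Python's standard `logging` module to emit JSON lines carrying a `subject_id` field. The field name, `JsonFormatter`, and `logger_for` are illustrative choices, not something Lambda or CloudWatch Logs requires.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line that includes the subject identifier."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Set by the LoggerAdapter below; None for untagged records.
            "subject_id": getattr(record, "subject_id", None),
        })


def logger_for(subject_id, name="app"):
    # The adapter attaches subject_id to every record logged through it.
    return logging.LoggerAdapter(logging.getLogger(name), {"subject_id": subject_id})


# Usage inside a handler (assumes a handler using JsonFormatter is installed):
#   log = logger_for("u-3f9a1c")
#   log.info("profile updated")
```

With one JSON object per line, entries belonging to a given identifier can be found with a simple filter on `subject_id`.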

I hope this helps.

AWS
EXPERT
answered a year ago
1

This issue doesn't sit with Lambda itself; it applies to anything that logs to CloudWatch log groups, which is where Lambda stores its logs. I have seen other services output data to CloudWatch Logs as well.

I guess the best option you have is to ensure your application does not log personal information, omitting this data when writing logs. If users need to be identified in logs, then ensure that masked identifiers such as GUIDs are captured instead, which can be looked up if required. The application could also encrypt or mask the data stored in the logs. I would focus more on why PII is being captured in logs in the first place.
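One way to omit such data at write time is a `logging.Filter` that rewrites each message before it is emitted. The sketch below is only illustrative: the two regular expressions are simplistic assumptions that catch IPv4 addresses and common email shapes, not a complete PII detector, and `redact` and `RedactingFilter` are hypothetical names.

```python
import logging
import re

# Deliberately simple patterns; real PII detection needs more care.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text):
    """Replace email addresses and IPv4 addresses with fixed placeholders."""
    text = _EMAIL.sub("[email]", text)
    return _IPV4.sub("[ip]", text)


class RedactingFilter(logging.Filter):
    def filter(self, record):
        # Format the message first so values passed as arguments are covered too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, now redacted


# Usage: logging.getLogger().addFilter(RedactingFilter())
```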

The other option is to set a short retention period for the logs so that the data is purged. You could also have a specific KMS key for the really sensitive information and encrypt these CloudWatch logs with that KMS key, with access limited to a small number of IAM users or services.
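Setting the retention period can be scripted. The sketch below assumes a boto3 CloudWatch Logs client is passed in and calls its `put_retention_policy` operation. The helper names are hypothetical, and the log group name follows the default `/aws/lambda/<function-name>` convention, which may not match a custom logging configuration.

```python
def lambda_log_group(function_name):
    # Default log group name Lambda uses for a function.
    return "/aws/lambda/" + function_name


def set_log_retention(logs_client, log_group_name, days=30):
    """Ask CloudWatch Logs to expire events in the group after `days` days.

    logs_client is expected to be boto3.client("logs"). CloudWatch accepts
    only certain retention values (for example 7, 14 or 30 days).
    """
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )


# Usage (requires boto3 and AWS credentials; "my-function" is a placeholder):
#   import boto3
#   set_log_retention(boto3.client("logs"), lambda_log_group("my-function"), days=14)
```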

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • Data must be stored in logs for audit trail purposes and in order to assist with R&D debugging of certain behaviors related to a user's activity. This includes data such as IP addresses and user identifiers. If 'masked' information is stored, the masking must be done in such a way that there is no way to trace the masked identifiers back to the real identifiers, for example through a separate service that stores the references.
