Test Data Management tool for file Anonymization in AWS


Hi All,

I'm looking for a Test Data Management (TDM) tool in AWS that meets the following requirements:

  1. Connect to the production S3 bucket, extract files for anonymization, and load the anonymized files into the test S3 bucket.

  2. Run as a daily scheduled job that anonymizes files from the production S3 bucket and stores them in the test S3 bucket.

  3. Identify PII columns in the S3 files and anonymize them. These files are later loaded into a Redshift database.

  4. Maintain data integrity between the files and the database. For example, PII values in the incremental daily data should match the previously anonymized values already stored in the Redshift database.

Please let me know how I can achieve these requirements using AWS services.

Thanks & Regards, Aflah

2 Answers

AWS Glue DataBrew could help you achieve your objective. It offers several functions to mask data, and it addresses your requirements as follows:

  1. It can connect to Amazon S3 and many other data stores, including Amazon Redshift, to read source data and write the output data.
  2. DataBrew jobs can be scheduled and managed by the service itself, or integrated with AWS Step Functions and other workflow services.
  3. It includes a PII detection feature.
  4. Some of its data masking techniques produce repeatable output, so the same input value is always masked to the same result.
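
To illustrate what "repeatable output" means for point 4, here is a minimal Python sketch of deterministic masking with a keyed hash (HMAC-SHA256). This is a generic illustration, not how DataBrew implements its transforms; the function names `pseudonymize` and `mask_row`, the column names, and the key are all hypothetical. The idea is that the same input and the same secret key always yield the same token, so values in a new daily file line up with values masked on earlier days.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 16) -> str:
    """Return a repeatable keyed token for a PII value (hypothetical helper)."""
    # Normalize so " A@B.com " and "a@b.com" map to the same token.
    normalized = value.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_row(row: dict, pii_columns: set, secret_key: bytes) -> dict:
    """Mask only the PII columns of one record; other columns pass through."""
    return {
        col: pseudonymize(val, secret_key) if col in pii_columns and val else val
        for col, val in row.items()
    }


# Assumed example data: the key would normally come from a secrets store.
key = b"example-key-not-for-production"
day1 = mask_row({"email": "jane@example.com", "amount": "10"}, {"email"}, key)
day2 = mask_row({"email": "Jane@Example.com ", "amount": "25"}, {"email"}, key)
print(day1["email"] == day2["email"])  # the two days join on the same token
```

The key must stay the same across runs for the tokens to keep matching, and anyone holding the key can recompute tokens for known values, so it needs to be protected like any other secret.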

You can read more in this blog post.

Hope this helps.

AWS
EXPERT
answered 2 years ago

Hi, good question.

You could use Amazon Comprehend to detect PII and replace it. An example workshop is available at https://github.com/karchit/s3-object-lambda-workshop/tree/Lab2
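
As a rough sketch of the detect-and-replace idea (not taken from the workshop above), the snippet below calls the Comprehend `detect_pii_entities` API through boto3 and replaces each detected span with its entity type. The helper names `detect_pii` and `redact` are hypothetical, and the sample entity list is hand-written in the shape the API returns (`Type`, `BeginOffset`, `EndOffset`). Running `detect_pii` assumes boto3 is installed and AWS credentials and a region are configured.

```python
def detect_pii(text: str, language_code: str = "en") -> list:
    """Call Amazon Comprehend and return the detected PII entities."""
    import boto3  # imported here so redact() works without boto3 installed

    client = boto3.client("comprehend")
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return response["Entities"]


def redact(text: str, entities: list) -> str:
    """Replace each detected span with its entity type, e.g. [EMAIL]."""
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + "[" + entity["Type"] + "]" + text[end:]
    return text


# Hand-written entities standing in for a real Comprehend response.
sample_text = "Contact Jane Doe at jane@example.com"
sample_entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36},
]
print(redact(sample_text, sample_entities))  # Contact [NAME] at [EMAIL]
```

Note that replacing a value with its type label does not give matching values across files. If records must still join in Redshift, the detected spans could instead be replaced with a deterministic token.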

Sri
answered 2 years ago
  • The solution provided is functionally correct, but it offers limited redaction capabilities and might not fully address the use case in the question.
