Using DMS, while Replicating data from S3 to RDS, table and schema details are not available in csv file.

0

We are trying to Replicate data from RDS in one account RDS in other account by Masking PII data. As part of this we are following below approach

Step 1: Replicate data from Source RDS to S3 by masking PII data (In this we remove PII data columns and add new columns with masked data) for e.g.: If Email column needs to be masked, then we will remove email column and new column will be created with email_mask

Step 2: Replicate data from s3 to Target RDS by renaming newly added column names of masked data with original columns name (i.e., change column name from email_mask to email)

The issue we are facing is,

While Replicating Data, if we are using s3 as source and doing CDC, the CDC files(.csv) should be below format (i.e., Table name and schema name should be 2nd and 3rd values) INSERT,employee,hr,101,Smith,Bob,2014-06-04,New York (table name: employee, schema name: hr)

But while replicating from RDS to S3, we will not have those values added in .csv files. Do we have any specific Transformation rules to add those details in that specific position before replicating it back to S3?

1 Answer
0

To add the table name, schema name and column details to the CSV files generated while replicating from RDS to S3, you can use the following approach:

  • Configure the data replication job from RDS to S3 using AWS Database Migration Service.
  • In the transformation settings of the job, map the table name, schema name and column names as new fields that will be added to each record in the CSV.
  • For example, you can extract the table and schema details from the query used to fetch records from the RDS database. And column names can be hard coded.
  • These new fields along with the actual data values will be written to each record in the generated CSV files during the replication process.
  • The CSV files stored in S3 will now contain the additional context fields providing table, schema and column details for each record.
  • This enriched format can be consumed downstream for loading back to another database while retaining the schema level context for each record.
profile picture
EXPERT
answered 2 months ago
  • Hi Giovanni Lauria,

    Thanks for the response. During Replication from RDS to S3 using DMS, if I am trying to add table name and schema name using add-column transformation rules, the table name and schema name are getting added at last like below

    I,C901,2024-02-29 02:04:24,d94EE059335E5, TEMP_CONTENT,OWN Here TEMP_CONTENT is table name and OWN is schema name.

    But as per rule when using S3 as source, The table name should be in 2nd index and schema name should be in 3rd index Expected: I,TEMP_CONTENT,OWN,C901,2024-02-29 02:04:24,d94EE059335E5

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions