Support Ordinal Position Mismatches in cdc Replication - Prefer Column Name Instead.

0

Greetings,

There is a Many to one database replication from many on-prem databases to one RDS database instance (Full-Load, Ongoing). In order to support a blue/green type of replication, table mapping transformations rules have been added to "include-column". This allows fields added to the source databases to be ignored from replication and that works great.

Another problem is that there are over 200 source databases, with most of them having been generated from the same model database. Some of the older databases created 20+ years ago have the same columns but minor changes in some of the columns ordinal positioning in relation to the target RDS database.

I have been looking at transformation rules to add-column and remove-column, however I can't seem to find a way to put priority on the column name and not ordinal position.

For example, with the following Source Databases:

DB_Coke.dbo.TableX
.CreatedBy int
CreatedDate datetime

...

DB_Pepsi.dbo.TableX
CreatedDate datetime
.CreatedBy int

Does there exist a built-in transformation strategy that will allow full-load and ongoing replication tasks to feed both source tables into the single target table below?

RDS_Database.dbo.TableX
.CreatedBy int
CreatedDate datetime
1 Answer
0

Hello There,

Based on the requirement shared, I have tried the replicate the same at my end and was able to achieve the below.

  • Created a database (employee) at the source instance with two tables X & Y.

    1. Create table tableX(createdby int, createddate datetime);
    2. Create table tableY(createddate datetime, createdby int);
    • Setup DMS RI & endpoints.

    • Configured the task with below mapping rule. * Selection rule: where schema name is like 'employee' and Source table name is like 'table%', include

      • Transformation rules: where schema name is like 'employee' and Source table name is like 'table%', rename
        • Action:Rename:FinalData
      • This would create a new table 'finaldata', in the target database 'employee'.
    • To test this further, I have executed below insert queries.

         * Insert into tableY values(CURRENT_TIME(),20);
             * Insert into tableY values(CURRENT_TIME(),20);
             * Insert into tableX values('12',CURRENT_TIME());
             * Insert into tableX values('12',CURRENT_TIME());
      
    • On the destination, the misplaced columns are handled, as indicated in the below result.

          select * from finaldata;
      
createdbycreateddate
202023-02-18 07:43:31
202023-02-18 07:43:35
122023-02-18 07:43:46
122023-02-18 07:43:52

Hence, I would suggest you to try the above setup at your end and test it further.

AWS
SUPPORT ENGINEER
answered a year ago
  • Thanks for the post it is very helpful. This might work for full load. However, when the CDC files land in the cdc-file folder there is no column name. I am not seeing where, in the transformation above, the column order has been reversed for TableY to match the expected structure in the target, so no errors occur during cdc ongoing replication.

  • Just to be clear all databases have a “Full-Load and Ongoing” Task that pulls data from on-prem to S3 targets. So, in the example above there would be four S3 target locations: S3:Bucket/Coke/dbo/TableX, S3:Bucket/Pepsi/dbo/TableX, S3:Bucket/Coke/cdc-files S3:Bucket/Pepsi/cdc-files

    These buckets above are also the Source of an S3->Single RDS instance where a SourceID column is added to every table with the value of “Coke” or “Pepsi” or whatever other database is in the mix.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions