Skip to content

How to determine which record was updated when primary key is updated

1

Hi there, I've been playing around with AWS DMS and Aurora PostgreSQL serverless and I've come across a bit of a problem. I've demonstrated it on a table with a composite primary key, but I suspect it also affects singular PKs as well.

I'm working with a northwind database schema that applies a PK constraint on order_id and product_id in the order_details table. If I insert a record into this table, and then update the record to change the product_id, I get two rows in my CDC output that I can't tie together.

For example

BEGIN;
-- insert new order '30004'
INSERT INTO orders VALUES(30004, 'VINET', ..., 'France');
COMMIT;

BEGIN;
-- insert an order detail for this order and product_id '1'
INSERT INTO order_details VALUES(30004,1,11,12,0);
COMMIT;

BEGIN;
-- insert an order detail for this order and product_id '2'
INSERT INTO order_details VALUES(30004,2,11,12,0);

-- update the order detail for product_id '1' to product_id '66'
UPDATE order_details SET product_id = 66 WHERE order_id = 30004 AND product_id = 1;

-- insert an order detail for this order and product_id '3'
INSERT INTO order_details VALUES(30004,3,11,12,0);
COMMIT;

BEGIN;
-- update the order details for product_id '2' to product_id '77'
UPDATE order_details SET product_id = 77 WHERE order_id = 30004 AND product_id = 2;
COMMIT;

Results in the following database state

order_idproduct_idquantity
300046612
300047712
30004312

But I get this out of my CDC feed

cdc_operationtransaction_commit_timestamporder_idproduct_idquantity
I2024-06-12 14:43:09.32529930004112
I2024-06-12 14:43:09.70280930004212
U2024-06-12 14:43:09.834742300046612
I2024-06-12 14:43:10.04897130004312
U2024-06-12 14:43:10.589195300047712

I can't see any way of knowing that the UPDATE row with product_id 66 was an update of the row originally with product_id 1. I can't see any settings to capture any more information.

I have written up a more in-depth article on this CDC issue on my blog https://tempered.works/posts/2024/06/12/breaking-cdc-with-primary-keys/

Have I missed something? Is this a known property documented somewhere? Can you confirm that you would get the same behaviour with singular PKs? (I've confirmed single-column PKs do exhibit the same behaviour)

Thanks,

Paul

1 Answer
2

Hi Paul,

Using AWS Database Migration Service (DMS), with Postgres as source and S3 as target, you can use the add-before-image-columns transformation rule to add the before image of the primary key columns to the S3 output file.

Consider the below table definition:

create table order_details (
order_id int, 
product_id int, 
qty int, 
primary key(order_id, product_id)
);

In the DMS task, we can use the below table mapping rule:

	    "rules": [
	        {
	            "rule-type": "selection",
	            "rule-id": "1",
	            "rule-name": "1",
	            "object-locator": {
	                "schema-name": "public",
	                "table-name": "order_details"
	            },
	            "rule-action": "include"
	        },
	        {
	            "rule-type": "transformation",
	            "rule-id": "2",
	            "rule-name": "2",
	            "rule-target": "column",
	            "object-locator": {
	                "schema-name": "public",
	                "table-name": "order_details"
	            },
	            "rule-action": "add-before-image-columns",
	            "before-image-def": {
	                "column-prefix": "BI_",
	                "column-suffix": "",
	                "column-filter": "pk-only"
	            }
	        }
	    ]
	}

Once the task is running, perform the below changes on the source database:

INSERT INTO order_details VALUES(30004,2,12);
UPDATE order_details SET product_id = 66 WHERE order_id = 30004 AND product_id = 1;
INSERT INTO order_details VALUES(30004,3,13);

The csv file output is as follows:

I,30004,1,11,,
I,30004,2,12,,
U,30004,66,11,30004,1
I,30004,3,13,,

In the above output, we can see the before image columns named BI_order_id and BI_product_id added to the output. They contain the original primary key values before it was updated.

The add-before-image-columns transformation rule also works for single column primary keys.

The important elements in the table mappings are:

  1. "rule-action": "add-before-image-columns" This attribute specifies that we are adding before image columns

  2. before-image-def - This parameter defines a naming convention to identify the before-image columns and specifies a filter to identify which source columns can have before-image columns created for them on the target. Example:

                "column-prefix": "BI_",
                "column-suffix": "",
                "column-filter": "pk-only"

For more details, more options for before-image-def and sample transformation rules, please refer to the below document: [-] Transformation rules and actions - Values - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.html#CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.Values

AWS
SUPPORT ENGINEER
answered 2 years ago
EXPERT
reviewed 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.