Scan an entire DynamoDB table and update records based on a condition

Hi,

We have a business requirement to deprecate certain field values. We need to scan the entire table, find the items that hold these deprecated values, take the last record for that partition key (there can be multiple records for the same partition key), and then update that record. The table currently contains around 600k records. What's the best way to do this without bringing down the DB service?
This is a one-time activity. Because it will take a while, we cannot run it in a single Lambda invocation, since it would exceed the 15-minute limit.

asked 4 years ago, 1516 views
4 Answers

Hello - if I understand correctly, you have a table with a composite primary key (partition key and sort key), so the content is a large number of item collections, each in sorted order. You want to find the last item in each collection, and if it has a particular attribute, you want to update that item to remove the deprecated attribute.

You should be able to do this in a controlled manner using Scan. To avoid competing with the throughput you need for regular production load, set the Limit parameter so that Scan reads small pages, loop through the pages, and introduce a suitable delay in each iteration of the loop. In your loop, identify the first item you see for a new item collection - the item right before it is the last in the prior item collection. Keep track of that previous item across page boundaries, because a collection can be split between two pages, and remember that the last item of the whole scan closes the final collection. Check whether the item has the deprecated attribute, and if it does, issue an UpdateItem call whose update expression uses the REMOVE action on the deprecated attribute. You can condition this on the item still existing (just to make sure some other thread has not already deleted it).
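The loop described above could be sketched roughly as follows. This is only an illustration under some assumptions, not a tested production tool: it assumes the boto3 low-level DynamoDB client (`scan`, `update_item`), it assumes Scan returns the items of one item collection contiguously and in sort-key order, and the function names, page size, and delay are hypothetical values to tune for your table.

```python
import time


def last_items(pages, pk_name):
    """Yield the last item of each item collection from an iterable of Scan pages.

    Assumes items sharing a partition key arrive contiguously and in sort-key
    order, so a collection ends when the partition key changes.
    """
    previous = None
    for page in pages:
        for item in page:
            if previous is not None and item[pk_name] != previous[pk_name]:
                yield previous
            previous = item  # carried across page boundaries
    if previous is not None:
        yield previous  # the final collection has no successor to close it


def scan_pages(client, table_name, page_size=100, delay_seconds=0.5):
    """Yield pages of items, pausing between Scan calls to limit read load."""
    kwargs = {"TableName": table_name, "Limit": page_size}
    while True:
        response = client.scan(**kwargs)
        yield response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
        time.sleep(delay_seconds)


def remove_deprecated(client, table_name, pk_name, sk_name, attr_name, **scan_opts):
    """Remove attr_name from the last item of each collection; return update count."""
    updated = 0
    pages = scan_pages(client, table_name, **scan_opts)
    for item in last_items(pages, pk_name):
        if attr_name not in item:
            continue
        try:
            client.update_item(
                TableName=table_name,
                Key={pk_name: item[pk_name], sk_name: item[sk_name]},
                UpdateExpression="REMOVE #dep",
                ConditionExpression="attribute_exists(#pk)",
                ExpressionAttributeNames={"#dep": attr_name, "#pk": pk_name},
            )
            updated += 1
        except client.exceptions.ConditionalCheckFailedException:
            pass  # the item was deleted by another writer after the scan read it
    return updated


# Example wiring (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   remove_deprecated(boto3.client("dynamodb"), "MyTable", "pk", "sk", "legacy_status")
```

With these illustrative settings, 600k items at 100 items per page is about 6,000 Scan calls, and a 0.5 second pause per call adds roughly 50 minutes of waiting on its own, so the run would not fit in a single 15-minute Lambda invocation. If new items can be written to a collection while the scan is running, the "last" item seen may no longer be the last by the time it is updated, so it is worth checking whether that matters for your data.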

Conceptually I think this process should work - I would encourage you to test outside of production first of course.

answered 4 years ago

Thanks for the input, PeteNaylor-AWS. Can you also give us an idea of where we should run this? Should we go for Lambda?

answered 4 years ago

Sorry, I'm not really an expert on compute choices, but I would imagine this can be solved on an EC2 instance, in a container, or perhaps broken out across many Lambda invocations. Depending on the use case, my go-to would be to build a tool (perhaps a Python script, since that's familiar to me) and run it on an EC2 instance. There are many options and approaches available - the best choice of tooling might just be whatever you are most comfortable with.

answered 4 years ago

Thanks, PeteNaylor-AWS. We discussed this as a team and also thought of running a Python script on EC2.
Thanks a lot for your input.

answered 4 years ago
