Scan an entire DynamoDB table and update records based on a condition


Hi,

We have a business requirement to deprecate certain field values. We need to scan the entire table, find the records with these deprecated values, take the last record for each partition key (there can be multiple records for the same partition key), and then update that record. Right now the table contains around 600k records. What's the best way to do this without bringing down the DB service?
This is a one-time activity. Because it will take time, we cannot run it in a single Lambda invocation, since it would exceed the 15-minute limit.

Asked 4 years ago, 1528 views
4 Answers

Hello - if I understand correctly, you have a table with a composite primary key (partition key and sort key), so the content is a large number of item collections, each in sorted order. You want to find the last item in each collection, and if it has a particular attribute, you want to update that last item to remove the deprecated attribute.

You should be able to do this in a controlled manner using Scan. To avoid competing with the throughput your regular production load uses, use the Limit parameter to Scan in small pages, loop through the pages, and introduce a suitable delay in each iteration of the loop. In your loop, identify the first item you see for a new item collection - the item right before it is the last in the prior item collection (and the very last item the Scan returns is the last in its own collection). Check whether that item has the deprecated attribute, and if it does, issue an UpdateItem call whose update expression uses the REMOVE action for the deprecated attribute - you can make the update conditional on the item still existing (just to make sure some other thread has not already deleted it).
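The loop described above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested migration tool: it assumes a boto3 `Table` resource is passed in as `table`, and the key names, attribute name, page size, and delay are all hypothetical parameters you would replace with your own.

```python
import time
from typing import Any, Dict, Iterable, Iterator, List

Item = Dict[str, Any]


def scan_pages(table, page_size: int = 100, delay_s: float = 0.5) -> Iterator[List[Item]]:
    """Yield small Scan pages, sleeping between requests to limit read throughput."""
    kwargs: Dict[str, Any] = {"Limit": page_size}
    while True:
        resp = table.scan(**kwargs)
        yield resp.get("Items", [])
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
        time.sleep(delay_s)


def last_items(pages: Iterable[List[Item]], pk_name: str) -> Iterator[Item]:
    """Yield the last item of each item collection (group of equal partition keys)."""
    prev = None
    for page in pages:
        for item in page:
            # A new partition key means `prev` closed the previous collection.
            if prev is not None and item[pk_name] != prev[pk_name]:
                yield prev
            prev = item
    if prev is not None:
        yield prev  # the final item of the scan closes its own collection


def migrate(table, pk_name: str, sk_name: str, attr: str,
            page_size: int = 100, delay_s: float = 0.5) -> int:
    """Remove `attr` from the last item of each collection; return the update count."""
    updated = 0
    for item in last_items(scan_pages(table, page_size, delay_s), pk_name):
        if attr not in item:
            continue
        # A failed condition raises ConditionalCheckFailedException,
        # which this sketch deliberately leaves to the caller.
        table.update_item(
            Key={pk_name: item[pk_name], sk_name: item[sk_name]},
            UpdateExpression="REMOVE #attr",
            ConditionExpression="attribute_exists(#pk) AND attribute_exists(#attr)",
            ExpressionAttributeNames={"#pk": pk_name, "#attr": attr},
        )
        updated += 1
    return updated
```

With boto3 installed, `table` would be something like `boto3.resource("dynamodb").Table("my-table")`, where the table name is a placeholder. The sketch relies on Scan returning the items of one partition key together and in sort key order, so it should be checked against a copy of the table first.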

Conceptually I think this process should work - I would encourage you to test outside of production first of course.

Answered 4 years ago

Thanks for the input, PeteNaylor-AWS. Can you also give an idea of where we should run this? Should we go for Lambda?

Answered 4 years ago

I'm not really an expert on compute choices, sorry, but I would imagine this can be solved on an EC2 instance, in a container, or perhaps broken out across many Lambda invocations. Depending on the use case, my go-to would be to construct a tool (perhaps a Python script, since that's familiar to me) and run it on an EC2 instance. There are many options and approaches available - the best choice in tooling might just be what you are most comfortable with.

Answered 4 years ago

Thanks, PeteNaylor-AWS. We discussed this as a team and had also thought of running a Python script on EC2.
Thanks a lot for your input.

Answered 4 years ago
