Best practice guidance: billions of records with a changing status field in DynamoDB

I have billions of records, each with a message payload, pertaining to millions of users. Some are one-to-one with a user; many more are one-to-many.

The status field of those records will change frequently.

Being conscious of single-table design, which is more efficient: storing each message per user with a status field in one table, or storing each message in one table and (messageid, userid, status) in a second table, then querying the second table by user and fetching the messages by ID from the first?

Alternatively, am I using the wrong tool for the job with DynamoDB?

Rich
asked a year ago, 541 views
2 Answers
Accepted Answer

With DynamoDB you typically store related data together, in a denormalized fashion. Making multiple requests to fulfill a single access pattern is generally considered an anti-pattern.

| pk | sk | status | other |
| --- | --- | --- | --- |
| user1 | message123 | Active | message payload |
| user1 | message674 | Active | message payload |
| user2 | message174 | Removed | message payload |

This simple data model allows you to retrieve all of a user's messages in a single Query request by specifying the user ID as the partition key:

SELECT * FROM mytable WHERE pk = 'user1'
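As a rough illustration, the same lookup can be expressed as a native Query request. The sketch below only builds the request parameters in the shape the low-level boto3 DynamoDB client expects; the helper name `build_user_messages_query` is hypothetical, and the table and key names (`mytable`, `pk`) are taken from the example above.

```python
def build_user_messages_query(table_name, user_id):
    """Build keyword arguments for a DynamoDB Query on a single partition key.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    return {
        "TableName": table_name,
        # Match every item that shares this partition key.
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": user_id}},
    }


params = build_user_messages_query("mytable", "user1")

# Assuming boto3 is installed and credentials are configured, the request
# could then be sent like this:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
#   items = response["Items"]
```

Note that a Query response is capped at 1 MB per page, so a user with many messages may need follow-up requests that pass the returned `LastEvaluatedKey` as `ExclusiveStartKey`.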

Now, if the status belongs to the user rather than to the user's messages, you can keep a metadata record for each user. The schema would look like this:

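| pk | sk | status | other |
| --- | --- | --- | --- |
| user1 | USER#user1 | Active | Data |
| user1 | MESSAGE#message123 | | message payload |
| user1 | MESSAGE#message674 | | message payload |
| user2 | USER#user2 | Removed | Data |
| user2 | MESSAGE#message174 | | message payload |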

You can still make the same single request to get the user's status and messages, but you now have a single status item to update, because the status is not tied to the individual messages.
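A minimal sketch of what updating that single status item might look like, again as request parameters for the low-level boto3 client rather than a live call. The helper name `build_status_update` is hypothetical; the key layout (`pk` = user ID, `sk` = `USER#<user id>`) follows the table above.

```python
def build_status_update(table_name, user_id, new_status):
    """Build keyword arguments for an UpdateItem call on the user's metadata item.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    return {
        "TableName": table_name,
        # The metadata item lives in the user's partition under sk = USER#<id>.
        "Key": {"pk": {"S": user_id}, "sk": {"S": f"USER#{user_id}"}},
        # "status" is aliased through ExpressionAttributeNames because it is
        # a reserved word in DynamoDB expressions.
        "UpdateExpression": "SET #status = :status",
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {":status": {"S": new_status}},
    }


update = build_status_update("mytable", "user2", "Removed")

# Assuming boto3 is installed and credentials are configured:
#   import boto3
#   boto3.client("dynamodb").update_item(**update)
```

Only one item is written per status change, regardless of how many message items sit in the same partition.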

Update on comment

Given that a single message can go to thousands of users, you could model it like this: keep one message item that holds the content of the message, and one item per user that holds a pointer to that message.

This allows you to obtain all the message IDs for a user with one Query; you would then need a BatchGetItem to obtain the actual message payloads. Typically, though, systems like this show only the message metadata in the UI, and when a user clicks on a message you fetch its payload with a GetItem, which DynamoDB returns in a couple of milliseconds.
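The BatchGetItem step could be sketched as follows. BatchGetItem accepts at most 100 keys per request and rejects duplicate keys, so the message IDs are de-duplicated and split into chunks. Both helper names are hypothetical, and the sketch assumes message items are keyed by `pk` = message ID and `sk` = `MESSAGE#<message id>`, with pointer items carrying the same `sk`.

```python
def message_ids_from_pointers(pointer_items):
    """Extract message ids from pointer items whose sk looks like 'MESSAGE#<id>'."""
    ids = [item["sk"]["S"].split("#", 1)[1] for item in pointer_items]
    # De-duplicate while preserving order.
    return list(dict.fromkeys(ids))


def build_batch_get_requests(table_name, message_ids, batch_size=100):
    """Split message ids into BatchGetItem requests of at most batch_size keys."""
    requests = []
    for start in range(0, len(message_ids), batch_size):
        chunk = message_ids[start:start + batch_size]
        keys = [
            {"pk": {"S": mid}, "sk": {"S": f"MESSAGE#{mid}"}}
            for mid in chunk
        ]
        requests.append({"RequestItems": {table_name: {"Keys": keys}}})
    return requests


# Pointer items as a Query on the user's partition might return them.
pointers = [
    {"pk": {"S": "user1"}, "sk": {"S": "MESSAGE#message123"}},
    {"pk": {"S": "user1"}, "sk": {"S": "MESSAGE#message674"}},
]
ids = message_ids_from_pointers(pointers)
requests = build_batch_get_requests("mytable", ids)

# Assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("dynamodb")
#   for request in requests:
#       response = client.batch_get_item(**request)
#       # Retry anything returned under response["UnprocessedKeys"].
```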

| pk | sk | status | other |
| --- | --- | --- | --- |
| message123 | MESSAGE#message123 | Active | message payload |
| message674 | MESSAGE#message674 | Active | message payload |
| message174 | MESSAGE#message174 | Active | message payload |
| user1 | MESSAGE#message123 | | time sent |
| user1 | MESSAGE#message674 | | time sent |
| user2 | MESSAGE#message174 | | time sent |
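To make the write side of this layout concrete, here is a small sketch that builds the items for one message sent to several users: a single message item plus one pointer item per recipient. The function name and the attribute names `payload` and `time_sent` are assumptions made for illustration; the table above only labels that column "other".

```python
def build_fanout_items(message_id, payload, status, recipients, time_sent):
    """Return one message item followed by one pointer item per recipient."""
    sort_key = f"MESSAGE#{message_id}"
    # The payload and status are stored once, in the message's own partition.
    message_item = {
        "pk": message_id,
        "sk": sort_key,
        "status": status,
        "payload": payload,
    }
    # Each recipient gets a small pointer item in their own partition.
    pointer_items = [
        {"pk": user_id, "sk": sort_key, "time_sent": time_sent}
        for user_id in recipients
    ]
    return [message_item] + pointer_items


items = build_fanout_items(
    "message123", "hello", "Active", ["user1", "user2"], "2024-01-01T00:00:00Z"
)

# Assuming boto3 is installed and credentials are configured, the items
# could be written with the resource API's batch writer:
#   import boto3
#   table = boto3.resource("dynamodb").Table("mytable")
#   with table.batch_writer() as batch:
#       for item in items:
#           batch.put_item(Item=item)
```

The payload is stored once regardless of the number of recipients, while each recipient costs one small pointer item.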
AWS
EXPERT
answered a year ago
EXPERT
reviewed a month ago
  • Thanks Leeroy. My issue is that users are unique, but many messages are sent to more than one user, so I don't fully understand whether I should, counter-intuitively, store multiple copies of those messages (each message could go to many millions of users).

    It's the message that has the status. In SQL I'd have messages in one table and userid, messageid, status in another, and join the tables.

  • Updated my answer

  • Excellent suggestion. Thank you very much.

Hi, DynamoDB is the right tool for this kind of scale.

The right question is: what are your access patterns?

  1. message id
  2. user id for the sender, the receivers, or both?

Once this is known, you will be able to create the proper indexes to access your table efficiently.

I strongly recommend The DynamoDB Book by Alex DeBrie: https://www.dynamodbbook.com/

Have a look at his video: https://www.youtube.com/watch?v=h7mH2Bxkc6k

AWS
EXPERT
answered a year ago
  • Thank you.

    I need all messages associated with a userid
