DynamoDB Design for a Frequent Access Pattern Dependent on an External ID


I have a service which is expected to see ~13 million requests per year: ~4 million to create new records and ~9 million to update existing records.

The identifier is provided by an external service, but I need to decouple this ID from the internal ID we use for this service (and for other services that reference it); the internal ID cannot be dependent on a third party's ID.

My table design (PK and SK) looks like the following:

PK: THING#123
SK: THING
att1: 'Hello'
att2: 'World'

and

PK: EXTERNALTHING#456
SK: EXTERNALTHING
thingId: 123

I am looking for the most efficient design. Currently this relies on 2-3 commands, depending on whether the item is new or already exists, and I am wondering whether there is a better approach.

e.g.

  1. An HTTP request is made from the 3rd party: { externalThing: '456', att1: 'Hello', att2: 'World' }
  2. uuid() is generated (let's call it 123), then a PutItem command is issued to DynamoDB { PK: 'EXTERNALTHING#456', SK: 'EXTERNALTHING', thingId: 123 }, which attempts to register our internal ID
  3. IF the above command fails because the item already exists, a GetItem command is issued to DynamoDB to read the item and extract thingId
  4. A PutItem command is issued to DynamoDB { PK: 'THING#123', SK: 'THING', att1: 'Hello', att2: 'World' }
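
To make the command count in steps 2-4 concrete, here is a minimal sketch of the flow. It runs against a small in-memory stand-in rather than DynamoDB itself, so InMemoryTable, put_if_absent and upsert_thing are all hypothetical names introduced for illustration. With boto3, the conditional write would roughly correspond to PutItem with a ConditionExpression such as attribute_not_exists(PK), and the "already exists" branch to catching a ConditionalCheckFailedException.

```python
import uuid


class InMemoryTable:
    """Tiny stand-in for a DynamoDB table keyed on (PK, SK). Hypothetical helper."""

    def __init__(self):
        self.items = {}
        self.calls = 0  # counts simulated round trips to the table

    def put_if_absent(self, item):
        """Mimics PutItem with a condition that the key must not exist yet."""
        self.calls += 1
        key = (item["PK"], item["SK"])
        if key in self.items:
            return False  # stands in for a failed condition check
        self.items[key] = dict(item)
        return True

    def get(self, pk, sk):
        """Mimics GetItem; returns a copy of the item or None."""
        self.calls += 1
        item = self.items.get((pk, sk))
        return dict(item) if item is not None else None

    def put(self, item):
        """Mimics an unconditional PutItem (overwrites any existing item)."""
        self.calls += 1
        self.items[(item["PK"], item["SK"])] = dict(item)


def upsert_thing(table, external_id, attributes):
    """Steps 2-4: register or look up the internal ID, then write the THING item."""
    mapping_pk = f"EXTERNALTHING#{external_id}"
    thing_id = str(uuid.uuid4())
    created = table.put_if_absent(
        {"PK": mapping_pk, "SK": "EXTERNALTHING", "thingId": thing_id}
    )
    if not created:
        # Mapping already exists: discard the fresh uuid and read the stored one.
        thing_id = table.get(mapping_pk, "EXTERNALTHING")["thingId"]
    table.put({"PK": f"THING#{thing_id}", "SK": "THING", **attributes})
    return thing_id
```

In this sketch a new external ID costs two calls and an existing one costs three, which matches the 2-3 commands described above.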

Any suggestions to reduce the number of commands would be welcome. The key requirement is that we must abstract the incoming ID that the external caller has in their system and generate an internal ID we can use everywhere. This matters because we want to support multiple external caller organisations, each with their own IDs that we can't alter.

asked a year ago, 232 views
1 Answer

There are two ways I can think of doing this:

  1. Use a Global Secondary Index for each of the incoming (foreign) ids that you're going to use. That way a single query against the index, keyed on that id, returns the item you need directly. However, there is additional cost here: the index requires extra storage, and indexes consume capacity independently of the main table. Note that there is a limit of 20 GSIs per table.
  2. For each "primary" item in the table that you retrieve using the internal id, create an extra item in the table which has the external id as the partition key and an attribute which contains the internal id. This allows you to create as many external id to internal id mappings as you require. The downside is that two queries are required to retrieve the primary item: you first query for the supplied (external) id, then take the result and query for the "primary" item which holds the rest of the details. There is additional cost because of the extra storage and the extra queries, and lookups take longer because of the additional query.
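
As a rough illustration of the difference between the two options, the sketch below only builds the request parameters, in the shape the low-level boto3 DynamoDB client accepts for query and get_item; it does not call AWS. The table name things, the index name externalId-index, the externalId attribute and the helper function names are all assumptions made for this example, and thingId is assumed to be stored as a string.

```python
def gsi_query_params(table_name, index_name, external_id):
    """Option 1: kwargs for one Query against a GSI keyed on the external id."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "externalId = :ext",
        "ExpressionAttributeValues": {":ext": {"S": external_id}},
    }


def mapping_get_params(table_name, external_id):
    """Option 2, first lookup: kwargs for GetItem on the mapping item."""
    return {
        "TableName": table_name,
        "Key": {
            "PK": {"S": f"EXTERNALTHING#{external_id}"},
            "SK": {"S": "EXTERNALTHING"},
        },
    }


def thing_get_params(table_name, mapping_item):
    """Option 2, second lookup: kwargs for GetItem on the primary item."""
    thing_id = mapping_item["thingId"]["S"]
    return {
        "TableName": table_name,
        "Key": {"PK": {"S": f"THING#{thing_id}"}, "SK": {"S": "THING"}},
    }
```

With a real client these would be passed along the lines of client.query(**gsi_query_params("things", "externalId-index", "456")). One relevant difference: GetItem on the base table accepts ConsistentRead=True, whereas reads from a GSI are eventually consistent only.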

There are probably other ways of doing this but those were the first that popped to mind.

AWS (EXPERT), answered a year ago
  • I fear option 1 has issues with eventual consistency. The rate of change is expected to be relatively high and is not under my control, so this feels like it would work "most of the time" but be error prone.

    I've actually gone for a 3rd option: composing a system ID that is derived deterministically from the incoming external id plus some extra composite keys that uniquely identify it. It makes for an ugly PK, but it is very efficient and achieves single-query reads and writes. Thanks.

  • Yup, that is a great idea. My assumption from the question was that this wasn't possible, given that the external id was not under your control; and somehow I got the idea there were multiple external parties, so I was overdesigning...
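
The deterministic ID described in the comments is not spelled out, so the following is only one possible sketch of the idea. It assumes the "extra composite keys" amount to an organisation identifier, and it hashes the composite into a name-based UUID (uuid5) instead of using the raw composite string as the PK, which is a substitution made here for illustration. THING_NAMESPACE, internal_thing_id and thing_key are hypothetical names.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
THING_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "things.example.com")


def internal_thing_id(organisation, external_id):
    """Derive the same internal id every time from (organisation, external_id)."""
    return str(uuid.uuid5(THING_NAMESPACE, f"{organisation}#{external_id}"))


def thing_key(organisation, external_id):
    """Primary key of the THING item, computed without any lookup."""
    return {
        "PK": f"THING#{internal_thing_id(organisation, external_id)}",
        "SK": "THING",
    }
```

Because the key can be recomputed from the request alone, both creates and updates can go straight to the THING item with a single write, and no mapping item is needed.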
