Is there a "DynamoDb by example" document anywhere?

0

Experienced programmer, less experienced database user, beginner AWS user... second-guessing myself to death, so please have patience with what I'm sure is a beginner question.

I'm looking at using DynamoDb as data storage behind my lambda. This will be a small database (under 10K records) and traffic will be low, so I'm undoubtedly overthinking it... but I haven't used NoSQL databases before, and I'm trying to figure out how to map from my conceptual data structures to DynamoDb's indexed-pile-of-mixed-record-types mindset.

The DynamoDb developer's guide (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide) seems to be a good discussion of recommended design principles for this approach, but I'm still having trouble wrapping my head around those relatively abstract recommendations. I think it might help me a lot to see some examples of how people have defined DynamoDb records and keys for specific applications. If those were commented with explanations of why those design decisions were made, that might help even more. And I'm sure I'm not the only one who'd find best-practice examples useful to illuminate the best-practice theoretical discussion.

Does such a collection exist? Haven't found it yet if so.

Context follows, in case anyone cares:

My application is a fairly trivial one: Indexing archives of a radio show for retrieval by episode number, by broadcast date (may be N:1 since rebroadcasts happen), and eventually perhaps by keywords (specifically guests on episodes, N:N since guests may appear multiple times). Since episodes are only one per day, this is a relatively small list -- increasing only at 365 per year, and with the rebroadcasts decades of production still have us under 5000 episodes total.

The obvious data structure for in-memory implementation would be one table mapping unique episode number to episode details (which could include a list of broadcast dates and a list of keywords for that episode), one table mapping unique date to episode number for quick two-step lookup, and a table mapping keywords to lists of episode numbers (followed by list-intersection if multiple keywords are being matched upon). But that doesn't seem to be how DynamoDb wants data handled; the dev guide seems to prefer having all the records (and all the record types?) in a single conceptual table with secondary keys (which act like shadow tables, if I'm understanding this correctly) used both to separate them back out and to perform specific retrievals.

Eventually I may want similar lookup for other shows that overlap this one. Unclear at this time whether that's best handled with a single table having show ID as one of the columns, or separate tables which could be unioned if I want to find all shows for a particular date or with a particular guest.

I suspect that the best solution(s) is/are immediately obvious to an experienced DynamoDb user. But as a beginner I'm having trouble wrapping my head around it. Hence the desire to see how others have handled similar data patterns.

I suppose I should also say that I'm not by any means locked into using DynamoDb. It just seems to be what's most commonly suggested for small-dataset evolving-data applications on AWS. If I'm barking up the wrong tree, pointers to better ones would be appreciated before I invest too much more heavily in this solution.

asked 3 years ago · 432 views
5 Answers
3

I would recommend watching Rick Houlihan's DynamoDB Office Hours videos on YouTube. Rick models a real use case in each video and explains each pattern he uses and why you would use it.

With NoSQL databases, you shouldn't start from how the data is organized but from how you will access it. Prioritize your access patterns so that you can optimize for the ones used most often. I would recommend listing all of them, for example:

  1. fetching an episode by episode number.
  2. fetching all episodes that occurred in a given time range.
  3. fetching all episodes that include a given list of keywords (this is a tricky one in DynamoDB).

Another key thing to keep in mind is how the partition key is built. You want your partition keys to be as evenly distributed as possible so that DynamoDB can scale easily. If you have just one radio show (with many episodes), a good PK looks to be the episodeNumber, although that ties you to a single radio show.

Since an episode may be broadcast more than once, I would include an SK based on broadcastedAt (this gives you a bonus pattern: iterating over the different broadcasts of a given episode number). Something like:

|pk|sk|attributes|
|---|---|---|
|<episodeNumber>|Metadata|<episode details>|
|<episodeNumber>|<broadcastedAt>|<you could duplicate episode details here depending on how reads/writes happen>|

That will cover your first pattern + the bonus pattern of accessing different broadcasts of the same episode by date.
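A minimal Python sketch of this key layout, for illustration only. It builds plain dictionaries in the shape the low-level DynamoDB API expects, so it runs without AWS access. The table name `Episodes`, the function names, and the choice to store `broadcastedAt` as an ISO 8601 string are assumptions, not part of the answer above.

```python
from typing import Any

TABLE_NAME = "Episodes"  # hypothetical table name


def metadata_item(episode_number: int, details: dict[str, str]) -> dict[str, Any]:
    """Item holding the episode details (sk = "Metadata")."""
    item: dict[str, Any] = {
        "pk": {"S": str(episode_number)},
        "sk": {"S": "Metadata"},
    }
    item.update({name: {"S": value} for name, value in details.items()})
    return item


def broadcast_item(episode_number: int, broadcasted_at: str) -> dict[str, Any]:
    """One item per broadcast; broadcasted_at is assumed to be an ISO date."""
    return {"pk": {"S": str(episode_number)}, "sk": {"S": broadcasted_at}}


def get_episode_params(episode_number: int) -> dict[str, Any]:
    """GetItem parameters for pattern 1 (episode by number)."""
    return {
        "TableName": TABLE_NAME,
        "Key": {"pk": {"S": str(episode_number)}, "sk": {"S": "Metadata"}},
    }


def list_broadcasts_params(episode_number: int, first: str, last: str) -> dict[str, Any]:
    """Query parameters for the bonus pattern (broadcasts of one episode).

    The BETWEEN range on ISO dates skips the "Metadata" item, because
    "M" sorts after every digit.
    """
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :first AND :last",
        "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
        "ExpressionAttributeValues": {
            ":pk": {"S": str(episode_number)},
            ":first": {"S": first},
            ":last": {"S": last},
        },
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed to the low-level client, for example `boto3.client("dynamodb").query(**list_broadcasts_params(42, "2021-01-01", "2021-12-31"))`.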

The second pattern, fetching all episodes that occurred in a given time range, depends on how you will query that range: by day, or at some other granularity? I would add a GSI whose PK is a day; within that partition you will have all episodes that aired that day (if you need to query more than one day, you would need to run parallel queries, though).

|pk|sk|gsi1pk|gsi1sk|attributes|
|---|---|---|---|---|
|<episodeNumber>|Metadata|||<episode details>|
|<episodeNumber>|<broadcastedAt>|<broadcastedAtDay>|<episodeNumber>|<you could duplicate episode details here depending on how reads/writes happen>|
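As a rough sketch of the "one query per day" idea, the following Python builds one set of Query parameters per day in a range. The index name `gsi1`, the table name `Episodes`, and the ISO date format for `gsi1pk` are assumptions made for the example; the queries themselves could then be issued one after another or in parallel.

```python
from datetime import date, timedelta
from typing import Any

TABLE_NAME = "Episodes"  # hypothetical table name
INDEX_NAME = "gsi1"      # hypothetical GSI name


def days_in_range(start: date, end: date) -> list[str]:
    """Every day from start to end (inclusive) as ISO strings."""
    count = (end - start).days + 1
    return [(start + timedelta(days=i)).isoformat() for i in range(count)]


def episodes_on_day_params(day: str) -> dict[str, Any]:
    """Query parameters for all broadcasts stored under one day partition."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "gsi1pk = :day",
        "ExpressionAttributeValues": {":day": {"S": day}},
    }


def range_query_params(start: date, end: date) -> list[dict[str, Any]]:
    """One Query per day; a date range cannot be covered by a single
    partition-key lookup, so each day needs its own request."""
    return [episodes_on_day_params(day) for day in days_in_range(start, end)]
```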

The third pattern is quite tricky because you don't know in advance how many keywords you have. If your app is write-once-read-many, I would duplicate episode entries in different partitions based on those keywords, so the data is duplicated but optimized for reading. To do so, there are a few things your app must keep in mind:

  • writing an episode will be a mix of writing and deleting items in the database.
  • you must sort keywords before storing them, so that a combined key such as <keyword1>#<keyword2> is always built in the same order.

|pk|sk|gsi1pk|gsi1sk|attributes|
|---|---|---|---|---|
|<episodeNumber>|Metadata|||<episode details>|
|<episodeNumber>|<broadcastedAt>|<broadcastedAtDay>|<episodeNumber>|<you could duplicate episode details here depending on how reads/writes happen>|
|<keyword1>|<broadcastedAt>|||<you could duplicate episode details here depending on how reads/writes happen>|
|<keyword2>|<broadcastedAt>#<episodeNumber>|||<you could duplicate episode details here depending on how reads/writes happen>|
|<keyword1>#<keyword2>|<broadcastedAt>#<episodeNumber>|||<you could duplicate episode details here depending on how reads/writes happen>|
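One possible way to generate the duplicated keyword items is sketched below in Python. The function names and the `max_size` cap are assumptions added for the example: the number of keyword combinations grows quickly, so some limit on how many keywords can be combined in one lookup is probably needed. Sorting the keywords first is what makes a lookup for "jazz, aws" and one for "aws, jazz" land on the same partition key.

```python
from itertools import combinations
from typing import Any


def keyword_partition_keys(keywords: list[str], max_size: int = 2) -> list[str]:
    """All sorted keyword combinations up to max_size, joined with '#'."""
    ordered = sorted(set(keywords))
    keys: list[str] = []
    for size in range(1, min(max_size, len(ordered)) + 1):
        for combo in combinations(ordered, size):
            keys.append("#".join(combo))
    return keys


def keyword_items(
    episode_number: int,
    broadcasted_at: str,
    keywords: list[str],
    max_size: int = 2,
) -> list[dict[str, Any]]:
    """One duplicated item per keyword combination for a given broadcast."""
    sort_key = f"{broadcasted_at}#{episode_number}"
    return [
        {"pk": {"S": pk}, "sk": {"S": sort_key}}
        for pk in keyword_partition_keys(keywords, max_size)
    ]


def lookup_key(keywords: list[str]) -> str:
    """Partition key to query for a set of keywords, in any input order."""
    return "#".join(sorted(set(keywords)))
```

When an episode's keywords change, the old combination items would have to be deleted and the new ones written, which is the write/delete mix mentioned above.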

cjuega
answered 3 years ago
2

I would really recommend The DynamoDB Book by Alex DeBrie.

There is also a cheat sheet with a summary of best practices and patterns.

MG
answered 3 years ago
1

Consider "DynamoDB, explained: A Primer on the DynamoDB NoSQL database". The author's blog also has a number of articles on DynamoDB.

RoB
answered 3 years ago
  • Thanks, reading through that -- it's answered some of my questions, so far. (This shouldn't be hard, I'm just stumbling over the shift in mindset.) And thanks for your patience!

0

Agreed, Rick Houlihan's the man to follow when learning about DynamoDB.

There is plenty of AWS tech talk and re:Invent content on YouTube, and he also makes regular appearances on the "Amazon DynamoDB | Office Hours" thread on the AWS Twitch channel.

answered 3 years ago
0

Hi,

You could also start with a single database document structure:

{
  "EpisodeId": {
    "S": "EP01"
  },
  "Title": {
    "S": "Title"
  },
  "Guests": {
    "SS": [
      "Jacco",
      "John"
    ]
  },
  "Keywords": {
    "SS": [
      "aws"
    ]
  },
  "AiringDates": {
    "SS": [
      "2021-12-12",
      "2021-12-19"
    ]
  }
}

EpisodeId would be the partition key.

At this size, every lookup that is not by EpisodeId (by air date, guest, or keyword) can easily be performed using a Scan with a filter. You will always get the full details of the episode in one operation.
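As an illustration, Scan parameters for "find episodes whose string set contains a value" might look like the Python sketch below. The table name `Episodes` and the function name are assumptions; the attribute names come from the document above. `contains` is DynamoDB's built-in condition function and matches a member of a string set.

```python
from typing import Any

TABLE_NAME = "Episodes"  # hypothetical table name


def scan_by_set_member_params(attribute: str, value: str) -> dict[str, Any]:
    """Scan parameters matching items whose string set holds the value.

    The attribute name goes through a placeholder so it cannot clash
    with a DynamoDB reserved word.
    """
    return {
        "TableName": TABLE_NAME,
        "FilterExpression": "contains(#attr, :value)",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":value": {"S": value}},
    }


# Examples for the three set attributes in the document above.
by_air_date = scan_by_set_member_params("AiringDates", "2021-12-19")
by_guest = scan_by_set_member_params("Guests", "Jacco")
by_keyword = scan_by_set_member_params("Keywords", "aws")
```

Note that a Scan reads the whole table and applies the filter afterwards, and that large results are paginated through `LastEvaluatedKey`, so the calling code would need to loop until that key is absent.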

The API to access the data should be of more concern:

  • createEpisode episodeId, airDates, guests, keywords
  • deleteEpisode episodeId
  • getEpisode episodeId
  • addAirDate episodeId, airDate
  • removeAirDate episodeId, airDate
  • addGuest episodeId, guest
  • removeGuest episodeId, guest
  • addKeyword episodeId, keyword
  • removeKeyword episodeId, keyword
  • getEpisodesByAirDate airdate
  • getEpisodesByKeyword keyword
  • getEpisodesByGuest guest
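To make the "API first" point concrete, here is a small Python sketch of part of that API backed by an in-memory dictionary. The class and method names are hypothetical translations of the operations listed above, and only a subset is shown. The idea is that callers depend on these methods, so the storage behind them (a Scan today, keyed lookups later) could be swapped without changing them.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Episode:
    episode_id: str
    air_dates: set[str] = field(default_factory=set)
    guests: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)


class InMemoryEpisodeStore:
    """Stand-in for a DynamoDB-backed store with the same methods."""

    def __init__(self) -> None:
        self._episodes: dict[str, Episode] = {}

    def create_episode(
        self,
        episode_id: str,
        air_dates: Iterable[str],
        guests: Iterable[str],
        keywords: Iterable[str],
    ) -> None:
        self._episodes[episode_id] = Episode(
            episode_id, set(air_dates), set(guests), set(keywords)
        )

    def delete_episode(self, episode_id: str) -> None:
        self._episodes.pop(episode_id, None)

    def get_episode(self, episode_id: str) -> Optional[Episode]:
        return self._episodes.get(episode_id)

    def add_air_date(self, episode_id: str, air_date: str) -> None:
        self._episodes[episode_id].air_dates.add(air_date)

    # The lookups below walk every episode, like a Scan with a filter.
    def get_episodes_by_air_date(self, air_date: str) -> list[Episode]:
        return [e for e in self._episodes.values() if air_date in e.air_dates]

    def get_episodes_by_keyword(self, keyword: str) -> list[Episode]:
        return [e for e in self._episodes.values() if keyword in e.keywords]

    def get_episodes_by_guest(self, guest: str) -> list[Episode]:
        return [e for e in self._episodes.values() if guest in e.guests]
```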

If your database grows and you feel it no longer performs well, or you pay too much for the scans, you can switch to a more complicated database design. The API can stay the same.

One improvement you might consider right away is using a separate table for the guests and storing their IDs in the episode table instead of their names. The API could use BatchGetItem if you want to return the details of the guests when getting episodes (potentially caching the guests).

Going for a more complicated single-table database design is really an access optimization, which in this case might be premature.
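A hedged Python sketch of how the guest lookup could be prepared for BatchGetItem. The table name `Guests` and key attribute `GuestId` are assumptions. BatchGetItem accepts at most 100 keys per call, so the IDs are deduplicated and split into chunks.

```python
from typing import Any

GUEST_TABLE = "Guests"    # hypothetical table name
GUEST_KEY = "GuestId"     # hypothetical partition key attribute
MAX_KEYS_PER_BATCH = 100  # BatchGetItem limit per request


def batch_get_guest_params(guest_ids: list[str]) -> list[dict[str, Any]]:
    """Build one BatchGetItem parameter set per chunk of up to 100 IDs."""
    unique_ids = list(dict.fromkeys(guest_ids))  # dedupe, keep order
    requests: list[dict[str, Any]] = []
    for start in range(0, len(unique_ids), MAX_KEYS_PER_BATCH):
        chunk = unique_ids[start:start + MAX_KEYS_PER_BATCH]
        keys = [{GUEST_KEY: {"S": guest_id}} for guest_id in chunk]
        requests.append({"RequestItems": {GUEST_TABLE: {"Keys": keys}}})
    return requests
```

A real caller would also need to retry any `UnprocessedKeys` returned in the response.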

Regards, Jacco

JaccoPK
answered 3 years ago
