Hierarchical data plus condition - Table Design

0

Hi all,

I'm having problems designing my DynamoDB table to support a fairly simple access pattern. I hope you can help me a little bit :)

I have 4 different products types (A, B, C and D) that have a price and a location (country#state#city).

The access patterns are:

(1) Filter by product type
(2) Filter by product type and location
(3) Filter by product type, location and price

The problem is that a product located in USA#NY#NY must be also available in USA#NY and USA.

To be able to filter by location and price I came up with this solution, however, a lot of data gets duplicated and I am sure there must be a much better solution:

PK |     SK & GSI PK     | GSI SK | Other product details (duplicated data)
-----------------------------------------------------------------------------------
ID | TYPE                | PRICE | Image, name, etc
ID | TYPE#USA            | PRICE | Image, name, etc
ID | TYPE#USA#NY         | PRICE | Image, name, etc
ID | TYPE#USA#NY#NY      | PRICE | Image, name, etc

This solves every access pattern:

(1) GSI PK = TYPE

(2) GSI PK = TYPE#USA
GSI PK = TYPE#USA#NY
GSI PK = TYPE#USA#NY#NY

(3) GSI PK = TYPE#USA & GSI SK > 150
GSI PK = TYPE#USA#NY & GSI SK > 150

Reads are efficient but a lot of data gets duplicated (price and product details) and updating an item requires multiple writes.

Is it possible to achieve this without duplicating all the product details?

Edit: I don't want to use filter expressions on the price as it is the most common filter pattern and I'd like to make it efficient and cheap

Edited by: jexposito on Aug 1, 2019 9:53 AM

已提問 5 年前檢視次數 208 次
3 個答案
0
已接受的答案

I'll show you two alternate options (below). A few things before we dive into them.
First, looking at your schema, each item has about 5 attributes, so I expect your items are average 40B-50B each? With that assumption, we know 1RCU can read 4K data (or 8K if you are okay with eventual consistency). At 1RCU per 4K, you can read, say, 100 items with 1 RCU. If on an average, each of your queries will return 100 items, then you can go with your current schema, and filter out the items you don't need.
Second, instead of duplicating the items by implementing that logic in your application code, you can leverage GSIs. See option#2 below.
With that, please review the following. Your business use case and shape of your data will drive which option works best for you.
==Alternate option#1==
Table-PK | Table-SK | GSI-PK | GSI-SK | Price
abc | xyz | TYPE | USA#NY#NY#PRICE | $x

(1) Filter by product type
GSI-PK = TYPE

(2) Filter by product type and location
GSI-PK = TYPE, GSI-SK starts with USA
GSI-PK = TYPE, GSI-SK starts with USA#NY
GSI-PK = TYPE, GSI-SK starts with USA#NY#NY

(3) Filter by product type, location and price
GSI-PK = TYPE, GSI-SK starts with USA, filter (price)
GSI-PK = TYPE, GSI-SK starts with USA#NY, filter (price)
GSI-PK = TYPE, GSI-SK > USA#NY#NY#PRICE

==Alternate option#2==
Table-PK | Table-SK | GSI-PK | GSI1-SK | GSI2-SK | GSI3-SK
abc | xyz | TYPE | USA#PRICE | USA#NY#PRICE | USA#NY#NY#PRICE

(1) Filter by product type
GSI-PK = TYPE

(2) Filter by product type and location
GSI-PK = TYPE, GSI3-SK starts with USA
GSI-PK = TYPE, GSI3-SK starts with USA#NY
GSI-PK = TYPE, GSI3-SK starts with USA#NY#NY

(3) Filter by product type, location and price
GSI-PK = TYPE, GSI1-SK > USA#PRICE
GSI-PK = TYPE, GSI2-SK > USA#NY#PRICE
GSI-PK = TYPE, GSI3-SK > USA#NY#NY#PRICE

AWS
已回答 5 年前
0

Thank you so much for your answer @padma-aws it was very helpful.

You are right about the cost of filtering and I'll definitely consider it for other use cases in my app.

However, I find the second solution much better as it also allows me to sort by price (expensive/cheap first) and the first one doesn't.

已回答 5 年前
0

Glad it is helpful. Have a nice day.

AWS
已回答 5 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南