Your cost is proportional to the amount of data scanned by the query, so whatever scans less data will cost less. From what I understand of your description, case A won't scan more data than case B: every piece of data is retrieved once by its ID, and it's difficult to be more efficient than that.
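To make the cost argument a bit more concrete, here is a rough Python sketch of how DynamoDB counts read units. It assumes the usual rules: one unit covers a strongly consistent read of up to 4 KB, an eventually consistent read costs half, each GetItem is rounded up to 4 KB on its own, and a Query rounds up the summed size of all the items it reads. The function names are hypothetical, and details such as transactional reads are ignored.

```python
import math

# Assumption: one read unit = one strongly consistent read of up to 4 KB;
# an eventually consistent read costs half as much.
BLOCK_BYTES = 4 * 1024


def read_units(total_bytes: int, strongly_consistent: bool = True) -> float:
    """Read units for a single operation, rounded up to the next 4 KB block."""
    blocks = max(1, math.ceil(total_bytes / BLOCK_BYTES))
    return float(blocks) if strongly_consistent else blocks / 2


def cost_of_get_items(item_sizes: list, strongly_consistent: bool = True) -> float:
    """Each GetItem call is rounded up separately."""
    return sum(read_units(size, strongly_consistent) for size in item_sizes)


def cost_of_query(item_sizes: list, strongly_consistent: bool = True) -> float:
    """A Query sums the sizes of every item it reads, then rounds up once."""
    return read_units(sum(item_sizes), strongly_consistent)


if __name__ == "__main__":
    sizes = [1024, 1024, 1024]  # three hypothetical 1 KB items
    print(cost_of_get_items(sizes))     # 3.0
    print(cost_of_query(sizes))         # 1.0
    print(cost_of_query(sizes, False))  # 0.5
```

Because of the per-operation rounding, many very small items fetched one at a time can cost more than their raw size suggests, while items of a few KB each cost roughly the same either way.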
Now, I don't know how much read activity you expect, but if the difference is a couple of dollars per month, maybe you shouldn't bother with cost optimizations at this stage and should just implement whatever you find easier to maintain in the long run. :)
I know this is an old question but I'll answer it anyway, maybe it'll help somebody :)
With DynamoDB, the schema* depends strongly on the access patterns. Given your description, I would personally design the table in the following way:
- Hash/Partition key - user Id
- Range/Sort key - either the string "USER" or the device Id (it may make sense to prefix the device Id with some kind of discriminator like DEVICE#, but I want to keep it simple here)
- The rest of the attributes - more or less as you described in the first approach, i.e. the user item holds the user's attributes and each device item holds that device's attributes, with no duplication.
- DeviceIdToUserId GSI:
  - Hash key - device Id
  - Range key - user Id
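A minimal sketch of this item layout, using plain Python dicts rather than real DynamoDB calls. The attribute names `pk`, `sk` and `device_id` are assumptions. Copying the device Id into its own `device_id` attribute (instead of using the range key directly as the GSI hash key) keeps user items, whose range key is "USER", out of the index, so the GSI only contains device items.

```python
USER_SK = "USER"  # range key value of the user's own item (hypothetical name)


def user_item(user_id: str, **attrs) -> dict:
    """The single item holding a user's own attributes."""
    return {"pk": user_id, "sk": USER_SK, **attrs}


def device_item(user_id: str, device_id: str, **attrs) -> dict:
    """One item per device, stored in its owner's partition.

    device_id is duplicated into its own attribute so that only device items
    carry the GSI hash key.
    """
    return {"pk": user_id, "sk": device_id, "device_id": device_id, **attrs}


def device_to_user_gsi(items: list) -> dict:
    """Approximate the contents of the DeviceIdToUserId GSI.

    GSI hash key = device_id, GSI range key = the table's hash key (user id).
    Items without a device_id attribute are simply absent from the index.
    """
    index = {}
    for item in items:
        if "device_id" in item:
            index.setdefault(item["device_id"], []).append(item["pk"])
    return index
```

For example, one user item plus two device items for the same user yield an index with exactly two entries, one per device, and no entry for the user item.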
To get a user and her devices, you need only one query: fetch all items whose hash key is the user Id. To get a user when you only have a device, use the GSI to map the device Id to the user Id, and then get the item with that hash key and the range key "USER". This is achieved with only a small amount of data duplication, in the form of the GSI.

One caveat: GSIs are eventually consistent with the main table data, and strongly consistent reads are not supported on GSIs. This can be solved by maintaining the device-to-user mapping yourself as items in the main table and writing them together with the device items using transactional writes, but as long as you can handle eventual consistency, I would just use a GSI.
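The two read paths can be sketched with a toy in-memory stand-in for the table and its GSI. Everything here (`InMemoryTable`, `user_with_devices`, `user_for_device`, and the `pk`/`sk`/`device_id` attribute names) is hypothetical; a real implementation would issue a Query on the table, a Query on the GSI, and a GetItem.

```python
from collections import defaultdict
from typing import Optional

USER_SK = "USER"  # range key value of the user's own item


class InMemoryTable:
    """Toy stand-in for the table and its DeviceIdToUserId GSI (not a DynamoDB client)."""

    def __init__(self) -> None:
        # hash key -> {range key -> item}
        self._partitions: dict = defaultdict(dict)
        # GSI contents: device id -> user id
        self._device_index: dict = {}

    def put(self, item: dict) -> None:
        self._partitions[item["pk"]][item["sk"]] = item
        if "device_id" in item:
            # A real GSI is updated asynchronously (eventually consistent);
            # this toy version updates it immediately.
            self._device_index[item["device_id"]] = item["pk"]

    def query(self, pk: str) -> list:
        """Every item sharing one hash key, sorted by range key (one Query)."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition)]

    def get(self, pk: str, sk: str) -> Optional[dict]:
        """Single item by full primary key (one GetItem)."""
        return self._partitions.get(pk, {}).get(sk)

    def query_device_index(self, device_id: str) -> Optional[str]:
        """Device id -> user id, via the GSI."""
        return self._device_index.get(device_id)


def user_with_devices(table: InMemoryTable, user_id: str):
    """Access pattern 1: a single query on the user's partition."""
    items = table.query(user_id)
    user = next((i for i in items if i["sk"] == USER_SK), None)
    devices = [i for i in items if i["sk"] != USER_SK]
    return user, devices


def user_for_device(table: InMemoryTable, device_id: str) -> Optional[dict]:
    """Access pattern 2: GSI lookup, then a get with range key "USER"."""
    user_id = table.query_device_index(device_id)
    if user_id is None:
        return None
    return table.get(user_id, USER_SK)
```

The second path costs two key-based requests, but neither touches data belonging to other users, which is the point of the index.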
(* I know that NoSQL databases are called schemaless, but I've read in M. Kleppmann's book "Designing Data-Intensive Applications" that there definitely is a schema; it's just applied on read instead of being enforced on write, which gives more flexibility. That makes complete sense to me, so I'll say that DynamoDB data has a schema.)
You're right. However, this is more of a learning question for me. There is so much emphasis on "reduce the number of tables" and "reduce lookups" that I would like to figure out how they want me to do that.

The use case isn't that strange either. Imagine 50 employees at a company. Each employee is an item in a table. Then you have departments, each also one item. Each department has an attribute called "department employees", which is a set of employees (an array). It also has the department's address, phone number, etc.

Now, if there were a way to look up an employee id within that set with a query rather than a scan, I could immediately find an employee's department and grab the department's phone number without a second lookup. Otherwise, I have to first query by the employee ID, get the employee's item, find the department ID, and then look up the department to find its address. So, is there a simple way to do this? If so, I want to learn it. Otherwise, yes, it really isn't a big deal and I should just move on.
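The lookup described in this comment can be sketched in plain Python (hypothetical data and function names, no DynamoDB client). It contrasts checking every department's set, which is effectively a scan, with an inverted index from employee id to department id, which is the role the GSI plays in the device/user design in the answer above. One point to keep in mind: a DynamoDB GSI key must be a scalar attribute, so it cannot index the individual members of a set; to get such an index you would store one item per department/employee pair.

```python
from typing import Optional


def department_by_scan(departments: dict, employee_id: str) -> Optional[str]:
    """Check every department's employee set: the scan the question wants to avoid."""
    for dept_id, dept in departments.items():
        if employee_id in dept["employees"]:
            return dept_id
    return None


def build_employee_index(departments: dict) -> dict:
    """Inverted index employee id -> department id, maintained on write.

    This is the job a GSI keyed on employee id would do; in DynamoDB it needs
    one item per (department, employee) pair, since a GSI key must be a scalar
    attribute and cannot index the members of a set.
    """
    index = {}
    for dept_id, dept in departments.items():
        for employee_id in dept["employees"]:
            index[employee_id] = dept_id
    return index


def department_phone(departments: dict, index: dict, employee_id: str) -> Optional[str]:
    """Two key lookups (index, then department item), no scan."""
    dept_id = index.get(employee_id)
    if dept_id is None:
        return None
    return departments[dept_id]["phone"]
```

The phone lookup still makes two requests, but both are direct key lookups, so the work no longer grows with the number of departments.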