DynamoDB query between dates


I'm trying to query between two dates, but from what I understood here https://stackoverflow.com/a/38790120/10586800 and from many other sources, it's not possible: to use Query you must also supply a partition key value to filter on.
The only way to filter by dates alone is to use Scan. So, based on my table structure, my idea was to save all the distinct sourceName values in a separate table; when I want to query my main table between dates, I would first fetch all the sourceName values from that table and then query the main table once per sourceName.

I'm just trying to understand whether that's really better than a Scan, or if there is a better option?

Here is my table structure:

Primary partition key recordID (String)

createdAt: Number,  
data: Map,  
recordId: String,  
sourceName: String,  
updatedAt: Number  

example:

This whole process runs in a Lambda function. I get lastUpdateFrom (number) and lastUpdateTo (number) from the user, and then I want to do something like this:

    let params = {
      TableName: process.env.RECORDS_TABLE_NAME,
      IndexName: 'updatedAt-index',
      // Not allowed: a Query key condition must include an equality test on the partition key.
      KeyConditionExpression: "updatedAt BETWEEN :lastUpdateFrom AND :lastUpdateTo",
      ExpressionAttributeValues: {
        // The low-level API expects number values to be passed as strings.
        ":lastUpdateFrom": { N: String(lastUpdateFrom) },
        ":lastUpdateTo": { N: String(lastUpdateTo) },
      },
      ProjectionExpression: "recordID",
    };

But that's impossible, because a Query's key condition must always include an equality test on the partition key. So what I thought instead is to fetch all the sources and query each source with lastUpdateFrom and lastUpdateTo, using Query against the index I created (partition key sourceName, sort key updatedAt).

example:

    const sources = await getAllRecords();
    const dbCalls = [];
    for (let i = 0; i < sources.Count; i++) {
      dbCalls.push(getRecordsBetweenDatesFilteredWithSource(
        sources.Items[i].sourceName.S, lastUpdateFrom, lastUpdateTo, skip, limit));
    }
    return Promise.all(dbCalls);

and then my query will look like this:

    let params = {
      TableName: process.env.RECORDS_TABLE_NAME,
      IndexName: 'sourceName-updatedAt-index',
      KeyConditionExpression: "sourceName = :sn AND updatedAt BETWEEN :lastUpdateFrom AND :lastUpdateTo",
      ExpressionAttributeValues: {
        // Again, the low-level API takes numbers as strings under the N type.
        ":lastUpdateFrom": { N: String(lastUpdateFrom) },
        ":lastUpdateTo": { N: String(lastUpdateTo) },
        ":sn": { S: sourceName },
      },
      ProjectionExpression: "recordID",
    };


ronmar
asked 4 years ago, 12583 views
2 Answers

You're right about the options in this regard. You can use a Scan with a filter expression to fetch records with update time within your range. This option is not very selective or efficient as you'll consume capacity to read the entire table. But if this is not a frequent query pattern it might be a reasonable choice.
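For illustration only, a minimal sketch of such a Scan using the Node.js DocumentClient and the attribute names from the question (the table, attribute, and variable names are taken from the question; everything else is an assumption):

    // Sketch: Scan with a filter expression. The filter is applied after items are
    // read, so read capacity is consumed for the whole table regardless of matches.
    const AWS = require('aws-sdk');
    const docClient = new AWS.DynamoDB.DocumentClient();

    const params = {
      TableName: process.env.RECORDS_TABLE_NAME,
      FilterExpression: 'updatedAt BETWEEN :lastUpdateFrom AND :lastUpdateTo',
      ExpressionAttributeValues: {
        ':lastUpdateFrom': lastUpdateFrom,
        ':lastUpdateTo': lastUpdateTo,
      },
      ProjectionExpression: 'recordID',
    };

    const result = await docClient.scan(params).promise();
    // For large tables you would loop, passing result.LastEvaluatedKey back as
    // ExclusiveStartKey until it is no longer returned.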

The other option, if you want to support this access pattern at higher frequency without that inefficiency, is to collect all the items into a single item collection (same value of the partition key attribute) so that you can Query and use a key condition expression to select your time range within the sort key value. This limits scalability, though: a single item collection like this can only be expected to support up to 1000 write units per second or 3000 read units per second. You can improve this by distributing across a known set of item collections, just as you've done in your second design scenario. This pattern is discussed in the documentation here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html. In this way, you "scatter" the write traffic across a number of different item collections (so DynamoDB can scale horizontally across multiple partitions), and then you "gather" by making multiple Query calls across all your item collections.
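A rough sketch of that scatter/gather idea, assuming a hypothetical GSI named gsiShard-updatedAt-index whose partition key is a synthetic gsiShard attribute assigned at write time (the shard count, index name, and attribute name are all assumptions, not part of the question's schema):

    // Sketch of GSI write sharding ("scatter/gather"). Uses the DocumentClient from
    // the previous snippet; SHARD_COUNT, gsiShard, and the index name are assumed.
    const SHARD_COUNT = 10;

    // Scatter: on write, assign each item to a random shard so writes spread
    // across SHARD_COUNT item collections (partitions) in the index.
    function withShard(item) {
      return { ...item, gsiShard: Math.floor(Math.random() * SHARD_COUNT) };
    }

    // Gather: on read, query every shard for the same time range and merge the results.
    async function getRecordsBetweenDates(lastUpdateFrom, lastUpdateTo) {
      const calls = [];
      for (let shard = 0; shard < SHARD_COUNT; shard++) {
        calls.push(docClient.query({
          TableName: process.env.RECORDS_TABLE_NAME,
          IndexName: 'gsiShard-updatedAt-index',
          KeyConditionExpression: 'gsiShard = :s AND updatedAt BETWEEN :from AND :to',
          ExpressionAttributeValues: { ':s': shard, ':from': lastUpdateFrom, ':to': lastUpdateTo },
          ProjectionExpression: 'recordID',
        }).promise());
      }
      const pages = await Promise.all(calls);
      return pages.flatMap(page => page.Items);
    }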

answered 4 years ago

Then, if I have about 10 reads and 70 inserts per minute, the first method (using the same value for the partition key) should work for me... right?

The problem I came across with the second solution (distributing across a known set of item collections) is that you can't use skip so easily; you would have to send each request its own skip value...
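(For reference, DynamoDB pagination is cursor-based rather than offset-based: each Query returns a LastEvaluatedKey that is passed back as ExclusiveStartKey on the next call, so a sharded design needs one cursor per shard instead of a single skip value. A rough sketch, reusing the assumed docClient and sharded index from the answer above:)

    // Sketch: per-shard cursor instead of a numeric skip. 'cursor' is the
    // LastEvaluatedKey returned by the previous page for this shard (undefined at first).
    async function queryShardPage(shard, lastUpdateFrom, lastUpdateTo, cursor, limit) {
      const params = {
        TableName: process.env.RECORDS_TABLE_NAME,
        IndexName: 'gsiShard-updatedAt-index',
        KeyConditionExpression: 'gsiShard = :s AND updatedAt BETWEEN :from AND :to',
        ExpressionAttributeValues: { ':s': shard, ':from': lastUpdateFrom, ':to': lastUpdateTo },
        ProjectionExpression: 'recordID',
        Limit: limit,
      };
      if (cursor) params.ExclusiveStartKey = cursor; // resume where this shard's last page stopped
      const page = await docClient.query(params).promise();
      // Hand back this shard's cursor so the caller can request the next page later.
      return { items: page.Items, nextCursor: page.LastEvaluatedKey };
    }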

ronmar
answered 4 years ago
