DynamoDb bug on python boto3 query with GSI
Assume that there are 3 rows in dynamodb as below
coin_transaction_id | coin_amount | customer_id | transaction_time |
---|---|---|---|
027072a5-fdf3-48ed-b36e-90a317d63c31 | 80 | bfbd221a-bc1f-4e96-943f-d809bb5e5af3 | 2022-06-15T18:52:40.044239+00:00 |
f074d831-5791-4ba1-a596-ab114d14eaac | 40 | bfbd221a-bc1f-4e96-943f-d809bb5e5af3 | 2022-06-15T17:42:36.830859+00:00 |
adafd372-03de-437a-887e-9027007517a2 | 160 | bfbd221a-bc1f-4e96-943f-d809bb5e5af3 | 2022-06-15T17:42:36.830859+00:00 |
GSI Partition Key : customer_id Sort key : transaction_time
when query data by python boto 3 in lambda with below code, the row 2, which GSI is same as row 3, cannot be retrieved. Result return is only row 1 and 3 can be retrieved.
coinResp = coinTable.query(
IndexName = "customer_id-transaction_time-index",
KeyConditionExpression = 'customer_id = :ci and transaction_time between :date1 and :date2',
ExpressionAttributeValues = {
':ci' : customerID,
':date1' : startDate,
':date2' : endDate,
},
Limit = 1,
ExclusiveStartKey = receivedLastEvaluatedKey,
ScanIndexForward = False,
)
Is this cosidered as a bug?
Here you are making an assumption that row 2 in your base table is going to be row 2 in your GSI, however that is not the case. Items in your base table are grouped by the partition key and ordered by the sort key. Whereas items in your GSI can have multiple items with the same partition/sort key combo, meaning the order cannot be guaranteed.
As to your issue, you are setting the LEK to ""
in your code, causing you to miss the last record in the response. Try the below code snippet:
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('cust1')
customerID = 'bfbd221a-bc1f-4e96-943f-d809bb5e5af3'
startDate = '2022-06-14T17:42:36.830859+00:00'
endDate = '2022-06-16T18:52:40.044239+00:00'
def processItems(items):
print(items)
kwargs = {
'IndexName' : "my-index",
'KeyConditionExpression' : 'customer_id = :ci and transaction_time between :date1 and :date2',
'ExpressionAttributeValues' : {
':ci' : customerID,
':date1' : startDate,
':date2' : endDate,
},
'Limit' : 1,
'ScanIndexForward' : False
}
while True:
response = table.query( **kwargs )
processItems( response[ u'Items' ] )
if 'LastEvaluatedKey' in response:
kwargs[ 'ExclusiveStartKey' ] = response[ 'LastEvaluatedKey' ]
else:
break
I don't see a bug here, you are using Limit 1
and also an ExclusiveStartKey
. Can you share the same code snippet with the values for your variables, that will help me determine the issue. Also, please share your Boto3 and Python version.
Here is the full code in lambda function (BOTO3 and Python 3.9)
import simplejson as json
import boto3
from boto3.dynamodb.conditions import Key
def lambda_handler(event, context):
# TODO implement
try:
dynamodb = boto3.resource('dynamodb')
customerID = event['pathParameters']['customer-id']
startDate = (event['queryStringParameters']).get('start_date')
endDate = (event['queryStringParameters']).get('end_date')
LKCTI = (event['queryStringParameters']).get('lkcti')
LKCI = (event['queryStringParameters']).get('lkci')
LKTT = (event['queryStringParameters']).get('lktt')
receivedLastEvaluatedKey = {}
if LKCTI and LKCI and LKTT:
receivedLastEvaluatedKey['coin_transaction_id'] = LKCTI
receivedLastEvaluatedKey['customer_id'] = LKCI
receivedLastEvaluatedKey['transaction_time'] = LKTT
coinTable = dynamodb.Table('customer-coin-transaction')
coinResp = {}
if LKCTI and LKCI and LKTT:
#created date is needed for sorting by created date
coinResp = coinTable.query(
IndexName = "customer_id-transaction_time-index",
KeyConditionExpression = 'customer_id = :ci and transaction_time between :date1 and :date2',
ExpressionAttributeValues = {
':ci' : customerID,
':date1' : startDate,
':date2' : endDate,
},
Limit = 1,
ExclusiveStartKey = receivedLastEvaluatedKey,
ScanIndexForward = False,
)
else:
coinResp = coinTable.query(
IndexName = "customer_id-transaction_time-index",
KeyConditionExpression = 'customer_id = :ci and transaction_time between :date1 and :date2',
ExpressionAttributeValues = {
':ci' : customerID,
':date1' : startDate,
':date2' : endDate,
},
Limit = 1,
ScanIndexForward = False,
)
if 'Items' in coinResp:
lastEvaluatedKey = ""
if 'LastEvaluatedKey' in coinResp:
lastEvaluatedKey = coinResp['LastEvaluatedKey']
result = {
"last_evaluated_key" : lastEvaluatedKey,
"last_evaluated_key_old" : receivedLastEvaluatedKey,
"items" : coinResp['Items'],
}
return {
'statusCode': 200,
'body': json.dumps(result)
}
else:
return {
'statusCode': 200,
'body': json.dumps('')
}
return {
'statusCode': 400,
'body': json.dumps('failed')
}
except Exception as e:
return {
"statusCode": 400,
"body": str(e),
}
I call this service from a frontend, pagination only 1 row for each call with descending sorting, from first call with empty last evaluated key .
The first call result is first row with last evaluated key which is the row #1 returned.
and, when I make the second call with last evaluated key which is the row #1 (result from the first call), a result returned from the second call is row #3 with row #3 is last evalauted key returned.
and, if I make the third call with last evalauted key from second call, which is the row #3, the third call result is row #4.
Problem is the row #2 is skipped when customer_id and transaction_time is the same value as row #3. (Expected result for each call is 1->2->3->4->.... but given result for each call is 1->3->4->..... )
Row #2 will not be skipped if its transaction_time, sorting key, is different from row #3.
Relevant questions
Amazon DynamoDB across account access with VPC endpoints
Accepted Answerasked 3 years agoUnable to query dynamodb table in Athena using DynamoDB Connector
Accepted Answerasked 2 years agoAmazon Athena error on querying DynamoDB exported data
asked 4 days agoDynamodb: Deleting all items & Read Write Capacity Query
asked 6 months agoAppsync GSI not producing existing DynamoDB record by Graphql using amplify
Accepted Answerasked 6 months agoAppsync composite key (GSI in dynamodb) returns Null in query response
asked 2 months agoDynamoDB vs Aurora
Accepted Answerasked 7 months agoDynamoDb bug on python boto3 query with GSI
Accepted Answerasked 19 days agoDynamoDB Hierarchical, Sorted Queries
Accepted Answerasked 4 months agoAWS DynamoDB tables and GSI
Accepted Answerasked 3 years ago
The code snippet is in the below answer.