DynamoDb bug on python boto3 query with GSI

0

Assume that there are 3 rows in dynamodb as below

coin_transaction_idcoin_amountcustomer_idtransaction_time
027072a5-fdf3-48ed-b36e-90a317d63c3180bfbd221a-bc1f-4e96-943f-d809bb5e5af32022-06-15T18:52:40.044239+00:00
f074d831-5791-4ba1-a596-ab114d14eaac40bfbd221a-bc1f-4e96-943f-d809bb5e5af32022-06-15T17:42:36.830859+00:00
adafd372-03de-437a-887e-9027007517a2160bfbd221a-bc1f-4e96-943f-d809bb5e5af32022-06-15T17:42:36.830859+00:00

GSI Partition Key : customer_id Sort key : transaction_time

when query data by python boto 3 in lambda with below code, the row 2, which GSI is same as row 3, cannot be retrieved. Result return is only row 1 and 3 can be retrieved.

      coinResp = coinTable.query(
                IndexName = "customer_id-transaction_time-index",
                KeyConditionExpression = 'customer_id = :ci and transaction_time between :date1 and :date2',
                ExpressionAttributeValues = {
                    ':ci' : customerID,
                    ':date1' : startDate,
                    ':date2' : endDate,
                },
                Limit = 1,
                ExclusiveStartKey = receivedLastEvaluatedKey,
                ScanIndexForward = False,
            )

Is this cosidered as a bug?

xclbr
asked 2 years ago1059 views
3 Answers
1
Accepted Answer

Here you are making an assumption that row 2 in your base table is going to be row 2 in your GSI, however that is not the case. Items in your base table are grouped by the partition key and ordered by the sort key. Whereas items in your GSI can have multiple items with the same partition/sort key combo, meaning the order cannot be guaranteed.

As to your issue, you are setting the LEK to "" in your code, causing you to miss the last record in the response. Try the below code snippet:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('cust1')
customerID = 'bfbd221a-bc1f-4e96-943f-d809bb5e5af3'
startDate = '2022-06-14T17:42:36.830859+00:00'
endDate = '2022-06-16T18:52:40.044239+00:00'

def processItems(items):
    print(items)

kwargs = {
    'IndexName' : "my-index",
    'KeyConditionExpression' : 'customer_id = :ci and transaction_time between :date1 and :date2',
    'ExpressionAttributeValues' : {
        ':ci' : customerID,
        ':date1' : startDate,
        ':date2' : endDate,
    },
    'Limit' : 1,
    'ScanIndexForward' : False
}
   


while True:
    response = table.query( **kwargs )
    processItems( response[ u'Items' ] )
    if 'LastEvaluatedKey' in response:
        kwargs[ 'ExclusiveStartKey' ] = response[ 'LastEvaluatedKey' ]
    else:
        break
profile pictureAWS
EXPERT
answered 2 years ago
1

I don't see a bug here, you are using Limit 1 and also an ExclusiveStartKey. Can you share the same code snippet with the values for your variables, that will help me determine the issue. Also, please share your Boto3 and Python version.

profile pictureAWS
EXPERT
answered 2 years ago
  • The code snippet is in the below answer.

0

Here is the full code in lambda function (BOTO3 and Python 3.9)

import simplejson as json
import boto3
from boto3.dynamodb.conditions import Key

def lambda_handler(event, context):
    # TODO implement
    
    
    try:
        
        dynamodb = boto3.resource('dynamodb')
        customerID = event['pathParameters']['customer-id']
        startDate = (event['queryStringParameters']).get('start_date')
        endDate = (event['queryStringParameters']).get('end_date')
        LKCTI = (event['queryStringParameters']).get('lkcti')
        LKCI = (event['queryStringParameters']).get('lkci')
        LKTT = (event['queryStringParameters']).get('lktt')
        
        
        receivedLastEvaluatedKey = {}
        
        if LKCTI and LKCI and LKTT:
            receivedLastEvaluatedKey['coin_transaction_id'] = LKCTI
            receivedLastEvaluatedKey['customer_id'] = LKCI
            receivedLastEvaluatedKey['transaction_time'] = LKTT
            
        
            
        coinTable = dynamodb.Table('customer-coin-transaction')
        coinResp = {}
        if LKCTI and LKCI and LKTT:
            #created date is needed for sorting by created date
            coinResp = coinTable.query(
                IndexName = "customer_id-transaction_time-index",
                KeyConditionExpression = 'customer_id = :ci and transaction_time between :date1 and :date2',
                ExpressionAttributeValues = {
                    ':ci' : customerID,
                    ':date1' : startDate,
                    ':date2' : endDate,
                },
                Limit = 1,
                ExclusiveStartKey = receivedLastEvaluatedKey,
                ScanIndexForward = False,
            )
        else:
            coinResp = coinTable.query(
                IndexName = "customer_id-transaction_time-index",
                KeyConditionExpression = 'customer_id = :ci and transaction_time between :date1 and :date2',
                ExpressionAttributeValues = {
                    ':ci' : customerID,
                    ':date1' : startDate,
                    ':date2' : endDate,
                },
                Limit = 1,
                ScanIndexForward = False,
            )
        
        
        if 'Items' in coinResp:
            lastEvaluatedKey = ""
            
            if 'LastEvaluatedKey' in coinResp:
                lastEvaluatedKey = coinResp['LastEvaluatedKey']
            
            result = {
                "last_evaluated_key" : lastEvaluatedKey,
                "last_evaluated_key_old" : receivedLastEvaluatedKey,
                "items" : coinResp['Items'],
            }
            
            return {
                'statusCode': 200,
                'body': json.dumps(result)
            }
        
        else:
            return {
                'statusCode': 200,
                'body': json.dumps('')
            }
            
        return {
            'statusCode': 400,
            'body': json.dumps('failed')
        }

    except Exception as e:
        return {
            "statusCode": 400,
            "body": str(e),
        }

I call this service from a frontend, pagination only 1 row for each call with descending sorting, from first call with empty last evaluated key .

The first call result is first row with last evaluated key which is the row #1 returned.

and, when I make the second call with last evaluated key which is the row #1 (result from the first call), a result returned from the second call is row #3 with row #3 is last evalauted key returned.

and, if I make the third call with last evalauted key from second call, which is the row #3, the third call result is row #4.

Problem is the row #2 is skipped when customer_id and transaction_time is the same value as row #3. (Expected result for each call is 1->2->3->4->.... but given result for each call is 1->3->4->..... )

Row #2 will not be skipped if its transaction_time, sorting key, is different from row #3.

xclbr
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions