Lambda to Dynamo Write: Intermittently missing batch data


Hi,

I am using AWS Lambda to read data from the web and store it in DynamoDB at periodic intervals during the day. After that, I take a daily dump from DynamoDB and write a CSV file to S3 for daily reporting.

This had been working perfectly fine and I was getting data for all batches during the day. But of late my S3 dump has gaps and I am missing data for some batches, and the missing batches differ from day to day.

I am not getting any error messages, so I just wanted some guidance on how to go about debugging this.

Below are the stats for my DynamoDB table:

Item count: 235,267
Table size: 65.6 MB
Average item size: 279.03 bytes
Provisioned read capacity units: 1
Provisioned write capacity units: 1

Also, below is a snapshot of the key operational metrics for the table: https://ibb.co/TPhWqBj https://ibb.co/jVgRS93

Regards, Dbeings

4 Answers

You are being throttled for exceeding your provisioned capacity. The fact that it worked for a couple of months does not mean it will always work; it will not scale, and the more data you add, the more RCUs you will consume. You also mentioned you deleted CloudWatch alarms. You likely have auto scaling on your table, and you deleted the alarms required to trigger the scaling actions.

In essence, you need to increase the capacity on your table. On the free tier, you get 25 RCUs free per region, and you can divide those 25 units across all tables in the region any way you like.
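For example, raising the provisioned throughput is a single API call; a minimal sketch with boto3 (the table name and capacity values below are just placeholders):

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-2')

# Raise provisioned read capacity on the table (placeholder values)
dynamodb.update_table(
    TableName='MyTable',
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 1,
    },
)

If auto scaling is managing the table, the equivalent change is raising the minimum capacity in the scaling settings rather than calling update_table directly.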

I would also add a try/except block to your code and log any exceptions you receive, so you can better understand what is happening when things go wrong.
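As a rough sketch of the logging idea (the table name and item shape are placeholders; adapt it to your write code):

import logging
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
table = dynamodb.Table('MyTable')

def write_item(item):
    try:
        table.put_item(Item=item)
    except ClientError as e:
        # Throttled writes surface as ProvisionedThroughputExceededException
        logger.error("DynamoDB write failed (%s): %s",
                     e.response['Error']['Code'], e)
        raise

With that in place, throttled or failed writes show up in the Lambda's CloudWatch logs instead of silently disappearing.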

AWS
EXPERT
answered 2 years ago
  • Hi Leeory,

    Thanks, that clears it up. Below is a screenshot of my read capacity settings, and I have auto scaling enabled. Did deleting the CloudWatch alarms mess up the auto scaling? Should I increase the minimum read capacity to 4, since the max in the RCU graph was 2.81, if I don't want to recreate the CloudWatch alarms?

    https://ibb.co/LNxgFwg

    More importantly, is there any way I can complete my task with the current read capacity? Though I write to the table every 5 minutes during the day, my read runs just once a day. The read fetches about 6,000 records, and I am fine with fairly slow performance for this read, as it only feeds a daily MIS report built from a one-time CSV extract.

    Regards, Dbeings


Based on the screenshot attached, your read requests are getting throttled. Try increasing the RCU above 1, or switch to on-demand mode to see what your actual usage is, and then provision accordingly.
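A minimal sketch of switching the table to on-demand with boto3 (the table name is a placeholder):

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-2')

# Switch the table from provisioned capacity to on-demand (pay per request)
dynamodb.update_table(
    TableName='MyTable',
    BillingMode='PAY_PER_REQUEST',
)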

AWS
irshad
answered 2 years ago

Hi Irshad, thanks for your response. I can increase the RCU but am a bit reluctant, as I am new to the AWS free tier and have faced issues with AWS charges when I change the defaults. This was working perfectly fine for more than a month. The only change I made recently was deleting the CloudWatch alarms, which I think should not impact read/write operations on DynamoDB.

My requirements are fairly basic: I am not doing any reads during the day, just write operations from Lambda in different batches. The read operation runs just once a day, when I pull the entire day's data from DynamoDB and put it in an S3 bucket. I am not really bothered about performance for this read, as it runs only once a day and all I need is one day's data. Below is the Python script for the read operation used in Lambda.

import boto3
import pandas as pd
import awswrangler as wr
from datetime import date
from boto3.dynamodb.conditions import Key

def getDynamoData():
    dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
    table = dynamodb.Table('MyTable')

    # Empty frame with the columns expected in each item
    df_list = pd.DataFrame(data={'ItemName': [], 'Metric': [], 'UpdateTime': [], 'MettricID': []})

    done = False
    start_key = None
    row_num = 0
    page_no = 0
    mydate = date.today()
    mydateStr = str(mydate.year) + "-" + str(mydate.month) + "-" + str(mydate.day)

    # Page through today's items on the feedDate GSI
    while not done:
        if start_key:
            response = table.query(IndexName="feedDate-index",
                                   KeyConditionExpression=Key('feedDate').eq(mydateStr),
                                   ExclusiveStartKey=start_key)
        else:
            response = table.query(IndexName="feedDate-index",
                                   KeyConditionExpression=Key('feedDate').eq(mydateStr))

        items = response.get('Items', [])
        page_no = page_no + 1
        if len(items) == 0:
            break

        # Copy each item's attributes into the next row of the DataFrame
        item_i = 0
        while item_i < len(items):
            for key in items[item_i]:
                df_list.loc[row_num, key] = str(items[item_i][key])
            item_i = item_i + 1
            row_num = row_num + 1

        start_key = response.get('LastEvaluatedKey', None)
        done = start_key is None

    lastItem = df_list.iloc[len(df_list) - 1]['ItemName']

    # Write the day's extract to S3 as CSV
    fileName = 's3://mybucket/MyFile' + mydateStr + '.csv'
    wr.s3.to_csv(df_list, fileName, index=False)
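If the once-a-day read itself is what gets throttled, I am assuming a retry configuration like the one below (untested sketch; the retry values are placeholders) would let the query retry throttled pages and simply run slower within the low capacity:

import boto3
from botocore.config import Config

# Adaptive retry mode backs off and retries throttled DynamoDB requests
retry_config = Config(retries={'max_attempts': 10, 'mode': 'adaptive'})
dynamodb = boto3.resource('dynamodb', region_name='us-east-2', config=retry_config)
table = dynamodb.Table('MyTable')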
dbeing
answered 2 years ago

Hi,

I increased the RCU to a minimum of 5 units but am still facing the issue. I observed that the issue is not at the time of reading but while writing to the table; the values are missing from DynamoDB itself. I escalated this through a ticket to the AWS team and they indicated that, using internal tools, they could see errors in the Lambda function for the time slots where values were missing.

But I am not able to see anything in the Lambda logs. The last log is from Feb 1 and the issue started happening in April. There appear to be some errors indicated by the graphs under the Lambda metrics, but I am not able to get details on them. Here are screenshots of the key Lambda metrics and logs for the said function.

https://ibb.co/SBmrHY0 https://ibb.co/0m0Lsqd

Could my DynamoDB table size be an issue? I am planning to move the writes to a new table to rule this out. Below are the stats for my DynamoDB table.

Item count: 268,759
Table size: 75 MB
Average item size: 279.19 bytes

I would really appreciate it if anyone could provide some pointers here.

Regards, Dbeings

dbeing
answered 2 years ago
