CloudFront real-time logs missing more than 50% of requests


I have set up an environment to process CloudFront real-time logs. I have a Kinesis stream with on-demand shards and a Lambda function that processes the logs in batches of 10,000 records. I am using wrk to generate load.
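
For reference, here is a minimal sketch of the kind of Lambda handler used in this setup, assuming Python and that each real-time log entry arrives as a tab-separated line in the Kinesis payload (the exact field layout depends on the real-time log configuration):

```python
import base64

def lambda_handler(event, context):
    """Decode and count CloudFront real-time log records from a Kinesis batch."""
    processed = 0
    for record in event["Records"]:
        # Kinesis payloads are base64-encoded; each real-time log entry is a
        # tab-separated line of the fields selected in the log configuration.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        for line in payload.strip().splitlines():
            fields = line.split("\t")  # order matches the real-time log config
            processed += 1
    print(f"processed {processed} log lines in this batch")
    return {"processed": processed}
```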

At first the Lambda was receiving logs for most of the requests, so I increased the number of requests per second. Then I realized it takes some time for the Kinesis shards to keep up with an increasing number of requests.

After that, 40-60% of the logs did not arrive at the Lambda. I then used SubscribeToShard on my local machine. In my first test with 1 million requests, I lost only 1.7k requests; in my second test, I lost 2.8k requests out of 2 million. Then it started to behave like the Lambda, with 40-60% of the logs missing. Even when testing with 100 requests, 50% of the logs are missing.
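
For context, consuming the shards directly from a local machine with SubscribeToShard looks roughly like this (a boto3 sketch; the stream name and consumer ARN are placeholders, and a consumer must first be registered with register_stream_consumer):

```python
import boto3

kinesis = boto3.client("kinesis")

STREAM_NAME = "cloudfront-realtime-logs"                      # placeholder
CONSUMER_ARN = "arn:aws:kinesis:...:stream/.../consumer/..."  # placeholder

received = 0
for shard in kinesis.list_shards(StreamName=STREAM_NAME)["Shards"]:
    # Each subscription streams events for up to 5 minutes per call,
    # so iterating shards serially blocks on one shard at a time.
    response = kinesis.subscribe_to_shard(
        ConsumerARN=CONSUMER_ARN,
        ShardId=shard["ShardId"],
        StartingPosition={"Type": "TRIM_HORIZON"},
    )
    for event in response["EventStream"]:
        records = event["SubscribeToShardEvent"]["Records"]
        received += len(records)
        print(f"{shard['ShardId']}: +{len(records)} (total {received})")
```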

How can I avoid this kind of behavior, and how can I recover from it quickly?

Edit: I see missing logs even with 100/200 requests, so it is not shard capacity. Once this issue started, it was consistent across every test I ran. I have checked that I have enough Lambda concurrency, and memory usage is fine. I have tested with batch sizes of 100/1k/10k in the Lambda. I have also monitored wrk concurrency and the number of requests.

  • I was also looking into this just now and ran requests with a counter. In Kinesis, the first 50 records arrived as: 2, 4, 6, 14, 17, 21, 23, 26, 28, 30, 34, 36, 39, 42, 43, 44, 46, 49, 50, which is less than 50% of the requests.

    Which seems very odd, since the docs say: "The log entry for a particular request might be delivered long after the request was actually processed and, in rare cases, a log entry might not be delivered at all."

1 Answer

This is indeed a complex scenario, and it can be affected by a number of factors, including the capacity of your Kinesis stream, the configuration of your Lambda function, and the load generated by the wrk tool.

Here are a few things that might help:

  1. Kinesis Stream Shards: Remember that the total capacity of a Kinesis stream is the sum of the capacities of its shards. So, if your stream has a single shard, it can handle up to 1 MB or 1000 records of data per second for writes. If the incoming data exceeds this limit, Kinesis Data Streams throttles the data sources. In this case, you might want to add more shards to your stream.

  2. Lambda Throttling: AWS Lambda might also be throttling your function: if the number of invocation requests exceeds the available concurrency, invocations are throttled. To address this, you can either increase the function's reserved concurrency or optimize the function so it completes more quickly.

  3. Batch size and Timeout: Be aware that larger batch sizes can result in partial processing. For example, if a batch contains 10,000 records and your Lambda function processes only 9,000 of them before it times out, all 10,000 records are returned to the stream and retried. You might need to reduce your batch size or increase your function timeout (see the sketch after this list).

  4. Understand CloudFront Real-time Logs: As per the AWS documentation, CloudFront real-time logs are designed to deliver log entries typically within seconds of a viewer request, but delivery time can be longer. Some log entries might be delivered long after the viewer request was processed, and in rare cases, some log entries might not be delivered at all.
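
On point 3, one way to keep a timeout from pushing the entire batch back onto the stream is to report partial batch failures. A minimal sketch, assuming ReportBatchItemFailures is enabled on the event source mapping and that process_record stands in for your own logic:

```python
import base64

def process_record(record):
    # Placeholder for real processing of one real-time log record.
    base64.b64decode(record["kinesis"]["data"])

def lambda_handler(event, context):
    """Return the sequence number of the first failed record so only the
    remainder of the batch is retried, not the whole batch."""
    failures = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            failures.append(
                {"itemIdentifier": record["kinesis"]["sequenceNumber"]}
            )
            break
    return {"batchItemFailures": failures}
```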

To understand more about what's going on, you may need to add some debugging or logging to your Lambda function, or use Amazon CloudWatch or AWS X-Ray to get more visibility into how your function is behaving.
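
For example, a quick way to check whether throttling or consumer lag is involved is to pull a few CloudWatch metrics with boto3 (a rough sketch; the stream name, function name, and time window are placeholders):

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

def metric_values(namespace, metric, dimensions, stat="Sum"):
    """Fetch one statistic over the last hour in 5-minute periods."""
    resp = cloudwatch.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric,
        Dimensions=dimensions,
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=[stat],
    )
    return sorted(dp[stat] for dp in resp["Datapoints"])

stream = [{"Name": "StreamName", "Value": "cloudfront-realtime-logs"}]  # placeholder
func = [{"Name": "FunctionName", "Value": "realtime-log-processor"}]    # placeholder

print("Kinesis write throttles:", metric_values("AWS/Kinesis", "WriteProvisionedThroughputExceeded", stream))
print("Lambda throttles:       ", metric_values("AWS/Lambda", "Throttles", func))
print("Iterator age (ms, max): ", metric_values("AWS/Lambda", "IteratorAge", func, stat="Maximum"))
```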

Keep in mind that the wrk tool may also be contributing to the issue if it's generating more traffic than your resources can handle. Adjusting the load might be necessary to fit within the limits of your resources.

Lastly, if the problem persists, you might want to consider reaching out to AWS Support for more direct assistance with this issue. They might be able to provide more detailed insight into what's happening with your resources.

answered 10 months ago
  • In my case I sent 1 request to CloudFront per second with a fresh query string, and didn't involve a Lambda of any kind at the origin. The real-time logs in Kinesis pretty consistently have gaps of 2-3 records. This is on an on-demand Kinesis stream, and there's no other traffic happening here. It seems like records are being dropped even under the most trivial circumstances.

  • CloudFront delivers access logs on a best-effort basis.

  • There's a huge difference between "best effort" meaning whatever happens, happens, and documentation that states "[..] in rare cases, a log entry might not be delivered at all." Less than 50% delivery of logs isn't rare under any definition.
