This is indeed a complex scenario, and it can be affected by a number of factors, including the capacity of your Kinesis stream, the configuration of your Lambda function, and the load generated by the wrk tool.
Here are a few things that might help:
- Kinesis Stream Shards: Remember that the total capacity of a Kinesis stream is the sum of the capacities of its shards. If your stream has a single shard, it can handle up to 1 MB or 1,000 records of data per second for writes. If the incoming data exceeds this limit, Kinesis Data Streams throttles the data sources. In this case, you might want to add more shards to your stream.
- Lambda Throttling: AWS Lambda might also be throttling your function. If the number of invocation requests exceeds the available concurrency, Lambda throttles the function. To solve this, you can either increase the reserved concurrency for the function or optimize your function to complete more quickly.
- Batch Size and Timeout: Be aware that larger batch sizes can result in partial processing. For example, if a batch contains 10,000 records and your Lambda function only processes 9,000 of them before it times out, all 10,000 records are returned to the stream and retried. You might need to reduce your batch size or increase your function timeout.
- Understand CloudFront Real-time Logs: Per the AWS documentation, CloudFront real-time logs are designed to deliver log entries typically within seconds of a viewer request, but delivery can take longer. Some log entries might be delivered long after the viewer request was processed, and in rare cases, some log entries might not be delivered at all.
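To make the first three limits concrete, here is a rough back-of-the-envelope sketch. It's pure arithmetic with no AWS calls; the per-shard limits are the ones quoted above, and all other figures are illustrative:

```python
import math

# Per-shard write limits quoted above: 1 MB/s or 1,000 records/s.
SHARD_MAX_BYTES_PER_SEC = 1_000_000
SHARD_MAX_RECORDS_PER_SEC = 1_000

def shards_needed(records_per_sec: float, bytes_per_sec: float) -> int:
    """Minimum shard count to absorb a write load without throttling."""
    return max(1,
               math.ceil(records_per_sec / SHARD_MAX_RECORDS_PER_SEC),
               math.ceil(bytes_per_sec / SHARD_MAX_BYTES_PER_SEC))

def concurrency_estimate(invocations_per_sec: float, avg_duration_sec: float) -> int:
    """Approximate concurrent executions (Little's law: rate x duration).
    Reserved concurrency below this value will throttle invocations."""
    return math.ceil(invocations_per_sec * avg_duration_sec)

def safe_batch_size(timeout_sec: float, per_record_sec: float,
                    margin: float = 0.8) -> int:
    """Largest batch the function can finish inside its timeout,
    keeping a 20% margin for overhead."""
    return max(1, math.floor(timeout_sec * margin / per_record_sec))

# 2,500 records/s at ~600 bytes each needs 3 shards (record limit dominates).
print(shards_needed(2_500, 2_500 * 600))   # → 3
# 200 invocations/s at 0.5 s each keeps ~100 executions in flight.
print(concurrency_estimate(200, 0.5))      # → 100
# A 60 s timeout at 5 ms/record supports ~9,600 records per batch,
# so the 10,000-record batch above risks timing out.
print(safe_batch_size(60, 0.005))          # → 9600
```

Note how the last two interact: halving the function's duration both halves the concurrency you need and doubles the batch size you can safely process.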
In order to understand more about what's going on, you may need to add some debugging or logging to your Lambda function, or use AWS CloudWatch or AWS X-Ray to get more visibility into how your function is behaving.
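As one way to add that logging, here is a minimal handler sketch that records the batch size and each record's sequence number, which makes delivery gaps visible in CloudWatch. The event shape is the standard Lambda Kinesis record format; the handler body and the synthetic event are illustrative:

```python
import base64
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def handler(event, context):
    """Log batch size and each record's sequence number to aid debugging."""
    records = event.get("Records", [])
    log.info("batch_size=%d", len(records))
    decoded = []
    for r in records:
        payload = base64.b64decode(r["kinesis"]["data"]).decode("utf-8")
        log.info("seq=%s payload=%s", r["kinesis"]["sequenceNumber"], payload)
        decoded.append(payload)
    return {"processed": len(decoded)}

# Exercise locally with a synthetic Kinesis event:
event = {"Records": [{"kinesis": {
    "sequenceNumber": "49590338271490256608559692538361571095921575989136588898",
    "data": base64.b64encode(b"hello").decode()}}]}
print(handler(event, None))  # → {'processed': 1}
```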
Keep in mind that the wrk tool may also be contributing to the issue if it's generating more traffic than your resources can handle. Adjusting the load might be necessary to fit within the limits of your resources.
Lastly, if the problem persists, you might want to consider reaching out to AWS Support for more direct assistance with this issue. They might be able to provide more detailed insight into what's happening with your resources.
In my case I sent one request per second to CloudFront, each with a fresh query string, and didn't involve a Lambda of any kind at the origin. The real-time logs in Kinesis pretty consistently have gaps of 2-3 records. This is on an on-demand Kinesis stream with no other traffic. It seems like records are being dropped even under the most trivial circumstances.
CloudFront delivers access logs on a best-effort basis.
There's a huge difference between "best effort" meaning whatever happens happens, and documentation that states "[..] in rare cases, a log entry might not be delivered at all." Less than 50% delivery of logs isn't rare under any definition.
I was also looking into this just now and ran requests with a counter. Of the first 50 requests, the records that arrived in Kinesis were: 2, 4, 6, 14, 17, 21, 23, 26, 28, 30, 34, 36, 39, 42, 43, 44, 46, 49, 50. That's 19 of 50, which is less than 50% of the requests.
Which seems very odd, since the docs say: "The log entry for a particular request might be delivered long after the request was actually processed and, in rare cases, a log entry might not be delivered at all."
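For anyone reproducing this, the counter sequence above can be checked with a short sketch; the observed numbers are the ones reported in this comment:

```python
def delivery_gaps(observed, total):
    """Return the missing sequence numbers and the delivery ratio."""
    missing = sorted(set(range(1, total + 1)) - set(observed))
    return missing, len(observed) / total

# Record numbers observed in Kinesis out of 50 sequential requests:
observed = [2, 4, 6, 14, 17, 21, 23, 26, 28, 30, 34, 36, 39, 42,
            43, 44, 46, 49, 50]
missing, ratio = delivery_gaps(observed, 50)
print(f"missing={len(missing)} delivered={ratio:.0%}")  # → missing=31 delivered=38%
```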