Puzzling math when calculating record size for writes to S3 from Kinesis Firehose

Hi there, I'm looking at my CloudWatch metrics to try to estimate the average size of a record delivered to S3 via my Firehose delivery stream. Here's my math: I look at Records delivered to Amazon S3 (Sum) at two points in time (t1, t2) and see 363 records and 746 records respectively. However, if I look at the Bytes delivered to Amazon S3 (Sum) metric, I see 1,600 KB and 2,300 KB. Assuming all records are identical and no transformations are applied on my end, shouldn't there be a linear relationship between the two variables? I.e., shouldn't Bytes delivered to Amazon S3 (Sum) at t2 be roughly (746 / 363) × 1,600 KB ≈ 3,300 KB? Thanks for reading.
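For concreteness, here's the quick sanity check behind that expectation (pure arithmetic on the numbers above):

```python
# Sanity check: if bytes scale linearly with record count, the byte
# sum at t2 should grow by the same factor as the record sum.
records_t1, records_t2 = 363, 746
kb_t1, kb_t2 = 1600, 2300

expected_kb_t2 = kb_t1 * (records_t2 / records_t1)
print(f"expected ~{expected_kb_t2:.0f} KB at t2, observed {kb_t2} KB")
# -> expected ~3288 KB at t2, observed 2300 KB
```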

1 Answer

You are correct that if all records are identical in size and no transformations are applied, there should be a linear relationship between the number of records and the total bytes delivered. However, there are a few factors that can cause the relationship to be non-linear, such as:

Record size variation: If record sizes vary, the total bytes delivered won't scale linearly with the number of records. In that case, the average record size may simply differ between t1 and t2.

Data compression: Amazon Kinesis Data Firehose can compress data (e.g., with GZIP or Snappy) before delivering it to the S3 bucket; compression is optional and disabled by default. If it is enabled, the compression ratio varies with the nature of the data, so if the data delivered between t1 and t2 compresses differently, that could produce the non-linear relationship you observed.

Aggregation and batching: Kinesis Data Firehose buffers multiple records into a single S3 object, based on the configured buffering hints (buffer size and buffer interval). The size of the resulting object is not necessarily the simple sum of the individual record sizes, as the batching process can add some overhead. (The sketch below shows how to check the compression and buffering settings on your stream.)
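If you want to rule the last two factors in or out, you can inspect the delivery stream's destination configuration. Here is a minimal sketch using boto3; the stream name my-firehose-stream is a placeholder, and I'm assuming an extended S3 destination (older streams may expose the same fields under S3DestinationDescription instead):

```python
import boto3

firehose = boto3.client("firehose")

# Pull the delivery stream's destination settings.
resp = firehose.describe_delivery_stream(DeliveryStreamName="my-firehose-stream")
dest = resp["DeliveryStreamDescription"]["Destinations"][0]

# CompressionFormat is "UNCOMPRESSED" when compression is off; BufferingHints
# (SizeInMBs / IntervalInSeconds) controls how records are batched into objects.
s3 = dest.get("ExtendedS3DestinationDescription", {})
print("CompressionFormat:", s3.get("CompressionFormat"))
print("BufferingHints:", s3.get("BufferingHints"))
```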

To estimate the average size of a record, you can calculate the difference in bytes and the difference in the number of records between t1 and t2, then divide the bytes difference by the records difference:

(2,300 KB − 1,600 KB) / (746 records − 363 records) = 700 KB / 383 records ≈ 1.8 KB per record

However, keep in mind that the actual average record size might still vary due to the factors mentioned above.
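If it helps, here is a minimal sketch that pulls the two sums from CloudWatch and does the division programmatically. The metric names (DeliveryToS3.Records, DeliveryToS3.Bytes) are the standard Firehose delivery metrics behind the console charts; the stream name and time window are placeholders you'd swap for your own:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

def metric_sum(metric_name: str, start, end) -> float:
    """Sum of a Firehose delivery metric over [start, end)."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Firehose",
        MetricName=metric_name,
        Dimensions=[{"Name": "DeliveryStreamName", "Value": "my-firehose-stream"}],
        StartTime=start,
        EndTime=end,
        Period=3600,  # one datapoint per hour; summed across the window below
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])

end = datetime.now(timezone.utc)
start = end - timedelta(hours=6)

records = metric_sum("DeliveryToS3.Records", start, end)
bytes_delivered = metric_sum("DeliveryToS3.Bytes", start, end)

if records:
    print(f"average record size: {bytes_delivered / records / 1024:.1f} KB")
```

Dividing sums taken over the same window also avoids the sampling mismatch you can get reading two instantaneous points off a console graph.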

Expert
answered a year ago
  • Thank you for answering! Hmm, OK, so for each point:

    1. Record sizes obviously vary, but I figured that over hundreds of records they would average out to roughly the same value at any measured instant; I'll have to look into it.
    2. I am not telling Firehose to compress the data.
    3. I am not aggregating records; it's one record per object (for now).
    4. [edit] Never mind, I got it; it's been a while.

    So it would seem I made an incorrect assumption on point 1?
