Puzzling math when calculating the size of records written to S3 from Kinesis Firehose


Hi there, I'm looking at my CloudWatch metrics to try to estimate the average size of a record delivered to S3 via my Firehose delivery stream. Here's my math: I look at Records delivered to Amazon S3 (Sum) at two points in time (t1, t2) and see 363 records and 746 records respectively. However, if I look at the Bytes delivered to Amazon S3 (Sum) metric, I see 1,600 KB and 2,300 KB. Assuming all records are identical and no transformations are applied on my end, shouldn't there be a linear relationship between the two variables? I.e., shouldn't Bytes delivered to Amazon S3 (Sum) at t2 be about 1,600 KB × (746/363) ≈ 3,300 KB? Thanks for reading.

1 Answer

You are correct that if all records are identical in size and no transformations are applied, there should be a linear relationship between the number of records and the total bytes delivered. However, there are a few factors that can cause the relationship to be non-linear, such as:

Record size variation: If record sizes vary, the total bytes delivered will not scale linearly with the number of records. In that case, the average record size may simply be different between t1 and t2.

Data compression: Amazon Kinesis Data Firehose can be configured to compress data (for example, GZIP or Snappy) before delivering it to the S3 bucket. If compression is enabled, the compression ratio can vary with the nature of the data, so the data delivered by t1 and by t2 can compress differently even when record counts are similar. You can check whether compression is configured on the stream, as in the sketch after these factors.

Buffering and batching: Kinesis Data Firehose buffers multiple records and delivers them to the S3 bucket as a single object, based on the stream's buffering hints (size and interval). The size of the delivered object might not be a simple sum of the sizes of the individual records, as there can be some overhead associated with the batching process.
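
If you want to rule out compression and buffering as possible factors, you can inspect the delivery stream's configuration directly. Below is a minimal sketch using boto3; the stream name "my-delivery-stream" is a placeholder, and it assumes the stream uses an S3 or Extended S3 destination.

```python
import boto3

firehose = boto3.client("firehose")

# Describe the delivery stream (placeholder name) and pull out its S3 destination.
resp = firehose.describe_delivery_stream(DeliveryStreamName="my-delivery-stream")
destination = resp["DeliveryStreamDescription"]["Destinations"][0]

# Extended S3 destinations expose their settings under ExtendedS3DestinationDescription;
# older plain S3 destinations use S3DestinationDescription.
s3_config = (destination.get("ExtendedS3DestinationDescription")
             or destination.get("S3DestinationDescription"))

print("Compression format:", s3_config["CompressionFormat"])  # e.g. UNCOMPRESSED, GZIP
print("Buffering hints:", s3_config["BufferingHints"])        # e.g. {'SizeInMBs': 5, 'IntervalInSeconds': 300}
```

If the compression format comes back as UNCOMPRESSED, compression is not the cause of the discrepancy.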

To estimate the average size of a record, you can calculate the difference in bytes and the difference in the number of records between t1 and t2, then divide the bytes difference by the records difference:

(2,300 KB − 1,600 KB) / (746 records − 363 records) ≈ 700 KB / 383 records ≈ 1.8 KB per record

However, keep in mind that the actual average record size might still vary due to the factors mentioned above.
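
If you'd rather script this calculation than read the numbers off the console graphs, here is a minimal sketch using boto3 against the CloudWatch metrics in the AWS/Firehose namespace (DeliveryToS3.Bytes and DeliveryToS3.Records, which back the charts you're looking at). The stream name, time window, and period are placeholder assumptions.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

def metric_sum(metric_name, stream_name, start, end):
    """Sum a Firehose delivery metric over the [start, end) window."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Firehose",
        MetricName=metric_name,
        Dimensions=[{"Name": "DeliveryStreamName", "Value": stream_name}],
        StartTime=start,
        EndTime=end,
        Period=3600,          # one datapoint per hour; adjust to taste
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])

stream = "my-delivery-stream"        # placeholder
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)     # placeholder window

delivered_bytes = metric_sum("DeliveryToS3.Bytes", stream, start, end)
delivered_records = metric_sum("DeliveryToS3.Records", stream, start, end)

if delivered_records:
    print(f"Average record size: {delivered_bytes / delivered_records:.1f} bytes")
```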

answered a year ago
  • Thank you for answering!! Hmm ok so for:

    1. Record sizes do vary, but I figured that over hundreds of records they would average out to roughly the same value at any measured point in time; I will have to look into it.
    2. I am not telling Firehose to compress the data.
    3. I am not aggregating objects; 1 record per object (for now).
    4. [edit] Never mind, I got it. It's been a while.

    So it would seem I made an incorrect assumption on 1?
