How to post separate messages to Kinesis


Hi, I'm using Kinesis + Firehose to write the received messages to S3, with each message represented as a single file. However, I found that all the messages I write to Kinesis get serialised into a single file, instead of separate files. What could be causing this behaviour?

Context: I'm using the .NET client to generate and write JSON messages to Kinesis.

Example:

using System.IO;
using System.Text;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// post JSON messages from jsonObjects array
foreach (var jsonMessage in jsonObjects)
{
    var postMessage = PutKinesisData(awsClient, jsonMessage);
}

PutRecordResponse PutKinesisData(AmazonKinesisClient client, JObject jsonObject)
{
    // serialise the JObject and encode it as UTF-8 bytes
    var jsonString = JsonConvert.SerializeObject(jsonObject);
    byte[] messageObj = Encoding.UTF8.GetBytes(jsonString);

    // build the request for a single record on the stream
    var request = new PutRecordRequest();
    request.StreamName = KINESIS_STREAM_NAME;
    request.PartitionKey = PARTITION_KEY;
    request.Data = new MemoryStream(messageObj);

    // blocking call; this snippet runs in a synchronous context
    var response = client.PutRecordAsync(request).Result;
    return response;
}

As per the documentation, PutRecord is supposed to post a single record at a time, but for some reason I'm ending up with a single file in S3 that contains all my JSON messages combined.

Why this is an issue: with the current behaviour, my JSON messages are written to a single S3 file as {json1}{json2}, which makes the resulting file invalid JSON, instead of being stored as separate JSON files.

Michael
asked 2 years ago · 747 views
1 Answer

When you add data to your Kinesis Data Stream, each message is persisted in the stream as an individual record. With PutRecord() you are indeed adding a single record to the stream.

Kinesis Data Firehose is the easiest way to load streaming data into data stores. With Kinesis Data Firehose you create Delivery Streams. A Delivery Stream has a Source to read data from and a Destination to deliver the streaming data to. In your case the Kinesis Data Stream is the Source and S3 is the Destination. However, Firehose does not write the data from the Data Stream as one record per file.

The frequency of data delivery to Amazon S3 is determined by the S3 buffer size and buffer interval values you configured for your Delivery Stream. Kinesis Data Firehose buffers incoming data before delivering it to Amazon S3. You can configure the S3 buffer size (1 MB to 128 MB) and the buffer interval (60 to 900 seconds); whichever condition is satisfied first triggers data delivery to Amazon S3. Firehose then writes one file to S3 containing all the records in the buffer, and starts buffering records again until the buffer size or buffer interval condition is met once more.
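
For illustration, here is a minimal sketch of how those buffer size and buffer interval values can be set when creating a Delivery Stream with the AWS SDK for .NET (assuming the AWSSDK.KinesisFirehose package; the stream name, bucket and role ARNs below are placeholders):

using System.Threading.Tasks;
using Amazon.KinesisFirehose;
using Amazon.KinesisFirehose.Model;

async Task CreateDeliveryStreamWithBufferingHints()
{
    var firehose = new AmazonKinesisFirehoseClient();

    await firehose.CreateDeliveryStreamAsync(new CreateDeliveryStreamRequest
    {
        DeliveryStreamName = "my-delivery-stream",                     // placeholder
        DeliveryStreamType = DeliveryStreamType.KinesisStreamAsSource,
        KinesisStreamSourceConfiguration = new KinesisStreamSourceConfiguration
        {
            KinesisStreamARN = "arn:aws:kinesis:region:account:stream/my-stream", // placeholder
            RoleARN = "arn:aws:iam::account:role/firehose-role"                   // placeholder
        },
        ExtendedS3DestinationConfiguration = new ExtendedS3DestinationConfiguration
        {
            BucketARN = "arn:aws:s3:::my-bucket",                // placeholder
            RoleARN = "arn:aws:iam::account:role/firehose-role", // placeholder
            // Firehose flushes one S3 object when EITHER condition is met first
            BufferingHints = new BufferingHints
            {
                SizeInMBs = 1,          // 1 MB to 128 MB
                IntervalInSeconds = 60  // 60 to 900 seconds
            }
        }
    });
}

Even with the smallest buffer settings, each flush can still contain multiple records, so this controls how often files are written, not a one-record-per-file mapping.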

You can find more information on Kinesis Data Firehose streaming concepts and how it uses Data Sources and Data Delivery in the Kinesis Data Firehose FAQs.

AWS
answered 2 years ago
  • ok I see, that makes sense. But there seems to be a flaw in the way the Kinesis Firehose buffer behaves, as it simply appends all received messages together without any delimiter, breaking their structure. For text files this might not be a critical issue, but binary messages would all become invalid when saved to S3, because their integrity is lost when they are combined together.

    Any advice on how to tackle this situation?
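
One common way to tackle this for JSON payloads, sketched below rather than taken from the answer above, is to append a newline delimiter to each record on the producer side; the object Firehose writes to S3 then becomes newline-delimited JSON, one valid document per line. This reuses the types from the question's code, with PutDelimitedKinesisData as a hypothetical name:

// Sketch only: same Newtonsoft.Json and AWSSDK.Kinesis setup as the question;
// PutDelimitedKinesisData is a hypothetical helper name.
PutRecordResponse PutDelimitedKinesisData(AmazonKinesisClient client, JObject jsonObject)
{
    // serialise the message and append "\n" as the record delimiter
    var jsonString = JsonConvert.SerializeObject(jsonObject) + "\n";
    byte[] messageBytes = Encoding.UTF8.GetBytes(jsonString);

    var request = new PutRecordRequest
    {
        StreamName = KINESIS_STREAM_NAME,  // as in the question
        PartitionKey = PARTITION_KEY,      // as in the question
        Data = new MemoryStream(messageBytes)
    };

    return client.PutRecordAsync(request).Result;
}

This only helps for text formats such as JSON; binary payloads would need some other framing, or a delivery path that keeps messages separate.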
