AWS Kinesis Firehose direct put latency


Background: We have a Kinesis Data Firehose delivery stream configured to deliver to an encrypted S3 bucket, converting records from JSON to Parquet. The source is Direct PUT. The buffer hints are set to 64 MiB and an interval of 60 seconds. There is no encryption, dynamic partitioning, or deaggregation configured. The AWS Glue database and table are stored in a different AWS account.

Observations: Intermittently (every week or two) we see PutRecord take 15+ seconds to respond when called from an ECS Fargate application written in Go. Response times of 1-5 seconds occur more frequently, with the same symptoms. No errors are reported in the PutRecord response. The Firehose metrics show no throttled records and no errors in delivery to S3. Incidents of poor response time don't appear to be correlated with the highest volumes (based on records and bytes converted, incoming bytes, and incoming records), and put requests are substantially below the limits.

Questions:

  1. What performance is reasonable to expect for Direct PUT against Firehose?
  2. What metrics should I view to uncover root cause of these performance delays?
  3. When does the Firehose do its conversion (i.e., when data is written to the firehose does it parse the JSON and convert it to parquet before responding with success, or is it parsing and converting only when being read from the firehose to put in S3)?
  4. Would fronting the Firehose with a Kinesis Data Stream likely provide a performance improvement?
1 Answer

Hello,

  1. What performance expectations are reasonable to expect for DirectPut against Firehose?

Kinesis Data Firehose has default quotas, and you can expect it to perform within the limits stated in the documentation.

[+] Amazon Kinesis Data Firehose Quota - https://docs.aws.amazon.com/firehose/latest/dev/limits.html

  2. What metrics should I view to uncover root cause of these performance delays?

The metrics you can refer to are PutRecord.Latency (or PutRecordBatch.Latency if you use batch calls), IncomingBytes, IncomingRecords, and IncomingPutRequests. I would recommend opening a Support case so that your Firehose delivery stream can be reviewed and the issue identified more precisely.

[+] Monitoring Kinesis Data Firehose Using CloudWatch Metrics - https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-metrics.html

  3. When does the Firehose do its conversion (i.e., when data is written to the firehose does it parse the JSON and convert it to parquet before responding with success, or is it parsing and converting only when being read from the firehose to put in S3)?

Firehose does not convert the schema itself, and it returns the response before the schema conversion process begins.

  4. Would fronting the Firehose with a Kinesis Datastream likely provide performance improvement?

The Direct PUT method has a maximum quota of data it can process, per the limits in the documentation linked above; beyond that quota, requests are throttled. You can request a limit increase through the following link:

[+] https://console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase&limitType=service-code-kinesis-firehose

By contrast, when Kinesis Data Streams is the source for Firehose, the throughput quota of Firehose depends on the number of shards in the Kinesis Data Stream. Increasing the capacity of the Kinesis Data Stream increases the throughput available to Firehose. Take this into account when deciding which option is better suited to your use case.
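To make the shard dependency concrete, here is a rough sizing sketch in Go. It assumes the commonly documented per-shard write capacity for provisioned streams of about 1 MB per second and 1,000 records per second; those figures are passed in as parameters so they can be checked against the current Kinesis Data Streams quotas. The function name `shardsNeeded` and the sample traffic numbers are hypothetical.

```go
package main

import "fmt"

// ceilDiv returns ceil(a / b) for non-negative a and positive b.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

// shardsNeeded estimates how many shards are required to absorb the given
// write rate. Each shard is limited by both bytes and records per second,
// so the larger of the two requirements wins. At least one shard is returned.
func shardsNeeded(bytesPerSec, recordsPerSec, shardBytesPerSec, shardRecordsPerSec int64) int64 {
	byBytes := ceilDiv(bytesPerSec, shardBytesPerSec)
	byRecords := ceilDiv(recordsPerSec, shardRecordsPerSec)
	n := byBytes
	if byRecords > n {
		n = byRecords
	}
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// Assumed per-shard write capacity; verify against current quotas.
	const shardBytes, shardRecords = 1_000_000, 1_000

	// Hypothetical peak traffic: 2.5 MB/s and 1,200 records/s.
	fmt.Println("shards:", shardsNeeded(2_500_000, 1_200, shardBytes, shardRecords))
}
```

This only estimates ingest capacity. It says nothing about per-call latency, which is what the question is about, so it is best used to rule capacity in or out as a factor.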

AWS
SUPPORT ENGINEER
Aman_A
answered 8 months ago
