About AWS Kinesis Product Stack

0

I'm coming from the Azure world and trying to understand the role each product plays in real-time orchestration. I would be grateful for some guidance on the comparisons below, and for some documentation to explore the AWS Kinesis stack:

  • AWS Kinesis Data Streams / Azure Event Hubs: message queue
  • AWS Kinesis Firehose / Azure Stream Analytics: stream processor, and integration of Kinesis Data Streams with other products
  • AWS Kinesis Data Analytics / Azure Stream Analytics: stream processor for real-time data in motion

I'm a little confused about the boundary between Firehose and Data Analytics.

Besides, I created a SAM project where Lambda functions are triggered by records arriving on a Kinesis Data Stream. I connected them in two ways:

  • Directly to the Data Stream: the Data Stream triggers the Lambda. Since the Lambda is associated with the Data Stream itself, I assume there must be some kind of "default consumer group" (as exists in Azure Event Hubs).
  • Through a Stream Consumer: in this case, I assume I'm connecting Kinesis to the Lambda via a "consumer group".

I'd appreciate any guidance, explanations and, finally, documentation.

Regards Jona

4 Answers
1

Hi

Whether AWS Kinesis is definitively "better" than Azure Event Hubs depends on your specific needs and priorities.

AWS Kinesis Advantages:

Simple Setup & Integration: Kinesis is a fully managed service, making it easier to set up and use, especially for those already invested in the AWS ecosystem.

Seamless Integration with Other AWS Services: Kinesis integrates well with other AWS services like Lambda, S3, and Redshift, simplifying data pipelines.

Cost-Effectiveness for Simple Use Cases: For basic data ingestion and triggering Lambda functions, Kinesis can be a cost-effective option.

AWS Kinesis Data Streams vs. Azure Event Hubs: Both are message queues for ingesting real-time data streams. They offer similar functionalities like scaling, durability, and low latency. However, Event Hubs supports more protocols (AMQP, Kafka) for broader integration.

AWS Kinesis Firehose vs. Azure Stream Analytics: Here's where the analogy gets a bit stretched. Firehose is a data delivery service: it takes data from Kinesis Data Streams (or directly from producers) and delivers it to destinations like S3, Redshift, or Elasticsearch. It is not a stream processor, although it can apply light per-record transformations through a Lambda function before delivery. Azure Stream Analytics, on the other hand, is a true stream processor, allowing real-time transformations and aggregations on your data stream.

AWS Kinesis Data Analytics vs. Azure Stream Analytics: Both are stream processing engines for real-time data analysis. They let you write code (SQL-like or custom) to manipulate data as it flows in. Kinesis Data Analytics offers serverless execution, while Stream Analytics provides more control over infrastructure.

Please follow the links for more info: https://aws.amazon.com/kinesis/data-streams/

https://aws.amazon.com/firehose/

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/what-is.html

https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-kinesis.html

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • I see that you edited your answer. It is a very good answer. However, I didn't mention the Azure products to compare them with the AWS offering, but as a reference point for understanding the role of each Kinesis product (specifically, the role of a StreamConsumer inside a Data Stream).

    I didn't mention Azure to figure out which cloud provider is "better" at data processing products.

    Regards

1

Here are some key points to help you understand the differences between AWS Kinesis services and how they can be used:

AWS Kinesis Data Streams vs Azure Event Hubs:
    Both are message queuing services that can ingest and store real-time data streams.
    Kinesis Data Streams is the AWS equivalent of Azure Event Hubs.
    Key differences include pricing models, scalability, and integration with other AWS services.

AWS Kinesis Firehose vs Azure Stream Analytics:
    Kinesis Firehose is a fully managed service that can collect, transform, and load streaming data into AWS destinations like S3, Redshift, or Elasticsearch.
    Azure Stream Analytics is a real-time analytics service that can process and analyze streaming data from various sources.
    Kinesis Firehose focuses more on data ingestion and delivery, while Azure Stream Analytics emphasizes real-time stream processing.
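
The "transform" step mentioned above is usually done by a Lambda function that Firehose invokes on each buffered batch. The following is a minimal sketch rather than a definitive implementation: it assumes the incoming records are JSON, and the `processed` field it adds is purely hypothetical. The handler has to return every record with its original `recordId`, a `result` status, and base64-encoded `data`.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation sketch: tag each JSON record and hand it back."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical enrichment, for illustration only
            line = json.dumps(payload) + "\n"  # newline-delimited JSON for the destination
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            result = "Ok"
        except (ValueError, TypeError):
            # Not valid base64/JSON, or not a JSON object: return it unchanged, marked failed
            data = record["data"]
            result = "ProcessingFailed"
        output.append({"recordId": record["recordId"], "result": result, "data": data})
    return {"records": output}
```

Anything heavier than this kind of per-record reshaping (windows, joins, aggregations) belongs in a stream processor rather than in Firehose.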

AWS Kinesis Data Analytics vs Azure Stream Analytics:
    Both services provide capabilities for real-time stream processing and analytics.
    Kinesis Data Analytics is optimized for stateful stream processing using Apache Flink, while Azure Stream Analytics has a broader set of built-in analytics capabilities.
    The choice between the two depends on your specific requirements, such as language support, processing model, and integration with other services.

Kinesis Data Stream and Lambda Integration:
    When you associate a Lambda function with a Kinesis Data Stream, Lambda polls the stream on your behalf and invokes the function with batches of new records.
    There is no concept of "default consumer group" in Kinesis Data Streams. Instead, the Lambda function acts as the consumer, processing the records from the stream.
    The integration between Kinesis Data Streams and Lambda is handled by the AWS service, and you don't need to manage a separate consumer group.
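
To illustrate what the function receives, here is a minimal handler sketch. It assumes the producers write JSON payloads (an assumption, not something stated in the question); the `decode_records` helper name is hypothetical. Each record's payload arrives base64-encoded under `kinesis.data`, next to its partition key and sequence number.

```python
import base64
import json


def decode_records(event):
    """Yield (partition_key, sequence_number, payload) for each Kinesis record."""
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Payloads are base64-encoded; JSON content is an assumption of this sketch
        payload = json.loads(base64.b64decode(kinesis["data"]))
        yield kinesis["partitionKey"], kinesis["sequenceNumber"], payload


def lambda_handler(event, context):
    processed = 0
    for key, sequence, payload in decode_records(event):
        print(f"partition_key={key} sequence={sequence} payload={payload}")
        processed += 1
    return {"processed": processed}
```

The handler looks the same whether the event source mapping points at the stream ARN or at a registered consumer ARN; only the way Lambda reads from the shards differs.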

For more detailed and up-to-date information, please refer to the AWS Documentation:

Amazon Kinesis Data Streams

Amazon Kinesis Data Firehose

Amazon Kinesis Data Analytics

Integrating AWS Lambda with Amazon Kinesis Data Streams

AWS
answered a year ago
0

Thanks. So, my last question:

What is the role of a StreamConsumer component in an integration with Lambda? As an example:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Python Kinesis Consumer

Parameters:
  KinesisStreamName:
    Type: String

Resources:
  KinesisOrdersStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: !Ref KinesisStreamName
      ShardCount: 1

  OrdersEfoConsumerMapping:
    Type: AWS::Kinesis::StreamConsumer
    Properties:
      ConsumerName: orders-efo-consumer
      StreamARN: !GetAtt KinesisOrdersStream.Arn

  OrdersConsumerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: orders_consumer/
      Handler: app.lambda_handler
      Runtime: python3.11
      Timeout: 30
      MemorySize: 128
      FunctionName: consumer-01
      Events:
        StreamRecordsBatch:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisOrdersStream.Arn
            BatchSize: 20
            MaximumBatchingWindowInSeconds: 45
            StartingPosition: TRIM_HORIZON
            MaximumRetryAttempts: 3
            Enabled: true
            BisectBatchOnFunctionError: true

  OrdersEfoConsumerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: orders_efoconsumer/
      Handler: app.lambda_handler
      Runtime: python3.11
      Timeout: 30
      MemorySize: 128
      FunctionName: consumer-02
      Events:
        StreamRecordsBatch:
          Type: Kinesis
          Properties:
            Stream: !GetAtt OrdersEfoConsumerMapping.ConsumerARN
            BatchSize: 20
            MaximumBatchingWindowInSeconds: 45
            StartingPosition: LATEST
            MaximumRetryAttempts: 3
            Enabled: true
            BisectBatchOnFunctionError: true

As you can see in this SAM app, one Lambda is connected to the Data Stream and the other to a Stream Consumer (created on that same Data Stream). Beyond pricing models, what is the difference between connecting my Lambda to a Data Stream and connecting it to a Stream Consumer?

I know a Stream Consumer offers better consumption capacity, but I need to understand the role of this component beyond pricing, capacity, etc.
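
For reference, the capacity difference mentioned above can be sketched numerically. This is a rough model under stated assumptions, not an official calculator: it uses the documented limits of 2 MB/s of read throughput per shard shared by all standard consumers, and a dedicated 2 MB/s per shard for each registered enhanced fan-out consumer. The function name is hypothetical, and the even split among shared consumers is a best-case simplification.

```python
SHARD_READ_MB_PER_SEC = 2.0  # documented per-shard read limit


def read_throughput_per_consumer(shards, shared_consumers, efo_consumers):
    """Approximate MB/s available to each consumer of each kind."""
    # Standard consumers split one 2 MB/s pipe per shard (even split assumed)
    shared = SHARD_READ_MB_PER_SEC * shards / shared_consumers if shared_consumers else 0.0
    # Each registered enhanced fan-out consumer gets its own 2 MB/s per shard
    efo = SHARD_READ_MB_PER_SEC * shards if efo_consumers else 0.0
    return {"shared": shared, "enhanced_fan_out": efo}


# The template above: 1 shard, one direct Lambda, one Lambda on a registered consumer
print(read_throughput_per_consumer(shards=1, shared_consumers=1, efo_consumers=1))
# Adding three more direct consumers shrinks only the shared figure
print(read_throughput_per_consumer(shards=1, shared_consumers=4, efo_consumers=1))
```

In this model the registered consumer's figure does not change as more consumers are attached, which is the isolation that the "Consumers with enhanced fan-out" entry represents.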

Regards. Jona

answered a year ago
  • In the context of AWS Kinesis Data Streams, there is no explicit "StreamConsumer" component. Instead, the AWS Lambda function acts as the consumer, processing the records from the Kinesis Data Stream.

    When you associate a Lambda function with a Kinesis Data Stream, the Lambda function is automatically invoked whenever new records are added to the stream. This integration is handled by the AWS service, and you don't need to manage a separate consumer group.

    The Lambda function processes the records from the Kinesis Data Stream, performing any necessary transformations or processing logic. This allows you to build a completely serverless data streaming pipeline, where the data ingestion, processing, and storage are all managed by AWS services.

    The integration between Kinesis Data Streams and Lambda is seamless, and you don't need to explicitly create or manage a "StreamConsumer" component. The AWS service handles the coordination between the data stream and the Lambda function.

  • Thanks for your response, but it doesn't actually answer the question about the role of a StreamConsumer inside a Data Stream.

    The example above creates a StreamConsumer resource (AWS::Kinesis::StreamConsumer), so an explicit StreamConsumer component does exist in Kinesis.

    When I deployed the SAM app above, an entry appeared in the "Consumers with enhanced fan-out" column in the list of Data Streams. Again, regardless of its capacity, pricing model, etc., I need to understand the role of this component (the StreamConsumer, also known as a consumer with enhanced fan-out) in the context of real-time orchestration.

    Regards Jona

0

Just stopping by again to ask whether somebody can enlighten me about the role of the StreamConsumer component of a Data Stream.

answered a year ago
