Kinesis Data Stream to Kinesis Data Firehose


Hi,

I have seen a lot of examples where data records are sent from KDS to KDF even when NO real-time processing is required. Why can't we ingest data directly into KDF and store the records in a data lake, then maybe use Redshift for analytics, just as an example? Please see the link below for an architecture example - I don't understand the need for KDS here.

https://aws.amazon.com/blogs/business-intelligence/how-medhosts-cardiac-risk-prediction-successfully-leveraged-aws-analytic-services/

Thank you

1 Answer
Accepted Answer

The use of Amazon Kinesis Data Streams (KDS) in the architecture described in the blog post is likely motivated by the following:

Real-time Processing: While the blog post does not explicitly mention real-time processing requirements, the use of KDS suggests that there may be a need for low-latency, real-time processing of the data. KDS is designed to handle real-time streaming data and can be integrated with other AWS services like AWS Lambda for event-driven processing.
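As a concrete illustration of that event-driven pattern, a Lambda function attached to a KDS stream receives batches of records whose payloads arrive base64-encoded. This is a minimal sketch of such a handler (the payload fields are hypothetical, not from the blog post's application):

```python
import base64
import json

def handler(event, context):
    """Sketch of a Lambda handler invoked by a KDS event source mapping.
    Each record's data field arrives base64-encoded."""
    results = []
    for record in event["Records"]:
        # Decode the base64 payload, then parse it as JSON.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Real-time filtering/enrichment would happen here.
        results.append(payload)
    return results
```

This is the kind of low-latency, per-record processing KDF alone does not offer: KDF buffers and delivers to a destination, while a KDS consumer can react to each record as it arrives.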

Decoupling Data Ingestion and Analytics: By using KDS as an intermediary between the data source and the analytics pipeline, the architecture decouples the data ingestion and the data processing/analytics components. This allows for more flexibility and scalability, as the data ingestion and analytics can be scaled independently based on the workload.

Ordering and Delivery Semantics: KDS provides at-least-once delivery with strict ordering within a shard, and records remain available for replay during the stream's retention period. Because retries can surface the same record more than once, consumers that need exactly-once effects typically deduplicate, for example by tracking sequence numbers. This can be important for use cases where data integrity and consistency are critical.
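Consumers that must apply each record exactly once usually deduplicate by sequence number on their side. A minimal, purely illustrative sketch (all names hypothetical):

```python
def process_once(records, seen, apply):
    """Apply each record at most once despite duplicate deliveries.

    records: iterable of (sequence_number, payload) pairs as a consumer
             might receive them, possibly with repeats after a retry.
    seen:    a set of sequence numbers already applied (persisted in a
             real system, e.g. in DynamoDB).
    apply:   the side-effecting operation to run once per record.
    """
    for seq, payload in records:
        if seq in seen:
            continue  # duplicate delivery; skip
        apply(payload)
        seen.add(seq)
```

In production the `seen` set would live in durable storage so deduplication survives consumer restarts; the in-memory set here just shows the idea.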

Durability and Scalability: KDS provides durable storage and scalable throughput, which can be important for handling large volumes of streaming data. This can help ensure that the data is not lost and can be processed at the required scale.
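The scalability comes from sharding: KDS routes each record to a shard by MD5-hashing its partition key into a 128-bit integer and matching it against each shard's hash-key range. A toy sketch of that routing, assuming evenly split ranges:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Mimic KDS shard routing: MD5-hash the partition key into a
    128-bit integer, then map it onto evenly divided hash-key ranges."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    range_size = 2 ** 128 // num_shards
    # Clamp in case integer division leaves a sliver at the top of the range.
    return min(h // range_size, num_shards - 1)
```

The practical consequence: throughput scales by adding shards, and records sharing a partition key always land on the same shard, which is what preserves per-key ordering.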

Instead of directly ingesting the data into Amazon Kinesis Data Firehose (KDF) or Amazon Redshift, the architecture in the blog post uses KDS as an intermediary layer. This approach can provide the following benefits:

Flexibility: By using KDS, the architecture can easily integrate with other real-time processing or analytics services, such as AWS Lambda, Amazon Kinesis Data Analytics, or Amazon Kinesis Data Firehose.
Scalability: KDS can handle high-throughput, real-time data streams at a scale and latency that may be difficult to achieve with direct ingestion into KDF or Redshift.
Reliability: KDS provides durability and fault tolerance, ensuring that the data is not lost in the event of failures or other issues.
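The flexibility point above hinges on a key difference: KDF delivers each record once to a single configured destination, whereas a KDS stream retains records so any number of independent consumers can read (and re-read) them. A toy in-memory stand-in for a single shard illustrates this:

```python
class ToyStream:
    """Toy stand-in for one KDS shard: an append-only log that any
    number of consumers can read from any position (replay).
    Records are retained, never consumed by a read."""

    def __init__(self):
        self.log = []

    def put(self, record):
        self.log.append(record)

    def read(self, from_seq=0):
        # Each consumer tracks its own position independently.
        return list(self.log[from_seq:])
```

With this model, a Firehose-style archiver and a real-time Lambda consumer can both read the same stream without interfering with each other, and a new consumer added later can replay history still inside the retention window.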

However, it's important to note that the specific use case and requirements of the application will ultimately determine the most appropriate architecture. The decision to use KDS, KDF, or Redshift (or a combination of these services) should be based on factors such as the volume and velocity of the data, the need for real-time processing, the desired level of data durability and reliability, and the overall cost and operational complexity.

AWS
answered 13 days ago
  • Thank you so much for the details. I really appreciate the time you took to elaborate in such a great way.
