Kinesis data ingestion


Hi AWS, there is a question:

A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data sources. The data is in JSON format and ingestion rates can be as high as 1 MB/s. When an EC2 instance is rebooted, the data in-flight is lost. The company’s data science team wants to query ingested data in near-real time.

Which solution provides near-real-time data querying that is scalable with minimal data loss?

  A. Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
  B. Publish data to Amazon Kinesis Data Firehose with Amazon Redshift as the destination. Use Amazon Redshift to query the data.
  C. Store ingested data in an EC2 instance store. Publish data to Amazon Kinesis Data Firehose with Amazon S3 as the destination. Use Amazon Athena to query the data.
  D. Store ingested data in an Amazon Elastic Block Store (Amazon EBS) volume. Publish data to Amazon ElastiCache for Redis. Subscribe to the Redis channel to query the data.

Kinesis Data Streams is used for real-time ingestion, whereas Kinesis Data Firehose is near-real-time, so option (B) should be correct. However, the poll says option (A) is the right one because, according to it, "Redshift would lack real-time capabilities."

This is not true. Redshift can handle real-time analytics through streaming ingestion. Evidence: https://aws.amazon.com/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/

Please advise.

1 Answer

Hi.

Option A is actually correct. The question asks for minimal data loss and for near-real-time querying of the data, not near-real-time ingestion. Kinesis Data Analytics queries data in near-real time.

Recent changes to Redshift (streaming ingestion) make B a workable answer as well, but A remains correct.
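To make the "publish to Kinesis Data Streams" part of option A concrete, here is a minimal Python sketch of a producer that an ingesting EC2 instance might run. It assumes boto3 and its `put_record` call on the Kinesis client; the stream name `ingest-stream`, the `source_id` partition-key field, and the simple retry loop are all hypothetical choices for illustration, not something the question specifies.

```python
import json
import time


def build_record(event: dict, key_field: str = "source_id") -> dict:
    """Serialize one JSON event into the Data/PartitionKey pair put_record expects."""
    return {
        "Data": (json.dumps(event) + "\n").encode("utf-8"),
        # Records sharing a partition key are routed to the same shard.
        # "source_id" is a hypothetical field name.
        "PartitionKey": str(event.get(key_field, "unknown")),
    }


def publish(client, stream_name: str, event: dict,
            retries: int = 3, backoff_s: float = 0.2) -> dict:
    """Send one event, retrying so a transient error does not drop it."""
    record = build_record(event)
    for attempt in range(retries):
        try:
            return client.put_record(StreamName=stream_name, **record)
        except Exception:  # simplification: real code would catch specific errors
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))


def make_client():
    """Create a real Kinesis client (needs boto3 and AWS credentials)."""
    import boto3
    return boto3.client("kinesis")


# Example use (assumes a stream named "ingest-stream" already exists):
# publish(make_client(), "ingest-stream", {"source_id": "sensor-1", "value": 42})
```

Once a record is accepted by the stream, it is stored durably for the stream's retention period, so an EC2 reboot after that point no longer loses it. Only events not yet sent remain at risk.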

Expert
answered 10 months ago
Expert
reviewed 10 months ago
