KDA (Flink) to S3


A customer is building a streaming solution that processes messages from their customers. Data in the stream is sent from multiple customers - each JSON in the stream has a customerId that tells you what customer originally sent the data.

They want to store a copy of this data on S3, but partitioned by customerId. We could do this with multiple Firehoses (one for each customer). Is this possible to do with Flink - so use Flink to pull from the Kinesis stream and send to multiple destinations on S3, into files based on customer Id? I'm guessing this is possible - is it easy and can we use any existing libraries?


asked 4 years ago629 views
1 Answer
Accepted Answer

This post describes how you can use KDA/Flink for exactly that use case: https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/. The section "Persisting data in Amazon S3 with data partitioning" describes how to realize data partitioning and you can find the sources on GitHub: https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl.

answered 4 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions