KDA (Flink) to S3

Question

A customer is building a streaming solution that processes messages from their customers. The stream carries data from multiple customers; each JSON record in the stream has a customerId field that identifies the customer that originally sent the data.

They want to store a copy of this data on S3, partitioned by customerId. We *could* do this with multiple Firehoses (one per customer). Is it possible to do this with Flink instead, that is, use Flink to read from the Kinesis stream and write to multiple destinations on S3, with files organized by customerId? I'm guessing this is possible. Is it easy, and can we use any existing libraries?

Thanks

Accepted Answer

This post describes how to use KDA/Flink for exactly this use case: https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/. The section "Persisting data in Amazon S3 with data partitioning" explains how to implement the partitioning, and the source code is available on GitHub: https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl.
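
To make the partitioning idea concrete, here is a minimal, library-free sketch of the mapping involved: each JSON record is assigned an S3 key prefix derived from its customerId. This is not the code from the linked post or repository. The function names, the `customerId=<id>` prefix layout, and the `customerId=unknown` fallback are assumptions made for illustration. In a Flink job, logic like this would typically sit in a custom bucket assigner attached to the file sink, so that records with the same prefix end up in the same S3 "directory".

```python
import json
import re
from collections import defaultdict

# Hypothetical fallback prefix for records without a usable customerId.
UNKNOWN_BUCKET = "customerId=unknown"


def bucket_id_for(record: str) -> str:
    """Return the S3 key prefix (bucket ID) for one JSON record."""
    try:
        customer_id = json.loads(record).get("customerId")
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return UNKNOWN_BUCKET
    if customer_id is None or customer_id == "":
        return UNKNOWN_BUCKET
    # Replace characters that are awkward in an S3 key segment.
    safe_id = re.sub(r"[^A-Za-z0-9_.-]", "_", str(customer_id))
    return f"customerId={safe_id}"


def group_by_bucket(records):
    """Group raw JSON records by the prefix they would be written under."""
    groups = defaultdict(list)
    for record in records:
        groups[bucket_id_for(record)].append(record)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        '{"customerId": "acme", "value": 1}',
        '{"customerId": "globex", "value": 2}',
        '{"customerId": "acme", "value": 3}',
    ]
    for prefix, items in group_by_bucket(sample).items():
        print(prefix, len(items))
```

Sending records with a missing or malformed customerId to a separate prefix, rather than dropping them, is one possible design choice; it keeps the S3 copy complete and makes bad input easy to inspect later.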
