KDA (Flink) to S3

A customer is building a streaming solution that processes messages from their own customers. Records in the stream come from multiple customers, and each JSON record has a customerId field that identifies which customer originally sent it.

They want to store a copy of this data on S3, partitioned by customerId. We could do this with multiple Firehose delivery streams (one per customer). Can it be done with Flink instead, that is, have Flink read from the Kinesis stream and write to multiple destinations on S3, with files organized by customerId? I'm guessing this is possible; is it easy, and can we use any existing libraries?

Thanks

AWS
Nick
asked 4 years ago, 708 views
1 Answer
Accepted Answer

This blog post describes how to use KDA/Flink for exactly that use case: https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/. The section "Persisting data in Amazon S3 with data partitioning" explains how to implement the partitioning, and the source code is available on GitHub: https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl.
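
To make the partitioning idea concrete, here is a minimal sketch of the one piece of logic that is specific to this question: turning a JSON record into a per-customer bucket (S3 prefix) name. It is not taken from the linked sample. The class name `CustomerBucketing`, the Hive-style `customerId=<id>` layout, the `unknown` fallback, and the regex-based field extraction are all assumptions for illustration; a real job would more likely parse the record with a JSON library. In a Flink job, this logic would typically sit in the `getBucketId` method of a `BucketAssigner` implementation that is passed to the file sink via `withBucketAssigner`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives an S3 bucket (prefix) name from a JSON record.
final class CustomerBucketing {

    // Simplification: finds the first "customerId" key whose value is a
    // string or an integer. A JSON parser would be more robust.
    private static final Pattern CUSTOMER_ID =
            Pattern.compile("\"customerId\"\\s*:\\s*(?:\"([^\"]+)\"|(-?\\d+))");

    // Returns the customerId value if the record contains one.
    static Optional<String> extractCustomerId(String json) {
        Matcher m = CUSTOMER_ID.matcher(json);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }

    // Replaces characters such as '/' that would otherwise create
    // unintended nested prefixes in the object key.
    static String sanitize(String id) {
        return id.replaceAll("[^A-Za-z0-9_.-]", "_");
    }

    // Bucket name for a record; records without a customerId land in an
    // assumed catch-all partition.
    static String bucketIdFor(String json) {
        return extractCustomerId(json)
                .map(id -> "customerId=" + sanitize(id))
                .orElse("customerId=unknown");
    }

    public static void main(String[] args) {
        System.out.println(bucketIdFor("{\"customerId\":\"acme\",\"value\":1}")); // customerId=acme
        System.out.println(bucketIdFor("{\"value\":2,\"customerId\": 42}"));      // customerId=42
        System.out.println(bucketIdFor("{\"value\":3}"));                         // customerId=unknown
    }
}
```

With a layout like this, all records for one customer end up under a single prefix, so one Flink job can replace the per-customer Firehose delivery streams.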

answered 4 years ago
