KDA (Flink) to S3


A customer is building a streaming solution that processes messages from their own customers. The stream carries data from multiple customers: each JSON record has a customerId field that identifies which customer originally sent it.

They want to store a copy of this data on S3, partitioned by customerId. We could do this with multiple Firehoses (one per customer). Is it possible to do this with Flink instead, that is, use Flink to read from the Kinesis stream and write to multiple destinations on S3, with files organized by customerId? I'm guessing this is possible, but is it easy, and can we use any existing libraries?

Thanks

AWS
Nick
asked 4 years ago · 708 views
1 answer
Accepted answer

This post describes how you can use KDA/Flink for exactly that use case: https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/. The section "Persisting data in Amazon S3 with data partitioning" describes how to implement the partitioning, and you can find the source code on GitHub: https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl.
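To make the partitioning idea concrete, here is a minimal sketch of the per-record logic in plain Python. It is not taken from the linked sample. In a Flink job, logic like this would typically sit in a custom BucketAssigner (its getBucketId method) attached to the file sink, and the returned string becomes the sub-path under the sink's base S3 path. The function names, the `customerId=<id>` prefix layout, and the `__unknown__` / `__malformed__` fallback partitions are assumptions made for illustration.

```python
import json
import re
from collections import defaultdict

# Characters outside this set are replaced so that a customerId cannot
# introduce extra path segments (for example via "/") in the S3 key.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")


def bucket_id(record_json: str) -> str:
    """Return the S3 key prefix for one JSON record, based on customerId."""
    try:
        record = json.loads(record_json)
    except json.JSONDecodeError:
        # Hypothetical fallback partition for records that are not valid JSON.
        return "customerId=__malformed__"

    customer_id = record.get("customerId") if isinstance(record, dict) else None
    if customer_id is None or str(customer_id) == "":
        # Hypothetical fallback partition for records without a customerId.
        return "customerId=__unknown__"

    return "customerId=" + _UNSAFE.sub("_", str(customer_id))


def group_by_bucket(records):
    """Group raw JSON strings by their target prefix (what the sink does per file)."""
    grouped = defaultdict(list)
    for raw in records:
        grouped[bucket_id(raw)].append(raw)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        '{"customerId": "acme", "value": 1}',
        '{"customerId": "globex", "value": 2}',
        '{"customerId": "acme", "value": 3}',
    ]
    for prefix, rows in group_by_bucket(sample).items():
        print(prefix, len(rows))
```

Routing records with a missing or unparseable customerId to a dedicated prefix, rather than dropping them, keeps the S3 copy complete and makes bad input easy to find later. Whether that is the right choice depends on the customer's requirements.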

answered 4 years ago
