KDA (Flink) to S3

A customer is building a streaming solution that processes messages from their own customers. Multiple customers send data to the stream, and each JSON record carries a customerId that identifies the customer that originally sent it.

They want to store a copy of this data in S3, partitioned by customerId. We could do this with multiple Firehoses (one per customer). Is it possible to do this with Flink instead, i.e. use Flink to pull from the Kinesis stream and write to multiple destinations in S3, with files organized by customerId? I'm guessing it is possible, but is it easy, and are there existing libraries we can use?

Thanks

AWS
Nick
asked 4 years ago, 723 views
1 Answer
Accepted Answer

This blog post describes how to use KDA/Flink for exactly this use case: https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/. Its section "Persisting data in Amazon S3 with data partitioning" explains how to implement the partitioning, and the source code is available on GitHub: https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl.
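To give a rough idea of what the partitioning step boils down to: Flink's file sinks decide where each record goes by asking a `BucketAssigner` for a bucket ID, which becomes a sub-path under the S3 base path. The sketch below is not taken from the blog post or the GitHub repo. It only illustrates, in plain Java with no Flink dependency, how a bucket ID could be derived from the customerId. The class name `CustomerBucket`, the method `bucketIdFor`, the `customerId=<id>` prefix layout and the `unknown` fallback are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Flink or the linked sample):
 * maps a JSON record to an S3 sub-path based on its customerId.
 */
final class CustomerBucket {

    // Matches "customerId": "abc" or "customerId": 123 anywhere in the JSON text.
    // Only letters, digits, '_' and '-' are captured, so the value is safe to use in a path.
    // Assumption: a regex is enough for this sketch; a real job should use a JSON parser.
    private static final Pattern CUSTOMER_ID =
            Pattern.compile("\"customerId\"\\s*:\\s*\"?([A-Za-z0-9_-]+)\"?");

    private CustomerBucket() {
    }

    /** Returns the sub-path for a record, e.g. "customerId=acme-42". */
    static String bucketIdFor(String json) {
        Matcher m = CUSTOMER_ID.matcher(json);
        // Records without a usable customerId go to a catch-all prefix (assumed name).
        String id = m.find() ? m.group(1) : "unknown";
        return "customerId=" + id;
    }
}
```

In a Flink job, logic like this would typically sit inside the `getBucketId` method of a custom `BucketAssigner`, which is then passed to the sink builder via `withBucketAssigner(...)`. For example, `CustomerBucket.bucketIdFor("{\"customerId\":7}")` returns `customerId=7`, so that record would end up under `s3://<your-bucket>/<base-path>/customerId=7/`.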

answered 4 years ago
