Kinesis Firehose Delivery Stream - S3 - JSON


I've got a client writing JSON to a Kinesis stream with an S3 Firehose destination. The resulting files contain multiple JSON documents back to back (e.g. {"data": "JSON Payload 1"}{"data": "JSON Payload 2"}{"data": "JSON Payload 3"}), which means the file as a whole is not valid JSON and I can't easily rehydrate it. Other than switching to something like a Lambda that emits one file per document, or switching to a format like YAML that allows multiple documents per file, is there a configuration option on the delivery stream that will yield one file per document?

2 Answers

What you are referring to is the JSON Lines format, which is a valid format for storing JSON documents: https://jsonlines.org/
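The sample in the question shows the documents written back to back with no newline between them, which is strictly "concatenated JSON" rather than JSON Lines. As a minimal sketch (Python standard library only; the function names are hypothetical), the following splits either form into individual documents and can re-emit them as JSON Lines:

```python
import json
from typing import Any, Iterator


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield each JSON value from text holding several JSON documents in a row.

    Handles documents separated by newlines (JSON Lines) as well as documents
    written with no delimiter at all, e.g. '{"a": 1}{"a": 2}'.
    """
    decoder = json.JSONDecoder()
    idx = 0
    length = len(text)
    while idx < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while idx < length and text[idx].isspace():
            idx += 1
        if idx >= length:
            break
        # Parse one value starting at idx; raw_decode returns where it ended.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


def to_json_lines(text: str) -> str:
    """Re-emit the documents as JSON Lines: one compact JSON value per line."""
    return "".join(json.dumps(obj) + "\n" for obj in iter_concatenated_json(text))
```

For example, `list(iter_concatenated_json(body))` on the file contents from the question yields three separate dictionaries, one per payload.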

Have you tried using S3 Select? https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html

It lets you run SQL queries against JSON Lines objects.

You can try it from the Amazon S3 console as well as through the AWS SDKs.

The following page provides example code in Java and links to blog posts with JavaScript and Python examples: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-select.html
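For reference, a minimal Python sketch of an S3 Select call against a JSON Lines object. It assumes a boto3 S3 client (for example `boto3.client("s3")`) is passed in; the bucket, key, and helper names are hypothetical. Uncompressed objects are assumed; if Firehose GZIP compression is enabled, add `"CompressionType": "GZIP"` to `InputSerialization`. AWS has also stopped offering S3 Select to new customers, so check that it is available for your account.

```python
import json
from typing import Any, Dict, List


def build_select_request(bucket: str, key: str, expression: str) -> Dict[str, Any]:
    """Keyword arguments for S3 select_object_content on a JSON Lines object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Type LINES: the object holds one JSON document per line.
        "InputSerialization": {"JSON": {"Type": "LINES"}},
        # Ask S3 Select to return results newline-delimited as well.
        "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
    }


def select_json_lines(s3_client: Any, bucket: str, key: str, expression: str) -> List[Any]:
    """Run an S3 Select query and return the matching records as Python objects."""
    response = s3_client.select_object_content(
        **build_select_request(bucket, key, expression)
    )
    chunks = []
    # Payload is an event stream; only "Records" events carry result bytes.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # A record may be split across events, so join everything before splitting.
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Usage might look like `select_json_lines(boto3.client("s3"), "my-bucket", "path/to/object", "SELECT * FROM S3Object s")`, where the bucket and key are placeholders.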

Answered by an AWS Expert 2 years ago
Reviewed by Uri (AWS Expert) 2 years ago

Firehose adds end-of-line characters, so each JSON message is stored on a separate line. Each output file is a combination of multiple JSON records. You can read the file programmatically; if you read it line by line, you can parse each JSON object individually.
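A minimal sketch of the line-by-line approach, assuming the object has already been downloaded to a local text file or is otherwise available as a text stream. `iter_json_lines` is a hypothetical helper name:

```python
import io
import json
from typing import IO, Any, Iterator


def iter_json_lines(stream: IO[str]) -> Iterator[Any]:
    """Parse a JSON Lines stream, yielding one decoded record per non-blank line."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            # Tolerate blank lines, e.g. a trailing newline at end of file.
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so bad records are easy to locate.
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


if __name__ == "__main__":
    sample = io.StringIO('{"data": "JSON Payload 1"}\n{"data": "JSON Payload 2"}\n')
    for record in iter_json_lines(sample):
        print(record["data"])
```

With a real file, pass `open(path, encoding="utf-8")`; if the delivery stream is configured for GZIP compression, `gzip.open(path, "rt", encoding="utf-8")` gives a text stream that works the same way.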

If each record were written to its own file, you would end up with many small files, and small files hurt analytics performance (too much per-file overhead and latency) whether you are using Spark/Hadoop/Presto or Amazon Athena. The usual way to address this "small files" issue is compaction: merging many small files into fewer, larger ones. This is the most efficient use of compute time, because the query engine spends much less time opening and closing files and much more time reading file contents.
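To illustrate compaction, here is a minimal local sketch that merges many small JSON Lines files into fewer files of roughly a target size, never splitting a record across files. The 128 MiB default, the `compacted-NNNNN.json` output names, and the `compact_json_lines` helper are assumptions for illustration; in practice this step is usually run by the query engine against S3 (for example a Spark job or an Athena CTAS query) rather than on local files.

```python
from pathlib import Path
from typing import BinaryIO, Iterable, List, Optional


def compact_json_lines(
    inputs: Iterable[Path],
    output_dir: Path,
    target_bytes: int = 128 * 1024 * 1024,  # assumed target size per output file
) -> List[Path]:
    """Merge small JSON Lines files into fewer files of about target_bytes each."""
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    out: Optional[BinaryIO] = None
    written = 0
    try:
        for path in sorted(inputs):
            with open(path, "rb") as src:
                for line in src:
                    if not line.strip():
                        continue  # drop blank lines
                    if not line.endswith(b"\n"):
                        line += b"\n"  # last record in a file may lack a newline
                    # Roll over to a new output file at a record boundary once
                    # the current file would exceed the target size.
                    if out is None or (written > 0 and written + len(line) > target_bytes):
                        if out is not None:
                            out.close()
                        part = output_dir / f"compacted-{len(outputs):05d}.json"
                        out = open(part, "wb")
                        outputs.append(part)
                        written = 0
                    out.write(line)
                    written += len(line)
    finally:
        if out is not None:
            out.close()
    return outputs
```

Rolling over only between records keeps every output file valid JSON Lines, and a single record larger than the target still gets written, just into its own file.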

Answered by AWS 2 years ago
Reviewed by an Expert 2 months ago
