Importing json that has a nested array

0

My json looks like this:

{
"userid" : "10",
"actions" :
[
{"type":"flip", "timestamp":"5pm"},
{"type":"jump", "timestamp":"6pm"}
]
}

I want to bring that into a table in redshift that has columns "userid, type, timestamp"
Do I need to reformat the file into flat lines or is this possible with a standard copy command?

質問済み 5年前1219ビュー
3回答
0
承認された回答

It is not currently possible to import such data using COPY. However, you can define the data as a Spectrum external table and use our nested data support to bring the data in. https://docs.aws.amazon.com/redshift/latest/dg/tutorial-query-nested-data.html

CREATE EXTERNAL TABLE spectrum.nested_example
     ( userid     int
     , actions    array<struct<type:varchar(20), timestamp:varchar(20)>>
     ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://benchmark-files/temp/nested_data/nested_example/'
;
SELECT u.userid
      ,a.type
      ,a.timestamp
FROM spectrum.nested_example u
INNER JOIN c.actions a ON true
;
回答済み 5年前
0

I'm guessing this degrades query performance significantly over having it stored flat?

回答済み 5年前
0

Hi meyerovb,

You could always take the SELECT against the external table that Joe gave you and put it into a CTAS statement to materialize the data in local Redshift storage if you're concerned about the unnest the columns at query time. It's really an alternate load pattern for Redshift and one that only need be executed once each time the external table data changes.

However, don't be too surprised when you find case where Spectrum query performance on external data beats Redshift local storage performance, especially when not joining big tables. This happens because Redshift Spectrum parallelism is up to 10x wider than Redshift slice parallelism. So even though the data is read from S3 instead of local disk there are 10x more scan workers to get the heavy lifting at the access operator level of the query plan done faster, assuming there are enough splits in S3 to feed them all relatively evenly.

I hope this helps you at least a little,
-Kurt

klarson
回答済み 5年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ