
How do I use the SUPER data type in Amazon Redshift to handle and query JSON data?


I want to use the SUPER data type in Amazon Redshift to handle and query JSON data.

Short description

Use the SUPER data type to store semistructured data in your Amazon Redshift data warehouses. You can query the SUPER data type with the PartiQL language.

You can also use Amazon Redshift Spectrum for semistructured data support.

Resolution

You can use either the COPY command or the INSERT command with the JSON_PARSE function to load JSON data into a SUPER column. To load large amounts of data, it's a best practice to use the COPY command.

Use the COPY command

Use the COPY command to load JSON from data files in Amazon Simple Storage Service (Amazon S3). For more information, see Loading semistructured data into Amazon Redshift.

To load JSON into a single SUPER data column, use the noshred option.
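For example, the following sketch loads each JSON record as one SUPER value. It uses the same S3 path and IAM role placeholders as the examples in this article, and the table name region_nations_noshred is illustrative:

# CREATE TABLE region_nations_noshred (data super);

# COPY region_nations_noshred
  FROM 's3://<S3 path to the JSON data file>'
  IAM_ROLE '<IAM role>'
  FORMAT JSON 'noshred';

With the noshred option, COPY doesn't shred the JSON attributes into separate columns; each record becomes a single SUPER value that you can navigate later with PartiQL.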

To load JSON into multiple columns, use the auto option or specify a jsonpaths file. When you use the auto option, COPY matches the top-level JSON attributes to column names and loads nested values as SUPER values. The following examples demonstrate how to use the auto option.

Example JSON data file:

{"r_regionkey":0,"r_name":"AFRICA","r_nations":[{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]}
{"r_regionkey":1,"r_name":"AMERICA","r_nations":[{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]}
{"r_regionkey":2,"r_name":"ASIA","r_nations":[{"n_nationkey":8,"n_name":"INDIA"}]}

Example COPY command:

# CREATE TABLE region_nations
 (
  r_regionkey smallint
  ,r_name varchar
  ,r_nations super
  );

# COPY region_nations 
  FROM 's3://<S3 path to the JSON data file>'
  IAM_ROLE '<IAM role>'
  FORMAT JSON 'auto';

# SELECT * FROM region_nations;
 r_regionkey | r_name  |                                                     r_nations                                                     
-------------+---------+-------------------------------------------------------------------------------------------------------------------
           0 | AFRICA  | [{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]
           1 | AMERICA | [{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]
           2 | ASIA    | [{"n_nationkey":8,"n_name":"INDIA"}]
(3 rows)
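If the JSON attribute names don't match your column names, or you want to control the mapping explicitly, you can point FORMAT JSON at a jsonpaths file instead of using the auto option. The following is a sketch; the S3 paths are placeholders, and the jsonpaths file maps each table column, in order, to a JSON path expression:

Example jsonpaths file:

{
  "jsonpaths": [
    "$.r_regionkey",
    "$.r_name",
    "$.r_nations"
  ]
}

Example COPY command:

# COPY region_nations
  FROM 's3://<S3 path to the JSON data file>'
  IAM_ROLE '<IAM role>'
  FORMAT JSON 's3://<S3 path to the jsonpaths file>';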

You can also load JSON data from a text or .csv file into a SUPER column. Make sure that the data is in a valid JSON format. Amazon Redshift uses standard escaping rules for .csv files, so quotation marks inside a quoted field are doubled, as in the following example.

Example .csv file:

r_regionkey,r_name,r_nations
0,AFRICA,"[{""n_nationkey"":11,""n_name"":""ALGERIA""},{""n_nationkey"":5,""n_name"":""ETHIOPIA""},{""n_nationkey"":14,""n_name"":""KENYA""}]"
1,AMERICA,"[{""n_nationkey"":1,""n_name"":""ARGENTINA""},{""n_nationkey"":2,""n_name"":""BRAZIL""}]"
2,ASIA,"[{""n_nationkey"":8,""n_name"":""INDIA""}]"

Example COPY command:

# CREATE TABLE region_nations
 (
  r_regionkey smallint
  ,r_name varchar
  ,r_nations super
  );

# COPY region_nations 
   FROM 's3://<S3 path to the CSV file>'
   IAM_ROLE '<IAM role>'
   FORMAT CSV
   IGNOREHEADER 1;

# SELECT * FROM region_nations;
 r_regionkey | r_name  |                                                     r_nations                                                     
-------------+---------+-------------------------------------------------------------------------------------------------------------------
           0 | AFRICA  | [{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]
           1 | AMERICA | [{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]
           2 | ASIA    | [{"n_nationkey":8,"n_name":"INDIA"}]
(3 rows)

Use the INSERT command and JSON_PARSE function

The JSON_PARSE function parses data in JSON format and converts it into the SUPER data type that you can use with the INSERT command.

Example command:

# CREATE TABLE region_nations
 (
  r_regionkey smallint
  ,r_name varchar
  ,r_nations super
  );

# INSERT INTO region_nations VALUES(0,'AFRICA',json_parse('[{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]'));
# INSERT INTO region_nations VALUES(1,'AMERICA',json_parse('[{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]'));
# INSERT INTO region_nations VALUES(2,'ASIA',json_parse('[{"n_nationkey":8,"n_name":"INDIA"}]'));

# SELECT * FROM region_nations;
 r_regionkey | r_name  |                                                     r_nations                                                     
-------------+---------+-------------------------------------------------------------------------------------------------------------------
           0 | AFRICA  | [{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]
           1 | AMERICA | [{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]
           2 | ASIA    | [{"n_nationkey":8,"n_name":"INDIA"}]
(3 rows)

Query SUPER data with PartiQL

Amazon Redshift uses the PartiQL language for SQL-compatible access to relational, semistructured, and nested data. For more information, see Querying semistructured data.

Example query:

# SELECT * FROM region_nations;
 r_regionkey | r_name  |                                                     r_nations                                                     
-------------+---------+-------------------------------------------------------------------------------------------------------------------
           0 | AFRICA  | [{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]
           1 | AMERICA | [{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]
           2 | ASIA    | [{"n_nationkey":8,"n_name":"INDIA"}]
(3 rows)

# SELECT region_nations.r_nations[1].n_name FROM region_nations;
   n_name   
------------
 "ETHIOPIA"
 "BRAZIL"
 
(3 rows)

Array indexes are zero-based, so r_nations[1] returns the second element of each array. The third row is null because the ASIA array contains only one element; PartiQL navigation returns null instead of an error for out-of-range indexes.

# SELECT rn.r_regionkey, rn.r_name, n FROM region_nations rn, rn.r_nations n;
 r_regionkey | r_name  |                   n                    
-------------+---------+----------------------------------------
           0 | AFRICA  | {"n_nationkey":11,"n_name":"ALGERIA"}
           0 | AFRICA  | {"n_nationkey":5,"n_name":"ETHIOPIA"}
           0 | AFRICA  | {"n_nationkey":14,"n_name":"KENYA"}
           1 | AMERICA | {"n_nationkey":1,"n_name":"ARGENTINA"}
           1 | AMERICA | {"n_nationkey":2,"n_name":"BRAZIL"}
           2 | ASIA    | {"n_nationkey":8,"n_name":"INDIA"}
(6 rows)

Unload SUPER data to a JSON file

You can use the UNLOAD command to extract data from SUPER columns and store it in Amazon S3 as JSON documents.

Example command:

# SELECT * FROM region_nations;
 r_regionkey | r_name  |                                                     r_nations                                                     
-------------+---------+-------------------------------------------------------------------------------------------------------------------
           0 | AFRICA  | [{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]
           1 | AMERICA | [{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]
           2 | ASIA    | [{"n_nationkey":8,"n_name":"INDIA"}]
(3 rows)

# UNLOAD ('SELECT * FROM region_nations')
  TO 's3://<S3 path>'
  IAM_ROLE '<IAM role>'
  FORMAT JSON;

Example JSON document:

{"r_regionkey":0,"r_name":"AFRICA","r_nations":[{"n_nationkey":11,"n_name":"ALGERIA"},{"n_nationkey":5,"n_name":"ETHIOPIA"},{"n_nationkey":14,"n_name":"KENYA"}]}
{"r_regionkey":1,"r_name":"AMERICA","r_nations":[{"n_nationkey":1,"n_name":"ARGENTINA"},{"n_nationkey":2,"n_name":"BRAZIL"}]}
{"r_regionkey":2,"r_name":"ASIA","r_nations":[{"n_nationkey":8,"n_name":"INDIA"}]}
AWS OFFICIAL. Updated a year ago.