Step Function: Edit map output

0

I have a step function which iterates over several data, at the end of the map I got a single json with all iterations and then uploads as json file to S3 bucket. This file on S3 I will need to use it later to run a Copy command on redshift, but I am having the issue that the output file comes as an array and redshift does not recognize the format For example this is the input for the step function:

[
    {
      "data": { ... }
    },
    {
      "data": { ... }
},
    {
      "data": { ... }
}
]

Then I include some data and this is to output being uploaded to S3

[
    {
      "executiontime":"....", 
       "id":"...", 
      "data": { ... }
    },
    {
        "executiontime":"....", 
       "id":"...", 
      "data": { ... }
},
    {
       "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
]

However for redshift I need an output like this, without the square brackets and the comma between items, I have tried different ways but I always get the same output as an array

    {
      "executiontime":"....", 
       "id":"...", 
      "data": { ... }
    }
    {
        "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
    {
       "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
kimar
asked a month ago273 views
1 Answer
1

If your data is in the format of a JSON array, as shown in your example, you'll need to transform it into a line-delimited JSON format (JSONL) before using the COPY command with the json 'auto' option in Redshift. This is because Redshift expects each line in the file to be a separate JSON object.

To transform your JSON array into a line-delimited JSON format, you can use a text editor or a command-line tool like jq. Here's how you can do it using jq:

  1. Save your JSON array to a file, for example, input.json.

  2. Run the following command to convert the JSON array into a line-delimited JSON format:

    jq -c '.[]' input.json > output.jsonl

The transformed output should be in the following JSON Lines (JSONL) format:

{"executiontime":"....", "id":"...", "data": {...}}
{"executiontime":"....", "id":"...", "data": {...}}
{"executiontime":"....", "id":"...", "data": {...}}
  1. Use the output.jsonl file with the Redshift COPY command:

    COPY your_table_name
    FROM 's3://your-bucket-name/output.jsonl'
    IAM_ROLE 'your-iam-role'
    JSON 'auto';

In this example, jq -c '.[]' input.json reads the JSON array from input.json, and for each element in the array, it outputs a compact JSON representation (-c flag) on a new line. The result is redirected to output.jsonl, which is then used in the Redshift COPY command.

You can integrate the following steps into your Step Function to automatically generate the output.jsonl file

profile picture
EXPERT
answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions