Step Function: Edit map output

0

I have a step function which iterates over several data, at the end of the map I got a single json with all iterations and then uploads as json file to S3 bucket. This file on S3 I will need to use it later to run a Copy command on redshift, but I am having the issue that the output file comes as an array and redshift does not recognize the format For example this is the input for the step function:

[
    {
      "data": { ... }
    },
    {
      "data": { ... }
},
    {
      "data": { ... }
}
]

Then I include some data and this is to output being uploaded to S3

[
    {
      "executiontime":"....", 
       "id":"...", 
      "data": { ... }
    },
    {
        "executiontime":"....", 
       "id":"...", 
      "data": { ... }
},
    {
       "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
]

However for redshift I need an output like this, without the square brackets and the comma between items, I have tried different ways but I always get the same output as an array

    {
      "executiontime":"....", 
       "id":"...", 
      "data": { ... }
    }
    {
        "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
    {
       "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
kimar
질문됨 한 달 전297회 조회
1개 답변
1

If your data is in the format of a JSON array, as shown in your example, you'll need to transform it into a line-delimited JSON format (JSONL) before using the COPY command with the json 'auto' option in Redshift. This is because Redshift expects each line in the file to be a separate JSON object.

To transform your JSON array into a line-delimited JSON format, you can use a text editor or a command-line tool like jq. Here's how you can do it using jq:

  1. Save your JSON array to a file, for example, input.json.

  2. Run the following command to convert the JSON array into a line-delimited JSON format:

    jq -c '.[]' input.json > output.jsonl

The transformed output should be in the following JSON Lines (JSONL) format:

{"executiontime":"....", "id":"...", "data": {...}}
{"executiontime":"....", "id":"...", "data": {...}}
{"executiontime":"....", "id":"...", "data": {...}}
  1. Use the output.jsonl file with the Redshift COPY command:

    COPY your_table_name
    FROM 's3://your-bucket-name/output.jsonl'
    IAM_ROLE 'your-iam-role'
    JSON 'auto';

In this example, jq -c '.[]' input.json reads the JSON array from input.json, and for each element in the array, it outputs a compact JSON representation (-c flag) on a new line. The result is redirected to output.jsonl, which is then used in the Redshift COPY command.

You can integrate the following steps into your Step Function to automatically generate the output.jsonl file

profile picture
전문가
답변함 한 달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠