Step Function: Edit map output

0

I have a step function which iterates over several data, at the end of the map I got a single json with all iterations and then uploads as json file to S3 bucket. This file on S3 I will need to use it later to run a Copy command on redshift, but I am having the issue that the output file comes as an array and redshift does not recognize the format For example this is the input for the step function:

[
    {
      "data": { ... }
    },
    {
      "data": { ... }
},
    {
      "data": { ... }
}
]

Then I include some data and this is to output being uploaded to S3

[
    {
      "executiontime":"....", 
       "id":"...", 
      "data": { ... }
    },
    {
        "executiontime":"....", 
       "id":"...", 
      "data": { ... }
},
    {
       "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
]

However for redshift I need an output like this, without the square brackets and the comma between items, I have tried different ways but I always get the same output as an array

    {
      "executiontime":"....", 
       "id":"...", 
      "data": { ... }
    }
    {
        "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
    {
       "executiontime":"....", 
       "id":"...", 
      "data": { ... }
}
kimar
posta un mese fa298 visualizzazioni
1 Risposta
1

If your data is in the format of a JSON array, as shown in your example, you'll need to transform it into a line-delimited JSON format (JSONL) before using the COPY command with the json 'auto' option in Redshift. This is because Redshift expects each line in the file to be a separate JSON object.

To transform your JSON array into a line-delimited JSON format, you can use a text editor or a command-line tool like jq. Here's how you can do it using jq:

  1. Save your JSON array to a file, for example, input.json.

  2. Run the following command to convert the JSON array into a line-delimited JSON format:

    jq -c '.[]' input.json > output.jsonl

The transformed output should be in the following JSON Lines (JSONL) format:

{"executiontime":"....", "id":"...", "data": {...}}
{"executiontime":"....", "id":"...", "data": {...}}
{"executiontime":"....", "id":"...", "data": {...}}
  1. Use the output.jsonl file with the Redshift COPY command:

    COPY your_table_name
    FROM 's3://your-bucket-name/output.jsonl'
    IAM_ROLE 'your-iam-role'
    JSON 'auto';

In this example, jq -c '.[]' input.json reads the JSON array from input.json, and for each element in the array, it outputs a compact JSON representation (-c flag) on a new line. The result is redirected to output.jsonl, which is then used in the Redshift COPY command.

You can integrate the following steps into your Step Function to automatically generate the output.jsonl file

profile picture
ESPERTO
con risposta un mese fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande