The issue is that the JSON data is being escaped during the UNLOAD from Athena. When you use format = 'JSON' in the UNLOAD command, each value is serialized as a JSON string, which escapes the quotes inside the JSON structure.
When that data is then loaded into Redshift with the COPY command, it is treated as a string rather than a JSON object, which is why you see the escaped characters in your SUPER column.
To resolve this, you can modify your UNLOAD command to output the data as CSV, and then use a JSON parsing function in Redshift to convert the string to a SUPER value after it has been loaded.
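As an illustration of the symptom (with hypothetical values), the SUPER column ends up holding a quoted, escaped string rather than a parsed JSON structure:

```sql
-- Hypothetical example of the symptom: the SUPER column contains an
-- escaped JSON string instead of a parsed JSON object.
SELECT curpVerification FROM schema.table LIMIT 1;
-- what you see:    "{\"status\":\"ok\",\"score\":0.97}"
-- what you expect: {"status":"ok","score":0.97}
```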
Here's how you can modify your UNLOAD command:
UNLOAD (
    SELECT
        curpVerification::varchar AS curpVerification,
        idOcrConfidence::varchar AS idOcrConfidence,
        "$path" AS s3_file_name,
        CAST(dt AS date) AS dt,
        localtimestamp AS last_synced_time
    FROM table
    WHERE dt = '2024-06-26'
      AND "$path" LIKE 's3://xxxxx'
)
TO 's3://xxxxxx/test/'
WITH (format = 'csv')
Then, after loading the data into Redshift, you can use the JSON_PARSE function (Redshift's function for converting JSON text to SUPER) to convert the string to a SUPER value:
UPDATE schema.table
SET curpVerification = JSON_PARSE(curpVerification),
    idOcrConfidence = JSON_PARSE(idOcrConfidence)
This should give you the desired result in your SUPER columns. Please replace schema.table with your actual schema and table name. Hope this helps! Let me know if you have any other questions.
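Note that JSON_PARSE expects a VARCHAR input, so an in-place UPDATE may not work if the target columns are already SUPER. A common alternative is to COPY into a staging table with VARCHAR columns and parse on insert. A sketch, with hypothetical table, column, and IAM role names:

```sql
-- Sketch: load the escaped JSON into VARCHAR staging columns first,
-- then parse into SUPER on insert. All names here are hypothetical.
CREATE TABLE schema.table_staging (
    curpVerification  VARCHAR(65535),
    idOcrConfidence   VARCHAR(65535),
    s3_file_name      VARCHAR(1024),
    dt                DATE,
    last_synced_time  TIMESTAMP
);

COPY schema.table_staging
FROM 's3://xxxxxx/test/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'  -- hypothetical role
FORMAT AS CSV;

INSERT INTO schema.table
SELECT
    JSON_PARSE(curpVerification),  -- VARCHAR -> SUPER
    JSON_PARSE(idOcrConfidence),
    s3_file_name,
    dt,
    last_synced_time
FROM schema.table_staging;
```

This avoids rewriting rows after the fact and keeps the parse as part of the load path.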
Hi! That solution could be OK for small tables, but running it on big tables doesn't sound like an optimal solution. Which format/parameters could I use to UNLOAD from Athena and load unescaped fields into Redshift? Thanks!

Is your Athena table backed by data in S3? Why unload from it before you copy into Redshift? You can just query the Athena table directly using Redshift Spectrum, no?
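The Spectrum approach above can be sketched as follows: register the Glue/Athena catalog as an external schema in Redshift, then query the table in place. The schema, database, and IAM role names below are hypothetical:

```sql
-- Sketch: expose the Glue/Athena data catalog to Redshift via Spectrum.
-- Schema, database, and IAM role ARN are hypothetical placeholders.
CREATE EXTERNAL SCHEMA athena_spectrum
FROM DATA CATALOG
DATABASE 'my_athena_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-spectrum-role';

-- Query the table directly; JSON_PARSE converts the JSON text to SUPER
-- without any intermediate UNLOAD/COPY round trip.
SELECT JSON_PARSE(curpVerification) AS curpVerification
FROM athena_spectrum.table
WHERE dt = '2024-06-26';
```

This skips the UNLOAD/COPY step entirely, so there is no escaping problem to undo.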