The issue you're experiencing is likely related to how the Python 3.12 and later Lambda runtimes (including your Python 3.13 runtime) handle Unicode characters differently from earlier versions.
In Python 3.12 and later runtimes, Lambda functions return Unicode characters directly as part of their JSON response, whereas earlier Python runtimes (like those you were using with psycopg2) returned escaped sequences for Unicode characters. For example, a Unicode string like "こんにちは" would be returned as-is in Python 3.12+, but would be escaped as "\u3053\u3093\u306b\u3061\u306f" in earlier versions.
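You can reproduce the difference locally with `json.dumps`, whose `ensure_ascii` flag controls exactly this escaping behavior (a minimal sketch of the two output styles; the runtime change itself happens inside Lambda's response serialization):

```python
import json

text = "こんにちは"

# Python 3.12+ runtime behavior: non-ASCII characters returned as-is
unescaped = json.dumps(text, ensure_ascii=False)
print(unescaped)  # "こんにちは"

# Earlier runtime behavior: non-ASCII characters escaped to \uXXXX sequences
escaped = json.dumps(text, ensure_ascii=True)
print(escaped)  # "\u3053\u3093\u306b\u3061\u306f"
```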
When using the COPY command in Redshift, there are specific requirements for handling UTF-8 characters:
- The COPY command only supports UTF-8 characters that are up to 4 bytes long.
- CHAR columns only accept single-byte UTF-8 characters (up to byte value 127).
- VARCHAR columns accept multibyte UTF-8 characters up to a maximum of four bytes.
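Since each character's UTF-8 byte length is easy to inspect in Python, you can run a pre-flight check against these limits before loading (a hypothetical helper based on the rules above; note that valid UTF-8 never exceeds 4 bytes per character, so the VARCHAR check is mainly a safety net):

```python
def check_redshift_utf8(value: str, char_column: bool = False):
    """Return (char, byte_length) pairs that would violate Redshift's
    COPY limits: any multibyte character for CHAR columns, or any
    character longer than 4 bytes for VARCHAR columns."""
    problems = []
    for ch in value:
        n = len(ch.encode("utf-8"))
        if char_column and n > 1:
            problems.append((ch, n))
        elif n > 4:
            problems.append((ch, n))
    return problems

print(check_redshift_utf8("こんにちは"))                # [] -- 3-byte chars are fine for VARCHAR
print(check_redshift_utf8("abcこ", char_column=True))  # [('こ', 3)] -- multibyte rejected for CHAR
```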
To resolve this issue, you have several options:
- Use the ACCEPTINVCHARS option in your COPY command to replace invalid characters with a specified character. For example:
COPY table_name FROM 's3://bucket/file' ACCEPTINVCHARS AS '^';
This will replace any invalid UTF-8 characters with the ^ character.
- Modify your Lambda function to manually escape Unicode characters before passing them to the COPY command.
- Ensure your input data only contains valid UTF-8 characters that are supported by Redshift.
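A sketch of the second option, escaping non-ASCII characters in the Lambda function before the data reaches COPY (the record and field names here are hypothetical; `unicode_escape` encoding is one way to reduce the payload to pure ASCII \uXXXX sequences):

```python
def escape_unicode(value: str) -> str:
    """Replace every non-ASCII character with its \\uXXXX escape,
    so the string handed to COPY contains only single-byte characters."""
    return value.encode("unicode_escape").decode("ascii")

row = {"greeting": "こんにちは", "name": "Alice"}  # hypothetical record
safe_row = {k: escape_unicode(v) for k, v in row.items()}
print(safe_row["greeting"])  # \u3053\u3093\u306b\u3061\u306f
print(safe_row["name"])      # Alice (ASCII passes through unchanged)
```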
After running the COPY command, you can check which characters ACCEPTINVCHARS replaced by querying the STL_REPLACEMENTS system table:
SELECT query, session, filename, line_number, colname FROM stl_replacements WHERE query = <Your_Query_ID>;
This will help you identify which characters were replaced and where they occurred in your data.