Unicode error in lambda in redshift copy command


I'm using a COPY command in my Lambda function. My Python runtime is 3.12, and I'm using psycopg3 to execute the COPY command. It worked fine with psycopg2, but after switching to psycopg3 I get the error "codec not available in python unicode". What is the reason for this issue, and how can I fix it?

asked 9 months ago · 159 views
1 Answer

The issue you're experiencing is likely related to how the Python 3.12 runtime handles Unicode characters differently from earlier versions.

In Python 3.12 and later runtimes, Lambda functions return Unicode characters directly as part of their JSON response, whereas earlier Python runtimes returned escaped sequences for Unicode characters. For example, a Unicode string like "こんにちは" is returned as-is in Python 3.12+, but was escaped as "\u3053\u3093\u306b\u3061\u306f" in earlier versions.
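The difference can be reproduced locally with json.dumps: the ensure_ascii flag controls exactly this escaped-versus-raw behavior.

```python
import json

# Python 3.12+ Lambda runtimes return non-ASCII characters as-is,
# equivalent to serializing with ensure_ascii=False:
unescaped = json.dumps({"msg": "こんにちは"}, ensure_ascii=False)
print(unescaped)  # {"msg": "こんにちは"}

# Earlier runtimes escaped non-ASCII characters, which is the
# default json.dumps behavior (ensure_ascii=True):
escaped = json.dumps({"msg": "こんにちは"})
print(escaped)  # {"msg": "\u3053\u3093\u306b\u3061\u306f"}
```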

When using the COPY command in Redshift, there are specific requirements for handling UTF-8 characters:

  1. The COPY command only supports UTF-8 characters that are up to 4 bytes long.
  2. CHAR columns only accept single-byte UTF-8 characters (up to byte value 127).
  3. VARCHAR columns accept multibyte UTF-8 characters up to a maximum of four bytes.

To resolve this issue, you have several options:

  1. Use the ACCEPTINVCHARS option in your COPY command to replace invalid characters with a specified character. For example:
COPY table_name FROM 's3://bucket/file' ACCEPTINVCHARS AS '^';

This will replace any invalid UTF-8 characters with the ^ character.

  2. Modify your Lambda function to manually escape Unicode characters before passing them to the COPY command.

  3. Ensure your input data only contains valid UTF-8 characters that are supported by Redshift.
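Options 2 and 3 can be sketched with standard-library codecs alone. The helper names below (sanitize_utf8, escape_unicode) are hypothetical, not part of any AWS or psycopg API:

```python
def sanitize_utf8(text: str) -> str:
    # Round-trip through UTF-8, replacing anything that cannot be
    # encoded (e.g. lone surrogates) so only valid UTF-8 reaches Redshift.
    return text.encode("utf-8", errors="replace").decode("utf-8")

def escape_unicode(text: str) -> str:
    # Alternatively, escape non-ASCII characters as \uXXXX sequences,
    # mirroring the behavior of pre-3.12 runtimes.
    return text.encode("unicode_escape").decode("ascii")

print(sanitize_utf8("abc\ud800def"))  # abc?def
print(escape_unicode("こんにちは"))    # \u3053\u3093\u306b\u3061\u306f
```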

After implementing the COPY command, you can check for any character replacements that occurred during the operation by querying:

SELECT query, session, filename, line_number, colname FROM stl_replacements WHERE query = <Your_Query_ID>;

This will help you identify which characters were replaced and where they occurred in your data.
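For reference, issuing the COPY from psycopg3 might look like the sketch below. The endpoint, credentials, table, bucket, and IAM role are placeholders, and pinning client_encoding to UTF-8 on the connection is an assumption worth testing, since a "codec not available" error suggests the driver met an encoding name it could not map to a Python codec:

```python
import textwrap

# Placeholder COPY statement; substitute your own table, bucket, and role.
COPY_SQL = textwrap.dedent("""\
    COPY table_name
    FROM 's3://bucket/file'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    ACCEPTINVCHARS AS '^'
""")

def run_copy(conninfo: str) -> None:
    import psycopg  # psycopg3, packaged with the function or in a Lambda layer

    # client_encoding='utf-8' is a standard libpq connection parameter and
    # keeps the session on a codec Python recognizes.
    with psycopg.connect(conninfo, client_encoding="utf-8") as conn:
        conn.execute(COPY_SQL)
        # the with-block commits on clean exit and rolls back on error
```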
Sources
Troubleshoot S3 ingestion into Redshift | AWS re:Post
Multibyte character load errors - Amazon Redshift
Troubleshooting Redshift Copy and Unload query performance and errors using system logs and views. | AWS re:Post
Tutorial: Loading data from Amazon S3 - Amazon Redshift
Building Lambda functions with Python - AWS Lambda

answered 9 months ago
