Error when importing data from S3 to RDS PostgreSQL

Hello,

We are using the aws_s3 extension's table_import_from_s3() function to load data into RDS PostgreSQL (v13.7) directly from an S3 bucket. To speed up the process, we open multiple (30) connections to RDS (r6g.8xlarge instance type) and run the imports in parallel for hundreds of CSV files (8 GB each) stored in our S3 bucket. This generally worked quite well for a couple of weeks during our trial period. Unfortunately, yesterday we ran into an error. From the PostgreSQL logs:
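For context, each parallel session runs a call along these lines (a minimal sketch using the table, column, bucket and object names from the error log below; the aws_commons.create_s3_uri() form is equivalent to passing the bucket, path and region as separate arguments, as in the logged statement):

-- enable the extension once per database (aws_commons is installed as a dependency)
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- import a single CSV object from S3 into the target table
SELECT aws_s3.table_import_from_s3(
    'mytable',                         -- target table
    'col1,col2,col3,col4,col5',        -- column list
    '(format csv, header true)',       -- COPY options
    aws_commons.create_s3_uri('my-bucket', 'my-prefix/my-file.csv', 'us-east-1')
);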

2023-05-08 13:12:53 UTC:10.0.1.115(27341):postgres@foodb:[12800]:LOG: Unable to complete the COPY from s3. CURL error code 18 with message transfer closed with 6681718036 bytes remaining to read.

This, in turn, leads to the following error for the statement being executed:

2023-05-08 13:12:53 UTC:10.0.1.115(27341):postgres@foodb:[12800]:STATEMENT: SELECT aws_s3.table_import_from_s3( 'mytable', 'col1,col2,col3,col4,col5', '(format csv, header true, escape '''')', 'my-bucket','my-prefix/my-file.csv', 'us-east-1')

2023-05-08 13:12:53 UTC:10.0.1.115(27341):postgres@foodb:[12800]:ERROR: missing data for column "col2"

Curl error code 18 indicates that the transfer was closed while there was still outstanding read data remaining, i.e. the remote server closed the connection before all of the data had been transferred to the client.

What could be the underlying issue? Is it performance / throttling related? Is there anything I can do to make sure this doesn't happen again?

EDIT 2023-05-17: Since then, I've had multiple queries running aws_s3.table_import_from_s3() hang for hours, even days, without really progressing. I can see in pg_stat_activity that the queries are listed as "active", but they are simply stuck, and they cannot even be terminated using pg_terminate_backend() or pg_cancel_backend(). I reduced the number of parallel sessions by half (to 15) and tried with smaller objects (~500 MB); neither helped. Are there any recommended best practices for using the aws_s3 extension? Unfortunately, it doesn't seem very reliable to me...
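For anyone hitting the same thing, this is roughly how I check what the stuck sessions are doing; the wait_event columns in pg_stat_activity at least show whether a backend is blocked on the network, on I/O or on a lock (a minimal diagnostic sketch; the 5-minute threshold is just an example):

-- list long-running import sessions and what they are currently waiting on
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS runtime,
       left(query, 80)     AS query_snippet
FROM   pg_stat_activity
WHERE  query LIKE '%table_import_from_s3%'
  AND  now() - query_start > interval '5 minutes'
ORDER  BY query_start;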

1 Answer

I have a similar issue. Did you ever find a solution?

answered 8 months ago
  • Hi, unfortunately no, I gave up on the aws_s3 extension. I developed my own solution using psql's \copy command running inside an ECS task (one task per CSV file), with the whole thing coordinated by AWS Step Functions (rough sketch below).
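    For reference, the core of each ECS task is just a client-side COPY run through psql, roughly like this (a rough sketch; the table, column list and file name are taken from the question, and the task is assumed to have already downloaded the object from S3 to local disk, e.g. with the AWS CLI):

    -- client-side COPY executed by psql inside the ECS task
    \copy mytable(col1,col2,col3,col4,col5) FROM 'my-file.csv' WITH (FORMAT csv, HEADER true)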
