
Corrupt data landing in S3 via aws_s3.query_export_to_s3 extension


When using aws_s3.query_export_to_s3 on Aurora Postgres version 11, data is reliably and predictably corrupted at certain positions in the payloads that land in S3.

It seems that when the data crosses a certain byte length, the stream is reset and resent mid-stream to S3.

We have been able to reproduce this reliably, and we can also reliably avoid it: ensuring that data falls at powers of 2 on even byte lengths seems to prevent the issue from occurring.

As the length of the data increases, so does the frequency of the corruption. For small data sets, it may never surface.

Running this SQL shows that, in 100,000 records, it occurs once with records of 128+9 bytes, twice with 256+9, 4 times with 512+9, and so on. Each record looks like {"A":""}\n, where \n is a newline control character, so the wrapper adds 9 extra bytes on top of the value in the JSON object.
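The counts above can be checked with a quick calculation. This is only a sketch: it assumes the corruption recurs at every 10 × 1024 × 1024 bytes of output (the interval reported in the follow-up answer), and the names `BOUNDARY`, `OVERHEAD`, and `boundaries_crossed` are made up for illustration.

```python
BOUNDARY = 10 * 1024 * 1024  # assumed corruption interval, in bytes
OVERHEAD = 9                 # bytes for {"A":""} plus the trailing newline
RECORDS = 100_000


def boundaries_crossed(value_len, records=RECORDS):
    """Count how many 10 MiB boundaries fall inside the exported payload."""
    total_bytes = records * (value_len + OVERHEAD)
    return total_bytes // BOUNDARY


for value_len in (128, 256, 512):
    print(value_len, boundaries_crossed(value_len))
```

Under that assumption the results match the observed counts of 1, 2, and 4 corrupted positions.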

Has anyone else ever seen anything like this happening with S3?

Here's a link to the SQL file: https://gist.github.com/gordol/85226ac917de7984010c507591078792

asked 5 years ago, 601 views
2 Answers

To be clear, this happens every 10 MB of output (10 * 1024 * 1024 bytes).

For example, in the 256-byte value variant, the 39569th line fails to send fully and is truncated at the 240th byte.

The JSON overhead is 9 bytes per record: {"A":""} plus the trailing newline, with the data going between the quotes.

We can easily do the math on this using the number of complete lines before the failure, the payload size, the JSON overhead, and the byte offset within the failing line.

((39568 * (256 + 9)) + 240) / 1024 / 1024 = 10

The same holds for the 128-byte variant: the 76539th row fails, and its data is truncated at the 54th byte.

((76538 * (128 + 9)) + 54) / 1024 / 1024 = 10
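The same arithmetic can be run in the other direction, to predict which row and byte a given offset lands on. This is a minimal sketch, assuming fixed-length records with 9 bytes of overhead each; `locate` is a hypothetical helper, not part of any AWS tooling.

```python
TEN_MIB = 10 * 1024 * 1024


def locate(offset, value_len, overhead=9):
    """Return (1-based row number, bytes of that row sent) for a byte offset."""
    record_len = value_len + overhead
    full_rows, bytes_into_row = divmod(offset, record_len)
    return full_rows + 1, bytes_into_row


print(locate(TEN_MIB, 256))  # row and byte for the 256-byte variant
print(locate(TEN_MIB, 128))  # row and byte for the 128-byte variant
```

For the first boundary this gives row 39569 at byte 240 for the 256-byte variant and row 76539 at byte 54 for the 128-byte variant, matching what was observed.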

It's easily reproducible...
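To confirm the corruption in a downloaded export, one option is to check that every line parses as JSON. This is a rough sketch, assuming the export is newline-delimited JSON as in the test case; `find_bad_lines` and the file name in the usage note are hypothetical.

```python
import json


def find_bad_lines(lines):
    """Return the 1-based numbers of lines that do not parse as JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
    return bad


sample = ['{"A":"xxxx"}\n', '{"A":"xx\n', '{"A":"yyyy"}\n']
print(find_bad_lines(sample))
```

Against a real file it could be called as `find_bad_lines(open("export.json"))`, where `export.json` stands in for wherever the S3 object was downloaded.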

Edited by: GordoL on Nov 4, 2020 3:50 PM

answered 5 years ago

This is resolved by upgrading from Aurora Postgres 3.1.0/11.6 to 3.3/11.8. The issue is not present in the latest version.

answered 5 years ago
