Table id found in `stl_load_errors` doesn't correspond to loaded table

0

I use a COPY query to load data from S3 to a Redshift table t1.

99% of the time there's no errors, data is correctly loaded. So I know there's no mistake on the loaded table name.

But when there are errors, depending on the table I loaded, I can't always correctly track which error happened during loading of this specific table because, for some loaded tables like t1, the table identifier tbl found in stl_load_errors doesn't correspond to the one expected for table t1.

The loaded table name doesn't appear in this table, instead, we get an integer identifier 1234567 in the column tbl. We have to join with another table stv_tbl_perm that contains both name and identifier, to get the name. This "trick" is shown in the Redshift documentation.

For some tables, like t1, the error row I may found in stl_load_errors after using COPY t1 ... will have an identifier 1234567 in tbl column that doesn't correspond to anything found in stv_tbl_perm. As if it was the identifier of a temporary table.

Then, when I look in stv_tbl_perm for"name"='t1', I do find an identifier 10111213 in column id, but it's not the one I saw in stl_load_errors.

What makes it even more confusing is that my method of filtering errors on file and table, works perfectly fine for some Redshift tables, the id found is the one expected, it matches with the correct name in stv_tbl_perm.

Why would my COPY to a table go through what seems like a temporary table for some table, and not for others?

Any idea to be able to link the errors to the right table even when the id doesn't match?

You can find code to (maybe) reproduce my error on a similar post I made on Stackoverflow (which allows modern formatting) 3 days ago and that didn't get any solution yet: https://stackoverflow.com/questions/55480847/amazon-redshift-how-to-associate-stl-load-errors-row-with-the-correct-table-n

I'm posting here in the hope of finding more Redshift specialists!

asked 4 years ago161 views
8 Answers
0
Accepted Answer

OK, thank you for bringing this to our attention. We've identified the source of this issue and will investigate a fix. When it is fixed a note will to appear in our regular maintenance announcements at the top of the forum.

This is related to the automatic DISTSTYLE feature we introduced recently. For DISTSTYLE AUTO tables that are currently ALL, we load new data into a temp table and then check if the new rows added require the table to be converted from ALL to EVEN.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

DISTSTYLE { AUTO | EVEN | KEY | ALL }
The default is AUTO. …
AUTO: Amazon Redshift assigns an optimal distribution style based on the table data. For example, if AUTO distribution style is specified, Amazon Redshift initially assigns ALL distribution to a small table, then changes the table to EVEN distribution when the table grows larger. The change in distribution occurs in the background, in a few seconds. …

There are 2 ways work around this:

  1. Define an explicit DISTSTYLE of EVEN, KEY, or ALL on the table being loaded. I recommend this option primarily for tables that can use a KEY diststyle. If you are not using KEY then the AUTO diststyle will be the most efficient.
  2. Retrieve the query ID from the stl_load_errors, retrieve the SQL from stl_querytext for the query, and then look up the table by name from pg_class, svv_table_info, or stv_tbl_perm.
answered 4 years ago
0

Just to respond quickly, here answering only this one specific question;

Why would my COPY to a table go through what seems like a temporary table for some table, and not for others?

It could concievaby depend on the amount of data. The incoming data is sorted with respect to itself before being appended to the table. If the data is small, it could be done in memory - if the data is large, perhaps a temporary table is needed.

This however is a complete guess and with zero knowledge of how sorting is handled internally, so it's 99.9% likely to be wrong.

Toebs2
answered 4 years ago
0

Thanks for your answer.
That could have happened for some big loads I made with a couple millions of lines, but it also happens with the extremely small load of 3 lines I gave as an example in the link, so I don't understand. :/

answered 4 years ago
0

Do you mean that weird unexpected ID would come from the intermediate analysis Redshift does to determine compression?

If I add encoding to all the columns of my output table it's not going to happen?

I'm going to try with encoding and see if I can reproduce the issue.

Edited by: NicolasPA on Apr 16, 2019 3:05 AM

answered 4 years ago
0

So I have tried:

  • keeping table columns without encoding in the CREATE TABLE and using COPY ... COMPUPDATE OFF
  • adding columns encoding in the CREATE TABLE and using COPY ... without compupdate off.
  • adding columns encoding in the CREATE TABLE and using COPY ... COMPUPDATE OFF

And none of these ideas changed anything, the table id found stl_load_errors still doesn't corresponding to the one expected for the table I loaded.

Have I understood your comment correctly?

Thanks for your help.

Edited by: NicolasPA on Apr 16, 2019 6:59 AM

answered 4 years ago
0

Please send me a PM with the name and region of your cluster and the query you ran to identify the load errors.

answered 4 years ago
0

When loading data (COPY, INSERT, or CTAS) into new tables Redshift loads a sample of the data and performs a compression analysis. It then applies the selected compression encodings to the table and completes the data load.

If desired, you can avoid this step in INSERTs by specifying compression encodings manually for all columns (remember to specify RAW for the first sort key column). You can avoid this in COPY by specifying the COMPUPDATE OFF parameter, although I strongly recommend against this unless you have carefully specified compression for all columns in advance.

answered 4 years ago
0

Woo!

Specifying a DISTSTYLE in my little reproduction does solve the problem and I'll apply the same solution to my production tables because they seem to fit with the EVEN style (hundreds of millions of rows, no joins).

I have also tested your method with the querying of the stl_querytext table, and it works too in my reproduction case, but I don't find it very clean nor effective to parse a string to find the table name inside, so I'll stick with DISTSTYLE EVEN in my current cases.

Thanks a lot Joe!

Quoted you in an answer on my Stackoverflow post: https://stackoverflow.com/a/55762845/9599601

Edited by: NicolasPA on Apr 19, 2019 3:48 PM

answered 4 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions