How do I troubleshoot data loading errors while using the COPY command in Amazon Redshift?

I tried to use the COPY command to load a flat file. However, I'm experiencing data loading issues or errors in Amazon Redshift. How do I troubleshoot this?

Short description

Use the STL_LOAD_ERRORS table to identify any data loading errors that occur during a flat file load. The STL_LOAD_ERRORS table can help you track the progress of a data load, recording any failures or errors along the way. After you troubleshoot the identified issue, reload the data in the flat file while using the COPY command.
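
Before you set up the load error view described in the Resolution section, you can also query STL_LOAD_ERRORS directly for a quick first look. The following is a minimal sketch that lists the most recent load errors, using columns from the STL_LOAD_ERRORS system table:

select query, starttime, trim(filename) as input,
line_number, colname, err_code, trim(err_reason) as reason
from stl_load_errors
order by starttime desc
limit 10;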

Tip: If you're using the COPY command to load a file in Parquet format, you can also use the SVL_S3LOG table to identify any data loading errors.
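
For example, after a Parquet load fails, you can list the most recent messages recorded in SVL_S3LOG. This is a minimal sketch; run it on the cluster where the COPY ran:

select query, segment, slice, eventtime, trim(message) as message
from svl_s3log
order by eventtime desc
limit 20;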

Resolution

Note: The following steps use an example dataset of cities and venues.

1.    Check the data in your sample flat file to confirm that the source data is valid.

For example:

7|BMO Field|Toronto|ON|0
16|TD Garden|Boston|MA|0
23|The Palace of Auburn Hills|Auburn Hills|MI|0
28|American Airlines Arena|Miami|FL|0
37|Staples Center|Los Angeles|CA|0
42|FedExForum|Memphis|TN|0
52|PNC Arena|Raleigh|NC  ,25   |0
59|Scotiabank Saddledome|Calgary|AB|0
66|SAP Center|San Jose|CA|0
73|Heinz Field|Pittsburgh|PA|65050

This example demo.txt file contains five fields separated by a pipe character. For more information, see Load LISTING from a pipe-delimited file (default delimiter).

2.    Open the Amazon Redshift console.

3.    Create a sample table using the following DDL:

CREATE TABLE VENUE1(
VENUEID SMALLINT,
VENUENAME VARCHAR(100),
VENUECITY VARCHAR(30),
VENUESTATE CHAR(2),
VENUESEATS INTEGER
) DISTSTYLE EVEN;
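
Optionally, you can confirm the column types and lengths that COPY validates against by querying PG_TABLE_DEF. This sketch assumes the table was created in a schema that is on your search_path (such as public):

select "column", type
from pg_table_def
where tablename = 'venue1';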

4.    Create a view to preview the relevant columns from the STL_LOAD_ERRORS table:

create view loadview as
(select distinct tbl, trim(name) as table_name, query, starttime,
trim(filename) as input, line_number, colname, err_code,
trim(err_reason) as reason
from stl_load_errors sl, stv_tbl_perm sp
where sl.tbl = sp.id);

This view can help you identify the cause of the data loading error.

5.    Use the COPY command to load the data:

copy venue1
from 's3://your_S3_bucket/venue/'
iam_role 'arn:aws:iam::123456789012:role/redshiftcopyfroms3'
delimiter '|' ;

Note: Replace your_S3_bucket with the name of your S3 bucket. Then, replace arn:aws:iam::123456789012:role/redshiftcopyfroms3 with the Amazon Resource Name (ARN) for your AWS Identity and Access Management (IAM) role. This IAM role must be able to access data from your S3 bucket. For more information, see Parameters.
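
If the COPY statement fails, you can capture the query ID of the most recent COPY in the current session and use it later to narrow the load view results. This is a sketch using the PG_LAST_COPY_ID function; depending on how the COPY fails, a query ID may or may not be recorded, and the function returns -1 if none was recorded in the session:

select pg_last_copy_id();
-- or filter the load view created in step 4 by that ID:
select * from loadview where query = pg_last_copy_id();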

6.    Query the load view to display and review the error load details of the table:

testdb=# select * from loadview where table_name='venue1';
tbl | 265190
table_name | venue1
query | 5790
starttime | 2017-07-03 11:54:22.864584
input | s3://your_S3_bucket/venue/venue_pipe0000_part_00
line_number | 7
colname | venuestate
err_code | 1204
reason | Char length exceeds DDL length

The output from this query includes the following important information:

  • The file causing the error.
  • The column causing the error.
  • The line number in the input file.
  • The reason for the exception.

In this example, the error occurs because the value being loaded into the venuestate column (NC  ,25   ) is longer than the length defined in the VENUESTATE CHAR(2) DDL.

You can resolve this error in two different ways:

  • If the data is expected to exceed the defined length of the column, then review and update the table definition to modify the column length (see the sketch after this list).
    -or-
  • If the data isn't properly formatted or transformed, then modify the data in the file to use the correct value.
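
If you choose to modify the column length, keep in mind that Amazon Redshift supports changing a column's size in place only for VARCHAR columns; because VENUESTATE is defined as CHAR(2) in this example, you would need to recreate the table (or add a new, wider column) instead. As a sketch, if the column had been defined as VARCHAR, you could widen it like this:

-- assumes venuestate was created as a varchar column;
-- char columns can't be resized in place
alter table venue1 alter column venuestate type varchar(10);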

7.    Modify the data in your load file to use the correct values (the length must align with the defined column length):

7|BMO Field|Toronto|ON|0
16|TD Garden|Boston|MA|0
23|The Palace of Auburn Hills|Auburn Hills|MI|0
28|American Airlines Arena|Miami|FL|0
37|Staples Center|Los Angeles|CA|0
42|FedExForum|Memphis|TN|0
52|PNC Arena|Raleigh|NC|0
59|Scotiabank Saddledome|Calgary|AB|0
66|SAP Center|San Jose|CA|0
73|Heinz Field|Pittsburgh|PA|65050

8.    Reload the corrected data using the COPY command:

testdb=# copy venue1
from 's3://your_S3_bucket/venue/'
iam_role 'arn:aws:iam::123456789012:role/redshiftcopyfroms3' delimiter '|' ;
INFO:  Load into table 'venue1' completed, 808 record(s) loaded successfully.
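
To confirm the reload in the same session, you can check how many rows the last COPY processed and compare it with the table's row count. A small sketch using the PG_LAST_COPY_COUNT function:

select pg_last_copy_count();
select count(*) from venue1;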

Note: The STL_LOAD_ERRORS table retains only a limited amount of log history (approximately two to five days, depending on log usage and available disk space). Also, standard users can view only their own data when querying the STL_LOAD_ERRORS table. To view all the table data, you must be a superuser.
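
If a non-superuser needs to see load errors generated by other users, a superuser can instead grant that user unrestricted access to the system tables. A sketch, where myuser is a placeholder for the actual user name:

alter user myuser syslog access unrestricted;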


Related information

Amazon Redshift best practices for designing tables

Amazon Redshift best practices for loading data

System tables for troubleshooting data loads

Working with recommendations from Amazon Redshift Advisor
