
How do I resolve data incompatibility errors in Redshift Spectrum?


I get an error message when I try to use an external schema, object, or file format in Amazon Redshift Spectrum.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Incompatible data format error

To resolve your incompatible data format error in Redshift Spectrum, complete the following steps:

  1. To get the complete error message from the SVL_S3LOG system view, run the following query:

    select * from SVL_S3LOG where query = 'Spectrum_query_ID';

    Note: Replace Spectrum_query_ID with your Redshift Spectrum query ID.
    If you have an incompatible Parquet schema, then you might get the following error message:
    "File 'https://s3bucket/location/file has an incompatible Parquet schema for column 's3://s3bucket/location.col1'. Column type: VARCHAR, Par..."

  2. Check the Message column to view the error description. The error description explains the data incompatibility between Redshift Spectrum and the external file.

  3. To check the schema of your external file, run the following command:

    show external table schema-name.table-name;

    Note: Replace schema-name with your schema's name and table-name with your table's name.

  4. Compare the schema with the column definition in the CREATE EXTERNAL TABLE definition.

  5. (Optional) If the column definition in the Apache Parquet file differs from the external table definition, then modify the column definition in the external table. The column definition must match the schema embedded in the Apache Parquet file.

  6. To check the data type of the column in the CREATE EXTERNAL TABLE definition, run the following SVV_EXTERNAL_COLUMNS query:

    select * from SVV_EXTERNAL_COLUMNS where schemaname = 'ext_schema_name' and tablename = 'ext_table_name';

    Note: Replace ext_schema_name with your external schema name and ext_table_name with your external table name.

    For columnar file formats such as Apache Parquet, the column type is embedded with the data. The column type in the CREATE EXTERNAL TABLE definition must match the column type of the data file. Mismatched column definitions result in a data incompatibility error.
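The comparison in steps 4-6 can be sketched in a few lines of Python. The column lists below are hypothetical stand-ins for what SVV_EXTERNAL_COLUMNS reports for the table and what a tool such as parquet-tools reports for the file:

```python
# Illustrative check: compare the column types from the CREATE EXTERNAL TABLE
# definition (for example, from SVV_EXTERNAL_COLUMNS) against the types
# embedded in the Parquet file. All sample values here are hypothetical.

external_table_columns = {
    "col1": "varchar(256)",
    "col2": "int",
    "col3": "decimal(6,2)",
}

parquet_file_columns = {
    "col1": "int",          # mismatch: the table defines varchar(256)
    "col2": "int",
    "col3": "decimal(6,2)",
}

def find_type_mismatches(table_cols, file_cols):
    """Return {column: (table_type, file_type)} for columns whose types differ."""
    return {
        name: (table_cols[name], file_cols[name])
        for name in table_cols
        if name in file_cols and table_cols[name] != file_cols[name]
    }

mismatches = find_type_mismatches(external_table_columns, parquet_file_columns)
print(mismatches)  # {'col1': ('varchar(256)', 'int')}
```

Any column that the check reports must be redefined in the external table to match the Parquet file.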

Invalid type length error

If you select a Redshift Spectrum table with DECIMAL data columns, then you might get the following error message:

"S3 Query Exception (Fetch). Task failed due to an internal error. File 'https://s3.amazonaws.com/…/' has an incompatible Parquet schema for column ''column '' has an invalid type length"

To resolve this error, update the external table definition so that its precision and scale values match the precision and scale defined in the external file.

Example:

create external table ext_schema.tablename (c1 int, c2 decimal(6,2)) stored as PARQUET location 's3://.../.../';

Note: Replace 6,2 with your precision and scale values.
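A quick way to compare the two definitions is to parse the precision and scale out of each type string. This is an illustrative sketch; the two type strings below are hypothetical examples of the table definition and the file's schema:

```python
import re

def parse_decimal(type_str):
    """Extract (precision, scale) from a type such as 'decimal(6,2)'.

    Returns None if the type is not a DECIMAL.
    """
    m = re.fullmatch(r"decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)",
                     type_str.strip().lower())
    return (int(m.group(1)), int(m.group(2))) if m else None

# Hypothetical values: the external table definition vs. the Parquet schema.
table_type = "decimal(6,2)"
file_type = "decimal(10,4)"

if parse_decimal(table_type) != parse_decimal(file_type):
    print("precision/scale mismatch: redefine the external table column")
```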

Internal error

If you select an external schema from an Amazon Athena catalog, then you might receive the following error message:

"Task failed due to an internal error. File 'https://s3...snappy.parquet has an incompatible Parquet schema for column 's3://.../tbl.a'. Column type: BOOLEAN, Parquet schema:\noptional int32 b [i:26 d:1 r:0]"

In Redshift Spectrum, the column order in the CREATE EXTERNAL TABLE definition must match the field order in the Apache Parquet file. If you skip a column or rearrange the order of the columns, then you receive an error message.

Note: Redshift Spectrum matches columns to Apache Parquet file fields by position. Amazon Athena instead uses column names to map to fields in the Apache Parquet file.

To resolve this error, explicitly specify the column names in the SELECT statement:

select col_1, col_2, col_3, .... col_n from athena_schema.tablename;

Make sure that the AWS Identity and Access Management (IAM) role allows access to Amazon Athena. For more information, see IAM policies for Amazon Redshift Spectrum.
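Because the matching is positional, finding the first position where the two orders diverge points directly at the offending column. A minimal sketch, with hypothetical column lists:

```python
def first_order_mismatch(table_columns, parquet_fields):
    """Return (index, table_column, parquet_field) for the first position
    where the external table's column order differs from the Parquet file's
    field order, or None if the orders agree."""
    for i, (tbl, pq) in enumerate(zip(table_columns, parquet_fields)):
        if tbl != pq:
            return (i, tbl, pq)
    return None

# Hypothetical schemas: the table defines (a, b, c) but the file stores (b, a, c).
table_columns = ["a", "b", "c"]
parquet_fields = ["b", "a", "c"]

print(first_order_mismatch(table_columns, parquet_fields))  # (0, 'a', 'b')
```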

Invalid column type error

If you use Redshift Spectrum to query VARCHAR data type columns from an AWS Glue Data Catalog table, then you might receive the following error message:

"<column_name> - Invalid column type for column <column_name>. Type: varchar"

Both AWS Glue and Redshift Spectrum support the VARCHAR data type. However, the VARCHAR data type that the AWS Glue Data Catalog defines doesn't include a size parameter, such as VARCHAR(256). When Redshift Spectrum queries a VARCHAR column that's defined without a size parameter, you might get an invalid column type error message.

To resolve this error, complete the following steps:

  1. To retrieve and store the AWS Glue table data in a local file, run the following get-table AWS CLI command:
    aws glue get-table --region your-region --database gluedb --name click_data_json > click-data-table.json
    Note: Replace your-region with your AWS Region.
  2. Use a text editor to open the click-data-table.json file and then remove the outer {"Table": ...} envelope.
  3. Remove fields that aren't allowed in the UpdateTable action.
  4. Modify the STRING column types to varchar with the updated size parameter. For example:
    "Type": "varchar(1000)"
    Note: Replace 1000 with your size parameter.
  5. To update your AWS Glue table, run the following update-table AWS CLI command:
    aws glue update-table --region your-region --database gluedb --table-input "$(cat click-data-table.json)"
    Note: Replace your-region with your Region.
  6. Check your table definition in AWS Glue and verify that you updated the data types.
  7. Create an external schema in Amazon Redshift that references the AWS Glue database. For example:
    create external schema glue_schema from data catalog database 'gluedb' iam_role 'arn:aws:iam::111111111111:role/myRedshiftRole' region 'your-region';
    Note: Replace your-region with your Region.
  8. To query click_data_json, run the following command:
    select * from glue_schema.click_data_json;
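Steps 2-4 edit the JSON file that get-table produced. The following Python sketch shows the same edits on an abbreviated, hypothetical response; a real get-table response contains many more fields, and the exact set of fields that UpdateTable rejects can vary:

```python
import json

# Hypothetical, abbreviated output of `aws glue get-table`.
raw = json.loads("""
{
  "Table": {
    "Name": "click_data_json",
    "DatabaseName": "gluedb",
    "CreateTime": "2023-01-01T00:00:00Z",
    "StorageDescriptor": {
      "Columns": [
        {"Name": "user_id", "Type": "string"},
        {"Name": "page", "Type": "string"}
      ]
    }
  }
}
""")

table_input = raw["Table"]          # step 2: remove the outer {"Table": ...} envelope

# Step 3: drop fields that the UpdateTable TableInput structure doesn't accept.
for field in ("DatabaseName", "CreateTime", "UpdateTime",
              "CreatedBy", "IsRegisteredWithLakeFormation"):
    table_input.pop(field, None)

# Step 4: change STRING column types to a sized varchar.
for col in table_input["StorageDescriptor"]["Columns"]:
    if col["Type"] == "string":
        col["Type"] = "varchar(1000)"

# Write the result back for `aws glue update-table --table-input "$(cat ...)"`.
with open("click-data-table.json", "w") as f:
    json.dump(table_input, f, indent=2)
```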

Invalid range error

If a Redshift Spectrum query tries to read a file from Amazon Simple Storage Service (Amazon S3) that has been modified, overwritten, or deleted, then you might get the following error message:

"Error: Spectrum Scan Error. Error: HTTP response error code: 416 Message: InvalidRange The requested range is not satisfiable"

To resolve this error, make sure that you don't overwrite Amazon S3 files when you query with Redshift Spectrum.
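One common way to avoid overwrites is to write each new batch of data to a unique object key instead of replacing an existing one. A minimal sketch; the prefix and naming scheme are illustrative, not a required convention:

```python
from datetime import datetime, timezone

def versioned_key(prefix, basename, ext="parquet"):
    """Build a unique S3 object key so new data never overwrites a file
    that an in-flight Redshift Spectrum query might still be reading."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{basename}-{stamp}.{ext}"

key = versioned_key("clickstream/2024", "events")
print(key)  # e.g. clickstream/2024/events-20240101T120000Z.parquet
```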

Invalid Parquet version number

Redshift Spectrum verifies the metadata of each Apache Parquet file that it accesses. If the verification fails, then you might get an error message similar to the following:

"File 'https://s3.region.amazonaws.com/s3bucket/location/file has an invalid version number"

The verification might fail if the Parquet file was overwritten during the query or if the file is corrupt. Verification failures typically occur when you update existing files in place. As a best practice, write each update to a new Parquet file instead of overwriting an existing one, so that files are effectively version controlled.

Related information

Troubleshooting queries in Amazon Redshift Spectrum

This article was reviewed and updated on 2026-04-06.
