Spectrum Scan Error - Unable to create parquet column scanner


Hi, I am trying to load a simple 13-column table from Parquet files on S3 into a Redshift table. I am not using Spectrum or external tables, but it looks like reading Parquet goes through Spectrum in the background anyway. I keep getting the error below, and I get the same error on other tables too. I have re-created the cluster and rebooted it, but the error is still there. Could someone tell me where I can find the root cause? I am sure the IAM role is picked up, the file is there, and the column count matches, since I fixed all of those errors before arriving at this last one, whose cause I cannot understand.

ERROR: Spectrum Scan Error
Detail:
-----------------------------------------------
error: Spectrum Scan Error
code: 15001
context: Unable to create parquet column scanner
query: 525
location: dory_util.cpp:1167
process: worker_thread [pid=27989]
-----------------------------------------------
[ErrorId: 1-6204d783-12b35e0914588ba01e3c14fe]

The COPY command I use is simple:

COPY my_schema_name.mdvm
FROM 's3://bucket/data/table/files/'
IAM_ROLE 'arn:aws:iam::xxxxx:role/service-role/AmazonRedshift-xxxxxxxxx'
FORMAT AS PARQUET;
Asked 2 years ago · 2341 views

1 Answer

It would be best to open a support ticket so the files can be investigated. I have seen cases where a decimal value was encoded differently from what Redshift expects: as binary (STRING) instead of fixed_len_byte_array(5) (DECIMAL(10,4)). If that is the case here, you can convert the string values to decimal with the following Python code:

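import pyarrow as pa
# Assumes `table` is a pyarrow.Table already read from the file and `position` is the index of 'col1'.
# Parse the string values as floats, then cast them to DECIMAL(10,4).
new_col = pa.chunked_array([pa.Array.from_pandas(table.column('col1').to_pandas().astype(float)).cast(pa.decimal128(10, 4))])
table = table.set_column(position, 'col1', new_col)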

Hope this helps!

AWS
Answered 2 years ago
