Spectrum - Invalid S3 arn provided


Hey,

I am trying to query a Spectrum external table backed by Parquet files in S3, but I receive this error:

ERROR: error getting AWS credentials: Invalid S3 arn provided
Detail:
-----------------------------------------------
error:    error getting AWS credentials: Invalid S3 arn provided
code:     9002
context:  Listing bucket=xxx-octank-xxx-data prefix=
query:    13312
location: scan_range_manager.cpp:695
process:  padbmaster pid=5372
-----------------------------------------------

I am pretty sure the IAM roles are fine, and the query editor is working since I managed to create tables.
I cannot find any similar errors anywhere on the web.
Somehow Spectrum is internally fetching the S3 ARN and getting an invalid one. This is the table:

create external table spectrum.telemetry (
  timestamp varchar(128),
  tripid varchar(128),
  vin varchar(128),
  name varchar(128),
  value varchar(128),
  latitude decimal(8,8),
  longitude decimal(8,8)
)
stored as parquet
location 's3://xxx-octank-xxx-data/';

I am using Lake Formation as well, but I already hit and solved some errors with the GRANTs on the Lake Formation side.
I will keep checking, but let me know if you have any ideas.

Thanks
Fabio

Edited by: fbotton on May 14, 2021 3:22 AM

fbotton
Asked 3 years ago · 407 views
2 Answers

Hi fbotton,

This response assumes your external schema is using the Glue Data Catalog.

If you've already declared the external schema and external table, then both should exist in the Glue Data Catalog. One way to isolate the issue is to query the same external table from Athena. Athena can see any Glue Data Catalog "database" (thanks to Hive conventions, "database" and "schema" mean the same thing here), so in that database you should find the same external table you declared from Redshift and be able to query it from Athena.

If you get a similar error in Athena, I'd look at the Glue Data Catalog table's LOCATION property to see what may be wrong with the S3 location of the table data.
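As a sketch of that check from the Athena query editor (the database and table names below are placeholders, not the actual ones from this thread):

```sql
-- Query the same Glue Data Catalog table directly from Athena.
-- "crawler_db" and "telemetry" are placeholder names.
select * from "crawler_db"."telemetry" limit 10;

-- If that also fails, inspect the table's DDL, including its
-- LOCATION property, to verify the S3 path the catalog holds:
show create table `crawler_db`.`telemetry`;
```

If the SELECT fails in Athena too, the problem is on the catalog/S3 side rather than in Redshift.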

I'll bet this will help you find the problem.

Regards,
-Kurt

klarson
Answered 3 years ago

Hey,

I managed to fix the issue by following Kurt's suggestion.

Initially, what I had done was the following:

  1. Used a crawler to create a table in Glue from an existing S3 location
  2. Created a new database and schema/table from Redshift pointing to the same S3 location
  3. Tried to query the table and got the error

Clearly I had missed that the database to query from Spectrum should be the one created by the crawler. So I deleted the old table and schema in Redshift and created a new external schema pointing to the Glue database that already contained the crawler's table.
This way I did not have to create the table in Spectrum, since Redshift was already able to reach the one in the catalog.
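For anyone hitting the same thing, the fix can be sketched like this (all names and the IAM role ARN are placeholders, not the actual values from this thread):

```sql
-- Point a Redshift external schema at the Glue database that the
-- crawler populated, instead of creating a separate database/table.
create external schema spectrum
from data catalog
database 'crawler_db'  -- Glue database created by the crawler
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- The crawler's table is now visible without any CREATE EXTERNAL TABLE:
select * from spectrum.telemetry limit 10;
```

The key point is that the external schema maps onto the existing Glue database, so the crawler-created table is queryable directly.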

Querying the table through this newly created schema worked fine this time.

Anyone reading this, feel free to reach out.

Fabio

Edited by: fbotton on May 18, 2021 3:39 AM

fbotton
Answered 3 years ago
