Spectrum - Invalid S3 arn provided


Hey,

I am trying to query a Spectrum table backed by Parquet files in S3, but I receive this error:

ERROR: error getting AWS credentials: Invalid S3 arn provided
Detail:
-----------------------------------------------
error:    error getting AWS credentials: Invalid S3 arn provided
code:     9002
context:  Listing bucket=xxx-octank-xxx-data prefix=
query:    13312
location: scan_range_manager.cpp:695
process:  padbmaster pid=5372
-----------------------------------------------

I am fairly sure the IAM roles are fine, and the query editor works: I can create tables.
I cannot find any similar errors on the web.
Somehow Spectrum is internally fetching the S3 ARN and getting an invalid one. This is the table:

create external table spectrum.telemetry(
  timestamp varchar(128),
  tripid varchar(128),
  vin varchar(128),
  name varchar(128),
  value varchar(128),
  latitude decimal(8,8),
  longitude decimal(8,8))
stored as PARQUET
location 's3://xxx-octank-xxx-data/';

I am also using Lake Formation, but I have already worked through and solved some errors with the GRANTs on the Lake Formation side.
I will keep checking, but let me know if you have any ideas.

Thanks
Fabio

Edited by: fbotton on May 14, 2021 3:22 AM

fbotton
asked 3 years ago · 397 views
2 Answers

Hi fbotton,

This response assumes your external schema is using the Glue Data Catalog.

If you've already declared the external schema and external table, they should both exist in the Glue Data Catalog. If so, one way to isolate the issue is to query the same external table from Athena. Athena can see any Glue Data Catalog "database" (here, thanks to Hive, "database" and "schema" are really the same thing), so you should be able to find the same external table you declared in Redshift and query it from Athena. If you get a similar error in Athena, I'd look at the Glue Data Catalog table's LOCATION property to see what may be wrong with the S3 location where the table data lives.
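For example (a sketch, assuming the Glue database and table are named `spectrum` and `telemetry` as in the question), you could run something like this in the Athena query editor:

```sql
-- Run in Athena: if this fails with a similar S3 error, the problem is
-- in the Glue Data Catalog table definition, not on the Redshift side.
SELECT * FROM spectrum.telemetry LIMIT 10;

-- Inspect the DDL the catalog has registered, including the LOCATION
-- property, to check the S3 path the table actually points at.
SHOW CREATE TABLE spectrum.telemetry;
```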

I'll bet this will help you find the problem.

Regards,
-Kurt

klarson
answered 3 years ago

Hey,

I managed to fix the issue by following Kurt's suggestion.

Initially, what I had done in my case was the following:

  1. Used a crawler to create a table in Glue from an existing S3 location
  2. Created a new database and schema/table from Redshift pointing to the same S3 location
  3. Tried to query the table and got the error

What I had missed was that the database to query from Spectrum should have been the one created by the crawler. So I deleted the old table and schema in Redshift and created a new external schema pointing to the Glue database containing the table that already existed.
This way I did not have to create the table in Spectrum, since Redshift was already able to reach the one in the catalog.
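As a sketch of the fix (the database, role ARN, and region below are placeholders, not the actual values from my setup):

```sql
-- Drop the Redshift-side objects that pointed at the wrong catalog database.
drop table if exists spectrum.telemetry;
drop schema if exists spectrum;

-- Create an external schema mapped to the Glue database the crawler created,
-- so the crawler's table becomes queryable from Redshift directly.
create external schema spectrum
from data catalog
database 'crawler_created_db'
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'
region 'us-east-1';

-- No CREATE EXTERNAL TABLE needed: the table already exists in the catalog.
select * from spectrum.telemetry limit 10;
```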

I then queried the table through this newly created schema, and this time it worked fine.

Anyone reading this, feel free to reach out.

Fabio

Edited by: fbotton on May 18, 2021 3:39 AM

fbotton
answered 3 years ago
