HIVE_FILESYSTEM_ERROR: Input path does not exist when querying a table created by a Glue crawler using Athena, possibly due to a missing slash in the S3 path

Hello. I am running into the following error when querying a Delta Lake table that was built using a Glue crawler. The S3 path has been modified to hide sensitive info. Notice that a slash is missing between the name of the Glue catalog table and the partition. I believe this is the cause of the error, but I do not think it originates on our end or is something we can fix ourselves: "HIVE_FILESYSTEM_ERROR: Input path does not exist: s3://bucket_name_here/glue_catalog_db_name_here/table_namepartition/additional_partition/part_name.snappy.parquet This query ran against the "db_name" database, unless qualified by the query."

anUser
asked 10 days ago, 36 views
3 Answers

Hello, I understand that you are querying a Delta Lake table in Athena that was created using a Glue crawler and are facing the error below:

"HIVE_FILESYSTEM_ERROR: Input path does not exist: s3://bucket_name_here/glue_catalog_db_name_here/table_namepartition/additional_partition/part_name.snappy.parquet This query ran against the "db_name" database, unless qualified by the query."



This error occurs when the S3 path is incorrect or the object referenced by the path does not exist.

Since you suspect that the issue is due to the slash (/) missing between table_name and partition, please check the following:

Are table_name and partition different folders in S3? If so, there should be a slash (/) between the two folders.

Next, generate the table DDL using the query below:


SHOW CREATE TABLE table_name;


Running this query shows the table DDL. Check the location in the DDL and confirm that it points to the S3 path where the data files are actually stored. If the location is not correct, check the S3 path you used when running the crawler and make sure the correct path is defined there.
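As a rough illustration of that comparison, the sketch below takes the location from the DDL and the path from the error message and reports whether the failing path sits under the table location with a slash after it. It is plain Python string handling and does not call any AWS API. The function name `diagnose_path` and the example location value are made up for illustration, and the location is assumed to have been copied by hand from the SHOW CREATE TABLE output.

```python
def diagnose_path(table_location: str, failing_path: str) -> str:
    """Classify how failing_path relates to the table's location.

    Hypothetical helper: both arguments are assumed to be copied by hand
    from the SHOW CREATE TABLE output and from the Athena error message.
    """
    # Normalise the location so it carries no trailing slash.
    base = table_location.rstrip("/")
    if failing_path.startswith(base + "/"):
        # The path is under the location and the separator is present.
        return "ok"
    if failing_path.startswith(base):
        # The location is glued directly to the next segment. Note that a
        # sibling folder such as "table_name2/" would also land here.
        return "missing_slash"
    # The path does not start with the table location at all.
    return "outside_location"


if __name__ == "__main__":
    # Assumed location, built from the redacted names in the error message.
    location = "s3://bucket_name_here/glue_catalog_db_name_here/table_name/"
    failing = (
        "s3://bucket_name_here/glue_catalog_db_name_here/"
        "table_namepartition/additional_partition/part_name.snappy.parquet"
    )
    print(diagnose_path(location, failing))
```

A "missing_slash" result would suggest the path was assembled incorrectly somewhere between the catalog and the query engine, rather than the table location itself being wrong.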

If the path is defined correctly, this error can also occur when the object being accessed at that location does not exist. Please check that as well.
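One possible way to run that existence check from a script is sketched below. It assumes a boto3 S3 client is passed in and uses its `list_objects_v2` call with the full key as the prefix. The helper names `split_s3_uri` and `object_exists` are hypothetical, and the caller is assumed to have `s3:ListBucket` permission on the bucket.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs a bucket and a key: {uri!r}")
    return bucket, key


def object_exists(s3_client, uri: str) -> bool:
    """Return True if an object with exactly this key is listed in S3.

    s3_client is assumed to be a boto3 S3 client, for example
    boto3.client("s3"). Keys are listed in lexicographic order, so an
    exact match, if present, is the first result for its own prefix.
    """
    bucket, key = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   print(object_exists(s3, "<path copied from the error message>"))
```

Running it once against the path exactly as printed in the error and once against the path with the slash restored can help tell a deleted file apart from a malformed path.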

I am also sharing the link below on how to crawl Delta Lake tables using AWS Glue crawlers and query them in Athena:

https://aws.amazon.com/blogs/big-data/crawl-delta-lake-tables-using-aws-glue-crawlers/

SUPPORT ENGINEER
answered 9 days ago

Thanks for responding.

I ran the query again today, not having changed anything on our end, and it worked. The issue spontaneously resolved without any configuration changes from our side.

Thank you for sharing the crawler resource. However, it's safe to say that the crawler was configured correctly, since we did not change anything about the crawler or the S3 paths and the problem went away. I think that if there had been an issue with the S3 path used in the crawler definition, the crawler would not have been able to run successfully and create the table in the first place.

"Are the table_name and partition different folder in S3? If yes then they should have a slash(/) in between the 2 folders." --- Yes, the table name and partition are different folders in S3, and there should definitely be a slash there. However, Athena did not put one in, at least not in the error message. That is why I pointed it out when reporting the issue.

"Next generate the table DDL using the below query:
SHOW CREATE TABLE table_name;" - That was one of the first things I did and there was nothing abnormal about the output.

We do have some vacuum jobs that run on the Delta Lake, so it is possible that the file had been deleted and Glue had not registered the change yet. However, I would still expect the Delta table name and partition name to be separated by a slash, because they are definitely in different folders in S3.

anUser
answered 9 days ago

Thank you for your response.



I understand that you ran the query again and the issue seems to have spontaneously resolved without any configuration changes from your side.

Since nothing was changed, I agree that the crawler was configured correctly. And since the table name and partition are two different folders, there should be a slash (/) between them in the S3 path.

It seems that this was an intermittent issue in which the S3 path was not read correctly and was missing the slash (/) between the two folders. If you face this issue again or frequently, I would request that you open an AWS support case from the account in which you are facing the issue, so that the support team can check the resources in the backend and determine why this behaviour is occurring.

Creating support cases and case management: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html

SUPPORT ENGINEER
answered 8 days ago
