HIVE_UNKNOWN_ERROR: Path missing in file system location: s3://...

0

Hello, I'm new to AWS, I'm studying data engineering, and I'm facing a problem.

I've created a Parquet file that is in my bucket. I also created a database that contains this Parquet file.

When I try to run my query in Athena, I get this error message: HIVE_UNKNOWN_ERROR: Path missing in file system location: s3://[NAME-OF-MY-BUCKET]

I did the same process with CSV data and everything went smoothly.

I don't know if I messed up the Lambda function, the crawler, or settings somewhere else. Can anyone help me?

  • Did you solve the issue?

Gustavo
asked 2 months ago · 279 views
4 Answers
0

Hello, did you crawl the data into the Glue Data Catalog? If yes, can you see the database and tables listed in Athena?

AWS
answered 2 months ago
  • Yes and yes, I have the database and tables listed in Glue and Athena.

0

Based on the additional details you provided, a few more things to check that could be causing the "path missing" error when querying Parquet data in Athena:

  • Verify that the Parquet files were successfully written to the S3 location and exist there. Use the AWS S3 console or AWS CLI to confirm the files are present.
  • Before you run your first query, you need to set up a query result location in Amazon S3.
  • Check the table information by following this - Show Table Information
  • Check that the table DDL statement in Athena is pointing to the correct S3 location with the Parquet files.
  • Check that the DDL specifies the Parquet format. For example: STORED AS PARQUET - Querying Data Stored as Parquet
  • Make sure the data types in the DDL match what is in the Parquet file schema. Mismatches can cause issues reading the file.
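To illustrate points 4–6 above: a working Athena DDL for Parquet data typically points at a folder-style prefix (ending in a slash), not the bare bucket root. This is a hypothetical sketch; the bucket name, prefix, and columns are placeholders to be replaced with your own values:

```sql
-- Hypothetical example: 'my-bucket' and 'cleaned-data/' are placeholders.
CREATE EXTERNAL TABLE my_parquet_table (
  id string,
  title string,
  assignable boolean
)
STORED AS PARQUET
LOCATION 's3://my-bucket/cleaned-data/'  -- a prefix ending in '/', not just the bucket
TBLPROPERTIES ('classification' = 'parquet');
```

Note that LOCATION should reference the folder containing the Parquet files, not an individual file.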
AWS
answered 2 months ago
  • Steps 1, 2, and 3 are OK.

    Now with 4, 5, and 6 I'm not sure; as I said, I'm new to this.

    I ran the query: SHOW CREATE TABLE cleaned_statistics_reference_data;

    This is the result:

    CREATE EXTERNAL TABLE cleaned_statistics_reference_data(
      kind string,
      etag string,
      id string,
      snippet_channelid string,
      snippet_title string,
      snippet_assignable boolean)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 's3://BUCKETNAME'
    TBLPROPERTIES (
      'classification'='parquet',
      'compressionType'='snappy',
      'projection.enabled'='false',
      'typeOfData'='file')

    It seems like it's not stored correctly. So is the problem my Lambda function? How do I change it?

0

If your data is already in S3, then a Glue crawler should suffice. It will create the tables in your Glue catalog, which you can query directly in Athena. You don't need Lambda here unless you're using it for some other processing. Crawler-Tutorial

After you have created a table in Athena, its name displays in the Tables list on the left. To show information about the table and manage it, choose the vertical three dots next to the table name in the Athena console.

Preview table – Shows the first 10 rows of all columns by running the SELECT * FROM "database_name"."table_name" LIMIT 10 statement in the Athena query editor. If this works fine, then you should be able to query your data.
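For example, using the table name from the question (the database name here is a placeholder for your actual Glue database):

```sql
-- "my_database" is a placeholder; substitute your Glue database name.
SELECT * FROM "my_database"."cleaned_statistics_reference_data" LIMIT 10;
```

If this returns rows, the table location and format are readable and other queries should work too.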

Also, could you share what you are trying to do with the Lambda function? You don't need to share the code, just what you are using it for.

AWS
answered 2 months ago
0

Also, if you want to update the S3 location of an existing table, you can use the ALTER TABLE command:

ALTER TABLE tablename SET LOCATION 's3://test-bucket/testdata/';
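After updating the location, you can confirm the change took effect. A quick check, using the table name from the question:

```sql
-- The LOCATION line in the output should now show the full prefix, not just the bucket.
SHOW CREATE TABLE cleaned_statistics_reference_data;
```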

AWS
answered 2 months ago
