Hello. Did you crawl the data into the AWS Glue Data Catalog? If so, can you see the database and tables listed in Athena?
Yes and yes. I have the database and tables listed in both Glue and Athena.
Based on the additional details you provided, here are a few more things to check that could be causing the "path missing" error when querying Parquet data in Athena:
- Verify that the Parquet files were successfully written to the S3 location and exist there. Use the AWS S3 console or AWS CLI to confirm the files are present.
- Before you run your first query, you need to set up a query result location in Amazon S3.
- Check the table information by following the "Show Table Information" guide.
- Check that the table DDL statement in Athena is pointing to the correct S3 location with the Parquet files.
- Check that the DDL specifies the Parquet format, for example STORED AS PARQUET (see "Querying Data Stored as Parquet").
- Make sure the data types in the DDL match what is in the Parquet file schema. Mismatches can cause issues reading the file.
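The first check in the list above (confirming that Parquet files actually exist at the table's S3 location) can also be scripted. The following is only a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `split_s3_uri` and `list_parquet_keys` are made up for illustration, and it only inspects the first page of results.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_parquet_keys(s3_client, location, max_keys=1000):
    """Return object keys under `location` whose names end in .parquet.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Only the first page of list_objects_v2 results is inspected.
    Assumption: the writer used a .parquet file extension; some writers do not.
    """
    bucket, prefix = split_s3_uri(location)
    response = s3_client.list_objects_v2(
        Bucket=bucket, Prefix=prefix, MaxKeys=max_keys
    )
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    return [key for key in keys if key.endswith(".parquet")]


# Example usage (requires AWS credentials; the location is a placeholder):
#   import boto3
#   print(list_parquet_keys(boto3.client("s3"), "s3://test-bucket/testdata/"))
```

An empty result means no `.parquet` objects were found under that prefix in the first page, which would be worth investigating before looking at the table definition.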
Steps 1, 2, and 3 are OK.
Now with 4, 5, and 6 I'm not sure; as I said, I'm new to this.
I ran the query: SHOW CREATE TABLE cleaned_statistics_reference_data;
This is the result:
CREATE EXTERNAL TABLE cleaned_statistics_reference_data (
  kind string,
  etag string,
  id string,
  snippet_channelid string,
  snippet_title string,
  snippet_assignable boolean)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://BUCKETNAME'
TBLPROPERTIES (
  'classification'='parquet',
  'compressionType'='snappy',
  'projection.enabled'='false',
  'typeOfData'='file')
It seems like it's not stored correctly, so is the problem my Lambda function? How do I change it?
If your data is already in S3, then a Glue crawler should suffice. It will create the tables in your Glue Data Catalog, and you can then query them directly in Athena. You don't need Lambda here unless you're using it for some other processing. See the "Crawler-Tutorial" guide.
After you have created a table in Athena, its name displays in the Tables list on the left. To show information about the table and manage it, choose the vertical three dots next to the table name in the Athena console.
Preview table – Shows the first 10 rows of all columns by running the SELECT * FROM "database_name"."table_name" LIMIT 10 statement in the Athena query editor. If this works fine, then you should be able to query your data.
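The same table details shown in the console can also be read from the Glue Data Catalog in code, which makes it easier to compare the table's location and format against what is actually in S3. This is a rough sketch, assuming boto3 and a Glue client; `summarize_table_storage` is a hypothetical helper name, and the "looks like Parquet" test is only a simple string check on the SerDe class.

```python
def summarize_table_storage(glue_client, database, table):
    """Return the location, SerDe and input format of a Glue Data Catalog table.

    `glue_client` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    response = glue_client.get_table(DatabaseName=database, Name=table)
    storage = response["Table"]["StorageDescriptor"]
    serde = storage.get("SerdeInfo", {}).get("SerializationLibrary", "")
    return {
        "location": storage.get("Location", ""),
        "serde": serde,
        "input_format": storage.get("InputFormat", ""),
        # Heuristic only: checks whether the SerDe class name mentions parquet.
        "looks_like_parquet": "parquet" in serde.lower(),
    }


# Example usage (requires AWS credentials; "database_name" is a placeholder):
#   import boto3
#   info = summarize_table_storage(
#       boto3.client("glue"), "database_name", "cleaned_statistics_reference_data"
#   )
#   print(info)
```

If `location` points at the bucket root while the Parquet files live under a subfolder, or the other way round, that mismatch is a likely place to start.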
Also, could you share what you are trying to do with the Lambda function? You don't need to share the code, just what you are using it for.
Also, if you want to update the S3 location of an existing table, you can use the ALTER TABLE command:
ALTER TABLE tablename SET LOCATION 's3://test-bucket/testdata/';
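If you would rather run that statement from a script than from the query editor, something like the following could work. It is a sketch only, assuming boto3 and an Athena client; `build_set_location_ddl` and `run_athena_query` are hypothetical helper names, and the database name and the query result location (`output_location`) are placeholders you would replace with your own.

```python
def build_set_location_ddl(table, location):
    """Build an ALTER TABLE ... SET LOCATION statement for Athena."""
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return f"ALTER TABLE {table} SET LOCATION '{location}'"


def run_athena_query(athena_client, sql, database, output_location):
    """Submit `sql` to Athena and return the query execution ID.

    `athena_client` is expected to be boto3.client("athena").
    `output_location` is the S3 query result location Athena writes to.
    The call only submits the query; it does not wait for it to finish.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Example usage (requires AWS credentials; all names are placeholders):
#   import boto3
#   sql = build_set_location_ddl("tablename", "s3://test-bucket/testdata/")
#   run_athena_query(boto3.client("athena"), sql, "database_name",
#                    "s3://test-bucket/athena-results/")
```

After the location is changed, running the preview query again is a quick way to confirm that Athena can now find the files.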
Did you solve the issue?