How to query a large Parquet file


Hi,

We have a large Parquet file stored in S3. It contains six months of logs, possibly more. As a service, we need to run a query that fetches a given user's data from this Parquet file, end to end, using AWS data APIs.

I have explored Redshift, but the challenge is that loading data from S3 into Redshift takes additional effort, such as provisioning a cluster and copying the data from S3 into the Redshift database.

I have the following questions:

  1. Can Redshift provisioning be done using AWS data APIs?
  2. How do I run a query to fetch user data from the Redshift database? Is there a data API available for this?
  3. Is there a better AWS service available for this than Redshift?

Regards, Ashok

Asked a year ago · Viewed 237 times
1 Answer

You should run a Glue crawler against the S3 location of your Parquet dataset. The crawler catalogs the data so that you can then query it with Athena, either from the Athena console or through the Athena APIs.
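In case a concrete starting point helps, here is a minimal sketch of that flow with boto3. The crawler name, IAM role ARN, database, table, and S3 paths are all placeholders you would replace with your own values:

```python
import time
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# 1. One-time setup: create and start a crawler over the Parquet data.
#    "logs_crawler", the role ARN, "logs_db", and the S3 path are placeholders.
glue.create_crawler(
    Name="logs_crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="logs_db",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/logs/"}]},
)
glue.start_crawler(Name="logs_crawler")
# (In practice, wait for the crawler to finish before querying.)

# 2. Once the crawler has built the table, run an Athena query against it.
query = athena.start_query_execution(
    QueryString="SELECT * FROM logs WHERE user_id = 'u123' LIMIT 100",
    QueryExecutionContext={"Database": "logs_db"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
qid = query["QueryExecutionId"]

# 3. Athena queries are asynchronous: poll until the query finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# 4. Page through the result set (the first row is the column headers).
if state == "SUCCEEDED":
    paginator = athena.get_paginator("get_query_results")
    for page in paginator.paginate(QueryExecutionId=qid):
        for row in page["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])
```

This keeps everything serverless and API-driven: there is no cluster to provision, and you pay per query for the data Athena scans.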

Don_D (AWS), answered a year ago
Reviewed by Chris_G (AWS Expert), a year ago
  • Thank you for the reply. What are the challenges in using Redshift Serverless?

  • Redshift Serverless is an option: you could use it to query the data in S3 by running the crawler and then creating an external schema. If you need better performance, you could COPY the data into the serverless endpoint. There are cost differences between Athena and Redshift Serverless that should be compared. The Redshift Data API works with Serverless too; see the sketch below.
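For completeness, here is a rough sketch of querying a Redshift Serverless workgroup through the Redshift Data API with boto3. The workgroup name, database, and the external (Spectrum) schema and table are placeholders for your own setup:

```python
import time
import boto3

rsd = boto3.client("redshift-data")

# Submit a query to a Redshift Serverless workgroup.
# "logs-wg", "dev", and "spectrum_schema.logs" are placeholders.
stmt = rsd.execute_statement(
    WorkgroupName="logs-wg",
    Database="dev",
    Sql="SELECT * FROM spectrum_schema.logs WHERE user_id = 'u123' LIMIT 100",
)
sid = stmt["Id"]

# The Data API is asynchronous: poll until the statement completes.
while True:
    desc = rsd.describe_statement(Id=sid)
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

# Fetch the rows once the statement has finished.
if desc["Status"] == "FINISHED" and desc.get("HasResultSet"):
    result = rsd.get_statement_result(Id=sid)
    for record in result["Records"]:
        print(record)
```

Like the Athena route, this needs no drivers or persistent connections, which fits the original requirement of doing everything through AWS data APIs.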
