You need to create partitions; this can be achieved with a simple AWS Lambda function.
Event-based serverless architecture: S3 Access Log Bucket (PUT event notification) --> SQS --> Lambda with a batch size of 10 (parses the file name and performs an S3 copy, adding a partition prefix)
For example, the input s3://my-access-logs/2022-02-15-11-11-18-6E829E27FAAA289A could map to either of these outputs:
- Option-1 s3://my-access-logs/pt_date=2022-02-15/2022-02-15-11-11-18-6E829E27FAAA289A
- Option-2 s3://my-access-logs/year=2022/month=02/day=15/hour=11/2022-02-15-11-11-18-6E829E27FAAA289A
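A minimal sketch of the Lambda's key-rewriting step (Python, assuming boto3 in the handler; the regex, function name, and bucket wiring are illustrative, not a definitive implementation):

```python
import re

# S3 access-log object keys look like: 2022-02-15-11-11-18-6E829E27FAAA289A
LOG_KEY_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(\d{2})-\d{2}-\d{2}-[0-9A-F]+$")

def partitioned_key(key: str, style: str = "date") -> str:
    """Return the destination key with a Hive-style partition prefix.

    style="date" -> pt_date=YYYY-MM-DD/<key>                     (Option-1)
    style="hour" -> year=YYYY/month=MM/day=DD/hour=HH/<key>      (Option-2)
    """
    m = LOG_KEY_RE.match(key)
    if not m:
        raise ValueError(f"unexpected access-log key: {key}")
    year, month, day, hour = m.groups()
    if style == "date":
        return f"pt_date={year}-{month}-{day}/{key}"
    return f"year={year}/month={month}/day={day}/hour={hour}/{key}"

# Inside the Lambda handler, each SQS record's object would then be copied, e.g.:
#   s3 = boto3.client("s3")
#   s3.copy_object(Bucket=bucket,
#                  CopySource={"Bucket": bucket, "Key": key},
#                  Key=partitioned_key(key))
```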
Additionally, you can set an aggressive S3 lifecycle policy that deletes the original (unpartitioned) logs after 1 day to save on storage costs.
Once this is done, you can register an Athena table with Hive-style partitions and query it seamlessly.
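Table registration could then look roughly like the sketch below (the database/table name and placeholder column are assumptions; in practice you would use the access-log column definitions and SerDe from the AWS documentation):

```python
# Hypothetical table name and schema; replace the placeholder column with the
# real access-log SerDe/columns before using this DDL.
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
    raw string
)
PARTITIONED BY (pt_date string)
LOCATION 's3://my-access-logs/'
"""

# New pt_date=... prefixes written by the Lambda are then picked up with:
REPAIR = "MSCK REPAIR TABLE access_logs"

# Both statements could be submitted via boto3 (not run here), e.g.:
#   athena = boto3.client("athena")
#   athena.start_query_execution(QueryString=DDL, WorkGroup="primary")
```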
Alternatively, there is a blog post that uses Glue to perform this in batch mode: https://aws.amazon.com/blogs/big-data/easily-query-aws-service-logs-using-amazon-athena/
This is very helpful - thank you!
Athena needs partitions to be organized by folder (key prefix); it cannot partition individual files by their names.