You need to create partitions. This can be achieved by writing a simple AWS Lambda function.
Event-based serverless architecture: S3 access log bucket (PUT event notification) --> SQS --> Lambda in batch mode with a batch size of 10 (parses the file name and performs an S3 copy, adding the partition prefix).
For example:
Input: s3://my-access-logs/2022-02-15-11-11-18-6E829E27FAAA289A
Output:
- Option 1: s3://my-access-logs/pt_date=2022-02-15/2022-02-15-11-11-18-6E829E27FAAA289A
- Option 2: s3://my-access-logs/year=2022/month=02/day=15/hour=11/2022-02-15-11-11-18-6E829E27FAAA289A
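The key rewrite above could be sketched in Python roughly as follows. This is only an illustration, not a tested Lambda: the function names `partitioned_key` and `copy_to_partition` are made up for this example, the regular expression assumes log object names look like the one shown above, and the copy step assumes the partitioned objects go into the same bucket.

```python
import re

# Assumed shape of an S3 access log object name, e.g.
# 2022-02-15-11-11-18-6E829E27FAAA289A (date, time, then a hex suffix).
LOG_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(\d{2})-\d{2}-\d{2}-[0-9A-F]+$")


def partitioned_key(key, hourly=False):
    """Return the partitioned key for a log object, or None if the name does not match."""
    name = key.rsplit("/", 1)[-1]  # drop any existing prefix
    match = LOG_NAME.match(name)
    if match is None:
        return None
    year, month, day, hour = match.groups()
    if hourly:
        # Option 2: year/month/day/hour partitions
        return f"year={year}/month={month}/day={day}/hour={hour}/{name}"
    # Option 1: a single pt_date partition
    return f"pt_date={year}-{month}-{day}/{name}"


def copy_to_partition(bucket, key):
    """Copy one log object under its partition prefix (hypothetical helper)."""
    import boto3  # imported here so the key logic above runs without boto3

    new_key = partitioned_key(key)
    if new_key is None or new_key == key:
        return None  # not a log file, or already under its partition prefix
    boto3.client("s3").copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    return new_key
```

In a Lambda handler you would call `copy_to_partition` once per S3 record found in the SQS batch. The `new_key == key` check is there so that an object already sitting under its partition prefix is not copied again.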
Additionally, you can set an aggressive S3 lifecycle policy that deletes the original logs after 1 day to save on storage costs.
Once this is done, you can register an Athena table with Hive-style partitions and query it seamlessly.
Alternatively, this blog post uses Glue to do the same thing in batch mode: https://aws.amazon.com/blogs/big-data/easily-query-aws-service-logs-using-amazon-athena/
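For Option 1, registering a new partition could look something like the sketch below, which just builds the Athena DDL string. The table name `access_logs` in the usage line is a placeholder, and the helper `add_partition_sql` is made up for this example. It assumes the table was already created with `pt_date` as a partition column.

```python
def add_partition_sql(table, bucket, pt_date):
    """Build an Athena statement that registers one pt_date partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (pt_date='{pt_date}') "
        f"LOCATION 's3://{bucket}/pt_date={pt_date}/'"
    )


# Placeholder table name; the bucket and date are taken from the example above.
print(add_partition_sql("access_logs", "my-access-logs", "2022-02-15"))
```

Because the prefixes follow the Hive `key=value` convention, running `MSCK REPAIR TABLE` on the table should also pick up the partitions without listing them one by one.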
This is very helpful - thank you!
Athena needs partitions to be organized by folder (key prefix). It cannot partition individual files by their name.