How to limit Athena CloudWatch query by time

0

Hi!

I have several thousand CloudWatch log files accumulated over the years and I have set up an Athena connection to query them.

Q: Is there a way to limit the query to - say the last month - so that it won't scan all the tables.

I.e. to somehow call "WHERE [table].time > [X]" or "WHERE [table] LIKE '2024/01/%'"?

Snapshot of the Athena tables

All I have found is this syntax which takes ages as it scans all the tables:

SELECT * 
FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."all_log_streams" 

Yours sincerely, Joni

J_N__
已提問 3 個月前檢視次數 249 次
1 個回答
0

Add a filter on the date field in your data. For example if your records have a date field, you can filter like:

WHERE date > date_sub(current_date, interval '1' month) Or you can: Partition your data by date at the S3 level. Athena can then use the partitions to only scan relevant data. For example if you partitioned by year/month, it would only scan the latest month's partition. OR else: If your data is in CloudWatch Logs, you can export logs for the desired time range to S3 and query from there. This allows Athena to leverage the date partitions.

aws logs put-export-task
--log-group-name /my/log/group
--from 1234567890
--to 2345678901
--destination /my/s3/bucket/prefix

profile picture
專家
已回答 3 個月前
  • Let me clarify: The table name AND the table metadata 'time' field already hold the date information, so is it possible to filter the query based solely on either one of those, without looking inside the tables. There must be a reason the metadata is there, mustn't there?

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南