How to limit Athena CloudWatch query by time

0

Hi!

I have several thousand CloudWatch log files accumulated over the years and I have set up an Athena connection to query them.

Q: Is there a way to limit the query to - say the last month - so that it won't scan all the tables.

I.e. to somehow call "WHERE [table].time > [X]" or "WHERE [table] LIKE '2024/01/%'"?

Snapshot of the Athena tables

All I have found is this syntax which takes ages as it scans all the tables:

SELECT * 
FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."all_log_streams" 

Yours sincerely, Joni

J_N__
已提问 3 个月前250 查看次数
1 回答
0

Add a filter on the date field in your data. For example if your records have a date field, you can filter like:

WHERE date > date_sub(current_date, interval '1' month) Or you can: Partition your data by date at the S3 level. Athena can then use the partitions to only scan relevant data. For example if you partitioned by year/month, it would only scan the latest month's partition. OR else: If your data is in CloudWatch Logs, you can export logs for the desired time range to S3 and query from there. This allows Athena to leverage the date partitions.

aws logs put-export-task
--log-group-name /my/log/group
--from 1234567890
--to 2345678901
--destination /my/s3/bucket/prefix

profile picture
专家
已回答 3 个月前
  • Let me clarify: The table name AND the table metadata 'time' field already hold the date information, so is it possible to filter the query based solely on either one of those, without looking inside the tables. There must be a reason the metadata is there, mustn't there?

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则