How to limit Athena CloudWatch query by time

0

Hi!

I have several thousand CloudWatch log files accumulated over the years and I have set up an Athena connection to query them.

Q: Is there a way to limit the query to - say the last month - so that it won't scan all the tables.

I.e. to somehow call "WHERE [table].time > [X]" or "WHERE [table] LIKE '2024/01/%'"?

Snapshot of the Athena tables

All I have found is this syntax which takes ages as it scans all the tables:

SELECT * 
FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."all_log_streams" 

Yours sincerely, Joni

J_N__
質問済み 3ヶ月前249ビュー
1回答
0

Add a filter on the date field in your data. For example if your records have a date field, you can filter like:

WHERE date > date_sub(current_date, interval '1' month) Or you can: Partition your data by date at the S3 level. Athena can then use the partitions to only scan relevant data. For example if you partitioned by year/month, it would only scan the latest month's partition. OR else: If your data is in CloudWatch Logs, you can export logs for the desired time range to S3 and query from there. This allows Athena to leverage the date partitions.

aws logs put-export-task
--log-group-name /my/log/group
--from 1234567890
--to 2345678901
--destination /my/s3/bucket/prefix

profile picture
エキスパート
回答済み 3ヶ月前
  • Let me clarify: The table name AND the table metadata 'time' field already hold the date information, so is it possible to filter the query based solely on either one of those, without looking inside the tables. There must be a reason the metadata is there, mustn't there?

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ