How to limit Athena CloudWatch query by time

0

Hi!

I have several thousand CloudWatch log files accumulated over the years and I have set up an Athena connection to query them.

Q: Is there a way to limit the query to - say the last month - so that it won't scan all the tables.

I.e. to somehow call "WHERE [table].time > [X]" or "WHERE [table] LIKE '2024/01/%'"?

Snapshot of the Athena tables

All I have found is this syntax which takes ages as it scans all the tables:

SELECT * 
FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."all_log_streams" 

Yours sincerely, Joni

J_N__
asked 3 months ago232 views
1 Answer
0

Add a filter on the date field in your data. For example if your records have a date field, you can filter like:

WHERE date > date_sub(current_date, interval '1' month) Or you can: Partition your data by date at the S3 level. Athena can then use the partitions to only scan relevant data. For example if you partitioned by year/month, it would only scan the latest month's partition. OR else: If your data is in CloudWatch Logs, you can export logs for the desired time range to S3 and query from there. This allows Athena to leverage the date partitions.

aws logs put-export-task
--log-group-name /my/log/group
--from 1234567890
--to 2345678901
--destination /my/s3/bucket/prefix

profile picture
EXPERT
answered 3 months ago
  • Let me clarify: The table name AND the table metadata 'time' field already hold the date information, so is it possible to filter the query based solely on either one of those, without looking inside the tables. There must be a reason the metadata is there, mustn't there?

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions