How to limit Athena CloudWatch query by time

0

Hi!

I have several thousand CloudWatch log files accumulated over the years and I have set up an Athena connection to query them.

Q: Is there a way to limit the query to - say the last month - so that it won't scan all the tables.

I.e. to somehow call "WHERE [table].time > [X]" or "WHERE [table] LIKE '2024/01/%'"?

Snapshot of the Athena tables

All I have found is this syntax which takes ages as it scans all the tables:

SELECT * 
FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."all_log_streams" 

Yours sincerely, Joni

J_N__
질문됨 3달 전249회 조회
1개 답변
0

Add a filter on the date field in your data. For example if your records have a date field, you can filter like:

WHERE date > date_sub(current_date, interval '1' month) Or you can: Partition your data by date at the S3 level. Athena can then use the partitions to only scan relevant data. For example if you partitioned by year/month, it would only scan the latest month's partition. OR else: If your data is in CloudWatch Logs, you can export logs for the desired time range to S3 and query from there. This allows Athena to leverage the date partitions.

aws logs put-export-task
--log-group-name /my/log/group
--from 1234567890
--to 2345678901
--destination /my/s3/bucket/prefix

profile picture
전문가
답변함 3달 전
  • Let me clarify: The table name AND the table metadata 'time' field already hold the date information, so is it possible to filter the query based solely on either one of those, without looking inside the tables. There must be a reason the metadata is there, mustn't there?

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠