Skip to content

Why does it take so long to query the CloudTrail log in Athena?

0

Why does it take so long to query the CloudTrail log in Athena?

The query is running the following SQL. I think I could get only the necessary data quickly because I have a search condition in the where clause, but it takes about 2 minutes to finish.

select useridentity, eventsource,eventname,requestparameters from "cloudtrail_logs"
where region = 'us-east-1' and year = 2024 and month = 12 and eventname = 'CreateTags'
LIMIT 10;
asked a year ago187 views
2 Answers
0
Accepted Answer

Hello.

It is possible to add a partition later using "ALTER TABLE ADD PARTITION".
https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html

Also, as re:Post Agent answered, using partition projection means you don't have to run "ALTER TABLE ADD PARTITION" every time.
Therefore, if your data is constantly increasing, such as CloudTrail logs, I recommend creating a table with partition projection.
https://docs.aws.amazon.com/athena/latest/ug/create-cloudtrail-table-partition-projection.html

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • OK. I'll try it

  • Thank you. When I added a partition, the query speed improved.

0

Querying CloudTrail logs in Athena can take a long time for several reasons, even when you have search conditions in the WHERE clause. In your case, there are a few factors that could be contributing to the slow query performance:

  1. Data volume: CloudTrail logs can grow significantly over time, especially for busy AWS accounts. Even with partitioning by year and month, you might still be scanning a large amount of data.

  2. Partitioning: While you're using partitions for year and month, which is good, you might benefit from more granular partitioning (e.g., by day) to further reduce the amount of data scanned.

  3. File format and compression: The format and compression of your CloudTrail log files can impact query performance. Optimizing these can help speed up queries.

  4. Partition projection: You're not using partition projection, which can help Athena dynamically calculate partition values and reduce query runtime.

To improve the performance of your query, consider the following optimizations:

  1. Use partition projection: This allows Athena to dynamically calculate partition values, reducing query runtime and eliminating the need to manually manage partitions.

  2. Compress your log files: This can significantly reduce the amount of data Athena needs to scan.

  3. Optimize file sizes: Aim for file sizes between 128MB-1GB for better query performance.

  4. Use more specific partitioning: Consider partitioning by day instead of just year and month.

  5. Include a timestamp filter: Add a condition on the eventtime column to further narrow down the data being scanned.

  6. Use CTAS (Create Table As Select) queries: These can improve query performance and reduce costs by creating new tables with optimized data layouts.

By implementing these optimizations, you should be able to significantly reduce your query runtime. If you continue to experience long query times or timeout errors, you may need to consider increasing your query runtime limit or using Athena's EXPLAIN and EXPLAIN ANALYZE statements to further optimize your query.
Sources
Resolve Athena “Query Timeout” errors | AWS re:Post
Resolve the Athena error "query exhausted resources" | AWS re:Post

answered a year ago
EXPERT
reviewed a year ago
  • Thank you. Do I need to recreate the table to create a partition?

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.