How do I troubleshoot Athena SELECT query issues for CloudTrail logs?

3 minute read
0

I ran a SELECT query on my Amazon Athena table for AWS CloudTrail logs. However, the query either runs slowly, didn't return results, or failed with a timeout error.

Resolution

Follow the troubleshooting steps for your use case.

Query returns no results

Incorrect Amazon S3 location path

Check the Amazon Simple Storage Service (Amazon S3) table location for your CloudTrail logs. If the input LOCATION path in your table is incorrect, then Athena doesn't return any records.

Partitions not loaded to table

If you used the manual partitioning CREATE TABLE statement to create your table, then verify that you loaded partitions to the Athena table. If you don't load partitions to the table, then Athena queries don't return results.

Use a SHOW PARTITIONS statement to view partitions that you already loaded to the table. Use the ALTER TABLE ADD PARTITION command to load the required partitions so that your query returns results.

Partitions incorrectly defined

If you used the manual partitioning CREATE TABLE statement to create your table, then check the timestamp property range. Make sure that the projection.timestamp.range property matches the partitions in your Amazon S3 bucket location.

Make sure that the SELECT query doesn't filter data on values beyond the range that you set in the projection.timestamp.range property.

Storage classes

Athena doesn't support data queries for S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes, and objects are ignored.

Check if the CloudTrail logs are in unsupported S3 storage classes. Restore the Amazon S3 archived objects that you want to query. Then, query the restored Amazon S3 objects.

Query takes a long time to run or fails with a timeout error

Nonpartitioned table

If you used the CloudTrail console to create an Athena table for CloudTrail logs, then a nonpartitioned table is created.

SELECT queries scan all log files in your Amazon S3 bucket with nonpartitioned tables. If the DML query timeout limit is exceeded, then queries might run slowly or time out.

To optimize query performance, it's a best practice to create a partitioned table. To reduce query time, specify a partition column in the WHERE clause so that Athena scans only data from the matched partitions.

Partitioned table query doesn't use the partition column to filter data

If the SELECT query doesn't use a partition column in the WHERE clause, then Athena scans all log files from the Amazon S3 bucket.

The following example query has a partition projection CloudTrail table that runs slowly because it scans all logs:

SELECT useridentity, sourceipaddress, eventtime
FROM cloudtrail_table
WHERE eventname = 'RunInstances'
AND eventtime >= '2023-10-30T00:00:00Z'
AND eventtime < '2023-10-30T00:04:00Z';

To reduce query time, specify a partition column in the WHERE clause so that Athena scans only data from the matched partitions.

The following example query has a WHERE clause on the partition column timestamp that scans only logs from the 2023/10/30 partition:

SELECT useridentity, sourceipaddress, eventtime
FROM cloudtrail_table
WHERE eventname = 'RunInstances'
AND eventtime >= '2023-10-30T00:00:00Z'
AND eventtime < '2023-10-30T00:04:00Z'
AND timestamp >= '2023/10/30'
AND timestamp < '2023/10/31'

For more information, see How do I troubleshoot timeout issues when I query CloudTrail data using Athena?

Related information

How do I automatically create tables in Amazon Athena to search through AWS CloudTrail logs?

Querying AWS CloudTrail logs

AWS OFFICIAL
AWS OFFICIALUpdated 3 months ago